Configuring network requirements for Watson Query

Important: IBM Cloud Pak for Data Version 4.8 will reach end of support (EOS) on 31 July 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.

The Watson Query service exposes network communication ports to allow connections from outside of the Cloud Pak for Data cluster.

Finding ports exposed by Watson Query

Watson Query supports external client connections by providing three ports that you can connect to from outside the cluster. To allow these external connections, the Watson Query ports are exposed externally as Kubernetes NodePort ports. The Kubernetes NodePort configuration maps a randomly generated port number (referred to as the external port) from a predefined range to the port that the Watson Query pods use internally (referred to as the internal port).

Best practice: You can run the commands in this task exactly as written if you set up environment variables. For instructions, see Setting up installation environment variables.

Ensure that you source the environment variables before you run the commands in this task.
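For example, a minimal setup might look like the following sketch. The values are assumptions for illustration only; replace them with the values for your installation.

# Hypothetical example values; replace them with your installation's values.
export PROJECT_CPD_INST_OPERANDS=cpd-instance     # project (namespace) where Watson Query is installed
export OCP_URL=https://api.mycluster.example.com:6443     # OpenShift API server URL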

The following list describes the ports that are exposed by Watson Query and their usage.

JDBC with SSL
External client applications connect to Watson Query by using JDBC with SSL.
Internal port: 50001
Communication: TCP
To get the external port, follow these steps.
  1. On the navigation menu, click Data > Watson Query. The service menu opens to the Data sources page by default.
  2. On the service menu, click Connection Information.
  3. Select the With SSL option in the Connection configuration resources section.
The external port is the value of the Port number field.
Optionally, you can run the following command.
oc get -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath="{.spec.ports[?(@.name=='ssl-server')].nodePort}" services c-db2u-dv-db2u-engn-svc

JDBC without SSL
External client applications connect to Watson Query by using JDBC without SSL.
Internal port: 50000
Communication: TCP
To get the external port, follow these steps.
  1. On the navigation menu, click Data > Watson Query. The service menu opens to the Data sources page by default.
  2. On the service menu, click Connection Information.
  3. Select the Without SSL option in the Connection configuration resources section.
The external port is the value of the Port number field.
Optionally, you can run the following command.
oc get -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath="{.spec.ports[?(@.name=='legacy-server')].nodePort}" services c-db2u-dv-db2u-engn-svc

Automated discovery
Automated discovery streamlines the process of accessing remote data sources. See Discovering remote data sources.
Internal port: 7777
Communication: TCP
To get the external port, run the following command.
oc get -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath="{.spec.ports[?(@.name=='qpdiscovery')].nodePort}" services c-db2u-dv-db2u-engn-svc
To get the list of Kubernetes NodePort ports that are exposed by the Watson Query service and the internal-to-external port mapping, run the following command.
oc get -n ${PROJECT_CPD_INST_OPERANDS} services c-db2u-dv-db2u-engn-svc

Examine the output.

NAME                      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                                          AGE
c-db2u-dv-db2u-engn-svc   NodePort   172.30.13.109   <none>        50000:30662/TCP,50001:32337/TCP,7777:32178/TCP   2d4h

The example indicates the following mapped ports.

Description              Internal port   External port
JDBC SSL                 50001           32337
JDBC non-SSL             50000           30662
DV automated discovery   7777            32178
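If you plan to reuse the external port values later (for example, in the haproxy.cfg file), you can optionally capture them in shell variables. The variable names in the following sketch are arbitrary; the commands are the same jsonpath queries that are shown earlier in this section.

# Capture the external (NodePort) values in shell variables for later use.
DV_SSL_PORT=$(oc get -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath="{.spec.ports[?(@.name=='ssl-server')].nodePort}" services c-db2u-dv-db2u-engn-svc)
DV_NONSSL_PORT=$(oc get -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath="{.spec.ports[?(@.name=='legacy-server')].nodePort}" services c-db2u-dv-db2u-engn-svc)
DV_DISCOVERY_PORT=$(oc get -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath="{.spec.ports[?(@.name=='qpdiscovery')].nodePort}" services c-db2u-dv-db2u-engn-svc)
echo "SSL: ${DV_SSL_PORT}  Non-SSL: ${DV_NONSSL_PORT}  Discovery: ${DV_DISCOVERY_PORT}"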

Watson Query ports for external connections

Watson Query provides external ports for connections from clients outside the cluster.

Inside the Watson Query head pods, Watson Query opens the following ports, which are mapped to Kubernetes NodePort ports.

Corresponding NodePort value for internal port 50000
  From: External JDBC client
  To: Watson Query head pod
  Function: Non-SSL database communication.

Corresponding NodePort value for internal port 50001
  From: External JDBC client
  To: Watson Query head pod
  Function: SSL-encrypted database communication.

Corresponding NodePort value for internal port 7777
  From: Remote connectors
  To: Watson Query head pod
  Function: An encrypted data stream, but not SSL.

Setting network requirements for load-balancing environments

By using the iptables utility or the firewall-cmd command, you can ensure that the external ports that are exposed by Watson Query (listed in Finding ports exposed by Watson Query) and their communication are not blocked by local firewall rules or load balancers.

For more information about checking ports for communication blockages, see Managing data using the NCAT utility in the Red Hat® documentation.
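For example, the following sketch checks whether one of the external ports is reachable from a client machine by using the nc (ncat) command. The host name and port are placeholders; use a node or load balancer address for your cluster and the NodePort value that you found in Finding ports exposed by Watson Query.

# Placeholder host and port; -z checks reachability without sending data.
nc -zv mycluster-node.example.com 32337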

If your Cloud Pak for Data deployment uses a load balancer and you get a timeout error when you try to connect to the Watson Query service, increase the load balancer timeout values by updating the /etc/haproxy/haproxy.cfg file. For more information, see Limitations and known issues in Watson Query.

Updating HAProxy configuration file

If you use an external infrastructure node to route external Watson Query traffic into the Red Hat OpenShift® cluster, you must forward traffic to the master nodes of your cluster by following these steps.
  1. On the infrastructure node, open the HAProxy configuration file at /etc/haproxy/haproxy.cfg.
  2. Update the haproxy.cfg file to specify port information.

    You must update the file directly. Don't copy and paste the following code sample. The values that are specified in the haproxy.cfg file come from the cluster. These values are different for each Watson Query service instance that is provisioned, even if you use the same cluster or namespace to provision the service.

  3. Find the NodePort ports for the namespace. See Finding ports exposed by Watson Query for more instructions.
  4. To get Master<n>-PrivateIP for each master node in the cluster, use the following command and look at the INTERNAL-IP column.
    oc get nodes -o wide
  5. To update the haproxy.cfg file, ensure that the following requirements are met.
    • You include each master node in the cluster in the backend sections, so that if one master node goes down, the connection can go through a different master node.
    • Sections in the file are uniquely named if your cluster runs multiple namespaces, and each namespace has a Watson Query service instance.

      For example, the following sample shows sections in the haproxy.cfg file for Watson Query instances in two namespaces (<namespace1> and <namespace2>). If you also have a Watson Query service instance in another namespace, such as abc, you must add the NodePort values for that namespace and ensure that its sections in the haproxy.cfg file have different names, such as dv-abc-ssl, dv-abc-nonssl, and dv-abc-discovery.

    • If you have multiple Watson Query instances, each one must have a different set of NodePort ports. For example, you can append the namespace to the end of each set of NodePort ports.

    The following example shows the haproxy.cfg file with NodePort ports set for multiple instances in different namespaces:

    defaults
           log                     global
           option                  dontlognull
           option  tcp-smart-accept
           option  tcp-smart-connect
           retries                 3
           timeout queue           1m
           timeout connect         10s
           timeout client          1m
           timeout server          1m
           timeout check           10s
           maxconn                 3000
    
    frontend dv-nonssl-<namespace1>
           bind *:<NodePort value for the internal 50000 port>
           default_backend dv-nonssl-<namespace1>
           mode tcp
           option tcplog
    
    backend dv-nonssl-<namespace1>
           balance source
           mode tcp
           server master0 <Master0-PrivateIP>:<NodePort value for the internal 50000 port> check
           server master1 <Master1-PrivateIP>:<NodePort value for the internal 50000 port> check
          (repeat for each master node in the cluster)
    
    frontend dv-nonssl-<namespace2>
           bind *:<NodePort value for the internal 50000 port>
           default_backend dv-nonssl-<namespace2>
           mode tcp
           option tcplog
    
    backend dv-nonssl-<namespace2>
           balance source
           mode tcp
           server master0 <Master0-PrivateIP>:<NodePort value for the internal 50000 port> check
           server master1 <Master1-PrivateIP>:<NodePort value for the internal 50000 port> check
          (repeat for each master node in the cluster)
    
    frontend dv-ssl-<namespace1>
           bind *:<NodePort value for the internal 50001 port>
           default_backend dv-ssl-<namespace1>
           mode tcp
           option tcplog
    
    backend dv-ssl-<namespace1>
           balance source
           mode tcp
           server master0 <Master0-PrivateIP>:<NodePort value for the internal 50001 port> check
           server master1 <Master1-PrivateIP>:<NodePort value for the internal 50001 port> check
          (repeat for each master node in the cluster)
    
    frontend dv-ssl-<namespace2>
           bind *:<NodePort value for the internal 50001 port>
           default_backend dv-ssl-<namespace2>
           mode tcp
           option tcplog
    
    backend dv-ssl-<namespace2>
           balance source
           mode tcp
           server master0 <Master0-PrivateIP>:<NodePort value for the internal 50001 port> check
           server master1 <Master1-PrivateIP>:<NodePort value for the internal 50001 port> check
          (repeat for each master node in the cluster)
    
    frontend dv-discovery-<namespace1>
           bind *:<NodePort value for the internal 7777 port>
           default_backend dv-discovery-<namespace1>
           mode tcp
           option tcplog
    
     backend dv-discovery-<namespace1>
           balance source
           mode tcp
           server master0 <Master0-PrivateIP>:<NodePort value for the internal 7777 port> check
           server master1 <Master1-PrivateIP>:<NodePort value for the internal 7777 port> check
          (repeat for each master node in the cluster)
    
    
    frontend dv-discovery-<namespace2>
           bind *:<NodePort value for the internal 7777 port>
           default_backend dv-discovery-<namespace2>
           mode tcp
           option tcplog
    
     backend dv-discovery-<namespace2>
           balance source
           mode tcp
           server master0 <Master0-PrivateIP>:<NodePort value for the internal 7777 port> check
           server master1 <Master1-PrivateIP>:<NodePort value for the internal 7777 port> check
          (repeat for each master node in the cluster)
  6. Reload HAProxy by using the following command.
    systemctl reload haproxy
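Optionally, before you reload HAProxy, you can check the updated configuration file for syntax errors, as shown in the following sketch.

# Validate the HAProxy configuration; this check reports errors without affecting the running service.
haproxy -c -f /etc/haproxy/haproxy.cfg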

Configuring a public load balancer service to allow external traffic into a Red Hat OpenShift on IBM Cloud® cluster

Because a virtual private cloud (VPC) in IBM® Cloud is geared toward security, worker nodes are not visible from outside of the VPC's LAN. The worker nodes don't have external IP addresses and cannot be accessed externally. As a result, instead of using NodePort access, you must configure a public load balancer.
  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.

    oc login ${OCP_URL}
  2. Change to the project where the Cloud Pak for Data control plane is installed:
    oc project ${PROJECT_CPD_INST_OPERANDS}
    This command uses an environment variable so that you can run the command exactly as written. For information about sourcing environment variables, see Setting up installation environment variables.
  3. Go to Databases > Details and note the Deployment ID. You will use this value when you create the load balancer file.
  4. Create a load balancer .yaml file with the following details:
    apiVersion: v1
    kind: Service
    metadata:
      name: lb-dv
      annotations:
        service.kubernetes.io/ibm-load-balancer-cloud-provider-vpc-subnets: "<provide public subnet names here>"
    spec:
      ports:
      - name: db
        protocol: TCP
        port: <external non-ssl port>
        targetPort: 50000
      - name: db-ssl
        protocol: TCP
        port: <external ssl port>
        targetPort: 50001
      type: LoadBalancer
      selector:
        app: db2u-dv
        component: db2dv
        formation_id: db2u-dv
        role: db
        type: engine
        name: dashmpp-head-0
  5. Run the following command to create the load balancer service from the .yaml file:
    oc create -f db2-lb.yaml
  6. Run the following command to see the details of the service:
    oc get svc lb-dv

    The following example shows the details of a load balancer that is named lb-db2-2:

    NAME       TYPE           CLUSTER-IP       EXTERNAL-IP                         PORT(S)                           AGE
    lb-db2-2   LoadBalancer   172.21.100.200   fbec480d-eu-de.lb.appdomain.cloud   51000:32149/TCP,51001:32514/TCP   21m
    
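After the load balancer reports an external host name, you can optionally verify that the SSL port accepts TLS connections from a client machine. The following sketch uses the host name and port from the example output and assumes that the openssl command is available; substitute the values for your load balancer.

# Placeholder host and port from the example output; expect the server certificate in the response.
openssl s_client -connect fbec480d-eu-de.lb.appdomain.cloud:51001 < /dev/null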

Defining gateway configuration to access isolated remote connectors

A remote connector acts as a gateway to remote data sources. If the remote connector host machine has network access to the Cloud Pak for Data cluster, the remote connector automatically contacts and connects to the cluster. If the automated discovery port is not exposed through HAProxy and the firewall rules, or if the physical network configuration allows only one-way communication from Cloud Pak for Data to the Watson Query remote connector, you might need to establish the connection manually.

After you deploy the remote connector and validate that it is running on the host, if the remote connector does not appear in the Data sources > Constellation view, you can manually configure the connection from Watson Query to the remote connector by using the DEFINEGATEWAYS() API.

  1. Click Data > Data virtualization > Run SQL.
  2. Run the DVSYS.DEFINEGATEWAYS() stored procedure. This stored procedure takes a single argument that contains a comma-separated list of hosts where remote connectors are running. In the following example, two remote connectors are running: one on host1 and another on host2.
    CALL DVSYS.DEFINEGATEWAYS('host1:6414, host2:6414') 

    Replace the host1 and host2 variables with the remote connector host names or IP addresses. This example uses port 6414, which you specify when you generate the dv_endpoint.sh configuration script. To determine which port mapping to use in the DVSYS.DEFINEGATEWAYS() stored procedure, check the Queryplex_config.log file on your remote connector and search for the GAIAN_NODE_PORT value, as shown in the following example.

    GAIAN_NODE_PORT=6414

    If you use port forwarding (for example, NAT or VPN) to the remote connector, you must specify two ports as shown in the following example.

    CALL DVSYS.DEFINEGATEWAYS('host1:37400:6414, host2:37400:6414')

    In this example, two remote connectors listen internally on port 6414, but this port is not exposed externally by the hosts. For example, the remote connectors might be accessible only from Cloud Pak for Data through a VPN server that is configured to map external VPN port 37400 to internal port 6414. Defining the gateway enables Watson Query to open a connection to the remote connectors that run on host1 and host2. Watson Query connects to port 37400 on the remote host, and the VPN forwards traffic to the remote connector's internal port 6414. To confirm basic network reachability from the cluster to a remote connector, see the connectivity check sketch that follows this procedure.
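Before you define the gateways, you can optionally confirm that the cluster can reach the remote connector ports. The following sketch is an assumption-based example: the pod name c-db2u-dv-db2u-0 is the typical name of the Watson Query head pod, and host1 and 6414 are placeholders; adjust all three to match your environment.

# List the Watson Query pods to confirm the head pod name (assumed here to be c-db2u-dv-db2u-0).
oc get pods -n ${PROJECT_CPD_INST_OPERANDS} | grep db2u-dv
# From the head pod, test TCP connectivity to the remote connector port (placeholder host and port).
oc exec -n ${PROJECT_CPD_INST_OPERANDS} c-db2u-dv-db2u-0 -- bash -c "timeout 5 bash -c '</dev/tcp/host1/6414' && echo reachable || echo unreachable"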

Removing defined gateway configuration

If a defined gateway is no longer necessary or is unreachable, it can negatively impact query performance. You can remove the gateway by using the API REMOVEGATEWAYS(). This API takes a comma-separated list of gateway ID parameters as a single string literal argument as shown in the following example.

CALL DVSYS.REMOVEGATEWAYS('GTW0,GTW2')
Note: This API removes the gateway from use, but it does not delete the gateway's configuration, so the gateway still appears when you call listrdbc.