Configuring network requirements for Data Virtualization
The Data Virtualization service exposes network communication ports to allow connections from outside of the Cloud Pak for Data cluster.
- Finding ports exposed by Data Virtualization
- Setting network requirements for load-balancing environments
- Updating HAProxy configuration file
- Defining gateway configuration to access isolated remote connectors
- Removing defined gateway configuration
Finding ports exposed by Data Virtualization
The following table lists the ports that are exposed by Data Virtualization and their usage.
Usage | Internal port |
---|---|
External client applications that connect to Data Virtualization by using JDBC with SSL | 50001 |
External client applications that connect to Data Virtualization by using JDBC without SSL | 50000 |
Automated discovery to streamline the process of accessing remote data sources. See Discovering remote data sources. | 7777 |
To find the external port that is mapped to each internal port, run the following command.
oc get -n project-name services c-db2u-dv-db2u-engn-svc
Replace project-name with the project (namespace) where the Data Virtualization service is installed. For example, use the following command.
oc get -n dv-project services c-db2u-dv-db2u-engn-svc
Examine the output.
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
c-db2u-dv-db2u-engn-svc NodePort 172.30.13.109 <none> 50000:30662/TCP,50001:32337/TCP,7777:32178/TCP 2d4h
Description | Internal port | External port |
---|---|---|
JDBC SSL | 50001 | 32337 |
JDBC Non-SSL | 50000 | 30662 |
DV Automated Discovery | 7777 | 32178 |
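The mapping in this table can also be derived from the PORT(S) column with a small helper. The following is a minimal sketch, not part of the product: the function name map_ports is hypothetical, and it assumes each entry has the form internal:external/protocol, as in the sample output.

```shell
# Hypothetical helper: split a PORT(S) value into "internal external" pairs,
# one pair per line. Assumes entries look like 50000:30662/TCP.
map_ports() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F'[:/]' '{ print $1, $2 }'
}

# Value copied from the sample output above.
map_ports '50000:30662/TCP,50001:32337/TCP,7777:32178/TCP'
```

On a live cluster, the PORT(S) value could be taken from the fifth column of the oc get services output (for example, by adding --no-headers and selecting that column with awk), assuming the column layout matches the sample.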
Setting network requirements for load-balancing environments
By using the iptables utility or the firewall-cmd command, you can ensure that communication on the external ports that are listed in Table 1 is not blocked by local firewall rules or load balancers.
For more information about checking ports for communication blockages, see Managing data using the NCAT utility in the Red Hat® documentation.
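As one possible way to open the external ports with firewall-cmd, the following sketch prints the commands for review instead of running them. It assumes firewalld with its default zone and TCP traffic; the function name firewall_cmds is hypothetical, and the port numbers come from the sample output and differ on every cluster.

```shell
# Hypothetical helper: print firewall-cmd commands that open the given
# TCP ports permanently, followed by a reload. Nothing is executed.
firewall_cmds() {
  for port in "$@"; do
    printf 'firewall-cmd --permanent --add-port=%s/tcp\n' "$port"
  done
  printf 'firewall-cmd --reload\n'
}

# External ports from the sample output; replace with your own values.
firewall_cmds 30662 32337 32178
```

Printing the commands first makes it easy to check the port list before anything changes on the host.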
If your Cloud Pak for Data uses a load balancer and you get a timeout error when you try to connect to the Data Virtualization service, increase the load balancer timeout values by updating the /etc/haproxy/haproxy.cfg file. For more information, see Limitations and known issues in Data Virtualization.
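The timeout change can be sketched as a filter over the configuration text. This is an illustration only: the function name raise_timeouts is hypothetical, and the value 5m is an assumed placeholder, not a recommended setting. It rewrites only the timeout client and timeout server lines and leaves every other line unchanged.

```shell
# Hypothetical helper: read HAProxy configuration on stdin and set the
# "timeout client" and "timeout server" values to $1 (for example, 5m).
raise_timeouts() {
  sed -E "s/^([[:space:]]*timeout[[:space:]]+(client|server))[[:space:]]+.*/\1 $1/"
}

# Demonstration on an inline fragment; 5m is an assumed value.
printf 'timeout connect 10s\ntimeout client 1m\ntimeout server 1m\n' | raise_timeouts 5m
```

Applied to a real file, the output would typically be written to a temporary copy and reviewed before it replaces /etc/haproxy/haproxy.cfg.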
Updating HAProxy configuration file
- On the infrastructure node, open the HAProxy configuration file at /etc/haproxy/haproxy.cfg.
- Update the haproxy.cfg file to specify port information.
You must update the file directly. Don't copy and paste the following code sample. The values that are specified in the haproxy.cfg file come from the cluster. These values are different for each Data Virtualization service instance that is provisioned, even if you use the same cluster or namespace to provision the service.
- To get the node ports for the namespace, run the following commands.
- Non-SSL
oc get services -n project-name c-db2u-dv-db2u-engn-svc -o jsonpath="{.spec.ports[?(@.name=='legacy-server')].nodePort}"
- SSL
oc get services -n project-name c-db2u-dv-db2u-engn-svc -o jsonpath="{.spec.ports[?(@.name=='ssl-server')].nodePort}"
- To get Master<n>-PrivateIP for each master node in the cluster, use the following command and look at the INTERNAL-IP column.
oc get nodes -o wide
- To update the haproxy.cfg file, ensure that the following requirements are met.
- You include each master node in the cluster in the backend sections, so that if one master node goes down, the connection can go through a different master node.
- Sections in the file are uniquely named if your cluster runs multiple namespaces, and each namespace has a Data Virtualization service instance.
For example, you have the following sections in the haproxy.cfg file for Data Virtualization in namespace zen. However, if you also have a Data Virtualization service instance in namespace abc, you must add node ports for namespace abc, and ensure that sections in the haproxy.cfg file have a different name, such as dv-abc-ssl, dv-abc-nonssl, and dv-abc-discovery.
defaults
    log global
    option dontlognull
    option tcp-smart-accept
    option tcp-smart-connect
    retries 3
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout check 10s
    maxconn 3000

frontend dv-nonssl
    bind *:NodePort for 50000
    default_backend dv-nonssl
    mode tcp
    option tcplog

backend dv-nonssl
    balance source
    mode tcp
    server master0 Master0-PrivateIP:NodePort for 50000
    server master1 Master1-PrivateIP:NodePort for 50000
    (repeat for each master node in the cluster)

frontend dv-ssl
    bind *:NodePort for 50001
    default_backend dv-ssl
    mode tcp
    option tcplog

backend dv-ssl
    balance source
    mode tcp
    server master0 Master0-PrivateIP:NodePort for 50001
    server master1 Master1-PrivateIP:NodePort for 50001
    (repeat for each master node in the cluster)

frontend dv-discovery
    bind *:NodePort for 7777
    default_backend dv-discovery
    mode tcp
    option tcplog

backend dv-discovery
    balance source
    mode tcp
    server master0 Master0-PrivateIP:NodePort for 7777
    server master1 Master1-PrivateIP:NodePort for 7777
    (repeat for each master node in the cluster)
- Reload HAProxy by using the following command.
systemctl reload haproxy
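Because the node ports and master IP addresses differ for every service instance, it can help to generate the frontend and backend text from the values that you collected instead of typing it by hand. The following is a minimal sketch under stated assumptions: the function name dv_section is hypothetical, the IP addresses are placeholders, and the output is meant to be reviewed and merged into haproxy.cfg manually.

```shell
# Hypothetical helper: print a frontend/backend pair for one port.
# $1: unique section name (for example, dv-abc-ssl)
# $2: node port obtained from the cluster
# remaining arguments: private IP address of each master node
dv_section() {
  name="$1"
  port="$2"
  shift 2
  printf 'frontend %s\n' "$name"
  printf '    bind *:%s\n' "$port"
  printf '    default_backend %s\n' "$name"
  printf '    mode tcp\n'
  printf '    option tcplog\n'
  printf '\n'
  printf 'backend %s\n' "$name"
  printf '    balance source\n'
  printf '    mode tcp\n'
  i=0
  for ip in "$@"; do
    # One server line per master node, numbered from master0.
    printf '    server master%d %s:%s\n' "$i" "$ip" "$port"
    i=$((i + 1))
  done
}

# Placeholder values: port 32337 is from the sample output, IPs are invented.
dv_section dv-abc-ssl 32337 10.0.0.10 10.0.0.11 10.0.0.12
```

Before reloading, the edited file can be checked with haproxy -c -f /etc/haproxy/haproxy.cfg, which validates the configuration without starting the proxy.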
Defining gateway configuration to access isolated remote connectors
A remote connector acts as a gateway to remote data sources. If the remote connector host machine has network access to the Cloud Pak for Data cluster, the remote connector automatically contacts and connects to the cluster. If the automated discovery port is not exposed from HAProxy and the firewall rules, or if the physical network configuration allows only one-way communication from Cloud Pak for Data to the Data Virtualization remote connector, you might need to establish the connection manually.
After you deploy the remote connector and validate that it is running on the host, if the remote connector does not appear in Data Virtualization, you can manually define the connection from Data Virtualization to the remote connector by using the DEFINEGATEWAYS() API.
- Click .
- Run the DVSYS.DEFINEGATEWAYS() stored procedure. This stored procedure has an argument that contains a comma-separated list of hosts where remote connectors are running. In the following example, two remote connectors are running: one on host1 and another on host2.
CALL DVSYS.DEFINEGATEWAYS('host1:6414, host2:6414')
Replace the host1 and host2 variables with the remote connector hostname or IP address. This example uses port 6414, which you specify when you generate the dv_endpoint.sh configuration script. To determine which port mapping to use in the DVSYS.DEFINEGATEWAYS() stored procedure, check the Queryplex_config.log file on your remote connector, and search for the GAIAN_NODE_PORT value as shown in the following example.
GAIAN_NODE_PORT=6414
If you use port forwarding (for example, NAT or VPN) to the remote connector, you must specify two ports as shown in the following example.
CALL DVSYS.DEFINEGATEWAYS('host1:37400:6414, host2:37400:6414')
In this example, two remote connectors are listening internally on port 6414, but this port is not exposed externally by the hosts. For example, remote connectors can be accessible only from Cloud Pak for Data by using a VPN server that is configured to map external VPN port 37400 to internal port 6414. Defining the gateway enables Data Virtualization to open a connection to the remote connectors that run on host1 and host2. Data Virtualization connects to port 37400 on the remote host, and the VPN forwards traffic to the remote connector's internal port 6414.
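When many remote connectors share the same port mapping, the argument string can be assembled from a host list. The following sketch only builds the statement text; it does not connect to Data Virtualization. The function name define_gateways_sql is hypothetical, and it assumes that every host uses the same port specification, either port or external-port:internal-port as in the examples above.

```shell
# Hypothetical helper: print a CALL statement for DVSYS.DEFINEGATEWAYS.
# $1: port specification shared by all hosts (6414 or 37400:6414)
# remaining arguments: remote connector hostnames or IP addresses
define_gateways_sql() {
  ports="$1"
  shift
  list=""
  for host in "$@"; do
    # Separate entries with a comma and a space, as in the examples.
    if [ -n "$list" ]; then
      list="${list}, "
    fi
    list="${list}${host}:${ports}"
  done
  printf "CALL DVSYS.DEFINEGATEWAYS('%s')\n" "$list"
}

define_gateways_sql 37400:6414 host1 host2
```

The printed statement can then be pasted into the SQL editor that you use to run the stored procedure.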
Removing defined gateway configuration
If a defined gateway is no longer necessary or is unreachable, it can negatively impact query performance. You can remove the gateway by using the REMOVEGATEWAYS() API. This API takes a comma-separated list of gateway ID parameters as a single string literal argument as shown in the following example.
CALL DVSYS.REMOVEGATEWAYS('GTW0,GTW2')
listrdbc
.