Updating the Db2 Big SQL and Hadoop configuration

You can update the Hadoop configuration and configuration parameters for different Db2® Big SQL components.

Updating the Db2 Big SQL configuration

You can use the bigsql config utility to set configuration parameters that are defined in the bigsql-conf.xml file or the bigsql-default.xml file. These parameters are logically grouped for different Db2 Big SQL components or actions. The following table shows the components or actions that you can configure with the bigsql config utility.
Note: In the Db2 Big SQL 7.3.0 service on IBM Cloud Pak® for Data 4.5.0, the db2uctl adm bigsql config command is equivalent to the bigsql-config -set command in Db2 Big SQL 7.1.
Component or action and documentation link for details:
  • Db2 Big SQL scheduler: https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/bigsql_scheduler.html
  • Db2 Big SQL ANALYZE command: https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/admin_analyze.html
  • Sync the Db2 Big SQL catalog and the Hive metastore
  • Validate your authorization to use the LOCATION clause for tables that reside outside of the Hive warehouse: https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/bi_admin_biga_configparms.html

Example: To increase the scheduler.client.request.timeout property from the default value of 2 minutes (120000 milliseconds) to 1 hour (3600000 milliseconds), complete the following steps:

  1. Log in to your OpenShift® cluster as a project administrator:
    oc login <OpenShift_URL>:<port>
  2. Change to the project where the Cloud Pak for Data control plane is installed:
    oc project ${PROJECT_CPD_INSTANCE}
    Note: This command uses an environment variable so that you can run the command exactly as written. For information about sourcing environment variables, see Setting up installation environment variables.
  3. Identify the Db2 Big SQL instance ID:
    oc get cm -l component=db2bigsql -o custom-columns="Instance Id:{.data.instance_id},Instance Name:{.data.instance_name},Created:{.metadata.creationTimestamp}"
  4. Get the name of the Db2 Big SQL head pod:
    head_pod=$(oc get pod -l app=bigsql-<instance_id>,name=dashmpp-head-0 --no-headers=true -o=custom-columns=NAME:.metadata.name)
  5. Run the following command to increase the value of the scheduler.client.request.timeout property:
    oc exec -i $head_pod -- bash -c 'db2uctl adm bigsql config --key scheduler.client.request.timeout --value 3600000'
  6. Restart the Db2 Big SQL service:
    oc exec -i $head_pod -- sudo su - db2inst1 -c "bigsql stop; bigsql start"
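The value calculation in step 5 can be sketched as follows. This is a hedged, dry-run sketch: it converts the desired timeout into the milliseconds that scheduler.client.request.timeout expects and prints the command to run inside the head pod, but it does not execute anything against the cluster. Replace $head_pod with the value from step 4 before running the printed command.

```shell
# Convert the desired timeout (1 hour) into milliseconds; the default is
# 120000 ms (2 minutes). Nothing here touches the cluster.
TIMEOUT_MS=$((1 * 60 * 60 * 1000))   # 1 hour = 3600000 milliseconds
CMD="db2uctl adm bigsql config --key scheduler.client.request.timeout --value ${TIMEOUT_MS}"

# Print the command from step 5 so you can review it before running it.
echo "oc exec -i \$head_pod -- bash -c '${CMD}'"
```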

Updating and refreshing the Hadoop configuration files

When Db2 Big SQL is configured as a Hadoop side-car instance, the service has its own local client copy of the Hadoop configuration files for services with which Db2 Big SQL interacts on the remote Hadoop cluster. In particular, Db2 Big SQL has a local copy of the following configuration files:
  • core-site.xml
  • hdfs-site.xml
  • hive-site.xml
  • hadoop-env.sh
  • log4j2.properties and hive-log4j2.properties
These files are automatically downloaded when Db2 Big SQL is installed on the Cloud Pak for Data cluster, and when pods are restarted or new workers are added.

You can use the bigsql_config.py utility to update and override properties in the Db2 Big SQL local copies of the Hadoop configuration files hdfs-site.xml, core-site.xml, and hive-site.xml. For an example of how to use bigsql_config.py, see Configuring Db2 Big SQL to use datanode hostnames when connecting to the datanodes on the Hadoop cluster.

When these configuration files are updated on your Hadoop cluster, they need to be refreshed to the Db2 Big SQL service. For example, it might be necessary to add or to update the hadoop.proxy.bigsql.hosts and hadoop.proxy.bigsql.groups properties in the core-site.xml file for Db2 Big SQL impersonation. In this case, these changes would be made in Cloudera Manager on the remote Hadoop cluster first, and then refreshed to the Db2 Big SQL service on the Cloud Pak for Data cluster.
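As an illustration, the impersonation properties mentioned above would appear roughly as follows in core-site.xml. The property names are those given above; the values shown are placeholders and must match your environment.

```xml
<!-- Illustrative core-site.xml fragment for Db2 Big SQL impersonation.
     The values below are placeholders, not recommended settings. -->
<property>
  <name>hadoop.proxy.bigsql.hosts</name>
  <value>*</value>  <!-- hosts allowed to act as the bigsql proxy user -->
</property>
<property>
  <name>hadoop.proxy.bigsql.groups</name>
  <value>*</value>  <!-- groups the bigsql user may impersonate -->
</property>
```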

You can refresh the Hadoop configuration files either by restarting all pods in the Db2 Big SQL service or by completing the following procedure, which does not require restarting the pods:

  1. Log in to your OpenShift cluster as a project administrator:
    oc login <OpenShift_URL>:<port>
  2. Change to the project where the Cloud Pak for Data control plane is installed:
    oc project ${PROJECT_CPD_INSTANCE}
    Note: This command uses an environment variable so that you can run the command exactly as written. For information about sourcing environment variables, see Setting up installation environment variables.
  3. Identify the Db2 Big SQL instance ID:
    oc get cm -l component=db2bigsql -o custom-columns="Instance Id:{.data.instance_id},Instance Name:{.data.instance_name},Created:{.metadata.creationTimestamp}"
  4. Get the name of the Db2 Big SQL head pod:
    head_pod=$(oc get pod -l app=bigsql-<instance_id>,name=dashmpp-head-0 --no-headers=true -o=custom-columns=NAME:.metadata.name)
  5. Trigger a refresh of the Hadoop configuration by updating the HadoopCluster generation number in the Db2 Big SQL Custom Resource. Complete the following steps:
    1. Get the current generation number:
      oc get bigsql bigsql-<instance-id> -o custom-columns="hadoopCluster Generation:{.spec.hadoopCluster.generation}"
      Note: If the returned value is <None>, assume that the current generation number is zero.
    2. Run the following command to increment the generation number by 1. In this example, the value that was returned in the previous step is 1, so you increment it to 2:
      oc patch -n <project> bigsql bigsql-<instance-id> --patch '{"spec":{"hadoopCluster":{"generation":2}}}' --type merge
  6. After completing the previous step, wait approximately 2 minutes for the Hadoop configuration to be updated and for the service to return to the Ready state. Then restart the Db2 Big SQL service:
    oc exec -i $head_pod -- sudo su - db2inst1 -c "bigsql stop; bigsql start"