Updating the Db2 Big SQL and Hadoop configuration
You can update the Hadoop configuration and configuration parameters for different Db2® Big SQL components.
Updating the Db2 Big SQL configuration
| Component or action | Documentation link for details |
|---|---|
| Db2 Big SQL scheduler | https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/bigsql_scheduler.html |
| Db2 Big SQL ANALYZE command | https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/admin_analyze.html |
| Sync the Db2 Big SQL catalog and the Hive metastore | |
| Validate your authorization to use the LOCATION clause for tables that reside outside of the Hive warehouse | https://www.ibm.com/support/knowledgecenter/SSCRJT_7.1.0/com.ibm.swg.im.bigsql.doc/doc/bi_admin_biga_configparms.html |
Example: To increase the scheduler.client.request.timeout property from the default value of 2 minutes (120000 milliseconds) to 1 hour (3600000 milliseconds), complete the following steps:
- Log in to your OpenShift® cluster as a project administrator:
oc login <OpenShift_URL>:<port>
- Change to the project where the Cloud Pak for Data control plane is installed:
oc project ${PROJECT_CPD_INSTANCE}
Note: This command uses an environment variable so that you can run the command exactly as written. For information about sourcing environment variables, see Setting up installation environment variables.
- Identify the Db2 Big SQL instance ID:
oc get cm -l component=db2bigsql -o custom-columns="Instance Id:{.data.instance_id},Instance Name:{.data.instance_name},Created:{.metadata.creationTimestamp}"
- Get the name of the Db2 Big SQL head pod:
head_pod=$(oc get pod -l app=bigsql-<instance_id>,name=dashmpp-head-0 --no-headers=true -o=custom-columns=NAME:.metadata.name)
- Run the following command to increase the value of the scheduler.client.request.timeout property:
oc exec -i $head_pod -- bash -c 'db2uctl adm bigsql config --key scheduler.client.request.timeout --value 3600000'
- Restart the Db2 Big SQL service:
oc exec -i $head_pod -- sudo su - db2inst1 -c "bigsql stop; bigsql start"
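Taken together, the steps above can be sketched as a single shell script. The following is a minimal dry-run sketch that only prints each oc command instead of executing it; the instance ID value and the $head_pod placeholder are illustrative assumptions, not values from a real cluster:

```shell
#!/bin/bash
# Dry-run sketch of the timeout-update procedure.
# Set run="" instead of run=echo to actually execute the commands.
run=echo

# Placeholder: substitute the label derived from the instance ID returned by
#   oc get cm -l component=db2bigsql ...
instance_label="bigsql-<instance_id>"
timeout_ms=$((60 * 60 * 1000))   # 1 hour = 3600000 milliseconds

# Look up the head pod for the instance (printed, not executed, in this sketch).
$run oc get pod -l app=${instance_label},name=dashmpp-head-0 \
  --no-headers=true -o=custom-columns=NAME:.metadata.name

# Update the scheduler timeout, then restart Db2 Big SQL.
# '$head_pod' is left unexpanded here as a placeholder for the pod name.
$run oc exec -i '$head_pod' -- bash -c \
  "db2uctl adm bigsql config --key scheduler.client.request.timeout --value ${timeout_ms}"
$run oc exec -i '$head_pod' -- sudo su - db2inst1 -c "bigsql stop; bigsql start"
```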
Updating and refreshing the Hadoop configuration files
Db2 Big SQL keeps local copies of the following Hadoop configuration files:
- core-site.xml
- hdfs-site.xml
- hive-site.xml
- hadoop-env.sh
- log4j2.properties and hive-log4j2.properties
You can use the bigsql_config.py utility to update and override properties in the Db2 Big SQL local copies of the Hadoop configuration files hdfs-site.xml, core-site.xml, and hive-site.xml. For an example of how to use bigsql_config.py, see Configuring Db2 Big SQL to use datanode hostnames when connecting to the datanodes on the Hadoop cluster.
When these configuration files are updated on your Hadoop cluster, the changes must be refreshed on the Db2 Big SQL service. For example, it might be necessary to add or update the hadoop.proxy.bigsql.hosts and hadoop.proxy.bigsql.groups properties in the core-site.xml file for Db2 Big SQL impersonation. In this case, you would make these changes in Cloudera Manager on the remote Hadoop cluster first, and then refresh them on the Db2 Big SQL service on the Cloud Pak for Data cluster.
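As an illustration, the impersonation properties mentioned above might appear in core-site.xml as follows. The property names come from this procedure; the wildcard values are placeholder assumptions, and in practice you would set appropriately restricted values in Cloudera Manager:

```xml
<!-- Illustrative core-site.xml fragment; actual values depend on your cluster. -->
<property>
  <name>hadoop.proxy.bigsql.hosts</name>
  <value>*</value> <!-- placeholder: restrict to specific hosts in production -->
</property>
<property>
  <name>hadoop.proxy.bigsql.groups</name>
  <value>*</value> <!-- placeholder: restrict to specific groups in production -->
</property>
```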
You can refresh the Hadoop configuration files either by restarting all pods in the Db2 Big SQL service or by completing the following procedure, which does not require restarting the pods:
- Log in to your OpenShift cluster as a project administrator:
oc login <OpenShift_URL>:<port>
- Change to the project where the Cloud Pak for Data control plane is installed:
oc project ${PROJECT_CPD_INSTANCE}
Note: This command uses an environment variable so that you can run the command exactly as written. For information about sourcing environment variables, see Setting up installation environment variables.
- Identify the Db2 Big SQL instance ID:
oc get cm -l component=db2bigsql -o custom-columns="Instance Id:{.data.instance_id},Instance Name:{.data.instance_name},Created:{.metadata.creationTimestamp}"
- Get the name of the Db2 Big SQL head pod:
head_pod=$(oc get pod -l app=bigsql-<instance_id>,name=dashmpp-head-0 --no-headers=true -o=custom-columns=NAME:.metadata.name)
- Refreshing the Hadoop configuration is triggered by updating the HadoopCluster generation number in the Db2 Big SQL custom resource. To do this, complete the following steps:
  - Get the current generation number:
oc get bigsql bigsql-<instance-id> -o custom-columns="hadoopCluster Generation:{.spec.hadoopCluster.generation}"
Note: If the returned value is <None>, assume that the current generation number is zero.
  - Run the following command to increment the generation number by 1. In this example, the value that was returned in the previous step is 1, so you increment it to 2:
oc patch -n <project> bigsql bigsql-<instance-id> --patch '{"spec":{"hadoopCluster":{"generation":2}}}' --type merge
- After completing the previous step, wait 2 minutes for the Hadoop configuration to be updated and for the Db2 Big SQL instance to return to the Ready state. Then restart the Db2 Big SQL service:
oc exec -i $head_pod -- sudo su - db2inst1 -c "bigsql stop; bigsql start"
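The generation-number bookkeeping in the steps above can be sketched in shell. This is a minimal sketch of the increment logic only, under the assumption that the current value has already been read with the oc get bigsql command from the procedure; it treats a missing value (<None>) as zero and builds the JSON patch payload:

```shell
#!/bin/bash
# Sketch of the generation-increment logic from the refresh procedure.
# "$1" of next_generation stands in for the value returned by:
#   oc get bigsql bigsql-<instance-id> \
#     -o custom-columns="hadoopCluster Generation:{.spec.hadoopCluster.generation}"
next_generation() {
  local current="$1"
  # A missing generation is reported as <None>; treat it as zero.
  if [ "$current" = "<None>" ] || [ "$current" = "<none>" ] || [ -z "$current" ]; then
    current=0
  fi
  echo $((current + 1))
}

gen=$(next_generation "<None>")
patch="{\"spec\":{\"hadoopCluster\":{\"generation\":${gen}}}}"
echo "$patch"
# The patch string would then be passed to:
#   oc patch -n <project> bigsql bigsql-<instance-id> --patch "$patch" --type merge
```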