Once you have set up your CDH and IBM® Spectrum
Conductor environment, you are ready to
configure an existing instance group to
work with the CDH
cluster.
Procedure
- Copy the HDFS (Hadoop Distributed File System) and Apache Hive configuration files to your Spark home directory:
- On your CDH cluster server host, go to the
/var/run/cloudera-scm-agent/process/ directory, and locate the current version
of these files:
- hdfs-site.xml
- core-site.xml
- hive-site.xml
- Copy the files to your Spark home directory's /conf directory on
each of your IBM Spectrum
Conductor hosts.
For example, if the Spark home directory of your instance group is
/opt/SIG243cdh/spark-2.4.3-hadoop-2.7, copy the files to /opt/SIG243cdh/spark-2.4.3-hadoop-2.7/conf/.
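The copy in this step can be sketched as a small shell loop. This is a sketch only: temporary directories stand in for the real CDH process directory and Spark conf directory, and on a real cluster you would copy the files (for example, with scp) to each IBM Spectrum Conductor host.

```shell
#!/bin/sh
# Sketch: copy the three CDH configuration files into the Spark conf
# directory. Temp directories stand in for the real locations:
#   SRC_DIR  -> /var/run/cloudera-scm-agent/process/<current process dir>
#   CONF_DIR -> <Spark home>/conf on each IBM Spectrum Conductor host
SRC_DIR=$(mktemp -d)
CONF_DIR=$(mktemp -d)

for f in hdfs-site.xml core-site.xml hive-site.xml; do
  printf '<configuration/>\n' > "$SRC_DIR/$f"  # placeholder content for the sketch
  cp "$SRC_DIR/$f" "$CONF_DIR/$f"
done
ls "$CONF_DIR"
```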
- Add a Kerberos TGT (Ticket Granting Ticket) secured HDFS data connector to your instance group:
- From the cluster management console,
select the instance group to update, and then click
Configure.
- Click the option to add a data connector.
- Provide the new data connector information, ensuring that you select
Kerberos TGT secured HDFS from the Type list, and then click
Save.
- Add a new environment variable and parameter to the instance group:
- Click Basic Settings, select the Spark version that the
instance group must use, then click
Configuration to open the configuration dialog.
- Click Add an Environment
Variable (under the Driver Environment section) and add the path to your Kerberos
credential cache (krb5cc) file as the value of
spark.ego.driverEnv.KRB5CCNAME.
- Click Add a Parameter (under the Additional Parameters section)
to set the spark.sql.catalogImplementation value to
hive, and then click Save.
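Taken together, the two settings in this step amount to the following configuration values. The credential cache path below is an example only; use the krb5cc path from your own hosts.

```
spark.ego.driverEnv.KRB5CCNAME=/tmp/krb5cc_0
spark.sql.catalogImplementation=hive
```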
- Use open source AdoptOpenJDK JRE instead of the
default IBM Java JRE:
To avoid potential exceptions in your Hive code, use the
AdoptOpenJDK JRE:
Cloudera Manager is compatible with OpenJDK, while
IBM Java can cause null pointer
exceptions, such as this
error:
Message: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
StackTrace: at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:214)
…
…
Caused by: java.lang.NullPointerException
at org.apache.hadoop.util.StringUtils.stringifyException(StringUtils.java:91)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:466)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
To switch to
AdoptOpenJDK JRE, follow these steps:
- Search for the Java home environment variable by typing JAVA_HOME in the
search field.
- Set the JAVA_HOME value (under the Environment Variable section) to the
path for AdoptOpenJDK JRE and
click Save. In this example, it is set to
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre/.
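Before redeploying, you can confirm which JRE a host will actually use. A quick check follows; the default JAVA_HOME path is the example value from this step and may differ on your hosts. OpenJDK builds report "openjdk version" in their banner, while IBM Java reports an IBM build string.

```shell
#!/bin/sh
# Sketch: report the JRE under JAVA_HOME. The default path is the
# example value from this step and may not exist on your machine.
JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.x86_64/jre}
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version  # check the banner for "openjdk version"
  JRE_STATUS=found
else
  echo "No java executable under $JAVA_HOME"
  JRE_STATUS=missing
fi
```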

- Click Modify Instance Group, which redeploys the instance group with your
configurations.
What to do next
Once you have configured your instance group to work with the CDH cluster, verify this
integration.