Multiple HDFS Transparency clusters on the same set of physical nodes
If you have limited physical nodes, or too many Hadoop clusters running inside containers, you might have to set up multiple HDFS Transparency clusters on the same set of physical nodes.
For example, configure HDFS Transparency cluster1 on physical nodes 1, 2, and 3, and configure HDFS Transparency cluster2 on the same physical nodes 1, 2, and 3. This is supported starting with HDFS Transparency version 2.7.3-1.
Running multiple HDFS Transparency clusters on the same set of physical nodes requires configuration changes, especially the network port numbers assigned to the different HDFS Transparency clusters. This section explains the steps to configure two HDFS Transparency clusters.
- Configure /usr/lpp/mmfs/hadoop/etc/hadoop to bring up the first HDFS Transparency cluster. The nodes gpfstest1/2/6/7/9/10/11/12 are configured as HDFS Transparency cluster1:
[root@gpfstest2 ~]# mmhadoopctl connector getstate
gpfstest2.cn.ibm.com: namenode running as process 6699.
gpfstest2.cn.ibm.com: datanode running as process 8425.
gpfstest9.cn.ibm.com: datanode running as process 13103.
gpfstest7.cn.ibm.com: datanode running as process 9980.
gpfstest10.cn.ibm.com: datanode running as process 6420.
gpfstest11.cn.ibm.com: datanode running as process 83753.
gpfstest1.cn.ibm.com: datanode running as process 22498.
gpfstest12.cn.ibm.com: datanode running as process 52927.
gpfstest6.cn.ibm.com: datanode running as process 48787.
Note: This setup is configured by Hortonworks HDP through Ambari, and gpfstest2 is configured as the NameNode.
- Select any one node from these nodes and change the configurations:
In this example, the gpfstest1 node is selected.
Steps 3 through 10 are performed on gpfstest1, the node selected in step 2.
- Copy the following configuration files from /usr/lpp/mmfs/hadoop/etc/hadoop to /usr/lpp/mmfs/hadoop/etc/hadoop2.
Note: /usr/lpp/mmfs/hadoop/etc/hadoop is the configuration location for the first HDFS Transparency cluster, and /usr/lpp/mmfs/hadoop/etc/hadoop2 is the configuration location for the second HDFS Transparency cluster.
-rw-r--r-- 1 root root  2187 Oct 28 00:00 core-site.xml
-rw------- 1 root root   393 Oct 28 00:00 gpfs-site.xml
-rw------- 1 root root  6520 Oct 28 00:00 hadoop-env.sh
-rw------- 1 root root  2295 Oct 28 00:00 hadoop-metrics2.properties
-rw------- 1 root root  2490 Oct 28 00:00 hadoop-metrics.properties
-rw------- 1 root root  1308 Oct 28 00:00 hadoop-policy.xml
-rw------- 1 root root  6742 Oct 28 00:00 hdfs-site.xml
-rw------- 1 root root 10449 Oct 28 00:00 log4j.properties
-rw------- 1 root root   172 Oct 28 00:00 slaves
-rw------- 1 root root   884 Oct 28 00:00 ssl-client.xml
-rw------- 1 root root  1000 Oct 28 00:00 ssl-server.xml
-rw-r--r-- 1 root root 17431 Oct 28 00:00 yarn-site.xml
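The copy in this step can be scripted. The sketch below is illustrative, not part of the product: it wraps the copy of the files listed above in a hypothetical helper function whose source and destination directories are arguments, so that on a real node it can be pointed at /usr/lpp/mmfs/hadoop/etc/hadoop and /usr/lpp/mmfs/hadoop/etc/hadoop2.

```shell
# Hypothetical helper: copy the configuration files listed above from the
# first cluster's configuration directory ($1) into the second's ($2).
copy_transparency_conf() {
  src="$1"
  dst="$2"
  mkdir -p "$dst"
  for f in core-site.xml gpfs-site.xml hadoop-env.sh \
           hadoop-metrics2.properties hadoop-metrics.properties \
           hadoop-policy.xml hdfs-site.xml log4j.properties slaves \
           ssl-client.xml ssl-server.xml yarn-site.xml; do
    cp -p "$src/$f" "$dst/$f"
  done
}
```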
- Change the fs.defaultFS value in core-site.xml
In /usr/lpp/mmfs/hadoop/etc/hadoop/core-site.xml:
fs.defaultFS=hdfs://gpfstest2.cn.ibm.com:8020
In /usr/lpp/mmfs/hadoop/etc/hadoop2/core-site.xml:
fs.defaultFS=hdfs://gpfstest2.cn.ibm.com:8021
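Expressed in XML, the form these configuration files actually use, the second cluster's setting would look like the following (hostname and port taken from the example above):

```xml
<!-- /usr/lpp/mmfs/hadoop/etc/hadoop2/core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://gpfstest2.cn.ibm.com:8021</value>
</property>
```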
- Change values in the hdfs-site.xml file.
In /usr/lpp/mmfs/hadoop/etc/hadoop/hdfs-site.xml:
dfs.datanode.address=0.0.0.0:50010
dfs.datanode.http.address=0.0.0.0:50075
dfs.datanode.https.address=0.0.0.0:50475
dfs.datanode.ipc.address=0.0.0.0:8010
dfs.https.port=50470
dfs.journalnode.http-address=0.0.0.0:8480
dfs.journalnode.https-address=0.0.0.0:8481
dfs.namenode.http-address=gpfstest2.cn.ibm.com:50070
dfs.namenode.https-address=gpfstest2.cn.ibm.com:50470
dfs.namenode.rpc-address=gpfstest2.cn.ibm.com:8020
dfs.namenode.secondary.http-address=gpfstest10.cn.ibm.com:50090
In /usr/lpp/mmfs/hadoop/etc/hadoop2/hdfs-site.xml:
dfs.datanode.address=0.0.0.0:50011
dfs.datanode.http.address=0.0.0.0:50076
dfs.datanode.https.address=0.0.0.0:50476
dfs.datanode.ipc.address=0.0.0.0:8011
dfs.https.port=50471
dfs.journalnode.http-address=0.0.0.0:8482
dfs.journalnode.https-address=0.0.0.0:8483
dfs.namenode.http-address=gpfstest2.cn.ibm.com:50071
dfs.namenode.https-address=gpfstest2.cn.ibm.com:50471
dfs.namenode.rpc-address=gpfstest2.cn.ibm.com:8021   <== match the port number in step 4
dfs.namenode.secondary.http-address=gpfstest10.cn.ibm.com:50091
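Because every listener port must differ between the two clusters, a short script can cross-check the two hdfs-site.xml files for collisions. This is an illustrative sketch, not part of the product: the XML snippets below are abbreviated stand-ins for the files above, and on a real node you would read the files from the two configuration directories instead.

```python
# Sketch: detect listener-port collisions between two HDFS Transparency
# configurations by comparing the ports named in their <value> elements.
import re
import xml.etree.ElementTree as ET

def ports(xml_text):
    """Return the set of TCP ports that appear as host:port values."""
    result = set()
    for value in ET.fromstring(xml_text).iter("value"):
        m = re.search(r":(\d+)$", value.text or "")
        if m:
            result.add(int(m.group(1)))
    return result

# Abbreviated stand-ins for the two hdfs-site.xml files shown above.
CLUSTER1 = """<configuration>
  <property><name>dfs.datanode.address</name><value>0.0.0.0:50010</value></property>
  <property><name>dfs.namenode.rpc-address</name><value>gpfstest2.cn.ibm.com:8020</value></property>
</configuration>"""

CLUSTER2 = """<configuration>
  <property><name>dfs.datanode.address</name><value>0.0.0.0:50011</value></property>
  <property><name>dfs.namenode.rpc-address</name><value>gpfstest2.cn.ibm.com:8021</value></property>
</configuration>"""

conflicts = ports(CLUSTER1) & ports(CLUSTER2)
print(conflicts)  # an empty set means the two clusters can listen side by side
```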
Note: The network port numbers for the different HDFS Transparency clusters must be different. Otherwise, HDFS Transparency reports network port conflicts when it starts.
- Change values in the hadoop-env.sh file.
In /usr/lpp/mmfs/hadoop/etc/hadoop/hadoop-env.sh:
HADOOP_PID_DIR=/var/run/hadoop/$USER
HADOOP_LOG_DIR=/var/log/hadoop/$USER
In /usr/lpp/mmfs/hadoop/etc/hadoop2/hadoop-env.sh:
HADOOP_PID_DIR=/var/run/hadoop/hdfstransparency2
HADOOP_LOG_DIR=/var/log/hadoop/hdfstransparency2
Change $USER in HADOOP_JOBTRACKER_OPTS, SHARED_HADOOP_NAMENODE_OPTS, and HADOOP_DATANODE_OPTS to the value "hdfstransparency2".
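One way to make those substitutions is with sed. The helper below is a hypothetical sketch, not a documented tool: it replaces every literal $USER in a hadoop-env.sh copy with a fixed string, so on a real node it could be pointed at /usr/lpp/mmfs/hadoop/etc/hadoop2/hadoop-env.sh (back up the file first).

```shell
# Hypothetical helper: replace every literal $USER in the given file ($1)
# with the tag string ($2), e.g. hdfstransparency2 for the second cluster.
retag_hadoop_env() {
  file="$1"
  tag="$2"
  sed -i "s/\\\$USER/$tag/g" "$file"
}
```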
Note: HDFS Transparency can be started only as the root user, so the first HDFS Transparency cluster resolves $USER to root. For the second HDFS Transparency cluster, change $USER to a different string value, such as hdfstransparency2, so that the second cluster writes its PID files and logs to separate directories.
- Change hadoop-metrics2.properties:
In /usr/lpp/mmfs/hadoop/etc/hadoop/hadoop-metrics2.properties:
namenode.sink.timeline.metric.rpc.client.port=8020
In /usr/lpp/mmfs/hadoop/etc/hadoop2/hadoop-metrics2.properties:
namenode.sink.timeline.metric.rpc.client.port=8021
<== match the NameNode port number in step 4
- Update /usr/lpp/mmfs/hadoop/etc/hadoop2/gpfs-site.xml, especially the gpfs.data.dir field.
- Configure different gpfs.data.dir values for the different HDFS Transparency clusters.
- Configure different gpfs.mnt.dir values if you have multiple file systems.
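For example, the second cluster's gpfs-site.xml might carry its own data directory. The value cluster2-data below is purely illustrative; only the requirement that it differ from the first cluster's value comes from the text above.

```xml
<!-- /usr/lpp/mmfs/hadoop/etc/hadoop2/gpfs-site.xml (illustrative value) -->
<property>
  <name>gpfs.data.dir</name>
  <!-- must differ from the gpfs.data.dir used by the first cluster -->
  <value>cluster2-data</value>
</property>
```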
- Sync the second HDFS Transparency cluster configuration from node gpfstest1 (selected in step 2):
export HADOOP_GPFS_CONF_DIR=/usr/lpp/mmfs/hadoop/etc/hadoop2
mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop2
- Start the second HDFS Transparency cluster:
export HADOOP_GPFS_CONF_DIR=/usr/lpp/mmfs/hadoop/etc/hadoop2
export HADOOP_CONF_DIR=/usr/lpp/mmfs/hadoop/etc/hadoop2/
mmhadoopctl connector start

[root@gpfstest1 hadoop2]# mmhadoopctl connector getstate
gpfstest2.cn.ibm.com: namenode running as process 18234.
gpfstest10.cn.ibm.com: datanode running as process 29104.
gpfstest11.cn.ibm.com: datanode running as process 72171.
gpfstest9.cn.ibm.com: datanode running as process 94872.
gpfstest7.cn.ibm.com: datanode running as process 28627.
gpfstest2.cn.ibm.com: datanode running as process 25777.
gpfstest6.cn.ibm.com: datanode running as process 30121.
gpfstest12.cn.ibm.com: datanode running as process 36116.
gpfstest1.cn.ibm.com: datanode running as process 21559.
- Check the second HDFS Transparency cluster. On any node, run the following commands:
hdfs --config /usr/lpp/mmfs/hadoop/etc/hadoop2 dfs -put /etc/passwd /
hdfs --config /usr/lpp/mmfs/hadoop/etc/hadoop2 dfs -ls /
- Configure hdfs://gpfstest2.cn.ibm.com:8020 and hdfs://gpfstest2.cn.ibm.com:8021 for the different Hadoop clusters running inside containers.