Add HDFS client to CES HDFS nodes

HDFS Transparency does not require the Hadoop distribution to be installed on the IBM Storage Scale HDFS Transparency nodes. However, if the HDFS client is not installed on the CES HDFS NameNodes and DataNodes, functions such as distcp do not work on those nodes because HDFS Transparency does not include the bin/hadoop command.

To run the hadoop command on the HDFS Transparency nodes, install and configure the HDFS client on those nodes.

The setup and configuration are similar to the HDFS clients configuration. However, all client configuration stays in the Hadoop distribution path, and the HDFS Transparency configuration under the /var/mmfs/hadoop/etc/hadoop path is not changed.
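
To make the separation concrete, the following sketch guards against editing a file in the HDFS Transparency configuration directory by mistake. The helper name is_transparency_conf and the sample location /opt/hadoop-3.1.3 are hypothetical and used only for illustration; only /var/mmfs/hadoop/etc/hadoop comes from the text above.

```shell
#!/bin/sh
# Sketch only: is_transparency_conf is a hypothetical helper, not part of any product.
TRANSPARENCY_CONF_DIR="/var/mmfs/hadoop/etc/hadoop"   # path named in this section

# Succeeds (exit 0) when the given path is the HDFS Transparency
# configuration directory or lies inside it. This procedure must
# leave that directory unchanged.
is_transparency_conf() {
    case "$1" in
        "$TRANSPARENCY_CONF_DIR"|"$TRANSPARENCY_CONF_DIR"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: decide whether a file may be edited for the client setup.
# /opt/hadoop-3.1.3 is an assumed extraction location.
for f in /opt/hadoop-3.1.3/etc/hadoop/core-site.xml \
         /var/mmfs/hadoop/etc/hadoop/core-site.xml; do
    if is_transparency_conf "$f"; then
        echo "skip $f"
    else
        echo "edit $f"
    fi
done
```

The check is a plain string match on the path, so it does not resolve symbolic links or relative paths.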

Steps to install and configure the HDFS client on the HDFS Transparency nodes:
  1. Download the Apache Hadoop distribution and extract the package onto each node.
  2. On one of the CES HDFS nodes, modify the configuration files in the HADOOP_HOME/etc/hadoop path of the downloaded Hadoop distribution, based on the settings described in the HDFS clients configuration.
  3. Manually sync (scp) the HADOOP_HOME/etc/hadoop configuration files to all the other CES HDFS nodes.
  4. Execute the hadoop command from the HADOOP_HOME/bin path.

    For example:

    <HADOOP_HOME>/hadoop-3.1.3/bin/hadoop fs -ls /

    or

    <HADOOP_HOME>/hadoop-3.1.3/bin/hadoop distcp hdfs://nn1:8020/fileA hdfs://nn2:8020/fileB
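
The steps above can be sketched as a small script run from the CES HDFS node where the configuration was edited. This is an illustration under stated assumptions, not a supported tool: the extraction directory /opt, the node names ces-hdfs-2 and ces-hdfs-3, and the Apache archive download URL are assumptions, and version 3.1.3 is taken from the examples above. By default the script only prints the commands (DRY_RUN=1). Step 1 still has to be repeated on every node, and step 2 (editing the configuration files) stays manual.

```shell
#!/bin/sh
# Sketch only: INSTALL_DIR, NODES, and the download URL are assumed values.
HADOOP_VERSION="3.1.3"                 # version used in the examples above
INSTALL_DIR="/opt"                     # hypothetical extraction directory
HADOOP_HOME="${INSTALL_DIR}/hadoop-${HADOOP_VERSION}"
NODES="ces-hdfs-2 ces-hdfs-3"          # hypothetical names of the other CES HDFS nodes
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands only, 0 = execute them

# Print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Step 1: download and extract the distribution on this node.
fetch_hadoop() {
    tarball="hadoop-${HADOOP_VERSION}.tar.gz"
    run curl -fLO "https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${tarball}"
    run tar -xzf "${tarball}" -C "${INSTALL_DIR}"
}

# Step 3: copy the edited client configuration directory to every other node.
sync_conf() {
    for node in $NODES; do
        run scp -r "${HADOOP_HOME}/etc/hadoop" "${node}:${HADOOP_HOME}/etc/"
    done
}

# Step 4: run a simple listing through the client on this node.
check_client() {
    run "${HADOOP_HOME}/bin/hadoop" fs -ls /
}

fetch_hadoop
sync_conf
check_client
```

Review the printed commands first, then rerun with DRY_RUN=0 once the paths and node names match your environment.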