MapReduce/YARN clients configuration

The MapReduce and YARN clients must be configured to run MapReduce workloads on YARN so that jobs can read data from, and write data to, the IBM Storage Scale cluster.

MapReduce/YARN client configuration files are located in the same directory as the HDFS client.

For mapred-site.xml:

Add the following properties and set each value according to your host environment:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/hadoop-3.1.3</value>
    <description>Change this to your hadoop location.</description>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/hadoop-3.1.3</value>
    <description>Change this to your hadoop location.</description>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/hadoop-3.1.3</value>
    <description>Change this to your hadoop location.</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
    <description>Change this according to your cluster configuration.</description>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>
    <description>Change this according to your cluster configuration.</description>
  </property>
</configuration>
Note: If the MapReduce job fails with return code 1, see Mapreduce container job exit with return code 1.
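The HADOOP_MAPRED_HOME value configured above must point to a real Hadoop installation on every node, or containers exit immediately. A minimal pre-flight sketch (the check_mapred_home helper is hypothetical, not part of Hadoop):

```shell
# Hypothetical helper: verify that a candidate HADOOP_MAPRED_HOME actually
# contains the MapReduce libraries before putting it in mapred-site.xml.
check_mapred_home() {
  [ -d "$1/share/hadoop/mapreduce" ]
}

if check_mapred_home /usr/hadoop-3.1.3; then
  echo "HADOOP_MAPRED_HOME looks valid"
else
  echo "adjust HADOOP_MAPRED_HOME in mapred-site.xml" >&2
fi
```

Run the check on each node where containers will be launched.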

For yarn-site.xml:

Add the following properties and set each value according to your host environment.

For example, c16f1n11.gpfs.net is the Resource Manager.
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>c16f1n11.gpfs.net</value>
    <description>Configure resourcemanager hostname.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>24576</value>
  </property>
</configuration>
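Each container request in mapred-site.xml (mapreduce.map.memory.mb and mapreduce.reduce.memory.mb) must fit within yarn.nodemanager.resource.memory-mb, or YARN cannot schedule that task on the node. A quick sanity check using the example values from this section:

```shell
# Values taken from the example mapred-site.xml and yarn-site.xml.
map_mb=4096
reduce_mb=8192
nm_mb=24576

# A Node Manager must be able to host at least the largest single container.
if [ "$nm_mb" -ge "$map_mb" ] && [ "$nm_mb" -ge "$reduce_mb" ]; then
  echo "memory settings consistent"
else
  echo "yarn.nodemanager.resource.memory-mb is too small" >&2
fi
```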

In the workers file, add the hostnames of the nodes that will act as Node Managers.

For this example, c16f1n10.gpfs.net and c16f1n12.gpfs.net are the Node Managers.
cat workers
c16f1n10.gpfs.net
c16f1n12.gpfs.net
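All Node Managers need the same configuration files. One way to push them out is with scp over the workers file (the install path and root login are assumptions taken from this example; adjust for your environment):

```shell
# Assumed install location from this example; adjust as needed.
HADOOP_CONF_DIR=/usr/hadoop-3.1.3/etc/hadoop

# Copy the client configuration to every host listed in the workers file.
for host in $(cat "$HADOOP_CONF_DIR/workers"); do
  scp "$HADOOP_CONF_DIR/mapred-site.xml" \
      "$HADOOP_CONF_DIR/yarn-site.xml" \
      "$HADOOP_CONF_DIR/workers" \
      "root@$host:$HADOOP_CONF_DIR/"
done
```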
Start the Resource Manager and Node Managers, then launch a MapReduce workload:
cd /usr/hadoop-3.1.3/sbin/
export YARN_NODEMANAGER_USER=root
export YARN_RESOURCEMANAGER_USER=root
./start-yarn.sh
/usr/hadoop-3.1.3/bin/yarn jar /usr/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar teragen 1000 /gen
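After the job completes, you can confirm that both Node Managers registered with the Resource Manager and that teragen wrote its output (these verification commands assume the same /usr/hadoop-3.1.3 install path as the example above):

```shell
# List the Node Managers registered with the Resource Manager;
# both c16f1n10.gpfs.net and c16f1n12.gpfs.net should appear as RUNNING.
/usr/hadoop-3.1.3/bin/yarn node -list

# Check that the teragen output directory was created.
/usr/hadoop-3.1.3/bin/hdfs dfs -ls /gen
```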