Deploy Apache Hadoop and prepare LSF for
the integration package
Before you begin
The Apache Hadoop integration supports the following platforms:
Procedure
- Download and deploy Apache Hadoop.
The latest version is available from the official Apache Hadoop site: http://hadoop.apache.org/releases
This integration supports Hadoop Versions 1 and 2 and is tested with open source Hadoop Versions 1.2.1 and 2.7.2; it should also work with other versions and other Hadoop distributions.
Note: You do not need to configure Apache Hadoop after installation. The Hadoop connector scripts automatically configure Apache Hadoop for you.
- Set the $HADOOP_HOME environment variable to the file path of the Hadoop installation directory.
If you do not set the $HADOOP_HOME environment variable before using the integration, you must run the lsfhadoop.sh connector script with the --hadoop-dir option to specify the file path of the Hadoop installation directory.
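As a minimal sketch, assuming Hadoop was extracted to /opt/hadoop-2.7.2 (an example path; substitute your actual installation directory):

```shell
# Example path only; point this at your real Hadoop installation directory.
export HADOOP_HOME=/opt/hadoop-2.7.2

# Optionally add the Hadoop binaries to the PATH for convenience.
export PATH=$HADOOP_HOME/bin:$PATH
```

Place these lines in a shell profile that is sourced on every LSF server host so the variable is set consistently across the cluster.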
- Set the $JAVA_HOME environment variable to the file path of the Java runtime installation directory.
If you do not set the $JAVA_HOME environment variable before using the integration, you must run the lsfhadoop.sh connector script with the --java-home option to specify the file path of the Java runtime installation directory.
Note: The --java-home option overrides the value of the $JAVA_HOME environment variable on the command line. For more details, refer to Run a Hadoop application on LSF.
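A minimal sketch, assuming a JRE installed at /usr/lib/jvm/java-8-openjdk (an example path; substitute the Java runtime directory on your hosts):

```shell
# Example path only; point this at your real Java runtime installation directory.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
```

As with $HADOOP_HOME, set this in a profile that every LSF server host sources, or rely on the --java-home option at submission time instead.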
What to do next
- You must ensure that the $HADOOP_HOME and $JAVA_HOME directories are accessible to each LSF server host.
- You can override the $HADOOP_HOME and $JAVA_HOME environment variables at job submission time by using the --hadoop-dir and --java-home options with lsfhadoop.sh. For more details, refer to Run a Hadoop application on LSF.
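A hypothetical submission illustrating the per-job override. The bsub options, example paths, and application jar are assumptions for illustration; only lsfhadoop.sh and its --hadoop-dir and --java-home options come from this integration.

```shell
# Override $HADOOP_HOME and $JAVA_HOME for this job only (paths are examples).
bsub lsfhadoop.sh \
    --hadoop-dir /opt/hadoop-2.7.2 \
    --java-home /usr/lib/jvm/java-8-openjdk \
    hadoop jar my-application.jar
```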