Deploy Apache Hadoop and prepare LSF for
the integration package
Before you begin
The Apache Hadoop integration supports the following platforms:
Procedure
- Download and deploy Apache Hadoop.
The latest version is available from the official Apache Hadoop site: http://hadoop.apache.org/releases
This integration supports Hadoop Versions 1 and 2 and is tested with open source Hadoop Versions 1.2.1 and 2.7.2; it should also work with other versions and other Hadoop distributions.
Note: You do not need to configure Apache Hadoop after installation. The Hadoop connector scripts automatically configure Apache Hadoop for you.
- Set the $HADOOP_HOME environment variable to the file path of the Hadoop installation directory.
If you do not set the $HADOOP_HOME environment variable before using the integration, you must run the lsfhadoop.sh connector script with the --hadoop-dir option to specify the file path of the Hadoop installation directory.
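As a minimal sketch, assuming Hadoop was extracted to /opt/hadoop-2.7.2 (an example path; substitute your actual installation directory):

```shell
# Example path only; point this at your real Hadoop installation directory.
export HADOOP_HOME=/opt/hadoop-2.7.2

# Optionally add the Hadoop binaries to the PATH for convenience.
export PATH=$HADOOP_HOME/bin:$PATH
```

Place these lines in a shell profile that is sourced on every LSF server host so the variable is set consistently across the cluster.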
- Set the $JAVA_HOME environment variable to the file path of the Java runtime installation directory.
If you do not set the $JAVA_HOME environment variable before using the integration, you must run the lsfhadoop.sh connector script with the --java-home option to specify the file path of the Java runtime installation directory.
Note: The --java-home option overrides the value of the $JAVA_HOME environment variable on the command line. For more details, refer to Run a Hadoop application on LSF.
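A minimal sketch, assuming a JRE installed at /usr/lib/jvm/java-8-openjdk (an example path; substitute the Java runtime directory on your hosts):

```shell
# Example path only; point this at your real Java runtime installation directory.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
```

As with $HADOOP_HOME, set this in a profile that every LSF server host sources, or rely on the --java-home option at submission time instead.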
What to do next
- You must ensure that the $HADOOP_HOME and $JAVA_HOME directories are accessible to each LSF server host.
- You can override the $HADOOP_HOME and $JAVA_HOME environment variables at job submission time by using the --hadoop-dir and --java-home options with lsfhadoop.sh. For more details, refer to Run a Hadoop application on LSF.
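A hypothetical submission illustrating the per-job override. The bsub options, example paths, and application jar are assumptions for illustration; only lsfhadoop.sh and its --hadoop-dir and --java-home options come from this integration.

```shell
# Override $HADOOP_HOME and $JAVA_HOME for this job only (paths are examples).
bsub lsfhadoop.sh \
    --hadoop-dir /opt/hadoop-2.7.2 \
    --java-home /usr/lib/jvm/java-8-openjdk \
    hadoop jar my-application.jar
```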