Submitting Spark batch applications with Kerberos authentication
Use Kerberos authentication to submit Spark workload from the spark-submit command in client and cluster mode.
Before you begin
- Supported Spark versions: 1.5.2, 2.0.1, and 2.1.0.
- The KRB5CCNAME environment variable must be set for your Java environment. When your instance group uses the IBM JRE and the user is logged in to Kerberos at the OS level, KRB5CCNAME is set automatically at logon to point to the credential cache file. If you are using another Java implementation, you must set KRB5CCNAME to the absolute path of the credential cache file. See Configuring Kerberos credential caching.
- The Kerberos configuration file (krb5.conf) must be in the same location on every host in your cluster. If the file is not in the default location (/etc/krb5.conf), use the java.security.krb5.conf JVM option to specify its location, as follows:
- Modify the instance group to which you submit Spark batch applications and set the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions parameters to the location of the krb5.conf file. For example, if krb5.conf is in the /var directory, set both spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to -Djava.security.krb5.conf=/var/krb5.conf.
- Before you submit Spark workload with Kerberos authentication, set the SPARK_SUBMIT_OPTS environment variable to the location of the krb5.conf file (for example, SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/var/krb5.conf").
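Taken together, the prerequisites above amount to two environment settings before you run spark-submit. The following is a minimal sketch; the credential cache path /tmp/krb5cc_1000 and the /var/krb5.conf location are placeholder values, not defaults:

```shell
# Point Java at the Kerberos credential cache file
# (required when not using the IBM JRE's automatic logon behavior).
# The cache path below is a placeholder -- use your actual cache file.
export KRB5CCNAME=/tmp/krb5cc_1000

# Tell spark-submit where krb5.conf is, if it is not in /etc/krb5.conf.
# The /var location below is a placeholder.
export SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/var/krb5.conf"
```

With these variables exported in the same shell session, subsequent spark-submit invocations pick up both the credential cache and the Kerberos configuration file.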
About this task
When Kerberos authentication is enabled for Spark workload, submit Spark batch applications to
an instance group and specify Kerberos
information as options that are passed with the --conf flag. For troubleshooting
purposes, set the HADOOP_JAAS_DEBUG environment variable to enable extra debug
traces (export HADOOP_JAAS_DEBUG=true).
Important: To initialize
a ticket cache for Java™ programs and the command-line interface, you must use the
kinit tool from IBM® JDK ($IBM_JAVA_HOME/jre/bin/kinit). IBM
JDK does not support the ticket cache that is generated by the MIT kinit command.
If you use the IBM JDK, you must generate a ticket cache with the IBM JDK's kinit. If you
use the open source AdoptOpenJDK JRE with MIT Kerberos, follow the MIT Kerberos documentation to generate a ticket cache. In
both cases, you must also set the KRB5CCNAME environment variable to point to
the ticket cache.
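For the IBM JDK case, the steps above can be sketched as follows. The principal userKDC@EXAMPLE.COM and the cache path are placeholders, and the kinit call is guarded so the sketch only runs where IBM_JAVA_HOME is actually set:

```shell
# Placeholder ticket cache location -- substitute your own path.
CACHE=/tmp/krb5cc_spark

# IBM JDK does not accept a cache created by the MIT kinit command,
# so generate the cache with the kinit that ships with the IBM JDK.
# The principal below is a placeholder.
if [ -n "$IBM_JAVA_HOME" ]; then
    "$IBM_JAVA_HOME/jre/bin/kinit" -c "$CACHE" userKDC@EXAMPLE.COM
fi

# Point Java at the resulting ticket cache.
export KRB5CCNAME="$CACHE"
```

The kinit command prompts for the principal's password and writes the resulting TGT to the cache file named by -c.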
Procedure
You can submit Spark batch applications from the cluster management console, by using ascd Spark RESTful APIs, or by using the spark-submit command in the Spark deployment directory.
Note: ascd Spark RESTful
APIs support only authentication with the user password.
To submit a Spark batch application as a Kerberos user, add the
spark.ego.uname parameter to specify the user principal in the KDC. You can
specify authentication through the user password or the keytab for the user principal. In both
cases, the user's TGT in the Kerberos credential cache file is not used.
- For authentication with the user password, add the spark.ego.uname
parameter to specify the user principal and the spark.ego.passwd parameter to
specify the password for the user principal. For example, to submit SparkPi with a Kerberos user's
principal and password,
enter:
spark-submit --conf spark.ego.uname=userKDC \
  --conf spark.ego.passwd=userKDCpassword \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/spark-2.1.0-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.0.jar
- For user authentication with the keytab, add the spark.ego.uname
parameter to specify the user principal and the spark.ego.keytab parameter to
specify the location of the user's keytab file. For example, to submit SparkPi with a Kerberos
user's principal and keytab,
enter:
spark-submit --conf spark.ego.uname=userKDC \
  --conf spark.ego.keytab=/tmp/userKDC.keytab \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/spark-2.1.0-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.0.jar
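Before submitting with a keytab, it can help to confirm that the keytab file actually contains the expected principal. A sketch using the standard MIT klist utility (the keytab path matches the example above; klist may not be installed on every host, so the call is guarded):

```shell
# Keytab path from the example above (a placeholder location).
KEYTAB=/tmp/userKDC.keytab

# klist -k -t lists the principals and timestamps stored in a keytab.
# Only attempt it if the klist tool and the keytab file are present.
if command -v klist >/dev/null 2>&1 && [ -f "$KEYTAB" ]; then
    klist -k -t "$KEYTAB"
fi
```

The principal shown by klist should match the value you pass to spark.ego.uname.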