Submitting Spark batch applications with Kerberos authentication

Use Kerberos authentication to submit Spark workload by using the spark-submit command in client and cluster mode.

Before you begin

  • Kerberos authentication is not supported with Spark versions 1.5.2, 2.0.1, and 2.1.0.
  • The KRB5CCNAME environment variable must be set for your Java environment. When your instance group uses IBM JRE and the user is logged in to Kerberos at the OS level, KRB5CCNAME is set automatically after logon to the credential cache file. If you use another Java implementation, you must set KRB5CCNAME to the absolute path of the credential cache file, as shown in the first sketch after this list. See Configuring Kerberos credential caching.
  • The Kerberos configuration file (krb5.conf) must be in the same directory on every host in your cluster. If the file is not in the default location (/etc/krb5.conf), use the JVM option java.security.krb5.conf to specify the location of the file, as follows:
    1. Modify the instance group to which you submit Spark batch applications and set the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions parameters to the location of the krb5.conf file. For example, if krb5.conf is under /var, set both spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to -Djava.security.krb5.conf=/var/krb5.conf.
    2. Before you submit Spark workload with Kerberos authentication, set the SPARK_SUBMIT_OPTS environment variable to the location of the krb5.conf file (for example, SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/var/krb5.conf"). Both steps are shown in the second sketch after this list.
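
A minimal sketch of pointing KRB5CCNAME at a credential cache file before you submit workload; the cache file path here is illustrative, so substitute the path of your own cache file:

    # Point Java at the Kerberos credential cache file (example path only)
    export KRB5CCNAME=/tmp/krb5cc_1000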
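
The following sketch shows both steps for a krb5.conf file that is not in the default location, assuming /var/krb5.conf as in the examples above:

    # Step 1: Spark configuration parameters for the instance group
    # (set when you modify the instance group in the cluster management console)
    spark.driver.extraJavaOptions=-Djava.security.krb5.conf=/var/krb5.conf
    spark.executor.extraJavaOptions=-Djava.security.krb5.conf=/var/krb5.conf

    # Step 2: environment of the shell that runs spark-submit
    export SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/var/krb5.conf"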

About this task

When Kerberos authentication is enabled for Spark workload, submit Spark batch applications to an instance group and specify Kerberos information as options that are passed with the --conf flag. For troubleshooting purposes, set the HADOOP_JAAS_DEBUG environment variable to enable extra debug traces (export HADOOP_JAAS_DEBUG=true).
Important: To initialize a ticket cache for Java™ programs and the command-line interface, you must use the kinit tool from IBM® JDK ($IBM_JAVA_HOME/jre/bin/kinit). IBM JDK does not support a ticket cache that is generated by the MIT kinit command. If you use IBM JDK, you must generate the ticket cache with IBM JDK's kinit. If you use the open source AdoptOpenJDK JRE and MIT Kerberos, follow the MIT Kerberos documentation to generate a ticket cache. In both cases, you must also set the KRB5CCNAME environment variable to point to the ticket cache file.
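
For example, a sketch of initializing a ticket cache with IBM JDK's kinit; the principal name and cache file path are illustrative, and the -c option (which names the cache file to create) is an assumption about your kinit build, so check kinit -help on your system:

    # Generate the ticket cache with IBM JDK's kinit, not the MIT kinit
    $IBM_JAVA_HOME/jre/bin/kinit -c /tmp/krb5cc_userKDC userKDC@EXAMPLE.COM
    # Point KRB5CCNAME at the generated ticket cache file
    export KRB5CCNAME=/tmp/krb5cc_userKDC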

Procedure

You can submit Spark batch applications from the cluster management console (on the Workload > My Notebooks & Applications page or the Workload > Instance Groups page), by using ascd Spark RESTful APIs, or by using the spark-submit command in the Spark deployment directory.

Note: ascd Spark RESTful APIs support only authentication with the user password.

To submit a Spark batch application as a Kerberos user, add the spark.ego.uname parameter to specify the user principal in the KDC. You can specify authentication through the user password or the keytab for the user principal. In both cases, the user's TGT in the Kerberos credential cache file is not used.
  • For authentication with the user password, add the spark.ego.uname parameter to specify the user principal and the spark.ego.passwd parameter to specify the password for the user principal. For example, to submit SparkPi with a Kerberos user's principal and password, enter:
    spark-submit --conf spark.ego.uname=userKDC --conf spark.ego.passwd=userKDCpassword \
    --class org.apache.spark.examples.SparkPi $SPARK_HOME/spark-2.1.0-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.0.jar
  • For user authentication with the keytab, add the spark.ego.uname parameter to specify the user principal and the spark.ego.keytab parameter to specify the location of the user's keytab file. For example, to submit SparkPi with a Kerberos user's principal and keytab, enter:
    spark-submit --conf spark.ego.uname=userKDC --conf spark.ego.keytab=/tmp/userKDC.keytab \
    --class org.apache.spark.examples.SparkPi $SPARK_HOME/spark-2.1.0-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.0.jar
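
Putting the pieces together, the following is a sketch of a cluster-mode submission that combines keytab authentication with a non-default krb5.conf location. The paths and principal name are illustrative; --deploy-mode is the standard spark-submit flag, and the master is determined by your instance group configuration:

    export SPARK_SUBMIT_OPTS="-Djava.security.krb5.conf=/var/krb5.conf"
    spark-submit --deploy-mode cluster \
    --conf spark.ego.uname=userKDC --conf spark.ego.keytab=/tmp/userKDC.keytab \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/spark-2.1.0-hadoop-2.7/examples/jars/spark-examples_2.11-2.1.0.jar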