Setting up a user ID for use with IBM Z Platform for Apache Spark

Complete this task to set up a user ID for use with IBM® Z Platform for Apache Spark.

About this task

For this task, you can either create a new user ID to use for IBM Z Platform for Apache Spark, or you can use an existing user ID.

Note: The user ID of the Spark worker daemon requires READ access to the BPX.JOBNAME profile in the FACILITY class to change the job names of the executors and drivers.

Procedure

1. Choose or create an appropriate user ID for use with Spark.
Specifically, this is the user ID under which the Spark cluster is started, known as the Spark ID in this documentation. The Spark ID should have a non-zero UID (not a superuser) and should not have access to any data beyond what it needs for running a Spark cluster. Ensure that the default shell program in the OMVS segment for the Spark ID is set to the bash shell. Also, ensure that the user ID has, at a minimum, the authority to create directories and extract archived or compressed files.

Tip: If you need to change or create a user ID, work with your security administrator to do so.
Using an existing user ID
If you intend to use an existing user ID, you might need to first update the OMVS segment to set bash as the default shell program for the user ID. Complete the following steps to determine whether the PROGRAM attribute of the OMVS segment is valid for the target user ID.
1. Use SSH to log on using the user ID.
2. Run echo $SHELL and review the output.
If bash is not reported as the default shell for the user ID, a possible reason is that /etc/profile explicitly invokes a shell other than bash. If so, work with your system administrator to update /etc/profile so that the shell defined in the OMVS segment remains in effect.
The following code provides an example of how /etc/profile might override the bash shell set in the OMVS segment with another shell:
```
if [ -z "$STEPLIB" ] && tty -s;
then
    export STEPLIB=none
    exec -a $0 $SHELL  -
fi
```
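The manual check described above (log on, then inspect $SHELL) can also be scripted. The following is a minimal sketch rather than part of the product: the function name is_bash_shell is hypothetical, and the sketch assumes only that the login shell path is available in $SHELL.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given shell path names bash.
is_bash_shell() {
    # ${1##*/} strips everything up to the last slash, leaving the program name.
    case "${1##*/}" in
        bash) return 0 ;;
        *)    return 1 ;;
    esac
}

# Report on the login shell of the current user ID.
if is_bash_shell "${SHELL:-}"; then
    echo "Default shell is bash: $SHELL"
else
    echo "Default shell is not bash: ${SHELL:-unset}"
fi
```

The sketch compares only the program name, so it accepts any bash installation path. It does not confirm the bash version.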
Creating a new user ID
If you intend to create a new user ID for Spark, establish the OMVS segment during creation.
The following JCL example shows how to create a new user ID and group for the Spark ID, SPARKID, which will be used to run Spark:
//SPARK JOB (0),'SPARK RACF',CLASS=A,REGION=0M,
// MSGCLASS=H,NOTIFY=&SYSUID
//*------------------------------------------------------------*/
//RACF EXEC PGM=IKJEFT01,REGION=0M
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
ADDGROUP SPKGRP OMVS(AUTOGID) OWNER(SYS1)
ADDUSER SPARKID DFLTGRP(SPKGRP) OMVS(AUTOUID HOME(/u/sparkid) -
  PROGRAM(/user/bin/bash-4.3/bin/bash)) -
  NAME('Spark ID') NOPASSWORD NOOIDCARD
ALTUSER SPARKID PASSWORD(SPARKID) NOEXPIRED
/*
Notes:

Use of AUTOGID and AUTOUID in the example is based on a local preference. Your coding might differ.

Set the PROGRAM attribute to define the path to your own installation of bash 4.3.48 that you noted previously.
The chosen user ID is now properly set up to run Spark. Use this user ID for all remaining customization steps that require a user ID.

2. Configure the z/OS® UNIX shell environment for both your Spark ID and all users of Spark.

Spark requires certain environment variables to be set. Consider the scope under which you want this environment to take effect. For example:

Do you want to configure Spark for all users or a subset of users?
Do you have other Java™ applications that require a different level of Java or require different (conflicting) Java settings?

At a high level, this environment can be set for all users of both shells, for an individual user's shell environment, or, for some settings, only when users issue Spark commands. At a minimum, you must set up the environment for the Spark ID and for each user of Spark.

Use the information in Table 1 to decide where to set each environment variable. This information applies to users whose login shell is either bash or /bin/sh.

Table 1. Scope of environment variables
Environment variables set in this file…	Have this scope…
/etc/profile	All users, all the time
$HOME/.profile for specific users	Specific users, all the time
spark-env.sh	Specific users, only for Spark commands

Note: The spark-env.sh file is discussed in more detail in Updating the Apache Spark configuration files.

Values that you set for environment variables in the $HOME/.profile file override the values for those variables in the /etc/profile system file. Values that you set in spark-env.sh override any values previously set in either /etc/profile or $HOME/.profile.
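The override order follows from the order in which the files are read: a later assignment replaces an earlier one. The following minimal sketch illustrates this with temporary stand-in files. The file names, the variable SPARK_DEMO_VAR, and its values are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
# Create stand-ins for /etc/profile, $HOME/.profile, and spark-env.sh
# (hypothetical names and values, used only for this demonstration).
demo_dir=${TMPDIR:-/tmp}/spark_env_demo.$$
mkdir -p "$demo_dir"
echo 'SPARK_DEMO_VAR=from_etc_profile'  > "$demo_dir/etc_profile"
echo 'SPARK_DEMO_VAR=from_home_profile' > "$demo_dir/home_profile"
echo 'SPARK_DEMO_VAR=from_spark_env'    > "$demo_dir/spark_env"

# Source the files in the order in which they take effect;
# each later assignment replaces the earlier one.
for f in etc_profile home_profile spark_env; do
    . "$demo_dir/$f"
    echo "after $f: SPARK_DEMO_VAR=$SPARK_DEMO_VAR"
done

# Remove the temporary stand-in files.
rm -rf "$demo_dir"
```

After the loop completes, SPARK_DEMO_VAR holds the value from the last file sourced, which corresponds to spark-env.sh.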

Tip: If the Spark ID does not already have a $HOME/.profile file, create one now.

a. Determine which of the files listed in Table 1 you want to update.
(Creation and customization of the spark-env.sh file is discussed later.)
b. For each file listed in Table 1 that you decided to update, edit the file to set the environment variables, as follows:
- Set JAVA_HOME to point to the location of an instance of IBM z/OS Java, either IBM 64-Bit SDK for z/OS Java Technology Edition V8 or IBM Semeru Runtime Certified Edition for z/OS, Version 11.0. For example, either of the following:
  - /usr/lpp/java/java800/J8.0_64
  - /usr/lpp/java/java110/J11.0_64
- Set PATH to include the location of an instance of IBM z/OS Java, either IBM 64-Bit SDK for z/OS Java Technology Edition V8 or IBM Semeru Runtime Certified Edition for z/OS, Version 11.0.
  Tip: You can set this value by using $JAVA_HOME.
- Set PATH to prioritize the path to the /bin directory of bash 4.3.48 higher than any earlier version of bash that exists on your system.
- Set IBM_JAVA_OPTIONS to specify UTF-8 file encoding.
  If running Java 8, ensure that -Dcom.ibm.jsse2.overrideDefaultCSName=true also appears.
  Note: The -Dcom.ibm.jsse2 ... statement can be specified for Java 11, but is ignored.
- Set _BPXK_AUTOCVT to ON to enable the automatic conversion of tagged files.
- Include an export statement to make all the variables available to the z/OS UNIX shell environment.
The following example illustrates how to code the .profile file for these environment variable settings:
```
# Spark ID .profile
JAVA_HOME=/usr/lpp/java/java800/J8.0_64
PATH=$JAVA_HOME/bin:/user/bin/bash-4.3/bin:$PATH:$HOME:
IBM_JAVA_OPTIONS="-Dfile.encoding=UTF8 -Dcom.ibm.jsse2.overrideDefaultCSName=true"
_BPXK_AUTOCVT=ON

# This line sets the prompt
PS1='$LOGNAME':'$PWD':' >'

# This line exports the variable settings
export JAVA_HOME PATH IBM_JAVA_OPTIONS _BPXK_AUTOCVT PS1
```
The same syntax applies for /etc/profile, $HOME/.profile, and spark-env.sh.
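If an earlier bash installation might already appear in PATH, one way to guarantee the required priority is to move the bash 4.3.48 /bin directory to the front of the list. The following is a minimal sketch under that assumption. The function name prepend_path is hypothetical, and the sample directories are taken from the examples in this topic.

```shell
#!/bin/sh
# Hypothetical helper: print a PATH-style list with directory $1 first.
# $1 = directory to prioritize, $2 = existing colon-separated list.
# The body runs in a subshell so that IFS and set -f do not leak out.
prepend_path() (
    dir=$1
    rest=
    IFS=:
    set -f                      # no file name expansion while splitting
    for entry in $2; do
        # Drop empty entries and any existing copy of the directory.
        if [ -n "$entry" ] && [ "$entry" != "$dir" ]; then
            rest="${rest:+$rest:}$entry"
        fi
    done
    printf '%s\n' "${dir}${rest:+:$rest}"
)

# Example with a sample list rather than the live PATH.
prepend_path /user/bin/bash-4.3/bin "/bin:/shared/rocket/bash-4.2/bin:/user/bin/bash-4.3/bin"
```

For the sample list, the function prints /user/bin/bash-4.3/bin:/bin:/shared/rocket/bash-4.2/bin. To apply it in a profile, a line such as PATH=$(prepend_path /user/bin/bash-4.3/bin "$PATH") could be used. Note that the sketch drops empty PATH entries, which the shell otherwise treats as the current directory.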
c. If you set the environment variables in a profile file (either of the first two rows in Table 1), skip to step 2.d now. Otherwise, if you set the environment variables only in spark-env.sh (the third row in Table 1), issue the following command in an SSH or Telnet shell environment to source the spark-env.sh file:
```
source spark-env.sh
```
d. In an SSH or Telnet shell environment, run the following command to verify that the bash version is set properly.
```
bash -version
```
The command returns a version number of 4.3.48. If it does not, ensure that the PATH value in the file you updated in step 2.b lists the /bin directory of bash 4.3.48 before the /bin directory of any other bash installation.
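The version check can also be automated. The following minimal sketch assumes that the first line of the version output has the usual GNU form "GNU bash, version X.Y.Z(...)-release". The function name parse_bash_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read bash version output on standard input and
# print the "major.minor.patch" number found on the first line.
parse_bash_version() {
    sed -n '1s/.*version \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

required=4.3.48
# Query whichever bash is found first in PATH.
found=$(bash -version 2>/dev/null | parse_bash_version)
if [ "$found" = "$required" ]; then
    echo "bash $found is first in PATH"
else
    echo "expected bash $required but found ${found:-nothing}; check the PATH order"
fi
```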
e. In an SSH or Telnet shell environment, run the following command to verify that the correct level of bash is set as the default.
```
ps -p $$
```
The command returns the value of the process ID and indicates the shell program that is used, for example:
```
SPARKID:/u/sparkid: >ps -p $$
       PID TTY       TIME CMD
  16777299 ttyp0000  0:00 /shared/rocket/bash-4.2/bin/bash
```
This example output shows the full path of the bash program that is running, which is the path provided on the PROGRAM attribute of the OMVS segment for the user ID. In this sample, the path points to a bash 4.2.53 installation; on your system, expect the path to your bash 4.3.48 installation.
If the expected bash installation is not listed, something in /etc/profile is probably overriding the shell. Ensure that /etc/profile is correct.
f. In an SSH or Telnet shell environment, issue the following command to verify that JAVA_HOME points to IBM 64-Bit SDK for z/OS Java Technology Edition V8 or to IBM Semeru Runtime Certified Edition for z/OS, Version 11.0.
```
java -version
```
You should see output similar to the following for the two Java versions.

For IBM 64-Bit SDK for z/OS Java Technology Edition V8:
```
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 8.0.6.0 - pmz6480sr6 - 20191107_01(SR6))
IBM J9 VM (build 2.9, JRE 1.8.0 z/OS s390x-64-Bit
Compressed References 20191106_4321 (JIT enabled, AOT enabled)
OpenJ9 - f0b6be7
OMR - 18d8f94
IBM - 233dfb5)
JCL - 20191016_01 based on Oracle jdk8u231-b10
```
For IBM Semeru Runtime Certified Edition for z/OS, Java 11:
```
IBM Semeru Runtime Certified Edition for z/OS 11.0.18.0 
(build 11.0.18+10) IBM J9 VM 11.0.18.0 (build z/OS-Release-11.0.18.0-b01, 
JRE 11 z/OS s390x-64-Bit Compressed References 20230203_261 (JIT enabled, 
AOT enabled)
OpenJ9   - 11387ddf65e
OMR      - 4ef7da79286
IBM      - 7187a01
JCL      - b6e1e1a63d1 based on jdk-11.0.18+10)
```
If the output is incorrect or Java is not found, issue the following command:
```
echo $JAVA_HOME
```
The command returns the path of the selected Java product. If it does not, ensure that the JAVA_HOME value is set correctly in the file you updated in step 2.b and that the PATH value references the same directory.
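To confirm that PATH references the same Java installation as JAVA_HOME, the two values can be compared directly. The following is a minimal sketch. The function name path_contains is hypothetical, and the sketch assumes that PATH should contain the bin subdirectory of JAVA_HOME, as in the .profile example in this topic.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when directory $1 is an entry of the
# colon-separated list $2.
path_contains() {
    # Wrapping both sides in colons makes the match apply to whole entries only.
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
elif path_contains "$JAVA_HOME/bin" "$PATH"; then
    echo "PATH includes $JAVA_HOME/bin"
else
    echo "PATH does not include $JAVA_HOME/bin"
fi
```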
g. In an SSH or Telnet shell environment, run the following command to verify the correct file encoding.
```
echo $IBM_JAVA_OPTIONS
```
The command returns -Dfile.encoding=UTF8. If it does not, ensure that the IBM_JAVA_OPTIONS value is set correctly in the file you updated in step 2.b.
If running Java 8, ensure that -Dcom.ibm.jsse2.overrideDefaultCSName=true also appears.
Note: The -Dcom.ibm.jsse2 ... statement can be specified for Java 11, but is ignored.
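Rather than inspecting the echoed string by eye, each expected option can be tested individually. The following minimal sketch assumes that the options in IBM_JAVA_OPTIONS are separated by single spaces. The function name has_java_option is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when option $1 appears as a whole word
# in the space-separated option string $2.
has_java_option() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

opts=${IBM_JAVA_OPTIONS:-}
# The second option matters only when running Java 8.
for opt in -Dfile.encoding=UTF8 -Dcom.ibm.jsse2.overrideDefaultCSName=true; do
    if has_java_option "$opt" "$opts"; then
        echo "present: $opt"
    else
        echo "missing: $opt"
    fi
done
```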
h. In an SSH or Telnet shell environment, run the following command to verify the automatic conversion of tagged files.
```
echo $_BPXK_AUTOCVT
```
The command returns ON. If it does not, ensure that the _BPXK_AUTOCVT value is set correctly in the file you updated in step 2.b.

3. Permit the SPARKID to spawn USS address spaces with specific job names.
The Spark worker spawns new address spaces using the job name specifications in the spark-defaults.conf file.

Action required:

Permit the SPARKID to the BPX.JOBNAME profile in the security product. For RACF, this would be PERMIT BPX.JOBNAME CLASS(FACILITY) ID(SPARKID) ACCESS(READ)

Results

Your chosen user ID is now ready for use with Spark.

What to do next

Continue with Customizing the Apache Spark directory structure.