Configuring InfoSphere Information Server for Kerberos

You can configure InfoSphere® Information Server to run jobs on a Kerberos enabled cluster.

Procedure

  1. In your Kerberos installation, perform the following steps:
    1. Create a principal for the InfoSphere DataStage® administrator (dsadm). Refer to the Kerberos documentation for information on creating a principal.
    2. Request a ticket-granting ticket (TGT). Request address-less tickets (tickets that are not linked to an IP address). For example, /usr/bin/kinit -A -k -t <dsadm_keytab_path> dsadm.
      After you request the TGT, you are granted Kerberos credentials for the InfoSphere DataStage administrator.
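    For reference, here is a minimal sketch of creating the principal and keytab with the MIT Kerberos kadmin.local tool on the KDC. The realm name EXAMPLE.COM and the keytab path are assumptions; the exact commands depend on your Kerberos distribution.
      # Create a principal for the DataStage administrator with a random key
      kadmin.local -q "addprinc -randkey dsadm@EXAMPLE.COM"
      # Export the key to a keytab file that kinit -k -t can use
      kadmin.local -q "ktadd -k /home/dsadm/dsadm.keytab dsadm@EXAMPLE.COM"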
  2. Configure the Big Data File stage to run jobs on a Kerberos enabled cluster:
    1. To run jobs using the Big Data File stage without a user name option on an edge node installation that is not using a conductor node pool, run the kinit command with the IBM Java Development Kit (JDK) on the Conductor node. For example, /opt/IBM/InformationServer/jdk/jre/bin/kinit -A -k -t <dsadm_keytab_path> dsadm.
    2. To run jobs using the Big Data File stage with a user name option, set the APT_YARN_USER_CACHED_CRED_PATH environment variable to point to the credential cache that is available on the Conductor node. The credential cache must be from the IBM Java Development Kit (JDK). An example follows the note below.
      The credential cache is automatically distributed (localized) to all compute nodes from the engine tier.
    Note: If you are running the Big Data File stage without a user name option on an edge node installation, you can use a conductor node pool in your configuration to get all the Big Data File Stage processes to run on the compute nodes. No additional steps are required if you are using the user name option in an installation that does not include an edge node.
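    For example, a minimal sketch of the user name option setup; the cache path, keytab path, and user name are assumptions:
      # Create a credential cache with the IBM JDK kinit on the Conductor node
      /opt/IBM/InformationServer/jdk/jre/bin/kinit -A -c /tmp/krb5cc_user1 -k -t /home/user1/user1.keytab user1
      # Point the stage at that cache (set as a job-level environment variable)
      APT_YARN_USER_CACHED_CRED_PATH=/tmp/krb5cc_user1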
  3. Configure the File Connector stage to run jobs on a Kerberos enabled cluster:
    1. To run jobs using the File Connector stage with the keytab option to access the Hadoop distributed file system (HDFS), set the APT_YARN_FC_DEFAULT_KEYTAB_PATH environment variable to point to a default keytab that is available on your Conductor node. The default keytab can be the same as the keytab that is specified in the File Connector stage, and it is automatically sent to all compute nodes from the engine tier. Multiple users can be added to the keytab. An example follows the note below.
      Note: In order to use the APT_YARN_FC_DEFAULT_KEYTAB_PATH environment variable, you must install the latest InfoSphere Information Server on Hadoop patch and the latest File connector patch.
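      For example, a minimal sketch; the keytab path is an assumption:
        # Default keytab on the Conductor node; it is sent to the compute nodes at run time
        APT_YARN_FC_DEFAULT_KEYTAB_PATH=/opt/keytabs/hdfs_users.keytab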
  4. Configure the ODBC stage to run jobs on a Kerberos enabled cluster:
    1. To run jobs using the ODBC stage, set the APT_YARN_CONNECTOR_USER_CACHE_CRED_PATH environment variable at the job level to point to the ODBC user credential cache that is available on the Conductor node.
      The credential cache is automatically sent to all compute nodes from the engine tier. If you do not set the environment variable, you must verify that the credential cache is already available to the compute nodes before running ODBC stage jobs.
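      For example, a minimal sketch; the cache path and user name are assumptions:
        # Create the ODBC user's credential cache with the IBM JDK kinit on the
        # Conductor node (this form prompts for the user's password)
        /opt/IBM/InformationServer/jdk/jre/bin/kinit -A -c /tmp/krb5cc_odbcuser odbcuser
        # Set at the job level so that the cache is localized to the compute nodes
        APT_YARN_CONNECTOR_USER_CACHE_CRED_PATH=/tmp/krb5cc_odbcuser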
  5. Configure the JDBC and Hive stages to run jobs on a Kerberos enabled cluster:
    1. To run jobs using the keytab option to access JDBC or Hive:
      1. Set the APT_YARN_CONNECTOR_USER_KEYTAB_PATH environment variable at the job level to point to a default keytab that is available on your Conductor node.
      2. Verify that the APT_YARN_CONNECTOR_USER_KEYTAB_PATH value is the same as the useKeytab value in the JDBCDriverLogin.conf file.

      The default keytab is automatically copied to all compute nodes where InfoSphere DataStage jobs are run. Multiple users can be added to the keytab.
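      For example, a minimal sketch of keeping the two values pointed at the same keytab. The keytab path, principal, realm, and JAAS entry name are assumptions, and the entry syntax assumes the IBM JDK Krb5LoginModule:
        # Job-level environment variable on the Conductor node
        APT_YARN_CONNECTOR_USER_KEYTAB_PATH=/opt/keytabs/dsadm.keytab
        # Matching entry in JDBCDriverLogin.conf
        JDBC_DRIVER_01 {
          com.ibm.security.auth.module.Krb5LoginModule required
          credsType=both
          principal="dsadm@EXAMPLE.COM"
          useKeytab="file:///opt/keytabs/dsadm.keytab";
        };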

    2. To run jobs using the cached credential option to access JDBC or Hive:
      1. Set the APT_YARN_CONNECTOR_USER_CACHE_CRED_PATH environment variable at the job level to point to the cached credentials that are stored on the Conductor node. The cached credentials are automatically distributed (localized) to all compute nodes from the engine tier.
      2. Verify that the APT_YARN_CONNECTOR_USER_CACHE_CRED_PATH value is the same as the useCredential value in the JDBCDriverLogin.conf configuration file. An example follows the note below.
      Note: The keytab and cached credential files that get distributed with JDBC stage are not deleted on the compute nodes after jobs are run.
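      For example, a minimal sketch for the cached credential option; the cache path, keytab path, and principal are assumptions:
        # Create the cached credentials with the IBM JDK kinit on the Conductor node
        /opt/IBM/InformationServer/jdk/jre/bin/kinit -A -c /tmp/krb5cc_dsadm -k -t /opt/keytabs/dsadm.keytab dsadm
        # Job-level environment variable; must match the useCredential value in JDBCDriverLogin.conf
        APT_YARN_CONNECTOR_USER_CACHE_CRED_PATH=/tmp/krb5cc_dsadm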
  6. Configure the IBM Java Development Kit kinit and klist commands for all Kerberos enabled clusters.
    By default, the IBM Java Development Kit does not include support for Advanced Encryption Standard 256-bit encryption (AES-256). To use this encryption:
    1. Download the IBM Java Cryptography Extension (JCE) unrestricted policy files, and then copy them into the /opt/IBM/InformationServer/jdk/jre/lib/security directory, replacing the existing jar files.
    2. Run the kinit commands.
      Depending on your encryption key sizes, you might need to download and install the unrestricted JCE policy files. For additional information, see Downloading and installing the unrestricted JCE policy files.
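      For example, a minimal sketch; the download location is an assumption:
        # Replace the restricted policy jars with the unrestricted versions
        cp ~/jce_download/US_export_policy.jar ~/jce_download/local_policy.jar /opt/IBM/InformationServer/jdk/jre/lib/security/
        # Obtain a ticket with the IBM JDK kinit, then confirm it with klist
        /opt/IBM/InformationServer/jdk/jre/bin/kinit -A -k -t <dsadm_keytab_path> dsadm
        /opt/IBM/InformationServer/jdk/jre/bin/klist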
  7. Configure Lookup File Set stage, Data Set stage, and File Set stage jobs to run on a Kerberos enabled cluster.
    1. If you are running Lookup File Set stage, Data Set stage, or File Set stage jobs that read data from or write data to a Hadoop distributed file system (HDFS), you must run the kinit command with the IBM Java Development Kit (JDK) on the Conductor node.
      For example: /opt/IBM/InformationServer/jdk/jre/bin/kinit -A -k -t <dsadm_keytab_path> dsadm.
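      To confirm that the credentials are in place before you run the jobs, you can list them with the IBM JDK klist command:
        # List the Kerberos credentials that the IBM JDK sees
        /opt/IBM/InformationServer/jdk/jre/bin/klist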