Setting up Kerberos for HDFS Transparency nodes

This topic lists the steps to set up the Kerberos clients on the HDFS Transparency nodes. These instructions work for both Cloudera Private Cloud Base and Open Source Apache Hadoop distributions.

Note:
  • Before you enable Kerberos, configure an FQDN for every hostname entry in your environment.
  • For every hostname that you replace in this section, use the hostname -f output from your environment. This also applies to the hostnames in the workers file for HDFS Transparency.
  • Do not change hostnames after you enable Kerberos. If you change a hostname after enabling Kerberos, you must re-create the principals and keytab files.
  • If you need to set up more than one HDFS Transparency cluster with a common KDC server, see the Note under Kerberos.
  1. Install the Kerberos clients package on all the HDFS Transparency nodes.
    yum install -y krb5-libs krb5-workstation
  2. Copy the /etc/krb5.conf file to the Kerberos client hosts on the HDFS Transparency nodes.
  3. Create a directory for the keytab files and set the appropriate permissions on each HDFS Transparency node.
    mkdir -p /etc/security/keytabs/
    chown root:root /etc/security/keytabs
    chmod 755 /etc/security/keytabs
    
  4. Create KDC principals for the components, corresponding to the hosts where they run, and export the keytab files as follows:
    Service  User:Group  Daemons        Principal                                Keytab File Name
    HDFS     root:root   NameNode       nn/<NN_Host_FQDN>@<REALM-NAME>           nn.service.keytab
                         NameNode HTTP  HTTP/<NN_Host_FQDN>@<REALM-NAME>         spnego.service.keytab
                         NameNode HTTP  HTTP/<CES_HDFS_Host_FQDN>@<REALM-NAME>   spnego.service.keytab
                         DataNode       dn/<DN_Host_FQDN>@<REALM-NAME>           dn.service.keytab

    Replace <NN_Host_FQDN> with the HDFS Transparency NameNode hostname and <DN_Host_FQDN> with the HDFS Transparency DataNode hostname. Replace <CES_HDFS_Host_FQDN> with the CES hostname configured for your CES HDFS cluster.

    You need to create one principal for each HDFS Transparency NameNode and DataNode in the cluster.
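Because one principal is needed per host, the principal creation lends itself to a loop. A minimal sketch that only prints the kadmin.local commands for review; the hostnames and realm below are examples, so substitute the hostname -f values from your workers file:

```shell
# Hypothetical sketch: print one addprinc command per DataNode host.
# dn01/dn02.gpfs.net and IBM.COM are placeholders for your environment.
REALM=IBM.COM
for host in dn01.gpfs.net dn02.gpfs.net; do
  printf 'kadmin.local -q "addprinc -randkey dn/%s@%s"\n' "$host" "$REALM"
done
```

Reviewing the printed commands before piping them to a shell helps catch a wrong hostname before it becomes a principal that has to be deleted and re-created.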

    Note: If you are using CDP Private Cloud Base, Cloudera Manager creates the principals and keytabs for all the services except the IBM Spectrum® Scale service. Therefore, you can skip the YARN and MapReduce service principals in the following table and go directly to step 4.a.
    If you are using Open Source Apache Hadoop, you need to create service principals for the YARN and MapReduce services as shown in the following table:
    Service    User:Group     Daemons                       Principal                                    Keytab File Name
    YARN       yarn:hadoop    ResourceManager               rm/<Resource_Manager_FQDN>@<REALM-NAME>     rm.service.keytab
                              NodeManager                   nm/<Node_Manager_FQDN>@<REALM-NAME>         nm.service.keytab
    MapReduce  mapred:hadoop  MapReduce Job History Server  jhs/<Job_History_Server_FQDN>@<REALM-NAME>  jhs.service.keytab

    Replace <Resource_Manager_FQDN> with the Resource Manager hostname, <Node_Manager_FQDN> with the Node Manager hostname, and <Job_History_Server_FQDN> with the Job History Server hostname.

    1. Create service principals for each service. Refer to the sample table above.
      kadmin.local -q "addprinc -randkey {Principal}"
      For example:
      kadmin.local -q "addprinc -randkey nn/nn01.gpfs.net@IBM.COM"
    2. Create host principals for each Transparency host.
      kadmin.local -q "addprinc -randkey host/{HOST_NAME}@<REALM-NAME>"
      For example:
      kadmin.local -q "addprinc -randkey host/nn01.gpfs.net@IBM.COM"
    3. For each service on each Transparency host, export its service principal and the host principal into a keytab file:
      kadmin.local -q "ktadd -k /etc/security/keytabs/{SERVICE_NAME}.service.keytab {Principal}"
      kadmin.local -q "ktadd -k /etc/security/keytabs/{SERVICE_NAME}.service.keytab host/{HOST_NAME}@<REALM-NAME>"

      For example:

      NameNode:
      kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/nn01.gpfs.net@IBM.COM"
      kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab host/nn01.gpfs.net@IBM.COM"
      NameNode HTTP:
      kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/nn01.gpfs.net@IBM.COM"
      kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/myceshdfs.gpfs.net@IBM.COM"
      kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab host/nn01.gpfs.net@IBM.COM"
      Note: The spnego.service.keytab file contains two HTTP principals. myceshdfs.gpfs.net is an example of the CES hostname configured for your CES HDFS service.
      DataNode:
      kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab dn/dn01.gpfs.net@IBM.COM"
      kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab host/dn01.gpfs.net@IBM.COM"
      • Repeat steps 4.a, 4.b, and 4.c for every required keytab file.
      Note:
      • The file name for a service (for example, dn.service.keytab) is the same across hosts, but the contents differ because each keytab contains that host's principal.
      • After a keytab is generated, move it to the appropriate host immediately, or move it to a different location so that it is not overwritten by a later export.
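One way to avoid overwriting keytabs between exports is to stage each host's files into a per-host directory immediately after the ktadd. A minimal sketch, with the staging path and hostnames as assumptions:

```shell
# Hypothetical staging layout: one directory per host, so a later ktadd
# that reuses the same service file name cannot clobber an earlier export.
STAGE=${STAGE:-/tmp/keytab-stage}
for host in nn01.gpfs.net dn01.gpfs.net dn02.gpfs.net; do
  mkdir -p "$STAGE/$host"
done
# After exporting a keytab for a given host, move it aside right away,
# for example:
#   mv /etc/security/keytabs/dn.service.keytab "$STAGE/dn01.gpfs.net/"
ls "$STAGE"
```

From the staging directories, each keytab can later be copied to its target host (step 6) without ambiguity about which host it was exported for.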
  5. For CES HDFS HA failover, an HDFS client user ID must be created and set up for the CES NameNodes. The CES framework uses this user to initiate NameNode failover.

    After the HDFS client user is created, create the Kerberos user principal for it.

    In this example, the HDFS client user is hdfs. Create an hdfs user that belongs to a Hadoop super group such as supergroup. Refer to step 8 for configuring this user in hadoop-env.sh.
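Creating that user might look like the following sketch. The group and user names are the examples used in this section, not fixed requirements; run as root and align the names with your hadoop-env.sh settings:

```shell
# Hedged sketch: create a Hadoop super group and an hdfs user in it.
# "supergroup" and "hdfs" are example names; the checks make the
# commands safe to re-run if the group or user already exists.
getent group supergroup >/dev/null 2>&1 || groupadd supergroup
id -u hdfs >/dev/null 2>&1 || useradd -g supergroup -m hdfs
```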
    kadmin.local -q "addprinc hdfs@<REALM_NAME>"
    kadmin.local -q "ktadd -k /etc/security/keytabs/hdfs.headless.keytab hdfs@<REALM_NAME>"

    Copy the /etc/security/keytabs/hdfs.headless.keytab file to all the NameNodes and change the ownership and permissions of the file so that only root can read it:

    chown root:root /etc/security/keytabs/hdfs.headless.keytab
    chmod 400 /etc/security/keytabs/hdfs.headless.keytab
  6. Copy the appropriate keytab file to each host. If a host runs more than one component (for example, both NameNode and DataNode), copy the keytabs for both these components.
  7. Set the appropriate permissions for the keytab files.
    On the HDFS Transparency NameNode hosts:
    chown root:root /etc/security/keytabs/nn.service.keytab
    chmod 400 /etc/security/keytabs/nn.service.keytab
    chown root:root /etc/security/keytabs/spnego.service.keytab
    chmod 440 /etc/security/keytabs/spnego.service.keytab
    On the HDFS Transparency DataNode hosts:
    chown root:root /etc/security/keytabs/dn.service.keytab
    chmod 400 /etc/security/keytabs/dn.service.keytab
    On the YARN Resource Manager hosts:
    chown yarn:hadoop /etc/security/keytabs/rm.service.keytab
    chmod 400 /etc/security/keytabs/rm.service.keytab
    
    On the YARN Node Manager hosts:
    chown yarn:hadoop /etc/security/keytabs/nm.service.keytab
    chmod 400 /etc/security/keytabs/nm.service.keytab
    
    On the MapReduce Job History Server hosts:
    chown mapred:hadoop /etc/security/keytabs/jhs.service.keytab
    chmod 400 /etc/security/keytabs/jhs.service.keytab
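A quick way to audit the results of this step on each host is to report owner and mode for every keytab in one pass. A sketch; the directory variable is an illustration, defaulting to the path used throughout this section:

```shell
# Sketch: report owner:group, octal mode, and path for each keytab so a
# missed chown/chmod stands out at a glance.
KEYTAB_DIR=${KEYTAB_DIR:-/etc/security/keytabs}
for kt in "$KEYTAB_DIR"/*.keytab; do
  [ -e "$kt" ] || continue   # skip hosts with no keytabs in this dir
  stat -c '%U:%G %a %n' "$kt"
done
```

Any line that does not show the expected owner (for example root:root or yarn:hadoop) or the expected mode (400, or 440 for spnego.service.keytab) points at a file to fix.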
  8. Update the HDFS Transparency configuration files and upload the changes.
    Get the configuration files:
    mkdir /tmp/hdfsconf
    mmhdfs config export /tmp/hdfsconf core-site.xml,hdfs-site.xml,hadoop-env.sh

    Update the config files with the following changes based on your environment.

    File: core-site.xml
    <property>
       <name>hadoop.security.authentication</name>
       <value>kerberos</value>
    </property>
     
    <property>
       <name>hadoop.rpc.protection</name>
       <value>authentication</value>
    </property>
    If you are using a Cloudera Private Cloud Base cluster, create the following rules:
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[2:$1/$2@$0](nn/.*@.*IBM.COM)s/.*/hdfs/
        RULE:[2:$1/$2@$0](dn/.*@.*IBM.COM)s/.*/hdfs/
        RULE:[1:$1@$0](hdfs@IBM.COM)s/@.*//
        RULE:[1:$1@$0](.*@IBM.COM)s/@.*//
        DEFAULT
      </value>
    </property>
    Otherwise, if you are using Open Source Apache Hadoop, create the following rules:
    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[2:$1/$2@$0](nn/.*@.*IBM.COM)s/.*/hdfs/
        RULE:[2:$1/$2@$0](jn/.*@.*IBM.COM)s/.*/hdfs/
        RULE:[2:$1/$2@$0](dn/.*@.*IBM.COM)s/.*/hdfs/
        RULE:[2:$1/$2@$0](nm/.*@.*IBM.COM)s/.*/yarn/
        RULE:[2:$1/$2@$0](rm/.*@.*IBM.COM)s/.*/yarn/
        RULE:[2:$1/$2@$0](jhs/.*@.*IBM.COM)s/.*/mapred/
        DEFAULT
      </value>
    </property>

    Replace IBM.COM with your realm name in the above example rules.

    File: hdfs-site.xml
    
    <property> 
      <name>dfs.data.transfer.protection</name> 
      <value>authentication</value> 
    </property>
    
    <property> 
      <name>dfs.datanode.address</name> 
      <value>0.0.0.0:1004</value> 
    </property>
    
    <property> 
      <name>dfs.datanode.data.dir.perm</name> 
      <value>700</value> 
    </property>
    
    <property> 
      <name>dfs.datanode.http.address</name> 
      <value>0.0.0.0:1006</value> 
    </property>
    
    <property> 
      <name>dfs.datanode.kerberos.principal</name> 
      <value>dn/_HOST@IBM.COM</value> 
    </property>
    
    <property> 
      <name>dfs.datanode.keytab.file</name> 
      <value>/etc/security/keytabs/dn.service.keytab</value> 
    </property>
    
    <property> 
      <name>dfs.encrypt.data.transfer</name> 
      <value>false</value> 
    </property>
    
    <property> 
      <name>dfs.namenode.kerberos.internal.spnego.principal</name> 
      <value>HTTP/_HOST@IBM.COM</value> 
    </property>
    
    <property> 
      <name>dfs.namenode.kerberos.principal</name> 
      <value>nn/_HOST@IBM.COM</value> 
    </property>
    
    <property> 
      <name>dfs.namenode.keytab.file</name> 
      <value>/etc/security/keytabs/nn.service.keytab</value> 
    </property>
    
    <property> 
      <name>dfs.web.authentication.kerberos.keytab</name> 
      <value>/etc/security/keytabs/spnego.service.keytab</value> 
    </property>
    
    <property> 
      <name>dfs.web.authentication.kerberos.principal</name> 
      <value>*</value> 
    </property>


    File: hadoop-env.sh
    KINIT_KEYTAB=/etc/security/keytabs/hdfs.headless.keytab
    KINIT_PRINCIPAL=hdfs@IBM.COM
  9. Stop the HDFS Transparency services for the cluster.
    1. Stop the DataNodes.

      On any HDFS Transparency node, run the following command:

      mmhdfs hdfs-dn stop
    2. Stop the NameNodes.

      On any CES HDFS NameNode, run the following command:

      mmces service stop HDFS -N <NN1>,<NN2>
  10. Import the files.
    mmhdfs config import /tmp/hdfsconf core-site.xml,hdfs-site.xml,hadoop-env.sh
  11. Upload the changes.
    mmhdfs config upload
  12. Start the HDFS Transparency services for the cluster.
    1. Start the DataNodes.

      On any HDFS Transparency node, run the following command:

      mmhdfs hdfs-dn start
    2. Start the NameNodes.

      On any CES HDFS NameNode, run the following command:

      mmces service start HDFS -N <NN1>,<NN2>
    3. Verify that the services have started.
      On any CES HDFS NameNode, run the following command:
      mmhdfs hdfs status