CES HDFS troubleshooting

This topic contains information on troubleshooting CES HDFS issues. CES HDFS consists of CES and HDFS Transparency functionality.

For more information on troubleshooting HDFS Transparency, see Second generation HDFS Transparency Protocol troubleshooting.

  1. Debug, trace, and logs.

    Solution:

    To check the state of the CES HDFS cluster, see the mmhealth command documentation in the IBM Storage Scale: Command and Programming Reference Guide.

    To determine the state of the CES HDFS NameNodes, run the following command:
    /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -checkHealth -scale -all

    For more information, see the hdfs haadmin command.

    For information on enabling debugging for HDFS Transparency, see Second generation HDFS Transparency Protocol troubleshooting.
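
    For example, a minimal sketch of a quick health check that combines these commands (output varies by release and configuration):

    # Show the CES HDFS NameNode component state on all CES nodes
    /usr/lpp/mmfs/bin/mmhealth node show -v HDFS_NAMENODE -N cesNodes

    # Check the health of all CES HDFS NameNodes
    /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -checkHealth -scale -all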

  2. CES HDFS Transparency cluster fails to start.

    The failure occurs when you run one of the following commands:
    mmces service enable HDFS
    or
    mmces service start hdfs -a

    Solution:

    Note: If the NameNode fails to start with the following exception, run /usr/lpp/mmfs/hadoop/bin/hdfs namenode -initializeSharedEdits:
    2019-11-22 01:02:01,925 ERROR namenode.FSNamesystem (FSNamesystem.java:<init>(911)) - GPFSNamesystem initialization failed.
    java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:789)
            at org.apache.hadoop.hdfs.server.namenode.GPFSNamesystemBase.<init>(GPFSNamesystemBase.java:49)
            at org.apache.hadoop.hdfs.server.namenode.GPFSNamesystem.<init>(GPFSNamesystem.java:74)
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:706)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:669)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:731)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:968)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:947)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1680)
            at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1747)
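
    A minimal recovery sketch, assuming the exception above is the cause of the failure: reinitialize the shared edits directory on the affected NameNode, then start HDFS through CES again.

    # Run on the NameNode that failed to start
    /usr/lpp/mmfs/hadoop/bin/hdfs namenode -initializeSharedEdits

    # Start the CES HDFS service again
    mmces service start hdfs -a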
    
  3. MapReduce container job exits with return code 1.

    Solution:

    If the error Container exited with a non-zero exit code 1. Error file: prelaunch.err occurs while running MapReduce workloads, add the following property to the mapred-site.xml file to resolve the issue:
    <property>
       <name>mapreduce.application.classpath</name>
       <value>/usr/hadoop-3.1.2/share/hadoop/mapreduce/*, /usr/hadoop-3.1.2/share/hadoop/mapreduce/lib/*</value>
    </property>
    
  4. mmhdfs hdfs status shows that the node is not a DataNode.
    The command mmhdfs hdfs status shows the following errors:
    c16f1n13.gpfs.net:  This node is not a datanode
    mmdsh: c16f1n13.gpfs.net remote shell process had return code 1.
    

    Solution:

    Remove the localhost value from the worker list.

    On the worker node, run:
    mmhdfs worker remove localhost 
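
    After removing the entry, you can confirm the fix by re-running the status command (a sketch; the output depends on your cluster):
    mmhdfs hdfs status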
  5. All NameNodes show a standby status after mmhdfs start/stop/restart commands.

    Solution:

    Use the mmces service command to start and stop the NameNodes so that CES reflects their proper state.

    If the mmhdfs start/stop/restart command was executed against the NameNodes, run mmces service start/stop hdfs to fix the issue.
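
    For example, a sketch of the recovery sequence using the commands shown elsewhere in this topic:

    # Restart the NameNodes through CES so that CES tracks their state
    mmces service stop hdfs -a
    mmces service start hdfs -a

    # Verify that one NameNode is now active
    /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -getAllServiceState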

  6. hdfs dfs -ls or another operation fails with a StandbyException.
    Running the hdfs dfs -ls command fails with a StandbyException:
    [root@scale12 transparency]# /usr/lpp/mmfs/hadoop/bin/hdfs dfs -ls /HDFS
    2020-04-06 16:26:25,891 INFO retry.RetryInvocationHandler: 
    org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
    Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2010)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1447)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3129)
    at org.apache.hadoop.hdfs.server.namenode.GPFSNamesystem.getFileInfo(GPFSNamesystem.java:494)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1143)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:939)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
    , while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over scale12/192.0.2.21:8020 after 1 
    failover attempts. Trying to failover after sleeping for 1157ms.
    ^C2020-04-06 16:26:27,097 INFO retry.RetryInvocationHandler: java.io.IOException: 
    The client is stopped, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo 
    over scale11/192.0.2.20:8020 after 2 failover attempts. Trying to failover after
    sleeping for 2591ms.
    Both NameNodes are in standby and CES has failed to select one as active. To verify, run the following command:
    /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -getAllServiceState
    scale01:8020                                       standby
    scale02:8020                                       standby

    Solution:

    1. Identify the NameNode that should be active by running the following command:
      /usr/lpp/mmfs/bin/mmhealth node show -v HDFS_NAMENODE -N cesNodes
    2. The output shows the hdfs_namenode_wrong_state event for one of the nodes.
    3. Log in (ssh) to that node and manually transition it to active by running the following command:
      /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -transitionToActive -scale
    4. Wait for 30 seconds and verify that the NameNode is now active by running the following commands:
      /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -getAllServiceState

      and

      /usr/lpp/mmfs/bin/mmhealth node show -v HDFS_NAMENODE -N cesNodes
  7. CES HDFS Transparency fails to start if the Java™ version is upgraded.

    Solution:

    For information on troubleshooting this issue, see HDFS Transparency fails to start if the Java version is upgraded.

  8. The mmhdfs command cannot recognize FQDN hostnames if the NameNodes or DataNodes were added with short hostnames.

    If both IBM Storage Scale and HDFS Transparency are set up with short hostnames, there is no issue with using short hostnames.

    If IBM Storage Scale is set up with FQDNs and HDFS Transparency is set up with short hostnames, mmhdfs does not recognize the node as a NameNode or DataNode.

    For example, the mmhdfs hdfs status command states that the node is not a NameNode and exits with return code 1.

    Solution:

    Set up HDFS Transparency to use FQDNs by updating hdfs-site.xml to specify the NameNodes with FQDNs and by updating the worker file hostnames to FQDNs.
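
    For example, a sketch of the FQDN-based NameNode settings in hdfs-site.xml. The nameservice ID (hdfscluster1), the NameNode IDs (nn1, nn2), and the hostnames are hypothetical and must match your own configuration:
    <property>
       <name>dfs.namenode.rpc-address.hdfscluster1.nn1</name>
       <value>scale01.gpfs.net:8020</value>
    </property>
    <property>
       <name>dfs.namenode.rpc-address.hdfscluster1.nn2</name>
       <value>scale02.gpfs.net:8020</value>
    </property>
    The worker file should likewise list each DataNode by its FQDN (for example, c16f1n13.gpfs.net).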

  9. Multi-HDFS cluster deployment through IBM Storage Scale 5.1.1.0 installation toolkit is not supported.

    Solution:

    If you want to create multiple HDFS clusters on the same IBM Storage Scale cluster, perform the following steps:
    1. Clear the installation toolkit HDFS metadata by running the following command:
      /spectrumscale config hdfs clear
    2. Follow Adding a new HDFS cluster into existing HDFS cluster on the same GPFS cluster using install toolkit.
      Note: Ensure that the field values for the new HDFS cluster are unique and do not duplicate values from the existing HDFS cluster. The installation toolkit cannot check for duplicate values. The installation toolkit HDFS metadata is regenerated after the CES HDFS cluster is deployed, but it contains only the new HDFS cluster information.
  10. mmhealth node shows CES in Degraded state.
    When you create a CES HDFS cluster, mmhealth node show CES -v reports the CES component as DEGRADED with the hdfs_namenode_wrong_state message.
    [root@scale-31 ~]# mmhealth node show CES -v
    Node name:      scale-31.openstacklocal
    Component         Status        Status Change            Reasons
    -------------------------------------------------------------------------------------------------------------
    CES               DEGRADED      2021-05-05 09:52:29      hdfs_namenode_wrong_state(hdfscluster3)
      AUTH            DISABLED      2021-05-05 09:49:28      -
      AUTH_OBJ        DISABLED      2021-05-05 09:49:28      -
      BLOCK           DISABLED      2021-05-05 09:49:27      -
      CESNETWORK      HEALTHY       2021-05-05 09:49:58      -
        eth1          HEALTHY       2021-05-05 09:49:44      -
      HDFS_NAMENODE   DEGRADED      2021-05-05 09:52:29      hdfs_namenode_wrong_state(hdfscluster3)
      NFS             DISABLED      2021-05-05 09:49:25      -
      OBJECT          DISABLED      2021-05-05 09:49:28      -
      SMB             DISABLED      2021-05-05 09:49:26      -
    
    [root@scale-31 ~]# mmhealth event show hdfs_namenode_wrong_state
    Event Name:              hdfs_namenode_wrong_state
    Event ID:                998178
    Description:             The HDFS NameNode service state is not as expected (e.g. is in STANDBY but is supposed to be ACTIVE or vice versa)
    Cause:                   The command /usr/lpp/mmfs/hadoop/sbin/mmhdfs monitor checkHealth -Y returned serviceState which does not match the expected state when looking at the assigned ces IP attributes
    User Action:             N/A
    Severity:                WARNING
    State:                   DEGRADED
    
    [root@scale-31 ~]# hdfs haadmin -getAllServiceState
    scale-31.openstacklocal:8020                       active
    scale-32.openstacklocal:8020                       standby
    [root@scale-31 ~]#
    
    [root@scale-31 ~]# mmces address list
    Address     Node                      Ces Group          Attributes
    ----------- ----------------------- ------------------ ------------------
    192.0.2.0   scale-32.openstacklocal   hdfshdfscluster3   hdfshdfscluster3
    192.0.2.1   scale-32.openstacklocal   none               none
    192.0.2.2   scale-32.openstacklocal   none               none
    192.0.2.3   scale-31.openstacklocal   none               none
    192.0.2.4   scale-31.openstacklocal   none               none
    192.0.2.5   scale-31.openstacklocal   none               none
    [root@scale-31 ~]#

    The issue here is that the CES IP is assigned to the standby NameNode instead of the active NameNode.

    Solution:

    Use any one of the following three solutions to resolve this problem:
    • Manually set the active NameNode to standby on the node by running the /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -transitionToStandby -scale command. Then on the other node, set the standby NameNode to active by running the /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -transitionToActive -scale command.
    • Move the CES IP to the active NameNode by running the mmces address move --ces-ip <CES IP> --ces-node <node name> command.
    • Restart the CES HDFS NameNodes by running the following commands:
      mmces service stop HDFS -a
      mmces service start HDFS -a
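
    After applying any of these options, you can confirm that the warning clears by re-running the checks shown above (allow the health monitor some time to refresh):
      /usr/lpp/mmfs/hadoop/bin/hdfs haadmin -getAllServiceState
      /usr/lpp/mmfs/bin/mmhealth node show CES -v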
  11. Kerberos principal update does not take effect after changing KINIT_PRINCIPAL in hadoop-env.sh.

    Solution:

    The CES HDFS Kerberos information is cached at /var/mmfs/tmp/krb5cc_ces. Delete this file to force the update.
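
    A minimal sketch of forcing the refresh, presumably on each CES node that has the cache file:

    # Remove the cached CES HDFS Kerberos credentials so that the new KINIT_PRINCIPAL takes effect
    rm /var/mmfs/tmp/krb5cc_ces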

  12. If Kerberos was configured on multiple HDFS Transparency clusters using a common KDC server and the supplied gpfs_kerberos_configuration.py script, kinit with the hdfs user principal fails for all the clusters except the most recent one.

    The Kerberos configuration script gpfs_kerberos_configuration.py generates a keytab file for the hdfs user at the default path /etc/security/keytabs/hdfs.headless.keytab. The kinit error occurs because the gpfs_kerberos_configuration.py script updated the keytab file and invalidated the copies of the keytab on the previously configured clusters.

    Solution:

    From the HDFS Transparency cluster where the script was run most recently, copy the keytab file to all the other HDFS Transparency cluster nodes where the script was run.

    For example:

    If Hadoop cluster A ran the gpfs_kerberos_configuration.py script, which created the hdfs user principal, and Hadoop cluster B then ran the gpfs_kerberos_configuration.py script, which updated the original hdfs user keytab, copy the hdfs keytab from Hadoop cluster B to Hadoop cluster A to ensure that kinit on Hadoop cluster A works properly.
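
    A sketch of the copy step, assuming a cluster B node currently holds the up-to-date keytab and clusterA-node1 is a hypothetical cluster A hostname; preserve the file permissions and ownership when copying:

    # Run from a Hadoop cluster B node; repeat for every cluster A node that holds the keytab
    scp -p /etc/security/keytabs/hdfs.headless.keytab clusterA-node1:/etc/security/keytabs/hdfs.headless.keytab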

    This limitation has been fixed in HDFS Transparency 3.1.1.6.

  13. DataNodes are down after system reboot.

    Solution:

    HDFS Transparency DataNodes may not start automatically after a system reboot. As a workaround, you can manually start the DataNodes after the system reboot by using the following command from one of the CES nodes as root:
    # /usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs-dn start
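    After the command returns, you can check that the DataNodes are running again, for example with:
    /usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs status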
  14. HDFS administrative commands, such as hdfs haadmin and hdfs groups, cannot be executed from HDFS clients where Kerberos is enabled. The HDFS client expects the CES HDFS service principal to contain the CES host name instead of the NameNode hostname. The administrative commands fail with the following error:
    Caused by: java.lang.IllegalArgumentException: Server has invalid Kerberos principal:
    nn/c88f2u33.pokprv.stglabs.ibm.com@HADOOP.COM, expecting:
    nn/c88f2u31b.pokprv.stglabs.ibm.com@HADOOP.COM
    at org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:337)
    at org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:234)
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:160)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622)
    at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:822)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:818)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:818)
    ... 15 more
    To resolve this issue, add the following key to the core-site.xml file on the client:
    hadoop.security.service.user.name.key.pattern=*
    If you are using Cloudera Manager:
    1. Go to Clusters > IBM Spectrum Scale > Configuration > Cluster-wide Advanced Configuration Snippet (Safety Valve) for the core-site.xml file.
    2. Add the hadoop.security.service.user.name.key.pattern=* parameter and restart related services.
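
    If you edit core-site.xml directly instead of using Cloudera Manager, the same setting in property form would look like the following sketch:
    <property>
       <name>hadoop.security.service.user.name.key.pattern</name>
       <value>*</value>
    </property>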