IBM Support

Hadoop HDFS service stops frequently due to Datanode(s) Crashing

Troubleshooting


Problem

During normal operation of IBM BigInsights, Hadoop HDFS may become unavailable, and the Datanode log reports the error "java.lang.OutOfMemoryError: Java heap space".

Symptom

A Datanode that runs out of Java heap space can fail in several different ways, so the symptoms vary. In each case, the Datanode log contains "java.lang.OutOfMemoryError: Java heap space".

Cause

One common reason for a Datanode running out of Java heap space is that the maximum heap size (the -Xmx option) set by the following line:
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote -Xmx2000m $HADOOP_DATANODE_OPTS -Dcom.sun.management.jmxremote.password.file=/opt/ibm/biginsights/conf/jmx/hdfs/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/opt/ibm/biginsights/conf/jmx/hdfs/jmxremote.access -Dcom.sun.management.jmxremote.port=51110"
in hadoop-env.sh is too low for the processing requirements.
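
To confirm which heap limit is currently configured, the -Xmx token can be extracted from hadoop-env.sh. The following is a minimal sketch; the function name current_datanode_heap is hypothetical, and it assumes the setting appears on a line beginning with "export HADOOP_DATANODE_OPTS=" as shown above.

```shell
# Print the first -Xmx token found on the HADOOP_DATANODE_OPTS export line.
# Usage: current_datanode_heap <path-to-hadoop-env.sh>
# (Hypothetical helper; assumes the "export HADOOP_DATANODE_OPTS=" form.)
current_datanode_heap() {
    grep -E '^[[:space:]]*export HADOOP_DATANODE_OPTS=' "$1" \
        | grep -o -E -e '-Xmx[0-9]+[mMgG]' \
        | head -n 1
}

# Example call against the staging copy of the file:
# current_datanode_heap "$BIGINSIGHTS_HOME/hdm/hadoop-conf-staging/hadoop-env.sh"
```

If nothing is printed, the line may be written in a different form and should be inspected manually.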

Environment

The Datanode is a core component of Hadoop, and it requires more Java heap space as the amount of data being processed grows.

Diagnosing The Problem

Review the log files of the Datanodes, generally located at $BIGINSIGHTS_VAR/hadoop/logs/hadoop-hdfs-datanode-`hostname`.log. The following error appears:
java.lang.OutOfMemoryError: Java heap space
followed by the Java stack trace of the operation that was being attempted.
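
The log check described above can be sketched with grep. The function name find_heap_errors and the default of 20 trailing lines are assumptions chosen for illustration; the -A option requires a grep that supports context output, such as GNU grep on Linux.

```shell
# Print each "Java heap space" error with its line number, followed by
# the next N lines of the log (the Java stack trace). N defaults to 20.
# Usage: find_heap_errors <logfile> [context-lines]
# (Hypothetical helper, not part of BigInsights.)
find_heap_errors() {
    log_file="$1"
    context_lines="${2:-20}"
    grep -n -F -A "$context_lines" \
        'java.lang.OutOfMemoryError: Java heap space' "$log_file"
}

# Example call, assuming BIGINSIGHTS_VAR is set in the environment:
# find_heap_errors "$BIGINSIGHTS_VAR/hadoop/logs/hadoop-hdfs-datanode-$(hostname).log"
```

A non-zero exit status from the function means no matching error was found in that log.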

Resolving The Problem

Increase the Java heap space allocated to the Datanode:
1. Edit the file $BIGINSIGHTS_HOME/hdm/hadoop-conf-staging/hadoop-env.sh
2. Increase the -Xmx value specified in HADOOP_DATANODE_OPTS.
The default value is "-Xmx2000m"; increase it in increments of approximately 2 GB, for example to "-Xmx4096m". The operating system must have enough memory available to accommodate the larger heap.
3. As biadmin, run syncconf.sh hadoop to deploy the updated configuration.
4. Finally, if the Datanodes are running, restart them for the change to take effect.
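
Steps 1 and 2 above can be sketched as a small shell function. This is an illustration rather than a supported tool: the name set_datanode_heap is hypothetical, the function assumes the setting is on a line beginning with "export HADOOP_DATANODE_OPTS=", and it keeps a backup copy with a .bak suffix.

```shell
# Rewrite the -Xmx value on the HADOOP_DATANODE_OPTS export line.
# Usage: set_datanode_heap <path-to-hadoop-env.sh> <new-heap-in-MB>
# (Hypothetical helper; the original file is saved as <file>.bak.)
set_datanode_heap() {
    env_file="$1"
    new_mb="$2"
    # Accept only a plain positive integer for the new heap size.
    case "$new_mb" in
        ''|*[!0-9]*) echo "heap size must be a whole number of MB" >&2; return 1 ;;
    esac
    cp "$env_file" "$env_file.bak" || return 1
    # Read from the backup and write the edited result to the original path.
    sed -E "/^[[:space:]]*export HADOOP_DATANODE_OPTS=/ s/-Xmx[0-9]+[mMgG]/-Xmx${new_mb}m/" \
        "$env_file.bak" > "$env_file"
}

# Example, following the steps above (run as biadmin):
# set_datanode_heap "$BIGINSIGHTS_HOME/hdm/hadoop-conf-staging/hadoop-env.sh" 4096
# syncconf.sh hadoop
# Then restart the Datanodes if they are running.
```

Editing the file by hand is equally valid; the sketch only makes the change repeatable and leaves the previous version available for rollback.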

Document Information

Modified date:
18 July 2020

UID

swg21960814