Resolving red cluster or UNASSIGNED shards

The Elasticsearch service can remain in the TENTATIVE state when at least one primary shard and all its replicas are missing.

Before you begin

Confirm that the problem is not related to the following issues:

About this task

Do not restart the cluster when the Elasticsearch service remains in the TENTATIVE state with UNASSIGNED shards.

Follow this high-level troubleshooting process to isolate shards with an UNASSIGNED error:

Procedure

  1. In a command line, run the following command:
    curl --cacert $EGO_TOP/wlp/usr/shared/resources/security/cacert.pem -u $CLUSTERADMIN:$CLUSTERADMINPASS -XGET "$es_protocol://$es_hostname:$es_port/_cluster/health?pretty" --tlsv1.2
    1. Check the state of the Elasticsearch cluster:
      • If the cluster is in the red state, the Elasticsearch service remains in the TENTATIVE state until all primary shards are active.
    2. Check the active shards of the Elasticsearch cluster:
      • If the cluster recently restarted, or if the Elasticsearch cluster grew or contracted, Elasticsearch might be migrating shards to rebalance the cluster. In that case, the value of active_shards_percent_as_number continues to increase as shards become active.
      • Repeat the command after a few minutes and confirm that active_shards_percent_as_number continues to grow. The cluster might need some time for the shards to become active. The Elasticsearch service goes into the STARTED state when at least one primary or replica shard is active and active_shards_percent_as_number reaches 50% or greater.
      • If active_shards_percent_as_number is stuck and does not grow over an extended period, there might be an issue with a particular index or shard.

    The variables in the command are defined as follows:
    es_protocol
    Specifies the protocol for the URL. Use http if security is not enabled, or use https if security is enabled.
    es_hostname
    Specifies the hostname of the Elasticsearch client node.
    es_port
    Specifies the port that is used for communication to the Elasticsearch primary node. By default, the port is 9200. For more information, see Ports used by IBM Spectrum Symphony.
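
The health check in this step can be scripted so that the two values of interest are easy to compare between runs. The following is a minimal sketch and not part of the product: the function names health_field, health_summary, fetch_health, and still_progressing are hypothetical, and the parsing assumes the one-field-per-line layout that the ?pretty option produces.

```shell
#!/bin/sh
# Sketch only: all function names below are hypothetical helpers.

# Print the value of one field from pretty-printed _cluster/health JSON on stdin.
# Assumes each field is on its own line in the form:  "key" : value,
health_field() {
    awk -F'"' -v key="$1" '$2 == key { v = $3 $4; gsub(/[ :,]/, "", v); print v }'
}

# Print "status percent" for a health document read from stdin.
health_summary() {
    doc=$(cat)
    status=$(printf '%s\n' "$doc" | health_field status)
    pct=$(printf '%s\n' "$doc" | health_field active_shards_percent_as_number)
    printf '%s %s\n' "$status" "$pct"
}

# Fetch the health document with the same curl options as the documented command.
# The -s option only suppresses the curl progress meter.
fetch_health() {
    curl -s --cacert "$EGO_TOP/wlp/usr/shared/resources/security/cacert.pem" \
        -u "$CLUSTERADMIN:$CLUSTERADMINPASS" --tlsv1.2 \
        -XGET "$es_protocol://$es_hostname:$es_port/_cluster/health?pretty"
}

# Succeed (exit status 0) only when the current percentage is greater than the previous one.
still_progressing() {
    awk -v prev="$1" -v cur="$2" 'BEGIN { exit !(cur + 0 > prev + 0) }'
}
```

For example, run fetch_health | health_summary twice, a few minutes apart, and pass the two percentages to still_progressing. A failing result over several intervals suggests that the percentage is stuck and that the shard-level checks in the next step are needed.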
  2. In a command line, run the following command to see all shards and then resolve any primary shards that are not in the STARTED state:
    curl --cacert $EGO_TOP/wlp/usr/shared/resources/security/cacert.pem -u $CLUSTERADMIN:$CLUSTERADMINPASS -XGET $es_protocol://$es_hostname:$es_port/_cat/shards --tlsv1.2
    • Before a shard can be used, it goes through the INITIALIZING state. If a shard cannot be assigned, the shard remains in the UNASSIGNED state with a reason code. For a list of reasons why a primary shard might not be started, see Reasons for unassigned shard.
    • If the primary shard and all replica shards for an index are in the UNASSIGNED state, check the Elasticsearch logs for more details. For more information about the default Elastic Stack log locations, see Elastic Stack troubleshooting.
    • If the primary shard and all replica shards for an index cannot be assigned due to hardware failure or related issues, delete the index. Deleting the index deletes its shards and the Elasticsearch data for that index.
    • Consider Scaling Elasticsearch replicas to provide redundant copies of data that protect against future hardware failures.
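
On a cluster with many indices, the _cat/shards output can be long, so it can help to filter it. The following is a minimal sketch under stated assumptions: the function names unstarted_primaries and fully_unassigned_indices are hypothetical, and the filters assume the default _cat/shards column order (index, shard, prirep, state, and so on), where the third column is p for a primary shard and r for a replica shard.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.
# Input is the default _cat/shards output on stdin:
#   index shard prirep state docs store ip node

# Print index, shard number, and state for every primary shard that is not STARTED.
unstarted_primaries() {
    awk '$3 == "p" && $4 != "STARTED" { print $1, $2, $4 }'
}

# Print each index that has at least one shard number for which every copy
# (the primary and all replicas) is UNASSIGNED.
fully_unassigned_indices() {
    awk '{ total[$1 FS $2]++; if ($4 == "UNASSIGNED") un[$1 FS $2]++ }
         END { for (k in total) if (un[k] == total[k]) { split(k, p, FS); print p[1] } }' |
        sort -u
}
```

Pipe the output of the documented curl command into either function. Indices that fully_unassigned_indices reports are the ones to look up in the Elasticsearch logs before you decide whether to delete them.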
  3. In a command line, run the following command to delete an index, where es_index specifies the Elasticsearch index to delete:
    curl --cacert $EGO_TOP/wlp/usr/shared/resources/security/cacert.pem -u $CLUSTERADMIN:$CLUSTERADMINPASS -XDELETE $es_protocol://$es_hostname:$es_port/$es_index --tlsv1.2
    Repeat the command for each index that you need to delete.
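
Because deleting an index is irreversible, repeating the command for several indices is safer with a preview step. The following is a minimal sketch, not a product feature: the function name delete_indices and the DRY_RUN variable are hypothetical. By default the function only prints the URLs that would receive the DELETE request, and it skips empty names, wildcard patterns, and _all so that a typing mistake cannot target more than one index.

```shell
#!/bin/sh
# Sketch only: delete_indices and DRY_RUN are hypothetical names.

# Delete each index that is named as an argument.
# With DRY_RUN=1 (the default in this sketch), print the target URLs instead of deleting.
delete_indices() {
    for es_index in "$@"; do
        # Refuse names that could match more than one index.
        case "$es_index" in
            ''|*\**|_all)
                printf 'skipping unsafe index name: "%s"\n' "$es_index" >&2
                continue
                ;;
        esac
        url="$es_protocol://$es_hostname:$es_port/$es_index"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'would delete %s\n' "$url"
        else
            # Same options as the documented delete command.
            curl --cacert "$EGO_TOP/wlp/usr/shared/resources/security/cacert.pem" \
                -u "$CLUSTERADMIN:$CLUSTERADMINPASS" --tlsv1.2 \
                -XDELETE "$url"
        fi
    done
}
```

Run delete_indices with the index names first to review the list, and then set DRY_RUN=0 and run it again only after you confirm that every listed index can be removed.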