IBM Support

Storage pods fail to come up after changing analytics from n3 to n1 profile

Troubleshooting


Problem

Analytics storage is no longer ‘Ready’ after switching from n3 to n1 deployment profile. This affects deployment profiles for both a standalone analytics cluster and an analytics cluster that has been deployed as part of an API Connect cluster (including Cloud Pak for Integration).

This problem is specific to versions 10.0.5.1 and 10.0.5.2, and has been fixed in 10.0.5.3.

Analytics storage pod logs show errors similar to:

[2022-11-15T09:28:57,263][WARN ][r.suppressed ] [production-a7s-storage-0] path: /_cluster/health, params: {}
org.opensearch.discovery.ClusterManagerNotDiscoveredException: null
 at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction$2.onTimeout(TransportClusterManagerNodeAction.java:305) [opensearch-2.3.0.jar:2.3.0]

Resolving The Problem

During normal operation with 3 nodes, one node is elected as the master, but all nodes are master-eligible. When the 3-node cluster first starts, a voting procedure decides which node becomes the master. Because of the voting configuration established with the initial 3 nodes, the cluster cannot tolerate losing more than half of its master-eligible nodes at once. When the profile is switched from n3.x to n1.x, two-thirds of the master-eligible nodes are lost, so the cluster becomes unavailable.
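
The quorum arithmetic behind this can be sketched as follows. This is illustrative only, not an API Connect command; it simply shows why a jump from 3 nodes to 1 cannot satisfy the existing voting configuration:

```shell
#!/bin/sh
# Illustrative arithmetic only: a master election needs a strict
# majority (quorum) of the nodes in the current voting configuration.
quorum() { echo $(( $1 / 2 + 1 )); }

echo "quorum with 3 master-eligible nodes: $(quorum 3)"  # 2
echo "quorum with 1 remaining node:        $(quorum 1)"  # 1
# Scaling 3 -> 1 removes two nodes at once, leaving a single node that
# cannot reach the quorum of 2 recorded in the old voting configuration,
# so no master can be elected and the cluster becomes unavailable.
```

Excluding nodes 2 and 1 from voting first (as described below) shrinks the voting configuration to a single node, whose quorum of 1 the remaining node can satisfy.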

If you have already switched profiles, first switch back to the n3 profile; all storage pods should then return to the ‘Ready’ state.

You can now use the API Connect REST API or toolkit CLI to access the analytics cluster management operations to prepare the cluster for the change in master-eligible nodes. The steps below cover using the CLI:

If you have not already done so, configure the toolkit CLI by referring to the Knowledge Center topic here: https://www.ibm.com/docs/en/api-connect/10.0.5.x_lts?topic=analytics-accessing-data-toolkit-cli

Exclude the 2nd and 3rd storage nodes from voting by running the following commands in the order given:

apic -m analytics clustermgmt:postVotingConfigExclusions --server <management server api endpoint> --analytics-service <analytics service name> --node_names <your analytics instance name>-a7s-storage-2 --format=json
apic -m analytics clustermgmt:postVotingConfigExclusions --server <management server api endpoint> --analytics-service <analytics service name> --node_names <your analytics instance name>-a7s-storage-1 --format=json

Storage pod/node 0 should now be your master node. If you monitor the storage pod logs, you might see something similar to:

[2022-11-15T15:42:17,318][INFO ][o.o.c.s.ClusterApplierService] [apic-min-a7s-storage-0] cluster-manager node changed
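
The two exclusion commands above can be wrapped in a small script. This is a sketch, not an official tool: the endpoint, service, and instance names are hypothetical placeholders, and the `echo` makes it a dry run; remove the `echo` to execute the commands for real.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own.
MGMT_SERVER="mgmt.example.com"
A7S_SERVICE="analytics"
INSTANCE="production"

ORDER=""
# Exclude storage nodes 2 and 1 from voting, in that order, so that
# storage-0 is the only master-eligible node left before scaling down.
for SUFFIX in 2 1; do
  echo apic -m analytics clustermgmt:postVotingConfigExclusions \
    --server "$MGMT_SERVER" \
    --analytics-service "$A7S_SERVICE" \
    --node_names "${INSTANCE}-a7s-storage-${SUFFIX}" \
    --format=json
  ORDER="$ORDER$SUFFIX"
done
```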

You can now make the switch to the n1 profile as documented in the Knowledge Center:

Kubernetes: https://www.ibm.com/docs/en/api-connect/10.0.5.x_lts?topic=configuration-changing-deployment-profiles-kubernetes

VMware: https://www.ibm.com/docs/en/api-connect/10.0.5.x_lts?topic=vmware-changing-deployment-profiles

OpenShift: https://www.ibm.com/docs/en/api-connect/10.0.5.x_lts?topic=configuration-changing-deployment-profiles-openshift

At this stage, your analytics deployment has been successfully scaled to an n1.x profile. To avoid problems when scaling again in the future, you must run the following additional CLI command to reset the voting exclusions:

apic -m analytics clustermgmt:deleteVotingConfigExclusions --server <management server api endpoint> --analytics-service <analytics service name> --format=json
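
The cleanup command can be staged the same way. Again a sketch with hypothetical placeholder values; the `echo` makes it a dry run:

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own.
MGMT_SERVER="mgmt.example.com"
A7S_SERVICE="analytics"

# Reset the voting exclusions so that future profile changes start
# from a clean voting configuration.
CMD="apic -m analytics clustermgmt:deleteVotingConfigExclusions --server $MGMT_SERVER --analytics-service $A7S_SERVICE --format=json"
echo "$CMD"
```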

Document Location

Worldwide


Document Information

Modified date:
19 April 2024

UID

ibm17148797