IBM Support

Application server fails to start with WSVR0009E and HMGR0002E/WWLM1014E seen prior

Troubleshooting


Problem

WebSphere processes such as Application Server, nodeagent, or dmgr may fail to start with the following messages:

[9/26/14 13:26:54:355 EDT] 0000000a CoordinatorCo E HMGR0002E: HA Manager services on this process were not started. This server is not a member of a core group.
[9/26/14 13:27:11:842 EDT] 0000000a ProcessRuntim W WWLM1014E: Server server1 on node myNode01 in cell myCell is not included in the distribution of work for the applications running in server server1. This is because the HAManager service is not available on server server1. Refer to other messages to discover the problems associated with the HAManager service.

Symptom

WebSphere Application Server, nodeagent, or dmgr fails to start with the following exception:

[9/26/14 13:27:29:185 EDT] 0000000a WsServerImpl E WSVR0009E: Error occurred during startup META-INF/ws-server-components.xml
[9/26/14 13:27:29:210 EDT] 0000000a WsServerImpl E WSVR0009E: Error occurred during startup com.ibm.ws.exception.RuntimeError: com.ibm.ws.exception.RuntimeError: Lookup of CoreStack service failed
at com.ibm.ws.runtime.WsServerImpl.bootServerContainer(WsServerImpl.java:199)
.
Caused by: com.ibm.ws.exception.RuntimeError: Lookup of CoreStack service failed
at com.ibm.ws.sib.admin.impl.HAManagerMessagingEngineImpl.join(HAManagerMessagingEngineImpl.java:1117)
at com.ibm.ws.sib.admin.impl.HAManagerMessagingEngineImpl.startConditional(HAManagerMessagingEngineImpl.java:875)


<<< The actual "Caused by" error and stack trace might vary, but the key is to look for HMGR0002E and WWLM1014E during startup. HMGR0021E might also be logged, which could be due to a corrupt file. >>>
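Because the stack trace can vary, it may help to search the startup log for the message IDs rather than for the exception text. The following is a minimal sketch, assuming the log lines follow the layout of the samples above (timestamp in brackets, thread ID, component, level, message ID). The function name find_key_messages and the regular expression are illustrative and not part of WebSphere.

```python
import re

# Message IDs named in this document.
KEY_IDS = ("HMGR0002E", "WWLM1014E", "WSVR0009E", "HMGR0021E")

# Assumed line layout, inferred from the sample log lines above:
# [timestamp] threadId component level MESSAGEID: text
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\S+\s+\S+\s+[A-Z]\s+(?P<id>[A-Z]{4}\d{4}[A-Z]):"
)

def find_key_messages(lines):
    """Return (timestamp, message_id) pairs for the key IDs, in log order."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("id") in KEY_IDS:
            hits.append((match.group("ts"), match.group("id")))
    return hits
```

To scan a log file, pass an open file object, for example find_key_messages(open(path_to_log)), where path_to_log is the location of the failing process's log on your system. A WSVR0009E hit preceded by HMGR0002E and WWLM1014E matches the pattern described in this document.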

Cause

The coregroup.xml document is a cell-scoped document. The master copy of this document is stored in the configuration repository for the deployment manager. A copy of this document is shadowed to every node in the cell.

HA manager believes that the failing member is not part of a coregroup because the process that is failing to start is missing its corresponding entry in the coregroup.xml file.

Diagnosing The Problem

Check the date and time when coregroup.xml was last modified, and try to determine which action or user led to the removal of the entry from the file. If the entry was removed by a valid configuration action rather than by user error, report the problem to IBM Support.
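One way to check the modification time is sketched below. It assumes you supply the paths to the coregroup.xml copies yourself (for example, the master copy and a node copy); the function name modification_times is hypothetical. The file system timestamp shows only when a file changed, not who changed it.

```python
import os
from datetime import datetime

def modification_times(paths):
    """Return (path, last-modified datetime) pairs, most recent first."""
    stamped = [(p, datetime.fromtimestamp(os.path.getmtime(p))) for p in paths]
    return sorted(stamped, key=lambda item: item[1], reverse=True)
```

The resulting timestamps can then be matched against change records or administrative logs for the same time window.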

Resolving The Problem

Entries in coregroup.xml for every process look similar to the following. In this example, the entry for the failing process, server1, is missing:

<coreGroupServers xmi:id="CoreGroupServer_1226538530515" nodeName="myNode01" serverName="nodeagent"/>
<coreGroupServers xmi:id="CoreGroupServer_1226538530517" nodeName="myNode01" serverName="server2"/>
<coreGroupServers xmi:id="CoreGroupServer_1226538530518" nodeName="myNode01" serverName="server3"/>
<coreGroupServers xmi:id="CoreGroupServer_1186086834484" nodeName="myCellManager01" serverName="dmgr"/>

The coregroup.xml file needs to have an entry for every process in the coregroup. A careful review will show that the entry is missing for the process that fails to start with the exceptions above. Look for a correct version of this file in a valid backup that was taken when the process still started successfully. In that older version, the process has its corresponding entry, so the backup is a good configuration that can be restored. The restoreConfig utility can be used for this task.
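The review described above can be sketched in a few lines. This example assumes the file is well-formed XML and that each member is a coreGroupServers element with nodeName and serverName attributes, as in the entries shown earlier. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

def core_group_members(root):
    """Return the set of (nodeName, serverName) pairs under an XML root element."""
    members = set()
    for elem in root.iter():
        # Drop any "{namespace}" prefix so the match works whether or not
        # the element name is namespace-qualified.
        if elem.tag.rsplit("}", 1)[-1] == "coreGroupServers":
            members.add((elem.get("nodeName"), elem.get("serverName")))
    return members

def has_entry(root, node_name, server_name):
    """True if the given process has a coreGroupServers entry."""
    return (node_name, server_name) in core_group_members(root)
```

For a file on disk, obtain the root with ET.parse(path).getroot(), where path points to your copy of coregroup.xml. With the example entries above, has_entry(root, "myNode01", "server1") would return False.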

For a coregroup with a deployment manager, a nodeagent, and three servers (server1, server2, and server3), with all valid entries, you could expect to see something similar to this:

<coreGroupServers xmi:id="CoreGroupServer_1226538530515" nodeName="myNode01" serverName="nodeagent"/>
<coreGroupServers xmi:id="CoreGroupServer_1226538530516" nodeName="myNode01" serverName="server1"/>
<coreGroupServers xmi:id="CoreGroupServer_1226538530517" nodeName="myNode01" serverName="server2"/>
<coreGroupServers xmi:id="CoreGroupServer_1226538530518" nodeName="myNode01" serverName="server3"/>
<coreGroupServers xmi:id="CoreGroupServer_1186086834484" nodeName="myCellManager01" serverName="dmgr"/>
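When a backup copy is available, comparing it with the current file shows exactly which entries were lost. The sketch below assumes both files are well-formed XML with coreGroupServers elements as shown above; the names members and missing_from_current are hypothetical.

```python
import xml.etree.ElementTree as ET

def members(source):
    """Return (nodeName, serverName) pairs from a coregroup.xml path or file object."""
    root = ET.parse(source).getroot()
    return {
        (e.get("nodeName"), e.get("serverName"))
        for e in root.iter()
        # Ignore any "{namespace}" prefix on the element name.
        if e.tag.rsplit("}", 1)[-1] == "coreGroupServers"
    }

def missing_from_current(backup, current):
    """Return entries present in the backup but absent from the current file."""
    return sorted(members(backup) - members(current))
```

For the two listings in this document, the result would be the single pair ("myNode01", "server1"). An empty result suggests the backup does not differ in membership and the cause lies elsewhere.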

If recent changes or other reasons prevent you from using restoreConfig, take a backup of the current coregroup.xml and then replace it, in the deployment manager configuration repository, with the valid copy from the backup. Restart the deployment manager so that it picks up the change. Then synchronize the changes with all nodes and restart the failing process, which will resolve the problem.
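The file replacement itself can be sketched as follows. This covers only the copy step, under the assumption that you pass the path of the current coregroup.xml in the deployment manager configuration repository and the path of the known-good copy; the function name and the ".bak" naming scheme are illustrative. Restarting the deployment manager, synchronizing the nodes, and restarting the failing process are still done separately.

```python
import shutil
import time
from pathlib import Path

def replace_with_known_good(current, known_good):
    """Save a timestamped copy of the current file, then overwrite it with the known-good copy."""
    current = Path(current)
    known_good = Path(known_good)
    # Keep the current file next to the original so the change can be reverted.
    saved = current.with_name(current.name + "." + time.strftime("%Y%m%d%H%M%S") + ".bak")
    shutil.copy2(current, saved)
    shutil.copy2(known_good, current)
    return saved
```

The function returns the path of the saved copy, which can be copied back over the current file if the replacement needs to be undone.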

Note: IBM recommends that users take regular backups of their good configuration as checkpoints. This will help in comparing those good backups with current configuration to determine changes and also help in restoring to a good checkpoint.

Product: WebSphere Application Server
Component: High Availability (HA)
Platform: AIX, HP-UX, Linux, Solaris, Windows
Version: 9.0; 8.5.5; 8.0; 7.0
Edition: Base; Network Deployment
Business Unit: Cloud & Data Platform
Line of Business: Automation

Document Information

Modified date:
15 June 2018

UID

swg21449799