IBM Support

Problem Determination: Node agent monitoring of the WebSphere Application Server

Troubleshooting


Problem

This document describes problem determination for the node agent's monitoring of a WebSphere® Application Server. Use it to resolve common issues with this component before you call IBM Support, which can save you time.

Resolving The Problem



The monitoring policy settings of an application server define how the node agent monitors it. The main properties involved are:
  • Ping Interval
    How often the node agent pings the application server process to ensure that it is still running and able to handle requests.

  • Ping Timeout
    How long the node agent waits for a response from the application server before it considers the server "unreachable".

  • Automatic Restart
    Whether the node agent restarts the application server if the application server process fails.

  • Node Restart State
    Whether the node agent starts the application server when the node agent itself starts.

  • Maximum Startup Attempts
    How many times the node agent tries to launch the application server when it cannot confirm a successful startup.


High-level component flow

Node agent monitoring of the application server begins when the node agent starts. The "Node Restart State" property defines whether the node agent spawns the application server process during its own startup. If this property is set to RUNNING, the node agent attempts to start the application server:


ADMN1001I: An attempt is made to launch server1 on node WAS_60_Node01.
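The startup decision described above can be modeled as a small function. This is a hypothetical sketch for reasoning about the property, not WebSphere code: NodeRestartState and should_launch_at_startup are illustrative names, and the PREVIOUS value is included as an assumption because this document only shows RUNNING and STOPPED.

```python
from enum import Enum


class NodeRestartState(Enum):
    """Hypothetical model of the Node Restart State property values."""
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    # Assumption: PREVIOUS restores the state the server had when the
    # node agent last stopped; this value does not appear in this document.
    PREVIOUS = "PREVIOUS"


def should_launch_at_startup(state: NodeRestartState,
                             was_running_before: bool = False) -> bool:
    """Decide whether the node agent spawns the server during its own startup."""
    if state is NodeRestartState.RUNNING:
        return True  # corresponds to the ADMN1001I launch attempt
    if state is NodeRestartState.PREVIOUS:
        return was_running_before
    return False  # STOPPED: the server is left for a manual startServer
```

For example, should_launch_at_startup(NodeRestartState("RUNNING")) returns True, matching the launch attempt logged above.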


When the node agent spawns the application server process, the application server process is considered a child of the node agent, which is the parent process. When the node agent does not start the application server (for example, when the server is started with the command startServer server1), the node agent "adopts" the application server as its child. The node agent then waits up to the number of seconds in the "Ping Timeout" property for the application server process to report that its startup was successful. This report comes in the form of a discovery message:


ADMD0023I: The system discovered process (name: server1, type: ManagedProcess, pid: 2556)
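When reviewing node agent logs, it can help to extract the discovered process name, type, and PID from ADMD0023I messages. The following sketch assumes the exact message wording shown above; parse_discovery is a hypothetical helper, and other product versions may phrase the message differently.

```python
import re
from typing import Dict, Optional

# Pattern built from the single ADMD0023I example in this document.
DISCOVERY_RE = re.compile(
    r"ADMD0023I: The system discovered process "
    r"\(name: (?P<name>[^,]+), type: (?P<type>[^,]+), pid: (?P<pid>\d+)\)"
)


def parse_discovery(line: str) -> Optional[Dict[str, object]]:
    """Return the name, type, and pid from an ADMD0023I line, or None."""
    m = DISCOVERY_RE.search(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "type": m.group("type"),
        "pid": int(m.group("pid")),
    }
```

Comparing the pid returned here with the pid in later PidWaiter lines confirms that the node agent is monitoring the process it discovered.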


If the node agent does not discover the application server process within the time defined in the "Ping Timeout" property, it assumes that the application server startup failed. If the "Automatic Restart" property is set to true, the node agent then spawns a new application server process. It repeats this procedure up to the number of times specified in the "Maximum startup attempts" property.
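The retry behavior described above can be sketched as a loop. This is an illustrative model, not the node agent's implementation: launch and discovered_within are hypothetical callbacks that stand in for spawning the process and waiting for its discovery message, and the sketch assumes that the "Maximum startup attempts" limit counts every launch, including the first.

```python
from typing import Callable, Tuple


def start_with_retries(launch: Callable[[], int],
                       discovered_within: Callable[[int, float], bool],
                       ping_timeout: float,
                       max_startup_attempts: int,
                       auto_restart: bool) -> Tuple[bool, int]:
    """Return (started, launches) following the startup procedure above."""
    launches = 0
    while launches < max_startup_attempts:
        launches += 1
        pid = launch()  # spawn a new application server process
        # Wait up to ping_timeout seconds for the discovery message.
        if discovered_within(pid, ping_timeout):
            return True, launches
        # Startup is assumed to have failed; relaunch only if allowed.
        if not auto_restart:
            break
    return False, launches
```

Passing the waits in as callbacks keeps the sketch testable without real processes.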

Once the node agent discovers the application server process, it updates its routing table and begins to monitor the child process. At intervals defined by the "Ping Interval" property, the node agent checks that the application server process is still up and able to respond.
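The periodic check can be sketched in the same way. Here ping is a hypothetical callback that stands in for the JMX call and returns False if no response arrives within the ping timeout, on_unreachable stands in for the restart path, and max_pings only bounds the loop so the example terminates.

```python
import time
from typing import Callable


def monitor(ping: Callable[[float], bool],
            on_unreachable: Callable[[], None],
            ping_interval: float,
            ping_timeout: float,
            max_pings: int,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Ping the child every ping_interval seconds; return the failed ping count.

    max_pings bounds the loop for illustration only; the node agent keeps
    monitoring for as long as the child process is in its routing table.
    """
    failures = 0
    for _ in range(max_pings):
        sleep(ping_interval)
        if not ping(ping_timeout):
            failures += 1
            on_unreachable()  # e.g. restart when Automatic Restart is true
    return failures
```

Injecting sleep lets the loop run instantly in a test while still recording the intervals it would have waited.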

JMX communication between the node agent and the application server uses either SOAP or RMI, as determined by each application server's "Preferred Connector" configuration property. If the preferred connector is SOAP (the default), communication goes through the server's SOAP_CONNECTOR_ADDRESS port (default 8880). If the preferred connector is RMI, communication goes through the server's BOOTSTRAP_ADDRESS port (default 2809).
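The connector choice reduces to a lookup from the preferred connector to an endpoint name and its default port. This sketch hard-codes only the defaults quoted above; the port actually assigned to each server can differ, so treat the numbers as defaults rather than the ports in use. connector_endpoint is a hypothetical helper.

```python
from typing import Dict, Tuple

# Defaults quoted in this document; assigned ports may differ per server.
DEFAULT_ENDPOINTS: Dict[str, Tuple[str, int]] = {
    "SOAP": ("SOAP_CONNECTOR_ADDRESS", 8880),
    "RMI": ("BOOTSTRAP_ADDRESS", 2809),
}


def connector_endpoint(preferred_connector: str = "SOAP") -> Tuple[str, int]:
    """Map a Preferred Connector value to (endpoint name, default port)."""
    key = preferred_connector.upper()
    if key not in DEFAULT_ENDPOINTS:
        raise ValueError(f"unsupported connector: {preferred_connector}")
    return DEFAULT_ENDPOINTS[key]
```

When pings fail, this is the endpoint to verify first: a firewall or port conflict on it blocks the node agent's JMX calls.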



Configuration files

The monitoring policy configuration settings are stored in each application server's server.xml file. For example, the path for server1 is:


install_root/config/cells/MyCell/nodes/MyNode/servers/server1/server.xml

The monitoring policy configuration settings appear as follows:


<monitoringPolicy xmi:id="MonitoringPolicy_1141403412562" maximumStartupAttempts="3" pingInterval="60" pingTimeout="180" autoRestart="true" nodeRestartState="STOPPED"/>
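To check the configured values outside the administrative console, the monitoringPolicy element can be read with Python's standard xml.etree.ElementTree module. This is a sketch under two assumptions: the element is unqualified (no namespace prefix), as in the fragment above, and all five attributes are present (a missing one raises KeyError). The fragment on its own uses the xmi prefix without declaring it, so the sample below adds an xmlns:xmi declaration; in a complete server.xml that declaration normally sits on the root element. MonitoringPolicy and parse_monitoring_policy are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# The fragment above, with the xmi namespace declared so it parses on its own.
SAMPLE = ('<monitoringPolicy xmlns:xmi="http://www.omg.org/XMI" '
          'xmi:id="MonitoringPolicy_1141403412562" maximumStartupAttempts="3" '
          'pingInterval="60" pingTimeout="180" autoRestart="true" '
          'nodeRestartState="STOPPED"/>')


@dataclass
class MonitoringPolicy:
    maximum_startup_attempts: int
    ping_interval: int  # seconds
    ping_timeout: int   # seconds
    auto_restart: bool
    node_restart_state: str


def parse_monitoring_policy(xml_text: str) -> MonitoringPolicy:
    """Read the first monitoringPolicy element from a fragment or document."""
    root = ET.fromstring(xml_text)
    # Accept either the bare element or a document that contains it.
    if root.tag == "monitoringPolicy":
        elem = root
    else:
        elem = root.find(".//monitoringPolicy")
    if elem is None:
        raise ValueError("no monitoringPolicy element found")
    a = elem.attrib
    return MonitoringPolicy(
        maximum_startup_attempts=int(a["maximumStartupAttempts"]),
        ping_interval=int(a["pingInterval"]),
        ping_timeout=int(a["pingTimeout"]),
        auto_restart=a["autoRestart"].lower() == "true",
        node_restart_state=a["nodeRestartState"],
    )
```

Applied to server1's server.xml, the numeric values returned should match the PingInterval, PingTimeout, and MaximumStartupAttempts lines that the node agent prints in its startup trace.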


Example Startup Trace:

In the traces below, notes about the output are marked with <--.

Nodeagent Trace: The nodeagent reads the monitoring policy settings for the application server.

[3/17/06 13:06:02:109 EST] 0000000a NodeAgent     3   process monitor policy:
                                server1
                                PingInterval: 60
                                PingTimeout: 180
                                MaximumStartupAttempts: 3
                                NodeRestartState: 1
                                PreviousState: -1
                                AutoRestart: true

[3/17/06 13:06:02:109 EST] 0000000a NodeAgent     3   pName = server1 currentPid = null
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent     3   isRestartingAllServers = false
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent     3   restartState = RUNNING
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent     3   launching server1
<-- The nodeagent launches the server1 process since restartState = RUNNING
[3/17/06 13:06:02:109 EST] 0000000a NodeAgent     >  launchProcess Entry
                                 server1


Nodeagent Trace: The application server is launched with process ID 2556.

[3/17/06 13:06:05:547 EST] 0000000a NodeAgent     3   Launched process pid: 2556
[3/17/06 13:06:05:547 EST] 0000000a NodeAgent     1   addLaunchedChild
                                 serverName=server1
                                 pid=2556
[3/17/06 13:06:05:547 EST] 0000000a NodeAgent     >  saveNodeState Entry
[3/17/06 13:06:05:547 EST] 0000000a NodeAgent     3   restartServers = false monitorFile = C:\WebSphere60/profiles/AppSrv01/logs/nodeagent/monitor.state
[3/17/06 13:06:05:562 EST] 0000000a NodeAgent     3   launchedChildren 2556
[3/17/06 13:06:05:594 EST] 0000000a NodeAgent     <  saveNodeState Exit
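When reading a long startup trace, the "launching" and "Launched process pid" lines can be paired to list which servers the node agent spawned and with which PIDs. This sketch relies on the trace line layout shown above (including the NodeAgent component name and trace level 3); launched_servers is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

# Patterns based on the NodeAgent trace lines shown above.
LAUNCHING_RE = re.compile(r"NodeAgent\s+3\s+launching (\S+)")
LAUNCHED_RE = re.compile(r"Launched process pid: (\d+)")


def launched_servers(trace: str) -> List[Tuple[str, int]]:
    """Pair each 'launching <server>' line with the next launched pid."""
    results: List[Tuple[str, int]] = []
    pending: Optional[str] = None
    for line in trace.splitlines():
        m = LAUNCHING_RE.search(line)
        if m:
            pending = m.group(1)
            continue
        m = LAUNCHED_RE.search(line)
        if m and pending is not None:
            results.append((pending, int(m.group(1))))
            pending = None
    return results
```

A "launching" line with no matching PID afterwards points to a launch that failed before the process was created.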



Example Monitoring Trace:

Nodeagent Trace: The nodeagent pings the application server with JMX queryNames and getAttribute calls to check whether it is up and running.

[3/17/06 13:07:35:391 EST] 00000059 PidWaiter     >  contact Entry
                                 2556
<-- PidWaiter contact Entry starts the monitoring process. Look for the final result when PidWaiter contact returns.
[3/17/06 13:07:35:391 EST] 00000059 SecurityHelpe 3   Getting server subject.
[3/17/06 13:07:35:391 EST] 00000059 AdminServiceI >  queryNames Entry
                                 WebSphere:type=Server,process=server1,*
...
[3/17/06 13:07:35:859 EST] 00000059 AdminServiceI >  getAttribute Entry
                                 WebSphere:name=server1,process=server1,platform=proxy,node=WAS_60_Node01,j2eeType=J2EEServer,version=6.0.2.7,type=Server,mbeanIdentifier=cells/WAS_60_Cell01/nodes/WAS_60_Node01/servers/server1/server.xml#Server_1141403412266,cell=WAS_60_Cell01,processType=ManagedProcess
<-- Sends a JMX call to the application server to check if server1 is still running

AppServer Trace: The application server receives the nodeagent's ping and responds if it is running.
[3/17/06 13:07:35:922 EST] 0000002b AdminServiceI >  getAttribute Entry
                                 WebSphere:name=server1,process=server1,platform=proxy,node=WAS_60_Node01,j2eeType=J2EEServer,version=6.0.2.7,type=Server,mbeanIdentifier=cells/WAS_60_Cell01/nodes/WAS_60_Node01/servers/server1/server.xml#Server_1141403412266,cell=WAS_60_Cell01,processType=ManagedProcess
<-- The application server receives the call
...
[3/17/06 13:07:35:922 EST] 0000002b AdminServiceI <  getAttribute Exit
                                 STARTED
[3/17/06 13:07:35:922 EST] 0000002b AdminServiceD <  getAttribute Exit
[3/17/06 13:07:35:922 EST] 0000002b SOAPConnector 3   return object type = class java.lang.String; value = STARTED
<-- The application server sends back the status


Nodeagent Trace: The nodeagent receives server1's response.
[3/17/06 13:07:35:938 EST] 00000059 AdminServiceI <  getAttribute Exit
                                 STARTED
[3/17/06 13:07:35:938 EST] 00000059 PidWaiter     <  contact Exit
                                 true
<-- The PidWaiter contact method returns true since the server is started.
[3/17/06 13:07:35:938 EST] 00000059 PidWaiter     3   Pid 2556: For server1, bContact = true isProcessStopping = false alarmSyncObject = 0
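Each PidWaiter contact cycle ends with a summary line that includes the bContact result, which makes failed pings easy to find. The sketch below assumes that a failed ping is logged with bContact = false in the same layout; this document only shows a successful ping, so that format is an assumption. PingResult, ping_results, and failed_pings are hypothetical names.

```python
import re
from typing import List, NamedTuple


class PingResult(NamedTuple):
    pid: int
    server: str
    contacted: bool


# Matches the PidWaiter summary line shown above.
CONTACT_RE = re.compile(
    r"PidWaiter\s+3\s+Pid (\d+): For ([^,\s]+), bContact = (true|false)"
)


def ping_results(trace: str) -> List[PingResult]:
    """Return one PingResult per PidWaiter summary line in the trace."""
    return [
        PingResult(int(m.group(1)), m.group(2), m.group(3) == "true")
        for m in CONTACT_RE.finditer(trace)
    ]


def failed_pings(trace: str) -> List[PingResult]:
    """Keep only the pings where the node agent could not contact the server."""
    return [r for r in ping_results(trace) if not r.contacted]
```

Successive PingResult entries for the same server should be spaced roughly one Ping Interval apart; larger gaps can indicate a delayed monitoring thread.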


Example trace showing that the server1 process was killed and restarted:

Nodeagent Trace: The nodeagent detects that server1 has been killed and restarts it.
[3/17/06 13:08:18:094 EST] 00000072 RoutingTable  >  RemoveChildThread.run Entry
[3/17/06 13:08:18:094 EST] 00000072 RoutingTable  3   RoutingListner.parentRemoved: com.ibm.ws.management.event.ProcessListener
[3/17/06 13:08:18:094 EST] 00000072 ProcessListen >  childRemoved Entry
                                 {cell=WAS_60_Cell01, version=6.0.2.7, pid=2556, name=server1, node=WAS_60_Node01, role=ManagedProcess}
...
[3/17/06 13:08:22:359 EST] 00000072 NodeAgentStat <  childRemoved Exit
...
[3/17/06 13:08:22:344 EST] 00000059 PidWaiter     3   Pid 2556: Process is being relaunched
[3/17/06 13:08:22:344 EST] 00000059 PidWaiter     >  reLaunchProcess Entry
                                 2556
[3/17/06 13:08:22:344 EST] 00000059 PidWaiter     A   ADML0064I: Restarting an unreachable server "server1".
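To find every automatic restart in a node agent log, search for the ADML0064I message. This sketch assumes the message wording and the bracketed timestamp format shown above; unreachable_restarts is a hypothetical helper.

```python
import re
from typing import List, Tuple

RESTART_RE = re.compile(r'ADML0064I: Restarting an unreachable server "([^"]+)"')
TIMESTAMP_RE = re.compile(r"^\[([^\]]+)\]")


def unreachable_restarts(log: str) -> List[Tuple[str, str]]:
    """Return (timestamp, server) for every ADML0064I message in the log."""
    events: List[Tuple[str, str]] = []
    for line in log.splitlines():
        m = RESTART_RE.search(line)
        if m:
            ts = TIMESTAMP_RE.match(line)
            events.append((ts.group(1) if ts else "", m.group(1)))
    return events
```

Frequent restarts of the same server suggest checking whether the Ping Timeout is too short for that server under load, or whether the process itself is failing.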


Document Information

Modified date:
15 June 2018

UID

swg21273613