IBM Support

Incorrect "Server start failed" message when using a Java agent

Troubleshooting


Problem

When a Java agent is used (for example, WILY introspector), starting a Liberty Profile application server can display a "Server start failed" message even though the server started successfully.

Cause


The problem occurs when agent startup delays the server startup sequence by three or more seconds, pushing the startup steps outside their usual time window.

IBM WebSphere Application Server v8.5.0.0, v8.5.0.1, and v8.5.0.2 (but not v8.5.5.0 or later) start the server with two process launches. The first process is the actual server. The second process verifies that the server started successfully. The first process places a file lock as one of its earliest steps, and the second process tests that file lock to verify that the server started. The second process retries for up to three seconds before concluding that the server launch failed.

Usually, retrying for three seconds is more than sufficient because the server lock is placed almost immediately. When agent startup takes longer than three seconds, the lock is not yet in place when the retries end, so the second process reports a failure even though the server goes on to start normally.
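The timing window described above can be illustrated with a minimal sketch. This is not the Liberty implementation: the function name `wait_for_lock` is hypothetical, the one-second polling interval is an assumption, and a plain file-existence test stands in for the real file-lock test.

```shell
#!/bin/sh
# Illustrative sketch only; not taken from the Liberty launch code.
# wait_for_lock FILE TIMEOUT_SECONDS
# Polls once per second (assumed interval) until FILE exists or the
# timeout elapses. File existence stands in for the real lock test.
wait_for_lock() {
    lock_file=$1
    timeout=$2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -e "$lock_file" ]; then
            return 0        # lock seen inside the window
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Final check at the end of the window; a lock that appears
    # later than this is reported as a failed start.
    [ -e "$lock_file" ]
}
```

With a three-second timeout, a lock that appears after four seconds is missed, which corresponds to the spurious failure message.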

Environment

This problem can occur in all environments, but only when adding an agent to the server process.

Diagnosing The Problem

The problem is indicated by a "start failed" message from the server script when the server logs show no failure and the server status reports that the server is running.

The failure message will include the server name:


    server start defaultServer

    Server defaultServer start failed. Check server logs for details.


The server status is obtained using the server script. For example, if the server is running:

    server status defaultServer

    Server defaultServer is running.


Or, if the server is not running:

    server status defaultServer

    Server defaultServer is not running.

Resolving The Problem

Two workarounds are available. First, the error message can be ignored and the server startup verified with an explicit call to the server script to obtain the server status.
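The first workaround could be scripted along the following lines. This is a sketch under stated assumptions: the function names, the `SERVER_SCRIPT` variable, and the default of 30 status checks are hypothetical, and the check relies only on the "is running" status text shown in this note.

```shell
#!/bin/sh
# Sketch only. SERVER_SCRIPT and the polling limit are assumptions.
SERVER_SCRIPT="${SERVER_SCRIPT:-wlp/bin/server}"

# is_running SERVER_NAME
# Succeeds when the status output contains the "is running" text.
is_running() {
    "$SERVER_SCRIPT" status "$1" 2>/dev/null |
        grep -F "Server $1 is running" >/dev/null
}

# start_and_verify SERVER_NAME [MAX_CHECKS]
# Starts the server, deliberately ignores the start result, then
# checks the status once per second, up to MAX_CHECKS times.
start_and_verify() {
    name=$1
    max_checks=${2:-30}
    "$SERVER_SCRIPT" start "$name" || true
    checks=0
    while [ "$checks" -lt "$max_checks" ]; do
        if is_running "$name"; then
            echo "Server $name is running."
            return 0
        fi
        checks=$((checks + 1))
        sleep 1
    done
    echo "Server $name is not running." >&2
    return 1
}
```

A call such as `start_and_verify defaultServer` then reports the status rather than the possibly misleading start message.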

Second, to completely avoid the error message, the server launch script may be edited to add a timing delay between the server startup process launch and the server verification process launch.



For launches on platforms other than Windows and OS/400, the update is made to "wlp/bin/server". The "if" command shown below is on or near line 650. The update adds a diagnostic statement plus a sleep statement. (OS/400 launches use a different section of the launch script, and Windows uses a completely different launch script. Similar modifications can be made for OS/400 and Windows, but are not presented in this note.)

The numeric value is the number of seconds to wait before obtaining the server startup status. In this example, a value of "8" is used. Depending on the actual time added by the agent, a different value may be necessary.

    if $JAVA_CMD_BACKGROUND; then
      # Verify/wait for the process to start

      # Added: pause before checking the server startup status
      safeEcho "Wait 8 seconds to allow added processing to complete"
      sleep 8

      clientCmd start "${SERVER_NAME}" --pid="${PID}" --status:start "$@"
      rc=$?

      if [ $rc = 0 ]; then
        safeEcho "${PID}" > "${PID_FILE}"
        rm "${CLIENT_CMD_LOG}"
      fi
    fi
    fi    # closes an enclosing "if" that begins above this excerpt
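Because the right delay depends on the agent, one option is to read the value from the environment rather than hard-coding it. The following is a minimal sketch: the function `resolve_delay` and the variable `WLP_START_DELAY` are hypothetical names, not something the Liberty launch script defines.

```shell
#!/bin/sh
# Sketch only; resolve_delay and WLP_START_DELAY are hypothetical.
# resolve_delay VALUE [DEFAULT]
# Prints VALUE when it is a non-negative whole number, otherwise
# prints DEFAULT (8 when no default is given).
resolve_delay() {
    case "${1:-}" in
        ''|*[!0-9]*) echo "${2:-8}" ;;
        *) echo "$1" ;;
    esac
}

# In the launch script, the fixed "sleep 8" could then become:
#   START_DELAY=$(resolve_delay "${WLP_START_DELAY:-}")
#   safeEcho "Wait ${START_DELAY} seconds to allow added processing to complete"
#   sleep "${START_DELAY}"
```

Falling back to a default keeps the script usable when the variable is unset or contains something other than digits.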


Document Information

Modified date:
15 June 2018

UID

swg21636791