Fixes are available
APAR status
Closed as program error.
Error description
In a server farm running MobileFirst, a heartbeat runs between the MobileFirst Runtime and the Administration Services. The heartbeat lets the Administration Services know whether the runtime is still alive and running. It is implemented through a JMX call. If the server is very busy, this JMX call can time out. In that case, the runtime is immediately put into "require synchronization" mode, in which all other requests to the runtime are answered with a 503 "Denial of service" response. The runtime cannot leave this mode because no code triggers a re-synchronization, so it stays in this mode until the server is restarted.
Local fix
Problem summary
USERS AFFECTED:
Users of MobileFirst Server

PROBLEM DESCRIPTION:
In a server farm, after some time and possibly some network instability or high load, the MobileFirst Server starts responding with the HTTP code 503 "Denial of service". Even when the network is back or the load has decreased, the server remains in this mode and the user must restart it. The expected behavior is that the MobileFirst Server is more tolerant of network and load fluctuations and does not enter the 503 "Denial of service" mode on the first temporary failure, but only when there are unrecoverable failures over a longer time.

The problem is related to the heartbeat mechanism between Administration Services and MobileFirst Runtime. If the heartbeat fails the first time due to a timeout, the server enters the 503 "Denial of service" mode immediately. Instead, it should retry and enter the mode only when multiple heartbeats fail over a longer time.

Only server farm topologies are affected. WebSphere Network Deployment and standalone topologies are not affected by this problem.

RECOMMENDATION:
-
Problem conclusion
The problem was solved by changing the code so that the server enters the 503 "Denial of service" mode only if many heartbeats fail over a longer time. At a minimum, the worklight-jee-library.jar file must be reinstalled to apply the fix.
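The behavior described above can be illustrated with a minimal Java sketch. This is not the MobileFirst implementation: the class name HeartbeatTracker, its methods, and both thresholds are hypothetical and chosen only for illustration. The sketch also assumes that a later successful heartbeat clears the failure state, which this APAR does not state explicitly.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical illustration only; names and thresholds are assumptions,
 * not taken from worklight-jee-library.jar.
 */
final class HeartbeatTracker {
    private final int maxConsecutiveFailures;   // assumed threshold
    private final Duration minFailureWindow;    // assumed minimum failing period
    private int consecutiveFailures = 0;
    private Instant firstFailure = null;
    private boolean unavailable = false;

    HeartbeatTracker(int maxConsecutiveFailures, Duration minFailureWindow) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.minFailureWindow = minFailureWindow;
    }

    /** Records a failed heartbeat (for example, a timed-out call) seen at 'now'. */
    synchronized void recordFailure(Instant now) {
        if (consecutiveFailures == 0) {
            firstFailure = now;  // start of the current failure streak
        }
        consecutiveFailures++;
        Duration failingFor = Duration.between(firstFailure, now);
        // Trip only when failures are both numerous and spread over enough time.
        if (consecutiveFailures >= maxConsecutiveFailures
                && failingFor.compareTo(minFailureWindow) >= 0) {
            unavailable = true;
        }
    }

    /** Records a successful heartbeat; assumed here to clear the failure state. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        firstFailure = null;
        unavailable = false;
    }

    /** True while requests would be answered with 503 in this sketch. */
    synchronized boolean isUnavailable() {
        return unavailable;
    }
}
```

With a tracker created as new HeartbeatTracker(3, Duration.ofSeconds(30)), a single timeout, or three timeouts within 20 seconds, leaves isUnavailable() false. The flag is set only once at least three consecutive failures span 30 seconds or more.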
Temporary fix
Comments
APAR Information
APAR number
PI49225
Reported component name
MFPF/WORKLIGHT
Reported component ID
5725I4301
Reported release
700
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2015-09-22
Closed date
2015-09-23
Last modified date
2015-09-23
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
MFPF/WORKLIGHT
Fixed component ID
5725I4301
Applicable component levels
R700 PSY
UP
R710 PSY
UP
Document Information
Modified date:
17 October 2021