PH39030: WEBSPHERE BATCH JOB DISPATCH CAN TIMEOUT UNDER LOAD

APAR status

Closed as program error.

Error description

When Java batch queues the dispatch request, under load the
expected response may not be sent back in a timely manner, causing
the dispatch request to time out and be dispatched again even
though it was actually dispatched successfully the first time.
The re-dispatch may result in exceptions like those below as a
result of the log directory already existing:

1) CWLRB3860W: "04/15/21 22:45:31:135 EDT" Job "<JobId>" ended
abnormally "and is restartable".

2) java.lang.Exception: Job log part already exists
	at com.ibm.ws.gridcontainer.services.impl.JobLogManagerImpl$JobLogWriter._openCurrentLogPartFile(JobLogManagerImpl.java:1289)

The reporting customer also saw exceptions raised by their batch
job application when a dataset the job needed had already been
removed or modified because the job had already run or was
already running:
com.ibm.batch.api.BatchContainerApplicationException:
CWLRB2240E: "Grid Execution Environment step setup open Batch
Data Stream failed" "jobid <JobId>": <AppExceptionClass>:
<AppExceptionMessage>

Local fix

```
N/A
```

Problem summary

****************************************************************
* USERS AFFECTED:  All users of IBM WebSphere Application      *
*                  Server Java Batch                           *
****************************************************************
* PROBLEM DESCRIPTION: WebSphere Java Batch job dispatch       *
*                      requests could be dispatched twice      *
*                      under load                              *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
When the Java batch Job Scheduler sends a job dispatch request
via HTTP to the batch endpoint servlet, the typical behavior is
that the request is processed by the Channel Framework and queued
to WLM, and a response is returned to the Job Scheduler dispatch
client relatively quickly so it can continue on to the next job
to dispatch.
Rarely, under heavy load, the expected response may not be sent
back within 30 seconds, causing the dispatch request to time out
with an error like the following:
Retry attempt#0 failed: caught exception during http POST:
java.net.SocketTimeoutException: Read timed out
The original dispatch does eventually complete queuing by the
Channel Framework, but because of the timeout the Job Scheduler
retries the dispatch of the same job.
The re-dispatch may result in an exception due to the job log
directory already existing if the dispatch ended up on the same
endpoint, and a CWLRB5815E message:  Job nnn cannot be dispatched
when it is in [executing or submitted] state
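
To check whether a system is showing this pattern, the messages
quoted in this APAR can be searched for in the scheduler and
endpoint logs. The following is a minimal sketch, not an IBM
tool: the function names, the signature labels, and the rule used
to flag a likely double dispatch are assumptions made for
illustration. Only the message IDs and exception texts come from
this APAR.

```python
import re
from collections import Counter

# Signatures taken from the messages quoted in this APAR.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "dispatch_read_timeout": re.compile(r"SocketTimeoutException: Read timed out"),
    "job_log_part_exists": re.compile(r"Job log part already exists"),
    "job_ended_abnormally": re.compile(r"CWLRB3860W"),
    "bds_open_failed": re.compile(r"CWLRB2240E"),
    "redispatch_rejected": re.compile(r"CWLRB5815E"),
}


def count_symptoms(lines):
    """Count how many log lines match each symptom signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_double_dispatch(counts):
    """Heuristic (an assumption, not an IBM rule): a dispatch read
    timeout seen together with at least one re-dispatch symptom."""
    redispatch_symptoms = ("job_log_part_exists", "redispatch_rejected")
    return counts["dispatch_read_timeout"] > 0 and any(
        counts[name] > 0 for name in redispatch_symptoms
    )
```

The function accepts any iterable of strings, so it can be fed an
open log file directly, for example
count_symptoms(open(path, errors="replace")) with path pointing
at a log of your choosing.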

Problem conclusion

A code update has been made to add a new Job Scheduler custom
property job.dispatch.wait.timeout with a default value of 60000
milliseconds.  This property value can be increased to wait
longer before timing out the dispatch request.
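
The effect of the wait timeout can be illustrated with a small
model. This is a simplified sketch and not the product's code:
the function name, the max_attempts parameter, and the assumption
that every attempt sees the same response delay are hypothetical.
It also ignores the case where the endpoint rejects the second
dispatch with CWLRB5815E.

```python
def simulate_dispatch(response_delay_ms, wait_timeout_ms, max_attempts=2):
    """Return how many times the endpoint queues the same job.

    Model: every POST that reaches the endpoint is eventually
    queued, even if the scheduler has stopped waiting for the
    response. The scheduler retries only when the response takes
    longer than wait_timeout_ms. max_attempts is a hypothetical
    retry limit chosen for this sketch.
    """
    queued = 0
    for _ in range(max_attempts):
        queued += 1  # the endpoint queues the request regardless of the client
        if response_delay_ms <= wait_timeout_ms:
            break  # the response arrived in time, so there is no retry
    return queued


# A 45-second response with a 30-second wait leads to a second dispatch.
print(simulate_dispatch(45000, 30000))
# The same response with a 60000 ms wait is dispatched once.
print(simulate_dispatch(45000, 60000))
```

In this model a longer wait removes the duplicate only when the
delayed response arrives within the configured timeout, so a
value should be chosen above the slowest dispatch response
observed under load.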

A documentation update has been made to add this new property to
the Job scheduler custom properties documentation.

The fix for this APAR is targeted for inclusion in fix packs
8.5.5.22 and 9.0.5.12. For more information, see 'Recommended
Updates for WebSphere Application Server':
https://www.ibm.com/support/pages/node/715553

Temporary fix

Comments

APAR Information

APAR number
PH39030
Reported component name
WEBSPHERE FOR Z
Reported component ID
5655I3500
Reported release
850
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-07-15
Closed date
2022-03-18
Last modified date
2022-03-18

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
WEBSPHERE FOR Z
Fixed component ID
5655I3500

Applicable component levels

[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SS7K4U","label":"WebSphere Application Server for z\/OS"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"850"}]

Document Information

Modified date:
19 March 2022
