IBM Support

PH39030: WEBSPHERE BATCH JOB DISPATCH CAN TIMEOUT UNDER LOAD

APAR status

  • Closed as program error.

Error description

  • When Java batch queues the dispatch request, under load the
    expected response may not be sent back in a timely manner, causing
    the dispatch request to time out and be dispatched again even
    though it was actually dispatched successfully the first time.
    The re-dispatch may result in exceptions like those below as a
    result of the log directory already existing:
    
    1) CWLRB3860W: "04/15/21 22:45:31:135 EDT" Job "<JobId>" ended
    abnormally "and is restartable".
    
    2) java.lang.Exception: Job log part already exists
    	at com.ibm.ws.gridcontainer.services.impl.JobLogManagerImpl$Jo
    bLogWriter._openCurrentLogPartFile(JobLogManagerImpl.java:1289)
    
    The reporting customer also saw exceptions raised by their batch
    job application when a dataset the job needed was already
    gone/modified because the job had already run/was already
    running:
    com.ibm.batch.api.BatchContainerApplicationException:
    CWLRB2240E: "Grid Execution Environment step setup open Batch
    Data Stream failed" "jobid <JobId>": <AppExceptionClass>:
    <AppExceptionMessage>
    

Local fix

  • N/A
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  All users of IBM WebSphere Application      *
    *                  Server                                      *
    *                  Java Batch                                  *
    ****************************************************************
    * PROBLEM DESCRIPTION: WebSphere Java Batch job dispatch       *
    *                      requests could be dispatched twice      *
    *                      under load                              *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    When the Java batch Job Scheduler sends a job dispatch request via
    HTTP to the batch endpoint servlet, the typical behavior is that
    the request is processed by the Channel Framework and queued to
    WLM, and a response is returned to the Job Scheduler dispatch
    client relatively quickly so it can continue on to the next job
    to dispatch.
    Rarely, under heavy load, the expected response may not be sent
    back within 30 seconds, causing the dispatch request to time out
    with an error like the following:
    Retry attempt#0 failed: caught exception during http POST:
    java.net.SocketTimeoutException: Read timed out
    The original dispatch does eventually complete queuing by the
    Channel Framework, but because of the timeout, the Job Scheduler
    retries the dispatch of the same job.
    The re-dispatch may result in an exception due to the job log
    directory already existing if the dispatch ended up on the same
    endpoint, and a CWLRB5815E message:  Job nnn cannot be dispatched
    when it is in [executing or submitted] state
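
The failure mode described above can be reproduced in miniature. The following is only an illustrative sketch, not WebSphere code: the class name DispatchRetryDemo, the method dispatchWithRetry, and the thread pool standing in for the batch endpoint are all hypothetical. It assumes a client that stops waiting for a response after a timeout and retries, while the slow endpoint still finishes queuing the original request.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class DispatchRetryDemo {

    /**
     * Simulates a dispatch client that retries after a response timeout.
     * Returns how many times the simulated endpoint accepted the same job.
     * All names here are hypothetical; this is not the product implementation.
     */
    static int dispatchWithRetry(long endpointDelayMs, long clientTimeoutMs, int maxAttempts) {
        AtomicInteger accepted = new AtomicInteger();
        ExecutorService endpoint = Executors.newCachedThreadPool();
        try {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                Future<Boolean> response = endpoint.submit(() -> {
                    Thread.sleep(endpointDelayMs); // slow queuing under load
                    accepted.incrementAndGet();    // the job is queued regardless
                    return true;
                });
                try {
                    response.get(clientTimeoutMs, TimeUnit.MILLISECONDS);
                    break; // response arrived in time, no retry needed
                } catch (TimeoutException e) {
                    // The client gives up waiting, but the endpoint keeps
                    // working on the original request; the loop retries.
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            endpoint.shutdown();
            try {
                // Let any still-running simulated dispatches finish.
                endpoint.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return accepted.get();
    }

    public static void main(String[] args) {
        System.out.println("short timeout:  accepted "
                + dispatchWithRetry(300, 100, 2) + " time(s)");
        System.out.println("longer timeout: accepted "
                + dispatchWithRetry(300, 1000, 2) + " time(s)");
    }
}
```

With the timeout shorter than the endpoint delay, the same job is accepted once per attempt; with a timeout longer than the delay, it is accepted only once. The delays are scaled down to milliseconds purely to keep the demonstration quick.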
    

Problem conclusion

  • A code update has been made to add a new Job Scheduler custom
    property, job.dispatch.wait.timeout, with a default value of 60000
    milliseconds.  This property value can be increased to wait
    longer before timing out the dispatch request.
    
    A documentation update has been made to add this new property to
    the Job scheduler custom properties documentation.
    
    The fix for this APAR is targeted for inclusion in fix packs
    8.5.5.22 and 9.0.5.12. For more information, see 'Recommended
    Updates for WebSphere Application Server':
    https://www.ibm.com/support/pages/node/715553
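
As a rough illustration of how a timeout property with a default might be resolved, the sketch below reads job.dispatch.wait.timeout from a java.util.Properties object and falls back to 60000 milliseconds. Only the property name and its default come from this APAR. The class DispatchTimeoutProperty, the method resolveTimeoutMs, the use of Properties as a stand-in for the Job Scheduler custom properties, and the fallback on missing or invalid values are assumptions; the APAR does not describe how the product parses the value.

```java
import java.util.Properties;

public class DispatchTimeoutProperty {

    // Property name and default value as stated in the APAR text.
    static final String PROPERTY_NAME = "job.dispatch.wait.timeout";
    static final int DEFAULT_TIMEOUT_MS = 60000;

    /**
     * Hypothetical helper: returns the configured timeout in milliseconds,
     * or the default when the property is absent, blank, non-numeric,
     * or not positive. The fallback rules are an assumption.
     */
    static int resolveTimeoutMs(Properties props) {
        String raw = props.getProperty(PROPERTY_NAME);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_TIMEOUT_MS;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_TIMEOUT_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MS;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println("unset:  " + resolveTimeoutMs(props));

        // Example of raising the wait to two minutes.
        props.setProperty(PROPERTY_NAME, "120000");
        System.out.println("raised: " + resolveTimeoutMs(props));
    }
}
```

In the product, the value is set as a Job Scheduler custom property rather than in application code; the sketch only shows the default-and-override behavior the paragraph above describes.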
    

Temporary fix

Comments

APAR Information

  • APAR number

    PH39030

  • Reported component name

    WEBSPHERE FOR Z

  • Reported component ID

    5655I3500

  • Reported release

    850

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-07-15

  • Closed date

    2022-03-18

  • Last modified date

    2022-03-18

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WEBSPHERE FOR Z

  • Fixed component ID

    5655I3500

Applicable component levels

Document Information

Modified date:
19 March 2022