IBM Support

VM65802: SMAPI VSMWORK* MACHINES ABEND WITH PROTECTION EXC

A fix is available


APAR status

  • Closed as program error.

Error description

  • In a SMAPI server environment, the SMAPI worker servers
    (VSMWORK1, VSMWORK2, VSMWORK3, and so on) frequently ABEND with
    a protection exception.  For example, VSMWORK2 ABENDs with:
                                                                   .
    23:27:18 DMSITP144T Protection exception occurred at 8114B4EC in
             routine RXDMSSIT while UFDBUSY = 01; re-IPL CMS
    23:27:18  * MSG FROM VSMWORK2: DMSDIE3550I All APPC/VM and IUCV
              paths have been severed.
    23:27:18 HCPMFS057I VSMWORK2 not receiving; disconnected
    23:27:18 DMSWSP314W Automatic re-IPL by CP due to disabled wait;
             PSW 000A0000 80F3BAE4
                                                                   .
    After one VSMWORKn machine ABENDs, all other VSMWORKn machines
    ABEND the same way within a few seconds of each other until
    none are working properly.  The ABENDs occur about every three
    days and hit all VSMWORKn machines at nearly the same time.  At
    the time of the ABEND, SFS server VMSERVS (for filepool VMSYS),
    which is running with 64M of virtual storage, runs out of
    storage and performs storage reclamation.
         The protection exception occurs in DMSJCM (SFS Cache
    Update) when the code is handling Invalidate CNRs.  When the
    SFS server runs out of storage, it sends Invalidate CNRs to the
    CMS client (VSMWORKn) to process.  During that processing in
    DMSJCM, field DCHFSTIU (count of FSTs in use) in the current
    hyperblock becomes corrupted with a very large number
    (X'FFFFFFFF').  The ABEND does not occur at that time, however,
    but the next time the SFS server runs out of storage and sends
    out Invalidate CNRs.  The corrupted value in field DCHFSTIU
    causes the code to attempt to write into the CMS nucleus,
    which causes the protection exception.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All SFS and SMAPI users.                     *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    ****************************************************************
    * RECOMMENDATION: APPLY PTF                                    *
    ****************************************************************
    CMS clients who have an SFS directory accessed may experience a
    Protection Exception under certain conditions.  The CMS client
    maintains a cache (in the form of FSTs) representing the files
    in the directory that is built when the directory is accessed.
    As changes are made to files in the directory, the file pool
    server maintains a record of the update and will send the
    update when the client interacts with the file pool.  If
    the client is idle, the list of changes can build up in the
    file pool server, consuming virtual storage.  If eventually
    the file pool server runs out of storage, it will start a
    reclamation process, part of which is to send "Invalidate"
    records to all idle clients who have a directory accessed.
    It is the processing of these "Invalidates" that is defective
    and can lead to an overwrite of the CMS nucleus and the
    Protection Exception.
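
The build-up and reclamation described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class ToyFilePoolServer, its methods, and the record-count storage limit are hypothetical names invented for illustration and do not correspond to actual SFS server internals. In this model any client that still has undelivered updates when storage runs out is treated as idle and receives an "Invalidate".

```python
from collections import defaultdict


class ToyFilePoolServer:
    """Hypothetical model of a file pool server queueing directory updates."""

    def __init__(self, storage_limit):
        self.storage_limit = storage_limit   # assumed unit: queued update records
        self.pending = defaultdict(list)     # client name -> undelivered updates
        self.sent = []                       # (client, message kind) log for inspection

    def access_directory(self, client):
        self.pending[client]                 # touching the key registers the client

    def file_changed(self, record):
        # Every client with the directory accessed accumulates the update.
        for queue in self.pending.values():
            queue.append(record)
        if sum(len(q) for q in self.pending.values()) > self.storage_limit:
            self.reclaim()

    def client_interacts(self, client):
        # An active client drains its queue whenever it talks to the server.
        if self.pending[client]:
            self.pending[client] = []
            self.sent.append((client, "updates"))

    def reclaim(self):
        # Out of storage: drop the backlog and tell lagging clients to rebuild.
        for client, queue in self.pending.items():
            if queue:
                queue.clear()
                self.sent.append((client, "invalidate"))


server = ToyFilePoolServer(storage_limit=3)
server.access_directory("VSMWORK1")          # an idle worker that never interacts
for n in range(4):
    server.file_changed(f"update {n}")
print(server.sent)                           # [('VSMWORK1', 'invalidate')]
```

The idle worker never drains its queue, so the fourth change pushes the toy server past its limit and triggers the "Invalidate".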
    

Problem conclusion

  • DMSJCM is responsible for processing the "Invalidate" records.
    The "Invalidate" records represent the metadata for all FSTs
    that are currently in the directory.  DMSJCM's job is to
    reconcile this data against the backlevel FST data that the
    CMS client currently has. It does so by first locating each
    HyperBlock for the accessed directory.  Each HyperBlock
    contains an array of FSTs that can fit in one 4K page, but
    there can be empty slots where files have been erased. DMSJCM
    steps through the FSTs in each HyperBlock to see if the FST
    needs updating. To decide how many entries to process in each
    HyperBlock, it used the counter DCHFSTIU, which indicates how
    many active FSTs the HyperBlock holds. It did not consider that
    there could be empty slots. This led to marking an empty slot
    for subsequent removal, and ultimately DCHFSTIU became negative
    (X'FFFFFFFF').  On the next invocation of DMSJCM for
    Invalidate processing, the bad DCHFSTIU counter led to an
    overlay of a bit in X'FFFFFFFF' locations, eventually spilling
    into the area occupied by the CMS nucleus.
    
    This APAR corrects the logic in DMSJCM to skip over empty
    slots in the HyperBlocks and only reconcile valid FST entries.
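
The difference between the flawed and corrected slot handling can be illustrated with a minimal sketch. It is a toy model, not the DMSJCM code: the HyperBlock class, the fst_in_use attribute (standing in for DCHFSTIU), the release helper, and both reconcile functions are hypothetical, and the exact sequence by which the real counter went negative is assumed here to be one wrongful release of an empty slot followed by a later legitimate erase.

```python
MASK32 = 0xFFFFFFFF  # the counter is modeled as an unsigned 32-bit field


class HyperBlock:
    """Toy stand-in for one HyperBlock: fixed slots plus an in-use counter."""

    def __init__(self, slots):
        self.slots = list(slots)   # None marks an empty (erased) slot
        self.fst_in_use = sum(s is not None for s in slots)  # role of DCHFSTIU

    def release(self, index):
        """Free one slot and decrement the counter with 32-bit wraparound."""
        self.slots[index] = None
        self.fst_in_use = (self.fst_in_use - 1) & MASK32


def reconcile_flawed(block, server_files):
    """Treat the first `fst_in_use` slots as valid FSTs, ignoring gaps."""
    # The min() clamp exists only so this sketch cannot index past the list.
    limit = min(block.fst_in_use, len(block.slots))
    stale = [i for i in range(limit)
             if block.slots[i] not in server_files]  # an empty slot lands here too
    for i in stale:
        block.release(i)


def reconcile_fixed(block, server_files):
    """Walk every slot, skip empty ones, and reconcile only real FSTs."""
    stale = [i for i, name in enumerate(block.slots)
             if name is not None and name not in server_files]
    for i in stale:
        block.release(i)


block = HyperBlock([None, "B FILE"])     # slot 0 was erased earlier
reconcile_flawed(block, {"B FILE"})      # wrongly releases the empty slot
block.release(1)                         # a later, legitimate erase of B FILE
print(hex(block.fst_in_use))             # 0xffffffff
```

With reconcile_fixed in place of reconcile_flawed, the empty slot is skipped, the counter stays at 1 after reconciliation, and the later erase brings it to 0 instead of wrapping.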
    

Temporary fix

Comments

APAR Information

  • APAR number

    VM65802

  • Reported component name

    VM CMS

  • Reported component ID

    568411201

  • Reported release

    630

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2016-03-10

  • Closed date

    2017-01-09

  • Last modified date

    2017-04-28

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UM35006 UM35007

Modules/Macros

  • DMSJCM
    

Fix information

  • Fixed component name

    VM CMS

  • Fixed component ID

    568411201

Applicable component levels

  • R630 PSY UM35006

       UP17/01/16 I 1000

  • R640 PSY UM35007

       UP17/01/16 P 1701

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
28 April 2017