IBM Support

IT38758: BACKUPS OF LARGE OR BUSY DB2 OR MONGODB DATABASES CAN BE MARKED SUCCESSFUL BUT ARE INCOMPLETE, AND THUS CANNOT BE RESTORED.

APAR status

  • Closed as program error.

Error description

  • IBM Spectrum Protect Plus agents for Db2 and MongoDB do not
    verify that snapshots created during a backup are still active
    and mounted before starting to copy data from them. The backup
    is reported as successful when it is actually incomplete. As a
    result, the database cannot be restored.
    
    Created snapshots for a large or high-load database can become
    'INACTIVE' for Linux LVM or 'INVALID' for AIX JFS2 and are
    silently unmounted from their mount points during backup because
    the snapshots are not large enough to hold all of the changes on
    the source logical volumes or JFS2 file systems.
    
    How to check for messages related to the failed backup in the
    IBM Spectrum Protect Plus logs that can confirm this issue:
    1. Generate a job log for the backup that is being checked.
    2. Extract the command.log file from this location:
      <Job_Name>/application/<Job_Number>/<Job_GUID>/<Agent_IP_address>/
    3. Search the log for all occurrences of 'not mounted', without
    quotes, and make note of each of the Snapshot Volume Names.
      Example: umount: /tmp/<Snapshot_Volume_Name>: not mounted
    4. If the 'not mounted' message is NOT seen, the issue was not
    encountered and there is no need to proceed further.
    5. Otherwise, choose one Snapshot Volume Name and search for
    'MainThread _backup_data: src path: //tmp/<Snapshot_Volume_Name>',
    without quotes.
      Example: [YYYY-MM-DD HH:MM:SS] DEBUG pid:NNNN MainThread _backup_data: src path: //tmp/<Snapshot_Volume_Name>/NODE0000/sqldbdir
    6. Make note of the PID. If you follow the PID down the logs,
    you may see messages like this:
      DEBUG pid:NNNN MainThread _backup_data: src path: //tmp/<Snapshot_Volume_Name>/NODE0000/sqldbdir
      DEBUG pid:NNNN MainThread _backup_data: dest path: /mnt/spp/vsnap/vpool1/fsX/AA_BB_CC_DD/<DB>/db2/<DB>/<DB_Name>/NODE0000/sqldbdir
      DEBUG pid:NNNN MainThread _backup_data: sign path: /mnt/spp/vsnap/vpool1/fsX/AA_BB_CC_DD/<DB>/signature/db2/<DB>/<DB_Name>/NODE0000/sqldbdir
      DEBUG pid:NNNN MainThread incremental_copy: Number of worker processes: 4
      DEBUG pid:NNNN MainThread incremental_copy: Read existing file catalog with 0 records
      DEBUG pid:NNNN MainThread incremental_copy: Obsolete files to wipe: 0
      DEBUG pid:NNNN MainThread incremental_copy: Writing new catalog with 0 records
      DEBUG pid:NNNN MainThread incremental_copy: Total size processed: 0B
      DEBUG pid:NNNN MainThread incremental_copy: Effectively copied data amount: 0B
      JOBLOG pid:NNNN MainThread joblog: <CTGGH0006> Time elapsed: 0.15 seconds
      JOBLOG pid:NNNN MainThread joblog: <CTGGH0002> Data transferred in the backup operation: 0.0 MB
      JOBLOG pid:NNNN MainThread joblog: <CTGGH0003> Copied 0 files successfully for partition 0
    
    7. If all three of these messages ('not mounted', 'Total size
    processed: 0B', and '<CTGGH0003> Copied 0 files successfully')
    are seen for the same Snapshot Volume Name, the issue was
    encountered and the backup should be considered to have failed.
    8. Repeat steps 5 - 7 for each unique Snapshot Volume Name
    found in Step 3.
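
    The log check in steps 3 to 8 can be approximated with a short
    script. The following Python sketch is not part of IBM Spectrum
    Protect Plus. The function name find_failed_snapshots is
    hypothetical, and the sketch assumes that command.log lines carry a
    'pid:NNNN' field and the three indicator strings exactly as quoted
    in the examples above, and that one PID copies from one snapshot
    volume.

```python
import re
from collections import defaultdict

# Patterns based on the example messages quoted in this APAR.
NOT_MOUNTED = re.compile(r"umount: /tmp/([^:\s]+): not mounted")
SRC_PATH = re.compile(r"pid:(\d+) .*_backup_data: src path: /+tmp/([^/\s]+)")
PID = re.compile(r"pid:(\d+)")


def find_failed_snapshots(log_lines):
    """Return the Snapshot Volume Names that show all three indicators."""
    unmounted = set()                  # step 3: volumes reported as 'not mounted'
    volumes_by_pid = defaultdict(set)  # step 5: PID -> snapshot volumes it read from
    zero_size_pids = set()             # PIDs that logged 'Total size processed: 0B'
    zero_file_pids = set()             # PIDs that logged 'Copied 0 files successfully'

    for line in log_lines:
        match = NOT_MOUNTED.search(line)
        if match:
            unmounted.add(match.group(1))
        match = SRC_PATH.search(line)
        if match:
            volumes_by_pid[match.group(1)].add(match.group(2))
        match = PID.search(line)
        if match:
            if "Total size processed: 0B" in line:
                zero_size_pids.add(match.group(1))
            if "<CTGGH0003> Copied 0 files successfully" in line:
                zero_file_pids.add(match.group(1))

    failed = set()
    for pid, volumes in volumes_by_pid.items():
        if pid in zero_size_pids and pid in zero_file_pids:
            # Step 7: all three messages relate to the same volume.
            failed |= volumes & unmounted
    return sorted(failed)
```

    The function accepts any iterable of lines, so an open file handle
    for the extracted command.log can be passed directly. An empty
    result means that the three messages were not found together for
    any snapshot volume.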
    
    Versions affected:
    10.1.x
    

Local fix

  • To fix the issue, configure parameters in the
    /etc/guestapps.conf file and, if necessary, add storage space to
    the volume groups that contain database data.
    
    
    These parameters in the guestapps.conf file allow
    customization of the size of the snapshots created by IBM
    Spectrum Protect Plus agents:
    
    Db2MinimumFreeSpaceInPercent - default value is 10%
    Db2MaximumAllocationInPercent - default value is 25%
    Db2MinimumSnapshotVolumeSize - default value is 50 MB
    
    guestapps.conf example:
    [DEFAULT]
    Db2MinimumSnapshotVolumeSize = 250
    Db2MinimumFreeSpaceInPercent = 50
    Db2MaximumAllocationInPercent = 100
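
    To see which values would be in effect for a given file, the
    settings can be read back with a small script. This is a minimal
    sketch that uses the Python standard library configparser module.
    It is an assumption that the agent reads the file in a comparable
    way, and the function name read_snapshot_settings is hypothetical.
    The fallback values are the defaults listed above.

```python
import configparser

# Default values as listed in this APAR.
DEFAULTS = {
    "Db2MinimumFreeSpaceInPercent": 10,   # percent
    "Db2MaximumAllocationInPercent": 25,  # percent
    "Db2MinimumSnapshotVolumeSize": 50,   # MB
}


def read_snapshot_settings(conf_text):
    """Return the three snapshot settings, using the defaults when a key is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        name: parser.getint("DEFAULT", name, fallback=default)
        for name, default in DEFAULTS.items()
    }
```

    Note that configparser compares option names without regard to
    case. Whether the agent does the same is not stated in this APAR.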
    
    The following steps can be taken to manually test how much data
    changes on the source of each of the Snapshot Volume Names that
    were previously identified:
    1. Create a snapshot. The size of the snapshot should be bigger
    than 25% of the source logical volume size:
      For Linux LVM: 'lvcreate -s -n <Snapshot_Name> -L <Snapshot_Size> <Source_LV>'
      For AIX JFS2: 'snapshot -o snapfrom=<SRC_fs_path> -o size=<snapshot_size>M'
    
    2. Monitor its status for a period of time equal to the duration
    of the database backup by periodically executing the following
    commands:
      For Linux LVM: 'lvs <Snapshot_Name>' or 'lvdisplay <Snapshot_Name>'
      For AIX JFS2: 'snapshot -q <SRC_fs_path>'
    
    'Good' case example:
    
    Linux
    lvdisplay /dev/SPPlog0vg/snap-test0
     --- Logical volume ---
     LV Path                /dev/SPPlog0vg/snap-test0
     LV Name                snap-test0
     VG Name                SPPlog0vg
     LV UUID                pkKCh1-C5mm-oCs8-JMFc-kTiY-FOtE-r3isw8
     LV Write Access        read/write
     LV Creation host, time floridaprod1, 2021-10-12 16:06:15 +0200
     LV snapshot status     active destination for lvSPPlog0
     LV Status              available
     # open                 0
     LV Size                252.00 MiB
     Current LE             63
     COW-table size         52.00 MiB
     COW-table LE           13
     Allocated to snapshot  96.25%
     Snapshot chunk size    4.00 KiB
     Segments               1
     Allocation             inherit
     Read ahead sectors     auto
     - currently set to     8192
     Block device           253:25
    
    AIX
    snapshot -q /db2/SPN/log_dir/NODE0000
    Snapshots for /db2/SPN/log_dir/NODE0000
    Current  Location          512-blocks        Free Time
    *        /dev/fslv00            65536       64768 Wed Oct 13 19:41:19 CEST 2021
    
    
    'Bad' case example:
    
    Linux
    lvdisplay /dev/SPPlog0vg/snap-test0
     --- Logical volume ---
     LV Path                /dev/SPPlog0vg/snap-test0
     LV Name                snap-test0
     VG Name                SPPlog0vg
     LV UUID                pkKCh1-C5mm-oCs8-JMFc-kTiY-FOtE-r3isw8
     LV Write Access        read/write
     LV Creation host, time floridaprod1, 2021-10-12 16:06:15 +0200
     LV snapshot status     INACTIVE destination for lvSPPlog0
     LV Status              available
     # open                 0
     LV Size                252.00 MiB
     Current LE             63
     COW-table size         52.00 MiB
     COW-table LE           13
     Snapshot chunk size    4.00 KiB
     Segments               1
     Allocation             inherit
     Read ahead sectors     auto
     - currently set to     8192
     Block device           253:25
    
    AIX
    snapshot -q /db2/SPN/log_dir/NODE0000
    Snapshots for /db2/SPN/log_dir/NODE0000
    Current  Location          512-blocks        Free Time
    INVALID  /dev/fslv00            65536             Wed Oct 13 19:41:19 CEST 2021
    
    
    3. If the snapshot becomes 100% full and 'INACTIVE' (for LVM) or
    'INVALID' (for JFS2) at any time during the testing period, then
    its size (<Snapshot_Size>) should be increased and the test
    should be done again after manually deleting any snapshots that
    were previously created.
      To remove a snapshot use:
      For Linux LVM: 'lvremove -f <Snapshot_Name>'
      For AIX JFS2: 'snapshot -d <snapshot_LV>'
    
    4. Once a size that does not result in a failure is found,
    update the Db2MaximumAllocationInPercent setting in the
    /etc/guestapps.conf file of the IBM Spectrum Protect Plus Db2
    agent with the new value.
      The new value for Db2MaximumAllocationInPercent can be
    calculated as:
      snapshot_size / sourceLV_size * 100
      This value is applied only if the volume group that holds the
    source logical volume (or JFS2 file system) has a 'free' to
    'used' space ratio bigger than this value.
      Otherwise, the IBM Spectrum Protect Plus agent for Db2 (or
    MongoDB) uses the smaller of the calculated snapshot size and
    the available free space.
    
    5. If there are several identified snapshot volumes, then the
    maximum value from all calculated values for
    Db2MaximumAllocationInPercent should be used.
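
    The evaluation in steps 2 to 5 can be sketched in Python as
    follows. This is an illustration only. Both function names are
    hypothetical, the status check covers only the Linux 'lvdisplay'
    output format shown above, and rounding the percentage up to the
    next whole number is an assumption that is not stated in this
    APAR.

```python
import re


def lvm_snapshot_state(lvdisplay_output):
    """Return (is_active, allocated_percent) parsed from 'lvdisplay' text."""
    status = re.search(r"LV snapshot status\s+(\S+)", lvdisplay_output)
    alloc = re.search(r"Allocated to snapshot\s+([\d.]+)%", lvdisplay_output)
    is_active = bool(status) and status.group(1).lower() == "active"
    percent = float(alloc.group(1)) if alloc else None
    return is_active, percent


def max_allocation_percent(tested_sizes_mb):
    """Return the largest snapshot_size / sourceLV_size * 100, rounded up.

    tested_sizes_mb holds one (snapshot_size, source_lv_size) pair of whole
    MB values for each snapshot volume whose test completed without the
    snapshot becoming 'INACTIVE' or 'INVALID'.
    """
    return max(
        (snapshot * 100 + source - 1) // source  # integer ceiling division
        for snapshot, source in tested_sizes_mb
    )
```

    For example, if a 512 MB snapshot of a 2048 MB logical volume and
    a 300 MB snapshot of a 1000 MB logical volume both stayed active
    for the whole test period, the ratios are 25 and 30, and step 5
    selects 30.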
    
    
    
    The most reliable snapshot size, which guarantees that the
    snapshot will not become full and 'INACTIVE' or 'INVALID', is
    equal to the size of the source logical volume (file system).
    To use it, set Db2MaximumAllocationInPercent to 100 and, if
    needed, add free space to the volume group that contains the
    source logical volume.
    
    It is sometimes useful to set a lower limit for the snapshot size
    by setting the Db2MinimumSnapshotVolumeSize parameter (default
    value is 50 MB). In this case, the IBM Spectrum Protect Plus
    agent for Db2 (or MongoDB) will create snapshots with a size not
    less than this value.
    
    For example, if the calculated size is 52 MB and
    Db2MinimumSnapshotVolumeSize = 1024 then the resulting size will
    be 1024 MB.
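
    The way the three sizing rules interact can be sketched in Python
    as follows. The function name planned_snapshot_size_mb is
    hypothetical, and the order in which the rules are applied (cap by
    free space first, then apply the lower limit) is an assumption.
    This APAR states the individual rules but not how the agent
    combines them.

```python
def planned_snapshot_size_mb(source_lv_mb, free_vg_mb,
                             max_allocation_percent=25, minimum_mb=50):
    """Estimate the snapshot size in MB from the rules described above."""
    # Size derived from Db2MaximumAllocationInPercent of the source volume.
    calculated = source_lv_mb * max_allocation_percent // 100
    # The smaller of the calculated size and the available free space is used.
    calculated = min(calculated, free_vg_mb)
    # Db2MinimumSnapshotVolumeSize acts as a lower limit (assumed to apply last).
    return max(calculated, minimum_mb)
```

    With a calculated size of 52 MB (25% of a 208 MB source volume)
    and a lower limit of 1024 MB, the sketch returns 1024, which
    matches the example above.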
    
    Another useful parameter is Db2MinimumFreeSpaceInPercent
    (default value is 10%) which prevents starting a backup when
    free space on at least one volume group containing logical
    volumes that should be backed up is less than
    Db2MinimumFreeSpaceInPercent of used space on the volume group.
    
    For example, the agent will not start a backup if
    Db2MinimumFreeSpaceInPercent = 10 and the Volume Group contains
    < 10 GB of free space when used space on the Volume Group is 100
    GB.
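
    The free space rule can be expressed as a short check. This Python
    sketch only restates the rule described above. The function name
    backup_can_start is hypothetical and the code is not taken from
    the agent.

```python
def backup_can_start(volume_groups, minimum_free_space_percent=10):
    """Return True if every volume group has enough free space.

    volume_groups is a list of (free, used) pairs in the same unit, one
    pair for each volume group that contains logical volumes to back up.
    """
    return all(
        # Free space must be at least the given percentage of used space.
        free * 100 >= used * minimum_free_space_percent
        for free, used in volume_groups
    )
```

    With the default of 10, a volume group with 100 GB used and 9 GB
    free blocks the backup, while 10 GB free is sufficient. A single
    volume group below the threshold is enough to prevent the backup.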
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus levels 10.1.2, 10.1.3, 10.1.4,     *
    * 10.1.5, 10.1.6, 10.1.7, 10.1.8 protecting Db2 or MongoDB     *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See ERROR description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in level 10.1.9. Note that this        *
    * information is subject to change at the discretion of IBM.   *
    ****************************************************************
    

Problem conclusion

  • The IBM Spectrum Protect Plus Db2 and MongoDB agents have been
    fixed to correctly detect snapshot states when they are
    unavailable for backup operations and issue appropriate error
    messages to report backup failures. When the backup operation
    for a Db2 or MongoDB database fails with an error, the user can
    find the instructions to resolve the issue in the IBM Spectrum
    Protect Plus 10.1.9 Documentation for various operating systems
    (Linux/AIX). These procedures have been added to the IBM
    documentation to ensure successful backup operations for Db2 and
    MongoDB databases. For information, refer to Troubleshooting
    failed backup operations for large Db2 and MongoDB databases.
    IBM Docs URL:
    https://www.ibm.com/docs/en/spp/10.1.9?topic=troubleshooting-failed-backup-operations-large-db2-mongodb-databases
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT38758

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A16

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-10-20

  • Closed date

    2021-12-09

  • Last modified date

    2021-12-13

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • Apps     Db2      MongoDB
    

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

Document Information

Modified date:
31 January 2024