Flashes (Alerts)
Abstract
APAR HU02064 is a software issue that can cause detected data loss on RtC (Real-time Compression) volumes, affecting a 32 KB region of compressed data.
This can occur on SVC, V9000 and V7000 gen2/gen2+ systems, running 8.2.1.0-8.2.1.6 or 8.3.0.0 software.
The issue does not affect other hardware platforms and software levels, or compressed volumes in a Data Reduction Pool.
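The software levels listed above can be checked programmatically when many systems need triage. The following is a minimal sketch, assuming the code level is available as a four-part dotted string; the function names are hypothetical, and the check covers the software level only, not the hardware platform or whether RtC volumes are in use.

```python
from typing import Tuple


def parse_level(level: str) -> Tuple[int, ...]:
    """Turn a dotted level such as '8.2.1.6' into a comparable tuple."""
    return tuple(int(part) for part in level.strip().split("."))


def is_affected_level(level: str) -> bool:
    """Return True if the level is 8.2.1.0-8.2.1.6 or exactly 8.3.0.0."""
    version = parse_level(level)
    if len(version) != 4:
        raise ValueError(f"expected a four-part level, got {level!r}")
    in_821_range = (8, 2, 1, 0) <= version <= (8, 2, 1, 6)
    return in_821_range or version == (8, 3, 0, 0)


if __name__ == "__main__":
    for level in ("8.2.1.4", "8.2.1.8", "8.3.0.0", "8.3.0.1"):
        state = "affected" if is_affected_level(level) else "not affected"
        print(level, state)
```

Comparing tuples of integers avoids the string-comparison pitfall where "8.2.1.10" would sort before "8.2.1.6".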
Content
If data is read from an affected region, the volume is taken offline to prevent incorrect data being returned to the host. A volume repair operation is then required to bring the volume back online.
- Systems running 8.2.1.x with RtC compressed volumes should upgrade to the 8.2.1.8 PTF.
- Systems running 8.3.0.0 with RtC compressed volumes should upgrade to the 8.3.0.1 PTF.
- For each compressed volume, the risk of being affected by this issue, if the volume has not already been taken offline, is very low.
- In most cases, if incorrect compressed data is written, it is then detected soon afterwards, due to the host reading the incorrect data - causing the volume to be taken offline until it is repaired.
- If host backups have been reading the compressed volume, then it is very unlikely the volume has been affected (because the problem would already have been detected when the incorrect data block was read).
- If two copies exist, and only one is compressed, then delete the compressed copy and recreate it. This will copy the data from the uncompressed copy, which cannot be affected by this issue. Then move on to the next volume.
- If two copies exist, and both are compressed, no action is required. If one copy goes offline due to corruption in the future, remove that copy and create a new copy to restore redundancy.
- If only one copy exists and it is compressed, move on to step 2.
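The per-volume decision above can be written down as a small function, which may help when working through a long volume list. This is a sketch only: the function name and the list-of-booleans input are hypothetical, and "compressed" here means an RtC compressed copy. The case with no compressed copy is inferred from the statement that uncompressed copies cannot be affected.

```python
from typing import List


def plan_for_volume(copies_compressed: List[bool]) -> str:
    """Pick the action for one volume.

    copies_compressed holds one entry per volume copy:
    True if that copy is RtC compressed, False otherwise.
    """
    if len(copies_compressed) not in (1, 2):
        raise ValueError("a volume is expected to have one or two copies")

    compressed = sum(copies_compressed)
    if compressed == 0:
        # Inferred: uncompressed copies cannot be affected by this issue.
        return "no compressed copy, nothing to do"
    if len(copies_compressed) == 2:
        if compressed == 1:
            return "delete the compressed copy and recreate it"
        return "no action required"
    return "move on to step 2"


if __name__ == "__main__":
    print(plan_for_volume([True, False]))
    print(plan_for_volume([True, True]))
    print(plan_for_volume([True]))
```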
addvdiskcopy -syncrate 100 -mdiskgrp <mdiskgrp_id> -ignoresyncerrors -autodelete <volume_id>
This action must be completed using the CLI and not the GUI. If required, the new volume copy can be created as thin-provisioned by adding "-rsize 2% -autoexpand", or compressed by adding "-rsize 2% -compressed -autoexpand".
Note that autoexpand must be enabled for thin or compressed copies, otherwise the new copy will go offline out-of-space.
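When the command has to be issued for many volumes, it can be generated rather than typed. The sketch below only assembles the command line from the options quoted above; the function name and the "full", "thin" and "compressed" labels are hypothetical, and the resulting string still has to be run on the system CLI by whatever means you normally use.

```python
from typing import List


def build_addvdiskcopy(volume_id: str, mdiskgrp_id: str,
                       copy_type: str = "full") -> List[str]:
    """Assemble the addvdiskcopy arguments for one volume."""
    args = [
        "addvdiskcopy",
        "-syncrate", "100",
        "-mdiskgrp", mdiskgrp_id,
        "-ignoresyncerrors",
        "-autodelete",
    ]
    if copy_type == "thin":
        # Thin-provisioned copy: autoexpand must be enabled.
        args += ["-rsize", "2%", "-autoexpand"]
    elif copy_type == "compressed":
        # Compressed copy: autoexpand must be enabled.
        args += ["-rsize", "2%", "-compressed", "-autoexpand"]
    elif copy_type != "full":
        raise ValueError(f"unknown copy type: {copy_type!r}")
    args.append(volume_id)
    return args


if __name__ == "__main__":
    print(" ".join(build_addvdiskcopy("9", "0")))
    print(" ".join(build_addvdiskcopy("9", "0", "thin")))
```

Keeping -autoexpand inside the thin and compressed branches means it cannot be left off by mistake.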
Notes:
- Ensure there is enough free space in the given MDisk group (storage pool) for the new volume copy.
- The old volume copy will automatically be deleted once the data has been copied. Either maintain a list of volumes that have been copied, or rename the volume as each new copy is created, so that you know which volumes have already been dealt with.
- A sync rate of 100 will attempt to copy data at 64 MB/sec (depending on MDisk performance). If you are mirroring multiple volumes at once, ensure that the MDisk is capable of the total sync rate for the volumes being mirrored. You can reduce the sync rate by using the "chvdisk -syncrate X" CLI command.
- The -ignoresyncerrors option changes the system behavior during volume copy synchronization. If data loss is detected during synchronization, the volume would normally be taken offline - but with this option, a medium error will instead appear in the event log. The synchronization process will skip the lost 32 KB data block, and then continue copying the remainder of the volume.
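The sync-rate note can be turned into a rough capacity check before starting several copies in parallel. The sketch below uses only the 64 MB/sec figure quoted for a sync rate of 100. It assumes a fully allocated volume, a sustained rate, and 1 GiB treated as 1024 MB, so the result is a lower bound rather than a prediction; the function names are hypothetical.

```python
SYNC_RATE_100_MB_PER_SEC = 64  # figure quoted for -syncrate 100


def total_sync_load_mb_per_sec(volumes_in_parallel: int) -> int:
    """Aggregate copy load if every volume syncs at rate 100."""
    return volumes_in_parallel * SYNC_RATE_100_MB_PER_SEC


def min_copy_hours(volume_size_gib: float) -> float:
    """Shortest possible copy time for one volume, in hours.

    Assumption: 1 GiB is treated as 1024 MB and the full 64 MB/sec
    is sustained for the whole copy.
    """
    seconds = volume_size_gib * 1024 / SYNC_RATE_100_MB_PER_SEC
    return seconds / 3600


if __name__ == "__main__":
    print(total_sync_load_mb_per_sec(4), "MB/sec for 4 volumes")
    print(round(min_copy_hours(1024), 2), "hours minimum for a 1 TiB volume")
```

If the aggregate figure exceeds what the MDisk can sustain, either mirror fewer volumes at once or lower the sync rate as described in the note.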
3. Monitor the progress of the volume synchronization using the lsvdisksyncprogress CLI command
> lsvdisksyncprogress
vdisk_id vdisk_name copy_id progress estimated_completion_time
9        vdisk9     1       0        191101182521
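If the lsvdisksyncprogress output is captured as text, it can be parsed to track many volumes at once. The following is a minimal sketch, assuming whitespace-separated columns with a header row as in the sample above, and assuming the estimated completion time is encoded as YYMMDDHHMMSS (consistent with the sample value, but not stated in this document). The function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional

SAMPLE = """\
vdisk_id vdisk_name copy_id progress estimated_completion_time
9 vdisk9 1 0 191101182521
"""


def parse_sync_progress(text: str) -> List[Dict[str, str]]:
    """Map each data row to a dict keyed by the header column names."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def completion_time(row: Dict[str, str]) -> Optional[datetime]:
    """Decode the estimate, assuming a YYMMDDHHMMSS timestamp."""
    stamp = row.get("estimated_completion_time", "")
    if not stamp:
        return None  # no estimate present in this row
    return datetime.strptime(stamp, "%y%m%d%H%M%S")


if __name__ == "__main__":
    for row in parse_sync_progress(SAMPLE):
        print(row["vdisk_name"], row["progress"] + "%", completion_time(row))
```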
- If there is an 1840 error, use the GUI "run fix" button to produce a list of volumes and LBAs where data has been lost.
- If there is not an 1840 error, then no volumes were affected by APAR HU02064.
Document Information
Modified date:
28 March 2023
UID
ibm11099977