Flashes (Alerts)
Abstract
In the unlikely event that an NVMe drive loses a small amount of data (known as a medium error) and this medium error is discovered for the first time during an array rebuild, an array may store bad data on that drive rather than repairing the data.
The system continuously checks for and automatically repairs medium errors, so the likelihood of a medium error being discovered for the first time during a rebuild is very low.
If your system hits this issue, an error will be logged in the event log within 7 days.
This issue does not affect arrays containing SAS drives.
Content
- Full rebuilds occur after normal drive failures.
- Partial rebuilds occur after drive upgrades.
- Partial rebuilds occur if a drive is temporarily running slower than the rest of the RAID array.
- If the system detects a drive medium error during normal operation (not during a rebuild), it correctly reconstructs the data and permanently repairs the medium error, so the issue cannot occur.
- The RAID scrub will ensure that all medium errors are detected and repaired within 7 days of occurrence.
- If the system detects a drive medium error during a rebuild, incorrect data will be written to a drive, causing undetected data corruption.
- This condition should only occur if the medium error was created by the drive within the last 7 days, and has not yet been found by the RAID scrub.
- In this case, the RAID scrub will read that incorrect data within 7 days and log a 1691 error indicating that the array's data and parity information do not match.
- If no 1691 errors are present in the error log, then this issue has not occurred.
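The check for 1691 errors can be sketched in a few lines of Python. This is only an illustration: it assumes the event log has been exported to CSV text with a column named `error_code`, and both that layout and the helper name `find_parity_mismatch_events` are hypothetical, not a documented export format or product tool.

```python
import csv
import io

# Error code named in this flash: array data and parity do not match.
PARITY_MISMATCH_CODE = "1691"


def find_parity_mismatch_events(csv_text, code_column="error_code"):
    """Return the rows of an exported event log whose error code is 1691.

    Assumption: csv_text is CSV with a header row, and code_column is the
    (hypothetical) name of the column that holds the error code.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if (row.get(code_column) or "").strip() == PARITY_MISMATCH_CODE
    ]


# Made-up sample rows, used only to exercise the function.
sample = """sequence_number,error_code,description
101,1691,hypothetical entry
102,1234,hypothetical entry
"""

hits = find_parity_mismatch_events(sample)
if hits:
    print("1691 errors found:", [row["sequence_number"] for row in hits])
else:
    print("No 1691 errors in this export")
```

An empty result only means that no 1691 entry appears in the exported text. Because the RAID scrub can take up to 7 days to read the affected data, the check is meaningful only once that period has elapsed since the rebuild.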
This issue affects all software releases that support NVMe drives. It was fixed under APAR HU02186 in v8.2.1.11 and v8.3.1.2; however, subsequent changes made the fix ineffective in releases v8.2.1.12 and v8.3.1.4.
The fix for APAR HU02186 is present in releases v8.2.1.13 and v8.3.1.5 (or later) which are now available on IBM Fix Central.
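The release information above can be encoded as a small lookup, sketched here in Python. The function name `hu02186_status` and the three status strings are illustrative choices. Release streams other than 8.2.1 and 8.3.1 are reported as "unknown", because this flash does not list fix levels for them.

```python
import re

# Release data transcribed from this flash.
FIX_INTRODUCED = {(8, 2, 1): 11, (8, 3, 1): 2}    # first level carrying the HU02186 fix
FIX_INEFFECTIVE = {(8, 2, 1, 12), (8, 3, 1, 4)}   # levels where the fix is not effective


def parse_version(text):
    """Turn 'v8.3.1.5' or '8.3.1.5' into the tuple (8, 3, 1, 5)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def hu02186_status(version_text):
    """Classify a release as 'fixed', 'affected', or 'unknown'."""
    version = parse_version(version_text)
    stream, level = version[:3], version[3]
    if stream not in FIX_INTRODUCED:
        # Assumption: streams not named in the flash are left undecided.
        return "unknown"
    if version in FIX_INEFFECTIVE or level < FIX_INTRODUCED[stream]:
        return "affected"
    return "fixed"


for release in ("v8.2.1.11", "v8.2.1.12", "v8.2.1.13", "v8.3.1.4", "v8.3.1.5"):
    print(release, hu02186_status(release))
```

Keeping the two ineffective levels in an explicit set, rather than folding them into a single minimum-level comparison, mirrors the way the flash describes them: the fix was introduced, stopped being effective in two specific releases, and is present again from v8.2.1.13 and v8.3.1.5.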
Document Information
Modified date:
28 March 2023
UID
ibm16254742