IBM Support

ServeRAID recovery from single defunct disk drive (DDD) failures - Servers and IntelliStation

Troubleshooting


Problem

ServeRAID recovery from single defunct disk drive (DDD) failures

Resolving The Problem

Overview

Recovering from a single disk failure on a ServeRAID controller may require multiple steps depending on the ServeRAID configuration. Ensure you have a recent backup of the data before proceeding.

If the failed disk is part of a redundant logical drive (RAID level-1, 1E, 5, 5E, 5EE, 10, 1E0, or 50), recovery is a two-step process. First, replace the defunct drive. Second, regenerate the "Critical" logical drive back to an online ("Okay") state. If a qualified Hot Spare or Standby Hot Spare is configured, ServeRAID automatically regenerates the "Critical" logical drive to an "Okay" state.

If the failed disk is part of a non-redundant logical drive (RAID level-0 or 00), all data on that logical drive is lost. First, replace the defunct drive. Second, delete the affected "Offline" logical drive and recreate the non-redundant logical drive. Finally, restore the data from a recent backup.
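As a quick reference, the redundant versus non-redundant decision in the two paragraphs above can be written as a simple lookup. This is a minimal Python sketch only; `plan_recovery` and `RecoveryPlan` are hypothetical names for illustration and are not part of ServeRAID Manager or IPSSEND.

```python
from dataclasses import dataclass

# RAID levels as classified in this article's overview.
REDUNDANT_LEVELS = {"1", "1E", "5", "5E", "5EE", "10", "1E0", "50"}
NON_REDUNDANT_LEVELS = {"0", "00"}


@dataclass
class RecoveryPlan:
    data_lost: bool
    steps: list[str]


def plan_recovery(raid_level: str) -> RecoveryPlan:
    """Map the RAID level of the affected logical drive to its recovery steps."""
    level = raid_level.strip().upper()
    if level in REDUNDANT_LEVELS:
        # Two steps. With a qualified (Standby) Hot Spare, the controller
        # regenerates the Critical logical drive on its own.
        return RecoveryPlan(data_lost=False, steps=[
            "Replace the defunct drive",
            "Regenerate the Critical logical drive to Okay",
        ])
    if level in NON_REDUNDANT_LEVELS:
        # No redundancy: the data on the logical drive cannot be regenerated.
        return RecoveryPlan(data_lost=True, steps=[
            "Replace the defunct drive",
            "Delete the Offline logical drive and recreate it",
            "Restore the data from a recent backup",
        ])
    raise ValueError(f"RAID level {raid_level!r} is not covered by this procedure")


if __name__ == "__main__":
    for lvl in ("5EE", "00"):
        plan = plan_recovery(lvl)
        print(lvl, "data lost" if plan.data_lost else "data intact", plan.steps)
```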

Replacing a defunct hard disk drive
Replacing a defunct hard disk drive attached to a Hot-Swap Backplane (internal backplane or EXP Storage Enclosure)
  1. While the system is powered on, identify the defunct physical drive to be replaced in one of the following ways:
    • A solid Amber drive LED indicator is on.
    • Using ServeRAID Manager, right-click the defunct drive and choose "Identify Drive". This flashes the drive LED and makes it easier to locate the physical drive in larger configurations.
  2. Using the handle of the hot-swap tray, gently pull the drive partway out so that it disconnects from the hot-swap backplane, without removing it completely.
  3. Wait 45 seconds to allow the hard disk drive motor to completely stop spinning.
  4. Remove the defunct drive and insert a replacement hard disk drive of the same capacity or larger, then use the handle of the hot-swap tray to secure the new drive against the hot-swap backplane.
  5. Within a few minutes, ServeRAID should detect the hot-swap event and, depending on the configuration, take one of the following actions:
    • If no Hot Spare or Standby Hot Spare was previously defined, ServeRAID should automatically initiate a rebuild to the replacement disk
    • If a Hot Spare or Standby Hot Spare was previously defined, ServeRAID should automatically set the state of the replacement disk to a Hot Spare or Standby Hot Spare respectively
    • If the hot-swap rebuild BIOS setting has been disabled, you will need to manually rebuild the hard disk drive
  6. Observe the drive for normal startup behavior and LED activity:
    • The replacement hard disk will spin up
    • The activity LED (green drive LED) should begin flashing and may eventually turn off or become very busy depending on the configuration.
    • The solid Amber drive LED indicator should turn off

Note: These state transitions can be observed in real time using ServeRAID Manager.

Optional: You can also follow these instructions to replace a drive offline, after booting from the IBM ServeRAID Support CD.
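The three possible outcomes listed in step 5 can be summarized as a small decision function. This is a hedged Python sketch with hypothetical names (`SpareConfig`, `expected_action`). The article does not say which rule wins when a spare is defined and the hot-swap rebuild BIOS setting is also disabled; the sketch assumes the spare case takes precedence.

```python
from enum import Enum


class SpareConfig(Enum):
    NONE = "none"
    HOT_SPARE = "Hot Spare"
    STANDBY_HOT_SPARE = "Standby Hot Spare"


def expected_action(spare: SpareConfig, hot_swap_rebuild_enabled: bool = True) -> str:
    """Describe what ServeRAID should do after it detects the replaced drive.

    Assumption: a previously defined spare takes precedence over the
    hot-swap rebuild BIOS setting (the article does not state the order).
    """
    if spare is not SpareConfig.NONE:
        # The spare has presumably already been used to regenerate the
        # Critical logical drive, so the new disk takes over the spare role.
        return f"set replacement state to {spare.value}"
    if not hot_swap_rebuild_enabled:
        return "manual rebuild required"
    return "automatic rebuild to replacement"


if __name__ == "__main__":
    print(expected_action(SpareConfig.NONE))
    print(expected_action(SpareConfig.STANDBY_HOT_SPARE))
    print(expected_action(SpareConfig.NONE, hot_swap_rebuild_enabled=False))
```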

Replacing a hard disk drive attached to a standard SCSI cable
  1. Determine the Channel and SCSI ID of the defunct disk drive using ServeRAID Manager or the IPSSEND GETCONFIG command. Note: If the operating system cannot be booted to run these tools, boot from the IBM ServeRAID Support CD to determine the Channel and SCSI ID, or boot from the Command Line diskette to run the IPSSEND GETCONFIG command.
  2. Power the ServeRAID system off, and open the system chassis.
  3. Locate the defunct hard disk drive by checking the channel each drive is physically attached to and its SCSI ID jumper settings. Note: SCSI ID jumper settings are usually labeled on the drive.
  4. Disconnect the SCSI cable and power cable from the defunct drive, and remove the drive from the system.
  5. Configure the SCSI ID jumper settings for the replacement disk drive to the same SCSI ID as the defunct drive.
  6. Reconnect the SCSI cable and power cable.
  7. Review Best Practices for hard drives attached to SCSI cables.
  8. Close the system chassis.
  9. Power on the ServeRAID system.
  10. Verify that the ServeRAID POST banner detects the replaced drive.
  11. Within a few minutes, ServeRAID should detect the cold-swap event and, depending on the configuration, take one of the following actions:
    • If no Hot Spare or Standby Hot Spare was previously defined, ServeRAID should automatically initiate a rebuild to the replacement disk
    • If a Hot Spare or Standby Hot Spare was previously defined, ServeRAID should automatically set the state of the replacement disk to a Hot Spare or Standby Hot Spare respectively
    • If the hot-swap rebuild BIOS setting has been disabled, you will need to manually rebuild the hard disk drive
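Steps 1 and 5 of this procedure amount to finding each defunct drive's Channel and SCSI ID and jumpering the replacement to the same ID. The Python sketch below assumes you have typed the configuration in by hand (it does not parse IPSSEND GETCONFIG output), and the state codes other than DDD are assumed ServeRAID abbreviations; `PhysicalDrive` and `defunct_locations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalDrive:
    channel: int
    scsi_id: int
    state: str  # assumed ServeRAID state code, e.g. "ONL" or "DDD"


def defunct_locations(drives: list[PhysicalDrive]) -> list[tuple[int, int]]:
    """Return the (channel, SCSI ID) of every drive in the defunct (DDD) state."""
    return sorted((d.channel, d.scsi_id) for d in drives if d.state.upper() == "DDD")


if __name__ == "__main__":
    # Hypothetical configuration, entered by hand from ServeRAID Manager.
    config = [
        PhysicalDrive(1, 0, "ONL"),
        PhysicalDrive(1, 1, "DDD"),
        PhysicalDrive(1, 2, "ONL"),
    ]
    for channel, scsi_id in defunct_locations(config):
        # Step 5: jumper the replacement drive to this same SCSI ID.
        print(f"Channel {channel}: set replacement SCSI ID to {scsi_id}")
```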
Best Practices for Hard Drives attached to SCSI cables
  • Use internal SCSI cables with embedded terminators at the end of the cable, whenever possible
  • Attach devices to the cable starting with the connector closest to the SCSI terminator (end of the cable) and work your way forward to the connectors closest to the controller
  • Each device attached to the SCSI cable must have a unique SCSI ID setting
  • The last device on a SCSI cable must terminate the SCSI bus. If there is an embedded terminator on the cable, ensure all other attached devices are NOT configured to provide termination. If the cable is not terminated, the device attached to the end of the cable must be jumpered/configured to provide termination.
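The cabling rules in this list can be checked mechanically before the chassis is closed. The following is a minimal Python sketch under stated assumptions: `Device` and `check_scsi_cable` are hypothetical names, and devices are listed starting from the connector at the terminator end of the cable and moving toward the controller, matching the attachment order recommended above.

```python
from dataclasses import dataclass


@dataclass
class Device:
    scsi_id: int
    terminates: bool  # True if jumpered/configured to provide termination


def check_scsi_cable(devices: list[Device], cable_has_terminator: bool) -> list[str]:
    """Return a list of best-practice violations (empty if none).

    Assumption: devices[0] sits on the connector at the end of the cable.
    """
    problems = []
    ids = [d.scsi_id for d in devices]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        problems.append(f"duplicate SCSI IDs: {duplicates}")
    if cable_has_terminator:
        # An embedded terminator means no attached device may terminate.
        if any(d.terminates for d in devices):
            problems.append("device termination enabled on a cable with an embedded terminator")
    elif devices:
        # Unterminated cable: only the end-of-cable device terminates the bus.
        if not devices[0].terminates:
            problems.append("end-of-cable device must provide termination")
        if any(d.terminates for d in devices[1:]):
            problems.append("only the end-of-cable device may provide termination")
    return problems


if __name__ == "__main__":
    chain = [Device(0, True), Device(1, False), Device(1, False)]
    for problem in check_scsi_cable(chain, cable_has_terminator=False):
        print(problem)
```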
Initiating a Rebuild using the ServeRAID Manager program or the IBM ServeRAID Support CD

Use ServeRAID Manager to manually initiate a rebuild operation or to change the state of the drive to a hot spare. To do this, right-click the recently replaced drive, then select the rebuild operation or change the state of the disk, as appropriate.

Initiating a Rebuild using IPSSEND

The same operations can be initiated with the IPSSEND command-line tool: the IPSSEND REBUILD command starts a rebuild operation, and the IPSSEND SETSTATE command changes the state of the disk. Run either command without parameters to see its complete syntax. IPSSEND is located on the IBM ServeRAID Support CD.
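If you script these commands, it helps to build the argument list in one place. The Python sketch below only assembles and prints the command; it does not run it. The positional order used here (controller, then the defunct drive's channel and SCSI ID, then the new drive's channel and SCSI ID for REBUILD; controller, channel, SCSI ID and new state for SETSTATE) and the executable name are assumptions, so confirm them by running each command without parameters, as described above. `rebuild_args` and `setstate_args` are hypothetical helpers.

```python
import shlex

# Executable name is an assumption (the article writes it as IPSSEND); adjust for your OS.
IPSSEND = "ipssend"


def rebuild_args(controller: int, defunct_channel: int, defunct_id: int,
                 new_channel: int, new_id: int) -> list[str]:
    """Argument list for a rebuild (assumed order: controller, old Ch/ID, new Ch/ID)."""
    return [IPSSEND, "REBUILD", str(controller),
            str(defunct_channel), str(defunct_id), str(new_channel), str(new_id)]


def setstate_args(controller: int, channel: int, scsi_id: int, new_state: str) -> list[str]:
    """Argument list for a state change, e.g. new_state="HSP" (assumed order)."""
    return [IPSSEND, "SETSTATE", str(controller), str(channel), str(scsi_id), new_state.upper()]


if __name__ == "__main__":
    # The replacement sits at the same Channel/SCSI ID as the defunct drive.
    cmd = rebuild_args(controller=1, defunct_channel=1, defunct_id=3, new_channel=1, new_id=3)
    print(shlex.join(cmd))
    # After confirming the syntax, it could be run with subprocess.run(cmd, check=True).
```

Keeping the arguments as a list (rather than one string) avoids shell quoting issues if the command is later passed to `subprocess.run`.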

Note: A hot spare (HSP) cannot be used if its capacity is smaller than that of the failed disk drive.
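The capacity restriction in this note is a simple filter. This Python sketch is illustrative only: `eligible_hot_spares` is a hypothetical name, capacities are assumed to be in MB, and the article does not describe which eligible spare the controller actually selects, so the sketch only lists the candidates.

```python
def eligible_hot_spares(spare_capacities_mb: dict[str, int], failed_capacity_mb: int) -> list[str]:
    """Return the spares whose capacity is at least that of the failed drive."""
    return sorted(name for name, cap in spare_capacities_mb.items() if cap >= failed_capacity_mb)


if __name__ == "__main__":
    # Illustrative labels and sizes only.
    spares = {"ch1-id5": 17000, "ch2-id4": 35000}
    print(eligible_hot_spares(spares, failed_capacity_mb=35000))
```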

ServeRAID Manager screen during a rebuild

Additional information
Need more help?
For further assistance, contact your local technical Support Center.


Document Location

Worldwide

Operating System

IntelliStation Pro: All operating systems listed

System x: All operating systems listed

Older System x: All operating systems listed

Product

xSeries: 100, 135, 150, 200, 205, 206, 206m, 220, 225, 226, 230, 232, 235, 236, 240, 250, 255, 260, 300, 305, 306, 306m, 330, 335, 336, 340, 342, 343, 345, 346, 350, 360, 365, 366, 370, 380, 382, 440, 445, 450, 455, 460, MXE 460, RXE-100, Internet Appliances

IntelliStation Pro: IntelliStation E Pro, IntelliStation M Pro, IntelliStation R Pro, IntelliStation Z Pro

Document Information

Modified date:
28 January 2019

UID

ibm1MIGR-40364