IBM Support

QRadar: Troubleshooting Disk Failure or Predictive Disk Failure Notifications

Troubleshooting


Problem

A QRadar appliance generates a system notification with one of the following two warnings: "Predictive Disk Failure: Hardware Monitoring has determined that a disk is in predictive failed state." or "Disk Failure: Hardware Monitoring has determined that a disk is in failed state."

Cause

Disk failure
The on-board system tools detected that a disk failed.

Predictive disk failure
The system checks the status of the hardware hourly to determine when hardware support is required on the appliance. The on-board system tools detected that a disk is approaching failure or end of life. The slot or bay location of the affected disk is identified.

Resolving The Problem

If QRadar is installed as software on your own hardware rather than on a QRadar appliance, engage your hardware vendor directly. For QRadar appliances, collect the information described below before you contact QRadar Support; it helps verify the problem and improves time to resolution. If you are not sure how to proceed, open a QRadar Support case as a SOFTWARE issue, and the support representative will assist you in validating any hardware issues or questions.

  1. Log in to the QRadar Console.
  2. Review the System Notifications for disk failure messages.
    NOTE: If you are unsure whether you received any system notifications, search from the Log Activity tab for the following QRadar identifiers (QIDs): Add Filter > QID is 38750110 or 38750111

    The search returns the matching events for these QIDs, along with the IP address of the Console and the timestamp of each event:
    38750110 - Disk Failure: Hardware Monitoring has determined that a disk is in failed state.
    38750111 - Predictive Disk Failure: Hardware Monitoring has determined that a disk is in predictive failed state.
     
  3. Note the IP address of the QRadar appliance from the system notification.
  4. Use SSH to log in to the Console. If the appliance named in the notification message is a Managed Host, SSH from the Console to that appliance.
  5. If you are not sure whether the appliance is an IBM, Lenovo, or Dell server, run the following command, which reports the server manufacturer, product name, and serial number:

    dmidecode -t System

    Example:

    [root@myserver ~]# dmidecode -t System
    # dmidecode 2.12
    SMBIOS 2.6 present.

    Handle 0x0100, DMI type 1, 27 bytes
    System Information
    Manufacturer: Dell Inc.
    Product Name: PowerEdge R510
    Version: Not Specified
    Serial Number: BPWX***
    UUID: 4C******-****-****-****-************
    Wake-up Type: Power Switch
    SKU Number: Not Specified
    Family: Not Specified




    Note: The Manufacturer, Product Name, and Serial Number are listed in the output of the command.
  6. Run the following command to verify which drive has failed:
    /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | egrep "^(Enclosure Device ID|Slot Number|Firmware state|Media Error Count|Other Error Count|Predictive Failure Count|Drive has flagged a S.M.A.R.T alert ):"


    Note: The drive that needs to be replaced reports Yes for "Drive has flagged a S.M.A.R.T alert".
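  7. Using the information from step 6, identify the failed drive. In this example, the drive at Enclosure 16, Slot 1 has failed. Set the drive offline, mark it as missing, and prepare it for removal, substituting your own enclosure and slot values: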
    /opt/MegaRAID/MegaCli/MegaCli64 PDOffline PhysDrv [16:1] a0
    /opt/MegaRAID/MegaCli/MegaCli64 PDMarkMissing PhysDrv [16:1] a0
    /opt/MegaRAID/MegaCli/MegaCli64 PDPrpRmv PhysDrv [16:1] a0
  8. Depending on whether this is an IBM or Dell appliance, run the following:
    1. IBM Lenovo System x: DSA
      • The Dynamic System Analysis (DSA) executable is installed by default in the directory /opt/qradar/support/. The executable file name begins with ibm_utl for M3 and M4 appliances and with lnvgy_utl for M5 appliances. Run the command:

        [root@myserver ~]# /opt/qradar/support/<DSA_Version>_x86-64.bin

      • The report will be written to /var/log/IBM_Support/.
      • If the appliance is offline or unreachable, collect the DSA report by using the Preboot DSA method.
         
    2. Dell PowerEdge servers: Provide QRadar Support with the information they request so that they can provide a report to Dell.
       
  9. Open a new case as a SOFTWARE issue against QRadar SIEM and attach the generated DSA report or the information about your Dell appliance. NOTE: If you are unsure whether your contact information has changed, include your phone number or email address in your case description.
     
  10. The QRadar Support representative will review the information and triage the issue. If hardware or replacement parts are required, the QRadar Support representative will work with you and the hardware team.
     
  11. Your QRadar Support Engineer might request that you complete the following replacement parts form: IBM QRadar Appliance Repair Form. Fill out this form only if requested.
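
As a rough illustration of step 5, the three fields of interest can be pulled out of the dmidecode output with a small shell helper. This is only a sketch: the function name system_summary is hypothetical, and it assumes the output layout shown in the example for step 5 (a "System Information" section that contains "Manufacturer", "Product Name", and "Serial Number" lines).

```shell
# Hypothetical helper (not part of QRadar): reads `dmidecode -t System`
# output on stdin and prints the manufacturer, product name, and serial number.
system_summary() {
    awk '
        # Strip leading and trailing whitespace so indented output also matches.
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }

        # Every DMI record starts with a "Handle" line; leave the section.
        /^Handle /                 { in_sys = 0 }

        # Only the "System Information" record holds the fields we want.
        $0 == "System Information" { in_sys = 1; next }

        in_sys && /^(Manufacturer|Product Name|Serial Number):/ {
            key = substr($0, 1, index($0, ":") - 1)
            val = substr($0, index($0, ":") + 1)
            sub(/^[ \t]+/, "", val)
            print key "=" val
        }
    '
}
```

One possible use is to run `dmidecode -t System | system_summary` as root on the appliance and paste the three resulting lines into the case description.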

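The check in step 6 can also be scripted so that suspect drives are listed in the [enclosure:slot] form that step 7 expects. The sketch below is an illustration under assumptions, not an IBM-provided tool: the function name flag_suspect_drives is hypothetical, each value is assumed to follow the first colon on its line, a flagged drive is assumed to report "Yes" for the S.M.A.R.T alert field, and a failed drive is assumed to have the word "Failed" in its firmware state.

```shell
# Hypothetical helper: reads `MegaCli64 -pdlist -a0` output on stdin and
# prints "[enclosure:slot]" once for each drive that looks suspect.
flag_suspect_drives() {
    awk '
        # Return the text after the first colon, trimmed of whitespace.
        function value(line,    v) {
            v = substr(line, index(line, ":") + 1)
            gsub(/^[ \t]+|[ \t\r]+$/, "", v)
            return v
        }
        # Print the current drive once, even if several checks match it.
        function report(    key) {
            key = "[" enc ":" slot "]"
            if (!(key in seen)) { seen[key] = 1; print key }
        }
        /^Enclosure Device ID/      { enc  = value($0) }
        /^Slot Number/              { slot = value($0) }
        # Assumption: any non-zero predictive failure count is worth flagging.
        /^Predictive Failure Count/ { if (value($0) + 0 > 0) report() }
        # Assumption: a failed drive has "Failed" in its firmware state.
        /^Firmware state/           { if (value($0) ~ /Failed/) report() }
        # Assumption: the alert field reads "Yes" (any letter case) when set.
        /^Drive has flagged a S.M.A.R.T alert/ {
            if (tolower(value($0)) == "yes") report()
        }
    '
}
```

A typical invocation would be `/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | flag_suspect_drives`. Review the full command output before taking any drive offline; the helper only narrows down where to look.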

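For step 8, the DSA executable name varies by appliance generation, so it can help to locate it by prefix instead of typing the full name. The following is a minimal sketch: the function name find_dsa_binary is hypothetical, and it relies only on the directory and the ibm_utl and lnvgy_utl prefixes given in step 8.

```shell
# Hypothetical helper: prints the path of the first DSA executable found in
# the given directory (default: /opt/qradar/support), or returns 1 if none.
find_dsa_binary() {
    dir=${1:-/opt/qradar/support}
    # ibm_utl* covers M3 and M4 appliances; lnvgy_utl* covers M5 appliances.
    for f in "$dir"/ibm_utl* "$dir"/lnvgy_utl*; do
        # An unmatched glob stays literal, so confirm the file really exists.
        if [ -f "$f" ] && [ -x "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

For example, `dsa=$(find_dsa_binary) && "$dsa"` would run the report only when an executable was found. If nothing is found, the appliance might not be an IBM or Lenovo System x server.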
 

 


Document Information

Modified date:
16 April 2021

UID

swg21985113