IBM Support

NVIDIA Quadro K600 causes software NMI and all PCI error - System x3650 M4

Troubleshooting


Problem

A System x3650 M4 has an NVIDIA Quadro K600 graphics processing unit (GPU) adapter installed in the Peripheral Component Interconnect Express (PCIe) riser card with Customer Replaceable Unit (CRU) part number 00D9530. The server encountered a Software Non-Maskable Interrupt (NMI) error and an All PCI Error, both recorded as Intelligent Platform Management Interface (IPMI) events. On the control panel, the NMI and Fault light-emitting diodes (LEDs) were illuminated.

Resolving The Problem

Source

RETAIN tip: H213571

Symptom

A System x3650 M4 has an NVIDIA Quadro K600 graphics processing unit (GPU) adapter installed in the Peripheral Component Interconnect Express (PCIe) riser card with Customer Replaceable Unit (CRU) part number 00D9530. The server encountered a Software Non-Maskable Interrupt (NMI) error and an All PCI Error, both recorded as Intelligent Platform Management Interface (IPMI) events. On the control panel, the NMI and Fault light-emitting diodes (LEDs) were illuminated.

During the failure, the following error messages are seen in IBM Dynamic System Analysis (DSA):

IPMI:(10/01/2014 11:32:53) System chassis 1 (Critical Interrupt - NMI State): Assertion: Software NMI.

IPMI:(10/01/2014 11:32:45) Group (not a physical entity) 130 (Slot/Connector - All PCI Error): Assertion: Fault Status asserted.
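
To check a saved DSA text export for this pair of events, a small script can help. The following is a minimal sketch, assuming the log lines follow exactly the layout of the two messages quoted above; the names `LINE_RE`, `SIGNATURE`, `parse_events`, and `matches_symptom` are illustrative and are not part of DSA or any IBM tool. Timestamps are kept as plain strings because the tip does not state whether the date is month-first or day-first.

```python
import re

# Assumed line layout, inferred only from the two messages quoted in this tip:
#   IPMI:(<timestamp>) <sensor> (<event type> - <event offset>): Assertion: <detail>.
LINE_RE = re.compile(
    r"^IPMI:\((?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\)\s+"
    r"(?P<sensor>.+?)\s+\((?P<event>[^()]+ - [^()]+)\):\s+"
    r"Assertion:\s+(?P<detail>.+?)\.?$"
)

# The two (event, detail) pairs that together make up the symptom in this tip.
SIGNATURE = {
    ("Critical Interrupt - NMI State", "Software NMI"),
    ("Slot/Connector - All PCI Error", "Fault Status asserted"),
}


def parse_events(log_text):
    """Return (timestamp, sensor, event, detail) tuples for matching lines."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append(
                (match["ts"], match["sensor"], match["event"], match["detail"])
            )
    return events


def matches_symptom(log_text):
    """True only if both signature events are present in the log text."""
    seen = {(event, detail) for _, _, event, detail in parse_events(log_text)}
    return SIGNATURE <= seen


# The two messages from this tip, used as sample input.
SAMPLE = (
    "IPMI:(10/01/2014 11:32:53) System chassis 1 "
    "(Critical Interrupt - NMI State): Assertion: Software NMI.\n"
    "IPMI:(10/01/2014 11:32:45) Group (not a physical entity) 130 "
    "(Slot/Connector - All PCI Error): Assertion: Fault Status asserted."
)
```

Calling `matches_symptom(SAMPLE)` reports whether both events are present. A match only suggests that this tip may apply; the riser card part number still has to be checked on the hardware.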

Affected configurations

The system may be any of the following IBM servers:

  • System x3650 M4, type 7915, any model
  • System x3650 M4, type 7915 E5-xxxxV2, any model

The system is configured with one or more of the following IBM Options:

  • NVIDIA Quadro K600, Option part number 90Y2383, any replacement part number (CRU)

This tip is not software specific.

Solution

The NVIDIA Quadro K600 GPU adapter should be installed in the PCIe riser card with Customer Replaceable Unit (CRU) part number 94Y6707.

  1. Check the CRU part number of the PCIe riser card.
  2. If the PCIe riser card CRU part number is 00D9530, replace the PCIe riser card with CRU 94Y6707.
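
For sites that track installed parts in an inventory list, the two steps above can be sketched as a simple lookup. This is an illustrative sketch only: the function name `riser_action` and its return strings are hypothetical, and the only facts taken from this tip are the two CRU part numbers.

```python
# CRU part numbers taken from this tip.
WRONG_RISER_CRU = "00D9530"     # riser card that triggers the symptom with the K600
REQUIRED_RISER_CRU = "94Y6707"  # riser card the K600 should be installed in


def riser_action(cru_part_number):
    """Return a short recommendation for a PCIe riser card CRU part number."""
    part = cru_part_number.strip().upper()
    if part == REQUIRED_RISER_CRU:
        return "ok"
    if part == WRONG_RISER_CRU:
        return "replace with CRU " + REQUIRED_RISER_CRU
    # Any other part number is outside the scope of this tip.
    return "not covered by this tip"
```

For example, `riser_action("00d9530")` returns `"replace with CRU 94Y6707"`; input is normalized so that case and surrounding spaces do not matter.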

Additional information

The symptom is caused by installing the NVIDIA Quadro K600 GPU adapter in the wrong PCIe riser card, Customer Replaceable Unit (CRU) part number 00D9530. Replacing that riser card with PCIe riser card CRU part number 94Y6707 resolves the symptom.

Document Location

Worldwide

Operating System

System x: Operating system independent / None

Lenovo x86 servers: Operating system independent / None

Document Information

Modified date:
30 January 2019

UID

ibm1MIGR-5096660