Troubleshooting
Problem
VMware ESXi crashes with a kernel panic stop error on a purple screen reporting a saved backtrace of 'Heartbeat NMI' (where NMI stands for Non-Maskable Interrupt). The VMware log file 'vmkernel.log' or 'vmkernel-log.1' shows messages similar to these:
-WARNING: Heartbeat: 645: PCPU 22 didn't have a heartbeat for 21 seconds; *may* be locked up.
-ALERT: NMI: 579: NMI IPI recvd. We Halt.
-Panic: 909: Saved backtrace: pcpu 22 Heartbeat NMI
Resolving The Problem
Source
RETAIN tip: H21462
Symptom
VMware ESXi crashes with a kernel panic stop error on a purple screen reporting a saved backtrace of 'Heartbeat NMI' (where NMI stands for Non-Maskable Interrupt).
The VMware log file vmkernel.log or vmkernel-log.1 contains messages similar to these:
-WARNING: Heartbeat: 645: PCPU 22 didn't have a heartbeat for 21 seconds; *may* be locked up.
-ALERT: NMI: 579: NMI IPI recvd. We Halt.
-Panic: 909: Saved backtrace: pcpu 22 Heartbeat NMI
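To check a saved copy of the log for this signature, a short script can search for the messages quoted above. The following is a minimal sketch, not an IBM or VMware tool: the function name find_heartbeat_nmi is hypothetical, and the patterns are derived only from the sample messages in this tip, so they may need adjusting if your ESXi build words the messages differently.

```python
import re

# Patterns derived from the sample messages in this tip. The numeric fields
# (source line number, PCPU number, seconds) vary between incidents.
HEARTBEAT_RE = re.compile(
    r"Heartbeat: \d+: PCPU (\d+) didn't have a heartbeat for (\d+) seconds"
)
PANIC_RE = re.compile(r"Saved backtrace: pcpu (\d+) Heartbeat NMI")


def find_heartbeat_nmi(lines):
    """Scan log lines and report PCPUs with heartbeat warnings or NMI panics.

    Returns a dict with:
      "stalled":  {pcpu: longest reported stall in seconds}
      "panicked": sorted list of PCPUs named in a 'Heartbeat NMI' backtrace
    """
    stalled = {}
    panicked = set()
    for line in lines:
        # Both patterns are checked on every line, because some log exports
        # run several messages together on one line.
        match = HEARTBEAT_RE.search(line)
        if match:
            pcpu, seconds = int(match.group(1)), int(match.group(2))
            stalled[pcpu] = max(seconds, stalled.get(pcpu, 0))
        match = PANIC_RE.search(line)
        if match:
            panicked.add(int(match.group(1)))
    return {"stalled": stalled, "panicked": sorted(panicked)}
```

The function accepts any iterable of strings, so an open file object for a copy of vmkernel.log can be passed to it directly. A non-empty "panicked" list indicates that the log matches the symptom described in this tip.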
Affected configurations
The system can be any of the following IBM servers:
- BladeCenter HS23, Type 1929, any model
- BladeCenter HS23, Type 7875, any model
- BladeCenter HS23, Type 7875 E5-xxxxV2, any model
- Flex System x222 Compute Node, Type 7916, any model
- Flex System x240 Compute Node, Type 8737, any model
- Flex System x440 Compute Node, Type 2584, any model
The system is configured with one or more of the following IBM options:
- Emulex 10 Gb Ethernet Virtual Fabric Adapter Advanced II for IBM BladeCenter HS23, option part number 90Y9332, any replacement part number
- Emulex 10 Gb Ethernet Virtual Fabric Adapter Advanced II for IBM BladeCenter, option part number 90Y3566, any replacement part number
- Emulex 10 Gb Ethernet Virtual Fabric Adapter II Option, for IBM System x, option part number 49Y7951, any replacement part number
- Emulex 10 Gb Ethernet Virtual Fabric Adapter II for IBM BladeCenter HS23, option part number 81Y3120, any replacement part number
- Emulex 10 Gb Ethernet Virtual Fabric Adapter II for IBM BladeCenter, option part number 90Y3550, any replacement part number
- Emulex Dual-Port 10 Gb Virtual Fabric Adapter Advanced II for IBM BladeCenter, option part number 00Y3264, any replacement part number
- Emulex Dual-Port 10 Gb Virtual Fabric Adapter II for IBM BladeCenter, option part number 00Y3266, any replacement part number
- Flex System CN4054 10 Gb Virtual Fabric Adapter, Option 90Y3554, any CRU
- Flex System CN4054R 10 Gb Virtual Fabric Adapter, Option 00Y3306, any CRU
This tip is not software specific.
The system has the symptom described above.
Solution
Update the Emulex firmware to version 10.2.261.36 and the Emulex driver to version 10.2.x.x, both released with the '14a' code package.
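When auditing several hosts, the installed firmware version can be compared against the level named above. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the installed version string is supplied by the reader from whatever inventory source is in use, and treating any version at or above 10.2.261.36 as updated is an assumption that this tip does not itself state.

```python
# Firmware level named in the Solution section of this tip.
REQUIRED_FIRMWARE = (10, 2, 261, 36)


def parse_version(text):
    """Convert a dotted version string such as '10.2.261.36' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def firmware_needs_update(installed, required=REQUIRED_FIRMWARE):
    """Return True when the installed version sorts below the required one.

    Versions are compared field by field as integers, so '10.2.261.4'
    is correctly treated as older than '10.2.261.36'.
    """
    return parse_version(installed) < required
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "10.2.261.4" would sort after "10.2.261.36" alphabetically.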
The firmware and driver files are available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support's Fix Central web page, at the following URL:
http://www.ibm.com/support/fixcentral/
Additional information
The stop error occurs because the OS kernel does not receive a reply to an IOCTL (a device-specific input/output control system call) within the allotted time.
The reply is delayed because the Emulex adapter's processor (BE3 ARM) is tied up managing burst traffic or a broadcast storm and cannot process the IOCTL quickly enough to satisfy the operating system (OS).
As a result, the OS has a kernel panic and halts at a stop error.
Document Location
Worldwide
Document Information
Modified date:
30 January 2019
UID
ibm1MIGR-5093255