IBM Support

EEH errors on IBM POWER9 systems that are enabled with Virtual Persistent Memory

Flashes (Alerts)


Abstract

On IBM POWER9 systems that are enabled with Virtual Persistent Memory, Enhanced Error Handling (EEH) errors might be observed.

Content

Linux® Releases Affected
SUSE Linux® Enterprise Server 15, Service Pack 1

IBM Systems Affected
All IBM POWER9 systems

I/O Devices Affected
All dedicated I/O adapters that are assigned to a logical partition with virtual persistent memory enabled might be affected by this issue. Virtualized I/O, such as virtual Ethernet, VNIC, virtual Fibre Channel, virtual SCSI, and SR-IOV, is not affected by this issue.

Symptoms
When you perform I/O operations between a network or storage device and virtual persistent memory, EEH errors might be observed. The following example shows an EEH error:
[ 1988.400852] EEH: Frozen PHB#15-PE#800000 detected
[ 1988.400865] EEH: PE location: N/A, PHB location: N/A
[ 1988.400870] EEH: Frozen PHB#15-PE#800000 detected
[ 1988.400874] EEH: Call Trace:
[ 1988.400881] EEH: [c000000000047968] eeh_dev_check_failure.part.2+0x2b8/0x530
[ 1988.400890] EEH: [c00000000004849c] eeh_check_failure+0xfc/0x140
[ 1988.400900] EEH: [d00000000458a6f0] ipr_eh_abort+0x4d8/0x920 [ipr]
[ 1988.400906] EEH: [c000000000948290] scmd_eh_abort_handler+0x100/0x360
[ 1988.400912] EEH: [c000000000183804] process_one_work+0x304/0x5d0
[ 1988.400917] EEH: [c00000000018433c] worker_thread+0xcc/0x7a0
[ 1988.400923] EEH: [c00000000018e3dc] kthread+0x1ac/0x1c0
[ 1988.400929] EEH: [c00000000000b75c] ret_from_kernel_thread+0x5c/0x80
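
To check whether a partition is logging this symptom, the kernel log can be searched for the "Frozen PHB#...-PE#..." message shown in the example. The following is a minimal sketch, not an IBM-provided tool: the function names count_eeh_freezes and list_frozen_pes are hypothetical, and the pattern assumes the message format matches the example above.

```shell
# Count kernel log lines that report a frozen PE (reads log text from stdin).
# Assumption: messages look like "EEH: Frozen PHB#15-PE#800000 detected".
count_eeh_freezes() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep status 0.
    grep -c -E 'EEH: Frozen PHB#[0-9A-Fa-f]+-PE#[0-9A-Fa-f]+ detected' || true
}

# Print each distinct frozen PHB/PE pair once (reads log text from stdin).
list_frozen_pes() {
    grep -o -E 'Frozen PHB#[0-9A-Fa-f]+-PE#[0-9A-Fa-f]+' | sort -u
}

# Typical use on a running system (hypothetical invocation):
#   dmesg | count_eeh_freezes
#   dmesg | list_frozen_pes
```

Run against the example log above, the first function reports 2 matching lines and the second lists the single pair PHB#15-PE#800000.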

Workaround
The workaround for this issue is to boot the system with the disable_ddw=1 kernel parameter by completing the following steps:
  1. Edit the /etc/default/grub file and append disable_ddw=1 to the GRUB_CMDLINE_LINUX_DEFAULT entry.
  2. Run the grub2-mkconfig -o /boot/grub2/grub.cfg command to update the bootloader configuration.
  3. Restart the system so that the new kernel parameter takes effect.
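
The edit to /etc/default/grub can also be scripted. The following is a minimal sketch under stated assumptions: the helper names add_disable_ddw and ddw_disabled are hypothetical, and the GRUB_CMDLINE_LINUX_DEFAULT value is assumed to be double-quoted on a single line. The sketch saves a copy of the file with a .bak suffix before changing it.

```shell
# Append disable_ddw=1 to GRUB_CMDLINE_LINUX_DEFAULT in the given file.
# Does nothing if the parameter is already present on that line.
add_disable_ddw() {
    grub_file="$1"
    if grep -q -E '^GRUB_CMDLINE_LINUX_DEFAULT=.*disable_ddw=1' "$grub_file"; then
        return 0
    fi
    # Keep a backup, then rewrite the file from the backup.
    cp "$grub_file" "$grub_file.bak" || return 1
    sed -E 's/^(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)"[[:space:]]*$/\1\2 disable_ddw=1"/' \
        "$grub_file.bak" > "$grub_file" || return 1
    # Fail if the entry was missing or not in the assumed format.
    grep -q -E '^GRUB_CMDLINE_LINUX_DEFAULT=.*disable_ddw=1' "$grub_file"
}

# Succeed if the kernel command line (default: /proc/cmdline) has disable_ddw=1.
ddw_disabled() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -q -x 'disable_ddw=1'
}

# Typical use as root (hypothetical invocation):
#   add_disable_ddw /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   # after the next boot:
#   ddw_disabled && echo "disable_ddw=1 is active"
```

The second helper reads /proc/cmdline, which reflects the parameters of the running kernel, so it confirms the workaround only after the system has been restarted.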

Fix Outlook
IBM is working with SUSE to release a fix for this issue. The fix is expected to be delivered as part of a future SLES maintenance release. If you need a hot fix before that maintenance release, open a support ticket with SUSE.
See SUSE bug number 1167867.

Product Synonym

SUSE Linux® Enterprise Server 15, Service Pack 1

Document Information

Modified date:
07 December 2021

UID

ibm16194589