IBM Support

Unable to capture the vmcore file on an IBM Power System AC922 with 4 or 6 GPUs that is running a GPU workload

News


Abstract

An intermittent issue with kernel dump (kdump) is observed when you run a graphics processing unit (GPU) workload on an IBM® Power System AC922 (8335-GTG) server with either 4 or 6 GPUs. Occasionally, the system is unable to capture the vmcore file.

Content

Linux Releases Affected
Red Hat® Enterprise Linux 8.1, and later
Red Hat Enterprise Linux 8.2, and later
IBM Systems Affected
All IBM POWER9™ systems with GPUs.
Symptoms
When kdump is configured and the system crashes, the system occasionally hangs instead of capturing the vmcore file.
Trace information similar to the following example is logged:
[ 1271.368470] [c000001dc13d7da0] [c0000000005975b4] proc_reg_write+0x84/0x100
[ 1271.368520] [c000001dc13d7dd0] [c0000000004c8208] sys_write+0x128/0x390
[ 1271.368562] [c000001dc13d7e30] [c00000000000b388] system_call+0x5c/0x70
[ 1271.368619] Instruction dump:
[ 1271.368660] 4bfffe38 00000000 3c4c00e6 38428c10 7c0802a6 60000000 39200001 3d42ffeb
[ 1271.368724] 394a8248 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00e6 38428be0
[ 1271.368790] ---[ end trace c0d54a4d67e3d96a ]---
[ 1271.491006]
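The symptom applies only when kdump is configured, so it can help to confirm that a crash kernel is reserved and loaded before you attribute a missing vmcore file to this issue. The following Python sketch is one possible check, not an IBM-provided tool. It assumes the standard Linux interfaces /proc/cmdline (for the crashkernel= parameter) and /sys/kernel/kexec_crash_loaded (which reads 1 when a crash kernel is loaded). The function names are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def crashkernel_reserved(cmdline: str) -> bool:
    """Return True if the kernel command line carries a crashkernel= parameter."""
    return any(token.startswith("crashkernel=") for token in cmdline.split())


def crash_kernel_loaded(flag_text: str) -> bool:
    """Interpret the contents of /sys/kernel/kexec_crash_loaded ("1" means loaded)."""
    return flag_text.strip() == "1"


def kdump_armed(cmdline_path: str = "/proc/cmdline",
                loaded_path: str = "/sys/kernel/kexec_crash_loaded") -> bool:
    """Hypothetical helper: True only if memory is reserved AND a crash kernel is loaded."""
    try:
        cmdline = Path(cmdline_path).read_text()
        flag = Path(loaded_path).read_text()
    except OSError:
        # Files are missing or unreadable, so treat kdump as not armed.
        return False
    return crashkernel_reserved(cmdline) and crash_kernel_loaded(flag)
```

Calling kdump_armed() on the affected server returns True when both conditions hold. This confirms only that kdump is set up; it does not detect or prevent the hang that is described in this document.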
Workaround
There is no workaround for this issue.


Document Information

Modified date:
09 November 2020

UID

ibm16359057