IBM Support

Dump collection failed due to out of memory (OOM) error

Flashes (Alerts)


Abstract

The recommended crashkernel memory reservation for kernel dump (kdump) and firmware-assisted dump (FADump) operations is based on a few assumptions about the resources available for a given system random access memory (RAM) size. If a Power system fails to capture dumps and logs out-of-memory (OOM) errors even after using the recommended crashkernel memory reservations for kdump and FADump, the CPU core count might be higher than average for that system RAM size.

For example, when a system with more than 40 CPU cores and less than 128 GB of RAM crashes, the kdump and FADump operations are likely to fail to collect the dump. They fail in such cases because the recommended crashkernel memory size is not sufficient for the capture kernel.

Content

Linux Releases Affected
Red Hat Enterprise Linux (RHEL) 8.6, 8.7, 8.8, 8.9
Red Hat Enterprise Linux (RHEL) 9.1, 9.2, 9.3
IBM Systems Affected

Power10 systems

Symptoms

If the capture kernel boots with the following backtrace when a system crashes, then the dump capture is likely to fail:

[    4.467979] swapper/2 invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=0, oom_score_adj=0
[    4.467992] CPU: 2 PID: 1 Comm: swapper/2 Not tainted 5.14.0-362.el9.ppc64le #1
[    4.467999] Call Trace:
[    4.468002] [c00000001684b3e0] [c000000010897fa0] dump_stack_lvl+0x74/0xa8 (unreliable)
[    4.468019] [c00000001684b420] [c000000010441a78] dump_header+0x64/0x250
[    4.468026] [c00000001684b4a0] [c000000010441910] out_of_memory+0x3d0/0x440
[    4.468030] [c00000001684b530] [c0000000104d1aa4] __alloc_pages_may_oom+0x154/0x230
[    4.468035] [c00000001684b5d0] [c0000000104d262c] __alloc_pages_slowpath.constprop.0+0x78c/0xb40
[    4.468038] [c00000001684b720] [c0000000104d2bf0] __alloc_pages+0x210/0x2b0
[    4.468042] [c00000001684b7b0] [c0000000105064c0] alloc_page_interleave+0x30/0xb0
[    4.468047] [c00000001684b7e0] [c00000001051b928] allocate_slab+0x4e8/0x570
[    4.468051] [c00000001684b850] [c00000001051fff8] ___slab_alloc+0x468/0x8c0
[    4.468055] [c00000001684b960] [c000000010523b44] kmem_cache_alloc+0x1e4/0x620
[    4.468058] [c00000001684b9c0] [c000000010138444] create_events_from_catalog.constprop.0+0x44/0xd50
[    4.468065] [c00000001684bb50] [c0000000101393c4] hv_24x7_init+0xe4/0x260
[    4.468068] [c00000001684bbd0] [c000000010012120] do_one_initcall+0x60/0x2c0
[    4.468072] [c00000001684bca0] [c0000000120053c4] do_initcalls+0x13c/0x190
[    4.468079] [c00000001684bd50] [c0000000120056f4] kernel_init_freeable+0x240/0x2b4
[    4.468082] [c00000001684bdb0] [c000000010012730] kernel_init+0x30/0x1a0
[    4.468085] [c00000001684be10] [c00000001000cd64] ret_from_kernel_thread+0x5c/0x64
[    4.468089] Mem-Info:
[    4.468093] active_anon:0 inactive_anon:658 isolated_anon:0
A slight change in system configuration can lead to the same problem with a different backtrace. However, any backtrace that includes an out_of_memory function call, or any log line showing that the oom-killer was invoked, indicates that the dump capture is likely to fail.
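The check described above can be sketched as a small log scan. This is a minimal illustration, not an IBM or Red Hat tool: the function name `find_oom_lines` and the two patterns are assumptions chosen to match the markers named in the text (an "invoked oom-killer" line and an out_of_memory frame in the call trace).

```python
import re
from typing import List

# Hypothetical patterns for the two markers described in the text:
# the oom-killer invocation line and an out_of_memory call-trace frame.
OOM_PATTERNS = (
    re.compile(r"invoked oom-killer"),
    re.compile(r"\bout_of_memory\b"),
)


def find_oom_lines(log_text: str) -> List[str]:
    """Return the log lines that suggest the capture kernel ran out of memory."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]


if __name__ == "__main__":
    sample = (
        "[    4.467979] swapper/2 invoked oom-killer: gfp_mask=0x40cc0, order=0\n"
        "[    4.468026] [c00000001684b4a0] [c000000010441910] out_of_memory+0x3d0/0x440\n"
        "[    4.468089] Mem-Info:\n"
    )
    for hit in find_oom_lines(sample):
        print(hit)
```

If the function returns any lines for the capture kernel's console log, the crashkernel reservation is a likely candidate for revision.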
Workaround
As a workaround, you can revise the recommended crashkernel memory reservation for kdump and FADump by using the following formula:
Revised crashkernel size = Recommended crashkernel size + (Number of CPU cores * X)
where X = 12 MB for kdump and 18 MB for FADump
For example, if the recommended crashkernel size for a system is 2048 M and the system has 50 CPU cores, then the revised crashkernel value is calculated as follows:
  • For kdump: Revised crashkernel size = 2048 M + (50*12 M) = 2648 M
  • For FADump: Revised crashkernel size = 2048 M + (50*18 M) = 2948 M
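The formula above can also be expressed as a short calculation. This is only a sketch of the arithmetic: the function name `revised_crashkernel_mb` and the `PER_CORE_MB` table are hypothetical names introduced for illustration, and the script does not read or modify any system configuration.

```python
# Per-core additions (X in the formula), in MB, as given in the workaround.
PER_CORE_MB = {"kdump": 12, "fadump": 18}


def revised_crashkernel_mb(recommended_mb: int, cpu_cores: int, mechanism: str) -> int:
    """Return recommended size + (CPU cores * X), in MB."""
    key = mechanism.lower()
    if key not in PER_CORE_MB:
        raise ValueError(f"unknown dump mechanism: {mechanism!r}")
    if recommended_mb < 0 or cpu_cores < 0:
        raise ValueError("sizes and core counts must be non-negative")
    return recommended_mb + cpu_cores * PER_CORE_MB[key]


if __name__ == "__main__":
    # Values from the example: 2048 M recommended, 50 CPU cores.
    for mechanism in ("kdump", "FADump"):
        size = revised_crashkernel_mb(2048, 50, mechanism)
        print(f"{mechanism}: {size} M")
```

For the example values, the function yields 2048 + 600 for kdump and 2048 + 900 for FADump.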
Fix Outlook
Red Hat Bug: 2236564
I/O device impacted

None


Document Information

Modified date:
15 November 2023

UID

ibm17060322