Flashes (Alerts)
Abstract
Power Systems with Broadcom Emulex Fibre Channel adapters might fail to add or remove CPUs by using Dynamic Logical Partitioning (DLPAR). This failure can be seen in Red Hat Enterprise Linux 8, Red Hat Enterprise Linux 9.x, and SUSE Linux Enterprise Server 15.
Content
Power Systems with Emulex FC adapters
When DLPAR operations are performed on CPUs while an Emulex FC adapter is installed, the adapter driver (lpfc) might not register the addition of new CPUs or the removal of active CPUs. This failure might cause a soft lockup on the system, with a trace similar to the following:
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
Modules linked in: rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding ip_set nf_tables nfnetlink dm_service_time dm_multipath pseries_rng mlx5_ib xts vmx_crypto ib_uverbs ib_core binfmt_misc xfs libcrc32c sd_mod sg ibmvscsi scsi_transport_srp ibmveth mlx5_core lpfc nvmet_fc nvmet nvme_fc nvme_fabrics mlxfw nvme_core tls t10_pi scsi_transport_fc psample dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nft_compat]
CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 4.18.0-408.el8.ppc64le #1
NIP: c00800000f0fbbfc LR: c00800000f1135dc CTR: c00800000f0ff598
REGS: c0000000021832f0 TRAP: 0901 Not tainted (4.18.0-408.el8.ppc64le)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28000288 XER: 00000003
CFAR: c00800000f0fb7cc IRQMASK: 0
GPR00: c00800000f1135dc c000000002183580 c00800000f23ae00 c00000362cfc0000
GPR04: c000003635119348 0000000000000000 0000000000000004 0000000000000001
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000002
GPR12: c00800000f0ff598 c000000002e10000
NIP [c00800000f0fbbfc] lpfc_sli4_process_eq+0x544/0x710 [lpfc]
LR [c00800000f1135dc] lpfc_sli4_poll_hbtimer+0xc4/0xe0 [lpfc]
Call Trace:
[c000000002183640] [c00800000f1135dc] lpfc_sli4_poll_hbtimer+0xc4/0xe0 [lpfc]
[c000000002183680] [c00000000025bb70] call_timer_fn+0x50/0x200
[c000000002183710] [c00000000025be68] expire_timers+0x148/0x230
[c000000002183780] [c00000000025c7f0] run_timer_softirq+0x3f0/0xe80
[c000000002183850] [c000000000f6370c] __do_softirq+0x16c/0x3e4
[c000000002183940] [c000000000179f94] irq_exit_rcu+0x1a4/0x1d0
[c000000002183970] [c000000000179fe0] irq_exit+0x20/0x40
[c000000002183990] [c000000000020958] timer_interrupt+0x128/0x2f0
[c0000000021839f0] [c0000000000091b0] decrementer_common+0x110/0x120
--- interrupt: 901 at plpar_hcall_norets+0x1c/0x28
LR = dedicated_cede_loop+0x168/0x1d0
[c000000002183cf0] [c0000000021c754c] cpu_idle_force_poll+0x0/0x4 (unreliable)
[c000000002183d70] [c000000000bb21ac] cpuidle_enter_state+0x33c/0x7e0
[c000000002183de0] [c000000000bb26f0] cpuidle_enter+0x50/0x70
[c000000002183e20] [c0000000001cdf38] do_idle+0x3d8/0x470
[c000000002183ea0] [c0000000001ce218] cpu_startup_entry+0x38/0x40
[c000000002183ed0] [c0000000000106c4] rest_init+0xe0/0xf8
[c000000002183f00] [c0000000016a44b0] start_kernel+0x690/0x6cc
[c000000002183f90] [c00000000000adcc] start_here_common+0x1c/0x550
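The signature in the trace above (a soft lockup report followed by frames that resolve to symbols in the lpfc module) can be searched for in a saved kernel log. The following Python sketch is one illustrative way to do that. The function name `find_lpfc_soft_lockups` and the `window` parameter are hypothetical and are not part of any IBM or Broadcom tool. The sketch assumes that the log uses the same line format as the example trace.

```python
import re

# Matches the watchdog report line and captures the CPU number.
LOCKUP_RE = re.compile(r"watchdog: BUG: soft lockup - CPU#(\d+)")
# Matches a frame such as "lpfc_sli4_process_eq+0x544/0x710 [lpfc]".
LPFC_FRAME_RE = re.compile(r"\b(lpfc_\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[lpfc\]")


def find_lpfc_soft_lockups(log_text, window=40):
    """Return (cpu, [lpfc symbols]) for each soft lockup that shows lpfc frames.

    `window` is an assumed upper bound on how many lines after the
    watchdog line belong to the same report.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = LOCKUP_RE.search(line)
        if not match:
            continue
        symbols = []
        for follow in lines[i + 1 : i + 1 + window]:
            if LOCKUP_RE.search(follow):
                break  # the next report starts here
            for sym in LPFC_FRAME_RE.findall(follow):
                if sym not in symbols:
                    symbols.append(sym)
        if symbols:
            hits.append((int(match.group(1)), symbols))
    return hits
```

To use it, pass the text of a saved `dmesg` or console log to the function. An empty result means only that this particular pattern was not found in the log, not that the system is unaffected.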
There is currently no workaround for this issue. Instead of using a DLPAR operation, shut down the logical partition before adding or removing CPUs from the configuration.
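Before a CPU change is planned, it can help to know whether the lpfc driver is loaded on the partition. The following Python sketch checks the contents of `/proc/modules` for it. The function names are hypothetical. The check is conservative and rests on an assumption: a loaded lpfc module suggests that an Emulex FC adapter is in use, but it does not prove that the adapter is present or that the failure will occur.

```python
def is_module_loaded(proc_modules_text, name="lpfc"):
    """Return True if `name` is the first field of any /proc/modules line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def lpfc_loaded_on_this_system(path="/proc/modules"):
    """Read the module list of the running Linux kernel and check for lpfc."""
    with open(path, encoding="utf-8") as handle:
        return is_module_loaded(handle.read())
```

If `lpfc_loaded_on_this_system()` returns `True`, the guidance above applies: shut down the logical partition for the CPU change rather than relying on DLPAR.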
For more information about DLPAR, see Dynamic logical partitioning.
Document Information
Modified date:
21 December 2022
UID
ibm16847539