APAR status
Closed as program error.
Error description
A corrupted directory causes the GPFS daemon to crash with Assert exp(offset < ddbP->mappedLen). The assert can be hit when the corrupted directory is accessed, or when the file system manager is recovering the log.

Reported in: Spectrum Scale 5.0.1 on RHEL7

Verification steps: Look for either of the following two types of error:

(1)
[X] logAssertFailed: offset < ddbP->mappedLen
[X] return code 0, reason code 0, log record tag 0
[I] Freezing overwrite mode tracing to preserve failure data
[X] *** Assert exp(offset < ddbP->mappedLen) in line 421 of file /project/spreltac502/build/rtac5021835d/export/x86_64-linux/usr/include/mmfs/cxi/cxiIOBuffer.h
[E] *** Traceback:
[E] 2:0x7F4603D05D18 logAssertFailed + 0x418 at ??:0
[E] 3:0x7F460391D43E Direct::dinitContig(DirLayout const*, StripeGroup*, FileUID const&, int, char*, int, int, DirReservations*, KernelOperation*) + 0x23E at ??:0
[E] 4:0x7F460392ED65 Direct::dinitContigInode(StripeGroup*, FileUID const&, char*, int, char**, int) + 0x185 at ??:0
[E] 5:0x7F460393F6C0 RecoverDirdataInInode(StripeGroup*, LogRecovery*, LogFile*, LogRecordType, long long, int, unsigned int*, char*, int*, RepDiskAddr) + 0x680 at ??:0
[E] 6:0x7F4603CE40D1 LogRecovery::recoverOneObject(long long) + 0x1E1 at ??:0
[E] 7:0x7F46037C70C2 MultiThreadWork::doNextStep() + 0xC2 at ??:0
[E] 8:0x7F46037C755B MultiThreadWork::helperThreadBody(void*) + 0xCB at ??:0
[E] 9:0x7F4603811FC6 Thread::callBody(Thread*) + 0x46 at ??:0
[E] 10:0x7F46037FFB02 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0
[E] 11:0x7F4602EA2DC5 start_thread + 0xC5 at ??:0
[E] 12:0x7F4601FAA76D __clone + 0x6D at ??:0
[N] Restarting mmsdrserv
[E] Signal 6 at location 0x7F4601F7166D in process 3843, link reg 0xFFFFFFFFFFFFFFFF.
[N] mmfsd is shutting down.
[N] Reason for shutdown: Signal handler entered

(2)
[X] logAssertFailed: (!"kernel requested to die")
[X] return code 0, reason code 0, log record tag 0
[X] *** Assert exp((!"kernel requested to die") threadId 22204 Failure at line 421 in file /project/sprelttn423/build/rttn423s006a/export/x86_64-linux/usr/include/mmfs/cxi/cxiIOBuffer.h rc 15360 reason 4096 data (offset < ddbP->mappedLen)) in line 383 of file /project/sprelttn423/build/rttn423s006a/src/avs/fs/mmfs/ts/fs/svfs.C
[E] *** Traceback:
[E] 2:0x7F9548ADAFB6 logAssertFailed + 0x1B6 at ??:0
[E] 3:0x7F95488AFB05 HandleMBDaemonToDie(MBDaemonToDieParms*) + 0x75 at ??:0
[E] 4:0x7F95485F5131 Mailbox::msgHandlerBody(void*) + 0x3D1 at ??:0
[E] 5:0x7F95485D9EA6 Thread::callBody(Thread*) + 0x46 at ??:0
[E] 6:0x7F95485C7442 Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0
[E] 7:0x7F9547C8D9D1 start_thread + 0xD1 at ??:0
[E] 8:0x7F9546E318FD clone + 0x6D at ??:0
mmfsd: /project/sprelttn423/build/rttn423s006a/src/avs/fs/mmfs/ts/fs/svfs.C:383: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion '(!"kernel requested to die") threadId 22204 Failure at line 421 in file /project/sprelttn423/build/rttn423s006a/export/x86_64-linux/usr/include/mmfs/cxi/cxiIOBuffer.h rc 15360 reason 4096 data (offset < ddbP->mappedLen)' failed.
[E] Signal 6 at location 0x7F9546D7B625 in process 5168, link reg 0xFFFFFFFFFFFFFFFF.
[I] rax 0x0000000000000000 rbx 0x00007F9548296000
[I] rcx 0xFFFFFFFFFFFFFFFF rdx 0x0000000000000006
[I] rsp 0x00007F953F5A7948 rbp 0x00007F9549759908
[I] rsi 0x00000000000016CE rdi 0x0000000000001430
[I] r8 0xFEFEFEFEFEFEFEFF r9 0xFF092D63646B6860
[I] r10 0x0000000000000008 r11 0x0000000000000206
[I] r12 0x00007F953F5A7B30 r13 0x00007F95497AB320
[I] r14 0x00007F9549F56C00 r15 0x0000000000000000
[I] rip 0x00007F9546D7B625 eflags 0x0000000000000206
[I] csgsfs 0x0000000000000033 err 0x0000000000000000
[I] trapno 0x0000000000000000 oldmsk 0x0000000010017807
[I] cr2 0x0000000000000000
[D] Traceback:
[D] 0:00007F9546D7B625 raise + 35 at ??:0
[D] 1:00007F9546D7CE05 abort + 175 at ??:0
[D] 2:00007F9546D7474E __assert_fail_base + 11E at ??:0
[D] 3:00007F9546D74810 __assert_fail + 50 at ??:0
[D] 4:00007F9548ADAFDA logAssertFailed + 1DA at ??:0
[D] 5:00007F95488AFB05 HandleMBDaemonToDie(MBDaemonToDieParms*) + 75 at ??:0
[D] 6:00007F95485F5131 Mailbox::msgHandlerBody(void*) + 3D1 at ??:0
[D] 7:00007F95485D9EA6 Thread::callBody(Thread*) + 46 at ??:0
[D] 8:00007F95485C7442 Thread::callBodyWrapper(Thread*) + A2 at ??:0
[D] 9:00007F9547C8D9D1 start_thread + D1 at ??:0
[D] 10:00007F9546E318FD clone + 6D at ??:0
[N] Restarting mmsdrserv
[E] Signal 6 at location 0x7F9546DF5A3D in process 5168, link reg 0xFFFFFFFFFFFFFFFF.
[N] mmfsd is shutting down.
[N] Reason for shutdown: Signal handler entered
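The two error types in the verification steps can be told apart by searching a saved copy of the daemon log (typically /var/adm/ras/mmfs.log.latest) for their leading messages. The following is only a minimal sketch: the function name classify_assert and both regular expressions are illustrative assumptions derived from the excerpts above, not part of any GPFS tooling.

```python
import re

# Type (1): the daemon itself reports the failed bounds check.
DIRECT_ASSERT = re.compile(r"logAssertFailed: offset < ddbP->mappedLen")

# Type (2): the kernel asks the daemon to die and passes the failed
# expression along as data. DOTALL plus a bounded gap tolerates wrapped lines.
KERNEL_ASSERT = re.compile(
    r"kernel requested to die.{0,400}?offset < ddbP->mappedLen", re.DOTALL
)


def classify_assert(log_text):
    """Return a sorted list of the error types (1 and/or 2) found in log_text.

    Hypothetical helper: it only matches the message text quoted in this APAR.
    """
    found = []
    if DIRECT_ASSERT.search(log_text):
        found.append(1)
    if KERNEL_ASSERT.search(log_text):
        found.append(2)
    return found


if __name__ == "__main__":
    import sys

    # Usage: python classify_assert.py /path/to/saved/mmfs.log
    with open(sys.argv[1], errors="replace") as fh:
        print(classify_assert(fh.read()))
```

An empty result only means that neither message text was found in the file. It does not rule out directory corruption.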
Local fix
Problem summary
GPFS hits the log assert "(offset < ddbP->mappedLen)" while accessing corrupted data in an inode directory.
Problem conclusion
The log assert is replaced with an fsstruct error when corrupted data in an inode directory is accessed.
Work Around: Run offline fsck
Problem trigger: GPFS defect D.1054097 may corrupt data in an inode directory. Accessing this kind of directory triggers the log assert, which abends the mmfsd daemon or crashes the kernel.
Symptom: Abend/Crash
Platforms affected: ALL Operating System environments
Functional Area affected: All Scale Users
Customer Impact: High Importance
Changed Externals: None
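The change described in the conclusion follows a general pattern: a bounds check that used to abort the whole process is turned into an error that the caller can report and survive. The sketch below illustrates that pattern only. It is not GPFS code, and the names FsStructError, read_byte_asserting and read_byte_checked are hypothetical.

```python
class FsStructError(Exception):
    """Hypothetical stand-in for a reported file system structure error."""


def read_byte_asserting(mapped, offset):
    # Before (sketch): a failed check takes the process down, analogous to
    # the daemon abend caused by Assert exp(offset < ddbP->mappedLen).
    assert offset < len(mapped), "offset < mappedLen"
    return mapped[offset]


def read_byte_checked(mapped, offset):
    # After (sketch): the same condition is validated explicitly and a
    # recoverable error is raised, so the caller can log it and continue.
    if not 0 <= offset < len(mapped):
        raise FsStructError(
            "offset %d outside mapped length %d" % (offset, len(mapped))
        )
    return mapped[offset]


if __name__ == "__main__":
    block = b"\x01\x02\x03"
    try:
        read_byte_checked(block, 7)  # simulated corrupted offset
    except FsStructError as err:
        print("structure error reported:", err)
```

With the checked variant, a corrupted offset surfaces as a reported error for the affected directory rather than a crash. The offline fsck listed as the workaround is still what repairs the underlying corruption.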
Temporary fix
Comments
APAR Information
APAR number
IJ12735
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
502
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2019-01-15
Closed date
2019-01-15
Last modified date
2019-02-12
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
Document Information
Modified date:
12 February 2019