IBM Support

IJ12735: GPFS DAEMON CRASH: ASSERT EXP(OFFSET < DDBP->MAPPEDLEN)

APAR status

  • Closed as program error.

Error description

  • A corrupted directory causes the GPFS daemon to crash with
    Assert exp(offset < ddbP->mappedLen). The assert can be hit
    when the corrupted directory is accessed or when the
    file system manager is recovering the log.
    
    
    Reported in:
    Spectrum Scale 5.0.1 on RHEL7
    
    Verification steps:
    
    One of the following two error signatures appears in the log:
    
    (1)
    [X] logAssertFailed: offset < ddbP->mappedLen
    [X] return code 0, reason code 0, log record tag 0
    [I] Freezing overwrite mode tracing to preserve failure
    data
    [X] *** Assert exp(offset < ddbP->mappedLen) in line 421
    of file
    /project/spreltac502/build/rtac5021835d/export/x86_64-linux/usr/include/mmfs/cxi/cxiIOBuffer.h
    [E] *** Traceback:
    [E]         2:0x7F4603D05D18 logAssertFailed + 0x418 at
    ??:0
    [E]         3:0x7F460391D43E
    Direct::dinitContig(DirLayout const*, StripeGroup*,
    FileUID const&, int, char*, int, int, DirReservations*,
    KernelOperation*) + 0x23E at ??:0
    [E]         4:0x7F460392ED65
    Direct::dinitContigInode(StripeGroup*, FileUID const&,
    char*, int, char**, int) + 0x185 at ??:0
    [E]         5:0x7F460393F6C0
    RecoverDirdataInInode(StripeGroup*, LogRecovery*,
    LogFile*, LogRecordType, long long, int, unsigned int*,
    char*, int*, RepDiskAddr) + 0x680 at ??:0
    [E]         6:0x7F4603CE40D1
    LogRecovery::recoverOneObject(long long) + 0x1E1 at ??:0
    [E]         7:0x7F46037C70C2
    MultiThreadWork::doNextStep() + 0xC2 at ??:0
    [E]         8:0x7F46037C755B
    MultiThreadWork::helperThreadBody(void*) + 0xCB at ??:0
    [E]         9:0x7F4603811FC6 Thread::callBody(Thread*) +
    0x46 at ??:0
    [E]         10:0x7F46037FFB02
    Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0
    [E]         11:0x7F4602EA2DC5 start_thread + 0xC5 at ??:0
    [E]         12:0x7F4601FAA76D __clone + 0x6D at ??:0
    [N] Restarting mmsdrserv
    [E] Signal 6 at location 0x7F4601F7166D in process 3843,
    link reg 0xFFFFFFFFFFFFFFFF.
    [N] mmfsd is shutting down.
    [N] Reason for shutdown: Signal handler entered
    
    (2)
    [X] logAssertFailed: (!"kernel requested to die")
    [X] return code 0, reason code 0, log record tag 0
    [X] *** Assert exp((!"kernel requested to die")
    threadId 22204 Failure at line 421 in file
    /project/sprelttn423/build/rttn423s006a/export/x86_64-linux/usr/include/mmfs/cxi/cxiIOBuffer.h
    rc 15360 reason 4096 data
    (offset < ddbP->mappedLen)) in line 383 of file
    /project/sprelttn423/build/rttn423s006a/src/avs/fs/mmfs/ts/fs/svfs.C
    [E] *** Traceback:
    [E]         2:0x7F9548ADAFB6 logAssertFailed + 0x1B6 at
    ??:0
    [E]         3:0x7F95488AFB05
    HandleMBDaemonToDie(MBDaemonToDieParms*) + 0x75 at ??:0
    [E]         4:0x7F95485F5131
    Mailbox::msgHandlerBody(void*) + 0x3D1 at ??:0
    [E]         5:0x7F95485D9EA6 Thread::callBody(Thread*) +
    0x46 at ??:0
    [E]         6:0x7F95485C7442
    Thread::callBodyWrapper(Thread*) + 0xA2 at ??:0
    [E]         7:0x7F9547C8D9D1 start_thread + 0xD1 at ??:0
    [E]         8:0x7F9546E318FD clone + 0x6D at ??:0
    mmfsd:
    /project/sprelttn423/build/rttn423s006a/src/avs/fs/mmfs/ts/fs/svfs.C:383:
    void logAssertFailed(UInt32, const
    char*, UInt32, Int32, Int32, UInt32, const char*, const
    char*): Assertion '(!"kernel requested to die")
    threadId 22204 Failure at line 421 in file
    /project/sprelttn423/build/rttn423s006a/export/x86_64-linux/usr/include/mmfs/cxi/cxiIOBuffer.h
    rc 15360 reason 4096 data
    (offset < ddbP->mappedLen)' failed.
    [E] Signal 6 at location 0x7F9546D7B625 in process 5168,
    link reg 0xFFFFFFFFFFFFFFFF.
    [I] rax    0x0000000000000000  rbx    0x00007F9548296000
    [I] rcx    0xFFFFFFFFFFFFFFFF  rdx    0x0000000000000006
    [I] rsp    0x00007F953F5A7948  rbp    0x00007F9549759908
    [I] rsi    0x00000000000016CE  rdi    0x0000000000001430
    [I] r8     0xFEFEFEFEFEFEFEFF  r9     0xFF092D63646B6860
    [I] r10    0x0000000000000008  r11    0x0000000000000206
    [I] r12    0x00007F953F5A7B30  r13    0x00007F95497AB320
    [I] r14    0x00007F9549F56C00  r15    0x0000000000000000
    [I] rip    0x00007F9546D7B625  eflags 0x0000000000000206
    [I] csgsfs 0x0000000000000033  err    0x0000000000000000
    [I] trapno 0x0000000000000000  oldmsk 0x0000000010017807
    [I] cr2    0x0000000000000000
    [D] Traceback:
    [D] 0:00007F9546D7B625 raise + 35 at ??:0
    [D] 1:00007F9546D7CE05 abort + 175 at ??:0
    [D] 2:00007F9546D7474E __assert_fail_base + 11E at ??:0
    [D] 3:00007F9546D74810 __assert_fail + 50 at ??:0
    [D] 4:00007F9548ADAFDA logAssertFailed + 1DA at ??:0
    [D] 5:00007F95488AFB05
    HandleMBDaemonToDie(MBDaemonToDieParms*) + 75 at ??:0
    [D] 6:00007F95485F5131 Mailbox::msgHandlerBody(void*) +
    3D1 at ??:0
    [D] 7:00007F95485D9EA6 Thread::callBody(Thread*) + 46 at
    ??:0
    [D] 8:00007F95485C7442 Thread::callBodyWrapper(Thread*) +
    A2 at ??:0
    [D] 9:00007F9547C8D9D1 start_thread + D1 at ??:0
    [D] 10:00007F9546E318FD clone + 6D at ??:0
    [N] Restarting mmsdrserv
    [E] Signal 6 at location 0x7F9546DF5A3D in process 5168,
    link reg 0xFFFFFFFFFFFFFFFF.
    [N] mmfsd is shutting down.
    [N] Reason for shutdown: Signal handler entered
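
The two signatures above can be told apart mechanically. The following is a minimal sketch, assuming the log text has already been read into a string; the function name `classify_assert` and the labels `"type1"` and `"type2"` are illustrative and are not part of any GPFS tool. Only the two match strings are taken from the excerpts above.

```python
import re
from typing import Optional

# Strings copied from the two log excerpts in this APAR.
ASSERT_EXPR = "offset < ddbP->mappedLen"
KERNEL_DIE = "kernel requested to die"


def classify_assert(log_text: str) -> Optional[str]:
    """Classify a log excerpt as signature (1), signature (2), or neither.

    Hypothetical helper for illustration only.
    """
    # Log lines may be hard-wrapped, so collapse whitespace runs first.
    flat = re.sub(r"\s+", " ", log_text)
    if ASSERT_EXPR not in flat:
        return None  # the assert from this APAR is not present
    if KERNEL_DIE in flat:
        return "type2"  # assert reported through "kernel requested to die"
    return "type1"  # assert raised directly by the daemon
```

A match only indicates that the assert expression appears in the text; it does not by itself confirm that a directory is corrupted.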
    

Local fix

Problem summary

  • GPFS hits the log assert "(offset < ddbP->mappedLen)" while
    accessing corrupted data in an inode directory.
    

Problem conclusion

  • The log assert is replaced with an fsstruct error when
    corrupted data in an inode directory is accessed.
    
    Work Around:
    Run offline fsck
    
    Problem trigger:
    GPFS defect D.1054097 may corrupt data in an inode directory.
    Accessing a directory of this kind triggers the log assert,
    which abends the mmfsd daemon or crashes the kernel.
    
    Symptom:
    Abend/Crash
    
    Platforms affected:
    ALL Operating System environments
    
    Functional Area affected:
    All Scale Users
    
    Customer Impact:
    High Importance
    
    Changed Externals:
    None
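
The Problem conclusion describes replacing a fatal assert with a reported error. A minimal sketch of that general pattern, written in Python rather than in the daemon's own source language, might look like the following. Every name here (`FsStructError`, `DirBlockBuffer`, `read_checked`) is hypothetical and chosen for illustration; none is taken from GPFS, and the actual fix may differ.

```python
from dataclasses import dataclass


class FsStructError(Exception):
    """Hypothetical stand-in for a recoverable file system structure error."""


@dataclass
class DirBlockBuffer:
    """Hypothetical model of a mapped buffer; only mapped_len matters here."""
    data: bytes
    mapped_len: int


def read_checked(buf: DirBlockBuffer, offset: int, length: int) -> bytes:
    """Read from the buffer, reporting a bad offset instead of asserting.

    An assert-based version would abort the whole process on corrupted
    input. Raising an error lets the caller log the problem and continue.
    """
    if not (0 <= offset < buf.mapped_len):
        raise FsStructError(
            f"offset {offset} outside mapped length {buf.mapped_len}"
        )
    # Never read past the mapped portion of the buffer.
    end = min(offset + length, buf.mapped_len)
    return buf.data[offset:end]
```

With this shape, a caller that meets corrupted data can catch the error, record it, and keep running, which matches the intent stated in the conclusion.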
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ12735

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    502

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-01-15

  • Closed date

    2019-01-15

  • Last modified date

    2019-02-12

  • APAR is sysrouted FROM one or more of the following:

    IJ12683

  • APAR is sysrouted TO one or more of the following:

    IJ13560

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Applicable component levels

  • IBM Spectrum Scale, version 502, Platform Independent
    (Business Unit: IBM Infrastructure w/TPS; Line of Business: Storage)

Document Information

Modified date:
12 February 2019