APAR status
Closed as program error.
Error description
Under certain timing conditions, a migrate or recall operation might deadlock itself. It tries to create a temporary file in the /fs/.SpaceMan/logdir/ directory while it still holds an exclusive right on the file object that is being migrated or recalled on the same HSM-managed file system.

Customer/Support Diagnostics (If Applicable):
The following script might show potentially hanging dsmmigrate and dsmrecalld processes in the system log (for example, /var/log/messages on Linux):

    for i in /var/log/messages* ; do
        who=$(grep "blocked for more than" "$i" | awk '{gsub(":", " ", $0); print $10}' | sort | uniq -c)
        echo $i, $who
    done

Sample output:

    ...
    /var/log/messages, 4 dsmmigrate 1 dsmrecalld
    ...

The awk field $10 assumes the high-precision syslog timestamp format shown in the excerpts below. With a different timestamp format, adjust the field number.

The system log (for example, /var/log/messages on Linux) might also contain related dsmmigrate and dsmrecalld messages showing the hang inside the Linux kernel filename_lookup() VFS call:

    ...
    2021-03-29T00:13:38.403410+11:00 hostname kernel: [6364412.376579] INFO: task dsmmigrate:73490 blocked for more than 120 seconds.
    2021-03-29T00:13:38.403428+11:00 hostname kernel: [6364412.378576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    2021-03-29T00:13:38.403430+11:00 hostname kernel: [6364412.380530] dsmmigrate D ffff88103df99208 0 73490 73488 0x00000080
    2021-03-29T00:13:38.403431+11:00 hostname kernel: [6364412.380536] ffff880493f7f440 0000000000000086 ffff88054ffa5ee0 ffff880493f7ffd8
    2021-03-29T00:13:38.403432+11:00 hostname kernel: [6364412.380540] ffff880493f7ffd8 ffff880493f7ffd8 ffff88054ffa5ee0 ffff88103df99218
    2021-03-29T00:13:38.403433+11:00 hostname kernel: [6364412.380542] 0000000000000000 ffff88054ffa5ee0 ffffffffc0c75760 ffff88103df99208
    2021-03-29T00:13:38.403434+11:00 hostname kernel: [6364412.380546] Call Trace:
    2021-03-29T00:13:38.403436+11:00 hostname kernel: [6364412.380556] [<ffffffff816a94e9>] schedule+0x29/0x70
    2021-03-29T00:13:38.403437+11:00 hostname kernel: [6364412.380581] [<ffffffffc070dee1>] cxiWaitEventWait+0x1d1/0x2f0 [mmfslinux]
    ...
    2021-03-29T00:13:38.403460+11:00 hostname kernel: [6364412.380980] [<ffffffff8120f34b>] filename_lookup+0x2b/0xc0
    2021-03-29T00:13:38.403461+11:00 hostname kernel: [6364412.380983] [<ffffffff81212ec7>] user_path_at_empty+0x67/0xc0
    2021-03-29T00:13:38.403476+11:00 hostname kernel: [6364412.380985] [<ffffffff81212f31>] user_path_at+0x11/0x20
    2021-03-29T00:13:38.403477+11:00 hostname kernel: [6364412.380990] [<ffffffff81206473>] vfs_fstatat+0x63/0xc0
    2021-03-29T00:13:38.403478+11:00 hostname kernel: [6364412.380993] [<ffffffff81206a41>] SYSC_newlstat+0x31/0x60
    2021-03-29T00:13:38.403479+11:00 hostname kernel: [6364412.380999] [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
    2021-03-29T00:13:38.403480+11:00 hostname kernel: [6364412.381002] [<ffffffff81206cce>] SyS_newlstat+0xe/0x10
    2021-03-29T00:13:38.403482+11:00 hostname kernel: [6364412.381007] [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
    ...
    2021-03-30T08:13:38.416027+11:00 hostname kernel: [6479609.674271] INFO: task dsmrecalld:161985 blocked for more than 120 seconds.
    2021-03-30T08:13:38.416047+11:00 hostname kernel: [6479609.676145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    2021-03-30T08:13:38.416048+11:00 hostname kernel: [6479609.677961] dsmrecalld D ffff881594fc71c8 0 161985 161960 0x00000080
    2021-03-30T08:13:38.416074+11:00 hostname kernel: [6479609.677967] ffff8813cabb3400 0000000000000086 ffff88201eb3eeb0 ffff8813cabb3fd8
    2021-03-30T08:13:38.416075+11:00 hostname kernel: [6479609.677972] ffff8813cabb3fd8 ffff8813cabb3fd8 ffff88201eb3eeb0 ffff881594fc71d8
    2021-03-30T08:13:38.416077+11:00 hostname kernel: [6479609.677975] 0000000000000000 ffff88201eb3eeb0 ffffffffc0c75eb0 ffff881594fc71c8
    2021-03-30T08:13:38.416078+11:00 hostname kernel: [6479609.677980] Call Trace:
    2021-03-30T08:13:38.416079+11:00 hostname kernel: [6479609.677991] [<ffffffff816a94e9>] schedule+0x29/0x70
    2021-03-30T08:13:38.416080+11:00 hostname kernel: [6479609.678013] [<ffffffffc070dee1>] cxiWaitEventWait+0x1d1/0x2f0 [mmfslinux]
    ...
    2021-03-30T08:13:38.417064+11:00 hostname kernel: [6479609.678439] [<ffffffff8120f34b>] filename_lookup+0x2b/0xc0
    2021-03-30T08:13:38.417065+11:00 hostname kernel: [6479609.678442] [<ffffffff81212ec7>] user_path_at_empty+0x67/0xc0
    2021-03-30T08:13:38.417065+11:00 hostname kernel: [6479609.678454] [<ffffffffc06fe98f>] ? ss_fs_unlocked_ioctl+0x13f/0x530 [mmfslinux]
    2021-03-30T08:13:38.417066+11:00 hostname kernel: [6479609.678457] [<ffffffff81212f31>] user_path_at+0x11/0x20
    2021-03-30T08:13:38.417067+11:00 hostname kernel: [6479609.678461] [<ffffffff81206473>] vfs_fstatat+0x63/0xc0
    2021-03-30T08:13:38.417068+11:00 hostname kernel: [6479609.678465] [<ffffffff812069de>] SYSC_newstat+0x2e/0x60
    2021-03-30T08:13:38.417069+11:00 hostname kernel: [6479609.678468] [<ffffffff81200dfa>] ? vfs_write+0x17a/0x1e0
    2021-03-30T08:13:38.417070+11:00 hostname kernel: [6479609.678472] [<ffffffff8111f5c6>] ? __audit_syscall_exit+0x1e6/0x280
    2021-03-30T08:13:38.417071+11:00 hostname kernel: [6479609.678476] [<ffffffff81206cbe>] SyS_newstat+0xe/0x10
    2021-03-30T08:13:38.417071+11:00 hostname kernel: [6479609.678481] [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
    ...
    2021-03-30T08:14:04.121036+11:00 hostname dsmwatchd: HSM(pid:9824): Restart local recall service. Reason: Recall service hang
    ...

IBM Spectrum Protect Versions Affected: Space Management Client for UNIX and Linux 7.1.x and 8.1.x on all supported platforms
Initial Impact: Low
Additional Keywords: IBM Spectrum Protect; HSM; TS004921636; recall; deadlock; dsmrecalld; migrate
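The diagnostic script in the error description takes the task name from a fixed awk field. As a minimal sketch, assuming a POSIX shell with sed, sort, uniq and awk, the hypothetical function count_hung_tasks below instead matches the kernel text "INFO: task NAME:PID blocked for more than" directly, so the count does not depend on how syslog formats the timestamp. It is an illustration only and is not shipped with the HSM client.

```shell
# Hypothetical helper (not part of the HSM client).
# For each log file given, print "FILE, COUNT NAME COUNT NAME ...",
# counting kernel hung-task reports per task name. The name is taken
# from the kernel text "INFO: task NAME:PID blocked for more than",
# so the result does not depend on the syslog timestamp format.
count_hung_tasks() {
    for f in "$@"; do
        # Skip missing or unreadable files (for example, an unmatched glob).
        [ -r "$f" ] || continue
        # sed prints only the task name from matching lines; sort | uniq -c
        # counts each name; awk joins "COUNT NAME" pairs onto one line.
        counts=$(sed -n 's/.*INFO: task \([^ ]*\):[0-9][0-9]* blocked for more than.*/\1/p' "$f" |
            sort | uniq -c |
            awk '{ out = out (NR > 1 ? " " : "") $1 " " $2 } END { print out }')
        # Print "FILE," alone when no hung tasks were reported.
        echo "$f,${counts:+ $counts}"
    done
}
```

For example, running count_hung_tasks /var/log/messages* prints one line per log file in the same shape as the original script's output. Like the original script, it does not read compressed rotated logs.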
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect for Space Management (HSM) client       *
* versions 7.1.x and 8.1.x on all platforms                    *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See ERROR DESCRIPTION.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is projected *
* to be fixed in IBM Spectrum Protect for Space Management     *
* client level 8.1.13. Note that this is subject to change at  *
* the discretion of IBM.                                       *
****************************************************************
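The recommendation names 8.1.13 as the projected fixing level. As a minimal sketch, assuming GNU sort (for its -V version-sort option) and that the installed client level is already known as a dotted string such as 8.1.12.0, the hypothetical helper hsm_level_has_fix below checks whether a level is at or above that threshold. Because IBM states the projected level is subject to change, the threshold is a parameter; confirm the actual fixing level before relying on the result.

```shell
# Hypothetical helper (not part of the HSM client).
# Succeeds (exit status 0) when installed level $1 is at or above
# fixing level $2. $2 defaults to 8.1.13, the *projected* level from
# this APAR, which IBM notes is subject to change.
# Requires GNU sort for the -V (version sort) option.
hsm_level_has_fix() {
    installed=$1
    fixed=${2:-8.1.13}
    # Version-sort both strings numerically per component; the lower
    # level comes first (so 8.1.9.0 sorts before 8.1.13).
    lowest=$(printf '%s\n%s\n' "$fixed" "$installed" | sort -V | head -n 1)
    # If the fixing level sorts first (or ties), installed >= fixed.
    [ "$lowest" = "$fixed" ]
}
```

A plain lexical comparison would wrongly treat 8.1.9.0 as newer than 8.1.13; version sort compares each numeric component as a number, which is why it is used here.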
Problem conclusion
After the fix, the IBM Spectrum Protect for Space Management recall function will work as expected.
Temporary fix
Comments
APAR Information
APAR number
IT36608
Reported component name
TSM SPACE MGMT
Reported component ID
5698HSMCL
Reported release
81L
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-04-27
Closed date
2021-06-08
Last modified date
2021-06-08
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
dsmrecal
Fix information
Fixed component name
TSM SPACE MGMT
Fixed component ID
5698HSMCL
Applicable component levels
Product: Tivoli Storage Manager for Space Management (SSSR2R)
Version: 81L
Platform: Platform Independent
Line of Business: Data and AI
Business Unit: IBM Infrastructure w/TPS
Document Information
Modified date:
09 June 2021