APAR status
Closed as program error.
Error description
A desc-only NSD was unexpectedly reported as "Disk failure" on its NSD server when the current file system manager was shut down and the file system manager role was taken over by this NSD server. Persistent Reserve (PR) was enabled, desc-only disks do not support PR, and an AIX logical volume was in use. The result was an unexpected disk-down state and an unhealthy node, which can lead to application problems, for example with DB2. Reported in: 5.1.8
Local fix
Problem summary
In a two-node cluster with a tiebreaker disk that uses server-based cluster configuration, when one of the nodes is powered off, the other node runs an election and opens the tiebreaker disk. To open the disk it calls Disk::devOpen(), which has the side effect of retrieving the WWN from the device. The WWN retrieval logic checks the disk lease before sending the SCSI request, and it deadlocks there. With CCR configuration, the election path accesses the tiebreaker disk by invoking OpenDevice() from CCR directly, so it does not hit the problem. Removing the call to wwnFromDevice() from Disk::devOpen() eliminates this deadlock.
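The circular wait described above can be illustrated with a toy model. This is a minimal sketch, not the product code: the Python names (Cluster, wwn_from_device, dev_open, run_election) are hypothetical stand-ins, and the sketch assumes that the disk lease only becomes available after the election completes. A short timeout stands in for the indefinite wait so that the example terminates.

```python
import threading


class Cluster:
    """Toy model (assumption): the disk lease is granted only after an election."""

    def __init__(self):
        self.lease_granted = threading.Event()


def wwn_from_device(cluster, timeout):
    # Hypothetical stand-in for the WWN lookup: it checks the disk lease
    # before it would send the SCSI request.
    if not cluster.lease_granted.wait(timeout):
        # The real wait would never return; the timeout only keeps the sketch finite.
        raise TimeoutError("lease check cannot complete while the election is pending")
    return "wwn-placeholder"


def dev_open(cluster, fetch_wwn, timeout=0.1):
    # fetch_wwn=True models the old behavior (WWN lookup as a side effect of open);
    # fetch_wwn=False models the open path with the lookup removed.
    if fetch_wwn:
        wwn_from_device(cluster, timeout)
    return "opened"


def run_election(cluster, fetch_wwn):
    # The election needs the tiebreaker disk to be opened first.
    try:
        dev_open(cluster, fetch_wwn)
    except TimeoutError:
        return "stuck"
    # Only a completed election makes the lease available (assumption of this model).
    cluster.lease_granted.set()
    return "elected"
```

In this model, run_election(Cluster(), fetch_wwn=True) cannot make progress because the open waits on a lease that only the election can produce, while fetch_wwn=False lets the election complete. That mirrors the fix of dropping the WWN lookup from the open path.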
Problem conclusion
This problem is fixed in 5.1.9.9.
To see all Spectrum Scale APARs and their respective fix solutions, refer to: https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Fixes a deadlock during cluster probing in a two-node cluster with a tiebreaker disk and server-based cluster configuration.
Work Around: None
Problem trigger: Cluster probing after a node failure in a two-node cluster with a tiebreaker disk and server-based cluster configuration.
Symptom: Deadlock during cluster probing in a two-node cluster with a tiebreaker disk and server-based cluster configuration.
Platforms affected: ALL Operating System environments
Functional Area affected: Cluster configuration
Customer Impact: High Importance
Temporary fix
Comments
APAR Information
APAR number
IJ53647
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
518
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2025-02-18
Closed date
2025-04-01
Last modified date
2025-04-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
Business Unit: IBM Software (BU048)
Product: STXKQY
Platform: Platform Independent (PF025)
Version: 518
Line of Business: Storage TPS (LOB69)
Document Information
Modified date:
01 April 2025