Explanation
The message text is:
GROUP gnme MEMBER mnme JOB jnme ASID asid
STALLED AT sdate stime ID: s#.r#
LAST MSGX: adate sitime siexit STALLED swork PENDINGQ
LAST GRPX: gdate gtime gnexit STALLED gwork PENDINGQ
LAST STAX: stdate sttime stexit STALLED
The indicated XCF group member is not processing its XCF work in a timely
manner. The processing of at least one work item appears to be stalled.
Note: It is very unlikely that the delays are caused by a
problem in XCF.
Possible explanations include:
- Contention problems in the user exit routine(s). Perhaps the exit
routine is suspended waiting to obtain the local lock or a latch.
DISPLAY GRS,C will identify latch contention.
- SRBs not dispatched in a timely manner. Perhaps the member address
space is swapped out or a dump is in progress. Perhaps
the dispatch priority of the member address space is too low. Perhaps
a loop in some other work unit is consuming most of the CPU resource.
The looping work unit need not be in the member address space. It
could be in an address space other than those identified by the IXC431I
message(s).
- An influx of work has exceeded the processing capacity of the
member or system. The influx may be a temporary spike that the system
can work through with time. It could be the residual effect of some
other problem that caused processing of an otherwise normal workload
to be delayed.
- Some other member or system in the sysplex is not processing its
work in a timely manner. Although XCF may have identified the indicated
member as stalled, the situation could be the result of sympathy sickness
arising from processing delays elsewhere in the sysplex (which may
or may not have been identified).
- A member or system might be engaged in reconfiguration or recovery
processes that must complete before normal processing can proceed.
For example, a system may have just become active in the sysplex,
a system may have just been removed from the sysplex, a member may
be joining the group, a member may be leaving the group, or some other
application-specific process may be running.
- The user exit routine may have a coding error in which it returns
to the dispatcher instead of returning to XCF. One would expect this
situation to occur only when testing a new application that exploits
XCF services.
It may not be possible to determine the impact
to the application without understanding the nature and content of
the item(s) experiencing the delay. The impact may not be limited
to the stalled member if it provides services to other applications
or subsystems in the sysplex. Failure to process this work in a timely
manner could account for delays or performance problems elsewhere
in the sysplex.
If multiple members appear to
be stalled, or if other indicators suggest work is not being processed,
check the status of the system because there may be an underlying
problem affecting them all.
In the message text:
- gnme
- The name of the XCF group whose member stalled.
- mnme
- The name of the stalled member.
- jnme
- The name of the job.
- asid
- The hexadecimal ASID of the address space.
- sdate
- The date when XCF believes the member stalled.
- stime
- The time when XCF believes the member stalled.
- s#
- A number to help correlate other instances of message IXC431I
that are issued for the indicated member with regard to this stall.
It also appears in message IXC432I. In general, this number is incremented
each time a new stall is detected for the member. However, it can be
reset to zero if no stalls are detected for the member for a sufficiently
long time.
- r#
- A number that indicates whether message IXC431I is being issued
or reissued for the same stall condition. It equals one when message
IXC431I is first issued for a stall and is incremented each time IXC431I
is reissued with new data.
- adate
- The date when a signal exit most recently completed. Blank if
no signal exit ever completed.
- sitime
- The time when a signal exit most recently completed. Blank if
no signal exit ever completed.
- siexit
- The number of stalled signal exit routines.
- swork
- The number of signal work items queued for processing by or on
behalf of the indicated member. These items include messages to be
delivered to the member, notifications to be presented to the member,
and internal XCF signaling-related requests that need to be processed
in the member address space.
- gdate
- The date when a group exit most recently completed. Blank if no
group exit ever completed.
- gtime
- The time when a group exit most recently completed. Blank if no
group exit routine ever completed.
- gnexit
- The number of stalled group exit routines.
- gwork
- The number of group work items queued for processing by or on
behalf of the indicated member. These items include events that are
to be presented to the member.
- stdate
- The date when a status exit most recently completed. Blank if
no status exit routine ever completed or when the member does not
have a status exit.
- sttime
- The time when a status exit most recently completed. Blank if
no status exit routine ever completed or when the member does not
have a status exit.
- stexit
- The number of stalled status exit routines.
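Automation that watches the system log for this message may want the fields above in structured form. The following Python sketch parses the message text into a record. It assumes the five-line layout shown at the top of this description, whitespace-separated tokens, and that a blank date or time simply leaves that token out of the line. The date and time formats, the sample values, and the names parse_ixc431i, IXC431I, and ExitStatus are hypothetical. This is not an IBM-provided interface.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExitStatus:
    """One LAST MSGX / LAST GRPX / LAST STAX line (hypothetical record)."""
    last_date: Optional[str]      # None when blank: no exit of this kind ever completed
    last_time: Optional[str]
    stalled_exits: int            # siexit, gnexit, or stexit
    pending_work: Optional[int]   # swork or gwork; the STAX line has no queue count


@dataclass
class IXC431I:
    group: str          # gnme
    member: str         # mnme
    job: str            # jnme
    asid: int           # asid, converted from hexadecimal
    stall_date: str     # sdate
    stall_time: str     # stime
    stall_id: int       # s#
    issue_no: int       # r#: 1 on first issue, incremented on each reissue
    signal: Optional[ExitStatus]
    group_exit: Optional[ExitStatus]
    status_exit: Optional[ExitStatus]


_HEADER = re.compile(r"GROUP\s+(\S+)\s+MEMBER\s+(\S+)\s+JOB\s+(\S+)\s+ASID\s+([0-9A-Fa-f]+)")
_STALL = re.compile(r"STALLED AT\s+(\S+)\s+(\S+)\s+ID:\s+(\d+)\.(\d+)")


def _parse_exit(text: str, label: str, has_queue: bool) -> Optional[ExitStatus]:
    # Date and time are optional: when blank, the stalled-exit count follows the label.
    queue = r"\s+(\d+)\s+PENDINGQ" if has_queue else ""
    pattern = rf"LAST {label}:\s+(?:(\S+)\s+(\S+)\s+)?(\d+)\s+STALLED" + queue
    m = re.search(pattern, text)
    if m is None:
        return None
    return ExitStatus(
        last_date=m.group(1),
        last_time=m.group(2),
        stalled_exits=int(m.group(3)),
        pending_work=int(m.group(4)) if has_queue else None,
    )


def parse_ixc431i(text: str) -> IXC431I:
    header = _HEADER.search(text)
    stall = _STALL.search(text)
    if header is None or stall is None:
        raise ValueError("text does not match the documented IXC431I layout")
    return IXC431I(
        group=header.group(1),
        member=header.group(2),
        job=header.group(3),
        asid=int(header.group(4), 16),
        stall_date=stall.group(1),
        stall_time=stall.group(2),
        stall_id=int(stall.group(3)),
        issue_no=int(stall.group(4)),
        signal=_parse_exit(text, "MSGX", True),
        group_exit=_parse_exit(text, "GRPX", True),
        status_exit=_parse_exit(text, "STAX", False),
    )


# Hypothetical sample; real date and time formats may differ.
SAMPLE = """IXC431I GROUP MYGRP MEMBER MEMB1 JOB MYJOB ASID 003A
STALLED AT 02/15/2024 10:31:45.123456 ID: 3.1
LAST MSGX: 02/15/2024 10:30:59.000001 1 STALLED 42 PENDINGQ
LAST GRPX: 0 STALLED 0 PENDINGQ
LAST STAX: 0 STALLED"""

if __name__ == "__main__":
    print(parse_ixc431i(SAMPLE))
```

In the sample, the LAST GRPX and LAST STAX lines have blank dates and times, so their last_date and last_time come back as None, matching the "Blank if no ... exit ever completed" rule above.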
System action
XCF continues to monitor the situation. If the
stalled condition persists, but other items are being successfully
processed, XCF periodically reissues message IXC431I with updated
information. XCF may issue abend X'00C' reason X'020F0006' to
initiate internal XCF self verification and other actions to address
the situation. The abend does not directly impact the stalled application
in any way. If an internal XCF problem is discovered, a dump will
be taken. An entry in logrec is made to document the situation even
if no dump is taken. Message IXC432I is issued if the stalled member
resumes normal processing or terminates.
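A log monitor can use the stall ID (s#), the reissue count (r#), and message IXC432I to follow each stall from first detection to resolution. The following Python sketch keeps that state. It assumes the caller has already extracted the group, member, s#, and r# from each IXC431I, and the group, member, and s# from each IXC432I, whose layout is not described here. The class and method names are hypothetical. The multiple-member check is only the rough heuristic suggested in this description, not a diagnosis.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set, Tuple

StallKey = Tuple[str, str, int]  # (group, member, s#)


class Event(Enum):
    NEW_STALL = "new stall (r# is 1)"
    REISSUE = "reissue with updated data"
    FIRST_SEEN_MID_STALL = "stall already in progress when first seen"


@dataclass
class StallTracker:
    # Latest r# seen for each stall that has not yet been closed by IXC432I.
    open_stalls: Dict[StallKey, int] = field(default_factory=dict)

    def on_ixc431i(self, group: str, member: str, s: int, r: int) -> Event:
        key = (group, member, s)
        previous = self.open_stalls.get(key)
        self.open_stalls[key] = r
        if r == 1:
            # r# is 1 when IXC431I is first issued for a stall.
            return Event.NEW_STALL
        if previous is None:
            # Earlier issues of this stall were missed (e.g. the monitor started late).
            return Event.FIRST_SEEN_MID_STALL
        return Event.REISSUE

    def on_ixc432i(self, group: str, member: str, s: int) -> bool:
        # IXC432I: the member resumed normal processing or terminated.
        # Returns True if a matching open stall was closed.
        return self.open_stalls.pop((group, member, s), None) is not None

    def stalled_members(self) -> Set[Tuple[str, str]]:
        # Distinct (group, member) pairs with at least one open stall.
        return {(g, m) for (g, m, _s) in self.open_stalls}

    def suggests_system_problem(self) -> bool:
        # Rough heuristic: more than one stalled member may point to an
        # underlying system problem rather than a fault in one member.
        return len(self.stalled_members()) > 1
```

Keying on group, member, and s# rather than on the member alone keeps a new stall separate from an earlier one, because s# changes with each new stall. Because s# can be reset to zero after a quiet period, the sketch treats any r# of 1 as the start of a new stall.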
Operator response
This message is issued to the system log, so
no operator response is expected. If, through customer action, the
message is rerouted to an operator console, the operator should
monitor the situation. If there does not seem to be any detrimental
impact, no further action may be needed. Use DISPLAY XCF,GROUP,grpname,membername to
get detailed information about stalled member membername of group grpname.
Message IXC333I provides status information about the member and
indicates what work appears to be stalled.
The indicated application or subsystem may provide other commands
that allow you to determine its status, alleviate the problem, or both.
If more than one member is impacted, there may be an
underlying system problem affecting them all. If so, investigate the
status of the system at large. At the direction of the system
programmer, you may need to obtain dumps for problem diagnosis,
terminate the indicated application, or both.
XCF monitors its own internal
use of the XCF signaling service and may issue message IXC431I
if XCF itself appears to be stalled. However, the DISPLAY XCF,GROUP
command cannot be used to investigate such stalls because the command
does not support the internal XCF group.
System programmer response
Check the status of the stalled application or subsystem. If multiple members appear to be stalled, or if other
indicators suggest work is not being processed, there may be an underlying
problem affecting them all. If so, a broader system diagnosis may be
warranted because the impacted members may not be at fault. In
many cases the system resolves the situation on its own
during the course of normal processing, in which case no further
action is warranted. If necessary, take appropriate action to correct
the situation or cancel/terminate the application. Before terminating
the application, issue DISPLAY XCF,GROUP,grpname,ALL
and any relevant application display commands, then collect the following diagnostic
information: system log, application log, and an appropriate dump.
In addition to application-specific diagnostic data, the dump should
include XCF data (SDATA=COUPLE). Then terminate the application using
its normal shutdown procedure.
Source
Cross System Coupling Facility (SCXCF)
Module
Routing code
Descriptor code