IXC431I
text

Explanation

In the message, text is:
GROUP gnme MEMBER mnme JOB jnme ASID asid 
STALLED AT sdate stime ID: s#.r# 
LAST MSGX: adate sitime siexit STALLED swork PENDINGQ 
LAST GRPX: gdate gtime gnexit STALLED gwork PENDINGQ
LAST STAX: stdate sttime stexit STALLED

The indicated XCF Group Member is not processing its XCF work in a timely manner. The processing of at least one work item appears to be stalled.

Note: It is very unlikely that the delays are caused by a problem in XCF.
Possible explanations include:
  • Contention problems in the user exit routine(s). Perhaps the exit routine is suspended waiting to obtain the local lock or a latch. DISPLAY GRS,C will identify latch contention.
  • SRBs not dispatched in a timely manner. Perhaps the member address space is swapped out or a dump is in progress. Perhaps the dispatch priority of the member address space is too low. Perhaps a loop in some other work unit is consuming most of the CPU resource. The looping work unit need not be in the member address space. It could be in an address space other than those identified by the IXC431I message(s).
  • An influx of work has exceeded the processing capacity of the member or system. The influx may be a temporary spike that the system can work through with time. It could be the residual effect of some other problem that caused processing of an otherwise normal workload to be delayed.
  • Some other member or system in the sysplex is not processing its work in a timely manner. Although XCF may have identified the indicated member as stalled, the situation could be the result of sympathy sickness arising from processing delays elsewhere in the sysplex (which may or may not have been identified).
  • A member or system might be engaged in reconfiguration or recovery processes that must complete before normal processing can proceed. For example, a system may have just become active in the sysplex, a system may have just been removed from the sysplex, a member may be joining the group, a member may be leaving the group, or some other application specific processes may be running.
  • The user exit routine may have a coding error in which it returns to the dispatcher instead of returning to XCF. One would expect this situation to occur only when testing a new application that exploits XCF services.

It may not be possible to determine the impact to the application without understanding the nature and content of the item(s) experiencing the delay. The impact may not be limited to the stalled member if it provides services to other applications or subsystems in the sysplex. Failure to process this work in a timely manner could account for delays or performance problems elsewhere in the sysplex.

If multiple members appear to be stalled, or if other indicators suggest work is not being processed, check the status of the system because there may be an underlying problem affecting them all.

In the message text:
gnme
The name of the XCF group whose member stalled.
mnme
The name of the stalled member.
jnme
The name of the job.
asid
The hexadecimal ASID of the address space.
sdate
The date when XCF believes the member stalled.
stime
The time when XCF believes the member stalled.
s#
A number to help correlate other instances of message IXC431I that are issued for the indicated member with regard to this stall. It also appears in message IXC432I. In general, this number is incremented each time a new stall is detected for the member. However, it can be reset to zero if no stalls are detected for the member for a sufficiently long time.
r#
A number to help indicate whether message IXC431I is being issued or reissued for the same stall condition. It equals one when message IXC431I is first issued for a stall, and is incremented each time IXC431I is reissued with new data.
adate
The date when a signal exit most recently completed. Blank if no signal exit ever completed.
sitime
The time when a signal exit most recently completed. Blank if no signal exit ever completed.
siexit
The number of stalled signal exit routines.
swork
The number of signal work items queued for processing by or on behalf of the indicated member. These items include messages to be delivered to the member, notifications to be presented to the member, and internal XCF signaling related requests that need to be processed in the member address space.
gdate
The date when a group exit most recently completed. Blank if no group exit ever completed.
gtime
The time when a group exit most recently completed. Blank if no group exit routine ever completed.
gnexit
The number of stalled group exit routines.
gwork
The number of group work items queued for processing by or on behalf of the indicated member. These items include events that are to be presented to the member.
stdate
The date when a status exit most recently completed. Blank if no status exit routine ever completed or when the member does not have a status exit.
sttime
The time when a status exit most recently completed. Blank if no status exit routine ever completed or when the member does not have a status exit.
stexit
The number of stalled status exit routines.
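The field layout described above can be sketched as a small parser for log post-processing. This is only an illustration, not an IBM-supplied tool: the function name parse_ixc431i and the sample values are hypothetical, and the sketch assumes that names, dates, and times contain no embedded blanks and that a blank date and time simply leave those tokens out of the line.

```python
import re

# Header fields: GROUP gnme MEMBER mnme JOB jnme ASID asid STALLED AT sdate stime ID: s#.r#
_HEADER = re.compile(
    r"GROUP (?P<group>\S+) MEMBER (?P<member>\S+) JOB (?P<job>\S+) "
    r"ASID (?P<asid>[0-9A-Fa-f]+) STALLED AT (?P<sdate>\S+) (?P<stime>\S+) "
    r"ID: (?P<stall>\d+)\.(?P<reissue>\d+)"
)
# Exit lines: LAST MSGX/GRPX/STAX with an optional date and time (assumed absent when blank)
_EXIT = re.compile(
    r"LAST (?P<kind>MSGX|GRPX|STAX): (?P<when>.*?)(?P<stalled>\d+) STALLED"
    r"(?: (?P<pending>\d+) PENDINGQ)?"
)

def parse_ixc431i(message):
    """Return the IXC431I fields as a dict (hypothetical helper)."""
    text = " ".join(message.split())  # collapse line breaks and column padding
    head = _HEADER.search(text)
    if head is None:
        raise ValueError("text does not look like an IXC431I stall report")
    result = head.groupdict()
    result["asid"] = int(result["asid"], 16)  # the ASID is hexadecimal
    result["stall"] = int(result["stall"])
    result["reissue"] = int(result["reissue"])
    result["exits"] = {}
    for m in _EXIT.finditer(text):
        pending = m.group("pending")
        result["exits"][m.group("kind")] = {
            "last_completed": m.group("when").strip() or None,
            "stalled": int(m.group("stalled")),
            "pending": int(pending) if pending is not None else None,  # STAX has no PENDINGQ
        }
    return result

# Made-up sample; the date and time formats shown here are assumptions.
sample = """IXC431I GROUP GRP1 MEMBER MEM1 JOB JOB1 ASID 0023
STALLED AT 01/02/2024 10:01:02.123456 ID: 3.1
LAST MSGX: 01/02/2024 10:00:03.123456 10 STALLED 30 PENDINGQ
LAST GRPX: 0 STALLED 0 PENDINGQ
LAST STAX: 01/02/2024 10:00:01.000000 0 STALLED"""

report = parse_ixc431i(sample)
```

In this sketch the three exit lines are keyed by their MSGX, GRPX, and STAX labels, so a caller can compare the stalled and pending counts across reissues of the same stall.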

System action

XCF continues to monitor the situation. If the stalled condition persists, but other items are being successfully processed, XCF periodically reissues message IXC431I with updated information. XCF may issue abend X'00C' reason X'020F0006' to initiate internal XCF self verification and other actions to address the situation. The abend does not directly impact the stalled application in any way. If an internal XCF problem is discovered, a dump will be taken. An entry in logrec is made to document the situation even if no dump is taken. Message IXC432I is issued if the stalled member resumes normal processing or terminates.
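The issue, reissue, and resolve sequence described above can be followed by automation that keeps one open entry per member. The following is a minimal sketch under stated assumptions: the class StallTracker and its method names are hypothetical, and the caller is assumed to have already extracted the group name, member name, s#, and r# from each message.

```python
class StallTracker:
    """Track open stalls per (group, member) from IXC431I and IXC432I events."""

    def __init__(self):
        # (group, member) -> (stall number s#, latest reissue number r#)
        self._open = {}

    def on_ixc431i(self, group, member, stall, reissue):
        """Record a stall report and say whether it is new or a reissue."""
        key = (group, member)
        previous = self._open.get(key)
        self._open[key] = (stall, reissue)
        if previous is None or previous[0] != stall:
            return "new stall"
        return "reissue"

    def on_ixc432i(self, group, member):
        """Close the stall; return True if one was open for this member."""
        return self._open.pop((group, member), None) is not None

    def multiple_members_stalled(self):
        """True when more than one member is stalled, hinting at a wider problem."""
        return len(self._open) > 1


tracker = StallTracker()
first = tracker.on_ixc431i("GRP1", "MEM1", 3, 1)
second = tracker.on_ixc431i("GRP1", "MEM1", 3, 2)
```

Keying on the member rather than on s# alone reflects the note that s# can be reset to zero after a sufficiently long quiet period.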

Operator response

This message is issued to the system log, so no operator response is expected. If, through customer action, the message is rerouted to an operator console, the operator should monitor the situation. If there does not seem to be any detrimental impact, no further action may be needed. Use DISPLAY XCF,GROUP,grpname,membername to get detailed information about the stalled member of group grpname named membername. Message IXC333I provides status information about the member and indicates what work appears to be stalled.

There may be other commands provided by the indicated application/subsystem that will allow you to determine its status and/or alleviate the problem. If more than one member is impacted, there may be an underlying system problem affecting them all. If so, investigate the status of the system at large. At the direction of the system programmer, you may need to obtain dumps for problem diagnosis and/or terminate the indicated application.

XCF monitors its own internal use of the XCF signalling service and may issue message IXC431I if XCF itself appears to be stalled. However, the DISPLAY XCF,GROUP command cannot be used to investigate such stalls since the command does not support the internal XCF group.

System programmer response

Check the status of the stalled application/subsystem. If multiple members appear to be stalled, or if other indicators suggest work is not being processed, there may be an underlying problem affecting them all. If so, a broader system diagnosis may be warranted because the impacted members may not be at fault. On many occasions the system will successfully resolve the situation during the course of normal processing, in which case no further action is warranted. If necessary, take appropriate action to correct the situation or cancel/terminate the application. Before terminating the application, issue DISPLAY XCF,GROUP,grpname,ALL and any relevant application display, then collect the following diagnostic information: system log, application log, and an appropriate dump. In addition to application-specific diagnostic data, the dump should include XCF data (SDATA=COUPLE). Then, using its normal shutdown procedure, terminate the application.
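For automation that prepares these displays from the names reported in IXC431I, the two DISPLAY XCF,GROUP command forms named in this message description can be built as plain strings. This is a sketch only: the helper names are hypothetical, and how the resulting text is actually submitted to a console is left to the installation.

```python
def display_member_command(group, member):
    """Build the command that shows detail for one member of a group."""
    return f"DISPLAY XCF,GROUP,{group},{member}"


def display_all_members_command(group):
    """Build the command that shows every member of a group."""
    return f"DISPLAY XCF,GROUP,{group},ALL"


# Hypothetical group and member names taken from a stall report.
member_cmd = display_member_command("GRP1", "MEM1")
all_cmd = display_all_members_command("GRP1")
```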

Source

Cross System Coupling Facility (SCXCF)

Module

IXCS1DCM

Routing code

2, 10

Descriptor code

12