A fix is available
APAR status
Closed as program error.
Error description
[Problem]
After applying FP1, some jobs intermittently fail with the following FATAL errors. (The messages were translated from Japanese to English, so the wording might not be exact.) The jobs ran successfully before FP1 was applied.

Item #: 22
Event ID: 153
Timestamp: 2010-01-26 17:32:26
Type: FATAL
Username: dsadm
Message ID: IIS-DSEE-TFIO-00231
Message: /gpf/data/mid/GPF_DS_CD/GPFDSU1200_2.ds,1: Configured timeout of 600 seconds reached for accepting player connections for pid 13,636. Pending fifo count: 0. Pending shared memory count: 1. This is most likely due to the failure of an upstream operator.

Item #: 23
Event ID: 154
Timestamp: 2010-01-26 17:32:26
Type: FATAL
Username: dsadm
Message ID: IIS-DSEE-TFPM-00123
Message: /gpf/data/mid/GPF_DS_CD/GPFDSU1200_2.ds,1: Fatal Error: Cannot start ORCHESTRATE network connection on Node node2 (gpfds). APT_PMConnectionSetup::acceptConnection: Cannot accept the connection.

[Additional info.]
- The same issue occurs in several jobs.
- The issue is intermittent. With the same job and the same data, the job sometimes aborts and sometimes finishes without any problem.
- The error message reports a 600-second timeout, but the job aborts well before 600 seconds have elapsed.
- With a 1-node configuration, the issue did not occur in 10 test runs. The issue occurs when the number of nodes is more than 2.
- I am checking whether anything changed on the system around the time FP1 was applied.
- I have requested a job design that can reproduce the issue.
Local fix
Problem summary
When a multi-node APT_CONFIG_FILE is used, one or more jobs may abort with the following error even though the elapsed time is much less than 10 minutes (600 seconds).
Message ID: IIS-DSEE-TFIO-00231
Message: <the-stage-name with node-number>: Configured timeout of 600 seconds reached for accepting player connections for pid <the-pid>. Pending fifo count: 0. Pending shared memory count: 1. This is most likely due to the failure of an upstream operator.
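To check whether a job log contains this symptom, the fields of the IIS-DSEE-TFIO-00231 message can be extracted with a small script. The following is a minimal sketch, not an IBM-supplied tool. The function name `parse_tfio_00231` and the result keys are hypothetical, and the pattern assumes the English message wording quoted in this APAR, which may differ in other releases or locales.

```python
import re
from typing import Dict, Optional

# Pattern modeled on the message text quoted in this APAR (assumption:
# other releases or locales may word the message differently).
# Numbers may contain thousands separators, e.g. "13,636".
TFIO_00231 = re.compile(
    r"Configured timeout of (?P<timeout>[\d,]+) seconds reached for "
    r"accepting player connections for pid (?P<pid>[\d,]+)\.\s*"
    r"Pending fifo count: (?P<fifo>[\d,]+)\.\s*"
    r"Pending shared memory count: (?P<shm>[\d,]+)\."
)


def parse_tfio_00231(message: str) -> Optional[Dict[str, int]]:
    """Return timeout, pid, and pending counts, or None if no match."""
    match = TFIO_00231.search(message)
    if match is None:
        return None
    # Strip thousands separators before converting to int.
    return {key: int(value.replace(",", ""))
            for key, value in match.groupdict().items()}


# Sample taken from the error description in this APAR.
sample = (
    "/gpf/data/mid/GPF_DS_CD/GPFDSU1200_2.ds,1: Configured timeout of 600 "
    "seconds reached for accepting player connections for pid 13,636. "
    "Pending fifo count: 0. Pending shared memory count: 1. This is most "
    "likely due to the failure of an upstream operator."
)

if __name__ == "__main__":
    print(parse_tfio_00231(sample))
```

A non-zero pending shared memory count together with an elapsed time far below the reported timeout matches the pattern described in this APAR.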
Problem conclusion
Install the patch.
Temporary fix
Use a 1-node configuration file.
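Before relying on this workaround, it may help to confirm that the file referenced by APT_CONFIG_FILE really defines a single node. The sketch below counts `node "<name>"` declarations in the configuration text. The helper names, the host name `etl-host`, and the `/tmp/ds/...` paths are hypothetical, and the simple pattern assumes the file contains no commented-out node declarations.

```python
import re
from typing import List

# Matches declarations such as: node "node1"
# Assumption: commented-out node blocks are not present in the file.
NODE_DECL = re.compile(r'\bnode\s+"([^"]+)"')


def node_names(config_text: str) -> List[str]:
    """Return the names of all nodes declared in the configuration text."""
    return NODE_DECL.findall(config_text)


def is_single_node(config_text: str) -> bool:
    """True if exactly one node is declared."""
    return len(node_names(config_text)) == 1


# Illustrative 1-node configuration; host name and paths are placeholders.
single_node_config = '''
{
  node "node1"
  {
    fastname "etl-host"
    pools ""
    resource disk "/tmp/ds/datasets" {pools ""}
    resource scratchdisk "/tmp/ds/scratch" {pools ""}
  }
}
'''

# Illustrative 2-node configuration on the same placeholder host.
two_node_config = single_node_config.replace(
    '  node "node1"',
    '  node "node2"\n  {\n    fastname "etl-host"\n    pools ""\n  }\n'
    '  node "node1"',
)

if __name__ == "__main__":
    print(node_names(single_node_config), is_single_node(single_node_config))
    print(node_names(two_node_config), is_single_node(two_node_config))
```

In practice the text would be read from the path held in the APT_CONFIG_FILE environment variable rather than from an inline string.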
Comments
APAR Information
APAR number
JR35910
Reported component name
WIS DATASTAGE
Reported component ID
5724Q36DS
Reported release
810
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2010-03-14
Closed date
2011-05-13
Last modified date
2011-05-13
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WIS DATASTAGE
Fixed component ID
5724Q36DS
Applicable component levels
R810 PSY
UP
R850 PSY
UP
Document Information
Modified date:
12 October 2021