Product Documentation
Abstract
This describes how to diagnose FTP timeouts when transferring data to and from the z/OS FTP server.
Content
FTP timeouts can occur during data transfer for the z/OS FTP server. In most cases, this problem will be reported through messages in the FTP client. Although we cannot anticipate what those messages might be, they will most likely indicate a connection has been reset, a connection was interrupted or timed out, or that the server is not responding.
Before you begin:
1. Be sure the following z/OS FTP server timer default values are set to these defaults or higher in the FTP.DATA configuration file. If you need assistance identifying which FTP.DATA the FTP server is using, the hierarchical search order can be found in the topic called TCP/IP configuration data sets in the IP Configuration Reference manual.
- DATACTTIME 300
DCONNTIME 120
INACTTIME 300
3. If you have a checkpoint firewall, make sure you have turned off the FTP enforce new line setting.
- 3.1 Edit the $FWDIR/lib/base.def
3.2 Change (comment out) the following line:
- #define FTP_ENFORCE_NL
to:
//#define FTP_ENFORCE_NL
Steps for resolving FTP timeouts during data transfer for the server :
1. Before retrying the failing data transfer, start the packet trace facility. Detailed instructions for doing this can be found in the Technote How to Collect PKTTRACE and CTRACE on z/OS. If this is a high-volume system, be sure to reference the section on "Packet Trace Parameters" on how to filter tracing based on IP addresses and ports.
2. Obtain an FTP server trace with options BAS, TIM, and FLO. You can do this by issuing the following MVS MODIFY console command before starting the failing data transfer (note that FTP_servername is simply the jobname under which the FTP server is running, such as FTPD1 or FTPSERV):
- F FTP_servername,DEBUG=(FLO,BAS)
3. Check for the output from the trace in SYSLOGD (see the Technote MustGather: Collect Troubleshooting Data, Read First: Syslogd for z/OS Communications Server for a SYSLOGD How-To Guide).
4. Re-create the problem and deactivate the traces. To deactivate the trace, issue MVS console command:
- F FTP_servername,DEBUG=(none)
- 5.1 The IP address to which the data connection will be opened is u.v.w.x.
5.2 The port to which it will be opened can be calculated using the formula: (y x 256) + z.
5.3 At this point, you have identified the timestamp, the IP address, and the port to which the data connection will be opened.
- CTRACE COMP(SYSTCPDA) LOCAL FULL OPTIONS((SESSION))
7. Find the summary that represents the failing connection by looking for an FTP connection which encompasses the time stamp from the client. Use the combination of the port, the IP address, and the timestamp to ensure you have the correct summary. Here is a sample of what you will see:
Local Ip Address: x.x.x.x Remote Ip Address: y.y.y.y Host: Local, Remote Client or Server: Unknown, Unknown Port: 21, 1095 Application: ftp,, Link speed (parm): 10, 10 Megabits/s Connection: First timestamp: 2007/08/24 10:21:54.688167 Last timestamp: 2007/08/24 11:46:16.591480
8. Scroll down to the bottom of the summary portion and review the count of different types of packets received during this connection. Pay particular attention to duplicate ACKs, Zero Window sizes, Retransmissions, Out of order Packets, and Fragments.
Packets: 97, 109 (x) Untraced Packets: 3, 0 (.) In-order data: 52, 33 (53.60%), (30.27%) (a) Acknowledgments: 20, 40 (20.61%), (36.69%) (+) Data and ACK: 23, 30 (23.71%), (27.52%) (u) Duplicate ACKs: 0, 0 ( 0.00%), ( 0.00%) (w) Window size updates: 1, 0 ( 1.03%), ( 0.00%) (z) Zero window sizes: 0, 0 ( 0.00%), ( 0.00%) (p) Window probes: 0, 0 ( 0.00%), ( 0.00%) (k) Keepalive segments: 0, 0 ( 0.00%), ( 0.00%) (r) Retransmissions: 0, 0 ( 0.00%), ( 0.00%) (o) Out-of-order: 1, 2 ( 1.03%), ( 1.83%) (d) Delayed ACKs: 0, 4 ( 0.00%), ( 3.66%) (f) Fragments: 0, 0 ( 0.00%), ( 0.00%) (!) Dropped: 0, 0 ( 0.00%), ( 0.00%)
9. If any of those have significantly higher percentages for inbound data, then the problem resides on the network, and further diagnostics will be required outside of the mainframe. Even if you see evidence of problems outside of the mainframe, continue with step 6 to investigate possible scenarios.
10. Scroll down past the summary to the individual descriptions and examine the approximate timestamp obtained from the trace.
11. Some of the things you can look for include:
- A large value in the delta time column.
TcpHdr IO F Seq Ack RcvWnd Data Delta Time TimeStamp 1 S I 2499151066 0 64512 0 0.000000 14:23:50.221695 2 A S O 1632413670 2499151067 65535 0 0.000027 14:23:50.221722 3 A S O 1632413670 2499151067 65535 0 2.943323 14:23:53.165045 4 S I 2499151066 0 64512 0 0.002726 14:23:53.167771 5 S I 2499151066 0 64512 0 5.919115 14:23:59.086886
This shows a timeout attempting to establish a connection. However, similar sequences may also occur during mid-transfer. In packet 1, we see the remote host attempting to open a connection by sending the local host a SYN packet. The local side correctly sends the ACK SYN response in packet 2. The next packet should be an ACK sent from the remote host. Because this never arrives, the local host retransmits the ACK SYN in packet 3 (2.94 seconds later). Packets four and five are retransmissions of the initial SYN the remote host sent in packet 1. This indicates that the local host's ACK SYNs never arrived at the remote host, and that there may be routing issues between the two hosts, or a firewall/router in between the two hosts that is dropping the responses.
- A capital R indicating reset in the TcpHdr column.
Example 1
TcpHdr IO F Seq Ack RcvWnd Data Delta Time TimeStamp 1 S I 965574368 0 368640 0 0.000000 10:44:43.727489 2 A S O 2501664216 965574369 368640 0 0.000125 10:44:43.727614 3 A I u 965574369 2501664217 368640 0 0.000819 10:44:43.728433 4 A O . 2501664217 965574369 368640 8940 0.268255 10:44:43.996688 5 AP O . 2501673157 965574369 368640 3695 0.000140 10:44:43.996828 6 AP F O ? 2501676852 965574369 368640 0 0.002937 10:44:43.999765 7 AP O r 2501664217 965574369 368640 8940 2.191677 10:44:46.191442 8 AP O r 2501664217 965574369 368640 8940 4.438225 10:44:50.629667 9 AP O r 2501664217 965574369 368640 8940 8.958191 10:44:59.587858 10 AP O r 2501664217 965574369 368640 8940 17.822049 10:45:17.409907 11 AP O r 2501664217 965574369 368640 8940 36.668251 10:45:54.078158 12 R O 2501664217 965574369 0 0 72.829141 10:45:54.078158
Example 1 shows a remote host failing to acknowledge data sent by the local host. The connection is typical up until packet 7. Packet 7 should show an ACK for the 12635 bytes of data, plus the FIN sent on packet 6. Instead, nothing arrives from the remote host, and the data is retransmitted on packets 7 through 11. Eventually, the local host assumes an error has occurred and resets the connection.
Example 2
TcpHdr IO F Seq Ack RcvWnd Data Delta Time TimeStamp 1 A I . 1428014775 99166319 64856 524 0.015828 10:07:53.533832 2 A I . 1428015299 99166319 64856 24 4.941343 10:07:58.475175 3 A O d 99166319 1428015323 262144 0 0.261318 10:07:58.736493 4 A I . 1428015323 99166319 64856 524 0.055756 10:07:58.792249 5 A I . 1428015847 99166319 64856 524 0.000005 10:07:58.792254 6 A I . 1428016371 99166319 64856 292 4.960406 10:08:03.752660 7 A O d 99166319 1428016663 262144 0 0.261318 10:07:58.736493 8 R O d 99166319 1428016663 262144 0 73.878330 10:09:17.630990
In Example 2, the connection proceeds normally until packet 8. Note that in packet six, the amount of data sent by the remote host (292 bytes) is lower than the 524 bytes sent on most of the other packets. This usually indicates that there is no more data left to be sent. On packet 7, the local host ACKs the last data sent. However, it can do nothing more until either the remote host sends more data or it sends a FIN packet to indicate that all of the data has been transferred. Because the remote host does neither, it resets the connection in packet 8.
- A lower-case Z in the F (flag) column indicating that the window size has gone to zero.
TcpHdr IO F Seq Ack RcvWnd Data Delta Time TimeStamp 1 A I 3809785819 2800163550 27768 0 0.000021 10:36:57.019183 2 A O 2800164998 3809785819 65535 1448 0.000048 10:36:57.019231 3 A O 2800173686 3809785819 65535 1448 0.000001 10:36:57.019232 4 A O 2800175134 3809785819 65535 1448 0.000007 10:36:57.019239 5 A I a 3809785819 2800191062 256 0 0.000035 10:36:57.019274 6 A O 2800163550 3809785819 65535 1448 0.000551 10:36:57.019825 7 A O 2800164998 3809785819 65535 12 0.000005 10:36:57.019830 8 A I 3809785819 2800191062 256 0 0.000029 10:36:57.019859 9 A O o 2800191062 3809785819 65535 1448 0.000032 10:36:57.019891 10 A O . 2800192510 3809785819 65535 1448 0.000203 10:36:57.020094 11 A I a 3809785819 2800191318 0 0 0.000029 10:36:57.020123 12 A O . 2800193958 3809785819 65535 1448 0.000032 10:36:57.020155 13 A I z 3809785819 2800191318 0 0 0.000016 10:36:57.020171 14 A O . 2800195406 3809785819 65535 1448 0.000128 10:36:57.020299 15 A I z 3809785819 2800191318 0 0 0.000021 10:36:57.020320 16 A O r 2800191318 3809785819 65535 1 0.260479 10:36:57.280799 .......lines 17 - 29 removed for brevity 30 R I z 3809785819 2800191318 0 0 25.544452 10:38:00.887886
This example shows a situation in which the remote side has set its window size to zero. The window size tells the local host how much unacknowledged data it can send at once. In packets five and eight, we see that the remote host has a window size of 256 bytes. However, in packet eleven, this drops to zero. Starting in packet 16, the local host begins sending 1 byte probes to see if the window size has increased. This will continue indefinitely until the remote host increases its window size or the connection is otherwise terminated. In this example, the remote side ultimately RESETs the connection in packet 30.
- A lower-case O in the F (flag) column indicating that the remote host sent out of order data.
TcpHdr IO F Seq Ack RcvWnd Data Delta Time TimeStamp 1 AP O a 663518973 1680312427 65535 0 0.000094 21:53:58.470801 2 A I . 1680312427 663518973 64094 1460 0.000020 21:53:58.470821 3 A I . 1680313887 663518973 64094 1460 0.000093 21:53:58.470914 4 AP O a 663518973 1680315347 65535 0 0.000065 21:53:58.470979 5 A I . 1680315347 663518973 64094 1460 0.000254 21:53:58.471233 6 A I . 1680316807 663518973 64094 1460 0.000006 21:53:58.471239 7 AP O a 663518973 1680318267 65535 0 0.000145 21:53:58.471384 8 A I . 1680318267 663518973 64094 1460 0.000018 21:53:58.471402 9 A I o 1690049219 663518973 64094 1460 1.064534 21:53:59.535936 10 A I . 1690050679 663518973 64094 1460 0.000047 21:53:59.535983
In this example, packets are processed correctly on lines 1 through 8. Note that the remote host is sending 1460 byte packets to the local host, who is ACKing them correctly. However, on line nine the remote host sends an out-of-order packet. This can be identified by two methods. First, the F column shows an 'o' flag, meaning out-of-order. This can also be verified by reviewing the sequence numbers (Seq column) of the inbound packets. They progress as follows:
1680318267
1690049219
1690050679
Because the sequence numbers increment by the number of bytes sent on each, they should have progressed as:
1680318267
1680319727
1680321187
<...roughly 6661 more missing packets>
1690046299
1690047759
1690049219
1690050679
- The remote host stops responding to keep-alive packets.
TcpHdr IO F Seq Ack RcvWnd Data Delta Time 1 AP O + 3101947168 1513769269 65535 54 0.000318 2 A I a 1513769269 3101947222 65535 0 0.166097 3 AP O k 3101947221 1513769269 65535 0 185.834001 4 A I u 1513769269 3101947222 65535 0 0.001453 5 AP O u 3101947221 1513769269 65535 0 190.768437 6 A I u 1513769269 3101947222 65535 0 0.001386 7 AP O u 3101947221 1513769269 65535 0 190.735614 8 A I u 1513769269 3101947222 65535 0 0.001089 9 AP O u 3101947221 1513769269 65535 0 190.714878 10 A I u 1513769269 3101947222 65535 0 0.002175 11 AP O u 3101947221 1513769269 65535 0 190.763630 12 A I u 1513769269 3101947222 65535 0 0.001706 13 AP O u 3101947221 1513769269 65535 0 190.757661 14 AP O u 3101947221 1513769269 65535 0 76.292870 15 AP O u 3101947221 1513769269 65535 0 76.299600 16 AP O u 3101947221 1513769269 65535 0 76.299781 17 AP O u 3101947221 1513769269 65535 0 76.307884 18 AP O u 3101947221 1513769269 65535 0 76.296677 19 AP O u 3101947221 1513769269 65535 0 75.286330 20 AP O u 3101947221 1513769269 65535 0 75.281099 21 AP O u 3101947221 1513769269 65535 0 76.294458
This shows the events occurring on a control connection during a long data transfer. The data transfer itself completes normally. In order to keep the control connection from being timed out, the local TCP/IP stack sends keepalive packets. These are seen on lines 3, 5, 7, 9, 11, and 13. The remote stack responds correctly to these on lines 4, 6, 8, 10, and 12. However, it never responds to the keepalive sent on line 13. Instead, this packet is retransmitted with no response on lines 14 through 21. As a result, the local stack assumes that the remote host has dropped the connection, and closes its own connection. Note that no RESET is sent. In this situation, a RESET would be sent only if the remote host attempted to send another packet to the local stack.
12. Once you identify a problematic area in the trace, pay particular attention to both the IO and the Delta Time columns. IO indicates if the packet was inbound or outbound, and Delta Time indicates how much time has passed since the last packet was sent or received. The combination of these two fields can help determine whether or not the problem resides on the local server or on the remote client. For example, an inbound RESET might suggest a problem on the remote client. Conversely, an inbound RESET with a large delta time suggests that the remote client timed out waiting for a response from the server, and further investigation should be performed locally.
- 12.1 If you determine that the problem is occurring outside of the mainframe, additional investigation must be done on the remote client itself, or possibly on the network segments between the client and server.
12.2 If you determine that the problem is occurring on the mainframe, additional diagnostics must be done on the local TCPIP stack and the FTP server. Collect a new set of documentation including:
- A new client trace
A new SYSTCPDA
A dump of TCPIP and, if possible, the FTP client, with SDATA options
- SDATA=(ALLNUC,CSA,LSQA,SQA,PSA,RGN,SWA,TRT,LPA)
- OPTIONS=(TCP,INTERNET,PFS,SOCKET)
Again, you can find instructions for running SYSTCPIP and SYSTCPDA traces in the Technote How to Collect PKTTRACE and CTRACE on z/OS. With this set of documentation, contact the IBM support center.
Original Publication Date
22 January 2008
Was this topic helpful?
Document Information
Modified date:
17 June 2018
UID
swg27011272