Question & Answer
Question
Answer
TCP keepalive probes provide a way to detect and remove dead sockets and to notify applications when the peer on a TCP connection has become unresponsive.
While terminating or killing a program causes a FIN packet or possibly a RST packet to be sent, a system crash, hard reboot or network outage does not generate any packets. Applications could therefore wait indefinitely on a remote peer that has crashed. For example, if a telnet connection is created and then left idle, there is no further exchange of data and so there would be no indication of a network failure or peer problem.
While telnet is a trivial example, there are other cases in which a remote response can legitimately take many minutes to arrive. In those cases a local application would not automatically detect that it has lost access to the remote system and could wait indefinitely for a response that will never arrive.
The TCP keepalive facility can be used to address the issue of unresponsive peers by sending probes at the TCP layer, below the application. This functionality can also help prevent firewalls or other network appliances from terminating idle connections that need to be kept open.
The option is enabled on a per-application basis by using the setsockopt() subroutine to set the socket option SO_KEEPALIVE to 1. There is no option available to enable keepalive system-wide. Many programs, such as telnetd, provide a way to enable or disable the TCP keepalive via command line arguments or configuration options.
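As a minimal sketch of enabling the option on a single socket, the following uses Python's socket module, whose setsockopt() method passes the same level and option name through to the underlying setsockopt() subroutine. The helper name enable_keepalive is hypothetical and chosen only for this illustration.

```python
import socket


def enable_keepalive(sock: socket.socket) -> bool:
    """Turn on SO_KEEPALIVE for one socket and report whether it is now set."""
    # Same level and option name as the C call:
    # setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one))
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # getsockopt returns a non-zero integer when the option is enabled.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    # No connection is needed to set or query the option.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        before = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
        print("SO_KEEPALIVE before:", "ON" if before else "OFF")
        print("SO_KEEPALIVE after: ", "ON" if enable_keepalive(s) else "OFF")
```

The option applies only to the socket it is set on, so each socket that needs keepalive must have it enabled.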
TCP keepalive has three timer options:
TCP_KEEPIDLE: How long to wait before sending out the first probe on an idle connection
TCP_KEEPINTVL: The interval between subsequent probes when the previous probe has not been acknowledged
TCP_KEEPCNT: The number of unanswered probes required to force closure of the socket
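The three options listed above can be set per socket at the IPPROTO_TCP level. The sketch below assumes the Python build exposes the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT constants, which is not the case on every platform, so it skips any that are missing. The helper name set_keepalive_timers is hypothetical.

```python
import socket


def set_keepalive_timers(sock, idle_s, interval_s, count):
    """Enable keepalive and set the per-socket timers (time values in seconds).

    Returns a dict of the values read back from the socket for every option
    that this platform's socket module exposes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is None:
            # This Python build does not expose the constant; skip it.
            continue
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, opt)
    return applied


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # 40 / 20 / 5 are the values used in the tests in this document.
        print(set_keepalive_timers(s, idle_s=40, interval_s=20, count=5))
```

Reading the values back with getsockopt() is a cheap way to confirm that the system accepted them.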
NOTE: The system-wide default values are set with the "no" command, which takes the time values in half-second units, whereas the setsockopt() subroutine takes them in seconds. An application can override the defaults for its own sockets by using the setsockopt() subroutine.
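Because the two interfaces use different units, it is easy to be off by a factor of two. The conversion is sketched below; the function names are hypothetical. For example, the per-socket defaults of 7200 and 75 seconds reported by the test program later in this document correspond to 14400 and 150 half-second units. The probe count is a plain number and needs no conversion.

```python
def seconds_to_half_seconds(seconds: float) -> int:
    """Convert a time in seconds (setsockopt units) to half-second units."""
    return round(seconds * 2)


def half_seconds_to_seconds(half_seconds: int) -> float:
    """Convert a time in half-second units to seconds (setsockopt units)."""
    return half_seconds / 2


if __name__ == "__main__":
    for secs in (7200, 75, 40, 20):
        print(f"{secs} s = {seconds_to_half_seconds(secs)} half-second units")
```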
Below are two examples showing TCP keepalive with a simple client program that uses setsockopt() to set TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT.
---- TEST ONE ----
Severing a connection with keepalive active
In this test, a connection is established, a small amount of data is exchanged and then the connection becomes idle. Network connectivity with the server is then severed in a way that prevents packets from reaching it and also prevents intermediate network devices from responding that the host or network is unreachable (for example, the server is unplugged from its switch port, rather than a router being turned off).
With the options specified below, the interval between the end of data transmission and the first keepalive probe is approximately 40 seconds (TCP_KEEPIDLE). Because that probe is not acknowledged, another probe is sent 20 seconds later (TCP_KEEPINTVL), and this repeats every 20 seconds. Once five probes in total (TCP_KEEPCNT) have gone unanswered, the client aborts the connection with an RST packet at about the 140 second mark (40 + 5 x 20).
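The arithmetic behind the 140 second figure can be written out as a small calculation. This is a sketch modelled on the behaviour seen in this test (first probe after TCP_KEEPIDLE, then one probe per TCP_KEEPINTVL, abort one interval after the last unanswered probe); the function name is hypothetical.

```python
def keepalive_timeline(keepidle, keepintvl, keepcnt):
    """Return (probe_times, abort_time), in seconds after the connection
    goes idle, assuming that no probe is ever acknowledged."""
    # First probe after keepidle seconds, then one every keepintvl seconds.
    probe_times = [keepidle + i * keepintvl for i in range(keepcnt)]
    # The connection is dropped one interval after the last unanswered probe.
    abort_time = keepidle + keepcnt * keepintvl
    return probe_times, abort_time


if __name__ == "__main__":
    probes, abort = keepalive_timeline(40, 20, 5)
    print("probes sent at:", probes)
    print("connection aborted at:", abort)
```

With the default values reported by the test program (7200, 75 and 8), the same calculation gives 7200 + 8 x 75 = 7800 seconds, which is why applications that need prompt detection usually lower the values.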
Even though the RST packet may not reach the remote system, the TCP keepalive values used in this test result in the local socket being destroyed after 140 seconds. Note that this does not necessarily mean the application terminates.
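What the application does after the socket is dropped is up to the application. The sketch below shows one way a program might react when a blocking read fails. It assumes the failure is reported as ETIMEDOUT, which is the usual error for a keepalive timeout but is not stated in this document; the function name and the returned strings are hypothetical.

```python
import errno
import socket


def read_reply(sock, bufsize=4096):
    """Block until the peer sends data, closes, or the connection is dropped."""
    try:
        data = sock.recv(bufsize)
    except OSError as exc:
        if exc.errno == errno.ETIMEDOUT:
            # Assumed error code when keepalive probes go unanswered.
            return "peer unresponsive (keepalive timed out)"
        if exc.errno == errno.ECONNRESET:
            return "connection reset by peer"
        raise
    # An empty read means the peer closed its side normally (FIN).
    return "data received" if data else "peer closed the connection"


if __name__ == "__main__":
    # Local demonstration with a connected socket pair; no network needed.
    a, b = socket.socketpair()
    b.sendall(b"hello")
    print(read_reply(a))
    b.close()
    print(read_reply(a))
    a.close()
```

Without keepalive enabled, the recv() call above would simply never return if the peer crashed while the connection was idle.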
> ----------- TCP Keepalive Test --------
> Creating TCP socket
> SO_KEEPALIVE is OFF
> Socket Connected
> Write and read to peer
> Enabling SO_KEEPALIVE
> setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, 1, sizeof(optval)
> TCP_KEEPIDLE 7200
> TCP_KEEPINTVL 75
> TCP_KEEPCNT 8
> Changing tcp keepalive values:
> -----------------------------
> TCP_KEEPIDLE changed to: 40
> TCP_KEEPINTVL changed to: 20
> TCP_KEEPCNT changed to: 5
> Write and read data to peer
> Idling for 240 seconds
---- iptrace summary of activity ----
No. Time-since-previous-pkt src-port dst-port Info
1 0.000000000 44539 9300 [SYN]
2 0.000110003 9300 44539 [SYN, ACK]
3 0.000004840 44539 9300 [ACK]
4 0.000032719 9300 44539 [TCP Window Update]
5 0.000026137 44539 9300 [PSH, ACK]
6 0.000449679 9300 44539 [PSH, ACK]
7 0.000255969 44539 9300 [PSH, ACK]
8 0.000063309 9300 44539 [PSH, ACK]
9 0.012850326 44539 9300 [ACK]
24 39.660387502 44539 9300 [TCP Keep-Alive] <<<< KEEPIDLE
31 20.004177791 44539 9300 [TCP Keep-Alive] <<<< KEEPINTVL/KEEPCNT 1
32 20.010662734 44539 9300 [TCP Keep-Alive] <<<< KEEPINTVL/KEEPCNT 2
33 20.001108748 44539 9300 [TCP Keep-Alive] <<<< KEEPINTVL/KEEPCNT 3
34 20.001046118 44539 9300 [TCP Keep-Alive] <<<< KEEPINTVL/KEEPCNT 4
35 20.000886781 44539 9300 [RST, ACK] <<<< KEEPINTVL/KEEPCNT 5
---- TEST TWO ----
This is the same test, except that the network link is not severed. Because no data is being transmitted, the client sends a keepalive probe packet every TCP_KEEPIDLE seconds and the server responds each time with a keepalive ACK packet. As a result, TCP_KEEPINTVL and TCP_KEEPCNT play no role unless and until the remote server's ACKs stop arriving.
The test program does a normal exit and socket shutdown after 250 seconds (six probes).
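The probe count for a healthy but idle connection follows directly from TCP_KEEPIDLE, because each acknowledged probe restarts the idle timer. A sketch of that calculation, with a hypothetical function name:

```python
def probes_while_idle(keepidle, idle_seconds):
    """Times (seconds after the last data) at which probes are sent when
    every probe is acknowledged and no other traffic occurs."""
    # One probe each time the idle timer reaches keepidle seconds.
    return list(range(keepidle, idle_seconds + 1, keepidle))


if __name__ == "__main__":
    times = probes_while_idle(keepidle=40, idle_seconds=240)
    print(len(times), "probes at", times)
```

With TCP_KEEPIDLE set to 40 and the 240 second idle period used by the test program, this gives six probes, which matches the six Keep-Alive / Keep-Alive ACK pairs in the iptrace summary of this test.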
> ----------- TCP Keepalive Test --------
> Creating TCP socket
> SO_KEEPALIVE is OFF
> Socket Connected
> Write and read to peer
> Enabling SO_KEEPALIVE
> setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, 1, sizeof(optval)
> TCP_KEEPIDLE 7200
> TCP_KEEPINTVL 75
> TCP_KEEPCNT 8
> Changing tcp keepalive values:
> -----------------------------
> TCP_KEEPIDLE changed to: 40
> TCP_KEEPINTVL changed to: 20
> TCP_KEEPCNT changed to: 5
> Write and read data to peer
> Idling for 240 seconds
---- iptrace summary of activity ----
No. Time-since-previous-pkt src-port dst-port Info
1 0.000000000 44320 9300 [SYN]
2 0.000074207 9300 44320 [SYN, ACK]
3 0.000028210 44320 9300 [ACK]
4 0.000033899 9300 44320 [TCP Window Update]
5 0.000002898 44320 9300 [PSH, ACK]
6 0.000521559 9300 44320 [PSH, ACK]
7 0.000262672 44320 9300 [PSH, ACK]
8 0.000069935 9300 44320 [PSH, ACK]
9 0.191224633 44320 9300 [ACK]
19 39.619628860 44320 9300 [TCP Keep-Alive] <<<<< KEEPIDLE
20 0.000043783 9300 44320 [TCP Keep-Alive ACK]
27 40.066520457 44320 9300 [TCP Keep-Alive] <<<<< KEEPIDLE
28 0.000035262 9300 44320 [TCP Keep-Alive ACK]
32 40.001145349 44320 9300 [TCP Keep-Alive] <<<<< KEEPIDLE
33 0.000070219 9300 44320 [TCP Keep-Alive ACK]
43 40.062791307 44320 9300 [TCP Keep-Alive] <<<<< KEEPIDLE
44 0.000060748 9300 44320 [TCP Keep-Alive ACK]
51 40.009741752 44320 9300 [TCP Keep-Alive] <<<<< KEEPIDLE
52 0.000041156 9300 44320 [TCP Keep-Alive ACK]
59 40.000299006 44320 9300 [TCP Keep-Alive] <<<<< KEEPIDLE
60 0.000035119 9300 44320 [TCP Keep-Alive ACK]
61 0.048555097 44320 9300 [FIN, ACK] << NORMAL PROGRAM EXIT
62 0.000077215 9300 44320 [ACK]
63 0.000111860 9300 44320 [FIN, ACK]
64 0.000001578 44320 9300 [ACK]
Additional Information
SUPPORT:
If additional assistance is required after completing all of the instructions provided in this document, please follow the step-by-step instructions below to contact IBM to open a case for software under warranty or with an active and valid support contract. The technical support specialist assigned to your case will confirm that you have completed these steps.
a. Document and/or take screenshots of all symptoms, errors, and/or messages that might have occurred.
b. Capture any logs or data relevant to the situation.
c. Contact IBM to open a case:
-For electronic support, please visit the IBM Support Community:
https://www.ibm.com/mysupport
-If you require telephone support, please visit the web page:
https://www.ibm.com/planetwide/
d. Provide a good description of your issue and reference this technote.
e. Upload all of the details and data to your case.
-You can attach files to your case in the IBM Support Community
-Or Upload data to IBM testcase server analysis:
Document Information
Modified date:
22 July 2020
UID
ibm10886355