IBM Support

IBM AIX: TCP keepalive probes

Question & Answer


Question

What are TCP keepalive probes used for, and which parameters control them?

Answer

TCP keepalive probes provide a method to remove dead sockets and notify applications of unresponsive peers across a TCP connection.

While terminating or killing a program causes a FIN packet or possibly an RST packet to be sent, a system crash, hard reboot or network outage does not generate any packets. Applications could therefore wait indefinitely on a remote peer that has crashed. For example, if a telnet connection is created and then left idle, no further data is exchanged, so there is no indication of a network failure or peer problem.

While telnet provides a trivial example, there are other cases where a remote response could take many minutes to arrive. A local application would not automatically detect a loss of access to the remote system and could wait indefinitely for a response that will never arrive.

The TCP keepalive facility can be used to address the issue of unresponsive peers by sending probes at the TCP layer, below the application.  This functionality can also help prevent firewalls or other network appliances from terminating idle connections that need to be kept open.

The option is enabled on a per-socket basis by using the setsockopt() subroutine to set the socket option SO_KEEPALIVE to 1, so each application must enable it on the sockets it wants monitored. There is no option available to enable keepalive system-wide. Many programs, such as telnetd, provide a way to enable or disable TCP keepalive through command-line arguments or configuration options.
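As a minimal sketch of the step described above, the helpers below turn SO_KEEPALIVE on for one socket and read the setting back. The function names enable_keepalive() and keepalive_is_enabled() are hypothetical, chosen for illustration. Note that setsockopt() takes a pointer to the option value, so the int holding 1 is passed by address.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive for one socket.
 * 1 enables keepalive, 0 disables it; the value is passed by address.
 * Returns 0 on success, -1 with errno set on failure. */
int enable_keepalive(int sockfd)
{
    int optval = 1;
    return setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE,
                      &optval, sizeof(optval));
}

/* Hypothetical helper: report whether SO_KEEPALIVE is set on the socket.
 * Returns 1 if on, 0 if off, -1 if getsockopt() fails. */
int keepalive_is_enabled(int sockfd)
{
    int optval = 0;
    socklen_t optlen = sizeof(optval);

    if (getsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &optval, &optlen) < 0)
        return -1;
    return optval != 0;
}
```

A newly created TCP socket normally starts with keepalive off, which matches the "SO_KEEPALIVE is OFF" line printed by the test program below.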

TCP keepalive behavior is controlled by three options:

TCP_KEEPIDLE: How long a connection must be idle before the first probe is sent
TCP_KEEPINTVL: The interval between successive probes while the previous probe remains unacknowledged
TCP_KEEPCNT: The number of unanswered probes required to force closure of the socket
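The following sketch shows how a program might set and read back the three options on one socket, using the same values as the test program below (40, 20 and 5). It assumes the platform's <netinet/tcp.h> defines TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT at the IPPROTO_TCP level, as AIX and Linux do; the helper names are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: set keepalive timing for one socket.
 * idle and intvl are in seconds, cnt is a probe count. The values only
 * take effect while SO_KEEPALIVE is enabled on the socket.
 * Returns 0 on success, -1 with errno set on the first failure. */
int set_keepalive_params(int sockfd, int idle, int intvl, int cnt)
{
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}

/* Read one integer TCP-level option into *out. */
static int get_tcp_int(int sockfd, int opt, int *out)
{
    socklen_t len = sizeof(*out);
    return getsockopt(sockfd, IPPROTO_TCP, opt, out, &len);
}

/* Hypothetical helper: read back the three keepalive values.
 * Returns 0 on success, -1 with errno set on failure. */
int get_keepalive_params(int sockfd, int *idle, int *intvl, int *cnt)
{
    if (get_tcp_int(sockfd, TCP_KEEPIDLE, idle) < 0 ||
        get_tcp_int(sockfd, TCP_KEEPINTVL, intvl) < 0 ||
        get_tcp_int(sockfd, TCP_KEEPCNT, cnt) < 0)
        return -1;
    return 0;
}
```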

NOTE: The system-wide default values are set with the "no" command, where the values are specified in half-second units; with the setsockopt() subroutine, the units are seconds. An application can override these defaults for its own sockets by using the setsockopt() subroutine.
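Because the two interfaces use different units, a value has to be doubled when moving from setsockopt() seconds to "no" half-second units, and halved in the other direction. The small sketch below makes that arithmetic explicit; the function names are hypothetical. As an illustration, the TCP_KEEPIDLE and TCP_KEEPINTVL values of 7200 and 75 seconds reported by the test program below would correspond to 14400 and 150 in half-second units (the AIX tunables are assumed here to be tcp_keepidle and tcp_keepintvl). TCP_KEEPCNT is a probe count, so no conversion applies to it.

```c
/* Hypothetical helper: convert setsockopt() seconds to "no" half-second units. */
int seconds_to_no_units(int seconds)
{
    return seconds * 2;
}

/* Hypothetical helper: convert "no" half-second units to seconds.
 * Returns a double because an odd value is a fractional number of seconds. */
double no_units_to_seconds(int half_seconds)
{
    return half_seconds / 2.0;
}
```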

Below are two examples showing TCP keepalive with a simple client program that uses setsockopt() to set TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT.

---- TEST ONE ----

Severing a connection with keepalive active

In this test, a connection is established, a small amount of data is exchanged, and then the connection becomes idle. Network connectivity with the server is then severed in a way that prevents packets from reaching it and prevents intermediate network devices from responding that the host or network is unreachable (for example, by unplugging the server from its switch port rather than turning off a router).

With the options specified below, the first keepalive probe is sent approximately 40 seconds (TCP_KEEPIDLE) after the end of data transmission. Because that probe is not acknowledged, another probe is sent 20 seconds (TCP_KEEPINTVL) later, and probes continue at 20-second intervals while they go unanswered. After five unanswered probes (TCP_KEEPCNT), sent at roughly 40, 60, 80, 100 and 120 seconds, the client waits one more interval and then aborts the connection with an RST packet at the 140-second mark (40 + 5 × 20).
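The timeline in the trace below can be reproduced with simple arithmetic, assuming the pattern observed there holds in general: the first probe at TCP_KEEPIDLE, each further probe TCP_KEEPINTVL later, and the abort one interval after the last unanswered probe. This is a sketch with hypothetical function names, not a description of the AIX implementation.

```c
/* Seconds after the connection goes idle at which probe k is sent
 * (k = 0 for the first probe), assuming no probe is acknowledged. */
int probe_time(int idle, int intvl, int k)
{
    return idle + k * intvl;
}

/* Seconds after the connection goes idle at which it is aborted:
 * cnt unanswered probes, each followed by a full TCP_KEEPINTVL wait. */
int abort_time(int idle, int intvl, int cnt)
{
    return idle + cnt * intvl;
}
```

With this test's values, abort_time(40, 20, 5) gives 140 seconds. With the values the test program reported before changing them (7200, 75 and 8), the same formula gives 7800 seconds, or 2 hours and 10 minutes, before a dead peer is detected.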

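Even though the RST packet may never reach the remote system, the keepalive values used in this test cause the local socket to be destroyed after 140 seconds. Note that this does not necessarily mean the application terminates: the application typically learns of the failure only when a pending or subsequent call on the socket, such as read() or write(), returns an error.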
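A minimal sketch of how an application might notice that keepalive has destroyed its socket is shown below. It assumes, as is typical on BSD-derived stacks, that a keepalive timeout surfaces as read() failing with errno set to ETIMEDOUT; other failures (for example ECONNRESET or EHOSTUNREACH) are reported separately. The enum and function names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical classification of a single read on a connected socket. */
enum peer_status { PEER_DATA, PEER_CLOSED, PEER_DEAD, PEER_ERROR };

/* Read once from fd and classify the result. *nread receives the
 * value returned by read(). */
enum peer_status read_and_classify(int fd, char *buf, size_t len,
                                   ssize_t *nread)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    *nread = n;
    if (n > 0)
        return PEER_DATA;                /* normal data from the peer */
    if (n == 0)
        return PEER_CLOSED;              /* orderly close (FIN) from the peer */
    if (errno == ETIMEDOUT)
        return PEER_DEAD;                /* typically: keepalive probes unanswered */
    return PEER_ERROR;                   /* some other socket error */
}
```

The application still decides what to do next (reconnect, log, or exit); keepalive only ensures that the blocked or next call does not wait forever.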

> ----------- TCP Keepalive Test --------
> Creating TCP socket
> SO_KEEPALIVE is OFF
> Socket Connected
> Write and read to peer
> Enabling SO_KEEPALIVE
>    setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, 1, sizeof(optval)
> TCP_KEEPIDLE 7200
> TCP_KEEPINTVL 75
> TCP_KEEPCNT 8
> Changing tcp keepalive values:
> -----------------------------
> TCP_KEEPIDLE changed to: 40
> TCP_KEEPINTVL changed to: 20
> TCP_KEEPCNT changed to: 5
> Write and read data to peer
> Idling for 240 seconds


---- iptrace summary of activity ----
    Time since
No. previous pkt  src-port dst-port Info
1   0.000000000   44539    9300     [SYN]
2   0.000110003   9300     44539    [SYN, ACK]
3   0.000004840   44539    9300     [ACK]
4   0.000032719   9300     44539    [TCP Window Update]
5   0.000026137   44539    9300     [PSH, ACK]
6   0.000449679   9300     44539    [PSH, ACK]
7   0.000255969   44539    9300     [PSH, ACK]
8   0.000063309   9300     44539    [PSH, ACK]
9   0.012850326   44539    9300     [ACK]
24 39.660387502   44539    9300     [TCP Keep-Alive]   <<<< KEEPIDLE
31 20.004177791   44539    9300     [TCP Keep-Alive]   <<<< KEEPINTVL/KEEPCNT 1
32 20.010662734   44539    9300     [TCP Keep-Alive]   <<<< KEEPINTVL/KEEPCNT 2
33 20.001108748   44539    9300     [TCP Keep-Alive]   <<<< KEEPINTVL/KEEPCNT 3
34 20.001046118   44539    9300     [TCP Keep-Alive]   <<<< KEEPINTVL/KEEPCNT 4
35 20.000886781   44539    9300     [RST, ACK]         <<<< KEEPINTVL/KEEPCNT 5


---- TEST TWO ----

Keepalive on an intact connection

This is the same test, except that the network link is not severed. Because no data is being transmitted, the client sends a keepalive probe every TCP_KEEPIDLE seconds, and the server acknowledges each one with a keepalive ACK. Each ACK resets the idle timer, so TCP_KEEPINTVL and TCP_KEEPCNT do not come into play unless the server's ACKs stop arriving.

The test program exits normally and shuts down the socket after about 240 seconds (six probes).
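To contrast the two tests, the sketch below models the keepalive timers in whole seconds. It is a simplified, hypothetical model based on the behavior seen in the traces: after activity or an acknowledged probe the stack waits TCP_KEEPIDLE, after an unanswered probe it waits TCP_KEEPINTVL, and it aborts once TCP_KEEPCNT consecutive probes go unanswered. The peer is assumed to acknowledge the first `answered` probes and then go silent.

```c
/* Hypothetical keepalive timer model (all times in seconds since the
 * connection went idle; cnt >= 1 assumed). Probe send times are written
 * to times[] (up to max_times entries) and the number of probes sent is
 * stored in *nprobes. Returns the time at which the connection is aborted. */
int simulate_keepalive(int idle, int intvl, int cnt, int answered,
                       int times[], int max_times, int *nprobes)
{
    int t = 0;           /* current time */
    int unanswered = 0;  /* consecutive probes with no ACK */
    int sent = 0;        /* probes sent so far */

    *nprobes = 0;
    for (;;) {
        /* Wait TCP_KEEPIDLE after activity or an ACK,
         * TCP_KEEPINTVL after an unanswered probe. */
        t += (unanswered == 0) ? idle : intvl;
        if (unanswered == cnt)
            return t;                /* TCP_KEEPCNT probes unanswered: abort */
        if (sent < max_times)
            times[sent] = t;
        sent++;
        *nprobes = sent;
        if (sent <= answered)
            unanswered = 0;          /* ACK received: idle timer restarts */
        else
            unanswered++;
    }
}
```

With answered = 0 the model reproduces Test One (probes at 40 through 120 seconds, abort at 140). With answered = 6 it reproduces the six probes of Test Two at 40, 80, ..., 240 seconds; had the program kept running and the server then gone silent, the model predicts an abort at 380 seconds.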

> ----------- TCP Keepalive Test --------
> Creating TCP socket
> SO_KEEPALIVE is OFF
> Socket Connected
> Write and read to peer
> Enabling SO_KEEPALIVE
>    setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, 1, sizeof(optval)
> TCP_KEEPIDLE 7200
> TCP_KEEPINTVL 75
> TCP_KEEPCNT 8
> Changing tcp keepalive values:
> -----------------------------
> TCP_KEEPIDLE changed to: 40
> TCP_KEEPINTVL changed to: 20
> TCP_KEEPCNT changed to: 5
> Write and read data to peer
> Idling for 240 seconds


---- iptrace summary of activity ----
    Time since
No. previous pkt  src-port dst-port Info
1   0.000000000   44320    9300     [SYN]
2   0.000074207   9300     44320    [SYN, ACK]
3   0.000028210   44320    9300     [ACK]
4   0.000033899   9300     44320    [TCP Window Update]
5   0.000002898   44320    9300     [PSH, ACK]
6   0.000521559   9300     44320    [PSH, ACK]
7   0.000262672   44320    9300     [PSH, ACK]
8   0.000069935   9300     44320    [PSH, ACK]
9   0.191224633   44320    9300     [ACK]
19 39.619628860   44320    9300     [TCP Keep-Alive]     <<<<< KEEPIDLE
20  0.000043783   9300     44320    [TCP Keep-Alive ACK]
27 40.066520457   44320    9300     [TCP Keep-Alive]     <<<<< KEEPIDLE
28  0.000035262   9300     44320    [TCP Keep-Alive ACK]
32 40.001145349   44320    9300     [TCP Keep-Alive]     <<<<< KEEPIDLE
33  0.000070219   9300     44320    [TCP Keep-Alive ACK]
43 40.062791307   44320    9300     [TCP Keep-Alive]     <<<<< KEEPIDLE
44  0.000060748   9300     44320    [TCP Keep-Alive ACK]
51 40.009741752   44320    9300     [TCP Keep-Alive]     <<<<< KEEPIDLE
52  0.000041156   9300     44320    [TCP Keep-Alive ACK]
59 40.000299006   44320    9300     [TCP Keep-Alive]     <<<<< KEEPIDLE
60  0.000035119   9300     44320    [TCP Keep-Alive ACK]
61  0.048555097   44320    9300     [FIN, ACK]           << NORMAL PROGRAM EXIT
62  0.000077215   9300     44320    [ACK]
63  0.000111860   9300     44320    [FIN, ACK]
64  0.000001578   44320    9300     [ACK]

Additional Information

SUPPORT:

If additional assistance is required after completing all of the instructions provided in this document, please follow the step-by-step instructions below to contact IBM to open a case for software under warranty or with an active and valid support contract.  The technical support specialist assigned to your case will confirm that you have completed these steps.

a.  Document and/or take screenshots of all symptoms, errors, and/or messages that might have occurred.

b.  Capture any logs or data relevant to the situation.

c.  Contact IBM to open a case:

   - For electronic support, visit the IBM Support Community:
     https://www.ibm.com/mysupport
   - For telephone support, visit the web page:
      https://www.ibm.com/planetwide/

d.  Provide a clear description of your issue and reference this technote.

e.  Upload all of the details and data to your case.

   - You can attach files to your case in the IBM Support Community
   - Or upload data to the IBM testcase server for analysis:

    http://www.ibm.com/support/docview.wss?uid=ibm10733581

Document Information

Modified date:
22 July 2020

UID

ibm10886355