IBM Support

Tracking TCP retransmissions on Linux

Troubleshooting


Problem

TCP retransmissions are a frequent cause of connection timeouts and throughput problems, but they are invisible to the application layer. This document describes how to correlate TCP retransmissions with application errors using a method that is less resource-intensive and easier to review than a packet capture.
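Although the application cannot see retransmissions, the Linux kernel counts them system-wide, and netstat -s reports its TCP statistics from these counters. The following is a minimal sketch, assuming the usual /proc/net/snmp layout (a "Tcp:" line of column names followed by a "Tcp:" line of values). It reads the OutSegs and RetransSegs counters and computes the retransmission percentage over an interval rather than since boot. The helper names tcp_counters and retrans_pct_between are hypothetical.

```shell
# Print "OutSegs RetransSegs" from a /proc/net/snmp-format file
# (defaults to the live kernel file). The first "Tcp:" line holds the
# column names and the second "Tcp:" line holds the values.
tcp_counters() {
  awk '$1 == "Tcp:" {
         if (!seen) { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
         print $(col["OutSegs"]), $(col["RetransSegs"])
       }' "${1:-/proc/net/snmp}"
}

# Percentage of segments retransmitted between two snapshots.
# $1 = earlier snapshot, $2 = later snapshot, each "OutSegs RetransSegs".
retrans_pct_between() {
  echo "$1 $2" | awk '{
    d_out = $3 - $1; d_re = $4 - $2
    pct = 0
    if (d_out > 0) pct = d_re / d_out * 100
    printf("%.4f\n", pct)
  }'
}
```

For example, take one snapshot with before=$(tcp_counters), a second one a minute later with after=$(tcp_counters), and run retrans_pct_between "$before" "$after". Unlike a since-boot ratio, the interval ratio is not diluted by traffic from before the problem started.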

Symptom

Symptoms that may be a result of TCP retransmissions in WebSphere Application Server include the following:
  1. TCP connection timeouts between the WebSphere WebServer Plugin and the application server.
  2. HTTP request delays or timeouts when transmitting large HTTP request bodies.
  3. HTTP response delays or timeouts between the WebSphere WebServer Plugin and the application server, or between IBM HTTP Server and the browser.

Diagnosing The Problem

To correlate TCP retransmissions with another symptom, collect the following data while the original symptom is occurring:
  1. Install your Linux distribution's "bpftrace" package. It provides the tcpretrans.bt tool, which efficiently logs the addresses and port numbers of retransmitted packets. The commands below also use "netstat", which is usually provided by the "net-tools" package.
  2. As root, run the following commands to start a lightweight capture of retransmissions:
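    # Locate tcpretrans.bt; its path differs between distributions.
    TCPRETRANS=
    test -x /usr/sbin/tcpretrans.bt && TCPRETRANS=/usr/sbin/tcpretrans.bt
    test -x /usr/share/bpftrace/tools/tcpretrans.bt && TCPRETRANS=/usr/share/bpftrace/tools/tcpretrans.bt
    
    OUT=/tmp/tcpretransmits.log
    
    # Print the percentage of TCP segments retransmitted since boot.
    # Older net-tools spell the counter "retransmited", so match both spellings.
    retrans_pct() {
      netstat -s | awk '/segments sent out$/ { R=$1; } /segments retransmit+ed$/ { printf("%.4f\n", ($1/R)*100); }'
    }
    
    if [ -z "$TCPRETRANS" ]; then
      echo "It looks like 'bpftrace' is not installed"
    else
      date > "$OUT"
      retrans_pct >> "$OUT"
      # "tee -i" ignores Ctrl-C so that bpftrace can exit cleanly and the
      # closing statistics below are still appended to the log.
      "$TCPRETRANS" | tee -i -a "$OUT"
      retrans_pct >> "$OUT"
      date >> "$OUT"
    fi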
    
    
  3. After you confirm that at least one new instance of the symptom has occurred, stop the capture with Ctrl-C.
  4. When submitting data to IBM, include the generated log file "/tmp/tcpretransmits.log".
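To line the capture up with an application error, a sketch like the following prints only the retransmissions logged within a given number of seconds of the error's timestamp. It assumes that each retransmission line in the log starts with an HH:MM:SS timestamp, as the tcpretrans.bt versions we have looked at print it; check your copy of the tool. The helper name retrans_near is hypothetical, and because the log lines carry no date, windows that cross midnight are not handled.

```shell
# Print tcpretrans.bt log lines whose HH:MM:SS timestamp is within
# WINDOW seconds (default 30) of TARGET.
# Usage: retrans_near LOGFILE HH:MM:SS [WINDOW]
retrans_near() {
  awk -v target="$2" -v win="${3:-30}" '
    # Convert HH:MM:SS to seconds since midnight.
    function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    # Only lines that start with a timestamp are retransmission records;
    # the date, percentage, and header lines are skipped.
    $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      d = secs($1) - secs(target)
      if (d < 0) d = -d
      if (d <= win) print
    }' "$1"
}
```

For example, if the application logged a timeout at 10:15:20, retrans_near /tmp/tcpretransmits.log 10:15:20 30 shows the retransmissions from 10:14:50 through 10:15:50.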

Resolving The Problem

TCP retransmissions are almost always caused by packet loss in the network, such as from failing network hardware, not by applications or middleware. Report the IP address pairs that show retransmissions to a network administrator.
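To find the IP address pairs to report, a sketch such as the following counts retransmissions per local and remote address in the log, busiest pairs first. It assumes that the third and fourth columns of each retransmission line are LADDR:LPORT and RADDR:RPORT, matching the tcpretrans.bt header; the helper name retrans_by_ip_pair is hypothetical. Ports are stripped so that each pair of hosts is reported once.

```shell
# Count retransmissions per "local IP -> remote IP" pair in a
# tcpretrans.bt log and list the most frequent pairs first.
retrans_by_ip_pair() {
  awk '$1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && NF >= 4 {
         l = $3; r = $4
         # Remove the trailing ":port" from each address.
         sub(/:[0-9]+$/, "", l); sub(/:[0-9]+$/, "", r)
         n[l " -> " r]++
       }
       END { for (k in n) print n[k], k }' "$1" | sort -rn
}
```

For example, retrans_by_ip_pair /tmp/tcpretransmits.log prints lines such as "2 10.0.0.5 -> 10.0.0.9", where the leading number is how many retransmissions were logged for that pair.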

Document Location

Worldwide

Product:
IBM HTTP Server

Platform:
Platform Independent

Version:
All Versions

Document Information

Modified date:
01 August 2022

UID

ibm16480315