IBM Support

Triaging WebSphere Network Issues

Troubleshooting


Problem

WebSphere relies on the network infrastructure to establish connections between endpoints. Generally speaking, troubleshooting network-related issues is outside the scope of WebSphere product support. WebSphere support can provide best-effort guidance to assist in the investigation, with the goal of determining whether the issue is caused by WebSphere or by the underlying network components. When troubleshooting network issues, it is critical to engage the network administration team early so they can lead the investigation. The techniques and tooling discussed in this article are not owned by IBM, so there may be limits to the feedback provided.

Symptom

Network issues can cause a variety of symptoms. The following errors suggest there may be an issue at the network layer.
Hung Thread in SocketRead
Work dispatched to WebSphere threads is intended to be short-lived. WebSphere monitors thread activity to ensure that threads make progress, and writes a hung thread alert (WSVR0605W) to the log when a thread has been active for more than 10 minutes, the default threshold. To investigate hung threads, capture the performance MustGather for Windows, Linux, or AIX.
A hung thread alert includes the thread's stack trace, which is useful in determining the execution path. If socketRead appears in the stack, WebSphere (acting as a client) is waiting for the server endpoint to respond.
[2/15/23 9:32:13:492 PDT] 00000023 ThreadMonitor W   WSVR0605W: Thread "WebContainer : 0" (00000021) has been active for 692481 milliseconds and may be hung.  There is/are 1 thread(s) in total in the server that may be hung.
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.read(SocketInputStream.java:129)
  at resourcename.net.ns.Packet.receive(Packet.java:283)
  at com.mycompany.myapp.MyClass.executeQuery(MyClass.java:123)
  ... many more lines in stack
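The following minimal Java sketch illustrates why a thread can remain in socketRead0 indefinitely. It is not WebSphere code, and the class and method names are hypothetical. A plain java.net.Socket read blocks until data arrives unless a read timeout (SO_TIMEOUT) is set with Socket.setSoTimeout. Here, a local ServerSocket that accepts the connection but never replies stands in for an unresponsive backend.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: shows a read timeout ending a blocked socketRead.
class SocketReadTimeoutDemo {

    /** Returns true if a read from a silent local server fails with a timeout. */
    static boolean readTimesOut(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 picks any free port; the listen backlog completes the TCP
        // handshake even though accept() is never called.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            // Without this call, read() would block in socketRead0 indefinitely.
            client.setSoTimeout(readTimeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();      // blocks until data, end of stream, or timeout
                return false;   // not expected: the silent server never writes
            } catch (SocketTimeoutException e) {
                return true;    // "Read timed out" after readTimeoutMillis
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Read timed out: " + readTimesOut(500));
    }
}
```

With the timeout set, the read fails with java.net.SocketTimeoutException instead of hanging. Many client libraries expose their own read timeout settings for the same purpose. Check the documentation for the component in question rather than assuming a particular property name.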
LDAP Issues
LDAP operations are intended to be short-lived, completing within milliseconds. Depending on how LDAP is configured within WebSphere, the following settings determine how long WebSphere waits for the LDAP server to respond.
When using Standalone LDAP, the timeout is determined by Java. If no timeout value is specified, WebSphere waits indefinitely for the LDAP server to respond. Typically, the connection is eventually terminated by some external component, such as a network firewall, once its idle timeout is reached.
javax.naming.NamingException: LDAP response read timed out
When using Federated Repositories, the timeout can be set on the performance panel in the WebSphere console. With default values, the context pooling settings can also cause hung threads.
CWWIM4520E The 'javax.naming.CommunicationException: sample.ibm.com:60002 [Root exception is java.net.SocketTimeoutException: Read timed out]
It is common to connect through a load-balanced IP address that fronts many domain controllers. To verify connectivity, it can be helpful to temporarily point the configuration at one explicit domain controller. The external MTR tool can help determine which hop on the network path is dropping packets.
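In the Standalone LDAP case, the JDK's built-in LDAP provider (com.sun.jndi.ldap.LdapCtxFactory) accepts the com.sun.jndi.ldap.connect.timeout and com.sun.jndi.ldap.read.timeout environment properties, both in milliseconds. The sketch below is a hypothetical standalone program, not WebSphere configuration. It points JNDI at a local port that accepts connections but never speaks LDAP, so the request sent while creating the context receives no response. The read timeout then turns an indefinite wait into a NamingException similar to the one shown above. The exact message text varies by JDK level. Confirm how these values are supplied to a WebSphere server in the WebSphere documentation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical demo class: JNDI LDAP connect/read timeouts against a silent port.
class LdapTimeoutDemo {

    /** Builds a JNDI environment with explicit connect and read timeouts. */
    static Hashtable<String, String> ldapEnv(String url, int connectMillis, int readMillis) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        // Time allowed to establish the TCP connection to the LDAP server.
        env.put("com.sun.jndi.ldap.connect.timeout", String.valueOf(connectMillis));
        // Time allowed to wait for an LDAP response; if unset, the wait is unbounded.
        env.put("com.sun.jndi.ldap.read.timeout", String.valueOf(readMillis));
        return env;
    }

    /** Returns true if creating a context against a silent local port fails. */
    static boolean bindTimesOut(int readMillis) {
        try {
            InetAddress loopback = InetAddress.getByName("127.0.0.1");
            // Accepts TCP connections (via the backlog) but never sends LDAP replies.
            try (ServerSocket silentServer = new ServerSocket(0, 1, loopback)) {
                String url = "ldap://127.0.0.1:" + silentServer.getLocalPort();
                try {
                    DirContext ctx = new InitialDirContext(ldapEnv(url, 2000, readMillis));
                    ctx.close();
                    return false;   // not expected: nothing answered the request
                } catch (NamingException e) {
                    // With a silent server this is the read timeout, e.g.
                    // "LDAP response read timed out, ..." (wording varies by JDK).
                    System.out.println("Failed as expected: " + e.getMessage());
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("LDAP read timed out: " + bindTimesOut(1000));
    }
}
```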
Connection Refused
The "connection refused" message indicates that the connection attempt was actively rejected, typically because the target host answered with a TCP reset when no process was listening on the requested port. Most commonly, this is because the backend process is not running or has crashed, or because a firewall is rejecting the connection. Use external commands like telnet, netstat, or curl to verify connectivity to the host and port of concern. Depending on which component observes the failure, the message can take various forms.
WebSphere Managed Secure Connection
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.Socket.connect(Socket.java:643) 
    at com.ibm.ws.ssl.config.WSSocket.connect(WSSocket.java:236)
    ...
Application Connection Using IBM Java
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:380)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:236)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:218)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
    at java.net.Socket.connect(Socket.java:666)
    at com.ibm.jsse2.av.connect(av.java:35)
    …
Federated Repositories (LDAP)
CWWIM4520E The 'javax.naming.CommunicationException: 10.16.152.35:389 [Root exception is java.net.ConnectException: Connection refused (Connection refused)]'
Database
Caused by: com.ibm.db2.jcc.am.DisconnectNonTransientConnectionException: [jcc][t4][2043][11550][4.28.11] Exception java.net.ConnectException: Error opening socket to server sample.ibm.com/10.11.xxx.yyy on port 50,000 with message: Connection refused (Connection refused). ERRORCODE=-4499, SQLSTATE=08001
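Alongside telnet, netstat, or curl, connectivity can also be checked from a JVM on the same host. This shows what a Java process on that machine sees. The following hypothetical Java sketch attempts a TCP connection with a bounded timeout and reports how it failed. The class and helper names are illustrative only. A ConnectException usually corresponds to "Connection refused". A SocketTimeoutException more often points to packets being silently dropped somewhere on the path, for example by a firewall.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical connectivity probe; not part of WebSphere.
class ConnectivityProbe {

    /** Attempts a TCP connection and classifies the outcome. */
    static String probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return "connected";
        } catch (UnknownHostException e) {
            return "unknown host";                          // name did not resolve
        } catch (ConnectException e) {
            return "connect failed: " + e.getMessage();     // usually "Connection refused"
        } catch (SocketTimeoutException e) {
            return "timed out";                             // no answer within timeoutMillis
        } catch (IOException e) {
            return "other error: " + e;
        }
    }

    /** Returns a loopback port that was free a moment ago and now has no listener. */
    static int closedLoopbackPort() {
        try (ServerSocket s = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return s.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Probes a temporary local listener, which should succeed. */
    static String probeLocalListener() {
        try (ServerSocket s = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return probe(s.getInetAddress().getHostAddress(), s.getLocalPort(), 5000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : closedLoopbackPort();
        System.out.println(host + ":" + port + " -> " + probe(host, port, 5000));
    }
}
```

For example, running it with the arguments `sample.ibm.com 50000` checks the database endpoint from the example above from the JVM's point of view.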
Unknown Host
An "unknown host" error indicates that the operating system is unable to resolve the hostname to an IP address. Typically, the machine uses the organization's DNS servers, which hold an entry mapping the hostname alias to an explicit IP address. The mapping can also be defined in the local /etc/hosts file.
Caused by: java.net.UnknownHostException: sample.ibm.com
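Name resolution can be checked from inside the JVM with InetAddress.getAllByName. By default, this uses the same system resolver configuration (DNS and the hosts file) as the JVM's own connection attempts. The following hypothetical helper returns the resolved addresses. If the lookup throws UnknownHostException, it returns an empty array.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

// Hypothetical lookup helper; mirrors the resolution step behind UnknownHostException.
class HostLookup {

    /** Returns the resolved IP addresses, or an empty array if resolution fails. */
    static String[] resolve(String hostname) {
        try {
            return Arrays.stream(InetAddress.getAllByName(hostname))
                    .map(InetAddress::getHostAddress)
                    .toArray(String[]::new);
        } catch (UnknownHostException e) {
            // Same condition reported as "java.net.UnknownHostException: <host>".
            return new String[0];
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        String[] addresses = resolve(host);
        if (addresses.length == 0) {
            System.out.println(host + " could not be resolved (UnknownHostException)");
        } else {
            System.out.println(host + " -> " + String.join(", ", addresses));
        }
    }
}
```

The JVM caches successful lookups (controlled by the networkaddress.cache.ttl security property). As a result, a long-running server may not pick up a DNS change immediately.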
Java Cryptographic Failures
A "bad_record_mac" alert indicates that a TLS record failed its integrity (MAC) check or contained invalid padding, which can happen when data is altered at the network layer. For more information, see page 22 of RFC 5246.
javax.net.ssl.SSLException: Received fatal alert: bad_record_mac
    …
    at com.ibm.jsse2.as.unwrap(as.java:473)
    at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:4)
    at com.ibm.ws.ssl.channel.impl.SSLReadServiceContext.decryptMessage(SSLReadServiceContext.java:1231)
Java can also encounter an integrity check failure during decryption when data is corrupted on the network. This is commonly seen with AEAD ciphers such as AES in GCM mode, where the authentication tag no longer matches the received data.
javax.crypto.AEADBadTagException
    at com.ibm.crypto.provider.GCTRInHardware.gcm_ad(Unknown Source)
    at com.ibm.crypto.provider.aI.c(Unknown Source)
    at com.ibm.crypto.provider.AESGCMCipher.engineDoFinal(Unknown Source)
    at javax.crypto.Cipher.doFinal(Unknown Source)
    at com.ibm.jsse2.n.a(n.java:279)
    …
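The following hypothetical sketch shows why corruption surfaces as AEADBadTagException rather than as garbled data. It uses the standard JCE AES/GCM/NoPadding transformation. The sketch encrypts a payload, optionally flips a single bit in the ciphertext to simulate damage in transit, and then checks whether decryption rejects it. This is not how the TLS implementation processes records internally. It only demonstrates the GCM integrity property.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical demo: a single flipped bit makes AES-GCM decryption fail its tag check.
class GcmCorruptionDemo {

    /** Returns true if decryption is rejected with AEADBadTagException. */
    static boolean decryptionRejected(boolean flipBit) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey key = keyGen.generateKey();

            byte[] iv = new byte[12];                               // 96-bit nonce
            new SecureRandom().nextBytes(iv);
            GCMParameterSpec spec = new GCMParameterSpec(128, iv);  // 128-bit tag

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, key, spec);
            byte[] ciphertext = enc.doFinal("example payload".getBytes(StandardCharsets.UTF_8));

            if (flipBit) {
                ciphertext[0] ^= 0x01;   // simulate one bit corrupted in transit
            }

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, key, spec);
            try {
                dec.doFinal(ciphertext);
                return false;            // tag verified; data is intact
            } catch (AEADBadTagException e) {
                return true;             // tag mismatch: data was altered
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Intact data rejected: " + decryptionRejected(false));
        System.out.println("Corrupted data rejected: " + decryptionRejected(true));
    }
}
```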
Java Socket Exceptions
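Timeouts

Connection timeouts can be thrown by various Java APIs. A java.net.ConnectException: Connection timed out means that a new connection could not be established because the remote endpoint never answered, which often happens when a firewall silently drops packets. A java.net.SocketException: Connection timed out (Read failed) generally suggests that an established TCP connection was open for a while with no traffic and was dropped somewhere on the network path while the application still held the socket open. This issue could be remedied by adding a "keep alive" mechanism within the application.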
java.net.SocketException: Connection timed out (Read failed)
java.net.ConnectException: Connection timed out
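One form of keep-alive mechanism is the TCP SO_KEEPALIVE socket option, which Java exposes through Socket.setKeepAlive. The hypothetical sketch below enables it on a client socket connected to a local listener. The probe timing is controlled by the operating system; on Linux, the default idle time before the first probe is two hours. The OS settings may therefore need tuning to stay under a firewall's idle timeout, or an application-level heartbeat may be more practical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo: enabling TCP keep-alive (SO_KEEPALIVE) on a client socket.
class KeepAliveDemo {

    /** Connects to a local listener with SO_KEEPALIVE enabled and reports the option. */
    static boolean keepAliveEnabled() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Set before connecting so the option applies for the connection's lifetime.
            client.setKeepAlive(true);
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()), 2000);
            return client.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + keepAliveEnabled());
    }
}
```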
Port Access Issues

The following Java network exception can suggest a port contention issue. To investigate, review netstat output to confirm that the port is listening and is associated with the intended process.
java.net.SocketException: Resource temporarily unavailable
Connection Interruption

A "broken pipe" error indicates that the application attempted to write to a connection that had already been closed by the other end. Something terminated the connection while it was in use, such as an idle timeout somewhere within the network path. The connection cannot be recovered, so the application should perform any required cleanup actions (closing the connection, etc.).
java.net.SocketException: Broken pipe (Write failed)
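The following minimal, hypothetical Java sketch illustrates the cleanup advice above. Once a write fails with an exception such as "Broken pipe", the sketch stops using the socket and lets try-with-resources close it instead of retrying on it. A local server that closes its side immediately stands in for whatever dropped the connection on the network path. The exact exception message varies by platform; "Connection reset" is also common.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo: writing to a connection the peer has closed, then cleaning up.
class BrokenPipeDemo {

    /** Returns true if writing after the peer closed the connection fails. */
    static boolean writeAfterPeerCloseFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            server.accept().close();          // the peer closes its end right away
            OutputStream out = client.getOutputStream();
            byte[] chunk = new byte[1024];
            for (int attempt = 0; attempt < 100; attempt++) {
                try {
                    out.write(chunk);
                    out.flush();
                } catch (IOException e) {
                    // Typically "Broken pipe" or "Connection reset": the socket is unusable.
                    // Do not retry on it; try-with-resources closes it on the way out.
                    return true;
                }
                Thread.sleep(10);             // give the peer's reset time to arrive
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Write failed after peer close: " + writeAfterPeerCloseFails());
    }
}
```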

Diagnosing The Problem

There is no WebSphere-specific trace for investigating network issues. When investigating, it is important to determine the source and destination host and port. If other components are referenced in the exception stack, enable tracing for that component to get further details on the failure; refer to the MustGather documentation for that component.
Review the messages in the WebSphere logs to identify the source and destination of the network failure. Capture a TCP/IP packet trace between the two endpoints of interest and review it with the network administrator. Checking for packet loss is a good first step. Using timestamps, correlate the activity in the packet trace with the WebSphere errors.

Document Location

Worldwide


Document Information

Modified date:
08 March 2024

UID

ibm17123993