SQL30081N TCP/IP communication errors

SQL30081N TCP/IP communication protocol errors and recommended solutions are presented.

Cause

The cause of communication protocol errors can vary based on your platform. Each protocol error has its own definition and corresponding action plan.

The SQL30081N error message has the following format:

A communication error has been detected. Communication protocol being used: protocol. 
Communication API being used: interface. 
Location where the error was detected: location. 
Communication function detecting the error: function. 
Protocol specific error code(s): rc1, rc2, rc3.

Example

SQL30081N A communication error has been detected. Communication protocol
being used: "TCP/IP". Communication API being used: "SOCKETS". Location
where the error was detected: "". Communication function detecting the error: "connect".
Protocol specific error code(s): "111", "*", "*".

These error messages are returned when Db2® calls operating system socket APIs that receive an error message from a component outside of Db2. These socket errors are propagated back up to Db2, which encapsulates the error inside the SQL30081N message. The root cause is outside of Db2, which might be in the client/server network stack or network device that is situated between them. Have a network administrator collect and analyze network traces from both the client and server sides to determine the root cause.

Solutions

Table 1 lists protocol-specific errors that occur on different platforms and the corresponding action plans to resolve these errors. If an error code is not listed in the table, search the operating system documentation.
  • For Linux and AIX systems, go to /usr/include/errno.h.
  • For Windows systems, see System Error Codes .
For non-Java application keywords, reference the following materials:
  • db2cli.ini keywords, used by CLI/ODBC applications.

  • db2dsdriver.cfg keywords, used by Windows applications that use the IBM .NET provider.

  • DB2 Registry variables, under the Communications group.

  • FAQ JDBC ERRORCODE=-4499 connectivity, used by Java applications

    Table 1. TCP/IP-related errors by platform
    Windows AIX Linux Short Name Action Plan
    10061 79 111 ECONNREFUSED Connection Refused The client attempts to establish a connection to the server by using an invalid IP or port.
    On the server side, check that the following conditions exist:
    • Db2 environment variable DB2COMM is set such as: DB2COMM=TCPIP
    • DBM CFG's SVCENAME is set to the instance's port number or service name. The command to update this parameter is db2 update dbm cfg using svcename <port/service name>
    • If service name is set, check the 'services' file to see whether the name corresponds to an unused port number.
    • Make sure that the Db2 server instance is started properly.
    On the client side, check the node directory entry and that the following conditions exist:
    • Service name shows the right port number or service name that corresponds to Db2 server's instance port (svcename setting)
    • To check whether server's port is opened: telnet <hostname> <port>
    • If the command fails, then the port on server is not opened and the problem is outside of Db2 area.

    For more information about ECONNREFUSED, see

    http://www.ibm.com/support/docview.wss?rs=71&uid=swg21328644
    10053 72 103 SOCECONNABORTED Software caused a connection to end. Software closed the connection.
    If the error is reported on a client application that uses ODBC/CLI to connect to a Db2 server, complete the following steps:
    1. Disable the Db2 CLI timeout by adding QUERYTIMEOUTINTERVAL=0 to the db2cli.ini file on the client's side.
    2. Check whether the application has any timeout, for example, ADO timeout or VB timeout.
    3. If application connects to OS390 server, check idlethreadtimeout parameter (IDTHTOIN) on OS390.
      Note: This parameter sets the active thread timeout limit on OS390.
    10054 73 104 ECONNRESET Connection is reset by a partner. A connected partner closes the connection.

    Check any timeout limit on partner side, such as firewall, application, Db2 CLI layer.

    If the error is reported on a client application that uses ODBC/CLI to connect to the Db2 server, complete the following steps:
    1. Disable the Db2 CLI timeout.
    2. Add QUERYTIMEOUTINTERVAL=0 to the db2cli.ini file on the client's side.
    3. Check whether a firewall exists between the client and the server.
    4. If it has any time limit on open connections, check if the application has any timeout, such as ADO timeout or VB timeout.
    Note: Error SQL30081 can occur when you are connecting to a database by using a TCPIP connection. If the database connection is local and is cataloged by using a different alias name than the database name, the error can occur. If you get the SQL30081 error, make sure that the database is not cataloged by using a different alias name than the name of the database on the server.
    10060 78 110 ETIMEDOUT Connection timeout A connection reaches the network timeout limit and is disconnected by the network.

    The timeout is initiated by TCP/IP layer.

    TCP/IP has a timeout value. If the open connection stayed too long, TCP/IP forces the connection off.

    In many cases, this disconnection is a network issue.

    Try the following workarounds:
    1
    10048 67 98 EADDRINUSE The specified address is already in use. The error is commonly seen in the following situations:
    • When two instances that are starting on the same server are listening on the same port. (This error would commonly trap on db2start).
    • When a client application or agent is connecting to a database by using a socket that is being used by another connection.
    • When a client application or agent is connecting to a database by using a socket that is in the wait state (2MSL state).
    This error commonly happens to Windows clients.

    This error is a Microsoft error. Winsock provides a port that is already in use (Winsock defect) or is closed but still waiting in the wait state.

    Try the following workaround for Windows:
    • Adjust the length of time that a socket sits in wait state after being closed (default is two minutes)

      TcpTimedWaitDelay

      2

    • Adjust the number of ports available (default is 5000)

      MaxUserPort

      3

    • Adjust the connect and disconnect settings so that they do not cycle rapidly in the program (best solution). 10048 is most often caused by rapid connection and disconnection logic in the application, which puts too many ports in the time_wait state (2MSL). Reusing the connection handle when an application is issuing multiple statements is the best way of handling this situation (do not disconnect then reconnect every time that a statement completes)
    • Implement client-side connection pooling so that the application logic internally does not have to change. Make sure that the pool is large enough to handle 80% of the connections. Make sure that the pool has some form of reconnect logic for a disconnect-while-idle situation.
    10055 74 105 ENOBUFSNo buffer space available This error occurs when the system runs out of resources to complete the TCP/IP call.

    On Windows systems, the problem is caused from running out of desktop heap or system page table entries. It is not Db2 related.

    To solve this issue, increase the Windows SystemPages registry entry.

      32 32 EPIPE (Broken Pipe)

    Network connectivity problem between the client and the server.

    To solve this issue, run a network sniffer trace on both the client and the server.

    10004 (WSAEINTR)       The connection is closed by the client. For Db2 see technote
    10065 81 113 WSAEHOSTUNREACH

    No route to the host exists.

    For Windows clients that are attempting to connect to a Linux server, unset the firewall on the Linux server to allow connections to go through.

In some situations, no return code is returned.
Symptom 1
SQL30081N A communication error has been detected. Communication protocol being used 'TCP/IP'. Communication API being used 'SOCKETS'. Location where error was detected "192.168.1.200'. Communication function detecting the error 'SelectForConnectTimeout'. Protocol specific error code '0','*','*'. SQLSTATE=08001.
Possible cause
192.168.1.200 is a virtual IP that is also assigned to other devices. This assignment causes intermittent SQL30081N error messages. Other issues might generate this error message.
Symptom 2
SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "192.168.1.200". Communication function detecting the error: "recv".Protocol specific error code(s): "*", "*", "0". SQLSTATE=08001.
Possible cause

The error codes *,* and "0" indicate that the connection was closed by the peer. This peer might be any network device, such as a firewall, router or workload balancing device, between the client and Db2 server, or the Db2 server itself.

Disabling any network security and server security software that is running on the Db2 database server might resolve the issue. Multiple security products from different vendors might interact negatively and generate the *,*,0 error codes.

Check whether the Db2 server has Workload Manager (WLM) enabled. If WLM is enabled, check to see whether SQL queries that take longer than xx minutes are disconnected. This example shows a UOWTOTALTIME setting of 15 minutes.

db2 "select substr(workloadname,1,25) as workloadname,serviceclassname from syscat.workloads with UR"

WORKLOADNAME SERVICECLASSNAME
---------------- ------------------
SYSDEFAULTUSERWORKLOAD SYSDEFAULTSUBCLASS
SYSDEFAULTADMWORKLOAD SYSDEFAULTSUBCLASS
TEST_WL MAIN_SC

db2 "select workloadname,ENABLED from syscat.workloads with UR"

WORKLOADNAME ENABLED
-------------------- ----------
SYSDEFAULTUSERWORKLOAD Y SYSDEFAULTADMWORKLOAD Y
TEST_WL Y

CREATE THRESHOLD "TEST_UOW_CONC_TH"
FOR WORKLOAD TEST_WL ACTIVITIES
ENFORCEMENT DATABASE
WHEN UOWTOTALTIME > 15 MINUTES
COLLECT ACTIVITY DATA
ON COORDINATOR DATABASE PARTITION
WITH DETAILS
FORCE APPLICATION;

Also, check the Db2 server to determine whether any communications registry variables are set. Check whether any messages around the same timestamp exist in the db2diag.log file.

What to do next

If further action is needed to deal with TCP/IP errors, try the following tasks.

  • Collect Db2 trace results from both the Db2 client and Db2 server.
    Note: This task does not apply to Java applications that use a JDBC driver (db2jcc.jar or db2jcc4.jar). For Java applications, refer to Collecting Data: Tracing with the IBM Data Server Driver for JDBC and SQL J.
    On the client side, complete the following tasks:
    1. Run db2trc on -t -f ctrace.dmp -Madd SQLJC -Madd SQLJR -Madd SQLR -Madd SQLCC.
    2. Wait for the SQL30081 error to be reported.
    3. Run db2trc off.
    4. Format the traces:
      db2trc flw -t -wc ctrace.dmp ctrace.flw
      Note: Older versions of Db2 do not support "-wc" option. Remove this parameter on older versions.
      db2trc fmt ctrace.dmp ctrace.fmt
      db2trc fmt -c ctrace.dmp ctrace_drda.fmt
    On the server side, complete the following tasks:
    1. Run db2trc on -t -f strace.dmp -Madd SQLJC -Madd SQLJS -Madd SQLR -Madd SQLCC.
    2. Wait for the SQL30081 error to be reported.
    3. Run db2trc off.
    4. Format the traces:
      db2trc flw -t -wc strace.dmp strace.flw
      db2trc flw -t -wc strace.dmp strace.flw
      db2trc fmt strace.dmp strace.fmt
      db2trc fmt -c strace.dmp strace_drda.fmt
  • Have your network administrator collect and analyze network sniffer traces (Linux: tcpdump, Windows: Wireshark) from both the client and server sides. The results might help determine the root cause of the SQL30081 error, if the suggestions from the Db2 traces do not help. Db2 diagnostics do not provide visibility into operating system and network layers after the socket API provided by the operating system is called.
  • Run the stand-alone pct tool to test network connectivity in both client and server mode, outside of Db2. The pctt tool is located in the ~/sqllib/bin folder on Linux and AIX systems, and the C:\Program Files\IBM\SQLLIB\bin folder on (Windows) systems. The tool sets up separate network connections between the client and server for testing.
    Note: Specifying a large buffer size when in server mode (for example, pctt s /b 32767) might cause the listener to core dump. Use a smaller buffer size. This program might not reproduce any intermittent connectivity issues. Usage of pctt is documented in page 132 of the IBM Db2 Universal Database Troubleshooting Guide.

    Run the ping command that is included with Db2 to determine whether inconsistent latency issues exist.

    $ db2 ping dbName request 32767 5
    
    Elapsed time: 692983 microseconds
    
    Elapsed time: 419739 microseconds
    
    Elapsed time: 419867 microseconds
    
    Elapsed time: 210229 microseconds
    
    Elapsed time: 210103 microseconds
    
    $ db2 ping dbName response 32767 5
    
    Elapsed time: 1241 microseconds
    
    Elapsed time: 1217 microseconds
    
    Elapsed time: 1236 microseconds
    
    Elapsed time: 2575774 microseconds   <<<<< 2.5 seconds
    
    Elapsed time: 3526 microseconds
1

Db2 uses the TCP/IP connection KEEPALIVE option to detect whether a connection failure occurred. This option transmits a message periodically to determine whether the partner is still alive. If the partner fails to respond to this message, the connection is considered to be broken, and an error is returned.

2 This parameter determines the length of time that a connection stays in the TIME_WAIT state when they are being closed. While a connection is in the TIME_WAIT state, the socket pair cannot be reused. This state is also known as the 2MSL state because the value is approximately twice the maximum segment lifetime on the network. See RFC 793 for further details.
3 This parameter controls the maximum port number that is used when an application requests any available user port from the system. Normally, short-lived ports are allocated in the range from 1024 through 5000. Setting this parameter to a value outside of the valid range causes the nearest valid value to be used (5000 or 65534).