WebSphere Application Server Hang in Federated Repository LDAP when using the default settings

Troubleshooting


Problem

When a federated repository is configured with a Lightweight Directory Access Protocol (LDAP) server, login attempts that follow an idle period can trigger hung threads or report a timeout error. Users can find themselves frozen at the login prompt or unable to log in at all.

It might take 10 - 15 minutes before a user can complete a successful login, after which the issue appears to have cleared itself up.

Symptom

  • After a long idle period with no login activity, access to an application hangs. After one user login succeeds (which can take 10 - 15 minutes of hanging), everything works again.
  • A firewall sits between the application server (or deployment manager) and the LDAP server, and the firewall is configured with an idle timeout
  • Default settings are in use for the LDAP configuration
    • The "Context pool times out" setting is 0 by default, which means no timeout is configured
  • Context Pooling applies to the Federated Repository only, not the stand-alone LDAP

Cause

With the default settings for an LDAP repository configured in a Federated Repository in WebSphere Application Server (or the default settings for an <ldapRegistry> in WebSphere Liberty), a firewall can close a pooled connection when its idle timeout expires. Because the appserver pools used connections so that they can be reused, they aren't removed immediately after use.

A connection's context is stored in a pool, ready to be reused. If the remote firewall closes the connection after it has been idle, WebSphere Application Server is unaware that the connection is closed. When that context is reused, such as when a new login occurs, the request can hang and might eventually time out.

Technically, the request is tried and retried on the dead connection over several minutes (usually 10 - 15 minutes). Eventually it might force a new connection after that length of time, or it can time out much sooner.
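
The interaction between the pool timeout and the firewall idle timeout can be illustrated with a toy model. This is only a sketch for reasoning about the settings, not WebSphere code: the function name reuse_outcome and the outcome strings are hypothetical, and a pool timeout of 0 is assumed to mean that pooled contexts never expire.

```python
def reuse_outcome(idle_seconds, firewall_idle_timeout, pool_timeout):
    """Classify what happens when a pooled context is requested after idling.

    All values are in seconds. pool_timeout == 0 models the default,
    where pooled contexts are assumed never to expire.
    """
    # The pool discards the context first if its own timeout has elapsed,
    # so the next request opens a fresh connection.
    if pool_timeout > 0 and idle_seconds >= pool_timeout:
        return "new connection"
    # Otherwise the pooled context is handed out again. If the firewall has
    # already dropped the idle socket, the caller waits on a dead connection.
    if idle_seconds >= firewall_idle_timeout:
        return "stale context reused (hang)"
    return "live context reused"


if __name__ == "__main__":
    # One hour idle, firewall drops idle connections after 30 minutes.
    print(reuse_outcome(3600, 1800, 0))    # default pool timeout
    print(reuse_outcome(3600, 1800, 240))  # pool timeout of 4 minutes
```

In this model the hang is only reachable when the pool timeout is 0 or is not lower than the firewall's idle timeout.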

Diagnosing The Problem

Most likely the first observation is that the application hangs or freezes, and eventually hung threads are observed in the SystemOut.log.

A Javacore file or the logs might reveal threads similar to the example, where the WIM adapter is calling LDAP search or reply operations.

The threads might be in a wait state or might be directly calling readReply(), or something similar to the example thread stack:

EXAMPLE THREAD STACK

At java/lang/Object.wait(Native Method)
At java/lang/Object.wait(Object.java:231(Compiled Code))
At com/sun/jndi/ldap/Connection.readReply(Connection.java:446)
At com/sun/jndi/ldap/LdapClient.getSearchReply(LdapClient.java:626)
At com/sun/jndi/ldap/LdapClient.search(LdapClient.java:549)
At com/sun/jndi/ldap/LdapCtx.doSearch(LdapCtx.java:1959)
At com/sun/jndi/ldap/LdapCtx.searchAux(LdapCtx.java:1821)
At com/sun/jndi/ldap/LdapCtx.c_search(LdapCtx.java:1746)
At com/sun/jndi/toolkit/ctx/ComponentDirContext.p_search(ComponentDirContext.java:383)
At com/sun/jndi/toolkit/ctx/PartialCompositeDirContext.search(PartialCompositeDirContext.java:353)
At javax/naming/directory/InitialDirContext.search(InitialDirContext.java:268)
At com/ibm/ws/wim/adapter/ldap/LdapConnection.search(LdapConnection.java:2512)
At com/ibm/ws/wim/adapter/ldap/LdapConnection.checkSearchCache(LdapConnection.java:2473)
At com/ibm/ws/wim/adapter/ldap/LdapConnection.search(LdapConnection.java:2653)
At com/ibm/ws/wim/adapter/ldap/LdapConnection.searchEntities(LdapConnection.java:2797)
At com/ibm/ws/wim/adapter/ldap/LdapAdapter.login(LdapAdapter.java:2572)
At com/ibm/ws/wim/ProfileManager.loginImpl(ProfileManager.java:3329)
At com/ibm/ws/wim/ProfileManager.genericProfileManagerMethod(ProfileManager.java:273)
At com/ibm/ws/wim/ProfileManager.login(ProfileManager.java:377)
At com/ibm/websphere/wim/ServiceProvider.login(ServiceProvider.java:482)
At com/ibm/ws/wim/registry/util/LoginBridge.checkPassword(LoginBridge.java:169)
At com/ibm/ws/wim/registry/WIMUserRegistry$1.run(WIMUserRegistry.java:173)
At com/ibm/ws/security/auth/ContextManagerImpl.runAs(ContextManagerImpl.java:4132)
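
When many thread stacks need to be checked, a small script can flag the ones that match this pattern. The following is a minimal sketch, assuming each stack has already been extracted as text with one frame per line in the form shown above. The helper names frames and looks_like_ldap_reply_wait are hypothetical, and the two frame prefixes are taken from the example stack.

```python
import re

# Frame names taken from the example thread stack.
LDAP_WAIT_FRAME = "com/sun/jndi/ldap/Connection.readReply"
WIM_ADAPTER_PREFIX = "com/ibm/ws/wim/adapter/ldap/"

# Matches lines such as "At com/ibm/ws/wim/ProfileManager.login(ProfileManager.java:377)".
FRAME_PATTERN = re.compile(r"^\s*at\s+([\w/$.]+)\(", re.IGNORECASE)


def frames(stack_text):
    """Return the method names of all lines that look like stack frames."""
    found = []
    for line in stack_text.splitlines():
        match = FRAME_PATTERN.match(line)
        if match:
            found.append(match.group(1))
    return found


def looks_like_ldap_reply_wait(stack_text):
    """True if the stack waits on an LDAP reply on behalf of the WIM adapter."""
    names = frames(stack_text)
    waiting = any(name.startswith(LDAP_WAIT_FRAME) for name in names)
    from_wim = any(name.startswith(WIM_ADAPTER_PREFIX) for name in names)
    return waiting and from_wim


SAMPLE = """\
At java/lang/Object.wait(Native Method)
At com/sun/jndi/ldap/Connection.readReply(Connection.java:446)
At com/sun/jndi/ldap/LdapClient.search(LdapClient.java:549)
At com/ibm/ws/wim/adapter/ldap/LdapConnection.search(LdapConnection.java:2512)
At com/ibm/ws/wim/adapter/ldap/LdapAdapter.login(LdapAdapter.java:2572)
"""

if __name__ == "__main__":
    print(looks_like_ldap_reply_wait(SAMPLE))
```

A match only indicates that a thread is waiting on an LDAP reply. It does not by itself prove that a firewall closed the connection.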


Even with the default timeout setting of 0, the backend resource (usually a firewall) can close the connection, typically because of an idle timeout. The appserver is unaware of the closure and continues to wait.

By default, LDAP on WebSphere Application Server (and WebSphere Liberty) is configured to use context pooling with no timeout configured. All LDAP requests share a pool of open connections, and if those connections sit idle for a long time and are then accessed, hung threads might be observed.

The LDAP request can be retried again and again (with longer retry intervals) until the unresponsive connection is closed. It might time out, depending on the situation, and in some cases it might force a new connection (which can take upwards of 15 minutes). New connections succeed through the firewall without issues.

Resolving The Problem

While a restart of the application server can repair the issue temporarily (that is, until the next idle timeout), the long-term solution is to configure the "Context pool times out" setting. These steps apply to both WebSphere Application Server and WebSphere Liberty.

SET "CONTEXT POOL TIMES OUT"

WebSphere Application Server traditional

  • From the WebSphere Application Server admin console, click Security > Global Security > click Configure... (with Federated Repository selected) > Manage Repositories > Repository Name > Performance
  • Set the "Context Pool Times Out" setting
    • The timeout setting "Context pool times out" is located inside the Context Pool section. The value needs to be set lower than the idle timeout on a firewall (or backend server) to be effective.

      Do not set the timeout too small, as that would defeat the purpose of even using a context pool to reuse already available connections.

  • Press OK or Apply

  • Save the changes, and restart the appserver instances

WebSphere Liberty

  • Edit the server.xml
    • Add a subelement named <contextPool> to an already defined <ldapRegistry> and configure the timeout attribute.  The value must be smaller than the firewall (or backend instance's) idle timeout to be effective.  Do not set the timeout too small, as that would defeat the purpose of using a context pool. 
      • Using the default values for every attribute except the timeout (which is set to 4 minutes instead of 0), the configuration can resemble this mock example:
        <ldapRegistry ... ... ...>
           <contextPool enabled="true" initialSize="1" maxSize="0" preferredSize="3" timeout="4m" waitTime="3s"/>
        </ldapRegistry>
    • Alternatively, it is acceptable to set only the timeout attribute on the contextPool element and omit the others.
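
If the same change has to be applied to several server.xml files, the edit can be scripted. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, not an IBM-provided tool. The function name set_context_pool_timeout is hypothetical, and the id, host, and port values in the sample are placeholders.

```python
import xml.etree.ElementTree as ET


def set_context_pool_timeout(server_xml, timeout):
    """Return server.xml text with contextPool timeout set on every ldapRegistry."""
    root = ET.fromstring(server_xml)
    for registry in root.iter("ldapRegistry"):
        pool = registry.find("contextPool")
        if pool is None:
            # No contextPool subelement yet: add one carrying only the timeout.
            pool = ET.SubElement(registry, "contextPool")
        pool.set("timeout", timeout)
    return ET.tostring(root, encoding="unicode")


# Placeholder configuration, for illustration only.
SAMPLE = '<server><ldapRegistry id="ldap" host="ldap.example.com" port="389"/></server>'

if __name__ == "__main__":
    print(set_context_pool_timeout(SAMPLE, "4m"))
```

ElementTree's default parser drops XML comments, so it is safer to run a script like this against a copy of server.xml and review the result before replacing the original.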

DISABLE CONTEXT POOLING (recommended for testing only)

Disabling the context pool is not a recommended solution, because it disables pooling completely: every new connection to the LDAP server generates a brand new socket, which can impact performance over the long term.
However, the procedure can be attempted in a test environment to help determine whether context pooling is a factor in the problems. 

WebSphere Application Server traditional

  • From the WebSphere Application Server admin console, click Security > Global Security > click Configure... (with Federated Repository selected) > Manage Repositories > Repository Name > Performance
  • Disable Context Pooling by removing the check for Enable Context Pool.
  • Press OK or Apply
  • Save the changes, and restart the appserver instances

WebSphere Liberty

  • Edit the server.xml
    • Adjust the contextPool element to disable context pooling by adding the attribute enabled="false".
    • An example ldapRegistry element is provided here, where enabled="false" is added to the <contextPool> element.
        Other contextPool attributes might be present, but they are not shown and are not required.

      <ldapRegistry ... ... ...>
         <contextPool enabled="false"/>
      </ldapRegistry>

Frequently Asked Questions (FAQ)

* Can disabling the context pool result in performance issues?

The idea of using a context pool for the LDAP connections is to allow the appserver to reuse the connections it already created, thus conserving memory resources because fewer individual connections are opened. Disabling it might cause many more connections to be opened, temporarily. Each login attempt, for example, would then use a connection a single time and then discard it.

Testing user load with the configuration changes would be one way to determine whether the new configuration has any performance effect on the environment.

* Why do I need to configure a "Context Pool Times Out" timeout setting to resolve the issue?  Doesn't the timeout setting indicate how long the login can hang for?

Because the connections are being reused, setting the timeout allows WebSphere Application Server (or WebSphere Liberty) to automatically discard a connection after a set amount of time. The setting controls how long a pooled connection is kept, not how long a login can hang. The idea is to discard the connection before the firewall's idle timeout triggers, thus preventing the hangs and freezing before they can occur.

* But I thought the hangs were caused by WebSphere Application Server being unable to connect to the LDAP server?

The inquiry is partially correct. The connections to the LDAP server are closed by the firewall, but WebSphere Application Server is unaware of the closure. The appserver then waits until all retry attempts are exhausted before it closes the connection. One might find that the issue is resolved 10 - 15 minutes after the initial hangs are observed, as if it fixed itself. At that point, the appserver has closed the dead connection and opened a brand new connection to the LDAP server.

* I'm ready to set the timeout value for the context pool, what do I set it to?

Set it lower than the firewall's idle timeout, and work with your network administrator to find the appropriate value. The lower the value, the sooner unused connections are discarded.
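
As a starting point for that conversation, the two constraints (lower than the firewall's idle timeout, but not too small) can be expressed as a small calculation. This is only a sketch: the function name suggest_pool_timeout, the default fraction of 0.5, and the 60-second floor are arbitrary assumptions for illustration, not IBM recommendations.

```python
def suggest_pool_timeout(firewall_idle_seconds, fraction=0.5, floor_seconds=60):
    """Suggest a context pool timeout (seconds) below the firewall idle timeout.

    fraction and floor_seconds are illustrative assumptions: the result is a
    fraction of the firewall timeout, but never smaller than the floor.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    if firewall_idle_seconds <= floor_seconds:
        raise ValueError("firewall idle timeout is too short for this heuristic")
    candidate = int(firewall_idle_seconds * fraction)
    # The floor is below the firewall timeout, so the result always is too.
    return max(floor_seconds, candidate)


if __name__ == "__main__":
    # Firewall drops idle connections after 30 minutes.
    print(suggest_pool_timeout(1800))
```

The result is in seconds and needs to be converted to whatever unit the target setting expects.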


Documentation

 

WebSphere Application Server 8.5 & 8.5.5

WebSphere Application Server 9.0

WebSphere Liberty

Related Technotes

How can LDAP connection pooling be configured in WebSphere Application Server with Standalone LDAP repository (if you are not using an LDAP configuration in the admin console)

Document Information

Modified date:
23 July 2021

UID

swg21650510