WLM even distribution of HTTP requests
The z/OS® workload management (WLM) component supports distributing incoming HTTP requests without servant affinity in a round-robin manner across the servants. This functionality is intended for, but not limited to, long-lasting HTTP session objects that are maintained in memory, stateless session Enterprise JavaBeans (EJB), and the create method for stateful session enterprise beans. You can configure the product to use this functionality to spread HTTP requests among active servants that are currently bound to the same work queue as the inbound requests.
The following diagram represents one clustered server instance. The azsr01 cluster contains the azsr01a application server instance. In the application server instance is a controller, the workload manager (WLM) queue, and the servants where applications run. The controller is the HTTP and IIOP termination point. The WLM queue controls the flow of work from the controller to one of the servants. Each of the servants contains worker threads that select work from the WLM queue.
In the preceding diagram, the application server is configured to have the minimum and maximum number of servants set to three.
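To make that flow concrete, the following Java sketch models the structure in a simplified way: a shared queue stands in for the WLM queue, and three servants (matching the minimum and maximum of three) run worker threads that take requests from it. All class and method names are invented for this illustration and are not part of the product.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative model only: a controller places HTTP work on a shared
// WLM-style queue, and each servant runs worker threads that pull from it.
public class WlmQueueModel {

    static class Servant {
        private final String name;
        private final BlockingQueue<String> wlmQueue;

        Servant(String name, BlockingQueue<String> wlmQueue) {
            this.name = name;
            this.wlmQueue = wlmQueue;
        }

        // Start worker threads that repeatedly select work from the shared queue.
        void startWorkers(int threads) {
            for (int i = 0; i < threads; i++) {
                Thread worker = new Thread(() -> {
                    try {
                        while (true) {
                            String request = wlmQueue.take();
                            System.out.println(name + " processes " + request);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                worker.setDaemon(true);
                worker.start();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> wlmQueue = new LinkedBlockingQueue<>();

        // Three servants, matching the minimum and maximum of three in the diagram.
        for (int s = 1; s <= 3; s++) {
            new Servant("servant " + s, wlmQueue).startWorkers(3);
        }

        // The controller terminates HTTP and IIOP and feeds the queue.
        for (int r = 1; r <= 9; r++) {
            wlmQueue.put("request " + r);
        }
        Thread.sleep(500); // give the daemon worker threads time to drain the queue
    }
}
```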
The product supports the use of HTTP session objects in memory for application servers with multiple servants, also known as the hot servant strategy. In the following diagram, two users accessed an application in the azsr01a application server instance. User 1 established an HTTP session object in servant 3. User 2 established an HTTP session object in servant 2.
A new servant is started only when the following conditions are met:
- The configuration allows creating new servants
- The workload manager logic determines that the system can sustain an additional servant
- Adding another servant leads to reduced queue delay and allows enclaves to be completed within the specified goal
When multiple servants are bound to the same service class, WLM attempts to dispatch new requests to a hot servant, that is, a servant that recently had a request dispatched to it and still has worker threads available. If the hot servant has a backlog of work, WLM dispatches the work to another servant.
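The following sketch, with invented names rather than actual WLM code, illustrates that preference: the servant with the most recent dispatch is chosen while it still has an idle worker thread and no backlog, and otherwise another servant with capacity is chosen.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the hot servant preference; class and field names
// are invented for illustration and do not reflect the actual WLM code.
public class HotServantSketch {

    static class Servant {
        final String name;
        final int idleThreads;      // worker threads currently waiting for work
        final int queuedRequests;   // backlog already dispatched to this servant
        final long lastDispatchTime;

        Servant(String name, int idleThreads, int queuedRequests, long lastDispatchTime) {
            this.name = name;
            this.idleThreads = idleThreads;
            this.queuedRequests = queuedRequests;
            this.lastDispatchTime = lastDispatchTime;
        }
    }

    // Prefer the "hot" servant (most recent dispatch) while it has an idle
    // thread and no backlog; otherwise fall back to any servant with capacity.
    static Servant choose(List<Servant> servants) {
        Servant hot = servants.stream()
                .max(Comparator.comparingLong((Servant s) -> s.lastDispatchTime))
                .orElseThrow();
        if (hot.idleThreads > 0 && hot.queuedRequests == 0) {
            return hot;
        }
        Optional<Servant> fallback = servants.stream()
                .filter(s -> s.idleThreads > 0)
                .findFirst();
        return fallback.orElse(hot); // if no servant has capacity, the work queues up
    }

    public static void main(String[] args) {
        List<Servant> servants = List.of(
                new Servant("servant 1", 2, 0, 100L),
                new Servant("servant 2", 1, 0, 300L),   // hot: most recent dispatch
                new Servant("servant 3", 3, 0, 200L));
        System.out.println("Chosen: " + choose(servants).name); // servant 2
    }
}
```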
Normally, this hot servant strategy works well because the hot servant likely has its necessary pages in storage, its just-in-time (JIT) compiled application methods readily available, and a cache full of data for fast retrieval. However, this strategy presents a problem in the following situations:
- HTTP session objects in memory are used, causing dispatching affinities.
- The HTTP session objects last for many hours or days.
- A large number of clients have HTTP session objects that must be kept in memory.
- The loss of a session object is disruptive to the client or server and the amount of time between requests that create HTTP sessions is large.
When these conditions exist, concentrating the HTTP session objects in one or two servants can cause the following problems:
- If the application creates a large number of objects in a single servant, long garbage collection times might result.
- If all the HTTP session objects are bound to one servant, requests might be held in the queue for a long time because WLM cannot manage the work by dispatching it to another servant.
- If all HTTP session objects reside in one or two servants, a timeout in a single servant can affect a larger number of users than if the HTTP session objects are divided equally among several servants.
If your configuration experiences one of the described situations that cause a problem with the hot servant strategy, you can configure your application server to support the distribution of incoming HTTP requests across servants without servant affinity. When you enable this functionality, the application server uses a round-robin distribution of HTTP requests to the servants.
In the following example, assume that the application server is configured to use round-robin distribution of HTTP requests among the servants, and that multiple servants are started for the work queue that handles requests assigned to the same service class.
When a new HTTP request without affinity arrives on a work queue, WLM checks whether any servant has at least one worker thread waiting for work. If no servant has an available worker thread, WLM queues the request until a worker thread in any of the servants becomes available. If worker threads are available, WLM finds the servant with the smallest number of affinities. If multiple servant regions have an equal number of affinities, WLM dispatches the work to the servant region with the smaller number of busy worker threads.
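As a rough illustration of this selection, again using invented names and a simplified in-memory model rather than the real WLM data structures: a request without affinity stays queued when no servant has an idle worker thread, and otherwise goes to the eligible servant with the fewest affinities, with ties broken by the smaller number of busy worker threads.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Invented illustration of the selection rule for requests without affinity.
public class RoundRobinSelectionSketch {

    static class Servant {
        final String name;
        final int idleThreads;   // worker threads waiting for work
        final int busyThreads;   // worker threads currently processing requests
        final int affinities;    // HTTP session objects bound to this servant

        Servant(String name, int idleThreads, int busyThreads, int affinities) {
            this.name = name;
            this.idleThreads = idleThreads;
            this.busyThreads = busyThreads;
            this.affinities = affinities;
        }
    }

    // Returns the chosen servant, or empty if the request must stay queued
    // until a worker thread in any servant becomes available.
    static Optional<Servant> select(List<Servant> servants) {
        return servants.stream()
                .filter(s -> s.idleThreads > 0)
                .min(Comparator.comparingInt((Servant s) -> s.affinities)
                        .thenComparingInt(s -> s.busyThreads));
    }

    public static void main(String[] args) {
        List<Servant> servants = List.of(
                new Servant("servant 1", 0, 3, 1),  // no idle threads: not eligible
                new Servant("servant 2", 2, 1, 2),
                new Servant("servant 3", 1, 2, 2)); // ties on affinities, more busy threads
        select(servants).ifPresentOrElse(
                s -> System.out.println("Dispatch to " + s.name),   // servant 2
                () -> System.out.println("Queue the request"));
    }
}
```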
The goal of this algorithm is for WLM to balance the incoming requests without servant affinity among waiting servants while considering changing conditions. The algorithm does not blindly assign requests to servants in a true round-robin manner. The following diagram shows the balanced distribution of HTTP session objects across servants.
This distribution mechanism works for all inbound requests without affinity. After the HTTP session object is created, all the client requests are directed to that servant until the HTTP session object is removed.
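A minimal sketch of that behavior, with hypothetical names: requests that carry a session identifier are routed back to the servant that owns the session, and all other requests go through the balanced selection until the session object is removed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical routing sketch: session affinity overrides the balanced selection.
public class AffinityRoutingSketch {

    // Maps an HTTP session id to the servant holding that session in memory.
    private final Map<String, String> sessionToServant = new HashMap<>();

    // Route a request: honor an existing affinity, otherwise use the servant
    // chosen by the balanced (round-robin style) selection.
    String route(Optional<String> sessionId, String balancedChoice) {
        return sessionId.map(sessionToServant::get)
                .orElse(balancedChoice);
    }

    // Called when an application creates an HTTP session object in a servant.
    void sessionCreated(String sessionId, String servant) {
        sessionToServant.put(sessionId, servant);
    }

    // Called when the HTTP session object is removed or times out.
    void sessionRemoved(String sessionId) {
        sessionToServant.remove(sessionId);
    }

    public static void main(String[] args) {
        AffinityRoutingSketch router = new AffinityRoutingSketch();
        router.sessionCreated("user1-session", "servant 3");

        // A request with affinity always returns to the owning servant.
        System.out.println(router.route(Optional.of("user1-session"), "servant 1")); // servant 3

        // A request without affinity follows the balanced selection.
        System.out.println(router.route(Optional.empty(), "servant 1"));             // servant 1
    }
}
```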
If you decide to enable the distribution of incoming HTTP requests without servant affinity, you might need to make some changes to your classification mapping file. If your classification mapping file specifies more than one transaction class on a mapping rule to approximate round-robin behavior, remove that section from the file because the product now provides managed round-robin support.