CPU scaling study
This study shows how the workload scales when the workload submission rate is increased along with the number of available dedicated Central Processing Units (CPUs).
Introduction to CPU Scaling
CPU scaling is a measure of how much workload can be driven when CPU resources are increased. An increase in workload can occur when the total number of transactions or the transaction rate is increased. For this workload, the workload submission rate (the rate at which work is submitted to the J2EE middleware layer) has to be increased. However, increased workload in this study also requires a larger database, which means that not only the workload but the whole environment must be scaled. Scaling the whole environment might affect performance in ways beyond simply doing more work with the same data.
Maximizing CPU utilization
To determine the performance characteristics of the workload, measurements are taken using one, two, four, and eight dedicated CPUs on the WebSphere® system. A workload entry rate is chosen that is high enough to drive the CPUs to near full utilization. The results provide a better understanding of the scalability of the workload, and a way to measure differences in the performance of the same workload on 64-bit WebSphere versus 31-bit WebSphere.
In all 64-bit WebSphere measurements, the JVM heap size is set to 75% of the 8 GB of available memory. This is the optimum percentage derived from the study Heapsize for the 64-bit Java Virtual Machine, and works out to 64-bit WebSphere JVM heap settings of -Xms6144m -Xmx6144m. A memory size of 8 GB is also configured for the DB2® LPAR, which runs with four configured CPUs for all of the tests.
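As a worked example of the 75% rule (an illustrative sketch, not tooling from the study; the class and method names are hypothetical):

```java
// Illustrative sketch (not the study's actual tooling): derive fixed JVM
// heap flags as 75% of the memory available to the LPAR.
public class HeapSizing {
    // Returns matching -Xms/-Xmx flags for a heap of 75% of memoryMb.
    static String heapFlags(int memoryMb) {
        int heapMb = memoryMb * 75 / 100;   // 75% of available memory
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m";
    }

    public static void main(String[] args) {
        // 8 GB configured for the WebSphere LPAR in this study
        System.out.println(heapFlags(8 * 1024));  // -Xms6144m -Xmx6144m
    }
}
```

Setting -Xms equal to -Xmx, as in the study, fixes the heap at its maximum size and avoids resizing during the measurement runs.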
10 Gb Ethernet chosen for highest workload
The workload submission rate of 600 was found to exceed the capacity of the 1 Gb Ethernet network, causing network saturation that limits throughput and generates additional error-handling work. The tests at a submission rate of 600 are therefore run over 10 Gb Ethernet, to remove the effects of a network bottleneck from the results. A submission rate higher than 600 would have required more extensive restructuring of the environment, because of the higher resource usage from the clients through WebSphere to the database; this would have exceeded the scope of the study.
CPU Scaling
Dedicated CPUs are assigned to the WebSphere system being tested. The experiments use one, two, four, or eight dedicated CPUs. The workload is then varied, by changing the workload submission rate, until a CPU utilization close to or greater than 90% is observed. When eight CPUs are dedicated to the WebSphere image, only approximately 80% total CPU utilization is observed at a workload submission rate of 600. This is because, as explained in 10 Gb Ethernet chosen for highest workload, different WebSphere or client tuning values would have been needed for a submission rate greater than 600.
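The calibration procedure described above can be sketched as a simple search loop. This is a hypothetical illustration, not the study's actual driver; the measurement function is a stand-in for a real workload run:

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of the calibration loop described above: raise the
// workload submission rate until CPU utilization approaches the target.
public class Calibrate {
    // Returns the first rate (stepped by `step`) at which utilization
    // reaches the target, or maxRate if it never does.
    static int findRate(IntToDoubleFunction measureUtilization,
                        int startRate, int step, int maxRate, double target) {
        for (int rate = startRate; rate <= maxRate; rate += step) {
            if (measureUtilization.applyAsDouble(rate) >= target) {
                return rate;
            }
        }
        return maxRate;
    }

    public static void main(String[] args) {
        // Toy model (assumption): utilization grows linearly with the rate.
        IntToDoubleFunction model = rate -> rate / 125.0;
        System.out.println(findRate(model, 50, 10, 600, 0.90));
    }
}
```

Capping the search at a maximum rate mirrors the study, where rates above 600 were out of scope and the eight-CPU run therefore stops short of the 90% target.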
Transaction scaling and response time measurements
The transaction rate is the throughput as reported by the client-side summary reports. The measured response time is the observed response time of a simulated Web operation (such as an online Web purchase or a Web browse operation): the turnaround of the Web request from the WebSphere system after the completion of some business logic. These response times are averaged with response times for manufacturing operations. The performance of the DB2 subsystem is also reflected in this data. Table 1 summarizes these results, and Figure 1, Figure 2, and Figure 3 show them graphically.
With this workload, the throughput measurements stay fairly constant and performance degradations are first indicated by increasing response times and CPU utilization.
Table 1. CPU scaling results for 31-bit and 64-bit WebSphere

| 31-bit or 64-bit JVM | Workload submission rate | Workload throughput | Number of CPUs | CPU utilization | Response time (ms) |
|---|---|---|---|---|---|
| 31-bit | 110 | 101% | 1 | 87% | 432 |
| 64-bit | 110 | 100% | 1 | 97% | 793 |
| 31-bit | 190 | 174% | 2 | 175% | 385 |
| 64-bit | 190 | 174% | 2 | 189% | 654 |
| 31-bit | 350 | 322% | 4 | 353% | 361 |
| 64-bit | 350 | 321% | 4 | 371% | 404 |
| 31-bit | 600 | 551% | 8 | 612% | 562 |
| 64-bit | 600 | 550% | 8 | 641% | 518 |
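The linearity of these figures can be checked by comparing the measured throughput with the throughput expected if it scaled perfectly with the submission rate (taking the rate of 110 as the 100% baseline). This is an illustrative post-processing sketch, not part of the study:

```java
// Illustrative sketch (not part of the study): check the 64-bit throughput
// figures from Table 1 against the throughput expected if it scaled
// perfectly with the submission rate (rate 110 = 100% baseline).
public class ScalingCheck {
    // Expected throughput (%) if throughput tracks the submission rate.
    static double expected(int rate) {
        return rate / 110.0 * 100.0;
    }

    public static void main(String[] args) {
        int[] rate = {110, 190, 350, 600};
        double[] measured = {100, 174, 321, 550}; // 64-bit rows of Table 1
        for (int i = 0; i < rate.length; i++) {
            System.out.printf("rate %d: measured %.0f%%, expected %.1f%%%n",
                    rate[i], measured[i], expected(rate[i]));
        }
    }
}
```

At a rate of 600, for example, perfect scaling predicts roughly 545% throughput, and the 64-bit measurement of 550% is within a few percent of that, which is what the Observations section calls nearly linear scaling.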
Observations
The workload scales nearly linearly for both the 31-bit and the 64-bit WebSphere Application Server. The 31-bit version requires slightly less CPU at higher workloads than the 64-bit version. CPU utilization also scales nearly linearly for both.
The response time shows unexpected behavior: it becomes shorter at the higher workloads run on larger numbers of CPUs, and then increases again on the last scaling step with the highest workload. Here, the 31-bit WebSphere Application Server differs significantly from the 64-bit WebSphere Application Server; its response time with one CPU is much shorter, but the gap narrows with each scaling step. At a submission rate of 600 with eight CPUs, the 64-bit WebSphere Application Server's response time becomes the shorter of the two.
Conclusions
The nearly linear scaling in throughput and CPU utilization makes scaling this workload easy for a system administrator. The difference between 31-bit WebSphere and 64-bit WebSphere is small. The more efficient garbage collection of the 64-bit version seems to compensate for the drawback of the larger memory addresses, as seen in other studies. See https://download.boulder.ibm.com/ibmdl/pub/software/dw/linux390/perf/ZSW03030-USEN-00.pdf.
At higher workload submission rates, the advantages of a larger heap on 64-bit WebSphere result in an improving response time curve. A more detailed analysis of garbage collection can be found in Comparing 64-bit WebSphere versus 31-bit WebSphere.
A CPU utilization greater than 90% was not observed for the higher workloads, indicating an unidentified bottleneck, which might be the HiperSockets connection between the WebSphere Application Server and the database. Additional investigation would be required to determine the cause of this bottleneck. The high CPU utilization of 97% in the one-CPU run with 64-bit WebSphere becomes critical for a system running with HiperSockets, and is very likely the reason for the high response times there.
Large heaps provide more space for both long-lived and newly-created objects. It seems that the Generational Concurrent garbage collection option works very efficiently, even for this workload, which was designed to have a high load and resource utilization on the WebSphere Application Server.