IBM Support

AIX Workload Manager

How To


Summary

The AIX Workload Manager (WLM) feature groups running processes into classes to monitor workloads and optionally control them.
WLM is a powerful tool for the system administrator and is simple to set up.

Objective


Environment

The AIX Workload Manager (WLM) feature dates back to AIX version 5.1 and approximately the year 2000. WLM is simple to operate, yet it is a powerful tool for monitoring performance statistics and controlling performance, and it is useful to every system administrator. Processes are grouped into what are called "classes" in a number of ways. Each class can be monitored for CPU time, memory use, and block I/O. Classes can then be allocated resources by using shares (like percentages but much simpler to use), which the AIX kernel uses to boost or limit the performance of the various classes.
Do not confuse:
  • Workload Manager (WLM), which is covered here and is a core part of AIX
  • Workload Partition Manager (WPAR Manager), an entirely different tool that is no longer available.
However, if you are using Workload Partitions (WPAR), that technology automatically turns on Workload Manager and creates WLM classes so that it can monitor the WPARs.

Steps

This article covers the basics only, as the details can be found in the following two places:
The AIX Workload Manager Redbook
WLM Redbook cover
Highlights:
  • Published in June 2001
  • 350 pages
  • Chapters: The need for WLM, functions, administration, performance tools, manual assignment, application programming interface, sizing with WLM, practical experience, and then a series of appendixes.
  • Many famous AIX technical people are named in the writers and contributors list. Amazingly, my name is not among them, although I covered WLM at technical conferences shortly after the book was released.
  • Download: https://www.redbooks.ibm.com/redbooks/pdfs/sg245977.pdf
AIX Workload Manager in the AIX manuals
WLM Manual page mini
Highlights:
  • From the AIX development team
  • Lots of chapters and details. See the picture or go to the website
  • There appear to be no differences between the manual pages for AIX version 7.1, 7.2, and 7.3.
  • Download: https://www.ibm.com/docs/en/aix/7.3?topic=management-workload-manager
  • You can also find the manual pages for the WLM commands: wlmcntrl, wlmassign, wlmcheck, and wlmstat
There is a video on Workload Manager by this author:
AIX Workload Manager The Basics
Imagine a time long ago, before Power workload partitions (WPAR), virtual machines (VM), and containers (Docker and so on). Information technology departments ran much bigger UNIX servers with large numbers of applications on a single operating system. Now these workloads are split out across servers.
Large servers or virtual machines still have many workloads, for example, a simple RDBMS might have:
  • The database itself,
  • Incoming data transfer tools,
  • Data transformation tasks,
  • Batch processing at night,
  • User tasks logging in remotely,
  • Backup,
  • And many more.
So, both in the old days and now, we run many services at the same time on a single operating system instance.
A graph of too many processes is a total complicated mess
Too many processes makes graphs pointless
What we need is to group these processes into workloads; then the graphs show which workloads are the CPU, memory, and block I/O hogs over time. Just for illustration purposes, grouping processes into classes simplifies the graph, and suddenly workload information and trends emerge:
WLM shows trends
For illustration purposes, let us focus on CPU time.
Install WLM - it is part of AIX and already installed. Nothing to do here.
Switch WLM into active mode. Active means monitoring and, if you assign shares, possibly controlling class resources:
wlmcntrl -a
Alternatively, start in passive mode (monitor only and no possibility of changing the behavior):
wlmcntrl -p
Create four new classes for the RDBMS, Batch, Backup, and remote user connection applications:
A simple edit to the text file as the root user:  /etc/wlm/current/classes
Name the class and specify how AIX can decide which processes are in which class. A simple way is to name the AIX user that starts the workload programs. Alternatives are to use user groups or the directories that contain the binary program files. Once a process is in a class, the default setting is that all processes it creates are placed in the same class.
Tell AIX WLM to reevaluate ("u" for update) the classes file:
wlmcntrl -u
Monitor the WLM statistics with class-aware performance tools
Simple first - wlmstat
$ wlmstat
         CLASS    CPU    MEM   DKIO 
  Unclassified   0.00  22.14   0.00 
     Unmanaged   0.00  11.49   0.00 
       Default   0.00   0.81   0.00 
        Shared   0.00   0.79   0.00 
        System   0.01   4.57   0.00 
         RDBMS   0.00   0.00   0.00 
         Batch   0.00   0.00   0.00 
        Backup   0.00   0.00   0.00 
         Users   0.00   0.00   0.00 
         TOTAL   0.01  28.31   0.00 
You can repeat the output at regular intervals, in the same way as with vmstat or iostat.
For output every three seconds, repeated 100 times, use:
wlmstat 3 100
While wlmstat is useful for a quick look, it is not good for long-term data collection nor for creating graphs as the output is in an awkward text format.
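Because wlmstat prints fixed-width text, a small filter can turn it into something easier to collect or graph. The following is a minimal sketch, not part of WLM: the function name wlmstat_to_csv is made up for this example, and it assumes the four-column layout shown above. A few sample lines stand in for real wlmstat output so that the example also runs on a machine without WLM.

```shell
#!/bin/sh
# Convert wlmstat-style text into CSV lines of the form: class,cpu,mem,dkio
# Assumption: the four-column layout shown in this article. Other wlmstat
# options print different columns and would need a different parser.
wlmstat_to_csv() {
    awk '
        $1 == "CLASS" { next }   # skip the header line
        $1 == "TOTAL" { next }   # skip the summary line
        NF == 4       { printf "%s,%s,%s,%s\n", $1, $2, $3, $4 }
    '
}

# Demonstration with lines copied from the sample output above.
sample_csv=$(wlmstat_to_csv <<'EOF'
         CLASS    CPU    MEM   DKIO
  Unclassified   0.00  22.14   0.00
        System   0.01   4.57   0.00
         RDBMS   0.00   0.00   0.00
         TOTAL   0.01  28.31   0.00
EOF
)
echo "$sample_csv"
```

On AIX, the same function could be fed from the live command, for example: wlmstat | wlmstat_to_csv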
Online, monitor WLM CPU, memory, and I/O with nmon (type "W" to toggle the WLM stats):
WLM in nmon
Simple post-mortem graphing
The AIX command nmon can save the data in a machine-usable comma-separated values format.
To collect Workload Manager stats to a file, add the -W option. For example, to take a snapshot every 60 seconds, 1440 times (24 hours in total):
nmon -f -s 60 -c 1440 -W
The Excel-based nmon Analyser spreadsheet can graph the statistics after the data collection finishes.
Advanced: live statistics monitoring, with instant changes to the graphing period
My replacement tool for nmon is called njmon.
From version 80, njmon collects Workload Manager statistics, and the data can be stored in various time series databases. My preferred database is InfluxDB 1.8: it is easy to get started with, the free open source version makes a great start, and it works along with the Grafana tool as a beautiful and flexible graphing system.
First, see how WLM was set up . . .
In this case, I used a further way to get certain processes into a class.  The method uses the wlmassign command to force a running process into a particular class.
I added a class named "applications" to the /etc/wlm/current/classes file by adding the following at the end of the file:
applications:
        inheritance = "yes"
        localshm = "yes"
        adminuser = "nag"
Key:
  • "applications" is the name of the class
  • inheritance = "yes"
    Means any process that creates a subprocess has the new process added to the same class.
  • localshm = "yes"
    Means process shared memory is counted as part of the class.
  • adminuser ="nag"
    Means the user "nag" (that is my username) can add processes to this class.  This option stops other users placing their processes in other people's classes to avoid detection.
For automatically adding processes to a class, the rules are specified in /etc/wlm/current/rules
This rules file is covered later in this article.
I then use a simple ksh script that uses the ps command to find all processes called ncpu (which spins in a loop on the CPUs), extracts the process IDs, and then uses wlmassign. If the ncpu processes have IDs 12345 and 67890:
wlmassign applications 12345
wlmassign applications 67890

- or - 

wlmassign applications 12345,67890
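The ksh script described above is not reproduced in this article. A minimal sketch of the same idea might look like the following. It rests on some assumptions: the function names pids_for_command and assign_by_name are made up for this example, the PID list is passed to wlmassign in the comma-separated form shown above, and on a system without wlmassign the command is only printed instead of run.

```shell
#!/bin/sh
# Print a comma-separated list of PIDs whose command name is exactly $1.
# Reads "pid command" pairs on standard input, which is the format
# produced by: ps -e -o pid= -o comm=
pids_for_command() {
    awk -v name="$1" '
        $2 == name { list = (list == "" ? $1 : list "," $1) }
        END        { print list }
    '
}

# Assign every process named $2 to the WLM class $1 (hypothetical helper).
# If wlmassign is not installed (not AIX), only print the command.
assign_by_name() {
    class=$1
    pids=$(ps -e -o pid= -o comm= | pids_for_command "$2")
    if [ -z "$pids" ]; then
        echo "no $2 processes found" >&2
        return 1
    fi
    if type wlmassign >/dev/null 2>&1; then
        wlmassign "$class" "$pids"
    else
        echo "wlmassign $class $pids"
    fi
}

# Demonstration with the PIDs used in the text above:
printf '12345 ncpu\n222 ksh\n67890 ncpu\n' | pids_for_command ncpu
```

With the class defined earlier, the call would be: assign_by_name applications ncpu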
The wlmassign command can be used as follows to force a batch job into a class of its own (batch jobs are often started by a shell script):
# Start 3 am batch
/usr/lbin/batchrunner mode6 all phase8
# batch process id = $$
wlmassign batch8 $$
The njmon command then collects the Workload Manager statistics and hundreds of other statistics.
Here are the Grafana graphs and settings covering a single user-defined class, "applications"; the rest are the standard built-in WLM classes:
WLM graphs from njmon InfluxDB and the Grafana

In the graphs there are:

  1. CPU Consumed
    • Showing the CPU utilisation percentages for each class. 
  2. CPU Physical Consumed 
    • Showing the number of actual CPU cores being used.
  3. Memory Consumed
    • Showing the memory-used percentages.
  4. Block I/O (BIO) 
    • This is disk I/O, shown as a percentage of the disk device throughput being used.
  5. Shares per class
    • Used to limit the resources (CPU, memory, and BIO). A value of -1 means the resource control is off.
  6. Desired per class
    • The WLM calculated target for resource control.
  7. Processes by class
    • The number of processes in the class.  A useful check that the class definitions are correct.
  8. State by class
    • Shows the status, like Active or not.

Notes on Classes

There are two useful standard WLM special classes:

System classes:

  • Operating system kernel internal processes get automatically added to the System class.
  • Allows managing system resource usage but it is not recommended to actively manage this class.
  • The UNIX CPU utilisation statistics are: User, System, Idle, and Wait for I/O. The System class matches the System Utilisation statistics.
  • Note that there are a few system classes; see the AIX Workload Manager Redbook for the full details.

Default:

  • All user processes that are not classified get added to Default.
  • Allows managing unclassified jobs.
  • Use Default for small bits and pieces.
  • If this number becomes large, it is recommended that you investigate why and create a class for these unexpected processes. For example, some shell program might be pointlessly spinning in a loop.

Shared, Unclassified, and Unmanaged Classes

  • Shared covers shared memory pages that are not assigned to other classes.
  • Unclassified and Unmanaged - see the AIX Workload Manager Redbook section 2.2 for details.

System administrators can create up to 270 more classes.

 
Assigning processes to classes is controlled by the configuration file /etc/wlm/current/rules.
The chart shows that the class of a process can be determined by one of several methods:
  • The user ID that started the process
  • The user group ID of the user that started the process
  • The program name in the file system used to start the process
Also, there is:
  • Inheritance: a subprocess created with the fork and exec system calls is placed in the class of the process that started it
  • A user forcing a process into a class by using the AIX wlmassign command
Defining classes

Legend:

  • "*" = wildcard, for example, /usr/local/bin/*
  • "-" = all
  • "!" = exclusion
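
As an illustration of the legend, the following is a sketch of what assignment rules for the four classes created earlier might look like. It assumes the seven-column rule layout (class, reserved, user, group, application, type, tag) described in the AIX manual for the rules file, with rules checked from the top down and the first match winning. The user names oracle and backup, the group staff, and the path /usr/local/bin/* are hypothetical, so check the details against the manual before use.

```
* class  resvd  user    group  application       type  tag
System   -      root    -      -                 -     -
RDBMS    -      oracle  -      -                 -     -
Batch    -      !root   -      /usr/local/bin/*  -     -
Backup   -      backup  -      -                 -     -
Users    -      -       staff  -                 -     -
Default  -      -       -      -                 -     -
```

In this sketch, "-" leaves a column unrestricted, "!root" excludes the root user, and the "*" wildcard matches any program started from /usr/local/bin.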

After the classes are defined, start WLM or tell it to reapply the settings with: wlmcntrl -a

The following diagram shows that CPU, memory, and disk I/O are monitored for each class. The WLM tool does not consume many CPU cycles, but it is useful for understanding the workloads.

Not just CPU but memory and I/O
Notes on Shares
Once the workloads are understood, there is an opportunity to adjust priorities. Perhaps some classes are vital to the server and should get more resources, or other classes are less important and should yield resources to boost the performance of other classes. This control is managed by assigning shares to classes for all three resources: CPU, memory, and disk I/O. Most people want to assign percentages, but if you make many changes, adjusting the numbers so that they add up to 100 becomes tedious. Shares are a far better method. If all the shares add up to 100, then they are percentages.

Shares Example: Adding a Class

WLM Shares are simpler than percentages

Example of share calculations:

  • There are three classes but a fourth class (blue) is required.
  • With percentages, the values of all the other classes must be adjusted.

Note:

  • Before, a share is 1/14th = 7.14%
  • After, a share is 1/17th = 5.88%
  • Let the computer do the maths.
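
The share arithmetic above can be sketched in a few lines of shell. This is only an illustration, not something WLM needs you to run: the function name share_percentages is made up, and the share values for the three existing classes (7, 5, and 2, totalling 14) are hypothetical numbers chosen to match the 1/14th and 1/17th figures; only the blue class adding 3 shares follows from the note.

```shell
#!/bin/sh
# Print the percentage of a resource each class is entitled to,
# given a list of name=shares pairs (hypothetical helper).
share_percentages() {
    for pair in "$@"; do echo "$pair"; done | awk -F= '
        { name[NR] = $1; shares[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %.2f%%\n", name[i], 100 * shares[i] / total
        }
    '
}

echo "Before (14 shares in total):"
share_percentages red=7 green=5 yellow=2

echo "After adding blue (17 shares in total):"
share_percentages red=7 green=5 yellow=2 blue=3
```

Adding the blue class changes only one entry in the list; the percentages of the other classes are recalculated automatically, which is the point of using shares.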
Example of controlling workloads to reduce the virtual machine size:
WLM to reduce VM size
WLM can shrink the CPU resources for classes with low numbers of shares during the peaks. The trick is, effectively, postponing the low-priority classes until after the peak.
Notes from WLM Experience
  1. WLM is quick and simple, yielding a massive boost in understanding of the workloads = "no-brainer".
  2. WLM does not affect the performance or behavior of workloads.
  3. Aim for five to ten classes for helpful, clear graphs.
  4. WLM will quickly identify changes in workload over time and rogue processes that will need urgent investigation.
  5. Many system administrators monitor with WLM and stop there.
  6. To start proactively controlling with shares, you need a clear objective:
    • Ensure that high-priority classes get everything they need. For example, boosting the RDBMS.
    • Ensure that low-priority classes get resources after the high-priority classes have everything they want. For example, running the data load class when nothing else needs the CPU.
    • Use soft limits only.
    • Reserve hard limits only for classes where soft limits do not work.
  7. Add shares to one class at a time. Changing shares is live and immediate: edit the configuration file and then run: wlmcntrl -a
  8. Many system administrators stop at just using WLM control for CPU resources.
  9. Do not control memory until controlling CPU with shares is working well.
  10. Warning: if you force a class with a low number of memory shares into paging, it can be a disaster. This paging and poor performance can happen even with unused memory in the server (virtual machine).
  11. Controlling disk I/O is rare. The only way that it can achieve the wanted results is by postponing disk I/O for classes with low numbers of shares.
The End

Additional Information


Other places to find Nigel Griffiths IBM (retired)

Document Location

Worldwide


Document Information

Modified date:
08 June 2023

UID

ibm16587951