IBM Support

AIX Memory usage: 100% used, filecache and paging

How To


Summary

How to check whether the system is memory over-committed, whether more memory is needed, or whether memory can be reduced.

Objective

A common concern is the amount of memory in use. By default AIX uses all the available memory, so it is normal to see almost 100% of real memory in use; this does not necessarily mean that the system is over-committed or that more memory is needed.

Steps

Memory in AIX is managed by the Virtual Memory Manager (VMM), which works with pages (fixed-size blocks of data).
There are fundamentally two types of pages on AIX:

  • Working storage pages (Computational memory)
  • Permanent storage pages (Non-Computational memory)

Working storage
Working storage pages are pages that contain volatile data (in other words, data that is not preserved across a reboot).
On other platforms, working storage memory is sometimes referred to as anonymous memory.
Examples of virtual memory regions that consist of working storage pages are:

  • Process data
  • Stack
  • Shared memory
  • Kernel data

When modified working storage pages need to be paged out (moved from memory to the disk), they are written to paging space;
working storage pages are never written to a file system. When a process exits, the system releases all of its private working storage pages.
Thus, the working storage pages for a process's data and stack are released when the process exits.
The working storage pages for shared memory regions are not released until the shared memory region is deleted.

Permanent storage
Permanent storage pages are pages that contain permanent data (that is, data that is preserved across a reboot).
This permanent data is file data, so permanent storage pages are simply pieces of files cached in memory.
An unmodified permanent storage page can be released without being written to the file system, since the file system contains a pristine copy of the data. When a modified permanent storage page needs to be paged out (moved from memory to disk), it is written to a file system.

For example, if an application is reading a file, the file data is cached in memory in permanent storage pages.
These permanent storage pages are unmodified, meaning that the pages were not modified in memory.
So, the in-memory permanent storage pages are equivalent to the file data on the disk. When AIX needs to free up memory, it can just "release" these pages without having to write anything to disk.
If the application wrote to the file instead of reading it, the permanent storage pages would be "modified," and AIX would have to flush the pages to disk before releasing them.

You can divide permanent storage pages into two subtypes:

  • Client pages
  • Non-client pages

Non-client pages are pages containing cached Journaled FileSystem (JFS) file data. Non-client pages are sometimes referred to as persistent pages.
Client pages are pages containing cached data for all other file systems (for example, JFS2 and Network FileSystem (NFS)).

Is my system memory over-committed? Do I need to add more memory?

Memory over-commitment is determined by the amount of Computational memory compared with the real memory available to the system.
When looking at memory usage, verify that Computational memory does not exceed the system's real memory: if it does, the system is memory over-committed, which results in paging space activity and a performance impact, and more memory has to be added.

You can use different commands to verify the allocated memory, for example:
- vmstat
    Run vmstat (no specific flags needed) and look at the "avm" column.
                             
[Screenshot: vmstat output]

    "avm" stands for "active virtual memory", which basically corresponds to Computational memory. The number is in 4KB units (memory pages), here about 42GB.
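As a rough sketch of reading "avm" programmatically, the Python below pulls one column out of plain vmstat text by locating its header name, then converts pages to GB (1GB taken as 1024^3 bytes, which matches the figures in this document). The function name is hypothetical and the sample text is made up for illustration: avm and fre come from the 8GB example later in this document and all other columns are zeroed. Real vmstat output may show additional columns depending on the system configuration; looking the column up by name tolerates that.

```python
PAGES_PER_GB = 1024 * 1024 // 4  # 262144 pages of 4KB per GB (1024**3 bytes)


def vmstat_column(output: str, name: str) -> list:
    """Return the integer values of column `name` from plain vmstat output.

    The column position comes from the first line whose fields include `name`;
    if a name repeats in the header (e.g. "sy"), the first occurrence is used.
    """
    index = None
    values = []
    for line in output.splitlines():
        fields = line.split()
        if index is None:
            if name in fields:
                index = fields.index(name)
            continue
        # Keep only data rows: separators or repeated headers are not digits.
        if len(fields) > index and fields[index].isdigit():
            values.append(int(fields[index]))
    return values


# Illustrative sample only: avm/fre from this document's 8GB example, other columns zeroed.
sample = """\
kthr    memory              page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 461462 1628290  0   0   0   0    0   0   0    0   0  0  0  0  0
"""

avm = vmstat_column(sample, "avm")[0]
print(f"avm = {avm} pages = {avm / PAGES_PER_GB:.1f}GB")  # avm = 461462 pages = 1.8GB
```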

- svmon
   "svmon -G" shows the global report, where you can compare "virtual" to "size" (units are 4KB pages).

[Screenshot: svmon -G output]
    
size    = 25165824 4KB pages = 96GB
virtual = 11281059 4KB pages = 43GB
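The arithmetic above can be sketched in Python: it converts the svmon "size" and "virtual" values (4KB pages) to GB and applies the over-commitment rule from the previous section (Computational memory larger than real memory). The function name is hypothetical, and it takes the two numbers as read from svmon -G rather than parsing svmon output.

```python
PAGES_PER_GB = 262144  # 4KB pages per GB (1024**3 bytes)


def svmon_memory_summary(size_pages: int, virtual_pages: int) -> dict:
    """Summarize svmon -G "size" (real memory) against "virtual" (Computational)."""
    return {
        "size_gb": size_pages / PAGES_PER_GB,
        "virtual_gb": virtual_pages / PAGES_PER_GB,
        "computational_pct": 100.0 * virtual_pages / size_pages,
        # Over-committed when Computational memory exceeds real memory.
        "overcommitted": virtual_pages > size_pages,
    }


summary = svmon_memory_summary(25165824, 11281059)
print(f"{summary['size_gb']:.0f}GB real, {summary['virtual_gb']:.0f}GB Computational "
      f"({summary['computational_pct']:.0f}%), over-committed: {summary['overcommitted']}")
# 96GB real, 43GB Computational (45%), over-committed: False
```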

- topas
    When run with no flags, topas shows the memory section on the right side of the output, including the percentage of Computational memory (here 41%).

[Screenshot: topas output]

Here the system is using about 42% of memory for Computational memory, so it is not over-committed: depending on system activity, the remaining 58% can be free or used for filecache (Non-Computational memory).

Note: topas output might not be as accurate as the vmstat and svmon output, since topas refers to real memory only: data swapped to the paging space is not accounted for by topas, so the total virtual memory can be higher than the real memory usage topas reports (vmstat and svmon show virtual memory, whether it is in RAM or on a paging device).

AIX uses the memory not needed for Computational pages as Non-Computational memory (filecache): it is therefore normal to see about 100% memory used while the system works fine, with no problems.

The following example shows a system with 100% memory used that is perfectly fine. Here we “cat” some large files to /dev/null, but the same applies to any cached I/O (such as backups, or application reads and writes):

[Screenshot: vmstat -I output during the I/O]

The system has 8GB of RAM; we start with Computational memory around 22% and the remaining 78% of real memory free.
 
avm = 461462 4KB pages = 1.8GB
fre = 1628290 4KB pages = 6.2GB

We then start doing some I/O, with file pages being read (“fi” column; note that the vmstat "-I" flag must be used to see file page details), and free memory goes down. When free memory drops to the “minfree” value, the system starts scanning memory (“sr” column) to find enough free pages to load the data being read from disk (the number of pages freed, “fr” column, here corresponds to the number of pages being read, “fi” column).
All the real memory is used and the system needs to free some of the allocated pages for the new data being read, but only Non-Computational memory (filecache) is freed.
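The way the vmstat -I columns are read above can be sketched as a small classifier. It rests on assumptions: each interval is already parsed into a dict keyed by the column names fi, fo, pi, po, fr and sr, the category labels are hypothetical, and the sample values are invented for illustration.

```python
def classify_interval(row: dict) -> str:
    """Label one vmstat -I interval from its page columns."""
    if row["pi"] > 0 or row["po"] > 0:
        # Data moving to/from paging space: real paging activity.
        return "paging"
    if row["fi"] > 0 or row["fo"] > 0:
        if row["sr"] > 0:
            # Free list reached minfree: pages are scanned and file pages freed.
            return "file-io-with-page-replacement"
        # File pages are cached using free memory, no scanning needed.
        return "file-io"
    return "idle"


# Invented intervals: reads filling free memory, then reads that force page replacement.
print(classify_interval({"fi": 9000, "fo": 0, "pi": 0, "po": 0, "fr": 0, "sr": 0}))
print(classify_interval({"fi": 9000, "fo": 0, "pi": 0, "po": 0, "fr": 9000, "sr": 9500}))
```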

When the I/O ends, page activity returns to 0, but the “avm” value has not changed, since Computational memory did not increase.

A couple of things to note:

  • “fi” and “fo” refer to file pages being read into and written out of memory.
     This is different from paging activity (swapping data to and from the paging space), which corresponds to the “pi” and “po” columns: a value of 0 for pi and po means the system is *not* paging (no activity to or from the paging devices). You can verify this by checking that paging space usage, as shown later by svmon or topas (or by running "lsps -a"), has not increased.
  • Non-Computational memory is not released after the data is read or written, as you can see in the vmstat output, where the free list remains at 3056 after the I/O ended.
    The filecache is released when:
    - the cached file is removed
    - the file system where the cached file resides is unmounted
    - a process needs memory and there is not enough free memory to satisfy the request: in this case VMM starts page replacement (via the “lrud” daemon) and steals file pages.
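To see why a free list of 3056 pages is expected, the sketch below splits the 8GB system's real memory using the avm and fre values from this example, before the I/O (fre = 1628290) and after it (fre = 3056). It assumes no paging-space usage, so all avm pages are resident, and treats whatever is neither avm nor free as filecache; this is an approximation, not how AIX itself accounts memory. The function name is hypothetical.

```python
PAGES_PER_GB = 262144          # 4KB pages per GB
REAL_PAGES = 8 * PAGES_PER_GB  # the 8GB system in this example


def memory_split(avm: int, fre: int, real_pages: int = REAL_PAGES) -> tuple:
    """Approximate (Computational %, free %, Non-Computational %) of real memory.

    Assumes nothing is on paging space, so every avm page is in RAM.
    """
    computational = 100.0 * avm / real_pages
    free = 100.0 * fre / real_pages
    return computational, free, 100.0 - computational - free


before = memory_split(461462, 1628290)
after = memory_split(461462, 3056)
for label, (comp, free, noncomp) in (("before", before), ("after", after)):
    print(f"{label}: {comp:.0f}% Computational, {free:.0f}% free, {noncomp:.0f}% filecache")
```

Computational memory stays at about 22% while the estimated filecache grows to about 78%, roughly matching the 77% Non-Computational memory that topas reports after the I/O.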

The same can also be seen with

- svmon, comparing the output before the I/O

[Screenshot: svmon -G output before the I/O]

  and after it

[Screenshot: svmon -G output after the I/O]

  All the memory is now used ("inuse" is almost the same as "size"), with only 2911 free pages, but "virtual" remained the same: the whole increase is Non-Computational memory. The I/O was on JFS2 file systems, so these pages are shown under the “clnt” column (in the “in use” row).
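The before/after comparison can be sketched as follows, assuming the "virtual" value and the "pers" and "clnt" columns of the "in use" row have already been read from two svmon -G reports into dicts. The function name and the page counts in the usage lines are hypothetical. Growth in pers + clnt with an unchanged "virtual" means the extra memory in use is filecache.

```python
def filecache_growth(before: dict, after: dict) -> tuple:
    """Return (change in Non-Computational pages, whether "virtual" is unchanged).

    pers = cached JFS (persistent) pages, clnt = cached client pages (e.g. JFS2, NFS).
    """
    noncomp_before = before["pers"] + before["clnt"]
    noncomp_after = after["pers"] + after["clnt"]
    return noncomp_after - noncomp_before, after["virtual"] == before["virtual"]


# Hypothetical page counts, not taken from the screenshots.
delta, virtual_unchanged = filecache_growth(
    {"virtual": 460000, "pers": 0, "clnt": 10000},
    {"virtual": 460000, "pers": 0, "clnt": 1620000},
)
print(delta, virtual_unchanged)  # 1610000 True
```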

- topas
  We started with Computational memory at 22% and ended with the same value, but now 77% of memory is Non-Computational.

[Screenshots: topas output before and after the I/O]

Can I safely remove some memory from the system?

If you want to reduce the amount of real memory assigned to a system, you must make sure it has enough memory for the Computational pages: if Computational memory is close to real memory, the memory should not be reduced.

Let's take an example of a loaded system with 40GB of real memory, 50% Computational and 50% Non-Computational: the system needs 20GB of real memory to avoid paging activity, so you can reduce the current 40GB assigned to it.
However, the system still needs some memory to cache I/O, so you could lower its memory to something like 25GB: the result is 80% Computational (20GB out of 25GB) and about 5GB for filecache, that is, Non-Computational pages.
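The sizing arithmetic in this example can be written as a small helper (hypothetical name) that, for a given real memory size, returns the Computational share and the memory left for filecache, assuming Computational memory stays at 20GB.

```python
def resize_outcome(computational_gb: float, real_gb: float) -> tuple:
    """(Computational % of real memory, GB left for filecache) for a memory size."""
    return 100.0 * computational_gb / real_gb, real_gb - computational_gb


for real_gb in (40, 25):
    pct, filecache_gb = resize_outcome(20, real_gb)
    print(f"{real_gb}GB: {pct:.0f}% Computational, {filecache_gb}GB filecache")
# 40GB: 50% Computational, 20GB filecache
# 25GB: 80% Computational, 5GB filecache
```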
There are two things to consider here:
- The recommendation is to keep Computational memory below 90% of real memory.
   Note that this is just a "reference" threshold: the system works fine even with higher values, but a load increase that allocates more memory can quickly over-commit memory and cause data to be swapped to the paging space.
- The amount of memory left for Non-Computational pages (filecache)
   Depending on the amount of cached I/O issued by the applications, too little memory for filecache can also impact performance.
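Both considerations can be combined into a lower bound on real memory, sketched below under assumptions: the 90% figure is the reference threshold given above, while min_filecache_gb is a hypothetical, workload-specific input that has to be estimated case by case (no fixed value is given here).

```python
def min_real_memory_gb(computational_gb: float, min_filecache_gb: float,
                       max_computational_ratio: float = 0.90) -> float:
    """Smallest real memory keeping Computational memory under the threshold
    and leaving at least min_filecache_gb for filecache."""
    by_threshold = computational_gb / max_computational_ratio
    by_filecache = computational_gb + min_filecache_gb
    return max(by_threshold, by_filecache)


print(round(min_real_memory_gb(20, 0), 1))  # 22.2: the 90% threshold alone
print(min_real_memory_gb(20, 5))            # 25: 5GB of filecache is the binding limit
```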

In the previous example, the memory available for Non-Computational pages is reduced from 20GB to 5GB: on systems with low to moderate cached I/O this can be enough, while systems with heavy cached I/O benefit from having more memory.
As this is really load dependent, it needs to be evaluated on a case-by-case basis, either by reviewing the system I/O type and load, or by reducing the memory in steps and reassessing performance at each step (in the previous example, reducing the memory by 5GB at a time).
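The step-by-step approach can be planned with a helper like the one below (a hypothetical sketch): it lists the memory sizes to try in turn, going down by a fixed step without undershooting the target. At each size, performance has to be reassessed (for example, checking that pi and po stay at 0 and Computational memory stays under the threshold) before moving to the next one.

```python
def reduction_steps(current_gb: int, target_gb: int, step_gb: int) -> list:
    """Memory sizes to try in turn, lowering by step_gb until target_gb."""
    steps = []
    size = current_gb
    while size > target_gb:
        size = max(size - step_gb, target_gb)  # never go below the target
        steps.append(size)
    return steps


print(reduction_steps(40, 25, 5))  # [35, 30, 25]
```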


 

AIX provides several tunables that affect VMM and I/O; changing them without understanding all their effects can degrade performance:

  • It is recommended not to change "Restricted" tunables
  • Applying the same tunable changes to all systems as "general tuning" is not recommended
  • Reassess the need for tuning changes when migrating from one AIX version to another

Additional Information

SUPPORT

If you require more assistance, use the following step-by-step instructions to contact IBM to open a case for software with an active and valid support contract.  

1.  Document (or collect screen captures of) all symptoms, errors, and messages related to your issue.

2.  Capture any logs or data relevant to the situation.

3.  Contact IBM to open a case:

   - For electronic support, see the IBM Support Community:
     https://www.ibm.com/mysupport
   - If you require telephone support, see the web page:
     https://www.ibm.com/planetwide/

4.  Provide a clear, concise description of the issue.

5.  If the system is accessible, collect a system snap, and upload all of the details and data for your case.

 - For guidance, see: Working with IBM AIX Support: Collecting snap data


Document Information

Modified date:
01 April 2020

UID

ibm16147765