Question & Answer
Question
Why can the job swap value be unreasonably large when a job that uses shared memory runs on a host with a Linux kernel earlier than 2.6.24?
Cause
In LSF 9.1 and later, job processes can be tracked through Linux Cgroup, which is supported on x86_64 and PowerPC Linux hosts with kernel version 2.6.24 or later. LSF calculates the job swap in one of the following ways.
1. If LSF_LINUX_CGROUP_ACCT=n is set in the lsf.conf file, LSF uses PIM to collect the memory and swap usage of all processes in a job.
   1) If EGO_PIM_SWAP_REPORT=n is set in the lsf.conf file (the default), swap usage is the total virtual memory size (VSZ) of all job processes.
   2) If EGO_PIM_SWAP_REPORT=y is set in the lsf.conf file, the resident set size (RSS) is subtracted from the virtual memory size. RSS is the portion of a process's memory that is held in main memory. Swap usage is the sum of (VSZ - RSS) over all job processes.
2. If LSF_LINUX_CGROUP_ACCT=y is set in the lsf.conf file, LSF uses the cgroup memory subsystem to collect the memory and swap usage of all processes in a job. The job swap is the total swap usage of all processes in the job.
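The two PIM-based calculations can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not LSF code: the Proc type, the pim_job_swap_kb function, and the sample values are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Proc:
    """Hypothetical per-process sample, as PIM might report it."""
    vsz_kb: int  # virtual memory size (VSZ)
    rss_kb: int  # resident set size (RSS)


def pim_job_swap_kb(procs: Iterable[Proc], swap_report: bool = False) -> int:
    """Job swap as described for LSF_LINUX_CGROUP_ACCT=n.

    swap_report=False models EGO_PIM_SWAP_REPORT=n: sum of VSZ.
    swap_report=True  models EGO_PIM_SWAP_REPORT=y: sum of (VSZ - RSS).
    """
    if swap_report:
        return sum(p.vsz_kb - p.rss_kb for p in procs)
    return sum(p.vsz_kb for p in procs)


# Made-up sample: two processes in one job.
job = [Proc(vsz_kb=1000, rss_kb=400), Proc(vsz_kb=2000, rss_kb=500)]
print(pim_job_swap_kb(job))                    # sum of VSZ
print(pim_job_swap_kb(job, swap_report=True))  # sum of (VSZ - RSS)
```

With these sample values the first call gives 3000 KB and the second gives 2100 KB. Neither figure is a measurement of actual swap; only the cgroup-based accounting reports the swap that the job really uses.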
When a job runs on a host with a Linux kernel earlier than 2.6.24, where Linux Cgroup is not supported, the job swap is calculated by default as the total virtual memory (VSZ) of all job processes. For jobs that use shared memory, LSF cannot get the shared portion of the VSZ for each process, so the shared memory is counted once for every process in the job. This is why the job swap value can be unreasonably large.
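The effect of counting shared memory once per process can be shown with a small worked example. This is a sketch with made-up numbers, assuming each process's VSZ is simply its private memory plus the shared segment; the function names are hypothetical.

```python
from typing import List


def summed_vsz_mb(private_mb: List[int], shared_mb: int) -> int:
    """Total VSZ when each process's VSZ includes the whole shared segment."""
    return sum(p + shared_mb for p in private_mb)


def distinct_virtual_mb(private_mb: List[int], shared_mb: int) -> int:
    """Virtual memory if the shared segment were counted only once."""
    return sum(private_mb) + shared_mb


# Made-up job: 8 processes, 100 MB private each, one 4096 MB shared segment.
private = [100] * 8
shared = 4096

reported = summed_vsz_mb(private, shared)
distinct = distinct_virtual_mb(private, shared)
print(reported, distinct, reported - distinct)
```

In this example the summed VSZ is 33568 MB, while counting the shared segment once gives 4896 MB. The difference of 28672 MB is the shared segment counted seven extra times, and it grows with the number of processes in the job.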
In short, the job swap value is unreasonably large because Linux Cgroup is not enabled on the host where the job runs.
Answer
When a job that uses shared memory runs on a host with a Linux kernel earlier than 2.6.24 (where Linux Cgroup is not supported), the reported job swap value is not the real swap usage of the job and may be unreasonably large. To get the real swap usage of the job, upgrade the Linux kernel to 2.6.24 or later and enable Linux Cgroup.
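To check whether a host meets the kernel requirement, the release string can be compared against 2.6.24. The following Python sketch is one possible way to do that; the function names are hypothetical, and it checks only the kernel version, not whether Linux Cgroup is actually enabled on the host.

```python
import platform
import re
from typing import Tuple

MIN_CGROUP_KERNEL = (2, 6, 24)  # minimum kernel version stated in this document


def parse_kernel_release(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as '2.6.18-194.el5'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch component is treated as 0.
    return tuple(int(g) if g else 0 for g in match.groups())


def kernel_supports_cgroup(release: str) -> bool:
    """True if the release is 2.6.24 or later."""
    return parse_kernel_release(release) >= MIN_CGROUP_KERNEL


def current_host_supports_cgroup() -> bool:
    """Apply the check to the host this script runs on."""
    return kernel_supports_cgroup(platform.release())


print(kernel_supports_cgroup("2.6.18-194.el5"))          # False
print(kernel_supports_cgroup("3.10.0-1160.el7.x86_64"))  # True
```

Run current_host_supports_cgroup() on the execution host itself, because the kernel version that matters is the one where the job runs.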
Document Information
Modified date:
17 June 2018
UID
isg3T1024098