IBM Support

Disk space restrictions cause IBM SOAR App Host pods to be evicted

Troubleshooting


Problem

On deployments that are low on disk space, pods can show as "Evicted" and App Host apps do not function.

Symptom

Typical symptoms are:
  • App Host shows as offline under Administration Settings - Apps
  • Apps might not be running and show that they are in an error state
  • Pods are shown as "Evicted"

Cause

Applications run inside pods, which store their data in /var/lib.
On some stand-alone installations of the software, the file system has too little disk space assigned to /var, which causes the pods to stop running.

Environment

Stand-alone deployments require the client to provision the disk, whereas virtual appliance installations allocate tens of gigabytes to /var/lib. This problem is seen on stand-alone deployments where the disk is undersized.

Diagnosing The Problem

$ sudo kubectl get pods -A

NAMESPACE                              NAME                                                    READY   STATUS    RESTARTS   AGE
kube-system                            metrics-server-7566d596c8-4lzrr                         1/1     Running   0          168d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   701814f3-7ebc-432a-979a-79e8bec086f2-5546cf8779-d97z9   1/1     Running   10         36d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   d9f56acf-381a-4f6b-9f12-04d1e21c742f-587bd4cc9-qx9s9    1/1     Running   0          9d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   52aa6dc0-fde7-4907-9ae7-03f22d2e0e12-6d49657476-5hgl2   1/1     Running   0          9d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   deployment-operator-b44c9d9b-zhfdx                      0/1     Evicted   0          36d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   deployment-operator-b44c9d9b-kgvh7                      0/1     Evicted   0          9d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   deployment-operator-b44c9d9b-rlwwf                      0/1     Evicted   0          9d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   deployment-operator-b44c9d9b-2rpv5                      0/1     Evicted   0          9d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   deployment-operator-b44c9d9b-vnmh8                      0/1     Evicted   0          9d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   deployment-operator-b44c9d9b-z7sdz                      0/1     Evicted   0          9d
8ddc6284-f5da-4521-9486-06c2c4d7acdc   deployment-synchronizer-7768954475-6bkmz                0/1     Evicted   0          36d
kube-system                            coredns-c95899d75-v2l24                                 1/1     Running   0          91d
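To narrow the listing to only the problem pods, the output can be filtered for "Evicted". Because evicted pods are in the Failed phase, a field selector can also be used; both are general kubectl techniques, not specific to App Host:

```shell
# List only the evicted pods from all namespaces
sudo kubectl get pods -A | grep -w Evicted

# Equivalently, select evicted (Failed-phase) pods directly
sudo kubectl get pods -A --field-selector=status.phase=Failed
```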
Running other commands to inspect the state of the deployment, NodeHasDiskPressure and EvictionThresholdMet events are seen.
$ sudo kubectl get events

LAST SEEN   TYPE      REASON                    OBJECT              MESSAGE
26m         Warning   EvictionThresholdMet      node/tmnl-cp4s-ah   Attempting to reclaim ephemeral-storage
24m         Normal    Starting                  node/tmnl-cp4s-ah   Starting kubelet.
24m         Normal    Starting                  node/tmnl-cp4s-ah   Starting kube-proxy.
24m         Warning   InvalidDiskCapacity       node/tmnl-cp4s-ah   invalid capacity 0 on image filesystem
24m         Normal    NodeHasSufficientMemory   node/tmnl-cp4s-ah   Node tmnl-cp4s-ah status is now: NodeHasSufficientMemory
24m         Normal    NodeHasSufficientPID      node/tmnl-cp4s-ah   Node tmnl-cp4s-ah status is now: NodeHasSufficientPID
24m         Normal    NodeNotReady              node/tmnl-cp4s-ah   Node tmnl-cp4s-ah status is now: NodeNotReady
24m         Normal    NodeAllocatableEnforced   node/tmnl-cp4s-ah   Updated Node Allocatable limit across pods
24m         Normal    NodeReady                 node/tmnl-cp4s-ah   Node tmnl-cp4s-ah status is now: NodeReady
24m         Normal    RegisteredNode            node/tmnl-cp4s-ah   Node tmnl-cp4s-ah event: Registered Node tmnl-cp4s-ah in Controller
17m         Normal    NodeHasNoDiskPressure     node/tmnl-cp4s-ah   Node tmnl-cp4s-ah status is now: NodeHasNoDiskPressure
16m         Normal    NodeHasDiskPressure       node/tmnl-cp4s-ah   Node tmnl-cp4s-ah status is now: NodeHasDiskPressure
14m         Warning   FreeDiskSpaceFailed       node/tmnl-cp4s-ah   failed to garbage collect required amount of images. Wanted to free 1587504742 bytes, but freed 0 bytes
9m36s       Warning   ImageGCFailed             node/tmnl-cp4s-ah   failed to garbage collect required amount of images. Wanted to free 1669768806 bytes, but freed 0 bytes
4m36s       Warning   EvictionThresholdMet      node/tmnl-cp4s-ah   Attempting to reclaim ephemeral-storage
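The node's DiskPressure condition can also be checked directly; it reads True while the eviction threshold is exceeded (a general kubectl check, not App Host-specific):

```shell
# Show the node's DiskPressure condition; True means the kubelet is evicting pods
sudo kubectl describe node | grep -i 'DiskPressure'
```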
Examining the disk, the /var directory is low on disk space, which triggers the disk-related errors seen in the previous output.
$ sudo df -h

Filesystem                 Size  Used Avail Use% Mounted on
devtmpfs                   3.9G     0  3.9G   0% /dev
tmpfs                      3.9G     0  3.9G   0% /dev/shm
tmpfs                      3.9G  410M  3.5G  11% /run
tmpfs                      3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/rootvg-rootlv  2.0G  291M  1.8G  15% /
/dev/mapper/rootvg-usrlv    10G  2.7G  7.4G  27% /usr
/dev/sda2                  494M  107M  388M  22% /boot
/dev/sda1                  500M  9.9M  490M   2% /boot/efi
/dev/mapper/rootvg-optlv   2.0G  791M  1.3G  39% /opt
/dev/mapper/rootvg-homelv 1014M   75M  940M   8% /home
/dev/mapper/rootvg-tmplv   2.0G   33M  2.0G   2% /tmp
/dev/mapper/rootvg-varlv   8.0G  7.3G  752M  91% /var
/dev/sdb1                   16G   45M   15G   1% /mnt
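To see which directories under /var are consuming the space, du can be used (standard Linux tooling, not App Host-specific). On a k3s-based App Host, container images and pod data typically live under /var/lib/rancher:

```shell
# Summarize disk usage one level below /var (this filesystem only), smallest first
sudo du -xh --max-depth=1 /var 2>/dev/null | sort -h
```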

Resolving The Problem

Increase the disk associated with /var.
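The df output shows /var on an LVM logical volume (rootvg-varlv). If the rootvg volume group has unallocated space, or after a new disk is added to it, the volume can be grown online. The volume group and device names below are taken from the df output in this document; verify them with vgs and lvs on your system before running anything:

```shell
# Inspect free space in the volume group (names assume the layout shown by df above)
sudo vgs rootvg

# Grow the /var logical volume by 8 GiB and resize its filesystem in the same step
sudo lvextend --resizefs --size +8G /dev/rootvg/varlv
```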
If the pods do not return to a "Running" state, consider running the following commands.

$ sudo restartAppHost # restart the pods

$ sudo systemctl restart k3s # restart Kubernetes
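Evicted pod records are not removed automatically. After disk space is restored, they can be cleaned up in bulk with a standard kubectl field selector (evicted pods are in the Failed phase):

```shell
# Delete leftover Evicted (Failed-phase) pod records in all namespaces
sudo kubectl delete pods -A --field-selector=status.phase=Failed
```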

The product documentation for IBM SOAR, specifically App Host Deployment Guide - Prerequisites, provides the suggested minimums for disk space.

The resources required by the App Host server vary with the requirements of the installed apps. Apps that operate on files in memory may have extra memory requirements, and apps that perform considerable computation, such as decryption tasks, might need more CPU. Therefore, you might need to increase those resources.

Document Location

Worldwide


Document Information

Modified date:
28 September 2022

UID

ibm16488045