QRadar: Delete files or directories to gain space in /opt partition

Troubleshooting


Problem

When the /opt partition in QRadar® SIEM does not have enough free space, QRadar might not function normally. This article helps administrators identify and remove files and directories to recover space when the /opt partition does not have enough available disk space.

Symptom

Lack of available space in the /opt partition can cause the following issues:
  • Alerts about "Process monitor application failed to start multiple times".
  • Searches reporting I/O errors.
  • Services not starting.
  • Configuration deployments blocked due to critical disk space:
    [tomcat.tomcat] /console/JSON-RPC/QRadar.scheduleDeployment QRadar.scheduleDeployment] com.q1labs.configservices.util.ConfigServicesUtil: 
    [INFO] [-/--] Deployment is blocked due to critical disk space issue
  • Failed disk space checks when a software update runs:
    =-= DiskSpace Report for Mountpoint '/opt' =-=
    
    =-= Available: 1735980 Kb,  Required: 1932367.2 KB =-=
    =-= Total Patch Files: 3524 Kb =-=
    =-= Total RPM Files: 1159000 Kb =-=
    =-= Directories over 1G on mountpoint /opt to a depth of 3: /opt =-=
    
    Size (MB)     Directory
    10109         /opt
    7572          /opt/qradar
    4780          /opt/qradar/bin
    4597          /opt/qradar/bin/ca_jail
    2071          /opt/ibm
    1656          /opt/ibm/si
    1640          /opt/ibm/si/services
    1163          /opt/qradar/conf
    
    =-= Files on mountpoint /opt over 1G =-=
    
    =-= Disk Space Report Complete for '/opt'
    <Hostname>:  patch test failed.
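
The difference between the Available and Required values in the report is the amount of space that must be freed before the patch test can pass (in the previous example, 1932367.2 - 1735980 = 196387.2 KB, roughly 192 MB). The following is a minimal sketch, not an IBM-provided tool, that computes this shortfall from the report text. The function name is illustrative, and it assumes the "Available: <n> Kb,  Required: <n> KB" line format shown above.

```shell
# Illustrative helper (not part of QRadar): reads a DiskSpace Report on stdin
# and prints how much space must be freed on the mount point.
# Assumes the "Available: <n> Kb,  Required: <n> KB" line format shown above.
disk_report_shortfall() {
    awk '
        /Available:/ && /Required:/ {
            # Take the value that follows each label.
            for (i = 1; i < NF; i++) {
                if ($i == "Available:") avail = $(i + 1)
                if ($i == "Required:")  req = $(i + 1)
            }
            short = req - avail
            if (short > 0)
                printf "Shortfall: %.1f KB (%.1f MB)\n", short, short / 1024
            else
                print "No shortfall"
        }
    '
}
```

For example, piping the Available/Required line from the report above into disk_report_shortfall prints "Shortfall: 196387.2 KB (191.8 MB)".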

Cause

By default, the QRadar disk sentry check runs every 60 seconds and checks the disk usage of each partition. When a partition exceeds the critical warning threshold, an alert is triggered so that administrators can investigate.
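
To see which partitions are close to a threshold, administrators can compare the Capacity column of df -P against a chosen percentage. The following is a minimal sketch of that comparison only; it is not the disk sentry implementation, the function name is illustrative, and the threshold is a hypothetical parameter because this article does not state the QRadar threshold values.

```shell
# Illustrative helper, NOT the QRadar disk sentry: reads `df -P` output on
# stdin and prints each mount point whose usage is at or above a threshold.
# The threshold (percent) is a hypothetical parameter chosen by the caller.
flag_high_usage() {
    local threshold=$1
    awk -v t="$threshold" '
        NR > 1 {                       # skip the df header line
            use = $5                   # Capacity column, e.g. "96%"
            sub(/%/, "", use)
            if (use + 0 >= t + 0)
                printf "%s %s%%\n", $6, use
        }
    '
}
```

Usage example: df -P | flag_high_usage 90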

Diagnosing The Problem

First, complete the steps in the "Troubleshooting disk space usage problems" article to identify large directories and files inside the partition and remove them.

Then, run the following checks to identify common causes of the /opt partition filling up. Each check has a matching section in Resolving the Problem that explains how to solve it.

Leftover replication files
  1. Check whether failed replication files exist in /opt/qradar/support/. When no leftover files exist, the command returns only a "No such file or directory" message.
    ls -lah /opt/qradar/support/failed_tx*
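
To see how much space leftover replication files use, the following minimal sketch counts the failed_tx* regular files in a directory and adds up their sizes. The function name is illustrative, it defaults to /opt/qradar/support, and it relies on the GNU find -printf option available on Red Hat based systems.

```shell
# Illustrative helper: summarizes leftover failed replication files.
# Defaults to /opt/qradar/support; counts regular files only.
# Requires GNU find (-printf).
summarize_failed_tx() {
    local dir=${1:-/opt/qradar/support}
    find "$dir" -maxdepth 1 -type f -name 'failed_tx*' -printf '%s\n' |
        awk '{ n++; total += $1 } END { printf "%d files, %d bytes\n", n, total }'
}
```

Usage example: summarize_failed_tx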
Leftover ecs-ec-ingress, ecs-ec, and ecs-ep services configuration files
  1. Get QRadar build version.
    /opt/qradar/bin/myver -b
    Output example:
    2021.6.2.20220527130137
  2. Verify that the active version of each service matches the build version.
    for service in ecs-ec-ingress ecs-ec ecs-ep; do echo "$service" && siemctl list-versions "$service"; done
    Output example:
    ecs-ec-ingress
    7.3.2.20190410024210
    2020.11.0.20210517144015
    2021.6.2.20220527130137 (active)
    ecs-ec
    7.3.2.20190410024210
    2020.11.0.20210517144015
    2021.6.2.20220527130137 (active)
    ecs-ep
    7.3.2.20190410024210
    2020.11.0.20210517144015
    2021.6.2.20220527130137 (active)
    
    In the previous output, 2021.6.2.20220527130137 is the active version. The other entries are leftovers from previous versions in the system.
  3. Verify the previous output matches the directories inside /opt/ibm/si/services/<ecs service>/.
    find /opt/ibm/si/services/<ecs service>/ -mindepth 1 -maxdepth 1 -type d -not -name current -prune -not -name eventgnosis -prune -print
    Output example for ecs-ec (the same applies to ecs-ec-ingress and ecs-ep):
    find /opt/ibm/si/services/ecs-ec/ -mindepth 1 -maxdepth 1 -type d -not -name current -prune -not -name eventgnosis -prune -print
    
    /opt/ibm/si/services/ecs-ec/2021.6.2.20220527130137      <- Active
    /opt/ibm/si/services/ecs-ec/7.3.2.20190410024210         <- Leftover
    /opt/ibm/si/services/ecs-ec/2020.11.0.20210517144015     <- Leftover
    In the previous output, 2021.6.2.20220527130137 is the active version, which matches the output of the previous step. The other two directories are leftovers.
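
The comparison in step 3 can be sketched as a function that resolves the current link and prints every other version directory, ignoring eventgnosis. This is a minimal sketch under the assumption that each service directory has the layout shown in the previous output; the function name is illustrative, and it only lists directories, it does not change anything.

```shell
# Illustrative helper: lists leftover version directories for one ECS service
# directory, such as /opt/ibm/si/services/ecs-ec. Assumes the layout shown in
# this article: <version> directories, a "current" link, and eventgnosis.
list_leftover_versions() {
    local service_dir active
    service_dir=$(readlink -f "$1") || return 1
    if [ ! -L "$service_dir/current" ]; then
        echo "No current link in $service_dir" >&2
        return 1
    fi
    # Canonical path of the active version directory.
    active=$(readlink -f "$service_dir/current")
    # -type d skips the "current" symbolic link itself.
    find "$service_dir" -mindepth 1 -maxdepth 1 -type d \
        ! -name eventgnosis ! -path "$active" -print | sort
}
```

Usage example: for service in ecs-ec-ingress ecs-ec ecs-ep; do list_leftover_versions /opt/ibm/si/services/$service; done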
Stalled PIDs preventing the system from reporting accurate values
  1. Identify the largest (conflicting) directory inside the /opt partition.
    du -xch -d 3 /opt | sort -h | tail -n 10
    Output example:
    795M    /opt/ibm/si/services
    802M    /opt/ibm/si
    866M    /opt/qradar/conf
    991M    /opt/ibm/forensics/decapper
    1.4G    /opt/ibm/forensics
    2.5G    /opt/ibm
    7.2G    /opt/qradar/support
    8.8G    /opt/qradar
    13G     /opt
    13G     total
    
    In the previous example, /opt/qradar/support/ is the largest directory and likely the cause.
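  2. Verify whether processes still hold open files that were deleted from the conflicting directory. These files are marked as (deleted), and their space is not freed until the process that holds them closes them.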
    lsof +L1 /opt/ | grep <directory name>
    Output example:
    lsof +L1 /opt/ | grep /opt/qradar/support
    
    root       5403     root  166r   REG  253,4   187138     1   18911400 /opt/qradar/support/testfile.zip (deleted)
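
To estimate how much space the deleted but still open files hold, the lsof output can be summed per file. The following minimal sketch reads lsof +L1 output on stdin and prints the PID, size, and path of each deleted file under a path prefix, followed by a total in bytes. The function name is illustrative, and it assumes the default lsof column layout (SIZE/OFF in column 7, NAME in column 10) and paths without spaces.

```shell
# Illustrative helper: reads `lsof +L1` output on stdin and summarizes
# deleted-but-open files under a path prefix (PID, size in bytes, path).
# Assumes the default lsof columns and file paths without spaces.
summarize_deleted_open_files() {
    awk -v prefix="$1" '
        $NF == "(deleted)" && index($10, prefix) == 1 {
            printf "%s\t%s\t%s\n", $2, $7, $10
            total += $7
        }
        END { printf "TOTAL\t%d\n", total }
    '
}
```

Usage example: lsof +L1 /opt/ | summarize_deleted_open_files /opt/qradar/support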
    
Manual auto update leftover files
Verify whether /opt/qradar/www/autoupdates is a symbolic link to /storetmp/. If it is a regular directory instead of a link, files from a manual auto update are stored in the /opt partition. For more information about this procedure, see: How to manually install the QRadar weekly auto update bundle.
Bad output. The "-> /storetmp/" portion is missing, which indicates that the link does not exist.
 
[root@qradar ~]# ls -lad /opt/qradar/www/autoupdates
drwxr-xr-x 2 root root 6 Oct 11 13:48 /opt/qradar/www/autoupdates
Good output. The "-> /storetmp/" portion indicates that the link exists.
 
[root@qradar ~]# ls -lad /opt/qradar/www/autoupdates
lrwxrwxrwx 1 root root 10 Oct 11 13:45 /opt/qradar/www/autoupdates -> /storetmp/
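
The good and bad outputs above can be told apart with a small check. The following minimal sketch prints GOOD when the path is a symbolic link that resolves to the expected target and BAD otherwise. The function name is illustrative, and the default paths are the ones used in this article.

```shell
# Illustrative helper: checks that a path is a symbolic link to the expected
# target. Defaults: /opt/qradar/www/autoupdates and /storetmp/.
# Returns 1 for the "bad" case.
check_autoupdates_link() {
    local link=${1:-/opt/qradar/www/autoupdates}
    local expected=${2:-/storetmp/}
    if [ -L "$link" ] && [ "$(readlink -f "$link")" = "$(readlink -f "$expected")" ]; then
        echo "GOOD: $link -> $(readlink "$link")"
    else
        echo "BAD: $link is not a link to $expected"
        return 1
    fi
}
```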
Third-party software installed on the system
IMPORTANT: Certain third-party software, such as monitoring agents and antivirus programs, can write logs to the /opt partition. QRadar does not require or support traditional anti-virus or malware agents, and does not support the installation of third-party packages or programs.

Resolving The Problem

Administrators must run all the steps in the Diagnosing the Problem section to identify partition-specific issues that must be resolved to regain space. For each section where you found leftover files or another issue, use the corresponding section here to resolve the issue.
Leftover ecs-ec-ingress, ecs-ec, and ecs-ep services configuration files
  1. Move the conflicting directories to a larger partition, or remove them. In this example, the /store/IBM_Support directory is used as the destination for moved directories.
    • To move the conflicting directories, run:
       
      for service in ecs-ec-ingress ecs-ec ecs-ep; \
       do mkdir -pv /store/IBM_Support/$service; \
       mv -v $(find /opt/ibm/si/services/$service/ -mindepth 1 -maxdepth 1 -type d -not -name current -not -name eventgnosis -not -path $(readlink -f /opt/ibm/si/services/$service/current) -prune -print) /store/IBM_Support/$service; \
      done
      Output example:
      mkdir: created directory ‘/store/IBM_Support/ecs-ec-ingress’
      ‘/opt/ibm/si/services/ecs-ec-ingress/2021.6.2.20220527130123’ -> ‘/store/IBM_Support/ecs-ec-ingress/2021.6.2.20220527130123’
      removed directory: ‘/opt/ibm/si/services/ecs-ec-ingress/2021.6.2.20220527130123’
      
      mkdir: created directory ‘/store/IBM_Support/ecs-ec’
      ‘/opt/ibm/si/services/ecs-ec/2021.6.2.20220527130123’ -> ‘/store/IBM_Support/ecs-ec/2021.6.2.20220527130123’
      removed directory: ‘/opt/ibm/si/services/ecs-ec/2021.6.2.20220527130123’
      
      mkdir: created directory ‘/store/IBM_Support/ecs-ep’
      ‘/opt/ibm/si/services/ecs-ep/2021.6.2.20220527130123’ -> ‘/store/IBM_Support/ecs-ep/2021.6.2.20220527130123’
      removed directory: ‘/opt/ibm/si/services/ecs-ep/2021.6.2.20220527130123’
    • To remove the conflicting directories, run:
       
      for service in ecs-ec-ingress ecs-ec ecs-ep; \
       do rm -rfv $(find /opt/ibm/si/services/$service/ -mindepth 1 -maxdepth 1 -type d -not -name current -not -name eventgnosis -not -path $(readlink -f /opt/ibm/si/services/$service/current) -prune -print); \
      done
      Output example:
      removed directory: ‘/opt/ibm/si/services/ecs-ec-ingress/2021.6.2.20220527130123’
      removed directory: ‘/opt/ibm/si/services/ecs-ec/2021.6.2.20220527130123’
      removed directory: ‘/opt/ibm/si/services/ecs-ep/2021.6.2.20220527130123’
      
  2. Verify that only the leftover directories were removed, and that the current link, the eventgnosis directory, and the active version directory remain on the system.
    ll /opt/ibm/si/services/*/
    Output example:
    /opt/ibm/si/services/ecs-ec/:
    drwxr-xr-x 5 root root 59 Jun  9 21:55 2021.6.2.20220527130137
    lrwxrwxrwx 1 root root 51 Jun  9 22:20 current -> /opt/ibm/si/services/ecs-ec/2021.6.2.20220527130137
    drwxr-xr-x 2 root root  6 Sep 25 18:25 eventgnosis
    
    /opt/ibm/si/services/ecs-ec-ingress/:
    drwxr-xr-x 5 root root 59 Jun  9 21:54 2021.6.2.20220527130137
    lrwxrwxrwx 1 root root 59 Jun  9 22:51 current -> /opt/ibm/si/services/ecs-ec-ingress/2021.6.2.20220527130137
    drwxr-xr-x 3 root root 17 Aug  7  2019 eventgnosis
    
    /opt/ibm/si/services/ecs-ep/:
    drwxr-xr-x 5 root root 59 Jun  9 21:55 2021.6.2.20220527130137
    lrwxrwxrwx 1 root root 51 Jun  9 22:20 current -> /opt/ibm/si/services/ecs-ep/2021.6.2.20220527130137
  3. Verify the /opt partition usage decreased.
    df -Th /opt
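
When a service has no leftover directories, the mv command in step 1 receives only the destination and fails with a "missing destination file operand" error. The following minimal sketch moves leftovers one directory at a time, skips a service that has no current link, and does nothing when there is nothing to move. The function name and the DRY_RUN switch are illustrative assumptions; the default paths are the ones used in this article.

```shell
# Illustrative helper: moves leftover ECS version directories to a backup
# location, keeping the "current" target and eventgnosis in place.
# DRY_RUN=1 is a hypothetical switch that only prints the planned moves.
archive_leftover_versions() {
    local services_root=${1:-/opt/ibm/si/services}
    local dest_root=${2:-/store/IBM_Support}
    local service service_dir active
    for service in ecs-ec-ingress ecs-ec ecs-ep; do
        [ -d "$services_root/$service" ] || continue
        service_dir=$(readlink -f "$services_root/$service")
        if [ ! -L "$service_dir/current" ]; then
            echo "Skipping $service: no current link" >&2
            continue
        fi
        active=$(readlink -f "$service_dir/current")
        # -type d skips the "current" symbolic link itself.
        find "$service_dir" -mindepth 1 -maxdepth 1 -type d \
            ! -name eventgnosis ! -path "$active" -print |
        while IFS= read -r dir; do
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "Would move $dir to $dest_root/$service/"
            else
                mkdir -p "$dest_root/$service" && mv -v "$dir" "$dest_root/$service/"
            fi
        done
    done
}
```

Running DRY_RUN=1 archive_leftover_versions first shows what would be moved; running archive_leftover_versions without DRY_RUN performs the move. Check the partition usage again with df -Th /opt afterward.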
Stalled PIDs preventing the system from reporting accurate values
Kill the processes that still hold the deleted files open.

IMPORTANT: Before you kill a process, identify it and verify that it is not a core process. When a core process is killed, the core services must be restarted, which can cause user interface disruption, and gaps or latency in event processing and offense generation. No output is generated from this command.
lsof +L1 /opt | grep '<directory_name>' | grep 'deleted' | awk '{print $2}' | sort -u | xargs -r kill -9
Example
lsof +L1 /opt | grep '/opt/qradar/support' | grep 'deleted' | awk '{print $2}' | sort -u | xargs -r kill -9
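
Because the IMPORTANT note requires administrators to check each process before it is killed, it can help to list the affected PIDs first. The following minimal sketch reads lsof +L1 output on stdin and prints each PID that holds deleted files under a path prefix only once. The function name is illustrative, and it assumes the default lsof column layout and paths without spaces.

```shell
# Illustrative helper: reads `lsof +L1` output on stdin and prints the unique
# PIDs holding deleted files open under a path prefix, for review before kill.
# Assumes the default lsof columns and file paths without spaces.
deleted_file_pids() {
    awk -v prefix="$1" '$NF == "(deleted)" && index($10, prefix) == 1 { print $2 }' |
        sort -un
}
```

Usage example, to review the processes before you kill them: lsof +L1 /opt | deleted_file_pids /opt/qradar/support | paste -sd, - | xargs -r ps -o pid,user,comm,args -p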
Manual auto update leftover files
Remove the files and subdirectories inside /opt/qradar/www/autoupdates/.
 
rm -rfv /opt/qradar/www/autoupdates/*
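
If /opt/qradar/www/autoupdates is already a link to /storetmp/, running rm -rfv /opt/qradar/www/autoupdates/* deletes the contents of /storetmp/ instead. The following minimal sketch removes the contents only when the path is a real directory. The function name is illustrative; unlike the * glob, it also removes hidden files.

```shell
# Illustrative helper: empties the autoupdates directory only when it is a
# real directory. A symbolic link (for example to /storetmp/) is left alone,
# so the link target is not touched. Also removes hidden (dot) files.
clear_autoupdates_dir() {
    local dir=${1:-/opt/qradar/www/autoupdates}
    if [ -L "$dir" ] || [ ! -d "$dir" ]; then
        echo "Skipping $dir: not a regular directory" >&2
        return 1
    fi
    find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rfv {} +
}
```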
Third-party software installed on the system
Administrators must refer to the third-party application documentation to remove the software from the QRadar appliance.
 
Result
The /opt partition no longer has disk space constraints. If the partition filled up to the point where critical services stopped, administrators must restart the services in the following order, and then wait 5 minutes:
 
IMPORTANT: When QRadar core services restart, the QRadar UI, event processing, and database are not available to all users. Administrators with strict outage policies are advised to complete the next step during a scheduled maintenance window for their organization.
 
systemctl stop hostcontext
systemctl stop tomcat
systemctl restart hostservices
systemctl start tomcat
systemctl start hostcontext
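
The restart sequence can also be wrapped in a function that stops at the first failed command and then waits. This is a minimal sketch under stated assumptions: the function name is illustrative, and SYSTEMCTL and WAIT_SECONDS are hypothetical overrides (defaults: systemctl and 300 seconds, matching the 5 minute wait) that make a dry run possible. The same outage warning applies.

```shell
# Illustrative helper: runs the restart commands from this article in order,
# stops at the first failure, then waits for the services to settle.
# SYSTEMCTL and WAIT_SECONDS are hypothetical overrides for dry runs.
restart_core_services() {
    local systemctl_cmd=${SYSTEMCTL:-systemctl}
    local wait_seconds=${WAIT_SECONDS:-300}
    local step
    for step in "stop hostcontext" "stop tomcat" "restart hostservices" \
                "start tomcat" "start hostcontext"; do
        echo "Running: systemctl $step"
        # $step is unquoted on purpose so it splits into action and unit.
        $systemctl_cmd $step || { echo "Failed: systemctl $step" >&2; return 1; }
    done
    echo "Waiting ${wait_seconds} seconds for the services to start..."
    sleep "$wait_seconds"
}
```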
If the partition usage does not decrease or the services do not start properly, contact QRadar Support for assistance.

Resolution for QRadar 7.3.0 and 7.3.1 only

Administrators can run the partitionDiagnostic.sh script. This utility does not run on 7.3.2 and later versions.

Document Information

Modified date:
19 October 2022

UID

ibm16823721