Troubleshooting
Problem
When the /opt partition in QRadar® SIEM does not have enough space, it can affect the regular functioning of QRadar. This article helps administrators identify and remove files and directories when the /opt partition does not have enough available disk space.
Symptom
Lack of available space in the /opt partition can cause the following issues:
- Alerts about "Process monitor application failed to start multiple times".
- Searches reporting I/O errors.
- Services not starting.
- Configuration deployments blocked due to critical disk space:
[tomcat.tomcat] /console/JSON-RPC/QRadar.scheduleDeployment QRadar.scheduleDeployment] com.q1labs.configservices.util.ConfigServicesUtil: [INFO] [-/--] Deployment is blocked due to critical disk space issue
- Failed disk space checks when a software update runs
=-= DiskSpace Report for Mountpoint '/opt' =-=
=-= Available: 1735980 Kb, Required: 1932367.2 KB =-=
=-= Total Patch Files: 3524 Kb =-=
=-= Total RPM Files: 1159000 Kb =-=
=-= Directories over 1G on mountpoint /opt to a depth of 3: /opt =-=
Size (MB)  Directory
10109      /opt
7572       /opt/qradar
4780       /opt/qradar/bin
4597       /opt/qradar/bin/ca_jail
2071       /opt/ibm
1656       /opt/ibm/si
1640       /opt/ibm/si/services
1163       /opt/qradar/conf
=-= Files on mountpoint /opt over 1G =-=
=-= Disk Space Report Complete for '/opt'
<Hostname>: patch test failed.
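The amount of space that must be freed before the update check passes can be worked out from the "Available" and "Required" figures in the report. The following is a minimal sketch that uses the values from the example report; the function name shortfall_kb is hypothetical and is not a QRadar utility.

```shell
#!/bin/bash
# Hypothetical helper: prints how many KB still need to be freed, given the
# "Available" and "Required" values from the disk space report.
shortfall_kb() {
    awk -v avail="$1" -v req="$2" 'BEGIN {
        diff = req - avail
        if (diff < 0) diff = 0   # nothing to free when enough space exists
        printf "%.1f\n", diff
    }'
}

# Values taken from the example report for /opt.
shortfall_kb 1735980 1932367.2
```

For the example report, this prints 196387.2, so roughly 192 MB must be freed on /opt before the software update can proceed.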
Cause
By default, the QRadar disk sentry check runs every 60 seconds and looks for high disk usage across the partitions. When a partition goes beyond the critical warning threshold, an alert is triggered for administrators to investigate.
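The kind of check described above can be approximated manually. The following is a minimal sketch, not the disk sentry implementation: the function names are hypothetical, and the 95 percent value is an assumed threshold chosen only for illustration.

```shell
#!/bin/bash
# Prints the used percentage (integer, no % sign) of the file system that
# holds the given path, based on POSIX "df -P" output.
usage_pct() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Returns success (0) when usage is at or above the given threshold,
# 1 when it is below, and 2 when the usage cannot be determined.
is_over_threshold() {
    local used
    used=$(usage_pct "$1")
    [ -n "$used" ] || return 2
    [ "$used" -ge "$2" ]
}

# ASSUMPTION: 95 is an illustrative threshold, not a documented QRadar value.
if is_over_threshold /opt 95; then
    echo "/opt is at or above 95% usage"
fi
```

Running the check by hand can confirm whether a disk sentry alert reflects the current state of the partition or a condition that has already cleared.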
Diagnosing The Problem
First, complete the steps in the "Troubleshooting disk space usage problems" article to identify and remove large directories and files inside the partition.
Then, run the following steps to identify specific causes of the /opt partition filling up. Use the matching section in Resolving The Problem to resolve each cause that you find.
Leftover replication files
- Check whether failed replication files exist in /opt/qradar/support/. No output is expected from this command when no leftover files exist.
ls -lah /opt/qradar/support/failed_tx*
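To see how much space the leftover replication files occupy in total, the listing can be combined with a size summary. This is a minimal sketch; the function name report_failed_tx and the total_kb output label are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: lists entries named failed_tx* directly inside the
# given directory, then prints their combined size in KB on the last line.
report_failed_tx() {
    local dir="$1"
    find "$dir" -maxdepth 1 -name 'failed_tx*' -print
    find "$dir" -maxdepth 1 -name 'failed_tx*' -exec du -sck {} + \
        | awk '$2 == "total" { sum += $1 } END { print "total_kb=" (sum + 0) }'
}

# Errors are suppressed so that a missing directory simply reports zero.
report_failed_tx /opt/qradar/support 2>/dev/null
```

When no leftover files exist, the only output is total_kb=0.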
Leftover ecs-ec-ingress, ecs-ec, and ecs-ep services configuration files
- Get QRadar build version.
/opt/qradar/bin/myver -b
Output example:
2021.6.2.20220527130137
- Verify the services version matches the build version.
for service in ecs-ec-ingress ecs-ec ecs-ep; do echo -e $service && siemctl list-versions $service ; done
ecs-ec-ingress
7.3.2.20190410024210
2020.11.0.20210517144015
2021.6.2.20220527130137 (active)
ecs-ec
7.3.2.20190410024210
2020.11.0.20210517144015
2021.6.2.20220527130137 (active)
ecs-ep
7.3.2.20190410024210
2020.11.0.20210517144015
2021.6.2.20220527130137 (active)
- Verify the previous output matches the directories inside /opt/ibm/si/services/<ecs service>/.
find /opt/ibm/si/services/<ecs service>/ -mindepth 1 -maxdepth 1 -type d -not -name current -prune -not -name eventgnosis -prune -print
Output example that uses ecs-ec. The same is applicable to ecs-ec-ingress and ecs-ep:
find /opt/ibm/si/services/ecs-ec/ -mindepth 1 -maxdepth 1 -type d -not -name current -prune -not -name eventgnosis -prune -print
/opt/ibm/si/services/ecs-ec/2021.6.2.20220527130137 <- Active
/opt/ibm/si/services/ecs-ec/7.3.2.20190410024210 <- Leftover
/opt/ibm/si/services/ecs-ec/2020.11.0.20210517144015 <- Leftover
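The comparison between the active version and the directories on disk can also be done in one pass. The following is a minimal sketch that treats the target of the current symbolic link as the active version; the function name list_leftover_versions is hypothetical, and the script only prints candidates without moving or deleting anything.

```shell
#!/bin/bash
# Hypothetical helper: prints version directories under a service directory
# that are neither "eventgnosis" nor the target of the "current" symlink.
list_leftover_versions() {
    local service_dir="${1%/}"
    local active
    active=$(readlink -f "$service_dir/current")
    # -type d does not match the "current" symlink itself.
    find "$service_dir" -mindepth 1 -maxdepth 1 -type d \
        -not -name eventgnosis -print | while read -r dir; do
        # Skip the directory that the "current" link resolves to.
        [ "$(readlink -f "$dir")" = "$active" ] && continue
        echo "$dir"
    done | sort
}

for service in ecs-ec-ingress ecs-ec ecs-ep; do
    list_leftover_versions "/opt/ibm/si/services/$service" 2>/dev/null
done
```

Each printed path is a leftover candidate. Compare the list against the siemctl list-versions output before acting on it.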
Stalled PIDs preventing the system from providing accurate values
- Identify the conflicting directory inside the /opt partition.
du -xch -d 3 /opt | sort -h | tail -n 10
Output example:
795M /opt/ibm/si/services
802M /opt/ibm/si
866M /opt/qradar/conf
991M /opt/ibm/forensics/decapper
1.4G /opt/ibm/forensics
2.5G /opt/ibm
7.2G /opt/qradar/support
8.8G /opt/qradar
13G /opt
13G total
In the previous example, /opt/qradar/support/ is the largest directory and likely the cause.
- Verify there are stalled files in the conflicting directory. They are shown as deleted.
lsof +L1 /opt/ | grep <directory name>
Output example:
lsof +L1 /opt/ | grep /opt/qradar/support
root 5403 root 166r REG 253,4 187138 1 18911400 /opt/qradar/support/testfile.zip (deleted)
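To estimate how much space the deleted-but-open files still hold, the lsof output can be summed. This is a minimal sketch that assumes the default lsof column layout with +L1, where SIZE/OFF is column 7 and NAME is column 10, and that paths contain no spaces; the function name sum_deleted_bytes is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads "lsof +L1" output on stdin and prints the total
# size in bytes of deleted-but-open files whose path starts with the prefix.
# ASSUMPTION: default lsof layout (SIZE/OFF in column 7, NAME in column 10).
sum_deleted_bytes() {
    awk -v prefix="$1" '
        $NF == "(deleted)" && index($10, prefix) == 1 { total += $7 }
        END { printf "%.0f\n", total + 0 }
    '
}

# Sample line copied from the output example in this article.
printf '%s\n' \
    'root 5403 root 166r REG 253,4 187138 1 18911400 /opt/qradar/support/testfile.zip (deleted)' \
    | sum_deleted_bytes /opt/qradar/support
```

For the sample line, this prints 187138. On a live system, the input would come from lsof directly, for example: lsof +L1 /opt/ | sum_deleted_bytes /opt/qradar/support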
Manual auto update leftover files
Verify whether /opt/qradar/www/autoupdates/ exists as a regular directory instead of a symbolic link to /storetmp/. For more information about this procedure, see: How to manually install the QRadar weekly auto update bundle.
Bad Output. Note the absence of the "-> /storetmp/" portion, which indicates the link does not exist.
[root@qradar ~]# ls -lad /opt/qradar/www/autoupdates
drwxr-xr-x 2 root root 6 Oct 11 13:48 /opt/qradar/www/autoupdates
Good Output. Note the "-> /storetmp/" portion that indicates the link exists.
[root@qradar ~]# ls -lad /opt/qradar/www/autoupdates
lrwxrwxrwx 1 root root 10 Oct 11 13:45 /opt/qradar/www/autoupdates -> /storetmp/
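The same check can be scripted so that it does not depend on reading the ls output by eye. This is a minimal sketch; the function name links_to is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when the path is a symbolic link that
# resolves to the expected target directory.
links_to() {
    local path="${1%/}" expected="$2"
    [ -L "$path" ] || return 1
    [ "$(readlink -f "$path")" = "$(readlink -f "$expected")" ]
}

if links_to /opt/qradar/www/autoupdates /storetmp; then
    echo "OK: autoupdates is linked to /storetmp"
else
    echo "CHECK: autoupdates is not a link to /storetmp"
fi
```

A CHECK result corresponds to the bad output case, where files written to the autoupdates directory consume space in /opt.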
Third-party software installed on the system
IMPORTANT: Certain third-party software, such as monitoring agents and antivirus tools, can write logs in the /opt partition. QRadar does not require or support traditional anti-virus or malware agents, and does not support the installation of third-party packages or programs.
Resolving The Problem
Administrators must run all the steps in the Diagnosing the Problem section to identify partition-specific issues that must be resolved to regain space. For each section where you found leftover files or another issue, use the corresponding section here to resolve the issue.
Leftover ecs-ec-ingress, ecs-ec, and ecs-ep services configuration files
- Move the conflicting directories to a larger partition, or remove them. In this example, the /store/IBM_Support directory is used as the destination.
- To move the conflicting directories, run:
for service in ecs-ec-ingress ecs-ec ecs-ep; \
do mkdir -pv /store/IBM_Support/$service; \
mv -v $(find /opt/ibm/si/services/$service/ -mindepth 1 -maxdepth 1 -type d -not -name current -not -name eventgnosis -not -path $(readlink -f /opt/ibm/si/services/$service/current) -prune -print) /store/IBM_Support/$service; \
done
Output example:
mkdir: created directory ‘/store/IBM_Support/ecs-ec-ingress’
‘/opt/ibm/si/services/ecs-ec-ingress/2021.6.2.20220527130123’ -> ‘/store/IBM_Support/ecs-ec-ingress/2021.6.2.20220527130123’
removed directory: ‘/opt/ibm/si/services/ecs-ec-ingress/2021.6.2.20220527130123’
mkdir: created directory ‘/store/IBM_Support/ecs-ec’
‘/opt/ibm/si/services/ecs-ec/2021.6.2.20220527130123’ -> ‘/store/IBM_Support/ecs-ec/2021.6.2.20220527130123’
removed directory: ‘/opt/ibm/si/services/ecs-ec/2021.6.2.20220527130123’
mkdir: created directory ‘/store/IBM_Support/ecs-ep’
‘/opt/ibm/si/services/ecs-ep/2021.6.2.20220527130123’ -> ‘/store/IBM_Support/ecs-ep/2021.6.2.20220527130123’
removed directory: ‘/opt/ibm/si/services/ecs-ep/2021.6.2.20220527130123’
- To remove the conflicting directories, run:
for service in ecs-ec-ingress ecs-ec ecs-ep; \
do rm -rfv $(find /opt/ibm/si/services/$service/ -mindepth 1 -maxdepth 1 -type d -not -name current -not -name eventgnosis -not -path $(readlink -f /opt/ibm/si/services/$service/current) -prune -print); \
done
Output example:
removed directory: ‘/opt/ibm/si/services/ecs-ec-ingress/2021.6.2.20220527130123’
removed directory: ‘/opt/ibm/si/services/ecs-ec/2021.6.2.20220527130123’
removed directory: ‘/opt/ibm/si/services/ecs-ep/2021.6.2.20220527130123’
- Verify only the leftover directories were removed and the current, eventgnosis, and active version remained on the system.
ll /opt/ibm/si/services/*/
/opt/ibm/si/services/ecs-ec/:
drwxr-xr-x 5 root root 59 Jun 9 21:55 2021.6.2.20220527130137
lrwxrwxrwx 1 root root 51 Jun 9 22:20 current -> /opt/ibm/si/services/ecs-ec/2021.6.2.20220527130137
drwxr-xr-x 2 root root 6 Sep 25 18:25 eventgnosis

/opt/ibm/si/services/ecs-ec-ingress/:
drwxr-xr-x 5 root root 59 Jun 9 21:54 2021.6.2.20220527130137
lrwxrwxrwx 1 root root 59 Jun 9 22:51 current -> /opt/ibm/si/services/ecs-ec-ingress/2021.6.2.20220527130137
drwxr-xr-x 3 root root 17 Aug 7 2019 eventgnosis

/opt/ibm/si/services/ecs-ep/:
drwxr-xr-x 5 root root 59 Jun 9 21:55 2021.6.2.20220527130137
lrwxrwxrwx 1 root root 51 Jun 9 22:20 current -> /opt/ibm/si/services/ecs-ep/2021.6.2.20220527130137
- Verify the /opt partition usage decreased.
df -Th /opt
Stalled PIDs preventing the system from providing accurate values
Kill the stalled PIDs.
IMPORTANT: Administrators must identify the process to be killed and verify that it is not a core process. When a core process is killed, the core services need to be restarted, which can cause user interface disruption, gaps, and latency in event processing and offense generation. No output is generated from this command.
lsof +L1 /<directory_name> | grep 'deleted' | awk '{print $2}' | xargs kill -9
Example
lsof +L1 /opt/qradar/support | grep 'deleted' | awk '{print $2}' | xargs kill -9
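Before running the kill command, it can help to list which processes hold the deleted files so that each one can be checked against the core processes. This is a minimal sketch that assumes the default lsof layout, with COMMAND in column 1 and PID in column 2; the function name list_holding_processes is hypothetical, and the sample input below is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads "lsof +L1" output on stdin and prints one
# "PID COMMAND" pair per process that holds a deleted file open.
# ASSUMPTION: default lsof layout (COMMAND in column 1, PID in column 2).
list_holding_processes() {
    awk '$NF == "(deleted)" && !seen[$2]++ { print $2, $1 }'
}

# Made-up sample input; process names and file names are illustrative only.
printf '%s\n' \
    'COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME' \
    'java 5403 root 166r REG 253,4 187138 0 18911400 /opt/qradar/support/a.zip (deleted)' \
    'java 5403 root 167r REG 253,4 2048 0 18911401 /opt/qradar/support/b.zip (deleted)' \
    'gzip 6120 root 3w REG 253,4 4096 0 18911402 /opt/qradar/support/c.gz (deleted)' \
    | list_holding_processes
```

On a live system, the input would come from lsof, for example: lsof +L1 /opt/qradar/support | list_holding_processes. Review the list first, and kill only the PIDs that are confirmed not to be core processes.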
Manual auto update leftover files
Remove the files inside /opt/qradar/www/autoupdates/ and its subdirectories.
rm -rfv /opt/qradar/www/autoupdates/*
Third-party software installed on the system
Administrators must refer to the third-party application documentation to remove the software from the QRadar appliance.
Result
The /opt partition no longer has disk space constraints. If the partition filled to the point where critical services stopped, administrators must restart the services in the proper order with the following commands, and then wait 5 minutes:
IMPORTANT: When QRadar core services restart, the QRadar UI, event processing, and database are not available to all users. Administrators with strict outage policies are advised to complete the next step during a scheduled maintenance window for their organization.
systemctl stop hostcontext
systemctl stop tomcat
systemctl restart hostservices
systemctl start tomcat
systemctl start hostcontext
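The restart sequence above can be wrapped so that it stops at the first failed step instead of continuing with services in an inconsistent state. This is a minimal sketch; the function name restart_core_services and the DRY_RUN variable are hypothetical conventions for this example, and the five-minute wait is left to the operator.

```shell
#!/bin/bash
# Runs the restart sequence from this article in order and stops at the
# first failure. With DRY_RUN=1 (an assumed convention for this sketch),
# the commands are printed instead of executed.
restart_core_services() {
    local step
    for step in "stop hostcontext" "stop tomcat" "restart hostservices" \
                "start tomcat" "start hostcontext"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl $step"
        else
            # $step is intentionally unquoted so it splits into two arguments.
            systemctl $step || return 1
        fi
    done
}

# Preview the sequence without touching any service.
DRY_RUN=1 restart_core_services
```

Calling restart_core_services without DRY_RUN=1 runs the commands for real, which causes the outage described in the IMPORTANT note.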
If the partition does not decrease its usage or the services do not start properly, contact QRadar Support for assistance.
Resolution for QRadar 7.3.0 and 7.3.1 only
Administrators can run the partitionDiagnostic.sh script. This utility does not run on 7.3.2 and later versions.
Related Information
Document Location
Worldwide
Document Information
Modified date:
19 October 2022
UID
ibm16823721