IBM Support

QRadar: Delete files or directories to gain space in /store partition

Troubleshooting


Problem

When the /store partition in QRadar does not have enough space, it can affect the regular functioning of QRadar® SIEM. This article helps administrators remove files and directories when the /store partition does not have enough available disk space.

Symptom

Lack of available space in the /store partition can cause the following issues:
  • Alerts about "Process monitor application failed to start multiple times".
  • Searches reporting I/O errors.
  • Services not starting.
  • Deployment of configuration changes blocked due to critical disk space:
    [tomcat.tomcat] /console/JSON-RPC/QRadar.scheduleDeployment QRadar.scheduleDeployment] com.q1labs.configservices.util.ConfigServicesUtil: 
    [INFO] [-/--] Deployment is blocked due to critical disk space issue
  • Failed disk space checks when a software update runs
    [INFO](testmode) Checking Disk Space...
    [ERROR](testmode) /store has 645428846.200001 Kb needed and only 460660028 Kb available
    =-= DiskSpace Report for Mountpoint '/store' =-=
    =-= Available: 460660028 Kb,  Required: 645428846.200001 KB =-=
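
The failed check in the last symptom comes down to comparing the required and available kilobytes. As a rough illustration, the sketch below computes the shortfall from the two figures in the log above. The function name `shortfall_gib` is hypothetical and not part of QRadar, and the conversion assumes that 1 GB is 1024 * 1024 KB.

```shell
# Hypothetical helper: print how many GB are missing, given the
# required and available sizes in KB as reported by the updater.
shortfall_gib() {
  awk -v req="$1" -v avail="$2" 'BEGIN {
    diff = req - avail
    if (diff < 0) diff = 0          # no shortfall when enough space exists
    printf "%.1f\n", diff / 1024 / 1024
  }'
}

# Figures taken from the example log: required first, then available.
shortfall_gib 645428846.200001 460660028   # prints 176.2
```

In this example, roughly 176 GB would have to be freed in /store before the update check can pass.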

Cause

By default, the QRadar disk sentry check runs every 60 seconds and looks for high disk usage across the partitions. When a partition goes beyond the critical warning threshold, an alert is triggered for administrators to investigate.
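
To see by hand roughly what such a check observes, the usage percentage of a mount point can be read from df. The sketch below is not the disk sentry implementation. The function names and the threshold of 95 are assumptions for illustration only, because this article does not state the threshold value.

```shell
# Print the usage percentage (integer, no % sign) of the filesystem
# that holds the given path, using POSIX-format df output.
usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return success when usage is at or above the given threshold.
# The default of 95 is a placeholder, not a value from this article.
over_threshold() {
  [ "$(usage_pct "$1")" -ge "${2:-95}" ]
}

# Example on an appliance:
#   over_threshold /store 95 && echo "/store is above the threshold"
```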

Diagnosing The Problem

Follow both diagnosis sections and complete the Resolving the Problem steps for the issues you confirm your appliance has.
Appliances with undersized disks
Appliances can have undersized disks when the /store partition does not exist and its contents are placed inside the root (/) partition instead.
  1. SSH to the Console. If applicable, SSH to the managed host.
  2. Use the lsblk command to find out whether the disk size is less than 256GB and the /store partition does not exist.
    lsblk
    In this example, the /store partition does not exist, which means that its contents are inside the "/" partition:

    [Figure 01: example lsblk output with no /store partition listed]
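
The second part of the check in step 2 can also be done without reading the lsblk table by eye. The following is a minimal sketch, assuming lsblk is asked for its MOUNTPOINT column only. The helper name `has_mountpoint` is hypothetical.

```shell
# Succeed if the mount point given as $1 appears as an exact line on stdin.
# Intended input: one mount point per line, e.g. from `lsblk -rno MOUNTPOINT`.
has_mountpoint() {
  grep -Fxq -- "$1"
}

# Example on an appliance:
#   lsblk -rno MOUNTPOINT | has_mountpoint /store \
#     || echo "/store is not a separate partition"
```

The exact-line match (-x) is deliberate, so that a longer mount point that merely begins with /store is not mistaken for /store itself.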
Identify and delete large directories and files in the /store partition
If the appliance has a disk allocation that meets the storage requirements, administrators can identify the largest directories and files by following the steps in Troubleshooting disk space usage problems. After these large directories are identified, follow the instructions in Resolving the Problem to remove them.
Note: Administrators might see directories under /store/docker-data/engine/ that report up to 100 GB. These partitions are thin-provisioned and do not use the amount reported, so they can be safely ignored.
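
As a quick first pass, the largest entries directly under a directory can be listed with du. This sketch is not taken from the referenced troubleshooting article. The function name `largest_entries` and the default of 10 results are assumptions.

```shell
# List the n largest entries directly under a directory, biggest first.
# Output: size in KB, a tab, then the path. -x keeps du on one filesystem.
largest_entries() {
  dir=$1
  n=${2:-10}
  find "$dir" -mindepth 1 -maxdepth 1 -exec du -xsk {} + 2>/dev/null |
    sort -rn | head -n "$n"
}

# Example on an appliance:
#   largest_entries /store 10
```

The function can be run again on any large directory it reports, for example /store/ariel, to narrow down where the space is used.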

Resolving The Problem

Follow the steps in Diagnosing the Problem to determine whether you must complete the instructions under Appliances with undersized disks or Identify and delete large directories and files in the /store partition. If both issues appear on your appliance, follow both sections.
Appliances with undersized disks
Administrators whose disks do not meet the storage requirements must reinstall their systems by following the steps in QRadar: Delete files or directories to gain space in the / partition.
Identify and delete large directories and files in the /store partition
Use the following instructions to identify files that are safe to remove and regain space.

IMPORTANT: Do not delete any data in postgres or configservices, even if those directories are listed. If only those directories are listed during diagnosis, contact QRadar Support for assistance.
Depending on the directory reported during diagnosis, follow the suggestions provided. You might follow some or all of the suggestions, depending on your needs.
  1. Event and flow retention in /store/ariel/events or /store/ariel/flows.
    • Delete leftover event and flow references.
      find /store/ariel/{events,flows}/records/ -type f -name "Q1Tmp*" -exec rm -Rfv {} \;
    • Reduce the data retention period in the retention buckets, or configure buckets so that data from unnecessary log sources is removed sooner.
    • Add a Data Node to the affected appliance to offload data from /store/ariel.
  2. Backups in /store/backup.
    • Find the oldest backups and remove them.
      find /store/backup -type f -name "backup.nightly*" -mtime +3 -exec rm -fv {} \;
      Note: The previous command deletes any nightly backup older than 3 days. Administrators must tune the +3 parameter to suit their needs.

      Output example:
      removed ‘/store/backup/backup.nightly.<hostname>_53.19_09_2022.config.1663647054635.tgz’
      removed ‘/store/backup/backup.nightly.<hostname>_53.19_09_2022.data.1663648005438.tgz’
    • Configure external NFS storage so that backups are saved there instead of in the /store partition.
  3. Report files in /store/reporting/templates and /store/reporting/reports.
    Note: Before removal, administrators must ensure that the reports to be removed are not needed for their regular monitoring operations.
    • Remove old reports that are no longer used, or the largest ones.
      rm -fv /store/reporting/templates/<report template file>
      rm -fv /store/reporting/reports/<report file>
  4. Exported content files in /store/cmt.
    • Remove the unnecessary files.
      rm -fv /store/cmt/exports/*
  5. Out of memory dump files in /store/jheap.
    • Remove the unnecessary files.
      Note: If a memory dump is required for an ongoing investigation, extract the file first.
      find /store/jheap -type f -name "*dmp" -exec rm -fv {} \;
      Output example:
      removed ‘/store/jheap/hostcontext.hostcontext/hostcontext.hostcontext.system.dmp’
      removed ‘/store/jheap/hostcontext.hostcontext/hostcontext.hostcontext.javacore.dmp’
  6. Quick filter search (Lucene) indexes in /store/ariel/events/records/Y/M/D/HH/lucene.
    • Remove the oldest index directories.
      find /store/ariel/{events,flows}/records/ -type d -name "lucene" -mtime +15 -exec rm -Rfv {} \;
      Note: The previous command deletes the Lucene directories older than 15 days. Administrators must tune the +15 parameter to suit their needs.

      Output example:
      removed ‘/store/ariel/events/records/2022/9/11/20/lucene/_d4.cfs’
      removed ‘/store/ariel/events/records/2022/9/11/20/lucene/_d4.cfe’
      removed ‘/store/ariel/events/records/2022/9/11/20/lucene/_9p.nvm’
      removed ‘/store/ariel/events/records/2022/9/11/20/lucene/_d6.cfe’
      removed directory: ‘/store/ariel/events/records/2022/9/11/20/lucene’
      
    • Reduce the Payload Index retention.
  7. Failed replication files in /store/replication/failed.
    • Remove the files.
      rm -fv /store/replication/failed/*
  8. Application logs in /store/docker/volumes/qapp-<ID>/log/startup.log.
    • Truncate the log.
      truncate -s0 /store/docker/volumes/qapp-<ID>/log/startup.log
      Output example:
      # ls -lah /store/docker/volumes/qapp-1001/log/startup.log
      -rw-r--r-- 1 nobody nobody 3G Sep 16 14:41 /store/docker/volumes/qapp-1001/log/startup.log
      
      # truncate -s0 /store/docker/volumes/qapp-1001/log/startup.log
      
      # ls -lah /store/docker/volumes/qapp-1001/log/startup.log
      -rw-r--r-- 1 nobody nobody 0 Sep 27 20:43 /store/docker/volumes/qapp-1001/log/startup.log
      
      In the previous example, the size of the qapp-1001 log file was reduced from 3 GB to 0.
  9. Run a vacuum and reindex of the database.
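
Several of the steps above delete files with find and -exec rm, which cannot be undone. Before running them, it can help to preview what a given pattern and age would match. The following is a minimal dry-run sketch, assuming GNU find (for -printf). The function name `preview_old_files` is hypothetical, and it covers only the commands that match regular files (-type f), not the Lucene directory cleanup in step 6.

```shell
# Dry run: list files under dir whose name matches pattern and whose
# modification time is more than `days` days old, then print a total.
# Nothing is deleted. Requires GNU find for -printf.
preview_old_files() {
  dir=$1 pattern=$2 days=$3
  find "$dir" -type f -name "$pattern" -mtime "+$days" -printf '%s\t%p\n' |
    awk -F '\t' '{ total += $1; print $2 }
                 END { printf "total: %d files, %.1f MB\n", NR, total / 1024 / 1024 }'
}

# Example on an appliance, mirroring the backup command in step 2:
#   preview_old_files /store/backup "backup.nightly*" 3
```

If the listed files and the total look right, run the corresponding command from the step to perform the actual removal.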

    Result
    The /store partition no longer has disk space constraints. If the partition filled to the point where critical services stopped, restart the services in the proper order with the following commands, and then wait 5 minutes:
     
    IMPORTANT: When the QRadar core services restart, the QRadar UI, event processing, and database are unavailable to all users. Administrators with strict outage policies are advised to complete the next step during a scheduled maintenance window for their organization.
     
    systemctl stop hostcontext
    systemctl stop tomcat
    systemctl restart hostservices
    systemctl start tomcat
    systemctl start hostcontext
    If the partition does not decrease its usage or the services do not start properly, contact QRadar Support for assistance.
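
After the five-minute wait, one way to check whether the services started properly is to ask systemd for the state of each unit. The sketch below is only an illustration. The helper name `not_active` is hypothetical, and it assumes that a healthy unit reports the state "active".

```shell
# Read "unit state" pairs on stdin (one pair per line) and print the
# names of the units whose state is anything other than "active".
not_active() {
  awk '$2 != "active" { print $1 }'
}

# Example on an appliance, using the three units restarted above:
#   for u in hostservices tomcat hostcontext; do
#     printf '%s %s\n' "$u" "$(systemctl is-active "$u")"
#   done | not_active
```

Empty output means that all three units report as active. Any unit name that is printed is worth mentioning when contacting QRadar Support.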

Document Location

Worldwide


Document Information

Modified date:
30 September 2022

UID

ibm16824233