IBM Support

QRadar: GlusterFS migration script encounters a "Failed to get store information on the deployment" error

Troubleshooting


Problem

The QRadar® upgrade to version 7.4.2 requires you to run a migration script on the Console. This script migrates the High Availability file system from GlusterFS to Distributed Replicated Block Device (DRBD) on all Event Collectors in your deployment, regardless of whether they are currently part of an HA setup.
 
In rare scenarios, the script can fail on an Event Collector whose partition table does not contain a /store partition.

Symptom

When run, the migration script fails and displays these error messages:
On the Console:
Jul 15 10:14:23 [ERROR] Migration process did not start successfully for test_ec. Received a return code of 1.
On the affected Event Collector:
Jul 15 10:14:23 [WARNING] Could not locate store on LVM. Upgraded system detected
Jul 15 10:14:23 [ERROR] Failed to get store information on the deployment

Cause

When QRadar builds a Managed Host, the partitioning scheme depends on the amount of disk space that is available. If the disk space is under the recommended value, QRadar does not create the /store partition. Instead, it keeps all data destined for /store in the / directory.
The issue addressed in this article occurs when the GlusterFS migration script encounters an Event Collector with that unusual partition scheme.
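As an optional check that is not part of the migration script, the following command (run as root on the Event Collector) lists each physical disk and its total size; device names such as sda vary by system.

    # List each physical disk and its total size (device names vary by system)
    lsblk -d -o NAME,SIZE,TYPE

On an affected system, the disk is typically smaller than the 256 GB minimum that is noted in the Resolving The Problem section; the sample output later in this article shows a 120 GB disk.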

Environment

QRadar® Event Collectors upgrading to 7.4.2

Diagnosing The Problem

When the script fails on one or more Event Collectors, check these points:
  1. On the Console, error messages like these appear in the /var/log/remove_glusterfs.log file:
     
    Jul 15 10:14:18 [WARNING] During migration to DRBD, event collection does not occur.
    Jul 15 10:14:18 [WARNING] QRadar uses /store during the migration to DRBD. All other processes that use /store are terminated during migration.
    Jul 15 10:14:18 [WARNING] The migration to DRBD restricts software updates to 7.4.2 or higher.
    Jul 15 10:14:20 [INFO] Copying migration binary to test_ec
    Jul 15 10:14:21 [INFO] Running migration precheck on: test_ec
    Jul 15 10:14:23 [INFO] The following hosts require a migration from GlusterFS to DRBD: ['test_ec']
    Jul 15 10:14:23 [INFO] Starting the migration process on: test_ec from console
    Jul 15 10:14:23 [ERROR] Migration process did not start successfully for test_ec. Received a return code of 1.
    Jul 15 10:14:23 [ERROR] Migration failed for the applicable host.

    Note that this snippet pertains to a particular Event Collector named test_ec. Other Event Collectors might also be affected; follow the same set of steps for each of them.
     
  2. On the Event Collector mentioned in the error messages from step 1 (for example, test_ec), check the /var/log/remove_glusterfs.log file for these messages:

    Jul 15 10:14:22 [INFO] Migration needed on the EC(s)
    Jul 15 10:14:23 [INFO] Checking for drbd_metadata space
    Jul 15 10:14:23 [INFO] Creating tmp drbd conf to verify the drbd metadata space
    Jul 15 10:14:23 [WARNING] Could not locate store on LVM. Upgraded system detected
    Jul 15 10:14:23 [ERROR] Failed to get store information on the deployment

     
  3. If the messages on the Console and the Event Collector match the ones in steps 1 and 2 respectively, run these commands on the affected Event Collector:
     
    df -h 
    lsblk 
    
    Note the output of those commands and check whether a /store partition is present. For example, that partition is missing from the sample output below. A quick one-line check is also shown after this list.

    [root@test_ec ]# df -h
    Filesystem                        Size  Used Avail Use% Mounted on
    /dev/mapper/rootrhel-root          29G  5.8G   23G  21% /
    devtmpfs                           12G     0   12G   0% /dev
    tmpfs                              12G  8.0K   12G   1% /dev/shm
    tmpfs                              12G   73M   12G   1% /run
    tmpfs                              12G     0   12G   0% /sys/fs/cgroup
    /dev/sda2                        1014M  226M  789M  23% /boot
    /dev/sda3                          32G  4.1G   28G  13% /recovery
    /dev/mapper/rootrhel-home        1014M   33M  982M   4% /home
    /dev/mapper/rootrhel-tmp          3.0G   33M  3.0G   2% /tmp
    /dev/mapper/rootrhel-opt           13G  3.1G  9.5G  25% /opt
    /dev/mapper/rootrhel-storetmp      15G   42M   15G   1% /storetmp
    /dev/mapper/rootrhel-var          5.0G  164M  4.9G   4% /var
    /dev/mapper/rootrhel-varlog        15G  146M   15G   1% /var/log
    /dev/mapper/rootrhel-varlogaudit  3.0G   47M  3.0G   2% /var/log/audit
    tmpfs                             2.4G     0  2.4G   0% /run/user/0

    [root@test_ec ]# lsblk
    NAME                     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda                        8:0    0  120G  0 disk
    ├─sda1                     8:1    0    1M  0 part
    ├─sda2                     8:2    0    1G  0 part /boot
    ├─sda3                     8:3    0   32G  0 part /recovery
    ├─sda4                     8:4    0   83G  0 part
    │ ├─rootrhel-root        253:0    0 28.5G  0 lvm  /
    │ ├─rootrhel-storetmp    253:1    0   15G  0 lvm  /storetmp
    │ ├─rootrhel-tmp         253:2    0    3G  0 lvm  /tmp
    │ ├─rootrhel-home        253:3    0    1G  0 lvm  /home
    │ ├─rootrhel-opt         253:4    0 12.5G  0 lvm  /opt
    │ ├─rootrhel-varlogaudit 253:5    0    3G  0 lvm  /var/log/audit
    │ ├─rootrhel-varlog      253:6    0   15G  0 lvm  /var/log
    │ └─rootrhel-var         253:7    0    5G  0 lvm  /var
    └─sda5                     8:5    0    4G  0 part [SWAP]
    sr0                       11:0    1  4.1G  0 rom
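
As an optional shortcut to reading the full df -h and lsblk output, the following commands can confirm whether /store is mounted as its own file system. They are not part of the migration script; run them as root on the Event Collector, and note that the logical volume name store is an assumption based on the expected QRadar layout.

    # Prints mount details when /store is a separate file system; prints the
    # message instead when it is not (the condition described in this article)
    findmnt /store || echo "/store is not a separate file system"

    # On the expected layout, a logical volume named 'store' exists; on affected
    # systems this returns nothing (-w avoids matching the storetmp volume)
    lvs | grep -w store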

Resolving The Problem

The only way to resolve this issue is to rebuild the Event Collector with an appropriate amount of storage. The minimum storage required is 256 GB. For detailed hardware prerequisites, refer to the QRadar 7.4 Installation Guide.
STEPS: Just before you upgrade the whole deployment, follow these steps for every Event Collector where the /store partition is missing:
 
  1. Remove the Event Collector from the deployment and run a Full Deploy.
  2. Assign more than 256 GB of storage to the affected Event Collector and rebuild it on the version that you are upgrading to.
  3. Once the rest of the deployment is upgraded to the target version, add the Event Collector back to the deployment.
NOTE:
  1. These steps rebuild the Event Collector directly on the target version, which means the Event Collector can be added back only to a QRadar deployment that is on that target version. To keep downtime for the Event Collector to a minimum, run the steps just before the overall deployment is upgraded.
  2. Rebuilding an Event Collector discards the data that is buffered on its local storage. In a QRadar environment that is functioning optimally, this loss is minimal.
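After the rebuilt Event Collector is added back to the deployment, an optional way to verify the new partition scheme is to repeat the checks from the Diagnosing The Problem section, for example:

    # Confirms that /store is now mounted as its own file system
    findmnt /store

    # The total disk size should now meet the 256 GB minimum
    lsblk -d -o NAME,SIZE,TYPE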
The words LINSTOR®, DRBD®, LINBIT®, and the logo LINSTOR®, DRBD®, and LINBIT® are trademarks or registered trademarks of LINBIT in Austria, the United States, and other countries.

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB24","label":"Security Software"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSBQAC","label":"IBM Security QRadar SIEM"},"ARM Category":[{"code":"a8m0z000000cwtdAAA","label":"Upgrade"}],"ARM Case Number":"","Platform":[{"code":"PF016","label":"Linux"}],"Version":"7.4.2"}]

Document Information

Modified date:
05 August 2021

UID

ibm16478081