Troubleshooting
Problem
The QRadar® upgrade to version 7.4.2 requires you to run a migration script on the Console. This script migrates the High Availability file system from GlusterFS to Distributed Replicated Block Device (DRBD) on all Event Collectors in your deployment, whether or not they are currently part of an HA setup.
In rare scenarios, the script can fail on Event Collectors where the /store partition is missing from the partition table.
Symptom
When run, the migration script fails and displays these error messages:
On the Console:
Jul 15 10:14:23 [ERROR] Migration process did not start successfully for test_ec. Received a return code of 1.
On the affected Event Collector:
Jul 15 10:14:23 [WARNING] Could not locate store on LVM. Upgraded system detected
Jul 15 10:14:23 [ERROR] Failed to get store information on the deployment
Jul 15 10:14:23 [ERROR] Failed to get store information on the deployment
Cause
When QRadar builds a Managed Host, the partitioning scheme depends on the amount of available disk space. If the disk space is below the recommended value, QRadar does not create the /store partition; instead, it keeps all data destined for /store in the / directory.
The issue addressed in this article occurs when the GlusterFS migration script encounters an Event Collector with that unusual partition scheme.
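To anticipate this condition, you can compare a host's disk size against the 256 GB QRadar storage minimum before upgrading. A minimal sketch; check_store_capacity is a hypothetical helper, not a QRadar tool, and the disk name in the commented example is an assumption:

```shell
#!/bin/sh
# check_store_capacity: given a disk size in bytes, report whether it
# meets the 256 GB QRadar minimum below which /store may not be created.
check_store_capacity() {
    size_gb=$(( $1 / 1024 / 1024 / 1024 ))
    if [ "$size_gb" -lt 256 ]; then
        echo "BELOW minimum (${size_gb} GB < 256 GB): /store may be missing"
    else
        echo "OK (${size_gb} GB)"
    fi
}

# Against a live system (disk name /dev/sda is an assumption):
#   check_store_capacity "$(lsblk -b -d -n -o SIZE /dev/sda)"

# The 120 GB disk from the sample output in this article:
check_store_capacity $((120 * 1024 * 1024 * 1024))
# prints: BELOW minimum (120 GB < 256 GB): /store may be missing
```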
Environment
QRadar® Event Collectors upgrading to 7.4.2
Diagnosing The Problem
When the script fails on one or more Event Collectors, check these points:
- On the Console, you will see error messages like these in the /var/log/remove_glusterfs.log:
Jul 15 10:14:18 [WARNING] During migration to DRBD, event collection does not occur.
Jul 15 10:14:18 [WARNING] QRadar uses /store during the migration to DRBD. All other processes that use /store are terminated during migration.
Jul 15 10:14:18 [WARNING] The migration to DRBD restricts software updates to 7.4.2 or higher.
Jul 15 10:14:20 [INFO] Copying migration binary to test_ec
Jul 15 10:14:21 [INFO] Running migration precheck on: test_ec
Jul 15 10:14:23 [INFO] The following hosts require a migration from GlusterFS to DRBD: ['test_ec']
Jul 15 10:14:23 [INFO] Starting the migration process on: test_ec from console
Jul 15 10:14:23 [ERROR] Migration process did not start successfully for test_ec. Received a return code of 1.
Jul 15 10:14:23 [ERROR] Migration failed for the applicable host.
Note that the preceding snippet pertains to a single Event Collector named test_ec. Other Event Collectors might also be affected; follow the same steps for each of them.
- On the Event Collector named in the error messages from (1) (for example, test_ec), check the /var/log/remove_glusterfs.log file for these messages:
Jul 15 10:14:22 [INFO] Migration needed on the EC(s)
Jul 15 10:14:23 [INFO] Checking for drbd_metadata space
Jul 15 10:14:23 [INFO] Creating tmp drbd conf to verify the drbd metadata space
Jul 15 10:14:23 [WARNING] Could not locate store on LVM. Upgraded system detected
Jul 15 10:14:23 [ERROR] Failed to get store information on the deployment
- If the messages on the console and the Event Collector match the ones in (1) and (2) respectively, run these commands on the affected Event Collector:
df -h
lsblk
Check the output of both commands for a /store partition. In the sample outputs below, that partition is missing:
[root@test_ec ]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rootrhel-root 29G 5.8G 23G 21% /
devtmpfs 12G 0 12G 0% /dev
tmpfs 12G 8.0K 12G 1% /dev/shm
tmpfs 12G 73M 12G 1% /run
tmpfs 12G 0 12G 0% /sys/fs/cgroup
/dev/sda2 1014M 226M 789M 23% /boot
/dev/sda3 32G 4.1G 28G 13% /recovery
/dev/mapper/rootrhel-home 1014M 33M 982M 4% /home
/dev/mapper/rootrhel-tmp 3.0G 33M 3.0G 2% /tmp
/dev/mapper/rootrhel-opt 13G 3.1G 9.5G 25% /opt
/dev/mapper/rootrhel-storetmp 15G 42M 15G 1% /storetmp
/dev/mapper/rootrhel-var 5.0G 164M 4.9G 4% /var
/dev/mapper/rootrhel-varlog 15G 146M 15G 1% /var/log
/dev/mapper/rootrhel-varlogaudit 3.0G 47M 3.0G 2% /var/log/audit
tmpfs 2.4G 0 2.4G 0% /run/user/0
[root@test_ec ]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 120G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 1G 0 part /boot
├─sda3 8:3 0 32G 0 part /recovery
├─sda4 8:4 0 83G 0 part
│ ├─rootrhel-root 253:0 0 28.5G 0 lvm /
│ ├─rootrhel-storetmp 253:1 0 15G 0 lvm /storetmp
│ ├─rootrhel-tmp 253:2 0 3G 0 lvm /tmp
│ ├─rootrhel-home 253:3 0 1G 0 lvm /home
│ ├─rootrhel-opt 253:4 0 12.5G 0 lvm /opt
│ ├─rootrhel-varlogaudit 253:5 0 3G 0 lvm /var/log/audit
│ ├─rootrhel-varlog 253:6 0 15G 0 lvm /var/log
│ └─rootrhel-var 253:7 0 5G 0 lvm /var
└─sda5 8:5 0 4G 0 part [SWAP]
sr0 11:0 1 4.1G 0 rom
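The manual check above can also be scripted. A minimal sketch, assuming findmnt (from util-linux) is available on the Event Collector; has_store_partition is a hypothetical helper, not a QRadar tool:

```shell
#!/bin/sh
# has_store_partition: exits 0 when the given path is its own mount point.
# findmnt returns non-zero for paths that are not mounts, which
# distinguishes a real /store partition from data living under /.
has_store_partition() {
    findmnt "$1" >/dev/null 2>&1
}

if has_store_partition /store; then
    echo "/store is a separate partition; this article does not apply"
else
    echo "/store is not a separate partition; the migration script can fail"
fi
```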
Resolving The Problem
The only way to resolve this issue is to rebuild the Event Collector with an appropriate amount of storage. The minimum storage required is 256 GB. Refer to the QRadar 7.4 Installation Guide for detailed hardware prerequisites.
STEPS: Just before you upgrade the whole deployment, follow these steps for every Event Collector where the /store partition is missing:
- Remove the Event Collector from the deployment and run a Full Deploy.
- Assign at least 256 GB of storage to the affected Event Collector and rebuild it on the version that you are upgrading to.
- Once the rest of the deployment is upgraded to the target version, add the Event Collector back to the deployment.
NOTE:
- These steps recommend upgrading the Event Collector directly to the target version. As a result, the Event Collector can be added back only to a QRadar deployment that is on that target version. To minimize downtime for that Event Collector, run these steps just before the overall deployment is upgraded.
- Rebuilding an Event Collector discards data from the buffers on the Event Collector's storage. The loss is minimal in a QRadar environment that is functioning optimally.
The words LINSTOR®, DRBD®, LINBIT®, and the logo LINSTOR®, DRBD®, and LINBIT® are trademarks or registered trademarks of LINBIT in Austria, the United States, and other countries.
Document Location
Worldwide
Document Information
Modified date:
05 August 2021
UID
ibm16478081