IBM Support

QRadar: How to check Distributed Replicated Block Device status on an HA setup

How To


Summary

Distributed Replicated Block Device (DRBD) is an open source replicated storage system for the Linux platform. DRBD layers logical block devices over existing local block devices on participating cluster nodes. Writes to the primary node are transferred to the lower-level block device and simultaneously propagated to the secondary node.

When HA is enabled, the /store file system on the QRadar appliance peers (primary and secondary) is replicated using DRBD. Every write request to /store is replicated and written to the secondary peer node in real time; only then does the write request complete and control pass back to the Linux kernel. This is called synchronous replication. Disk replication runs over the management interface (normally eth0) unless the system is configured with a crossover connection that uses another LAN interface. When a node is detected to be out of sync, data is automatically synchronized from the other node.

Objective

The overall status of the DRBD setup can be found in the output of one of the following commands.
For QRadar 7.4 and for 7.5.0 up to UP7 (all interim fixes), run:
cat /proc/drbd
For QRadar 7.5.0 UP8 or later, run:
cat /sys/kernel/debug/drbd/resources/store/connections/<hostname>/0/proc_drbd
Replace <hostname> with your system's hostname, or use an asterisk (*) in place of <hostname>.
This article helps administrators interpret the fields in the output of these commands.
NOTE: For more information on the /proc filesystem, please check this page.
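
The choice between the two locations can also be scripted. The following is a minimal sketch, not an IBM-provided tool. It assumes that Python 3 is available on the host and that the script runs as root. The function name read_drbd_status and its root parameter are hypothetical and exist only for illustration. The sketch looks first for the debugfs file used by 7.5.0 UP8 or later, with a wildcard in place of <hostname>, and falls back to /proc/drbd.

```python
import glob
import os

# Both paths are taken from this article; the wildcard stands in for <hostname>.
DEBUGFS_PATTERN = "sys/kernel/debug/drbd/resources/store/connections/*/0/proc_drbd"
LEGACY_PATH = "proc/drbd"


def read_drbd_status(root="/"):
    """Return the DRBD status text, or None if neither location exists.

    `root` is a hypothetical parameter that lets the lookup be pointed at a
    copy of the files instead of the live system.
    """
    # 7.5.0 UP8 or later: proc_drbd file under debugfs.
    candidates = sorted(glob.glob(os.path.join(root, DEBUGFS_PATTERN)))
    # Earlier releases: the single /proc/drbd file.
    candidates.append(os.path.join(root, LEGACY_PATH))
    for path in candidates:
        if os.path.isfile(path):
            with open(path) as handle:
                return handle.read()
    return None
```

Calling read_drbd_status() with no arguments reads from the live system. A return value of None means that neither file was found, for example on a host where HA is not configured.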

Environment

QRadar managed hosts in a High Availability (HA) setup.
NOTE: In some versions of QRadar (7.3.x, 7.4.0, and 7.4.1), Event Collectors in an HA setup do not use DRBD as the data replication mechanism. Instead, they rely on GlusterFS to replicate data. This article does not apply to those Event Collector HA setups.

Steps

For QRadar 7.4 and for 7.5.0 up to UP7 (all interim fixes):
Typical content of the /proc/drbd file is shown here:
cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by vagrant@rhel7.localdomain, 2020-10-09 13:46:53
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:211359746 nr:2718712 dw:63591809 dr:864686747 al:472794 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:137219716
For QRadar 7.5.0 UP8 or later:
Typical content of the /sys/kernel/debug/drbd/resources/store/connections/<hostname>/0/proc_drbd file is shown here:
cat /sys/kernel/debug/drbd/resources/store/connections/*/0/proc_drbd
 0: cs:Established ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:169002070 nr:569514849 dw:733548027 dr:20516751 al:12473 bm:802359 lo:0 pe:[17;0] ua:0 ap:[17;0] ep:1 wo:1 oos:0
        act_log: used:7/1237 hits:495991 misses:251891 starving:0 locked:0 changed:51682
        blocked on activity log: 0/0/0
There are three important fields in the output:
cs: Connection State
ds: Disk State
ro: Resource Role
IMPORTANT:
As per the QRadar documentation on HA clusters, a cluster consists of a primary HA host and a secondary HA host. The status command can be run on either the primary or the secondary. In this article, the host where the command is run is referred to as the local node and the other host is referred to as the remote node.
The values for Resource Role and Disk State are in the format ValueA/ValueB, where ValueA is the value for the local node and ValueB is the value for the remote node.
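
The three fields can be extracted from the status line with a short script. The following is an illustrative sketch that assumes Python 3. The name parse_drbd_status and the dictionary keys it returns are hypothetical, not part of DRBD or QRadar. It splits the ro and ds values at the slash into their local and remote parts.

```python
import re

# Matches the per-device status line, for example:
#  0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
STATUS_LINE = re.compile(
    r"cs:(?P<cs>\S+)"
    r"\s+ro:(?P<ro_local>[^/\s]+)/(?P<ro_remote>\S+)"
    r"\s+ds:(?P<ds_local>[^/\s]+)/(?P<ds_remote>\S+)"
)


def parse_drbd_status(text):
    """Return a dict of cs, ro and ds values, or None if no status line is found."""
    match = STATUS_LINE.search(text)
    return match.groupdict() if match else None


sample = " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
fields = parse_drbd_status(sample)
print(fields["cs"])                              # WFConnection
print(fields["ro_local"], fields["ro_remote"])   # Primary Unknown
print(fields["ds_local"], fields["ds_remote"])   # UpToDate DUnknown
```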

Connection State

The Connection State field commonly has one of these values:
Value: Interpretation
Connected: A DRBD connection has been established and data mirroring is active. This is the normal state. In the sample output for 7.5.0 UP8 or later shown earlier, the connected state is reported as Established.
StandAlone: No remote node configuration is available. The remote node has not yet been connected, has been deliberately disconnected, or has dropped its connection due to failed authentication or a split-brain situation.
WFConnection: The local node is waiting until the remote node starts responding to communication requests from the local node.
SyncSource: Synchronization is currently running, with the local node being the source of synchronization.
SyncTarget: Synchronization is currently running, with the local node being the target of synchronization.
There could be issues with the HA setup that need further investigation under these circumstances:
  • If HA has been configured and the local node has Connection State set to StandAlone
  • If the remote node is operational but the Connection State is WFConnection
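
The two conditions above can be expressed as a simple check. This is a sketch under stated assumptions: it assumes Python 3 and that HA has been configured on the host. The function names and the remote_operational argument are hypothetical. Whether the remote node is operational is something the administrator determines separately and passes in.

```python
import re


def connection_state(text):
    """Return the value of the cs field from DRBD status text, or None."""
    match = re.search(r"\bcs:(\S+)", text)
    return match.group(1) if match else None


def connection_needs_investigation(cs, remote_operational):
    """Apply the two conditions listed in this article to a cs value."""
    # Compare in lower case so that the spelling of StandAlone does not matter.
    state = cs.lower()
    if state == "standalone":
        return True
    if state == "wfconnection" and remote_operational:
        return True
    return False


sample = " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
cs = connection_state(sample)
print(cs, connection_needs_investigation(cs, remote_operational=True))
```

A result of False does not prove that the HA setup is healthy. It only means that neither of the two listed conditions applies.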

Disk State

The Disk State field can have any one of these values:
Value: Interpretation
Diskless: DRBD is facing issues accessing the disk.
Failed: Lower-level I/O issues occurred when accessing the disk.
Negotiating: The disk is being connected to DRBD.
Inconsistent: The data on the disk is inconsistent and not usable. This status is usually found on the secondary HA host when it is added and the initial full synchronization of data from the primary HA host is in progress.
Outdated: The data on the disk is consistent (and usable) but outdated when compared to the other node. This status can be seen on the secondary HA host when it briefly goes offline and comes back online.
DUnknown: This state is used for the remote disk if no network connection is available.
Consistent: Consistency of data on a node without a connection. When the connection is established, it is decided whether the data on the node is UpToDate or Outdated.
UpToDate: Consistent, up-to-date state of the data. This is the normal state.
For example, suppose that /proc/drbd is displayed on the primary HA host and the output looks like this:
cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by vagrant@rhel7.localdomain, 2020-10-09 13:46:53
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:211359746 nr:2718712 dw:63591809 dr:864686747 al:472794 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:137219716
The Disk State is:
ds:UpToDate/DUnknown
In this case, the local node is the primary HA host and the remote node is the secondary HA host. The disk on the primary HA host has the Disk State UpToDate. The Disk State of the secondary HA host is shown as DUnknown because the Connection State is WFConnection: the primary has no network connection to the secondary and cannot determine the state of its disk.
If a node has Disk State set to Failed or Diskless, there could be issues with the HA setup that need further investigation.
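
The disk state check can be sketched in the same way. The example below assumes Python 3. The names disk_states and disks_needing_investigation are hypothetical. It reports which node, local or remote, shows Failed or Diskless.

```python
import re

# The two disk states that this article flags for further investigation.
PROBLEM_DISK_STATES = {"Failed", "Diskless"}


def disk_states(text):
    """Return (local, remote) disk states from DRBD status text, or None."""
    match = re.search(r"\bds:([^/\s]+)/(\S+)", text)
    return match.groups() if match else None


def disks_needing_investigation(text):
    """Return a list of (node, state) pairs whose state is Failed or Diskless."""
    states = disk_states(text)
    if states is None:
        return []
    return [
        (node, state)
        for node, state in zip(("local", "remote"), states)
        if state in PROBLEM_DISK_STATES
    ]


sample = " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
print(disk_states(sample))                  # ('UpToDate', 'DUnknown')
print(disks_needing_investigation(sample))  # []
```

An empty list only means that neither node reports Failed or Diskless. A remote state of DUnknown still indicates that the peer cannot be reached.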

Resource Role

The Resource Role field can have any one of these values:
Value: Interpretation
Primary: The resource is currently in the primary role, and may be read from and written to. This role occurs on only one of the two nodes.
Secondary: The resource is currently in the secondary role. It normally receives updates from its peer (unless running in disconnected mode), but may neither be read from nor written to.
Unknown: The resource’s role is currently unknown. The local resource role never has this status. It is only displayed for the peer’s resource role, and only in disconnected mode.
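
The statements in the Resource Role table can be turned into a few sanity checks. The following sketch assumes Python 3. The function names and the warning texts are hypothetical and are derived only from the descriptions above. They are not an official diagnostic.

```python
import re


def resource_roles(text):
    """Return (local, remote) resource roles from DRBD status text, or None."""
    match = re.search(r"\bro:([^/\s]+)/(\S+)", text)
    return match.groups() if match else None


def role_warnings(local, remote):
    """Return a list of observations based on the Resource Role descriptions."""
    warnings = []
    # The Primary role is expected on only one of the two nodes.
    if local == "Primary" and remote == "Primary":
        warnings.append("both nodes report the Primary role")
    # The local resource role is never expected to be Unknown.
    if local == "Unknown":
        warnings.append("local role is Unknown, which is not expected")
    # Unknown for the peer is only displayed in disconnected mode.
    if remote == "Unknown":
        warnings.append("peer role is Unknown, so the nodes are disconnected")
    return warnings


sample = " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
local_role, remote_role = resource_roles(sample)
print(role_warnings(local_role, remote_role))
```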

The words LINSTOR®, DRBD®, LINBIT®, and the logo LINSTOR®, DRBD®, and LINBIT® are trademarks or registered trademarks of LINBIT in Austria, the United States and other countries.

Document Location

Worldwide


Document Information

Modified date:
19 July 2024

UID

ibm16420029