IBM Support

QRadar: Validate that the configuration database is synchronized with replicationVerify.pl

Troubleshooting


Problem

You can use the replicationVerify.pl script to validate that the QRadar configuration database is synchronized across the environment. This tool verifies that the replication process is working and that the databases are the same on all managed hosts.

Cause

During incremental replication, changes are replicated from the Console to the managed hosts every minute, while a full replication happens every 2 hours. Because data can accumulate quickly on all managed hosts, administrators commonly find that tables are not fully replicated even after a full deploy.

Diagnosing The Problem

To check whether a managed host has replication issues, search the qradar.error log on that host for the message "Database is out-of-sync with the console. We will attempt to begin with a full dump next interval".
  1. SSH into the QRadar console
  2. SSH into the affected managed host
  3. Run the following command to grep the error log for database-related errors.
    grep "Database is out-of-sync" /var/log/qradar.error

    Result
    If the command returns any matches, this is a strong indication of replication errors. Running replicationVerify.pl can help you diagnose the issue.
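
The grep check above can be wrapped in two small helper functions, shown here as a minimal sketch. The function names are hypothetical and are not part of QRadar. The log path defaults to the /var/log/qradar.error file used in the command above.

```shell
# Hypothetical helpers (not part of QRadar) around the grep check above.

# Print how many out-of-sync messages the log contains (0 if none).
out_of_sync_count() {
    log_file="${1:-/var/log/qradar.error}"
    # grep -c exits non-zero when there are no matches, so ignore its status
    grep -c "Database is out-of-sync" "$log_file" || true
}

# Print the most recent out-of-sync message, if any.
last_out_of_sync() {
    log_file="${1:-/var/log/qradar.error}"
    grep "Database is out-of-sync" "$log_file" | tail -n 1
}
```

Running out_of_sync_count on the managed host gives a quick indication of how often the problem has occurred, and last_out_of_sync shows whether it is still happening.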

Resolving The Problem

The replicationVerify.pl script displays a list of tests and their results, which gives a general idea of the state of replication in the deployment. For more detail, run the script with the details flag "-d" or the debug option "-d -d". If the script returns errors, see the Understanding the detailed output section for steps to address them.
  1. SSH into the QRadar console.
  2. Run the following command:
    /opt/qradar/support/replicationVerify.pl 

    Result
    The following is an example of a system with no errors:
    connecting to console DB
    Collecting list of managedhosts
    Gathering Console's table definitions for replicated tables
    Gathering Console's replication stored procedures
    Gathering Table Sizes of replicated tables on console
    checking console for Bloat [OK]
    comparing MH to console's replication setup
    x.x.x.x tests:
        comparing schema [OK]
        comparing counts [OK]
        comparing output of 'hostname -i' [OK]
        comparing Stored Procedures [OK]
        comparing table sizes [OK]
        checking for bloat [OK]
    The following is an example of a system with replication issues:
    connecting to console DB
    Collecting list of managedhosts
    Gathering Console's table definitions for replicated tables
    Gathering Console's replication stored procedures
    Gathering Table Sizes of replicated tables on console
    checking console for Bloat [OK]
    comparing MH to console's replication setup
    x.x.x.x tests:
        comparing schema [ERROR] 1 tables with different column config between console and MH
        comparing counts [WARN] 1 tables with different counts between Console and MH
        comparing output of 'hostname -i' [ERROR] hostname not proper in /etc/hosts for 192.168.12.41
        comparing Stored Procedures [ERROR] 1 differences in stored procedures
        comparing table sizes [WARN] 6 tables where the sizes are different
        checking for bloat [WARN] 1 tables with potential bloated
    1 of 1 managed hosts had at least one problem with replication
    rerun the script with the -d option for more details on the problems
    use the --ip option to target the host(s) that had problems
    Note: If nothing is returned, try using the option "-d -d". If you see the warning "[WARN] No managed hosts. No need to test replication", your system might not have any managed hosts set up.
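
In a deployment with many managed hosts, the [OK] lines can hide the few results that need attention. The following is a minimal sketch of a filter that keeps only the [WARN] and [ERROR] lines. The function name show_problems is hypothetical, and the sketch assumes that the script writes its results to standard output.

```shell
# Hypothetical helper (not part of QRadar): keep only result lines that
# report a warning or an error. Reads the script output on standard input.
show_problems() {
    # grep exits non-zero when nothing matches; treat "no problems" as success
    grep -E '\[(WARN|ERROR)\]' || true
}

# Example usage on the Console (assumes results go to standard output):
#   /opt/qradar/support/replicationVerify.pl | show_problems
#   /opt/qradar/support/replicationVerify.pl -d --ip "x.x.x.x" | show_problems
```

The filter drops the per-host header line (x.x.x.x tests:), so it is most useful when you target one host with the --ip option.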

Understanding the detailed output

1 Connecting to console DB: Connecting to the console's database.
2 Collecting list of managedhosts: Getting the list of the managed hosts to test against.
3 Gathering Console's table definitions for replicated tables: Collecting the list of tables involved in the replication.
4 Gathering Console's replication stored procedures: Collecting the stored procedures on the console.
5 Gathering Table Sizes of replicated tables on console: Collecting the table sizes for the console.
6 Checking the console for Bloat [OK]: Checking the console for bloat.
7 Comparing MH to console's replication setup: Most of the work starts here.
8 <IP address> tests: The host being tested.
9 Comparing schema
[ERROR] columns for public.managedhost  do not match between MH and Console
[ERROR] console columns:
public managedhost
id NO bigint
ip NO character varying
hostname YES character varying
status NO character varying
isconsole NO boolean
appliancetype YES character varying
creationdate YES timestamp without time zone
updatedate YES timestamp without time zone
qradar_version NO character varying
primary_host YES bigint
secondary_host YES bigint
haoptions YES character varying
[ERROR] mh columns:
public managedhost
id NO integer
ip NO character varying
hostname YES character varying
status NO character varying
isconsole NO boolean
appliancetype YES character varying
creationdate YES timestamp without time zone
updatedate YES timestamp without time zone
qradar_version NO character varying
primary_host YES bigint
secondary_host YES bigint
haoptions YES character varying
[ERROR] 1 table with different column config between console and MH
The comparing schema test reports a problem with the public.managedhost table and then prints the column summary for both the console and the managed host. You must compare the two listings line by line to find the difference. In this case, the id column is a bigint on the console and an integer on the managed host.
This is likely caused either by a patch that failed on one of the systems or by a system that was not patched. Verify that all systems are at the same patch level.

For information on verifying that systems in your deployment are properly patched to the same version,
If any systems in your deployment are not at the same QRadar version, rerun the patch on those managed hosts.
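
The line-by-line comparison of the two column listings can be left to diff. The following is a minimal sketch. It assumes that you copied the lines under "console columns:" and "mh columns:" from the detailed output into two text files. The function name and file names are hypothetical.

```shell
# Hypothetical helper (not part of QRadar): print only the column
# definitions that differ between two saved listings.
# Lines starting with "<" come from the console file,
# lines starting with ">" come from the managed host file.
diff_columns() {
    console_file="$1"
    mh_file="$2"
    # diff exits non-zero when the files differ, and grep exits non-zero when
    # no line matches; neither case is an error here
    diff "$console_file" "$mh_file" | grep '^[<>]' || true
}

# Example usage (file names are placeholders):
#   diff_columns console_columns.txt mh_columns.txt
```

For the listings shown above, the only differing pair is the id column, which is bigint on the console and integer on the managed host.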
10 Comparing counts
[ERROR] asset.asset property Count is different console=60000 mh=60303
[WARN] 1 table with different counts between Console and MH
The comparing counts test runs a select count(*) on each replicated table on both the console and the managed host, and displays the table name and the counts for both when they differ.
A difference can occur when a recent Console update has not yet been pushed to the managed hosts. The update is sent in the next replication bundle.

To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration.

Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy.
11 [ERROR] Managed hosts state did not sync in the with the console’s TX - can not test table counts since MH never synched with console’s transaction number
This error message means that the console and the managed host were on different transactions. The script waits up to 60 seconds for them to synchronize. If they do not synchronize within 60 seconds, the script times out and moves on, because it cannot run the count comparison when the transaction IDs are not the same.
Rerun the script for only the affected host by using --ip <IP address> to see whether it can synchronize the transactions. If it still cannot get in sync, the host might be too far behind to catch up.

To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration.

Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy.
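
The wait-then-time-out behavior described for this test follows a common polling pattern. The following is a minimal sketch of that pattern only. It is not the script's actual implementation, and the wait_until function and the tx_in_sync command in the usage comment are hypothetical.

```shell
# Hypothetical helper illustrating a poll-with-timeout loop.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, or 1 once the timeout elapses.
wait_until() {
    timeout_s="$1"
    interval_s="$2"
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout_s" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    return 1
}

# Example: poll a (hypothetical) sync check every 5 seconds for up to 60 seconds.
#   wait_until 60 5 tx_in_sync || echo "timed out; skipping count comparison"
```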
12 Comparing the output of 'hostname -i'
[ERROR] 'hostname -i' doesn't return the proper value. returned 192.168.12.41 192.168.12.41, expecting 192.168.12.41
[ERROR] hostname not proper in /etc/hosts for 192.168.12.41
The comparing output of 'hostname -i' test indicates that something is wrong with the /etc/hosts file on the managed host. Clean up this file to resolve the error.
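
In the example above, 'hostname -i' returns the same address twice. One plausible cause, stated here as an assumption, is that /etc/hosts contains more than one entry for that address. The following is a minimal sketch that counts such entries. The function name is hypothetical.

```shell
# Hypothetical helper (not part of QRadar): count the lines in a hosts file
# whose first field is the given IP address. Comment lines are not counted
# because their first field starts with "#".
count_hosts_entries() {
    ip="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v ip="$ip" '$1 == ip { n++ } END { print n + 0 }' "$hosts_file"
}

# Example usage on the managed host:
#   count_hosts_entries 192.168.12.41
#   hostname -i
```

A count greater than 1 suggests duplicate entries to review. After you edit the file, 'hostname -i' should return the single expected address.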
13 Comparing stored procedures
[ERROR] Stored Procedure replicate_fake_proc, p_relname,p_schemaname,p_threshold,p_build_triggers, 25 25 1186 16 are different between console and mh
[ERROR] 1 difference in stored procedures
The comparing stored procedures test checks whether the stored procedures that are used for replication on the console are the same as the ones on the managed host. If these procedures differ, replication can stop, because the console might format the data in one way while the managed host expects it in another.

To force the deployment to replicate, go to the Admin tab and click Advanced > Deploy Full Configuration.

Note: A Deploy Full Configuration restarts services and might cause an interruption in collecting events. Schedule a maintenance period before you run a Full Deploy.
 
14 Comparing table sizes
[WARN] asset.asset Size is different (Console=4.00 MB| MH=11.01 MB) percent Error = 175.20%
[WARN] asset.assetproperty Size is different (Console=18.26 MB| MH=39.53 MB) percent Error = 116.52%
[WARN] asset.vulninstancestatistics Size is different (Console=81.39 MB| MH=164.74 MB) percent Error = 102.40%
[WARN] public.dsmevent Size is different (Console=51.99 MB| MH=104.00 MB) percent Error = 100.03%
[WARN] public.vuln Size is different (Console=26.09 MB| MH=52.22 MB) percent Error = 100.15%
[WARN] q_catalog.productversionvariant Size is different (Console=4.90 MB| MH=9.82 MB) percent Error = 100.16%
[WARN] 6 tables where the sizes are different
The comparing table sizes test looks at the values in q_table_size to see whether the sizes on the console and the managed host are close. It does a percent error calculation to determine whether the difference is too great. By default, the script alerts on any difference over 100%.
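
The percent error values in the sample output are consistent with the usual formula |MH - Console| / Console x 100. The following is a minimal sketch of that calculation. The formula is an assumption inferred from the sample figures rather than taken from the script, and the function names are hypothetical. The 100% default comes from the --pctErr option.

```shell
# Hypothetical helpers (not part of QRadar). Assumed formula:
#   percent error = |MH - Console| / Console * 100
# The console size must be non-zero.

# Print the percent error with two decimal places.
percent_error() {
    awk -v c="$1" -v m="$2" 'BEGIN {
        d = m - c
        if (d < 0) d = -d
        printf "%.2f\n", d / c * 100
    }'
}

# Succeed (exit 0) when the percent error is over the threshold (default 100).
exceeds_pct_err() {
    awk -v c="$1" -v m="$2" -v t="${3:-100}" 'BEGIN {
        d = m - c
        if (d < 0) d = -d
        exit !(d / c * 100 > t)
    }'
}

# Example usage with sizes in MB:
#   percent_error 4.00 11.01
#   exceeds_pct_err 4.00 11.01 && echo "size difference over threshold"
```

For the asset.asset line, the displayed sizes give about 175.25%, slightly different from the 175.20% in the output, presumably because the script works from unrounded sizes.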
15 Checking for bloat
[WARN] asset needs autovacuum (last autovacuum: 2017-06-08 08:36:46.452545-04)
[WARN] 1 table with potential bloated
The bloat test determines whether autovacuum is failing to run on certain tables. It first tests whether the table is bloated and, if it is, checks when the last autovacuum ran. If the last autovacuum ran more than 600 seconds ago, the test alerts.
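
The age check in the bloat test amounts to comparing two timestamps against a threshold. The following is a minimal sketch of that comparison using epoch seconds. The function name is hypothetical, and the 600-second default matches the --vacuumTime option.

```shell
# Hypothetical helper (not part of QRadar): succeed (exit 0) when the last
# autovacuum ran more than <threshold> seconds before <now>.
# Usage: autovacuum_overdue <last_epoch> <now_epoch> [threshold_seconds]
autovacuum_overdue() {
    last_epoch="$1"
    now_epoch="$2"
    threshold_s="${3:-600}"
    [ $((now_epoch - last_epoch)) -gt "$threshold_s" ]
}

# Example usage (assumes GNU date for the -d option):
#   last=$(date -d '2017-06-08 08:36:46-04' +%s)
#   autovacuum_overdue "$last" "$(date +%s)" && echo "asset needs autovacuum"
```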
16 1 of 1 managed hosts had at least one problem with replication: Summary of all the tests.
Script Options
/opt/qradar/support/replicationVerify.pl
---------------
Usage:
        TEST OPTIONS:
        -a | --all              Run all tests (default, if no options are passed).
        -b | --bloat            Check all replicated tables to see if the last autovacuum was too long ago.
        -c | --count            Compare table counts between console and managed hosts.
        -n | --hostname         Check for valid IP address from "hostname -i" test.
        -p | --proc             Comparison of the replication stored procedures between console and managed hosts.
        -s | --schema           Comparison of the schema between console and managed hosts.
        -z | --size             Compare table sizes between console and managed hosts.
        EXTRA OPTIONS:
             --ip "<list>"      Quoted and comma separated list of IP addresses (i.e. "10.0.0.1,192.168.10,172.16.3.4").
             --pctErr <#>       Percent Error.  Used in conjunction with size test. (default = 100%)
             --vacuumTime <#>   Time in seconds.  Used in conjunction with bloat test.  (default = 600 seconds)
        -d | --details          Provides more details. Can specify -d multiple times for more information. 3 levels (details, debug, devel)
        -h | --help             Displays this dialog.
        More details available for each test if you pass the test flag with the help flag (i.e. -h -b or -h -a)
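
The options above can be combined to rerun a single test against specific hosts. The following is a minimal sketch. The join_ips function is hypothetical and only builds the comma-separated list that the --ip option expects. The IP addresses are placeholders.

```shell
# Path to the script, as given in this document.
VERIFY=/opt/qradar/support/replicationVerify.pl

# Hypothetical helper: join its arguments with commas for the --ip option.
join_ips() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Example usage on the Console (addresses are placeholders):
#   Size test only, with a 150% threshold, against two hosts:
#     "$VERIFY" -z --pctErr 150 --ip "$(join_ips 10.0.0.1 172.16.3.4)"
#   Bloat test only, with details, allowing 1800 seconds since the last autovacuum:
#     "$VERIFY" -b --vacuumTime 1800 -d --ip "$(join_ips 10.0.0.1)"
```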

Document Location

Worldwide


Document Information

Modified date:
10 July 2023

UID

ibm11086555