Troubleshooting
Problem
Resolving The Problem
cat /opt/qradar/conf/capabilities/hostcapabilities.xml
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<HostCapabilities
isConsole="true"
IP="10.1xx.xx.xxx"
applianceType="3178"
hostName="abc"
qradarVersion="7.5.0"
hardwareSerial="d1234-c3456-a678"
activationKey="XXXXX-XXXXX"
managementInterface="eth0"
xmlns="http://www.q1labs.com/products/qradar"
/>
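The key fields in this file can also be extracted non-interactively. A minimal sketch using GNU grep, assuming the file path shown above (the field list chosen here is illustrative):

```shell
# Sketch: pull selected attributes out of hostcapabilities.xml.
# Path is from this article; requires GNU grep (-P) as shipped with RHEL.
CAPS=/opt/qradar/conf/capabilities/hostcapabilities.xml

for field in applianceType qradarVersion isConsole; do
    # \K discards the matched prefix, leaving only the attribute value
    printf '%s=%s\n' "$field" "$(grep -oP "${field}=\"\K[^\"]+" "$CAPS")"
done
```

This is handy when collecting appliance details from several hosts at once.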
CPU
lscpu
Memory
watch -n 7 free -m
top
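The memory figures from free -m can be reduced to a single health line; a sketch, assuming the procps-ng column layout shipped with RHEL 7 and later (the 10% threshold is illustrative):

```shell
# Sketch: flag the host when available memory drops below 10% of total.
# In `free -m` output, column 2 is total and column 7 is "available".
free -m | awk '/^Mem:/ {
    pct = $7 / $2 * 100
    printf "available: %d MB of %d MB (%.0f%%)\n", $7, $2, pct
    if (pct < 10) print "WARNING: less than 10% of memory available"
}'
```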
Disk Usage
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 708K 63G 1% /dev/shm
tmpfs 63G 4.1G 59G 7% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/xvda2 20G 12G 7.0G 63% /
/dev/xvda1 2.0G 236M 1.6G 13% /boot
/dev/mapper/rootrhel-tmp 3.0G 53M 3.0G 2% /tmp
/dev/mapper/rootrhel-opt 10G 4.2G 5.9G 42% /opt
/dev/mapper/rootrhel-home 1014M 33M 982M 4% /home
/dev/mapper/rootrhel-var 5.0G 2.5G 2.6G 50% /var
/dev/mapper/rootrhel-varlog 15G 7.4G 7.7G 49% /var/log
/dev/mapper/rootrhel-varlogaudit 3.0G 429M 2.6G 14% /var/log/audit
/dev/mapper/conf 10G 1.2G 8.9G 12% /opt/qradar/conf
/dev/mapper/storetmp 15G 634M 15G 5% /storetmp
/dev/mapper/store 7.9T 658G 7.2T 9% /store
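To watch for filesystems approaching capacity without reading the whole df table, the output can be filtered; a sketch (the 90% threshold is illustrative):

```shell
# Sketch: report any mounted filesystem above 90% use.
# -P forces POSIX one-line-per-filesystem output so awk columns line up.
df -h -P | awk 'NR > 1 {
    use = $5; sub(/%/, "", use)
    if (use + 0 > 90) printf "WARNING: %s at %s%% (%s)\n", $6, use, $1
}'
```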
Command:
iostat -dmx sda
iostat -dmx sdb
Sample output:
[root@qradar ~]# iostat -dmx sda
Linux 3.xx.0-xx.xx.1.exx.x86_64 (qradar.csdd.lx) 01/09/2023 _x86_64_ (20 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 70.02 2.13 793.60 16.71 121.59 1.10 310.09 1.89 2.33 2.35 1.73 0.47 37.94
[root@qradar ~]# iostat -dmx sdb
Linux 3.xx.0-xx.xx.1.exx.x86_64 (qradar.csdd.lx) 01/09/2023 _x86_64_ (20 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 335.46 3.30 3411.21 45.34 653.43 2.11 388.41 0.48 0.14 0.12 1.57 0.23 78.06
Use the r_await and await metrics from the preceding output to monitor current disk reads and writes. Values consistently higher than 15 ms might indicate an issue. If r_await is consistently high, multiple processes are waiting on disk reads.
More information about working with iostat in RHEL can be found here: https://www.redhat.com/sysadmin/io-reporting-linux
- avgqu-sz - average queue length of a request issued to the device
- await - average time for I/O requests issued to the device to be served (milliseconds)
- r_await - average time for read requests to be served (milliseconds)
- w_await - average time for write requests to be served (milliseconds)
If avgrq-sz is greater than 200, or avgqu-sz is greater than 20-30, disk performance might be degraded.
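The thresholds above can be checked automatically; a sketch that flags devices whose await or r_await exceeds 15 ms, assuming the sysstat column layout shown in the sample output (newer sysstat versions reorder columns):

```shell
# Sketch: warn on devices with await or r_await above 15 ms.
# Columns in this sysstat version's -dmx output: $10=await, $11=r_await.
iostat -dmx sda sdb | awk '
    $1 ~ /^(sd|dm-|xvd)/ {
        if ($10 + 0 > 15 || $11 + 0 > 15)
            printf "WARNING: %s await=%s r_await=%s\n", $1, $10, $11
    }'
```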
Next, use iotop -aoP to identify which QRadar services are generating the most disk reads.
Command:
iotop -aoP
Sample output:
Total DISK READ : 476.90 M/s | Total DISK WRITE : 7.35 M/s
Actual DISK READ: 477.48 M/s | Actual DISK WRITE: 579.87 K/s
PID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
52249 be/7 root 83.57 G 42.30 M 0.00 % 0.84 % java -Dapplication.name=ariel_proxy -Dapp_id=ariel_proxy_server -Dj~tutil-8.3.0.jar:/opt/qradar/jars/fontbox-2.0.4.jar:/opt/qradar/jars/
58549 be/4 postgres 0.00 B 524.00 K 0.00 % 0.04 % postgres: qradar qradar 127.0.0.1(44058) idle
125 be/4 root 16.00 K 0.00 B 0.00 % 0.03 % [kswapd0]
130888 be/4 postgres 0.00 B 1120.00 K 0.00 % 0.03 % postgres: walwriter
24370 be/4 postgres 0.00 B 688.00 K 0.00 % 0.02 % postgres: qradar qradar 127.0.0.1(43475) SELECT saction
58541 be/4 postgres 4.00 K 536.00 K 0.00 % 0.02 % postgres: qradar qradar 127.0.0.1(44047) idle in transaction
54296 be/4 postgres 0.00 B 476.00 K 0.00 % 0.02 % postgres: qradar qradar 127.0.0.1(43764) idle
2869 be/0 root 0.00 B 552.00 K 0.00 % 0.01 % [loop0]
58553 be/4 postgres 16.00 K 108.00 K 0.00 % 0.00 % postgres: qradar qradar 127.0.0.1(44060) idle
57107 be/4 postgres 0.00 B 104.00 K 0.00 % 0.00 % postgres: qradar qradar 127.0.0.1(43992) idle
1155 be/4 root 1504.00 K 0.00 B 0.00 % 0.00 % [xfsaild/dm-9]
105851 be/4 root 2.74 M 0.00 B 0.00 % 0.05 % defect-inspector -fingerprint /opt/qradar/support/data/inspector/ /var/log/qradar.log
58586 be/4 postgres 16.00 K 40.00 K 0.00 % 0.00 % postgres: qradar qradar 127.0.0.1(44081) idle
81830 be/4 root 0.00 B 0.00 B 0.00 % 0.00 % [kworker/15:3]
19857 be/4 nobody 0.00 B 8.00 K 0.00 % 0.00 % python3.6 /usr/bin/celery beat -A app.celery_worker.beat-config --l~vel=INFO --schedule /tmp/celerybeat.db --pidfile=/tmp/celerybeat.pid
1266 be/3 root 0.00 B 116.00 K 0.00 % 0.00 % auditd
58587 be/4 postgres 0.00 B 32.00 K 0.00 % 0.00 % postgres: qradar qradar 127.0.0.1(44080) idle
35651 be/4 root 0.00 B 0.00 B 0.00 % 0.00 % [kworker/15:1]
49933 be/4 root 50.49 M 76.62 M 0.00 % 0.00 % java -Dapplication.name=accumulator -Dapp_id=accumulator -Djava.lib~.3.0.jar:/opt/qradar/jars/fontbox-2.0.4.jar:/opt/qradar/jars/fop-2.2
50112 ?dif root 3.32 M 49.48 M 0.00 % 0.00 % java -Dapplication.name=ecs-ep -Dapp_id=ecs-ep -Djava.library.path=~/ibm/si/services/ecs-ep/current/eventgnosis ecs-ep.ecs 220 noconsole
9826 be/4 root 168.00 K 16.00 K 0.00 % 0.00 % conman-server --scheme=https --tls-host=:: --tls-port=9000 --tls-ce~nman.key --tls-ca=/etc/conman/tls/conman_ca.crt --write-timeout=900s
50948 be/4 root 648.00 K 0.00 B 0.00 % 0.00 % java -Dapplication.name=arc_builder -Dapp_id=arc_builder -Djava.lib~.3.0.jar:/opt/qradar/jars/fontbox-2.0.4.jar:/opt/qradar/jars/fop-2.2
5406 be/4 root 96.00 K 0.00 B 0.00 % 0.00 % conwrap -healthCheckPrefix=HEALTH_CHECK_ -portPrefix=PORT -volumePrefix=VOL -envPrefix=ENV -secretPrefix=SECRET
50779 ?dif root 24.00 K 112.00 K 0.00 % 0.00 % java -Dapplication.name=ecs-ec-ingress -Dapp_id=ecs-ec-ingress -Dja~/ecs-ec-ingress/current/eventgnosis ecs-ec-ingress.ecs 220 noconsole
70311 be/4 postgres 0.00 B 8.00 K 0.00 % 0.00 % postgres: qradar qradar 127.0.0.1(46078) idle
73954 be/4 nobody 0.00 B 4.00 K 0.00 % 0.00 % httpd -DFOREGROUND
53666 be/4 root 0.00 B 12.00 K 0.00 % 0.00 % qflow -p -r60 -c /opt/qradar/conf/nva.qflow.qflow0.conf -nens224 -t0 -w56 -ndefault_Netflow -t3 -w55
98740 be/4 nobody 0.00 B 4.00 K 0.00 % 0.00 % httpd -DFOREGROUND
104980 be/4 root 0.00 B 4.00 K 0.00 % 0.00 % bash --login /opt/qradar/perf/systemStabMon -interval 23
105009 be/4 root 0.00 B 4.00 K 0.00 % 0.00 % bash --login /opt/qradar/perf/systemStabMon -interval 23
103036 be/4 nobody 0.00 B 4.00 K 0.00 % 0.00 % httpd -DFOREGROUND
70292 be/4 postgres 0.00 B 4.00 K 0.00 % 0.00 % postgres: qradar qradar 127.0.0.1(46064) idle
51869 ?dif root 0.00 B 112.00 K 0.00 % 0.00 % java -Dapplication.name=ecs-ec -Dapp_id=ecs-ec -Djava.library.path=~/ibm/si/services/ecs-ec/current/eventgnosis ecs-ec.ecs 220 noconsole
2724 be/4 root 0.00 B 16.00 K 0.00 % 0.00 % bash --login /opt/qradar/perf/systemStabMon -interval 23
70313 be/4 postgres 0.00 B 8.00 K 0.00 % 0.00 % postgres: qradar qradar 127.0.0.1(46080) idle
98986 be/4 nobody 0.00 B 4.00 K 0.00 % 0.00 % httpd -DFOREGROUND
84662 be/4 root 140.00 K 0.00 B 0.00 % 0.00 % [kworker/u256:2]
17116 be/4 nobody 0.00 B 20.00 K 0.00 % 0.00 % coreutils --coreutils-prog-shebang=tee /usr/bin/tee -a /opt/app-root/store/log/startup.log
101093 be/4 nobody 0.00 B 4.00 K 0.00 % 0.00 % httpd -DFOREGROUND
101135 be/4 nobody 0.00 B 4.00 K 0.00 % 0.00 % httpd -DFOREGROUND
115506 be/4 nobody 0.00 B 4.00 K 0.00 % 0.00 % httpd -DFOREGROUND
88912 be/4 nobody 0.00 B 8.00 K 0.00 % 0.00 % httpd -DFOREGROUND
3001 be/4 root 0.00 B 20.00 K 0.00 % 0.00 % bash /opt/qradar/perf/runningAvgDStat.sh 20 /var/log/systemStabMon /tmp/runningAvgDStat.tmp
54222 be/4 postgres 0.00 B 4.00 K 0.00 % 0.00 % postgres: qradar qradar 127.0.0.1(43750) idle
17372 be/4 nobody 0.00 B 8.00 K 0.00 % 0.00 % nginx: worker process
86711 be/4 nobody 0.00 B 8.00 K 0.00 % 0.00 % httpd -DFOREGROUND
72931 be/4 nobody 0.00 B 8.00 K 0.00 % 0.00 % httpd -DFOREGROUND
81128 be/4 nobody 0.00 B 8.00 K 0.00 % 0.00 % httpd -DFOREGROUND
93457 be/4 nobody 0.00 B 8.00 K 0.00 % 0.00 % httpd -DFOREGROUND
The preceding sample output shows a QRadar host that is read bound and might have degraded performance.
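For unattended collection, iotop can also run in batch mode and the rows with accumulated reads can be kept; a sketch (the iteration count and delay are illustrative):

```shell
# Sketch: 3 batch iterations (-b -n 3), 5 s apart, accumulated (-a),
# active processes only (-o), per process (-P). Print PID, user,
# accumulated reads, and the last token of the command line for rows
# that accumulated any reads.
iotop -aoPb -n 3 -d 5 | awk '
    $1 ~ /^[0-9]+$/ && $4 + 0 > 0 {
        print $1, $3, $4 $5, $NF
    }'
```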
For further assistance troubleshooting specific services, contact QRadar Support.
Ariel_proxy_server
To troubleshoot ariel_proxy_server performance, check which searches are currently running, how long they take to complete, and how much data they return.
Troubleshooting Ariel in the UI:
1. Go to the "Log Activity" tab, click the "Search" drop-down, and then click "Manage Search Results":
2. Order by duration to see whether any searches have been running a long time. Order by size to see whether there are any searches that are returning a large amount of data.
3. When the largest and longest running searches have been established, this technote can be referenced for pointers on searching efficiently.
4. If no large or long-running searches are identified, run a new search. While the search is running, click the "More details" link under "Current Statistics":
The output shows the current progress of the search on a per-host basis.
If a particular host is taking longer to complete than others, proceed to the next set of steps.
Troubleshooting Ariel from the Command Line Interface:
Run the following command on the console to check the Ariel queues:
/opt/qradar/support/jmx.sh -p 7782 -b 'com.q1labs.ariel:application=ariel_proxy.ariel_proxy_server,type=Query server,a1=Queries*'
Run the following command on a Managed Host to check the Ariel queues:
/opt/qradar/support/jmx.sh -p 7782 -b 'com.q1labs.ariel:application=ariel.ariel_query_server,type=Query server,a1=Queries*'
Sample output:
com.q1labs.ariel:application=ariel_proxy.ariel_proxy_server,type=Query server,a1=Queries,a2=NORMAL,a3=flows,a4=12d45c1234
-----------------------------------------------------------------------------------------------------------------------
FileStats: compressedDataFileCount=0,compressedDataTotalSize=0,dataFileCount=21180,dataTotalSize=52869662764,duration=104117,host=global,indexFileCount=2229,indexTotalSize=46869240379,processedRecordCount=554,progress=28.661197416652122,progressDetails=null,serialversionuid=1
Duration: 0:01:44.117
QueryParameters: Id:31a-12cd-12345, DB:<flows@/store/ariel/flows/records, /store/ariel/flows/payloads>, Time:<23-02-20,13:18:40 to 23-02-28,12:17:37>, Criteria=((((<DomainID:[0,0]> AND <SourceIP:[172.xx.xx.xx,172.xx.x20.53]>) AND <PartialMatchList:[100236,100236]>) AND <EndTime:[1677566520000,1677566857482]>) AND <EventProcessorId:[8,8],[103,103],[133,133],[165,165],[238,238]>) AND Predicate=com.q1labs.frameworks.util.predicate.NotPredicate@b9984328[p=[mc=ContributesMatchList,e=[100236,100236]]], MappingFactory=com.q1labs.core.types.flow.mapping.FlowRecordMappingFactory@4ee, prio=NORMAL
ProcessedRecordCount: 554
Id: 31a-12cd-12345
ErrorMessages: <null>
DsStats: accessedTime=0,collectedRecordCount=0,retentionTime=0,sizeOnDisk=0
StartTime: 1677934267107
Status: EXECUTE
Progress: 28.661197416652122
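When the jmx.sh dump contains many queries, it can be reduced to one short status block per query; a sketch that keeps only the identifying lines (the port and bean name are from the Console command above):

```shell
# Sketch: summarize each Ariel query to its id, runtime, status, and progress.
/opt/qradar/support/jmx.sh -p 7782 \
  -b 'com.q1labs.ariel:application=ariel_proxy.ariel_proxy_server,type=Query server,a1=Queries*' \
  | grep -E '^(Id|Duration|Status|Progress):'
```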
If investigation in the UI or the commands run on the CLI reveal a long-running or large search, restarting the Ariel services on the affected host might be enough to resolve the issue.
Large Reference Data
Command to run on the Console CLI:
psql -U qradar -c "select name, time_to_live, current_count from reference_data order by current_count DESC;"
Sample output:
In the preceding example (names redacted), no Time To Live value is set on any of the reference sets, and in some cases the count is in the millions. To reduce the impact on system performance, set a Time To Live value on reference sets with a current_count greater than 100,000.
You can refer to this guide on how to set Time To Live values for large reference sets.
It is also possible to use the ReferenceDataUtil.sh script to set a Time To Live value from the command line.
Command to run on the CLI:
/opt/qradar/bin/ReferenceDataUtil.sh update "<Name of reference set from Table>" -timeoutType=FIRST_SEEN -timeToLive='<Number of days/Months/Year/Hours>'
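The query and the update command can be combined to generate one suggested command per oversized set; a sketch, assuming the reference_data columns from the query above (the '30 days' value is illustrative, pick what fits your retention needs):

```shell
# Sketch: print a suggested ReferenceDataUtil.sh command for every
# reference set over 100,000 entries that has no Time To Live.
# -t/-A make psql emit bare rows with no headers or padding.
psql -U qradar -t -A -c \
  "select name from reference_data where time_to_live is null and current_count > 100000;" \
  | while IFS= read -r name; do
        [ -n "$name" ] || continue
        printf '/opt/qradar/bin/ReferenceDataUtil.sh update "%s" -timeoutType=FIRST_SEEN -timeToLive='\''30 days'\''\n' "$name"
    done
```

The sketch prints the commands rather than running them, so each suggestion can be reviewed before it is applied.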
Database Bloat
Check database bloat in the QRadar PostgreSQL database by querying the q_table_bloat table.
Command:
psql -U qradar -c "select * from q_table_bloat;"
Sample output:
relname                | n_live_tup | n_tup_upd  | n_dead_tup | total   | bloat_pct        | last_autovacuum          | last_autoanalyze
-----------------------+------------+------------+------------+---------+------------------+--------------------------+------------------
reference_data_element | 2561166    | 2404944201 | 68804      | 2629970 | 2.61615151503629 | 2023-01-09 10:50:32.2474 |
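To surface only the problem tables, the bloat query can be filtered; a sketch (the 10% threshold is illustrative):

```shell
# Sketch: print only tables whose bloat_pct exceeds 10.
# -t/-A give bare pipe-separated rows that awk can split on '|'.
psql -U qradar -t -A -c "select relname, bloat_pct from q_table_bloat;" \
  | awk -F '|' '$2 + 0 > 10 { printf "%s: %.1f%% bloat\n", $1, $2 }'
```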
What to do next
If the provided steps do not significantly help with your performance issues, you can open a case with QRadar Support. To speed up resolution, provide the following with your case:
- The QRadar log files. For more information, see How to collect log files for QRadar support from the user interface.
- The threadTop, pg_stat, and qlocks.out outputs. To collect these outputs, run the following command from the Console:
mkdir -p /store/ibmsupport/refdump; for i in {1..40}; do (date; /opt/qradar/support/threadTop.sh -p 7779 --full)>> /store/ibmsupport/refdump/tomcat.out; (date; psql -U qradar -c "select * from q_locks" )>> /store/ibmsupport/refdump/qlocks.out; (date; psql -U qradar -c "select * from pg_stat_activity where state='active'")>> /store/ibmsupport/refdump/pg_stat.out; sleep 5 ; done
Note: This command takes about 5 minutes to run and generates three files in /store/ibmsupport/refdump/ that you can add to your case.
- Upload all files to your QRadar case.
Results
A case is opened and assigned the status Waiting on IBM. A support representative contacts you to discuss your case. If there is an alternate number or a better contact method, you can add a note to your case with the most recent contact information.
Document Location
Worldwide
Document Information
Modified date:
06 November 2023
UID
ibm16962425