IBM Support

QRadar: How to investigate excessive offense notes coming through the API

Troubleshooting


Problem

Unrestricted API calls that add notes to offenses can result in abnormal growth of some tables in the PostgreSQL database and of the overall offense model, slowing down performance.
In extreme cases, a transaction sentry (TxSentry) might prevent the ECS-EP service from starting on the console, affecting event correlation and offense creation.
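For reference, offense notes are typically added through the offenses endpoint of the QRadar REST API. The following curl call is a minimal sketch of such a request; the console address, API version header, SEC token, offense ID, and note text are placeholders, and the exact endpoint path and parameters should be verified against the interactive API documentation for your QRadar version.
    curl -k -X POST \
         -H "SEC: <authorized-service-token>" \
         -H "Version: <api-version>" \
         -H "Accept: application/json" \
         "https://<console-ip>/api/siem/offenses/<offense_id>/notes?note_text=Example%20note"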

Symptom

  • Slowness in the Offenses tab.
  • ECS-EP in a restart loop due to a transaction sentry (TxSentry) that terminates it for taking too long to query the database.
  • Offense creation and updates stop.

Cause

The excessive number of offense notes degrades the performance of the PostgreSQL database, resulting in queries that take longer than usual to complete.
There is an agent called TxSentry that runs in the background and, to prevent performance problems, terminates queries that run for longer than the configured threshold. When the offense notes degrade performance to the point that a TxSentry is triggered, it can terminate queries that the ECS-EP service runs during its startup, causing the process to fail in a loop.

Diagnosing The Problem

If the symptoms described in this article are suspected, administrators can follow these steps to determine whether they are affected.

How to determine how many notes there are in the system

  1. SSH into the QRadar console as the root user.
  2. Run the following command to identify how many notes there are:
    psql -Uqradar -c "select count(*) from notes"
    Output example: 
    In this case, there are over 3 million notes. The count does not need to be as high as in this example for it to be an issue; if you find hundreds of thousands of notes, it is worth investigating.
      count  
    ---------
     3128295
  3. To determine where those notes are coming from, run the following command, which groups them by user.
    psql -U qradar -c "select username, count (*) from notes group by username order by 2 desc;"
    The output looks like the following example, where most of the notes come from an API token named XXXX:
                username         |  count  
    -----------------------------+---------
     API_token: XXXX             | 3128279
     API_token: YYYY             |       9
     API_user: user1             |       5
     User2                       |       1
     User3                       |       1
Result: The administrator determines whether there are too many notes in the system and where they are coming from. Excessive API calls are often the source in these scenarios.

How to determine whether a TxSentry is preventing the ECS-EP service from running stably

  1. SSH into the QRadar console as the root user.
  2. Determine whether ECS-EP is constantly restarting. Run the following command to do so:
    journalctl -u ecs-ep | less -iSR
    Output example:
    The service is constantly entering a failed state:
    ecs-ep[62708]: JVMDUMP039I Processing dump event "user", detail ""  - please wait.
    java[62708]: IBM Java[62708]: JVMDUMP039I Processing dump event "user", detail "" - please wait.
    ecs-ep[62708]: -------- Console dump --------
    ecs-ep[62708]: Thread=main (00007F339C006C50) Status=Sleeping
    ecs-ep[62708]: at java/lang/Thread.sleepImpl(JI)V (Native Method)
    ecs-ep[62708]: at java/lang/Thread.sleep(JI)V (Thread.java:977) (Compiled Code)
    ecs-ep[62708]: at java/lang/Thread.sleep(J)V (Thread.java:960) (Compiled Code)
    ecs-ep[62708]: at com/eventgnosis/util/Util.sleep(J)V (Util.java:803)
    ecs-ep[62708]: at com/eventgnosis/system/RuntimeController.main([Ljava/lang/String;)V (RuntimeController.java:921)
    ecs-ep[62708]: at com/ibm/si/service/ep/ServiceRunner.main([Ljava/lang/String;)V (ServiceRunner.java:69)
    ecs-ep[62708]: Thread=JIT Compilation Thread-000 Suspended (00007F339C131140) Status=Waiting
    ecs-ep[62708]: Monitor=00007F339C122438 (JIT-CompThreadMonitor-??) Count=0
    ecs-ep.service failed.
    ecs-ep.service failed.
    
  3. Determine whether a TxSentry is being triggered by running the following command on the console:
    grep -A 20 "TX on host" /var/log/qradar.error | less -iSR
    Sample output:
    There is an occurrence of a TxSentry. The query in question involves the notes database table:
    com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         TX on host x.x.x.x: pid=57126 age=606 IP=127.0.0.1 port=37160 locks=13 query='WITH deleted AS ( DELETE FROM notes WHERE id NOT IN ( SELECT note_id FROM attacker_notes_link UNION select note_id FROM target_notes_link UNION select note_id FROM offense_notes_link UNION select note_id FROM network_details_notes_link) RETURNING *) SELECT count(*) FROM deleted'
    
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         Lock acquired on host x.x.x.x: rel=target_notes_link_pkey age=606 granted=t mode=AccessShareLock query='WITH deleted AS ( DELETE FROM notes WHERE id NOT I'
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         Lock acquired on host x.x.x.x: rel=attacker_notes_link age=606 granted=t mode=AccessShareLock query='WITH deleted AS ( DELETE FROM notes WHERE id NOT I'
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         Lock acquired on host x.x.x.x: rel=target_notes_link_target_idx age=606 granted=t mode=AccessShareLock query='WITH deleted AS ( DELETE FROM notes WHERE id NOT I'
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         Lock acquired on host x.x.x.x: rel=attacker_notes_link_pkey age=606 granted=t mode=AccessShareLock query='WITH deleted AS ( DELETE FROM notes WHERE id NOT I'
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         Lock acquired on host x.x.x.x: rel=notes age=606 granted=t mode=RowExclusiveLock query='WITH deleted AS ( DELETE FROM notes WHERE id NOT I'
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         Lock acquired on host x.x.x.x: rel=offense_notes_link_pkey age=606 granted=t mode=AccessShareLock query='WITH deleted AS ( DELETE FROM notes WHERE id NOT I'
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         Lock acquired on host x.x.x.x: rel=network_details_notes_link_pkey age=606 granted=t mode=AccessShareLock query='WITH deleted AS ( DELETE FROM notes WHERE id NOT I'
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
         Lock acquired on host x.x.x.x: rel=notes_pkey age=606 granted=t mode=RowExclusiveLock query='WITH deleted AS ( DELETE FROM notes WHERE id NOT I'
    [hostcontext.hostcontext] com.q1labs.hostcontext.tx.TxSentry: [WARN] [NOT:0000004000][x.x.x.x/- -] [-/- -]
Result: The administrator confirms whether the number of notes has increased to the point where the stability of the ECS-EP service is compromised. The commands below summarize how often each condition appears in the logs.
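To quantify how often each condition occurs, the same sources can be summarized with simple counts. This is a minimal sketch that relies on the exact message strings shown in the examples above; the wording can differ between QRadar versions.
    # Number of times the ecs-ep service entered a failed state
    journalctl -u ecs-ep | grep -c "ecs-ep.service failed"
    # Number of TxSentry occurrences logged on the console
    grep -c "TX on host" /var/log/qradar.error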

How to determine whether there are orphaned notes in the system

A note must be associated with an offense. When offenses are purged from the system because their retention period elapsed, the associated metadata is purged as well.
On large database instances affected by this issue, that is not always the case: orphaned notes remain that are linked to offenses that no longer exist.
  1. SSH into the QRadar console as the root user.
  2. Determine whether there are orphaned notes by running this command:
    psql -Uqradar -c "SELECT COUNT(*) FROM notes WHERE id NOT IN (SELECT DISTINCT(note_id) FROM offense_notes_link)"
    Output example:
    Out of the more than 3 million notes present on the system, roughly 2.5 million are orphaned:
      count  
    ---------
     2523266
Result: The administrator confirms whether there are orphaned notes in the system.
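The query in step 2 checks only the offense_notes_link table. The TxSentry example in the previous section shows that notes can also be referenced from the attacker_notes_link, target_notes_link, and network_details_notes_link tables, so a broader, read-only check can be built the same way. This is a sketch based on that query:
    psql -U qradar -c "SELECT COUNT(*) FROM notes WHERE id NOT IN (SELECT note_id FROM offense_notes_link UNION SELECT note_id FROM attacker_notes_link UNION SELECT note_id FROM target_notes_link UNION SELECT note_id FROM network_details_notes_link)"
Notes counted by this query are not referenced by any of the four link tables.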

Resolving The Problem

If the administrator confirms that there are too many notes in the system, there are two possible scenarios and solutions:
  1. The impact is limited to slowness in the offenses tab.
    • For this case, it is important to take action before the situation gets worse. Because the source of the notes differs for every deployment, the recommendation is to tune that source to reduce the number of API calls that add notes to the system. As a temporary measure, the API token can be disabled or the app stopped. The note-count check shown after this list can be rerun periodically to confirm that the growth has stopped.
  2. A TxSentry prevents the ECS-EP service from running stably.
    • These incidents require a more aggressive approach. The recommendation is to stop the API calls that are adding the notes to the offenses by stopping the app or disabling the access token. Collect all of the information described in the Diagnosing The Problem section, gather a get_logs, and open a case with QRadar Support for assistance.
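To confirm that tuning or disabling the source has the intended effect, the note count from the Diagnosing The Problem section can be sampled over time. This is a minimal sketch; the 10-minute interval is an arbitrary example, not an official recommendation.
    # Take two counts 10 minutes apart to gauge whether the notes table is still growing
    psql -U qradar -c "select count(*) from notes"
    sleep 600
    psql -U qradar -c "select count(*) from notes"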

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB24","label":"Security Software"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSBQAC","label":"IBM Security QRadar SIEM"},"ARM Category":[{"code":"a8m0z000000cwtiAAA","label":"Performance"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Document Information

Modified date:
05 February 2024

UID

ibm17091610