
QRadar: Sanitizing logs before you open a support case

Question & Answer


Question

My company policy does not allow logs to contain sensitive data, such as IP addresses, hostnames, domains, or usernames. We are concerned about sending QRadar logs for support assistance. Can I sanitize QRadar logs before I submit them for review to IBM?

Answer

Yes, QRadar includes a script that administrators can run to help sanitize sensitive information from their log files. The log_scrubber.py script is an option for users who cannot run and submit get_logs.sh output due to security concerns. Because logs are required to troubleshoot components that experience issues, the log_scrubber.py tool helps users sanitize sensitive information so that they can provide logs as defined in the Client Responsibility section of the IBM Support Handbook.

Client responsibilities

IBM does not warrant that its products are defect free; however, IBM does endeavor to fix its products to work as designed. It is important to note that clients play a key role in this effort.

Providing information
IBM Support is available to provide assistance and guidance. As part of this relationship, clients need to provide IBM Support with information about their systems and details about the failing components so that IBM can quickly and accurately resolve the problem.

This information includes (but is not limited to):
  • Capturing documentation at the time of a failure
  • Applying a trap or trace code to a system
  • Formatting the output from the trap or trace (if needed)
  • Sending documentation or trace information (in hardcopy or digital copy) to the remote support center

For more information, see IBM Support General Guidelines and Limitations.

 

What is the log_scrubber.py script?

The log_scrubber.py script is a support tool designed to scrub many kinds of personally identifiable information (PII) from logs. The latest version of log_scrubber.py is designed to help users with strict data handling requirements.

Without logs, QRadar Support might be limited in its ability to raise issues to development or to reproduce issues experienced by users. If your security policy strictly prohibits sharing logs with IBM Support, you might need to request a WebEx session with QRadar Support so that we can investigate the issue. If you do not allow log uploads to your case and do not allow WebEx sessions, you might need to engage IBM Expert Labs for an onsite visit or contact your IBM Account Manager to discuss options based on your security policies.

Note: The log_scrubber.py utility replaces scrub.pl to help users sanitize their logs. Administrators with automatic updates enabled have the latest version of log_scrubber.py.
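
If you are unsure whether your deployment includes the utility, you can confirm that the script exists at the documented path before you begin:

# Confirm the log_scrubber utility is installed at the path used throughout this document
ls -l /opt/qradar/support/log_scrubber.py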

How do I run the script?

The log_scrubber utility is located in the /opt/qradar/support directory. The usage is visible by using either the -h option flag or the long argument --help. The usage gives a synopsis of all the arguments and includes examples in the epilogue, which can be useful when first running the script.

  • Help output example
    usage: /opt/qradar/support/log_scrubber.py
       [-h] [-l] [-p PROCESS_NUMBER] [--flat] [--dryrun]
       [--ID ID] [--push_token] [INPUT_FILE [INPUT_FILE ...]]
    
    The tool takes a get logs tar.gz and scrubs sensitive data from it. The scrubbed file 
    will be saved as ..scrubbed.tar.gz
    
    The tool can also be used to scrub plain text files when "--flat" flag is provided.
    
    positional arguments:
     INPUT_FILE            file(s) to scrub, can be exactly one get logs tar.gz or a
                           list of text files (when --flat is specified)
    
    optional arguments:
     -h, --help            show this help message and exit
     -l                    list available PII finders in JSON (-ll for more info)
     -p PROCESS_NUMBER, --process PROCESS_NUMBER
                           number of concurrent processes (1-8)
     --flat                flag to handle text file(s)
     --dryrun              generate map files only
     --ID ID               pass in a comma separated list of finder IDs to use
                           (visible from -l)
     --push_token          push scrubber token to MHs
    
    Examples:
    log_scrubber.py /store/LOGS/logs_myhostname_20221124_5af06248.tar.gz
    log_scrubber.py --flat example/example.txt example/example2.txt
    log_scrubber.py --dryrun -p 3 --flat /root/example/* --ID 0,1,2 

Procedure
As the log_scrubber tool is run from the command line, you must have root access to the QRadar appliance.
  1. Use SSH to log in to the QRadar Console as the root user.
  2. Optional. If you need logs from a non-Console appliance, open an SSH session to the QRadar host.
  3. Type the following command: /opt/qradar/support/get_logs.sh
    The script informs you that the log was created and provides the name and the location, which is always the /store/LOGS/ directory.
    INFO: Gathering install information...
    INFO: Collecting DrQ output...
    INFO: Collecting system files...
    INFO: Collecting old files...
    INFO: Collecting Cert metadata...
    INFO: Collecting accumulator information with collectGvStats.sh v1.8...
    INFO: Collecting deployment info with deployment_info.sh v0.7...
    INFO: Collecting thread dumps from running java processes...
    INFO: Collecting database information...
    INFO: Collecting rpm version information...
    INFO: Collecting QVM files...
    INFO: Fetching Salesforce information...
    INFO: Collecting additional qflow information...
    INFO: Extracting rule information...
    INFO: Compressing collected files...
    
    The file /store/LOGS/logs_qradarconsole1_e579fe7e.tar.gz (53M) has been created to send to support
    
  4. To scrub the log bundle, type:
    /opt/qradar/support/log_scrubber.py /store/LOGS/{logs_filename.tar.gz}
    For example,
    /opt/qradar/support/log_scrubber.py /store/LOGS/logs_qradarconsole1_e579fe7e.tar.gz
    Output example,
    Summary of [/store/LOGS/logs_qradarconsole1_e579fe7e.tar.gz]
        Total:      1010 files
        Scrubbed:   993/1010
        Skipped:    17/1010
    Mapping files are under [/store/ibm_support/scrub/logs_qradarconsole1_e579fe7e.tar.gz_1670856374_map]
    Scrubbed get_logs is saved as [/store/LOGS/logs_qradarconsole1_e579fe7e.scrubbed.tar.gz]
  5. A new file is created in /store/LOGS; the .scrubbed.tar.gz suffix in the file name indicates that the file is scrubbed.
  6. Download the scrubbed log from the /store/LOGS directory.
    Note: Download and attach the scrubbed file to your case. The mapping files in /store/ibm_support are not required and are used only to troubleshoot PII scrubbing issues in the tool.
  7. Create a case with IBM Support (sign-in required).
  8. Complete the required fields.
  9. Attach the scrubbed log file to your case.

    Results
    The case status updates to "IBM is working" while we review the issue and the attached logs. The scrubbed log file bundle remains in the /store/LOGS directory until the files are deleted by an administrator.
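
After you are familiar with the procedure, steps 3 and 4 can be combined. The following convenience sketch assumes that the most recently modified bundle in /store/LOGS is the one you generated in step 3:

# Scrub the newest log bundle in /store/LOGS (assumes it is the bundle from step 3)
/opt/qradar/support/log_scrubber.py "$(ls -t /store/LOGS/logs_*.tar.gz | head -1)"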

What PII can this script scrub?

The `-l` argument shows this information on any deployment. More PII types might be added to the script in the future. Updates to log_scrubber are delivered through QRadar automatic updates as part of the supportability tools RPM file. If you are not running the latest version of the log_scrubber tool, you might have different PII options available.


Listing the data types

The list option flag -l displays the PII types currently known to the tool. The following output shows the PII types for an initial release of the tool:

# /opt/qradar/support/log_scrubber.py -l
{"ID": 0, "Name": "IPv6"}
{"ID": 1, "Name": "IPv4"}
{"ID": 2, "Name": "Hostname"}
{"ID": 3, "Name": "Username"}
{"ID": 4, "Name": "Domain"}


Listing verbose data types
The script also accepts -ll as an argument. This argument gives verbose output on how each type of PII is found by the script, which allows any user to investigate whether it works for their use case. As with the -l option, the output is in JSON format. Because the output is verbose, we recommend using a tool like `jq` to beautify it. For example,

# /opt/qradar/support/log_scrubber.py --ID 1 -ll | jq .
{
  "ID": 1,
  "Name": "IPv4",
  "Type": "regex",
  "Patterns": [
    {
      "Name": "IPv4",
      "Pattern": "\\b(25[0-5]|2[0-4]\\d|1\\d{1,2}|[1-9]?\\d)\\.(25[0-5]|2[0-4]\\d|1\\d{1,2}|[1-9]?\\d)\\.(25[0-5]|2[0-4]\\d|1\\d{1,2}|[1-9]?\\d)\\.(25[0-5]|2[0-4]\\d|1\\d{1,2}|[1-9]?\\d)\\b"
    }
  ]
} 
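
Because each finder pattern is a plain regular expression, you can test it outside the tool. The following check uses the IPv4 pattern from the output above, condensed with a repetition group for brevity:

# Test the IPv4 finder pattern against a sample line; prints the matched address
echo "connection from 192.168.0.1 refused" | grep -Po '\b(25[0-5]|2[0-4]\d|1\d{1,2}|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d{1,2}|[1-9]?\d)){3}\b'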

Why am I seeing a license when first running the script?

The log_scrubber utility is provided as a convenience to help users with strict PII requirements. The license states that the responsibility to uphold your data requirements remains your own.

What logs can I scrub with this tool?

By default, log_scrubber.py expects a "get_logs" archive. However, the tool can also accept a text file or a list of files passed with the "--flat" parameter. For example,

/opt/qradar/support/log_scrubber.py --flat /var/log/qradar.log
Scrubbing /var/log/qradar.log.
File is scrubbed. Output is saved as /storetmp/scrub/qradar.log_1670935514
Mapping file is saved as /store/ibm_support/scrub/qradar.log_1670935514.map
You can scrub multiple files with this parameter by passing a space-separated list of files or by using a wildcard, for example /root/exampleDir/*, as shown in the sketch below. We suggest starting with a single "flat" file when you use the script to get familiar with the utility.
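A minimal multi-file sketch; the second file name is illustrative and might not exist on your deployment:
# File names are examples; adjust to the logs that you need to scrub
/opt/qradar/support/log_scrubber.py --flat /var/log/qradar.log /var/log/qradar.error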
To run log_scrubber in its default mode, add a get_logs archive as an argument. For example,
/opt/qradar/support/log_scrubber.py /store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz
Scrubbing /store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz
No need to scrub PERFDATA_LOGS.tar.gz, does not contain PII.
 
INFO: The following files were found to be empty: ['datanode.properties', 'krb5.conf', 'message-mapping.AhnLabPolicyCenterJdbc.properties', 'message-mapping.ObserveITJdbc.properties', 'manager.log', 'host-manager.log', 'iem-cron.log', 'access.log', 'ssl_error_log', 'qflow.debug', 'xforce_scaserver_updates.20221212.txt']
 
WARN: Scrubbing failed for the following files, they will be omitted from the output: ['var/log/httpd/ssl_request_log.1.gz', 'var/log/messages.1.gz', 'var/log/messages.2.gz', 'var/log/qradar.old/qradar.log.1.gz', 'var/log/qradar.old/qradar.log.2.gz']
 
Summary of [/store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz]
    Total:      1010 files
    Scrubbed:   993/1010
    Skipped:    17/1010
Mapping files are under [/store/ibm_support/scrub/logs_ip-128-58_20221212_e579fe7e.tar.gz_1670856374_map]
Scrubbed get_logs is saved as [/store/LOGS/logs_ip-128-58_20221212_e579fe7e.1670856374.scrubbed.tar.gz]
As shown at the end of the output, the tool lists the files that were empty and the files that failed to be scrubbed. The files that failed to be scrubbed are omitted from the output archive.
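
Before you upload, you can list the contents of the scrubbed archive to confirm what is included. This example uses the scrubbed file name from the output above:

# Spot-check the first entries of the scrubbed archive before attaching it to a case
tar -tzf /store/LOGS/logs_ip-128-58_20221212_e579fe7e.1670856374.scrubbed.tar.gz | head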

How do I view the output?

The output falls into two categories: "Scrubbed Files" and "Mapping Files".
The scrubbed files are the scrubbed log files themselves, which are uploaded to support.
The mapping files contain a dictionary of the scrubbed PII and its replacements in the log file.
The mapping files must be kept by the user.


Scrubbed Files

In the previous "--flat" example, the output was stored in /storetmp/scrub/qradar.log_1670935514. Notice that the output file was stored in /storetmp/, which means that disk maintenance deletes the file within 24 hours unless it is moved. If you need to copy log files off the deployment for support, you can do so and then leave the original in place because it is deleted automatically. Viewing the file would look something like this:

# head -5 /storetmp/scrub/qradar.log_1670935514
Hostname_ID_<unique ID> OutOfMemoryMonitor[30169]: Starting out-of-memory monitoring (enabled: yes)...
Hostname_ID_<unique ID> abrtd_lockfile_watcher[30879]: /usr/bin/find: ‘/store/jheap’: No such file or directory
Hostname_ID_<unique ID> .symlinkPythonTools.sh[6714]: Running .symlinkPythonTools.sh correctly setting up the symlinks    
Hostname_ID_<unique ID> .symlinkPythonTools.sh[6715]: Removing all the symlinks within the /opt/qradar/support/ directory    
Hostname_ID_<unique ID> .symlinkPythonTools.sh[6718]: Cleaning up support directory
As you can see, each instance of the hostname in this case is replaced with "Hostname_ID_<unique ID>". These IDs are unique per deployment, so the same PII on two different deployments does not produce the same ID. These IDs might make the logs harder to read, so if you need to read through the logs, we suggest a regular expression such as "perl -pe 's/_ID_\w+//g'":
# perl -pe 's/_ID_\w+//g' /storetmp/scrub/qradar.log_1670935514 | head -5
Hostname OutOfMemoryMonitor[30169]: Starting out-of-memory monitoring (enabled: yes)...
Hostname abrtd_lockfile_watcher[30879]: /usr/bin/find: ‘/store/jheap’: No such file or directory
Hostname .symlinkPythonTools.sh[6714]: Running .symlinkPythonTools.sh correctly setting up the symlinks
Hostname .symlinkPythonTools.sh[6715]: Removing all the symlinks within the /opt/qradar/support/ directory
Hostname .symlinkPythonTools.sh[6718]: Cleaning up support directory 
The output for a get_logs archive is largely the same (more files), except that it is compressed again when the script completes.
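
If you want extra assurance before uploading, a crude search for IPv4-shaped strings can flag anything that was not scrubbed. This is a rough manual check, not part of the tool, and the simple pattern over-matches (version strings, for example):

# List any remaining IPv4-like strings in a scrubbed flat file for manual review
grep -Po '\b\d{1,3}(\.\d{1,3}){3}\b' /storetmp/scrub/qradar.log_1670935514 | sort -u | head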
 
Mapping Files
Mapping files contain a dictionary of these IDs and the original PII associated with them. In the previous "--flat" example, the mapping file was stored in /store/ibm_support/scrub/qradar.log_1670935514.map. Notice that this time the file is stored in /store/. The file in this directory persists until deleted, which allows time for scrubbed logs to be uploaded to support and for support to analyze them and provide recommendations.

The mapping file "/store/ibm_support/scrub/qradar.log_1670935514.map" looks like this:
# cat /store/ibm_support/scrub/qradar.log_1670935514.map
{'::ffff:127.0.0.1': 'IPv6_ID_<unique ID>', '::': 'IPv6_ID_<unique ID>', '192.168.0.1': 'IPv4_ID_<unique ID>', '127.0.0.1': 'IPv4_ID_<unique ID>', '255.255.255.255': 'IPv4_ID_<unique ID>', '0.0.0.0': 'IPv4_ID_<unique ID>', 'exampleAIO.lab': 'Hostname_ID_<unique ID>', 'exampleEP.lab': 'Hostname_ID_<unique ID>', 'exampleEP': 'Hostname_ID_<unique ID>'}
This file is necessary only when support finds an error that they would like you to investigate. For instance, if they came across the following:
IPv6_ID_<unique ID> [ecs-ec.ecs-ec] [ecs-ec/EC/TCP_TO_EP:TakeFromQueue] com.ibm.si.ec.destinations.StoreForwardDestination(ecs-ec/EC/TCP_TO_EP): [WARN] [NOT:0000004000][IPv4_ID_<unique ID>/- -] [-/- -]IO Error
Support might mention that they see an issue on "IPv4_ID_<unique ID>" in "qradar.log_1670935514". We then recommend a grep command against the corresponding mapping file that follows the format
grep -wPo "\'[^']*\': \'<PII hash from support>\'" <mapping file>
Using our example:
# grep -wPo "\'[^']*\': \'IPv4_ID_<unique ID>\'" qradar.log_1670935514.map
'192.168.0.1': 'IPv4_ID_<unique ID>' 

You would then know that your issue is on the system with IP "192.168.0.1".

Navigating the mapping files is similar for the output from a get_logs archive. The main caveat is that each output file has its own mapping file, meaning that there are ~1000 mapping files for each set of logs. The decision to have separate mapping files instead of one central "dictionary" file was made to improve performance. If support provides you with an ID but no associated file, a recursive grep can be used in the mapping directory.
For example,

# grep -rwPo "\'[^']*\': \'Hostname_ID_<unique ID>\'"
DB_Dumps/serverhost.20221213.sql_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
DB_Text/serverhost.20221213.txt_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
etc/httpd/conf/httpd.conf_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
etc/sysconfig/network_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
journalctlDump/ip6tables.service.log_1670936381.map:'exampleEP.lab': 'Hostname_ID_<unique ID>'
... 

Why is the script scrubbing some text that is not PII?

This situation is more noticeable in some deployments than in others. As seen earlier, PII such as "Domain" and "Username" is scrubbed. If, by chance, you created a user or domain called "windows", then the script scrubs every instance of "windows" in the provided logs. This situation can be troublesome for support. In this scenario, you would want to rename that user or domain. The script also provides an "--ID" argument that can be useful in this case. The IDs for each PII type are shown with "-l", as seen earlier. The "--ID" parameter can then be used to scrub only certain types of PII.
For example, if "Username" caused the issue in any of the previous examples, then you run:

/opt/qradar/support/log_scrubber.py --ID 0,1,2,4 /store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz
Always use the IDs found with "-l" and not the IDs outlined in this document, as they might change in the future.
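
Because the IDs can change, you can also derive the --ID list at run time. The following sketch excludes the "Username" finder; it assumes the JSON-lines -l output shown earlier and that `jq` is available:

# Collect every finder ID except "Username" into a comma-separated list
IDS=$(/opt/qradar/support/log_scrubber.py -l | jq -rs '[.[] | select(.Name != "Username") | .ID | tostring] | join(",")')
# Scrub the bundle with the derived list (archive name from the earlier example)
/opt/qradar/support/log_scrubber.py --ID "$IDS" /store/LOGS/logs_ip-128-58_20221212_e579fe7e.tar.gz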

New! How to add customized values in log_scrubber.py

Scenarios exist where the default PII types scrubbed by the tool do not meet all of the security use cases for administrators. To address these scenarios, an update to the log_scrubber.py utility adds support for custom strings so that administrators can scrub more values from the logs. For example, the following table describes the custom substitutions made in the QRadar logs after the log_scrubber.py tool runs.

Value to scrub from logs    Replaced in logs with value
user-lastname               custom_scrub_1
email@address.com           custom_scrub_2
server.example.com          custom_scrub_3
databasename                custom_scrub_4


Procedure
Each time you update your custom_scrub.conf file on the Console, you can use the all_servers.sh command to ensure that the file is copied to all managed hosts in the deployment.

  1. Log in to the QRadar Console as the root user.
  2. Navigate to the /opt/qradar/support/data/log_scrubber/ directory.
    Tip: To ensure the file is created on all managed hosts where you might collect logs, run the following command from the QRadar Console:
    /opt/qradar/support/all_servers.sh -k "/opt/qradar/support/log_scrubber.py -h"
  3. In a text editor, edit the custom_scrub.conf file.
  4. Add the values that you want to scrub from the logs to the file, one value per line. For example,
    user-lastname
    email@address.com
    server.example.com
    databasename
    
  5. Save your changes to custom_scrub.conf.
  6. To clone your updated custom_scrub.conf file to all hosts in the deployment, type:
    /opt/qradar/support/all_servers.sh -p /opt/qradar/support/data/log_scrubber/custom_scrub.conf -r /opt/qradar/support/data/log_scrubber/
  7. To scrub the log bundle, type:
    /opt/qradar/support/log_scrubber.py /store/LOGS/{logs_filename.tar.gz}
    For example,
    /opt/qradar/support/log_scrubber.py /store/LOGS/logs_qradarconsole1_e579fe7e.tar.gz

    Results
    The logs are scrubbed, and the administrator-defined values are replaced with "custom_scrub_#" in the logs. The number in each replacement corresponds to the line number of the value in the custom_scrub.conf file.
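
To confirm that your custom values were replaced, you can count the substitutions in a scrubbed output file. The path below is taken from the earlier "--flat" example and is illustrative:

# Count occurrences of each custom replacement token in a scrubbed file
grep -o 'custom_scrub_[0-9]*' /storetmp/scrub/qradar.log_1670935514 | sort | uniq -c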

[{"Type":"MASTER","Line of Business":{"code":"LOB24","label":"Security Software"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSBQAC","label":"IBM Security QRadar SIEM"},"ARM Category":[{"code":"a8m0z000000cwt3AAA","label":"QRadar Apps"}],"ARM Case Number":"","Platform":[{"code":"PF016","label":"Linux"}],"Version":"7.4.0;7.4.1;7.4.2;7.4.3;7.5.0"}]

Document Information

Modified date:
14 June 2023

UID

swg21676850