QRadar: Understanding IO Errors while searching

Troubleshooting

Problem

A red bar with the error message "An IO Error occurred on server(s) x.x.x.x. Please try again." is displayed while running searches.

Symptom

When running historical searches, a red bar is displayed on the results page. It displays the message: "An IO Error occurred on server(s) hostname. Please try again." The hostname or IP address that is displayed in the message is likely that of the appliance that is presenting the issue. Applying filters to the search by using the "Event Processor" parameter might eliminate the error.

On encrypted managed hosts, the error varies and looks similar to the following:

An IO Error occurred on server(s) 127.0.0.1:32013. Please try again.

Note: The specific port number referenced in the error will change, depending on which managed host is causing the communication problem.
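If the error text is copied from the UI into a ticket or script, the host and port can be separated before any lookup is done. The following is a minimal Python sketch, not part of QRadar. The function name parse_io_error is hypothetical, and the pattern assumes the exact message wording shown above with a single server in the message.

```python
import re

# Matches the wording of the UI message shown above (assumed to be stable).
IO_ERROR_RE = re.compile(r"An IO Error occurred on server\(s\)\s*(.+?)\. Please try again")


def parse_io_error(message):
    """Return (host, port) from the UI message; port is None when absent."""
    match = IO_ERROR_RE.search(message)
    if match is None:
        return None
    target = match.group(1).strip()
    host, sep, port = target.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    # No ":port" suffix, for example a plain hostname.
    return target, None
```

A result such as 127.0.0.1 with a port suggests an encrypted managed host, where the port is what identifies the host.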

Cause

Receiving the "An IO Error occurred on server(s) hostname. Please try again." message while running searches indicates that the Ariel database is not accessible on one or more managed hosts.

The hostname or IP address that is displayed in the error message does not always match the host or hosts that are experiencing the problem, because several different conditions can cause this error message to be displayed.

Note: Whether a hostname or an IP address is included in the message can depend on your name resolution configuration and whether or not the managed host is encrypted.

Diagnosing The Problem

When you are running a historical search, the console proxies your search requests to other managed hosts involved depending on your specified filters. An IO Error indicates that one or more of your managed hosts are not responding to these search requests.

Identifying the correct host or hosts that are experiencing the issue is the first step in effectively troubleshooting this problem.

You can use a combination of the methods below to help you identify the managed host that is experiencing the issue. If you are aware of managed hosts with issues, you can skip to the Checking the Security Data Distribution tab section for the verification step only.

Search Details
Click the More Details link in the results section to get a better picture of how your managed hosts are responding to your search.

In this example, the host test-ep ran its search on no data files and its search duration was also zero.

Based on this result, test-ep is the host that is experiencing the problem. In a real deployment, the result might be less clear and you might need to verify your findings.

Reviewing the QRadar logs
If your managed hosts are not encrypted, or you have a mix of encrypted and unencrypted managed hosts in your environment, QRadar logs can be useful in identifying the managed host that has the problem.

Run the search by using the following filters to identify the actual error message:

Event Processor: your console
QuickFilter: MappingFactory
Time window: A time range that includes the last time that you received the IO Error

If the managed host experiencing the Ariel database issue is not encrypted, the raw event includes its hostname. Drill into the event and find the raw text:

Sep 14 14:56:23 127.0.0.1 [aqw_remote_2:4bac31ad-5cb4-47c8-8f0d-280d3bcb3d10] com.q1labs.ariel.searches.tasks.ServiceTaskBase: [ERROR] [NOT:0000003000][198.51.100.2/- -] [-/- -]Can't communicate to server [test-ep:32006] executing query:Id:4bac31ad-5cb4-47c8-8f0d-280d3bcb3d10, DB:<events@/store/ariel/events/records, /store/ariel/events/payloads>, Time:<16-09-14,14:55:23 to 16-09-14,14:56:00>, Criteria=<DeviceType:[368,368]>, MappingFactory=com.q1labs.core.types.event.mapping.NormalizedEventMappingFactory@4ee, processedRecordsLimit=2147483647, executionTimeLimit=9223372036854775807, collectedRecordsLimit=2147483647, prio=NORMAL

If the issue is on an encrypted managed host:

The error in the GUI references 127.0.0.1 and a port rather than the managed host itself, as in the example in the Symptom section.

To find the failing managed host, print the /etc/tunnel_manager/config/deployment.json file in a readable format:

cat /etc/tunnel_manager/config/deployment.json | python -m json.tool | less -i

While in less, search for the port mentioned in the IO error (port 32006 in this example). The remote IP is listed along with the port. The configuration for the tunnel should look like:

{
    "bind_address": "localhost",
    "component": "ariel_proxy--ariel--Tunnel",
    "compression": false,
    "destination_host": "localhost",
    "destination_port": 32006,
    "direction": "local",
    "remote_host": "A.B.C.D",
    "remote_user": "root",
    "source_port": 32006
}

Where "remote_host": "A.B.C.D" is the IP of the managed host.
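The manual search in less can also be scripted. The sketch below is a hedged illustration rather than an IBM-provided tool: the names find_tunnels and remote_hosts_for_port are hypothetical, and because the nesting of deployment.json is not described here, the code walks the whole document and matches any object that looks like the tunnel entry above. Matching the port from the IO error against source_port is also an assumption.

```python
import json


def find_tunnels(node, port, component="ariel_proxy--ariel--Tunnel"):
    """Recursively collect tunnel objects whose source_port equals port."""
    found = []
    if isinstance(node, dict):
        if node.get("source_port") == port and node.get("component") == component:
            found.append(node)
        for value in node.values():
            found.extend(find_tunnels(value, port, component))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_tunnels(item, port, component))
    return found


def remote_hosts_for_port(path, port):
    """Load the JSON file at path and return the remote_host of each match."""
    with open(path) as handle:
        document = json.load(handle)
    return [tunnel.get("remote_host") for tunnel in find_tunnels(document, port)]
```

On the console, calling remote_hosts_for_port("/etc/tunnel_manager/config/deployment.json", 32006) would be expected to return the IP address of the managed host behind that tunnel, provided the assumptions above hold for your version.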

On an encrypted host, the raw event contains localhost as the hostname:

Sep 14 14:23:23 127.0.0.1 [aqw_remote_2:dd380d0d-ad31-4497-a9d3-81224cbd4b6b] com.q1labs.ariel.searches.tasks.ServiceTaskBase: [ERROR] [NOT:0000003000][198.51.100.2/- -] [-/- -]Can't communicate to server [localhost:32006] executing query:Id:dd380d0d-ad31-4497-a9d3-81224cbd4b6b, DB:<events@/store/ariel/events/records, /store/ariel/events/payloads>, Time:<16-09-14,14:22:23 to 16-09-14,14:23:00>, Criteria=<DeviceType:[368,368]>, MappingFactory=com.q1labs.core.types.event.mapping.NormalizedEventMappingFactory@4ee, processedRecordsLimit=2147483647, executionTimeLimit=9223372036854775807, collectedRecordsLimit=2147483647, prio=NORMAL

Note: Regardless of the encryption setting of your managed host, you should make a note of hostname information from these raw events, as it is useful when verifying connectivity as described in the Resolving the Problem section.
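When many raw events are exported, the server named in each one can be extracted instead of read by eye. This is a minimal Python sketch under the assumption that the raw event contains the "Can't communicate to server [host:port]" fragment exactly as in the two examples above. The name unreachable_server is hypothetical.

```python
import re

# Fragment taken from the raw events above: Can't communicate to server [host:port]
SERVER_RE = re.compile(r"Can't communicate to server \[([^\]:]+):(\d+)\]")


def unreachable_server(raw_event):
    """Return (host, port) named in the raw event, or None if not present."""
    match = SERVER_RE.search(raw_event)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

A host of localhost points to an encrypted managed host, in which case the port is the value to look up in the tunnel configuration.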

Eliminating Event Processors
The IO Error is displayed only when you are searching on the managed host experiencing the issue. In our example, setting a filter to show events only from the console eliminates the IO error.

Therefore, you can set filters on your search to help you identify which managed hosts are experiencing the issue. Try filtering on the managed hosts that you previously identified when checking the search details. If the Ariel database of a managed host that you are filtering on is not accessible, you receive the same IO error.

Checking the Security Data Distribution tab
When you have an idea about which managed host or hosts are experiencing an issue, you can verify your conclusion by checking the Security Data Distribution tab of the System Information window. Open this tab by clicking Admin > System Configuration > System and License Management > Systems. When the System and License Management window opens, select the suspected managed host, and click Actions > View and Manage System. The System and License Details window opens with the Security Data Distribution tab selected by default. If the Ariel database is not accessible, a warning is displayed on this tab.

Resolving The Problem

Note: For a video guide on how to troubleshoot IO errors, follow this article: How to troubleshoot IO errors when searching on QRadar

After you identify which Event Processor is experiencing the issue, you need to restore access to its Ariel database. This is not always trivial. Below are some basic resolution steps that can help address the most common causes before contacting IBM support for further assistance.

Warning: Some of the steps below recommend deploying the full configuration. A full deployment restarts the services on all of your managed hosts, which results in a brief service interruption. Take this interruption into consideration before you deploy. To perform a Full Deploy, go to the Admin tab in the UI and click Advanced > Deploy Full Configuration.

Open an SSH connection to your console with the root account.
From the console, open an SSH connection to the managed host that you identified in the Diagnosing the Problem section.
Example:
```
[root@test-console ~]# ssh test-ep
```
If you are not able to connect to your managed host, verify that the host is powered on and that your network routes traffic to the managed host IP address on port 22. If the host is operational and network connectivity is verified but you are still unable to connect, contact support for further assistance.

Verify that the Ariel Query Server is running on this managed host:
Example:

[root@qradarep750 ~]# systemctl status ariel_query_server
● ariel_query_server.service - Ariel Query Server
   Loaded: loaded (/usr/lib/systemd/system/ariel_query_server.service; static; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ariel_query_server.service.d
           └─ulimit.conf
   Active: active (running) since Thu 2022-09-15 08:50:36 EDT; 3h 19min ago
  Process: 24057 ExecStartPre=/opt/qradar/systemd/bin/generate_environment.sh ariel_query_server ariel (code=exited, status=0/SUCCESS)
  Process: 17885 ExecStartPre=/opt/qradar/systemd/bin/console_check.sh -r (code=exited, status=0/SUCCESS)
 Main PID: 29483 (java)

If the Ariel Query Server is not running, a full configuration deployment might resolve this issue by restarting all services on the managed host after deploying the most recent configuration on it. If the Ariel Query Server is still not running after a full deployment, contact support for further assistance.
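When the status output is collected from several managed hosts, the Active line can be read programmatically. The following Python sketch assumes the standard systemctl status layout shown above. The function name active_state is hypothetical.

```python
def active_state(status_output):
    """Return the state from the 'Active:' line, e.g. 'active (running)'."""
    for line in status_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Active:"):
            state = stripped[len("Active:"):].strip()
            # Drop the trailing "since <timestamp>; <age> ago" part, if any.
            return state.split(" since ", 1)[0]
    return None
```

Any value other than active (running) for ariel_query_server suggests that the service needs attention, as described in the preceding paragraph.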

If the Ariel Query Server is running, verify that it is listening on the port that is identified in the Diagnosing the Problem section:
Example:
```
[root@test-ep ~]# netstat -nalp | grep 32006
tcp 0 0 :::32006 :::* LISTEN 13732/ariel
```
If your Ariel Query Server is running but is not listening on the port identified, a full deployment might resolve the issue by deploying the most recent configuration on the managed host. If it is still not listening on the port after a full deployment, contact support for further assistance.
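The listening check can be automated by parsing saved netstat output. This is a small Python sketch that assumes the column order of the netstat -nalp example above (protocol, queues, local address, foreign address, state, program). The name is_listening is hypothetical.

```python
def is_listening(netstat_output, port):
    """Return True if a TCP socket is in LISTEN state on the given port."""
    suffix = ":" + str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        protocol, local_address, state = fields[0], fields[3], fields[5]
        if protocol.startswith("tcp") and state == "LISTEN" and local_address.endswith(suffix):
            return True
    return False
```

Passing the full, unfiltered netstat output avoids the false matches that a plain grep for the port number can produce, such as a process ID that happens to contain the same digits.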
If your Ariel Query Server is listening on the specified port, verify the connectivity from the console to the managed host on that port. For unencrypted hosts, use the hostname or IP address of the managed host. For encrypted hosts, use localhost.

Example for unencrypted hosts:
```
[root@test-console ~]# telnet test-ep 32006
```
Example for encrypted hosts:
```
[root@test-console ~]# telnet localhost 32006
```
If you do not receive a message indicating a successful connection, the most likely reason is a firewall blocking the traffic for the Ariel port.
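Where telnet is not installed, the same TCP connection test can be made with Python's standard socket module. This is a minimal sketch, assuming a Python 3 interpreter is available on the console. The function name can_connect and the 5-second default timeout are illustrative choices.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_connect("test-ep", 32006) corresponds to the unencrypted telnet test and can_connect("localhost", 32006) to the encrypted one. A False result does not say why the connection failed, so a firewall remains only the most likely explanation.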

The following steps apply only to encrypted managed hosts.
Verify the status of the tunnel. After you identify the managed host IP, review the service that manages its tunnel by running the following command.
Replace <Managed Host IP> with the IP address of the failing host.
```
grep -C8 '<Managed Host IP>' /etc/tunnel_manager/tunnels/managed-tunnel\@* | egrep '"ariel_proxy--ariel--Tunnel"'
```
The output should look like the following:
```
grep -C8 '<Managed Host IP>' /etc/tunnel_manager/tunnels/managed-tunnel\@* | egrep '"ariel_proxy--ariel--Tunnel"'
/etc/tunnel_manager/tunnels/managed-tunnel@18376953612552673763-Component = "ariel_proxy--ariel--Tunnel"
```
Make note of the managed-tunnel@ string. In this example it is managed-tunnel@18376953612552673763, which is the name of the service that manages the tunnel for that host.
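The service name can also be pulled out of the grep output with a short script. The Python sketch below assumes that the identifier after managed-tunnel@ is numeric, as in the example output above. The name tunnel_units is hypothetical.

```python
import re

# The identifier is assumed to be all digits, as in the example output.
UNIT_RE = re.compile(r"managed-tunnel@\d+")


def tunnel_units(grep_output):
    """Return the unique managed-tunnel@<id> names in order of appearance."""
    matches = UNIT_RE.findall(grep_output)
    return list(dict.fromkeys(matches))
```

Each returned name can be passed to systemctl status or systemctl restart unchanged.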

Review the status by using systemctl status command on the service name:
Replace <managed-tunnel> with the managed-tunnel you got from the previous step.

systemctl status <managed-tunnel>

Command and output example:

systemctl status managed-tunnel@18376953612552673763
● managed-tunnel@18376953612552673763.service - SSH tunnel created and managed by the Tunnel Manager service
   Loaded: loaded (/etc/systemd/system/managed-tunnel@.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2023-10-30 11:04:02 EDT; 4 days ago
 Main PID: 16362 (ssh)
   CGroup: /system.slice/system-managed\x2dtunnel.slice/managed-tunnel@18376953612552673763.service
           └─16362 /usr/bin/ssh -N -T -o ServerAliveInterval=60 -o ExitOnForwardFailure=yes -o Compression=no -L localhost:32008:localhost:32006 root@A.B.C.D
Nov 03 10:40:13 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:18 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:18 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:18 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:19 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:19 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:21 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:21 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:21 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:21 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused

A status like the above means that the tunnel could not establish a proper connection.
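Note that the Active line reads active (running) in this failing example, so the service state alone does not reveal the problem. The journal lines do. The following Python sketch filters saved status output for the refused-channel message shown above. The name refused_channel_lines is hypothetical, and the marker text is copied from the example.

```python
# Message text copied from the example status output above.
REFUSED_MARKER = "open failed: connect failed: Connection refused"


def refused_channel_lines(status_output):
    """Return the journal lines that report a refused tunnel channel."""
    return [
        line.strip()
        for line in status_output.splitlines()
        if REFUSED_MARKER in line
    ]
```

An empty list after a restart is consistent with a tunnel that established properly, although it is not proof on its own.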

Restart the tunnel by using systemctl restart as in the example below:
```
systemctl restart managed-tunnel@18376953612552673763
```

Run the status command again:

systemctl status <managed-tunnel>

A properly established tunnel should look like:

systemctl status managed-tunnel@18376953612552673763
● managed-tunnel@18376953612552673763.service - SSH tunnel created and managed by the Tunnel Manager service
   Loaded: loaded (/etc/systemd/system/managed-tunnel@.service; static; vendor preset: disabled)
   Active: active (running) since Fri 2023-11-03 20:45:47 EDT; 1min 48s ago
 Main PID: 38045 (ssh)
   CGroup: /system.slice/system-managed\x2dtunnel.slice/managed-tunnel@18376953612552673763.service
           └─38045 /usr/bin/ssh -N -T -o ServerAliveInterval=60 -o ExitOnForwardFailure=yes -o Compression=no -L localhost:32008:localhost:32006 root@A.B.C.D

Nov 03 20:45:47 hostname systemd[1]: Started SSH tunnel created and managed by the Tunnel Manager service.

Note: Configuring password expiry for the root account is not supported in QRadar. If the root password is changed, you must restart the tunnel-manager service on the QRadar console from the command line to re-establish the console connection with the managed host.

Result
The IO Errors are no longer present on the searches.

Product: IBM Security QRadar SIEM. Platform: Linux. Versions: 7.3.3, 7.4.3, 7.5.0.
