Troubleshooting
Problem
A red bar with the error message "An IO Error occurred on server(s) x.x.x.x. Please try again." is displayed while running searches.
Symptom
When running historical searches, a red bar is displayed on the results page. It displays the message: An IO Error occurred on server(s) hostname. Please try again. The hostname or IP address that is displayed in the message is likely that of the appliance that is presenting the issue. Applying filters to the search by using the "Event Processor" parameter might eliminate the error.
On encrypted managed hosts, the error varies and looks similar to the following:
An IO Error occurred on server(s) 127.0.0.1:32013. Please try again.
Note: The specific port number referenced in the error changes depending on which managed host is causing the communication problem.
Cause
Receiving the "An IO Error occurred on server(s) hostname. Please try again." message while running searches indicates that the Ariel database is not accessible on one or more managed hosts.
The hostname or IP address that is displayed in the error message does not always match the host or hosts that are experiencing the problem, because several different conditions can cause this error message to be displayed.
Note: Whether a hostname or an IP address is included in the message can depend on your name resolution configuration and whether or not the managed host is encrypted.
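When many of these messages are collected (for example from user reports), it can help to split them into host and port before diagnosing. The following is a minimal sketch, not a QRadar tool: the function name parse_io_error is hypothetical, the separator used between multiple servers is an assumption, and treating a loopback address as a hint of an encrypted managed host is based only on the examples in this article.

```python
import re

MESSAGE = re.compile(r"An IO Error occurred on server\(s\)\s*(.+?)\.\s*Please try again")
LOOPBACK = {"127.0.0.1", "localhost"}


def parse_io_error(message):
    """Return (host, port, loopback) tuples for the servers named in the message."""
    match = MESSAGE.search(message)
    if match is None:
        return []
    servers = []
    # Assumption: if several servers are listed, they are separated by commas or spaces.
    for entry in re.split(r"[,\s]+", match.group(1).strip()):
        host, sep, port = entry.rpartition(":")
        if sep and port.isdigit():
            servers.append((host, int(port), host in LOOPBACK))
        else:
            # No port in the message, as in the unencrypted examples.
            servers.append((entry, None, entry in LOOPBACK))
    return servers


print(parse_io_error("An IO Error occurred on server(s) 127.0.0.1:32013. Please try again"))
```

A loopback entry with a port suggests that the request went through a local tunnel, so the port, not the address, is what identifies the managed host.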
Diagnosing The Problem
When you are running a historical search, the console proxies your search requests to other managed hosts involved depending on your specified filters. An IO Error indicates that one or more of your managed hosts are not responding to these search requests.
Identifying the correct host or hosts that are experiencing the issue is the first step in effectively troubleshooting this problem.
You can use a combination of the methods below to help you identify the managed host that is experiencing the issue. If you are aware of managed hosts with issues, you can skip to the Checking the Security Data Distribution tab section for the verification step only.
Search Details
Clicking the More Details link in the results section gives you a clearer picture of how your managed hosts are responding to your search:
In this example, the host test-ep ran its search on no data files and the search duration was also zero:
Based on this result, test-ep is the host that is experiencing the problem. In a more realistic example, it can be necessary to verify your findings.
Reviewing the QRadar logs
If your managed hosts are not encrypted, or you have a mix of encrypted and unencrypted managed hosts in your environment, QRadar logs can be useful in identifying the managed host that has the problem.
Run the search by using the following filters to identify the actual error message:
- Event Processor: your console
- Quick Filter: MappingFactory
- Time window: A time range that includes the last time that you received the IO Error
If the managed host experiencing the Ariel database issue is not encrypted, the raw event includes its hostname. Drill into the event and find the raw text:
Sep 14 14:56:23 127.0.0.1 [aqw_remote_2:4bac31ad-5cb4-47c8-8f0d-280d3bcb3d10] com.q1labs.ariel.searches.tasks.ServiceTaskBase: [ERROR] [NOT:0000003000][198.51.100.2/- -] [-/- -]Can't communicate to server [test-ep:32006] executing query:Id:4bac31ad-5cb4-47c8-8f0d-280d3bcb3d10, DB:<events@/store/ariel/events/records, /store/ariel/events/payloads>, Time:<16-09-14,14:55:23 to 16-09-14,14:56:00>, Criteria=<DeviceType:[368,368]>, MappingFactory=com.q1labs.core.types.event.mapping.NormalizedEventMappingFactory@4ee, processedRecordsLimit=2147483647, executionTimeLimit=9223372036854775807, collectedRecordsLimit=2147483647, prio=NORMAL
If the issue is on an encrypted managed host:
The error on the GUI references a local address and port, as described in the Symptom section.
The failing managed host can be found by:
- Checking the /etc/tunnel_manager/config/deployment.json file and converting it to a readable format:
cat /etc/tunnel_manager/config/deployment.json | python -m json.tool | less -i
- While in less, search for the port mentioned in the IO error (port 32006 in this example). The remote IP is listed along with the port. The configuration for the tunnel should look like:
{
    "bind_address": "localhost",
    "component": "ariel_proxy--ariel--Tunnel",
    "compression": false,
    "destination_host": "localhost",
    "destination_port": 32006,
    "direction": "local",
    "remote_host": "A.B.C.D",
    "remote_user": "root",
    "source_port": 32006
}
Where "remote_host": "A.B.C.D" is the IP of the managed host.
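The manual search in less can also be scripted. The sketch below is an illustration only: the overall layout of deployment.json is not documented here, so the code walks the parsed JSON recursively instead of assuming where tunnel objects live. The function name find_tunnels and the "tunnels" wrapper key in the sample are hypothetical, and matching the error port against "source_port" is an assumption based on the example above.

```python
import json


def find_tunnels(node, port, component="ariel_proxy--ariel--Tunnel"):
    """Recursively collect tunnel objects for `component` whose source_port equals `port`."""
    matches = []
    if isinstance(node, dict):
        if node.get("source_port") == port and node.get("component") == component:
            matches.append(node)
        for value in node.values():
            matches.extend(find_tunnels(value, port, component))
    elif isinstance(node, list):
        for item in node:
            matches.extend(find_tunnels(item, port, component))
    return matches


# Sample data built from the tunnel object shown above; the "tunnels" key is made up.
sample = json.loads("""
{"tunnels": [{
    "bind_address": "localhost",
    "component": "ariel_proxy--ariel--Tunnel",
    "compression": false,
    "destination_host": "localhost",
    "destination_port": 32006,
    "direction": "local",
    "remote_host": "A.B.C.D",
    "remote_user": "root",
    "source_port": 32006
}]}
""")

for tunnel in find_tunnels(sample, 32006):
    print(tunnel["remote_host"])
```

On a console, the same function could be applied to the result of json.load() on /etc/tunnel_manager/config/deployment.json in place of the sample data.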
On an encrypted host, the raw event contains localhost as the hostname:
Sep 14 14:23:23 127.0.0.1 [aqw_remote_2:dd380d0d-ad31-4497-a9d3-81224cbd4b6b] com.q1labs.ariel.searches.tasks.ServiceTaskBase: [ERROR] [NOT:0000003000][198.51.100.2/- -] [-/- -]Can't communicate to server [localhost:32006] executing query:Id:dd380d0d-ad31-4497-a9d3-81224cbd4b6b, DB:<events@/store/ariel/events/records, /store/ariel/events/payloads>, Time:<16-09-14,14:22:23 to 16-09-14,14:23:00>, Criteria=<DeviceType:[368,368]>, MappingFactory=com.q1labs.core.types.event.mapping.NormalizedEventMappingFactory@4ee, processedRecordsLimit=2147483647, executionTimeLimit=9223372036854775807, collectedRecordsLimit=2147483647, prio=NORMAL
Note: Regardless of the encryption setting of your managed host, you should make a note of hostname information from these raw events, as it is useful when verifying connectivity as described in the Resolving the Problem section.
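To note the hostname and port from many raw events at once, the server name can be extracted with a regular expression. This is a minimal sketch that assumes the "Can't communicate to server [host:port]" wording seen in the raw events above; the function name unreachable_server is hypothetical, and the sample strings are shortened copies of those events.

```python
import re

# Matches the bracketed host:port that follows the error text.
# \s* tolerates a line break after the opening bracket.
SERVER = re.compile(r"Can't communicate to server \[\s*([^\s:\]]+):(\d+)\]")


def unreachable_server(raw_event):
    """Return (host, port) from a raw event, or None if the text is not present."""
    match = SERVER.search(raw_event)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


unencrypted = "[ERROR] [NOT:0000003000][198.51.100.2/- -] [-/- -]Can't communicate to server [test-ep:32006] executing query"
encrypted = "[ERROR] [NOT:0000003000][198.51.100.2/- -] [-/- -]Can't communicate to server [localhost:32006] executing query"

print(unreachable_server(unencrypted))
print(unreachable_server(encrypted))
```

A result with localhost as the host points to an encrypted managed host, in which case the port is the value to look up in the tunnel configuration.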
Eliminating Event Processors
The IO Error is displayed only when you are searching on the managed host experiencing the issue. In our example, setting a filter to show events only from the console eliminates the IO error:
Therefore, you can set filters on your search to help you identify which managed hosts are experiencing the issue. Try filtering on the managed hosts that you previously identified when checking the search details. If the Ariel database of the managed host or hosts that you are filtering on is not accessible, you receive the same IO error:
Checking the Security Data Distribution tab
When you have an idea about which managed host or hosts are experiencing an issue, you can verify your conclusion by checking the Security Data Distribution tab of the System Information window. Open this tab by clicking Admin > System Configuration > System and License Management > Systems. When the System and License Management window opens, select the suspected managed host, and click Actions > View and Manage System. The System and License Details window opens with the Security Data Distribution tab selected by default. If the Ariel database is not accessible, the following warning is displayed:
Resolving The Problem
Note: For a video guide on how to troubleshoot IO errors, follow this article: How to troubleshoot IO errors when searching on QRadar
After you identify which Event Processor is experiencing the issue, you need to restore access to its Ariel database. This is not always trivial. Below are some basic resolution steps that can help address the most common causes before you contact IBM support for further assistance.
Warning: The full configuration deployment that is recommended in some of the steps below restarts the services on all of your managed hosts, which results in a brief service interruption. Take this interruption into consideration before you deploy the full configuration. To perform a Full Deploy, go to the Admin tab on the UI and click Advanced > Deploy Full Configuration.
- Open an SSH connection to your Console with the root account.
- Create an SSH connection to the managed host that you identified in the Diagnosing the Problem section.
Example:
[root@test-console ~]# ssh test-ep
- Verify that the Ariel Query Server is running on this managed host:
Example:
[root@qradarep750 ~]# systemctl status ariel_query_server
● ariel_query_server.service - Ariel Query Server
   Loaded: loaded (/usr/lib/systemd/system/ariel_query_server.service; static; vendor preset: disabled)
  Drop-In: /etc/systemd/system/ariel_query_server.service.d
           └─ulimit.conf
   Active: active (running) since Thu 2022-09-15 08:50:36 EDT; 3h 19min ago
  Process: 24057 ExecStartPre=/opt/qradar/systemd/bin/generate_environment.sh ariel_query_server ariel (code=exited, status=0/SUCCESS)
  Process: 17885 ExecStartPre=/opt/qradar/systemd/bin/console_check.sh -r (code=exited, status=0/SUCCESS)
 Main PID: 29483 (java)
- If the Ariel Query Server is running, verify that it is listening on the port that is identified in the Diagnosing the Problem section:
Example:
[root@test-ep ~]# netstat -nalp | grep 32006
tcp 0 0 :::32006 :::* LISTEN 13732/ariel
- If your Ariel Query Server is listening on the specified port, verify the connectivity from the console to the managed host on that specific port. For unencrypted hosts, use the hostname or IP address of the managed host. For encrypted hosts, use localhost.
Example for unencrypted hosts:
[root@test-console ~]# telnet test-ep 32006
Example for encrypted hosts:
[root@test-console ~]# telnet localhost 32006
If you do not receive a message indicating a successful connection, the most likely reason is a firewall blocking the traffic for the Ariel port.
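The listening and connectivity checks in the steps above can be approximated in a script, which is useful where telnet is not installed. This is a sketch under assumptions rather than a supported tool: the function names are hypothetical, is_listening expects netstat output in the column layout shown in the example, and port_reachable only tests that a TCP connection can be opened.

```python
import socket


def is_listening(netstat_output, port):
    """Return True if a netstat line shows a LISTEN socket bound to exactly `port`."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: proto, recv-q, send-q, local address, foreign address, state, pid/program
        if (len(fields) >= 6 and fields[5] == "LISTEN"
                and fields[3].rsplit(":", 1)[-1] == str(port)):
            return True
    return False


def port_reachable(host, port, timeout=5.0):
    """Try to open a TCP connection, as telnet would, and report whether it succeeded."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


sample = "tcp 0 0 :::32006 :::* LISTEN 13732/ariel"
print(is_listening(sample, 32006))  # True
```

Comparing the local address column exactly avoids the false matches that a plain grep for 32006 can produce on other ports or process IDs containing the same digits. From the console, port_reachable("test-ep", 32006) corresponds to the unencrypted telnet example and port_reachable("localhost", 32006) to the encrypted one.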
The following steps apply only to encrypted managed hosts.
- Verify that the tunnel is in a proper state. After the managed host IP is identified, review the service that manages its tunnel. To find the service name, run the following command. Replace <Managed Host IP> with the IP address of the failing host that you identified.
grep -C8 '<Managed Host IP>' /etc/tunnel_manager/tunnels/managed-tunnel\@* | egrep '"ariel_proxy--ariel--Tunnel"'
The output should look like the following:
grep -C8 '<Managed Host IP>' /etc/tunnel_manager/tunnels/managed-tunnel\@* | egrep '"ariel_proxy--ariel--Tunnel"'
/etc/tunnel_manager/tunnels/managed-tunnel@18376953612552673763-Component = "ariel_proxy--ariel--Tunnel"
Make note of the managed-tunnel@ string. In this example it is managed-tunnel@18376953612552673763, which is the name of the service that manages the tunnel for that host.
- Review the status by using the systemctl status command on the service name. Replace <managed-tunnel> with the service name that you got from the previous step.
systemctl status <managed-tunnel>
Command and output example:
systemctl status managed-tunnel@18376953612552673763
● managed-tunnel@18376953612552673763.service - SSH tunnel created and managed by the Tunnel Manager service
   Loaded: loaded (/etc/systemd/system/managed-tunnel@.service; static; vendor preset: disabled)
   Active: active (running) since Mon 2023-10-30 11:04:02 EDT; 4 days ago
 Main PID: 16362 (ssh)
   CGroup: /system.slice/system-managed\x2dtunnel.slice/managed-tunnel@18376953612552673763.service
           └─16362 /usr/bin/ssh -N -T -o ServerAliveInterval=60 -o ExitOnForwardFailure=yes -o Compression=no -L localhost:32008:localhost:32006 root@A.B.C.D
Nov 03 10:40:13 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:18 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:18 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:18 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:19 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:19 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:21 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:21 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:21 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
Nov 03 10:40:21 hostname ssh[16362]: channel 2: open failed: connect failed: Connection refused
A status like the above means that the tunnel could not establish a proper connection.
- Restart the tunnel by using systemctl restart as in the example below:
systemctl restart managed-tunnel@18376953612552673763
- Run the status command again:
systemctl status <managed-tunnel>
systemctl status managed-tunnel@18376953612552673763
● managed-tunnel@18376953612552673763.service - SSH tunnel created and managed by the Tunnel Manager service
   Loaded: loaded (/etc/systemd/system/managed-tunnel@.service; static; vendor preset: disabled)
   Active: active (running) since Fri 2023-11-03 20:45:47 EDT; 1min 48s ago
 Main PID: 38045 (ssh)
   CGroup: /system.slice/system-managed\x2dtunnel.slice/managed-tunnel@18376953612552673763.service
           └─38045 /usr/bin/ssh -N -T -o ServerAliveInterval=60 -o ExitOnForwardFailure=yes -o Compression=no -L localhost:32008:localhost:32006 root@A.B.C.D
Nov 03 20:45:47 hostname systemd[1]: Started SSH tunnel created and managed by the Tunnel Manager service.
The IO errors are no longer present in the searches.
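The tunnel steps for encrypted managed hosts involve reading a service name out of grep output and judging a status listing by eye. The sketch below illustrates both, with hypothetical function names. It assumes the numeric service identifier format seen in the example, and its health check is only a heuristic based on the two status outputs shown in this article: the unit reported "active (running)" in both, so the "Connection refused" lines are what distinguished the broken tunnel.

```python
import re


def tunnel_service(grep_line):
    """Extract the managed-tunnel@<id> service name from one line of grep output."""
    match = re.search(r"managed-tunnel@\d+", grep_line)  # assumes a numeric id
    return match.group(0) if match else None


def tunnel_looks_healthy(status_text):
    """Heuristic: unit is active and the captured log lines show no refused channel."""
    active = "Active: active (running)" in status_text
    refused = "connect failed: Connection refused" in status_text
    return active and not refused


grep_line = ('/etc/tunnel_manager/tunnels/managed-tunnel@18376953612552673763'
             '-Component = "ariel_proxy--ariel--Tunnel"')
before = ("Active: active (running) since Mon 2023-10-30 11:04:02 EDT; 4 days ago\n"
          "Nov 03 10:40:13 hostname ssh[16362]: channel 2: open failed: "
          "connect failed: Connection refused")
after = ("Active: active (running) since Fri 2023-11-03 20:45:47 EDT; 1min 48s ago\n"
         "Nov 03 20:45:47 hostname systemd[1]: Started SSH tunnel created and "
         "managed by the Tunnel Manager service.")

print(tunnel_service(grep_line))
print(tunnel_looks_healthy(before), tunnel_looks_healthy(after))
```

Because systemctl status shows only recent log lines, a clean listing right after a restart does not prove that the tunnel works. Rerunning the search remains the real confirmation.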
Document Information
Modified date: 15 April 2024
UID: swg21991038