Connecting to an Amazon Athena data source

Connect the Amazon Athena data source to IBM Security QRadar® Suite Software to enable your applications and dashboards to collect and analyze Amazon Athena security data. Universal Data Insights connectors enable federated search across your security products.

Before you begin

Collaborate with an AWS administrator to obtain a user account with access to query the Amazon Athena data source.

Configure VPC Flow Logs in Amazon Athena

Enable VPC flow logs in the Amazon console.
Configure the VPC flow log service to save logs in an Amazon S3 bucket. For more information, see Publishing flow logs to Amazon S3 (https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-s3.html).
Create an Amazon VPC table for VPC flow logs in the Amazon Athena service. For more information, see Querying Amazon VPC Flow Logs (https://docs.aws.amazon.com/athena/latest/ug/vpc-flow-logs.html).

Configure Amazon GuardDuty in Amazon Athena

Enable the GuardDuty features in the Amazon console.
Configure the GuardDuty feature to export findings to an Amazon S3 bucket. For more information, see Export Findings (https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_exportfindings.html).
Create a table for GuardDuty findings in Amazon Athena. For more information, see Querying Amazon GuardDuty Findings (https://docs.aws.amazon.com/athena/latest/ug/querying-guardduty.html).

Configure Amazon Security Lake in Amazon Athena

Enable and start Amazon Security Lake in the Amazon console. For more information, see Getting Started with Amazon Security Lake (https://docs.aws.amazon.com/security-lake/latest/userguide/getting-started.html).
Ensure that Amazon Security Lake stores logs in the Open Cybersecurity Schema Framework (OCSF) format. For more information, see Open Cybersecurity Schema Framework (OCSF) format (https://schema.ocsf.io/).
The client program must have query access to the AWS Lake Formation tables as a subscriber. The following list contains the minimum IAM permissions for the Amazon Athena connector:
- "athena:GetQueryExecution"
- "athena:GetQueryResults"
- "athena:ListWorkGroups"
- "athena:StartQueryExecution"
- "athena:StopQueryExecution"
- "glue:GetDatabases"
- "glue:GetTable"
- "s3:AbortMultipartUpload"
- "s3:DeleteObject"
- "s3:GetBucketLocation"
- "s3:GetObject"
- "s3:ListBucket"
- "s3:ListBucketMultipartUploads"
- "s3:ListMultipartUploadParts"
- "s3:PutObject"
- "sts:AssumeRole"
For more information, see Subscriber management in Amazon Security Lake (https://docs.aws.amazon.com/security-lake/latest/userguide/subscriber-management.html).
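
The permission list above can be written as an IAM policy document. The following Python sketch builds one as an illustration only. The statement ID, the function name, and the wildcard resource are assumptions that do not come from this topic; in practice, an AWS administrator would normally scope the Resource element to specific workgroups, catalogs, and S3 buckets.

```python
import json

# Minimum actions that this topic lists for the Amazon Athena connector.
CONNECTOR_ACTIONS = [
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
    "athena:ListWorkGroups",
    "athena:StartQueryExecution",
    "athena:StopQueryExecution",
    "glue:GetDatabases",
    "glue:GetTable",
    "s3:AbortMultipartUpload",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
    "sts:AssumeRole",
]


def build_policy(resources=("*",)):
    """Return an IAM policy document (as a dict) that allows the listed actions.

    The default "*" resource is a placeholder assumption, not a recommendation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AthenaConnectorMinimum",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(CONNECTOR_ACTIONS),
                "Resource": list(resources),
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so that it can be reviewed before use.
    print(json.dumps(build_policy(), indent=2))
```
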

If you have a firewall between your cluster and the data source target, use the IBM® Security Edge Gateway to host the containers. The Edge Gateway must be V1.6 or later. For more information, see Edge Gateway.

About this task

Amazon Athena uses standard SQL to analyze data in Amazon S3. QRadar Suite Software currently supports data source connections for Amazon GuardDuty logs and VPC flow logs.

Structured Threat Information eXpression (STIX) is a language and serialization format that organizations use to exchange cyberthreat intelligence. The connector uses STIX patterning to query Amazon Athena data and returns results as STIX objects. For more information about how the Amazon Athena data schema maps to STIX, see Amazon Athena stix-shifter repository (https://github.com/opencybersecurityalliance/stix-shifter/tree/develop/stix_shifter_modules/aws_athena).
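
To illustrate what a STIX pattern looks like, the following Python sketch builds a pattern that matches an IPv4 address within the last few minutes. It uses generic STIX 2 patterning syntax (a comparison expression in square brackets followed by START and STOP qualifiers). The function names are hypothetical, and whether a given property such as ipv4-addr:value is supported depends on the connector mapping in the stix-shifter repository.

```python
from datetime import datetime, timedelta, timezone


def stix_timestamp(moment):
    """Format a timezone-aware datetime as a STIX timestamp literal in UTC."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return "t'" + utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z'"


def ipv4_pattern(address, minutes, now=None):
    """Build a STIX pattern for an IPv4 address over the last `minutes` minutes.

    `now` can be passed in to make the result reproducible; it defaults to
    the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(minutes=minutes)
    return (
        f"[ipv4-addr:value = '{address}'] "
        f"START {stix_timestamp(start)} STOP {stix_timestamp(now)}"
    )


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only example address.
    print(ipv4_pattern("192.0.2.10", minutes=5))
```
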

Procedure

Log in to IBM Security QRadar Suite Software.
From the menu, click Connections > Integration data sources.
On the Integration data sources page, on the Amazon Athena tile, click Set up a connection.
On the Connection services page, select the Federated searches service tile, and then click Enable.
The available connection services include:
1. Connected assets & risk
2. Federated searches
Note: If there are multiple data sources to connect, select the connector from the Sources list, and then click Enable.
Click Next.

On the Connection details page, configure the following parameters.

Configure the connection to the data source.

Table 1. Connection parameters
Parameter	Description
Data source name	Enter a unique name to identify the data source connection. You can create multiple connections to a data source, so it is useful to clearly set them apart by name. Only alphanumeric characters and the following special characters are allowed: `- . _`
Data source description	Enter a description to indicate the purpose of the data source connection. You can create multiple connections to a data source, so it is useful to clearly indicate the purpose of each connection by description. Only alphanumeric characters and the following special characters are allowed: `- . _`
Edge gateway	If you have a firewall between your cluster and the data source target, use the Edge Gateway to host the containers. In the Edge gateway field, specify an Edge Gateway to host the connector. It can take up to five minutes for the status of newly deployed data source connections on the Edge Gateway to show as being connected.
Region	Enter the Amazon Athena region for the data source. Select your region code from the Region column of the Service Endpoints table in the Amazon Athena endpoints and quotas (https://docs.aws.amazon.com/general/latest/gr/athena.html).
Amazon S3 Bucket Location	Enter the location of the S3 bucket where query results will be stored.
VPC Flow Logs database name (optional)	If you are using Amazon Athena with VPC flow logs, specify the name of the database that contains the VPC flow logs in the VPC Flow Logs database name (optional) field.
VPC Flow Logs table name (optional)	If you are using Amazon Athena with VPC flow logs, specify the name of the table that contains the VPC flow logs in the VPC Flow Logs table name (optional) field.
Amazon GuardDuty database name (optional)	If you are using Amazon Athena with Amazon GuardDuty, specify the name of the database that contains the Amazon GuardDuty logs in the Amazon GuardDuty database name (optional) field.
Amazon GuardDuty table name (optional)	If you are using Amazon Athena with Amazon GuardDuty, specify the name of the table that contains the Amazon GuardDuty logs in the Amazon GuardDuty table name (optional) field.
OCSF logs database name (optional)	If you are using Amazon Athena to query Amazon Security Lake logs, specify the name of the AWS Lake Formation database that contains the security logs in the OCSF logs database name (optional) field.
OCSF logs table name (optional)	If you are using Amazon Athena to query Amazon Security Lake logs, specify the name of the AWS Lake Formation table that contains the security logs in the OCSF logs table name (optional) field.
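
The character restriction on the Data source name and Data source description fields can be checked before you enter a value. The following Python sketch is one possible check. It assumes that "alphanumeric" means ASCII letters and digits and that spaces are not allowed, which this topic does not state explicitly; the function name is hypothetical.

```python
import re

# ASCII letters and digits plus hyphen, period, and underscore.
# Assumption: spaces and non-ASCII characters are rejected.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_name(value):
    """Return True if `value` is non-empty and uses only the allowed characters."""
    return bool(_ALLOWED_NAME.fullmatch(value))


if __name__ == "__main__":
    for candidate in ("athena-prod_01.us", "athena prod", ""):
        print(repr(candidate), is_valid_name(candidate))
```
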

Set the query parameters to control the behavior of the federated search query on the data source.

Table 2. Query parameters
Query parameter	Description
Concurrent search limit	Enter the number of simultaneous connections that can be made to the data source. The default limit for the number of connections is 4. The value must not be less than 1 and must not be greater than 100.
Query search timeout limit	Enter the time limit in minutes for how long the query is run on the data source. The default time limit is 30. When the value is set to zero, no timeout occurs. The value must not be less than 1 and must not be greater than 120.
Result size limit	Enter the maximum number of entries or objects that are returned by the search query. The default result size limit is 10,000. The value must not be less than 1 and must not be greater than 500,000.
Query time range	Enter the time range in minutes for the search, represented as the last `X` minutes. The default is 5 minutes. The value must not be less than 1 and must not be greater than 10,000.
Custom mapping (Optional)	If you need to customize the STIX attributes mapping, click Customize attribute mapping and edit the JSON blob to map new or existing properties to their associated target data source fields.

Important: If you increase the Concurrent search limit and the Result size limit, a greater amount of data can be sent to the data source, which increases the strain on the data source. Increasing the query time range also increases the amount of data.
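
The defaults and ranges in Table 2 can be captured in a small validation helper. The following Python sketch is an illustration only; the key names and the function name are hypothetical and are not part of the product, and the bounds are copied from the table.

```python
# Default values from Table 2 (hypothetical key names).
DEFAULTS = {
    "concurrent_search_limit": 4,
    "query_search_timeout_minutes": 30,
    "result_size_limit": 10_000,
    "query_time_range_minutes": 5,
}

# Inclusive (minimum, maximum) bounds from Table 2.
LIMITS = {
    "concurrent_search_limit": (1, 100),
    "query_search_timeout_minutes": (1, 120),
    "result_size_limit": (1, 500_000),
    "query_time_range_minutes": (1, 10_000),
}


def validate_query_parameters(overrides=None):
    """Merge overrides into the defaults and report out-of-range values.

    Returns a (parameters, errors) tuple; `errors` is empty when every
    value falls within its documented range.
    """
    parameters = {**DEFAULTS, **(overrides or {})}
    errors = []
    for name, (low, high) in LIMITS.items():
        value = parameters[name]
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside the range {low}-{high}")
    return parameters, errors


if __name__ == "__main__":
    _, problems = validate_query_parameters({"result_size_limit": 750_000})
    print(problems)
```
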

Click Next.

On the Connection configurations page, configure identity and access.

Click Add a configuration.

In the Configuration details window, configure the following parameters.

Table 3. Configuration parameters
Parameter	Description
Configuration Name	Enter a unique name to describe the access configuration and distinguish it from the other access configurations for this data source connection that you might set up. Only alphanumeric characters and the following special characters are allowed: `- . _`
Configuration Description	Enter a unique description to describe the access configuration and distinguish it from the other access configurations for this data source connection that you might set up. Only alphanumeric characters and the following special characters are allowed: `- . _`
AWS Access key id, AWS secret access key, AWS IAM Role	Establish AWS authentication to enable access to the AWS search API. To establish an AWS key-based authentication, enter values for the AWS Access key id and AWS secret access key parameters. To establish an AWS role-based authentication, enter values for the AWS Access key id, AWS secret access key, and AWS IAM Role parameters.
External ID for AWS Assume Role	To grant access to your AWS resources and establish an Assume Role authentication, enter a value for the External ID for AWS Assume Role parameter. For more information, see Using an external ID for third-party access (https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html).

For more information about AWS authentication, see Configuring AWS authentication.
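
The difference between key-based and role-based authentication in Table 3 can be sketched as follows. This Python example only decides which mode a set of field values corresponds to and prepares the request parameters for the AWS STS AssumeRole operation (RoleArn, RoleSessionName, ExternalId). It assumes that the AWS IAM Role field holds a role ARN and that an external ID is meaningful only together with a role; the function names and the session name are hypothetical, and the sketch does not call AWS.

```python
def select_auth_mode(access_key_id, secret_access_key, iam_role=None, external_id=None):
    """Return "key" or "role" depending on which fields are filled in."""
    if not access_key_id or not secret_access_key:
        raise ValueError("AWS Access key id and AWS secret access key are both required")
    if external_id and not iam_role:
        # Assumption: an external ID applies only to Assume Role authentication.
        raise ValueError("An external ID requires an AWS IAM Role")
    return "role" if iam_role else "key"


def assume_role_arguments(iam_role, external_id=None, session_name="athena-connector"):
    """Build the request parameters for an AWS STS AssumeRole call."""
    arguments = {"RoleArn": iam_role, "RoleSessionName": session_name}
    if external_id:
        arguments["ExternalId"] = external_id
    return arguments


if __name__ == "__main__":
    # Placeholder values for illustration only.
    role = "arn:aws:iam::123456789012:role/ExampleRole"
    print(select_auth_mode("AKIAEXAMPLE", "example-secret", iam_role=role))
    print(assume_role_arguments(role, external_id="example-external-id"))
```
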

To save your configuration and establish the connection, click Save.

Click Next.
To assign user access, on the User access control page, select one or more data source configurations from the Access list, and then click Finish.
To manage your active connections, complete the following steps:
1. On the Integration data sources page, on the tile of the relevant data source, click Manage <x> of <x> active connections.
2. On the Connection status page, on the tile of the relevant data source, you can edit, refresh, or delete your data source connection.

Results

After you connect a data source, it might take up to 30 seconds to retrieve the data. Before the full data set is returned, the data source might display as unavailable. After the data is returned, the data source shows as being connected, and a polling mechanism runs to validate the connection status. The connection status is valid for 60 seconds after every poll.

You can add other connection configurations for this data source that have different users and different data access permissions.

What to do next

Test the connection by searching for an IP address in IBM Security Data Explorer that matches an asset data source. In Data Explorer, click an IP address to view its associated assets and risk.

To use Data Explorer, you must have data sources that are connected so that the application can run queries and retrieve results across a unified set of data sources. The search results vary depending on the data that is contained in your configured data sources. For more information about how to build a query in Data Explorer, see Build a query.