Learn how to archive your Event Streams Kafka data to Object Storage using SQL Query. This process, called stream landing, can be set up using the Terraform scripts provided in this post.
You can easily archive data to IBM Cloud Object Storage for long-term storage or to gain insight through interactive queries or big data analytics. You can achieve this through the Event Streams UI, where topics can be selected and linked to Cloud Object Storage buckets, with data automatically and securely streamed using the fully managed IBM Cloud SQL Query service. All data is stored in Parquet format, making it easy to manage and process. Check out “Streaming to Cloud Object Storage by using SQL Query” for more info.
In this post, you will set up the Cloud Object Storage stream landing using Terraform.
What is Terraform?
Terraform is an open-source “Infrastructure as Code” tool created by HashiCorp.
A declarative coding tool, Terraform enables developers to use a high-level configuration language called HCL (HashiCorp Configuration Language) to describe the desired “end-state” cloud or on-premises infrastructure for running an application. It then generates a plan for reaching that end-state and executes the plan to provision the infrastructure.
Let’s get started
If you have Terraform set up on your machine, follow the steps below:
- Open a terminal or command prompt on your machine, clone the GitHub repository and move into the directory.
- Create the `local.env` file from the template file provided in the repo and update the environment variables accordingly. Once updated, source the file.
- You can now run the individual Terraform commands to provision the required IBM Cloud services.
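Put together, the steps above look roughly like this. The repository URL is the one used later in this post; the template filename is an assumption, so check the repo for the actual name:

```shell
# Clone the stream landing scripts and move into the directory
git clone https://github.com/IBM-Cloud/stream-landing-terraform.git
cd stream-landing-terraform

# Create local.env from the template provided in the repo
# (the template filename here is an assumption; check the repo),
# edit the environment variables, then source the file
cp local.env.template local.env
source local.env

# Run the individual Terraform commands
terraform init    # download the required providers
terraform plan    # preview the resources to be created
terraform apply   # provision the IBM Cloud services
```

`terraform plan` is optional but recommended, so you can review the changes before `terraform apply` creates any billable resources.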
Use the IBM Cloud Schematics UI
Alternatively, you can use the IBM Schematics UI. You don’t need to install anything on your machine:
- Navigate to Schematics Workspaces on IBM Cloud and click on Create workspace.
- Under the Specify Template section, provide https://github.com/IBM-Cloud/stream-landing-terraform under GitHub or GitLab repository URL.
- Select terraform_v0.14 as the Terraform version and click Next.
- Provide the workspace name (`stream-landing`) and choose a resource group and location.
- Click Next and then click Create.
- You should see the Terraform variables section. Fill in the variables as required by clicking the action menu next to each variable.
- Scroll to the top of the page to Generate (terraform plan) and Apply (terraform apply) the changes.
- Click Apply plan and check the progress under the Log. (Generate plan is optional.)
To understand more about Terraform and IBM Cloud Schematics, check out the blog post “Provision Multiple Instances in a VPC Using Schematics.” In short, you can run any Terraform script simply by pointing Schematics to the Git repository that contains it.
This is what the Terraform scripts do:
- Create a new resource group and provision resources under the group.
- Create a Key Protect service with a root key.
- Provision an Event Streams service with a topic.
- Provision a Cloud Object Storage service with a bucket.
- Provision a SQL Query service for stream landing.
- Set up the stream landing permissions and authorizations.
Test stream landing
To produce messages to the Event Streams service, you can use tools like kcat (formerly kafkacat) or the Event Streams sample producer.
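For example, you can produce a few test messages with kcat. The broker list and API key come from your Event Streams service credentials; the environment variable names and the topic name below are assumptions, so substitute your own values:

```shell
# Produce three test messages to the Event Streams topic with kcat.
# KAFKA_BROKERS and ES_API_KEY are placeholders for your service credentials;
# Event Streams authenticates via SASL/PLAIN with the literal username "token".
echo "message-1
message-2
message-3" | kcat -P \
  -b "$KAFKA_BROKERS" \
  -X security.protocol=sasl_ssl \
  -X sasl.mechanisms=PLAIN \
  -X sasl.username=token \
  -X sasl.password="$ES_API_KEY" \
  -t stream-landing-topic
```

Once messages arrive on the topic, the stream landing job picks them up and writes them to the bucket: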
- Verify that the specified prefix in IBM Cloud Object Storage is filled with Parquet objects by navigating to the Object Storage service under your resources.
- Check the status of all streaming jobs in the SQL Query UI.
- Alternatively, use the REST API of SQL Query to get the list and the details of running stream landing jobs.
- In the Event Streams UI, you also get information about the active stream landing jobs per topic. Using Event Streams, you can view and stop the landing configuration.
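The REST API check in the steps above can be sketched as follows. The IAM token endpoint is the standard IBM Cloud one; the stream job endpoint path, region host, and the CRN variable are assumptions, so consult the SQL Query REST API reference for the exact URL:

```shell
# Exchange an IBM Cloud API key for an IAM bearer token
TOKEN=$(curl -s -X POST "https://iam.cloud.ibm.com/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=$IBMCLOUD_API_KEY" \
  | jq -r .access_token)

# List the stream landing jobs of the SQL Query instance.
# The endpoint path and query parameter below are assumptions;
# check the SQL Query REST API documentation for your region.
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://api.sql-query.cloud.ibm.com/v2/stream_jobs?instance_crn=$SQL_QUERY_CRN"
```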
Further reading
- Process big data logs with SQL
- Tutorial: Stream Landing from Event Streams Kafka Service to IBM Cloud Data Lake on Object Storage
If you have any queries, feel free to reach out to me on Twitter or on LinkedIn.