December 7, 2021 By Vidyasagar Machupalli 3 min read

Learn how to archive your Event Streams Kafka data to Object Storage using SQL Query. This process, called stream landing, can be set up using the Terraform scripts provided in this post. 

You can easily archive data to IBM Cloud Object Storage for long-term storage or to gain insight by leveraging interactive queries or big data analytics. You can achieve this through the Event Streams UI, where topics can be selected and linked to Cloud Object Storage buckets, with data automatically and securely streamed using the fully-managed IBM Cloud SQL Query service. All data is stored in Parquet format, making it easy to manage and process. Check out “Streaming to Cloud Object Storage by using SQL Query” for more info. 

In this post, you will set up the Cloud Object Storage stream landing using Terraform. 

What is Terraform?

Terraform is an open-source “Infrastructure as Code” tool created by HashiCorp.

A declarative coding tool, Terraform enables developers to use a high-level configuration language called HCL (HashiCorp Configuration Language) to describe the desired “end-state” cloud or on-premises infrastructure for running an application. It then generates a plan for reaching that end-state and executes the plan to provision the infrastructure.

Figure: Streaming to Cloud Object Storage by using SQL Query.

Let’s get started 

If you have Terraform set up on your machine, follow the steps below:

  1. Open a terminal or command prompt on your machine, clone the GitHub repository and move to the directory:
    git clone https://github.com/IBM-Cloud/stream-landing-terraform
    cd stream-landing-terraform
  2. Create the local.env file from the template file provided in the repo and update the environment variables accordingly. Once updated, source the file:
    cp template.local.env local.env
    source local.env
  3. You can now run the individual Terraform commands to provision the required IBM Cloud services:
    terraform init 
    terraform plan 
    terraform apply
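
The three commands above can also be wrapped in a small shell function that fails early and applies exactly the plan that was generated. This is a minimal sketch, not part of the repository: the function name provision and the variable name TF_VAR_ibmcloud_api_key are assumptions for illustration, so check template.local.env for the variable names the scripts actually expect.

```shell
# Sketch only. TF_VAR_ibmcloud_api_key is an ASSUMED variable name;
# template.local.env in the repository lists the real ones.
provision() {
  # Fail early if Terraform is missing or local.env was not sourced.
  command -v terraform >/dev/null 2>&1 || {
    echo "terraform not found on PATH" >&2
    return 1
  }
  [ -n "${TF_VAR_ibmcloud_api_key:-}" ] || {
    echo "TF_VAR_ibmcloud_api_key is not set; did you source local.env?" >&2
    return 1
  }

  terraform init &&
    terraform plan -out=tfplan &&  # save the plan to a file
    terraform apply tfplan         # apply exactly the saved plan
}
```

After sourcing local.env, run provision from the stream-landing-terraform directory. Note that terraform apply does not ask for confirmation when it is given a saved plan file, so review the output of the plan step before relying on this in practice.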

Use the IBM Cloud Schematics UI

Alternatively, you can use the IBM Cloud Schematics UI, which doesn’t require you to install anything on your machine:

  1. Navigate to Schematics Workspaces on IBM Cloud and click on Create workspace.
  2. Under the Specify Template section, provide https://github.com/IBM-Cloud/stream-landing-terraform under GitHub or GitLab repository URL.
  3. Select terraform_v0.14 as the Terraform version and click Next.
  4. Provide the workspace name — stream-landing — and choose a resource group and location.
  5. Click Next and then click Create.
  6. You should see the Terraform variables section. Fill in the variables as per your requirement by clicking the action menu next to each of the variables.
  7. Scroll to the top of the page, where you can find the Generate plan (terraform plan) and Apply plan (terraform apply) actions.
  8. Click Apply plan and check the progress under the Log. (Generate plan is optional.)

To learn more about Terraform and IBM Cloud Schematics, see the blog post “Provision Multiple Instances in a VPC Using Schematics.” In short, you can run any Terraform script simply by pointing to the Git repository that contains it.

This is what the Terraform scripts do:

  1. Create a new resource group and provision resources under the group.
  2. Create a Key Protect service with a root key.
  3. Provision an Event Streams service with a topic.
  4. Provision a Cloud Object Storage service with a bucket.
  5. Provision a SQL Query service for stream landing.
  6. Set up the permissions and authorizations required for stream landing.
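
If you ran the scripts locally, one way to see what was actually created is to list the resources recorded in the Terraform state. The helper below is a hypothetical convenience wrapper around terraform state list, not something the repository provides; the resource addresses it prints depend on how the scripts name their resources.

```shell
# Run from the stream-landing-terraform directory after `terraform apply`.
list_created_resources() {
  # Optional first argument: a case-insensitive pattern to narrow the list.
  # Without an argument, every resource address in the state is printed.
  pattern="${1:-.}"
  terraform state list | grep -i -- "$pattern"
}
```

For example, list_created_resources prints every address, while list_created_resources bucket prints only the addresses that contain "bucket".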

Test stream landing

To produce messages to the Event Streams service, you can use tools like kcat (formerly Kafkacat) or the Event Streams sample producer. Once messages are flowing, check that they are landing:

  1. Verify that the specified prefix in IBM Cloud Object Storage is filled with Parquet objects by navigating to the Object Storage service under your resources.
  2. Check the status of all streaming jobs in the SQL Query UI. 
  3. Alternatively, use the REST API of SQL Query to get the list and the details of running stream landing jobs. 
  4. In the Event Streams UI, you also get information about the active stream landing jobs per topic. Using Event Streams, you can view and stop the landing configuration.
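
For a quick test with kcat, the producer side might look like the sketch below. The function name and the environment variable names ES_BROKERS, ES_APIKEY and ES_TOPIC are hypothetical; take the broker list and API key from your Event Streams service credentials and the topic name from the Terraform variables you set. The sketch assumes SASL PLAIN authentication with the user name token and simple JSON payloads.

```shell
# ASSUMED environment variables (illustrative names, not from the repository):
#   ES_BROKERS  comma-separated bootstrap servers from the service credentials
#   ES_APIKEY   API key from the same service credentials
#   ES_TOPIC    topic created by the Terraform scripts
produce_test_messages() {
  count="${1:-10}"  # number of messages to send, default 10
  i=1
  # Emit one JSON document per line; kcat -P sends each line as one message.
  while [ "$i" -le "$count" ]; do
    printf '{"id": %d, "message": "stream landing test"}\n' "$i"
    i=$((i + 1))
  done | kcat -P \
    -b "$ES_BROKERS" \
    -t "$ES_TOPIC" \
    -X security.protocol=SASL_SSL \
    -X sasl.mechanisms=PLAIN \
    -X sasl.username=token \
    -X sasl.password="$ES_APIKEY"
}
```

Calling produce_test_messages 100 sends 100 small JSON messages to the topic. You can then use the checks listed above to confirm that new Parquet objects appear under the configured prefix.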

If you have any questions, feel free to reach out to me on Twitter or on LinkedIn.
