October 26, 2017 By Oriana Zambrano 2 min read

Streaming Analytics Updates: IBM Streams Runner for Apache Beam

The IBM Streaming Analytics service is a cloud-based service for IBM Streams, an analytics platform that lets you create applications that analyze data from a variety of sources in real time. Streaming Analytics continues to add enhancements that make it easy to create streaming applications however you choose. Previously, we announced integration with IBM Data Science Experience (DSX) for creating Streams applications in Python. Now you can also run an Apache Beam pipeline in Streaming Analytics.

Imagine you are asked to write an application for a website. The application needs to look at online users and their activity to identify popular content. You’ll need to look at logs, user clickstreams, and existing user data stored in a database. Which platform are you going to use to write this application: Apache Spark, Apache Flink, or IBM Streams? Why not write the app against a single interface and choose where to run it later?

This is the goal of Apache Beam, a unified programming model for both batch and streaming data processing. Like Streams, Beam lets you develop data processing applications from a set of functions that manipulate your data. Beam itself, however, provides only the programming model; you select the runtime platform by choosing a runner when you launch your application.
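
To make the "write once, pick the runner at launch" idea concrete, here is a toy model in plain Java. It is not the Beam API: `Runner`, `selectRunner`, and the `local` and `chunked` runner names are hypothetical stand-ins invented for this sketch. The pipeline logic (counting page views from a clickstream, as in the website scenario above) is written once and never mentions how it will be executed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class RunnerChoiceSketch {

    // Hypothetical stand-in for a runner: something that knows how to execute a pipeline.
    interface Runner {
        Map<String, Long> run(Function<List<String>, Map<String, Long>> pipeline,
                              List<String> input);
    }

    // The "pipeline": count views per page. It has no knowledge of any runner.
    static Map<String, Long> countViews(List<String> pages) {
        Map<String, Long> counts = new TreeMap<>();
        for (String page : pages) {
            counts.merge(page, 1L, Long::sum);
        }
        return counts;
    }

    // The runner is chosen by name at launch time, not when the pipeline is written.
    static Runner selectRunner(String name) {
        switch (name) {
            case "local":
                // Execute the whole pipeline in one step.
                return (pipeline, input) -> pipeline.apply(input);
            case "chunked":
                // Execute on two halves of the input and merge the partial counts.
                // This is only valid because per-page counts can be summed.
                return (pipeline, input) -> {
                    int mid = input.size() / 2;
                    Map<String, Long> merged =
                            new TreeMap<>(pipeline.apply(input.subList(0, mid)));
                    pipeline.apply(input.subList(mid, input.size()))
                            .forEach((page, n) -> merged.merge(page, n, Long::sum));
                    return merged;
                };
            default:
                throw new IllegalArgumentException("Unknown runner: " + name);
        }
    }

    public static void main(String[] args) {
        // First command-line argument selects the runner; defaults to "local".
        String runnerName = args.length > 0 ? args[0] : "local";
        List<String> clicks = Arrays.asList("/home", "/docs", "/home", "/pricing", "/home");
        Map<String, Long> views =
                selectRunner(runnerName).run(RunnerChoiceSketch::countViews, clicks);
        System.out.println(views);
    }
}
```

Both runners produce the same counts for the same input, which is the property a runner-independent programming model relies on. In a real Beam application the runner is selected through pipeline options rather than a hand-written switch.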

We’ve added the IBM Streams Runner for Apache Beam to the Streaming Analytics service so that you can run your Beam application on the Streams platform.

Beam on the Industry-leading IBM Streams Platform

IBM Streams offers a continuous, complete, and connected solution. If you use IBM Streams as your Beam runner, you’ll get a fast, stable, industry-leading platform. In addition, since the Streams runner can run in the cloud, you can develop Beam applications locally using the direct runner and then later deploy the applications to the Bluemix cloud.

No Streams Installation Required — The Streams runner allows you to directly send your applications to the Streaming Analytics service to be compiled and executed. This means there’s no need to install Streams on your system.

Interact with Beam pipelines with the newly updated Streams Console — Beam applications appear just like they are laid out in your source code. Additionally, you can view all custom metrics, console logs, data stream flow rates, and even congested streams.

Download today — The Streams Runner is now available to download through your existing Streaming Analytics service. Don’t have an existing service? Create one here.

IBM Streams Runner for Apache Beam Features

  • Support for Beam 2.0 Java SDK

  • Support for primitive and custom composite Beam transforms

  • Support for custom Beam metrics

    • Counter, Distribution, and Gauge types

    • Watermark metrics are automatically created for you

  • Support for processing-time and event-time timers and window triggers

  • Support for stateful processing

  • Support for custom parameters specified at application runtime

  • Integration into the Streams Platform

    • Submit Beam applications to a Streaming Analytics service with no local Streams installation required

    • Specify local data files to be available for your application in the Streaming Analytics service

    • Support for canceling a Streams job from the Beam application

    • View Beam Pipeline layouts in the Streams Graph

  • Specialized Beam SDK for Streams

    • Publish data streams for other Streams applications to use, or subscribe to data streams for your application to consume

    • Read files from and write files to an IBM Object Storage OpenStack Swift for Bluemix service
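
If the three custom metric types in the list above are unfamiliar, their semantics can be modeled in a few lines of plain Java. This is a toy model, not the Beam Metrics API: the `ToyCounter`, `ToyDistribution`, and `ToyGauge` classes are hypothetical, and only their method names loosely mirror the kind of operations each metric type supports. A counter accumulates increments, a distribution summarizes every value reported to it, and a gauge remembers only the latest value.

```java
final class MetricTypesSketch {

    /** Counter: a running sum of increments. */
    static final class ToyCounter {
        private long value;
        void inc(long n) { value += n; }
        long value() { return value; }
    }

    /** Distribution: summary statistics over every reported value. */
    static final class ToyDistribution {
        private long count;
        private long sum;
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;

        void update(long v) {
            count++;
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long count() { return count; }
        long min() { return min; }
        long max() { return max; }
        // Mean is defined as 0.0 when nothing has been reported yet.
        double mean() { return count == 0 ? 0.0 : (double) sum / count; }
    }

    /** Gauge: only the most recently reported value is kept. */
    static final class ToyGauge {
        private long latest;
        void set(long v) { latest = v; }
        long value() { return latest; }
    }

    public static void main(String[] args) {
        ToyCounter recordsSeen = new ToyCounter();
        ToyDistribution recordSizes = new ToyDistribution();
        ToyGauge lastRecordSize = new ToyGauge();

        // Report three made-up record sizes to all three metrics.
        for (long size : new long[] {120, 80, 100}) {
            recordsSeen.inc(1);
            recordSizes.update(size);
            lastRecordSize.set(size);
        }

        System.out.println("records seen: " + recordsSeen.value());
        System.out.println("size min/max/mean: " + recordSizes.min() + "/"
                + recordSizes.max() + "/" + recordSizes.mean());
        System.out.println("last size: " + lastRecordSize.value());
    }
}
```

As a rule of thumb, a counter fits "how many", a distribution fits "how big or how long, across all elements", and a gauge fits "what is the current value".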
