We are excited to announce the availability of Time Series Libraries in Watson Studio Spark Environments starting today (October 8, 2020).

This library, developed by IBM Research, includes a full set of time series functionality that is not available in any competing offering. It joins our IBM Research Assets, Geospatial functionality, Data Skipping, and Parquet Encryption libraries as a fully supported feature of Watson Studio Spark Environments.

The time series library lets users perform key operations on time series data, including construction of a collection of time series, imputation functions (like segmentation), transformers, reducers, joins, and machine learning functions (such as forecasting, clustering, and discriminatory sequence mining). The library supports several time series types, including numeric, categorical, and array types.

Examples of time series data include the following:

  • Stock share prices and trading volumes
  • Clickstream data
  • Electrocardiogram (ECG) data
  • Temperature or seismographic data
  • Network performance measurements
  • Network logs
  • Electricity usage as recorded by a smart meter and reported via an Internet of Things data feed

Key features of the Time Series Libraries in Watson Studio Spark Environments

I. Data model

  • A core data model for univariate and multivariate time series  
  • Time Reference Systems for handling different timestamp representations
  • Support for aperiodic, duplicate, and out-of-order timestamps
  • Spark RDD and DataFrame extensions for time series
  • Numeric and categorical time series
  • Lossless and lossy compression
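
To make the data model points above more concrete, here is a minimal sketch in plain Python of a time series container that accepts observations whose timestamps arrive out of order or repeat. It does not use the library's API; the `TimeSeries` and `Observation` names and the integer timestamps are assumptions made purely for illustration.

```python
from bisect import insort
from typing import NamedTuple


class Observation(NamedTuple):
    timestamp: int   # hypothetical: ticks in some time reference system
    value: float


class TimeSeries:
    """Illustrative container, not the library's data model."""

    def __init__(self):
        self._rows = []   # sorted list of (timestamp, arrival_index, value)
        self._arrivals = 0

    def add(self, timestamp, value):
        # The arrival index breaks ties, so duplicate timestamps are kept
        # (not overwritten) and stay in the order in which they arrived.
        insort(self._rows, (timestamp, self._arrivals, value))
        self._arrivals += 1

    def observations(self):
        return [Observation(t, v) for t, _, v in self._rows]


series = TimeSeries()
for ts, val in [(3, 1.0), (1, 2.0), (3, 3.0), (2, 4.0)]:
    series.add(ts, val)   # out-of-order and duplicate timestamps
ordered = series.observations()
```

Keeping duplicates rather than overwriting them is a design choice of this sketch; a real system might instead expose a policy for resolving them.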

II. Transformation and segmentation functions

  • Math: Mean, variance, skew, correlations, PAA, SAX, covariance matrix, Graphical Gaussian Model, etc.
  • Statistical tests: Augmented Dickey-Fuller, Ljung-Box, Granger causality
  • Distance metrics: Dynamic Time Warping, Damerau-Levenshtein, Longest Common Subsequence, Jaro-Winkler
  • Time series reconciliation: Hungarian algorithm, Earth mover's distance
  • Change point detection: CUSUM, Bayesian, Gaussian
  • Segmentation: Window, Record-based, Burst-based, Anchor, Regression
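
As an illustration of one of the distance metrics listed above, the following is a minimal sketch of the textbook Dynamic Time Warping recurrence in plain Python. It is not the library's implementation; the function name `dtw_distance` and the absolute-difference cost are assumptions chosen for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, or both.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# The second series repeats one sample; warping absorbs the repetition.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because DTW may align one sample with several samples of the other series, two series that differ only in local speed can have distance zero, which a point-by-point metric would not report.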

III. Forecasting functions

  • ARIMA
  • Holt-Winters
  • BATS
  • Vector auto-regression
  • Anomaly detection
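
To give a feel for the forecasting family above, here is a minimal sketch of Holt's linear trend method (double exponential smoothing), which is Holt-Winters without the seasonal component. It is written in plain Python and is not the library's API; the function name `holt_forecast` and the default smoothing parameters are assumptions for illustration.

```python
def holt_forecast(values, alpha=0.5, beta=0.5, horizon=3):
    """Forecast `horizon` steps ahead with Holt's linear trend method."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first value and the trend with the
    # first difference (one common, simple choice).
    level = values[0]
    trend = values[1] - values[0]
    for x in values[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * x + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


forecast = holt_forecast([1.0, 2.0, 3.0, 4.0, 5.0])
```

On a perfectly linear series such as the one above, the method continues the line; on noisy data, `alpha` and `beta` control how quickly the level and trend react to new observations.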

IV. Joins

  • A complete suite of temporal joins, including inner, outer, left-outer, right-outer, left-inner, and right-inner joins
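
The idea behind a temporal join can be sketched in plain Python by matching two series on their timestamps. This is not the library's API: the function `temporal_join`, the dict-based series representation, and the `fill` parameter are assumptions for illustration, and only four of the listed join types are covered because this post does not define the semantics of left-inner and right-inner joins.

```python
def temporal_join(left, right, how="inner", fill=None):
    """Join two {timestamp: value} series on timestamp.

    Returns {timestamp: (left_value, right_value)} in timestamp order,
    using `fill` where one side has no observation.
    """
    if how == "inner":
        keys = left.keys() & right.keys()    # timestamps present in both
    elif how == "outer":
        keys = left.keys() | right.keys()    # timestamps present in either
    elif how == "left-outer":
        keys = left.keys()                   # every left timestamp
    elif how == "right-outer":
        keys = right.keys()                  # every right timestamp
    else:
        raise ValueError(f"unsupported join type: {how}")
    return {t: (left.get(t, fill), right.get(t, fill)) for t in sorted(keys)}


temperature = {1: 20.5, 2: 21.0}
humidity = {2: 0.40, 3: 0.42}
inner = temporal_join(temperature, humidity, how="inner")
outer = temporal_join(temperature, humidity, how="outer")
```

The inner join keeps only timestamp 2, where both series have a reading, while the outer join keeps all three timestamps and fills the missing side with `None`.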

V. SQL extensions

VI. Spark machine learning

  • Sequence mining
  • Time series clustering: K-means, K-shape, Motif-based, Cluster drift detection
  • Data connectors for feature engineering that provide Spark DataFrame iterators to TensorFlow and scikit-learn

For the full list of functions and how to get started, please refer to the documentation.

Learn more about data lakes in the IBM Cloud

If you would like to know more about time series use cases on IBM Cloud, please reach out to Kiran Guduguntla or Josh Rosenkranz.
