Announcing the Availability of Time Series Functionality in Watson Studio Spark Environments

We are excited to announce the availability of Time Series Libraries in Watson Studio Spark Environments starting today (October 8, 2020).

This library, developed by IBM Research, includes a full set of time series functionality that is not available in any competing offering. It joins our IBM Research Assets, Geospatial functionality, Data Skipping, and Parquet Encryption libraries as a fully supported feature of Watson Studio Spark Environments.

The time series library allows users to perform various key operations on time series data, including construction of a collection of time series, imputation functions (like segmentation), transformers, reducers, joins, and machine learning functions (such as forecasting, clustering, and discriminatory sequence mining). The library supports various time series types, including numeric, categorical, and arrays.

Examples of time series data include the following:

Stock share prices and trading volumes
Clickstream data
Electrocardiogram (ECG) data
Temperature or seismographic data
Network performance measurements
Network logs
Electricity usage as recorded by a smart meter and reported via an Internet of Things data feed

Key features of the Time Series Libraries in Watson Studio Spark Environments

I. Data model

A core data model for univariate and multivariate time series
Time Reference Systems for handling different timestamp representations
Support for aperiodic, duplicate, and out-of-order timestamps
Spark RDD and DataFrame extensions for time series
Numeric and categorical time series
Lossless and lossy compression
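
To make the timestamp handling above concrete, here is a minimal sketch in plain Python. It is not the library's API: the `build_series` helper, the `Observation` type, and the choice to merge duplicate timestamps by averaging are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: (timestamp in epoch milliseconds, value).
Observation = Tuple[int, float]


def build_series(
    raw: Iterable[Observation],
    combine: Callable[[List[float]], float] = lambda vs: sum(vs) / len(vs),
) -> List[Observation]:
    """Sort observations by timestamp and merge duplicates with `combine`."""
    by_time = defaultdict(list)
    for ts, value in raw:
        by_time[ts].append(value)
    # Sorting the grouped items restores time order for out-of-order input.
    return [(ts, combine(values)) for ts, values in sorted(by_time.items())]


# Aperiodic, out-of-order readings with a duplicate timestamp (3000).
readings = [(3000, 21.5), (1000, 20.0), (3000, 22.5), (1500, 20.4)]
series = build_series(readings)
print(series)
```

Averaging is only one possible policy for duplicates; keeping the first or last value would work equally well by passing a different `combine` function.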

II. Transformation and segmentation functions

Math: Mean, variance, skew, correlations, PAA, SAX, covariance matrix, Graphical Gaussian Model, etc.
Statistical tests: Augmented Dickey-Fuller, Ljung-Box, Granger causality
Distance metrics: Dynamic Time Warping, Damerau-Levenshtein, Longest Common Subsequence, Jaro-Winkler
Timeseries reconciliation: Hungarian algorithm, Earth mover's distance
Change point detection: CUSUM, Bayesian, Gaussian
Segmentation: Window, Record-based, Burst-based, Anchor, Regression
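
As an illustration of two of the techniques named above, the following sketch implements window segmentation and Dynamic Time Warping in plain Python. The function names `sliding_windows` and `dtw_distance` are hypothetical and do not come from the library; the code shows the general technique only, using an absolute-difference cost.

```python
from typing import List, Sequence


def sliding_windows(values: Sequence[float], size: int, step: int) -> List[List[float]]:
    """Split a series into fixed-size windows, advancing `step` points each time."""
    return [list(values[i:i + size]) for i in range(0, len(values) - size + 1, step)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step_cost = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step_cost + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]


signal = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0]
windows = sliding_windows(signal, size=4, step=4)
print(windows)
print(dtw_distance(windows[0], windows[1]))
# A stretched copy of the same shape still aligns at zero cost.
print(dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]))
```

Segmenting first and then comparing segments with a warping-tolerant distance is a common way to find repeated shapes in a long signal.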

III. Forecasting functions

ARIMA
Holt-Winters
BATS
Vector auto-regression
Anomaly detection
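
To show how forecasting and anomaly detection fit together, here is a small sketch in plain Python. It uses simple exponential smoothing, a level-only relative of Holt-Winters with no trend or seasonal terms, rather than any of the models listed above. The function names, the smoothing factor, and the threshold are assumptions for illustration, not the library's API.

```python
from typing import List, Sequence


def one_step_forecasts(values: Sequence[float], alpha: float) -> List[float]:
    """Return the forecast made for each point before that point is observed.

    forecasts[0] is seeded with values[0]; forecasts[t] uses values[:t] only.
    """
    level = values[0]
    forecasts = [level]
    for observed in values[:-1]:
        # Blend the newest observation into the running level.
        level = alpha * observed + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


def flag_anomalies(values: Sequence[float], alpha: float, threshold: float) -> List[int]:
    """Return indices whose absolute forecast error exceeds `threshold`."""
    forecasts = one_step_forecasts(values, alpha)
    return [
        t for t, (value, forecast) in enumerate(zip(values, forecasts))
        if abs(value - forecast) > threshold
    ]


usage = [10.0, 10.0, 10.0, 30.0, 10.0, 10.0]
print(one_step_forecasts(usage, alpha=0.5))
print(flag_anomalies(usage, alpha=0.5, threshold=15.0))
```

Note that a spike also distorts the forecasts that follow it, so the threshold has to leave room for that after-effect or the points after the spike get flagged as well.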

IV. Joins

A complete suite of temporal joins, including inner, outer, left-outer, right-outer, left-inner, and right-inner
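
The idea behind a temporal join can be sketched in plain Python as follows. This covers only the inner, left-outer, and full outer cases with exact timestamp matches; the function names and the dictionary representation are assumptions for illustration, and the library's own join semantics may differ.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: timestamp -> value.
Series = Dict[int, float]


def inner_join(left: Series, right: Series) -> List[Tuple[int, float, float]]:
    """Keep only timestamps present in both series."""
    return [(ts, left[ts], right[ts]) for ts in sorted(left.keys() & right.keys())]


def left_outer_join(left: Series, right: Series) -> List[Tuple[int, float, Optional[float]]]:
    """Keep every left timestamp; missing right values become None."""
    return [(ts, left[ts], right.get(ts)) for ts in sorted(left)]


def full_outer_join(left: Series, right: Series) -> List[Tuple[int, Optional[float], Optional[float]]]:
    """Keep timestamps from either series; missing values become None."""
    return [(ts, left.get(ts), right.get(ts)) for ts in sorted(left.keys() | right.keys())]


temperature = {1: 20.0, 2: 21.0, 4: 23.0}
humidity = {2: 0.40, 3: 0.42, 4: 0.45}

print(inner_join(temperature, humidity))
print(left_outer_join(temperature, humidity))
print(full_outer_join(temperature, humidity))
```

A right-outer join is the left-outer join with the arguments swapped, so it is omitted here.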

V. SQL extensions

A rich set of timeseries SQL extensions using Spark SQL

VI. Spark machine learning

Sequence mining
Timeseries clustering: K-means, K-shape, Motif-based, Cluster drift detection
Data connectors for feature engineering that provide Spark DataFrame iterators to TensorFlow and scikit-learn
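
As a rough picture of what K-means clustering of time series involves, here is a single-machine sketch in plain Python. It treats each equal-length series as a point and runs Lloyd's algorithm with Euclidean distance. It does not use Spark or the library's API; the function names and the deterministic seeding from the first `k` series are assumptions chosen to keep the example reproducible.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(series: Sequence[Sequence[float]], k: int, iterations: int = 10) -> List[int]:
    """Cluster equal-length series; returns one cluster label per series."""
    centroids = [list(s) for s in series[:k]]  # assumption: seed with the first k series
    labels = [0] * len(series)
    for _ in range(iterations):
        # Assignment step: each series joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in series
        ]
        # Update step: each centroid becomes the pointwise mean of its members.
        for c in range(k):
            members = [s for s, label in zip(series, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


daily_profiles = [
    [0.0, 0.0, 0.0],
    [10.0, 10.0, 10.0],
    [0.0, 1.0, 0.0],
    [9.0, 10.0, 11.0],
    [1.0, 0.0, 0.0],
]
labels = kmeans(daily_profiles, k=2)
print(labels)
```

Euclidean distance compares series point by point, which is why shape-aware alternatives such as K-shape exist for series that are shifted or scaled relative to one another.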

For the full list of functions and how to get started, please refer to the documentation.

Learn more about data lakes in the IBM Cloud

What is a Data Lake?
Big Data Explained
For Jupyter notebook users, we also have an in-depth tutorial of using this functionality for data science.

If you would like to know more about time series use cases on IBM Cloud, please reach out to Kiran Guduguntla or Josh Rosenkranz.

Kiran Guduguntla

Offering Manager - IBM Analytics Engine
