Important:

IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.

Creating synthetic data from imported data

Supported data sources for Synthetic Data Generator.

Using Synthetic Data Generator, you can connect to your data no matter where it lives, using either connectors or data files.

Data size

The Synthetic Data Generator environment can import up to approximately 2.5 GB of data.
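As a rough pre-flight check, you might compare the size of a local file against this limit before you try to import it. The following is a minimal sketch, not part of the product. It assumes that the approximate 2.5 GB limit can be treated as 2.5 × 10^9 bytes (the documentation does not state the exact byte count), and the function names are hypothetical.

```python
import os

# Assumption: the documented "approximately 2.5 GB" limit is treated as
# 2.5 * 10**9 bytes. The exact byte count is not stated in the documentation.
APPROX_IMPORT_LIMIT_BYTES = int(2.5 * 10**9)


def fits_import_limit(size_bytes: int,
                      limit_bytes: int = APPROX_IMPORT_LIMIT_BYTES) -> bool:
    """Return True if a data set of size_bytes is within the approximate limit."""
    return 0 <= size_bytes <= limit_bytes


def file_fits_import_limit(path: str) -> bool:
    """Check a local file's size on disk against the approximate limit."""
    return fits_import_limit(os.path.getsize(path))
```

Because the limit is approximate, treat a result close to the threshold as a warning rather than a guarantee that the import succeeds or fails.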

Connectors

The following table lists the data sources that you can connect to using Synthetic Data Generator.

Connector | Read Only | Read & Write | Notes
Amazon RDS for MySQL | | | The Replace the data set option isn't supported for this connection.
Amazon RDS for PostgreSQL | | | The Replace the data set option isn't supported for this connection.
Amazon Redshift | | |
Amazon S3 | | |
Apache Cassandra | | |
Apache Derby | | |
Apache HDFS (formerly known as "Hortonworks HDFS") | | |
Apache Hive | | |
Box | | |
Cloudera Impala | | |
DataStax Enterprise | | |
Dropbox | | |
FTP (remote file system transfer) | | |
Google BigQuery | | |
Google Cloud Storage | | |
Greenplum | | |
HTTP | | |
IBM Cloud Object Storage | | |
IBM Cloud Object Storage (infrastructure) | | |
IBM Cloud Data Engine | | |
IBM Cloud Databases for MongoDB | | |
IBM Cloud Databases for MySQL | | |
IBM Cloud Databases for PostgreSQL | | |
IBM Cloudant | | |
IBM Cognos Analytics | | |
IBM Data Virtualization Manager for z/OS | | |
IBM Db2 | | |
IBM Db2 Big SQL | | |
IBM Db2 for i | | |
IBM Db2 for z/OS | | |
IBM Db2 on Cloud | | |
IBM Db2 Warehouse | | |
IBM Informix | | |
IBM Netezza Performance Server | | |
IBM Planning Analytics (formerly known as "IBM TM1") | | | Only the Replace the data set option is supported.
Looker | | |
MariaDB | | |
Microsoft Azure Blob Storage | | |
Microsoft Azure Cosmos DB | | |
Microsoft Azure Data Lake Storage | | |
Microsoft Azure File Storage | | |
Microsoft Azure SQL Database | | |
Microsoft SQL Server | | | SQL pushback isn't supported when Active Directory is enabled.
MongoDB | | |
MySQL | | |
OData | | |
Oracle | | |
PostgreSQL | | |
Presto | | |
Salesforce.com | | |
SAP ASE | | |
SAP IQ | | |
SAP OData | | |
SingleStoreDB | | |
Snowflake | | |
Tableau | | |
Teradata | | |

Data files

In addition to using data from remote data sources or integrated databases, you can use data from files. You can work with data from the following types of files using Synthetic Data Generator.

Connector | Read Only | Read & Write
AVRO | |
CSV/delimited | |
Excel (XLS, XLSX) | |
JSON | |
ORC | |
Parquet | |
SAS | |
SAV | |
SHP | |
XML | |
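
To screen files before you import them, you might map file name extensions to the file types in the preceding table. The following is a minimal sketch, not part of the product. The extension-to-type mapping is an assumption for illustration: for example, it treats only `.csv` as CSV/delimited and `.sas7bdat` as SAS, and the product might accept other extensions. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: these extensions are illustrative guesses for each listed file
# type. The documentation lists the types, not the accepted extensions.
SUPPORTED_EXTENSIONS = {
    ".avro": "AVRO",
    ".csv": "CSV/delimited",
    ".xls": "Excel (XLS, XLSX)",
    ".xlsx": "Excel (XLS, XLSX)",
    ".json": "JSON",
    ".orc": "ORC",
    ".parquet": "Parquet",
    ".sas7bdat": "SAS",
    ".sav": "SAV",
    ".shp": "SHP",
    ".xml": "XML",
}


def supported_file_type(filename: str) -> Optional[str]:
    """Return the listed file type for filename, or None if it is not recognized."""
    # Compare on the final suffix only, ignoring case.
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())
```

A result of `None` means only that the extension is not in this illustrative mapping, not that the file is necessarily unsupported.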