Important:

IBM Cloud Pak® for Data Version 4.8 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.8 reaches end of support. For more information, see Upgrading from IBM Cloud Pak for Data Version 4.8 to IBM Software Hub Version 5.1.

Choosing a tool in projects (Watson Studio)

With Watson Studio and its complementary services, projects provide a range of tools for users with all levels of experience in preparing, analyzing, and modeling data, from beginner to expert. The right tool for you depends on the type of data you have, the tasks you plan to do, and the amount of automation you want.

To pick the right tool, consider these factors.

The type of data you have

Tabular data in delimited files or relational data in remote data sources
Image files
Textual (unstructured) data in documents

The type of tasks you need to do

Prepare data: cleanse, shape, visualize, organize, and validate data.
Analyze data: identify patterns and relationships in data, and display insights.
Build models: build, train, test, and deploy models to make predictions or optimize decisions.

How much automation you want

Code editor tools: Use to write code in Python or R, optionally with Spark.
Graphical builder tools: Use menus and drag-and-drop functionality on a builder to visually program.
Automated builder tools: Use to configure automated tasks that require limited user input.

Find the right tool:

Tools for tabular or relational data
Tools for textual data
Tools for image data
Accessing tools

Tools for tabular or relational data

The following table shows the tools for tabular or relational data by task:

Tools for tabular or relational data
Tool	Tool type	Prepare data	Analyze data	Build models
Jupyter notebook editor	Code editor	✓	✓	✓
JupyterLab	Code editor	✓	✓	✓
RStudio	Code editor	✓	✓	✓
Masking flows	Automated builder	✓
Data Refinery	Graphical builder	✓	✓
Synthetic Data Generator	Graphical builder	✓
Data Replication	Graphical builder	✓
Dashboard editor	Graphical builder		✓
SPSS Modeler	Graphical builder	✓	✓	✓
Decision Optimization model builder	Graphical builder and code editor	✓		✓
AutoAI	Automated builder	✓		✓
Federated Learning	Automated builder			✓
Metadata import	Automated builder	✓
Metadata enrichment	Automated builder	✓	✓
Data quality rule	Automated builder and code editor		✓
IBM Match 360 with Watson	Automated builder	✓
Watson Pipelines	Graphical builder	✓	✓	✓
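
The three factors can be applied as a filter over the table above. The following sketch is illustrative only: it copies a few rows of the table into a Python dictionary, and the `TOOLS` structure and `find_tools` helper are hypothetical names, not part of any IBM API.

```python
# Hypothetical lookup built from a few rows of the table above.
# Each tool maps to (tool type, set of tasks that it supports).
TOOLS = {
    "Jupyter notebook editor": ("Code editor", {"prepare", "analyze", "build"}),
    "Data Refinery": ("Graphical builder", {"prepare", "analyze"}),
    "Dashboard editor": ("Graphical builder", {"analyze"}),
    "SPSS Modeler": ("Graphical builder", {"prepare", "analyze", "build"}),
    "AutoAI": ("Automated builder", {"prepare", "build"}),
    "Federated Learning": ("Automated builder", {"build"}),
}


def find_tools(task, tool_type=None):
    """Return tool names that support `task`, optionally limited to one tool type."""
    return sorted(
        name
        for name, (kind, tasks) in TOOLS.items()
        if task in tasks and (tool_type is None or kind == tool_type)
    )


if __name__ == "__main__":
    print(find_tools("build", "Automated builder"))
    print(find_tools("analyze", "Graphical builder"))
```

For example, asking for automated builders that can build models returns AutoAI and Federated Learning from this subset.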

Tools for textual data

The following table shows the tools for building a model that works with textual data:

Tools for textual data
Tool	Code editor	Graphical builder
Prompt Lab		✓
Jupyter notebook editor	✓
JupyterLab	✓
RStudio	✓
SPSS Modeler		✓
Experiment builder		✓
Watson Pipelines		✓

Tools for image data

The following table shows the tools for building a model that classifies images:

Tools for image data
Tool	Code editor	Graphical builder
Jupyter notebook editor	✓
JupyterLab	✓
RStudio	✓
Experiment builder		✓
Watson Pipelines		✓

Accessing tools

To use a tool, you must create an asset specific to that tool, or open an existing asset for that tool. To create an asset, click New asset or Import assets and then choose the asset type you want. This table shows the asset type to choose for each tool.

Tools to asset type mapping
To use this tool	Choose this asset type
Prompt Lab	Prompt Lab
Synthetic Data Generator	Synthetic Data Generator
Jupyter notebook editor	Jupyter notebook
Masking flows	Masking flows
Data Refinery	Data Refinery flow
Data Replication	Data Replication
Dashboard editor	Dashboard
SPSS Modeler	Modeler flow
Decision Optimization model builder	Decision Optimization
AutoAI	AutoAI experiment
Experiment builder	Experiment
Federated Learning	Federated Learning experiment
Metadata import	Metadata import
Metadata enrichment	Metadata enrichment
Data quality rule	Data quality rule
IBM Match 360 with Watson	Master data configuration

To edit notebooks with RStudio, click Launch IDE > RStudio.

To edit notebooks with JupyterLab, click Launch IDE > JupyterLab.

Prompt Lab

Use the Prompt Lab to experiment with prompting foundation models.

Required service: watsonx.ai
Data format: Text
Data size: Limited by the context length for the model.
How you can engineer prompts: Enter prompts and generate responses from the selected large language model; Run sample prompts; Save prompt templates to deploy in your generative AI solution; Save prompts as Python code in a notebook.
Learn more: Documentation about Prompt Lab; Quick start: Prompting a foundation model with Prompt Lab

Synthetic Data Generator

Use the Synthetic Data Generator to generate synthetic tabular data.

Required services: Synthetic Data Generator; watsonx.ai
Data format: Relational: Tables in relational data sources or files. See Creating synthetic data from imported data.
Data size: ~2.5 GB
How you can generate synthetic data: Mask and mimic production data; Generate synthetic data from a custom data schema
Learn more: Documentation about Synthetic Data Generator; Quick start: Generate synthetic tabular data

Jupyter notebook editor

Use the Jupyter notebook editor to create a notebook in which you run code to prepare, visualize, and analyze data, or build and train a model.

Required services: Watson Studio; Watson Studio runtimes
Data format: Any
Data size: Any
How you can prepare data, analyze data, or build models: Write code in Python or R, optionally with Spark; Include rich text and media with your code; Work with any kind of data in any way you want; Use preinstalled libraries and packages, or install other open source and IBM libraries and packages; Schedule runs of your code; Import a notebook from a file or a URL; Share read-only copies of your notebook externally.
Get started: To create a notebook, click New asset > Jupyter notebook editor.
Learn more: Documentation about notebooks
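
As a rough illustration of the kind of code that you might run in a notebook cell, the following sketch cleanses a small table and summarizes it. It uses only the Python standard library. The sample data and the `load_rows` and `mean_sales_by_region` helpers are hypothetical. In a project, you would typically read a data asset instead of an inline string.

```python
import csv
import io
import statistics

# Hypothetical sample data; one row has a missing sales value.
RAW = """region,sales
north,120
south,95
north,140
south,
east,110
"""


def load_rows(text):
    """Parse CSV text and drop rows with a missing sales value (a simple cleanse step)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"region": row["region"], "sales": float(row["sales"])}
        for row in reader
        if row["sales"].strip()
    ]


def mean_sales_by_region(rows):
    """Group rows by region and return the mean sales for each region."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["region"], []).append(row["sales"])
    return {region: statistics.mean(values) for region, values in grouped.items()}


rows = load_rows(RAW)
summary = mean_sales_by_region(rows)
for region, mean_sales in sorted(summary.items()):
    print(f"{region}: {mean_sales:.1f}")
```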

Data Refinery

Use Data Refinery to prepare and visualize tabular data with a graphical flow editor. You create and then run a Data Refinery flow as a set of ordered operations on data.

Required services: Watson Studio or IBM Knowledge Catalog
Data format: Tabular: Avro, CSV, JSON, Microsoft Excel (xls and xlsx formats; first sheet only, except for connections and connected data assets), Parquet, SAS with the "sas7bdat" extension (read only), TSV (read only), or delimited text data asset; Relational: Tables in relational data sources
Data size: Any
How you can prepare data: Cleanse, shape, and organize data with over 60 operations; Save refined data as a new data set or update the original data; Profile data to validate it; Use interactive templates to manipulate data with code operations, functions, and logical operators; Schedule recurring operations on data.
How you can analyze data: Identify patterns, connections, and relationships within the data in multiple visualization charts.
Get started: To create a Data Refinery flow, click New asset > Data Refinery. The Data Refinery tile is in the Graphical builders section.
Learn more: Documentation about Data Refinery

Data Replication

Use Data Replication to integrate and synchronize data. Data Replication provides near-real-time data delivery with low impact to sources.

Required service: Data Replication
Related service: IBM Knowledge Catalog
Data formats: Data Replication works with connections to and from select types of data sources and formats. For more information, see Supported Data Replication connections.
Credentials: Data Replication uses your IBM Cloud credentials to connect to the service.

Dashboard editor

Use the Dashboard editor to create a set of visualizations of analytical results on a graphical builder.

Required service: Cognos Dashboard
Data format: Tabular: CSV files; Relational: Tables in some relational data sources
Data size: Any size
How you can analyze data: Create graphs without coding; Include text, media, web pages, images, and shapes in your dashboard.
Get started: To create a dashboard, click New asset > Dashboard editor. The Dashboard editor tile is in the Graphical builders section.
Learn more: Documentation about dashboards

SPSS Modeler

Use SPSS Modeler to create a flow to prepare data and build and train a model with a flow editor on a graphical builder.

Required services: SPSS Modeler; Watson Studio
Data formats: Relational: Tables in relational data sources; Tabular: Excel files (.xls or .xlsx), CSV files, or SPSS Statistics files (.sav); Textual: In the supported relational tables or files
Data size: Any
How you can prepare data: Use automatic data preparation functions; Write SQL statements to manipulate data; Cleanse, shape, sample, sort, and derive data.
How you can analyze data: Visualize data with over 40 graphs; Identify the natural language of a text field.
How you can build models: Build predictive models; Choose from over 40 modeling algorithms; Use automatic modeling functions; Model time series or geospatial data; Classify textual data; Identify relationships between the concepts in textual data.
Get started: To create an SPSS Modeler flow, click New asset > SPSS Modeler.
Learn more: Documentation about SPSS Modeler

Decision Optimization model builder

Use Decision Optimization to build and run optimization models in the Decision Optimization modeler or in a Jupyter notebook.

Required services: Decision Optimization; Watson Studio
Data formats: Tabular: CSV files
Data size: Any
How you can prepare data: Import relevant data into a scenario and edit it.
How you can build models: Build prescriptive decision optimization models; Create, import, and edit models in Python DOcplex, OPL, or with natural language expressions; Create, import, and edit models in notebooks.
How you can solve models: Run and solve decision optimization models using CPLEX engines; Investigate and compare solutions for multiple scenarios; Create tables, charts, and notes to visualize data and solutions for one or more scenarios.
Get started: To create a Decision Optimization model, click New asset > Decision Optimization, or for notebooks click New asset > Jupyter notebook editor.
Learn more: Documentation about Decision Optimization
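
To make the idea of a prescriptive model and scenario comparison concrete, the following sketch solves a tiny production-planning problem for two scenarios. It is an illustration under stated assumptions: it does not use DOcplex, OPL, or a CPLEX engine, and it replaces the solver with brute-force enumeration in plain Python. The products, profits, resource limits, and the `solve` helper are all hypothetical.

```python
from itertools import product


def solve(labor_hours, wood_units, max_units=20):
    """Enumerate integer production plans and return the most profitable feasible one.

    x = desks, y = tables. Brute force stands in for a real solver here.
    """
    best = None
    for x, y in product(range(max_units + 1), repeat=2):
        # Constraints: each desk needs 1 labor hour and 3 wood units,
        # each table needs 2 labor hours and 1 wood unit.
        feasible = x + 2 * y <= labor_hours and 3 * x + y <= wood_units
        if not feasible:
            continue
        profit = 3 * x + 5 * y  # objective: profit per desk is 3, per table is 5
        if best is None or profit > best["profit"]:
            best = {"desks": x, "tables": y, "profit": profit}
    return best


# Two hypothetical scenarios that differ only in available labor hours.
scenarios = {"baseline": (14, 18), "extra labor": (20, 18)}
results = {name: solve(*limits) for name, limits in scenarios.items()}

for name, plan in results.items():
    print(name, plan)
```

Comparing the two results shows how the recommended plan shifts toward tables when more labor is available, which is the kind of question that scenario comparison is meant to answer.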

AutoAI tool

Use the AutoAI tool to automatically analyze your tabular data and generate candidate model pipelines customized for your predictive modeling problem.

Required services: Watson Machine Learning; Watson Studio
Data format: Tabular: CSV files
Data size: Depends on model type. See AutoAI Overview for details.
How you can prepare data: Automatically transform data, such as imputing missing values and transforming text to scalar values.
How you can build models: Train a binary classification, multiclass classification, or regression model; View a tree infographic that shows the sequences of AutoAI training stages; Generate a leaderboard of model pipelines ranked by cross-validation scores; Save a pipeline as a model.
Get started: To create an AutoAI experiment, click New asset > AutoAI.
Learn more: Documentation about AutoAI
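
The idea of a leaderboard ranked by cross-validation scores can be sketched in a few lines of plain Python. This example is not how AutoAI generates or evaluates pipelines. It compares only two hypothetical candidates (a mean predictor and a least-squares line) on made-up data, and the function names are illustrative.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: ordinary least-squares line through the training points."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def cross_val_mse(fit, xs, ys, k=5):
    """Average mean squared error over k folds (fold i holds out every k-th point)."""
    fold_errors = []
    pairs = list(zip(xs, ys))
    for i in range(k):
        train = [p for j, p in enumerate(pairs) if j % k != i]
        held_out = [p for j, p in enumerate(pairs) if j % k == i]
        model = fit([x for x, _ in train], [y for _, y in train])
        fold_errors.append(mean([(model(x) - y) ** 2 for x, y in held_out]))
    return mean(fold_errors)


# Hypothetical data that follows y = 2x + 1 exactly.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]

candidates = {"mean predictor": fit_mean, "least-squares line": fit_line}
leaderboard = sorted(
    ((name, cross_val_mse(fit, xs, ys)) for name, fit in candidates.items()),
    key=lambda item: item[1],  # lower error ranks higher
)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(rank, name, round(score, 4))
```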

Deep Learning Experiment builder

Use the Deep Learning Experiment builder to build deep learning experiments and run hundreds of training runs. This method requires that you provide code to define the training run. You run, track, store, and compare the results in the Experiment builder graphical interface, then save the best configuration as a model.

Required services: Watson Studio; Watson Machine Learning; Watson Machine Learning Accelerator
Data format: Textual: CSV files with labeled textual data; Image: Image files in a PKL file. For example, a model testing signatures uses images resized to 32×32 pixels and stored as NumPy arrays in a pickled format.
Data size: Any size
How you can build models: Write Python code to specify metrics for training runs; Write a training definition in Python code; Define hyperparameters, or choose the RBFOpt method or random hyperparameter settings; Find the optimal values for large numbers of hyperparameters by running hundreds or thousands of training runs; Run distributed training with GPUs and specialized, powerful hardware and infrastructure; Compare the performance of training runs; Save a training run as a model.
Get started: To create an experiment, click New asset > Experiment.
Learn more: Documentation about Experiment builder
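
The image data format described above (images resized to 32×32 pixels and stored as NumPy arrays in a pickled file) might be produced along the following lines. This is a sketch under assumptions: the random arrays stand in for real signature scans, the dictionary layout with `images` and `labels` keys is hypothetical, and your training code defines the structure that it actually expects.

```python
import pickle

import numpy as np


def resize_nearest(image, size=32):
    """Resize a 2-D grayscale array to size x size with nearest-neighbor sampling."""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[np.ix_(rows, cols)]


# Hypothetical stand-ins for scanned signature images of different sizes.
rng = np.random.default_rng(0)
raw_images = [
    rng.integers(0, 256, size=(height, width), dtype=np.uint8)
    for height, width in [(64, 96), (48, 48)]
]
labels = [0, 1]

# Assumed layout: one stacked array of images plus a parallel array of labels.
dataset = {
    "images": np.stack([resize_nearest(image) for image in raw_images]),
    "labels": np.array(labels),
}

payload = pickle.dumps(dataset)   # bytes that could be written to a .pkl file
restored = pickle.loads(payload)  # what the training code would read back
print(restored["images"].shape)
```

To write a file instead of an in-memory payload, open the target path in binary write mode and call `pickle.dump` with the same dictionary.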

Federated Learning

Use the Federated Learning tool to train a common model using distributed data. The data is never combined or shared, preserving data integrity while providing all participating parties with a model based on the aggregated data.

Required services: Watson Studio; Watson Machine Learning
Data format: Any
Data size: Any size
How you can build models: Choose a training framework; Configure the common model; Configure a file for training the common model; Have remote parties train their data; Deploy the common model.
Get started: To create an experiment, click New asset > Federated Learning.
Learn more: Documentation about Federated Learning
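
The principle that parties train locally and share only model updates can be sketched with a simple weighted-averaging loop. This example is a generic federated-averaging illustration in plain Python, not the fusion method that the Federated Learning tool uses. The data, the linear model, and the `local_update` and `aggregate` helpers are hypothetical.

```python
def local_update(weights, data, lr=0.1):
    """One gradient step of least-squares regression y ~ w0 + w1*x on a party's private data."""
    w0, w1 = weights
    n = len(data)
    g0 = sum((w0 + w1 * x - y) for x, y in data) * 2 / n
    g1 = sum((w0 + w1 * x - y) * x for x, y in data) * 2 / n
    return (w0 - lr * g0, w1 - lr * g1)


def aggregate(updates):
    """Weighted average of party weights; `updates` is a list of (weights, sample_count)."""
    total = sum(count for _, count in updates)
    size = len(updates[0][0])
    return tuple(
        sum(w[i] * count for w, count in updates) / total for i in range(size)
    )


# Hypothetical private data sets; each follows y = 2x + 1 and stays with its party.
party_data = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]

weights = (0.0, 0.0)  # the common model, shared with every party each round
for _ in range(500):
    # Each party trains on its own data and returns only weights and a sample count.
    updates = [(local_update(weights, data), len(data)) for data in party_data]
    weights = aggregate(updates)

print(weights)
```

Only the weights and sample counts cross party boundaries in this loop. The raw `(x, y)` pairs are read solely inside `local_update`.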

Metadata import

Use the metadata import tool to automatically discover and import technical and process metadata for data assets into a project or a catalog.

Required service: IBM Knowledge Catalog
Data format: Any
Data size: Any size
How you can prepare data: Import data assets from a connection to a data source.
Get started: To import metadata, click New asset > Metadata import.
Learn more: Documentation about metadata import

Metadata enrichment

Use the metadata enrichment tool to automatically profile data assets and analyze data quality in a project.

Required service: IBM Knowledge Catalog
Data format: Relational and structured: Tables and files in relational and nonrelational data sources; Tabular: Avro, CSV, or Parquet files
Data size: Any size
How you can prepare and analyze data: Profile and analyze a select set of data assets in a project.
Get started: To enrich data, click New asset > Metadata enrichment.
Learn more: Documentation about metadata enrichment

Data quality rule

Use the data quality tool to create rules that analyze data quality in a project.

Required service: IBM Knowledge Catalog
Data format: Relational and structured: Tables and files in relational and nonrelational data sources; Tabular: Avro, CSV, or Parquet files
Data size: Any size
How you can prepare and analyze data: Analyze the quality of a select set of data assets in a project.
Get started: To create a data quality rule, click New asset > Data quality rule.
Learn more: Documentation about data quality rules

IBM Match 360 with Watson

Use IBM Match 360 with Watson to create master data entities that represent digital twins of your customers. Model and map your data, then run the matching algorithm to create master data entities. Customize and tune your matching algorithm to meet your organization's requirements.

Required services: IBM Match 360 with Watson; IBM Knowledge Catalog
Data size: Any
How you can prepare data: Model and map data from sources across your organization; Run the customizable matching algorithm to create master data entities; View and edit master data entities and their associated records.
Get started: To create an IBM Match 360 configuration asset, click New asset > Master data configuration.
Learn more: Documentation about IBM Match 360 with Watson

RStudio IDE

Use RStudio IDE to analyze data or create Shiny applications by writing R code. RStudio can be integrated with a Git repository, which must be associated with the project.

Required services: RStudio runtimes; Watson Studio
Data format: Any
Data size: Any size
How you can prepare data, analyze data, and build models: Write code in R; Create Shiny apps; Use open source libraries and packages; Include rich text and media with your code; Prepare data; Visualize data; Discover insights from data; Build and train a model using open source libraries; Share your Shiny app in a Git repository.
Get started: To use RStudio, click Launch IDE > RStudio.
Learn more: Documentation about RStudio

JupyterLab

Use the JupyterLab IDE to create a notebook or Python script in which you run code to prepare, visualize, and analyze data, or build and train a model. JupyterLab is integrated with a Git repository, which must be associated with the project.

Required services: Watson Studio; Watson Studio runtimes
Data format: Any
Data size: Any
How you can prepare data, analyze data, or build models: Write code in Python; Include rich text and media with your code; Work with any kind of data in any way you want; Use preinstalled libraries and packages, or install other open source and IBM libraries and packages; Import a notebook from a file; Share your notebook or script in a Git repository.
Get started: To use JupyterLab, click Launch IDE > JupyterLab.
Learn more: Documentation about JupyterLab

Masking flows

Use the Masking flow tool to prepare masked copies or masked subsets of data from the catalog. Data is de-identified using advanced masking options with data protection rules.

Required service: IBM Knowledge Catalog
Data format: Relational: Tables in relational data sources
Data size: Any size
How you can prepare data, analyze data, or build models: Import data assets from a governed catalog to a project; Create masking flow job definitions to specify what data to mask with data protection rules; Optionally subset data to reduce the size of the copied data; Run masking flow jobs to load masked copies to target database connections.
Get started: Ensure that the prerequisite steps in IBM Knowledge Catalog are completed. To privatize data, do one of the following tasks:

Click New asset > Masking flow.
Click the menu options for individual data assets to mask that asset directly.

Learn more: Documentation about masking data
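
As a conceptual illustration of masking and subsetting, the following sketch applies per-column rules to a few records and keeps only part of the data. It does not reproduce the masking methods or the data protection rule format of IBM Knowledge Catalog. The `RULES` dictionary, the two masking methods, the salt value, and the sample records are all hypothetical.

```python
import hashlib

# Hypothetical rules: column name -> masking method. Not an IBM Knowledge Catalog format.
RULES = {"email": "hash", "ssn": "redact"}


def mask_value(value, method, salt="demo-salt"):
    """Mask one value: redact replaces every character, hash substitutes a stable token."""
    if method == "redact":
        return "X" * len(value)
    if method == "hash":
        digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
        return digest[:12]
    return value  # columns without a rule are copied unchanged


def mask_rows(rows, rules, limit=None):
    """Return masked copies of the rows, optionally keeping only the first `limit` rows."""
    subset = rows if limit is None else rows[:limit]
    return [
        {column: mask_value(value, rules.get(column)) for column, value in row.items()}
        for row in subset
    ]


source = [
    {"name": "Ana", "email": "ana@example.com", "ssn": "123-45-6789"},
    {"name": "Raj", "email": "raj@example.com", "ssn": "987-65-4321"},
    {"name": "Li", "email": "li@example.com", "ssn": "555-12-3456"},
]
masked = mask_rows(source, RULES, limit=2)
for row in masked:
    print(row)
```

Because the hash is deterministic for a given salt, the same source value always maps to the same token, so masked copies can still be joined on that column. The source records themselves are left unchanged.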

Watson Pipelines

Use the Pipelines canvas editor to create a flow to prepare, visualize, and analyze data, or build and train a model.

Required service: IBM Knowledge Catalog or Watson Studio
Data format: Any
Data size: Any
How you can prepare data, analyze data, or build models: Use a variety of nodes that each contain their own logs; Incorporate notebooks into the flow to run any Python or R code; Work with any kind of data in any way you want; Schedule runs of your flow; Import data from your mounted PVC or project, or ingest data from GitHub; Create your own custom component with Python code; Add conditions to your pipelines to monitor data quality however you want; Use a webhook to send emails or messages to keep up to date on the status of your flow.
Get started: To create a new pipeline, click New asset > Pipelines.

Data visualizations

Use data visualizations to discover insights from your data. By exploring data from different perspectives with visualizations, you can identify patterns, connections, and relationships within that data and quickly understand large amounts of information.

Required service: IBM Knowledge Catalog or Watson Studio
Data format: Tabular: Avro, CSV, JSON, Parquet, TSV, SAV, Microsoft Excel .xls and .xlsx files, SAS, delimited text files, and connected data. For more information about supported data sources, see Connectors.
Data size: No limit
Get started: To create a visualization, click Data asset in the list of asset types in your project, and select a data asset. Click the Visualization tab and choose a chart type.
Learn more: Visualizing your data
