Asset types and properties
An asset is an item that contains information about data, other valuable information, or another item that works with data. You can create assets by working with tools in collaborative workspaces or by writing code.
You add assets by importing them or creating them with tools, and you work with them in collaborative workspaces. The workspace that you need depends on your tasks and on the platform experience that you are working in. The primary workspace for working with assets across all experiences is the project. The other most common workspaces, which are available in most experiences, are catalogs and deployment spaces. Each experience has other workspaces for specialized tasks.
| Workspace | Description | Cloud Pak for Data | watsonx | Data Fabric | watsonx.data Premium |
|---|---|---|---|---|---|
| Projects | Where you collaborate with others to work with data and create assets. | ✔️ | ✔️ | ✔️ | ✔️ |
| Catalogs | Where you store assets to share with your organization or go to find assets that you need to work with. | ✔️ | ✔️ | ✔️ | |
| Deployment spaces | Where you deploy and run assets that are ready for testing or production. | ✔️ | ✔️ | | |
You can find any asset in any of the workspaces for which you are a collaborator by searching for it from the global search bar.
You can create many different types of assets.
Asset types
The following table lists the types of assets that you can create, the tools you need to create them, and the workspaces where you can add them.
| Asset type | Description | Tools to create it | Workspaces |
|---|---|---|---|
| AI use case | Tracks the lifecycle of a model from request to production. | AI Factsheets | Inventories |
| AutoAI experiment | Automatically generates candidate predictive model pipelines. | AutoAI | Projects |
| COBOL copybook | Displays the map metadata for connected data assets from z/OS mainframe computers. | Metadata import tool | Projects, Catalogs |
| Code package | Contains an executable file and supporting files. | JupyterLab, RStudio | Projects, Spaces |
| Connected data asset | Represents data that is accessed through a connection to a remote data source. | Connected data tool, Metadata import tool | Projects, Catalogs, Spaces |
| Connection | Contains the information to connect to a data source. | Connection tool | Projects, Catalogs, Spaces |
| Dashboard | Visualizes data in interactive graphs without code. | Dashboard editor | Projects, Catalogs |
| Data asset from a file | Represents a file that you uploaded from your local system. | Upload pane | Projects, Catalogs, Spaces |
| Data integration assets | Describe the components of ETL jobs. | Metadata import tool | Catalogs |
| Data Refinery flow | Prepares data. | Data Refinery | Projects, Spaces |
| Data Replication flow | Replicates data. | Data Replication | Projects, Catalogs |
| Data quality definition | Defines a reusable rule logic component for data quality rules. | Data quality definition editor | Projects, Catalogs |
| Data quality rule | Evaluates data quality for specific conditions. | Data quality rule editor | Projects |
| DataStage build stage | Defines a reusable build stage component for DataStage flows. | DataStage component editor | Projects, Spaces |
| DataStage custom stage | Defines a reusable custom stage component for DataStage flows. | DataStage component editor | Projects, Spaces |
| DataStage data definition | Defines a reusable column metadata component for DataStage flow jobs. | DataStage component editor | Projects, Spaces |
| DataStage flow | Transforms and integrates data. | DataStage flow editor | Projects, Spaces |
| DataStage function library | Defines a reusable custom function component for DataStage flows. | DataStage component editor | Projects, Spaces |
| DataStage Java library | Collects a reusable set of JAR files for DataStage flows. | DataStage component editor | Projects, Spaces |
| DataStage match specification | Defines a reusable criteria component for a matching strategy in DataStage flows. | DataStage component editor | Projects, Spaces |
| DataStage Operational Decision Manager stage | Defines a reusable set of complex business rules for DataStage flow jobs. | DataStage component editor | Projects, Spaces |
| DataStage schema library | Imports a reusable set of resources for DataStage flows. | DataStage component editor | Projects, Spaces |
| DataStage standardization rule | Defines a reusable rule component to format data in DataStage flows. | DataStage component editor | Projects, Spaces |
| DataStage subflow | Defines a reusable set of stages and connectors for DataStage flows. | DataStage component editor | Projects, Spaces |
| DataStage wrapped stage | Defines a reusable UNIX command for a DataStage stage. | DataStage component editor | Projects, Spaces |
| Decision Optimization experiment | Solves optimization problems. | Decision Optimization | Projects |
| Deep learning experiment | Runs hundreds of training runs for an experiment. | Experiment builder | Projects |
| Dynamic view | Represents data that is accessed through a connection to a remote data source and filtered by an SQL query. | Query tool | Projects, Catalogs |
| Folder asset | Represents a folder in IBM Cloud Object Storage. | Connected data tool | Projects, Catalogs, Spaces |
| Jupyter notebook | Runs Python or R code to analyze data or build models. | Jupyter notebook editor, AutoAI, JupyterLab, Visual Studio Code | Projects, Catalogs |
| Lineage relationship mapping files | Defines a mapping group that contains mappings. | Manual upload only | Catalogs |
| Logical data model asset types | Visualize a logical data model. | Metadata import tool | Catalogs |
| Masking flow | Creates masked copies of data assets. | Masking flow | Projects |
| Master data configuration | Configures Match 360. | Match 360 | Projects |
| Metadata enrichment | Enriches imported asset metadata. | Metadata enrichment tool | Projects |
| Metadata import | Imports asset metadata from a connection. | Metadata import tool | Projects |
| Model | Contains information about a saved or imported model. | Various tools that run experiments or train models | Projects, Catalogs, Spaces |
| Parameter set | Collects a reusable set of job parameters for DataStage jobs. | Parameter set editor | Projects |
| Physical constraint | Represents primary or foreign key constraints for data assets. | Created automatically when published metadata enrichment results contain key relationships | Catalogs |
| Physical data model asset types | Visualize a physical data model. | Metadata import tool | Catalogs |
| Pipeline | Automates the model lifecycle. | Orchestration Pipelines | Projects |
| Python function | Contains Python code to support a model in production. | Jupyter notebook editor, JupyterLab | Projects, Spaces |
| Report asset types | Organize business intelligence reports. | Metadata import tool | Catalogs |
| Script | Contains a Python or R script to support a model in production. | Jupyter notebook editor, RStudio, JupyterLab | Projects, Spaces |
| Shiny app | Contains an interactive data visualization dashboard. | RStudio | Projects, Spaces |
| SPSS Modeler flow | Runs a flow to prepare data and build a model. | SPSS Modeler | Projects |
| Transformation script asset types | Describe data transformations. | Metadata import tool | Catalogs |
| Visualization | Shows visualizations from a data asset. | Visualization page in data assets | Projects |
Common properties for assets
Assets accumulate information in properties when you create them, when you use them, or when automated processes update them. Some properties are provided by users and can be edited by users. Other properties are automatically provided by the system. Most system-provided properties can't be edited by users.
The Last modified field for an asset tracks both user actions and system actions. System actions often occur in the background and might involve only changes to the asset's internal metadata.
Common properties for assets everywhere
Most types of assets have the properties that are listed in the following table in all the workspaces where those asset types exist.
Some of the properties are shared properties.
| Property | Description | Editable? |
|---|---|---|
| Name | The asset name. Can contain up to 255 characters. Supports multibyte characters. Cannot be empty, contain Unicode control characters, or contain only blank spaces. Asset names do not need to be unique within a project or deployment space. Whether asset names must be unique in a catalog depends on the duplicate handling method set for the catalog. | Yes |
| Description | Optional. Supports multibyte characters and hyperlinks. | Yes |
| Creation date | The timestamp of when the asset was created or imported. | No |
| Creator or Owner | The username or email address of the person who created or imported the asset. | No |
| Last modified date | The timestamp of when the asset was last modified. | No |
| Last editor | The username or email address of the person who last modified the asset. | No |
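The naming rules in the Name row can be read as a small set of checks. The following Python sketch is illustrative only: the function name `validate_asset_name` is hypothetical, and counting the 255-character limit in Unicode code points is an assumption, because the text does not say how multibyte characters are counted.

```python
import unicodedata

MAX_NAME_LENGTH = 255  # limit stated in the Name row


def validate_asset_name(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if name == "":
        problems.append("name is empty")
    elif name.strip() == "":
        problems.append("name contains only blank spaces")
    # Assumption: the limit is counted in code points, not bytes.
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    # Unicode general category "Cc" covers control characters such as \x00.
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        problems.append("name contains Unicode control characters")
    return problems
```

For example, `validate_asset_name("Sales data 2024")` returns an empty list, while a name that consists of three spaces is reported as containing only blank spaces. Uniqueness is not checked here, because it depends on the workspace and on the duplicate handling method of the catalog.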
Common properties for assets in catalogs
In addition to the common properties that all assets have, assets in catalogs have the properties and pages that are listed in the following table.
Some of the properties are shared properties.
| Property or page | Description | Editable? |
|---|---|---|
| Asset page | A view of the contents of the asset. | No |
| Privacy | Set to public by default. This setting can restrict access to an asset in a catalog when set to private. Only the owner and members of the asset can view and use private assets. | Yes |
| Access page | The owner and members of the asset. By default, the asset owner is the user who added the asset to the catalog. The asset members can view and use the asset when it is marked private. | Yes |
| Ratings page | Optional. Catalog collaborators can rate and review assets. | Yes |
| Tags | Optional. Text labels that catalog collaborators create to simplify searching. A tag consists of one string of up to 255 characters. It can contain spaces, letters, numbers, underscores, dashes, and the symbols # and @. | Yes |
| Relationships | Optional. Relationships that appear in the Related items section of the asset Overview page are informational and do not have other effects on the asset. Can be between assets in the same workspace or different workspaces. For example, you can add a relationship between an asset in a catalog and an asset in a project. Can be between an asset and an artifact. For example, you can add a relationship between an asset and a policy. Administrators can create custom relationships for assets. | Yes |
| Governance artifacts | Optional. The business terms and classifications that users assigned to the asset. These assignments can affect the asset. For example, an assigned business term can trigger the enforcement of a data protection rule. | Yes |
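The Privacy, Access page, and Tags rows describe rules that are easy to express in code. The following Python sketch is a hypothetical in-memory model, not the platform's API: the names `CatalogAsset`, `can_view`, `add_tag`, and `is_valid_tag` are invented for illustration. It assumes that an empty tag is rejected and that non-ASCII letters are allowed in tags, neither of which is stated in the table. It models only the privacy setting, not catalog membership.

```python
from dataclasses import dataclass, field

MAX_TAG_LENGTH = 255
EXTRA_TAG_CHARS = set(" _-#@")  # spaces, underscores, dashes, # and @


def is_valid_tag(tag: str) -> bool:
    """Check a tag against the length and character rules in the Tags row."""
    if not tag or len(tag) > MAX_TAG_LENGTH:
        return False
    # str.isalnum() accepts Unicode letters and digits (assumption, see above).
    return all(ch.isalnum() or ch in EXTRA_TAG_CHARS for ch in tag)


@dataclass
class CatalogAsset:
    """Hypothetical stand-in for an asset in a catalog."""
    name: str
    owner: str
    members: set[str] = field(default_factory=set)
    private: bool = False  # assets are public by default
    tags: list[str] = field(default_factory=list)

    def can_view(self, user: str) -> bool:
        """Private assets are limited to the owner and members of the asset."""
        if not self.private:
            return True  # the privacy setting does not restrict public assets
        return user == self.owner or user in self.members

    def add_tag(self, tag: str) -> None:
        """Append a tag after checking it against the tag rules."""
        if not is_valid_tag(tag):
            raise ValueError(f"invalid tag: {tag!r}")
        self.tags.append(tag)
```

With `asset = CatalogAsset(name="Orders", owner="alice", members={"bob"})`, every user passes `can_view` until `asset.private` is set to `True`, after which only `alice` and `bob` do.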
You can create custom properties for asset types. Custom properties are shown in the Details section on the asset's Overview tab in the catalog.
Common properties for assets that run in tools
Some assets are associated with running a tool. For example, an AutoAI experiment asset runs in the AutoAI tool. Assets that run in tools are also known as operational assets. Every time that you run assets in tools, you start a job. You can monitor and schedule jobs. Jobs use compute resources.
For many assets that run in tools, you have a choice of the compute environment configuration to use. Typically, larger and faster environment configurations consume compute resources faster.
In addition to basic properties, most assets that run in tools contain the following types of information in projects:
| Properties | Description | Editable? | Workspaces |
|---|---|---|---|
| Environment definition | The environment template, hardware specification, and software specification for running the asset. | Yes | Projects, Spaces |
| Settings | Information that defines how the asset is run. Specific to each type of asset. | Yes | Projects |
| Associated data assets | The data that the asset is working on. | Yes | Projects |
| Jobs | Information about how to run the asset, including the environment definition, schedule, and notification options. | Yes | Projects, Spaces |
Data asset types and their properties
Data asset types contain metadata and other information about data, including how to access the data.
How you create a data asset depends on where your data is:
- If your data is in a file, you upload the file from your local system to a workspace.
- If your data is in a remote data source, you first create a connection asset that defines the connection to that data source. Then, you create a data asset by selecting the connection, the path or other structure, and the table or file that contains the data. This type of data asset is called a connected data asset.
For data sources that support SQL queries, you can also create dynamic views, which are data assets of the type Query. To create such an asset, select the connection and provide an SQL query that retrieves only the data that you need.
The following graphic illustrates how data assets from files point to uploaded files in storage. Connected data assets require a connection asset and point to data in a remote data source.
You can create the following types of data assets in workspaces:
- Data asset from a file
  - Represents a file that you uploaded from your local system. The file is stored in the storage container that is associated with the workspace. The contents of the file can include structured data, unstructured textual data, images, and other types of data. You can create a data asset with a file of any format. However, you can perform more actions on CSV files than on other file types.
  - You can create a data asset from a file by uploading a file in a workspace. You can also create data files with tools and convert them to assets. For example, you can create data assets from files with the Data Refinery, Jupyter notebook, and RStudio tools.
- Connected data asset
  - Represents a table, file, or folder that is accessed through a connection to a remote data source. The connection is defined in the connection asset that is associated with the connected data asset. You can create a connected data asset for every supported connection. When you access a connected data asset, the data is dynamically retrieved from the data source.
  - You can add the same connected data asset to more than one catalog. As a result, the same physical asset that resides in a remote data source is represented by multiple connected data assets in different workspaces. To facilitate governance, such assets reference a shared properties record. For more information, see Identical data assets.
  - You can import connected data assets from a data source with the connected data tool in a workspace. If you want to import sets of connected data assets, for example an entire database schema, use the metadata import tool in projects.
  - In projects, you can create dynamic views that contain filtered data from one or more tables in a data source by using the query data-access tool. After you create your SQL query data asset, you can publish it to a catalog. In catalogs, you can't edit the query. If you need to update the query, edit the SQL query asset in the project and publish it again.
- Folder asset
  - Represents a folder in IBM Cloud Object Storage. A folder data asset is a special case of a connected data asset. You create a folder data asset by specifying the path to the folder and the IBM Cloud Object Storage connection asset. You can view the files and subfolders that share the path with the folder data asset. The files that you can view within the folder data asset are not themselves data assets. For example, you can create a folder data asset for a path that contains news feeds that are continuously updated.
  - You can import folder assets from IBM Cloud Object Storage with the connected data tool in a workspace.
- Connection asset
  - Contains the information necessary to create a connection to a data source.
  - You can create connections with the connection tool in a workspace.
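The relationships between these data asset types can be sketched as a small data model. The following Python code is illustrative only: every class, field, and function name is hypothetical and does not correspond to a platform API, and the `workspace` argument is a plain string chosen for this example. The sketch shows that a data asset from a file points to storage, that connected data assets and dynamic views depend on a connection asset, and that the query of a dynamic view is edited in a project rather than in a catalog.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Connection:
    """Connection asset: holds what is needed to reach a data source."""
    name: str
    data_source: str  # illustrative label, for example "db2"


@dataclass(frozen=True)
class FileDataAsset:
    """Data asset from a file: points to an uploaded file in workspace storage."""
    name: str
    storage_path: str


@dataclass(frozen=True)
class ConnectedDataAsset:
    """Connected data asset: a table, file, or folder reached through a connection."""
    name: str
    connection: Connection
    path: str
    is_folder: bool = False  # a folder asset is a special case of this type


@dataclass(frozen=True)
class DynamicView:
    """Dynamic view: data selected by an SQL query over a connection."""
    name: str
    connection: Connection
    query: str


def data_location(asset) -> str:
    """Describe where the data behind an asset lives."""
    if isinstance(asset, FileDataAsset):
        return f"workspace storage: {asset.storage_path}"
    if isinstance(asset, ConnectedDataAsset):
        return f"{asset.connection.name}: {asset.path}"
    if isinstance(asset, DynamicView):
        return f"{asset.connection.name}: result of SQL query"
    raise TypeError(f"not a data asset: {asset!r}")


def edit_query(view: DynamicView, new_query: str, workspace: str) -> DynamicView:
    """Return an updated view; the query can be changed only in a project."""
    if workspace != "project":
        raise PermissionError("edit the query in the project, then publish again")
    return replace(view, query=new_query)
```

Because the connected types hold a reference to a `Connection`, they cannot be constructed without one, which mirrors the requirement to create the connection asset first.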
Properties of data assets from files and connected data assets
In addition to basic properties and common catalog properties, data assets from files and connected data assets have the properties or pages that are listed in the following table.
Some of the properties are shared properties.
| Property or page | Description | Editable? | Workspaces |
|---|---|---|---|
| Columns | A summary of the properties of the columns in the data asset. Includes the quality score, description, assigned data classes, and assigned business terms for each column. The assigned data classes and business terms can affect the asset. For example, an assigned business term can trigger the enforcement of a data protection rule. Primary key and key relationship information: A column that is set as the primary key is identified by a key icon. If key relationships exist for the asset, you can click the View key relationships link. On the Parent of tab, you see all relationships for the primary key. On the Child of tab, you see all relationships for which the asset contains a foreign key. | No | Catalogs |
| Tags | Optional. Text labels that users create to simplify searching. A tag consists of one string of up to 255 characters. It can contain spaces, letters, numbers, underscores, dashes, and the symbols # and @. | Yes | Projects, Catalogs |
| Format | The MIME type of a file. Automatically detected. | Yes | Projects, Catalogs, Spaces |
| Asset details | Information about the size of the data, the number of columns and rows, and the asset version. In projects, the table type of relational data is also shown. | No | Projects, Catalogs, Spaces |
| Source | Information about the data file in storage or the data source and connection. | No | Catalogs, Spaces |
| Query | SQL query that generates the asset. Dynamic views only. | Yes | Projects, Catalogs |
| Connection details | For connected data assets, the path, the connection name, the type of connector, and the connection owner. For dynamic views, only the connection name and the connector type are shown. | No | Projects |
| Activities pane | The history of actions performed on the asset in all workspaces. | No | Projects, Catalogs |
| Asset page | A preview of the data that includes a limited set of columns and rows from the original data source. | No | Projects, Catalogs, Spaces |
| Profile page | Metadata and statistics about the content of the data. For example, when an enriched asset is published to a catalog, the expanded metadata is also published, and the Display name and Description, which can be AI-generated or edited versions, are shown on this page. This information is also surfaced on the Overview page. | Yes | Projects, Catalogs |
| Lineage page | A graphical depiction of the origin, transformations, and destination of data. | No | Catalogs |
| Data quality page | Information about the data quality of an asset and its columns, and the data quality checks that were applied. | Yes (projects only) | Projects, Catalogs |
| Visualizations page | Charts and graphs that users create to understand the data. | Yes | Projects |
| Feature group page | Information about which columns in the data asset are used as features in models. | Yes | Projects, Catalogs, Spaces |
Properties of connection assets
The properties of connection assets depend on the data source that you select when you create a connection. Connection assets for most data sources have the properties that are listed in the following table.
| Properties | Description | Editable? | Workspaces |
|---|---|---|---|
| Connection details | The information that identifies the data source. For example, the database name, hostname, IP address, port, instance ID, bucket, endpoint URL, and so on. | Yes | Projects, Catalogs, Spaces |
| Credential setting | Whether the credentials are shared across the platform (default) or each user must enter their personal credentials. Not all data sources support personal credentials. | Yes | Projects, Catalogs, Spaces |
| Authentication method | The format of the credentials information. For example, an API key or a username and password. | Yes | Projects, Catalogs, Spaces |
| Credentials | The username and password, API key, or other credentials, as required by the data source and the specified authentication method. | Yes | Projects, Catalogs, Spaces |
| Certificates | Whether the data source port is configured to accept SSL connections and other information about the SSL certificate. | Yes | Projects, Catalogs, Spaces |
| Secrets from a vault | Whether to store personal credentials as secrets in a vault. Not all data source services support vaults. | No | Projects, Catalogs, Spaces |
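The Credential setting row distinguishes shared credentials, which are the default, from personal credentials that each user enters. The following Python sketch shows one way to reason about that choice. It is a hypothetical model, not how the platform stores or resolves credentials: the class `ConnectionCredentials`, its fields, and the `resolve` method are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConnectionCredentials:
    """Hypothetical model of the Credential setting property."""
    shared: bool = True                  # shared credentials are the default
    shared_secret: Optional[str] = None  # used when shared is True
    personal: dict[str, str] = field(default_factory=dict)  # user -> secret

    def resolve(self, user: str) -> str:
        """Return the secret that applies to this user."""
        if self.shared:
            if self.shared_secret is None:
                raise LookupError("no shared credentials are stored")
            return self.shared_secret
        if user not in self.personal:
            raise LookupError(f"{user} must enter personal credentials")
        return self.personal[user]
```

With shared credentials, `resolve` returns the same secret for every user. With `shared=False`, a user who has not entered credentials gets a `LookupError`, which mirrors the requirement that each user must enter their own.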