Batch deployment input details for AutoAI models
Follow these rules when you are specifying input details for batch deployments of AutoAI models.
Data type summary table:

| Data | Description |
| --- | --- |
| Type | inline, data references |
| File formats | CSV, Parquet (added in release 4.5.3), xlsx (added in release 4.5.3) |
Data Sources
Input/output data references:
- Local/managed assets from the space
- Connected (remote) assets from these sources:
- DataStax (added in release 4.5.2)
- Exasol (added in release 4.5.2)
- Generic JDBC
- Snowflake
- Db2
- MySQL
- Microsoft SQL Server
- PostgreSQL
- Netezza Performance Server
- Amazon S3
- Oracle
- Google BigQuery
- Teradata
- Files in IBM Cloud Object Storage
- Files in IBM Cloud Object Storage (infrastructure)
- Files in a Storage Volume Connection
Notes:
- Until release 4.5.3, to use remote assets for batch deployments you had to first create a connected data asset in the space. The only exception was for remote assets located in Cloud Object Storage (COS); you could use the COS connection directly to select assets while creating a deployment. From release 4.5.3 onwards, the connection option is available for all supported data source types: while creating a batch deployment, you can select any connection in your space and then search for assets within that connection directly from the UI.
- For connections of type Cloud Object Storage or Cloud Object Storage (infrastructure), you must configure an Access key and Secret key, also known as HMAC credentials.
- Your training data source can differ from your deployment data source, but the schema of the data must match or the deployment will fail. For example, you can train an experiment by using data from a Snowflake database and deploy by using input data from a Db2 database if the schema is an exact match.
- The environment variables parameter of deployment jobs is not applicable to AutoAI model deployments.
- If you are deploying a model where you joined data sources to train the experiment, choose an input source that corresponds to each of the training data sources when you create the batch deployment job. For an example, refer to the deployment section of the Joining data tutorial.
- For models trained with joined data, Oracle, Google BigQuery, and Teradata are not supported as input data references.
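Because a schema mismatch between training and deployment data sources causes the deployment to fail, it can help to compare the two schemas before submitting a job. The following sketch is illustrative only (it is not part of the product API) and uses pandas to check that column names and types match exactly:

```python
# Illustrative sketch: verify that a batch deployment input data set has
# the same schema as the training data, since a mismatch causes the
# deployment to fail. Not part of the Watson Machine Learning API.
import pandas as pd

def schemas_match(training_df: pd.DataFrame, deployment_df: pd.DataFrame) -> bool:
    """Return True if column names and column dtypes match exactly, in order."""
    return (list(training_df.columns) == list(deployment_df.columns)
            and list(training_df.dtypes) == list(deployment_df.dtypes))

# Example: training data from one source, scoring data from another.
train = pd.DataFrame({"age": [34, 51], "income": [40000.0, 72000.0]})
scoring = pd.DataFrame({"age": [29], "income": [38000.0]})
print(schemas_match(train, scoring))  # True: same columns, same dtypes
```

The same check works regardless of where each data set originates (for example, Snowflake for training and Db2 for scoring), because only the resulting schema is compared.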
Action required: AutoAI experiments with joined data deprecated
The AutoAI experiment feature for joining multiple data sources to create a single training data set is deprecated. Support for joining data in an AutoAI experiment will be removed in a future release. After support ends, AutoAI experiments with joined data and deployments of resulting models will no longer run. To join multiple data sources, use a data preparation tool such as Data Refinery or DataStage to join and prepare data, then use the resulting data set for training an AutoAI experiment. Redeploy the resulting model.
If you are specifying input/output data references programmatically:
- Data source reference `type` depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
- For AutoAI assets, if the input or output data reference is of type `connection_asset` and the remote data source is a database, then `location.table_name` and `location.schema_name` are required parameters. For example:
"input_data_references": [{
"type": "connection_asset",
"connection": {
"id": <connection_guid>
},
"location": {
"table_name": <table name>,
"schema_name": <schema name>
<other wdp-properties supported by runtimes>
}
}]
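If you build the job payload in Python (for example, before passing it to the Watson Machine Learning Python client), the reference above can be assembled as a plain dict. This is a hedged sketch: the connection ID, table name, and schema name below are placeholder values, and the helper function name is my own, not part of any client library.

```python
# Illustrative sketch: build a connection_asset input data reference for a
# database source as a Python dict. The connection ID, table, and schema
# values are placeholders; substitute your own.
def make_db_input_reference(connection_id: str, table_name: str, schema_name: str) -> dict:
    """Build one entry of input_data_references for a database connection."""
    return {
        "type": "connection_asset",
        "connection": {"id": connection_id},
        "location": {
            "table_name": table_name,
            "schema_name": schema_name,
        },
    }

job_payload = {
    "input_data_references": [
        make_db_input_reference("a1b2c3d4-...", "CUSTOMER_DATA", "PUBLIC")
    ]
}
```

Keeping the reference construction in one helper makes it easy to supply one input source per training data source, as required for models trained with joined data.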
Parent topic: Batch deployment input details by framework