Batch deployment input details for SPSS models
Follow these rules when specifying input details for batch deployments of SPSS models.
Data type summary table:
| Data | Description |
|---|---|
| Type | inline, data references |
| File formats | CSV |
Data Sources:
Input/output data references:
- Local/managed assets from the space
- Connected (remote) assets from supported data sources
Notes:
- For connections of type Cloud Object Storage or Cloud Object Storage (infrastructure), you must configure Access key and Secret key, also known as HMAC credentials.
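If you manage connections programmatically, the following is a minimal sketch of creating a Cloud Object Storage connection that carries HMAC credentials with the `ibm_watson_machine_learning` Python client. The datasource type name, property keys, endpoint URL, and all placeholder values are assumptions and can vary by platform version:

```python
from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)   # assumes an existing credentials dict
client.set.default_space(space_id)    # assumes an existing deployment space ID

# Look up the Cloud Object Storage datasource type, then create a connection
# whose properties carry the HMAC credentials (access key and secret key).
cos_type = client.connections.get_datasource_type_uid_by_name("bluemixcloudobjectstorage")

conn_details = client.connections.create({
    client.connections.ConfigurationMetaNames.NAME: "cos-hmac-connection",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: cos_type,
    client.connections.ConfigurationMetaNames.PROPERTIES: {
        "bucket": "my-bucket",                       # hypothetical bucket
        "access_key": "<HMAC_ACCESS_KEY>",
        "secret_key": "<HMAC_SECRET_KEY>",
        "url": "https://s3.us.cloud-object-storage.appdomain.cloud"
    }
})
connection_id = client.connections.get_uid(conn_details)
```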
If you are specifying input/output data references programmatically:

- Data source reference `type` depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
- SPSS jobs support multiple data source inputs and a single output. If the schema is not provided in the model metadata at the time of saving the model, you must enter the `id` manually and select a data asset in the Watson Studio UI for each connection. If the schema is provided in the model metadata, `id` names are auto-populated from the metadata; you just select the data asset for the corresponding `id`s in Watson Studio. For details, see Using multiple data sources for an SPSS job.
- To create a local or managed asset as an output data reference, specify the `name` field for `output_data_reference` so that a data asset is created with the specified name. Specifying an `href` that refers to an existing local data asset is not supported. Connected data assets referring to supported databases can be created in `output_data_references` only when `input_data_references` also refers to one of these sources. See the sketch after this list.
- Table names provided in input and output data references are ignored. The table names referred to in the SPSS model stream are used during the batch deployment.
- SQL pushback allows you to generate SQL statements for native IBM SPSS Modeler operations that can be "pushed back" to (that is, executed in) the database to improve performance. SQL pushback is supported only with:
  - Db2
  - SQL Server
  - Netezza Performance Server
  - PostgreSQL
  - Oracle
  - Snowflake
  - Exasol
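To make the output-reference rule concrete: the following is a minimal sketch of an `output_data_reference` that creates a new managed asset. The field placement mirrors the payload examples in this topic, and the asset name is hypothetical:

```python
# Sketch: output_data_reference that creates a new local/managed data asset.
# Supplying "name" creates the asset; an "href" to an existing local asset is not supported.
output_data_reference = {
    "type": "data_asset",
    "name": "spss_job_results.csv",   # hypothetical name for the created asset
    "connection": {},
    "location": {}
}
```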
If you are creating a job using the Python client, you must provide the connection name referred to in the data nodes of the SPSS model stream in the `id` field, and the data asset href in `location.href` for the input/output data references of the deployment jobs payload. For example, you can construct the jobs payload like this:

```python
job_payload_ref = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [
        {
            "id": "DB2Connection",
            "name": "drug_ref_input1",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <input_asset_href1>
            }
        },
        {
            "id": "Db2 WarehouseConn",
            "name": "drug_ref_input2",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <input_asset_href2>
            }
        }
    ],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "connection": {},
        "location": {
            "href": <output_asset_href>
        }
    }
}
```
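You can then create the batch job from this payload against an existing deployment. A minimal sketch, assuming the `ibm_watson_machine_learning` Python client and a `deployment_id` you already have:

```python
# Create the batch deployment job from the payload built above.
job = client.deployments.create_job(deployment_id, meta_props=job_payload_ref)
job_uid = client.deployments.get_job_uid(job)
print("Created job:", job_uid)
```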
Using connected data or a connection asset for an SPSS Modeler flow job
An SPSS modeler flow can have a number of input and output data nodes. When connecting to a supported database as an input and output data source, note that the connection details are selected from the input and output data reference, but the input and output table names are selected from the SPSS model stream file.
To perform batch deployment of an SPSS model using a database connection, make sure the modeler stream's Input and Output nodes are Data Asset or Connection Asset nodes. In SPSS Modeler, the Data Asset or Connection Asset nodes must be configured with the table names that will be used later for job predictions. Set the nodes and table names before you save the model to Watson Machine Learning. When configuring the Data Asset or Connection Asset nodes, choose the table name from the Connections; choosing a Data Asset or Connection Asset that is created in your project is currently not supported.
When creating the deployment job for the SPSS model, make sure the types of data sources are the same for input and output. The configured table names from the model stream are passed to the batch deployment, and the input/output table names provided in the connected data are ignored.
To perform batch deployment of an SPSS model using a Cloud Object Storage (COS) connection, make sure the SPSS model stream has a single input data asset node and a single output data asset node.
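As an illustration of single COS input and output references, the following sketch uses the `connection_asset` reference type. The connection ID, bucket, and file names are all hypothetical, and the exact `location` fields can vary by connection type:

```python
# Sketch: single COS input and output for an SPSS batch job payload.
cos_input_reference = {
    "type": "connection_asset",
    "connection": {"id": "<connection_asset_id>"},   # COS connection with HMAC keys
    "location": {
        "bucket": "my-bucket",         # hypothetical bucket
        "file_name": "drug_input.csv"  # hypothetical input file
    }
}

cos_output_reference = {
    "type": "connection_asset",
    "connection": {"id": "<connection_asset_id>"},
    "location": {
        "bucket": "my-bucket",
        "file_name": "drug_output.csv"  # hypothetical output file
    }
}
```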
Supported combinations of input and output sources
You must specify compatible sources for the SPSS Modeler flow input, the batch job input, and the output. If you specify an incompatible combination of data source types, the batch job fails with an error.
These combinations are supported for batch jobs:
| SPSS model stream input/output | Batch deployment job input | Batch deployment job output |
|---|---|---|
| File | Local/managed or referenced data asset or connection asset (file) | Remote data asset or connection asset (file) or name |
| Database | Remote data asset or connection asset (database) | Remote data asset or connection asset (database) |
For details on how Watson Studio connects to data, see Accessing data.
Specifying multiple inputs
If you are specifying multiple inputs for an SPSS model stream deployment with no schema, specify an ID for each element in `input_data_references`.
For details, see Using multiple data sources for an SPSS job.
In this example, when you create the job, provide three input entries with the IDs "sample_db2_conn", "sample_teradata_conn", and "sample_googlequery_conn", and select the required connected data for each input.
```json
{
  "deployment": {
    "href": "/v4/deployments/<deploymentID>"
  },
  "scoring": {
    "input_data_references": [
      {
        "id": "sample_db2_conn",
        "name": "DB2 connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_teradata_conn",
        "name": "Teradata connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_googlequery_conn",
        "name": "Google BigQuery connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      }
    ],
    "output_data_references": {
      "id": "sample_db2_conn",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    }
  }
}
```
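After submitting a job built from a payload like this (for example, with the Python client shown earlier), you would typically poll until the job reaches a terminal state. A minimal sketch, assuming a `job_uid` obtained from `client.deployments.get_job_uid` and that `get_job_status` returns a dict with a `state` field:

```python
import time

# Poll the batch job until it reaches a terminal state.
while True:
    status = client.deployments.get_job_status(job_uid)   # job_uid is assumed
    state = status.get("state")
    if state in ("completed", "failed", "canceled"):
        break
    time.sleep(10)

print("Job finished with state:", state)
```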
Note: The environment variables parameter of deployment jobs is not applicable.
Parent topic: Batch deployment input details by framework