Batch deployment input details for SPSS models
Follow these rules when specifying input details for batch deployments of SPSS models.
Data type summary table:
| Data | Description |
|---|---|
| Type | inline, data references |
| File formats | CSV |
Data Sources:
Input/output data references:
- Local/managed assets from the space
- Connected (remote) assets from supported data sources
Notes:
- For connections of type Cloud Object Storage or Cloud Object Storage (infrastructure), you must configure Access key and Secret key, also known as HMAC credentials.
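If you manage connections programmatically, the following is a minimal sketch of creating a Cloud Object Storage connection that carries HMAC credentials with the `ibm_watson_machine_learning` Python client. The datasource type name, property keys, endpoint URL, and all placeholder values are assumptions and can vary by platform version:

```python
from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)   # assumes an existing credentials dict
client.set.default_space(space_id)    # assumes an existing deployment space ID

# Look up the Cloud Object Storage datasource type, then create a connection
# whose properties carry the HMAC credentials (access key and secret key).
cos_type = client.connections.get_datasource_type_uid_by_name("bluemixcloudobjectstorage")

conn_details = client.connections.create({
    client.connections.ConfigurationMetaNames.NAME: "cos-hmac-connection",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: cos_type,
    client.connections.ConfigurationMetaNames.PROPERTIES: {
        "bucket": "my-bucket",                       # hypothetical bucket
        "access_key": "<HMAC_ACCESS_KEY>",
        "secret_key": "<HMAC_SECRET_KEY>",
        "url": "https://s3.us.cloud-object-storage.appdomain.cloud"
    }
})
connection_id = client.connections.get_uid(conn_details)
```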
If you are specifying input/output data references programmatically:

- Data source reference `type` depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
- SPSS jobs support multiple data source inputs and a single output. If the schema is not provided in the model metadata at the time of saving the model, you must enter the `id` manually and select a data asset in the Watson Studio UI for each connection. If the schema is provided in the model metadata, `id` names are auto-populated from the metadata; you just select the data asset for the corresponding `id`s in Watson Studio. For details, see Using multiple data sources for an SPSS job.
- To create a local or managed asset as an output data reference, specify the `name` field for `output_data_reference` so that a data asset is created with the specified name. Specifying an `href` that refers to an existing local data asset is not supported. Connected data assets referring to supported databases can be created in `output_data_references` only when `input_data_references` also refers to one of these sources. See the sketch after this list.
- Table names provided in input and output data references are ignored. The table names referred to in the SPSS model stream are used during the batch deployment.
- SQL pushback allows you to generate SQL statements for native IBM SPSS Modeler operations that can be "pushed back" to (that is, executed in) the database to improve performance. SQL pushback is supported only with:
  - Db2
  - SQL Server
  - Netezza Performance Server
  - PostgreSQL
  - Oracle
  - Snowflake
  - Exasol
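To make the output-reference rule concrete: the following is a minimal sketch of an `output_data_reference` that creates a new managed asset. The field placement mirrors the payload examples in this topic, and the asset name is hypothetical:

```python
# Sketch: output_data_reference that creates a new local/managed data asset.
# Supplying "name" creates the asset; an "href" to an existing local asset is not supported.
output_data_reference = {
    "type": "data_asset",
    "name": "spss_job_results.csv",   # hypothetical name for the created asset
    "connection": {},
    "location": {}
}
```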
If you are creating a job using the Python client, you must provide the connection name referred to in the data nodes of the SPSS model stream in the `id` field, and the data asset href in `location.href` for the input/output data references of the deployment jobs payload. For example, you can construct the jobs payload like this:

```python
job_payload_ref = {
    client.deployments.ScoringMetaNames.INPUT_DATA_REFERENCES: [
        {
            "id": "DB2Connection",
            "name": "drug_ref_input1",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <input_asset_href1>
            }
        },
        {
            "id": "Db2 WarehouseConn",
            "name": "drug_ref_input2",
            "type": "data_asset",
            "connection": {},
            "location": {
                "href": <input_asset_href2>
            }
        }
    ],
    client.deployments.ScoringMetaNames.OUTPUT_DATA_REFERENCE: {
        "type": "data_asset",
        "connection": {},
        "location": {
            "href": <output_asset_href>
        }
    }
}
```
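You can then create the batch job from this payload against an existing deployment. A minimal sketch, assuming the `ibm_watson_machine_learning` Python client and a `deployment_id` you already have:

```python
# Create the batch deployment job from the payload built above.
job = client.deployments.create_job(deployment_id, meta_props=job_payload_ref)
job_uid = client.deployments.get_job_uid(job)
print("Created job:", job_uid)
```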
Using connected data or a connection asset for an SPSS Modeler flow job
An SPSS modeler flow can have a number of input and output data nodes. When connecting to a supported database as an input and output data source, note that the connection details are selected from the input and output data reference, but the input and output table names are selected from the SPSS model stream file.
To perform batch deployment of an SPSS model using a database connection, make sure the modeler stream's Input and Output nodes are Data Asset or Connection Asset nodes. In SPSS Modeler, the Data Asset or Connection Asset nodes must be configured with the table names that will be used later for job predictions. Set the nodes and table names before you save the model to Watson Machine Learning. When configuring the Data Asset or Connection Asset nodes, choose the table name from the Connections; choosing a Data Asset or Connection Asset that is created in your project is currently not supported.
When creating the deployment job for the SPSS model, make sure the types of data sources are the same for input and output. The configured table names from the model stream are passed to the batch deployment, and the input/output table names provided in the connected data are ignored.
To perform batch deployment of an SPSS model using a Cloud Object Storage (COS) connection, make sure the SPSS model stream has a single input data asset node and a single output data asset node.
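As an illustration of single COS input and output references, the following sketch uses the `connection_asset` reference type. The connection ID, bucket, and file names are all hypothetical, and the exact `location` fields can vary by connection type:

```python
# Sketch: single COS input and output for an SPSS batch job payload.
cos_input_reference = {
    "type": "connection_asset",
    "connection": {"id": "<connection_asset_id>"},   # COS connection with HMAC keys
    "location": {
        "bucket": "my-bucket",         # hypothetical bucket
        "file_name": "drug_input.csv"  # hypothetical input file
    }
}

cos_output_reference = {
    "type": "connection_asset",
    "connection": {"id": "<connection_asset_id>"},
    "location": {
        "bucket": "my-bucket",
        "file_name": "drug_output.csv"  # hypothetical output file
    }
}
```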
Supported combinations of input and output sources
You must specify compatible sources for the SPSS Modeler flow input, the batch job input, and the output. If you specify an incompatible combination of data source types, the batch job fails with an error.
These combinations are supported for batch jobs:
| SPSS model stream input/output | Batch deployment job input | Batch deployment job output |
|---|---|---|
| File | Local/managed or referenced data asset or connection asset (file) | Remote data asset or connection asset (file) or name |
| Database | Remote data asset or connection asset (database) | Remote data asset or connection asset (database) |
For details on how Watson Studio connects to data, see Accessing data.
Specifying multiple inputs
If you are specifying multiple inputs for an SPSS model stream deployment with no schema, specify an ID for each element in `input_data_references`.
For details, see Using multiple data sources for an SPSS job.
In this example, when you create the job, provide three input entries with the IDs "sample_db2_conn", "sample_teradata_conn", and "sample_googlequery_conn", and select the required connected data for each input.
```json
{
  "deployment": {
    "href": "/v4/deployments/<deploymentID>"
  },
  "scoring": {
    "input_data_references": [
      {
        "id": "sample_db2_conn",
        "name": "DB2 connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_teradata_conn",
        "name": "Teradata connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      },
      {
        "id": "sample_googlequery_conn",
        "name": "Google BigQuery connection",
        "type": "data_asset",
        "connection": {},
        "location": {
          "href": "/v2/assets/<asset_id>?space_id=<space_id>"
        }
      }
    ],
    "output_data_references": {
      "id": "sample_db2_conn",
      "type": "data_asset",
      "connection": {},
      "location": {
        "href": "/v2/assets/<asset_id>?space_id=<space_id>"
      }
    }
  }
}
```
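After submitting a job built from a payload like this (for example, with the Python client shown earlier), you would typically poll until the job reaches a terminal state. A minimal sketch, assuming a `job_uid` obtained from `client.deployments.get_job_uid` and that `get_job_status` returns a dict with a `state` field:

```python
import time

# Poll the batch job until it reaches a terminal state.
while True:
    status = client.deployments.get_job_status(job_uid)   # job_uid is assumed
    state = status.get("state")
    if state in ("completed", "failed", "canceled"):
        break
    time.sleep(10)

print("Job finished with state:", state)
```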
Note: The environment variables parameter of deployment jobs is not applicable.
Parent topic: Batch deployment input details by framework