IBM Support

How to allow an SPSS Modeler Flow to read a Parquet file stored in a Project Asset?

Question & Answer


Question

Could you tell me how to allow an SPSS Modeler Flow to read a Parquet file stored in a Project Asset?

Answer

Use the Extension Import node to build a data source input from the Parquet file.
In an SPSS Modeler Flow, the Extension Import node can read a Parquet file through a custom script, entered in the node's syntax form, that imports the data from that file. When the script is ready, run the Extension Import node to create the data source input.
The following sample script imports data from a Parquet file stored in a CP4D Project Asset.
# Get the SPSS Modeler runtime context and its Spark SQL context
import spss.pyspark.runtime

cxt = spss.pyspark.runtime.getContext()
sqlContext = cxt.getSparkSQLContext()

# Read the Parquet file into a Spark dataframe.
# Parquet files store their own schema, so the CSV reader options
# "inferSchema" and "header" are not needed here.
df = sqlContext.read.parquet("/project_data/data_asset/cars.parquet")

# Set the Spark dataframe as the output of the Extension Import node
if cxt.isComputeDataModelOnly():
    # Only the data model is requested: return the schema without the data
    cxt.setSparkOutputSchema(df.schema)
else:
    cxt.setSparkOutputData(df)
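
A mistyped file name in the path only shows up as a Spark read error when the node runs, so it can help to check the path before reading. The following is a minimal sketch, not part of the original sample: the helper name `asset_path` is hypothetical, and it assumes that project data assets are mounted under `/project_data/data_asset`, as in the script above.

```python
import os

# Assumed mount point for project data assets (taken from the sample script).
DATA_ASSET_DIR = "/project_data/data_asset"


def asset_path(file_name, base_dir=DATA_ASSET_DIR):
    """Return the full path of a data asset, or raise if it does not exist.

    Hypothetical helper for illustration. It accepts files and directories,
    because Spark can store a Parquet data set as a directory of part files.
    """
    path = os.path.join(base_dir, file_name)
    if not os.path.exists(path):
        # List what is actually there to make a typo easy to spot.
        available = sorted(os.listdir(base_dir)) if os.path.isdir(base_dir) else []
        raise FileNotFoundError(
            "Data asset not found: %s (available: %s)" % (path, available)
        )
    return path
```

The returned path can then be passed to `sqlContext.read.parquet(...)` in place of the hard-coded string.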


Document Information

Modified date:
11 March 2023

UID

ibm16962993