IBM Support

How to convert a data set from CSV format to parquet format and store it as a Project Asset

Question & Answer


Question

With the pandas to_parquet() method, a data set in CSV format can easily be converted to parquet format. However, when I tried to store the result with the ibm_watson_studio_lib save_data() function, I got the following error:
RuntimeError: Argument "data" must be a bytes-like object. Use str.encode() to convert character data.
Could you tell me how to store the converted parquet file as a CP4D Project Asset?

Cause

The following code creates a binary parquet file in the current working directory:
df_data_1.to_parquet("cars.parque",compression='gzip')
The error occurred because the ibm_watson_studio_lib save_data() function expects a bytes-like object for the argument "data". When to_parquet() is given a file path, it writes the parquet file to disk and returns None, so neither its return value nor the file name is a bytes-like object that save_data() can store.
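The distinction between a file on disk and a bytes-like object can be sketched as follows. The helpers is_bytes_like() and read_file_bytes() are hypothetical names used only for illustration; they are not part of ibm_watson_studio_lib, and how the library performs its own check internally is an assumption.

```python
from pathlib import Path


def is_bytes_like(value):
    """Return True for the common bytes-like types (illustrative check only)."""
    return isinstance(value, (bytes, bytearray, memoryview))


def read_file_bytes(path):
    """Read an existing file (for example a parquet file) as raw bytes."""
    return Path(path).read_bytes()


# A file name is a str and None is not data at all, so neither is bytes-like.
print(is_bytes_like("cars.parque"))  # False
print(is_bytes_like(None))           # False
```

Reading the file back with read_file_bytes() would yield a bytes object, which is the kind of value the "data" argument asks for.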

Answer

Because the binary parquet file has already been created in the current working directory, use the ibm_watson_studio_lib upload_file() function instead of save_data(). The following sample code converts a data set that was loaded from a CSV file into the DataFrame df_data_1 to parquet format and stores it as a Project Asset.
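import os
from ibm_watson_studio_lib import access_project_or_space

# Work in the project's data asset directory
os.chdir("/project_data/data_asset/")
wslib = access_project_or_space()

# Write the parquet file first; to_parquet() returns None when given a path,
# so its result should not be passed on to upload_file()
df_data_1.to_parquet("cars.parque", compression='gzip')
# Upload the file from disk as a project data asset
wslib.upload_file("cars.parque")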
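If an intermediate file on disk is not wanted, one possible alternative is to serialize the DataFrame to parquet in memory and pass the resulting bytes to save_data(). The following is a minimal sketch, not a confirmed IBM sample: the function names are hypothetical, the save_data(asset_name, data, overwrite=...) call is assumed to match the ibm_watson_studio_lib documentation for your release, and pandas needs a parquet engine such as pyarrow installed for to_parquet() to work.

```python
import io


def dataframe_to_parquet_bytes(df, compression="gzip"):
    """Serialize a pandas DataFrame to parquet and return the raw bytes."""
    buffer = io.BytesIO()
    # Given a file-like object, to_parquet() writes into it instead of to disk.
    df.to_parquet(buffer, compression=compression)
    return buffer.getvalue()


def save_dataframe_as_parquet_asset(wslib, df, asset_name):
    """Store df as a parquet project asset through wslib.save_data().

    wslib is the handle returned by access_project_or_space();
    the overwrite=True argument is an assumption, adjust as needed.
    """
    data = dataframe_to_parquet_bytes(df)
    return wslib.save_data(asset_name, data, overwrite=True)
```

In a notebook this could be called as save_dataframe_as_parquet_asset(wslib, df_data_1, "cars.parque"), where wslib and df_data_1 are the objects from the sample code above.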

Product: IBM Watson Studio Premium for IBM Cloud Pak for Data
Platform: Platform Independent
Version: All Versions

Document Information

Modified date:
15 March 2023

UID

ibm16962995