DataStage command-line tools

CPDCTL and the dsjob tool are command-line interfaces (CLIs) that you can use to manage your DataStage® resources in IBM Cloud Pak for Data.

With the command-line tools, you can reuse on the cloud platform any DataStage scripts that exist on your on-premises systems.

You can use the following command-line tools to run DataStage tasks:

CPDCTL: cpdctl dsjob <command> --project <project name>
DataStage jobs CLI (dsjob): dsjob <command> <project name>

Using the CLI tools, you can:

List projects and jobs
Create, get, and delete jobs
Run jobs
Print job logs
Print a summary of job run logs
List job information
Migrate jobs
List, create, get, delete, and compile flows
List, create, and get hardware specifications
List, create, and get runtime environments
List, create, get, and delete connections
Manage imports and exports
Print versions

Listing projects

The following syntax displays a list of all known projects:

cpdctl dsjob list-projects

dsjob -lprojects

A list of all the projects is displayed, one per line.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Listing jobs

The following syntax displays a list of all jobs in the specified project:

cpdctl dsjob list-jobs --project <project>

dsjob -ljobs <project>

project is the name of the project that contains the jobs to list. This field is mandatory. A list of all the jobs in the project is displayed, one per line.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
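
Because both listing commands print one name per line, their output can be chained in a script. The following sketch is a dry run only: the project names are hypothetical stand-ins for the output of cpdctl dsjob list-projects, and the loop prints the list-jobs command that it would run for each project instead of running it. It assumes that project names contain no whitespace and that any status line in the real output is filtered out first.

```shell
#!/bin/sh
# Hypothetical stand-in for the output of `cpdctl dsjob list-projects`
# (one project name per line).
projects="sales_dev
sales_prod"

commands=""
for project in $projects; do
  # Build the list-jobs command for this project (printed, not executed).
  line="cpdctl dsjob list-jobs --project $project"
  echo "$line"
  commands="$commands$line;"
done
```

To run against a real cluster, the stand-in list would be replaced by the actual command output and each printed line would be executed instead of echoed.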

Creating jobs

The following syntax creates a job in the specified project:

cpdctl dsjob create-job --project PROJECT [--name name] --flow flow [--description description] [--schedule-start yyyy-mm-dd:hh:mm] [--schedule-end yyyy-mm-dd:hh:mm] [--repeat every/hourly/daily/weekly/monthly --minutes (0-59) --hours (0-23) --day-of-week (0-6) --day-of-month (1-31)]
dsjob -create_job [[-name name] [-flow flow] [-description description]] [-repeat mins/hourly/daily/monthly] [-mins <dd>] [-hrs <dd>] project

project is the name of the project that the job is created for. This field is mandatory.
name is the name of the job to be created.
description is the description of the job to be created. This field is optional.
flow is the name of the flow. This field must be specified.
repeat indicates the frequency of job runs. Permitted values are every, hourly, daily, weekly, and monthly. The default value is none.
minutes indicates the interval in minutes or the minute of the hour at which to run the job. Values in the range 0-59 are accepted.
hours indicates the hour of the day at which to run the job. Values in the range 0-23 are accepted.
day-of-week repeats on the specified day of the week and works with minutes and hours. Values in the range 0-6 are accepted.
day-of-month repeats on the specified day of the month and works with minutes and hours. Values in the range 1-31 are accepted. For example, 2 runs the job on the second of the month.
schedule-start is the starting time for scheduling a job.
schedule-end is the ending time for scheduling a job.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
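
As an illustration of how the scheduling options combine, the following sketch builds a create-job command for a job that repeats daily at 02:30. The project, flow, and job names and the start date are hypothetical. The script checks the documented ranges and then prints the command as a dry run; it does not contact a cluster.

```shell
#!/bin/sh
# Hypothetical names; replace with values from your own project.
project="demo_project"
flow="customer_load"
job="customer_load_daily"
hours=2
minutes=30

# Reject values outside the documented ranges before building the command.
if [ "$hours" -lt 0 ] || [ "$hours" -gt 23 ] ||
   [ "$minutes" -lt 0 ] || [ "$minutes" -gt 59 ]; then
  echo "hours must be 0-23 and minutes must be 0-59" >&2
  exit 1
fi

# Daily schedule at 02:30, starting on a hypothetical date
# in the yyyy-mm-dd:hh:mm format.
set -- cpdctl dsjob create-job --project "$project" --name "$job" \
  --flow "$flow" --repeat daily --hours "$hours" --minutes "$minutes" \
  --schedule-start 2024-01-01:00:00

# Dry run: print the command. Use "$@" on its own line to execute it.
echo "$@"
```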

Getting jobs

The following syntax fetches a job by name from the specified project:

cpdctl dsjob get-job --project PROJECT --name name [--output file] [--file-name <name>] [--with-metadata]
dsjob -get_job -name name project

project is the name of the project that the job is fetched from. This field is mandatory.
name is the name of the queried job.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Deleting jobs

The following syntax deletes a job by name from the specified project:

cpdctl dsjob delete-job --project PROJECT --name name 
dsjob -delete_job -name name project

project is the name of the project that the job is deleted from. This field is mandatory.
name is the name of the deleted job.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Displaying job information

The following syntax displays the available information about a specified job:

cpdctl dsjob jobinfo --project <project> --job <job> [--full]

dsjob -jobinfo project job [-full]

project is the name of the project that contains job. This field is mandatory.
job is the name of the job. This field is mandatory.
full displays more detailed information about the job, including information about all job runs. This field is optional.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Managing jobs

You can use the run command to start, stop, validate, and reset jobs.

cpdctl dsjob run --project PROJECT --job JOB [--runid RUNID] [--mode MODE] [--param PARAM] [--paramfile FILENAME] [--env ENVJSON] [--stop] [--wait secs]

dsjob -run
[ -mode [ NORMAL | RESET | VALIDATE | RESTART ] ]
[ -runid <runid>]
[ -paramfile filename ] 
[ -wait secs]
 project job 
or
dsjob -stop [-runid id] project job

project is the name of the project that contains job. This field is mandatory.
job is the name of the job. This field is mandatory.
runid can be specified to cancel or restart an existing job run. If runid is not specified, the runid of the latest job run that is not completed is used by default. This field is optional.
mode can be specified to start, cancel, or restart a job. The value of mode defaults to NORMAL, which starts a job run. If the value of mode is RESET, a running job is canceled. RESTART restarts a job run. This field is optional.
stop stops a job that is running. This field is optional. When specified, it supersedes mode.
param specifies a parameter value to pass to the job. The value is in the format name=value, where name is the parameter name and value is the value to be set. This flag can be repeated, for example: --param k1=v1 --param k2=v2.
paramfile specifies a file that contains the parameter values to pass to the job. This field is not currently implemented.
wait specifies the time, in seconds, that the command waits for the job to finish. The job logs are printed to the output until the job is completed or the wait time expires. This field is optional.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
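
The repeated --param flag lends itself to being assembled in a loop. The following sketch, which uses hypothetical project, job, and parameter names, appends one --param per name=value pair and adds a 300-second wait. It prints the resulting command as a dry run instead of starting a job.

```shell
#!/bin/sh
# Hypothetical project and job names.
project="demo_project"
job="customer_load_daily"

# Start with the fixed arguments, then append one --param per pair.
set -- cpdctl dsjob run --project "$project" --job "$job"
for pair in "region=emea" "batch_size=500"; do
  set -- "$@" --param "$pair"
done

# Wait up to 300 seconds for the run to finish.
set -- "$@" --wait 300

# Dry run: print the command. Use "$@" on its own line to execute it.
echo "$@"
```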

Displaying a specific log entry

The following syntax displays the specified entry in a job log file:

cpdctl dsjob logdetail --project PROJECT --job JOB [--runid RUNID] [--eventrange EVENTRANGE] [--compatible]

dsjob -logdetail [-eventrange]  project job

project is the project that contains job. This field is mandatory.
job is the job whose log entries are to be retrieved. This field is mandatory.
runid processes the log entry for a specific runid. If runid is not specified, the latest run is used by default. This field is optional.
eventrange is the range of event numbers that is assigned to the entry that is printed to the output. The first entry in the file is 0. If eventrange is not specified, the full log is processed. For example, if you specify eventrange 2-4, the third, fourth, and fifth entries from the log are printed.
compatible outputs logs in the format previously used by DataStage components. This field is optional.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
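
Because event numbers start at 0, an event range is offset by one from the position of the entries in the log. The following arithmetic sketch works through the 2-4 example from the eventrange description; it runs locally and does not call either CLI.

```shell
#!/bin/sh
# Zero-based event range, as in the example above.
eventrange="2-4"

first=${eventrange%-*}   # text before the dash: 2
last=${eventrange#*-}    # text after the dash: 4

# Number of entries selected and their one-based positions in the log.
count=$((last - first + 1))
first_pos=$((first + 1))
last_pos=$((last + 1))

echo "eventrange $eventrange selects $count entries: positions $first_pos through $last_pos"
```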

Displaying a short log entry

The following syntax displays a summary of entries in a job log file:

cpdctl dsjob logsum --project PROJECT --job JOB [--runid RUNID] [--type TYPE] [--max MAX] [--compatible]

dsjob -logsum [-type type] [ -max n ] [ -useid ] project job|job_id

project is the project that contains job. This field is mandatory.
job is the job whose log entries are to be retrieved. This field is mandatory.
runid processes the log entry for a specific runid. If runid is not specified, the latest run is used by default. This field is optional.
type specifies the type of log entry to retrieve. If type is not specified, all the entries are retrieved. type can be one of the following options:
- INFO: Information
- WARNING: Warning
- FATAL: Fatal error
- REJECT: Rejected rows from a Transformer stage
- STARTED: All control logs
- RESET: Job reset
- BATCH: Batch control
- ANY: All entries of any type. This option is the default if type is not specified.
compatible outputs logs in the format previously used by DataStage components. This field is optional.
max n limits the number of entries that are retrieved to n.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
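
A script that accepts the log type from a user might check it against the documented values before it calls the CLI. The following sketch does that with a case statement and then prints, as a dry run, a logsum command that is limited to 10 entries. The project and job names are hypothetical.

```shell
#!/bin/sh
# Hypothetical project and job names.
project="demo_project"
job="customer_load_daily"
type="WARNING"
max=10

# Accept only the log entry types that are listed above.
case "$type" in
  INFO|WARNING|FATAL|REJECT|STARTED|RESET|BATCH|ANY) valid=yes ;;
  *) valid=no ;;
esac

if [ "$valid" = "yes" ]; then
  set -- cpdctl dsjob logsum --project "$project" --job "$job" \
    --type "$type" --max "$max"
  # Dry run: print the command. Use "$@" on its own line to execute it.
  echo "$@"
else
  echo "unsupported log type: $type" >&2
fi
```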

Identifying the newest log entry

The following syntax displays the ID of the newest log entry of the specified type:

cpdctl dsjob lognewest --project PROJECT [--type TYPE] --job JOB [--runid RUNID]

dsjob -lognewest project job <type>

project is the project that contains job. This field is mandatory.
job is the job whose log entries are to be retrieved. This field is mandatory.
type can be one of the following options:
- INFO: Information
- WARNING: Warning
- FATAL: Fatal error
- REJECT: Rejected rows from a Transformer stage
- STARTED: All control logs
- RESET: Job reset
- BATCH: Batch control

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Managing job migrations

The migrate command creates data flows from an exported legacy ISX file. You can also use the command to check the status of a migration or to cancel a migration that is in progress.

cpdctl dsjob migrate --project PROJECT [--on-failure ONFAILURE] [--conflict-resolution CONFLICT-RESOLUTION] [--attachment-type ATTACHMENT-TYPE] [--body BODY] [--file-name FILE-NAME] [--status importid] [--stop importid]

dsjob migrate [-action <skip|rename|replace>] [-status] [-stop] project [filename| importid]

project is the project to which jobs are migrated.
on-failure indicates what action to take if the import process fails. Possible options are continue or stop. This field is optional.
conflict-resolution or -action specifies the resolution when the data flow to be imported has a name conflict with an existing data flow in the project or catalog. Possible resolutions are skip, rename, or replace. This field is optional.
attachment-type is the type of attachment. The default attachment type is isx. This field is optional.
body is the file name of the import file. This field is required for an import operation but not used with options -stop or -status.
file-name is the name of the input file. This field is required for an import operation but not with options -stop or -status.
status returns the status of a previously submitted import job. A value for importid must be specified with this option.
stop cancels an import operation that is in progress. A value for importid must be specified with this option.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
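
A migration is typically a two-step exchange: submit the ISX file, then query the status by import ID. The following sketch prints both commands as a dry run. The project name, file name, and import ID are hypothetical placeholders, and how the real import ID is reported by the first command is not covered here.

```shell
#!/bin/sh
# Hypothetical values for illustration only.
project="demo_project"
isx_file="legacy_jobs.isx"
import_id="hypothetical-import-id"

# Step 1: submit the ISX file, renaming flows on a name conflict and
# continuing if an individual import fails.
migrate_cmd="cpdctl dsjob migrate --project $project --file-name $isx_file --conflict-resolution rename --on-failure continue"

# Step 2: check the status of the submitted migration by import ID.
status_cmd="cpdctl dsjob migrate --project $project --status $import_id"

# Dry run: print both commands instead of executing them.
echo "$migrate_cmd"
echo "$status_cmd"
```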

Listing flows

The following syntax displays a list of all flows in the specified project:


cpdctl dsjob list-flows --project <project>

dsjob -lflows <project>

project is the project that contains the flows to list. This field is mandatory. A list of all the flows in the project is displayed, one per line.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Creating flows

The following syntax creates a flow in the specified project:


cpdctl dsjob create-flow --project PROJECT --name name [--description description] [--pipeline-file filename]
dsjob -create_flow [[-name name] [-description description]] [-pipeline-file file] project

project is the name of the project that the flow is created for. This field is mandatory.
name is the name of the flow being created.
description is the description of the flow being created. This field is optional.
pipeline-file is the name of the file that contains the flow JSON. This field must be specified.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Getting flows

The following syntax fetches a flow by name from the specified project:

cpdctl dsjob get-flow --project PROJECT --name name [--output file] [--file-name <name>] [--with-metadata]
dsjob -get_flow -name name project

project is the project that the flow is fetched from. This field is mandatory.
name is the name of the queried flow.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Deleting flows

The following syntax deletes a flow by name from the specified project:

cpdctl dsjob delete-flow --project PROJECT --name name 
dsjob -delete_flow -name name project

project is the project that the flow is deleted from. This field is mandatory.
name is the name of the flow.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Compiling flows

The following syntax compiles flows in the specified project:


cpdctl dsjob compile --project PROJECT [--flow FLOW] [--osh] [--threads <n>]

dsjob compile <project> <flow> [-osh] [-t <n>]

project is the project that contains the flows to compile. This field is mandatory.
flow is the name of the flow to compile. If flow is not specified, all the flows in the project are compiled. This field is optional.
osh displays the compiled osh output. This field is optional.
threads specifies the number of parallel compilations to run. The value must be in the range 5-20; the default value is 5. This field is optional.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
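
The following sketch shows one way a script might guard the threads value before it compiles every flow in a project. The project name is hypothetical, and falling back to the default of 5 for an out-of-range value is a design choice of this sketch, not documented CLI behavior. The command is printed as a dry run.

```shell
#!/bin/sh
# Hypothetical project name and requested thread count.
project="demo_project"
threads=8

# Fall back to the documented default when the value is outside 5-20.
if [ "$threads" -lt 5 ] || [ "$threads" -gt 20 ]; then
  threads=5
fi

# No --flow is given, so all flows in the project would be compiled.
set -- cpdctl dsjob compile --project "$project" --threads "$threads"

# Dry run: print the command. Use "$@" on its own line to execute it.
echo "$@"
```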

Listing hardware specifications

The following syntax displays a list of all hardware specifications in the specified project:

cpdctl dsjob list-hardware-specs --project PROJECT [--full] [--all]

dsjob -lhws <project> [-full] [-all]

project is the name of the project that contains the hardware specifications to list. This field is mandatory. A list of all the DataStage hardware specifications in the project is displayed, one per line.
type displays a list of all hardware specifications in the project for the specified type, for example: DataStage, Spark, and Nodes.
full provides full configuration details of each hardware specification. This field is optional.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Creating hardware specifications

The following syntax creates a hardware specification for the specified project:

cpdctl dsjob create-hardware-spec --project PROJECT [[--name name] [--description description] [--body json]] [--filename file]
dsjob -create_hardware_spec [[-name name] [-description description] [-body body]] [-filename file] project

project is the name of the project that the hardware specification is created for. This field is mandatory.
name is the name of the hardware specification being created.
description is the description of the hardware specification being created. This field is optional.
body contains the hardware specification in JSON format. Alternatively, the hardware specification can be provided in a file by using --filename.
filename is the name of the file that contains the hardware specification. Alternatively, the hardware specification can be provided inline by using --body. Either --body or --filename must be specified.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
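
Because one of --body or --filename is required, a wrapper script can check for that before it calls the CLI. The following sketch uses a hypothetical project name, specification name, and file name, prefers the file when both are set (a choice made for this sketch only), and prints the command as a dry run.

```shell
#!/bin/sh
# Hypothetical values; body is left empty so the file is used.
project="demo_project"
name="small_spec"
body=""
filename="small_spec.json"

set -- cpdctl dsjob create-hardware-spec --project "$project" --name "$name"

# Either an inline body or a file must supply the specification.
if [ -n "$filename" ]; then
  set -- "$@" --filename "$filename"
elif [ -n "$body" ]; then
  set -- "$@" --body "$body"
else
  echo "either --body or --filename must be specified" >&2
  exit 1
fi

# Dry run: print the command. Use "$@" on its own line to execute it.
echo "$@"
```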

Getting hardware specifications

The following syntax fetches a hardware specification by name from the specified project:

cpdctl dsjob get-hardware-spec --project PROJECT --name name [--file-name <output file>] [--with-metadata]
 
dsjob -get_hardware_spec -name name project

project is the name of the project that the hardware specification is fetched from. This field is mandatory.
name is the name of the queried hardware specification.
file-name is the name of the output file to which the hardware specification is written.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Listing runtime environments

The following syntax displays a list of all runtime environments in the specified project:


cpdctl dsjob lenvs --project PROJECT [--types TYPE] [--full]

dsjob -lenvs <project> <type> [-full]

project is the name of the project that contains the environments to list. This field is mandatory. A list of all the DataStage environments in the project is displayed, one per line.
type displays a list of all environments in the project for the specified type. The value must be one of notebook, wml_flow, rstudio, default_spark, remote_spark, jupyterlab, remote_yarn, datastage, profiling, modeler, or data_privacy. This field is optional.
full provides full configuration details of each environment. This field is optional.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Creating runtime environments

The following syntax creates a runtime environment for the specified project:


cpdctl dsjob create-env --project PROJECT [[--name name] [--display-name display name] [--types TYPE] [--location loc] [--hwspec hwspec]] [--filename file]

dsjob -create-env [[-name name] [-display-name display name] [-location loc] [-hwspec hwspec]] [-filename file] project

project is the name of the project that the environment is created for. This field is mandatory.
name is the name of the environment being created. Used when filename is not specified.
display-name is the long name of the environment being created. Used when filename is not specified.
type is the type of environment to create, for example: datastage. Used when filename is not specified.
location, if specified, is the JSON-formatted location information that is needed to access the environment. Used when filename is not specified.
hwspec is the name of the hardware specification used to create the environment. Used when filename is not specified.
filename is the name of the file that contains the hardware specification, location and other attributes. When specified, all other options are ignored.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Getting runtime environments

The following syntax fetches a runtime environment by name from the specified project:

cpdctl dsjob get-env --project PROJECT --name name [--file-name <output file>] [--with-metadata]
 
dsjob -get_env -name name project

project is the name of the project that the environment is fetched from. This field is mandatory.
name is the name of the queried environment.
file-name is the name of the output file to which the environment is written.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Listing connections

The following syntax displays a list of all connections in the specified project:

cpdctl dsjob list-connections --project PROJECT
dsjob -lconns <project>

project is the name of the project that contains the connections to list. This field is mandatory. A list of all the DataStage connections in the project is displayed, one per line.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Creating connections

The following syntax creates a connection in the specified project:

cpdctl dsjob create-connection --project PROJECT --name name [--description description] [--datasource-type TYPE] [--country country] [--property-file filename]
dsjob -create_connection [[-name name] [-description description]] [-filename file] project

project is the name of the project that the connection is created for. This field is mandatory.
name is the name of the connection being created.
description is the description of the connection being created. This field is optional.
datasource-type is the data source type for the connection, for example: 971223d3-093e-4957-8af9-a83181ee9dd9.
country is the country of origin for the connection. The default is "us".
property-file is the name of the file that contains the connection properties. This field must be specified.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Getting connections

The following syntax fetches a connection by name from the specified project:

cpdctl dsjob get-connection --project PROJECT --name name [--output file] [--file-name <name>] [--with-metadata]
dsjob -get_conn -name name project

project is the project that the connection is being fetched from. This field is mandatory.
name is the name of the queried connection.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Deleting connections

The following syntax deletes a connection by name from the specified project:

cpdctl dsjob delete-connection --project PROJECT --name name 
dsjob -delete_connection -name name project

project is the name of the project that the connection is deleted from. This field is mandatory.
name is the name of the connection.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Listing imports

The following syntax displays a list of all imports into the specified project:

cpdctl dsjob list-imports --project PROJECT
dsjob limps project

project is the name of the specified project. This field is mandatory.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Importing

The following syntax imports assets from a file into the specified project:

cpdctl dsjob import --project PROJECT --import-file filename
dsjob import --import-file file project

project is the name of the specified project. This field is mandatory.
import-file is the name of the file that contains previously exported assets.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Exporting

The following syntax exports the specified project to a file:

cpdctl dsjob export --project PROJECT [--name name] [--description descr] [--export-file filename] [--wait secs] [--asset-type <type>] [--asset <name,type>] [--all]
dsjob export -name name [-export-file file] [-asset-type type] [-description description] project

project is the name of the specified project. This field is mandatory.
asset list is a list of all the asset names to be exported. Format: --asset type=assetname1,assetname2.
name is the name of the export.
asset-type is a list of all asset types to export, ex: --asset-type Connection --asset-type data_flow.
description is a description of the exported assets.
export-file is the file for assets to be exported to.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.
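
The asset-type option can be repeated, so an export of several asset types can be assembled in a loop. In the following sketch, the project name, export name, and export file name (including its .zip extension) are hypothetical; the two asset types are the ones used in the asset-type example. The command is printed as a dry run.

```shell
#!/bin/sh
# Hypothetical project, export name, and output file.
project="demo_project"
export_name="nightly_export"
export_file="nightly_export.zip"

set -- cpdctl dsjob export --project "$project" --name "$export_name" \
  --export-file "$export_file"

# Append one --asset-type flag per asset type to export.
for asset_type in Connection data_flow; do
  set -- "$@" --asset-type "$asset_type"
done

# Dry run: print the command. Use "$@" on its own line to execute it.
echo "$@"
```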

Listing exports

The following syntax displays a list of all exports from the specified project:

cpdctl dsjob list-exports --project PROJECT
dsjob project

project is the name of the specified project. This field is mandatory.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Saving exports

The following syntax saves an export to a file.

cpdctl dsjob save-export --project PROJECT [--name name] --export-file filename
dsjob -name export --export-file file project

project is the name of the specified project. This field is mandatory.
name is the name of the export.
export-file is the name of the file that the export is saved to.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Deleting exports

The following syntax deletes an export from the specified project:

cpdctl dsjob delete-export --project PROJECT [--name name]
dsjob delete-export -name name project

project is the name of the specified project. This field is mandatory.
name is the name of the export.

A status code is printed to the output. A status code of 0 indicates successful completion of the command.

Printing versions

The following command prints all the versions of the DataStage components that are installed in the cluster.

cpdctl dsjob version 
dsjob version

For installation, configuration, available commands, supported outputs, and usage scenarios, refer to github.com/IBM/cpdctl.

For detailed CPDCTL command reference, see CPDCTL command reference.

For detailed information about installing, configuring, and using the DataStage jobs command-line interface, see https://github.com/IBM/cpdctl/releases/.