Defining custom stages in DataStage

You can define a custom stage type.

Procedure

  1. On the Assets tab of your project, click New asset + > DataStage component > Custom stage.
  2. Specify a name for the custom stage, an optional description, and an operator name. The operator name is the name by which the stage is known to DataStage®. Avoid using the same name as an existing stage.
    You can also upload a custom operator .so file from this page to define the stage configuration.
  3. Click Create.
  4. Complete the fields on the General page:
    • Mapping. Choose whether the stage has a Mapping tab. A Mapping tab lets the user of the stage specify how output columns are derived from the data that the stage produces. Choose None to specify that output mapping is not performed, or choose Default to accept the default setting that DataStage uses.
    • Description. Optionally enter a description of the stage.
    • Execution mode. Choose the execution mode. This is the mode that will appear in the Advanced section in the stage editor. You can override this mode for individual instances of the stage as required, unless you select Parallel only or Sequential only.
    • Preserve partitioning. Choose the default setting of the Preserve partitioning flag. This is the setting that will appear in the Advanced section in the stage editor. You can override this setting for individual instances of the stage as required.
    • Partitioning. Choose the default partitioning method for the stage. This method will appear in the Inputs page Partitioning section of the stage editor. You can override this method for individual instances of the stage as required.
    • Collecting. Choose the default collection method for the stage. This method will appear in the Inputs page Partitioning section of the stage editor. You can override this method for individual instances of the stage as required.
    • Operator. Enter the name of the orchestration operator that you want the stage to invoke.
  5. Go to the Links page and specify information about the links that are allowed to and from the stage you are defining.

    Specify the minimum and maximum number of input and output links that your custom stage can have.

  6. Go to the Properties page. Specify the options that the orchestration operator requires as properties that appear in the Stage Properties tab. For custom stages, the Properties tab always appears under the Stage page.
  7. Complete the fields as follows:
    • Name. The name of the property.
    • Data type. The data type of the property. Choose from:
      • Boolean
      • Float
      • Integer
      • String
      • Pathname
      • List
      • Input Column
      • Output Column

      If you choose Input Column or Output Column, when the stage is included in a job a drop-down list will offer a choice of the defined input or output columns.

    • Prompt. The name of the property that will be displayed on the Properties tab of the stage editor.
    • Default Value. The value the option will take if no other is specified.
    • Required. Set to True if the property is mandatory.
    • Repeats. Set to True if the property repeats (that is, you can have multiple instances of it).
    • Use Quoting. Specify whether the property will have quotes added when it is passed to the orchestration operator.
    • Conversion. Specifies the type of property as follows:

      None. Allows the creation of properties that do not generate any osh, but can be used for conditions on other properties (for example, for use in a situation where you have mutually exclusive properties, but at least one of them must be specified).

      -Name. The name of the property will be passed to the operator as the option value. This will normally be a hidden property, that is, not visible in the stage editor.

      -Name Value. The name of the property will be passed to the operator as the option name, and any value that is specified in the stage editor is passed as the value.

      -Value. The value for the property that is specified in the stage editor is passed to the operator as the option name. Typically used to group operator options that are mutually exclusive.

      Value only. The value for the property that is specified in the stage editor is passed as it is.

      Input Schema. Specifies that the property will contain a schema string whose contents are populated from the Input page Columns tab.

      Output Schema. Specifies that the property will contain a schema string whose contents are populated from the Output page Columns tab.
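
As a rough illustration of the conversion types above, the following Python sketch shows how a property's name and value might be turned into operator option tokens. It is not DataStage code: the function name render_option, the token layout, and the single-quote style used for Use Quoting are assumptions made for the example. In particular, it assumes that -Name emits only the dash-prefixed property name. The Input Schema and Output Schema conversions are left out.

```python
def render_option(name, conversion, value=None, use_quoting=False):
    """Return the option tokens one property contributes (illustrative only).

    The conversion names mirror the list above; the exact text that
    DataStage generates is an assumption, not taken from the product.
    """
    def quote(v):
        # Assumption: Use Quoting wraps the value in single quotes.
        return f"'{v}'" if use_quoting else str(v)

    if conversion == "None":
        # The property generates nothing; it exists only for conditions.
        return []
    if conversion == "-Name":
        # Only the property name is passed, as a dash-prefixed flag.
        return ["-" + name]
    if conversion == "-Name Value":
        # The property name becomes the option name, followed by the value.
        return ["-" + name, quote(value)]
    if conversion == "-Value":
        # The value chosen in the stage editor becomes the option name.
        return ["-" + str(value)]
    if conversion == "Value only":
        # The value is passed through as it is.
        return [quote(value)]
    raise ValueError(f"conversion not covered by this sketch: {conversion}")


# Hypothetical properties for an imaginary operator.
tokens = (
    render_option("key", "-Name Value", "customer_id")
    + render_option("order", "-Value", "ascending")
    + render_option("label", "Value only", "daily load", use_quoting=True)
)
print(" ".join(tokens))
```

The -Value case shows why that conversion suits mutually exclusive options: a single property offering the values ascending and descending yields exactly one of the two flags.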

    • Schema properties require format options. Select this checkbox to specify that the stage that is being specified will have a Format tab.
      If you select a conversion type of Input Schema or Output Schema, note the following:

    • Data Type is set to String.
    • Required is set to Yes.
    • The property is marked as hidden and will not appear on the Properties page when the custom stage is used in a job design.

      If your stage can have multiple input or output links, there is one Input Schema property or Output Schema property per link.

      When the stage is used in a job design, the property will contain the following OSH for each input or output link:

      
      -property_name record {format_properties} ( column_definition {format_properties}; ...)
      

      Where:

    • property_name is the name of the property (usually "schema").
    • format_properties is formatting information that is supplied on the Format page (if the stage has one).
    • there is one column_definition for each column that is defined in the Columns tab for that link. The format_properties in this case refers to per-column format information specified in the Edit Column metadata dialog box.

      Schema properties are mutually exclusive with schema file properties. If your custom stage supports both, you should use the Extended Properties dialog box to specify a condition of "schemafile= " for the schema property. The schema property is then only valid provided the schema file property is blank (or does not exist).
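
To make the schema property layout above more concrete, here is a minimal Python sketch that assembles a string of the form -property_name record {format_properties} ( column_definition {format_properties}; ...). The function name schema_option, the name: type spelling of a column definition, and the sample format strings are assumptions for illustration. The OSH that DataStage actually generates may differ in detail.

```python
def schema_option(property_name, columns, record_format=""):
    """Build a schema option string following the pattern described above.

    columns is a list of (name, type, column_format) tuples; an empty
    column_format means that the column has no per-column format
    information. The 'name: type' layout is an assumption of this sketch.
    """
    definitions = []
    for name, type_, column_format in columns:
        definition = f"{name}: {type_}"
        if column_format:
            # Per-column format information goes in braces after the type.
            definition += f" {{{column_format}}}"
        definitions.append(definition + ";")

    record = "record"
    if record_format:
        # Record-level format information from the Format page, if any.
        record += f" {{{record_format}}}"

    return f"-{property_name} {record} ( {' '.join(definitions)} )"


# Hypothetical link with two columns and placeholder format strings.
example = schema_option(
    "schema",
    [("id", "int32", ""), ("name", "string", "column_fmt")],
    record_format="record_fmt",
)
print(example)
```

For a stage with several input or output links, a function like this would be called once per link, which matches the one-property-per-link behavior described above.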

  8. If your custom stage will create columns, go to the Mapping Additions page. It contains a grid that allows for the specification of columns created by the stage. You can also specify that column details are filled in from properties supplied when the stage is used in a job design, allowing for dynamic specification of columns.

    The grid contains the following fields:

    • Column name. The name of the column created by the stage. To allocate the column name dynamically, you can specify the name of a property that you defined on the Properties page. Specify this in the form #property_name#. The created column then takes the value of this property, as specified at design time, as its name.
    • Parallel type. The type of the column (this is the underlying data type, not the SQL data type). To allocate the column type dynamically, you can again specify the name of a property that you defined on the Properties page. Specify this in the form #property_name#. The created column then takes the value of this property, as specified at design time, as its type. (Note that you cannot use a repeatable property to dynamically allocate a column type in this way.)
    • Nullable. Choose Yes or No to indicate whether the created column can contain a null.
    • Conditions. Allows you to enter an expression specifying the conditions under which the column will be created. This could, for example, depend on the setting of one of the properties specified on the Properties page.

      You can propagate the values of the Conditions fields to other columns if required. Do this by selecting the columns you want to propagate to, then right-clicking in the source Conditions field and choosing Propagate from the shortcut menu. A dialog box asks you to confirm that you want to propagate the conditions to all columns.
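
The #property_name# placeholders used in the Column name and Parallel type fields can be pictured as simple text substitution. The Python sketch below is only an analogy for that behavior, not how DataStage implements it. The names resolve and created_column, the dictionary layout, and the sample property names are all hypothetical.

```python
import re


def resolve(template, properties):
    """Replace each #property_name# in template with that property's value."""
    def lookup(match):
        key = match.group(1)
        if key not in properties:
            raise KeyError(f"no property named {key!r}")
        return str(properties[key])

    # Text without any #...# placeholder is returned unchanged.
    return re.sub(r"#([^#]+)#", lookup, template)


def created_column(row, properties):
    """Resolve one grid row into a concrete column description."""
    return {
        "name": resolve(row["name"], properties),
        "type": resolve(row["type"], properties),
        "nullable": row["nullable"] == "Yes",
    }


# Hypothetical grid row and design-time property values.
row = {"name": "#out_col#", "type": "#out_type#", "nullable": "No"}
props = {"out_col": "total", "out_type": "int32"}
print(created_column(row, props))
```

A row that uses fixed text, such as a literal column name, passes through resolve unchanged, so dynamic and static columns can sit side by side in the same grid.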

  9. Click Save when you are done with your custom stage definition.