Step 2: Configuring the Unstructured Data stage

Configure the Unstructured Data stage to extract the data from the multiple Microsoft Excel sheets.

Procedure

  1. Double-click the Unstructured Data stage to open the stage properties.
  2. Click Configure.
    Note: Do not configure any stage properties in this step because you can configure all the required configurations in the Configuration window.
  3. In the Configuration window, specify the full file path of the Microsoft Excel input file Employee2.xls.
  4. From the Range option, select Specify the start row.
  5. In the Range expression field, specify A3:H3 . When the stage runs with Specify the first row option and no specific sheet name is specified in the Range expression, the job finds the last row dynamically and extracts rows to the last row at runtime.
  6. In Column header, select First row of data ranges.
  7. Click Load. The Excel columns that exist in the specified data range are listed in the Import pane.
  8. On the Property tab, select the checkbox next to the property, to extract the property value. In this example, select the Sheetname as the property.
  9. Click Import. The column mappings are generated by the stage.
  10. To make the SheetNname column the first column in the list:
    1. Select the SheetName column.
    2. Click Up until the SheetName column is the first column in the list.
  11. In the mapping table, insert a row for ADDRESS column in the input file that has hyperlink.
    1. Click Insert.
    2. In the Excel item option, select Column ADDRESS.
    3. In the Import option cell in the new row, select Hyper link address.
    4. Specify the DataStage column name EMAIL_ADDRESS for the new row.
  12. Click OK.
  13. Confirm that the values that you entered on the Configuration window are saved on the Property tab of the stage editor.
  14. Click Output > Column tab to change the data type or other attributes. Change the type of EMP_NO column to Integer.
  15. Click OK.