Creating batch jobs using job definition files

InfoSphere® MDM includes job definition template files for you to use as a basis for creating new batch job definition files.

About this task

The batch job definition templates are located in the $home/templates/jobs folder. They cover a range of common MDM batch processing tasks:
  • CreateSuspect.xml
  • LinkOnly_csv.xml
  • LinkOnly_db1.xml
  • PersistEntity_db1.xml
  • PersistEntity_db2.xml
  • PersistEntity_csv1.xml
  • PersistEntity_csv2.xml
  • PersistEntity_csv3.xml
  • StandardizeAddress.xml
  • StandardizeContactMethod.xml
  • StandardizeOrganizationName.xml
  • StandardizePersonName.xml
  • StandardizeOrganization.xml
  • StandardizePerson.xml
  • CollapseSuspect.xml

Procedure

  1. Select the batch job definition template that provides the functionality you need.
  2. Make a copy of the definition template file.
  3. Modify the first CommentText element in the copied job definition file so that it defines the job.

    You can use predefined placeholders in job definition files to stand in for different pieces of information. The batch processor replaces these placeholders automatically when it creates the batch job.

    Important: Some of the placeholders are mandatory. If the mandatory placeholders are not present in the job definition files, the batch jobs will not be created.
    Table 1. Batch job definition placeholders

    Placeholder | Mandatory or Optional | Description
    <<requesterName>> | Optional | The requester name for the addTask transaction. This value is replaced with the value of the job.requesterName property in the Batch.properties file.
    <<requesterTimeZone>> | Optional | The requester time zone for the task management transactions. When the Multi Time Zone (MTZ) feature is enabled in the physical InfoSphere MDM hub, this value is replaced with the time zone in which the batch processor is running.
    <<ProcessId>> | Mandatory | The process ID of the job or job chain. This value is replaced with a dynamically generated ID. The same process ID is used for all of the jobs in the same job chain.
    <<PriorityType>> | Optional | The priority of the jobs. This value is replaced with a value in the range defined by priority_min and priority_max in the Batch.properties file. When a task chain is created in the batch processor, the first task takes the value of priority_min, the second task takes priority_min plus one, the third takes priority_min plus two, and so on, until the <<PriorityType>> value reaches the value defined in the priority_max property. Restriction: If the chain contains more tasks than there are available priority values, task creation fails.
    <<TaskDueDate>> | Optional | The date that the job tasks are due. This value is replaced with the current timestamp.
    <<Name>> | Optional | The name of the job. This value is replaced with the name supplied with the -name parameter when the runbatch command is invoked. If no name is provided, a dynamically generated name is used.
    <<BatchInstance>> | Mandatory | The batch instance name on which the job should be run. This value is replaced with the name of the batch instance that creates the job.
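
The way <<PriorityType>> values are handed out across a task chain can be illustrated with a short Python sketch. This is not the batch processor's own code: the function name assign_priorities is hypothetical, and the sketch assumes that the range from priority_min to priority_max is inclusive at both ends.

```python
def assign_priorities(task_count, priority_min, priority_max):
    """Return one priority value per task in a chain, starting at priority_min."""
    # Assumption: priority_min and priority_max are both usable values.
    available = priority_max - priority_min + 1
    if task_count > available:
        # Mirrors the documented restriction: more tasks than priority
        # values means the tasks cannot be created.
        raise ValueError(
            f"{task_count} tasks in the chain, but only "
            f"{available} priority values are available"
        )
    # First task gets priority_min, the next priority_min + 1, and so on.
    return [priority_min + offset for offset in range(task_count)]


print(assign_priorities(3, 1, 5))
```

Under these assumptions, a chain of three tasks with priority_min=1 and priority_max=5 receives the priorities 1, 2, and 3, while a chain of six tasks raises an error because only five values exist.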

Example

Example of the StandardizeAddress job template:

<?xml version="1.0" encoding="UTF-8"?>
<TCRMService xmlns="http://www.ibm.com/mdm/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ibm.com/mdm/schema MDMDomains.xsd">
  <RequestControl>
    <requestID>1</requestID>
    <DWLControl>
      <requesterName><<requesterName>></requesterName>
      <requesterLocale>en</requesterLocale>
      <requesterTimeZone><<requesterTimeZone>></requesterTimeZone>
    </DWLControl>
  </RequestControl>
  <TCRMTx>
    <TCRMTxType>addTask</TCRMTxType>
    <TCRMTxObject>TaskBObj</TCRMTxObject>
    <TCRMObject>
      <TaskBObj>
        <TaskDefinitionId>30</TaskDefinitionId>
        <TaskCatType>8</TaskCatType>
        <PriorityType><<PriorityType>></PriorityType>
        <TaskOwnerRole>Bulk Processing</TaskOwnerRole>
        <TaskDueDate><<TaskDueDate>></TaskDueDate>
        <ProcessId><<ProcessId>></ProcessId>
        <TaskActionType>1</TaskActionType>
        <TaskCommentBObj>
        <!-- The definition comment provided below is a sample. It should be adjusted to fit the requirements -->
          <CommentText>
            <![CDATA[
            <SQLOverride>SELECT DISTINCT ADDRESS_ID FROM ADDRESS WHERE (ADDRESS.ADDR_STANDARD_IND IS NULL OR ADDR_STANDARD_IND <> 'Y') AND (ADDRESS.OVERRIDE_IND IS NULL OR OVERRIDE_IND <> 'Y')</SQLOverride>
            ]]>
          </CommentText>
        </TaskCommentBObj>
        <TaskCommentBObj>
          <CommentText>
            <![CDATA[
              <BatchInstance><<BatchInstance>></BatchInstance>
            ]]>
          </CommentText>
        </TaskCommentBObj>
        <WorkbasketBObj>
          <Name><<Name>></Name>
        </WorkbasketBObj>
      </TaskBObj>
    </TCRMObject>
  </TCRMTx>
</TCRMService>
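
Before handing a copied template to the batch processor, it may be useful to confirm that the mandatory placeholders are still present. The following Python sketch shows one possible check, plus a simple substitution that imitates what the batch processor does when it creates the job. The helper names (missing_mandatory, fill_placeholders) and the sample value 12345 are hypothetical; the real replacement is performed by the batch processor itself. The template is scanned as plain text because it is not well-formed XML until the placeholders have been replaced.

```python
import re

# Mandatory placeholders, as listed in Table 1.
MANDATORY = ("ProcessId", "BatchInstance")
# Matches placeholders of the form <<Name>> and captures the name.
PLACEHOLDER = re.compile(r"<<(\w+)>>")


def missing_mandatory(template_text):
    """Return the mandatory placeholder names that the template lacks."""
    present = set(PLACEHOLDER.findall(template_text))
    return [name for name in MANDATORY if name not in present]


def fill_placeholders(template_text, values):
    """Replace each <<name>> that has a value; leave the others untouched."""
    return PLACEHOLDER.sub(
        lambda match: str(values.get(match.group(1), match.group(0))),
        template_text,
    )


# A deliberately incomplete fragment: <<BatchInstance>> is absent.
sample = "<ProcessId><<ProcessId>></ProcessId><Name><<Name>></Name>"
print(missing_mandatory(sample))
print(fill_placeholders(sample, {"ProcessId": "12345"}))
```

For the fragment above, the check reports BatchInstance as missing, and the substitution fills in only <<ProcessId>>, leaving <<Name>> in place because no value was supplied for it.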