Sample steps to ingest WITSML files (Open Data for Industries)

Convert WITSML data to Manifest data and ingest it on the Open Data for Industries storage layer.

Introduction to WITSML data format

The Wellsite Information Transfer Standard Markup Language (WITSML™) is an industry initiative to provide nonproprietary standard interfaces for systems that exchange well-related data. WITSML is developed by Energistics, a third-party industry consortium, and is built on the W3C Internet standards for XML (notably XML Schema) and Web Services, including SOAP and WSDL.

The WITSML format is widely adopted in the industry to store and manage data. Open Data for Industries offers a solution to convert such legacy WITSML files into Manifest files. You can then ingest the Manifest files and use the legacy data on Open Data for Industries.

WITSML parser and ingestion

Open Data for Industries data objects and schemas are JSON documents.

The WITSML data objects are XML documents, whose structures are defined by schemas in XSD file format.

The WITSML parser converts WITSML XML data, together with the corresponding XSD schema definitions, into JSON documents that the Open Data for Industries Manifest JSON format supports.
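
The following Python sketch illustrates the general idea of turning an XML document into a JSON document. It is not the Open Data for Industries WITSML parser: the sample XML is a hypothetical, heavily simplified fragment, the helper names (strip_namespace, element_to_json) are invented for illustration, and the output is plain JSON rather than a valid Manifest.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified WITSML-like fragment (not a complete WITSML object).
SAMPLE_XML = """<wells xmlns="http://www.witsml.org/schemas/1series" version="1.4.1.1">
  <well uid="well-001">
    <name>Example Well</name>
    <country>US</country>
  </well>
</wells>"""


def strip_namespace(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def element_to_json(element):
    """Recursively convert an XML element into JSON-serializable data."""
    children = list(element)
    text = (element.text or "").strip()
    # A leaf element without attributes becomes a plain string.
    if not children and not element.attrib:
        return text
    node = dict(element.attrib)  # XML attributes become plain keys
    if text:
        node["value"] = text
    for child in children:
        key = strip_namespace(child.tag)
        value = element_to_json(child)
        if key in node:
            # Repeated child elements are collected into a list.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    return node


root = ET.fromstring(SAMPLE_XML)
document = {strip_namespace(root.tag): element_to_json(root)}

if __name__ == "__main__":
    print(json.dumps(document, indent=2))
```

The actual parser additionally maps WITSML objects onto Manifest schemas, which this sketch does not attempt.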

When the conversion to JSON format is completed, the WITSML DAG proceeds with the ingestion of the JSON documents.

WITSML DAG processing tasks

  1. The DAG is triggered by the Workflow service.
  2. The DAG gets the WITSML file, parses it, and creates a Manifest file.
  3. The DAG validates the Manifest file and writes it into the Storage service.

update_status_running_task
This task receives a Workflow ID and changes the Workflow state from "submitted" to "running".
witsml_parser_task
This task receives a preloaded file path value and runs as a KubernetesPodOperator. It parses the WITSML file into a Manifest JSON file and passes it to the next task as an object.
validate_manifest_schema_task
This task validates every record in the Manifest file from the previous task against its schema.
provide_manifest_integrity_task
This task ensures the integrity of the data by checking whether referenced entities exist either in the Manifest file or on Open Data for Industries.
process_single_manifest_file_task
This task prepares and stores every entity on the Storage service.
update_status_finished_task
This task changes the Workflow state from "running" to either "finished" or "failed". If any previous task failed, the DAG is marked as "failed" and the status "failed" is sent to the Workflow service. Otherwise, the DAG is marked as "successful" and the status "finished" is sent to the Workflow service.
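
The final status rule described for update_status_finished_task can be sketched in Python as follows. This is an illustration only, not the DAG's source code: the function name final_workflow_status and the per-task state strings "success" and "failed" are assumptions; only the task names and the "finished"/"failed" Workflow statuses come from the descriptions above.

```python
# Task names in execution order, as listed in this section.
TASK_ORDER = [
    "update_status_running_task",
    "witsml_parser_task",
    "validate_manifest_schema_task",
    "provide_manifest_integrity_task",
    "process_single_manifest_file_task",
    "update_status_finished_task",
]


def final_workflow_status(task_states):
    """Return the status to report to the Workflow service.

    task_states maps each upstream task name to its state string
    (assumed here to be "success" or "failed").
    """
    upstream = TASK_ORDER[:-1]  # every task before update_status_finished_task
    if any(task_states.get(name) == "failed" for name in upstream):
        return "failed"
    return "finished"


if __name__ == "__main__":
    states = {name: "success" for name in TASK_ORDER[:-1]}
    print(final_workflow_status(states))
    states["validate_manifest_schema_task"] = "failed"
    print(final_workflow_status(states))
```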

Ingestion process for WITSML DAG

  1. The process starts with the deployment and configuration of a WITSML DAG (Energistics_xml_ingest) on a third-party workflow management platform. For more information, see Workflow DAGs installation and configuration (Open Data for Industries).
    Note: Apache Airflow is used as a workflow management platform.
  2. Upload the WITSML XML document to the Open Data for Industries raw storage bucket.
    Note: All the given commands are sample commands to demonstrate the syntax. Modify them according to your Open Data for Industries installation.
    • Get the pre-signed URL:
      curl --location --request GET 'https://{{ODI-Installation-URL}}/osdu-file/api/file/v2/files/uploadURL' \
      --header 'data-partition-id: opendes' \
      --header 'Content-Type: application/json' \
      --header 'Authorization: Bearer {{access_token}}'
      
    • Upload the WITSML XML file:
      curl --location --request PUT '{{signed_url}}' \
      --header 'Content-Type: text/xml' \
      --data-binary '@{{local_witsml_file}}'
      
  3. Register the WITSML DAG on Open Data for Industries:
    curl --location --request POST 'https://{{ODI-Installation-URL}}/osdu-workflow/api/workflow/v1/workflow' \
    --header 'data-partition-id: opendes' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer {{access_token}}' \
    --data-raw '{
      "description": "This is witsml parser and ingestion DAG",
      "registrationInstructions": {},
      "workflowName": "Energistics_xml_ingest" 
    }'
    
  4. When the DAG is registered, use the Workflow service endpoints to trigger the DAG.
    Scope                     Property           Type    Description
    executionContext Payload  AppKey             String  Any unique identifier.
    executionContext Payload  data-partition-id  String  Tenant identifier.
    Context                   legal              String  Legal tag. A JSON array.
    Context                   acl                String  Access control list JSON array.
    Context                   kind               String  Schema identifier for data definition of the uploaded file.
    Context                   file_name          String  File identifier generated as part of the signed URL from Cloud Object Storage.
    Context                   preload_file_path  String  s3://oc-cpd-opendes-staging-bucket/{{fileSource}}
    Context                   version            String  Defaults to 1.
    To trigger the DAG on Apache Airflow:
    curl --location --request POST 'https://{{ODI-Installation-URL}}/osdu-workflow/api/workflow/v1/workflow/Energistics_xml_ingest/workflowRun' \
    --header 'Content-Type: application/json' \
    --header 'data-partition-id: opendes' \
    --header 'Authorization: Bearer {{access_token}}' \
    --data-raw '{
        "executionContext": {
            "Payload": {
                "AppKey": "test-app",
                "data-partition-id": "{{data_partition_id}}"
            },
            "Context": {
                "legal": {
                    "legaltags": [
                        "{{legal_tag}}"
                    ],
                    "otherRelevantDataCountries": [
                        "US"
                    ],
                    "status": "compliant"
                },
                "acl": {
                    "viewers": [
                        "{{DATA_VIEWERS_GROUP}}"
                    ],
                    "owners": [
                        "{{DATA_OWNERS_GROUP}}"
                    ]
                },
                "kind": "osdu:wks:dataset--File.Generic:1.0.0",
                "file_name": "f4aa44297dfc4c50b704df5ba5e21063",
                "preload_file_path": "s3://oc-cpd-opendes-staging-bucket/f4aa44297dfc4c50b704df5ba5e21063",
                "version": 1
            }
        }
    }'
    
    A successful trigger returns the following response:
    {
        "workflowId": "787b1acc8432474187a5dfd11aa1150b",
        "runId": "test_workflow388906",
        "startTimeStamp": 1624223946915,
        "status": "submitted",
        "submittedBy": "postman@osdu.opengroup.com"
    }
    
  5. Validate the WITSML ingestion.

    The WITSML ingestion process uses the Open Data for Industries core services to ingest the data into the storage layer.

    To verify the ingested data for a particular schema, which is identified by the ‘kind’ parameter, call the following Storage service endpoint:

    curl --location --request GET 'https://{{ODI-Installation-URL}}/osdu-storage/api/storage/v2/query/records?kind={{kind}}' \
    --header 'data-partition-id: opendes' \
    --header 'Accept: application/json' \
    --header 'Authorization: Bearer {{access_token}}'
    

    If the ingestion succeeded, the response lists the records that were created.
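
The trigger request from step 4 can also be assembled programmatically. The Python sketch below builds the same workflowRun request with the standard library and does not send it. The function name build_trigger_request and its parameters are hypothetical; the endpoint path, the payload fields, and the sample bucket name are copied from the curl sample above and must be adapted to your installation. The sketch also assumes that the data-partition-id header and the payload field carry the same value.

```python
import json
import urllib.request

WORKFLOW_NAME = "Energistics_xml_ingest"
STAGING_BUCKET = "oc-cpd-opendes-staging-bucket"  # sample bucket from step 4


def build_trigger_request(base_url, access_token, data_partition_id,
                          legal_tag, viewers_group, owners_group, file_name):
    """Build (but do not send) the POST request that triggers the WITSML DAG."""
    payload = {
        "executionContext": {
            "Payload": {
                "AppKey": "test-app",
                "data-partition-id": data_partition_id,
            },
            "Context": {
                "legal": {
                    "legaltags": [legal_tag],
                    "otherRelevantDataCountries": ["US"],
                    "status": "compliant",
                },
                "acl": {
                    "viewers": [viewers_group],
                    "owners": [owners_group],
                },
                "kind": "osdu:wks:dataset--File.Generic:1.0.0",
                "file_name": file_name,
                "preload_file_path": f"s3://{STAGING_BUCKET}/{file_name}",
                "version": 1,
            },
        }
    }
    url = (f"{base_url.rstrip('/')}/osdu-workflow/api/workflow/v1/workflow/"
           f"{WORKFLOW_NAME}/workflowRun")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "data-partition-id": data_partition_id,  # assumed equal to the payload value
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_trigger_request(
        "https://odi.example.com", "my-token", "opendes", "my-legal-tag",
        "viewers@example.com", "owners@example.com",
        "f4aa44297dfc4c50b704df5ba5e21063",
    )
    print(request.get_method(), request.full_url)
```

To send the request, pass the returned object to urllib.request.urlopen and parse the JSON response, which should contain the workflowId, runId, and status fields shown in step 4.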