DataStage Flow Execution using API — DSaaS and CICD Pipeline

Ritesh Kumar Gupta
Aug 2, 2021

DataStage flows can be executed through the provided REST APIs, which lets you build your own CICD pipeline around them. I am using cURL to demonstrate the execution process; you can use alternate tooling to perform the same steps. These steps can also be used to run any other job on IBM Data Platform.

Step 1: Create and compile a DataStage flow (steps here)

Step 2: Retrieve Authentication Token as specified here
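The token retrieval can be scripted in the pipeline. The sketch below is illustrative only: the network call is shown commented out, and the sample response body (including the token value) is fabricated so the extraction step is self-contained.

```shell
#!/bin/sh
# Illustrative only: the real call exchanges an IBM Cloud API key for a bearer token.
# curl -X POST 'https://iam.cloud.ibm.com/identity/token' \
#   -H 'Content-Type: application/x-www-form-urlencoded' \
#   -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=<API_KEY>'

# Sample response body (shape only; the token value is fake).
RESPONSE='{"access_token": "eyJhbGciOi.sample.token", "token_type": "Bearer", "expires_in": 3600}'

# Extract the bearer token for use in later Authorization headers.
TOKEN=$(printf '%s' "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["access_token"])')
echo "$TOKEN"
```

In a real pipeline you would export TOKEN and reuse it in every subsequent request's Authorization header.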

Step 3: Retrieve the project ID, job ID, asset ID (flow ID), and URL. You can click the Assets tab under the selected project to retrieve the required information.

URL : https://api.dataplatform.cloud.ibm.com/

projectID : b3a70fee-3145-457e-897b-85f69aede6f2 [see the address bar for the BLOGSAMPLE project ID as selected in the image below, or use the API (here) to list the projects and select the project_id]

Flow ID / asset ID : ebb034a8-eee3-45da-abef-0db8f6a1c31c. This is the ID of the flow you intend to submit for execution [see the address bar for the flow/asset ID].

Step 4: Create Data Platform Job as defined here.

You need to create a Data Platform job from the data flow in a specific project; it is this job that ultimately gets executed. To create the job you need the project_id and flow_id documented in the previous step. You can also use Swagger. The steps are the same for executing platform flows in general, since they all leverage platform jobs.

cURL command to create the Platform Job

curl --location --request POST \
  'https://api.dataplatform.cloud.ibm.com/v2/jobs?project_id=b3a70fee-3145-457e-897b-85f69aede6f2' \
  --header 'Authorization: Bearer <TOKEN>' \
  --header 'Content-Type: application/json' \
  --data '{
    "job": {
      "asset_ref": "0f44e423-59cd-4031-a4c6-712c687a5109",
      "name": "DataStageFlowSwitch",
      "description": "Create Platform Job for DataStage Flow Switch",
      "configuration": {
        "env_variables": [],
        "flow_limits": {"warn_limit": 100},
        "job_parameters": []
      }
    }
  }'

After a successful call to the job-creation API you should get back a job ID, which is required for execution. It is part of the href / asset_id in the job-creation response. You can also see your jobs on the Jobs dashboard.
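Pulling the job ID out of the response can be automated for a pipeline. A minimal sketch, assuming the create-job response carries the ID in metadata.asset_id; the sample body below is shortened and hypothetical:

```shell
#!/bin/sh
# Shortened, hypothetical sample of a create-job response; the ID is a placeholder.
RESPONSE='{"metadata": {"asset_id": "813d6d48-d50a-4048-b2d3-1c1d7027aa12", "name": "DataStageFlowSwitch"}}'

# Extract the job ID needed to submit a run in Step 5.
JOB_ID=$(printf '%s' "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["metadata"]["asset_id"])')
echo "$JOB_ID"
```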

Response for Job Creation and Retrieve Job ID

Step 5: Execute the job as documented here. Apart from the project_id, you only require the job_id created in Step 4 (or a previously created job_id).

curl -X 'POST' \
  'https://api.dataplatform.cloud.ibm.com/v2/jobs/813d6d48-d50a-4048-b2d3-1c1d7027aa12/runs?project_id=b3a70fee-3145-457e-897b-85f69aede6f2' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{"job_run": {"configuration": {}}}'
Response for Submitting Platform Job for Execution

The response provides a runtime_job_id, which can be used to retrieve status or logs. When execution is triggered, the initial state returned is "state": "Starting".
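A pipeline will want both the runtime_job_id and the initial state out of this response. A sketch, assuming the run ID sits in metadata.asset_id and the state in entity.job_run.state; the sample body is shortened and hypothetical:

```shell
#!/bin/sh
# Shortened, hypothetical sample of a job-run submission response.
RESPONSE='{"metadata": {"asset_id": "0996009c-f34c-4603-ba39-5ecc61187270"}, "entity": {"job_run": {"state": "Starting"}}}'

# Extract the runtime_job_id for the status and logs calls in Steps 6 and 7.
RUN_ID=$(printf '%s' "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["metadata"]["asset_id"])')
STATE=$(printf '%s' "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["entity"]["job_run"]["state"])')
echo "$RUN_ID $STATE"
```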

Step 6: Check the current status of the submitted job. This requires the runtime_job_id, job_id, and project_id.

curl -X 'GET' \
'https://api.dataplatform.cloud.ibm.com/v2/jobs/813d6d48-d50a-4048-b2d3-1c1d7027aa12/runs/0996009c-f34c-4603-ba39-5ecc61187270?project_id=b3a70fee-3145-457e-897b-85f69aede6f2' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <TOKEN>'
Response for Platform Job Status Get Request

You can see metrics at the stage level, including timestamps and rows read/written on each input/output link, keyed by link name.
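In a CICD pipeline you would normally poll this status endpoint until the run reaches a terminal state. A self-contained sketch of that loop, with the real GET call replaced by a stub that walks through a fixed sequence of states (the state names other than Starting are assumptions):

```shell
#!/bin/sh
# Stub standing in for the real GET /v2/jobs/{job_id}/runs/{run_id} call.
# It walks through a fixed sequence of states to exercise the loop logic.
STATES="Starting Running Running Completed"
i=0
get_state() {
  i=$((i + 1))
  STATE=$(printf '%s' "$STATES" | cut -d' ' -f"$i")
}

# Poll until the run leaves the in-progress states.
while :; do
  get_state
  echo "current state: $STATE"
  case "$STATE" in
    Starting|Running) ;;   # still in progress; a real script would sleep between polls
    *) break ;;            # terminal state reached (e.g. Completed or Failed)
  esac
done
echo "final state: $STATE"
```

A real pipeline step would then exit non-zero for any terminal state other than a successful one, failing the build.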

Step 7: Retrieve logs for Platform job and DataStage flow executed

curl -X 'GET' \
  'https://api.dataplatform.cloud.ibm.com/v2/jobs/813d6d48-d50a-4048-b2d3-1c1d7027aa12/runs/0996009c-f34c-4603-ba39-5ecc61187270/logs?project_id=b3a70fee-3145-457e-897b-85f69aede6f2' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <TOKEN>'
Platform Job Log for a DataStage flow
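For archiving logs as pipeline artifacts, the response can be flattened into plain text. A sketch, assuming the log lines come back in a results array; the sample body and its log lines are fabricated:

```shell
#!/bin/sh
# Shortened, hypothetical sample of a logs response; real log lines differ.
RESPONSE='{"results": ["Job started", "Stage Switch: 100 rows read", "Job finished"], "total_count": 3}'

# Flatten the results array into one log line per row.
LOGS=$(printf '%s' "$RESPONSE" | python3 -c 'import json,sys
for line in json.load(sys.stdin)["results"]:
    print(line)')
echo "$LOGS"
```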

Disclaimer: “The postings on this site are my own exploration and don’t necessarily represent IBM’s positions, strategies or opinions.”


A Data and AI enthusiast and technology geek, always learning about the data we generate daily and how to process it smartly in the Hybrid Cloud.