DataStage — IBM Data Platform Job Scheduling using APIs for CI/CD pipelines

Ritesh Kumar Gupta
3 min read · Aug 14, 2021


IBM Data Platform on Hybrid Cloud is all about optimization and automation. DataStage and other components that run as platform jobs can leverage its Kubernetes-based scheduling component to execute flows repeatedly under different conditions, which is key for automation and CI/CD pipelines. The steps below can be used to schedule any platform job on IBM Data Platform, including DataStage, Data Refinery, etc.

Step 1: Create and compile a DataStage flow (steps here)

Step 2: Retrieve an authentication token as specified here
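
For reference, a minimal sketch of exchanging an IBM Cloud API key for a bearer token; the access_token field in the JSON response is the <TOKEN> used in the calls below:

# Exchange an IBM Cloud API key for a bearer token (access_token in the response)
curl -X POST 'https://iam.cloud.ibm.com/identity/token' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=<API_KEY>'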

Step 3: Retrieve the project ID, job ID, asset ID (flow ID), and URL. Click the Assets tab under the selected project to find the required information. URL: https://api.dataplatform.cloud.ibm.com/ [see the address bar for the flow/asset ID]
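
If you prefer the API to the UI, one hedged way to find the flow/asset ID is the asset search endpoint (the data_intg_flow asset type used here is an assumption to verify against the Watson Data API docs):

# Search the project for DataStage flow assets; metadata.asset_id in each hit is the flow ID
curl -X POST 'https://api.dataplatform.cloud.ibm.com/v2/asset_types/data_intg_flow/search?project_id=<Project_ID>' \
  -H 'Authorization: Bearer <TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{ "query": "*:*" }'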

Step 4: Create a Data Platform job as defined here, or use an existing job. To create a job you need the project_id and flow_id documented in the previous step. Detailed steps are part of the story here
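
As a hedged sketch (field names such as asset_ref follow the public Jobs API Swagger, but verify them against the documentation linked above), creating the job can look like this:

# Create a platform job that references the compiled DataStage flow
curl -X POST 'https://api.dataplatform.cloud.ibm.com/v2/jobs?project_id=<Project_ID>' \
  -H 'Authorization: Bearer <TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "job": {
      "name": "DataStage flow job",
      "description": "Created via API for the CI/CD pipeline",
      "asset_ref": "<Flow_ID>",
      "configuration": {}
    }
  }'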

Job Dashboard (Before Schedule Creation)

Step 5: Create a schedule

Patch the existing job with the desired schedule information as documented here and in its Swagger. You specify a crontab expression for schedule and epoch timestamps (in milliseconds) for schedule_info; the platform creates the schedule based on these values.

cURL command to patch the Platform Job with Schedule Information.

curl -X 'PATCH' 'https://api.dataplatform.cloud.ibm.com/v2/jobs/<Job_ID>?project_id=<Project_ID>' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '[
    { "op": "add", "path": "/entity/job/schedule", "value": "*/15 * * * 1,2,3,4,5,6" },
    { "op": "add", "path": "/entity/job/schedule_info",
      "value": { "description": "Schedule via API for Demo", "repeat": true, "startOn": 1628919600000, "endOn": 1628999940000 } }
  ]'
This creates a repeating schedule that runs every 15 minutes, Monday through Saturday (day-of-week fields 1-6, so Sundays are skipped), between the start and end timestamps.
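
If you need to compute the startOn/endOn values, one sketch using GNU date (the -u and -d flags are GNU-specific; BSD/macOS date differs) appends a literal 000 to the epoch seconds to get milliseconds:

# Epoch milliseconds for the schedule start and end times used above
date -u -d '2021-08-14 05:40:00' +%s000   # prints 1628919600000 (startOn)
date -u -d '2021-08-15 03:59:00' +%s000   # prints 1628999940000 (endOn)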

After successfully patching the existing job, the response shows the updated job information along with schedule and schedule_info, which are also visible on the Job Dashboard.

Jobs Dashboard post Schedule Creation
Job Dashboard with execution every 15 minutes based on schedule

You can also specify the schedule information while creating the platform job: when building the POST request body, include the same schedule and schedule_info values shown above instead of patching the job afterwards, as in the sketch below.
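
A hedged sketch of that combined create request (the job body fields, especially asset_ref and configuration, are assumptions to verify against the Jobs API Swagger):

# Create the job with the schedule inline, instead of patching it afterwards
curl -X POST 'https://api.dataplatform.cloud.ibm.com/v2/jobs?project_id=<Project_ID>' \
  -H 'Authorization: Bearer <TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "job": {
      "name": "Scheduled DataStage flow job",
      "asset_ref": "<Flow_ID>",
      "configuration": {},
      "schedule": "*/15 * * * 1,2,3,4,5,6",
      "schedule_info": { "repeat": true, "startOn": 1628919600000, "endOn": 1628999940000 }
    }
  }'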

Step 6: Remove or update an existing schedule

To remove a previously created schedule, patch the existing job with both schedule and schedule_info using the replace operation and set each value to "". You can use the same step to change the schedule to different values, say every 30 minutes except Friday and Sunday, as in the sketch below.
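
For example, updating the schedule to run every 30 minutes except Friday and Sunday (cron day-of-week 0 is Sunday and 5 is Friday, so the list keeps 1,2,3,4,6):

curl -X 'PATCH' 'https://api.dataplatform.cloud.ibm.com/v2/jobs/<Job_ID>?project_id=<Project_ID>' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '[
    { "op": "replace", "path": "/entity/job/schedule", "value": "*/30 * * * 1,2,3,4,6" }
  ]'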

To Remove Existing Schedule

cURL command to patch the Platform Job and delete the Schedule Information.

curl -X 'PATCH' 'https://api.dataplatform.cloud.ibm.com/v2/jobs/<Job_ID>?project_id=<Project_ID>' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '[
    { "op": "replace", "path": "/entity/job/schedule", "value": "" },
    { "op": "replace", "path": "/entity/job/schedule_info", "value": "" }
  ]'

Post execution: no schedule is visible in the job details, matching the state before the initial schedule was created.

Jobs Dashboard post Schedule Deletion

Disclaimer: “The postings on this site are my own exploration and don’t necessarily represent IBM’s positions, strategies or opinions.”



Written by Ritesh Kumar Gupta

A Data and AI enthusiast and technology geek, always learning about the data we generate daily and how to process it smartly in the Hybrid Cloud.
