Migrating and constructing pipeline flows for DataStage
The following steps and limitations apply to migrated Sequence Jobs and flows that are constructed directly with the pipeline canvas.
For a list of general pipeline issues, see Known issues and limitations for DataStage.
Migrated flows
For more information on each component, see Pipeline components for DataStage.
- Wait for file
- Manually reselect or configure the file path. As a helper node for cross loop, the default timeout value is 23:59:59. Manually update the value, or set it to 00:00:00 for no timeout.
- Wait for all
- Replaces Sequencer (all) and Nested condition.
- Wait for any
- Replaces Sequencer (any).
- Terminate pipeline
- Replaces Terminator.
- Terminate loop
- Controls the loop status and marks it as complete or failed. If the loop node has the result control_break_node_id after it finishes, the loop terminated before all iterations were completed. The Terminate loop node is added if there is only one condition link to the End loop node from the parent node. For the Terminate loop node, only one of the condition links on the parent node can be true.
- Loop in sequence
- Replaces Start/end loop.
- Run DataStage job
- Replaces Job activity for parallel jobs.
- Run Pipelines job
- Replaces Job activity for sequence jobs. For information, see Run Pipelines job.
- Run Bash script
- You must replace single quotes around environment variables with double quotes so they are not treated as string literals.
- Set user variables
- Replaces User variable. User variables are defined on the global level. For more information, see Configuring global objects for Orchestration Pipelines.
- Error handling
- Replaces Exception handler.
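The Run Bash script quoting rule above can be illustrated with a short sketch. The variable name below is hypothetical; any environment variable set on the pipeline behaves the same way.

```shell
#!/bin/sh
# Hypothetical environment variable standing in for one set on the pipeline.
export JOB_STATUS="completed"

# Single quotes make the shell treat $JOB_STATUS as a literal string,
# so this comparison never matches the variable's value.
if [ '$JOB_STATUS' = "completed" ]; then
  literal="matched"
else
  literal="no match"
fi

# Double quotes let the shell expand $JOB_STATUS to its value.
if [ "$JOB_STATUS" = "completed" ]; then
  expanded="matched"
else
  expanded="no match"
fi

echo "$literal"    # no match
echo "$expanded"   # matched
```

This is why migration requires replacing single quotes around environment variables with double quotes: single-quoted references are never expanded.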
Set and get user status
To set user status in a DataStage job, call the built-in function SetUserStatus from the Expression builder in the Transformer stage. You can call SetUserStatus from Triggers in the Transformer, but it cannot be used in input column derivations.
To get the status in a pipeline that calls the DataStage job with a Run DataStage job node, use the built-in function ds.GetUserStatus(tasks.<node name>) with the name of the Run DataStage job node. You can also access it in the job results with tasks.<node name>.user_status.

To set user status in a pipeline, you must add it as a variable with the Set user variables node and select Make user variable value available as a pipeline result, which makes it an output parameter that other pipelines can access. Another pipeline can use a Run Pipelines job node to call the pipeline that set the user status, and then read the status with tasks.<node name>.results.output_parameters.<user status parameter name>.
If SetUserStatus is called in a child pipeline, migration creates a global user variable named user_status and selects the option Make user variable value available as a pipeline result. In the parent pipeline, it also replaces the expression that gets the status of the child pipeline, .$UserStatus, with tasks.results.output_parameters.user_status.
Constructed flows
- Run DataStage job
- The DSJobRunEnvironmentName environment variable specifies the runtime environment for DataStage jobs. You can add the DSJobRunEnvironmentName environment variable to the Run DataStage job node to override the default runtime environment that is set at the project level or the job level for a specific run.
- Run Bash script
- Echo statements must use double quotes to access the value of a variable. For example, echo "$variablename" replaces $variablename with the value of the variable, while echo '$variablename' just echoes the literal text $variablename.
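The echo behavior can be verified with a minimal script; the variable name is made up for illustration.

```shell
#!/bin/sh
variablename="actual value"

# Double quotes: the shell expands the variable before echo runs.
expanded=$(echo "$variablename")     # actual value

# Single quotes: echo receives the literal text, no expansion happens.
literal=$(echo '$variablename')      # $variablename

echo "$expanded"
echo "$literal"
```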