Workflow Components and Operation

Introduction

The Workflows feature allows for ETLs to be automatically triggered with dependency on multiple combinations of data sources surpassing predefined quality score thresholds.

Workflow Components and Operation

The bulleted numbers below correspond with the numbered callouts in the below screenshot.

1. Scheduled Run Time

The Schedule Run Time is used in conjunction with the Alerts. Alerts can be set to send specified users an email notification if the last ETL in a Workflow doesn’t trigger by the designated Scheduled Run Time. In the above screenshot, the last ETL in the example Workflow would be ‘AmultiPerfPipe_copy5’.

2. Data Source & ETL Parameters

When creating Workflows, you must select Data Sources that are already set up in the platform. These Data Sources will also be found in the Sources tab in the blue left-hand navigation pane. The highlighted portion in the above screenshot is the parameter that is used when setting up an ETL so the ETL knows which file to process when triggered.

If you click on the parameter name which is highlighted, it will automatically save to your clipboard.
This parameter name can also be found in the Settings tab in the Manage screen for any Source.

3. Run ETL Trigger

The Run ETL button will manually trigger the designated ETL. In our above instance, this would trigger the ETL circled in #5.

4. Min DQ Score & Status

When a Data Source has a report run, the status will display as “Complete” in green (as shown in the screenshot). The Data Quality Score for that report run will also display. The ‘Min DQ Score’ is a threshold that a source’s quality score must surpass in order to auto-trigger the ETL. In the above example, the report score did not surpass the threshold which is indicated by the Score being highlighted in red.

If a Data Source is used multiple times in either the same Workflow or in different Workflows, the Min DQ score will be the same for every instance where that Data Source is present within all Workflows. In other words, if the same Data Source is found in two different Workflows, and you change the Min DQ Score on one of them, it will automatically update the other with the same Min DQ Score.

5. ETL

This shows the ETL Pipeline that is designated to trigger if the following conditions are met:

To have an ETL Auto Trigger:
- A report is generated on a new Batch file that hasn’t yet auto-triggered the ETL (Batches further explained in #8),
- The report’s data quality score surpasses the Min DQ Score threshold, and
- Any dependent conditions are met. In the above instance, there are no other dependent conditions for the circled ETL ‘AmultiPerfPipe_copy4’, so this ETL did auto-trigger. In the example just to the right, there is a dependency on three DataSource files that need to complete successfully with scores that surpass their respective Min DQ Scores before the ETL ‘AmultiPerfPipe_copy3’ would auto-trigger.
If the Run ETL button mentioned in #3 were pressed, it would trigger the designated ETL

6. Dependency Icon Status

Where some ETLs are dependent on other ETLs completing first, dependency lines and icons are displayed. In the above example, there are two ETLs that must run successfully before the ETL ‘AmultiPerfPipe_copy5’ would auto-trigger. The example above shows that one of these conditions has been met, noted with a green check icon, and another one has not been met, noted with a red ‘X’.

There are four reasons why there could be a red X to indicate why a dependency hasn’t been met.
- All files for a particular batch haven’t yet been run for the ETL to trigger.
- The Min DQ Score was not surpassed by one of the Data Sources, therefore the ETL would not auto-trigger.
- The platform has triggered an ETL, (either automatically or manually) and is waiting for the ETL to finish processing before updating the status, which if successful, would turn the red ‘X’ into a green check, (status described in # 7).
- The ETL triggered and failed during processing, in which case the status would say ‘Failed’.

7. Workflow Statuses

The Status indicates if any dependencies are still required before an ETL is triggered, or what the completion status of an ETL pipeline run is after the ETL has completed processing. The different statuses are:

‘Waiting on Dependencies’ indicates that either:
- A dependent report hasn’t yet run,
- A dependent report hasn’t surpassed the Min DQ Score (which is the case in the above example screenshot), or
- An ETL has triggered and is awaiting completion. One of the following two statuses will display once an ETL has completed.
‘Failed’ will display as the status if an ETL pipeline has failed, (for example an ETL Pipeline triggered in Azure Data Factory and failed for any reason, that status will display in the platform as ‘Failed’)
‘Success’ will display if the ETL pipeline completed successfully. A status of Success is required for a dependent ETL to auto-trigger.

Workflow Components and Operation

Introduction