To run a workflow periodically, set the schedule: option at the top of your workflow definition file.
```yaml
timezone: UTC

schedule:
  daily>: 07:00:00

+step1:
  echo>: "Example message"
```

In the schedule: directive, you can choose one of the following options:
| Syntax | Description | Example |
|---|---|---|
| hourly>: MM:SS | Run this job every hour at MM:SS | hourly>: 30:00 |
| daily>: HH:MM:SS | Run this job every day at HH:MM:SS | daily>: 07:00:00 |
| weekly>: DDD,HH:MM:SS | Run this job every week on DDD at HH:MM:SS | weekly>: Sun,09:00:00 |
| monthly>: D,HH:MM:SS | Run this job every month on day D at HH:MM:SS | monthly>: 1,09:00:00 |
| minutes_interval>: M | Run this job every M minutes | minutes_interval>: 30 |
| cron>: CRON_FORMAT | Use the "cron" format for complex scheduling | cron>: 42 4 1 * * |
When a value starts with an asterisk (*), such as a cron expression whose first field is *, enclose the whole value in quotation marks. Otherwise the YAML file is invalid, because YAML reads an unquoted leading * as an alias. For example: cron>: "* 23 31 12 7"
When you use hourly>, daily>, weekly>, or monthly>, the session time can differ from the time the session is scheduled to run. With daily>, weekly>, and monthly>, the session time is 00:00:00 of the day of the actual run; with hourly>, it is the start of the hour (00:00 in MM:SS). For example, given a current time of 2019-02-24 14:20:10 +0900, the following table shows the first session time and the first scheduled run time for each type of schedule:
| Schedule | First session time | First scheduled run time |
|---|---|---|
| hourly>: "32:32" | 2019-02-24 14:00:00 +0900 | 2019-02-24 14:32:32 +0900 |
| daily>: "10:32:32" | 2019-02-25 00:00:00 +0900 | 2019-02-25 10:32:32 +0900 |
| weekly>: "2,10:32:32" | 2019-02-26 00:00:00 +0900 | 2019-02-26 10:32:32 +0900 |
| monthly>: "2,10:32:32" | 2019-03-02 00:00:00 +0900 | 2019-03-02 10:32:32 +0900 |
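The rule behind this table can be modeled in a few lines. The following is a minimal Python sketch, not TD Workflow's implementation. The helper names (first_hourly, first_daily, first_weekly, first_monthly) are hypothetical; the weekly day number is assumed to count from 0 = Sunday, which is consistent with the table because 2019-02-26 is a Tuesday; and the monthly helper assumes that D is a day that exists in both the current and the next month.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers modeling the table above. Each returns
# (first_session_time, first_scheduled_run_time).

def _at(day_start, hour, minute, second):
    """Wall-clock time on the day that starts at day_start."""
    return day_start.replace(hour=hour, minute=minute, second=second)

def first_hourly(now, minute, second):
    # Session time is the top of the hour (00:00 in MM:SS).
    session = now.replace(minute=0, second=0, microsecond=0)
    if session.replace(minute=minute, second=second) <= now:
        session += timedelta(hours=1)  # this hour's slot has already passed
    return session, session.replace(minute=minute, second=second)

def first_daily(now, hour, minute, second):
    # Session time is 00:00:00 of the day the run happens.
    session = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if _at(session, hour, minute, second) <= now:
        session += timedelta(days=1)
    return session, _at(session, hour, minute, second)

def first_weekly(now, dow, hour, minute, second):
    # Assumption: dow counts from 0 = Sunday; Python's weekday() uses 0 = Monday.
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    session = today + timedelta(days=(dow - (now.weekday() + 1) % 7) % 7)
    if _at(session, hour, minute, second) <= now:
        session += timedelta(weeks=1)
    return session, _at(session, hour, minute, second)

def first_monthly(now, day, hour, minute, second):
    # Assumption: `day` exists in this month and the next (for example, 1 to 28).
    session = now.replace(day=day, hour=0, minute=0, second=0, microsecond=0)
    if _at(session, hour, minute, second) <= now:
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        session = session.replace(year=year, month=month)
    return session, _at(session, hour, minute, second)

JST = timezone(timedelta(hours=9))
now = datetime(2019, 2, 24, 14, 20, 10, tzinfo=JST)
for name, (session, run_at) in [
    ("hourly 32:32", first_hourly(now, 32, 32)),
    ("daily 10:32:32", first_daily(now, 10, 32, 32)),
    ("weekly 2,10:32:32", first_weekly(now, 2, 10, 32, 32)),
    ("monthly 2,10:32:32", first_monthly(now, 2, 10, 32, 32)),
]:
    print(f"{name:20} session={session}  runs_at={run_at}")
```

Running it prints the same four session times and run times as the table: whenever today's (or this hour's) slot has already passed, both values move to the next period together.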
Workflows whose definition files contain the "schedule:" option are started automatically by TD Workflow. If you change the workflow definition, the workflow's schedule is updated automatically.
Set the sla option to send an alert if the workflow has not completed within a specified time. In the sla: directive, you can select either the "time" or "duration" option.
```yaml
timezone: UTC

schedule:
  daily>: 07:00:00

sla:
  time: 02:00  # run +notice if the workflow is still running at 02:00
  +notice:
    mail>: mail_body.txt
    subject: "Workflow SLA failure"
    to: [ example@example.com ]

+long_running_job:
  td>:
  query: "SELECT ..."
```

| Syntax | Description | Example |
|---|---|---|
| time: "HH:MM:SS" | This job must be completed by "HH:MM:SS" | time: 12:30:00 |
| duration: "HH:MM:SS" | This job must be completed within "HH:MM:SS" | duration: 00:05:00 |
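To make the difference between time and duration concrete, here is a minimal Python sketch. It is a model, not TD Workflow's implementation, and it rests on two assumptions: duration is measured from the moment the workflow starts running, and time refers to the next occurrence of that wall-clock time after the start, in the workflow's timezone. The function names sla_deadline and sla_missed are hypothetical.

```python
from datetime import datetime, time, timedelta

def sla_deadline(started_at, at=None, within=None):
    """Return when the SLA fires (hypothetical helper).

    at     -- a datetime.time modeling `time: HH:MM:SS` (assumed: next occurrence after start)
    within -- a timedelta modeling `duration: HH:MM:SS` (assumed: measured from start)
    """
    if (at is None) == (within is None):
        raise ValueError("set exactly one of time (at) or duration (within)")
    if within is not None:
        return started_at + within
    deadline = datetime.combine(started_at.date(), at, tzinfo=started_at.tzinfo)
    if deadline <= started_at:
        deadline += timedelta(days=1)  # that time has already passed today
    return deadline

def sla_missed(deadline, now, finished_at=None):
    """True if the deadline has passed and the workflow did not finish by then."""
    if finished_at is not None and finished_at <= deadline:
        return False
    return now >= deadline

start = datetime(2024, 1, 1, 7, 0, 0)  # daily>: 07:00:00 (naive datetimes treated as UTC)
by_time = sla_deadline(start, at=time(2, 0))                    # time: 02:00
by_duration = sla_deadline(start, within=timedelta(minutes=5))  # duration: 00:05:00
print(by_time, by_duration)
print(sla_missed(by_duration, now=datetime(2024, 1, 1, 7, 10)))  # still running at 07:10
```

In this model, time: 02:00 for a run that starts at 07:00 fires at 02:00 the next day, while duration: 00:05:00 fires five minutes after the start.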
The sla: directive also supports the following options:
- fail: BOOLEAN. Setting fail: true marks the workflow as failed when the SLA is missed.
- alert: BOOLEAN. Setting alert: true sends an alert notification when the SLA is missed.
```yaml
sla:
  time: 02:00
  fail: true
  alert: true
```

Sometimes you have frequently scheduled workflows that take longer than expected to complete. This variability in workflow duration can occur for a number of reasons. For example, the amount of data you process might increase because of a seasonal spike during a holiday season, so instead of taking the expected 30 minutes to run, your workflow may take 90 minutes.
If the next session starts before the current one completes, the two sessions compete for your available resources. In this case, it is better to skip the overlapping session and let the following scheduled session process the data for both periods.
You can implement this with the "skip_on_overtime" option.
```yaml
schedule:
  hourly>: 11:00
  skip_on_overtime: true
```

- Setting "skip_on_overtime: true" will skip the execution of a scheduled session if another session is already running.
- Each scheduled workflow session has a variable "last_executed_session_time" that contains the session time of the previous execution. It is usually the same as "last_session_time", but it differs when a previous session was skipped because "skip_on_overtime: true" is set, or when the session is the first execution.
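The following Python sketch models skip_on_overtime for the hourly>: 11:00 example above, using the scenario of a run that takes 90 minutes instead of the usual 30. It is an illustration under stated assumptions, not TD Workflow's scheduler: simulate_hourly is a hypothetical name, each session is assumed to start exactly at its scheduled run time, last_session_time is taken to be the previous schedule slot whether or not it ran, and last_executed_session_time is shown as None for the first execution.

```python
from datetime import datetime, timedelta

def simulate_hourly(first_session, count, run_offset, durations, skip_on_overtime):
    """Return the sessions that actually run, with their time variables (hypothetical model)."""
    executed = []
    busy_until = None     # when the most recently started session will finish
    last_executed = None  # session time of the previous *executed* session
    for i in range(count):
        session_time = first_session + timedelta(hours=i)
        run_at = session_time + run_offset                # hourly>: 11:00 runs at HH:11:00
        last_session = session_time - timedelta(hours=1)  # previous slot, run or skipped
        if skip_on_overtime and busy_until is not None and run_at < busy_until:
            continue                                      # previous session still running: skip
        executed.append({
            "session_time": session_time,
            "last_session_time": last_session,
            "last_executed_session_time": last_executed,
        })
        busy_until = run_at + durations[i]
        last_executed = session_time
    return executed

# The second session takes 90 minutes instead of the usual 30.
durations = [timedelta(minutes=m) for m in (30, 90, 30, 30)]
runs = simulate_hourly(datetime(2024, 1, 1), 4, timedelta(minutes=11), durations, skip_on_overtime=True)
for r in runs:
    print(r["session_time"], r["last_session_time"], r["last_executed_session_time"])
```

Here the 02:00 session is skipped because the 01:00 session is still running at 02:11. The 03:00 session therefore sees a last_session_time of 02:00 but a last_executed_session_time of 01:00, which a workflow could use to process the data for both hours.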
The "start" and "end" options set the period of the schedule, in the date format YYYY-MM-DD.
When "start" is set, the schedule starts on or after the specified date. When "end" is set, the schedule runs up to and including the specified day. If the next run time falls after the day specified by "end", the next run time is set to 9999-01-01 00:00:00+0000 and the workflow is no longer executed.
```yaml
schedule:
  hourly>: 11:00
  start: 2020-02-01
  end: 2024-04-30
```

If you change "end" to extend the period after the schedule has ended, the schedule resumes from the last session. Note that this can cause multiple sessions to run unexpectedly. For example, assume that a workflow has "end: 2022-03-31" and the current date is "2022-04-15". If you update the workflow with "end: 2022-04-30", all of the past sessions that should have been executed between "2022-04-01" and "2022-04-15" are executed.
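The start/end rules and the catch-up behavior can be sketched in Python. This is a simplified model, not TD Workflow's code: for brevity it uses a hypothetical daily>: 09:00:00 schedule (rather than the hourly one above) and naive UTC datetimes, and next_run, catch_up_runs, and FAR_FUTURE are hypothetical names. FAR_FUTURE stands for the 9999-01-01 00:00:00+0000 placeholder described above.

```python
from datetime import date, datetime, time, timedelta

FAR_FUTURE = datetime(9999, 1, 1)  # "no more runs" placeholder (9999-01-01 00:00:00+0000)

def next_run(after, run_time, start=None, end=None):
    """Next daily run strictly after `after`, limited to the dates start..end (inclusive)."""
    candidate = datetime.combine(after.date(), run_time)
    if candidate <= after:
        candidate += timedelta(days=1)
    if start is not None and candidate.date() < start:
        candidate = datetime.combine(start, run_time)  # wait until the start date
    if end is not None and candidate.date() > end:
        return FAR_FUTURE                              # past the end date: never run again
    return candidate

def catch_up_runs(last_run, run_time, new_end, now):
    """Runs that become due at once when `end` is extended after the schedule ended."""
    runs = []
    t = next_run(last_run, run_time, end=new_end)
    while t <= now:  # every slot already in the past runs immediately
        runs.append(t)
        t = next_run(t, run_time, end=new_end)
    return runs

# end: 2022-03-31 has passed; it is now 2022-04-15 and end is changed to 2022-04-30.
last = datetime(2022, 3, 31, 9, 0)
print(next_run(last, time(9, 0), end=date(2022, 3, 31)))  # FAR_FUTURE: the schedule has ended
missed = catch_up_runs(last, time(9, 0), date(2022, 4, 30), now=datetime(2022, 4, 15, 12, 0))
print(len(missed), missed[0].date(), missed[-1].date())
```

In this model, the 15 catch-up runs (2022-04-01 through 2022-04-15) correspond to the past sessions described in the example above, and they all become due as soon as the new end date is saved.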