Most settings of a scheduled query can be updated or reconfigured at any time, including the cron schedule, delay, name, and query string. In contrast, the ‘next scheduled run time’ is an inferred value that is automatically recalculated from the current time (in your browser's timezone), the cron schedule, and the delay setting.
When a scheduled query is updated, the new settings apply immediately, starting with the next scheduled run, which occurs at ‘next scheduled run time + delay’ (where the delay can be positive, zero, or negative).
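The ‘next scheduled run time + delay’ calculation can be illustrated with a short sketch. This is an illustration only, assuming a simple @daily schedule; the actual computation is performed by the Treasure Data scheduler, and the function name here is hypothetical:

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(now, delay_seconds):
    """Illustrative only: next launch time for an @daily schedule.

    '@daily' fires at 00:00; the job itself launches delay_seconds later.
    """
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return next_midnight + timedelta(seconds=delay_seconds)

now = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
print(next_daily_run(now, 3600))  # -> 2024-01-16 01:00:00+00:00
```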
A common pattern is to periodically calculate fixed KPIs or metrics at a set interval.
SELECT
  TD_TIME_FORMAT(time, 'yyyy-MM-dd') AS day,
  COUNT(1) AS cnt
FROM
  www_access
WHERE
  TD_TIME_RANGE(time,
    TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d'),
    TD_SCHEDULED_TIME())
GROUP BY
  TD_TIME_FORMAT(time, 'yyyy-MM-dd')
We can schedule the query to run on a daily basis by setting the following parameters:
Recurring?: @daily
Delay (seconds): 3600
The example aggregates the daily page views from an access log. It uses several common UDFs, available in both Presto and Hive, to set the proper time range for the aggregation. TD_SCHEDULED_TIME() returns the time for which the job was scheduled to run; with the @daily schedule above, TD_SCHEDULED_TIME() returns 12 a.m. (midnight).
By setting the delay parameter to an hour (3600 seconds), the job launches at 1 a.m. each day, allowing some additional time (an hour in this case) for data to be imported before the job runs.
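The key point is that the delay shifts only the launch time, not the aggregation window: TD_SCHEDULED_TIME() remains pinned to midnight, so the query always covers exactly the previous calendar day. A short sketch (the dates are arbitrary examples):

```python
from datetime import datetime, timedelta

scheduled_time = datetime(2024, 1, 16, 0, 0)               # TD_SCHEDULED_TIME()
launch_time = scheduled_time + timedelta(seconds=3600)     # actual start, 1 a.m.

window_start = scheduled_time - timedelta(days=1)          # TD_TIME_ADD(..., '-1d')
window_end = scheduled_time

# The window is the full previous day, regardless of the delayed launch.
print(window_start, '->', window_end)
```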
To write the result to an end-system of your choice to track KPIs, review Creating or Editing a Query in the TD Console.
It is common to periodically export results to a Treasure Data table, which can be used by other jobs.
SELECT
  user_id
FROM
  www_access
WHERE
  TD_TIME_RANGE(time,
    TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d'),
    TD_SCHEDULED_TIME())
GROUP BY
  user_id
We can schedule the query to run on a daily basis by setting the following parameters:
Recurring?: @daily
Delay (seconds): 3600
The example aggregates the list of daily active users from an access log.
You can write this list to another Treasure Data table by setting the following parameters:
Export Result To: Treasure Data
Database: YOUR_DATABASE
Table: YOUR_TABLE
Mode: Replace
Writing the result into another table is atomic, so other queries can safely reference the table at any time.
It is common to periodically summarize data from logs and build a datamart by pushing results to an RDBMS.
SELECT
  user, code, method, path, agent, host, AVG(size) AS avg_size
FROM
  www_access
WHERE
  TD_TIME_RANGE(time,
    TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1h'),
    TD_SCHEDULED_TIME())
GROUP BY
  user, code, method, path, agent, host
To schedule this job, set the following parameters:
Recurring?: @hourly
Delay (seconds): 600
The preceding example aggregates web requests by user, code, method, path, agent, and host, along with the average response size, from an access log every hour.
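As in the daily example, TD_SCHEDULED_TIME() is pinned to the top of the hour regardless of the 600-second delay, so each run covers exactly the previous hour. A short sketch (the times are arbitrary examples):

```python
from datetime import datetime, timedelta

scheduled_time = datetime(2024, 1, 16, 14, 0)              # TD_SCHEDULED_TIME()
launch_time = scheduled_time + timedelta(seconds=600)      # actual start, 14:10

window_start = scheduled_time - timedelta(hours=1)         # TD_TIME_ADD(..., '-1h')

# The window is the full previous hour, regardless of the delayed launch.
print(window_start, '->', scheduled_time)
```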
See Using TD Data Exchange for information on how to write the result into another RDBMS.
After you push the query results to an external RDBMS, you can slice and dice the results using the BI tools of your choice.