SQL Examples of Scheduled Queries

Most settings of a scheduled query can be updated or reconfigured at any time, including the cron schedule, delay, name, and query string. The ‘next scheduled run time’, however, is an inferred value that is recalculated automatically from the current time in your browser's timezone, the cron schedule, and the delay setting.

When a scheduled query is updated, the new settings take effect with the next scheduled run, which occurs at ‘next scheduled run time + delay’ (the delay can be positive, negative, or zero).

Daily KPIs Example

A common pattern is to calculate a fixed set of KPIs or metrics at a regular interval.

SELECT
  TD_TIME_FORMAT(time, 'yyyy-MM-dd') AS day,
  COUNT(1) AS cnt
FROM
  www_access
WHERE
  TD_TIME_RANGE(time,
    TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d'),
    TD_SCHEDULED_TIME())
GROUP BY
  TD_TIME_FORMAT(time, 'yyyy-MM-dd')

We can schedule the query to run on a daily basis by setting the following parameters:

  • Recurring?: @daily

  • Delay (seconds): 3600

The example aggregates daily page views from an access log. It uses several common UDFs for Presto (or Hive) to set the proper time range for the aggregation. TD_SCHEDULED_TIME() returns the time for which the job is scheduled to run; with the @daily schedule above, it returns 12 AM (midnight) of the current day.

By setting the delay parameter to one hour (3600 seconds), the job launches at 1 AM each day, allowing an extra hour for data to be imported before the job runs. Note that TD_SCHEDULED_TIME() still returns the scheduled time (midnight), not the actual launch time.
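
For illustration, consider the run whose scheduled time is midnight on 2024-06-02 UTC (launched at 1 AM because of the delay). The dates below are hypothetical and simply substitute the values that TD_SCHEDULED_TIME() and TD_TIME_ADD() would produce for that run:

SELECT
  TD_TIME_FORMAT(time, 'yyyy-MM-dd') AS day,
  COUNT(1) AS cnt
FROM
  www_access
WHERE
  -- TD_SCHEDULED_TIME() = 2024-06-02 00:00:00, so the window covers the whole previous day
  TD_TIME_RANGE(time, '2024-06-01 00:00:00', '2024-06-02 00:00:00', 'UTC')
GROUP BY
  TD_TIME_FORMAT(time, 'yyyy-MM-dd')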

To write the result to an end-system of your choice to track KPIs, review Creating or Editing a Query in the TD Console.

List of Daily Active Users Example

It is common to periodically export results to a Treasure Data table, which can be used by other jobs.

SELECT
  user_id
FROM
  www_access
WHERE
  TD_TIME_RANGE(time,
    TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d'),
    TD_SCHEDULED_TIME())
GROUP BY
  user_id

We can schedule the query to run on a daily basis by setting the following parameters:

  • Recurring?: @daily

  • Delay (seconds): 3600

The example produces the list of daily active users (distinct user_id values) from an access log.

You can write this list to another Treasure Data table by setting the following parameters:

  • Export Result To: Treasure Data

  • Database: YOUR_DATABASE

  • Table: YOUR_TABLE

  • Mode: Replace

Writing the result into another table is atomic, so other queries can reference the table at any time without seeing partial results.
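
As a minimal sketch of a downstream consumer (using the YOUR_DATABASE and YOUR_TABLE names from the export settings above; adjust to your own names), another query can read the refreshed list directly, for example to count daily active users:

SELECT
  COUNT(1) AS daily_active_users
FROM
  YOUR_DATABASE.YOUR_TABLE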

Data Mart Generation Example

It is common to periodically summarize log data and build a data mart by pushing the results to an RDBMS.

SELECT
  user, code, method, path, agent, host, AVG(size) AS avg_size
FROM
  www_access
WHERE
  TD_TIME_RANGE(time,
    TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1h'),
    TD_SCHEDULED_TIME())
GROUP BY
  user, code, method, path, agent, host

To schedule this job, set the following parameters:

  • Recurring?: @hourly

  • Delay (seconds): 600

The preceding example aggregates the access log hourly, grouping web requests by user, code, method, path, agent, and host and computing the average response size for each group.

See Using TD Data Exchange for information on how to write the result into another RDBMS.

After you push the query results to an external RDBMS, you can slice and dice the results using the BI tools of your choice.
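
For example, here is a minimal sketch of such a query, assuming the results were exported to an RDBMS table named www_access_summary and that the AVG(size) column is available as avg_size (both names are illustrative):

SELECT
  path,
  AVG(avg_size) AS avg_response_size  -- unweighted average of the per-group averages
FROM
  www_access_summary
GROUP BY
  path
ORDER BY
  avg_response_size DESC
LIMIT 10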