Presto has a couple of approximate aggregation functions, that give you significant performance improvements, but with some errors. For example, by using approx_distinct() function, you can get an approximation of COUNT(DISTINCT x)with a standard error of 2.3%. The following example gives an approximate count of the previous day’s unique users.

SELECT
  approx_distinct(user_id)
FROM
  access
WHERE
  TD_TIME_RANGE(time,
    TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d', 'PDT'),
    TD_SCHEDULED_TIME())

  • No labels