Presto Goodies

This documents collects Presto’s features that are convenient, useful, cool or any combination of those.

Table of Contents

May the WITH Clause be with you

The WITH clause is useful for nested queries. For example, consider the following query:

SELECT a, b, c FROM (
  SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a
) tbl_alias 

The same query can be written with the WITH clause as follows:

WITH tbl_alias AS (SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a)
SELECT a, b, c FROM tbl_alias

Notice that the query has been “de-nested”.

Multiple subqueries with WITH. Each subquery should be delimited by “,”.

WITH tbl1 AS (SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a),
     tbl2 AS (SELECT a, AVG(d) AS d FROM another_tbl GROUP BY a)
SELECT tbl1.*, tbl2.* FROM tbl1 JOIN tbl2 ON tbl1.a = tbl2.a

VALUES is good for prototyping

If you want to quickly test Presto syntax, you can use VALUES to create a table on the fly.

SELECT a, b, c FROM (
  VALUES (1, 2.0, 'x'),(3, 4.0, 'y'),(5, 6.0, 'z')
)  tbl (a, b, c)

The above query returns the following table

a b c
1 2.0 'x'
3 4.0 'y'
5 6.0 'z'

There is no EXCEPT, except you can emulate it

Untitled-3
Presto now supports EXCEPT. See https://prestodb.io/docs/current/sql/select.html#union-intersect-except-clause

Presto does NOT support EXCEPT currently. That said, EXCEPT can be emulated as follows

Suppose that tbl1 looks like this:

a b c
1 2.0 'x'
3 4.0 'y'
5 6.0 'z'

Suppose that tbl2 looks like this:

a b c
1 2.0 'x'
3 4.0 'y'

Then, you can emulate tbl1 EXCEPT tbl2 as follows:

WITH tbl3 AS (
  SELECT 1 AS tbl_id, a, b, c FROM tbl1
  UNION ALL
  SELECT 2 AS tbl_id, a, b, c FROM tbl2
)
SELECT a, b, c FROM tbl3 GROUP BY a, b, c HAVING MAX(tbl_id) = 1

which returns the following:

a b c
5 6.0 'z'

Essentially, you are concatenating tbl1 and tbl2, grouping by all columns of interest, then removing the rows that appear in tbl2 (This is done by HAVING MAX(tbl_id) = 1)

INTERSECTION can be emulated, too

Untitled-3
Presto now supports INTERSECTION. See https://prestodb.io/docs/current/sql/select.html#union-intersect-except-clause

With tbl1 and tbl2 same as before, the following query

WITH tbl3 AS (
  SELECT 1 AS tbl_id, a, b, c FROM tbl1
  UNION ALL
  SELECT 2 AS tbl_id, a, b, c FROM tbl2
)
SELECT a, b, c FROM tbl3 GROUP BY a, b, c HAVING MAX(tbl_id) > MIN(tbl_id)

returns the intersection.

Clean up a table

To clean up a table before using CREATE TABLE AS or INSERT INTO statements, use multiple statements split by semi-colon (;):

DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable AS SELECT ... ;

Limitation

Multiple-statement execution is not guarded by a transaction, so never write multiple update operations in a single job. For example, if you write two or more INSERT INTO statements in a single job, it may produce duplicated records:

# I1
INSERT INTO mytable SELECT ...;
# I2
INSERT INTO mytable SELECT ...;

If the system finds a sporadic failure during the job execution, it will start the job from the scratch, so the actual execution could be I1 (success), I2 (failure), (retry from the scratch), I1 (success) and I2 (success). In this case I1 was executed twice and I2 was only once. So there will be duplicated records in the target table.


Last modified: Jan 06 2017 08:03:55 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.