This document summarizes new features and changes in Presto 350.

Performance Improvements

Here is a list of updates related to performance improvements:

New UDFs

Many new UDFs are available. See the official Trino documentation to understand the usage of each function.

New SQL support

JDBC Improvements

Query parameters are supported in LIMIT, OFFSET and FETCH FIRST clauses.

Connection conn = ...
PreparedStatement stmt = conn.prepareStatement(
  "SELECT * FROM sample_datasets.www_access OFFSET ? LIMIT ?");
stmt.setInt(1, 10); // OFFSET = 10
stmt.setInt(2, 20); // LIMIT = 20
ResultSet rs = stmt.executeQuery();
...

time_partitioning_shuffle magic comment for INSERT/CTAS

Treasure Data supports a TD-specific setting to customize the partition size of INSERT/CTAS queries for better query performance. The following is an example of the time_partitioning_range session property specified as a magic comment.

-- set session time_partitioning_range = '12h'

This magic comment can also be used to disable shuffling by specifying no-shuffle as the value; in that case, however, the partition size cannot be customized. In Presto 350, we introduced a new magic comment time_partitioning_shuffle = 'true'|'false' so that you can specify partition shuffling and time partition ranges independently.
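For example, the two magic comments might be combined as follows to disable shuffling while still customizing the time partition range (the table names here are placeholders):

-- set session time_partitioning_shuffle = 'false'
-- set session time_partitioning_range = '12h'
INSERT INTO target_table
SELECT * FROM source_table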

Migration Guide

Use the following information to understand the changes required for your queries to work with Presto 350.

SELECT DISTINCT, ORDER BY 

When SELECT DISTINCT is used with an ORDER BY clause, the ORDER BY expressions must appear in the select list.

The following query works in the current version of Presto.

SELECT DISTINCT from_unixtime(time), COUNT(*), time
FROM sample_datasets.www_access
GROUP BY from_unixtime(time), time
ORDER BY from_unixtime(time)

This query fails on Presto 350 with an error message like For SELECT DISTINCT, ORDER BY expressions must appear in select list. To solve this issue on Presto 350, rewrite this query as follows.

SELECT DISTINCT from_unixtime(time), COUNT(*), time
FROM sample_datasets.www_access
GROUP BY from_unixtime(time), time
ORDER BY from_unixtime(sample_datasets.www_access.time)

lag() and lead()

ORDER BY

In Presto 350, lag() and lead() require an ORDER BY clause in their window specification. For example, the following query works on the current version, but not on Presto 350.

SELECT
  time,
  LAG (path, 1) OVER () AS lag_data,
  path,
  LEAD (path, 1) OVER () AS lead_data 
FROM
  sample_datasets.www_access

To run this query on Presto 350, you must add an explicit ORDER BY clause to the window specification as follows.

SELECT
  time,
  LAG (path, 1) OVER (ORDER BY time) AS lag_data,
  path,
  LEAD (path, 1) OVER (ORDER BY time) AS lead_data 
FROM
  sample_datasets.www_access

Window Frame

Presto 350 adds a new semantic check for lag() and lead(): a window frame cannot be specified. For example, the following query runs on the current version (although it doesn't produce the expected results), but fails on Presto 350.

SELECT
  time,
  user,
  LAG (path, 1) OVER (ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS lag_data,
  path,
  LEAD (path, 1) OVER (ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS lead_data 
FROM
  sample_datasets.www_access

To fix this query in Presto 350, remove ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING.

SELECT
  time,
  user,
  LAG (path, 1) OVER (ORDER BY time) AS lag_data,
  path,
  LEAD (path, 1) OVER (ORDER BY time) AS lead_data 
FROM
  sample_datasets.www_access

TIME and TIMESTAMP Behavior Changes

In Presto 350, the behavior of TIME and TIMESTAMP follows the SQL standard, which introduces some incompatibilities with the current version.

Presto has the following data types to represent time and timestamp values: TIME, TIME WITH TIME ZONE, TIMESTAMP, and TIMESTAMP WITH TIME ZONE.

In the current version, TIME and TIMESTAMP behave similarly to their WITH TIME ZONE counterparts, but in Presto 350, TIME and TIMESTAMP don't have an associated time zone. This might affect query results.

Examples

The result of the following query will be different depending on the Presto version:

SELECT CAST('2000-01-01 00:00:00 US/Eastern' AS TIMESTAMP)
--> 2000-01-01 05:00:00.000 on Presto 317
--> 2000-01-01 00:00:00.000 on Presto 350

When you convert WITH TIME ZONE types to TIME or TIMESTAMP, the time zone is taken into account in the current version, but not in Presto 350. If you have queries that contain such conversions, you might need to revise them for Presto 350. For example, the previous query can be rewritten as follows:


SELECT CAST(
  -- Convert to UTC before CAST as TIMESTAMP
  TIMESTAMP '2000-01-01 00:00:00 US/Eastern' AT TIME ZONE 'UTC'
  AS TIMESTAMP)

In addition, political timezones (e.g. America/Los_Angeles) are no longer allowed in TIME WITH TIME ZONE, to avoid issues around DST and possible future policy changes.

Therefore, the following query works on the current version, but not on Presto 350.

SELECT TIME '01:02:03.456 America/Los_Angeles'

Rewrite this query as follows:

SELECT TIME '01:02:03.456 -07:00'

Query result formats can also differ. For example, SELECT CURRENT_TIME returns 09:29:52.540 UTC on the current version, but 09:29:52.540+00:00 on Presto 350.
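Shown in the same format as the earlier CAST example (the actual time depends on when the query runs):

SELECT CURRENT_TIME
--> 09:29:52.540 UTC on the current version
--> 09:29:52.540+00:00 on Presto 350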

Query Length Limitation

Presto 350 might generate longer bytecode internally, so long queries might hit the query length limitation. If you get such an error message, you need to shorten the query.

checksum() 

The checksum() function implementation has changed in Presto 350, so it generates different results than previous Presto versions.
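As a minimal sketch, any query that aggregates with checksum() is affected; for example, using the sample_datasets.www_access table from the earlier examples:

SELECT checksum(path) FROM sample_datasets.www_access
-- returns a different varbinary value on Presto 350 than on previous versions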

approx_percentile()

approx_percentile() now uses T-digest as its internal data structure, which is more accurate and faster than the previous implementation. As a result, the function may produce slightly different results.

However, if you specify the accuracy parameter, approx_percentile() doesn’t use T-digest because T-digest doesn’t allow the accuracy parameter. If you want to benefit from T-digest-based approx_percentile(), consider dropping the accuracy parameter.
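For example, a sketch of dropping the accuracy parameter so that the T-digest-based implementation is used (the column name is a placeholder):

-- Uses the previous quantile-digest implementation because accuracy is specified
SELECT approx_percentile(column1, 1, 0.5, 0.01) FROM ...

-- Uses T-digest on Presto 350
SELECT approx_percentile(column1, 0.5) FROM ...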

approx_percentile() doesn’t allow infinite values (for example, values produced by dividing by zero). If such a value is given, the query fails with java.lang.IllegalArgumentException: value must be finite. In this case, you have to exclude infinite values before calling approx_percentile(), or you can fall back to the old implementation by intentionally specifying the accuracy parameter as a workaround, as follows:


-- Before
SELECT approx_percentile(
  column1 / column2, -- can be an infinite value
  0.5                -- percentile
) FROM ...

-- After
SELECT approx_percentile(
  column1 / column2, -- can be an infinite value
  1,                 -- weight
  0.5,               -- percentile
  0.01               -- accuracy
) FROM ...
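Alternatively, a sketch of the first option, excluding non-finite values before aggregation; this assumes the is_finite() function fits your data (rows where the expression is NULL are also dropped by the WHERE clause):

-- Keep only finite values before approx_percentile()
SELECT approx_percentile(column1 / column2, 0.5)
FROM ...
WHERE is_finite(column1 / column2)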

double NaN to integer

Presto 350 doesn’t allow converting a NaN value to INTEGER, although CAST converts it to 0 in the current version. The following query works on the current version but fails on Presto 350 with a Cannot cast double NaN to integer error message.

SELECT CAST(nan() AS INTEGER)

You can restore the original behavior by using TRY_CAST and COALESCE as follows:

SELECT COALESCE(TRY_CAST(nan() AS INTEGER), 0)

information_schema.columns

The comment and extra_info columns have been removed from information_schema.columns. If you have queries that refer to these columns, you have to revise them.
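For example, a sketch of revising a query that selects the removed columns (the original query shape here is hypothetical):

-- Fails on Presto 350: comment and extra_info no longer exist
SELECT table_name, column_name, comment, extra_info
FROM information_schema.columns

-- Revised for Presto 350
SELECT table_name, column_name
FROM information_schema.columns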

Here is a query result of information_schema.columns on Presto 350.

presto> select * from information_schema.columns limit 10;
 table_catalog |    table_schema    |    table_name    |   column_name   | ordinal_position | column_default | is_nullable | data_type
---------------+--------------------+------------------+-----------------+------------------+----------------+-------------+-----------
 td-presto     | information_schema | tables           | table_catalog   |                1 | NULL           | YES         | varchar
 td-presto     | information_schema | tables           | table_schema    |                2 | NULL           | YES         | varchar
...