Oct 2, 2018
- Presto 0.205 in Production Prepare for Upgrade
- Side-by-Side Environments
- Migration Timeline
- Running Code in Presto 0.205 vs. Presto 0.188
- Required SQL Changes for Presto 0.205
- New System Function current_user
- Syntax Error-Producing Queries
- Changes in ORDER BY behavior
- JOIN with USING
- Error Message: Given correlated subquery is not supported
- Recommended Update: Replace TD_SESSIONIZE with TD_SESSIONIZE_WINDOW
Treasure Data’s new Presto release (available 1 October 2018) is based on the open source Presto 0.205 release. Our previous release was based on Presto 0.188.
We have done our own internal testing to identify compatibility and performance regression issues, and we document those below.
We understand that customers will need to do their own testing as well. Therefore, in this release, Presto 0.188 and Presto 0.205 are available side-by-side to allow customers to test code themselves before the upgrade.
Until 31 October 2018, the default environment will be Presto 0.188. On 1 November 2018, the default environment will switch to Presto 0.205. Presto 0.188 will remain available until 15 November 2018 to allow time to finish migration.
If you finish your compatibility testing before 31 October, you can contact Treasure Data Support and we can change your default to Presto 0.205 early.
Side-by-side environments and gradual migration periods will be the practice in future releases as well.
During the transition from Presto 0.188 and Presto 0.205, the two releases will be available side-by-side from 1 October 2018 to 15 November 2018.
You can control which Presto release runs your code by using the following query hint:
-- @TD engine_version: 0.188or
-- @TD engine_version: 0.205For this release there are a number of Presto syntax changes, mostly focused on closer adherence to the ANSI SQL standard.
A small number of existing queries will require correction before they will run in the new Presto environment. Note that for the changes described below, the queries will also be compatible with Presto 0.188. Therefore you can make these modifications in the Presto 0.188 environment, without changing your results. When you migrate to the new release your queries will continue to work.
Required changes listed here have been identified through running existing customer code.
For each change, we describe the change in Presto, the error message if you have incompatible Presto SQL, and the recommended rewrite.
Presto has added a new current_user() function that returns the name of the current user. As a result, if you have a column or alias named current_user(), you will have to quote the references to that column.
For example, if you have this code before the change:
SELECT current_user FROM table1Update it to put quotes around "current_user":
SELECT "current_user" FROM table1
-- will list tab."current_user" column valuePresto is now more closely compliant with ANSI SQL standard ORDER BY clause.
The general rules for ORDER BY behavior are now:
ORDER BY is executed after projection
ORDER BY prefers an ordinal alias wherever possible
Input and output columns can be used in the ORDER BY clause, but if there is ambiguity, the output columns have higher precedence
There are several error messages that may be caused by this change to Presto.
The issues in all the cases below are due to naming conflicts in the query -- use of aliases in the SELECT or ORDER BY that collide with column name or table name in the input. The use of the alias prevents the reference to the original input table or column.
Rewrites generally involve using aliases that don’t match the input table or column names, or using ordinals in the ORDER BY clause.
For each case, we show the message, a query that causes the error, and an example of how to rewrite the query.
Original code that causes the error:
SELECT *
FROM page_view
WHERE CAST(week_page_total_actions as VARCHAR) IS DISTINCT FROM '0';Modified code that fixes the error:
SELECT *
FROM page_view
WHERE CAST(week_page_total_actions as VARCHAR (1)) IS DISTINCT FROM '0';
### Error Message: Non deterministic ORDER BY expression is not supported with SELECT DISTINCT
Original code that causes the error:
```sql
SELECT
week_page_total_actions
FROM (
SELECT
DISTINCT week_page_total_actions,
rand()
FROM
page_view
WHERE
week_page_total_actions NOT LIKE '1'
ORDER BY
rand() LIMIT 20
) subModified code that fixes the error:
SELECT
week_page_total_actions
FROM (
SELECT
DISTINCT week_page_total_actions,
rand()
FROM
page_view
WHERE
week_page_total_actions NOT LIKE '1'
LIMIT 20
) sub
-- Remove the function or expression from the ORDER BY clause
### Error Message: Invalid reference to output projection attribute from ORDER BY aggregation
Original code that causes the error:
```sql
SELECT
code, count(code) AS cnt
FROM www_access GROUP BY code ORDER BY count(code)
-- output column, code, is used at ORDER BY aggregationModified code that fixes the error:
SELECT
code, count(code) AS cnt
FROM www_access GROUP BY code ORDER BY count(www_access.code)
-- (or) ORDER BY cnt (or) ORDER BY 2
### Error message: Expression XXX is not of type ROW
Example code that causes the error:
```sql
SELECT
TD_TIME_FORMAT(time,
'yyyy-MM-dd HH:mm:ss z',
'JST') AS time,
code AS access -- higher precedence
FROM www_access access -- lower precedence
ORDER BY access.time
-- access refers to alias column "code" here
-- so access.time doesn't make senseModified code that fixes the error:
SELECT
TD_TIME_FORMAT(time,
'yyyy-MM-dd HH:mm:ss z',
'JST') AS time,
code AS access_another -- avoid naming conflict
FROM www_access access
ORDER BY access.time Note that the following query (where the name clash occurs on the identifier time) is not equivalent to the rewrite above. The return value of TD_TIME_FORMAT aliased as time will be a date-formatted VARCHAR, not a timestamp:
SELECT
TD_TIME_FORMAT(time,
'yyyy-MM-dd HH:mm:ss z',
'JST') AS time, -- alias has higher precedence, is used at ORDER BY
count(*) AS access
FROM access_log access
ORDER BY time -- (or) ORDER BY 1 Example code that causes the error:
SELECT
DISTINCT TD_TIME_FORMAT(time,
'yyyy-MM-dd HH:mm:ss z',
'JST') AS time,
code
FROM www_access access
ORDER BY TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST')
-- this TD_TIME_FORMAT call does not match an expression in the SELECT list, because
-- time in ORDER BY refers to the varchar returned by TD_TIME_FORMATModified code that fixes the error:
SELECT
DISTINCT TD_TIME_FORMAT(time,
'yyyy-MM-dd HH:mm:ss z',
'JST') AS time, -- higher precedence, overrides input time column
code
FROM www_access access
ORDER BY time
-- could also use ORDER BY 1 Function names are different but a solution is the same.
Unexpected parameters (varchar, varchar(21), varchar(3)) for function td_time_format. Expected: td_time_format(bigint, varchar, varchar) , td_time_format(bigint, varchar)
Unexpected parameters (varchar(5), varchar) for function date_trunc. Expected: date_trunc(varchar(x), date) , date_trunc(varchar(x), time) , date_trunc(varchar(x), time with time zone) , date_trunc(varchar(x), timestamp) , date_trunc(varchar(x), timestamp with time zone)
Example code for td_time_format that causes the error:
SELECT
TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST') AS time, -- higher precedence, overrides input time column
code
FROM www_access access
ORDER BY TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST')-- time here is the varchar output column in select list,
-- not the timestamp input column
-- so the expression here is not the same as the
-- expression in the select listed```
Modified code for **td_time_format** that fixes the error:
```sql
SELECT
TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST') AS time, -- higher precedence
code
FROM www_access access
ORDER BY time -- (or) ORDER BY 1 With SELECT… JOIN... with the USING clause for an equijoin, any columns mentioned in the USING clause will appear only once in the output, with an unqualified name, rather than once for each table in the join. It is an error to reference the column from the source tables with qualified names.
The following error messages may be caused by this change vs. previous Presto. For each, we show an example of how to rewrite the query.
Example code that causes the error:
SELECT
lineitem.orderkey, lineitem.quantity,
orders.totalprice, orders.orderkey
FROM lineitem JOIN orders
USING (orderkey);Note the references to lineitem.orderkey and orders.orderkey.
Modified code that fixes the error:
SELECT
orderkey, lineitem.quantity, orders.totalprice
FROM lineitem JOIN orders
USING (orderkey);Correlated Subquery and LIMIT
After the upgrade, correlated subqueries no longer support the inclusion of a LIMIT clause.
Rewrite the correlated subquery to not require the LIMIT clause.
Example code that causes the error:
SELECT * FROM region r, LATERAL (
SELECT * FROM nation n
WHERE n.regionkey = r.regionkey LIMIT 1
) -- this will failModified code that fixes the error:
SELECT * FROM region r, LATERAL (
SELECT * FROM nation n
WHERE n.regionkey = r.regionkey
)Note that for this issue, you should review your code closely to make sure your changes do not change the meaning of the code.
While reviewing code for this update you may also wish to make an important upgrade to any pre-existing queries that use TD_SESSIONIZE. The TD_SESSIONIZE() function has been deprecated because of performance issues and inconsistent results. The replacement Presto window function, TD_SESSIONIZE_WINDOW(), was added in 2016. It addresses the issues with TD_SESSIONIZE. The detailed instructions on rewriting your queries are here:
Migrating from Presto TD_SESSIONIZE() to TD_SESSIONIZE_WINDOW()