# Presto 0 188 to 0 205 Migration 2018 Oct 2, 2018 * [Presto 0.205 in Production Prepare for Upgrade](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h1__1534007424) * [Side-by-Side Environments](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h2_875219085) * [Migration Timeline](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h2_328935467) * [Running Code in Presto 0.205 vs. Presto 0.188](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h2__958187426) * [Required SQL Changes for Presto 0.205](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h1_844570931) * [New System Function current_user](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h2__1117537908) * [Syntax Error-Producing Queries](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h1__456405141) * [Changes in ORDER BY behavior](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h2_1886465859) * [JOIN with USING](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h2_1179465342) * [Error Message: Given correlated subquery is not supported](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h2_464978500) * [Recommended Update: Replace TD_SESSIONIZE with TD_SESSIONIZE_WINDOW](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h1_839700295) # Presto 0.205 in Production Prepare for Upgrade Treasure Data’s new Presto release (available 1 October 2018) is based on the open source Presto 0.205 release. Our previous release was based on Presto 0.188. We have done our own internal testing to identify compatibility and performance regression issues, and we document those below. ## Side-by-Side Environments We understand that customers will need to do their own testing as well. Therefore, in this release, Presto 0.188 and Presto 0.205 are available side-by-side to allow customers to test code themselves before the upgrade. ## Migration Timeline Until 31 October 2018, the default environment will be Presto 0.188. On 1 November 2018, the default environment will switch to Presto 0.205. Presto 0.188 will remain available until 15 November 2018 to allow time to finish migration. If you finish your compatibility testing before 31 October, you can contact Treasure Data Support and we can change your default to Presto 0.205 early. Side-by-side environments and gradual migration periods will be the practice in future releases as well. ## Running Code in Presto 0.205 vs. Presto 0.188 During the transition from Presto 0.188 and Presto 0.205, the two releases will be available side-by-side from 1 October 2018 to 15 November 2018. You can control which Presto release runs your code by using the following query hint: ```sql -- @TD engine_version: 0.188 ``` or ```sql -- @TD engine_version: 0.205 ``` # Required SQL Changes for Presto 0.205 For this release there are a number of Presto syntax changes, mostly focused on closer adherence to the ANSI SQL standard. A small number of existing queries will require correction before they will run in the new Presto environment. Note that for the changes described below, the queries will also be compatible with Presto 0.188. Therefore **you can make these modifications in the Presto 0.188 environment, without changing your results.** When you migrate to the new release your queries will continue to work. Required changes listed here have been identified through running existing customer code. For each change, we describe the change in Presto, the error message if you have incompatible Presto SQL, and the recommended rewrite. ## New System Function current_user Presto has added a new current_user() function that returns the name of the current user. As a result, if you have a column or alias named current_user(), you will have to quote the references to that column. For example, if you have this code before the change: ```sql SELECT current_user FROM table1 ``` Update it to put quotes around "current_user": ```sql SELECT "current_user" FROM table1 -- will list tab."current_user" column value ``` # Syntax Error-Producing Queries ## Changes in ORDER BY behavior Presto is now more closely compliant with ANSI SQL standard ORDER BY clause. The general rules for ORDER BY behavior are now: * ORDER BY is executed after projection * ORDER BY prefers an ordinal alias wherever possible * Input and output columns can be used in the ORDER BY clause, but if there is ambiguity, the output columns have **higher precedence** There are several error messages that may be caused by this change to Presto. The issues in all the cases below are due to naming conflicts in the query -- use of aliases in the SELECT or ORDER BY that collide with column name or table name in the input. The use of the alias prevents the reference to the original input table or column. Rewrites generally involve using aliases that don’t match the input table or column names, or using ordinals in the ORDER BY clause. For each case, we show the message, a query that causes the error, and an example of how to rewrite the query. ### Error Message: Mismatched types: varchar(1) vs varchar Original code that causes the error: ```sql SELECT * FROM page_view WHERE CAST(week_page_total_actions as VARCHAR) IS DISTINCT FROM '0'; ``` Modified code that fixes the error: ```sql SELECT * FROM page_view WHERE CAST(week_page_total_actions as VARCHAR (1)) IS DISTINCT FROM '0'; ``` ``` ### Error Message: Non deterministic ORDER BY expression is not supported with SELECT DISTINCT Original code that causes the error: ```sql SELECT week_page_total_actions FROM ( SELECT DISTINCT week_page_total_actions, rand() FROM page_view WHERE week_page_total_actions NOT LIKE '1' ORDER BY rand() LIMIT 20 ) sub ``` Modified code that fixes the error: ```sql SELECT week_page_total_actions FROM ( SELECT DISTINCT week_page_total_actions, rand() FROM page_view WHERE week_page_total_actions NOT LIKE '1' LIMIT 20 ) sub -- Remove the function or expression from the ORDER BY clause ``` ``` ### Error Message: Invalid reference to output projection attribute from ORDER BY aggregation Original code that causes the error: ```sql SELECT code, count(code) AS cnt FROM www_access GROUP BY code ORDER BY count(code) -- output column, code, is used at ORDER BY aggregation ``` Modified code that fixes the error: ```sql SELECT code, count(code) AS cnt FROM www_access GROUP BY code ORDER BY count(www_access.code) -- (or) ORDER BY cnt (or) ORDER BY 2 ``` ``` ### Error message: Expression XXX is not of type ROW Example code that causes the error: ```sql SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z',    'JST') AS time,  code AS access -- higher precedence FROM www_access access -- lower precedence ORDER BY access.time -- access refers to alias column "code" here -- so access.time doesn't make sense ``` Modified code that fixes the error: ```sql SELECT  TD_TIME_FORMAT(time,    'yyyy-MM-dd HH:mm:ss z',    'JST') AS time,  code AS access_another -- avoid naming conflict FROM www_access access ORDER BY access.time ``` Note that the following query (where the name clash occurs on the identifier **time**) is not equivalent to the rewrite above. The return value of TD_TIME_FORMAT aliased as **time** will be a date-formatted VARCHAR, not a timestamp: ```sql SELECT  TD_TIME_FORMAT(time,    'yyyy-MM-dd HH:mm:ss z',    'JST') AS time, -- alias has higher precedence, is used at ORDER BY  count(*) AS access FROM access_log access ORDER BY time -- (or) ORDER BY 1 ``` ### Error Message: For SELECT DISTINCT, ORDER BY expressions must appear in select list Example code that causes the error: ```sql SELECT  DISTINCT TD_TIME_FORMAT(time,    'yyyy-MM-dd HH:mm:ss z',    'JST') AS time,  code FROM www_access access ORDER BY TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST') -- this TD_TIME_FORMAT call does not match an expression in the SELECT list, because -- time in ORDER BY refers to the varchar returned by TD_TIME_FORMAT ``` Modified code that fixes the error: ```sql SELECT  DISTINCT TD_TIME_FORMAT(time,    'yyyy-MM-dd HH:mm:ss z',    'JST') AS time,   -- higher precedence, overrides input time column  code FROM www_access access ORDER BY time -- could also use ORDER BY 1 ``` ### Error messages: Function names are different but a solution is the same. Unexpected parameters (varchar, varchar(21), varchar(3)) for function **td_time_format**. Expected: td_time_format(bigint, varchar, varchar) , td_time_format(bigint, varchar) Unexpected parameters (varchar(5), varchar) for function **date_trunc**. Expected: date_trunc(varchar(x), date) , date_trunc(varchar(x), time) , date_trunc(varchar(x), time with time zone) , date_trunc(varchar(x), timestamp) , date_trunc(varchar(x), timestamp with time zone) Example code for **td_time_format** that causes the error: ```sql SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST') AS time, -- higher precedence, overrides input time column code FROM www_access access ORDER BY TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST')-- time here is the varchar output column in select list, -- not the timestamp input column -- so the expression here is not the same as the -- expression in the select listed``` Modified code for **td_time_format** that fixes the error: ```sql SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST') AS time,   -- higher precedence code FROM www_access access ORDER BY time -- (or) ORDER BY 1 ``` ## JOIN with USING With SELECT… JOIN... with the USING clause for an equijoin, any columns mentioned in the USING clause will appear only once in the output, with an unqualified name, rather than once for each table in the join. It is an error to reference the column from the source tables with qualified names. The following error messages may be caused by this change vs. previous Presto. For each, we show an example of how to rewrite the query. ### Error Message: Column 'xxx.yyyy' cannot be resolved Example code that causes the error: ```sql SELECT lineitem.orderkey, lineitem.quantity,   orders.totalprice, orders.orderkey FROM lineitem JOIN orders USING (orderkey); ``` Note the references to lineitem.orderkey and orders.orderkey. Modified code that fixes the error: ```sql SELECT orderkey, lineitem.quantity, orders.totalprice FROM lineitem JOIN orders USING (orderkey); ``` Correlated Subquery and LIMIT ## Error Message: Given correlated subquery is not supported After the upgrade, correlated subqueries no longer support the inclusion of a LIMIT clause. Rewrite the correlated subquery to not require the LIMIT clause. Example code that causes the error: ```sql SELECT * FROM region r, LATERAL ( SELECT * FROM nation n WHERE n.regionkey = r.regionkey LIMIT 1 ) -- this will fail ``` Modified code that fixes the error: ```sql SELECT * FROM region r, LATERAL ( SELECT * FROM nation n WHERE n.regionkey = r.regionkey ) ``` Note that for this issue, you should review your code closely to make sure your changes do not change the meaning of the code. # Recommended Update: Replace TD_SESSIONIZE with TD_SESSIONIZE_WINDOW While reviewing code for this update you may also wish to make an important upgrade to any pre-existing queries that use TD_SESSIONIZE. The TD_SESSIONIZE() function has been deprecated because of performance issues and inconsistent results. The replacement Presto window function, TD_SESSIONIZE_WINDOW(), was added in 2016. It addresses the issues with TD_SESSIONIZE. The detailed instructions on rewriting your queries are here: [Migrating from Presto TD_SESSIONIZE() to TD_SESSIONIZE_WINDOW()](/products/customer-data-platform/data-workbench/queries/presto-0-188-to-0-205-migration-2018#h1_839700295)