# Presto 0 205 to 317 Migration 2020 Treasure Data’s Presto release is based on the Presto 317 open source release. Our previous release was based on Presto 0.205. This article includes: * [New Magic Comment Features](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h1_1921683457) * [parse_decimal_literals_as_double](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__584086195) * [Distributed_Sort](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__1974809204) * [time_partitioning_range](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__1734486412) * [Deprecated Features](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h1__1821394898) * [Migrate Your Distributed Joins](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2_2028709490) * [Updating Presto](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h1__1516417081) * [About Side-by-Side Environments](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__1521992254) * [Running Code in Presto 317 vs. Presto 0.205](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__1437983654) * [Query Syntax Fixes](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h1__87043302) * [Having too many OR predicate. Maximum 500 OR allowed](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2_1448312120) * [Key not present in map: XXX](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2_756987314) * [Array subscript out of bounds](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__599545120) * [Column alias list has 1 entry but 't' has N columns available'](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2_1644950305) * [Window does not support distinct](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__285762603) * [ARRAY comparison not supported for arrays with null elements](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__1682355132) * ['XXX' must be an aggregate expression or appear in a GROUP BY clause](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2_2060302521) * [Zero-length delimited identifier not allowed](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__852805306) * [Unexpected parameters (char(1)) for function codepoint. Expected: codepoint(varchar(1))](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2__190112671) * [Invalid format: "XXX" is malformed / Invalid timezone: "XXX" is malformed](/products/customer-data-platform/data-workbench/queries/presto-0-205-to-317-migration-2020#h2_1996712627) We have done internal testing to identify compatibility and performance regression issues. # New Magic Comment Features ## parse_decimal_literals_as_double The **parse_decimal_literals_as_double** session property is supported as a magic comment. Decimal literals without an explicit type specifier (e.g. 1.2) are treated as the values of the DOUBLE type by default. This magic comment can disable the behavior and use decimal literals as the values of the DECIMAL type. For example: ```sql -- set session parse_decimal_literals_as_double = 'false' SELECT typeof(1.1); ``` ## Distributed_Sort The**distributed_sort** session property is supported as a magic comment. Distributed sort allows sorting of data, which exceeds `query.max-memory-per-node`. Distributed sort is enabled through the `distributed_sort` session property, or `distributed-sort` configuration property set in `etc/config.properties` of the coordinator. Distributed sort is enabled by default. When distributed sort is enabled, the sort operator executes in parallel on multiple nodes in the cluster. Partially sorted data from each worker node is then streamed to a single worker node for a final merge. This technique allows to utilization of memory of multiple worker nodes for sorting. The primary purpose of the distributed sort is to allow for the sorting of data sets that don’t normally fit into single node memory. Performance improvement can be expected, but it won’t scale linearly with the number of nodes, since the data needs to be merged by a single node. ```sql -- set session distributed_sort = 'true' SELECT * FROM large_table, small_table WHERE small_table.id = large_table.id ``` ## time_partitioning_range The time_partitioning_range session property is supported as a magic comment. ```sql -- set session time_partitioning_range = 'value` The value can be * none * no time partitioning * number with unit * allowed units are `h, d, mo, q, y` (hour, day, month, quarter, year) * ex) 12h, 2d, 1mo, 1q, 1y # Deprecated Features ## Migrate Your Distributed Joins The **distributed_join** session property has been removed. Use **join_distribution_type** session property instead. The **join_distribution_type** system property accepts the following values: * PARTITIONED * BROADCAST Deprecated code: ```sql -- set session distributed_join = 'true' SELECT * FROM large_table, small_table WHERE small_table.id = large_table.id Updated code: ```sql -- set session join_distribution_type = 'PARTITIONED' SELECT * FROM large_table, small_table WHERE small_table.id = large_table.id ``` # Updating Presto ## About Side-by-Side Environments For your testing, Presto 0.205 and Presto 317 are available side-by-side to allow testing of the code before the upgrade. ## Running Code in Presto 317 vs. Presto 0.205 During the transition from Presto 317 and Presto 0.205 the two releases will be available side-by-side. You can control which Presto release runs your code by using the following query hint: | **Version** | **Query Hint** | | --- | --- | | Presto 317 | `-- @TD engine_version: 317` | | Presto 0.205 | `-- @TD engine_version: 0.205` | # Query Syntax Fixes Running queries that worked in the previous version might cause error messages. For each case, we show the message, a query that causes the error, and an example of how to rewrite the query. ## Having too many OR predicate. Maximum 500 OR allowed Original code that causes the error: ```sql SELECT path FROM ( SELECT * FROM sample_datasets.www_access ) WHERE path = 'foo1' OR path = 'foo2' OR path = 'foo3' -- Query having a lot of "OR" clauses OR path = 'foo559'; ``` Modified code that fixes the error: ```sql SELECT path FROM ( SELECT * FROM sample_datasets.www_access ) WHERE path IN ( 'foo1', 'foo2', 'foo3' , -- Query having a lot of conditions 'foo559'); ``` Typically, using an **IN** clause to rewrite your queries solves the issue. ## Key not present in map: XXX Original code that causes the error: ```sql SELECT MAP(ARRAY [1, 3], ARRAY [2, 4])[5]; ``` Modified code that fixes the error: ```sql SELECT element_at(MAP(ARRAY [1, 3], ARRAY [2, 4]), 5); ``` You can use the function [element_at](https://prestosql.io/docs/317/functions/map.md) to rewrite your queries. The function [element_at](https://prestosql.io/docs/317/functions/map.md) is supported by both cluster versions, v0.205, and v317. The preceding query runs on both clusters. ## Array subscript out of bounds Original code that causes the error: ```sql SELECT numbers[99] -- key isn't exist FROM (VALUES (ARRAY[1,2,3]) ) AS t(numbers); ``` Modified code that fixes the error: ```sql SELECT element_at(numbers, 99) -- key isn't exist FROM (VALUES (ARRAY[1,2,3]) ) AS t(numbers); ``` The function [element_at](https://prestosql.io/docs/317/functions/array.md) is supported by both cluster versions, v0.205, and v317. The preceding query runs on both clusters. ## Column alias list has 1 entry but 't' has N columns available' Original code that causes the error: ```sql WITH dataset AS ( SELECT ARRAY[ CAST(ROW('Amy', 'devops') AS ROW(name VARCHAR, department VARCHAR)) ] AS users ), u AS ( SELECT person FROM dataset, UNNEST(dataset.users) AS t(person) ) SELECT person.name, person.department FROM u; ``` Modified code that fixes the error: ```sql WITH dataset AS ( SELECT ARRAY[ CAST(ROW('Amy', 'devops') AS ROW(name VARCHAR, department VARCHAR)) ] AS users ), u AS ( SELECT name, department FROM dataset, UNNEST(dataset.users) AS person(name, department) ) SELECT name, department FROM u; ``` ## Window does not support distinct Original code that causes the error: ```sql SELECT UPPER(DISTINCT(id)) AS user_id FROM (VALUES ('foo'), ('bar')) AS t(id); ``` Modified code that fixes the error: ```sql SELECT UPPER(id) AS user_id FROM ( SELECT DISTINCT id FROM (VALUES ('foo'), ('bar') ) AS t (id)); ``` ## ARRAY comparison not supported for arrays with null elements Original code that causes the error: ```sql SELECT id, array_agg(value) FROM ( VALUES (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (3, null) ) AS t(id, value) GROUP BY 1 ORDER BY 2; ``` Modified code that fixes the error: ```sql SELECT id, array_agg(value) FILTER (where value is not null) FROM ( VALUES (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (3, null) ) AS t(id, value) GROUP BY 1 ORDER BY 2; ``` The function [element_at](https://prestosql.io/docs/317/functions/map.md) is supported by both cluster versions, v0.205, and v317. The preceding query runs on both clusters. ### Details of this error: Behavior change of array_agg function For Presto 0.205, the array_agg function ignores NULL values. ``` presto> SELECT id, array_agg(value) FROM ( VALUES (1, null), (1, 0) ) AS t(id, value) GROUP BY 1; id | _col1 ----+------- 1 | [0] (1 row) presto> SELECT id, array_agg(value) FROM ( VALUES (1, null) ) AS t(id, value) GROUP BY 1; id | _col1 ----+------- 1 | NULL (1 row) In Presto 0.205, _deprecated_._legacyagg_ property, true is enabled in order to keep this legacy behavior of the array_agg function. This property was removed in Presto 317 and there is no ability to roll back the legacy behavior. For Presto 317, the array_agg function doesn't ignore NULL values and returns an array including NULL values. ``` presto> SELECT id, array_agg(value) FROM ( VALUES (1, null), (1, 0) ) AS t(id, value) GROUP BY 1; id | _col1 ----+----------- 1 | [null, 0] (1 row) presto> SELECT id, array_agg(value) FROM ( VALUES (1, null) ) AS t(id, value) GROUP BY 1; id | _col1 ----+-------- 1 | [null] (1 row) As a result, when Presto compares with a column that has arrays with NULL values, Presto returns the `ARRAY comparison not supported for arrays with null elements` message. ## 'XXX' must be an aggregate expression or appear in a GROUP BY clause Original code that causes the error: ```sql SELECT code FROM sample_datasets.www_access HAVING code IS NOT NULL; ``` Modified code that fixes the error: ```sql SELECT code FROM sample_datasets.www_access WHERE code IS NOT NULL; ``` If you use a HAVING clause as a WHERE clause, this error appears. ## Zero-length delimited identifier not allowed Original code that causes the error: ```sql SELECT 1 AS ""; ``` Modified code that fixes the error: ```sql SELECT 1 AS "A"; ``` ## Unexpected parameters (char(1)) for function codepoint. Expected: codepoint(varchar(1)) Original code that causes the error: ```sql SELECT codepoint(CAST('a' AS CHAR)); ``` Modified code that fixes the error: ```sql SELECT codepoint(CAST('a' AS VARCHAR(1))); ``` ## Invalid format: "XXX" is malformed / Invalid timezone: "XXX" is malformed Error handling has been improved in v317. When an invalid argument gets passed, the current Presto returns NULL, but the new Presto returns the error. Original code that causes the error: ```sql SELECT td_time_range(1567890000, 'yyyy-MM-dd','JST', NULL); ``` Modified code that fixes the error: ```sql SELECT td_time_range(1567890000, '2019-09-08', 'JST', NULL); ```