You can parallelize the query result output process by using the CREATE TABLE AS SELECT statement. If you DROP the table before running the query, your performance is significantly better. The result output performance will be 5x faster than running SELECT *. Treasure Data Presto skips the JSON output process and directly produces a 1-hour partitioned table.
Without using DROP TABLE, Presto uses JSON text to materialize query results. And if the result table contains 100GB of data, the coordinator transfers more than 100GB of JSON text to save the query result. So, even if the query computation is almost finished, output of theJSON results takes a long time.
To clean up the result table beforehand:
Add a DROP TABLE statement at the top of your query.
Use CREATE TABLE (table) AS SELECT …
For example, your query might look like this: