Trino Performance Tuning
Copy for LLM
Copy page as Markdown for LLMs
View as Markdown
Open this page as Markdown
Open in ChatGPT
Get insights from ChatGPT
Open in Claude
Get insights from Claude
Connect to Cursor
Install MCP server on Cursor
Connect to VS Code
Install MCP server on VS Code

Review the following information to learn how to tune Trino for the best performance. These performance tuning tips assume the following level of experience:

Basic knowledge of Treasure Data.
Basic knowledge of Trino query engine. Learn how to write Trino Queries.

Tuning Tips

Review these tips and make adjustments to increase your Trino queries performance.

Always use TD_TIME_RANGE or TD_INTERVAL
Avoid using slow memory consuming operators such as
- ORDER BY
- COUNT(DISTINCT x)
  - Consider using approximate aggregate function approx_distinct(x) instead.
Join processing
- Tables should be joined in the order of a larger table to smaller tables.
- Using a non-equi join condition slows down the query processing.
Columnar storage characteristics
- Choosing too many columns slows down query processing.
Query result size
- Generating too many rows takes time. Consider writing result records to a table with CREATE TABLE AS... or INSERT INTO.

Specifying the Columns you Need

Actual data in Treasure Data is stored as a columnar storage format, which is optimized for the query using only specific columns. Restricting accessed columns can improve your query performance significantly. Specify only needed columns instead of using a wildcard (*).

[GOOD]: SELECT time,user,host FROM tbl
[BAD]:  SELECT * FROM tbl

Leveraging Time-Based Partitioning

All imported data is automatically partitioned based on the time field within each data record.

By specifying the time range in your query, you can avoid reading unnecessary data and can thus speed up your query significantly.

Specify Time as an Integer

When the time field is specified within a WHERE clause, the query parser automatically detects which partitions should be processed. This auto-detection does not work if you specify the time with float instead of int.

[GOOD]: SELECT field1, field2, field3 FROM tbl WHERE time > 1349393020
[GOOD]: SELECT field1, field2, field3 FROM tbl WHERE time > 1349393020 + 3600
[GOOD]: SELECT field1, field2, field3 FROM tbl WHERE time > 1349393020 - 3600
[BAD]:  SELECT field1, field2, field3 FROM tbl WHERE time > 13493930200 / 10
[BAD]:  SELECT field1, field2, field3 FROM tbl WHERE time > 1349393020.00
[BAD]:  SELECT field1, field2, field3 FROM tbl WHERE time BETWEEN 1349392000 AND 1349394000
[BAD]:  SELECT field1, field2, field3 FROM tbl WHERE time > (SELECT MAX(last_updated) FROM tbl2)

Use TD-TIME-RANGE

You can use TD_TIME_RANGE to partition data.

[GOOD]: SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01', 'UTC')
[GOOD]: SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01', NULL, 'UTC')
[GOOD]: SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01',
                                       TD_TIME_ADD('2013-01-01', '1day', 'UTC'))

However, if you use division in TD_TIME_RANGE, the time partition optimization doesn’t work. For example, Treasure Data discourages the SQL constructions as they disable optimization.

Poorly optimized SQL

SELECT ... WHERE TD_TIME_RANGE(time, TD_SCHEDULED_TIME() / 86400 * 86400))

Poorly optimized SQL

SELECT ... WHERE TD_TIME_RANGE(time, 1356998401 / 86400 * 86400))

The TD_INTERVAL user defined function is also available for partitioning. TD_INTERVAL provides an intuitive way to specify the time range. For example, to select the last 7 days:

SELECT ... WHERE TD_INTERVAL(time, '-7d')

Considering the Cardinality within the GROUP BY Clause

It's possible to improve the performance of the GROUP BY function by carefully ordering a list of fields within the GROUP BY in an order of high cardinality.

SQL Example
Good	`SELECT GROUP BY uid, gender`
Bad	`SELECT GROUP BY gender, uid`

Or, use numbers instead of strings for the GROUP BY column, because numbers require less memory and are faster to compare than strings.

Using LIMIT with ORDER BY

ORDER BY requires that all rows be sent to a single worker which then sorts them. ORDER BY might often require a lot of memory on a Trino worker. When you want to look up the top or bottom N records, use LIMIT which can reduce the sort cost and memory pressure.

SQL Example
Good	`SELECT * FROM tbl ORDER BY time LIMIT 100`
Bad	`SELECT * FROM tbl ORDER BY time`

Using Approximate Aggregate Functions

Trino has a couple of approximate aggregation functions, that give you significant performance improvements, but with some errors. For example, by using approx_distinct() function, you can get an approximation of COUNT(DISTINCT x)with a standard error of 2.3%. The following example gives an approximate count of the previous day’s unique users.

SELECT
  approx_distinct(user_id)
FROM
  access
WHERE
  TD_TIME_RANGE(time,
    TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d', 'PDT'),
    TD_SCHEDULED_TIME())

Aggregating a Series of LIKE Clauses in One Single regexp_like Clause

Trino’s query optimizer is unable to improve queries that contain many LIKE clauses. As a consequence, the query execution might be slower than expected.

To improve the performance, you can substitute a series of LIKE clauses that are chained with the OR condition with a single regexp_like clause, which is Trino native.

For example:

SELECT
  ...
FROM
  access
WHERE
  method LIKE '%GET%' OR
  method LIKE '%POST%' OR
  method LIKE '%PUT%' OR
  method LIKE '%DELETE%'

can be optimized by replacing the four LIKE clauses with a single regexp_like clause:

SELECT
  ...
FROM
  access
WHERE
  regexp_like(method, 'GET|POST|PUT|DELETE')

Choosing Trino Join and Sort Algorithms

Join Algorithms

There are two types of join distributions:

Partitioned: each node participating in the query builds a hash table from a fraction of the data
Broadcast: each node participating in the query builds a hash table from all of the data (data is replicated to each node)

The default join algorithm used in Trino is distributed PARTITION join. This algorithm partitions both the left and right tables using the hash values of join keys. Partitioned join uses multiple worker nodes for processing queries; it is generally faster and takes less memory.

In some cases where one of the join tables is very small, the overhead of partitioning data through the network using distributed PARTITION join might exceed the benefit of broadcasting the whole table to all the nodes participating in the join operation. In those cases, 'BROADCAST' join might perform better. If you use the BROADCAST join, specify the large table first in the join clause. 'BROADCAST' join can be enabled specifying the following magic comment.

-- set session join_distribution_type = 'BROADCAST'

This option will use more memory as the right join table will be copied to all nodes.

Sort Algorithms

For queries with ORDER BY clause, Trino uses Distributed sort by default; the sorting operation runs in parallel on multiple nodes, and a single node merges the final results. However, there might be cases where a single node sort operation performs better. For such cases, the distributed sort can be disabled by using the following magic comment.

-- set distributed_sort=false

Disabling distributed sort will use more memory of a single node; the query might fail due to the maximum memory limit.

Defining Partitioning for Trino

All tables in Treasure Data are partitioned based on the time column. This is why queries that use TD_TIME_RANGE or similar predicates on the time column are efficient in Treasure Data. Trino can eliminate partitions that fall outside the specified time range without reading them.

User-defined partitioning (UDP) provides hash partitioning for a table on one or more columns in addition to the time range partitioning. A query that filters on the set of columns used as user-defined partitioning keys can be more efficient because Trino can skip scanning partitions that have matching values on that set of columns.

Refer to User-defined partitioning for the details of UDP.

Managing Resources for Trino Queries

Use resource pools
Run more concurrent queries
Use more CPU resources
Use more memory (limited)

Magic Comments

Magic comments are instructions inserted into Trino SQL queries and given by the database driver to override the database optimization that occurs with standard Trino queries. The database driver follows certain rules for adding the magic comments to the database query and the magic comments override any execution plan the query optimizer might select for a query.

Use magic comments in cases when you know more about the database content and structure than the query optimizer. It can be more efficient to use magic comments to improve execution time.

Magic Comment Parsing Decimal Literals as Double

A Trino session-level property that controls whether the decimal literals (ex. 1.2) are parsed as decimal or double.

The parse_decimal_literals_as_double session property is supported as a magic comment.

Decimal literals without an explicit type specifier (e.g. 1.2) are treated as the values of the DOUBLE type by default.

This magic comment can disable the behavior and use decimal literals as the values of the DECIMAL type. For example:

-- set session parse_decimal_literals_as_double = 'false'
SELECT * FROM large_table, small_table
WHERE small_table.id = large_table.id

Magic Comment Distributed Sort

Trino session property that when enabled allows the sort operator to execute in parallel on multiple nodes in the cluster.

Distributed sort allows sorting of data, which exceeds query.max-memory-per-node. Distributed sort is enabled through the distributed_sort session property, or distributed-sort configuration property set in etc/config.properties of the coordinator. Distributed sort is enabled by default.

When distributed sort is enabled, the sort operator executes in parallel on multiple nodes in the cluster. Partially sorted data from each worker node is then streamed to a single worker node for a final merge. This technique allows to utilization of memory of multiple worker nodes for sorting. The primary purpose of the distributed sort is to allow for the sorting of data sets that don’t normally fit into single node memory. Performance improvement can be expected, but it won’t scale linearly with the number of nodes, since the data needs to be merged by a single node.

-- set session distributed_sort = 'true'
SELECT * FROM large_table, small_table
WHERE small_table.id = large_table.id

Magic Comment Time Partitioning Range

A TD-specific setting to customize partition size for INSERT/CTAS queries.

The time_partitioning_range session property is supported as a magic comment.

-- set session time_partitioning_range = 'value'

The value can be:

none
- no time partitioning
number with unit
- allowed units are h, d, mo, q, y (hour, day, month, quarter, year). For example, 12h, 2d, 1mo, 1q, 1y

Magic Comment Limiting Execution Time

If you want to override any other limits for a single query, you can apply the following query hint at the beginning of the query:

-- set session query_max_execution_time = '2h'
SELECT
  COUNT(*)
FROM
  ALL_EVENTS
;

If limits are set in your account, and you write a query with a hint, then the smallest limit between account, resource-group, query-hint, and global is used. Supported time measures are h (hours), m (minutes), s (seconds).

If the limit specified in the query hint is longer than the configured limit, then the shorter configured limit still applies. For example, if a 1-hour limit is configured for ad hoc queries, then you can use the magic comment to limit a query to 5 minutes (5m) but not to increase the limit to 2 hours (2h).

Info

All Trino queries are limited to 6 hours runtime, regardless of any account-level policy or query hint.

Magic Comment to improve performance for result output

To improve the job performance of queries that produce massive results, result_output_redirect is used to enhance the performance of Trino jobs. By default, result_output_redirect is set to 'true'.

-- set session result_output_redirect='true'

This impacts performance even though the query results are ready; TD Trino is waiting for the worker node to complete its sequential operations. To mitigate this, Treasure Data now uses result_output_direct, which redirects the query result to S3 in parallel, thereby improving the performance of the queries.

LIMITATION: There are SQL queries where Hints do not improve performance, including:

When a query includes ORDER BY
When a query is CREATE TABLE or INSERT TABLE
When a query includes uppercase letters
When a query has duplicated column names
When a query has NULL without explicit type

Trino Join Performance Improvement with Equi-Joins

Sometimes you can use equi-joins to improve performance of the joins in your Trino queries. The equi-join concatenates tables by comparing join keys using the equal (=) operator.

If this comparison becomes complex, the join processing slows down.

For example, if you want to join two tables with date string, ‘2015-10-01’, but one of the tables only has columns for year, month, and day values, you can write the following query to generate date strings:

SELECT a.date, b.name FROM
left_table a
JOIN right_table b
ON a.date = CAST((b.year * 10000 + b.month * 100 + b.day) as VARCHAR)

This query delays the join processing because the join condition involves several expressions.

To make the example query faster, you can push this condition down into a subquery to prepare a join key beforehand:

SELECT a.date, b.name
FROM
  left_table a
JOIN (
  SELECT
    CAST((b.year * 10000 + b.month * 100 + b.day) as VARCHAR) date,
# generate join key
    name
  FROM right_table
) b
ON a.date = b.date  # Simple equi-join

In this example, join keys are a.date and b.date str columns. Comparing VARCHAR strings is much faster than comparing VARCHAR and expression results.

Trino Query Simplification Options

The following are some of the ways that you can simplify your Trino queries.

Use WITH statement

If your query becomes complex or deeply nested, try to extract subqueries using WITH clause. For example, the following query that has a nested subquery:

SELECT a, b, c FROM (
   SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a
) tbl_alias

can be rewritten as follows:

WITH tbl_alias AS (SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a)
SELECT a, b, c FROM tbl_alias

You can also enumerate multiple sub-queries in WITH clause, by using a comma:

WITH tbl1 AS (SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a),
     tbl2 AS (SELECT a, AVG(d) AS d FROM another_tbl GROUP BY a)
SELECT tbl1.*, tbl2.* FROM tbl1 JOIN tbl2 ON tbl1.a = tbl2.a

Use WITH statement in CREATE TABLE statement

If your CREATE TABLE query becomes complex or deeply nested, try to extract subqueries using WITH clause. For example, one sub query can be rewritten as follow:

CREATE TABLE tbl_new AS WITH tbl_alias AS (SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl1)
SELECT a, b, c FROM tbl_alias

You can also enumerate multiple sub-queries in WITH clause as in the following:

CREATE TABLE tbl_new AS WITH tbl_alias1 AS (SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl1),
                             tbl_alias2 AS (SELECT a, AVG(d) AS d FROM tbl2)
SELECT tbl_alias1.*, tbl2_alias.* FROM tbl_alias1 JOIN tbl_alias2 ON tbl_alias1.a = tbl_alias2.a

Specify GROUP BY targets with numbers

Group by clause requires having the same expression in the SELECT statement:

SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH', 'PDT') hour, count(*) cnt
FROM my_table
GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd HH', 'PDT')  # <-- redundant expression

You can simplify this query by using GROUP BY 1, 2, …:

SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH', 'PDT') hour, count(*) cnt
FROM my_table
GROUP BY 1

These numbers correspond to the column indexes (1-origin) of the SELECT statement.

Trino Performance TuningCopyCopy for LLMCopy page as Markdown for LLMsView as MarkdownOpen this page as MarkdownOpen in ChatGPTGet insights from ChatGPTOpen in ClaudeGet insights from ClaudeConnect to CursorInstall MCP server on CursorConnect to VS CodeInstall MCP server on VS Code