About Queries
Copy for LLM
Copy page as Markdown for LLMs
View as Markdown
Open this page as Markdown
Open in ChatGPT
Get insights from ChatGPT
Open in Claude
Get insights from Claude
Connect to Cursor
Install MCP server on Cursor
Connect to VS Code
Install MCP server on VS Code

Treasure Data uses two query engines that use SQL commands as well as a pre-defined query interface that allows the user to filter data using company-defined attribute and behavior filters. All of this can be found in the TD Console. The two query engines are Trino and Hive and the query editor for those are found in the Data Workbench under Queries. The pre-defined query interface is found in the Audience Studio under Master Segments.

Queries in Treasure Data

In Treasure Data, queries are used to give you regular insights in to what your customer behavior is from the variety of first, second and third party data sources that your organization has access to. You can create regularly scheduled queries that will update specific information at regular intervals or you can create queries that provide moment-in-time snapshots of your customers.

You can execute queries from the following interfaces:

TD Console
API
JDBC/ODBC
Scheduled queries
Hosted workflow execution framework

Every query is managed as a job, and for each job you can specify a query engine.

Trino as a Query Engine

History about query engine names Trino and Presto

Prior to its rebranding in 2021, the Trino query engine was known as PrestoSQL or more simply Presto. Treasure Data supported several earlier versions of Presto. Newer versions of Presto are now supported under the Trino branding, and support for Trino 423 has now been introduced. For the immediate future, you will see both Presto and Trino used in Treasure Data documentation. New functionality or new features will be identified as being Trino; however, documentation for existing functions and features that are common to both Presto and Trino will continue to be identified as Presto.

Presto is designed for short interactive queries useful for data exploration. It uses the industry-standard ANSI SQL dialect.

It offers better real time results than Hive, but has high latency in processing large amounts of data. Additionally, it has some limits in memory capacity, so if the data you are querying exceeds that memory capacity, the query will fail. At the same time, it processes queries 10-30X faster than Hive.

So when would one use Trino?

You have a small to medium size query and you want immediate results for a specific ad hoc or exploratory query.

You can find more details about Trino in About Presto Query Engine.

Hive as a Query Engine

Hive is good for batch processing and regularly scheduled queries. It has a low query latency when it comes to larger amounts of data, but has a slower processing time so it’s not as good for smaller ad hoc and exploratory queries. Hive is good for queries where it’s important that it has a 99.9% chance of running all the way to completion because it has a higher fault tolerance and will not need the entire query to be rerun if an error results. Hive uses a dialect of SQL language called HiveQL. HiveQL has a number of good features especially for the use cases of querying a large number of tables, but doesn’t have as complete and familiar a feature set as Trino.

So when would one use Hive?

You have a large size query for which you don’t need immediate results, as well as you have regularly scheduled queries.

You can find more details about Hive in About Hive Query Engine.

Characteristics and Use Cases of Trino and Hive

Trino: Trino is designed for fast, interactive queries. Trino is optimized for latency. Its execution model is referred to as push-model.
Hive: Designed for querying of large data systems in the open-source Hadoop platform. Hive 2023.1 converts SQL-like queries into Tez jobs for easy execution and processing of extremely large volumes of data. Hive is optimized for query throughput. Its execution model is refered to as pull-model.

	Trino	Hive
Optimized for	Interactivity	Throughput
SQL Standardized fidelity	Designed to comply with ANSI SQL	HiveQL (a subset of common data warehousing SQL)
Large JOINs	Optimized for star scheme joins (1 large fact table and many smaller dimension tables)	Very good for large fact-to-fact joins
Maximum Query Length	1MB	No Hard Limit

About QueriesCopyCopy for LLMCopy page as Markdown for LLMsView as MarkdownOpen this page as MarkdownOpen in ChatGPTGet insights from ChatGPTOpen in ClaudeGet insights from ClaudeConnect to CursorInstall MCP server on CursorConnect to VS CodeInstall MCP server on VS Code

Queries in Treasure Data

Trino as a Query Engine

Hive as a Query Engine

Characteristics and Use Cases of Trino and Hive

Was this helpful?

About Queries
Copy for LLM
Copy page as Markdown for LLMs
View as Markdown
Open this page as Markdown
Open in ChatGPT
Get insights from ChatGPT
Open in Claude
Get insights from Claude
Connect to Cursor
Install MCP server on Cursor
Connect to VS Code
Install MCP server on VS Code