Data Partitioning in Treasure Data

Data stored in Treasure Data is partitioned by the timestamp in its time column, generally into one-hour partitions.

By constraining the time column in a query, you avoid processing the entire data set and process only the partitions that overlap the requested time range. Partitioning enables good performance, efficient data management, and increased availability.

In the following examples, only records that fit the specified time range are selected.

--example 1:
SELECT  
  ... 
WHERE 
  TD_TIME_RANGE(time, '2013-01-01', NULL, 'PDT');

--example 2:
SELECT 
  COUNT(1) 
FROM 
  table_name
WHERE 
  TD_TIME_RANGE(time, '2017-07-01', '2017-07-02', 'UTC');
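
As a rough sketch of what constraining the time column means, example 2 could also be written as a direct comparison against Unix timestamps. This assumes the time column holds epoch seconds in UTC, which is the usual case for Treasure Data tables. The two literals are the epoch seconds for 2017-07-01 00:00:00 UTC and 2017-07-02 00:00:00 UTC, and table_name is the same placeholder used above.

```sql
-- Sketch: same window as example 2, expressed directly on the time column.
-- Assumes time stores Unix epoch seconds (UTC).
SELECT
  COUNT(1)
FROM
  table_name
WHERE
  time >= 1498867200      -- 2017-07-01 00:00:00 UTC (inclusive)
  AND time < 1498953600;  -- 2017-07-02 00:00:00 UTC (exclusive)
```

TD_TIME_RANGE treats the start as inclusive and the end as exclusive, so the half-open comparison above selects the same records.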

Treasure Data provides real-time storage and archive storage for customer data. Data imported through a streaming import path, such as td-agent or the JS/Mobile SDKs, is initially stored in real-time storage and is later moved into archive storage. Data connectors and the bulk import API load data directly into archive storage.

User-Defined Partitioning

User-defined partitioning is an alternative to timestamp-based partitioning. User-defined partitioning allows other data partitioning strategies that can improve performance when working with non-time-series data. For more information, see Defining Partitioning for Presto.

For examples of how to use time-based partitioning in Treasure Data, refer to: