
Data Processing

This article explains the different options for processing data within Treasure Data.

You can perform most data processing tasks directly on our console at https://console.treasure-data.com.


Data Processing Options

Treasure Data provides two main engines for processing data collected from both batch and streaming sources:

  • Presto for ad-hoc and shorter batch workloads
  • Hive for large or complex batch workloads

Data Processing with Multiple Engines

Treasure Data allows users to issue queries from the API, JDBC/ODBC, the web console, scheduled queries, or our hosted workflow execution framework.

Each issued query is managed as a separate job (see Job Management). For every query you issue, you can specify which data processing engine to use. Two data processing engines are currently supported:

1.) Heavy Lifting SQL (Hive)

Hive is a MapReduce-based SQL engine. It is well suited to large data processing jobs and heavy JOINs, and is often used for ETL or sessionization.

2.) Interactive SQL (Presto)

Presto provides low-latency SQL access to the dataset.
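
As a rough illustration of choosing an engine per query, the sketch below builds a job request in Python. The helper names (`choose_engine`, `build_job_request`), the request fields, and the selection rule are assumptions made for this example; they are not part of the Treasure Data API.

```python
# Engine names match the two engines described above. The selection rule
# is an illustrative assumption, not Treasure Data's own logic.
HIVE = "hive"
PRESTO = "presto"


def choose_engine(large_or_complex: bool) -> str:
    """Pick Hive for large or complex batch work, Presto otherwise."""
    return HIVE if large_or_complex else PRESTO


def build_job_request(database: str, sql: str, large_or_complex: bool) -> dict:
    """Build a hypothetical job request that names the engine explicitly."""
    return {
        "database": database,
        "query": sql,
        "engine": choose_engine(large_or_complex),
    }


# An ad-hoc count goes to Presto; a sessionization ETL job goes to Hive.
adhoc = build_job_request("web_logs", "SELECT COUNT(1) FROM access", False)
etl = build_job_request("web_logs", "SELECT ... heavy JOIN ...", True)
print(adhoc["engine"], etl["engine"])
```

Keeping the engine as an explicit field of each request mirrors the point above: the engine is chosen per query, not per account.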

Scheduled Jobs

Treasure Data has a scheduler feature called Scheduled Jobs that supports periodic query execution. This allows you to launch hourly, daily, weekly, or monthly jobs without running a cron daemon yourself.

We take great care in distributing and operating our scheduler in order to achieve high availability. You can use any of the engines mentioned above for scheduled jobs.
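
To make the hourly, daily, weekly, and monthly frequencies concrete, the sketch below maps each one to a standard five-field cron expression and bundles it with a query. It assumes the scheduler accepts cron-style expressions. The `build_schedule` helper and its field names are hypothetical.

```python
# Standard five-field cron expressions:
# minute, hour, day of month, month, day of week.
CRON_BY_FREQUENCY = {
    "hourly": "0 * * * *",    # at minute 0 of every hour
    "daily": "0 0 * * *",     # at 00:00 every day
    "weekly": "0 0 * * 0",    # at 00:00 every Sunday
    "monthly": "0 0 1 * *",   # at 00:00 on the first day of each month
}


def build_schedule(name: str, frequency: str, engine: str, sql: str) -> dict:
    """Describe a scheduled job (hypothetical structure, for illustration)."""
    if frequency not in CRON_BY_FREQUENCY:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    return {
        "name": name,
        "cron": CRON_BY_FREQUENCY[frequency],
        "engine": engine,  # either engine can be used for scheduled jobs
        "query": sql,
    }


schedule = build_schedule(
    "daily_summary", "daily", "presto", "SELECT COUNT(1) FROM access"
)
print(schedule["cron"])
```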

Input Transfers

Input transfers move data from a data source into Treasure Data. You can transfer data in bulk or configure your account to import data incrementally.

After establishing your data transfer, you can manage data input as a job.

Result Output

Result Output pushes Treasure Data query results into other systems, such as an RDBMS (MySQL, PostgreSQL, Redshift), Google Spreadsheet, or FTP. This feature lets you integrate Treasure Data with your existing systems.

You can combine Scheduled Jobs with Result Output to launch Treasure Data jobs periodically and write the results to an external system.

You can find the Output Results option as a checkbox in the Query Editor. Use the option to set up an export of query results.
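
One common way to describe an export destination is a connection URL. The sketch below assembles one for an RDBMS target. The `scheme://user:password@host/database/table` layout, the `build_result_url` helper, and the sample host and table names are assumptions for illustration. Check the documentation for your specific destination for the exact format it expects.

```python
from urllib.parse import quote


def build_result_url(scheme: str, user: str, password: str,
                     host: str, database: str, table: str) -> str:
    """Assemble a destination URL (assumed layout, for illustration only).

    Credentials and names are percent-encoded so that characters such as
    '@' or '/' in a password cannot break the URL structure.
    """
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{quote(database, safe='')}/{quote(table, safe='')}"
    )


url = build_result_url(
    "mysql", "etl", "p@ss", "db.example.com", "analytics", "daily_summary"
)
print(url)
```

A URL like this could be stored alongside a scheduled job so that each run writes its results to the same destination.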


Last modified: Apr 16 2018 19:03:11 UTC
