Data Processing

This article explains the different options for processing data within Treasure Data.

You can perform most data processing tasks directly in the Treasure Data console at https://console.treasure-data.com.

Data Processing Options

Once your data is imported, Treasure Data offers several ways to process it.

Data Processing with Multiple Engines

Treasure Data lets you issue jobs (see Job Management) to process your data. When you issue a job, you can specify which data processing engine it should use. Two engines are currently supported:

1.) Heavy Lifting SQL (Hive)

Hive is a MapReduce-based SQL engine. It is well suited to large-scale batch processing and heavy JOINs, and it is often used for ETL and sessionization.

2.) Interactive SQL (Presto)

Presto is a distributed SQL engine that provides low-latency, interactive SQL access to your data, which makes it a good fit for ad hoc queries.
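As a minimal sketch of choosing an engine per job, the Python below assumes the td-client-python package (`tdclient`), whose `Client.query(db_name, q, type=...)` method takes the engine name (`"hive"` or `"presto"`) as its `type` argument; check that package's documentation before relying on it. The `issue_job` helper and `SUPPORTED_ENGINES` tuple are hypothetical names for illustration. The helper accepts any object with a compatible `query` method, so it can be exercised without network access.

```python
from typing import Any, Protocol

# The two engines described above, spelled as td-client-python's `type` values
# (assumption: verify against the tdclient documentation).
SUPPORTED_ENGINES = ("hive", "presto")


class QueryClient(Protocol):
    """Any object exposing a td-client-style query(db_name, q, type=...) method."""

    def query(self, db_name: str, q: str, type: str = "hive") -> Any: ...


def issue_job(client: QueryClient, database: str, sql: str, engine: str) -> Any:
    """Hypothetical helper: validate the engine name, then issue `sql` on that engine."""
    engine = engine.lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {SUPPORTED_ENGINES}")
    # The chosen engine is passed through as the `type` keyword argument.
    return client.query(database, sql, type=engine)


# Usage with td-client-python (requires the package and a real API key):
#   import tdclient
#   with tdclient.Client(apikey="YOUR_API_KEY") as td:
#       job = issue_job(td, "sample_datasets", "SELECT COUNT(1) FROM www_access", "hive")
#       job.wait()
#       for row in job.result():
#           print(row)
```

Validating the engine name locally catches a typo before any request is made; the engine itself is still chosen per job, as described above.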

Scheduled Jobs

Treasure Data has a scheduler feature called Scheduled Jobs that runs queries periodically. It lets you launch hourly, daily, weekly, or monthly jobs without running your own cron daemon.

We run the scheduler as a distributed service and operate it carefully to keep it highly available. Scheduled jobs can use either of the engines described above.
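To illustrate what the scheduler computes on your behalf, here is a minimal sketch that finds the next run time for each frequency. It assumes the boundaries of the standard cron shorthands (`@hourly`, `@daily`, `@weekly`, `@monthly`) evaluated on naive UTC timestamps; whether Treasure Data's scheduler uses exactly these boundaries is an assumption, and `next_run` and `CRON_EQUIVALENTS` are hypothetical names, not part of any Treasure Data API.

```python
from datetime import datetime, timedelta

# Standard cron expressions for each frequency, for reference.
# Assumption: the scheduler's boundaries match these shorthands.
CRON_EQUIVALENTS = {
    "hourly": "0 * * * *",   # top of every hour
    "daily": "0 0 * * *",    # midnight every day
    "weekly": "0 0 * * 0",   # midnight every Sunday
    "monthly": "0 0 1 * *",  # midnight on the 1st of each month
}


def next_run(now: datetime, frequency: str) -> datetime:
    """Return the first scheduled time strictly after `now` (naive UTC)."""
    if frequency not in CRON_EQUIVALENTS:
        raise ValueError(f"unsupported frequency: {frequency!r}")

    if frequency == "hourly":
        top_of_hour = now.replace(minute=0, second=0, microsecond=0)
        return top_of_hour + timedelta(hours=1)

    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if frequency == "daily":
        return midnight + timedelta(days=1)
    if frequency == "weekly":
        # datetime.weekday(): Monday == 0 ... Sunday == 6.
        # On a Sunday, today's midnight is not after `now`, so jump a full week.
        days_ahead = (6 - now.weekday()) or 7
        return midnight + timedelta(days=days_ahead)

    # monthly: midnight on the 1st of the following month
    if now.month == 12:
        return midnight.replace(year=now.year + 1, month=1, day=1)
    return midnight.replace(month=now.month + 1, day=1)
```

For example, `next_run(datetime(2017, 2, 10, 8, 32, 35), "weekly")` returns `datetime(2017, 2, 12, 0, 0)`, the following Sunday at midnight.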

Result Output

Result Output is a feature that pushes Treasure Data query results into other systems, such as relational databases (MySQL, PostgreSQL, Amazon Redshift), Google Sheets, and FTP servers. With it, you can integrate Treasure Data with your existing systems without building a separate export step.
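The core idea of Result Output, writing query-result rows into a table in another system, can be sketched with Python's built-in `sqlite3` module standing in for a MySQL or PostgreSQL target. This is not how Treasure Data implements the feature: the `push_result` function and its `append`/`replace` modes are hypothetical, chosen only to show two ways a target table might be updated.

```python
import sqlite3
from typing import Iterable, Sequence


def push_result(conn: sqlite3.Connection, table: str, columns: Sequence[str],
                rows: Iterable[Sequence[object]], mode: str = "append") -> int:
    """Hypothetical stand-in for Result Output; sqlite3 replaces a real RDBMS target.

    mode="append" adds rows to any existing data; mode="replace" drops and
    recreates the table first. Returns the table's row count afterwards.
    """
    if mode not in ("append", "replace"):
        raise ValueError(f"unsupported mode: {mode!r}")
    # Identifiers are interpolated, not bound, so they must come from trusted code.
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commits on success, rolls back uncommitted changes on error
        if mode == "replace":
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
        conn.executemany(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows
        )
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

Appending the same two rows twice leaves four rows in the target; a later `replace` with one row leaves exactly one.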

You can combine Scheduled Jobs with Result Output to launch Treasure Data jobs periodically and write each result to another system.


Last modified: Feb 10 2017 08:32:35 UTC