This article explains the different options for processing data within Treasure Data.
|You can perform most data processing tasks directly on the console. Go to https://console.treasure-data.com for more information.|
Table of Contents
Data Processing Options
Once the data is in, Treasure Data provides a wide range of data processing options.
Data Processing with Multiple Engines
Treasure Data allows users to issue jobs (see Job Management) to process the data. When you issue the jobs, you can specify which data processing engine to use. Currently, we’re supporting three different data processing engines:
1.) Heavy Lifting SQL (Hive)
Hive is a MapReduce-based SQL engine. This engine is really powerful when you do large data processing and heavy JOINs. Often used for ETL or sessionization.
2.) Scripted Processing (Pig)
Pig is a relational data-flow language built on top of the Hadoop platform. If you need the sort of complex data processing requiring multiple phases of data transformation, Pig would be recommended.
3.) Interactive SQL (Presto)
Presto provides low-latency SQL access to the dataset.
Treasure Data has a scheduler feature called Scheduled Jobs that supports periodic query execution. This allows you to launch hourly / daily / weekly / monthly jobs, WITHOUT having a cron daemon.
We take great care in distributing and operating our scheduler in order to achieve high availability. You can use any of the engines mentioned above for scheduled jobs.
Result Output is a feature to push Treasure Data’s query result into other systems, such as RDBMS (MySQL, PostgreSQL, RedShift), Google Spread Sheet, FTP, etc. By using this feature, you can integrate Treasure Data with your existing system instantly.
You can use Scheduled Jobs with Result Output, so that you can periodically launch Treasure Data jobs and write the result somewhere else.
Last modified: Jul 16 2015 18:05:23 UTC