Treasure Data offers several options for processing data. Data processing within the platform includes importing and exporting data, creating queries and running jobs, and managing workflows.

You can perform most data processing tasks directly on TD Console at

Importing Data

You can import data from an external data source by using a data connector to an external tool, or by uploading a CSV or TSV file into Treasure Data. You can transfer data in bulk or configure your account to import data incrementally. After you set up the data transfer, you can manage data input as a job.
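
The difference between a bulk and an incremental import can be sketched in a few lines of Python. This is only an illustration of the general idea, not how Treasure Data's data connectors are implemented: the `select_new_rows` helper, the `time` field, and the integer watermark are all hypothetical names chosen for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def select_new_rows(rows: List[Row], last_imported_at: int) -> Tuple[List[Row], int]:
    """Return the rows newer than the watermark, plus the updated watermark.

    Hypothetical helper: `time` is assumed to be a Unix timestamp on each row.
    """
    # A bulk import would take every row; an incremental import keeps only
    # rows that arrived after the previous run.
    new_rows = [row for row in rows if row["time"] > last_imported_at]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((row["time"] for row in new_rows), default=last_imported_at)
    return new_rows, new_watermark


source = [{"time": 100, "value": 1}, {"time": 200, "value": 2}, {"time": 300, "value": 3}]
first_batch, watermark = select_new_rows(source, last_imported_at=100)
second_batch, watermark = select_new_rows(source, last_imported_at=watermark)
```

Running the helper twice against the same source returns the two newer rows on the first pass and nothing on the second, which is the property that makes repeated incremental runs safe.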

Query Engine Options

Treasure Data allows users to issue queries through the API, JDBC/ODBC, the TD Console, scheduled queries, or our hosted workflow execution framework.

Each issued query is managed as a separate job. Treasure Data provides two main engines for processing data collected from both batch and streaming sources. For every query you issue, you can specify one of the following data processing engines:

  • Presto for ad hoc and shorter batch workloads. Presto provides low-latency SQL access to the data set.

  • Hive for large or complex batch workloads. Hive 2019.1 is a MapReduce-based SQL engine. Hive 2020.1 runs on Tez®, an application framework built on Hadoop YARN that can execute complex directed acyclic graphs of general data processing tasks.
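
Because the engine is chosen per query, it can help to think of each job as carrying its engine alongside its SQL. The following is a minimal sketch under stated assumptions: `QueryJob`, `build_job`, the `large_batch` flag, and the lowercase engine identifiers are hypothetical and are not part of any Treasure Data client library.

```python
from dataclasses import dataclass

# Assumption: engines are identified by these lowercase strings.
ENGINES = ("presto", "hive")


@dataclass(frozen=True)
class QueryJob:
    """Hypothetical description of one query job."""
    database: str
    sql: str
    engine: str


def build_job(database: str, sql: str, *, large_batch: bool = False) -> QueryJob:
    """Pick Hive for large or complex batch work, Presto otherwise."""
    engine = "hive" if large_batch else "presto"
    return QueryJob(database=database, sql=sql, engine=engine)


ad_hoc = build_job("sample_db", "SELECT COUNT(1) FROM events")
nightly = build_job("sample_db", "SELECT user_id, COUNT(1) FROM events GROUP BY user_id", large_batch=True)
```

Here `ad_hoc` is routed to Presto and `nightly` to Hive, mirroring the guidance in the list above.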

Scheduling Jobs

Treasure Data has a scheduler feature called Scheduled Jobs that supports periodic query execution. This allows you to launch hourly, daily, weekly, or monthly jobs, without having to use a cron daemon.

We carefully configure our scheduler to ensure high availability. You can use any of the engines listed in the preceding section for scheduled jobs.
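
The four periods mentioned above correspond to well-known five-field cron expressions. The sketch below maps each period to its standard cron form. Whether the Scheduled Jobs feature accepts exactly these strings is an assumption, and `cron_for` is a hypothetical helper for illustration only.

```python
# Standard five-field cron expressions: minute hour day-of-month month day-of-week.
CRON_BY_PERIOD = {
    "hourly": "0 * * * *",   # at minute 0 of every hour
    "daily": "0 0 * * *",    # at 00:00 every day
    "weekly": "0 0 * * 0",   # at 00:00 every Sunday
    "monthly": "0 0 1 * *",  # at 00:00 on the first day of each month
}


def cron_for(period: str) -> str:
    """Return the cron expression for a named period (case-insensitive)."""
    try:
        return CRON_BY_PERIOD[period.lower()]
    except KeyError:
        raise ValueError(f"unsupported period: {period!r}") from None


daily_schedule = cron_for("daily")
```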

Exporting Results

Export Results pushes Treasure Data query results into other systems, such as MySQL, PostgreSQL, Redshift, Google Sheets, and FTP. By using this feature, you can integrate Treasure Data with your existing systems.

Use a query to select the data you want to export. Write your query, make sure the Export Results checkbox is selected, then save and run the query. For ongoing management of the exported results, use the jobs area of the TD Console.

Automating Jobs

You can use TD Workflow to manage and automate jobs, perform incremental processing, and segment data.
