Architecture Overview

Table of Contents


Treasure Data is the live data management platform to make your data connected, current and easily accessible. Import your data instantly, and process it by issuing a SQL-style query which runs in parallel on the cloud. Focus 100% on your data analytics, and never worry about server scaling, storage and infrastructure ever again.

Why We Started Our Business?

Despite the increasing importance of data warehousing and analytics, most solutions today remain luxuries available only to large businesses. We strive to bring data management and analytics to the masses by leveraging open source software.


Overabundance of data is now a ubiquitous problem across all industries. Due to the proliferation of smart phones, social media, and device sensors, data is increasing exponentially both in terms of volume and velocity. Businesses today are trying to find valuable insights from their data. However, managing very large amounts of variety data has proven to be a challenge for many companies.

This is known as the Big Data Problem and it is three-fold:

  • Resource Problem: Strong data engineers are in high demand.
  • Economic Problem: Existing data warehouses are expensive. The operational overhead to manage large-scale clusters is a major concern for management.
  • Architectural Problem: Changing the existing infrastructure is difficult.

Our service is designed to solve these problems.

What We Provide

Treasure Data is the live data management platform to make your data connected, current and easily accessible. The major advantages are:

  • Instant setup.
  • Infinite cloud storage capacity, with high availability.
  • Easy to use, and does not require deep knowledge of “Big Data” technologies. You focus on your data and analytics instead of cluster management.
  • Elastic resource allocation that can be scaled up or down at any time.
  • Connectivity to your existing data sources and analytics applications without architectural changes.
  • Significantly Lower Costs of cloud storage than traditional data warehousing solutions. Our monthly subscription model lowers initial investment requirements.

Architecture Diagram

This is Treasure Data’s architecture.

Key features include:

1) Built-in Data Collector

Treasure Data provides a data collection daemon, Treasure Agent, which is installed in your existing infrastructure. Treasure Agent collects records from various data sources and continuously uploads the data to Treasure Data’s cloud storage.

2) Distributed Columnar Storage

All your data is stored in the cloud as columnar data. This format achieves far better performance and compression compared to existing RDBMSs. We also believe that your data is yours: you can bulk-export your data at any time.

3) SQL-Style Query with Parallel Execution

Treasure Data lets you analyze your data using a SQL-style language. The query is automatically executed in parallel at the cloud-hosted Hadoop cluster, so the users are able to handle large datasets without having to learn anything new. Also, our engine is schema-free. You can modify your data schema at any time.

4) Export to RDBMS/Traditional Warehouse

A built-in export capability is provided for writing summarized data from the Treasure Data store to a traditional RDBMs or Data Warehouse. This enables efficient processing of large data volumes with Treasure Data both as a primary analytics engine as well as a preprocessing platform.

5) 3rd Party BI/ETL Tools Connectivity

Treasure Data also provides a standard JDBC interface for data transfer to existing (or future) BI tools. The need for custom coding and maintenance to link these environments with the primary data store is eliminated.

6) Curation and Support

We actively curate each piece of technology we use. We ensure that you have the latest in software (e.g. Apache Hadoop) and best practices, and that it’s all stable and integrated smoothly. Our support team is dedicated to helping our users.

Use Cases

Our users hail from a wide range of industries, including digital marketing, social media, social games, finance, e-commerce, etc. Some real use cases include:

Name Description
Gaming Better user acquisition, retention, and revenue.
e-Commece From understanding your customers, to build own recommendation engines.
A/B Testing Comparing feature implementations between different sets of users.
Marketing Attribution Which contents work better in the customer journey?
Telematics Analyze and make sense of data coming from automotives.
Weblog Analysis Calculating daily UU and PV numbers.
Twitter Analysis Who gets the most ReTweets for topic X?

What’s Next?

If you need further information, please consult our resources page.

If you don’t have an account yet, please proceed to the quickstart guide.

Last modified: Nov 29 2016 21:50:52 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.