Visit our new documentation site! This documentation page is no longer updated.

Architecture Overview

Treasure Data offers a cloud analytics platform for customer data and for management of data for the Internet of Things (IoT). Our analytics platform provides continuous, end-to-end data integration across your data pipeline through easy-to-use interfaces.

The Challenge

Overabundance of data is now a ubiquitous problem across all departments and industries. Due to the proliferation of smartphones, social media, and device sensors, data is increasing exponentially in volume, velocity, and variety.

Businesses today are trying to find valuable insights from their data, but face challenges:

  • Siloed customer data. For marketing and product teams to deliver optimized user experiences, companies must have a single, unified view of each customer.
  • Scale of IoT data. To leverage data from IoT devices, IT organizations must manage data at scale and collect and analyze it reliably.
  • Timely delivery. Data must reach the right person or system in time to have maximum impact.

Treasure Data meets the challenge by providing unified customer data, covering both behavior and attributes, and an infrastructure that scales robustly without requiring an ever-larger team of infrastructure engineers. Treasure Data enables companies to deliver the right experience at the right time.

Collect, Explore and Activate Data

Using the Treasure Data platform, you can query and enrich the data, build workflows that orchestrate complex processes, and deliver transformed results in a variety of formats to data stores and applications on premises or in the cloud. You can focus on your data analytics while Treasure Data manages server scaling, storage, infrastructure, and security.

The Treasure Data platform offers:

  • Instant setup.
  • Virtually unlimited cloud storage capacity, with high availability.
  • Ease of use: no deep knowledge of “Big Data” technologies is required. You focus on your data and analytics instead of cluster management.
  • Elastic resource allocation that can be scaled up or down at any time.
  • Ingestion flexibility: connect your existing data sources and analytics applications without architectural changes, including a broad range of marketing and business SaaS applications and tools.
  • Activation with ease: connect directly to BI tools for reporting, to marketing and sales tools for delivering insights to team members in the field, or use our Personalization API to optimize IoT or marketing experiences in real time.

Architecture Diagram

Treasure Data’s architecture is performance-optimized and secure.


Access and ingest your data from new and traditional sources. For example, ingest from mobile, web, CRM, and point-of-sale systems into a schema-free data representation that lets business units across the organization share and use the data.

Or, you can ingest from streaming data sources, such as websites, mobile applications, and IoT devices, for real-time collection. Our collection tools work reliably at scale (we currently ingest over 1M events every second).

As another example, ingest from data sources that collect data in batch processes, such as production databases, CRM SaaS tools, and point-of-sale systems.


Leverage our machine learning and query engine to uncover insights that were previously hidden by the complexity of the data pipeline. Or transfer your data seamlessly into systems of your choice, such as data science tools, business analytics tools, and CRMs.


Put analysis to work in real time by creating orchestrated workflows that leverage extensive integrations and provide services such as customer audience profiling.

Key Features

Treasure Data features include:

Built-in Data Collector

Treasure Data provides a data collection daemon, Treasure Agent, which is installed in your existing infrastructure. Treasure Agent collects records from various data sources and continuously uploads the data to Treasure Data’s cloud storage.
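Continuous uploading of this kind usually follows a buffer-and-flush pattern: records accumulate locally and are uploaded in batches when the buffer fills or grows too old. The sketch below illustrates only that general pattern; `BufferedUploader`, its parameters, and the `upload` callback are hypothetical and are not Treasure Agent's configuration or API.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


class BufferedUploader:
    """Hypothetical collector that buffers records and uploads them in batches."""

    def __init__(
        self,
        upload: Callable[[List[Record]], None],
        max_records: int = 1000,
        max_age_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upload = upload  # stand-in for the real upload to cloud storage
        self._max_records = max_records
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[Record] = []
        self._first_ts: Optional[float] = None

    def emit(self, tag: str, record: Record) -> None:
        """Buffer one record; flush when the buffer is full or too old."""
        if not self._buffer:
            self._first_ts = self._clock()
        self._buffer.append({"tag": tag, **record})
        too_old = self._clock() - self._first_ts >= self._max_age_s
        if len(self._buffer) >= self._max_records or too_old:
            self.flush()

    def flush(self) -> None:
        """Upload buffered records; if upload raises, they stay buffered for a retry."""
        if not self._buffer:
            return
        self._upload(self._buffer)
        self._buffer = []
        self._first_ts = None


# Usage: collect "uploaded" batches in a list instead of sending them anywhere.
batches: List[List[Record]] = []
uploader = BufferedUploader(batches.append, max_records=2)
uploader.emit("web.access", {"path": "/"})
uploader.emit("web.access", {"path": "/cart"})  # buffer is full: flushes a batch of 2
uploader.emit("web.access", {"path": "/checkout"})
uploader.flush()  # e.g. on shutdown, send whatever is left
print(len(batches))  # 2
```

A production daemon would also flush on a timer; this sketch checks the buffer's age only when a new record arrives.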

Ready-to-Use Data Collection Capabilities

Treasure Data provides flexible options to simplify collecting data in a wide variety of scenarios:

  • For many common use cases, you can install Treasure Agent, our data collection daemon, in your existing infrastructure. It can collect records from a wide range of data sources and continuously upload them to TD’s cloud storage.

  • For collecting data from mobile devices, we provide easy-to-use API libraries that you can embed in iOS or Android apps.

  • We provide JavaScript libraries that you can embed in your websites, advertisements, and web applications to send web event data to TD.

Storage for High Performance

  • All your data is stored in the cloud in a columnar format. For analytical workloads, this format achieves far better performance and compression than traditional row-oriented RDBMSs.

  • The data is stored in Amazon S3 for maximum scalability, availability, and read/write performance.

  • We also believe that your data is yours: you can bulk-export your data at any time.
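To see why a columnar layout helps, the following sketch transposes row-oriented records into columns and run-length encodes each column. It illustrates the general idea only and is not Treasure Data's actual storage format; the sample records and function names are hypothetical. Values within one column tend to repeat, so they compress well, and a query that reads one column does not need to touch the others.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def to_columns(rows: List[Row]) -> Dict[str, List[Any]]:
    """Transpose row-oriented records into one value list per column.

    Fields missing from a row are stored as None so all columns stay aligned.
    """
    names = sorted({key for row in rows for key in row})
    return {name: [row.get(name) for row in rows] for name in names}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


rows = [
    {"country": "JP", "event": "view", "user_id": 1},
    {"country": "JP", "event": "view", "user_id": 2},
    {"country": "JP", "event": "buy", "user_id": 3},
    {"country": "US", "event": "view", "user_id": 4},
]
columns = to_columns(rows)
encoded = {name: run_length_encode(values) for name, values in columns.items()}
print(encoded["country"])  # [('JP', 3), ('US', 1)]

# A query that only filters on "event" reads a single encoded column.
views = sum(count for value, count in encoded["event"] if value == "view")
print(views)  # 3
```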

Schema-Free Data Representation

Our internal data formats are schema-free, to keep up with the variety of data types and fast-evolving schemas that are increasingly common in big, heterogeneous data use cases.
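A minimal sketch of schema-on-read, assuming JSON-like records: instead of declaring a fixed schema up front, the schema is derived from whatever fields the records actually contain, so new fields and changed value types do not block ingestion. `infer_schema` and the sample records are hypothetical and do not describe Treasure Data's internal format.

```python
import json
from typing import Any, Dict, Iterable, Set


def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Collect, per field name, the set of Python value types observed so far."""
    schema: Dict[str, Set[str]] = {}
    for record in records:
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


# Records whose shape evolves over time; no schema migration is needed
# before the newer records can be ingested.
raw_lines = [
    '{"user": "a", "page": "/"}',
    '{"user": "b", "page": "/cart", "referrer": "ad"}',
    '{"user": "c", "page": "/", "items": 3}',
    '{"user": 42, "page": "/help"}',
]
records = [json.loads(line) for line in raw_lines]
schema = infer_schema(records)
print(sorted(schema))          # ['items', 'page', 'referrer', 'user']
print(sorted(schema["user"]))  # ['int', 'str']
```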

SQL-Style Query Languages: Presto and Hive

Treasure Data lets you analyze your data using industry-standard Big Data query engines: Presto and Hive. Queries are executed in parallel in our elastic clusters, which scale to keep up with your demanding requirements for interactive query performance and high-volume batch processing.

Export to RDBMS/Traditional Warehouse

A built-in export capability is provided for writing summarized data from the Treasure Data store to a traditional RDBMS or data warehouse. This enables efficient processing of large data volumes, with Treasure Data serving both as a primary analytics engine and as a preprocessing platform.
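The pattern described above, summarizing large volumes first and exporting only the small result to a relational store, can be sketched as follows. Python's built-in `sqlite3` module stands in for the target RDBMS or data warehouse; the `orders` data, the `revenue_by_country` table, and both function names are hypothetical, and this is not Treasure Data's export mechanism.

```python
import sqlite3
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def summarize_revenue(orders: Iterable[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Reduce raw orders to one (country, total_cents) row per country."""
    totals: Counter = Counter()
    for order in orders:
        totals[order["country"]] += order["amount_cents"]
    return sorted(totals.items())


def export_summary(conn: sqlite3.Connection, rows: List[Tuple[str, int]]) -> None:
    """Upsert summary rows into a relational table; re-running is idempotent."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS revenue_by_country "
            "(country TEXT PRIMARY KEY, total_cents INTEGER NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO revenue_by_country (country, total_cents) VALUES (?, ?)",
            rows,
        )


orders = [
    {"country": "JP", "amount_cents": 1200},
    {"country": "US", "amount_cents": 500},
    {"country": "JP", "amount_cents": 300},
]
conn = sqlite3.connect(":memory:")  # stand-in for the target RDBMS
export_summary(conn, summarize_revenue(orders))
result = conn.execute(
    "SELECT country, total_cents FROM revenue_by_country ORDER BY country"
).fetchall()
print(result)  # [('JP', 1500), ('US', 500)]
```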

Hadoop Clusters Across Multiple Data Centers

Our computing resources are shared across all users and run on Hadoop clusters in multiple data centers. If a Hadoop cluster fails, its jobs are automatically reassigned to another live cluster.
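The source does not describe how reassignment is implemented. Purely as an illustration, the sketch below models clusters in two data centers and, when one is marked dead, resubmits its jobs to the least-loaded live cluster. The `Cluster` and `Scheduler` classes, the cluster names, and the least-loaded placement policy are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    name: str
    alive: bool = True
    jobs: List[str] = field(default_factory=list)


class Scheduler:
    """Hypothetical scheduler: places each job on the least-loaded live cluster."""

    def __init__(self, clusters: List[Cluster]) -> None:
        self.clusters: Dict[str, Cluster] = {c.name: c for c in clusters}

    def _least_loaded_live(self) -> Cluster:
        live = [c for c in self.clusters.values() if c.alive]
        if not live:
            raise RuntimeError("no live cluster available")
        return min(live, key=lambda c: len(c.jobs))  # ties go to the first cluster

    def submit(self, job_id: str) -> str:
        target = self._least_loaded_live()
        target.jobs.append(job_id)
        return target.name

    def mark_dead(self, name: str) -> None:
        """Mark a cluster dead and resubmit its jobs to the remaining live clusters."""
        dead = self.clusters[name]
        dead.alive = False
        while dead.jobs:
            self.submit(dead.jobs[0])  # raises if nothing is live
            dead.jobs.pop(0)           # drop only after a successful resubmit


sched = Scheduler([Cluster("dc1-a"), Cluster("dc2-a")])
for job in ["q1", "q2", "q3"]:
    sched.submit(job)
sched.mark_dead("dc1-a")
print(sched.clusters["dc2-a"].jobs)  # ['q2', 'q1', 'q3']
```

Removing a job from the dead cluster only after it has been resubmitted means that, if no live cluster remains, the job is still recorded rather than silently lost.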

Third-Party BI/ETL Tool Connectivity

Treasure Data also provides a standard JDBC interface for data transfer to existing (or future) BI tools. This eliminates the need for custom coding and maintenance to link these environments with the primary data store.

Curation and Support

We continuously curate our processes and technologies so that you benefit from best practices and tool enhancements, as well as stability and smooth integrations. Our support team has a deep knowledge base, is highly available, and is dedicated to helping our customers.

Use of Open Source

All of our client-side software is open source; there are no black boxes in our client-side code. The td command consists of three components, each with its own source code repository:

Use Cases

Single Customer View

Handle the most complex identity unification use cases by using both a customer data pipeline tool and Treasure Data’s ID Unification. Treasure Data’s unified console gives you full visibility into your data, data sources, destinations, jobs, users, and usage information.
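Identity unification is commonly modeled as finding connected components: any two identifiers observed together (for example, an email address and a cookie in one login event) belong to the same customer. The sketch below uses a union-find structure to illustrate that idea; it is not Treasure Data's ID Unification algorithm, and the identifier prefixes and sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}

    def find(self, x: str) -> str:
        """Return the representative of x's group, registering x if it is new."""
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        """Merge the groups containing a and b (the smaller group joins the larger)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]


def unify(records: Iterable[List[str]]) -> List[Set[str]]:
    """Group identifiers seen in the same record into one profile per customer."""
    uf = UnionFind()
    for ids in records:
        if not ids:
            continue
        uf.find(ids[0])  # registers records that carry a single identifier
        for other in ids[1:]:
            uf.union(ids[0], other)
    profiles: Dict[str, Set[str]] = {}
    for identifier in list(uf.parent):
        profiles.setdefault(uf.find(identifier), set()).add(identifier)
    return sorted(profiles.values(), key=min)


records = [
    ["email:ann@example.com", "cookie:c1"],  # e.g. a web login
    ["cookie:c1", "device:d9"],              # e.g. an app session
    ["email:bob@example.com"],               # a single identifier
    ["device:d9", "pos:card-77"],            # e.g. an in-store purchase
]
profiles = unify(records)
print([len(p) for p in profiles])  # [4, 1]
```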

Segmentation and Syndication

Start with the unified view of your data that Treasure Data provides. Machine learning makes sense of that data so that you can apply it to your unique business rules. Through Treasure Data’s interface, select your audience and specify segments.


Using batch or real-time segmentation (your choice), target audiences with personalized email and advertising campaigns.
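As a rough illustration of specifying segments, the sketch below expresses each segment as a named rule over unified customer profiles and evaluates every rule in one batch pass. The `Profile` fields, segment names, and thresholds are hypothetical; in a real-time setup the same rules would instead be evaluated as individual profiles change.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Profile:
    customer_id: str
    country: str
    purchases_90d: int
    email_opt_in: bool


Segment = Callable[[Profile], bool]

# Hypothetical business rules, one predicate per segment.
SEGMENTS: Dict[str, Segment] = {
    "jp_repeat_buyers": lambda p: p.country == "JP" and p.purchases_90d >= 2,
    "email_reachable": lambda p: p.email_opt_in,
}


def assign_segments(profiles: List[Profile], segments: Dict[str, Segment]) -> Dict[str, List[str]]:
    """Batch mode: evaluate every segment rule against every profile."""
    return {
        name: [p.customer_id for p in profiles if rule(p)]
        for name, rule in segments.items()
    }


profiles = [
    Profile("c1", "JP", 3, True),
    Profile("c2", "JP", 1, True),
    Profile("c3", "US", 5, False),
]
audiences = assign_segments(profiles, SEGMENTS)
print(audiences)  # {'jp_repeat_buyers': ['c1'], 'email_reachable': ['c1', 'c2']}
```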

Building Your IoT Data Application

You can use data gathered from IoT applications for your new product development.

Your data engineering and data science teams can leverage TD’s processing capabilities to build applications on top of the data ingested at scale into TD.

Industry Support

Treasure Data’s technology supports a wide range of industries, including digital marketing, social media, social games, finance, and e-commerce.

Use Case               Description
Gaming                 Better user acquisition, retention, and revenue.
e-Commerce             From understanding your customers to building your own recommendation engines.
A/B Testing            Comparing feature implementations between different sets of users.
Marketing Attribution  Which content works best in the customer journey?
Telematics             Analyze and make sense of data coming from vehicles.
Weblog Analysis        Calculating daily unique users (UU) and page views (PV).
Twitter Analysis       Who gets the most retweets for topic X?
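The Weblog Analysis use case can be made concrete with a small sketch that computes daily unique users (UU) and page views (PV) in plain Python. This is roughly what a Presto or Hive query such as `SELECT day, COUNT(DISTINCT user_id) AS uu, COUNT(*) AS pv FROM access_log GROUP BY day` would return; the `access_log` table, its columns, and the sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def daily_uu_pv(events: Iterable[Dict[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {day: (unique_users, page_views)} from page-view events."""
    users: Dict[str, Set[str]] = defaultdict(set)
    views: Dict[str, int] = defaultdict(int)
    for event in events:
        day = event["day"]
        users[day].add(event["user_id"])  # UU: distinct users per day
        views[day] += 1                   # PV: every event counts
    return {day: (len(users[day]), views[day]) for day in sorted(views)}


events = [
    {"day": "2018-05-18", "user_id": "u1", "path": "/"},
    {"day": "2018-05-18", "user_id": "u1", "path": "/cart"},
    {"day": "2018-05-18", "user_id": "u2", "path": "/"},
    {"day": "2018-05-19", "user_id": "u2", "path": "/"},
]
stats = daily_uu_pv(events)
print(stats)  # {'2018-05-18': (2, 3), '2018-05-19': (1, 1)}
```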

What’s Next?

Read more about Treasure Data, including white papers and case studies, by going to our resources page.

Or, if you are ready to get started, refer to your task of choice:

Last modified: May 19 2018 00:56:10 UTC
