Visit our new documentation site! This documentation page is no longer updated.

Query Data Tanks from Presto

Use Presto, via TD Console, to query data stored in your Treasure Data Data Tank!

Table of Contents

Overview

This optional feature enables users of Data Tanks to run queries against their Data Tank instance directly from the Presto query engine!

Remind me, what is a Data Tank?

Data Tanks provide easy access to your aggregated metrics through convenient, fully hosted data marts on Treasure Data’s core platform. They can be used to drive a variety of external business intelligence and visualization applications without the hassle of manually hosting and maintaining your own PostgreSQL instances.

Data Tanks are PostgreSQL databases that use a columnar store to accelerate analytical queries. They are completely managed by Treasure Data on behalf of our customers. That means that we handle creation, setup, monitoring, management and troubleshooting so our users can just get their job done.

Benefits

This provides users the following key benefits:

  • Enables easier access to DataTank data via querying from Treasure Data Console, via Presto. Including the creation & deletion of tables.
  • More flexible processing flows by enabling data to be joined across DataTank & core TD tables.

How to Use

Data Querying (DML Queries)

SELECT query against table in datatank

select avg(close) from datatank.public.normal_nasdaq where symbol = 'AAPL';
       _col0
--------------------
 14.823083442853807
(1 row)

SELECT query that joins Data Tank and TD Core tables

-- sample_datasets.nasdaq is a TD table
-- sample_datasets. can be left out if you specify it at '-d' option in command-line or select it at the database dropdown list
select count(distinct time) 
  from datatank.public.normal_nasdaq dt_nasdaq JOIN sample_datasets.nasdaq td_nasdaq 
  ON dt_nasdaq.symbol = td_nasdaq.symbol AND dt_nasdaq.open = td_nasdaq.open AND dt_nasdaq.close = td_nasdaq.close 
where dt_nasdaq.symbol = 'AAPL';

CREATE TABLE AS in Data Tank from TD Table data

create table datatank.public.ms_nasdaq as select * from sample_datasets.nasdaq where symbol = 'MSFT' ;

INSERT INTO table in datatank

insert into datatank.public.ms_nasdaq (symbol, open, volume, high, low, close, time) values ('AAPL', 10, 100, 20, 5, 7, td_scheduled_time());

INSERT INTO table from TD Table

insert into datatank.public.ms_nasdaq select * from sample_datasets.nasdaq where symbol = 'AAPL' ;

Not supported:

  • UPDATE against table in Data Tank (Not supported in Presto). Use Result Output to Data Tank for this purpose.
  • DELETE against table in Data Tank (Not supported in Presto PostgreSQL connector)

Database Management Queries (DDL Queries)

List schemas

show schemas from datatank;
   Schema
--------------------
 information_schema
 pg_catalog
 public

List tables in schema

show tables from datatank.public ;
     Table
---------------
 normal_nasdaq

List columns of a table

desc datatank.public.normal_nasdaq;
 Column |  Type   | Comment
--------+---------+---------
 symbol | varchar |
 open   | double  |
 volume | bigint  |
 high   | double  |
 low    | double  |
 close  | double  |

DROP table

drop table datatank.public.ms_nasdaq;

Not Supported

  • CREATE TABLE in Data Tank (no supporting index creation, or otherwise.) Use Result Output to Data Tank for this purpose.

User Permissions

When this querying functionality is turned on, Account Owner & Admin users are given “Full Access” permissions.

All other users can be granted access at either the “Full Access” or “Query Only” level.

  • “Full Access” includes both DML & DDL query usage
  • “Query Only” includes only DML query usage

Administrators can grant these access levels in Team section of TD Console, by selecting the database named ‘datatank’. The database ‘datatank’ is only available if your account has had this feature turned on by request.

Key Details & Questions Answered

Getting Access to this Functionality

This feature can be accessed by any customers of our Data Tank add-on, at no additional cost.

Contact your account representative or Support to have this turned on.

Presto Query Processing Flow

The following steps describe a typical processing flow for a Presto query, joining data from a TD table with data from a Data Tank table:

  1. User writes a query in Presto (within the console)
  2. The presto query begins running
  3. The presto query generates & submits a sub-query to the DataTank PostgreSQL instance
  4. The Data Tank instance runs that query, and returns the results to the Presto cluster
  5. The Presto cluster finishes running the query, completing the join of the data from the core TD dataset and the data returned from the PostgreSQL query result
  6. The query result is displayed in the console to the user. It can be submitted to a result output job, or downloaded from the browser, just like the result of any other Presto query.

Can I issue queries that join data across TD core & Data Tank?

YES!

You can create queries that join TD core stored data and Data Tank stored data.

What happens when someone runs TD_TIME_RANGE(), or other TD specific UDFs, against data in Data Tank?

Presto & Treasure Data specific functions are run only once data is returned from the Data Tank instance. As such, TD_TIME_RANGE() or other functions are not “pushed down” the Data Tank query issued from Presto.

In general, we recommend not using such functions because it’s more performant for customers to run queries using PostgreSQL friendly syntax. By using PostgreSQL friendly syntax, much of the processing can be completed on the Data Tank instance.

If you do run a TD or Presto specific function, the query still runs, but it may be less performant than if the WHERE function is effectively “pushed down” into the PostgreSQL query issued from Presto.

How are Data Tanks convereted if the data type doesn’t exist in Treasure Data core? Like timestamp, float, varchar.

Presto as a query engine supports more datatypes than TD’s core tables. As such, when querying against PosgreSQL, our presto query engine is able to handle types such as timestamp. Full type support by Presto can be seen here: https://prestodb.io/docs/current/language/types.html

If you are joining data, you may need to run a type conversion function to ensure the TD datatype to align with the datatype you are joining from the DataTank instance. But, you may need to run a type conversion when joining against any other 2 tables as well.

Known Limitations

  • We currently do not support lists of tables or table preview functions in the query editor. You can still explore tables & preview tables by issuing queries against DataTank system tables & normal tables.
  • You cannot access tables that are using mixed case. Presto is a lowercase character system. Accessing case-sensitive systems like Postgres through Presto is a known issue.

Last modified: Apr 19 2018 16:21:03 UTC

If this article is incorrect or outdated, or omits critical information, let us know. For all other issues, access our support channels.