Writing Job Results into Tableau Online

Using Tableau Online with Treasure Data allows you to interactively explore huge amounts of data, and share your data discoveries across your business organizations.



Table of Contents

Prerequisites

  • Basic knowledge of Treasure Data
  • An license and its installation of Tableau Online

Usage

CLI: One Time Query

1. Non-legacy Mode (new and recommended)

Add the Tableau result output destination by using the -r / —result option for the td query command:

$ td query -d mydb -r '{"type":"tableau","host":"company.online.tableau.com","username":"my_user","password":"passw0rd","ssl":true,"ssl_verify":false,"server_version":"online","datasource":"my_ds","site":"MarketingTeam","project":"","mode":"replace","chunk_size_in_mb":50,"timezone":"PST"}' 'SELECT * FROM access'

where:

  • username: Your Tableau Online Username
  • password: Your Tableau Online Password
  • host: Tableau Online Host on which your site exists (e.g. 10ay.online.tableau.com)
  • datasource: target Tableau DataSource name
  • site: The URL of the site to sign in to
    • To determine the value to use for the site attribute, sign in to Tableau Online and examine the value that appears after /site/ in the URL. For example, in the following URL, the site value is MarketingTeam:
      • https://online.tableau.com/#/site/MarketingTeam/workbooks
    • Refer here for more details
  • chunk_size_in_mb: Chunk file size (in MB) to be uploaded each time, default: 200, min: 100, max: 1024
  • timezone: timezone to convert from Timestamp data type to Tableau DateTime data type, default: UTC

2. Legacy Mode (not recommended)

Change the Tableau result output destination from JSON format to URL format as following example:

$ td query -d mydb -r 'tableau://username:password@host/datasource?mode=replace' 'SELECT * FROM access'

3. Migrating Existing Output Configurations in order to prepare for Tableau Hyper

Tableau has updated its data engine with a technology called Hyper. We will support the new Hyper format soon, but in order to prepare for it, you must update your existing Treasure Data output configurations. New Treasure Data configurations automatically use the latest version.

To upgrade your existing configurations:

  1. Go to Queries and select your scheduled query.
  2. In Query Editor and click Output Results target.
  3. Choose your saved Tableau connection and uncheck the box Legacy mode.
  4. Input the Site ID value; it’s required for non-legacy mode.
  5. Click Done to save your configuration.

Console: One Time Query

Go to the query editor on the web console, and type in the query. The following example query uses the access log example data set, and calculates the distribution of HTTP method per day.

HIVE:

# HiveQL
SELECT
  CAST(TD_TIME_FORMAT(time, "yyyy-MM-dd 00:00:00") AS TIMESTAMP) AS `dates`,
  method AS `Method`,
  COUNT(1) AS `Count`
FROM
  www_access
GROUP BY
  TD_TIME_FORMAT(time, "yyyy-MM-dd 00:00:00"),
  method

PRESTO:

# Presto
SELECT
  CAST(TD_TIME_FORMAT(time, 'yyyy-MM-dd 00:00:00') AS TIMESTAMP) AS "dates",
  method AS "Method",
  COUNT(1) AS "Count"
FROM
  www_access
GROUP BY
  TD_TIME_FORMAT(time, 'yyyy-MM-dd 00:00:00'),
  method
Untitled-3
We're casting the Datetime column from String type to TIMESTAMP type, for convenience in Tableau.

Tableau doesn’t support fractional seconds in a timestamp. Remove fractional seconds (for example, by using subtr() function) before casting to the TIMESTAMP type in the query.



Choose saved connection

A dialog Choose Saved Connection will be displayed. Select an existing Tableau online connection. If you do not have a Saved Connection already setup, follow the next step on how to create a new connection within the Sources Catalog.



Create a new connection

Visit Treasure Data Connections search and select Tableau. The dialog below will open



  • Username: Your Tableau Online Username
  • Password: Your Tableau Online Password
  • Host: Tableau Online Host on which your site exists (e.g. 10az.online.tableau.com)

Additional configuration

After create new Tableau connection or select an existing one, you will see the additional Configuration popup.

1. Non-legacy Mode (new)



Parameters Description Default values
Datasource Name The name of TDE file on Tableau Online
Site ID The URL of the site to sign in to, it’s required for Tableau Online
Project Name Go to your Tableau Online to get a list of projects
Mode replace to replace TDE each time, append to append to existing TDE append
Chunk File Size In MB Extract File is split into chunks before uploading. This option defines file size of each chunk (min: 100, max: 1024) 200
Timezone Timezone ID to use when converting from Timestamp (timezone-independent) to Tableau DateTime (timezone-dependent) UTC

2. Legacy Mode (not recommended)



Untitled-3
The Legacy Mode will be deprecated soon, due to incompatibility with latest Tableau version. Migrate your existing scheduled jobs to use Non-legacy Mode for best compatibility.
Untitled-3
Append Mode ignores new column if exporting data inserted an existing data source because of the specification of the Tableau. https://onlinehelp.tableau.com/current/pro/desktop/en-us/extracting_addfromfile.html

After completing all the fields, Submit the query. The system will execute the query, create the TDE (Tableau Data Extract) file, and upload TDE to Tableau Online.

Go to your Tableau Online, and click Data Sources at the top left bar. You can view the list of data sources, including your TDE file.



Click New Workbook to create the charts and dashboard from the browser. Drag and drop the dimensions and measures from the left nav, to top right nav to create graphs. Then click Save to store the result.

Console: Scheduled Query

In the previous section, you created charts through one job. Next, you can periodically refresh the dataset by using the Scheduled Jobs feature. Using this feature, you execute a query on Treasure Data, and update TDE file on Tableau Server. You can specify schedules from the query editor by using TD_SCHEDULED_TIME() UDF, with append mode.

The following query calculates the # of records within last 24 hours, from the time that the query gets executed. By continuously running this scheduled query, you can avoid processing the entire data set every day.

HIVE:

# HiveQL
SELECT
  CAST(TD_TIME_FORMAT(time, "yyyy-MM-dd 00:00:00") AS TIMESTAMP) AS `dates`,
  method AS `Method`,
  COUNT(1) AS `Count`
FROM
  www_access
GROUP BY
  TD_TIME_FORMAT(time, "yyyy-MM-dd 00:00:00"),
  method

PRESTO:

# Presto
SELECT
  CAST(TD_TIME_FORMAT(time, 'yyyy-MM-dd 00:00:00') AS TIMESTAMP) AS "dates",
  method AS "Method",
  COUNT(1) AS "Count"
FROM
  www_access
GROUP BY
  TD_TIME_FORMAT(time, 'yyyy-MM-dd 00:00:00'),
  method

Options

Result output to Tableau supports various options. The options can be specified as URL parameters on the CLI or with the REST APIs or the Console where supported. The options are normally compatible with each other and can be combined. Where applicable, the default behavior is indicated.

ssl

The ssl option determines whether to use SSL for connecting to the Tableau server. When ‘true’, SSL is used. ssl=true is the default when this option is not specified.

tableau://username:password@host/?ssl=true

ssl_verify

The ssl_verify option determines whether to require certifcate verification for the SSL communication. When ‘true’, the certificate verification is required. ssl_verify=true is the default when this option is not specified.

tableau://username:password@host/?ssl=true&ssl_verify=true

Disabling certificate verification is useful when the Tableau server’s SSL certificate is self-signed.

Limitations

  • The maximum result record is 250,000,000 records. If it exceeded, the log outputs TDE file records limit exceeded: 250,000,000.
  • The lowest Timestamp value is 1000-01-01 00:00:00. If it exceeded, the log outputs invalid date value.

Last modified: Apr 10 2018 15:05:53 UTC

If this article is incorrect or outdated, or omits critical information, let us know. For all other issues, access our support channels.