Learn more about Amplitude Import Integration.

Use queries to create Amplitude Events from the data you store in Treasure Data.


Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.

  • An Amplitude account that can grant permissions to Treasure Data.

Create an Amplitude Account

If you do not have an Amplitude account, go to https://amplitude.com and register. Then, create an organization and a project. When you create a project, you receive an API Key for the project.


Amplitude User Activity

In the Amplitude console, you can view User Activity to explore Amplitude events:

Usage

Go to the TD Console, and Queries and select New Query to access the query editor. You can pick the query language and write your query. Select Output results.


Create or Select an Existing Amplitude Connection

Option

Task Steps

Tip

Create a connector

- Go to Connections.
- Select on the Amplitude icon from Sources Catalog.
- Complete the required fields on the first pane.
- In the API Key field, enter the key that you use in Amplitude.
- In the Secret Key field, delete the default entry and leave the field empty.

Next you complete the Configuration fields.

Sometimes you need to define the column mapping before writing the query.

Select a connector

- Go to Treasure Data console.
- Go to Query Editor.
- Access the query that you plan to use to export data.
- Select Output results. The Choose Saved Connection dialog opens.
- Type the connection name in the search box to filter and select the connection that you want.

Next you complete the Configuration fields.

You can also create a new connection from here by selecting on Create New Connection.

Creating a new connection:

You must enter your API Key. For output to Amplitude, the `Secret Key` field must be empty. If necessary, delete the default entry the `Secret Key` field.

Selecting an existing connection:


Optional: Complete Additional Configuration

After creating a new amplitude connection or selecting an existing one, you see the Additional configuration popup, for example:


Parameters

Description

Default values

Disable auto-generation of Insert IDs for deduplication

By default, TD automatically generates the insert_ids for deduplication. See Event Deduplication for details.

Unchecked


Write a Quick Query to Populate Amplitude Events

In the following example, the Amplitude project has no events before the output of a query result:


In Treasure Data, run the following query with Amplitude as Result Export:

SELECT user_id, device_id, event_type, event_properties, country, time FROM (
  VALUES ('a@gmail.com', 'f925065b-4d56-486f-a2be-e97952e62925', 'game_open', '{"load_time":0.0619166532,"dates":["sunday","monday"],"source":"notification"}', 'United States', 1515083919310),
         ('b@gmail.com', 'b36c8c80-71a0-4de4-8bcb-c801f5c1c6c3', 'select_avatar', '{"load_time":0.0517593568,"dates":["friday","sunday"],"source":"notification"}]', 'United States', 1515083943092),
         ('c@gmail.com', 'e44a1024-c4bb-4eef-a02a-f2a7b3bf68b3', 'watch_tutorial', '{"load_time":0.0689998562,"dates":["sunday","monday"],"source":"notification"}', 'United States', 1515083970894),
         ('d@gmail.com', '8274fee1-7222-4e41-83ed-fb5521d5f4b8', 'game_start', '{"load_time":0.0976384392,"dates":["friday","monday"],"source":"notification"}', 'United States', 1515084000948),
         ('e@gmail.com', '29ba193d-0955-4084-9640-bb4d8ba9a23f', 'ad_clicked', '{"load_time":0.0708434955,"dates":["monday","saturday"],"source":"notification"}', 'United States', 1515084024759)
)  tbl (user_id, device_id, event_type, event_properties, country, time)

The preceding query requires no source table, which makes testing easy, but you still must choose a database. Pick “sample_datasets” or any other arbitrary table. Also, make sure that Presto is chosen as the SQL dialect.

The query completes in a few seconds. Log into your Amplitude account and check User Activity. It should look similar to this:

All columns or aliases included in your query must use the names and data types as described in the Amplitude API specification.

Combine Multiple Values into a Single JSON Column Value

If you have multiple values, you can combine them into a single JSON column value as follows:

SELECT CAST(MAP(ARRAY['source', 'status', 'action'], ARRAY['notification', 'pending', 'add']) AS JSON) AS event_properties

You create a simple JSON object for event_properties: ‘{“action”:“add”,“source”:“notification”,“status”:“pending”}’.

You can create a more complex column, as follows:

SELECT CAST(MAP(ARRAY['load_time', 'source', 'dates'], ARRAY[CAST (0.8371 AS JSON), CAST ('notification' AS JSON), CAST(ARRAY['monday', 'tuesday'] AS JSON)]) AS JSON) AS event_properties

You create the following JSON object as the event_properties value: ‘{“dates”:[“monday”,“tuesday”],“load_time”:0.8371,“source”:“notification”}’.

Write a Query to a TD Table to Populate Amplitude Events

An example of a TD database table that can be used to create records for Amplitude is as follows:

In the Treasure Data console, go to query editor and enter a query, for example:

SELECT user_id, device_id, event_type, event_properties, country, event_time AS time FROM events

Select Output results to send the data to Amplitude. Next, you select an existing connector, or create a new connection using your api_key.

When the connector has been selected or created, in the query editor, select Run to begin the data transfer.

Event Deduplication

It is highly recommended that an insert_id be sent with each event to prevent duplicate events from being saved by Amplitude. The insert_id is a unique identifier an event. Amplitude drops events that have the same insert_id if the IDs are sent within 7 days of each other.

Disable Auto-Generation of Insert IDs

If you do not select to disable auto-generation of insert IDs, then TD automatically generates a SHA256 insert_id for each record.

If you select to disable auto-generation of insert IDs, then insert_id’s must be provided by the user.

Amplitude Columns

The following table lists event columns and data types supported by Amplitude.

Standard Columns

Key

Description

user_id
string

(required unless device_id is present) A readable ID specified by you.
“datamonster@gmail.com”

device_id
string

(required unless user_id is present) A device specific identifier, such as the Identifier for Vendor on iOS.
“C8F9E604-F01A-4BD9-95C6-8E5357DF265D”

event_type
string

(required) A unique identifier for your event.
“watch_tutorial”

time
long

The timestamp of the event in milliseconds since epoch. It will be set to the upload time by default.
1396381378123

event_properties4
json

{“load_time”: 0.8371, “source”: “notification”, “dates”: [“monday”, “tuesday”]}

user_properties4
json

{“age”: 25, “gender”: “female”, “interests”: [“chess”, “football”, “music”]}

groups4
json

(Enterprise only)
{“company_id”: “1”, “company_name”:[“Amplitude”, “DataMonster”]}

app_version

The version of your application the user is on.
“2.1.3”

platform1,2
string

Platform of the device.
“iOS”, “Android”, or “Web”

os_name1,2
string

The mobile operating system or browser the user is on.
“iOS”, “Android”, “Chrome”

os_version1,2
string

The version of the mobile operating system or browser the user is on.
“8.1”, “4.2.2”, “37”

device_brand1,2
string

The device brand the user is on.
“Verizon”

device_manufacturer1,2
string

The device manufacturer the user is on.
“Samsung”, “Asus”, “Apple”

device_model1,2
string

The device model the user is on.
“Mac”, “iphone 9,1”, “sm-g30f”

carrier1,2
string

Carrier of the device.
“Verizon”

country1,3
string

The country the user is in.
“United States”

region1,3
string

The geographical region the user is in.
“California”

city1,3
string

What city the user is in.
“San Francisco”

dma1,3
string

The Designated Market Area of the user.
“San Francisco-Oakland-San Jose, CA”

language1
string

What language the user has set.
“English”

price5
float

(required for revenue data if revenue is not sent) The price of the item purchased. You can use negative values to indicate refunds.
4.99, -1.99

quantity5
integer

(required for revenue data, defaults to 1 if not specified) The quantity of the item purchased.
1,2

revenue5
float

revenue = price * quantity

If you send all three fields of price, quantity, and revenue, then (price * quantity) will take precedence and be the revenue value. You can use negative values to indicate refunds.
4.99, -1.99

productId5
string

An identifier for the product.

Note: You must send a price and quantity with this field.
“Google Play Store Product Id”, “Medium Bundle”

revenueType5
string

Type of revenue.
Note: You must send a price and quantity with this field.
“tax”, “refund”

location_lat
float

Latitude of the user.
37.77

location_lng
float

Longitude of the user.
-122.39

ip1
string

IP address of the user. Use “$remote” to use your server’s IP address. “127.0.0.1”

idfa
string

(iOS) Identifier for Advertiser.
“AEBE52E7-03EE-455A-B3C4-E57283966239”

idfv
string

(iOS) Identifier for Vendor.
“BCCE52E7-03EE-321A-B3D4-E57123966239”

adid
string

(Android) Google Play Services advertising ID (AdID).
“AEBE52E7-03EE-455A-B3C4-E57283966239”

View footnotes for 1,2,3,4,5


Optional Columns

Optional Amplitude Specific Key

Description

event_id
int

An incrementing counter to distinguish events with the same user_id and timestamp from each other. Note: We recommend you send an event_id, increasing over time, especially if you suspect any events to occur simultaneously. An event_id is not required to send a session_id though.
1

session_id
long

The start time of the session is in milliseconds since it uses epoch (Unix Timestamp), which is necessary if you want to associate events with a particular session (a session_id of -1 implies that you not sending a session_id, and so no session metrics will be tracked). Learn more about tracking sessions in Amplitude.
1396381378123

insert_id
string

A unique identifier for the event being inserted; we will deduplicate events with the same insert_id sent within 7 days of each other. Some combination of device_id, user_id, session_id, event_type, and event_id or time, would likely serve as a sufficient insert_id value.
“f47ac10b-58cc-4372-a567-0e02b2c3d479”

  • No labels