Use Treasure Data’s data.ai connector to import data.ai (formerly AppAnnie) data source objects into Treasure Data.

Prerequisites

  • Basic knowledge of Treasure Data

  • Basic knowledge of data.ai (formerly AppAnnie)

Rate Limits

There are two different rate limits in data.ai:

  • calls per minute

  • calls per user per day

The per-minute limit automatically resets after a fixed number of seconds, while the daily call limit resets every day at 00:00 PST.

If you have multiple transfers under the same data.ai account, you can control how much of the quota each transfer consumes through the calls_per_minute_limit and calls_per_day_limit settings, as long as the totals across all transfers stay within your account quota. For example, if your account quota is 100 calls/minute and 10,000 calls/day and you create two transfers (for example, product sales and product usage), you can assign 50 calls/minute and 5,000 calls/day to the product sales transfer and the remaining 50 calls/minute and 5,000 calls/day to the product usage transfer, as in the sketch below.
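The following is a minimal sketch of how two configuration files might split that example quota; the apikey and other values are placeholders, and only the rate-limit keys are the point here:

in:                            # load_sales.yml -- half of the example quota
  type: app_data.ai
  apikey: xxxxxxxx
  target: product_sales
  calls_per_minute_limit: 50   # 50 of the 100 calls/minute quota
  calls_per_day_limit: 5000    # 5,000 of the 10,000 calls/day quota
out:
  mode: append

in:                            # load_usage.yml -- the remaining half
  type: app_data.ai
  apikey: xxxxxxxx
  target: product_usage
  start_date: 2017-01-01       # required when target is product_usage
  calls_per_minute_limit: 50
  calls_per_day_limit: 5000
out:
  mode: append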

Use TD Console

  1. Create a New Connection

In Treasure Data, you must create and configure the data connection prior to running your query. As part of the data connection, you provide authentication to access the integration.

  1. Open TD Console.

  2. Navigate to Integrations Hub > Catalog.

  3. Click the search icon on the far right of the Catalog screen, and enter data.ai.
  4. Hover over the data.ai connector and select Create Authentication.



      The following dialog opens.

The method you use to authenticate Treasure Data with data.ai affects the steps you take to enable the data connector to import from data.ai.

 Treasure Data supports:

  • API Key

  • OAuth  

Using Credentials to Authenticate

Provide your data.ai API key information and select Continue.

Using OAuth to Authenticate

OAuth is only available in the US region.

  1. Select OAuth from the Authentication Method dropdown to connect your data.ai account using OAuth 2.
  2. Select Click here to connect a new account and log into your data.ai account in the new window:
  3. Grant access to the Data Connector and Treasure Data app:
  4. You are redirected back to the Catalog. Repeat the Create a New Connection step and choose your new OAuth connection.


When you complete the connection form, select Continue and give your connection a name:

Create a New Transfer

After creating the connection, you are automatically taken to the Authentications tab. Look for the connection you created and select New Transfer.

The following dialog opens. Complete the details and select Next.

Next, you see a preview of your data similar to the following dialog. If you want to change any settings, select Advanced Settings; otherwise, select Next.

If you want to change options such as skipping invalid records or rate limits, you can do so in Advanced Settings:

Select the database and table where you want to transfer the data, as shown in the following dialog:

Specify the schedule of the data transfer using the following dialog and select Start Transfer:

 You see the new data transfer in progress listed under the My Input Transfers tab and a corresponding job is listed in the Jobs section.

Use Command Line

Install ‘td’ command v0.11.9 or later

You can install the newest TD Toolbelt.

$ td --version
0.15.0

Create Configuration File

Prepare a configuration file (for example, load.yml) with your data.ai account access information, as follows:

in:
  type: app_data.ai
  apikey: xxxxxxxx
  target: product_sales (required, see Appendix B)
  breakdown_sales: date+country+iap (optional, see Appendix C)
  fetch_type: shared_products (optional, default: `both`, see Appendix D)
  start_date: 2017-01-01 (optional but required here as breakdown contains `iap`) 
  end_date: 2017-02-01 (optional, default: current date)
  currency: USD (optional, default: USD, see Appendix E)
  skip_on_invalid_records: true (optional, default: false)
  calls_per_minute_limit: 15 (optional, 30 by default, see Appendix F)
  calls_per_day_limit: 800 (optional, 1000 by default, see Appendix F)
out:
  mode: replace

This example dumps a data.ai data source. The parameters are:

  • apikey: data.ai apiKey.

  • target: data.ai entity object to be imported.

  • breakdown: Breakdown type for which product sales or usage data is fetched.

    • The field name changes according to the selected target: breakdown_sales for product_sales or breakdown_usage for product_usage (see the sketch after this list).

    • See Appendix: Available Breakdowns for the list of supported breakdowns.

  • fetch_type: The source of products to import: products from connected accounts, shared products, or both.

  • start_date: The date (yyyy-MM-dd) from which product data is imported. This field is required when fetching product usage (target is product_usage) or when fetching product sales (target is product_sales) with an in-app-purchase breakdown (breakdown contains iap).

  • end_date: The date (yyyy-MM-dd) until which product data is imported. This field is optional and is automatically capped at 60 days after start_date, based on the current date.

  • currency: The monetary currency in which the data is presented.

  • skip_on_invalid_records: Ignore errors (such as invalid JSON or unsupported data) and continue fetching records (false by default).

  • calls_per_minute_limit / calls_per_day_limit: Limit number of API calls per minute / per day
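
For comparison, here is a minimal sketch of a product usage configuration. The breakdown field name becomes breakdown_usage and start_date is required because the target is product_usage; the values shown are placeholders:

in:
  type: app_data.ai
  apikey: xxxxxxxx
  target: product_usage
  breakdown_usage: date+country+device
  fetch_type: both
  start_date: 2017-01-01   # required when target is product_usage
  end_date: 2017-02-01     # optional; capped at 60 days after start_date
  skip_on_invalid_records: true
out:
  mode: replace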

For more details on available out modes, see Appendix: Modes for Out Plugin.


Optionally, Preview Data to Import

You can preview data to be imported using the command td connector:preview.

$ td connector:preview load.yml
+-----------------+---------------------+-----------------+----
| account_id:long | account_name:string | vertical:string | ...
+-----------------+---------------------+-----------------+----
| 42023           | "Hello"             | apps            |
| 42045           | "World"             | apps            |
+-----------------+---------------------+-----------------+----

Execute Load Job

Submit the load job. It might take a couple of hours, depending on the data size. Specify the database and table where the data will be stored.

It is recommended to specify the --time-column option, because Treasure Data’s storage is partitioned by time (see also the architecture documentation). If the option is not given, the Data Connector chooses the first long or timestamp column as the partitioning time. The column specified by --time-column must be of type long or timestamp.

If your data doesn’t have a time column you can add it using the add_time filter option. More details at add_time Filter Plugin for Integrations.
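The following is a minimal sketch of what such a filter block could look like in load.yml, assuming the add_time filter plugin’s usual to_column and from_value options; check the add_time Filter Plugin documentation for the exact syntax:

in:
  type: app_data.ai
  # ... connector settings as shown above ...
filters:
  - type: add_time
    to_column:
      name: time          # new column to use as the partitioning key
      type: timestamp
    from_value:
      mode: upload_time   # stamp each record with the job's upload time
out:
  mode: append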

$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column updated_date

The preceding command assumes that you have already created the database (td_sample_db) and table (td_sample_table). If the database or the table does not exist in TD, the command fails, so either create them manually or add the --auto-create-table option to the td connector:issue command to create them automatically:

$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column updated_date --auto-create-table

You can assign a time-format column as the partitioning key by using the --time-column option.


Scheduled Execution

You can schedule periodic Data Connector execution for recurring data.ai imports. The scheduler is configured for high availability, so you no longer need a cron daemon in your local data center.

Create the Schedule

A new schedule can be created using the td connector:create command. The name of the schedule, a cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file are required.

$ td connector:create \
    daily_dataai_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml

The `cron` parameter also accepts these three options: `@hourly`, `@daily`, and `@monthly`.

By default, the schedule is set up in the UTC timezone. You can set the schedule in a timezone using -t or --timezone option. The `--timezone` option supports only extended timezone formats like 'Asia/Tokyo', 'America/Los_Angeles', etc. Timezone abbreviations like PST and CST are *not* supported and may lead to unexpected schedules.
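
For example, here is a sketch of the same schedule created in the Asia/Tokyo timezone (the schedule name and configuration file are the ones used above):

$ td connector:create \
    daily_dataai_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml \
    --timezone "Asia/Tokyo"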

List the Schedules

You can see the list of scheduled entries by td connector:list.

$ td connector:list
+-----------------------+--------------+----------+-------+--------------+-----------------+-----------------------------+
| Name                  | Cron         | Timezone | Delay | Database     | Table           | Config                      |
+-----------------------+--------------+----------+-------+--------------+-----------------+-----------------------------+
| daily_dataai_import | 10 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"type"=>"app_annie", ... } |
+-----------------------+--------------+----------+-------+--------------+-----------------+-----------------------------+

Show the Setting and History of Schedules

td connector:show displays the execution setting of a schedule entry.

% td connector:show daily_dataai_import
Name     : daily_dataai_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, use td job <jobid>.

% td connector:history daily_dataai_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-18 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-17 00:10:07 +0000 | 161      |
| 577914 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-16 00:10:03 +0000 | 152      |
| 577872 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-15 00:10:04 +0000 | 163      |
| 577810 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-14 00:10:04 +0000 | 164      |
| 577766 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-13 00:10:04 +0000 | 155      |
| 577710 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-12 00:10:05 +0000 | 156      |
| 577610 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-11 00:10:04 +0000 | 157      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
8 rows in set

Delete the Schedule

td connector:delete removes the schedule.

$ td connector:delete daily_dataai_import

Appendix


Modes for Out Plugin

You can specify file import mode in the out section of load.yml.

append (default)

The following example shows the default mode, in which records are appended to the target table.

in:
  ...
out:
  mode: append

replace (in td 0.11.10 and later)

This mode replaces data in the target table. Any manual schema changes made to the target table remain intact with this mode.

in:
  ...
out:
  mode: replace

Available Targets

Target                 Description
account_connections    Connected accounts
connected_products     Products from connected accounts
shared_products        Shared products from external accounts
product_sales          Product sales data
product_usage          Product usage data
app_details            Application details


Available Breakdowns

This field is available only when importing product sales or product usage.

  • If the target is product_sales, the breakdown field name is breakdown_sales

  • If the target is product_usage, the breakdown field name is breakdown_usage

Breakdown              Product Sales   Product Usage
country                x               x
country+iap            x
country+device                         x
date                   x               x
date+country           x               x
date+country+device                    x
date+country+iap       x
date+device                            x
date+iap               x
date+type+iap          x
device                                 x
iap                    x

Available Fetch Types

This field is available for importing product sales, product usage and app details.

Source                 Description
connected_products     Import only data of products from connected accounts
shared_products        Import only data of products from the sharing list
both                   Import both product sources

Available Currencies

This field is available for importing only product sales. Contact data.ai support for more detail if needed.

Currency Code   Symbol   Full Name of Currency
AUD             A$       Australian Dollar
BGN             лв       Bulgarian lev
BRL             R$       Brazilian real
CAD             C$       Canadian Dollar
CHF             CHF      Swiss Franc
CNY             ¥        Chinese Yuan
CZK                      Czech koruna
DKK             kr       Danish krone
EEK             kr       Estonian kroon
EUR                      Euro
GBP             £        Pound sterling
HKD             HK$      Hong Kong dollar
HRK             kn       Croatian kuna
HUF             Ft       Hungarian forint
IDR             Rp       Indonesian rupiah
ILS                      Israeli new shekel
INR                      Indian rupee
JPY             ¥        Japanese yen
KRW                      South Korean won
LTL             Lt       Lithuanian litas
LVL             Ls       Latvian lats
MXN             Mex$     Mexican peso
MYR             RM       Malaysian ringgit
NOK             kr       Norwegian krone
NZD             $        New Zealand dollar
PHP                      Philippine peso
PLN                      Polish złoty
RON             lei      Romanian new leu
RUB             p.       Russian rouble
SEK             kr       Swedish krona/kronor
SGD             S$       Singapore dollar
THB             ฿        Thai baht
TRY             TL       Turkish lira
TWD             NT$      New Taiwan dollar
USD             $        United States dollar
ZAR             R        South African rand



