Adobe Analytics Import Integration V2 with CLI

Import from Adobe Analytics using the CLI

Follow the steps below to set up the connector using the CLI.

Install TD Toolbelt Command

Install the newest version of the Treasure Data Toolbelt, then verify the installation by checking the version:

td --version
0.17.1

Creating Your Configuration File

Prepare a configuration file (for example, load.yml) as shown below, with your cloud storage service (S3) access information.
This example dumps an Adobe Analytics data feed from S3:

  • s3_auth_type: Authentication method for S3
  • access_key_id: Your S3 access key
  • secret_access_key: Your S3 secret key
  • bucket: Your S3 bucket
  • target: Target data to be ingested (data_feed_data or data_feed_lookup)
  • rs_id: Your report suite ID

in:
  type: adobe_analytics_v2
  storage_type: s3
  s3_auth_type: basic
  access_key_id: <your s3 access_key_id>
  secret_access_key: <your secret_access_key>
  region: global
  bucket: <your bucket>
  target: data_feed_lookup
  path_prefix: <your path prefix>
  rs_id: <your report suite id>
  modified_after: 2024-01-19T04:35:11Z
  incremental: true
filters:
- type: add_time
  to_column:
    name: time
    type: timestamp
  from_value:
    mode: upload_time
- type: rename    
  rules:
  - rule: upper_to_lower
  - rule: character_types
    pass_types: [ "a-z", "0-9" ]
    pass_characters: "_"
    replace: "_"  
out:
  type: td
  apikey: <td_api_key>
  endpoint: <td_endpoint>
  database: <database>
  table: <table>
  time_column: time
  mode: replace
  default_timestamp_format: '%d/%m/%Y'
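
Before submitting the load job, you can optionally check how the configuration is interpreted with the Toolbelt's preview command (assuming your Toolbelt version includes it); it prints a sample of the records that would be imported:

$ td connector:preview load.yml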

Execute Load Job

Submit the load job. It may take a couple of hours, depending on the data size. Specify the database and table where the data will be stored.

It is recommended to specify the --time-column option, since Treasure Data’s storage is partitioned by time.

If the option is not given, the data connector selects the first long or timestamp column as the partitioning time. The column specified by --time-column must be of either long or timestamp type. If your data doesn’t have a time column, you can add one with the add_time filter, as shown below. For more details, see the add_time filter plugin documentation.
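
A minimal sketch of that filter (the same pattern already used in the filters section of the example configuration above); it adds a time column populated with the upload time of each record:

filters:
- type: add_time
  to_column:
    name: time
    type: timestamp
  from_value:
    mode: upload_time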

td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column modifieddate

The preceding command assumes that the database (td_sample_db) and table (td_sample_table) already exist. If the database or the table does not exist in TD, the command will not succeed. Therefore, create the database and table manually (as shown below) or use the --auto-create-table option with td connector:issue to create them automatically. You can assign a time-format column as the partitioning key with the --time-column option.
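
For example, to create the database and table manually before issuing the load (a minimal sketch reusing the placeholder names above):

$ td db:create td_sample_db
$ td table:create td_sample_db td_sample_table

Alternatively, let the connector create them during the load: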

$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column modifieddate --auto-create-table 

Scheduling Your Execution

You can schedule periodic Data Connector executions for recurring Adobe Analytics imports. We configure our scheduler carefully to ensure high availability. By using this feature, you no longer need a cron daemon in your local data center.

Create the Schedule

A new schedule can be created using the td connector:create command. The name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file are required. The cron parameter also accepts three shortcut options: @hourly, @daily, and @monthly. By default, the schedule is set up in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. The --timezone option only supports extended timezone formats like 'Asia/Tokyo' and 'America/Los_Angeles'. Timezone abbreviations like PST and CST are *not* supported and may lead to unexpected schedules.

td connector:create daily_adobe_analytics_v2_import "10 0 * * *" \
td_sample_db td_sample_table load.yml
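
For example, to run the same schedule at 00:10 Japan time instead of UTC, add the --timezone option described above (the _jst suffix in the schedule name is just an illustrative variant):

td connector:create daily_adobe_analytics_v2_import_jst "10 0 * * *" \
--timezone Asia/Tokyo \
td_sample_db td_sample_table load.yml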

List the Schedules

You can see the list of currently scheduled entries with td connector:list:

$ td connector:list

Show the Setting and History of Schedules

Show the execution settings of a schedule entry with td connector:show:

$ td connector:show daily_adobe_analytics_v2_import
Name     : daily_adobe_analytics_v2_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table

Show the execution history of a schedule entry with td connector:history. To investigate the results of an individual run, use td job with the job ID, as shown below.

td connector:history daily_adobe_analytics_v2_import
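
To drill into one run from the history output, pass its job ID to td job (the ID below is a placeholder):

td job 12345678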

Delete the Schedule

td connector:delete removes the schedule.

td connector:delete daily_adobe_analytics_v2_import 

Configurable Options

The following parameters are available. The type, default value, and any notes are listed for each.

  • type: Must be adobe_analytics_v2. Type: String. Required.
  • storage_type: Your cloud storage service provider; currently only s3 is supported. Type: Enum (s3). Default: s3.
  • s3_auth_type: S3 authentication method. Type: Enum (basic, session, assume_role). Default: basic.
  • endpoint: Your S3 endpoint. Type: String. If set, it is used regardless of region.
  • region: Your S3 region; see the complete list in Amazon Simple Storage Service endpoints and quotas - AWS General Reference. Type: Enum. Default: global. Used when endpoint is empty.
  • access_key_id: Your S3 Access Key ID. Type: String. Required when using basic or session authentication.
  • secret_access_key: Your S3 Secret Key. Type: String. Required when using basic or session authentication.
  • session_token: Your S3 Session Token. Type: String. Required when using session authentication.
  • bucket: Your S3 bucket. Type: String. Required.
  • target: Data target. Type: Enum (data_feed_data, data_feed_lookup). Default: data_feed_data. Required.
  • path_prefix: Path prefix for the location of the data feed. Type: String. Required.
  • rs_id: Report Suite ID. Type: String. Required.
  • incremental: Incremental loading. Type: Boolean. Default: true.
  • modified_after: Time to start importing data (exclusive), in ISO-8601 format. Type: Timestamp.
  • max_retry: Maximum number of retries. Type: Integer. Default: 7.
  • initial_retry_wait: Time to wait before the first retry (in seconds). Type: Integer. Default: 2.
  • max_retry_wait: Maximum time to wait between retries (in seconds). Type: Integer. Default: 120.
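
For example, to authenticate with temporary credentials, set s3_auth_type to session and supply the session token along with the key pair (a minimal sketch of the in: section only; the remaining parameters are the same as in the full examples below):

in:
  type: adobe_analytics_v2
  storage_type: s3
  s3_auth_type: session
  access_key_id: <your s3 access_key_id>
  secret_access_key: <your secret_access_key>
  session_token: <your session_token>
  region: global
  bucket: <your bucket>
  target: data_feed_data
  path_prefix: <your path prefix>
  rs_id: <your report suite id>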

Assume Role authentication is not configurable through the CLI. You can configure it by reusing an existing authentication. See Reuse the existing Authentication.

Sample Configurations

Ingest Data Feed Data

in:
  type: adobe_analytics_v2
  storage_type: s3
  s3_auth_type: basic
  access_key_id: <your s3 access_key_id>
  secret_access_key: <your secret_access_key>
  region: global
  bucket: <your bucket>
  target: data_feed_data
  path_prefix: <your path prefix>
  rs_id: <your report suite id>
  modified_after: 2024-01-19T04:35:11Z
  incremental: false
filters:
- type: add_time
  to_column:
    name: time
    type: timestamp
  from_value:
    mode: upload_time
- type: rename    
  rules:
  - rule: upper_to_lower
  - rule: character_types
    pass_types: [ "a-z", "0-9" ]
    pass_characters: "_"
    replace: "_"  
out:
  type: td
  apikey: <td_api_key>
  endpoint: <td_endpoint>
  database: <database>
  table: <table>
  time_column: time
  mode: replace
  default_timestamp_format: '%d/%m/%Y' 

Ingest Lookup Data

in:
  type: adobe_analytics_v2
  storage_type: s3
  s3_auth_type: basic
  access_key_id: <your s3 access_key_id>
  secret_access_key: <your secret_access_key>
  region: global
  bucket: <your bucket>
  target: data_feed_lookup
  path_prefix: <your path prefix>
  rs_id: <your report suite id>
  modified_after: 2024-01-19T04:35:11Z
  incremental: false

filters:
- type: add_time
  to_column:
    name: time
    type: timestamp
  from_value:
    mode: upload_time
- type: rename    
  rules:
  - rule: upper_to_lower
  - rule: character_types
    pass_types: [ "a-z", "0-9" ]
    pass_characters: "_"
    replace: "_"  

out:
  type: td
  apikey: <td_api_key>
  endpoint: <td_endpoint>
  database: <database>
  table: <table>
  time_column: time
  mode: replace
  default_timestamp_format: '%d/%m/%Y'
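
To load the lookup data, submit the job against this configuration in the same way as before; lookup data is typically kept in its own table (td_sample_lookup_table below is a placeholder name):

$ td connector:issue load.yml --database td_sample_db --table td_sample_lookup_table --auto-create-table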

External Reference