Follow the steps below to set up the connector using the CLI.
Install the newest Treasure Data Toolbelt.
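If the Toolbelt is not installed yet, one common way to get it is via RubyGems (a sketch that assumes a local Ruby environment; platform installers are also available from Treasure Data):

gem install td

Verify the installed version: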
td --version
0.17.1

Prepare a configuration file (for example, load.yml) as shown in the following example, with your cloud storage service (S3) access information.
This example dumps an Adobe Analytics data feed from S3. The key parameters are:

- s3_auth_type: Method to authenticate with S3
- access_key_id: Your S3 access key
- secret_access_key: Your S3 secret key
- bucket: Your S3 bucket
- target: Target data to be ingested (data_feed_data or data_feed_lookup)
- rs_id: Your report suite identification
in:
  type: adobe_analytics_v2
  storage_type: s3
  s3_auth_type: basic
  access_key_id: <your s3 access_key_id>
  secret_access_key: <your secret_access_key>
  region: global
  bucket: <your bucket>
  target: data_feed_lookup
  path_prefix: <your path prefix>
  rs_id: <your report suite id>
  modified_after: 2024-01-19T04:35:11Z
  incremental: true
filters:
  - type: add_time
    to_column:
      name: time
      type: timestamp
    from_value:
      mode: upload_time
  - type: rename
    rules:
      - rule: upper_to_lower
      - rule: character_types
        pass_types: [ "a-z", "0-9" ]
        pass_characters: "_"
        replace: "_"
out:
  type: td
  apikey: <td_api_key>
  endpoint: <td_endpoint>
  database: <database>
  table: <table>
  time_column: time
  mode: replace
  default_timestamp_format: '%d/%m/%Y'

Submit the load job. It may take a couple of hours depending on the data size. Specify the database and table where the data will be stored.
It is recommended to specify the --time-column option, because Treasure Data's storage is partitioned by time.
If the option is not given, the data connector selects the first long or timestamp column as the partitioning time. The column specified by --time-column must be of long or timestamp type. If your data does not have a time column, you can add one with the add_time filter option. See the add_time filter plugin for more details.
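For reference, the add_time filter used in the example configuration above looks like this (a minimal sketch with explanatory comments; upload_time stamps each record with the time of the upload):

filters:
  - type: add_time
    to_column:
      name: time        # name of the column to create
      type: timestamp
    from_value:
      mode: upload_time # use the job's upload time as the value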
td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column modifieddate

The preceding command assumes that you have already created the database (td_sample_db) and the table (td_sample_table). If the database or the table does not exist in TD, the command fails. Create the database and table manually, or use the --auto-create-table option with the td connector:issue command to create them automatically. You can assign a time-format column as the partitioning key with the --time-column option.
$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column modifieddate --auto-create-table

You can schedule periodic data connector execution for periodic Adobe Analytics imports. We configure our scheduler carefully to ensure high availability. By using this feature, you no longer need a cron daemon on your local data center.
A new schedule can be created using the td connector:create command. The name of the schedule, a cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file are required. The cron parameter also accepts three options: @hourly, @daily, and @monthly. By default, the schedule is set up in the UTC timezone. You can set the schedule in another timezone using the -t or --timezone option. The --timezone option only supports extended timezone formats such as 'Asia/Tokyo' and 'America/Los_Angeles'. Timezone abbreviations such as PST and CST are *not* supported and may lead to unexpected schedules.
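For example, a sketch of a schedule pinned to the Asia/Tokyo timezone (the schedule name, cron expression, database, table, and file below are illustrative):

td connector:create tokyo_adobe_analytics_v2_import "10 0 * * *" \
  td_sample_db td_sample_table load.yml -t Asia/Tokyo

The default (UTC) schedule used in the rest of this section is created as follows: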
td connector:create daily_adobe_analytics_v2_import "10 0 * * *" \
  td_sample_db td_sample_table load.yml

You can see the list of currently scheduled entries with td connector:list:
$ td connector:list

Show the execution settings of a schedule entry with td connector:show:
td connector:show daily_adobe_analytics_v2_import
Name     : daily_adobe_analytics_v2_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table

Show the execution history of a schedule entry with td connector:history. To investigate the results of each individual execution, use td job <jobid>.
td connector:history daily_adobe_analytics_v2_import

td connector:delete removes the schedule:
td connector:delete daily_adobe_analytics_v2_import

The connector configuration supports the following parameters:

| Parameters | Description | Type | Default value | Notes |
|---|---|---|---|---|
| type | must be adobe_analytics_v2 | String | Required | |
| storage_type | Your cloud storage service provider. Currently only s3 is supported. | Enum (s3) | s3 | |
| s3_auth_type | S3 authentication method | Enum(basic, session, assume_role ) | basic | |
| endpoint | Your S3 endpoint | String | | If set, it is used regardless of region |
| region | Your S3 region | Enum. See the complete list in Amazon Simple Storage Service endpoints and quotas - AWS General Reference | global | Used when endpoint is empty |
| access_key_id | Your S3 Access Key ID | String | Required when using basic or session authentication | |
| secret_access_key | Your S3 Secret Key | String | Required when using basic or session authentication | |
| session_token | Your S3 Session Token | String | Required when using session authentication | |
| bucket | Your S3 Bucket | String | Required | |
| target | Data target | Enum(data_feed_data or data_feed_lookup) | data_feed_data | Required |
| path_prefix | Path prefix for location of data feed | String | Required | |
| rs_id | Report Suite Id | String | Required | |
| incremental | Incremental loading | Boolean | true | |
| modified_after | Time to start importing data (exclusive) | Timestamp | | ISO-8601 format |
| max_retry | Maximum number of retries | Integer | 7 | |
| initial_retry_wait | Time to wait for the first retry (in seconds) | Integer | 2 | |
| max_retry_wait | Maximum time to wait for a retry (in seconds) | Integer | 120 | |
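As an illustration of the session authentication and retry parameters described in the table, a sketch of an in: section might look like the following (all values are placeholders or illustrative; only the parameter names come from the table above):

in:
  type: adobe_analytics_v2
  storage_type: s3
  s3_auth_type: session              # temporary credentials
  access_key_id: <your s3 access_key_id>
  secret_access_key: <your secret_access_key>
  session_token: <your session token>
  region: us-east-1                  # used because endpoint is not set
  bucket: <your bucket>
  target: data_feed_data
  path_prefix: <your path prefix>
  rs_id: <your report suite id>
  max_retry: 5                       # default is 7
  initial_retry_wait: 5              # seconds before the first retry (default 2)
  max_retry_wait: 60                 # upper bound on retry wait, in seconds (default 120)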
Assume Role authentication is not configurable through the CLI. You can configure it by reusing an existing authentication. See Reuse the existing Authentication.
Example configuration for importing data_feed_data:

in:
  type: adobe_analytics_v2
  storage_type: s3
  s3_auth_type: basic
  access_key_id: <your s3 access_key_id>
  secret_access_key: <your secret_access_key>
  region: global
  bucket: <your bucket>
  target: data_feed_data
  path_prefix: <your path prefix>
  rs_id: <your report suite id>
  modified_after: 2024-01-19T04:35:11Z
  incremental: false
filters:
  - type: add_time
    to_column:
      name: time
      type: timestamp
    from_value:
      mode: upload_time
  - type: rename
    rules:
      - rule: upper_to_lower
      - rule: character_types
        pass_types: [ "a-z", "0-9" ]
        pass_characters: "_"
        replace: "_"
out:
  type: td
  apikey: <td_api_key>
  endpoint: <td_endpoint>
  database: <database>
  table: <table>
  time_column: time
  mode: replace
  default_timestamp_format: '%d/%m/%Y'

Example configuration for importing data_feed_lookup:

in:
  type: adobe_analytics_v2
  storage_type: s3
  s3_auth_type: basic
  access_key_id: <your s3 access_key_id>
  secret_access_key: <your secret_access_key>
  region: global
  bucket: <your bucket>
  target: data_feed_lookup
  path_prefix: <your path prefix>
  rs_id: <your report suite id>
  modified_after: 2024-01-19T04:35:11Z
  incremental: false
filters:
  - type: add_time
    to_column:
      name: time
      type: timestamp
    from_value:
      mode: upload_time
  - type: rename
    rules:
      - rule: upper_to_lower
      - rule: character_types
        pass_types: [ "a-z", "0-9" ]
        pass_characters: "_"
        replace: "_"
out:
  type: td
  apikey: <td_api_key>
  endpoint: <td_endpoint>
  database: <database>
  table: <table>
  time_column: time
  mode: replace
  default_timestamp_format: '%d/%m/%Y'

Related information:
- Data feed overview: Analytics Data Feed overview | Adobe Analytics
- How to create a data feed: Create a data feed | Adobe Analytics
- You can find region and endpoint information from the AWS service endpoints document.