This article describes how to use the data connector for MariaDB, which allows you to directly import data from MariaDB to Treasure Data.

This topic contains:

Prerequisites

  • Basic knowledge of Treasure Data

  • Basic knowledge of MariaDB

  • Basic Treasure Data command line

Use Command Line

Install TD Toolbelt v0.11.9 or later

You can install the newest TD Toolbelt.

$ td --version0.15.0

Create Configuration File

Prepare configuration file (for eg: load.yml) with your MariaDB account access information, similar to the following example.

in:   type: mysql   host: "hostname"   port: 3306   user: "myuser"   password: "mypassword"   database: "mydatabasename"   table: "mytablename"   select: "*"out: mode: replace

Guess Fields (Generate load.yml)

Use connector:guess. This command automatically reads the target file, and intelligently guesses the file format.

$ td connector:guess seed.yml -o load.yml

When you open load.yml, you see guessed file format definitions including, in some cases, file formats, encodings, column names, and types.

in: {type: mysql     host: "hostname"     port: 3306     user: "myuser"     password: "mypassword"     database: "mydatabasename"     table: "mytablename"     select: "*"}filters: []out: {mode: replace}exec: {}

Preview data to import

You can preview data to be imported using the command td connector:preview.

$ td connector:preview maria_import.yml +-----------+-------------------+-------------------------+----------------+-----------+---------------------------------------------------------------------------------------+-----------+---------------+------------+ | user:long | host:string       | path:string             | referer:string | code:long | agent:string                                                                          | size:long | method:string | time:long  | +-----------+-------------------+-------------------------+----------------+-----------+---------------------------------------------------------------------------------------+-----------+---------------+------------+ | nil       | "123.456.789.00"   | "/category/toys"        |    "-"         |   200     | "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0.1) Gecko/20100101 Firefox/9.0.1" | 127       | "GET"         | 1412362794 | 

Execute Load Job

Submit the load job. It may take a couple of hours depending on the data size. Users need to specify the database and table where their data are stored.

It is recommended to specify the --time-column option, because Treasure Data’s storage is partitioned by time (see also Data Partitioning).

If the option is not given, the data connector chooses the first long or timestamp column as the partitioning time. The type of the column specified by --time-column must be either of type long and timestamp.

If your data doesn’t have a time column you can add it using the add_time filter option. More details at add_time filter plugin

$ td connector:issue maria_import.yml --database td_sample_db --table td_mariadb_table --time-column created_at

The preceding command assumes that you have already created database(td_sample_db) and table(td_maria_table). If the database or the table does not exist in TD, this command will not succeed. Therefore, create the database and table manually or use the --auto-create-table option with the td connector:issue command to auto-create the database and table:

$ td connector:issue maria_import.yml --database td_sample_db --table td_mariadb_table --auto-create-table -w

Scheduled execution

You can schedule periodic Data Connector executions for periodic MariaDB imports. The load distribution and operation of Treasure Data’s scheduler is optimized to achieve high availability. By using Treasure Data’s scheduler, you no longer need a cron daemon on your local data center.

Create the schedule

A new schedule can be created using the td connector:create command. Specify the name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file. The values are required.

$ td connector:create \
    daily_mariadb_import \
    "10 0 * * *" \
    td_sample_db \
    td_mariadb_table \
    load.yml

The `cron` parameter also accepts these three options: `@hourly`, `@daily` and `@monthly`.

By default, schedule is setup in UTC timezone. You can set the schedule in a timezone using -t or —timezone option. The --timezone option supports only extended timezone formats like ‘Asia/Tokyo’, ‘America/Los_Angeles’ etc. Timezone abbreviations like PST, CST are not supported and might lead to unexpected schedules.

List the Schedules

You can see the list of scheduled entries by td connector:list.

$ td connector:list
+-----------------------+--------------+----------+-------+--------------+-----------------+----------------------------+
| Name                  | Cron         | Timezone | Delay | Database     | Table           | Config                     |
+-----------------------+--------------+----------+-------+--------------+-----------------+----------------------------+
| daily_mariadb_import  | 10 0 * * *   | UTC      | 0     | td_sample_db | td_mariadb_table | {"type"=>"mariadb", ... }  |
+-----------------------+--------------+----------+-------+--------------+-----------------+----------------------------+

Show the Setting and History of Schedules

td connector:show shows the execution setting of a schedule entry.

% td connector:show daily_mariadb_import
Name     : daily_mariadb_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_mariadb_table

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, use td job <jobid>.

% td connector:history daily_mariadb_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_mariadb_table | 0        | 2019-08-28 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_mariadb_table | 0        | 2019-08-28 00:10:07 +0000 | 161      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
2 rows in set

Delete the Schedule

td connector:delete removes the schedule.

$ td connector:delete daily_mariadb_import



  • No labels