Skip to content
Last updated

MariaDB Import Integration

This article describes how to use the data connector for MariaDB, which allows you to directly import data from MariaDB to Treasure Data.

Prerequisites

  • Basic knowledge of Treasure Data
  • Basic knowledge of MariaDB
  • Basic Treasure Data command line

Use Command Line

Install TD Toolbelt v0.11.9 or later

You can install the newest TD Toolbelt.

$ td --version

Create Configuration File

Prepare configuration file (for eg: seed.yml) with your MariaDB account access information, similar to the following example.

in:
  type: mysql
  host: "hostname"
  port: 3306
  user: "myuser"
  password: "mypassword"
  database: "mydatabasename"
  table: "mytablename"
  select: "*"
out:
  mode: replace

Guess Fields (Generate load.yml)

Use connector:guess. This command automatically reads the target file, and intelligently guesses the file format.

$ td connector:guess seed.yml -o load.yml

When you open load.yml, you see guessed file format definitions including, in some cases, file formats, encodings, column names, and types.

in:
  {type: mysql,host: "hostname",port: 3306,user: "myuser",password: "mypassword",  database: "mydatabasename", table: "mytablename",select: "*"}
filters: []
out: {mode: replace}
exec: {}

Preview data to import

You can preview data to be imported using the command td connector:preview.

$ td connector:preview maria_import.yml

Execute Load Job

Submit the load job. It may take a couple of hours depending on the data size. Users need to specify the database and table where their data are stored.

It is recommended to specify the --time-column option, because Treasure Data’s storage is partitioned by time (see also Data Partitioning).

If the option is not given, the data connector chooses the first long or timestamp column as the partitioning time. The type of the column specified by --time-column must be either of type long and timestamp.

If your data doesn’t have a time column you can add it using the add_time filter option. More details at add_time filter plugin

td connector:issue maria_import.yml \
--database td_sample_db \
--table td_mariadb_table \
--time-column created_at

The preceding command assumes that you have already created database(td_sample_db) and table(td_maria_table). If the database or the table does not exist in TD, this command will not succeed. Therefore, create the database and table manually or use the --auto-create-table option with the td connector:issue command to auto-create the database and table:

td connector:issue maria_import.yml \
--database td_sample_db \
--table td_mariadb_table \
--auto-create-table -w

Scheduled execution

You can schedule periodic Data Connector executions for periodic MariaDB imports. The load distribution and operation of Treasure Data’s scheduler is optimized to achieve high availability. By using Treasure Data’s scheduler, you no longer need a cron daemon on your local data center.

Create the schedule

A new schedule can be created using the td connector:create command. Specify the name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file. The values are required.

$ td connector:create \
    daily_mariadb_import \
    "10 0 * * *" \
    td_sample_db \
    td_mariadb_table \
    load.yml

The cron parameter also accepts these three options: @hourly, @daily and @monthly.

By default, schedule is setup in UTC timezone. You can set the schedule in a timezone using -t or —timezone option. The --timezone option supports only extended timezone formats like ‘Asia/Tokyo’, ‘America/Los_Angeles’ etc. Timezone abbreviations like PST, CST are not supported and might lead to unexpected schedules.