This article describes how to use the data connector for MariaDB, which allows you to directly import data from MariaDB to Treasure Data.
This topic contains:
Basic knowledge of Treasure Data
Basic knowledge of MariaDB
Basic Treasure Data command line
Use Command Line
Install TD Toolbelt v0.11.9 or later
You can install the newest TD Toolbelt.
Create Configuration File
Prepare configuration file (for eg:
load.yml) with your MariaDB account access information, similar to the following example.
Guess Fields (Generate load.yml)
connector:guess. This command automatically reads the target file, and intelligently guesses the file format.
When you open
load.yml, you see guessed file format definitions including, in some cases, file formats, encodings, column names, and types.
Preview data to import
You can preview data to be imported using the command
Execute Load Job
Submit the load job. It may take a couple of hours depending on the data size. Users need to specify the database and table where their data are stored.
It is recommended to specify the
--time-column option, because Treasure Data’s storage is partitioned by time (see also Data Partitioning).
If the option is not given, the data connector chooses the first
timestamp column as the partitioning time. The type of the column specified by
--time-column must be either of type
If your data doesn’t have a time column you can add it using the
add_time filter option. More details at add_time filter plugin
The preceding command assumes that you have already created database(td_sample_db) and table(td_maria_table). If the database or the table does not exist in TD, this command will not succeed. Therefore, create the database and table manually or use the
--auto-create-table option with the
td connector:issue command to auto-create the database and table:
You can schedule periodic Data Connector executions for periodic MariaDB imports. The load distribution and operation of Treasure Data’s scheduler is optimized to achieve high availability. By using Treasure Data’s scheduler, you no longer need a cron daemon on your local data center.
Create the schedule
A new schedule can be created using the
td connector:create command. Specify the name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file. The values are required.
The `cron` parameter also accepts these three options: `@hourly`, `@daily` and `@monthly`.
By default, schedule is setup in UTC timezone. You can set the schedule in a timezone using -t or —timezone option. The
--timezone option supports only extended timezone formats like ‘Asia/Tokyo’, ‘America/Los_Angeles’ etc. Timezone abbreviations like PST, CST are not supported and might lead to unexpected schedules.
List the Schedules
You can see the list of scheduled entries by
Show the Setting and History of Schedules
td connector:show shows the execution setting of a schedule entry.
td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, use
td job <jobid>.
Delete the Schedule
td connector:delete removes the schedule.