The Data Connector for MongoDB lets you import documents (records) stored on your MongoDB server into Treasure Data.
Continue to the following topics:
Basic knowledge of Treasure Data
Configure the Connection
You can create an instance of the MongoDB data connector from the TD Console. Select Create on the MongoDB connector tile.
Create a New MongoDB Connector
Enter the required credentials for your MongoDB instance. Set the following parameters.
Auth method: Auth method to authenticate.
If you choose "Auto", the connector negotiates the best mechanism based on the version of the server that it is authenticating to.
If the server version is 3.0 or higher, the driver authenticates using the SCRAM-SHA-1 mechanism.
Otherwise, the driver authenticates using the MONGODB_CR mechanism.
Auth source: The database name where the user is defined.
Username: Username to connect to the remote database.
Password: Password to connect to the remote database.
Hostname: The hostname or IP address of the remote server. (You can add more than one IP address, depending on your MongoDB setup.)
Port: Port number of the remote server (Default is 27017).
Select Continue after entering the required connection details. Name the connection so you can find it later if you need to modify any of the connection details. If you would like to share this connection with other users in your organization, select Share with others. If this box is unchecked, the connection is visible only to you.
Select Create Connection to complete the connection. If the connection succeeds, it appears in your list of connections with the name you provided.
Transfer Data into Treasure Data
After creating the connection to your remote database, you can import the data from your database into Treasure Data. You can set up an ad hoc one-time transfer or a recurring transfer at regular intervals.
Enter Database Details (Fetch From)
Provide the details of the database and collection from which you want to ingest data.
Database name: The name of the database from which you are transferring data. (for example,
Collection Name: The name of the collection from which you are transferring data.
JSON Query: Specifies records to return
JSON Projection: Specifies fields to return
Select Next to preview the data in the next step.
If there are no errors with the connection, you see a preview of the data to be imported. If you are unable to see the preview or have any issues viewing the preview, contact support.
The records are imported into one column both during the preview and when the data import is run. If you need to use non-standard options for your import, select Advanced Settings.
Advanced Settings allow you to modify aspects of your transfer to meet special requirements. The following fields are available in Preview > Advanced Settings.
Object ID field name: Name of the Object ID field to import.
Load only new records each run: If checked, you must specify the fields to sort by.
Sort by: Fields to use to sort records. Required if `Load only new records each run` is checked.
Output column name: The name of the column to output the records to.
Stop on invalid record: If checked, the transfer will stop and not complete if it encounters an invalid record.
In this phase, select the Treasure Data target database and table into which you want to import your data. You can create a new database or table using Create new database or Create new table.
Database: The database into which to import the data.
Table: The table within the database into which to import the data.
Mode: Append – Allows you to add records into an existing table.
Mode: Replace – Replace the existing data in the table with the data being imported.
Partition Key Seed: Choose the long or timestamp column that you would like to use as the partitioning time column. If you do not specify a time column, the upload time of the transfer is used in conjunction with the addition of a
Data Storage Timezone: Timezone in which the data is stored; data is also displayed in this timezone.
Data Transfer Frequency (When)
In this phase, you can choose to run the transfer only one time or schedule it to run at a specified frequency.
Once now: Run the transfer only once.
Schedule: accepts these three options:
Delay Transfer: add a delay of execution time.
Time Zone: supports extended timezone formats like ‘Asia/Tokyo’.
After selecting the frequency, select Start Transfer to begin the transfer. If there are no errors, the transfer into Treasure Data will complete and the data will be available.
My Input Transfers
If you need to review the transfer you have just completed, or other data connector jobs, you can view a list of your transfers in the My Input Transfers section.
Use the CLI to Configure the Connector
You can also use the MongoDB data connector from the command line interface. The following instructions show you how to import data using the CLI.
Install ‘td’ Command v0.11.9 or Later
Install the newest TD Toolbelt.
Create Seed Config File (seed.yml)
Create a seed.yml file that contains your MongoDB connection details.
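As a minimal sketch of such a file, the shell snippet below writes a seed.yml. The key names (type, hosts, user, password, database, collection, and the out mode) are assumed to follow the embulk-input-mongodb plugin conventions, and every value is a placeholder to replace with your own MongoDB details.

```shell
# Write a minimal seed.yml. All values below are placeholders (assumptions).
cat > seed.yml <<'EOF'
in:
  type: mongodb
  hosts:
  - {host: mongodb.example.com, port: 27017}  # placeholder host, default port
  user: my_user              # placeholder
  password: my_password      # placeholder
  database: my_database      # placeholder
  collection: my_collection  # placeholder
out:
  mode: append
EOF
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the file body.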
The Data Connector for MongoDB imports all documents that are stored in a specified collection. You may filter fields, specify queries, or sort with the following options.
Projection Option
A JSON document used for projection on query results. Only the fields of a document that match this condition are used.
Query Option
A JSON document used for querying the source collection. Only the documents that match this condition are loaded from the collection.
Order of result
The sort option sets the order of the result. It can't be used with the aggregation option, and the aggregation option can't be used with the sort option.
For more details on available out modes, see Appendix.
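The options above might appear in the configuration as sketched here. The key names projection, query, and sort are assumed from the embulk-input-mongodb plugin; the file name seed_options.yml and the field names user_id and company are hypothetical.

```shell
# Write the option keys to a separate fragment file (hypothetical name);
# in practice they belong under `in:` in seed.yml.
cat > seed_options.yml <<'EOF'
in:
  type: mongodb
  # connection settings (hosts, user, password, database, collection) go here
  projection: '{"_id": 1, "user_id": 1, "company": 1}'  # fields to return
  query: '{"user_id": {"$gte": 20000}}'                 # documents to load
  sort: '{"_id": -1}'                                   # do not combine with aggregation
EOF
```

Each option value is a JSON document passed as a single-quoted YAML string, so the inner double quotes need no escaping.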
Guess Fields (Generate load.yml)
The Data Connector for MongoDB loads MongoDB documents as a single column and therefore doesn’t support connector:guess. Edit all settings in your load.yml.
You can preview how the system parses the documents by using the td connector:preview command.
The data connector supports parsing of “boolean”, “long”, “double”, “string”, and “timestamp” types.
You must also create a local database and table prior to executing the data load job.
Execute Load Job
Finally, submit the load job. It may take a couple of hours depending on the size of the data. Specify the Treasure Data database and table where the data should be stored.
It is recommended to specify the --time-column option, because Treasure Data’s storage is partitioned by time (see architecture). If the option is not provided, the Data Connector chooses the first long or timestamp column as the partitioning time. The type of the column specified by --time-column must be either long or timestamp.
If your data doesn’t have a time column, you can add one using the add_time filter option. For more details, see add_time filter plugin.
If you want to expand the JSON column, you can do so using the expand_json filter option. For more details, see expand_json filter plugin.
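For illustration, the two filters might be configured as in the sketch below. The option names are assumed from the add_time and expand_json filter plugins, record is assumed to be the name of the single JSON column, and the file name and expanded field names are hypothetical.

```shell
# Write a filters fragment (hypothetical file name); in practice the
# `filters:` section sits alongside `in:` and `out:` in load.yml.
cat > filters_example.yml <<'EOF'
filters:
# Add a time column populated with the upload time of the job.
- type: add_time
  to_column:
    name: time
    type: timestamp
  from_value:
    mode: upload_time
# Expand selected fields of the JSON column into separate columns.
- type: expand_json
  json_column_name: record   # assumed name of the JSON column
  expanded_columns:
  - {name: _id, type: string}
  - {name: user_id, type: long}   # hypothetical field
EOF
```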
The connector:issue command assumes that you have already created a database (td_sample_db) and a table (td_sample_table). If the database or the table does not exist in TD, the connector:issue command fails, so create the database and table manually, or use the --auto-create-table option with the td connector:issue command to auto-create the database and table.
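A sketch of the load command with auto-creation enabled follows. The database and table names come from the text above; created_at is a hypothetical time column, and the function name is only an illustration. The command is wrapped in a function so that pasting the snippet does not submit a job by itself.

```shell
# Define the load step; nothing is submitted until the function is called.
issue_mongodb_load() {
  # created_at is a hypothetical timestamp column; substitute your own.
  td connector:issue load.yml \
    --database td_sample_db \
    --table td_sample_table \
    --time-column created_at \
    --auto-create-table
}
```

Calling issue_mongodb_load then submits the job, assuming the td command is installed and load.yml is in the current directory.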
The Data Connector does not sort records on server-side. To use time-based partitioning effectively, sort records beforehand.
If you have a field called time, you don’t have to specify the --time-column option.
You can load records incrementally by specifying a field in your table that contains date information, using the incremental_field and last_record options.
The connector automatically creates the query and sort values.
Incremental Load with Multiple Fields
You can also specify multiple fields for incremental_field.
The connector creates query and sort values using an ‘AND’ condition.
The `sort` option can't be used when you specify `incremental_field`.
You must specify `last_record` with special characters when the field type is ObjectId or DateTime.
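An incremental configuration might look like the sketch below. It assumes that the "special characters" mentioned above are the MongoDB Extended JSON forms $oid and $date; the file name and the field names account and created_at are hypothetical, and the sample values are illustrative only.

```shell
# Write an incremental-load fragment (hypothetical file name).
cat > incremental_example.yml <<'EOF'
in:
  type: mongodb
  # connection settings (hosts, user, password, database, collection) go here
  incremental_field:
  - account                      # hypothetical numeric field
  last_record: {"account": 3}    # load records after this value
  # For an ObjectId field, last_record would use the $oid form, for example:
  #   last_record: {"_id": {"$oid": "5739b2261c21e58edfe39716"}}
  # For a DateTime field, it would use the $date form, for example:
  #   last_record: {"created_at": {"$date": "2015-01-25T13:23:15.000Z"}}
EOF
```

Note that sort is left out, because it can't be combined with incremental_field.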
You can schedule periodic data connector executions for MongoDB data imports. We configure our scheduler carefully to ensure high availability. By using this feature, you no longer need a cron daemon on your local data center.
Create the Schedule
A new schedule can be created using the td connector:create command. The following are required: the name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file.
It’s also recommended to specify the --time-column option, since Treasure Data’s storage is partitioned by time (see data partitioning).
The `cron` parameter also accepts three special options: `@hourly`, `@daily` and `@monthly`.
By default, the schedule is set up in the UTC timezone. You can set the schedule in another timezone using the -t or --timezone option. Note that the `--timezone` option supports only extended timezone formats like 'Asia/Tokyo' or 'America/Los_Angeles'. Timezone abbreviations like PST or CST are *not* supported and may lead to unexpected schedules.
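Putting the required arguments and the optional flags together, a schedule could be created as sketched here. The schedule name daily_mongodb_import, the time column created_at, and the function name are hypothetical; the database and table names reuse the earlier examples.

```shell
# Define the scheduling step; nothing is created until the function is called.
create_daily_mongodb_import() {
  # Positional arguments: schedule name, cron expression, database, table, config file.
  td connector:create \
    daily_mongodb_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml \
    --time-column created_at \
    --timezone "Asia/Tokyo"
}
```

With this cron expression and timezone, the import would run every day at 00:10 Tokyo time.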
List the Schedules
You can see the list of currently scheduled entries by running the td connector:list command.
Show the Setting and Schedule History
td connector:show shows the execution setting of a scheduled entry.
td connector:history shows the execution history of a scheduled entry. To investigate the results of each individual run, use td job <jobid>.
Delete the Schedule
td connector:delete removes the schedule.
Modes for Out Plugin
You can specify the data import mode in the out section of seed.yml.
In append mode, which is the default, records are appended to the target table.
replace (In td 0.11.10 and later)
This mode replaces data in the target table. Note that any manual schema changes made to the target table remain intact with this mode.
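As a small sketch, the mode is selected in the out section like this. The file name is hypothetical; in practice the section is part of seed.yml.

```shell
# Write an out-section fragment (hypothetical file name).
cat > out_mode_example.yml <<'EOF'
out:
  mode: replace   # use append (the default) to add records instead
EOF
```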