You can import Magento (v2) objects into Treasure Data.

Prerequisites

  • Basic knowledge of Treasure Data

  • Basic knowledge of Magento 2

Create a New Authentication

When you configure a data connection, you provide authentication to access the integration. In Treasure Data, you configure the authentication and then specify the source information.

  1. Open the TD Console

  2. Navigate to Integrations Hub -> Catalog

  3. Search for Magento.

  4. Select Create.


  5. The following dialog opens. Choose one of the Authentication Modes. Edit the required credentials.


  6. Select Continue

Authentication Mode Options

Using Login Credentials to Authenticate

  1. Select Login Credential and enter your Admin-level user name and password.


Using an Integration Access Token to Authenticate

  1. To get the access token, in your Magento Admin view, go to System > Integrations > Add New Integration (or open an existing one).


  2. Select the appropriate permissions for your target resources.


  3. Activate the integration and then copy the Access Token.


  4. In Treasure Data, paste the name of the access token. For Base URL, set the value to be your Magento 2 REST API endpoint (for example, http://my-magento-server/rest/)


  5. Select Continue.

  6. Name your Authentication. Click Done.

Transfer Your Magento Data to Treasure Data

  1. After creating the authenticated connection, you are automatically taken to the Authentications tab. Look for the connection you created and click New Source.

  2. The following dialog opens. 


  3. Complete the details. 


Parameters:

  • API URL: The desired resource (for example, /V1/orders). The target API must conform to Magento Search API: https://devdocs.magento.com/guides/v2.3/rest/performing-searches.html.

  • Incremental: Specify columns and column type. Set a begin and end value. These values are used as the initial fetching range and are automatically increased while keeping the same range (or duration for timestamp type).

    When running on schedule, the time window of the fetched data automatically shifts forward on each run. For example, if the initial config is January 1, with ten days in duration, the first run fetches data modified from January 1 to January 10, the second run fetches from January 11 to January 20, and so on.

    Review "How Incremental Loading Work: in the Appendix below.

Preview

Click Next to preview the data.

The Magento data connector must sample a response to guess for the resource's schema, and the Magento API potentially omits the decimal part of decimal numbers if they are rounded. The Magento data connector might incorrectly guess a column's data type as long instead of double and cause loss of precision. You can adjust to the correct types in Advance Settings.

Click Advanced Settings to see the schema settings.


Choose the target database and table

Choose an existing or create a new database and table where you want to transfer data to:


Scheduling 

  1. In the Schedule tab, you can specify a one-time transfer, or you can schedule an automated recurring transfer.

  2. If you select Once now, click Start Transfer. If you select Repeat, specify your schedule options, then click Schedule Transfer.


  3. After your transfer has run, you can see the results of your transfer in Data WorkbenchDatabases.


Use the Command Line to Create your Magento Connection

You can use the CLI to configure your connection. 

Install the Treasure Data Toolbelt

Open a terminal and run the following command to install the newest Treasure Data Toolbelt.

$ td --version
0.16.3

Prepare load.yml file

Prepare load.yml. The in: section is where you specify what comes into the connector from Magento  and the out: section is where you specify what the connector puts out to the database in Treasure Data.

Provide your Magento account access information as follows:

in:
  type: magento
  auth_method: token
  token: xxxxxxxxxxxxx
  base_url: https://your-magento-server/rest/
  resource_url: V1/products

Configuration Keys

Keys

Description

auth_method

Either password or token

token

The Access Token for auth_method: token

username

Admin username for auth_method: username

password

Admin password for auth_method: password

base_url

Magento REST API endpoint, example: https://my-magento-server/rest

resource_url

the target resource URL, example: /V1/products

incremental

should data import be continuous or once, default as false.

incremental_column

which column should be used as the cursor for incremental loading

incremental_column_type

the data type of incremental_column, either timestamp or number

incremental_start_timestamp

initial start value for incremental_column_type: timestamp

incremental_end_timestamp

initial end value for incremental_column_type: timestamp

incremental_start_number

initial start value for incremental_column_type: number

incremental_end_number

initial end value for incremental_column_type: number


Preview data to import (optional)

You can preview data to be imported using the command td connector:preview.

$ td connector:preview load.yml
+-----------------+-------------+----
| product_id:long | name:string | ...
+-----------------+-------------+----
| 1               | "Chocolate" |
| 2               | "Cookie"    |
| 3               | "Bike"      |
+-----------------+-------------+----


Execute the Load Job

Use td connector:issue to execute the job. Processing might take a couple of hours depending on the data size. The following are required:

  • name of the schedule

  • cron-style schedule

  • database and table where their data will be stored

  • the Data Connector configuration file

$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column created_at \
    daily_xxxx_import ...

The preceding command assumes you have already created database(td_sample_db) and table(td_sample_table). If the database or the table do not exist in TD, this command will not succeed so create the database and table manually or use -auto-create-table option with td connector:issue command to auto create the database and table:

 $ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column created_at --auto-create-table

It is recommended to specify --time-column option, because Treasure Data’s storage is partitioned by time. If the option is not given, the data connector selects the first long or timestamp column as the partitioning time. The type of the column, specified by --time-column, must be either of long or timestamp type (use Preview results to check for the available column name and type. Generally, most data types have a last_modified_date column).

A time column is available at the end of the output.

$ td connector:issue load.yml --database td_sample_db --table td_sample_table \ --time-column created_at

If your data doesn’t have a time column you may add it using the add_time filter. You add the "time" column by adding the add_time filter to your configuration file as follows.

in:
  type: xxxxx
  ...
filters:
- type: add_time
  from_value:
    mode: upload_time
  to_column:
    name: time
out:
  type: td

 More details at add_time filter plugin.

 If you have a field called time, you do not have to specify -time-column option.

Scheduled execution

You can schedule a periodic Data Connector execution for a periodic Magento import. We manage our scheduler to ensure high availability. By using this feature, you no longer need a cron daemon on your local data center.

Create the schedule

A new schedule can be created using the td connector:create command. The name of the schedule, cron-style schedule, the database and table where their data will be stored, and the Data Connector configuration file are required.

$ td connector:create \
    daily_magento_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml 


The `cron` parameter also accepts these three options: `@hourly`, `@daily` and `@monthly`.


By default, the schedule is setup in the UTC timezone. You can set the schedule in a timezone using -t or --timezone option. The `--timezone` option only supports extended timezone formats like 'Asia/Tokyo', 'America/Los_Angeles' etc. Timezone abbreviations like PST, CST are *not* supported and may lead to unexpected schedules.

Incremental Scheduling

You can load records incrementally by specifying columns in your table using incremental_column.

in:
  type: magento
  base_url: ...
  api_url: ...
  auth_method: token
  token: ...
  incremental_column: true
  incremental_column_type: number
  incremental_start_number: ...
  incremental_end_number: ...
out:
  mode: append
  exec: {}

Replace incremental_start_number and incrementa_end_number to incremental_start_timestamp and incremental_end_timestamp respectively if incremental_column_type: timestamp.

If you’re using scheduled execution, the connector automatically remembers the last value state and holds it internally. Then you can use it at the next scheduled execution. See Appendix: How Incremental Loading Works for details.

List the Schedules

You can see the list of currently scheduled entries by td connector:list.

$ td connector:list
+-----------------------+--------------+----------+-------+--------------+-----------------+------------------------------+
| Name                  | Cron         | Timezone | Delay | Database     | Table           | Config                       |
+-----------------------+--------------+----------+-------+--------------+-----------------+------------------------------+
| daily_magento_import  | 10 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"type"=>"magento", ... } |
+-----------------------+--------------+----------+-------+--------------+-----------------+------------------------------+

Show the Setting and History of Schedules

td connector:show shows the execution setting for a schedule entry.

% td connector:show daily_magento_import
Name     : daily_magento_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, use td job <jobid>.

% td connector:history daily_magento_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-18 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-17 00:10:07 +0000 | 161      |
| 577914 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-16 00:10:03 +0000 | 152      |
| 577872 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-15 00:10:04 +0000 | 163      |
| 577810 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-14 00:10:04 +0000 | 164      |
| 577766 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-13 00:10:04 +0000 | 155      |
| 577710 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-12 00:10:05 +0000 | 156      |
| 577610 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-11 00:10:04 +0000 | 157      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
8 rows in set


Delete the Schedule

td connector:delete removes the schedule.

$ td connector:delete daily_magento_import

Appendix

How can I get all the items from all the active carts on my Magento Site?

For those API that Magento 2 doesn't support off the shelf. Since TD Magento Connector simply requests from its REST API, you have to expose your own API and set the corresponding "Resource URL" (resource_url) configuration. Keep in mind that API have to implement SearchCriteriaInterface for Magento Connector to work.

How Incremental Loading works

Incremental loading requires a monotonically increasing unique column (incremental_column) to load records that were inserted (or updated) after the last execution.

This mode is useful when you want to fetch just the object targets that have changed since the previously scheduled run. For example, if incremental_column: updated_at option is set, the request is as follows: 

GET https://your-magento-server/rest/V1/searchable-resource
   &searchCriteria[filter_groups][0][filters][0][field]=updated_at
   &searchCriteria[filter_groups][0][filters][0][value]=<incremental_start_timestamp>
   &searchCriteria[filter_groups][0][filters][][condition_type]=gt

   &searchCriteria[filter_groups][0][filters][0][field]=updated_at
   &searchCriteria[filter_groups][0][filters][0][value]=<incremental_end_timestamp>
   &searchCriteria[filter_groups][0][filters][][condition_type]=lt

When bulk data loading finishes successfully, it outputs incremental_last_value and incremental_run_count(number of jobs ran, starting at 0) parameters as config-diff so that the next execution uses it.
At the next execution, when those parameters are also set, this plugin uses incremental_last_value as the new lower bound, condition_type will also switch from gt to gteq. For upper bound, it is calculated by: incremental_start_<timestamp/number> + incremetal_run_count * (incremental_end_<timestamp/number> - incremental_start_<timestamp/number>), condition_type stays as lt.

GET https://your-magento-server/rest/V1/searchable-resource
 &searchCriteria[filter_groups][0][filters][0][field]=updated_at
 &searchCriteria[filter_groups][0][filters][0][value]=<incremental_last_value>
 &searchCriteria[filter_groups][0][filters][][condition_type]=gteq

 &searchCriteria[filter_groups][0][filters][0][field]=updated_at
 &searchCriteria[filter_groups][0][filters][0][value]=<increased_end_value_with_the_mentioned_formula>
 &searchCriteria[filter_groups][0][filters][][condition_type]=lt

Currently, only strings, timestamp and integers are supported as incremental_columns.

If incremental: false is set, the data connector fetches all the records of the specified Magento object type, regardless of when they were last updated. This mode is best combined with writing data into a destination table using ‘replace’ mode.

  • No labels