You can import Business Cards, or the Tags that you create in Sansan to categorize business cards, into Treasure Data. You can also import Business Cards with their attached Tag Names.

Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.

  • A Sansan API key

  • Authorized Treasure Data account access

Obtain Sansan API Key Information

Log in to Sansan, then select Settings.


From the Settings page, select API Key on the menu and copy the API Key value.


Use the TD Console to Create Your Connection

Create a New Connection

Go to Integrations Hub > Catalog, then search for and select Sansan.


Select Create. The following dialog opens.


Enter the required credentials: input the Sansan API Key that you copied in the previous step. Select Continue. Name your new Sansan connection. Select Done.

Transfer Your Sansan Data to Treasure Data

After creating the connection, you are automatically taken to the Authentications tab. Look for the connection you created and select New Source.


The dialog opens. Complete the details as follows.

Select the Data Type and Scope of Data

Specify the Sansan data type that you want to import: Business Card or Tag.


Specify other parameters:

  • Import a set of business cards based on: Select Term or Condition. Select Term to import cards updated within a specified timeframe. Select Condition to import cards based on specified Tag Names.

  • Include previous business card information: Include past business cards. Check this box if you want business cards that are recognized as belonging to the same person to be imported as separate cards.

  • Filter by range of holder: The range of business cards can be specified as “me” (only the cards held by oneself) or as “all” (all the cards within the range that can be viewed by oneself).

Specify Import Business Card by Terms

Complete the following fields if you want to import Business Cards based on Sansan-defined Terms. A Term is a timeframe: the range of time in which the business cards were created or updated in Sansan.



Parameters:

  • Data Card Updated From: Import business cards updated since this time. Time is set in UTC.

  • Data Card Updated To: Import business cards updated until this time. Time is set in UTC.

  • Incremental: When importing based on a schedule, the time window of the fetched data automatically shifts forward on each run. For example, if the initial config is January 1, with ten days in duration, the first run fetches data modified from January 1 to January 10, the second run fetches from January 11 to January 20, and so on.
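The shifting window described for Incremental can be pictured with a short Python sketch. It is an illustration only, not the connector's implementation: the function name next_windows is hypothetical, the year 2018 is arbitrary, and the sketch assumes that each run advances the window by exactly its own length and that the end of a window is exclusive (so "January 1 to January 10" is written as January 1 00:00 up to January 11 00:00 UTC).

```python
from datetime import datetime, timezone


def next_windows(updated_from, updated_to, runs):
    """Return the (from, to) window for each of the first `runs` scheduled runs.

    Assumption: every run moves the window forward by its own duration.
    """
    duration = updated_to - updated_from
    windows = []
    start = updated_from
    for _ in range(runs):
        windows.append((start, start + duration))
        start += duration
    return windows


# Initial window: ten days starting on January 1 (end is exclusive).
first_from = datetime(2018, 1, 1, tzinfo=timezone.utc)
first_to = datetime(2018, 1, 11, tzinfo=timezone.utc)

for window_from, window_to in next_windows(first_from, first_to, 3):
    print(window_from.isoformat(), "->", window_to.isoformat())
```

Whether the real connector treats the boundaries as inclusive or exclusive is not stated here, so check a preview or a first run before relying on exact edges.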

Specify Import Business Card by Condition

Complete the following fields if you want to import Business Cards based on Sansan-defined Conditions.


Parameters:

  • Filter by Tag: Import business cards that contain a Tag. Select this box if you want to import by Tag. If you select it, you must specify a Tag name. Leave the box unchecked if you don't want to match business cards by tags.

  • Tag Name: Import business cards that have the attached tag name specified. Enter only one name. Example: If you enter 'Treasure_Data', all business cards that contain the Treasure_Data tag name are imported.

  • Tag range: Specify the range of Tag holders. Import business cards that have an attached tag within the range of Me or All.


Specify Import of Tag Data

You can choose to import only Sansan tags, without business cards. Import Tag data into Treasure Data to better manage the tags and align information.


Parameters:

  • Tag range: Specify the range of Tag holders. Import tags within the range of Me or All.

After completing your configuration, select Next.


Data Preview 


You can see a preview of your data before running the import by selecting Generate Preview.

Data shown in the data preview is approximated from your source. It is not the actual data that is imported.

  1. Data preview is optional. To skip it and move to the next page of the dialog, select Next.

  2. To preview your data, select Generate Preview.

  3. Verify that the data looks approximately like you expect it to.


  4. Select Next.


Advanced Settings


You can specify the following parameters:

  • Page size. Determines the number of records that will be returned for each call to the Sansan REST API.

      Type: number
      Default: 300
    
  • Maximum retry times. Specifies the maximum number of retries for each API call.

      Type: number
      Default: 3
    
  • Initial retry interval millisecond. Specifies the wait time, in milliseconds, before the first retry.

      Type: number
      Default: 20000
    
  • Maximum retry interval milliseconds. Specifies the maximum wait time, in milliseconds, between retries.

      Type: number
      Default: 120000
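
To get a feel for how these settings interact, the following Python sketch estimates the number of API calls for a given record count and lists the waits before each retry. It is a rough illustration under stated assumptions, not the connector's code: the function names are hypothetical, and the sketch assumes that the wait doubles after every failed attempt and is capped at the maximum interval. The growth pattern between the initial and maximum interval is not documented here.

```python
import math


def api_calls_needed(total_records, page_size=300):
    """Number of paged API calls needed to fetch `total_records` records."""
    return math.ceil(total_records / page_size)


def retry_waits_ms(max_retries=3, initial_ms=20000, max_ms=120000):
    """Wait before each retry, assuming the interval doubles up to a cap."""
    waits = []
    wait = initial_ms
    for _ in range(max_retries):
        waits.append(min(wait, max_ms))
        wait *= 2  # assumption: exponential growth between retries
    return waits


print(api_calls_needed(10000))        # calls for 10,000 records at the default page size
print(retry_waits_ms())               # waits with the default settings
print(retry_waits_ms(max_retries=5))  # the cap becomes visible with more retries
```

Under these assumptions, raising the retry count has little effect once the waits reach the maximum interval, while lowering the page size increases the number of calls proportionally.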
    

Data Placement

Create a new database and give your database a name. Complete similar steps for Create new table.

Select whether to append records to an existing table or replace your existing table.

If you want to set a different partition key seed rather than use the default key, you can specify one using the popup menu.

For data placement, select the target database and table where you want your data placed and indicate how often the import should run.

  1. Select Next. Under Storage, create a new database or select an existing one, then create a new table or select an existing one, to hold the imported data.

  2. Select a Database > Select an existing or Create New Database.

  3. Optionally, type a database name.

  4. Select a Table > Select an existing or Create New Table.

  5. Optionally, type a table name.

  6. Choose the method for importing the data.

    • Append (default): Data import results are appended to the table.
      If the table does not exist, it will be created.

    • Always Replace: Replaces the entire content of an existing table with the result output of the query. If the table does not exist, a new table is created.

    • Replace on New Data: Replaces the entire content of an existing table with the result output only when there is new data.

  7. Select the Timestamp-based Partition Key column.
    If you want to set a different partition key seed than the default key, you can specify the long or timestamp column as the partitioning time. As a default time column, it uses upload_time with the add_time filter.

  8. Select the Timezone for your data storage.

  9. Under Schedule, you can choose when and how often you want to run this query.

    • Run once:
      1. Select Off.

      2. Select Scheduling Timezone.

      3. Select Create & Run Now.

    • Repeat the query:

      1. Select On.

      2. Select the Schedule. The UI provides these four options: @hourly, @daily, @monthly, or custom cron.

      3. You can also select Delay Transfer and add a delay of execution time.

      4. Select Scheduling Timezone.

      5. Select Create & Run Now.

After your transfer has run, you can see the results of your transfer in Data Workbench > Databases.


Details

Name your Transfer and select Done to start.


After your transfer has run, you can see the results of your transfer in the Databases tab.

Use Command Line to Create Your Sansan Connection

You can use the TD Toolbelt from the command line to configure your connection.


Install the Treasure Data Toolbelt

Install the newest Treasure Data Toolbelt.

Create a Configuration File (load.yml)

The configuration file includes an in: section where you specify what comes into the connector from Sansan and an out: section where you specify what the connector puts out to the database in Treasure Data. For more details on available out modes, see the Appendix.

The following example shows how to specify the import of Business Cards based on a specified Term, without incremental scheduling.

in:
  api_key: "api key"
  target: bizcard
  type: sansan
  query_by: "term"
  biz_range: "all"
  include_prev_card: true
out:
  mode: append

The following example shows how to specify the import of Business Cards based on a specified Term, with incremental scheduling.

in:
  api_key: "api key"
  target: bizcard
  type: sansan
  query_by: "term"
  biz_range: "all"
  include_prev_card: true
  updated_from: "2018-11-01T00:00:00.000Z"
  updated_to: "2018-11-12T00:00:00.000Z"
  incremental: true
out:
  mode: append

The following example shows how to specify the import of Business Cards based on a specified Condition without a Tag filter.

in:
  api_key: "api key"
  target: bizcard
  type: sansan
  query_by: "condition"
  biz_range: "all"
  include_prev_card: true
out:
  mode: append

The following example shows how to specify the import of Business Cards based on a specified Condition with a Tag filter.

in:
  api_key: "api key"
  target: bizcard
  type: sansan
  query_by: "condition"
  biz_range: "all"
  include_prev_card: true
  biz_tag_filter: true
  biz_tag_name: "tag name"
  biz_tag_range: "all"
out:
  mode: append

The following example shows how to specify the import of Tags only.

in:
  api_key: "api key"
  target: tag
  type: sansan
  tag_range: "all"
out:
  mode: append
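
Before previewing a configuration, it can help to check that the in: section is consistent with the rules described in this topic. The following Python sketch does that for a configuration already loaded into a dictionary. It is only an illustration: the function name check_in_section is hypothetical, the accepted values are taken from the examples above ("term" or "condition", "me" or "all"), and the connector's own validation may differ.

```python
VALID_RANGES = {"me", "all"}


def check_in_section(cfg):
    """Return a list of problems found in the `in:` section (empty if none)."""
    problems = []
    if cfg.get("type") != "sansan":
        problems.append("type must be sansan")

    target = cfg.get("target")
    if target == "bizcard":
        if cfg.get("query_by") not in {"term", "condition"}:
            problems.append("query_by must be term or condition")
        if "biz_range" in cfg and cfg["biz_range"] not in VALID_RANGES:
            problems.append("biz_range must be me or all")
        # Filtering by tag requires a tag name.
        if cfg.get("biz_tag_filter") and not cfg.get("biz_tag_name"):
            problems.append("biz_tag_name is required when biz_tag_filter is true")
    elif target == "tag":
        if "tag_range" in cfg and cfg["tag_range"] not in VALID_RANGES:
            problems.append("tag_range must be me or all")
    else:
        problems.append("target must be bizcard or tag")
    return problems


config = {
    "api_key": "api key",
    "target": "bizcard",
    "type": "sansan",
    "query_by": "condition",
    "biz_range": "all",
    "biz_tag_filter": True,  # tag name deliberately left out
}
for problem in check_in_section(config):
    print(problem)
```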

Preview the Data to be Imported (Optional)

You can preview data to be imported using the command td connector:preview.

$ td connector:preview load.yml 

Execute the Load Job

You use td connector:issue to execute the job.

You must specify the database and table where you want to store the data before you execute the load job, for example td_sample_db and td_sample_table.

$ td connector:issue load.yml \
     --database td_sample_db \
     --table td_sample_table \
     --time-column date_time_column

We recommend specifying the --time-column option, because Treasure Data’s storage is partitioned by time. If the option is not given, the data connector selects the first long or timestamp column as the partitioning time. The column specified by --time-column must be of either long or timestamp type. Use the preview results to check the available column names and types. Generally, most data types have a last_modified_date column.

If your data doesn’t have a time column, you can add the column by using the add_time filter option. See details at add_time filter plugin.

td connector:issue assumes that you have already created the database (td_sample_db) and the table (td_sample_table). If the database or the table does not exist in TD, td connector:issue fails. Either create the database and table manually, or use --auto-create-table with td connector:issue to create them automatically.

$ td connector:issue load.yml \
      --database td_sample_db \
      --table td_sample_table \
      --time-column date_time_column \
      --auto-create-table
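
If you launch load jobs from a script rather than typing the command, the same invocation can be assembled programmatically. The Python sketch below builds the argument list using only the options shown above. The helper name build_issue_command is hypothetical, and the sketch only prints the command. Running it would require the TD Toolbelt to be installed and authenticated.

```python
import shlex


def build_issue_command(config_path, database, table,
                        time_column=None, auto_create_table=False):
    """Build the argument list for `td connector:issue`."""
    argv = ["td", "connector:issue", config_path,
            "--database", database,
            "--table", table]
    if time_column:
        argv += ["--time-column", time_column]
    if auto_create_table:
        argv.append("--auto-create-table")
    return argv


argv = build_issue_command(
    "load.yml", "td_sample_db", "td_sample_table",
    time_column="date_time_column", auto_create_table=True,
)
print(shlex.join(argv))
# To actually submit the job, pass the list to subprocess.run(argv, check=True).
```

Passing the arguments as a list avoids the line-continuation and quoting pitfalls of a hand-typed multi-line shell command.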

Finally, from the command line, submit the load job. Processing might take a couple of hours depending on the data size.

Scheduled Execution

You can schedule periodic data connector execution for recurring Sansan imports. We configure our scheduler carefully to ensure high availability. By using this feature, you no longer need a cron daemon in your local data center.

Scheduled execution supports configuration parameters that control the behavior of the data connector during its periodic attempts to fetch data from Sansan:

  • incremental: Controls the load mode, which governs how the data connector fetches data from Sansan based on one of the native timestamp fields associated with each object.

  • columns: Defines a custom schema for the data to be imported into Treasure Data. You can list only the columns that you are interested in, but make sure they exist in the object that you are fetching. Otherwise, these columns aren’t available in the result.

  • last_record: Controls the last record from the previous load job. It requires an object that includes a key for the column name and a value for the column’s value. The key must match the Sansan column name.

See Appendix: How Incremental Loading works for details and examples.

Create the Schedule

A new schedule can be created using the td connector:create command. The name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the data connector configuration file are required.

In addition to cron expressions such as `10 0 * * *`, the `cron` parameter accepts these shortcut options: `@hourly`, `@daily`, and `@monthly`.

By default, the schedule is set up in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. The `--timezone` option supports only extended timezone formats such as 'Asia/Tokyo' and 'America/Los_Angeles'. Timezone abbreviations such as PST and CST are *not* supported and may lead to unexpected schedules.

$ td connector:create \
    daily_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml
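
The schedule string in this command is a standard five-field cron expression. As a reading aid, the following Python sketch labels each field. The function name describe_cron is hypothetical, and the sketch handles only plain five-field expressions, not the shortcut options.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected 5 fields, got %d" % len(parts))
    return dict(zip(CRON_FIELDS, parts))


for name, value in describe_cron("10 0 * * *").items():
    print(name, "=", value)
```

Read this way, "10 0 * * *" means minute 10 of hour 0 on every day, that is, 00:10 each day in the schedule's timezone.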

It’s also recommended to specify the --time-column option, since Treasure Data’s storage is partitioned by time.

$ td connector:create \
    daily_import \
    "10 0 * * *" \
    td_sample_db \
    td_sample_table \
    load.yml \
    --time-column created_at

List the Schedules

You can see the list of currently scheduled entries by entering the command td connector:list.

$ td connector:list
+--------------+------------+----------+-------+--------------+-----------------+----------------------------+
| Name         | Cron       | Timezone | Delay | Database     | Table           | Config                     |
+--------------+------------+----------+-------+--------------+-----------------+----------------------------+
| daily_import | 10 0 * * * | UTC      | 0     | td_sample_db | td_sample_table | {"in"=>{"type"=>"sansan",  |
+--------------+------------+----------+-------+--------------+-----------------+----------------------------+

Show the Schedule Settings and History of Schedules

td connector:show shows the execution setting of a schedule entry.

$ td connector:show daily_import
Name     : daily_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table
Config
---
in:
  api_key: "api key"
  target: tag
  type: sansan
  tag_range: "all"

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, use td job <jobid>.

$ td connector:history daily_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2018-04-18 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2018-04-17 00:10:07 +0000 | 161      |
| 577914 | success | 10000   | td_sample_db | td_sample_table | 0        | 2018-04-16 00:10:03 +0000 | 152      |
| 577872 | success | 10000   | td_sample_db | td_sample_table | 0        | 2018-04-15 00:10:04 +0000 | 163      |
| 577810 | success | 10000   | td_sample_db | td_sample_table | 0        | 2018-04-14 00:10:04 +0000 | 164      |
| 577766 | success | 10000   | td_sample_db | td_sample_table | 0        | 2018-04-13 00:10:04 +0000 | 155      |
| 577710 | success | 10000   | td_sample_db | td_sample_table | 0        | 2018-04-12 00:10:05 +0000 | 156      |
| 577610 | success | 10000   | td_sample_db | td_sample_table | 0        | 2018-04-11 00:10:04 +0000 | 157      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
8 rows in set

Delete the Schedule

td connector:delete removes the schedule.

$ td connector:delete daily_import

Appendix

Modes for the Out Plugin

You can specify file import mode in the out section of the load.yml file.

The out section controls how data is imported into a Treasure Data table.
For example, you may choose to append data or replace data in an existing table in Treasure Data.

Output modes are ways to modify the data as the data is placed in Treasure Data.

  • Append (default): Records are appended to the target table.

  • Replace (available in td 0.11.10 and later): Replaces data in the target table. Any manual schema changes made to the target table remain intact.

Examples:

in:
  ...
out:
  mode: append


in:
  ...
out:
  mode: replace


How Incremental Loading Works

Incremental loading uses monotonically increasing unique columns (such as AUTO_INCREMENT column) to load records that were inserted (or updated) after the last execution.

If incremental: true is set, this connector loads all records with an additional ORDER BY clause. This mode is useful when you want to fetch just the object targets that have changed since the previous scheduled run. For example, if the incremental_columns: [updated_at, id] option is set, the query is as follows:

SELECT * FROM (
 ...original query is here...
)
ORDER BY updated_at, id

When bulk data loading finishes successfully, it outputs last_record: parameter as config-diff so that next execution uses it.
At the next execution, when last_record: is also set, this plugin generates additional WHERE conditions to load records larger than the last record. For example, if last_record: ["2017-01-01T00:32:12.000000", 5291] is set, the query is as follows:

SELECT * FROM (
 ...original query is here...
)
WHERE updated_at > '2017-01-01T00:32:12.000000' OR (updated_at = '2017-01-01T00:32:12.000000' AND id > 5291)
ORDER BY updated_at, id

Then, it updates last_record: so that the next execution uses the updated last_record.
IMPORTANT: If you set the incremental_columns: option, make sure that there is an index on the columns to avoid a full table scan. For this example, the following index should be created:

CREATE INDEX embulk_incremental_loading_index ON table (updated_at, id);
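
To make the last_record comparison concrete, the following Python sketch builds the kind of WHERE condition shown earlier from a list of incremental columns and the matching last_record values. It is an illustration of the comparison logic only, not the plugin's source: the function names are hypothetical, and quoting is reduced to the minimum needed for the example.

```python
def sql_literal(value):
    """Render a Python value as a simple SQL literal (numbers bare, text quoted)."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_where(incremental_columns, last_record):
    """Build a condition selecting rows ordered after `last_record`."""
    if len(incremental_columns) != len(last_record):
        raise ValueError("columns and last_record must have the same length")
    clauses = []
    for i, column in enumerate(incremental_columns):
        # All earlier columns tie with the last record; this one is larger.
        terms = [
            "%s = %s" % (c, sql_literal(v))
            for c, v in zip(incremental_columns[:i], last_record[:i])
        ]
        terms.append("%s > %s" % (column, sql_literal(last_record[i])))
        clause = " AND ".join(terms)
        clauses.append("(%s)" % clause if len(terms) > 1 else clause)
    return " OR ".join(clauses)


print(build_where(["updated_at", "id"], ["2017-01-01T00:32:12.000000", 5291]))
```

The tie-breaking clause on the second column is what prevents rows sharing the same updated_at value from being skipped or loaded twice.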

Recommended usage is to leave incremental_columns unset and let the connector automatically find an AUTO_INCREMENT primary key.

Currently, only Timestamp, Datetime, and numerical columns are supported as incremental_columns.
For a raw query, incremental_columns is required because the connector cannot detect the primary keys of a complex query.

If incremental: false is set, the data connector fetches all the records of the specified Sansan object type, regardless of when they were last updated. This mode is best combined with writing data into a destination table using ‘replace’ mode.

Incremental Loading for Data Extensions

Treasure Data supports incremental loading for Data Extensions that have a date field.

If incremental: true is set, the data connector loads records according to the range specified by the from_date and the fetch_days for the specified date field.

