Data Connector for Salesforce

Background

This Data Connector allows you to connect your Salesforce data to Treasure Data, so you can quickly start analyzing your business-critical information using simple SQL. It should take you about 20 minutes to set up.

You can also create a connection to run on a scheduled basis so you’re always analyzing the most recent data available. Continue below to get started.

Step 0: Install ‘td’ command v0.11.9 or later

Install the newest Treasure Data Toolbelt.

$ td --version
0.14.1
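
If you installed the toolbelt via RubyGems, updating to the latest version is typically a one-line command (a sketch assuming a RubyGems-based installation; platform-specific installers are also available):

$ gem install td

Re-run td --version afterwards to confirm you are on v0.11.9 or later.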

Step 1: Obtain OAuth Client ID and Client Secret

Visit Setup -> App Setup -> Create -> Apps in Salesforce, then scroll down to Connected Apps and click the New button. Fill in the form to create a Connected App for Treasure Data.


Then revisit Setup -> App Setup -> Create -> Apps and click Treasure Data under Connected Apps. The page shows the Consumer Key and Consumer Secret, both of which are used in the next step.

If you haven’t already signed up for an account, contact us and we’ll get you started.

Step 2: Obtain Security Token

To access the Salesforce API securely, every Salesforce user has a Security Token. If you haven't generated yours yet, go to My Settings -> Personal -> Reset My Security Token. You will receive a new Security Token by e-mail.

Step 3: Create Seed Config File (seed.yml)

First, prepare seed.yml as shown below with your Salesforce credentials. This configuration dumps the Account object specified in the target field, using "replace" mode for output. For more details on the available out modes, see the Appendix.

in:
  type: sfdc
  username: "USERNAME"                     # your username
  password: "PASSWORD"                     # your password
  security_token: "SECURITY_TOKEN"         # your Security Token
  client_id: "CLIENT_ID"                   # your app's consumer key
  client_secret: "CLIENT_SECRET"           # your app's consumer secret
  login_url: https://login.salesforce.com/ # test.salesforce.com for sandbox
  incremental: false                       # 'full dump', see Scheduled execution below
  target: Account                          # Salesforce Object you want to import
out:
  mode: replace

All Salesforce Objects are supported for the target option. Some common ones are:

  • Opportunity
  • Contact
  • Lead
  • Account
  • Event
  • Task

Please refer to the full reference at the Salesforce Developer portal.

Note: The accessible objects, fields, and records depend on the user's Salesforce license, security settings, and the API access configuration in your Salesforce organization. Check those configurations if you see an authorization error in Step 4.
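
For example, importing Contact instead of Account only requires changing the target field in seed.yml (a minimal sketch; the credential fields are the same as in the seed.yml above):

in:
  type: sfdc
  ...               # credentials as in the seed.yml above
  target: Contact   # import the Contact object instead of Account
out:
  mode: replace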

Step 4: Guess Fields (Generate load.yml)

Second, use the connector:guess command. This command automatically reads the target data and intelligently guesses the data format.

$ td connector:guess seed.yml -o load.yml

If you open load.yml, you'll see the guessed definitions, including column names and types.

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  ...
filters:
  ...
out:
  mode: replace

Then you can preview how the system will parse the data by using the preview command.

$ td connector:preview load.yml

If the system guesses a column name or type incorrectly, modify load.yml directly and run the preview again.
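
For example, if a column such as AccountNumber is guessed as long but you want to preserve leading zeros, you could override its type by hand (a hypothetical excerpt; the exact columns layout depends on what connector:guess generated for your Object):

in:
  type: sfdc
  ...
  columns:
    - {name: Id, type: string}
    - {name: AccountNumber, type: string}   # guessed as long; forced to string
    - {name: CreatedDate, type: timestamp}
  ...

After editing, run td connector:preview load.yml again to confirm the parsed output.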

Note: Currently, the Data Connector supports parsing of "boolean", "long", "double", "string", and "timestamp" types.

Step 5: Execute Load Job

Finally, submit the load job. It may take a couple of hours depending on the data size. You must specify the database and table where the data will be stored.

$ td connector:issue load.yml --database td_sample_db --table td_sample_table \
                              --time-column createddate

The above command assumes that you have already created the database (td_sample_db) and the table (td_sample_table). If the database or the table does not exist in TD, the command will fail, so either create the database and table manually or use the --auto-create-table option with the td connector:issue command to create them automatically:

$ td connector:issue load.yml --database td_sample_db --table td_sample_table \
                              --time-column createddate --auto-create-table

Note: You can assign a timestamp column as the partitioning key with the --time-column option.

Note: If you want to ingest records that have been deleted in SFDC, specify `include_deleted_records: true`. By default, the ingested records do not include deleted ones.

Example: load file using ‘include_deleted_records’ option

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  include_deleted_records: true # Ingest records that have been deleted
  ...
filters:
  ...
out:
  mode: replace

Scheduled execution

You can schedule periodic Data Connector execution for recurring SFDC imports. We take great care in distributing and operating our scheduler in order to achieve high availability. By using this feature, you no longer need a cron daemon in your local data center.

For the scheduled import, the Data Connector for SFDC imports all Objects that match the specified target.

Scheduled execution supports additional configuration parameters that control the behavior of the Data Connector during its periodic attempts to fetch data from SFDC:

  • incremental This configuration controls the load mode, which governs how the Data Connector fetches data from SFDC based on one of the native timestamp fields associated with each Object.
    • incremental: true (default)
      In this mode the Data Connector fetches only the records of the specified Salesforce Object type that have been updated since the previous run of the connector. This is useful when you want to fetch just the Object records that have changed since the previous scheduled run. This mode is typically combined with writing data into the destination table using ‘append’ mode.
      • incremental_columns (required)
        This option is required in incremental mode so that only the necessary data is loaded from SFDC, as in the following example.
    • incremental: false
      In this mode the Data Connector fetches all records of the specified Salesforce Object type, regardless of when they were last updated. This mode is best combined with writing data into the destination table using ‘replace’ mode.

Example: seed file using incremental mode combined with ‘append’ mode for output

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  target: Account
  incremental: true
  incremental_columns:
    - LastModifiedDate
out:
  mode: append

The above config specifies the column LastModifiedDate, which will be used to load records incrementally. The Data Connector builds a SOQL (Salesforce Object Query Language) query from that config, as shown below.

-- first execution:
SELECT Id, Name, ... FROM Account ORDER BY LastModifiedDate

-- second execution:
SELECT Id, Name, ... FROM Account WHERE (LastModifiedDate > 2015-10-21T10:33:07+0000) ORDER BY LastModifiedDate
-- '2015-10-21T10:33:07+0000' is the latest modified record time from first execution

-- third execution:
SELECT Id, Name, ... FROM Account WHERE (LastModifiedDate > 2015-10-30T22:30:41+0000) ORDER BY LastModifiedDate
-- '2015-10-30T22:30:41+0000' is the latest modified record time from second execution

You can see the WHERE clause was updated on every execution to load modified records only.

Example: seed file using non-incremental mode (incremental: false) combined with ‘replace’ mode for output

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  incremental: false
  target: Account
out:
  mode: replace

With the above configuration, the Data Connector will load all records on every run and replace all existing records with the newly loaded ones.

Create the schedule

A new schedule can be created using the td connector:create command. The name of the schedule, a cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file are required.

$ td connector:create \
    daily_sfdc_import \
    "10 0 * * *"      \
    td_sample_db      \
    td_sample_table   \
    load.yml

Note: The `cron` parameter also accepts three special options: `@hourly`, `@daily`, and `@monthly`.

Note: By default, the schedule is set up in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. Note that the `--timezone` option only supports extended timezone formats like 'Asia/Tokyo' and 'America/Los_Angeles'. Timezone abbreviations like PST and CST are *not* supported and may lead to unexpected schedules.
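
For example, to run the daily import at 00:10 Japan time instead of UTC, you could reuse the command above with the timezone option added (a sketch):

$ td connector:create \
    daily_sfdc_import \
    "10 0 * * *"      \
    td_sample_db      \
    td_sample_table   \
    load.yml          \
    --timezone Asia/Tokyo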

List the Schedules

You can see the list of currently scheduled entries with td connector:list.

$ td connector:list
+-------------------+--------------+----------+-------+--------------+-----------------+------------------------+
| Name              | Cron         | Timezone | Delay | Database     | Table           | Config                 |
+-------------------+--------------+----------+-------+--------------+-----------------+------------------------+
| daily_sfdc_import | 10 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"type"=>"sfdc", ... } |
+-------------------+--------------+----------+-------+--------------+-----------------+------------------------+

Show the Setting and History of Schedules

td connector:show shows the execution setting of a schedule entry.

% td connector:show daily_sfdc_import
Name     : daily_sfdc_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, please use td job <jobid>.

% td connector:history daily_sfdc_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-18 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-17 00:10:07 +0000 | 161      |
| 577914 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-16 00:10:03 +0000 | 152      |
| 577872 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-15 00:10:04 +0000 | 163      |
| 577810 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-14 00:10:04 +0000 | 164      |
| 577766 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-13 00:10:04 +0000 | 155      |
| 577710 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-12 00:10:05 +0000 | 156      |
| 577610 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-11 00:10:04 +0000 | 157      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
8 rows in set
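
For example, to inspect the most recent run from the history above, use the td job command mentioned earlier:

$ td job 578066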

Delete the Schedule

td connector:delete will remove the schedule.

$ td connector:delete daily_sfdc_import

Appendix

A) Modes for out plugin

You can specify the import mode in the out section of seed.yml.

append (default)

This is the default mode and records are appended to the target table.

in:
  ...
out:
  mode: append

replace (In td 0.11.10 and later)

This mode replaces data in the target table. Please note that any manual schema changes made to the target table will remain intact with this mode.

in:
  ...
out:
  mode: replace

Last modified: Feb 24 2017 09:27:52 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.