
How to integrate Salesforce data with Treasure Data

You can connect Salesforce with Treasure Data for greater control over your Salesforce Data and better integration with the other business applications in your marketing and sales operations stack.

Integrating Salesforce with Treasure Data makes it easy to:

  • Add new features to Salesforce. For example, you can prevent churn by tracking web usage and receiving alerts when customers’ product usage declines.

  • Use Salesforce data to improve other parts of your marketing stack. For example, you can increase your Facebook Ads ROI by automatically removing new customers from your Facebook Custom Audiences.

If you don’t have a Treasure Data account yet, contact us so that we can get you set up!


How to connect to Salesforce using the Treasure Data Console

Connecting to Salesforce using the Treasure Data Console is quick and easy. (For instructions on how to connect using the command line, see How to connect to Salesforce using the command line below.)

Using Credentials

In Salesforce

Go to Setup -> Apps -> App Manager and click New Connected App. (Note that the steps may vary depending on the version of Salesforce you are running.)


From Setup -> Build -> Create (Apps), you can see all of your connected apps.


Click on your app name to go to a page where you can view and manage all information about your connected app. From there, you can get your Consumer Key (client_id) and Consumer Secret (client_secret), which you will need for the next step.


For secure account access, get a Salesforce Security Token if you don’t already have one. To get a security token, go to Account -> Settings -> Reset My Security Token and click Reset Security Token. You’ll receive your Security Token by email.

In the Treasure Data console

Go to Connections -> Sources Catalog and click on the Salesforce tile.


In the dialog box, enter https://login.salesforce.com/ as the login URL. Enter your username (your email) and password, as well as your Client ID, Client Secret and Security Token.

Give your connection a descriptive name and select Create Connection.

Using OAuth2

Go to Connections -> Sources Catalog and search for Salesforce.


Click on the Salesforce connector. In the dialog to create a new connection, open the Authenticate method dropdown list and choose OAuth.


Click Click here to connect to a new account. You are redirected to Salesforce to log in if you haven’t logged in yet, or to the consent page to grant access to Treasure Data.




After you grant access to Treasure Data, you are redirected back to the Treasure Data Console. Choose the Salesforce connector again, then choose the OAuth Authenticate method as in the preceding step. You will see an OAuth connection with your account name in the dropdown list. Choose the account you want to use and then proceed to create the connection.


Note

Your Salesforce account’s profile must have API Enabled. You can enable this permission in Setup -> Administer -> Manage Users -> Profiles.


How to transfer your Salesforce data to Treasure Data

Select the Salesforce connection you created in Connections -> My Connections.

Click on New Transfer.


Enter the name of the database you want to import in the Target field and click Next.

You’ll see a preview of your data. At this stage, you can click on Advanced Setting to manually modify the SOQL and the schema.


Click Next to go to the Transfer to step.


If you are creating a new database, select Create new database and give your database a name. Complete the same steps for Create new table.

Select whether to append records to an existing table or replace your existing table.

If you want to set a different partition key seed rather than use the default key, you can specify one using the popup menu.


In the When tab, you can specify a one-time transfer, or you can schedule an automated recurring transfer. If you select Once now, click Start Transfer. If you select Repeat… specify your schedule options, then click Schedule Transfer.

After your transfer has run, you can see the results of your transfer in the Databases tab.


How to run a query and output the results to Salesforce

In the Queries tab, click New Query.

Select a database and table from the popup menus. Enter your query and select Output results.


Click on your saved connection.


Enter the name of the Salesforce Data Object that you want your query written to. Choose whether you want to append, truncate or update the object. Click Done to run your query and output the results to Salesforce.

How to connect to Salesforce using the command line

Install the Treasure Data Toolbelt

Open a terminal and install the newest Treasure Data Toolbelt. You can verify the installation by checking the version:

$ td --version
0.15.3

In Salesforce

Go to Setup -> Apps -> App Manager and click New Connected App. (Note the procedure may vary depending on what version of Salesforce you are running.)


Click Manage Connected Apps and get your Consumer Key and Consumer Secret, which you will need for the next step.

For secure account access, if you don’t have one already, you should get a Salesforce Security Token. To do this, go to Account -> Settings -> Reset My Security Token and click Reset Security Token. You’ll receive your Security Token by email.

Create a Seed Config File (seed.yml)

Using a text editor, create a file called seed.yml. Copy and paste the information below, replacing the placeholder text with your Salesforce credentials. Note that this configuration dumps the entire Account object specified in the target field because “replace” mode is specified. For more details on the available out modes, see the Appendix.

in:
  type: sfdc
  username: "USERNAME"                     # your username
  password: "PASSWORD"                     # your password
  security_token: "SECURITY_TOKEN"         # your Security Token
  client_id: "CLIENT_ID"                   # your app's consumer key
  client_secret: "CLIENT_SECRET"           # your app's consumer secret
  login_url: https://login.salesforce.com/ # test.salesforce.com for sandbox
  incremental: false                       # 'full dump', see Scheduled execution below
  target: Account                          # Salesforce Object you want to import
out:
  mode: replace

All Salesforce Objects are supported for the target option. Some common objects include:

  • Opportunity
  • Contact
  • Lead
  • Account
  • Event
  • Task

(For more information about Salesforce Objects, refer to the full reference at the Salesforce Developer portal.)
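For example, to import Leads instead of Accounts, only the target field of the seed file changes. A sketch, with the same placeholder credentials as above:

```yaml
in:
  type: sfdc
  username: "USERNAME"                     # your username
  password: "PASSWORD"                     # your password
  security_token: "SECURITY_TOKEN"         # your Security Token
  client_id: "CLIENT_ID"                   # your app's consumer key
  client_secret: "CLIENT_SECRET"           # your app's consumer secret
  login_url: https://login.salesforce.com/
  incremental: false
  target: Lead                             # import the Lead object instead of Account
out:
  mode: replace
```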

NOTE: The objects, fields and records accessible to you depend on your Salesforce license, security configuration and API access configuration in your Salesforce organization. Check your configurations if you see authorization errors in the next step.

Run the Guess Fields command (generate load.yml)

Run the following command in your terminal:

$ td connector:guess seed.yml -o load.yml

The connector:guess command automatically reads the target data and intelligently guesses the data format.

Open the file load.yml. You will see the guessed definitions including, in some cases, file formats, encodings, column names, and types.

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  ...
filters:
  ...
out:
  mode: replace

You can preview how the system will parse the data by using the preview command.

$ td connector:preview load.yml

If the system detects the wrong column name or type, modify load.yml and preview again.

NOTE: Currently, the Data Connector supports parsing of “boolean”, “long”, “double”, “string”, and “timestamp” types.
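For example, if guess maps a numeric field to string, you can override the type in the columns section of load.yml before issuing the job. A sketch with illustrative column names (check your own guessed output for the actual names):

```yaml
in:
  type: sfdc
  ...
  columns:
    - name: Id
      type: string
    - name: NumberOfEmployees
      type: long          # override a type that guess detected incorrectly
    - name: CreatedDate
      type: timestamp
```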

Execute the Load Job

Finally, in your terminal, submit the load job as shown in the following example. Processing might take a couple of hours depending on the data size. You need to specify the database and table where your data will be stored.

$ td connector:issue load.yml --database td_sample_db --table td_sample_table \
                              --time-column createddate

NOTE: The preceding command assumes you have already created the database (td_sample_db) and table (td_sample_table). If the database or table does not exist in TD, the command will not succeed, so create the database and table manually or use the --auto-create-table option with the td connector:issue command, as shown below, to auto-create them.

$ td connector:issue load.yml --database td_sample_db --table td_sample_table \
                              --time-column createddate --auto-create-table

NOTE: You can assign a Time Format column to the “Partitioning Key” by using the --time-column option.

NOTE: If you want to ingest records that have been deleted in SFDC, specify include_deleted_records: true. By default, ingested records don’t include records that you have previously deleted in Salesforce.

Here’s an example of a load file using the include_deleted_records option:

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  include_deleted_records: true # Ingest records that have been deleted
  ...
filters:
  ...
out:
  mode: replace

Scheduled execution using the command line

You can schedule the Data Connector to run periodically for recurring SFDC imports. We take great care in distributing and operating our scheduler in order to achieve high availability. By using this feature, you no longer need a cron daemon in your local data center.

For the scheduled import, Data Connector for SFDC imports all Objects that match the specified target.

Scheduled execution supports additional configuration parameters that control the behavior of the Data Connector during its periodic attempts to fetch data from SFDC:

  • incremental This configuration is used to control the load mode, which governs how the Data Connector fetches data from SFDC based on one of the native timestamp fields associated with each Object.

    • incremental: true (default)
      In this mode, the data connector fetches only records of the specified Salesforce Object type that have been updated since the previous run of the connector. This mode is useful when you want to fetch just the records that have changed since the previous scheduled run. It is typically combined with writing data into the destination table using ‘append’ mode.
      • incremental_columns (required)
        This option is required for incremental mode, to load necessary data only from SFDC. Make sure these columns are selected if you use a custom SOQL.
    • incremental: false
      In this mode, the data connector fetches all the records of the specified Salesforce Object type, regardless of when they were last updated. This mode is best combined with writing data into a destination table using ‘replace’ mode.
  • soql This configuration allows you to use a custom SOQL (Salesforce Object Query Language) query to filter data. With SOQL, you can retrieve data from a single object or from multiple objects that are related to each other. You can select only the columns you are interested in, and filter or sort the output with your own conditions.

  • columns This configuration is used to define a custom schema for data to be imported into Treasure Data. You can define only the columns you are interested in, but make sure they exist in the object you are fetching; otherwise, these columns aren’t available in the result.

  • last_record This configuration is used to track the last record from the previous load job. It requires an object that includes a key for the column name and a value for the column’s value. The key needs to match the SFDC (SOQL) column name.

Here’s an example of a seed file using incremental mode combined with ‘append’ mode for output.

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  target: Account
  incremental: true
  incremental_columns:
    - LastModifiedDate
out:
  mode: append

The preceding configuration uses the LastModifiedDate column to load records incrementally. With that configuration, the Data Connector generates SOQL (Salesforce Object Query Language) queries as follows.

-- first execution:
SELECT Id, Name, ... FROM Account ORDER BY LastModifiedDate

-- second execution:
SELECT Id, Name, ... FROM Account WHERE (LastModifiedDate > 2015-10-21T10:33:07+0000) ORDER BY LastModifiedDate
-- '2015-10-21T10:33:07+0000' is the latest modified record time from first execution

-- third execution:
SELECT Id, Name, ... FROM Account WHERE (LastModifiedDate > 2015-10-30T22:30:41+0000) ORDER BY LastModifiedDate
-- '2015-10-30T22:30:41+0000' is the latest modified record time from second execution

You can see that the WHERE clause is updated on every execution so that only modified records are loaded.
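The query construction shown in the preceding executions can be sketched as follows. This is a simplified illustration of the behavior, not the connector’s actual implementation:

```python
def build_incremental_soql(target, columns, incremental_column, last_value=None):
    """Build the SOQL an incremental run would issue.

    last_value is the newest incremental_column value seen in the
    previous run (None on the first execution).
    """
    soql = "SELECT {} FROM {}".format(", ".join(columns), target)
    if last_value is not None:
        # Only fetch records modified after the previous run.
        soql += " WHERE ({} > {})".format(incremental_column, last_value)
    return soql + " ORDER BY " + incremental_column

# First execution: no last value, so all records are fetched.
print(build_incremental_soql("Account", ["Id", "Name"], "LastModifiedDate"))
# Second execution: only records modified since the previous run.
print(build_incremental_soql("Account", ["Id", "Name"], "LastModifiedDate",
                             "2015-10-21T10:33:07+0000"))
```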

Here’s an example of a seed file using non-incremental mode (incremental: false) combined with ‘replace’ mode for output.

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  incremental: false
  target: Account
out:
  mode: replace

With the preceding configuration, the Data Connector always loads all records and replaces all existing records with loaded records.

The following is an example of a seed file using incremental mode and last_record combined with ‘append’ mode for output.

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  target: Account
  incremental: true
  incremental_columns:
    - LastModifiedDate
  last_record:
    - {"key": "LastModifiedDate", "value": "2017-01-01 00:00:00UTC"}
out:
  mode: append

With the preceding configuration, the Data Connector loads all records modified since 2017-01-01 00:00:00 UTC.

Here’s an example of a seed file using incremental mode combined with custom SOQL and columns schema for output.

in:
  type: sfdc
  username: "USERNAME"
  password: "PASSWORD"
  security_token: "SECURITY_TOKEN"
  client_id: "CLIENT_ID"
  client_secret: "CLIENT_SECRET"
  login_url: https://login.salesforce.com/
  target: Account
  incremental: true
  incremental_columns:
    - LastModifiedDate
  soql: "select Id, Name, Description, LastModifiedDate from Account where ownership = 'Public'"
  columns:
    - name: Id
      type: string
    - name: Name
      type: string
    - name: Desc
      type: string
    - name: LastModifiedDate
      type: string
out:
  mode: append

As you can see, the SOQL selects a subset of columns with a filter on the ‘ownership’ field. The schema also defines a field name that does not exist in the SOQL results (Desc), so that column won’t be available in the imported result.

Create the schedule

You can create a new schedule by using the td connector:create command. The name of the schedule, a cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file are required.

$ td connector:create \
    daily_sfdc_import \
    "10 0 * * *"      \
    td_sample_db      \
    td_sample_table   \
    load.yml

NOTE: The cron parameter also accepts three special options: @hourly, @daily and @monthly.

By default, the schedule is set in the UTC time zone. You can set the schedule in a time zone using the `-t` or `--timezone` option. The `--timezone` option supports only extended timezone formats like 'Asia/Tokyo', 'America/Los_Angeles' etc. Timezone abbreviations like PST, CST are *not* supported and may lead to unexpected schedules.

List the Schedules

You can see the list of currently scheduled entries by entering the command td connector:list.

$ td connector:list
+-------------------+--------------+----------+-------+--------------+-----------------+------------------------+
| Name              | Cron         | Timezone | Delay | Database     | Table           | Config                 |
+-------------------+--------------+----------+-------+--------------+-----------------+------------------------+
| daily_sfdc_import | 10 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"type"=>"sfdc", ... } |
+-------------------+--------------+----------+-------+--------------+-----------------+------------------------+

Show the Setting and History of Schedules

td connector:show shows the execution setting of a schedule entry.

% td connector:show daily_sfdc_import
Name     : daily_sfdc_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, use td job <jobid>.

% td connector:history daily_sfdc_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-18 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-17 00:10:07 +0000 | 161      |
| 577914 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-16 00:10:03 +0000 | 152      |
| 577872 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-15 00:10:04 +0000 | 163      |
| 577810 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-14 00:10:04 +0000 | 164      |
| 577766 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-13 00:10:04 +0000 | 155      |
| 577710 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-12 00:10:05 +0000 | 156      |
| 577610 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-11 00:10:04 +0000 | 157      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
8 rows in set

Delete the Schedule

td connector:delete removes the schedule.

$ td connector:delete daily_sfdc_import

Appendix

A) Modes for out plugin

You can specify the file import mode in the out section of seed.yml.

append (default)

Records are appended to the target table.

in:
  ...
out:
  mode: append

replace (In td 0.11.10 and later)

This mode replaces data in the target table. Any manual schema changes made to the target table remain intact with this mode.

in:
  ...
out:
  mode: replace

Last modified: May 22 2018 15:57:19 UTC
