Writing Job Results into Salesforce.com (SFDC)
This article explains how to write job results to your Salesforce.com organization.
Prerequisites
- Basic knowledge of Treasure Data, including the toolbelt
- Salesforce.com organization and username, password, and security token for API integration
- User has “API Enabled” permission
- Target Salesforce.com Object should exist with read/write permissions for the User
A front-end application streams data to be collected in Treasure Data via the log/data collector daemon or the Mobile SDKs. You can also bulk import your data using Bulk Import from the CLI. A scheduled query is set up in Treasure Data to run periodically on the data and write the result of each query execution into your Salesforce.com Object.
The above is a fairly common architecture; below are a few examples.
Example 1: Ranking: What are the “Top N of X?”
Every social/mobile application calculates the “top N of X” (ex: top 5 movies watched today). Treasure Data already handles the raw data warehousing; the “write-to-Salesforce.com” feature enables Treasure Data to find the “top N” data as well.
Example 2: Dashboard Application
If you’re a data scientist, you need to keep track of a range of metrics every hour/day/month and make them accessible via visualizations. Using this “write-to-Salesforce.com” feature, you can streamline the process and focus on building visualizations of your query results via Reports and Dashboards on the Salesforce.com organization.
Result Output URL Format
The result output target is represented by a URL with the following format:
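Putting the components below together, the URL has this general shape (a sketch reconstructed from the examples later in this article; note that the password and security token are simply concatenated):

```
sfdc://<username>:<password><security_token>@<hostname>/<object_name>
```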
- sfdc is an identifier for result output to Salesforce.com;
- username and password are the credential to your Salesforce.com organization;
- security_token is the additional credential for API access;
- hostname is the host name of the Salesforce.com organization. This is usually 'login.salesforce.com' for production environments and 'test.salesforce.com' for sandbox environments. If you have configured a custom domain for your organization, specify the hostname you use for login;
- object_name is the target Salesforce.com Object API name (e.g. ResultOutput__c). Please note that the Object and columns for data integration must be defined beforehand;
For example with:
- username: email@example.com
- password: PASSWORD
- security_token: 7SMvicR9ojdPz0XLtlWi3Rtw
The URL will look like:
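A sketch of the resulting URL, assuming the target Object is ResultOutput__c and with the password and security token concatenated:

```
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c
```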
|Please make sure to escape the '@' in the username with '%40'.|
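If you build the URL programmatically, standard percent-encoding routines handle this escaping for you. A minimal Python sketch (the helper name and all credential values are illustrative placeholders, not part of any official SDK):

```python
from urllib.parse import quote

def sfdc_result_url(username, password, security_token, hostname, object_name):
    # Percent-encode the credentials so characters like '@' in the
    # username do not break the URL structure ('@' becomes '%40').
    user = quote(username, safe="")
    secret = quote(password + security_token, safe="")
    return f"sfdc://{user}:{secret}@{hostname}/{object_name}"

url = sfdc_result_url("email@example.com", "PASSWORD",
                      "7SMvicR9ojdPz0XLtlWi3Rtw",
                      "login.salesforce.com", "ResultOutput__c")
print(url)
# sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c
```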
Result output to Salesforce.com supports various options that can be specified as optional URL parameters. The options are compatible with each other and can be combined.
Where applicable, the default behavior is indicated.
Update mode option
The mode option controls how the query results modify the data in the target Object.
The append mode is the default and is used when no mode option is provided in the URL. In this mode the query results are appended to the Object.
Because mode=append is the default behavior, these two URLs are equivalent:
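For example, with the placeholder credentials from the earlier example, these two sketched URLs produce the same behavior:

```
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?mode=append
```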
With the truncate mode the system first truncates the existing records in the Salesforce.com Object, moving them into the Recycle Bin, then inserts the query results.
|You can specify the hard_delete=true option with mode=truncate to permanently delete records instead of moving them to the Recycle Bin. To use this option, the user must have the 'Bulk API Hard Delete' permission.|
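A sketched truncate-mode URL, using the placeholder credentials from the earlier example; the second form adds hard_delete=true:

```
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?mode=truncate
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?mode=truncate&hard_delete=true
```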
With the update mode a row is inserted unless it would cause a duplicate value in the external key column specified in the “unique” parameter; in that case an update is performed instead. Please note that with this mode the “unique” parameter is required, and the column it names must be defined as an external key on the Object.
The default behavior of the ‘update’ mode is actually an ‘upsert’. If you want to only “update” and never insert, add the upsert=false option: existing records matching the “unique” parameter are updated, and no new records are inserted.
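A sketched update-mode URL; ExternalId__c here is a hypothetical external-key column on the target Object. The second form restricts the operation to updates only:

```
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?mode=update&unique=ExternalId__c
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?mode=update&unique=ExternalId__c&upsert=false
```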
Upload concurrency_mode option
The concurrency_mode option controls how the data is uploaded to the Salesforce.com organization. The default mode is parallel, in which data is uploaded in parallel; this is the most efficient method and is recommended for most situations.
Because concurrency_mode=parallel is the default behavior, these two URLs are equivalent:
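For example, with placeholder credentials:

```
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?concurrency_mode=parallel
```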
Uploading records in parallel is recommended, but if you see “UNABLE_TO_LOCK_ROW” in the error message you should try concurrency_mode=serial instead.
Updating a Salesforce.com Object acquires a lock on the record and on any parent records referenced by its columns. If you upload records in parallel and multiple records reference the same parent record, Salesforce.com may be unable to acquire the lock for the insert/update and returns an ‘UNABLE_TO_LOCK_ROW’ error. In that case, specify the concurrency_mode=serial option.
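A sketched serial-mode URL with placeholder credentials:

```
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?concurrency_mode=serial
```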
Authentication session_id option
If you have a Salesforce.com Session ID, you can authenticate with the session_id option instead of username, password, and security token (that is, username, password, and security token can be omitted from the URL).
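A sketched URL authenticating with a session ID (the value is a placeholder); note that no credentials appear before the hostname:

```
sfdc://login.salesforce.com/ResultOutput__c?session_id=YOUR_SESSION_ID
```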
Retry option
The retry option sets the number of attempts the Treasure Data export worker makes to write the result to the configured Salesforce.com destination in case of errors. If the export still fails after the configured number of retries, the query fails.
The default is retry=2, but it can be set to virtually any number. Please note that a higher number of retries increases the overall duration of a query in case of errors.
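A sketched URL raising the retry count, with placeholder credentials (5 is an arbitrary illustrative value):

```
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?retry=5
```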
Split Records options
The Treasure Data result export splits the records in the result of a query into chunks of 10000 records by default and bulk uploads one chunk at a time. The split_records option configures the size of these chunks, if required.
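A sketched URL lowering the chunk size, with placeholder credentials (5000 is an arbitrary illustrative value):

```
sfdc://email%40example.com:PASSWORD7SMvicR9ojdPz0XLtlWi3Rtw@login.salesforce.com/ResultOutput__c?split_records=5000
```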
To output the result of a single query to your Salesforce.com organization, add the -r / --result option to the td query command. After the job finishes, the results are written into your Salesforce.com organization Object:
$ td query -w -d testdb \
    --result 'sfdc://login.salesforce.com/CustomObject__c?session_id=.....' \
    "SELECT code as Code__c, COUNT(1) as Count__c FROM www_access GROUP BY code"
To create a scheduled query whose output is systematically written to your Salesforce.com organization, add the -r / --result option when creating the schedule through the td sched:create command:
$ td sched:create hourly_count_example "0 * * * *" -d testdb \
    --result 'sfdc://user%40treasure-data.com:PASSWORDsecuritytoken@login.salesforce.com/CustomObject__c' \
    "SELECT COUNT(*) as Count__c FROM www_access"
Using the new query page, create a query with the result set that you would like to write to Salesforce.
|To avoid any issues with result export, please define column aliases in your query so that the resulting column names match the Salesforce field names: the field names themselves for default fields, and the API names (usually ending with __c) for custom fields.|
To export the result of a query or schedule to Salesforce.com, fill in the Result Export information for a new query or an existing job/query:
Last modified: Mar 03 2016 01:19:53 UTC