OneTrust Import Integration

As data protection laws continue to be enacted around the world, ensuring compliance is a priority. OneTrust is a privacy management and marketing compliance company. Its services are used by organizations to comply with global regulations such as the GDPR.

The OneTrust import integration collects customers' consent data and loads it into Treasure Data (TD). Access to OneTrust data on the Treasure Data platform enables your marketing team to optimally enrich your data.

Prerequisites

  • Basic knowledge of Treasure Data
  • Basic knowledge of OneTrust
  • (Optional) The GUID of a single Collection Point to limit the imported data. If not provided, data is collected from all Collection Points.

Obtain your API Key

  1. Navigate to https://app.onetrust.com/integrations/api-keys.

  2. Sign on to the OneTrust application if necessary.

  3. Select Add New.

  4. Type the name that you want for the Connection Name.

  5. Select Install.

Retrieve the Collection Point GUID

  1. Navigate to https://app.onetrust.com/consent/collection-points. The Collection Point screen displays.

  2. Select the corresponding Collection Point. The GUID appears in the URL of the Collection Point.

Obtain your OAuth Access Token

Create a OneTrust token to store the Client ID and Secret. This is a short-lived token.

  1. Navigate to https://app.onetrust.com/settings/client-credentials/list.

  2. Select Add.

  3. Type a name and description for your token.

  4. Select an appropriate Access Token Lifetime. The default lifetime is one hour.

  5. Navigate to https://app.onetrust.com/settings/client-credentials/list.

  6. Select your credential.

  7. Select Generate Token.

Use TD Console to Create Your Connection

Create a New Connection

When you configure a data connection, you provide authentication to access the integration. In Treasure Data, you configure the authentication and specify the source information.

  1. Open TD Console.

  2. Navigate to Integrations Hub > Catalog.

  3. Search for and select OneTrust.

  4. Type the name of the access token that you created in the OneTrust application.

  5. Select Continue.

  6. Type a name for your connection.

  7. Select Done.

Transfer Your OneTrust Account Data to Treasure Data

After creating the authenticated connection, you are automatically taken to Authentications.

  1. Search for the connection you created.

  2. Select New Source.

  3. Type a name for the data transfer.

  4. Select Next. The Source Table dialog opens.

  5. Edit the following parameters:

Data Type
  • Data Subject Profile. Fetch Data Subject Profile data.
  • Collection Point. Fetch Collection Point data.
  • Data Subject Profile (API V4). Fetch Data Subject Profile data using OneTrust API V4.
  • Link Token (API V4). Fetch Link Token data using OneTrust API V4.
  • Purpose (API V4). Fetch Purpose data using OneTrust API V4.

Collection Point GUID (Optional)
  GUID of a single Collection Point to limit the data. If not provided, data is collected from all Collection Points.

Incremental Loading
  Enables incremental loading with automatic calculation of the new Start Time. For example, if you start incremental loading with Start Time = 2014-10-02T15:01:23Z and End Time = 2014-10-03T15:01:23Z, the next job runs with a new Start Time of 2014-10-03T15:01:23Z.

Start Time (Required when Incremental Loading is selected; required for every API V4 Data Type)
  For UI configuration, you can pick the date and time in a supported browser, or type the date in the format the browser expects. For example, Chrome provides a calendar to select the year, month, day, hour, and minute; on Safari, you type text such as 2020-10-25T00:00. For CLI configuration, provide a timestamp in RFC 3339 UTC "Zulu" format, accurate to nanoseconds, for example: "2014-10-02T15:01:23Z".

Incremental By Modifications of
  • Data Subject. Incremental by the last update of the data subject.
  • Consent Information. Incremental by the last consent date of the consent information.

End Time (Required when Incremental Loading is selected; required for every API V4 Data Type)
  The date and time format is the same as for Start Time.

Properties (Optional)
  Comma-separated setting that adds the properties query parameter when fetching the Data Subject Profile Data Type. It is shown only when the Data Subject Profile Data Type is selected. Things to know:
  • It is critical that all queries include properties=ignoreCount. Omitting ignoreCount significantly decreases performance. If you need the count, include it only in the initial query, not in subsequent page calls.
  • The values passed in the properties query parameter can change the response of this API. A fast response on large data sets can be obtained by passing any of the following values: linkTokens, ignoreCount, ignoreTopics, ignoreCustomPreferences.
  • It is strongly recommended to pass the requestContinuation parameter returned in the response of this API in the next API request to paginate. Including it is crucial for better performance when dealing with multiple pages of data subject records. For more information, see Understanding & Implementing Pagination.
  • Recommended parameters: ignoreCount, requestContinuation. Reference: Get List of Data Subjects.

Request Continues Paging (Optional)
  If checked, the ingestion process fetches data that is paginated by request continuation tokens. This option is useful when the data volume is large.

Data Settings

  1. Select Next. The Data Settings page opens.
  2. Skip this page of the dialog.

Data Preview

You can see a preview of your data before running the import by selecting Generate Preview. Data preview is optional and you can safely skip to the next page of the dialog if you choose to.

  1. Select Next. The Data Preview page opens.
  2. If you want to preview your data, select Generate Preview.
  3. Verify the data.

Data Placement

For data placement, select the target database and table where you want your data placed and indicate how often the import should run.

  1. Select Next. Under Storage, select an existing database or create a new one, and select an existing table or create a new one, where you want to place the imported data.

  2. Select a Database. Select an existing database or Create New Database.

  3. Optionally, type a database name.

  4. Select a Table. Select an existing table or Create New Table.

  5. Optionally, type a table name.

  6. Choose the method for importing the data.

    • Append (default): Data import results are appended to the table. If the table does not exist, it is created.
    • Always Replace: Replaces the entire content of an existing table with the result output of the query. If the table does not exist, a new table is created.
    • Replace on New Data: Replaces the entire content of an existing table with the result output only when there is new data.
  7. Select the Timestamp-based Partition Key column. If you want to set a partition key seed other than the default key, you can specify a long or timestamp column as the partitioning time. By default, the time column uses upload_time with the add_time filter.

  8. Select the Timezone for your data storage.

  9. Under Schedule, you can choose when and how often you want to run this query.

Run once

  1. Select Off.
  2. Select Scheduling Timezone.
  3. Select Create & Run Now.

Repeat Regularly

  1. Select On.
  2. Select the Schedule. The UI provides these four options: @hourly, @daily, and @monthly, or custom cron.
  3. You can also select Delay Transfer and add a delay of execution time.
  4. Select Scheduling Timezone.
  5. Select Create & Run Now.

After your transfer has run, you can see the results of your transfer in Data Workbench > Databases.

Import from OneTrust via CLI (Toolbelt)

Before setting up the integration, install the latest version of the TD Toolbelt.

Prepare a Load File

Create a configuration file (for example, load.yml) for the import:

in:
  type: onetrust
  base_url: ***************
  auth_method: oauth
  access_token: ***************
  data_type: data_subject_profile
  incremental: false
  start_time: 2025-01-30T00:49:04Z
  end_time: 2025-02-28T17:00:00.000Z
  thread_count: 5
out:
  mode: replace

This example gets a list of Data Subject Profile objects. The start_time specifies the date from which the import starts getting data. In this case, the import pulls data starting from January 30th, 2025 at 00:49 UTC.
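
For an API V4 data type, start_time and end_time are required. The following is a minimal sketch, not the only valid configuration, assuming api_key authentication is enabled for your account; the masked value and the sample time range are placeholders you replace with your own:

in:
  type: onetrust
  base_url: app.onetrust.com
  auth_method: api_key
  api_key: ***************
  data_type: purpose_api_v4
  incremental: false
  start_time: 2025-01-01T00:00:00Z
  end_time: 2025-02-01T00:00:00Z
out:
  mode: replace

If you use OAuth instead, set auth_method: oauth and provide access_token, as in the previous example.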

Parameters Reference

type
  Description: The source of the import.
  Value: "onetrust"
  Required: Yes

base_url
  Description: Base URL of the OneTrust server.
  Value: String. Default: "app.onetrust.com"
  Required: Yes

auth_method
  Description: Authentication method, either "oauth" or "api_key".
  Value: String. Default: "oauth"
  Required: Yes

access_token
  Description: OAuth access token, required when the oauth auth method is used.
  Value: String
  Required: Yes when auth_method is "oauth"

api_key
  Description: API key, required when the api_key auth method is used.
  Value: String
  Required: Yes when auth_method is "api_key"

data_type
  Description: The Data Type to fetch from OneTrust.
    • Data Subject Profile. Fetch Data Subject Profile data.
    • Collection Point. Fetch Collection Point data.
    • Data Subject Profile (API V4). Fetch Data Subject Profile data from API V4.
    • Link Token (API V4). Fetch Link Token data from API V4.
    • Purpose (API V4). Fetch Purpose data from API V4.
  Value: String. Supported values: data_subject_profile, collection_point, data_subject_profile_api_v4, link_token_api_v4, purpose_api_v4
  Required: Yes

collection_point_guid
  Description: (Optional) GUID of a single Collection Point to limit the data. If not provided, data is collected from all Collection Points. Applies when the Data Type is Data Subject Profile or Purpose (API V4).
  Value: String
  Required: No

incremental
  Description: Enables incremental loading with automatic calculation of the new Start Time. For example, if you start incremental loading with a Start Time of 2014-10-02T15:01:23Z and an End Time of 2014-10-03T15:01:23Z, the next job runs with a new Start Time of 2014-10-03T15:01:23Z.
  Value: Boolean. Default: false
  Required: Yes

start_time
  Description: For UI configuration, you can pick the date and time in a supported browser, or type the date in the format the browser expects. For example, Chrome provides a calendar to select the year, month, day, hour, and minute; on Safari, you type text such as 2020-10-25T00:00. For CLI configuration, provide a timestamp in RFC 3339 UTC "Zulu" format, accurate to nanoseconds, for example: "2014-10-02T15:01:23Z".
  Value: Timestamp
  Required: Yes when Incremental Loading is selected. No for the API V1 Data Types (data_subject_profile and collection_point). Yes for every API V4 Data Type (data_subject_profile_api_v4, link_token_api_v4, and purpose_api_v4).

incremental_type
  Description: Selects the time field by which data is fetched incrementally from OneTrust.
    • data_subject_profile. Incremental by the last update of the data subject.
    • collection_point. Incremental by the last consent date of the consent information.
  Value: String. Default: "data_subject_profile"
  Required: Yes when Incremental Loading is selected for the data_subject_profile Data Type.

data_subject_properties
  Description: (Optional) Comma-separated setting that adds the properties query parameter when fetching the Data Subject Profile Data Type. Applies only when the Data Subject Profile Data Type is selected.
  Value: String
  Required: No

end_time
  Description: The date and time format is the same as for start_time.
  Value: Timestamp
  Required: No for the API V1 Data Types (data_subject_profile and collection_point). Yes for every API V4 Data Type (data_subject_profile_api_v4, link_token_api_v4, and purpose_api_v4).

request_continues_paging
  Description: If true, the ingestion process fetches data that is paginated by request continuation tokens. This option is useful when the data volume is large.
  Value: Boolean. Default: false
  Required: No

ingest_duration_minutes
  Description: Sets the time range, in minutes, for the ingestion duration when the target is data_subject_profile and request_continues_paging is true.
  Value: Integer. Default: 1440
  Required: No
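
As an illustration of how these parameters fit together, here is a sketch of an incremental Data Subject Profile load file that uses the recommended properties values; the masked token and the sample time range are placeholders, and you should adjust every value to your own environment:

in:
  type: onetrust
  base_url: app.onetrust.com
  auth_method: oauth
  access_token: ***************
  data_type: data_subject_profile
  incremental: true
  incremental_type: data_subject_profile
  start_time: 2025-01-01T00:00:00Z
  end_time: 2025-02-01T00:00:00Z
  data_subject_properties: ignoreCount,requestContinuation
  request_continues_paging: true
  ingest_duration_minutes: 1440
out:
  mode: append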

To preview the data, use the td connector:preview command.

td connector:preview load.yml

Execute the Load Job

It might take a couple of hours for the load job to complete, depending on the size of the data. Be sure to specify the Treasure Data database and table where the data should be stored. Treasure Data also recommends specifying the --time-column option because Treasure Data’s storage is partitioned by time (see data partitioning). If this option is not provided, the data connector chooses the first long or timestamp column as the partitioning time. The column specified by --time-column must be of either long or timestamp type.

If your data doesn’t have a time column, you can add a time column by using the add_time filter option. For more details see the documentation for the add_time Filter Function.
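
A minimal sketch of an add_time filter entry that you could add to load.yml, assuming the standard add_time filter options (adapt the column name and mode to your data):

filters:
  - type: add_time
    to_column:
      name: time
      type: timestamp
    from_value:
      mode: upload_time

This adds a time column whose value is set to the upload time of the job.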

$ td connector:issue load.yml --database td_sample_db --table td_sample_table \
    --time-column created_at

The connector:issue command assumes that you have already created a database (td_sample_db) and a table (td_sample_table). If the database or the table does not exist in TD, this command fails. Create the database and table manually, or use the --auto-create-table option with the td connector:issue command to create them automatically.

$ td connector:issue load.yml --database td_sample_db --table td_sample_table \
    --time-column created_at --auto-create-table

The data connector does not sort records on the server side. To use time-based partitioning effectively, sort records in files beforehand.

If you have a field called time, you don’t have to specify the --time-column option.

$ td connector:issue load.yml --database td_sample_db --table td_sample_table

Import Modes

Specify the file import mode in the out: section of the load.yml file. The out: section controls how data is imported into a Treasure Data table. For example, you may choose to append data or replace data in an existing table.

Append
  Records are appended to the target table.
  out:
    mode: append

Always Replace
  Replaces data in the target table. Any manual schema changes made to the target table remain intact.
  out:
    mode: replace

Replace on New Data
  Replaces data in the target table only when there is new data to import.
  out:
    mode: replace_on_new_data

Scheduling Executions

You can schedule periodic data connector execution for incremental file import. The Treasure Data scheduler is optimized to ensure high availability.

For the scheduled import, you can import all files that match the specified prefix and one of these conditions:

  • If use_modified_time is disabled, the last path is saved for the next execution. On the second and subsequent runs, the integration only imports files that come after the last path in alphabetical order.
  • Otherwise, the time that the job is executed is saved for the next execution. On the second and subsequent runs, the connector only imports files that were modified after that execution time in alphabetical order.

Create a Schedule Using the TD Toolbelt

A new schedule can be created using the td connector:create command.

$ td connector:create daily_import "10 0 * * *" \
    td_sample_db td_sample_table load.yml

Treasure Data also recommends specifying the --time-column option because Treasure Data’s storage is partitioned by time (see data partitioning).

$ td connector:create daily_import "10 0 * * *" \
    td_sample_db td_sample_table load.yml \
    --time-column created_at

The cron parameter also accepts three special options: @hourly, @daily, and @monthly.

By default, the schedule is set up in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. The --timezone option supports only extended timezone formats like Asia/Tokyo, America/Los_Angeles, etc. Timezone abbreviations like PST and CST are not supported and might lead to unexpected schedules.