This data connector allows you to import Brandwatch Mention objects into Treasure Data.
API Limitations: If a query or query group contains a large number of mentions (for example, more than 5,000), the default configuration takes up to 50 requests to fetch all of them. Use the page_size parameter to manage the number of requests.
Prerequisites:
- Basic knowledge & access to a Treasure Data account
- Basic knowledge & access to a Brandwatch account
Go to Integrations Hub > Catalog. Search and select Brandwatch.

The dialog opens.
Provide your Brandwatch Username and Password, select Continue, and give your connection a name:

After creating the connection, you are automatically taken back to Integrations Hub > Catalog. Look for the connection you created and select New Source.
The dialog opens. Complete the details and select Next.

Next, you see a preview of your data similar to the following dialog. To change options such as skipping records on errors or rate limits, select Advanced Settings. Otherwise, select Next.
Choose an existing database and table, or create new ones, as the destination for your transferred data.
In the Schedule tab, you can specify a one-time transfer or schedule an automated recurring transfer. If you select Once now, select Start Transfer. If you select Repeat…, specify your schedule options, then select Schedule Transfer.
After your transfer has run, you can see the results of your transfer in Data Workbench > Databases. A corresponding job appears in the Jobs section.
You are ready to start analyzing your data.
Install the latest TD Toolbelt and verify the version:
$ td --version
0.15.8

Prepare a configuration file (for example, load.yml) as shown in the following example, with your Brandwatch credentials and transfer information:
in:
  type: brandwatch
  username: xxxxxxxxxx
  password: xxxxxxxxxx
  project_name: xxx
  query_name: xxx
  from_date: yyyy-MM-dd'T'hh:mm:ss.SSS'Z'
  to_date: yyyy-MM-dd'T'hh:mm:ss.SSS'Z'
out:
  mode: replace

This example dumps Brandwatch mentions by executing a query. Instead of a query, you can use the query_group_name option to fetch mentions from a query group:
in:
  type: brandwatch
  username: xxxxxxxxxx
  password: xxxxxxxxxx
  project_name: xxx
  query_group_name: xxx
  from_date: yyyy-MM-dd'T'hh:mm:ss.SSS'Z'
  to_date: yyyy-MM-dd'T'hh:mm:ss.SSS'Z'
out:
  mode: replace

The configuration parameters are as follows:

username: Brandwatch account’s username (string, required)
password: Brandwatch account’s password (string, required)
project_name: Brandwatch project that the queries, query groups, and mentions belong to (string, required)
query_name: Name of the Brandwatch query to execute to fetch mentions (string, optional)
query_group_name: Name of the Brandwatch query group to execute to fetch mentions (string, optional)
- Note: Either query_name or query_group_name must be specified, but not both at the same time.
from_date: Specify the date and time to fetch records from (date format: yyyy-MM-dd'T'hh:mm:ss.SSS'Z') (string, required, inclusive)
to_date: Specify the date and time to fetch records up to (date format: yyyy-MM-dd'T'hh:mm:ss.SSS'Z') (string, required, exclusive)
retry_initial_wait_msec: Initial wait time (in milliseconds) for the retry logic when calling the Brandwatch API (int, optional)
max_retry_wait_msec: Maximum wait time (in milliseconds) between retries when calling the Brandwatch API (int, optional)
retry_limit: Number of retry attempts when calling the Brandwatch API (int, optional)
page_size: Number of mentions fetched per API call (int, optional)
- Note: This parameter helps overcome the API limitation when a query or query group contains a large number of mentions. For example, if query X contains 5,000 mentions in total, the default configuration takes up to 50 requests to fetch all of them, and the more requests it makes, the higher the chance of reaching the API limit. With page_size set to 200, the same query takes at most 25 requests.
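For example, a minimal sketch of a configuration that raises page_size and tunes the retry behavior (credentials and names are placeholders, and the specific values are only illustrative):

in:
  type: brandwatch
  username: xxxxxxxxxx
  password: xxxxxxxxxx
  project_name: xxx
  query_name: xxx
  from_date: 2018-03-01T00:00:00.000Z
  to_date: 2018-03-11T00:00:00.000Z
  # Fetch 200 mentions per request instead of the default,
  # halving the request count for large result sets
  page_size: 200
  # Retry tuning: wait 1 second before the first retry,
  # back off up to 32 seconds, stop after 5 attempts
  retry_initial_wait_msec: 1000
  max_retry_wait_msec: 32000
  retry_limit: 5
out:
  mode: replace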
You can preview data to be imported using the command td connector:preview.
$ td connector:preview load.yml
+---------------------+--------------------+-----------------------+----
| accounttype:string  | authorcity:string  | authorcitycode:string | ...
+---------------------+--------------------+-----------------------+----
| individual          | "Atlanta"          | atl9                  |
| individual          | "Atlanta"          | atl9                  |
+---------------------+--------------------+-----------------------+----

Submit the load job. It may take a couple of hours depending on the data size. You must specify the database and table where the data will be stored.
It is recommended to specify the --time-column option, since Treasure Data’s storage is partitioned by time (see also data partitioning). If the option is not given, the data connector selects the first long or timestamp column as the partitioning time. The column specified by --time-column must be of either long or timestamp type.
If your data doesn’t have a time column, you can add one using the add_time filter option. For more details, see the add_time filter plugin.
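As a sketch, assuming your records carry no usable time column and the standard add_time filter options, a filters section added to load.yml might look like the following; each record is stamped with the upload time:

filters:
  # Add a "time" column so Treasure Data can partition the records;
  # here the value is the time of the upload itself
  - type: add_time
    to_column:
      name: time
      type: timestamp
    from_value:
      mode: upload_time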
$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column modifieddate

The above command assumes that you have already created the database (td_sample_db) and the table (td_sample_table). If the database or the table does not exist in TD, the command will not succeed; create the database and table manually, or use the --auto-create-table option with the td connector:issue command to create them automatically:
$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column modifieddate --auto-create-table

You can assign a time format column to the partitioning key with the --time-column option.
You can schedule periodic data connector execution for recurring Brandwatch imports. We configure our scheduler carefully to ensure high availability. By using this feature, you no longer need a cron daemon in your local data center.
A new schedule can be created using the td connector:create command. The name of the schedule, a cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file are required.
$ td connector:create \
daily_Brandwatch_import \
"9 0 * * *" \
td_sample_db \
td_sample_table \
load.yml

The cron parameter also accepts three shortcut options: @hourly, @daily, and @monthly. By default, the schedule is set up in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. The --timezone option only supports extended timezone formats such as 'Asia/Tokyo' and 'America/Los_Angeles'. Timezone abbreviations such as PST and CST are *not* supported and may lead to unexpected schedules.
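For example, a sketch of the same schedule created with the @daily shortcut and an explicit timezone (the schedule name here is illustrative):

$ td connector:create \
daily_brandwatch_import_jst \
"@daily" \
td_sample_db \
td_sample_table \
load.yml \
-t "Asia/Tokyo"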
You can see the list of scheduled entries by td connector:list.
$ td connector:list
+-------------------------+-------------+----------+-------+--------------+-----------------+------------------------------+
| Name                    | Cron        | Timezone | Delay | Database     | Table           | Config                       |
+-------------------------+-------------+----------+-------+--------------+-----------------+------------------------------+
| daily_brandwatch_import | 9 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"type"=>"brandwatch", ... } |
+-------------------------+-------------+----------+-------+--------------+-----------------+------------------------------+

td connector:show shows the execution setting of a schedule entry.
% td connector:show daily_brandwatch_import
Name     : daily_brandwatch_import
Cron     : 9 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual execution, use td job:show <jobid>.
% td connector:history daily_brandwatch_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 678066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2017-07-28 00:09:05 +0000 | 160      |
| 677968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2017-07-27 00:09:07 +0000 | 161      |
| 677914 | success | 10000   | td_sample_db | td_sample_table | 0        | 2017-07-26 00:09:03 +0000 | 152      |
| 677872 | success | 10000   | td_sample_db | td_sample_table | 0        | 2017-07-25 00:09:04 +0000 | 163      |
| 677810 | success | 10000   | td_sample_db | td_sample_table | 0        | 2017-07-24 00:09:04 +0000 | 164      |
| 677766 | success | 10000   | td_sample_db | td_sample_table | 0        | 2017-07-23 00:09:04 +0000 | 155      |
| 677710 | success | 10000   | td_sample_db | td_sample_table | 0        | 2017-07-22 00:09:05 +0000 | 156      |
| 677610 | success | 10000   | td_sample_db | td_sample_table | 0        | 2017-07-21 00:09:04 +0000 | 157      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
8 rows in set

td connector:delete removes the schedule.
$ td connector:delete daily_brandwatch_import

By enabling Incremental Loading, you can schedule a job to run iteratively. The next iteration of the job run is calculated from the Start Date and End Date values.
In the following example, let’s use an 11-day range between the start and end date:
Start Date: 2018-03-01T00:00:00Z
End Date: 2018-03-11T00:00:00Z

Each job covers a time range of the same length, determined by the period between the start and end dates. Each run transfers the mentions for the period immediately following the previous job’s period, until the period extends past the current date. Further transfers are then delayed until a complete period is available, at which point the job executes and pauses again until the next complete period is available.
For example:
- The current date is 2018-04-26, and you configure incremental loading with from_date = 2018-04-01T00:00:00Z and to_date = 2018-04-11T00:00:00Z
- Cron is configured to run daily at a certain hour
1st run at 2018-04-26: from_date: 2018-04-01T00:00:00Z, to_date: 2018-04-11T00:00:00Z (exclusive; mentions are fetched up to 2018-04-10T23:59:59Z)
2nd run at 2018-04-27: from_date: 2018-04-11T00:00:00Z, to_date: 2018-04-22T00:00:00Z
3rd run at 2018-04-28: does not run because to_date would be in the future
4th run at 2018-04-29: does not run
5th run at 2018-04-30: does not run
6th run at 2018-05-01: does not run
7th run at 2018-05-02: does not run
8th run at 2018-05-03: from_date: 2018-04-22T00:00:00Z, to_date: 2018-05-03T00:00:00Z
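A minimal sketch of an incremental configuration matching this example (the incremental flag is an assumption based on how other TD data connectors enable incremental loading, so verify it against your connector settings):

in:
  type: brandwatch
  username: xxxxxxxxxx
  password: xxxxxxxxxx
  project_name: xxx
  query_name: xxx
  # First window: 2018-04-01 (inclusive) up to 2018-04-11 (exclusive);
  # each subsequent run advances the window by the same period
  from_date: 2018-04-01T00:00:00.000Z
  to_date: 2018-04-11T00:00:00.000Z
  # Assumption: enables the iterative window advancement described above
  incremental: true
out:
  mode: replace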