Amazon Redshift Export Integration

Learn more about Installing td-agent on AWS Linux.

You can write job results to your existing Amazon Redshift cluster. For sample workflows on Amazon Redshift for Export, view Treasure Boxes.

Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt
  • An Amazon Redshift cluster, either single-node or multi-node, set up and running
  • At least ‘Query only’ privileges to the Treasure Data table to be queried

Limitations

  • When exporting results to Redshift, Redshift tries to convert the column type if the destination table already exists. If the conversion fails, the column becomes NULL, and all records are rejected if the column on Redshift is a NOT NULL field. In this case, the job for exporting results to Redshift may report success, yet your Redshift table does not receive any data.
  • This connector does not support timestamp/date type data. You need to convert that type of data to a string column or a Unix timestamp, as shown in the sketch below.
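
For example, a Presto query on Treasure Data can render the time column as a string (or keep it as a Unix timestamp) before the export. This is a minimal sketch; the table name my_table is a placeholder:

SELECT
  TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS time_string, -- timestamp rendered as a string
  time AS time_unix                                                  -- Unix timestamp in seconds
FROM my_table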

Supported Regions

Result Output to Redshift can export data to several regions. The following are the supported regions:

  • us-east-1 (US Standard)
  • us-west-2 (Oregon)
  • eu-west-1 (Ireland)
  • ap-northeast-1 (Tokyo)
  • us-west-1 (N. California)
  • ap-southeast-1 (Singapore)
  • ap-southeast-2 (Sydney)
  • sa-east-1 (São Paulo)

The following regions are not supported:

  • us-east-2 (Ohio)
  • ap-south-1 (Mumbai)
  • ap-northeast-2 (Seoul)
  • ca-central-1 (Central)
  • eu-central-1 (Frankfurt)
  • eu-west-2 (London)

If you need support for additional regions, contact Treasure Data support.

Architecture

A front-end application streams data to be collected in Treasure Data via Treasure Agent (td-agent). Treasure Data periodically runs jobs on the data, then writes the job results to your Redshift cluster.

This is a fairly common architecture that enables data analysts, who are well versed in using Redshift, to focus on queries and visualizations rather than on how the data gets uploaded.

Static IP Address of Treasure Data Integration

If your security policy requires IP whitelisting, you must add Treasure Data's IP addresses to your allowlist to ensure a successful connection.

Please find the complete list of static IP addresses, organized by region, at the following link:
https://api-docs.treasuredata.com/en/overview/ip-addresses-integrations-result-workers/

Amazon Redshift Configuration

Amazon Redshift can be configured in single-node mode or multi-node/cluster mode. The multi-node configuration provides more query computation power by means of parallelization of the query execution on the available nodes.

Use the TD Console to Create Your Connection

Create a New Authentication

In Treasure Data, you must create and configure the data connection prior to running your query. As part of the data connection, you provide authentication to access the integration.

  1. Open TD Console.

  2. Navigate to Integrations Hub > Catalog.

  3. Click the search icon on the far-right of the Catalog screen, and enter Amazon Redshift.

  4. Hover over the Amazon Redshift connector and select Create Authentication.

Type the credentials to authenticate.

  • Host: The host information of the source database, such as an IP address, which can be retrieved from the Redshift configuration page. Typically the format is name.<instance_id>.region.redshift.amazonaws.com, where name is the one provided for the cluster, instance_id is auto-generated by Amazon Redshift when the cluster is created, and region is the Amazon availability zone of choice. If you use an IP address instead of a hostname, you must set the Region option explicitly.
  • Port: The port number through which the Redshift server is accessible. The default is 5439, but it may vary for multi-node cluster configurations. The actual value can be retrieved from the Redshift cluster detail page.
  • User: The username used to connect to the Amazon Redshift instance. These credentials are specified when first creating the Redshift cluster and are different from the S3 public/private access keys.
  • Password: The password used to connect to the source database.
  • Use SSL: Select this checkbox to connect using SSL.
  • JDBC Connection options: Any special JDBC connection options required by the source database (optional).
  • Region: The AWS region in which your Redshift instance is hosted. This option is required if your hostname does not contain the region name. For example: redshift://user:password@host/database/table?region=us-east-1
  • Socket connection timeout: Timeout (in seconds) for the socket connection (default is 300).
  • Network timeout: Timeout (in seconds) for network socket operations. 0 means no timeout.
  • Rows per batch: The number of rows to fetch at a time.
  • Options: Options that you want to pass to the JDBC driver. See Installing Redshift JDBC Driver. For example, you can use these parameter and value fields to define URL options such as LogLevel or LogPath: jdbc:redshift://company.us-west1.redshift.amazonaws.com:9000/Default;LogLevel=3;LogPath=C:\temp

Result output to Redshift supports various options that can be specified as optional URL parameters. The options are compatible with each other and can be combined. Where applicable, the default behavior is indicated.

SSL Option

The ssl option determines whether to use SSL when connecting to Redshift.

  • ssl=true: Use SSL for the connection from Treasure Data to Redshift. redshift://user:password@host/database/table?ssl=true
  • ssl=false (default): Do not use SSL for the connection from Treasure Data to Redshift. redshift://user:password@host/database/table?ssl=false
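
Because the URL options can be combined, a single result output URL can carry more than one of them at once; for example (user, password, host, database, and table are placeholders):

redshift://user:password@host/database/table?ssl=true&region=us-east-1
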
  5. Select Continue after entering the required connection details.

  6. Name the connection so you can find it later should you need to modify any of the connection details.

  7. Optionally, select Share with others if you would like to share this connection with other users in your organization.

  8. Select Done.

Define your Query

  1. Complete the instructions in Creating a Destination Integration.
  2. Navigate to Data Workbench > Queries.
  3. Select a query for which you would like to export data.
  4. Run the query to validate the result set.
  5. Select Export Results.
  6. Select an existing integration authentication.
  7. Define any additional Export Results details. Review the integration parameters for your export integration; your Export Results screen might look different, or you might not have additional details to fill out.
  8. Select Done.
  9. Run your query.
  10. Validate that your data moved to the destination you specified.
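
To confirm on the Redshift side that the export arrived, one option is to query the destination table directly, for example with psql (host, user, database, and table names are placeholders):

psql -h name.<instance_id>.region.redshift.amazonaws.com -p 5439 -U username -d database \
  -c "SELECT COUNT(*) FROM your_table;"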

Integration Parameters for Redshift

  • Database: The name of the database to export the data to. This is the database name specified when the Redshift cluster was created; it can be retrieved from the Redshift cluster detail page.
  • Table: The name of the table within the database to export the data to. The table does not need to exist when the query output is executed; if it does not exist, a table with the specified name is created.
  • Mode (append, replace, truncate, or update): Controls how the destination table is modified.
    • Append mode: The default mode. The query results are appended to the table. If the table does not exist yet, a new table is created. This method is not atomic.
    • Replace mode: Replaces the entire content of an existing table with the resulting output of the query. If the table does not exist, a new table is created. The replace mode achieves atomicity (so that a consumer of the table always sees consistent data) by performing the following steps in a single transaction: (1) create a temporary table, (2) write to the temporary table, and (3) replace the existing table with the temporary table using ALTER TABLE RENAME. For the fastest results, we recommend that you have only one write connection open to your Redshift database table at a time.
    • Truncate mode: The system first truncates the existing table to remove all records, then inserts the query results into the existing table without modifying the table's schema. If the result output contains columns whose name or type (or both) do not match the schema of the destination table, those columns are dropped and are not written to the destination table. This is not a problem when the destination table does not exist: a new table is created with a schema matching that of the result output. The truncate mode uses a temporary table, so the write is atomic. Unlike replace, the truncate mode retains the indexes of the table.
    • Update mode: A row is inserted unless it would cause a duplicate value in the columns specified in the "unique" parameter, in which case an update is performed instead. The unique parameter is required when using the update mode. If the table does not exist, it is created. This mode is atomic because it uses a temporary table to store the incoming data before inserting it into the actual destination table.
  • In-place append (select/clear): Select to modify the existing data rather than copying it, which can be a big performance gain. When mode=append, the inplace minor option controls the atomicity of the action. When it is true, no temporary table is used and the operation is not guaranteed to be atomic. When it is false, a temporary table is used. The default value is true.
  • Schema: Controls the schema in which the target table is located. If not specified, a default schema is used. The default schema depends on the user's "search_path" setting, but it is usually "public".
  • Method (copy or bulk_copy): copy (default) uses the COPY Redshift SQL command to export the data. bulk_copy also uses the COPY command but with a different flow. For example: redshift://user:password@host/database/table?method=bulk_copy
  • Serial Copy (select/clear): The serial_copy option determines whether to upload the files one by one, in order, to avoid deadlocks that can occur when uploading files in parallel. serial_copy=true; serial_copy=false (default).
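
As a sketch of how the update mode might be configured directly in a result output URL, assuming the unique columns are passed as a URL parameter named unique (the column name id is a placeholder):

redshift://user:password@host/database/table?mode=update&unique=id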

Example Query

select * from an_redshift_data limit 800000000

Activate a Segment in Audience Studio

You can also send segment data to the target platform by creating an activation in Audience Studio.

  1. Navigate to Audience Studio.
  2. Select a parent segment.
  3. Open the target segment, right-click, and then select Create Activation.
  4. In the Details panel, enter an Activation name and configure the activation according to the previous section on Configuration Parameters.
  5. Customize the activation output in the Output Mapping panel.
    • Attribute Columns
      • Select Export All Columns to export all columns without making any changes.
      • Select + Add Columns to add specific columns for the export. The Output Column Name pre-populates with the same Source column name. You can update the Output Column Name. Continue to select + Add Columns to add new columns for your activation output.
    • String Builder
      • Select + Add string to create strings for export. Select from the following values:
        • String: Choose any value; use text to create a custom value.
        • Timestamp: The date and time of the export.
        • Segment Id: The segment ID number.
        • Segment Name: The segment name.
        • Audience Id: The parent segment number.
  6. Set a Schedule.
    • Select the values to define your schedule and optionally include email notifications.
  7. Select Create.

If you need to create an activation for a batch journey, review Creating a Batch Journey Activation.

CLI Examples

You can use the TD Toolbelt to run the query that returns the data you want exported to Amazon Redshift.

One-Time Export Using the CLI

td query -w -d testdb \
--result 'redshift://username:password@host.redshift.amazonaws.com/database/table?mode=replace' \
"SELECT code, COUNT(1) FROM www_access GROUP BY code"

Scheduled Export Using the CLI

td sched:create hourly_count_example "0 * * * *" -d testdb \
--result 'redshift://username:password@host.redshift.amazonaws.com/database/table?mode=replace' \
"SELECT COUNT(*) FROM www_access"

Troubleshooting

If your Redshift cluster doesn't get any data even though the job for Result Output to Redshift was successful:

Result Output to Redshift tries to convert column types if the destination table already exists. If the conversion fails, the column becomes NULL, and all records are rejected if the column on Redshift is a NOT NULL field.
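
If the destination table already exists, one way to avoid such conversion failures is to cast columns to matching types within the query itself; for example (the VARCHAR cast is illustrative):

SELECT CAST(code AS VARCHAR) AS code, COUNT(1) AS cnt
FROM www_access
GROUP BY code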