
Replacing Data Tank 1.0 With 2.0

This article covers the integration changes involved in replacing Data Tank 1.0 with Data Tank 2.0. As part of its commitment to security and protection of sensitive data, Treasure Data recommends reviewing the entire document before beginning any steps.


Feature Differences Between Data Tank and Data Tank 2.0

This section describes the feature gaps between Data Tank and Data Tank 2.0.

Feature | Status | Data Tank 1.0 | Data Tank 2.0
Endpoint | DIFF | Static IP address | Writer endpoint URI; associated IP address cannot be fixed
Database name | DIFF | datatank | aciddb
Access controls (roles and schemas) | DIFF | Treasure Data provides tank_user and tank_integration_user accounts | Users and schemas are created in the Management UI (accounts are called Access Keys); access control is managed within the UI
DDL statements | DIFF | No additional command required | SET ROLE _owner_aciddb_<schema_name> must be executed before running DDL
SSL encryption | DIFF | Not enforced | All connections require SSL
Maintenance windows | DIFF | No regular window; support coordinates as needed | Customers define a weekly maintenance window; patches are applied during that time (see Understanding Data Tank 2.0 Maintenance)
IP allowlist size | DIFF | No limit | Up to 200 IP addresses
Engine behavior gaps | DIFF | Based on PostgreSQL 12 | Based on Aurora PostgreSQL 12.13; some engine differences apply
Custom port number | UNSUPPORTED | Default 5432, customizable via Support | Fixed to port 5432
Port forwarding | UNSUPPORTED | Support can add ports 80, 443, or 5439 | Not supported
IPv6 allowlist | UNSUPPORTED | Supports IPv6 | IPv6 is not supported
Extensions | UNSUPPORTED | plpgsql, pgcrypto, cstore_fdw, mysql_fdw, tds_fdw, foreign_table_exposer installed by default | Only plpgsql and pgcrypto available; SERVER, FOREIGN DATA WRAPPER, FOREIGN TABLE, and USER MAPPING objects cannot be migrated
Management UI | NEW | Not available | Web interface for managing configurations, roles, schemas, and monitoring
Log collection | NEW | Not available | PostgreSQL logs viewable in Management UI
Monitoring | NEW | Dashboards provided via DataDog link | Dashboards available directly in Management UI

Refer to the AWS Aurora PostgreSQL extension documentation for the full list of extensions supported in Data Tank 2.0 (Aurora PostgreSQL 12.13).

Within the Management UI you can configure database settings, create and manage schemas and access keys, and review logs and dashboards from a single interface.

(Screenshots: Legacy Monitoring and Management UI Monitoring)

Changing Integrations from Data Tank to Data Tank 2.0

The following two functions are used to import records from Data Tank or to export the result of a Hive/Trino query to Data Tank.

In most cases, you need to create an Authentication in advance to use these integrations.

Authentications for Data Tank

The following Authentications are prepared by default. tank_integration_user is used as the database user for both of them.

  • datatank
  • datatank_cstore

You can also create your own user-defined Authentications.

Authentications for Data Tank 2.0

No Authentications are prepared by default. You need to create new Authentications yourself.

Import from Data Tank

To import from Data Tank, you need to run a Bulk Load job using either of the following. To replace Data Tank with Data Tank 2.0, you must change the import source from Data Tank to Data Tank 2.0.

  • Sources (TD Console)
  • CLI (TD Toolbelt, td_load>: operator of Workflow)

Sources

If you have existing Sources for Data Tank, you can see them connected to the Authentication in the TD Console as follows. The red arrow in the image points to the number of Sources associated with the Authentication named datatank.

To make an existing Source import from Data Tank 2.0 instead, you must create a new Source associated with the Authentication for Data Tank 2.0, because a Source's Authentication cannot be replaced. When you create the new Source, it is recommended that you unschedule the existing Source.

CLI (TD Toolbelt)

If you use the CLI to import into Plazma DB using an Authentication, you set the td_authentication_id option in your YAML file as follows. To change the import source, change the Authentication ID set in this option. You also need to change other options, such as the database option.

in:   
  type: postgresql   
  td_authentication_id: <your_authentication_id>   
  ...  
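For reference, such a YAML file is typically submitted as a Bulk Load job with the td connector:issue command of TD Toolbelt, as in the sketch below; the file name and the database and table names are placeholders.

$ td connector:issue load.yml --database <your_database> --table <your_table>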

If you don’t use the td_authentication_id option in the YAML file, the Data Tank credentials are specified directly instead of an Authentication, as shown in the following example. In that case, replace them with the credentials for Data Tank 2.0.

in:   
  type: postgresql   
  host: xx.xxx.xxx.xxx   
  user: tank_integration_user   
  password: xxxxx   
  ...  
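For comparison, the following is a minimal sketch of the corresponding Data Tank 2.0 configuration. The host and user values are hypothetical placeholders: the writer endpoint URI replaces the static IP address, an Access Key user replaces tank_integration_user, the database is aciddb, and SSL is enabled because Data Tank 2.0 requires it (the ssl option is assumed to be supported by your connector version).

in:   
  type: postgresql   
  host: <writer_endpoint_uri>    # writer endpoint URI, not a static IP
  user: <access_key_user>        # Access Key created in the Management UI
  password: xxxxx   
  database: aciddb   
  ssl: true                      # Data Tank 2.0 requires SSL on all connections
  ...  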

Export Result of Hive/Trino to Data Tank

Any of the following features can export the result of a Hive/Trino job to Data Tank. If you are using one of them to export to Data Tank, you need to change the destination to Data Tank 2.0.

  • Query Editor on TD Console
  • td_run>: operator of Workflow
  • td>: operator of Workflow
  • CLI (Toolbelt)
  • REST API

Query Editor on TD Console

You can configure the export settings by selecting Export Results, as shown below. To replace the existing settings, you need to configure them again with a new Authentication: delete the existing export settings, then create new export settings using the Authentication for Data Tank 2.0.

If DDL is required, you need to set _owner_<database_name>_<schema_name> in the Set Role option as follows. For example, for a schema named sales in the aciddb database, the role is _owner_aciddb_sales.

td_run>: operator of Workflow

This Workflow operator calls an existing Saved Query, so you need to perform the same operation as in the Query Editor case.

td>: operator of Workflow

For this Workflow operator, you specify your Authentication name in the result_connection parameter as follows. Replace this parameter with the Authentication for Data Tank 2.0, and update the other parameters as needed.

+export_task:   
  td>: queries/sample.sql   
  result_connection: data_tank # Authentication Name   
  result_settings:   
    database: datatank   
    table: ...   
    ...   

In addition, if DDL is required, you need to specify the set_role option as follows.

+export_task:   
  td>: queries/sample.sql   
  result_connection: <your_authentication_name>   
  result_settings:   
    database: aciddb   
    table: ...   
    schema: <schema_name>   
    set_role: _owner_aciddb_<schema_name>   
    ...   

If you don’t specify your Authentication in result_connection, the result_url parameter is used instead of result_connection and result_settings, as shown.

+export_task:   
  td>: queries/sample.sql   
  result_url: postgresql://tank_integration_user:xxx@......  

In this case, modify the value of the result_url parameter. In addition, add set_role=_owner_aciddb_<schema_name> as a query parameter if a DDL statement is required.
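
For example, a result_url for Data Tank 2.0 might look like the following sketch. The user, password, and writer endpoint are hypothetical placeholders; mode=replace and set_role are shown as query parameters, the latter needed only when the query issues DDL.

+export_task:   
  td>: queries/sample.sql   
  result_url: postgresql://<access_key_user>:<password>@<writer_endpoint>/aciddb/<table_name>?mode=replace&set_role=_owner_aciddb_<schema_name>  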

CLI (Toolbelt) for Export

For CLI, you specify the information for export in URL format.

For example, you can export the result of a query to Data Tank by using the td query command as follows. Because you specify the export information directly in the -r option, specify the settings for Data Tank 2.0 there.

$ td query -d kazzy_test -w -r "postgresql://tank_integration_user:<password>@<ip_address>/datatank/<table_name>?mode=replace" "SELECT 1 AS col1;" 
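
For Data Tank 2.0, the same command would point at the writer endpoint, an Access Key user, and the aciddb database. The following is a sketch with placeholder values rather than a verified command.

$ td query -d <your_database> -w -r "postgresql://<access_key_user>:<password>@<writer_endpoint>/aciddb/<table_name>?mode=replace" "SELECT 1 AS col1;" 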

Connect to Data Tank via "pg>:" operator of Workflow

You can run any query statement in Data Tank from Workflow by using pg>: operator as shown in the following example.

_export:   
  pg:   
    host: ....   
    port: 5432   
    database: datatank   
    user: tank_integration_user   
    schema: public   

+access_data_tank:   
  pg>: queries/sample.sql   
  ...  

You need to change the values set for the pg>: operator options in order to run query statements in Data Tank 2.0. If you use DDL, you need to add a SET ROLE statement before the DDL statements, as in the sketch below.
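
The following is a minimal sketch of the same _export block adjusted for Data Tank 2.0. The host, user, and schema values are placeholders, and the ssl option of the pg>: operator is used here to satisfy the SSL requirement.

_export:   
  pg:   
    host: <writer_endpoint_uri>   # writer endpoint URI instead of a static IP
    port: 5432   
    database: aciddb   
    user: <access_key_user>       # Access Key created in the Management UI
    ssl: true                     # Data Tank 2.0 requires SSL on all connections
    schema: <schema_name>   

+access_data_tank_2:   
  pg>: queries/sample.sql   
  ...  

If queries/sample.sql contains DDL, it would start with the SET ROLE statement, for example:

SET ROLE _owner_aciddb_<schema_name>;   
CREATE TABLE sample_table (col1 integer);   -- hypothetical DDL following SET ROLE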

Connect to Data Tank via 3rd party tool

To connect to your Data Tank via a third-party tool, you specify the connection information in the tool. Replace it with the connection information for Data Tank 2.0; note that all Data Tank 2.0 connections require SSL.
