# Merging Data You can merge data to and from Treasure Data tables: * Merging from one table into an existing table * Merging from two tables into a third table consists of querying the data from one or more source tables and writing the result to a new or existing table using [Treasure Data Data Exchange Export Integration](https://docs.treasuredata.com/smart/project-integrations/treasure-data-data-exchange-export-integration). ## Prerequisites * Basic knowledge of Treasure Data, including the TD Toolbelt * Basic understanding of Job Result Output to Treasure Data ## Merging One Table into Another Existing Table In the following scenario, both the source and the destination table exist and contain data. Merging consists of querying the data from the source table and writing the result into the destination table using the Treasure Data Data Exchange connector. Let’s assume the source table is called `src_dbs.src_tbl` and has a schema like this. - TD Toolbelt command sample ```bash $ td schema:show src_dbs src_tbl ``` - Response Sample ``` src_dbs.src_tbl ( common_col:string src_col_1:string src_col_2:long ) ``` The time column is implicit. See the [Schema documentation](/products/customer-data-platform/data-workbench/databases/schema-management) page for more details. And the destination table `dst_dbs.dst_tbl` has a schema like this: ```bash $ td schema:show dst_dbs dst_tbl ``` ``` dst_dbs.dst_tbl ( common_col:string dst_col_1:string dst_col_2:long ) ``` The data belonging to columns having the same name in the source and destination table will be merged – this includes the **time** column or its equivalent. Source table columns that don’t exist in the destination table are instead just copied over. To merge the tables run this command in the CLI: ```bash $ td query -d src_dbs -w \ -r 'td://@/dst_dbs/dst_tbl' \ 'SELECT `time`, src_col_1, src_col_2, common_col FROM src_tbl' ``` We select the time column to be used to preserve the original timestamps when writing the result to the destination table. If the time column is omitted, the time at which the query is ran is used instead. By using `td://@/dst_dbs/dst_tbl` as a job result output target, the result is written in **append** mode, appending the result of this query to the data already in the destination table `dst_dbs.dst_tbl`. After the query completes, the schema of the destination table is updated to include the columns in the source table’s result (that is, those columns indicated in the select clause of the query) that did not exist before; the new schema will be: ```bash $ td schema:show dst_dbs dst_tbl ``` - Result ``` dst_dbs.dst_tbl ( common_col:string dst_col_1:string dst_col_2:long src_col_1:string src_col_2:long ) ``` Clearly data for both `time` and `common_col` was merged in the process. If you want to avoid merging `common_col` in the destination table with `common_col` in the source table, perhaps because the two columns contain data with different meanings that should not mixed, an easy way to do so is to rename the source column in the SELECT clause with the "`old_name AS new_name"` diction: ```bash $ td query ... \ 'SELECT `time`, src_col_1, src_col_2, common_col AS src_common_col FROM src_tbl' ``` ## Merging Two Tables Into A Third Table In the following scenario, you merge data from two tables. Whether writing into a new or existing table, the approach is very similar. The schema caveats discussed previously apply here as well. Let’s assume the two source tables are called `src_dbs_1.src_tbl_1` and `src_dbs_2.src_tbl_2` and have a schema: ```bash $ td schema:show src_dbs_1 src_tbl_1 src_dbs_1.src_tbl_1 ( common_col:string src_col_1_1:string src_col_1_2:long ) ``` ```bash $ td schema:show src_dbs_2 src_tbl_2 src_dbs_2.src_tbl_2 ( common_col:string src_col_2_1:string src_col_2_2:long ) ``` For simplicity let’s assume the destination database `dst_dbs` exists but the destination table `dst_dbs.dst_tbl` does not (for schema update simplification). This query merges the data from the two tables into the new destination table, which is created automatically when the result of the query is written to the destination. ```bash $ td query -d src_dbs_1 -w -r 'td://@/dst_dbs/dst_tbl' \ 'SELECT * FROM ( SELECT tbl_1_alias.time, tbl_1_alias.common_col, tbl_1_alias.src_col_1_1, tbl_1_alias.src_col_1_2, NULL as src_col_2_1, NULL as src_col_2_2 FROM src_dbs_1.src_tbl_1 tbl_1_alias UNION ALL SELECT tbl_2_alias.time, tbl_2_alias.common_col, NULL as src_col_1_1, NULL as src_col_1_2 tbl_2_alias.src_col_2_1, tbl_2_alias.src_col_2_2 FROM src_dbs_2.src_tbl_2 tbl_2_alias ) tbl_1_union_tbl_2' ``` Note the following details: * We use an UNION ALL to merge the content of the columns with identical names in the two source tables (`common_col`in this example) without elimination or de-duplication – UNION ALL also preserves the original timestamps of the records from both tables by including them in the two inner select clauses for each respective tables. Therefore, even if there were no columns to be merged, UNION ALL is still needed to merge the time column. * In **an UNION ALL both sides of the union must contain the same number, name, and type of columns.** Therefore, we need to specify the second source table’s columns missing in the first source table as `NULL as tbl_2_col_X` to act as column placeholders and vice versa. This process can very easily become complicated if the tables contain many unrelated columns; * If the two source tables reside in the same database, one can drop the database specified in the two (or more) inner FROM clauses; * Both source tables need to be aliased with a `src_dbs_2.src_tbl_2 tbl_2_alias` syntax and the `tbl_2_alias` is used to namespace columns in either table; * The last alias at the end and outside of the round parenthesis is an arbitrary name, required by syntax. If the destination table existed and contained data, the query can get a bit more complicated. Special care must be taken to make sure no unwanted merging or overwriting of columns happens in the process. The actual query can vary dramatically, depending on the specific use case.