The expand_json filter plugin for Treasure Data's Data Connector expands columns having JSON into multiple columns. This filter takes five options as explained below:
| Option | Description |
|---|---|
| json_column_name | column name having JSON to be expanded (string, required) |
| root | property to start fetching each entries, specify in JSONPath style (string, default: "$.") |
expanded_columns | expanded into multiple columns (array of hash, required)
|
| default_timezone | Time zone of timestamp columns if values don't include time zone description (UTC by default) |
| keep_expanding_json_column | Not remove the expanding JSON column from input schema if it's true (false by default) |
| stop_on_invalid_record | Stop bulk load transaction if an invalid record is included (false by default) |
| name | description |
|---|---|
| boolean | true or false |
| long | 64-bit signed integers |
| timestamp | Date and time with nano-seconds precision |
| double | 64-bit floating point numbers |
| string | Strings |
Add the filter section with type expand_json to your load.yml, for example:
in:
...
filters:
- type: expand_json
json_column_name: json_payload
root: "$."
expanded_columns:
- {name: "phone_numbers", type: string}
- {name: "app_id", type: long}
- {name: "point", type: double}
- {name: "created_at", type: timestamp, format: "%Y-%m-%d", timezone: "UTC"}
- {name: "profile.anniversary.et", type: string}
- {name: "profile.anniversary.voluptatem", type: string}
- {name: "profile.like_words[1]", type: string}
- {name: "profile.like_words[2]", type: string}
- {name: "profile.like_words[0]", type: string}
out:
...When you have a table has a json column, add this section to your .yml.
Example of Source data
+-----------+---------------------------------------------------------------+
| code:long | jsons:string |
+-----------+---------------------------------------------------------------+
| 1 | {"id":1,"name":"A green door","price":11.50,"tags":"green"} |
| 2 | {"id":2,"name":"A blue door","price":12.50,"tags":"blue"} |
| 3 | {"id":3,"name":"A red door","price":13.50,"tags":"red"} |
| 4 | {"id":4,"name":"A pink door","price":14.50,"tags":"pink"} |
| 5 | {"id":5,"name":"A white door","price":15.50,"tags":"white"} |
| 6 | {"id":6,"name":"A black door","price":16.50,"tags":"black"} |
| 7 | {"id":7,"name":"A yellow door","price":17.50,"tags":"yellow"} |
| 8 | {"id":8,"name":"A purple door","price":18.50,"tags":"purple"} |
+-----------+---------------------------------------------------------------+Added section. In this case, Data Connector will expand jsons column.
filters:
- type: expand_json
json_column_name: jsons
root: "$."
expanded_columns:
- {name: "id", type: long}
- {name: "name", type: string}
- {name: "price", type: double}
- {name: "tags", type: string}The result is below:
+-----------+---------+---------------+--------------+-------------+
| code:long | id:long | name:string | price:double | tags:string |
+-----------+---------+---------------+--------------+-------------+
| 1 | 1 | A green door | 11.5 | green |
| 2 | 2 | A blue door | 12.5 | blue |
| 3 | 3 | A red door | 13.5 | red |
| 4 | 4 | A pink door | 14.5 | pink |
| 5 | 5 | A white door | 15.5 | white |
| 6 | 6 | A black door | 16.5 | black |
| 7 | 7 | A yellow door | 17.5 | yellow |
| 8 | 8 | A purple door | 18.5 | purple |
+-----------+---------+---------------+--------------+-------------+When you want to keep original JSON column, add keep_expanding_json_column option to your .yml.
filters:
- type: expand_json
json_column_name: jsons
root: "$."
keep_expanding_json_column: true
expanded_columns:
- {name: "id", type: long}
- {name: "name", type: string}
- {name: "price", type: double}
- {name: "tags", type: string}The result is below:
+-----------+---------------------------------------------------------------+---------+---------------+--------------+-------------+
| code:long | jsons:string | id:long | name:string | price:double | tags:string |
+-----------+---------------------------------------------------------------+---------+---------------+--------------+-------------+
| 1 | {"id":1,"name":"A green door","price":11.50,"tags":"green"} | 1 | A green door | 11.5 | green |
| 2 | {"id":2,"name":"A blue door","price":12.50,"tags":"blue"} | 2 | A blue door | 12.5 | blue |
| 3 | {"id":3,"name":"A red door","price":13.50,"tags":"red"} | 3 | A red door | 13.5 | red |
| 4 | {"id":4,"name":"A pink door","price":14.50,"tags":"pink"} | 4 | A pink door | 14.5 | pink |
| 5 | {"id":5,"name":"A white door","price":15.50,"tags":"white"} | 5 | A white door | 15.5 | white |
| 6 | {"id":6,"name":"A black door","price":16.50,"tags":"black"} | 6 | A black door | 16.5 | black |
| 7 | {"id":7,"name":"A yellow door","price":17.50,"tags":"yellow"} | 7 | A yellow door | 17.5 | yellow |
| 8 | {"id":8,"name":"A purple door","price":18.50,"tags":"purple"} | 8 | A purple door | 18.5 | purple |
+-----------+---------------------------------------------------------------+---------+---------------+--------------+-------------+When you have multiple JSON columns, add this sections to your .yml.
Example of Source data
+-----------+---------------------------------------------------------------+-----------------------------------+
| code:long | jsons:string | jsons2:string |
+-----------+---------------------------------------------------------------+-----------------------------------+
| 1 | {"id":1,"name":"A green door","price":11.50,"tags":"green"} | {"color":"green","size":"Large"} |
| 2 | {"id":2,"name":"A blue door","price":12.50,"tags":"blue"} | {"color":"blue","size":"Small"} |
| 3 | {"id":3,"name":"A red door","price":13.50,"tags":"red"} | {"color":"red","size":"Medium"} |
| 4 | {"id":4,"name":"A pink door","price":14.50,"tags":"pink"} | {"color":"pink","size":"Medium"} |
| 5 | {"id":5,"name":"A white door","price":15.50,"tags":"white"} | {"color":"white","size":"Large"} |
| 6 | {"id":6,"name":"A black door","price":16.50,"tags":"black"} | {"color":"black","size":"Large"} |
| 7 | {"id":7,"name":"A yellow door","price":17.50,"tags":"yellow"} | {"color":"yellow","size":"Small"} |
| 8 | {"id":8,"name":"A purple door","price":18.50,"tags":"purple"} | {"color":"purple","size":"Large"} |
+-----------+---------------------------------------------------------------+-----------------------------------+Added section. In this case, Data Connector will expand jsons and jsons2 column.
filters:
- type: expand_json
json_column_name: jsons
root: "$."
expanded_columns:
- {name: "id", type: long}
- {name: "name", type: string}
- {name: "price", type: double}
- {name: "tags", type: string}
- type: expand_json
json_column_name: jsons2
root: "$."
expanded_columns:
- {name: "color", type: string}
- {name: "size", type: string}The result is below:
+-----------+---------+---------------+--------------+-------------+--------------+-------------+
| code:long | id:long | name:string | price:double | tags:string | color:string | size:string |
+-----------+---------+---------------+--------------+-------------+--------------+-------------+
| 1 | 1 | A green door | 11.5 | green | green | Large |
| 2 | 2 | A blue door | 12.5 | blue | blue | Small |
| 3 | 3 | A red door | 13.5 | red | red | Medium |
| 4 | 4 | A pink door | 14.5 | pink | pink | Medium |
| 5 | 5 | A white door | 15.5 | white | white | Large |
| 6 | 6 | A black door | 16.5 | black | black | Large |
| 7 | 7 | A yellow door | 17.5 | yellow | yellow | Small |
| 8 | 8 | A purple door | 18.5 | purple | purple | Large |
+-----------+---------+---------------+--------------+-------------+--------------+-------------+When you have JSON columns with different keys, add this sections to your .yml.
Example of Source data
{"id":1,"name":"A green door","price":11.50,"tags":"green", "discount": 1.10}
{"id":2,"name":"A blue door","price":12.50,"tags":"blue"}
{"id":3,"name":"A red door","price":13.50,"tags":"red","discount": 1.00}
{"id":4,"name":"A pink door","price":14.50,"tags":"pink"}Added section. In this case, Data Connector will expand json line and set as null value for keys that are not present.
filters:
- type: expand_json
json_column_name: record
root: "$."
expanded_columns:
- {name: "id", type: long}
- {name: "name", type: string}
- {name: "price", type: double}
- {name: "tags", type: string}
- {name: "discount", type: double}The result is below:
+-----------+--------------+--------------+-------------+-----------------+
| id:string | name:string | price:double | tags:string | discount:double |
+-----------+--------------+--------------+-------------+-----------------+
| 1 | A green door | 11.5 | green | 1.1 |
| 2 | A blue door | 12.5 | blue | |
| 3 | A red door | 13.5 | red | 1.0 |
| 4 | A pink door | 14.5 | pink | |
+-----------+--------------+--------------+-------------+-----------------+