Skip to content
Last updated

Expand_json Filter Function

The expand_json filter plugin for Treasure Data's Data Connector expands columns having JSON into multiple columns. This filter takes five options as explained below:

OptionDescription
json_column_namecolumn name having JSON to be expanded (string, required)
rootproperty to start fetching each entries, specify in JSONPath style (string, default: "$.")

expanded_columns

expanded into multiple columns (array of hash, required)

  • name : name of the column. you can define JSONPath style.
  • type : type of the column (see below)
  • format : format of the timestamp if type is timestamp
  • timezone : Timezone of column if values don't include time zone description
default_timezoneTime zone of timestamp columns if values don't include time zone description (UTC by default)
keep_expanding_json_columnNot remove the expanding JSON column from input schema if it's true (false by default)
stop_on_invalid_recordStop bulk load transaction if an invalid record is included (false by default)

Type of columns

namedescription
booleantrue or false
long64-bit signed integers
timestampDate and time with nano-seconds precision
double64-bit floating point numbers
stringStrings

Configuration

Add the filter section with type expand_json to your load.yml, for example:

in:
...
filters:
  - type: expand_json
    json_column_name: json_payload
    root: "$."
    expanded_columns:
      - {name: "phone_numbers", type: string}
      - {name: "app_id", type: long}
      - {name: "point", type: double}
      - {name: "created_at", type: timestamp, format: "%Y-%m-%d", timezone: "UTC"}
      - {name: "profile.anniversary.et", type: string}
      - {name: "profile.anniversary.voluptatem", type: string}
      - {name: "profile.like_words[1]", type: string}
      - {name: "profile.like_words[2]", type: string}
      - {name: "profile.like_words[0]", type: string}
 out:
 ...

Use Case 1: Expand one json column

When you have a table has a json column, add this section to your .yml.

Example of Source data

+-----------+---------------------------------------------------------------+
| code:long |                                                  jsons:string |
+-----------+---------------------------------------------------------------+
|         1 |   {"id":1,"name":"A green door","price":11.50,"tags":"green"} |
|         2 |     {"id":2,"name":"A blue door","price":12.50,"tags":"blue"} |
|         3 |       {"id":3,"name":"A red door","price":13.50,"tags":"red"} |
|         4 |     {"id":4,"name":"A pink door","price":14.50,"tags":"pink"} |
|         5 |   {"id":5,"name":"A white door","price":15.50,"tags":"white"} |
|         6 |   {"id":6,"name":"A black door","price":16.50,"tags":"black"} |
|         7 | {"id":7,"name":"A yellow door","price":17.50,"tags":"yellow"} |
|         8 | {"id":8,"name":"A purple door","price":18.50,"tags":"purple"} |
+-----------+---------------------------------------------------------------+

Added section. In this case, Data Connector will expand jsons column.

filters:
  - type: expand_json
    json_column_name: jsons
    root: "$."
    expanded_columns:
      - {name: "id", type: long}
      - {name: "name", type: string}
      - {name: "price", type: double}
      - {name: "tags", type: string}

The result is below:

+-----------+---------+---------------+--------------+-------------+
| code:long | id:long |   name:string | price:double | tags:string |
+-----------+---------+---------------+--------------+-------------+
|         1 |       1 |  A green door |         11.5 |       green |
|         2 |       2 |   A blue door |         12.5 |        blue |
|         3 |       3 |    A red door |         13.5 |         red |
|         4 |       4 |   A pink door |         14.5 |        pink |
|         5 |       5 |  A white door |         15.5 |       white |
|         6 |       6 |  A black door |         16.5 |       black |
|         7 |       7 | A yellow door |         17.5 |      yellow |
|         8 |       8 | A purple door |         18.5 |      purple |
+-----------+---------+---------------+--------------+-------------+

Use Case 2: Keep Original JSON Column

When you want to keep original JSON column, add keep_expanding_json_column option to your .yml.

filters:
  - type: expand_json
    json_column_name: jsons
    root: "$."
    keep_expanding_json_column: true
    expanded_columns:
      - {name: "id", type: long}
      - {name: "name", type: string}
      - {name: "price", type: double}
      - {name: "tags", type: string}

The result is below:

+-----------+---------------------------------------------------------------+---------+---------------+--------------+-------------+
| code:long |                                                  jsons:string | id:long |   name:string | price:double | tags:string |
+-----------+---------------------------------------------------------------+---------+---------------+--------------+-------------+
|         1 |   {"id":1,"name":"A green door","price":11.50,"tags":"green"} |       1 |  A green door |         11.5 |       green |
|         2 |     {"id":2,"name":"A blue door","price":12.50,"tags":"blue"} |       2 |   A blue door |         12.5 |        blue |
|         3 |       {"id":3,"name":"A red door","price":13.50,"tags":"red"} |       3 |    A red door |         13.5 |         red |
|         4 |     {"id":4,"name":"A pink door","price":14.50,"tags":"pink"} |       4 |   A pink door |         14.5 |        pink |
|         5 |   {"id":5,"name":"A white door","price":15.50,"tags":"white"} |       5 |  A white door |         15.5 |       white |
|         6 |   {"id":6,"name":"A black door","price":16.50,"tags":"black"} |       6 |  A black door |         16.5 |       black |
|         7 | {"id":7,"name":"A yellow door","price":17.50,"tags":"yellow"} |       7 | A yellow door |         17.5 |      yellow |
|         8 | {"id":8,"name":"A purple door","price":18.50,"tags":"purple"} |       8 | A purple door |         18.5 |      purple |
+-----------+---------------------------------------------------------------+---------+---------------+--------------+-------------+

Use Case 3: Expand Multiple JSON Columns

When you have multiple JSON columns, add this sections to your .yml.

Example of Source data

+-----------+---------------------------------------------------------------+-----------------------------------+
| code:long |                                                  jsons:string |                     jsons2:string |
+-----------+---------------------------------------------------------------+-----------------------------------+
|         1 |   {"id":1,"name":"A green door","price":11.50,"tags":"green"} |  {"color":"green","size":"Large"} |
|         2 |     {"id":2,"name":"A blue door","price":12.50,"tags":"blue"} |   {"color":"blue","size":"Small"} |
|         3 |       {"id":3,"name":"A red door","price":13.50,"tags":"red"} |   {"color":"red","size":"Medium"} |
|         4 |     {"id":4,"name":"A pink door","price":14.50,"tags":"pink"} |  {"color":"pink","size":"Medium"} |
|         5 |   {"id":5,"name":"A white door","price":15.50,"tags":"white"} |  {"color":"white","size":"Large"} |
|         6 |   {"id":6,"name":"A black door","price":16.50,"tags":"black"} |  {"color":"black","size":"Large"} |
|         7 | {"id":7,"name":"A yellow door","price":17.50,"tags":"yellow"} | {"color":"yellow","size":"Small"} |
|         8 | {"id":8,"name":"A purple door","price":18.50,"tags":"purple"} | {"color":"purple","size":"Large"} |
+-----------+---------------------------------------------------------------+-----------------------------------+

Added section. In this case, Data Connector will expand jsons and jsons2 column.

filters:
  - type: expand_json
    json_column_name: jsons
    root: "$."
    expanded_columns:
      - {name: "id", type: long}
      - {name: "name", type: string}
      - {name: "price", type: double}
      - {name: "tags", type: string}
  - type: expand_json
    json_column_name: jsons2
    root: "$."
    expanded_columns:
      - {name: "color", type: string}
      - {name: "size", type: string}

The result is below:

+-----------+---------+---------------+--------------+-------------+--------------+-------------+
| code:long | id:long |   name:string | price:double | tags:string | color:string | size:string |
+-----------+---------+---------------+--------------+-------------+--------------+-------------+
|         1 |       1 |  A green door |         11.5 |       green |        green |       Large |
|         2 |       2 |   A blue door |         12.5 |        blue |         blue |       Small |
|         3 |       3 |    A red door |         13.5 |         red |          red |      Medium |
|         4 |       4 |   A pink door |         14.5 |        pink |         pink |      Medium |
|         5 |       5 |  A white door |         15.5 |       white |        white |       Large |
|         6 |       6 |  A black door |         16.5 |       black |        black |       Large |
|         7 |       7 | A yellow door |         17.5 |      yellow |       yellow |       Small |
|         8 |       8 | A purple door |         18.5 |      purple |       purple |       Large |
+-----------+---------+---------------+--------------+-------------+--------------+-------------+

Use Case 4: Expand JSON with Different Keys

When you have JSON columns with different keys, add this sections to your .yml.

Example of Source data

{"id":1,"name":"A green door","price":11.50,"tags":"green", "discount": 1.10}
{"id":2,"name":"A blue door","price":12.50,"tags":"blue"}
{"id":3,"name":"A red door","price":13.50,"tags":"red","discount": 1.00}
{"id":4,"name":"A pink door","price":14.50,"tags":"pink"}

Added section. In this case, Data Connector will expand json line and set as null value for keys that are not present.

filters:
  - type: expand_json
    json_column_name: record
    root: "$."
    expanded_columns:
      - {name: "id", type: long}
      - {name: "name", type: string}
      - {name: "price", type: double}
      - {name: "tags", type: string}
      - {name: "discount", type: double}

The result is below:

+-----------+--------------+--------------+-------------+-----------------+
| id:string |  name:string | price:double | tags:string | discount:double |
+-----------+--------------+--------------+-------------+-----------------+
|         1 | A green door |         11.5 |       green |             1.1 |
|         2 |  A blue door |         12.5 |        blue |                 |
|         3 |   A red door |         13.5 |         red |             1.0 |
|         4 |  A pink door |         14.5 |        pink |                 |
+-----------+--------------+--------------+-------------+-----------------+