The rename filter plugin for Treasure Data's Data Connector renames column names. This plugin takes two options below. columns is supported, but no longer recommended for new use.
Option | Description |
---|---|
rules | an array of rule-based renaming operations. (See below for rules.) |
columns | a map whose keys are existing column names. values are new names. |
Configuration
Add a type: rename filter in the filters section in your load.yml, as in this example:
in: ... filters: - type: rename rules: - rule: upper_to_lower - rule: character_types pass_types: [ "a-z", "0-9" ] pass_characters: "_" replace: "_" out: ...
Renaming rules
The rules is an array of rules as shown in the following applied top-down for all the columns.
rule | description |
---|---|
character_types | Restrict characters by types. Replace restricted characteres. |
first_character_types | Restrict the first character by types. Prefix or replace first restricted characters. |
lower_to_upper | Convert lower-case alphabets to upper-case. |
regex_replace | Replace with a regular expressions. |
truncate | Truncate. |
upper_to_lower | Convert upper-case alphabets to lower-case. |
unique_number_suffix | Make column names unique in the schema. |
Renaming rule: character_types
The rule character_types replaces restricted characters.
option | description | required? |
---|---|---|
pass_characteres | Characters to be allowed. |
|
pass_types | Sets of characters to be allowed. The array must consist of “a-z” (lower-case alphabets), “A-Z” (upper-case alphabets), or “0-9” (digits). |
|
replace | A character that disallowed characters are replaced with. It must consist of just 1 character. |
|
Example: character_types
# This configuration replaces characters into "_" except for "_", lower-case alphabets, and digits. filters: ... - type: rename rules: - rule: character_types pass_characters: "_" pass_types: [ "a-z", "0-9" ]
Renaming rule: first_character_types
The rule first_character_types prefixes or replaces a restricted character at the beginning.
option | description | required? |
---|---|---|
pass_characters | Characters to be allowed. | "" by default |
pass_types | Sets of characters to be allowed. The array must consist of “a-z” (lower-case alphabets), “A-Z” (upper-case alphabets), or “0-9” (digits). | [ ] by default |
prefix | A character that a disallowed first character is replaced with. | one of prefix or replace is required |
replace | A character that a disallowed first character is prefixed with. | one of prefix or replace is required |
Example: first_character_types
# This configuration prefixes a column name with "_" unless the name starts from "_" or a lower-case alphabet. filters: ... - type: rename rules: - rule: first_character_types pass_characters: "_" pass_types: [ "a-z" ] prefix: "_"
Renaming rule: lower_to_upper
The rule lower_to_upper converts lower-case alphabets to upper-case.
Example: lower_to_upper
# This configuration converts all lower-case alphabets to upper-case. filters: ... - type: rename rules: - rule: lower_to_upper
Renaming rule: regex_replace
The rule regex_replace replaces column names based on a regular expression.
option | description | required? |
---|---|---|
match | A Java-style regular expression to which this string is to be matched. | required |
replace | A string to be substituted for each match in Java-style. | required |
Example: regex_replace
# This configuration replaces all patterns filters: ... - type: rename rules: - rule: regex_replace match: "([0-9]+)_dollars" replace: "USD$1"
Renaming rule: truncate
The rule truncate truncates column names.
option | description | required? |
---|---|---|
max_length | The length to which the column names are truncated. | 128 by default |
Example: truncate
# This configuration drops all characters after the 20th character. filters: ... - type: rename rules: - rule: truncate max_length: 20
Renaming rule: upper_to_lower
The rule upper_to_lower converts upper-case alphabets to lower-case.
Example: upper_to_lower
# This configuration converts all upper-case alphabets to lower-case. filters: ... - type: rename rules: - rule: upper_to_lower
Renaming rule: unique_number_suffix
The rule unique_number_suffix makes column names unique in the schema by suffixing numbers.
option | description | required? |
---|---|---|
delimiter | A delimiter character inserted before a suffix number. It must be just 1 non-digit character. |
|
digits | An integer that specifies the number of zero-filled digits of a suffix number. The suffix number zero-filled to the digits. | optional |
max_length | The length to which the column names are truncated. The column name is truncated before the suffix number. | optional |
offset | An integer where the suffix number starts. The first duplicative column name is suffixed by ( |
|
Hint: unique_number_suffix
The procedure to make column names unique is not very trivial. There are many feasible ways. This renaming rule works as follows:
Basic policies:
Suffix numbers are counted per original column name.
Column names are fixed from the first column to the last column.
Actual procedure applied from the first (leftmost) column to the last (rightmost) column:
Fix the column name as-is with truncating if the truncated name is not duplicated with left columns.
Suffix the column name otherwise.
a. Try to append the suffix number for the original column name with truncating. b. Fix it if the suffixed name is not duplicated with left columns nor original columns. c. Retry (a) with the suffix number increased otherwise.
Example: unique_number_suffix
# This configuration suffixes numbers to duplicative column names. (Ex. ["column", "column", "column"] goes to ["column", "column_2", "column_3"].) filters: ... - type: rename rules: - rule: unique_number_suffix
Use Case Example
To import your data with different column names, add this rename filter in your .yml.
Example of Source data
$ td connector:preview load.yml +---------+---------------+ | 0_ID | 1_Animal-Name | +---------+---------------+ | 1 | "Lion" | | 2 | "Gorilla" | | 3 | "Zebra" | | 4 | "Giraffe" | | 6 | "Frog" | +---------+---------------+
When the following rename filter is added:
in: ... filters: - type: rename rule: - rule: upper_to_lower - rule: character_types pass_types: ["a-z", "0-9"] pass_characters: "_" replace: "_" - rule: first_character_types pass_types: ["a-z"] pass_characters: "_" prefix: "_"
Data Connector renames the columns 0_ID to _0_id, and 1_Animal-Name to _1_animal_name in this case.
$ td connector:preview load.yml +---------+---------------+ | 0_id | 1_animal_name | +---------+---------------+ | 1 | "Lion" | | 2 | "Gorilla" | | 3 | "Zebra" | | 4 | "Giraffe" | | 6 | "Frog" | +---------+---------------+
Columns: not recommended
columns renames each column name as the map given. columns is no longer recommended for new use. Use rules instead.
filters: ... - type: rename columns: my_existing_column1: new_column1 my_existing_column2: new_column2
columns is applied before rules if columns and rules are specified together. (It is not recommended to specify them together.)