The rename filter plugin for Treasure Data's Data Connector renames column names. This plugin takes two options below. columns is supported, but no longer recommended for new use.
| Option | Description |
|---|---|
| rules | an array of rule-based renaming operations. (See below for rules.) |
| columns | a map whose keys are existing column names. values are new names. |
- Configuration
- Renaming rules
- Renaming rule: character_types
- Example: character_types
- Renaming rule: first_character_types
- Example: first_character_types
- Renaming rule: lower_to_upper
- Example: lower_to_upper
- Renaming rule: regex_replace
- Example: regex_replace
- Renaming rule: truncate
- Example: truncate
- Renaming rule: upper_to_lower
- Example: upper_to_lower
- Renaming rule: unique_number_suffix
- Hint: unique_number_suffix
- Example: unique_number_suffix
- Use Case Example
- Columns: not recommended
Add a type: rename filter in the filters section in your load.yml, as in this example:
in:
...
filters:
- type: rename
rules:
- rule: upper_to_lower
- rule: character_types
pass_types: [ "a-z", "0-9" ]
pass_characters: "_"
replace: "_"
out:
...The rules is an array of rules as shown in the following applied top-down for all the columns.
| rule | description |
|---|---|
| character_types | Restrict characters by types. Replace restricted characteres. |
| first_character_types | Restrict the first character by types. Prefix or replace first restricted characters. |
| lower_to_upper | Convert lower-case alphabets to upper-case. |
| regex_replace | Replace with a regular expressions. |
| truncate | Truncate. |
| upper_to_lower | Convert upper-case alphabets to lower-case. |
| unique_number_suffix | Make column names unique in the schema. |
The rule character_types replaces restricted characters.
| option | description | required? |
|---|---|---|
| pass_characteres | Characters to be allowed. | "" by default |
| pass_types | Sets of characters to be allowed. The array must consist of “a-z” (lower-case alphabets), “A-Z” (upper-case alphabets), or “0-9” (digits). | [] by default |
| replace | A character that disallowed characters are replaced with. It must consist of just 1 character. | "_" by default |
# This configuration replaces characters into "_" except for "_", lower-case alphabets, and digits.
filters:
...
- type: rename
rules:
- rule: character_types
pass_characters: "_"
pass_types: [ "a-z", "0-9" ]The rule first_character_types prefixes or replaces a restricted character at the beginning.
| option | description | required? |
|---|---|---|
| pass_characters | Characters to be allowed. | "" by default |
| pass_types | Sets of characters to be allowed. The array must consist of “a-z” (lower-case alphabets), “A-Z” (upper-case alphabets), or “0-9” (digits). | [ ] by default |
| prefix | A character that a disallowed first character is replaced with. | one of prefix or replace is required |
| replace | A character that a disallowed first character is prefixed with. | one of prefix or replace is required |
# This configuration prefixes a column name with "_" unless the name starts from "_" or a lower-case alphabet.
filters:
...
- type: rename
rules:
- rule: first_character_types
pass_characters: "_"
pass_types: [ "a-z" ]
prefix: "_"The rule lower_to_upper converts lower-case alphabets to upper-case.
# This configuration converts all lower-case alphabets to upper-case.
filters:
...
- type: rename
rules:
- rule: lower_to_upperThe rule regex_replace replaces column names based on a regular expression.
| option | description | required? |
|---|---|---|
| match | A Java-style regular expression to which this string is to be matched. | required |
| replace | A string to be substituted for each match in Java-style. | required |
# This configuration replaces all patterns
filters:
...
- type: rename
rules:
- rule: regex_replace
match: "([0-9]+)_dollars"
replace: "USD$1"The rule truncate truncates column names.
| option | description | required? |
|---|---|---|
| max_length | The length to which the column names are truncated. | 128 by default |
# This configuration drops all characters after the 20th character.
filters:
...
- type: rename
rules:
- rule: truncate
max_length: 20The rule upper_to_lower converts upper-case alphabets to lower-case.
# This configuration converts all upper-case alphabets to lower-case.
filters:
...
- type: rename
rules:
- rule: upper_to_lowerThe rule unique_number_suffix makes column names unique in the schema by suffixing numbers.
| option | description | required? |
|---|---|---|
| delimiter | A delimiter character inserted before a suffix number. It must be just 1 non-digit character. | "_" by default |
| digits | An integer that specifies the number of zero-filled digits of a suffix number. The suffix number zero-filled to the digits. | optional |
| max_length | The length to which the column names are truncated. The column name is truncated before the suffix number. | optional |
| offset | An integer where the suffix number starts. The first duplicative column name is suffixed by (offset + 1). | 1 by default |
The procedure to make column names unique is not very trivial. There are many feasible ways. This renaming rule works as follows:
Basic policies:
Suffix numbers are counted per original column name.
Column names are fixed from the first column to the last column.
Actual procedure applied from the first (leftmost) column to the last (rightmost) column:
Fix the column name as-is with truncating if the truncated name is not duplicated with left columns.
Suffix the column name otherwise.
a. Try to append the suffix number for the original column name with truncating. b. Fix it if the suffixed name is not duplicated with left columns nor original columns. c. Retry (a) with the suffix number increased otherwise.
# This configuration suffixes numbers to duplicative column names. (Ex. ["column", "column", "column"] goes to ["column", "column_2", "column_3"].)
filters:
...
- type: rename
rules:
- rule: unique_number_suffixTo import your data with different column names, add this rename filter in your .yml.
Example of Source data
$ td connector:preview load.yml
+---------+---------------+
| 0_ID | 1_Animal-Name |
+---------+---------------+
| 1 | "Lion" |
| 2 | "Gorilla" |
| 3 | "Zebra" |
| 4 | "Giraffe" |
| 6 | "Frog" |
+---------+---------------+When the following rename filter is added:
in:
...
filters:
- type: rename
rule:
- rule: upper_to_lower
- rule: character_types
pass_types: ["a-z", "0-9"]
pass_characters: "_"
replace: "_"
- rule: first_character_types
pass_types: ["a-z"]
pass_characters: "_"
prefix: "_"Data Connector renames the columns 0_ID to _0_id, and 1_Animal-Name to _1_animal_name in this case.
$ td connector:preview load.yml
+---------+---------------+
| 0_id | 1_animal_name |
+---------+---------------+
| 1 | "Lion" |
| 2 | "Gorilla" |
| 3 | "Zebra" |
| 4 | "Giraffe" |
| 6 | "Frog" |
+---------+---------------+columns renames each column name as the map given. columns is no longer recommended for new use. Use rules instead.
filters:
...
- type: rename
columns:
my_existing_column1: new_column1
my_existing_column2: new_column2columns is applied before rules if columns and rules are specified together. (It is not recommended to specify them together.)