Skip to content
Last updated

Rename Filter Function

The rename filter plugin for Treasure Data's Data Connector renames column names. This plugin takes two options below. columns is supported, but no longer recommended for new use.

OptionDescription
rulesan array of rule-based renaming operations. (See below for rules.)
columnsa map whose keys are existing column names. values are new names.

Configuration

Add a type: rename filter in the filters section in your load.yml, as in this example:

    in:
    ...
    filters:
    - type: rename
      rules:
      - rule: upper_to_lower
      - rule: character_types
        pass_types: [ "a-z", "0-9" ]
        pass_characters: "_"
        replace: "_"
    out:
    ...

Renaming rules

The rules is an array of rules as shown in the following applied top-down for all the columns.

ruledescription
character_typesRestrict characters by types. Replace restricted characteres.
first_character_typesRestrict the first character by types. Prefix or replace first restricted characters.
lower_to_upperConvert lower-case alphabets to upper-case.
regex_replaceReplace with a regular expressions.
truncateTruncate.
upper_to_lowerConvert upper-case alphabets to lower-case.
unique_number_suffixMake column names unique in the schema.

Renaming rule: character_types

The rule character_types replaces restricted characters.

optiondescriptionrequired?
pass_characteresCharacters to be allowed."" by default
pass_typesSets of characters to be allowed. The array must consist of “a-z” (lower-case alphabets), “A-Z” (upper-case alphabets), or “0-9” (digits).[] by default
replaceA character that disallowed characters are replaced with. It must consist of just 1 character."_" by default

Example: character_types

    # This configuration replaces characters into "_" except for "_", lower-case alphabets, and digits.
    filters:
      ...
      - type: rename
        rules:
        - rule: character_types
          pass_characters: "_"
          pass_types: [ "a-z", "0-9" ]

Renaming rule: first_character_types

The rule first_character_types prefixes or replaces a restricted character at the beginning.

optiondescriptionrequired?
pass_charactersCharacters to be allowed."" by default
pass_typesSets of characters to be allowed. The array must consist of “a-z” (lower-case alphabets), “A-Z” (upper-case alphabets), or “0-9” (digits).[ ] by default
prefixA character that a disallowed first character is replaced with.one of prefix or replace is required
replaceA character that a disallowed first character is prefixed with.one of prefix or replace is required

Example: first_character_types

    # This configuration prefixes a column name with "_" unless the name starts from "_" or a lower-case alphabet.
    filters:
      ...
      - type: rename
        rules:
        - rule: first_character_types
          pass_characters: "_"
          pass_types: [ "a-z" ]
          prefix: "_"

Renaming rule: lower_to_upper

The rule lower_to_upper converts lower-case alphabets to upper-case.

Example: lower_to_upper

    # This configuration converts all lower-case alphabets to upper-case.
    filters:
      ...
      - type: rename
        rules:
        - rule: lower_to_upper

Renaming rule: regex_replace

The rule regex_replace replaces column names based on a regular expression.

optiondescriptionrequired?
matchA Java-style regular expression to which this string is to be matched.required
replaceA string to be substituted for each match in Java-style.required

Example: regex_replace

    # This configuration replaces all patterns
    filters:
      ...
      - type: rename
        rules:
        - rule: regex_replace
          match: "([0-9]+)_dollars"
          replace: "USD$1"

Renaming rule: truncate

The rule truncate truncates column names.

optiondescriptionrequired?
max_lengthThe length to which the column names are truncated.128 by default

Example: truncate

    # This configuration drops all characters after the 20th character.
    filters:
      ...
      - type: rename
        rules:
        - rule: truncate
          max_length: 20

Renaming rule: upper_to_lower

The rule upper_to_lower converts upper-case alphabets to lower-case.

Example: upper_to_lower

    # This configuration converts all upper-case alphabets to lower-case.
    filters:
      ...
      - type: rename
        rules:
        - rule: upper_to_lower

Renaming rule: unique_number_suffix

The rule unique_number_suffix makes column names unique in the schema by suffixing numbers.

optiondescriptionrequired?
delimiterA delimiter character inserted before a suffix number. It must be just 1 non-digit character."_" by default
digitsAn integer that specifies the number of zero-filled digits of a suffix number. The suffix number zero-filled to the digits.optional
max_lengthThe length to which the column names are truncated. The column name is truncated before the suffix number.optional
offsetAn integer where the suffix number starts. The first duplicative column name is suffixed by (offset + 1).1 by default

Hint: unique_number_suffix

The procedure to make column names unique is not very trivial. There are many feasible ways. This renaming rule works as follows:

Basic policies:

  • Suffix numbers are counted per original column name.

  • Column names are fixed from the first column to the last column.

Actual procedure applied from the first (leftmost) column to the last (rightmost) column:

  1. Fix the column name as-is with truncating if the truncated name is not duplicated with left columns.

  2. Suffix the column name otherwise.

a. Try to append the suffix number for the original column name with truncating. b. Fix it if the suffixed name is not duplicated with left columns nor original columns. c. Retry (a) with the suffix number increased otherwise.

Example: unique_number_suffix

    # This configuration suffixes numbers to duplicative column names. (Ex. ["column", "column", "column"] goes to ["column", "column_2", "column_3"].)
    filters:
      ...
      - type: rename
        rules:
        - rule: unique_number_suffix

Use Case Example

To import your data with different column names, add this rename filter in your .yml.

Example of Source data

$ td connector:preview load.yml
+---------+---------------+
|    0_ID | 1_Animal-Name |
+---------+---------------+
| 1       | "Lion"        |
| 2       | "Gorilla"     |
| 3       | "Zebra"       |
| 4       | "Giraffe"     |
| 6       | "Frog"        |
+---------+---------------+

When the following rename filter is added:

    in:
    ...
    filters:
    - type: rename
      rule:
      - rule: upper_to_lower
      - rule: character_types
        pass_types: ["a-z", "0-9"]
        pass_characters: "_"
        replace: "_"
      - rule: first_character_types
        pass_types: ["a-z"]
        pass_characters: "_"
        prefix: "_"

Data Connector renames the columns 0_ID to _0_id, and 1_Animal-Name to _1_animal_name in this case.

$ td connector:preview load.yml
+---------+---------------+
|    0_id | 1_animal_name |
+---------+---------------+
| 1       | "Lion"        |
| 2       | "Gorilla"     |
| 3       | "Zebra"       |
| 4       | "Giraffe"     |
| 6       | "Frog"        |
+---------+---------------+

Columns: not recommended

columns renames each column name as the map given. columns is no longer recommended for new use. Use rules instead.

    filters:
      ...
      - type: rename
        columns:
          my_existing_column1: new_column1
          my_existing_column2: new_column2

columns is applied before rules if columns and rules are specified together. (It is not recommended to specify them together.)