rename filter plugin for Data Connector

The rename filter plugin for Data Connector renames column names. This plugin takes two options below. columns is supported, but no longer recommended for new use.

Option Description
rules an array of rule-based renaming operations. (See below for rules.)
columns a map whose keys are existing column names. values are new names.

Table of Contents

Configuration

Add a type: rename filter in the filters section in your load.yml, for example:

in:
...
filters:
- type: rename
  rules:
  - rule: upper_to_lower
  - rule: character_types
    pass_types: [ "a-z", "0-9" ]
    pass_characters: "_"
    replace: "_"
out:
...

Renaming rules

The rules is an array of rules as below applied top-down for all the columns.

rule description
character_types Restrict characters by types. Replace restricted characteres.
first_character_types Restrict the first character by types. Prefix or replace first restricted characters.
lower_to_upper Convert lower-case alphabets to upper-case.
regex_replace Replace with a regular expressions.
truncate Truncate.
upper_to_lower Convert upper-case alphabets to lower-case.
unique_number_suffix Make column names unique in the schema.

Renaming rule: character_types

The rule character_types replaces restricted characters.

option description required?
pass_characteres Characters to be allowed. "" by default
pass_types Sets of characters to be allowed. The array must consist of “a-z” (lower-case alphabets), “A-Z” (upper-case alphabets), or “0-9” (digits). [] by default
replace A character that disallowed characters are replaced with. It must consist of just 1 character. "_" by default

Example: character_types

# This configuration replaces characters into "_" except for "_", lower-case alphabets, and digits.
filters:
  ...
  - type: rename
    rules:
    - rule: character_types
      pass_characters: "_"
      pass_types: [ "a-z", "0-9" ]

Renaming rule: first_character_types

The rule first_character_types prefixes or replaces a restricted character at the beginning.

option description required?
pass_characteres Characters to be allowed. "" by default
pass_types Sets of characters to be allowed. The array must consist of “a-z” (lower-case alphabets), “A-Z” (upper-case alphabets), or “0-9” (digits). [] by default
prefix A character that a disallowed first character is replaced with. one of prefix or replace is required
replace A character that a disallowed first character is prefixed with. one of prefix or replace is required

Example: first_character_types

# This configuration prefixes a column name with "_" unless the name starts from "_" or a lower-case alphabet.
filters:
  ...
  - type: rename
    rules:
    - rule: first_character_types
      pass_characters: "_"
      pass_types: [ "a-z" ]
      prefix: "_"

Renaming rule: lower_to_upper

The rule lower_to_upper converts lower-case alphabets to upper-case.

Example: lower_to_upper

# This configuration converts all lower-case alphabets to upper-case.
filters:
  ...
  - type: rename
    rules:
    - rule: lower_to_upper

Renaming rule: regex_replace

The rule regex_replace replaces column names based on a regular expression.

option description required?
match A Java-style regular expression to which this string is to be matched. required
replace A string to be substibuted for each match in Java-style. required

Example: regex_replace

# This configuration replaces all patterns
filters:
  ...
  - type: rename
    rules:
    - rule: regex_replace
      match: "([0-9]+)_dollars"
      replace: "USD$1"

Renaming rule: truncate

The rule truncate truncates column names.

option description required?
max_length The length to which the column names are truncated. 128 by default

Example: truncate

# This configuration drops all characters after the 20th character.
filters:
  ...
  - type: rename
    rules:
    - rule: truncate
      max_length: 20

Renaming rule: upper_to_lower

The rule upper_to_lower converts upper-case alphabets to lower-case.

Example: upper_to_lower

# This configuration converts all upper-case alphabets to lower-case.
filters:
  ...
  - type: rename
    rules:
    - rule: upper_to_lower

Renaming rule: unique_number_suffix

The rule unique_number_suffix makes column names unique in the schema by suffixing numbers.

option description required?
delimiter A delimiter character inserted before a suffix number. It must be just 1 non-digit character. "_" by default
digits An integer that specifies the number of zero-filled digits of a suffix number. The suffix number zero-filled to the digits. optional
max_length The length to which the column names are truncated. The column name is truncated before the suffix number. optional
offset An integer where the suffix number starts. The first duplicative column name is suffixed by (offset + 1). 1 by default

Hint: unique_number_suffix

The procedure to make column names unique is not very trivial. There are many feasible ways. This renaming rule works as follows:

Basic policies:

  • Suffix numbers are counted per original column name.
  • Column names are fixed from the first column to the last column.

Actual procedure applied from the first (leftmost) column to the last (rightmost) column:

  1. Fix the column name as-is with truncating if the truncated name is not duplicated with left columns.
  2. Suffix the column name otherwise.

    a. Try to append the suffix number for the original column name with truncating. b. Fix it if the suffixed name is not duplicated with left columns nor original columns. c. Retry (a) with the suffix number increased otherwise.

Example: unique_number_suffix

# This configuration suffixes numbers to duplicative column names. (Ex. ["column", "column", "column"] goes to ["column", "column_2", "column_3"].)
filters:
  ...
  - type: rename
    rules:
    - rule: unique_number_suffix

Use Case Example:

When you would like to import your data with different column names, please add this rename filter in your .yml.

Example of Source data

$ td connector:preview load.yml
+---------+---------------+
|    0_ID | 1_Animal-Name |
+---------+---------------+
| 1       | "Lion"        |
| 2       | "Gorilla"     |
| 3       | "Zebra"       |
| 4       | "Giraffe"     |
| 6       | "Frog"        |
+---------+---------------+

When the following rename filter is added:

in:
...
filters:
- type: rename
  rule:
  - rule: upper_to_lower
  - rule: character_types
    pass_types: ["a-z", "0-9"]
    pass_characters: "_"
    replace: "_"
  - rule: first_character_types
    pass_types: ["a-z"]
    pass_characters: "_"
    prefix: "_"

Data Connector renames the columns 0_ID to _0_id, and 1_Animal-Name to _1_animal_name in this case.

$ td connector:preview load.yml
+---------+---------------+
|    0_id | 1_animal_name |
+---------+---------------+
| 1       | "Lion"        |
| 2       | "Gorilla"     |
| 3       | "Zebra"       |
| 4       | "Giraffe"     |
| 6       | "Frog"        |
+---------+---------------+

Columns: not recommended

columns renames each column name as the map given. columns is no longer recommended for new use. Please consider using rules instead.

filters:
  ...
  - type: rename
    columns:
      my_existing_column1: new_column1
      my_existing_column2: new_column2

columns is applied before rules if columns and rules are specified together. (It is not recommended to specify them together, though.)


Last modified: Dec 20 2016 03:42:11 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.