Data Connector “Guess” Command Upcoming Change (20161219)

We are announcing a change to our Data Connector guess command functionality, to enable the creation of “schema-flexible” ingestion configurations.

Summary of Impact

On December 19, 2016 (PST), we will change the behavior of the Data Connector guess command to provide more flexibility when pulling data from SaaS applications, business systems, and object stores.

This change will not impact any scheduled Data Connectors that are already pulling data into Treasure Data. It only affects new transfers created through the Connectors GUI or a configuration file, that is, when you run the “td connector:guess” command or create a new transfer in the My Connections tab of the new console.

What does the “guess” command do?

The guess command automatically suggests the best configuration of Extract, Transform, & Load (ETL) processes to occur when data is bulk loaded into Treasure Data using Data Connector.

What change is being made? Why is it being made?

The change provides more flexibility when the schema changes at a data source that you pull into Treasure Data.

Today, for a number of integrations, running the guess command suggests an explicit rename for every column name in order to meet the naming requirements of the Treasure Data storage system.

Current example auto-generated configuration:

- type: rename
  columns:
    Column-Name1: column_name1
    Column-Name2: column_name2
    ...

The problem with this approach is that, if a new column is added or a column name is modified at the source, the data will not be successfully brought into Treasure Data until a new configuration is created.

Moving forward, we want these schema changes to be handled without requiring a new configuration. Specifically, for the example above, guess will instead generate a rule-based configuration.
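To illustrate why an explicit column list is brittle, here is a minimal Python sketch. It is not the connector's implementation: the function names are hypothetical, and it assumes that a column missing from the mapping simply passes through with its original name, which may differ from the actual behavior.

```python
# Explicit mapping, mirroring the "columns" section of the current example.
explicit_mapping = {
    "Column-Name1": "column_name1",
    "Column-Name2": "column_name2",
}

def rename_explicit(columns, mapping):
    """Rename only the columns listed in the mapping.

    Assumption: unlisted columns keep their original name.
    """
    return [mapping.get(name, name) for name in columns]

def unmapped_columns(columns, mapping):
    """Return the source columns that the mapping does not cover."""
    return [name for name in columns if name not in mapping]

# The source system later adds a third column.
source_columns = ["Column-Name1", "Column-Name2", "Column-Name3"]
renamed = rename_explicit(source_columns, explicit_mapping)
missing = unmapped_columns(source_columns, explicit_mapping)
```

In this sketch the new column is never normalized, so the configuration would need to be regenerated before the column could be loaded under a valid name.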

New example auto-generated configuration:

- type: rename
  rules:
  - rule: upper_to_lower
  - rule: character_types
    pass_types: ["a-z", "0-9"]
    pass_characters: "_"
    replace: "_"
  - rule: first_character_types
    pass_types: ["a-z"]
    pass_characters: "_"
    prefix: "_"
  - rule: unique_number_suffix
    max_length: 128

With this new rules-based approach, the guess command no longer lists every column rename explicitly. Instead, it lists only the rules that are applied to all columns. As a result, new columns and column name changes in the source system are incorporated automatically.
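The effect of the rules listed above can be approximated with a short Python sketch. This is a simplified simulation written for illustration, not the connector's code: the function names are hypothetical, and the handling of duplicates (appending "_2", "_3", and so on) is an assumption about how unique_number_suffix behaves.

```python
import re

def upper_to_lower(name):
    # rule: upper_to_lower
    return name.lower()

def character_types(name, pass_types=("a-z", "0-9"), pass_characters="_", replace="_"):
    # rule: character_types -- replace every character outside the allowed set
    allowed = "".join(pass_types) + re.escape(pass_characters)
    return re.sub(f"[^{allowed}]", replace, name)

def first_character_types(name, pass_types=("a-z",), pass_characters="_", prefix="_"):
    # rule: first_character_types -- add a prefix if the first character is not allowed
    allowed = "".join(pass_types) + re.escape(pass_characters)
    if name and re.match(f"[{allowed}]", name[0]):
        return name
    return prefix + name

def unique_number_suffix(names, max_length=128, delimiter="_"):
    # rule: unique_number_suffix -- assumed behavior: later duplicates get _2, _3, ...
    seen, result = set(), []
    for name in names:
        candidate, n = name[:max_length], 2
        while candidate in seen:
            suffix = f"{delimiter}{n}"
            candidate = name[: max_length - len(suffix)] + suffix
            n += 1
        seen.add(candidate)
        result.append(candidate)
    return result

def apply_rules(columns):
    # Apply the per-column rules in the order they appear in the configuration,
    # then make the resulting names unique.
    normalized = [
        first_character_types(character_types(upper_to_lower(c))) for c in columns
    ]
    return unique_number_suffix(normalized)

renamed = apply_rules(
    ["Column-Name1", "Column-Name2", "New Column", "1st_Value", "column_name1"]
)
```

Because the rules operate on whatever column names arrive, a column such as "New Column" is normalized without any change to the configuration.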

Which Data Connectors are impacted?

This will impact new configurations generated for most Data Connectors.

However, this will have no impact on transfer configurations that are currently running. An impact may occur only if a user re-creates a transfer from scratch.

Backwards compatibility of “guess”

As we strive to improve guess over time, we do not guarantee that its results will always be backwards compatible, and we reserve the right to tweak or change its behavior when we believe the new behavior is better. We do, however, commit to keeping the mechanisms that guess relies on functional over time, and we will strive to leave the behavior of existing Data Connector transfers unchanged, even when backwards-incompatible changes are introduced (as in this case).

Moving forward, please do not expect the guess command to always return the same result for the same input.

Our goal is to continuously improve this functionality, and thus it will change gradually over time. Please rest assured that all configurations deployed in production will continue to work as expected.


Last modified: Dec 14 2016 12:18:56 UTC
