Data Connector “Guess” Command Upcoming Change (20161219)
We are announcing a change to the Data Connector guess command that enables the creation of “schema-flexible” ingestion configurations.
Summary of Impact
On December 19th, 2016 (PST), we will change the behavior of the Data Connector guess command to provide more flexibility when pulling data from SaaS applications, business systems, and object stores.
This change will not impact any scheduled Data Connectors pulling data into Treasure Data. It only affects the creation of new transfers, whether in the Connectors GUI (when creating a new transfer in the My Connections tab of the new console) or in a configuration file (when using the “td connector:guess” command).
What does the “guess” command do?
The guess command automatically suggests the best configuration for the Extract, Transform, & Load (ETL) processes that run when data is bulk loaded into Treasure Data using Data Connector.
What change is being made? Why is it being made?
The change provides more flexibility when the schema changes at a data source from which you are pulling data into Treasure Data.
Today, for a number of integrations, running the guess command suggests an explicit rename for every column name in order to accommodate the requirements of the Treasure Data storage system.
Current example auto-generated configuration:
- type: rename
  columns:
    Column-Name1: column_name1
    Column-Name2: column_name2
    ...
The problem with this approach is that, if a new column is added or a column name is modified, the data will not be successfully brought into Treasure Data without creating a new configuration.
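To illustrate why an explicit mapping is brittle, here is a minimal Python sketch. It is not the Data Connector implementation. It assumes that columns missing from the mapping pass through unchanged and that valid column names contain only lowercase letters, digits, and underscores; the names EXPLICIT_RENAMES, VALID_NAME, rename_explicit, and invalid_columns are hypothetical.

```python
import re

# Explicit mapping, mirroring the current auto-generated configuration.
EXPLICIT_RENAMES = {
    "Column-Name1": "column_name1",
    "Column-Name2": "column_name2",
}

# Assumed naming requirement: lowercase letters, digits, and underscores,
# starting with a letter or an underscore.
VALID_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")


def rename_explicit(columns):
    # Assumption: a column that is not listed keeps its original name.
    return [EXPLICIT_RENAMES.get(name, name) for name in columns]


def invalid_columns(columns):
    # Return the renamed columns that still violate the assumed naming rule.
    return [name for name in rename_explicit(columns) if not VALID_NAME.match(name)]


if __name__ == "__main__":
    # The original schema is fully covered by the mapping.
    print(invalid_columns(["Column-Name1", "Column-Name2"]))
    # A column added at the source later is not covered.
    print(invalid_columns(["Column-Name1", "Column-Name2", "Column-Name3"]))
```

Under these assumptions, the second call reports Column-Name3 as still invalid, which is the situation that today requires creating a new configuration.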
Moving forward, we want these schema changes to be handled without requiring the creation of a new configuration. Specifically, for the above example, the guess command will instead generate the following:
New example auto-generated configuration:
- type: rename
  rules:
  - rule: upper_to_lower
  - rule: character_types
    pass_types: ["a-z", "0-9"]
    pass_characters: "_"
    replace: "_"
  - rule: first_character_types
    pass_types: ["a-z"]
    pass_characters: "_"
    prefix: "_"
  - rule: unique_number_suffix
    max_length: 128
With this new rule-based approach, the guess command will no longer explicitly list every column change. Instead, only the rules applied to all columns will be listed. This allows new columns, or column name changes, in the source system to be incorporated automatically.
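As a rough illustration of how the four rules in the new configuration could act on column names, the Python sketch below emulates them in order. It is an approximation written for this note, not the Data Connector code: the function names are hypothetical, and the numbering used to resolve duplicate names (starting at _2) is an assumption that may differ from the real behavior.

```python
import re


def upper_to_lower(name):
    # rule: upper_to_lower
    return name.lower()


def character_types(name, replace="_"):
    # rule: character_types, keeping a-z, 0-9 and "_" and replacing the rest.
    return re.sub(r"[^a-z0-9_]", replace, name)


def first_character_types(name, prefix="_"):
    # rule: first_character_types, requiring a-z or "_" as the first character.
    if not re.match(r"[a-z_]", name):
        return prefix + name
    return name


def unique_number_suffix(names, max_length=128):
    # rule: unique_number_suffix (simplified): append _2, _3, ... to duplicates
    # and truncate so that the result never exceeds max_length.
    seen = set()
    result = []
    for name in names:
        candidate = name[:max_length]
        counter = 2
        while candidate in seen:
            suffix = f"_{counter}"
            candidate = name[:max_length - len(suffix)] + suffix
            counter += 1
        seen.add(candidate)
        result.append(candidate)
    return result


def apply_rules(columns):
    renamed = [
        first_character_types(character_types(upper_to_lower(name)))
        for name in columns
    ]
    return unique_number_suffix(renamed)


if __name__ == "__main__":
    print(apply_rules(["Column-Name1", "Column-Name2", "2016 Revenue", "column_name1"]))
    # ['column_name1', 'column_name2', '_2016_revenue', 'column_name1_2']
```

Because the rules are applied to whatever columns arrive, a column such as "2016 Revenue" that appears at the source after the configuration was created is still renamed without editing the configuration.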
Which Data Connectors are impacted?
This will impact new configurations generated for most Data Connectors.
Rest assured, this will have no impact on currently running transfer configurations. An impact may occur only if a user re-creates a transfer from scratch.
Backwards compatibility of “guess”
As we strive to improve guess over time, we do not guarantee that its results will be backwards compatible at all times, and we reserve the right to tweak or change its behavior when we believe the new behavior is better. We do, however, commit to keeping the mechanisms leveraged by guess functional over time, and we will strive to leave the behavior of existing Data Connector transfers unchanged, even when backwards-incompatible changes are introduced (as in this case).
Moving forward, please do not expect that the guess command will always return the same result for the same input.
Our goal is to continuously improve this functionality, and thus it will change gradually over time. Please rest assured that all configurations deployed in production will continue to work as expected.
Last modified: Feb 24 2017 09:41:25 UTC