Release Note 20150217

Table of Contents

Features & Improvements

This is a summary of the new features and improvements introduced in this release:

Console: Signup Workflow

We completely redesigned and improved the signup for our Service: both look and feel and workflow were redesigned from scratch.

Console: Signup Workflow

Client Tools: Released Ruby Client Library v0.8.68

A new Ruby Client Library was released: v0.8.68.

A few fixes were added for this release:

  • Fixed the ‘history’ API to returned the appropriate value for the ‘result’ and ‘priority’ fields – the schedule’s priority level -2, -1, 0, 1, and 2 was improperly returned in the ‘result’ field instead.
  • Improved the message when POST requests are executed and failed – no retrying is performed by default
  • Upgraded to httpclient v2.5 gem

Client Tools: Released Ruby CLI v0.11.8

We released a new version of the TD CLI, version 0.11.8.

Several fixes and improvements were added for this release:

  • Greatly improved the result download performance for ‘td query’ and ‘td job:show’ commands
  • Modified ‘td job:list’ to return the whole query text when rendering the jobs list in CSV, TSV, or JSON formats instead of default tabular format (on stdout)
  • Improved handling of the Bulk Import CLI’s Java process timeouts by adding the —bulk-import-timeout option to allow the user more control
  • Fixed the ‘td server:endpoint’ command (was causing errors before)
  • The ‘td sched:history’ command was improperly rendering -2, -1, 0, 1, 2 in the Result column – the schedule’s priority was being displayed instead. Furthermore the command crashed when the schedule’s ‘scheduled_at’ field was empty (as for Unscheduled/Saved queries)
  • Sanitizing input file names for Bulk Import CLI using UTF-8 encoding to avoid APIs issues (Multi byte encoding is not supported)


Bug Fixes

These are the most important Bug Fixes made in this release:

Backend: Hive’s TD_DATE_TRUNC Handling of Null Arguments

  • [Problem]
    If the 2nd argument (typically containing the column name) of the TD_DATE_TRUNC function is NULL, the query fails.
    [Solution]
    This is due to the fact that NULL values in the 2nd argument of the functions cannot be properly handled and cause Null Pointer Exceptions (NPE) internally – thus the query fails.
    We improved the UDFs handling of this case and made it return NULL gracefully in such condition.

Backend: CREATE TABLE AS SELECT Succeeds in Hive

  • [Problem]
    Hive query using the CREATE TABLE AS SELECT fails silently but their final status is ‘success’.
    [Solution]
    CREATE TABLE AS SELECT (CTAS) is not currently supported for Hive, only for Presto. However we weren’t handling this case appropriately by barring the user from utilizing CTAS.
    We modified Hive’s internal logic to throw an exception if the CTAS statement is used in a Treasure Data query.

Backend: Presto Queries without Progress

  • [Problem]
    Sometimes Presto queries can get stuck in running state and show no progress at all.
    [Solution]
    This problem is typically occurring when a Presto worker node in the cluster is restarted while the query was being executed on said node (as well as on others). This causes the query to never complete because no execution task is actually committed to the restarted node so the Presto coordinator is waiting for a task completion that never happens.
    This problem was solved in the upstream Presto repository and was back ported to our codebase. In such case, the coordinator will restart the task (and that task only) on the restarted node to resolve the deadlock.

Backend: Presto TD_SCHEDULED_TIME within other Time-based Presto UDFs

  • [Problem]
    When the TD_SCHEDULED_TIME UDF is used within other TD Presto UDFs (e.g. TD_TIME_ADD) the time returned is ‘current time’ instead of the expected ‘scheduled time’.
    [Solution]
    This problem is due to the internal referencing of the ‘scheduled time’ when TD_SCHEDULED_TIME is used inside another time-based Treasure Data Presto UDF such as TD_TIME_ADD.
    We fixed this inconsistency to make sure that TD_SCHEDULED_TIME can reference ‘scheduled time’ and not ‘current time’ under all conditions.



Last modified: Feb 24 2015 10:05:37 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.