Hive’s Array column type behavior change

This page describes a backwards-incompatible Hive change included in our upcoming release for the Hadoop 2 environment.

What Is Changing

The change affects Hive users on the Hadoop 2 clusters who query a table that contains a column of type Array in which the column value is NULL:

image1

Before the change, Hive treated NULL values in Array columns as empty arrays ([]). After the change, Hive treats them as NULL.

image2
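For code that consumes Hive query results and relied on the old behavior, the following is a minimal Python sketch of one way to cope with the change. It assumes that a NULL Array value reaches the client as None; the helper name as_empty_if_null and the sample rows are hypothetical and are not part of any Treasure Data or Hive API.

```python
from typing import Any, Dict, List, Optional


def as_empty_if_null(value: Optional[List[Any]]) -> List[Any]:
    """Return [] for a NULL (None) array value, mirroring the pre-change behavior."""
    return [] if value is None else value


# Hypothetical rows as a client might see them after the change:
# a NULL Array column is assumed to arrive as None rather than [].
rows: List[Dict[str, Any]] = [
    {"user": "a", "tags": ["x", "y"]},
    {"user": "b", "tags": None},  # NULL in the table
    {"user": "c", "tags": []},    # genuinely empty array
]

# Code that needs the old behavior can normalize NULL to an empty list.
tag_counts = {row["user"]: len(as_empty_if_null(row["tags"])) for row in rows}

# Code that wants the new behavior can now tell NULL apart from an empty array.
null_users = [row["user"] for row in rows if row["tags"] is None]
```

The sketch keeps both options visible: normalizing restores the old results, while checking for None explicitly makes use of the distinction that the new behavior exposes.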

Release Date

The release is scheduled for 2016-02-02.

Who Is Affected

All customers who have already migrated to the Hadoop 2 clusters and use Hive are affected by this change. Queries that are already running when the release takes place are not affected. Customers still assigned to the Hadoop 1 clusters are not affected; they will observe the change only after they migrate to a Hadoop 2 cluster.

Furthermore, this change does not affect Presto queries and does not alter the data stored in existing tables in any way.

Why Are We Changing It

This fix is needed to make the behavior of our Hive engine consistent with that of standard Apache Hive. It requires a change in the integration between Hive and our proprietary storage system, PlazmaDB.

If you have any questions about this change, please contact us at support@treasuredata.com.


Last modified: Dec 23 2016 04:08:01 UTC
