Most queries from Hive 0.13 should work with minimal or no modification in Hive 2. However, Hive 2 is stricter than Hive 0.13.
(A MySQL-like dialect)
ANSI SQL syntax
More reserved words than HiveQL. Quote reserve words with backticks.
Numeric data types
Some values must be CAST explicitly.
Treasure Data specific syntax
Obsolete. Remove syntax from queries.
The following Hive 0.13 constructs must be modified to be ready for Hive 2.
Stricter Syntax: UNION and Matching or Unmatching Types
Stricter Syntax: Additional Reserved Words and Function Names Require Escapes
Reserved words and UDF names used as column names must be escaped with backticks.
The most common case of this is `date`.
Reserved words manifest as parse errors in Hive 2. Parse error issues indicate that you need to find the keyword in the script and put backticks around it.
Parse errors received from Hive 2.
To fix this, given the current query syntax of the following query.
Rewrite the query. Notice that the date is surrounded by left-single quotation marks.
For more information, review the TD Hive UDF documentation. Additionally, you might want to review the additional UDFs in HiveMall UDF documentation.
Stricter Syntax: In SELECT, Use “AS” with Aliases
In testing some SELECT statements you might encounter a semantic error such as:
FAILED: SemanticException Line 0:-1 Invalid table alias or column reference 'sq_1': (possible column names are…
This can happen when Hive 2 is unable to unambiguously determine whether an identifier is a column alias or the name of a real column. For example, the following query produced an error.
The following error was thrown:
FAILED: SemanticException Line 0:-1 Invalid table alias or column reference 'sq_1': (possible column names are: _table_or_col a) x_uuid) sq_corr_1)), (. (tok_table_or_col sq_1) sq_corr_1))
The problem is that x_uuid is the name of a source column in the source table X, as well as the desired column alias in the SELECT.
The fix is to use an AS to make it unambiguous that the identifier is an alias.
Alternatively, you can avoid using the name of an existing column as an alias.
Stricter Syntax: Group by Columns and Partition by Columns are Different
A query that uses a
Group by and
Partition in a single query does not work on Hive 2.0.
To run this query on Hive 2.0, you would use the following code where all deleted columns are in
Stricter Types: CAST Non-Integer Numeric Literals to DOUBLE Before Storage in TD
Numeric literals that are not integers, default to DECIMAL in Hive 2. However, TD storage only supports DOUBLE. Therefore you must CAST any numeric value to DOUBLE before storing it in a table.
Without the casting, you may see an error similar to this:
Here is an example that works with Hive 0.13, but fails with Hive 2 if not type cast.
Assume that new_table_with_no_columns does not exist when this statement is run. TD assigns new_table_with_no_columns a schema based on the schema of sourceCTE. The constant 51.5211 has type DECIMAL(6,4), so sourceCTE.col1decimal is of type DECIMAL(6,4).
The INSERT tries to insert the DECIMAL(6,4) into the output table. In Hive 0.13, the value is automatically CAST as needed to DOUBLE and written. In Hive 2, this CAST is not automatic, so you get the error.
This requirement only applies to literal numeric values; if you attempt to store an INTEGER column, return value from a function, etc. into a TD table, TD Hive casts the value appropriately.
Stricter Types: All Numeric Literals in an Array Must be CAST to the Same Data Type
All numeric literals in an array must be CAST to a consistent data type.
Here is an example that works with Hive 0.13 and gives an error with Hive 2.
Can produce an error similar to this:
To fix the error, CAST all values to decimal(19,9) (19 digits with 9 digits after the decimal).
Stricter Types: Mismatched data types on either side of a UNION
Hive 0.13 allows mismatched data types ( in some cases) for a given column name on either side of a UNION.
Hive 2020.1 produces the following error due to stricter data type checking:
There are two solutions that enable you to run a UNION in both Hive 0.13 and Hive 2020.1.
If you have two tables with the following data types and are trying to run a UNION that used to work in Hive 0.13 but doesn’t in Hive 2020.1.
A UNION will work in Hive 0.13 but fail in Hive2020.1.
In this solution, we do not change the underlying types of tables. but Instead, CAST columns in the SQL. The following code describes two options.
Solution 1: Option A
Solution 1: Option B
In this solution, change the underlying types of one of the tables to match.
Solution 2: Option A
Solution 2: Option B
CROSS JOIN requires Query Hint
As a precaution, TD Hive 2 restricts the use of CROSS JOINs because they are often coding errors and are expensive to execute. If you intend to perform a CROSS JOIN, apply the following query hint to bypass this protection:
This is similar to setting hive.mapred.mode to nonstrict in a generic Hive environment.
Explicit CROSS JOIN example:
Implicit CROSS JOIN example (a JOIN without a WHERE clause):
If you run a query with an explicit or implicit CROSS JOIN without adding the query hint, you’ll see an error like this:
FAILED: SemanticException Cartesian products are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.cartesian.product to false and that hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.
TD Hive 0.13 Obsolete Feature: v-column Syntax
v-column syntax is an obsolete TD-specific method of referring to columns in TD tables. It has been deprecated for years, but still occasionally appears in some old queries. It is not supported in Hive 2.
v-column syntax is still useful in certain TD debugging scenarios, examining ingested data before schema is assigned to it. In such scenarios, we recommend the use of Hive 0.13 during debugging.
The following changes are required for any query using v column syntax:
v column values are always returned as STRING type
In Hive 2, rather than use the v column syntax, reference the column, but explicitly CAST it to STRING, or whatever other type is needed. For example:
If you use v column in Hive 2, you’ll see the following error message:
Columns selected through v column where names start with underscore need to be escaped with back quotes (`) when rewritten without v column
Make sure that the queries don’t use v syntax because the Hive 2 compiler will give an error when using v syntax. For example:
‘Invalid table alias or column reference 'v'
If the original query looks like this:
V column version:
Rewrite this as:
Example: If a current query is:
Rewrite this as:
Semantic Error: GROUP BY and PARTITION BY Columns are Different
The syntax correctly fails in Hive 2 when GROUP BY and PARTITION BY columns are different. The same query runs successfully in Hive 0.13, but the query results are incorrect.
Example of a successful query because there is no mismatch in aggregation and selected columns.
Example of a correct fail in Hive 2 because id column must be included in GROUP BY if using a window function such as PARTITION BY.
To return a successful result, include all selected columns in GROUP BY and rewrite as: