Most queries from Hive 0.13 should work with minimal or no modification in Hive 2. However, Hive 2 is stricter than Hive 0.13.
(A MySQL-like dialect)
ANSI SQL syntax
More reserved words than HiveQL. Quote reserve words with backticks.
Numeric data types
Some values must be CAST explicitly.
Treasure Data specific syntax
Obsolete. Remove syntax from queries.
The following Hive 0.13 constructs must be modified to be ready for Hive 2.
Stricter Syntax: Additional Reserved Words and Function Names Require Escapes
Reserved words and UDF names used as column names must be escaped with backticks.
The most common case of this is `date`.
Reserved words manifest as parse errors in Hive 2. Parse error issues indicate that you need to find the keyword in the script and put backticks around it.
Parse errors received from Hive 2.
To fix this, given the current query syntax of the following query.
Rewrite the query. Notice that the date is surrounded by left-single quotation marks.
For more information, review the TD Hive UDF documentation. Additionally, you might want to review the additional UDFs in HiveMall UDF documentation.
Stricter Syntax: In SELECT, Use “AS” with Aliases
In testing some SELECT statements you might encounter a semantic error such as:
FAILED: SemanticException Line 0:-1 Invalid table alias or column reference 'sq_1': (possible column names are…
This can happen when Hive 2 is unable to unambiguously determine whether an identifier is a column alias or the name of a real column. For example, the following query produced an error.
The following error was thrown:
FAILED: SemanticException Line 0:-1 Invalid table alias or column reference 'sq_1': (possible column names are: _table_or_col a) x_uuid) sq_corr_1)), (. (tok_table_or_col sq_1) sq_corr_1))
The problem is that x_uuid is the name of a source column in the source table X, as well as the desired column alias in the SELECT.
The fix is to use an AS to make it unambiguous that the identifier is an alias.
Alternatively, you can avoid using the name of an existing column as an alias.
Stricter Syntax: Lack GROUP BY Key
GROUP BY clause in Hive2 is stricter than Hive 0.13.
If the name of column in GROUP BY does not match the name of column in SELECT, Hive 2 will fail as the error, Expression not in GROUP BY key 'column_name'.
For example, the following query will fail because the name, column1 in GROUP BY clause does not match the name, 'test_table.column1 in SELECT.
Since the syntax of the above query statement is wrong to begin with, please modify it as follows.
Stricter Syntax: Ambiguous Column Reference
If the query attempts to extract the specific column 2 or more times, Hive 2 checks the syntax stricter than Hive 0.13, the query will fail as the error, Ambiguous column reference <column_name> in <table_name>.
For example, the following query extracts the columns named id twice. In this case, Hive 0.13 will succeed, but Hive 2020.1 will be the error.
Since the second and subsequent ones are unnecessary, please delete them as follows.
Stricter Syntax: Require Type Conversion
Hive 2 check the data type of argument for functions stricter.
For example, LENGTH function expects string type as data type of argument.
In the case of Hive 0.13, even if the data type of argument is NOT string, the following query will work.
Hive 2 will fail as the error, LENGTH() only takes STRING/CHAR/VARCHAR/BINARY types as first argument, got LONG.
Therefore, please convert the data type of argument from unsupported data type to supported data type like string type as follows.
Stricter Syntax: Forbid ORDER BY or LIMIT in UNION Table
Hive 2 does not accept ORDER BY clause and LIMIT n in query statements against UNION ALL operator.
For example, the following query will fail in the case of Hive 2.
Since ORDER BY clause does not work as you expected firstly, please remove the part.
In the case of LIMIT n, you can avoid the error by using subquery or WITH clause as follows.
Stricter Types: CAST Non-Integer Numeric Literals to DOUBLE Before Storage in TD
Numeric literals that are not integers, default to DECIMAL in Hive 2. However, TD storage only supports DOUBLE. Therefore you must CAST any numeric value to DOUBLE before storing it in a table.
Without the casting, you may see an error similar to this:
Here is an example that works with Hive 0.13, but fails with Hive 2 if not type cast.
Assume that new_table_with_no_columns does not exist when this statement is run. TD assigns new_table_with_no_columns a schema based on the schema of sourceCTE. The constant 51.5211 has type DECIMAL(6,4), so sourceCTE.col1decimal is of type DECIMAL(6,4).
The INSERT tries to insert the DECIMAL(6,4) into the output table. In Hive 0.13, the value is automatically CAST as needed to DOUBLE and written. In Hive 2, this CAST is not automatic, so you get the error.
This requirement only applies to literal numeric values; if you attempt to store an INTEGER column, return value from a function, etc. into a TD table, TD Hive casts the value appropriately.
Stricter Types: All Numeric Literals in an Array Must be CAST to the Same Data Type
All numeric literals in an array must be CAST to a consistent data type.
Here is an example that works with Hive 0.13 and gives an error with Hive 2.
Can produce an error similar to this:
To fix the error, CAST all values to decimal(19,9) (19 digits with 9 digits after the decimal).
CROSS JOIN requires Query Hint
As a precaution, TD Hive 2 restricts the use of CROSS JOINs because they are often coding errors and are expensive to execute. If you intend to perform a CROSS JOIN, apply the following query hint to bypass this protection:
This is similar to setting hive.mapred.mode to nonstrict in a generic Hive environment.
Explicit CROSS JOIN example:
Implicit CROSS JOIN example (a JOIN without a WHERE clause):
If you run a query with an explicit or implicit CROSS JOIN without adding the query hint, you’ll see an error like this:
FAILED: SemanticException Cartesian products are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.cartesian.product to false and that hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.
TD Hive 0.13 Obsolete Feature: v-column Syntax
v-column syntax is an obsolete TD-specific method of referring to columns in TD tables. It has been deprecated for years, but still occasionally appears in some old queries. It is not supported in Hive 2.
v-column syntax is still useful in certain TD debugging scenarios, examining ingested data before schema is assigned to it. In such scenarios, we recommend the use of Hive 0.13 during debugging.
The following changes are required for any query using v column syntax:
v column values are always returned as STRING type
In Hive 2, rather than use the v column syntax, reference the column, but explicitly CAST it to STRING, or whatever other type is needed. For example:
If you use v column in Hive 2, you’ll see the following error message:
Columns selected through v column where names start with underscore need to be escaped with back quotes (`) when rewritten without v column
Make sure that the queries don’t use v syntax because the Hive 2 compiler will give an error when using v syntax. For example:
‘Invalid table alias or column reference 'v'
If the original query looks like this:
V column version:
Rewrite this as:
Example: If a current query is:
Rewrite this as: