Treasure Data officially ended support for Hive 0.13 on May 23, 2022. Hive 2/Tez is now the default processing engine. Treasure Data migrated because Hive 2020.1 complies with ANSI SQL standards and allows Treasure Data to take advantage of future release enhancements.
The following outline summarizes a possible Hive 0.13 to Hive 2020.1 migration path. The goals are:
Minimize disruption to workloads currently running successfully in Hive 0.13.
Maximize the benefits from Hive 2 in stages appropriate to the business.
Migration Stages
Testing
Treasure Data recommends you run queries with Hive 2020.1 for testing purposes:
Modify Existing Queries: Update your existing queries to remove syntax that is incompatible with Hive 2. We recommend rewriting your queries in the stricter Hive 2020.1-compatible syntax and then running the modified queries in Hive 0.13. For more information on syntax differences, see Hive 0.13 and Hive 2020.1.
New Queries: Make sure they are compatible with Hive 2020.1 syntax and avoid the incompatible syntax from Hive 0.13.
Test Workloads: Work with your technical account management or support to test new and existing workloads for use with Hive 2020.1.
Production
Treasure Data suggests that you use Hive 2020.1 for the following workload types:
Production workloads
New ad-hoc and batch workloads
You should also review and upgrade all other existing workloads.
Default Processing Engine
TD Console: Hive users will see Hive 2020.1 as the default Hive version.
CLI / API: After the default processing engine is changed, Hive 2/Tez is used whenever no version is specified.
Run Queries
Review the following table to understand how to run Hive 0.13 and Hive 2/Tez queries.
Hive Engine | Run Hive 0.13 Query | Run Hive 2/Tez Query |
---|---|---|
Default = Hive 0.13 | No changes | Override the engine version from None to Stable. |
Default = Hive 2020.1 | Explicitly specify Hive 0.13 as the engine version. | No changes |
Semantic Checking Process
Two options are available for semantic checking of Hive queries. The positive and negative impacts of each are listed below.
Test Query
Run ‘EXPLAIN <query>’
Positive Impact
No impact on production
Check SQL compliance easily
Negative Impact
The statement prints the query plan only; you cannot look at the result.
Run Query / Insert into Temporary Table
Positive Impact
Make comparisons on results.
Negative Impact
If you do not rewrite INSERT queries to target a temporary table, the test run inserts its results into your production tables.
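The temporary-table approach described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the table names `events`, `daily_summary`, and `tmp_daily_summary_hive2` and the column `user_id` are hypothetical, and the temporary table is assumed to exist already in the same database.

```sql
-- Hypothetical production statement, shown for reference only:
--   INSERT INTO TABLE daily_summary
--   SELECT user_id, COUNT(*) AS event_count FROM events GROUP BY user_id;

-- Test variant: the same SELECT, redirected to a temporary table
-- (assumed to exist) so that production data is left untouched.
INSERT OVERWRITE TABLE tmp_daily_summary_hive2
SELECT
  user_id,
  COUNT(*) AS event_count
FROM events
GROUP BY user_id;
```

Running the test variant once under each engine, with a separate temporary table per engine, gives two result sets that can be compared, for example by row count.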
EXPLAIN Example
The EXPLAIN command can help check for errors without having to execute a query.
explain select count (*) from call center;
The following is a sample result from the statement:
FAILED: ParseException line 1:29 cannot recognize input near ‘from’ ‘call center’<EOF> in join source
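For contrast, a statement that parses cleanly returns a query plan instead of a FAILED line. The sketch below assumes a table named `call_center` exists; that name is a guess based on the example above, not something the source confirms.

```sql
-- Assumes a table named call_center (hypothetical).
-- EXPLAIN prints the query plan only; the query itself is not executed.
EXPLAIN
SELECT COUNT(*)
FROM call_center;
```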
See also: