The Treasure Data Hive service (TD Hive) provides batch data processing on data stored in Treasure Data’s data lake, based on Apache Hive. TD eliminates the need to run your own Hadoop clusters to handle Hive processing. Instead, Treasure Data operates compute clusters for running Hive jobs.
The Customer Data Platform (CDP) application uses Hive jobs for some of its internal operations, and you can also run your own Hive jobs. You can submit SELECT or DML queries written in Hive's query language from the TD Console, API calls, the TD Toolbelt, or TD workflows. The service queues and executes each query and returns the results. You can also design your system so that results are delivered to destinations specified in your Result Output.
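As a sketch of the submission flow described above, the query below could be run with the TD Toolbelt. The database and table names (`sample_datasets`, `www_access`) and the column names are illustrative assumptions, not part of this document; the `td query` flags shown (`-d` for database, `-T hive` to select the Hive engine, `-w` to wait for completion) are standard Toolbelt options.

```shell
# Submit a HiveQL query as a batch job; -w blocks until the job
# finishes and prints the result set.
td query -d sample_datasets -T hive -w \
  "SELECT method, COUNT(1) AS requests FROM www_access GROUP BY method"
```

The job is queued on Treasure Data's compute clusters; adding `--result` with a Result Output URL would instead deliver the result set to an external destination.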
The Hive query language (HiveQL), powered by Apache Hive, is one of the primary data processing languages on Treasure Data. Treasure Data manages its own Hadoop clusters, which accept queries from users and execute them using the Hadoop MapReduce framework, so you can collect, store, and analyze your data in the cloud without operating Hadoop yourself.
Hive and HiveQL Differences
TD Hive supports the flexible schema capabilities of the TD platform: a table's schema can be inferred from the data loaded into it or set explicitly.
Treasure Data supports HiveQL semantics, but unlike Apache Hive, it does not require a table schema to be defined upfront. You can set and modify the schema at any time.
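The flexible-schema behavior above can be sketched with the Toolbelt's `schema:*` commands. The database, table, and column names here are illustrative assumptions, not taken from this document.

```shell
# View the current schema, which may have been inferred from loaded data.
td schema:show my_db my_table

# Add a column at any time; no upfront schema definition or table
# rebuild is required, unlike stock Apache Hive DDL.
td schema:add my_db my_table age:int
```

Queries submitted after the change see the updated schema; rows ingested before the column existed simply return NULL for it.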
Hive 0.13 and Hive 2 Open Source Documentation
HiveQL for Hive 0.13 and the more ANSI SQL-compliant Hive 2.x dialect are documented in the open-source Apache Hive project. For full information, see https://hive.apache.org/.