The two most popular engines are Hive and Presto:

  • Hive. Designed for querying large data systems on the open-source Hadoop platform. Hive 2019.1 converts SQL-like queries into MapReduce jobs to process extremely large volumes of data. Hive is optimized for query throughput and is often described as a push model.

  • Presto. Designed for fast, interactive queries on data in HDFS and other sources. Presto is optimized for latency and is often described as a pull model.


| | Hive | Presto |
|---|---|---|
| Optimized for | Throughput | Interactivity |
| SQL standard fidelity | HiveQL (a subset of common data warehousing SQL) | Designed to comply with ANSI SQL |
| Window functions | Yes | Yes |
| Large JOINs | Very good for large fact-to-fact joins | Optimized for star schema joins (one large fact table and many smaller dimension tables) |
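
To make the star schema join shape concrete, here is a minimal sketch of one large fact table joined to two small dimension tables. The table and column names (fact_sales, dim_product, dim_store) are hypothetical, and SQLite is used only as a local stand-in so the query can run anywhere; it is not Hive or Presto and says nothing about their performance.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: illustration only).
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table, two small dimension tables.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_store   VALUES (10, 'east'), (20, 'west');
INSERT INTO fact_sales  VALUES
    (1, 10, 5.0), (1, 20, 7.0), (2, 10, 11.0), (2, 20, 13.0), (1, 10, 2.0);
""")

# The fact table sits in the middle; each dimension joins to it on its key.
STAR_JOIN = """
SELECT p.category, s.region, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_id = p.product_id
JOIN dim_store   AS s ON f.store_id   = s.store_id
GROUP BY p.category, s.region
ORDER BY p.category, s.region
"""

rows = conn.execute(STAR_JOIN).fetchall()
for category, region, total in rows:
    print(category, region, total)
```

A fact-to-fact join, by contrast, would join two tables that are both as large as fact_sales, which is the case the table above lists as a strength of Hive.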
