The two most popular engines are Hive and Presto:

  • Hive. Designed for querying large data systems on the open-source Hadoop platform. Hive 2019.1 converts SQL-like queries into MapReduce jobs so that extremely large volumes of data can be processed in batch. Hive is optimized for query throughput and is often described as a pull model.

  • Presto. Designed for fast, interactive queries on data in HDFS and other sources. Presto is optimized for latency and is often described as a push model.
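
The "pull model" label refers to which side of a pipeline drives the flow of data: in a pull model the consumer asks upstream for the next row, whereas in a push model the producer hands rows downstream as soon as they are available. The sketch below only illustrates the two styles in general terms. It is not how either engine is implemented, and the function names (`run_pull`, `run_push`, `filter_even`) are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


# Pull model: every operator is an iterator, and the consumer requests rows.
def scan(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        yield row


def filter_even(source: Iterator[int]) -> Iterator[int]:
    for row in source:
        if row % 2 == 0:
            yield row


def run_pull(rows: Iterable[int]) -> List[int]:
    # list() is the final consumer; it drives execution by pulling rows.
    return list(filter_even(scan(rows)))


# Push model: the producer drives execution and hands each row downstream.
def run_push(rows: Iterable[int]) -> List[int]:
    out: List[int] = []

    def sink(row: int) -> None:
        out.append(row)

    def filter_even_push(row: int, downstream: Callable[[int], None]) -> None:
        if row % 2 == 0:
            downstream(row)

    for row in rows:  # the scan pushes each row as soon as it has it
        filter_even_push(row, sink)
    return out
```

Both functions return the same rows; the difference is only in who controls when the next row moves through the pipeline.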



| | Hive | Presto |
| --- | --- | --- |
| Optimized for | Throughput | Latency |
| SQL standard fidelity | HiveQL (a subset of common data warehousing SQL) | Designed to comply with ANSI SQL |
| Window functions | | |
| Large JOINs | Very good for large fact-to-fact joins | Optimized for star schema joins (one large fact table and many smaller dimension tables) |
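
The star-schema join shape mentioned above (one large fact table joined to several small dimension tables) can be sketched as follows. This is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in engine, not Hive or Presto; the table and column names (`fact_sales`, `dim_product`, `dim_store`) and the function `star_join_totals` are made up for the example.

```python
import sqlite3


def star_join_totals():
    """Join one fact table to two dimension tables and total the sales."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical star schema: small dimension tables around one fact table.
    cur.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, amount REAL);
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO dim_store VALUES (10, 'east'), (20, 'west');
        INSERT INTO fact_sales VALUES
            (1, 10, 5.0), (1, 20, 7.0), (2, 10, 3.0), (2, 10, 4.0);
        """
    )
    # Each join attaches a dimension to the fact rows via its key.
    rows = cur.execute(
        """
        SELECT p.category, s.region, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_id = p.product_id
        JOIN dim_store AS s ON f.store_id = s.store_id
        GROUP BY p.category, s.region
        ORDER BY p.category, s.region
        """
    ).fetchall()
    conn.close()
    return rows
```

A fact-to-fact join differs in that both sides are large, so neither table can be treated as a small lookup table.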
