Hive/Presto/Standard SQL Tips

Table of Contents

Tips 1. PIVOT / UNPIVOT

This tip shows how to use the PIVOT and UNPIVOT operators in Standard SQL and Hive and Presto, and Pandas.

The PIVOT operator transforms rows into columns. On the other hand, The UNPIVOT operator transforms columns into rows.

Example Tables

  • Example of vertical table (vtable)
uid key value
101 c1 11
101 c2 12
101 c3 13
102 c1 21
102 c2 22
102 c3 23
  • Example of horizontal table (htable)
uid c1 c2 c3
101 11 12 13
102 21 22 23

Standard SQL

This example can be used wherever SQL-like query language.

PIVOT

SELECT uid,
   max(CASE WHEN key = 'c1' THEN value END) AS c1,
   max(CASE WHEN key = 'c2' THEN value END) AS c2,
   max(CASE WHEN key = 'c3' THEN value END) AS c3
FROM vtable
GROUP BY uid
uid c1 c2 c3
101 11 12 13
102 21 22 23

UNPIVOT

SELECT uid, 'c1' AS key, c1 AS value FROM htable
UNION ALL
SELECT uid, 'c2' AS key, c2 AS value FROM htable
UNION ALL
SELECT uid, 'c3' AS key, c3 AS value FROM htable
uid key value
101 c1 11
102 c1 21
101 c2 12
102 c2 22
101 c3 13
102 c3 23

Hive

PIVOT

Hive on TreasureData supports to_map UDAF, which can generate Map type, and then transforms rows into columns. But, general Hive function doesn’t support.

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, to_map(key, value) kv
  FROM vtable
  GROUP BY uid
) t
uid c1 c2 c3
101 11 12 13
102 21 22 23

UNPIVOT

LATERAL VIEW explode function transforms columns into rows.

SELECT t1.uid, t2.key, t2.value
FROM htable t1
LATERAL VIEW explode (map(
  'c1', c1,
  'c2', c2,
  'c3', c3
)) t2 as key, value
uid key value
101 c1 11
101 c2 12
101 c3 13
102 c1 21
102 c2 22
102 c3 23

Presto

PIVOT

This SQL transforms rows into columns by map_agg function.

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, map_agg(key, value) kv
  FROM vtable
  GROUP BY uid
) t
uid c1 c2 c3
101 11 12 13
102 21 22 23

UNPIVOT

CROSS JOIN unnest function is similar to LATERAL VIEW explode function. It also transforms columns into rows.

SELECT t1.uid, t2.key, t2.value
FROM htable t1
CROSS JOIN unnest (
  array['c1', 'c2', 'c3'],
  array[c1, c2, c3]
) t2 (key, value)
uid key value
101 c1 11
101 c2 12
101 c3 13
102 c1 21
102 c2 22
102 c3 23

Pandas

Pandas is the popular Python-based data analysis tool. You can learn the detail on here. On Pandas, you can also use PIVOT/UNPIVOT by pivot/melt function

PIVOT

In [1]: vtable.pivot('uid', 'key', 'value')
Out[1]:
key  c1  c2  c3
uid
101  11  12  13
102  21  22  23

UNPIVOT

In [2]: pd.melt(htable, 'uid', var_name='key')
Out[2]:
  uid  key  value
0 101   c1     11
1 101   c2     12
2 101   c3     13
3 102   c1     21
4 102   c2     22
5 102   c3     23

Tips 2. Get data across different databases

This tip shows how to get data across several databases in Hive and Presto.

Hive and Presto are able to analyze different database data by one query.

Example Databases and Tables

  • Example of animals table in zoo_a database
id name sex
1 Lion m
2 Gorilla m
3 Zebra f
4 Giraffe f
  • Example of emp animals in zoo_b database
animal_id name sex
101 Lion m
102 Bear f
103 Elephant f
104 Gorilla f
105 Tiger m
106 Monkey m
107 Rhinoceros f

Example SQL

If you use refer to fully qualified name of a table in this format database.table, you can query and process data from multiple databases.

SELECT name,sex FROM zoo_a.animals
UNION
SELECT name,sex FROM zoo_b.animals
name sex
Lion m
Giraffe f
Bear f
Monkey m
Gorilla m
Gorilla f
Elephant f
Rhinoceros f
Tiger m
Zebra f

Last modified: Apr 08 2016 01:31:52 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.