This article lists Treasure Data-specific UDFs available in our Hive environments.


TD_ARRAY_INDEX

Syntax

int/long/string TD_ARRAY_INDEX(array column, int i)

Description

This UDF returns an array’s value at the index.

Example

SELECT TD_ARRAY_INDEX( ARRAY(11,12,13), 2 )
         => 13
         SELECT TD_ARRAY_INDEX( ARRAY(11,12,13), 3 )
         => NULL
        

TD_AVGIF

Syntax

double TD_AVGIF(double column, boolean predicate)

Description

This UDF returns the average of a column that satisfies the predicate. TD_AVGIF accepts two parameters:

  1. The calculation for the AVG value.

  2. A check to see if parameter 1 (the AVG calculation) contains a NULL or False value. If a NULL or False value is detected, TD_AVGIF does not calculate the AVG value.

Example

SELECT TD_AVGIF(age, age > 20) FROM tbl;     

TD_DATE_TRUNC

Syntax

long TD_DATE_TRUNC(string unit, long time  [, string default_timezone = 'UTC'])

Description

This UDF performs a timestamp truncation at the level specified by the ‘unit’ parameter. The supported units are:

  • ‘minute’

  • ‘hour’

  • ‘day’

  • ‘week’

  • ‘month’

  • ‘quarter’

  • ‘year’

An optional ‘timezone’ parameter can be specified to indicate an alternative reference timezone the ‘unit’ is referenced to. While the input ‘time’ is in the global Unix time format, in different timezones ‘day’ that is the start of a day corresponds to different times.

This function mimics the functionality of native Presto’s date_trunc function, the main difference being that Presto’s date_trunc does not allow the specification of the timezone, because it uses the sessions' reference timezone.

Example

SELECT TD_DATE_TRUNC('day', time) FROM tbl

with time equal 1416787667 corresponding to ‘2014-11-24 00:07:47 UTC’ will return 1416787200 corresponding to ‘2014-11-24 00:00:00 UTC’.

With the same value and timezone ‘PST’ instead:

SELECT TD_DATE_TRUNC('day', time, 'PST') FROM tbl

the function returns 1416758400 since the start of the day for the ‘PST’ timezone is 8 hours behind the start of the day for ‘UTC’.

TD_DIVIDE

Syntax

double TD_DIVIDE(double numerator, double denominator)

Description

This UDF returns the division of numeric types safely even if the denominator is zero. If the denominator is zero, it returns 0. Otherwise, it returns the numerator divided by the denominator.

Example

SELECT TD_DIVIDE(nume, denom) FROM tbl;
        

TD_FIRST

Syntax

TD_FIRST(ret_col, cmp_col1, cmp_col2, ...)

Description

This aggregate UDF finds the row with the smallest value in the ‘comparison column’ (cmp_col1) and returns that row’s ‘retrieve column’ (ret_col) value. Additional comparison columns (cmp_col2, cmp_col3, etc.) are used as tiebreakers when the preceding comparison column has more than one row with the smallest value.

Example

This example selects the referrer URL of the earliest access log (the row with the smallest time) for each page_id.

SELECT page_id, TD_FIRST(referer, time) AS first_referer FROM access_logs GROUP BY page_id

TD_INTERVAL

Syntax

TD_INTERVAL(time, interval_string, default_timezone)

boolean TD_INTERVAL(int/long time,
                    string interval_string,
                    [, string default_timezone = 'UTC'])

Description

TD_INTERVAL() is a companion function to TD_TIME_RANGE(). Both are especially useful in WHERE clauses, to make sure that your queries take advantage of time-based partitioning. TD_INTERVAL is used to compute relative time ranges that would otherwise require complex date manipulation. (TD_TIME_RANGE is used for absolute time ranges.)

This function is supported from Hive 2020.1 version.

Example

These examples assume that the scheduled_time (or query start time) is 2018-08-14 01:23:45 (Tue, UTC):

# The last 7 days [2018-08-07 00:00:00, 2018-08-14 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '-7d')
# The last week. Monday is the beginning of the week (ISO standard) [2018-08-05 00:00:00, 2018-08-13 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '-1w')
# Today [2018-08-14 00:00:00, 2018-08-15 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '1d')
# The last month [2018-07-01 00:00:00, 2018-08-01 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '-1M')
# This month [2018-08-01 00:00:00, 2018-09-01 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '1M')
# This year [2018-01-01 00:00:00, 2019-01-01 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '1y')
# The last 15 minutes [2018-08-14 00:08:00, 2018-08-14 01:23:00)
SELECT  ... WHERE TD_INTERVAL(time, '-15m')
# The last 30 seconds [2018-08-14 01:23:15, 2018-08-14 01:23:45)
SELECT  ... WHERE TD_INTERVAL(time, '-30s')
# The last hour [2018-08-14 00:00:00, 2018-08-14 01:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '-1h')
# From the last hour to now [2018-08-14 00:00:00, 2018-08-14 01:23:45)
SELECT  ... WHERE TD_INTERVAL(time, '-1h/now')
# The last hour since the beginning of today [2018-08-13 23:00:00, 2018-08-14 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '-1h/0d')
# The last 7 days since 2015-12-25 [2015-12-18 00:00:00, 2015-12-25 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '-7d/2015-12-25')
# The last 10 days since the beginning of the last month [2018-06-21 00:00:00, 2018-07-01 00:00:00)
SELECT  ... WHERE TD_INTERVAL(time, '-10d/-1M')
# The last 7 days in JST
SELECT  ... WHERE TD_INTERVAL(time, '-7d', 'JST')

TD_IP_TO_CITY_NAME

Syntax

string TD_IP_TO_CITY_NAME(string ip)

Description

This UDF converts IP address to the city name. This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_CITY_NAME('106.142.252.8') AS ipv4,
             TD_IP_TO_CITY_NAME('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_IP_TO_CONNECTION_TYPE

Syntax

string TD_IP_TO_CONNECTION_TYPE(string ip)

Description

This UDF converts IP address to connection type. This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_CONNECTION_TYPE('106.142.252.8') AS ipv4,
             TD_IP_TO_CONNECTION_TYPE('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

Possible values are dial-up, cable/DSL, corporate, or cellular.

TD_IP_TO_COUNTRY_CODE

Syntax

string TD_IP_TO_COUNTRY_CODE(string ip)

Description

This UDF converts IP address to country code. This UDF supports IPv4 and IPv6.

Example

SELECT      TD_IP_TO_COUNTRY_CODE('106.142.252.8') AS ipv4,
  TD_IP_TO_COUNTRY_CODE('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6

The function returns JP in this example.

TD_IP_TO_COUNTRY_NAME

Syntax

string TD_IP_TO_COUNTRY_NAME(string ip)

Description

This UDF converts IP address to country code. This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_COUNTRY_NAME('106.142.252.8') AS ipv4,
             TD_IP_TO_COUNTRY_NAME('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

The function returns Japan in this example.

TD_IP_TO_DOMAIN

Syntax

string TD_IP_TO_DOMAIN(string ip)

Description

This UDF converts IP address to a domain. This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_DOMAIN('106.142.252.8') AS ipv4,
             TD_IP_TO_DOMAIN('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_IP_TO_LATITUDE

Syntax

string TD_IP_TO_LATITUDE(string ip)

Description

This UDF converts IP addresses to latitude. This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_LATITUDE('106.142.252.8') AS ipv4,
             TD_IP_TO_LATITUDE('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_IP_TO_LEAST_SPECIFIC_SUBDIVISION_NAME

Syntax

string TD_IP_TO_LEAST_SPECIFIC_SUBDIVISION_NAMES(string ip)

Description

This UDF converts IP addresses to the least specific subdivisions (e.g. US states, JP prefectures, etc). This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_LEAST_SPECIFIC_SUBDIVISION_NAME('106.142.252.8') AS ipv4,
             TD_IP_TO_LEAST_SPECIFIC_SUBDIVISION_NAME('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_IP_TO_LONGITUDE

Syntax

string TD_IP_TO_LONGITUDE(string ip)

Description

This UDF converts IP address to longitude. This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_LONGITUDE('106.142.252.8') AS ipv4,
             TD_IP_TO_LONGITUDE('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_IP_TO_METRO_CODE (US Only)

Syntax

string TD_IP_TO_METRO_CODE(string ip)

Description

This UDF converts IP address to metro code (US Only). This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_METRO_CODE('106.142.252.8') AS ipv4,
             TD_IP_TO_METRO_CODE('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_IP_TO_MOST_SPECIFIC_SUBDIVISION_NAME

Syntax

string TD_IP_TO_MOST_SPECIFIC_SUBDIVISION_NAME(string ip)

Description

This UDF converts IP addresses to the most specific subdivisions (e.g. US states, JP prefectures, etc). This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_MOST_SPECIFIC_SUBDIVISION_NAME('106.142.252.8') AS ipv4,
             TD_IP_TO_MOST_SPECIFIC_SUBDIVISION_NAME('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_IP_TO_POSTAL_CODE

Syntax

string TD_IP_TO_POSTAL_CODE(string ip)

Description

This UDF converts IP address to postal code. This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_POSTAL_CODE('106.142.252.8') AS ipv4,
             TD_IP_TO_POSTAL_CODE('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_IP_TO_SUBDIVISION_NAMES

Syntax

array<string> TD_IP_TO_SUBDIVISION_NAMES(string ip)

Description

This UDF converts IP addresses to a list of subdivisions (e.g. US states, JP prefectures, etc). This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_SUBDIVISION_NAMES('106.142.252.8') AS ipv4,
            TD_IP_TO_SUBDIVISION_NAMES('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
         

TD_IP_TO_TIME_ZONE

Syntax

string TD_IP_TO_TIME_ZONE(string ip)

Description

This UDF converts IP address to time zone. This UDF supports IPv4 and IPv6.

Example

SELECT    TD_IP_TO_TIME_ZONE('106.142.252.8') AS ipv4,
             TD_IP_TO_TIME_ZONE('2001:268:d005:f5be:c43e:af35:81f:8f60') AS ipv6
        

TD_LAST

Syntax

TD_LAST(ret_col, cmp_col1, cmp_col2, ...)

Description

This aggregate UDF finds the row with the largest value in the ‘comparison column’ (cmp_col1) and returns that row’s ‘retrieve column’ (ret_col) value. Additional comparison columns (cmp_col2, cmp_col3, etc.) are used as tiebreakers when the preceding comparison column has more than one row with the largest value.

Example

This example selects the URL of the most recent access log (the row with the largest time) for each user.

SELECT user, TD_LAST(url, time) AS last_url FROM access_logs GROUP BY user

TD_LAT_LONG_TO_COUNTRY

Syntax

string TD_LAT_LONG_TO_COUNTRY(string type, double latitude, double longitude)

Description

This UDF converts geolocation information (latitude/longitude) to the country name.

Example

SELECT  TD_LAT_LONG_TO_COUNTRY('FULL_NAME',                 37, -122)
           TD_LAT_LONG_TO_COUNTRY('THREE_LETTER_ABBREVIATION', 37, -122)
           TD_LAT_LONG_TO_COUNTRY('POSTAL_ABBREVIATION',       37, -122)
           TD_LAT_LONG_TO_COUNTRY('SORTABLE_NAME',             37, -122)

You might notice occasional, inconsistent results between Hive and Presto geolocation functions.

Between Hive, Presto, and their different versions, UDFs use a geolocation database supplied by Maxmind. However, due to release schedules, the release level of the Maxmind database used by Hive and Presto might be different.

An example of different results is as follows:

jobid

type

td_ip_to_city_name_v6

td_ip_to_latitude_v6

td_ip_to_longitude_v6

td_ip_to_postal_code_v6

218018944

hive

Tokyo

35.685

139.7514

102-0082

218019099

presto


35.6594

139.8533

134-0087


TD_MD5

Syntax

string TD_MD5(col)

Description

This UDF calculates the MD5 hash digest from a given string. The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value.

Example

SELECT TD_MD5(column) FROM tbl
        

TD_NUMERIC_RANGE

Syntax

int TD_NUMERIC_RANGE(double column, boolean predicate)

Description

This UDF generates a range of integers from a to b incremented by c or the elements of a map into multiple rows and columns.

Example

SELECT TD_NUMERIC_RANGE(0,10,2)
         => 0
            2
            4
            6
            8
        

TD_PARSE_AGENT

Syntax

MAP(string,string) TD_PARSE_AGENT(user_agent string)

This UDF returns a Map value of the result to parse a user agent string. The UDF is implemented by Woothee. Support for Google Search App, Microsoft Edge (based on Chromium), Edge for iOS/Android, Android 9, and Yandex Browser.

Example

The example shows the result of the parsing user agent from the access log.

SELECT TD_PARSE_AGENT(agent) AS parsed_agent, agent FROM www_access
         {"os":"Windows 7","vendor":"Google","os_version":"NT 6.1","name":"Chrome","category":"pc","version":"16.0.912.77"},

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7

SELECT TD_PARSE_AGENT(agent)['os'] AS os FROM www_access

Windows 7 => os from user-agent, or carrier name of mobile phones

SELECT TD_PARSE_AGENT(agent)['vendor'] AS vendor FROM www_access

Google // => name of vendor

SELECT TD_PARSE_AGENT(agent)['os_version'] AS os_version FROM www_access
        NT 6.1 // => "NT 6.3" (for Windows), "10.8.3" (for OSX), "8.0.1" (for iOS), ....
        SELECT TD_PARSE_AGENT(agent)['name'] AS name FROM www_access   

Chrome // => name of browser (or string like name of user-agent)

SELECT TD_PARSE_AGENT(agent)['category'] AS category FROM www_access
         pc // => "pc", "smartphone", "mobilephone", "appliance", "crawler", "misc", "unknown"
         SELECT TD_PARSE_AGENT(agent)['version'] AS version FROM www_access
         16.0.912.77 => version of browser, or terminal type name of mobile phones 

TD_PARSE_USER_AGENT

Syntax

string TD_PARSE_USER_AGENT(user_agent string [, options string])

Description

This UDF returns the result of parsing a user agent string. The user agent is parsed by the basis of rules. Accepts the following options as a string, as user options:

os, os_family, os_major, os_minor, ua, ua_family, ua_major, ua_minor, device

os and ua return JSON. With _family, _major and _minor return a string. The device option also returns a string.

Example

The example shows the result of the parsing user agent from the access log.

SELECT TD_PARSE_USER_AGENT(agent) AS agent FROM www_access
         {"user_agent": {"family": "IE", "major": "9", "minor": "0", "patch": ""}, "os": {"family": "Windows 7", "major": "", "minor": "", "patch": "", "patch_minor": ""}, "device": {"family": "Other"}}
         SELECT TD_PARSE_USER_AGENT(agent, 'os') AS agent_os FROM www_access
         {"family": "Windows 7", "major": "", "minor": "", "patch": "", "patch_minor": ""}
         SELECT TD_PARSE_USER_AGENT(agent, 'os_family') AS agent_os_family FROM www_access
          Windows 7
        

TD_PIVOT

Pivoting allows you to write a cross-tabulation; you can aggregate your results and rotate rows into columns.

Syntax

TD_PIVOT(key column, value column, ‘key_value1,key_value2’)

Pivoting in Hive 0.13 requires the following SQL:

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, to_map(key, value) kv
  FROM vtable
  GROUP BY uid
) 

Hive 2 adds TD_PIVOT function to perform this task. For example:

SELECT
  uid,
  element_at(COLLECT_LIST(c1), 0) as c1,
  element_at(COLLECT_LIST(c2), 0) as c2,
  element_at(COLLECT_LIST(c3), 0) as c3
FROM vtable Lateral View TD_PIVOT(key,value,'c1,c2,c3') t
group by uid
SELECT
  uid,
  sum(c1) as c1,
  sum(c2) as c2,
  sum(c3) as c3
FROM vtable Lateral View TD_PIVOT(key,value,'c1,c2,c3') t
group by uid  

TD_SCHEDULED_TIME

Syntax

long TD_SCHEDULED_TIME()

Description

This UDF returns the exact time when the job was scheduled by the scheduled query feature. The returned value may differ from NOW() because the actual query start time may be delayed.

If the query is not a scheduled query, the UDF returns the time when the job was issued. You may use this UDF with TD_TIME_ADD for incremental aggregation.

TD_SESSIONIZE

Syntax

string TD_SESSIONIZE(int/long unix_timestamp, int timeout, string sessionize_by)

Description

Sessionization of a table of event data groups a series of event rows associated with users into individual sessions for analysis. As long as the series of events are associated with the same user identifier (typically IP address, email, cookie, or similar identifier) and events are separated by no more than a chosen timeout interval, they will be grouped into a session.

This UDF takes three arguments:

  • The time field specified in UNIX epoch

  • A timeout window in seconds (when this amount of time elapses, it indicates the start of a new session)

  • The field name to sessionize by

It returns a UUID for the session of the request.

Example

You need to CLUSTER BY or ORDER BY within a subquery to use this feature.  The following query below sessionizes based on ip_address and the timestamp. You may want to use user_id or cookie_id instead of ip_address for non-anonymous logs.

SELECT TD_SESSIONIZE(time, 3600, ip_address) as session_id, time, ip_address, path
         FROM (
           SELECT time, ip_address, path   from web_logs
           distribute by ip_address sort by ip_address, time ) t

TD_SUBSTRING_INENCODING

Syntax

string TD_SUBSTRING_INENCODING(string str, int max_len_inbytes, string charset)

Description

This UDF returns the substring or slice of the byte array of str from the 0-index position at most max_len_inbytes with charset encoding. A charset can be selected from java.nio.charset.Charset.

Example

SELECT TD_SUBSTRING_INENCODING(column, 10, 'UTF-8') FROM tbl
        

TD_SUMIF

Syntax

double TD_SUMIF(double column, boolean predicate)

Description

This UDF returns the sum of the column which satisfies the predicate. TD_SUMIF accepts two parameters:

  1. The calculation for the SUM value.

  2. A check to see if parameter 1 (the SUM calculation) contains a NULL or False value. If a NULL or False value is detected, TD_SUMIF does not calculate the SUM value.

Example

SELECT TD_SUMIF(amount, amount 0) FROM tbl;
        

TD_TIME_ADD

Syntax

long TD_TIME_ADD(int/long/string time,
     string duration[, string default_timezone = 'UTC'])

Description

This UDF returns a timestamp equal to the time offset by duration. The UDF supports the following formats for the duration:

  • "Nw": after N weeks (e.g. “1w”, “2w”, “5w”)

  • "-Nw": before N weeks (e.g. “-1w”, “-2w”, “-5w”)

  • “Nd”: after N days (e.g. “1d”, “2d”, “30d”)

  • “-Nd”: before N days (e.g. “-1d”, “-2d”, “-30d”)

  • “Nh”: after N hours (e.g. “1h”, “2h”, “48h”)

  • “-Nh”: before N hours (e.g. “-1h”, “-2h”, “-48h”)

  • “Nm”: after N minutes (e.g. “1m”, “2m”, “90m”)

  • “-Nm”: before N minutes (e.g. “-1m”, “-2m”, “-90m”)

  • “Ns”: after N seconds (e.g. “1s”, “2s”, “90s”)

  • “-Ns”: before N seconds (e.g. “-1s”, “-2s”, “-90s”)

The formats above can be combined. For example, “1h30m” means “after 1 hour and 30 minutes”.

default_timezone is used to interpret time. If time itself has timezone (e.g. “2012-01-01 +0700”), then default_timezone is ignored. If default_timezone is not specified while time also does not specify a timezone, then the UDF uses ‘UTC’ as the timezone for time. Supported Time Formats in TD_TIME_FORMAT UDF

If the formats of the time or duration strings are invalid, the UDF returns NULL.

"year" and "month" durations are NOT supported, because these have complicated implications. A month can be 28, 29, 30, or 31 days, and a year could be 365 or 366 days. To implement these, this function will become a lot heavier and impact performance.

Example

This example selects records with timestamps “2013-01-01 00:00:00 UTC” or later but older than “2013-01-02 00:00:00 UTC”.

SELECT ... WHERE TD_TIME_RANGE(time,
            '2013-01-01',
          TD_TIME_ADD('2013-01-01', '1d'))
        

A typical use of this function within scheduled queries is in conjunction with TD_TIME_RANGE and TD_SCHEDULED_TIME to narrow the time range to a determined set of days, hours, minutes, or seconds. For example:

SELECT ... WHERE TD_TIME_RANGE(time,
          TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1d'),
          TD_SCHEDULED_TIME())

TD_TIME_FORMAT

Syntax

string TD_TIME_FORMAT(long unix_timestamp, string format                      [, string timezone = 'UTC'])

Description

This UDF converts a UNIX timestamp to a string with the specified format (see the Supported time formats in TD_TIME_FORMAT UDF page for available formats). For example, “yyyy-MM-dd-HH:mm:ss z” converts 1325376000 to “2012-01-01 00:00:00 UTC”. If no timezone is specified, the UDF uses UTC.

Example

This example formats a UNIX timestamp into a date formatted string

SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z') ... FROM ...
        SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'PST') ... FROM ...
        SELECT TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z', 'JST') ... FROM ...
        

TD_TIME_PARSE

Syntax

long TD_TIME_PARSE(string time   [, string default_timezone = 'UTC'])

Description

This UDF converts a time string into a UNIX timestamp.

default_timezone is used to interpret time. If time itself has timezone (e.g. “2012-01-01 +0700”), then default_timezone is ignored. If default_timezone is not specified while time also does not specify a timezone, then the UDF uses ‘UTC’ as the timezone for time. A list of supported time zones can be found here.

If the format of the time string is invalid, the UDF returns NULL.

TD_TIME_RANGE

Syntax

boolean TD_TIME_RANGE(int/long unix_timestamp,
  int/long/string start_time,  
  int/long/string end_time [, string default_timezone = 'UTC'])

Description

We strongly recommend that you take advantage of time-based partitioning, see Performance Tuning.

This UDF returns true if unix_timestamp is equal to or later than start_time and older than end_time (start_time <= time && time < end_time). If end_time is omitted or NULL, the UDF assumes it’s infinite. If start_time is NULL, the UDF assumes it’s 0.

start_time and end_time could be a string which represents a time (e.g. “2012-01-01 00:00:00 +0900”) or a UNIX timestamp (e.g. 1325343600). If the format of start_time or end_time strings are invalid, the UDF returns NULL.

default_timezone is used to interpret the timezone of start_time or end_time. If start_time or end_time themselves specify a timezone (e.g. “2012-01-01 +0700”), then default_timezone is ignored. If default_timezone is not specified while start_time or end_time also does not specify a timezone, then the UDF uses ‘UTC’ as the timezone for start_time or end_time. Supported Time Formats in TD_TIME_FORMAT UDF

Example

This example selects records with timestamps “2013-01-01 00:00:00 PDT” or later.

SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01 PDT')
SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01','PDT', NULL)

The time of day ("00:00:00") can be omitted, as shown above. Alternately, the time of day can be specified up to seconds. In general, the time string should be formatted as either "YYYY-MM-DD" or "YYYY-MM-DD hh:mm:ss", e.g., "2013-01-01" or "1999-01-01 07:00:00".

TD_UNPIVOT

Unpivoting reverses the process of pivoting. It converts data from the column level to the row level.

Syntax

TD_UNPIVOT(‘key_name1, key_name2, ...’, value_column1, value_column2, ...)

Unpivoting in Hive 0.13 requires the following SQL:

SELECT uid, 'c1' AS key, c1 AS value FROM htable
UNION ALL
SELECT uid, 'c2' AS key, c2 AS value FROM htable
UNION ALL
SELECT uid, 'c3' AS key, c3 AS value FROM htable

Hive 2 adds the TD_UNPIVOT function to perform this task:

SELECT uid, t1.key, t1.value FROM htable
LATERAL VIEW TD_UNPIVOT('c1,c2,c3',c1,c2,c3) t1

TD_URL_DECODE

URL decoding reverses the URL encoding.

Syntax

string TD_URL_DECODE(col)

Description

Unescapes the URL encoded value.

TD_URL_ENCODE

URL encoding converts characters that are not allowed in a URL into character-entity equivalents.

Syntax

string TD_URL_ENCODE(col)

Description

Escapes value by encoding it so that it can be safely included in URL query parameter names and values.

Example

select
 TD_URL_DECODE(encoded),
 TD_URL_ENCODE(decoded)
from
  table

or

select
 TD_URL_DECODE(encoded),
 TD_URL_ENCODE(decoded, 'utf-8')
from
  table

TD_X_RANK

Syntax

long TD_X_RANK(keys)

Description

Returns the rank of each row within the partition of the result set. The rank of a row is one plus the number of ranks that come before the row.

Example

You need to CLUSTER BY or ORDER BY within a subquery to use this feature. CLUSTER BY is more scalable than ORDER BY because it doesn’t require total order across multiple nodes, thus allowing us to process the query in parallel.

SELECT TD_X_RANK(c), c, u FROM
          (SELECT country AS c, user_id AS u
            FROM users CLUSTER BY c) t
 		160;SELECT TD_X_RANK(c, lc1), c, lc1, u FROM
           (select country AS c, location1 AS lc1, user_id AS u
            FROM users CLUSTER BY c, lc1) t
        
  • No labels

1 Comment

  1. SELECT TD_SESSIONIZE(time, 3600, ip_address) as session_id, time, ip_address, path FROM ( SELECT time, ip_address, path from web_logs order by ip_address, time ) t
    SELECT TD_SESSIONIZE(time, 3600, ip_address) as session_id, time, ip_address, path
             FROM (
               SELECT time, ip_address, path   from web_logs
               distribute by ip_address sort by ip_address, time ) t