Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Table of Contents
maxLevel1

TD_APPROX_MOST_FREQUENT

Signature

Code Block
linenumberstrue
 TD_APPROX_MOST_FREQUENT(long num_buckets, long/varchar values, long capacity)

Example

Code Block
linenumberstrue
SELECT TD_APPROX_MOST_FREQUENT(3, values, 10);

Description

This function picks the frequent distinct items from the collection of values. This selection is approximate. The top `num_buckets` elements are obtained `values`. It returns a map whose keys are elements and values are estimated frequencies in the collection.

Unlike a normal histogram, it selects the frequent values online to significantly save memory resources. The error rate is bounded by the capacity parameter controlling the size of the internal data structure.

TD_TIME_RANGE

For convenience, we recommend using TD_INTERVAL instead of TD_TIME_RANGE.

...

Code Block
linenumberstrue
SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01 PDT')                 # OK
SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01', '2013-01-02','PDT') # OK
SELECT ... WHERE TD_TIME_RANGE(time, NULL, '2013-01-01', 'PDT')        # OK
SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01', NULL, 'PDT')        # OK
SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01', 'PDT')              # NG


Description

We strongly recommend that you take advantage of time-based partitioning. Refer to Performance Tuning for more information.

This UDF returns true if the unix_timestamp is equal to or later than the start_time and older than the end_time (start_time <= time && time < end_time). If end_time is omitted or NULL, the UDF assumes it’s infinite. If start_time is NULL, the UDF assumes it’s 0.

...

TD_INTERVAL() is a companion function to TD_TIME_RANGE(). Both are especially useful in WHERE clauses, to make sure that your queries take advantage of time-based partitioning. TD_INTERVAL is used to compute relative time ranges that would otherwise require complex date manipulation. (TD_TIME_RANGE is used for absolute time ranges.)

We strongly recommend that you take advantage of time-based partitioning. Refer to the Performance Tuning article for more information. Not using time-based filtering in SQL SELECT statements can cause inefficient full table scans that affect query performance.

This UDF returns true if time value is within the interval which is represented by interval_string (state time <= time < end time).

...

If the formats of the time or duration strings are invalid, the UDF returns NULL.

'year' and 'month' durations are NOT supported, because to do so would adversely impact performance. A month can be 28, 29, 30, or 31 days, and a year could be 365 or 366 days.


TD_TIME_FORMAT

For convenience, we recommend using TD_TIME_STRING instead of TD_TIME_FORMAT.

...

TD_TIME_FORMAT converts a UNIX timestamp to a string with the specified format (see the Supported time formats in TD_TIME_FORMAT UDF page for available formats). For example, ‘yyyy-MM-dd HH:mm:ss z’ converts 1325376000 to ‘2012-01-01 00:00:00 UTC’. If no timezone is specified, the UDF uses UTC.

...

If the format string has ! as the suffix, it truncates the date time string at the specified unit.

format string

format

example

y

yyyy-MM-dd HH:mm:ssZ

2018-01-01 00:00:00+0700

q

yyyy-MM-dd HH:mm:ssZ

2018-04-01 00:00:00+0700

M

yyyy-MM-dd HH:mm:ssZ

2018-09-01 00:00:00+0700

w

yyyy-MM-dd HH:mm:ssZ

2018-09-09 00:00:00+0700

d

yyyy-MM-dd HH:mm:ssZ

2018-09-13 00:00:00+0700

h

yyyy-MM-dd HH:mm:ssZ

2018-09-13 16:00:00+0700

m

yyyy-MM-dd HH:mm:ssZ

2018-09-13 16:45:00+0700

s

yyyy-MM-dd HH:mm:ssZ

2018-09-13 16:45:34+0700

y!

yyyy

2018

q!

yyyy-MM

2018-04

M!

yyyy-MM

2018-09

w!

yyyy-MM-dd

2018-09-09

d!

yyyy-MM-dd

2018-09-13

h!

yyyy-MM-dd HH

2018-09-13 16

m!

yyyy-MM-dd HH:mm

2018-09-13 16:45

s!

yyyy-MM-dd HH:mm:ss

2018-09-13 16:45:34

If there is no !, the return value should be the same as:

...

This UDF returns the result of parsing a user agent string. The user agent is parsed on the basis of rules. Where options are:

Options

Accepts

Returns

os

sting

JSON

os_family

string

string

os_major

string

string

os_minor

string

string

ua

string

JSON

ua_family

string

string

ua_major

string

string

ua_minor

string

string

device

string

string

Example

The example shows the result of parsing user agent from access log.

...

Code Block
linenumberstrue
SELECT TD_CURRENCY_CONV('2015-01-01', 'USD', 'JPY', 1.0)


TD_IP_TO_COUNTRY_CODE

Both Hive and Presto UDFs use a geolocation database supplied by Maxmind. Due to release schedules, the release level of the Maxmind database used by Hive and Presto might be different. This might cause inconsistent results between Hive and Presto geolocation functions.

An example of different results is as follows:

jobid

type

td_ip_to_city_name_v6

td_ip_to_latitude_v6

td_ip_to_longitude_v6

td_ip_to_postal_code_v6

218018944

hive

Tokyo

35.685

139.7514

102-0082

218019099

presto


35.6594

139.8533

134-0087

Signature

Code Block
linenumberstrue
string TD_IP_TO_COUNTRY_CODE(string ip)

...

Code Block
linenumberstrue
SELECT groupkey, SORTED_GROUP_CONCAT(column, '.', time)
FROM table
GROUP BY groupkey


Notes for Geometry types

Geometry types are the building blocks of geospatial queries and calculations. They are used as arguments for geospatial functions. Geometry type is the product of a constructor function.


ST_Point and ST_Polygon are examples of geospatial functions used to obtain binary representations of a point, `line, or polygon. You can also use them to convert a geometry data type to text.


ST_Point

Signature

Code Block
linenumberstrue
point ST_Point(double, double)

...

RESULT: Park Area Hope Fountain 4000 Hot Springs 5156

ST_Intersection and ST_Intersects are examples of Geospatial Relationship Functions. Relationship functions allow you to find relationships between two different geometric inputs. They return a boolean result type.


ST_Intersection

Signature

Code Block
linenumberstrue
Geometry ST_Intersection(Geometry, Geometry) 

...