...
Table of Contents | ||
---|---|---|
|
TD_APPROX_MOST_FREQUENT
Signature
Code Block | ||
---|---|---|
| ||
TD_APPROX_MOST_FREQUENT(long num_buckets, long/varchar values, long capacity) |
Example
Code Block | ||
---|---|---|
| ||
SELECT TD_APPROX_MOST_FREQUENT(3, values, 10); |
Description
This function picks the frequent distinct items from the collection of values. This selection is approximate. The top `num_buckets` elements are obtained `values`. It returns a map whose keys are elements and values are estimated frequencies in the collection.
Unlike a normal histogram, it selects the frequent values online to significantly save memory resources. The error rate is bounded by the capacity parameter controlling the size of the internal data structure.
TD_TIME_RANGE
For convenience, we recommend using TD_INTERVAL instead of TD_TIME_RANGE.
...
Code Block | ||
---|---|---|
| ||
SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01 PDT') # OK SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01', '2013-01-02','PDT') # OK SELECT ... WHERE TD_TIME_RANGE(time, NULL, '2013-01-01', 'PDT') # OK SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01', NULL, 'PDT') # OK SELECT ... WHERE TD_TIME_RANGE(time, '2013-01-01', 'PDT') # NG |
Description
We strongly recommend that you take advantage of time-based partitioning. Refer to Performance Tuning for more information. |
This UDF returns true if the unix_timestamp is equal to or later than the start_time and older than the end_time (start_time <= time && time < end_time). If end_time is omitted or NULL, the UDF assumes it’s infinite. If start_time is NULL, the UDF assumes it’s 0.
...
TD_INTERVAL() is a companion function to TD_TIME_RANGE(). Both are especially useful in WHERE clauses, to make sure that your queries take advantage of time-based partitioning. TD_INTERVAL is used to compute relative time ranges that would otherwise require complex date manipulation. (TD_TIME_RANGE is used for absolute time ranges.)
We strongly recommend that you take advantage of time-based partitioning. Refer to the Performance Tuning article for more information. Not using time-based filtering in SQL SELECT statements can cause inefficient full table scans that affect query performance. |
This UDF returns true if time value is within the interval which is represented by interval_string (state time <= time < end time).
...
If the formats of the time or duration strings are invalid, the UDF returns NULL.
'year' and 'month' durations are NOT supported, because to do so would adversely impact performance. A month can be 28, 29, 30, or 31 days, and a year could be 365 or 366 days. |
TD_TIME_FORMAT
For convenience, we recommend using TD_TIME_STRING instead of TD_TIME_FORMAT.
...
TD_TIME_FORMAT converts a UNIX timestamp to a string with the specified format (see the Supported time formats in TD_TIME_FORMAT UDF page for available formats). For example, ‘yyyy-MM-dd HH:mm:ss z’ converts 1325376000 to ‘2012-01-01 00:00:00 UTC’. If no timezone is specified, the UDF uses UTC.
...
If the format string has !
as the suffix, it truncates the date time string at the specified unit.
format string | format | example |
---|---|---|
y | yyyy-MM-dd HH:mm:ssZ | 2018-01-01 00:00:00+0700 |
q | yyyy-MM-dd HH:mm:ssZ | 2018-04-01 00:00:00+0700 |
M | yyyy-MM-dd HH:mm:ssZ | 2018-09-01 00:00:00+0700 |
w | yyyy-MM-dd HH:mm:ssZ | 2018-09-09 00:00:00+0700 |
d | yyyy-MM-dd HH:mm:ssZ | 2018-09-13 00:00:00+0700 |
h | yyyy-MM-dd HH:mm:ssZ | 2018-09-13 16:00:00+0700 |
m | yyyy-MM-dd HH:mm:ssZ | 2018-09-13 16:45:00+0700 |
s | yyyy-MM-dd HH:mm:ssZ | 2018-09-13 16:45:34+0700 |
y! | yyyy | 2018 |
q! | yyyy-MM | 2018-04 |
M! | yyyy-MM | 2018-09 |
w! | yyyy-MM-dd | 2018-09-09 |
d! | yyyy-MM-dd | 2018-09-13 |
h! | yyyy-MM-dd HH | 2018-09-13 16 |
m! | yyyy-MM-dd HH:mm | 2018-09-13 16:45 |
s! | yyyy-MM-dd HH:mm:ss | 2018-09-13 16:45:34 |
If there is no !
, the return value should be the same as:
...
This UDF returns the result of parsing a user agent string. The user agent is parsed on the basis of rules. Where options are:
Options | Accepts | Returns |
---|---|---|
os | sting | JSON |
os_family | string | string |
os_major | string | string |
os_minor | string | string |
ua | string | JSON |
ua_family | string | string |
ua_major | string | string |
ua_minor | string | string |
device | string | string |
Example
The example shows the result of parsing user agent from access log.
...
Code Block | ||
---|---|---|
| ||
SELECT TD_CURRENCY_CONV('2015-01-01', 'USD', 'JPY', 1.0) |
TD_IP_TO_COUNTRY_CODE
Both Hive and Presto UDFs use a geolocation database supplied by Maxmind. Due to release schedules, the release level of the Maxmind database used by Hive and Presto might be different. This might cause inconsistent results between Hive and Presto geolocation functions. |
An example of different results is as follows:
jobid | type | td_ip_to_city_name_v6 | td_ip_to_latitude_v6 | td_ip_to_longitude_v6 | td_ip_to_postal_code_v6 |
---|---|---|---|---|---|
218018944 | hive | Tokyo | 35.685 | 139.7514 | 102-0082 |
218019099 | presto | 35.6594 | 139.8533 | 134-0087 |
Signature
Code Block | ||
---|---|---|
| ||
string TD_IP_TO_COUNTRY_CODE(string ip) |
...
Code Block | ||
---|---|---|
| ||
SELECT groupkey, SORTED_GROUP_CONCAT(column, '.', time) FROM table GROUP BY groupkey |
Notes for Geometry types
Geometry types are the building blocks of geospatial queries and calculations. They are used as arguments for geospatial functions. Geometry type is the product of a constructor function. |
ST_Point and ST_Polygon are examples of geospatial functions used to obtain binary representations of a point, `line, or polygon. You can also use them to convert a geometry data type to text. |
ST_Point
Signature
Code Block | ||
---|---|---|
| ||
point ST_Point(double, double) |
...
RESULT: Park Area Hope Fountain 4000 Hot Springs 5156
ST_Intersection and ST_Intersects are examples of Geospatial Relationship Functions. Relationship functions allow you to find relationships between two different geometric inputs. They return a boolean result type. |
ST_Intersection
Signature
Code Block | ||
---|---|---|
| ||
Geometry ST_Intersection(Geometry, Geometry) |
...