Hive Built-in Functions and their Return Type with Examples
In Apache Hive, there are various functions available for performing several tasks. These functions are known as Built-in functions. In this article, we will study Hive Built-in functions.
The Hive Built-in functions are categorized as Mathematical function, Collection function, Type conversion function, Date function, Conditional function, and String function. We will study all of these Hive Built-in functions in detail.
Let’s start!!!
Hive Built-in Functions
Hive was designed to perform SQL like queries on the huge datasets stored in HDFS. It supports different types of data types, as well as operators that are not supported by different databases.
For performing some specific mathematical and arithmetic operations, Hive provides some built-in functions. These built-in functions extract data from tables in hive and process the calculations.
For using Hive built-in functions in our applications, we have to first check the application requirement. It is possible to call these hive built-in functions directly in our application.
There are several types of Hive Built-in Functions such as Mathematical function, Collection function, Type conversion function, Date function, Conditional function, and String function.
Now, let us study each Hive Built-in Functions in detail.
1. Mathematical Functions
The Hive Built-in Mathematical functions are used for performing mathematical calculations. Most of the mathematical function returns NULL when the arguments(s) are NULL.
Name | Return Type | Description |
abs(DOUBLE a) | DOUBLE | It will return the absolute value of the number passed ‘a’. |
round(DOUBLE a) | DOUBLE | It will return the rounded BIGINT value of the number passed ‘a’. |
round(DOUBLE a, INT d) | DOUBLE | It will return a rounded value of ‘a’ to ‘d’ decimal places. |
bround(DOUBLE a) | DOUBLE | It will return the rounded BIGINT value of the number passed ‘a’ by using HALF_EVEN rounding mode. It is also known as the Gaussian rounding or the bankers’ rounding. Example: bround(2.5) = 2, bround(3.5) = 4. |
bround(DOUBLE a, INT d) | DOUBLE | It will return a rounded value of ‘a’ to ‘d’ decimal places by using HALF_EVEN rounding mode.
Example: bround(8.25, 1) = 8.2, bround(8.35, 1) = 8.4. |
floor(DOUBLE a) | BIGINT | It will return the maximum BIGINT value which is equal to or less than the number passed ‘a’. |
ceil(DOUBLE a), ceiling(DOUBLE a) | BIGINT | It will return the minimum BIGINT value which is equal to or greater than the number passed ‘a’. |
rand(), rand(INT seed) | DOUBLE | It will return a random number, which changes from row to row, and that is uniformly distributed from 0 to 1. By specifying the seed will ensure that the generated random number sequence is deterministic. |
log10(DOUBLE a), log10(DECIMAL a) | DOUBLE | It will return the base-10 logarithm of the argument passed ‘a’. |
log2(DOUBLE a), log2(DECIMAL a) | DOUBLE | It will return the base-2 logarithm of the argument passed ‘a’. |
log(DOUBLE base, DOUBLE a)
log(DECIMAL base, DECIMAL a) |
DOUBLE | It will return the base-base logarithm of the argument passed ‘a’. |
pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p) | DOUBLE | It will return ap. |
sqrt(DOUBLE a), sqrt(DECIMAL a) | DOUBLE | It will return the square root of argument a. |
bin(BIGINT a) | STRING | It will return the number in the binary format |
hex(BIGINT a) hex(STRING a) hex(BINARY a) | STRING | If the argument is an INT or binary, it will return the number as a STRING in the hexadecimal format. Otherwise, if the number is a STRING, then it converts each character into its hexadecimal representation and returns the resulting STRING. |
unhex(STRING a) | BINARY | It is the inverse of hex. It interprets each pair of characters as a hexadecimal number and converts them to the byte representation of the number. |
factorial(INT a) | BIGINT | It will return the factorial of argument ‘a’. The Valid a is [0..20]. |
cbrt(DOUBLE a) | DOUBLE | It will return the cube root of double value passed. |
2. Collection Functions
Collection functions are used for the collections. Collections means the grouping of the elements and returning the single or array of elements on the basis of the return type mentioned in the function name.
The following collection functions are supported in Hive:
Name | Return Type | Description |
size(Map<K.V>) | int | It will return the number of elements in the data type map. |
size(Array<T>) | int | It will return the number of elements in the data type array. |
map_keys(Map<K.V>) | array<K> | It will return an unordered array that contains the keys of the input map. |
map_values(Map<K.V>) | array<V> | It will return an unordered array that contains the values of the input map. |
array_contains(Array<T>, value) | boolean | It will return TRUE if an array contains value. |
sort_array(Array<T>) | array<t> | It will sort the input array in ascending order and return it. |
3. Type Conversion Functions
The following Built-in Type Conversion functions are supported in Hive:
Name | Return Type | Description |
binary(string|binary) | binary | It will cast the parameter passed into a binary. |
cast(expr as <type>) | Expected “=” to follow “type” | It will convert the results of the expression ‘expr’ to the <type>.
For example, cast(‘1’ as BIGINT). It will convert the string ‘1’ to its integral representation. If the conversion doesn’t succeed then, NULL is returned. If cast(expr as boolean), then Hive returns true for the non-empty string. |
4. Date Functions
Date functions in Hive are used for performing Date Manipulations and Converting Date types from one type to another type.
The following built-in date functions are supported in Hive:
Name | Return Type | Description |
unix_timestamp(string date) | bigint | It will Convert the time string in the format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), by using the default timezone and the default locale. It will return 0 if failed. |
unix_timestamp(string date, string pattern) | bigint | It will convert the time string with the given pattern to the Unix timestamp (in seconds). It returns 0 if failed. |
year(string date) | int | It will return the year part of a date or a timestamp string.
Example: year(“2020-05-11 00:00:00”) = 2020 |
quarter(date/timestamp/string) | int | It will return the quarter of the year for a date, timestamp, or string in the range 1 to 4.
Example: quarter(‘2020-05-11’) = 2. |
month(string date) | int | It will return the month part of a date or a timestamp string.
Example: month(“2020-05-11 00:00:00”) = 05. |
day(string date) dayofmonth(date) | int | It will return the day part of a date or a timestamp string.
Example: day(“2020-05-11 00:00:00”) = 11. |
hour(string date) | int | It will return the hour of the timestamp.
Example: hour(‘2020-05-11 12:58:59′) = 12, hour(’12:58:59’) = 12. |
minute(string date) | int | It will return the minute of the timestamp. |
second(string date) | int | It will return the second of the timestamp. |
weekofyear(string date) | int | It will return the week number of a timestamp string. |
extract(field FROM source) | int | It will retrieve the fields such as days or hours from the source.
Source must be a timestamp, date, interval or a string that can be converted into either a date or timestamp.
The supported fields include: day, dayofweek, hour, minute, month, quarter, second, week and year. Examples:
|
current_date | date | It will return the current date at the start of query evaluation. All the calls of the current_ date function within the same query return the same value. |
current_timestamp | timestamp | It will return the current timestamp at the start of query evaluation. All the calls of the current_timestamp function within the same query return the same value. |
5. Conditional Functions
We use Conditional function for conditional values checks. The following are the built-in conditional function supported by Hive:
Name | Return Type | Description |
if(boolean testCondition, T valueTrue, T valueFalseOrNull) | T | It returns the valueTrue when testCondition is true. It returns valueFalse Or Null otherwise. |
isnull( a ) | boolean | It will return true if argument ‘a’ is NULL and false otherwise. |
isnotnull ( a ) | boolean | It will return true if argument ‘a’ is not NULL and false otherwise. |
nvl(T value, T default_value) | T | It will return default value if the value is null. Otherwise it returns value. |
COALESCE(T v1, T v2, …) | T | It will return the first v that is not NULL, or NULL if all v’s are NULL. |
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END | T | When a = b, it will return c.
when a = d, it will return e; else it will return f. |
CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END | T | When a = true, it will return b;
when c = true, it will return d; else it will return e. |
nullif( a, b ) | T | It will return NULL if a=b; otherwise returns ‘a’. |
assert_true(boolean condition) | void | It will throw an exception if the ‘condition’ is not true. otherwise it will return NULL. |
6. STRING Functions
We use String functions for string manipulations and string operations. The following are the built-in Hive String functions:
Name | Return Type | Description |
ascii(string str) | int | It will return the numeric value of the first character of argument str. |
concat(string|binary A, string|binary B…) | string | It will return the string or bytes resulting from the concatenation of the strings or bytes passed in as arguments in order.
For example, concat(‘teddy’, ‘bear’) results in ‘teddybear’. Note: It will take any number of input strings. |
character_length(string str) | int | It will return the number of UTF-8 characters contained in argument str. The function char_length is shorthand for this function. |
decode(binary bin, string charset) | string | It will decode the first argument into a String using the provided character set.
If any of the arguments is null, the result will also be null. (As of Hive 0.12.0.) |
encode(string src, string charset) | binary | It will encode the first argument into the BINARY using the provided character set. If any of the arguments is null, then the result will also be null. |
field(val T,val1 T,val2 T,val3 T,…) | int | It will return the index of val in the list of val1,val2,val3,… or 0 if not found.
Example: field(‘Isha’,’Isha’,’says’,’hello’) returns 1. All primitive types are supported, arguments are compared using str.equals(x). In case, if val is NULL, the return value is 0. |
find_in_set(string str, string strList) | int | It will return the first occurrence of argument str in strList where strList is a comma-delimited string.
This will return null if either argument is null. It will return 0 if the first argument contains any commas. Example: find_in_set(‘ab’, ‘abc,b,ab,c,def’) returns 3. |
instr(string str, string substr) | int | It will return the position of the first occurrence of substr in str.
This will return null if either of the arguments are null. It will return 0 if substr could not be found in str. Note: This is not zero based. The first character in str has index 1. |
in_file(string str, string filename) | boolean | It will return true if the string str appears as the entire line in the file filename. |
length(string A) | int | It will return the length of the string passed. |
locate(string substr, string str[, int pos]) | int | It will return the position of the first occurrence of the substr in str after position pos. |
lower(string A) lcase(string A) | string | It will return the string by converting all characters of string A to lowercase.
Example, lower(‘BaR’) results in ‘bar’. |
parse_url(string urlString, string partToExtract [, string keyToExtract]) | string | It will return the specified part from the URL.
The valid values for the partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. |
repeat(string str, int n) | string | It will repeat str n times. |
regexp_extract(string subject, string pattern, int index) | string | It will return the string extracted using the pattern. |
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) | string | It will return the string resulting from replacing all the substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. |
replace(string A, string OLD, string NEW) | string | It will return the string ‘A’ with all the non-overlapping occurrences of the OLD replaced with the NEW.
Example: select replace(“aabab”, “bab”, “Z”); returns “aaZ”. |
reverse(string A) | string | It will return the reversed string. |
upper(string A) ucase(string A) | string | It will return the string resulting from converting all the characters of string ‘A’ to uppercase. Example: upper(‘BaR’) results in ‘BAR’ |
Summary
This article had explained all the Hive Built-in functions in detail along with their return type. I hope after reading this article, you clearly understand Hive Built-in functions.
The article had covered all types of Hive built-in functions such as mathematical function, collection function, type conversion function, date function, conditional function, and string function.
If you still have any doubt in hive built-in function, then feel free to ask us in the comment section.