Hive Built-in Functions and their Return Type with Examples

In Apache Hive, there are various functions available for performing several tasks. These functions are known as Built-in functions. In this article, we will study Hive Built-in functions. The Hive Built-in functions are categorized as Mathematical function, Collection function, Type conversion function, Date function, Conditional function, and String function. We will study all of these Hive Built-in functions in detail. Let’s start!!!

Apache Hive Built-in Functions

Hive Built-in Functions

Hive was designed to perform SQL like queries on the huge datasets stored in HDFS. It supports different types of data types, as well as operators that are not supported by different databases. For performing some specific mathematical and arithmetic operations, Hive provides some built-in functions. These built-in functions extract data from tables in hive and process the calculations.

For using Hive built-in functions in our applications, we have to first check the application requirement. It is possible to call these hive built-in functions directly in our application. There are several types of Hive Built-in Functions such as Mathematical function, Collection function, Type conversion function, Date function, Conditional function, and String function.

Now, let us study each Hive Built-in Functions in detail.

1. Mathematical Functions

The Hive Built-in Mathematical functions are used for performing mathematical calculations. Most of the mathematical function returns NULL when the arguments(s) are NULL.

Name Return Type Description
abs(DOUBLE a) DOUBLE It will return the absolute value of the number passed ‘a’.
round(DOUBLE a) DOUBLE It will return the rounded BIGINT value of the number passed ‘a’. 
round(DOUBLE a, INT d) DOUBLE It will return a rounded value of ‘a’ to ‘d’ decimal places.
bround(DOUBLE a) DOUBLE It will return the rounded BIGINT value of the number passed ‘a’ by using HALF_EVEN rounding mode. It is also known as the Gaussian rounding or the bankers’ rounding. 

Example: bround(2.5) = 2, bround(3.5) = 4.

bround(DOUBLE a, INT d) DOUBLE It will return a rounded value of ‘a’ to ‘d’ decimal places by using HALF_EVEN rounding mode.

Example: bround(8.25, 1) = 8.2, bround(8.35, 1) = 8.4.

floor(DOUBLE a) BIGINT It will return the maximum BIGINT value which is equal to or less than the number passed ‘a’.
ceil(DOUBLE a), ceiling(DOUBLE a) BIGINT It will return the minimum BIGINT value which is equal to or greater than the number passed ‘a’.
rand(), rand(INT seed) DOUBLE It will return a random number, which changes from row to row, and that is uniformly distributed from 0 to 1. By specifying the seed will ensure that the generated random number sequence is deterministic.
log10(DOUBLE a), log10(DECIMAL a) DOUBLE It will return the base-10 logarithm of the argument passed ‘a’. 
log2(DOUBLE a), log2(DECIMAL a) DOUBLE It will return the base-2 logarithm of the argument passed ‘a’. 
log(DOUBLE base, DOUBLE a)

log(DECIMAL base, DECIMAL a)

DOUBLE It will return the base-base logarithm of the argument passed ‘a’. 
pow(DOUBLE a, DOUBLE p), power(DOUBLE a, DOUBLE p) DOUBLE It will return ap.
sqrt(DOUBLE a), sqrt(DECIMAL a) DOUBLE It will return the square root of argument a. 
bin(BIGINT a) STRING It will return the number in the binary format
hex(BIGINT a) hex(STRING a) hex(BINARY a) STRING If the argument is an INT or binary, it will return the number as a STRING in the hexadecimal format. Otherwise, if the number is a STRING, then it converts each character into its hexadecimal representation and returns the resulting STRING.
unhex(STRING a) BINARY It is the inverse of hex. It interprets each pair of characters as a hexadecimal number and converts them to the byte representation of the number.
factorial(INT a) BIGINT It will return the factorial of argument ‘a’. The Valid a is [0..20].
cbrt(DOUBLE a) DOUBLE It will return the cube root of double value passed.

2. Collection Functions

Collection functions are used for the collections. Collections means the grouping of the elements and returning the single or array of elements on the basis of the return type mentioned in the function name. The following collection functions are supported in Hive:

Name Return Type Description
size(Map<K.V>) int It will return the number of elements in the data type map. 
size(Array<T>) int It will return the number of elements in the data type array.
map_keys(Map<K.V>) array<K> It will return an unordered array that contains the keys of the input map.
map_values(Map<K.V>) array<V> It will return an unordered array that contains the values of the input map.
array_contains(Array<T>, value) boolean It will return TRUE if an array contains value.
sort_array(Array<T>) array<t> It will sort the input array in ascending order and return it.

3. Type Conversion Functions

The following Built-in Type Conversion functions are supported in Hive:

Name Return Type Description
binary(string|binary) binary It will cast the parameter passed into a binary.
cast(expr as <type>) Expected “=” to follow “type” It will convert the results of the expression ‘expr’ to the  <type>. 

For example, cast(‘1’ as BIGINT). It will convert the string ‘1’ to its integral representation. 

If the conversion doesn’t succeed then, NULL is returned.

If cast(expr as boolean), then Hive returns true for the non-empty string.

4. Date Functions

Date functions in Hive are used for performing Date Manipulations and Converting Date types from one type to another type. The following built-in date functions are supported in Hive:

Name Return Type  Description
unix_timestamp(string date) bigint It will Convert the time string in the format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), by using the default timezone and the default locale. It will return 0 if failed. 
unix_timestamp(string date, string pattern) bigint It will convert the time string with the given pattern to the Unix timestamp (in seconds). It returns 0 if failed. 
year(string date) int It will return the year part of a date or a timestamp string.

Example: year(“2020-05-11 00:00:00”) = 2020

quarter(date/timestamp/string) int It will return the quarter of the year for a date, timestamp, or string in the range 1 to 4.

Example: quarter(‘2020-05-11’) = 2.

month(string date) int It will return the month part of a date or a timestamp string.

Example: month(“2020-05-11 00:00:00”) = 05.

day(string date) dayofmonth(date) int It will return the day part of a date or a timestamp string.

Example:

day(“2020-05-11 00:00:00”) = 11.

hour(string date) int It will return the hour of the timestamp.

Example: hour(‘2020-05-11 12:58:59′) = 12, hour(’12:58:59’) = 12.

minute(string date) int It will return the minute of the timestamp.
second(string date) int It will return the second of the timestamp.
weekofyear(string date) int It will return the week number of a timestamp string.
extract(field FROM source) int It will retrieve the fields such as days or hours from the source. 

 

Source must be a timestamp, date, interval or a string that can be converted into either a date or timestamp. 

 

The supported fields include: day, dayofweek, hour, minute, month, quarter, second, week and year.

Examples:

  1. select extract(month from “2020-05-11”) results in 05.
  2. select extract(hour from “2020-05-11 05:06:07”) results in 5.
current_date date It will return the current date at the start of query evaluation. All the calls of the current_  date function within the same query return the same value.
current_timestamp timestamp It will return the current timestamp at the start of query evaluation. All the  calls of the current_timestamp function within the same query return the same value.

5. Conditional Functions

We use Conditional function for conditional values checks. The following are the built-in conditional function supported by Hive:

Name Return Type Description
if(boolean testCondition, T valueTrue, T valueFalseOrNull) T It returns the valueTrue when testCondition is true. It returns valueFalse Or Null otherwise.
isnull( a ) boolean It will return true if argument ‘a’ is NULL and false otherwise.
isnotnull ( a ) boolean It will return true if argument ‘a’ is not NULL and false otherwise.
nvl(T value, T default_value) T It will return default value if the value is null. Otherwise it  returns value.
COALESCE(T v1, T v2, …) T It will return the first v that is not NULL, or NULL if all v’s are NULL.
CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END T When a = b, it will return c.

 when a = d, it will return e;

else it will return f.

CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END T When a = true, it will return b; 

when c = true, it will  return d; 

else it will return e.

nullif( a, b ) T It will return NULL if a=b; otherwise returns ‘a’.
assert_true(boolean condition) void It will throw an exception if  the ‘condition’ is not true.  otherwise it will return NULL. 

6. STRING Functions

We use String functions for string manipulations and string operations. The following are the built-in Hive String functions:

Name Return Type Description
ascii(string str) int It will return the numeric value of the first character of argument str.
concat(string|binary A, string|binary B…) string It will return the string or bytes resulting from the concatenation of  the strings or bytes passed in as arguments in order.

For example, concat(‘teddy’, ‘bear’) results in ‘teddybear’. Note: It will take any number of input strings.

character_length(string str) int It will return the number of UTF-8 characters contained in argument str.

The function char_length is shorthand for this function.

decode(binary bin, string charset) string It will decode the first argument into a String using the provided character set. 

If any of the arguments is null, the result will also be null. (As of Hive 0.12.0.)

encode(string src, string charset) binary It will encode the first argument into the BINARY using the provided character set. If any of the arguments is null, then  the result will also be null.
field(val T,val1 T,val2 T,val3 T,…) int It will return the index of val in the list of val1,val2,val3,… or 0 if not found. 

Example:

field(‘Isha’,’Isha’,’says’,’hello’) returns 1.

All primitive types are supported, arguments are compared using str.equals(x). 

In case, if val is NULL, the return value is 0.

find_in_set(string str, string strList) int It will return the first occurrence of argument str in strList where strList is a comma-delimited string. 

This will return null if either argument is null. 

It will return 0 if the first argument contains any commas. 

Example: find_in_set(‘ab’, ‘abc,b,ab,c,def’) returns 3.

instr(string str, string substr) int It will return the position of the first occurrence of substr in str. 

This will return null if either of the arguments are null.

It will return 0 if substr could not be found in str. 

Note: This is not zero based. The first character in str has index 1.

in_file(string str, string filename) boolean It will return true if the string str appears as the entire line in the file filename.
length(string A) int It will return the length of the string passed.
locate(string substr, string str[, int pos]) int It will return the position of the first occurrence of the substr in str after position pos.
lower(string A) lcase(string A) string It will return the string by converting all characters of string A to lowercase.

 Example, lower(‘BaR’) results in ‘bar’.

parse_url(string urlString, string partToExtract [, string keyToExtract]) string It will return the specified part from the URL. 

The valid values for the partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. 

repeat(string str, int n) string It will repeat str n times.
regexp_extract(string subject, string pattern, int index) string It will return the string extracted using the pattern. 
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) string It will return the string resulting from replacing all the substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT.
replace(string A, string OLD, string NEW) string It will return the string ‘A’ with all the non-overlapping occurrences of the OLD replaced with the NEW.

Example: select replace(“aabab”, “bab”, “Z”); returns “aaZ”.

reverse(string A) string It will return the reversed string.
upper(string A) ucase(string A) string It will return the string resulting from converting all the characters of string ‘A’ to uppercase. 

Example: upper(‘BaR’) results in ‘BAR’

Summary

This article had explained all the Hive Built-in functions in detail along with their return type. I hope after reading this article, you clearly understand Hive Built-in functions. The article had covered all types of Hive built-in functions such as mathematical function, collection function, type conversion function, date function, conditional function, and string function. If you still have any doubt in hive built-in function, then feel free to ask us in the comment section.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.