Features of Hive – Limitations of Hive
In this article, we will study the key features of Apache Hive and some of its limitations. We begin with a short introduction to Apache Hive, then list its features, and finally look at its limitations.
Introduction to Apache Hive
Apache Hive is a data warehouse infrastructure for processing structured data stored in the Hadoop Distributed File System (HDFS). It sits on top of Apache Hadoop and makes querying and analyzing data easy. Hive was initially developed by Facebook and was later taken up by the Apache Software Foundation, which developed it as an open-source tool under the name Apache Hive. Many different companies use Apache Hive.
Let us now see various features of Apache Hive.
Features of Apache Hive
The key features of Apache Hive are:
- Open-source: Apache Hive is an open-source tool. We can use it free of cost.
- Query large datasets: Hive can query and manage huge datasets stored in Hadoop Distributed File System.
- Multiple-users: Multiple users can query the data using Hive Query Language simultaneously.
- Backward compatible: Apache Hive fits the low-level interface requirements of Apache Hadoop.
- Partitioning and Bucketing: Apache Hive supports partitioning and bucketing of data at the table level to improve performance.
- File-formats: Hive provides support for various file formats such as TextFile, ORC, Avro, SequenceFile, Parquet and RCFile. It can also work with compressed data, for example LZO compression.
- Hive Query Language: Hive uses Hive Query Language (HiveQL), which is similar to SQL. We do not need to know a programming language to work with Hive; knowledge of basic SQL queries is enough.
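- Built-In function: Hive provides various built-in functions, for example string, date, mathematical and aggregate functions.
- User-Defined Functions: It also provides support for User-Defined Functions (UDFs) for tasks like data cleansing and filtering. We can define UDFs according to our requirements.
- External Table: Apache Hive supports external tables. An external table points to data at a location outside Hive's warehouse directory, so we can process the data without moving it into Hive-managed storage. Dropping an external table removes only its metadata, not the data.
- Fast: Hive is a fast, scalable and extensible tool for batch processing of large datasets, and it uses familiar concepts.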
- Warehouse: Apache Hive is a distributed data warehouse tool.
- Table Structure: Table structure in Hive is similar to table structure in RDBMS.
- ETL support: Hive supports ETL operations. Hive is an effective ETL tool.
- Storage: Hive allows us to access files stored in HDFS and other similar data storage systems such as HBase.
- OLAP: Hive is designed for OLAP (Online Analytical Processing).
- Client application: Hive can support client applications written in PHP, Python, Java, C++ and Ruby.
- Rule Based Optimizer: Hive has a rule-based optimizer for optimizing logical plans.
- Ad-hoc queries: Hive allows us to run ad-hoc queries, that is, queries written on the spot for a particular analysis, whose value depends on some variable.
- Data Visualization: Hive can serve as a data source for data visualization tools. Running Hive on Apache Tez reduces query latency, which makes interactive analysis more practical, although it does not make Hive a real-time system.
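To make the partitioning and bucketing feature listed above more concrete, here is a minimal Python sketch of how rows could be laid out. It is an illustration under stated assumptions, not Hive's actual implementation: the table path, the column names (country, user_id) and the bucket count are hypothetical, and the plain modulo on an integer key is a simplification of Hive's rule, which hashes the bucketing column and takes the result modulo the number of buckets.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count chosen for this sketch


def partition_dir(table_path, column, value):
    """Return a partition directory in the column=value naming style."""
    return f"{table_path}/{column}={value}"


def bucket_for(user_id, num_buckets=NUM_BUCKETS):
    """Simplified bucket assignment for a non-negative integer key."""
    return user_id % num_buckets


def layout(rows, table_path="/warehouse/page_views"):
    """Group rows by (partition directory, bucket number)."""
    placed = defaultdict(list)
    for row in rows:
        key = (
            partition_dir(table_path, "country", row["country"]),
            bucket_for(row["user_id"]),
        )
        placed[key].append(row)
    return dict(placed)


# Hypothetical sample rows: partitioned by country, bucketed by user_id.
rows = [
    {"user_id": 1, "country": "US"},
    {"user_id": 5, "country": "US"},
    {"user_id": 2, "country": "IN"},
]

placed = layout(rows)
for (directory, bucket), members in sorted(placed.items()):
    print(directory, "bucket", bucket, [r["user_id"] for r in members])
```

A query that filters on the partition column only needs to read the matching directory, and a lookup or join on the bucketing column only needs the matching bucket. This is why both techniques can improve performance.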
Let us now study some limitations of Apache Hive.
Limitations of Apache Hive
Some of the limitations of Apache Hive are:
- Hive is not designed for OLTP (Online Transaction Processing). We can use it only for OLAP.
- It does not offer real-time queries.
- It provides limited subquery support.
- Query latency in Hive is generally very high.
I hope that after reading this article you clearly understand the various features of Hive. These features make Hive a popular data querying tool. The article also explained some of the limitations of Apache Hive.
If you still have any questions about Hive, ask us in the comment box.