Apache Ambari Tutorial for Beginners
In this Ambari tutorial, you will learn the basic concepts of Apache Ambari: what it is and why it came into existence. In short, Apache Ambari is a tool that keeps track of the applications running on a Hadoop cluster and their status. In this tutorial, you will explore:
- Introduction to Apache Ambari
- How did Apache Ambari come into existence?
- Ambari Architecture
- Features of Apache Ambari
- How is recovery achieved in Apache Ambari?
- How is Ambari different from ZooKeeper?
- Who should learn Apache Ambari?
Introduction to Apache Ambari
Apache Ambari is an open-source administration tool deployed on top of Hadoop clusters. It keeps track of the running applications and their status. We can refer to Ambari as a web-based management tool that provisions, manages, and monitors Hadoop clusters and their health.
Apache Ambari provides a highly interactive dashboard that lets administrators view the progress and status of the applications running on the Hadoop cluster.
Apache Ambari has a flexible and scalable user interface that allows a range of tools like MapReduce, Pig, Hive, etc. to be installed on a Hadoop cluster and administers their performance in a user-friendly manner.
How did Apache Ambari come into existence?
The origin of Apache Ambari traces back to the rise of Apache Hadoop, when its distributed storage and scalable computing took over the Big Data world. As Hadoop became established, more and more new technologies were added to its infrastructure. As a result, it became difficult to manage the many nodes and applications of a cluster simultaneously. Apache Ambari came into existence to make this kind of distributed computing easier to manage.
At present, Apache Ambari is an open-source project developed under the Apache Software Foundation.
Why Should You Learn Apache Ambari?
Apache Ambari is a project designed to make the management of Apache Hadoop simpler. It allows developers and administrators to manage Hadoop clusters effectively, and it offers several advantages that make engineers and administrators want to get their hands on it. The following are specific reasons why you should learn Apache Ambari:
a. Administrative Management Capabilities
Apache Ambari offers excellent management capabilities to administrators and developers for managing the Hadoop ecosystem, such as:
- Adding hosts to or removing hosts from the cluster.
- Adding, removing, starting, stopping, or restarting services.
- Rolling back changes.
- Restarting services or clusters after configuration changes.
- Moving nodes to different hosts.
b. Ease of Use and Learning
Apache Ambari offers a simple web interface, which makes it easy to learn and gives us an easier way of managing and monitoring the Hadoop ecosystem. The Ambari APIs can perform cluster automation without any user interaction. Also, by using Ambari, we can easily pick the hosts we need along with the services that should be installed on them.
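To illustrate the API-driven automation mentioned above, here is a minimal sketch that builds (but does not send) requests for the Ambari REST API using only Python's standard library. The host name, cluster name, and credentials are placeholders. The `/api/v1` paths, the `X-Requested-By` header, and the request body follow Ambari's REST conventions as commonly documented, so check them against the Ambari version you run.

```python
import base64
import json
import urllib.request

# Placeholder host; 8080 is the port on which the Ambari web application is exposed.
AMBARI_URL = "http://ambari.example.com:8080"


def build_request(path, user, password, method="GET", body=None):
    """Build an authenticated Ambari REST request without sending it."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        # Ambari expects this header on requests that modify state.
        "X-Requested-By": "ambari",
    }
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        AMBARI_URL + path, data=data, headers=headers, method=method
    )


def list_clusters_request(user, password):
    """Request that lists the clusters known to this Ambari Server."""
    return build_request("/api/v1/clusters", user, password)


def stop_service_request(cluster, service, user, password):
    """Request that stops a service by setting its state to INSTALLED."""
    body = {
        "RequestInfo": {"context": f"Stop {service} via REST"},
        "Body": {"ServiceInfo": {"state": "INSTALLED"}},
    }
    path = f"/api/v1/clusters/{cluster}/services/{service}"
    return build_request(path, user, password, method="PUT", body=body)
```

We could send one of these requests with `urllib.request.urlopen(stop_service_request("mycluster", "HDFS", "admin", "admin"))`, where `mycluster` and the `admin` credentials are again placeholders. Building the request separately from sending it keeps the sketch easy to inspect and test without a running cluster.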
c. Future Scope
Over the past few years, there has been strong demand for professionals with Ambari skills. Many multinational companies hire candidates with good hands-on experience in Apache Ambari, and Hadoop professionals with Ambari skills have seen high growth in average salary. This demand is expected to increase in the coming years, so the field offers significant career growth and opportunities.
d. Lifecycle Management
Apache Ambari follows a lifecycle management model for managing the services in a Hadoop ecosystem: commands such as install, configure, status, start, and stop are used to work with the services. This also offers flexibility of configuration.
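The lifecycle commands above can be pictured as a small state machine. The sketch below is a simplified illustration, not Ambari's internal implementation: the `Service` class, the three states, and the transition table are assumptions chosen to mirror the install, configure, status, start, and stop commands.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    INSTALLED = "INSTALLED"
    STARTED = "STARTED"


# Allowed transitions: (current state, command) -> next state.
TRANSITIONS = {
    (State.INIT, "install"): State.INSTALLED,
    (State.INSTALLED, "configure"): State.INSTALLED,
    (State.INSTALLED, "start"): State.STARTED,
    (State.STARTED, "stop"): State.INSTALLED,
}


class Service:
    """Hypothetical service that only accepts commands valid for its state."""

    def __init__(self, name):
        self.name = name
        self.state = State.INIT
        self.config = {}

    def run(self, command, **config):
        # "status" never changes anything; it just reports the current state.
        if command == "status":
            return self.state
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise ValueError(
                f"cannot {command} {self.name} while {self.state.value}"
            )
        if command == "configure":
            self.config.update(config)
        self.state = TRANSITIONS[key]
        return self.state
```

For example, calling `run("install")`, then `run("configure", replication=3)`, then `run("start")` on a new `Service("HDFS")` leaves it in the `STARTED` state, while calling `run("install")` again at that point raises a `ValueError`.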
Some of the benefits offered by Apache Ambari are:
- Simple installation, configuration, and management.
- Centralized security.
- Complete visibility into Hadoop cluster health and status.
- Metrics visualization and dashboarding.
- Extensibility and customizability.
Apache Ambari provides an intuitive web UI and REST APIs that automate operations in a Hadoop cluster.
Ambari Architecture
The image below depicts the architecture of Apache Ambari.
Apache Ambari follows the master-slave architecture. The master node instructs the slave nodes to perform actions, and the slave nodes report back the state of every action. The master node keeps track of the state of the infrastructure.
To do this, the master node uses a database server, which we configure at setup time.
The following applications form the core of Apache Ambari:
- Ambari Server
- Ambari Agent
- Ambari Web UI
- Database
1. Ambari Server
Ambari Server is the entry point for administrative activities on the master server. It is basically a shell script which internally uses Python code, ambari-server.py, and routes all the requests to it.
The Ambari Server exposes several entry points, which are selected by the parameters passed to the Ambari Server program. These entry points are:
- Daemon management
- Software upgrade
- Software setup
- LDAP/PAM/Kerberos management
- Ambari backup and restore
- Miscellaneous options
2. Ambari Agent
The Ambari Agent is the application that runs on every node we want to manage with Ambari. The agent periodically sends heartbeats to the master node, and the Ambari Server executes tasks on the nodes through the Ambari Agent.
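The heartbeat idea can be sketched in a few lines. This is an illustrative model only, not Ambari's code: the `HeartbeatMonitor` class, the 30-second timeout, and the host names are assumptions. The master records when each agent was last heard from and treats hosts that stay silent past the timeout as lost.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per host and flags hosts that have gone quiet."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self.timeout = timeout_seconds  # assumed value, not an Ambari default
        self.clock = clock              # injectable so the logic is testable
        self.last_seen = {}

    def record(self, host):
        """Called whenever a heartbeat arrives from an agent."""
        self.last_seen[host] = self.clock()

    def lost_hosts(self):
        """Hosts whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            host for host, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

Passing the clock in as a parameter is a design choice for this sketch: it lets us simulate elapsed time instead of actually waiting for a timeout.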
3. Ambari Web User Interface
Ambari Web UI is a powerful feature of Ambari. The web application is served by the Ambari Server running on the master host and is exposed on port 8080. It is protected by authentication. Once we log in to the web portal, we can access, view, and control all aspects of our Hadoop cluster.
4. Database
Apache Ambari supports multiple RDBMSs for keeping track of the state of the whole Hadoop infrastructure. We choose the database we want to use at Ambari setup time. At the time of writing, Ambari supports the following databases:
- MySQL or MariaDB
- Embedded PostgreSQL
- Microsoft SQL Server
- SQL Anywhere
- Berkeley DB
Features of Apache Ambari
The main features of Apache Ambari are:
- Platform independent: Apache Ambari is architected to support any hardware and operating system, so it can run on Windows, Mac, Ubuntu, SLES, RHEL, and many other platforms.
- Pluggable components: We can customize any current Ambari application, and specific tools and technologies can be encapsulated as pluggable components. The pluggability goal does not extend to inter-component standardization.
- Version management and upgrade: Apache Ambari maintains versions itself, so there is no need for external tools like Git to manage versions and upgrades.
- Extensibility: We can extend the functionality of an existing Ambari application by adding view components.
- Failure recovery: Apache Ambari recovers gracefully from failures. For example, after a restart the master reschedules pending actions and rebuilds the cluster state.
- Security: Ambari comes with robust security and can sync users and groups with LDAP, including Active Directory.
- User-friendly configuration: Apache Ambari provides an easy step-by-step guide for installation and configuration.
- Authentication: On Kerberos-based Hadoop clusters, Apache Ambari provides authentication, authorization, and auditing.
How is recovery achieved in Apache Ambari?
There are two ways in which recovery happens in Apache Ambari. They are:
1. Based on actions
In this mechanism, every action is persisted. After a restart, the master node checks for pending actions and reschedules them. The cluster state is also persisted in a database, and on restart the master rebuilds its state machines from it. A race condition is possible: actions may complete, but the master may crash before recording their completion. For this reason, actions should be idempotent. The master node restarts all actions that are not marked complete or that have failed in the DB. These persisted actions can be seen as a redo log.
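The action-based mechanism can be sketched as follows. This is a toy model under stated assumptions, not Ambari's implementation: `ActionLog` is an in-memory stand-in for the database table, and the status names `PENDING`, `COMPLETED`, and `FAILED` are illustrative.

```python
class ActionLog:
    """In-memory stand-in for the database table that persists actions."""

    def __init__(self):
        self.records = {}  # action id -> {"name": str, "status": str}

    def persist(self, action_id, name):
        self.records[action_id] = {"name": name, "status": "PENDING"}

    def mark(self, action_id, status):
        self.records[action_id]["status"] = status

    def incomplete(self):
        """Actions that are pending or failed, in id order."""
        return [
            (action_id, record["name"])
            for action_id, record in sorted(self.records.items())
            if record["status"] != "COMPLETED"
        ]


def recover(log, execute):
    """After a restart, re-run every action not marked COMPLETED.

    `execute` must be idempotent: an action may already have run before
    the crash without its completion being recorded.
    """
    rescheduled = []
    for action_id, name in log.incomplete():
        execute(name)
        log.mark(action_id, "COMPLETED")
        rescheduled.append(name)
    return rescheduled
```

Using `set.add` as the executor shows why idempotency matters: re-running an action that had already taken effect before the crash leaves the result unchanged instead of applying it twice.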
2. Based on the desired state
In this mechanism, the master persists the desired state of the cluster. On restart, the master tries to bring the live state of the cluster in line with the desired state.
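Desired-state recovery amounts to comparing two snapshots and issuing only the commands that close the gap. The sketch below assumes a simplified model in which each service is either `STARTED` or `INSTALLED` (stopped); the function name and state names are illustrative rather than taken from Ambari's code.

```python
def reconcile(desired, live):
    """Return the commands needed to move the live state to the desired state.

    Both arguments map a service name to "STARTED" or "INSTALLED".
    A service missing from `live` is treated as not matching.
    """
    commands = []
    for service, want in sorted(desired.items()):
        have = live.get(service, "UNKNOWN")
        if have != want:
            command = "start" if want == "STARTED" else "stop"
            commands.append((service, command))
    return commands
```

When the live state already matches the desired state, `reconcile` returns an empty list, so running it repeatedly after a restart is harmless.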
Ambari offers an intuitive way of managing Hadoop clusters through its web UI and provides various services to system administrators. Besides that, Ambari helps in accessing and managing the large, complex distributed systems of Hadoop. It supports many components such as Pig, Hive, MapReduce, HDFS, HBase, and many more. We can use Apache Ambari for:
1. Provisioning of a Hadoop Cluster:
Ambari provides a step-by-step process for installing Hadoop services across any number of hosts.
2. Monitoring of a Hadoop Cluster:
Through the dashboard, we can easily monitor the health and status of Hadoop clusters.
3. Management of a Hadoop Cluster:
Ambari offers a central platform for managing the services of Hadoop. It is responsible for starting, stopping, and configuring all the services across the cluster.
How is Ambari different from ZooKeeper?
Apache Ambari and ZooKeeper may seem similar, but the two technologies perform different tasks on the same Hadoop cluster, and together they help make it agile, responsive, scalable, and fault-tolerant. The basic differences between these two technologies are:
|Basis of Difference|Apache Ambari|Apache ZooKeeper|
|---|---|---|
|Basic Task|Apache Ambari monitors, provisions, and manages Hadoop clusters.|Apache ZooKeeper maintains configuration information, provides naming, and synchronizes the cluster.|
|Nature|It is a web interface.|It is an open-source server.|
|Status Maintenance|Status is maintained through APIs.|Status is maintained through znodes.|
Who should learn Apache Ambari?
- Hadoop Administrators
- Database Professionals
- Mainframe and Hadoop Testing Professionals
- DevOps Professionals
In short, Apache Ambari is an open-source tool that keeps track of the applications running in the cluster. It provides an interactive dashboard through which administrators can visualize the status of running applications. It follows the master-slave architecture and serves a different purpose than ZooKeeper. As Ambari administrators, we can create and manage Ambari users and groups. This Ambari tutorial explained the various features of Apache Ambari as well as the Ambari architecture.
I hope that after reading this Ambari tutorial, you now know how Ambari came into existence.