Sqoop Metastore Tutorial – How to Set Up Sqoop Metastore

Sqoop Metastore is a tool that configures Sqoop to host a shared metadata repository. In this Sqoop Metastore article, you will explore what it is, its syntax, its arguments, and how to set it up. So let’s start!

What is Sqoop Metastore?

Sqoop metastore is a tool for storing Sqoop job information in a central place. It makes collaboration between Sqoop users and developers easier.

For example, if a user named user1 creates a job that loads some specific data, any other user can access that job from any other node in the cluster and run it again.

Purpose of Sqoop Metastore

The primary purpose of this tool is to configure Apache Sqoop to host a shared metadata repository. Multiple users, including remote users, can define and execute the saved jobs stored in this metastore.

To connect to the metastore, we have to configure the clients either in sqoop-site.xml or with the --meta-connect argument.
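
As a minimal sketch of the command-line route, the snippet below wraps a job listing in a small shell function. The host name metaserver.example.com and the helper name list_shared_jobs are placeholders chosen for illustration, and 16000 is the default metastore port.

```shell
# Placeholder metastore address; replace the host with your own server.
META_URL="jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop"

# Hypothetical helper: list the saved jobs stored in the shared metastore.
list_shared_jobs() {
  sqoop job --meta-connect "$META_URL" --list
}
```

Calling list_shared_jobs on a node where Sqoop is installed should list the jobs saved in the shared metastore. A saved job can then be run with sqoop job --meta-connect "$META_URL" --exec followed by the job name.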

How to Set Up Sqoop Metastore

Let us now see the procedure for setting up Sqoop metastore.

1. First, we have to choose a server for hosting the Sqoop metastore. As a best practice, it is better to choose an administrative or master server.

2. In the second step, we have to decide which user will run the metastore. After deciding, we create that user and its home directory if required. Then we configure the metastore details in sqoop-site.xml.

3. Next, we have to update the service configuration so that the metastore can be accessed automatically.

4. Finally, we have to start the metastore.
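
Steps 2 and 4 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the account name sqoopmeta, the log path, and both helper names are hypothetical, and the sketch assumes a Linux host with sudo and useradd, where sqoop is on the PATH of the metastore user.

```shell
# Hypothetical account name and log path; adjust to your environment.
METASTORE_USER="sqoopmeta"
METASTORE_LOG="/tmp/sqoop-metastore.log"

# Step 2: create the dedicated user together with its home directory.
create_metastore_user() {
  sudo useradd -m "$METASTORE_USER"
}

# Step 4: start the metastore in the background as that user,
# sending its output to the log file.
start_metastore() {
  sudo -u "$METASTORE_USER" sh -c "nohup sqoop metastore > '$METASTORE_LOG' 2>&1 &"
}
```

Running create_metastore_user once and then start_metastore on the chosen server would bring the metastore up. A production setup would more likely hand the start command to the system's service manager.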

The syntax

$ sqoop metastore (generic-args) (metastore-args)
$ sqoop-metastore (generic-args) (metastore-args)

We have to pass the Hadoop generic arguments before any metastore arguments. We can enter the metastore arguments in any order with respect to each other.

The metastore management options are:

Argument      Description
--shutdown    Shuts down a running metastore instance on the same machine.
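
A minimal sketch of using this option is shown below. The helper name stop_metastore is hypothetical, and the check for sqoop on the PATH is an added convenience rather than anything the tool requires.

```shell
# Hypothetical helper: stop the metastore instance running on this machine.
stop_metastore() {
  # Fail early with a clear message if the sqoop launcher is not installed.
  if ! command -v sqoop >/dev/null 2>&1; then
    echo "sqoop not found on PATH" >&2
    return 1
  fi
  sqoop metastore --shutdown
}
```

Because the option acts on an instance on the same machine, stop_metastore has to be run on the server that hosts the metastore, not on a client node.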

Running sqoop-metastore launches a shared HSQLDB database instance on the current machine. Clients can connect to this metastore and create Sqoop jobs that can be shared between users for execution.

We can control the location of the metastore’s files on disk by configuring the sqoop.metastore.server.location property in the conf/sqoop-site.xml configuration file. This should point to a directory on the local filesystem.

The metastore is available over TCP/IP. We can control the port with the sqoop.metastore.server.port configuration parameter, which is set to 16000 by default.
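
To make the two server-side properties concrete, the sketch below prints the property elements that would go inside the configuration element of conf/sqoop-site.xml. The location value is a hypothetical placeholder path and the helper name is made up for this example; only the property names and the default port come from the text above.

```shell
# Placeholder values: the path is hypothetical, 16000 is the default port.
METASTORE_LOCATION="/var/lib/sqoop/metastore/shared.db"
METASTORE_PORT=16000

# Print the <property> elements to paste inside <configuration>
# in conf/sqoop-site.xml on the metastore server.
print_server_properties() {
  cat <<EOF
<property>
  <name>sqoop.metastore.server.location</name>
  <value>${METASTORE_LOCATION}</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>${METASTORE_PORT}</value>
</property>
EOF
}

print_server_properties
```

Generating the fragment from variables keeps the port in one place, so the same value can be reused when building the client connection URL.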

We can connect to the metastore through the sqoop.metastore.client.autoconnect.url property or the --meta-connect argument, using a value of the form

jdbc:hsqldb:hsql://<server-name>:<port>/sqoop

For example,

jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop

The metastore can be hosted on a machine within the Hadoop cluster or somewhere else on the network.
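
The connection URL format can be sketched as a small helper that builds the URL from a host and an optional port, then prints the matching client-side property for sqoop-site.xml. The function names metastore_url and print_client_property are hypothetical, and the host is the example host used in this article.

```shell
# Build the metastore JDBC URL; the port falls back to the default 16000.
metastore_url() {
  host="$1"
  port="${2:-16000}"
  printf 'jdbc:hsqldb:hsql://%s:%s/sqoop\n' "$host" "$port"
}

# Print the client-side <property> element for sqoop-site.xml.
print_client_property() {
  cat <<EOF
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>$(metastore_url "$1" "$2")</value>
</property>
EOF
}

print_client_property metaserver.example.com
```

The same metastore_url output can be passed directly to --meta-connect when a one-off connection is preferred over editing the client configuration.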

Summary

I hope that after reading this article you have a clear understanding of the Sqoop Metastore tool. If you have any queries related to this topic, please share them in the comment box.