Sqoop Jobs – Create and Schedule Jobs Through Sqoop

This article explains how to create and maintain Sqoop jobs. It first covers what a Sqoop saved job is and how to create and execute one.

It then looks at the purpose of the Sqoop job tool, its syntax, and its arguments, and walks through how to create, verify, inspect, and execute Sqoop jobs. Let's start!

 

What is a Sqoop Saved Job?

In Sqoop, an import or export can be repeated by issuing the same Sqoop command again each time. Apache Sqoop lets us define saved jobs that make this process easier.

The Sqoop saved job records the configuration information needed for executing the Sqoop command at a later time. The sqoop-job tool allows us to create and work with saved jobs.

Now it’s time to explore Sqoop jobs.

What are Sqoop Jobs?

The Sqoop job tool (sqoop-job) lets us create and work with saved jobs. A saved job remembers the parameters used to specify it, so it can be re-executed easily by invoking the job by its handle.

In simple words, a Sqoop job stores a Sqoop import or export command under a name, and that name is later used to identify and recall the saved job.

Re-executing a saved job is especially useful for incremental imports, which bring only new or updated rows from a relational database table into HDFS.
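As a minimal sketch of that use case, the function below saves an incremental import as a job. The function name, job name, table, check column, and target directory are hypothetical placeholders. The arguments --incremental append, --check-column, and --last-value are standard Sqoop import arguments, and -P prompts for the password instead of placing it on the command line.

```shell
# Sketch: save an incremental-append import as a reusable job.
# The connection string, user, and paths are placeholders, not a real setup.
create_incremental_job() {
  job_name="$1"      # name of the saved job, e.g. incjob (hypothetical)
  table="$2"         # source table, e.g. emp
  check_column="$3"  # column whose value grows with each new row, e.g. id

  sqoop job --create "$job_name" -- import \
    --connect jdbc:mysql://localhost/demodb \
    --username root \
    -P \
    --table "$table" \
    --incremental append \
    --check-column "$check_column" \
    --last-value 0 \
    -m 1 \
    --target-dir "/targetfolder/$table"
}
```

Calling create_incremental_job incjob emp id defines the job. When an incremental import runs from a saved job, Sqoop retains the updated last value in the job, so each later sqoop job --exec incjob should pick up only rows added since the previous run.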

Syntax of Sqoop job:

$ sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]

The Hadoop generic arguments must come before any job arguments, but the job arguments themselves can be given in any order.

Sqoop Job Management arguments are:

Argument             Description
--create <job-id>    Define a new saved job with the specified job-id (name). A second Sqoop command line, separated by --, must follow; it defines the saved job.
--delete <job-id>    Delete a saved job.
--exec <job-id>      Execute a saved job.
--show <job-id>      Show the parameters for a saved job.
--list               List all saved jobs.

Common arguments are:

Argument     Description
--help       Print usage instructions
--verbose    Print more information while working

Let us now explore how we can create, inspect, verify, and execute the Sqoop Job.

How to Create Sqoop Jobs?

  • We can create the saved jobs with the --create action.
  • The --create operation requires a -- followed by the tool name and its arguments.
  • The tool and its arguments form the basis of the Sqoop saved job.

Example 1: In this example, we create a Sqoop job named firstjob that imports data from a relational database table into HDFS.

$ sqoop job --create firstjob -- import \
--connect jdbc:mysql://localhost/demodb \
--username root \
--password root \
--table emp \
-m 1 \
--target-dir /targetfolder

The above command creates a job that imports data from the “emp” table in the “demodb” database into the /targetfolder directory in HDFS, using a single map task (-m 1).

Verifying Sqoop Jobs

We can verify that the saved job was created by using the --list argument, which lists all saved jobs.

$ sqoop job --list
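The same check can be scripted. The sketch below wraps sqoop job --list in a small helper and uses it to delete a job only when it exists. The function names are hypothetical, and the code assumes that --list prints one job name per line (possibly indented), which may differ between Sqoop versions.

```shell
# Sketch: check whether a saved job exists by name.
# Assumes `sqoop job --list` prints each job name on its own line.
job_exists() {
  sqoop job --list 2>/dev/null \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq -- "$1"
}

# Delete a saved job only if it is present in the list.
delete_job_if_exists() {
  if job_exists "$1"; then
    sqoop job --delete "$1"
  else
    echo "No saved job named '$1'"
  fi
}
```

For example, delete_job_if_exists firstjob removes the job created earlier, and prints a short message instead of failing when no such job is saved.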

How to inspect a job in Sqoop?

We can inspect the Sqoop job with the --show argument.
Example:

$ sqoop job --show firstjob

How to Execute a Sqoop job?

We can use the --exec option for executing a Sqoop saved job.
Example:

$ sqoop job --exec firstjob

The above command will execute the saved job named “firstjob”.
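Saved arguments can also be overridden at execution time by supplying them after a --. The sketch below wraps that pattern in a hypothetical helper named run_saved_job; the user name someuser and the script path in the comments are placeholders, and the crontab line is only one possible way to schedule the job.

```shell
# Sketch: run a saved job, optionally overriding its saved arguments.
# Anything after the job name is passed after `--`, which is how
# `sqoop job --exec` accepts overrides for the saved command.
run_saved_job() {
  job_name="$1"
  shift
  if [ "$#" -gt 0 ]; then
    sqoop job --exec "$job_name" -- "$@"
  else
    sqoop job --exec "$job_name"
  fi
}

# Example override: run firstjob as a different (hypothetical) database user,
# prompting for that user's password:
#   run_saved_job firstjob --username someuser -P
#
# Example crontab entry (hypothetical script path) that runs a script
# calling run_saved_job every day at 02:00:
#   0 2 * * * /path/to/run_firstjob.sh >> /tmp/firstjob.log 2>&1
```

A scheduled run cannot answer a password prompt. By default Sqoop does not store passwords with a saved job and asks for one on each execution, so an unattended job would need something like the --password-file import argument or the sqoop.metastore.client.record.password setting enabled; which of these is appropriate depends on the installation.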

Summary

In this article, you have explored how to create, verify, inspect, and execute Sqoop jobs. You have also learned the syntax and purpose of the Sqoop job tool, along with the essential arguments used with it.