Sqoop Import All Tables into HDFS

Sqoop import-all-tables is a tool that imports a set of tables from a relational database into HDFS. In this Sqoop import all tables article, we will study the import-all-tables tool in detail. The article first explains what Sqoop import-all-tables is.

Later on, you will explore the import-all-tables syntax and its arguments. At the end of the article, you will walk through an example of the import-all-tables command to strengthen your understanding.

So, let’s start with the introduction to Sqoop Import All Tables.

What is Sqoop Import All Tables?

The Sqoop import-all-tables tool imports a set of tables from a relational database into the Hadoop Distributed File System. When the set of tables is imported, the data from each table is stored in a separate directory in HDFS.

Certain conditions must be met for the import-all-tables tool to be useful. The conditions are:

  • Each table to be imported must have a single-column primary key (a fallback for tables without one is shown after this list).
  • We must intend to import all the columns of each table.
  • We must not intend to use a non-default splitting column, and we must not impose any conditions through a WHERE clause.
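
If a table lacks a single-column primary key, Sqoop has no column to split the work on. A common workaround is to let Sqoop fall back to a single map task for such tables. The sketch below assumes a hypothetical demo database on a local MySQL server and an illustrative retail_user account:

# Placeholder connection details; adjust the host, database, and user to your environment.
# --autoreset-to-one-mapper uses a single mapper only for tables without a primary key;
# alternatively, -m 1 forces a single mapper for every table.
sqoop import-all-tables \
    --connect jdbc:mysql://localhost/demo \
    --username retail_user -P \
    --autoreset-to-one-mapper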

Syntax for Sqoop Import All Tables

The syntax for Sqoop Import All Tables is:

$ sqoop import-all-tables (generic-args) (import-args)
$ sqoop-import-all-tables (generic-args) (import-args)

We can enter the import arguments in any order with respect to each other, but the Hadoop generic arguments must precede the import arguments.
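
Hadoop generic options such as -D property=value therefore go right after the tool name, before any import arguments. A minimal sketch, where the queue name and connection details are illustrative:

# Placeholder values; -D is a Hadoop generic argument and must come before the import arguments.
sqoop import-all-tables \
    -D mapreduce.job.queuename=default \
    --connect jdbc:mysql://localhost/demo \
    --username retail_user -P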

Common Arguments

Argument Description
--connect <jdbc-uri> Specifies the JDBC connect string
--connection-manager <class-name> Specifies the connection manager class to use
--driver <class-name> Manually specifies the JDBC driver class to use
--hadoop-home <dir> Overrides $HADOOP_HOME
--help Prints usage instructions
-P Reads the password from the console
--password <password> Sets the authentication password
--username <username> Sets the authentication username
--verbose Prints more information while working
--connection-param-file <filename> Optional properties file that provides connection parameters
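
For instance, a connection that names the JDBC driver class explicitly and reads the password from the console could look like the sketch below (the host, database, and user are placeholders; Sqoop normally infers the driver from the connect string, so --driver is only needed when it cannot):

# Placeholder connection details.
sqoop import-all-tables \
    --connect jdbc:mysql://dbserver.example.com/demo \
    --driver com.mysql.jdbc.Driver \
    --username retail_user \
    -P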

Import Control Arguments

Argument Description
--as-avrodatafile Imports data to Avro data files
--as-sequencefile Imports data to SequenceFiles
--as-textfile Imports data as plain text (default)
--direct Use the direct import fast path
--direct-split-size <n> Split the input stream every n bytes when data is imported in direct mode
--inline-lob-limit <n> Sets the maximum size for an inline LOB
-m,--num-mappers <n> Use n map tasks for importing data in parallel
--warehouse-dir <dir> HDFS parent directory for the table destinations
-z,--compress Enable compression
--compression-codec <c> Use the given Hadoop codec (gzip by default)

All these arguments behave the same way as when used with the sqoop-import tool. However, the --table, --split-by, --columns, and --where arguments are invalid for sqoop-import-all-tables.
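
For example, a run that imports every table as compressed Avro data files under a common HDFS parent directory, using four parallel map tasks, might be assembled like this (the paths and connection details are placeholders):

# Placeholder paths and credentials.
sqoop import-all-tables \
    --connect jdbc:mysql://localhost/demo \
    --username retail_user -P \
    --warehouse-dir /user/hadoop/demo \
    --as-avrodatafile \
    --compress \
    --num-mappers 4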

Output Line Formatting Arguments

Argument Description
--enclosed-by <char> It will set the required field enclosing character
--escaped-by <char> It will set the escape character
--fields-terminated-by <char> It will set the field separator character
--lines-terminated-by <char> It will set the end-of-line character
--mysql-delimiters It uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char> It will set the field enclosing character
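
For instance, to write tab-separated text files with Unix line endings, the formatting arguments can be combined as in this sketch (the delimiters and connection details are just one illustrative choice):

# Placeholder connection details; '\t' and '\n' are the chosen delimiters.
sqoop import-all-tables \
    --connect jdbc:mysql://localhost/demo \
    --username retail_user -P \
    --as-textfile \
    --fields-terminated-by '\t' \
    --lines-terminated-by '\n'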

Input Parsing Arguments

Argument Description
--input-enclosed-by <char> It will set the required field encloser
--input-escaped-by <char> It will set the input escape character
--input-fields-terminated-by <char> It will set the input field separator
--input-lines-terminated-by <char> It will set the input end-of-line character
--input-optionally-enclosed-by <char> It will set the field enclosing character
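
These arguments tell the generated record classes how to parse records back in, so they are usually set to mirror the output delimiters chosen above. A small sketch with placeholder values:

# Placeholder connection details; the input delimiter mirrors the output delimiter.
sqoop import-all-tables \
    --connect jdbc:mysql://localhost/demo \
    --username retail_user -P \
    --fields-terminated-by '\t' \
    --input-fields-terminated-by '\t'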

Hive Arguments

Argument Description
--hive-home <dir> Override $HIVE_HOME
--hive-import It will import tables into Hive. Hive's default delimiters are used if no delimiters are set explicitly.
--hive-overwrite It will overwrite the existing data in the Hive table.
--create-hive-table If set, the job fails if the target Hive table already exists. This property is false by default.
--hive-table <table-name> It will set the table name to be used while importing to Hive.
--hive-drop-import-delims It will drop \n, \r, and \01 from the string fields when importing to Apache Hive.
--hive-delims-replacement It will replace \n, \r, and \01 in the string fields with a user-defined string when importing to Apache Hive.
--hive-partition-key It will specify the name of the Hive field on which the partitions are sharded.
--hive-partition-value <v> It will specify the string value that serves as the partition key for the data imported into Hive in this job.
--map-column-hive <map> Override the default mapping from SQL type to Hive type for the configured columns.
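
For example, to load every table straight into Hive and replace any data already present in the corresponding Hive tables, the Hive arguments can be combined like this (the database and credentials are placeholders):

# Placeholder connection details.
sqoop import-all-tables \
    --connect jdbc:mysql://localhost/demo \
    --username retail_user -P \
    --hive-import \
    --hive-overwrite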

Code Generation Arguments

Argument Description
--bindir <dir> Specifies the output directory for the compiled objects
--jar-file <file> It will disable code generation and use the specified jar instead
--outdir <dir> Specifies the output directory for the generated code
--package-name <name> It will put the auto-generated classes in this package

The Sqoop import-all-tables tool does not support the --class-name argument. However, we can specify a package with the --package-name argument in which all the generated classes will be placed.
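
A sketch of how the code generation arguments might be combined, with an illustrative package name and output directory:

# Placeholder package name and local output directory for the generated .java files.
sqoop import-all-tables \
    --connect jdbc:mysql://localhost/demo \
    --username retail_user -P \
    --package-name com.example.demo.records \
    --outdir /tmp/sqoop-generated-src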

Sqoop Import All Tables Example

Suppose we want to import all the tables from the demo database. We can do this with:

sqoop import-all-tables --connect jdbc:mysql://localhost/demo
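
In practice we usually also supply credentials and a warehouse directory. The variant below is a sketch with placeholder values; after it completes, each table's data sits in its own subdirectory of the warehouse directory, named after the table:

# Placeholder credentials and HDFS path.
sqoop import-all-tables \
    --connect jdbc:mysql://localhost/demo \
    --username retail_user -P \
    --warehouse-dir /user/hadoop/demo

# List the per-table directories created under the warehouse directory.
hadoop fs -ls /user/hadoop/demo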

Summary

In short, we have covered the entire concept of the Apache Sqoop import-all-tables tool. After reading this article, you are now aware of all the argument types, such as the common arguments, import control arguments, Hive arguments, etc.

Do not forget to share your feedback in the comment section!!!