Sqoop Codegen Command with Syntax and Examples

Sqoop Codegen is a tool that generates the Java classes that encapsulate and interpret the imported records. This article explains the entire concept of Sqoop Codegen along with its syntax and arguments.

Let us first start with an introduction to Sqoop Codegen.


What is Sqoop Codegen?

  • In order to run an object-oriented application, every table in the database must have one DAO class that contains the ‘getter’ and ‘setter’ methods for object initialization. In Sqoop, the Codegen tool automatically generates this DAO class.
  • This tool generates the DAO class in Java, based on the table schema structure.
  • The Java definition of the record is instantiated as a part of the import process.
  • This tool can also be used when the Java source code is lost. If the Java code is lost, the tool regenerates a new version of the class with the default delimiters between the fields.

Purpose of Sqoop Codegen

The primary purpose of the Codegen tool is to generate the Java classes that encapsulate and interpret the imported records. The Java definition of the record is instantiated as a part of the import process. This can also be performed separately.

For example, if the Java source is lost, it can be recreated. New versions of the class can be created that use different delimiters between the fields, and so on.
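
For instance, a lost class could be regenerated with a tab as the field delimiter along the following lines (the connect string and table name here are placeholders, consistent with the example later in this article):

$ sqoop codegen \
    --connect jdbc:mysql://localhost/demo_db \
    --table emp_info \
    --fields-terminated-by '\t'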

Syntax for Sqoop Codegen

The Syntax for Sqoop Codegen is:

$ sqoop codegen (generic-args) (codegen-args)
$ sqoop-codegen (generic-args) (codegen-args)

The Hadoop generic arguments must precede the codegen arguments, and the codegen arguments can be passed in any order.
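
For example, a Hadoop generic argument such as -D must come before any codegen argument; the job property and connection details below are only illustrative:

$ sqoop codegen -D mapreduce.job.name=codegen_demo \
    --connect jdbc:mysql://localhost/demo_db \
    --table emp_info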

Sqoop Codegen Arguments

The Common arguments are:

Argument Description
--connect <jdbc-uri> Specifies the JDBC connect string
--connection-manager <class-name> Specifies the connection manager class to use
--driver <class-name> Manually specifies the JDBC driver class to use
--hadoop-mapred-home <dir> Overrides $HADOOP_MAPRED_HOME
--help Prints usage instructions
--password-file Sets the path to a file containing the authentication password
-P Reads the password from the console
--password <password> Sets the authentication password
--username <username> Sets the authentication username
--verbose Prints more information while working
--connection-param-file <filename> Optional properties file that provides connection parameters
--relaxed-isolation Sets the connection transaction isolation to read uncommitted for the mappers
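
As a sketch of how the common arguments fit together, a codegen invocation might combine the connect string, username, and a password file as follows (all of these values are placeholders for illustration):

$ sqoop codegen \
    --connect jdbc:mysql://localhost/demo_db \
    --username demo_user \
    --password-file file:///home/demo_user/.sqoop.pwd \
    --table emp_info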

The Code generation arguments are:

Argument Description
--bindir <dir> Specifies the output directory for the compiled objects
--class-name <name> Sets the generated class name. This overrides --package-name. When combined with --jar-file, it sets the input class.
--jar-file <file> Disables code generation; uses the specified jar
--outdir <dir> Specifies the output directory for the generated code
--package-name <name> Puts the auto-generated classes in this package
--map-column-java <m> Overrides the default mapping from SQL type to Java type for the configured columns
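
The sketch below shows how the code generation arguments might control where the generated source and compiled objects are placed and which package the class goes into (the package name and directories are assumptions for illustration):

$ sqoop codegen \
    --connect jdbc:mysql://localhost/demo_db \
    --table emp_info \
    --package-name com.example.records \
    --outdir /tmp/sqoop-src \
    --bindir /tmp/sqoop-bin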

The Output line formatting arguments are:

Argument Description
--enclosed-by <char> Sets the required field enclosing character
--escaped-by <char> Sets the escape character
--fields-terminated-by <char> Sets the field separator character
--lines-terminated-by <char> Sets the end-of-line character
--mysql-delimiters Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char> Sets a field enclosing character
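
As a hedged example, the output line formatting arguments could be combined to generate a class that writes comma-separated fields, enclosed in double quotes and escaped with a backslash (the connection details are placeholders):

$ sqoop codegen \
    --connect jdbc:mysql://localhost/demo_db \
    --table emp_info \
    --fields-terminated-by ',' \
    --enclosed-by '"' \
    --escaped-by '\\'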

The Input parsing arguments are:

Argument Description
--input-enclosed-by <char> Sets the required field encloser
--input-escaped-by <char> Sets the input escape character
--input-fields-terminated-by <char> Sets the input field separator
--input-lines-terminated-by <char> Sets the input end-of-line character
--input-optionally-enclosed-by <char> Sets a field enclosing character
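
The input parsing arguments tell the generated class how to parse delimited records it reads back. A minimal sketch, assuming pipe-delimited input (the values are illustrative):

$ sqoop codegen \
    --connect jdbc:mysql://localhost/demo_db \
    --table emp_info \
    --input-fields-terminated-by '|' \
    --input-lines-terminated-by '\n'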

The Hive arguments are:

Argument Description
--hive-home <dir> Overrides $HIVE_HOME
--hive-import Imports tables into Hive. It uses Hive’s default delimiters if none are set.
--hive-overwrite Overwrites the existing data in the Hive table
--create-hive-table If this option is set, then the job will fail if the target Hive table exists. By default, this property is set to false.
--hive-table <table-name> Sets the table name to use when importing to Hive
--hive-drop-import-delims Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key Specifies the name of the Hive field on which the partitions are sharded
--hive-partition-value <v> Specifies the string value that serves as the partition key for this import job
--map-column-hive <map> Overrides the default mapping from SQL type to Hive type for the configured columns

If Hive arguments are provided to the code generation tool, then Sqoop generates a file containing the HQL statements to create the table and load data.
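
For instance, passing Hive arguments along with codegen should produce the HQL script in addition to the Java class; the database, table, and Hive table names below are assumptions for illustration:

$ sqoop codegen \
    --connect jdbc:mysql://localhost/demo_db \
    --table emp_info \
    --hive-import \
    --hive-table emp_info_hive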

Advantage of Sqoop Codegen

The main advantage of the Codegen tool is that if the generated Java code is lost, it can be regenerated simply by running the codegen tool again.

Example Invocations

Example: In this example, we recreate the record interpretation code for the emp_info table of the demo_db database:

$ sqoop codegen --connect jdbc:mysql://localhost/demo_db \
    --table emp_info

We can verify the generated Java code by listing the output directory. By default, the generated .java file (named after the table) is written to the current working directory, or to the directory passed with --outdir:

$ ls *.java

Summary

I hope after reading this article, you have understood Sqoop Codegen. The article explained the whole concept of the Codegen tool and listed its syntax, purpose, and arguments.

If you have any doubts related to this tool, feel free to share your comment with us.