Sqoop Troubleshooting – Issues while Using Apache sqoop

While working with Apache Sqoop, some failures may occur. For troubleshooting any failure that occurs while working on Apache Sqoop, we have to follow several steps.

In this Sqoop Troubleshooting article, you will learn the “Sqoop Troubleshooting” process and some known Apache Sqoop issues. You will also explore some known issues that generally occur in Sqoop.

Sqoop troubleshooting

The General Troubleshooting Process

We have to follow the below mentioned steps to troubleshoot any failure that we encounter while running Apache Sqoop.

  • At first, we have to turn on the verbose output by executing the same Sqoop command again with the –verbose option specified. This will produce more debug output on a console, which we can inspect easily to identify any obvious errors.
  • Secondly, look at the task logs from the Hadoop in order to see that any specific failures recorded there. It may be possible that the failure which is encountered during task execution is not conveyed correctly to the console.
  • We have to make sure that the required input files or the input or output tables were present and can be accessed by the user that Apache Sqoop is executing as. Sometimes it may be possible that the required files or tables are present but the user that Sqoop connects to doesn’t have the permissions required for accessing these files.
  • If you are doing a compound action and faced some error then try to break the job into two separate actions in order to see from where the problem occurs. Suppose an import which creates and populates the Hive table fails, then in such a case we can break it in two steps, that is, doing the import alone, and secondly creating a Hive table by using create-hive-table tool. This helps to narrow down the problem.
  • You can also search the JIRA and the mailing lists archives for keywords relating to your problem. It may be possible that you will find the solution of your problem discussed there which will help you in solving your problem.

Now let’s explore some specific Sqoop Troubleshooting Tips.

Specific Troubleshooting Tips

1. Oracle: Connection Reset Errors

Problem: It may happen that while using the default Sqoop connector for Oracle, then some data get transferred, but a lot of errors may occur during the map-reduce job.

Solution: The above problem generally occurs because of the shortage of the fast random number generation device on a host where the mappers execute.
On the Linux systems we can address this issue by setting following property in a java.security file:

securerandom.source=file:/dev/../dev/urandom

The file java.security is found under the $JAVA_HOME/jre/lib/security directory.
Alternatively, we can also specify this property on the command line via:

-D mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom”

2. Oracle: Case-Sensitive Catalog Query Errors

Problem: Sometimes, while working with Oracle, we may encounter the problems when ApacheSqoop can not figure out the column names.

This usually happens due to the catalog queries that Apache Sqoop uses for Oracle to expect the correct case which is to be specified for the user name and the table name. For example, using the –hive-import and resulting in the NullPointerException.

Solution: The solution to this problem is:

  1. Specify user name, which Apache Sqoop is connecting as, in the upper case (unless the username was created with the mixed/lower case within the quotes).
  2. Specify table name, you are working with, in the upper case (unless the table name was created with the mixed/lower case within the quotes).

3. MySQL: Connection Failure

Problem: It may be possible that while importing the MySQL table into the Sqoop, if we don’t have the necessary permissions for accessing our MySQL database over the network, then we may get error stating connection failure.

Solution: To solve this error, firstly, we have to verify that we can connect to a database from the machine where we are running Sqoop:

$ mysql --host=<IP Address> --database=test --user=<username> --password=<password>

If the above command works, it rules out the problem either with the client network configuration or the security/authentication configuration.
So, add network port for the server to your my.cnf file /etc/my.cnf:

[mysqld]
port = xxxx

Now, set a user account to connect through Sqoop and grant permissions to a user to access the database over the network. For doing this:

  1. Log into MySQL as root mysql -u root -p<ThisIsMyPassword>.
  2. Issue the following command:
mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword'

Note : This is not advisable for the production environment. For the production environment, consult your DBA to grant the required privileges based on setup topology.

4. Oracle: ORA-00933 error (SQL command not properly ended)

Problem: Sometimes while working with the Oracle we may face problems when Sqoop command explicitly specifies the option –driver <driver name>.

When the option driver is included in a Sqoop command, the built-in connection manager selection defaults to the generic connection manager, which causes issues with Oracle.

If in case the driver option is not specified, then the built-in connection manager selection mechanism selects Oracle-specific connection manager which will generate a valid SQL for the Oracle and uses a driver “oracle.jdbc.OracleDriver”.

The error occur will be:

ERROR manager.SqlManager: Error executing statement:
java.sql.SQLSyntaxErrorException: ORA-00933: SQL command not properly ended

Solution: The solution for the above problem is to omit the option –driver oracle.jdbc.driver.OracleDriver. After omitting it, just re-run the Sqoop command.

6. MySQL: Import of TINYINT(1) from MySQL behaves strangely

Problem: It may happen that the Apache Sqoop is treating TINYINT(1) columns as the booleans types. This may cause issues with the HIVE import.

This issue occurs because, by default, the MySQL JDBC connector maps TINYINT(1) to the java.sql.Types.BIT, which Apache Sqoop by default maps to Boolean.

Solution: A more clean solution to this problem is to force the MySQL JDBC Connector to stop converting the TINYINT(1) to the java.sql.Types.BIT by adding the property tinyInt1isBit=false into your JDBC path (creating something like jdbc:mysql://localhost/test?tinyInt1isBit=false).
Alternatively, we can explicitly override the column mapping for the TINYINT(1) datatype column.

For example, if a column name is foo, then you can pass the following option to Apache Sqoop during import:

--map-column-hive foo=tinyint

In a case of non-Hive imports to HDFS, use

--map-column-java foo=integer

Summary

In short, we can say that you have learned all the possible sqoop troubleshooting steps. This will help you to troubleshoot all the possible failures that may occur while using Sqoop.

The article had enlisted the general Sqoop troubleshooting steps as well as some Apache sqoop known issues.

If you want to ask any query, then feel free to share it through the comment section. We will get back to you soon.