Apache Sqoop Tutorial

1. Introduction to Sqoop

Apache Sqoop (SQL-to-Hadoop) is an open-source tool designed for efficiently transferring bulk data between relational databases (RDBMS) and the Hadoop ecosystem (HDFS, Hive, HBase, etc.).

It is widely used in Big Data projects where enterprises need to import/export structured data to and from Hadoop.

Key Features of Sqoop:

  - Parallel import/export using map-only MapReduce jobs
  - Connects to any JDBC-compliant database (MySQL, Oracle, PostgreSQL, SQL Server, etc.)
  - Direct import into Hive tables and HBase
  - Incremental loads for tables that grow over time
  - Optional compression of imported data

2. Sqoop Architecture

Sqoop has no long-running server of its own: the client translates each command into a map-only MapReduce job. During an import, Sqoop first reads the table's metadata over JDBC, then splits the rows on a split column (the primary key by default) so that each mapper pulls its own slice of the table in parallel and writes it to HDFS. Exports work the same way in reverse: each mapper reads files from HDFS and issues batched INSERT statements against the target table.

3. Sqoop Installation

Prerequisites:

  - Java (JDK) and a working Hadoop installation (HADOOP_HOME configured)
  - A running RDBMS (e.g., MySQL) and its JDBC driver

Steps:

  1. Download Sqoop from the Apache Sqoop website and extract it.
  2. Configure the environment variables in .bashrc:

export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin

  3. Place the JDBC driver (e.g., mysql-connector-java.jar) in $SQOOP_HOME/lib/.
  4. Test the installation:

sqoop version

4. Sqoop Commands

4.1 List Databases

sqoop list-databases \
--connect jdbc:mysql://localhost:3306/ \
--username root --password root

4.2 Import Data from RDBMS to HDFS

sqoop import \
--connect jdbc:mysql://localhost:3306/testdb \
--username root --password root \
--table employees \
--target-dir /user/hadoop/employees \
-m 1

-m 1 → number of parallel mappers. With a single mapper, Sqoop runs one sequential transfer and no --split-by column is needed; with more mappers, it splits the work on the table's primary key by default.
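Sqoop parallelizes an import by dividing the split column's value range among the mappers. This is a minimal Python sketch of that range splitting, assuming an integer primary key (simplified; Sqoop's real splitter also handles text, dates, NULLs, and skew):

```python
# Simplified version of Sqoop's integer range splitting: divide
# [lo, hi] into num_mappers contiguous, non-overlapping slices.
def split_ranges(lo, hi, num_mappers):
    size = (hi - lo + 1) / num_mappers
    splits, start = [], lo
    for i in range(num_mappers):
        # Last mapper absorbs any rounding remainder.
        end = hi if i == num_mappers - 1 else int(lo + size * (i + 1)) - 1
        splits.append((start, end))
        start = end + 1
    return splits

# With -m 4 and id values 1..100, each mapper issues one slice query:
for start, end in split_ranges(1, 100, 4):
    print(f"SELECT * FROM employees WHERE id >= {start} AND id <= {end}")
```

With -m 1 there is only a single slice, which is why the import above needs no --split-by column.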

4.3 Import Data to Hive

sqoop import \
--connect jdbc:mysql://localhost:3306/testdb \
--username root --password root \
--table employees \
--hive-import \
--create-hive-table \
--hive-table emp_hive
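With --create-hive-table, Sqoop generates the Hive DDL itself, mapping each SQL column type to a Hive type (VARCHAR becomes STRING, for example). A small Python sketch of that mapping, assuming a hypothetical subset of types:

```python
# Subset of the SQL-to-Hive type mapping Sqoop applies when it
# generates the CREATE TABLE statement for --create-hive-table.
MYSQL_TO_HIVE = {
    "INT": "INT",
    "BIGINT": "BIGINT",
    "DOUBLE": "DOUBLE",
    "VARCHAR": "STRING",   # character types collapse to STRING
    "DATETIME": "STRING",  # date/time types are imported as STRING
}

def hive_ddl(table, columns):
    cols = ", ".join(f"{name} {MYSQL_TO_HIVE[sql_type]}"
                     for name, sql_type in columns)
    return f"CREATE TABLE {table} ({cols})"

print(hive_ddl("emp_hive", [("id", "INT"), ("name", "VARCHAR")]))
```

Note that --create-hive-table fails if emp_hive already exists in Hive; omit it to load into an existing table.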

4.4 Export Data from HDFS to RDBMS

sqoop export \
--connect jdbc:mysql://localhost:3306/testdb \
--username root --password root \
--table employees_export \
--export-dir /user/hadoop/employees \
-m 1
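Conceptually, sqoop export reads the delimited part files under --export-dir and turns each line into an INSERT against the target table. A minimal Python sketch using sqlite3 in place of MySQL (the rows and schema here are illustrative):

```python
import sqlite3

# In-memory stand-in for the MySQL target table employees_export.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees_export (id INTEGER, name TEXT, salary REAL)")

# Lines as they might appear in /user/hadoop/employees/part-m-00000;
# ',' is Sqoop's default field delimiter.
hdfs_lines = ["1,Alice,75000.0", "2,Bob,68000.0"]

for line in hdfs_lines:
    emp_id, name, salary = line.split(",")
    conn.execute("INSERT INTO employees_export VALUES (?, ?, ?)",
                 (int(emp_id), name, float(salary)))

print(conn.execute("SELECT COUNT(*) FROM employees_export").fetchone()[0])  # 2
```

The real tool does the same work in parallel mappers and batches the INSERTs for throughput.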

4.5 Incremental Import

Sqoop can import only the rows added since the previous run. In append mode, it selects rows whose check column value is greater than --last-value:

sqoop import \
--connect jdbc:mysql://localhost:3306/testdb \
--username root --password root \
--table employees \
--incremental append \
--check-column id \
--last-value 100

After the run, Sqoop reports the new --last-value to use for the next import; a saved sqoop job can track this high-water mark automatically.
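The append-mode bookkeeping can be sketched in a few lines of Python: keep only rows whose check column exceeds the saved --last-value, then advance the high-water mark (the row data is illustrative, not Sqoop internals):

```python
# Rows currently in the source table (only id > last_value are new).
rows = [{"id": 99, "name": "Ana"},
        {"id": 101, "name": "Ben"},
        {"id": 150, "name": "Cho"}]
last_value = 100  # from --last-value

# Equivalent of: SELECT * FROM employees WHERE id > 100
new_rows = [r for r in rows if r["id"] > last_value]

# Advance the high-water mark for the next run.
last_value = max((r["id"] for r in new_rows), default=last_value)

print(len(new_rows), last_value)  # 2 150
```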

5. Sqoop Use Cases

  - Offloading data from a transactional RDBMS into HDFS or Hive for large-scale analytics
  - Exporting aggregated results from Hadoop back into a relational database for BI dashboards
  - Nightly incremental synchronization of growing tables
  - One-time migration of legacy relational data into a data lake

6. Sqoop Best Practices

  - Avoid plaintext passwords on the command line; use -P (prompt) or --password-file instead
  - Pick an evenly distributed --split-by column so mappers get balanced workloads
  - Start with a small mapper count and increase it carefully; too many mappers can overload the source database
  - Compress imported data (--compress) to save HDFS space
  - Use saved sqoop jobs for repeatable incremental imports

7. Sqoop vs Alternatives

  - Apache Flume / Kafka: built for continuous streams of events and logs, whereas Sqoop does batch transfers of structured, relational data
  - Spark (JDBC data source): a common modern replacement that reads and writes RDBMS tables directly from Spark jobs
  - Note: Sqoop was moved to the Apache Attic in 2021 and is no longer actively developed, but it remains widely deployed


✅ With this, you can now move data efficiently in both directions between relational databases and the Hadoop ecosystem.