Sqoop is an open-source subproject of Apache Hadoop. It is a tool designed to efficiently transfer bulk data between Apache Hadoop and structured data stores such as relational database management systems (RDBMS) like MySQL, Oracle, and Microsoft SQL Server.
In other words, Sqoop is used to import large amounts of data from an RDBMS (such as MySQL, Oracle, or SQL Server) into HDFS, and to export data from HDFS back into an RDBMS. On the Hadoop side, the imported data can also be loaded directly into services built on top of HDFS, such as Hive and HBase.
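A minimal import and its matching export might look like the following sketch. The host, database, user, table, and directory names are all hypothetical, and the commands assume a configured Hadoop cluster with Sqoop installed:

```shell
# Import the "orders" table from a MySQL database into HDFS
# (connection details and paths are made-up examples)
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/hadoop/orders

# Export the HDFS directory back into a database table
sqoop export \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --table orders_backup \
  --export-dir /user/hadoop/orders
```

The `-P` flag prompts for the password interactively, which avoids leaving credentials in the shell history.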
Below is the list of RDBMSs currently supported by Sqoop:
- MySQL
- PostGreSQL
- Oracle
- Microsoft SQL
- IBM's Netezza
- Teradata
Sqoop currently supports importing data into the following services:
- HDFS
- Hive
- HBase
- HCatalog
- Accumulo
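For example, a table can be imported straight into a Hive table rather than into plain HDFS files. The connection details and table names below are hypothetical:

```shell
# Import the "customers" table directly into Hive
# (host, database, credentials, and table names are made-up examples)
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --table customers \
  --hive-import \
  --hive-table sales.customers
```

With `--hive-import`, Sqoop first copies the data into HDFS and then generates and runs the Hive statements to create the table and load the data into it.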
In Sqoop, the import and export commands are the most heavily used, but the following commands are also useful at times:
- codegen
- eval
- import-all-tables
- job
- list-databases
- list-tables
- merge
- metastore
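A couple of these can be sketched as follows; again, the connection string and credentials are made-up examples:

```shell
# List the databases visible to this user on the server
sqoop list-databases \
  --connect jdbc:mysql://dbhost/ \
  --username sqoop_user -P

# eval: run an ad-hoc SQL statement and print the result,
# handy for sanity-checking a connection before a big import
sqoop eval \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --query "SELECT COUNT(*) FROM orders"
```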
To use Sqoop from Java code, include the Sqoop JAR on the classpath and invoke the Sqoop.runTool() method. The necessary parameters are constructed programmatically and passed to Sqoop just as they would be on the command line.
Incremental data load in Sqoop is the process of synchronizing only the modified or newly added data (often referred to as delta data) from the RDBMS to Hadoop. The delta data is fetched through Sqoop's incremental import support.
Incremental load can be performed using the Sqoop import command, or by loading the data into Hive without overwriting it. The attributes that need to be specified during an incremental load in Sqoop are:
- Mode (--incremental): defines how Sqoop determines which rows are new. Its value can be append or lastmodified.
- Check column (--check-column): specifies the column that should be examined to find the rows to import.
- Last value (--last-value): the maximum value of the check column from the previous import operation.
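An append-mode incremental import combining these three attributes can be sketched as follows. The connection details, column name, and last value are hypothetical:

```shell
# Import only rows whose "id" exceeds the value recorded by the
# previous import (the values below are illustrative examples)
sqoop import \
  --connect jdbc:mysql://dbhost/salesdb \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 1000
```

At the end of the run, Sqoop prints the new maximum value of the check column, which should be supplied as `--last-value` on the next run (or tracked automatically with a saved Sqoop job).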
Sqoop is a data transfer tool. To see the list of available commands, just run the Sqoop help command:
sqoop help
codegen            Generate code to interact with database records
create-hive-table  Import a table definition into Hive
eval               Evaluate a SQL statement and display the results
export             Export an HDFS directory to a database table
import             Import a table from a database to HDFS
import-all-tables  Import tables from a database to HDFS
list-databases     List available databases on a server
list-tables        List available tables in a database
version            Display version information