Sqoop Flashcards
What is Sqoop?
Sqoop is a tool designed to transfer data between Hadoop and relational database servers.
What is the syntax to import a table from an RDBMS (e.g. MySQL) into a specified HDFS path, deleting that target directory first if it already exists?
sqoop import --connect jdbc:mysql://localhost/movielens --driver com.mysql.jdbc.Driver --table movies -m 1 --target-dir /user/maria_dev/all_tables --delete-target-dir
The command above imports the movies table from the movielens database into HDFS using a single mapper (-m 1). The --target-dir flag sets the HDFS directory to write to, and --delete-target-dir deletes that directory first if it already exists.
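Sqoop refuses to import into a target directory that already exists; --delete-target-dir removes it first so the job can be rerun. The sketch below models that behavior on the local filesystem. Assumptions: a temporary local directory stands in for the HDFS target directory, `prepare_target_dir` and `write_part_file` are hypothetical helpers (not part of Sqoop), and the sample rows are made up.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_target_dir(target: Path, delete_target_dir: bool) -> None:
    """Hypothetical model of how Sqoop treats an existing --target-dir."""
    if target.exists():
        if not delete_target_dir:
            # Without the flag, an existing target directory aborts the import.
            raise FileExistsError(f"target directory {target} already exists")
        # With the flag, the old directory and everything in it is removed first.
        shutil.rmtree(target)
    target.mkdir(parents=True)


def write_part_file(target: Path, rows: list[str]) -> None:
    # With -m 1 there is one map task, so one output file (part-m-00000).
    (target / "part-m-00000").write_text("\n".join(rows) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "user" / "maria_dev" / "all_tables"

    prepare_target_dir(target, delete_target_dir=False)  # first run: directory is new
    write_part_file(target, ["1,Toy Story (1995)"])

    try:
        prepare_target_dir(target, delete_target_dir=False)  # rerun without the flag
        rerun_without_flag = "succeeded"
    except FileExistsError:
        rerun_without_flag = "failed"

    prepare_target_dir(target, delete_target_dir=True)  # rerun with the flag
    write_part_file(target, ["2,GoldenEye (1995)"])
    final_text = (target / "part-m-00000").read_text()

print(rerun_without_flag)  # failed
print(final_text, end="")  # 2,GoldenEye (1995)
```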
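Syntax to import all the tables from a database into HDFS. The --warehouse-dir flag sets the parent HDFS directory; each table is written to its own subdirectory under it. With -m 1 each table is read by a single mapper, so tables without a single-column primary key do not cause an error.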
sqoop import-all-tables --connect jdbc:mysql://localhost/movielens --driver com.mysql.jdbc.Driver -m 1 --warehouse-dir /movielens_all_tables/ --delete-target-dir
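The output layout of import-all-tables, and why -m 1 is a safe default, can be sketched as a small planner. Assumptions: `plan_import_all_tables` is a hypothetical helper, and the table names and primary keys are invented for illustration (the real list comes from the movielens database). It encodes the documented rule that a parallel import needs a single-column primary key per table (the alternative being -m 1 or --autoreset-to-one-mapper).

```python
import posixpath


def plan_import_all_tables(warehouse_dir: str, primary_keys: dict, num_mappers: int) -> dict:
    """Hypothetical planner: map each table to its HDFS output directory.

    primary_keys maps table name -> list of primary-key columns.
    """
    plan = {}
    for table, pk_columns in primary_keys.items():
        # Splitting a table across several mappers needs a single-column primary key;
        # a single mapper (-m 1) reads the whole table and needs no split column.
        if num_mappers > 1 and len(pk_columns) != 1:
            raise ValueError(f"{table}: no single-column primary key; "
                             "use -m 1 or --autoreset-to-one-mapper")
        # Each table lands in a subdirectory named after it under --warehouse-dir.
        plan[table] = posixpath.join(warehouse_dir, table)
    return plan


# Hypothetical tables and their primary keys.
tables = {"movies": ["id"], "ratings": ["user_id", "movie_id"], "genres": []}

for table, path in plan_import_all_tables("/movielens_all_tables/", tables, 1).items():
    print(table, "->", path)
```

With -m 4 instead of -m 1, the same call raises for `ratings` and `genres`, which mirrors the error Sqoop reports for tables it cannot split.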
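When you run Sqoop, which application executes the query?
The RDBMS. Sqoop's map tasks send the SQL to the source database over JDBC; the database executes it and returns only the result rows.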
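To see that the database, not Hadoop, evaluates the SQL, the sketch below uses Python's built-in sqlite3 module as a stand-in RDBMS. Assumptions: the `movies` table, its rows, and the two hard-coded per-task ranges are hypothetical; in a real job Sqoop computes the ranges and substitutes them for $CONDITIONS.

```python
import sqlite3

# In-memory SQLite database standing in for the source RDBMS.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO movies VALUES (?, ?)",
               [(i, f"movie {i}") for i in range(1, 11)])

query = "SELECT id, title FROM movies WHERE $CONDITIONS"

# Two hypothetical map tasks, each with its own range predicate.
task_conditions = ["id >= 1 AND id < 6", "id >= 6 AND id <= 10"]

results = []
for cond in task_conditions:
    sql = query.replace("$CONDITIONS", f"({cond})")
    # The database filters the rows; the map task only receives and writes them.
    results.append(db.execute(sql).fetchall())

for cond, rows in zip(task_conditions, results):
    print(cond, "->", sorted(r[0] for r in rows))
```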
What is the Sqoop syntax to create a Hive table and load the data into the Hive warehouse directory (which lives in HDFS)?
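sqoop import \
  --connect jdbc:mysql://localhost/movielens \
  --driver com.mysql.jdbc.Driver \
  --username root --password password \
  --query "SELECT * FROM movies WHERE \$CONDITIONS LIMIT 100" \
  --split-by id \
  --target-dir /user/maria_dev/test/ \
  --hive-import \
  --delete-target-dir \
  --hive-database default \
  --hive-table new_movies \
  --create-hive-table \
  --fields-terminated-by ','
{--delete-target-dir isn't necessary}
Inside double quotes, $CONDITIONS must be written as \$CONDITIONS so the shell does not expand it; Sqoop replaces the token with each map task's range condition. Because every map task runs its own copy of the query, LIMIT 100 applies per task, so the import can return up to 100 rows per mapper. --create-hive-table makes the job fail if the Hive table already exists.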
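When the Hive import above is launched from a program rather than typed into a shell, passing the arguments as a list sidesteps the quoting problem entirely: no shell is involved, so $CONDITIONS needs no backslash. This is a minimal sketch under the assumption that Sqoop is installed and on PATH when the commented-out `subprocess.run` line is enabled; the flags are exactly those of the command above (minus the optional --delete-target-dir).

```python
import shlex

# Argument list for the Hive import. No shell parses this list,
# so $CONDITIONS reaches Sqoop literally without escaping.
args = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://localhost/movielens",
    "--driver", "com.mysql.jdbc.Driver",
    "--username", "root", "--password", "password",
    "--query", "SELECT * FROM movies WHERE $CONDITIONS LIMIT 100",
    "--split-by", "id",
    "--target-dir", "/user/maria_dev/test/",
    "--hive-import",
    "--hive-database", "default",
    "--hive-table", "new_movies",
    "--create-hive-table",
    "--fields-terminated-by", ",",
]

# shlex.join prints an equivalent shell command; the query is single-quoted,
# which also stops the shell from expanding $CONDITIONS.
print(shlex.join(args))

# To actually run it (requires a Sqoop installation):
# import subprocess
# subprocess.run(args, check=True)
```

Single quotes in a shell are an alternative to \$CONDITIONS inside double quotes; both deliver the literal token to Sqoop.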
What is the syntax for just the query and Hive part of the Sqoop command (not the surrounding connection arguments)?
--query "SELECT * FROM _____ WHERE \$CONDITIONS"
--split-by id
--hive-import
--hive-database ____ --hive-table ____ --fields-terminated-by ','
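The rules behind these flags can be written down as a pre-flight check: a free-form query must contain $CONDITIONS, a parallel import (Sqoop's default is 4 mappers) needs --split-by, and a --query import needs --target-dir. `check_free_form_import` is a hypothetical helper for illustration; Sqoop performs its own validation with its own messages.

```python
from typing import Optional


def check_free_form_import(query: str, num_mappers: int = 4,
                           split_by: Optional[str] = None,
                           target_dir: Optional[str] = None) -> None:
    """Hypothetical check of the documented rules for a --query import."""
    if "$CONDITIONS" not in query:
        # Sqoop substitutes each map task's range condition for this token.
        raise ValueError("query must contain $CONDITIONS in its WHERE clause")
    if num_mappers > 1 and not split_by:
        # Parallel tasks need a column to divide the result set on.
        raise ValueError("use --split-by for a parallel import, or -m 1")
    if not target_dir:
        raise ValueError("a free-form query import needs --target-dir")


# Valid: parallel import with a split column.
check_free_form_import("SELECT * FROM movies WHERE $CONDITIONS",
                       split_by="id", target_dir="/user/maria_dev/test/")
# Valid: sequential import, so no split column is needed.
check_free_form_import("SELECT * FROM movies WHERE $CONDITIONS",
                       num_mappers=1, target_dir="/user/maria_dev/test/")
```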
What does the --hive-import flag accomplish?
The data is first written to the HDFS target directory (--target-dir) and then automatically moved into the Hive warehouse directory.
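The two-step behavior of --hive-import can be modeled with local directories. Assumptions: the warehouse root is the common default /user/hive/warehouse (configured by hive.metastore.warehouse.dir), managed tables in the `default` database sit directly under it, and tables in other databases sit under `<db>.db/`. Sqoop performs the second step by having Hive run a LOAD DATA INPATH statement, which moves rather than copies the files; the helper names here are hypothetical.

```python
import posixpath
import shutil
import tempfile
from pathlib import Path

WAREHOUSE_ROOT = "/user/hive/warehouse"  # assumed default hive.metastore.warehouse.dir


def hive_table_dir(database: str, table: str, root: str = WAREHOUSE_ROOT) -> str:
    # Managed tables in "default" sit directly under the root; others under <db>.db/.
    if database == "default":
        return posixpath.join(root, table)
    return posixpath.join(root, f"{database}.db", table)


def load_into_hive(staging: Path, table_dir: Path) -> None:
    # Like LOAD DATA INPATH: move each data file out of the staging directory.
    table_dir.mkdir(parents=True, exist_ok=True)
    for f in staging.iterdir():
        shutil.move(str(f), str(table_dir / f.name))


with tempfile.TemporaryDirectory() as tmp:
    # Step 1: Sqoop writes the imported rows to --target-dir.
    staging = Path(tmp) / "user/maria_dev/test"
    staging.mkdir(parents=True)
    (staging / "part-m-00000").write_text("1,Toy Story (1995)\n")

    # Step 2: the files are moved into the Hive table's warehouse directory.
    table_dir = Path(tmp) / hive_table_dir("default", "new_movies").lstrip("/")
    load_into_hive(staging, table_dir)

    staging_left = sorted(p.name for p in staging.iterdir())
    table_files = sorted(p.name for p in table_dir.iterdir())

print(staging_left, table_files)  # [] ['part-m-00000']
```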
What is the syntax to access the MySQL shell?
service mysql start {if needed}
mysql -u root -p
{enter the password at the prompt}
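What is CDC?
Change data capture: identifying the rows that were inserted, updated, or deleted in a source database so that only those changes are transferred.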
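Sqoop's built-in way to pick up only new rows is incremental import, for example `--incremental append --check-column id --last-value <n>`, which imports rows whose check column exceeds the last value seen. This is narrower than full CDC: append mode ignores updates (`--incremental lastmodified` covers those via a timestamp column), and neither mode captures deletes. The sketch models append mode against an in-memory sqlite3 database; the table, rows, and `incremental_append` helper are hypothetical, and reading the upper bound before selecting rows is this sketch's choice for keeping runs consistent.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO movies VALUES (?, ?)",
               [(1, "Toy Story"), (2, "GoldenEye"), (3, "Four Rooms")])


def incremental_append(conn, table: str, column: str, last_value: int):
    """Model of append mode: fetch rows with last_value < column <= current max.

    table and column are interpolated into the SQL, so they must be trusted names.
    """
    # Read the upper bound first so rows inserted during the run wait for the next one.
    (upper,) = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()
    if upper is None or upper <= last_value:
        return [], last_value  # nothing new since the previous run
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE {column} > ? AND {column} <= ? ORDER BY {column}",
        (last_value, upper),
    ).fetchall()
    return rows, upper  # upper becomes the next --last-value


first_rows, last = incremental_append(db, "movies", "id", 0)  # ids 1-3
db.executemany("INSERT INTO movies VALUES (?, ?)", [(4, "Get Shorty"), (5, "Copycat")])
second_rows, last = incremental_append(db, "movies", "id", last)  # ids 4-5 only
print([r[0] for r in first_rows], [r[0] for r in second_rows], last)  # [1, 2, 3] [4, 5] 5
```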
How can you execute a free-form SQL query in Sqoop so that the rows are imported sequentially?
Use the -m 1 option in the sqoop import command. It creates only one map task, which imports the rows serially; --split-by is not required in this case.
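The difference between -m 1 and a parallel import is in how $CONDITIONS gets filled in. The sketch below is a simplified model: with one mapper the whole query runs once under an always-true condition, and with N mappers the range between MIN and MAX of the --split-by column is divided into N pieces. The even division and the exact condition text are assumptions of this sketch; Sqoop's own splitter computes boundaries in its own way, and `split_conditions` is a hypothetical helper.

```python
def split_conditions(column: str, lo: int, hi: int, num_mappers: int) -> list[str]:
    """Simplified model of how a --split-by range is divided among map tasks."""
    if num_mappers == 1:
        # -m 1: a single task, no splitting column needed; the condition is always true.
        return ["1=1"]
    # Boundaries spread evenly between MIN(column) and MAX(column); duplicates removed.
    points = sorted({lo + (hi - lo) * i // num_mappers for i in range(num_mappers)} | {hi})
    if len(points) == 1:
        return [f"{column} >= {lo} AND {column} <= {hi}"]
    conditions = []
    for i in range(len(points) - 1):
        # Every range is half-open except the last, which must include MAX(column).
        op = "<=" if i == len(points) - 2 else "<"
        conditions.append(f"{column} >= {points[i]} AND {column} {op} {points[i + 1]}")
    return conditions


query = "SELECT * FROM movies WHERE $CONDITIONS"
for mappers in (1, 4):
    print(f"-m {mappers}:")
    for cond in split_conditions("id", 1, 100, mappers):
        print("  ", query.replace("$CONDITIONS", f"({cond})"))
```

With -m 1 a single query covers every row, which is why the import is sequential and no split column is required.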