Sqoop Flashcards

1
Q

What is Sqoop?

A

Sqoop is a tool designed to transfer data between Hadoop and relational database servers.

2
Q

What is the syntax to import a single table from an RDBMS (e.g. MySQL) into a specified HDFS path, deleting that target directory first if it already exists?

A

sqoop import --connect jdbc:mysql://localhost/movielens --driver com.mysql.jdbc.Driver --table movies -m 1 --target-dir /user/maria_dev/all_tables --delete-target-dir

The command above imports the movies table from the movielens database into HDFS using a single mapper (-m 1). The --target-dir flag sets the HDFS directory to write to, and Sqoop creates that directory. The --delete-target-dir flag deletes the directory first if it already exists.
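A long Sqoop command is easier to keep correct if you assemble it as a bash array and inspect it before it runs. The sketch below is a minimal dry-run wrapper around the command above. The function name build_import_cmd and the RUN switch are hypothetical helpers, not part of Sqoop. By default the wrapper only prints the shell-quoted command. With RUN=1 it executes sqoop, which must then be on the PATH.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: builds the single-table import as an array so the
# flags can be checked before anything touches MySQL or HDFS.
set -euo pipefail

build_import_cmd() {
  local table="$1" target_dir="$2"
  local -a cmd=(
    sqoop import
    --connect jdbc:mysql://localhost/movielens
    --driver com.mysql.jdbc.Driver
    --table "$table"
    -m 1                          # one mapper, so no --split-by column is needed
    --target-dir "$target_dir"    # HDFS output directory
    --delete-target-dir           # remove that directory first if it exists
  )
  if [[ "${RUN:-0}" == 1 ]]; then
    "${cmd[@]}"                   # really run Sqoop
  else
    printf '%q ' "${cmd[@]}"      # print a pasteable, shell-quoted command line
    printf '\n'
  fi
}

build_import_cmd movies /user/maria_dev/all_tables
```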

3
Q

Syntax to import all the tables from a database into Hadoop/HDFS. The --warehouse-dir flag sets the parent HDFS directory, and Sqoop creates one subdirectory per table under it.

A

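sqoop import-all-tables --connect jdbc:mysql://localhost/movielens --driver com.mysql.jdbc.Driver -m 1 --warehouse-dir /movielens_all_tables/

Note that import-all-tables does not accept --target-dir or --delete-target-dir. Each table is written to its own subdirectory of --warehouse-dir.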
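import-all-tables places each table in a subdirectory named after the table, inside the --warehouse-dir path. The sketch below only prints the HDFS paths that this layout implies. The table_paths helper is hypothetical, and the table names are placeholders rather than names read from the movielens database.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print <warehouse-dir>/<table> for each table name given.
set -euo pipefail

table_paths() {
  local warehouse_dir="${1%/}"   # drop one trailing slash, e.g. /movielens_all_tables/
  shift
  local table
  for table in "$@"; do
    printf '%s/%s\n' "$warehouse_dir" "$table"
  done
}

# Placeholder table names, for illustration only.
table_paths /movielens_all_tables/ movies ratings users
```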

4
Q

When you run Sqoop, which application executes the query?

A

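The RDBMS (the source database) executes the query. Sqoop's map tasks only send the SQL over JDBC and write the returned rows to HDFS.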

5
Q

What is the Sqoop syntax to create a Hive table and load the data into the Hive warehouse directory (i.e., under HDFS)?

A

sqoop import

  • --connect jdbc:mysql://localhost/movielens
  • --driver com.mysql.jdbc.Driver
  • --username root --password password
  • --query 'SELECT * FROM movies WHERE $CONDITIONS LIMIT 100'
  • --split-by id
  • --target-dir /user/maria_dev/test/
  • --hive-import
  • --delete-target-dir (optional)
  • --hive-database default
  • --hive-table new_movies
  • --create-hive-table
  • --fields-terminated-by ','
6
Q

What is the syntax for just the query portion of a Sqoop Hive import (not the surrounding connection options)?

A
  • --query 'SELECT * FROM _____ WHERE $CONDITIONS'
  • --split-by id
  • --hive-import
  • --hive-database ____ --hive-table ____ --fields-terminated-by ','
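Single-quote the query string, or escape the dollar sign. Sqoop must receive the literal $CONDITIONS token, because it replaces that token with each mapper's range condition. Inside double quotes, the shell expands $CONDITIONS itself, normally to an empty string, so the token is gone before Sqoop sees the query. The snippet below only shows what the shell does with each quoting style. It does not call Sqoop.

```bash
#!/usr/bin/env bash
# In an ordinary shell session CONDITIONS is not a variable; make sure of that here.
unset CONDITIONS

double_quoted="SELECT * FROM movies WHERE $CONDITIONS"   # expanded by the shell: token lost
single_quoted='SELECT * FROM movies WHERE $CONDITIONS'   # passed through literally
escaped="SELECT * FROM movies WHERE \$CONDITIONS"        # passed through literally

# Brackets make the trailing space in the first result visible.
printf '[%s]\n' "$double_quoted" "$single_quoted" "$escaped"
```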
7
Q

What does the --hive-import flag accomplish?

A

Sqoop first writes the data to the HDFS target directory. It then automatically moves the data into the Hive warehouse directory and loads it into the Hive table.

8
Q

What is the syntax to access MySQL from the command line?

A

service mysql start (if needed)
mysql -u root -p
(enter the password when prompted)

9
Q

What is CDC?

A

Change Data Capture: identifying and loading only the rows that changed in the source since the last load.
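One simple, query-based way to capture changes is a high-water mark. You remember the largest value of an ever-increasing column (such as an auto-increment id) from the last load, then fetch only the rows above it. Sqoop's incremental import (--incremental append with --check-column and --last-value) follows this idea. The sketch below simulates it with a bash array of ids. The changed_since helper is hypothetical, and no database is involved.

```bash
#!/usr/bin/env bash
# Toy high-water-mark CDC: rows are simulated ids, no database is involved.
set -euo pipefail

# Hypothetical helper: print the ids strictly greater than the last loaded value.
changed_since() {
  local last_value="$1"; shift
  local id
  for id in "$@"; do
    if (( id > last_value )); then
      printf '%s\n' "$id"
    fi
  done
}

source_ids=(1 2 3 4 5 6)   # simulated auto-increment ids now in the source table
last_value=4               # highest id loaded by the previous run
mapfile -t new_rows < <(changed_since "$last_value" "${source_ids[@]}")
echo "new rows: ${new_rows[*]}"

# The next run would start from the largest id just loaded.
next_last_value="${new_rows[${#new_rows[@]}-1]}"
echo "next last value: $next_last_value"
```

Append mode only sees new rows. Updated rows need another strategy, such as Sqoop's --incremental lastmodified mode or log-based CDC.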

10
Q

How can you execute a free-form SQL query in Sqoop so that the rows are imported sequentially?

A

You can do this with the -m 1 option (short for --num-mappers 1) in the Sqoop import command. It creates only one map task, which imports the rows serially. With a single mapper, no --split-by column is needed.
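With more than one mapper, Sqoop needs a --split-by column to divide the work. It asks the database for the MIN and MAX of that column. It then gives each map task one range, which is substituted for $CONDITIONS. The sketch below is a simplified model of that arithmetic, not Sqoop's exact splitting code, and split_conditions is a hypothetical helper. In real Sqoop, -m 1 skips the MIN/MAX query entirely and substitutes a condition that matches every row. The model's single range covering the whole span expresses the same result.

```bash
#!/usr/bin/env bash
# Simplified model of dividing a --split-by column's [min, max] range among mappers.
set -euo pipefail

# Hypothetical helper: print one inclusive range condition per mapper.
split_conditions() {
  local col="$1" min="$2" max="$3" mappers="$4"
  local span=$(( max - min + 1 ))
  local i lo hi
  for (( i = 0; i < mappers; i++ )); do
    lo=$(( min + span * i / mappers ))
    hi=$(( min + span * (i + 1) / mappers - 1 ))
    printf '%s >= %d AND %s <= %d\n' "$col" "$lo" "$col" "$hi"
  done
}

printf '%s\n' '-m 1:'; split_conditions id 1 100 1   # one range covers every row
printf '%s\n' '-m 4:'; split_conditions id 1 100 4   # four contiguous ranges
```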
