HIVE Flashcards

Question 1

Q

Hive
-It is a tool that provides ____ of data stored in HDFS or HBase
-___ running on top of MapReduce using ___
-Originally developed at ___ and is now the largest analytical ____

Answer

A

- It is a tool that provides SQL querying of data stored in
HDFS or HBase
- Higher-level application running on top of MapReduce using
HiveQL
- Originally developed at Facebook and is now the largest
analytical open-source Apache project

Question 2

Q

Hive Design
-Accessed using a ____ called ____
-Transforms unstructured data into _____
-It is not a database, but it uses _____
-consists of a ___ and data in ___
-Allows for easy ___

Answer

A

- Accessed using a SQL-like language called HiveQL
- Transforms unstructured data into structures like tables
- It is not a database, but it uses database techniques to store data
- Consists of a schema in Metastore and data in HDFS
- Allows for easy ad-hoc queries

Question 3

Q

Nature of Hive
___ is great for performing analysis
Excels at ___
Good for verifying the results of other ____
Commonly used with ____ problems

Answer

A

- Hive is great for performing analysis
- Excels at ad-hoc queries
- Good for verifying the results of other Hadoop jobs
- Commonly used with structured query problems

Question 4

Q

HiveQL
-Combines the ___ standard with items from __ and ___
-More efficient than writing ___ or ___ directly
-___ lines of HiveQL ~ ____ lines of Java
-Queries are run in the _______ as well as the _______
-__ does not update existing files; ____ overwrites existing files with a new file

Answer

A

- Combines the SQL-92 standard with items from MySQL and
Oracle’s DBMS
- More efficient than writing MapReduce or Spark jobs directly
- Five lines of HiveQL ~ 200 lines of Java
- Queries are run in the command-line shell (Beeline) as well as
the Hue web UI (Hive Query Editor)
- HDFS does not update existing files; HiveQL overwrites
existing files with a new file

Question 5

Q

Hive Features
-Brings _____ analysis to a broader audience
leveraging existing knowledge of SQL
-Provides a ____ for data lakes
-Ability to handle ____ types (arrays, maps)
-Uses ___ as a bridge to other application toolsets like
Tableau, Power BI, and QlikView

Answer

A

- Brings large-scale data analysis to a broader audience
leveraging existing knowledge of SQL
- Provides a macro schema for data lakes
- Ability to handle complex data types (arrays, maps)
- Uses ODBC as a bridge to other application toolsets like
Tableau, Power BI, and QlikView

Question 6

Q

Hive Approach
-Designed to organize ____ files into tables
-_____ to build structure into HDFS
-Table schemas and metadata are stored in a database called the
____
-Table schemas are connected to saved ___ data files
-Hive table definitions are ___ to the machine where they are
created

Answer

A

- Designed to organize unstructured data files into tables
- Uses database techniques to build structure into HDFS
- Table schemas and metadata are stored in a database called the
metastore
- Table schemas are connected to saved HDFS data files
- Hive table definitions are local to the machine where they are
created

Question 7

Q

Hive Use Cases
-____ - treat a directory full of unstructured log files
as a table for query execution
-____ - ability to query and visualize data instantly
in dashboards used for reporting
-____ - analyze social media coverage as positive,
negative, or neutral

Answer

A

Log file analytics - treat a directory full of unstructured log files
as a table for query execution
Business intelligence - ability to query and visualize data instantly
in dashboards used for reporting
Sentiment analysis - analyze social media coverage as positive,
negative, or neutral

Question 8

Q

Hive Architecture
There are ______ client approaches to engage Hive

Answer

A

There are several unique client approaches to engage Hive

Question 9

Q

Hive Clients
____ - provides clients for Python and Ruby
_____ - Beeline approach
  - Set up a _____ and connect with a host and port
  - Executing Hive in _____ uses the same ___ as the invoking application
  - ___ uses this approach
_____ - allows BI software to communicate with Hive

Answer

A

Thrift application - provides clients for Python and Ruby
JDBC driver - Beeline approach
  - Set up a Hive server and connect with a host and port
  - Executing Hive in embedded mode uses the same JVM as the
    invoking application
  - Beeline uses this approach
ODBC driver - allows business intelligence software to
communicate with Hive

Question 10

Q

Beeline
You can execute ___ statements in the ____
Beeline is an ______ based on the ___ utility
Start Beeline by specifying the URL for a ____

Answer

A

- You can execute HiveQL statements in the Beeline shell
- Beeline is an interactive shell based on the SQLLine utility
- Start Beeline by specifying the URL for a Hive2 server

$ beeline -u jdbc:hive2://host:10000 -n username -p password

0: jdbc:hive2://host:10000>

Question 11

Q

Beeline Commands
Execute Beeline commands with a leading ___
Some helpful commands
_____ - connect to a different Hive2 server
_____ - executes a query file like myquery.hql
____ - shows the full list of commands
____ - exit the shell
____ - show added details of the queries
SQL commands are terminated with a __

Answer

A

- Execute Beeline commands with a leading !
- Some helpful commands
  - !connect url - connect to a different Hive2 server
  - !run - executes a query file like myquery.hql
  - !help - shows the full list of commands
  - !exit - exit the shell
  - !verbose - show added details of queries
- SQL commands are terminated with a semicolon (;)

Question 12

Q

Command Line Execution
Execute HiveQL directly from command line using ___
Execute files containing HiveQL code using __

Answer

A

Execute HiveQL directly from command line using -e
$ beeline -u … -e 'SELECT * FROM users'

Execute files containing HiveQL code using -f
$ beeline -u … -f myquery.hql
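
The two flags above can also be driven from a script. The following is a minimal Python sketch, not an official client: the helper name `beeline_command`, the host, and the port are assumptions made for illustration, and only the `-u`, `-n`, `-p`, `-e`, and `-f` flags come from these cards.

```python
import shlex


def beeline_command(url, query=None, script=None, user=None, password=None):
    """Build the argument list for a non-interactive Beeline call.

    Hypothetical helper: exactly one of `query` (-e) or `script` (-f)
    must be supplied.
    """
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    args = ["beeline", "-u", url]
    if user is not None:
        args += ["-n", user]
    if password is not None:
        args += ["-p", password]
    # -e runs an inline HiveQL statement, -f runs a file of HiveQL code.
    args += ["-e", query] if query is not None else ["-f", script]
    return args


# Assumed URL; substitute the address of a real Hive2 server.
URL = "jdbc:hive2://localhost:10000"
inline = beeline_command(URL, query="SELECT * FROM users")
from_file = beeline_command(URL, script="myquery.hql")

# shlex.join quotes the query so the line can be pasted into a shell.
print(shlex.join(inline))
```

Passing the list to `subprocess.run(inline, check=True)` would execute it on a machine where `beeline` is installed. Keeping the arguments as a list avoids shell-quoting problems with the query text.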

Question 13

Q

Hue
____ based user interface
Can be used to
  - Query data - ____
  - View and manage the metastore - _____

Answer

A

- Web-based user interface
- Can be used to
  - Query data - Hive Query Editor
  - View and manage the metastore - Metastore Manager

Question 14

Q

Hive Query Editor in Hue

Answer

A

See the screenshot in the slides

Question 15

Q

Hive Execution Engines
-____ is the ____ for Hive
-MapReduce will create ____ and ____ output files in ____
-____ is the ____ supporting Hive using a Directed Acyclic Graph
-___ allows writing to the local filesystem or ____

Answer

A

- MapReduce is the default execution engine for Hive
- MapReduce will create processed and saved output files in
HDFS or Parquet
- Spark is the newer execution engine supporting Hive using a
Directed Acyclic Graph (DAG)
- Spark allows writing to the local filesystem or maintaining data
in memory

Question 16

Q

Hive Metastore
-____ for query executions
-Hive does not store ___ like a relational database
-Hive creates _____ and the sequence of table ___ that correspond to the character values in the _____
-_____ for the same file can be stored in the
Hive Metastore
-Hive table definitions are ____ to provide structure _____

Answer

A

- Hive Metastore saves table definitions for query executions
- Hive does not store hard tables like a relational database
- Hive creates table definitions and the sequence of table
columns that correspond to the character values in the
unstructured data file
- Multiple table definitions for the same file can be stored in the
Hive Metastore
- Hive table definitions are used on the fly to provide structure
during query execution

Question 17

Q

Schema on Read
-____ and data value organization when a query is executed
-____ do not exist outside of the query
-___ serving different needs from the same data file
-Approach provides ____ as the table structure is not applied when saving the data file
-More flexible option when the schema is not known at the _____

Answer

A

- Defines table columns and data value organization when a
query is executed
- Table structures do not exist outside of the query
- Multiple table schemas can exist serving different needs from
the same data file
- Approach provides faster data loading as the table structure is
not applied when saving the data file
- More flexible option when the schema is not known at the
time data is saved
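
The idea above can be sketched in plain Python. This illustrates the concept only and is not how Hive is implemented: the sample data, the two schema lists, and the helper `read_with_schema` are all made up for the example.

```python
import csv
import io

# Raw file contents, saved with no schema attached (made-up sample data).
RAW = "1,alice,2024-01-05,42.50\n2,bob,2024-01-06,17.00\n"

# Two hypothetical "table definitions" over the same file. Each is a list
# of (column name, converter) pairs that is applied only when a query runs.
ORDERS = [("id", int), ("customer", str), ("day", str), ("amount", float)]
RAW_TEXT = [("id", str), ("customer", str), ("day", str), ("amount", str)]


def read_with_schema(raw, schema):
    """Yield one dict per line, applying the schema at read time."""
    for fields in csv.reader(io.StringIO(raw)):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


# The same bytes serve two different needs; nothing was converted on save.
typed = list(read_with_schema(RAW, ORDERS))
untyped = list(read_with_schema(RAW, RAW_TEXT))
total = sum(row["amount"] for row in typed)
```

Saving `RAW` costs nothing beyond writing the bytes, which is why loading is fast. The price is paid on every read, when each query has to parse and convert the values again.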

Question 18

Q

Embedded Metastore
Central ___ for Hive metadata
Service uses the ___ as the Hive service
Derby is a ____ backed by the ___
Allows one ___ at a time

Answer

A

- Central repository for Hive metadata
- Service uses the same JVM as the Hive service
- Derby is a tiny database instance backed by the local disk
- Allows one Hive session at a time

Question 19

Q

Local Metastore
Central ___ for Hive metadata
Service uses the same __ as the ___ service
Connects to the database using a ___
Uses a ___ compliant database instance
Allows multiple ____ at a time

Answer

A

- Central repository for Hive metadata
- Service uses the same JVM as the Hive service
- Connects to the database using a separate service
- Uses a JDBC compliant database instance
- Allows multiple Hive sessions at a time

Question 20

Q

Remote Metastore
Central repository for _____
One or more ____ servers running separate from the Hive service
Connects to the ____ using a separate service
Uses a ____ compliant database instance with firewall
Allows ___ Hive sessions that do not need ____

Answer

A

- Central repository for Hive metadata
- One or more metastore servers running separate from the Hive service
- Connects to the database using a separate service
- Uses a JDBC compliant database instance with firewall
- Allows multiple Hive sessions that do not need DB credentials

Question 21

Q

Hive Tables
Two types of Hive tables ___ and ___
____ - internal to Hive stored in the standard directory
If usage is within ___ only then ___ should be used
____ - where Hive does not manage the data
If the ____ file is shared between applications, then ___ should be used

Answer

A

- Two types of Hive tables: Managed and External
- Managed Tables - internal to Hive stored in the standard
directory
- If usage is within Hive only then managed should be used
- External Tables - where Hive does not manage the data
- If the HDFS file is shared between applications, then external
should be used
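
For a rough sense of how the two table types differ in practice, the sketch below holds example HiveQL DDL in Python strings and encodes the rule from this card. The table names, columns, HDFS path, and the helper `choose_table_kind` are hypothetical.

```python
# HiveQL DDL kept as plain strings. Table names, columns, and the HDFS
# path are hypothetical and only illustrate the shape of each statement.
MANAGED_DDL = """
CREATE TABLE users_managed (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
""".strip()

EXTERNAL_DDL = """
CREATE EXTERNAL TABLE users_external (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/shared/users'
""".strip()


def choose_table_kind(shared_with_other_apps):
    """Apply the rule from the card: shared HDFS files call for external."""
    return "external" if shared_with_other_apps else "managed"


kind = choose_table_kind(shared_with_other_apps=True)
ddl = EXTERNAL_DDL if kind == "external" else MANAGED_DDL
```

The practical difference shows up on `DROP TABLE`: dropping a managed table also deletes its data files, while dropping an external table removes only the table definition and leaves the files in place for other applications.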

Question 22

Q