HIVE Flashcards
Hive
-It is a tool that provides ____ of data stored in HDFS or HBase
-___ running on top of MapReduce using ___
-Originally developed at ___ and is now the largest analytical ____
It is a tool that provides SQL querying of data stored in HDFS or HBase
Higher-level application running on top of MapReduce using HiveQL
Originally developed at Facebook and is now the largest analytical open-source Apache project
Hive Design
-Accessed using a ____ called
-Transforms unstructured data into _____
-It is not a database, but it uses _____
-consists of a ___ and data in ___
-Allows for easy ___
Accessed using a SQL-Like language called HiveQL
Transforms unstructured data into structures like tables
It is not a database, but it uses database techniques to store data
Consists of a schema in Metastore and data in HDFS
Allows for easy ad-hoc queries
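
Study note: the "schema in Metastore, data in HDFS" split can be pictured with a small Python sketch. This is only an illustration, not how Hive is implemented: `METASTORE`, `RAW_FILE`, and both functions are hypothetical stand-ins, and the comma-delimited layout is an assumption.

```python
import csv
import io

# Hypothetical stand-in for the Metastore: table name -> (column, type) pairs.
METASTORE = {
    "users": [("id", int), ("name", str), ("age", int)],
}

# Hypothetical stand-in for a raw comma-delimited data file stored in HDFS.
RAW_FILE = "1,ana,34\n2,bo,27\n3,cy,41\n"


def read_table(table, raw_text):
    """Apply the stored schema to the raw text at read time."""
    schema = METASTORE[table]
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows


def select_names_older_than(rows, min_age):
    """Roughly: SELECT name FROM users WHERE age > min_age."""
    return [row["name"] for row in rows if row["age"] > min_age]


if __name__ == "__main__":
    users = read_table("users", RAW_FILE)
    print(select_names_older_than(users, 30))
```

The raw text never changes; only the schema decides how it is read, which is why an ad-hoc query needs no loading step in this sketch.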
Nature of Hive
___ is great for performing analysis
Excels at ___
Good for verifying the results of other ____
Commonly used with ____ problems
Hive is great for performing analysis
Excels at ad-hoc queries
Good for verifying the results of other Hadoop jobs
Commonly used with structured query problems
HIVE QL
-Combines the ___ standard with items from __ and ___
-More efficient than writing ___ or ___ directly
-___ lines of HiveQL ~ ____ lines of Java
-Queries are run in the _______ as well as the _______
-__ does not update existing files, ____ overwrites existing files with a new file
Combines the SQL-92 standard with items from MySQL and
Oracle’s DBMS
More efficient than writing MapReduce or Spark jobs directly
Five lines of HiveQL ~ 200 lines of Java
Queries are run in the Command-line shell (Beeline) as well as
the Hue Web UI (Hive Query Editor)
HDFS does not update existing files, HiveQL overwrites
existing files with a new file
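
Study note: the "overwrite with a new file" idea from the last point can be sketched in Python. The local filesystem stands in for HDFS here, and `overwrite_table_file` is a hypothetical helper, so treat this as an analogy rather than Hive's actual mechanism.

```python
import os
import tempfile


def overwrite_table_file(path, rows):
    """Replace the file at `path` with a freshly written file.

    The existing file is never edited in place: a new file is written
    next to it and then swapped in.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        for row in rows:
            tmp.write(",".join(str(value) for value in row) + "\n")
    os.replace(tmp_path, path)  # swap the new file in for the old one


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "users.csv")
    overwrite_table_file(demo_path, [(1, "ana")])
    overwrite_table_file(demo_path, [(1, "ana"), (2, "bo")])
    with open(demo_path) as handle:
        print(handle.read())
```

Adding one row means rewriting the whole file, which is one reason this model suits batch analysis better than row-by-row updates.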
Hive Features
-Brings _____ analysis to a broader audience
leveraging existing knowledge of SQL
-Provides a ____ for data lakes
-Ability to handle ____types (arrays, maps)
-Uses ___ as a bridge to other application toolsets like
Tableau, Power BI, and QlikView
Brings large-scale data analysis to a broader audience
leveraging existing knowledge of SQL
Provides a macro schema for data lakes
Ability to handle complex data types (arrays, maps)
Uses ODBC as a bridge to other application toolsets like
Tableau, Power BI, and QlikView
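
Study note: to make "complex data types (arrays, maps)" concrete, here is a Python sketch that parses one row of a delimited text table. It assumes Hive's default text delimiters (Ctrl-A between fields, Ctrl-B between collection items, Ctrl-C between map keys and values) and a hypothetical table with columns `name STRING, tags ARRAY<STRING>, scores MAP<STRING,INT>`.

```python
FIELD_SEP = "\x01"  # Ctrl-A, between columns
ITEM_SEP = "\x02"   # Ctrl-B, between array items or map entries
KEY_SEP = "\x03"    # Ctrl-C, between a map key and its value


def parse_row(line):
    """Parse one row of the hypothetical (name, tags, scores) table."""
    name, tags_raw, scores_raw = line.rstrip("\n").split(FIELD_SEP)
    tags = tags_raw.split(ITEM_SEP) if tags_raw else []
    scores = {}
    if scores_raw:
        for entry in scores_raw.split(ITEM_SEP):
            key, value = entry.split(KEY_SEP)
            scores[key] = int(value)
    return {"name": name, "tags": tags, "scores": scores}


if __name__ == "__main__":
    sample = "ana\x01admin\x02dev\x01math\x0390\x02art\x0385\n"
    print(parse_row(sample))
```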
Hive Approach
-Designed to organize ____ files into tables
-_____ to build structure into HDFS
-Table schemas and metadata are stored in a database called the
____
-Table schemas are connected to saved ___ data files
-Hive table definitions are ___ to the machine where they are
created
Designed to organize unstructured data files into tables
Uses database techniques to build structure into HDFS
Table schemas and metadata are stored in a database called the
metastore
Table schemas are connected to saved HDFS data files
Hive table definitions are local to the machine where they are
created
Hive Use Cases
-____ - treat a directory full of unstructured log files
as a table for query execution
-____ - ability to query and visualize data instantly
into dashboards used for reporting
-____ - analyze social media coverage as positive,
negative, or neutral
Log file analytics - treat a directory full of unstructured log files
as a table for query execution
Business intelligence - ability to query and visualize data instantly
into dashboards used for reporting
Sentiment analysis - analyze social media coverage as positive,
negative, or neutral
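
Study note: the log file analytics card ("treat a directory full of log files as a table") can be sketched in Python. The log line layout, the `*.log` naming, and both function names are assumptions made for illustration.

```python
import glob
import os
import re
from collections import Counter

# Hypothetical log line layout: "<ip> <method> <path> <status>"
LOG_LINE = re.compile(r"^(\S+) (\S+) (\S+) (\d{3})$")


def scan_log_table(directory):
    """Yield one dict per parseable line across every *.log file in the directory."""
    for path in sorted(glob.glob(os.path.join(directory, "*.log"))):
        with open(path) as handle:
            for line in handle:
                match = LOG_LINE.match(line.strip())
                if match:  # skip lines that do not fit the layout
                    ip, method, url, status = match.groups()
                    yield {"ip": ip, "method": method, "path": url, "status": int(status)}


def count_by_status(directory):
    """Roughly: SELECT status, COUNT(*) FROM logs GROUP BY status."""
    return Counter(row["status"] for row in scan_log_table(directory))
```

Dropping another `.log` file into the directory adds rows to the "table" without any other change, which is the point of the use case.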
Hive Architecture
There are ______ client approaches to engage Hive
There are several unique client approaches to engage Hive
Hive Clients
____ - provides clients for Python and Ruby
_____ - Beeline approach
— Setup a _____ and connect with a host and port
—Executing Hive in _____ uses the same ___ as the invoking application
—___ uses this approach
_____ - allows BI software to communicate with Hive
Thrift application - provides clients for Python and Ruby
JDBC driver - Beeline approach
Set up a Hive server and connect with a host and port
Executing Hive in embedded mode uses the same JVM as the
invoking application
Beeline uses this approach
ODBC driver - allows business intelligence software to
communicate with Hive
Beeline
You can execute ___ statements in the ____
Beeline is an ______ based on the ___ utility
Start Beeline by specifying the URL for a ____
You can execute HiveQL statements in the Beeline shell
Beeline is an interactive shell based on the SQLLine utility
Start Beeline by specifying the URL for a Hive2 server
$ beeline -u jdbc:hive2://host:10000
-n username -p password
0: jdbc:hive2://localhost:10000>
Beeline Commands
Execute Beeline commands with ___
Some helpful commands
_____ -connect to a different Hive2 server
_____ -executes a query like myquery.hql
____-shows the full list of commands
____exit the shell
____- show added details of the queries
SQL commands are terminated with a __
Execute Beeline commands with a leading !
Some helpful commands
!connect url - connect to a different Hive2 server
!run - executes a query file such as myquery.hql
!help - shows the full list of commands
!exit - exit the shell
!verbose - show added details of queries
SQL commands are terminated with a semicolon (;)
Command Line Execution
Execute HiveQL directly from command line using ___
Execute files containing HiveQL code using __
Execute HiveQL directly from command line using -e
$ beeline -u … -e 'SELECT * FROM users'
Execute files containing HiveQL code using -f
$ beeline -u … -f myquery.hql
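
Study note: when a script needs to call Beeline, the `-u`, `-n`, `-p`, `-e`, and `-f` flags from these cards can be assembled in Python. `beeline_command` is a hypothetical helper and the URL in the demo is a placeholder, not a real server; the sketch only builds the command and does not run it.

```python
import shlex


def beeline_command(url, username=None, password=None, query=None, script=None):
    """Build an argument list for the beeline CLI from the flags on the cards."""
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    args = ["beeline", "-u", url]
    if username is not None:
        args += ["-n", username]
    if password is not None:
        args += ["-p", password]
    # -e takes an inline HiveQL string, -f takes a file of HiveQL code
    args += ["-e", query] if query is not None else ["-f", script]
    return args


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    inline = beeline_command("jdbc:hive2://example-host:10000", query="SELECT * FROM users")
    print(shlex.join(inline))  # quoted so it is safe to paste into a shell
```

Passing the list to `subprocess.run` avoids shell quoting problems entirely; `shlex.join` is only needed when the command has to be shown or pasted as one string.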
Hue
____-based user interface
Can be used to
Query data - ____
View and manage the metastore - _____
Web-based user interface
Can be used to
Query data - Hive Query Editor
View and manage the metastore - Metastore Manager
Hive Query Editor in Hue
Look at pic in slides
Hive Execution Engines
-____ is the ____ for Hive
-MapReduce will create ______ and ____ output files in ____
-____ is the ____ supporting Hive using a Directed Acyclic Graph
-___ allows writing to the local filesystem or ____
MapReduce is the default execution engine for Hive
MapReduce will create processed and saved output files in
HDFS or Parquet
Spark is the newer execution engine supporting Hive using a
directed acyclic graph (DAG)
Spark allows writing to the local filesystem or maintaining data
in memory
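
Study note: a directed acyclic graph just means each stage of a query runs only after the stages it depends on, with no cycles. The toy Python sketch below orders the stages of a hypothetical join-then-aggregate query; the stage names are invented and this is not Spark's scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical stages of a join-then-aggregate query; each stage maps to
# the set of stages whose output it needs.
STAGES = {
    "scan_users": set(),
    "scan_orders": set(),
    "join": {"scan_users", "scan_orders"},
    "aggregate": {"join"},
}


def execution_order(stages):
    """Return one valid order in which every stage runs after its inputs."""
    return list(TopologicalSorter(stages).static_order())


if __name__ == "__main__":
    print(execution_order(STAGES))
```

The two scan stages do not depend on each other, so an engine is free to run them in parallel before the join.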