Apache Hive Flashcards
What is Apache Hive?
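Apache Hive is a data warehouse tool built on top of Hadoop that provides a SQL-like query language, HiveQL.
What do query languages do?
Query languages let you state what you want in broad, high-level terms/instructions, which are then translated into procedural code. Hive, for example, turns queries into MapReduce jobs.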
What do procedural languages do?
Procedural languages spell out the specific steps of a task, but even procedural languages vary in terms of their “level” of abstraction.
What are the challenges with MapReduce programming?
1) Framing a problem conceptually as map and reduce steps is not natural
2) It requires knowledge of a programming language like Java, Python, C++, etc.
3) Writing and debugging the code takes time and effort
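To make these challenges concrete, here is a minimal sketch of a word count written in the MapReduce style, in plain Python rather than on Hadoop. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are illustrative assumptions, not part of any Hadoop or Hive API.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three stages in order on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


# Hypothetical sample input, just for illustration.
sample = ["hive makes sql easy", "sql on hadoop"]
counts = word_count(sample)
print(counts["sql"])
```

Even for a task this small, the problem has to be split into key/value pairs and separate stages by hand. In a SQL-style language the same question is usually a single GROUP BY query.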
Why is Hive so popular?
1) It treats data as tables and uses traditional SQL-style queries to analyze it
2) It was developed by Facebook
3) It is widely used in industry
4) It is a client tool
What do we need to load a Hive table?
1) The table itself (the shell, i.e. the column definitions)
2) The dataset (the actual column values)
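The two pieces can be seen separately in a small sketch. It uses Python's built-in sqlite3 module as a stand-in, not Hive itself; in Hive the corresponding steps would be a CREATE TABLE statement followed by LOAD DATA ... INTO TABLE. The table name users and the sample rows are made up for illustration.

```python
import csv
import io
import sqlite3

# The dataset: the actual column values (a hypothetical in-memory CSV file).
dataset = io.StringIO("1,alice\n2,bob\n3,carol\n")

conn = sqlite3.connect(":memory:")

# Step 1: the table itself, an empty shell that only defines the columns.
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
loaded_before = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Step 2: load the dataset into the shell.
rows = [(int(user_id), name) for user_id, name in csv.reader(dataset)]
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
loaded_after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(loaded_before, loaded_after)
```

Until the second step runs, the table exists but holds no rows.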
What do grouping or aggregate functions allow?
Grouping or aggregate functions allow you to perform calculations over groups of rows and get statistical information from your data, such as counts, sums, and averages.
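As a rough illustration, the sketch below runs a GROUP BY query with COUNT, SUM and AVG. It uses Python's built-in sqlite3 module as a stand-in for Hive, on the assumption that the query shape carries over to HiveQL; the sales table and its values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, just for illustration.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 30.0)],
)

# One output row per region, with three aggregates computed over each group.
query = """
    SELECT region, COUNT(*) AS n, SUM(amount) AS total, AVG(amount) AS mean
    FROM sales
    GROUP BY region
    ORDER BY region
"""
rows = conn.execute(query).fetchall()

for region, n, total, mean in rows:
    print(region, n, total, mean)
```

The three input rows collapse into two result rows, one per region.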
What are the two types of tables in Hive?
Managed and External.
What is a Managed Table?
With a managed table, Hive has full control over the table's data.
When the table is dropped, so is its data.
What is an External Table?
With an external table, Hive does not have full control over the dataset.
When you drop the table, only the table definition is removed; the dataset is not deleted from HDFS.
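The difference in drop behavior can be modeled with a toy sketch. This is not Hive code: ToyMetastore is a hypothetical class invented here, local temporary directories stand in for HDFS locations, and the table names are made up. It only mirrors the rule stated on the cards.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical stand-in for a table catalog; not a real Hive API."""

    def __init__(self):
        self.tables = {}  # table name -> (data directory, is_external)

    def create_table(self, name, data_dir, external):
        self.tables[name] = (data_dir, external)

    def drop_table(self, name):
        data_dir, external = self.tables.pop(name)  # the definition always goes
        if not external:
            shutil.rmtree(data_dir)  # managed: the data goes with the table


def make_dataset(root, name):
    """Create a directory holding one small data file and return its path."""
    path = os.path.join(root, name)
    os.makedirs(path)
    with open(os.path.join(path, "part-0.csv"), "w") as f:
        f.write("1,alice\n2,bob\n")
    return path


root = tempfile.mkdtemp()
store = ToyMetastore()

managed_dir = make_dataset(root, "managed_users")
external_dir = make_dataset(root, "external_users")
store.create_table("managed_users", managed_dir, external=False)
store.create_table("external_users", external_dir, external=True)

store.drop_table("managed_users")
store.drop_table("external_users")

managed_exists = os.path.exists(managed_dir)    # data removed with the table
external_exists = os.path.exists(external_dir)  # data left in place
print(managed_exists, external_exists)

shutil.rmtree(root)  # clean up the temporary directory
```

Both table definitions are gone after the drops, but only the external table's files survive.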