Review Quiz - SAS and Hadoop Flashcards
What role does the NameNode play in the Hadoop cluster?
Select one:
a. provides naming conventions used to store data
b. provides developer access to the Hadoop cluster
c. stores metadata about where the data is stored
d. provides table names to SQL queries
c. stores metadata about where the data is stored
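The NameNode's job can be pictured as two lookup tables. The first maps each file to its ordered list of blocks. The second maps each block to the DataNodes that hold a replica. The sketch below is a hypothetical in-memory Python model, not the HDFS API. The class name, file path, block IDs, and DataNode names are all made up for illustration.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model of NameNode metadata: the NameNode keeps the
# namespace (file -> ordered block IDs) and block -> DataNode locations, but
# never the block contents, which live on the DataNodes.
@dataclass
class NameNodeMetadata:
    file_blocks: dict[str, list[str]] = field(default_factory=dict)
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def add_file(self, path: str, placements: dict[str, set[str]]) -> None:
        # `placements` maps each block ID (in file order) to the DataNodes holding a replica.
        self.file_blocks[path] = list(placements)
        for block_id, nodes in placements.items():
            self.block_locations[block_id] = set(nodes)

    def locate(self, path: str) -> list[tuple[str, set[str]]]:
        # A client asks the NameNode where each block is, then reads the
        # block data directly from one of the listed DataNodes.
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


nn = NameNodeMetadata()
nn.add_file("/data/sales.csv", {"blk_1": {"dn1", "dn2", "dn3"},
                                "blk_2": {"dn2", "dn3", "dn4"}})
print([block for block, _ in nn.locate("/data/sales.csv")])  # ['blk_1', 'blk_2']
```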
What is the purpose of the HDFS file system?
Select one:
a. be the access point for developers to use the Hadoop cluster
b. provide scalable and reliable data storage across the DataNodes
c. maintain metadata about the Hadoop ecosystem components
b. provide scalable and reliable data storage across the DataNodes
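HDFS gets its scalability from splitting files into fixed-size blocks spread across DataNodes. It gets its reliability from keeping several replicas of every block. By default, blocks are 128 MB in Hadoop 2 and later, with three replicas. The Python sketch below is illustrative only. Real HDFS uses a rack-aware placement policy, and the round-robin placement here is an assumption made for brevity. The function name and DataNode names are hypothetical, and sizes in the example are in MB.

```python
import itertools


def place_blocks(file_size: int, block_size: int, datanodes: list[str],
                 replication: int = 3) -> list[list[str]]:
    """Return, for each block of the file, the DataNodes that store a replica."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    num_blocks = -(-file_size // block_size)  # ceiling division: last block may be partial
    ring = itertools.cycle(range(len(datanodes)))
    placement = []
    for _ in range(num_blocks):
        start = next(ring)
        # Choose `replication` distinct DataNodes, starting at the next node in the ring.
        placement.append([datanodes[(start + i) % len(datanodes)]
                          for i in range(replication)])
    return placement


blocks = place_blocks(file_size=300, block_size=128,
                      datanodes=["dn1", "dn2", "dn3", "dn4"])
print(blocks)  # [['dn1', 'dn2', 'dn3'], ['dn2', 'dn3', 'dn4'], ['dn3', 'dn4', 'dn1']]
```

Because every block lives on three different DataNodes, losing any single DataNode still leaves at least two copies of each block.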
What is the purpose of MapReduce in the Hadoop ecosystem?
Select one:
a. perform distributed data processing across the DataNodes
b. manage cluster utilization for jobs and applications
c. provide browser access to the Hadoop jobs
a. perform distributed data processing across the DataNodes
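The MapReduce model has three stages. The map stage emits key/value pairs from each input split, the shuffle stage groups the values by key, and the reduce stage aggregates each group. The single-process Python simulation below shows that flow with the classic word count. It is a sketch of the programming model, not the Hadoop MapReduce API. Each string stands in for an input split that would be processed on a different DataNode.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    # Map: emit (word, 1) for every word in the input split.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: aggregate the grouped values for one key.
    return key, sum(values)


splits = ["big data on hadoop", "hadoop stores big files"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["hadoop"], counts["big"])  # 2 2
```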
Which Hive component contains table definitions that point to HDFS files?
Select one:
a. Hive Client
b. Hive Server
c. Hive Metastore
d. Hive Driver
c. Hive Metastore
Examine the following Hive database structures.
Which of the following database operations will execute successfully in this Hive structure?
Select one:
a. create database student;
b. drop database student;
c. create table default.TableB (col1 int);
d. use student; create table dihdm.TableB (col1 int);
d. use student; create table dihdm.TableB (col1 int);
When implementing a data governance solution that leverages Hive tables, which table type would you consider the best?
Select one:
a. managed
b. external
c. temporary
d. hidden
b. external
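External tables suit governance because Hive does not own their data. DROP TABLE on a managed table deletes both the metastore definition and the data files. On an external table, it removes only the definition and leaves the files in place. The Python sketch below models only that DROP TABLE difference. The `metastore` and `hdfs` dictionaries, the function names, and the paths are hypothetical stand-ins, not Hive APIs.

```python
# Hypothetical stand-ins: `metastore` holds table definitions,
# `hdfs` maps file paths to their contents.
metastore: dict[str, dict] = {}
hdfs: dict[str, str] = {}


def create_table(name: str, location: str, external: bool) -> None:
    metastore[name] = {"location": location, "external": external}


def drop_table(name: str) -> None:
    table = metastore.pop(name)              # the definition is always removed
    if not table["external"]:
        hdfs.pop(table["location"], None)    # managed table: the data files go too


hdfs["/warehouse/sales"] = "managed rows"
hdfs["/landing/clicks"] = "external rows"
create_table("sales", "/warehouse/sales", external=False)
create_table("clicks", "/landing/clicks", external=True)
drop_table("sales")
drop_table("clicks")
print(sorted(hdfs))  # ['/landing/clicks']
```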
Which interface would you use to submit a Pig script from the Client Node command prompt?
Select one:
a. Beeline
b. Grunt
c. Beeswax
d. hdfs dfs
b. Grunt
The Grunt shell is the command-line interface for submitting Pig scripts.
Is this a valid multi-line comment?
Select one:
a. Yes
b. No
a. Yes
Starting each comment line with -- is valid, so a comment can span multiple lines this way. Pig Latin also supports /* */ block comments.
Will this Pig script execute?
Select one:
a. Yes
b. No
b. No
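PigStorage is a built-in function, and Pig Latin function names are case-sensitive, unlike keywords such as LOAD and USING. Any capitalization other than PigStorage causes the script to fail.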
Will this Pig script execute?
Select one:
a. Yes
b. No
a. Yes
Positional notation is valid here. Fields are numbered starting at $0, and $0 can hold an entire row that was read into a single field.
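A delimited loader such as PigStorage (whose default delimiter is a tab) numbers fields from $0. A line that contains no delimiter is therefore loaded whole into $0. The Python function below is a rough, hypothetical model of that numbering, not Pig code. The function name and the sample row are made up.

```python
def load_positional(line: str, delimiter: str = "\t") -> dict[str, str]:
    # Number the fields $0, $1, ... the way positional notation refers to them.
    return {f"${i}": value for i, value in enumerate(line.split(delimiter))}


print(load_positional("Austin,TX,78701"))       # {'$0': 'Austin,TX,78701'}
print(load_positional("Austin,TX,78701", ","))  # {'$0': 'Austin', '$1': 'TX', '$2': '78701'}
```

With the default tab delimiter, the comma-separated line has no delimiter, so the entire row ends up in $0.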
What is the result of dumping the B relation after you execute the following Pig script?
Select one:
a. (x,y),(x,z)
b. (x,y,x,z)
c. (x,y,z)
c. (x,y,z)
The FLATTEN operator removes the inside parentheses from the row.
The FLATTEN operator enables you to “un-nest” or “transpose” tuples or bags. For tuples, the FLATTEN operator substitutes the fields of a tuple in place of the tuple. For bags, FLATTEN creates new tuples.
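The two FLATTEN behaviors described above can be modeled in a few lines of Python. In this simplified sketch, not Pig's implementation, a Pig tuple is a Python tuple and a bag is a Python list of tuples. The function name `flatten_row` and the sample rows are hypothetical.

```python
def flatten_row(row: tuple, index: int) -> list[tuple]:
    """Apply FLATTEN to field `index` of `row` and return the output rows."""
    before, target, after = row[:index], row[index], row[index + 1:]
    if isinstance(target, tuple):
        # Tuple: its fields replace the tuple in place, giving one output row.
        return [before + target + after]
    if isinstance(target, list):
        # Bag: one new output row per tuple in the bag.
        return [before + inner + after for inner in target]
    return [row]  # a non-nested field is left unchanged


print(flatten_row(("x", ("y", "z")), 1))        # [('x', 'y', 'z')]
print(flatten_row(("x", [("y",), ("z",)]), 1))  # [('x', 'y'), ('x', 'z')]
```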
When using the SAMPLE keyword in a Pig program, is the data sample ever guaranteed to be the same data?
Select one:
a. Yes
b. No
b. No
The SAMPLE keyword returns a random sample of the data, so both the rows returned and their number can differ from one run to the next.
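Pig's `SAMPLE relation size;` takes a fraction between 0 and 1. One way to picture it is Bernoulli sampling, where each row is kept independently with that probability. The Python sketch below uses that technique, which is an assumption about the general idea rather than Pig's exact implementation. The function name is hypothetical. Its optional seeded generator exists only to make the sketch testable; without it, two runs over the same rows almost always keep different rows and even different counts.

```python
import random
from typing import Optional


def sample(rows: list, fraction: float, rng: Optional[random.Random] = None) -> list:
    # Keep each row independently with probability `fraction`.
    rng = rng or random.Random()  # unseeded by default: results vary run to run
    return [row for row in rows if rng.random() < fraction]


rows = list(range(1000))
run1 = sample(rows, 0.1)
run2 = sample(rows, 0.1)
print(len(run1), len(run2), run1 == run2)
```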
Which of the following statements are true? Select all that apply.
Select one or more:
a. The results of the SPLIT operator guarantee that every row is assigned a relation.
b. The results of the SPLIT operator guarantee that every row will reside in only one relation.
c. The results of the SPLIT operator do not have to be in two equal relations.
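d. The IF keyword is required when using the SPLIT operator.
c. The results of the SPLIT operator do not have to be in two equal relations.
d. The IF keyword is required when using the SPLIT operator.
Each required IF keyword is followed by a condition, and those conditions decide which relation each row is written to. A row can satisfy several conditions or none of them, so it may appear in more than one relation or in none, and the resulting relations need not contain equal numbers of rows.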
-- Rows with population above 50000 go to T1; rows with 50000 or fewer go to T2.
T = LOAD … ;
SPLIT T INTO T1 IF population > 50000,
             T2 IF population <= 50000;
STORE T1 INTO 'output/split1';
STORE T2 INTO 'output/split2';
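The Pig script above uses two complementary conditions. SPLIT does not require that, though: every row is tested against every condition. The Python model below, which is not Pig code, deliberately uses overlapping hypothetical conditions to show why options a and b are false. The `split` function, the town rows, and the second threshold are made up for illustration.

```python
def split(rows, conditions):
    # Test every row against every condition; a row may match several or none.
    outputs = {name: [] for name in conditions}
    for row in rows:
        for name, predicate in conditions.items():
            if predicate(row):
                outputs[name].append(row)
    return outputs


towns = [{"name": "A", "population": 80000},
         {"name": "B", "population": 20000},
         {"name": "C", "population": None}]   # unknown population
result = split(towns, {
    "T1": lambda r: r["population"] is not None and r["population"] > 50000,
    "T2": lambda r: r["population"] is not None and r["population"] > 10000,  # overlaps T1
})
print([r["name"] for r in result["T1"]], [r["name"] for r in result["T2"]])
# ['A'] ['A', 'B']   (row C lands in neither relation)
```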
Which specialized join works well if the entire relation can fit into memory?
Select one:
a. merge
b. bloom
c. skewed
d. replicated
d. replicated
Replicated joins require that each small (replicated) relation fit entirely into memory. If it does not, the join fails.
In Pig Latin, a replicated join is a special type of join that works well when one or more relations are small enough to fit into memory. The large relation is listed first, and the small relations follow it. If a small relation cannot fit into memory, the join fails and generates an error. Replicated joins support inner and left outer joins.
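In Pig, this join is written as `JOIN big BY key, small BY key USING 'replicated';`. The idea behind it is an in-memory hash join. The small relation is loaded into a hash table in every task, and the large relation is streamed past it, so no reduce phase is needed. The Python sketch below illustrates that technique and the two supported join types. It is not Pig's implementation, and the function name, relations, and field positions are hypothetical.

```python
from collections import defaultdict


def replicated_join(large, small, large_key, small_key, how="inner"):
    """Join tuples of `large` and `small` on the given field positions."""
    if how not in ("inner", "left"):
        raise ValueError("replicated joins support only inner and left outer joins")
    # Build phase: the entire small relation is held in memory as a hash table.
    index = defaultdict(list)
    for s in small:
        index[s[small_key]].append(s)
    pad = (None,) * (len(small[0]) if small else 0)  # nulls for unmatched left rows
    joined = []
    # Probe phase: the large relation is streamed past the table one row at a time.
    for row in large:
        matches = index.get(row[large_key], [])
        if matches:
            joined.extend(row + m for m in matches)
        elif how == "left":
            joined.append(row + pad)
    return joined


orders = [(1, "o-100"), (2, "o-101"), (3, "o-102")]  # large: (customer_id, order)
customers = [(1, "Ann"), (2, "Raj")]                 # small: (customer_id, name)
print(replicated_join(orders, customers, 0, 0))
# [(1, 'o-100', 1, 'Ann'), (2, 'o-101', 2, 'Raj')]
print(replicated_join(orders, customers, 0, 0, how="left"))
# [(1, 'o-100', 1, 'Ann'), (2, 'o-101', 2, 'Raj'), (3, 'o-102', None, None)]
```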
Which programming language is most extensively supported when writing Pig user-defined functions?
Select one:
a. Python
b. Ruby
c. Java
d. Groovy
c. Java
Java functions have the most extensive support and are very efficient because they are written in the same language as Pig.