Review Quiz - SAS and Hadoop Flashcards

1
Q

What role does the NameNode play in the Hadoop cluster?

Select one:
a. provides naming conventions used to store data
b. provides developer access to the Hadoop cluster
c. stores metadata about where the data is stored
d. provides table names to sql queries

A

c. stores metadata about where the data is stored

2
Q

What is the purpose of the HDFS file system?

Select one:
a. be the access point for developers to use the Hadoop cluster
b. provide scalable and reliable data storage across the DataNodes
c. maintain metadata about the Hadoop ecosystem components

A

b. provide scalable and reliable data storage across the DataNodes

3
Q

What is the purpose of MapReduce in the Hadoop ecosystem?

Select one:
a. perform distributed data processing across the DataNodes
b. manage cluster utilization for jobs and applications
c. provide browser access to the Hadoop jobs

A

a. perform distributed data processing across the DataNodes

4
Q

Which Hive component contains table definitions that point to HDFS files?

Select one:
a. Hive Client
b. Hive Server
c. Hive Metastore
d. Hive Driver

A

c. Hive Metastore

5
Q

Examine the following Hive database structures.

Which of the following database operations will execute successfully in this Hive structure?

Select one:
a. create database student;
b. drop database student;
c. create table default.TableB (col1 int);
d. use student; create table dihdm.TableB (col1 int);

A

d. use student; create table dihdm.TableB (col1 int);

6
Q

When implementing a data governance solution that leverages Hive tables, which table type would you consider the best?

Select one:
a. managed
b. external
c. temporary
d. hidden

A

b. external

7
Q

Which interface would you use to submit a Pig script from the Client Node command prompt?

Select one:
a. Beeline
b. Grunt
c. Beeswax
d. hdfs dfs

A

b. Grunt

The Grunt shell is the command-line interface for submitting Pig scripts.
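As an illustrative sketch (file names and schema are assumed, not from the card): running `pig` with no arguments opens the interactive Grunt shell, while `pig myscript.pig` submits a script in batch mode.

```pig
-- started with:  pig              (interactive Grunt shell)
-- batch mode:    pig myscript.pig
grunt> A = LOAD 'input/data' USING PigStorage(',') AS (name:chararray, qty:int);
grunt> DUMP A;
```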

8
Q

Is this a valid multi-line comment?

Select one:
a. Yes
b. No

A

a. Yes

The -- at the beginning of each comment line is valid for multi-line comments.
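The card's image is not reproduced in this export; as a hedged sketch, a comment of the style the answer describes looks like this:

```pig
-- each line beginning with a double hyphen is a comment,
-- so stacking them forms a valid multi-line comment
A = LOAD 'input/data';
```

Pig also accepts C-style `/* ... */` block comments for multi-line text.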

9
Q

Will this Pig script execute?

Select one:
a. Yes
b. No

A

b. No

Pig function names, such as PigStorage, are case-sensitive.

10
Q

Will this Pig script execute?

Select one:
a. Yes
b. No

A

a. Yes

Positional notation, starting at $0, is valid for reading an entire row into a single field.
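A minimal sketch of the technique (file name and loader assumed):

```pig
-- TextLoader reads each line of the file as one chararray field;
-- $0 is positional notation for that single field
A = LOAD 'input/log.txt' USING TextLoader();
B = FOREACH A GENERATE $0;
```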

11
Q

What is the result of dumping the B relation after you execute the following Pig script?

Select one:
a. (x,y),(x,z)
b. (x,y,x,z)
c. (x,y,z)

A

c. (x,y,z)

The FLATTEN operator removes the inside parentheses from the row.

The FLATTEN operator enables you to “un-nest” or “transpose” tuples or bags. For tuples, the FLATTEN operator substitutes the fields of a tuple in place of the tuple. For bags, FLATTEN creates new tuples.
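A sketch of the tuple case, consistent with the card's result (field names are assumed):

```pig
-- suppose A contains the single row: (x,(y,z))
A = LOAD 'input/data' AS (f1:chararray, t:tuple(f2:chararray, f3:chararray));
B = FOREACH A GENERATE f1, FLATTEN(t);
-- dumping B yields (x,y,z): the inner tuple's fields replace the tuple
```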

12
Q

When using the SAMPLE keyword in a Pig program, is the data sample ever guaranteed to be the same data?

Select one:
a. Yes
b. No

A

b. No

The SAMPLE keyword returns a random sample of data.
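For example (relation and sampling fraction assumed):

```pig
A = LOAD 'input/data' USING PigStorage(',');
-- keep roughly 10% of the rows; each run draws a different random subset
B = SAMPLE A 0.1;
```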

13
Q

Which of the following statements are true? Select all that apply.

Select one or more:
a. The results of the SPLIT operator guarantee that every row is assigned to a relation.
b. The results of the SPLIT operator guarantee that every row will reside in only one relation.
c. The results of the SPLIT operator do not have to be in two equal relations.
d. The IF keyword is required when using the SPLIT operator.

A

c. The results of the SPLIT operator do not have to be in two equal relations.
d. The IF keyword is required when using the SPLIT operator.

The expression used with the required IF keyword determines how rows are assigned to relations, so the resulting relations might not contain equal numbers of rows.

T = LOAD … ;
SPLIT T INTO T1 IF population > 50000,
             T2 IF population <= 50000;
STORE T1 INTO 'output/split1';
STORE T2 INTO 'output/split2';

14
Q

Which specialized join works well if the entire relation can fit into memory?

Select one:
a. merge
b. bloom
c. skewed
d. replicated

A

d. replicated

Replicated joins require that the smaller relation fit entirely into memory; if it does not, the join fails.

In Pig Latin, a replicated join is a specialized join that works well when the smaller relation (or relations) can fit into memory. If the smaller relation cannot fit into memory, the join fails and generates an error. Replicated joins support the inner and left outer join types.
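A sketch of the syntax (relation names and schemas assumed); in a replicated join, every relation listed after the first is loaded into memory:

```pig
big   = LOAD 'input/transactions' AS (id:int, amount:double);
small = LOAD 'input/lookup'       AS (id:int, label:chararray);
-- 'small', listed second, must fit into memory
C = JOIN big BY id, small BY id USING 'replicated';
```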

15
Q

Which programming language is most extensively supported when writing Pig user-defined functions?

Select one:
a. Python
b. Ruby
c. Java
d. Groovy

A

c. Java

Java functions have the most extensive support and are very efficient because they are written in the same language as Pig.

16
Q

What option do you need in your FILENAME statement to indicate that the fileref is reading from a directory in HDFS?

Select one:
a. dir
b. concat
c. all
d. folder

A

b. concat

The concat option is used for reading files in an HDFS directory. The dir option is used for writing to an HDFS directory.

filename in hadoop '/user/student/test_table' concat user='student';

17
Q

Does this DATA step read a single HDFS file or a concatenated directory?

Select one:
a. single HDFS file
b. concatenated directory

A

a. single HDFS file

This is a single HDFS file. The concat option was not used in the Hadoop FILENAME statement.

18
Q

Which HDFS command is not a command that you can submit using PROC HADOOP?

Select one:
a. MKDIR
b. CHMOD
c. DELETE
d. RMDIR

A

d. RMDIR

RMDIR is not a valid PROC HADOOP HDFS command. To remove a directory using PROC HADOOP, use the HDFS DELETE command.
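A hedged sketch (connection options are omitted because they are site-specific; the paths are assumed):

```sas
proc hadoop;
   hdfs mkdir='/user/student/newdir';
   /* there is no RMDIR command; DELETE removes a directory */
   hdfs delete='/user/student/newdir';
run;
```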

19
Q

Which Hive schema is SAS connected to in this code?

Select one:
a. DIHDM
b. DIAHD
c. DEFAULT
d. none

A

c. DEFAULT

When a schema is not included in the connection parameters, the schema called DEFAULT is used.
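For instance (server name and credentials are placeholders, not from the card):

```sas
/* no SCHEMA= option, so the connection uses the DEFAULT schema */
libname myhive hadoop server='hiveserver' user='student';

/* naming a schema explicitly would look like: */
libname dihdm hadoop server='hiveserver' user='student' schema=dihdm;
```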

20
Q

When the CREATE TABLE statement below is executed, which of the following is true?

Select one:
a. A table definition is stored in the Hive metadata.
b. A table definition is stored in the Hive metadata and a data file is created in HDFS.
c. A data file is stored in HDFS in /user/student/test.
d. Data is transferred from /user/student/test into the Hive database.

A

a. A table definition is stored in the Hive metadata.
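The CREATE TABLE statement itself is not reproduced in this export. A managed-table DDL of the general shape the answer implies (table and column names assumed) is:

```sql
-- no LOCATION clause and no data loaded: only the table definition
-- is written to the Hive metastore; no data file is created in HDFS
CREATE TABLE test_table (col1 INT, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```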

21
Q

When the CREATE TABLE statement below is executed, which of the following is true?

Select one:
a. The data must already exist in the HDFS directory /user/student/test.
b. The data must not yet exist in the HDFS directory /user/student/test.
c. The data can either exist already or be placed there later.

A

c. The data can either exist already or be placed there later.
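Again, the statement is not shown in this export; an external-table DDL consistent with the answer (path taken from the card's options, column name assumed) is:

```sql
-- an EXTERNAL table points at an HDFS directory; data files may already
-- exist there, or may be placed there after the table is defined
CREATE EXTERNAL TABLE test (col1 INT)
LOCATION '/user/student/test';
```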

22
Q

When using the SAS/ACCESS LIBNAME to Hadoop method to work with Hive tables, the DATA step and SAS procedures can be used.

Select one:
True
False

A

True

Although any DATA step or PROC can be used with the Hive libref, it is important to examine where the processing occurs, because that depends on how the code is written.
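For example (libref, server, and table names assumed):

```sas
libname myhive hadoop server='hiveserver' user='student';

/* PROC FREQ runs against the Hive libref, but whether the work is
   pushed into Hadoop or rows are pulled back to SAS depends on the code */
proc freq data=myhive.test_table;
   tables col1;
run;
```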

23
Q

This DS2 DATA program INIT method contains a SET statement that reads orion.banks. If the data set contains three observations, how many times is the SET statement executed?

Select one:
a. 0
b. 1
c. 3
d. cannot be determined

A

b. 1

The INIT system method automatically executes only once, when the DS2 DATA program first begins execution.
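A sketch of the behavior (table name from the card; the PUT statement is assumed for illustration):

```sas
proc ds2;
data _null_;
   method init();
      /* init() runs exactly once, so this SET statement executes
         once and reads only the first observation */
      set orion.banks;
      put _all_;
   end;
enddata;
run;
quit;
```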

24
Q

Will this program execute without producing an error?

Select one:
a. Yes
b. No

A

a. Yes

The SCOND=NONE option on the PROC DS2 statement overrides the DS2SCOND=ERROR system option.

25
Q

As long as the same program is used to process the same data, the results are always identical; it does not matter whether DS2 is operating in SAS mode or ANSI mode.

Select one:
True
False

A

False

Expressions involving NULL and MISSING values might evaluate differently depending on the DS2 mode.

26
Q

Which DS2 system method would be the best choice for each section of this DATA step?

Select one:
a. 1-INIT, 2-RUN, 3-TERM
b. 2-INIT, 3-RUN, 1-TERM
c. 3-INIT, 1-RUN, 2-TERM
d. cannot be determined

A

c. 3-INIT, 1-RUN, 2-TERM

27
Q

How many parameters are required for a method that calculates SUM(Amount,Amount*Rate) for a specified number of years?

Select one:
a. one
b. two
c. three
d. four
e. other

A

c. three

One parameter for each variable in the equation (Amount and Rate), plus one to specify the number of years.
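A hedged sketch of such a method (method name and test values are assumed):

```sas
proc ds2;
data _null_;
   method grow(double amount, double rate, int years) returns double;
      dcl double total;
      dcl int i;
      total = amount;
      do i = 1 to years;
         total = sum(total, total*rate);  /* the card's SUM(Amount,Amount*Rate) */
      end;
      return total;
   end;
   method init();
      dcl double result;
      result = grow(1000, 0.05, 3);
      put result=;
   end;
enddata;
run;
quit;
```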

28
Q

Will this DS2 DATA program compile and execute properly?

Select one:
a. Yes
b. No

A

b. No

Global DCL statements must be placed before any method definition.
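A sketch of the correct placement (variable and table names assumed):

```sas
proc ds2;
data _null_;
   /* global DCL statements must appear before any METHOD definition */
   dcl double total;
   method run();
      set orion.banks;
      total = sum(total, 1);
   end;
enddata;
run;
quit;
```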

29
Q

When you compound interest annually, three parameters are required. When you compound interest more frequently, four parameters are required. Can you create a single interest method that can work in both of these scenarios?

Select one:
a. Yes
b. No

A

b. No

DS2 method definitions must specify a fixed number of arguments.

30
Q

Which of the following statements about SAS Data Loader for Hadoop is true?

Select one:
a. The Copy Data to Hadoop Directive uses the Oozie workflow to call PROC SQL.
b. SAS data sets use PROC SQL code and HiveQL to copy data into Hadoop.
c. SAS library data sources enable you to specify how many parallel threads to execute.
d. Typically, the user configures the relational database in SAS Data Loader.

A

b. SAS data sets use PROC SQL code and HiveQL to copy data into Hadoop.

SAS Data Loader is designed to provide business users access to data in the Hadoop cluster, without requiring them to be programmers.

31
Q

Which of the following capabilities are provided by the Query or Join Data in Hadoop directive in SAS Data Loader for Hadoop? Select all that apply.

Select one or more:
a. read a single SAS table
b. query multiple tables after joining them
c. customize the generated PROC SQL code as needed
d. generate a table or view
e. remove duplicate rows
f. sort output data in ascending or descending order

A

a. read a single SAS table
b. query multiple tables after joining them
d. generate a table or view
e. remove duplicate rows
f. sort output data in ascending or descending order

32
Q

In SAS Data Loader for Hadoop, which of the following transforms are available in the Transform Data directive? Select all that apply.

Select one or more:
a. Manage Columns
b. DS2 Advanced Expression
c. Filter Data
d. Summarize Rows

A

a. Manage Columns
c. Filter Data
d. Summarize Rows

33
Q

Match the role that can be used with the SAS Data Loader for Hadoop Transpose Data in Hadoop directive with the description of the role.

Transpose columns >
Columns to Group By >
ID Columns >
Copy columns >

  1. Values from columns used as names for new columns
  2. Columns that you want to transpose to become rows in a target table
  3. Columns identified as distinct groups to be transposed separately
  4. Columns that are copied to the target table without being transposed
A

Transpose columns > 2. Columns that you want to transpose to become rows in a target table
Columns to Group By > 3. Columns identified as distinct groups to be transposed separately
ID Columns > 1. Values from columns used as names for new columns
Copy columns > 4. Columns that are copied to the target table without being transposed

34
Q

Which of the following column values would be good candidates for the Parse Data transformation in SAS Data Loader for Hadoop? Select all that apply.

Select one or more:
a. Mr. Robert A. Smith, Esq. (Name)
b. Gold Medal Flour (Product)
c. 104 North 61st Terrace (Address)
d. 1-800-555-1212 (Phone)
e. Ellicott City (City)

A

a. Mr. Robert A. Smith, Esq. (Name)
c. 104 North 61st Terrace (Address)
d. 1-800-555-1212 (Phone)

35
Q

Which of the following statements is true regarding match code sensitivity? Select all that apply.

Select one or more:
a. Sensitivity does not influence exactitude.
b. Higher sensitivity values specify that data values must be more similar to be declared a match.
c. Sensitivity spans from 50 to 95.
d. The amount of processing applied to the data does not vary with sensitivity.

A

b. Higher sensitivity values specify that data values must be more similar to be declared a match.
c. Sensitivity spans from 50 to 95.

36
Q

Which of the following are valid Pattern Analysis definitions? Select all that apply.

Select one or more:
a. Locale Identification
b. Word
c. Character
d. Word (Script Identification)

A

b. Word
c. Character
d. Word (Script Identification)

37
Q

Which of the following statements is not true regarding the HiveQL language?

Select one:
a. HiveQL queries are converted to MapReduce.
b. HiveQL supports most SQL features and clauses.
c. HiveQL is fully ANSI compliant.
d. HiveQL generates table metadata in the Hive metastore database.

A

c. HiveQL is fully ANSI compliant.