Combining and Shaping Data Flashcards
Which of the following statements about transactional processing and analytical processing is true?
Due to the sheer volume, variety, and the velocity at which big companies generate data, big data processing requires the setting up of a distributed cluster that has multiple machines.
Transactional processing involves analyzing large batches of data whereas analytical processing involves analyzing individual entries in a data set.
Transactional processing is performed in a traditional relational database management system (RDBMS) whereas analytical processing is performed in a data warehouse.
Although the objectives of transactional processing and analytical processing are completely different, both of these objectives can be achieved by the same database system even with huge volumes of data.
Transactional processing is performed in a traditional relational database management system (RDBMS) whereas analytical processing is performed in a data warehouse.
What is the practice of combining many disparate servers, each of limited capacity and running generic hardware, called?
Vertical scaling
Horizontal scaling
Online analytical processing (OLAP)
Data warehousing
Horizontal scaling
You are operating on a stream of timestamped data using a stream processing system, and you want to divide your input into fixed-size windows based on time intervals that overlap. Which window will you use?
Tumbling window
Sliding window
Count window
Global window
Sliding window
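The difference between overlapping (sliding) and non-overlapping (tumbling) windows can be sketched in plain Python. This is only an illustration, not the API of any particular stream processor: the function name `sliding_windows`, the parameters `size` and `slide`, and the sample events are all hypothetical, and timestamps are assumed to be non-negative integers.

```python
from collections import defaultdict

def sliding_windows(events, size, slide):
    """Assign each (timestamp, value) event to every window that covers it.

    Windows start at multiples of `slide` and span [start, start + size).
    With slide < size the windows overlap, so one event can land in several.
    """
    windows = defaultdict(list)
    for ts, value in events:
        # Latest window start at or before this timestamp.
        start = (ts // slide) * slide
        # Walk back through earlier starts whose window still covers ts.
        while start >= 0 and start + size > ts:
            windows[(start, start + size)].append(value)
            start -= slide
    return dict(sorted(windows.items()))

events = [(1, "a"), (4, "b"), (6, "c"), (11, "d")]

# Sliding: 10-unit windows that advance by 5, so neighbours overlap.
overlapping = sliding_windows(events, size=10, slide=5)

# Tumbling: the same helper with slide == size gives disjoint windows.
tumbling = sliding_windows(events, size=5, slide=5)
```

In the sliding case the event at timestamp 6 appears in both the (0, 10) and (5, 15) windows; in the tumbling case every event appears exactly once.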
Window operations can only be performed on what kinds of data?
Any kind of data
Data associated with timestamps
Streaming data
Batch data
Data associated with timestamps
Which of the following best describes the operation of a left outer join?
A. Each record in the right table will be present in the result, either with a matched record from the left table, or padded with null.
B. Each record in both the left and right tables will be present in the result, either with a matched record from the other table, or padded with null.
C. Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.
D. Each record in the left table will be present in the result, matched once with each record in the table on the right.
C. Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.
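A left outer join can be sketched with plain dictionaries. The helper `left_outer_join` and the sample tables are hypothetical and exist only for illustration; Python's `None` stands in for the database null.

```python
def left_outer_join(left, right, key):
    """Join two lists of dicts on `key`, keeping every left-hand record."""
    # Index right-hand records by key so each match is a dictionary lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    right_cols = {col for row in right for col in row if col != key}

    result = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            for match in matches:
                result.append({**row, **match})
        else:
            # No match: keep the left record and pad right columns with None.
            result.append({**row, **{col: None for col in right_cols}})
    return result

employees = [
    {"dept_id": 1, "name": "Ana"},
    {"dept_id": 2, "name": "Ben"},
    {"dept_id": 3, "name": "Cy"},
]
departments = [
    {"dept_id": 1, "dept": "Sales"},
    {"dept_id": 2, "dept": "Ops"},
]

joined = left_outer_join(employees, departments, "dept_id")
```

All three employees survive the join; the one with no matching department gets `dept` set to `None`.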
What best describes the operation of an inner join?
Each record in the left table will be present in the result, either with a matched record from the right table, or padded with null.
Each record in both the left and right tables will be present in the result, either with a matched record from the other table, or padded with null.
Each record in the tables that matches (joins) a record in the other table will be present in the result.
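As a rough illustration of inner-join behaviour, the sketch below keeps only the pairs of records whose keys match. The helper `inner_join` and the sample rows are hypothetical.

```python
def inner_join(left, right, key):
    """Return one merged record per (left, right) pair with equal keys."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

orders = [
    {"cust_id": 1, "item": "pen"},
    {"cust_id": 2, "item": "ink"},
    {"cust_id": 3, "item": "pad"},
]
customers = [
    {"cust_id": 1, "city": "Oslo"},
    {"cust_id": 2, "city": "Lima"},
    {"cust_id": 4, "city": "Pune"},
]

matched = inner_join(orders, customers, "cust_id")
```

Order 3 and customer 4 have no partner, so neither appears in the result and nothing is padded with null.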
The cross join of a table containing N rows with itself will contain how many rows?
N
NxN
2N
1
NxN
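The row count of a self cross join can be checked with `itertools.product`, which pairs every row with every row. The sample `rows` list is made up for illustration.

```python
from itertools import product

rows = ["r1", "r2", "r3"]  # a table with N = 3 rows

# Cross join of the table with itself: every row paired with every row.
cross = list(product(rows, repeat=2))

n = len(rows)
row_count = len(cross)  # N x N
```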
When would you choose to represent data in long format?
When the schema for your data is not defined up front and may change
When the schema, once defined, does not change
When you have dense data with a strict predefined schema
When your data is very small
When the schema for your data is not defined up front and may change
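The two layouts can be compared with a small sketch. Wide format has one column per attribute, so the columns are the schema; long format stores one (id, attribute, value) row per measurement, so a new attribute only adds rows. The helper `wide_to_long`, the column names, and the choice to drop missing values are all assumptions made for this example.

```python
def wide_to_long(rows, id_col):
    """Turn one-column-per-attribute records into (id, attribute, value) rows."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            # Missing values are simply omitted in long format (a design choice).
            if col != id_col and value is not None:
                long_rows.append(
                    {id_col: row[id_col], "attribute": col, "value": value}
                )
    return long_rows

wide = [
    {"id": 1, "height": 170, "weight": 65},
    {"id": 2, "height": 182, "weight": None},
]

long_rows = wide_to_long(wide, "id")
```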
Which of the following are valid techniques you might use to cope with the presence of outliers in the data set?
Set to mean and cap or floor outliers
Cap or floor outliers alone
Delete outlier values and cap or floor outliers
Delete outlier values, set to mean, and cap or floor outliers
Delete outlier values, set to mean, and cap or floor outliers
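The three techniques can be sketched side by side. The thresholds `low` and `high` are hypothetical and supplied by the caller; how outliers are detected is outside the scope of this sketch. "Set to mean" is interpreted here as the mean of the non-outlier values, which is one reasonable reading rather than the only one.

```python
from statistics import mean

def is_outlier(x, low, high):
    return x < low or x > high

def drop_outliers(values, low, high):
    """Delete outlier values entirely."""
    return [x for x in values if not is_outlier(x, low, high)]

def set_outliers_to_mean(values, low, high):
    """Replace each outlier with the mean of the remaining values."""
    center = mean(drop_outliers(values, low, high))
    return [center if is_outlier(x, low, high) else x for x in values]

def cap_and_floor(values, low, high):
    """Clamp values into [low, high]: floor at low, cap at high."""
    return [min(max(x, low), high) for x in values]

readings = [10, 12, 11, 13, 12, 95]  # 95 lies outside the assumed range 5..20

dropped = drop_outliers(readings, 5, 20)
averaged = set_outliers_to_mean(readings, 5, 20)
capped = cap_and_floor(readings, 5, 20)
```

Deleting shortens the data set, while the other two techniques keep its length and only change the offending value.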
Which of the following will yield a more balanced, albeit more biased, data set?
Oversampling of the least common label
Overfitting
Oversampling of the most common label
Undersampling of the least common label
Oversampling of the least common label
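A minimal sketch of random oversampling follows, assuming a two-class data set. The helper `oversample_minority`, the labels, and the fixed seed are hypothetical. Minority rows are duplicated at random until the classes are the same size; the duplicates are the source of the added bias.

```python
import random
from collections import Counter

def oversample_minority(rows, label_of, seed=0):
    """Duplicate random minority-class rows until it matches the largest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(label_of(r) for r in rows)
    minority, minority_count = min(counts.items(), key=lambda kv: kv[1])
    target = max(counts.values())
    pool = [r for r in rows if label_of(r) == minority]
    # Sample with replacement from the minority rows.
    extra = [rng.choice(pool) for _ in range(target - minority_count)]
    return rows + extra

data = [("t%d" % i, "ok") for i in range(6)] + [("f0", "fraud"), ("f1", "fraud")]

balanced = oversample_minority(data, label_of=lambda r: r[1])
balanced_counts = Counter(label for _, label in balanced)
```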