Representing, Processing, and Preparing Data Flashcards

1
Q

You want fast prototyping without writing code. Which tool is a good choice for exploring and working with data?

  • AutoML
  • Python and Pandas
  • Spark
  • Excel Spreadsheet
A
  • Excel Spreadsheet
2
Q

What is a weakness of mean substitution as an imputation technique for missing data?

It reduces the strength of correlations that exist in the data.
It increases the strength of correlations that exist in the data.
It reduces bias in the data.
It increases bias in the data.

A

It reduces the strength of correlations that exist in the data.
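Here is a minimal pure-Python sketch of the effect. It uses made-up data where y = 2x exactly and three y values are missing. Mean substitution gives every gap the same y value, which sits off the line, so the correlation after imputation is weaker than the correlation on the complete cases. The `pearson` helper is written here for illustration and is not a library function.

```python
import math
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative data: y = 2x exactly, with three y values missing (None).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 4, None, 8, 10, None, 14, 16, None, 20]

# Correlation computed on complete cases only.
pairs = [(a, b) for a, b in zip(x, y) if b is not None]
r_complete = pearson([a for a, _ in pairs], [b for _, b in pairs])

# Mean substitution: every gap gets the mean of the observed y values.
y_mean = mean(b for b in y if b is not None)
y_imputed = [y_mean if b is None else b for b in y]
r_imputed = pearson(x, y_imputed)

print(f"complete cases: r = {r_complete:.3f}")
print(f"mean-imputed:   r = {r_imputed:.3f}")
```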

3
Q

What is standardization applied to?

  • Rows in a data set
  • Individual features
  • A feature vector
  • A three-dimensional matrix
A
  • Individual features
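Standardization rescales each feature (column) independently to mean 0 and standard deviation 1, using z = (x - mean) / std. The sketch below uses hypothetical height/weight data. It also uses the population standard deviation (`pstdev`), which matches what scikit-learn's `StandardScaler` uses. The helper name `standardize_columns` is illustrative.

```python
from statistics import mean, pstdev

# Illustrative data set: each row is a sample, each column a feature
# (e.g. height in cm, weight in kg).
rows = [
    [170.0, 65.0],
    [160.0, 55.0],
    [180.0, 80.0],
    [175.0, 70.0],
]

def standardize_columns(rows):
    """Return z-scores computed separately for each feature (column)."""
    columns = list(zip(*rows))                      # transpose: one tuple per feature
    stats = [(mean(c), pstdev(c)) for c in columns]  # per-feature mean and std
    return [
        [(value - mu) / sigma for value, (mu, sigma) in zip(row, stats)]
        for row in rows
    ]

z = standardize_columns(rows)
for row in z:
    print([round(v, 3) for v in row])
```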
4
Q

Which scaler subtracts the median from each data point?

  • RobustScaler
  • MaxAbsScaler
  • MinMaxScaler
  • StandardScaler
A

RobustScaler
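Robust scaling subtracts the median and divides by the interquartile range (IQR). Because of this, a single extreme value barely moves the center or the scale. scikit-learn's `RobustScaler` does this with its default settings. The sketch below reproduces the idea with the standard library only. `robust_scale` is a hypothetical helper name. `method="inclusive"` is one quartile convention (linear interpolation); other conventions give slightly different quartiles.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Center on the median and divide by the IQR."""
    med = median(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - med) / iqr for v in values]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]   # 100.0 is an outlier
scaled = robust_scale(data)
print([round(v, 3) for v in scaled])
```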

5
Q

Which of the following measures of dispersion is most robust (least vulnerable) to outliers?

Range
Inter-quartile range (IQR)
Median
Variance

A

Inter-quartile range (IQR)
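The sketch below compares three dispersion measures on made-up data, with and without one outlier. The outlier stretches the range and the variance but leaves the IQR unchanged, because the quartiles depend only on the middle of the sorted data. The median, listed among the options, is a measure of central tendency rather than dispersion. The helper `spread` is illustrative.

```python
from statistics import pvariance, quantiles

def spread(values):
    """Return (range, IQR, population variance) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1, pvariance(values)

clean = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 80]   # last value replaced by an outlier

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    rng, iqr, var = spread(data)
    print(f"{label:>12}: range={rng}, IQR={iqr}, variance={var:.2f}")
```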

6
Q

Which operation simplifies the calculation of cosine similarity?

Standardization
Box-Cox transformation
Power transformation
Normalization

A

Normalization
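Cosine similarity divides a dot product by the product of the two vector norms. If every vector is first normalized to unit L2 length, that denominator becomes 1 and cosine similarity reduces to a plain dot product. This is convenient when one query is compared against many stored vectors. The sketch below is minimal, and its function names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit Euclidean (L2) length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Full formula: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

# After normalization both norms are 1, so cosine similarity is just a dot product.
a_unit, b_unit = l2_normalize(a), l2_normalize(b)
print(cosine_similarity(a, b), dot(a_unit, b_unit))
```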

7
Q

Two vectors are oriented at 90 degrees to each other. What is their cosine similarity?

1
-1
90
0

A

0
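The following worked check uses two perpendicular vectors chosen for illustration. The dot product of orthogonal vectors is zero, so their cosine similarity is 0 regardless of their lengths.

```latex
\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert},
\qquad
\mathbf{a} = (1, 0),\ \mathbf{b} = (0, 1):\quad
\cos\theta = \frac{1\cdot 0 + 0\cdot 1}{1 \cdot 1} = 0 = \cos 90^\circ
```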

8
Q

What is the practice of combining many separate servers, each of limited capacity and running commodity hardware, called?

Vertical scaling

Horizontal scaling

Data warehousing

Online analytical processing (OLAP)

A

Horizontal scaling

9
Q

What are the two main categories of statistical tools available to a data analyst?

Descriptive statistics and inferential statistics

Alternating statistics and data statistics

Inferential statistics and data statistics

Alternating statistics and descriptive statistics

A

Descriptive statistics and inferential statistics
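The sketch below illustrates the distinction on made-up sample data. Descriptive statistics summarize the sample itself. Inferential statistics use the sample to say something about a larger population; here that is a normal-approximation 95% confidence interval for the mean. With only 8 observations, a t-based interval would be more appropriate, so treat this as an illustration of the idea only.

```python
import math
from statistics import NormalDist, mean, stdev

# Illustrative sample, e.g. session lengths in minutes (made-up numbers).
sample = [12.0, 15.5, 9.0, 14.0, 11.5, 13.0, 10.5, 16.0]

# Descriptive statistics: summarize the data actually collected.
m = mean(sample)
s = stdev(sample)  # sample standard deviation (n - 1 denominator)
print(f"mean={m:.2f}, sd={s:.2f}, min={min(sample)}, max={max(sample)}")

# Inferential statistics: estimate the population mean from the sample
# using a normal-approximation 95% confidence interval.
z = NormalDist().inv_cdf(0.975)
half_width = z * s / math.sqrt(len(sample))
print(f"95% CI for the population mean: ({m - half_width:.2f}, {m + half_width:.2f})")
```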

10
Q

Which of the following is not a valid imputation technique for dealing with missing data?

Fill in the mean of the data set.

Fill in values from within the range.

Interpolate values using a model.

Last observation carried forward.

A

Fill in values from within the range.
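Among the valid options, last observation carried forward (LOCF) is the one tied to ordered data such as time series: each gap takes the most recent observed value. The sketch below uses a hypothetical `locf` helper. For DataFrames, pandas offers the same forward-fill behavior through `DataFrame.ffill()`.

```python
def locf(values):
    """Last observation carried forward: replace each None with the most
    recent non-missing value. Leading gaps stay None (nothing to carry)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v          # remember the latest observed value
        filled.append(last)   # gaps reuse it; observed values pass through
    return filled

readings = [None, 20.5, None, None, 21.0, None]
print(locf(readings))
```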
