Quiz #1 Flashcards

Exam Prep

1
Q

Errors due to _____ are errors made as a result of choosing a learning algorithm that is not well suited for the data or problem.

A. Bias
B. Variance
C. Sampling
D. Noise

A

A. Bias

2
Q

Which of these is a regression problem?

A. Can I determine a person’s income based on their age and type of job?
B. Which states have the highest infant mortality rate?
C. How can I group supermarket products using purchase frequency?
D. Identify similarities in shopping patterns between customers of a department store.

A

A. Can I determine a person’s income based on their age and type of job?

3
Q

Clustering is a type of unsupervised learning.

True
False

A

True

4
Q

The _______ of a dataset represents the number of features in the dataset.

A. Resolution
B. Density
C. Dimensionality
D. Coarseness

A

C. Dimensionality

5
Q

As the complexity of a model increases, bias decreases but variance increases.

True
False

A

True

6
Q

Which of these terms is used to describe the degree to which data exists for each feature of all observations?

A. Density
B. Resolution
C. Dimensionality
D. Coarseness

A

A. Density
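Study note: the sketch below shows one way to put numbers on dimensionality and density. It assumes density is measured as the fraction of cells that hold a value, and the toy rows and column names are made up for illustration.

```python
# Toy dataset: each row is an observation, None marks a missing value.
# Column names and values are hypothetical.
rows = [
    {"age": 34, "income": 52000, "job": "nurse"},
    {"age": None, "income": 61000, "job": "engineer"},
    {"age": 45, "income": None, "job": None},
    {"age": 29, "income": 48000, "job": "teacher"},
]

features = list(rows[0].keys())

# Dimensionality: the number of features in the dataset.
dimensionality = len(features)

# Density (assumed definition): share of cells that are not missing.
total_cells = len(rows) * dimensionality
filled_cells = sum(1 for row in rows for f in features if row[f] is not None)
density = filled_cells / total_cells

print(dimensionality, density)
```

Here 9 of the 12 cells are filled, so the dataset is fairly dense; a value close to 0 would indicate a sparse dataset.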

7
Q

As part of the data transformation process, we sometimes have to discretize our data or create dummy variables. Which of these is a reason why we would need to do this?

A. It helps when trying to fix duplicate data.
B. This is an important step in balancing imbalanced datasets.
C. Some algorithms only work with either continuous or discrete variables.
D. This is an approach to normalize our data set.

A

C. Some algorithms only work with either continuous or discrete variables.
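Study note: a minimal sketch of both transformations, written in plain Python rather than with any particular library. The helper names `one_hot` and `discretize`, the bin edges, and the sample values are all hypothetical.

```python
def one_hot(values):
    """Return (categories, rows); each row has a 1 in the column of its category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def discretize(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value.

    `labels` has one more entry than `edges`; the last label catches
    everything above the final edge.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


# Dummy variables: a categorical feature becomes 0/1 columns.
jobs = ["nurse", "engineer", "nurse", "teacher"]
categories, dummies = one_hot(jobs)

# Discretization: a continuous feature becomes a small set of bins.
ages = [17, 34, 45, 70]
age_groups = [
    discretize(a, edges=[18, 40, 65],
               labels=["minor", "young adult", "middle-aged", "senior"])
    for a in ages
]

print(categories, dummies)
print(age_groups)
```

Dummy variables suit algorithms that expect numeric input, while binning suits algorithms that expect discrete input.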

8
Q

Which of these types of visualizations is best to use to explore the correlation between two continuous features?

A. Scatter plot
B. Sankey diagram
C. Histogram
D. Pie chart

A

A. Scatter plot

9
Q

In class we discussed 6 stages in the “Analytic Process”. Which of these is not one of those stages?

A. Data Exploration
B. Validation and Interpretation
C. Data Summarization
D. Modeling

A

C. Data Summarization

10
Q

A dataset with two class values that is significantly skewed (more than 90%) towards one of those class values is known as _______ dataset.

A. an inverted
B. a bimodal
C. an imbalanced
D. a skewed

A

C. an imbalanced
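Study note: a quick way to check for imbalance is to look at the class distribution. The sketch below uses the 90% threshold from the question; the labels and the helper name `class_distribution` are made up for illustration.

```python
from collections import Counter


def class_distribution(labels):
    """Return the share of observations that fall in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical two-class dataset: 95 legitimate transactions, 5 fraudulent.
labels = ["legit"] * 95 + ["fraud"] * 5

dist = class_distribution(labels)
majority_share = max(dist.values())

# Flag the dataset when one class holds more than 90% of the observations.
is_imbalanced = majority_share > 0.90

print(dist, is_imbalanced)
```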

11
Q

The method of imputation that fills in missing values using similar instances from the same dataset is known as _________ imputation.

A. Same-deck
B. Cold-deck
C. Hot-deck
D. Warm-deck

A

C. Hot-deck
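Study note: a minimal sketch of hot-deck imputation, assuming "similar" means closest on a single numeric feature. Real implementations usually match on several features; the function name `hot_deck_impute` and the sample records are hypothetical.

```python
def hot_deck_impute(rows, target, match_on):
    """Fill missing `target` values from the most similar complete row.

    Similarity is the absolute difference on the `match_on` feature.
    Assumes at least one row has a non-missing `target` value.
    """
    donors = [r for r in rows if r[target] is not None]
    imputed = []
    for row in rows:
        if row[target] is None:
            donor = min(donors, key=lambda d: abs(d[match_on] - row[match_on]))
            row = {**row, target: donor[target]}  # copy, leave the input untouched
        imputed.append(row)
    return imputed


people = [
    {"age": 25, "income": 40000},
    {"age": 47, "income": 82000},
    {"age": 27, "income": None},  # closest donor by age: the 25-year-old
    {"age": 50, "income": None},  # closest donor by age: the 47-year-old
]

filled = hot_deck_impute(people, target="income", match_on="age")
print(filled)
```

The donors come from the same dataset, which is what distinguishes hot-deck from cold-deck imputation, where the fill values come from an external source.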

12
Q

The goal of unsupervised learning is to predict future outcomes based on prior experience.

True
False

A

False

13
Q

Errors due to ________ are errors made as a result of not providing the learning algorithm with the right amount or type of training data.

A. Bias
B. Sampling
C. Randomness
D. Variance

A

D. Variance

14
Q

The attribute or feature that you are trying to predict, which is described by the other features within an instance, is known as the ________.

A. Instance
B. Feature
C. Class
D. Dependent variable

A

C. Class

15
Q

Color, shape, angle and number of edges are examples of nominal (or discrete) features.

True
False

A

False

16
Q

Which of these data transformation approaches results in a data set with the mean located at zero?

A. z-score normalization
B. min-max normalization
C. decimal scaling
D. mean-max normalization

A

A. z-score normalization
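Study note: the sketch below contrasts z-score and min-max normalization on a small made-up list. It assumes the population standard deviation is used for the z-score; the function names are illustrative.

```python
from statistics import mean, pstdev


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


data = [10, 20, 30, 40, 50]

z = z_score(data)
mm = min_max(data)

print(mean(z))   # centered at zero, up to floating-point error
print(mm)
```

Only the z-score result is centered at zero. Min-max output lies in the range 0 to 1, so its mean is generally not zero.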

17
Q

Missing values can have meaning.

True
False

A

True

18
Q

The primary difference between classification and regression is that classification is used to predict ________ values, while regression is used to predict ________ values.

A. continuous, discrete
B. discrete, continuous
C. nominal, binomial
D. ordinal, nominal

A

B. discrete, continuous

19
Q

The random sampling method that tries to maintain the same class distribution as the original dataset is known as ________.

A. Systematic random sampling
B. Purposeful random sampling
C. Stratified random sampling
D. Random sampling with replacement

A

C. Stratified random sampling
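Study note: a minimal sketch of stratified random sampling using only the standard library. The function name `stratified_sample`, the `label` field, and the 20/80 class split are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, fraction, seed=0):
    """Sample the same fraction from every class so proportions are preserved."""
    rng = random.Random(seed)  # fixed seed keeps the example repeatable

    # Group the rows by class value (one stratum per class).
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Draw without replacement from each stratum.
    sample = []
    for members in by_class.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical dataset: 20 "yes" rows and 80 "no" rows.
rows = [{"id": i, "label": "yes" if i < 20 else "no"} for i in range(100)]

sample = stratified_sample(rows, "label", fraction=0.2)
yes_count = sum(1 for r in sample if r["label"] == "yes")

print(len(sample), yes_count)
```

The sample keeps the original 20/80 split, which simple random sampling does not guarantee on small samples.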

20
Q

According to the formal definition of machine learning, “A computer program is said to learn from _______ with respect to some class of _______ and performance measure P, if its performance at ________, as measured by P, improves with __________”.

A. experience (E), test (T), test (T), experience (E)
B. tasks (T), experience (E), tasks (T), experience (E)
C. exposure (E), tasks (T), tasks (T), exposure (E)
D. experience (E), tasks (T), tasks (T), experience (E)

A

D. experience (E), tasks (T), tasks (T), experience (E)