Quiz #1 Flashcards

Question 1

Q

Errors due to _____ are errors made as a result of choosing a learning algorithm that is not well suited for the data or problem.

A. Bias
B. Variance
C. Sampling
D. Noise

Question 2

Q

Which of these is a regression problem?

A. Can I determine a person’s income based on their age and type of job?
B. Which states have the highest infant mortality rate?
C. How can I group supermarket products using purchase frequency?
D. Identify similarities in shopping patterns between customers of a department store.

Answer

A

A. Can I determine a person’s income based on their age and type of job?

Question 3

Q

Clustering is a type of unsupervised learning.

True
False

Question 4

Q

The _______ of a dataset represents the number of features in the dataset.

A. Resolution
B. Density
C. Dimensionality
D. Coarseness

Answer

A

C. Dimensionality

Question 5

Q

As the complexity of a model increases, bias decreases but variance increases.

True
False

Question 6

Q

Which of these terms is used to describe the degree to which data exists for each feature of all observations?

A. Density
B. Resolution
C. Dimensionality
D. Coarseness

Answer

A

A. Density

Question 7

Q

As part of the data transformation process, we sometimes have to discretize our data or create dummy variables. Which of these is a reason why we would need to do this?

A. It helps when trying to fix duplicate data.
B. This is an important step in balancing imbalanced datasets.
C. Some algorithms only work with either continuous or discrete variables.
D. This is an approach to normalize our data set.

Answer

A

C. Some algorithms only work with either continuous or discrete variables.
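
A minimal sketch of both transformations, assuming plain Python lists. The function names, cut points, and labels are illustrative and not from the course material.

```python
def one_hot(values):
    """Create dummy variables: one 0/1 indicator per distinct category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def discretize(value, cut_points, labels):
    """Map a continuous value to a bin label.

    Assumes cut_points is sorted ascending and labels has exactly one
    more entry than cut_points (the last label covers everything else).
    """
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


# Hypothetical data for illustration only.
colors = ["red", "blue", "red"]
dummies = one_hot(colors)

ages = [25, 45, 70]
age_groups = [discretize(a, [30, 60], ["young", "middle", "senior"]) for a in ages]
```

Dummy variables let an algorithm that expects numeric input use a categorical feature, while discretizing lets an algorithm that expects discrete input use a continuous one.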

Question 8

Q

Which of these types of visualizations is best to use to explore the correlation between two continuous features?

A. Scatter plot
B. Sankey diagram
C. Histogram
D. Pie chart

Answer

A

A. Scatter plot

Question 9

Q

In class we discussed 6 stages in the “Analytic Process”. Which of these is not one of those stages?

A. Data Exploration
B. Validation and Interpretation
C. Data Summarization
D. Modeling

Answer

A

C. Data Summarization

Question 10

Q

A dataset with two class values that is significantly skewed (more than 90%) towards one of those class values is known as _______ dataset.

A. an inverted
B. a bimodal
C. an imbalanced
D. a skewed

Answer

A

C. an imbalanced

Question 11

Q

The method of imputation that fills in missing values using similar instances from the same dataset is known as _________ imputation.

A. Same-deck
B. Cold-deck
C. Hot-deck
D. Warm-deck

Answer

A

C. Hot-deck
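
A rough sketch of hot-deck imputation, assuming each record is a dict and that "similar" means closest on one numeric feature. The function name, field names, and values are hypothetical; real implementations often match on several features.

```python
def hot_deck_impute(rows, target, match_on):
    """Fill missing `target` values from the most similar complete row.

    Similarity here is the absolute difference on `match_on`, which is
    a simplifying assumption for this sketch.
    """
    donors = [r for r in rows if r[target] is not None]
    filled = []
    for row in rows:
        if row[target] is None:
            # Pick the donor from the same dataset that is closest on match_on.
            donor = min(donors, key=lambda d: abs(d[match_on] - row[match_on]))
            row = {**row, target: donor[target]}
        filled.append(row)
    return filled


# Made-up records for illustration only.
people = [
    {"age": 25, "income": 30000},
    {"age": 52, "income": 80000},
    {"age": 27, "income": None},
]
filled = hot_deck_impute(people, target="income", match_on="age")
```

The 27-year-old borrows the income of the 25-year-old, the nearest donor in the same dataset. Cold-deck imputation would take the value from an external dataset instead.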

Question 12

Q

The goal of unsupervised learning is to predict future outcomes based on prior experience.

True
False

Question 13

Q

Errors due to ________ are errors made as a result of not providing the learning algorithm with the right amount or type of training data.

A. Bias
B. Sampling
C. Randomness
D. Variance

Answer

A

D. Variance

Question 14

Q

The attribute or feature that you are trying to predict, which is described by the other features within an instance, is known as the ________.

A. Instance
B. Feature
C. Class
D. Dependent variable

Question 15

Q

Color, shape, angle and number of edges are examples of nominal (or discrete) features.

True
False

Question 16

Q

Which of these data transformation approaches results in a data set with the mean located at zero?

A. z-score normalization
B. min-max normalization
C. decimal scaling
D. mean-max normalization

Answer

A

A. z-score normalization
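
A minimal sketch of z-score normalization using only the standard library. The function name and sample values are illustrative, and the choice of population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_score_normalize(values):
    """Rescale values so they have mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed here)
    if sigma == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


# Made-up sample values for illustration only.
ages = [22, 35, 47, 51, 60]
scaled = z_score_normalize(ages)
```

Subtracting the mean from every value is what centers the result at zero; min-max normalization instead maps values into a fixed range such as 0 to 1.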

Question 17

Q

Missing values can have meaning.

True
False

Question 18

Q

The primary difference between classification and regression is that classification is used to predict _____ values, while regression is used to predict ______ values.

A. continuous, discrete
B. discrete, continuous
C. nominal, binomial
D. ordinal, nominal

Answer

A

B. discrete, continuous

Question 19

Q

The random sampling method that tries to maintain the same class distribution as the original dataset is known as ___________.

A. Systematic random sampling
B. Purposeful random sampling
C. Stratified random sampling
D. Random sampling with replacement

Answer

A

C. Stratified random sampling
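
A minimal sketch of stratified random sampling, assuming each row carries its class label. The function name, the `label_of` parameter, and the 90/10 dataset are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample the same fraction from every class so proportions are kept."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    sample = []
    for members in by_class.values():
        # Keep at least one row per class so small classes are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up imbalanced dataset: 90 "yes" rows and 10 "no" rows.
rows = [("yes", i) for i in range(90)] + [("no", i) for i in range(10)]
sample = stratified_sample(rows, label_of=lambda r: r[0], fraction=0.2)
counts = Counter(label for label, _ in sample)
```

The sample keeps the original 90/10 split (18 "yes" rows and 2 "no" rows), which simple random sampling does not guarantee.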

Question 20

Q

According to the formal definition of machine learning, “A computer program is said to learn from _______ with respect to some class of _______ and performance measure P, if its performance at ________, as measured by P, improves with __________”.

A. experience (E), test (T), test (T), experience (E)
B. tasks (T), experience (E), tasks (T), experience (E)
C. exposure (E), tasks (T), tasks (T), exposure (E)
D. experience (E), tasks (T), tasks (T), experience (E)

Answer

A

D. experience (E), tasks (T), tasks (T), experience (E)
