quizzes Flashcards

1
Q

Which of the following would be the least effective way to represent a color (e.g., “Pink”) in a dataset used in a predictive modeling task?
a) As a single numeric value of a color temperature scale (Kelvin)
b) As a one-hot nominal value
c) As an ordinal value based on its rank in an alphanumeric sorting of all colors
d) Based on three separate numerical values for Red, Green, and Blue (RGB)

A

As an ordinal value based on its rank in an alphanumeric sorting of all colors. There is no meaningful correlation between an alphabetical rank and any fundamental property of a color.

2
Q

Consider a dataset with the following structure:

city | state | date | temp |
Berk | CA | 01/25/18 | 11 |

Assuming we wanted to transform this dataset into a dataset with only the features of State, Month, Temperature, with State represented by the longitude and latitude of the State’s capital, Month represented by a one-hot, and temperature left as a numeric: How many total features (columns) would be in this dataset?

A

15
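The count can be checked with a quick tally; the 12-column one-hot for Month dominates (variable names below are illustrative):

```python
# Tally of the transformed schema, assuming a 12-column one-hot for Month.
state_features = 2   # latitude and longitude of the state capital
month_features = 12  # one column per month (one-hot)
temp_features = 1    # temperature kept as a single numeric column

total = state_features + month_features + temp_features
print(total)  # 15
```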

3
Q

Sum of Squares Error (SSE) can be used with K-means clustering to:

(check all that apply)

K = number of clusters

n = number of data points being clustered

a) Choose a value of K based on the heuristic of the “elbow” method

b) Choose between different clusterings (for a fixed K) produced by starting with different random K-means centroids

c) Find the best K by choosing the K with the minimum SSE for values of K from 1 to n

A

a & b
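Both correct uses rely on computing the SSE of a finished clustering: for the elbow method you plot SSE against K and look for the bend, and for random restarts (fixed K) you keep the run with the lowest SSE. A minimal sketch of the SSE computation (function and data names are illustrative, not from the course material):

```python
# Hypothetical helper: SSE of a clustering, given the points, one centroid
# per cluster, and each point's cluster assignment.
def sse(points, centroids, assignment):
    total = 0.0
    for p, c in zip(points, assignment):
        # squared Euclidean distance from the point to its centroid
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[c]))
    return total

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centroids = [(0.05, 0.0), (5.05, 5.0)]    # K = 2
assignment = [0, 0, 1, 1]
print(sse(points, centroids, assignment))  # 0.01 -- tight clusters, low SSE
```

Option c is wrong because SSE always decreases as K grows; at K = n every point is its own centroid and SSE is 0, so minimizing SSE over all K just picks K = n.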

4
Q

What is the range of the silhouette score?

A

[-1, 1]

5
Q

How could a data point have a silhouette coefficient of 0?

A

If the data point is as close to points in its cluster as it is to points in the nearest cluster (not including its own)
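This follows directly from the silhouette formula s = (b − a) / max(a, b), where a is the mean distance to points in the same cluster and b is the mean distance to points in the nearest other cluster; when a = b the numerator vanishes:

```python
def silhouette(a, b):
    """Silhouette coefficient from a = mean intra-cluster distance
    and b = mean distance to the nearest other cluster."""
    return (b - a) / max(a, b)

print(silhouette(2.0, 2.0))  # 0.0 -- equally close to both clusters
print(silhouette(1.0, 3.0))  # ~0.67 -- well separated from the other cluster
```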

6
Q

How many different assignments of data points to clusters are there given n data points and K clusters? Assume a data point can only belong to a single cluster.

A

K^n
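Each of the n points independently picks one of K labeled clusters, giving K^n assignments. Enumerating a tiny case with the standard library:

```python
from itertools import product

n, K = 4, 2
# Each assignment is a tuple giving the cluster label chosen by each point.
assignments = list(product(range(K), repeat=n))
print(len(assignments))  # 16, which equals K ** n
```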

7
Q

The plot below depicts data points for a dataset of 10 credit card seeking individuals, 6 of whom are considered to be a high credit risk and 4 of whom are considered to be a low credit risk.

What is the starting Gini impurity (index) of this dataset given credit risk as the target?

A

0.48
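Gini impurity is 1 minus the sum of squared class proportions; with 6 of 10 high risk and 4 of 10 low risk that is 1 − (0.6² + 0.4²) = 0.48:

```python
def gini(counts):
    """Gini impurity from class counts: 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([6, 4]))  # 0.48 -- 6 high-risk vs 4 low-risk individuals
print(gini([5, 5]))  # 0.5  -- a balanced 50/50 split, the maximum for 2 classes
```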

8
Q

If there were equal low credit risk as high credit risk individuals, what would the Gini impurity be of the dataset without any splits?

A

0.5

9
Q

If you were creating a decision tree based on this dataset using the C4.5 or CART algorithm, the first step would be to choose an attribute and split point that best partitioned the data points by the target value.

According to the credit risk plot, which attribute and split point would be the best choice among the following options?

A

Age with a split point of 35

10
Q

Given enough depth (splits), a decision tree can successfully classify any training dataset with 100% accuracy.

A

False. If two training examples have identical feature values but different labels, no sequence of splits can separate them, so 100% training accuracy is not always achievable.

11
Q

Assume you are building an image classification neural network to predict an image as either a dog, cat, or turtle. The images are 32x32 pixels and serialized into a vector of 1024 features per image. Assume there is only one hidden layer between the input and output layer. The hidden layer has 10 neurons (nodes). Ignoring bias terms, what is the total number of weights for this network?

A

10,270
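The total is the weights into the hidden layer plus the weights into the output layer:

```python
inputs, hidden, outputs = 1024, 10, 3  # 32*32 pixels, 10 hidden neurons, 3 classes
weights = inputs * hidden + hidden * outputs  # 10240 + 30
print(weights)  # 10270
```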

12
Q

Using a sigmoid as the activation function in the output layer for binary classification, what output value produced by the sigmoid would denote the highest uncertainty for a class prediction?

A

+0.5

13
Q

What input value into the sigmoid function would produce the highest uncertainty output value?

A

0
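The sigmoid maps 0 to exactly 0.5, the midpoint between the two classes:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))  # 0.5 -- maximum uncertainty between the two classes
```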

14
Q

A binary classifier needs to predict the question: “Does the patient have lung cancer?” The table below shows a validation dataset labels and predictions. Compute the precision of these predictions:

Sample | Actual | Predicted |
1 | Normal | Cancer |
2 | Cancer | Cancer |
3 | Cancer | Cancer |
4 | Normal | Normal |
5 | Cancer | Normal |
Assume “Cancer” represents the positive class, and “Normal” represents the negative class.
Please round your answer to the 2nd decimal place.

[note: precision is a value between 0 and 1]

A

0.67
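Precision is TP / (TP + FP): samples 2 and 3 are true positives and sample 1 is a false positive, giving 2/3. Checking with a short script over the table:

```python
actual    = ["Normal", "Cancer", "Cancer", "Normal", "Cancer"]
predicted = ["Cancer", "Cancer", "Cancer", "Normal", "Normal"]

tp = sum(a == "Cancer" and p == "Cancer" for a, p in zip(actual, predicted))
fp = sum(a == "Normal" and p == "Cancer" for a, p in zip(actual, predicted))
precision = tp / (tp + fp)
print(round(precision, 2))  # 0.67
```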

15
Q

In which of the following prediction scenarios would it be appropriate to apply AUC as the metric?

A

When predicting a binary label with a probabilistic prediction

16
Q

What is the benefit of evaluating models using cross-validation instead of an arbitrary train/test split?

A

Cross-validation gives us greater confidence in how well the model generalizes than a single arbitrary train/test split, because every data point is used for both training and validation across the folds, reducing the chance that results reflect one lucky or unlucky split. It also supports model selection: we can compare model architectures and hyperparameters by their average performance across folds.

17
Q

Is the bias term considered a feature?

A

No

18
Q

Simple aggregation (also known as a simple combiner) differs from bagging in the following ways (check all that apply):
a) bagging requires bootstrapping and simple aggregation does not
b) bagging requires that the same algorithm be used for prediction/classification and simple aggregation does not
c) Simple aggregation uses averaging or majority vote and bagging does not
d) Simple aggregation does not require training a combining function but bagging does

A

a & b

19
Q

What type of ensemble technique uses bootstrapping but modifies the probability of sampling an instance based on how well it was predicted in previously trained models:

A

Boosting

20
Q

Which ensemble method does not allow for parallel training of the models in the ensemble:

A

Boosting