exam 2 Flashcards

1
Q

A qualitative variable with more than 2 levels is best modeled by which type of distribution?

A

multinomial

2
Q

A quantitative variable is best modeled by which type of distribution?

A

Gaussian

3
Q

A qualitative variable with 2 levels is best modeled by which distribution?

A

binomial

4
Q

The conjugate prior of a multinomial is which kind of distribution?

A

Dirichlet

5
Q

The conjugate prior of a binomial is which type of distribution?

A

Beta
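As a quick sketch of why this conjugacy is convenient: a Beta prior updated with binomial data stays in the Beta family. All numbers below are hypothetical.

```python
# Beta(a, b) prior + binomial data (k successes in n trials)
# -> posterior Beta(a + k, b + n - k), still a Beta distribution.
a, b = 2, 2              # hypothetical prior pseudo-counts
k, n = 7, 10             # hypothetical observed successes / trials
post_a, post_b = a + k, b + (n - k)   # posterior is Beta(9, 5)
```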

6
Q

Which of the following types of ML algorithms would typically be used to predict housing prices?

A

Regression

7
Q

Which of the following types of ML algorithms would typically be used to predict whether a student is likely to be admitted to university (yes/no)?

A

Classification

8
Q

In building a machine learning model with supervised learning, one column will be the _______________ column, and all others will be attributes used to predict that column.

A

Target

9
Q

Which of the following terms refers to a row in a data set? Check all that apply.

A

observation
instance
example

10
Q

Categorical data (aka qualitative data) can take on real number values.

A

False

11
Q

In regression, the target is quantitative data.

A

True

12
Q

What is the shape of the following array (as would be returned by the .shape attribute)?

arr1 = np.array([1, 2, 3])

A

(3,)
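A minimal check of the answer above:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
shape = arr1.shape   # a 1-D array of length 3 -> (3,)
```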

13
Q

A NumPy array was created with the following line of code. What is the shape of the array?

arr1 = np.array([(10,15,45), (21,17,9)])

A

(2, 3)
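A minimal check: the list of two 3-element tuples becomes a 2-row, 3-column array.

```python
import numpy as np

arr1 = np.array([(10, 15, 45), (21, 17, 9)])  # 2 rows, 3 columns
shape = arr1.shape   # -> (2, 3)
```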

13
Q

Which expression will sum the second column of a 2D array named arr1?

A

np.sum(arr1[:, 1])
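A sketch of the slice above on a hypothetical 2-D array: `[:, 1]` selects all rows of column index 1 (the second column).

```python
import numpy as np

arr1 = np.array([[1, 10], [2, 20], [3, 30]])  # hypothetical 2-D array
col_sum = np.sum(arr1[:, 1])   # 10 + 20 + 30 -> 60
```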

14
Q

Which array reference would select element 23 from the following 2x2 NumPy array, named arr1?

18 23

21 9

A

arr1[0, 1]
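A minimal check of the indexing: NumPy indices are zero-based, so row 0, column 1 holds 23.

```python
import numpy as np

arr1 = np.array([[18, 23], [21, 9]])
value = arr1[0, 1]   # row 0, column 1 -> 23
```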

15
Q

T/F:
The following two lines of code would retrieve the same column from a pandas data frame.

df['A']

df.A

A

True
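A quick check of the equivalence, using a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
same = df['A'].equals(df.A)   # bracket and attribute access return the same column
```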

16
Q

T/F:
The following line of code returns a pandas data frame consisting of just one column extracted from pandas data frame df:

df['A']

A

False
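The distinction can be seen by checking the returned types (hypothetical example; double brackets are the usual way to get a one-column data frame):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
col = df['A']       # single brackets -> pandas Series
frame = df[['A']]   # double brackets -> one-column DataFrame
```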

16
Q

T/F:
The difference between categorical data and one-hot encoding is that categorical data can be represented in one column, while one-hot encoding requires n columns, where n is the number of levels of the category.

A

True
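A sketch of the column-count difference, assuming pandas' get_dummies as the one-hot encoder and a hypothetical 'color' column with three levels:

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'green', 'red', 'blue']})
onehot = pd.get_dummies(df['color'])   # 3 levels -> 3 one-hot columns
n_columns = onehot.shape[1]
```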

17
Q

T/F:
The following line of code trains the kNN algorithm.

knn = KNeighborsClassifier(n_neighbors=7)

A

False
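The line in the question only constructs the classifier. A minimal sketch with hypothetical toy data, showing that .fit() is the step that actually trains:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [9], [10]]      # hypothetical 1-D features
y = [0, 0, 1, 1]
knn = KNeighborsClassifier(n_neighbors=3)   # only configures the model
knn.fit(X, y)                  # training happens here
pred = knn.predict([[8]])      # 3 nearest neighbors are 9, 10, 1 -> class 1
```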

18
Q

T/F:
Instead of using Pandas or NumPy, we can use built-in Python lists for data and get similar execution times.

A

False

19
Q

Which metric quantifies TP / (TP + FN)?

A

Recall

20
Q

How to calculate recall?

A

TP / (TP + FN)

21
Q

Which metric tries to adjust for the likelihood that the classifier guessed correctly by chance?

A

Kappa

22
Q

How to calculate precision?

A

TP/(TP + FP)

23
Which metric quantifies TP / (TP + FP)?
Precision
24
In Bayes' theorem, P(data|class) is called the:
likelihood
25
The naive Bayes algorithm is 'naive' because:
It assumes predictors are independent
26
T/F: Logistic regression is used for regression tasks where the target is a real number.
False
27
T/F: The terms 'probability' and 'odds' refer to the same thing mathematically.
False.
28
What does "Probability" refer to mathematically?
The proportion of times an outcome occurs out of all possible outcomes. Ranges from 0 to 1.
29
What does "odds" refer to mathematically?
The ratio P(outcome) / P(not outcome). Ranges from 0 to infinity.
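A minimal conversion between the two, using a hypothetical probability:

```python
p = 0.75             # probability of the outcome (range 0 to 1)
odds = p / (1 - p)   # odds of the outcome (range 0 to infinity) -> 3.0
```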
30
T/F: Logistic Regression and Naive Bayes are both considered to be linear classifiers.
True
31
T/F: Logistic regression is considered to be a generative classifier.
False
31
T/F: A direct mathematical solution can be found for the loss function in logistic regression.
False
32
T/F: If your classifier performs well on the training data and poorly on the test data, you have likely overfit the model.
True
33
Which activation function for a neural network is said to make training faster because it zeroes out negative values?
ReLU
33
Which activation function for a neural network scales the output to be between 0 and 1?
Sigmoid
34
Which activation function for a neural network scales the output to be between -1 and +1?
Hyperbolic Tangent
34
Which activation function for a neural network is only used for the final node of a regression problem?
Linear
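The four activation-function cards above can be summarized numerically (NumPy versions; the variable names are illustrative):

```python
import numpy as np

x = np.array([-2.0, 0.0, 3.0])
relu = np.maximum(0, x)         # zeroes out negatives -> [0., 0., 3.]
sigmoid = 1 / (1 + np.exp(-x))  # squashes into (0, 1); sigmoid(0) = 0.5
tanh = np.tanh(x)               # squashes into (-1, 1); tanh(0) = 0.0
linear = x                      # identity; used for regression output nodes
```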
35
T/F: If you have a small amount of training data you should start out with a large neural network (many nodes and layers) and gradually reduce the network size.
False
36
T/F: The forward pass of a neural network is where weights are adjusted.
False
37
T/F: An epoch in neural network training is one forward and backward pass through the entire training data set.
True
38
Which of these nodes in a neural network have a bias term? Select all that are true.
hidden node, output node (input nodes do not have a bias term)
39
This activation is often used for binary classification problems.
Sigmoid
39
T/F: A neural network will always outperform Naive Bayes and Logistic Regression.
False
40
T/F: Adding weight regularization usually increases the weights of nodes.
False
41
T/F: Deep neural networks on small data sets should have many layers and nodes.
False
42
In Keras, this object represents one transformation step in a deep network.
layer
43
This activation is often used for multi-class classification problems.
Softmax
44
In Keras, this object is used to define the network topology.
model
45
In Keras, this object is used to represent n-dimensional data.
tensor
46
T/F: An advantage of deep neural networks is that they can learn a representation of the data.
True
47
What is a convolution in mathematics?
the process of combining two functions to create a new function
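A one-dimensional sketch using NumPy's convolve (the input sequences are hypothetical):

```python
import numpy as np

# Discrete convolution: each output element is a sum of products of one
# sequence with a flipped, shifted copy of the other.
out = np.convolve([1, 2, 3], [1, 1])   # -> [1, 3, 5, 3]
```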
48
In Keras, this defines one forward and backward pass through the data.
epoch
48
T/F: The filter size of a CNN should be treated as a hyperparameter and tuned during training, since it has a significant impact on performance.
True
49
A dropout layer in deep learning is used to reduce:
Overfitting
49
This type of model is often the first approach to a deep learning problem.
sequential
50
T/F: In CNN, multiple filters can learn different features from the data.
True
50
This type of model was developed to deal with the vanishing or exploding gradient problem.
LSTM
51
What is pooling in CNN?
Reduces the dimensions of the data (e.g., max pooling keeps only the largest value in each window) while trying to retain the most relevant information.
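A minimal sketch of 2x2 max pooling on a hypothetical 4x4 array, showing the size reduction:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)   # 4x4 input
# Split into non-overlapping 2x2 windows and keep each window's maximum,
# shrinking the data from 4x4 to 2x2.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
```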
52
The convolution process in CNN results in data reduction.
True
53
This type of model is often used with image/video or text data.
CNN
53
This type of model is often used with text or time series data.
RNN
54
This type of model runs faster than the LSTM because it combines the input and forget gates into a single update gate.
GRU
54
What is a CNN?
Convolutional Neural Network. Creates a window that iterates across the data, helps with recognizing patterns (especially in more than one dimension)
55
What is an RNN?
Recurrent Neural Network. Has memory or state which enables it to learn a sequence. Is looped to produce a new state at each iteration.
56
What is a GRU?
Gated recurrent unit. Improvement over RNN to help solve the vanishing gradient problem. Faster than LSTM.
57
What is an LSTM?
Long Short-term Memory. RNN variation which keeps the memory path independent of the backpropagation path, allowing info to be remembered and forgotten. Helps solve the vanishing gradient problem.
58
What is a sequential model?
Regular feed-forward model: an input layer, some hidden layers, and an output layer, as you would expect for deep learning.
59
What is a Kappa?
Metric. Tries to account for correct predictions generated by chance. Calculated as [Pr(a) - Pr(e)]/[1 - Pr(e)], where Pr(a) is the observed agreement and Pr(e) is the expected agreement based on the class distribution. Ranges up to 1, with values at or below 0 indicating agreement no better than chance; low values = poor agreement, high values = strong agreement.
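Plugging hypothetical agreement values into the formula above:

```python
pr_a = 0.7   # hypothetical observed agreement
pr_e = 0.5   # hypothetical chance agreement
kappa = (pr_a - pr_e) / (1 - pr_e)   # (0.7 - 0.5) / 0.5 = 0.4
```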
60
What is the ROC curve?
Receiver operating characteristic curve. Plots the true positive rate vs. the false positive rate (y vs. x). We want the curve to push toward the upper left.
61
What is Logistic Regression?
A method of performing classification. Learns a linear decision boundary by passing a weighted sum of the inputs through a sigmoid function to produce a probability, then applying a cutoff (e.g., 0.5). Inherently binary, but multi-class problems can be handled with OvA.
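A minimal sketch of the sigmoid and cutoff described above (the function name is illustrative):

```python
import math

def sigmoid(z):
    # maps any real number into (0, 1), interpreted as P(class 1)
    return 1 / (1 + math.exp(-z))

# With a 0.5 cutoff, the decision boundary is where z = 0.
p = sigmoid(0.0)   # -> 0.5
```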
62
How do you use logistic regression for multi-class classification?
Use OvA/One versus All, which builds n classifiers for n classes. All set up as one class versus all of the others.
63
How are Naive Bayes and Logistic Regression different?
NB is considered a generative classifier because it learns P(Y) as well as P(X|Y), which generated the data. Logistic Regression is considered a discriminative classifier because it directly learns P(Y|X) from the data.
64
What is Bias? and what is its trend with model complexity?
Bias is the tendency of an algorithm to make assumptions about the shape of the data. Decreases as the model gets more complex.
65
What is Variance? and what is its trend with model complexity?
Variance is the sensitivity of an algorithm to noise in the data. Increases as model gets more complex.
66
How to do Naive Bayes in sklearn?
MultinomialNB(). Has an alpha setting (for Laplace smoothing), class_prior for setting class priors, and fit_prior, which learns the priors from the data when set to True.
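A hedged end-to-end sketch, assuming scikit-learn is available and using made-up word counts:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count data: feature 0 dominates class 0, feature 2 class 1.
X = np.array([[2, 1, 0], [0, 1, 3], [3, 0, 0], [0, 2, 2]])
y = np.array([0, 1, 0, 1])
clf = MultinomialNB(alpha=1.0)   # alpha=1.0 gives Laplace smoothing
clf.fit(X, y)
pred = clf.predict([[2, 0, 0]])  # heavy on feature 0 -> class 0
```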
67
How to do a confusion matrix in sklearn?
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, pred))
68
How to do a classification report in sklearn?
from sklearn.metrics import classification_report
print(classification_report(y_test, pred))
69
How to calculate accuracy?
(TP + TN) / (TP + TN + FP + FN)
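The three count-based metrics from these cards, computed on hypothetical confusion-matrix counts:

```python
TP, TN, FP, FN = 40, 30, 10, 20              # hypothetical counts
accuracy = (TP + TN) / (TP + TN + FP + FN)   # 70 / 100 = 0.7
precision = TP / (TP + FP)                   # 40 / 50 = 0.8
recall = TP / (TP + FN)                      # 40 / 60
```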
70
What is the Conjugate Prior?
A prior distribution that, combined with a given likelihood, yields a posterior in the same family as the prior (e.g., Beta for the binomial, Dirichlet for the multinomial). Like any distribution, it integrates to 1.