exam 2 Flashcards

1
Q

A qualitative variable with more than 2 levels is best modeled by which type of distribution?

A

multinomial

2
Q

A quantitative variable is best modeled by which type of distribution?

A

Gaussian

3
Q

A qualitative variable with 2 levels is best modeled by which distribution?

A

binomial

4
Q

The conjugate prior of a multinomial is which kind of distribution?

A

Dirichlet

5
Q

The conjugate prior of a binomial is which type of distribution?

A

Beta

6
Q

Which type of ML algorithm would typically be used to predict housing prices?

A

Regression

7
Q

Which type of ML algorithm would typically be used to predict whether a student is likely to be admitted to university (yes/no)?

A

Classification

8
Q

In building a machine learning model with supervised learning, one column will be the _______________ column, and all others will be attributes used to predict that column.

A

Target

9
Q

Which of the following terms refers to a row in a data set? Check all that apply.

A

observation
instance
example

10
Q

Categorical data (aka qualitative data) can take on real number values.

A

False

11
Q

In regression, the target is quantitative data.

A

True

12
Q

What is the shape of the following array (as would be returned by the .shape attribute)?

arr1 = np.array([1, 2, 3])

A

(3,)

13
Q

A NumPy array was created with the following line of code. What is the shape of the array?

arr1 = np.array([(10,15,45), (21,17,9)])

A

(2, 3)
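
As a quick check, a minimal NumPy sketch confirming both of these shapes (the second array is named arr2 here just to keep both in one snippet):

import numpy as np

arr1 = np.array([1, 2, 3])                      # 1-D array
print(arr1.shape)                               # (3,)

arr2 = np.array([(10, 15, 45), (21, 17, 9)])    # 2-D array built from two tuples
print(arr2.shape)                               # (2, 3)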

13
Q

Which expression will sum the second column of a 2D array named arr1?

A

np.sum(arr1[:, 1])
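
A minimal sketch of the column sum, reusing the example array from the previous card (an assumption) and showing the equivalent method-call form:

import numpy as np

arr1 = np.array([(10, 15, 45), (21, 17, 9)])
print(np.sum(arr1[:, 1]))    # 15 + 17 = 32 (all rows, column index 1)
print(arr1[:, 1].sum())      # equivalent spelling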

14
Q

Which array reference would select element 23 from the following 2x2 NumPy array, named arr1?

18 23

21 9

A

arr1[0, 1]
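
A quick sketch verifying the index (row 0, column 1):

import numpy as np

arr1 = np.array([[18, 23], [21, 9]])
print(arr1[0, 1])    # 23: row 0, column 1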

15
Q

T/F:
The following two lines of code would retrieve the same column from a pandas data frame.

df['A']

df.A

A

True
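
A small sketch (column names assumed); note that the dot form only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df['A'].equals(df.A))    # True: both expressions return the same Series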

16
Q

T/F:
The following line of code returns a pandas data frame consisting of just one column extracted from pandas data frame df:

df['A']

A

False
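
A sketch showing why (column names assumed): single brackets return a Series, while double brackets return a one-column DataFrame:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(type(df['A']))      # pandas Series, not a DataFrame
print(type(df[['A']]))    # double brackets do return a DataFrame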

16
Q

T/F:
The difference between categorical data and one-hot encoding is that categorical data can be represented in one column and one-hot encoding would require n columns, where n is the number of levels for the category.

A

True
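
A small sketch of the difference using pandas (the 'color' column and its levels are assumptions):

import pandas as pd

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'red']})   # one categorical column with 3 levels
print(pd.get_dummies(df['color']))                              # one-hot encoding: 3 columns, one per level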

17
Q

T/F:
The following line of code trains the kNN algorithm.

knn = KNeighborsClassifier(n_neighbors=7)

A

False
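
The line only constructs the classifier; nothing is learned until .fit() is called. A minimal sketch using sklearn's built-in iris data (an assumed example):

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=7)   # creates the object; no training happens here
knn.fit(X, y)                               # this call is what actually trains the model
print(knn.predict(X[:3]))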

18
Q

T/F:
Instead of using Pandas or NumPy, we can use built-in Python lists for data and get similar execution times.

A

False

19
Q

Which metric quantifies TP / (TP + FN)?

A

Recall

20
Q

How to calculate recall?

A

TP / (TP + FN)

21
Q

Which metric tries to adjust for the likelihood that the classifier guessed correctly by chance?

A

Kappa

22
Q

How to calculate precision?

A

TP/(TP + FP)

23
Q

Which metric quantifies TP / (TP + FP)?

A

Precision
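
A small worked example with assumed confusion-matrix counts, tying the metric formulas together:

TP, FP, FN, TN = 8, 2, 4, 6                      # assumed counts

precision = TP / (TP + FP)                       # 8 / 10 = 0.8
recall    = TP / (TP + FN)                       # 8 / 12 ≈ 0.667
accuracy  = (TP + TN) / (TP + TN + FP + FN)      # 14 / 20 = 0.7
print(precision, recall, accuracy)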

24
Q

In Bayes’ theorem, P(data|class) is called the:

A

likelihood
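
For reference, Bayes' theorem in this notation: P(class|data) = P(data|class) * P(class) / P(data), i.e., posterior = likelihood x prior / evidence.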

25
Q

The naive Bayes algorithm is ‘naive’ because:

A

It assumes predictors are independent

26
Q

T/F: Logistic regression is used for regression tasks where the target is a real number.

A

False

27
Q

T/F: The terms ‘probability’ and ‘odds’ refer to the same thing mathematically.

A

False.

28
Q

What does “Probability” refer to mathematically?

A

The proportion of times an outcome occurs out of all possible outcomes. Ranges from 0 to 1.

29
Q

What does “odds” refer to mathematically?

A

The ratio of the probability of the outcome to the probability of not-the-outcome: P(outcome) / (1 - P(outcome)). Ranges from 0 to infinity.
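
A quick worked example (p = 0.8 is an assumption):

p = 0.8                 # probability of the outcome
odds = p / (1 - p)      # 0.8 / 0.2 = 4.0, i.e., 4-to-1 odds
print(odds)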

30
Q

T/F: Logistic Regression and Naive Bayes are both considered to be linear classifiers.

A

True

31
Q

T/F: Logistic regression is considered to be a generative classifier.

A

False

31
Q

T/F: A direct mathematical solution can be found for the loss function in logistic regression.

A

False

32
Q

T/F: If your classifier performs well on the training data and poorly on the test data, you have likely overfit the model.

A

True

33
Q

Which activation function for a neural network is said to make training faster because it zeroes out negative values?

A

ReLU

33
Q

Which activation function for a neural network scales the output to be between 0 and 1?

A

Sigmoid

34
Q

Which activation function for a neural network scales the output to be between -1 and +1?

A

Hyperbolic tangent (tanh)

34
Q

Which activation function for a neural network is only used for the final node of a regression problem?

A

Linear
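
A quick NumPy sketch of the four activations from these cards, applied to a few sample values (the inputs are assumptions):

import numpy as np

x = np.array([-2.0, 0.0, 2.0])
relu    = np.maximum(0, x)         # zeroes out negatives: [0. 0. 2.]
sigmoid = 1 / (1 + np.exp(-x))     # squashes values into (0, 1)
tanh    = np.tanh(x)               # squashes values into (-1, 1)
linear  = x                        # identity; used on the final node for regression
print(relu, sigmoid, tanh, linear)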

35
Q

T/F: If you have a small amount of training data you should start out with a large neural network (many nodes and layers) and gradually reduce the network size.

A

False

36
Q

T/F: The forward pass of a neural network is where weights are adjusted.

A

False

37
Q

T/F: An epoch in neural network training is one forward and backward pass through the entire training data set.

A

True

38
Q

Which of these nodes in a neural network have a bias term? Select all that are true.

A

output node
hidden node

39
Q

This activation is often used for binary classification problems.

A

Sigmoid

39
Q

T/F: A neural network will always outperform Naive Bayes and Logistic Regression.

A

False

40
Q

T/F: Adding weight regularization usually increases the weights of nodes.

A

False

41
Q

T/F: Deep neural networks on small data sets should have many layers and nodes.

A

False

42
Q

In Keras, this object represents one transformation step in a deep network.

A

layer

43
Q

This activation is often used for multi-class classification problems.

A

Softmax

44
Q

In Keras, this object is used to define the network topology.

A

model

45
Q

In Keras, this object is used to represent n-dimensional data.

A

tensor

46
Q

T/F: An advantage of deep neural networks is that they can learn a representation of the data.

A

True

47
Q

What is a convolution in mathematics?

A

the process of combining two functions to create a new function

48
Q

In Keras, this defines one forward and backward pass through the data.

A

epoch

48
Q

T/F:
The filter size of a CNN should be treated as a hyperparameter and tuned, since it has a significant impact on performance.

A

True

49
Q

A dropout layer in deep learning is used to reduce:

A

Overfitting

49
Q

This type of model is often the first approach to a deep learning problem.

A

sequential

50
Q

T/F: In CNN, multiple filters can learn different features from the data.

A

True

50
Q

This type of model was developed to deal with the vanishing or exploding gradient problem.

A

LSTM

51
Q

What is pooling in CNN?

A

Reduces the spatial size of the data by summarizing local regions (e.g., taking the maximum or average value), while trying to retain the most relevant information.

52
Q

The convolution process in CNN results in data reduction.

A

True

53
Q

This type of model is often used with image/video or text data.

A

CNN

53
Q

This type of model is often used with text or time series data.

A

RNN

54
Q

This type of model runs fast because it combines the input and forget gates of more complex recurrent models (like the LSTM) into a single update gate.

A

GRU

54
Q

What is a CNN?

A

Convolutional Neural Network. Slides a window (filter) across the data to recognize local patterns, which is especially useful for data with more than one dimension (e.g., images).
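
A minimal Keras CNN sketch tying several of these cards together; the input shape, filter count, dropout rate, and number of classes are assumptions, not from the deck:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                   # small grayscale images (assumed)
    layers.Conv2D(32, (3, 3), activation='relu'),     # 32 filters, each learning a different feature
    layers.MaxPooling2D((2, 2)),                      # pooling shrinks the data, keeping the strongest signals
    layers.Flatten(),
    layers.Dropout(0.5),                              # dropout layer to reduce overfitting
    layers.Dense(10, activation='softmax'),           # softmax output for an assumed 10-class problem
])
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()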

55
Q

What is an RNN?

A

Recurrent Neural Network. Has memory (state), which enables it to learn from a sequence; the network is applied in a loop, producing a new state at each step.

56
Q

What is a GRU?

A

Gated Recurrent Unit. An improvement over the plain RNN that helps solve the vanishing gradient problem; faster than an LSTM because it uses fewer gates.

57
Q

What is an LSTM?

A

Long Short-Term Memory. An RNN variant that keeps the memory (carry) path independent of the backpropagation path, allowing information to be remembered and forgotten across timesteps. Helps solve the vanishing gradient problem.

58
Q

What is a sequential model?

A

A regular feed-forward model, and what you expect for deep learning: a linear stack of layers consisting of an input layer, some hidden layers, and an output layer, with each layer feeding into the next.
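
A minimal sequential-model sketch in Keras for a binary problem; the layer sizes, input dimension, and random data are assumptions used only to make the snippet runnable:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),                  # 20 input features (assumed)
    layers.Dense(16, activation='relu'),       # hidden layer
    layers.Dense(1, activation='sigmoid'),     # sigmoid output for binary classification
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

X = np.random.rand(200, 20)                    # toy data (assumed)
y = np.random.randint(0, 2, size=200)
model.fit(X, y, epochs=5, batch_size=32)       # each epoch is one full pass through the data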

59
Q

What is a Kappa?

A

Metric that tries to account for correct predictions occurring by chance. Calculated as [Pr(a) - Pr(e)] / [1 - Pr(e)], where Pr(a) is the observed (actual) agreement and Pr(e) is the agreement expected by chance given the class distribution. Values range from roughly 0 to 1, with low values indicating poor agreement and high values indicating strong agreement.
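
A small worked example with assumed agreement values:

pr_a = 0.85                            # observed agreement (assumed)
pr_e = 0.60                            # agreement expected by chance (assumed)
kappa = (pr_a - pr_e) / (1 - pr_e)     # (0.85 - 0.60) / 0.40 = 0.625
print(kappa)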

60
Q

What is the ROC curve?

A

Receiver Operating Characteristic curve. Plots the true positive rate (y-axis) against the false positive rate (x-axis) across thresholds. We want the curve to shoot toward the upper left.
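
A minimal sklearn sketch with toy labels and scores (both assumed):

from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(fpr, tpr)                        # plot tpr (y-axis) against fpr (x-axis) to draw the curve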

61
Q

What is Logistic Regression?

A

A method of performing classification. It learns a linear decision boundary: a linear combination of the features is passed through a sigmoid function to estimate the probability of the positive class, and a cutoff is applied to that probability. It works best with binary targets, but OvA can be used for multi-class problems.

62
Q

How do you use logistic regression for multi-class classification?

A

Use OvA (One versus All), which builds n binary classifiers for n classes, each set up as one class versus all of the others; the classifier that produces the highest score determines the predicted class.
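
A sketch in scikit-learn, using the built-in iris data (3 classes) as an assumed example; OneVsRestClassifier is one way to wrap a binary learner into n one-vs-all models:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))   # one binary classifier per class
ovr.fit(X, y)
print(len(ovr.estimators_))                                    # 3: one "class vs. rest" model each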

63
Q

How are Naive Bayes and Logistic Regression different?

A

NB is considered a generative classifier because it learns P(Y) as well as P(X|Y), i.e., a model of how the data were generated.

Logistic Regression is considered a discriminative classifier because it directly learns P(Y|X) from the data.

64
Q

What is Bias? and what is its trend with model complexity?

A

Bias is the tendency of an algorithm to make assumptions about the shape of the data. It decreases as the model gets more complex.

65
Q

What is Variance? and what is its trend with model complexity?

A

Variance is the sensitivity of an algorithm to noise in the data. It increases as the model gets more complex.

66
Q

How to do Naive Bayes in sklearn?

A

MultinomialNB(). Has an alpha setting (for Laplace smoothing), class_prior for setting the class priors, and fit_prior, which learns the priors from the data when set to True.
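
A minimal sketch with toy text data (the documents and labels are assumptions):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs   = ['good movie', 'bad movie', 'good plot']
labels = [1, 0, 1]

X = CountVectorizer().fit_transform(docs)
nb = MultinomialNB(alpha=1.0, fit_prior=True)   # alpha = Laplace smoothing; priors learned from the data
nb.fit(X, labels)
print(nb.predict(X))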

67
Q

How to do a confusion matrix in sklearn?

A

from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, pred))

68
Q

How to do a classification report in sklearn?

A

from sklearn.metrics import classification_report
print(classification_report(y_test, pred))

69
Q

How to calculate accuracy?

A

(TP + TN) / (TP + TN + FP + FN)

70
Q

What is the Conjugate Prior?

A

A prior distribution that, when combined with the likelihood via Bayes' theorem, produces a posterior in the same family as the prior. For example, the Beta distribution is the conjugate prior of the binomial, and the Dirichlet is the conjugate prior of the multinomial.
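
A tiny worked sketch of the Beta-binomial case (prior parameters and observed counts are assumptions): because the Beta is conjugate to the binomial, the update is just addition.

a_prior, b_prior = 2, 2             # Beta(2, 2) prior (assumed)
successes, failures = 7, 3          # observed binomial data (assumed)

a_post = a_prior + successes        # posterior is again a Beta distribution
b_post = b_prior + failures
print(a_post, b_post)               # Beta(9, 5)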