exam 2 Flashcards
A qualitative variable with more than 2 levels is best modeled by which type of distribution?
multinomial
A quantitative variable is best modeled by which type of distribution?
Gaussian
A qualitative variable with 2 levels is best modeled by which distribution?
binomial
The conjugate prior of a multinomial is which kind of distribution?
Dirichlet
The conjugate prior of a binomial is which type of distribution?
Beta
Which of the following types of ML algorithms would typically be used to predict housing prices?
Regression
Which of the following types of ML algorithms would typically be used to predict whether a student is likely to be admitted to university (yes/no)?
Classification
In building a machine learning model with supervised learning, one column will be the _______________ column, and all others will be attributes used to predict that column.
Target
Which of the following terms refers to a row in a data set? Check all that apply.
observation
instance
example
Categorical data (aka qualitative data) can take on real number values.
False
In regression, the target is quantitative data.
True
What is the shape of the following array (as would be returned by the .shape attribute)?
arr1 = np.array([1, 2, 3])
(3,)
A NumPy array was created with the following line of code. What is the shape of the array?
arr1 = np.array([(10,15,45), (21,17,9)])
(2, 3)
Which expression will sum the second column of a 2D array named arr1?
np.sum(arr1[:, 1])
Which array reference would select element 23 from the following 2x2 NumPy array, named arr1?
18 23
21 9
arr1[0, 1]
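The four NumPy cards above can be checked with a short sketch. The variable names `vec`, `mat`, and `grid` are illustrative stand-ins for the `arr1` used on each card.

```python
import numpy as np

# 1D array: shape is a one-element tuple
vec = np.array([1, 2, 3])
print(vec.shape)

# Two rows of three values each: shape is (rows, columns)
mat = np.array([(10, 15, 45), (21, 17, 9)])
print(mat.shape)

# Sum the second column (index 1) across all rows: 15 + 17
second_col_sum = np.sum(mat[:, 1])
print(second_col_sum)

# Row 0, column 1 of the 2x2 array from the card
grid = np.array([[18, 23], [21, 9]])
element = grid[0, 1]
print(element)
```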
T/F:
The following two lines of code would retrieve the same column from a pandas data frame.
df['A']
df.A
True
T/F:
The following line of code returns a pandas data frame consisting of just one column extracted from pandas data frame df:
df['A']
False
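A minimal sketch of the two pandas cards above, using a made-up two-column frame: single brackets (or attribute access) return a Series, while double brackets return a one-column DataFrame.

```python
import pandas as pd

# Hypothetical data frame for illustration
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

col_bracket = df["A"]   # Series
col_attr = df.A         # same Series via attribute access
as_frame = df[["A"]]    # one-column DataFrame (note the double brackets)

print(type(col_bracket).__name__)
print(col_bracket.equals(col_attr))
print(type(as_frame).__name__)
```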
T/F:
The difference between categorical data and one-hot encoding is that categorical data can be represented in one column and one-hot encoding would require n columns, where n is the number of levels for the category.
True
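As a rough illustration of the card above, `pd.get_dummies` turns one categorical column with n levels into n indicator columns. The `color` column and its values are made up for the example.

```python
import pandas as pd

# One categorical column with three levels (hypothetical data)
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: one indicator column per level
one_hot = pd.get_dummies(df["color"])

print(df.shape)       # a single column holds the category
print(one_hot.shape)  # one column per level
print(list(one_hot.columns))
```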
T/F:
The following line of code trains the kNN algorithm.
knn = KNeighborsClassifier(n_neighbors=7)
False
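A small sketch of why the answer is False: the constructor only configures the model, and training happens in `fit`. The toy data and `n_neighbors=3` are assumptions chosen so the example fits in a few lines.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 1D training data: two well-separated groups (hypothetical)
X_train = [[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # configures the model only
knn.fit(X_train, y_train)                  # this call does the training

pred = knn.predict([[0.1], [2.3]])
print(pred)
```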
T/F:
Instead of using Pandas or NumPy, we can use built-in Python lists for data and get similar execution times.
False
Which metric quantifies TP / (TP + FN)?
Recall
How to calculate recall?
TP / (TP + FN)
Which metric tries to adjust for the likelihood that the classifier guessed correctly by chance?
Kappa
How to calculate precision?
TP/(TP + FP)
Which metric quantifies TP / (TP + FP)?
Precision
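The precision and recall formulas above can be sketched directly from confusion-matrix counts. The counts below are invented for illustration.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the classifier found."""
    return tp / (tp + fn)


# Hypothetical counts
tp, fp, fn = 40, 10, 20

p = precision(tp, fp)  # 40 / 50
r = recall(tp, fn)     # 40 / 60
print(p, r)
```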
In Bayes’ theorem, P(data|class) is called the:
likelihood
The naive Bayes algorithm is ‘naive’ because:
It assumes predictors are independent
T/F: Logistic regression is used for regression tasks where the target is a real number.
False
T/F: The terms ‘probability’ and ‘odds’ refer to the same thing mathematically.
False
What does “Probability” refer to mathematically?
The proportion of an outcome out of all possible outcomes. Ranges from 0 to 1.
What does “odds” refer to mathematically?
The ratio P(outcome) / P(not outcome), i.e. p / (1 - p). Ranges from 0 to infinity.
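A quick sketch of converting between the two quantities; the helper names `prob_to_odds` and `odds_to_prob` are made up for this example.

```python
def prob_to_odds(p):
    """Odds = p / (1 - p); defined for 0 <= p < 1."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Inverse mapping: p = odds / (1 + odds)."""
    return odds / (1 + odds)


odds = prob_to_odds(0.75)    # 0.75 / 0.25
back = odds_to_prob(odds)    # recovers the original probability
print(odds, back)
```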
T/F: Logistic Regression and Naive Bayes are both considered to be linear classifiers.
True
T/F: Logistic regression is considered to be a generative classifier.
False
T/F: A direct mathematical solution can be found for the loss function in logistic regression.
False
T/F: If your classifier performs well on the training data and poorly on the test data, you have likely overfit the model.
True
Which activation function for a neural network is said to make training faster because it zeroes out negative values?
ReLU
Which activation function for a neural network scales the output to be between 0 and 1?
Sigmoid
Which activation function for a neural network scales the output to be between -1 and +1?
Hyperbolic Tangent
Which activation function for a neural network is only used for the final node of a regression problem?
Linear
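The four activation functions on the cards above can be sketched in NumPy. These are plain textbook definitions, not the implementation of any particular framework.

```python
import numpy as np


def relu(x):
    # Zeroes out negative values, passes positives through unchanged
    return np.maximum(0.0, x)


def sigmoid(x):
    # Squashes values into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    # Squashes values into the interval (-1, 1)
    return np.tanh(x)


def linear(x):
    # Identity: used on the output node of a regression network
    return x


z = np.array([-2.0, 0.0, 2.0])
print(relu(z))
print(sigmoid(z))
print(tanh(z))
print(linear(z))
```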
T/F: If you have a small amount of training data you should start out with a large neural network (many nodes and layers) and gradually reduce the network size.
False
T/F: The forward pass of a neural network is where weights are adjusted.
False
T/F: An epoch in neural network training is one forward and backward pass for one batch of the data.
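False. An epoch is one forward and backward pass over the entire training set; a pass over a single batch is one iteration.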
Which of these nodes in a neural network have a bias term? Select all that are true.
hidden node
output node
(Input nodes only pass the feature values through, so they carry no bias term.)
This activation is often used for binary classification problems.
Sigmoid
T/F: A neural network will always outperform Naive Bayes and Logistic Regression.
False
T/F: Adding weight regularization usually increases the weights of nodes.
False
T/F: Deep neural networks on small data sets should have many layers and nodes.
False
In Keras, this object represents one transformation step in a deep network.
layer
This activation is often used for multi-class classification problems.
Softmax
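A minimal softmax sketch in NumPy, showing why it suits multi-class output: the scores become positive values that sum to 1. The input scores are arbitrary.

```python
import numpy as np


def softmax(logits):
    # Subtract the max first for numerical stability; the result is unchanged
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # one probability per class
print(probs.sum())  # sums to 1 (up to floating-point rounding)
```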
In Keras, this object is used to define the network topology.
model
In Keras, this object is used to represent n-dimensional data.
tensor
T/F: An advantage of deep neural networks is that they can learn a representation of the data.
True
What is a convolution in mathematics?
the process of combining two functions to create a new function
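For discrete sequences, the idea can be sketched with `np.convolve`, which slides one sequence across the other and sums the products. The two short sequences are arbitrary.

```python
import numpy as np

signal = np.array([1, 2, 3])
kernel = np.array([1, 1])

# Full discrete convolution; equivalent to multiplying the polynomials
# (1 + 2x + 3x^2) and (1 + x)
combined = np.convolve(signal, kernel)
print(combined)
```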
In Keras, this defines one forward and backward pass through the data.
epoch
T/F:
The filter size of a CNN should be considered like a hyperparameter that should be tuned during training since it has a significant impact on performance.
True
A dropout layer in deep learning is used to reduce:
Overfitting
This type of model is often the first approach to a deep learning problem.
sequential
T/F: In CNN, multiple filters can learn different features from the data.
True
This type of model was developed to deal with the vanishing or exploding gradient problem.
LSTM
What is pooling in CNN?
Reduces the dimension/size of the data by simplifying it. Tries to maintain the most relevant information.
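One common form is 2x2 max pooling, sketched here in NumPy on a made-up 4x4 input. The reshape trick assumes the height and width are divisible by the pool size.

```python
import numpy as np

# Hypothetical 4x4 feature map with values 0..15
feature_map = np.arange(16).reshape(4, 4)

# Split into 2x2 blocks, then keep the maximum of each block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(feature_map.shape)  # before pooling
print(pooled.shape)       # after pooling: half the height and width
print(pooled)
```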
The convolution process in CNN results in data reduction.
True
This type of model is often used with image/video or text data.
CNN
This type of model is often used with text or time series data.
RNN
This type of model runs faster because it merges the input and forget gates of the LSTM into a single update gate.
GRU
What is a CNN?
Convolutional Neural Network. Creates a window that iterates across the data, helps with recognizing patterns (especially in more than one dimension)
What is an RNN?
Recurrent Neural Network. Has memory or state which enables it to learn a sequence. Is looped to produce a new state at each iteration.
What is a GRU?
Gated recurrent unit. Improvement over RNN to help solve the vanishing gradient problem. Faster than LSTM.
What is an LSTM?
Long Short-term Memory. RNN variation which keeps the memory path independent of the backpropagation path, allowing info to be remembered and forgotten. Helps solve the vanishing gradient problem.
What is a sequential model?
Regular feed forward model. What you expect for deep learning: input layer, some hidden layers, and an output layer.
What is Kappa?
Metric that tries to account for correct predictions that occur by chance. Calculated as [Pr(a) - Pr(e)]/[1 - Pr(e)], where Pr(a) is the observed agreement and Pr(e) is the agreement expected by chance given the class distribution. Usually falls between 0 and 1 (it can go negative when agreement is worse than chance); low values mean poor agreement and high values mean strong agreement.
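The formula [Pr(a) - Pr(e)]/[1 - Pr(e)] can be worked through on a small example. The function name `cohen_kappa` and the label lists are assumptions for illustration; Pr(e) is computed from the marginal class frequencies of the true and predicted labels.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    n = len(y_true)
    # Pr(a): observed agreement between labels and predictions
    pr_a = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Pr(e): agreement expected by chance from the class distributions
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    pr_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return (pr_a - pr_e) / (1 - pr_e)


# Hypothetical labels: 6 of 8 agree, classes are balanced
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

kappa = cohen_kappa(y_true, y_pred)  # (0.75 - 0.5) / (1 - 0.5)
print(kappa)
```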
What is the ROC curve?
Receiver operating characteristic curve. Plots the true positive rate against the false positive rate (y vs. x). We want the curve to shoot toward the upper left.
What is Logistic Regression?
A classification method. Fits a linear decision boundary, passes the linear combination of predictors through a sigmoid to get a probability, and applies a cutoff to assign the class. Designed for binary targets, but OvA extends it to multi-class problems.
How do you use logistic regression for multi-class classification?
Use one-versus-all (OvA), which builds n binary classifiers for n classes, each separating one class from all of the others.
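One way to sketch this in scikit-learn is `OneVsRestClassifier` wrapped around `LogisticRegression`. The three-cluster toy data is made up, and this is an illustration of the idea rather than the only way to set it up.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Hypothetical 2D data: three well-separated clusters, one per class
X = [[0, 0], [0, 1], [1, 0],
     [5, 0], [5, 1], [6, 0],
     [0, 5], [1, 5], [0, 6]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

ova = OneVsRestClassifier(LogisticRegression())
ova.fit(X, y)

# One binary classifier is trained per class
n_classifiers = len(ova.estimators_)
print(n_classifiers)

# Prediction picks the class whose binary classifier is most confident
pred = ova.predict([[0.3, 0.3], [5.5, 0.3], [0.3, 5.5]])
print(pred)
```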
How are Naive Bayes and Logistic Regression different?
NB is considered a generative classifier because it learns P(Y) as well as P(X|Y), which together describe how the data were generated.
Logistic Regression is considered a discriminative classifier because it directly learns P(Y|X) from the data.
What is bias, and how does it trend with model complexity?
Bias is the tendency of an algorithm to make assumptions about the shape of the data. It decreases as the model gets more complex.
What is variance, and how does it trend with model complexity?
Variance is the sensitivity of an algorithm to noise in the data. It increases as the model gets more complex.
How to do Naive Bayes in sklearn?
MultinomialNB(). Has an alpha setting (for Laplace smoothing), class_prior for setting the class priors, and fit_prior, which learns the priors from the data if set to True.
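A minimal sketch of fitting `MultinomialNB` on count features. The tiny count matrix and labels are invented; `alpha=1.0` and `fit_prior=True` are the settings named on the card.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count features: rows are documents, columns are words
X_train = np.array([[3, 0, 1],
                    [2, 0, 0],
                    [0, 3, 1],
                    [0, 2, 2]])
y_train = np.array([0, 0, 1, 1])

nb = MultinomialNB(alpha=1.0, fit_prior=True)  # Laplace smoothing, learned priors
nb.fit(X_train, y_train)

# Priors learned from the class frequencies in y_train
learned_priors = np.exp(nb.class_log_prior_)
print(learned_priors)

pred = nb.predict(np.array([[2, 0, 0], [0, 2, 0]]))
print(pred)
```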
How to do a confusion matrix in sklearn?
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, pred))
How to do a classification report in sklearn?
from sklearn.metrics import classification_report
print(classification_report(y_test, pred))
How to calculate accuracy?
[TP + TN]/[all results]
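The confusion matrix and accuracy cards can be tied together in one sketch. `y_test` and `pred` are made-up label lists; for binary 0/1 labels, scikit-learn lays the matrix out with true classes as rows and predicted classes as columns.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 0, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(y_test, pred)
print(cm)

# For binary 0/1 labels, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()

# Accuracy = (TP + TN) / all results
accuracy = (tp + tn) / cm.sum()
print(accuracy)
```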
What is the Conjugate Prior?
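A prior distribution that, when combined with a given likelihood, yields a posterior in the same family as the prior. For example, a Beta prior with a binomial likelihood gives a Beta posterior, and a Dirichlet prior with a multinomial likelihood gives a Dirichlet posterior.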
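A well-known instance of conjugacy is the Beta prior with a binomial likelihood: observing k successes in n trials turns Beta(a, b) into Beta(a + k, b + n - k). The sketch below applies that update; the function name and the numbers are made up for illustration.

```python
def beta_binomial_update(a, b, successes, trials):
    """Posterior Beta parameters after observing binomial data."""
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior Beta(2, 2), then 7 successes in 10 trials
prior = (2, 2)
posterior = beta_binomial_update(*prior, successes=7, trials=10)

print(posterior)              # still a Beta distribution, with new parameters
print(beta_mean(*prior))      # prior mean
print(beta_mean(*posterior))  # posterior mean, pulled toward the data
```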