exam 2 Flashcards by AJ Prosser

A qualitative variable with more than 2 levels is best modeled by which type of distribution?

multinomial

A quantitative variable is best modeled by which type of distribution?

Gaussian

A qualitative variable with 2 levels is best modeled by which distribution?

binomial

The conjugate prior of a multinomial is which kind of distribution?

Dirichlet

The conjugate prior of a binomial is which type of distribution?

Beta

Which of the following types of ML algorithms would typically be used to predict housing prices?

Regression

Which of the following types of ML algorithms would typically be used to predict whether a student is likely to be admitted to university (yes/no)?

Classification

In building a machine learning model with supervised learning, one column will be the _______________ column, and all others will be attributes used to predict that column.

Target

Which of the following terms refers to a row in a data set? Check all that apply.

observation
instance
example

Categorical data (aka qualitative data) can take on real number values.

False

In regression, the target is quantitative data.

True

What is the shape of the following array (as would be returned by the .shape attribute)?

arr1 = np.array([1, 2, 3])

(3,)

A NumPy array was created with the following line of code. What is the shape of the array?

arr1 = np.array([(10,15,45), (21,17,9)])

(2, 3)

Which expression will sum the second column of a 2D array named arr1?

np.sum(arr1[:, 1])

Which array reference would select element 23 from the following 2x2 NumPy array, named arr1?

18 23

21 9

arr1[0, 1]
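
The NumPy cards above can be checked directly. The following is a minimal sketch that assumes NumPy is installed; the variable names (`arr_1d`, `arr_2d`, `grid`) are chosen here for illustration and are not from the cards.

```python
import numpy as np

# The arrays from the shape and indexing cards (names are illustrative).
arr_1d = np.array([1, 2, 3])
arr_2d = np.array([(10, 15, 45), (21, 17, 9)])
grid = np.array([[18, 23], [21, 9]])

shape_1d = arr_1d.shape  # one axis of length 3
shape_2d = arr_2d.shape  # 2 rows, 3 columns

# [:, 1] selects every row of column index 1 (the second column).
second_col_sum = int(np.sum(arr_2d[:, 1]))

# [0, 1] selects row index 0, column index 1.
picked = int(grid[0, 1])

print(shape_1d, shape_2d, second_col_sum, picked)
```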

T/F:
The following two lines of code would retrieve the same column from a pandas data frame.

df['A']

df.A

True

T/F:
The following line of code returns a pandas data frame consisting of just one column extracted from pandas data frame df:

df['A']

False. It returns a pandas Series, not a data frame; df[['A']] returns a one-column data frame.

T/F:
The difference between categorical data and one-hot encoding is that categorical data can be represented in one column and one-hot encoding would require n columns, where n is the number of levels for the category.

True
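
As a rough illustration of the one-column versus n-column difference, here is a plain-Python sketch of one-hot encoding. The `one_hot` helper and the sample `colors` data are hypothetical and written only for this example; libraries such as pandas and scikit-learn provide their own encoders.

```python
def one_hot(values):
    """Return the sorted category levels and one 0/1 row per input value."""
    levels = sorted(set(values))
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

# One categorical column with 3 levels becomes 3 indicator columns.
colors = ["red", "green", "blue", "green"]
levels, encoded = one_hot(colors)

print(levels)
for row in encoded:
    print(row)
```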

T/F:
The following line of code trains the kNN algorithm.

knn = KNeighborsClassifier(n_neighbors=7)

False. This line only creates the classifier object; training happens when fit() is called on it.

T/F:
Instead of using Pandas or NumPy, we can use built-in Python lists for data and get similar execution times.

False

Which metric quantifies TP / (TP + FN)?

Recall

How to calculate recall?

TP / (TP + FN)

Which metric tries to adjust for the likelihood that the classifier guessed correctly by chance?

Kappa

How to calculate precision?

TP/(TP + FP)

Which metric quantifies TP / (TP + FP)?

Precision
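
The precision and recall formulas can be sketched in a few lines of Python. The counts below (40 true positives, 10 false positives, 20 false negatives) are made up for illustration.

```python
def precision(tp, fp):
    """Of everything predicted positive, the fraction that is truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything truly positive, the fraction the classifier found."""
    return tp / (tp + fn)

# Hypothetical confusion-matrix counts.
tp, fp, fn = 40, 10, 20
p = precision(tp, fp)  # 40 / 50
r = recall(tp, fn)     # 40 / 60

print(f"precision={p:.3f} recall={r:.3f}")
```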

In Bayes' theorem, P(data|class) is called the:

likelihood

The naive Bayes algorithm is 'naive' because:

It assumes predictors are independent

T/F: Logistic regression is used for regression tasks where the target is a real number.

False

T/F: The terms 'probability' and 'odds' refer to the same thing mathematically.

False.

What does "Probability" refer to mathematically?

The proportion of times an outcome occurs out of all possible outcomes. Range is 0 to 1.

What does "odds" refer to mathematically?

The ratio of the probability that an outcome occurs to the probability that it does not: p / (1 - p). Range is 0 to infinity.
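
To make the difference concrete, here is a small sketch that converts between the two. The function names are chosen for this example.

```python
def prob_to_odds(p):
    """Odds = P(outcome) / P(not outcome); requires 0 <= p < 1."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1 + odds)

odds = prob_to_odds(0.75)   # a 75% chance is odds of 3 to 1
back = odds_to_prob(odds)   # converts back to the original probability

print(odds, back)
```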

T/F: Logistic Regression and Naive Bayes are both considered to be linear classifiers.

True

T/F: Logistic regression is considered to be a generative classifier.

False

T/F: A direct mathematical solution can be found for the loss function in logistic regression.

False

T/F: If your classifier performs well on the training data and poorly on the test data, you have likely overfit the model.

True

Which activation function for a neural network is said to make training faster because it zeroes out negative values?

ReLU

Which activation function for a neural network scales the output to be between 0 and 1?

Sigmoid

Which activation function for a neural network scales the output to be between -1 and +1?

Hyperbolic Tangent

Which activation function for a neural network is only used for the final node of a regression problem?

Linear
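
The four activation functions named in the cards above can be written as scalar functions. This is a plain-Python sketch for intuition only; deep learning libraries apply vectorized versions of the same formulas.

```python
import math

def relu(x):
    """Zeroes out negative inputs and passes positive inputs through."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes any real input into the open interval (-1, 1)."""
    return math.tanh(x)

def linear(x):
    """Identity: leaves the value unchanged, as needed for a regression output."""
    return x

for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), sigmoid(value), tanh(value), linear(value))
```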

T/F: If you have a small amount of training data you should start out with a large neural network (many nodes and layers) and gradually reduce the network size.

False

T/F: The forward pass of a neural network is where weights are adjusted.

False

T/F: An epoch in neural network training is one forward and backward pass for one batch of the data.

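False. An epoch is one forward and backward pass through all of the training data; a pass for a single batch is one iteration (step).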

Which of these nodes in a neural network have a bias term? Select all that are true.

output node
hidden node

This activation is often used for binary classification problems.

Sigmoid

T/F: A neural network will always outperform Naive Bayes and Logistic Regression.

False

T/F: Adding weight regularization usually increases the weights of nodes.

False

T/F: Deep neural networks on small data sets should have many layers and nodes.

False

In Keras, this object represents one transformation step in a deep network.

layer

This activation is often used for multi-class classification problems.

Softmax
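
Softmax turns a list of raw class scores into probabilities that sum to 1, which is why it suits multi-class outputs. A minimal sketch, with made-up scores; subtracting the maximum score is a common numerical-stability step and does not change the result.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # stability trick; cancels out in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```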

In Keras, this object is used to define the network topology.

model

In Keras, this object is used to represent n-dimensional data.

tensor

T/F: An advantage of deep neural networks is that they can learn a representation of the data.

True

What is a convolution in mathematics?

the process of combining two functions to create a new function

In Keras, this defines one forward and backward pass through the data.

epoch
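
Assuming mini-batch training, one epoch is made up of several batch-sized steps. The arithmetic can be sketched as follows; the sample and batch sizes are made up.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every training sample once."""
    return math.ceil(n_samples / batch_size)

# Hypothetical setup: 1000 training rows, batches of 32.
steps = steps_per_epoch(1000, 32)  # the last batch is smaller than 32
total_steps = steps * 10           # weight updates over 10 epochs

print(steps, total_steps)
```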

T/F: The filter size of a CNN should be considered like a hyperparameter that should be tuned during training since it has a significant impact on performance.

True

A dropout layer in deep learning is used to reduce:

Overfitting

This type of model is often the first approach to a deep learning problem.

sequential

T/F: In CNN, multiple filters can learn different features from the data.

True

This type of model was developed to deal with the vanishing or exploding gradient problem.

LSTM

What is pooling in CNN?

Reduces the dimension/size of the data by simplifying it. Tries to maintain the most relevant information.

The convolution process in CNN results in data reduction.

True
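
The data reduction from convolution and pooling can be seen in one dimension. This sketch assumes no padding and a stride of 1 for the convolution, and non-overlapping windows for pooling; the `signal` and `kernel` values are made up. As in most CNN libraries, the "convolution" here slides the filter without flipping it.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel across the signal with no padding (stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def max_pool1d(values, size):
    """Keep the largest value from each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

signal = [1, 3, 2, 5, 4, 6]   # 6 inputs
kernel = [1, 0, -1]           # a simple difference filter
feature_map = conv1d_valid(signal, kernel)  # 6 - 3 + 1 = 4 outputs
pooled = max_pool1d(feature_map, 2)         # 4 / 2 = 2 outputs

print(feature_map, pooled)
```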

This type of model is often used with image/video or text data.

CNN

This type of model is often used with text or time series data.

RNN

This type of model runs fast because it combines the input and forget gates of more complex models.

GRU

What is a CNN?

Convolutional Neural Network. Slides a window (filter) across the data, which helps with recognizing patterns (especially in more than one dimension).

What is an RNN?

Recurrent Neural Network. Has memory or state which enables it to learn a sequence. Is looped to produce a new state at each iteration.

What is a GRU?

Gated recurrent unit. Improvement over RNN to help solve the vanishing gradient problem. Faster than LSTM.

What is an LSTM?

Long Short-term Memory. RNN variation which keeps the memory path independent of the backpropagation path, allowing info to be remembered and forgotten. Helps solve the vanishing gradient problem.

What is a sequential model?

Regular feed forward model. What you expect for deep learning: input layer, some hidden layers, and an output layer.

What is Kappa?

Metric. Tries to account for correct predictions generated by chance. Calculated as [Pr(a) - Pr(e)]/[1 - Pr(e)], where Pr(a) is the actual agreement and Pr(e) is the expected agreement based on the distribution of the classes. Usually falls between 0 and 1 (it can be negative when agreement is worse than chance), with low values = poor agreement and high values = high agreement.
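
A worked sketch of the Kappa formula for a binary confusion matrix follows. The counts are hypothetical, and the expected agreement Pr(e) is computed from the row and column totals, which is the usual Cohen's kappa convention.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Kappa = (Pr(a) - Pr(e)) / (1 - Pr(e)) for a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    pr_a = (tp + tn) / total  # observed agreement (accuracy)
    # Chance agreement: both say positive, plus both say negative.
    pr_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    return (pr_a - pr_e) / (1 - pr_e)

# Hypothetical counts: accuracy is 0.7, chance agreement is 0.5.
kappa = cohen_kappa(tp=40, fp=10, fn=20, tn=30)
print(round(kappa, 3))
```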

What is the ROC curve?

Receiver operating characteristic curve. Plots the true positive rate (y-axis) against the false positive rate (x-axis). We want the curve to shoot toward the upper left.

What is Logistic Regression?

A method of performing classification. It fits a linear decision boundary, passes the linear combination of the predictors through a sigmoid function to get a probability, and applies a cutoff to assign the class. It works best with binary targets, but OvA extends it to multi-class problems.
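
The prediction step of logistic regression can be sketched in plain Python. The weights and bias below are made-up values standing in for fitted parameters; in practice they would be learned from training data.

```python
import math

def predict_proba(features, weights, bias):
    """Sigmoid of the linear combination: estimated P(class = 1 | features)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias, cutoff=0.5):
    """Assign class 1 when the probability reaches the cutoff, else class 0."""
    return 1 if predict_proba(features, weights, bias) >= cutoff else 0

# Hypothetical parameters (not fitted to any real data).
weights = [1.5, -2.0]
bias = 0.25

print(predict_proba([2.0, 0.5], weights, bias), predict([2.0, 0.5], weights, bias))
print(predict_proba([0.0, 1.0], weights, bias), predict([0.0, 1.0], weights, bias))
```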

How do you use logistic regression for multi-class classification?

Use OvA (One versus All), which builds n classifiers for n classes, each set up as one class versus all of the others.

How are Naive Bayes and Logistic Regression different?

NB is considered a generative classifier because it learns P(Y) as well as P(X|Y), which together describe how the data were generated. Logistic Regression is considered a discriminative classifier because it directly learns P(Y|X) from the data.

What is Bias? and what is its trend with model complexity?

Bias is the tendency of an algorithm to make assumptions about the shape of the data. Decreases as the model gets more complex.

What is Variance? and what is its trend with model complexity?

Variance is the sensitivity of an algorithm to noise in the data. Increases as the model gets more complex.

How to do Naive Bayes in sklearn?

MultinomialNB(). Has an alpha setting (for Laplace smoothing), class_prior for setting class priors, and fit_prior, which learns priors from the data if set to True.

How to do a confusion matrix in sklearn?

from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, pred))

How to do a classification report in sklearn?

from sklearn.metrics import classification_report
print(classification_report(y_test, pred))

How to calculate accuracy?

[TP + TN]/[all results]

What is the Conjugate Prior?

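A prior distribution that, when combined with a given likelihood, produces a posterior in the same family as the prior. For example, the Beta is the conjugate prior of the binomial, and the Dirichlet is the conjugate prior of the multinomial. Like any probability distribution, it integrates to 1.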
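
The Beta and binomial pair from the earlier cards gives a concrete case: with a Beta prior and binomial data, the posterior is again a Beta, so updating reduces to adding counts. A minimal sketch with made-up numbers; the function names are chosen for this example.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Beta(alpha, beta) prior + binomial data -> Beta posterior parameters."""
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior Beta(2, 2), then 7 successes observed in 10 trials.
prior = (2, 2)
posterior = beta_binomial_update(*prior, successes=7, trials=10)

print(posterior, beta_mean(*prior), beta_mean(*posterior))
```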

(81 cards)