Midterm Flashcards

(21 cards)

1
Q

How is multiclass classification connected to Logistic Regression?

A

In multiclass classification, a similar method, called the softmax function, is used to convert predicted outputs into probabilities. Each class has its own predicted output, or "score," which is the dot product between the input vector x and that class's parameter vector theta. These scores are then converted to probabilities by the softmax function, ensuring that they sum to one.
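As a minimal sketch of the idea (the input vector, three classes, and all parameter values below are invented for illustration):

```python
import numpy as np

def softmax(scores):
    # subtract the max score for numerical stability; the resulting
    # probabilities are unchanged by this shift
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

x = np.array([1.0, 2.0])          # input vector
Theta = np.array([[0.5, -0.2],    # one parameter vector (row) per class
                  [0.1, 0.3],
                  [-0.4, 0.8]])
scores = Theta @ x                # one score (dot product) per class
probs = softmax(scores)
print(probs, probs.sum())         # probabilities sum to 1
```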

2
Q

How do you train a multiclass classifier?

A
3
Q

How do you estimate parameters for likelihood functions using Bayes' rule? What parameters need to be calculated to obtain P(Y|x)?

A

The parameters we need to estimate are the class priors (P(Y): the probability of each class in the training data) and the likelihoods (P(X|Y): obtained by taking the product of the probabilities of the individual features).
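A toy sketch of estimating both quantities by counting (the feature values, labels, and tiny dataset are invented for illustration):

```python
from collections import Counter

# hypothetical labeled data: (features, label) pairs
data = [({"color": "red", "shape": "round"}, "apple"),
        ({"color": "red", "shape": "round"}, "apple"),
        ({"color": "yellow", "shape": "long"}, "banana")]

# class priors P(Y): relative frequency of each label
label_counts = Counter(label for _, label in data)
n = len(data)
priors = {label: count / n for label, count in label_counts.items()}

# likelihoods P(X_i = v | Y): frequency of each feature value within a class
feature_counts = Counter()
for features, label in data:
    for name, value in features.items():
        feature_counts[(label, name, value)] += 1
likelihoods = {key: count / label_counts[key[0]]
               for key, count in feature_counts.items()}

print(priors["apple"])                          # 2/3
print(likelihoods[("apple", "color", "red")])   # 1.0
```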

4
Q

How do you handle continuous and discrete X in NB?

A
  • for discrete features, we calculate P(X|Y) using frequency counts (e.g., how many times each feature value appears in class Y)
  • for continuous features, we assume a parametric distribution for P(X|Y) (e.g., Gaussian) and estimate the parameters of that distribution
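A minimal sketch of the continuous case under a Gaussian assumption: estimate the mean and variance of the feature per class, then evaluate the Gaussian density as the likelihood. The measurements and class names below are invented.

```python
import math
import statistics

# hypothetical continuous feature (e.g., height) observed per class
heights = {"cat": [23.0, 25.5, 24.0], "dog": [50.0, 55.0, 60.0]}

# estimate (mean, variance) of the assumed Gaussian for each class
params = {label: (statistics.mean(vals), statistics.pvariance(vals))
          for label, vals in heights.items()}

def gaussian_pdf(x, mean, var):
    # Gaussian density used as the likelihood P(X = x | Y)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# likelihood of observing the value 24.0 under each class
for label, (mean, var) in params.items():
    print(label, gaussian_pdf(24.0, mean, var))
```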
5
Q

How many independent parameters do we need to estimate to compute the joint probabilities? How does the NB assumption improve this?

A
  • without the NB assumption (independence), estimating the full joint distribution would require a number of parameters exponential in the number of features.
  • with the NB assumption (independence), the joint likelihood becomes the product of the conditional probabilities of each feature, so the total number of parameters grows linearly with the number of features.
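A quick count makes the contrast concrete. Assuming (hypothetically) d binary features and 2 classes:

```python
d = 20       # number of binary features (hypothetical)
classes = 2  # number of classes

# full joint P(X1..Xd | Y): one probability per feature configuration per
# class, minus one per class for the normalization constraint
full_joint = classes * (2 ** d - 1)

# Naive Bayes: one Bernoulli parameter per feature per class
naive_bayes = classes * d

print(full_joint)   # 2097150
print(naive_bayes)  # 40
```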
6
Q

What are the subtleties of Naive Bayes?

A
  • Independence assumption: in practice, features are rarely completely independent. However, NB often performs well even when this assumption is violated.
7
Q

How is model complexity connected to bias, variance, and test error?

A
  • A model that is too complex will overfit: it learns the patterns in the training data too well and fails to generalize to unseen cases. In this case, the model has high variance and low bias, resulting in low training error and high test error.
  • If a model is too simple, it will fail to capture the patterns in the training data, resulting in high bias and low variance. This also results in high test error.
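A toy illustration of the trade-off, assuming an invented smooth target function and polynomial models of increasing degree (all values here are made up for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)        # hypothetical true function

x_train = np.linspace(0, 1, 15)
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)  # noisy samples
x_test = np.linspace(0, 1, 100)
y_test = f(x_test)

errors = {}
for degree in (1, 3, 9):                   # simple -> complex models
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errors[degree] = (train_err, test_err)
    print(degree, train_err, test_err)
```

Training error can only go down as the degree grows (the models are nested), while test error eventually rises again once the model starts fitting noise.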
8
Q

How do L1 and L2 regularization affect classifiers?

A

L1 and L2 regularization simplify classifiers by penalizing the magnitude of the parameters.
- L1 regularization (LASSO) can drive parameters completely to zero.
- L2 regularization (Ridge) shrinks parameters but does not bring them completely to zero.
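A small sketch of the contrast, using the closed-form shrinkage updates the two penalties induce on a weight vector (the weights and the regularization strength are invented values, not tied to any particular library):

```python
import numpy as np

w = np.array([3.0, 0.5, -0.05, 2.0])  # hypothetical learned weights
lam = 0.1                             # hypothetical regularization strength

# L1 (lasso) soft-thresholding: weights smaller than lam become exactly zero
w_l1 = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# L2 (ridge) shrinkage: every weight shrinks, but none reaches exactly zero
w_l2 = w / (1.0 + lam)

print(w_l1)  # the -0.05 entry becomes exactly 0
print(w_l2)  # all entries nonzero, just smaller in magnitude
```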

9
Q

How does cross-validation help us generalize better?

A

Instead of relying on a single train/test split, cross-validation divides the data into multiple folds. The model is trained on all but one fold and validated on the held-out fold, rotating the held-out fold so that every data point is used for both training and validation.
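The rotation can be sketched as follows, using a deliberately trivial "model" (predict the training mean) and invented data:

```python
import numpy as np

def k_fold_scores(x, y, k, fit, score):
    # split indices into k folds; each fold is the validation set once
    folds = np.array_split(np.arange(len(x)), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(x[train_idx], y[train_idx])
        scores.append(score(model, x[val_idx], y[val_idx]))
    return scores

x = np.arange(10, dtype=float)
y = 2 * x + 1                                     # toy targets
fit = lambda xt, yt: yt.mean()                    # toy model: training mean
score = lambda m, xv, yv: np.mean((yv - m) ** 2)  # MSE on the held-out fold
scores = k_fold_scores(x, y, 5, fit, score)
print(np.mean(scores))  # averaged estimate of generalization error
```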

10
Q

What is the cost function, and what is the log-likelihood, of Logistic Regression?

A
11
Q

How do you obtain the gradient descent update rule from the cost function?

A
12
Q

How do you get to the log-likelihood from h(x)?

A
13
Q

Why do we need 0-1 and perceptron loss?

A
14
Q

What is cross validation?

A

Cross-validation is a method to determine how well a model performs on an independent dataset. It involves dividing the data into multiple folds or subsets, using one as the validation set and the remaining folds to train the model. This process is repeated multiple times, using a different fold as the validation set each time. Finally, the results from each validation set are averaged to produce a robust estimate of the model's performance.

15
Q

How can cross-validation be used to choose the value of a model parameter P?

A
16
Q

True or False: When evaluating ML algorithms, steeper learning curves are better?

17
Q

How does the perceptron algorithm work?

18
Q

What is a Support Vector Machine?

A

SVM is an ML algorithm used for classification and regression. It finds the best line (or hyperplane) to separate data into groups, maximizing the distance (margin) between the hyperplane and the closest points (the support vectors).
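As a rough sketch (not a full SVM solver), a linear SVM can be trained by subgradient descent on the hinge loss. The 2-D points, labels, learning rate, and regularization strength below are all invented for illustration:

```python
import numpy as np

# hypothetical linearly separable data with labels in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
b = 0.0
lam, lr = 0.01, 0.1  # regularization strength and learning rate (invented)
for _ in range(200):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) < 1:       # point inside margin or misclassified
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:                           # only the regularizer acts
            w -= lr * lam * w

preds = np.sign(X @ w + b)
print(preds)  # should match y on this separable toy set
```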

19
Q

What is Bayes rule for classification?

20
Q

How do you train Naive Bayes classifiers?