Midterm Flashcards

(21 cards)

1
Q

How is multiclass classification connected to Logistic Regression?

A

In multiclass classification, a similar method, called the softmax function, is used to convert predicted outputs into probabilities. Each class has its own predicted output, or "score," which is the dot product between the input vector x and that class's parameter vector theta. These scores are then converted to probabilities by the softmax function, ensuring that they sum to one.
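As a minimal sketch of the idea (the input vector, three classes, and all parameter values below are invented for illustration):

```python
import numpy as np

def softmax(scores):
    # subtract the max score for numerical stability; the resulting
    # probabilities are unchanged by this shift
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

x = np.array([1.0, 2.0])          # input vector
Theta = np.array([[0.5, -0.2],    # one parameter vector (row) per class
                  [0.1, 0.3],
                  [-0.4, 0.8]])
scores = Theta @ x                # one score (dot product) per class
probs = softmax(scores)
print(probs, probs.sum())         # probabilities sum to 1
```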

2
Q

How do you train a multiclass classifier?

A
3
Q

How do you estimate parameters for likelihood functions using Bayes' rule? What parameters need to be calculated to obtain P(Y|x)?

A

The parameters we need to estimate are the class priors (P(Y): the probability of each class in the training data) and the likelihoods (P(X|Y): obtained by taking the product of the probabilities of the individual features).
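A toy sketch of estimating both quantities by counting (the feature values, labels, and tiny dataset are invented for illustration):

```python
from collections import Counter

# hypothetical labeled data: (features, label) pairs
data = [({"color": "red", "shape": "round"}, "apple"),
        ({"color": "red", "shape": "round"}, "apple"),
        ({"color": "yellow", "shape": "long"}, "banana")]

# class priors P(Y): relative frequency of each label
label_counts = Counter(label for _, label in data)
n = len(data)
priors = {label: count / n for label, count in label_counts.items()}

# likelihoods P(X_i = v | Y): frequency of each feature value within a class
feature_counts = Counter()
for features, label in data:
    for name, value in features.items():
        feature_counts[(label, name, value)] += 1
likelihoods = {key: count / label_counts[key[0]]
               for key, count in feature_counts.items()}

print(priors["apple"])                          # 2/3
print(likelihoods[("apple", "color", "red")])   # 1.0
```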

4
Q

How do you handle continuous and discrete X in NB?

A
  • for discrete features, we calculate P(X|Y) using frequency counts (e.g., how many times each feature value appears in class Y)
  • for continuous features, we assume a parametric distribution for P(X|Y) (e.g., Gaussian) and estimate the parameters of that distribution
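A minimal sketch of the continuous case under a Gaussian assumption: estimate the mean and variance of the feature per class, then evaluate the Gaussian density as the likelihood. The measurements and class names below are invented.

```python
import math
import statistics

# hypothetical continuous feature (e.g., height) observed per class
heights = {"cat": [23.0, 25.5, 24.0], "dog": [50.0, 55.0, 60.0]}

# estimate (mean, variance) of the assumed Gaussian for each class
params = {label: (statistics.mean(vals), statistics.pvariance(vals))
          for label, vals in heights.items()}

def gaussian_pdf(x, mean, var):
    # Gaussian density used as the likelihood P(X = x | Y)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# likelihood of observing the value 24.0 under each class
for label, (mean, var) in params.items():
    print(label, gaussian_pdf(24.0, mean, var))
```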
5
Q

How many independent parameters do we need to estimate to compute the joint probabilities? How does the NB assumption improve this?

A
  • without the NB assumption (independence), estimating the full joint distribution would require a number of parameters exponential in the number of features.
  • with the NB assumption (independence), the joint likelihood becomes the product of the conditional probabilities of each feature, so the total number of parameters grows linearly with the number of features.
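A quick count makes the contrast concrete. Assuming (hypothetically) d binary features and 2 classes:

```python
d = 20       # number of binary features (hypothetical)
classes = 2  # number of classes

# full joint P(X1..Xd | Y): one probability per feature configuration per
# class, minus one per class for the normalization constraint
full_joint = classes * (2 ** d - 1)

# Naive Bayes: one Bernoulli parameter per feature per class
naive_bayes = classes * d

print(full_joint)   # 2097150
print(naive_bayes)  # 40
```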
6
Q

What are the subtleties of Naive Bayes?

A
  • Independence assumption: in practice, features are rarely completely independent. However, NB often performs well even when this assumption is violated.
7
Q

How is model complexity connected to bias, variance, and test error?

A
  • A model that is too complex will overfit: it learns the patterns in the training data too well and fails to generalize to unseen cases. In this case, the model has high variance and low bias, resulting in low training error and high test error.
  • If a model is too simple, it will fail to capture the patterns in the training data, resulting in high bias and low variance. This also results in high test error.
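A toy illustration of the trade-off, assuming an invented smooth target function and polynomial models of increasing degree (all values here are made up for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)        # hypothetical true function

x_train = np.linspace(0, 1, 15)
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)  # noisy samples
x_test = np.linspace(0, 1, 100)
y_test = f(x_test)

errors = {}
for degree in (1, 3, 9):                   # simple -> complex models
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errors[degree] = (train_err, test_err)
    print(degree, train_err, test_err)
```

Training error can only go down as the degree grows (the models are nested), while test error eventually rises again once the model starts fitting noise.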
8
Q

How do L1 and L2 regularization affect classifiers?

A

L1 and L2 regularization simplify classifiers by penalizing the magnitude of the parameters.
- L1 regularization (LASSO) can drive parameters completely to zero.
- L2 regularization (Ridge) shrinks parameters but does not bring them completely to zero.
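A small sketch of the contrast, using the closed-form shrinkage updates the two penalties induce on a weight vector (the weights and the regularization strength are invented values, not tied to any particular library):

```python
import numpy as np

w = np.array([3.0, 0.5, -0.05, 2.0])  # hypothetical learned weights
lam = 0.1                             # hypothetical regularization strength

# L1 (lasso) soft-thresholding: weights smaller than lam become exactly zero
w_l1 = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# L2 (ridge) shrinkage: every weight shrinks, but none reaches exactly zero
w_l2 = w / (1.0 + lam)

print(w_l1)  # the -0.05 entry becomes exactly 0
print(w_l2)  # all entries nonzero, just smaller in magnitude
```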

9
Q

How does cross-validation help us generalize better?

A

Instead of relying on a single train/test split, cross-validation divides the data into multiple folds. The model is trained on all but one fold and validated on the held-out fold, rotating the held-out fold so that every data point is used for both training and validation.
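The rotation can be sketched as follows, using a deliberately trivial "model" (predict the training mean) and invented data:

```python
import numpy as np

def k_fold_scores(x, y, k, fit, score):
    # split indices into k folds; each fold is the validation set once
    folds = np.array_split(np.arange(len(x)), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(x[train_idx], y[train_idx])
        scores.append(score(model, x[val_idx], y[val_idx]))
    return scores

x = np.arange(10, dtype=float)
y = 2 * x + 1                                     # toy targets
fit = lambda xt, yt: yt.mean()                    # toy model: training mean
score = lambda m, xv, yv: np.mean((yv - m) ** 2)  # MSE on the held-out fold
scores = k_fold_scores(x, y, 5, fit, score)
print(np.mean(scores))  # averaged estimate of generalization error
```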

10
Q

What is the cost function, and what is the log-likelihood, of Logistic Regression?

A
11
Q

How do you obtain the gradient descent update rule from the cost function?

A
12
Q

How do you get to the log-likelihood from h(x)?

A
13
Q

Why do we need 0-1 and perceptron loss?

A
14
Q

What is cross validation?

A

Cross-validation is a method to determine how well a model performs on an independent dataset. It involves dividing the data into multiple folds or subsets, using one as the validation set and the remaining folds to train the model. This process is repeated multiple times, using a different fold as the validation set each time. Finally, the results from each validation set are averaged to produce a robust estimate of the model's performance.

15
Q

How can cross-validation be used to choose the value of a model parameter P?

A
16
Q

True or False: When evaluating ML algorithms, steeper learning curves are better?

17
Q

How does the perceptron algorithm work?

18
Q

What is a Support Vector Machine?

A

SVM is an ML algorithm used for classification and regression. It finds the best line (or hyperplane) to separate data into groups, maximizing the distance (margin) between the hyperplane and the closest points (the support vectors).
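As a rough sketch (not a full SVM solver), a linear SVM can be trained by subgradient descent on the hinge loss. The 2-D points, labels, learning rate, and regularization strength below are all invented for illustration:

```python
import numpy as np

# hypothetical linearly separable data with labels in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
b = 0.0
lam, lr = 0.01, 0.1  # regularization strength and learning rate (invented)
for _ in range(200):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) < 1:       # point inside margin or misclassified
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:                           # only the regularizer acts
            w -= lr * lam * w

preds = np.sign(X @ w + b)
print(preds)  # should match y on this separable toy set
```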

19
Q

What is Bayes rule for classification?

20
Q

How do you train Naive Bayes classifiers?