General Flashcards

1
Q

What does the superscript within parentheses mean?

e.g. X^(i)

A

The ith row of X

2
Q

What does the symbol ∀ mean?

A

“for all” or “for any”

3
Q

In a joint probability distribution table, how many rows are there?

A

One for each possible combination of variable values

4
Q

Given a joint probability distribution, what can we calculate?

A

Conditional or joint probabilities over any subset of the variables
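
As a concrete illustration of cards 3 and 4, here is a small Python sketch (the Rain/Traffic variables and probabilities are invented): a joint distribution table with one entry per combination of values, from which a marginal and a conditional probability are derived.

# Joint distribution P(Rain, Traffic): one entry per combination, entries sum to 1.
joint = {
    (True, True): 0.24,
    (True, False): 0.06,
    (False, True): 0.21,
    (False, False): 0.49,
}

def marginal(index, value):
    # P(variable at position `index` == value): sum out the other variable(s)
    return sum(p for combo, p in joint.items() if combo[index] == value)

def p_rain_given_traffic(rain, traffic):
    # P(Rain = rain | Traffic = traffic) = P(Rain, Traffic) / P(Traffic)
    return joint[(rain, traffic)] / marginal(1, traffic)

print(marginal(0, True))                 # P(Rain = True)                 -> 0.30
print(p_rain_given_traffic(True, True))  # P(Rain = True | Traffic = True) -> 0.24 / 0.45 ≈ 0.533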

5
Q

What’s the conditional probability equation?

A

P(A | B) = P(A, B) / P(B)
6
Q
  • What is bias?
  • What is variance?
A
  • Bias: the inability of an ML method to capture the true relationship
  • Variance: the amount by which the fit changes when the method is trained on different data sets
7
Q

What does high bias correspond to? (2)

A

High bias → underfitting → more train set error

8
Q

What does high variance correspond to?

A

High variance → overfitting → more dev set error and more test set error
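
A small sketch tying cards 7 and 8 together (the sin(3x) data and the polynomial degrees are invented for illustration): a model that is too simple underfits, a model that is too flexible overfits.

import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 40)
y_train = np.sin(3 * x_train) + rng.normal(scale=0.2, size=x_train.size)
x_dev = rng.uniform(-1, 1, 40)
y_dev = np.sin(3 * x_dev) + rng.normal(scale=0.2, size=x_dev.size)

for degree in (1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    dev_err = np.mean((np.polyval(coeffs, x_dev) - y_dev) ** 2)
    print(degree, round(train_err, 3), round(dev_err, 3))

# A degree-1 line cannot capture sin(3x): training error stays high (high bias / underfitting).
# The degree-10 polynomial drives training error down; the gap between dev and
# training error reflects variance (overfitting).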

9
Q

What is the development set (aka dev set)?

A

It’s another term for “validation set”

10
Q

What’s a useful way to think of bias?

A

How well does my model fit the training data?

11
Q

What’s a useful way to think of variance?

A

How well does my model generalize to unseen data sets?

12
Q

Can you have high bias and high variance?

A

Yes. For example, a model can underfit in some regions of the input space while overfitting in others.

13
Q

Given training error and validation error, how can you assess the bias and variance?

A
  • Training error can tell you the bias.
  • How much higher your validation error is than the training error can tell you the variance.
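A toy sketch of this rule of thumb (the 5% “acceptable error” threshold is an invented stand-in for the desired or human-level error):

def diagnose(train_error, val_error, acceptable_error=0.05):
    notes = []
    if train_error > acceptable_error:
        notes.append("high bias (underfitting): training error itself is high")
    if val_error - train_error > acceptable_error:
        notes.append("high variance (overfitting): validation error is much higher than training error")
    return notes or ["looks reasonable"]

print(diagnose(train_error=0.15, val_error=0.16))  # high bias
print(diagnose(train_error=0.01, val_error=0.12))  # high variance
print(diagnose(train_error=0.15, val_error=0.30))  # both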
14
Q

Why can’t we use the validation set for testing performance?

A
  • Because we used the validation set to tune our model parameters. If we also tested on the validation set, we couldn’t tell whether the gains from tuning generalize to unseen data or merely reflect overfitting to the validation set
15
Q
  • What is the validation set?
  • What is the test set?
A
  • Validation set: the set you use to pick the best parameters or model
  • Test set: used only to get a metric of how well your model performs on unseen data
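
A minimal sketch of carving the three sets out of one data set with scikit-learn (the 60/20/20 proportions and the make_classification data are assumed for illustration):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First split off the test set, then split the remainder into train and validation.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=0)
# Result: 60% train (fit the model), 20% validation (pick parameters/models),
# 20% test (final, one-time estimate of performance on unseen data).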
16
Q

What models can we use higher-order features for?

A

Non-exhaustive list:

  • Logistic regression
  • Linear regression
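
For instance, here is a sketch (assuming scikit-learn; the two-moons data is only an illustration) of logistic regression fitting a non-linear decision boundary once higher-order features are added:

from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# Adding degree-3 terms (x1^2, x1*x2, x1^3, ...) lets logistic regression
# learn a curved decision boundary instead of a straight line.
model = make_pipeline(PolynomialFeatures(degree=3), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))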
17
Q

What defines a decision boundary?

A

The hypothesis h and its parameters θ

18
Q

What do we know about the update rule for logistic regression?

A

Looks very similar to the update rule for linear regression, except the hypothesis is different
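
For reference, in standard course notation (learning rate α, m training examples), the shared gradient descent update and the two hypotheses are:

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}

h_\theta(x) = \theta^{\mathsf{T}} x \quad \text{(linear regression)}, \qquad
h_\theta(x) = \frac{1}{1 + e^{-\theta^{\mathsf{T}} x}} \quad \text{(logistic regression)}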

19
Q

Does feature scaling work for logistic regression?

A

Yes. As with linear regression, scaling the features helps gradient descent converge faster.
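
A sketch of how scaling is usually wired in with scikit-learn (the breast-cancer data set is just an assumed example):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Standardizing the features first typically lets the solver converge in fewer iterations.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))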

20
Q

How does multiclass logistic regression work?

A
  • It’s called one-vs-all (one-vs-rest) classification
  • Train multiple binary classifiers, one for each class
  • For each class, take the binary logistic regression prediction for each point
    • Then choose the class with the highest probability output (see the sketch below)
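
A minimal sketch of one-vs-all with scikit-learn (the iris data set and the explicit per-class loop are just for illustration; in practice LogisticRegression can handle multiclass directly):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Train one binary logistic regression per class: "this class" (1) vs "all others" (0).
classifiers = [LogisticRegression(max_iter=1000).fit(X, (y == k).astype(int)) for k in classes]

# For each point, take each classifier's probability of "this class" and pick the max.
scores = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
predictions = classes[np.argmax(scores, axis=1)]
print((predictions == y).mean())  # training accuracy of the one-vs-all scheme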