Machine Learning NanoDegree Flashcards

1
Q

apply bias and variance to ( underfitting || overfitting )

A

bias - underfitting, variance - overfitting

2
Q

define the harmonic mean for (x, y)

A

2xy / (x + y)

3
Q

what is the f1 score

A

the harmonic mean of precision and recall; it raises a flag if either value is small

4
Q

what is precision

A

the percent of labeled positives that are actually positive

5
Q

what is recall

A

the percent of actual positives that are labeled positive

6
Q

What is fbeta score?

A

an F1 score generalized to allow biasing towards either precision or recall: beta = 1 gives the harmonic mean (F1), beta > 1 weights towards recall, 0 < beta < 1 weights towards precision

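The precision, recall, F1, and F-beta cards above can be sketched in plain Python; the TP/FP/FN counts below are invented for illustration.

```python
# Illustrative sketch: precision, recall, and F-beta from raw counts.

def precision(tp, fp):
    # fraction of predicted positives that are actually positive
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of actual positives that were predicted positive
    return tp / (tp + fn)

def f_beta(p, r, beta=1.0):
    # beta=1 gives the harmonic mean (F1); beta > 1 weights recall,
    # 0 < beta < 1 weights precision
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

tp, fp, fn = 8, 2, 4
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f_beta(p, r))  # F1 is the harmonic mean of p and r
```
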
7
Q

what is an ROC curve and how do you interpret it?

A

a plot of the true positive rate against the false positive rate as the classification threshold varies. The area under the curve close to 1 is good; 0.5 is equivalent to random guessing.

8
Q

What is r2 score and how do you interpret it?

A

one minus the ratio of a regression model's squared error to the squared error of simply predicting the average of all the points. Close to 1 is good; close to 0 means the model is no better than the mean.

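A minimal sketch of the R² card's idea (1 minus residual error over the error of predicting the mean); the data points are invented for illustration.

```python
# Hedged sketch: R^2 compares a model's squared error to the squared
# error of always predicting the mean of the targets.

def r2_score(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, y_true))                 # perfect fit -> 1.0
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))   # predicting the mean -> 0.0
```
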
9
Q

What is the point of having a bias node in a NN layer?

A

To provide the constant or intercept.

10
Q

What is the ‘perceptron trick’ to get a line to move closer to a point?

A

if a point is labeled negative but classified positive, subtract the point vector (with a 1 appended for the bias) times the learning rate from the line's coefficients; if it is labeled positive but classified negative, add it.

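The perceptron trick above can be sketched as a single update step; the weights, point, and learning rate below are illustrative.

```python
# Sketch of the perceptron trick: nudge the line toward a
# misclassified point.

def perceptron_step(w, b, x, label, lr=0.1):
    # prediction: 1 if w.x + b >= 0, else 0
    pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
    if pred == 1 and label == 0:    # negative point classified positive
        w = [wi - lr * xi for wi, xi in zip(w, x)]
        b -= lr                     # bias uses an implicit input of 1
    elif pred == 0 and label == 1:  # positive point classified negative
        w = [wi + lr * xi for wi, xi in zip(w, x)]
        b += lr
    return w, b

w, b = perceptron_step([1.0, 1.0], 0.0, [2.0, 1.0], label=0)
print(w, b)  # the line moves away from the misclassified negative point
```
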
11
Q

What is the formula for multi-class entropy?

A

-sum( p[i] * log2(p[i]) for i in classes )

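A minimal sketch of the multi-class entropy formula in bits, assuming `p` is a list of class probabilities summing to 1 (the distributions are invented for illustration).

```python
import math

def entropy(p):
    # note the leading minus sign: probabilities are <= 1, so their
    # logs are <= 0 and the sum must be negated
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin
print(entropy([1.0]))         # 0.0: no uncertainty
print(entropy([0.25] * 4))    # 2.0 bits: four equally likely classes
```
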
12
Q

what does ‘naive’ refer to in naive bayes?

A

assuming that all variables are independent.

13
Q

a function must be ___ not ___ in order to be optimized

A

continuous, discrete

14
Q

describe l2 regularization, including its alternate name

A

also called ridge regression, l2 regularization adds the square of the coefficients to the cost function, perhaps scaled by lambda. This works to penalize the model for being too complex and reduce overfitting.

15
Q

describe l1 regularization, including its alternate name

A

also called lasso regression, l1 regularization adds the absolute value of the coefficients to the cost function, perhaps scaled by lambda. This works to penalize the model for being too complex and reduce overfitting. Reduces less important features to 0 and thus may be suitable for feature selection.

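The two regularization cards above differ only in the penalty term added to the cost; a minimal sketch, with `lam` and the weights chosen for illustration:

```python
# Sketch of the L1 (lasso) and L2 (ridge) penalty terms.

def l1_penalty(weights, lam=0.1):
    # sum of absolute values: tends to push small weights to exactly zero
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=0.1):
    # sum of squares: shrinks weights smoothly, rarely to exactly zero
    return lam * sum(w * w for w in weights)

weights = [3.0, -0.5, 0.0]
print(l1_penalty(weights))  # 0.1 * 3.5  = 0.35
print(l2_penalty(weights))  # 0.1 * 9.25 = 0.925
```
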
16
Q

What is a polynomial kernel and what does degree refer to?

A

A polynomial kernel projects two-dimensional data into five dimensions by adding the terms x^2, xy, and y^2. Higher-degree polynomials add more exponents and combinations, and therefore more dimensions.

17
Q

is softmax for n=2 the same as sigmoid activation ?

A

yes

18
Q

how is softmax defined?

A

for class i, softmax(z_i) = e^(z_i) / (e^(z_1) + … + e^(z_n))
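
A minimal sketch of the formula, plus a numeric check of the previous card's claim that softmax over two classes matches sigmoid; the input scores are invented.

```python
import math

def softmax(z):
    # exponentiate each score, then normalize so the outputs sum to 1
    exps = [math.exp(zi) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

probs = softmax([2.0, 1.0])
print(probs)                  # a valid distribution: sums to 1
print(sigmoid(2.0 - 1.0))     # equals softmax([2.0, 1.0])[0]
```
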

19
Q

What is cross entropy ?

A

the negative log of the predicted probability of the true class, -ln(P); with one-hot labels summed over classes this is -sum( y_i * ln(p_i) )
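
A minimal sketch, assuming a one-hot true label and a predicted probability distribution (values invented); for a single correct class it reduces to -ln(P).

```python
import math

def cross_entropy(y_true, y_pred):
    # y_true is one-hot, so only the true class's predicted
    # probability contributes to the sum
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

print(cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))  # == -ln(0.7)
```
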

20
Q

What is the chain rule?

A

the derivative of a composition of functions equals the product of the derivatives of the individual functions, each evaluated along the chain

21
Q

What is a monotonic function?

A

A function which is either entirely non-decreasing or non-increasing.

22
Q

What is early stopping?

A

Stop training when the cross-validation error starts to increase
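
Early stopping can be sketched with a patience counter; the validation-error sequence and patience value below are invented for illustration.

```python
# Sketch: stop once validation error has failed to improve for
# `patience` consecutive epochs.

def early_stop_epoch(val_errors, patience=2):
    best, since_best = float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, since_best = err, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch  # stop here
    return len(val_errors) - 1

print(early_stop_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.8]))  # stops at epoch 4
```
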

23
Q

how do you use a nn for regression instead of classification?

A

remove the final activation function and return the raw output of the last layer.