Final Test Flashcards
What does Entropy mean?
It's the degree of disorder: how random a set is.
Entropy's formula
Entropy(S) = -sum over each class of p x log2(p), where p is the class's proportion
Example: Entropy([6+,2-])
= -(6/8)log2(6/8) - (2/8)log2(2/8)
= 0.8113
Entropy([0+,4-]) with log2
0
Entropy([4+,4-]) with log2
1
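As a quick check, a minimal Python sketch of this calculation (the helper name `entropy` is my own):

```python
import math

def entropy(counts):
    """Entropy of a label distribution, e.g. [6, 2] for [6+, 2-]."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # 0 * log2(0) is treated as 0
    return sum(-p * math.log2(p) for p in probs)

print(entropy([6, 2]))  # ~0.8113
print(entropy([0, 4]))  # 0.0 (pure set)
print(entropy([4, 4]))  # 1.0 (maximum disorder for two classes)
```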
What are the main concepts of Backpropagation?
It optimizes the Weights and Biases of a Neural Network.
It starts from the last parameter and works its way backward toward the input.
What's the Chain Rule?
dSize/dFat = (dSize/dHeight) x (dHeight/dFat)
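A hedged numeric sketch of the chain rule; the linear Fat -> Height -> Size relationships and their coefficients below are invented for illustration:

```python
def height(fat):   # invented relationship: dHeight/dFat = 2
    return 2 * fat + 1

def size(h):       # invented relationship: dSize/dHeight = 3
    return 3 * h + 5

# Chain rule predicts dSize/dFat = 3 * 2 = 6; check with a finite difference.
fat, eps = 1.0, 1e-6
numeric = (size(height(fat + eps)) - size(height(fat))) / eps
print(numeric)  # ~6.0
```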
Sigmoid function's formula
1 / (1 + pow(e, -x))
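In Python, a minimal version of this:

```python
import math

def sigmoid(x):
    # 1 / (1 + e^(-x)): squashes any input into the range (0, 1)
    return 1 / (1 + math.exp(-x))

print(sigmoid(0))   # 0.5
print(sigmoid(4))   # ~0.982
print(sigmoid(-4))  # ~0.018
```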
What is the use of Gradient Descent?
Calculate better parameters for prediction:
it finds the minimum of the loss function by taking steps from an initial guess until it reaches the best value.
Take bigger steps when the parameter is far from that value, and small steps when it is close.
=> The best value is where the derivative = 0.
What is the meaning of SSR?
Sum of Squared Residuals, e.g.
1.1² + 0.4² + (-1.3)² = …
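Using the residuals from the card, a one-line Python check:

```python
residuals = [1.1, 0.4, -1.3]          # each one is (observed - predicted)
ssr = sum(r ** 2 for r in residuals)  # 1.21 + 0.16 + 1.69
print(ssr)  # ~3.06
```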
What are the steps of Gradient Descent?
When making predictions:
1. Choose the Loss function
2. Write out the SSR (how different the predictions are from the observed values)
3. Take the derivative of the SSR
4. Pick a random value for the intercept
5. Calculate the derivative using that intercept
6. Calculate the Step Size
7. Calculate the New Intercept
8. Use the new intercept and repeat steps 5 to 7
9. Stop when the Step Size is close to 0
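A minimal Python sketch of these steps for a single intercept. The slope 0.64 and the point (0.5, 1.4) come from the SSR card below; the other data points, the learning rate, and the starting guess are invented for illustration:

```python
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # (axis value, observed); last two invented
slope = 0.64          # kept fixed; only the intercept is optimized here
learning_rate = 0.1   # illustrative value

intercept = 0.0  # step 4: initial guess
for _ in range(1000):
    # step 5: derivative of SSR with respect to the intercept (see the card below)
    d_ssr = sum(-2 * (obs - (intercept + slope * x)) for x, obs in data)
    step_size = d_ssr * learning_rate   # step 6
    intercept -= step_size              # step 7
    if abs(step_size) < 1e-6:           # step 9: stop when Step Size is near 0
        break

print(intercept)  # the intercept that (approximately) minimizes the SSR
```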
SSR's formula
SSR = (observed1 - predicted1)² + (observed2 - predicted2)² + …
observed = the real value
predicted = the value on the equation's line = intercept + 0.64 x (axis value)
Example, with axis value 0.5 and observed value 1.4:
(1.4 - (intercept + 0.64 x 0.5))²
Derivative of SSR with respect to the intercept
= the sum of the derivatives of each term
=> use the CHAIN RULE on each term:
d/d(intercept) of (1.4 - (intercept + 0.64 x 0.5))²
= -2 x (1.4 - (intercept + 0.64 x 0.5))
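To sanity-check this chain-rule result for the single point on the card, compare it against a finite-difference estimate (the test intercept 1.0 is arbitrary):

```python
def ssr(intercept):
    # single-point SSR from the card
    return (1.4 - (intercept + 0.64 * 0.5)) ** 2

def d_ssr(intercept):
    # chain-rule result from the card
    return -2 * (1.4 - (intercept + 0.64 * 0.5))

b, eps = 1.0, 1e-6
numeric = (ssr(b + eps) - ssr(b)) / eps
print(d_ssr(b), numeric)  # both ~ -0.16
```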
Example of a Loss function
SSR
What is the intercept in Gradient Descent?
It's the y value where the line crosses the y-axis (the line's value at x = 0).
How to calculate the Step Size in Gradient Descent
Step Size = Slope x Learning rate
How to calculate the New Intercept in Gradient Descent
New Intercept = Old Intercept - Step Size
What happens when we do gradient descent for 2 variables?
We calculate derivatives for the Intercept and the Slope at the same time:
plotting SSR against both parameters gives a 3D surface (SSR, intercept & slope).
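A hedged sketch of the two-variable case, reusing the invented data points from the earlier loop and updating both parameters on each iteration:

```python
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # same illustrative points as above
intercept, slope = 0.0, 1.0                  # arbitrary starting guesses
learning_rate = 0.01

for _ in range(10000):
    # partial derivative of SSR with respect to each parameter
    d_intercept = sum(-2 * (obs - (intercept + slope * x)) for x, obs in data)
    d_slope = sum(-2 * x * (obs - (intercept + slope * x)) for x, obs in data)
    intercept -= learning_rate * d_intercept
    slope -= learning_rate * d_slope

print(intercept, slope)  # the bottom of the 3D SSR surface
```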
How does backpropagation find the value of the last bias?
Goal: get the least Loss by using Gradient Descent.
1. Need to calculate (d SSR / d bias3),
because SSR = sum(observed - predicted)²
and predicted = the green squiggle = blue + orange + b3
2. Use the chain rule:
3. (d SSR / d bias3) = (d SSR / d Predicted) x (d Predicted / d bias3)
4. = sum( -2 x (observed - predicted) ) x 1, since (d Predicted / d bias3) = 1
5. = (d SSR / d bias3) [the original goal]
6. This corresponds to the slope,
7. so calculate the Step Size… [gradient descent]
8. and find the bias3 value with the smallest SSR.
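A minimal sketch of this procedure in Python. It assumes, as on the card, that the prediction is (blue + orange + b3), so d Predicted / d bias3 = 1; the observed values and the blue + orange outputs are invented for illustration:

```python
observed = [0.0, 1.0, 0.0]           # invented target values
blue_plus_orange = [-0.3, 0.6, 0.2]  # invented hidden-layer outputs (everything but b3)
learning_rate = 0.1

b3 = 0.0  # initial guess for the last bias
for _ in range(200):
    predicted = [bo + b3 for bo in blue_plus_orange]
    # steps 3-4: (d SSR / d bias3) = sum(-2 * (observed - predicted)) * 1
    d_b3 = sum(-2 * (o - p) for o, p in zip(observed, predicted))
    step_size = d_b3 * learning_rate
    b3 -= step_size
    if abs(step_size) < 1e-6:
        break

print(b3)  # the bias3 value with the smallest SSR
```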