Week2. Logistic regression as NN Flashcards
Logistic regression for binary classification
Linear regression Y = wT * x + b. Linear regression doesn't work well for binary classification because its output can be arbitrarily large, while we want a probability. Applying the sigmoid function to the linear output gives values between 0 and 1.
Sigmoid function
G(z) = 1 / (1 + e^(-z))
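The sigmoid above is a one-liner in NumPy (a minimal sketch; the function name `sigmoid` is my own choice):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))
```

Because NumPy broadcasts, the same function works on a scalar z or on a whole matrix Z at once.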
Why Cost function
We need to learn parameters w and b given values X and Y of training examples.
Logistic regression loss function
Better NOT to use squared error here, (1/2)*(Y - y)^2, because it makes the optimization problem non-convex (gradient descent can get stuck in local optima).
Good function for logistic regression is:
L(a, y) = - ( y * log a + (1 - y) * log(1 - a) ), where a is the predicted probability and y the true label.
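The loss above translates directly to code (a minimal sketch; the name `loss` is my own):

```python
import numpy as np

def loss(a, y):
    """Cross-entropy loss for one example.
    a: predicted probability in (0, 1); y: true label, 0 or 1."""
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))
```

Note the loss is small when a is close to y and grows without bound as the prediction approaches the wrong label.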
Cost function for logistic regression
Is the average of the loss over all examples
J(w,b) = - 1/m * Sum(i=1,m)[ y(i) * log a(i) + (1 - y(i)) * log(1 - a(i)) ]
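The cost can be computed in one vectorized expression (a sketch, assuming predictions and labels are stored as (1, m) arrays):

```python
import numpy as np

def cost(A, Y):
    """Average cross-entropy cost over m examples.
    A: predictions of shape (1, m); Y: labels of shape (1, m)."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
```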
Differential calculus
Finds the rate of change at each point on the curve of a function, i.e. the slope of the curve at each point.
Derivative - is a slope = height / width of a small triangle drawn on the curve.
Integral calculus
Finds the area under the curve of a function down to the X axis. If you draw rectangles under the curve you can add them up to approximate the area, and if you make them infinitely small the sum becomes the exact area under the curve.
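The rectangle idea above is easy to sketch in code (a toy Riemann-sum approximation; the function name and midpoint rule are my own choices):

```python
def riemann_area(f, lo, hi, n):
    """Approximate the area under f between lo and hi with n rectangles,
    each evaluated at its midpoint."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) * width for i in range(n))
```

For example, the area under y = x^2 from 0 to 1 approaches the exact value 1/3 as n grows.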
Gradient descent
Updates for w:=w-alpha * dJ(w,b) / dw
and for b := b - alpha * dJ(w,b) / db
Derivative measures slope of the function.
In code we will use
dw = dJ(w,b) / dw
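The update rule can be demonstrated on a toy function (an assumed illustrative example, not the logistic cost): for J(w) = (w - 3)^2 we have dJ/dw = 2 * (w - 3), and repeated updates pull w toward the minimum at 3.

```python
# Toy gradient descent on J(w) = (w - 3)**2.
alpha = 0.1  # learning rate
w = 0.0
for _ in range(100):
    dw = 2 * (w - 3)    # derivative of J at the current w
    w = w - alpha * dw  # the update rule w := w - alpha * dJ/dw
```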
Gradient descent for logistic regression - one example
z=w1x1+w2x2+b
a = sigmoid(z)
Loss(a,y)
Gradient descent (backward pass)
dz = a - y
dw1 = x1*dz, dw2 = x2*dz, db = dz
Vectorization
Saves a lot of time - SIMD - single instruction, multiple data
Z = wT * x + b
Z=np.dot(w,x)+b
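A quick sketch comparing the explicit loop with the vectorized `np.dot` call (toy random data; both compute the same z):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
x = rng.standard_normal(1000)
b = 0.5

# Explicit loop version
z_loop = b
for i in range(1000):
    z_loop += w[i] * x[i]

# Vectorized version - one call that the hardware can run with SIMD
z_vec = np.dot(w, x) + b
```

On real workloads the vectorized version is typically orders of magnitude faster.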
Vectorizing logistic regression - Forward propagation
For one example z(1) = wT x(1) + b and a(1) = sigmoid(z(1))
Vectorized for m examples: Z = np.dot(w.T, X) + b
b is a scalar, broadcast by numpy into a (1, m) row vector
Dimension of matrix X is (n_x, m)
A = sigmoid(Z)
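The vectorized forward pass as a sketch (the function name `forward` is my own; shapes follow the card):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, X):
    """Vectorized forward pass.
    w: (n_x, 1) weights, b: scalar, X: (n_x, m) with examples as columns.
    Returns A of shape (1, m)."""
    Z = np.dot(w.T, X) + b  # b is broadcast across the m columns
    return sigmoid(Z)
```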
Vectorizing logistic regression - Gradient output
Regular: dz(1) = a(1) - y(1) … dz(m) = a(m) - y(m)
dZ = [dz(1) … dz(m)], a (1, m) row vector
Vector version
dZ = A - Y
dw = 1/m * np.dot(X, dZ.T)
db = 1/m * np.sum(dZ)
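The vectorized gradients as a sketch (the function name `backward` is my own):

```python
import numpy as np

def backward(X, A, Y):
    """Vectorized gradients for logistic regression.
    X: (n_x, m) inputs, A: (1, m) predictions, Y: (1, m) labels."""
    m = X.shape[1]
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m  # shape (n_x, 1)
    db = np.sum(dZ) / m       # scalar
    return dw, db
```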
Logistic regression full vectorized
Z = wT X + b = np.dot(w.T, X) + b
A = sigmoid(Z)
dZ = A - Y
dw = 1/m * np.dot(X, dZ.T)
db = 1/m * np.sum(dZ)
then update w := w - alpha * dw and b := b - alpha * db
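Putting all the cards together, a minimal training loop (a sketch; the function name, zero initialization, and the alpha/iters defaults are my own assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, alpha=0.1, iters=1000):
    """Fully vectorized logistic regression training.
    X: (n_x, m) inputs, Y: (1, m) labels."""
    n_x, m = X.shape
    w = np.zeros((n_x, 1))
    b = 0.0
    for _ in range(iters):
        Z = np.dot(w.T, X) + b    # forward pass
        A = sigmoid(Z)
        dZ = A - Y                # backward pass
        dw = np.dot(X, dZ.T) / m
        db = np.sum(dZ) / m
        w -= alpha * dw           # gradient descent update
        b -= alpha * db
    return w, b
```

Note there is no loop over examples anywhere - only the loop over gradient descent iterations remains.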
Practice: Common steps for pre-processing a new dataset are:
Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, …)
Reshape the datasets such that each example is now a vector of size (num_px * num_px * 3, 1)
“Standardize” the data
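The preprocessing steps above can be sketched as follows (the dataset here is an assumed toy stand-in with the course's (m, num_px, num_px, 3) image layout; the real data and sizes differ):

```python
import numpy as np

# Assumed toy dataset: 10 RGB "images" of 4x4 pixels with values in 0..255.
train_x_orig = np.random.default_rng(0).integers(0, 256, size=(10, 4, 4, 3))

# Step 1: figure out the dimensions
m_train = train_x_orig.shape[0]
num_px = train_x_orig.shape[1]

# Step 2: reshape each example into a column vector of size (num_px*num_px*3, 1)
train_x_flat = train_x_orig.reshape(m_train, -1).T  # shape (num_px*num_px*3, m_train)

# Step 3: "standardize" - pixel values lie in [0, 255], so divide by 255
train_x = train_x_flat / 255.0
```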