Optimisation and gradient descent algorithm Flashcards
explain simply how a machine learning model works?
predict -> calculate error -> learn -> predict
this is called an algorithm
what is an algorithm?
an algorithm is a set of mathematical instructions for solving a problem
it is basically a word used by programmers when they don’t want to explain what they did
where did the term algorithm originate from?
Muhammad ibn Musa al-Khwarizmi, a ninth-century Persian mathematician, wrote a popular mathematics book of his time; when the book was translated into Latin, the translators were confused by his name and rendered it as "algorithm"
what is a cost function in machine learning?
-a cost function is an important parameter in deciding how well a machine learning model fits the dataset
-it is the sum of squares of the differences between the actual and fitted values
-we need a function that can find where the model is most accurate, anywhere between an undertrained and an overtrained model
-by minimizing the value of the cost function we get an optimal solution
-a cost function is a measure of how wrong the model is in estimating the relationship between X (input) and Y (output)
-also called a loss function, error function, etc. (a small sketch follows)
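a minimal sketch (the actual and fitted arrays here are made up for illustration):
import numpy as np
actual = np.array([3.0, 5.0, 7.0, 9.0])   # hypothetical actual values
fitted = np.array([2.8, 5.3, 6.9, 9.4])   # hypothetical fitted values
cost = np.sum((actual - fitted) ** 2)     # sum of squared differences
print('cost:', cost)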
what is LaTeX markup?
it is a syntax for writing down mathematical expressions (e.g. inside markdown cells)
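for example, writing the line below in a markdown cell renders it as a formatted formula:
$$f(x) = x^2 - 4x + 5$$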
what is linspace in numpy?
it generates an array of linearly spaced numbers between a and b
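for example:
import numpy as np
x = np.linspace(start=0, stop=5, num=11)  # 11 evenly spaced values from a=0 to b=5, endpoints included
print(x)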
what is subplot and how do you implement it in matplotlib?
it is used to display multiple plots in one figure, for example two plots side by side; see the sketch below
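a minimal sketch of two plots side by side:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-3, 3, 100)
plt.subplot(1, 2, 1)   # grid of 1 row x 2 columns, first plot
plt.plot(x, x ** 2)
plt.subplot(1, 2, 2)   # second plot in the same grid
plt.plot(x, 2 * x)
plt.show()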
explain the cost function implementation in python
-represent the function
-represent the derivative of the function
-at the minimum of f(x) the slope of the function is zero, which can be read off the derivative plot (see the sketch below)
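a minimal sketch, assuming the simple quadratic cost f(x) = x**2 - 4x + 5 (the course's exact function may differ); the f and df defined here are reused by the gradient descent code below:
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    # the cost function
    return x ** 2 - 4 * x + 5

def df(x):
    # derivative of the cost function (the slope)
    return 2 * x - 4

x = np.linspace(-1, 5, 100)
plt.plot(x, f(x))    # cost curve, minimum at x = 2
plt.plot(x, df(x))   # slope curve, crosses zero at the minimum
plt.show()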
explain briefly about gradient descent algorithm?
Gradient Descent is an optimization algorithm used for minimizing the cost function in various machine learning algorithms.
visualize the 3d model of the cost function
it is a convex, bowl-shaped surface (convex downward), so it has a single minimum
Implement an optimization algorithm in python
Gradient Descent
new_x = 3
previous_x = 0
step_multiplier = 0.1
precision = 0.00001
x_list = [new_x]
slope_list = [df(new_x)]
for n in range(500):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient
    step_size = abs(new_x - previous_x)
    # print(step_size)
    x_list.append(new_x)
    slope_list.append(df(new_x))
    if step_size < precision:
        print('Loop ran this many times:', n)
        break
print('Local minimum occurs at:', new_x)
print('Slope or df(x) value at this point is:', df(new_x))
print('f(x) value or cost at this point is:', f(new_x))
the scatter function can plot a list of x values: true or false?
False
in these notebooks the list is converted to an array with numpy before plotting; numpy arrays also support the element-wise arithmetic (such as f(x)) that plain lists do not, as shown below
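for example, reusing the f from the earlier quadratic sketch (x_list stands in for the list gathered during the loop):
import numpy as np
import matplotlib.pyplot as plt
x_list = [3, 2.8, 2.64, 2.512]   # plain python list
x_arr = np.array(x_list)         # convert list to array
plt.scatter(x_arr, f(x_arr))     # arrays allow the element-wise f(x_arr)
plt.show()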
what happens with gradient descent when there are maxima, local minima and global minima?
gradient descent depends on the initial guess
if the initial guess is near a local minimum, the algorithm converges there instead of the global minimum, giving the wrong output (see the sketch after the gradient_descent function below)
implement gradient descent by calling a function?
def gradient_descent(df, initial_guess, step_multiplier=0.02, precision=0.001):
    new_x = initial_guess
    x_list = [new_x]
    slope_list = [df(new_x)]
    for n in range(500):
        previous_x = new_x
        gradient = df(previous_x)
        new_x = previous_x - step_multiplier * gradient
        step_size = abs(new_x - previous_x)
        x_list.append(new_x)
        slope_list.append(df(new_x))
        if step_size < precision:
            print('Loop ran this many times:', n)
            break
    return new_x, x_list, slope_list

localmin, x_list, slope_list = gradient_descent(df, 0, 0.02, 0.001)
print(localmin)
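tying back to the local-minima card above, a quick sketch with a hypothetical double-well function g(x) = x**4 - 4*x**2 + 5, whose two minima sit at x = ±sqrt(2):
def dg(x):
    # derivative of the hypothetical g(x)
    return 4 * x ** 3 - 8 * x

left_min, _, _ = gradient_descent(dg, initial_guess=-0.1)
right_min, _, _ = gradient_descent(dg, initial_guess=0.1)
print(left_min, right_min)  # roughly -1.414 and 1.414: the converged point depends on the initial guess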
what is the difference between stochastic and batch gradient descent?
batch gradient descent computes the gradient over the entire dataset for every update, while stochastic gradient descent updates using one randomly chosen sample at a time
the randomness makes each step cheaper, and the noise in the updates can help the algorithm escape shallow local minima; a sketch follows
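a minimal sketch of the difference for a one-parameter linear model y ≈ w*x (all names and data here are hypothetical): batch uses every datapoint per update, stochastic uses one randomly picked datapoint:
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3 * x + rng.normal(0, 0.1, 50)   # synthetic data with true slope 3

w_batch, w_sgd, lr = 0.0, 0.0, 0.1
for _ in range(200):
    w_batch -= lr * np.mean(2 * (w_batch * x - y) * x)  # gradient over the whole dataset
    i = rng.integers(len(x))
    w_sgd -= lr * 2 * (w_sgd * x[i] - y[i]) * x[i]      # gradient from one random sample

print(w_batch, w_sgd)  # both approach 3; the stochastic estimate is noisier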
what are divergence and overflow in gradient descent, how do they occur, and how can you solve them?
divergence occurs when the learning rate is too large, so each step overshoots the minimum further than the last; overflow happens when the resulting value becomes too large for the system to handle
it can be solved by reducing the learning rate or limiting the number of iterations (demo below)
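a tiny demo of divergence, reusing the quadratic df from the earlier sketch; a far-too-large multiplier makes every step overshoot further:
new_x = 3
for n in range(50):
    new_x = new_x - 1.5 * df(new_x)  # a multiplier of 1.5 is far too large for this function
print(new_x)  # the distance from the minimum doubles each step; with enough iterations the value overflows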
what is the sys module in python?
the sys module gives various information about the python runtime environment
for example, the largest floating-point number python can deal with
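for example:
import sys
print(sys.float_info.max)  # largest float python can represent, about 1.8e308
print(sys.version)         # runtime version information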
what is tuple packing and tuple unpacking?
packing: breakfast = "bacon", "beans", "avocado"
unpacking: x, y, z = breakfast
what is learning rate in gradient descent algorithm?
the learning rate decides how fast the algorithm converges to the minimum
if the learning rate is small, it takes more time to converge
if the learning rate is large, it might diverge and never converge to the minimum
in our example the learning rate is changed via step_multiplier
what is the bold driver learning-rate mechanism?
if the cost function has decreased since the last iteration, increase the learning rate by 5%
if the cost function has increased since the last iteration (the algorithm overshot the minimum), go back to the last iteration and reduce the learning rate by 50% (a sketch follows)
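a minimal sketch of the bold driver rule grafted onto the earlier loop (f and df as in the quadratic sketch above; the 5% and 50% adjustments follow the card):
new_x, learning_rate = 3, 0.1
prev_cost = f(new_x)
for n in range(500):
    candidate = new_x - learning_rate * df(new_x)
    if f(candidate) < prev_cost:
        new_x = candidate               # cost fell: keep the step
        prev_cost = f(new_x)
        learning_rate *= 1.05           # and speed up by 5%
    else:
        learning_rate *= 0.5            # cost rose: discard the step and halve the rate
print(new_x, learning_rate)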
how can you create a 3d model of the cost function in python? what is cmap and how is it implemented?
# generate 3d plot
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

fig = plt.figure(figsize=(16, 12))
ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is deprecated in newer matplotlib; gca means get current axes
ax.set_xlabel('x', fontsize=20)
ax.set_ylabel('y', fontsize=20)
ax.set_zlabel('f(x,y) - cost', fontsize=20)
ax.plot_surface(x4, y4, f(x4, y4), cmap=cm.coolwarm, alpha=0.4)  # x4, y4 are meshgrid arrays
plt.show()
cmap (colormap) maps the surface's height values to colours; cm.coolwarm shades low cost blue and high cost red, and alpha=0.4 makes the surface translucent
what is a bug?
an unintended behaviour or defect in a program that causes it to crash or malfunction
How do you find the partial derivative of a function in python? what does symbols do?
from sympy import symbols, diff
a, b = symbols('x, y')  # creates symbolic variables named x and y, bound to a and b (we can now build f(a, b) symbolically)
f(a, b)
diff(f(a, b), a)  # partial derivative of f w.r.t. a
f(a, b).evalf(subs={a: 1.8, b: 1.0})  # evaluate f(1.8, 1.0)
diff(f(a, b), a).evalf(subs={a: 1.8, b: 1.0})  # evaluate the partial derivative at (1.8, 1.0)
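a self-contained sketch with a hypothetical f(x, y) = x**2 + y**2 (the course's actual function differs):
from sympy import symbols, diff

a, b = symbols('x, y')
f_sym = a ** 2 + b ** 2                              # hypothetical cost expression
print(diff(f_sym, a))                                # partial derivative w.r.t. x: 2*x
print(f_sym.evalf(subs={a: 1.8, b: 1.0}))            # f(1.8, 1.0) = 4.24
print(diff(f_sym, a).evalf(subs={a: 1.8, b: 1.0}))   # slope w.r.t. x at (1.8, 1.0) = 3.6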
implement batch gradient descent for a multivariable cost function?
# batch gradient descent with python
for a multivariable function there are two partial derivatives, w.r.t. x and w.r.t. y; both must be used to find the minimum
multiplier = 0.1
max_iter = 200
params = np.array([1.8, 1.0])  # initial guess
for i in range(max_iter):
    gradient_x = diff(f(a, b), a).evalf(subs={a: params[0], b: params[1]})
    gradient_y = diff(f(a, b), b).evalf(subs={a: params[0], b: params[1]})
    gradients = np.array([gradient_x, gradient_y])
    params = params - multiplier * gradients
print(params[0], params[1])
print('cost is', f(params[0], params[1]))
what is the drawback of the sympy module?
computational time is higher, since sympy has to differentiate the function symbolically every time the loop runs; writing each partial derivative out as a plain python function reduces the time required (sketch below)
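for instance, for the hypothetical f(x, y) = x**2 + y**2 the partials can be hard-coded, so no symbolic differentiation happens inside the loop:
def fpx(x, y):
    # partial derivative of x**2 + y**2 w.r.t. x, written out by hand
    return 2 * x

def fpy(x, y):
    # partial derivative w.r.t. y
    return 2 * y

gradients = np.array([fpx(params[0], params[1]), fpy(params[0], params[1])])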
what type of datastructure can be used to plot a 3d function? how do you create that datastructure?
a 2d array
kirk = np.array([['Captain', 'Guitar']])
print(kirk.shape)  # (1, 2)
hs_band = np.array([['Black Thought', 'MC'], ['Questlove', 'Drums']])
print(hs_band.shape)  # (2, 2)
print('hs_band[0] :', hs_band[0])
print('hs_band[0][1] :', hs_band[0][1])
or you can use the reshape function
How do you append data to a 2d array? what is axis?
kirk = np.array([['Captain', 'Guitar']])
hs_band = np.array([['Black Thought', 'MC'], ['Questlove', 'Drums']])
the_roots = np.append(arr=hs_band, values=kirk, axis=0)  # append kirk as a new row
print(the_roots)
axis defines whether you append the data by row (axis=0) or by column (axis=1)
if you append by row, the number of columns must match; if you append by column, the number of rows must match
i.e. the dimensions should match, which you can ensure by reshaping the array, as in the sketch below
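for instance (names hypothetical), a flat array must be reshaped into a matching row before appending:
import numpy as np
grid = np.array([[1, 2], [3, 4]])
new_row = np.array([5, 6])   # shape (2,) would not match the (2, 2) grid
grid = np.append(arr=grid, values=new_row.reshape(1, 2), axis=0)
print(grid.shape)  # (3, 2)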
how do you access a particular row or column in a 2d array?
print('Printing nicknames...', the_roots[:, 0])
: selects all the rows
0 selects the first column
explain ways in which you can add elements to a 2d array?
values_array = np.append(values_array, params.reshape(1, 2), axis=0)
values_array = np.concatenate((values_array, params.reshape(1, 2)), axis=0)
what is the need for MSE when there is RSS?
with a large number of datapoints the RSS grows very large and can cause an overflow error; dividing it by the number of datapoints (giving the MSE) keeps the value easy to deal with
write a python code to return MSE without using a for loop when two arrays are passed as input
# define a function to return the MSE
def MSE(pred, actu):
    # element-wise subtraction works because pred and actu are numpy arrays
    mse_calc = (1 / len(pred)) * sum((pred - actu) ** 2)
    return mse_calc

mse = MSE(pred_v, actu_v)
print(mse)
where pred and actu are arrays
what is an array? is a tuple an array? what about a dictionary? what is the difference between an array and a dictionary?
an array is a collection of elements of the same datatype in contiguous memory locations
a tuple behaves like an array when its elements share a datatype, but it is immutable
a dictionary is like an array, but elements are accessed by keys instead of indices
what is the difference between the meshgrid and reshape functions?
meshgrid expands the input arrays into 2d coordinate grids by repeating their elements
Input : x = [0, 1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6, 7, 8]
x_1, y_1 = np.meshgrid(x, y)
Output :
x_1 = array([[0., 1., 2., 3., 4., 5.],
[0., 1., 2., 3., 4., 5.],
[0., 1., 2., 3., 4., 5.],
[0., 1., 2., 3., 4., 5.],
[0., 1., 2., 3., 4., 5.],
[0., 1., 2., 3., 4., 5.],
[0., 1., 2., 3., 4., 5.]])
y_1 = array([[2., 2., 2., 2., 2., 2.],
[3., 3., 3., 3., 3., 3.],
[4., 4., 4., 4., 4., 4.],
[5., 5., 5., 5., 5., 5.],
[6., 6., 6., 6., 6., 6.],
[7., 7., 7., 7., 7., 7.],
[8., 8., 8., 8., 8., 8.]])
reshape cannot add new elements; it can only rearrange the existing elements into a new shape
x = np.arange(12)          # 12 elements
y = np.reshape(x, (4, 3))  # rearranged into 4 rows and 3 columns
how do you access all elements of the rows and columns separately using two for loops?
for an n x n matrix:
for i in range(n):
    for j in range(n):
        x = matrix[i][j]  # walks along the elements of row i
        y = matrix[j][i]  # walks along the elements of column i
what does unravel_index do in numpy?
it converts a flat index into a (row, column) index for a given shape; here it finds where the minimum of plot_cost sits
ij_min = np.unravel_index(indices=plot_cost.argmin(), shape=plot_cost.shape)
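a tiny demo on a made-up 2x3 matrix:
import numpy as np
m = np.array([[9, 4, 7],
              [2, 8, 5]])
row, col = np.unravel_index(m.argmin(), m.shape)
print(row, col)  # 1 0, since the minimum value 2 sits at row 1, column 0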
find the partial derivatives of the mean squared error by substituting the hypothesis equation?
substituting the hypothesis y_hat = theta0 + theta1*x into MSE = (1/n) * sum((y - y_hat)**2) gives two separate equations, one per partial derivative:
d(MSE)/d(theta0) = -(2/n) * sum(y - theta0 - theta1*x)
d(MSE)/d(theta1) = -(2/n) * sum((y - theta0 - theta1*x) * x)
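as a sketch, the two equations translate to numpy like this (x and y are the data arrays, thetas holds theta0 and theta1; the names are my own):
import numpy as np

def grad(x, y, thetas):
    # partial derivatives of MSE for the hypothesis y_hat = theta0 + theta1*x
    n = y.size
    y_hat = thetas[0] + thetas[1] * x
    slope_0 = (2 / n) * sum(y_hat - y)        # d(MSE)/d(theta0)
    slope_1 = (2 / n) * sum((y_hat - y) * x)  # d(MSE)/d(theta1)
    return np.array([slope_0, slope_1])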
how do the actual cost function and the study cost function differ?
in the actual cost function the variables are theta0 and theta1
in the study cost function the variables are x and y
in real machine learning problems we find the optimal values of the thetas with the gradient descent algorithm