Linear Regression Flashcards
What is gradient descent?
an optimization algorithm that iteratively tweaks the model's parameters in the direction that reduces a cost function, step by step, until it reaches a minimum
What is batch gradient descent?
a variant that computes the gradient of the cost function over the whole training set at every step, then updates all the parameters in one go
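A minimal NumPy sketch of batch gradient descent for linear regression; the toy data, learning rate, and iteration count are assumed values for illustration:

```python
import numpy as np

# toy data (assumption): y ≈ 4 + 3x plus noise
rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)

X_b = np.c_[np.ones((m, 1)), X]      # add bias column x0 = 1
eta = 0.1                            # learning rate (assumed)
n_iterations = 1000
theta = rng.normal(size=2)           # random initialization

for _ in range(n_iterations):
    # gradient of the MSE cost, computed over the WHOLE training set
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients  # the gradient descent step

print(theta)   # should end up close to [4, 3]
```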
what are the downsides of batch gradient descent?
because it uses the whole training set at every step, it becomes very slow on large training sets
what does a higher learning rate mean with gradient descent?
the steps are bigger, so if the learning rate is too high the algorithm can overshoot the minimum and diverge, failing to find a good solution
what does a lower learning rate mean with gradient descent?
the steps are smaller, so the algorithm needs many more iterations to converge, which makes training slow
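A quick way to see both effects is to run the same batch GD loop with several learning rates; the toy data and the specific eta values below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)
X_b = np.c_[np.ones((m, 1)), X]

def batch_gd(eta, n_iterations=200):
    theta = np.zeros(2)
    for _ in range(n_iterations):
        gradients = 2 / m * X_b.T @ (X_b @ theta - y)
        theta -= eta * gradients
    return theta

for eta in (0.02, 0.1, 0.5):
    print(eta, batch_gd(eta))
# eta too small -> still noticeably far from [4, 3] after 200 steps (slow convergence)
# eta moderate  -> close to [4, 3]
# eta too large -> the values blow up (overshoots the minimum and diverges)
```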
how does gradient descent perform with features with different scales?
the cost function becomes an elongated bowl, so gradient descent takes a long, roundabout path to the minimum, making the algorithm slower
how does gradient descent perform with features with same scales?
the cost function looks like a regular round bowl, so gradient descent heads fairly directly to the minimum instead of zig-zagging, making the algorithm faster; this is why features should be scaled before using gradient descent
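A small sketch of putting features on the same scale with scikit-learn's StandardScaler; the two made-up features here are an assumption:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# assumed example: two features on very different scales
rng = np.random.default_rng(0)
X = np.c_[rng.random(200) * 1000,   # feature 1 roughly in [0, 1000]
          rng.random(200)]          # feature 2 roughly in [0, 1]

# standardize so every feature has mean 0 and unit variance;
# the cost function then becomes a rounder bowl for gradient descent
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))   # ≈ [0, 0] and [1, 1]
```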
which is better for Linear Regression on a large dataset: Gradient Descent or the Normal Equation? Why?
Gradient Descent; the Normal Equation has to compute the inverse of XᵀX, which gets very slow as the number of features grows, and it needs the whole training set in memory, whereas Gradient Descent (especially its stochastic and mini-batch variants) scales much better
out of gradient descent and the normal equation, which is faster on large problems, and why?
gradient descent; the normal equation must compute and invert XᵀX, which is expensive, while gradient descent just repeats cheap update steps, and its stochastic variant handles only one instance at a time
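For comparison, the Normal Equation gives the linear regression parameters in closed form; a minimal NumPy sketch on the same kind of toy data (data and shapes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)
X_b = np.c_[np.ones((m, 1)), X]

# closed-form solution: theta = (XᵀX)⁻¹ Xᵀ y
# the matrix inversion is what gets expensive as the number of features grows,
# which is why gradient descent scales better there
theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta_best)   # ≈ [4, 3]
```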
what is a cost function?
a function that measures how badly the model fits the training data (for linear regression, typically the mean squared error); training means finding the parameters that minimize it
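A tiny sketch of the MSE cost for a linear model; the helper name and the toy values are assumptions:

```python
import numpy as np

def mse_cost(theta, X_b, y):
    """mean squared error between the model's predictions (X_b @ theta) and the targets y"""
    errors = X_b @ theta - y
    return (errors ** 2).mean()

# e.g. for y = x with no noise, the parameters [0, 1] give zero cost
X_b = np.c_[np.ones(3), np.array([1.0, 2.0, 3.0])]
y = np.array([1.0, 2.0, 3.0])
print(mse_cost(np.array([0.0, 1.0]), X_b, y))   # 0.0
print(mse_cost(np.array([0.0, 0.0]), X_b, y))   # larger value -> worse fit
```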
what is stochastic gradient descent?
as opposed to batch GD, which uses the whole training set at each step, SGD picks one random instance at each step and computes the gradient on that single instance, making it much faster and better suited to very large training sets, at the cost of a much noisier, bouncier descent
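A minimal SGD sketch with a simple learning schedule; the toy data, schedule constants, and epoch count are assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)
X_b = np.c_[np.ones((m, 1)), X]

n_epochs = 50
t0, t1 = 5, 50                 # learning-schedule hyperparameters (assumed)
theta = np.zeros(2)

for epoch in range(n_epochs):
    for i in range(m):
        # pick ONE random instance and compute the gradient on it alone
        idx = rng.integers(m)
        xi, yi = X_b[idx], y[idx]
        gradients = 2 * xi * (xi @ theta - yi)
        eta = t0 / (epoch * m + i + t1)   # gradually reduce the learning rate
        theta -= eta * gradients

print(theta)   # ≈ [4, 3], but noisier than batch GD
```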
which gd algorithm is better for large sets?
SGD (or mini-batch GD), because it handles only one instance (or a small batch) at a time instead of using the whole training set at each step like batch GD
what happens when you reduce SGD's learning rate too slowly?
the parameters keep jumping around the minimum for a long time, and you may end up with a suboptimal solution if you stop training too early
what happens when you reduce SGD's learning rate too quickly?
the steps shrink so fast that the algorithm may get stuck in a local minimum or end up frozen halfway to the minimum
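A sketch of two learning schedules to contrast the failure modes; the functional forms and constants are assumptions, not a prescribed schedule:

```python
def eta_slow(t):
    return 0.1 / (1 + 0.0001 * t)   # barely shrinks -> may keep bouncing around the minimum

def eta_fast(t):
    return 0.1 / (1 + 0.1 * t)      # shrinks fast -> steps may become tiny before reaching the minimum

print(eta_slow(0), eta_slow(10_000))   # 0.1 -> 0.05
print(eta_fast(0), eta_fast(10_000))   # 0.1 -> ~0.0001
```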
what is mini-batch gradient descent?
computes the gradients on small random sets of instances called mini-batches; it sits between batch GD (whole set) and SGD (one instance), getting much of SGD's speed with a less erratic descent
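A minimal mini-batch GD sketch; the toy data, batch size, learning rate, and epoch count are assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)
X_b = np.c_[np.ones((m, 1)), X]

n_epochs = 100
batch_size = 20        # assumed hyperparameter
eta = 0.1
theta = np.zeros(2)

for epoch in range(n_epochs):
    shuffled = rng.permutation(m)   # shuffle, then walk through the set in small batches
    for start in range(0, m, batch_size):
        batch = shuffled[start:start + batch_size]
        Xb, yb = X_b[batch], y[batch]
        gradients = 2 / len(batch) * Xb.T @ (Xb @ theta - yb)
        theta -= eta * gradients

print(theta)   # ≈ [4, 3]
```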