Week 1 - Welcome to Machine Learning Flashcards
What is machine learning?
The science of getting computers to learn without being explicitly programmed.
We often don’t know how to program a solution directly, so instead we have programs learn from experience, much the same way we do.
Examples: spam filters, web search ranking, Netflix recommendations, Amazon recommendations
Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example: playing checkers.
E = the experience of playing many games of checkers
T = the task of playing checkers.
P = the probability that the program will win the next game.
What is supervised learning?
Learning where the “right answers” (labels) are given for each training example
What is a regression problem?
When you are trying to predict a continuous-valued output
What is a classification problem?
When you are trying to predict a discrete-valued output
What is unsupervised learning?
Ask an algorithm to find structure in data. There is no feedback based on prediction results.
What is a cost function?
What is a loss function?
A way of evaluating the performance of our algorithm/model.
It takes the model’s predicted outputs and the actual outputs and calculates how wrong the model’s predictions were, outputting a higher number the more the predictions differ from the actual values. As we tune our model to improve the predictions, the cost function acts as an indicator of whether the model has improved. Training is essentially an optimization problem, and the optimization strategy always aims at “minimizing the cost function”.
Loss function: Used when we refer to the error for a single training example.
Cost function: Used to refer to an average of the loss functions over an entire training dataset.
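A minimal sketch of the distinction, using squared error; the function names and toy numbers are illustrative, not from the course:

```python
# Loss: the error for a single training example (squared error here).
def loss(y_pred, y_true):
    return (y_pred - y_true) ** 2

# Cost: the average of the per-example losses over the whole training set.
def cost(y_preds, y_trues):
    return sum(loss(p, t) for p, t in zip(y_preds, y_trues)) / len(y_trues)

print(loss(3.0, 2.5))                          # loss for one example
print(cost([3.0, 1.0, 4.0], [2.5, 1.5, 4.0]))  # cost over a tiny dataset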
What is the cost function for linear regression?
Mean squared error (MSE), which is essentially the way you assess model fit:
MSE = (sum of squared errors) / n
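Written out for single-variable linear regression, with the hypothesis assumed to be h_theta(x) = theta_0 + theta_1 * x, the lectures use the MSE scaled by an extra factor of 1/2, which cancels neatly when taking the derivative:

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x
```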
What is gradient descent?
An iterative first-order optimization algorithm used to find a local minimum of a differentiable function (gradient ascent is the analogue for a maximum); it repeats the update step until convergence
You simultaneously update parameters in this algorithm
theta_j := theta_j - alpha * (partial derivative of the cost function J with respect to theta_j)
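A minimal sketch of this update for single-variable linear regression on a toy dataset; the function name, data, and hyperparameters are illustrative:

```python
# Minimal sketch of batch gradient descent for single-variable linear
# regression with hypothesis h(x) = theta0 + theta1 * x.

def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Errors of the current hypothesis on each training example.
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost J with respect to each parameter.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients are computed before either
        # parameter changes.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

print(gradient_descent([1, 2, 3, 4], [2, 4, 6, 8]))  # approaches (0.0, 2.0)
```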
What is the difference between
a := b
a = b
?
a := b is using the assignment operator; this means to replace a with the value of b
a = b is a truth assertion
What is a derivative in calculus?
How does a derivative play into the gradient decent algorithm?
The definition of the derivative is the slope of a line that lies tangent to the curve at the specific point
Gradient descent uses the derivative (slope of the tangent line) of the cost function to decide how to update theta (the coefficients) when minimizing the cost. When the derivative is zero, the update theta := theta - alpha * 0 leaves theta unchanged, so gradient descent has reached a local minimum, which is what you want
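A toy illustration of this, using a made-up one-dimensional cost (theta - 3)^2 and a central-difference approximation of the tangent slope:

```python
def cost(theta):
    return (theta - 3) ** 2          # toy cost with its minimum at theta = 3

def slope(f, theta, h=1e-6):
    # Central-difference approximation of the slope of the tangent at theta.
    return (f(theta + h) - f(theta - h)) / (2 * h)

alpha = 0.1
for theta in (0.0, 3.0):
    new_theta = theta - alpha * slope(cost, theta)
    print(theta, "->", new_theta)    # at theta = 3 the slope is ~0, so theta stays put
```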
What is the concern about setting the proper alpha value when optimizing gradient descent?
If alpha is too small, it will take a long time for the algorithm to converge
If alpha is too large, the algorithm may struggle to converge or even diverge as it keeps leaping past the local minimum
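A toy illustration of both failure modes, again using the made-up cost (theta - 3)^2 with derivative 2 * (theta - 3); the alpha values are chosen only for illustration:

```python
def run(alpha, steps=10):
    theta = 0.0
    for _ in range(steps):
        theta -= alpha * 2 * (theta - 3)   # gradient descent step
    return theta

print(run(0.001))  # too small: after 10 steps theta has barely moved toward 3
print(run(0.3))    # well chosen: theta is roughly 3 after 10 steps
print(run(1.5))    # too large: theta overshoots further each step and diverges
```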
What does the term “batch” refer to when using gradient descent?
Each step of gradient descent uses the entire training set, the whole “batch”
Do you have to constantly make alpha smaller in order to converge with gradient descent?
No; the derivative becomes smaller as the algorithm converges on a local minimum, so alpha is multiplied by an ever smaller number. The step size therefore shrinks as you approach convergence, even if alpha is fixed.
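A toy illustration with the same kind of one-dimensional cost: alpha stays fixed, yet the step alpha * derivative shrinks on its own as theta approaches the minimum at 3:

```python
alpha = 0.2
theta = 0.0
for i in range(6):
    grad = 2 * (theta - 3)           # derivative of the toy cost (theta - 3) ** 2
    print(f"iteration {i}: theta = {theta:.4f}, step = {abs(alpha * grad):.4f}")
    theta -= alpha * grad
```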