Week 1: Intro to ML Flashcards
What is machine learning?
The field of study that gives algorithms the ability to learn without being explicitly programmed
What are the two main types of supervised learning?
Regression and classification
What is the difference between supervised, unsupervised, and reinforcement learning?
Supervised:
Training: algorithm learns a mapping between data and labels
Testing: algorithm predicts labels based on data
Unsupervised:
Finds hidden structure in unlabelled data
Reinforcement:
How an agent should act in an environment to maximise reward
What does minimising the cost function look like for a regression line?
Minimising the distance of the datapoints from the regression line (the residuals)
Mean squared error (MSE) is the most commonly used loss function for this
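A minimal sketch of residuals and MSE for a candidate line (the data points and line parameters are made-up toy numbers, not from the course):

```python
# Toy data: x values and observed y values (made-up numbers).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A candidate regression line: y_hat = slope * x + intercept
slope, intercept = 2.0, 0.0

# Residual = observed y minus the line's prediction at that x.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Mean squared error: the average of the squared residuals.
mse = sum(r * r for r in residuals) / len(residuals)
print(round(mse, 3))  # → 0.025
```

A different slope or intercept would change the residuals and hence the MSE; fitting the line means searching for the parameter values that make this number as small as possible.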
How might we find the parameters (slope and intercept) of the regression line?
Linear algebra - write the model in matrix form and solve for beta directly via the normal equations (however, this closed-form approach isn't available for more complex regression models)
Numerical optimisation - try some values of the parameters, calculate the loss function (MSE), then try new values, hoping to converge on the smallest possible MSE
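The linear-algebra route can be sketched with numpy; the data here are made-up toy numbers, and the normal equations beta = (XᵀX)⁻¹Xᵀy are solved with a linear solve rather than an explicit inverse:

```python
import numpy as np

# Toy data (made up): y is roughly 2x + 1 plus a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones so beta = [intercept, slope].
X = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (X^T X) beta = X^T y for beta.
beta = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = beta
print(round(intercept, 2), round(slope, 2))  # → 1.04 1.99
```

The same `X` and `y` feed the numerical-optimisation route: instead of one solve, you would repeatedly evaluate the MSE at candidate values of beta and step towards smaller values.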
What does the gradient descent equation show?
At each step the existing parameter value is updated to a new value: parameter := parameter − alpha × (derivative of the cost with respect to that parameter)
The −alpha × derivative term ensures that each step is proportional to the learning rate (alpha) and the steepness of the slope (the derivative)
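That update rule can be sketched for simple linear regression (toy data chosen so the true line is exactly y = 2x + 1; the learning rate and iteration count are illustrative choices):

```python
# Gradient descent for y = slope * x + intercept on toy data (made up).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1

slope, intercept = 0.0, 0.0
alpha = 0.05  # learning rate
n = len(xs)

for _ in range(2000):
    # Gradients of the MSE with respect to each parameter.
    errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
    d_slope = (2 / n) * sum(e * x for e, x in zip(errors, xs))
    d_intercept = (2 / n) * sum(errors)
    # Update: step size proportional to alpha and the gradient's steepness.
    slope -= alpha * d_slope
    intercept -= alpha * d_intercept

print(round(slope, 3), round(intercept, 3))  # → 2.0 1.0
```

Note how the steps shrink automatically as the fit improves: near the minimum the derivative approaches zero, so −alpha × derivative approaches zero too.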
What is the difference between an ordinary regression and supervised machine learning?
In supervised machine learning, the data is split into training and testing sets
The model fitted on the training set is then used to predict in the testing set
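A sketch of that split-then-evaluate workflow (the dataset is synthetic and the 30/10 split is an arbitrary illustrative choice; `numpy.polyfit` with degree 1 stands in for the regression fit):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset (made up): y = 3x + noise.
x = rng.uniform(0, 10, 40)
y = 3 * x + rng.normal(0, 1, 40)

# Split: first 30 points for training, last 10 held out for testing.
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

# Fit the line on the training data only.
slope, intercept = np.polyfit(x_train, y_train, 1)

# Evaluate: use the fitted model to predict on the held-out test set.
test_mse = np.mean((slope * x_test + intercept - y_test) ** 2)
```

The test MSE estimates how well the model generalises, precisely because the test points played no role in choosing the slope and intercept.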
How does the error change as a function of model complexity in the training vs. testing sets?
In the training set, the error decreases as model complexity increases, eventually approaching zero for very flexible models
In the testing set, the error will be larger than in the training set, because the fit was optimised on the training data. It will usually decrease with complexity up to a point, then increase sharply: the model has been overfitted to the training data, so it no longer generalises to the test set
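Those train/test error curves can be sketched with polynomial fits of increasing degree (the data are synthetic and the specific degrees 1, 2, and 12 are illustrative: underfit, well-matched, and overfit respectively):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (made up): a quadratic trend plus noise.
x = np.linspace(-1, 1, 30)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.2, 30)

# Interleaved split so train and test cover the same x range.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

def mse(coeffs, xs, ys):
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# Training error keeps falling with degree; test error eventually rises.
train_err, test_err = {}, {}
for degree in (1, 2, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(coeffs, x_train, y_train)
    test_err[degree] = mse(coeffs, x_test, y_test)
```

Degree 12 nearly interpolates the 15 noisy training points, so its training error is tiny, but it oscillates between them and its test error blows up; degree 2 matches the true trend and wins on the test set.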
If lots of models have the same MSE, which should you pick?
The simplest one, on the principle of parsimony
What is k-fold cross-validation?
You split your training data into K folds
You then run training and evaluation K times, each time holding out a different fold as the validation set and training on the remaining K−1 folds
Each time, the portion of the data used for training vs. validation changes
This ensures that model training and validation aren't dependent on splitting the training and validation data in one particular way
Then finally you evaluate on the held-out test set
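A sketch of that procedure with K = 5 (the dataset is synthetic, the fold boundaries are simple contiguous slices, and `numpy.polyfit` stands in for the model):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (made up): y = 2x + noise. Hold out a final test set first.
x = rng.uniform(0, 5, 25)
y = 2 * x + rng.normal(0, 0.5, 25)
x_trainval, y_trainval = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

K = 5
fold_size = len(x_trainval) // K
val_errors = []
for k in range(K):
    # Fold k is the validation set; the other K-1 folds are training data.
    val = slice(k * fold_size, (k + 1) * fold_size)
    mask = np.ones(len(x_trainval), dtype=bool)
    mask[val] = False
    slope, intercept = np.polyfit(x_trainval[mask], y_trainval[mask], 1)
    pred = slope * x_trainval[val] + intercept
    val_errors.append(float(np.mean((pred - y_trainval[val]) ** 2)))

# Cross-validation score: the average validation MSE across the K folds.
cv_mse = sum(val_errors) / K

# Final evaluation: refit on all training data, score on the untouched test set.
slope, intercept = np.polyfit(x_trainval, y_trainval, 1)
test_mse = float(np.mean((slope * x_test + intercept - y_test) ** 2))
```

Every point in the training data is used for validation exactly once and for training K−1 times, so the cross-validation score doesn't hinge on any single split; the test set stays untouched until the very end.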