ML-01 - Introduction and linear regression Flashcards
ML-01 - Introduction and linear regression
When did Arthur Samuel come up with his definition of machine learning?
The 1950s.
ML-01 - Introduction and linear regression
What was Arthur Samuel’s definition of machine learning?
“[…] the field of study that gives computers the ability to learn without being explicitly programmed.”
ML-01 - Introduction and linear regression
How did Tom Mitchell define machine learning?
A computer program is said to learn from experience 𝑬 with respect to some task 𝑻
and performance measure 𝑷 if its performance at 𝑻, as measured by 𝑷,
improves with experience 𝑬.
ML-01 - Introduction and linear regression
What are the 3 broad types of machine learning?
- Supervised learning
- Unsupervised learning
- Reinforcement learning
ML-01 - Introduction and linear regression
What are the two big types of supervised learning?
- Classification
- Regression
ML-01 - Introduction and linear regression
What are the two big types of unsupervised learning?
- Clustering
- Dimensionality reduction
ML-01 - Introduction and linear regression
Describe the difference between regression and classification.
Regression predicts continuous values, while classification predicts discrete categories.
ML-01 - Introduction and linear regression
What is Semi-supervised learning?
A type of ML approach where you have some labeled data, but lots of unlabeled data.
ML-01 - Introduction and linear regression
What is reinforcement learning?
Learning by interacting with the environment.
ML-01 - Introduction and linear regression
What are the 5 steps for a supervised learning workflow?
1) Get data
2) Clean, prepare, manipulate
3) Train the model
4) Test the model
5) Improve
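The five steps above can be sketched end to end. This is a minimal illustration with synthetic data and a least-squares model; all names and numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Get data: synthetic house sizes (x, in m^2) and prices (y).
x = rng.uniform(30, 200, size=100)
y = 1000 * x + 5000 + rng.normal(0, 2000, size=100)

# 2) Clean / prepare: add a bias column so the model can learn an intercept.
X = np.column_stack([np.ones_like(x), x])

# Hold out the last 20% as test data.
train, test = slice(0, 80), slice(80, None)

# 3) Train: ordinary least squares on the training portion.
theta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# 4) Test: mean squared error on the held-out data.
mse = np.mean((X[test] @ theta - y[test]) ** 2)

# 5) Improve: tweak features / model and repeat until the test error is acceptable.
```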
ML-01 - Introduction and linear regression
What are the two most common optimization methods?
- Iterative methods, like gradient descent.
- Non-iterative methods, like the least squares method.
ML-01 - Introduction and linear regression
Describe gradient descent.
Gradient descent works by following the gradient of a function to reach a minimum.
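A minimal sketch of that idea, minimizing the toy function f(w) = (w − 3)², whose gradient is 2(w − 3) (function and hyperparameters chosen only for illustration):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to walk downhill toward a minimum."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # update rule: w := w - lr * gradient
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3); the minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```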
ML-01 - Introduction and linear regression
What is the formula for gradient descent?
(See image)
ML-01 - Introduction and linear regression
What are the 3 typical variants of gradient descent?
- (Batch) gradient descent
- Mini-batch gradient descent
- Stochastic gradient descent
ML-01 - Introduction and linear regression
Describe (batch) gradient descent.
Use the entire training set in each iteration (called an epoch) of gradient descent.
ML-01 - Introduction and linear regression
Describe mini-batch gradient descent.
Instead of learning on all data at once, learning happens on subsets of the data. During each epoch, the training set is split into N / batch_size batches, drawn without replacement. The model is updated once per batch.
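A sketch of the mini-batch loop for linear regression with a mean-squared-error gradient; the data, learning rate, and batch size are arbitrary choices for the example. Setting `batch_size = 1` turns this into stochastic gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic linear data: y = 2x + 1 plus noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=200)

Xb = np.column_stack([np.ones(len(X)), X])   # bias column
theta = np.zeros(2)
lr, batch_size, epochs = 0.1, 20, 200

for _ in range(epochs):
    # Shuffle once per epoch, then walk through N / batch_size disjoint batches.
    order = rng.permutation(len(Xb))
    for start in range(0, len(Xb), batch_size):
        idx = order[start:start + batch_size]
        Xi, yi = Xb[idx], y[idx]
        grad = 2 / len(idx) * Xi.T @ (Xi @ theta - yi)  # MSE gradient on the batch
        theta -= lr * grad                              # one update per batch
```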
ML-01 - Introduction and linear regression
Describe stochastic gradient descent.
During each epoch, set the batch size to 1 and update the model on each individual training example.
ML-01 - Introduction and linear regression
How do you make sure you set the learning rate correctly?
Plot loss vs. the number of epochs and make sure the loss converges after some number of iterations.
ML-01 - Introduction and linear regression
What happens if the learning rate is too high?
Loss might not decrease on every iteration and the training won’t converge.
ML-01 - Introduction and linear regression
What happens if the learning rate is too low?
The learning takes a long time to converge.
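The three learning-rate regimes above can be demonstrated on the toy function f(w) = w². The specific rates are arbitrary picks for illustration: a sensible rate drives the loss near zero, a tiny rate barely moves, and a too-large rate overshoots and diverges.

```python
def loss_after(lr, steps=50, w0=1.0):
    """Run gradient descent on f(w) = w^2 and return the final loss."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w^2 is 2w
    return w * w

good = loss_after(0.4)    # converges quickly
slow = loss_after(0.001)  # still converging after 50 steps
bad = loss_after(1.1)     # overshoots: the loss grows every step
```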
ML-01 - Introduction and linear regression
Describe the training workflow steps.
(See image)
ML-01 - Introduction and linear regression
What is feature scaling?
A transformation of some data to minimize the effects of different scales.
Learning rates are sensitive to unnormalized data.
E.g. house prices are a lot higher than the number of square meters, so the corresponding coefficients might be disproportionate.
ML-01 - Introduction and linear regression
Describe visually what happens in feature scaling.
(See image)
The unnormalized data makes the learning curve jump around and it doesn’t converge nicely.
ML-01 - Introduction and linear regression
What are the two most commonly used normalization methods?
- Min-max normalization
- Standardization (z-score)
ML-01 - Introduction and linear regression
What’s the formula for min-max normalization?
(See image)
ML-01 - Introduction and linear regression
What’s the range of data after applying min-max normalization?
Between 0 and 1.
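A minimal implementation of min-max normalization, (x − min) / (max − min), showing that the result lands in [0, 1] (example values are made up):

```python
import numpy as np

def min_max_normalize(x):
    """Rescale to [0, 1] via (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

scaled = min_max_normalize([50, 80, 120, 200])
```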
ML-01 - Introduction and linear regression
What’s the formula for standardization normalization?
z = (x − μ) / σ, where μ is the mean and σ is the standard deviation.
The result is not bounded and might contain outliers.
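The corresponding sketch for standardization; the result has mean 0 and standard deviation 1, but no fixed range (example values are made up):

```python
import numpy as np

def standardize(x):
    """z-score: (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

z = standardize([50, 80, 120, 200])
```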
ML-01 - Introduction and linear regression
What are some reasons to apply min-max normalization rather than standardization?
- Data doesn’t follow a normal distribution
- Data must be in a specific range
- Need to keep the original shape of the data, just scaled down
- Outliers are not a significant concern
ML-01 - Introduction and linear regression
What are some reasons to apply standardization rather than min-max normalization? (4)
- When the data follows a normal distribution
- When the data has (extreme) outliers
- When your algorithm assumes standardized data
- When the range of your data is unknown or changing over time
ML-01 - Introduction and linear regression
What is polynomial regression?
Using linear regression (i.e. a model that is linear in its weights) where you feature engineer the data through polynomial transformations (e.g. x, x², x³).
(See image)
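A sketch of the idea: expand x into polynomial features, then fit an ordinary linear model in those features. The quadratic ground truth and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Quadratic ground truth: y = 0.5 x^2 - x + 2 plus a little noise.
x = rng.uniform(-3, 3, size=100)
y = 0.5 * x**2 - x + 2 + rng.normal(0, 0.05, size=100)

# Feature engineering: expand x into [1, x, x^2], then fit a *linear* model
# in those features -- that is all polynomial regression is.
X = np.column_stack([np.ones_like(x), x, x**2])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```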
ML-01 - Introduction and linear regression
What’s the image an example of? (See image)
Polynomial regression.
ML-01 - Introduction and linear regression
What is the “normal equation”?
A non-iterative optimization method for solving the parameters for linear regression directly.
ML-01 - Introduction and linear regression
What is the formula for the “normal equation”?
(See image)
ML-01 - Introduction and linear regression
What’s the solution to the “normal equation”?
(See image)
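A sketch of that solution, θ = (XᵀX)⁻¹Xᵀy, on synthetic linear data (the true parameters 4.0 and 2.5 are arbitrary for the example):

```python
import numpy as np

rng = np.random.default_rng(3)

# Design matrix with a bias column; y = 4.0 + 2.5 x plus noise.
X = np.column_stack([np.ones(50), rng.uniform(0, 10, size=50)])
y = X @ np.array([4.0, 2.5]) + rng.normal(0, 0.1, size=50)

# Normal equation: solve for theta directly, no iteration.
theta = np.linalg.inv(X.T @ X) @ X.T @ y
# In practice np.linalg.solve(X.T @ X, X.T @ y) is preferred over an explicit inverse.
```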
ML-01 - Introduction and linear regression
What are the advantages of gradient descent over the normal equation?
- It works well when the number of features is large.
ML-01 - Introduction and linear regression
What are the advantages of the normal equation over gradient descent?
- No need to choose learning rate.
- Doesn’t iterate; solves directly via the inverse (X^T X)^-1.
ML-01 - Introduction and linear regression
What are the disadvantages of gradient descent over the normal equation?
- You need to choose the learning rate.
- GD might run for many iterations.
ML-01 - Introduction and linear regression
What are the disadvantages of the normal equation over gradient descent?
- Can be slow if the number of features is high; the time complexity of calculating matrix inverses is O(n^3).