Machine Learning Flashcards
List ML algorithm categories.
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- Recommender systems
Examples of supervised learning
- Regression: Predicting a continuous-valued output
- Classification (e.g., logistic regression): Predicting a discrete-valued output
Examples of unsupervised learning
- Clustering: Google News, computer cluster analysis, market segmentation, cocktail party problem, social network analysis
Hypothesis (model) and cost function for linear regression with a single variable
hθ(x) = θ0 + θ1*x
J(θ) = (1/(2m)) * Σ_{i=1..m} (hθ(x^(i)) - y^(i))^2
- m: number of training examples
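A minimal NumPy sketch of this hypothesis and cost (the function names and the arrays x, y are illustrative, not from the source):

import numpy as np

def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta0 + theta1 * x, applied element-wise to the array x
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    # J(theta) = 1/(2m) * sum over i of (h(x^(i)) - y^(i))^2
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return (errors ** 2).sum() / (2 * m)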
How to find the parameter set for a linear regression problem?
Find a parameter set that minimizes the cost function, i.e.,
min_θ J(θ)
One way of solving this optimization problem is the gradient descent algorithm.
Describe the gradient descent algorithm.
repeat until convergence {
  for all j (simultaneously):
    θj := θj - α * (∂/∂θj) J(θ)
}
- α: learning rate. Note that all θj are updated simultaneously: every partial derivative is evaluated at the current θ before any parameter changes.
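A generic Python sketch of this update rule, assuming a caller-supplied gradient function grad_J and a fixed iteration budget in place of a true convergence test:

import numpy as np

def gradient_descent(grad_J, theta, alpha, num_iters):
    # grad_J(theta) must return the full vector of partial derivatives dJ/dtheta_j.
    # Evaluating it once per step, at the current theta, makes the update simultaneous.
    theta = np.asarray(theta, dtype=float)
    for _ in range(num_iters):
        theta = theta - alpha * grad_J(theta)
    return theta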
Discuss the learning rate of the gradient descent algorithm.
- α too small: convergence is slow.
- α too large: J(θ) may fail to decrease on every iteration; GD may even diverge.
Gradient descent algorithm for a linear regression with a single variable.
repeat until convergence {
  θ0 := θ0 - α * (1/m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i))
  θ1 := θ1 - α * (1/m) * Σ_{i=1..m} (hθ(x^(i)) - y^(i)) * x^(i)
  (update θ0 and θ1 simultaneously)
}
* Note: This is batch gradient descent.
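A runnable sketch of this batch update, assuming x and y are NumPy arrays of length m (alpha and the iteration count are illustrative defaults):

import numpy as np

def linreg_gd(x, y, alpha=0.01, num_iters=1500):
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        errors = theta0 + theta1 * x - y          # h(x^(i)) - y^(i), for all i at once
        grad0 = errors.sum() / m
        grad1 = (errors * x).sum() / m
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1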
What is “batch” gradient descent?
Each step of the gradient descent uses all the training samples.
Hypothesis and cost function of a linear regression with multi-variables.
hθ(X) = θ^T·X
- θ^T = [θ0, θ1, …, θn]
- X^T = [1, x1, …, xn]
J(θ) = (1/(2m)) * Σ_{i=1..m} (hθ(X^(i)) - y^(i))^2
- m: number of training examples
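A vectorized sketch of this cost over the whole training set, assuming the design matrix X (shape m × (n+1)) already includes the leading column of ones:

import numpy as np

def cost(theta, X, y):
    # J(theta) = 1/(2m) * ||X·theta - y||^2
    m = len(y)
    errors = X @ theta - y
    return errors @ errors / (2 * m)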
Gradient descent of a linear regression with multi-variables.
repeat until convergence {
  for all j in {0, 1, …, n} (simultaneously):
    θj := θj - α * (1/m) * Σ_{i=1..m} (hθ(X^(i)) - y^(i)) * xj^(i)
}
- With x0^(i) = 1, the j = 0 update reduces to the single-variable θ0 update.
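The same update in vectorized form (a sketch; the gradient X^T(X·θ - y)/m packs every θj update into one expression):

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        # simultaneous update of all theta_j in one vector operation
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    return theta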
Feature scaling and GD
For GD to converge quickly, the features should be on a similar scale. Mean normalization can be used:
X := (X - μ) / S
- μ: vector of feature means
- S: per-feature standard deviation, or the range (max - min)
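A sketch of mean normalization, assuming X is an m × n NumPy array of raw features (one row per example):

import numpy as np

def mean_normalize(X):
    mu = X.mean(axis=0)      # per-feature mean
    S = X.std(axis=0)        # per-feature std; X.max(0) - X.min(0) also works
    return (X - mu) / S, mu, S

The same μ and S must be reapplied to any new input before prediction.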
How do you make sure GD is working?
Plot J(θ) against the number of iterations and check that it decreases on every iteration.
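A sketch of this check, assuming a hypothetical list J_history holding J(θ) recorded once per GD iteration:

import matplotlib.pyplot as plt

def plot_convergence(J_history):
    plt.plot(range(1, len(J_history) + 1), J_history)
    plt.xlabel("iteration")
    plt.ylabel("J(theta)")
    plt.show()

If the curve ever rises, the learning rate α is likely too large.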
How to extend a linear regression to Polynomial regression for non-linear function?
Create new features from the existing ones.
For example,
x1 = x1
x2 = x1^2
x3 = x1^3
Then fit the new feature set using the linear regression techniques above.
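A sketch of building such polynomial features, assuming x1 is a 1-D NumPy array (degree 3 matches the example above):

import numpy as np

def poly_features(x1, degree=3):
    # Columns: x1, x1^2, ..., x1^degree (the column of ones is added separately)
    return np.column_stack([x1 ** d for d in range(1, degree + 1)])

Because the powers differ wildly in magnitude, feature scaling matters even more here.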
Normal equation for linear regression.
θ = (X^T·X)^(-1)·X^T·y
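A one-line NumPy sketch; the pseudoinverse is used instead of a plain inverse so the formula still works when X^T·X is singular (e.g., redundant features):

import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^(-1) X^T y
    return np.linalg.pinv(X.T @ X) @ X.T @ y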