Regression Flashcards
KNN Classification - training and predicting
Training: store all the data
Prediction:
1. calculate the distance from x to all points in your dataset
2. sort the points in your dataset by increasing distance from x
3. predict the majority label of the k closest points
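a minimal sketch of these steps in Python (plain NumPy; `knn_predict` is my own helper name, not from any library):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # 1. distance from x to all points in the dataset (Euclidean here)
    distances = np.linalg.norm(X_train - x, axis=1)
    # 2. sort points by increasing distance from x, keep the k closest
    nearest = np.argsort(distances)[:k]
    # 3. predict the majority label of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]
```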
Distance
- Euclidean distance, Manhattan distance, cosine distance = 1 - cosine similarity
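possible NumPy implementations of the three (helper names are my own):

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def cosine_distance(a, b):
    # cosine distance = 1 - cosine similarity
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```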
KNN Regression
We take the average of the target values of the k nearest neighbors
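same neighbor search as classification, but averaging targets instead of voting (a sketch):

```python
import numpy as np

def knn_regress(X_train, y_train, x, k):
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    # average the target values of the k nearest neighbors
    return y_train[nearest].mean()
```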
KNN Hyperparameters
K
- the number of neighbors to use. Rule of thumb: start with k = sqrt(n), then grid-search from there.
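a sketch of that workflow with scikit-learn's GridSearchCV (assumes a feature matrix X and labels y already exist):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

k0 = int(np.sqrt(len(X)))  # start near k = sqrt(n)
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(max(1, k0 - 10), k0 + 11))},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```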
KNN - noise vs signal
lower k tends to overfit
higher k tends to underfit - captures less noise, but also less signal
standardization
use standardization when your data has varying scales
(data point - mean) / standard deviation
rescales each feature to mean = 0 and standard deviation = 1
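a sketch of doing this by hand in NumPy (scikit-learn's StandardScaler does the same thing):

```python
import numpy as np

def standardize(X):
    # rescale each column to mean 0 and standard deviation 1
    # in practice: compute mean/std on the training set only, then reuse on test data
    return (X - X.mean(axis=0)) / X.std(axis=0)
```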
KNN (pros, cons)
pros:
- super simple
- training is trivial
- easy to add more data
- few hyperparameters
cons:
- with many features you need a lot more data, and gathering it can be costly
- high prediction cost
- bad with high dimensions; anything more than about 5 features is problematic
- categorical features don’t work well
Mean Squared Error (MSE)
expected value of the square of the error
MSE = (1/n) * Σ (predicted - actual)²
- the average squared difference between the estimated values and the actual values
- mean of all the square errors in your model and data
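computing it directly in NumPy (a sketch):

```python
import numpy as np

def mse(predicted, actual):
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.mean((predicted - actual) ** 2)

# e.g. mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]) == (0.25 + 0.25 + 0.0) / 3
```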
Irreducible error
Error that we can't do anything about. Even if we had all possible data and could build a perfect model, we couldn't predict values exactly.
Bias and variance
- errors that we can control
bias = failing to capture some of the signal (underfitting)
variance = error from being too sensitive to the particular training data, so predictions shift from sample to sample (overfitting)
Together they tell us where the errors are coming from and how consistently we're off.
When you capture more signal, you naturally capture more noise too, which increases variance.
k fold cross validation
First do a train/test split to reserve data for the ultimate testing set.
Then run k-fold cross-validation on the training data: each fold takes a turn as the validation set while the rest serves as the training set.
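a sketch of that split-then-k-fold workflow with scikit-learn (X, y, and the model choice are placeholders):

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# reserve the ultimate test set first
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation on the training portion only
model = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(model, X_train, y_train, cv=5)
print(scores.mean())

# touch the held-out test set once, at the very end
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```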
churn
decision rules
Linear Regression - scatterplot
It's good practice to make a scatterplot; if the relationship between the dependent and independent variables appears linear, that's a good hint that linear regression is a suitable learning algorithm.
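a quick matplotlib sketch (x and y are placeholder arrays):

```python
import matplotlib.pyplot as plt

plt.scatter(x, y)
plt.xlabel("independent variable")
plt.ylabel("dependent variable")
plt.show()  # a roughly linear cloud suggests linear regression is a reasonable fit
```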
Feature Engineering
anytime you use your current features to create new features.
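a tiny pandas example (column names are made up):

```python
import pandas as pd

df = pd.DataFrame({"width": [2.0, 3.0], "height": [4.0, 5.0]})
df["area"] = df["width"] * df["height"]          # new feature from existing ones
df["aspect_ratio"] = df["width"] / df["height"]  # another engineered feature
```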
Linear Regression with single feature
y = mx + b
Linear Regression - how to pick the best line
Residual = the difference between the actual value and our predicted value
Find the line that minimizes the total sum of squared residuals.
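for a single feature, the line minimizing the sum of squared residuals has a closed form; a NumPy sketch:

```python
import numpy as np

def fit_line(x, y):
    # least-squares slope m and intercept b for y = m*x + b
    m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b = y.mean() - m * x.mean()
    return m, b

# residuals for the fitted line: y - (m * x + b)
```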