! Supervised Learning: Regression Flashcards
1
Q
Regression
A
- finding & predicting the relationship between independent variables & a continuous output/dependent variable
- e.g. predict the price of a car given a set of features (age, brand)
- supervised ML
2
Q
Regression - Tasks
A
- Linear Regression
- Neural Networks
3
Q
Linear Regression - Goal
A
- learn a linear model that we can use to predict a new y given a previously unseen x with as little error as possible
- parametric
4
Q
Linear Regression - Method
A
- Linear equation: y’ = β0 + β1x1 + … + βixi
- estimating the coefficients (weights wi, bias b) so that the model
- predicts the continuous variable (y’) based on the other variable(s) (xi) in the best way
- finding the straight line minimizing the distance between predicted & actual output (sketch below)
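A minimal sketch of this prediction step (Python/NumPy assumed; the feature values & coefficients below are purely illustrative, not fitted):

```python
import numpy as np

def predict(X, weights, bias):
    # Linear model: y' = bias + w1*x1 + ... + wi*xi, per row of X
    return bias + X @ weights

# Hypothetical car-price example: features = [age in years, brand score]
X = np.array([[3.0, 0.8],
              [10.0, 0.5]])
weights = np.array([-1500.0, 8000.0])  # assumed coefficients, not fitted
bias = 20000.0
print(predict(X, weights, bias))       # [21900.  9000.]
```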
5
Q
Linear Regression - Steps
A
- Define cost / loss function: measures the inaccuracy of the model's predictions
- Find parameters minimizing the loss function: makes the model as accurate as possible -> Gradient Descent
6
Q
Linear Regression - Steps - Gradient Descent
A
- method to find the minimum of the model's (y’ = β0 + β1x1 + … + βixi) loss function by an iterative process
- cost function f(β0, β1) = z <- focus on β0 & β1 (the other variables of the cost function are given)
- Guess β0 & β1
- [∂z/∂β0, ∂z/∂β1] <- get the partial derivatives of the loss function with respect to each beta (= how much the total loss increases or decreases if β0 or β1 is increased by a very small amount)
- Adjust β0 & β1 accordingly: if a partial derivative < 0 -> increase that β; if > 0 -> decrease it
- Repeat the last two steps until the partial derivatives ≈ 0 (sketch below)
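A minimal sketch of these steps for simple linear regression (Python/NumPy assumed; learning rate & iteration count are illustrative):

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, steps=2000):
    b0, b1 = 0.0, 0.0                       # guess beta0 & beta1
    n = len(x)
    for _ in range(steps):                  # repeat until (approximately) converged
        err = (b0 + b1 * x) - y             # prediction error per data point
        dz_db0 = (2 / n) * err.sum()        # partial derivative dz/dbeta0 of the MSE
        dz_db1 = (2 / n) * (err * x).sum()  # partial derivative dz/dbeta1 of the MSE
        b0 -= lr * dz_db0                   # adjust against the gradient's sign
        b1 -= lr * dz_db1
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])          # roughly y = 1 + 2x
print(gradient_descent(x, y))               # ~ (1.15, 1.94)
```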
7
Q
Linear Regression - Steps - define cost function
A
- a) Take the average of
- b) the squares (-> no negative numbers; penalizes large differences) of
- c) the differences between each data point (yi) & the model's prediction (β1xi + β0)
- MSE = (1/n) · Σ (yi − (β1xi + β0))² (sketch below)
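A minimal sketch of this MSE cost (Python/NumPy assumed; data values are illustrative):

```python
import numpy as np

def mse(x, y, b0, b1):
    diffs = y - (b1 * x + b0)    # c) difference per data point
    return (diffs ** 2).mean()   # b) square, a) average

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])
print(mse(x, y, b0=0.0, b1=2.0))  # ~0.0067: small loss for a near-perfect line
```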
8
Q
Linear Regression - Solutions to over- & underfitting
A
- Use more training data
- Use regularization = a penalty added to the loss function when the model assigns too much weight to 1 feature or to too many features: cost + λ · Σβi² <- λ = hyperparameter -> higher: more penalty (sketch below)
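A minimal sketch of the L2-regularized loss (Python/NumPy assumed; the λ value is illustrative; penalizing only β1 and not the bias is a common convention):

```python
import numpy as np

def ridge_loss(x, y, b0, b1, lam):
    mse = ((y - (b1 * x + b0)) ** 2).mean()  # ordinary cost
    return mse + lam * b1 ** 2               # cost + lambda * sum(beta_i^2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(ridge_loss(x, y, 0.0, 2.0, lam=0.1))   # 0.4: penalty even at a perfect fit
```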
9
Q
Regression - Types
A
- Simple = 1 independent variable
- Multiple = > 1 independent variable
10
Q
Linear Regression - Assumptions / Disadvantages
A
- variables = quantitative (categorical -> encode as binary); measured at a continuous level
- No significant outliers
- Residuals (errors) of the best-fit regression line follow a normal distribution
- relationship between dependent & independent variables = linear
- all observations = independent
11
Q
Supervised Learning Goal
A
- find the relation btw input & output data based on already known answers (labeled data)
- apply it to predict outcomes for new data
12
Q
Supervised Learning - Applications
A
image classification (dog or cat?), fraud detection, spam filtering
13
Q
Logistic Regression
A
- predict a discrete class based on probability
- put linear regression f(x) into the sigmoid function P(Y=1) = 1 / (1 + e^(−f(x))) -> result = probability btw 0 & 1 (sketch below)
- threshold: class decision based on tolerance for false positives / negatives
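A minimal sketch of this prediction step (Python/NumPy assumed; coefficients & threshold are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))          # squashes f(x) into (0, 1)

def predict_class(x, b0, b1, threshold=0.5):
    p = sigmoid(b0 + b1 * x)             # P(Y=1)
    return p, int(p >= threshold)        # class decision via threshold

print(predict_class(2.0, b0=-3.0, b1=2.0))   # (~0.73, 1)
```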
14
Q
Logistic Regression - Log-odds ratio
A
- log-odds = ln(p / (1 − p)) <- the linear part β0 + β1x1 + … + βixi of the model equals the log-odds of P(Y=1) (sketch below)
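A minimal sketch (Python; the probability value is illustrative):

```python
import math

def log_odds(p):
    # ln(p / (1 - p)): inverts the sigmoid back to the linear score
    return math.log(p / (1 - p))

print(log_odds(0.5))    # 0.0  (even odds)
print(log_odds(0.73))   # ~1.0 (the linear score that produced p = 0.73 above)
```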
15
Q
Logistic Regression - Con
A
- risk of overfitting with a large number of independent variables
- needs a sample with evenly balanced categories