Linear Regression Flashcards
Types of regression
Simple linear regression
Multiple linear regression
Ridge regression
Logistic regression
What type of model is linear regression
Linear regression is a regression-type predictive model
Type of output of a linear regression model
This model gives its output as a continuous value (-∞ to ∞)
Is linear regression a supervised or unsupervised learning model
Supervised learning
Linear regression model is used to ______
Linear regression model is used to predict an unknown value (the dependent variable) from known values (the independent variables)
Regression definition
general equation
Regression is a statistical technique used to model the relationship between the dependent variable and one or more independent variables
y = f(x, θ)
Here θ denotes the set of model parameters, i.e. m₁, m₂, …, mₙ, c
Types of linear regression
Simple linear regression
Multiple linear regression
Simple linear regression (definition)
This regression shows the relationship between the dependent variable and one independent variable.
What is used to make prediction in simple linear regression
Straight line equation is used as best fit line to make prediction.
Equation used in simple linear regression
y = mx + c, where m is the slope and c is the intercept
Multiple linear regression (definition)
This regression model shows the relationship between the dependent variable and 2 or more independent variables.
What is used to make prediction in multiple linear regression
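The predicted best fit line, which becomes a plane or hyperplane with 2 or more independent variables: y = m₁x₁ + m₂x₂ + … + mₙxₙ + c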
Goal of linear regression
Goal of a linear regression is to find out the best fit line which minimises the error between the predicted output and actual output based on the historical data (training data set).
Other names for error (2)
Residual (the error of a single data point) or cost function (the errors aggregated over all data points)
Assumptions in linear regression
1.) Linearity
2.) Homoscedasticity
3.) No multicollinearity
Linearity
The relationship between the independent and dependent variables is linear: the data points in the scatter plot fall roughly along a straight line.
Homoscedasticity (and also opposite of Homoscedasticity)
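Homoscedasticity: The spread (variance) of the errors around the best fit line is roughly constant across all values of the independent variable.
Heteroscedasticity (the opposite): The spread of the errors changes with the independent variable, for example widening as x increases.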
No multicollinearity
(Also explain multicollinearity)
All the input variables (attributes) in the data set are independent of each other.
Multicollinearity means interdependency between the input variables.
Ex: x1 ∝ x2 (directly proportional)
Here there is no need to train the model with both x1 and x2, because one depends on the other. Drop either attribute so that the remaining attributes in the data set are independent.
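A minimal Python sketch of spotting multicollinearity between two inputs. The data, the helper name pearson, and the 0.9 cutoff are illustrative assumptions, not part of the cards; in practice the threshold is a judgment call.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical data: x2 is exactly 2 * x1, so the two inputs are dependent.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]

THRESHOLD = 0.9  # assumed cutoff for "too correlated"
r = pearson(x1, x2)
drop_x2 = abs(r) > THRESHOLD  # True here: keep x1 and drop x2
```

A correlation near +1 or -1 suggests one of the two attributes carries no extra information and can be dropped.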
How do we know this is the best fit line
We use the performance metrics to calculate the error.
If the error is low, then fix the line as the best fit line
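As a rough illustration of choosing a line by its error, the sketch below scores two candidate lines with mean squared error and keeps the one with the lower value. The data and the candidate (m, c) pairs are hypothetical.

```python
def mse(xs, ys, m, c):
    """Mean squared error of the line y = m*x + c on the data."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data, generated by y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

# Two hypothetical candidate lines given as (m, c).
candidates = [(2.0, 1.0), (1.5, 2.0)]

# Keep the candidate with the lowest error.
best = min(candidates, key=lambda line: mse(xs, ys, *line))
```

Here the first candidate reproduces the data exactly, so it has zero error and is selected.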
Slope of best fit line
slope(m) = Δy / Δx
Graph of best fit line
What is performance metrics
A system or standard of measurement
Types of performance metrics
1.) R² measure
2.) Adjusted R² measure
3.) Mean square error (MSE)
4.) Root mean square error (RMSE)
5.) Mean absolute error (MAE)
6.) Mean absolute percentage error (MAPE)
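R² measure formula
R² = 1 - SSE / SST, where SSE = Σ(yᵢ - ŷᵢ)² and SST = Σ(yᵢ - ȳ)² (yᵢ actual value, ŷᵢ predicted value, ȳ mean of the actual values)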
R² range
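[0 to 1] for a least squares fit evaluated on its own training data (it can be negative when the model predicts worse than the mean of y)
Two special points about R²
If the R² value is 1, the model has no error (error is zero): the predictions match the actual values exactly.
If the R² value is close to zero, the model explains little of the variation in y. Error is high. Re-training required.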
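Adjusted R² measure
Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of samples and p is the number of independent variables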
SSE
SSE: Sum of squared errors
SSE = Σ(yᵢ - ŷᵢ)²
MSE
MSE: Mean squared error
MSE = SSE / n
RMSE
RMSE: Root mean square error
RMSE = sqrt(MSE)
MAE
MAE: Mean absolute error
MAE = (1/n) Σ|yᵢ - ŷᵢ|
MAPE
MAPE: Mean absolute percentage error
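MAPE = (100 / n) Σ|(yᵢ - ŷᵢ) / yᵢ|, where yᵢ is the actual value, ŷᵢ the predicted value and n the number of samples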
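The metrics above can be sketched in plain Python using their standard textbook definitions. The function name regression_metrics, the n_features parameter and the sample values are assumptions made for illustration. MAPE divides by the actual values, so this sketch assumes none of them is zero.

```python
from math import sqrt

def regression_metrics(y_true, y_pred, n_features=1):
    """Return common regression metrics; assumes no actual value is zero."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e ** 2 for e in errors)               # sum of squared errors
    sst = sum((t - mean_y) ** 2 for t in y_true)    # total sum of squares
    mse = sse / n
    r2 = 1 - sse / sst
    return {
        "SSE": sse,
        "MSE": mse,
        "RMSE": sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 / n * sum(abs(e / t) for e, t in zip(errors, y_true)),
        "R2": r2,
        "Adjusted R2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
    }

# Hypothetical actual and predicted values.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(y_true, y_pred)
```

With one independent variable and four samples, the adjusted R² comes out slightly below R², reflecting the penalty for the number of inputs.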
How to calculate m and c in the straight line equation, which is the best prediction line of simple linear regression
2 methods are used to calculate m and c
1.) Straight line method
2.) Ordinary least squares method
regression coefficient
m (slope)
Straight line method
Compare straight line method and ordinary least squares method
In the straight line method, only the first and last samples are used to calculate the slope. Therefore the line is good for the first and last samples only and bad for the rest of the samples.
So an alternative is required, i.e. the ordinary least squares method. This uses the mean values of all the data points to give the best prediction.
m and c in terms of mean
ȳ = m.x̄ + c
Ordinary least square method
m = covariance(x,y)/var(x)
c = ȳ - m.x̄
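The two ways of finding m and c can be compared with a short sketch. The sample data and the function names are hypothetical. For the straight line method the cards only describe the slope, so taking c from the first sample is an assumption made here.

```python
def two_point_fit(xs, ys):
    """Straight line method: slope from the first and last samples only."""
    m = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    c = ys[0] - m * xs[0]  # assumption: the line passes through the first sample
    return m, c

def ols_fit(xs, ys):
    """Ordinary least squares: m = cov(x, y) / var(x), c = y_mean - m * x_mean."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    # The 1/n factors of covariance and variance cancel, so plain sums suffice.
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    m = cov_xy / var_x
    return m, y_mean - m * x_mean

def sse(xs, ys, m, c):
    """Sum of squared errors of the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical samples.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m_tp, c_tp = two_point_fit(xs, ys)
m_ols, c_ols = ols_fit(xs, ys)
ols_is_better = sse(xs, ys, m_ols, c_ols) < sse(xs, ys, m_tp, c_tp)
```

On this data the two-point line fits the first and last samples exactly but misses the middle ones, while the least squares line has the lower total squared error.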
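Which equation does multiple linear regression use
B = (XᵀX)⁻¹XᵀY, where X is the matrix of inputs (with a column of ones for the intercept c), Y is the vector of outputs and B is the vector of coefficients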
How to find multiple linear regression
(Explain whole method)
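One possible sketch of the equation B = (XᵀX)⁻¹XᵀY in plain Python. Instead of forming the inverse explicitly, it solves the equivalent system (XᵀX)B = XᵀY by Gauss-Jordan elimination, which gives the same B whenever XᵀX is invertible. The helper names and the sample data are assumptions for illustration.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_normal_equation(features, targets):
    """Return [c, m1, ..., mn] satisfying (XᵀX)B = XᵀY."""
    X = [[1.0] + list(row) for row in features]  # leading 1 gives the intercept c
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(x * y for x, y in zip(col, targets)) for col in Xt]
    return solve(XtX, XtY)

# Hypothetical data generated by y = 1 + 2*x1 + 3*x2.
features = [[1, 1], [2, 1], [1, 2], [2, 3], [3, 2]]
targets = [1 + 2 * a + 3 * b for a, b in features]
B = fit_normal_equation(features, targets)  # approximately [1, 2, 3]
```

If two input columns are perfectly dependent (multicollinearity), XᵀX is not invertible and this sketch would fail with a division by zero, which is one reason the dependent attribute is dropped first.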