Lecture 5 - Linear Regression Flashcards
Regression Analysis
Fit a relationship between a numerical outcome variable and a set of predictors
Variables
- Numerical outcome variable Y also called response, target, or dependent variable
- Set of predictors X1, X2, …, Xn also referred to as independent variables, input variables, regressors, or covariates
Linear Regression Model
Linear: arranged in, or extending along, a straight or nearly straight line; the model describes Y as a linear (straight-line) function of the predictors
Single vs Multiple Linear Regression Model
Single: one independent predictor, i.e., a single variable X
Multiple: two or more predictors, i.e., X1, X2, …
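The two forms can be written out as below. The coefficient symbols β and the error term ε are conventional notation assumed here for illustration; they do not appear on the cards.

```latex
% Single linear regression: one predictor X
Y = \beta_0 + \beta_1 X + \varepsilon

% Multiple linear regression: predictors X_1, ..., X_n
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon
```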
Intuition for Single Linear Regression
Intuition for Multiple Linear Regression
Linear Regression Model
Graphical visualisation of Linear Regression
Ordinary Least Squares (OLS)
- Method for estimating the unknown parameters in a linear regression model
- It minimises the errors associated with predicting values for the dependent variable Y
- It uses a least squares criterion because, without squaring, positive and negative deviations from the model would cancel each other out
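A minimal sketch of the points above, assuming NumPy and a small made-up data set (the numbers and variable names are illustrative only). It fits a line by least squares and shows that the raw residuals sum to roughly zero, which is why the criterion squares them.

```python
import numpy as np

# Made-up data for illustration: one predictor x and an outcome y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Design matrix with a column of ones so the first coefficient is the intercept.
X = np.column_stack([np.ones_like(x), x])

# Least squares: pick the coefficients that minimise the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print("intercept, slope:", beta)
# Positive and negative residuals cancel, so their plain sum is uninformative.
print("sum of residuals:", residuals.sum())
# The squared residuals cannot cancel; this is the quantity OLS minimises.
print("sum of squared residuals:", (residuals ** 2).sum())
```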
Ordinary Least Square (OLS) pt2
Objectives for single/multiple regression
- Predictive - predict the outcome value for new records given their input values
- Explanatory (or descriptive) - quantify / explain the average effect of inputs on an outcome; data are treated as a random sample from a larger population of interest
Explanatory objective - in single/multiple regression
Generate statements useful for decision making
E.g. a unit increase in X is associated with an average increase of 2 points in Y
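A rough sketch of how such a statement could be read off a fitted model, assuming NumPy and synthetic data generated with a true slope of 2 (the data, seed, and names are assumptions for illustration).

```python
import numpy as np

# Synthetic sample (assumed for illustration): Y rises by about 2 per unit of X.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 5.0 + 2.0 * x + rng.normal(0, 1.0, size=200)

# With degree 1, np.polyfit returns [slope, intercept] of the least squares line.
slope, intercept = np.polyfit(x, y, 1)

# The slope is the estimated average change in Y per unit increase in X.
print(f"A unit increase in X is associated with an average change of {slope:.2f} in Y")
```

The estimated slope is an association in this sample; treating it as a statement about the wider population relies on the random-sample assumption.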
Predictive objective - in single/multiple regression
Given a new value for X, estimate the value for Y
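A small sketch of the predictive use with two predictors, assuming NumPy. The training records are made up and noise-free (generated from Y = 1 + 2·X1 + 3·X2) so the prediction is easy to verify; `add_intercept` is a hypothetical helper, not a library function.

```python
import numpy as np

# Made-up, noise-free training records with predictors X1 and X2.
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y_train = 1.0 + 2.0 * X_train[:, 0] + 3.0 * X_train[:, 1]

def add_intercept(X):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(X)), X])

# Estimate the coefficients by least squares on the training records.
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# Predict the outcome for a new record with X1 = 6 and X2 = 2.
X_new = np.array([[6.0, 2.0]])
y_new = add_intercept(X_new) @ coef
print("predicted Y for the new record:", y_new[0])
```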
Overfitting
- **Issue:**
- The model's output corresponds exactly, or is extremely close, to the given data set
- I.e., the model learns the existing data too well
- **Consequences:**
- Model fails to fit additional data, or
- Generates unreliable predictions
- **Example:**
- Consider creating a model for predicting students' grades given the hours they study
- A model fitted to data from UVT fails when applied to TU/e data due to overfitting
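The effect can be sketched with made-up numbers, assuming NumPy. The hours, grades, and noise values below are invented for illustration and are not real UVT or TU/e data. A degree-5 polynomial reproduces the six training points almost exactly, yet does far worse than a straight line on new students.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

def true_grade(hours):
    """Assumed underlying relationship, used only to generate toy data."""
    return 1.0 + 2.0 * hours

# Training data: six students, with hand-picked noise on their grades.
hours_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
noise = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
grades_train = true_grade(hours_train) + noise

# New students from a different group, including one who studies more hours.
hours_new = np.array([0.5, 2.5, 4.5, 6.0])
grades_new = true_grade(hours_new)

line = np.polyfit(hours_train, grades_train, 1)    # simple model
wiggly = np.polyfit(hours_train, grades_train, 5)  # passes through every training point

for name, coefs in [("degree 1", line), ("degree 5", wiggly)]:
    train_err = mse(grades_train, np.polyval(coefs, hours_train))
    new_err = mse(grades_new, np.polyval(coefs, hours_new))
    print(f"{name}: training MSE = {train_err:.3f}, new-data MSE = {new_err:.3f}")
```

A near-zero training error paired with a large error on new data is the signature of overfitting.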
Underfitting
- **Issue:**
- Model can’t accurately capture the data dependencies
- Fails to identify effects supported by the data
- Usually this happens due to the model’s simplicity
- **Consequences:**
- Model generalises poorly when applied to new data
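A minimal sketch of underfitting, assuming NumPy and made-up data that follow a curve (Y = X²). A straight line is too simple for this pattern: its fitted slope is essentially zero, so it reports no effect of X even though the data clearly depend on it.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up data with a curved dependency between x and y.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

line = np.polyfit(x, y, 1)   # too simple: cannot bend
curve = np.polyfit(x, y, 2)  # flexible enough for this pattern

# New records outside the training range.
x_new = np.array([-3.0, 3.0])
y_new = x_new ** 2

print("slope of the fitted line:", line[0])
for name, coefs in [("degree 1", line), ("degree 2", curve)]:
    train_err = mse(y, np.polyval(coefs, x))
    new_err = mse(y_new, np.polyval(coefs, x_new))
    print(f"{name}: training MSE = {train_err:.3f}, new-data MSE = {new_err:.3f}")
```

Unlike overfitting, the error here is already high on the training data, and it stays high on new data.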