Regression Flashcards
What is the general idea of regression?
Data: 2 quantitative variables
Analyze whether there is a relationship between those variables
Check whether the relationship is:
- Positive → the variables move in the same direction
- Negative → the variables move in opposite directions
- No relationship → no systematic pattern between the variables
Analysis of regression
1) Direction: positive or negative
2) Form: linear or curved
3) Strength: extent of scatter
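Direction and strength can also be summarized in a single number. A minimal sketch in Python, assuming the Pearson correlation coefficient r is used as that summary (its sign gives the direction, its size the strength of a linear pattern). The data values and the helper name `pearson_r` are made up for illustration.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample of two quantitative variables
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(f"r = {r:.3f}, direction: {direction}")
```

Form still has to be judged from a scatterplot, since r only measures linear association.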
When to use regression?
- The regression line predicts a Y value for a given X value
- Extrapolation: predicting far outside the X range of our data (should be avoided, because the observed relationship may not hold there)
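Prediction and the extrapolation warning can be sketched together. The example below assumes a line that has already been fitted. The coefficients, the observed X range, and the helper name `predict` are hypothetical. It flags any X outside the observed range, which is stricter than the "far outside" wording of the card.

```python
def predict(x_new, b0, b1, x_min, x_max):
    """Return the predicted y and a flag telling whether x_new is an extrapolation."""
    y_hat = b0 + b1 * x_new
    is_extrapolation = not (x_min <= x_new <= x_max)
    return y_hat, is_extrapolation

# Hypothetical fitted line and observed X range
b0, b1 = 2.2, 0.6
x_min, x_max = 1, 5

inside = predict(3, b0, b1, x_min, x_max)    # X within the data range
outside = predict(20, b0, b1, x_min, x_max)  # X far beyond the data range

for label, (y_hat, flag) in [("x=3", inside), ("x=20", outside)]:
    note = "extrapolation, do not trust" if flag else "within data range"
    print(f"{label}: predicted y = {y_hat:.2f} ({note})")
```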
What is the regression line for?
Dependent/Response/Explained variable
• on the y-axis
• explained by the variable on the x-axis
Independent/Predictor/Explanatory variable
• on the x-axis
• used to explain the variable on the y-axis
Formulas
Regression equation: $\hat{y} = b_0 + b_1 x$
Slope (tilt of the regression line): $b_1 = r \cdot \frac{s_y}{s_x}$
Intercept (value of $\hat{y}$ when $x = 0$): $b_0 = \bar{y} - b_1 \bar{x}$
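A worked example of these formulas, as a sketch using only the Python standard library. The data are hypothetical, and r is computed here from the sample standard deviations.

```python
from statistics import mean, stdev

# Hypothetical sample
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations

# Correlation coefficient r
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # intercept

print(f"r = {r:.4f}, b1 = {b1:.2f}, b0 = {b0:.2f}")
```

For these values the fitted line is approximately ŷ = 2.2 + 0.6x.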
What is the least squares regression line?
Line that fits the data best: $\hat{y}$. It minimizes the sum of the squared vertical distances of the observed y-values from the regression line.
Need to know about intercept and slope
• Always goes through the point $(\bar{x}, \bar{y})$
• A change of 1 standard deviation in X corresponds to a change of r standard deviations in the predicted Y
• The slope of the line is 0 exactly when the correlation is 0
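The first two properties can be checked numerically. A small sketch on hypothetical data, where the slope is computed directly from the sums of squares and r is computed separately, so the second check is not circular.

```python
from statistics import mean, stdev

# Hypothetical sample
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Least squares slope and intercept from sums of squares
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

def y_hat(x_value):
    """Predicted y on the fitted line."""
    return b0 + b1 * x_value

# Property 1: the line passes through (x_bar, y_bar)
through_mean = abs(y_hat(x_bar) - y_bar) < 1e-9

# Property 2: moving x by one s_x moves the prediction by r * s_y
r = sxy / ((n - 1) * s_x * s_y)
step = y_hat(x_bar + s_x) - y_hat(x_bar)
one_sd_rule = abs(step - r * s_y) < 1e-9

print(through_mean, one_sd_rule)
```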
Properties of the least squares regression line
$b_0$: Intercept: starting point of the line
- Predicted value of the dependent variable when the independent variable equals 0
$b_1$: Slope of the line
- If we increase the independent variable by 1 unit, the predicted value of the dependent variable changes by $b_1$ (up if $b_1$ is positive, down if it is negative)
The model selects $b_0$ and $b_1$ so that the sum of the squared residuals (from the sample) is minimized
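To see what "minimized" means in practice, the sketch below compares the sum of squared residuals at the least squares coefficients with a few nearby lines. The data are hypothetical. For these data the least squares values are b0 = 2.2 and b1 = 0.6, and the alternative lines are arbitrary choices for comparison.

```python
# Hypothetical sample
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sum_squared_residuals(b0, b1):
    """Sum of squared vertical distances between the data and the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Least squares coefficients for this sample
best = sum_squared_residuals(2.2, 0.6)

# Arbitrary alternative lines: shifted intercept or tilted slope
alternatives = [(2.5, 0.6), (1.9, 0.6), (2.2, 0.7), (2.2, 0.5)]
others = {pair: sum_squared_residuals(*pair) for pair in alternatives}

print(f"least squares line: {best:.2f}")
for (b0, b1), value in others.items():
    print(f"b0={b0}, b1={b1}: {value:.2f}")
```

Every alternative line gives a larger sum than the least squares line.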
What is the SSE (error)?
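Residual: difference between the observation ($y$) and the fitted line ($\hat{y}$)
→ Part we cannot explain “Error = e”
$e = y - \hat{y}$
SSE is the sum of the squared residuals: $SSE = \sum (y_i - \hat{y}_i)^2$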
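A short numeric illustration of residuals and SSE, assuming a hypothetical sample and its fitted line ŷ = 2.2 + 0.6x.

```python
# Hypothetical sample and its fitted least squares line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

fitted = [b0 + b1 * xi for xi in x]                 # y-hat for each observation
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e = y - y-hat
sse = sum(e ** 2 for e in residuals)                # sum of squared residuals

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(sse, 2))                     # 2.4
```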
What is the SSR (Regression)?
Difference between the line ($\hat{y}$) and the mean ($\bar{y}$) → difference we can explain
SSR is the sum of these squared differences: $SSR = \sum (\hat{y}_i - \bar{y})^2$
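A sketch of the SSR calculation on hypothetical data with the fitted line ŷ = 2.2 + 0.6x. It also checks a standard identity for least squares lines with an intercept: explained plus unexplained variation equals the total variation of y about its mean. That total is named `sst` here, a label these cards do not use.

```python
from statistics import mean

# Hypothetical sample and its fitted least squares line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

y_bar = mean(y)
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by the line
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left unexplained
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation (assumed label)

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, total = {sst:.2f}")
```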