Lecture #9 (Regression) Flashcards
What is linear regression?
Linear regression finds the straight line that best fits the points in a scatter plot, so that y can be predicted from x
What formula underpins linear regression?
y = a + bx
What does each variable in the linear regression represent?
y = dependent variable, x = independent variable, b = slope of the line, a = intercept (the value of y where the line crosses the y axis)
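A minimal Python sketch of evaluating the regression equation, assuming a and b are already known. `predict` is a hypothetical helper name used only for illustration.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return the dependent variable y on the line y = a + b*x.

    a: intercept (value of y where the line crosses the y axis, i.e. at x = 0)
    b: slope (change in y for each one-unit increase in x)
    x: independent variable
    """
    return a + b * x
```

For example, `predict(1, 2, 3)` returns 7, the y value on the line y = 1 + 2x at x = 3.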
How do we determine the centroid?
The centroid is the point (x̄, ȳ), where x̄ is the mean of the x values and ȳ is the mean of the y values
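A small Python sketch of computing the centroid from paired data. The function name `centroid` and the sample points are illustrative assumptions, not from the lecture.

```python
def centroid(xs, ys):
    """Return (x_bar, y_bar): the mean of the x values and the mean of the y values."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    return sum(xs) / n, sum(ys) / n
```

With the illustrative points (1, 2), (2, 4), (3, 5), (4, 4), (5, 5), `centroid([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])` returns `(3.0, 4.0)`.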
What is the centroid?
The centroid is the central point of the data, and the least squares regression line always passes through it
How is the slope determined?
The slope is chosen so that the sum of the squared vertical distances between each point and the line is as small as possible
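To make the "minimize the sum of squared distances" idea concrete, here is a brute-force Python sketch on illustrative data (not from the lecture). It tries a coarse grid of slopes and keeps the one with the smallest total; this grid search is only for illustration, since the slope formula on the later card finds the minimum exactly. Each candidate line is forced through the centroid, which is safe because the least squares line always passes through it.

```python
def sum_squared_distances(xs, ys, a, b):
    """Total of the squared vertical distances from each point to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Try slopes 0.0, 0.1, ..., 1.2; for each, set the intercept so the
# line passes through the centroid (x_bar, y_bar).
candidate_slopes = [i / 10 for i in range(13)]
best_b = min(
    candidate_slopes,
    key=lambda b: sum_squared_distances(xs, ys, y_bar - b * x_bar, b),
)
```

For this data the smallest total occurs at a slope of 0.6, which agrees with the slope formula.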
What is the line of best fit called?
Least squares regression line (LSRL)
What are the steps to determine the Least Squares Regression Line (LSRL)?
1) For each (x, y) point, calculate x² and xy
2) Sum all x, y, x² and xy values, which gives Σx, Σy, Σx² and Σxy
3) Calculate the slope b
4) Calculate the intercept a
5) Assemble the equation of the line: y = a + bx
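The five steps can be sketched in Python as follows, assuming the data arrive as two equal-length lists. The function name `lsrl` is a hypothetical label; the arithmetic follows the slope and intercept formulas on the cards.

```python
def lsrl(xs, ys):
    """Least squares regression line following the five steps; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Step 1: for each point, compute x² and xy
    x_sq = [x * x for x in xs]
    xy = [x * y for x, y in zip(xs, ys)]

    # Step 2: the four sums Σx, Σy, Σx², Σxy
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2, sum_xy = sum(x_sq), sum(xy)

    # Step 3: slope b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom

    # Step 4: intercept a = (Σy − bΣx) / n
    a = (sum_y - b * sum_x) / n

    # Step 5: the line is y = a + b*x
    return a, b
```

For the illustrative points (1, 2), (2, 4), (3, 5), (4, 4), (5, 5), the formulas give b = 0.6 and a = 2.2, so the line is y = 2.2 + 0.6x.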
How do you calculate the slope b?
b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), where (Σx)² is the square of the sum of x, not the sum of the squares Σx²
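A worked example of the slope formula, using illustrative points that are not from the lecture: (1, 2), (2, 4), (3, 5), (4, 4), (5, 5).

```latex
n = 5,\quad \sum x = 15,\quad \sum y = 20,\quad \sum x^2 = 55,\quad \sum xy = 66

b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
  = \frac{5(66) - (15)(20)}{5(55) - 15^2}
  = \frac{330 - 300}{275 - 225}
  = \frac{30}{50} = 0.6
```

Note the denominator uses (Σx)² = 225; mistakenly using Σx² = 55 there would give 275 − 55 = 220 and the wrong slope.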
How do you calculate the intercept a?
a = (Σy − bΣx) / n, which is equivalent to a = ȳ − bx̄
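A worked example of the intercept formula, continuing with the illustrative points (1, 2), (2, 4), (3, 5), (4, 4), (5, 5), for which n = 5, Σx = 15, Σy = 20 and the slope is b = 0.6. The last line checks that the line passes through the centroid (x̄, ȳ).

```latex
a = \frac{\sum y - b\sum x}{n} = \frac{20 - 0.6(15)}{5} = \frac{20 - 9}{5} = 2.2

\bar{x} = \frac{15}{5} = 3,\quad \bar{y} = \frac{20}{5} = 4,\quad \bar{y} - b\bar{x} = 4 - 0.6(3) = 2.2

a + b\bar{x} = 2.2 + 0.6(3) = 4 = \bar{y}
```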
What are residuals?
Residuals are the differences between the observed values of y and the values predicted by the model (the error): residual = observed y − predicted y
What does a larger sum of squared residuals mean?
The model fits the data less well
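A Python sketch of residuals and the sum of squared residuals, on illustrative data that is not from the lecture. The helper names are hypothetical. It compares the least squares line y = 2.2 + 0.6x with a flat line at the mean of y; the squares matter because, for the least squares line, the plain residuals cancel out and sum to (about) zero.

```python
def residuals(xs, ys, a, b):
    """Observed y minus the y predicted by the line y = a + b*x, for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared residuals: a larger value means a worse fit."""
    return sum(r ** 2 for r in residuals(xs, ys, a, b))


# Illustrative data (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

res = residuals(xs, ys, a=2.2, b=0.6)               # least squares line
fit_lsrl = sum_squared_residuals(xs, ys, 2.2, 0.6)   # least squares line
fit_flat = sum_squared_residuals(xs, ys, 4.0, 0.0)   # flat line y = 4 (mean of y)
```

Mathematically the residuals are −0.8, 0.6, 1.0, −0.6 and −0.2, so the least squares line has a sum of squared residuals of 2.4, against 6.0 for the flat line.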
How should residuals be distributed?
Normally, centred around zero
What is homoscedasticity?
Having the same scatter: the points are approximately the same distance from the line all along it (the spread of the residuals is constant)
What is heteroscedasticity?
Having a different scatter: the points' distances from the regression line vary widely, for example fanning out as x increases
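One rough way to see the difference between the two cards is to compare the spread of the residuals for the smaller-x half of the data with the larger-x half. This is a simple heuristic sketch, not a formal statistical test; the function name `spread_ratio` and the made-up residuals in the example are illustrative assumptions.

```python
from statistics import pstdev


def spread_ratio(xs, residuals):
    """Ratio of residual spread (population standard deviation) for the
    larger-x half of the points to the smaller-x half.

    A ratio near 1 is consistent with homoscedasticity; a ratio far from 1
    suggests heteroscedasticity. With an odd number of points the middle
    point is left out of both halves.
    """
    if len(xs) != len(residuals) or len(xs) < 4:
        raise ValueError("need at least four points of equal length")
    pairs = sorted(zip(xs, residuals))  # order points by x
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    low_spread = pstdev(low)
    if low_spread == 0:
        raise ValueError("no spread in the smaller-x half")
    return pstdev(high) / low_spread
```

With made-up residuals of constant size, such as `[1, -1, 1, -1, 1, -1, 1, -1]` at x = 1 to 8, the ratio is 1; with residuals that fan out, such as `[0.5, -0.5, 1, -1, 2, -2, 4, -4]`, it is about 4.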