Simple Linear Regression Flashcards
what is correlation?
- looking at how two variables are related to each other
-> we aren't making predictions from one to the other
-> relationship is symmetrical
what is regression?
- trying to predict one variable from another using the model
-> predict the criterion (outcome) variable from the predictor variable
-> relationship is asymmetrical
-> assuming one (the predictor) precedes the other (outcome)
what is the whole idea of a regression?
predict an outcome (dependent / criterion) variable from a predictor (independent) variable
what's an example of a regression question?
How can you predict university success from school results?
* Tariff score and Honours Classification
How do we make a prediction with regression?
Y = b0 + b1X
b0
intercept
-> where our line crosses the y axis, i.e. the value of Y when X = 0; it's a constant
b1
'gradient/slope'
how does the 'slope' work?
the gradient of the line has been fitted to the data
* for every unit X goes up
* Y goes up (or down) in line with the gradient
e.g. if b1 = 0.5, then for every unit X goes up, Y goes up 0.5 of a unit [that's the prediction the line makes]
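The equation can be written as a tiny Python function to check how the intercept and slope behave. `predict` is a hypothetical helper name, and b0 = 0, b1 = 0.5 are the values used in the example card.

```python
def predict(b0: float, b1: float, x: float) -> float:
    """Return the predicted Y from the regression equation Y = b0 + b1X."""
    return b0 + b1 * x

b0, b1 = 0.0, 0.5  # intercept and slope from the example

# At X = 0 the prediction is just the intercept
print(predict(b0, b1, 0))

# Each one-unit increase in X changes the prediction by exactly b1
for x in range(4):
    change = predict(b0, b1, x + 1) - predict(b0, b1, x)
    print(f"X {x} -> {x + 1}: Y changes by {change}")
```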
X = 2 (with b0 = 0 and b1 = 0.5). What is Y?
Y = 0 + 0.5 (2)
Y is 1
If b0 = 3.75, b1 (slope) = .469 and an individual scores 7 on their maths test, what is Y?
Y = 3.75 + .469(7)
Y = 7.03
what is the issue with Y though?
the fit of our line is not perfect, so the predicted Y scores won't exactly match the actual Y scores; we're interested in being able to quantify that gap
b0 = 11.35 and b1 = -0.722. What is the equation?
Y = 11.35 - 0.722X
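The worked examples on these cards can be checked in a few lines. This is a minimal sketch; `predict` is a hypothetical helper, and the coefficients are the ones given on the cards.

```python
def predict(b0: float, b1: float, x: float) -> float:
    """Predicted Y = b0 + b1 * X."""
    return b0 + b1 * x

# b0 = 0, b1 = 0.5, X = 2
y_simple = predict(0, 0.5, 2)
# b0 = 3.75, b1 = .469, maths score X = 7
y_maths = predict(3.75, 0.469, 7)
# b0 = 11.35, b1 = -0.722: a negative slope, so predicted Y falls as X rises
y_negative = [predict(11.35, -0.722, x) for x in range(4)]

print(y_simple)           # 1.0
print(round(y_maths, 2))  # 7.03
print([round(v, 3) for v in y_negative])
```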
what is the regression outcome?
the statistics we look at to judge how good our predictor is at predicting our outcome variable
What is the technique for making decisions about the data?
the aim is to ensure the line of best fit produces the smallest residuals (the least-squares line minimises the sum of the squared residuals)
* it's not always a good fit, but it's the best fit -> we can measure how good a fit it is and estimate how good our regression is (how good our equation is at predicting the outcome from the predictor)
* and whether it's significant
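A minimal sketch of how the line of best fit is found, assuming ordinary least squares with one predictor: b1 is the sum of cross-products of the deviations divided by the sum of squared X deviations, and b0 is the mean of Y minus b1 times the mean of X. The data and the helper names `fit_least_squares` and `ss_residual` are hypothetical. Moving either coefficient away from the fitted values makes the sum of squared residuals bigger, which is what "best fit" means here.

```python
def fit_least_squares(x, y):
    """Return (b0, b1) for the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared X deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def ss_residual(x, y, b0, b1):
    """Sum of squared gaps between actual Y and the Y predicted by b0 + b1*X."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]  # hypothetical predictor scores
y = [2, 2, 5, 4, 7]  # hypothetical outcome scores

b0, b1 = fit_least_squares(x, y)
best = ss_residual(x, y, b0, b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSResidual = {best:.2f}")

# Any other line leaves larger residuals overall
for nudge_b0, nudge_b1 in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    worse = ss_residual(x, y, b0 + nudge_b0, b1 + nudge_b1)
    print(f"nudged line: SSResidual = {worse:.2f}")
```

For these numbers the first line prints b0 = 0.40, b1 = 1.20, SSResidual = 3.60, and every nudged line gives a larger total.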
There are two outcomes
What are these two outcomes?
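R^2: how good the model / regression is at predicting [the proportion of the variance in Y accounted for by the model]
F ratio: is it significant or not [tests the null hypothesis that there is no predictive relationship, i.e. the model accounts for no variance; in simple regression this is equivalent to testing r = 0]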
what are the questions we are asking ourselves?
- The general question we are asking is: how good is our model at predicting the actual data (Y, the dependent measure, the criterion variable)?
- The technical question is: how much of the variance in the Y data set can we predict/account for using our model?
- The outcome of the analysis is the proportion of the variation in the data set we can predict using our model
what can we use to calculate this proportion?
- model
- data the model produces (the predicted Y score)
- the actual data (observed/actual Y scores)
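Putting the three ingredients together: the model (b0, b1), the predicted Y scores and the actual Y scores are enough to compute R^2 and the F ratio. This sketch assumes simple regression with one predictor, so the model has 1 degree of freedom and the residual has n - 2. The data are hypothetical; b0 = 0.4 and b1 = 1.2 are the least-squares values for these scores. Turning F into a p-value needs the F distribution (e.g. from a statistics package), which is not included here.

```python
# Hypothetical data; b0 and b1 are the least-squares values for these scores
x = [1, 2, 3, 4, 5]
actual = [2, 2, 5, 4, 7]                 # observed Y scores
b0, b1 = 0.4, 1.2                        # the model
predicted = [b0 + b1 * xi for xi in x]   # Y-hat scores the model produces

n = len(actual)
mean_y = sum(actual) / n

# All variation in Y, the part the model misses, and the part it predicts
ss_total = sum((y - mean_y) ** 2 for y in actual)
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
ss_model = ss_total - ss_residual

r_squared = ss_model / ss_total           # proportion of variance accounted for
df_model, df_residual = 1, n - 2          # one predictor
f_ratio = (ss_model / df_model) / (ss_residual / df_residual)

print(f"R^2 = {r_squared:.2f}, F(1, {df_residual}) = {f_ratio:.2f}")
```

For these numbers it prints R^2 = 0.80, F(1, 3) = 12.00: the model accounts for 80% of the variation in Y.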
There are different types of variation. What is the variation the model doesn't predict called?
the residual -> differences between the observed and predicted Y scores
* actual Y score minus the predicted Y score using the equation and X value
* squared (before summing) to stop positive and negative residuals cancelling each other out
* the gap between the actual and the predicted
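A small sketch of why the residuals are squared. The scores are hypothetical, and b0 = 0.4, b1 = 1.2 are the least-squares values for them. Summed as they are, the positive and negative residuals cancel to (essentially) zero; squared, every gap counts.

```python
x = [1, 2, 3, 4, 5]        # hypothetical predictor scores
actual = [2, 2, 5, 4, 7]   # hypothetical observed Y scores
b0, b1 = 0.4, 1.2          # least-squares line for these scores

# residual = actual Y minus predicted Y
residuals = [y - (b0 + b1 * xi) for xi, y in zip(x, actual)]

raw_total = sum(residuals)                       # positives and negatives cancel out
squared_total = sum(r ** 2 for r in residuals)   # squaring keeps every gap positive

print("sum of residuals:", raw_total)            # essentially 0 (tiny floating-point error aside)
print(f"sum of squared residuals: {squared_total:.2f}")
```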
the gap between the actual score and the predicted score: what does this tell us?
The weaker the prediction, the greater the residual variance
* the bigger the gap between the actual scores and the scores that our model predicts
if the gap is small?
you've got a good prediction
if the gap is large?
you don't have a good prediction
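To make "small gap" versus "large gap" concrete, this sketch compares the same hypothetical predictions against two made-up sets of actual scores. `ss_residual` is a hypothetical helper that sums the squared gaps.

```python
def ss_residual(actual, predicted):
    """Sum of squared gaps between actual and predicted Y scores."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

predicted = [2.0, 3.0, 4.0, 5.0, 6.0]          # hypothetical model predictions
close_actual = [2.1, 2.9, 4.2, 4.8, 6.1]       # small gaps: good prediction
scattered_actual = [4.0, 1.0, 6.5, 2.5, 8.0]   # large gaps: poor prediction

print(f"good prediction: {ss_residual(close_actual, predicted):.2f}")      # 0.11
print(f"poor prediction: {ss_residual(scattered_actual, predicted):.2f}")  # 24.50
```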
what is residual variation not predicted by?
the model/equation/regression
what does the residual tell us?
the difference between the score predicted by the equation and the score we actually have
How do we calculate SSResidual (the residual sum of squares)?
Y = score for each participant
Ŷ = score for each participant calculated by the equation (predicted Y)
Y - Ŷ = score for each participant minus the score for each participant calculated by the equation (the residual)
(Y - Ŷ)^2 = each participant's residual, squared
The equation: SSResidual = Σ(Y - Ŷ)^2 (add up the squared residuals across all participants)
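The steps on this card can be laid out as a small table, one row per participant. The scores are hypothetical; the predicted scores come from the line Y = 0.4 + 1.2X with X = 1 to 5. The running total is the Σ in the equation.

```python
# Hypothetical scores for five participants
actual = [2, 2, 5, 4, 7]                # Y
predicted = [1.6, 2.8, 4.0, 5.2, 6.4]   # Y-hat, from Y = 0.4 + 1.2X with X = 1..5

print(f"{'Y':>5} {'Y-hat':>6} {'Y - Y-hat':>10} {'(Y - Y-hat)^2':>14}")
ss_residual = 0.0
for y, y_hat in zip(actual, predicted):
    gap = y - y_hat          # residual for this participant
    squared = gap ** 2       # squared residual
    ss_residual += squared   # running total: the sum in the equation
    print(f"{y:>5} {y_hat:>6.1f} {gap:>10.1f} {squared:>14.2f}")

print(f"SSResidual = {ss_residual:.2f}")  # 3.60
```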