Simple Linear Regression Flashcards

1
Q

what is correlation?

A
  • looking at how two variables are related to each other
    -> we aren’t making predictions from one to the other
    -> relationship is symmetrical
2
Q

what is regression?

A
  • trying to predict one variable from another using a model
    -> predict the criterion variable from the predictor variable
    -> relationship is asymmetrical
    -> assuming one (the predictor) precedes the other (the outcome)

3
Q

what is the whole idea of a regression?

A

predict an outcome (dependent / criterion variable) from a predictor (independent) variable

4
Q

what’s an example of a regression question?

A

How can you predict university success from school results?
* e.g. predicting Honours Classification (university success) from Tariff score (school results)

5
Q

How do we make predictions using regression?

A

Y = b0 + b1(X)

6
Q

b0

A

intercept
-> where our line crosses the y-axis; it's a constant

7
Q

b1

A

‘gradient/slope’

8
Q

how does the ‘slope’ work?

A

the gradient of the line that has been fitted to the data
* for every unit X goes up
* Y goes up (or down) in line with the gradient

i.e. if the gradient is 0.5, for every unit of X that goes up, Y goes up 0.5 of a unit [assuming a perfect prediction]

9
Q

If b0 = 0, b1 = 0.5, and X = 2, what is Y?

A

Y = 0 + 0.5 (2)
Y is 1

10
Q

If b0 = 3.75 and b1 (slope) = .469, and an individual scores 7 on their maths test, what is Y?

A

Y = 3.75 + .469(7)
Y = 7.03

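As a quick check, here is a minimal Python sketch of the prediction equation Y = b0 + b1(X), reproducing the two worked examples above (the coefficients come from the cards; `predict` is just an illustrative helper, not from the source):

```python
def predict(x, b0, b1):
    """Predicted Y from the simple regression equation: Y = b0 + b1 * x."""
    return b0 + b1 * x

# Card 9: intercept 0, slope 0.5, X = 2
print(predict(2, b0=0.0, b1=0.5))                # 1.0

# Card 10: intercept 3.75, slope .469, a maths test score of 7
print(round(predict(7, b0=3.75, b1=0.469), 2))   # 7.03
```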
11
Q

what is the issue with the predicted Y, though?

A

the fit of our line is not perfect, so we're interested in being able to quantify the gap

12
Q

b0 = 11.35 and b1 = -0.722. What is the equation?

A

Y = 11.35 - 0.722(X)

13
Q

what is the regression outcome?

A

the statistics we look at to assess how good our predictor is at predicting our outcome variable

14
Q

What is the technique for making decisions about the data?

A

the aim is to ensure the line of best fit produces a small residual
* not always a good fit, but it's the best fit -> we can measure how good the fit is and estimate how good our regression is (how good is our equation at predicting the outcome, knowing the predictor)
* and whether it's significant

There are two outcomes

15
Q

What are these two outcomes?

A

R^2: how good the model / regression is at predicting [the null hypothesis is that r = 0]
F ratio: whether it is significant or not [the null hypothesis is that there is no predictive relationship / no variation explained]

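In practice both outcomes come straight out of a fitted model. A minimal sketch using scipy.stats.linregress, with toy data invented purely for illustration; rvalue squared gives R^2, and (with a single predictor) the reported p-value is equivalent to the one from the F test:

```python
from scipy.stats import linregress

# toy data, invented for illustration
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

fit = linregress(x, y)
print(f"R^2 = {fit.rvalue ** 2:.3f}")  # proportion of variance predicted by the regression
print(f"p   = {fit.pvalue:.4f}")       # test of the null hypothesis of no predictive relationship
```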
16
Q

what are the questions we are asking ourselves?

A
  • The general question we are asking is: how good is our model at predicting the actual data (Y, the dependent measure, the criterion variable)?
  • The technical question is: how much of the variance in the Y data set can we predict/account for using our model?
  • The outcome of the analysis is what proportion of the variation in the data set we can predict using our model
17
Q

what can we use to calculate this proportion?

A
  • model
  • data the model produces (the predicted Y score)
  • the actual data (observed/actual Y scores)
18
Q

There are different types of variation; what do we call the difference between observed and predicted scores?

A

the residual -> differences between the observed and predicted Y scores
* actual Y score minus the predicted Y score using the equation and X value
* squared to stop them cancelling each other out
* the gap between the actual and the predicted

19
Q

the gap between the actual score and the predicted score - what does this tell us?

A

The weaker the prediction, the greater the residual variance
* the bigger the gap between the actual scores and the scores that our model predicts

20
Q

if the gap is small?

A

you’ve got a good prediction

21
Q

if the gap is large?

A

you don’t have a good prediction

22
Q

what is the residual variation not predicted by?

A

the model/equation/regression

23
Q

what does the residual tell us?

A

the difference between the score predicted by the equation and the score we actually have

24
Q

How do we calculate SSresidual?

A

Y = score for each participant
Ŷ = score for each participant calculated by the equation (predicted Y)
Ŷ - Y = predicted score minus actual score, for each participant
(Ŷ - Y)^2 = that difference, squared (so the gaps don't cancel each other out)

The Equation: SSresidual = ∑(Ŷ - Y)^2

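The same column-by-column recipe in Python. The Y scores and the fitted line (Ŷ = 0.5 + 1.4X, giving the Ŷ values below) are toy numbers invented for illustration:

```python
# observed scores and the scores predicted by the equation (toy values)
Y     = [2.0, 3.0, 5.0, 6.0]   # actual Y score for each participant
Y_hat = [1.9, 3.3, 4.7, 6.1]   # predicted Y, from Y-hat = 0.5 + 1.4 * X with X = 1, 2, 3, 4

diffs = [yh - y for yh, y in zip(Y_hat, Y)]   # Y-hat minus Y for each participant
ss_residual = sum(d ** 2 for d in diffs)      # squared so the gaps don't cancel out
print(round(ss_residual, 4))                  # 0.2
```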
25
Q

what is the total variance?

A

the total variance of Y scores in the data set
* all the variation that there is to explain

26
Q

how do we calculate SStotal?

A

∑(Y - M)^2
* each data point minus the mean, for all Y data points, squared and summed

27
Q

So.. How do we figure out the variance?

A

Sum of Squares of the Residual (SSresidual): an estimate of the amount of variation that is not predicted by our regression in our sample (the gap between the actual and the predicted)
Total Sum of Squares (SStotal): an estimate of all the variation in the sample

28
Q

What do we need to find out?

A

an estimate of how much of the variation is actually predicted by our model

29
Q

How do we find an estimate of how much of the variation is actually predicted by our model?

A

take the SSresidual away from the SStotal -> the sum of squares of the model
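Continuing the SSresidual sketch above (same toy Y scores): subtracting SSresidual from SStotal leaves the sum of squares of the model:

```python
Y = [2.0, 3.0, 5.0, 6.0]   # toy Y scores from the SSresidual sketch above
ss_residual = 0.2          # computed in that sketch

mean_y = sum(Y) / len(Y)                      # M, the mean of the Y scores
ss_total = sum((y - mean_y) ** 2 for y in Y)  # all the variation there is to explain
ss_model = ss_total - ss_residual             # the variation the regression does explain
print(ss_total, ss_model)                     # 10.0 9.8
```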
30
Q

what is SSm (Sum of Squares of the Model) / SSreg (Sum of Squares of the Regression)?

A

an estimate of the amount of variance explained by the regression or the model

31
Q

How can we calculate SSreg directly?

A

take the mean of the actual Y scores away from each predicted Y score, then square and sum -> gives you the variance explained by the regression equation or model

32
Q

what is SStotal?

A

an estimate of all the variance in the data set

33
Q

what is SSm or SSreg?

A

an estimate of the variance accounted for by the model/regression (gives us an idea of the variation explained)

34
Q

What is SSreg/m affected by?

A

sample size and the amount of total variation in the sample
* you can't compare it across different studies and samples, as different sample sizes etc. produce different estimates -> yet comparing is very useful if we want to generalise results
* instead we need a standardised measure of the total proportion of the variation explained by the regression

35
Q

what is the standardised measure of the total proportion of the variation explained by the regression?

A

R^2

36
Q

R^2

A

proportion of the variance predicted by the regression equation
* SSreg divided by SStotal
* between 0 and 1 -> the larger the better
* can be expressed as a percentage, i.e. 80% of the variance is explained by the model/regression
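With the toy sums of squares from the sketches above, the standardisation is a single division (the 0.98 is an artefact of the invented numbers, not a typical value):

```python
ss_model, ss_total = 9.8, 10.0   # from the toy sketches above

r_squared = ss_model / ss_total  # proportion of the variance predicted by the regression
print(r_squared)                 # 0.98 -> 98% of the variance explained by the model
```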
37
Q

SStotal

A

an estimate of all the variance in the data set

38
Q

SSres

A

a measure of the amount of variance not explained by our regression

39
Q

SSreg or SSm

A

an estimate of the variance accounted for by the model / regression
* take SSres away from SStotal and that leaves us with the amount of variance explained by our equation

40
Q

R^2

A

standardise this by dividing SSreg by SStotal -> what proportion of the total variation is explained by the regression/model

41
Q

what is the F ratio?

A

the ratio between the variance that is predicted and the variance that is not predicted (error)
* a way to see whether a significant amount of the variance is explained
* if the F ratio is high -> the effect is strong; there is lots of variance explained relative to the variance that is not explained (we should get a significant result)
42
Q

How do we calculate the F ratio?

A

from the 'mean square errors'
* each mean square = SS divided by its degrees of freedom

43
Q

what is the F ratio?

A

the ratio between the mean squared errors -> MS = SS / df
* the degrees of freedom for the regression model is simply the number of predictors -> MSreg/m = SSreg/m divided by the number of predictors in the model/regression ('k')

44
Q

how many predictors are there in a simple linear regression?

A

1

45
Q

MSres is SSres divided by N minus the number of parameters in our model. What are these parameters?

A

* the intercept and the predictors
* there is always one intercept and 'k' predictors, so MSres = SSres / (N - k - 1)
46
Q

how many degrees of freedom is the F ratio reported with?

A

2 -> one for each of the mean squared errors (df of MSreg/m, df of MSres)

47
Q

What does it mean if the F value (found in the F table) is large and the p value is significant?

A

it's predicting a significant amount of the variance -> and a lot of variance too

48
Q

what does the p-value mean?

A

it tells us whether the result is significant -> allows us to make decisions about the null hypothesis

49
Q

If p < 0.05?

A

we can reject the null hypothesis

50
Q

in regression, the null hypothesis means

A

the variance explained by the model is 0

51
Q

in t-tests, the null hypothesis means

A

there is no difference between the two means (or the data come from the same population)

52
Q

F

A

the ratio of the mean square model (or 'regression') error to the mean square residual error

53
Q

Big F ->

A

little p values

54
Q

where is the R-squared?

A

in the model summary; sometimes we report adjusted R-squared next to it
55
Q

what are the assumptions of a simple regression?

A

* variable type: variables must be continuous (the predictor can be continuous or discrete)
* non-zero variance: predictors must not have zero variance
* independence: all values of the outcome should come from a different person or item
* linearity: the relationship we model is, in reality, linear (plotting X against Y is still important to see if there's a relationship)
* homoscedasticity: for each value of the predictors, the variance of the error term should be constant; AND independence of errors: plot ZRESID (y-axis) against ZPRED (x-axis)
* normally-distributed errors: the residuals must be normally distributed (they should form a normal distribution - if they don't, then we have some problems with the data)
  -> do a normal probability plot, or 'save' the residuals and then compute all the usual tests for normality
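A minimal sketch of two of these checks in Python, assuming scipy and matplotlib are available; the data are simulated for illustration, and shapiro is just one of the 'usual tests' you could run on the saved residuals:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress, shapiro, zscore

# simulated data, invented for illustration
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 50)

fit = linregress(x, y)
y_hat = fit.intercept + fit.slope * x
residuals = y - y_hat                      # the 'saved' residuals

# normally-distributed errors: Shapiro-Wilk test on the residuals
print(shapiro(residuals))

# homoscedasticity / independence of errors: ZRESID against ZPRED
plt.scatter(zscore(y_hat), zscore(residuals))
plt.xlabel("ZPRED (standardised predicted values)")
plt.ylabel("ZRESID (standardised residuals)")
plt.show()
```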
56
Q

How do we calculate F?

A

(1) MSreg/m = SSreg/m divided by 'k' (the number of predictors)
(2) MSres = SSres divided by N - k - 1
(3) F = MSreg/m divided by MSres
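The same three steps in Python, continuing the toy sums of squares from the earlier sketches (k = 1 predictor; N = 4 just matches the toy data and is far too small for real use):

```python
ss_model, ss_residual = 9.8, 0.2   # from the toy sketches above
k, N = 1, 4                        # one predictor, four data points

ms_model = ss_model / k                    # step 1: MSreg/m = SSreg/m / k
ms_residual = ss_residual / (N - k - 1)    # steps 2-3: MSres = SSres / (N - k - 1)
f_ratio = ms_model / ms_residual           # F = MSreg/m / MSres
print(round(f_ratio, 1))                   # 98.0, reported with (1, 2) degrees of freedom
```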
57
Q

Regression

A

a way of predicting an outcome

58
Q

SStotal

A

the total sum of squares of the differences between the data points and the mean of Y (all the variance there is to explain/account for)

59
Q

SSres

A

the total sum of squares of the differences between the data points and the line of best fit (the variation that is not explained by the model; an estimate of the variance that is not accounted for by the model/regression)

60
Q

SSmodel/regression

A

the difference between SStotal and SSres -> the variation explained by the model

61
Q

R^2

A

SSmodel/regression divided by SStotal -> the proportion of variance explained by the model