correlation and regression (w7) Flashcards
when might data contain a mistake
a perfect straight line; one or more data points a long way away from the others; no relationship at all between things you expect to be related
what does correlation find
the line of best fit, found by minimising the (squared) differences between the data and the line
equation for correlation/line of best fit
r = Sxy / (Sx × Sy), where Sxy is the covariance (how much x and y change together) and Sx, Sy are the standard deviations (how much x and y each change separately)
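A minimal sketch of the formula above, assuming sample (n - 1) denominators. The function name and the hours/score data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r = Sxy / (Sx * Sy), using sample (n - 1) denominators."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: covariance, how much x and y change together
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sx, Sy: standard deviations, how much each variable changes on its own
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

hours = [1, 2, 3, 4, 5]        # made-up data
score = [52, 55, 61, 64, 70]   # made-up data
r = pearson_r(hours, score)
print(f"r = {r:.3f}")
```

Swapping the two arguments gives the same r, which is why x and y are interchangeable in correlation.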
what does the r value tell you
the direction of and how strong the correlation is
when is correlation positive
r is above 0
0 < r ≤ 1
when is correlation negative
r is below 0
-1 ≤ r < 0
when is correlation strong
if r is close to 1 or -1 (the size of r, ignoring its sign, is close to 1)
when is correlation weak
if r is close to 0
what does the r2 (r-squared) value tell you
how much of the variance is explained by your correlation
what does r2 close to 1 mean
correlation explains a lot of variance
what does r2 close to 0 mean
correlation explains only a little variance
what is r2 also called
coefficient of determination
what is 1-r2
amount of variance not explained
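As a quick arithmetic check of the r2 cards above, here is a sketch using r = 0.881 purely as an example value.

```python
r = 0.881                     # example value, not computed from data here
r_squared = r ** 2            # proportion of variance explained
unexplained = 1 - r_squared   # proportion of variance not explained
print(f"r2 = {r_squared:.3f}, 1 - r2 = {unexplained:.3f}")
# prints: r2 = 0.776, 1 - r2 = 0.224
```

Squaring removes the sign, so r = -0.881 would give the same r2.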
regression: when x increases by 1, what happens to y
y changes by the slope (it increases if the slope is positive and decreases if it is negative)
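A small sketch of what the slope means, assuming an ordinary least-squares line fitted to made-up data. `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares line: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return slope, intercept

xs = [1, 2, 3, 4, 5]        # made-up predictor
ys = [52, 55, 61, 64, 70]   # made-up outcome
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Predicted y for a given x."""
    return intercept + slope * x

# Moving x up by 1 changes the predicted y by exactly the slope
step = predict(4) - predict(3)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, step = {step:.2f}")
```

Unlike correlation, the roles matter here: fitting x from y would give a different line.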
correlation and regression: how many relationships does each describe
C - describes single relationship
R - can describe multiple relationships
correlation and regression: what do they describe about the relationship
C - direction and strength of relationship
R - directionS and strengthS of relationshipS
correlation and regression: are the x and y variables interchangeable
C - yes
R - no
correlation and regression: do they allow prediction
C - no
R - yes
correlation and regression: what are the coefficients
C - r, r2
R - R, R2, F, t, SE, β1 … βn
how do you report correlations
r([df]) = [Pearson's r], p = [p-value]
r(64) = .881, p < .001
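A rough sketch of building that report string. It assumes df = n - 2 for Pearson's r, so a sample size of 66 is an assumption chosen to give df = 64. `format_r_report` is a hypothetical helper, and the p-value is passed in rather than computed.

```python
def format_r_report(r, n, p):
    """Format a correlation as 'r(df) = .xxx, p = .xxx' with no leading zeros."""
    df = n - 2                                    # degrees of freedom for Pearson's r
    r_text = f"{r:.3f}".replace("0.", ".", 1)     # drop the leading zero
    if p < 0.001:
        p_text = "p < .001"                       # very small p-values are reported as a bound
    else:
        p_text = "p = " + f"{p:.3f}".replace("0.", ".", 1)
    return f"r({df}) = {r_text}, {p_text}"

print(format_r_report(0.881, 66, 0.0001))
# prints: r(64) = .881, p < .001
```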
how do you report regressions (overall model fit)
F([df1], [df2]) = [F-value], p = [p-value]
F(1, 64) = 223, p < .001
how to report regressions (individual variables)
estimate = 121, SE = 8.10, t(64) = 14.9, p < .001
what is t when reporting regression
t = estimate/SE (this is a t test)
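A quick check of the numbers on the reporting cards, as a sketch. With a single predictor, the overall F equals t squared, which appears consistent with the F(1, 64) = 223 reported for overall model fit.

```python
estimate = 121   # slope estimate from the reporting card
se = 8.10        # its standard error

t = estimate / se    # t statistic for the individual predictor
f_from_t = t ** 2    # equals the overall F only when there is one predictor

print(f"t = {t:.1f}, t squared = {f_from_t:.0f}")
# prints: t = 14.9, t squared = 223
```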
in multiple regression, what are the outcome and predictor variables, and how many of each are there
single outcome variable - y
multiple predictor variables - x1 , x2 , …
in multiple regression, what do you find instead of line of best fit
the best-fitting surface (e.g. with two predictors, a 3D graph in which a 2D surface plays the role of the line of best fit)
in multiple regression, what are residuals
the distance of each data point from the surface (observed y minus predicted y)
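A minimal sketch of a best-fitting surface with two predictors, assuming ordinary least squares solved through the normal equations in plain Python. The helper names and the data are made up for illustration, and a statistics package would normally do this step.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # partial pivoting: bring up the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(n):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [v - factor * p for v, p in zip(m[k], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_plane(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns [b0, b1, b2]."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]   # the leading 1 gives the intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

x1 = [1, 2, 3, 4, 5, 6]                     # made-up predictor 1
x2 = [2, 1, 4, 3, 6, 5]                     # made-up predictor 2
y = [9.4, 7.7, 19.2, 18.1, 28.6, 28.3]      # made-up outcome

b0, b1, b2 = fit_plane(x1, x2, y)
predicted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
# residual: observed y minus the height of the fitted surface at that point
residuals = [obs - pred for obs, pred in zip(y, predicted)]
print([round(e, 3) for e in residuals])
```

Because the model includes an intercept, the residuals sum to (approximately) zero.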
in multiple regression, what can predictors be
can be almost anything:
continuous, ordinal or discrete
normally distributed or not
linear or non-linear
what are two problems with correlation and regression, and what might you have to do
extrapolation (predictions outside the range of the observed data may be unreliable)
non-linear relationships (may have to transform the data, e.g. quadratic, cubic or logarithmic)
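A sketch of why transforming can help, assuming made-up data that grows exactly exponentially. Taking the log of y turns the curve into a straight line, so the correlation on the transformed data is stronger.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = list(range(1, 9))
y = [2 * math.exp(0.5 * xi) for xi in x]   # made-up, exactly exponential data

r_raw = pearson_r(x, y)                              # straight line through a curve
r_log = pearson_r(x, [math.log(yi) for yi in y])     # after a logarithmic transform

print(f"raw r = {r_raw:.3f}, log-transformed r = {r_log:.3f}")
```

Real data would not become perfectly linear, and which transform to try (quadratic, cubic, logarithmic) depends on the shape of the relationship.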