Correlation and Regression Flashcards
what is correlation a form of?
bivariate analysis
- relationship between 2 variables
focus on direction and degree
what is a linear relationship?
for every one-unit increase in x, y changes by a constant amount, so the points fall along a straight line
what are some examples of non linear relationship?
practice and performance, eg when learning a musical instrument, you are likely to learn a lot in the first year and your progress is likely to slow over time
T or F? Even when there is a non linear relationship, it makes sense to use correlation measures. Why?
False
You might get a correlation close to zero when in fact there is a strong U shaped relationship in the data
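A quick numeric check of the U shape case, as a minimal sketch. The data are made up (y = x squared on a symmetric range), and `pearson_r` is a hypothetical helper built from the SP and SS formulas that appear later in these cards.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson r as SP / sqrt(SSx * SSy)."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / (ss_x * ss_y) ** 0.5

# Made-up data: y depends entirely on x, but the relationship is U shaped.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0 for this symmetric data
```

Here r is zero even though y is completely determined by x, which is why a scatterplot should be checked before trusting a correlation.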
what are the rules of thumb on how big or small a correlation is?
small (.1 to .3)
medium (.3 to .5)
large (.5 to .7)
what is r squared? what is it used for?
the correlation coefficient, squared
squaring the correlation coefficient gives an estimate of the proportion of variance accounted for by your model, ie how much of the variance in the outcome your predictor accounts for
if your predictor accounts for 50% of the variance, what does this mean?
that 50% of the variation across subjects can be accounted for by the predictor you have
what is variability?
how much a given variable varies from observation to observation - eg how much height in the class varies
what is covariability?
how much two variables vary together eg, if we take the class height and weight, as height increases (or decreases) how does that impact weight? positively, negatively or no relationship? do two variables vary together or independently of each other?
what is the sum of squares used for? how is it calculated?
it calculates a rough estimate of variability…
SS = Σ(X - X(mean))^2
you take each individual’s height, and subtract the mean height from that and square it…. then sum it up for all observed numbers
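The steps above can be sketched in a few lines. The heights are made-up numbers and `sum_of_squares` is a hypothetical helper name.

```python
from statistics import mean

def sum_of_squares(values):
    """SS: squared deviations from the mean, summed over all observations."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

heights = [150, 160, 170, 180, 190]  # made-up class heights in cm
ss = sum_of_squares(heights)
print(ss)  # 1000
```

The mean is 170, the deviations are -20, -10, 0, 10 and 20, and their squares add up to 1000.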
to measure the variability you use ____
to measure co variability you use ____
sum of squares; sum of products
how is the sum of products calculated?
SP = Σ (X - X(mean)) × (Y - Y(mean))
when will the sum of squares be identical to the sum of products?
when both variables are identical
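A small check of this, as a sketch with made-up data: feeding the same variable in twice turns the sum of products into the sum of squares. `sum_of_products` and `sum_of_squares` are hypothetical helper names.

```python
from statistics import mean

def sum_of_squares(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def sum_of_products(xs, ys):
    """SP: product of paired deviations from each mean, summed."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sp_xy = sum_of_products(xs, ys)  # covariability of X and Y
sp_xx = sum_of_products(xs, xs)  # X paired with itself
ss_x = sum_of_squares(xs)

print(sp_xy, sp_xx, ss_x)  # 6 10 10
```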
how do we calculate the pearson correlation coefficient?
r = SP / √(SSx × SSy)
what is the worded formula for calculating the pearson correlation coefficient?
r = covariability of X and Y/Variability of X and Y separately
calculating a ratio
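The ratio can be computed directly, as a minimal sketch with made-up data. The function name `pearson_r` is an assumption for illustration.

```python
from statistics import mean

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # covariability
    ss_x = sum((x - mx) ** 2 for x in xs)                  # variability of X
    ss_y = sum((y - my) ** 2 for y in ys)                  # variability of Y
    return sp / (ss_x * ss_y) ** 0.5

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 3))  # 0.775, from SP = 6, SSx = 10, SSy = 6
```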
what happens if we have relatively low co-variability of X and Y compared to variability of X and Y separately?
we have a weak correlation
what can drastically influence your correlation value?
extreme scores or outliers
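To see how much a single extreme score can matter, here is a sketch with made-up data: five points with no linear association, then the same five points plus one outlier. `pearson_r` is a hypothetical helper.

```python
from statistics import mean

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / (ss_x * ss_y) ** 0.5

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 1, 3, 2]

r_without = pearson_r(xs, ys)               # essentially zero
r_with = pearson_r(xs + [20], ys + [20])    # one extreme point added

print(round(r_without, 3), round(r_with, 3))
```

The single added point (20, 20) moves r from roughly zero to above .9, so the correlation mostly reflects that one observation.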
what is regression towards the mean?
where an extreme score on one measure tends to be followed by a less extreme score on the other measure… as extreme scores are often due to chance, it’s extremely unlikely that the other value will also be extreme, eg if there is a really really rainy day, it is likely that the following day will not be as rainy
what is an example of the regression towards the mean?
1 or 2 people might guess 10 coin flips correctly, and 1 or 2 people might correctly guess a number between 1 and 50, but it is highly unlikely to be the same people. An extreme score on the coin flips is most likely to be followed by a score closer to the mean on the next variable
what is the null hypothesis for correlation?
that the correlation in the population is zero
what is asked when determining if the null hypothesis is to be rejected or retained?
once r value has been calculated, we ask what is the probability of finding an r value this big if the real association in the population is zero? If this probability is small, we reject the null hypothesis
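In practice this probability is usually taken from a t distribution with N - 2 degrees of freedom. As a swapped-in illustration of the same logic (not the method these cards describe), the sketch below uses an exact permutation test on made-up data: shuffle Y every possible way, which breaks any real association, and count how often |r| is at least as big as the observed value. The helper names are assumptions.

```python
from itertools import permutations
from statistics import mean

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / (ss_x * ss_y) ** 0.5

def exact_permutation_p(xs, ys):
    """Two-sided p: share of all re-pairings with |r| >= observed |r|."""
    r_obs = abs(pearson_r(xs, ys))
    count = total = 0
    for shuffled in permutations(ys):
        total += 1
        if abs(pearson_r(xs, shuffled)) >= r_obs - 1e-12:
            count += 1
    return count / total

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 2, 3, 4, 6, 5]

r = pearson_r(xs, ys)
p_value = exact_permutation_p(xs, ys)
print(round(r, 3), round(p_value, 4))
```

Only 12 of the 720 possible pairings give an r this extreme, so the probability is small and the null hypothesis would be rejected at the usual .05 level. Enumerating every permutation is only feasible for very small N.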
what is the degrees of freedom?
the number of participants (N) minus 2
how is spearman’s correlation used? when is it used?
convert the data to ranks before calculating correlations…
used when asking the question are values that are high on one variable also high on the other variable?
why would you use spearman’s correlation?
when the relationship is consistently increasing or decreasing but not linear, or when there are outliers… ranking limits the influence of outliers, as values such as 15, 20, 23232 become ranks of 1, 2, 3.
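A minimal sketch of the ranking idea: convert each variable to ranks, then run the ordinary Pearson formula on the ranks. The `ranks` helper is hypothetical and assumes there are no tied values (ties would need averaged ranks). The data are made up.

```python
from statistics import mean

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / (ss_x * ss_y) ** 0.5

def ranks(values):
    # Assumes no ties: the rank is the position in the sorted list.
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    return pearson_r(ranks(xs), ranks(ys))

print(ranks([15, 20, 23232]))  # [1, 2, 3]

# A curved but always-increasing relationship.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

r_raw = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(round(r_raw, 3), rho)
```

Pearson on the raw scores falls short of 1 because the relationship is curved, while Spearman is 1 because high values on one variable always go with high values on the other.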
what is reliability?
the consistency of a measure…. does a measure or test return the same results each time?
what does cronbach’s alpha measure?
reliability
what does cronbach’s alpha require?
at least 3 items or scales?
how is cronbach’s alpha calculated?
by averaging covariance of item pairs divided by the total variance
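One way to make that description concrete, as a sketch under stated assumptions: take alpha as (number of items squared × average covariance between item pairs) / variance of the total score, which is algebraically the same as the more commonly printed form k/(k - 1) × (1 - sum of item variances / total variance). The data and function names are made up for illustration.

```python
from statistics import mean, variance

def covariance(xs, ys):
    """Sample covariance (divides by n - 1, matching statistics.variance)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def alpha_from_covariances(items):
    k = len(items)
    pair_covs = [covariance(items[i], items[j])
                 for i in range(k) for j in range(k) if i != j]
    totals = [sum(scores) for scores in zip(*items)]  # each person's total score
    return k ** 2 * mean(pair_covs) / variance(totals)

def alpha_from_variances(items):
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Made-up scores: 3 items answered by 5 people.
items = [
    [2, 3, 4, 4, 5],
    [3, 3, 4, 5, 5],
    [2, 4, 3, 5, 5],
]

alpha = alpha_from_covariances(items)
print(round(alpha, 3))
```

Both functions return the same value for the same data; the second is just a rearrangement of the first.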
T or F? the value of cronbach’s alpha directly represents the proportion of reliable variance? Eg, value of .7 means 70% reliable variance?
True
what is the rough rule of thumb for cronbach’s alpha?
- excellent: 0.9 or greater
- good: 0.8 to 0.9
- acceptable: 0.7 to 0.8
- questionable: 0.6 to 0.7
- poor: 0.5 to 0.6
- unacceptable: less than 0.5
what is regression about?
predicting one variable from another
what is the general formula for a perfect linear relationship? give example
Y = a + bX + e
eg Y = a + b × (IQ)
what do all components of the general formula for a perfect linear relationship represent?
- Y = outcome variable (what is being predicted)
- a = y intercept (value of Y when X = 0)
- b = slope (how much Y changes when X increases by 1)
- X = predictor variable
- e = error or residual term
what are some assumptions about errors?
- errors are independent of one another
- normally distributed (if we were to plot all of our errors, it would roughly follow a normal distribution)
- homoscedastic (equal error variance for levels of predicted Y)
how can we estimate the regression model?
least squares parameter estimates (the values of a and b that make the total squared error as small as possible)
how do you calculate error or residual?
subtract the predicted value from the observed value
Y - Y(predicted)
how do you get the total squared error?
what is the formula?
calculate the error for every value then square them and sum them up
Σ (Y-Y(predicted))^2
what is the symbol for the slope? how do you calculate it?
b
b = SP / SSx
what is the symbol for the intercept? how do you calculate it?
a
a = Y(mean) - b × X(mean)
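The slope and intercept formulas can be sketched together as follows. The data are made up and `fit_line` is a hypothetical helper name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares estimates: b = SP / SSx, a = Y(mean) - b * X(mean)."""
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

Here SP = 6 and SSx = 10, so b = 0.6, and with means of 3 and 4 the intercept is 4 - 0.6 × 3 = 2.2.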
what are other names for the intercept?
which one is the one they use in SPSS?
- constant (this is the one used in SPSS)
- y-intercept
- predicted value of Y when x = 0
what are other names for the slope?
which one is the one they use in SPSS?
b
rise over run
effect on Y for a unit increase (1) on predictor
how do you measure the variance accounted for?
correlation score, squared
R^2
what is r squared showing us?
the proportion of variance accounted for
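A sketch showing the two routes to the same number on made-up data: squaring r, and working out 1 minus (total squared error / SS of Y) from the fitted line. All helper names are assumptions.

```python
from statistics import mean

def regression_summary(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    b = sp / ss_x
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r = sp / (ss_x * ss_y) ** 0.5
    return r, sse, ss_y

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, sse, ss_y = regression_summary(xs, ys)
r_squared = r ** 2
explained = 1 - sse / ss_y  # share of Y's variability not left in the residuals

print(round(r_squared, 2), round(explained, 2))  # 0.6 0.6
```

For this data the predictor accounts for 60% of the variation in Y, and the remaining 40% sits in the residuals.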
when do you use the ANOVA test?
to determine whether the variance explained is significantly different from zero
what does b (slope) indicate?
- whether there is a relationship between X and y
- whether that relationship is positive or negative
- estimate of expected change in Y when x increases by 1
DOES NOT indicate correlation (it is not the correlation)
what do you have to do to the slope so that it matches the correlation? how do you do this?
what is this new coefficient called?
- translate it into a standardised form
- convert X and Y into Z scores and then do the regression
- standardised regression coefficient
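A sketch of those steps with made-up data and one predictor: convert both variables to Z scores, take the slope of the regression on the Z scores, and compare it with r. The helper names are assumptions.

```python
from statistics import mean, stdev

def sums(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp, ss_x, ss_y

def slope(xs, ys):
    sp, ss_x, _ = sums(xs, ys)
    return sp / ss_x

def pearson_r(xs, ys):
    sp, ss_x, ss_y = sums(xs, ys)
    return sp / (ss_x * ss_y) ** 0.5

def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b = slope(xs, ys)                         # unstandardised slope
beta = slope(z_scores(xs), z_scores(ys))  # standardised coefficient
r = pearson_r(xs, ys)

print(round(b, 3), round(beta, 3), round(r, 3))  # 0.6 0.775 0.775
```

The raw slope (0.6) differs from r, but the slope on Z scores matches it when there is a single predictor.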
what regression diagnostics can be run?
histogram of residuals
residual plot - homoscedasticity
what do we hope or expect to find when we run a histogram of residuals to be in the clear?
an approximately normal distribution
what is a residual plot? what are we looking for?
- a scatterplot of residuals against predicted values to check for heteroscedasticity
- the absence of any systematic pattern supports the assumption of homoscedasticity
T or F? the standardised coefficient is not the same thing as the correlation (r)
False (with a single predictor, the standardised coefficient equals r)
T or F? correlation tells us how tightly clustered the values are around the regression line, but the line can have any sort of slope
True
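A sketch of this point with made-up data: multiplying every Y value by 10 makes the line ten times steeper but leaves the correlation unchanged. The helper name is an assumption.

```python
from statistics import mean

def slope_and_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / ss_x, sp / (ss_x * ss_y) ** 0.5

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ys_scaled = [10 * y for y in ys]  # same pattern, different units

b1, r1 = slope_and_r(xs, ys)
b2, r2 = slope_and_r(xs, ys_scaled)

print(round(b1, 2), round(b2, 2))  # 0.6 6.0
print(round(r1, 3), round(r2, 3))  # 0.775 0.775
```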