Lecture 3 - Correlations and Regression Flashcards
Define variability and covariability.
Variability is how much a given variable varies from observation to observation.
Covariability refers to how much two variables vary together.
Correlation does not necessarily imply causation.
True or false?
True.
How do you determine the degrees of freedom when doing a correlation?
df = n - 2
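A minimal sketch of how r and its degrees of freedom come out of a small sample, computed by hand on made-up data (pure Python, no libraries; in practice JASP or similar reports these directly):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # made-up illustrative data

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariability: how much x and y vary together
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
# Variability of each variable on its own
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

r = cov / math.sqrt(ss_x * ss_y)
df = n - 2  # degrees of freedom for testing the correlation

print(round(r, 3), df)
```

Note how small n makes df tiny here (df = 3), which is exactly why a small sample can easily produce a correlation that does not reflect the population.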
Why is sample size so important when doing correlation analysis?
Because a small sample size could easily yield a correlation that is not actually reflective of the population.
What is the difference between correlation and regression?
Correlation refers to whether there is an association between variables, while regression is the predictive model of that association.
In regard to regression, what is the least squares solution?
The least squares solution is how the line of best fit (regression model) is determined.
What is homoscedasticity?
Equal variance across observations.
What is the difference between correlation and regression?
Correlation refers to the association between two variables.
Regression refers to predicting one variable from another (if in fact we can predict one variable from another).
What is Regression?
With regression, we are not asking whether there is a relationship between two variables; we are asking what the best linear relationship is to describe that association. This is often expressed as y = a + bx
With regard to the typical regression equation in psychology, y = a + bx + e, what do the different coefficients refer to?
y - the outcome variable
x - predictor variable
a - intercept parameter (sometimes called the constant)
b - slope parameter
e - error or residual term
With regard to regression in psychology, what conditions/assumptions are made about the error term in a regression model, i.e. the “e” in y = a + bx + e?
The error or residual term is assumed to be:
1. Independent
2. Normally distributed, with a mean of zero
3. Homoscedastic - equal error variance across predicted values of Y
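A small sketch on made-up data of the one assumption that is easy to verify numerically: with a least squares fit, the residuals sum (and so average) to zero by construction. Independence, normality, and homoscedasticity instead need residual plots or formal tests.

```python
x = [1, 2, 3, 4, 5]
y = [1.9, 4.2, 5.8, 8.3, 9.9]  # made-up illustrative data

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least squares slope and intercept
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

# Residuals e = observed - predicted; they sum to zero by construction
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(abs(sum(residuals)) < 1e-9)
```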
With regard to Regression, what is the Least Squares Solution?
The Least Squares solution is how the “line of best fit” is determined from observed data. For each observation, the error between the predicted value and the observed value is taken; when these differences are squared and summed, the regression line is the line with the smallest value for that sum of squares - hence the name “least squares solution”.
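The closed-form least squares formulas can be sketched on made-up data (pure Python). The slope and intercept below are the values that minimise the sum of squared residuals:

```python
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 10.1]  # made-up illustrative data

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope b: covariability of x and y divided by the variability of x
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
# Intercept a: the least squares line passes through (mean_x, mean_y)
a = mean_y - b * mean_x

# Sum of squared residuals -- the quantity the fit minimises
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
print(round(a, 3), round(b, 3), round(sse, 4))
```

Any other line through these points would give a larger sse than the a and b computed here.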
How is a regression line determined from a set of data?
The Least Squares Solution is how a regression line is determined/arrived at.
Does the slope in a regression model reflect the correlation between the two variables?
No. The slope does not predict or reflect correlation.
You could have a steep slope and a poor correlation, or a shallow slope and a strong correlation. The slope is simply a result of the least squares solution used to determine the line of best fit.
The direction of the slope, however, does reflect the direction of the correlation: a downward slope reflects a negative correlation, whereas an upward slope reflects a positive correlation.
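One way to see the slope/correlation distinction: the least squares slope equals r × (sd_y / sd_x), so simply rescaling y makes the slope arbitrarily steep or shallow while r is unchanged. A sketch on made-up data:

```python
import math

def pearson_r(x, y):
    # Pearson correlation: covariability over combined variability
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cov / math.sqrt(sum((xi - mx) ** 2 for xi in x)
                           * sum((yi - my) ** 2 for yi in y))

def ls_slope(x, y):
    # Least squares slope b
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 2.1, 2.9, 4.2, 4.8]        # made-up illustrative data
y_scaled = [yi * 100 for yi in y]     # same pattern in bigger units

# The slope becomes 100x steeper, but the correlation is identical
print(round(ls_slope(x, y), 2), round(ls_slope(x, y_scaled), 2))
print(round(pearson_r(x, y), 3), round(pearson_r(x, y_scaled), 3))
```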
In Regression, what does “error” refer to?
Error refers to the difference between the predicted outcome and the observed outcome.
Should a regression always report the variance explained? And if so, what does this mean?
Yes.
Variance explained refers to how much of the variance in the outcome variable is accounted for by the predictor, i.e. by the correlation/relationship between the two variables. Variance explained is reported as the R-squared value, which can be obtained from JASP output.
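A minimal sketch of variance explained on made-up data: R-squared = 1 − SS_residual / SS_total, which for simple regression with one predictor equals the squared Pearson correlation between x and y.

```python
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 10.1]  # made-up illustrative data

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least squares fit
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
sst = sum((yi - my) ** 2 for yi in y)                        # total SS
r_squared = 1 - sse / sst  # proportion of variance in y explained
print(round(r_squared, 3))
```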
What is the ANOVA (F test)?
The F test tells us whether the variance explained is significantly different from zero, and therefore whether the variance explained (r-squared) reported with a regression indicates a significant relationship between two variables. Another way of saying this is that the F test tells us whether a variable is a significant predictor of the outcome variable.
What does the F test for regression tell us?
The F test answers the question: does the regression line help to explain the variance in Y? The null hypothesis is that r-squared = 0.
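The F statistic can be sketched directly from r-squared; the numbers below are hypothetical, with k = 1 predictor as in simple regression:

```python
r_squared = 0.64   # hypothetical variance explained
n = 30             # hypothetical sample size
k = 1              # one predictor (simple regression)

# F = (variance explained per predictor) / (unexplained variance per df)
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
print(round(f_stat, 2))  # compare against the F(k, n-k-1) critical value
```

A large F relative to the critical value of the F(k, n − k − 1) distribution leads us to reject the null hypothesis that r-squared = 0.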
Do r-squared and F (ANOVA) testing tell us the strength of a relationship/correlation?
No. The r-squared value and whether it is significant or not does NOT tell us the strength or direction of a relationship.
What are the regression coefficients?
If we think about regression as y (outcome variable) = a + bx (predictor variable), then the regression coefficients are a and b (as they were taught to us in the lecture), where b is the size and direction of the slope and a is the y-intercept.