Associations Between Two Continuous Variables Flashcards
Sometimes we are interested in testing if two continuous variables are associated with one another. What is the most common form of association studied?
Linear association
Positive association
People who score high (or low) on the first variable also tend to score high (or low) on the second variable
Negative association
People who score high on the first variable tend to score low on the second (or vice versa)
What is the most common index of a linear association?
Pearson correlation coefficient
Sum of the products of deviations (SP)
Reflects the co-variability (shared variation) of X and Y
What produces big positive SP values?
Lots of above/above pairs (both scores above their means)
AND
Lots of below/below pairs (both scores below their means)
What produces big negative SP values?
Lots of above/below pairs and lots of below/above pairs
What produces near 0 SP values?
Equal mix of above/above, below/below, above/below, and below/above pairs
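The SP patterns above can be verified directly; this is a minimal sketch (the function name `sp` and the toy data are my own, not from the cards):

```python
def sp(xs, ys):
    # Sum of products of deviations: SP = sum((X - mean_X) * (Y - mean_Y))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Above/above and below/below pairs dominate -> large positive SP
print(sp([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 20.0
# Above/below and below/above pairs dominate -> large negative SP
print(sp([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -20.0
# Even mix of all four pair types -> SP near 0
print(sp([1, 2, 3, 4, 5], [5, 1, 3, 1, 5]))    # 0.0
```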
r squared is referred to as the…
coefficient of determination
r squared reflects…
the proportion of variance that our predictor variable accounts for in our outcome variable (variability explained by linear regression)
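As a worked sketch of r and r squared (illustrative function name and toy data, assuming the standard formula r = SP / sqrt(SS_X * SS_Y)):

```python
import math

def pearson_r(xs, ys):
    # r = SP / sqrt(SS_X * SS_Y)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

r = pearson_r([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(r)       # 0.9
print(r ** 2)  # ~0.81: X accounts for about 81% of the variance in Y
```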
3 factors influencing the size of r:
Distribution of variables
- Perfect correlations only exist if the shapes of the distributions are exactly the same (positive) or exactly opposite (negative)
Reliability of measures
- Perfect correlations only exist with perfect reliability in both measures
Restriction of range
- Restricting the range of scores on one or both variables can weaken correlations
Regression analysis using a single predictor variable is referred to as…
“simple regression”
Regression analysis involving two or more predictors is referred to as…
“multiple regression”
When two variables are linearly associated, this association can be described using a simple equation:
Y = bX + a
What do each of the variables in the regression equation (Y = bX + a) represent?
Y - represents scores on the outcome variable
b - represents slope of best fitting line
X - represents scores on the predictor variable
a - fixed constant representing the Y intercept
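The slope and intercept can be estimated by least squares as b = SP / SS_X and a = mean(Y) - b * mean(X); a minimal sketch with hypothetical data:

```python
def fit_line(xs, ys):
    # Least-squares estimates: b = SP / SS_X, a = mean(Y) - b * mean(X)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    b = sp / ssx
    a = my - b * mx
    return b, a

b, a = fit_line([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(b, a)  # b ≈ 0.9 (slope), a ≈ 1.3 (Y intercept)
```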
Standard error of estimate
A measure of the standard distance between a regression line and the actual data points
Basically, it tells us how much error variance is in our model
How is SS error related to r?
As r approaches 1, SS error will become smaller
As r approaches 0, SS error will become larger
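The standard error of estimate is sqrt(SS_error / (n - 2)); a sketch, using an illustrative function name and toy data of my own:

```python
import math

def ss_error_and_see(xs, ys):
    # Fit the regression line, then sum squared distances of points from it
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    see = math.sqrt(ss_error / (len(xs) - 2))  # standard error of estimate
    return ss_error, see

print(ss_error_and_see([1, 2, 3, 4, 5], [2, 3, 5, 4, 6]))
```

For these data SS_error works out to (1 - r²) · SS_Y = (1 - 0.81) · 10 = 1.9, which is the link to r: the closer r is to 1, the smaller SS_error.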
What are the null and alternative hypotheses for the b value in a simple regression?
H0: β = 0 (there is no linear association between X and Y; the slope does not differ significantly from 0)
H1: β ≠ 0
To test the null of a simple regression we partition the variance in Y (DV) into two components:
1. Variability in Y predicted from the linear association
2. Variability in Y attributable to error
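This partition can be sketched in code: SS_total splits into SS_regression plus SS_error, and dividing each SS by its df gives the variances that form F (function name and data are illustrative):

```python
def partition_variance(xs, ys):
    # Split SS_total into SS_regression (predicted) + SS_error, then form F
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    preds = [b * x + a for x in xs]
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_reg = sum((p - my) ** 2 for p in preds)
    ss_error = sum((y - p) ** 2 for y, p in zip(ys, preds))
    f = (ss_reg / 1) / (ss_error / (n - 2))  # df_reg = 1, df_error = n - 2
    return ss_total, ss_reg, ss_error, f

print(partition_variance([1, 2, 3, 4, 5], [2, 3, 5, 4, 6]))
```

Note that SS_regression + SS_error adds back up to SS_total, which is the partition the null hypothesis test relies on.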
What are the 4 assumptions that simple regression (and its NHST) is based on?
- Independence of observations
- Linear relationship between X and Y
- Residuals are normally distributed with a mean of 0
- Homoscedasticity of residuals
What makes regression more like t-tests and less like ANOVA?
No real follow-up tests are relevant because there is only one predictor, so there is nothing further to break down.
Regression is not an omnibus test.
Total squared error is also known as…
sum of squared error (SS error)
Sum of squares (SS)
Sum of the squared deviations
A higher SS value indicates a larger degree of variability
A lower SS value indicates the data do not vary considerably from the mean value
Regression degrees of freedom equals…
1
Error degrees of freedom equals…
n - 2
Anytime an SS value is divided by its df value it is…
an index of variance (a mean square, MS)
The Pearson correlation coefficient (r) is…
an index of association that assesses the magnitude and direction of linear relation between two variables
AND
an index of co-variability of X and Y relative to the variability of X and Y separately
z-score represents…
an individual score’s standing within the distribution for that score
Basically, a z-score of 1 is 1 standard deviation above the mean
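A minimal sketch of the z transformation, using the sample SD (n - 1 denominator); the function name and data are my own:

```python
import math

def z_scores(xs):
    # z = (X - mean) / SD; a z of 1 sits one standard deviation above the mean
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - m) / sd for x in xs]

print(z_scores([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```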
3 special cases of Pearson correlation
Point biserial correlation
- Correlation between dichotomous variable and continuous variable
Phi coefficient
- Correlation between two dichotomous variables
Spearman rank-order correlation
- Correlation between ordinal variables
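The Spearman case can be sketched as Pearson r applied to ranks; this toy implementation assumes no tied scores, and all names are illustrative:

```python
import math

def pearson_r(xs, ys):
    # r = SP / sqrt(SS_X * SS_Y)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def ranks(xs):
    # Rank each score (1 = smallest); assumes no tied scores for simplicity
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman(xs, ys):
    # Spearman rank-order correlation = Pearson r computed on the ranks
    return pearson_r(ranks(xs), ranks(ys))

# A perfectly monotonic (but nonlinear) relation still yields rho = 1
print(spearman([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
```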
Why put r into z scores? (think… r formula using z scores)
Because it standardizes r so it can be compared across different studies. Dividing by sample size also standardizes, since n differs from study to study
When we standardize both X and Y (in z form), their means equal zero. Thus, the variability (SS) in each of them…
must be equivalent (SSY = SSX)
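The z-score form of r can be sketched as the average product of paired z-scores, r = Σ(z_X · z_Y) / (n - 1); names and data are illustrative:

```python
import math

def z_scores(xs):
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - m) / sd for x in xs]

def r_from_z(xs, ys):
    # r = sum(z_X * z_Y) / (n - 1): the average product of paired z-scores
    n = len(xs)
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / (n - 1)

print(r_from_z([1, 2, 3, 4, 5], [2, 3, 5, 4, 6]))  # ≈ 0.9, same as the SP formula
```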
What is the difference between the homogeneity of variance assumption and the homoscedasticity of residuals assumption?
“Homogeneity of variance” is used in the ANOVA context
“Homoscedasticity” is used in the regression context
Both assume that the variance in residuals is the same everywhere