Chapter 8: Correlation Flashcards

1
Q

covariance

A
  • simple measure of association
  • used to see whether two variables are related/associated (do they covary?)
  • if two variables are related, deviations from the mean on one variable should be matched by deviations from the mean on the other
  • positive covariance means the two variables have a positive relationship; a negative covariance means they have a negative relationship; a covariance of 0 indicates no linear relationship
  • the covariance between 2 variables depends heavily on the units of measurement, so it is not easily interpretable
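A minimal Python sketch of the sample covariance, assuming the usual N - 1 denominator: cov(X, Y) = Σ(x - x̄)(y - ȳ) / (N - 1). The hours/scores values are made up for illustration. Converting hours to minutes multiplies the covariance by 60 even though the relationship itself is unchanged, which is the units problem described above.

```python
from statistics import mean

def covariance(x, y):
    """Sample covariance with an N - 1 denominator."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(x), mean(y)
    # sum of cross-products of each pair's deviations from the two means
    cross = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cross / (len(x) - 1)

# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
minutes = [h * 60 for h in hours]  # same variable, different units

print(covariance(hours, scores))    # 15.0: positive, so the variables covary positively
print(covariance(minutes, scores))  # 900.0: 60 times larger only because the units changed
```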
2
Q

standardization of the covariance

Pearson correlation coefficient (r)

A
  • pearson’s r is the standardized covariance
  • it is obtained by dividing the covariance by the product of the two SDs: r = cov(X, Y) / (s_X × s_Y)
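A sketch of the standardization step with made-up hours/scores data: compute the covariance, then divide by the product of the two sample SDs. Both pieces use N - 1, so the denominators are consistent. Rescaling hours to minutes leaves r unchanged, unlike the raw covariance.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson's r: the covariance divided by the product of the two SDs."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # stdev() also uses an N - 1 denominator, so this matches the usual r
    return cov / (stdev(x) * stdev(y))

# Hypothetical data: hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
minutes = [h * 60 for h in hours]

print(round(pearson_r(hours, scores), 3))    # 0.981
print(round(pearson_r(minutes, scores), 3))  # 0.981: standardizing removes the units
```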
3
Q

Pearson’s r

A
  • a measure of linear association between 2 variables
  • ranges from -1.00 to +1.00
  • guidelines (Cohen’s conventions, applied to the absolute value of r): .1 is small, .3 is medium, and .5 is large
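A small helper that applies these rule-of-thumb guidelines, treating .1/.3/.5 as lower bounds for small/medium/large and reading strength from |r| and direction from the sign. The "negligible" label for |r| < .1 is an assumption added for completeness; the cutoffs are conventions, not hard rules.

```python
def describe_r(r):
    """Label the strength and direction of r using the .1/.3/.5 guidelines."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)  # strength depends on |r|; the sign only gives direction
    if size >= 0.5:
        strength = "large"
    elif size >= 0.3:
        strength = "medium"
    elif size >= 0.1:
        strength = "small"
    else:
        strength = "negligible"  # hypothetical label; the guidelines stop at .1
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_r(-0.35))  # ('medium', 'negative')
print(describe_r(0.52))   # ('large', 'positive')
```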
4
Q

r squared

A
  • the proportion of variance shared by two variables (the coefficient of determination)
  • just square r
  • interpretation (e.g., r = .50, so r² = .25): 25% of the variance in the outcome can be accounted for by the variance in the predictor
  • can inform judgments about practical and scientific significance
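Squaring the guideline values of r shows how little variance even a "medium" correlation accounts for. A minimal sketch; the function name is illustrative.

```python
def shared_variance(r):
    """Coefficient of determination: proportion of variance shared by X and Y."""
    return r ** 2

for r in (0.1, 0.3, 0.5):
    r2 = shared_variance(r)
    print(f"r = {r:.1f} -> r^2 = {r2:.2f} ({r2:.0%} of variance shared)")
```

An r of .30 corresponds to only 9% shared variance, and squaring discards the sign, so r² says nothing about direction.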
5
Q

curvilinear relationship

A

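an observed curvilinear relationship may be due to a ceiling or floor effect, so consider this possibility
- ceiling effect: scores on the dependent variable bunch up at the top of the scale, so further increases in the independent variable no longer have an effect on the dependent variable
- floor effect: the same problem at the bottom of the scale, where scores cannot go any lower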
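Because Pearson's r measures only linear association, it understates a curvilinear relationship. In this sketch with made-up x values, y is completely determined by x in both cases: a perfect U shape gives r = 0, and a hypothetical ceiling (y stops rising at 1) gives a high r that still falls short of 1.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation sums (the N - 1 terms cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
u_shaped = [v ** 2 for v in x]   # y fully determined by x, but not linearly
capped = [min(v, 1) for v in x]  # hypothetical ceiling: y stops rising at 1

print(pearson_r(x, u_shaped))          # 0.0: no linear association at all
print(round(pearson_r(x, capped), 3))  # 0.953: strong, but the ceiling keeps r below 1
```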

6
Q

factors that influence the observed r

A
  • sampling error
  • measurement error
  • range restriction (direct, indirect, self-selection)
7
Q

sampling error

A

statistic - parameter
- occurs because we have samples, not the whole population
- r could be lower or higher than rho
- the correlation in the sample is actually a biased estimate of rho
- affected by sample size: larger samples produce smaller sampling error
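A small simulation makes sampling error concrete, under stated assumptions: pairs are drawn from a bivariate normal population with a hypothetical rho = .60, and 5,000 samples are drawn at each N. At small N the mean r sits slightly below rho (the bias), and the spread of r shrinks as N grows.

```python
import math
import random
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson's r from deviation sums (the N - 1 terms cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_r(rho, n, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho; return r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # y = rho*x + sqrt(1 - rho^2)*noise has population correlation rho with x
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return pearson_r(xs, ys)

rng = random.Random(2024)  # fixed seed so the run is reproducible
rho = 0.60                 # hypothetical population correlation
results = {}
for n in (5, 20, 200):
    rs = [sample_r(rho, n, rng) for _ in range(5000)]
    results[n] = (mean(rs), stdev(rs))
    print(f"N = {n:>3}: mean r = {results[n][0]:.3f}, SD of r = {results[n][1]:.3f}")
```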

8
Q

measurement error

A

observed score - true score
- decreases (attenuates) the observed correlation, r
- possible to correct for (correction for attenuation) if certain assumptions are met
- shorter tests have more measurement error (lower reliability)
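A common correction from classical test theory is Spearman's correction for attenuation: divide the observed r by the square root of the product of the two reliabilities. A sketch, assuming the reliabilities (r_xx, r_yy) are known and the errors are random; the input values are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores from an observed r.

    Assumes classical test theory: random errors, uncorrelated with true
    scores and with each other, and known reliabilities r_xx and r_yy.
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must be in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: observed r = .30, reliabilities .80 and .70
print(round(correct_for_attenuation(0.30, 0.80, 0.70), 3))  # 0.401
```

With low reliabilities or a sampling-inflated r, the corrected value can even exceed 1.00, so it is best read as an estimate.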

9
Q

range restriction

A
  • occurs when the sample has reduced variability, often as a result of using cutoff scores
  • the full range of values of a variable is not present in the sample
  • decreases the observed correlation, r, so it underestimates the utility of using that selection instrument
  • three types: direct, indirect, self-selection
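A simulation sketch of direct range restriction via a cutoff score. Assumptions (all hypothetical): test score X and performance Y are bivariate normal with rho = .50, and only applicants above +1 SD on X are selected. The r among selected applicants comes out well below the r in the full pool, even though the underlying relationship is identical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r from deviation sums (the N - 1 terms cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
rho = 0.50    # hypothetical correlation between test score X and performance Y
cutoff = 1.0  # hypothetical cutoff: only applicants above +1 SD on X are selected

applicants = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    applicants.append((x, y))
selected = [(x, y) for x, y in applicants if x > cutoff]  # direct selection on X

r_full = pearson_r([x for x, _ in applicants], [y for _, y in applicants])
r_selected = pearson_r([x for x, _ in selected], [y for _, y in selected])
print(f"all applicants: r = {r_full:.2f} (N = {len(applicants)})")
print(f"selected only:  r = {r_selected:.2f} (N = {len(selected)})")
```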
10
Q

range restriction types

A
  • direct: occurs when applicants are selected on X (the variable of interest)
  • indirect: occurs when applicants are selected on a third variable, Z, that is correlated with X (e.g., ACT/SAT)
  • self-selection: occurs when people do not apply for positions they believe they are not qualified for (e.g., if Harvard admits only applicants with high SAT scores, people with low SAT scores won’t apply; only the upper range is left, which reduces variability)
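A simulation sketch of indirect range restriction. Assumptions (all hypothetical): X correlates .50 with the outcome Y, a third variable Z correlates .70 with X and relates to Y only through X, and applicants are selected on Z > +1 SD. Because Z is only partly related to X, the variability of X shrinks less than it would under selection on X itself, so r is reduced but typically less severely.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r from deviation sums (the N - 1 terms cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
rho_xy = 0.50  # hypothetical correlation of X with the outcome Y
r_zx = 0.70    # hypothetical correlation of the selection variable Z with X

pool = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    y = rho_xy * x + math.sqrt(1 - rho_xy ** 2) * rng.gauss(0, 1)
    # Z depends on X plus independent noise, so it relates to Y only through X
    z = r_zx * x + math.sqrt(1 - r_zx ** 2) * rng.gauss(0, 1)
    pool.append((x, y, z))

selected = [(x, y) for x, y, z in pool if z > 1.0]  # selection on Z, not on X

r_full = pearson_r([p[0] for p in pool], [p[1] for p in pool])
r_selected = pearson_r([x for x, _ in selected], [y for _, y in selected])
print(f"full pool:     r(X, Y) = {r_full:.2f}")
print(f"selected on Z: r(X, Y) = {r_selected:.2f} (N = {len(selected)})")
```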
11
Q

units of analysis

individual vs. group

A
  • associations at the group and individual levels can differ because different processes may drive the association at each level
  • assuming that an association found at one unit of analysis will hold at another unit of analysis is a fallacy
  • atomistic fallacy: concluding that an association at individual level must also exist at the group level
  • ecological fallacy: concluding that an association at the group level must also exist at the individual level
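A small made-up example of how the two levels can disagree: within each of three hypothetical groups, X and Y are perfectly negatively related, yet the group means are perfectly positively related. Concluding from the group-level r that individuals with higher X have higher Y would be the ecological fallacy.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation sums (the N - 1 terms cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: three groups, each a list of (x, y) pairs for its members
groups = {
    "A": [(0, 2), (1, 1), (2, 0)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(8, 10), (9, 9), (10, 8)],
}

# Individual level: the association within each group
within = {}
for name, members in groups.items():
    within[name] = pearson_r([x for x, _ in members], [y for _, y in members])
    print(f"group {name}: within-group r = {within[name]:.2f}")  # -1.00 each time

# Group level: correlate the group means
mean_x = [sum(x for x, _ in m) / len(m) for m in groups.values()]
mean_y = [sum(y for _, y in m) / len(m) for m in groups.values()]
group_r = pearson_r(mean_x, mean_y)
print(f"group level: r between group means = {group_r:.2f}")  # 1.00
```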
12
Q

alternative measures of association

A
  • Spearman’s rho: nonparametric statistic used with skewed data and many outliers; minimizes the effects of extreme scores and violations of assumptions
  • Kendall’s tau: nonparametric statistic that minimizes the effects of extreme scores and violations of assumptions; preferred when you have a small data set and a large number of tied ranks
  • biserial correlation: used when one continuous variable has been artificially dichotomized (dichotomizing loses a lot of information and makes r smaller). corrects for the artificial dichotomy and estimates the correlation had the variable been measured continuously. needs at least 100 observations
  • tetrachoric correlation: used when both variables have been artificially dichotomized. estimates what r would be if the variables had been properly measured on an interval or ratio scale. needs at least 400 observations
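A stdlib-only sketch of the two rank-based measures: Spearman's rho as Pearson's r computed on average ranks (tied values share the mean of their positions), and Kendall's tau-b, which counts concordant minus discordant pairs and adjusts for ties. The data are hypothetical: a monotonic relationship with one extreme score, which pulls Pearson's r down while leaving both rank-based measures at 1.0.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation sums (the N - 1 terms cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties."""
    num = nx = ny = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            num += sx * sy  # +1 concordant, -1 discordant, 0 if tied
            nx += sx * sx   # pairs not tied on x
            ny += sy * sy   # pairs not tied on y
    return num / math.sqrt(nx * ny)

# Hypothetical data: monotonic, with one extreme score at the end
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 6, 8, 9, 11, 60]

print(round(pearson_r(x, y), 2))  # roughly 0.70: distorted by the outlier
print(spearman_rho(x, y))         # 1.0: the ranks are perfectly monotonic
print(kendall_tau_b(x, y))        # 1.0: every pair is concordant
```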
13
Q

why does correlation not equal causation?

A

to determine that X causes Y, three conditions must be met:
- X precedes Y in time (temporal precedence)
- there is an association between X and Y
- alternative explanations for the association between X and Y are ruled out

14
Q

spurious correlations

A
  • if there is no causal relationship between X and Y, but X and Y correlate, the correlation is said to be spurious
  • often caused by a third variable (a variable that causes both X and Y)
  • correlations and causal relations can mismatch: a correlation can be positive when the real causal relationship is negative (this can happen when comparing across units rather than within them)
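A simulation sketch of a spurious correlation, assuming linear effects and normal noise (all values hypothetical): Z causes both X and Y, and X has no effect on Y, yet X and Y correlate at about .50. The partial correlation, a technique not named on this card, removes Z's linear influence from both variables and brings the X-Y correlation to roughly zero.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r from deviation sums (the N - 1 terms cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(3)
z = [rng.gauss(0, 1) for _ in range(10000)]     # hypothetical third variable
x = [0.7 * zi + rng.gauss(0, 0.7) for zi in z]  # Z causes X
y = [0.7 * zi + rng.gauss(0, 0.7) for zi in z]  # Z causes Y; X has no effect on Y

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))
print(f"r(X, Y)     = {r_xy:.2f}")          # about .50 in the population
print(f"r(X, Y | Z) = {r_xy_given_z:.2f}")  # about 0 once Z is controlled
```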
15
Q

inferences about rho and CIs

A
  • the further rho is from 0, the more skewed the sampling distribution of r becomes (negatively skewed for positive rho)
  • for positive rho, the estimates (r) tend to underestimate rho
  • for any rho other than 0, the correlation coefficient tends to be biased toward zero (it underestimates |rho|)
  • the bias grows as |rho| moves from 0 toward about .58, then shrinks again as |rho| approaches 1
  • as N increases, the estimates become more precise
  • r is a consistent estimator: with higher N, you get more and more precise estimates of the population value. the bias is therefore tolerable, because it can be overcome by collecting a bigger sample
  • the skewness at every value of rho except 0 makes inferences about rho difficult, because the sampling distribution of r is not normal, so symmetric CIs around r do not work
  • Fisher r-to-z transformation: transforms the observed r into z, whose sampling distribution is approximately normal (the transformation stretches out the compressed tail), and places limits around z using SE = 1/√(N - 3). these limits are transformed back into correlation coefficients to give a CI around r
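A minimal sketch of the Fisher r-to-z interval using only Python's standard library: z = atanh(r), SE = 1/√(N - 3), symmetric limits on the z scale, then tanh to return to the r scale. The r = .60, N = 30 inputs are hypothetical; the resulting interval is asymmetric around r, as the skewness point above predicts.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Confidence interval for rho via the Fisher r-to-z transformation."""
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    z = math.atanh(r)                              # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)                      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    lo, hi = z - crit * se, z + crit * se          # symmetric limits on the z scale
    return math.tanh(lo), math.tanh(hi)            # back-transform to the r scale

lo, hi = fisher_ci(0.60, 30)
print(f"95% CI for rho: [{lo:.2f}, {hi:.2f}]")  # [0.31, 0.79]: wider below r than above
```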

r is a biased and consistent estimator of rho

16
Q

inferences about rho: NHST

A
  • NHST can also be used to make inferences about rho
  • the null hypothesis is almost always rho = 0
  • if p < .05, we reject the null and conclude there is a (nonzero) relationship between the two variables in the population
  • if p ≥ .05, we fail to reject the null: we did not find evidence of a relationship (which is not the same as showing that rho = 0)
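A sketch of the test of H0: rho = 0 using only the standard library. It returns the usual t statistic, t = r√(N - 2)/√(1 - r²) with N - 2 df. Because the standard library has no t distribution, the p-value here comes from the large-sample Fisher z test (z = atanh(r)√(N - 3)) instead, so it will differ slightly from the exact t-based p that statistical software reports. The r = .35, N = 40 inputs and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def nhst_rho_zero(r, n, alpha=0.05):
    """Test H0: rho = 0 for a sample correlation r based on n pairs.

    Returns the t statistic (df = n - 2), an approximate two-sided p-value
    from the Fisher z test, and whether H0 is rejected at alpha.
    """
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) when rho = 0
    p_approx = 2 * (1 - NormalDist().cdf(abs(z)))
    return t, p_approx, p_approx < alpha

t, p, reject = nhst_rho_zero(0.35, 40)
print(f"t({40 - 2}) = {t:.2f}, approximate p = {p:.3f}")
print("reject H0: rho = 0" if reject else "fail to reject H0 (not evidence that rho = 0)")
```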