Chapter 8: Correlation Flashcards

Question 1

Q

covariance

Answer

A

simple measure of association
we want to see if two variables are related/associated (do they covary)
if two variables are related, we should expect deviations on one variable to be met with deviations on another
a positive covariance means the two variables have a positive relationship. a negative covariance means the two variables have a negative relationship. a covariance of 0 indicates no linear relationship
the covariance between 2 variables is heavily influenced by the units of measurement and is not easily interpretable
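
A minimal sketch of the point about units, assuming the sample (n - 1) covariance formula. The height and weight values are made up for illustration; the only thing that changes between the two calls is the unit used for height.

```python
def covariance(x, y):
    """Sample covariance: summed cross-products of deviations over n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Hypothetical data: heights in metres, weights in kilograms.
height_m = [1.60, 1.70, 1.80, 1.90]
weight_kg = [55.0, 65.0, 72.0, 88.0]
cov_m = covariance(height_m, weight_kg)

# Same people, heights re-expressed in centimetres.
height_cm = [h * 100 for h in height_m]
cov_cm = covariance(height_cm, weight_kg)

print(cov_m)   # covariance with height in metres
print(cov_cm)  # about 100 times larger, though the relationship is unchanged
```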

Question 2

Q

standardization of the covariance

Pearson correlation coefficient (r)

Answer

A

Pearson’s r is the standardized covariance
it can be obtained by dividing the covariance by the product of the two SDs
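
As a rough illustration of that division, here is a small sketch that computes r as the covariance over the product of the two SDs. The data are hypothetical and the function name `pearson_r` is just a label chosen for this example.

```python
import math

def pearson_r(x, y):
    """Pearson's r: sample covariance divided by the product of the two SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical data: heights in metres and centimetres, weights in kilograms.
height_m = [1.60, 1.70, 1.80, 1.90]
height_cm = [h * 100 for h in height_m]
weight_kg = [55.0, 65.0, 72.0, 88.0]

r_m = pearson_r(height_m, weight_kg)
r_cm = pearson_r(height_cm, weight_kg)

print(r_m)
print(r_cm)  # same value up to rounding: standardizing removes the units
```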

Question 3

Q

Pearson’s r

Answer

A

a measure of linear association between 2 variables
range from +1.00 to -1.00
.1 is small, .3 is medium, and .5 is large (guidelines)

Question 4

Q

r squared

Answer

A

shared variance between two variables
just square r
interpretation: 25% of the variance in the outcome can be accounted for by the variance in the predictor
can inform judgments about practical and scientific significance
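
A tiny sketch of the arithmetic, assuming r = .50 is the value behind the 25% interpretation on this card. The helper name is made up for the example.

```python
def shared_variance_percent(r):
    """Return r squared expressed as a percentage of shared variance."""
    return (r ** 2) * 100

print(shared_variance_percent(0.50))   # 25.0
print(shared_variance_percent(-0.50))  # 25.0, because the sign of r drops out
print(shared_variance_percent(0.30))   # a "medium" r shares only about 9%
```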

Question 5

Q

curvilinear relationship

Answer

A

an observed curvilinear relationship may be due to a ceiling or floor effect, so consider this possibility
- ceiling effect: independent variable no longer has an effect on the dependent variable

Question 6

Q

factors that influence the observed r

Answer

A

sampling error
measurement error
range restriction (direct, indirect, self-selection)

Question 7

Q

sampling error

Answer

A

statistic - parameter
- occurs because we have samples, not the whole population
- r could be lower or higher than rho
- the correlation in the sample is actually a biased estimate of rho
- affected by sample size
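
The effect of sample size can be sketched with a small simulation. Everything here is an assumption made for illustration: a population rho of .30, normally distributed scores, 500 samples per condition, and a fixed seed.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_r(rho, n, rng):
    """Draw one sample of size n from a population with correlation rho."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    return pearson_r(x, y)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
rho = 0.30
small = [sample_r(rho, 20, rng) for _ in range(500)]
large = [sample_r(rho, 200, rng) for _ in range(500)]

spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

print(min(small), max(small))      # r lands both below and above rho
print(spread_small, spread_large)  # r varies much less when N is larger
```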

Question 8

Q

measurement error

Answer

A

observed score - true score
- decreases the observed correlation, r
- possible to correct for if certain assumptions are met
- shorter tests have more measurement error
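
The card does not name the correction. One standard option is Spearman's correction for attenuation, which divides the observed r by the square root of the product of the two reliabilities. The sketch below assumes that formula, and the reliability values are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between true scores from an observed r.

    rel_x and rel_y are the reliabilities of the two measures, in (0, 1].
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)

observed_r = 0.30
# Hypothetical reliabilities of .80 and .70 for the two measures.
corrected_r = correct_for_attenuation(observed_r, rel_x=0.80, rel_y=0.70)

print(corrected_r)  # larger than the observed .30
```

With perfectly reliable measures (both reliabilities equal to 1) the correction leaves r unchanged, which matches the idea that measurement error is what pulls the observed r down.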

Question 9

Q

range restriction

Answer

A

occurs when you have reduced variability in your sample, often as a result of using cutoff scores
the full range of values of a variable is not present in the sample
decreases the observed correlation, r. it underestimates the utility of using that selection instrument
three types: direct, indirect, self selection

Question 10

Q

range restriction types

Answer

A

direct: occurs when applicants are selected on X (variable of interest)
indirect: occurs when applicants are selected on a third variable, Z, that is correlated with X (e.g., ACT/SAT)
self-selection: occurs when people selectively do not apply for positions they believe they are not qualified for (e.g., Harvard only takes high SAT scores, so people w/ low SAT scores aren’t going to apply. this leaves only the people in the upper range and reduces variability)
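
Direct range restriction can be sketched with simulated applicants. The population correlation of .60, the cutoff of 0.5 SD above the mean, the sample size, and the seed are all assumptions chosen for the illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the sketch is reproducible

# Simulated applicants: test score X and later performance Y, rho about .60.
rho = 0.60
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]
r_full = pearson_r(x, y)

# Direct range restriction: keep only applicants at or above a cutoff on X.
cutoff = 0.5
selected = [(xi, yi) for xi, yi in zip(x, y) if xi >= cutoff]
x_sel = [pair[0] for pair in selected]
y_sel = [pair[1] for pair in selected]
r_restricted = pearson_r(x_sel, y_sel)

print(r_full)        # close to the population value
print(r_restricted)  # noticeably smaller among the selected applicants
```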

Question 11

Q

units of analysis

individual vs. group

Answer

A

associations at group and individual levels are different because the processes that are driving improvement are different
if you assume an association at one unit of analysis is going to hold across another unit of analysis, this is a fallacy
atomistic fallacy: concluding that an association at individual level must also exist at the group level
ecological fallacy: concluding that an association at the group level must also exist at the individual level
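
A toy data set can show how the two levels come apart. The three groups and their scores below are invented for the example: the group means rise together, yet within every group the individual scores move in opposite directions.

```python
import math

def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical groups: (x scores, y scores) for three people each.
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Group level: correlate the group means.
group_mean_x = [sum(x) / len(x) for x, _ in groups.values()]
group_mean_y = [sum(y) / len(y) for _, y in groups.values()]
r_group_level = pearson_r(group_mean_x, group_mean_y)

# Individual level: correlate scores inside each group.
r_within = {name: pearson_r(x, y) for name, (x, y) in groups.items()}

print(r_group_level)  # strongly positive across groups
print(r_within)       # strongly negative inside every group
```

Reading the group-level result as a statement about individuals within groups would be the ecological fallacy in this toy case.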

Question 12

Q

alternative measures of association

Answer

A

Spearman’s rho: non parametric statistic used w/ skewed data and many outliers. used to minimize the effects of extreme scores and violations of assumptions
Kendall’s tau: non parametric statistic used to minimize the effects of extreme scores and violations of assumptions. used when you have a small data set and a large number of tied ranks
biserial correlation: used when one continuous variable is artificially dichotomized (makes r smaller). corrects for artificial dichotomy and estimates the correlation had the variable been measured continuously. needs at least 100 observations. a lot of info is lost
tetrachoric correlation: used when both variables are artificially dichotomized. needs at least 400 observations. estimates what r would be if variables had been properly measured on an interval or ratio scale
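
To illustrate how Spearman's rho limits the effect of extreme scores, here is a sketch that computes it as Pearson's r on ranks, with tied values given the mean of their ranks. The data are hypothetical and include one extreme y score; the function names are chosen for this example.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always increases with x, but the last y is extreme.
x = [1, 2, 3, 4, 5, 6]
y = [2, 3, 5, 6, 8, 100]

print(pearson_r(x, y))     # pulled well below 1 by the extreme score
print(spearman_rho(x, y))  # the ranks agree perfectly
```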

Question 13

Q

why does correlation not equal causation?

Answer

A

to determine that X causes Y, three conditions must be met:
- X precedes Y in time (temporal precedence)
- there is an association between X and Y
- alternative explanations for the association between X and Y are ruled out

Question 14

Q

spurious correlations

Answer

A

if there is no causal relationship between X and Y, but X and Y correlate, the correlation is said to be spurious
often caused by a third variable ( a variable that causes both X and Y)
mismatch between correlations and causal relations is possible. correlation can be positive when the real relationship is negative (can happen when looking between units)

Question 15

Q

inferences about rho and CIs

Answer

A

the higher rho is, the more negatively skewed the sampling distribution of r becomes
the higher rho is, the more the estimates (r) are underestimates of rho
for any rho other than 0, the correlation coeff tends to be biased (it underestimates rho)
the higher rho is, the greater the bias is
as N increases, the more precise the estimates become
r is a consistent estimator: with higher N, you get more and more precise estimates of the population value. so the bias is okay because it can be overcome by collecting a bigger sample
skewness for all values of rho except 0 causes issues for making inferences about rho (CIs are difficult to build because the sampling distribution of r is not normal)
fisher r to z transformation: used to transform observed r into a z and place limits around the z. this extends out the tail to make it normal. these limits are transformed back into correlation coefficients to give CIs around r

r is a biased and consistent estimator of rho
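
The Fisher r-to-z steps can be sketched as follows, assuming the usual standard error of 1 / sqrt(N - 3) for z and a normal critical value. The values r = .50 and N = 103 are made up for the example.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for rho via the Fisher r-to-z transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)          # transform r to z
    se = 1 / math.sqrt(n - 3)  # standard error of z
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    lower_z = z - crit * se
    upper_z = z + crit * se
    # Transform the limits on z back into correlation coefficients.
    return math.tanh(lower_z), math.tanh(upper_z)

lower, upper = fisher_ci(r=0.50, n=103)
print(lower, upper)  # roughly .34 to .63
```

The limits are symmetric around z but not around r: the interval reaches further below .50 than above it, which reflects the skew described on this card.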

Question 16

Q

inferences about rho: NHST

Answer

A

NHST can also be used to make inferences about rho
the null is almost always rho = 0
if the p is less than .05, we conclude there is a relationship between the two variables in the population
if the p is greater than .05, we conclude that we did not find a relationship
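
The card does not say how the p value is obtained. One common route, assumed here, converts r to a t statistic with N - 2 degrees of freedom and compares it with a critical value. The numbers r = .50 and N = 27 are hypothetical, and the critical value is read from a t table rather than computed.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

t, df = t_for_r(r=0.50, n=27)

# 2.060 is the two-tailed .05 critical value of t for df = 25 (from a t table).
critical_t = 2.060
reject_null = abs(t) > critical_t

print(t, df)
print(reject_null)  # True here, so p is below .05 for this example
```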

(16 cards)