Chapter 7 / Week 6 Flashcards
correlational research
used to describe the relation between two or more naturally occurring variables
Ex: Is income related to age? Is self-confidence related to GPA? Is depression related to anxiety?
covary
vary or change together
correlation coefficient
a statistic that indicates the degree to which two variables are related to one another in a linear fashion
pearson correlation coefficient (r)
the most commonly used measure of correlation
Ranges from -1.00 to +1.00
Most commonly used; when a correlation is reported without specifying the type, the Pearson coefficient is usually what is meant
The magnitude or numerical value of a correlation expresses the strength of the relation between the two variables
When r = .00 the variables are not related
A correlation of 0.78 indicates that the variables are more strongly related than does a correlation of 0.30
Magnitude is unrelated to the sign of the correlation; two variables with a correlation of 0.78 are just as strongly related as two variables with a correlation of -0.78.
sign of the correlation coefficient
indicates the direction of the relation between the two variables
Variables can be either positively or negatively related
positive correlation
negative correlation
positive correlation
a direct, positive relation between two variables; as one increases the other increases
negative correlation
an inverse, negative relation between two variables; as one variable increases the other decreases
perfect correlation vs no correlation
When there is a perfect correlation (r = +1.00 or -1.00), all of the data points fall on a straight line
- Should almost never happen with real data; if it does, something is probably wrong
no correlation – scattered points
when r = .00
A correlation of .00 indicates that there is no linear relation between the two variables. However, there could be a curvilinear relation between them.
Ex: anxiety and performance; the relation is curvilinear (an inverted U), so r is near zero even though the variables are related
What are the components of the equation to calculate r?
x and y: each participant's scores on the two variables
Σxy: multiply each participant’s x- and y-scores together, then sum these products across all participants
(Σx)(Σy): sum all participants’ x-scores, sum all participants’ y-scores, then multiply these two sums
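These components plug into the standard raw-score formula for r. A minimal Python sketch with made-up scores (the data and variable names are illustrative, not from the cards):

```python
import math

# Hypothetical scores for 5 participants on two variables (x, y)
x = [2, 4, 5, 7, 9]
y = [60, 65, 72, 80, 88]

n = len(x)
sum_x = sum(x)                             # Σx
sum_y = sum(y)                             # Σy
sum_xy = sum(a * b for a, b in zip(x, y))  # Σxy: each pair multiplied, then summed
sum_x2 = sum(a ** 2 for a in x)            # Σx^2
sum_y2 = sum(b ** 2 for b in y)            # Σy^2

# Raw-score formula for the Pearson correlation coefficient
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 2))  # close to +1.00 for these strongly related made-up scores
```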
Coefficient of determination
r^2 or the square of the correlation coefficient
proportion of variance in one variable that is accounted for by the other variable
- It ranges from 0 to 1
- The closer to 1, the more of the variance in the target variable is accounted for by the other variable
- An indicator of systematic variance
If the correlation between children's and parents' IQ scores is .40
The coefficient of determination is .40^2 = .16. That is, 16% of the variance in children's IQ scores can be accounted for by their parents' scores
In other words, 16% of the total variation in children's IQ scores is systematic variance, that is, variance related to the parents' IQ scores
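Restating that arithmetic as a tiny Python check (values taken straight from the example):

```python
r = 0.40            # correlation between children's and parents' IQ scores
r_squared = r ** 2  # coefficient of determination
print(round(r_squared, 2))  # 0.16 -> 16% of the variance is accounted for
```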
Statistical significance
a finding that is very unlikely to be due to error variance; exists when a correlation coefficient calculated on a sample has a very low probability of being zero in the population
A correlation coefficient is statistically significant when the correlation calculated on a sample has a very low probability of being zero in the population from which the sample came
What probability level (we call this alpha) is very low?
It depends on your field of study, prior research and how much risk you want to take of being wrong
In psychology we generally pick an alpha level of .05. We consider 5% to be very low.
what must you do before you run any statistical test
you must first determine your alpha level, which is also called the "significance level." This is the probability of making a wrong decision (a Type I error) that you are willing to accept
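A minimal sketch of this workflow using SciPy's pearsonr on made-up data (pearsonr returns r together with a two-tailed p-value, which you compare against the alpha chosen in advance):

```python
from scipy import stats

# Hypothetical scores for 10 participants
x = [2, 4, 5, 7, 9, 3, 6, 8, 1, 5]
y = [60, 65, 72, 80, 88, 59, 75, 83, 55, 70]

alpha = 0.05                       # significance level, chosen BEFORE running the test
r, p_value = stats.pearsonr(x, y)  # correlation and its two-tailed p-value

if p_value < alpha:
    print(f"r = {r:.2f} is statistically significant (p = {p_value:.3f} < {alpha})")
else:
    print(f"r = {r:.2f} is not statistically significant (p = {p_value:.3f} >= {alpha})")
```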
Directional hypothesis
predicts the direction of correlation (pos/neg)
Nondirectional hypothesis
predicts that the two variables will be correlated but does not specify the direction
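A nondirectional hypothesis calls for a two-tailed test, a directional hypothesis for a one-tailed test. A sketch, assuming SciPy 1.9+ (which added the alternative argument to pearsonr):

```python
from scipy import stats

x = [2, 4, 5, 7, 9, 3, 6, 8, 1, 5]
y = [60, 65, 72, 80, 88, 59, 75, 83, 55, 70]

# Nondirectional hypothesis ("x and y are correlated"): two-tailed test (the default)
r, p_two_tailed = stats.pearsonr(x, y)

# Directional hypothesis ("x and y are positively correlated"): one-tailed test
r, p_one_tailed = stats.pearsonr(x, y, alternative="greater")

print(p_two_tailed, p_one_tailed)  # the one-tailed p is half the two-tailed p here
```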
Statistical significance of r is affected by several factors:
sample size
effect size
significance level
sample size (affects SS)
i.e., power
An increase in your sample size will increase your power, that is, your ability to find a significant result
effect size (affects SS)
i.e., coefficient of determination
If the effect size is larger, you are more likely to find a significant result
significance level (affects SS)
i.e., alpha level
If the alpha level is set at a higher (more lenient) value (e.g., .05 rather than .01 or .001), you are more likely to find a significant result
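A small sketch tying the first two factors together: the same effect size (r = .30) tested at different sample sizes, using the standard t-test for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2):

```python
import math
from scipy import stats

r = 0.30                              # a fixed, moderate effect size
for n in (10, 30, 100):               # increasing sample sizes
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(t, df=n - 2)   # two-tailed p-value
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:3d}: p = {p:.3f} -> {verdict} at alpha = .05")
```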
3 factors that distort the correlation coefficient
Restricted range
Outliers
Reliability of measures
Restricted Range
data in which participants’ scores are confined to a narrow range of the possible scores on a measure
Having a restricted range artificially lowers correlations below what they would be if the full range of scores were present
Ex: sampling only one school to see whether SES correlates with grades
- Restricted range; you will not see a correlation because everyone in that school lives in the same zip code and has roughly the same SES
- Need to sample more schools from other areas
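A quick simulation sketch (NumPy, invented data) of how restricting the range pulls the observed r down:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2000)              # e.g., SES sampled across many neighborhoods
y = 0.6 * x + rng.normal(size=2000)    # grades, genuinely related to SES

r_full = np.corrcoef(x, y)[0, 1]

# Keep only a narrow slice of x, like sampling a single school / zip code
mask = (x > -0.25) & (x < 0.25)
r_restricted = np.corrcoef(x[mask], y[mask])[0, 1]

print(round(r_full, 2), round(r_restricted, 2))  # the restricted-range r is much smaller
```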
Outliers
A score is considered an outlier if it is more than 3 standard deviations away from the mean
On-line outliers
Off-line outliers
on-line outliers
fall in the same pattern as the rest of the data and tend to artificially inflate r
off-line outliers
fall outside of the pattern of the rest of the data and tend to artificially deflate r
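A small simulated sketch of both effects: the same extreme x-value inflates r when it sits on the pattern and deflates r when it sits off it (made-up data):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)

r_original = np.corrcoef(x, y)[0, 1]

# On-line outlier: extreme but consistent with the pattern -> inflates r
r_on_line = np.corrcoef(np.append(x, 10), np.append(y, 5))[0, 1]

# Off-line outlier: extreme and inconsistent with the pattern -> deflates r
r_off_line = np.corrcoef(np.append(x, 10), np.append(y, -5))[0, 1]

print(round(r_original, 2), round(r_on_line, 2), round(r_off_line, 2))
```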
Reliability of Measures
The more unreliable a measure is, the lower its correlations with other measures will be
If the true correlation between college aspirations and SAT scores is 0.45, but you use an aspiration scale that is unreliable, the obtained correlation will not be 0.45 but something lower, closer to .00.
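This attenuation can be written as observed r = true r * sqrt(reliability of X * reliability of Y), the classical correction-for-attenuation relation. A sketch using the .45 example with hypothetical reliabilities:

```python
import math

r_true = 0.45                  # true correlation between aspirations and SAT scores
reliability_aspiration = 0.30  # a very unreliable aspiration scale (hypothetical value)
reliability_sat = 0.90         # SAT scores are highly reliable (hypothetical value)

# Classical attenuation formula: r_observed = r_true * sqrt(rel_x * rel_y)
r_observed = r_true * math.sqrt(reliability_aspiration * reliability_sat)
print(round(r_observed, 2))    # about 0.23 -- well below the true 0.45
```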
Correlation and Causality
Correlation does not mean causation
Even when two variables have a perfect correlation of 1.00, we cannot conclude that one variable causes the other variable
criteria for inferring causality
covariation
directionality
extraneous variables
Correlational research satisfies the first (and sometimes the second) criterion, but never the third
Covariation
changes in one variable are associated with changes in the other variable; same as correlation (i.e., high school GPA → SAT score)
Directionality
the presumed causal variable preceded the presumed effect in time (i.e., smoking → lung cancer)
Extraneous variables
all other variables that may affect the relation between the two target variables are controlled or eliminated (think discrimination and depression?)
Spurious correlation
a correlation between two variables that is not due to any direct relationship between them but rather to their relation to other variables
Some other variance (z) may cause both x and y
Ex: a correlation between drinking and poor grades may be spurious if depression (z) causes both: students who are highly depressed do not do well in class, and they may also try to relieve depression by drinking
Partial correlation
the correlation between two variables with the influence of one or more other variables statistically removed
If a partial correlation between two variables (with the influence of a third variable removed) is significantly lower than the Pearson correlation between the two variables, then the correlation between them is at least partly due to the third variable
For example, the correlation between motivation and test performance may drop substantially once a third variable, such as study time, is partialed out. In that case the original correlation between motivation and test performance is at least partly a function of study time
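A minimal sketch of the first-order partial correlation formula (the variable names echo the motivation / test performance / study time example; the numbers are hypothetical):

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order (Pearson) correlations
r_motivation_performance = 0.50  # x with y
r_motivation_studytime = 0.60    # x with z
r_studytime_performance = 0.70   # y with z

r_partial = partial_corr(r_motivation_performance,
                         r_motivation_studytime,
                         r_studytime_performance)
print(round(r_partial, 2))  # about 0.14 -- much lower, so study time accounts for part of the link
```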
Other indices of correlation
Remember, the Pearson correlation coefficient (r) is only appropriate when both variables are continuous (i.e., measured on an interval or ratio scale)
spearman rank-order correlation
phi coefficient
point-biserial correlation
Spearman rank-order correlation
correlation used when variables are measured on an ordinal scale, that is when the numbers reflect the rank ordering of participants on some attribute
Ex: correlation between movies' star-rating rankings and their box office rankings
Phi coefficient
correlation used when both variables are dichotomous (i.e., nominal with only two categories)
Ex: correlation between gender and HS dropout
Point-biserial correlation
correlation used when only one of the variables is dichotomous
Ex: gender and IQ
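A sketch of computing each of these with SciPy on made-up data (for phi, Pearson's r applied to two 0/1-coded variables is mathematically the phi coefficient):

```python
from scipy import stats

# Spearman rank-order: two ordinal (ranked) variables
review_rank = [1, 2, 3, 4, 5, 6]
box_office_rank = [2, 1, 3, 5, 4, 6]
rho, p_rho = stats.spearmanr(review_rank, box_office_rank)

# Phi coefficient: two dichotomous variables coded 0/1
gender = [0, 0, 0, 1, 1, 1, 0, 1]
dropout = [0, 1, 0, 1, 0, 1, 0, 1]
phi, p_phi = stats.pearsonr(gender, dropout)

# Point-biserial: one dichotomous variable and one continuous variable
iq = [98, 105, 110, 95, 120, 102, 99, 115]
r_pb, p_pb = stats.pointbiserialr(gender, iq)

print(round(rho, 2), round(phi, 2), round(r_pb, 2))
```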