Chapter 6 Flashcards
Correlational Study
Examines the relationship between 2 or more measured variables (not manipulated or controlled by the experimenter)
Correlation
Statistical technique used to measure and describe the relationship between 2 variables
You can correlate any two variables as long as they are numerical (meaning they can be represented by numbers)
Why do we use a correlation coefficient?
It is used to make a prediction (if two variables are related, we can use one variable to predict the other; example: SAT scores and college success)
To measure reliability (Test-retest, alternate forms; example: is that dependable friend going to pick you up at 2am at the airport)
To measure validity (Does a test measure what it claims to? Check by correlating its scores with a related outcome; example: SAT and ACT scores related to college grades)
BUT: IT IS NOT A MEASURE OF CAUSALITY
What range do all correlations fall in, and what does the number mean?
-1.00 to +1.00
The absolute value shows the strength of the relationship
The higher the absolute value, the stronger the relationship
What is perfect correlation?
+/- 1.00 is the strongest possible relationship
On a graph, every point of a perfect correlation falls exactly on one straight line
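A quick check of the perfect-correlation claim, as a minimal Python sketch. The data are made up and `pearson_r` is a hypothetical hand-written helper (not a library function): when every point lies on one straight line, r comes out as exactly +1.00 or -1.00, and the sign follows the direction of the line.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from deviation scores (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
up = [2 * v + 1 for v in x]     # every point on the rising line Y = 2X + 1
down = [10 - 3 * v for v in x]  # every point on the falling line Y = 10 - 3X

print(round(pearson_r(x, up), 6))    # 1.0
print(round(pearson_r(x, down), 6))  # -1.0
```

The steepness of the line does not matter: any exact straight line gives an absolute value of 1.00.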
What does the sign of the correlation tell you?
It tells us the direction of the relationship between any two variables, X and Y
If the sign is positive: (the variables change in the same direction)
As X is increasing, Y is increasing
As X is decreasing, Y is decreasing
If the sign is negative: (the variables change in opposite directions)
As X is increasing, Y is decreasing
As X is decreasing, Y is increasing
What is the correlation coefficient?
r. On a scatterplot it is reflected by the spread of the points: the fatter the oval around the points, the lower (closer to zero) the correlation
What kind of best-fit line will there be if r equals zero?
It will be horizontal (a slope of zero), because there is no linear relationship: knowing X does not change the prediction for Y
Pearson correlation coefficient
r= the Pearson coefficient
r measures the degree to which the two variables (X and Y) vary together, taking into account how much they vary separately
It is a ratio
r= (degree to which X and Y vary together) / (degree to which X and Y vary separately)
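In the usual textbook computing form (a standard result, not written out on these cards), the numerator of this ratio is SP, the sum of products of deviations from the next card, and the denominator combines SS_X and SS_Y, the sums of squared deviations of X and of Y. The worked numbers below use hypothetical data chosen only for easy arithmetic.

```latex
r = \frac{SP}{\sqrt{SS_X \, SS_Y}}, \qquad
SS_X = \sum (X - \bar{X})^2, \qquad
SS_Y = \sum (Y - \bar{Y})^2

% Hypothetical data: X = 1, 2, 3, 4, 5 and Y = 2, 4, 5, 4, 5, so \bar{X} = 3 and \bar{Y} = 4
SP = (-2)(-2) + (-1)(0) + (0)(1) + (1)(0) + (2)(1) = 6
SS_X = 4 + 1 + 0 + 1 + 4 = 10, \qquad SS_Y = 4 + 0 + 1 + 0 + 1 = 6
r = \frac{6}{\sqrt{10 \cdot 6}} = \frac{6}{\sqrt{60}} \approx 0.77
```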
Sum of Products of Deviations (SP)
Definitional formula: $SP = \sum (X - \bar{X})(Y - \bar{Y})$
Computational formula: $SP = \sum XY - \frac{(\sum X)(\sum Y)}{n}$
where n is the number of (X, Y) pairs
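The two SP formulas always give the same answer; the computational one just avoids computing every deviation. A minimal Python sketch with hypothetical data and hypothetical function names, one function per formula:

```python
def sp_definitional(xs, ys):
    """SP = sum of (X - X bar)(Y - Y bar)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP = sum of XY - (sum of X)(sum of Y) / n, with n = number of (X, Y) pairs."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must come in pairs")
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(sp_definitional(xs, ys), sp_computational(xs, ys))  # 6.0 6.0
```

Here the sum of XY is 66 and (15)(20) / 5 = 60, so the computational formula gives 66 - 60 = 6, matching the deviation-based sum.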
r squared
Proportion (percentage) of the variance in Y accounted for by X; also called the coefficient of determination
This ranges from 0 to 1 (POSITIVE ONLY)
It cannot be negative, because squaring any number gives a result of zero or more
This number is a meaningful proportion (unlike Pearson's r itself)
It is used much like a measure of effect size
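A minimal Python sketch checking the "proportion of variance" reading of r squared, using hypothetical data and a hypothetical helper `variance_explained`. It relies on a standard fact for a least-squares line with an intercept: r squared equals 1 minus the leftover (residual) variability divided by the total variability of Y.

```python
import math


def variance_explained(xs, ys):
    """Return (r squared, 1 - SS_residual / SS_Y) for a simple least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    r = sp / math.sqrt(ss_x * ss_y)

    # Least-squares line Y hat = bX + a, then the variability it fails to explain
    b = sp / ss_x
    a = mean_y - b * mean_x
    ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - ss_residual / ss_y


r_sq, explained = variance_explained([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r_sq, 4), round(explained, 4))  # 0.6 0.6
```

With these numbers r is about 0.77 and r squared is 0.60, so X accounts for about 60% of the variance in Y.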
What are the limitations of Pearson’s r?
1. Correlation does not mean causation
2. Strength of the relationship
(Pearson's r does not directly give an interpretable strength of relationship; r squared, the coefficient of determination, does)
3. Outliers (extreme scores)
(scores with an extreme X and/or Y value can drastically affect Pearson's r)
4. Restriction of range
(a restricted range of measured values can lead to inaccurate conclusions about the data:
finding no correlation when there really is one
finding a correlation when there really isn't one)
What is regression?
Fitting a line to the data using an equation in order to describe and predict data
Simple regression
Uses just two variables (x and y)
Multiple regression
One Y and several X's. You are still predicting one outcome, but from multiple predictor variables (predictors, not necessarily causes)
Multiple regression often has more external validity, meaning it is closer to the real world, where most outcomes depend on several factors at once.
Linear regression
fits data to a straight line
Curvilinear regression
Fits the data with a curved line instead of a straight one. (Finding a best-fitting line or curve involves using geometry AND calculus.)
From geometry we know:
Any straight line can be described by an equation
Slope = change in Y per unit change in X
Y-intercept = where the line crosses the Y axis (when X = 0)
Statistics notation: $\hat{Y} = bX + a$
This is the same form as y = mx + b, but note the letters: here b is the slope and a is the Y-intercept
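A minimal Python sketch of fitting Y hat = bX + a by least squares, using hypothetical data and a hypothetical helper `fit_line`. It assumes the standard formulas b = SP / SS_X and a = Y bar - b(X bar), which are not spelled out on these cards. The second data set has r = 0, so the slope is 0 and the line is horizontal at the mean of Y.

```python
def fit_line(xs, ys):
    """Least-squares line Y hat = bX + a, with b = SP / SS_X and a = Y bar - b * X bar."""
    n = len(xs)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = sp / ss_x
    a = sum(ys) / n - b * sum(xs) / n
    return b, a


b, a = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 2), round(a, 2))  # 0.6 2.2
print(round(b * 6 + a, 2))       # Y hat for X = 6: 5.8

# With r = 0 the slope is 0, so the line is horizontal at Y bar
b0, a0 = fit_line([1, 2, 3, 4, 5], [2, 1, 2, 1, 2])
print(round(b0, 2), round(a0, 2))  # 0.0 1.6
```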
Y hat
Is the predicted value of Y, given a certain value of X
Strong, moderate, weak correlation?
Strong = about 0.8, Moderate = about 0.4, Weak = about 0.2 (absolute values; these are rough benchmarks, not strict cutoffs)
What does r = 0.0 look like?
The best-fit line is horizontal, and the points are scattered everywhere with no linear pattern
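Standard Error of the slope, SE(B)
It estimates how much the slope B would vary if we resampled over and over again, i.e., the amount of sampling variation in B. (Not to be confused with the standard error of the estimate, which measures the typical distance between the actual Y values and the predicted Y hat values.)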
Example:
If SE(B) = 0.2 and B itself is 1, then B is 1 / 0.2 = 5 standard errors away from zero. That means B = 1 is far from zero relative to its sampling variation.