II - Textbook Flashcards
Coefficient of determination
represents the proportion of the variance in one variable (x) that is accounted for by the other variable (y).
r² (the square of the correlation coefficient).
If the correlation between two variables (x and y) is 0.3, then 0.3 squared = 0.09, so 9% of the variance in x is accounted for by y.
The proportion of variance in x that is systematic variance shared with y.
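A minimal Python sketch of the same arithmetic, using made-up x and y scores (any data would do):
import numpy as np
x = [2, 4, 6, 8, 10]                  # hypothetical scores on variable x
y = [1, 4, 5, 9, 8]                   # hypothetical scores on variable y
r = np.corrcoef(x, y)[0, 1]           # Pearson correlation coefficient
r_squared = r ** 2                    # coefficient of determination: proportion of shared variance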
Statistical Significance can be influenced by
Sample size: the larger the sample, the more likely a correlation is to be statistically significant.
Magnitude of the correlation
P value
Partial Correlation
The correlation between two variables after the influence of a third variable is statistically removed.
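A minimal sketch of one common way to compute it, using the standard formula based on the three pairwise Pearson correlations (variable names are hypothetical):
import numpy as np

def partial_corr(x, y, z):
    # correlation between x and y with the influence of z statistically removed
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))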
Spearman Rank-order correlation
correlation between two variables when one or both of the variables is on an ordinal scale (the numbers reflect rank ordering).
E.g., the correlation between a teacher's ranking of students from best to worst (ordinal scale) and the students' IQ scores (interval scale): ask the teacher to rank the students in a class of 30 from 1 to 30 based on how intelligent the teacher thinks they are, then correlate those ranks with the students' actual measured IQs.
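A small sketch of the teacher-ranking example using SciPy's spearmanr (the data are invented):
from scipy import stats
teacher_rank = [1, 2, 3, 4, 5]            # teacher's ranking, 1 = judged most intelligent
iq_score     = [128, 121, 117, 110, 102]  # measured IQ scores
rho, p = stats.spearmanr(teacher_rank, iq_score)
# rho is strongly negative here only because rank 1 goes with the highest IQ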
Point-biserial correlation
used when one variable is dichotomous
Gender is dichotomous (male or female). To correlate gender with spatial memory you would assign all males a 1 and all females a 2.
If you get a significant positive correlation that would mean that females tend to score higher on spatial memory than males. A significant negative correlation would mean that males score higher.
1 dichotomous var, 1 interval/ratio
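A sketch of the gender / spatial memory example with SciPy's pointbiserialr (scores are hypothetical):
from scipy import stats
gender  = [1, 1, 1, 2, 2, 2]           # 1 = male, 2 = female (dichotomous)
spatial = [12, 15, 11, 18, 20, 17]     # spatial memory scores (interval)
r_pb, p = stats.pointbiserialr(gender, spatial)
# a positive r_pb here means the group coded 2 (females) tends to score higher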
Phi coefficient
used when both variables being correlated are dichotomous (e.g., gender, handedness, yes/no answer)
BOTH variables are dichotomous.
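Because phi is just the Pearson correlation computed on two dichotomous (0/1) variables, a minimal sketch with made-up data is:
import numpy as np
handedness = [0, 0, 1, 1, 0, 1, 0, 1]   # 0 = right-handed, 1 = left-handed
answer     = [0, 1, 1, 1, 0, 1, 0, 0]   # 0 = "no", 1 = "yes"
phi = np.corrcoef(handedness, answer)[0, 1]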
On-line outliers
Extreme on both variables and in line with the overall trend (e.g., the very top right of the scatterplot); INFLATES r.
off-line outliers
Extreme points that fall off the overall trend (e.g., the bottom right or top left of the scatterplot); DEFLATES r.
spurious correlation
correlation between two variables that is not due to any direct relationship between them but rather to their relation to other variables. if researchers think something is spurious, they’ll look for third variables
Factors that distort correlation coefficients
Restricted range
Outliers
Reliability of the measures (the less reliable the measures, the lower the coefficients)
How a restricted range distorts coefficients
Restricted range: the size of the correlation may be reduced by a restriction of the range in the variables being correlated.
A restricted range occurs when most participants have similar scores (less variability).
This can occur when you are correlating scores that are all either high or low on one variable.
E.g., if you correlate the SAT scores of people who get into college with their college GPA, you may be dealing with a restricted range, because those with higher SAT scores are usually the ones who get into college.
Must ensure you have a broad range of scores.
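A quick simulated illustration of the SAT example (the numbers are invented): the correlation computed only over the high-scoring "admitted" subset comes out noticeably smaller than the correlation over the full range.
import numpy as np
rng = np.random.default_rng(0)
sat = rng.normal(1000, 200, 500)                 # simulated SAT scores
gpa = 0.002 * sat + rng.normal(0, 0.3, 500)      # GPA linearly related to SAT plus noise
r_full = np.corrcoef(sat, gpa)[0, 1]             # correlation over the full SAT range
admitted = sat > 1100                            # restrict the range to high scorers only
r_restricted = np.corrcoef(sat[admitted], gpa[admitted])[0, 1]   # smaller than r_full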
Regression
Predict scores on one variable from scores on another variable
Use GRE scores to predict success in grad school
Regression line
A regression line is a straight line that summarizes the linear relationship between two variables.
The regression line minimizes the sum of the squared deviations around the line.
It describes how an outcome variable y changes as a predictor variable x changes.
A regression line is often used as a model to predict the value of the response y for a given value of the explanatory variable x.
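A minimal least-squares sketch with SciPy, predicting a hypothetical grad-school GPA from a GRE score:
from scipy import stats
gre      = [150, 155, 160, 165, 170]     # predictor (x)
grad_gpa = [3.0, 3.2, 3.5, 3.6, 3.9]     # outcome (y)
fit = stats.linregress(gre, grad_gpa)    # slope and intercept minimize the squared deviations
predicted = fit.slope * 162 + fit.intercept   # predicted GPA for a GRE of 162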
multiple regression
Multiple Regression is used when there is more than one predictor variable.
If you are predicting success in grad school you may use three predictor variables: GRE scores, University GPA, and IQ scores.
Then you can predict success in grad school based on all three predictors, which usually is more accurate than one predictor.
Allows the researcher to simultaneously consider the influence of all the predictor variables on the outcome variable.
standard multiple regression
Standard multiple regression (simultaneous multiple regression): enter all the predictor variables at the same time.
You can predict grad school success by entering GPA, GRE, and IQ score simultaneously.
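A sketch of standard (simultaneous) multiple regression with scikit-learn, entering all three hypothetical predictors at once:
from sklearn.linear_model import LinearRegression
X = [[310, 3.4, 120],    # each row: GRE, undergrad GPA, IQ for one applicant
     [320, 3.8, 130],
     [300, 3.1, 115],
     [325, 3.9, 128],
     [305, 3.3, 118]]
y = [3.2, 3.8, 3.0, 3.9, 3.3]            # grad school success (e.g., grad GPA)
model = LinearRegression().fit(X, y)      # all predictors entered simultaneously
r_squared = model.score(X, y)             # variance in the outcome accounted for by all three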
stepwise multiple regression
enter the predictor variables one at a time.
First enter the predictor variable that correlates the highest with the outcome variable.
Next, you enter the variable that relates most strongly to the outcome variable after the first variable is entered.
It will account for the largest amount of variance in the outcome variable after the first predictor variable is entered.
This may or may not be the variable with the second-highest correlation. If the variable with the second-highest correlation is itself highly correlated with the first variable, then it may not predict a unique amount of the variance in the outcome variable.
Enter the strongest predictor variable first; then add the predictor variable that contributes most strongly to the criterion variable GIVEN THAT THE FIRST PREDICTOR VARIABLE IS ALREADY IN THE EQUATION (see p. 166, second-to-last paragraph).
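A rough forward-selection sketch of the stepwise idea (a full stepwise procedure would also test whether each added predictor contributes significantly before keeping it); the data and predictor names are hypothetical:
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise(X, y):
    # add predictors one at a time, each time picking the column that raises R^2
    # the most GIVEN the predictors already in the equation
    remaining, chosen = list(range(X.shape[1])), []
    while remaining:
        r2, best = max((LinearRegression().fit(X[:, chosen + [j]], y)
                        .score(X[:, chosen + [j]], y), j) for j in remaining)
        chosen.append(best)
        remaining.remove(best)
    return chosen   # order in which the predictors entered the equation

X = np.array([[310, 3.4, 120], [320, 3.8, 130], [300, 3.1, 115],
              [325, 3.9, 128], [305, 3.3, 118]])   # GRE, GPA, IQ
y = np.array([3.2, 3.8, 3.0, 3.9, 3.3])
entry_order = forward_stepwise(X, y)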
hierarchical multiple regression
enter the predictor variables in a predetermined order, based on hypotheses the researcher wants to test.
Can partial out the effects of predictor variables entered in early steps to see whether other predictor variables still contribute uniquely to the variance in the outcome variable.
Predictor variables are entered in a predetermined order to see whether they have any UNIQUE effects; the order is based on a hypothesis the researcher wants to test.
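A sketch of the hierarchical idea with scikit-learn: fit the block of predictors entered first, then add the predictor of interest and look at the change in R² (the data and the entry order are hypothetical):
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[310, 3.4, 120], [320, 3.8, 130], [300, 3.1, 115],
              [325, 3.9, 128], [305, 3.3, 118]])   # GRE, GPA, IQ
y = np.array([3.2, 3.8, 3.0, 3.9, 3.3])            # grad school success
step1 = LinearRegression().fit(X[:, :2], y)        # block 1: GRE and GPA only
step2 = LinearRegression().fit(X, y)               # block 2: add IQ
r2_change = step2.score(X, y) - step1.score(X[:, :2], y)   # unique variance added by IQ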