Relationship between Variables: Correlation and Regression Flashcards
We are interested in finding a way to represent association between scores.
association
The Regression Line
The first and most obvious way to summarize data when we are examining the relationship between two variables.
Scatterplot
We put one variable on the x-axis and another on the y-axis, and we draw a point for each person showing their scores on the _____
two variables.
When we want to tell people about our results, we don’t have to draw a lot of _____
scatterplots
Children were asked to listen to a word and repeat it. They were then asked which of these 3 words started with the same sound.
X
Initial phoneme detection
A reading score, a standard measure of reading ability.
Y
British Ability Scale (BAS)
We usually summarize and represent the relationship between two variables with a number
correlation coefficient
We also calculate the ______ for this number, and we want to be able to find out if the relationship is statistically significant.
Thus, we want to know the _______ of finding a relationship at least this strong if the null hypothesis (that there is no relationship in the population) is true.
confidence intervals
probability
A best-fitting line used for prediction.
Line of best fit or Regression Line
Predicting the _____ in Y as a function of the _____ in X.
variation
How steep the line is.
slope
The position or height of the line.
intercept
By ____ we give the height at the point where the line hits the y-axis.
convention
The height is called the ____ or often just the _____ (or sometimes the constant).
y-intercept or intercept
The intercept represents the expected score of a person who scored zero on the ______
x-axis variable.
It is often the case that the intercept doesn't make any sense. After all, usually no one scores ____
0 or close to 0.
We can use the two values of ______ to calculate the expected value of any person's score on Y, given their score on X.
slope and intercept
Formula for the expected Y score
Expected Y score = intercept + slope × (score on X)
Where x is the x-axis variable. This equation is called the ______
regression equation.
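Worked example (a minimal sketch, not part of the original cards): the regression equation written as a small Python function. The name expected_y and the intercept and slope values are hypothetical, chosen only for illustration.

```python
def expected_y(intercept: float, slope: float, x: float) -> float:
    """Expected Y score = intercept + slope * (score on X)."""
    return intercept + slope * x


# Hypothetical intercept and slope, for illustration only.
intercept = 2.0
slope = 0.5

# Expected Y for a person who scored 10 on X.
predicted = expected_y(intercept, slope, 10.0)

# A person who scored 0 on X is expected to score exactly the intercept.
at_zero = expected_y(intercept, slope, 0.0)
```

Setting x to 0 returns the intercept, which is why the intercept is read as the expected score of a person who scored zero on the x-axis variable.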
Making Sense of Regression Lines
Thinking about the relationship between ______ can be very useful.
two variables
We can make a ____ about one score from the other score.
prediction
Problem: if we don’t understand the scale(s), regression lines and equations are _____
meaningless
When there is a relationship between two variables, we can _____ one from the other.
We cannot say that one _____ the other.
predict
explains
The correlation coefficient
We need some way of making the scales have some sort of meaning, and the way to do this is to convert the data into _____
standard deviation units.
Thus we could ask: “If the score on ___ is one SD higher, how many SDs higher would we expect the ____score to be?”
x
y
Talking in terms of SDs means that we are talking about _____
standardized scores
Because we are talking about standardized regression slopes, we call it the ______
standardized slope.
Correlation coefficient – a more important name for the ______
standardized slope.
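Worked example (a sketch under stated assumptions): converting both variables to standard deviation units and then fitting the slope gives the standardized slope. The data values and the helper names standardize and least_squares_slope are hypothetical, and the SD is assumed to be the population SD (dividing by N).

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to standardized scores (SD units)."""
    m, s = mean(values), pstdev(values)  # population SD: an assumption
    return [(v - m) / s for v in values]


def least_squares_slope(xs, ys):
    """Slope of the best-fitting line for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# "If the score on x is one SD higher, how many SDs higher would we
# expect the y score to be?"
standardized_slope = least_squares_slope(standardize(xs), standardize(ys))
```

For these made-up scores the standardized slope is about 0.77: a person one SD higher on x is expected to be about 0.77 SDs higher on y.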
Where σx is the SD of the variable on the x-axis (the horizontal one) of the scatterplot, σy is the SD of the variable on the y-axis (the vertical one), and r is the correlation.
The letter r actually stands for ______, but most people ignore that because it is confusing.
regression
If we know the slope (β), we can calculate the correlation using the formula:
r = β × σx / σy
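Worked example (a sketch, not the original authors' code): the formula r = β × σx / σy applied to the raw, unstandardized slope, then checked against a correlation calculated directly from the covariance. The data values and the helper name least_squares_slope are hypothetical, and the population SD is assumed.

```python
from statistics import mean, pstdev


def least_squares_slope(xs, ys):
    """Slope of the best-fitting line for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

beta = least_squares_slope(xs, ys)             # raw slope
r_from_slope = beta * pstdev(xs) / pstdev(ys)  # r = beta * sd_x / sd_y

# Cross-check: correlation calculated directly from the covariance.
mx, my = mean(xs), mean(ys)
covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
r_direct = covariance / (pstdev(xs) * pstdev(ys))
```

Both routes give the same value (about 0.77 for these made-up scores), so the correlation can be recovered from the slope and the two SDs alone.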
Residual
In correlation, we want to know how well the ______ line fits the data.
That is, how far away the points are from the line.
regression
The closer the points are to the ____, the stronger the relationship between the two variables. (How do we measure this?)
line
When we had one variable and we wanted to know the spread of the points around the mean, we calculated the ____
SD (σ)
The square of the SD is the ____
variance
We can do the same thing with our regression data, but instead of making d the difference between the mean and the score, we can make it the difference between the value that we would expect the person to have, given their score on the x-variable, and the score they actually got. We can calculate their predicted scores, using:
y = b0 + b1x
For each person, we can therefore calculate their predicted BAS reading score, and the difference between their predicted score and their actual score. The difference is called the _____
Residual.
The difference between the score they got and the score we thought they would get based on their initial score.
residual score
If we want to calculate the equivalent of the variance, we need to ____ each person's residual score.
square
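Worked example (a minimal sketch): residuals and their variance for a fitted line y = b0 + b1x. The data values are made up for illustration and are not BAS scores; dividing by N rather than N - 1 is an assumption.

```python
from statistics import mean

# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Fit the line y = b0 + b1 * x by least squares.
mx, my = mean(xs), mean(ys)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
b0 = my - b1 * mx

# Predicted score for each person, given their score on x.
predicted = [b0 + b1 * x for x in xs]

# Residual = actual score minus predicted score.
residuals = [y - p for y, p in zip(ys, predicted)]

# Square each residual, then average (dividing by N is an assumption).
squared_residuals = [res ** 2 for res in residuals]
residual_variance = sum(squared_residuals) / len(squared_residuals)
```

The residuals sum to zero for a least-squares line, which is why they are squared before averaging: otherwise the positive and negative distances from the line would cancel out.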
The value of the standardized slope and the value of the square root of the proportion of variance explained will ____ be the same value.
always
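Worked example (a sketch under stated assumptions): the proportion of variance explained is taken here to be 1 minus the residual sum of squares divided by the total sum of squares, and its square root is compared with the standardized slope. The data values are made up for illustration. Note that a square root is never negative, so when the slope is negative the two values match in size but not in sign.

```python
from math import sqrt
from statistics import mean, pstdev

# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Fit the regression line by least squares.
mx, my = mean(xs), mean(ys)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
    (x - mx) ** 2 for x in xs
)
b0 = my - b1 * mx

# Residual and total sums of squares.
ss_residual = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ss_total = sum((y - my) ** 2 for y in ys)

# Proportion of variance in y explained by x.
proportion_explained = 1 - ss_residual / ss_total

# Standardized slope (the correlation), from the raw slope and the SDs.
standardized_slope = b1 * pstdev(xs) / pstdev(ys)

root_of_proportion = sqrt(proportion_explained)
```

For these made-up scores the proportion of variance explained is 0.6, and its square root, about 0.77, equals the standardized slope.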