Relationship between Variables: Correlation and Regression Flashcards

1
Q

We are interested in finding a way to represent association between scores.

A

association

2
Q

The Regression Line

The first and most obvious way to summarize data when we are examining the relationship between two variables.

A

Scatterplot

3
Q

The Regression Line

We put one variable on the x-axis and another on the y-axis, and we draw a point for each person showing their scores on the _____

A

two variables.

4
Q

The Regression Line

When we want to tell people about our results, we don’t have to draw a lot of _____

A

scatterplots

5
Q

Children were asked to listen to a word and repeat it. They were then asked which of three words started with the same sound.

A

X

Initial phoneme detection

6
Q

reading score, a standard measure of reading ability.

A

Y

British Ability Scale (BAS)

7
Q

We usually summarize and represent the relationship between two variables with a number

A

correlation coefficient

8
Q

We also calculate the ______ for this number, and we want to be able to find out if the relationship is statistically significant.

Thus, we want to know what is the _______ of finding a relationship at least this strong if the null hypothesis that there is no relationship in the population is true.

A

Confidence Intervals

probability

9
Q

a best fitting line used for prediction.

A

Line of best fit or Regression Line

10
Q

Predicting the _____ in Y as a function of the _____ in X.

A

variation

11
Q

How steep the line is.

A

slope

12
Q

the position or height of the line.

A

intercept

13
Q

By ____ we give the height at the point where the line hits the y-axis.

A

convention

14
Q

The height is called the _____, or often just the _____ (or sometimes the constant).

A

y-intercept or intercept

15
Q

The intercept represents the expected score of a person who scored zero on the ______

A

x-axis variable.

16
Q

It is often the case that the intercept doesn’t make any sense. After all, usually no one scores _____

A

0 or close to 0.

17
Q

We can use the two values of _____ to calculate the expected value of any person’s score on Y, given their score on X.

A

slope and intercept

18
Q

formula for Expected Y score

A

Expected Y score = intercept + slope × (score on X)
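
A minimal sketch of this formula in Python. The intercept and slope values are made up for illustration and do not come from the reading data; the function name `expected_y` is likewise hypothetical.

```python
def expected_y(intercept: float, slope: float, x: float) -> float:
    """Expected Y score = intercept + slope * (score on X)."""
    return intercept + slope * x


# Hypothetical values, chosen only for illustration.
intercept = 2.0
slope = 0.5

# A person scoring 10 on X is expected to score 2.0 + 0.5 * 10 = 7.0 on Y.
print(expected_y(intercept, slope, 10.0))

# A person scoring 0 on X is expected to score exactly the intercept.
print(expected_y(intercept, slope, 0.0))
```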

19
Q

Where x is the x-axis variable. This equation is called the ______

A

regression equation.

20
Q

Making Sense of Regression Lines

Thinking about the relationship between _____ can be very useful.

A

two variables

21
Q

Making Sense of Regression Lines

We can make a _____ about one score from another score.

A

prediction

22
Q

Problem: if we don’t understand the scale(s), regression lines and equations are _____

A

meaningless

23
Q

When there is a relationship between two variables, we can _____ one from the other.

We cannot say that one _____ the other.

A

predict

explains

24
Q

The correlation coefficient

We need some way of making the scales have some sort of meaning, and the way to do this is to convert the data into _____

A

standard deviation units.

25
Thus we could ask: “If the score on _____ is one SD higher, how many SDs higher would we expect the _____ score to be?”
1. x 2. y
26
Talking in terms of SDs means that we are talking about _____
standardized scores
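A minimal sketch of converting raw scores to standardized scores. The raw scores are made up for illustration, and the sample SD (dividing by N - 1) is an assumption about which SD is intended.

```python
from statistics import mean, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores

m = mean(scores)
sd = stdev(scores)  # sample SD (divides by N - 1)

# Each standardized score says how many SDs a raw score lies from the mean.
z_scores = [(s - m) / sd for s in scores]
print([round(z, 2) for z in z_scores])
```

After the conversion the scores have a mean of 0 and an SD of 1, whatever the original scale was.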
27
Because we are talking about standardized regression slopes, we call it the _____
standardized slope.
28
Correlation coefficient – a more important name for the ______
standardized slope.
29
Where σx is the SD of the variable on the x-axis (the horizontal one) of the scatterplot, σy is the SD of the variable on the y-axis (the vertical one), and r is the correlation.
30
The letter r actually stands for ______, but most people ignore that because it is confusing.
regression
31
If we know the slope, we can calculate the correlation using the formula:
r = β × σx / σy
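A minimal sketch of this conversion, using made-up paired scores. The slope here is the ordinary least-squares slope, which is an assumption about how β was obtained.

```python
from statistics import mean, stdev

# Hypothetical paired scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)

# Unstandardized least-squares slope:
# sum of cross-products divided by the sum of squares of x.
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx

# Standardize the slope: r = slope * SD(x) / SD(y).
r = slope * stdev(x) / stdev(y)
print(round(slope, 3), round(r, 3))
```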
32
Residual: In correlation, we want to know how well the _____ line fits the data, that is, how far away the points are from the line.
regression
33
The closer the points are to the _____, the stronger the relationship between the two variables. (How do we measure this?)
line
34
When we had one variable and we wanted to know the spread of the points around the mean, we calculated the _____
SD (σ)
35
The square of the SD is the ____
variance
36
We can do the same thing with our regression data, but instead of making d the difference between the mean and the score, we can make it the difference between the value that we would expect the person to have, given their score on the x-variable, and the score they actually got. We can calculate their predicted scores, using:
y = b0 + b1x
37
For each person, we can therefore calculate their predicted BAS reading score, and the difference between their predicted score and their actual score. The difference is called the _____
residual
38
The difference between the score they got and the score we thought they would get based on their initial score.
residual score
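A minimal sketch of calculating residuals. The scores and coefficients are made up, and taking the residual as actual minus predicted is an assumed sign convention.

```python
# Hypothetical scores and regression coefficients, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least-squares intercept and slope for these data

# Predicted score for each person: y = b0 + b1 * x.
predicted = [b0 + b1 * xi for xi in x]

# Residual = actual score minus predicted score.
residuals = [yi - pi for yi, pi in zip(y, predicted)]
print([round(res, 2) for res in residuals])
```

With a least-squares line the residuals sum to (approximately) zero, which is why they are squared before being added up.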
39
If we want to calculate the equivalent of the variance, we need to _____ each person’s residual score.
square
40
The value of the standardized slope and the value of the square root of the proportion of variance explained will _____ be the same value.
always
41
We therefore have two equivalent ways of thinking about correlation. The first way is the _____. It is the expected increase in one variable, when the other variable increases by 1 SD.
standardized slope
42
We therefore have two equivalent ways of thinking about correlation. The second way is the _____. If you square a correlation, you get the _____ in one variable that is explained by the other variable.
proportion of variance
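A minimal sketch checking that the squared correlation equals the proportion of variance explained, using made-up scores and a least-squares line (both are assumptions for illustration).

```python
from statistics import mean

# Hypothetical paired scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)  # total sum of squares of y

r = sxy / (sxx * syy) ** 0.5

# Least-squares line and its squared residuals.
b1 = sxy / sxx
b0 = my - b1 * mx
ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Proportion of the variance in y that the line accounts for.
proportion_explained = 1 - ss_residual / syy
print(round(r ** 2, 3), round(proportion_explained, 3))
```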
43
Interpreting Correlations: A correlation is both _____ and _____
descriptive and inferential statistics
44
We can find the probability estimate and we can also use it to describe the ____
strength of the relationship.
45
strength of relationship
magnitude
46
positive, negative, curvilinear etc.
direction
47
What these guidelines are called:
* r = 0.1 = small correlation
* r = 0.3 = medium correlation
* r = 0.5 = large correlation
Note that these only really apply in what are called the social and behavioral sciences.
Cohen's effect size
48
Common mistake in interpreting correlations:
* A correlation around 0.5 is a _____ (1).
* A correlation does not have to _____ (2) 0.5 to be large.
* If you have a correlation of r = 0.45, you have a correlation which is approximately _____ (3) to a large correlation.
* It’s not a _____ (4) correlation just because it hasn’t quite reached 0.5.
1. large correlation 2. exceed 3. equal 4. medium
49
Calculating the correlation coefficient: also known as the Pearson product-moment correlation.
Pearson Correlation Coefficient
50
The Pearson correlation coefficient was developed by _____
Karl Pearson
51
The Pearson correlation is a _____ correlation and makes the same assumptions as other _____ tests.
Parametric
52
The Pearson correlation coefficient is used with _____ data.
continuous and normally distributed
53
the moment is the length from the fulcrum multiplied by the weight on the lever.
physics
54
the total moment is equal to the length from the center, multiplied by the weight.
seesaw analogy
55
The same principle applies with _____
correlation
56
We find the length from the center for each of the variables. In this case the center is the _____
mean
57
So, we calculate the difference between ______ and _____ for each variable (these are the moments) and then we multiply them together (this is the product).
the score and the mean
58
Because this value is _____ on the number of people, we need to divide it by N.
dependent
59
And because it is related to the _____, we actually divide by N-1. This is called the _____, if we call the two variables x and y.
1. standard deviation 2. covariance
60
Finally, finding _____ is laborious, and we do not want to do it more than we have to.
square roots
61
So instead of finding the square roots and then multiplying them together, it is easier to ______ together, and then find the square root.
multiply the two values
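A minimal sketch of the calculation described in the last few cards: deviations from the mean, their products, division by N - 1 to get the covariance, and a single square root of the product of the two variances. The scores are made up for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical paired scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = mean(x), mean(y)

# Covariance: products of the deviations from each mean, divided by N - 1.
covariance = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# Variances of x and y (squares of the SDs).
var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)

# Multiply the two variances first, then take one square root,
# instead of taking two square roots and multiplying them.
r = covariance / sqrt(var_x * var_y)
print(round(covariance, 3), round(r, 3))
```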
62
Importance of the scattergraph or plot: It will show us approximately what the correlation should be. So if it looks like a strong _____ and our analysis shows it is -0.60, we have made a mistake.
positive correlation
63
Importance of the scattergraph or plot: It will help us detect any _____ in our data, for example data entry errors.
errors
64
Importance of the scattergraph or plot: It will help us get a feel for our _____
data
65
The _____ for a statistic tell us the likely range of a value in the population.
confidence intervals
66
Calculating confidence intervals: The sampling distribution of the correlation is _____
tricky
67
Calculating confidence intervals: The sampling distribution is not symmetrical, which means we can’t _____ and _____ to get the CIs in the usual way.
add and subtract
67
Calculating confidence intervals for the Pearson correlation: the transformation used to make the distribution symmetrical.
Fisher’s z transformation
68
Calculating confidence intervals for the Pearson correlation: the transformed values are used to calculate the CIs, which are then transformed back to _____
correlations
68
It is called a _____ because it makes the distribution of the correlation into a z distribution, which is a normal distribution with a mean of 0 and SD of 1.
z transformation
69
step _ Carry out Fisher’s transformation.
step 1
70
step _ calculate the Standard Error
step 2
71
step __ And now the CIs. We use the formula: CI = z′ ± zα/2 × se
step 3
72
Where zα/2 is the value for the ______ which includes the percentage of values that we want to cover.
normal distribution
73
the value for the 95% confidence is (as always) ____
1.96
74
Step _ Convert back to correlation.
step 4
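The four steps above can be sketched as follows. The correlation and sample size are made up for illustration, and the standard error uses the standard formula 1 / sqrt(N - 3) for Fisher's z, which the cards do not spell out.

```python
from math import atanh, sqrt, tanh

r = 0.5  # hypothetical sample correlation
n = 28   # hypothetical sample size

# Step 1: Fisher's z transformation (the inverse hyperbolic tangent of r).
z_prime = atanh(r)

# Step 2: standard error of z' (standard formula, assumed here).
se = 1 / sqrt(n - 3)

# Step 3: confidence limits on the z scale, using 1.96 for 95% confidence.
z_crit = 1.96
lower_z = z_prime - z_crit * se
upper_z = z_prime + z_crit * se

# Step 4: convert the limits back to correlations.
lower_r = tanh(lower_z)
upper_r = tanh(upper_z)
print(round(lower_r, 3), round(upper_r, 3))
```

The resulting interval is not symmetrical around r: the lower limit is further from 0.5 than the upper limit is.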
75
If we really want to know the _____, then we can convert the value for r into a value for t.
p-value
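A minimal sketch of the conversion from r to t, using the standard formula t = r × sqrt(N - 2) / sqrt(1 - r²) with N - 2 degrees of freedom. The cards do not give the formula, so treat it as an assumption; the values of r and N are made up.

```python
from math import sqrt

r = 0.5  # hypothetical sample correlation
n = 28   # hypothetical sample size

df = n - 2  # degrees of freedom for the t test of a correlation
t = r * sqrt(df) / sqrt(1 - r ** 2)
print(df, round(t, 3))
```

The p-value is then read from the t distribution with `df` degrees of freedom; this sketch stops at t because the Python standard library has no t distribution.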
76
When we know the correlation we can also calculate the position of the _______
regression line.
77
We can use the two values _______ to create a regression equation which will allow us to predict y _____ from x ______.
1. slope and intercept 2. (display behavior) 3. (desirability)
78
If variables are both dichotomous (for example, yes/no, top, bottom) we can use the ____
Pearson correlation formula.
78
If one of your variables is continuous and the other is dichotomous, we can use the _____
Point Biserial Formula
79
This is when one variable is categorical and has just two all-inclusive values. Examples: Male/Female, Car owner/Non-Car owner, and so on.
Point Biserial Correlation
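A minimal sketch showing that the point-biserial correlation matches a Pearson correlation computed with the dichotomous variable coded 0/1. The data, the 0/1 coding and the helper name `pearson_r` are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical data: group membership coded 0/1 and a continuous score.
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 6, 7, 8]


def pearson_r(a, b):
    """Pearson correlation from sums of squares and cross-products."""
    ma, mb = mean(a), mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


r_pearson = pearson_r(group, score)

# Point-biserial formula: difference between the group means, divided by
# the SD of all scores (population form), times sqrt(p * q).
scores_0 = [s for g, s in zip(group, score) if g == 0]
scores_1 = [s for g, s in zip(group, score) if g == 1]
p = len(scores_1) / len(score)
q = len(scores_0) / len(score)
r_pb = (mean(scores_1) - mean(scores_0)) / pstdev(score) * sqrt(p * q)

print(round(r_pearson, 3), round(r_pb, 3))
```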
80
Non-Parametric Correlations: used when the data do not satisfy the assumptions of the Pearson correlation because they are not normally distributed or are only ordinal in nature.
Spearman correlation and Kendall correlation
81
Three ways to deal with this problem:
* Ignore it. It does not make a lot of difference.
* Use the Pearson formula on the ranks (although the calculation is harder than the Spearman formula).
* Use a correction.
82
If we use a non-parametric test, such as a _____, we tend to lose power.
Spearman correlation
83
Although we could be strict and say that rating data are strictly measured at an ordinal level, in reality when there isn’t a problem with the distributions, we would always prefer to use a ____
Pearson Correlation
84
A _____ gives a better chance of a significant result.
Pearson correlation
85
A curious thing about the _____ is how to interpret it.
Spearman
86
We can’t say that it is the _____, that is the relative difference in the SDs, because the SDs don’t really exist as there is not necessarily any relationship between the score and the SD.
standardized slope
87
We also can’t say that it is the ____ explained, because the variance is a parametric term, and we are using ranks.
proportion of variance
88
All we can really say about the Spearman is that it is the Pearson correlation between the ___
ranks
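A minimal sketch of the Spearman correlation as a Pearson correlation between ranks. The data are made up, the helper names are hypothetical, and the ranking function assumes there are no tied scores.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation from sums of squares and cross-products."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


# Hypothetical data: y always rises with x, but not in a straight line.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]

pearson = pearson_r(x, y)
spearman = pearson_r(ranks(x), ranks(y))
print(round(pearson, 3), round(spearman, 3))
```

Because the ranks of x and y agree perfectly here, the Spearman correlation is 1 even though the Pearson correlation on the raw scores is lower.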
89
An alternative nonparametric correlation, which does have a more sensible interpretation (advantage: meaningful interpretation). Very rarely used, however.
Kendall’s Tau-a (τ – Greek Letter)
90
Kendall’s Tau-a is rarely used for two reasons:
* It is difficult to calculate if you do not have a computer.
* It is typically lower than a Spearman correlation for the same data (although the p-values are typically very similar). Because people like their correlations to be high, they tend to use it less.
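A minimal sketch of Kendall's tau-a, computed as the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. The data and the function name `kendall_tau_a` are made up for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; ties count as neither."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y order the two people the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical ranks for five people on two variables.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

tau = kendall_tau_a(x, y)
print(tau)
```

For these ranks, 8 of the 10 pairs are concordant and 2 are discordant, so tau-a is 0.6, while the Spearman correlation for the same ranks works out to 0.8.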
91
The fact that two variables correlate does not mean there is a causal relationship between them, though it is often very tempting to believe that there is.
Correlation and Causality
92
Correlation does not mean _____, but _____ does mean correlation.
causality
93
In general, if one variable is a purely category-type measure, then correlation cannot be carried out, unless the variable is _____
dichotomous
94
Correlation is also a measure of association between _____
two variables
95
What we can do with nominal/categorical data is reduce the measured variable to nominal level and conduct a _____ on the resulting frequency table.
chi-square test
96
A lack of relationship is signified by a value close to ___
zero
97
A value of zero however could occur for a ____
curvilinear relationship.
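A minimal sketch of a perfect curvilinear relationship that still gives a Pearson correlation of zero. The data are made up for illustration, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation from sums of squares and cross-products."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


# Hypothetical data: y is completely determined by x (y = x squared),
# but the relationship is U-shaped rather than a straight line.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)
```

This is one reason to look at the scatterplot before trusting a correlation close to zero.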
98
Strength is a measure of the _____
correlation.