[L8] Relationship between Variables: Correlation and Regression Flashcards

1
Q

We are interested in finding a way to represent ___
between scores.

A

association

2
Q

Types of Correlation

A

Bivariate; Multivariate Correlation

3
Q

Correlation does not prove __

A

Causality

4
Q

Multivariate correlations have more ___ validity

A

Ecological

5
Q

IGT & RMT = test of difference
Correlation = test of ___

A

correlation/association

6
Q

___ – first and most obvious way to summarize
data where we are examining the relationship between
two variables

A

Scatterplot

7
Q

We put one variable on the x-axis and the other on the y-axis,
and we ___ for each person, showing their
scores on the two variables.

A

draw a point

8
Q

A test of correlation involves administering ___ tests to the same group of participants.

A

2 or more different

9
Q

When we want to tell people about our results, we ____

A

don’t have to draw a lot of scatterplots.

10
Q

___ – Children were asked to listen
to a word and repeat it. They were then asked which of
three words started with the same sound.

A

Initial phoneme detection.

11
Q

___ reading score, a standard
measure of reading ability.

A

British Ability Scale (BAS)

12
Q

We usually summarize and represent the relationship
between two variables with a ___

A

number (correlation
coefficient).

13
Q

We also calculate the ____ for this
number, and we want to be able to find out if the
relationship is ___

A

Confidence Intervals; statistically significant

14
Q

Thus, we want to know what is the probability of finding
a relationship at least this strong if the ____ that
there is no relationship in the population is true.

A

null hypothesis

15
Q

___ – a best-fitting line
used for prediction

A

Line of best fit or Regression Line

16
Q

Predicting the variation in Y as a ___

A

function of the variation
in X.

17
Q

___ – how steep the line is

A

Slope

18
Q

___ – the position or height of the line.

A

Intercept

19
Q

By convention we give the height at the point where the
line ___

A

hits the y-axis.

20
Q

The
height is called the ___, or often just the
intercept.

A

y-intercept (or sometimes the constant)

21
Q

The intercept represents the ___ of a person
who scored ___ on the x-axis variable.

A

expected score; zero

22
Q

$y = b_0 + b_1 X$

A

Regression equation: predicts the behavior of y as a function of x.

Useful for raw scores.

23
Q

It is often the case that the intercept ___. After all, no one usually scores ___.

A

doesn’t make any
sense; 0 or close to 0.

24
Q

We can use the ___ of slope and ___ to
calculate the expected value of any person’s score on Y,
given their score on X.

A

two values, intercept
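Here is a minimal Python sketch that uses the two values to predict Y from X. It assumes the line is fitted by ordinary least squares, because the cards do not say how the line is found. The scores are hypothetical and were invented only for illustration.

```python
# Hypothetical scores, invented for illustration only.
x = [1, 2, 3, 4, 5]  # scores on the x-axis variable
y = [2, 4, 5, 4, 5]  # scores on the y-axis variable


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares intercept (b0) and slope (b1) for y = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx  # the least-squares line passes through (mean of x, mean of y)
    return b0, b1


b0, b1 = fit_line(x, y)


def predict(x_value):
    """Expected score on Y for a person whose score on X is x_value."""
    return b0 + b1 * x_value
```

With these scores the slope is 0.6 and the intercept is 2.2. A person scoring 4 on X therefore has an expected score of 4.6 on Y.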

25
$y = \beta_0 + \beta_1 x$ (sometimes written $y = a + bx$ or $y = mx + c$), where $x$ is the x-axis variable. This equation is called the ___
regression equation.
26
We can make a ___ about one score from another score.
prediction
27
Problem: if we don’t understand the ___, regression lines and equations are ___.
scale(s), meaningless
28
Thinking about the relationship between two variables can be very useful
Making Sense of Regression Lines
29
When there is a relationship between two variables, we can ___ one from the other.
predict
30
We cannot say that one ___ the other.
explains
31
We need some way of making the scales have some sort of meaning, and the way to do this is to __ the data into __
convert; standard deviation units.
32
Talking in terms of SDs means that we are talking about ___
standardized scores.
33
Because we are talking about standardized regression slopes, we call it the "___".
standardized slope.
34
___ – a more important name for the standardized slope.
Correlation coefficient
35
In order to convert the units, we need to know the ___
SD of each of the measures.
36
If we know the ___, we can calculate the correlation using the formula $r = \beta \times \sigma_x / \sigma_y$.
slope
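The following sketch shows the conversion on hypothetical scores. It assumes the slope is the ordinary least-squares slope and uses sample SDs (`statistics.stdev`). Whether the SDs divide by N or by N − 1 does not matter here, because the choice cancels in the ratio as long as both SDs use the same one. The last lines cross-check that the slope fitted to z-scores gives the same r.

```python
import statistics

# Hypothetical scores, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

# Raw (unstandardized) least-squares slope: change in Y per 1-unit change in X.
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)

# Re-express the slope in SD units: r = slope * sd_x / sd_y
r = slope * sd_x / sd_y

# Cross-check: fit the slope on z-scores directly; it should equal r.
zx = [(xi - mx) / sd_x for xi in x]
zy = [(yi - my) / sd_y for yi in y]
standardized_slope = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
```

For these scores both routes give about 0.77.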
37
The letter r actually stands for ___, but most people ignore that because it is confusing
regression
38
Thus, if we know the ___, we can calculate the correlation.
slope
39
Three ways to calculate the correlation coefficient r
1. regression line 2. standardized slope 3. proportion of variance
40
In correlation, we want to know how well the regression line ___
fits the data.
41
That is, how ___ the points are from the line.
far away
42
The __ the points are to the line, the stronger the relationship between the two variables.
closer
43
When we had one variable and we wanted to know the spread of the points around the mean, we calculated the ___
SD (σ).
44
The square of the SD is the ___.
variance
45
We can do the same thing with our regression data, but instead of making d the difference between the mean and the score, we can make it the difference between the value that we would expect the person to have, given their score on the x-variable, and the score they actually got. We can calculate their ___
predicted scores.
46
The difference between their predicted score and their actual score is called the ___.
Residual
47
Their ____ (the difference between the score they got and the score we thought they would get based on their initial phoneme score)
residual score
48
If we want to calculate the equivalent of the variance, we need to ___ each person’s residual score.
square
49
___ = d squared
Residual squared
50
The value of the standardized slope and the value of the square root of the proportion of variance explained will ___ be the same value.
always
51
We therefore have ___of thinking about correlation.
two equivalent ways
52
The first way is the ___. It is the expected increase in one variable (in SDs) when the other variable increases by 1 SD.
standardized slope.
53
The second way is the ___. If you square a correlation, you get the proportion of variance in one variable that is explained by the other variable.
proportion of variance.
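This sketch takes the residual route on hypothetical scores and assumes a least-squares line. It computes the predicted scores, the residuals and the proportion of variance explained, then takes the square root. The sign comes from the slope, because a square root alone cannot distinguish a positive relationship from a negative one.

```python
# Hypothetical scores, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

predicted = [b0 + b1 * xi for xi in x]                 # predicted scores
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # actual minus predicted

ss_total = sum((yi - my) ** 2 for yi in y)    # spread of Y around its mean
ss_residual = sum(e ** 2 for e in residuals)  # spread of Y around the line

proportion_explained = 1 - ss_residual / ss_total
# The square root gives the size of r; its sign is the sign of the slope.
r = proportion_explained ** 0.5 if b1 >= 0 else -(proportion_explained ** 0.5)
```

Here the squared residuals sum to 2.4 against a total of 6, so 60% of the variance is explained. That gives r ≈ 0.77, the same value as the standardized slope.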
54
A correlation is both a ___ statistic.
descriptive and inferential
55
We can find the ___, and we can also use it to describe the ___.
probability estimate; strength of the relationship
56
___ – strength of relationship
Magnitude
57
___ – positive, negative, curvilinear etc.
Direction
58
Cohen’s effect size:
* r = 0.1: small correlation
* r = 0.3: medium correlation
* r = 0.5: large correlation
59
Note that these guidelines only really apply in what Cohen called the ___
social and behavioral sciences.
60
Common mistake
* A correlation around 0.5 is a large correlation.
* A correlation does not have to exceed 0.5 to be large.
* If you have a correlation of r = 0.45, you have a correlation that is approximately equal to a large correlation.
* It’s not a medium correlation just because it hasn’t quite reached 0.5.
61
Pearson Correlation Coefficient * Also known as ___
Pearson Product moment correlation.
62
Pearson Product moment correlation developed by
Karl Pearson
63
Pearson Correlation Coefficient is a _____ and makes the ___ made by other parametric tests.
Parametric correlation; same assumptions
64
level of measurement for Pearson Correlation Coefficient
Continuous and normally distributed data
65
to determine r
1. standardized slope 2. proportion of variance 3. pearson product moment correlation
66
Optional Extra: Product Moments. ___: the moment is the ___ from the fulcrum multiplied by the weight on the lever.
Physics; length
67
___ the total moment is equal to the length from the center, multiplied by the weight. The same principle applies with ___.
Seesaw analogy; correlation
68
The same principle applies with correlation: it needs to be balanced (raw to standard score) to be ___
comparable
69
We find the ___ for each of the variables. In this case, the center is the ___.
length from the center; mean
70
So, we calculate the difference between the score and the mean for each variable (these are the ___) and then we multiply them together (this is the ___).
moments; product
71
Because this value depends on the ___, we need to divide it by N.
number of people.
72
And because it is related to the ___, we actually divide by N-1.
standard deviation
73
This is called ___, and if we call the two variables x and y,
covariance
74
Just as before, we need to __ this value by dividing by the ___
standardize; standard deviations.
75
Calculating the Correlation Coefficient: we need to divide by ___, so we multiply them together
both SDs
76
So instead of finding the square roots and then multiplying them together, it is easier to multiply the two values together, and then find the ____
square root.
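The product-moment steps on these cards can be sketched in Python as follows. The steps are: take the moments, multiply them, divide by N − 1 to get the covariance, then divide by the SDs. The function name `pearson_r` and the scores are hypothetical.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation, built up step by step."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Moments: each score's distance from the centre (its mean); product: multiply them.
    products = [(xi - mx) * (yi - my) for xi, yi in zip(x, y)]
    covariance = sum(products) / (n - 1)
    var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)
    # Standardize: multiply the two variances, then take one square root (= sd_x * sd_y).
    return covariance / (var_x * var_y) ** 0.5


# Hypothetical scores, invented for illustration only.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

Multiplying the variances and taking a single square root gives the same value as multiplying the two SDs. It just saves one square-root step.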
77
Importance of the scattergraph (scatterplot):
It will show us approximately what the correlation should be. It will help us detect any errors in our data, for example data entry errors. It will help us get a feel of our data.
78
The confidence intervals for a statistic tell us the likely ___
range of a value in the population.
79
The sampling distribution of the correlation is ___. It is not ___, which means we can’t add and subtract CIs in the usual way.
tricky; symmetrical
80
___ – a transformation that makes the distribution symmetrical.
Fisher’s z transformation
81
___ – used to calculate the CIs, which are then transformed back to correlations.
Fisher’s z transformation
82
It is called a ____, because it makes the distribution of the correlation into a z distribution which is a normal distribution with a mean of 0 and SD of 1.
z transformation
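This sketch builds a 95% confidence interval with Fisher's z. It assumes the usual large-sample standard error of $1/\sqrt{N-3}$ for the transformed value, which the cards do not give. The critical value comes from `statistics.NormalDist` in the Python standard library (3.8+).

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                             # Fisher's z transformation of r
    se = 1 / math.sqrt(n - 3)                     # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    z_lower, z_upper = z - crit * se, z + crit * se  # symmetrical on the z scale
    return math.tanh(z_lower), math.tanh(z_upper)    # transform back to correlations


# Hypothetical example: r = 0.5 from 30 participants.
lower, upper = correlation_ci(0.5, 30)
```

For r = 0.5 with N = 30 this gives roughly 0.17 to 0.73. The interval is symmetrical on the z scale but is no longer symmetrical around 0.5 once transformed back.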
83
There are ___ to find the p-value associated with a correlation.
2 ways
84
Calculating the p-value
1. Use table in Appendix 3.
85
If we really want to know the p-value, then we can convert the value for r into a ___
value for t.
86
We can use this t-value to obtain the ___ using a ___.
exact p-value, computer program
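This sketch converts r to t. It assumes the standard formula $t = r\sqrt{(N-2)/(1-r^2)}$ with $N - 2$ degrees of freedom, which is not printed on these cards. The exact p-value then needs the t distribution from a statistics package. With SciPy, for example, a two-tailed value would be `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math


def r_to_t(r, n):
    """Convert a correlation r from n participants into a t-value with n - 2 df."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Hypothetical example: r = 0.5 from 30 participants.
t, df = r_to_t(0.5, 30)
```

For r = 0.5 with N = 30 this gives t ≈ 3.06 on 28 df.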
87
When we know the __ we can also calculate the ___ of the regression line
correlation; position
88
We can use the two values ___ to create a regression equation that will allow us to predict y (display behavior) from x (desirability).
(slope and intercept)
89
We can use the ___ to draw a graph with the line of best fit on it
predictions
90
We have extended the line to ___ – we would not normally do this.
zero
91
If both variables are dichotomous (for example, yes/no or top/bottom), we can use the ___
Pearson correlation formula.
92
Dichotomous Variables - A much easier way is to calculate the value of ___ and then use the ___, which will give the same answer as using the Pearson r correlation.
chi-square; phi (φ) correlation formula
93
The p-value of the correlation will be the same as the p-value for the ___, because the two tests are just different ways of thinking about the same thing.
chi-square test
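This sketch shows that phi from chi-square matches the Pearson formula applied to 0/1-coded data. It assumes a 2 × 2 table, a chi-square without Yates' continuity correction, and $\varphi = \sqrt{\chi^2 / N}$. The counts are hypothetical. Because chi-square is a squared quantity, phi computed this way has no sign, so the comparison is with the size of r.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 2 x 2 frequency table (counts invented for illustration).
# Rows: variable 1 coded 0/1; columns: variable 2 coded 0/1.
a, b, c, d = 10, 5, 3, 12
n = a + b + c + d

phi = math.sqrt(chi_square_2x2(a, b, c, d) / n)  # phi = sqrt(chi-square / N)

# The same data as one row per person, each variable coded 0/1.
var1 = [0] * (a + b) + [1] * (c + d)
var2 = [0] * a + [1] * b + [0] * c + [1] * d
r = pearson_r(var1, var2)  # equals phi in size; the sign depends on the coding
```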
94
If one of your variables is continuous and the other is dichotomous we can use the ___
Point Biserial Formula.
95
These formulae give exactly the same answer as the ___, but they are just easier to use.
Pearson Formula
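This sketch implements the point-biserial formula. It assumes the common textbook form $r_{pb} = \frac{M_1 - M_0}{s}\sqrt{pq}$, which the cards do not print. Here $M_1$ and $M_0$ are the group means, $p$ and $q$ are the proportions in each group, and $s$ is the SD of all scores computed with N in the denominator. The data are hypothetical. The result is checked against the Pearson formula with the groups coded 0 and 1.

```python
import math
import statistics


def point_biserial(scores, groups):
    """Point-biserial r: groups coded 0/1, scores continuous."""
    ones = [s for s, g in zip(scores, groups) if g == 1]
    zeros = [s for s, g in zip(scores, groups) if g == 0]
    p = len(ones) / len(scores)     # proportion coded 1
    q = 1 - p                       # proportion coded 0
    sd = statistics.pstdev(scores)  # SD of all scores, dividing by N
    return (statistics.mean(ones) - statistics.mean(zeros)) / sd * math.sqrt(p * q)


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: group membership coded 0/1 and a continuous score.
groups = [0, 0, 0, 1, 1, 1]
scores = [3, 4, 5, 6, 7, 9]
r_pb = point_biserial(scores, groups)
```

Swapping the arbitrary 0/1 coding only flips the sign of the result.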
96
On special occasions we can correlate using a ___. This is when one variable is categorical and has just two all-inclusive values.
dichotomous variable.
97
Here, we may give an ___according to membership of the categories, e.g. 0 for female and 1 for male.
arbitrary value
98
Dichotomous Variables- We then proceed with the ___ as usual.
Pearson Correlation
99
The Point Biserial is written as
$r_{pb}$.
100
This value can be turned into an ordinary ___
t-value.
101
This may sound like a cheat, because Pearson’s is a ___ type of statistic and the level of measurement should be at least interval.
parametric
102
This is true only if we want to make ___ from our results about underlying populations
certain assumptions
103
* Used when the data do not satisfy the assumptions of the Pearson Correlation because they are not normally distributed or are only ordinal in nature
Non-Parametric Correlations
104
Non-Parametric Correlations (2)
Spearman Correlation Kendall Correlation
105
Two ways to find the Spearman Correlation.
1. Calculate a Pearson correlation on the ranked data 2. Use the simplified formula developed by Spearman
106
Steps of the Spearman correlation
1. Draw a scatterplot 2. Rank the data for each variable separately 3. Find the difference between the ranks for each person, which we call d. 4. Use the formula for Spearman
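Here is a sketch of these steps in Python. It assumes the standard Spearman formula $r_s = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$, which the cards refer to without printing, and data with no ties. The scores are hypothetical. The last line checks the result against the first way of calculating it, a Pearson correlation on the ranks.

```python
def ranks(values):
    """Rank from 1 (smallest) upwards; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_no_ties(x, y):
    rx, ry = ranks(x), ranks(y)                           # step 2: rank each variable separately
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))  # step 3: d for each person, squared
    n = len(x)
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))          # step 4: Spearman formula


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores with no ties (step 1, the scatterplot, is skipped here).
x = [10, 20, 30, 40, 50]
y = [12, 25, 19, 40, 38]
rho = spearman_no_ties(x, y)
rho_via_pearson = pearson_r(ranks(x), ranks(y))
```

For these scores Σd² = 4 with N = 5, so both routes give 0.8.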
107
The first way to calculate the Spearman correlation is just to calculate a (Pearson) correlation, using the ___
ranked data.
108
The problem is that the Pearson Formula is a bit __, especially if a computer is not used.
fiddly
109
A simplification of the Pearson Formula is available, developed by Spearman, which works in the case where there are ___.
ranks
110
Find the ___ for each person, which we call d.
difference between the ranks
111
The ___ is on a scale, but not on a scale we understand.
d-score total
112
We need to convert the scale into one that we do understand such as the ____, which goes from -1.00 to +1.00.
correlation scale
113
However, there is a slight complication because the formula as we have given it is only valid when there are ___
no ties in the data.
114
Three ways to deal with this problem:
1. Ignore it. It does not make a lot of difference. 2. Use the Pearson Formula on the ranks (although the calculation is harder than the Spearman formula). 3. Use a correction. (book suggests a site)
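This sketch compares options 1 and 2 on hypothetical data that contain one tie. Tied scores are given the mean of the ranks they would occupy. That is a common convention assumed here, not one stated on the cards. The Spearman formula is then applied as if there were no ties, and the Pearson formula is applied to the same ranks.

```python
def average_ranks(values):
    """Rank from 1 upwards; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result


def spearman_formula(rx, ry):
    n = len(rx)
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores with a tie in x.
x = [1, 2, 2, 3]
y = [1, 2, 3, 4]
rx, ry = average_ranks(x), average_ranks(y)
rho_ignore_ties = spearman_formula(rx, ry)  # option 1: ignore the ties
rho_pearson_on_ranks = pearson_r(rx, ry)    # option 2: Pearson formula on the ranks
```

Here the formula gives 0.95 and the Pearson correlation on the ranks gives about 0.949. This shows why ignoring a small number of ties often makes little difference.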
115
We calculate the significance of the Spearman in the same way as the significance of the __ Correlation.
Pearson
116
___– not at all straightforward or easy to calculate.
Confidence Intervals
117
If we use a non-parametric test, such as a __ correlation, we tend to lose power.
Spearman
118
By converting data to ___, information about the actual scores is thrown away.
ranks
119
Although we could be strict and say that rating data are strictly measured at an ordinal level, in reality when there isn’t a problem with the distributions, we would always prefer to use a ___
Pearson Correlation.
120
A ___ correlation gives a better chance of a significant result
Pearson
121
A curious thing about the Spearman is ___
how to interpret it.
122
We can’t say that it is the ___, that is, the relative difference in the SDs, because the SDs don’t really exist, as there is not necessarily any relationship between the score and the SD.
standardized slope
123
We also can’t say that it is the ___ explained, because the variance is a ___ term, and we are using ranks
proportion of variance; parametric
124
All we can really say about the Spearman is that it is the ___
Pearson correlation between the ranks.
125
Non-Parametric Correlations
Spearman Rank Correlation Coefficient
126
Spearman Rank Correlation Coefficient - shows how closely the ___ are related.
ranked data
127
___ – alternative nonparametric correlation, which does have a more sensible interpretation. (advantage: meaningful interpretation)
Kendall’s Tau-a (τ – Greek Letter)
128
Very rarely used however.
Kendall’s Tau-a
129
Kendall’s Tau-a is rarely used for two reasons:
1. Difficult to calculate if you do not have a computer. 2. It is always lower than a Spearman correlation for the same data (but the p-values are always exactly the same).
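This sketch computes Kendall's tau-a by brute force, assuming the usual definition. It counts the pairs of people ordered the same way on both variables (concordant) and the pairs ordered in opposite ways (discordant), then divides the difference by the number of pairs, $N(N-1)/2$. On that definition, tau-a is the probability that a randomly chosen pair is concordant minus the probability that it is discordant. That is one way to read its more sensible interpretation. The scores are hypothetical.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            direction = (x[i] - x[j]) * (y[i] - y[j])
            if direction > 0:    # pair ordered the same way on both variables
                concordant += 1
            elif direction < 0:  # pair ordered in opposite ways
                discordant += 1
            # pairs tied on either variable count as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores, invented for illustration only.
x = [10, 20, 30, 40, 50]
y = [12, 25, 19, 40, 38]
tau = kendall_tau_a(x, y)
```

For these scores, 8 of the 10 pairs are concordant and 2 are discordant, so tau-a = 0.6. The Spearman correlation for the same scores is 0.8.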
130
Kendall’s Tau-a - Because people like their correlations to be ___, they tend to use it less.
high
131
The fact that two variables correlate does not mean there is a ___ relationship between them.
causal
132
Correlation does not mean causality, but ___ does mean correlation.
causality
133
Correlation when one variable is categorical
Chi-square test
134
In general, if one variable is a purely category-type measure, then correlation cannot be carried out, unless the variable is ___.
dichotomous
135
* We can, however, use the chi-square test, since it is what is called a ___
test of association.
136
___ is also a measure of ___ between two variables
Correlation; association
137
What we can do with nominal/categorical data is reduce the measured variable to a ___ level and conduct a ___ test on the resulting frequency table.
nominal; chi-square
138
This is only possible, however, where you have gathered several ___ in each category.
cases
139
We can find the ___ for the measured data and record how many responses/frequencies were ___
overall mean; above and below this mean
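This sketch shows the mean-split approach using Python's standard library. The measured variable is reduced to "above" or "not above" the overall mean, the frequencies are tabulated, and a Pearson chi-square is computed from the observed and expected counts. The category labels and scores are hypothetical. Eight cases is far too few for a real chi-square test, which needs several cases in each category.

```python
from collections import Counter

# Hypothetical data: one category label and one measured score per person.
category = ["A", "A", "B", "B", "B", "A", "B", "A"]
score = [12, 15, 9, 8, 11, 14, 7, 10]

overall_mean = sum(score) / len(score)

# Reduce the measured variable to a nominal level: above vs not above the mean.
level = ["above" if s > overall_mean else "not above" for s in score]
table = Counter(zip(category, level))  # observed frequency for each (category, level) cell

rows = sorted(set(category))
cols = ["above", "not above"]
n = len(score)

chi_square = 0.0
for r in rows:
    row_total = sum(table[(r, c)] for c in cols)
    for c in cols:
        col_total = sum(table[(k, c)] for k in rows)
        expected = row_total * col_total / n  # expected count if there is no association
        chi_square += (table[(r, c)] - expected) ** 2 / expected
```

Each cell has an expected count of 2 here, and the chi-square comes out at 2.0.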
140
Typical variables that cannot be correlated (unless a rational attempt to order categories is made) are ___
marital status, ethnicity, place of residence, handedness, sexuality, degree subject and so on.
141
A lack of relationship is signified by a value ___
close to zero.
142
A value of zero however could occur for a___
curvilinear relationship.
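Here is a small illustration with hypothetical numbers. Y is completely determined by X (Y = X²). Because the relationship is U-shaped and symmetrical, the Pearson correlation is still exactly zero.

```python
# Hypothetical scores: y depends perfectly on x, but in a U shape.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
r = sxy / (sxx * syy) ** 0.5  # Pearson correlation
```

A scatterplot of these points would show the relationship immediately, which is one reason to plot the data before trusting r.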
143
___ is a measure of the correlation.
Strength