[L8] Relationship between Variables: Correlation and Regression Flashcards

1
Q

We are interested in finding a way to represent ___
between scores.

A

association

2
Q

Types of Correlation

A

Bivariate; Multivariate Correlation

3
Q

Correlation does not prove __

A

Causality

4
Q

Multivariate correlations have more ___ validity

A

Ecological

5
Q

IGT & RMT = tests of difference
Correlation = test of ___

A

correlation/association

6
Q

___ – first and most obvious way to summarize
data where we are examining the relationship between
two variables

A

Scatterplot

7
Q

We put one variable on the x-axis and another on the y-axis,
and we ___ for each person, showing their
scores on the two variables.

A

draw a point

8
Q

A test of correlation involves administering ___ tests to the same group of participants

A

2 or more different

9
Q

When we want to tell people about our results, we ____

A

don’t have to draw a lot of scatterplots.

10
Q

___: Children were asked to listen
to a word and repeat it. They were then asked which of
these 3 words started with the same sound.

A

Initial phoneme detection.

11
Q

____reading score, a standard
measure of reading ability.

A

British Ability Scale (BAS)

12
Q

We usually summarize and represent the relationship
between two variables with a ___

A

number (correlation coefficient).

13
Q

We also calculate the ____ for this
number, and we want to be able to find out if the
relationship is ___

A

Confidence Intervals; statistically significant

14
Q

Thus, we want to know what is the probability of finding
a relationship at least this strong if the ____ that
there is no relationship in the population is true.

A

null hypothesis

15
Q

___ – a best-fitting line used for prediction

A

Line of best fit or Regression Line

16
Q

Predicting the variation in Y as a ___

A

function of the variation in X.

17
Q

___ – how steep the line is

A

Slope

18
Q

___ – the position or height of the line.

A

Intercept

19
Q

By convention we give the height at the point where the
line ___

A

hits the y-axis.

20
Q

The height is called the ___, or often just the intercept

A

y-intercept ; (or sometimes the constant)

21
Q

The intercept represents the ___ of a person
who scored ___ on the x-axis variable.

A

expected score ; zero

22
Q

y = b0 + b1x

A

regression expression, predicting behavior of y as function of x

useful for raw scores

23
Q

It is often the case that the intercept ___. After all, no one usually scores ___

A

doesn’t make any sense; 0 or close to 0.

24
Q

We can use the ___ of slope and ___ to
calculate the expected value of any person’s score on Y,
given their score on X.

A

two values, intercept

25
Q

y = β0 + β1x (sometimes it is y = a + bx or y = mx + c)
Where x is the x-axis variable. This equation is called the
___

A

regression equation.
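
Worked example (not part of the original deck): a minimal Python sketch of how a slope and intercept turn into a prediction. The scores are made up, and `fit_line` is a hypothetical helper that uses the ordinary least-squares formulas.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept b0, slope b1) of the least-squares line y = b0 + b1*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx        # slope: how steep the line is
    b0 = my - b1 * mx     # intercept: expected y for someone scoring 0 on x
    return b0, b1

# Made-up scores for five people (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
predicted = b0 + b1 * 6   # expected y for a person who scored 6 on x
print(b0, b1, predicted)
```

The prediction is only as meaningful as the scales of x and y, which is the problem the following cards address.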

26
Q

We can make a ___ about one score from
another score

A

prediction

27
Q

Problem: if we don’t understand the ___, regression
lines and equations are ___.

A

scale(s), meaningless

28
Q

thinking about the relationship between two variables can
be very useful

A

Making Sense of Regression Lines

29
Q

When there is a relationship between two variables, we
can ___ one from the other.

A

predict

30
Q

We cannot say that one ___ the other.

A

explains

31
Q

We need some way of making the scales have some sort
of meaning, and the way to do this is to ___ the data
into ___

A

convert; standard deviation units.

32
Q

Talking in terms of SDs means that we are talking about
___

A

standardized scores.

33
Q

Because we are talking about standardized regression
slopes, we call it the “___”

A

standardized slope.

34
Q

___ – a more important name for the
standardized slope.

A

Correlation coefficient

35
Q

In order to convert the units, we need to know the ___

A

SD of each of the measures.

36
Q

If we know the ___, we can calculate the correlation
using the formula: r = β × σx / σy
(β is the raw slope; σx and σy are the SDs of x and y)

A

slope
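
Worked example (not part of the original deck): a small Python sketch of the conversion r = β × σx / σy. The scores are made up for illustration, and the raw slope is computed with the usual least-squares formula.

```python
from statistics import mean, stdev

# Made-up scores for five people (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)

# Raw (unstandardized) slope of the regression of y on x
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))

# Rescale the slope into SD units: this standardized slope is r
r = slope * stdev(x) / stdev(y)
print(slope, r)
```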

37
Q

The letter r actually stands for ___, but most people
ignore that because it is confusing

A

regression

38
Q

Thus, if we know the ___, we can calculate the correlation

A

slope

39
Q

3 ways to calculate the correlation coefficient “r”

A
  1. regression line
  2. standardized slope
  3. proportion of variance
40
Q

In correlation, we want to know how well the regression
line ___

A

fits the data.

41
Q

That is, how ___ the points are from the line.

A

far away

42
Q

The __ the points are to the line, the stronger the
relationship between the two variables.

A

closer

43
Q

When we had one variable and we wanted to know the
spread of the points around the mean, we calculated the
___

A

SD (σ).

44
Q

The square of the SD is the ___.

A

variance

45
Q

We can do the same thing with our regression data, but
instead of making d the difference between the mean and
the score, we can make it the difference between the value
that we would expect the person to have, given their score
on the x-variable, and the score they actually got. We can
calculate their ___

A

predicted scores,

46
Q

The difference between their
predicted score and their actual score is
called the ___.

A

Residual

47
Q

Their ____ (the difference between the score they
got and the score we thought they would get based on
their initial phoneme score)

A

residual score

48
Q

if we want to calculate the equivalent of the
variance, we need to ___ each person’s score

A

square

49
Q

___ = d squared

A

Residual squared

50
Q

The value of the standardized slope and the value of the
square root of the proportion of variance explained will
___ be the same value.

A

always

51
Q

We therefore have ___of thinking about
correlation.

A

two equivalent ways

52
Q

The first way is the ___
It is the expected
increase in one variable, when the other variable increases
by 1 SD.

A

standardized slope.

53
Q

The second way is the ___. If you
square a correlation, you get the proportion of variance in
one variable that is explained by the other variable.

A

proportion of variance.
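
Worked example (not part of the original deck): a Python sketch of the "proportion of variance" view, using made-up scores. It squares the residuals, compares them with the squared deviations from the mean, and takes the square root of the explained proportion. Note the square root is always positive, so the sign of r has to be taken from the slope.

```python
from math import sqrt
from statistics import mean

# Made-up scores for five people (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b0 = my - b1 * mx

# Residual = actual score minus the score predicted from x
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

ss_residual = sum(d ** 2 for d in residuals)   # spread around the line
ss_total = sum((yi - my) ** 2 for yi in y)     # spread around the mean

explained = 1 - ss_residual / ss_total         # proportion of variance explained
r_from_variance = sqrt(explained)              # magnitude of r
print(explained, r_from_variance)
```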

54
Q

A correlation is both ___statistics.

A

descriptive and inferential

55
Q

We can find the ___, and we can also use
it to describe the ___

A

probability estimate ; strength of the relationship

56
Q
  • ___ – strength of relationship
A

Magnitude

57
Q

___ – positive, negative, curvilinear etc.

A

Direction

58
Q

Cohen’s effect size:

A
  • r = 0.1 = small correlation
  • r = 0.3 = medium correlation
  • r = 0.5 = large correlation
59
Q

Note that these only really apply in what Cohen called
___

A

Social and Behavioral sciences.

60
Q

Common mistake

A
  • A correlation around 0.5 is a large correlation.
  • A correlation does not have to exceed 0.5 to be large.
  • If you have a correlation of r = 0.45, you have a
    correlation which is approximately equal to a large
    correlation.
  • It’s not a medium correlation just because it hasn’t quite
    reached 0.5
61
Q

Pearson Correlation Coefficient
* Also known as ___

A

Pearson Product moment correlation.

62
Q

Pearson Product moment correlation developed by

A

Karl Pearson

63
Q

Pearson Correlation Coefficient
is a _____ and makes the ___
made by other parametric tests.

A

Parametric correlation; same assumptions

64
Q

level of measurement for Pearson Correlation Coefficient

A

Continuous and normally distributed data

65
Q

to determine r

A
  1. standardized slope
  2. proportion of variance
  3. pearson product moment correlation
66
Q

Optional Extra: Product Moments
* ___: the moment is the __ from the fulcrum
multiplied by the weight on the lever.

A

Physics; length

67
Q

___ the total moment is equal to the length
from the center, multiplied by the weight. The same principle applies with ___.

A

Seesaw analogy: correlation

68
Q

The same principle applies with correlation: the scores need to be balanced (converted from raw to standard scores) to be ___

A

comparable

69
Q
  • We find the ___ for each of the
    variables. In this case the center is the ___.
A

length from the center; mean

70
Q

So, we calculate the difference between the score and the
mean for each variable (these are the ___) and then
we multiply them together (this is the ___).

A

moments; product

71
Q

Because this value is dependent on the ___
we need to divide it by N.

A

number of people,

72
Q

And because it is related to the ___, we
actually divide by N-1.

A

standard deviation

73
Q

This is called ___, and if we call the two variables
x and y,

A

covariance

74
Q

Just as before, we need to __ this value by
dividing by the ___

A

standardize; standard deviations.

75
Q

Calculating the Correlation
Coefficient:

we need to divide by ___, so we
multiply them together

A

both SDs

76
Q

So instead of finding the square roots and then
multiplying them together, it is easier to multiply the two
values together, and then find the ____

A

square root.
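
Worked example (not part of the original deck): a Python sketch of the product-moment calculation described in the preceding cards, using made-up scores. The function names are hypothetical; `statistics.variance` is the sample variance (dividing by N-1).

```python
from math import sqrt
from statistics import mean, variance

def covariance(x, y):
    """Mean product of the deviations ("moments"), dividing by N - 1."""
    mx, my = mean(x), mean(y)
    products = [(xi - mx) * (yi - my) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)

def pearson_r(x, y):
    """Standardize the covariance: multiply the two variances, then take
    the square root (same as multiplying the two SDs together)."""
    return covariance(x, y) / sqrt(variance(x) * variance(y))

# Made-up scores for five people (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = covariance(x, y)
r = pearson_r(x, y)
print(cov_xy, r)
```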

77
Q

Importance of a scattergraph (scatterplot):

A

It will show us approximately what the correlation should
be.

It will help us detect any errors in our data, for example
data entry errors.

It will help us get a feel of our data.

78
Q

The confidence intervals for a statistic tell us the likely
___

A

range of a value in the population.

79
Q

The sampling distribution of the correlation is ___. It is not
___, which means we can’t get the CIs by adding and
subtracting in the usual way.

A

tricky; symmetrical

80
Q

___: the transformation used, which
makes the distribution symmetrical.

A

Fisher’s z transformation

81
Q

Used to calculate the CIs and then transform back to
correlations.

A

Fisher’s z transformation

82
Q

It is called a ____, because it makes the
distribution of the correlation into a z distribution which
is a normal distribution with a mean of 0 and SD of 1.

A

z transformation
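
Worked example (not part of the original deck): a Python sketch of a confidence interval built with Fisher's z. It assumes the standard forms z = atanh(r) and a standard error of 1/sqrt(N - 3), which the cards do not state; the values r = 0.5 and N = 30 are invented, and `correlation_ci` is a hypothetical helper.

```python
from math import atanh, tanh, sqrt

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via Fisher's z transformation."""
    z = atanh(r)                 # transform r to the z scale
    se = 1 / sqrt(n - 3)         # standard error on the z scale (assumed standard result)
    low_z, high_z = z - z_crit * se, z + z_crit * se
    return tanh(low_z), tanh(high_z)   # transform back to correlations

r, n = 0.5, 30                   # invented values for illustration
low, high = correlation_ci(r, n)
print(low, high)
```

The interval is symmetrical on the z scale but not around r once it is transformed back, which is why the usual add-and-subtract approach does not work directly on correlations.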

83
Q

There are ___ to find the p-value associated with a
correlation.

A

2 ways

84
Q

Calculating the p-value

A
  1. Use table in Appendix 3.
85
Q

If we really want to know the p-value, then we can
convert the value for r into a ___

A

value for t.

86
Q

We can use this t-value to obtain the ___ using
a ___

A

exact p-value, computer program
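
Worked example (not part of the original deck): a Python sketch of the r-to-t conversion. The cards do not give the formula; this assumes the standard one, t = r × sqrt(N - 2) / sqrt(1 - r²) with N - 2 degrees of freedom. The values of r and N are invented.

```python
from math import sqrt

def r_to_t(r, n):
    """Convert a correlation r from n pairs of scores into a t-value (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r, n = 0.5, 30          # invented values for illustration
t = r_to_t(r, n)
df = n - 2
print(t, df)            # look up t with df in a t table or a stats program
```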

87
Q

When we know the __ we can also calculate the
___ of the regression line

A

correlation; position

88
Q

We can use the two values ___ to create
a regression equation which will allow us to predict y
(display behavior) from x (desirability).

A

slope and intercept

89
Q

We can use the
___ to draw a graph with the line
of best fit on it

A

predictions

90
Q

we have extended the line to __ – we would not
normally do this.

A

zero

91
Q

If variables are both dichotomous (for example, yes/no,
top, bottom) we can use the ___

A

Pearson correlation formula.

92
Q

Dichotomous Variables - A much easier way is to calculate the value of ___
and then use the ___, which will
give the same answer as using the Pearson r formula.

A

chi-square; phi (φ) correlation formula

93
Q

The p-value of the correlation will be the same as the p-value
for the ___ because the two tests are just
different ways of thinking about the same thing.

A

chi-square test
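
Worked example (not part of the original deck): a Python sketch comparing phi with the Pearson formula on 0/1 codes. The 2x2 frequencies are invented, chi-square is computed without a continuity correction (an assumption), and phi is taken as sqrt(chi-square / N), so it carries no sign.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Invented 2x2 frequency table:
#            y = 1   y = 0
#   x = 1     a       b
#   x = 0     c       d
a, b, c, d = 10, 5, 3, 12
n = a + b + c + d

# Chi-square for a 2x2 table, no continuity correction
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
phi = sqrt(chi_square / n)

# Same table written out as one 0/1 pair per person
x = [1] * (a + b) + [0] * (c + d)
y = [1] * a + [0] * b + [1] * c + [0] * d
r = pearson_r(x, y)
print(phi, r)
```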

94
Q

If one of your variables is continuous and the other is
dichotomous we can use the ___

A

Point Biserial Formula:

95
Q

These formulae give exactly the same answer as the
___, but they are just easier to use.

A

Pearson Formula

96
Q

On special occasions we can correlate using a
___. This is when one variable is categorical and has just two
all-inclusive values.

A

dichotomous variable.

97
Q

Here, we may give an ___according to
membership of the categories, e.g. 0 for female and 1 for
male.

A

arbitrary value

98
Q

Dichotomous Variables- We then proceed with the ___ as usual.

A

Pearson Correlation

99
Q

The Point Biserial is written as

A

rpb.

100
Q

This value can be turned into an ordinary___

A

t-value.
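
Worked example (not part of the original deck): a Python sketch of a point-biserial correlation computed as an ordinary Pearson correlation on 0/1 codes, then turned into a t-value. The scores are invented, and the comparison with a pooled-variance independent-groups t-test is an added illustration, not something stated on the cards.

```python
from math import sqrt
from statistics import mean, variance

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Invented data: arbitrary 0/1 codes for the two categories
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [4, 5, 6, 5, 7, 8, 6, 9]
n = len(score)

r_pb = pearson_r(group, score)                       # point-biserial correlation
t_from_r = r_pb * sqrt(n - 2) / sqrt(1 - r_pb ** 2)  # standard r-to-t conversion

# Independent-groups t-test with pooled variance, for comparison
g0 = [s for g, s in zip(group, score) if g == 0]
g1 = [s for g, s in zip(group, score) if g == 1]
pooled = ((len(g0) - 1) * variance(g0) + (len(g1) - 1) * variance(g1)) / (n - 2)
t_test = (mean(g1) - mean(g0)) / sqrt(pooled * (1 / len(g0) + 1 / len(g1)))
print(r_pb, t_from_r, t_test)
```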

101
Q

This may sound like a cheat because the Pearson’s was a
____ type of statistic and that the level of
measurement should be at least interval.

A

parametric

102
Q

This is true only if we want to make ___
from our results about underlying populations

A

certain assumptions

103
Q
  • Used when the data do not satisfy the assumptions of the
    Pearson Correlation because they are not normally
    distributed or are only ordinal in nature
A

Non-Parametric Correlations

104
Q

Non-Parametric Correlations (2)

A

Spearman Correlation
Kendall Correlation

105
Q

Two ways to find the Spearman Correlation.

A
  1. Calculate a Pearson correlation on the ranked data
  2. Use Spearman’s simplified formula on the rank differences (d)

106
Q

steps of spearman correlation

A
  1. Draw a scatterplot
  2. Rank the data for each variable separately
  3. Find the difference between the ranks for each
    person, which we call d.
  4. Use the Spearman formula
107
Q

The first way to calculate the Spearman correlation is just
to calculate a (Pearson) correlation, using the ___

A

ranked data.

108
Q

The problem is that the Pearson Formula is a bit __,
especially if a computer is not used.

A

fiddly

109
Q

A simplification of the Pearson Formula is available,
developed by Spearman, which works in the case where
there are ___.

A

ranks

110
Q

Find the ___ for each
person, which we call d.

A

difference between the ranks

111
Q

The ___ is on a scale; however, it is not on a scale
we understand

A

d-score total

112
Q

We need to convert the scale into one that we do
understand such as the
____, which goes from
-1.00 to +1.00.

A

correlation scale

113
Q

However, there is a slight complication because the
formula as we have given it is only valid when there are
___

A

no ties in the data.
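
Worked example (not part of the original deck): a Python sketch of both routes to the Spearman correlation. The cards do not spell out the formula; this assumes the standard form 1 - 6 × Σd² / (N × (N² - 1)). The scores are invented and contain no ties, and `ranks` is a hypothetical helper that only works without ties.

```python
from math import sqrt
from statistics import mean

def ranks(values):
    """Rank from 1 (smallest) upwards; assumes there are no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Invented scores for five people, no ties
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 48, 40]

rx, ry = ranks(x), ranks(y)
d = [a - b for a, b in zip(rx, ry)]          # difference between ranks
n = len(x)

rs_formula = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))
rs_pearson = pearson_r(rx, ry)               # Pearson formula on the ranks
print(rs_formula, rs_pearson)
```

With tied scores the two routes can differ slightly, which is the complication the next card deals with.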

114
Q

Three ways to deal with this problem:

A
  1. Ignore it. It does not make a lot of difference.
  2. Use the Pearson Formula on the ranks (although the
    calculation is harder than the Spearman formula).
  3. Use a correction. (book suggests a site)
115
Q

We calculate the significance of the Spearman in the same
way as the significance of the __ Correlation.

A

Pearson

116
Q

___– not at all straightforward or easy to
calculate.

A

Confidence Intervals

117
Q

If we use a non-parametric test, such as a __
correlation, we tend to lose power.

A

Spearman

118
Q

By converting data to ___, information about the actual
scores is thrown away

A

ranks

119
Q

Although we could be strict and say that rating data are
strictly measured at an ordinal level, in reality when there
isn’t a problem with the distributions, we would always
prefer to use a ___

A

Pearson Correlation.

120
Q

A ___ correlation gives a better chance of a
significant result

A

Pearson

121
Q

A curious thing about the Spearman is ___

A

how to interpret it.

122
Q

We can’t say that it is the ___, that is the
relative difference in the SDs, because the SDs don’t
really exist as there is not necessarily any relationship
between the score and the SD.

A

standardized slope

123
Q

We also can’t say that it is the ___
explained, because the variance is a ___ term, and
we are using ranks

A

proportion of variance; parametric

124
Q

All we can really say about the Spearman is that it is the

___

A

Pearson correlation between the ranks.

125
Q

Non-Parametric Correlations

A

Spearman Rank Correlation Coefficient

126
Q

Spearman Rank Correlation Coefficient - shows how closely the ___ are related.

A

ranked data

127
Q

___ – alternative nonparametric
correlation, which does have a more sensible
interpretation. (advantage: meaningful interpretation)

A

Kendall’s Tau-a (τ – Greek Letter)

128
Q

Very rarely used however.

A

Kendall’s Tau-a

129
Q

Kendall’s Tau-a is rarely used for two reasons:

A
  1. Difficult to calculate if you do not have a computer.
  2. It is usually lower (closer to zero) than a Spearman correlation for the
    same data (although the p-values are typically very similar).
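
Worked example (not part of the original deck): a Python sketch of Kendall's tau-a, assuming the standard definition (concordant pairs minus discordant pairs, divided by the total number of pairs). The scores are invented and `kendall_tau_a` is a hypothetical helper.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Tau-a: (concordant - discordant) / total number of pairs of people."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:        # both variables order the pair the same way
            concordant += 1
        elif direction < 0:      # the variables order the pair in opposite ways
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Invented scores for five people, no ties
x = [10, 20, 30, 40, 50]
y = [12, 25, 22, 48, 40]

tau = kendall_tau_a(x, y)
print(tau)
```

The interpretation is direct: tau-a is the proportion of pairs ordered the same way on both variables minus the proportion ordered in opposite ways.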
130
Q

Kendall’s Tau-a - Because people like their correlations to be ___, they
tend to use it less.

A

high

131
Q

The fact that two variables correlate does not mean there
is a ___ relationship between them.

A

causal

132
Q

Correlation does not mean causality, but ___ does
mean correlation.

A

causality

133
Q

Correlation when one variable is categorical

A

Chi-square test

134
Q

In general, if one variable is a purely category-type
measure, then correlation cannot be carried out, unless
the variable is ___.

A

DICHOTOMOUS

135
Q
  • We can however use the Chi-square test since it is called
    a ____
A

test of association.

136
Q

___ is also a measure of ___ between two
variables

A

Correlation; association

137
Q

What we can do with nominal/categorical data is
reduce the measured variable to ___ level and
conduct a ___ test on the resulting frequency
table.

A

nominal; chi-square

138
Q

This is only possible, however, where you have gathered
several ___ in each category.

A

cases

139
Q

We can find the ___ for the measured data and
record how many responses/frequencies were ___

A

overall mean; above and
below this mean

140
Q

___ that cannot be correlated (unless a
rational attempt to order categories is made) are: marital
status, ethnicity, place of residence, handedness,
sexuality, degree subject and so on.

A

Typical variables

141
Q

A lack of relationship is signified by a value ___

A

close to zero.

142
Q

A value of zero, however, could occur for a ___

A

curvilinear relationship.
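
Worked example (not part of the original deck): a Python sketch of a perfectly curvilinear relationship that still gives a Pearson correlation of zero. The data are invented: y is exactly x squared, with x values placed symmetrically around zero.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Invented data: y depends completely on x, but along a U-shaped curve
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)
```

A scatterplot would reveal the U shape immediately, which is one reason to draw one before trusting the coefficient.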

143
Q

___ is a measure of the correlation.

A

Strength