Relationship between Variables: Correlation and Regression Flashcards

1
Q

We are interested in finding a way to represent association between scores.

A

association

2
Q

The Regression Line

The first and most obvious way to summarize data when we are examining the relationship between two variables.

A

Scatterplot

3
Q

The Regression Line

We put one variable on the x-axis and another on the y-axis, and we draw a point for each person showing their scores on the _____

A

two variables.

4
Q

The Regression Line

When we want to tell people about our results, we don’t have to draw a lot of _____

A

scatterplots

5
Q

Children were asked to listen to a word and repeat it. They were then asked which of these 3 words started with the same sound.

A

X

Initial phoneme detection

6
Q

Reading score, a standard measure of reading ability.

A

Y

British Ability Scale (BAS)

7
Q

We usually summarize and represent the relationship between two variables with a number

A

correlation coefficient

8
Q

We also calculate the ______ for this number, and we want to be able to find out if the relationship is statistically significant.

Thus, we want to know what is the _______ of finding a relationship at least this strong if the null hypothesis that there is no relationship in the population is true.

A

Confidence Intervals

probability

9
Q

a best fitting line used for prediction.

A

Line of best fit or Regression Line

10
Q

Predicting the _____ in Y as a function of the ______ in X.

A

variation

11
Q

How steep the line is.

A

slope

12
Q

the position or height of the line.

A

intercept

13
Q

By ____ we give the height at the point where the line hits the y-axis.

A

convention

14
Q

The height is called the ____ or often just the _____ (or sometimes the constant).

A

y-intercept or intercept

15
Q

The intercept represents the expected score of a person who scored zero on the ______

A

x-axis variable.

16
Q

It is often the case that the intercept doesn’t make any sense. After all, usually no one scores ____

A

0 or close to 0.

17
Q

We can use the two values of ______ to calculate the expected value of any person’s score on Y, given their score on X.

A

slope and intercept

18
Q

formula for Expected Y score

A

Expected Y score = intercept + slope × (score on X)
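
As a minimal sketch, the regression equation can be written as a small Python function. The intercept and slope values below are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the reading data.

```python
def expected_y(intercept: float, slope: float, x: float) -> float:
    """Expected Y score = intercept + slope * (score on X)."""
    return intercept + slope * x


# Hypothetical coefficients, for illustration only.
intercept = 10.0
slope = 2.5

# A person who scores 0 on X is expected to score the intercept on Y.
print(expected_y(intercept, slope, 0.0))

# A person who scores 4 on X: 10 + 2.5 * 4
print(expected_y(intercept, slope, 4.0))
```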

19
Q

Where x is the x-axis variable. This equation is called the ______

A

regression equation.

20
Q

Making Sense of Regression Lines

Thinking about the relationship between ______ can be very useful.

A

two variables

21
Q

Making Sense of Regression Lines

We can make a ____ about one score from another score.

A

prediction

22
Q

Problem: if we don’t understand the scale(s), regression lines and equations are _____

A

meaningless

23
Q

When there is a relationship between two variables, we can _____ one from the other.

We cannot say that one _____ the other.

A

predict

explains

24
Q

The correlation coefficient

We need some way of making the scales have some sort of meaning, and the way to do this is to convert the data into _____

A

standard deviation units.

25
Q

Thus we could ask: “If the score on ___ is one SD higher, how many SDs higher would we expect the ____ score to be?”

A

x
y

26
Q

Talking in terms of SDs means that we are talking about _____

A

standardized scores

27
Q

Because we are talking about standardized regression slopes, we call it the ______

A

standardized slope.

28
Q

Correlation coefficient – a more important name for the ______

A

standardized slope.

29
Q

Where σx is the SD of the variable on the x-axis (the horizontal one) of the scatterplot, σy is the SD of the variable on the y-axis (the vertical one), and r is the correlation.

A
30
Q

The letter r actually stands for ______, but most people ignore that because it is confusing.

A

regression

31
Q

If we know the slope, we can calculate the correlation using the formula:

A

r = β × σx / σy
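
This relationship can be checked numerically. The sketch below uses a small made-up data set (not the reading data), fits the least-squares slope, and compares β × σx / σy with the correlation computed directly from the scores.

```python
import math
import statistics

# Made-up scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
sum_yy = sum((yi - mean_y) ** 2 for yi in y)

# Unstandardized least-squares slope (beta in the card above).
slope = sum_xy / sum_xx

# r = beta * sd_x / sd_y
r_from_slope = slope * statistics.stdev(x) / statistics.stdev(y)

# Correlation computed directly, for comparison.
r_direct = sum_xy / math.sqrt(sum_xx * sum_yy)

print(slope, r_from_slope, r_direct)
```

The two correlation values agree up to floating-point rounding.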

32
Q

Residual

In correlation, we want to know how well the ______ line fits the data.

That is, how far away the points are from the line.

A

regression

33
Q

The closer the points are to the ____ the stronger the relationship between the two variables. (how do we measure this?)

A

line

34
Q

When we had one variable and we wanted to know the spread of the points around the mean, we calculated the ____

A

SD (σ)

35
Q

The square of the SD is the ____

A

variance

36
Q

We can do the same thing with our regression data, but instead of making d the difference between the mean and the score, we can make it the difference between the value that we would expect the person to have, given their score on the x-variable, and the score they actually got. We can calculate their predicted scores, using:

A

y = b0 + b1x

37
Q

For each person, we can therefore calculate their predicted BAS reading score, and the difference between their predicted score and their actual score. The difference is called the _____

A

Residual.

38
Q

The difference between the score they got and the score we thought they would get based on their initial score.

A

residual score

39
Q

If we want to calculate the equivalent of the variance, we need to ____ each person’s residual score.

A

square
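
A minimal sketch of the residual calculation follows. It uses made-up scores and assumes the line y = 2.2 + 0.6x, which is the least-squares line for this toy data; neither the data nor the coefficients come from the reading study. The residual is taken here as actual minus predicted, which is an assumption about sign; squaring removes the sign in any case.

```python
# Made-up scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Assumed regression coefficients (least-squares fit for the toy data above).
b0, b1 = 2.2, 0.6

# Predicted score for each person: y = b0 + b1 * x
predicted = [b0 + b1 * xi for xi in x]

# Residual for each person: actual score minus predicted score.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Square each residual, then sum, as we would for a variance.
squared_residuals = [res ** 2 for res in residuals]
ss_residual = sum(squared_residuals)

print(residuals)
print(ss_residual)
```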

40
Q

The value of the standardized slope and the value of the square root of the proportion of variance explained will ___ be the same value.

A

always

41
Q

We therefore have two equivalent ways of thinking about correlation.

The first way is the _____. It is the expected increase in one variable, when the other variable increases by 1 SD.

A

standardized slope

42
Q

We therefore have two equivalent ways of thinking about correlation.

The second way is the ______. If you square a correlation, you get the ______ in one variable that is explained by the other variable

A

proportion of variance
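
The equivalence can be illustrated with a short sketch: on a made-up data set, the squared correlation equals one minus the ratio of residual variation to total variation in y. The helper names `pearson_r` and `least_squares` are hypothetical, written here only for the example.

```python
import math

# Made-up scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return my - slope * mx, slope


r = pearson_r(x, y)
b0, b1 = least_squares(x, y)

mean_y = sum(y) / len(y)
ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Proportion of variance in y explained by x.
proportion_explained = 1 - ss_residual / ss_total

print(r ** 2, proportion_explained)
```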

43
Q

Interpreting Correlations

A correlation is both ____ and ____

A

descriptive and inferential statistics

44
Q

We can find the probability estimate and we can also use it to describe the ____

A

strength of the relationship.

45
Q

strength of relationship

A

magnitude

46
Q

positive, negative, curvilinear etc.

A

direction

47
Q

* r = 0.1 = small correlation
* r = 0.3 = medium correlation
* r = 0.5 = large correlation

Note that these only really apply in what ____ called the social and behavioral sciences.

A

Cohen’s effect size

48
Q

Common mistake in interpreting correlations

A correlation around 0.5 is a _____

  • A correlation does not have to ____ 0.5 to be large.
  • If you have a correlation of r = 0.45, you have a correlation which is approximately ___ to a large correlation.
  • It’s not a ______ correlation just because it hasn’t quite reached 0.5
A
  1. large correlation
  2. exceed
  3. equal
  4. medium
49
Q

Calculating the correlation coefficient

Also known as the Pearson product-moment correlation

A

Pearson Correlation Coefficient

50
Q

The Pearson correlation coefficient was developed by ____

A

Karl Pearson

51
Q

The Pearson correlation is a _____ correlation and makes the same assumptions made by other _____ tests.

A

Parametric

52
Q

The Pearson correlation coefficient is used with _______ data

A

Continuous and normally distributed

53
Q

The moment is the length from the fulcrum multiplied by the weight on the lever.

A

physics

54
Q

The total moment is equal to the length from the center, multiplied by the weight.

A

seesaw analogy

55
Q

The same principle applies with _____

A

correlation

56
Q

We find the length from the center for each of the variables. In this case the center is the _____

A

mean

57
Q

So, we calculate the difference between ______ and _____ for each variable (these are the moments) and then we multiply them together (this is the product).

A

the score and the mean

58
Q

Because this value is ____ on the number of people, we need to divide it by N.

A

dependent

59
Q

And because it is related to the _____, we actually divide by N-1

This is called _____, and if we call the two variables x and y

A

standard deviation

covariance

60
Q

Finally, finding _____ is laborious, and we do not want to do it more than we have to.

A

square roots

61
Q

So instead of finding the square roots and then multiplying them together, it is easier to ______ together, and then find the square root.

A

multiply the two values
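
The steps described in the preceding cards can be sketched in a few lines: take each score's distance from its mean, multiply the paired distances, sum, divide by N - 1 to get the covariance, and then divide by the square root of the product of the two variances (one square root rather than two). The data are made up for illustration.

```python
import math
import statistics

# Made-up scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)

# Distance of each score from its mean (the "moments"), multiplied in pairs
# (the "product").
products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]

# Divide the summed products by N - 1 to get the covariance.
covariance = sum(products) / (n - 1)

# Multiply the two variances first, then take a single square root.
r = covariance / math.sqrt(statistics.variance(x) * statistics.variance(y))

print(covariance, r)
```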

62
Q

importance of scattergraph or plot

It will show us approximately what the correlation should be. So if it looks like a strong ______, and our analysis shows it is -0.60, we have made a mistake.

A

positive correlation

63
Q

importance of scattergraph or plot

It will help us detect any ____ in our data, for example data entry errors.

A

errors

64
Q

importance of scattergraph or plot

  • It will help us get a feel of our ____
A

data

65
Q

The _____ for a statistic tell us the likely range of a value in the population.

A

confidence intervals

66
Q

calculating confidence intervals

The sampling distribution of the correlation is _____

A

tricky.

67
Q

calculating confidence interval

It is not symmetrical, which means we can’t _____ or _____ CIs in the usual way.

A

add and subtract

67
Q

calculating the Pearson correlation

transformation used which makes the distribution symmetrical.

A
  • Fisher’s z transformation
68
Q

calculating the Pearson correlation

Used to calculate the CIs and then transform back to _____

A

correlations

68
Q

It is called a _____ because it makes the distribution of the correlation into a z distribution which is a normal distribution with a mean of 0 and SD of 1.

A

z transformation

69
Q

step _

Carry out Fisher’s transformation.

A

step 1

70
Q

step _
calculate the Standard Error

A

step 2

71
Q

step __

And now the CIs. We use the formula
* CI = z’ ± zα/2 × se

A

step 3

72
Q

Where zα/2 is the value for the ______ which includes the percentage of values that we want to cover.

A

normal distribution

73
Q

the value for the 95% confidence is (as always) ____

A

1.96

74
Q

Step _

Convert back to correlation.

A

step 4
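
The four steps can be sketched as a single function. The cards do not give the formulas for steps 1 and 2, so this sketch assumes the standard ones: z’ = atanh(r) for Fisher's transformation and se = 1 / sqrt(N - 3) for the standard error. The function name `fisher_ci` and the example values r = 0.5, N = 50 are hypothetical.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z_prime = math.atanh(r)            # Step 1: Fisher's transformation
    se = 1 / math.sqrt(n - 3)          # Step 2: standard error (assumed formula)
    lower_z = z_prime - z_crit * se    # Step 3: CI on the z scale
    upper_z = z_prime + z_crit * se
    # Step 4: convert both limits back to correlations
    return math.tanh(lower_z), math.tanh(upper_z)


lower, upper = fisher_ci(0.5, 50)
print(round(lower, 3), round(upper, 3))

# The interval is not symmetrical around r on the correlation scale.
print(0.5 - lower, upper - 0.5)
```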

75
Q

If we really want to know the _____ then we can convert the value for r into a value for t.

A

p-value
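
The card does not state the conversion, so the sketch below assumes the standard one: t = r × sqrt(N - 2) / sqrt(1 - r²), with N - 2 degrees of freedom. The function name `r_to_t` and the example values are hypothetical. Looking up the p-value itself needs a t distribution table or a statistics library, which is left out here.

```python
import math


def r_to_t(r: float, n: int) -> float:
    """Convert a correlation r from a sample of size n into a t statistic."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical example: r = 0.5 from 50 people.
r, n = 0.5, 50
t = r_to_t(r, n)
df = n - 2  # degrees of freedom for the t test

print(t, df)
```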

76
Q

When we know the correlation we can also calculate the position of the _______

A

regression line.
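
As a sketch of how this works from summary statistics alone: rearranging r = β × σx / σy gives slope = r × σy / σx, and the least-squares line passes through the point of the two means, so intercept = mean of y - slope × mean of x. All the numbers below are hypothetical.

```python
# Hypothetical summary statistics, for illustration only.
r = 0.5
sd_x, sd_y = 2.0, 4.0
mean_x, mean_y = 10.0, 30.0

# Slope from the correlation and the two SDs.
slope = r * sd_y / sd_x

# The line passes through (mean_x, mean_y), which fixes the intercept.
intercept = mean_y - slope * mean_x

# Use the regression equation to predict y for a new x score.
x_new = 12.0
predicted_y = intercept + slope * x_new

print(slope, intercept, predicted_y)
```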

77
Q

We can use the two values _______ to create a regression equation which will allow us to predict y _____ from x ______.

A
  1. slope and intercept
  2. (display behavior)
  3. (desirability)
78
Q

If variables are both dichotomous (for example, yes/no, top/bottom) we can use the ____

A

Pearson correlation formula.

78
Q

If one of your variables is continuous and the other is dichotomous we can use the ___

A

Point Biserial Formula

79
Q

This is when one variable is categorical and has just two all-inclusive values.
* Examples: Male/Female, Car owner/Non-Car owner, and so on.

A

Point Biserial Correlation
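
A short sketch shows that the point biserial correlation is the Pearson correlation with the dichotomous variable coded 0/1. The point biserial formula used for comparison is the standard one, (M1 - M0) / s × sqrt(n1 × n0 / n²) with s the population SD of the continuous scores; it is not given in the cards. The data and the helper name `pearson_r` are made up for illustration.

```python
import math
import statistics

# Made-up data: group membership coded 0/1 and a continuous score.
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 5, 6, 7]


def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r_pearson = pearson_r(group, score)

# Point biserial formula, computed from the two group means.
scores_1 = [s for g, s in zip(group, score) if g == 1]
scores_0 = [s for g, s in zip(group, score) if g == 0]
n1, n0, n = len(scores_1), len(scores_0), len(score)
r_pb = (
    (statistics.mean(scores_1) - statistics.mean(scores_0))
    / statistics.pstdev(score)
    * math.sqrt(n1 * n0 / n ** 2)
)

print(r_pearson, r_pb)
```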

80
Q

Non-Parametric Correlations
* Used when the data do not satisfy the assumptions of the Pearson Correlation because they are not normally distributed or are only ordinal in nature.

A

Spearman Correlation and Kendall Correlation

81
Q

Three ways to deal with this problem:

A

  • Ignore it. It does not make a lot of difference.
  • Use the Pearson formula on the ranks (although the calculation is harder than the Spearman formula).
  • Use a correction.
82
Q

If we use a non-parametric test, such as a _____, we tend to lose power.

A

Spearman correlation

83
Q

Although we could be strict and say that rating data are strictly measured at an ordinal level, in reality when there isn’t a problem with the distributions, we would always prefer to use a ____

A

Pearson Correlation

84
Q

A _____ gives a better chance of a significant result.

A

Pearson correlation

85
Q

A curious thing about the _____ is how to interpret it.

A

Spearman

86
Q

We can’t say that it is the _____, that is the relative difference in the SDs, because the SDs don’t really exist as there is not necessarily any relationship between the score and the SD.

A

standardized slope

87
Q

We also can’t say that it is the ____ explained, because the variance is a parametric term, and we are using ranks.

A

proportion of variance

88
Q

All we can really say about the Spearman is that it is the Pearson correlation between the ___

A

ranks
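
This can be shown directly: rank each variable, then apply the Pearson formula to the ranks. The sketch below uses made-up data without ties and compares the result with the usual shortcut formula 1 - 6Σd² / (n(n² - 1)), which is valid when there are no ties. The helper names `average_ranks` and `pearson_r` are hypothetical.

```python
import math

# Made-up scores without ties, for illustration only.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rank_x, rank_y = average_ranks(x), average_ranks(y)

# Spearman correlation: Pearson formula applied to the ranks.
rho = pearson_r(rank_x, rank_y)

# Shortcut Spearman formula (no ties), for comparison.
n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
rho_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(rho, rho_shortcut)
```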

89
Q

An alternative nonparametric correlation, which does have a more sensible interpretation. (advantage: meaningful interpretation)

Very rarely used however.

A

Kendall’s Tau-a (τ – Greek Letter)

90
Q

Kendall’s Tau-a is rarely used for two reasons:

A

* Difficult to calculate if you do not have a computer.
* It is always lower than a Spearman correlation for the same data (but the p-values are always exactly the same). Because people like their correlations to be high, they tend to use it less.
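
For reference, a minimal sketch of Kendall's tau-a, assuming the standard definition: the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs n(n - 1)/2. This is what gives it a direct interpretation as a difference between two proportions of pairs. The data and the function name `kendall_tau_a` are made up for illustration.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables order the pair the same way
        elif direction < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # tied pairs count in the denominator only
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / n_pairs


# Made-up scores, for illustration only.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

Checking every pair is why the calculation is tedious by hand: the number of pairs grows roughly with the square of the sample size.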

91
Q

The fact that two variables correlate does not mean there is a causal relationship between them.
* Though it is often very tempting to believe that there is.

A

Correlation and Causality

92
Q
  • Correlation does not mean ____, but ____ does mean correlation
A

causality

93
Q

In general, if one variable is a purely category-type measure, then correlation cannot be carried out, unless the variable is _____

A

dichotomous

94
Q

Correlation is also a measure of association between _____

A

two variables

95
Q

What we can do with nominal/categorical data is reduce the measured variable to nominal level and conduct a _____ on the resulting frequency table.

A

chi-square test

96
Q

A lack of relationship is signified by a value close to ___

A

zero

97
Q

A value of zero, however, could occur for a ____

A

curvilinear relationship.
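
A small made-up example illustrates the point: y below is completely determined by x (y = x²), yet the Pearson correlation is zero because the relationship is curved rather than linear. The helper name `pearson_r` is hypothetical.

```python
import math

# A perfect curvilinear (U-shaped) relationship: y = x squared.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]


def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
print(r)
```

This is one reason to look at the scatterplot before interpreting a correlation near zero.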

98
Q

Strength is a measure of the ____

A

correlation.