Relationship between Variables: Correlation and Regression Flashcards by Arlyn Buligan

We are interested in finding a way to represent association between scores.

association

The Regression Line

The first and most obvious way to summarize data when we are examining the relationship between two variables.

Scatterplot

The Regression Line

We put one variable on the x-axis and another on the y-axis, and we draw a point for each person showing their scores on the _____

two variables.

The Regression Line

When we want to tell people about our results, we don’t have to draw a lot of _____

scatterplots

Children were asked to listen to a word and repeat it. They were then asked which of these 3 words started with the same sound.

Initial phoneme detection

reading score, a standard measure of reading ability.

British Ability Scale (BAS)

We usually summarize and represent the relationship between two variables with a number

correlation coefficient

We also calculate the ______ for this number, and we want to be able to find out if the relationship is statistically significant.

Thus, we want to know what is the _______ of finding a relationship at least this strong if the null hypothesis that there is no relationship in the population is true.

Confidence Intervals

probability

a best fitting line used for prediction.

Line of best fit or Regression Line

Predicting the _____ in Y as a function of the ______ in X.

variation

How steep the line is.

slope

the position or height of the line.

intercept

By ____ we give the height at the point where the line hits the y-axis.

convention

The height is called the ____, or often just the _____ (or sometimes the constant).

y-intercept or intercept

The intercept represents the expected score of a person who scored zero on the ______

x-axis variable.

It is often the case that the intercept doesn’t make any sense. After all, no one usually scores ____

0 or close to 0.

We can use the two values of ______ to calculate the expected value of any person’s score on Y, given their score on X.

slope and intercept

formula for Expected Y score

Expected Y score = intercept + slope × (score on X)

Where x is the x-axis variable. This equation is called the ______

regression equation.
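
As a small illustration of the regression equation, here is a sketch in Python. The function name `expected_y` and the numbers used are hypothetical and are not taken from the cards.

```python
def expected_y(intercept: float, slope: float, x: float) -> float:
    """Expected Y score = intercept + slope * (score on X)."""
    return intercept + slope * x


# Hypothetical values, chosen only for illustration.
prediction = expected_y(intercept=2.0, slope=0.5, x=10.0)
print(prediction)  # 2.0 + 0.5 * 10.0
```

Note that plugging in x = 0 simply returns the intercept, which is why the intercept is read as the expected score of a person who scored zero on X.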

Making Sense of Regression Lines

thinking about the relationship between______ can be very useful.

two variables

Making Sense of Regression Lines

We can make a ____ about one score from another score.

prediction

Problem: if we don’t understand the scale(s), regression lines and equations are _____

meaningless

When there is a relationship between two variables, we can _____ one from the other.

We cannot say that one _____ the other.

1. predict 2. explains

The correlation coefficient

We need some way of making the scales have some sort of meaning, and the way to do this is to convert the data into _____

standard deviation units.

How well did you know this?

Not at all

Perfectly

Thus we could ask: “If the score on ___ is one SD higher, how many SDs higher would we expect the ____score to be?”

1. x 2. y

Talking in terms of SDs means that we are talking about _____

standardized scores

Because we are talking about standardized regression slopes, we call it______

standardized slope.

Correlation coefficient – a more important name for the ______

standardized slope.

Where σx is the SD of the variable on the x-axis (the horizontal one) of the scatterplot, σy is the SD of the variable on the y-axis (the vertical one), and r is the correlation.

The letter r actually stands for ______, but most people ignore that because it is confusing.

regression

if we know the slope we can calculate the correlation using the formula:

r = β × σx / σy
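
The slope-to-correlation formula can be checked numerically. The sketch below assumes an ordinary least-squares slope and uses a small made-up data set. The helper names are hypothetical.

```python
from statistics import mean, stdev


def slope(xs, ys):
    """Least-squares slope of the line predicting y from x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def correlation_from_slope(xs, ys):
    """r = slope * SD(x) / SD(y)."""
    return slope(xs, ys) * stdev(xs) / stdev(ys)


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(slope(xs, ys))
print(correlation_from_slope(xs, ys))
```

Rescaling the slope by the ratio of the two SDs is what puts it into standard deviation units.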

Residual: in correlation, we want to know how well the ______ line fits the data, that is, how far away the points are from the line.

regression

The closer the points are to the ____ the stronger the relationship between the two variables. (how do we measure this?)

line

When we had one variable and we wanted to know the spread of the points around the mean, we calculated the____

SD (σ)

The square of the SD is the ____

variance

We can do the same thing with our regression data, but instead of making d the difference between the mean and the score, we can make it the difference between the value that we would expect the person to have, given their score on the x-variable, and the score they actually got. We can calculate their predicted scores, using:

y = b0 + b1x

For each person, we can therefore calculate their predicted BAS reading score, and the difference between their predicted score and their actual score. The difference is called the _____

residual

the difference between the score they got and the score we thought they would get based on their initial score

residual score
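
A minimal sketch of the residual calculation, assuming a least-squares line y = b0 + b1x fitted to made-up data. The function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b0, b1): intercept and slope of the least-squares line."""
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    b0 = my - b1 * mx
    return b0, b1


def residuals(xs, ys):
    """Actual score minus predicted score, for each person."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
res = residuals(xs, ys)
print(res)
```

For a least-squares line the residuals sum to zero (up to rounding), so their spread, not their total, is what measures how well the line fits.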

If we want to calculate the equivalent of the variance, we need to ____ each person’s residual score.

square

The value of the standardized slope and the value of the square root of the proportion of variance explained will ___ be the same value.

always

We therefore have two equivalent ways of thinking about correlation. The first way is the _____. It is the expected increase in one variable, when the other variable increases by 1 SD.

standardized slope

We therefore have two equivalent ways of thinking about correlation. The second way is the ______. If you square a correlation, you get the ______ in one variable that is explained by the other variable.

proportion of variance
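
The equivalence between the squared correlation and the proportion of variance explained can be checked with a short sketch. It assumes a least-squares regression line and made-up data, and computes the proportion explained as 1 minus the ratio of residual to total sums of squares.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def proportion_explained(xs, ys):
    """1 - (sum of squared residuals / total sum of squares of y)."""
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    b0 = my - b1 * mx
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
explained = proportion_explained(xs, ys)
print(r ** 2, explained)
```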

Interpreting Correlations: a correlation is both ____ and ____

descriptive and inferential statistics

We can find the probability estimate and we can also use it to describe the ____

strength of the relationship.

strength of relationship

magnitude

positive, negative, curvilinear etc.

direction

r = 0.1: small correlation. r = 0.3: medium correlation. r = 0.5: large correlation. Note that these guidelines only really apply in what ____ called the social and behavioral sciences.

Cohen (these are Cohen’s effect size guidelines)

Common mistake in interpreting correlations: 1. A correlation around 0.5 is a _____. 2. A correlation does not have to ____ 0.5 to be large. 3. If you have a correlation of r = 0.45, you have a correlation which is approximately ___ to a large correlation. 4. It’s not a ______ correlation just because it hasn’t quite reached 0.5.

1. large correlation 2. exceed 3. equal 4. medium

Calculating the correlation coefficient: also known as the Pearson product-moment correlation.

Pearson Correlation Coefficient

Pearson correlation coefficient developed by ____

Karl Pearson

_____ correlation and makes the same assumptions made by other _____ tests.

Parametric

The Pearson correlation coefficient is used with _______ data.

Continuous and normally distributed data

the moment is the length from the fulcrum multiplied by the weight on the lever.

physics

the total moment is equal to the length from the center, multiplied by the weight.

seesaw analogy

The same principle applies with _____

correlation

We find the length from the center for each of the variables. In this case the center is the _____

mean

So, we calculate the difference between ______ and _____ for each variable (these are the moments) and then we multiply them together (this is the product).

the score and the mean

Because this value is ____ on the number of people, we need to divide it by N.

dependent

And because it is related to the _____, we actually divide by N - 1. This is called the _____, and if we call the two variables x and y, it is Σ(x - mean of x)(y - mean of y) / (N - 1).

1. standard deviation 2. covariance

Finally, finding _____ is laborious, and we do not want to do it more than we have to.

square roots

So instead of finding the square roots and then multiplying them together, it is easier to ______ together, and then find the square root.

multiply the two values
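
Putting the product-moment steps together gives the following sketch. It divides the covariance by the square root of the product of the two variances, so only one square root is taken. The data and function names are made up for illustration.

```python
import math
from statistics import mean


def covariance(xs, ys):
    """Sum of products of deviations from the mean, divided by N - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Covariance divided by the square root of the product of the variances."""
    var_x = covariance(xs, xs)  # the covariance of x with itself is its variance
    var_y = covariance(ys, ys)
    return covariance(xs, ys) / math.sqrt(var_x * var_y)


# Made-up scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys))
print(pearson_r(xs, ys))
```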

Importance of a scattergraph or plot: it will show us approximately what the correlation should be. So if it looks like a strong ______, and our analysis shows it is -0.60, we have made a mistake.

positive correlation

Importance of a scattergraph or plot: it will help us detect any ____ in our data, for example data entry errors.

errors

Importance of a scattergraph or plot: it will help us get a feel for our ____

data

The _____ for a statistic tell us the likely range of its value in the population.

confidence intervals

Calculating confidence intervals: the sampling distribution of the correlation is _____

tricky.

Calculating confidence intervals: it is not symmetrical, which means we can’t _____ or _____ CIs in the usual way.

add and subtract

Calculating the Pearson correlation: the transformation used to make the sampling distribution symmetrical.

Fisher’s z transformation

Calculating the Pearson correlation: the transformation is used to calculate the CIs, which are then transformed back to _____

correlations

It is called a _____ because it turns the distribution of the correlation into a z distribution, which is a normal distribution with a mean of 0 and an SD of 1.

z transformation

step _ Carry out Fisher’s transformation.

step 1

step _ calculate the Standard Error

step 2

step __ And now the CIs. We use the formula: CI = z’ ± zα/2 × se

step 3

Where zα/2 is the value for the ______ which includes the percentage of values that we want to cover.

normal distribution

the value for the 95% confidence is (as always) ____

1.96

Step _ Convert back to correlation.

step 4
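
The four steps can be sketched as follows. The cards do not state the formulas for the transformation or the standard error, so this sketch assumes the standard ones: z’ = atanh(r) and se = 1 / √(N - 3). The function name and the example values are hypothetical.

```python
import math


def correlation_ci(r: float, n: int, z_crit: float = 1.96):
    """Confidence interval for a correlation via Fisher's z transformation."""
    z_prime = math.atanh(r)            # step 1: Fisher's transformation
    se = 1 / math.sqrt(n - 3)          # step 2: standard error (assumed formula)
    lower_z = z_prime - z_crit * se    # step 3: CI on the z' scale
    upper_z = z_prime + z_crit * se
    # step 4: convert back to the correlation scale
    return math.tanh(lower_z), math.tanh(upper_z)


# Hypothetical example: r = 0.5 from a sample of 103 people.
low, high = correlation_ci(0.5, 103)
print(low, high)
```

The resulting interval is not symmetrical around r on the correlation scale: the lower limit lies further from 0.5 than the upper limit does.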

If we really want to know the_____ then we can convert the value for r into a value for t.

p-value
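
The cards do not give the conversion itself. A sketch using the standard formula, t = r × √(N - 2) / √(1 - r²) with N - 2 degrees of freedom, might look like this. The function name and example values are hypothetical.

```python
import math


def r_to_t(r: float, n: int) -> float:
    """Convert a correlation r from n pairs of scores to a t value (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical example: r = 0.5 with 27 people, so df = 25.
t_value = r_to_t(0.5, 27)
print(t_value)
```

The t value would then be looked up against a t distribution with N - 2 degrees of freedom to obtain the p-value.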

When we know the correlation we can also calculate the position of the _______

regression line.

We can use the two values _______ to create a regression equation which will allow us to predict y _____ from x ______.

1. slope and intercept 2. (display behavior) 3. (desirability)

If variables are both dichotomous (for example, yes/no, top, bottom) we can use the ____

Pearson correlation formula.

If one of your variables is continuous and the other is dichotomous, we can use the ___

Point Biserial Formula

This is when one variable is categorical and has just two all-inclusive values. * Examples: Male/Female, Car owner/Non-Car owner, and so on.

Point Biserial Correlation
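
As a hedged illustration, the point-biserial correlation gives the same value as the Pearson formula applied with the dichotomous variable coded 0/1. The sketch below compares the two on made-up data. The point-biserial formula used, (M1 - M0) / SD × √(pq) with the population SD, is the standard one and is not stated in the cards.

```python
import math
from statistics import mean, pstdev


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(groups, scores):
    """(M1 - M0) / population SD of scores * sqrt(p * q), groups coded 0/1."""
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / len(scores)
    q = 1 - p
    return (mean(ones) - mean(zeros)) / pstdev(scores) * math.sqrt(p * q)


# Made-up data: group membership coded 0/1 and a continuous score.
groups = [0, 0, 0, 1, 1, 1]
scores = [1, 2, 3, 4, 5, 6]
print(point_biserial(groups, scores), pearson_r(groups, scores))
```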

Non-Parametric Correlations * Used when the data do not satisfy the assumptions of the Pearson Correlation because they are not normally distributed or are only ordinal in nature.

Spearman Correlation Kendall Correlation

Three ways to deal with this problem:

1. Ignore it. It does not make a lot of difference. 2. Use the Pearson formula on the ranks (although the calculation is harder than the Spearman formula). 3. Use a correction.

If we use a non-parametric test, such as a _____ we tend to lose power.

Spearman correlation

Although we could be strict and say that rating data are strictly measured at an ordinal level, in reality when there isn’t a problem with the distributions, we would always prefer to use a ____

Pearson Correlation

A_____ gives a better chance of a significant result.

Pearson correlation

A curious thing about the _____ is how to interpret it.

Spearman

We can’t say that it is the_____, that is the relative difference in the SDs, because the SDs don’t really exist as there is not necessarily any relationship between the score and the SD.

standardized slope

We also can’t say that it is the ____ explained, because the variance is a parametric term, and we are using ranks.

proportion of variance

All we can really say about the Spearman is that it is the Pearson correlation between the ___

ranks
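
A minimal sketch of this idea: rank each variable, then apply the Pearson formula to the ranks. Giving tied scores the average of their ranks is an assumption of this sketch, and the helper names are hypothetical.

```python
import math
from statistics import mean


def ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman correlation as the Pearson correlation between the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Made-up data: y rises with x, but not in a straight line.
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Because only the ordering matters, any relationship that always increases gives a Spearman correlation of 1, even when it is not linear.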

alternative nonparametric correlation, which does have a more sensible interpretation. (advantage: meaningful interpretation) Very rarely used however.

Kendall’s Tau-a (τ – Greek Letter)

Kendall’s Tau-a is rarely used for two reasons:

1. It is difficult to calculate if you do not have a computer. 2. It is usually lower than a Spearman correlation for the same data (although the p-values are typically very similar), and because people like their correlations to be high, they tend to use it less.
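
The cards do not give the formula. The sketch below assumes the usual definition of tau-a: the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. It compares every pair of people, which is why the calculation is tedious by hand. The function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order this pair the same way
        elif direction < 0:
            discordant += 1   # the variables order this pair in opposite ways
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up data: one pair out of six is ordered differently by x and y.
tau = kendall_tau_a([1, 2, 3, 4], [1, 2, 4, 3])
print(tau)
```

Under this definition, tau-a can be read as the proportion of pairs ordered the same way by both variables minus the proportion ordered in opposite ways.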

The fact that two variables correlate does not mean there is a causal relationship between them. * Though it is often very tempting to believe that there is.

Correlation and Causality

Correlation does not mean ____, but ____ does mean correlation.

causality

In general, if one variable is a purely category-type measure, then correlation cannot be carried out, unless the variable is _____

dichotomous

Correlation is also a measure of association between_____

two variables

What we can do with nominal/categorical data is reduce the measured variable to nominal level and conduct a _____ on the resulting frequency table.

chi-square test

A lack of relationship is signified by a value close to ___

zero

A value of zero however could occur for a ____

curvilinear relationship.
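
A small made-up example shows how this can happen. Here y is completely determined by x (y = x²), yet the Pearson correlation is zero because the relationship is U-shaped rather than linear.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: a perfect U-shaped (curvilinear) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r = pearson_r(xs, ys)
print(r)
```

This is one reason a scatterplot is worth drawing before interpreting a correlation close to zero.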

Strength is a measure of the____

correlation.

Relationship between Variables: Correlation and Regression Flashcards

(101 cards)