[L8] Relationship between Variables Correlation and Regression Flashcards by Arellano, Miella Janica

We are interested in finding a way to represent ___
between scores.

association

Types of Correlation

Bivariate; Multivariate Correlation

Correlation does not prove __

Causality

Multivariate correlations have more ____ validity

Ecological

IGT & RMT = test of difference
Correlation = test of ___

correlation/association

___ – first and most obvious way to summarize
data where we are examining the relationship between
two variables

Scatterplot

We put one variable on the x-axis and another on the y-axis, and we ___ for each person, showing their scores on the two variables.

draw a point

A test of correlation involves administering ___ tests to the same group of participants

2 or more different

When we want to tell people about our results, we ____

don’t have to draw a lot of scatterplots.

___: Children were asked to listen to a word and repeat it. They were then asked which of these 3 words started with the same sound.

Initial phoneme detection.

____ reading score, a standard measure of reading ability.

British Ability Scales (BAS)

We usually summarize and represent the relationship between two variables with a ___

number (correlation coefficient).

We also calculate the ____ for this
number, and we want to be able to find out if the
relationship is ___

Confidence Intervals; statistically significant

Thus, we want to know what is the probability of finding
a relationship at least this strong if the ____ that
there is no relationship in the population is true.

null hypothesis

___ – a best-fitting line used for prediction

Line of best fit or Regression Line

Predicting the variation in Y as a ___

function of the variation in X.

___ – how steep the line is

Slope

___ – the position or height of the line.

Intercept

By convention we give the height at the point where the
line ___

hits the y-axis.

The height is called the ____, or often just the intercept

y-intercept (or sometimes the constant)

The intercept represents the ___ of a person who scored ___ on the x-axis variable.

expected score ; zero

y = b0 + b1x

regression expression, predicting the behavior of y as a function of x

useful for raw scores

It is often the case that the intercept ___. After all, no one usually scores ___

doesn’t make any sense; 0 or close to 0.

We can use the ___ of slope and ___ to calculate the expected value of any person’s score on Y, given their score on X.

two values, intercept

y = β0 + β1x (sometimes written y = a + bx or y = mx + c), where x is the x-axis variable. This equation is called the ___

regression equation.
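
A minimal sketch of how the two values (intercept and slope) turn a score on X into an expected score on Y. The data and the helper names `fit_line` and `predict` are hypothetical, and least squares is assumed as the way the line of best fit is found, which the cards do not state.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Expected score on Y for a given score on X: y = b0 + b1 * x."""
    return intercept + slope * x


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = fit_line(xs, ys)
expected_at_6 = predict(b0, b1, 6)
```

Note that the intercept here is the expected score for someone who scored zero on X, which may lie outside the range of scores anyone actually obtained.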

We can make a ___ about one score from another score

prediction

Problem: if we don’t understand the ___, regression lines and equations are ___.

scale(s), meaningless

thinking about the relationship between two variables can be very useful

Making Sense of Regression Lines

When there is a relationship between two variables, we can ___ one from the other.

predict

We cannot say that one ___ the other.

explains

We need some way of making the scales have some sort of meaning, and the way to do this is to __ the data into __

convert; standard deviation units.

Talking in terms of SDs means that we are talking about ___

standardized scores.

Because we are talking about standardized regression slopes, we call it the ___

standardized slope.

___ – a more important name for the standardized slope.

Correlation coefficient

In order to convert the units, we need to know the ___

SD of each of the measures.

If we know the ___, we can calculate the correlation using the formula: r = β × σx / σy

slope
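
As a rough check of r = β × σx / σy, the sketch below computes the raw slope of a small made-up data set and rescales it by the two SDs. The data and the helper name `slope_of` are hypothetical; the sample SD (N-1) is assumed for both variables.

```python
import statistics


def slope_of(xs, ys):
    """Least-squares slope of y on x (raw, unstandardized units)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical scores for five people.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

b = slope_of(xs, ys)
# Convert the raw slope into SD units: r = b * sd_x / sd_y.
r = b * statistics.stdev(xs) / statistics.stdev(ys)
```

In this example the two SDs happen to be equal, so the standardized slope and the raw slope coincide (both are about 0.8).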

The letter r actually stands for ___, but most people ignore that because it is confusing

regression

Thus, if we know the ___ we can calculate the correlation

slope

Three ways to calculate the correlation coefficient "r"

1. regression line
2. standardized slope
3. proportion of variance

In correlation, we want to know how well the regression line ___

fits the data.

That is, how ___ the points are from the line.

far away

The __ the points are to the line, the stronger the relationship between the two variables.

closer

When we had one variable and we wanted to know the spread of the points around the mean, we calculated the ___

SD (σ).

The square of the SD is the ___.

variance

We can do the same thing with our regression data, but instead of making d the difference between the mean and the score, we can make it the difference between the value that we would expect the person to have, given their score on the x-variable, and the score they actually got. We can calculate their ___

predicted scores,

The difference between their predicted score and their actual score is called the ___.

Residual

Their ____ (the difference between the score they got and the score we thought they would get based on their initial phoneme score)

residual score

if we want to calculate the equivalent of the variance, we need to ___ each person’s score

square

___ = d squared

Residual squared

The value of the standardized slope and the value of the square root of the proportion of variance explained will ___ be the same value.

always
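
The sketch below illustrates the claim on a small made-up data set: it computes each person's residual, squares and sums them, and compares the square root of the proportion of variance explained with the standardized slope. All variable names and data are hypothetical; the comparison uses the absolute value of the slope, since a square root carries no sign.

```python
import math

# Hypothetical scores for five people.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = score the person got minus the score we predicted for them.
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

# Squared residuals play the role that squared deviations play for variance.
ss_residual = sum(d ** 2 for d in residuals)
proportion_explained = 1 - ss_residual / syy

# Standardized slope: raw slope rescaled by the ratio of the SDs.
standardized_slope = slope * math.sqrt(sxx / syy)
root_of_proportion = math.sqrt(proportion_explained)
```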

We therefore have ___ of thinking about correlation.

two equivalent ways

The first way is the ___. It is the expected increase in one variable, when the other variable increases by 1 SD.

standardized slope.

The second way is the ___. If you square a correlation, you get the proportion of variance in one variable that is explained by the other variable.

proportion of variance.

A correlation is both ___ statistics.

descriptive and inferential

We can find the ____ and we can also use it to describe the ___

probability estimate ; strength of the relationship

___ – strength of relationship

Magnitude

___ – positive, negative, curvilinear etc.

Direction

Cohen’s effect size:

* r = 0.1 = small correlation
* r = 0.3 = medium correlation
* r = 0.5 = large correlation

Note that these only really apply in what Cohen called the ___

Social and Behavioral sciences.

Common mistake

* A correlation around 0.5 is a large correlation.
* A correlation does not have to exceed 0.5 to be large.
* If you have a correlation of r = 0.45, you have a correlation which is approximately equal to a large correlation.
* It’s not a medium correlation just because it hasn’t quite reached 0.5.

Pearson Correlation Coefficient: also known as the ___

Pearson product-moment correlation.

The Pearson product-moment correlation was developed by ___

Karl Pearson

The Pearson Correlation Coefficient is a ___ and makes the ___ made by other parametric tests.

Parametric correlation; same assumptions

level of measurement for Pearson Correlation Coefficient

Continuous and normally distributed data

to determine r

1. standardized slope
2. proportion of variance
3. Pearson product-moment correlation

Optional Extra: Product Moments. ___: the moment is the ___ from the fulcrum multiplied by the weight on the lever.

Physics; length

___: the total moment is equal to the length from the center, multiplied by the weight. The same principle applies with ___.

Seesaw analogy; correlation

The same principle applies with correlation: it needs to be balanced (raw to standard score) to be ___

comparable

We find the ___ for each of the variables. In this case the center is the ___.

length from the center; mean

So, we calculate the difference between the score and the mean for each variable (these are the ___) and then we multiply them together (this is the ___).

moments; product

Because this value is dependent on the ___, we need to divide it by N.

number of people,

And because it is related to the ___, we actually divide by N-1.

standard deviation

This is called ___, and if we call the two variables x and y,

covariance

Just as before, we need to ___ this value by dividing by the ___

standardize; standard deviations.

Calculating the Correlation Coefficient: we need to divide by ___, so we multiply them together

both SDs

So instead of finding the square roots and then multiplying them together, it is easier to multiply the two values together, and then find the ____

square root.
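
The product-moment steps above can be sketched as follows: take each score's distance from the mean (the moments), multiply them (the product), sum and divide by N-1 (the covariance), then standardize by both SDs. The data and the helper name `covariance` are hypothetical. Multiplying the two variances and taking one square root is used here in place of multiplying the two SDs, which gives the same number.

```python
import math


def covariance(xs, ys):
    """Sum of products of deviations from the means, divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical scores for five people.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

cov_xy = covariance(xs, ys)
# The covariance of a variable with itself is its variance (SD squared).
var_x = covariance(xs, xs)
var_y = covariance(ys, ys)

# Divide by both SDs: sqrt(var_x * var_y) equals sd_x * sd_y.
r = cov_xy / math.sqrt(var_x * var_y)
```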

Importance of a scattergraph or scatterplot:

* It will show us approximately what the correlation should be.
* It will help us detect any errors in our data, for example data entry errors.
* It will help us get a feel for our data.

The confidence intervals for a statistic tell us the likely ___

range of a value in the population.

The sampling distribution of the correlation is ___. It is not ___, which means we can’t add and subtract CIs in the usual way.

tricky; symmetrical

___ – a transformation that makes the distribution symmetrical.

Fisher’s z transformation

___ – used to calculate the CIs, which are then transformed back to correlations.

Fisher’s z transformation

It is called a ____, because it makes the distribution of the correlation into a z distribution which is a normal distribution with a mean of 0 and SD of 1.

z transformation
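
A minimal sketch of the transform, build the interval, transform back procedure. The function name `correlation_ci`, the example values (r = 0.5, N = 30) and the 1.96 multiplier for a 95% interval are assumptions for illustration. The standard error 1/sqrt(N - 3) is the usual large-sample approximation for the transformed value and is not stated on the cards.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)              # Fisher's z transformation of r
    se = 1 / math.sqrt(n - 3)      # approximate standard error on the z scale
    lower_z = z - z_crit * se      # symmetric interval on the z scale
    upper_z = z + z_crit * se
    # Transform the limits back to the correlation scale.
    return math.tanh(lower_z), math.tanh(upper_z)


lower, upper = correlation_ci(0.5, 30)
```

The interval is symmetric on the z scale but not on the correlation scale: here the upper limit sits closer to 0.5 than the lower limit does.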

There are ___ to find the p-value associated with a correlation.

2 ways

Calculating the p-value

1. Use table in Appendix 3.

If we really want to know the p-value, then we can convert the value for r into a ___

value for t.

We can use this t-value to obtain the ___ using a ___

exact p-value, computer program
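
The conversion from r to t can be sketched with the standard formula t = r × sqrt(N - 2) / sqrt(1 - r²), with N - 2 degrees of freedom. The function name `r_to_t` and the example values are hypothetical; looking up the p-value for the resulting t is left to a statistics program, as the card suggests.

```python
import math


def r_to_t(r, n):
    """Convert a correlation r from n pairs of scores into (t, degrees of freedom)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Hypothetical example: r = 0.8 from five people.
t, df = r_to_t(0.8, 5)
```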

When we know the __ we can also calculate the ___ of the regression line

correlation; position

We can use the two values ___ to create a regression equation which will allow us to predict y (display behavior) from x (desirability).

(slope and intercept)

We can use the ___ to draw a graph with the line of best fit on it

predictions

we have extended the line to __ – we would not normally do this.

zero

If variables are both dichotomous (for example, yes/no, top, bottom) we can use the ___

Pearson correlation formula.

Dichotomous Variables - A much easier way is to calculate the value of ___ and then use the ___, which will give the same answer as using the r correlation.

chi-square; phi (φ) correlation formula

The p-value of the correlation will be the same as the p-value for the ___ because the two tests are just different ways of thinking about the same thing.

chi-square test
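
A sketch of the equivalence for two dichotomous variables, using a made-up 2×2 table. Phi is taken as sqrt(chi-square / N), with chi-square computed without a continuity correction; the cell counts and the helper name `pearson` are hypothetical. Phi from this formula is unsigned, so it is compared with the absolute value of r.

```python
import math


def pearson(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 2x2 frequency table:
#            y = 1   y = 0
#   x = 1      a       b
#   x = 0      c       d
a, b, c, d = 20, 10, 5, 15
n = a + b + c + d

# Chi-square for a 2x2 table (no continuity correction).
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
phi = math.sqrt(chi_square / n)

# The same table written out as one 0/1 score per person on each variable.
xs = [1] * a + [1] * b + [0] * c + [0] * d
ys = [1] * a + [0] * b + [1] * c + [0] * d
r = pearson(xs, ys)
```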

If one of your variables is continuous and the other is dichotomous we can use the ___

Point Biserial Formula

These formulae give exactly the same answer as the ___, but they are just easier to use.

Pearson Formula

On special occasions we can correlate using a ___. This is when one variable is categorical and has just two all-inclusive values.

dichotomous variable.

Here, we may give an ___ according to membership of the categories, e.g. 0 for female and 1 for male.

arbitrary value

Dichotomous Variables- We then proceed with the ___ as usual.

Pearson Correlation

The Point Biserial is written as

rpb.

This value can be turned into an ordinary ___

t-value.
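
A sketch showing that coding the dichotomous variable as 0 and 1 and running an ordinary Pearson correlation gives the same value as a point biserial formula. The form used here, r_pb = (M1 - M0) / SD × sqrt(p × q) with the SD computed by dividing by N, is one common textbook version and may be laid out differently from the formula the deck refers to. The data and names are hypothetical.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: arbitrary 0/1 codes for group membership, plus a score.
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 6, 7, 8]

scores_0 = [s for g, s in zip(group, score) if g == 0]
scores_1 = [s for g, s in zip(group, score) if g == 1]
p = len(scores_1) / len(score)   # proportion coded 1
q = 1 - p                        # proportion coded 0

# Point biserial: mean difference in SD units, weighted by the group split.
mean_difference = statistics.mean(scores_1) - statistics.mean(scores_0)
r_pb = mean_difference / statistics.pstdev(score) * math.sqrt(p * q)

# Ordinary Pearson correlation on the 0/1 codes.
r = pearson(group, score)
```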

This may sound like a cheat because the Pearson’s was a ____ type of statistic and that the level of measurement should be at least interval.

parametric

This is true only if we want to make ___ from our results about underlying populations

certain assumptions

* Used when the data do not satisfy the assumptions of the Pearson Correlation because they are not normally distributed or are only ordinal in nature

Non-Parametric Correlations

Non-Parametric Correlations (2)

Spearman correlation; Kendall correlation

Two ways to find the Spearman Correlation.

calculate the Spearman correlation

Steps of the Spearman correlation

1. Draw a scatterplot.
2. Rank the data in each group separately.
3. Find the difference between the ranks for each person, which we call d.
4. Use the formula for Spearman.

The first way to calculate the Spearman correlation is just to calculate a (Pearson) correlation, using the ___

ranked data.

The problem is that the Pearson Formula is a bit __, especially if a computer is not used.

fiddly

A simplification of the Pearson Formula is available, developed by Spearman, which works in the case where there are ___.

ranks

Find the ___ for each person, which we call d.

difference between the ranks

The ____ is on a scale, however it is not on a scale we understand

d-score total

We need to convert the scale into one that we do understand such as the ____, which goes from -1.00 to +1.00.

correlation scale

However, there is a slight complication because the formula as we have given it is only valid when there are ___

no ties in the data.
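
A sketch of both routes on made-up data with no ties: the rank-difference formula, taken here in its usual form r_s = 1 - 6 × Σd² / (N × (N² - 1)), and an ordinary Pearson correlation on the ranks. The data and the helper names `ranks` and `pearson` are hypothetical, and the simple ranking helper only works when every value is distinct.

```python
import math


def ranks(values):
    """Rank from 1 (smallest) upward. Assumes there are no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def pearson(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five people, with no ties.
xs = [10, 20, 30, 40, 50]
ys = [12, 25, 18, 40, 33]

rank_x = ranks(xs)
rank_y = ranks(ys)

# d is each person's difference between their two ranks.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
n = len(xs)
spearman_formula = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))

# The other route: an ordinary Pearson correlation on the ranked data.
spearman_via_pearson = pearson(rank_x, rank_y)
```

With ties present the two routes can disagree, which is why the Pearson-on-ranks route or a correction is suggested for tied data.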

Three ways to deal with this problem:

1. Ignore it. It does not make a lot of difference.
2. Use the Pearson Formula on the ranks (although the calculation is harder than the Spearman formula).
3. Use a correction (the book suggests a site).

We calculate the significance of the Spearman in the same way as the significance of the __ Correlation.

Pearson

___– not at all straightforward or easy to calculate.

Confidence Intervals

If we use a non-parametric test, such as a __ correlation, we tend to lose power.

Spearman

By converting data to ___, information about the actual scores is thrown away

ranks

Although we could be strict and say that rating data are strictly measured at an ordinal level, in reality when there isn’t a problem with the distributions, we would always prefer to use a ___

Pearson Correlation.

A ___ correlation gives a better chance of a significant result

Pearson

A curious thing about the Spearman is ___

how to interpret it.

We can’t say that it is the ____, that is the relative difference in the SDs, because the SDs don’t really exist as there is not necessarily any relationship between the score and the SD.

standardized slope

We also can’t say that it is the ___ explained, because the variance is a ___ term, and we are using ranks

proportion of variance; parametric

All we can really say about the Spearman is that it is the ___

Pearson correlation between the ranks.

Non-Parametric Correlations

Spearman Rank Correlation Coefficient

Spearman Rank Correlation Coefficient - shows how closely the ___ are related.

ranked data

___ – alternative nonparametric correlation, which does have a more sensible interpretation. (advantage: meaningful interpretation)

Kendall’s Tau-a (τ – Greek Letter)

Very rarely used however.

Kendall’s Tau-a

Kendall’s Tau-a is rarely used for two reasons:

1. It is difficult to calculate if you do not have a computer.
2. It is always lower than a Spearman correlation for the same data (but the p-values are always exactly the same).

Kendall’s Tau-a - Because people like their correlations to be ___, they tend to use it less.

high
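
For illustration, Kendall's tau-a can be sketched as the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. This definition is the standard one rather than something given on the cards, and the data and function name are hypothetical. On this particular data set the Spearman correlation works out to 0.8, so tau-a comes out lower.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = 0
    discordant = 0
    # Compare every pair of people once.
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1    # both variables order the pair the same way
        elif product < 0:
            discordant += 1    # the variables order the pair in opposite ways
    n = len(xs)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical scores for five people, with no ties.
xs = [10, 20, 30, 40, 50]
ys = [12, 25, 18, 40, 33]

tau = kendall_tau_a(xs, ys)
```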

The fact that two variables correlate does not mean there is a ___ relationship between them.

causal

Correlation does not mean causality, but ___ does mean correlation.

causality

Correlation when one variable is categorical

Chi-square test

In general, if one variable is a purely category-type measure, then correlation cannot be carried out, unless the variable is ___.

DICHOTOMOUS

We can however use the Chi-square test since it is called a ___

test of association.

___ is also a measure of ___ between two variables

Correlation; association

What we can do with nominal/categorical data is reduce the measured variable to ___ level and conduct a ___ test on the resulting frequency table.

nominal; chi-square

This is only possible, however, where you have gathered several ___ in each category.

cases

We can find the ___ for the measured data and record how many responses/frequencies were ___

overall mean; above and below this mean

Typical variables that cannot be correlated (unless a rational attempt to order categories is made) are: marital status, ethnicity, place of residence, handedness, sexuality, degree subject and so on.

*Typical variables

A lack of relationship is signified by a value ___

close to zero.

A value of zero, however, could occur for a ___

curvilinear relationship.
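
A small made-up example of this: y is completely determined by x (y = x²), yet the Pearson correlation is zero because the relationship is U-shaped rather than linear. The data and the helper name `pearson` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfect curvilinear (U-shaped) relationship, symmetric around x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson(xs, ys)
```

This is one reason to draw the scatterplot first: a value near zero rules out a linear relationship, not every relationship.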

___ is a measure of the correlation.

Strength

(143 cards)