Term 2: Lecture 8 Correlations and Linear Regression Flashcards by Harriet Housby

Relationships between variables: what is the difference between association and correlation?

What are they?
What levels of data are they appropriate for?

Association:
• When two variables are related to one another, in the sense that they vary together
• Appropriate for nominal, ordinal, interval or ratio-level variables.

Correlation:
• A correlation is a linear association between variables: a relationship that can be represented by a straight line.

It can be measured by Pearson’s correlation coefficient (Pearson’s r). Pearson’s r is appropriate for interval and ratio-level variables.

A correlation is…

a linear association between variables: a relationship that can be represented by a straight line.

What can a correlation be measured by, and what level of data is it appropriate for?

It can be measured by Pearson’s correlation coefficient (Pearson’s r). Pearson’s r is appropriate for interval and ratio-level variables.

An association is…

And what level of data is it appropriate for?

When two variables are related to one another, in the sense that they vary together.

It is appropriate for nominal, ordinal, interval or ratio-level variables.

What is Pearson’s r?

What does it range between?

What numbers denote perfect positive, perfect negative, and no correlation?

It is a measure of a linear relationship (correlation) between two variables.

• Ranges between -1 and 1
• Tells us how well the data fit a straight line
• r = 1 → a perfect positive correlation
• r = -1 → a perfect negative correlation
• r = 0 → no correlation
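
To make the three reference values concrete, here is a minimal Python sketch that computes Pearson's r by hand. The function name and the tiny data sets are invented for illustration; they are not from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-variation of X and Y scaled by their individual variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])   # points on a rising line
perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])   # points on a falling line
no_linear = pearson_r(x, [4, 1, 0, 1, 4])           # symmetric U-shape
```

The U-shaped example is a reminder that r = 0 only rules out a linear relationship: the two variables there are clearly related, just not by a straight line.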

The correlation coefficient (r) is a ___ ___ (like the mean or the standard deviation).

So what do we need to be careful of?
What is it subject to?
What do we need?

A descriptive statistic.

Therefore, we need to be careful when drawing conclusions from a correlation coefficient computed from sample data: the correlation coefficient is subject to random sampling error.

We need a significance test.

What would the null hypothesis of a correlation analysis be?

That the two variables are linearly independent in the population.

In other words, the hypothesis we test is that the correlation is zero in the population.
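
One common way to test this null hypothesis converts r into a t statistic with n - 2 degrees of freedom. This is a standard textbook approach, not one spelled out in these cards. A minimal Python sketch follows, using r = 0.222 and n = 184, the values quoted in the confidence interval card later in this deck. The function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), referred to a t distribution with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t_stat = correlation_t_statistic(0.222, 184)
degrees_of_freedom = 184 - 2
```

The resulting t would then be compared with a t distribution on n - 2 degrees of freedom (via tables or statistical software) to obtain a p-value.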

Correlations: Statistical significance and strength

It is important to distinguish the ___ ___ of a correlation from the ___ of a correlation.

Statistical significance

Strength

Statistical significance means…

This says ___ about the strength of the correlation.

“We have evidence against the null hypothesis that the correlation is zero in the population.”

Nothing. (A correlation may be non-zero, but small.)

The following values are often used to evaluate the strength (the effect size) of the correlation coefficient:

Small

Medium

Large

Small .10

Medium .30

Large .50
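
As a rough illustration of how these benchmarks might be applied, the Python sketch below labels a correlation by its absolute size. Treating .10, .30 and .50 as lower cut-offs, and calling anything below .10 "negligible", are assumptions made for this example rather than rules stated on the card.

```python
def correlation_strength(r):
    """Label |r| using the .10 / .30 / .50 benchmarks (cut-off handling is an assumption)."""
    size = abs(r)  # the sign gives the direction, not the strength
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # label for values below .10 is not from the card

labels = [correlation_strength(r) for r in (0.222, -0.55, 0.30, 0.05)]
```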

Confidence Interval for a Correlation Coefficient
It is possible to calculate confidence intervals for a correlation coefficient.

The 95% confidence interval for the correlation between Psych Distress at 16 and Psych Distress at 34 is (0.080 to 0.355). How do we interpret this?

For a given ___ ___, the confidence interval of a Pearson correlation will be ___, the ___ the sample size.

Note: SPSS does not have an automatic function to calculate confidence intervals for Pearson correlations. There are online calculators that can work out confidence intervals given the ___ ___ (here: r = 0.222) and the ___ ___ (here: n = 184).

We are 95% confident that the interval between 0.080 and 0.355 contains the true correlation coefficient.

Point estimate; narrower; larger.

Point estimate; sample size.
What are the assumptions of Pearson’s r?

When do we not need to make assumptions? Why?

When do we need to make assumptions? (Hint: CI, ST)

What do we assume? (Hint: BND)

When Pearson’s r is used purely as a descriptive measure, we do not make any assumptions about the distribution of the two variables, X and Y, whose correlation we measure. They do not need to be normally distributed for Pearson’s r to be a meaningful measure of correlation, because it only measures the linear relationship between two variables.

We need assumptions when we want to calculate a confidence interval for r, or perform a significance test.

We then assume that the two variables follow a bivariate normal distribution. In practice this is often checked by looking at whether X and Y are each normally distributed, although normal marginals alone do not strictly guarantee that the joint distribution is bivariate normal.

Alternative to Pearson’s r

When would you use one? (Hints: want to carry out ST; LoD; NND)

When we want to carry out a significance test, but X and Y are ordinal measures (rather than interval or ratio), or either X or Y is not normally distributed.

What are spurious correlations?

When a correlation is found between things that have no causal relationship with each other.

What are the non-parametric equivalents of Pearson’s r?

What type of data do they work on?

What do the coefficients vary between?

Spearman’s correlation coefficient (“Spearman’s rho”)
Kendall’s tau

Both Spearman’s coefficient and Kendall’s Tau work on ranked data (Spearman’s rho is, in fact, the Pearson correlation coefficient carried out on ranked data).

Spearman’s coefficient and Kendall’s tau vary between -1 and +1, and their interpretation is analogous to Pearson’s r.
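
The remark that Spearman's rho is Pearson's r carried out on ranked data can be checked directly. The Python sketch below ranks each variable (averaging ranks for ties, a common convention assumed here) and then applies the ordinary Pearson formula. Function names and data are made up for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]        # increases with x, but not along a straight line
rho = spearman_rho(x, y)
r = pearson_r(x, y)
tied = average_ranks([10, 20, 20, 30])
```

Because y always increases with x, the ranks agree perfectly and rho is 1, while Pearson's r on the raw values stays below 1 because the relationship is curved.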

What is a regression used for?

What does it assume?

What can a regression not do?

Simple Linear Regression
In regression, we use one variable to predict another.

Linear regression assumes that the relationship may be represented as a straight line.

Regression analysis can help to establish whether a given set of data is consistent with a predictive assumption. However, regression by itself cannot prove that X causes Y.

What is a residual?

A residual is the vertical distance of a point from the regression line.

What is the regression line? How does it link to residuals?

The regression line is the line that minimises the sum of the squared residuals.

The regression line is therefore called the line of best fit, or the least squares regression line.
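
A small Python sketch of the least squares idea, using the standard closed-form solution for one predictor. The data and function names are invented for illustration. It fits the line and then shows that nudging the intercept and slope away from the fitted values makes the sum of squared residuals larger.

```python
def fit_line(xs, ys):
    """Least squares intercept (b0) and slope (b1) for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x   # the fitted line passes through (mean_x, mean_y)
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared vertical distances between each point and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
nudged = sum_squared_residuals(xs, ys, b0 + 0.1, b1 - 0.05)  # any other line does worse
```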

In a regression what does Y denote?

• The predicted variable, usually denoted by Y, is called the dependent variable, or the outcome.

In a regression what does X denote?

• The predictor variable, usually denoted by X, is called the independent variable. In regression, we model the dependent variable as a function of the independent variable.

Y = b0 + b1X + e

What do the letters stand for?

Y = the predicted variable (DV)

X = the predictor variable (IV)

b0 = the intercept of the regression line, also called the constant
b1 = the slope of the regression line

e = the residual (the prediction error specific to each individual)

From this model we can derive a prediction of ___ given ___.

What is Y-hat?

Remember that e is not included when …

Y given X.

Ŷi = b0 + b1Xi, where Ŷi (“y-hat”) is the predicted value of Y for individual i.

What does the slope of a regression line tell us?

How much difference in Y we can predict for a 1-unit change in X.

Here: for a 1 inch increase in the parents’ height, a child is predicted to be 0.65 inches taller, on average.

What does the Intercept of a regression line tell us?

The intercept is the predicted value of Y when X = 0

The regression slope is b1 = 0.65. This means that if parents' height increases by 1 inch, a child's height is predicted to do what?
The intercept is b0 = 23.94. This means that the model predicts what?
What is the general rule?
We can use the equation to predict the height of a person whose parents' average height is 64 inches:

• A child's height is predicted to increase by 0.65 inches.
• A person whose parents are on average 0 inches tall is predicted to be 23.94 inches tall. In this case, the intercept is not a meaningful number.
• General rule: do not use a regression line to make predictions outside of the range of the data you used to estimate the line.
• Ŷ = 23.94 + 0.65 × 64 = 65.54 inches
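
The prediction and the general rule can be sketched together in Python. The coefficients are the ones quoted on the card. The function name, the optional range check and the example range of 60 to 75 inches are hypothetical, since the card does not say what range of parental heights was observed.

```python
B0 = 23.94  # intercept quoted on the card
B1 = 0.65   # slope quoted on the card

def predict_child_height(parents_mean_height, observed_range=None):
    """Predicted height in inches; optionally refuse to extrapolate beyond the data."""
    if observed_range is not None:
        low, high = observed_range
        if not low <= parents_mean_height <= high:
            raise ValueError("outside the range of the data used to fit the line")
    return B0 + B1 * parents_mean_height

predicted = predict_child_height(64)

# Hypothetical observed range, for illustration only.
try:
    predict_child_height(0, observed_range=(60, 75))
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
```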

In SPSS, for a linear regression, what do you look for?

Intercept = Constant = b0 = 23.942
Slope = Unstandardized Coefficient (B) of Parents' Mean Height = b1 = 0.646

The t tests in a linear regression provide a test of the null hypothesis that…

The intercept and the coefficients are zero in the population.

For the slope coefficient, t = 15.711, p < 0.001, so we have evidence that the coefficient for Parents' Height is not zero. So Parents' Height makes a (non-zero) contribution to the prediction of Child's Height in the population.

Standardized Coefficient (Beta, β) of Parents' Mean Height = 0.459.
The standardized coefficient represents the slope in terms of the ___.
What question does it answer?
The standardized coefficient measures the ___ of the relationship of X and Y.
In simple linear regression, it is equal to ___.

• The standard deviations of X and Y.
• If I move up one SD of X, how many SDs of Y do I move up?
• Strength.
• Pearson's r.
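
A minimal Python sketch of the link between the unstandardized slope, the standardized coefficient and Pearson's r, using made-up heights rather than the lecture's data set. It rescales the slope by the ratio of standard deviations and also refits with X in feet instead of inches.

```python
import math

def summarize(xs, ys):
    """Return (unstandardized slope b1, standardized beta, Pearson's r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b1 = sxy / sxx
    sd_x = math.sqrt(sxx / (n - 1))
    sd_y = math.sqrt(syy / (n - 1))
    beta = b1 * sd_x / sd_y          # slope expressed in SD units of X and Y
    r = sxy / math.sqrt(sxx * syy)
    return b1, beta, r

parents_inches = [62, 64, 66, 68, 70, 72]   # illustrative values only
child_inches = [63, 66, 65, 68, 70, 69]

b1_in, beta_in, r_in = summarize(parents_inches, child_inches)

# Same data with the predictor measured in feet.
parents_feet = [x / 12 for x in parents_inches]
b1_ft, beta_ft, r_ft = summarize(parents_feet, child_inches)
```

Changing the unit of X multiplies the unstandardized slope by 12, while beta and r stay the same, which is why beta rather than the raw slope is read as a measure of strength.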

R: for simple linear regression, this is equal to…
R-Square can be interpreted as…

• R is equal to Pearson's r for X and Y.
• R-Square is the proportion of the variance in Y (participant’s height) that can be “accounted for” by X (parents’ height). Another way to put this is: using our knowledge of a person’s parents’ height reduces the error of the prediction of their height by 21% (relative to a situation where we did not know the parents’ height).
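
The 21% figure is consistent with squaring the standardized coefficient quoted earlier in the deck (0.459² ≈ 0.21). The Python sketch below, on made-up data with hypothetical names, computes R-Square as the proportional reduction in squared prediction error and compares it with the squared Pearson correlation.

```python
import math

parents = [62, 64, 66, 68, 70, 72]   # illustrative values only
child = [63, 66, 65, 68, 70, 69]

n = len(parents)
mean_x = sum(parents) / n
mean_y = sum(child) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(parents, child))
sxx = sum((x - mean_x) ** 2 for x in parents)
syy = sum((y - mean_y) ** 2 for y in child)

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Squared error when predicting with the regression line...
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(parents, child))
# ...versus predicting everyone's height with the mean of Y (no knowledge of X).
sst = syy

r_squared = 1 - sse / sst
r = sxy / math.sqrt(sxx * syy)
card_value = 0.459 ** 2   # squared coefficient from the SPSS output on the cards
```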

When the regression slope is zero (b1 = 0)… However, it is not generally true that the steeper the slope, the stronger the prediction. If we changed the scale of the X-variable (parents’ height) to feet, instead of inches (but kept child’s height in inches), the regression slope would change. The predictive power of X is indicated by the Standardized Coefficient (called “Beta” in SPSS). The strength of the overall model is indicated by R². For simple linear regression (with only a single independent variable), beta is the same as R.

…then the line is parallel to the X-axis, and X has no predictive power with respect to Y.

When the regression slope is zero (b1 = 0), ___.
However, it is not generally true that ___.
The predictive power of X is indicated by the ___.
The strength of the overall model is indicated by ___.
For simple linear regression (with only a single independent variable), beta is the same as ___.

• …then the line is parallel to the X-axis, and X has no predictive power with respect to Y.
• …the steeper the slope, the stronger the prediction. If we changed the scale of the X-variable (parents’ height) to feet, instead of inches (but kept child’s height in inches), the regression slope would change.
• Standardized Coefficient (called “Beta” in SPSS).
• R²
• R

(33 cards)