Correlation and Simple Linear Regression Flashcards

1
Q

What does a correlation coefficient measure?

A

The linear association between 2 continuous variables

2
Q

What values does a correlation coefficient lie between?

A

-1 and +1

3
Q

What does a correlation coefficient of -1 indicate?

A

Perfect negative association

4
Q

What does a correlation coefficient of +1 indicate?

A

Perfect positive association
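
The range and the two extremes can be checked with a small sketch. The function name `pearson_r` and the data values are made up for illustration; the formula is the usual Pearson coefficient (covariance scaled by both standard deviations).

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]                        # made-up values
r_up = pearson_r(x, [2 * a + 1 for a in x])          # exact increasing line
r_down = pearson_r(x, [10 - 3 * a for a in x])       # exact decreasing line
r_noisy = pearson_r(x, [2.1, 3.9, 6.2, 7.8, 10.1])   # near a line, not on it

print(r_up, r_down, r_noisy)
```

Points lying exactly on a rising or falling line give +1 or -1; any scatter around the line pulls the value strictly inside that range.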

5
Q

What type of statistic is a correlation?

A

Parametric (this refers to the Pearson correlation coefficient)

6
Q

What are parametric tests?

A

Those that make assumptions about the parameters of the population distribution from which the sample is drawn. This is often the assumption that the population data are normally distributed.

7
Q

Why do the two continuous variables need to be approximately normally distributed?

A

Because the (Pearson) correlation is a parametric statistic.

8
Q

What is a non-parametric measure of association between ranks, which therefore does NOT require the normality assumption?

A

The Spearman rank correlation coefficient. It is a non-parametric measure of association between ranks, so it does NOT require the normality assumption.
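
As a rough illustration, the Spearman coefficient can be computed as the Pearson coefficient of the ranks. The helper names (`average_ranks`, `spearman_rho`) and the data are assumptions for this sketch; ties are given their average rank, which is the common convention.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1      # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # made-up values
y = [a ** 3 for a in x]              # increasing, but not linear

rho = spearman_rho(x, y)             # ranks agree exactly
r = pearson_r(x, y)                  # penalised by the curvature
print(rho, r)
```

Because only the ordering matters, a monotonic but curved relationship still gives a Spearman coefficient of 1, while the Pearson coefficient falls below 1.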

9
Q

Correlation assumes causation.

True or false

A

FALSE

Correlation does not assume causation.

10
Q

What does exact linearity between variable y and variable x mean?

A

That one variable is a linear function of the second

11
Q

What does the following equation mean?

y = B0 + B1x

A

The intercept B0 is the value that y takes when x is zero.
- If the intercept is zero then y increases in proportion to x (i.e. if x doubles then y doubles).

The slope B1 determines the change in y when x changes by one unit.
- It measures the gradient of the line.
12
Q

Why might the linear relationship only apply to the expected value of y?

A

Things other than x also influence y.
- The expected value of y is the average over several instances with the same value of x.
- Any single y measurement might differ from the line.

13
Q

What is the mathematical model of a linear relationship?

A

E[y] = B0 + B1x

Or
y = B0 + B1x + ε

- where ε = y - E[y] is the error (the discrepancy between the observed and expected y)
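
A small simulation can make the two forms of the model concrete. Everything here is hypothetical: the coefficient values, the error standard deviation, and the choice of normally distributed errors are assumptions made only for the sketch.

```python
import random

random.seed(0)       # fixed seed so the sketch is repeatable

B0, B1 = 2.0, 0.5    # hypothetical intercept and slope
SIGMA = 1.0          # hypothetical error standard deviation

def observe_y(x):
    """One observation: the line plus a random error with mean zero."""
    return B0 + B1 * x + random.gauss(0.0, SIGMA)

x = 4.0
expected_y = B0 + B1 * x                      # value of the line at this x
single = observe_y(x)                         # one measurement, usually off the line
many = [observe_y(x) for _ in range(10_000)]  # repeated measurements at the same x
average_y = sum(many) / len(many)             # settles close to the line

print(expected_y, single, average_y)
```

A single observation differs from the line by its error, while the average over many observations at the same x approaches the expected value.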
14
Q

What does a linear model stipulate?

A

A linear relationship between variable x and the expectation of y.

15
Q

What are scatterplots useful for?

A

Understanding the bivariate distribution of two continuous variables.

16
Q

What do linear models allow us to make an assessment of?

A

Whether a linear relationship is a realistic assumption.

17
Q

So assuming that an approximate linear relationship is an appropriate model, how can we find the best line?

A

Ordinary Least Squares (OLS): minimises the sum of squared deviations between the data points and the fitted line.

Maximum Likelihood (ML): chooses the line under which the observed data were most likely to have occurred.
- This requires knowing the distribution from which the errors arise.

OLS = ML if the errors have a normal distribution.

18
Q

What does the ordinary least squares approach pick?

A

The line that minimises the sum of squared deviations between the observed y values and the line.
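
For one explanatory variable the least squares line has a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The sketch below applies it to made-up data; the function names are illustrative, not from any library.

```python
def ols_fit(x, y):
    """Return (b0, b1) minimising the sum of squared vertical deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sum_sq_dev(x, y, b0, b1):
    """Sum of squared deviations of the data from the line b0 + b1 * x."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
best = sum_sq_dev(x, y, b0, b1)

# Nudging either coefficient away from the fitted values makes the sum larger.
worse_intercept = sum_sq_dev(x, y, b0 + 0.1, b1)
worse_slope = sum_sq_dev(x, y, b0, b1 - 0.05)
print(b0, b1, best, worse_intercept, worse_slope)
```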

19
Q

What does the maximum likelihood approach involve?

A

If the deviations (also known as residuals) {ei} are assumed to be normally distributed, then the least squares line is also the maximum likelihood estimate: the line that makes the observed data seem as likely an occurrence as possible.

The hypothesized normal distribution for the residuals is assumed to have constant variance.
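
The link between the two approaches can be seen numerically. With independent normal errors of constant variance, the log-likelihood is a decreasing function of the sum of squared deviations, so the least squares line also has the highest likelihood. The data, the value of `SIGMA`, and the candidate lines below are all made up for this sketch.

```python
import math

def ols_fit(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def gaussian_log_likelihood(x, y, b0, b1, sigma):
    """Log-likelihood of the data if errors are independent N(0, sigma^2)."""
    n = len(x)
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse / (2 * sigma ** 2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
SIGMA = 0.2                       # assumed constant error standard deviation

b0, b1 = ols_fit(x, y)
ll_ols = gaussian_log_likelihood(x, y, b0, b1, SIGMA)

# A few other candidate lines; none of them is more likely than the OLS line.
candidates = [(0.0, 2.0), (0.5, 1.9), (-0.5, 2.1)]
ll_others = [gaussian_log_likelihood(x, y, c0, c1, SIGMA) for c0, c1 in candidates]
print(ll_ols, ll_others)
```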

20
Q

What are the advantages of a linear model?

A

A linear model can fit well even where we “know” it’s the wrong model. It is often a good approximation over a restricted section of a non-linear relationship.

It is also a parsimonious description. (A parsimonious model is a model that accomplishes the desired level of explanation or prediction with as few predictor variables as possible.)

It might represent a causal/explanatory relationship (for example, between height and weight), but this should NOT be assumed (stretching children does not make them heavier).

Accurate prediction does NOT require a relationship to be causal/explanatory.
(We call y the response and x the explanatory variable, but this does NOT mean that the relationship is necessarily causal.)

21
Q

What is the regression intercept?

A

Predicted value of y when x=0

22
Q

What is the confidence interval formula?

A

95% CI = [B - 1.96 × SE(B), B + 1.96 × SE(B)]

where SE(B) = standard error of B
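
A minimal sketch of the interval for the slope, on made-up data. It assumes the usual simple-regression standard error, SE(B1) = sqrt(residual variance / Sxx), with the residual variance estimated on n - 2 degrees of freedom. The 1.96 multiplier is the large-sample value from the card; with a sample this small a t-distribution quantile would give a wider interval.

```python
import math

def slope_with_ci(x, y, z=1.96):
    """Return (b1, se_b1, (lower, upper)) for the OLS slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    residual_variance = sse / (n - 2)      # two coefficients were estimated
    se_b1 = math.sqrt(residual_variance / sxx)
    return b1, se_b1, (b1 - z * se_b1, b1 + z * se_b1)

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, se_slope, (lower, upper) = slope_with_ci(x, y)
print(slope, se_slope, lower, upper)
```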

23
Q

In addition to the “point” estimates of the regression coefficients what does a linear model also show us?

A

Measures of how precisely we have estimated them: the standard errors (Std. Error).

24
Q

In a regression model the multiple correlation measures what?

A

The (Pearson) correlation between the observed and predicted values of y.

25
Q

In a simple linear regression model (one x) the correlation between the response and explanatory variable is what?

A

The multiple correlation.

It is also the standardised regression coefficient (beta).
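
Both statements can be checked on made-up data. The sketch assumes the standardised coefficient is the slope rescaled by the two sample standard deviations (slope × sd(x) / sd(y)). The data have a positive slope; with a negative slope the correlation between observed and predicted values would be the absolute value of r.

```python
import math
import statistics

def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_fit(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)                                   # response vs explanatory
b0, b1 = ols_fit(x, y)
predicted = [b0 + b1 * a for a in x]
multiple_r = pearson_r(y, predicted)                  # observed vs predicted
beta = b1 * statistics.stdev(x) / statistics.stdev(y) # standardised slope

print(r, multiple_r, beta)
```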

26
Q

The square of the multiple correlation, R², measures what?

A

The proportion of the variance in y that can be explained by differences in x.

Thus R² provides a measure of fit of the regression model.
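
One way to see this is to compute the explained proportion directly, as 1 minus the residual sum of squares over the total sum of squares, and compare it with the squared correlation. The data and function names are made up for the sketch.

```python
import math

def explained_proportion(x, y):
    """1 - (residual sum of squares / total sum of squares) for the OLS line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_squared = explained_proportion(x, y)
r = pearson_r(x, y)
print(r_squared, r ** 2)          # the two agree up to rounding
```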

27
Q

How can we test the exposure-outcome association?

A

We can test the null hypothesis that the slope B1 = 0.

The alternative hypothesis is B1 ≠ 0.

This null hypothesis amounts to testing whether the dependent variable (outcome) is associated with the explanatory variable (exposure).

The test statistic is t = B1 / SE(B1).

The value of the test statistic can be compared with a t-distribution with n-2 degrees of freedom to obtain a p-value. (For large samples, values with |t| > 2 are significant at approximately the 5% level.)
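
A sketch of the test statistic on made-up data, using the same standard error formula as the usual simple-regression output (an assumption here, since the cards do not spell it out). Turning t into a p-value needs the t-distribution's cumulative distribution function, which is left to a statistics library and not shown.

```python
import math

def slope_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for testing slope = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1 / se_b1, n - 2

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

t_stat, df = slope_t_statistic(x, y)
print(t_stat, df)
```

With only 3 degrees of freedom the |t| > 2 rule of thumb is too lenient, so a small sample like this one should be compared with the exact t-distribution.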

28
Q

What assumptions are required for inference?

A

The following assumptions are made for inference:

The observations are independent.

The observations may be selected for values of x but are otherwise a random sample.

The residuals have:
- a normal distribution (a stronger assumption than OLS itself requires)
- an expected value of zero for all values of x (linearity)
- constant variance (homoscedasticity).

The constant variance assumption is the equivalent of assuming that the within-group variances are equal when doing a two-sample t-test.