Correlation and Linear Regression Flashcards

1
Q

In what situation are correlation and linear regression useful methods of comparison?

A

With continuous data

2
Q

What is correlation?

A

Correlation tells us whether there is an association between two continuous variables and how strong that association is.

3
Q

What is linear regression?

A

Quantifies the relationship between two variables when one of them depends on the other. This allows the mean of one variable to be estimated for a given value of the other.

4
Q

What sort of hypothesis testing can be done with linear regression?

A

It can reproduce a t-test and is easily extended to adjust for baseline imbalances when the outcome is continuous.

5
Q

What is the standard method of calculating correlation?

A

Pearson’s correlation coefficient / product moment correlation coefficient

6
Q

Describe the notation of Pearson’s correlation coefficient

A

ρ (rho) for the population value; r for the estimated (sample) value

7
Q

What is Pearson’s Correlation Coefficient?

A

Measures the scatter of the points around an underlying linear trend and can take a value from -1 to +1
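
As a minimal sketch of how r behaves, the function below computes the sample coefficient from sums of squared deviations. The function name and the small data sets are made up for illustration and are not part of the original cards.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                        # illustrative values only
r_up = pearson_r(x, [2, 4, 6, 8, 10])      # points exactly on a rising line
r_down = pearson_r(x, [10, 8, 6, 4, 2])    # points exactly on a falling line
r_scatter = pearson_r(x, [2, 1, 4, 3, 5])  # rising trend with some scatter
print(r_up, r_down, r_scatter)
```

The first two data sets lie exactly on a line, so r reaches the ends of its range (+1 and -1); the third has scatter around the trend, so r is 0.8.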

8
Q

How do we interpret r?

A

The correlation is positive if higher values of one variable are associated with higher values of the other.

The correlation is negative if values of one variable decrease as values of the other increase.

The closer the points lie to the underlying linear trend, the stronger the correlation (r nearer to -1 or +1). Conversely, the greater the spread of points around the trend, the weaker the correlation (r nearer to 0).

9
Q

How does r change when the range of a variable is restricted?

A

When the range of one of the variables is limited, r is weaker (closer to 0). This means that comparing r between studies may be misleading if the ranges of the variables are not comparable.
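
The effect of range restriction can be sketched with simulated data. Everything below is an illustrative assumption: the seed, the sample size, the underlying relationship (y = x plus noise) and the cut-offs of -0.5 and 0.5 are arbitrary choices, not values from the cards.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

random.seed(1)  # fixed seed so the simulation is repeatable
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [xi + random.gauss(0, 1) for xi in xs]  # same linear trend throughout

# Correlation over the full range of x
r_full = pearson_r(xs, ys)

# Correlation using only the observations with x in a narrow band
pairs = [(xi, yi) for xi, yi in zip(xs, ys) if -0.5 < xi < 0.5]
r_restricted = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

print(round(r_full, 2), round(r_restricted, 2))
```

The underlying relationship is identical in both calculations; only the range of x differs, yet the restricted sample gives a noticeably smaller r.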

10
Q

What is required to calculate a valid confidence interval for r?

A

The two variables must jointly follow a bivariate normal distribution
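
The card gives the condition but not the calculation. One common approach, which is an assumption here because the cards do not name a method, is Fisher's z-transformation: transform r with the inverse hyperbolic tangent, build a normal-based interval with standard error 1/sqrt(n - 3), and transform back. The values r = 0.8 and n = 30 are hypothetical.

```python
from math import atanh, tanh, sqrt

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation via Fisher's z-transformation.

    Assumes the data are bivariate normal and n > 3.
    """
    z = atanh(r)            # transform r to the z scale
    se = 1 / sqrt(n - 3)    # standard error on the z scale
    # Build the interval on the z scale, then transform back to the r scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

low, high = fisher_ci(0.8, 30)  # hypothetical r and sample size
print(low, high)
```

Because the back-transformation uses tanh, the limits always stay inside -1 to +1 and the interval is not symmetric about r.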

11
Q

How does bivariate normal data look on a scatter graph?

A

Elliptical

12
Q

What condition must be satisfied to produce a valid hypothesis test for correlated data?

A

At least one variable should be normally distributed

13
Q

How do we calculate the correlation coefficient for non-parametric data?

A

Spearman’s rank-order correlation coefficient (denoted ρ_s for the population value and r_s for the estimate)

14
Q

How does Spearman’s rank-order correlation coefficient calculate correlation?

A

Rank correlation does not specifically assess linear association but assesses more generally whether there is a monotonic relationship between the variables.
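
A minimal sketch of the idea, assuming the usual definition of Spearman's coefficient as Pearson's coefficient applied to the ranks (ties given the average rank). The helper names and the cubic example data are illustrative only.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [xi ** 3 for xi in x]  # monotonic but clearly not linear

r = pearson_r(x, y)
rs = spearman_rs(x, y)
print(r, rs)
```

Here y always increases with x, so the ranks agree perfectly and r_s is 1, while Pearson's r is below 1 because the points do not lie on a straight line.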

15
Q

What are the two variables called in linear regression?

A

Variable x is the explanatory variable (independent or predictor)
Variable y is the response variable (dependent or outcome)

16
Q

How do we calculate a line of regression?

A

The least squares method: choose the line that makes the sum of the squared residuals as small as possible
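
For a single explanatory variable the least squares line y = a + bx has a closed-form solution. The sketch below uses the standard formulas; the function name and data are illustrative assumptions.

```python
def least_squares(x, y):
    """Return intercept a and slope b of the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope (gradient)
    a = mean_y - b * mean_x    # the line passes through (mean_x, mean_y)
    return a, b

def residual_ss(x, y, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]   # illustrative values only
y = [2, 1, 4, 3, 5]
a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
rss = residual_ss(x, y, a, b)
print(a, b, rss)
```

Any other line through the same data, for example one with a slightly different slope, gives a larger sum of squared residuals.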

17
Q

What assumptions are made that underlie the method of linear regression?

A
  1. The response variable has a normal distribution at each value of x (i.e. the residuals should be normally distributed)
  2. The variability of y must be the same across x (equal variance)
  3. The relationship between x and y is linear
  4. Observations are independent
18
Q

How do we guarantee observations are independent?

A

There is no formal test. Usually, as long as x and y are each measured once on each individual, the observations are independent.

19
Q

For what design of clinical trial would linear regression be inappropriate?

A

Cluster randomised trials, because observations within the same cluster are not independent

20
Q

What is the most important variable in the calculation of lines of best fit?

A

b (the gradient, or slope)

21
Q

How do we calculate the confidence interval for the slope b?

A

Use the t-distribution: b ± t × SE(b), where t is the critical value with (n-2) degrees of freedom

22
Q

What value does t0.05 tend towards as n increases?

A

1.96 - because the t-distribution approximates to the normal distribution as n increases
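
The limiting value can be checked from the standard normal distribution. This is a small sketch assuming Python 3.8 or later (for `statistics.NormalDist`); the two-sided 5% point corresponds to the 97.5th percentile.

```python
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal distribution:
# 2.5% in each tail, so take the 97.5th percentile.
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96
```

For comparison, standard t tables give larger values at small sample sizes, for example 2.228 with 10 degrees of freedom and 2.042 with 30.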

23
Q

How do we calculate the test statistic for a regression line hypothesis test?

A

Divide the estimate of b by its standard error and compare the result to the t-distribution with (n-2) degrees of freedom.
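
A sketch of the calculation, assuming the usual formula SE(b) = sqrt(s² / Sxx), where s² is the residual sum of squares divided by (n-2). The data are illustrative, and the critical value 3.182 is the tabulated two-sided 5% point for 3 degrees of freedom, hard-coded because the standard library has no t-distribution.

```python
from math import sqrt

def slope_test(x, y):
    """Return slope b, its standard error, and the t statistic b / SE(b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = rss / (n - 2)     # n-2 because a and b are estimated
    se_b = sqrt(residual_variance / sxx)
    return b, se_b, b / se_b

x = [1, 2, 3, 4, 5]   # illustrative values only
y = [2, 1, 4, 3, 5]
b, se_b, t_stat = slope_test(x, y)
df = len(x) - 2

T_CRIT_3DF = 3.182    # two-sided 5% point of t with 3 df, from standard tables
ci = (b - T_CRIT_3DF * se_b, b + T_CRIT_3DF * se_b)
significant = abs(t_stat) > T_CRIT_3DF
print(b, se_b, t_stat, df, ci, significant)
```

The same standard error feeds both the hypothesis test and the confidence interval for b. With only five points the t statistic falls short of the critical value and the interval includes 0.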

24
Q

P-values are not the most important measure of the appropriateness of lines of best fit. What other variable is important to comment on?

A

r^2, the ratio of the model sum of squares to the total sum of squares. It describes how well the model fits the data (the proportion of the variability in y explained by the model). With a single explanatory variable it equals the square of Pearson’s correlation coefficient.
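
The decomposition can be sketched directly. The function name and data below are illustrative assumptions; the formulas are the standard sums of squares for a simple linear regression.

```python
from math import sqrt

def sums_of_squares(x, y):
    """Return (model SS, residual SS, total SS) for the least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    fitted = [a + b * xi for xi in x]
    model_ss = sum((fi - mean_y) ** 2 for fi in fitted)          # explained
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    return model_ss, residual_ss, total_ss

x = [1, 2, 3, 4, 5]   # illustrative values only
y = [2, 1, 4, 3, 5]
model_ss, residual_ss, total_ss = sums_of_squares(x, y)
r_squared = model_ss / total_ss

# Pearson's r computed separately, to compare r**2 with r_squared
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
r = sxy / sqrt(sxx * total_ss)
print(r_squared, r ** 2)
```

The model and residual sums of squares add up to the total, and r_squared matches the square of Pearson's r for these data (0.64).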

25
Q

What method may be used to analyse the variance of a regression model?

A

ANOVA testing (gives the model sum of squares)

26
Q

How do we check the assumption that the response variable (y) is normally distributed?

A

Plot the distribution of the residuals (e_i), for example as a histogram or normal plot

27
Q

How do we check the assumption that the variability of y (variance) is the same as x changes?

A

Plot the residuals against x (the plot should not have a conical shape)

28
Q

What assumption, other than equal variance, can be checked by plotting the residuals against x?

A

Linearity

29
Q

If one or more of the assumptions made when performing linear regression are not met, what common transformations may be possible to make it fit?

A

Natural log, square root and reciprocal
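
As a sketch of why a transformation can help, the example below uses made-up, noise-free data in which y grows exponentially with x. The relationship is curved on the original scale but exactly linear after taking natural logs. The constants 0.5 and 0.3 are arbitrary assumptions.

```python
from math import exp, log, sqrt

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

x = [0, 2, 4, 6, 8, 10]
y = [exp(0.5 + 0.3 * xi) for xi in x]   # hypothetical exponential relationship

r_raw = pearson_r(x, y)                      # curved on the original scale
r_log = pearson_r(x, [log(yi) for yi in y])  # straight line on the log scale
print(r_raw, r_log)
```

After a transformation the assumptions should be rechecked on the new scale, since the residuals change along with y.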

30
Q

How can we construct a hypothesis test using linear regression where one variable (x) is discrete (i.e. number of arms in a trial e.g. active and placebo)?

A

Code x = 0 for placebo and x = 1 for active.
The slope b is then the difference in mean outcome between the two groups.
Testing b = 0 is equivalent to a two-sample t-test (assuming equal variances).
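
A minimal sketch with hypothetical outcome values for two arms, showing that the fitted slope is the difference between the group means and the intercept is the placebo mean.

```python
def least_squares(x, y):
    """Return intercept a and slope b of the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

placebo = [4.0, 5.0, 6.0]        # hypothetical outcomes, x = 0
active = [7.0, 8.0, 9.0, 10.0]   # hypothetical outcomes, x = 1

x = [0] * len(placebo) + [1] * len(active)
y = placebo + active
a, b = least_squares(x, y)

mean_placebo = sum(placebo) / len(placebo)
mean_active = sum(active) / len(active)
print(a, b, mean_active - mean_placebo)
```

With a 0/1 explanatory variable the fitted line simply joins the two group means, which is why the regression test on b and the two-sample t-test coincide.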

31
Q

What are the benefits of using multiple linear regression to adjust for confounders in trial data?

A
  1. Can adjust for several explanatory variables at once
  2. Can adjust for continuous explanatory variables
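
The adjustment can be sketched with made-up, noise-free trial data in which the active arm starts with a higher baseline. The data, the true coefficients (treatment effect 3, baseline effect 0.5) and the helper names are all assumptions for illustration. The fit solves the normal equations with plain Gaussian elimination rather than a statistics library.

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    z = [0.0] * n
    for row in range(n - 1, -1, -1):   # back substitution
        known = sum(M[row][c] * z[c] for c in range(row + 1, n))
        z[row] = (M[row][n] - known) / M[row][row]
    return z

def multiple_regression(columns, y):
    """Least squares coefficients: intercept first, then one per column."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)

treatment = [0, 0, 0, 0, 1, 1, 1, 1]
baseline = [10, 12, 14, 16, 14, 16, 18, 20]   # active arm starts higher
outcome = [2 + 3 * t + 0.5 * bl for t, bl in zip(treatment, baseline)]

# Unadjusted comparison: simple difference in mean outcome between arms
active = [o for o, t in zip(outcome, treatment) if t == 1]
placebo = [o for o, t in zip(outcome, treatment) if t == 0]
unadjusted = sum(active) / len(active) - sum(placebo) / len(placebo)

# Adjusted comparison: treatment coefficient with baseline in the model
intercept, adjusted, baseline_effect = multiple_regression(
    [treatment, baseline], outcome
)
print(unadjusted, adjusted, baseline_effect)
```

The unadjusted difference (5) overstates the treatment effect because part of it comes from the baseline imbalance; with baseline in the model the treatment coefficient recovers the value of 3 used to generate the data.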