Regression + Correlation Flashcards

1
Q

Regression

A

Describes a non-deterministic relationship between two variables, at least one of which is continuous: a dependent (y) variable and an independent (x) variable. Allows easy visual analysis of linear or non-linear relationships.

2
Q

Regression line

A

y = a + bx (y changes with x)

Straight line of best fit, where a = y-intercept and b = slope of the line. This dependence of the mean of the y variable on the x variable is known as the regression of y on x.

3
Q

Easiest way to assess trend between variables

A

Scatter plot

4
Q

Sum of all squares

A

Propose a candidate line, then measure the vertical distance from the line up or down to each individual point. Each distance is squared (which removes the negative signs) and the squares are added together. The line giving the smallest sum is the line of best fit.
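The procedure on this card can be sketched in a few lines of Python. This is only an illustration: the data are made up, the function names `sum_of_squares` and `least_squares_line` are hypothetical, and the closed-form least-squares formulas are used in place of trying many candidate lines by hand.

```python
def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) for the line that minimises the sum of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b


# Made-up illustrative data (not from the flashcards)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_of_squares(xs, ys, a, b)
guess = sum_of_squares(xs, ys, 0.0, 2.0)  # a hand-picked candidate line, y = 2x
print(a, b, best, guess)
```

Any hand-picked candidate line, such as y = 2x here, gives a sum of squares at least as large as the fitted line's.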

5
Q

Correlation

A

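A measure of the strength of the linear relationship between two variables

r = 0 means there is no linear relationship; r = 1 or r = -1 means the points lie exactly on a straight line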

6
Q

Correlation coefficient

A

r. Can vary from -1 to 1, these being the two extremes of correlation
1 = an increase in one variable corresponds to an exactly linear increase in the other variable
-1 = an increase in one variable corresponds to an exactly linear decrease in the other variable
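As a rough illustration of these extremes, the sketch below computes r from its usual definition (the Pearson product-moment formula, which the card does not name explicitly). The function name `pearson_r` and the data are invented for the example.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])       # exact linear increase
r_down = pearson_r(xs, [10 - 3 * x for x in xs])    # exact linear decrease
r_curve = pearson_r(xs, [(x - 3) ** 2 for x in xs])  # symmetric curve, no linear trend
print(r_up, r_down, r_curve)
```

The third case is a reminder that r only measures linear association: the y values depend entirely on x, yet r is zero.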

7
Q

Assumptions for regression analysis

A

The sample is representative of the population for the inference prediction.

The error is a random variable with a mean of zero conditional on the explanatory variables.

The independent variables are measured with no error. (Note: If this is not so, modeling may be done instead using errors-in-variables model techniques).

The independent variables (predictors) are linearly independent, i.e. it is not possible to express any predictor as a linear combination of the others.

The errors are uncorrelated

In the example of FEV1 regressed on height, we also assume that the spread of FEV1 about its mean is measured by a standard deviation, σ, about the line, and that this does not change with height.

The variance of the error is constant across observations

8
Q

Regression of y on x

A

The dependence of the mean of the y variable on the x variable: μ(x) = α + βx.

9
Q

What does β measure?

A

Measures the rate at which the mean of the y variable changes as the x variable changes.

10
Q

β = 0

A

Mean of the y variable does not change with the x variable. Hence no association between the y and x variables.

11
Q

Outcomes to extract from regression analysis

A
  • the estimated slope and intercept, given under Coef;
  • the standard error of the slope, given under SE Coef;
  • the P-value for the test of the hypothesis β=0;
  • the standard deviation about the line, given as S.
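To show where these quantities come from, here is a minimal sketch that computes them by hand for simple linear regression. The labels Coef, SE Coef and S follow the card; the function name `regression_summary` and the data are made up. The P-value is not computed because it needs the t distribution, which the Python standard library does not provide; the t statistic it would be based on is returned instead.

```python
import math


def regression_summary(xs, ys):
    """Slope, intercept, SE of slope, t statistic and residual SD for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and standard deviation about the line (n - 2 df)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))
    se_slope = s / math.sqrt(sxx)
    return {
        "Coef (intercept)": intercept,
        "Coef (slope)": slope,
        "SE Coef (slope)": se_slope,
        "t (slope)": slope / se_slope,  # statistic for testing the hypothesis β = 0
        "S": s,
    }


# Made-up illustrative data
summary = regression_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```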
12
Q

a = y intercept

A

Mean value of y when x = 0. It may be negative, and it is needed to fix the position of the line once the slope is known.

13
Q

Making predictions from regression

A

Natural variability needs to be taken into account, hence the limits on a prediction made for an individual are often wide.

The estimate for an individual of height h is α + βh.

14
Q

Intervals for prediction

A

Confidence interval: uses the uncertainty in the sample mean to construct an interval in which we are 95% confident that μ (the population mean) lies

NB: as the sample size increases, the interval narrows

15
Q

Prediction interval

A

The range in which you can expect the next sampled data point to fall. Collect a sample of data and calculate a prediction interval. Then sample one more value from the population. If you do this many times, you’d expect that next value to lie within that prediction interval in 95% of the samples.

Crucially, the prediction interval tells you about the distribution of values, not the uncertainty in determining the population mean. Prediction intervals account for both the uncertainty in the mean and the scatter of the data, hence they are wider than confidence intervals.

16
Q

Confidence interval

A

Tells you how well you have determined the mean. If you draw many samples and calculate a confidence interval of the mean from each one, you’d expect about 95% of those intervals to include the true value of the population mean.

Crucially tells you about the likely location of the true population parameter.
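The contrast between the two intervals can be sketched for a fitted regression line using the standard textbook formulas for simple linear regression. Everything specific here is an assumption for illustration: the heights and FEV1 values are invented, `intervals_at` is a hypothetical helper, and the critical value 3.182 is the two-sided 95% t value for 3 degrees of freedom, passed in by hand because the standard library has no t distribution.

```python
import math


def intervals_at(xs, ys, x0, t_crit):
    """Fitted mean at x0, plus half-widths of the confidence and prediction intervals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))          # standard deviation about the line
    fitted = intercept + slope * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(leverage)       # uncertainty in the mean only
    pi_half = t_crit * s * math.sqrt(1 + leverage)   # mean uncertainty plus data scatter
    return fitted, ci_half, pi_half


# Invented data: height in cm, FEV1 in litres
heights = [160, 165, 170, 175, 180]
fev1 = [3.1, 3.4, 3.5, 3.9, 4.1]

fitted, ci_half, pi_half = intervals_at(heights, fev1, x0=170, t_crit=3.182)
print(f"mean FEV1 at 170 cm: {fitted:.2f} +/- {ci_half:.2f} (confidence)")
print(f"individual at 170 cm: {fitted:.2f} +/- {pi_half:.2f} (prediction)")
```

The extra 1 under the square root is the scatter of individual values about the line, which is why the prediction interval is always the wider of the two.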

17
Q

Pitfalls

A

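Beware of outliers: they can have a noticeable influence on the fitted line
The regression of x on y and the regression of y on x generally give different lines
A fitted line is not applicable to populations other than the one sampled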

18
Q

Assumptions in regression

A

That the mean of the y variable at a given value of the x variable changes linearly with x

The spread of the data about the line is constant

Deviations from the line follow a Normal distribution

19
Q

Assessing linearity

A

Draw a scatterplot, exclude outliers, and check that the linearity assumption is plausible

20
Q

Assessing spread

A

Use the residuals, which are the vertical distances from the line to the points. They are best seen in a scatterplot.

If the fitted line truly reflects the structure of the data, then the residuals are a sample from a distribution with population mean equal to zero and they all have the same SD.
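A small sketch of how residuals might be computed before plotting them. The data and the names `fit_line` and `residuals` are hypothetical. For a least-squares line with an intercept, the residuals in the sample always average to zero, so it is their spread against x that needs to be inspected.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys):
    """Vertical distance from the fitted line to each point (observed minus fitted)."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up illustrative data
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.3, 3.8, 6.1, 8.2, 9.7, 12.1]

res = residuals(xs, ys)
mean_res = sum(res) / len(res)
sd_res = math.sqrt(sum(r ** 2 for r in res) / (len(res) - 2))

for x, r in zip(xs, res):
    print(f"x = {x:.1f}  residual = {r:+.3f}")  # plot these pairs to judge the spread
print(f"mean residual = {mean_res:.2e}, SD about the line = {sd_res:.3f}")
```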

21
Q

Assessing the Normality of the deviations from the line

A

Since the residuals are the deviations from the line, this assessment amounts to checking that the residuals come from a common Normal distribution. Use a Normal probability plot.

22
Q

Properties of r

A
  1. it always takes values between -1 and 1;
  2. if the points were to lie exactly on a straight line then r would be either -1 or 1;
  3. a value of 0 corresponds to no linear relation between the variables;
  4. it can be computed for data which comprise pairs of continuous variables.
23
Q

Method comparison

A

Comparing two methods which both measure the same thing, e.g. manual and automated blood pressure (BP) measurement

Null hypothesis: no difference between the methods. Use a scatter plot to assess this visually.

24
Q

1st method to compare methods

A

Compute the difference between the paired measurements from methods A and B, then analyse the differences with a paired t-test
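A minimal sketch of the paired calculation, using invented blood pressure readings and a hypothetical function name `paired_t_statistic`. It stops at the t statistic; turning it into a P-value needs the t distribution with n - 1 degrees of freedom, which is not in the Python standard library.

```python
import math
import statistics


def paired_t_statistic(a_values, b_values):
    """Mean difference, SD of differences and t statistic for paired measurements."""
    diffs = [b - a for a, b in zip(a_values, b_values)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)   # sample SD, n - 1 in the denominator
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, sd_diff, t_stat


# Invented systolic BP readings (mmHg) on the same six subjects
manual = [120, 132, 118, 141, 127, 135]
automated = [122, 130, 121, 145, 126, 139]

mean_diff, sd_diff, t_stat = paired_t_statistic(manual, automated)
print(f"mean difference = {mean_diff:.2f}, SD = {sd_diff:.2f}, t = {t_stat:.2f}")
```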

25
Q

2nd method (Bland and Altman plot)

A

Plot the difference against the mean of each pair, i.e. A - B against (A + B)/2

A trend in the graph indicates that one method has a higher SD
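The points for such a plot could be prepared as in the sketch below. The readings are invented and `bland_altman_points` is a hypothetical helper; the plotting itself is left out so the example needs only the standard library.

```python
import statistics


def bland_altman_points(a_values, b_values):
    """Return (pair mean, pair difference) for each pair of measurements."""
    return [((a + b) / 2, a - b) for a, b in zip(a_values, b_values)]


# Invented systolic BP readings (mmHg) on the same six subjects
method_a = [122, 130, 121, 145, 126, 139]   # e.g. automated
method_b = [120, 132, 118, 141, 127, 135]   # e.g. manual

points = bland_altman_points(method_a, method_b)
diffs = [d for _, d in points]
bias = statistics.mean(diffs)        # average difference between the methods
sd_diffs = statistics.stdev(diffs)   # spread of the differences

for pair_mean, diff in points:
    print(f"mean = {pair_mean:.1f}  difference = {diff:+d}")  # x and y of each plotted point
print(f"bias = {bias:.2f}, SD of differences = {sd_diffs:.2f}")
```

Plotting the pair means on the x-axis and the differences on the y-axis gives the graph described on the card; any drift of the differences with the mean shows up as a visible trend.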