Relating Two Variables: Linear Regression and Correlation Flashcards

1
Q

What can you do first to analyse how one variable relates to another

A

A scatter plot
(However you still need a QUANTITATIVE DESCRIPTION of the plot)

2
Q

What does fitting a line to the data in a scatter plot allow for

A

Allows for a QUANTITATIVE DESCRIPTION

3
Q

What is the straight line formula

A

y = α + βx, where α is the intercept and β is the slope

4
Q

What is the most widely used algorithm for finding the slope and intercept

A

METHOD OF LEAST SQUARES
Chooses the slope and intercept that MINIMISE THE SUM OF SQUARED vertical distances of the points from the line. Distances count as positive for points above the line and negative for points below it, so they are squared before being added up
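
A minimal sketch of a least-squares fit, using the standard closed-form expressions for the slope and intercept. The function name `least_squares` and the data are made up for illustration; they are not from the course material.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) minimising the sum of squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x about its mean, and sum of cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = least_squares(xs, ys)
print(intercept, slope)
```

Because these points lie exactly on a line, the fit recovers that line; with real data the points scatter about the fitted line.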

5
Q

What does the fitted line in the scatter plot still require

A

Statistical Context

6
Q

What is the dependence of the mean of the Y variable on the X variable known as

A

Regression of Y on X

7
Q

What do you have to consider to analyse two variables

A
  1. The MEAN of the POPULATION of ONE VARIABLE depends LINEARLY on the VALUE of the OTHER VARIABLE (so the mean varies linearly with the other variable). REMEMBER: DON’T assume that the variable for an individual depends on the other variable via a straight-line relationship
  2. Assume the SPREAD of the Y variable about this mean is measured by the STANDARD DEVIATION (σ) about the line and DOESN’T CHANGE with the X variable
8
Q

What are the three parameters to estimate in a Regression Analysis

A
  1. The intercept defining the mean (α)
  2. The slope of the line defining the mean (β)
  3. Standard Deviation about the line
9
Q

What can regression analysis sometimes introduce into the analysis

A

ASYMMETRY: the Y and X variables are not treated in the same way

10
Q

What does the asymmetry in the regression analysis sometimes mean

A

It sometimes means regression is NOT THE RIGHT TOOL to analyse the two variables. However, some problems are best posed in this asymmetric way

11
Q

How do you estimate the population mean by regression analysis?

A

The population mean of Y at a given x is α + βx (population intercept plus population slope times x)
(It is estimated by substituting the sample intercept and the sample slope)

12
Q

How do you estimate the standard deviation by regression analysis?

A

By a Quantity related to the MINIMISED SUM OF SQUARES about the fitted line
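
A hedged sketch of one common form of that quantity: the residual standard deviation s = sqrt(SSE / (n - 2)), where SSE is the minimised sum of squares. The n - 2 divisor is the usual textbook convention and is assumed here; the function name and data are illustrative only.

```python
import math

def residual_sd(xs, ys, intercept, slope):
    """Standard deviation about the fitted line, using the n - 2 divisor."""
    # Sum of squared vertical distances from the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

# Hypothetical data; 1.3 and 0.8 are the least-squares intercept and slope for it
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]
s = residual_sd(xs, ys, intercept=1.3, slope=0.8)
print(s)
```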

13
Q

What does a standard deviation estimate in regression analysis

A

The spread of data

14
Q

What are the units of the intercept

A

The same units as the Y variable

15
Q

What are the units of standard deviation about the line?

A

The same units as the Y variable

16
Q

What are the units of the slope

A

Y per X e.g. litres per cm

17
Q

How good is the intercept (α) in estimating and testing a hypothesis

A

LITTLE DIRECT INTEREST as regression analysis is aimed at ELUCIDATING the RELATIONSHIP BETWEEN Y and X

18
Q

How good is the slope (β) in estimating and testing a hypothesis

A

It is of GREATER INTEREST because it MEASURES the RATE AT WHICH THE MEAN OF THE Y VARIABLE CHANGES AS THE X VARIABLE CHANGES

19
Q

How do you see how good the sample slope (b) is as an estimate of β?

A

By its STANDARD ERROR, which Minitab calculates
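
A rough sketch of how the standard error of the slope can be computed by hand, assuming the standard textbook formula SE(b) = s / sqrt(Sxx), where s is the standard deviation about the line. The t-statistic for testing β = 0 is then b / SE(b). The function name and data are hypothetical, and this is not claimed to reproduce Minitab's internals.

```python
import math

def slope_standard_error(xs, ys):
    """Return (b, se_b, t) for a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # sample slope
    a = mean_y - b * mean_x            # sample intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))       # standard deviation about the line
    se_b = s / math.sqrt(sxx)          # standard error of the slope
    return b, se_b, b / se_b           # last value is the t-statistic for beta = 0

# Hypothetical data
b, se_b, t = slope_standard_error([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 4.0])
print(b, se_b, t)
```

The t-statistic would be compared with a t distribution on n - 2 degrees of freedom to obtain the P-value.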

20
Q

What does β=0 mean?

A
  1. The mean of the Y variable DOESN’T change with the X variable, i.e. there is NO LINEAR DEPENDENCE of Y on X
  2. NO ASSOCIATION between y and x variables
21
Q

Does standard error play an important role in the testing of the hypothesis

A

YES

22
Q

What computer program fits a linear regression?

A

Minitab; it can also fit much more elaborate models
(Standard Error of β is given under the heading SE Coef and S is standard deviation)
(The test of hypothesis β=0 is based on the t-statistic given under T-value with a corresponding P-value)

23
Q

What are the key items in Minitab output?

A
  1. Estimated slope and intercept = given under Coef
  2. Standard Error of the slope = given under SE Coef
  3. P-value for the test of the hypothesis β=0 = given under P-value
  4. Standard Deviation about the line = given as S
24
Q

What is the intercept

A

It is the MEAN of the Y variable when the individual has an X variable EQUAL TO 0

25
Q

Can regression be used to predict one variable from another

A

Yes, regression can be used to predict the value of one variable from a measurement of the other

26
Q

Why is predicting one variable from another by regression important, and what is it useful for?

A
  1. Predicting variables that would be difficult or invasive to measure
  2. Predicting variables which only become apparent in the FUTURE: for example, survival time can be predicted from variables known at presentation
27
Q

When making predictions from regression analysis what is important to take into account?

A

NATURAL VARIABILITY IN THE SYSTEM
This often places wide limits on the predictions made for an individual
More detailed consideration of prediction methods is instructive

28
Q

To predict value what estimates do you have to use

A

You have to use the estimates of β and α which would have been obtained from regression analysis

29
Q

What pairs of limits can you place on the predictions made by linear regression

A
  1. 95% Confidence Intervals
  2. 95% prediction intervals
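
A sketch of how the two kinds of limits differ, assuming the standard textbook half-widths: t × s × sqrt(1/n + (x0 - x̄)²/Sxx) for the confidence interval on the mean, and the same expression with an extra 1 under the square root for the prediction interval. The function name, the value s = 1.0 and the fixed multiplier t_crit = 2.0 are illustrative assumptions; in practice t_crit comes from t tables with n - 2 degrees of freedom.

```python
import math

def interval_half_widths(xs, s, x0, t_crit):
    """Return (confidence, prediction) interval half-widths at x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Uncertainty in the estimated mean of Y at x0
    mean_term = 1 / n + (x0 - mean_x) ** 2 / sxx
    ci = t_crit * s * math.sqrt(mean_term)
    # The extra 1 is the natural variability of an individual about the mean
    pi = t_crit * s * math.sqrt(1 + mean_term)
    return ci, pi

small_sample = [float(x) for x in range(10)]   # n = 10
large_sample = small_sample * 100              # n = 1000, same spread of x
ci_small, pi_small = interval_half_widths(small_sample, s=1.0, x0=4.5, t_crit=2.0)
ci_large, pi_large = interval_half_widths(large_sample, s=1.0, x0=4.5, t_crit=2.0)
print(ci_small, pi_small)
print(ci_large, pi_large)
```

With the larger sample the confidence half-width shrinks towards zero, while the prediction half-width stays just above t_crit × s.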
30
Q

What happens to the width of the 95% confidence intervals when the sample size increases?

A

The Width of the interval decreases

31
Q

Does a bigger sample give a more precise estimate of the population mean (μ)

A

Yes

32
Q

Can the uncertainty of the intercept and slope be quantified by the confidence intervals?

A

Yes

33
Q

When is a prediction interval contemplated and applied?

A

When a prediction is made for an individual. A prediction consisting of a single value is useful, but it is much more helpful if an INTERVAL can be given within which the VALUE FOR THE INDIVIDUAL IS LIKELY TO LIE

34
Q

Are prediction intervals the same as confidence intervals

A

No

35
Q

Can a prediction interval for the value for an individual shrink to zero width under any circumstances

A

No. Any sensible interval for the prediction of the value for an individual CAN’T SHRINK TO ZERO WIDTH UNDER ANY CIRCUMSTANCES

36
Q

When the sample size increases, is there a change in the width of prediction intervals

A

There is LITTLE CHANGE IN THE PREDICTION INTERVAL but the confidence intervals become narrower

37
Q

When the sample size increases, the confidence interval narrows but the prediction interval changes little. What does this highlight?

A

NATURAL VARIABILITY of the variable REMAINS UNCHANGED

38
Q

What are the pitfalls to regression analysis?

A
  1. The technique is a rather weak, empirical one: it deduces a relationship between two variables only from what can be apprehended in the sample itself. The analysis might therefore be guided by some sort of COMPARTMENTAL MODEL
  2. It SHOULDN’T BE USED ON SOME SORTS OF DATA, such as older males
  3. Beware of OUTLYING OR UNUSUAL VALUES. These can have a noticeable influence on the estimates you obtain
  4. It is up to the analyst to ensure that the correct approach is chosen
39
Q

What are the assumptions of linear regression?

A
  1. The MEAN of the Y VARIABLE at a given value of the X VARIABLE changes LINEARLY with X
  2. The SPREAD OF DATA about this line is CONSTANT, i.e. it DOESN’T CHANGE as X CHANGES
  3. The deviations from the line follow a NORMAL DISTRIBUTION, but this is only needed if you intend to compute confidence or prediction intervals or to perform hypothesis tests
40
Q

How do you assess the linearity assumption

A

Draw a scatter plot of the Y variable against the X variable and you can assess by eye

41
Q

How do you assess the spread about the line

A

From a scatter plot, by defining quantities known as RESIDUALS

42
Q

What are residuals

A

The vertical distance of a point from the fitted line
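
A small illustration of residuals as signed vertical distances (observed Y minus fitted Y). The data and the helper name `residuals` are made up; 1.3 and 0.8 are the least-squares intercept and slope for this particular data set.

```python
def residuals(xs, ys, intercept, slope):
    """Observed y minus the fitted value on the line, for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data with its least-squares line y = 1.3 + 0.8x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]
res = residuals(xs, ys, intercept=1.3, slope=0.8)
print(res)       # negative entries are points below the line, positive are above
print(sum(res))  # for a least-squares line the residuals sum to zero (up to rounding)
```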

43
Q

Are residuals positive above the line or negative below the line or can they be both

A

They can be both

44
Q

How can you tell if the fitted line truly reflects the data structure

A

The RESIDUALS are a sample from a distribution with POPULATION MEAN EQUAL TO ZERO and they have the SAME STANDARD DEVIATION

45
Q

How do you assess the normality of the deviation from the line?

A

Draw a NORMAL PROBABILITY PLOT of the residuals

46
Q

What does correlation coefficient attempt to quantify?

A

The STRENGTH OF THE LINEAR ASSOCIATION BETWEEN TWO VARIABLES

47
Q

Does correlation go together with regression?

A

Yes they go together

48
Q

What are the properties of correlation (r)?

A
  1. It always takes VALUES BETWEEN -1 and 1
  2. IF THE POINTS WERE TO LIE EXACTLY ON A STRAIGHT LINE THEN r WOULD EITHER BE -1 OR 1
  3. A VALUE of 0 CORRESPONDS TO NO LINEAR RELATIONSHIP BETWEEN THE VARIABLES
  4. IT CAN BE COMPUTED FOR DATA WHICH COMPRISE PAIRS OF CONTINUOUS VARIABLES
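
The properties above can be checked with a short sketch of the Pearson coefficient, r = Sxy / sqrt(Sxx × Syy). The function name and all three data sets are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired continuous data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Points exactly on a line with positive, then negative, gradient
r_up = pearson_r([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
r_down = pearson_r([1.0, 2.0, 3.0, 4.0], [9.0, 7.0, 5.0, 3.0])
# A symmetric curve (y = x squared): clearly related, but not linearly
r_curve = pearson_r([-2.0, -1.0, 0.0, 1.0, 2.0], [4.0, 1.0, 0.0, 1.0, 4.0])
print(r_up, r_down, r_curve)
```

The last case is a reminder that r near 0 rules out a linear relationship only, not every kind of relationship.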
49
Q

What does a Negative Correlation (r) look like?

A

Data in which the Y variable tends to DECREASE as the X variable INCREASES

50
Q

What does a Positive Correlation (r) look like?

A

The Y and X variables tend to increase or decrease together

51
Q

What is the problem with correlation?

A

Values that are not at the extremes are hard to interpret. In large samples, even a WEAK CORRELATION can STILL BE STATISTICALLY SIGNIFICANT.

52
Q

What method compares two methods that measure the same thing

A

Altman and Bland Method

53
Q

What does the Altman and Bland Method work with?

A

Works with the differences between individual observations

54
Q

What does the Altman and Bland method consider

A

The distribution of the differences between individual observations

55
Q

In the Altman and Bland method how is the dispersion of differences summarised?

A

Through the limits of agreement. 95% of differences will lie between these limits, so if the observation using one method is X, then the observation using the other method, Y, would generally not differ from X by more than the amount quantified by the limits of agreement
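
A minimal sketch of the limits of agreement, assuming the usual convention of mean difference ± 1.96 × standard deviation of the differences (which covers about 95% of differences if they are roughly normally distributed). The function name and the paired measurements are hypothetical.

```python
import statistics

def limits_of_agreement(method_a, method_b):
    """Return (lower, upper) 95% limits of agreement for paired measurements."""
    # Work with the differences between the paired individual observations
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff

# Hypothetical paired measurements of the same quantity by two methods
method_a = [10.0, 12.0, 14.0, 16.0, 18.0]
method_b = [9.0, 12.0, 15.0, 15.0, 19.0]
lower, upper = limits_of_agreement(method_a, method_b)
print(lower, upper)
```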

56
Q

What does the Altman and Bland Method show

A
  1. Shows QUANTITATIVELY the DEGREE OF AGREEMENT BETWEEN THE TWO METHODS. Whether that degree of agreement is acceptable is a matter of judgement, not a statistical one
  2. It can also provide further information: for example, the standard deviations of the two methods may differ, perhaps because one method has a higher measurement error
57
Q

What are residuals

A

The vertical distances of the points from the fitted line; their standard deviation measures the spread of Y about the line

58
Q

What happens to the confidence limits when the sample size increases

A

They get narrower

59
Q

What happens to the prediction limits when the sample size increases

A

Very little; they hardly vary with sample size

60
Q

Are the 95% confidence limits narrower than the 95% prediction limits

A

They are narrower and more “curved”

61
Q

What does correlation measure

A

The STRENGTH of a relationship

62
Q

What is the value of Pearson coefficient

A

Between -1 and +1

63
Q

What does the Pearson coefficient value of 1 represent

A

All observations lie on a STRAIGHT line with a positive gradient

64
Q

What kind of gradient is there if the correlation (r) is below 0

A

Negative gradient

65
Q

What does 0 represent in correlation

A

There is no linear relationship as the Pearson correlation assumes linearity

66
Q

Does correlation establish agreement

A