Relating Two Variables: Linear Regression and Correlation Flashcards
What can you do first to analyse how one variable relates to another
A scatter plot
(However you still need a QUANTITATIVE DESCRIPTION of the plot)
What does fitting a line to the data in a scatter plot allow for
Allows for a QUANTITATIVE DESCRIPTION
What is the straight line formula
y = α + βx, where α is the intercept and β is the slope
What is the most widely used algorithm for finding the slope and intercept
METHOD OF LEAST SQUARES
Counts the vertical distances of points above the line as positive and of points below the line as negative, then squares the distances before adding them up. The fitted line is the one that makes this sum of squares as small as possible
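The method of least squares can be sketched in a few lines of Python. This is only an illustration: the data are made up, and the function names `least_squares_fit` and `sum_of_squares` are hypothetical helpers, not part of any package mentioned in these cards.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_of_squares(x, y, intercept, slope):
    """Sum of squared vertical distances of the points from a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_fit(x, y)
minimised = sum_of_squares(x, y, intercept, slope)
# A line with a slightly different slope gives a larger sum of squares
nudged = sum_of_squares(x, y, intercept, slope + 0.1)
print(intercept, slope, minimised, nudged)
```

Changing the slope or the intercept away from the fitted values can only increase the sum of squares, which is what "least squares" means.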
What does the fitted line in the scatter plot still require
Statistical Context
What is the dependence of the mean of the Y variable on the X variable known as
Regression of Y on X
What do you have to consider to analyse two variables
- The MEAN of the POPULATION of ONE VARIABLE depends LINEARLY on the VALUE of the OTHER VARIABLE (so the mean will vary linearly with the other variable). REMEMBER DON’T assume that a variable of the individual depends on the other variable via a straight line relationship
- Assume the SPREAD of the Y variable about this mean is measured by a STANDARD DEVIATION (σ, sigma) about the line that DOESN’T CHANGE with the X variable
What are the three parameters to estimate in a Regression Analysis
- The intercept defining the mean (α)
- The slope of the line defining the mean (β)
- Standard Deviation about the line
What can regression analysis sometimes introduce into the analysis
ASYMMETRY (the Y and X variables are not treated in the same way)
What does the asymmetry in the regression analysis sometimes mean
It sometimes means regression is NOT THE RIGHT TOOL to analyse the two variables. However, some problems are best posed in this asymmetric way
How do you estimate the population mean by regression analysis?
By α + βx (the population intercept plus the population slope times x)
(This is estimated by replacing α and β with the sample intercept and sample slope)
How do you estimate the standard deviation by regression analysis?
By a Quantity related to the MINIMISED SUM OF SQUARES about the fitted line
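A minimal sketch of one common form of this quantity: the square root of the minimised sum of squares divided by n - 2. The divisor n - 2 is the usual textbook choice and is an assumption here, since the card does not state the formula. The data are made up and the function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least squares (intercept, slope) for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def sd_about_line(x, y):
    """Estimate the standard deviation about the fitted line."""
    intercept, slope = fit_line(x, y)
    # Minimised sum of squared vertical distances from the fitted line
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Assumed divisor: n - 2, because two parameters (intercept, slope) were estimated
    return math.sqrt(residual_ss / (len(x) - 2))


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
s = sd_about_line(x, y)
print(s)
```

The result is in the same units as the Y variable.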
What does a standard deviation estimate in regression analysis
The spread of data
What are the units of the intercept
The same units as the Y variable
What are the units of standard deviation about the line?
The same units as the Y variable
What are the units of the slope
Y per X e.g. litres per cm
How good is the intercept (α) in estimating and testing a hypothesis
LITTLE DIRECT INTEREST as regression analysis is aimed at ELUCIDATING the RELATIONSHIP BETWEEN Y and X
How good is the slope (β) in estimating and testing a hypothesis
It is of GREATER INTEREST because it MEASURES the RATE AT WHICH THE MEAN OF THE Y VARIABLE CHANGES AS THE X VARIABLE CHANGES
How do you see how good the sample slope (b) is as an estimate of β?
You measure STANDARD ERROR which is calculated by MINITAB
What does β=0 mean?
- The mean of the Y variable DOESN’T change with the X variable, i.e. there is NO FORM OF LINEAR DEPENDENCE
- NO LINEAR ASSOCIATION between the Y and X variables
Does standard error play an important role in the testing of the hypothesis
YES
What computer program fits a linear regression?
Minitab (it can also fit much more elaborate models)
(Standard Error of β is given under the heading SE Coef and S is standard deviation)
(The test of hypothesis β=0 is based on the t-statistic given under T-value with a corresponding P-value)
What are the Key items in minitab?
- Estimated slope and intercept = given under Coef
- Standard Error of the slope = given under SE Coef
- P-value for the test of the hypothesis β=0
- Standard Deviation about the line= given as S
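To see where such output items come from, the sketch below computes the slope, its standard error and the t-statistic for β = 0 using the standard textbook formulas. It is not a description of how Minitab itself is implemented; the data are made up and `slope_summary` is a hypothetical helper. The P-value would come from a t distribution with n - 2 degrees of freedom and is not computed here.

```python
import math


def slope_summary(x, y):
    """Return (slope, standard error of slope, t-statistic for beta = 0)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Standard deviation about the line (the quantity reported as S)
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(residual_ss / (n - 2))
    # Standard error of the slope, and the t-statistic for testing beta = 0
    se_slope = s / math.sqrt(s_xx)
    t_value = slope / se_slope
    return slope, se_slope, t_value


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
b, se_b, t = slope_summary(x, y)
print(b, se_b, t)
```

A large absolute t-value (slope large relative to its standard error) gives a small P-value, which is evidence against β = 0.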
What is the intercept
It is the MEAN of the Y variable when the individual has an X variable EQUAL TO 0
Can regression be used to predict one variable from another
Yes, regression can be used to predict the value of one variable from a measurement of the other
Why is predicting the value of one variable from another by regression important, and what is it important for?
- Predicting variables that would be difficult or invasive to measure
- EQUALLY, for VARIABLES which only become apparent in the FUTURE, e.g. survival time, which can be predicted from variables known at presentation
When making predictions from regression analysis what is important to take into account?
NATURAL VARIABILITY IN THE SYSTEM
This often places wide limits on the predictions made for an individual
More detailed consideration of prediction methods is instructive
To predict value what estimates do you have to use
You have to use the estimates of β and α which would have been obtained from regression analysis
What pairs of limits can you place on the predictions made by linear regression
- 95% Confidence Intervals
- 95% prediction intervals
What happens to the width of the 95% confidence intervals when the sample size increases?
The Width of the interval decreases
Does a bigger sample mean a more precise estimate of the population mean (μ)
Yes
Can the uncertainty of the intercept and slope be quantified by the confidence intervals?
Yes
When is a prediction interval contemplated and applied?
When predicting for an individual: a prediction consisting of a single value is useful, but it is much more helpful if an INTERVAL can be given in which the VALUE FOR THE INDIVIDUAL IS LIKELY TO LIE
Are prediction intervals the same as confidence intervals
No
Can a prediction interval for the value for an individual shrink to zero width under any circumstances
No, any sensible interval for the prediction of the value for an individual CAN’T SHRINK TO ZERO WIDTH UNDER ANY CIRCUMSTANCES
In prediction intervals, when increasing the sample size is there change in the width of the intervals
There is LITTLE CHANGE IN THE PREDICTION INTERVAL but the confidence intervals become narrower
What does little change in the prediction interval when the sample size increases and the confidence interval narrows highlight?
NATURAL VARIABILITY of the variable REMAINS UNCHANGED
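The contrast between the two kinds of interval can be illustrated with the standard textbook standard errors for the mean of Y at a value x0 and for a new individual at x0. This is a sketch under stated assumptions: the standard deviation about the line is held fixed at a made-up value, the X values are assumed to have a variance of about 4, and `interval_standard_errors` is a hypothetical helper. Each interval is roughly the fitted value plus or minus a t-multiplier times the relevant standard error.

```python
import math


def interval_standard_errors(n, s, x0, x_bar, s_xx):
    """Return (SE for the mean of Y at x0, SE for a new individual at x0)."""
    distance_term = (x0 - x_bar) ** 2 / s_xx
    se_mean = s * math.sqrt(1 / n + distance_term)
    # The extra "1 +" is the individual's own natural variability about the line
    se_individual = s * math.sqrt(1 + 1 / n + distance_term)
    return se_mean, se_individual


s = 2.0        # assumed standard deviation about the line (made up)
x_bar = 10.0   # assumed mean of the X values (made up)
x0 = 11.0      # value of X at which a prediction is wanted

results = {}
for n in (10, 100, 1000):
    s_xx = 4.0 * (n - 1)  # assumes the X values have variance 4
    results[n] = interval_standard_errors(n, s, x0, x_bar, s_xx)
    print(n, results[n])
```

As n grows, the standard error for the mean shrinks towards zero, while the standard error for an individual stays just above s, which is why the prediction interval hardly narrows.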
What are the pitfalls to regression analysis?
- It is a rather weak, empirical technique which deduces a relationship between two variables only from what can be apprehended in the sample itself. A regression analysis might therefore be guided by some sort of COMPARTMENTAL MODEL
- SHOULDN’T BE USED IN SOME SORT OF DATA such as older males
- Beware of OUTLYING OR UNUSUAL VALUES. These can have a noticeable influence on the estimates you obtain
- It is up to the analyst to ensure that correct approach is chosen
What are the assumptions of linear regression?
- The MEAN of the Y VARIABLE at a given value of the X VARIABLE changes LINEARLY with X
- The SPREAD OF DATA about this line is CONSTANT, that DOESN’T CHANGE as X CHANGES
- The deviations from the line follow a NORMAL DISTRIBUTION, but this is only needed if you intend to compute confidence or prediction intervals or to perform hypothesis tests
How do you assess the linearity assumption
Draw a scatter plot of the Y variable against the X variable and you can assess by eye
How do you assess the spread about the line
From a scatter plot, by defining quantities known as RESIDUALS
What are residuals
The vertical distance of a point from the fitted line
Are residuals positive above the line or negative below the line or can they be both
They can be both
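A short sketch of residuals computed from a least squares fit, using made-up data. The helper name `residuals_from_fit` is hypothetical. Points above the fitted line give positive residuals and points below give negative ones; for a least squares line with an intercept, the residuals add up to zero.

```python
def residuals_from_fit(x, y):
    """Fit a least squares line and return the residuals (observed minus fitted)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Vertical distance of each point from the fitted line, with its sign
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
residuals = residuals_from_fit(x, y)
print(residuals)
print(sum(residuals))
```

Plotting these residuals against X is one way to judge by eye whether the spread about the line stays constant.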
How can you tell if the fitted line truly reflects the data structure
If it does, the RESIDUALS behave like a sample from a distribution with POPULATION MEAN EQUAL TO ZERO and the SAME STANDARD DEVIATION at every value of X
How do you assess the normality of the deviation from the line?
Do a NORMAL PROBABILITY PLOT of the residuals
What does correlation coefficient attempt to quantify?
The STRENGTH OF THE LINEAR ASSOCIATION BETWEEN TWO VARIABLES
Does correlation go together with regression?
Yes they go together
What are the properties of correlation (r)?
- Always take VALUES BETWEEN -1 and 1
- IF THE POINTS WERE TO LIE EXACTLY ON A STRAIGHT LINE THEN r WOULD EITHER BE -1 OR 1
- A VALUE of 0 CORRESPONDS TO NO LINEAR RELATIONSHIP BETWEEN THE VARIABLES
- IT CAN BE COMPUTED FOR DATA WHICH COMPRISE PAIRS OF CONTINUOUS VARIABLES
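These properties can be checked with a small sketch of the Pearson correlation coefficient. The data are made up and `pearson_r` is a hypothetical helper written out from the usual formula. The last example is a reminder that r = 0 only rules out a linear relationship: there Y depends exactly on X, but through a curve.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(x, [3.0, 5.0, 7.0, 9.0])        # points exactly on a rising line
r_down = pearson_r(x, [-3.0, -6.0, -9.0, -12.0])  # points exactly on a falling line

# Y = X squared on X values symmetric about zero: a perfect but non-linear relationship
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_curve = pearson_r(x_sym, [xi ** 2 for xi in x_sym])

print(r_up, r_down, r_curve)
```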
What does a Negative Correlation (r) look like?
Data in which the Y variable tends to DECREASE as the X variable INCREASES
What does a Positive Correlation (r) look like?
The Y and X Variable tend to increase or decrease together
What is the problem with correlation?
It is hard to interpret values that are not at the extremes. In particular, in large samples a WEAK CORRELATION can STILL BE STATISTICALLY SIGNIFICANT.
What method compares two methods that measure the same thing
Altman and Bland Method
What does the Altman and Bland Method work with?
Works with the differences between individual observations
What does the Altman and Bland method consider
The distribution of the differences between individual observations
In the Altman and Bland method how is the dispersion of differences summarised?
Through the limits of agreement. About 95% of differences will lie between these limits, so if the observation using one method is X, then the observation using the other method, Y, would generally not differ from X by more than what is quantified in the limits of agreement
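Limits of agreement are commonly computed as the mean difference plus or minus 1.96 standard deviations of the differences. The sketch below assumes that convention and that the differences are roughly normally distributed; the measurements are made up and `limits_of_agreement` is a hypothetical helper.

```python
import statistics


def limits_of_agreement(method_x, method_y):
    """Return (mean difference, lower limit, upper limit) for paired measurements."""
    differences = [yi - xi for xi, yi in zip(method_x, method_y)]
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)  # sample standard deviation
    # Assumed convention: mean difference +/- 1.96 standard deviations
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Made-up paired measurements of the same quantity by two methods
method_x = [10.0, 12.0, 14.0, 16.0, 18.0]
method_y = [11.0, 12.0, 15.0, 15.0, 19.0]

mean_diff, lower, upper = limits_of_agreement(method_x, method_y)
print(mean_diff, lower, upper)
```

Whether limits of this width are acceptable depends on what the measurements are used for.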
What does the Altman and Bland Method show
- Shows QUANTITATIVELY the DEGREE OF AGREEMENT BETWEEN THE TWO METHODS; whether they agree well enough is then a judgement, not a statistical one
- It can also provide further information: for example, the standard deviations of the methods may be different, perhaps because one method has a larger error
What are residuals
The vertical distances of the points from the fitted line; their spread about the line is measured by the standard deviation S
What happens to the confidence limits when the sample size increases
Gets narrower
What happens to the prediction limits when the sample size increases
Very little: the prediction limits hardly change with sample size
Are the confidence intervals narrower compared to the prediction limits at 95%
They are narrower and more “curved”
What does correlation measure
The STRENGTH of a relationship
What values can the Pearson coefficient take
Between -1 and +1
What does the Pearson coefficient value of 1 represent
All observations lie on a STRAIGHT line with a positive gradient
What kind of gradient is there if the correlation (r) is below 0
Negative gradient
What does 0 represent in correlation
There is no linear relationship as the Pearson correlation assumes linearity
Does correlation establish agreement