Relating Two Variables: Linear Regression and Correlation Flashcards
What can you do first to analyse how one variable relates to another
A scatter plot
(However you still need a QUANTITATIVE DESCRIPTION of the plot)
What does fitting a line to the data in a scatter plot allow for
Allows for a QUANTITATIVE DESCRIPTION
What is the straight line formula
y = α + βx (α is the intercept, β is the slope)
What is the most widely used algorithm for finding the slope and intercept
METHOD OF LEAST SQUARES
Counts the vertical distances of points above the line as positive and those below the line as negative, then SQUARES the distances before adding them up. The fitted line is the one that MINIMISES this sum of squares
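As a minimal sketch of the method of least squares, the closed-form slope and intercept can be computed in plain Python. The data values and the function names `least_squares_fit` and `sum_of_squares` are made up for illustration; they are not taken from the flashcards.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the line that minimises the
    sum of squared vertical distances from the points to the line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_fit(xs, ys)
best = sum_of_squares(xs, ys, intercept, slope)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, sum of squares = {best:.3f}")

# Any other line gives a larger sum of squares than the fitted one.
worse = sum_of_squares(xs, ys, intercept + 0.1, slope)
print(worse > best)
```

Squaring removes the sign, so points above and below the line cannot cancel each other out.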
What does the fitted line in the scatter plot still require
Statistical Context
What is the dependence of the mean of the Y variable on the X variable known as
Regression of Y on X
What do you have to consider to analyse two variables
- The MEAN of the POPULATION of ONE VARIABLE depends LINEARLY on the VALUE of the OTHER VARIABLE (so the mean varies linearly with the other variable). REMEMBER: DON’T assume that an individual's value of one variable depends on the other variable via a straight-line relationship
- Assume the SPREAD of the Y variable about this mean is measured by the STANDARD DEVIATION about the line (σ, sigma) and DOESN’T CHANGE with the X variable
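The two assumptions above can be illustrated with a small simulation. This is only a sketch: the population values `ALPHA`, `BETA` and `SIGMA` are invented, and drawing individuals from a Normal distribution is an extra assumption made here so that there is something to simulate.

```python
import random

# Hypothetical population values, chosen only for illustration.
ALPHA = 1.0   # intercept of the line defining the mean
BETA = 2.0    # slope of the line defining the mean
SIGMA = 0.5   # standard deviation about the line, the same at every x


def simulate_y(x, rng):
    """Draw one Y value for an individual whose X value is x."""
    mean_y = ALPHA + BETA * x         # the MEAN of Y depends linearly on x
    return rng.gauss(mean_y, SIGMA)   # individuals scatter about that mean


rng = random.Random(0)  # fixed seed so the run is repeatable
x = 3.0
draws = [simulate_y(x, rng) for _ in range(10_000)]
average = sum(draws) / len(draws)
print(f"mean of Y at x = {x}: model says {ALPHA + BETA * x}, simulated {average:.2f}")
```

Individual draws do not lie on the line; only their average at a given x does, which is the point of the REMEMBER note.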
What are the three parameters to estimate in a Regression Analysis
- The intercept defining the mean (α)
- The slope of the line defining the mean (β)
- The standard deviation about the line (σ)
What can regression analysis sometimes introduce into the analysis
An ASYMMETRY between the two variables (one is treated as Y, the other as X)
What does the asymmetry in the regression analysis sometimes mean
It sometimes means regression is NOT THE RIGHT TOOL to analyse the two variables. However, some problems are best posed in this asymmetric way
How do you estimate the population mean by regression analysis?
By α + βx (population intercept + population slope × x)
(In practice this is estimated by substituting the sample intercept and the sample slope)
How do you estimate the standard deviation by regression analysis?
By a Quantity related to the MINIMISED SUM OF SQUARES about the fitted line
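A minimal sketch of that quantity, assuming the usual convention of dividing the minimised sum of squares by n − 2 before taking the square root. The data and the function names `fit_line` and `sd_about_line` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sd_about_line(xs, ys):
    """Estimate the standard deviation about the fitted line."""
    intercept, slope = fit_line(xs, ys)
    # Minimised sum of squares about the fitted line.
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Two parameters (intercept and slope) were estimated, hence n - 2.
    return math.sqrt(residual_ss / (len(xs) - 2))


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

s = sd_about_line(xs, ys)
print(f"standard deviation about the line: {s:.3f}")
```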
What does a standard deviation estimate in regression analysis
The spread of data
What are the units of the intercept
The same units as the Y variable
What are the units of standard deviation about the line?
The same units as the Y variable
What are the units of the slope
Y per X e.g. litres per cm
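The units of the slope can be checked numerically: re-expressing X in different units rescales the slope but leaves the intercept alone. The sketch below uses made-up heights and volumes, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: X is a height, Y is a volume in litres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
volumes_l = [3.0, 3.4, 3.9, 4.3, 4.8]
heights_m = [h / 100.0 for h in heights_cm]

intercept_cm, slope_cm = fit_line(heights_cm, volumes_l)  # slope in litres per cm
intercept_m, slope_m = fit_line(heights_m, volumes_l)     # slope in litres per m

print(f"slope: {slope_cm:.3f} litres per cm = {slope_m:.1f} litres per m")
print(f"intercept: {intercept_cm:.2f} litres in both cases ({intercept_m:.2f})")
```

The intercept stays in litres because it is a value of Y, while the slope picks up a factor of 100 when centimetres become metres.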
How useful is the intercept (α) for estimation and hypothesis testing
LITTLE DIRECT INTEREST as regression analysis is aimed at ELUCIDATING the RELATIONSHIP BETWEEN Y and X
How useful is the slope (β) for estimation and hypothesis testing
It is of GREATER INTEREST because it MEASURES the RATE AT WHICH THE MEAN OF THE Y VARIABLE CHANGES AS THE X VARIABLE CHANGES
How do you see how good the sample slope (b) is as an estimate of β?
By its STANDARD ERROR, which MINITAB calculates
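For readers without Minitab, the standard error of the sample slope can be sketched by hand. This assumes the standard simple-regression formula, SE(b) = s / √Sxx, where s is the standard deviation about the line and Sxx is the sum of squared deviations of x from its mean. The data and the function name `slope_standard_error` are hypothetical.

```python
import math


def slope_standard_error(xs, ys):
    """Return (b, se_b): the sample slope and its standard error."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(residual_ss / (n - 2))  # standard deviation about the line
    return b, s / math.sqrt(sxx)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b, se_b = slope_standard_error(xs, ys)
print(f"b = {b:.2f}, standard error = {se_b:.4f}")
```

A small standard error relative to b suggests the sample slope is a precise estimate of β.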
What does β=0 mean?
- The mean of the Y variable DOESN’T change with the X variable (NO form of LINEAR DEPENDENCE)
- NO linear ASSOCIATION between the Y and X variables
Does standard error play an important role in the testing of the hypothesis
YES
What computer program fits a linear regression?
Minitab; it can also fit much more elaborate models
(The standard error of the estimated slope is given under the heading SE Coef, and S is the standard deviation about the line)
(The test of hypothesis β=0 is based on the t-statistic given under T-value with a corresponding P-value)
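The t-statistic for the hypothesis β=0 is the sample slope divided by its standard error, which can be sketched as follows. The data and the function name `slope_t_statistic` are hypothetical. The P-value is not computed here: under the usual Normal-error assumption it would come from comparing t with a t distribution on n − 2 degrees of freedom, which is not in Python's standard library.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b, se_b, t) for the test of the hypothesis beta = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(residual_ss / (n - 2))  # standard deviation about the line
    se_b = s / math.sqrt(sxx)             # standard error of the slope
    return b, se_b, b / se_b              # t = estimate / standard error


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b, se_b, t = slope_t_statistic(xs, ys)
print(f"slope = {b:.2f}, SE = {se_b:.4f}, t = {t:.2f}")
```

A t-statistic far from zero means the sample slope is many standard errors away from 0, which is evidence against β=0.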
What are the Key items in minitab?
- Estimated slope and intercept = given under Coef
- Standard error of the slope = given under SE Coef
- P-value for the test of the hypothesis β=0 = given under P-value
- Standard deviation about the line = given as S
What is the intercept
It is the MEAN of the Y variable when the individual has an X variable EQUAL TO 0
Can regression be used to predict one variable from another
Yes, regression can be used to predict the value of one variable from a measurement of the other
Why is predicting one variable from another by regression important, and what is it used for?
- Predicting variables that would be difficult or invasive to measure
- Equally, predicting variables that only become apparent in the FUTURE, e.g. survival time can be predicted from variables known at presentation
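Prediction from a fitted line can be sketched as: fit the line on individuals where both variables were measured, then plug a new X value into intercept + slope × x. The heights, volumes and the function names `fit_line` and `predict` are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(intercept, slope, x_new):
    """Estimated mean of Y for an individual whose X value is x_new."""
    return intercept + slope * x_new


# Hypothetical data: X is easy to measure, Y is harder to measure.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
volumes_l = [3.0, 3.4, 3.9, 4.3, 4.8]

intercept, slope = fit_line(heights_cm, volumes_l)
predicted = predict(intercept, slope, 175.0)
print(f"predicted volume at 175 cm: {predicted:.3f} litres")
```

The prediction is the estimated mean of Y at that X; an individual's actual value will scatter about it according to the standard deviation about the line.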