M7: Cautions in Regression and Categorical Data Flashcards
Extrapolation
Using the regression line to predict y for values of x outside the observed range of the data, where the fitted trend cannot be trusted
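A minimal sketch of how extrapolation could be flagged in practice, assuming a simple least-squares line. The helper names `fit_line` and `predict` and the data are made up for illustration and are not part of the course material.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(xs, ys, x_new):
    """Return (prediction, is_extrapolation) for x_new."""
    a, b = fit_line(xs, ys)
    # x_new is an extrapolation if it lies outside the observed x range
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return a + b * x_new, is_extrapolation


# Illustrative (invented) data lying exactly on y = 1 + 2x, with x from 1 to 5
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

inside = predict(xs, ys, 3)    # x = 3 is within the observed range
outside = predict(xs, ys, 20)  # x = 20 is far beyond it

print(inside)   # (7.0, False)
print(outside)  # (41.0, True)
```

The line still returns a number at x = 20, but nothing in the data supports the trend that far out, which is why the flag matters more than the prediction.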
Cautions in Analyzing Associations
a) Extrapolation
b) Influential outliers
c) Correlation does not imply causation
Outliers
Points that fall far from the trend of the other observations
Influential outliers
Tend to pull the regression line towards them
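A small sketch of the pull an influential outlier can exert, assuming a least-squares slope. The helper `fit_slope` and all data points are invented for illustration.

```python
def fit_slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Five invented points lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

# One extra point far from the others in x and off the trend in y
outlier_x, outlier_y = 20, 0

slope_without = fit_slope(xs, ys)
slope_with = fit_slope(xs + [outlier_x], ys + [outlier_y])

print(slope_without)         # 2.0
print(round(slope_with, 2))  # -0.26
```

Here a single point is enough to turn a clearly positive slope into a slightly negative one, because it sits far from the other x values and so carries a lot of leverage.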
What to do if your data has outliers?
Check the data and correct any typos
If there are unusual observations, try to find out more about them.
If they do not belong in the data set, delete them before proceeding with the regression analysis
Lurking variables
A variable, usually unobserved, that influences the association between x and y
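A hedged sketch of how a lurking variable can create an association, using invented numbers: x and y are each built from z plus small fixed offsets, and neither depends on the other. The helpers `pearson_r` and `residuals` are illustrative names, and removing the linear effect of z is only one of several ways to check for this.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(xs, ys):
    """Residuals of ys after a least-squares fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Invented data: z is the lurking variable; x and y each depend only on z
z = [1, 2, 3, 4, 5, 6, 7, 8]
e1 = [1, -1, -1, 1, 1, -1, -1, 1]   # fixed offsets added to x
e2 = [1, 1, -1, -1, 1, 1, -1, -1]   # fixed offsets added to y
x = [2 * zi + e for zi, e in zip(z, e1)]
y = [3 * zi + e for zi, e in zip(z, e2)]

raw_r = pearson_r(x, y)
# Correlation left over once the linear effect of z is removed from both
partial_r = pearson_r(residuals(z, x), residuals(z, y))

print(round(raw_r, 2))           # 0.97
print(abs(partial_r) < 1e-6)     # True
```

The strong raw correlation between x and y disappears once z is accounted for, which is the pattern to suspect whenever a plausible common cause was not measured.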
Types of graphs to find association
a) Box plot (Categorical data / Blood pressure / Caf)
b) Scatter-plot (Size of the head / IQ)
c) Contingency table (Smoking / Lung cancer)
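A minimal sketch of how a contingency table for two categorical variables could be tallied, using only the standard library. The eight records are invented purely to show the mechanics and are not real smoking or lung cancer data.

```python
from collections import Counter

# Invented records of (smoking status, outcome); NOT real data
records = (
    [("smoker", "cancer")] * 3
    + [("smoker", "no cancer")] * 1
    + [("non-smoker", "cancer")] * 1
    + [("non-smoker", "no cancer")] * 3
)

# Each cell of the contingency table is the count of one (row, column) pair
table = Counter(records)

rows = ["smoker", "non-smoker"]
cols = ["cancer", "no cancer"]

# Conditional proportion of "cancer" within each row
rate = {}
for r in rows:
    row_total = sum(table[(r, c)] for c in cols)
    rate[r] = table[(r, "cancer")] / row_total

for r in rows:
    print(r, [table[(r, c)] for c in cols], rate[r])
# smoker [3, 1] 0.75
# non-smoker [1, 3] 0.25
```

Comparing the conditional proportions across rows is what reveals an association; identical proportions in every row would suggest none.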