Data Analysis - Interpretation Flashcards
What is the take home message regarding any form of models?
all models are wrong, but some are useful for specific purposes
What 4 things does the statistical doctor look for?
- observe
- guess
- test
- assess
What can be observed from this scatterplot?
- hypertension and CHD values appear to increase together
- they have a correlation of 0.70
(the red line is linear regression analysis - there is no line for correlation)
How do these diagrams demonstrate the difference between correlation and regression?
correlation indicates whether two variables do or don’t change together
regression quantifies how the variables change together
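A minimal sketch of the distinction, using made-up numbers (not the data behind the scatterplot). The function names `pearson_r` and `ols_slope` are illustrative. Rescaling one variable leaves the correlation unchanged but changes the regression slope, because only the slope carries units.

```python
from math import sqrt

# Made-up illustrative values, NOT the flashcard's data.
hypertension = [10.0, 12.0, 14.0, 16.0, 18.0]  # prevalence, %
chd = [2.0, 3.0, 3.0, 4.0, 5.0]                # prevalence, %

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Correlation: do the variables change together? Unitless, between -1 and 1."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope(x, y):
    """Regression slope: how much y changes per unit of x (least squares)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Same CHD values expressed on a scale ten times larger.
chd_rescaled = [10 * v for v in chd]

print(round(pearson_r(hypertension, chd), 3))           # 0.971
print(round(ols_slope(hypertension, chd), 3))           # 0.35
print(round(pearson_r(hypertension, chd_rescaled), 3))  # 0.971 (unchanged)
print(round(ols_slope(hypertension, chd_rescaled), 3))  # 3.5 (ten times larger)
```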
What is significant about the slope of a linear regression model?
How is it calculated?
slope = change in vertical distance / change in horizontal distance (rise over run)
the slope of a linear regression model quantifies how the variables relate
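As a small illustration of the rise-over-run calculation, the sketch below takes two points on a straight line and divides the vertical change by the horizontal change. The helper `slope_between` and the two points are hypothetical.

```python
def slope_between(p1, p2):
    """Rise over run between two (x, y) points on a straight line."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("horizontal distance is zero; slope is undefined")
    return (y2 - y1) / (x2 - x1)

# Two hypothetical points read off a straight line.
slope = slope_between((10.0, 2.0), (20.0, 5.0))
print(slope)  # 0.3
```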
What does the number highlighted in red demonstrate?
the coefficient
the slope of the line that is used as a model to represent the relationship between CHD and hypertension
the interpretation is that for every step we take in hypertension, our CHD changes by 0.32
What is a better interpretation of the “slope”?
for every 1% difference in the prevalence of hypertension, we see a 0.32 difference in the prevalence of CHD
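One way to see this interpretation is to compare model predictions at two hypertension prevalences. In the sketch below the slope 0.32 comes from the flashcard, but the intercept is an assumption (the flashcards do not give it). The intercept cancels out of the difference, so it does not affect the result.

```python
SLOPE = 0.32      # coefficient from the flashcard
INTERCEPT = 1.0   # hypothetical; not given in the flashcards

def predicted_chd(hypertension_pct):
    """Predicted CHD prevalence from a straight-line model."""
    return INTERCEPT + SLOPE * hypertension_pct

# A difference of 1 in hypertension prevalence (14% vs 15%).
diff_1 = predicted_chd(15.0) - predicted_chd(14.0)
# A difference of 5 in hypertension prevalence (15% vs 20%).
diff_5 = predicted_chd(20.0) - predicted_chd(15.0)

print(round(diff_1, 2))  # 0.32
print(round(diff_5, 2))  # 1.6
```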
Why is it important to be careful with presentation when showing a relationship between 2 variables?
two graphs may have the same slope but look different because they are drawn on different scales
What is R squared?
What is the interpretation and meaning?
it is a goodness of fit statistic with values between 0 and 1
interpretation:
the larger the value, the better the fit
meaning:
proportion of the outcome’s variability that the model explains
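A minimal sketch of how this proportion can be computed for a straight-line model: R squared is 1 minus the ratio of the unexplained (residual) variation to the total variation in the outcome. The data are made up for illustration and `fit_line` and `r_squared` are hypothetical helper names.

```python
# Made-up illustrative values, NOT the flashcard's data.
hypertension = [10.0, 12.0, 14.0, 16.0, 18.0]  # prevalence, %
chd = [2.0, 3.0, 3.0, 4.0, 5.0]                # prevalence, %

def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """Proportion of the outcome's variability explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

r2 = r_squared(hypertension, chd)
print(round(r2, 3))  # 0.942
```

For a straight-line model with one predictor, this value equals the square of the correlation between the two variables.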
In which ways is the data limited, when the conclusion about the relationship between the prevalence of CHD and hypertension is made?
the data used is only from one year, only from England and only from patients
conclusions cannot be extended beyond the year, country or patient group that the data is taken from
Why is the population not usually studied?
the population is the group that we are most interested in
it is usually impossible (or impractical) to study the whole population, so samples are studied and the results are generalised
What 4 questions must be asked when thinking about generalising?
- how reliable is my model?
- would I get the same coefficient if I built my model using different data?
- would I get the same goodness-of-fit if I used the same model on different data?
- how likely am I to make the correct conclusion?
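The flashcards do not name a method for answering these questions. One common option, assumed here, is the bootstrap: resample the data with replacement many times, refit the model each time, and look at how much the coefficient varies. The data, function names and number of resamples below are all illustrative.

```python
import random

# Made-up illustrative values, NOT the flashcard's data.
hypertension = [9.0, 10.5, 11.0, 12.5, 13.0, 14.5, 15.0, 16.5, 17.0, 18.5]
chd = [2.1, 2.6, 2.4, 3.2, 3.0, 3.9, 3.6, 4.4, 4.1, 4.9]

def ols_slope(x, y):
    """Least-squares slope, or None if every x is identical (slope undefined)."""
    if len(set(x)) < 2:
        return None
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def bootstrap_slopes(x, y, n_resamples=1000, seed=0):
    """Refit the slope on resampled data; returns the slopes in sorted order."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(x)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        s = ols_slope([x[i] for i in idx], [y[i] for i in idx])
        if s is not None:
            slopes.append(s)
    return sorted(slopes)

slopes = bootstrap_slopes(hypertension, chd)
# Rough 95% percentile interval for the slope.
lo = slopes[int(0.025 * len(slopes))]
hi = slopes[int(0.975 * len(slopes)) - 1]
print("slope on all data:", round(ols_slope(hypertension, chd), 3))
print("bootstrap interval:", round(lo, 3), "to", round(hi, 3))
```

A narrow interval suggests the coefficient would change little with different data drawn from the same population. It says nothing about other years, countries or patient groups.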