Data Analysis Flashcards
what is effect size
the magnitude of the difference in outcome associated with a determinant (exposure)
what are the measures of average
mean - sum of the measurements divided by the number of measurements
median - middle value when the measurements are put in order
mode - the most common value
what are the measures of spread
range - difference between the largest and smallest values
standard deviation - average spread of values around the mean
IQR - spread of values around the median
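A minimal sketch of the measures of average and spread listed above, using Python's built-in statistics module. The sample values are made up for illustration, and statistics.quantiles uses its default quartile method, so other software may give slightly different quartiles.

```python
import statistics

# Hypothetical sample of nine measurements (made up for illustration).
values = [2, 3, 3, 4, 5, 6, 7, 9, 15]

# Measures of average
mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # middle value once sorted
mode = statistics.mode(values)      # most common value

# Measures of spread
value_range = max(values) - min(values)  # largest minus smallest
sd = statistics.stdev(values)            # sample standard deviation (n - 1 denominator)

# The IQR is the distance between the first and third quartiles.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print("mean", mean, "median", median, "mode", mode)
print("range", value_range, "sd", round(sd, 2), "IQR", iqr)
```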
what are the directions of skew
positive skew - long tail to the right
negative skew - long tail to the left
what do you report for normally distributed data
report mean and standard deviation
what do you report for skewed (or discrete) data
report median and interquartile range
what is the relationship between mean and median in normally distributed data
roughly the same
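A small sketch of why the mean and median agree for symmetric data but separate for skewed data. Both samples are hypothetical, chosen only to show the pattern.

```python
import statistics

# Hypothetical data: one roughly symmetric sample, one with a long right tail.
symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 30]

for name, data in [("symmetric", symmetric), ("right skewed", right_skewed)]:
    print(name, "mean:", round(statistics.mean(data), 2),
          "median:", statistics.median(data))

# Gap between mean and median: zero when symmetric, positive when the tail
# is to the right because the extreme value pulls the mean upwards.
sym_gap = statistics.mean(symmetric) - statistics.median(symmetric)
skew_gap = statistics.mean(right_skewed) - statistics.median(right_skewed)
```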
what values are dependent on distribution
mode and range
what are the two ways of displaying data
skewed data - box plot - median, IQR and range
two continuous variables - scatterplot
how do you measure associations between categorical variables
use risk, risk ratio, odds ratio
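As a worked example of these three measures, here is a sketch on a 2x2 table. The counts and the variable names a, b, c, d are assumptions made up for illustration.

```python
# Hypothetical 2x2 table (counts made up for illustration).
#                 outcome   no outcome
# exposed            30         70
# unexposed          10         90
a, b = 30, 70   # exposed: with outcome, without outcome
c, d = 10, 90   # unexposed: with outcome, without outcome

# Risk = proportion of each group that has the outcome.
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)

# Risk ratio = risk in the exposed divided by risk in the unexposed.
risk_ratio = risk_exposed / risk_unexposed

# Odds ratio = odds of the outcome in the exposed divided by odds in the unexposed.
odds_ratio = (a / b) / (c / d)

print(round(risk_exposed, 2), round(risk_unexposed, 2))
print(round(risk_ratio, 2), round(odds_ratio, 2))
```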
what test would you use for a continuous outcome and a categorical exposure
t test or non-parametric equivalent
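A rough sketch of the t statistic behind a two-group t test. This uses the Welch form, which does not assume equal variances; that choice, the function name welch_t and the data are all assumptions for illustration. Turning the statistic into a p value needs the t distribution, which a statistics package would normally supply.

```python
import math
import statistics

# Hypothetical continuous outcome in two exposure groups (values made up).
exposed = [5.1, 5.8, 6.2, 6.0, 5.5, 6.4]
unexposed = [4.2, 4.9, 5.0, 4.6, 5.3, 4.4]

def welch_t(x, y):
    """t statistic: difference in means divided by its standard error."""
    standard_error = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return (statistics.mean(x) - statistics.mean(y)) / standard_error

t_stat = welch_t(exposed, unexposed)
print(round(t_stat, 2))
```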
what measure of association would you use between continuous variables
correlation or linear regression
what is the difference between correlation and linear regression
c - association between two variables
lr - effect of one on the other
what is the definition of correlation and the two types
measure of linear association between two continuous variables (r)
Pearson’s - both variables normally distributed
Spearman’s - either or both variables skewed
-1 = perfect negative linear relationship
0 = no linear relationship
+1 = perfect positive linear relationship
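A minimal sketch of the two correlation coefficients, written from scratch so the arithmetic is visible. The function names and the data are hypothetical. Spearman's is computed here as Pearson's r applied to the ranks of the data.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_r(x, y):
    """Spearman's correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y always rises with x, but along a curve, not a line.
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
print(round(pearson_r(x, y), 3), round(spearman_r(x, y), 3))
```

Because y always increases with x, the ranks match perfectly and Spearman's coefficient is 1, while Pearson's is below 1 because the relationship is not a straight line.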
what are the pros and cons of correlation
simple method of association
order doesn’t matter
calculated between two variables only
assessment of straight line association only
cannot describe an exposure/outcome relationship or make predictions
what is Anscombe’s quartet
4 datasets with completely different relationships that all give almost the same r value - a con of correlation tests, as r alone can’t describe the shape of the relationship
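The quartet can be checked directly. This sketch uses the standard published Anscombe values and a hand-written pearson_r helper (a hypothetical name), and computes r for each of the four datasets.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet: datasets 1 to 3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]
r_values = [pearson_r(x, y) for x, y in quartet]

# All four r values agree to two decimal places despite very different shapes
# (linear, curved, linear with an outlier, and one point driving everything).
print([round(r, 2) for r in r_values])
```

Plotting each dataset as a scatterplot is what reveals the differences that r hides.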
describe the uses of linear regression
models the relationship between two or more variables
can describe an exposure/outcome relationship
what does the r2 value of linear regression mean
the proportion of the variability in the health outcome that is explained by the given exposure variable - a value between 0 and 1, where higher means more is explained
what are the axes of linear regression
outcome on y
exposure on x
in data interpretation what is the difference between correlation and regression
c - tells us whether, and how strongly, both variables change together, on a scale from -1 to 1
lr - quantifies how they change together - a gradient
what would 0.32 mean in regression
means that for every 1 unit increase in exposure there is, on average, a 0.32 unit increase in outcome (eg 0.32% per 1% if both are measured in %)
how do you interpret the r2 value
the proportion of the outcome variance that the model explains
the larger the better between 0 and 1
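A minimal sketch of simple linear regression by least squares, returning the gradient, the intercept and the r2 value. The function name fit_line and the data are hypothetical; the data were chosen so the gradient comes out near 0.3.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus its R-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx                    # change in outcome per 1 unit of exposure
    intercept = mean_y - slope * mean_x  # predicted outcome when exposure is 0

    predicted = [intercept + slope * a for a in x]
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_residual / ss_total  # proportion of outcome variance explained
    return slope, intercept, r_squared

# Hypothetical data: exposure on x, outcome on y.
exposure = [0, 1, 2, 3, 4, 5]
outcome = [1.0, 1.4, 1.5, 2.1, 2.2, 2.7]

slope, intercept, r_squared = fit_line(exposure, outcome)
print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
```

The gradient answers "how much does the outcome change per unit of exposure", while r2 answers "how much of the variation in the outcome does the line account for".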
what are point estimates in data interpretation
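single best-guess values calculated from the sample, eg a regression coefficient or the predicted outcome for one patient - two similar patients can still turn out differently (one is diagnosed with A, the other with B), so a point estimate alone says nothing about uncertainty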
what is the difference between confidence level and interval width
c l - how often intervals built this way contain the true value, eg 95%
i w - the distance between the boundaries within which the true value is estimated to lie at this level
confidence level and interval width increase together, ie for the same data a wider interval lets you be more certain that the true value lies within it
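A sketch of how the interval widens as the confidence level rises. It uses a normal-approximation interval for a mean; the function name mean_ci and the sample are hypothetical, and for small samples a t-based interval would usually be preferred.

```python
import math
import statistics

def mean_ci(data, level):
    """Normal-approximation confidence interval for the mean of data."""
    # z is the multiplier for the chosen level, about 1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    centre = statistics.mean(data)
    margin = z * statistics.stdev(data) / math.sqrt(len(data))
    return centre - margin, centre + margin

# Hypothetical sample of ten measurements.
sample = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0]

widths = {}
for level in (0.90, 0.95, 0.99):
    lower, upper = mean_ci(sample, level)
    widths[level] = upper - lower
    print(level, round(lower, 2), round(upper, 2), "width", round(widths[level], 2))
```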
what is the p value definition
the probability of seeing a coefficient at least as far from zero as yours, assuming the true coefficient is actually zero
describe a small vs large p value
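small - the data would be unlikely if the coefficient were really zero, so the zero assumption is doubtful and an effect is likely
large - the data are consistent with the zero assumption, so there is no good evidence of an effect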
what would a p value below 0.05 mean
statistically significant at the 5% level
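One way to see the p value definition in action is a permutation test, sketched below. This is a swapped-in illustration, not necessarily the method used in the course: it takes the correlation coefficient as the coefficient, shuffles the outcome to mimic "the coefficient is actually zero", and counts how often a shuffled coefficient is at least as far from zero as the observed one. The function names, data, seed and number of permutations are all assumptions.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p value for the correlation between x and y."""
    rng = random.Random(seed)          # fixed seed so the result is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)          # break any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_permutations + 1)

# Hypothetical data: one outcome closely tied to exposure, one unrelated.
exposure = [1, 2, 3, 4, 5, 6, 7, 8]
related = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.3]
unrelated = [5, 1, 4, 8, 2, 7, 3, 6]

p_related = permutation_p_value(exposure, related)
p_unrelated = permutation_p_value(exposure, unrelated)
print(p_related < 0.05, p_unrelated < 0.05)
```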