Data analysis Flashcards
Nominal
data are categorical with no inherent order; categories are independent and can be coded as numbers
example of nominal
diagnosis, e.g. 1 = stable angina, 2 = unstable angina, 3 = acute coronary syndrome (STEMI), 4 = acute coronary syndrome (NSTEMI)
ordinal
data are categorical and have a relative direction (rank order); the gaps between categories are not necessarily equal
example of ordinal
BMI category: 1 = underweight, 2 = normal weight, 3 = overweight, 4 = obese
interval
numeric data with equal intervals between values, so differences are meaningful
no true zero (absence) exists
example of interval
temperature in degrees Celsius or Fahrenheit
(kelvin has a true zero at absolute zero, so it is a ratio scale)
ratio
numeric data with equal intervals between values
true 0 (absence) exists, so ratios are meaningful
e.g. currency
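Why the true zero matters can be sketched with temperature: differences agree on an interval and a ratio scale, but ratios are only meaningful on the ratio scale. The two temperatures and the helper name `celsius_to_kelvin` are assumptions chosen for illustration.

```python
# Celsius is an interval scale: its zero is an arbitrary reference point.
# Kelvin is a ratio scale: 0 K is absolute zero.

def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0  # illustrative values
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Differences agree on both scales (equal intervals).
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratios disagree: 20 C is not "twice as hot" as 10 C.
ratio_c = warm_c / cold_c
ratio_k = warm_k / cold_k

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```

The Celsius ratio comes out as 2.0 while the kelvin ratio is about 1.035, which is why statements such as "twice as much" need a ratio scale.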
what type of variable is colour
nominal - different primary colours
ordinal and interval - shades
ratio - wavelengths
what is a statistic
describes a characteristic of a sample
what is a parameter
describes a characteristic of a population
samples
study a small group (the sample) to infer what would happen in the population
the sample should be representative of the population
intention to treat
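analyse all data from every randomised participant, in the group they were randomised to, regardless of the treatment actually received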
per protocol analysis
analysis based on the treatment actually received/completed
risks introducing bias
quantitative data
interval and ratio
qualitative
nominal and ordinal
axis labels for frequency distribution
y - frequency
x - variable
mean
add all values and divide by number of values
median
sort all values in ascending order and choose the middle value (the mean of the two middle values if n is even)
mode
identify the most frequent value measured
effect of mean = median = mode
normally distributed
symmetrical
effect of mean > median > mode
positively (right) skewed: most values shift to the lower end, with a tail towards higher values
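The relationship between the three averages and the shape of the distribution can be checked with a short sketch. Both datasets are made up for illustration, and `summarise` is a hypothetical helper.

```python
import statistics

# Illustrative data: one symmetric sample, one with a long right tail.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 14]

def summarise(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )

print("symmetric:   ", summarise(symmetric))     # mean 3, median 3, mode 3
print("right skewed:", summarise(right_skewed))  # mean 4, median 2, mode 1
```

In the skewed sample the few large values pull the mean upwards, while the median and mode stay near the bulk of the data.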
averages
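mean, median and mode give different indications of the true average
which one to use depends on the data and how they are interpreted
data that are not normally distributed need different (non-parametric) statistical tests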
normal distribution
bell-shaped curve
also called the Gaussian distribution
mean =
Σx / n
variance =
Σ(x − mean)² / (n − 1)
sd =
s = √variance
SEM =
SEM = s / √n
confidence interval =
mean ± z × SEM
z score for 95% ci
1.96
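The formulae on the cards above can be worked through step by step. This is a minimal sketch on made-up values, using the sample variance (n − 1 denominator) and z = 1.96 as stated on the cards.

```python
import math

values = [4.0, 5.0, 6.0, 7.0, 8.0]  # illustrative sample, not real data

n = len(values)
mean = sum(values) / n                                     # Σx / n
variance = sum((x - mean) ** 2 for x in values) / (n - 1)  # sample variance
sd = math.sqrt(variance)                                   # s = √variance
sem = sd / math.sqrt(n)                                    # SEM = s / √n

z = 1.96                                                   # z score for a 95% CI
ci_low, ci_high = mean - z * sem, mean + z * sem           # mean ± z × SEM

print(f"mean={mean:.3f} variance={variance:.3f} sd={sd:.3f} sem={sem:.3f}")
print(f"95% CI: {ci_low:.3f} to {ci_high:.3f}")
```

For these values the mean is 6.0 and the variance is 2.5. With a sample this small a t critical value would usually replace z; 1.96 is kept here only to match the card.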
excel mean
=average(XX:XX)
excel variance
=var.s(XX:XX)
excel sd
=stdev(XX:XX)
excel SEM
=stdev(XX:XX)/sqrt(count(XX:XX))
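comparison between SEM and sd
SEM = sd / √n, so the SEM is always smaller than the sd when n > 1
state in the figure legend which of the two is shown
remember to report n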
excel formula for t test
=t.test(array1,array2,tails,type) (ttest in older versions)
t test tails
1 - one-tailed: tests for a difference in one direction only
2 - two-tailed: assumes nothing about direction, tests for a difference in both directions
type of t test
paired - same sample measured twice; unpaired - completely independent samples
1 - paired
2 - unpaired, equal variance
3 - unpaired, unequal variance
better to use 3 if unsure
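What type 3 (unpaired, unequal variance) computes can be sketched by hand. This is an illustration of Welch's t statistic using only the Python standard library, not a reproduction of Excel's internals; `welch_t` is a hypothetical helper and both samples are made up. It returns the t statistic and degrees of freedom only, whereas Excel returns the p-value.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an unpaired,
    unequal-variance (Welch) t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean: variance / n.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

control = [5.1, 4.9, 5.0, 5.2, 4.8]  # illustrative values
treated = [5.6, 5.9, 5.4, 6.1, 5.5]  # illustrative values

t_stat, dof = welch_t(control, treated)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Turning t and df into a p-value needs the t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) does this in one call.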
what does the appropriate index of average and variability depend on
distribution
normality
key assumption
informs how data are managed, analysed and reported
when to use mean, median or mode
not the mean when there are outliers
median - when data are skewed: outliers drag the mean away from the central value, whereas the median is less strongly affected
mode - not for continuous data, as it is unlikely that more than one person has exactly the same value
mean - when data are normally distributed
when to use mean, median and mode with data types
nominal - mode
ordinal - median
interval/ratio not skewed - mean
interval/ratio data skewed - median
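The rule of thumb on this card can be written as a small lookup. A sketch only: `recommended_average` is a hypothetical helper, the caller is assumed to decide whether the data are skewed, and `median_low` is used for ordinal codes so the result is always one of the observed categories.

```python
import statistics

def recommended_average(values, scale, skewed=False):
    """Pick the average suggested by the card for a given data type.

    scale: "nominal", "ordinal", "interval" or "ratio".
    skewed: caller's judgement of whether interval/ratio data are skewed.
    """
    if scale == "nominal":
        return "mode", statistics.mode(values)
    if scale == "ordinal":
        return "median", statistics.median_low(values)
    if scale in ("interval", "ratio"):
        if skewed:
            return "median", statistics.median(values)
        return "mean", statistics.mean(values)
    raise ValueError(f"unknown scale: {scale!r}")

print(recommended_average(["A", "B", "A"], "nominal"))
print(recommended_average([1, 2, 2, 3, 4], "ordinal"))
print(recommended_average([1, 2, 3, 4, 100], "ratio", skewed=True))
print(recommended_average([1, 2, 3], "interval"))
```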
when to use different types of variation
SD - shows general variability; descriptive; used to assess overall variation and to estimate percentiles of normally distributed data
SE - variability between samples; technical and inferential; used to assess the precision of an estimate
CI - indicates the likely range of values; used to assess the certainty of an estimate and to compare it with benchmarks
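How these three measures behave as the sample grows can be sketched numerically. The SD of 10 and the sample sizes are assumed values; the point is that the SD describes the spread of the data, while the SEM and the CI width shrink as n increases.

```python
import math

sd = 10.0                    # assumed sample standard deviation
sample_sizes = (4, 25, 100)  # assumed sample sizes

sems = [sd / math.sqrt(n) for n in sample_sizes]  # SEM = sd / √n
half_widths = [1.96 * sem for sem in sems]        # 95% CI is mean ± 1.96 × SEM

for n, sem, hw in zip(sample_sizes, sems, half_widths):
    print(f"n={n:3d}  SD={sd:.1f}  SEM={sem:.2f}  95% CI = mean ± {hw:.2f}")
```

Going from n = 4 to n = 100 leaves the SD at 10 but cuts the SEM from 5 to 1, so the estimate of the mean becomes five times more precise.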