Interpreting data Flashcards
What are the main issues with sampling?
- Target population can be difficult to determine and is subjective
- Sample must be selected from the population at random (each member of the population must have the same chance of being included in the sample)
- A reasonably high response rate is needed (study sample must be a high proportion of the selected sample = representative of the population of interest)
What is sample data used to calculate?
- Estimate of population value e.g. mean
- Precision of estimate
- Range of values within which we can be confident the true population value lies (confidence interval)
- P value
= statistical inference - applied to the study sample to provide information about the population
Distribution of sample means:
- Approximately normally distributed if the sample size is large (e.g. >100)
- Mean of the distribution of sample means = population mean
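The two points above can be checked with a small simulation. This is only a sketch: the skewed population (exponential with mean 10), the sample size of 200 and the number of repeats are all assumptions chosen for illustration, not values from these notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately skewed population (exponential with mean 10).
population = [random.expovariate(1 / 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Draw 2,000 random samples of 200 and record each sample mean.
sample_means = [
    statistics.fmean(random.sample(population, 200)) for _ in range(2_000)
]
mean_of_sample_means = statistics.fmean(sample_means)

print(f"population mean:      {population_mean:.2f}")
print(f"mean of sample means: {mean_of_sample_means:.2f}")
```

The two printed values should be very close, even though the population itself is not normally distributed.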
What is accuracy?
How close the sample estimate is to the true population value we are trying to estimate (cannot be assessed without a gold standard/repeated measurement to compare)
= centred on target
n.b. cannot be accurate if measuring the wrong people
What is precision?
Amount of variation in the estimate of the population value (increased variation = decreased precision)
= all close together
What is standard error of the mean?
And how is it calculated?
= measurement of precision of sample mean as estimate of population mean
SE = SD / √sample size
Larger sample size = smaller SE = more precise
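As a rough illustration of the formula, the sketch below computes the SE from a sample. The function name `standard_error` and the height values are hypothetical, made up for this example.

```python
import math
import statistics


def standard_error(values):
    """SE of the mean = sample SD / sqrt(sample size)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical heights in cm, for illustration only.
heights = [148.2, 151.0, 150.3, 152.7, 149.5, 150.9, 151.8, 149.1]

print(f"SD = {statistics.stdev(heights):.2f}")
print(f"SE = {standard_error(heights):.2f}")
```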
What is standard deviation?
Measurement of the amount of variability in the population (how far an individual observation is likely to be from the population mean)
n.b. population SD is usually unknown so we have to use sample SD
What is not affected by the size of the sample?
Standard deviation
Standard error IS affected: as sample size increases, standard error decreases = more precise estimate of the population mean
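A quick simulation can make this contrast visible. It is a sketch under assumptions: the population (normal, mean 150, SD 10) and the three sample sizes are invented for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

results = {}
for n in (100, 400, 1600):
    # Hypothetical population: heights ~ Normal(mean 150, SD 10).
    sample = [random.gauss(150, 10) for _ in range(n)]
    sd = statistics.stdev(sample)
    se = sd / math.sqrt(n)
    results[n] = (sd, se)
    print(f"n={n:5d}  SD={sd:5.2f}  SE={se:.2f}")
```

The SD should stay near 10 at every sample size, while the SE roughly halves each time the sample size is quadrupled.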
What are confidence intervals for means?
For large samples, 95% of individual sample means are within 1.96 X SE of the population mean = there is a 95% chance that the population mean is within 1.96 X SE of an observed sample mean
Or… if several independent random samples are drawn from a population and sample means are calculated for each, the confidence intervals will include the true population mean on 95% of occasions
How is a 95% confidence interval of 150.5 to 150.9 interpreted?
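The repeated-sampling interpretation can be sketched as a simulation. The true mean of 150, SD of 10, sample size of 100 and 1,000 repeats are assumptions for illustration; in a real study the true mean is unknown.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable
TRUE_MEAN = 150  # hypothetical population mean


def ci_95(sample):
    """95% confidence interval for the mean: sample mean +/- 1.96 x SE."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se


repeats = 1_000
covered = 0
for _ in range(repeats):
    sample = [random.gauss(TRUE_MEAN, 10) for _ in range(100)]
    lower, upper = ci_95(sample)
    covered += lower <= TRUE_MEAN <= upper  # True counts as 1

coverage = covered / repeats
print(f"{coverage:.1%} of intervals contain the true mean")
```

The printed proportion should be close to 95%.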
With 95% confidence, the mean height of 11 year olds in the general population is between 150.5 and 150.9 cm
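For illustration, an interval like this one could arise as follows. The summary statistics (mean 150.7 cm, SD 5.1 cm, 2,500 children) are hypothetical values picked to reproduce the interval; they are not given in these notes.

```python
import math


def confidence_interval(mean, sd, n, multiplier=1.96):
    """Return (lower, upper) as mean +/- multiplier x SE, where SE = SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - multiplier * se, mean + multiplier * se


# Hypothetical summary statistics, chosen to reproduce 150.5 to 150.9.
lower, upper = confidence_interval(mean=150.7, sd=5.1, n=2500)
print(f"95% CI: {lower:.1f} to {upper:.1f} cm")
```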
What is the difference between reference ranges and confidence intervals?
RR = clinical practice -> using SD to inform if a patient is clinically normal or not
CI = statistical inference in research -> using SE (indication of how precise an estimate the sample mean is of the population mean)
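The difference is easiest to see side by side. This sketch assumes a 95% reference range of mean ± 1.96 X SD for normally distributed measurements, and uses hypothetical summary statistics.

```python
import math

mean, sd, n = 150.7, 5.1, 2500  # hypothetical sample summary (cm)

se = sd / math.sqrt(n)
reference_range = (mean - 1.96 * sd, mean + 1.96 * sd)  # uses SD: where individuals lie
ci = (mean - 1.96 * se, mean + 1.96 * se)               # uses SE: where the population mean lies

print(f"95% reference range:     {reference_range[0]:.1f} to {reference_range[1]:.1f}")
print(f"95% confidence interval: {ci[0]:.1f} to {ci[1]:.1f}")
```

The reference range is much wider because it describes individual patients, whereas the confidence interval describes the precision of the estimated mean.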
Which of the following statements is true?
- Data which are positively skewed can be used to calculate confidence intervals for mean values
- Standard deviations are used to calculate confidence intervals
- Confidence intervals provide a measure of precision e.g. of a sample mean
Answer: confidence intervals provide a measure of precision e.g. of a sample mean
n.b.
not 1. as data should be normally distributed otherwise the mean could be misleading
not 2. standard errors are used
How are different confidence intervals calculated?
Using different multipliers
95% = 1.96 X SE
99% = 2.58 X SE
90% = 1.64 X SE
these are based on standard normal distribution and obtained from statistical tables
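Instead of statistical tables, the multipliers can be recovered from the standard normal distribution in code. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `multiplier` is made up for the example.

```python
from statistics import NormalDist


def multiplier(confidence_level):
    """Standard normal value that leaves (1 - level) / 2 in each tail."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> {multiplier(level):.2f} X SE")
```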
What is a P value?
Probability of a difference at least as big as that observed if null hypothesis is true (no real difference between exposure groups of the population)
NOT the probability that the null hypothesis is true
What does a smaller p-value show?
Stronger evidence against the null hypothesis (lower chance of getting a difference as big as the one observed if the null hypothesis were true)
How are P values calculated?
Test statistic = summary statistic / SE (summary statistic = e.g. the difference in means between two groups)
Compared with the appropriate statistical distribution to derive the p value (computer package/tables -> compare with standard normal distribution/t distribution)
What type of test is used to get p value for standard normal distributions?
z test
for large samples
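A minimal sketch of a two-sided z test, assuming a large sample so that the standard normal distribution applies. The function name and the example numbers (difference in means 110, SE 30) are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(summary_statistic, standard_error):
    """Two-sided p value from a z test: z = summary statistic / SE."""
    z = summary_statistic / standard_error
    # Probability of a result at least this extreme in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical difference in means and its standard error.
p = two_sided_p_value(summary_statistic=110, standard_error=30)
print(f"p = {p:.5f}")
```

A test statistic of 1.96 gives a p value of about 0.05, which is why the same multiplier appears in 95% confidence intervals.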
What type of test is used to get p values from a t distribution?
t-test, for large or small samples
= comparisons between means
What can we infer from a p value of <0.001 when the difference in means is 110g?
Probability of getting a difference of at least 110g (in either direction) if the null hypothesis is true is less than 0.1%
i.e. very strong evidence against the null hypothesis
Why do we use two-sided p-values?
Because we are interested to see if the absolute difference is >0
= assessment of probability that result due to chance is based on how extreme the size of the departure from the null hypothesis is and not its direction!
Whether or not to reject a null hypothesis should be based on confidence intervals and…?
The strength of evidence against the null hypothesis as assessed by the p value = most reliable way of interpreting findings
What are the 4 common errors made in interpreting data?
- Accepting a hypothesis if p value is large (should interpret it in terms of strength of evidence against the hypothesis)
- Interpreting p<0.05 as statistically significant
- Basing conclusions on p value only
- Ignoring the context (if the sample is split into many subgroups and each is tested separately, we should take that into consideration when comparing p values)
Why can we not ‘accept’ the null hypothesis?
To prove a hypothesis (e.g. that everyone aged >90 is a non-smoker) -> have to find every person >90 and check they are a non-smoker
To disprove the hypothesis -> find just 1 person >90 who smokes
= much easier to find evidence against a hypothesis than to prove it correct (absence of evidence is not evidence of absence)
= a p value of 1 can be obtained even when the null hypothesis is false -> a large p value means we fail to reject the null hypothesis, not that it is true
Why should p<0.05 not simply be labelled ‘statistically significant’?
Preferable to consider the exact p value / strength of evidence
Cutpoint of 0.05 is arbitrary -> commonly used, but 1 in 20 tests give p < 0.05 purely by chance when the null hypothesis is true
p value depends on sample size (harder to get p value <0.05 in small samples)
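The "1 in 20 by chance" point can be sketched with a simulation in which the null hypothesis is true by construction. Everything here is an assumption for illustration: both groups come from the same normal population, each has 100 members, and a large-sample z test is used.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(3)  # fixed seed so the sketch is repeatable


def z_test_p_value(group_a, group_b):
    """Two-sided p value for a difference in means (large-sample z test)."""
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return 2 * (1 - NormalDist().cdf(abs(diff / se)))


trials = 2_000
small_p = 0
for _ in range(trials):
    # Both groups come from the same population, so any difference is chance.
    a = [random.gauss(0, 1) for _ in range(100)]
    b = [random.gauss(0, 1) for _ in range(100)]
    small_p += z_test_p_value(a, b) < 0.05  # True counts as 1

fraction = small_p / trials
print(f"{fraction:.1%} of tests gave p < 0.05 with no real difference")
```

The printed fraction should be close to 5%, i.e. about 1 test in 20.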
How do we interpret p values?
- 0.1 = weak evidence against the null hypothesis
- 0.01 = increasing evidence against the null hypothesis
- 0.001 = strong evidence against the null hypothesis
Why is it important to use both confidence intervals & p values?
Confidence interval = range of values within which we are reasonably confident the population difference lies
P values = strength of evidence against null hypothesis
STATISTICAL SIGNIFICANCE DOES NOT EQUATE TO CLINICAL IMPORTANCE -> even when p value shows strong evidence against the null hypothesis
Why must you look at the context when applying many statistical tests using different subgroups within a sample?
P values should be interpreted with caution (a small p value for one test may be simply due to chance)
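A worked calculation shows how quickly this risk grows. The sketch assumes the tests are independent and that every null hypothesis is true; the function name is hypothetical.

```python
def chance_of_any_small_p(number_of_tests, threshold=0.05):
    """Chance that at least one test gives p < threshold purely by chance.

    Assumes independent tests with every null hypothesis true.
    """
    return 1 - (1 - threshold) ** number_of_tests


for k in (1, 5, 20):
    print(f"{k:2d} tests -> {chance_of_any_small_p(k):.0%}")
```

With 20 subgroup tests, the chance of at least one p < 0.05 is about 64% even when there is no real difference in any subgroup.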