Data Interpretation Flashcards
What should be included in a good figure caption
The reader should be able to interpret the results from the figure, caption and legend alone, without reading the associated text
Brief description for the treatment conditions
Brief description of results
Statistical tests
Number of data points used to create graphs
What error bars and points represent
What is the difference between descriptive and inferential statistics
Descriptive
- describes and summarises a data set
- calculations are made without uncertainty
Inferential
- inference about a parameter of interest in the population, based on what is observed in a sample
- calculations are estimated with a degree of uncertainty
What are the 2 forms of inferential statistics
-what is the difference between them
Estimation
-estimation of a population parameter of interest from the value observed in the sample
Hypothesis testing
-way to test differences in the parameter of interest between groups and produces a p value
What are the 2 forms of estimation
Point estimate
-single value (best guess) of the parameter in the population
Interval
- defined by 2 numbers between which the population parameter is said to lie
- examples include confidence intervals
What are the types of variable
-what are examples of each type
Categorical (qualitative)
- classified in categories without intrinsic ordering
- Binary (2 categories) - sex
- Nominal (2+ categories) - ethnicity
Ordinal (values ordered from low to high) - age groups
- spacing between values does not have to be consistent
Numeric (quantitative)
Discrete - cell counts
Continuous - height
What operators would be used for these variables
- qualitative
- ordinal
- quantitative
Qualitative - = or ≠
Ordinal - < or >
Quantitative - +, -, x, ÷
How would you work out the average and spread for these variables
- qualitative
- ordinal
- quantitative
Qualitative
- mode
- frequency distribution - frequency tables, graphs (bar charts, pie charts)
Ordinal
- median - 50th percentile
- absolute ranges - range between max and min value
- percentiles - the value below which a certain percent of observations fall
- IQR - the range between the 25th and 75th percentile
Quantitative
- mean
- variance - average squared deviation of each observation from the mean
- SDs - square root of the variance
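Worked example for the measures above, as a minimal sketch using Python's standard statistics module. The height values are made up for illustration. Note that the card's definition of variance (average squared deviation) matches the population variance, which divides by n; the sample variance divides by n - 1.

```python
import statistics

# Hypothetical heights in cm (illustrative values only)
heights = [150, 160, 165, 170, 180]

mean = statistics.mean(heights)
median = statistics.median(heights)

# Population variance divides by n; sample variance divides by n - 1
population_variance = statistics.pvariance(heights)
sample_variance = statistics.variance(heights)
sample_sd = statistics.stdev(heights)  # square root of the sample variance

# Quartiles: the three cut points that split the ordered data into four parts
q1, q2, q3 = statistics.quantiles(heights, n=4, method="inclusive")
iqr = q3 - q1  # range between the 25th and 75th percentile

print(mean, median, population_variance, sample_variance, sample_sd, iqr)
```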
How are the mean and median influenced by extreme values
The mean is affected by extremes
The median is not
When a data set contains extreme values, the median is the better measure of the average
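Quick illustration of the point above, assuming a made-up list of incomes: adding one extreme value moves the mean a long way but barely moves the median.

```python
import statistics

# Hypothetical incomes in thousands (illustrative values only)
incomes = [20, 22, 25, 27, 30]
incomes_with_extreme = incomes + [500]  # add one extreme observation

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(incomes_with_extreme)
median_after = statistics.median(incomes_with_extreme)

print("mean:  ", mean_before, "->", mean_after)      # 24.8 -> 104
print("median:", median_before, "->", median_after)  # 25 -> 26.0
```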
Describe what the mode, median and mean would be in a normal distribution
Bell curve
All 3 would be the same
Describe what % of observations would fall within 1SD, 2SD and 3SD of the mean in a normal distribution
1SD - about 68% of observations would be within 1SD of the mean
2SD - about 95% of observations would be within 2SD of the mean
3SD - about 99.7% (almost all) of observations would be within 3SD of the mean
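These percentages can be checked from the normal distribution itself. A minimal sketch: for a normal distribution, the fraction within k SDs of the mean equals erf(k / sqrt(2)). The function name fraction_within is just an illustrative choice.

```python
import math

def fraction_within(k_sd):
    """Fraction of a normal distribution lying within k_sd SDs of the mean."""
    return math.erf(k_sd / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k}SD: {fraction_within(k):.4f}")
# 1SD: 0.6827
# 2SD: 0.9545
# 3SD: 0.9973
```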
What are the 2 possible skewed distribution curves
-how will this affect the mean and median
Negatively skewed - the longer tail of distribution points in a negative direction
-mean is less than the median
Positively skewed - the longer tail of distribution points in a positive direction
-mean is more than the median
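Small check of the rule above, using made-up data. The second list is the mirror image of the first, so its long tail points the other way.

```python
import statistics

# Hypothetical positively skewed data: long tail towards high values
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image: long tail towards low values (negatively skewed)
left_skewed = [21 - x for x in right_skewed]

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)

print(right_mean, right_median)  # mean is more than the median
print(left_mean, left_median)    # mean is less than the median
```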
What are the dangers associated with categorising continuous variables
But why is this done
Done to improve clinical interpretation of results
However this may lead to
- a loss of information, leading to a loss of statistical power (loss of ability to detect a difference)
- the impact of the choice of cut-offs on results is problematic when the choice is not based on a strong a priori rationale
What is the difference between a population vs sample
-when is the use of a sample ok
Population
-represents whole group we are interested in
Sample
- too time consuming and expensive to contact the whole population
- representative group taken from population
- use this sample to infer information about the whole population
Sample results are appropriate when they are
- valid - sample is representative of population
- accurate - sample size is large enough
How do standard error, standard deviation and confidence interval differ from each other
Standard error
- describes the accuracy of the point estimate
- used to calculate CI
- 95% confidence interval indicates the range of values likely to include the true value in the population
Standard deviation
- measure of spread (variability)
- used for descriptive statistics to calculate intervals showing variability in the data
- for data sets with a normal distribution, about 95% of data points will fall within 2SDs of the mean
Confidence interval
-estimated range of values likely to include the true unknown value in the population
Why do we use 95% confidence intervals
-what does this mean
Good compromise between 90% and 99%
95% CI
-if we repeated the same sampling from the same population 100 times, about 95 of the 100 CIs would be expected to contain the true population parameter
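The repeated-sampling idea can be simulated. A rough sketch, assuming a made-up normal population (mean 50, SD 10), samples of 100, and a fixed random seed; the function names and all parameter values are illustrative. The observed coverage should land close to 0.95 but will not be exactly 0.95.

```python
import math
import random
import statistics

def ci_95(sample):
    """95% CI of the mean using the normal approximation (mean +/- 1.96 x SE)."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se

def coverage(true_mean=50.0, true_sd=10.0, n=100, repeats=1000, seed=1):
    """Fraction of repeated-sample CIs that contain the true population mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = ci_95(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repeats

observed = coverage()
print(observed)
```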
How would we calculate the standard error of the mean
SD of sample/square root of sample size
How would you calculate the 95% CI of the mean
-what are we assuming with this calculation
sample mean ± (1.96 x SE)
We expect 95% of sample means to lie within 1.96 x SE of the population mean
We assume that
- the means of all samples that could be drawn from our population follow a normal distribution
- SE corresponds to the SD of all sample means (this is different from the SD of the original data)
- distribution follows the 3sigma rule
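Worked example of the SE and 95% CI calculation, as a sketch on made-up measurements. It uses the 1.96 multiplier from the card; with a sample this small the normal approximation is rough, so treat the numbers as illustrative only.

```python
import math
import statistics

def mean_ci_95(sample):
    """Return (mean, SE, lower, upper) using mean +/- 1.96 x SE."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # SD of sample / sqrt(sample size)
    margin = 1.96 * se
    return mean, se, mean - margin, mean + margin

# Hypothetical measurements (illustrative values only)
sample = [4, 8, 6, 5, 7, 6, 9, 3, 6]
mean, se, lower, upper = mean_ci_95(sample)
print(f"mean={mean}, SE={se:.3f}, 95% CI=({lower:.2f}, {upper:.2f})")
# mean=6, SE=0.624, 95% CI=(4.78, 7.22)
```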
How does sample size impact on the accuracy of your estimate
SE is inversely proportional to the square root of the sample size
So the larger the sample size, the lower the SE, the narrower the CI and the more accurate the estimate
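To see the square-root relationship, a short sketch with an assumed SD of 10: quadrupling the sample size halves the SE, and so halves the width of the 95% CI.

```python
import math

def standard_error(sd, n):
    """SE of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical SD, held fixed while n changes
for n in (25, 100, 400):
    se = standard_error(sd, n)
    ci_width = 2 * 1.96 * se
    print(f"n={n}: SE={se}, CI width={ci_width:.2f}")
# n=25: SE=2.0, CI width=7.84
# n=100: SE=1.0, CI width=3.92
# n=400: SE=0.5, CI width=1.96
```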
How to calculate SE for binary variables
-when would you use this
SE = square root of [p(1-p)/n]
p = sample proportion, n = sample size
SE of proportion is an approximation and only of use if the sample size is large
-a large sample size allows us to assume that the estimate is from a normal distribution and the SE is well estimated
np and n(1-p) should exceed 5 for the SE to be a good approximation
How would you calculate the 95% CI of the proportion
p ± (1.96 x SE)
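Worked example for the SE and 95% CI of a proportion, as a sketch with made-up counts. The function name is illustrative, and raising an error when np or n(1-p) does not exceed 5 is one possible way to enforce the large-sample condition from the previous card.

```python
import math

def proportion_ci_95(successes, n):
    """Return (p, SE, lower, upper) using p +/- 1.96 x sqrt(p(1-p)/n)."""
    p = successes / n
    # Large-sample check: np and n(1-p) should both exceed 5
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("sample too small for the normal approximation")
    se = math.sqrt(p * (1 - p) / n)
    return p, se, p - 1.96 * se, p + 1.96 * se

# Hypothetical data: 40 of 100 participants have the outcome
p, se, lower, upper = proportion_ci_95(40, 100)
print(f"p={p}, SE={se:.4f}, 95% CI=({lower:.3f}, {upper:.3f})")
# p=0.4, SE=0.0490, 95% CI=(0.304, 0.496)
```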
What is a
- hypothesis
- hypothesis testing
Hypothesis - a statement about the true value of parameters and the relationship in a defined population
Hypothesis testing - procedure, based on the observed values of the parameters in a sample of the population, to determine whether the hypothesis is a reasonable statement
What are the steps involved in hypothesis testing
Define hypothesis
Perform test
Calculate test statistics
-the measure that summarises the difference or relationship that you want to test
Estimate p value
-tells you whether to reject or fail to reject your H0 (H0 is never proven true)
Interpret test results
What is the difference between the
- null hypothesis H0
- alternative hypothesis H1
H0
- assumed to be true
- there is no true difference or relationship between the observed values in the sampled population
H1
-there is a true difference between the observed values in the sampled population
What is the difference between a 2 sided and 1 sided alternative hypothesis
2 sided
- the difference can be in either direction
- default
1 sided
- the difference can be in 1 direction only
- recommended if there is strong supporting evidence that the effect can be in one direction only
For qualitative variables, which statistical test would you use
- unpaired
- paired
Unpaired - Pearson's χ² (chi-squared) test
Paired - McNemar's χ² test
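Both test statistics can be worked by hand for a 2x2 table. A minimal sketch with made-up counts and no continuity correction (an assumption; corrected versions are also common). With 1 degree of freedom the p value can be obtained from erfc, so no extra library is needed. Function names are illustrative.

```python
import math

def p_value_1df(chi2):
    """Upper-tail p value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared for the unpaired 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def mcnemar_chi2(b, c):
    """McNemar chi-squared for paired data; b and c are the discordant pair counts."""
    return (b - c) ** 2 / (b + c)

# Hypothetical unpaired data: outcome yes/no in two independent groups
unpaired = pearson_chi2_2x2(30, 20, 15, 35)
# Hypothetical paired data: 15 pairs changed one way, 5 changed the other way
paired = mcnemar_chi2(15, 5)

print(f"Pearson: chi2={unpaired:.2f}, p={p_value_1df(unpaired):.4f}")
print(f"McNemar: chi2={paired:.2f}, p={p_value_1df(paired):.4f}")
```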
For ordinal variables, which statistical test would you use
- unpaired
- paired
Unpaired - Mann Whitney U
Paired - Sign test
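The sign test is simple enough to compute directly. A sketch of the exact two-sided version, assuming made-up paired scores: ties are dropped, and under H0 a positive and a negative change are equally likely, so the count of the rarer sign follows a binomial distribution with probability 0.5.

```python
import math

def sign_test_p(before, after):
    """Exact two-sided sign test p value for paired observations."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # ties are dropped
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)  # count of the rarer sign
    # Probability of k or fewer of the rarer sign when both signs are equally likely
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical pain scores on an ordinal 1-10 scale, same patients twice
before = [6, 7, 5, 8, 6, 7, 9, 5, 6, 8]
after = [5, 5, 4, 6, 7, 6, 7, 4, 7, 6]

print(sign_test_p(before, after))  # 0.109375 (8 decreases, 2 increases)
```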
For quantitative unpaired tests, which statistical test would you use
- parametric
- non-parametric
Parametric - Student's t test (unpaired)
Non-parametric - Mann Whitney U
For quantitative paired tests, which statistical test would you use
- parametric
- non parametric
Parametric - paired Student's t test
Non-parametric - Wilcoxon signed-rank test
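To show why pairing matters for the parametric tests, here is a sketch of the two t statistics on the same made-up numbers. Only the test statistic is computed; turning it into a p value needs the t distribution, which a statistics package would normally supply. The unpaired version assumes equal variances (pooled). Function names are illustrative.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: mean of the differences divided by its SE."""
    diffs = [a - b for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def unpaired_t(group1, group2):
    """Unpaired Student's t statistic using the pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group2) - statistics.mean(group1)) / se

# Hypothetical measurements (illustrative values only)
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

t_paired = paired_t(before, after)      # treats rows as the same subjects
t_unpaired = unpaired_t(before, after)  # treats the lists as independent groups
print(f"paired t={t_paired:.2f}, unpaired t={t_unpaired:.2f}")
# paired t=9.00, unpaired t=1.62
```

The mean difference is 1.8 in both cases; the paired statistic is much larger because the differences within each pair vary far less than the raw values do.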