Statistics Flashcards
Define variable
Aspect that can take different values for different participants
What are the two main types of variable?
Categorical
Quantitative
What are two types of categorical variable?
Nominal - unordered labelled characteristics
Ordinal - small set of ordered/ranked categories
What descriptive statistics would you do for categorical data?
Frequency
Relative Frequency
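A minimal Python sketch of these two summaries. The blood-group labels are made-up illustrative data, not taken from any study.

```python
from collections import Counter

# Hypothetical categorical observations (illustrative only)
blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]

# Frequency: how many participants fall in each category
frequency = Counter(blood_groups)

# Relative frequency: each count as a proportion of the total
total = len(blood_groups)
relative_frequency = {group: count / total for group, count in frequency.items()}

print(frequency)
print(relative_frequency)
```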
What type of graph would you create for categorical data?
Bar chart
What descriptive statistics would you do for quantitative data?
Averages
Variation
Symmetry
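As a rough illustration, the three kinds of summary can be computed with Python's standard `statistics` module. The values are invented, and comparing mean with median is only an informal symmetry check, assumed here for simplicity.

```python
import statistics

# Hypothetical quantitative observations (illustrative only)
values = [2, 3, 3, 4, 5, 6, 19]

mean = statistics.mean(values)      # average, sensitive to extreme values
median = statistics.median(values)  # average, robust to extreme values
sd = statistics.stdev(values)       # variation: sample standard deviation

# Informal symmetry check: a mean well above the median suggests right skew
right_skewed = mean > median

print(mean, median, sd, right_skewed)
```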
What types of graph would you create for quantitative data?
Histogram
Box plot (box and whisker plot)
What is normal distribution?
Mathematically defined theoretical distribution: symmetric and bell-shaped, fully described by its mean and standard deviation
Define descriptive statistics
Describe and summarise data in the sample
e.g. how common are certain characteristics, how are different characteristics related to each other
Define inferential statistics
Using sample data to make inferences about characteristics and relationships in the populations
e.g. standard errors, confidence intervals, p-values
What is standard error?
Indicates how far, on average, the sample estimate is expected to be from the true population parameter value
What is the difference between standard error and standard deviation?
SE summarises precision of an estimate
SD summarises variability of the individual observations in the sample
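A small sketch of the distinction for a sample mean, where the standard error is the standard deviation divided by the square root of the sample size. The blood-pressure readings are hypothetical.

```python
import math
import statistics

# Hypothetical sample of systolic blood pressures (illustrative only)
sample = [118, 122, 125, 130, 135, 128, 121, 133]

n = len(sample)
sd = statistics.stdev(sample)  # spread of the individual observations
se = sd / math.sqrt(n)         # precision of the sample mean as an estimate

print(sd, se)
```

Collecting more data leaves the SD roughly unchanged but shrinks the SE, because the mean is estimated more precisely.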
What is a confidence interval?
Range of values within which we can be reasonably confident (e.g. 95% confident) that the true population value lies
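One common way to build such an interval for a mean is estimate plus or minus a multiplier times the standard error. The sketch below assumes a 95% level and a normal multiplier (about 1.96); for small samples a t-distribution multiplier is usually preferred. The data are hypothetical.

```python
import math
import statistics

# Hypothetical sample (illustrative only)
sample = [118, 122, 125, 130, 135, 128, 121, 133]

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))

# Multiplier for a 95% interval from the standard normal distribution
z = statistics.NormalDist().inv_cdf(0.975)

lower, upper = mean - z * se, mean + z * se
print(lower, upper)
```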
What is a p-value?
Quantifies the extent to which the sample estimate contradicts the null hypothesis
What is a null hypothesis?
The most boring truth imaginable, e.g. no difference between groups or no association between variables
What is an alternative hypothesis?
Opposite of the null hypothesis
Usually two-tailed - contradictions to the null hypothesis in either direction
What is a Type I and Type II error in hypothesis testing?
Type I error - null rejected when it is true; ‘significant’ result due to chance
Type II error - null not rejected when it is false; often because the study is not powerful enough
What is a two sample (unpaired) t-test used for? What are the assumptions?
Tests for mean difference between two independent groups
Assumptions:
Variable is normally distributed
SD is similar
Observations are not paired
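A minimal sketch of the test statistic, assuming the pooled-SD form of the two-sample t-test (which relies on the similar-SD assumption above). The function name `two_sample_t` and the data are hypothetical; the p-value lookup against the t-distribution is left out.

```python
import math
import statistics

def two_sample_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) using a pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled variance: only sensible when the two SDs are similar
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, n_a + n_b - 2

# Hypothetical measurements in two independent groups
treated = [5.1, 4.8, 5.6, 5.0, 5.3]
control = [4.2, 4.5, 4.1, 4.7, 4.4]
t, df = two_sample_t(treated, control)
print(t, df)
```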
What is a paired t-test used for? What are the assumptions?
Used when observations are linked in some way (e.g. before and after)
Analysis based on within-pair differences between groups
Assumption:
Within-pair differences are normally distributed
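The within-pair approach can be sketched as a one-sample t statistic on the differences. The function name `paired_t` and the before/after readings are hypothetical, and the p-value lookup is omitted.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) from within-pair differences."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / se, n - 1

# Hypothetical before/after readings on the same five participants
before = [140, 152, 138, 147, 160]
after = [135, 150, 130, 145, 151]
t, df = paired_t(before, after)
print(t, df)
```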
What is an ANOVA used for? What are the assumptions?
For comparing 3 or more independent groups
Assumptions:
Each group is normally distributed
SD is similar
Observations are independent
What is a repeated measures ANOVA used for? What are the assumptions?
For comparing 3 or more paired groups
Assumptions:
Difference scores between any two groups are normally distributed
SD of difference scores should be the same for all pairs of groups
How do non-parametric tests work?
Analyse rank ordering rather than actual scores
Compare distributions rather than means
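To illustrate what "rank ordering" means, here is a small sketch that converts scores to ranks, giving tied values the average of the ranks they span. The helper name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

scores = [12, 7, 7, 30, 1000]
print(average_ranks(scores))  # [3.0, 1.5, 1.5, 4.0, 5.0]
```

The extreme score of 1000 simply receives the top rank, which is why rank-based tests are less affected by skewed data.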
When do we use non-parametric tests?
When assumptions for parametric tests do not hold
e.g. variable is skewed, SD differs markedly, variable is more ordinal than quantitative
What is a Wilcoxon (rank sum) test used for?
Compares two independent groups
What is a Kruskal-Wallis test used for?
Comparing three or more independent groups
What is a Wilcoxon signed rank test used for?
Compares two paired groups
What is a Friedman test used for?
Compares three or more paired groups
What are the advantages and disadvantages of non-parametric tests?
Advantages:
Always valid for quantitative data
Often provide similar p-values to parametric tests
Disadvantages:
Do not make direct inferences about means
Do not provide CIs
Based on analysis of ranks, not scores
Less powerful than parametric tests when the parametric assumptions hold
How do you calculate proportion?
No. in category of interest/total no. of participants
What does the term risk mean?
The proportion/percentage of people with a specified disease in a population
How do you calculate odds?
No. in category of interest/No. in other category
OR
% in category of interest / (100 - % in category of interest)
How are odds and % related?
The higher the %, the higher the odds
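The proportion and odds formulas above can be sketched as follows. The function names and the counts are hypothetical.

```python
def proportion(n_interest, n_total):
    return n_interest / n_total

def odds_from_counts(n_interest, n_other):
    return n_interest / n_other

def odds_from_percent(percent):
    return percent / (100 - percent)

# Hypothetical example: 20 of 100 participants are in the category of interest
print(proportion(20, 100))       # 0.2
print(odds_from_counts(20, 80))  # 0.25
print(odds_from_percent(20))     # 0.25
```

A percentage of 50 corresponds to odds of 1, and the odds grow without limit as the percentage approaches 100.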
In what type of study are odds particularly useful?
Case-control studies
How would you summarise binary variables in 2 independent groups?
Cross tabulate - exposure variable and outcome variable
When calculating absolute measures, what indicates no difference?
0
How do you calculate risk difference?
% of people affected in one group - % in the other
When calculating relative measures, what indicates no difference?
1
How do you calculate risk ratio/relative risk?
% affected in intervention group / % in the control
How do you calculate odds ratio?
Odds in one group / odds in the other
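A short sketch that computes the three measures from a cross-tabulation of two groups. The function name `compare_groups` and the trial counts are hypothetical.

```python
def compare_groups(events_1, total_1, events_2, total_2):
    """Risk difference, risk ratio and odds ratio for group 1 versus group 2."""
    risk_1 = events_1 / total_1
    risk_2 = events_2 / total_2
    odds_1 = events_1 / (total_1 - events_1)
    odds_2 = events_2 / (total_2 - events_2)
    return {
        "risk_difference_pct": 100 * (risk_1 - risk_2),  # 0 means no difference
        "risk_ratio": risk_1 / risk_2,                    # 1 means no difference
        "odds_ratio": odds_1 / odds_2,                    # 1 means no difference
    }

# Hypothetical trial: 10/100 affected with intervention, 20/100 with control
result = compare_groups(10, 100, 20, 100)
print(result)
```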
What do 0, above 0 and below 0 mean in terms of risk difference?
0 = groups equally likely to have disease
Above 0 = first group more likely to have disease
Below 0 = second group more likely to have disease
How do you calculate absolute risk reduction?
Difference in % points between groups
What is number needed to treat?
Number of people that need to receive intervention before one person benefits from it
How do you calculate number needed to treat?
100 / risk difference (with the risk difference expressed in percentage points)
What does a risk ratio of <1 mean?
Disease occurrence is lower in the intervention group
How do you calculate relative risk reduction?
(1 - risk ratio) x 100
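A worked sketch of absolute risk reduction, number needed to treat and relative risk reduction, using hypothetical risks of 20% in the control group and 10% in the intervention group.

```python
control_risk_pct = 20.0       # hypothetical: 20% affected in control group
intervention_risk_pct = 10.0  # hypothetical: 10% affected in intervention group

absolute_risk_reduction = control_risk_pct - intervention_risk_pct  # percentage points
number_needed_to_treat = 100 / absolute_risk_reduction
risk_ratio = intervention_risk_pct / control_risk_pct
relative_risk_reduction = (1 - risk_ratio) * 100  # percent

print(absolute_risk_reduction, number_needed_to_treat, relative_risk_reduction)
# 10.0 10.0 50.0
```

Here the same trial gives a 50% relative reduction but only a 10 percentage point absolute reduction, so 10 people need treating for one to benefit.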
How do you calculate odds ratio?
odds of disease in exposed group / odds of disease in non-exposed group
What are risk difference and NNT good at quantifying?
Impact of an intervention
What are risk ratio and odds ratio good at quantifying?
Strength of association between intervention and disease status
Which two tests give p-values when comparing binary variables between two groups?
Chi-squared test
Fisher’s exact test
How do you calculate an ‘expected value’?
(row total x column total) / total sample size
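The expected-value formula applied to every cell of a cross-tabulation might look like this. The function name `expected_counts` and the table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts for a cross-tabulation given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

# Hypothetical 2x2 table: rows = exposed / not exposed, columns = disease / no disease
observed = [[10, 30], [20, 40]]
print(expected_counts(observed))  # [[12.0, 28.0], [18.0, 42.0]]
```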
What are the assumptions of the Chi-squared test?
Total sample size is at least 40
OR
If sample is between 20 and 39, the expected value in each cell is at least 5
What are the assumptions of the Fisher’s Exact Test?
Fewer than 20 participants
OR
Between 20 and 39 participants and the expected value in at least one cell is less than 5
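The sample-size rules on these two cards can be written as a small decision helper. The function name `choose_test` is hypothetical, and it encodes only the thresholds listed here.

```python
def choose_test(total_n, smallest_expected):
    """Pick a test using the sample-size rules listed on these cards."""
    if total_n >= 40:
        return "chi-squared"
    if total_n >= 20 and smallest_expected >= 5:
        return "chi-squared"
    # Fewer than 20 participants, or 20-39 with an expected value below 5
    return "Fisher's exact"

print(choose_test(100, 12.0))
print(choose_test(30, 3.0))
```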
What is the definition of correlation?
The association between two variables
How can correlation be summarised graphically and numerically?
Graphically - scatterplots
Numerically - correlation coefficients
What does a correlation coefficient do?
Quantifies the strength of association between two variables
What is the difference between Pearson and Spearman correlation coefficient?
Pearson - linear relationship
Spearman - non-linear associations (monotonic - only positive or only negative)
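A sketch of the contrast, computing Spearman's coefficient as Pearson's coefficient applied to ranks. The helper names are hypothetical, the ranking helper assumes no tied values, and the data are invented to be monotonic but curved.

```python
import math

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def simple_ranks(values):
    """Ranks from 1 upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    # Spearman's coefficient is Pearson's coefficient applied to the ranks
    return pearson(simple_ranks(x), simple_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but curved relationship

print(pearson(x, y), spearman(x, y))
```

Because y always increases with x, Spearman's coefficient is 1, while Pearson's coefficient falls short of 1 because the relationship is not a straight line.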
What does R Squared tell us?
The proportion of the variation in one variable that is explained by another variable
How do we calculate R squared?
Multiply Pearson’s coefficient by itself
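A minimal sketch of that calculation on invented data, computing Pearson's coefficient from sums of squares and then squaring it. The variable names are assumptions made for illustration.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcome values

mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)  # Pearson's correlation coefficient
r_squared = r * r               # proportion of variation explained

print(r, r_squared)
```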
What is linear regression used for?
Estimating the mathematical equation that describes the linear relationship between a quantitative outcome and a quantitative predictor
What is the least squares estimation?
Method of estimating regression coefficients
Derives line of best fit/regression line
Estimate of the true regression line in the population
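For a single predictor, the least squares line has a closed-form solution, sketched below. The function name `least_squares` and the data are hypothetical.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcome values
intercept, slope = least_squares(x, y)

# Predicted outcome for a new predictor value inside the observed range
predicted = intercept + slope * 3.5
print(intercept, slope, predicted)
```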
What are the assumptions of regression?
Outcome is quantitative
Relationship is linear
Residuals are normally distributed
Constant variance
What are the similarities and differences between regression and Pearson’s correlation coefficient?
Pearson’s quantifies strength of association
Regression describes the relationship and can be used to predict outcome variable score
How do we calculate sensitivity?
TP / (TP + FN)
How do we calculate specificity?
TN / (TN + FP)
What is PPV?
Positive predictive value - proportion of those with a +ve result that have the condition
What is NPV?
Negative predictive value - proportion of those with a -ve result that do not have the condition
How do we calculate PPV?
TP / (TP + FP)
How do we calculate NPV?
TN / (TN + FN)
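The four diagnostic accuracy measures can be computed together from the counts of true and false positives and negatives. The function name `diagnostic_accuracy` and the screening counts are hypothetical.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # of those with the condition, share testing +ve
        "specificity": tn / (tn + fp),  # of those without the condition, share testing -ve
        "ppv": tp / (tp + fp),          # of those testing +ve, share with the condition
        "npv": tn / (tn + fn),          # of those testing -ve, share without the condition
    }

# Hypothetical screening results
result = diagnostic_accuracy(tp=90, fp=40, fn=10, tn=160)
print(result)
```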