Biostats Flashcards
What is the name given to a descriptive measure computed from the data of a sample?
A statistic
What is the name given to a descriptive measure computed from the data of a population?
A parameter
If a distribution is skewed to the right, what is the relationship between mean, median and mode? Where is the tail of this graph?
Mode < Median < Mean
Tail is on the right
If a distribution is skewed to the left, what is the relationship between the mean, median and mode? Where is the tail of this graph?
Mode > Median > Mean
Tail is on the left
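A small sketch of the two skewness cards, using made-up numbers (the data and variable names are illustrative assumptions, not from the deck):

```python
from statistics import mean, median, mode

# Made-up right-skewed data: a few large values stretch the right tail.
right_skewed = [1, 2, 2, 2, 3, 4, 5, 9, 17]
# Mirroring the values around 18 gives a left-skewed version of the same data.
left_skewed = [18 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

The mean is pulled furthest towards the tail, the mode stays at the peak, and the median sits between them.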
What does a 95% confidence interval imply in theoretical probabilistic terms?
That in repeated sampling from a normally distributed population, 95% of all intervals will (in the long run) include the true population mean
What does a 95% confidence interval imply in practical terms?
We are 95% confident that the interval will include the true population mean
Null hypothesis
Means are equal
Alternative hypothesis
Means are different
Usually synonymous with research hypothesis
What is a Type I error?
When we reject the null hypothesis, although the null is true (“Null = no wolf. Therefore Type I = False alarm”)
What symbol is the probability of a Type I error denoted by?
α (alpha)
What do we call the quantity 1-α?
The level of confidence
What is a Type II error?
When we fail to reject the null hypothesis when the null is false (“Null = no wolf. Therefore Type II = Failing to raise the alarm”)
What symbol is the probability of a Type II error denoted by?
β (beta)
What do we call the quantity 1-β?
Statistical power of the test (probability of rejecting the null hypothesis when it is false)
What is a P value?
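A measure of how much evidence we have against the null hypothesis: the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. The smaller it is, the more evidence we have against the null.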
What does P>0.05 mean?
That we cannot reject the null hypothesis as we have insufficient evidence to do so. This does NOT necessarily mean that the null hypothesis is true, only that there is insufficient evidence to reject it
What is a paired t-test used to do?
It is used to determine whether there is a significant difference between paired observations, for example, before and after an intervention.
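A rough sketch of the paired t statistic, assuming made-up before/after readings for five people (all values and names here are hypothetical). It computes only the test statistic, not the P value:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements on the same five individuals.
before = [140, 152, 138, 147, 160]
after = [135, 150, 132, 140, 158]

# A paired test works on the within-pair differences, not the raw values.
differences = [b - a for b, a in zip(before, after)]
n = len(differences)

mean_diff = mean(differences)
sem_diff = stdev(differences) / sqrt(n)  # standard error of the mean difference
t_statistic = mean_diff / sem_diff

print(f"mean difference = {mean_diff:.2f}, t = {t_statistic:.2f}, df = {n - 1}")
```

The statistic is then compared with the t distribution on n-1 degrees of freedom; for 4 degrees of freedom the two-sided 5% critical value is about 2.776.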
What is the significance of a confidence interval that crosses 0?
If the 95% confidence interval for the difference between two means includes zero, then the hypothesis test WILL give a statistically non-significant result (p>0.05)
What is a variable?
Any aspect of an individual that is measured or recorded
What is a categorical variable?
Qualitative
What is a numerical variable?
Quantitative
What is the relationship between mean, median and mode in a symmetrical distribution?
They are approximately equal/very similar
What percentage of observations in a normal distribution fall within ±1, ±2 and ±3 standard deviations of the mean?
What is another name for these values?
±1 SD: about 68%
±2 SD: about 95%
±3 SD: about 99.7%
These can also be referred to as the probability limits
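These percentages can be checked against the standard normal distribution. A minimal sketch using Python's standard library (variable names are illustrative):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution lying within k standard deviations of the mean.
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in coverage.items():
    print(f"within ±{k} SD: {100 * proportion:.1f}%")
```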
What is a synonym for a sample statistic?
Point estimate
What is the name given to the measurement of the degree of variation between means from repeated samples? How is this calculated?
Standard error of the mean
SEM = SD/sqrt(n)
How is a 95% confidence interval calculated?
mean ± 1.96 × SEM
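A short sketch of the SEM and 95% confidence interval formulas applied to a made-up sample (the data and names are assumptions for illustration). The 1.96 multiplier comes from the normal distribution; with a sample this small a t multiplier would normally be used instead:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of eight measurements.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.5, 4.4, 5.1]
n = len(sample)

sample_mean = mean(sample)
sd = stdev(sample)   # sample standard deviation: spread of the observations
sem = sd / sqrt(n)   # standard error: precision of the estimated mean

lower = sample_mean - 1.96 * sem
upper = sample_mean + 1.96 * sem

print(f"mean = {sample_mean:.3f}, SD = {sd:.3f}, SEM = {sem:.3f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

Because SEM divides by sqrt(n), quadrupling the sample size halves the width of the interval for the same SD.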
What is standard error?
A measurement of the precision of estimates. It shows how good the estimation of the mean is (SEM)
What is standard deviation?
A measurement of the variability of distributions. It shows how widely scattered the measurements are (SD)
When is a result inconclusive?
When a sample size is too small to declare an observed effect to be statistically significant
When is a result imprecise?
If a small sample size results in wide confidence limits
What does a t-test measure?
The difference between two means from independent samples of numerical variables
What are the underlying assumptions of a valid t-test?
- Sampled populations are normally distributed
- Standard deviations are similar
What is the non-parametric equivalent of a t-test and when is this used?
- Wilcoxon rank-sum test (also known as the Mann-Whitney U test)
- It is used when the data are clearly not normally distributed
What does a Chi-squared test measure?
The relationship between two categorical variables, and the differences in proportions
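A minimal sketch of the chi-squared statistic for a 2×2 table, assuming made-up counts (exposed vs non-exposed in rows, diseased vs not diseased in columns). No continuity correction is applied, and the P value is not computed:

```python
# Hypothetical observed counts: rows = exposed / non-exposed,
# columns = diseased / not diseased.
observed = [[20, 30],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / total
        chi_squared += (obs - expected) ** 2 / expected

print(f"chi-squared = {chi_squared:.2f} on 1 degree of freedom")
```

For a 2×2 table there is 1 degree of freedom, and the 5% critical value is about 3.84.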
What is the purpose of Pearson’s correlation?
To measure the degree of association between two numerical variables
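A small sketch of Pearson's correlation coefficient computed from its definition, using made-up paired values (the helper name `pearson_r` is hypothetical):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by their individual variation."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired numerical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

r ranges from -1 (perfect negative linear association) through 0 (no linear association) to +1 (perfect positive linear association).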
Binary variable
Categorical
Observations fall under one of two categories
e.g. exposed vs non-exposed
Nominal
Categorical
Observations fall under more than two categories with no natural order
e.g. classification of disease; marital status
Ordinal
Categorical
Observations fall under more than two categories which can be ordered
e.g. classification according to mild, moderate, severe
Continuous data
Numerical
Data are measurements that can assume any value within a specified range
e.g. height, weight, blood pressure
Discrete data
Numerical
Data are integers/counted numbers of events
e.g. number of births in a week, number of patients in a clinic
Which summary statistics are reported if a data set is symmetrically distributed?
Number of observations
Mean
Standard deviation
Which summary statistics are reported if a data set is skewed?
Number of observations
Median
Range
Interquartile range
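The two reporting rules above can be sketched as one helper. The function name `summarise`, the `skewed` flag and the data are illustrative assumptions; quartiles use the default method of `statistics.quantiles`, and other software may give slightly different values:

```python
from statistics import mean, median, quantiles, stdev

def summarise(data, skewed):
    """Return the summary statistics suggested for symmetric or skewed data."""
    summary = {"n": len(data)}
    if skewed:
        q1, _, q3 = quantiles(data, n=4)  # lower quartile, median, upper quartile
        summary["median"] = median(data)
        summary["range"] = (min(data), max(data))
        summary["iqr"] = (q1, q3)
    else:
        summary["mean"] = mean(data)
        summary["sd"] = stdev(data)
    return summary

# Hypothetical data sets.
symmetric_summary = summarise([4, 5, 5, 6, 6, 6, 7, 7, 8], skewed=False)
skewed_summary = summarise([1, 2, 2, 3, 4, 5, 9, 17, 40], skewed=True)

print(symmetric_summary)
print(skewed_summary)
```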
What is a population?
A group of individuals having certain common characteristics about which statistical inferences can be made
What is a sample?
A subset of individuals selected from a defined population
Benefits of sampling
Less time to collect and analyse data
Greater flexibility in data management and type of information that can be obtained
Reduce cost
Random sampling
Each individual in the population has an equal chance of being included in the sample
Stratified sampling
Stratified sampling is used when the population consists of distinct sub-groups or strata, which differ considerably with respect to the main feature under study
Systematic sampling
A process where individuals are selected systematically throughout the series on the basis of a predetermined sampling fraction
Cluster sampling
Study population is divided into clearly defined groups or clusters (example: street-blocks or areas around informal housing units)
A random sample of clusters is drawn
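The four sampling methods can be compared side by side in a rough sketch. Everything here is an illustrative assumption: the population of 100 numbered individuals, the two strata, the ten street-block clusters, and the function names. Every member of each selected cluster is included, which is one common form of cluster sampling:

```python
import random

def simple_random_sample(population, k, rng):
    # Every individual has an equal chance of selection.
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    # Random starting point, then every step-th individual (sampling fraction 1/step).
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, fraction, rng):
    # Draw the same fraction at random from each stratum.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Draw whole clusters at random, then include everyone in the chosen clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))
strata = {"under_40": population[:60], "40_plus": population[60:]}
clusters = {f"block_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

random_ids = simple_random_sample(population, 10, rng)
systematic_ids = systematic_sample(population, 10, rng)
stratified_ids = stratified_sample(strata, 0.1, rng)
cluster_ids = cluster_sample(clusters, 2, rng)

print(len(random_ids), len(systematic_ids), len(stratified_ids), len(cluster_ids))
```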