1.7 Biostatistics 1 Flashcards
What is a probability sample?
A probability sample is one that is chosen in a way that allows you to determine how likely it is that an individual from the population is included in the sample.
What are the methods for getting probability samples?
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Describe ‘Simple Random Sampling’
Every individual in the population has an equal chance of being part of the sample
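A minimal sketch of simple random sampling, assuming a made-up population of 100 ID numbers and a sample size of 10 (both are illustrative, not from the cards):

```python
import random

# Hypothetical population: ID numbers 1..100 (illustrative only).
population = list(range(1, 101))

# Fixed seed so the draw is repeatable; the seed value is arbitrary.
rng = random.Random(42)

# Random.sample draws without replacement, so every individual has the
# same chance (10 in 100) of ending up in the sample.
sample = rng.sample(population, k=10)
print(sample)
```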
Describe ‘Systematic Sampling’
Sample chosen from a list of all individuals in the population by selecting every k^th individual, where k is a fixed number, e.g. every 3rd person from the list. May inadvertently introduce a pattern into the sample.
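A rough sketch of systematic sampling with k = 3, as in the example above. The population list is made up, and the random starting position is an assumption added here (a common convention, not something the card specifies):

```python
import random

# Hypothetical ordered list of 100 individuals (illustrative only).
population = list(range(1, 101))
k = 3  # take every 3rd individual

# Assumption: start at a random position among the first k entries.
rng = random.Random(0)
start = rng.randrange(k)

# Slice with step k to pick every k-th individual from the start.
sample = population[start::k]
print(sample)
```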
Describe ‘Stratified Sampling’
The population is divided into strata (subgroups) and then a random sample is selected from each of the strata. Usually stratify for age and sex.
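A small sketch of stratified sampling, assuming a hypothetical population stratified by sex only and an equal sample size of 5 per stratum (both are illustrative choices):

```python
import random
from collections import defaultdict

# Hypothetical population of (id, sex) records (illustrative only).
population = [(i, "F" if i % 2 == 0 else "M") for i in range(1, 101)]

# Step 1: divide the population into strata.
strata = defaultdict(list)
for person in population:
    strata[person[1]].append(person)

# Step 2: take a random sample from each stratum.
rng = random.Random(1)
per_stratum = 5  # assumed sample size per stratum
sample = []
for label in sorted(strata):
    sample.extend(rng.sample(strata[label], per_stratum))
print(sample)
```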
Describe ‘Cluster Sampling’
Clusters (groups) are selected from the population, and then a random sample of individuals is taken from each of the selected clusters. May introduce some sort of bias (e.g. socio-economic status when clustering by streets or suburbs).
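A sketch of the two stages of cluster sampling, assuming 10 hypothetical streets of 20 residents each, with 3 streets chosen at random and 5 residents drawn per street (all names and numbers are made up):

```python
import random

# Hypothetical clusters: 10 streets with 20 residents each.
clusters = {
    f"street_{s}": [f"street_{s}_resident_{r}" for r in range(20)]
    for s in range(10)
}

rng = random.Random(2)

# Stage 1: select clusters (here chosen at random, as an assumption).
chosen_streets = rng.sample(sorted(clusters), k=3)

# Stage 2: random sample of individuals within each selected cluster.
sample = []
for street in chosen_streets:
    sample.extend(rng.sample(clusters[street], k=5))
print(chosen_streets)
print(sample)
```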
What is a ‘Non-Probability Sample’?
A non-probability sample is one that is chosen in a way that makes it impossible to determine how likely it is that an individual from the population is included in the sample.
What are the four methods for getting non-probability samples?
Convenience Sampling
Quota Sampling
Purposive Sampling
Snowball Sampling
Describe ‘Convenience Sampling’
Recruiting whoever is readily available, e.g. asking med students to participate in research
Describe ‘Quota Sampling’
Med students recruited until quota for each gender is filled
Describe ‘Purposive Sampling’
Selecting individuals that exemplify the ‘typical med student’
Describe ‘Snowball Sampling’
Select a few ‘typical med students’ who ask their friends to join and so on.
Name the two types of data
Categorical
Numerical
Name the two types of categorical data
Nominal
Ordinal
Name the two types of numerical data
Discrete
Continuous
Describe Nominal Data
The numbers carry no quantitative meaning; they are purely labels for categories. E.g. gender (Male = 1, Female = 2)
Describe Ordinal data
The numbering follows a ranking. E.g. ‘In general, how do you rate your health?’ 1 = poor, 2 = fair, and so on.
What are the three measures of central tendency?
Mean
Median
Mode
Describe the Mean
Sum of values / number of values.
Can be manipulated algebraically, most stable estimate of central tendency. However, is influenced by extreme scores and can’t be used for nominal data
Describe the Median
Middle point of ordered data (50% of scores fall either side). Relatively unaffected by extreme scores (useful for skewed distributions), can be used with ordinal data. However cannot be manipulated algebraically.
Describe the Mode
The value that occurs most frequently in the data set. Can be used with nominal data, is a real score from the data set. However, may not be particularly representative of the data set and cannot be manipulated algebraically.
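A quick illustration of the three measures using Python's standard `statistics` module. The scores are made up and include one extreme value to show how the mean is pulled upward while the median and mode are not:

```python
import statistics

# Hypothetical scores with one extreme value (40).
scores = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(scores)      # sum of values / number of values = 63 / 7
median = statistics.median(scores)  # middle value of the ordered data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)  # 9 4 3
```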
What are the 5 measures of variability?
Range
Interquartile range
Variance
Standard deviation
Standard error
How do you calculate the variance?
Denoted s^2.
Calculate the mean, subtract the mean from each observation, and square each difference. Sum the squared differences and divide by n-1. Not directly interpretable, because variance is in squared units and can fall outside the range of the observed values.
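The calculation steps can be traced by hand on a small made-up data set. This is a sketch for checking the arithmetic, with `statistics.variance` (which also divides by n-1) used as a cross-check:

```python
import statistics

# Hypothetical observations (illustrative only).
observations = [2, 4, 4, 6, 9]
n = len(observations)

# Step 1: calculate the mean (25 / 5 = 5.0).
mean = sum(observations) / n

# Step 2: subtract the mean from each observation and square the difference.
squared_diffs = [(x - mean) ** 2 for x in observations]  # 9, 1, 1, 1, 16

# Step 3: sum the squared differences and divide by n - 1 (28 / 4).
variance = sum(squared_diffs) / (n - 1)

print(variance)                           # 7.0
print(statistics.variance(observations))  # same value from the standard library
```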
How do you calculate standard deviation?
Square root of the variance.
Conveys how widely or tightly the observations are spread around the central tendency (mean). A measure of variability that is on the same scale as the data, and therefore as the mean.
How do you calculate Standard Error?
SE = SD / square root of n. Often confused with standard deviation. Describes the variability we might expect in the means of repeated samples taken from the population. Quantifies variation in sample means.
It assumes the data are one sample out of an infinite number of possible samples from a larger population.
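A short sketch of the formula on a made-up sample of n = 9, where the standard deviation is the sample SD (n-1 denominator) from the `statistics` module:

```python
import math
import statistics

# Hypothetical sample of 9 observations (illustrative only).
sample = [2, 4, 4, 6, 9, 5, 7, 3, 5]
n = len(sample)

sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)         # standard error of the mean: SD / sqrt(n)

print(sd, se)
```

Because n = 9 here, the standard error is one third of the standard deviation; a larger sample shrinks the SE even when the SD stays the same.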
In a normal distribution, what percentage of values lie within one standard deviation either side of the mean?
Approximately 68%
In a normal distribution, what percentage of values lie within two standard deviations either side of the mean?
Approximately 95%
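Both percentages can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+):

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist()

# Proportion of values within 1 and 2 standard deviations of the mean.
within_1_sd = z.cdf(1) - z.cdf(-1)  # about 0.6827
within_2_sd = z.cdf(2) - z.cdf(-2)  # about 0.9545

print(round(within_1_sd, 4), round(within_2_sd, 4))
```

The two-SD figure comes out slightly above 95%; exactly 95% of values lie within about 1.96 standard deviations of the mean.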