Biostatistics Flashcards
Sampling
The process to determine who we are going to study/examine.
Purpose: To find out information without talking to everyone.
Two types of sampling
Nonprobability
Probability
Used most frequently in quantitative research
Systematic technique is used to select respondents – goal is to create a sample as representative of the population as possible
Nonprobability Sampling
Less generalizability; problem with representativeness.
Lower confidence in findings.
Useful when probability sampling can’t be used.
Four common methods…
Purposive, convenience, snowball, quota
Probability Sampling
Used to generalize to the population at large
Works toward representativeness
Used in most large-scale surveys/observational studies
Avoids sampling bias – selecting atypical folks.
Numerous ways to introduce bias into your sample.
Representative
Your sample is like the population
Random selection!
All members have an equal chance of being selected…
EPSEM: Equal Probability of Selection Method
Probability samples are never perfect
More representative than non-probability
Probability theory allows us to estimate accuracy
Element
Individual members of the population
Population
The entire set of elements
Sampling frame
List of all the elements in a population
Parameter
Summary of a given variable in a population
Statistic
Summary of a given variable in a sample
Sampling distribution
The distribution of a statistic (e.g., the mean) across all the possible random samples of a given size that could be selected
Simple Random Sample
Base of sampling
Need a list (sampling frame)
Assign a number
Select by a random number
Random number list
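The steps on this card can be sketched in Python. This is only a minimal illustration: the frame of 100 numbered elements, the function name `simple_random_sample`, and the seed are assumptions made up for the example.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n elements without replacement; each element has an equal chance."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(list(sampling_frame), n)

# Hypothetical sampling frame: elements numbered 1 to 100.
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=1)
print(srs)
```

Here `random.Random.sample` plays the role of the random number list: it picks numbered elements from the frame without repeats.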
Systematic Sampling
Determine number needed
Divide population by sample number desired (we call this our sampling interval, denoted here by ‘k’)
List and number our elements
Randomly select a start point (within the first k elements)
Select every k-th element after that
Caution: avoid periodicity!
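A rough sketch of these steps, assuming a hypothetical frame of 100 numbered elements and a desired sample of 10 (so k = 10). The function name and the choice to start within the first k elements are illustrative assumptions.

```python
import random

def systematic_sample(sampling_frame, n, seed=None):
    """Pick a random start in the first interval, then every k-th element."""
    elements = list(sampling_frame)
    k = len(elements) // n             # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    start = rng.randrange(k)           # random start point, index 0..k-1
    return elements[start::k][:n]      # every k-th element, capped at n

frame = list(range(1, 101))            # hypothetical numbered list
sys_sample = systematic_sample(frame, 10, seed=3)
print(sys_sample)
```

If the list itself has a cycle that lines up with k (periodicity), every selected element lands at the same point in the cycle, which is why the card warns against it.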
Stratified Sampling
Possible modification of previous techniques
Random sample from sub populations
Improves representativeness
Decreases some sampling error
Homogeneous subsets: select a certain number of elements within each subset
Allows oversampling
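A minimal sketch of proportionate stratified sampling, assuming a made-up population of 80 "urban" and 20 "rural" elements and a 10% sampling fraction. All names here are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, fraction, seed=None):
    """Randomly sample the same fraction from each stratum (subpopulation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in elements:
        strata[stratum_of(element)].append(element)
    sample = []
    for name in sorted(strata):        # fixed order keeps the example repeatable
        members = strata[name]
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: (stratum, id) pairs.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
strat = stratified_sample(population, lambda e: e[0], fraction=0.10, seed=7)
print(sum(1 for s, _ in strat if s == "urban"), "urban;",
      sum(1 for s, _ in strat if s == "rural"), "rural")
```

Oversampling would mean giving a small stratum a larger fraction than the others, which this sketch does not do; the results would then need weighting before generalizing.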
Cluster Sampling
More complex methodologically (not conceptually, I hope)
Cluster = Groups of elements
Multi-stage
Basic stages/steps: listing and sampling
Helps with cost and dispersed populations
Increases sampling error potential
Two samples (clusters first, then elements within them): two opportunities for sampling error
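A two-stage version can be sketched as follows. The 12 clinics with 30 patients each are invented for illustration, as are the function and variable names.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick elements in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)       # sample of clusters
    sample = []
    for name in chosen:
        members = clusters[name]                            # list this cluster's elements
        take = min(n_per_cluster, len(members))
        sample.extend(rng.sample(members, take))            # sample within the cluster
    return chosen, sample

# Hypothetical clusters: 12 clinics, 30 patients each.
clinics = {f"clinic_{c}": [f"clinic_{c}_patient_{i}" for i in range(30)]
           for c in range(12)}
chosen, cluster_sample = two_stage_cluster_sample(clinics, 3, 5, seed=11)
print(chosen, len(cluster_sample))
```

Only the chosen clusters need a full element list, which is where the cost savings for dispersed populations come from.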
Comparability (of control & exp groups)
Randomization
Recruited folks (who may have been selected using nonprobability sampling techniques) are randomly placed into control and exp. groups.
Matching
Assign people to groups based on characteristics so groups match.
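Randomization can be sketched as a shuffle followed by a split. The 20 participant labels are hypothetical, and an even split is an assumption of this example rather than a requirement.

```python
import random

def randomize(participants, seed=None):
    """Randomly place recruited participants into control and experimental groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]    # (control, experimental)

recruits = [f"participant_{i}" for i in range(20)]   # hypothetical recruits
control, experimental = randomize(recruits, seed=5)
print(len(control), len(experimental))
```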
Sampling Error
Variation in values of your sample mean compared to the population mean
Because of sampling error, our sample estimates will rarely match the population value exactly
Deviation between sample results and population
Reduce by:
Increase sample size
Increase homogeneity
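A small simulation can illustrate the effect of sample size. Everything here is assumed for the sketch: a made-up population of 10,000 scores (mean about 100, s.d. about 15), sample sizes of 25 and 400, and 500 repeated samples at each size.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of 10,000 scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]

def spread_of_sample_means(n, reps=500):
    """Standard deviation of the sample mean across many random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(25)
spread_large = spread_of_sample_means(400)
print(f"n=25: {spread_small:.2f}   n=400: {spread_large:.2f}")
```

The sample means from the larger samples cluster more tightly around the population mean, which is the sense in which a larger sample reduces sampling error.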
THE NORMAL CURVE
Characteristics (the central limit theorem is why sample means tend to follow this curve):
Theoretical distribution of scores
Perfectly symmetrical
Bell-shaped
Unimodal
Tails extend infinitely in both directions
Mean, median, and mode are equal
NOTE: CENTRAL TENDENCY AND DISPERSION OR VARIABILITY.
Assumption of normality of a given empirical distribution makes it possible to describe this “real-world” distribution based on what we know about the (theoretical) normal curve
We use this assumption to generalize sample findings to a population
.68 of area under the curve (.34 on each side of mean) falls within 1 standard deviation (s) of the mean
In other words, 68% of cases fall within +/- 1 s
About 95% of cases/values fall within +/- 2 s
About 99.7% of cases fall within +/- 3 s
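These areas can be checked with Python's standard library. The sketch assumes a normal curve with mean 0 and standard deviation 1; the helper name `area_within` is made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Share of the area under the normal curve within +/- k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} s: {area_within(k):.4f}")
# prints roughly 0.6827, 0.9545, 0.9973
```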
The z-distribution
Just a special case of the normal dist.
Idealized mean of 0 and s.d. of 1
Allows us to use a corresponding z-table to look up critical values
Common critical z-scores (set by conf. level – see next slide):
1.65 = 90% CL
1.96 SE = 95% CL
2.58 SE = 99% CL
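Instead of a z-table, the critical values can be looked up with the inverse of the standard normal curve. This is a sketch assuming two-sided intervals; `critical_z` is a hypothetical helper name. The printed values (about 1.645, 1.960, 2.576) are the unrounded versions of the table values 1.65, 1.96, and 2.58.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical z-score for a given confidence level (e.g., 0.95)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%} CL: z = {critical_z(cl):.3f}")
```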
Confidence level
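(the complement of the significance level: confidence level = 1 - α)
How often intervals built this way would capture the true population parameter across repeated samples.
We set this ahead of time. The significance level is denoted alpha (α); most frequently, it’s α = .05, which gives a 95% confidence level.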
Confidence interval
Range within which the ‘true’ parameter should lie: a range of values around the estimate (point estimate)
Has an upper and a lower limit, set by the confidence level
Many of the biomedical books use CI = mean +/- 1.96(standard errors), but this assumes a 95% confidence level (that’s where they are getting the z-score of +/-1.96).
Setting up a CI
- You need to set a level of confidence (via alpha). Often, as we said, α = .05, or 95% confidence.
- Calculate the mean (or proportion). On exams, this will likely be provided.
- Calculate the standard error, also usually provided. (If not, they will give a standard deviation. Calculation: SE = s.d./√N)
- Based on the confidence level from the first step, we will know how many standard errors to use (precise values come from a z-scores table).
1.65 = 90% CL
1.96 SE = 95% CL
2.58 SE = 99% CL
Calculate CI = mean score +/- z-score (which is usually 1.96, or rounded to 2) x SE
More clearly: CI = mean +/- 1.96 (SE)
Calculation for SE
SE = s.d./√N
Calculate CI
mean +/- 1.96 * (SE)
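The two calculations combine into a short worked example. The numbers (mean 120, s.d. 15, N = 100) are invented for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, confidence=0.95):
    """CI = mean +/- z * SE, where SE = s.d. / sqrt(N)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    se = sd / sqrt(n)
    margin = z * se
    return mean - margin, mean + margin

# Hypothetical sample: mean 120, s.d. 15, N = 100, so SE = 1.5.
lower, upper = confidence_interval(mean=120, sd=15, n=100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")   # about 117.06 to 122.94
```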
What influences confidence intervals
The width of a confidence interval depends on three things
Confidence level: The confidence level can be raised (e.g., to 99%) or lowered (e.g., to 90%); a higher level gives a wider interval.
N: We have more confidence in larger sample sizes, so as N increases, the interval narrows
Variation: more variation = more error, so a wider interval
For proportions, variation is greatest when % agree is closer to 50%
For means, variation is greater with higher standard deviations
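Each of the three influences can be seen by changing one input at a time. This sketch assumes the z-based interval for a mean, with invented baseline values (s.d. 15, N = 100, 95% confidence); `ci_width` and `proportion_sd` are hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Full width of a z-based interval: 2 * z * s.d. / sqrt(N)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / sqrt(n)

def proportion_sd(p):
    """Standard deviation of a yes/no variable with proportion p agreeing."""
    return sqrt(p * (1 - p))

baseline = ci_width(sd=15, n=100)
higher_confidence = ci_width(sd=15, n=100, confidence=0.99)   # wider
larger_n = ci_width(sd=15, n=400)                             # narrower
more_variation = ci_width(sd=30, n=100)                       # wider
print(f"{baseline:.2f} {higher_confidence:.2f} {larger_n:.2f} {more_variation:.2f}")
print(f"{proportion_sd(0.5):.2f} vs {proportion_sd(0.8):.2f}")  # 50% varies most
```

Quadrupling N only halves the width, because N enters through its square root.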
Hypothesis
A prediction about the relationship between 2 variables that asserts that changes in the measure of an independent variable will correspond to changes in the measure of a dependent variable
Research vs. Null hypotheses
Research hypothesis
H1
Typically predicts relationships or “differences”
Null hypothesis
Ho
Predicts “no relationship” or “no difference”
Can usually be created by inserting “not” into a correctly worded research hypothesis
In science, we test the null hypothesis!
Assuming there really is “no difference” in the population, what is the probability of obtaining our particular sample finding?
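That question can be made concrete with a one-sample z-test, used here only as one possible illustration. The numbers are invented: a null-hypothesis mean of 100, a sample mean of 103, s.d. 15, and N = 100.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sd, n):
    """Probability of a result at least this far from the null mean, if the null is true."""
    se = sd / sqrt(n)
    z = (sample_mean - null_mean) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical finding: SE = 1.5, so z = 2.0.
z, p = two_sided_p_value(sample_mean=103, null_mean=100, sd=15, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")   # p is about 0.0455
```

With α = .05, a p-value this small would lead us to reject the null hypothesis of "no difference".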
DIRECTIONAL VS. NONDIRECTIONAL HYPOTHESES
Non-directional research hypothesis
“There was an effect”
“There is a difference”
Directional research hypothesis
Specifies the direction of the difference (greater or smaller) from the Ho
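One practical consequence, sketched under the assumption of a z-test at α = .05: a directional hypothesis puts all of α in one tail, so its critical value is lower. The observed z of 1.8 and the helper name `reject_null` are made up for the example.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()
critical_two_tailed = std_normal.inv_cdf(1 - ALPHA / 2)   # about 1.96
critical_one_tailed = std_normal.inv_cdf(1 - ALPHA)       # about 1.645

def reject_null(z, directional):
    """Directional here means the research hypothesis predicts 'greater'."""
    if directional:
        return z > critical_one_tailed
    return abs(z) > critical_two_tailed

z_observed = 1.8   # hypothetical test statistic
print(reject_null(z_observed, directional=True))    # True
print(reject_null(z_observed, directional=False))   # False
```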