Sampling, Data Description & Probability Flashcards
Sampling error
the difference between a result measured in a sample and the true underlying population value (the value you would get if you measured everyone). This error may result from biased selection (i.e. sampling bias), random variation, or both.
biased selection (i.e. sampling bias)
sampled subjects are not representative of the population. Bias can be minimized through good study design.
random variation
variation due strictly to chance; the sample-to-sample variability that we expect in any study, even with unbiased selection of subjects.
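A minimal sketch of random variation, assuming a made-up population of blood pressure values (the numbers, the seed, and the sample size of 50 are all illustrative assumptions). Each sample is drawn without bias, yet every sample mean still misses the population mean by a different amount.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up systolic blood pressure values.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.gauss(120, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw several unbiased random samples and record each sample's error
# (sample mean minus population mean).
sample_errors = []
for _ in range(5):
    sample = rng.sample(population, 50)
    sample_errors.append(statistics.mean(sample) - population_mean)

for error in sample_errors:
    print(f"sampling error: {error:+.2f}")
```

Because selection here is random, the errors scatter on both sides of zero by chance alone; a biased selection rule would tend to push them in one direction.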
Types of samples
Sample of Convenience, Random sample, Stratified sampling
Sample of Convenience
take who you can get:
- easy to obtain
- bias may be a problem
Random sample
every individual in the population has an equal chance of being in the sample:
- used to ensure that uncontrolled factors do not bias results
- may be difficult to obtain
Stratified sampling
sample is drawn within each of two or more strata (groups that share a common characteristic):
- used to improve accuracy of results in certain circumstances (e.g. when the strata differ from one another on what is being measured)
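The three types of samples can be sketched side by side. This is only an illustration: the population of 100 subjects, the `age_group` field, and the 10% sampling fraction are hypothetical choices, not part of the cards.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of 100 subjects, each tagged with a stratum.
population = [
    {"id": i, "age_group": "under_50" if i < 70 else "50_plus"}
    for i in range(100)
]

# Sample of convenience: take whoever is first in line.
convenience = population[:10]

# Random sample: every subject has the same chance of being selected.
simple_random = rng.sample(population, 10)

# Stratified sample: draw separately within each stratum (10% of each here).
stratified = []
for group in ("under_50", "50_plus"):
    stratum = [p for p in population if p["age_group"] == group]
    stratified.extend(rng.sample(stratum, len(stratum) // 10))
```

In this sketch the convenience sample contains only `under_50` subjects, which shows how bias can creep in, while the stratified sample always contains 7 `under_50` and 3 `50_plus` subjects.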
Categorical variables
values fit into natural categories
Examples: gender, disease status, vital status, type of bone break (hairline, simple, etc.).
discrete variables
ordered numerical data restricted to integer values (count data)
Examples: # of siblings, # of days hospitalized, # of pregnancies
continuous variables
numerical data that can take on any value within a range. Recorded values are often limited by the precision of the measuring instrument (e.g. height to the nearest 1/4 inch)
Examples: age, height, weight, cholesterol, blood pressure
Frequency distribution
a table of categories along with their observed frequencies.
a. categories may be natural (e.g. gender, race, type of fracture) or they may be created from continuous variables by grouping values together (e.g. age < 21 yrs, 21-49, 50+)
b. percentages are often included (relative frequencies)
c. may also include cumulative percentages
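A frequency distribution of this kind might be built as follows. The ages are invented for illustration, and `age_group` is a hypothetical helper that applies the grouping from the card (< 21, 21-49, 50+).

```python
from collections import Counter

# Hypothetical ages for 10 subjects.
ages = [18, 25, 33, 47, 52, 19, 61, 38, 20, 70]

def age_group(age):
    """Map a continuous age onto one of the three created categories."""
    if age < 21:
        return "< 21"
    if age < 50:
        return "21-49"
    return "50+"

counts = Counter(age_group(a) for a in ages)

# Build rows of (category, frequency, percent, cumulative percent).
rows = []
cumulative = 0.0
for category in ("< 21", "21-49", "50+"):
    freq = counts[category]
    percent = 100 * freq / len(ages)
    cumulative += percent
    rows.append((category, freq, percent, cumulative))

for category, freq, percent, cum in rows:
    print(f"{category:<6} {freq:>3} {percent:6.1f}% {cum:6.1f}%")
```

The last cumulative percentage should always come out to 100, which is a quick check that no observation was left out of the table.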
Distribution shapes
Unimodal and symmetric (Mean = Median = Mode), Bimodal, Skewed left (typically Mean < Median), Skewed right (typically Mean > Median)
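A small sketch of the skewed-right case, using invented hospital-stay data in which one very long stay forms the right tail. The relationship between mean and median is a rule of thumb rather than a guarantee, so treat this as one illustrative instance.

```python
import statistics

# Hypothetical right-skewed data: most stays are short, one is very long.
days_hospitalized = [1, 2, 2, 3, 3, 4, 30]

mean_days = statistics.mean(days_hospitalized)      # pulled toward the long tail
median_days = statistics.median(days_hospitalized)  # stays with the bulk of the data

print(f"mean = {mean_days:.2f}, median = {median_days}")
```

Here the single 30-day stay drags the mean well above the median, which is the pattern the card describes for a skewed-right distribution.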
Mean (x̄)
the sum of all observations divided by n, the number of subjects
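As a quick check of the definition, the mean can be computed by hand and compared with Python's standard library. The sibling counts below are made up for illustration.

```python
import statistics

# Hypothetical counts of siblings for n = 5 subjects.
siblings = [0, 1, 1, 2, 4]

n = len(siblings)
mean_by_hand = sum(siblings) / n  # sum of all observations divided by n

# The standard library gives the same value.
mean_from_library = statistics.mean(siblings)
```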
Median
the middle-most observation of ordered data: half the values are below the median and half are above. If the data are ordered from smallest to largest, the median is:
- the observation in the middle of the list if n is odd
- the mean of the two middle observations if n is even
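The odd/even rule above can be sketched as a short function. `median_of` is a hypothetical name chosen for this example, and the sketch assumes a non-empty list of numbers.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # order from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # n is odd: the observation in the middle of the list
        return ordered[middle]
    # n is even: the mean of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2
```

For example, `median_of([5, 1, 3])` picks the middle value of the ordered list `[1, 3, 5]`, while `median_of([4, 1, 3, 2])` averages the two middle values of `[1, 2, 3, 4]`.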
Mode
the most frequently occurring observation(s) in the sample
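Because more than one value can tie for most frequent, a sketch of the mode is easiest to write so that it returns a list. `modes` is a hypothetical helper name, and the fracture types are invented sample data.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often in a non-empty sample."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

fractures = ["simple", "hairline", "simple", "compound"]
print(modes(fractures))        # one mode
print(modes([1, 1, 2, 2, 3]))  # two values tie, so both are modes
```

Unlike the mean and median, this works for categorical data as well as numerical data, since it only counts occurrences.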