Sampling, data description & probability Flashcards
Population (target population)
The totality of subjects about which we want to make inferences
Biostatistics
Application of statistical methods to medical and biological problems
Sample
The subset of the population on which data is actually collected
Sampling error
The difference between the sample result and the true underlying population value. Error may be caused by bias (sampled subjects are not representative of the population) or by random variation (variation due strictly to chance, even with unbiased selection of subjects)
Sample of convenience
Take whoever you can get. Easy to obtain, but bias may be a problem
Random sample
Every individual in the population has an equal chance of being in the sample. Used to ensure that uncontrolled factors do not bias results. May be difficult to obtain
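A minimal Python sketch of drawing a simple random sample, assuming the whole population is available as a list (the subject IDs here are hypothetical). `random.sample` selects without replacement, so every subject has the same chance, n/N, of ending up in the sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects; each has the same probability n/N of selection."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

# Hypothetical population of 100 subject IDs
population = [f"subj{i:03d}" for i in range(100)]
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```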
Stratified sampling
A separate sample is drawn within each of two or more strata (groups sharing a common characteristic, e.g. sex or age group). Can improve the precision of estimates when the characteristic being measured differs between strata
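A sketch of stratified sampling in Python, assuming equal allocation (the same hypothetical number of subjects drawn from every stratum); proportional allocation, where each stratum's share matches its share of the population, is another common choice. The `stratum_of` function and the subject records are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum subjects within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[stratum_of(s)].append(s)  # group subjects by their stratum
    sample = {}
    for name, members in strata.items():
        # Random sample within this stratum (capped at the stratum's size)
        sample[name] = rng.sample(members, min(n_per_stratum, len(members)))
    return sample

def sex_of(subject):
    return subject[1]

# Hypothetical subjects as (id, sex) pairs
subjects = [(i, "F" if i % 2 else "M") for i in range(20)]
result = stratified_sample(subjects, stratum_of=sex_of, n_per_stratum=3, seed=0)
print(result)
```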
Data variables
The measurements or observations made on the sampled subjects
Categorical
Values fit into natural categories. Ex. Gender, disease status, vital status
Discrete variable
Ordered numerical data restricted to integer values (count data). Ex. Number of siblings, # of days hospitalized
Continuous variable
Numerical data that can take on any value within a range. Often limited by the precision of the measuring instrument. Ex. Age, height, weight, cholesterol
Descriptive statistics
Part of statistical methods that deals with organizing and summarizing data
Frequency distribution
A table of categories along with their observed frequencies. Categories may be natural (gender, race) or they may be created from continuous variables by grouping values together (21-49 years old)
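A short Python sketch of building a frequency distribution by grouping a continuous variable, assuming ages recorded in whole years and a hypothetical three-group scheme built around the 21-49 category from the card. The ages are made-up illustrative values.

```python
from collections import Counter

def age_group(age):
    """Map an age in whole years to a hypothetical grouping."""
    if age < 21:
        return "<21"
    if age <= 49:
        return "21-49"
    return "50+"

ages = [18, 25, 33, 47, 52, 61, 29, 70, 19, 45]  # made-up illustrative ages
freq = Counter(age_group(a) for a in ages)
for group in ["<21", "21-49", "50+"]:
    print(group, freq[group])
```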
Histogram
Graphical representation of a frequency or relative frequency distribution. Used to assess the shape of a distribution of data
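A plain-text stand-in for a histogram, sketched in Python under the assumption that the data are already grouped into categories: one bar per category, with the relative frequency (count divided by n) shown next to it. The grouped values are hypothetical.

```python
from collections import Counter

def text_histogram(groups, order):
    """Return one bar per category, followed by its relative frequency."""
    counts = Counter(groups)
    n = len(groups)
    lines = []
    for g in order:
        rel = counts[g] / n  # relative frequency of this category
        lines.append(f"{g:>6} | {'#' * counts[g]} ({rel:.0%})")
    return "\n".join(lines)

groups = ["<21", "21-49", "21-49", "50+", "21-49", "<21"]  # hypothetical grouped data
out = text_histogram(groups, ["<21", "21-49", "50+"])
print(out)
```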
Mean
Sum of all observations divided by n, the number of subjects
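The mean written out directly from its definition in Python; the cholesterol values are hypothetical and only illustrate the arithmetic (180 + 195 + 210 + 225 + 240 = 1050, and 1050 / 5 = 210).

```python
def mean(values):
    """Sum of all observations divided by n, the number of observations."""
    return sum(values) / len(values)

cholesterol = [180, 195, 210, 225, 240]  # hypothetical values, mg/dL
print(mean(cholesterol))  # 210.0
```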
Median
The middle observation of ordered data: half of the values are below the median and half are above. When n is even, it is the average of the two middle observations
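A Python sketch of the median that handles both odd and even n by sorting first; the standard library's `statistics.median` follows the same convention of averaging the two middle values.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```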
Mode
The most frequently occurring observation in the sample
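A small Python sketch of the mode. Because a sample can have ties for the most frequent value, this version returns every value that reaches the top count (the standard library offers `statistics.multimode` for the same purpose).

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (data can have more than one mode)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 3, 5, 5, 7]))  # [3, 5]
print(modes([1, 2, 2, 4]))        # [2]
```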
Range
The difference between the highest and lowest observation
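The range as a one-line Python function (named `data_range` so it does not shadow the built-in `range`); the input values are hypothetical.

```python
def data_range(values):
    """Highest observation minus lowest observation."""
    return max(values) - min(values)

print(data_range([62, 70, 55, 81]))  # 26
```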
Standard deviation
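A measure of the spread of observations around the mean, roughly the typical distance of an observation from the mean. Computed as the square root of the sum of squared deviations from the mean divided by n - 1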
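A Python sketch of the sample standard deviation computed step by step, checked against `statistics.stdev`. It assumes the usual n - 1 divisor for a sample; dividing by n instead gives the population standard deviation (`statistics.pstdev`).

```python
import math
import statistics

def sample_sd(values):
    """Sample SD: square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)  # sum of squared deviations from the mean
    return math.sqrt(ss / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))
print(statistics.stdev(data))  # same value, from the standard library
```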
Normal distribution
Widely used symmetric, bell-shaped distribution. Mean, median and mode are equal. About 68% of its area lies within 1 std dev of the mean, 95% within 1.96 std dev of the mean, and 99% within 2.58 std dev of the mean
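The 68/95/99 figures can be checked numerically with the standard normal distribution from Python's `statistics` module (Python 3.8+): the area within k standard deviations of the mean is CDF(k) minus CDF(-k).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 1.96, 2.58):
    print(k, round(area_within(k), 4))
```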
Confidence interval
Also called confidence limits. An interval that describes where the population mean is likely to be with a certain level of confidence (usually 95%). Formula for a 95% interval: mean +/- (1.96)(SE), where SE is the standard error of the mean
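A minimal Python sketch of the 95% interval from the card, using hypothetical height data. It uses the 1.96 multiplier as written, which is a large-sample normal approximation; with small samples a multiplier from the t distribution is commonly used instead.

```python
import math
import statistics

def ci95_mean(values):
    """Approximate 95% CI for the population mean: mean +/- 1.96 * SE."""
    n = len(values)
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return m - 1.96 * se, m + 1.96 * se

heights = [160, 165, 170, 172, 168, 175, 158, 180]  # hypothetical heights, cm
low, high = ci95_mean(heights)
print(f"95% CI: {low:.1f} to {high:.1f} cm")
```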
Standard error of the mean
Computed as SE = s/√n, where s is the sample standard deviation and n is the sample size
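A Python sketch of the standard error formula, assuming the sample standard deviation s is already known. Holding s fixed, it shows that quadrupling n halves the standard error, because the square root of n doubles.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return s / math.sqrt(n)

# With the same SD (s = 10), quadrupling n halves the standard error
print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0
```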
Set
Collection of distinct objects (ex. Sample of patients)
Event
A characteristic defining a subset of our set (ex. Affliction with a disease)
Pr(event)
Probability of the event, estimated as (# experiencing event)/(# in the set)
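The estimate on the card, (# experiencing event)/(# in the set), sketched in Python over a hypothetical set of eight patients in which three have the disease.

```python
def estimate_probability(subjects, has_event):
    """Estimate Pr(event) as (# experiencing event) / (# in the set)."""
    return sum(1 for s in subjects if has_event(s)) / len(subjects)

def is_diseased(flag):
    return flag

# Hypothetical set of 8 patients, True = has the disease
diseased = [True, False, False, True, False, False, False, True]
print(estimate_probability(diseased, is_diseased))  # 0.375
```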
Conditional probability
Let E1 and E2 be two events. The conditional probability of E1 given E2 is Pr(E1|E2) = Pr(E1 and E2)/Pr(E2): the probability that E1 and E2 occur jointly divided by the probability that E2 occurs (defined only when Pr(E2) > 0)
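A Python sketch of the definition, using a hypothetical set of patients recorded as (smoker, diseased) pairs, with E1 = diseased and E2 = smoker. Four of the eight patients smoke and two of those have the disease, so Pr(E1 and E2) = 2/8, Pr(E2) = 4/8, and Pr(E1|E2) = 0.5.

```python
def pr(subjects, event):
    """Proportion of subjects for which event(subject) is true."""
    return sum(1 for s in subjects if event(s)) / len(subjects)

def conditional(subjects, e1, e2):
    """Pr(E1 | E2) = Pr(E1 and E2) / Pr(E2); defined only when Pr(E2) > 0."""
    joint = pr(subjects, lambda s: e1(s) and e2(s))
    return joint / pr(subjects, e2)

def is_smoker(p):
    return p[0]

def is_diseased(p):
    return p[1]

# Hypothetical patients as (smoker, diseased) pairs
patients = [(True, True), (True, False), (True, True), (False, False),
            (False, True), (False, False), (True, False), (False, False)]
print(conditional(patients, is_diseased, is_smoker))  # 0.5
```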
General multiplication rule
Pr(E1 and E2) = Pr(E1|E2) x Pr(E2)
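The multiplication rule is the conditional probability definition rearranged. This Python sketch checks it with exact fractions on hypothetical counts (200 patients, 80 with E2, 20 with both E1 and E2): Pr(E1|E2) = 20/80 = 1/4 and Pr(E2) = 80/200 = 2/5, so their product 1/10 equals the directly counted 20/200.

```python
from fractions import Fraction

# Hypothetical counts from a set of 200 patients
n_total = 200
n_e2 = 80          # patients with E2 (e.g., smokers)
n_e1_and_e2 = 20   # patients with both E1 (e.g., disease) and E2

pr_e2 = Fraction(n_e2, n_total)
pr_e1_given_e2 = Fraction(n_e1_and_e2, n_e2)

# General multiplication rule: Pr(E1 and E2) = Pr(E1|E2) x Pr(E2)
pr_joint = pr_e1_given_e2 * pr_e2
print(pr_joint)  # 1/10
```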