Sampling, data description & probability Flashcards
Population (target population)
The totality of subjects about which we want to make inference
Biostatistics
Application of statistical methods to medical and biological problems
Sample
The subset of the population on which data is actually collected
Sampling error
The difference between the sample result and the true underlying population value. It may arise from bias (sampled subjects are not representative of the population) or from random variation (variation due strictly to chance, even with unbiased selection of subjects)
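A minimal sketch of the two sources of sampling error, using a simulated population. The cholesterol-like values, the sample size of 50, and the `sampling_error` helper are all assumptions made for illustration, not data from a real study.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 simulated cholesterol values (assumed mean 200, SD 40).
population = [rng.gauss(200, 40) for _ in range(10_000)]
true_mean = statistics.mean(population)


def sampling_error(sample):
    """Sample mean minus the true population mean."""
    return statistics.mean(sample) - true_mean


# Unbiased selection: any remaining error is random variation.
random_sample = rng.sample(population, 50)

# Biased selection: only the 50 highest values, so the sample is not representative.
biased_sample = sorted(population)[-50:]

print(f"random sample error: {sampling_error(random_sample):+.1f}")
print(f"biased sample error: {sampling_error(biased_sample):+.1f}")
```

The random sample's error is small and can fall on either side of zero, while the biased sample's error is large and always positive.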
Sample of convenience
Take who you can get. Easy to obtain, but bias may be a problem
Random sample
Every individual in the population has an equal chance of being in the sample. Used to ensure that uncontrolled factors do not bias results. May be difficult to obtain
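One way to draw such a sample in code, assuming a complete list of the population is available (often the hard part in practice). The subject IDs and the `simple_random_sample` helper are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n subjects without replacement; every subject is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population of 100 subject IDs.
population_ids = list(range(1, 101))
sample_ids = simple_random_sample(population_ids, 10, seed=0)
print(sample_ids)
```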
Stratified sampling
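Sample is drawn within each of two or more strata (groups with common characteristics). Used to improve precision when subjects within a stratum are more alike than subjects in different strata, and to guarantee that every stratum is represented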
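A rough sketch of stratified sampling, assuming an equal number of subjects is drawn at random from each stratum (proportional allocation is another common choice). The subject records and the `stratified_sample` helper are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Group subjects into strata, then draw a random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)

    sample = []
    for key in sorted(strata):  # fixed order so results are reproducible with a seed
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


# Hypothetical subjects: (id, sex) pairs, 20 in each stratum.
subjects = [(i, "F" if i % 2 == 0 else "M") for i in range(1, 41)]
picked = stratified_sample(subjects, lambda s: s[1], 5, seed=1)
print(picked)
```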
Data variables
The measurements or observations made on the sampled subjects
Categorical
Values fit into natural categories. Ex. Gender, disease status, vital status
Discrete variable
Ordered numerical data restricted to integer values (count data). Ex. Number of siblings, number of days hospitalized
Continuous variable
Numerical data that can take on any value within a range. Often limited by precision of measuring instrument. Ex. Age, height, weight, cholesterol
Descriptive statistics
Part of statistical methods that deals with organizing and summarizing data
Frequency distribution
A table of categories along with their observed frequencies. Categories may be natural (gender, race) or they may be created from continuous variables by grouping values together (e.g., 21-49 years old)
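A small sketch of building a frequency distribution from a continuous variable by grouping values. The ages, the cut points other than 21-49, and the helper names are assumptions for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}


def age_group(age):
    """Turn a continuous age into a category (cut points are hypothetical)."""
    if age < 21:
        return "under 21"
    if age <= 49:
        return "21-49"
    return "50 and over"


ages = [18, 25, 34, 47, 52, 61, 29, 70]  # hypothetical data
table = frequency_table([age_group(a) for a in ages])

for category, (freq, rel_freq) in table.items():
    print(f"{category:<12} {freq:>2}  {rel_freq:.3f}")
```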
Histogram
Graphical representation of a frequency or relative frequency distribution. Used to determine shape of a distribution of data
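A plain-text sketch of a histogram, assuming equal-width bins that include the left edge and exclude the right. The cholesterol values, bin width, and function names are hypothetical; a plotting library would normally be used instead.

```python
def histogram_counts(values, start, bin_width, n_bins):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // bin_width)
        if 0 <= idx < n_bins:  # values outside the binned range are ignored
            counts[idx] += 1
    return counts


def render(counts, start, bin_width):
    """One text line per bin, with one asterisk per observation."""
    lines = []
    for i, count in enumerate(counts):
        left = start + i * bin_width
        lines.append(f"[{left}, {left + bin_width}) " + "*" * count)
    return lines


# Hypothetical cholesterol measurements.
cholesterol = [150, 162, 175, 181, 188, 193, 199, 204, 210, 215, 222, 238, 251, 290]
counts = histogram_counts(cholesterol, start=140, bin_width=40, n_bins=4)
for line in render(counts, start=140, bin_width=40):
    print(line)
```

The lengths of the bars show the shape of the distribution: here most values fall in the second bin, with a tail toward higher values.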
Mean
Sum of all observations divided by n, the number of subjects
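The definition written out as a short sketch. The `mean` helper and the days-hospitalized values are hypothetical.

```python
def mean(observations):
    """Sum of all observations divided by n, the number of subjects."""
    n = len(observations)
    if n == 0:
        raise ValueError("mean is undefined for an empty sample")
    return sum(observations) / n


days_hospitalized = [2, 3, 3, 5, 7]  # hypothetical data, n = 5
print(mean(days_hospitalized))  # (2 + 3 + 3 + 5 + 7) / 5 = 4.0
```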