Quiz 1 Flashcards
Structured vs. Unstructured data
structured = quantitative; organized by type of attribute/variable, with properties that differ in amount
unstructured = qualitative; cannot be readily categorized, with properties that differ in type
CCHS
Canadian Community Health Survey
Cross-sectional survey covering health surveillance, health care utilization, and health determinants
Goals: support health surveillance programs at all levels, provide a single data source for health researchers, release information in a timely and easily accessible way, and offer a flexible survey instrument with a rapid response option
Every 2 years; self-reported 2 x 24-hr food recalls; largest sample (n = 65,000)
CHMS
Canadian Health Measures Survey
Questionnaire data (household interview) and physical measurements for baseline NCD and exposure to infectious diseases and environmental contaminants, plus a biobank for future research, every 2 years
Exclusions: military, children <12, people living on reserve, and institutionalized people
Self-reported FFQ and biomarkers
n = 5,000, every 2 years
NHANES
National Health and Nutrition Examination Survey
Assesses the health and nutrition status of adults and children
Questionnaire to determine the prevalence of major diseases and risk factors
Physical measurements, 2 x 24-hr recall, and biomarkers; every year
Syntax
Syntax = coding language used to perform data analysis operations
Variable
Variable = factor or attribute which can be assigned 2 or more values
Discrete vs Continuous variables
Discrete variables = no intermediate values exist between adjacent scale values
Continuous variables = intermediate values can exist between adjacent scale values
Numeric vs string variables
Numeric = number based
String = character based (can include numbers)
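A minimal sketch of the numeric vs string distinction, written in Python rather than any particular statistics package. The variable names and values are hypothetical and only illustrate that digits stored as characters do not behave like numbers.

```python
# Hypothetical values, as they might appear in a survey export.
age_numeric = 42          # numeric: arithmetic is meaningful
postal_string = "90210"   # string: digits stored as characters

print(age_numeric + 1)          # 43
print(postal_string + "1")      # string concatenation: 902101
print(postal_string.isdigit())  # True: the characters happen to be digits
print(int(postal_string) + 1)   # 90211, only after explicit conversion
```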
Variable types:
Continuous, categorical, ordinal
Continuous variables
AKA scale
Cannot be string variables, must be numeric
Ratio (true zero, evenly spaced scale) or interval (arbitrary zero, evenly spaced scale)
Categorical
AKA nominal
Can be string or numeric
Ordinal
Categorical with an implicit order
Can be string or numeric
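The categorical vs ordinal distinction can be sketched in Python. The labels, codes, and responses here are invented for illustration: nominal categories can only be counted, while ordinal categories can also be sorted once their order is made explicit.

```python
from collections import Counter

# Nominal (categorical): labels with no order; counting is the main operation.
blood_types = ["A", "O", "B", "O", "AB"]  # hypothetical data
print(Counter(blood_types).most_common(1))  # [('O', 2)]

# Ordinal: categories with an implicit order, made explicit with rank codes.
responses = ["good", "poor", "excellent", "fair", "good"]  # hypothetical data
order = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}  # assumed coding

ranked = sorted(responses, key=order.get)
print(ranked)  # ['poor', 'fair', 'good', 'good', 'excellent']

# The codes preserve order, but the gaps between codes are not meaningful amounts.
codes = [order[r] for r in responses]
print(codes)  # [3, 1, 4, 2, 3]
```

The same ordinal variable is stored here both as strings (the labels) and as numbers (the codes), which matches the point that ordinal variables can be string or numeric.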
Central limit theorem
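central limit theorem = the distribution of sample means from random samples approaches a normal distribution as the number of observations becomes sufficiently large, even when the underlying data are not normally distributed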
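A small simulation can make the theorem concrete. This is only a sketch under stated assumptions: the population is an exponential distribution with mean 1 (chosen because it is clearly skewed), and the sample sizes, repetition count, and seed are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so reruns give the same numbers

def sample_means(n, reps=5000):
    """Return `reps` sample means, each from n draws of a skewed population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

results = {}
for n in (2, 30, 200):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    # The centre stays near the population mean (1.0) while the spread
    # of the sample means shrinks roughly like 1 / sqrt(n).
    print(n, round(results[n][0], 3), round(results[n][1], 3))
```

Plotting a histogram of `means` for each n would show the shape moving from skewed toward a bell curve as n grows.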
Gaussian curve
Standard normal curve
Mean = median = mode = 0
Std dev = 1
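These properties can be checked with Python's standard library. This is a minimal sketch using `statistics.NormalDist`; the areas within 1 and 2 standard deviations are added as a familiar illustration and are not part of the card itself.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal curve

# Mean, median, and mode coincide at 0; the standard deviation is 1.
print(z.mean, z.median, z.mode, z.stdev)

# Share of the area under the curve within 1 and 2 SDs of the mean.
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)
print(round(within_1, 4))  # 0.6827
print(round(within_2, 4))  # 0.9545
```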
Parametric vs nonparametric tests
Parametric tests = assume the data are normally distributed; easier to interpret
Nonparametric tests = do not assume a normal distribution
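One way to see the difference is to compare a statistic built from means and variances with one built only from orderings. The sketch below uses Welch's t statistic and the Mann-Whitney U statistic as stand-ins for the two families. Both choices, and the data, are assumptions made for illustration and are not named in the cards.

```python
import statistics

# Hypothetical measurements; group_b contains one extreme value.
group_a = [4.1, 4.8, 5.0, 5.2, 5.5, 6.0]
group_b = [5.9, 6.3, 6.8, 7.1, 7.4, 30.0]
group_b_tamed = [5.9, 6.3, 6.8, 7.1, 7.4, 8.0]  # same ranks, no extreme value

def welch_t(x, y):
    """Parametric: t statistic computed from group means and variances."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.fmean(x) - statistics.fmean(y)) / se

def mann_whitney_u(x, y):
    """Nonparametric: count of pairs where x exceeds y (ties count as half)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

# The t statistic shifts when the extreme value changes; U does not,
# because U depends only on the ordering of the observations.
print(welch_t(group_a, group_b), welch_t(group_a, group_b_tamed))
print(mann_whitney_u(group_a, group_b), mann_whitney_u(group_a, group_b_tamed))
```

Turning either statistic into a p-value needs a reference distribution, which is left out here to keep the sketch short.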