Module 1-8 Flashcards
Biostatistics
-Statistics is not merely a compilation of computational techniques
-Statistics
~Is a way of learning from data
~Is concerned with all elements of study design, data collection, and analysis of numerical data
~Does require judgment
-Biostatistics is statistics applied to biological and health problems
Biostatisticians are
-Data Detectives
~Who uncover patterns and clues
~This involves exploratory data analysis (EDA) and descriptive statistics
-Data Judges
~Who judge and confirm clues
~This involves statistical inference
Measurement
-Measurement (defined)
~The assigning of numbers and codes according to prior-set rules
-There are three broad types of measurements
~Nominal
~Ordinal
~Quantitative
Nominal Measurements
-Classify observations into named categories
~No order
~Often two categories (binary, e.g., yes/no); there can be more, but the categories cannot be ordered
-Ex:
~HIV status (positive or negative)
~Sex (Male or Female)
~Hair color (red, brown, black, blonde, gray, etc.)
Ordinal Measurement
-Categories that can be put in rank order
~Typically more than two categories with a natural order (e.g., opinion ratings)
-Ex:
~Stage of cancer classified as Stage I, Stage II, Stage III, Stage IV
~Opinion classified as strongly agree (5), agree (4), neutral (3), disagree (2), strongly disagree (1)
~Age groups 0-4, 5-9, 10-14, etc.
Quantitative Measurements
-True numerical values that can be put on a number line
-Numerical values with equal spacing between numerical values
-Ex:
~Age (years)
~Serum cholesterol (mg/dL)
~T4 cell count (per dL)
Illustrative Example:
-Weight Change and Heart Disease
-This study sought to determine the effect of weight change on coronary heart disease risk
-It followed 115,818 women, 30-55 years of age and free of CHD at the start, for 14 years
-Measurements included the following variables
~Nominal (including Binary)
*CHD onset (yes or no)
*Family history of CHD (yes or no)
~Ordinal
*Non-smoker, light-smoker, moderate smoker, heavy smoker
~Quantitative
*BMI (kg/m^2)
*Age (years)
*Weight presently
*Weight at age 18
Observation, Variable, and Value
-Observation
~The unit upon which measurements are made and can be an individual or aggregate
-Variable
~The generic thing we measure
*Age of person
*HIV status of a person
-Value
~A realized measurement
*“27”
*“positive”
Data Table
-Each row corresponds to an observation
-Each column contains information on a variable
-Each cell in the table contains a value
-Units of observation in these data are individual regions, not individual people
~Table 1.2 in the textbook
Measurement Inaccuracies
-Imprecision
~The inability to get the same result upon repetition
-Bias
~A tendency to overestimate or underestimate the true value of an object
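The difference between imprecision and bias can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function `simulate_measurements`, the true value of 70 kg, and the bias and noise settings are all made up for illustration and do not come from the textbook.

```python
import random
import statistics

def simulate_measurements(true_value, bias, noise_sd, n, seed=0):
    """Simulate n repeated measurements: true value + systematic bias + random noise.

    Hypothetical helper; every parameter value used below is an assumption.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]

true_weight = 70.0  # made-up true value (kg)

# Imprecise but unbiased: readings scatter widely around the true value
imprecise = simulate_measurements(true_weight, bias=0.0, noise_sd=2.0, n=1000)
# Precise but biased: readings agree with each other but sit about 1.5 kg too high
biased = simulate_measurements(true_weight, bias=1.5, noise_sd=0.1, n=1000)

for label, data in [("imprecise", imprecise), ("biased", biased)]:
    mean_error = statistics.mean(data) - true_weight  # average error reflects bias
    spread = statistics.stdev(data)                   # spread reflects imprecision
    print(label, round(mean_error, 2), round(spread, 2))
```

In this sketch, the average error stands in for bias and the standard deviation of repeated readings stands in for imprecision; repeating a biased measurement many times does not remove the bias.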
Types of Studies
-Surveys
~Describe population characteristics
*A study of the prevalence of hypertension in a population
-Comparative Studies
~Determine relationships between variables
*A study to address whether weight gain causes hypertension
2.1 Surveys
-Goal
~To describe population characteristics
-Studies a subset (sample) of the population
~Census vs. Sample
-Uses sample to make inferences about population
-Sampling
~Saves time
~Saves money
~Allows resources to be devoted to greater scope and accuracy
Illustrative Example
-Youth Risk Behavior Surveillance (YRBS)
-YRBS monitors health behaviors in youth and young adults in the US. Six categories of health-risk behaviors are monitored. These include:
~Behavior that contributes to unintentional injuries and violence
~Tobacco use
~Alcohol and drug use
~Sexual behaviors
~Unhealthy dietary behavior
~Physical activity levels and body weight
-Ex:
~Population
*Several million public and private school students in the US in 2003
~Sample
*15,240 questionnaires completed at 158 schools
Types of Samples
-Probability sample
~Simple random sample (SRS)
~Stratified random sample
~Cluster sample
-Non-probability sample
~Convenience sample
Sampling
-Probability samples
~Use chance mechanisms to select individuals
-Most basic type of probability sample is the simple random sample (SRS)
-SRS
~Each population member has the same probability of being selected into the sample
~Selection of any individual into the sample does not influence the likelihood of selecting any other individuals
Simple Random Sampling Method
-Identify each population member with a number 1, 2, …, N
-Use a random number generator to generate n random numbers between 1 and N
~Ex:
*http://www.random.org/integer-sets/
-Keep in mind
~The objective of an SRS is that every possible subset is equally likely!
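The two steps above can be sketched in Python. This is a minimal illustration, not the random.org procedure: the function name `simple_random_sample` and its `seed` parameter are assumptions made for the example, and the standard library's pseudo-random `random.sample` stands in for the random number generator.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Return n distinct IDs drawn at random from 1..N (hypothetical helper)."""
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    rng = random.Random(seed)  # seed is optional, only for reproducibility
    # random.sample draws without replacement, so every subset of
    # size n has the same chance of being the selected sample
    return sorted(rng.sample(range(1, N + 1), n))

# Example: 5 IDs from a population numbered 1..79
ids = simple_random_sample(N=79, n=5)
print(ids)
```

The output changes from run to run unless a seed is passed, which is the expected behavior for a chance mechanism.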
Let’s Select 5 random IDs from our class IDs (1-79)
-Step one
~Generate 1 set with 5 unique random integers
~Each integer should have a value between 1 and 79 (both inclusive; the generator's limits are +/- 1,000,000,000)
~The total number of integers must be no greater than 10,000
Class ID Selection
-15, 31, 40, 48, and 76
-Random Integer Set Generator
~One requested 1 set with 5 unique random integers, taken from the [1,79] range. The integers were sorted in ascending order
*Here is the set
**Set 1: 15, 31, 40, 48, 76
Sampling
-Sampling fraction = n/N
~n is the sample size
~N is the size of the population
-Sampling with replacement
~Tossing selected members back into the mix after they’ve been selected
~Any given unit can appear more than once in the sample
-Sampling without replacement
~Selected units are removed from possible future reselection
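A short Python sketch of these ideas follows. It assumes a population of IDs 1 to 79 and a sample of size 5; the helper name `sampling_fraction` and the fixed seed are illustrative assumptions. `random.sample` draws without replacement, while `random.choices` draws with replacement.

```python
import random

def sampling_fraction(n, N):
    """Sampling fraction n/N: sample size divided by population size."""
    if N <= 0:
        raise ValueError("N must be positive")
    return n / N

population = list(range(1, 80))  # IDs 1..79 (assumed population)
rng = random.Random(0)           # fixed seed is an assumption, for reproducibility

# Without replacement: a selected ID is removed from future selection
without_repl = rng.sample(population, 5)
# With replacement: a selected ID goes back into the mix and may repeat
with_repl = rng.choices(population, k=5)

print(sampling_fraction(5, 79))
print(without_repl)
print(with_repl)
```

With a sampling fraction this small, repeats under sampling with replacement are possible but uncommon; they become more likely as n approaches N.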
Other Types of Probability Samples
-(More Advanced Methods)
-Stratified random sample
~Draw random samples from within strata (subsets) of the population
~Ex:
*The population can be divided into 5-year age groups (0-4, 5-9,…) with simple random samples of varying sizes drawn from each age-strata
-Cluster sample
~Randomly sample clusters comprising varying numbers of observations
~Ex:
*Households (cluster) are selected at random, and ALL individuals are studied within the clusters
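Both designs can be sketched with the standard library. Everything below is illustrative: the helper names `stratified_sample` and `cluster_sample`, the tiny age strata, and the households are made-up assumptions, not data from the course.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample of sizes[name] members from each stratum."""
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick n_clusters clusters at random and keep ALL members of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(0)  # fixed seed is an assumption, for reproducibility

# Made-up age strata holding person IDs; sample sizes vary by stratum
strata = {"0-4": [1, 2, 3, 4], "5-9": [5, 6, 7], "10-14": [8, 9, 10, 11, 12]}
print(stratified_sample(strata, {"0-4": 2, "5-9": 1, "10-14": 3}, rng))

# Made-up households (clusters) with varying numbers of members
households = {"A": [1, 2], "B": [3], "C": [4, 5, 6], "D": [7, 8]}
print(cluster_sample(households, 2, rng))
```

The contrast to notice is where the randomness sits: a stratified sample randomizes within every stratum, while a cluster sample randomizes which clusters are chosen and then takes everyone inside them.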
Cautions when Sampling
-Undercoverage
~Groups in the source population are left out or underrepresented in the population list used to select the sample
-Volunteer bias
~Occurs when self-selected participants are atypical of the source population
-Nonresponse bias
~Occurs when a large percentage of selected individuals refuse to participate or cannot be contacted
2.2 Comparative Studies
-Comparative designs study the relationship between an explanatory variable and a response variable
-Explanatory variable
~Synonyms
*Independent variable, factor, predictor, exposure
~Treatment or exposure that explains or predicts changes in the response variable
-Response variable
~Synonyms
*dependent variable, outcome
~Outcome or response being investigated