KNPE 251 1/2 Flashcards
5 hierarchical scales
-Sampling Unit
-Sample
-Observation Unit
-Statistical Population
-Population of Interest
Sampling Unit
unit being selected at random
Sample
collection of sampling units randomly selected
Observation Unit
scale at which data are collected
Statistical Population
collection of all sampling units that could have been in sample
Population of interest
collection of sampling units that you hope to draw a conclusion about (Scope of research question)
Measurement variable
what we want to measure about the observation unit
Measurement unit
units of the measurement variable (e.g., cm, years)
What is the measurement unit if the data are categorical?
no measurement unit
what is descriptive statistics used for?
describe the data in the sample
What is inferential statistics used for?
describe the statistical population based on the sample
steps to carry out a study
- Sampling
- Measure
- Calculate descriptive statistics
- Calculate inferential statistics
Goals of ideal sampling designs
-all sampling units must have some probability of being included in the sample (p > 0)
-selection of sampling units is unbiased
-selection of sampling units is independent
-each possible sample has an equal chance of being selected
what is an observational study
based on observations of a statistical population
*researchers do not have any control over the variables
Primary goal of observational study
characterize something about an existing population
Limitations of observational study
cannot make statements about whether a factor CAUSES the response you are interested in
Response Variable
response you are interested in
Explanatory variable
factor you are investigating
Confounding Variable
unobserved variable that affects the response variable (and is associated with the explanatory variable)
Spurious
when the relationship between an explanatory variable and a response variable is thought to be driven by a confounding variable
Simple Random Survey
starts by identifying every sampling unit in the statistical population and then randomly selecting a subset of them as the sample
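A minimal Python sketch of a simple random survey, assuming the complete list of sampling units (the sampling frame) is available. The 100 hypothetical students and the function name `simple_random_sample` are illustrative only. Sampling without replacement with `random.Random.sample` gives every subset of size n the same chance of being selected.

```python
import random

# Hypothetical sampling frame: every sampling unit in the statistical population
population = [f"student_{i}" for i in range(1, 101)]

def simple_random_sample(frame, n, seed=None):
    """Select n sampling units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

sample = simple_random_sample(population, 10, seed=42)
print(sample)
```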
Stratified Survey
-used when there are subgroups within the statistical population that can influence the results
-break the statistical population into strata, then sample within each stratum
**strata must be defined ahead of time by the researcher
**each stratum has equal weighting in the sample
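A sketch of a stratified survey following the card above: the strata (two hypothetical year groups) are fixed before sampling, and the same number of units is drawn at random from each stratum so each has equal weighting. The names `strata` and `stratified_sample` and all unit labels are illustrative assumptions.

```python
import random

# Hypothetical statistical population, split into strata defined ahead of time
strata = {
    "first_year": [f"fy_{i}" for i in range(60)],
    "upper_year": [f"uy_{i}" for i in range(40)],
}

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of equal size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

sample = stratified_sample(strata, 5, seed=1)
for name, units in sample.items():
    print(name, units)
```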
Cluster Survey
-used to remove diversity in the statistical population that is not relevant to research question
-create groups where the non-relevant diversity is contained within each group
-can be done in one-stage or two-stage designs
One stage cluster design
data is collected from ALL observation units in a cluster
Two stage cluster design
a subset of observation units is randomly selected within each cluster
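The one-stage and two-stage cluster designs can be sketched as below, assuming the clusters (hypothetical schools) are themselves chosen at random and each contains observation units (students). Cluster names, sizes, and function names are illustrative only.

```python
import random

# Hypothetical clusters (schools), each containing 20 observation units (students)
clusters = {f"school_{c}": [f"s{c}_{i}" for i in range(20)] for c in range(8)}

def one_stage_cluster(clusters, n_clusters, rng):
    """Randomly pick clusters, then collect data from ALL observation units in each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}

def two_stage_cluster(clusters, n_clusters, n_within, rng):
    """Randomly pick clusters, then randomly pick a subset of observation units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_within) for c in chosen}

rng = random.Random(7)
one = one_stage_cluster(clusters, 3, rng)
two = two_stage_cluster(clusters, 3, 5, rng)
print(one.keys(), two)
```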
Case-Control Survey
-used to compare data between 2 groups
-1st group is the “case” and contains sampling units with a particular response variable
-2nd group is the “control” and contains sampling units without the response variable of the case group
-purposely biased, as it selects sampling units for the case group based on a measured response variable and compares them to the control group
***high chance of spurious relationships
**retrospective
Cohort survey
-follow sampling units over time, looking for development of a particular response variable
-goal is to select a random set of sampling units and observe over time
**outcomes unknown when sampling units selected
**prospective
Retrospective
outcome is already known, looking back in time
*increased risk of spurious relationships
Prospective
outcome is not yet known, looking forward in time
Cross-sectional
studies that measure a response variable at only a single snapshot in time
Longitudinal
studying a response variable at multiple points in time
experimental studies
-treatment only starts once sampling units are assigned to a treatment group
-based on creating treatments where the researcher controls one or more variables
-establish cause-effect among variables
-each manipulated variable is called a factor (each factor has levels)
the 2 steps when sampling units are selected at random in experimental studies
- Selection
- Assignment
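A minimal sketch of the two random steps: selection of sampling units from the statistical population, then random assignment of the selected units to treatment levels. The hypothetical mice, the treatment names, and dealing units out in turn to keep group sizes equal are assumptions made for illustration.

```python
import random

def select_and_assign(population, n, treatments, seed=None):
    rng = random.Random(seed)
    # Step 1 (selection): randomly select n sampling units from the statistical population
    selected = rng.sample(population, n)
    # Step 2 (assignment): randomly reorder the selected units, then deal them to treatments in turn
    rng.shuffle(selected)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(selected):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

population = [f"mouse_{i}" for i in range(50)]
groups = select_and_assign(population, 12, ["control", "low_dose", "high_dose"], seed=3)
print(groups)
```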
Replication
the idea that a treatment is repeated a number of times (on independent sampling units) to see how repeatable a measured outcome is
Pseudoreplication
when observation units are analyzed as if they were independent replicates, rather than analyzing the sampling units
*this is an error in the design of an experimental study
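A hypothetical illustration of the error: tanks are the sampling units (each tank receives one temperature treatment) and fish are the observation units. Pooling every fish inflates the apparent number of replicates; one common remedy, assumed here, is to summarize each sampling unit with a single value (its mean) before analysis. All data are made up.

```python
from statistics import mean

# Hypothetical fish lengths (cm): keys are (treatment, tank); each tank is one sampling unit
tanks = {
    ("warm", "tank_A"): [10.1, 10.4, 9.8],
    ("warm", "tank_B"): [11.0, 10.7, 10.9],
    ("cold", "tank_C"): [9.2, 9.5, 9.0],
    ("cold", "tank_D"): [9.9, 9.6, 9.7],
}

# Pseudoreplication: pooling every fish treats each observation unit as a replicate
pooled = {}
for (treatment, _tank), fish in tanks.items():
    pooled.setdefault(treatment, []).extend(fish)

# Sampling-unit scale: one mean per tank, so n = number of tanks per treatment
per_tank = {}
for (treatment, _tank), fish in tanks.items():
    per_tank.setdefault(treatment, []).append(mean(fish))

print({t: len(v) for t, v in pooled.items()})    # {'warm': 6, 'cold': 6}
print({t: len(v) for t, v in per_tank.items()})  # {'warm': 2, 'cold': 2}
```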
Types of experimental study designs
-control treatment
-blocking
-blinding
-placebo
-sham treatment
Control treatment
contains everything except the actual treatment; reference to compare treatment levels against
Blocking
predefined groups where treatments are applied within each group; you can randomly allocate your sampling units to the treatments within a group, but can't do it across groups
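A sketch of randomization within blocks, assuming a complete block design in which each hypothetical field (block) has exactly one plot per treatment. Treatments are shuffled separately inside each block and never across blocks. Block, plot, and treatment names are illustrative.

```python
import random

# Hypothetical blocks defined before the experiment; each block holds one plot per treatment
blocks = {
    "field_1": ["plot_1a", "plot_1b", "plot_1c"],
    "field_2": ["plot_2a", "plot_2b", "plot_2c"],
}
treatments = ["control", "fertilizer_A", "fertilizer_B"]

def assign_within_blocks(blocks, treatments, seed=None):
    """Randomly allocate treatments to units separately inside each block (never across blocks)."""
    rng = random.Random(seed)
    allocation = {}
    for block, units in blocks.items():
        shuffled = rng.sample(units, len(units))  # random order within this block only
        for unit, treatment in zip(shuffled, treatments):
            allocation[unit] = (block, treatment)
    return allocation

allocation = assign_within_blocks(blocks, treatments, seed=5)
print(allocation)
```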
Blinding
the sampling unit (e.g., participant) does not know which treatment it is receiving
(double blind: researcher does not know either)
Placebo
a substance/treatment given that has no effect on the response variable
Sham Treatment
controls for treatments that require handling the sampling unit
(aims to account for effect of delivery of a treatment that is not of interest)
3 pieces of information a variable contains
-what the variable represents
-the measurement unit
-description of the observation unit
4 subtypes of variables
continuous, discrete, ordinal and nominal
continuous variable
can take on any numerical value within a range (including fractions and decimals)
discrete variable
can only take on whole numbers (integers)
ordinal categorical variable
can take on qualitative values that come from a ranked scale
nominal categorical variables
can take on qualitative values that have no particular order
what is central tendency
describes typical values in a sample
*e.g., the median (2nd quartile)
what is dispersion
describes the spread of values
*e.g., the interquartile range (IQR): range of the middle 50% of the data, from the 1st to the 3rd quartile (Q3 - Q1)
what do central tendency and dispersion depend on
whether the variable is numerical or categorical
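Because the appropriate summaries depend on the variable type, the hypothetical `summarize` helper below reports the median and IQR for numerical data and, as a common choice for categorical data, the mode and the proportion in each category. Quartiles come from `statistics.quantiles`, whose default method is only one of several conventions, so other software may give slightly different Q1 and Q3. The example data are made up.

```python
from collections import Counter
from statistics import median, quantiles

def summarize(values):
    """Choose central tendency and dispersion summaries based on the variable type."""
    if all(isinstance(v, (int, float)) for v in values):
        # Numerical: median (2nd quartile) and IQR (Q3 - Q1)
        q1, _q2, q3 = quantiles(values, n=4)
        return {"type": "numerical", "median": median(values), "Q1": q1, "Q3": q3, "IQR": q3 - q1}
    # Categorical: mode (most frequent category) and proportion of values in each category
    counts = Counter(values)
    mode, _count = counts.most_common(1)[0]
    total = len(values)
    return {"type": "categorical", "mode": mode,
            "proportions": {k: c / total for k, c in counts.items()}}

numeric_summary = summarize([2, 4, 4, 5, 7, 9, 10, 12])
categorical_summary = summarize(["agree", "agree", "neutral", "disagree"])
print(numeric_summary)
print(categorical_summary)
```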