Term Test Flashcards
hierarchical scales
simplify the process of developing a statistical study design
1. sampling unit
2. sample
3. observation unit
4. statistical population
5. population of interest
sampling unit
the unit that is selected at random (can be the same as the observation unit)
sample
collection of sampling units that you randomly selected
observation unit
the scale at which data are collected
- the subject of the study
statistical population
collection of all sampling units that could’ve been in your sample
- is defined by your study design
population of interest
collection of sampling units that you hope to draw a conclusion about
- defined by your research question
- can be the same as the statistical population, but the population of interest is often larger
Ex. of hierarchy for a design (street address)
pop. of interest: all people of voting age in Kingston
statistical population: all addresses in Kingston
sampling unit: street address
sample: 100 random street addresses
observation unit: a person
measurement variable: voting intent
measurement unit: none, because voting intent is categorical
measurement variable
what we want to measure about the observation unit (height, age)
measurement unit
units in which the measurement variable is recorded (cm for height, years for age)
descriptive statistics
characterize data in your sample (quantitative)
- averages, tables & graphs
inferential statistics
uses information from the sample to make a probabilistic statement about the statistical population (qualitative)
- confidence intervals
***takes uncertainty into account
4 steps of the statistical framework
- sampling
- measuring
- calculating descriptive statistics
- calculating inferential statistics
inferential vs descriptive statistics
inferential: use info from data to make a statement about the STATISTICAL POPULATION
descriptive: use info from data to make a statement about OUR SAMPLE
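A minimal Python sketch of the difference, assuming hypothetical height data: the descriptive part only summarizes the 10 measured values, while the inferential part turns them into an approximate 95% confidence interval for the statistical population mean. It uses a normal approximation for simplicity; with a sample this small, a t-based interval would be the more usual choice.

```python
import statistics
from statistics import NormalDist

def describe(sample):
    """Descriptive statistics: summarize the sample itself."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "sd": statistics.stdev(sample),
    }

def mean_confidence_interval(sample, level=0.95):
    """Inferential statistics: approximate confidence interval for the
    statistical population mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for a 95% interval
    return mean - z * se, mean + z * se

# Hypothetical heights (cm) of 10 sampled observation units
heights = [162.0, 170.5, 158.2, 175.1, 168.4, 171.9, 165.3, 180.2, 159.8, 167.0]
print(describe(heights))                  # statement about OUR SAMPLE
print(mean_confidence_interval(heights))  # statement about the STATISTICAL POPULATION
```

The interval is wider than a single number because it takes the uncertainty of sampling into account.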
subgroups
divide the population into groups
sampling design
describe how to sample a statistical population in a fair way
4 goals of an ideal sampling design
- all sampling units are selectable
- selection is unbiased
- selection is independent
- all samples are possible
- all sampling units are selectable
every sampling unit has a nonzero probability of being included
- selection is unbiased
the probability of selecting a sampling unit cannot depend on any attribute of that sampling unit
- selection is independent
selecting one sampling unit must not increase or decrease the probability that any other sampling unit is selected
- all samples are possible
every sample that could be drawn from the statistical population has a chance of being selected
bias
a systematic over- or underestimate of a value by the average sample compared with its true value in the statistical population
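A small simulation sketch of bias, assuming a hypothetical population of the values 1 to 100. Averaging the sample mean over many repeated samples approximates the "average sample": a simple random draw lands near the true mean of 50.5, while a draw whose selection probability grows with each unit's own value (breaking "selection is unbiased") overestimates it, averaging about 67 in theory.

```python
import random
import statistics

def simple_random_draw(population, k, rng):
    # Unbiased: every unit has the same chance of selection
    return rng.sample(population, k)

def value_weighted_draw(population, k, rng):
    # Biased: larger units are more likely to be picked, so selection
    # depends on an attribute of the sampling unit
    return rng.choices(population, weights=population, k=k)

def average_of_sample_means(population, draw, sample_size=20, repeats=2000, seed=1):
    """Mean of the sample means across many repeated samples (the 'average sample')."""
    rng = random.Random(seed)
    return statistics.mean(
        statistics.mean(draw(population, sample_size, rng)) for _ in range(repeats)
    )

population = list(range(1, 101))            # hypothetical population values 1..100
true_mean = statistics.mean(population)     # 50.5
print(average_of_sample_means(population, simple_random_draw))   # close to 50.5
print(average_of_sample_means(population, value_weighted_draw))  # around 67, well above 50.5
```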
observational studies
based on observations of a statistical population where researchers have no control over the variables that affect their conclusions
- ex. can't control confounding variables, so relationships can't be shown to be causal
goal of observational studies
characterize something about an existing statistical population that allows us to investigate relationships among variables
limitations of observational studies
cannot make statements about whether a factor causes the response you’re interested in
response variable
response you are interested in
- ex. lung cancer
explanatory variable
factor you investigate
- ex. tobacco
confounding variables
unobserved variables that affect the response variable and are associated with the explanatory variable
spurious relationship
an apparent relationship between explanatory and response variables that is actually driven by a confounding variable
simple random survey
sampling units are selected at random from the statistical population where each sampling unit has the same probability of being in your sample
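A minimal sketch of a simple random survey, assuming a hypothetical list of 5,000 Kingston addresses as the statistical population. `random.Random.sample` draws without replacement, so every address has the same probability (100 / 5,000) of ending up in the sample; the function name `simple_random_sample` is illustrative only.

```python
import random

def simple_random_sample(statistical_population, n, seed=None):
    """Select n sampling units without replacement; each unit has the same
    probability (n / N) of ending up in the sample."""
    rng = random.Random(seed)
    return rng.sample(statistical_population, n)

# Hypothetical statistical population: 5,000 street addresses in Kingston
addresses = [f"address_{i}" for i in range(5000)]
sample = simple_random_sample(addresses, 100, seed=42)
print(sample[:5])
```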
stratified survey
researcher creates strata, then takes a sample within each stratum
strata
name given to a subgroup within the statistical population in a stratified survey
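A sketch of a stratified survey under stated assumptions: the units, the three hypothetical neighbourhood strata, and the `stratum_of` function are made up for illustration. Units are first grouped by stratum, then a simple random sample of a fixed size is taken inside each stratum.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Stratified survey: group units into strata, then take a simple random
    sample of n_per_stratum units within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical units: 300 addresses, each tagged with one of three neighbourhoods
units = [(i, ("north", "central", "south")[i % 3]) for i in range(300)]
sample = stratified_sample(units, stratum_of=lambda u: u[1], n_per_stratum=10, seed=7)
for name, members in sample.items():
    print(name, len(members))
```

Sampling within each stratum guarantees every subgroup is represented, which a single simple random sample does not.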
cluster survey
used to remove diversity in the statistical population that's not relevant to the research question
- cluster = sampling unit
- units nested inside the cluster = observation units
one-stage clusters
data are collected from all observation units in a cluster
two-stage clusters
a subset of observation units is randomly selected within each cluster
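A sketch contrasting one-stage and two-stage clusters, assuming hypothetical households (clusters, the sampling units) that each contain four residents (observation units). Clusters are always chosen at random; the only difference is whether every resident of a chosen household is kept or a random subset is drawn.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Cluster survey: the cluster is the sampling unit.
    per_cluster=None -> one-stage: keep every observation unit in each chosen cluster.
    per_cluster=k    -> two-stage: randomly pick k observation units inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    if per_cluster is None:
        return {c: list(clusters[c]) for c in chosen}
    return {c: rng.sample(clusters[c], per_cluster) for c in chosen}

# Hypothetical clusters: 20 households, each with 4 residents
households = {f"house_{h}": [f"house_{h}_person_{p}" for p in range(4)] for h in range(20)}
one_stage = cluster_sample(households, 5, seed=3)
two_stage = cluster_sample(households, 5, per_cluster=2, seed=3)
print(one_stage)
print(two_stage)
```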
case-control survey
used to compare data between two groups
2 groups:
- case
- control
***strong risk of spurious relationship
case group (first group)
contains sampling units WITH a particular outcome of the response variable
control group (second group)
contains sampling units WITHOUT the outcome that defines the case group
cohort survey
sampling units are selected and followed over time
- use a simple random survey and then observe their fate over time
retrospective studies
where the outcome is already known (increases the risk of spurious relationships)
ex. case-control studies
prospective studies
where the outcome is not yet known (requires more effort, but decreases the risk of spurious relationships)
ex. cohort studies
cross-sectional studies
study a response variable at only a single snapshot in time
longitudinal studies
study a response variable at multiple points in time
experimental studies
based on creating treatments where the researcher controls one or more variables
goals of experimental studies
study the effect of one or more manipulated variables on one or more response variables
- establishes cause and effect
factor
a manipulated variable; each factor has two or more levels/groups
replicates
number of times treatment is repeated on randomly selected units
- number of replicates is the number of sampling units in an experimental study
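A minimal sketch of assigning treatments so that each one gets the same number of replicates, assuming 12 hypothetical pots as sampling units and two treatments. Shuffling before dealing the units out makes the assignment random (a completely randomized design); the helper `assign_treatments` is illustrative, not a standard function.

```python
import random

def assign_treatments(sampling_units, treatments, seed=None):
    """Completely randomized design: shuffle the sampling units and deal them out
    so each treatment gets the same number of replicates."""
    if len(sampling_units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    units = list(sampling_units)
    rng.shuffle(units)
    reps = len(units) // len(treatments)
    return {t: units[i * reps:(i + 1) * reps] for i, t in enumerate(treatments)}

plants = [f"pot_{i}" for i in range(12)]  # hypothetical sampling units
design = assign_treatments(plants, ["control", "fertilizer"], seed=11)
print(design)  # each treatment has 6 replicates
```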
pseudoreplication
an error in the design of an experimental study where the observation units are analyzed rather than the sampling units
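A sketch of one common way to avoid pseudoreplication, assuming hypothetical data where treatments were applied to tanks (sampling units) and three fish per tank were measured (observation units). Collapsing to one mean per tank leaves 4 data points (2 replicates per treatment), not 12.

```python
from collections import defaultdict
import statistics

def to_sampling_unit_means(observations):
    """Collapse observation-unit measurements to one value per sampling unit.
    observations: list of (sampling_unit, treatment, value) tuples."""
    grouped = defaultdict(list)
    treatment_of = {}
    for unit, treatment, value in observations:
        grouped[unit].append(value)
        treatment_of[unit] = treatment
    return {unit: (treatment_of[unit], statistics.mean(vals)) for unit, vals in grouped.items()}

# Hypothetical data: 2 tanks per treatment, 3 fish measured per tank
obs = [
    ("tank_A", "warm", 5.1), ("tank_A", "warm", 5.3), ("tank_A", "warm", 4.9),
    ("tank_B", "warm", 5.6), ("tank_B", "warm", 5.4), ("tank_B", "warm", 5.8),
    ("tank_C", "cold", 4.2), ("tank_C", "cold", 4.0), ("tank_C", "cold", 4.4),
    ("tank_D", "cold", 4.7), ("tank_D", "cold", 4.5), ("tank_D", "cold", 4.6),
]
means = to_sampling_unit_means(obs)
print(means)  # 4 tanks to analyze, not 12 fish
```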
levels
different values of the factor
control treatments
includes every part of the procedure except the treatment itself
blocking
used to control for variation among sampling units that is not of interest but could alter the response variable
***PREDEFINED
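A sketch of a randomized block design, assuming three hypothetical fields (predefined blocks, e.g. differing in soil) that are each split into one plot per treatment. Randomization happens separately inside every block, so each treatment appears once per block and field-to-field variation is not confused with treatment effects.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Within each predefined block, randomly assign every treatment exactly once.
    blocks: dict mapping block name -> list of sampling units (one per treatment)."""
    rng = random.Random(seed)
    design = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs exactly one unit per treatment")
        shuffled = list(treatments)
        rng.shuffle(shuffled)
        design[block] = dict(zip(units, shuffled))
    return design

# Hypothetical blocks: 3 fields, each split into 3 plots
treatments = ["control", "low_N", "high_N"]
fields = {f"field_{b}": [f"field_{b}_plot_{p}" for p in range(3)] for b in range(3)}
layout = randomized_block_design(fields, treatments, seed=5)
print(layout)
```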
blinded
a design where the sampling unit (usually a person) does not know what treatment they are being exposed to
single blind design
sampling unit does not know the treatment they are assigned
double blind design
neither the researcher nor the sampling unit knows which treatment has been assigned
***removes accidental bias
placebo
a control treatment that helps achieve a blinded design
- substance or treatment that has no effect on response variable
sham treatment
accounts for effects of delivering a treatment that are not of interest to the researcher
multiple factors
more than one manipulated variable in the same experiment, e.g. one factor could be drug type and another diet type
interaction
when two explanatory variables have effects that are different from the simple sum of each variable in isolation
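A worked sketch of interaction in a 2x2 design, assuming hypothetical mean responses for two factors (e.g. drug: no/yes, diet: normal/low-fat) coded 0 and 1. The interaction is the combined effect minus the sum of the two separate effects: zero means the factors simply add, anything else means they interact.

```python
def interaction_effect(means):
    """2x2 design: interaction = observed combined effect minus the sum of the separate effects.
    means[(a, b)] is the mean response with factor A at level a and factor B at level b (0 or 1)."""
    effect_a = means[(1, 0)] - means[(0, 0)]   # effect of A alone
    effect_b = means[(0, 1)] - means[(0, 0)]   # effect of B alone
    combined = means[(1, 1)] - means[(0, 0)]   # effect of A and B together
    return combined - (effect_a + effect_b)    # 0 means purely additive (no interaction)

# Hypothetical mean responses
additive = {(0, 0): 0.0, (1, 0): 5.0, (0, 1): 3.0, (1, 1): 8.0}
synergy = {(0, 0): 0.0, (1, 0): 5.0, (0, 1): 3.0, (1, 1): 14.0}
print(interaction_effect(additive))  # 0.0
print(interaction_effect(synergy))   # 6.0
```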
variable
any measurable characteristic of an observation unit (varies among sampling units)
3 pieces of information a variable contains
- what the variable represents
- measurement unit
- description of the observation units
data
the values of a variable that you measure
continuous numerical variable
can take any value within a range, including fractional (decimal) values
ex. weight = 107.23 kg
discrete numerical variable
can take on only whole numbers (integers)
categorical
data is a qualitative description
- no measurement units
ordinal categorical variable
categorical (qualitative) variables that have ORDERED levels
ex. use emojis to describe how you feel