STATS (BIOL 243) FALL 2024 Flashcards
five hierarchical scales
sampling unit
sample
observation unit
statistical population
population of interest
sampling unit
the unit selected at random; it may be the same as the observation unit or contain multiple observation units
sample
collection of the sampling units
observation unit
scale of data collection, subject of study
statistical population
collection of all sampling units that could have been in your sample; it represents the true scale at which your statistical conclusions are valid
population of interest
collection of sampling units that you hope to draw conclusions about
scope of the research question
ideally the same as your statistical population
measurement variable
what we want to know/measure about the observation unit
measurement unit
the scale or units in which the measurement variable is recorded (e.g. cm, kg)
descriptive stats
set of tools used to describe data
inferential statistics
uses information from the sample to make a probabilistic statement about the statistical population
What is the rule for descriptive and inferential stats when there are multiple groups in a statistical population?
descriptive stats are repeated for each group, but inferential stats are done only once and can be used to make statements about the differences between groups
ideal sampling design
- every sampling unit has a nonzero probability of being included
- selection of sampling units must be unbiased
- selection of sampling units is independent
- each possible sample has an equal chance of being selected
observational studies
- researchers have no control over the variables
- it characterizes something about an existing statistical population
- a tool for discovering associations, but it cannot establish causation, because there is no way to know whether the factor is itself governed by something else (a confounding variable)
response variables
variable the investigators are interested in
explanatory variable
variable that the investigator believes may explain the response variable
confounding variables
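unobserved variables that affect the response variable (and are associated with the explanatory variable), distorting the apparent relationship between them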
simple random
starts by identifying every sampling unit in the statistical population and then selecting a random subset of those to be in your sample. Each sampling unit has the same probability of being included in your sample.
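A minimal Python sketch of simple random selection, assuming a complete list of sampling units is available; the list of 30 numbered parents is hypothetical. `random.sample` draws without replacement, so every unit has the same chance of ending up in the sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Select n sampling units without replacement.

    Each unit has the same probability (n / len(population)) of being included.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical example: 30 parents numbered 1..30; approach 15 of them
parents = list(range(1, 31))
chosen = simple_random_sample(parents, 15, seed=1)
print(sorted(chosen))
```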
stratified
used when the statistical population has some grouping (strata); sampling units are selected at random within each stratum
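A minimal sketch of a stratified survey, assuming a simple random sample of the same size is drawn inside every stratum (as in the car-model question in this set). The stratum names and model labels are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum units from every stratum.

    strata: dict mapping stratum name -> list of sampling units
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

# Hypothetical car models grouped by fuel type (labels are made up)
strata = {
    "electric": [f"EV-{i}" for i in range(25)],
    "hybrid": [f"HY-{i}" for i in range(20)],
    "gasoline": [f"GAS-{i}" for i in range(60)],
    "diesel": [f"DSL-{i}" for i in range(15)],
}
sample = stratified_sample(strata, 10, seed=42)
for name, units in sample.items():
    print(name, len(units))  # 10 models from each stratum
```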
clustered
observation units are contained within larger groups (geographical or organizational) that we can randomly sample
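A minimal sketch of a clustered survey: whole clusters are chosen at random and every observation unit inside a chosen cluster is included. The city and resident labels are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters whole clusters.

    clusters: dict mapping cluster name -> list of observation units
    Every observation unit inside a chosen cluster is kept.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical: 50 cities, each containing 5 residents (observation units)
clusters = {f"city_{i}": [f"city_{i}_resident_{j}" for j in range(5)] for i in range(50)}
sample = cluster_sample(clusters, 12, seed=7)
print(len(sample), sum(len(units) for units in sample.values()))  # 12 60
```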
case control
when there is a known outcome we are trying to explain
cohort
select sampling units and follow them through time to see whether they develop the outcome of interest
retrospective
studies where the results are already known
e.g. case control studies
prospective
outcome is not yet known
e.g. cohort studies
cross-sectional
study a response variable at only a single snapshot in time
e.g. a simple random survey
longitudinal surveys
study a response variable at multiple points in time
Which of the following distinguishes case control from cohort surveys?
a. Whether the survey is cross-sectional or longitudinal
b. Whether strata are defined ahead of time or not
c. Whether the survey design is retrospective or prospective
d. Whether clusters of observation units were selected at random or not
c. Whether the survey design is retrospective or prospective (correct)
Which of the following distinguishes stratified from clustered surveys?
a. Whether the survey is cross-sectional or longitudinal
b. Whether strata are defined ahead of time or not
c. Whether the survey design is retrospective or prospective
d. Whether clusters of observation units were selected at random or not
b. Whether strata are defined ahead of time or not (correct)
You design a study where you randomly select 10 car models from within each category of electric, hybrid electric-gas, gasoline, or diesel. For each model, you find the purchase cost and estimate how much it will cost you to drive the vehicle for the next 10 years. What type of survey design is this?
Stratified survey
Your children are young teenagers and you hear them listening to an entirely new genre of music called Korean Pop. You are curious whether it is just your kids that are listening to Korean Pop or if other kids their age are as well. You decide to find out by approaching 15 parents at the next Parent Teacher Night. Being a bit of a statistical geek, you mentally number each of the parents while they are talking to teachers. You pull out your cell phone with a list of random numbers and use these numbers to randomly select the parents that you approach to ask. What type of survey design is this?
Simple random survey
You are a researcher interested in the rates of mental illness in Canadian cities. You randomly select 120 cities across Canada, and conduct a survey of each to get a single estimate of per capita incidence of mental illness. The design of this surveying method is best characterized as:
cluster survey
cornerstone of experimental studies
replication
number of sampling units = ?
number of replicates
pseudoreplicates
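an error in the design of an experimental study where the observation units are analyzed instead of the sampling units, treating non-independent observations as if they were independent replicates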
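One common way to avoid pseudoreplication is to summarize the observation units within each sampling unit (here by their mean), so the number of replicates equals the number of sampling units. A minimal sketch, assuming hypothetical data: 3 tanks (sampling units) with several fish measured in each (observation units).

```python
from statistics import mean

# Hypothetical data: fish lengths (observation units) grouped by tank (sampling unit).
# Fish in the same tank are not independent replicates.
fish_lengths = {
    "tank_A": [10.2, 11.0, 10.6],
    "tank_B": [12.1, 11.7, 12.4, 12.0],
    "tank_C": [9.8, 10.1],
}

# Pseudoreplicated: counts every fish as if it were an independent replicate
n_pseudo = sum(len(v) for v in fish_lengths.values())

# Analyzed at the sampling-unit level: one summary value per tank
tank_means = {tank: mean(v) for tank, v in fish_lengths.items()}
n_replicates = len(tank_means)

print(n_pseudo, n_replicates)  # 9 3
```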
the common design elements/types
- control
- blocking
- blinded (single and double)
- placebo
- sham treatment
control treatment
reference treatment to compare against the treatment levels
blocking
used to control variation among the sampling units (similar to stratified sampling it forms subgroups or “blocks”)
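Blocking is often paired with randomizing the treatments separately inside each block (a randomized block design). A minimal sketch, assuming hypothetical field blocks, plot labels and treatment names; treatments are balanced within a block when its size is a multiple of the number of treatments.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomized block design sketch.

    Inside each block, shuffle the units, then assign treatments in rotation
    so every treatment appears in every block.
    Returns a dict mapping unit -> (block, treatment).
    """
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical: two fields (blocks), four plots each, two treatments
blocks = {"field_1": ["p1", "p2", "p3", "p4"], "field_2": ["p5", "p6", "p7", "p8"]}
assignment = randomize_within_blocks(blocks, ["control", "fertilizer"], seed=3)
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```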
single blinded
when the sampling unit does not know what treatment they are being exposed to
double blinded
both the researcher and the sampling unit are unaware of which treatment is being given
placebo
often used in medical trials as a control treatment that has no active effect; it helps accomplish a blinded design
sham treatment
method used in control treatments that accounts for the effect of delivering a treatment, which is not itself of interest
compare and contrast between sham and treatment
Imagine a study that evaluates the effectiveness of different over-the-counter pain relievers in alleviating the symptoms of arthritis: acetaminophen, ibuprofen and acetylsalicylic acid. Two hundred patients are randomly assigned to receive one of these three pain relievers, or to receive a placebo (control). How many factors and levels are evident in this study?
1 factor with 4 levels
Blinding patients to the experimental treatment is a crucial part of a randomized clinical trial. Why?
Reduces the possibility of placebo effects
Reduces biases in measurements stemming from the anticipation of a treatment effect
What is the reason for blinding the researcher to what experimental treatment a patient is going to receive?
Reduces biases in measurements stemming from the anticipation of a treatment effect
Reduces the possibility of placebo effects
What design characteristic distinguishes experimental studies from observational studies?
Whether sampling units are randomly assigned to treatments or not.
A researcher studied the effect of the prescription drug raloxifene on fracture risk in postmenopausal women. They found that women who took raloxifene over a five-year period had a lower risk of clinical vertebral fracture than women who did not take the drug. What are the factors and levels in this experiment?
There is one Factor (drug) with two Levels (raloxifene, no raloxifene).
variable
any measurable characteristic of an observation
datum
value of the variable
continuous numerical variable
can take any value within a range (e.g. 1.2 or 1/4)
discrete numerical
can take only whole-number values (e.g. counts)
ordinal categorical variable
can take on qualitative values but the values are on a ranked scale
nominal categorical variable
takes on qualitative values but they do not have any particular order
e.g. types of fruit
What is the data type for describing your age?
Continuous numerical
What is the data type for the description: child, teenager, adult?
Ordinal categorical
What is the data type for the number of students in a class?
Discrete numerical
What is the data type for the letter grade on your exam?
Ordinal categorical
What is the data type for the percent grade on your exam?
Continuous numerical
central tendency
describes the typical values in our sample (e.g. mean, median)
median: the second quartile
dispersion
describes the spread of the values
counts
used for categorical variables
number of observations in your sample that fall within a particular category
proportions
percentages
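Counts, proportions and percentages for a categorical variable can be sketched in Python as follows; the fruit observations are hypothetical (echoing the nominal-variable example above). `collections.Counter` tallies the number of observations in each category.

```python
from collections import Counter

# Hypothetical categorical observations (nominal variable: fruit type)
fruits = ["apple", "banana", "apple", "cherry", "banana", "apple"]

counts = Counter(fruits)                # number of observations in each category
total = sum(counts.values())            # total number of observations
proportions = {cat: n / total for cat, n in counts.items()}
percentages = {cat: 100 * p for cat, p in proportions.items()}

print(counts["apple"], proportions["apple"], percentages["apple"])  # 3 0.5 50.0
```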
variance
variance measures the amount of variation
the average squared distance of each data point from the sample mean
σ²
calculating variance
1. calculate the mean
2. find the difference between each data point and the mean
3. square each difference
4. sum the squares and divide by the number of observations (n)
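The four steps translate directly into the formula σ² = Σ(xᵢ - x̄)² / n. A minimal Python sketch following those steps, checked against the data from the population-variance question in this set:

```python
def population_variance(data):
    """Variance following the steps above: divide the sum of squares by n."""
    n = len(data)
    mean = sum(data) / n                     # 1. calculate the mean
    deviations = [x - mean for x in data]    # 2. difference from the mean
    squares = [d ** 2 for d in deviations]   # 3. square each difference
    return sum(squares) / n                  # 4. sum the squares, divide by n

data = [7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1]
print(round(population_variance(data), 1))  # 5.5
```

Python's `statistics.pvariance` uses the same divide-by-n formula; `statistics.variance` divides by n - 1 instead (the sample variance), so the two give different answers.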
Quartiles
ranked bins of data
1. sort from lowest to highest
finding the second quartile
split the data in half:
a. if the number of observations is odd, the second quartile is the middle value
b. if the number of observations is even, the second quartile is the average of the two middle values
finding the first quartile
subset the lower-valued half of the observations, then use the rules for the second quartile to find its middle value
note: the 2nd quartile is included in each half if the number of observations is odd
3rd quartile
repeat the steps for quartile 1 on the upper-valued half
dispersion, aka interquartile range (IQR)
range of the innermost 50% of the data
IQR = Q3 - Q1
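A minimal Python sketch of the quartile rules above (sort, split in half, keep the middle value in both halves when n is odd), checked against the IQR questions in this set. Library functions such as NumPy's `percentile` use an interpolation method by default, so they can give slightly different quartiles than this hand method.

```python
def median(values):
    """Middle value of the sorted values; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3); Q2 is included in both halves when n is odd."""
    s = sorted(data)
    n = len(s)
    half = (n + 1) // 2      # half-size that includes the middle value when n is odd
    lower = s[:half]
    upper = s[n - half:]
    return median(lower), median(s), median(upper)

def iqr(data):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(data)
    return q3 - q1

print(round(iqr([7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1]), 2))                          # 1.5
print(round(iqr([10.1, 18.6, 19.8, 15.7, 21.9, 12.9, 11.8, 26.0, 13.0, 12.9]), 2))  # 6.9
```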
Calculate the mean & median of the following data:
7.5 9.9 8.6 10.3 8.5 9.4 15.1
Mean is 9.9, median is 9.4
Would the mean or median be a better descriptor of the ‘middle’ value for this set of data?
7.5 9.9 8.6 10.3 8.5 9.4 15.1
Median
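Why the median works better here: the single large value (15.1) pulls the mean upward, while the median depends only on the middle-ranked value. A quick check with Python's `statistics` module, comparing both measures with and without that value:

```python
from statistics import mean, median

data = [7.5, 9.9, 8.6, 10.3, 8.5, 9.4, 15.1]
print(round(mean(data), 1), median(data))  # 9.9 9.4

# Drop the extreme value and see how much each measure shifts
trimmed = [x for x in data if x != 15.1]
print(round(mean(data) - mean(trimmed), 2), round(median(data) - median(trimmed), 2))  # 0.87 0.4
```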
Calculate the population variance & interquartile range (IQR) of the following data:
7.5 8.6 8.9 8.5 9.4 10.7 15.1
Variance is 5.5, IQR is 1.5
Calculate the interquartile range (IQR) for the following set of numbers and indicate what range the answer lies within.
10.1, 18.6, 19.8, 15.7, 21.9, 12.9, 11.8, 26.0, 13.0, 12.9
5 ≤ ANSWER < 7
Calculate the interquartile range (IQR) for the following set of data and indicate what range the answer lies within.
46.7, 18.7, 39.4, 7.2, 19.8, 42.1, 2.6, 17.1, 30.7, 21.9
19 ≤ ANSWER < 23
meaningfulness
whether the difference among groups is important to your study
effect size
the magnitude of the change in the response variable; used to judge whether the change is meaningful in practice
The rate of home ownership in Canada decreased from 46% in 2004 to 44% in 2011. What is the effect size as a difference between the years?
-2% (44% - 46%), a decrease of 2 percentage points
Do relative effect sizes have units?
No; they are ratios, so the units cancel
In the United Kingdom, 56% of older adults (55+ years) get their news from the television whereas only 12% of youth (18-24 years) do. What is the relative effect size of youth compared to older adults?
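4.7 (0.56/0.12): older adults are about 4.7 times as likely as youth to get their news from television; expressed as youth relative to older adults, the ratio is 0.12/0.56 ≈ 0.21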
absolute effect size
the actual difference in outcomes
e.g. 80% - 60% = 20%
relative effect size
Relative effect size compares the outcomes between two groups as a ratio or percentage.
(80% / 60%) = 1.33, or a 33% increase
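Absolute and relative effect sizes as two small Python functions, a minimal sketch using the numbers from the cards above (proportions written as decimals). The absolute effect keeps the outcome's units; the relative effect is a unitless ratio.

```python
def absolute_effect(a, b):
    """Difference in outcomes (same units as the outcome, e.g. percentage points)."""
    return a - b

def relative_effect(a, b):
    """Ratio of outcomes; the units cancel, so the result is unitless."""
    return a / b

print(round(absolute_effect(0.80, 0.60), 2))  # 0.2   -> 20 percentage points
print(round(relative_effect(0.80, 0.60), 2))  # 1.33  -> a 33% increase
print(round(absolute_effect(0.44, 0.46), 2))  # -0.02 -> home ownership, 2011 vs 2004
print(round(relative_effect(0.56, 0.12), 1))  # 4.7   -> older adults vs youth
```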
marginal distributions
sum the values in each row
sum the values in each column
in the last (bottom-right) box, add up all the row totals (or all the column totals) to get the grand total, which is used to calculate proportions
shows how many sampling units are in each level of one categorical variable
good way to describe patterns
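The row/column/grand-total steps for marginal distributions, as a minimal Python sketch. The contingency table of counts (sites by species presence) is hypothetical.

```python
# Hypothetical contingency table of counts: rows = site, columns = species status
table = {
    "site_A": {"present": 12, "absent": 8},
    "site_B": {"present": 5, "absent": 15},
}
columns = ["present", "absent"]

# Sum the values in each row and in each column
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in columns}

# Grand total (the "last box"): the same whether summed over rows or columns
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())

# Marginal distribution of the column variable as proportions of the grand total
col_proportions = {col: n / grand_total for col, n in col_totals.items()}

print(row_totals)       # {'site_A': 20, 'site_B': 20}
print(col_totals)       # {'present': 17, 'absent': 23}
print(col_proportions)  # {'present': 0.425, 'absent': 0.575}
```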