Lesson 1 Flashcards
Describe a summary statistic.
A single number summarizing a large amount of data, e.g. a mean or a median
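As a small illustration of the card above, the sketch below reduces a list of values to two summary statistics. The exam scores are made-up data, used only for the example.

```python
from statistics import mean, median

# Hypothetical data: exam scores for eight students.
scores = [72, 85, 90, 66, 78, 95, 81, 73]

# Each call collapses the whole list into a single number.
average_score = mean(scores)
median_score = median(scores)

print(average_score, median_score)
```

Here the mean is 80 and the median is 79.5: two different single-number summaries of the same eight values.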
Describe numerical and categorical variables.
Numerical = quantitative: numbers that can be ordered and used in calculations (adding, averaging)
Categorical = qualitative: values that are categories or labels rather than measured quantities
What is a discrete variable?
A numerical variable that takes only whole numbers or only certain values, e.g. shoe sizes or “numbers with jumps”
What is a continuous variable?
A numerical variable that can take any value within a range (infinitely many possible values), e.g. temperature
What is an ordinal variable?
A type of categorical variable whose possible values have a natural order, e.g. grade level
Describe a data matrix.
A spreadsheet where the rows are “cases,” and the columns are “variables.”
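A data matrix can be sketched in plain Python as a list of rows. The names and values below are invented for illustration; each dict is one case and each key is one variable.

```python
# Hypothetical data matrix: each dict is one case (row),
# each key is one variable (column).
data_matrix = [
    {"name": "Ana", "age": 21, "shoe_size": 7.5, "grade_level": "junior"},
    {"name": "Ben", "age": 19, "shoe_size": 10.0, "grade_level": "freshman"},
    {"name": "Cal", "age": 22, "shoe_size": 9.0, "grade_level": "senior"},
]

# The column names are the variables.
variables = list(data_matrix[0].keys())

# Reading down one column gives every case's value for that variable.
ages = [case["age"] for case in data_matrix]

print(variables)
print(ages)
```

In this sketch, age is a numerical variable, shoe_size is discrete, and grade_level is ordinal.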
What is it called when two variables show a connection?
Association or dependence
Define observational study
Researchers collect data in a way that does not interfere with how the data arise, with no treatment assigned. E.g. surveys, medical records, or following a cohort of many similar individuals
Define experiment
A study where researchers assign treatments to cases. The experiment is randomized when the treatments are assigned randomly.
What can an experiment show that an observational study cannot?
Causation. An observational study can only show association.
What do you call an observational study where the data are collected in the future, as events unfold? From events in the past?
Prospective: future (as events unfold)
Retrospective: past
Is a survey an observational study or an experiment?
Observational study
Why does an experiment show causal relationship?
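Treatments are randomly assigned to subjects, so other variables are balanced across the groups on average and differences in the response can be attributed to the treatment. In an observational study no treatment is assigned, so confounding variables cannot be ruled out.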
What is the difference between a positive or negative correlation?
A positive correlation (positive slope) is when one variable tends to increase as the other increases. If one variable tends to decrease as the other increases (negative slope), it's a negative correlation.
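The sign of a correlation can be checked numerically. The sketch below computes the Pearson correlation coefficient by hand; the function name pearson_r and the three small data sets are assumptions made for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 71, 80, 88]      # rises as hours rise
mistakes_made = [14, 11, 9, 6, 2]      # falls as hours rise

r_positive = pearson_r(hours_studied, exam_score)
r_negative = pearson_r(hours_studied, mistakes_made)

print(r_positive > 0, r_negative < 0)
```

A coefficient above zero corresponds to a positive slope and one below zero to a negative slope.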
What do you call two variables that are NOT associated?
Independent.
No pair of variables can be both independent and associated.
Define confounding variable
An extraneous variable that affects both the explanatory and the response variable, making it seem like there is a relationship between them
e.g. sun exposure confounds the relationship between sunscreen use and skin cancer
Define response variable
The variable we suspect is being affected. When we suspect one variable might causally affect another, we say the explanatory variable might affect the response variable.
Define explanatory variable
The variable we suspect is doing the affecting. When we suspect one variable might causally affect another, we say the explanatory variable might affect the response variable.
How can sampling give a better measure than taking a census?
Some individuals are hard to locate or measure, and populations rarely stand still, so a census can be out of date by the time it is finished.
Define inference
Using a sample to draw conclusions about the population it was drawn from
Name potential sources of sampling bias
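Convenience sample: only easily accessible individuals are sampled
Non-response: only a fraction of the people surveyed respond, and they may differ from those who do not
Voluntary response: people with the strongest opinions are the ones who choose to respond (no initial random sample)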
Name some sampling methods
Simple random sample (SRS)
Stratified sample
Cluster sample
Multistage sample
What is a simple random sample (SRS)?
A sample where each case in the population is equally likely to be chosen, and knowing that one case is included in the sample provides no useful information about which other cases are included
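A simple random sample can be drawn with the standard library's random.sample, which selects without replacement and gives every case the same chance of inclusion. The population of 100 numbered cases and the fixed seed are assumptions for the example.

```python
import random

# Hypothetical population: cases numbered 1 through 100.
population = list(range(1, 101))

# A seeded generator makes the draw repeatable for study purposes.
rng = random.Random(42)

# Draw 10 cases without replacement; every case is equally likely.
srs = rng.sample(population, k=10)

print(srs)
```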
What is a stratified sample?
Divide the population into homogeneous strata (groups of similar cases), then randomly sample from within each stratum
Allows for controlling for potential confounders
Useful when cases in each stratum are very similar with respect to the outcome of interest. The downside is that analysis is more complex than for an SRS.
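The stratified procedure above can be sketched as follows: group the cases by stratum, then take a simple random sample inside every stratum. The helper name stratified_sample and the student data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(cases, stratum_of, n_per_stratum, rng):
    """Group cases into strata, then randomly sample within each one."""
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical population: 100 students, 25 in each class year.
years = ["first", "second", "third", "fourth"] * 25
students = [{"id": i, "year": year} for i, year in enumerate(years)]

rng = random.Random(0)
sample = stratified_sample(students, lambda s: s["year"], 5, rng)

# Every stratum contributes the same number of cases.
counts = Counter(s["year"] for s in sample)
print(counts)
```

Unlike an SRS, this guarantees that every class year appears in the sample.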
What is a cluster sample?
Break up the population into many groups called clusters, then sample a fixed number of clusters and include all observations from each chosen cluster in the sample.
The downside is that a more advanced technique is needed for analysis.
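A cluster sample randomizes at the level of clusters, not individual cases. In the sketch below, the neighborhoods, house labels, and the helper name cluster_sample are all made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every case inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [case for name in chosen for case in clusters[name]]
    return chosen, sample

# Hypothetical population: 10 neighborhoods with 6 houses each.
neighborhoods = {
    f"N{k}": [f"N{k}-house{h}" for h in range(1, 7)]
    for k in range(1, 11)
}

rng = random.Random(1)
chosen, sample = cluster_sample(neighborhoods, 3, rng)

print(chosen)
print(len(sample))
```

Three clusters of six houses give 18 sampled houses, all from the chosen neighborhoods.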
What is a multistage sample?
Divide the population into clusters, randomly select a few clusters, then randomly sample within these clusters. E.g. neighborhood sampling is more economical because it avoids the need to travel to all neighborhoods.
The downside is that a more advanced technique is needed for analysis.
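Multistage sampling adds a second random draw inside each chosen cluster. The sketch below uses invented neighborhood data and a hypothetical helper named multistage_sample.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: pick clusters at random. Stage 2: sample within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return chosen, sample

# Hypothetical population: 10 neighborhoods with 6 houses each.
neighborhoods = {
    f"N{k}": [f"N{k}-house{h}" for h in range(1, 7)]
    for k in range(1, 11)
}

rng = random.Random(2)
chosen, sample = multistage_sample(neighborhoods, 3, 2, rng)

print(chosen)
print(sample)
```

Only two of the six houses in each chosen neighborhood are visited, which is the difference from a cluster sample.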
What are the four principles of experimental design?
Control
Randomize
Replicate
Block
What does it mean to control in experimental design?
Keep everything other than the treatment as similar as possible across the treatment groups, e.g. ask patients to drink 12 oz of water with the pill, rather than whatever amount they want
What is randomization in study design?
Randomly assign cases to treatment groups to account for variables that cannot be controlled, e.g. some patients might be more susceptible to a disease due to their diet. Randomization also prevents accidental bias.
What does it mean to replicate an experiment design?
To replicate is to collect a sufficiently large sample, or to repeat the entire study to verify an earlier finding.
What does it mean to block for an experimental design?
Block for variables known or suspected to affect the outcome. E.g. when testing an energy gel, the pro status of an athlete might affect results, so you would split the athletes into a pro block and an amateur block, then within each block randomly assign half to gel and half to no gel.
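The energy gel example can be sketched as a blocked random assignment: split the athletes into blocks first, then randomize within each block. The athlete names, block sizes, and the helper name blocked_assignment are assumptions for the example.

```python
import random

def blocked_assignment(units, block_of, rng):
    """Randomly assign half of each block to 'gel' and half to 'no gel'."""
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]          # copy so the input is not reordered
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for unit in shuffled[:half]:
            assignment[unit["name"]] = "gel"
        for unit in shuffled[half:]:
            assignment[unit["name"]] = "no gel"
    return assignment

# Hypothetical athletes: 6 pros and 6 amateurs.
athletes = [
    {"name": f"athlete{i}", "status": "pro" if i < 6 else "amateur"}
    for i in range(12)
]

rng = random.Random(3)
assignment = blocked_assignment(athletes, lambda a: a["status"], rng)

pros = [a["name"] for a in athletes if a["status"] == "pro"]
amateurs = [a["name"] for a in athletes if a["status"] == "amateur"]
pros_on_gel = sum(assignment[name] == "gel" for name in pros)
amateurs_on_gel = sum(assignment[name] == "gel" for name in amateurs)
print(pros_on_gel, amateurs_on_gel)
```

Because randomization happens inside each block, pros and amateurs are equally represented in both treatment groups.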
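Define explanatory variables (factors) and blocking variables
Explanatory variables (factors): conditions we can impose on experimental units. Blocking variables: characteristics the experimental units come with that we would like to control for.
e.g. response variable: exam performance; blocking variable: gender (known or suspected to affect the outcome); explanatory variables: light and noise (might be affecting the outcome)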
Define treatment groups vs control group
The treatment group receives the drug or other treatment; the control group does not.
What is a double-blind experiment?
When neither the experimental units nor the experimenters know which treatment each unit received
What is random sampling?
Each subject in the population is equally likely to be selected, so the results are likely generalizable to the population at large
What is random assignment?
Subjects are assigned to treatment groups at random, so subjects with different characteristics are represented equally in each group. This allows us to make causal conclusions and occurs only in experimental settings.
Is the experiment causal and generalizable? if:
random assignment
random sampling
Causal and generalizable
Is the experiment causal and generalizable? if:
random sampling
no random assignment
not causal but generalizable
Is the experiment causal and generalizable? if:
No random sampling
Random assignment
causal but not generalizable
Is the experiment causal and generalizable? if:
no random assignment
no random sampling
neither causal nor generalizable
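The four cards above follow one rule: random assignment decides whether conclusions are causal, and random sampling decides whether they are generalizable. A minimal sketch, with a hypothetical function name:

```python
def study_conclusions(random_sampling: bool, random_assignment: bool) -> str:
    """Map a study's design to the kind of conclusions it supports."""
    causal = "causal" if random_assignment else "not causal"
    general = "generalizable" if random_sampling else "not generalizable"
    return f"{causal}, {general}"

# Walk through all four combinations from the cards.
for sampling in (True, False):
    for assignment in (True, False):
        print(sampling, assignment, study_conclusions(sampling, assignment))
```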
Does random assignment affect causation or association?
If random assignment is used, causal conclusions can be drawn
If there is no random assignment, only association can be shown
How does random sampling affect generalizability?
With random sampling, the results are generalizable to the population. Without random sampling, they are not.
Is observational data causal and generalizable?
Not causal. It is generalizable only if the data come from a random sample.
What sampling and assignment does an ideal experiment include?
Random sampling and random assignment- not often possible.
What are a variable's levels?
If a variable is categorical, its possible values are called the variable's levels
Define lurking variable
Similar to a confounding variable in that it's a variable that affects the results of a study, but lurking variables are unknown and unmeasured.
Define multistage sampling
Divide the population into clusters, randomly select some of the clusters, and then collect a random sample from within each selected cluster.