Lesson 1 Flashcards
Describe a summary statistic.
A single number that summarizes a large amount of data, e.g. a mean or a median
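As a minimal sketch of the idea, the snippet below reduces a small list of values to two summary statistics using Python's standard `statistics` module. The exam scores are made-up illustration data, not taken from the lesson.

```python
import statistics

# Hypothetical data: exam scores for eight students (invented for illustration)
scores = [72, 85, 90, 66, 85, 78, 94, 70]

# Each call collapses the whole list into a single number
mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted scores

print("mean:", mean_score)
print("median:", median_score)
```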
Describe numerical and categorical variables.
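Numerical = quantitative: values are numbers that can be ordered and for which arithmetic (adding, subtracting, averaging) makes sense
Categorical = qualitative: values are categories (levels) rather than measured quantities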
What is a discrete variable?
A numerical variable that can take only whole numbers or only certain values, e.g. shoe sizes (“numbers with jumps”)
What is a continuous variable?
A numerical variable that can take any value within a range, e.g. temperature
What is an ordinal variable?
A type of categorical variable whose possible values have a natural order, e.g. grade level
Describe a data matrix.
A spreadsheet where the rows are “cases,” and the columns are “variables.”
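A data matrix can be sketched in plain Python as a list of rows, one per case, where each key is a variable. The names and values below are hypothetical and exist only to show the row/column layout.

```python
# Hypothetical data matrix: each dict is one case (row),
# each key is one variable (column).
data_matrix = [
    {"name": "Ana",  "grade_level": "freshman",  "shoe_size": 7.5, "height_cm": 162.3},
    {"name": "Ben",  "grade_level": "junior",    "shoe_size": 10,  "height_cm": 178.0},
    {"name": "Cara", "grade_level": "sophomore", "shoe_size": 8,   "height_cm": 169.4},
]

# Reading one row gives every variable for a single case
first_case = data_matrix[0]

# Reading one column gives a single variable across all cases
heights = [case["height_cm"] for case in data_matrix]

print(first_case)
print(heights)
```

Here `grade_level` is ordinal, `shoe_size` is discrete, and `height_cm` is continuous.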
What is it called when two variables show a connection?
Association or dependence
Define observational study
A study that collects data in a way that does not interfere with how the data arise; no treatment is assigned. E.g. surveys, reviews of medical records, or following a cohort of many similar individuals
Define experiment
A study in which researchers assign treatments to cases. The experiment is randomized when the treatments are assigned randomly.
What can an experiment show that an observational study cannot?
Causation. An observational study can only show association (correlation).
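What do you call an observational study that collects data as events unfold? One that collects data after events have taken place?
Prospective: data are collected going forward, as events unfold
Retrospective: data are collected after the events have taken place (the past)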
Is a survey an observational study or an experiment?
Observational study
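Why can an experiment show a causal relationship?
Because treatments are randomly assigned to subjects, other variables tend to balance out across the groups, so a difference in outcomes can be attributed to the treatment. In an observational study the researcher does not assign the treatment, so this balance cannot be assumed.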
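Random assignment can be sketched as shuffling the cases and splitting them into groups. The function name `randomly_assign` and the patient labels are hypothetical, and an even two-group split is an assumption made for illustration.

```python
import random

def randomly_assign(cases, seed=None):
    """Shuffle the cases and split them into a treatment and a control group."""
    rng = random.Random(seed)   # seeded generator so the split can be reproduced
    shuffled = list(cases)      # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects
patients = [f"patient_{i}" for i in range(1, 11)]
treatment, control = randomly_assign(patients, seed=1)

print("treatment:", treatment)
print("control:", control)
```

Because chance alone decides who lands in each group, no characteristic of the patients can systematically determine who receives the treatment.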
What is the difference between a positive or negative correlation?
A positive correlation (positive slope) is when one variable tends to go up as the other goes up. If one variable tends to go down as the other goes up (negative slope), it's a negative correlation.
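One common way to put a number on the direction of a relationship is the Pearson correlation coefficient, which the card does not name; it is used here as an assumption. The sketch below computes it by hand for made-up data, so that a positive result corresponds to a positive slope and a negative result to a negative slope.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and root sums of squares (denominator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data, invented for illustration
hours_studied = [1, 2, 3, 4, 5]
exam_score = [55, 61, 70, 74, 88]   # tends to rise with hours studied
hours_of_tv = [5, 4, 4, 2, 1]       # tends to fall with hours studied

r_pos = pearson_r(hours_studied, exam_score)
r_neg = pearson_r(hours_studied, hours_of_tv)

print("studied vs score:", r_pos)
print("studied vs tv:", r_neg)
```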
What do you call two variables that are NOT associated?
Independent.
No pair of variables can be both independent and associated.
Define confounding variable
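An extraneous variable that affects both the explanatory and the response variable, making it seem like there is a relationship between them
E.g. sun exposure when studying whether sunscreen prevents skin cancer: people who spend more time in the sun tend both to use more sunscreen and to be at higher risk of skin cancer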
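The sunscreen example can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the probabilities are invented, and sunscreen is given no effect at all on cancer. An association between the two still appears, purely because sun exposure drives both.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)
sunscreen, cancer = [], []
for _ in range(5000):
    sun = rng.random()  # confounder: sun exposure on a 0..1 scale (hypothetical)
    # More sun exposure makes sunscreen use more likely (invented rate)
    sunscreen.append(1 if rng.random() < sun else 0)
    # More sun exposure makes skin cancer more likely (invented rate);
    # note that sunscreen use does not enter this line at all
    cancer.append(1 if rng.random() < 0.3 * sun else 0)

r = pearson_r(sunscreen, cancer)
print("sunscreen vs cancer:", r)
```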
Define response variable
The variable we suspect is being affected. When we suspect one variable might causally affect another, we say the explanatory variable might affect the response variable.
Define explanatory variable
The variable we suspect is doing the affecting. When we suspect one variable might causally affect another, we say the explanatory variable might affect the response variable.
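Why can sampling give a better measure than taking a census?
A census is hard to complete: some individuals are hard to locate or measure, and they may differ from the rest of the population. Populations also rarely stand still, so the population changes while the census is being taken.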
Define inference
Using a sample to draw conclusions about the population it was taken from
Name potential sources of sampling bias
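Convenience sample: only easily accessible individuals are sampled
Non-response: only some of the people randomly sampled for a survey respond, so the respondents may not represent the population
Voluntary response: people choose to respond, so those with the strongest opinions are overrepresented (no initial random sample)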
Name some sampling methods
Simple random sample (SRS)
Stratified sample
Cluster sample
Multistage sample
What is a simple random sample (SRS)?
A sample in which each case in the population is equally likely to be included, and knowing that one case is in the sample provides no useful information about which other cases are included
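A simple random sample can be sketched with `random.sample`, which draws without replacement so that every case has the same chance of being chosen. The population of 100 numbered cases and the sample size of 10 are assumptions made for illustration.

```python
import random

# Hypothetical population: case IDs 1 through 100
population = list(range(1, 101))

rng = random.Random(42)               # seeded so the draw can be reproduced
srs = rng.sample(population, k=10)    # 10 distinct cases, each equally likely

print(srs)
```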
What is a stratified sample?
Divide the population into homogeneous strata (groups of similar cases), then randomly sample from each stratum
Allows for controlling for potential confounders
Useful when cases in each stratum are very similar with respect to the outcome of interest. The downside is that the analysis is more complex than for an SRS.
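Stratified sampling can be sketched as grouping the cases by stratum and then taking a simple random sample inside each group. The helper name `stratified_sample`, the student records, and the choice of an equal number of cases per stratum are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(cases, stratum_of, n_per_stratum, seed=None):
    """Group cases into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample = []
    for members in strata.values():
        # Assumes every stratum has at least n_per_stratum cases
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical population: 60 students, 20 in each grade level
levels = ["freshman"] * 20 + ["sophomore"] * 20 + ["junior"] * 20
students = [{"id": i, "grade_level": level} for i, level in enumerate(levels)]

sample = stratified_sample(students, lambda s: s["grade_level"], n_per_stratum=3, seed=0)

for student in sample:
    print(student)
```

Unlike a plain SRS of nine students, this guarantees that every grade level is represented in the sample.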