Probability is the process of drawing conclusions about the sample based on population data.
T/F
False
Big Picture Statistics order
1-Exploratory data analysis
2-Produce data
3-Probability
4-Inference
Multistage Sampling
In statistics, multistage sampling is the taking of samples in stages using smaller and smaller sampling units at each stage. Multistage sampling can be a complex form of cluster sampling because it is a type of sampling which involves dividing the population into groups.
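A minimal sketch of the idea, assuming a made-up nested population of schools, classes, and students (all names and sizes here are hypothetical): each stage samples smaller units from within the units chosen at the previous stage.

```python
import random

def multistage_sample(population, n_schools, n_classes, n_students, rng):
    """Stage 1: sample schools; stage 2: classes within them; stage 3: students."""
    sample = []
    for school in rng.sample(sorted(population), n_schools):
        classes = population[school]
        for cls in rng.sample(sorted(classes), n_classes):
            sample.extend(rng.sample(classes[cls], n_students))
    return sample

# Hypothetical nested population: 5 schools x 4 classes x 10 students.
population = {
    f"s{s}": {
        f"c{c}": [f"s{s}c{c}p{p}" for p in range(10)]
        for c in range(4)
    }
    for s in range(5)
}

picked = multistage_sample(population, 2, 2, 3, random.Random(1))
print(len(picked))  # 2 schools * 2 classes * 3 students = 12
```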
Stratified Sampling Vs Cluster Sampling
Cluster sampling divides a population into groups, then includes all members of some randomly chosen groups. Stratified sampling divides a population into groups, then includes some members of all of the groups.
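The contrast can be sketched in a few lines, assuming a hypothetical population of four classrooms with five students each (the function names are illustrative, not from any library):

```python
import random

def cluster_sample(groups, n_groups, rng):
    """Pick n_groups whole groups at random and keep every member of each."""
    chosen = rng.sample(sorted(groups), n_groups)
    return {name: list(groups[name]) for name in chosen}

def stratified_sample(groups, n_per_group, rng):
    """Pick n_per_group members at random from every group."""
    return {name: rng.sample(members, n_per_group)
            for name, members in groups.items()}

# Hypothetical population: 4 classrooms of 5 students each.
groups = {f"class_{c}": [f"{c}{i}" for i in range(1, 6)] for c in "ABCD"}
rng = random.Random(0)

cluster = cluster_sample(groups, n_groups=2, rng=rng)
stratified = stratified_sample(groups, n_per_group=2, rng=rng)

print(cluster)     # 2 classrooms, all 5 students from each
print(stratified)  # all 4 classrooms, 2 students from each
```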
Observational Study vs Experiment
Observational Study: Individuals choose which treatment to receive or naturally belong to one of the treatment groups
Lurking variables that influence the choice are confounded with the treatments
Passive data collection: observing, measuring, counting, subjects are undisturbed
Media often improperly attribute cause-effect conclusions to these
Experiment: A study design where treatments are imposed on subjects before observing response (manipulations, interventions)
To determine if treatments cause change in response
Quantitative vs. Categorical
Quantitative variable: a variable whose possible values are meaningful numbers, e.g., cost, height, yield
Categorical variable: a variable whose possible values are non-quantitative categories, e.g., gender, option
Lurking Variables
Control
Confounding
Lurking variable: a variable that is related to both the explanatory variable and the response variable and could affect your interpretation of the relationship between them
Control: an effort to reduce the effects of lurking variables
Confounding: a situation in which the effects of a lurking variable cannot be distinguished from the effects of the factors
Under-coverage vs Non-response vs Misleading response
Under-coverage
Some individuals have no possibility of being selected
Ex: homeless, phoneless
Non-response
Some individuals refuse to answer or can’t be contacted
Ex: hang-ups, on vacation, refusal to mail census form
Misleading response
Selected individuals lie or give inaccurate answers (sensitive issues)
Ex: do you wash your hands, have you cheated, do you struggle with mental health?
Response variable vs Explanatory variable
A response variable is the expected effect, and it responds to other variables.
The response variable is measured on the individual.
An explanatory variable is the expected cause, and it explains the results.
Control / Comparison
Benefit of Experiment
An experiment can be used to establish causation.
Randomized Block Experiment vs Randomized Controlled Experiment vs Matched Pairs
What is the advantage of a histogram over a stemplot or dotplot?
Histograms work well for very large data sets
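One way to see why: a stemplot or dotplot marks every observation individually, while a histogram only needs one count per bin. A rough text-histogram sketch, assuming 10,000 made-up measurements and a bin width of 10 (both are arbitrary choices for illustration):

```python
import random
from collections import Counter

rng = random.Random(42)
data = [rng.gauss(50, 10) for _ in range(10_000)]  # made-up measurements

bin_width = 10
# Map each value to the left edge of its bin, then count values per bin.
counts = Counter(int(x // bin_width) * bin_width for x in data)

for left in sorted(counts):
    bar = "#" * (counts[left] // 100)  # one '#' per 100 observations
    print(f"{left:>4} to {left + bin_width:<4} {bar}")
```

The display stays a handful of rows no matter how many observations are added.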
Mean vs Median
3 steps
1. First, construct a histogram or stemplot and evaluate skewness and outliers
2. Use the median if the distribution is markedly skewed or outliers are present
3. Use the mean if the distribution is roughly symmetric
*Mean and median are approximately equal if the histogram is roughly symmetric
*Median is “resistant” to outliers and long tails
*Mean has desirable properties for inference (much more on this later)
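A small illustration of "resistant", using made-up numbers: adding one extreme value moves the mean a lot and the median only slightly.

```python
from statistics import mean, median

values = [2, 3, 3, 4, 5]        # made-up, roughly symmetric data
with_outlier = values + [100]   # same data plus one extreme value

print(mean(values), median(values))              # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 19.5 3.5
```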
Mean + Median and Histogram
x̄ = mean
M = median
symmetric distribution: x̄ ≈ M
left skew: x̄ < M
right skew: x̄ > M
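These relationships can be checked on three tiny made-up data sets. This is only a sketch: the pattern is a rule of thumb for typical skewed data, not a guarantee for every data set.

```python
from statistics import mean, median

def compare(data):
    """Report how the mean relates to the median for one data set."""
    x_bar, m = mean(data), median(data)
    if x_bar > m:
        return "mean > median"
    if x_bar < m:
        return "mean < median"
    return "mean = median"

symmetric = [1, 2, 3, 4, 5]        # balanced around 3
right_skewed = [1, 2, 3, 4, 20]    # long tail toward large values
left_skewed = [1, 17, 18, 19, 20]  # long tail toward small values

print(compare(symmetric))     # mean = median
print(compare(right_skewed))  # mean > median
print(compare(left_skewed))   # mean < median
```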