AP Stats Flashcards
categorical variable
labels that place each individual into a particular group
ex: race, sex, age group
quantitative variable
takes number values that are quantities, counts, or measurements
ex: height, weight, cost
quantitative discrete
takes a fixed set of possible values with gaps between them
ex: how many green marbles you draw out of a bag
quantitative continuous
can take any value in an interval on the number line
ex: time
relative frequency table
shows the proportion or percent of individuals having each value.
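A minimal sketch of building a relative frequency table from raw categorical data, using made-up data (the color list is hypothetical):

```python
from collections import Counter

def relative_frequency_table(values):
    """Return {category: proportion} for a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}

# Hypothetical data: favorite color of 8 students
colors = ["red", "blue", "blue", "green", "blue", "red", "blue", "green"]
for category, prop in relative_frequency_table(colors).items():
    print(f"{category}: {prop:.2f} ({prop:.0%})")
```

The proportions always add up to 1 (100%).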
two-way table
a table of counts that summarizes data on the relationship between two categorical variables for some group of individuals.
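A small sketch of counting a two-way table from paired categorical data; the (grade, pet) pairs are hypothetical:

```python
from collections import Counter

def two_way_table(pairs):
    """Count individuals for each (row category, column category) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, {(r, c): counts[(r, c)] for r in rows for c in cols}

# Hypothetical data: (grade level, has a pet?) for 6 students
data = [("9th", "yes"), ("9th", "no"), ("10th", "yes"),
        ("10th", "yes"), ("9th", "yes"), ("10th", "no")]
rows, cols, table = two_way_table(data)
print("", *cols, sep="\t")
for r in rows:
    print(r, *(table[(r, c)] for c in cols), sep="\t")
```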
categorical graphs
- Pie Charts
- Pictographs
- Dot-plots
- Bar Graphs
- Side-by-side Bar Graphs
- Segmented Bar Graphs
- Mosaic Plots
Association
two variables have an association if knowing the value of one variable helps us predict the value of the other
Simpson’s Paradox
a contradiction between what we see when looking at individual categories and what we see in the combined totals for categorical variables: an association that holds within every group can reverse direction when the groups are combined
Quantitative Graphs
- Dot-plots
- Stem-plots
- Histogram
- Boxplots
- Ogives
how we describe distributions
CUSS + BS
-C: center
-U: unusual features (outliers, gaps)
-S: spread
-S: shape
-BS: be specific
mean
sum of all individual data values divided by the number of values
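A one-line sketch of the mean, using hypothetical quiz scores:

```python
def mean(values):
    """Sum of all data values divided by how many there are."""
    return sum(values) / len(values)

scores = [70, 80, 90, 100]  # hypothetical quiz scores
print(mean(scores))  # 85.0
```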
Statistic
a number that describes some characteristic of a sample
-ex: asking 20 random people their height and averaging results
Parameter
a number that describes some characteristic of a population
-ex: asking everyone in the population their height and averaging results
Range
difference between the maximum value and the minimum value (max – min)
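A tiny sketch of the range on hypothetical data:

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

print(data_range([12, 7, 3, 21, 15]))  # 18
```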
Standard Deviation
the typical or average distance of the values in a distribution from the mean
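A sketch of the sample standard deviation computed by hand and checked against Python's `statistics.stdev`. It divides by n – 1 (the sample version, Sx on a TI calculator); the population version divides by n instead. The data are hypothetical.

```python
import math
import statistics

def sample_sd(values):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(round(sample_sd(data), 3))         # 2.138
print(round(statistics.stdev(data), 3))  # library check: 2.138
```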
Interquartile Range (IQR)
IQR = Q3 – Q1
lower outlier test
Lower Outliers < Q1 – 1.5(IQR)
higher outlier test
Higher Outliers > Q3 + 1.5(IQR)
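A sketch of the 1.5 × IQR outlier tests. It assumes quartiles are found as the medians of the lower and upper halves (median left out when n is odd), the method many AP courses and TI calculators use. Python's `statistics.quantiles` uses a different method by default and can give slightly different quartiles. The data are hypothetical.

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return statistics.median(lower), statistics.median(upper)

def outlier_fences(values):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)  # -4.0 16.0 [30]
```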
response variable
measures an outcome of a study
explanatory variable
may help predict or explain changes in a response variable
population
the entire group we want to know about
census
collects data from every individual in a population
sample
a subset of individuals in the population
population parameter
ex: if 79% of all people everywhere have a dog, that 79% is a parameter
*it comes from everyone in the population
sample statistic
student asks 100 people if they have a dog and 70% say yes
*it’s from a small sample
convenience sampling
selects individuals who are easy to reach
voluntary response sampling
allows people to respond if they want to
SUDS
*Used for describing scatterplots (the relationship between two quantitative variables)
-S: strength (strong, moderate, weak)
-U: unusual values (outliers)
-D: direction (positive, negative)
-S: shape/form (linear, curved)
correlation (r)
measures the direction and strength of the linear relationship between two quantitative variables; always between -1 and 1:
-strong: close to 1 or -1
-does not imply causation
-does not measure form
-only for linear relationships
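A sketch of r as the sum of products of z-scores divided by n – 1, on hypothetical study-hours data. (Python 3.10+ also ships `statistics.correlation`; this version spells out the formula.)

```python
import statistics

def correlation(xs, ys):
    """r = (sum of z_x * z_y) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [60, 65, 75, 80, 90]  # hypothetical test scores
print(round(correlation(hours, scores), 3))  # 0.993 (strong, positive)
```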
Regression Line (LSRL)
line that models how a response variable y changes as an explanatory variable x changes
ŷ=a+bx
ŷ: predicted y
a: y-intercept (a = ȳ – b·x̄)
b: slope (b = r·sy/sx, where sy and sx are the standard deviations of y and x)
x: value of the explanatory variable
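A sketch that finds a and b from the formulas above (b = r·sy/sx, a = ȳ – b·x̄), using hypothetical data:

```python
import statistics

def lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x    # slope
    a = y_bar - b * x_bar  # y-intercept: the line passes through (x-bar, y-bar)
    return a, b

hours = [1, 2, 3, 4, 5]        # hypothetical data
scores = [60, 65, 75, 80, 90]
a, b = lsrl(hours, scores)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 51.5 + 7.5x
```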
extrapolation
using the regression line to predict y for x-values far outside the interval of x-values used to fit the line; these predictions are often unreliable
residual
actual y – predicted y, or y – ŷ
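A sketch of residuals for hypothetical data whose least-squares line works out to ŷ = 51.5 + 7.5x. Positive residuals mean the point is above the line; for a least-squares line the residuals add up to 0.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y - predicted y, where predicted y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

hours = [1, 2, 3, 4, 5]        # hypothetical data
scores = [60, 65, 75, 80, 90]
print(residuals(hours, scores, a=51.5, b=7.5))  # [1.0, -1.5, 1.0, -1.5, 1.0]
```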
least-squares regression line (LSRL)
the line that makes the sum of the squared residuals as small as possible
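A sketch comparing the sum of squared residuals for the least-squares line of some hypothetical data (ŷ = 51.5 + 7.5x) with an arbitrary other line; no other line can do better than the LSRL.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum of (y - (a + b*x))^2 over all points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

hours = [1, 2, 3, 4, 5]        # hypothetical data
scores = [60, 65, 75, 80, 90]
print(sum_squared_residuals(hours, scores, 51.5, 7.5))  # 7.5  (least-squares line)
print(sum_squared_residuals(hours, scores, 50.0, 8.0))  # 10.0 (some other line)
```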
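coefficient of determination (r²)
the proportion of the variation in y that is accounted for by the linear model; interpret as "n% of the variability in y can be explained by the linear model with x"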
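A sketch of r² as 1 – (sum of squared residuals)/(sum of squared deviations of y from ȳ). This form equals the square of r when (a, b) is the least-squares line; the data and the line ŷ = 51.5 + 7.5x are hypothetical.

```python
def r_squared(xs, ys, a, b):
    """r^2 = 1 - SSR / SST for the line y-hat = a + b*x."""
    y_bar = sum(ys) / len(ys)
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # leftover variation
    sst = sum((y - y_bar) ** 2 for y in ys)                     # total variation in y
    return 1 - ssr / sst

hours = [1, 2, 3, 4, 5]        # hypothetical data
scores = [60, 65, 75, 80, 90]
print(f"{r_squared(hours, scores, 51.5, 7.5):.1%}")  # 98.7%
```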
stratified random sampling
selects a sample by choosing an SRS from each group and combining the SRSs into one big sample
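A sketch of stratified random sampling: an SRS from each stratum, combined into one sample. The strata and student labels are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical strata: students grouped by grade
strata = {
    "9th": ["A", "B", "C", "D"],
    "10th": ["E", "F", "G", "H"],
    "11th": ["I", "J", "K", "L"],
}
print(stratified_sample(strata, 2, seed=1))
```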
cluster sampling
selects a sample by randomly choosing clusters and including each member of selected clusters in the sample
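A sketch of cluster sampling for contrast: whole clusters are picked at random and everyone in them is included. The homerooms are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical clusters: homerooms
clusters = {
    "Room 101": ["A", "B", "C"],
    "Room 102": ["D", "E", "F"],
    "Room 103": ["G", "H", "I"],
    "Room 104": ["J", "K", "L"],
}
print(cluster_sample(clusters, 2, seed=1))
```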
systematic random sampling
selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth after that
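A sketch of systematic random sampling with k = 5 on a hypothetical ordered list of 20 people:

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k individuals, then every kth after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index from 0 to k - 1
    return population[start::k]

people = list(range(1, 21))  # hypothetical ordered list of 20 people
print(systematic_sample(people, 5, seed=2))
```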
under-coverage
when some members of the population are less likely to be chosen or can’t be chosen for a sample
non-response
when an individual chosen for the sample cannot be contacted or refuses to answer
response bias
when there is a systematic pattern of inaccurate answers
observational study
observes results without trying to influence them
experiment
deliberately imposes treatments on individuals (ideally assigned at random) to measure their responses
confounding
when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other
principles of experimental design
1.) how many in each
2.) randomization
3.) repeats?
4.) rest/stop
5.) assign to treatment groups
matched pairs design
a common experimental design for comparing two treatments that uses blocks of size 2
block
a group of experimental units that are known before the experiment to be similar in some way in terms of response to treatment
random selection
allows inference about the population from which the individuals were chosen
random assignment
allows for inference about cause and effect
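A sketch of random assignment: shuffle the experimental units and split them into two equal groups. The subject labels are hypothetical.

```python
import random

def randomly_assign(units, seed=None):
    """Shuffle the units and split them into two equal-size treatment groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]  # hypothetical subjects
treatment, control = randomly_assign(subjects, seed=3)
print("treatment:", treatment)
print("control:", control)
```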
law of large numbers
if we observe more and more trials of any random process, the proportion of times that a specific outcome occurs approaches its probability
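A simulation sketch of the law of large numbers with a fair coin: as the number of flips grows, the proportion of heads settles near 0.5. The seed only makes runs repeatable.

```python
import random

def proportion_of_heads(n_flips, seed=None):
    """Flip a simulated fair coin n_flips times and return the proportion of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in [10, 100, 1_000, 100_000]:
    print(n, proportion_of_heads(n, seed=42))
```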
mutually exclusive
when you have one, you CAN'T have the other (the events have no outcomes in common, so P(A and B) = 0)
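A sketch with one roll of a fair die: "even" and "1 or 3" share no outcomes, so they are mutually exclusive and P(A or B) = P(A) + P(B). The events are chosen for illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                  # roll an even number
B = {1, 3}                     # roll a 1 or a 3

def prob(event):
    """Equally likely outcomes: favorable / total."""
    return Fraction(len(event), len(outcomes))

print(A & B)                           # set()  (no outcomes in common)
print(prob(A | B), prob(A) + prob(B))  # 5/6 5/6
```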
categorical conditions
☑ randomization
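☑ independence (10% rule when sampling without replacement: n ≤ 10% of the population)
☑ Large Counts (np rules): np ≥ 10 and n(1 – p) ≥ 10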
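A sketch that checks the 10% and Large Counts conditions for a proportion. Randomization has to be checked from how the data were collected, so it is not coded. The population size of 5000 is hypothetical; for a confidence interval, AP courses typically plug in the sample proportion p̂ as p.

```python
def check_proportion_conditions(n, p, population_size):
    """Return whether the 10% and Large Counts conditions hold."""
    return {
        "10% condition": n <= 0.10 * population_size,
        "large counts": n * p >= 10 and n * (1 - p) >= 10,
    }

# Student asks 100 people; 70% say yes (hypothetical population of 5000)
print(check_proportion_conditions(n=100, p=0.70, population_size=5000))
```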
quantitative conditions
☑ randomization
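☑ independence (10% rule when sampling without replacement: n ≤ 10% of the population)
☑ Normal/Large Sample: population is Normal or n ≥ 30 (central limit theorem)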
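A matching sketch for a sample mean; the sample size and population size are hypothetical, and randomization again comes from the study design.

```python
def check_mean_conditions(n, population_size, population_is_normal=False):
    """Return whether the 10% and Normal/Large Sample conditions hold."""
    return {
        "10% condition": n <= 0.10 * population_size,
        "normal/large sample": population_is_normal or n >= 30,
    }

print(check_mean_conditions(n=20, population_size=1000))
# {'10% condition': True, 'normal/large sample': False}
```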