AP stats summer Flashcards
learn all vocab
Statistics
the study of variability
Variability
how things differ
2 branches of ap stats
inferential and descriptive
descriptive stats
describe the data you collected, using summaries such as mean, median, mode, and range
inferential stats
use the data you collected to draw conclusions about the big picture (the population)
data
any collected info
population
the group you are interested in
sample
A subset of a population, often taken to make inferences
about a population.
Compare population to sample.
Populations are generally large and samples are small
subsets of a population. We take samples to make
inferences about populations. We use statistics to estimate
parameters
Compare data to statistics
Data is each little bit of information collected from the
subjects…They are the INDIVIDUAL little things we collect…
we summarize them by, for example, finding the mean of a
group of data. If it is a sample, then we call that mean a
statistic. If we have data from every member of a
population, then that mean is called a “parameter”
Compare descriptive to inferential
STATS.
Descriptive explains to you about the data that you have,
inference uses the data you have to try to say something
about an entire population.
Compare data to parameters
Data is each little bit of information collected from the
subjects…They are the INDIVIDUAL little things we collect…
we summarize them by, for example, finding the mean of a
group of data. If it is a sample, then we call that mean a
statistic. If we have data from every member of a
population, then that mean is called a “parameter”
parameter
A numerical summary of a population. Like a mean,
median, range,…of a population
statistic
A numerical summary of a sample. Like a mean, median,
range,…of a sample
We are curious about the average wait time at a Dunkin Donuts drive through in your neighborhood. You randomly sample cars one afternoon and find the average wait time is 3.2 minutes. What is the population parameter? What is the statistic? What is the parameter of interest? What is the data?
The parameter is the true average wait time at that Dunkin
Donuts. This is a number you don’t have and will never
know. The statistic is “3.2 minutes.” It is the average of
the data you collected. The parameter of interest is the
same thing as the population parameter. In this case, it is
the true average wait time. The data are the wait times of
each individual car, so that
would be like “3.8min, 2.2min, 0.8min, 3min”. You take that
data and find the average. That average is called a
“statistic” and you use that to make an inference about the
true parameter.
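A minimal Python sketch of the data-to-statistic step in this card. The four wait times are the illustrative values quoted above, not the actual sample that produced 3.2 minutes, and the variable names are assumptions.

```python
from statistics import mean

# DATA: individual wait times in minutes (illustrative values only).
wait_times = [3.8, 2.2, 0.8, 3.0]

# STATISTIC: one number that summarizes the sample.
sample_mean = mean(wait_times)

# PARAMETER: the true average wait time for all cars. It stays unknown;
# the statistic is only an estimate of it.
print(f"statistic: {sample_mean:.2f} minutes")
```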
Compare DATA-STATISTIC-PARAMETER using Categorical Data
Data are individual measures…like meal preference “taco,
taco, pasta, burger, burger, taco”…Statistics and parameters
are summaries. A statistic would be “42% of the sample
preferred tacos.” A parameter would be “42% of the
population preferred tacos.”
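A small Python sketch of turning categorical data into a statistic, using the six meal preferences listed in this card. For these six values the sample proportion works out to 50%, so the 42% in the card should be read as coming from some other, larger sample.

```python
# DATA: one category per person (the six values from the card).
meals = ["taco", "taco", "pasta", "burger", "burger", "taco"]

# STATISTIC: the proportion of the sample that preferred tacos.
taco_share = meals.count("taco") / len(meals)

print(f"{taco_share:.0%} of the sample preferred tacos")
```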
Compare DATA-STATISTIC-PARAMETER using Quantitative Data
Data are individual measures, like how long a person can
hold their breath: “45 sec, 64 sec, 32 sec, 68 sec.” That is
the raw data. Statistics and parameters are summaries like
“the average breath holding time in the sample was 52.4
seconds” and a parameter would be “the average breath
holding time in the population was 52.4 seconds”
census
Like a sample of the entire population, you get info from
every member of the population.
does a census make sense
A census is ok for small populations (like Mr. Nystrom’s
students) but impossible if you want to survey “all US
teens”
What is the difference between a
parameter and a statistic?
BOTH ARE A SINGLE NUMBER SUMMARIZING A
LARGER GROUP OF NUMBERS… But pppp parameters
come from pppp populations… sss statistics come from ssss
samples.
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on a bunch of them and one of them had 9 pickles, then the number 9 from that burger would be called ______________
a datum, or a data value.
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on a bunch of them and the average number of pickles was 9.5, then 9.5 is considered a __________
statistic. (it is a summary of a sample.)
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on a bunch of them and I do this because I want to know the true average number of pickles on a burger at FIVE GUYS, the true average number of pickles is considered a ____________
parameter, a one number summary of the population. The
truth. AKA the parameter of interest.
What is the difference between a
sample and a census?
With a sample, you get information from a small part of
the population. In a census, you get info from the entire
population. You can get a parameter from a census, but
only a statistic from a sample.
Use the following words in one
sentence: population, parameter, census,
sample, data, statistics, inference,
population of interest.
I was curious about a population parameter, but a census
was too costly so I decided to choose a sample, collect
some data, calculate a statistic and use that statistic to
make an inference about the population parameter (aka the
parameter of interest).
If you are tasting soup.. Then the flavor
of each individual thing in the spoon is the
________, the entire spoon is a ______..
The flavor of all of that stuff together is like
the _____ and you use that to __________
about the flavor of the entire pot of soup,
which would be the__________.
If you are tasting soup. Then the flavor of each individual
thing in the spoon is DATA, the entire spoon is a
SAMPLE. The flavor of all of that stuff together is like the
STATISTIC, and you use that to MAKE AN INFERENCE
about the flavor of the entire pot of soup, which would be
the PARAMETER. Notice you are interested in the
parameter to begin with… that is why you took a sample.
random variables
If you randomly choose people from a list, then their hair
color, height, weight and any other data collected from
them can be considered random variables.
What is the difference between
quantitative and categorical variables?
Quantitative variables are numerical measures, like height
and IQ. Categorical are categories, like eye color and
music preference
What is the difference between
quantitative and categorical data?
The data is the actual gathered measurements. So, if it is
eye color, then the data would look like this “blue, brown,
brown, brown, blue, green, blue, brown etc.” The data from
categorical variables are usually words; often it is simply
“YES, YES, YES, NO, YES, NO.” If it was weight, then the
data would be quantitative like “125, 155, 223, 178, 222,
etc..” The data from quantitative variables are numbers.
What is the difference between discrete
and continuous variables?
Discrete variables can be counted, like “number of cars sold”; they
are generally integers (you wouldn’t sell 9.3 cars). Continuous
variables are measured and can take any value in a range, like the
weight of a mouse: 4.344 oz
quantitative variable
Quantitative variables are numeric like: Height, age, number
of cars sold, SAT score
categorical variable
categorical variables are like categories: Blonde, Listens to Hip Hop, Female, yes, no,… etc.
what can a categorical variable be called
qualitative
quantitative data
The actual numbers gathered from each subject. 211
pounds. 67 beats per minute.
categorical data
The actual individual category from a subject, like “blue” or
“female” or “sophomore”
random sample
When you choose a sample by rolling dice, choosing
names from a hat, or some other TRULY RANDOM method.
Humans can’t really do this well without the help
of a calculator, cards, dice, or slips of paper.
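One way to let a computer do the "names from a hat" step, sketched in Python. The population list and the seed are made up for illustration; the seed is there only so the example gives the same result each run.

```python
import random

# Hypothetical population: 100 numbered students.
population = [f"student_{i}" for i in range(1, 101)]

rng = random.Random(2024)  # seeded only to make the example repeatable

# Draw 10 different students, each equally likely to be picked.
sample = rng.sample(population, k=10)
print(sample)
```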
frequency
how often something comes up
data or datum
datum is singular.. Like “hey dude, come see this datum I
got from this rat!” data is the plural.. “hey look at all that
data Edgar got from those chipmunks over there!!”
frequency distribution
A table or chart that shows how often certain values or
categories occur in a data set.
relative frequency
The PROPORTION (or percent) of the time something comes up:
frequency/total
How do you find relative frequency?
divide the frequency by the total
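A short Python sketch of frequency and relative frequency, using the eight eye colors listed in the categorical data card. The variable names are assumptions.

```python
from collections import Counter

# DATA: eye colors from the earlier card.
eye_colors = ["blue", "brown", "brown", "brown", "blue", "green", "blue", "brown"]

# FREQUENCY: how often each category comes up.
frequency = Counter(eye_colors)

# RELATIVE FREQUENCY: frequency divided by the total.
total = sum(frequency.values())
relative_frequency = {color: count / total for color, count in frequency.items()}

for color, share in relative_frequency.items():
    print(color, frequency[color], f"{share:.1%}")
```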
cumulative frequency
ADD up the frequencies as you go. Suppose you are
selling 25 pieces of candy. You sell 10 the first hour, 5
the second, 3 the third and 7 in the last hour, the
cumulative frequency would be 10, 15, 18, 25
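The candy example as a Python sketch: a running total of the hourly sales gives the cumulative frequencies.

```python
from itertools import accumulate

# Pieces of candy sold in each of the four hours.
hourly_sales = [10, 5, 3, 7]

# Add up the frequencies as you go.
cumulative = list(accumulate(hourly_sales))
print(cumulative)  # [10, 15, 18, 25]
```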
relative cumulative frequency
It is the ADDED up PERCENTAGES.. An example is
selling candy, 25 pieces sold overall…, with 10 the first
hour, 5 the second, 3 the third, and 7 the fourth hour,
we’d take the cumulative frequencies, 10, 15, 18 and 25
and divide by the total giving cumulative percentages… .40,
.60, .72, and 1.00. Relative cumulative frequencies always
end at 100 percent.
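The same candy numbers as a Python sketch of relative cumulative frequency: each cumulative count is divided by the total of 25 (so the third value is 18/25 = 0.72).

```python
from itertools import accumulate

hourly_sales = [10, 5, 3, 7]
total = sum(hourly_sales)  # 25 pieces overall

# Running totals, then each one divided by the overall total.
cumulative = list(accumulate(hourly_sales))
relative_cumulative = [count / total for count in cumulative]

print(relative_cumulative)  # [0.4, 0.6, 0.72, 1.0]
```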
What is the difference between a bar
chart and a histogram
bar charts are for categorical data (bars don’t touch) and
histograms are for quantitative data (bars touch)
mean
the average: add up all the values and divide by how many
there are. It is the balancing point of the histogram
What is the difference between a
population mean and a sample mean?
population mean is the mean of a population, it is a
parameter, sample mean is a mean of a sample, so it is a
statistic. We use sample statistics to make inferences
about population parameters.
What symbols do we use for population
mean and sample mean?
Population mean = μ
Sample mean = 𝑥̅
Mu for population mean, xbar for sample mean.
How can you think about the mean and
median to remember the difference when
looking at a histogram?
mean is balancing point of histogram, median splits the
area of the histogram in half.
median
the middlest number when the data are put in order; it splits
the area in half (always in the POSITION (n+1)/2; if n is even,
that lands between two values, so average them)
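A Python sketch of the (n+1)/2 position rule with one odd-sized and one even-sized data set. The numbers and the helper name `median_position` are made up for illustration.

```python
from statistics import median

def median_position(n):
    """1-based position of the median in a sorted list of n values."""
    return (n + 1) / 2

odd_data = sorted([7, 1, 5, 3, 9])   # [1, 3, 5, 7, 9]
even_data = sorted([4, 8, 6, 2])     # [2, 4, 6, 8]

# n = 5: position 3.0, so the median is the 3rd value.
print(median_position(len(odd_data)), median(odd_data))

# n = 4: position 2.5, so average the 2nd and 3rd values.
print(median_position(len(even_data)), median(even_data))
```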
mode
the most common, or the peaks of a histogram. We often
use mode with categorical data
when do you use mode
With categorical variables. For instance, to describe the
average teenager’s preference, we often speak of what
most students chose, which is the mode. It also tells the
number of bumps in a histogram for quantitative data
(unimodal, bimodal, etc…)
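A Python sketch of finding the mode of categorical data. The music preferences are invented for illustration; `multimode` is used for the tie case because it returns every most-common value.

```python
from statistics import mode, multimode

# Hypothetical music preferences from six students.
music = ["hip hop", "pop", "hip hop", "rock", "hip hop", "pop"]
favorite = mode(music)  # the single most common category

# When two categories tie, multimode lists both.
tied = multimode(["pop", "rock", "pop", "rock", "jazz"])

print(favorite, tied)
```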
Why don’t we always use the mean,
we’ve been calculating it all of our life ?
It is not RESISTANT (resilient): it gets pulled by skewness and
outliers
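A Python sketch of how one outlier moves the mean but barely moves the median. The incomes (in thousands of dollars) are invented for illustration.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars.
incomes = [40, 45, 50, 55, 60]
with_outlier = incomes + [1000]  # add one very large value

# Without the outlier the mean and median agree.
print(mean(incomes), median(incomes))

# With the outlier the mean jumps; the median hardly changes.
print(round(mean(with_outlier), 1), median(with_outlier))
```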
When we say “the average teenager”
are we talking about mean, median or
mode?
It depends, if we are talking height, it might be the mean,
if we are talking about parental income, we’d probably use
the median, if we were talking about music preference,
we’d probably use the mode to talk about the average
teenager.