Chapters 1 and 3 Flashcards
What is Statistics?
The study of variability
What is Variability?
- how things differ
- it exists everywhere
- statisticians pay close attention to differences
ex. we all look and act differently
What are the two branches of AP Statistics?
Inferential and Descriptive
What are Descriptive Stats?
Used to describe the basic features of data in a study
ex. pictures, summaries such as mean, median, and mode, etc.
What are Inferential Stats?
uses a random sample of data taken from a population to describe/make inferences about the population (the big picture)
ex. tasting soup; take one sip to determine what the whole soup tastes like
Compare Descriptive and Inferential Stats
- descriptive: explains the data you have
- inferential: uses that data to say something about an entire population
What is Data?
any collected info
ex. survey about liking pizza; yes, no, yes, yes, no…
ex. number of cookies eaten/minute: 5, 4, 6, 7, 3…
What is a Population?
the group we are interested in (sizes may vary)
ex. “all teens in the US” or “all AP Stat students in my school”
What is a Sample?
a subset of a population
- taken to make inferences about a population
- all statistics are calculated from samples
Compare Population to Sample
Populations: generally large
Samples: small subsets of the population; taken to make inferences about population
Compare Data to Statistics
Data: each individual bit of info collected from subjects; summarized by mean, median, mode, etc.
Statistics: the descriptions/summaries used for SAMPLES (mean, median, range, etc.)
Compare Data to Parameters
Data: each individual bit of info collected from subjects; summarized by mean, median, mode, etc.
Parameters: the descriptions/summaries used for POPULATIONS (mean, median, range, etc.)
What is a Parameter?
a numerical summary of a population
ex. mean, median, range
What is a Statistic?
a numerical summary of a sample
ex. mean, median, range
Average wait time at a Dunkin Donuts drive thru. Cars are randomly sampled. Average wait time = 3.2 minutes. Population Parameter? Statistic? Data? Parameter of Interest?
Population Parameter: the true wait time (we will never know/have)
Statistic: average wait time (3.2 minutes)
Parameter of Interest: Population Parameter
Data: Wait time of each car
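A minimal sketch of the data / statistic / parameter roles in the drive-thru example. The population of wait times here is invented (a hypothetical 10,000 cars), so unlike real life the parameter can actually be computed and compared with the statistic.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: wait times (minutes) for 10,000 cars.
# These values are invented for illustration, not real drive-thru data.
population = [random.uniform(1.0, 6.0) for _ in range(10_000)]

# Parameter: the true mean of the whole population (normally unknown).
parameter = statistics.mean(population)

# Data: the individual wait times of 50 randomly sampled cars.
data = random.sample(population, 50)

# Statistic: the mean of the sample, used to estimate the parameter.
statistic = statistics.mean(data)

print(f"parameter (true mean):   {parameter:.2f} min")
print(f"statistic (sample mean): {statistic:.2f} min")
```

The two printed values should be close but not identical, which is the whole idea of inference: the statistic estimates the parameter.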
Compare Data-Statistic-Parameter using Categorical example
Data: individual measures
ex. meal preference: taco, pasta, burger, pizza, taco...
Statistics and Parameters are summaries
ex. STAT: 42% of sample prefer tacos
ex. PARA: 42% of population prefer tacos
Compare Data-Statistic-Parameter using Quantitative example
Data: individual measures
ex. how long a person can hold their breath (sec): 45, 64, 32, 68 (raw data)
Statistics and Parameters are summaries
ex. STAT: avg. breath-holding time of sample= 52.4 sec
ex. PARA: avg. breath-holding of pop. = 52.4 sec
What is a Census?
Information taken from each member of a population
Does a Census make sense?
a census works for small populations (Mr. Nystrom’s students); impossible for large populations (all US kids)
What is the difference between a Parameter and a Statistic?
Parameters come from Populations
Statistics come from Samples
If I take a random sample of 20 hamburgers from Five Guys and count the number of pickles on a bunch of them, and one of them had 9 pickles, then the 9 from that burger would be called ______?
A Datum or Data Value
If I take a random sample of 20 hamburgers from Five Guys and count the number of pickles on a bunch of them, and the average number of pickles was 9.5, then the 9.5 is considered a ______?
Statistic (it is a summary of a sample)
If I take a random sample of 20 hamburgers from Five Guys and count the number of pickles on a bunch of them, and I do this b/c I want to know the true average number of pickles on a burger at Five Guys, the true average number is called a ______?
Parameter; a one number summary of a population (aka the parameter of interest)
What is the difference between a sample and a census?
Samples contain info from a small part of a population. A Census contains info from the entire population.
Use the following words in one sentence: Population, Parameter, Census, Sample, Data, Statistics, Inference, Population of Interest
I was curious about a population parameter, but a census was too costly, so I decided to choose a sample, collect some data, calculate a statistic, and use it to make an inference about the population of interest.
If you are tasting soup, then the flavor of each individual thing in the spoon is the ___, the entire spoon is a ____. The flavor of all that stuff together is like the ______ and you use that to ______ about the flavor of the entire pot of soup, which would be the ______.
- The flavor of each individual thing in the spoon is DATA
- The entire spoon is the SAMPLE
- The flavor of all the stuff together is the STATISTIC
- MAKE AN INFERENCE about the flavor of the whole pot of soup, which would be the PARAMETER
What are Random Variables?
A variable whose possible values are the outcomes of a random phenomenon
ex. hair color, height, weight, etc.
What is the difference between Quantitative and Categorical Variables?
Quantitative variables are numerical (height, IQ, etc.)
Categorical variables are categories (eye color, favorite music genre, etc.)
What is the difference between Quantitative and Categorical Data?
Quantitative: numerical
ex. measuring weight, data would be: 125, 155, 223, 178, 222, etc.
Categorical: categorical
ex. eye color: blue, brown, brown, brown, blue, green, etc. ; often uses words like yes and no
What is the difference between Discrete and Continuous Variables?
Discrete variables can be counted (ex. number of cars sold) and are usually whole numbers.
Continuous variables cannot be counted; they are usually measurements (ex. weight of a mouse: 4.344 oz.)
What is a Quantitative Variable?
Numeric values
ex. height, age, number of cars sold, SAT score
What is a Categorical Variable?
Categories
ex. blonde, listens to hip hop, female, yes, no
What do we sometimes call a Categorical Variable?
Qualitative
What is Quantitative Data?
The actual numbers gathered from each subject: 211 lbs., 67 bpm, etc.
What is Categorical Data?
The actual individual category from a subject, like “blue”, “female”, or “sophomore”.
What is a Random Sample?
Randomly choosing subjects from a population.
Ex. rolling dice, choosing names from a hat, etc.
A real random sample requires external help (humans cannot do this on their own)
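One way to get that external help is a computer's random number generator. A minimal sketch, assuming a hypothetical population of 30 student ID numbers; `random.sample` picks without replacement, so no subject is chosen twice.

```python
import random

# Hypothetical population: 30 student ID numbers (invented for illustration).
population = list(range(1, 31))

# Let the computer do the choosing: 5 different subjects, each equally likely.
sample = random.sample(population, 5)

print(sample)
```

The output changes on every run, which is the point: nobody gets to pick favorites.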
What is Frequency?
How often something comes up.
Data or Datum?
Datum is singular (collecting a datum from a rat)
Data is plural (collecting data from a group of rats)
What is a Frequency Distribution?
A table or a chart that shows how often certain values or categories occur in a data set.
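A small sketch of a frequency distribution, using the six eye colors listed in the categorical data card. The variable names are only for illustration.

```python
from collections import Counter

# Eye-color data from the categorical data example in this deck.
data = ["blue", "brown", "brown", "brown", "blue", "green"]

# Counter tallies how often each category occurs.
frequency = Counter(data)

# Print the table, most common category first.
for category, count in frequency.most_common():
    print(category, count)
```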
What is meant by Relative Frequency?
The PERCENT of times something comes up (frequency/total)
How do you find Relative Frequency?
Divide frequency by the Total
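The division can be sketched like this, again using the six eye colors from the categorical data card (names are illustrative):

```python
from collections import Counter

data = ["blue", "brown", "brown", "brown", "blue", "green"]

frequency = Counter(data)        # how often each category occurs
total = sum(frequency.values())  # total number of observations

# Relative frequency = frequency / total for each category.
relative_frequency = {
    category: count / total for category, count in frequency.items()
}

for category, share in relative_frequency.items():
    print(f"{category}: {share:.0%}")
```

The relative frequencies always add up to 1 (100%).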
What is meant by Cumulative Frequency?
Add up the frequencies as you go down the table
Ex. selling candy; sell 10 in hour 1, then 5, 3, and 7. Cumulative Frequency is 10, 15, 18, 25
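The running total in the candy example can be sketched with `itertools.accumulate` from the Python standard library:

```python
from itertools import accumulate

# Candy sold in each hour, from the example above.
sold_per_hour = [10, 5, 3, 7]

# accumulate() keeps a running total as it moves down the list.
cumulative = list(accumulate(sold_per_hour))

print(cumulative)  # [10, 15, 18, 25]
```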
Make a guess as to what Relative Cumulative Frequency is
The added up percentages
Ex. selling candy; sell 10 in hour 1, then 5, 3, and 7. Cumulative Frequency is 10, 15, 18, 25. Divide each by the total (25), giving 0.40, 0.60, 0.72, and 1.00.
- always ends at 100%
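A quick sketch of the same calculation for the candy example: build the running totals, then divide each one by the overall total.

```python
from itertools import accumulate

# Candy sold in each hour, from the example above.
sold_per_hour = [10, 5, 3, 7]
total = sum(sold_per_hour)  # 25

# Running totals, then each running total as a share of the overall total.
cumulative = list(accumulate(sold_per_hour))
relative_cumulative = [round(c / total, 2) for c in cumulative]

print(relative_cumulative)  # [0.4, 0.6, 0.72, 1.0]
```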
What is the difference between a Bar Chart and a Histogram?
Bar Charts are for categorical data (bars don’t touch)
Histograms are for quantitative data (bars touch)
What is Mean?
Average; it is the balancing point of a histogram
What is the difference between Population Mean and a Sample Mean?
Population Mean: mean of a population; a parameter
Sample Mean: mean of a sample; a statistic
What symbols are used for Population and Sample Mean?
Mu (μ) for population mean
X-Bar (x̄) for sample mean
How can you think about the Mean and Median to remember the difference when looking at a histogram?
Mean is the balancing point of a histogram
Median splits the area of the histogram in half
What is Median?
The number in the middle; splits area in half (always in position (n+1)/2 of the sorted data)
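A sketch of the position rule, assuming a non-empty list of numbers. The helper name `median_by_position` is made up for this card. When (n + 1) / 2 lands between two positions (even n), the median is the average of the two neighboring values.

```python
import statistics

def median_by_position(values):
    """Find the median using the (n + 1) / 2 position in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position in the sorted list
    if position.is_integer():
        # Odd n: the median is the single middle value.
        return ordered[int(position) - 1]
    # Even n: position ends in .5, so average the values on either side.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

cookies = [5, 4, 6, 7, 3]           # cookies eaten per minute (n = 5)
wallets = [1, 2, 2, 5, 5, 8, 8, 9]  # dollars in each wallet (n = 8)

print(median_by_position(cookies))  # position 3 of the sorted data
print(median_by_position(wallets))  # position 4.5, between the 4th and 5th values
```

Both results match what `statistics.median` from the standard library returns.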
What is Mode?
The most common values; peaks in a histogram.
When do we often use Mode?
Categorical Variables
Ex. Describing average preference, we need to find what “most” students chose
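A short sketch of finding the mode of categorical data, using the five meal preferences listed in the categorical example. `statistics.multimode` (Python 3.8+) returns every value tied for most common.

```python
import statistics

# Meal preferences from the categorical example in this deck.
data = ["taco", "pasta", "burger", "pizza", "taco"]

most_common = statistics.mode(data)      # single most common value
all_modes = statistics.multimode(data)   # list of all values tied for most common

print(most_common)
print(all_modes)
```

Mean and median make no sense for words like these, which is why mode is the usual choice for categorical variables.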
Why don’t we always use Mean?
It is not resilient; it is impacted by skewness and outliers
When we say “The average teen”, are we talking about mean, median, or mode?
It depends; if it is height, it could be mean, if it is parental income, we would use median, if it is musical preference, we would use mode
What is a clear example of where the mean would change but median wouldn’t? (this would show the mean’s lack of resilience)
8 ppl. are asked how much they have in their wallet: {1,2,2,5,5,8,8,9}. Mean and median are both 5. If one person just got back from a casino, the new set is {1,2,2,5,5,8,8,9000}. Median is still 5, but mean is over 1000.
- Median would be used as the average here, not the mean (9000 is an outlier.)
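The wallet example can be checked with Python's `statistics` module; a minimal sketch:

```python
import statistics

before = [1, 2, 2, 5, 5, 8, 8, 9]     # original wallets
after = [1, 2, 2, 5, 5, 8, 8, 9000]   # one person back from the casino

mean_before = statistics.mean(before)
median_before = statistics.median(before)
mean_after = statistics.mean(after)
median_after = statistics.median(after)

print("before:", mean_before, median_before)  # mean 5, median 5
print("after: ", mean_after, median_after)    # mean 1128.875, median 5
```

One outlier moved the mean from 5 to over 1000, while the median did not move at all.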
How are Mean, Median, and Mode positioned in a skewed left histogram?
Goes in order from left to right: Mean-Median-Mode
How are Mean, Median, and Mode positioned in a skewed right histogram?
Goes in the opposite order: Mode-Median-Mean
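A sketch of both orderings using two small invented data sets (hypothetical numbers chosen to show the pattern). This ordering is a rule of thumb for typical skewed data, not a guarantee for every data set.

```python
import statistics

# Hypothetical data sets, invented to show the pattern.
skewed_right = [1, 2, 2, 2, 3, 3, 4, 10]   # long tail on the high side
skewed_left = [1, 7, 8, 8, 9, 9, 9, 10]    # long tail on the low side

def summarize(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )

right_mode, right_median, right_mean = summarize(skewed_right)
left_mode, left_median, left_mean = summarize(skewed_left)

# Skewed right: mode < median < mean (the mean is pulled toward the high tail).
print(right_mode, right_median, right_mean)

# Skewed left: mean < median < mode (the mean is pulled toward the low tail).
print(left_mean, left_median, left_mode)
```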
Who chases the tail?
The mean chases the tail, the mean chases the tail, high-ho the derry-oh the mean chases the tail… and outliers