test 1 Flashcards
Statistics
Study of variability
Variability
All things have differences, and statisticians look at these differences
2 branches of AP stats
inferential and descriptive
descriptive stats
describe collected data using pictures or summaries like mean, median, range, etc…
inferential stats
look at data of sample and use it to tell about the population
compare descriptive and inferential stats
descriptive explains about data; inferential uses data of sample to tell about an entire population
data
any collected info., generally each little measurement
population
group of interest, can be big or small
sample
A subset of population, taken to make inferences about the population, calculate statistics from samples
compare population to sample
populations generally are large, samples are small subsets of population; take samples to make inferences about populations, use statistics to estimate parameters
compare data to statistics
data is the individual bits of info collected, summarize data by, ex. finding mean of a group of data,
mean of sample is statistic, if data is from each member of population, mean is parameter
compare data to parameters
data is the individual bits of collected info, summarize data by, ex. finding mean of a group of data,
mean of sample is statistic, summary of sample; if data is from each member of population, mean is parameter, summary of population
parameter
numerical summary of a population like mean, median, mode
statistic
numerical summary of a sample like mean, median, mode
Curious about average wait time at Dunkin Donuts drive through: randomly sample cars and find the average wait time is 3.2 minutes. What is the population parameter, statistic, parameter of interest, data?
parameter is the true average wait time at that Dunkin Donuts, a number you don’t have and will never know. Statistic is 3.2 minutes, average of data collected. Parameter of interest= population parameter. Data is the wait time of each individual car, like “3.8 min, 2.2 min, 0.8 min.” Average of that data is statistic, and use that to make inference about the true parameter
Compare DATA-STATISTIC-PARAMETER using categorical example
words, put into groups, data are individual measures, statistics and parameters are summaries;
“taco, taco, pasta, taco, burger, burger, taco…” statistic: 42% of sample preferred tacos, and parameter: 42% of population preferred tacos
Compare DATA-STATISTIC-PARAMETER using quantitative example
numbers,
“45 sec, 64 sec, 32 sec, 68 sec,” raw data, statistic: average breath holding time in the sample was 52.4 sec, and parameter: average breath holding time in population was 52.4 sec
census
sample of entire population, info. from every member of population
does a census make sense?
census is ok for small populations, impossible for big populations
difference between a parameter and a statistic
both are numbers summarizing a larger group of numbers; parameter from population, statistic from sample
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on each of them… and one of them had 9 pickles, then the number 9 from that burger would be called?
a datum, or a data value
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on each of them… and the average number of pickles was 9.5, then 9.5 is considered a?
statistic (a summary of a sample)
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on each of them… and I want to know the true average number of pickles on a burger, which is considered?
parameter, a one number summary of the population (parameter of interest)
difference between a sample and a census
sample, info. from a small part of population; census, info. from entire population; parameter from a census, statistic from a sample
population, parameter, census, sample data, statistics, inference, parameter of interest
I was curious about a population parameter, but a census was too costly so I decided to choose a sample, collect some data, calculate a statistic and use that statistic to make an inference about the population parameter (parameter of interest)
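The chain on this card can be sketched in Python. Everything here is made up for illustration: the population of 1,000 wait times, the sample size of 20, and the variable names are all assumptions, not data from the cards.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: wait times (minutes) for 1,000 cars.
# In real life this full list is what a census would have to collect.
population = [round(random.uniform(0.5, 6.0), 1) for _ in range(1000)]

# Parameter: numerical summary of the whole population (usually unknown).
parameter = statistics.mean(population)

# Sample: a random subset of the population; each value in it is a datum.
sample = random.sample(population, 20)

# Statistic: numerical summary of the sample, used to estimate the parameter.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two printed numbers will usually be close but not equal, which is the point: the statistic is an estimate of the parameter, not the parameter itself.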
If you are tasting soup…
If you are tasting soup… then the flavor of each individual thing in the spoon is DATA, the entire spoon is a SAMPLE. The flavor of all stuff together is like the STATISTIC, and you use that to MAKE AN INFERENCE about the flavor of the entire pot of soup, which would be the PARAMETER. Notice you are interested in the parameter to begin with… that is why you took a sample
random variables
randomly choosing people from a list; hair color, height, weight and any other data collected from them can be considered random variables
difference between quantitative and categorical variables
quantitative variables, numerical measures; categorical, categories in words
difference between quantitative and categorical data
data from quantitative variables are numbers, data from categorical variables are words
difference between discrete and continuous variables
discrete variables are counted and usually take whole-number values; continuous variables have to be measured and can take any value in a range, including decimals
quantitative variable
quantitative variables are numeric
categorical variable
categories
another name for categorical variable
qualitative
quantitative data
actual numbers gathered from each subject
categorical data
actual individual category from a subject
random sample
a sample chosen by a truly random process (like a random number generator), not by convenience or personal choice
frequency
how often something comes up
data or datum
datum singular, data plural
frequency distribution
table or chart, show how often certain values or categories occur in a data set
relative frequency
the percent of time something comes up (frequency/total)
how to find relative frequency
divide frequency by TOTAL
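A small sketch of frequency and relative frequency in Python. The list of food preferences is illustrative only (modeled on the taco example earlier in this set), and the variable names are assumptions.

```python
from collections import Counter

# Illustrative categorical data: one food preference per person.
data = ["taco", "taco", "pasta", "taco", "burger", "burger", "taco"]

# Frequency: how often each category comes up.
frequency = Counter(data)

# Relative frequency: frequency divided by the TOTAL number of data values.
total = len(data)
relative_frequency = {
    category: count / total for category, count in frequency.items()
}

for category, count in frequency.most_common():
    share = relative_frequency[category]
    print(f"{category}: frequency={count}, relative frequency={share:.2f}")
```

The relative frequencies always add up to 1 (100%), which is a quick way to check the arithmetic.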
cumulative frequency
add up the frequencies as you go;
ex. selling 25 pieces of candy, sell 10 the first hour, 5 the second, 3 the third and 7 in the last, the cumulative frequency would be 10, 15, 18, 25
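The candy example can be checked with a running total. This is a minimal sketch using the hourly sales from the card; the variable names are assumptions.

```python
from itertools import accumulate

# Candy sold in each of the four hours (from the example above).
hourly_sales = [10, 5, 3, 7]

# Cumulative frequency: add up the frequencies as you go.
cumulative = list(accumulate(hourly_sales))

print(cumulative)  # [10, 15, 18, 25]
```

The last cumulative frequency always equals the total (here, 25 pieces).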
relative cumulative frequency
add up percentages, take cumulative frequencies and divide by the total giving cumulative percentages
ex. 10/25=0.4, 15/25=0.6, 18/25=0.72, 25/25=1.0; relative cumulative frequencies always end at 100%
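The same division can be sketched in Python, starting from the cumulative frequencies of the candy example (10, 15, 18, 25). Variable names are assumptions. Note that 18/25 works out to 0.72.

```python
# Cumulative frequencies from the candy example.
cumulative = [10, 15, 18, 25]

# The total is the last cumulative frequency.
total = cumulative[-1]

# Relative cumulative frequency: each running total divided by the total.
relative_cumulative = [count / total for count in cumulative]

print(relative_cumulative)  # [0.4, 0.6, 0.72, 1.0]
```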
difference between a bar chart and a histogram
bar charts are for categorical data (bars don’t touch) and histograms are for quantitative data (bars touch)
mean
average; balancing point of the histogram
difference between a population mean and a sample mean
population mean=parameter, sample mean=statistic
symbols for population mean and sample mean
μ for population mean, x̄ for sample mean
how can you think about the mean and median to remember the difference when looking at a histogram?
mean is balancing point of histogram, median splits the area of the histogram in half
median
middle number of the sorted data, splits area in half (always in the position (n+1)/2; when n is even that position falls between two values, so average them)
mode
the most common, peaks of a histogram, often use mode with categorical data
when do we often use mode
with categorical variables, describe the average preference, speak of what “most” chose; mode also tells the number of bumps in a histogram for quantitative data (unimodal, bimodal, etc…)
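A quick sketch of both uses of the mode, with Python's standard `statistics` module (`mode` and `multimode`, available in Python 3.8+). The data values are made up for illustration.

```python
import statistics

# Categorical data: the mode is the category "most" people chose.
music = ["pop", "rock", "pop", "rap", "pop", "rock"]
print(statistics.mode(music))  # pop

# Quantitative data with two equally common values: bimodal.
scores = [2, 2, 2, 3, 7, 7, 7, 9]
print(statistics.multimode(scores))  # [2, 7]
```

`multimode` returns every value tied for most common, so the length of its result shows whether the data are unimodal, bimodal, and so on.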
why don’t we always use mean
it is not RESILIENT (also called resistant), can be impacted by skewness and outliers
when we say “the average teenager” are we talking about mean, median or mode?
depends:
height, mean
parental income, median
music preference, mode
clear example of where the mean would change but median wouldn’t? (show its resilience)
eight people with money (1, 2, 2, 5, 5, 8, 8, 9), mean and median are both 5
if one has big bucks (1, 2, 2, 5, 5, 8, 8, 9000) median still 5, mean goes over 1000; here, 5 is a better description of the average person in this group and 9000 is an outlier
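The eight-person example can be verified with the standard `statistics` module. The data are the two lists from the card; the variable names are assumptions.

```python
import statistics

group = [1, 2, 2, 5, 5, 8, 8, 9]
with_outlier = [1, 2, 2, 5, 5, 8, 8, 9000]

for label, values in [("original", group), ("with outlier", with_outlier)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label}: mean={mean:.3f}, median={median:.3f}")

# original: mean=5.000, median=5.000
# with outlier: mean=1128.875, median=5.000
```

One value changed and the mean moved by more than 1,000, while the median did not move at all.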
how are mean, median and mode positioned in a skewed left histogram?
goes in order from left to right, mean-median-mode
how are mean, median and mode positioned in a skewed right histogram?
goes in opposite order, mode-median-mean
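The two orderings can be checked on a small made-up data set. The values below are illustrative only: one list with a long right tail, and its mirror image with a long left tail. The ordering is a rule of thumb for typical skewed data, not a guarantee for every data set.

```python
import statistics

# Illustrative right-skewed data: most values are small, a few are large.
skewed_right = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the same data: skewed left.
skewed_left = [16 - value for value in skewed_right]

for label, values in [("skewed right", skewed_right), ("skewed left", skewed_left)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    print(f"{label}: mode={mode}, median={median}, mean={mean}")
```

For the right-skewed list the order from left to right is mode (2), median (3), mean (4.6); for the left-skewed list it is mean (11.4), median (13), mode (14).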
who chases the tail
the mean chases the tail and outliers