DECK 4: UNIT 1 REVIEW MIX Flashcards
P(Z>0)
The probability that a randomly chosen z score is to the right of (above) zero, the middle, is the same as the percent of area above a z score of zero in a normal model, which is 50% or 0.5.
What does SHIFT and SCALE mean?
Shift is when you add or subtract, scale is when you multiply
What is variability?
Differences: how things differ. There is variability everywhere. We all look different, act different, have different preferences. Statisticians look at these differences.
What is the mean?
The old average we used to calculate: add up all the values and divide by how many there are. It is the balancing point of the histogram.
What is Q2 also known as?
the median
not associated is the same as being ____________
independent
Compare DATA-STATISTIC-PARAMETER using categorical example
Data are individual measures like meal preference: taco, taco, pasta, taco, burger, burger, taco. Statistics and Parameters are summaries. A statistic would be 42% of sample preferred tacos and a parameter would be 42% of population preferred tacos.
What percent of the data is above Q3?
25%
If you want to calculate the probability (%) something falls between two values in a normal model, what do you do?
Find z scores for both values, then normcdf(Z LOW, Z HIGH).
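A minimal Python sketch of the same two steps, for anyone without the calculator handy. It assumes Python's built-in statistics.NormalDist as a stand-in for normcdf; the function name normal_between and the height numbers (mean 70, SD 3) are made up for illustration.

```python
from statistics import NormalDist

def normal_between(low, high, mean, sd):
    """Probability that a value from a normal model falls between low and high."""
    # Step 1: turn both values into z scores
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    # Step 2: area between the two z scores (what normcdf reports)
    standard_normal = NormalDist(0, 1)
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Hypothetical example: heights with mean 70 and SD 3, between 67 and 73
p = normal_between(67, 73, mean=70, sd=3)
print(round(p, 4))  # 0.6827
```

Here 67 and 73 are one standard deviation on either side of the mean, so the answer lands near the familiar 68%.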
Year in school (F,S,J,S) and Pizza Preference (cheese, pepperoni, veggie or other) are __________ because _______________
independent because all grades have similar preference distributions: 40% cheese, 30% pepperoni, 20% veggie, 10% other.
What does normcdf do?
It gives you the area under the normal curve between any two z scores
What is the total area under the normal curve?
1 or 1.000
What is the IQR?
Interquartile range… a measure of spread. Q3-Q1. The distance from Q1 to Q3. The regular range is Hi-Lo, this is the inner range, the interquartile range.
When there is a relationship between two variables, we say that they are
associated (or not independent)
what happens if you multiply all of a data set by a constant? Think of an example
It is scaled. Both center and spread are impacted: mean, median, stand dev, IQR and quartiles are all multiplied by that constant. Center, spread and all individual values are changed. Consider 1,2,3,4,5, with a mean of 3 and a range of 4. Now multiply by 10: 10,20,30,40,50, and you get a mean of 30 and a range of 40… both multiplied by ten.
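The 1,2,3,4,5 example can be checked with a short Python sketch. It uses only the standard statistics module; the helper value_range is a made-up name for Hi-Lo.

```python
from statistics import mean, stdev

data = [1, 2, 3, 4, 5]
scaled = [10 * x for x in data]  # multiply every value by the constant 10

def value_range(values):
    """Regular range: highest value minus lowest value."""
    return max(values) - min(values)

print(mean(data), value_range(data))      # 3 4
print(mean(scaled), value_range(scaled))  # 30 40
# The standard deviation is multiplied by the same constant
print(stdev(scaled) / stdev(data))        # about 10
```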
What are DESCRIPTIVE STATS?
Make a picture. Describe to me the data that you collected, use pictures or summaries like mean, median, range.
are there any normal samples?
no, nothing is normal, just normalish. The only normal thing is the model we use.
How can you turn OGIVES into histograms?
RECTANGLE DROP! (bin drop)
Compare population to sample
Populations are generally large, and samples are small subsets of these populations. We take samples to make inferences about populations. We use statistics to estimate parameters.
How do students often mix up IQR and St. Dev?
They INCORRECTLY think that Q1 is 1 sd below the mean and Q3 is 1 sd above the mean. THIS IS NOT TRUE!!! In a normal model, Q1 is only about .67 sd below the mean and Q3 is about .67 sd above.
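A quick way to see where the quartiles of a normal model really sit, sketched in Python with the standard statistics.NormalDist (this is an illustration, not a calculator procedure from the deck):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# z scores that cut off the bottom 25% and the bottom 75% of a normal model
q1_z = standard_normal.inv_cdf(0.25)
q3_z = standard_normal.inv_cdf(0.75)
print(round(q1_z, 2), round(q3_z, 2))  # -0.67 0.67

# Compare: the area within 1 sd of the mean is about 68%, not 50%
within_1sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(round(within_1sd, 2))  # 0.68
```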
What is the difference between quantitative and categorical variables?
Quantitative variables are numerical measures, like height and IQ. Categorical are categories, like eye color and music preference
For information purposes, which gives LEAST… stem-leaf, histogram or box-whisker?
Box/Whisker, BE CAREFUL. you really don’t know how things are distributed. The box and whisker and fish tank give a very GENERAL look.
Make a guess as to what relative cumulative frequency is?
It is the ADDED up PERCENTAGES. An example is selling candy: 25 pieces sold overall, with 10 the first hour, 5 the second, 3 the third, and 7 the fourth hour. We’d take the cumulative frequencies, 10, 15, 18 and 25, and divide by the total, giving cumulative percentages… .40, .60, .72, and 1.00. Relative cumulative frequencies always end at 100 percent. The graph of these is the OGIVE.
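Working the candy numbers through in Python (a small sketch; the variable names are just for illustration):

```python
from itertools import accumulate

hourly_sales = [10, 5, 3, 7]  # pieces of candy sold each hour

# Cumulative frequency: add up the counts as you go
cumulative = list(accumulate(hourly_sales))
print(cumulative)  # [10, 15, 18, 25]

# Relative cumulative frequency: divide each running total by the grand total
total = cumulative[-1]
relative_cumulative = [c / total for c in cumulative]
print(relative_cumulative)  # [0.4, 0.6, 0.72, 1.0]
```

The last value is always 1.0 (100 percent), which is why an ogive always tops out at 100%.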
How can you describe spread?
range, IQR, stand dev, variance, or simply say: From here, to about here
How do you describe SPREAD for unimodal and symmetric distributions?
use the standard deviation
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on each of them, and the average number of pickles is 9.5, then 9.5 is considered a _______?
statistic. (it is a summary of a sample.)
When drawing a normal model, what are the PERCENTILES from left to right?
2.5, 16, 50, 84, 97.5
What is the difference between discrete and continuous variables?
Discrete can be counted, like “number of cars sold” they are generally integers (you wouldn’t sell 9.3 cars), while continuous would be something like weight of a mouse: 4.344 oz.
which calculator function gives you a percent?
normcdf(Z left, Z right)
When there is no relationship between two variables, we say they are
independent (or not associated)
What is a categorical variable?
Categorical (also called qualitative) variables are categories: Blonde, Listens to Hip Hop, Female, yes, no, etc.
Think of the minimum value, the median and the IQR, which is impacted by shifting (adding a constant?)
adding a value shifts the entire histogram to the right, so the min and the median will increase by that amount, BUT THE IQR WILL NOT CHANGE.
How do you match OGIVES to histograms?
RECTANGLE DROP!!
If the distribution is bimodal or multimodal, what would you use for center and spread statistics?
Talk about each mode (center) and maybe use the range or IQR. You could also say “one group seems to go from __ to __ and the other from about __ to __”
How do you find 5 number summary from OGIVE?
Split the y axis into quarters. Shoot out to the right from 0, .25, .50, .75 and 1.00 till you hit the line in the ogive, then go straight down. Those numbers on the x axis below correspond to the 5 numbers.
Does the IQR capture 68% of the data?
NO. it catches the middle 50%.
How do you describe SPREAD for bimodal or multimodal?
talk about the outer edges of the clusters “from here to here” or use the IQR.
marginal distribution
The distribution in the margins (outside of the table). The overall distribution of a single variable in a contingency table.
What is the five number summary?
min, Q1 , Q2(median), Q3 and max
What percentile is Q3?
75th
What are INFERENTIAL STATS?
Look at your data, and use that to say stuff about the BIG PICTURE. like tasting soup. a little sample can tell you a lot about the big pot of soup (the population)
How can you match boxplots to histograms?
USE THE FISH TANK METHOD!
What is data?
Any collected information. Generally each little measurement. Like, if it is a survey about liking porridge, the data might be: yes, yes, no, yes, yes. If it is the number of saltines someone can eat in 30 seconds, the data might be 3, 1, 2, 1, 4, 3, 3, 4.
Can numbers be CATEGORICAL?
Sure. Zip codes, sports jersey numbers, telephone numbers, social security numbers, area codes… these are categorical.
What is a Z score?
The number of standard deviations a value is away from the mean.
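As a formula, z = (value - mean) / standard deviation. A tiny Python sketch, with made-up IQ numbers (mean 100, SD 15) used only as an example:

```python
def z_score(value, mean, sd):
    """How many standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical example: IQ scores with mean 100 and SD 15
print(z_score(130, mean=100, sd=15))  # 2.0  (two SDs above the mean)
print(z_score(85, mean=100, sd=15))   # -1.0 (one SD below the mean)
```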
What is meant by cumulative frequency?
ADD up the frequencies as you go. Suppose you are selling 25 pieces of candy. You sell 10 the first hour, 5 the second, 3 the third and 7 in the last hour, the cumulative frequency would be 10, 15, 18, 25
Give a simple example showing that adding a constant doesn’t change the spread, but changes the center. (this always happens)
Data set: 1,2,3,4,5. Spread (range): 4. Center: 3. Add three and get the new data set: 4,5,6,7,8. Spread: 4. Center: 6 (center went up, spread stayed the same). The IQR and SD will stay the same, but the median and mean go up 3. This is called shifting, or sliding the data.
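Adding 3 to each of 1,2,3,4,5 can be checked in a few lines of Python (a sketch using the standard statistics module; value_range is a made-up helper name):

```python
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 5]
shifted = [x + 3 for x in data]  # add the constant 3 to every value
print(shifted)                   # [4, 5, 6, 7, 8]

def value_range(values):
    """Regular range: highest value minus lowest value."""
    return max(values) - min(values)

# Center goes up by 3
print(mean(data), mean(shifted))      # 3 6
print(median(data), median(shifted))  # 3 6

# Spread does not change
print(value_range(data), value_range(shifted))  # 4 4
sd_change = stdev(shifted) - stdev(data)        # essentially zero
```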
When drawing a graph or chart, what do you have to remember to do?
LABEL AXES, make a KEY (if needed) AND GIVE IT A NAME!!! “Figure 1: Age and Food Preference”
Does a census make sense?
A census is ok for small populations (like Mr. Nystrom’s students) but impossible if you want to survey “all wild turkeys on Nantucket”
What percent of the data is between Q1 and Q3?
50%
How do you find percentiles and make a boxplot from OGIVE?
Go across till you hit the curve and then STRAIGHT DOWN!
What is Statistics?
The study of variability
How do you describe SPREAD for skewed distributions (or distributions with outliers?)
Use the IQR
What are 2 branches of AP STATS?
Inferential and Descriptive
Compare data to parameters
Data is each little bit of information collected from the subjects. They are the INDIVIDUAL little things we collect. We summarize them by, for example, finding the mean of a group of data. If it is a sample, then we call that mean a “statistic”; if we have data from each member of the population, then that mean is called a “parameter”.
Why do we plug 999 into normcdf?
It needs a z score, but we can’t plug in infinity. So we go down or up 999 standard deviations and that pretty much gets everything
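To see that 999 really does "get everything", here is a Python sketch. The normcdf function below is a hand-written stand-in for the calculator function, built on the standard statistics.NormalDist; it is an assumption for illustration, not the calculator's own code.

```python
from statistics import NormalDist

def normcdf(z_low, z_high):
    """Area under the standard normal curve between two z scores."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Area to the left of z = -1, using -999 in place of negative infinity
with_999 = normcdf(-999, -1)
print(round(with_999, 4))  # 0.1587

# The full left tail gives the same answer, so nothing is lost
true_tail = NormalDist().cdf(-1)
leftover = abs(with_999 - true_tail)  # essentially zero
```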
what happens if you ADD a constant to each value in a data set?
It is SHIFTED only. This affects all of the data values, the measures of center (mean, median) and the quartiles, deciles, etc. IT DOES NOT CHANGE THE SPREAD! (IQR, St Dev, Range all stay the SAME).
where are the “outlier fences?”
1.5 IQR above Q3 and 1.5 IQR below Q1. Just a rule of thumb.
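The fence rule can be sketched in Python. The quartiles (20 and 30) and the data list are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule of thumb."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical quartiles: Q1 = 20, Q3 = 30, so IQR = 10
low_fence, high_fence = outlier_fences(q1=20, q3=30)
print(low_fence, high_fence)  # 5.0 45.0

# Anything outside the fences gets flagged as a possible outlier
data = [2, 18, 22, 25, 29, 31, 50]
outliers = [x for x in data if x < low_fence or x > high_fence]
print(outliers)  # [2, 50]
```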
Think of the minimum value, the mean and the standard deviation, what is impacted by shifting (adding a constant)
adding a value shifts the entire histogram to the right, so the min and the mean will increase by that amount, BUT THE SD WILL NOT CHANGE.
How do you find Q1 and Q3?
Q1 is the median of the bottom half and Q3 is the median of the upper half (they are the 25th and 75th percentiles)
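A Python sketch of the "median of each half" method. One assumption to flag: when there is an odd number of values, this version leaves the middle value out of both halves, which is a common convention but not the only one (calculators and software can differ slightly). The function name quartiles_by_halves is made up.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median of the bottom and top halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Assumption: for an odd count, skip the middle value itself
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

# Saltines eaten in 30 seconds: 3, 1, 2, 1, 4, 3, 3, 4
q1, q2, q3 = quartiles_by_halves([3, 1, 2, 1, 4, 3, 3, 4])
print(q1, q2, q3)  # 1.5 3.0 3.5
print(q3 - q1)     # IQR = 2.0
```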
What is a population?
the group you’re interested in. Sometimes it’s big, like “all teenagers in the US” other times it is small, like “all AP Stats students in my school”
What is the empirical rule?
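In a normal model, about 68% of the data fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. 68-95-99.7, yeah!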
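The 68-95-99.7 numbers can be checked against the normal model itself. A short Python sketch using the standard statistics.NormalDist (the variable names are just for illustration):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Area within k standard deviations of the mean, for k = 1, 2, 3
areas = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(k, round(area * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The rule's 68 and 95 are rounded versions of these areas, which is why it is only a rule of thumb.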
If the distribution is unimodal and symmetric, what would you use for center and spread statistics?
Mean (center) and Standard Deviation (spread)
How do you describe CENTER for unimodal and symmetric distributions?
use the MEAN
what does P(z< -1) =?
The probability that a randomly chosen z score is to the left of (below) -1 is the same as the percent of area below a z score of -1 in a normal model, which is about 16%.