UNIT 1 REVIEW MIX Flashcards
P(Z>0)
0.50 or 50%. The probability that a randomly chosen z score is to the right of (above) zero, the middle, is the same as the percent of area above a z score of zero in a normal model, which is exactly 50% or 0.5 because the model is symmetric.
What is variability?
Differences. how things differ. (this is not the same thing as variance, which is a specific measurement). There is variability everywhere.. We all look different, act different, have different preferences. Statisticians look at these differences.
What is Q2 also known as?
the median
not associated is the same as being ____________
independent
What are the values of the data set from this stem plot:
1 | 2 2 3 5
2 | 2 4 5
12, 12, 13, 15, 22, 24, 25
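A quick sketch of reading that stem plot with code. It assumes the usual key "1 | 2 means 12" (stem = tens digit, leaf = ones digit), which the card does not state; the name `stem_plot` is just made up for illustration.

```python
# Stem plot from the card above, stored as stem -> list of leaves.
# Assumption: stem is the tens digit and each leaf is a ones digit.
stem_plot = {1: [2, 2, 3, 5], 2: [2, 4, 5]}

# Rebuild each data value as 10 * stem + leaf, row by row.
values = [10 * stem + leaf
          for stem, leaves in stem_plot.items()
          for leaf in leaves]

print(values)  # [12, 12, 13, 15, 22, 24, 25]
```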
What does a UNIFORM distribution look like?
All of the bars have the same height. The same amount of values in each category.
Compare DATA-STATISTIC-PARAMETER using categorical example
Data are individual measures like meal preference: taco, taco, pasta, taco, burger, burger, taco. Statistics and Parameters are summaries. A statistic would be 42% of sample preferred tacos and a parameter would be 42% of population preferred tacos.
What percent of the data is above Q3?
25%
If you want to calculate the probability (%) that something falls between two z scores in a normal model, what do you do?
normcdf (Z LOW, Z HIGH )
Year in school (F,S,J,S) and Pizza Preference (pepperoni or cheese) are __________ because _______________
independent because all grades have similar preference distributions, for example 40% cheese, 30% pepperoni, 20% veggie, 10% other
What does normcdf do?
It gives you the area under the normal curve between any two z scores if you put in two entries. If you put in 4 entries, be sure to do normcdf (LO, HI, MEAN, SD)
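Off the calculator, the same idea can be sketched in Python. This is not the calculator's own code, just a stand-in: `normcdf` here is a hypothetical helper built on the standard library's `statistics.NormalDist`, and the 100 / 15 model is a made-up example.

```python
from statistics import NormalDist

def normcdf(lo, hi, mean=0.0, sd=1.0):
    """Area under the normal curve between lo and hi (hypothetical helper)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(hi) - dist.cdf(lo)

# Two entries: z scores on the standard normal model.
area_z = normcdf(-1, 1)

# Four entries: raw values plus the model's mean and SD.
# 85 and 115 are 1 SD below and above a mean of 100 when the SD is 15.
area_raw = normcdf(85, 115, 100, 15)

print(round(area_z, 4), round(area_raw, 4))  # both about 0.6827
```

Both calls describe the same slice of the curve, once in z scores and once in raw values, so the areas match.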
What is the total area under the normal curve?
1 or 1.000
What is the IQR?
Interquartile range… a measure of spread. Q3-Q1. The distance from Q1 to Q3. The regular range is Hi-Lo, this is the inner range, the interquartile range.
When there is a relationship between two variables, we say that they are
associated (or not independent)
what happens if you multiply all of a data set by a constant? Think of an example
Both center and spread are impacted. Mean/ median/ stand dev/ iqr/ quartiles all multiplied by that constant. Center, spread and all individual values are changed. Consider 1,2,3,4,5 mean of 3 and range of 4. Now multiply by 10: 10,20,30,40,50 and you get a mean of 30 and a range of 40… both multiplied by ten.
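The times-ten example above can be checked with a few lines of Python. This is only a sketch; `summary` is a made-up helper, and it uses the population SD (`pstdev`) as one possible choice of spread.

```python
from statistics import mean, median, pstdev

def summary(values):
    """Center and spread for a small data set (illustrative helper)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "sd": pstdev(values),
    }

data = [1, 2, 3, 4, 5]
scaled = [10 * x for x in data]  # 10, 20, 30, 40, 50

before = summary(data)
after = summary(scaled)
print(before["mean"], before["range"])  # 3 4
print(after["mean"], after["range"])    # 30 40
```

The SD gets multiplied by ten as well, which you can see by comparing `before["sd"]` and `after["sd"]`.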
are there any normal samples?
no, nothing is normal, just normalish. The only normal thing is the model we use.
How can you turn OGIVES into histograms?
RECTANGLE DROP! (bin drop)
What is the difference between quantitative and categorical variables?
Quantitative variables are numerical measures, like height and IQ. Categorical are categories, like eye color and music preference
For information purposes, which gives LEAST amount of information about the distribution
Stem-leaf, histogram or box-whisker?
Box/Whisker, BE CAREFUL. you really don’t know how things are distributed. The box and whisker and fish tank give a very GENERAL look. Stemplot gives all the data. Histogram gives clear shape.
Make a guess as to what relative cumulative frequency is?
It is the ADDED up PERCENTAGES. An example is selling candy, 25 pieces sold overall, with 10 the first hour, 5 the second, 3 the third, and 7 the fourth hour. We’d take the cumulative frequencies, 10, 15, 18 and 25, and divide by the total, giving cumulative percentages: .40, .60, .72, and 1.00. Relative cumulative frequencies always end at 100 percent. OGIVE
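The candy example can be worked out in a short Python sketch: add up the hourly counts as you go, then divide each running total by the overall total. The variable names are made up for illustration.

```python
from itertools import accumulate

hourly_sales = [10, 5, 3, 7]  # pieces of candy sold each hour

# Cumulative frequency: running total after each hour.
cumulative = list(accumulate(hourly_sales))

# Relative cumulative frequency: running total divided by the grand total.
total = cumulative[-1]
relative_cumulative = [c / total for c in cumulative]

print(cumulative)           # [10, 15, 18, 25]
print(relative_cumulative)  # [0.4, 0.6, 0.72, 1.0]
```

Plotting `relative_cumulative` against the hours gives the ogive, which always climbs to 1.00.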
How can you describe spread?
range, IQR, stand dev, variance, or simply say: From here, to about here
How do you describe SPREAD for unimodal and symmetric distributions? What measure should you use?
use the standard deviation
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on each of them, and the average number of pickles was 9.5, then 9.5 is considered a _______?
statistic. (it is a summary of a sample.)
When drawing a normal model, what are the PERCENTILES from left to right?
2.5, 16, 50, 84, 97.5
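Those five percentiles sit at z = -2, -1, 0, 1, 2. As a rough check, here is a sketch using Python's `statistics.NormalDist`; the exact model values come out slightly different from the rounded rule-of-thumb numbers on the card.

```python
from statistics import NormalDist

z_scores = [-2, -1, 0, 1, 2]
standard_normal = NormalDist()  # mean 0, SD 1

# Percentile = percent of area to the LEFT of each z score.
percentiles = [round(100 * standard_normal.cdf(z), 1) for z in z_scores]

print(percentiles)  # roughly 2.3, 15.9, 50.0, 84.1, 97.7
```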
What is the difference between discrete and continuous variables?
Discrete can be counted, like “number of cars sold”; they are generally integers (you wouldn’t sell 9.3 cars). Continuous would be something like the weight of a mouse: 4.344 oz.
which calculator function gives you a percent under normal curve?
normcdf(Z left, Z right)
When there is no relationship between two variables, we say they are
independent (or not associated)
What is a categorical variable?
Blonde, Listens to Hip Hop, Female, yes, no, etc.
Think of the minimum value, the median and the IQR, which is impacted by adding a constant to all the data in a data set.
adding a value shifts the entire histogram to the right, so the min and the median will increase by that amount, BUT THE IQR WILL NOT CHANGE. It just slides.
How do you match OGIVES to histograms?
RECTANGLE DROP!!
If a distribution is bimodal or multimodal, what would you use for center and spread statistics?
Talk about each mode (center) and maybe use the range or IQR. You could also say “one group seems to go from __ to __ and the other from about __ to __”
How do you find 5 number summary from OGIVE?
Split the y axis into quarters. Shoot out to the right from 0, .25, .50, .75 and 1.00 till you hit the line in the ogive, then go straight down. Those numbers on the x axis below correspond to the 5 numbers.
Does the IQR capture 68% of the data?
NO. it catches the middle 50%.
How do you describe SPREAD for bimodal or multimodal distributions?
talk about the outer edges of the clusters “from here to here” or use the IQR.
what is marginal distribution
distribution in the margins (outside of the table). The overall distribution of a single variable in a contingency table.
What is the five number summary?
min, Q1 , Q2(median), Q3 and max
What percentile is Z=1 ?
84th
What percentile is Q3?
75th
How can you match boxplots to histograms?
USE THE FISH TANK METHOD!
What is data?
Any collected information. Generally each little measurement. Like, if it is a survey about liking porridge, the data might be: yes, yes, no, yes, yes. If it is the number of saltines someone can eat in 30 seconds, the data might be 3, 1, 2, 1, 4, 3, 3, 4
Can numbers be CATEGORICAL?
Sure. Zip codes, sports jersey numbers, telephone numbers, social security numbers, area codes… these are categorical.
What is a Z score?
The number of standard deviations a value is away from the mean
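As a formula, that is z = (value - mean) / SD. A tiny sketch, with made-up numbers (a model with mean 100 and SD 15) and a hypothetical helper name `z_score`:

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` sits from the mean."""
    return (value - mean) / sd

# Made-up example: mean 100, SD 15.
z_high = z_score(130, 100, 15)  # 30 above the mean = 2 SDs above
z_low = z_score(85, 100, 15)    # 15 below the mean = 1 SD below

print(z_high, z_low)  # 2.0 -1.0
```

Positive z scores are above the mean and negative ones are below it.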
What is meant by cumulative frequency?
ADD up the frequencies as you go. Suppose you are selling 25 pieces of candy. You sell 10 the first hour, 5 the second, 3 the third and 7 in the last hour, the cumulative frequency would be 10, 15, 18, 25
Give a simple example showing that adding a constant doesn’t change the spread, but changes the center. (this always happens)
Data set: 1,2,3,4,5 Spread (range): 4, Center: 3. Add three and get new data set: 4,5,6,7,8 Spread: 4, Center: 6 (center went up, spread stayed the same). The IQR and SD will stay the same, but median and mean go up 3. Called shifting, or sliding the data.
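Here is a small Python sketch of sliding the data set 1, 2, 3, 4, 5 up by three. It uses the range and the population SD (`pstdev`) as the spread measures; that choice of SD is an assumption, and the sample SD behaves the same way.

```python
from statistics import mean, median, pstdev

data = [1, 2, 3, 4, 5]
shifted = [x + 3 for x in data]  # 4, 5, 6, 7, 8

# Center moves up by the constant...
center_change = mean(shifted) - mean(data)
median_change = median(shifted) - median(data)

# ...but spread does not move at all.
range_before = max(data) - min(data)
range_after = max(shifted) - min(shifted)
sd_before, sd_after = pstdev(data), pstdev(shifted)

print(center_change, median_change)  # 3 3
print(range_before, range_after)     # 4 4
```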
When drawing a graph or chart, what do you have to remember to do?
LABEL AXES, make a KEY(if needed ) AND GIVE IT A NAME!!! “Figure 1: Age and Food Preference”
Does a census make sense?
A census is ok for small populations (like Mr. Nystrom’s students) but impossible if you want to survey “all wild turkeys on Nantucket”
What percent of the data is between Q1 and Q3?
50%
How do you find percentiles from OGIVE?
Go across till you hit the curve and then STRAIGHT DOWN!
How do you describe SPREAD for skewed distributions (or distributions with outliers?)
Use the IQR
Compare data to parameters
Data are each little bit of information collected from the subjects. They are the INDIVIDUAL little things we collect. We summarize them by, for example, finding the mean of a group of data. If it is a sample, then we call that mean a “statistic”; if we have data from each member of the population, then that mean is called a “parameter”.
Why do we plug 999 into normcdf?
It is like infinity. We go down or up 999 standard deviations and that pretty much gets everything
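To see that 999 really does act like infinity, here is a sketch comparing "start at -999" with the true left tail. It uses Python's `statistics.NormalDist` as a stand-in for the calculator, with z = -1 as an arbitrary example cutoff.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal model: mean 0, SD 1

# Area from -999 up to -1, like typing normcdf(-999, -1).
with_999 = z.cdf(-1) - z.cdf(-999)

# The true left tail, all the way from negative infinity.
true_tail = z.cdf(-1)

print(round(with_999, 4), round(true_tail, 4))  # 0.1587 0.1587
```

The area below -999 is so tiny that it makes no visible difference.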
what happens if you ADD a constant to each value in a data set?
Does not impact spread!!! This affects all of the data values and measures of center (mean, med) and quartiles, deciles, etc. IT DOES NOT CHANGE THE SPREAD! (IQR, St Dev, Range all stay the SAME).
where are the “outlier fences?”
1.5 IQR above Q3 and 1.5 IQR below Q1. Just a rule of thumb.
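A minimal sketch of the fence rule in Python. The quartiles (Q1 = 10, Q3 = 20) and the data values are made up for illustration, and `outlier_fences` is a hypothetical helper name.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule of thumb."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Made-up quartiles: Q1 = 10, Q3 = 20, so IQR = 10.
lower, upper = outlier_fences(10, 20)

# Anything outside the fences gets flagged as a possible outlier.
data = [-8, 3, 12, 18, 36]
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper)  # -5.0 35.0
print(outliers)      # [-8, 36]
```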
Think of the minimum value, the mean and the standard deviation, what is impacted by adding a constant to each value in a data set?
adding a value shifts the entire histogram to the right, so the min and the mean will increase by that amount, BUT THE SD WILL NOT CHANGE.
How do you find Q1 and Q3?
Q1 is the median of the bottom half and Q3 is the median of the upper half (they are the 25th and 75th percentiles)
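The "median of each half" idea can be sketched in Python. One assumption to flag: when the number of values is odd, this sketch leaves the overall median out of both halves, which is a common classroom convention but not the only one. The data are the stem plot values from earlier in this deck.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumption: with an odd count, the middle value is excluded from
    both halves. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(upper_half)

data = [12, 12, 13, 15, 22, 24, 25]
q1, q3 = quartiles(data)

print(q1, median(data), q3)  # 12 15 24
print(q3 - q1)               # IQR = 12
```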
What is a population?
the group you’re interested in. Sometimes it’s big, like “ALL STRIPED BASS IN CAPE COD BAY” other times it is small, like “all AP Stats students in my school”
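what is the empirical rule?
68-95-99.7, yeah! In a normal model, about 68% of the data falls within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.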
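The 68-95-99.7 numbers can be checked against the normal model itself. A rough sketch using Python's `statistics.NormalDist`; the dictionary name `within` is just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal model

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD: {100 * area:.1f}%")
# prints about 68.3%, 95.4%, 99.7%
```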
If the distribution is unimodal and symmetric, what would you use for center and spread statistics?
Mean (center) and Standard Deviation (spread)
How do you describe CENTER for unimodal and symmetric distributions?
use the MEAN
what does P(z< -1) =?
16%
The probability that a randomly chosen z score is to the left of (below) -1 is the same as the percent of area below a z score of -1 in a normal model, which is about 16%
What is a statistic?
A numerical summary of a sample. Like a mean, median, range, %, or SD of a sample.
Gender and Video Game playing are___________ because_______
associated (or not independent) because a higher percentage of males play video games. (think.. It depends on gender)
what is the shortcut normcdf with 4 entries?
Gives area under curve
normcdf (low VALUE, high VALUE, mean, sd)