UNIT 1 2017 Flashcards
What is the normcdf shortcut?
Gives the % (area) between two raw data values and skips the z score step: normcdf(low VALUE, high VALUE, mean, sd)
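If you want to check a calculator answer on a computer, here is a minimal sketch in Python. It assumes Python's built-in `statistics.NormalDist` as a stand-in for the calculator; `normcdf` below is a hypothetical helper written to mimic the card, not a real calculator or library function.

```python
from statistics import NormalDist

def normcdf(low, high, mean=0.0, sd=1.0):
    """Area under a normal model between two raw values.
    Hypothetical helper that mimics the calculator's normcdf."""
    model = NormalDist(mu=mean, sigma=sd)
    return model.cdf(high) - model.cdf(low)

# Hypothetical heights: mean 64 in, sd 3 in.
# What fraction falls between 61 and 67 inches (1 sd on each side)?
print(round(normcdf(61, 67, 64, 3), 4))  # 0.6827
```

No z scores needed: the raw values, mean, and sd go straight in, just like the shortcut.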
What is the invnorm shortcut?
Gives the data value at a given percentile and skips the z score step: invnorm(percentile as a decimal, mean, sd)
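A minimal Python sketch of the same idea, assuming `statistics.NormalDist.inv_cdf` as a stand-in for the calculator. The `invnorm` helper is hypothetical, and the percentile goes in as a decimal (0.90, not 90).

```python
from statistics import NormalDist

def invnorm(percentile, mean=0.0, sd=1.0):
    """Data value with `percentile` of the area to its left.
    Hypothetical helper that mimics the calculator's invnorm."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(percentile)

# Hypothetical test scores: mean 500, sd 100.
# What score sits at the 90th percentile?
print(round(invnorm(0.90, 500, 100), 1))  # 628.2
```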
Why do we plug 999 into normcdf?
It needs a z score for each end, but we can’t plug in infinity. So we go down (-999) or up (999) standard deviations instead, and that captures essentially all of the area.
If I take a random sample of 20 hamburgers from FIVE GUYS and count the pickles on each one, and one burger has 9 pickles, then the number 9 from that burger is called ____?
a datum, or a data value.
What is data?
Any collected information, generally each individual measurement. For example, if it is a survey about liking porridge, the data might be “yes, yes, no, yes, yes”; if it is the number of saltines someone can eat in 30 seconds, the data might be “3, 1, 2, 1, 4, 3, 3, 4”.
If you want to find the percentile for a value, what do you put into normcdf(?, ?)
Find the z score for the value, then normcdf(-999, Zright). This is like going from negative infinity up to the z score.
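Here is the z score route sketched in Python, assuming `statistics.NormalDist()` (mean 0, sd 1) plays the role of the calculator's standard normal model. The `z_score` helper and the test-score numbers are hypothetical. It also shows why -999 works as "negative infinity": the area below it rounds to zero.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal model: mean 0, sd 1

def z_score(value, mean, sd):
    """How many standard deviations `value` is from the mean."""
    return (value - mean) / sd

# -999 stands in for negative infinity: the area below it is effectively 0
print(std_normal.cdf(-999))  # 0.0

# Hypothetical test: mean 75, sd 5. What percentile is a score of 85?
z = z_score(85, 75, 5)                                 # 2.0
percentile = std_normal.cdf(z) - std_normal.cdf(-999)  # like normcdf(-999, z)
print(round(100 * percentile, 1))  # 97.7
```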
Are there any normal samples?
No. Nothing real is exactly normal, just normal-ish. The only truly normal thing is the model we use.
How can you describe spread?
range, IQR, stand dev, variance, or simply say: From here, to about here
How can you think about the mean and median to remember the difference when looking at a histogram?
The mean is the balancing point of the histogram; the median splits the area of the histogram in half.
Compare DATA-STATISTIC-PARAMETER using CATEGORICAL example
Data are individual measurements, like meal preference: “taco, taco, pasta, taco, burger, burger, taco.” Statistics and parameters are summaries. A statistic would be “42% of the sample preferred tacos” and a parameter would be “42% of the population preferred tacos.” Notice that for categorical variables, the categories are words and the statistics and parameters are percents.
Give a simple example showing that adding a constant doesn’t change the spread, but changes the center. (this always happens)
Data set: 1,2,3,4,5 Spread (range):4, Center: 3
Add three to get the new data set: 4,5,6,7,8. Spread: 4, Center: 6 (the center went up 3, the spread stayed the same). The IQR and SD also stay the same, but the median and mean go up 3. This is called shifting, or sliding, the data.
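A quick Python check of the shift, using the card's data set and the built-in `statistics` module. The `data_range` helper is a hypothetical name for Hi - Lo.

```python
from math import isclose
from statistics import mean, median, pstdev

def data_range(values):
    """Range = Hi - Lo."""
    return max(values) - min(values)

data = [1, 2, 3, 4, 5]
shifted = [x + 3 for x in data]  # add the constant 3 to every value

print(shifted)                                              # [4, 5, 6, 7, 8]
print(mean(data), median(data), data_range(data))           # 3 3 4
print(mean(shifted), median(shifted), data_range(shifted))  # 6 6 4
print(isclose(pstdev(data), pstdev(shifted)))               # True
```

Center moves by 3; range and standard deviation do not budge.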
what is a nice mean/median/mode helper diagram?
Sketch a skewed-left distribution; the mean, median, and mode then line up in that order from left to right (alphabetical order).
data or datum?
Datum is singular, like “hey dude, come see this datum I got from this rat!” Data is the plural: “hey, look at all that data Edgar got from his Brussels sprouts.”
If a distribution is skewed left, what will be greater, the mean or median? WHY?
Median. The mean gets pulled left, toward the tail, to keep the histogram balanced.
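A tiny Python example with made-up (hypothetical) quiz scores that have one very low value, giving a long left tail. It uses the built-in `statistics` module and shows the mean < median < mode order from the mnemonic.

```python
from statistics import mean, median, mode

# Hypothetical quiz scores: one very low score makes a left tail
scores = [1, 7, 8, 9, 9, 10]

print(round(mean(scores), 2), median(scores), mode(scores))  # 7.33 8.5 9
print(mean(scores) < median(scores))  # True: the mean is pulled toward the tail
```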
When drawing a graph or chart, what do you have to remember to do?
LABEL AXES, make a KEY (if needed), AND GIVE IT A NAME!!! “Figure 1: Age and Food Preference”
What is the difference between a bar chart and a histogram?
bar charts are for categorical data (bars don’t touch and can often be in any order) and histograms are for quantitative data (bars usually touch and x axis is in order)
What is a quantitative variable? Compared to quantitative data?
Quantitative variables are the things you are interested in, like height, age, price, number of cars sold, or SAT score. Quantitative data are the actual heights or ages from individuals: 54”, 2 years, $34.99.
Why don’t we just use the average (mean) all the time? (instead of mode and median)
The word average is a general term that can actually refer to the mean, median, or mode. We don’t always use the mean because it is not RESISTANT: it gets pulled by skewness and outliers.
How can you tell if variables in a contingency table are independent?
If the distributions are the same across the groups, one variable doesn’t DEPEND on the other, so they are INDEPENDENT. Ex: 30% of freshmen and 30% of seniors like cabbage.
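A small Python sketch of the cabbage check. The counts are hypothetical, picked so both grades come out at 30%; the `percent_yes` helper is also a made-up name. The idea is to compare the row percents, not the raw counts.

```python
# Hypothetical contingency table: (likes cabbage, doesn't like cabbage)
table = {
    "freshmen": (30, 70),
    "seniors": (45, 105),
}

def percent_yes(row):
    """Percent of a row that likes cabbage."""
    yes, no = row
    return 100 * yes / (yes + no)

for grade, row in table.items():
    print(grade, percent_yes(row))
# freshmen 30.0
# seniors 30.0
```

The raw counts differ (30 vs 45), but the percents match, so liking cabbage doesn't depend on grade here.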
What is a categorical variable? Compare to categorical data.
Categorical (or qualitative) variables are the categories you are interested in, like “hair color” and “music preference”. The data are the measurements from individuals, like: SUV, sedan, listens to hip hop, female, yes, no, etc.
When are box plots used most often?
When comparing a bunch of different sets of data.
How do you describe distributions (histograms)?
Shape, Center, Spread, and STRANGE (outliers and gaps). Some say GSOCS. Where’s yo GSOCS?
what happens if you multiply all of a data set by a constant? Think of an example
It is scaled. Both center and spread are affected: mean, median, standard deviation, IQR, and quartiles are all multiplied by that constant, and so is every individual value. Consider 1,2,3,4,5 with a mean of 3 and a range of 4. Now multiply by 3: 3,6,9,12,15 has a mean of 9 and a range of 12… both multiplied by three.
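The same example checked in Python with the built-in `statistics` module. `data_range` is a hypothetical helper for Hi - Lo.

```python
from math import isclose
from statistics import mean, pstdev

def data_range(values):
    """Range = Hi - Lo."""
    return max(values) - min(values)

data = [1, 2, 3, 4, 5]
scaled = [3 * x for x in data]  # multiply every value by the constant 3

print(scaled)                                     # [3, 6, 9, 12, 15]
print(mean(data), data_range(data))               # 3 4
print(mean(scaled), data_range(scaled))           # 9 12
print(isclose(pstdev(scaled), 3 * pstdev(data)))  # True
```

Unlike adding a constant, multiplying stretches the spread too.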
what is a z score?
The number of standard deviations a value is away from the mean: z = (value - mean) / sd
what happens if you ADD a constant to each value in a data set?
It is SHIFTED only. This affects all of the data values, the measures of center (mean, median), and the quartiles, deciles, etc., but IT DOES NOT CHANGE THE SPREAD! (IQR, St Dev, and Range all stay the SAME.)
Who chases the tail?
The mean chases the tail, the mean chases the tail, high-ho the derry-oh, the mean chases the tail… and outliers!
What is variability?
Differences, how things differ. There is variability everywhere. We all look different, act different, and have different preferences. Statisticians look at these differences.
If you want to calculate the probability (%) something falls between two values in a normal model, what do you do?
Find the z scores for both values, then normcdf(Z LOW, Z HIGH).
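A Python sketch comparing the z score route from this card with the raw-value shortcut, assuming `statistics.NormalDist` stands in for the calculator. The scores (mean 100, sd 15) and the `z_score` helper are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1

def z_score(value, mean, sd):
    return (value - mean) / sd

# Hypothetical scores: mean 100, sd 15. What share is between 85 and 130?
z_low = z_score(85, 100, 15)    # -1.0
z_high = z_score(130, 100, 15)  # 2.0
area_z = std_normal.cdf(z_high) - std_normal.cdf(z_low)  # normcdf(Z LOW, Z HIGH)

# Same question with the raw-value shortcut (no z scores)
scores = NormalDist(mu=100, sigma=15)
area_raw = scores.cdf(130) - scores.cdf(85)

print(round(area_z, 4), round(area_raw, 4))  # 0.8186 0.8186
```

Both routes give the same area; the shortcut just hides the z score step.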
What is meant by cumulative frequency?
ADD up the frequencies as you go. Suppose you are selling 25 pieces of candy. You sell 10 the first hour, 5 the second, 3 the third and 7 in the last hour, the cumulative frequency would be 10, 15, 18, 25
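The candy example as a running total in Python, using `itertools.accumulate` from the standard library.

```python
from itertools import accumulate

sold_per_hour = [10, 5, 3, 7]  # frequencies from the candy example
cumulative = list(accumulate(sold_per_hour))  # running total, hour by hour
print(cumulative)  # [10, 15, 18, 25]
```

The last cumulative value always equals the total (25 pieces here).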
How do you find relative frequency?
just divide frequency by TOTAL. Make it a percent so it is relative to the whole.
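A Python sketch of relative frequency, reusing the meal-preference data from the categorical example in this deck and `collections.Counter` to tally the categories.

```python
from collections import Counter

meals = ["taco", "taco", "pasta", "taco", "burger", "burger", "taco"]
counts = Counter(meals)  # frequency of each category
total = len(meals)

for meal, freq in counts.most_common():
    print(meal, round(100 * freq / total, 1))  # relative frequency as a percent
# taco 57.1
# burger 28.6
# pasta 14.3
```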
What do we sometimes call a categorical variable?
qualitative
What is the IQR?
Interquartile range: a measure of spread. Q3 - Q1, the distance from Q1 to Q3. The regular range is Hi - Lo; this is the inner range, the interquartile range.
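A Python sketch of Q1, median, Q3, and the IQR. Textbooks and software use slightly different quartile rules; this sketch assumes the "median of each half" method (leave the middle value out of both halves when n is odd). The `quartiles` helper and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Q1, median, Q3 using the median-of-each-half method.
    When n is odd, the middle value is left out of both halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles([1, 3, 4, 6, 7, 9, 12])
print(q1, med, q3, q3 - q1)  # 3 6 9 6
```

The last number printed is the IQR, Q3 - Q1.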
What is a parameter?
A numerical summary of a population. Like a mean, median, range of a population
How can you match boxplots to histograms?
USE THE FISH TANK METHOD!
When can you round?
AT THE VERY END!!! (keep 3 digits until end!)
the output for normcdf(Zleft, Zright) is_______
the area under the normal curve between the given z scores
What is a standard deviation?
average distance to the mean (about)
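Why "about"? A short Python comparison on a hypothetical data set: the plain average distance to the mean versus the standard deviation (population version, `statistics.pstdev`). They are close but not equal, because the standard deviation squares the distances before averaging and then takes a square root.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
m = mean(data)                                 # 5
avg_distance = mean(abs(x - m) for x in data)  # plain average distance to the mean
print(m, avg_distance, pstdev(data))           # 5 1.5 2.0
```

Calculators usually also show a sample version (divide by n - 1), which comes out a bit bigger.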
For information purposes, which gives LEAST? stem-leaf, histogram or box-whisker?
Box/Whisker. BE CAREFUL: you really don’t know how things are distributed. The box-and-whisker plot and fish tank give only a very GENERAL look.
What do you call things that are not independent?
associated. Or not independent. We generally don’t say DEPENDENT (unless talking about y variable on a scatterplot).
If you want to find % below a value, what do you put into normcdf(?, ?)
find z score for value, and then normcdf (-999, Zright)
What is the mode?
the most common, or the peaks of a histogram. We often use mode with categorical data.
What is Statistics?
The study of variability
How can you describe shape?
unimodal, bimodal, multimodal, uniform, symmetric, skewed
What is the difference between discrete and continuous variables?
Discrete variables can be counted, like “number of cars sold”; they are generally integers (you wouldn’t sell 9.3 cars). Continuous variables are things like the weight of a mouse: 4.344 oz. Summaries of discrete variables will often be decimals.
What are the percentiles for Q1, med, and Q3?
25, 50 and 75
If asked to compare distributions, what should you write about?
Compare Shapes, Centers, Spreads, and Stranges. The GSOCS
What does normcdf do?
It gives you the area under the normal curve between any two z scores
What is meant by relative frequency?
The PERCENT of time something comes up (frequency/total)
How do students often mix up IQR and St. Dev?
They INCORRECTLY think that Q1 is 1 sd below the mean and Q3 is 1 sd above the mean. THIS IS NOT TRUE!!! In a normal model, Q1 is only about 0.67 sd below the mean and Q3 is about 0.67 sd above it.
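Where 0.67 comes from: ask the standard normal model for the z scores at the 25th and 75th percentiles. A sketch using Python's `statistics.NormalDist().inv_cdf` (an invnorm stand-in).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1
q1_z = std_normal.inv_cdf(0.25)  # z score at the 25th percentile
q3_z = std_normal.inv_cdf(0.75)  # z score at the 75th percentile
print(round(q1_z, 2), round(q3_z, 2))  # -0.67 0.67
```

So the middle 50% of a normal model spans only about 1.35 sd, not 2 sd.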
What are the percentiles, from left to right, at -2, -1, 0, +1, and +2 sd on the normal model?
2.5-16-50-84-97.5
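The card's numbers come from the 68-95-99.7 rule. A Python sketch with `statistics.NormalDist` shows the exact areas below -2, -1, 0, +1, and +2 sd, which round slightly differently.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1
percentiles = [round(100 * std_normal.cdf(z), 1) for z in (-2, -1, 0, 1, 2)]
print(percentiles)  # [2.3, 15.9, 50.0, 84.1, 97.7]
```

Close enough to 2.5-16-50-84-97.5 for sketching, but use normcdf when you need the exact value.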
What percentile is the median (aka Q2)?
50th