UNIT 1 2017 Flashcards
what is the shortcut normcdf?
gives % from raw data, skips Z score. normcdf (low VALUE, high VALUE, mean, sd)
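A rough Python stand-in for the calculator shortcut, in case you want to check an answer away from the calculator. The function name `normcdf` and the height numbers (mean 64, sd 3) are made up for illustration; `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

def normcdf(low, high, mean=0.0, sd=1.0):
    """Proportion of the normal model that falls between low and high."""
    model = NormalDist(mu=mean, sigma=sd)
    return model.cdf(high) - model.cdf(low)

# Hypothetical example: heights with mean 64 and sd 3.
# What share of the model is between 61 and 67 (one sd either side)?
share = normcdf(61, 67, 64, 3)
print(round(share, 4))  # about 0.6827, the "68" in the 68-95-99.7 rule
```

Notice that no z score was computed by hand: the mean and sd go straight in, just like on the calculator.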
what is the shortcut invnorm?
gives data value from percentile, skips Z score. Invnorm (percentile, mean, sd)
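The same idea in reverse, sketched in Python. The name `invnorm` and the test-score numbers (mean 500, sd 100) are assumptions for the example; `inv_cdf` is the standard-library method that does the work. The percentile goes in as a decimal (0.90, not 90).

```python
from statistics import NormalDist

def invnorm(percentile, mean=0.0, sd=1.0):
    """Data value that sits at the given percentile (entered as a decimal)."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(percentile)

# Hypothetical example: scores with mean 500 and sd 100.
# What score marks the 90th percentile?
cutoff = invnorm(0.90, 500, 100)
print(round(cutoff, 1))  # about 628.2
```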
Why do we plug 999 into normcdf?
It needs a z score, but we can’t plug in infinity. So we go down or up 999 standard deviations and that pretty much gets everything
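A quick way to convince yourself that 999 is "close enough" to infinity, sketched with Python's standard `NormalDist`. The z score of 1.5 is an arbitrary pick for the demonstration.

```python
from statistics import NormalDist

z_model = NormalDist()  # standard normal model: mean 0, sd 1

# Area below z = 1.5 if we could really start at negative infinity.
exact = z_model.cdf(1.5)

# Area between -999 and 1.5, the way we type it into normcdf.
with_999 = z_model.cdf(1.5) - z_model.cdf(-999)

# What we left out by stopping at -999: far too small to matter.
missed = z_model.cdf(-999)
print(exact, with_999, missed)
```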
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on each one, and one of them had 9 pickles, then the number 9 from that burger would be called ____?
a datum, or a data value.
What is data?
Any collected information, generally each little measurement. If it is a survey about liking porridge, the data might be “yes, yes, no, yes, yes.” If it is the number of saltines someone can eat in 30 seconds, the data might be “3, 1, 2, 1, 4, 3, 3, 4.”
If you want to find percentile for a value, what do you put into normcdf (? ?)
find z score for value, and then normcdf (-999, Zright) like going from negative infinity up to the z score
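The two steps from this card (z score first, then area from -999 up to it), sketched in Python. The numbers (value 130, mean 100, sd 15) are hypothetical.

```python
from statistics import NormalDist

# Hypothetical numbers for illustration.
value, mean, sd = 130, 100, 15

# Step 1: z score for the value.
z = (value - mean) / sd  # 2.0

# Step 2: area from "negative infinity" (-999) up to that z score.
z_model = NormalDist()  # standard normal: mean 0, sd 1
percentile = z_model.cdf(z) - z_model.cdf(-999)
print(round(percentile, 4))  # about 0.9772, so roughly the 98th percentile
```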
are there any normal samples?
no, nothing is normal, just normalish. The only normal thing is the model we use.
How can you describe spread?
range, IQR, stand dev, variance, or simply say: From here, to about here
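Here is a sketch of those spread measures in Python, using the saltine counts from the "What is data?" card. Two assumptions to flag: `pstdev` and `pvariance` treat the list as a whole population, and `quantiles` uses one particular quartile rule, so the IQR may not match what a calculator reports for the same list.

```python
from statistics import pstdev, pvariance, quantiles

saltines = [3, 1, 2, 1, 4, 3, 3, 4]

# Range: biggest minus smallest.
data_range = max(saltines) - min(saltines)

# IQR: third quartile minus first quartile (quartile rules vary by tool).
q1, _q2, q3 = quantiles(saltines, n=4)
iqr = q3 - q1

# Standard deviation and variance (population versions).
sd = pstdev(saltines)
variance = pvariance(saltines)

print(data_range, iqr, sd, variance)
```

The variance is just the standard deviation squared, which is why the sd is usually the easier one to interpret: it is in the same units as the data.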
How can you think about the mean and median to remember the difference when looking at a histogram?
mean is balancing point of histogram, median splits the area of the histogram in half.
Compare DATA-STATISTIC-PARAMETER using CATEGORICAL example
Data are individual measures, like meal preference: “taco, taco, pasta, taco, burger, burger, taco.” Statistics and Parameters are summaries. A statistic would be “42% of sample preferred tacos” and a parameter would be “42% of population preferred tacos.” Notice that for categorical variables, the categories are words and the statistics and parameters are percents.
Give a simple example showing that adding a constant doesn’t change the spread, but changes the center. (this always happens)
Data set: 1,2,3,4,5 Spread (range):4, Center: 3
add three and get new data set: 4,5,6,7,8 Spread (range): 4, Center: 6 (center went up 3, spread stayed the same). The IQR and SD will stay the same, but median and mean go up 3. Called shifting, or sliding the data.
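A small Python check of shifting, using the 1,2,3,4,5 data set and adding 3 to every value. The helper name `data_range` is made up for this sketch.

```python
from statistics import mean, median, pstdev

def data_range(values):
    """Range: biggest value minus smallest value."""
    return max(values) - min(values)

data = [1, 2, 3, 4, 5]
shifted = [x + 3 for x in data]  # 4, 5, 6, 7, 8

# Center slides up by the constant.
center_change = mean(shifted) - mean(data)
median_change = median(shifted) - median(data)

# Spread does not move at all.
range_change = data_range(shifted) - data_range(data)
sd_change = pstdev(shifted) - pstdev(data)

print(center_change, median_change, range_change, sd_change)
```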
what is a nice mean/median/mode helper diagram?
Sketch a skewed left distribution, then mean/median/mode will be labeled in order from L to R
data or datum?
datum is singular, like “hey dude, come see this datum I got from this rat!” Data is the plural: “hey, look at all that data Edgar got from his Brussels sprouts.”
If a distribution is skewed left, what will be greater, the mean or median? WHY?
Median. The mean moves left to keep balance.
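A made-up skewed-left data set to see this in numbers: most values are bunched on the high end, and one low value stretches the tail to the left.

```python
from statistics import mean, median, mode

# Hypothetical quiz scores out of 10: bunched high, one very low score.
scores = [1, 7, 8, 8, 9, 9, 9, 10]

score_mean = mean(scores)      # 7.625, dragged toward the low tail
score_median = median(scores)  # 8.5
score_mode = mode(scores)      # 9

print(score_mean, score_median, score_mode)
```

The order from left to right is mean, median, mode, which matches the helper diagram for a skewed-left sketch.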
When drawing a graph or chart, what do you have to remember to do?
LABEL AXES, make a KEY (if needed), AND GIVE IT A NAME!!! “Figure 1: Age and Food Preference”
What is the difference between a bar chart and a histogram?
bar charts are for categorical data (bars don’t touch and can often be in any order) and histograms are for quantitative data (bars usually touch and x axis is in order)
What is a quantitative variable? Compared to quantitative data?
Quantitative variables are the things you are interested in, like: Height, age, price, number of cars sold, SAT score. Quantitative data are the actual heights or ages from individuals: 54”, 2 years, $34.99
Why don’t we just use the average (mean) all the time? (instead of mode and median)
The word average is a general term that can actually be talking about the mean, median, or mode. We don’t always use the mean because it is not RESILIENT (resistant): it is impacted by skewness and outliers.
How can you tell if variables in a contingency table are independent?
If the distributions are the same across the variables, then one doesn’t DEPEND on the other, so they are INDEPENDENT. Ex: 30% of freshmen and 30% of seniors like cabbage.
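A sketch of that check in Python. The counts are invented so that both classes come out at 30%; the helper name `row_distribution` is also just for this example. With real sample data the percents will rarely match exactly, so you look for "about the same."

```python
# Hypothetical contingency table: class year by cabbage opinion (counts).
counts = {
    "freshmen": {"likes cabbage": 30, "dislikes cabbage": 70},
    "seniors": {"likes cabbage": 15, "dislikes cabbage": 35},
}

def row_distribution(row):
    """Turn one row of counts into proportions of that row's total."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

# Conditional distribution of cabbage opinion within each class year.
distributions = {group: row_distribution(row) for group, row in counts.items()}

for group, dist in distributions.items():
    print(group, dist)
```

Both rows give 30% likes and 70% dislikes, so in this made-up table cabbage opinion does not depend on class year.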
What is a categorical variable? Compare to categorical data.
Categorical (or qualitative) variables are the categories you are interested in, like “hair color” and “music preference”. The data are the measurements from individuals, like: SUV, sedan, Listens to Hip Hop, Female, yes, no, etc.
When are box plots used most often?
When comparing a bunch of different sets of data.
How do you describe distributions (histograms)?
Shape-Center-Spread- and STRANGE (Outliers and gaps). Some say GSOCS. Where’s yo GSOCS?
what happens if you multiply all of a data set by a constant? Think of an example
It is scaled. Both center and spread are impacted. Mean/ median/ stand dev/ IQR/ quartiles are all multiplied by that constant. Center, spread and all individual values are changed. Consider 1,2,3,4,5 with a mean of 3 and a range of 4. Now multiply by 3: 3,6,9,12,15 and you get a mean of 9 and a range of 12… both multiplied by three.
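The same example checked in Python. The helper name `data_range` is made up for this sketch; `pstdev` is the population standard deviation from the standard library.

```python
from statistics import mean, pstdev

def data_range(values):
    """Range: biggest value minus smallest value."""
    return max(values) - min(values)

data = [1, 2, 3, 4, 5]
scaled = [x * 3 for x in data]  # 3, 6, 9, 12, 15

# Center and spread are both multiplied by the constant.
mean_ratio = mean(scaled) / mean(data)
range_ratio = data_range(scaled) / data_range(data)
sd_ratio = pstdev(scaled) / pstdev(data)

print(mean_ratio, range_ratio, sd_ratio)  # each ratio is 3 (up to rounding)
```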
what is a z score?
the number of standard deviations away from the mean
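As a formula, that is (value minus mean) divided by the standard deviation. A minimal Python sketch, with a hypothetical function name `z_score` and made-up numbers:

```python
def z_score(value, mean, sd):
    """How many standard deviations the value sits from the mean."""
    return (value - mean) / sd

# Hypothetical example: a score of 55 when the mean is 70 and the sd is 10.
z = z_score(55, 70, 10)
print(z)  # -1.5, meaning 1.5 standard deviations BELOW the mean
```

A negative z score means the value is below the mean, and a positive one means it is above.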