DECK 2 Flashcards
What is a contingency table?
Shows how counts are distributed across 2 categorical variables at once, like gender across music preference (male/female across hip hop/country/classical). AKA 2-way table.
How can you tell if variables in a contingency table are independent?
If the conditional distributions are the same across the categories of the other variable. Then the outcome doesn’t DEPEND on which category it came from, it has the same likelihood either way... it’s INDEPENDENT.
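A rough sketch of that check, using made-up counts (the numbers and the helper names `conditional_distribution` and `looks_independent` are assumptions for illustration, not from the deck). It turns each row into proportions and compares them. With real sample data the rows will rarely match exactly, so treat this as the idea rather than a formal test.

```python
# Hypothetical counts: rows are gender, columns are music preference.
table = {
    "male":   {"hip hop": 20, "country": 10, "classical": 10},
    "female": {"hip hop": 40, "country": 20, "classical": 20},
}

def conditional_distribution(row):
    """Turn one row of counts into proportions that sum to 1."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

def looks_independent(table, tolerance=1e-9):
    """True if every row has the same conditional distribution."""
    distributions = [conditional_distribution(row) for row in table.values()]
    first = distributions[0]
    return all(
        abs(dist[category] - first[category]) <= tolerance
        for dist in distributions
        for category in first
    )

male_dist = conditional_distribution(table["male"])
female_dist = conditional_distribution(table["female"])
print(male_dist)
print(female_dist)
print(looks_independent(table))
```

Here both rows split 50% / 25% / 25%, so knowing the gender tells you nothing extra about music preference.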
Marginal distribution?
The overall distribution of a single variable in a contingency table (the totals out in the margins).
Conditional distribution?
The distribution of one variable within only one row or column on the inside of the table... NOT IN THE MARGINS.
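To see the difference between marginal and conditional, here is a small sketch with invented counts (the table values and variable names are assumptions). Marginals divide row or column totals by the grand total; a conditional divides the cells of one row by that row's own total.

```python
# Hypothetical counts: rows are gender, columns are music preference.
table = {
    "male":   {"hip hop": 30, "country": 10, "classical": 10},
    "female": {"hip hop": 10, "country": 20, "classical": 20},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: row totals over the grand total.
gender_marginal = {
    gender: sum(row.values()) / grand_total for gender, row in table.items()
}

# Marginal distribution of music preference: column totals over the grand total.
genres = list(table["male"])
music_marginal = {
    genre: sum(row[genre] for row in table.values()) / grand_total
    for genre in genres
}

# Conditional distribution of music preference among females only:
# stay inside the "female" row and divide by that row's total.
female_total = sum(table["female"].values())
music_given_female = {
    genre: count / female_total for genre, count in table["female"].items()
}

print(gender_marginal)
print(music_marginal)
print(music_given_female)
```

Notice that hip hop is 40% overall but only 20% among females in this made-up table, which is the kind of mismatch that points to association.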
Association and independence: how are they related?
Variables are either ASSOCIATED or INDEPENDENT. If they are associated, then they are not independent; if they are independent, then they are not associated.
Mean/SD/median/IQR? What to use?
When unimodal and symmetric, use MEAN and SD. If skewed or there are outliers, use MEDIAN and IQR. If BIMODAL, talk about the MODES and use range or IQR.
How do you describe distributions (histograms)?
Shape-Center-Spread-Outliers-Gaps.. GSOCS! Where’s yo GSOCS?
If asked to compare distributions, what should you write about?
GSOCS.. Write a sentence comparing shapes, and then one comparing centers, then comparing spreads and finally gaps and outliers…
Center description?
mean (balance point), median (splits area in half), mode (peaks; if bimodal, talk about both modes), or simply say “centered around ____”
Shape description?
unimodal, bimodal, multimodal, uniform, symmetric, skewed
Spread description?
We have many measures of spread: range, IQR, standard deviation, variance, or simply say “from about here to about here.”
what happens if you ADD a constant to each value in a data set?
It is SHIFTED only. This affects all of the data values, the measures of center (mean, median) and the positions (quartiles, deciles, etc.)... IT DOES NOT CHANGE THE SPREAD!
(IQR, standard deviation and range all stay the SAME.)
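A quick way to convince yourself, sketched with Python's standard `statistics` module on made-up values (the data, the constant 10 and the helper `summarize` are assumptions for illustration):

```python
import statistics

def summarize(values):
    """Return center and spread measures for a small data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),
    }

data = [2, 4, 4, 6, 9]               # made-up values
shifted = [x + 10 for x in data]     # add the same constant to every value

before = summarize(data)
after = summarize(shifted)
print(before)
print(after)
```

The mean and median each go up by 10, while the range and standard deviation come out the same for both lists.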
what happens if you multiply all of a data set by a constant?
It is SCALED. Everything is affected: mean, median, standard deviation, IQR and quartiles are all multiplied by that constant (the spread measures by its absolute value, since spread is never negative). Center, spread and all individual values are changed.
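The same kind of check works for scaling. This is only a sketch on made-up values with a hypothetical constant `k = 3`, using the standard `statistics` module; each ratio compares the scaled measure to the original one.

```python
import statistics

data = [2, 4, 4, 6, 9]    # made-up values
k = 3                     # hypothetical positive constant
scaled = [k * x for x in data]

# Ratio of each measure after scaling to the same measure before scaling.
mean_ratio = statistics.mean(scaled) / statistics.mean(data)
median_ratio = statistics.median(scaled) / statistics.median(data)
stdev_ratio = statistics.stdev(scaled) / statistics.stdev(data)
range_ratio = (max(scaled) - min(scaled)) / (max(data) - min(data))

print(mean_ratio, median_ratio, stdev_ratio, range_ratio)
```

Every ratio comes out at (about) 3, so center and spread both stretch by the constant.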
What is the five number summary?
min, Q1, Q2(median), Q3 and max
How do you find Q1 and Q3?
Q1 is the median of the bottom half (25th %ile) and Q3 is the median of the upper half (75th %ile)
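The "median of each half" rule can be sketched like this. It assumes the common textbook convention of leaving the median out of both halves when the number of values is odd; calculators and software sometimes use other quartile rules, so answers can differ slightly. The function name `five_number_summary` and the data are made up for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least 2 values. When the count is odd, the middle value
    is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # bottom half
    upper = data[half + n % 2:]      # top half (skips the middle value if n is odd)
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

even_summary = five_number_summary([1, 3, 4, 7, 8, 10, 12, 15])
odd_summary = five_number_summary([7, 1, 5, 3, 2, 6, 4])
print(even_summary)
print(odd_summary)
```

For the first list the halves are 1, 3, 4, 7 and 8, 10, 12, 15, so Q1 is 3.5 and Q3 is 11.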
How can you match boxplots to histograms?
USE THE FISH TANK METHOD!
For information purposes, which gives most? stem-leaf, histogram or box-whisker?
Stem-and-leaf gives the actual values and the shape, histogram gives just the shape, and box-and-whisker gives the least amount of detail, but boxplots are great for comparing multiple distributions.
What percent of the data is above Q3?
25%
What percent of the data is below the median?
50%
What percent of the data is between Q1 and Q3?
50%
What is the IQR?
Interquartile range, a measure of spread. IQR = Q3 - Q1, the distance from Q1 to Q3.
What are the percentiles for Q1, med, and Q3?
25, 50 and 75
Where are the “outlier fences”?
1.5 × IQR above Q3 and 1.5 × IQR below Q1. Anything outside the fences gets flagged as a possible outlier. Just a rule of thumb.
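Here is the fence arithmetic as a short sketch. The data and the quartile values are made up (Q1 and Q3 were found as the medians of the bottom and top halves), and `outlier_fences` is a hypothetical helper name.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule of thumb."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 4, 7, 8, 10, 12, 40]   # made-up values, already sorted
q1, q3 = 3.5, 11.0                   # medians of 1, 3, 4, 7 and of 8, 10, 12, 40

low, high = outlier_fences(q1, q3)
outliers = [x for x in data if x < low or x > high]

print(low, high)
print(outliers)
```

The IQR is 7.5, so the fences sit 11.25 below Q1 and 11.25 above Q3, and only the 40 lands outside them.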
How can you think of mean, median and mode to help understand?
Mean is balancing point of histogram, median splits area in half, mode is the peaks of histogram…
What if the mean and median are way different?
There is evidence that the data is skewed or there is an outlier. (the mean chases the tail)
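A tiny made-up example of the mean chasing the tail, using the standard `statistics` module (both lists are invented for illustration): swap one value for a big outlier and watch which measure moves.

```python
import statistics

symmetric = [4, 5, 6, 7, 8]       # made-up, balanced around 6
with_outlier = [4, 5, 6, 7, 80]   # same list with one extreme high value

sym_mean = statistics.mean(symmetric)
sym_median = statistics.median(symmetric)
out_mean = statistics.mean(with_outlier)
out_median = statistics.median(with_outlier)

print(sym_mean, sym_median)
print(out_mean, out_median)
```

The median stays at 6 in both lists, but the mean gets dragged up toward the outlier, which is why a big gap between the two is a warning sign.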
Who is the best?
YOU ARE!! Because you are studying these here cards!!! Way to go. I am proud of you.