1.7-1.9 Quantitative Variables: Analysis Flashcards
How do you find the variance?
Variance is just the standard deviation squared. You will usually be given the s.d., or you can calculate it on a calculator using “1-var stats”.
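As a quick check of the "s.d. squared" relationship, here is a minimal sketch using Python's standard `statistics` module. The data values are made up for illustration. It assumes the sample s.d. corresponds to the calculator's Sx and the population s.d. to σx.

```python
import statistics

# Made-up example data (not from the flashcards).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample versions (divide by n - 1), assumed to match Sx on the calculator.
sample_sd = statistics.stdev(data)
sample_var = statistics.variance(data)

# Population versions (divide by n), assumed to match sigma-x on the calculator.
pop_sd = statistics.pstdev(data)
pop_var = statistics.pvariance(data)

# In both cases the variance is the standard deviation squared.
print(sample_sd ** 2, sample_var)
print(pop_sd ** 2, pop_var)
```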
How do you interpret the standard deviation?
“The typical (variable) is an average of (s.d. w/ units) different from the mean (variable).”
What does “Correlation does not imply causation” mean?
Just because two variables are related, you can’t say that one is causing the other unless you do a controlled experiment. There could be multiple other factors that affect both variables.
How do I calculate mean, median, s.d., etc. on my calculator?
Put the data into L1 (Stat -> Edit) and then run 1-var stats (Stat -> arrow right to calc -> #1 1-var stats). Scroll down to see more if needed.
What is IQR? How do I find it?
Interquartile range. It is just a number. Subtract Q1 from Q3 (IQR = Q3 - Q1). On a boxplot it is the width of the box.
How do you determine if a set of data has any outliers?
First, define your “fences”. Take 1.5 * IQR (this is just a number). Add this number to Q3 for your upper fence and subtract it from Q1 for your lower fence. Then check if the data has any points that are outside of these fences; those points are outliers. If all you have is summary data, check whether the min and max are inside the fences. If either one is outside, there is at least one outlier.
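The fence rule above can be sketched in a few lines of Python. The function names `fences` and `find_outliers` and the sample numbers are hypothetical, chosen only for illustration. Q1 and Q3 are passed in directly, as if read from 1-var stats.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    step = 1.5 * iqr
    return q1 - step, q3 + step


def find_outliers(data, q1, q3):
    """Return the points that fall outside the fences."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Made-up example: Q1 = 10 and Q3 = 20, so IQR = 10 and 1.5 * IQR = 15.
example_data = [3, 10, 12, 15, 20, 22, 40]
lower_fence, upper_fence = fences(10, 20)
outliers = find_outliers(example_data, 10, 20)
print(lower_fence, upper_fence, outliers)
```

With only summary data, the same check works by passing just the min and max as the data list.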
What measures of center and spread should I use to describe data?
The median and IQR should always be paired together. So should the mean and s.d. Generally, if the distribution is not symmetric, you should use the median and IQR instead of the mean and s.d. because they are resistant to skew/outliers.
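To see what "resistant" means, here is a small sketch with made-up numbers. Changing one value into an extreme outlier moves the mean a lot but leaves the median where it was.

```python
import statistics

# Made-up data: the second list is the first with its largest value
# replaced by an extreme outlier.
symmetric = [10, 12, 13, 14, 16]
with_big_outlier = [10, 12, 13, 14, 160]

mean_before = statistics.mean(symmetric)
mean_after = statistics.mean(with_big_outlier)
median_before = statistics.median(symmetric)
median_after = statistics.median(with_big_outlier)

# The mean is pulled toward the outlier; the median does not change.
print(mean_before, mean_after)
print(median_before, median_after)
```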
What is a 5 number summary?
List the min, Q1, median, Q3, and max.
How do you find the median and quartiles?
Make sure the data is in order. Find the middle number. If there is an even number of points, average the two middle numbers. Q1 is just the “median” of the first half of the data and Q3 is the “median” of the second half. If you have all the points, you could also put them in L1 and run 1-var stats instead.
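The by-hand method above can be sketched as follows. One assumption is labeled in the code: when there is an odd number of points, the middle point is left out of both halves. Other quartile conventions exist and can give slightly different answers. The function names are hypothetical.

```python
def median_of_sorted(values):
    """Median of a list that is already in order."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    # Even count: average the two middle numbers.
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max). Needs at least 2 points."""
    values = sorted(data)
    n = len(values)
    # Assumption: with an odd count, the middle point is excluded
    # from both halves.
    lower_half = values[: n // 2]
    upper_half = values[(n + 1) // 2:]
    return (
        values[0],
        median_of_sorted(lower_half),
        median_of_sorted(values),
        median_of_sorted(upper_half),
        values[-1],
    )


# Made-up examples with an odd and an even number of points.
odd_summary = five_number_summary([7, 1, 3, 9, 5, 11, 13])
even_summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8])
print(odd_summary)
print(even_summary)
```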
When comparing distributions, how can you tell which will have a greater standard deviation?
Standard deviation is roughly the average distance that points are from the mean. So a graph that has most of its points near the center will have a smaller s.d. The biggest s.d. will be when there are a lot of points far from the center and/or a few points very far from the center (outliers).
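A small comparison with made-up data sets illustrates the idea. All three lists have five points, but they differ in how far the points sit from the center.

```python
import statistics

# Made-up data sets for illustration.
clustered = [9, 10, 10, 10, 11]      # most points near the center
spread_out = [2, 6, 10, 14, 18]      # many points far from the center
one_outlier = [9, 10, 10, 10, 31]    # one point very far from the center

sd_clustered = statistics.stdev(clustered)
sd_spread_out = statistics.stdev(spread_out)
sd_one_outlier = statistics.stdev(one_outlier)

# The clustered set has the smallest s.d. of the three.
print(sd_clustered, sd_spread_out, sd_one_outlier)
```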
What percent of data is between the median and the max? Q1 and the max? Q3 and the max? Q3 and the min?
About 25% of the data lies between each pair of neighboring values in the 5 number summary. So if you have to combine any of the sections, just add the 25’s. So median to max is 50%, Q1 to max is 75%, Q3 to max is just 25%, and Q3 to min is 75%.
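The "add the 25's" rule can be written as a tiny lookup. This is only a sketch: the `POSITIONS` list and the `percent_between` helper are hypothetical names, and the result is the approximate percent of data between two summary values.

```python
# The five summary values in order; each neighboring pair holds about 25%.
POSITIONS = ["min", "Q1", "median", "Q3", "max"]


def percent_between(start, end):
    """Approximate percent of data between two 5 number summary values."""
    sections = abs(POSITIONS.index(end) - POSITIONS.index(start))
    return 25 * sections


print(percent_between("median", "max"))
print(percent_between("Q1", "max"))
print(percent_between("Q3", "max"))
print(percent_between("Q3", "min"))
```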