Data representation & Regression and Correlation Flashcards
What do we need on a box plot?
- Minimum and maximum value
- Median
- Upper and lower quartile (Q1 and Q3)
- Outliers represented as crosses
How do we find outlier boundaries?
Lower boundary: Q1 - 1.5 × IQR
Upper boundary: Q3 + 1.5 × IQR
(where IQR = Q3 - Q1)
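As a rough illustration, the boundary rule can be sketched in Python. The data set and the quartile values (Q1 = 11, Q3 = 17) are made up for the example, and the function names are hypothetical. The quartiles are passed in directly because different courses calculate them in slightly different ways.

```python
def outlier_boundaries(q1, q3):
    """Return (lower, upper) outlier boundaries from the quartiles."""
    iqr = q3 - q1  # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return every value that lies outside the outlier boundaries."""
    lower, upper = outlier_boundaries(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Made-up data; the quartiles are assumed, not calculated here.
data = [3, 10, 12, 13, 15, 16, 18, 40]
lower, upper = outlier_boundaries(q1=11, q3=17)  # IQR = 6
outliers = find_outliers(data, q1=11, q3=17)
print(lower, upper, outliers)
```

With these values the boundaries are 2 and 26, so only 40 would be marked with a cross.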
What kind of data do we use for histograms?
Continuous data
What equation do we need to remember with histograms?
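kf = cw × fd
(k × frequency = class width × frequency density, where the frequency density is the height of the bar)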
How do we find frequency density with a histogram?
Frequency ÷ class width
What is the area of a histogram equal to?
Frequency × k
(you must find out what k is in the question)
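The histogram relationships above can be sketched in Python as follows. All numbers are made up, and the function names are hypothetical. The sketch assumes the usual exam set-up: one bar with a known frequency is used to find k, and k is then used to read frequencies off the other bars.

```python
def frequency_density(frequency, class_width):
    """Frequency density = frequency / class width."""
    return frequency / class_width


def find_k(bar_width, bar_height, known_frequency):
    """Area = k * frequency, so k = area / frequency."""
    return (bar_width * bar_height) / known_frequency


def frequency_from_bar(bar_width, bar_height, k):
    """Rearranged: frequency = area / k."""
    return (bar_width * bar_height) / k


# Made-up classes as (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]
densities = [frequency_density(f, hi - lo) for lo, hi, f in classes]

# A bar drawn 5 wide and 4 high is known to represent a frequency of 10.
k = find_k(bar_width=5, bar_height=4, known_frequency=10)

# Another bar on the same histogram is 10 wide and 3 high.
other_frequency = frequency_from_bar(bar_width=10, bar_height=3, k=k)
print(densities, k, other_frequency)
```

Here k works out as 2, so the second bar (area 30) represents a frequency of 15.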
What do we do when we are comparing sets of data?
- Compare a measure of location (median, mean)
- Compare a measure of spread (variance, standard deviation)
- Put them into context
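A minimal sketch of such a comparison, using Python's standard `statistics` module. The two sets of test scores are invented, and `summarise` is a hypothetical helper. `pstdev` is the population standard deviation; whether to divide by n or n - 1 depends on the course, so that choice is an assumption here.

```python
from statistics import mean, median, pstdev


def summarise(data):
    """Return one measure of location pair and one measure of spread."""
    return {
        "median": median(data),
        "mean": mean(data),
        "sd": pstdev(data),  # population standard deviation (assumed)
    }


# Made-up test scores for two classes.
class_a = [52, 55, 58, 60, 65]
class_b = [30, 45, 58, 70, 87]

summary_a = summarise(class_a)
summary_b = summarise(class_b)
print(summary_a)
print(summary_b)
```

In context: both classes have the same median and mean score (58), but class B has a much larger standard deviation, so its scores are less consistent.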
What is the correlation coefficient?
It is a value, r, which measures the strength and direction (positive or negative) of the linear correlation shown on a scatter graph.
What values is the correlation coefficient between?
-1 ≤ r ≤ 1
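As an illustration, r can be calculated from the summary statistics Sxy, Sxx and Syy. This sketch assumes the correlation coefficient meant here is the product moment correlation coefficient; the function name `pmcc` and the data are made up.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4]
r_positive = pmcc(xs, [2, 4, 6, 8])  # points lie exactly on a rising line
r_negative = pmcc(xs, [8, 6, 4, 2])  # points lie exactly on a falling line
r_none = pmcc(xs, [1, 2, 2, 1])      # no linear trend
print(r_positive, r_negative, r_none)
```

The three made-up data sets give the two extreme values of r and the no-correlation value.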
What value is r when there is no correlation and what would that look like on a scatter graph?
0
The points are scattered randomly with no clear trend.
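What would a scatter graph with a positive correlation look like?
The points rise from bottom left to top right: as one variable increases, the other tends to increase.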
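What would a scatter graph with a negative correlation look like?
The points fall from top left to bottom right: as one variable increases, the other tends to decrease.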
What is a regression line and what is its formula?
The line of best fit through the points on a scatter graph
It comes in the form y = mx + c, where m is the gradient and c is the y-intercept
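A minimal sketch of finding m and c, assuming the line of best fit is the least squares regression line of y on x (the card does not say which method is intended). The function name and data are made up.

```python
def regression_line(xs, ys):
    """Return (m, c) for the least squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # gradient
    c = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, c


# Made-up data lying exactly on y = 2x + 1.
m, c = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)
```

Because the made-up points lie exactly on a straight line, the sketch recovers m = 2 and c = 1.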
What is interpolation?
Estimating inside the data range
What is extrapolation?
Estimating outside the data range
Why is interpolation better than extrapolation?
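Because it is more reliable: the relationship is only known to hold within the range of the data, and it may not continue outside that range.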
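The distinction can be checked mechanically. In this sketch the function name `estimate_type` and the x-values are made up; an estimate counts as interpolation when the x-value used lies between the smallest and largest x-values in the data.

```python
def estimate_type(x, xs):
    """Classify an estimate at x as interpolation or extrapolation."""
    if min(xs) <= x <= max(xs):
        return "interpolation"
    return "extrapolation"


# Made-up x-values from a data set.
xs = [10, 20, 30, 40]
inside = estimate_type(25, xs)   # 25 lies within 10 to 40
outside = estimate_type(55, xs)  # 55 lies beyond the largest x-value
print(inside, outside)
```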
What are the advantages and disadvantages of using box plots?
+Highlights outliers
+Makes it easy to compare data sets
-Data is grouped into only 4 categories so some detailed analysis isn’t possible
What are the advantages and disadvantages of using histograms?
+Clearly shows shape of distribution
-Doesn’t always highlight outliers
-Not easy to estimate Q1, Q2, Q3
What are the advantages and disadvantages of using cumulative frequency curves?
+Easy to find the five number summary (Q1, median, etc.)
-Doesn’t always highlight outliers
-If interval boundaries aren’t shown, the degree of detail is not clear.
What does it mean when a box plot is positively skewed and what does this look like?
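The median is closer to the LQ than the UQ, so the part of the box between the median and the UQ is longer than the part between the LQ and the median.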
What does it mean when a box plot is negatively skewed and what does this look like?
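The median is closer to the UQ than the LQ, so the part of the box between the LQ and the median is longer than the part between the median and the UQ.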
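The two skew rules can be sketched as a single comparison of the distances either side of the median. The function name `box_plot_skew` and the quartile values are made up, and the "symmetric" label for equal distances is an assumption added for completeness.

```python
def box_plot_skew(q1, median, q3):
    """Describe skew by comparing median - Q1 with Q3 - median."""
    lower_gap = median - q1
    upper_gap = q3 - median
    if lower_gap < upper_gap:
        return "positive"   # median closer to the LQ
    if lower_gap > upper_gap:
        return "negative"   # median closer to the UQ
    return "symmetric"      # assumed label when the gaps are equal


print(box_plot_skew(10, 12, 20))
print(box_plot_skew(10, 18, 20))
```

The first call has the median 2 above the LQ and 8 below the UQ, so it is positively skewed; the second is the reverse.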