Numerical Measures and Data Representation Flashcards
How to work out the Mean [x̄]
Sum of data values / number of data values
x̄ = Σx / n
How do you work out the mean [x̄] from a frequency table?
Add a new column that multiplies each value x (use the class midpoint for grouped data) by its frequency f, giving fx
x̄ = Σ(fx) / Σf
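A minimal sketch of the frequency-table mean, assuming the values (or class midpoints) and their frequencies are held in two parallel lists. The function name and the sample table are made up for illustration.

```python
def mean_from_frequency_table(values, frequencies):
    """Return Σ(fx) / Σf for paired values (or midpoints) and frequencies."""
    total_fx = sum(x * f for x, f in zip(values, frequencies))  # the extra fx column, summed
    total_f = sum(frequencies)
    return total_fx / total_f

# Hypothetical table: 1 occurs twice, 2 occurs five times, 3 occurs three times.
values = [1, 2, 3]
frequencies = [2, 5, 3]
print(mean_from_frequency_table(values, frequencies))  # Σfx = 21, Σf = 10
```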
Q1 =
lower quartile
for grouped data
= (n/4)th data value
for non-grouped data
= ((n+1)/4)th data value
Q2 =
Median
for grouped data
= (n/2)th data value
for non-grouped data
= ((n+1)/2)th data value
Q3 =
Upper Quartile
for grouped data
= (3n/4)th data value
for non-grouped data
= (3(n+1)/4)th data value
In listed data, for quartiles, if the position is a decimal…
round up to the next whole position
In listed data, for quartiles, if the position is a whole number…
take the midpoint of that value and the next one
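A rough sketch of the round-up / midpoint rule for listed data. It assumes the position is taken as n/4, n/2 or 3n/4 (conventions differ between courses, so check which one yours uses). The function name and data are hypothetical.

```python
def quartile_listed(data, k):
    """k-th quartile (k = 1, 2 or 3) of listed data.

    Assumption: the position is k*n/4. A decimal position is rounded up;
    a whole position means taking the midpoint with the next value.
    """
    ordered = sorted(data)
    whole, remainder = divmod(k * len(ordered), 4)
    if remainder:
        # Decimal position: round up to position whole + 1 (list index `whole`).
        return ordered[whole]
    # Whole position: midpoint of that value and the next one.
    return (ordered[whole - 1] + ordered[whole]) / 2

seven = [2, 4, 5, 7, 8, 10, 12]   # n = 7, positions 1.75, 3.5, 5.25
eight = [1, 2, 3, 4, 5, 6, 7, 8]  # n = 8, positions 2, 4, 6
print([quartile_listed(seven, k) for k in (1, 2, 3)])
print([quartile_listed(eight, k) for k in (1, 2, 3)])
```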
For finding Quartiles in Grouped data:
INTERPOLATION
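One way to sketch linear interpolation for a grouped-data quartile, assuming the values are spread evenly within each class. The function name, class boundaries and frequencies are invented for illustration.

```python
def grouped_quartile(boundaries, frequencies, fraction):
    """Estimate a quartile from grouped data by linear interpolation.

    boundaries: list of (lower, upper) class boundaries, in order.
    fraction: 0.25 for Q1, 0.5 for Q2, 0.75 for Q3.
    """
    target = fraction * sum(frequencies)  # position of the data value wanted
    cumulative = 0                        # frequency below the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            # Go the right proportion of the way through this class.
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]  # hypothetical classes
freqs = [4, 6, 8, 2]                               # n = 20
print(grouped_quartile(classes, freqs, 0.5))   # 10th value: end of the 10-20 class
print(grouped_quartile(classes, freqs, 0.75))  # 15th value: 5/8 of the way through 20-30
```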
What is Variance?
Shows how spread out the data is.
(σ²)
Standard Deviation (σ) fb
Square root of the variance.
σ = √(Σx²/n - (Σx/n)²)
{mean of the squares minus the square of the mean}
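A short sketch of "mean of the squares minus the square of the mean" for listed data. The function names and the data set are made up for illustration.

```python
import math

def variance(data):
    """σ² = Σx²/n - (Σx/n)²."""
    n = len(data)
    mean_of_squares = sum(x * x for x in data) / n
    square_of_mean = (sum(data) / n) ** 2
    return mean_of_squares - square_of_mean

def standard_deviation(data):
    """σ is the square root of the variance."""
    return math.sqrt(variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # Σx = 40, Σx² = 232, n = 8
print(variance(data))            # 29 - 25
print(standard_deviation(data))
```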
Standard Deviation / Variance from a frequency table
USE CALCULATOR
- menu
- 6:Statistics
- 1: 1-variable
- Fill Table (using Midpoints)
- Option
- 3: 1-variable Calc
For coded data, what terms affect the Standard Deviation?
Only terms that multiply or divide. Adding or subtracting a constant shifts the data but does not change the spread.
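A quick check of the coding rule, using the standard library's `statistics.pstdev` (population standard deviation). The helper name and the data are hypothetical.

```python
import statistics

def spread_after_coding(data, add=0, multiply=1):
    """Population standard deviation of the coded values multiply * x + add."""
    coded = [multiply * x + add for x in data]
    return statistics.pstdev(coded)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
original = statistics.pstdev(data)
print(original)
print(spread_after_coding(data, add=100))     # adding: same as the original
print(spread_after_coding(data, multiply=3))  # multiplying by 3: three times the original
```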
Histogram: how do you calculate Frequency Density?
frequency / class width
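As a small illustration, assuming each class is described by its lower and upper boundaries (the function name and numbers are made up):

```python
def frequency_density(frequency, lower, upper):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)

# Hypothetical class 10-15 containing 20 data values: width 5.
print(frequency_density(20, 10, 15))
```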
How do you draw a boxplot from a cumulative frequency diagram?
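Find the median position (half the total frequency) on the cumulative frequency axis, trace across to the curve and then down to the x-axis to read off the data value
Halve the median position to get the LQ position, and add the LQ position to the median position to get the UQ position, then map each to the x-axis in the same way to get the actual data values
The highest value is the MAXIMUM
and the lowest value is the MINIMUM on the boxplot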
Why use a histogram?
data is continuous
no gaps
When comparing two diagrams, what should you talk about?
- Location
- Spread
What is the range for ‘r’ in Product Moment Correlation Coefficient?
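-1 ≤ r ≤ 1
PMCC measures the strength and direction (positive or negative) of linear correlation: r = 1 is perfect positive and r = -1 is perfect negative correlation, which is why it can only go between -1 and 1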
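A sketch of computing r from paired data, using the standard summary statistics Sxx, Syy and Sxy (these symbols and the function name are not in the cards and are introduced here for illustration; in an exam the calculator normally does this).

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / √(Sxx × Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))  # points on a rising straight line
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))  # points on a falling straight line
```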
Regression line
Line of Best Fit
y = a + bx
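A minimal sketch of finding a and b for the least-squares regression line of y on x, assuming the usual results b = Sxy / Sxx and a = ȳ - b x̄ (formulas and names are not stated in the cards and are added here for illustration).

```python
def regression_line(xs, ys):
    """Return (a, b) for the line y = a + bx fitted by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # intercept: the line passes through (x̄, ȳ)
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x.
a, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```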
What is meant by the term Interpolation?
Estimating inside the data range
What is meant by Extrapolation?
estimating outside the data range
When do you change boundaries in grouped data?
when there is a gap and it is continuous data
use these new boundaries to find midpoints
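A sketch of closing the gaps between classes, assuming every gap is the same size and each class limit moves out by half a gap. The function names and classes are hypothetical.

```python
def close_gaps(classes):
    """Turn class limits such as 10-19, 20-29 into boundaries 9.5-19.5, 19.5-29.5.

    Assumption: the gap between consecutive classes is the same throughout.
    """
    gap = classes[1][0] - classes[0][1]  # e.g. 20 - 19 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

def midpoints(boundaries):
    """Midpoint of each class, found from the new boundaries."""
    return [(lower + upper) / 2 for lower, upper in boundaries]

limits = [(10, 19), (20, 29), (30, 39)]
boundaries = close_gaps(limits)
print(boundaries)
print(midpoints(boundaries))
```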
Cumulative Frequency Diagram
(x,y) <- plot points
x=upper class boundary
y=cumulative frequency
join with a curve
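A small sketch of building the points to plot, assuming the upper class boundaries and class frequencies are given as two lists (names and numbers are made up for illustration).

```python
from itertools import accumulate

def cumulative_frequency_points(upper_boundaries, frequencies):
    """Pair each upper class boundary with the running total of frequencies."""
    return list(zip(upper_boundaries, accumulate(frequencies)))

# Hypothetical classes ending at 10, 20, 30, 40 with frequencies 4, 6, 8, 2.
print(cumulative_frequency_points([10, 20, 30, 40], [4, 6, 8, 2]))
```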
frequency polygon
(x,y)
x= midpoint
y=frequency density
join up with straight lines
Area of bar [histogram]
PROPORTIONAL to frequency
(if not to scale find scale factor)
always check scale on histogram
What are histograms used for?
grouped, continuous data
Pros and Cons of Box Plot
+ highlights outliers
+ easy to compare data sets
- data is summarised in only 4 sections, so detailed analysis is not possible
Pros and Cons Histogram
+clearly shows shape of distribution
- doesn't always highlight outliers
- not easy to estimate quartiles and median
Cumulative Frequency Curve Pros and Cons
+ makes it easy to find the min, LQ, median, UQ and max
- doesn't always highlight outliers