Topic 2 (statistics) Flashcards
Central tendency
Averages
mode
median
mean
Mode
Value or class that occurs most often
Median
Q2
The middle value when data values are in order
If n divided by 2 = whole number: find the midpoint of that term and the term above
If n divided by 2 = not a whole number: round the number up and choose that term
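The n divided by 2 rule on this card can be sketched in Python. This is only an illustration: the function name `median_by_position` and the sample lists are made up for the example.

```python
import math


def median_by_position(values):
    """Median using the 'n divided by 2' rule from the card."""
    data = sorted(values)          # data values must be in order
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    position = n / 2
    if position == int(position):
        # Whole number: midpoint of that term and the term above.
        k = int(position)          # 1-based term number
        return (data[k - 1] + data[k]) / 2
    # Not a whole number: round up and choose that term.
    k = math.ceil(position)
    return data[k - 1]


print(median_by_position([3, 1, 4, 1, 5, 9]))  # 3.5 (n = 6, terms 3 and 4)
print(median_by_position([7, 2, 9, 4, 5]))     # 5   (n = 5, term 3)
```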
Mean
Sum of all values divided by the number of values
Σ (sigma) = sum of
x̄ (x bar) is the mean of a sample; μ (mu) is the mean of the population
x̄ = Σx divided by n
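As a quick check of x̄ = Σx divided by n, here is a minimal Python sketch. The function name `mean` and the sample values are chosen for the example only.

```python
def mean(values):
    """x bar = (sum of x) / n."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    return sum(values) / n


print(mean([2, 4, 6, 8]))  # 5.0 (sum is 20, n is 4)
```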
Pros and cons of mode
Pros
Used with non-numerical and numerical data (qualitative/quantitative)
Not skewed by outliers and anomalies
Always an actual data value
Used for single-mode or bimodal data
Cons
Sometimes no mode or multiple modes
Doesn’t consider every data point
Pros and cons of median
Pros
Not skewed by extreme values
Unaffected by an anomaly or outlier
Cons
Not always an actual data value
Doesn’t use every value in data set
Pros and cons of mean
Pros
Considers all the data values
Cons
Can be skewed by extreme values
Affected by anomalies
Lower quartile
Q1 25%
N divided by 4
Upper quartile
Q3 75%
3n divided by 4
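The n divided by 4 and 3n divided by 4 positions can be turned into values with the same whole-number / round-up rule used for the median. The sketch below assumes that rule; the function name `quartiles_by_position` and the data are illustrative only.

```python
def quartiles_by_position(values):
    """Return (Q1, Q3) using positions n/4 and 3n/4 in the ordered list."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")

    def term(numerator):
        # Position is numerator * n / 4, counted as a 1-based term number.
        whole, remainder = divmod(numerator * n, 4)
        if remainder == 0:
            # Whole number: midpoint of that term and the term above.
            return (data[whole - 1] + data[whole]) / 2
        # Not a whole number: round up and choose that term.
        return data[whole]

    return term(1), term(3)


print(quartiles_by_position([2, 4, 5, 7, 9, 11, 12, 15]))  # (4.5, 11.5)
print(quartiles_by_position(list(range(1, 11))))           # (3, 8)
```

Note that software packages often use other quartile conventions, so their answers can differ slightly from this rule.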
Decile and percentile
Decile - splits data into tenths - D
Percentile - splits data into hundredths - P
3 averages
Mean
Mode
Median
4 measures of spread
Range
IQR
Variance
Standard deviation
median: raw data & grouped data
list of data (raw):
- for n observations in an ordered list, find n divided by 2
- if whole number = find midpoint of that term and the term above
- if not a whole number = round the number up and choose that term
grouped data (unknown data points):
- for n observations, find n divided by 2 = approximate position of the median
lower quartile and upper quartile: raw data & grouped data
list of data (raw):
- for n observations in an ordered list, find n divided by 4 (LQ) or 3n divided by 4 (UQ)
- if whole number = find midpoint of that term and the term above
- if not a whole number = round the number up and choose that term
grouped data (unknown data points):
- for n observations, find n divided by 4 (LQ) or 3n divided by 4 (UQ) = approximate position of the lower/upper quartile
linear interpolation
assumes the data points in a class are spread evenly (linearly) across the class; find how far into its class the median/LQ/UQ position lies - gives a single estimated value from grouped data
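A minimal Python sketch of linear interpolation for grouped data follows. The class boundaries and frequencies are invented for the example, and the function name `interpolate_position` is hypothetical.

```python
def interpolate_position(classes, position):
    """Estimate the value at a given position (e.g. n/2 for the median).

    classes: list of (lower_boundary, upper_boundary, frequency) in order.
    Assumes values are spread evenly within each class.
    """
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            # Fraction of the way through this class where the position lies.
            fraction = (position - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("position is beyond the total frequency")


# Made-up grouped data: (lower, upper, frequency); n = 20.
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
n = sum(freq for _, _, freq in classes)

print(interpolate_position(classes, n / 2))  # 15.0 (estimated median)
```

Passing n / 4 or 3 * n / 4 as the position gives the estimated lower or upper quartile in the same way.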
measures of dispersion
how spread out the data is (consistency)
range and IQR +ves and -ves
Range:
positives: doesn't exclude extreme values
negatives: includes extreme values, so it can be skewed
IQR:
positives: excludes extreme values
negatives: doesn't include all the data
standard deviation
variance = mean of the squares minus the square of the mean
standard deviation = square root of the variance
a measure of the typical distance of values from the mean
(formula in formula book)
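The "mean of the squares minus the square of the mean" rule can be sketched in Python as below. It assumes the version that divides by n (not n - 1); the function names and data are illustrative only.

```python
import math


def variance(values):
    """Mean of the squares minus the square of the mean (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return mean_of_squares - mean ** 2


def standard_deviation(values):
    """Square root of the variance."""
    # max(...) guards against a tiny negative value from rounding.
    return math.sqrt(max(variance(values), 0.0))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))            # 4.0 (29 minus 25)
print(standard_deviation(data))  # 2.0
```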
coded data implications
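adding or subtracting a constant = standard deviation stays the same; this also applies to the range, IQR and variance
adding or subtracting a constant = changes the mean, median and mode by that constant
multiplying or dividing by a constant affects all of them; the variance is scaled by the square of the constant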
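The effect of coding can be checked numerically. The sketch below uses Python's standard `statistics` module with made-up data; the helper name `summary` and the constants 10 and 3 are arbitrary choices for the example.

```python
import statistics


def summary(values):
    """Collect a few measures so coded and uncoded data can be compared."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.pstdev(values),          # divides by n
        "variance": statistics.pvariance(values),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]
shifted = [x + 10 for x in data]   # coding: add 10
scaled = [x * 3 for x in data]     # coding: multiply by 3

# Adding 10 moves the mean and median by 10; range, sd and variance stay put.
print(summary(data))
print(summary(shifted))
# Multiplying by 3 scales mean, median, range and sd by 3, variance by 9.
print(summary(scaled))
```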