Topic 3 (statistics) Flashcards
measures of central tendency: advantages and disadvantages
mode:
pros: can be used with qualitative data; not affected by outliers, errors or omissions; always an observed data value
cons: doesn't use all the data; not representative if its frequency is low or if other values have a similar frequency
median:
pros: not affected by outliers, and not significantly affected by errors or omissions
cons: doesn't make use of all the data
mean:
pros: uses all values in the data set; in a large data set, an outlier has little impact
cons: in a small data set, outliers have a big impact
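A minimal sketch of the points above, using Python's standard `statistics` module. The data values are made up for illustration: adding one outlier to a small set moves the mean a lot, the median only slightly, and the mode not at all.

```python
import statistics

# Hypothetical small data set, with and without one outlier
small = [4, 5, 5, 6, 7]
small_with_outlier = small + [60]

for label, data in [("without outlier", small), ("with outlier", small_with_outlier)]:
    print(
        label,
        "mean =", statistics.mean(data),      # uses every value, so the outlier pulls it up
        "median =", statistics.median(data),  # middle value(s), barely moves
        "mode =", statistics.mode(data),      # most frequent value, unchanged
    )
```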
measures of dispersion/spread: advantages and disadvantages
range:
pros: reflects the full spread of the data set
cons: distorted by outliers
IQR
pros: not distorted by outliers
cons: doesn't reflect the full data set (half of the data is disregarded)
standard deviation
pros: when the data set is large and has few outliers, their impact is negligible
cons: when the data set is small, outliers have a big impact
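A rough sketch comparing the three measures of spread on made-up data, with and without an outlier. It assumes the quartile convention used by `statistics.quantiles` (default "exclusive" method); textbooks and exam boards may use a slightly different rule, so the quartiles can differ a little. `spread` is a hypothetical helper name.

```python
import statistics

def spread(data):
    """Return the range, IQR and (population) standard deviation of data."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles Q1, Q2, Q3
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": statistics.pstdev(data),
    }

# Hypothetical data, then the same data with one outlier added
base = [2, 4, 4, 5, 6, 7, 9]
with_outlier = base + [40]

print(spread(base))
print(spread(with_outlier))
```

The range and standard deviation change much more than the IQR when the outlier is added.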
outliers
a value that lies significantly outside the set of values of a variable
due to:
- errors in measuring/recording data
- natural variation
how to treat:
- clean the data (remove the value) if it is incorrect
- include the value if it is a genuine result of natural variation
ways outliers are defined
anything bigger than Q3 + k(Q3 - Q1)
anything smaller than Q1 - k(Q3 - Q1)
k is a constant, typically 1.5
OR
anything more than a given number of standard deviations from the mean
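Both definitions can be sketched in a few lines. This is only an illustration: the function names, the data and the choice of 2 standard deviations are assumptions, and the quartiles follow the default convention of `statistics.quantiles`, which may not match the one used in class.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Values above Q3 + k(Q3 - Q1) or below Q1 - k(Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def sd_outliers(data, n_sd=2):
    """Values more than n_sd standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    return [x for x in data if abs(x - mean) > n_sd * sd]

# Hypothetical data with one suspicious value
data = [2, 4, 4, 5, 6, 7, 9, 40]
print(iqr_outliers(data))
print(sd_outliers(data))
```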
advantages of a stem and leaf diagram
the data values stay visible, so clusters and outliers are easy to spot
convenient for finding the median, mode and range
two data sets can be compared easily using a back-to-back diagram
cumulative frequency graphs
cumulative frequency on the y-axis
variable on the x-axis
plot at the upper class boundaries
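A small sketch of how the points to plot could be built from a grouped frequency table. The class boundaries and frequencies are invented for illustration.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 3), (10, 20, 7), (20, 30, 12), (30, 40, 5)]

upper_bounds = [upper for _, upper, _ in classes]
cum_freq = list(accumulate(freq for _, _, freq in classes))  # running total

# Each point is (upper class boundary, cumulative frequency)
points = list(zip(upper_bounds, cum_freq))
print(points)
```

The last cumulative frequency equals the total frequency, which is a quick check on the table.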
Frequency density
Frequency divided by class width
Histograms
Continuous data
No gaps between bars
Height = frequency density
Area of bars proportional to frequency
Plot at class boundaries
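A short sketch of the frequency density calculation for unequal class widths, with made-up classes. It also checks that bar area (height times width) gives back the frequency.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 15)]

def frequency_density(lower, upper, freq):
    """Frequency divided by class width."""
    return freq / (upper - lower)

densities = [frequency_density(lo, up, f) for lo, up, f in classes]
print(densities)

# Area of each bar = height (frequency density) x class width
areas = [d * (up - lo) for d, (lo, up, _) in zip(densities, classes)]
print(areas)
```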
no skew / symmetrical distribution
Q2 - Q1 = Q3 - Q2
mean = median = mode
use median and IQR instead when the data are skewed
positive skew
more data to the left (long tail to the right)
Q2 - Q1 < Q3 - Q2
mean > median > mode
negative skew
more data to the right (long tail to the left)
Q2 - Q1 > Q3 - Q2
mean < median < mode
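The quartile comparisons on the skew cards can be sketched as a small check. `quartile_skew` is a hypothetical helper, the data sets are invented, and the quartiles use the default convention of `statistics.quantiles`, so borderline cases may come out differently under another quartile rule.

```python
import math
import statistics

def quartile_skew(data):
    """Classify skew by comparing Q2 - Q1 with Q3 - Q2."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    lower, upper = q2 - q1, q3 - q2
    if math.isclose(lower, upper):
        return "symmetrical"
    return "positive" if lower < upper else "negative"

# Hypothetical data sets
print(quartile_skew([1, 2, 3, 4, 6, 9, 14]))      # long tail to the right
print(quartile_skew([1, 6, 9, 11, 12, 13, 14]))   # long tail to the left
print(quartile_skew([1, 2, 3, 4, 5, 6, 7]))       # evenly spread
```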
measure of skewness: 3(mean - median) / standard deviation
positive for positive skew
negative for negative skew
0 for a symmetrical distribution
greater magnitude = stronger skew
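A minimal sketch of this coefficient. The function name and data are assumptions, and the population standard deviation (`pstdev`) is used here; a course may use the sample version instead, which changes the size of the value but not its sign.

```python
import statistics

def skew_coefficient(data):
    """3(mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    sd = statistics.pstdev(data)  # population standard deviation
    return 3 * (mean - median) / sd

# Hypothetical data sets
right_tail = [1, 2, 3, 4, 6, 9, 14]
left_tail = [1, 6, 9, 11, 12, 13, 14]
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right tail", right_tail), ("left tail", left_tail), ("symmetric", symmetric)]:
    print(name, skew_coefficient(data))
```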
comparing data:
comment on measure of location (usually mean or median)
comment on measure of spread
make comparison in context
compare median and IQR when there are extreme values or the data are skewed (neither is affected by extreme values)
compare mean and standard deviation when the data are fairly symmetrical