16 - Averages and the normal distribution Flashcards
grouped data definition
where the frequency is shown in terms of a range
ungrouped data definition
individual data points
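how to find mode from grouped data (most commonly from a histogram)
identify the class with the highest frequency (the modal class), then draw two diagonals: one from the top left corner of the modal block to the top left corner of the next block, and one from its top right corner to the top right corner of the previous block
the point where the diagonals cross = estimated modal value, read off the x axis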
how to find median from an ogive
median rank = total cumulative frequency / 2
find this rank on the y axis, read across to the curve, then down to the x axis to get the median
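Reading the median off an ogive amounts to linear interpolation along the segment of the curve that crosses the median rank. A minimal sketch, assuming the ogive is drawn with straight lines between class boundaries; the function name, parameters and data are hypothetical:

```python
def estimate_median(upper_bounds, cum_freqs, lower_start=0.0):
    """Read the median off an ogive by linear interpolation.

    upper_bounds: upper boundary of each class (x axis of the ogive)
    cum_freqs:    cumulative frequency at each upper boundary (y axis)
    lower_start:  lower boundary of the first class
    """
    if not cum_freqs or cum_freqs[-1] <= 0:
        raise ValueError("need at least one class with a positive total")
    rank = cum_freqs[-1] / 2  # median rank = total cumulative frequency / 2
    prev_x, prev_cf = lower_start, 0
    for x, cf in zip(upper_bounds, cum_freqs):
        if cf >= rank:
            # Move along the straight ogive segment that crosses the rank
            return prev_x + (rank - prev_cf) / (cf - prev_cf) * (x - prev_x)
        prev_x, prev_cf = x, cf


# Made-up classes 0-10, 10-20, 20-30, 30-40 with frequencies 4, 10, 6, 2
median = estimate_median([10, 20, 30, 40], [4, 14, 20, 22])
print(median)
```

Here the median rank is 11, which falls in the 10-20 class, 7/10 of the way between cumulative frequencies 4 and 14.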
advantages and disadvantages of mean
advantages (+)
- widely used
- widely understood
- uses all the data
disadvantages (-)
- may not be an actual value in the distribution
- distorted by extremely high or low values
- ignores dispersion
advantages and disadvantages of mode
advantages (+)
- not distorted by extremely high or low values
- corresponds to an actual value in the data
disadvantages (-)
- ignores dispersion
- doesn't take all the data into account
advantages and disadvantages of median
advantages (+)
- not distorted by extremely high or low values
- corresponds to an actual value in the distribution
disadvantages (-)
- ignores dispersion
- of limited use
what is dispersion
the spread of a variable about its average (an average gives the location or central point of a distribution; dispersion shows how widely the values are scattered around it)
standard deviation is a measure of dispersion: the larger the SD, the more dispersed the data
what is x in grouped data
the midpoint of each class
how to work out fx² (x squared)
frequency multiplied by x², ie f × x² (square x first, then multiply by f; this is not the same as (fx)²)
what is the coefficient of variation
standard deviation as a % of the mean (SD / mean × 100) - the higher the %, the higher the relative dispersion
properties of standard deviation
- based on all values in distribution
- suitable for further statistical analysis
- more difficult to understand
what is variance
square of standard deviation
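The grouped-data cards above (x as the class midpoint, fx², standard deviation as a % of the mean, variance as the square of the SD) can be tied together in one calculation. A minimal sketch, assuming the population form of the variance (dividing by Σf rather than Σf - 1); the function name and figures are made up:

```python
import math


def grouped_stats(midpoints, freqs):
    """Mean, variance, SD and SD-as-%-of-mean for grouped data."""
    n = sum(freqs)                                                # Σf
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))         # Σfx
    sum_fx2 = sum(f * x ** 2 for x, f in zip(midpoints, freqs))   # Σfx² (f times x squared)
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2   # Σfx²/Σf minus the mean squared
    sd = math.sqrt(variance)             # SD is the square root of the variance
    cv = sd / mean * 100                 # SD as a % of the mean
    return mean, variance, sd, cv


# Made-up classes with midpoints 10, 20, 30 and frequencies 1, 2, 1
mean, variance, sd, cv = grouped_stats([10, 20, 30], [1, 2, 1])
print(mean, variance, round(sd, 2), round(cv, 1))
```

For these figures Σf = 4, Σfx = 80 and Σfx² = 1800, giving a mean of 20 and a variance of 450 - 400 = 50.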
what are range and quartiles
RANGE = measure of spread: highest value minus lowest value
QUARTILES = values that divide the ordered distribution into quarters
interquartile range
Q3 - Q1
quartile deviation
1/2 (upper quartile - lower quartile)
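Range, interquartile range and quartile deviation can be checked on a small ungrouped data set. A minimal sketch using Python's standard `statistics.quantiles`; the data are made up, and the "inclusive" method is an assumption, since textbooks use several slightly different quartile conventions that can give different answers:

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # made-up ungrouped values

# n=4 asks for the three cut points Q1, Q2 (the median) and Q3.
# "inclusive" treats the data as the whole population.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

value_range = max(data) - min(data)   # highest value minus lowest value
iqr = q3 - q1                         # interquartile range = Q3 - Q1
quartile_deviation = iqr / 2          # 1/2 (upper quartile - lower quartile)

print(q1, q3, iqr, quartile_deviation, value_range)
```

Unlike the range, the interquartile range and quartile deviation ignore the top and bottom quarters of the data, so a single extreme value does not affect them.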
decile
deciles divide the distribution into 1/10ths
normal distribution
distribution symmetrical around the mean - important because many real-life variables approximately follow it
reason for bell shape
values are most concentrated close to the mean and thin out further away from it
the curve is always symmetrical, so each side of the mean represents 50% of the population
what is the total area under the curve
100% of population
properties of normal distribution
- width is measured in standard deviations: for practical purposes the curve spans 3 standard deviations on each side of the mean (about 99.7% of values)
- so the effective range is about 6 standard deviations
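The "3 standard deviations on each side" rule can be checked numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `share_within` is made up:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1


def share_within(k):
    """Proportion of a normal population within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, f"{share_within(k):.1%}")
```

The curve itself never quite touches the axis, which is why 6 standard deviations is a practical range rather than an exact one.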
what is a Z score
the distance of a value from the mean of a normal distribution, measured in standard deviations
eg,
a Z score of 1.9 means the value is 1.9 standard deviations from the mean; normal distribution tables translate this into the % of the population lying between the mean and that value
adjustment factor formula (histogram with unequal class widths)
standard class width / current class width
bar height = frequency × adjustment factor
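The adjustment factor rescales each frequency so that bar areas, not bar heights, represent frequency when class widths differ. A minimal sketch; the function name and the figures are made up for illustration:

```python
def adjusted_heights(widths, freqs, standard_width):
    """Histogram bar heights after applying the adjustment factor.

    adjustment factor = standard class width / current class width
    bar height        = frequency * adjustment factor
    """
    return [f * standard_width / w for w, f in zip(widths, freqs)]


# Made-up classes of width 10, 10, 20 and 40, with 10 as the standard width
heights = adjusted_heights([10, 10, 20, 40], [6, 9, 8, 4], standard_width=10)
print(heights)
```

A class twice the standard width has its frequency halved, and one four times as wide has it quartered.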
how to find the likelihood of something from raw numbers
- calculate the z score using the formula z = (x - mean) / SD
- read off the table to find the % between the mean and that z score
- manipulate the figure to answer what the question asks, eg more than, less than or between
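The three steps above can be sketched as follows, with `statistics.NormalDist` standing in for the printed table. This assumes the table lists the area between the mean and z; the helper names and the population figures (mean 100, SD 10) are made up:

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def table_area(z):
    """Area between the mean and z, as a 'mean to z' table would list it."""
    return abs(STANDARD_NORMAL.cdf(z) - 0.5)


# Step 1: z score for x = 120 in a made-up population (mean 100, SD 10)
z = z_score(120, mean=100, sd=10)   # 2.0

# Step 2: "read off the table"
area = table_area(z)                # share between 100 and 120

# Step 3: manipulate the figure to suit the question
p_more = 0.5 - area                 # share above 120
p_less = 0.5 + area                 # share below 120
# 90 and 120 lie on opposite sides of the mean, so the two areas are added
p_between = table_area(z_score(90, mean=100, sd=10)) + area

print(round(area, 4), round(p_more, 4), round(p_less, 4), round(p_between, 4))
```

If both values were on the same side of the mean, the smaller area would be subtracted from the larger one instead of added.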
how to find the value of x when given a % above or below
- identify which % to look up, eg if 80% lies above x, look up 30% (80% - 50%, since the half of the curve above the mean already accounts for 50%)
- find the z score for this % in the table; it is negative here because x lies below the mean
- substitute into z = (x - mean) / SD and solve to find x
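Working backwards from a % to x can be sketched the same way, with `NormalDist.inv_cdf` replacing the reverse table lookup. The function name and the population figures (mean 100, SD 10) are made up for illustration:

```python
from statistics import NormalDist


def x_with_share_above(share_above, mean, sd):
    """Value of x that has the given share of the population above it."""
    # inv_cdf takes the share BELOW the point, so convert first
    z = NormalDist().inv_cdf(1 - share_above)
    # Rearranging z = (x - mean) / sd gives x = mean + z * sd
    return mean + z * sd


# 80% above x: z is about -0.84, so x lies below the mean
x = x_with_share_above(0.80, mean=100, sd=10)
print(round(x, 2))
```

The z score of about -0.84 is the same one the table gives for an area of 30% between the mean and x, with the minus sign added because x is below the mean.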