Central Tendency + Dispersion Flashcards
why must a researcher analyse data?
to identify general patterns and trends
the statistics used for this are known as descriptive statistics (measures of central tendency and dispersion)
measures of central tendency describe the data in terms of an average
measures of dispersion describe the data in terms of how spread out it is
measures of central tendency
measures of central tendency inform us about central/middle values for a set of data
they are ways of calculating a typical value for a set of data
the average can be calculated in different ways, each one appropriate for different situations
three types; MODE, MEAN and MEDIAN
mean
the arithmetic average of a data set
calculated by adding up all the data items and dividing by the number of data items
can only be used with the ratio and interval level data
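The calculation on this card can be sketched in Python. This is a minimal illustration only; the function name `mean` and the scores are made up for the example.

```python
def mean(values):
    """Arithmetic mean: add up all the data items, divide by how many there are."""
    if not values:
        raise ValueError("mean needs at least one data item")
    return sum(values) / len(values)


# Hypothetical interval/ratio scores, invented for illustration
scores = [4, 8, 6, 5, 7]
print(mean(scores))  # 30 / 5 = 6.0
```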
median
the middle value in an ordered list
all data items must be arranged in order and the central value is the median
if there is an even number of data items, there will be two central values: to calculate the median, add the two central values and divide by two
the median can be used with ratio, interval and ordinal data
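The odd and even cases described above can be sketched as follows. The function name `median` and the data are assumptions made for the example.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if the count is even."""
    if not values:
        raise ValueError("median needs at least one data item")
    ordered = sorted(values)          # data must be arranged in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one central value
        return ordered[mid]
    # even count: add the two central values and divide by two
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 5, 1]))  # ordered: 1 3 5 7 9 -> 5
print(median([7, 3, 9, 5]))     # ordered: 3 5 7 9 -> (5 + 7) / 2 = 6.0
```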
mode
the most common item in a set of data
NOMINAL DATA — the category that has the highest frequency count
INTERVAL / ORDINAL DATA: the data item that occurs most frequently; to identify it, arrange the data items in order. For grouped data, the modal group is the group with the greatest frequency
if two categories or data items share the highest frequency, the data has two modes and is considered bimodal
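A short sketch of finding the mode, including the bimodal case. The function name `modes` and the team labels are invented for illustration; it returns a list so that two modes can be reported together.

```python
from collections import Counter


def modes(items):
    """Return every item that shares the highest frequency count."""
    if not items:
        raise ValueError("mode needs at least one data item")
    counts = Counter(items)                 # frequency count per category/value
    highest = max(counts.values())
    return [item for item, count in counts.items() if count == highest]


# Nominal data: hypothetical favourite-team answers
teams = ["Reds", "Blues", "Reds", "Greens", "Blues", "Reds"]
print(modes(teams))                # ['Reds']

# Two values tie for the highest frequency, so the data is bimodal
print(modes([2, 3, 3, 5, 5, 6]))   # [3, 5]
```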
measures of dispersion
sets of data can be described in terms of how dispersed and spread out the data items are
two types; RANGE and STANDARD DEVIATION
range
the distance between the top and bottom values of the set of data
different sets of data can have the same mean (or other measure of central tendency), so the range is helpful as a further way of describing the data and showing how the sets differ
always add one AFTER calculating the range
EXAMPLE = 15 - 3 + 1 = 13: the 1 is added because the values are assumed to be rounded to whole numbers, so the bottom value of 3 could represent a value as low as 2.5 and the top value of 15 could represent a value as high as 15.5
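The range with the add-one correction can be sketched like this. The function name is hypothetical, and the correction follows the convention on this card, which assumes values rounded to whole numbers; other texts define the range as simply top minus bottom.

```python
def range_with_correction(values):
    """Top value minus bottom value, plus 1 (convention for whole-number data)."""
    if not values:
        raise ValueError("range needs at least one data item")
    return max(values) - min(values) + 1


# Made-up data whose bottom and top values match the card's example
print(range_with_correction([3, 7, 15, 9, 4]))  # 15 - 3 + 1 = 13
```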
standard deviation
the more precise method of expressing dispersion
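standard deviation is a measure of the typical distance of each data item from the mean, whether above or below it; the differences are squared before averaging (which removes the plus and minus signs) and the square root of that average is taken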
shows the amount of variation in a set of data by assessing the spread of data around the mean
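As a rough sketch of how the figure is usually computed: find each item's difference from the mean, square it, average the squares, then take the square root. The function name and data are invented; the `sample` flag is an assumption covering the two common divisors (n - 1 for a sample, n for a whole population), since the cards do not say which one is intended.

```python
import math


def standard_deviation(values, sample=True):
    """Square root of the averaged squared differences from the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("standard deviation needs at least two data items")
    m = sum(values) / n
    squared_diffs = sum((x - m) ** 2 for x in values)  # squaring removes +/- signs
    divisor = n - 1 if sample else n                   # sample vs population form
    return math.sqrt(squared_diffs / divisor)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(standard_deviation(data, sample=False))  # sqrt(32 / 8) = 2.0
print(standard_deviation(data))                # sample form, slightly larger
```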
levels of measurement / different kinds of data
Nominal
Ordinal
Interval
Ratio
nominal
data in separate categories
such as grouping people according to their favourite football team
ordinal
data that is ordered or ranked in some way
for example, asking people to put a list of football teams in order of liking
the difference between each item is not the same; for example, the individual may like the first item a lot more than the second, but there may only be a small difference between the items ranked second and third
interval
data measured using units of equal intervals
uses accepted units of measurement
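does not have a true zero point (e.g. temperature in °C: 0 °C does not mean there is no temperature)
ratio
data measured in equal units with a true zero point
such as height, weight or reaction time, where 0 means none of the quantity is present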
evaluation of measures of central tendency
MEAN
• the mean is the most sensitive measure of central tendency because it takes account of the exact distance between all the values in the data; this also means it can easily be distorted by one or a few extreme values and so end up unrepresentative of the data as a whole
- the mean cannot be used with nominal data
- precise because it takes the exact values of all the data into account
THE MEDIAN
• the median is not as sensitive as the mean and is largely unaffected by extreme values, so it can be useful when the data contains them
• can be easier to calculate than the mean
THE MODE
• unaffected by extreme values
- much more useful for discrete data
- the only method that can be used when the data is in categories i.e. nominal data
- not a useful way for describing data when there are several modes
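The effect of an extreme value on each measure can be seen in a small sketch. The data and the helper name `summarise` are made up; `statistics.median` and `statistics.mode` come from Python's standard library.

```python
import statistics


def summarise(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        sum(values) / len(values),
        statistics.median(values),
        statistics.mode(values),
    )


typical = [4, 5, 5, 6, 7]        # hypothetical scores
with_outlier = [4, 5, 5, 6, 70]  # same data with one extreme value

print(summarise(typical))        # (5.4, 5, 5)
print(summarise(with_outlier))   # (18.0, 5, 5): only the mean is distorted
```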
evaluation of measures of dispersion
RANGE
• easy to calculate
- affected by extreme values
- fails to take account of the distribution of the numbers, for example it doesn’t indicate whether most numbers are closely grouped around the mean or spread out evenly
STANDARD DEVIATION
• a precise measure of dispersion because it takes all the exact values into account
- it is not difficult to calculate as long as you have a calculator
- may hide some of the characteristics of the dataset, such as extreme values
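To illustrate how the range can miss differences in distribution that the standard deviation picks up, here is a sketch with two invented data sets that share the same mean and range. It uses `statistics.pstdev` (the population form) from the standard library; the add-one range follows the convention used earlier in these cards.

```python
import statistics

clustered = [1, 5, 5, 5, 9]   # most values sit on the mean
spread = [1, 1, 5, 9, 9]      # most values sit at the extremes

for name, data in (("clustered", clustered), ("spread", spread)):
    data_range = max(data) - min(data) + 1        # 9 for both sets
    sd = statistics.pstdev(data)                  # differs between the sets
    print(name, statistics.mean(data), data_range, round(sd, 2))
# clustered 5 9 2.53
# spread 5 9 3.58
```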