Lec 2 Flashcards
Mode
❖ It is the value which occurs most frequently.
Data distribution with one mode is called
Unimodal
If all values are different, there is no mode and the distribution is called
Non-modal
Sometimes there is more than one mode: a distribution with two modes is called
…………..; one with more than two is called a
……….. distribution.
Bimodal
Multimodal
Normally, the mode is used for ……………… where we wish to
know which is the most common category.
Categorical data
What is the advantage of the mode?
It sometimes gives a clue about the etiology of the disease.
Disadvantages of the mode
✓ With a small number of observations, there may be no mode.
✓ It is less amenable to tests of statistical significance.
What is the mode of this data set?
0, 3, 4, 5, 7, 7, 7, 7, 7, 8, 10, 10
7
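The card above can be checked with a minimal sketch in Python, assuming the standard-library `statistics` module (Python 3.8 or later); the variable names are illustrative only.

```python
from statistics import multimode

# Data set from the card above.
data = [0, 3, 4, 5, 7, 7, 7, 7, 7, 8, 10, 10]

# multimode returns every value tied for the highest frequency,
# so a bimodal or multimodal data set would yield more than one entry.
modes = multimode(data)
print(modes)  # [7]
```

When every value is different, `multimode` returns all of the values, which corresponds to the "no mode" case on the earlier card.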
The Mean
It is the average of the data: the sum of all values of
a set of observations divided by the number of these observations.
The most popular and well-known
measure of central tendency
The mean (an average)
The mean is used with
Discrete and continuous data
Advantages of mean
✓ Uniqueness: for a given set of data there is one and only
one mean; it is a single value.
✓ Simple to compute.
✓ All values are included.
Disadvantages of mean
The main disadvantage of the mean is that it is affected by extreme
values, i.e. very high or very low values.
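A short sketch of how one extreme value shifts the mean, assuming Python's standard-library `statistics` module. The observations are made up for illustration and do not come from the lecture.

```python
from statistics import mean

# Hypothetical observations (illustrative values only).
values = [4, 5, 5, 6, 5]

# The same observations plus one very high value.
with_extreme = values + [95]

mean_without = mean(values)        # sum 25 divided by 5 observations
mean_with = mean(with_extreme)     # sum 120 divided by 6 observations

print(mean_without, mean_with)
```

A single extreme observation moves the mean from the centre of the data to a value larger than every ordinary observation.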
The Median (50th percentile)
The median of a data set is the value that lies exactly in the middle.
To calculate the median:
Arrange the observations in order. The median is the
middle value for an odd
number of observations, and
the average of the two middle values for an even number.
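The odd and even cases can be sketched as follows. `median_of` is a hypothetical helper written for this card; the standard-library `statistics.median` gives the same results.

```python
def median_of(values):
    """Return the median: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)          # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the two middle values


odd_example = median_of([3, 1, 7, 5, 9])        # 5 observations
even_example = median_of([3, 1, 7, 5, 9, 11])   # 6 observations
extreme_example = median_of([1, 2, 3, 4, 1000]) # one extreme value

print(odd_example, even_example, extreme_example)
```

The last call illustrates that an extreme value (1000) leaves the median at the middle observation.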
Advantages of median:
✓ It is a single value.
✓ Simple, easy to compute, and easy to understand.
✓ Unaffected by extreme values.
Disadvantages of median
✓ It does not take all values (observations) into account.
✓ It is less amenable than the mean to tests of statistical
significance.
The mode is used to describe
Qualitative categorical data
Mean is ……… value
Median is …….. value
Mode is ………. value
Single (unique)
Single (unique)
Sometimes it’s not unique
Quintile
A statistical value of a data set that represents 20% of
a given population.
✓ The first quintile represents the lowest fifth of the data (1%-20%).
✓ The second quintile represents the second fifth (21%-40%), and
so on.
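A minimal sketch of quintile cut points, assuming Python 3.8+ and the standard-library `statistics.quantiles` function. The data are hypothetical (the integers 1 to 20), chosen so that each fifth holds four values.

```python
from statistics import quantiles

# Hypothetical data: the integers 1..20.
scores = list(range(1, 21))

# Dividing into 5 equal groups needs 4 cut points.
cuts = quantiles(scores, n=5)

# Values at or below the first cut point form the lowest fifth.
lowest_fifth = [x for x in scores if x <= cuts[0]]

print(cuts)
print(lowest_fifth)
```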
Tertiles
❖ A population split into three equal parts is divided into
One of the most common metrics in statistical analysis, the
………., is actually just the result of dividing a population into
…………
Median
Two quantiles
Quartiles
These are the observations in an array that divide the distribution
into four equal parts
❖ 1st (lower) quartile: the value below which 25% of observations lie
in an ordered array.
❖ 2nd quartile = median = 50th percentile
❖ 3rd (upper) quartile = 75th percentile
❖ Interquartile range: the middle 50% of all observations (from the
25th to the 75th percentile)
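The quartiles and interquartile range can be sketched with the standard-library `statistics.quantiles` function (Python 3.8+), reusing the data set from the mode card. The exact quartile values depend on the interpolation method; this sketch uses the function's default, which is an assumption and not necessarily the convention used in the lecture.

```python
from statistics import median, quantiles

# Data set reused from the mode card.
data = [0, 3, 4, 5, 7, 7, 7, 7, 7, 8, 10, 10]

# n=4 returns the three cut points: lower quartile, median, upper quartile.
q1, q2, q3 = quantiles(data, n=4)

# Width of the interval that holds the middle 50% of observations.
iqr = q3 - q1

print(q1, q2, q3, iqr)
print(q2 == median(data))  # the 2nd quartile is the median
```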
The median corresponds to
Two quantiles
Second quartile
50th percentile
Centiles
Those values, in a series of observations arranged in ascending
order of magnitude, which divide the distribution into 100 equal
parts.
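As a rough illustration of centiles, the same standard-library function can be asked for 100 equal parts, which yields 99 cut points. The data set is reused from the mode card, and the default interpolation method is an assumption.

```python
from statistics import median, quantiles

# Data set reused from the mode card.
data = [0, 3, 4, 5, 7, 7, 7, 7, 7, 8, 10, 10]

# Dividing into 100 equal parts gives 99 cut points (1st to 99th centile).
centiles = quantiles(data, n=100)

# List index 49 holds the 50th centile (indices start at 0).
p50 = centiles[49]

print(len(centiles))
print(p50 == median(data))  # the 50th centile is the median
```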
Define the range
Is it considered important?
How should it be used?
The range is the difference between the largest and the smallest
observation in the data.
❖ It is an important measurement; however, it does not give much
indication of the spread of observations about the mean.
❖ Should be used in conjunction with other measures of variability.
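A small sketch of why the range is best read together with another measure of variability. Both data sets below are hypothetical: they share the same range, but their observations are spread very differently about the mean.

```python
from statistics import pstdev

# Two hypothetical data sets with the same smallest and largest values.
clustered = [0, 5, 5, 5, 5, 10]     # most values sit at the mean
polarised = [0, 0, 0, 10, 10, 10]   # all values sit at the extremes

# Range = largest observation minus smallest observation.
range_clustered = max(clustered) - min(clustered)
range_polarised = max(polarised) - min(polarised)

print(range_clustered, range_polarised)        # identical ranges
print(pstdev(clustered), pstdev(polarised))    # different spread about the mean
```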
Advantages and disadvantages of the range
Advantages
✓ Simple to calculate
✓ Easy to understand
Disadvantages
✓ It neglects all values in the center and depends only on the extreme
values, and extreme values are dependent on sample size.
✓ It is not based on all observations.
✓ It is not amenable to further mathematical treatment.
Variance
The average of the squared deviations from the mean
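The definition can be followed step by step in a short sketch. The data are hypothetical. Dividing by n gives the population variance, which matches the wording of the card; dividing by n - 1 is the usual sample version and is included here as an assumption, since the card does not say which one the lecture uses.

```python
from statistics import pvariance, variance

# Hypothetical observations (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
data_mean = sum(data) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(x - data_mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / n        # divide by n
sample_variance = sum(squared_deviations) / (n - 1)      # divide by n - 1

print(population_variance, pvariance(data))
print(sample_variance, variance(data))
```

Because every deviation is squared before averaging, the result cannot be negative.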
The standard deviation
The standard deviation measures the variability between
observations in the sample or the population from the mean of
that sample or that population.
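A minimal sketch of the standard deviation as the square root of the variance, using hypothetical measurements in centimetres. The population versions (`pvariance`, `pstdev`) are used here as an assumption; the sample versions `variance` and `stdev` follow the same pattern.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Hypothetical measurements in centimetres.
lengths_cm = [2, 4, 4, 4, 5, 5, 7, 9]

variance_cm2 = pvariance(lengths_cm)   # expressed in squared units (cm^2)
sd_cm = sqrt(variance_cm2)             # back in the original unit (cm)

print(variance_cm2, sd_cm)
print(pstdev(lengths_cm))              # the library computes the same value directly
```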
Standard error of the mean
It measures the variability or dispersion of the sample mean from the
population mean.
❖ It is used to estimate the population mean, and to estimate
differences between population means.
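The card does not state a formula. A common estimate, assumed here, is the sample standard deviation divided by the square root of the sample size. The sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample (illustrative values only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
sample_sd = stdev(sample)   # sample standard deviation (n - 1 in the denominator)

# Assumed formula: SEM = s / sqrt(n).
sem = sample_sd / sqrt(n)

print(mean(sample), sample_sd, sem)
```

Under this formula the standard error shrinks as the sample size grows, so larger samples give a more precise estimate of the population mean.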
Coefficient of variation (CV)
❖ It has no unit.
What is the coefficient of variation used to compare?
What does it measure?
It is used to compare dispersion in two sets of data, especially when
the units are different.
It measures relative rather than absolute variation.
It takes into consideration all values in the set.
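The card does not give the formula. The usual definition, assumed here, is the standard deviation divided by the mean, expressed as a percentage. The helper `coefficient_of_variation` and both data sets are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV as a percentage: (standard deviation / mean) * 100."""
    return stdev(values) / mean(values) * 100


# Hypothetical data in two different units.
weights_kg = [60, 65, 70, 75, 80]
heights_cm = [150, 160, 170, 180, 190]

cv_weight = coefficient_of_variation(weights_kg)
cv_height = coefficient_of_variation(heights_cm)

print(stdev(weights_kg), stdev(heights_cm))  # absolute variation, in kg and cm
print(cv_weight, cv_height)                  # relative variation, unit-free
```

In this made-up example the heights have the larger standard deviation but the smaller coefficient of variation, which is the distinction between absolute and relative variation.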
Variance can never be
A negative value
The problem with the variance is the
Squared unit
The standard deviation is ……..
The unit of the standard deviation is …….
The square root of the variance
Not squared (the same unit as the data)