S1.2 Summary Statistics Flashcards
Calculate summary statistics for single data sets and use them in the interpretation of data.
Statistics
Statistics is the science of collecting, classifying and analyzing information.


Statistic
A statistic is a numerical value (such as the mean or range) calculated from a set of data.

Measures of Location

The mean, median and mode are three summary statistics that represent the centre or the average of a set of data. They are called the measures of central tendency (or measures of location).
Mean
(Measure of Location)
The mean is the average score in a set of data: the sum of the scores divided by the number of scores.

Median
(Measure of Location)
The median is the score which is located in the middle of an ordered set of data. The median divides the data into two equal groups.

Mode
(Measure of Location)
The mode is the most frequently occurring score in a set of data. The mode is more useful for categorical data.
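As a quick illustration of the three measures of location, here is a minimal sketch using Python's standard `statistics` module. The scores are made up for the example and are not from the source.

```python
import statistics

# Hypothetical scores (invented for illustration).
scores = [2, 3, 3, 5, 7, 8, 14]

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middle score once the data is ordered
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```

For these scores the mean is 6, the median is 5 and the mode is 3.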

Quantiles
Quantiles are points in a set of ordered data which divide the data into equal groups. Commonly used quantiles are quartiles, deciles and percentiles.
Quartiles
Quartiles (Q1, Q2, Q3) divide a data set into 4 equal groups.
Q3 separates the lower 75% of scores from the upper 25% of scores.

Deciles
Deciles (D1, D2, … D9) divide a set of data into 10 equal groups.
D2 cuts off the lower 20% of scores from the upper 80% of scores.

Percentiles
Percentiles (P1, P2, P3, … P99) separate a large set of data into 100 equal groups.
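The sketch below shows one way to compute quartiles, deciles and percentiles with `statistics.quantiles` (Python 3.8+). The data is hypothetical, and the function's default "exclusive" method is only one of several conventions for locating quantiles, so a textbook or calculator may give slightly different values.

```python
import statistics

# Hypothetical data set of 11 scores (invented for illustration).
scores = [3, 5, 7, 8, 9, 11, 13, 15, 16, 18, 20]

# n=4 returns the 3 cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# n=10 returns the 9 cut points D1 ... D9.
deciles = statistics.quantiles(scores, n=10)
d2 = deciles[1]  # second decile

# n=100 returns the 99 cut points P1 ... P99.
percentiles = statistics.quantiles(scores, n=100)

print(q1, q2, q3)
print(len(deciles), len(percentiles))
```

Note that k equal groups always need k - 1 cut points, which is why there are 3 quartiles, 9 deciles and 99 percentiles.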

Measures of Spread
A second important feature of a set of data is how spread out its scores are.
Three statistics which measure the spread of the scores in a set of data are the range, interquartile range (IQR) and standard deviation.
Range
(Measure of Spread)
The range is the difference between the highest and lowest scores in a set of data.

Interquartile Range
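(Measure of Spread)
The interquartile range (IQR) is the difference between the upper and lower quartiles: IQR = Q3 - Q1.
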
Standard Deviation
(Measure of Spread)
Standard deviation measures how far the scores in a data set typically lie from the mean.
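The three measures of spread can be sketched as follows. The scores are hypothetical, the IQR is taken as Q3 - Q1 with quartiles from `statistics.quantiles` (one of several quartile conventions), and the standard deviation is written out in its population form (dividing by the number of scores), which is an assumption since the cards do not say which form is intended.

```python
import math
import statistics

# Hypothetical scores (invented for illustration).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest score minus lowest score.
data_range = max(scores) - min(scores)

# Interquartile range: spread of the middle 50% of scores.
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Standard deviation (population form): square root of the
# average squared distance of each score from the mean.
mean = statistics.mean(scores)
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
std_dev = math.sqrt(variance)

print(data_range, iqr, std_dev)
```

For these scores the range is 7, the IQR is 2.5 and the standard deviation is 2.0.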

Outlier
An outlier is a very high or a very low score in a set of data which is clearly apart from the other scores.

Effect of Outliers
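Outliers can affect the reliability of some measures of spread and location. The mean, range and standard deviation are strongly affected by outliers, while the median and interquartile range are affected very little.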
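A small experiment makes the effect visible. The sketch below compares the same hypothetical data set with and without one very high score; the numbers are invented for illustration.

```python
import statistics

# Hypothetical scores, then the same scores plus one outlier (40).
without_outlier = [4, 5, 5, 6, 7]
with_outlier = without_outlier + [40]

mean_before = statistics.mean(without_outlier)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(without_outlier)
median_after = statistics.median(with_outlier)

range_before = max(without_outlier) - min(without_outlier)
range_after = max(with_outlier) - min(with_outlier)

print(mean_before, mean_after)      # the mean roughly doubles
print(median_before, median_after)  # the median barely moves
print(range_before, range_after)    # the range jumps from 3 to 36
```

The single outlier pulls the mean from 5.4 to about 11.2, while the median only shifts from 5 to 5.5.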

Detecting Outliers
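This card does not state a rule, so the sketch below assumes the widely used 1.5 × IQR convention: a score is flagged as an outlier if it lies below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The function name `find_outliers` and the data are hypothetical.

```python
import statistics

def find_outliers(scores):
    """Flag scores outside the 1.5 x IQR fences (an assumed convention)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in scores if x < lower_fence or x > upper_fence]

# Hypothetical scores with one suspiciously high value.
scores = [3, 5, 7, 8, 9, 11, 13, 15, 16, 18, 60]
outliers = find_outliers(scores)
print(outliers)
```

Here Q1 = 7 and Q3 = 16, so the fences are -6.5 and 29.5, and only the score 60 is flagged.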


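Five Number Summary
A five-number summary consists of the lowest score, the lower quartile (Q1), the median (Q2), the upper quartile (Q3) and the highest score.
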
Boxplot
A boxplot (or box-and-whisker plot) is a plot of a five-number summary.
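The five values that a boxplot displays (lowest score, Q1, median, Q3, highest score) can be computed as in this minimal sketch. The helper name `five_number_summary` and the data are hypothetical, and the quartiles follow the default method of `statistics.quantiles`, which may differ slightly from a textbook's convention.

```python
import statistics

def five_number_summary(scores):
    """Return (lowest, Q1, median, Q3, highest) for a list of scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return (min(scores), q1, median, q3, max(scores))

# Hypothetical data set of 11 scores.
scores = [3, 5, 7, 8, 9, 11, 13, 15, 16, 18, 20]
summary = five_number_summary(scores)
print(summary)
```

In a boxplot, the box would run from Q1 to Q3 with a line at the median, and the whiskers would extend to the lowest and highest scores.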

Cumulative Frequency Graph & Polygon
The CF polygon (ogive) starts at the bottom left-hand corner of the first column of the cumulative frequency histogram and joins the top right-hand corner of each column.

Estimating Quartiles
Q1, Q2, and Q3 can be estimated using the Cumulative Frequency Polygon (Ogive).
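Reading a quartile off an ogive amounts to finding the score at 25%, 50% or 75% of the total frequency along the straight-line segments. The sketch below does this by linear interpolation; the function `estimate_from_ogive`, the class boundaries and the cumulative frequencies are all hypothetical.

```python
def estimate_from_ogive(boundaries, cum_freq, target):
    """Estimate the score at cumulative frequency `target` on an ogive."""
    points = list(zip(boundaries, cum_freq))
    # Walk along each straight-line segment of the polygon.
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Linear interpolation within the segment.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the cumulative frequencies")

# Hypothetical grouped data: class boundaries and the cumulative
# frequency reached at each boundary (40 scores in total).
boundaries = [0, 10, 20, 30, 40, 50]
cum_freq = [0, 4, 12, 26, 36, 40]
total = cum_freq[-1]

q1 = estimate_from_ogive(boundaries, cum_freq, 0.25 * total)
q2 = estimate_from_ogive(boundaries, cum_freq, 0.50 * total)
q3 = estimate_from_ogive(boundaries, cum_freq, 0.75 * total)
print(q1, q2, q3)
```

These are estimates only, because grouping the data hides where the individual scores fall within each class.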

Shape of a Distribution
The shape of a distribution (a set of data) shows how the data is spread. The distribution of a set of data can often be classified as being either symmetric, positively skewed or negatively skewed.
Symmetric Distribution
A symmetric distribution is evenly spread on either side of its centre.
Positively Skewed
Positively skewed data has a higher proportion of low scores.
Negatively Skewed
Negatively skewed data has a higher proportion of high scores.
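One rough way to see skew in numbers is to compare the mean with the median: the long tail of a skewed data set tends to pull the mean toward it. This is a rule of thumb rather than a definition, and the two data sets below are invented for illustration.

```python
import statistics

# Hypothetical data: mostly low scores with a tail of high scores.
positively_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9]
# Mirror image: mostly high scores with a tail of low scores.
negatively_skewed = [10 - x for x in positively_skewed]

pos_mean = statistics.mean(positively_skewed)
pos_median = statistics.median(positively_skewed)
neg_mean = statistics.mean(negatively_skewed)
neg_median = statistics.median(negatively_skewed)

print(pos_mean, pos_median)  # mean above median
print(neg_mean, neg_median)  # mean below median
```

For the positively skewed set the mean (3) sits above the median (2); for the negatively skewed set the mean (7) sits below the median (8).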
Samples & Populations

The population mean and standard deviation are called parameters. The sample mean and standard deviation are statistics used to estimate the values of the population parameters.
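In Python's `statistics` module the distinction shows up as two functions: `pstdev` treats the data as a whole population (dividing by n), while `stdev` treats it as a sample used to estimate the population value (dividing by n - 1). The card does not say which formula it intends, so this is only a sketch of the usual convention, with hypothetical data.

```python
import statistics

# Hypothetical scores, treated first as a population, then as a sample.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)

# Population standard deviation: divides by n.
population_sd = statistics.pstdev(scores)

# Sample standard deviation: divides by n - 1, which gives a
# slightly larger value used to estimate the population parameter.
sample_sd = statistics.stdev(scores)

print(mean, population_sd, sample_sd)
```

The sample value (about 2.14) is a little larger than the population value (2.0), and the gap shrinks as the number of scores grows.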