Topic 1 - Summarising Data Flashcards
What is a 5-number summary?
A five-number summary simply consists of the smallest data value, the first quartile, the median, the third quartile, and the largest data value.
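As a rough illustration, the five values can be computed in Python. This is only a sketch: the data values and the helper name `five_number_summary` are made up for the example, and it assumes the quartiles are located with the (n+1) rule, which is what `method="exclusive"` in the standard `statistics` module uses.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # method="exclusive" places the quartiles at 0.25(n+1), 0.5(n+1), 0.75(n+1)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

data = [2, 4, 5, 7, 8, 10, 12]  # hypothetical example values
summary = five_number_summary(data)
print(summary)  # (2, 4.0, 7.0, 10.0, 12)
```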
What is the first quartile?
• First quartile (also known as the lower quartile or Q1): the point between the lowest 25% of values and the highest 75% of values. It is also called the 25th percentile.
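What is the second quartile?
• Second quartile (also known as the median or Q2): the midpoint of the ordered dataset, with half of the values below it and half above. It is also called the 50th percentile.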
What is the third quartile?
• Third quartile (also known as the upper quartile or Q3): the point between the lowest 75% of values and the highest 25% of values. It is also called the 75th percentile.
How to work out the location of the first quartile?
Loc Q1 = 0.25(n+1), where n is the number of values in the ordered dataset
How to work out the location of the second quartile?
Loc Q2 = 0.5(n+1)
How to work out the location of the third quartile?
Loc Q3 = 0.75(n+1)
What is a maximum data point?
• Maximum: largest value in the dataset
How to work out the location of a percentile?
Loc = (p/100) x (n+1), where p is the percentile required and n is the number of values in the ordered dataset
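A minimal Python sketch of the location formula is given here. The formula only gives a position in the ordered data, so two choices in the code are assumptions rather than part of the card: when the location is not a whole number the value is found by linear interpolation between the two neighbouring values, and locations outside 1 to n are clamped to the ends. The function name `percentile` and the data are hypothetical.

```python
def percentile(values, p):
    """Value at the p-th percentile using Loc = (p/100) x (n+1)."""
    ordered = sorted(values)
    n = len(ordered)
    loc = (p / 100) * (n + 1)   # 1-based position in the ordered data
    loc = min(max(loc, 1), n)   # assumption: clamp to the valid range 1..n
    lower = int(loc)            # whole-number part of the location
    fraction = loc - lower      # fractional part of the location
    if lower == n:
        return ordered[-1]
    # assumption: interpolate linearly between the two neighbouring values
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]  # hypothetical example values, n = 9
q1 = percentile(data, 25)  # Loc = 2.5, halfway between 5 and 7
q2 = percentile(data, 50)  # Loc = 5, the 5th value
q3 = percentile(data, 75)  # Loc = 7.5, halfway between 14 and 18
print(q1, q2, q3)
```

With these values the locations are 2.5, 5 and 7.5, so the quartiles come out as 6, 12 and 16.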
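How to work out the median?
Location = 0.5(n+1) in the ordered dataset. If the location is not a whole number, the median is the average of the two values on either side of it.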
What is central tendency?
Measures of central tendency are descriptive statistics that describe the overall 'central' value of a set of data. There are three key measures: mode, median and mean.
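For illustration, the three measures can be computed with Python's standard `statistics` module. The scores are made-up values chosen so that each result is easy to check by hand.

```python
import statistics

scores = [4, 7, 7, 9, 13]  # hypothetical example values

mean = statistics.mean(scores)      # sum of values / number of values = 40 / 5
median = statistics.median(scores)  # middle value of the ordered data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```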
What does spread include?
- Range: the difference between the smallest value and the largest value in a dataset
Range = Maximum - Minimum
- Percentile (quartile is a special case): the value below which a percentage of data falls
Loc = (p/100) x (n+1)
- Interquartile range (IQR): the difference between the upper quartile (Q3) and the lower quartile (Q1)
IQR = Q3 - Q1
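The range and IQR formulas above can be sketched in Python as follows. The data are hypothetical, and the quartiles are assumed to be located with the (n+1) rule, which corresponds to `method="exclusive"` in the standard `statistics` module.

```python
import statistics

data = [1, 3, 4, 6, 9, 11, 15]  # hypothetical example values, n = 7

# Range = Maximum - Minimum
data_range = max(data) - min(data)

# Quartiles located at 0.25(n+1), 0.5(n+1) and 0.75(n+1) in the ordered data
q1, q2, q3 = statistics.quantiles(sorted(data), n=4, method="exclusive")

# IQR = Q3 - Q1
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Here Q1 is the 2nd value (3) and Q3 is the 6th value (11), so the IQR is 8 while the range is 14.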
What is variance?
Variance is a statistical measurement of how far the values in a dataset are spread out from the mean. It is based on the squared difference between each value and the mean.
What is standard deviation?
It tells you, roughly, how far a typical value lies from the mean, and it is the square root of the variance. A high standard deviation means that values are generally far from the mean, while a low standard deviation indicates that values are clustered close to the mean.
How to calculate variance and standard deviation?
1.) calculate the range = maximum - minimum
2.) calculate IQR = Q3 - Q1
3.) calculate the variance: s^2 = Σ(x(i) - x̄)^2 / (n-1), to 2 d.p.
- x(i) represents each value in the dataset
- x̄ is the mean of the dataset
- n is the total number of values in the dataset
4.) calculate the standard deviation: s = the square root of s^2 from step 3
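The variance and standard deviation steps can be sketched in Python as below. The data are hypothetical, and the variable names are only for illustration. The (n-1) divisor follows the formula on the card, which is the sample variance.

```python
import math
import statistics

data = [2, 4, 4, 6, 9]  # hypothetical example values
n = len(data)

mean = sum(data) / n                              # mean of the dataset = 5.0
squared_diffs = [(x - mean) ** 2 for x in data]   # (x(i) - mean)^2 for each value
variance = sum(squared_diffs) / (n - 1)           # s^2 = 28 / 4
std_dev = math.sqrt(variance)                     # s = square root of s^2

print(round(variance, 2), round(std_dev, 2))

# The standard library gives the same sample variance
assert statistics.variance(data) == variance
```

For these values the squared differences are 9, 1, 1, 1 and 16, so s^2 = 28 / 4 = 7 and s is about 2.65.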
What is the frequency of a value?
The number of times it occurs in a dataset
What is a frequency distribution?
The pattern of frequencies of a variable: the number of times each possible value occurs in a dataset
What is a relative frequency distribution?
The proportion (probability) of observations of each value or class interval of a variable, found by dividing each frequency by the total number of observations
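A short Python sketch of a frequency distribution and a relative frequency distribution, using `collections.Counter` from the standard library. The grades are made-up example data.

```python
from collections import Counter

grades = ["A", "B", "A", "C", "A", "B"]  # hypothetical example values

# Frequency distribution: number of times each value occurs
frequency = Counter(grades)

# Relative frequency distribution: each frequency / total number of observations
total = len(grades)
relative_frequency = {value: count / total for value, count in frequency.items()}

for value in sorted(frequency):
    print(value, frequency[value], round(relative_frequency[value], 2))
```

The relative frequencies always add up to 1, which is a quick check on the calculation.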
What is a histogram?
A plot that uses class intervals on the x-axis and the frequency of each class on the y-axis
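To show how the class frequencies behind a histogram are counted, here is a small Python sketch that prints a text version of the plot. The heights, the starting point, the class width and the helper name `class_frequencies` are all assumptions made for the example; each class includes its lower limit and excludes its upper limit.

```python
def class_frequencies(values, start, width, num_classes):
    """Count how many values fall in each class interval [lower, lower + width)."""
    counts = [0] * num_classes
    for value in values:
        index = int((value - start) // width)  # which class the value belongs to
        if 0 <= index < num_classes:           # ignore values outside all classes
            counts[index] += 1
    return counts

heights = [151, 158, 162, 165, 167, 171, 174, 178, 183]  # hypothetical data (cm)
counts = class_frequencies(heights, start=150, width=10, num_classes=4)

# One row per class interval, one star per observation
for i, count in enumerate(counts):
    lower = 150 + i * 10
    print(f"{lower}-{lower + 10}: {'*' * count}")
```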
What is nominal data?
Nominal: a naming scale, where variables are simply "named" or labelled, with no specific order, for example gender, postcode and political preference.
What is ordinal data?
Ordinal: a scale whose values have a specific order, beyond just naming them, for example socioeconomic status, military ranks and letter grades for coursework.
What is discrete data?
Discrete: a data type that involves counting rather than measurement (only a limited number of values is possible), for example the number of students who attended a class.
What is continuous data?
Continuous: a data type that involves measurement rather than counting. The data can take any value between two realistic points, so an infinite number of values is possible, for example height and temperature.
What displays can be used for numerical data?
• A histogram displays the frequencies or relative frequencies in each class
• A line chart shows data which occur at regular time intervals
• A box plot displays the five-number summary of a numerical variable and can be used to compare two or more numerical variables