2- Visualisation and Presentation of Data Flashcards
Discrete data
When the data values are quantitative and the numbers are finite or countable
Continuous data
Result from infinitely many possible quantitative values, where the collection of values is not countable.
Levels of measurement
Nominal
Ordinal
Interval
Ratio
Nominal level of measurement
- The nominal level of measurement is characterised by data that consist of names, labels, or categories only.
- The data cannot be arranged in an ordering scheme.
- Cannot be used for calculations.
- Numbers are sometimes assigned to the different categories.
Examples of nominal level of measurement
Social security numbers: substitutes for names; they do not count or measure anything
Yes/No/Undecided: survey responses
Ordinal level of measurement
- Data are at the ordinal level of measurement if they can be arranged in some order.
- The differences (obtained by subtraction) between data values either cannot be determined or are meaningless.
Examples of ordinal level of measurement
- Course grades: a college professor assigns grades of A, B, C, D or E. These grades can be arranged in order, but the differences between them cannot be determined.
- Ranks
Interval level of measurement
- Data are at the interval level of measurement if they can be arranged in order, and differences between data values can be found and are meaningful.
- Data at this level do not have a natural zero starting point at which none of the quantity is present
Examples of Interval level of measurement
Outdoor temperatures: 18 ℃ and 34 ℃ are examples of data at the interval level of measurement. We can determine their difference of 16 ℃, but there is no natural starting point. Although 0 ℃ seems like a starting point, it is arbitrary and does not represent the total absence of heat.
Years: the years 1492 and 1776 can be arranged in order, and their difference of 284 years can be found and is meaningful. However, time did not begin in the year 0, so zero is an arbitrary point rather than a natural zero starting point representing “no time”.
Ratio level of measurement
- Data are at the ratio level of measurement if they can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point.
- Here, zero indicates that none of the quantity is present
For data at this level, differences and ratios are both meaningful.
Examples of Ratio Level of measurement
- Car lengths: 106 inches for a Smart car and 212 inches for a Mercury Grand Marquis (0 inches represents no length, and 212 inches is twice as long as 106 inches)
- Class times: 50 min and 100 min for a statistics class (0 min represents no class time, and 100 min is twice as long as 50 min)
Extra: ratio level of measurement
- Ratio level of measurement is called ratio because the zero starting point makes ratios meaningful.
- To test for the ratio level, take two values where one is twice the other and ask whether “twice” correctly describes the quantities.
Example:
- We can say that a person 6 ft tall is twice as tall as a person 3 ft tall, because heights are at the ratio level of measurement.
However, we cannot say that 50 ℃ is twice as hot as 25 ℃, because temperatures in ℃ are not at the ratio level.
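The “twice” test can be checked numerically. This minimal Python sketch assumes the standard conversion to kelvin (an absolute scale whose zero means no thermal energy, which these cards do not mention) and compares the ratio of the raw Celsius numbers with the ratio of the absolute temperatures.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvin, a scale with a true zero."""
    return c + 273.15

hot, mild = 50.0, 25.0

# Ratio of the Celsius numbers: 2.0, but Celsius has an arbitrary zero,
# so this "twice" says nothing about the amount of heat.
naive_ratio = hot / mild

# Ratio on an absolute scale: meaningful because zero means none of the quantity.
true_ratio = celsius_to_kelvin(hot) / celsius_to_kelvin(mild)

print(f"Ratio of Celsius values: {naive_ratio:.2f}")        # 2.00
print(f"Ratio of absolute temperatures: {true_ratio:.3f}")  # 1.084
```

So 50 ℃ is only about 8% “hotter” than 25 ℃ in absolute terms, which is why Celsius temperatures fail the ratio test.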
Tables which can represent qualitative data
Frequency Table
Relative frequency table
Percentage frequency table
Cumulative frequency table
Tables which can represent quantitative data
Frequency, relative frequency, percentage frequency and cumulative frequency tables; instead of category names, the rows are discrete values
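All four tables can be built from the same counts. The sketch below is a minimal Python illustration with hypothetical survey data; the function name `frequency_table` is made up for this example. Relative frequency is frequency divided by the number of observations, percentage frequency is that times 100, and cumulative frequency is the running total of frequencies.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, relative frequency, percentage, cumulative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):      # discrete values in ascending order
        f = counts[value]
        cumulative += f               # running total of frequencies
        rows.append((value, f, f / n, 100 * f / n, cumulative))
    return rows

# Hypothetical data: number of cars per household in a small survey
cars = [0, 1, 1, 2, 1, 3, 2, 1, 0, 2]

print(f"{'Value':>5} {'Freq':>5} {'RelFreq':>8} {'Pct':>6} {'CumFreq':>8}")
for value, f, rel, pct, cum in frequency_table(cars):
    print(f"{value:>5} {f:>5} {rel:>8.2f} {pct:>5.0f}% {cum:>8}")
```

The same function accepts category labels for qualitative data, but `sorted` then orders them alphabetically, so a cumulative column only makes sense when the categories have a natural order.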
Charts which can represent quantitative data
Histogram
Bar chart
Line graph
Scatter graph
Box plot
Charts which can represent qualitative data
Bar graph
Pie graph
Measures of location
The mean
The median
The mode
Skewness and kurtosis (strictly, measures of shape rather than location)
Measures of dispersion (variability)
The range and percentiles
Quartiles and interquartile range
The mean deviation
The variance
The standard deviation
The mean
- Perhaps the most important measure of location is the mean, or average value, for a variable.
- For sample data, the mean is denoted by $\bar{x}$.
- For a population, the Greek letter $\mu$ is used to denote the mean.
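The standard formulas (not written out on these cards) are $\bar{x} = \frac{\sum x_i}{n}$ for a sample of $n$ values and $\mu = \frac{\sum x_i}{N}$ for a population of $N$ values; the arithmetic is identical, only the symbol changes. A minimal Python sketch with a hypothetical sample:

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

sample = [4, 8, 6, 5, 7]   # hypothetical sample
x_bar = mean(sample)       # sample mean, written x̄
print(x_bar)               # 6.0
```

If `sample` held every member of the population instead, the same call would give $\mu$.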
Mode: when there is one mode?
Unimodal
Mode: when there are two modes?
Bimodal
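Counting the modes can be automated with `statistics.multimode` (Python 3.8+), which returns every value that occurs most often. In this sketch the function `modality` is hypothetical, and treating “every value equally frequent” as *no mode* is an assumed convention rather than something stated on these cards.

```python
from statistics import multimode

def modality(values):
    """Classify data as unimodal, bimodal, multimodal or having no mode."""
    if not values:
        raise ValueError("need at least one value")
    modes = multimode(values)          # all most-frequent values, in order of first appearance
    distinct = len(set(values))
    if distinct > 1 and len(modes) == distinct:
        # Assumed convention: if every value occurs equally often, no value stands out
        return "no mode", []
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes

print(modality([2, 3, 3, 5, 7]))      # ('unimodal', [3])
print(modality([1, 1, 2, 4, 4, 6]))   # ('bimodal', [1, 4])
```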
Skewness definition
A measure of the degree of asymmetry of a distribution.
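One common way to put a number on this asymmetry (an assumption here, since these cards give no formula) is the moment coefficient of skewness $g_1 = m_3 / m_2^{3/2}$, where $m_k$ is the average of $(x_i - \bar{x})^k$. A positive value indicates positive skew (long right tail), a negative value negative skew, and zero a symmetrical distribution. A minimal Python sketch with hypothetical data:

```python
def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # average squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n   # average cubed deviation keeps the sign
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical: one unusually large value
left_tail = [-x for x in right_tail]     # mirror image: one unusually small value
symmetric = [1, 2, 3, 4, 5]

print(moment_skewness(right_tail) > 0)   # True  (positive skew)
print(moment_skewness(left_tail) < 0)    # True  (negative skew)
print(moment_skewness(symmetric))        # 0.0   (symmetrical)
```

Cubing the deviations is what makes the measure sensitive to direction: large deviations on one side dominate the sum and set its sign.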
Kurtosis definition
A measure of whether the data are peaked or flat relative to a normal distribution.
Description of negative skew
Has a high frequency of relatively high values and a low frequency of relatively low values, so the mean is dragged toward the left (the low values) of the distribution.
Mean, median and mode of negative skew
mode > median > mean
Description of zero skew (symmetrical distribution)
Said to be symmetrical; the mean, median and mode have the same value and so coincide at the same point of the distribution.
Mean, median and mode of a symmetrical distribution
Mean = median = mode
Description of positive skew
Has a high frequency of relatively low values and low frequency of relatively high values, so the mean is dragged toward the right (the high values) of the distribution
Mean, mode and median of positive skew
Mean > median > mode
Is the mean a good measure of central tendency with skewed data?
No, because it is sensitive to extreme values; the median is usually preferred for skewed data.
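Both points can be seen on a small hypothetical data set with a long right tail: the three averages fall in the order mean > median > mode, and removing the single extreme value moves the mean a lot while leaving the median unchanged. A minimal sketch using Python's `statistics` module:

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: mostly small values, a few large ones
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
print(mean(data), median(data), mode(data))   # 4.6 3.0 2

# Drop the most extreme value (15) and compare again
without_extreme = data[:-1]
print(mean(without_extreme), median(without_extreme))
# The mean falls by more than 1 (to about 3.44); the median stays at 3
```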
Positive kurtosis
The most peaked of the three distributions: leptokurtic
Normal kurtosis
Mesokurtic (peakedness similar to a normal distribution)
Negative kurtosis
The flattest of the three distributions, with no obvious peak: platykurtic
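A common numerical version (an assumption, not given on these cards) is the moment-based excess kurtosis $g_2 = m_4 / m_2^2 - 3$, which is 0 for a normal distribution, positive for leptokurtic data and negative for platykurtic data. In this Python sketch the function names and the tolerance `tol`, an arbitrary band treated as “close to normal”, are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3

def kurtosis_type(values, tol=0.1):
    """Label data by excess kurtosis; tol is an arbitrary 'close to normal' band."""
    g2 = excess_kurtosis(values)
    if g2 > tol:
        return "leptokurtic"
    if g2 < -tol:
        return "platykurtic"
    return "mesokurtic"

peaked = [0] * 8 + [-5, 5]   # hypothetical: most values at the centre, two far out
flat = list(range(1, 11))    # hypothetical: values spread evenly from 1 to 10

print(kurtosis_type(peaked), round(excess_kurtosis(peaked), 2))   # leptokurtic 2.0
print(kurtosis_type(flat), round(excess_kurtosis(flat), 2))       # platykurtic -1.22
```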
Mean, standard deviation and variance of the three kurtosis types
The mean, the standard deviation and therefore the variance are the same for all three distributions; only the peakedness differs.
Percentile
A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
The $p$th percentile is a value such that at least $p$ per cent of the observations are less than or equal to this value and at least $(100 - p)$ per cent of the observations are greater than or equal to this value.
How to calculate the $p$th percentile
1- Arrange the data in ascending order (smallest value to largest value)
2- Compute an index $i = \frac{p}{100} n$,
where $p$ is the percentile of interest and $n$ is the number of observations.
3 a) If $i$ is not an integer, round up: the next integer greater than $i$ denotes the position of the $p$th percentile.
b) If $i$ is an integer, the $p$th percentile is the average of the values in positions $i$ and $i + 1$.
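The three steps translate directly into code. This is a minimal Python sketch that assumes an integer $p$ with $0 < p < 100$; keeping $p \times n$ in integer arithmetic avoids floating-point errors when deciding whether $i$ is a whole number. Note that libraries such as NumPy offer several other percentile methods, which can give slightly different answers.

```python
def percentile(values, p):
    """p-th percentile by the index method (assumes integer p, 0 < p < 100)."""
    if not values:
        raise ValueError("need at least one value")
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("this sketch assumes an integer p with 0 < p < 100")
    data = sorted(values)           # step 1: ascending order
    n = len(data)
    numerator = p * n               # step 2: i = numerator / 100
    if numerator % 100 != 0:
        # step 3a: i is not an integer, so round up to get the position
        position = numerator // 100 + 1
        return data[position - 1]   # positions are 1-based, list indices 0-based
    # step 3b: i is an integer, so average the values in positions i and i + 1
    i = numerator // 100
    return (data[i - 1] + data[i]) / 2

scores = [50, 15, 40, 20, 35]       # hypothetical data; sorted: 15, 20, 35, 40, 50
print(percentile(scores, 30))       # i = 1.5 -> position 2 -> 20
print(percentile(scores, 40))       # i = 2 -> average of positions 2 and 3 -> 27.5
print(percentile(scores, 90))       # i = 4.5 -> position 5 -> 50
```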
Variance definition
The variance is a measure of variability that uses all the data. It is based on the difference between each data value and the mean; this difference is called a deviation about the mean.
How is the deviation about the mean expressed for a sample?
$(x_i - \bar{x})$
How is the deviation about the mean expressed for the population?
$(x_i - \mu)$
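The cards stop at the deviations; the usual next step (standard formulas, not stated here) squares and averages them: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample and $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ for a population. A minimal Python sketch with hypothetical data; the function names are illustrative.

```python
def deviations(values):
    """Deviation about the mean, x_i - mean, for each observation."""
    m = sum(values) / len(values)
    return [x - m for x in values]

def sample_variance(values):
    """s^2: sum of squared deviations divided by n - 1."""
    if len(values) < 2:
        raise ValueError("sample variance needs at least two values")
    return sum(d ** 2 for d in deviations(values)) / (len(values) - 1)

def population_variance(values):
    """sigma^2: sum of squared deviations divided by N."""
    return sum(d ** 2 for d in deviations(values)) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data, mean 5
print(deviations(data))               # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(population_variance(data))      # 4.0
print(sample_variance(data))          # 32 / 7, about 4.571
```

The deviations always sum to zero, which is why they are squared before averaging.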
Standard deviation definition
The standard deviation is defined to be the positive square root of the variance.
What does a low standard deviation indicate?
Low standard deviation indicates that the values tend to be close to the mean of the data set.
What does a high standard deviation indicate?
High standard deviation indicates that the values are spread out over a wider range.
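Two hypothetical data sets with the same mean show the contrast. This sketch uses Python's `statistics.stdev` (sample standard deviation) and checks that it equals the positive square root of `statistics.variance`.

```python
from math import isclose, sqrt
from statistics import mean, stdev, variance

tight = [48, 49, 50, 51, 52]    # hypothetical: values close to the mean
spread = [30, 40, 50, 60, 70]   # hypothetical: same mean, values far from it

for name, data in (("tight", tight), ("spread", spread)):
    s = stdev(data)                          # sample standard deviation
    assert isclose(s, sqrt(variance(data)))  # positive square root of the variance
    print(f"{name}: mean {mean(data)}, standard deviation {s:.2f}")
# tight: mean 50, standard deviation 1.58
# spread: mean 50, standard deviation 15.81
```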
Histogram frequency equation
Frequency = class width × frequency density
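Rearranged, frequency density = frequency ÷ class width, which is what a histogram with unequal class widths plots as bar height. This minimal Python sketch uses hypothetical grouped data and a made-up helper name, computes the densities, then recovers each frequency as width × density.

```python
def frequency_densities(classes):
    """For each (lower, upper, frequency) class, return (class width, frequency density)."""
    result = []
    for lower, upper, freq in classes:
        width = upper - lower                  # class width
        result.append((width, freq / width))   # frequency density = frequency / class width
    return result

# Hypothetical grouped data with unequal class widths: (lower, upper, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

for (lower, upper, freq), (width, density) in zip(classes, frequency_densities(classes)):
    # Reading a histogram: bar height is the density, so frequency = class width x density
    print(f"{lower}-{upper}: density {density:.2f}, width x density = {width * density:g} (frequency {freq})")
```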