Numerical Summaries Flashcards
Histogram
Used to visualise quantitative data
Highlights the frequency of data in one class interval compared to another
Density scale
Height of each block = (proportion of the data in the block) / (length of the class interval)
The area of the whole histogram on the density scale is one (or, as a percentage, 100%)
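As a rough illustration of the density scale, the sketch below computes the block heights for some made-up ages and checks that the block areas add up to one. The data, the class intervals and the helper name `density_heights` are assumptions chosen for the example.

```python
def density_heights(data, edges):
    """Density-scale height of each block, using left-closed intervals [a, b)."""
    n = len(data)
    heights = []
    for left, right in zip(edges[:-1], edges[1:]):
        count = sum(1 for x in data if left <= x < right)
        proportion = count / n
        # height = proportion in the block / length of the class interval
        heights.append(proportion / (right - left))
    return heights


ages = [3, 12, 17, 18, 21, 24, 30, 41, 55, 62]  # made-up data
edges = [0, 18, 25, 70]                          # assumed class intervals

heights = density_heights(ages, edges)
widths = [right - left for left, right in zip(edges[:-1], edges[1:])]

# Area of each block = height * width; the areas should sum to 1.
total_area = sum(h * w for h, w in zip(heights, widths))
print(heights)
print(total_area)
```

Note that the narrow interval [18, 25) ends up with the tallest block even though it holds no more observations than [0, 18), because height reflects density rather than raw count.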
Simple box plot
Graphic display of numerical summaries
Shows the five-number summary of a data set: the middle 50% of the data in the box, the expected minimum and maximum at the ends of the whiskers, and any outliers as individual points beyond the whiskers.
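A minimal sketch of the numbers behind a box plot, using made-up data. Two things here are assumptions rather than part of the card: the quartiles use linear interpolation (other conventions give slightly different values), and outliers are flagged with the common 1.5 × IQR rule.

```python
from statistics import quantiles

data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # made-up data

# Quartiles by linear interpolation between data points.
q1, median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # width of the box, covering the middle 50% of the data

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary)
print(outliers)
```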
Comparative box plot
Splits up a quantitative variable by a qualitative variable, giving one box plot per category.
Scatter plot
Examines the relationship between 2 quantitative variables.
Heat map
Useful when a contingency table is impractical because the variables take too many different values.
Endpoint convention
If each interval contains its left endpoint but excludes its right endpoint, then an 18-year-old is counted in [18, 25), not [0, 18).
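The left-closed convention can be sketched with a small lookup. The helper name `interval_for` and the interval edges are hypothetical, picked to match the age example on the card.

```python
from bisect import bisect_right

edges = [0, 18, 25, 70]  # assumed class interval boundaries


def interval_for(value, edges):
    """Return the left-closed interval [a, b) that contains value."""
    # bisect_right puts a value equal to an edge to the right of that edge,
    # so the value falls in the interval that starts at that edge.
    i = bisect_right(edges, value) - 1
    if i < 0 or i >= len(edges) - 1:
        raise ValueError(f"{value} is outside all class intervals")
    return (edges[i], edges[i + 1])


print(interval_for(18, edges))    # falls in [18, 25)
print(interval_for(17.9, edges))  # falls in [0, 18)
```

Under this convention the last right endpoint (70 here) belongs to no interval, so the sketch raises an error for it.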
Crowding
High density of data within a class interval
Advantages of numerical summaries
A numerical summary reduces all the data to one simple number (“statistic”)
It is a precise number, so there is less room for disagreement.
Sample mean
The unique point at which the data are balanced,
i.e. the distances from the mean of the numbers to its left are balanced by the distances from the mean of the numbers to its right.
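The balance-point idea can be checked numerically: the total distance below the mean equals the total distance above it. The data are made up for the example.

```python
from statistics import mean

data = [2, 3, 7, 8, 10]  # made-up data
m = mean(data)

# Total distance from the mean on each side.
left_pull = sum(m - x for x in data if x < m)
right_pull = sum(x - m for x in data if x > m)

print(m, left_pull, right_pull)
```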
Sample median
The middle data point when the observations are ordered from smallest to largest (for an even number of observations, the average of the two middle points).
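A short sketch of the calculation. Averaging the two middle values when the number of observations is even is the usual convention and is assumed here; the function name `sample_median` is illustrative.

```python
def sample_median(values):
    """Middle value of the ordered data (assumed: average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([9, 1, 5]))     # odd number of observations
print(sample_median([9, 1, 5, 7]))  # even number of observations
```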
Robust
The sample median is said to be robust: it is a good summary for skewed data because it is barely affected by outliers.
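To see robustness in action, the sketch below replaces the largest value in a made-up data set with an extreme outlier and compares how the two summaries react.

```python
from statistics import mean, median

data = [2, 3, 7, 8, 10]            # made-up data
with_outlier = [2, 3, 7, 8, 1000]  # same data with the largest value replaced by an outlier

mean_before, mean_after = mean(data), mean(with_outlier)
median_before, median_after = median(data), median(with_outlier)

# The mean is dragged towards the outlier; the median does not move.
print(mean_before, mean_after)
print(median_before, median_after)
```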
Comparing sample mean and median
The difference between the sample mean and the sample median can be an indication of the shape of the data.
For symmetric data, we expect the sample mean
to be the same as the sample median
For left-skewed data, we expect the sample mean
to be smaller than the sample median
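Both expectations can be checked on small made-up data sets. The values below are chosen for illustration: one set is symmetric about its centre, the other has a long tail to the left.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]              # made-up symmetric data
left_skewed = [1, 7, 8, 8, 9, 9, 10]     # made-up data with a long left tail

# Symmetric data: mean and median coincide.
print(mean(symmetric), median(symmetric))

# Left-skewed data: the low values in the tail pull the mean below the median.
print(mean(left_skewed), median(left_skewed))
```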