Topic 1 - Populations, Samples and Normal Distributions Flashcards
What are descriptive statistics?
Brief descriptive coefficients that summarise a data set, which may represent an entire population or a sample of it.
What is the 5 number summary?
minimum and maximum values (together they give the range)
median (2nd quartile, middle value)
lower and upper quartiles (1st and 3rd quartiles, a quarter and three quarters of the way up the ordered dataset).
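The five values can be computed directly, as in this minimal sketch. The data are hypothetical, and `five_number_summary` is an illustrative helper name, not a standard function. Quartile conventions differ between textbooks and software, so Q1 and Q3 may shift slightly under another method.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" is one of several quartile conventions;
    # other conventions can give slightly different Q1 and Q3.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical data
summary = five_number_summary(sample)
iqr = summary[3] - summary[1]  # interquartile range = Q3 - Q1
print(summary, iqr)
```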
What is the interquartile range?
The 3rd quartile minus the 1st quartile, i.e. the spread of the middle 50% of the data.
What would the IQR be of a dataset with 1st quartile 105.6 and 3rd quartile 111.1?
IQR = 3rd quartile - 1st quartile
= 111.1 - 105.6 = 5.5
How would you display the 5 number summary?
Box and whisker plot.
Describe a box plot.
Top of box = 3rd quartile
Line across box = 2nd quartile (median).
Bottom of box = 1st quartile.
So 50% of the data lies within the box.
Whiskers extend towards the min and max, but only as far as a given multiple of the box height (commonly 1.5 × IQR). Values beyond the whiskers are plotted as individual points so that outliers can be seen.
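A minimal sketch of how the whisker limits might be worked out, reusing the quartiles from the IQR card (105.6 and 111.1). The multiple of 1.5 is a common convention assumed here, not something these cards specify. The data values and the helper name `whisker_fences` are hypothetical.

```python
def whisker_fences(q1, q3, k=1.5):
    """Return the lowest and highest values a whisker may reach.

    k is the multiple of the box height (IQR); 1.5 is an assumed,
    commonly used default.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = whisker_fences(105.6, 111.1)

values = [96.0, 104.2, 106.9, 108.3, 110.5, 112.8, 121.4]  # hypothetical data
# Anything outside the fences would be drawn as an individual outlier point.
outliers = [v for v in values if v < lower or v > upper]
print(lower, upper, outliers)
```

With these numbers the fences sit roughly at 97.35 and 119.35, so the two extreme values are flagged as outliers.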
What are outliers?
Exceptionally small or large values.
What graph is a good way to display all data in a continuous sample?
Histograms
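What happens to the shape of the distribution as the sample size increases?
It becomes more and more regular, approaching the shape of the population distribution (a smooth bell curve when the population is normally distributed).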
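The effect of sample size can be checked with a small simulation. This sketch assumes a standard normal population (µ = 0, σ = 1). It compares the proportion of sample values in each histogram bin with the proportion the normal curve predicts. The bin edges, seed, sample sizes and the helper name `max_bin_error` are all illustrative choices.

```python
import random
from statistics import NormalDist

def max_bin_error(sample, dist, edges):
    """Largest gap between observed and expected proportion over the bins."""
    n = len(sample)
    worst = 0.0
    for lo, hi in zip(edges, edges[1:]):
        observed = sum(lo <= x < hi for x in sample) / n
        expected = dist.cdf(hi) - dist.cdf(lo)
        worst = max(worst, abs(observed - expected))
    return worst

rng = random.Random(1)                  # fixed seed so the run is repeatable
population = NormalDist(mu=0, sigma=1)  # assumed population
edges = [-3, -2, -1, 0, 1, 2, 3]        # bins one standard deviation wide

small = [rng.gauss(0, 1) for _ in range(20)]
large = [rng.gauss(0, 1) for _ in range(20000)]

error_small = max_bin_error(small, population, edges)
error_large = max_bin_error(large, population, edges)
print(error_small, error_large)
```

The larger sample should track the curve much more closely, which is what "more regular" means in practice.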
What is inferential statistics?
The practice of using data from a sample to draw conclusions or make predictions about the population it came from.
How is a population often defined in inferential stats?
in terms of unknown parameters - µ, σ
What are the associated parameters of the normal distribution?
µ - measure of location (mean)
σ - measure of spread (SD)
Why is the mean a poor representative of skewed data?
Because the mean is unduly influenced by a few extreme values in the sample, so it doesn’t represent the bulk of the sample well.
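A quick illustration with hypothetical right-skewed data: one very large value pulls the mean well above every other observation, while the median stays in the middle of the bulk of the data.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: six similar values and one very large one.
values = [21, 23, 24, 25, 26, 27, 250]

sample_mean = mean(values)      # pulled upwards by the single value 250
sample_median = median(values)  # middle value of the ordered data

print(sample_mean, sample_median)
```

Here the mean comes out at about 56.6, larger than six of the seven values, whereas the median is 25.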
What is an interpretable feature of the normal distribution curve?
Area under the curve.
If we call P the area under the curve up to any value of X, what is the meaning of P?
The probability that a randomly chosen member of the population has a value less than X (the cumulative probability).
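The area P can be evaluated numerically. This sketch uses `NormalDist` from Python's standard library, with an assumed population of µ = 100 and σ = 15. Those parameter values are illustrative only and do not come from these cards.

```python
from statistics import NormalDist

# Assumed population parameters, chosen only for illustration.
population = NormalDist(mu=100, sigma=15)

# P = area under the curve up to X = probability a random member is below X.
p_below_mean = population.cdf(100)    # X equal to the mean
p_below_one_sd = population.cdf(115)  # X one standard deviation above the mean

print(p_below_mean, p_below_one_sd)
```

Half of the area lies below the mean, and roughly 84% lies below one standard deviation above it.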