Lecture 2 - Describing & Summarising Data + Normal Distribution Flashcards
measures of central tendency:
mode (most frequent value), arithmetic mean (x̄, the sum of the values divided by the number of values) & the median (middle value in a ranked dataset)
what measure of central tendency is affected the most by extreme values?
the mean is affected most by extreme values; the median is much less affected
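A minimal sketch of the point above, using Python's standard `statistics` module. The numbers are made up purely for illustration; adding one extreme value (50) pulls the mean a long way while the median barely moves.

```python
from statistics import mean, median, mode

# Hypothetical sample values, chosen only for illustration.
values = [2, 3, 3, 4, 5]
with_outlier = values + [50]  # same data plus one extreme value

central = {
    "mean": mean(values),
    "median": median(values),
    "mode": mode(values),
}
central_outlier = {
    "mean": mean(with_outlier),
    "median": median(with_outlier),
    "mode": mode(with_outlier),
}

print(central)          # mean 3.4, median 3, mode 3
print(central_outlier)  # mean rises above 11, median is 3.5, mode stays 3
```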
how do you round with your mean values?
you always round your mean values to one decimal place (e.g. 4.988 → 5.0)
what do histograms primarily show?
frequency
what does a positive skew graph look like?
left-slanted bell shape: the peak sits on the left and a long tail stretches out to the right
what does a negative skew graph look like?
right-slanted bell shape: the peak sits on the right and a long tail stretches out to the left
the more variation in our data…
… the less certain we can be about the estimates from the data, such as the mean
sum of squares:
total sum of squares = sum over all observations of (value in the sample - mean value of the sample)^2
what is the problem with the sum of squares equation?
the more data points you have, the bigger the sum of squares value will be
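A small sketch of this problem, with made-up numbers. Repeating the same three values four times leaves the spread of the data unchanged, yet the sum of squares quadruples. Dividing by the number of observations (one possible fix, shown here as an assumption about where the lecture goes next) gives the same answer for both datasets.

```python
def sum_of_squares(values):
    """Sum of squared deviations of each value from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

# Hypothetical data: 'large' is 'small' repeated 4 times, so the spread is identical.
small = [4, 6, 8]
large = small * 4

ss_small = sum_of_squares(small)  # 8.0
ss_large = sum_of_squares(large)  # 32.0, four times bigger for the same spread

# Dividing by the number of observations removes the dependence on sample size.
per_obs_small = ss_small / len(small)
per_obs_large = ss_large / len(large)

print(ss_small, ss_large)
print(per_obs_small, per_obs_large)
```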
unreliability is proportional to:
variance
standard deviation equation:
standard deviation = √( sum of (each value - mean)^2 / size of population )
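The equation above can be checked by hand against Python's `statistics.pstdev`, which computes the population standard deviation. The data are invented for illustration. Note, as an addition not on the card, that `statistics.stdev` divides by n - 1 instead and is the usual choice when the data are a sample rather than a whole population.

```python
import math
from statistics import pstdev, stdev

# Hypothetical data, chosen so the arithmetic is easy to follow.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean_value = sum(values) / n                              # 5.0
squared_devs = [(x - mean_value) ** 2 for x in values]    # sums to 32.0
manual_sd = math.sqrt(sum(squared_devs) / n)              # sqrt(32 / 8) = 2.0

print(manual_sd)        # population SD computed by hand
print(pstdev(values))   # same quantity from the standard library
print(stdev(values))    # sample SD (divides by n - 1), slightly larger
```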
what does standard error of the mean calculate and how does it differ from standard deviation?
standard error calculates the scatter of the mean values, whereas the standard deviation is the scatter of the raw data values (observations)
Two Standard Error rules of thumb:
1) standard error is a measure of how confident we are that our sample mean is close to the population mean
2) in 95.5% of cases the population mean will fall within ca. 2 standard errors of the sample mean
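A rough sketch of both rules of thumb. It assumes the usual formula, standard error = sample standard deviation / √n, which is not written on the cards, and uses invented data. The interval printed is the sample mean plus or minus 2 standard errors.

```python
import math
from statistics import mean, stdev

# Hypothetical sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
sample_mean = mean(sample)
sample_sd = stdev(sample)                 # sample SD (divides by n - 1)
standard_error = sample_sd / math.sqrt(n) # assumed formula: SD / sqrt(n)

# Rule of thumb 2: the population mean usually lies within about 2 SE of the sample mean.
lower = sample_mean - 2 * standard_error
upper = sample_mean + 2 * standard_error

print(f"mean = {sample_mean:.2f}, SE = {standard_error:.3f}")
print(f"approx. 95% range for the population mean: {lower:.2f} to {upper:.2f}")
```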
Gaussian Distribution:
the same as the normal distribution; it is a common continuous probability distribution
it is bell-shaped, asymptotic at the extremes, and symmetrical around the mean with no skew: mean = median = mode
area under the curve is directly proportional to the relative frequency of observations and their probability (p)
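To illustrate area under the curve as probability, the sketch below uses `statistics.NormalDist` (Python 3.8+) on a standard normal distribution (mean 0, standard deviation 1, chosen here for convenience). The helper `area_within` is a hypothetical name introduced for this example; it returns the area within k standard deviations of the mean.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1 (an assumption for illustration).
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = area_within(1)  # roughly 68% of observations
within_2 = area_within(2)  # roughly 95.5% of observations

print(round(within_1, 4), round(within_2, 4))
```

The value for 2 standard deviations is where the "95.5% of cases" figure in the standard error rule of thumb comes from.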
what is the Gaussian (Normal) Distribution important for?
statistical analysis