Probability Flashcards
normal distribution
a distribution for continuous data that is symmetric and is represented by a bell-shaped curve
characterised by two parameters: μ (mean) and σ² (variance),
where the mean corresponds to the centre and the variance corresponds to the spread/width of the data.
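A minimal sketch of how the two parameters act, using the standard-library `statistics.NormalDist`. The helper name `share_within` and the example values are illustrative assumptions, not part of the card.

```python
from statistics import NormalDist

def share_within(mu, sigma, k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    # NormalDist takes the standard deviation, i.e. the square root of the variance.
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

narrow = NormalDist(mu=10, sigma=1)  # variance 1
wide = NormalDist(mu=10, sigma=3)    # variance 9

# Both curves are centred on the mean, but the wider one has a lower peak.
print(narrow.pdf(10) > wide.pdf(10))
# The share within one standard deviation is the same for any mu and sigma.
print(round(share_within(10, 1, 1), 4), round(share_within(10, 3, 1), 4))
```

Changing `mu` slides the curve left or right; changing `sigma` stretches or squeezes it.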
check the normality of data by
histogram
normal probability plot (Q-Q plot)
box plot
how to examine a histogram to check distribution
If the data are normally distributed, the histogram is bell-shaped and symmetrical.
Also check the symmetry by comparing the mean and the median.
If they are approximately equal, the data are likely to be symmetrical.
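The mean-versus-median comparison can be sketched in a few lines. The function name `mean_median_gap`, the scaling by the standard deviation, and both data sets are assumptions made for illustration.

```python
import statistics

def mean_median_gap(data):
    """Difference between mean and median, in units of the sample standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    spread = statistics.stdev(data)
    return (mean - median) / spread

symmetric = [2, 4, 5, 6, 8]
skewed = [1, 1, 2, 2, 3, 20]  # one large value pulls the mean above the median

print(mean_median_gap(symmetric))
print(mean_median_gap(skewed))
```

A gap close to zero is consistent with symmetry; a clearly positive or negative gap points to skew. Equal mean and median do not by themselves establish normality.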
how to use Q-Q plot to check distribution
If the data are normally distributed, the data points will lie approximately on a straight line.
Use the plot to check the normality assumption.
It is a visual check rather than an airtight proof, so it is somewhat subjective.
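A rough sketch of the points behind a normal Q-Q plot, using only the standard library. The plotting positions `(i - 0.5) / n` are one common convention assumed here, and the correlation score `qq_straightness` is an illustrative stand-in for judging straightness by eye, not a formal test.

```python
from statistics import NormalDist, mean

def normal_qq_points(data):
    """Pair each sorted observation with the matching standard normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def qq_straightness(data):
    """Pearson correlation of the Q-Q points; values near 1 mean they hug a line."""
    points = normal_qq_points(data)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

normal_like = NormalDist(50, 5).samples(200, seed=1)  # simulated normal data
skewed = [2 ** i for i in range(20)]                  # strongly right-skewed data

print(qq_straightness(normal_like))
print(qq_straightness(skewed))
```

Plotting the pairs from `normal_qq_points` as a scatterplot gives the Q-Q plot itself; the score only summarises how straight it looks.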
how to use box plot to check distribution
check symmetry and outliers using the quartiles (the vertical lines of a horizontally drawn box plot). It works for data from any distribution because it is non-parametric. The thick vertical line, the 2nd quartile, is the median, and the circles are outliers.
If the data are symmetrical, the two dashed horizontal lines (the whiskers) are of equal length and the thick vertical line splits the box into two equal parts.
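The numbers a box plot draws can be computed directly. This sketch assumes the common 1.5 × IQR rule for flagging outliers and the "inclusive" quartile method; plotting software may use other conventions. The name `box_plot_summary` and the data are hypothetical.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers as a box plot would show them."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # end of the lower whisker
        "whisker_high": max(inside),   # end of the upper whisker
        "outliers": outliers,          # drawn as circles
    }

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
summary = box_plot_summary(values)
print(summary)

# Symmetry checks: the median should sit mid-box and the whiskers should match.
print(summary["median"] - summary["q1"], summary["q3"] - summary["median"])
print(summary["q1"] - summary["whisker_low"], summary["whisker_high"] - summary["q3"])
```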
what is a Q-Q plot
scatterplot created by plotting two sets of quantiles against one another.
Two of the main distributions associated with discrete data
the binomial and the Poisson distributions.
binomial distribution
a discrete distribution giving the probability of each possible number of successes in a fixed number, say n, of independent binary events that all have the same probability of success.
Therefore the maximum number of successes is n and the minimum is 0.
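A small sketch of the binomial probabilities, assuming independent events. The symbols `k` (number of successes) and `p` (probability of success on each event) are introduced here for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent events, each with probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.5
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]  # k runs from 0 to n only

print(probabilities)
print(sum(probabilities))  # the probabilities over 0..n cover every outcome
```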
The Poisson distribution
a discrete distribution for counts of events. Unlike the binomial distribution, there is no upper limit on the value that the variable can take.
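To illustrate the missing upper limit, here is a sketch of the Poisson probabilities. The rate parameter `lam` (the average count) and the example value 3 are assumptions not stated on the card.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the average count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3
# Every count k = 0, 1, 2, ... has a positive probability, however large k is.
print(poisson_pmf(0, lam), poisson_pmf(3, lam), poisson_pmf(30, lam))
# The probabilities shrink quickly, so a long finite sum is already close to 1.
print(sum(poisson_pmf(k, lam) for k in range(50)))
```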
what is a quantile
a cut-off value such that a specified percentage of the data lies below it.
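A quick sketch of finding a quantile and confirming the share of data below it. The helper names and the data are hypothetical, and `method="inclusive"` is one of several interpolation conventions.

```python
import statistics

def percentile(data, percent):
    """Cut-off below which roughly `percent` percent of the data lie (percent in 1..99)."""
    cuts = statistics.quantiles(data, n=100, method="inclusive")  # 99 cut points
    return cuts[percent - 1]

def share_below(data, value):
    """Fraction of observations strictly below `value`."""
    return sum(x < value for x in data) / len(data)

scores = list(range(1, 101))
p90 = percentile(scores, 90)
print(p90, share_below(scores, p90))
```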
Bootstrapping
is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows the estimation of the sampling distribution of almost any statistic using random sampling methods.
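A minimal sketch of the idea: resample the data with replacement many times and recompute the statistic each time. The function name `bootstrap_estimates`, the number of resamples, the seed, and the data are illustrative assumptions.

```python
import random
import statistics

def bootstrap_estimates(data, statistic, n_resamples=2000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]

sample = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
medians = bootstrap_estimates(sample, statistics.median)

# The spread of the resampled medians estimates the standard error of the median.
standard_error = statistics.stdev(medians)
print(standard_error)
```

The list `medians` approximates the sampling distribution of the median; any other statistic can be passed in the same way.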
a confidence interval
is a range of estimates for an unknown parameter. A confidence interval is computed at a designated confidence level; the 95% confidence level is the most common.
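One way to compute such an interval for a mean is sketched below. It assumes a normal approximation; for small samples a t-based interval is usually more appropriate. The function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, level=0.95):
    """Normal-approximation confidence interval for the mean of `data`."""
    n = len(data)
    centre = mean(data)
    standard_error = stdev(data) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    return centre - z * standard_error, centre + z * standard_error

sample = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
print(mean_confidence_interval(sample))              # 95% level
print(mean_confidence_interval(sample, level=0.99))  # a higher level gives a wider range
```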
what is a distribution
The pattern by which a measurement or frequency varies.