Exploratory Data Analysis Flashcards
What is a normal distribution
A normal distribution is a probability distribution that is symmetric around its mean. Commonly cited approximate examples are the heights and weights of people and IQ scores. In a normal distribution, the mean, median, and mode are all equal
What is a skewed distribution
A skewed distribution is a probability distribution where the data is not symmetric around the mean, and one tail of the distribution has more extreme values than the other. There are two types of skewed distributions: left-skewed (negative skew) and right-skewed (positive skew)
What is an example of left-skewed distributions
Prices of used cars in a market where most cars sell at a high price and only a few sell very cheaply, so the tail extends to the left
What is an example of right-skewed distributions
The distribution of age at first marriage, where most people marry at a younger age and fewer marry at an older age, so the tail extends to the right
What is a uniform distribution
A uniform distribution is a probability distribution where all values have an equal chance of occurring. This means that the probability of any value within a given range is the same.
What is an example of uniform distribution
Rolling a fair die, where each number has an equal chance of being rolled
What is a bi-modal distribution
A bi-modal distribution is a probability distribution with two distinct peaks, or modes, in the data. This often indicates that the data contains two underlying subpopulations that are distinct from each other.
What is an example of bi-modal distribution
An example of a bi-modal distribution is the distribution of heights for a population that includes both adults and children
What are some key features of a normal distribution
Some key features of a normal distribution include the fact that it is symmetric, the mean, median and mode are all equal, and the frequency falls off in both directions away from the centre
What is the area under the curve of a normal distribution
The area under the curve of a normal distribution is equal to 1, meaning that the probabilities of all possible outcomes sum to 1
What are the two parameters that determine the shape of a normal distribution
The two parameters that determine the shape of a normal distribution are the mean and the standard deviation
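For reference, the standard density formula shows where the two parameters enter. The symbols \mu (mean) and \sigma (standard deviation) are notation added here, not taken from the card above: \mu shifts the centre of the curve and \sigma controls how wide it is.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```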
What is the empirical rule
The empirical rule is a statistical rule of thumb stating that, for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, approximately 95% falls within two standard deviations, and approximately 99.7% falls within three standard deviations
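The percentages in the empirical rule can be checked numerically. The sketch below is a minimal illustration using only Python's standard library; the helper name prob_within is made up for this example. It relies on the fact that, for any normal distribution, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)).

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The result does not depend on the particular mean or standard deviation, which is why the rule applies to every normal distribution.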
Can a distribution be both normal and skewed
No, a distribution cannot be both normal and skewed. A normal distribution is always symmetric, while a skewed distribution is not symmetric
What is the difference between normal distribution and a uniform distribution
A normal distribution is bell-shaped and symmetric around the mean, while a uniform distribution is flat and all values are equally likely
What is the difference between skewed left and skewed right
Skewed left and skewed right refer to the direction of the tail of the distribution. In a skewed left distribution, the tail is on the left side and the mean is typically smaller than the median. In a skewed right distribution, the tail is on the right side and the mean is typically larger than the median
How do skewed distributions impact statistical analysis
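Skewed distributions can have a significant impact on statistical analysis because the extreme values in the tail pull the mean away from the bulk of the data and inflate the standard deviation. In such cases the median and the IQR are often better summaries of the centre and spread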
What is the relationship between the mean and median in a skewed distribution
In a skewed distribution, the mean and median generally differ. The mean is pulled towards the tail of the distribution, while the median stays closer to the centre of the data
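A small worked example makes the pull of the tail concrete. The values below are made up for illustration (a hypothetical right-skewed sample with one very large value); this is a sketch, not real data.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is very large.
incomes = [20, 22, 25, 27, 30, 32, 35, 200]

mean = statistics.mean(incomes)      # 48.875, dragged upward by the value 200
median = statistics.median(incomes)  # 28.5, the midpoint of 27 and 30

print(f"mean = {mean}, median = {median}")
```

Here the mean is larger than seven of the eight values, while the median still sits in the middle of the bulk of the data.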
Why is a perfectly flat uniform distribution rare
A perfectly flat uniform distribution is rare because observed data is always a finite sample. Even if the underlying distribution is exactly uniform, the observed frequencies will show some small variation due to sampling, and they only approach a flat shape as the sample size grows
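Sampling variation is easy to see by simulating a fair die. The sketch below uses Python's standard random module with a fixed seed so the run is repeatable; the number of rolls (6000) is an arbitrary choice for illustration.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the simulation is repeatable
n_rolls = 6000          # the expected count for each face is n_rolls / 6 = 1000

# Roll a fair six-sided die n_rolls times and count each face.
counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))

for face in range(1, 7):
    print(face, counts[face])
```

The counts typically land close to 1000 each but not exactly on it, so the histogram is nearly flat rather than perfectly flat.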
What is the relationship between mean and median in uniform distribution
The mean and median are equal. This is because a uniform distribution is symmetric around the midpoint of its range, so both the mean and the median fall at that midpoint
How does a bi-modal distribution differ from a normal distribution
A normal distribution is symmetrical with a single peak, whereas a bi-modal distribution has two peaks and may or may not be symmetrical
What are some examples of phenomena that may exhibit a bi-modal distribution
Income distributions in certain societies, test scores in a class split between well-prepared and unprepared students, or polarized response patterns in psychological studies
What is the Inter Quartile Range (IQR)
The Inter Quartile Range is the difference between the third quartile (Q3) and the first quartile (Q1) of a dataset: IQR = Q3 - Q1
What is the IQR used for
The IQR is used to measure the spread of the middle 50% of the data, which lies between the first quartile (Q1) and the third quartile (Q3). Because it ignores the tails, it is less sensitive to extreme values than the full range
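The IQR can be computed with Python's standard statistics module. The data below is made up for illustration. Note that several conventions exist for computing quartiles and they can give slightly different answers; this sketch assumes the "inclusive" method of statistics.quantiles.

```python
import statistics

# Hypothetical dataset with one extreme value (40).
data = [1, 3, 5, 7, 9, 11, 13, 15, 40]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
full_range = max(data) - min(data)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, range = {full_range}")
```

With this data the IQR is 8 while the full range is 39, which shows how a single extreme value stretches the range but leaves the IQR untouched.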
What are the 6 different data points usually found on a box plot
Minimum, Quartile 1, Median (Q2), Quartile 3, Maximum, Extreme values (outliers)
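The six values can be computed directly. The sketch below rests on two assumptions that the card does not state: outliers are flagged with the common 1.5 x IQR rule, and the minimum and maximum are taken as the smallest and largest values that are not outliers (the whisker ends in many plotting tools). The function name box_plot_summary and the data are hypothetical.

```python
import statistics

def box_plot_summary(data):
    """Return the six values a box plot typically shows (illustrative helper)."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inliers = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]

    return {
        "minimum": inliers[0],   # smallest non-outlier value
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": inliers[-1],  # largest non-outlier value
        "outliers": outliers,
    }

summary = box_plot_summary([1, 3, 5, 7, 9, 11, 13, 15, 40])
print(summary)
```

For this data the fences are -7 and 25, so 40 is reported as an outlier and the maximum shown on the plot would be 15.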