Chapter 12: Data-based and Statistical Reasoning Flashcards
Mean or average
The mean is calculated by adding up all the individual values within the data set and then dividing the result by the number of values. The mean may be a parameter or a statistic depending on whether we are discussing a population or a sample. Having an outlier (an extremely large or extremely small value compared to the other data values) can shift the mean towards one end of the range.
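A minimal sketch of how one outlier shifts the mean. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean

values = [2, 4, 6, 8, 10]        # hypothetical data set
with_outlier = values + [100]    # same data plus one extremely large value

mean_plain = mean(values)            # (2 + 4 + 6 + 8 + 10) / 5
mean_outlier = mean(with_outlier)    # 130 / 6, pulled towards the outlier

print(mean_plain, mean_outlier)
```

The single value of 100 moves the mean well above every other point in the set.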
Median.
The median value for a set of data is its midpoint, where half of the data points are greater than the value and half are smaller. In datasets with an odd number of values, the median will actually be one of the data points. In datasets with an even number of values, the median will be the mean of the two central data points. The data points must first be listed in increasing order.
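The odd and even cases can be sketched as follows. The helper name median_of and the sample values are hypothetical; Python's built-in statistics.median gives the same results.

```python
def median_of(values):
    """Return the median after sorting the values in increasing order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is the middle data point itself.
        return ordered[mid]
    # Even count: the median is the mean of the two central data points.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_median = median_of([7, 1, 5])       # sorted: 1, 5, 7
even_median = median_of([7, 1, 5, 3])   # sorted: 1, 3, 5, 7
print(odd_median, even_median)
```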
Mode.
The number that appears the most often in a set of data. When we examine distributions, the peaks represent modes.
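A short sketch of finding the mode or modes, assuming Python 3.8 or later for statistics.multimode. The data values are hypothetical.

```python
from statistics import multimode

single_peak = [1, 2, 2, 2, 3, 4]
two_peaks = [1, 2, 2, 3, 3, 4]   # 2 and 3 both appear twice

modes_single = multimode(single_peak)   # list with the one most common value
modes_double = multimode(two_peaks)     # list with every value tied for most common

print(modes_single, modes_double)
```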
Normal distribution.
The normal distribution has been solved in the sense that we can transform any normal distribution to a standard distribution with a mean of 0 and a standard deviation of 1. In a normal distribution, all of the measures of central tendency (mean, median, and mode) are the same. Approximately 68% of the distribution is within one standard deviation of the mean, 95% within two, and 99.7% within three.
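As an illustration, the transformation to the standard distribution is the z-score, z = (x - mean) / standard deviation. The sketch below assumes Python 3.8 or later for statistics.NormalDist; the mean of 100 and standard deviation of 15 are hypothetical numbers.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0    # hypothetical mean and standard deviation
x = 130.0                  # hypothetical measurement

# Standardize: how many standard deviations x lies from the mean.
z = (x - mu) / sigma

# Fraction of a standard normal distribution within k standard deviations.
standard = NormalDist(mu=0.0, sigma=1.0)
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

print(z)
for k, fraction in within.items():
    print(k, round(fraction, 4))
```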
Skewed distribution.
A distribution that contains a tail on one side or the other of the data set. A negatively skewed distribution has a tail on the left side, whereas a positively skewed distribution has a tail on the right. The mean of a negatively skewed distribution will be lower than the median, while the mean of a positively skewed distribution will be higher than the median.
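A small sketch of how the tail pulls the mean away from the median. Both data sets are hypothetical, built with one extreme value to create the tail.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # tail on the right (positive skew)
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]   # tail on the left (negative skew)

right_mean, right_median = mean(right_skewed), median(right_skewed)
left_mean, left_median = mean(left_skewed), median(left_skewed)

print(right_mean, right_median)   # mean is above the median
print(left_mean, left_median)     # mean is below the median
```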
Bimodal distributions.
A distribution containing two peaks with a valley in between. Such distributions can often be analysed as two separate distributions.
Range
For a data set, the difference between its largest and smallest values. It is heavily affected by the presence of outliers. It is possible to approximate the standard deviation as 1/4 of the range.
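A brief sketch of the range and the range/4 estimate, using hypothetical data. The estimate is only a rule of thumb, so the computed sample standard deviation is printed alongside it for comparison.

```python
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data set

data_range = max(data) - min(data)   # largest value minus smallest value
approx_sd = data_range / 4           # rough rule-of-thumb estimate
actual_sd = stdev(data)              # sample standard deviation, for comparison

print(data_range, approx_sd, round(actual_sd, 2))
```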
Interquartile range.
IQR = Q3 – Q1
Any value that falls more than 1.5 × IQR below the first quartile (Q1) or above the third quartile (Q3) is considered an outlier.
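The outlier rule can be sketched as below. The data are hypothetical, and the quartiles come from statistics.quantiles (Python 3.8 or later) with its default method; other quartile conventions exist and can give slightly different values of Q1 and Q3.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 50]   # hypothetical data set

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)
```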
Standard deviation.
It is calculated relative to the mean of the data. We calculate it by taking the difference between each data point and the mean, squaring this value, dividing the sum of all of these squared values by the number of points in the data set minus 1, and then taking the square root of the result.
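The calculation can be sketched step by step. The data values and the helper name sample_sd are hypothetical; the result is checked against Python's built-in statistics.stdev, which also divides by the number of points minus 1.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation, dividing by (number of points - 1)."""
    n = len(values)
    m = sum(values) / n
    # Sum of squared differences between each data point and the mean.
    squared_diffs = sum((x - m) ** 2 for x in values)
    return sqrt(squared_diffs / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set
sd = sample_sd(data)
print(round(sd, 3), round(stdev(data), 3))
```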
Dependent events.
Do impact each other, such that the order in which the events occur changes their probabilities.
Independent events.
Do not impact each other, so their probabilities are never expected to change.
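A hypothetical example of the difference: drawing two aces from a standard 52-card deck. Without replacement the draws are dependent, because the first draw changes what is left; with replacement they are independent.

```python
from fractions import Fraction

# Dependent: the first ace is not returned, so 3 aces remain among 51 cards.
p_dependent = Fraction(4, 52) * Fraction(3, 51)

# Independent: the first card is returned, so the second draw is unchanged.
p_independent = Fraction(4, 52) * Fraction(4, 52)

print(p_dependent, p_independent)
```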
Mutually exclusive outcomes.
Cannot occur at the same time.
Exhaustive.
A set of outcomes is exhaustive if there are no other possible outcomes.
In probability when using the word:
And: For independent events, multiply the probabilities.
Or: Add the probabilities and subtract the probability of both happening together.
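A worked sketch of both rules, assuming two independent events chosen for illustration: a fair coin landing heads and a fair six-sided die showing 6. The results are checked by listing every possible outcome.

```python
from fractions import Fraction
from itertools import product

p_heads = Fraction(1, 2)   # fair coin lands heads
p_six = Fraction(1, 6)     # fair die shows 6

# "And" for independent events: multiply the probabilities.
p_and = p_heads * p_six

# "Or": add the probabilities, then subtract the probability of both together.
p_or = p_heads + p_six - p_and

# Check by enumerating all 12 equally likely (coin, die) outcomes.
outcomes = list(product("HT", range(1, 7)))
count_or = sum(1 for coin, die in outcomes if coin == "H" or die == 6)
p_or_enumerated = Fraction(count_or, len(outcomes))

print(p_and, p_or, p_or_enumerated)
```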
Correlation
Refers to a connection between data. If two variables trend together, that is, as one increases, so does the other, there is a positive correlation. If two variables trend in opposite directions, there is a negative correlation.
Correlation coefficient
A number between -1 and +1 that represents the strength of the relationship. A correlation coefficient of +1 indicates a strong positive relationship. A value of -1 indicates a strong negative relationship. A value of 0 indicates no apparent relationship.
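A minimal sketch of computing a correlation coefficient. The card does not name a specific formula, so this assumes the Pearson correlation coefficient; the helper name pearson_r and the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations from the means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]   # increases as x increases
falls_with_x = [10, 8, 6, 4, 2]   # decreases as x increases

r_positive = pearson_r(x, rises_with_x)
r_negative = pearson_r(x, falls_with_x)
print(r_positive, r_negative)
```

Perfectly linear data like these give coefficients at the extremes of the scale; real data usually fall somewhere in between.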