Ch 12 - Data-Based and Statistical Reasoning Flashcards
measures of central tendency
values that describe the middle of a sample; how "middle" is defined differs by measure (mean, median, or mode)
mode
number that appears most often in a set of data
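As a quick illustration of the three measures of central tendency, here is a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only so that the three measures come out different.

```python
import statistics

# Hypothetical sample, invented only for illustration.
data = [2, 3, 3, 5, 7, 8, 14]

mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # middle value once the data are sorted
mode = statistics.mode(data)      # value that appears most often

print("mean:", mean, "median:", median, "mode:", mode)
```

All three describe the "middle" of the same data, yet each gives a different number here.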
normal distribution
symmetric, bell-shaped distribution in which the mean, median, and mode are all at the center
standard distribution
a normal distribution with a mean of zero and a standard deviation of one (also called the standard normal distribution)
skewed distribution
contains a tail on one side of the data set. Negatively skewed: tail to the left, with the mean lower than the median. Positively skewed: tail to the right, with the mean higher than the median.
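The relationship between mean and median in a skewed distribution can be checked on small samples. This is only a sketch; both samples are hypothetical and were invented to have an obvious tail.

```python
import statistics

# Hypothetical samples, invented only to illustrate the two kinds of tail.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]        # one large value: tail to the right
left_tailed = [1, 17, 18, 18, 19, 19, 19, 20]   # one small value: tail to the left

for name, sample in [("positive skew", right_tailed),
                     ("negative skew", left_tailed)]:
    # The extreme value drags the mean toward the tail; the median barely moves.
    print(name, "mean:", statistics.mean(sample),
          "median:", statistics.median(sample))
```

In the right-tailed sample the mean lands above the median, and in the left-tailed sample it lands below.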
bimodal
distribution containing two peaks; it does not have to have two modes in the strict sense, since one peak may be higher than the other
range
difference between data set’s largest and smallest values
interquartile range
related to the median and to the first and third quartiles; found by subtracting the value of the first quartile from the value of the third quartile (IQR = Q3 - Q1)
quartiles
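include the median (Q2); with the data placed in ascending order, quartiles divide it into groups that each comprise one-fourth of the entire set. To find the first quartile, multiply n (the number of data points) by 1/4: if the result is a whole number, Q1 is the mean of the value at that position and the value at the next position; if it is not a whole number, round up and take the value at that position. The third quartile is found the same way using 3/4 x n.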
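The position rule for quartiles can be sketched in a few lines. The function name `quartile` and the sample data are hypothetical. The round-up step for a non-whole position is an assumption based on a common textbook convention, and other conventions (including the one in Python's `statistics.quantiles`) can give slightly different values.

```python
import math

def quartile(sorted_data, quarters):
    """Return Q1, Q2, or Q3 (quarters = 1, 2, or 3) using the position rule."""
    n = len(sorted_data)
    if (n * quarters) % 4 == 0:
        # Whole-number position: average this position and the next one.
        k = n * quarters // 4                      # 1-based position
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Assumed convention: round a fractional position up and take that value.
    k = math.ceil(n * quarters / 4)
    return sorted_data[k - 1]

# Hypothetical data, already in ascending order.
data = [1, 3, 4, 6, 8, 9, 11, 15]
q1, q2, q3 = (quartile(data, q) for q in (1, 2, 3))
iqr = q3 - q1                                      # IQR = Q3 - Q1
print(q1, q2, q3, iqr)
```

With eight data points, the positions n/4 = 2 and 3n/4 = 6 are whole numbers, so each quartile is the mean of two neighboring values.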
example of using interquartile range to determine outliers
find the interquartile range (third quartile minus first quartile) and multiply it by 1.5. Add this product to the third quartile; anything above the result is an outlier. Subtract the same product from the first quartile; anything below the result is an outlier.
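The 1.5 x IQR rule translates directly into code. In this sketch the function name `iqr_outliers`, the quartile values, and the data are all hypothetical, and the quartiles are taken as given rather than computed.

```python
def iqr_outliers(values, q1, q3):
    """Return the values lying outside the 1.5 x IQR cutoffs."""
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr   # anything below this is an outlier
    upper_cutoff = q3 + 1.5 * iqr   # anything above this is an outlier
    return [v for v in values if v < lower_cutoff or v > upper_cutoff]

# Hypothetical quartiles and values, for illustration only.
q1, q3 = 3.5, 10.0                  # IQR = 6.5, cutoffs at -6.25 and 19.75
values = [-10, 1, 8, 19, 25]
outliers = iqr_outliers(values, q1, q3)
print(outliers)
```

Here 19 sits just inside the upper cutoff, while -10 and 25 fall outside the cutoffs and are flagged.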
standard deviation
calculated by taking the difference between each data point and the mean, squaring each difference, summing the squared values, dividing the sum by the number of points in the data set minus one (n - 1), and taking the square root of the result
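The steps above can be followed one at a time in code and compared with `statistics.stdev`, which also divides by n - 1. The data set is hypothetical.

```python
import math
import statistics

# Hypothetical data, invented only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                       # step 1: the mean (5.0 here)
squared_diffs = [(x - mean) ** 2 for x in data]    # step 2: squared differences
variance = sum(squared_diffs) / (len(data) - 1)    # step 3: divide the sum by n - 1
manual_sd = math.sqrt(variance)                    # step 4: square root

print(manual_sd, statistics.stdev(data))
```

Both approaches should agree up to floating-point rounding.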
determining outlier via standard deviation
once the standard deviation is determined, any value that lies more than 3 standard deviations from the mean is an outlier
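A minimal sketch of the three-standard-deviation check follows. The function name `sd_outliers` and the data are hypothetical, and the sample standard deviation (dividing by n - 1) is assumed.

```python
import statistics

def sd_outliers(values, threshold=3):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    return [v for v in values if abs(v - mean) > threshold * sd]

# Hypothetical data: 19 values near 10 plus one extreme value.
data = [9, 10, 11] * 6 + [10, 50]
outliers = sd_outliers(data)
print(outliers)
```

The example uses 20 points on purpose: in a very small sample, one extreme value inflates the standard deviation so much that it can never lie 3 standard deviations from the mean.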
standard deviation and normal distribution
approximately 68% of data points fall within one standard deviation of the mean, 95% fall within two standard deviations, and 99.7% (often rounded to 99%) fall within three standard deviations
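These percentages can be reproduced from the normal curve itself. For a normal distribution, the fraction of values within k standard deviations of the mean is erf(k / sqrt(2)). The helper name `fraction_within` is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, "SD:", round(100 * fraction_within(k), 1), "%")
```

The results round to roughly 68%, 95%, and 99.7%.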
independent events in probability
events that have no effect on one another
dependent events in probability
events that have an impact on one another, such that the order changes the probability (the outcome of the first event changes the probability of the second)
mutually exclusive outcomes
cannot occur at the same time; probability of them occurring together is 0%
exhaustive outcomes
a group of outcomes that is all inclusive so that there are no other possible outcomes
calculating independent probability
P(A and B) = P(A) x P(B): for independent events, the probability of the first event times the probability of the second event equals the probability that both will occur
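The multiplication rule can be checked by counting outcomes. This sketch assumes a hypothetical experiment: flipping a fair coin and rolling a fair six-sided die, with A = "heads" and B = "six".

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: one fair coin flip and one fair die roll.
outcomes = list(product(["H", "T"], [1, 2, 3, 4, 5, 6]))  # 12 equally likely pairs

p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)

# Rule for independent events: P(A and B) = P(A) x P(B)
p_both_formula = p_heads * p_six

# Direct count of the outcomes where both events happen.
both = [o for o in outcomes if o == ("H", 6)]
p_both_counted = Fraction(len(both), len(outcomes))

print(p_both_formula, p_both_counted)
```

The formula and the direct count give the same probability because the coin has no effect on the die.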
probability of at least one of two events occurring
P(A or B) = P(A) + P(B) - P(A and B)
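Subtracting P(A and B) avoids counting the overlap twice, which is easy to confirm by enumeration. This sketch assumes a hypothetical experiment: rolling two fair dice, with A = "first die shows six" and B = "second die shows six".

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

p_a = Fraction(1, 6)              # first die shows six
p_b = Fraction(1, 6)              # second die shows six
p_a_and_b = p_a * p_b             # the dice are independent, so multiply

# P(A or B) = P(A) + P(B) - P(A and B)
p_either_formula = p_a + p_b - p_a_and_b

# Direct count of rolls with at least one six.
either = [r for r in rolls if r[0] == 6 or r[1] == 6]
p_either_counted = Fraction(len(either), len(rolls))

print(p_either_formula, p_either_counted)
```

Simply adding 1/6 + 1/6 would count the double-six roll twice; the subtraction removes the duplicate.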
hypothesis testing
begins with an idea about what may be different between two populations
null hypothesis
always a hypothesis of equivalence; says that two populations are equal, or that a single population can be described by a parameter equal to a given value; when it can be rejected because the p-value is less than the significance level (alpha), the results are statistically significant
alternative hypothesis
may be nondirectional (the populations are not equal) or directional (for example, the mean of population A is greater than the mean of population B)
z- or t-tests
most common hypothesis tests; rely on the standard distribution or the closely related t-distribution
test statistic
calculated from the data and compared to a table to determine the likelihood that the statistic was obtained by random chance (under the assumption that the null hypothesis is true)
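To make the idea of a test statistic concrete, here is a minimal sketch of a one-sample, two-tailed z-test. The formula z = (sample mean - population mean) / (sigma / sqrt(n)) is the usual textbook form and is not spelled out in these cards. The function name and all numbers are hypothetical. The p-value is computed from the normal curve instead of being looked up in a table.

```python
import math

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic and its two-tailed p-value."""
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    # Probability of a result at least this extreme if the null hypothesis is true.
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed

# Hypothetical numbers: null hypothesis says the population mean is 100.
z, p = one_sample_z_test(sample_mean=106, pop_mean=100, pop_sd=15, n=36)
alpha = 0.05
reject_null = p < alpha   # reject only when the p-value is below alpha

print(round(z, 2), round(p, 4), reject_null)
```

In this made-up case the p-value falls below alpha, so the null hypothesis would be rejected and the result called statistically significant.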