Ch 12 - Data-Based and Statistical Reasoning Flashcards
What does a measure of central tendency provide?
a single value representation for the middle of a group of data
What is the arithmetic mean (average)?
a measure of central tendency that weighs all values equally; because every value counts, it is strongly affected by outliers
What is the median?
the value that lies in the middle of the data set
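- 50% of data points lie above the median and 50% lie below it
- position of the median in the ordered data set = (n+1)/2, where n is the number of data values (if this is not a whole number, take the mean of the two neighboring values)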
What is the mode?
the data point that appears most often; there may be multiple (or 0) modes in a data set
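The three measures above can be compared on a small example. This is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration, and the last value of `with_outlier` is a deliberate outlier.

```python
from statistics import mean, median, multimode

# Hypothetical data set (not from the flashcards).
values = [2, 3, 3, 5, 7]
# Same data with one extreme value appended.
with_outlier = values + [100]

for label, data in (("original", values), ("with outlier", with_outlier)):
    # multimode returns a list because a data set may have several modes.
    print(label, mean(data), median(data), multimode(data))
```

In this example the outlier moves the mean from 4 to 20, while the median only moves from 3 to 4 and the mode stays at 3.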
What is a normal distribution?
symmetrical; the mean, median, and mode are all the same
What is the standard distribution?
a normal distribution with a mean of 0 and a standard deviation of one
- used for most calculations
- approximately 68% of data points occur within one standard deviation of the mean, 95% within 2, and 99.7% within 3
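The percentages above can be checked numerically. A minimal sketch, assuming Python 3.8+ so that `statistics.NormalDist` is available; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of the distribution within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```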
What are skewed distributions?
have differences in their mean, median, and mode
- the skew direction is the direction of the tail of the distributions
What are bimodal distributions?
have multiple peaks, although not necessarily multiple modes
- may be useful to perform data analysis on the 2 groups separately
What is range?
the difference between the largest and smallest values in a data set
What is interquartile range?
difference between the value of the 3rd quartile and the 1st quartile (IQR = Q3 - Q1)
- can be used to determine outliers
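position of Q1 = n x 1/4
position of Q3 = n x 3/4
(positions refer to the ordered data set: if the position is a whole number, take the mean of the values at this position and the next; if it is a decimal, round up to the next whole number and take the value at that position as the quartile)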
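The position rule above can be sketched in code. This follows the rule as stated on this card only; other textbooks and libraries use slightly different quartile conventions, so results may differ from theirs. The function name `quartile` and both data sets are made up for illustration.

```python
def quartile(sorted_values, k):
    """Return Q1 (k=1) or Q3 (k=3) using the position rule n x k/4.

    Positions are 1-based and sorted_values must already be in ascending order.
    """
    n = len(sorted_values)
    numerator = n * k
    if numerator % 4 == 0:
        # Whole-number position: average this position and the next one.
        pos = numerator // 4
        return (sorted_values[pos - 1] + sorted_values[pos]) / 2
    # Decimal position: round up and take the value at that position.
    pos = numerator // 4 + 1
    return sorted_values[pos - 1]

even_set = [1, 3, 4, 6, 7, 9, 12, 15]   # n = 8, positions 2 and 6 are whole
odd_set = [1, 3, 4, 6, 7, 9, 12]        # n = 7, positions 1.75 and 5.25

for data in (even_set, odd_set):
    q1, q3 = quartile(data, 1), quartile(data, 3)
    print(q1, q3, q3 - q1)  # Q1, Q3, IQR
```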
What is standard deviation?
a measurement of variability about the mean
- can also be used to determine outliers
sigma = square root of [ sum of (value - mean)^2 / (n - 1) ]
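As a worked check of the formula, the sketch below computes the standard deviation step by step (dividing by n - 1) and compares it with `statistics.stdev` from Python's standard library, which uses the same n - 1 denominator. The data values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean is 5

m = mean(values)
# Sum of squared differences from the mean.
sum_of_squares = sum((v - m) ** 2 for v in values)
# Divide by n - 1, then take the square root.
manual_sd = sqrt(sum_of_squares / (len(values) - 1))

print(sum_of_squares, round(manual_sd, 3), round(stdev(values), 3))
```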
What are outliers?
may be a result of true population variability, measurement error, or a non normal distribution
- any value more than 1.5 x IQR below Q1 or more than 1.5 x IQR above Q3
- any value that lies more than 3 standard deviations from the mean
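Both outlier rules can be sketched as small helper functions. The names `iqr_outliers` and `sd_outliers` and the data are made up for illustration; Q1 = 12.5 and Q3 = 16.5 are passed in directly and are assumed to have been found beforehand with the quartile position rule.

```python
from statistics import mean, stdev

def iqr_outliers(values, q1, q3):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

def sd_outliers(values, k=3):
    """Values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

values = [10, 12, 13, 14, 15, 16, 17, 40]  # hypothetical data
flagged_by_iqr = iqr_outliers(values, q1=12.5, q3=16.5)
flagged_by_sd = sd_outliers(values)
print(flagged_by_iqr, flagged_by_sd)
```

In this small sample the two rules disagree: the IQR rule flags 40, while the 3 standard deviation rule flags nothing, because the extreme value itself inflates the standard deviation.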
What are independent and dependent events?
- the probability of independent events does not change based on the outcome of other events
- the probability of dependent events changes depending on the outcome of other events
What are mutually exclusive outcomes?
cannot occur simultaneously
What does it mean for a set of outcomes to be exhaustive?
there are no other possible outcomes
How are hypothesis tests used?
use a known distribution to determine whether a hypothesis of no difference (the null hypothesis) can be rejected
What does the p value determine?
whether a finding is statistically significant; the p value is compared to the selected significance level, and the finding is significant if the p value is smaller
- significance level 0.05 is commonly used
How do the mean, median, and mode compare for a right-skewed distribution?
the mean of a right (positively) skewed distribution is to the right of the median, which is to the right of the mode
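A small made-up data set with a tail to the right illustrates this ordering. This is only an illustration using Python's `statistics` module, not a proof that the ordering holds for every skewed data set.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, with a tail toward 9.
right_skewed = [1, 2, 2, 2, 3, 4, 5, 9]

print(mode(right_skewed), median(right_skewed), mean(right_skewed))
```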
How do range and standard deviation generally relate to one another mathematically?
where the data are not available, the range can be approximated as 4x the standard deviation
Why would the average difference from the mean be an inappropriate measure of distribution?
- the average (signed) difference from the mean will always be 0, because values above and below the mean cancel out; this is why the differences are squared first and the square root is taken at the end for standard deviation
- squaring forces all of the values to be positive numbers, which will not cancel out to 0
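The cancellation can be seen directly on a small hypothetical data set: the signed differences sum to 0, while the squared differences do not.

```python
from statistics import mean

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean is 5
m = mean(values)

# Signed differences: positives and negatives cancel.
differences = [v - m for v in values]
# Squared differences: all non-negative, so nothing cancels.
squared = [d ** 2 for d in differences]

print(sum(differences), sum(squared))
```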
What are the probability rules?
- and (independent events): multiply the probabilities, P(A and B) = P(A) x P(B)
- or: add the probabilities and subtract the probability of both happening together, P(A or B) = P(A) + P(B) - P(A and B)
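A minimal sketch of both rules with exact fractions. The scenario is an assumption for illustration: event A is a fair coin landing heads and event B is a fair die showing six, which are independent events.

```python
from fractions import Fraction

p_a = Fraction(1, 2)  # assumed: fair coin lands heads
p_b = Fraction(1, 6)  # assumed: fair die shows six

# "and" rule for independent events: multiply.
p_a_and_b = p_a * p_b
# "or" rule: add, then subtract the overlap so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

print(p_a_and_b, p_a_or_b)
```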
What are confidence intervals?
used to determine a potential range of values for the true mean of a population
How is the p value calculated during a hypothesis test?
after the test statistic is calculated, a computer program or table is consulted to determine the p value of the statistic
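As one example of the "computer program" route, the sketch below converts a test statistic into a p value. It assumes the test statistic is a z score compared against the standard distribution and that the test is two-sided; other tests use other distributions. The function name `two_sided_p` is made up for this example.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p value for a z test statistic (assumed standard normal)."""
    # Area in both tails beyond |z|.
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (0.5, 1.96, 3.0):
    print(z, round(two_sided_p(z), 4))
```

With a significance level of 0.05, a z score of about 1.96 sits right at the boundary of statistical significance.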
How is power related to probability?
power is the probability that the individual rejects the null hypothesis when the alternative hypothesis is true for the population
How do exponential and parabolic curves differ in shape?
they both have a steep component; however, exponential have horizontal asymptotes and become flat on one side while parabolic are symmetrical and have steep components on both sides of a center point
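The difference in shape can be checked with two example functions. The specific curves y = 2^x and y = x^2 are assumptions chosen for illustration.

```python
def exponential(x):
    """Example exponential curve, y = 2^x."""
    return 2.0 ** x

def parabola(x):
    """Example parabolic curve, y = x^2."""
    return x ** 2

for x in (-20, -3, 0, 3, 20):
    print(x, exponential(x), parabola(x))
```

The parabola gives the same value at -x and x (symmetrical about its center point), while the exponential flattens toward 0 on the left and rises steeply on the right.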
What is correlation?
refers to a connection - direct relationship, inverse relationship, or otherwise - between data
What is causation?
correlation does not necessarily imply causation, but causation does mean correlation
What is required in order for a conclusion to be useful?
there must be practical (clinical) and statistical significance
What is a type 1 error?
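mistakenly rejecting the null hypothesis when it is actually true
- the probability of a type 1 error is set by selecting a significance level
- occurs when the p value is less than the significance level even though the null hypothesis is true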
How is the confidence level increased?
to increase the confidence level, one must increase the size of the confidence interval to make it more likely that the true value of the mean is within the range
- thus making the confidence interval wider
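The widening can be illustrated with a simple interval for a mean. This is a sketch under stated assumptions: the population standard deviation is treated as known, so a z-based interval (mean plus or minus z* x sigma / sqrt(n)) applies, and the sample mean, sigma, and n are made-up numbers. The function name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """z-based confidence interval for a mean, assuming sigma is known."""
    # Critical value that leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 100, sigma 15, n = 36.
for confidence in (0.90, 0.95, 0.99):
    low, high = z_interval(100, 15, 36, confidence)
    print(confidence, round(low, 2), round(high, 2), round(high - low, 2))
```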
What is the most common measure of distribution?
standard deviation; it is most closely linked to the mean of a distribution and can be used to calculate p values, which are probabilities