Chapter 12 - Data-Based and Statistical Reasoning Flashcards
Measures of central tendency:
measurements that describe the middle of a sample
Outlier:
an extremely large or extremely small value compared to the other values
Median:
- what it is also known as
- relationship to outliers
- if mean and median are far from each other
- if mean and median are very close
- equation
- Midpoint; where half of data points are greater than the value and half are smaller
- Least susceptible to outliers, but not useful for data sets with very large ranges or multiple modes
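- If the mean and median are far from each other, this implies the presence of outliers or a skewed distribution
- If the mean and median are very close, this implies a symmetrical distribution
- Median position in an ordered data set = (n + 1) / 2, where n is the number of values; for even n, average the two middle values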
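A minimal sketch of how one outlier pulls the mean but barely moves the median. The sample values are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical sample, chosen only for illustration
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]  # add one extreme value

print(mean(values), median(values))              # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 19.5 3.5
```

The mean jumps from 3.4 to 19.5, while the median only shifts from 3 to 3.5.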
Mode:
- what it is
- if a data set has two modes
- Number that appears the most often in a set of data
- If a data set has two modes with a small number of values between them, it may be useful to analyze these portions separately or to look for other variables that may be responsible for dividing the distribution into two parts
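A short sketch of finding one mode versus all modes, using Python's standard `statistics` module. The data set is hypothetical.

```python
from statistics import mode, multimode

# Hypothetical data set in which 2 and 8 each appear three times
data = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]

print(mode(data))       # 2 (the first mode encountered)
print(multimode(data))  # [2, 8]
```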
Normal Distributions:
- what is all the same
- basis for what
- All of the measures of central tendency are the same
- We can transform any normal distribution to a standard distribution, with a mean of zero and a standard deviation of one
- Basis for the bell curve
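One common way to do this transformation is the z-score, z = (x − mean) / standard deviation. The sketch below assumes a hypothetical set of scores and uses the population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical scores, chosen only for illustration
scores = [60, 70, 80, 90, 100]

mu = mean(scores)       # mean of the original data
sigma = pstdev(scores)  # population standard deviation

# Subtract the mean and divide by the standard deviation
z = [(x - mu) / sigma for x in scores]

# After the transformation the mean is about 0 and the standard deviation about 1
print(round(mean(z), 10), round(pstdev(z), 10))
```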
Skewed Distribution:
- what they are
- negatively skewed distribution
- where tail is
- mean and median relationship
- positively skewed distribution
- where tail is
- mean and median relationship
- Skewed distribution: one that contains a tail on one side or the other of the data set
- Negatively skewed distribution
- Tail on the left (or negative) side
- Mean will be lower than the median
- Positively skewed distribution
- Tail on the right (or positive) side
- Mean will be higher than the median
(in image: a = negative, b = positive)
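A quick check of the mean and median relationship in each direction of skew. Both samples are hypothetical; the second is simply the mirror image of the first.

```python
from statistics import mean, median

# Hypothetical samples, chosen only for illustration
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]         # long tail on the right
left_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]  # long tail on the left

print(mean(right_skewed), median(right_skewed))  # 4.75 3.0   -> mean > median
print(mean(left_skewed), median(left_skewed))    # -4.75 -3.0 -> mean < median
```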
Bimodal Distributions:
Bimodal: a distribution containing two peaks with a valley in between
May technically have only one mode if one peak is slightly higher than the other
Range:
- what it is
- does not consider what
- relationship to outliers
- relationship to standard deviation
- equation
- Difference between the largest and smallest values in the data set
- Does not consider the number of items of the data set
- Heavily affected by the presence of outliers
- Possible to approximate the standard deviation as one-fourth of the range
- Range = x_max − x_min
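A small sketch of the range calculation and the one-fourth rule of thumb, on a hypothetical data set. The estimate is only a rough approximation, as the comparison with the actual (population) standard deviation shows.

```python
from statistics import pstdev

# Hypothetical data set, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # largest value minus smallest value
sd_estimate = data_range / 4        # rule-of-thumb approximation

print(data_range)    # 7
print(sd_estimate)   # 1.75
print(pstdev(data))  # 2.0
```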
Interquartile range + Quartiles:
- what they are
- equation for IQR
Interquartile range: related to the median, first, and third quartiles
Quartiles: including the median (Q2), divide data into groups that comprise one-fourth of the entire set
- The interquartile range is then calculated by subtracting the value of the first quartile from the value of the third quartile:
IQR = Q3 – Q1
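A sketch of one way to compute quartiles and the IQR. Several quartile conventions exist; this one assumes the median-of-halves convention (Q1 and Q3 are the medians of the lower and upper halves) and needs at least two values. The function name `quartiles` and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                 # values below the midpoint
    upper = ordered[len(ordered) - half:]  # values above the midpoint
    return median(lower), median(ordered), median(upper)

# Hypothetical data set, chosen only for illustration
data = [1, 3, 5, 7, 9, 11, 13, 15]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 4.0 8.0 12.0 8.0
```

Library functions such as `statistics.quantiles` interpolate differently by default, so their Q1 and Q3 may not match this convention exactly.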
Standard Deviation:
- can be used to determine what
- what determines an outlier
- on a normal distribution
- one standard deviation
- two standard deviations
- three standard deviations
- Can be used to determine whether a data point is an outlier
- If a data point falls more than three standard deviations from the mean, it is considered an outlier
- On a normal distribution:
- About 68% of data points fall within one standard deviation of the mean
- About 95% fall within two standard deviations
- About 99.7% fall within three standard deviations
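A sketch that checks these percentages against the standard normal distribution and applies the three-standard-deviation outlier rule. The helper names `is_outlier` and `share_within`, and the population with mean 100 and standard deviation 15, are hypothetical.

```python
from statistics import NormalDist

def is_outlier(x, mu, sigma):
    """Flag x if it lies more than three standard deviations from the mean."""
    return abs(x - mu) > 3 * sigma

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # 0.6827, 0.9545, 0.9973

# Hypothetical population: mean 100, standard deviation 15
print(is_outlier(150, 100, 15))  # True  (50 is more than 45)
print(is_outlier(140, 100, 15))  # False (40 is less than 45)
```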
Reasons why outliers occur: (3)
- A true statistical anomaly (ex: a person who is over seven feet tall)
- A measurement error (ex: reading the centimeter side of a tape measure instead of inches)
- A distribution that is not approximated by the normal distribution (ex: a skewed distribution with a long tail)
Independent events vs. Dependent events:
Independent events: have no effect on one another
ex: rolling a die, picking it up, and rolling it again
Dependent events: do have an impact on one another, such that the order changes the probability
ex: a container holds five red balls and five blue balls; if you draw one and don’t put it back, the probabilities for the next draw change
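A worked sketch of both cases using exact fractions. It assumes a fair six-sided die and the container of five red and five blue balls described above; the variable names are hypothetical.

```python
from fractions import Fraction

# Independent: the first roll does not change the second roll
p_six = Fraction(1, 6)
p_two_sixes = p_six * p_six

# Dependent: drawing without replacement changes the second probability
p_first_red = Fraction(5, 10)  # 5 red out of 10 balls
p_second_red = Fraction(4, 9)  # 4 red left out of 9 balls
p_two_reds = p_first_red * p_second_red

print(p_two_sixes)  # 1/36
print(p_two_reds)   # 2/9
```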
Mutually exclusive outcomes:
cannot occur at the same time
Ex: Cannot flip both heads and tails in one throw
Exhaustive (when describing a group):
describes a group of outcomes when there are no other possible outcomes
Ex: heads and tails are exhaustive outcomes of a coin flip; these are the only two possibilities
Null hypothesis (H0):
a general statement or default position that there is no relationship between two measured phenomena, or no association among groups
the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error
Says that two populations are equal, or that a single population can be described by a parameter equal to a given value
Assumed to be true until evidence indicates otherwise