chapter 3: descriptive statistics, numerical methods, and some predictive analytics Flashcards
what is the data set’s central tendency?
represents the center or middle of the data
what is the population mean (μ)?
average of the population measurements
Population parameter
number calculated from all the population measurements
describes some aspect of the population
the population mean μ (pronounced "mew") is one example
Sample statistic
number calculated using the sample measurements
describes some aspect of the sample
used as a point estimate of a population parameter
what is the sample mean
it is written x̄ ("x-bar")
average of the sample
what is the median
The value of the middle point of the ordered measurements
value such that 50% of all measurements lie above (or below) it
what is the mode
The most frequent value
the most frequently occurring measurement in the data
what happens to the median if the number of measurements is odd
the median is the middlemost measurement in the ordering (In increasing order)
the value exactly in the middle of the ordered list
(by contrast, the modes are the values that are observed “most typically”)
what happens to the median if the number of measurements is even
the median is the average of the two middlemost measurements in the ordering
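A minimal sketch of the odd and even cases described in the two cards above. The function name `median` and the sample lists are made up for illustration.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # arrange the measurements in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd number of measurements: the single middlemost one
        return ordered[mid]
    # even number of measurements: average of the two middlemost ones
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([7, 1, 5, 3, 9])   # ordered: 1 3 5 7 9, so the median is 5
even_median = median([7, 1, 5, 3])     # ordered: 1 3 5 7, so the median is (3 + 5) / 2
```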
so what is the difference between the mean and the median
the mean is the arithmetic average of all the measurements
the median is the middle value of the measurements (in increasing order)
what does it mean for data to have two modes
the data is bimodal
two different values (or classes) share the highest frequency
if there is skewness, should you use the mean or the median
median
If there is skewness, the mean gets pulled toward the extreme values, so it is a poor measure of the center
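A small illustration of why the median is preferred for skewed data. The salary figures are hypothetical (in thousands of dollars), with one extreme value on the right.

```python
import statistics

# Hypothetical salaries; the 250 is a single extreme value in the right tail.
salaries = [30, 32, 35, 38, 40, 42, 250]

mean_salary = statistics.mean(salaries)      # pulled upward by the 250
median_salary = statistics.median(salaries)  # middle value, unaffected by how large 250 is
```

Here the median is 38, while the mean is about 66.7, which is larger than every salary except the extreme one.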
are all measures of central tendency necessarily typical values??
no
what is a point estimate
one number estimate of the value of the population parameter
should not be a blind guess
what is “n” for the formula of the sample mean?
number of sample measurements
sample size
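For reference, the usual formula for the sample mean, where x_1, ..., x_n are the n sample measurements (the subscript notation is an assumption, since the cards do not write the formula out):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
```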
will the sample mean always equal the population mean??
no
unless you are extremely lucky
what happens when more than two modes exist in data?
the data is multimodal
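A quick sketch of one, two, and more than two modes, using `statistics.multimode` from the Python standard library (Python 3.8+). The data lists are made up.

```python
import statistics

# multimode returns every value that ties for the highest frequency,
# in the order the values first appear in the data.
one_mode = statistics.multimode([2, 3, 3, 4, 5])            # a single mode
two_modes = statistics.multimode([1, 1, 2, 3, 3])           # bimodal
three_modes = statistics.multimode([1, 1, 2, 2, 3, 3, 4])   # multimodal
```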
what is a modal class?
only happens if data is presented in classes
it is the class having the highest frequency
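A minimal sketch of finding a modal class, assuming equal-width classes. The helper `modal_class`, its parameters, and the scores are hypothetical.

```python
from collections import Counter


def modal_class(values, class_width, start):
    """Return ((lower, upper), frequency) for the class with the highest frequency.

    Classes are [start, start + width), [start + width, start + 2 * width), ...
    If several classes tie, the one encountered first in the data is returned.
    """
    # Map each measurement to the index of the class it falls in, then count.
    counts = Counter((v - start) // class_width for v in values)
    index, frequency = counts.most_common(1)[0]
    lower = start + index * class_width
    return (lower, lower + class_width), frequency


# Hypothetical exam scores grouped into classes of width 10 starting at 50.
scores = [52, 55, 61, 64, 66, 68, 69, 73, 75, 88]
result = modal_class(scores, class_width=10, start=50)
```

With these scores, the class from 60 up to 70 holds five measurements, more than any other class, so it is the modal class.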
when is a mean or median used more than a mode?
when we want to describe a data set’s central tendency by using a single number
why would we use the relative frequency curve on a histogram?
to smooth out the histogram and show the overall shape of the population distribution
describe the mean, median, and mode in a symmetrical relative frequency curve
the mean, median, and mode are all equal
describe the mean, median, and mode in a skewed to the right relative frequency curve
mean > median > mode
mode is located under the highest point of the frequency curve
the mean is largest because it averages in the extreme large values in the right tail
median is resistant to extreme values but mean is not
describe the mean, median, and mode in a skewed to the left relative frequency curve
mean < median < mode
mode is located under the highest point of the frequency curve
the mean is smallest because it averages in the extreme small values in the left tail
what is the range?
Largest minus the smallest
Measures the interval spanned by all the data
what is the variance
average of the squared deviations of all the population measurements from the population mean
what is the standard deviation
The square root of the population variance
Tells you how far the population measurements typically are from the population mean, in the same units as the data
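Written out, the two definitions above look like this. The symbols N (the population size) and x_i (the individual measurements) are notation assumed here; the cards only name μ and σ.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
\sigma = \sqrt{\sigma^2}
```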
what are the three absolute measures of variation
standard deviation
variance
range
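A short sketch computing all three measures with Python's standard library. The data are hypothetical and treated as an entire population, so the population versions `pvariance` and `pstdev` are used.

```python
import statistics

# Hypothetical population of 8 measurements; the population mean is 5.
population = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(population) - min(population)   # largest minus smallest: 9 - 2
# Squared deviations from the mean: 9, 1, 1, 1, 0, 0, 4, 16 (sum 32), averaged over 8.
variance = statistics.pvariance(population)
std_dev = statistics.pstdev(population)          # square root of the population variance
```

For these numbers the range is 7, the variance is 4, and the standard deviation is 2.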
what is the empirical rule?
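- 68.26% of the population measurements are within one standard deviation of the mean: [μ − σ, μ + σ]
- 95.44% are within two standard deviations of the mean: [μ − 2σ, μ + 2σ]
- 99.73% are within three standard deviations of the mean: [μ − 3σ, μ + 3σ]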
when do we use the empirical rule?
When we have a symmetrical or bell-shaped (mound-shaped) distribution, the rule works really well
If a population has mean µ and standard deviation σ and is described by a normal curve
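A minimal sketch of the empirical rule intervals for a normally distributed population: about 68.26%, 95.44%, and 99.73% of the measurements lie within one, two, and three standard deviations of the mean. The function name and the example values (μ = 100, σ = 15) are hypothetical.

```python
def empirical_rule_intervals(mu, sigma):
    """Return (percent, (lower, upper)) for 1, 2, and 3 standard deviations."""
    percents = [68.26, 95.44, 99.73]
    return [
        (percent, (mu - k * sigma, mu + k * sigma))
        for k, percent in enumerate(percents, start=1)
    ]


# Hypothetical normally distributed population with mean 100 and standard deviation 15.
intervals = empirical_rule_intervals(mu=100, sigma=15)
```

For this example the three intervals are [85, 115], [70, 130], and [55, 145].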
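when is Chebyshev’s theorem practical?
When we know little about the shape of the distribution, but do know the population mean (μ) and standard deviation (σ, the square root of the variance)
Not very useful if the distribution is highly skewed (the theorem still holds, but the intervals are too wide to be informative)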
what is the Chebyshev’s theorem formula
at least 100(1 − 1/k²)% of the population measurements lie in the interval [μ − kσ, μ + kσ], for any k > 1
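A small sketch of the Chebyshev bound: for any k > 1, at least 100(1 − 1/k²)% of the measurements lie within k standard deviations of the mean. The function name `chebyshev_lower_bound` is made up for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of measurements within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the formula gives 0% or less, which says nothing useful.
        raise ValueError("k must be greater than 1")
    return 100 * (1 - 1 / k**2)


bound_k2 = chebyshev_lower_bound(2)   # at least 75% within 2 standard deviations
bound_k3 = chebyshev_lower_bound(3)   # at least about 88.89% within 3 standard deviations
```

Compare these with the empirical rule: the Chebyshev bounds are lower because they must hold for any shape of distribution, not just a normal curve.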