Descriptive Statistics Flashcards
mean
average of a set of measurements
what is the most common metric to describe the LOCATION of a frequency distribution
mean
sample mean
average of the measurements in the sample
how to calculate the sample mean
sum of all the observations divided by the number of observations
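The calculation on this card can be sketched in a few lines of Python. The sample values and the function name `sample_mean` are made up for illustration and are not part of the original deck.

```python
# Hypothetical sample; the values are illustrative only.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]


def sample_mean(observations):
    """Sum of all observations divided by the number of observations."""
    if not observations:
        raise ValueError("sample must contain at least one observation")
    return sum(observations) / len(observations)


mean = sample_mean(sample)
print(mean)  # 25.0 / 5 = 5.0
```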
standard deviation (s)
measures how far from the mean the observations typically are
most commonly used measure of the spread of a distribution
standard deviation
what does a large vs small standard deviation indicate about the data
large = most of the observations are far from the mean
small = most observations are close to the mean
how to calculate the standard deviation
the square root of the variance
can the standard deviation be negative
NO
variance (s²) formula
s² = sum of the squared deviations from the mean, divided by (n − 1)
what is the deviation
difference between a measurement and the mean
what will the average of the deviations be
zero
how to find the variance
square the deviations, add them up, and divide by the number of observations minus one (n − 1)
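A minimal sketch of the deviation, variance, and standard deviation cards, worked on a made-up sample. It assumes the usual sample convention of dividing by n − 1; all variable names are illustrative.

```python
import math

# Hypothetical sample; the values are illustrative only.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(sample)
mean = sum(sample) / n  # 5.0

# Deviation: difference between each measurement and the mean.
deviations = [y - mean for y in sample]  # [-3.0, -1.0, -1.0, 0.0, 5.0]

# The deviations always average to zero, which is why they are squared.
mean_deviation = sum(deviations) / n  # 0.0

# Variance: sum of squared deviations divided by n - 1 (assumed convention).
squared_deviations = [d ** 2 for d in deviations]  # sum is 36.0
variance = sum(squared_deviations) / (n - 1)  # 36.0 / 4 = 9.0

# Standard deviation: square root of the variance, so it cannot be negative.
standard_deviation = math.sqrt(variance)  # 3.0

print(mean_deviation, variance, standard_deviation)
```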
how is the standard deviation often expressed
relative to the mean
coefficient of variation (CV)
expresses the standard deviation as a percentage of the mean
higher vs lower coefficient of variation
higher = more variability relative to the mean
lower = individuals are more consistently the same relative to the mean
when does the coefficient of variation only make sense
when all measurements are greater than or equal to zero
formula for coefficient of variation
divide the standard deviation by the sample mean, then multiply by 100 to get a percentage
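The coefficient of variation cards could be sketched as below. The function name `coefficient_of_variation` and the sample are hypothetical; the check for negative values reflects the card stating that the CV only makes sense when all measurements are greater than or equal to zero.

```python
import statistics

# Hypothetical sample; the values are illustrative only.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]


def coefficient_of_variation(observations):
    """Standard deviation as a percentage of the sample mean."""
    if any(y < 0 for y in observations):
        raise ValueError("CV only makes sense for non-negative measurements")
    mean = statistics.mean(observations)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    sd = statistics.stdev(observations)  # sample standard deviation (n - 1)
    return 100 * sd / mean


cv = coefficient_of_variation(sample)
print(cv)  # sd is 3.0 and the mean is 5.0, so about 60 percent
```

Because the standard deviation and the mean carry the same units, the units cancel in the ratio, which is what makes the CV comparable across variables.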
what is the sample size in a frequency table
the frequency total NOT the number of rows (this total is 395)
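To illustrate the difference between the frequency total and the number of rows, here is a sketch with a small, made-up frequency table (it is not the table behind the 395 on the card).

```python
# Hypothetical frequency table: measured value -> number of individuals.
frequency_table = {0: 3, 1: 5, 2: 2}

number_of_rows = len(frequency_table)        # 3 rows in the table
sample_size = sum(frequency_table.values())  # 3 + 5 + 2 = 10 individuals

print(number_of_rows, sample_size)
```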
median
middle observation in a set of data
how is the median often displayed
a box plot
how to calculate the median
sort the sample observations from smallest to largest
(odd number of observations = middle number)
(even number of observations = average of middle pair)
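The steps above can be sketched as a short function. The name `median` and the sample values are illustrative; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(observations):
    """Middle observation of the sorted sample."""
    if not observations:
        raise ValueError("sample must contain at least one observation")
    ordered = sorted(observations)  # smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of observations: the middle number.
        return ordered[middle]
    # Even number of observations: average of the middle pair.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_median = median([5, 1, 3])       # sorted: 1, 3, 5 -> 3
even_median = median([4, 1, 3, 2])   # sorted: 1, 2, 3, 4 -> (2 + 3) / 2 = 2.5
print(odd_median, even_median)
```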
quartiles
values that partition data into quarters
what are the first, second and third quartiles
first: Middle value of the measurements SMALLER than the median
second: the median
third: Middle value of the measurements LARGER than the median
interquartile range (IQR)
Span of the middle half of the data from the first quartile to the third quartile
how to calculate the IQR
Compute the first and third quartiles
Then subtract the first quartile from the third quartile
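A minimal sketch of the quartile and IQR cards, using the definition given in this deck (the first and third quartiles are the medians of the measurements below and above the median). The function name `quartiles` and the sample are hypothetical. Statistical software often uses slightly different interpolation rules, so results from other tools may differ a little.

```python
import statistics


def quartiles(observations):
    """Return (first quartile, median, third quartile) of a sample."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]  # measurements smaller than the median
    # For an odd sample size, leave the median itself out of the upper half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


# Hypothetical sample; the values are illustrative only.
sample = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q2, q3 = quartiles(sample)  # 2.5, 4.5, 6.5
iqr = q3 - q1                   # 6.5 - 2.5 = 4.0
print(q1, q2, q3, iqr)
```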
Box plots display
the median and interquartile range
what do the lower and upper edges of a box plot represent
first and third quartiles
what is the interquartile range in a box plot
the span of the box
what is the horizontal line dividing a box in box plot
the median
how are extreme values shown in a box plot
by a dot or line
what are examples of measures of spread
standard deviation
interquartile range
what are examples of measures of location
mean and median
when are the mean and standard deviation LESS informative than the median and interquartile range
when the data are strongly skewed or have extreme observations
when data are strongly skewed or have extreme observations, what measures are LESS informative
mean and standard deviation
median vs mean
median is the middle measurement of a distribution BUT the mean is the center of gravity
is the mean sensitive to extreme values
YES
is the median sensitive to extreme values
Less so than the mean
is the standard deviation sensitive to extreme values
YES
is the interquartile range affected by extreme values
NO
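The sensitivity cards can be checked with a small experiment: replace the largest value of a made-up sample with an extreme one and compare the summaries. The data and helper names are illustrative, and the quartiles follow this deck's definition (medians of the lower and upper halves).

```python
import statistics


def iqr(observations):
    """IQR using the medians of the lower and upper halves of the sample."""
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return statistics.median(upper) - statistics.median(lower)


def summarize(observations):
    return {
        "mean": sum(observations) / len(observations),
        "sd": statistics.stdev(observations),
        "median": statistics.median(observations),
        "iqr": iqr(observations),
    }


# Hypothetical samples: identical except for one extreme observation.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]

before = summarize(clean)
after = summarize(with_extreme)

# The mean moves from 5.0 to 14.0 and the sd grows sharply,
# while the median (5) and the IQR (5.0) stay the same in this example.
print(before)
print(after)
```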
percentiles vs quantiles
percentile - value below which X percent of the individuals lie
(50th percentile = median = half the data; 25th percentile = first quartile)
quantile - value identified by the proportion of observations less than or equal to it (represented by a decimal)
(10th percentile = 0.10 quantile
Median = 0.5 quantile)
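As a rough illustration of how percentiles, quantiles, and quartiles line up, the sketch below uses `statistics.quantiles` from Python's standard library with `n=4`, which returns the three cut points that split the data into quarters. The sample is made up, and the library's default interpolation method is only one of several conventions in use.

```python
import statistics

# Hypothetical sample; the values are illustrative only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Three cut points: the 0.25, 0.50 and 0.75 quantiles
# (the 25th, 50th and 75th percentiles).
cut_points = statistics.quantiles(sample, n=4)
first_quartile, second_quartile, third_quartile = cut_points

# A percentile is a quantile expressed as a percentage.
percentile = 50
quantile = percentile / 100  # 0.5

# The 0.5 quantile (50th percentile) is the median.
print(quantile, second_quartile, statistics.median(sample))
```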
most used descriptive stats for a categorical variable
proportions
what is the analogy between the mean, standard deviation, median and interquartile range
mean : Standard deviation :: median : interquartile range
how is the Coefficient of variation beneficial in comparisons
allows us to compare variables with different units because the CV is unitless
how are measures of mean and median similar
both describe location of frequency distribution
what is the purpose of the median
to partition ordered measures into two halves
do the mean and median (and the standard deviation and IQR) give similar or different information when the distribution is symmetrical and unimodal
they give similar information
which measure of location best describes the bulk of the data when the data are skewed
the median because the mean is pulled towards the outliers and away from the bulk of data
is standard deviation more or less sensitive to extreme values than the mean
MORE sensitive
why is the standard deviation more sensitive to extreme values than the mean
because extreme values have large deviations, and squaring those deviations amplifies their effect
when is the IQR the better measure of spread and when is the standard deviation
IQR - describes the spread of the MAIN (middle) part of the data
STDEV - reflects the spread of ALL the data in the distribution
how do you calculate a proportion
number of observations in the category divided by the total number of observations in all categories
what does the proportion (P-hat) estimate
estimate of true population proportion (p)
what must the proportions across all categories sum to
1
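The proportion cards can be sketched as follows. The category names and counts are invented for illustration, and `proportions` stands in for the sample proportions (p-hat) of each category.

```python
import math

# Hypothetical counts for a categorical variable; values are illustrative only.
counts = {"A": 10, "B": 30, "C": 60}

total = sum(counts.values())  # total number of observations in all categories

# Proportion for each category: count in the category divided by the total.
proportions = {category: count / total for category, count in counts.items()}

print(proportions)  # A: 0.1, B: 0.3, C: 0.6

# The proportions across all categories must sum to 1
# (checked with a tolerance because of floating-point rounding).
sums_to_one = math.isclose(sum(proportions.values()), 1.0)
print(sums_to_one)
```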