Ch 2 Descriptive Statistics Flashcards
ordered array
data arranged from smallest to largest (usually)
relative frequencies
the proportion of values falling into a class interval, found by dividing the number of values in each interval by the total number of values
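A minimal Python sketch of relative frequencies over class intervals. The function name `relative_frequencies`, the ages, and the half-open interval convention [lo, hi) are assumptions made for illustration; the cut points must cover every value.

```python
def relative_frequencies(values, cut_points):
    """Proportion of values in each class interval [lo, hi)."""
    if not values:
        raise ValueError("no values to classify")
    intervals = list(zip(cut_points, cut_points[1:]))
    counts = [0] * len(intervals)
    for x in values:
        for i, (lo, hi) in enumerate(intervals):
            if lo <= x < hi:  # half-open intervals (an assumed convention)
                counts[i] += 1
                break
        else:  # no interval matched this value
            raise ValueError(f"{x} falls outside the class intervals")
    n = len(values)
    # relative frequency = count in the interval / total number of values
    return {interval: count / n for interval, count in zip(intervals, counts)}


ages = [23, 27, 31, 35, 38, 41, 44, 52]  # hypothetical data
print(relative_frequencies(ages, [20, 30, 40, 50, 60]))
# {(20, 30): 0.25, (30, 40): 0.375, (40, 50): 0.25, (50, 60): 0.125}
```

The relative frequencies always sum to 1, which is what lets them be read as empirical probabilities.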
experimental probability or empirical probability
interpreting the relative frequencies as the probability of occurrence within a given interval
frequency histogram and frequency polygon
special types of bar and line graphs
cut points
points on the horizontal axis where the bars meet
stem and leaf displays
bears a strong resemblance to the histogram and serves the same purpose
statistic
descriptive measure computed from a sample
parameter
descriptive measure computed from a population
measures of central tendency
mean, median and mode
mean (arithmetic mean)
the average, computed by adding all the values and dividing by the number of values
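A minimal Python sketch of the arithmetic mean, x̄ = Σxᵢ / n. The function name `arithmetic_mean` and the data are hypothetical. The second call illustrates the mean's sensitivity to extreme values: because every value enters the sum, one large value shifts the result.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([10, 20, 30, 40]))   # 25.0
print(arithmetic_mean([10, 20, 30, 400]))  # 115.0: one extreme value pulls the mean
```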
first property of a mean
uniqueness, for a given set of data, there is exactly one arithmetic mean
second property of a mean
simplicity, the arithmetic mean is easily understood and easy to compute
third property of a mean
sensitivity to extreme values, each and every value in a set of data enters into the computation of the mean, so extreme values influence it and, in some cases, can so distort it that it becomes undesirable as a measure of central tendency
outliers (extreme values)
values that deviate appreciably from most of the measurements in a data set
robust estimators
estimators that are insensitive to outliers
trimmed mean
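a robust estimator of central tendency, computed by discarding a fixed percentage of the smallest and largest values in the ordered array and averaging the remaining values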
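A minimal Python sketch of a trimmed mean. The function name `trimmed_mean`, the 10% default, and rounding the number of trimmed values down are assumptions; texts differ on how much to trim.

```python
def trimmed_mean(values, proportion=0.1):
    """Mean of the ordered array after cutting `proportion` of values from each end."""
    if not values:
        raise ValueError("the trimmed mean of an empty data set is undefined")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # values dropped from EACH end, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


data = [2, 4, 5, 5, 6, 6, 7, 8, 9, 100]  # hypothetical data with one outlier
print(sum(data) / len(data))    # 15.2 (ordinary mean)
print(trimmed_mean(data, 0.1))  # 6.25 (drops 2 and 100)
```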
median
value that divides the ordered array into 2 equal parts
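A minimal Python sketch of the median; the function name `median` is hypothetical (Python's standard library also provides `statistics.median`). With an odd number of values the median is the single middle value of the ordered array; with an even number it is the mean of the two middle values.

```python
def median(values):
    ordered = sorted(values)  # build the ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values


print(median([7, 1, 5]))          # 5
print(median([10, 20, 30, 400]))  # 25.0, unchanged by the extreme value 400
```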
first property of the median
uniqueness, as with the mean, there is a unique median for a given set of data
second property of the median
simplicity, the median is easy to calculate
third property of the median
robustness, the median is not as drastically affected by extreme values as the mean is
mode
the value that occurs most frequently; if all the data items are different, there is no mode
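A minimal Python sketch of the mode using `collections.Counter`. The function name `modes` is hypothetical, and returning every tied value (so a bimodal data set yields two modes) is an assumed convention; an empty list stands for "no mode".

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # all data items are different: no mode
    # ties are all reported (e.g. a bimodal data set); an assumed convention
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 2, 3]))        # []
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```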
skewness
classification of data distributions on the basis of whether they are symmetric or asymmetric
symmetric
the left half of its graph (histogram or frequency polygon) is a mirror image of its right half
skewed distribution
a distribution whose graph is asymmetric
skewed to the right, positively skewed
graph has a long tail to the right
skewed to the left, negatively skewed
graph has a long tail to the left
measures of dispersion
describe the variation, spread and scatter of the distribution
range
difference between the largest and smallest values in a set of observations
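The range as a formula, writing x_L for the largest and x_S for the smallest observation (this notation is assumed here). For the values 12, 15, 20, 31 it gives 31 − 12 = 19.

```latex
R = x_L - x_S
```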
variance
measures dispersion based on how the data points are scattered about the mean; the sample variance is the sum of the squared deviations from the mean divided by n − 1
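A minimal Python sketch of the variance computed directly from squared deviations about the mean. The function names are hypothetical; the sample version divides by n − 1 and the population version by N.

```python
def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("the variance of an empty data set is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32/7, about 4.571
```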
standard deviation (SD)
square root of the variance; it has the same units as the data
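A short check with Python's `statistics` module that the sample standard deviation is the square root of the sample variance. The heights are hypothetical; the point is that the variance is in squared units (cm²) while the SD is back in cm.

```python
import math
import statistics

heights_cm = [160, 165, 170, 175, 180]  # hypothetical data, in cm

var = statistics.variance(heights_cm)  # sample variance (n - 1 divisor), in cm^2
sd = statistics.stdev(heights_cm)      # sample standard deviation, back in cm

print(var)                               # 62.5
print(math.isclose(sd, math.sqrt(var)))  # True
```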
coefficient of variation
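used for comparing the variation of 2 or more distributions; it expresses the standard deviation as a percentage of the mean (CV = s / x̄ × 100%), so it is useful when the distributions have different units or very different means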
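A minimal Python sketch of the coefficient of variation. The function name `coefficient_of_variation` and both data sets are hypothetical. The two samples have the same standard deviation (about 7.9) but in different units, so comparing SDs says little; the unitless CV shows the weights vary more relative to their mean.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is 0")
    return statistics.stdev(values) / mean * 100


heights_cm = [160, 165, 170, 175, 180]  # hypothetical data
weights_kg = [50, 55, 60, 65, 70]       # hypothetical data

cv_h = coefficient_of_variation(heights_cm)
cv_w = coefficient_of_variation(weights_kg)
print(f"heights: {cv_h:.1f}%, weights: {cv_w:.1f}%")  # heights: 4.7%, weights: 13.2%
```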
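five number summary
the smallest value, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the largest value
percentile
given a set of n observations x1, x2, …, xn, the pth percentile P is the value of X such that p percent or less of the observations are less than P and (100 − p) percent or less of the observations are greater than P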
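A minimal Python sketch of the five number summary (smallest value, Q1, median, Q3, largest value). The function name and data are hypothetical. Quartile conventions differ between texts; this sketch assumes the one used by `statistics.quantiles` by default (`method="exclusive"`), which locates the pth quantile at position p(n + 1) in the ordered array and interpolates when that position falls between two values.

```python
import statistics


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) of the data."""
    ordered = sorted(values)
    # assumed quartile convention: Python's default method="exclusive"
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]


data = [20, 1, 3, 4, 6, 7, 8, 10, 12, 13, 15]  # hypothetical data, n = 11
print(five_number_summary(data))  # (1, 4.0, 8.0, 13.0, 20)
```

With n = 11, Q1 sits at position (11 + 1)/4 = 3, the median at position 6, and Q3 at position 9 of the ordered array, so no interpolation is needed here.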
interquartile range (IQR)
difference between the third and first quartiles
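A minimal Python sketch of the IQR, assuming the same quartile convention as Python's `statistics.quantiles` default; the function name and data are hypothetical. Replacing the largest value with an outlier leaves the IQR unchanged while the range jumps, which is why the IQR is the more robust measure of spread.

```python
import statistics


def interquartile_range(values):
    """Q3 - Q1, using Python's default (assumed) quartile convention."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = [1, 3, 4, 6, 7, 8, 10, 12, 13, 15, 20]  # hypothetical data
with_outlier = data[:-1] + [200]               # largest value replaced by 200

print(interquartile_range(data), max(data) - min(data))                          # 9.0 19
print(interquartile_range(with_outlier), max(with_outlier) - min(with_outlier))  # 9.0 199
```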
box and whisker plots (boxplots)
graphical representation of the five number summary
kurtosis
measure of the degree to which a distribution is “peaked” or flat in comparison to a normal distribution, whose graph is bell-shaped