Chapter 3: Numerically Summarizing Data Flashcards
arithmetic mean
adding all the values of a variable in a data set, then dividing by the number of observations.
(average)
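A minimal sketch of this computation in Python. The data values and the helper name `arithmetic_mean` are made up for illustration.

```python
def arithmetic_mean(values):
    """Add all the values, then divide by the number of observations."""
    return sum(values) / len(values)

# Hypothetical data set, chosen only for illustration
scores = [82, 77, 90, 71, 62]
print(arithmetic_mean(scores))  # 76.4
```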
median
The value that lies in the middle of the data when arranged in ascending order. (M)
If the number of observations, n, is even, then the median is the mean of the two observations in positions n/2 and n/2 + 1 of the ascending list. (The average of the two middle values)
If the number of observations, n, is odd, then the median is the observation in position (n+1)/2 of the ascending list. (The middle value)
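The odd and even cases can be sketched in Python as follows. The function name and data are illustrative assumptions, and positions in the definition are 1-based while Python indexes from 0.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([62, 71, 77, 82, 90]))      # 77
print(median([62, 71, 77, 82, 90, 95]))  # 79.5
```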
resistant statistic
a numerical data summary is described as resistant if extreme values do not affect its value substantially.
The median is resistant; the mean is not, because a single extreme value can change it substantially.
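A small illustration with made-up numbers: replacing one value with an extreme one moves the mean a great deal while the median does not change. The sketch uses Python's standard `statistics` module.

```python
import statistics

data = [2, 3, 4, 5, 6]
with_extreme = [2, 3, 4, 5, 60]  # same data, largest value replaced by an extreme one

# The mean jumps from 4 to 14.8; the median stays at 4
print(statistics.mean(data), statistics.median(data))
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```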
mode
the mode of a variable is the observation (or observations) that occurs most frequently in the data set.
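As a quick sketch, Python's `statistics.multimode` (available from Python 3.8) returns every value tied for the highest frequency. The data below are made up.

```python
from statistics import multimode

two_modes = multimode([1, 2, 2, 3, 3, 4])  # 2 and 3 each occur twice
one_mode = multimode([5, 7, 7, 9])         # 7 occurs twice

print(two_modes)  # [2, 3]
print(one_mode)   # [7]
```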
Pareto chart
bar graph whose bars are drawn in decreasing order of frequency or relative frequency.
Side-by-side bar graph
compares two or more sets of data by drawing their bars next to each other for each category.
Ogive graph
a graph that represents the cumulative frequency or cumulative relative frequency of the data.
frequency polygon
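a graph that plots a point above each class midpoint at a height equal to the class frequency, then connects the points with line segments.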
Range
The range (R) of a variable is the difference between the largest and smallest data values.
R = largest data value - smallest data value
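A one-line sketch of the range in Python, using made-up data values:

```python
data = [62, 71, 77, 82, 90]  # hypothetical data values

# Range = largest data value minus smallest data value
data_range = max(data) - min(data)
print(data_range)  # 28
```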
dispersion
the degree to which the data are spread out
population standard deviation of a variable
the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, N.
σ = sqrt( Σ(x - μ)² / N )
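A minimal sketch, assuming the standard definition in which the sum of squared deviations is divided by the population size N before taking the square root. The function name and data are illustrative.

```python
import math

def population_std(values):
    """sigma = sqrt( sum((x - mu)^2) / N ), assuming division by the population size N."""
    n = len(values)
    mu = sum(values) / n                          # population mean
    squared_devs = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_devs) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, mean 5
print(population_std(data))      # 2.0
```

Python's built-in `statistics.pstdev` computes the same quantity.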
The Empirical Rule
If a distribution is roughly bell-shaped, then…
Approximately 99.7% of the data fall within 3 standard deviations of the mean.
Approximately 95% of the data fall within 2 standard deviations of the mean.
Approximately 68% of the data fall within 1 standard deviation of the mean.
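To make the rule concrete, here is a sketch that turns a mean and standard deviation into the three intervals. The helper name and the numbers (mean 100, standard deviation 15) are hypothetical, and the percentages are approximations that apply only to roughly bell-shaped data.

```python
def empirical_intervals(mean, sd):
    """Return {approximate percent: (low, high)} for 1, 2 and 3 standard deviations."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in [(1, 68), (2, 95), (3, 99.7)]}

# Hypothetical bell-shaped variable with mean 100 and standard deviation 15
for pct, (low, high) in empirical_intervals(100, 15).items():
    print(f"about {pct}% of observations fall between {low} and {high}")
```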
modal class
The class of data that has the highest frequency
z-score
the distance of a data value from the mean, measured in standard deviations. Negative when the value is below the mean, positive when it is above.
z = (data value - mean) / standard deviation
Unitless
The z-scores of a data set have a mean (center) of 0 and a standard deviation (spread) of 1.
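A short sketch of the calculation, with a hypothetical mean of 100 and standard deviation of 15:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

print(z_score(130, 100, 15))  # 2.0  (two standard deviations above the mean)
print(z_score(85, 100, 15))   # -1.0 (one standard deviation below the mean)
```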
kth percentile
The kth percentile, Pk, of a set of data is a value such that k percent of the observations are less than or equal to it.
quartiles
divide data sets into fourths.
To find quartiles…
- Arrange data in ascending order
- Determine the median, M, or Q2
- Divide the data set into halves (below M and above M)
Q1 is the median of the bottom half. Q3 is the median of the top half.
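The steps above can be sketched in Python as follows. This assumes that when the number of observations is odd, the median itself is left out of both halves. Software packages often use other conventions and may give slightly different quartiles. The function name and data are illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)            # arrange data in ascending order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # observations below the median position
    upper = ordered[half + n % 2:]      # skips the middle value when n is odd
    return median(lower), median(ordered), median(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13]))      # (3, 7, 11)
print(quartiles([1, 3, 5, 7, 9, 11, 13, 15]))  # (4.0, 8.0, 12.0)
```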
Interquartile range IQR
the range of the middle 50% of the observations.
IQR = Q3 - Q1
outliers
extreme observations in the data set.
(Median and IQR are resistant to outliers. Mean and Standard Deviation are NOT resistant to outliers.)
Fences
Fences are a way to check for outliers using quartiles.
Lower fence = Q1 - 1.5(IQR)
Upper fence = Q3 + 1.5(IQR)
A data value below the lower fence or above the upper fence is considered an outlier.
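A sketch of the fence check in Python. The helper names and data are made up, and Q1 and Q3 are passed in directly so the example does not depend on any particular quartile method.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Values that fall below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

data = [1, 3, 5, 7, 9, 11, 40]
# For this data Q1 = 3 and Q3 = 11 when the halves exclude the median (7)
print(fences(3, 11))          # (-9.0, 23.0)
print(outliers(data, 3, 11))  # [40]
```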
Five-Number Summary
Minimum, Q1, Q2 (median M), Q3, Maximum
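A minimal sketch that collects the five numbers, assuming quartiles are found as medians of the lower and upper halves with the overall median excluded when the number of observations is odd. The function name and data are illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])              # median of the bottom half
    q3 = median(ordered[half + n % 2:])      # median of the top half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

print(five_number_summary([40, 1, 9, 3, 11, 5, 7]))  # (1, 3, 7, 11, 40)
```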