HW3 CH2 - Measures of Center, Variation, 5-Number Summary, Box Plots Flashcards
Define the measure of center
Descriptive measure that reveals the center or most typical values of a data set
What is a sample mean?
sum of all values divided by the total number of observations in the data set
how do you obtain the sample mean?
add up all the data values and divide by the number of values
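A minimal sketch of that calculation; the five data values are made up for illustration:

```python
# Hypothetical sample of five observations (made up for illustration).
data = [4, 8, 15, 16, 23]

# Sample mean: sum of all values divided by the number of observations.
sample_mean = sum(data) / len(data)
print(sample_mean)
```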
what is the symbol for sample mean?
x̄ (x with a bar above it)
what is the symbol for population mean?
μ, the Greek letter “mu”
what is a median?
A number that divides the top 50% of the data from the bottom 50%
how do you find the median?
sort the numbers from least to greatest; with an odd number of values the median is the middle value, with an even number it is (sum of the two middle values)/2
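The odd/even rule can be sketched as a small function. The name `median` and the sample lists are illustrative choices, not part of the flashcards:

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_median = median([7, 1, 5])      # sorted: 1, 5, 7
even_median = median([7, 1, 5, 3])  # sorted: 1, 3, 5, 7
```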
what is mode?
the value that occurs most often in the data set (with frequency > 1)
Is it possible for a data set to have 2 or more modes? (T/F)
True
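A rough sketch of finding every mode under the "frequency > 1" convention used in these cards. The helper name `modes` and the sample data are assumptions for illustration:

```python
from collections import Counter

def modes(values):
    """Return all values tied for the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top < 2:
        # No value occurs more than once, so there is no mode under this convention.
        return []
    return sorted(v for v, c in counts.items() if c == top)

two_modes = modes([1, 2, 2, 3, 3])  # 2 and 3 each occur twice
no_mode = modes([1, 2, 3])          # every value occurs once
```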
what is a resistant measure?
a measure is robust (resistant) if extreme values have little to no influence on its outcome
which is a robust measure, the mean or the median?
median
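One way to see the difference is to add a single extreme value and compare how much each measure moves. The data and the extreme value 500 are made up for illustration:

```python
from statistics import mean, median

# Hypothetical data set, then the same data with one extreme value added.
data = [10, 12, 13, 15, 20]
with_outlier = data + [500]

# The mean shifts a lot; the median barely moves.
print(mean(data), mean(with_outlier))
print(median(data), median(with_outlier))
```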
What are measures of variation (dispersion)?
descriptive measures that describe how much variation or “spread” there is in a data set
what is range?
The difference between the largest observation and the smallest observation
what are the disadvantages of range?
- measure is based only on 2 values
- not resistant: highly susceptible to outliers
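A small sketch of both points: the range uses only the largest and smallest values, so one extreme observation changes it drastically. The numbers are hypothetical:

```python
# Hypothetical data set.
data = [10, 12, 13, 15, 20]
data_range = max(data) - min(data)

# Adding one extreme value changes the range completely,
# even though the other observations are unchanged.
with_outlier = data + [500]
range_with_outlier = max(with_outlier) - min(with_outlier)
```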
what is deviation?
The difference between an observation and the mean
what is a sample standard deviation?
Roughly, the typical distance between an observation and the mean (the square root of the sum of squared deviations divided by n - 1)
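A step-by-step sketch of the sample standard deviation, built from the deviations defined above. The data values are made up, and the variable names are illustrative:

```python
import math

# Hypothetical sample of eight observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = sum(data) / n

# Deviation of each observation from the sample mean.
deviations = [x - x_bar for x in data]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_sd = math.sqrt(sample_variance)
print(round(sample_sd, 3))
```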
Is range resistant?
no
Does range show how spread out the data is?
Yes
is standard deviation robust?
no
Why transform data?
changing units, making the shape symmetric, making the relationship between 2 variables linear
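Two of these reasons can be illustrated with a short sketch. The data are hypothetical, and the base-10 logarithm is just one common choice of symmetry-improving transformation, not one named in the cards:

```python
import math
from statistics import mean, median

# Hypothetical right-skewed data: the mean sits far above the median.
skewed = [1, 10, 100, 1000, 10000]

# A log transformation pulls in the long right tail;
# afterwards the mean and median agree (up to rounding).
logged = [math.log10(x) for x in skewed]
print(mean(skewed), median(skewed))
print(mean(logged), median(logged))

# Changing units is a linear transformation, e.g. Fahrenheit to Celsius.
fahrenheit = [32, 50, 212]
celsius = [(f - 32) * 5 / 9 for f in fahrenheit]
```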
define parameter
numerical summary of the population
define statistic
numerical summary of the sample
define quartiles
values that divide the data set into 4 equal parts
What is the interquartile range?
the difference between the third and the first quartiles (Q3 - Q1)
What is the 5 number summary?
It consists of:
1. minimum value
2. first quartile
3. second quartile (median)
4. third quartile
5. maximum value
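A sketch of computing the five values. Textbooks and software differ on how quartiles are calculated; this version assumes the "median of each half" method (the overall median is left out of both halves when n is odd), and the function name and data are illustrative:

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two observations.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary([1, 3, 5, 7, 9, 11, 13, 15])
print(summary)
```

Other quartile conventions (for example, the interpolation methods in `statistics.quantiles`) can give slightly different Q1 and Q3 for the same data.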
What is an outlier?
A value that is distant from other observations in the data set
define a boxplot
a graph that displays the distribution of a data set using the 5 number summary, making outliers easy to see
what advantage does a histogram have over a boxplot?
displays more information about the distribution of a data set
Define Dot plot
a graphical display of data using dots (each dot = one value in the data set), with little or no grouping of values
define stem and leaf plot
a table in which each value is split into a “stem” (leading digit or digits) and a “leaf” (last digit)
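A text-only sketch of building such a table. It assumes non-negative whole numbers and takes the last digit as the leaf; the function name and data are made up for illustration:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (all digits but the last) and leaf (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf([12, 15, 21, 24, 24, 37])
for stem, leaves in plot.items():
    # One row per stem, with its leaves listed in increasing order.
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```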
What are the advantages of stem-and-leaf plots and dotplots?
they display every individual value in the data set
what are the disadvantages of stem-and-leaf plots and dotplots?
When the data set is large they are not informative; use a histogram instead
what is a histogram?
a graph drawn using vertical bars.
bar height = frequency
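A rough text sketch of the counting behind a frequency histogram. The data and the bin width of 2 are assumptions, and the bars are printed sideways for simplicity, so the number of `#` marks plays the role of bar height:

```python
from collections import Counter

# Hypothetical data and an assumed bin width.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
bin_width = 2

# Map each value to the left edge of its bin, then count values per bin.
counts = Counter((x // bin_width) * bin_width for x in data)

for start in sorted(counts):
    # Each bin covers [start, start + bin_width); bar length = frequency.
    print(f"[{start}, {start + bin_width})", "#" * counts[start])
```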
what does a frequency histogram
what do outliers affect?
Mean and standard deviation (not resistant measures)
what are the degrees of freedom?
n - 1, the divisor used in the sample variance
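A short sketch of where n - 1 appears. The deviations from the sample mean always sum to zero, so only n - 1 of them are free to vary. The data are hypothetical; `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by n:

```python
from statistics import pvariance, variance

# Hypothetical sample.
data = [1, 2, 3, 4, 5]
n = len(data)
x_bar = sum(data) / n

# Deviations from the mean sum to zero, leaving n - 1 degrees of freedom.
deviations = [x - x_bar for x in data]
sum_sq = sum(d ** 2 for d in deviations)

by_hand_sample = sum_sq / (n - 1)  # divisor n - 1 (sample variance)
by_hand_population = sum_sq / n    # divisor n (population variance)
print(by_hand_sample, by_hand_population)
```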
Name the 4-step process to organize a statistical problem
state: what is the practical question?
plan: what specific statistical operations does this problem call for?
solve: analyze the data with graphs and computations
conclude: give your practical conclusion
The mean is a measure of center whereas the standard deviation measures the ____________ of data about the mean.
variability
The line in the box of a boxplot marks where the __________ is.
median
Standard Deviation measures…
variability of data about the mean or the difference between an observation and the mean
what is deviation?
The difference between an observation and the mean: xi - x̄
how do you determine whether an observation is an outlier?
it is an outlier if it falls outside the limits: below the lower limit (Q1 - 1.5 x IQR) or above the upper limit (Q3 + 1.5 x IQR)
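A sketch of the 1.5 x IQR rule: an observation is flagged when it lies below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. The quartiles here assume the "median of each half" method, which may differ from a given textbook or software package; the function names and data are illustrative:

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves method (assumes n >= 2)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return the observations outside the lower and upper limits."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [v for v in values if v < lower_limit or v > upper_limit]

# Hypothetical data with one suspiciously large value.
data = [2, 4, 5, 6, 7, 8, 9, 40]
outliers = find_outliers(data)
print(outliers)
```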
How do you find the interquartile range?
Q3 - Q1 = IQR