Chapter 1 - Exploring Data Flashcards
displays the counts of stations in each format category
frequency table
The distribution of a categorical variable lists the categories and gives either the ____ or the ____ of individuals who fall within each category
count; percent
shows the percentage of stations in each format category
relative frequency table
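A minimal Python sketch of how a frequency table and a relative frequency table could be built from raw categorical data. The station formats below are made-up illustration data, and the function names are hypothetical, not taken from the chapter.

```python
from collections import Counter

# Hypothetical data: the format of each of 10 radio stations.
formats = ["rock", "news", "rock", "country", "news",
           "rock", "country", "rock", "news", "rock"]

def frequency_table(values):
    """Return {category: count of individuals in that category}."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Return {category: percent of all individuals in that category}."""
    total = len(values)
    return {cat: 100 * n / total for cat, n in Counter(values).items()}

freq = frequency_table(formats)
rel_freq = relative_frequency_table(formats)

# Print both tables side by side: category, count, percent.
for cat in freq:
    print(f"{cat:8s} {freq[cat]:3d} {rel_freq[cat]:5.1f}%")
```

The counts always sum to the number of individuals, and the percents sum to 100 (up to rounding).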
____ or ____ are used to display the distribution of a categorical variable more vividly
pie chart or bar graph
distribution of values of one categorical variable among all individuals described by a two-way table
marginal distribution
describes the values of one categorical variable among individuals who have a specific value of another variable
conditional distribution
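A small Python sketch contrasting a marginal distribution with a conditional distribution. The two-way table of counts is invented for illustration, and the helper names are hypothetical.

```python
# Hypothetical two-way table of counts: rows are gender, columns are opinion.
table = {
    "female": {"yes": 30, "no": 20},
    "male":   {"yes": 10, "no": 40},
}

def marginal_distribution(table):
    """Percent of ALL individuals that fall in each column category."""
    total = sum(sum(row.values()) for row in table.values())
    columns = next(iter(table.values())).keys()
    return {c: 100 * sum(row[c] for row in table.values()) / total
            for c in columns}

def conditional_distribution(table, row_value):
    """Percent in each column category among individuals in ONE row."""
    row = table[row_value]
    row_total = sum(row.values())
    return {c: 100 * n / row_total for c, n in row.items()}

print(marginal_distribution(table))               # {'yes': 40.0, 'no': 60.0}
print(conditional_distribution(table, "female"))  # {'yes': 60.0, 'no': 40.0}
print(conditional_distribution(table, "male"))    # {'yes': 20.0, 'no': 80.0}
```

Because the conditional distributions for the two rows differ, knowing the row helps predict the column, which is what an association between the variables means.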
knowing the value of one variable helps predict the value of the other
association
the count of individuals that fall within each category of a categorical variable
frequency
percent of individuals that fall within each category
relative frequency
used to summarize large amounts of information by grouping outcomes into categories
two-way table
right side of the graph is much longer than the left side
skewed to the right
left side of the graph is much longer than the right side
skewed to the left
when a graph has a single peak it is
unimodal
when a graph has two clear peaks
bimodal
graphical display for small data sets that gives the shape of the distribution while keeping the numerical values in the graph
stemplot
a graph that groups nearby values into classes to show the distribution more clearly
histogram
notation that refers to the mean of a sample
x-bar (x̄)
notation that refers to the mean of a population
mu (μ)
midpoint of a distribution
median
measures the range of the middle 50% of the data
IQR (interquartile range)
summary of a distribution consisting of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest
five-number summary
______ can be constructed from the five-number summary
box plots
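A Python sketch of the five-number summary and the IQR. It assumes the quartile convention common in introductory courses (Q1 is the median of the lower half, Q3 the median of the upper half, with the overall median left out when the number of observations is odd). Software packages often use other conventions and can give slightly different quartiles. The data and function name are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); assumes at least two observations."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # skip the median itself when n is odd
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

data = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]  # made-up observations
summary = five_number_summary(data)
iqr = summary[3] - summary[1]                   # IQR = Q3 - Q1

print(summary)  # (5, 10, 18.5, 29, 33)
print(iqr)      # 19
```

These five values are exactly what a box plot draws: the box runs from Q1 to Q3 with a line at the median.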
measures the typical distance of values in a distribution from the mean
standard deviation
the average of the squared deviations from the mean; taking its square root gives the standard deviation
variance
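A short Python sketch of how variance and standard deviation relate. It assumes the sample convention of dividing the sum of squared deviations by n - 1. The data and function names are made up for illustration.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed convention)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_standard_deviation(data):
    """Square root of the variance, in the same units as the data."""
    return sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
print(round(sample_variance(data), 3))
print(round(sample_standard_deviation(data), 3))
```

Here the squared deviations sum to 32, so the variance is 32/7 and the standard deviation is its square root.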
use ____ and ____ only for reasonably symmetric distributions that don’t have outliers
mean and standard deviation
use ____ and ____ for describing a skewed distribution or a distribution with strong outliers
median and IQR
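A brief Python sketch of why the median is preferred when there are strong outliers: replacing one value with an outlier moves the mean a lot but leaves the median unchanged. Both data sets are invented for illustration.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]      # hypothetical, roughly symmetric data
with_outlier = [1, 2, 3, 4, 50]  # same data with one strong outlier

print(mean(symmetric), median(symmetric))        # mean and median agree
print(mean(with_outlier), median(with_outlier))  # mean is pulled toward the outlier
```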
Four-step process to organize a statistics problem
state, plan, do, conclude