Summer Work Vocab Flashcards
individuals
the objects described by a set of data. Individuals may be people, animals, or things.
variable
any characteristic of an individual. A variable can take different values for different individuals.
categorical variable
places an individual into one of several groups or categories.
quantitative variable
takes numerical values for which it makes sense to find an average.
discrete variable
a quantitative variable that can take only specific, separate values (such as counts), not every value between its minimum value and its maximum value
continuous variable
a quantitative variable that can take on any value between its minimum value and its maximum value
univariate data
data from a study that looks at only one variable
bivariate data
data from a study that examines the relationship between two variables
population
the total set of observations that can be made
sample
a set of observations drawn from a population
census
a study that obtains data from every member of a population
distribution
what values the variable takes and how often it takes these values.
inference
drawing conclusions that go beyond the data at hand.
frequency table
a table that shows frequency counts for a categorical variable
relative frequency table
a table that shows relative frequencies for different categories of a categorical variable
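A minimal sketch of how a frequency table and a relative frequency table could be built from raw categorical data. The `colors` list is hypothetical data invented for illustration.

```python
from collections import Counter

# Hypothetical sample: favorite color reported by ten students.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "red", "blue", "green", "blue"]

# Frequency table: the count of individuals in each category.
frequency = Counter(colors)

# Relative frequency table: each count divided by the total number of individuals.
total = sum(frequency.values())
relative_frequency = {category: count / total
                      for category, count in frequency.items()}

for category in frequency:
    print(f"{category:<6} {frequency[category]:>3} {relative_frequency[category]:>6.0%}")
```

The relative frequencies should add up to 1 (that is, 100%), apart from small rounding differences.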
roundoff error
the difference between the result produced by a given algorithm using exact arithmetic and the result produced by the same algorithm using finite-precision, rounded arithmetic
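One way to see roundoff error is to compare the same sum computed in floating-point (finite-precision) arithmetic and in exact rational arithmetic. This is only a sketch; the numbers 0.1 and 0.2 are chosen for illustration.

```python
from fractions import Fraction

# Finite-precision arithmetic: 0.1 and 0.2 are stored only approximately.
float_sum = 0.1 + 0.2

# Exact arithmetic with rational numbers.
exact_sum = Fraction(1, 10) + Fraction(2, 10)

# Roundoff error: the gap between the floating-point result and the exact result.
# Fraction(float_sum) converts the stored float to its exact rational value.
roundoff_error = Fraction(float_sum) - exact_sum

print(float_sum == 0.3)        # False
print(float(roundoff_error))   # a tiny nonzero number
```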
pie chart
shows the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories. A pie chart must include all the categories that make up a whole.
bar graph
represents each category as a bar. The bar heights show the category counts or percents.
two-way table
organizes data about two categorical variables measured for the same set of individuals.
marginal distribution
the distribution of values of one categorical variable in a two-way table among all individuals described by the table.
conditional distribution
describes the values of one variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
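The sketch below computes a marginal distribution and the conditional distributions from a small two-way table. The table, its variable names (`gender`, `opinion`), and its counts are hypothetical.

```python
# Hypothetical two-way table of counts: rows are gender, columns are opinion.
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}
opinions = ["yes", "no"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of opinion: column totals divided by the grand total.
marginal_opinion = {
    opinion: sum(row[opinion] for row in table.values()) / grand_total
    for opinion in opinions
}

# Conditional distribution of opinion for each gender:
# each row's counts divided by that row's own total.
conditional_opinion = {
    gender: {opinion: count / sum(row.values()) for opinion, count in row.items()}
    for gender, row in table.items()
}

print(marginal_opinion)
print(conditional_opinion)
```

In this made-up table the conditional distributions differ between the two rows, so knowing one variable helps predict the other.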
segmented bar graph
a stacked bar chart in which each bar totals 100% and is divided into segments showing the percent in each category
side-by-side bar graph
a bar chart in which the bars for the categories being compared are placed next to each other rather than stacked
association
knowing the value of one variable helps predict the value of the other.
Simpson’s paradox
a trend appears in several different groups of data but disappears or reverses when these groups are combined
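A small numeric sketch of the reversal. The counts are invented for illustration: treatment A does better than B inside each group, yet worse once the groups are combined.

```python
# Hypothetical (successes, attempts) for treatments A and B within two groups.
data = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (20, 100), "B": (1, 10)},
}

def rate(successes, attempts):
    """Success rate as a proportion."""
    return successes / attempts

# Success rate of each treatment within each group.
within = {
    group: {t: rate(*counts) for t, counts in treatments.items()}
    for group, treatments in data.items()
}

# Combine the groups by adding successes and attempts for each treatment.
combined = {}
for t in ("A", "B"):
    successes = sum(data[g][t][0] for g in data)
    attempts = sum(data[g][t][1] for g in data)
    combined[t] = rate(successes, attempts)

print(within)    # A beats B in both groups
print(combined)  # B beats A overall
```

The reversal happens because most of A's attempts fall in the group where both treatments have low success rates.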
dotplot
a graph where each data value is shown as a dot above its location on a number line.
shape
describes the visual pattern of a distribution
mode
most common value
center
midpoint of a set of values
spread
how much the data varies
range
the difference between the largest and smallest values; a simple measure of how spread out the data are
outlier
an individual value that falls outside the overall pattern.
symmetric
a distribution whose graph has right and left sides that are approximately mirror images of each other.
skewed right
the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
skewed left
the left side of the graph is much longer than the right side.
unimodal
a distribution having a single peak
bimodal
a distribution having 2 peaks
multimodal
distributions with more than 2 peaks
stemplot
separates each observation into a stem and a one-digit leaf.
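As a rough sketch, a stemplot for two-digit values can be built by using the tens digit as the stem and the ones digit as the leaf. The `values` list is hypothetical.

```python
from collections import defaultdict

# Hypothetical two-digit data values (for example, test scores).
values = [52, 67, 61, 75, 78, 83, 70, 79, 91, 68]

# Stem = tens digit, leaf = ones digit; sorting first keeps leaves in order.
stems = defaultdict(list)
for value in sorted(values):
    stems[value // 10].append(value % 10)

# List every stem from smallest to largest, including stems with no leaves.
lines = []
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    lines.append(f"{stem} | {leaves}")

print("\n".join(lines))
# 5 | 2
# 6 | 178
# 7 | 0589
# 8 | 3
# 9 | 1
```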
splitting stems
dividing each stem in a stemplot into two or more stems (for example, one for leaves 0 to 4 and one for leaves 5 to 9) to spread out the display
back-to-back stemplots
a method for comparing two data distributions by attaching two sets of ‘leaves’ to the same ‘stem’ in a stemplot
histogram
plots the counts (frequencies) or percents (relative frequencies) of values in equal-width classes.
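The counting step behind a histogram can be sketched as follows. The data, the class width of 10, and the starting point of 0 are all assumptions made for illustration.

```python
# Hypothetical data values.
values = [3, 7, 8, 12, 13, 14, 18, 21, 22, 29]

class_width = 10  # assumed equal width for every class
start = 0         # assumed lower bound of the first class

# Count how many values fall in each class [lower, lower + class_width).
counts = {}
for value in values:
    lower = start + ((value - start) // class_width) * class_width
    counts[lower] = counts.get(lower, 0) + 1

# Text version of the histogram: one star per observation.
for lower in sorted(counts):
    print(f"[{lower}, {lower + class_width}): {'*' * counts[lower]}")
```

Dividing each count by `len(values)` would give the relative frequencies instead.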
mean
measure of center found by adding a set of values and dividing by the number of observations.
median
The median is the midpoint of a distribution, the number such that about half the observations are smaller and about half are larger.
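A short sketch comparing the mean and the median on a small, hypothetical data set that contains one unusually large value.

```python
import statistics

# Hypothetical data with one unusually large value.
values = [1, 2, 3, 4, 100]

mean_value = statistics.mean(values)      # (1 + 2 + 3 + 4 + 100) / 5
median_value = statistics.median(values)  # middle value of the ordered data

print(mean_value, median_value)
```

Here the single large value pulls the mean far above the median, which stays at the middle observation.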
interquartile range (IQR)
The interquartile range (IQR) measures the range of the middle 50% of the data: the third quartile minus the first quartile.
five-number summary
The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
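The following sketch computes a five-number summary and the IQR. Quartile conventions vary between textbooks and software; this version assumes the quartiles are the medians of the lower and upper halves of the ordered data, leaving out the overall median when the number of observations is odd. The function name `five_number_summary` and the data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the overall median when the count is odd.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical data values.
values = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]

minimum, q1, med, q3, maximum = five_number_summary(values)
iqr = q3 - q1

print(minimum, q1, med, q3, maximum)  # 5 10 18.5 29 33
print(iqr)                            # 19
```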
boxplot
a graph of the five-number summary of a distribution
standard deviation
measures the typical distance of the values in a distribution from the mean. It is calculated by finding an average of the squared deviations and then taking the square root.
variance
the square of the standard deviation; an average of the squared deviations from the mean
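A minimal sketch of the calculation, step by step. It assumes the sample version of the formulas, where the "average" of the squared deviations divides by n - 1 rather than n. The data are hypothetical.

```python
import math
import statistics

# Hypothetical data values.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = sum(values) / len(values)

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Assumption: sample variance, dividing by n - 1.
variance = sum(squared_deviations) / (len(values) - 1)

# Standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

print(variance, standard_deviation)
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 convention, so they can serve as a check on the hand calculation.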