Data Vocabulary Flashcards
Association
A connection between data values.
Bivariate data
Pairs of linked numerical observations. Example: a list of heights and weights for each player on a football team.
Box Plot
A method of visually displaying a distribution of data values by using the median, quartiles, and extremes of the data set. A box shows the middle 50% of the data.
Box-and-Whisker Plot
A diagram that shows the five-number summary of a distribution. The five-number summary includes the minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum. In a modified box plot, outliers can also be shown.
Categorical Variables
Categorical variables take on values that are names or labels. Examples: the color of a ball (e.g., red, green, blue), gender (male or female), and year in school (freshman, sophomore, junior, senior). Such data cannot be averaged or shown on a scatter plot because the values have no numerical meaning.
Center
Measures of center are the summary measures used to describe the most “typical” value in a set of data. The two most common measures of center are the median and the mean.
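To make the two measures concrete, here is a minimal sketch using Python's standard statistics module. The data set is made up for illustration; it includes one unusually large value to show that the mean and the median can differ.

```python
from statistics import mean, median

# Hypothetical data set (not from the flashcards); 40 is an unusually large value.
data = [2, 3, 3, 4, 5, 6, 40]

data_mean = mean(data)      # sum of the values divided by how many there are: 63 / 7
data_median = median(data)  # middle value of the sorted list

print(data_mean)    # 9
print(data_median)  # 4
```

In this example the single large value pulls the mean up to 9 while the median stays at 4.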
Conditional Frequencies
The relative frequencies in the body of a two-way frequency table.
Correlation Coefficient
A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.
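The definition above can be sketched directly in code. The function name pearson_r and the height and weight values are assumptions made for illustration; the data were chosen to lie exactly on a line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance (divide by n - 1).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations (divide by n - 1 inside the square root).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical bivariate data: heights (inches) and weights (pounds).
heights = [60, 62, 64, 66, 68]
weights = [120, 125, 130, 135, 140]

r = pearson_r(heights, weights)
print(r)  # approximately 1.0, because these points lie exactly on an increasing line
```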
Dot plot
A method of visually displaying a distribution of data values where each data value is shown as a dot or mark above a number line.
First Quartile
(Q1) The “middle value” in the lower half of the rank-ordered data.
Five Number Summary
Minimum, lower quartile, median, upper quartile, maximum.
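A minimal sketch of computing the five values, assuming the "median of each half" convention for quartiles (other quartile conventions exist and can give slightly different results). The function name and the data set are illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumes at least two values. Quartiles are the medians of the lower and
    upper halves; when the count is odd, the middle value belongs to neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary([1, 3, 6, 7, 10, 12, 14, 15, 22, 120])
print(summary)  # (1, 6, 11.0, 15, 120)
```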
Histogram
Graphical display that subdivides the data into class intervals and uses a rectangle to show the frequency of observations in each interval. For example, you might use intervals of 0-3, 4-7, 8-11, and 12-15.
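The counting step behind a histogram can be sketched as follows, using the intervals 0-3, 4-7, 8-11, and 12-15 from the card. The function name, its parameters, and the scores are assumptions for illustration, and the sketch assumes whole-number data.

```python
def interval_counts(values, start=0, width=4, intervals=4):
    """Count how many whole-number values fall in each class interval."""
    counts = [0] * intervals
    for value in values:
        index = (value - start) // width  # which interval this value belongs to
        if 0 <= index < intervals:        # ignore values outside the intervals
            counts[index] += 1
    return counts

# Hypothetical data set.
scores = [0, 2, 3, 5, 5, 7, 8, 10, 12, 15, 15]
counts = interval_counts(scores)

# Print a simple text histogram: one mark per observation.
for i, count in enumerate(counts):
    low = i * 4
    print(f"{low}-{low + 3}: " + "#" * count)
```

Each rectangle in the finished histogram would have a height equal to the count for its interval, here 3, 3, 2, and 3.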
Interquartile Range
A measure of variation in a set of numerical data. The interquartile range is the distance between the first and third quartiles of the data set. Example: For the data set {1, 3, 6, 7, 10, 12, 14, 15, 22, 120}, the interquartile range is 15 – 6 = 9.
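The arithmetic in this example can be checked with a short sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data, which works directly here because the data set has an even number of values.

```python
from statistics import median

data = sorted([1, 3, 6, 7, 10, 12, 14, 15, 22, 120])
half = len(data) // 2

q1 = median(data[:half])  # median of the lower half: 1, 3, 6, 7, 10
q3 = median(data[half:])  # median of the upper half: 12, 14, 15, 22, 120
iqr = q3 - q1

print(q1, q3, iqr)  # 6 15 9
```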
Joint Frequencies
Entries in the body of a two-way frequency table.
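As an illustration, joint frequencies can be tallied by counting how often each pair of categories occurs together. The survey responses below are made up; each pair is (year in school, favorite ball color).

```python
from collections import Counter

# Hypothetical survey responses: (year in school, favorite ball color).
responses = [
    ("freshman", "red"),
    ("freshman", "blue"),
    ("junior", "red"),
    ("junior", "red"),
    ("senior", "green"),
    ("freshman", "red"),
]

# Each count is one entry in the body of the two-way frequency table.
joint = Counter(responses)

print(joint[("freshman", "red")])  # 2
print(joint[("senior", "green")])  # 1
```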
Line of Best Fit
A straight line that best represents the data on a scatter plot (also called a trend line or regression line). This line may pass through some of the points, none of the points, or all of the points. Remind students that an exponential model will produce a curved fit rather than a straight line.
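One common way to make "best represents" precise is the least-squares criterion, which is an assumption here since the card does not name a method. A minimal sketch with a made-up data set and a hypothetical function name:

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical scatter plot data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

For this data the fitted line passes through (3, 4), the point of means, but misses the other four data points.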