Chapter 1 Vocab Flashcards
Data Analysis
The process of organizing, displaying, summarizing, and asking questions about data.
Individuals
Objects described by a set of data. Individuals may be people, animals or things.
Variable
A characteristic of an individual. A variable can take different values for different individuals.
Categorical Variable
Places an individual into one of several groups or categories. (Variables that take on values that are names or labels)
Quantitative Variable
A variable measured on a numeric scale. (Takes numerical values for which it makes sense to find an average.)
Distribution
The pattern of variation of a variable. Tells us what values the variable takes and how often it takes each value.
Inference
Draws conclusions that go beyond the data at hand.
Frequency Table
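Displays the counts (frequencies) of individuals in each category, e.g. the number of radio stations in each format category.
Relative Frequency Table
Shows the percents (relative frequencies) of individuals in each category, e.g. the percent of radio stations in each format category.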
Roundoff Error
The difference between an approximation of a number used in computation and its exact value. (In a relative frequency table, rounded percents may not add to exactly 100%.)
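As a minimal sketch of a frequency table, a relative frequency table, and roundoff error, the Python below tallies a made-up list of station formats. The data and the names `formats`, `counts`, and `percents` are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical data: the format of each of six radio stations.
formats = ["rock", "news", "country", "rock", "news", "country"]

# Frequency table: the count of individuals in each category.
counts = Counter(formats)

# Relative frequency table: the percent of individuals in each category,
# rounded to one decimal place.
total = sum(counts.values())
percents = {category: round(100 * count / total, 1)
            for category, count in counts.items()}

# Roundoff error: the rounded percents need not add to exactly 100.
percent_total = sum(percents.values())

for category in counts:
    print(category, counts[category], percents[category])
print("sum of rounded percents:", round(percent_total, 1))
```

Each category here holds one third of the stations, so each rounded percent is 33.3 and the three together fall just short of 100.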
Pie Chart
Shows the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories.
Bar Graph
Represents each category as a bar. The bar heights show the category counts or percents.
Two-Way Tables
A table of counts (frequencies) that describes two categorical variables. The rows give the values of one variable and the columns give the values of the other.
Marginal Distribution
(Of one of the categorical variables in a two-way table) The distribution of values of that variable among all individuals described by the table. Essentially the row or column totals in a two-way table, often expressed as percents of the table total.
Conditional Distribution
Describes the values of a variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
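The sketch below works out marginal and conditional distributions from a small two-way table. The counts, the variable labels, and the names `table`, `row_marginal`, `column_marginal`, and `conditional` are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical two-way table of counts: the rows are one categorical
# variable (gender) and the columns are another (answer to a question).
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the row variable: row totals / table total.
row_marginal = {r: sum(cols.values()) / grand_total
                for r, cols in table.items()}

# Marginal distribution of the column variable: column totals / table total.
column_totals = {}
for cols in table.values():
    for c, count in cols.items():
        column_totals[c] = column_totals.get(c, 0) + count
column_marginal = {c: t / grand_total for c, t in column_totals.items()}

# Conditional distributions of the column variable: one for each value
# of the row variable, using that row's total as the denominator.
conditional = {
    r: {c: count / sum(cols.values()) for c, count in cols.items()}
    for r, cols in table.items()
}

print(row_marginal)
print(column_marginal)
print(conditional)
```

In this made-up table, 60% of the "female" row answered yes but only 20% of the "male" row did. Conditional distributions that differ like this are the usual sign that the two variables may be associated.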
Side by Side Bar Graph
A bar graph that compares a categorical variable across two or more groups. For each category, the bars for the different groups are drawn next to each other along the x-axis in different colors, making it easier to compare the groups.
Segmented Bar Graph
A bar graph that shows the parts of a whole. Each bar represents one group, and the segments within a bar are stacked on top of each other to represent the different categories. Different colors distinguish the parts of the whole bar.
Association
The relation that two variables share. The term “association” is used between two variables when knowing the value of one variable helps predict the other variable’s value.
Dotplot
A graph that displays quantitative data by showing each data value as a dot above its location on the number line.
Overall Pattern
Describes the distribution by the shape, center and spread of the data.
Departures
An individual value that falls outside the overall pattern.
Center
Center is the median and/or mean of the data.
Spread
The spread describes how much the data vary. It can be measured by the range, the interquartile range, or the standard deviation.
Shape
The shape describes the overall form of the distribution as seen in a graph. The four ways to describe shape are whether it is symmetric, how many peaks it has, whether it is skewed to the left or right, and whether it is uniform.
Outlier
A data point that differs significantly from other observations.
Mode
The most common value of a data set.
Symmetric Distribution
When the right and left sides of the graph are approximately mirror images of each other.
Right-Skewed Graph
A graph whose tail is on the right side: the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
Left-Skewed Graph
A graph whose tail is on the left side: the left side of the graph (containing the half of the observations with smaller values) is much longer than the right side.
Unimodal Dot Plot
A dot plot in which there is only one peak
Bimodal Dot Plot
A dot plot in which there are two peaks
Stemplot
A plot where each data value is split into a “leaf” (usually the last digit) and a “stem” (the other digits). Stems are written in a vertical column from the smallest at the top to the largest at the bottom, and no stem is skipped, even if it has no data values. A vertical line is drawn to the right of this column, and each “leaf” is written to the right of the line next to its stem. The leaves on each stem are arranged in numerical order, increasing outward from the stem. (Ex: [5|2 4] represents the values 52 and 54.)
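The construction just described can be sketched in a few lines of Python. The function name `stemplot` is hypothetical, and the sketch assumes non-negative whole numbers whose last digit is the leaf.

```python
def stemplot(values):
    """Return the rows of a simple stemplot as strings like '5|2 4'."""
    ordered = sorted(values)
    smallest_stem = ordered[0] // 10
    largest_stem = ordered[-1] // 10
    rows = []
    # Walk through every stem from smallest to largest so none is skipped.
    for stem in range(smallest_stem, largest_stem + 1):
        # Leaves are already in increasing order because the data are sorted.
        leaves = [str(v % 10) for v in ordered if v // 10 == stem]
        rows.append(str(stem) + "|" + " ".join(leaves))
    return rows

for row in stemplot([52, 54, 71, 48]):
    print(row)
```

With these four made-up values, stem 6 has no leaves but still gets its own row, which keeps the shape of the distribution honest.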
Stem
All digits but the final (ones) digit
Leaf
The final (ones) digit
Splitting Stems
A method used to show more detail in a stemplot, making it easier to identify the shape of the distribution. Each stem is written twice: leaves 0-4 go on the first copy of the stem and leaves 5-9 go on the second.
Back-to-Back Stemplot
A stemplot used for comparing two sets of quantitative data. The two sets share one column of stems: the leaves of one data set are written to the left of the stems, and the leaves of the other are written to the right.
Histogram
A graph of adjacent rectangles in which each rectangle's width is the class interval and its area is proportional to the frequency of observations in that class. Displays the distribution of a quantitative variable.
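Behind every histogram is a count of how many observations fall in each class. Here is a minimal sketch of that counting step, assuming equal-width classes and data no smaller than the chosen starting value; the function name `class_counts` and its parameters are hypothetical.

```python
def class_counts(values, start, width):
    """Count observations in equal-width classes [lower, upper).

    Assumes every value is at least `start`. Returns a list of
    (lower, upper, count) tuples, including classes with a count of 0.
    """
    number_of_classes = int((max(values) - start) // width) + 1
    counts = [0] * number_of_classes
    for v in values:
        counts[int((v - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, count)
            for i, count in enumerate(counts)]

for lower, upper, count in class_counts([1, 2, 2, 3, 7, 8, 12], 0, 5):
    print(lower, "to under", upper, ":", count)
```

Because the classes all have the same width, the bar heights alone are proportional to the frequencies.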
Mean
The average of the data set. Written x̄ (“x bar”), the letter x with a horizontal line above it. Found by adding all of the data points and dividing by the number of points that were added. Formula in compact notation: x̄ = (∑xi) / n
Resistant Measure of Center
A measure of center that is not strongly affected by outliers or other extreme values. If a measure of center is affected by outliers, it isn’t resistant. The median is resistant; the mean is not.
Median
The midpoint of a distribution, the number such that about half the observations are smaller and about half are larger. Arrange all observations in order of size, from smallest to largest. If the number of observations n is odd, the median is the center observation in the ordered list. If the number of observations n is even, the median is the average of the two center observations in the ordered list.
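As a rough illustration of the mean, the median, and resistance, the sketch below computes both measures for a small made-up data set, then again after one value is replaced by an outlier. The function and variable names are assumptions for this example.

```python
def mean(values):
    # Add all of the data points and divide by how many there are.
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)          # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]        # odd n: the center observation
    # even n: the average of the two center observations
    return (ordered[middle - 1] + ordered[middle]) / 2

data = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]       # hypothetical outlier replaces the 6

print(mean(data), median(data))                  # 4.0 4
print(mean(with_outlier), median(with_outlier))  # 14.8 4
```

The outlier pulls the mean from 4 up to 14.8, while the median stays at 4.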
Range
Shows the full spread of the data. Calculated by subtracting the smallest number in the data set from the largest number in the data set. It is not resistant: a single outlier can change the range greatly.
First Quartile
The median of the observations that are to the left of the median in the ordered list.
Third Quartile
The median of the observations that are to the right of the median in the ordered list.
Interquartile Range
(Q3 − Q1) A measure of spread equal to the difference between the third and first quartiles (the 75th and 25th percentiles).
1.5 Outlier Rule
This rule uses the first and third quartile values as well as the IQR to identify outliers in the data set. The rule is: if a data point is less than (Q1 − 1.5 × IQR) or more than (Q3 + 1.5 × IQR), it is an outlier.
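A sketch of the quartiles, the IQR, and the 1.5 outlier rule follows. It uses the definitions given on these cards (Q1 and Q3 are the medians of the observations to the left and right of the median); note that other software may compute quartiles slightly differently. The function names and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # left of the median
    upper = ordered[len(ordered) - half:]   # right of the median
    return median(lower), median(upper)

def outliers(values):
    """Return the values flagged by the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]
print(quartiles(data))   # (3.5, 8.5)
print(outliers(data))    # [30]
```

Here the IQR is 8.5 − 3.5 = 5, so the cutoffs are 3.5 − 7.5 = −4 and 8.5 + 7.5 = 16, and only 30 lies outside them.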
Five Number Summary
A set of 5 numbers that show the overall spread and diversity of a data set. The 5 numbers are: the minimum data point, the first quartile, the median, the third quartile, and the maximum data point (in that order).
Boxplot
A graph that is formed by the 5 number summary, creating a visual on a number line that shows the quarters of the data set. A boxplot is arranged above a number line, with a central box drawn from the first quartile (Q1) to the third quartile (Q3), and a line inside the box to mark the median. Lines (called whiskers) extend from the box out to the smallest and largest observations that are not outliers. Outliers are marked with a special symbol like an asterisk.
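The numbers needed to draw a boxplot can be gathered as in the sketch below: the five number summary, the whisker endpoints, and any outliers to mark separately. The function name `boxplot_summary`, its dictionary keys, and the data are assumptions for illustration, and the quartiles follow the median-of-each-half definition used on these cards.

```python
from statistics import median

def boxplot_summary(values):
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Observations that are not outliers under the 1.5 x IQR rule.
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        # minimum, Q1, median, Q3, maximum (in that order)
        "five_number": (ordered[0], q1, median(ordered), q3, ordered[-1]),
        # whiskers reach the smallest and largest non-outliers
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }

summary = boxplot_summary([1, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this made-up data set the maximum in the five number summary is 30, but the upper whisker stops at 9 because 30 is flagged as an outlier.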
Deviation
The difference between a data point and the mean of the set: (xi − x̄). It is positive for points above the mean and negative for points below it.
Variance
The average of the squared deviations from the mean, dividing by n − 1 rather than n for sample data. (For a random variable, it is the expectation of the squared deviation from its mean.) Formula: s² = ∑(xi − x̄)² / (n − 1). Variance is also written s²ₓ.
Standard Deviation
The “typical” distance of the values in the data set from the mean. Formula: standard deviation = √variance. Standard deviation is also written sₓ.
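To tie deviation, variance, and standard deviation together, here is a minimal sketch that follows the sample formulas (dividing by n − 1). The function names and the data are hypothetical.

```python
import math

def deviations(values):
    # Deviation of each point from the mean: xi - x_bar.
    x_bar = sum(values) / len(values)
    return [x - x_bar for x in values]

def sample_variance(values):
    # Sum of the squared deviations, divided by n - 1.
    squared = [d ** 2 for d in deviations(values)]
    return sum(squared) / (len(values) - 1)

def sample_standard_deviation(values):
    # Standard deviation is the square root of the variance.
    return math.sqrt(sample_variance(values))

data = [1, 2, 3, 4, 5]
print(deviations(data))                 # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sample_variance(data))            # 2.5
print(sample_standard_deviation(data))
```

The deviations themselves add up to zero, which is why they are squared before being averaged.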