Chapters Two-Eight: Vocabulary Flashcards
The What
The variables (or labels)
The Who
The subject(s)
Quantitative Variables
Data that is measured in units
Area Principle
The area occupied by a part of the graph should correspond to the magnitude of the value it represents.
Marginal Distribution
In a contingency table, the frequency distribution of one of the variables alone, read from the totals in a margin of the table.
Conditional Distribution
The distribution of one variable for just those cases that satisfy a condition on another variable.
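As a minimal sketch of the marginal and conditional distribution cards, the snippet below works from a small two-way table of counts. The table, its category names, and the variable names are made up for illustration.

```python
# Hypothetical counts: rows are class year, columns are preferred drink.
table = {
    "junior": {"coffee": 30, "tea": 10},
    "senior": {"coffee": 20, "tea": 40},
}
drinks = ["coffee", "tea"]

# Total number of cases in the whole table.
total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of drink: column totals as a share of all cases.
marginal_drink = {
    d: sum(row[d] for row in table.values()) / total for d in drinks
}

# Conditional distribution of drink for just the cases where year is "senior".
senior_total = sum(table["senior"].values())
conditional_drink_senior = {
    d: table["senior"][d] / senior_total for d in drinks
}

print("marginal:", marginal_drink)
print("conditional on senior:", conditional_drink_senior)
```

The marginal distribution ignores class year entirely, while the conditional distribution divides by the senior row total only.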
Distribution
1) gives the possible values of a variable
2) relative frequency of each value
Area Principle
Each data value should be represented by the same amount of area.
Categorical data condition
A check made before displaying and describing categorical data: the data are counts or percentages of individuals in categories.
Contingency Table
Displays counts and, sometimes, percentages of individuals falling into named categories of two or more variables.
Simpson’s Paradox
When averages taken within separate groups appear to contradict the overall average taken across the combined groups.
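One way to see the paradox is with invented success counts for two methods, A and B, in two groups of very different sizes. All numbers and names below are hypothetical.

```python
# Hypothetical (successes, attempts) for two methods within two groups.
data = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, attempts):
    """Success rate as a proportion."""
    return successes / attempts

# Within each group, method A has the higher success rate.
for group, methods in data.items():
    print(group, {m: round(rate(*sa), 2) for m, sa in methods.items()})

# Pooled over both groups, method B has the higher success rate.
overall = {}
for m in ("A", "B"):
    successes = sum(data[g][m][0] for g in data)
    attempts = sum(data[g][m][1] for g in data)
    overall[m] = rate(successes, attempts)

print("overall", {m: round(r, 2) for m, r in overall.items()})
```

The reversal comes from the unequal group sizes: most of A's attempts fall in the group where both methods do poorly.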
How do you describe the shape of a histogram?
1) distinguish how many modes
2) determine whether it’s symmetrical or skewed
3) find any outliers or gaps
Mode
The hump or high point in a histogram. A distribution with one mode is unimodal; one with two modes is bimodal.
Uniform
Distribution that is basically even.
Median
The middle value of the ordered data. Usually reported together with the interquartile range when the distribution is skewed.
Range
The difference between the maximum and minimum.
Interquartile Range
Difference between the first and third quartiles.
What does the 5-Number Summary describe?
The minimum, Q1, the median, Q3, and the maximum
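The median, range, IQR, and five-number summary cards can be tried out on a short list of values. This sketch assumes the "median of each half" rule for quartiles; textbooks and software use several slightly different quartile rules, so other tools may give different Q1 and Q3. The data and function name are made up.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values below the median position
    upper = data[(n + 1) // 2 :]  # values above the median position
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

scores = [1, 3, 4, 6, 7, 8, 10, 12]  # made-up data
low, q1, med, q3, high = five_number_summary(scores)

data_range = high - low  # maximum minus minimum
iqr = q3 - q1            # third quartile minus first quartile

print(low, q1, med, q3, high)
print("range:", data_range, "IQR:", iqr)
```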
Mean
Found by summing all of the data values and dividing by the count.
Variance
The sum of squared deviations from the mean, divided by n-1
Standard Deviation
Usually reported with the mean. It is the square root of the variance.
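The mean, variance, and standard deviation cards translate directly into a few lines of arithmetic. The sketch below uses made-up data and checks the hand computation against Python's standard `statistics` module, whose `variance` and `stdev` functions also divide by n - 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
n = len(data)

# Mean: sum of the values divided by the count.
mean = sum(data) / n

# Variance: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(y - mean) ** 2 for y in data]
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)

# Cross-check against the standard library's sample statistics.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))
```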
Outlier
Any point more than 1.5 IQR from either end of the box in a box plot.
Far Outlier
Any point more than 3.0 IQR from either end of the box in a box plot.
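The two outlier rules can be sketched as a small classifier. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data, which is only one of several quartile conventions. The data and the function name are hypothetical.

```python
import statistics

def classify_outliers(values):
    """Label points beyond 1.5 IQR (outlier) or 3.0 IQR (far outlier) from the box."""
    data = sorted(values)
    n = len(data)
    q1 = statistics.median(data[: n // 2])
    q3 = statistics.median(data[(n + 1) // 2 :])
    iqr = q3 - q1

    labels = {}
    for v in data:
        # Distance beyond the nearer end of the box; negative when inside it.
        distance = max(q1 - v, v - q3)
        if distance > 3.0 * iqr:
            labels[v] = "far outlier"
        elif distance > 1.5 * iqr:
            labels[v] = "outlier"
    return labels

values = [2, 4, 5, 6, 7, 8, 9, 10, 20, 40]  # made-up data
print(classify_outliers(values))
```

Here Q1 is 5 and Q3 is 10, so the IQR is 5: the value 20 lies 10 units past the box (more than 7.5), and 40 lies 30 units past it (more than 15).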
Standardizing
Used to eliminate units. Standardized values can be compared even if the original variables had different units.
How do you find a standardized value?
By subtracting the mean and dividing by the standard deviation.
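A short illustration of why standardizing eliminates units: the same made-up heights are standardized once in centimeters and once in inches, and the standardized values agree. The helper name `z_scores` is an assumption for this sketch.

```python
import statistics

def z_scores(values):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

heights_cm = [150, 160, 170, 180, 190]      # made-up data
heights_in = [h / 2.54 for h in heights_cm]  # same heights in inches

z_cm = z_scores(heights_cm)
z_in = z_scores(heights_in)

print([round(z, 3) for z in z_cm])
print([round(z, 3) for z in z_in])
```

Standardized values always have mean 0 and standard deviation 1, whatever the original units were.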
Shifting
Adding a constant to each data value adds the same constant to the mean, median, and quartiles. It does not change the standard deviation or IQR.
Rescaling
Multiplying each data value by a constant multiplies both the measures of position and the measures of spread by that constant.
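The shifting and rescaling cards can be checked numerically. This sketch uses made-up data, a shift of 10, and a scale factor of 3, all chosen only for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data
shifted = [y + 10 for y in data]   # add a constant to every value
rescaled = [y * 3 for y in data]   # multiply every value by a constant

def summary(values):
    """Mean and median (position) plus standard deviation (spread)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }

for name, values in [("original", data), ("shifted", shifted), ("rescaled", rescaled)]:
    print(name, summary(values))
```

Shifting moves the mean and median by 10 and leaves the standard deviation alone; rescaling multiplies all three by 3.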
Parameter
A numerically valued attribute of a model.
What does a z-score tell?
How many standard deviations a value is from the mean.
Nearly Normal Condition
The shape of the data's distribution is unimodal and symmetric.
What does the association of a plot show?
1) direction
2) form
3) strength
Correlation Coefficient
“r” is a numerical measure of the direction and strength of a linear association.
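One common way to compute r is as the sum of the products of the standardized x- and y-values, divided by n - 1. The sketch below follows that formula on made-up data; the function name `correlation` is an assumption here.

```python
import statistics

def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)
    z_x = [(v - mean_x) / sd_x for v in x]
    z_y = [(v - mean_y) / sd_y for v in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 4))
```

Because r is built from standardized values, it has no units and always lies between -1 and 1.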
Lurking Variable
A variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two.
Predicted Value
The value ŷ found for a given x-value in the data. The points (x, ŷ) all lie exactly on the fitted line.
Residuals
The observed value minus the predicted value: residual = y − ŷ.
Regression to the Mean
Because the correlation is always less than 1.0 in magnitude, each predicted ŷ tends to be fewer standard deviations from its mean than its x was from its mean.
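In standardized units the least-squares prediction is the z-score of x multiplied by r, which makes regression to the mean easy to see. A minimal sketch, with correlation values chosen arbitrarily for illustration:

```python
def predicted_z_y(r, z_x):
    """Predicted z-score of y for a point z_x standard deviations from the mean of x."""
    return r * z_x

z_x = 2.0  # a point two standard deviations above the mean of x
for r in (0.9, 0.5, 0.0, -0.5):
    print(f"r = {r:+.1f}: predicted z_y = {predicted_z_y(r, z_x):+.2f}")
```

Whenever r is between -1 and 1, the predicted value is closer to its mean, in standard deviations, than x was to its own mean.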
Line of Best Fit
ŷ = b0 + b1 • x
b0 = y-intercept
b1 = slope
Slope
Given in “y-units per x-unit.” A change of one unit in x is associated with a change of b1 units in the predicted value ŷ.
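The line of best fit, predicted values, and residuals can be worked through on a tiny data set. This sketch assumes the usual least-squares formulas (slope equals the sum of cross-deviations divided by the sum of squared x-deviations, and the line passes through the point of means). The data and the names `b0` and `b1` are chosen for illustration.

```python
x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
b1 = s_xy / s_xx            # slope, in y-units per x-unit
b0 = mean_y - b1 * mean_x   # y-intercept

# Predicted values lie exactly on the line.
predicted = [b0 + b1 * a for a in x]

# Residual = observed value minus predicted value.
residuals = [obs - pred for obs, pred in zip(y, predicted)]

print("slope:", b1, "intercept:", b0)
print("predicted:", [round(p, 2) for p in predicted])
print("residuals:", [round(e, 2) for e in residuals])
```

For a least-squares line the residuals sum to zero, apart from floating-point rounding.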