Descriptive Statistics Flashcards
Association
Two variables have an association if knowing the value of one variable is useful (to some degree) in predicting the value of the other variable. There are three aspects of this relationship: direction, strength, and form.
Bar Graph
A graph that displays the distribution of a categorical variable.
Binary Categorical Variable
A categorical variable with only two possible categories, for example, left or right.
Bins or Classes
These are the subintervals of equal length that are used in constructing a histogram.
Bivariate Data
Data for which there are two variables for each observation (for example, x and y).
Boxplot
The graph that illustrates the five-number summary. A box is drawn from the lower quartile to the upper quartile, with a line at the median. The whiskers extend from the quartiles to the minimum and maximum.
Categorical Variable
A variable that records a group designation such as gender or type of vehicle.
Causation
A relationship between two variables that goes a step further than correlation, stating that a change in the value of the x variable will cause a change in the value of the y variable.
Consistency
This refers to how variable, or spread out, the values in a dataset are for a quantitative variable.
Correlation Coefficient
A number that measures the degree to which two quantitative variables are associated, generally denoted by r.
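A minimal sketch of how r can be computed, assuming r here is the usual Pearson correlation coefficient (the card does not give a formula). The function name and sample data are illustrative only.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired quantitative data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear, increasing relationship gives r = 1.
print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))
```

Values of r near 1 or -1 indicate a strong linear association; the sign gives the direction.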
Data
The numbers or categories recorded for the observational units in a study.
Direction
One of the three aspects of association between quantitative variables, which refers to whether greater values of one variable tend to occur with greater values of the other variable (positive association) or with smaller values of the other variable (negative association).
Distribution
The pattern of variation of a variable; with a categorical variable, distribution means the variable’s possible categories and the proportion of responses in each.
Dotplot
A display of the distribution of relatively small data sets where each data point is represented by a dot.
Empirical Rule
With mound-shaped, symmetric distributions, approximately 68% of the observations fall within one standard deviation of the mean, approximately 95% of the observations fall within two standard deviations of the mean, and approximately 99.7% fall within three standard deviations of the mean.
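As a rough check on these percentages, the sketch below assumes an exactly normal distribution as the idealized mound-shaped, symmetric case (real data will only match approximately). The helper name is illustrative.

```python
import math

def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```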
Extrapolation
An attempt to predict the response variable for values of the explanatory variable beyond those contained in the data.
Five-number Summary
The minimum value, lower quartile, median, upper quartile, and maximum value.
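A minimal sketch of computing the five-number summary. Quartile conventions vary between textbooks and software; this version assumes the quartiles are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. The function names are illustrative.

```python
def median_of_sorted(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    lower_half = data[: n // 2]          # values below the median position
    upper_half = data[(n + 1) // 2 :]    # values above the median position
    return (
        data[0],
        median_of_sorted(lower_half),
        median_of_sorted(data),
        median_of_sorted(upper_half),
        data[-1],
    )

print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 3, 7, 11, 13)
```

For this sample the interquartile range would be 11 - 3 = 8.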
Frequency or Count
The number of observational units in a subinterval.
Histogram
A graphical display similar to a dotplot or stemplot, but more feasible when displaying very large data sets. Bars are constructed whose heights correspond to the frequency in each subinterval.
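The sketch below tabulates the frequency and relative frequency for each bin, which are the numbers a histogram's bar heights would show. It assumes equal-width bins that include their left endpoint and exclude their right endpoint; that convention, the function name, and the sample data are assumptions for illustration.

```python
def histogram_table(values, start, width, num_bins):
    """Return (bin_start, bin_end, frequency, relative_frequency) for each bin."""
    counts = [0] * num_bins
    for v in values:
        index = int((v - start) // width)  # which bin [start + i*width, start + (i+1)*width)
        if not 0 <= index < num_bins:
            raise ValueError(f"value {v} falls outside the bins")
        counts[index] += 1
    n = len(values)
    return [
        (start + i * width, start + (i + 1) * width, count, count / n)
        for i, count in enumerate(counts)
    ]

for row in histogram_table([1, 2, 2, 3, 4, 6, 7, 9], start=0, width=5, num_bins=2):
    print(row)
# (0, 5, 5, 0.625)
# (5, 10, 3, 0.375)
```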
Influential (Observation)
When removing an observation from a data set substantially changes the least squares regression equation, the observation is considered influential. Typically, observations that have extreme explanatory (x) values (far below or far above the sample mean) have the potential to be influential.
Intercept Coefficient (Y-Intercept)
The predicted value of the response (y) variable when the explanatory (x) variable has a value of 0, when the least squares regression line is used.
Interquartile Range
The difference between the upper quartile and lower quartile.
Least Squares Regression Line
The line that achieves the exact minimum value of the sum of the squared residuals.
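A minimal sketch of fitting this line, using the standard closed-form formulas for the slope and intercept (the card itself does not spell them out). The function name and sample data are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope

# Points that lie exactly on y = 1 + 2x.
print(least_squares_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```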
Lower Quartile
The 25th percentile or the value such that 25% of the observations fall below it and 75% fall above it.
Mean
The arithmetic average or balance point of a distribution.
Median
The middle value in a distribution; often considered the “typical” value.
Mode
The most commonly occurring value in a distribution.
Modified Boxplot
A specialized boxplot that conveys additional information by treating outliers differently. On these graphs, you mark outliers using a special symbol and then extend the boxplot’s whiskers only to the most extreme non-outlier values.
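The card does not say how outliers are identified. The sketch below assumes a common convention: a value is an outlier if it lies more than 1.5 interquartile ranges below the lower quartile or above the upper quartile. The function name, the `multiplier` parameter, and the sample data are illustrative.

```python
def modified_boxplot_parts(values, lower_quartile, upper_quartile, multiplier=1.5):
    """Return (outliers, lower whisker end, upper whisker end).

    Assumes the 1.5 * IQR rule for flagging outliers and that the
    quartiles were computed from `values`.
    """
    iqr = upper_quartile - lower_quartile
    low_fence = lower_quartile - multiplier * iqr
    high_fence = upper_quartile + multiplier * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    inside = [v for v in values if low_fence <= v <= high_fence]
    # Whiskers stop at the most extreme values that are not outliers.
    return outliers, min(inside), max(inside)

print(modified_boxplot_parts([1, 3, 5, 7, 9, 11, 40], lower_quartile=3, upper_quartile=11))
# ([40], 1, 11)
```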
Mound Shaped Distribution
A distribution with a single peak at its center.
Observational Unit
The person or thing to which the number or category is assigned, such as a student in your class.
Predictor Variable
The explanatory variable.
Population
This refers to the entire group of people or objects (observational units) of interest.
Proportion
A fraction between 0 and 1, possibly including 0 and 1.
Quantitative Variable
A variable that measures a numerical characteristic such as height or weight.
Relative Frequency or Proportion
The fraction of observational units in a subinterval.
Research Question
A question that looks for patterns in a variable or compares a variable across different groups or looks for a relationship between variables.
Residual
The difference between the observed y-value and the y-value predicted by your line for the corresponding x-value. The residual indicates the vertical distance from an observation to the regression line.
Residual = observed - fitted
Resistant
When a measure’s value is relatively unaffected by the presence of outliers.
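A small illustration of resistance using Python's standard `statistics` module: replacing one value with an extreme outlier moves the mean a great deal but leaves the median unchanged. The helper name and data are made up for the example.

```python
import statistics

def mean_and_median(values):
    """Return the mean and median of a data set as a pair."""
    return statistics.mean(values), statistics.median(values)

print(mean_and_median([1, 2, 3, 4, 5]))    # (3, 3)
print(mean_and_median([1, 2, 3, 4, 100]))  # (22, 3)
```

Here the median is resistant to the outlier and the mean is not.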
Sample Size
The number of observational units studied in a sample.
Scatterplot
The simplest graph for displaying two quantitative variables simultaneously using a vertical axis for the response variable and the horizontal axis for the explanatory variable.
Side-by-Side Stemplot
A stemplot that is used to compare two sets of data where a common set of stems is placed in the middle of the display with leaves out in either direction.
Skewed Left
The tail of the distribution extends toward the smaller values on the left.
Skewed Right
The tail of the distribution extends toward the larger values on the right.
Slope Coefficient
The predicted change in the response (y) variable associated with a one-unit increase in the explanatory (x) variable when using the least squares regression line.
Split Stemplot
This type of stemplot is used when there are too few stems and important details can be lost because the data points are clumped together. A split stemplot displays each stem twice, where the 0-4 leaves appear on the first stem and the 5-9 leaves appear on the second.
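A minimal sketch of building a split stemplot as text. It assumes a non-empty list of non-negative integers, with the tens part as the stem and the ones digit as the leaf; the function name and data are illustrative.

```python
def split_stemplot(values):
    """Return the lines of a split stemplot for non-negative integers."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1  # leaves 0-4 on the first row, 5-9 on the second
        rows.setdefault((stem, half), []).append(str(leaf))
    stems = [stem for stem, _ in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):
            leaves = "".join(rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for line in split_stemplot([21, 10, 15, 12, 28, 14, 19, 23]):
    print(line)
# 1 | 024
# 1 | 59
# 2 | 13
# 2 | 8
```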
Standard Deviation
The typical distance between a data value in a distribution and the mean of the sample.
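A minimal sketch of the calculation, assuming the sample standard deviation with an n - 1 divisor (the card does not state which divisor is intended). The function name and data are illustrative.

```python
import math

def sample_standard_deviation(values):
    """Sample standard deviation, dividing by n - 1 (an assumed convention)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))

# Mean is 5; squared deviations sum to 32; the result is sqrt(32 / 7).
print(sample_standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))
```

Python's built-in `statistics.stdev` uses the same n - 1 convention.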
Statistical Tendency
This refers to the observational units in one group being more likely to be in a certain category (for a categorical variable) or to have higher values (for a quantitative variable) than those in another group.
Strength
One of the three aspects of the association between quantitative variables which indicates how closely the observations follow the relationship between the variables. In other words, the strength of the association reflects how accurately you could predict the value of one variable based on the value of the other variable.
Sum of Squared Residuals
The sum of the squares of the residuals, used in determining the line of best fit. The line with the smallest sum is the line of best fit.
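The sketch below computes residuals (observed minus fitted) and their sum of squares for a given line, then compares two candidate lines on a small made-up data set. The function names are illustrative; the coefficients 2/3 and 1.5 are the least squares values for this particular toy data.

```python
def residuals(xs, ys, intercept, slope):
    """Residual = observed y minus the y-value fitted by the line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sum_of_squared_residuals(xs, ys, intercept, slope):
    """Sum of the squared residuals for the line y = intercept + slope * x."""
    return sum(r ** 2 for r in residuals(xs, ys, intercept, slope))

xs = [1, 2, 3]
ys = [2, 4, 5]

# Least squares line for this data: y = 2/3 + 1.5x.
best = sum_of_squared_residuals(xs, ys, intercept=2 / 3, slope=1.5)
# Another candidate line: y = 1 + x.
other = sum_of_squared_residuals(xs, ys, intercept=1, slope=1)

print(best < other)  # True
```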
Symmetric Distribution
Left side of a distribution is roughly a mirror of the right side.
Upper Quartile
The 75th percentile or the value such that 75% of the observations fall below it and 25% fall above it.
Variable
Any characteristic of a person or thing that can be assigned a number or category.
Variability
The phenomenon of a variable taking on different values or categories from one observational unit to another.