Descriptive Statistics Flashcards
Association
A relationship between two variables in which knowing the value of one variable is useful (to some degree) in predicting the value of the other. An association has three aspects: direction, strength, and form.
Bar Graph
A graph that displays the distribution of a categorical variable.
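As a rough illustration, a bar graph can be sketched in plain text by counting each category and drawing one bar per category. This is a minimal sketch; the helper name `text_bar_graph` and the vehicle data are hypothetical.

```python
from collections import Counter

def text_bar_graph(values: list[str]) -> str:
    """One row per category (in order of first appearance) with one '#' per observation."""
    counts = Counter(values)
    width = max(len(category) for category in counts)  # pad names so the bars line up
    return "\n".join(f"{category:<{width}} | {'#' * n} ({n})"
                     for category, n in counts.items())

vehicles = ["car", "truck", "car", "bike", "car", "truck"]  # hypothetical observations
print(text_bar_graph(vehicles))
```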
Binary Categorical Variable
A categorical variable with only two possible categories, for example, left or right.
Bins or Classes
The subintervals of equal length used to construct a histogram.
Bivariate Data
Data in which two variables are recorded for each observation (for example, x and y).
Boxplot
The graph that illustrates the five-number summary. The box is drawn from the lower quartile to the upper quartile, with a line at the median. The whiskers extend from the quartiles to the minimum and maximum values.
Categorical Variable
A variable that records a group designation such as gender or type of vehicle.
Causation
A relationship between two variables that goes a step further than correlation: a change in the value of the x variable causes a change in the value of the y variable.
Consistency
This refers to how variable, or spread out, the values of a quantitative variable are in a dataset.
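One way to quantify consistency is with the range and the sample standard deviation; smaller values mean the data are more consistent. A minimal sketch, assuming hypothetical scores for two teams and using Python's `statistics.stdev` (sample standard deviation); the helper name `spread` is also hypothetical.

```python
import statistics

def spread(values: list[float]) -> dict[str, float]:
    """Two common measures of spread; smaller values mean more consistent data."""
    return {
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    }

team_a = [10, 11, 10, 9, 10]  # hypothetical scores
team_b = [2, 18, 10, 5, 15]
print(spread(team_a))
print(spread(team_b))
```

Both teams average 10, but team A's smaller range and standard deviation mark it as the more consistent one.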
Correlation Coefficient
A number that measures the direction and strength of the linear association between two quantitative variables, generally denoted by r. It always falls between -1 and 1.
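A minimal sketch of computing r from its usual definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The function name `correlation` is hypothetical; on Python 3.10+ the standard library's `statistics.correlation` computes the same quantity. The sign of r gives the direction of the association.

```python
import math

def correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r for paired quantitative data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be paired and contain at least two observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

print(round(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
```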
Data
The numbers or categories recorded for the observational units in a study.
Direction
One of the three aspects of association between quantitative variables. It refers to whether greater values of one variable tend to occur with greater values of the other variable (positive association) or with smaller values of the other variable (negative association).
Distribution
The pattern of variation of a variable. For a categorical variable, the distribution consists of the variable’s possible categories and the proportion of responses in each.
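For a categorical variable, the distribution can be computed by counting the responses in each category and dividing by the total number of responses. A minimal sketch with hypothetical left/right responses; the function name is hypothetical, and it lists only the categories that actually appear in the data.

```python
from collections import Counter

def categorical_distribution(values: list[str]) -> dict[str, float]:
    """Map each observed category to the proportion of responses in it."""
    counts = Counter(values)
    return {category: count / len(values) for category, count in counts.items()}

responses = ["left", "right", "right", "left", "right"]  # hypothetical responses
print(categorical_distribution(responses))  # {'left': 0.4, 'right': 0.6}
```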
Dotplot
A display of the distribution of a relatively small data set in which each data point is represented by a dot.
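A rough plain-text dotplot for a small set of whole numbers, drawn sideways: one row per value on the number line and one `o` per observation. This sketch assumes integer values; the function name and the quiz scores are hypothetical.

```python
from collections import Counter

def text_dotplot(values: list[int]) -> str:
    """Sideways dotplot: one row per integer from min to max, one 'o' per observation."""
    counts = Counter(values)
    rows = [f"{v:>3} | {'o' * counts[v]}"  # Counter returns 0 for values never observed
            for v in range(min(values), max(values) + 1)]
    return "\n".join(rows)

print(text_dotplot([3, 4, 4, 5, 5, 5, 7]))  # hypothetical quiz scores
```

Including empty rows (such as 6 above) keeps the gaps in the data visible, as on a real number line.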
Empirical Rule
With mound-shaped, symmetric distributions, approximately 68% of the observations fall within one standard deviation of the mean, approximately 95% of the observations fall within two standard deviations of the mean, and approximately 99.7% fall within three standard deviations of the mean.
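The percentages in the empirical rule approximate the normal distribution, where the share within k standard deviations of the mean is erf(k/√2). The sketch below checks this with `math.erf` and also counts the share of a data set that falls within k sample standard deviations of its mean. The helper names are hypothetical.

```python
import math
import statistics

def normal_share_within(k: float) -> float:
    """Exact proportion of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def share_within(values: list[float], k: float) -> float:
    """Observed proportion of values within k sample standard deviations of the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)

for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 4))  # 0.6827, 0.9545, 0.9973
```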
Extrapolation
An attempt to predict the response variable for values of the explanatory variable beyond those contained in the data.
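A minimal sketch of flagging extrapolation, assuming a prediction counts as extrapolation when the new x value falls outside the smallest-to-largest range of the observed x values. The function name, slope, intercept, and ages are hypothetical.

```python
def predict_with_check(slope: float, intercept: float, x_new: float,
                       observed_xs: list[float]) -> tuple[float, bool]:
    """Predict y-hat = intercept + slope * x_new and flag whether x_new is an extrapolation."""
    is_extrapolation = not (min(observed_xs) <= x_new <= max(observed_xs))
    return intercept + slope * x_new, is_extrapolation

ages = [2, 4, 6, 8, 10]  # hypothetical explanatory values used to fit the line
print(predict_with_check(2.5, 80.0, 7, ages))   # (97.5, False): within the data
print(predict_with_check(2.5, 80.0, 40, ages))  # (180.0, True): beyond the data
```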
Five-number Summary
The minimum value, lower quartile, median, upper quartile, and maximum value.
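A minimal sketch of the five-number summary. Quartile conventions vary between textbooks and software; this version assumes the common method of taking the medians of the lower and upper halves of the sorted data, leaving the median out of both halves when n is odd. The function name is hypothetical. These five values also give the ends of the box and whiskers in a basic boxplot.

```python
import statistics

def five_number_summary(values: list[float]) -> tuple[float, float, float, float, float]:
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]           # values below the median
    upper = data[half + n % 2:]   # values above the median (skip the middle value when n is odd)
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

print(five_number_summary([7, 1, 3, 9, 5, 2, 8, 4, 6]))  # (1, 2.5, 5, 7.5, 9)
```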
Frequency or Count
The number of observational units in a subinterval.
Histogram
A graphical display similar to a dotplot or stemplot, but more practical for displaying very large data sets. Bars are constructed whose heights correspond to the frequency in each subinterval.
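A sketch of how the bins and frequencies behind a histogram can be computed: split the range from the minimum to the maximum into equal-width bins, count the observations in each, and draw a bar whose length is the count. The helper name and data are hypothetical; in this sketch each bin includes its left edge, and the last bin also includes the maximum.

```python
def histogram_counts(values: list[float], num_bins: int) -> list[tuple[float, float, int]]:
    """Return (left edge, right edge, frequency) for num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # bin index from the distance to the minimum; if all values are equal, use bin 0
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, num_bins - 1)] += 1  # the maximum lands exactly on the last edge
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical observations
for left, right, count in histogram_counts(data, 4):
    print(f"{left:g}-{right:g} | {'#' * count}")
```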
Influential (Observation)
If removing an observation from a data set substantially changes the least squares regression equation, the observation is considered influential. Typically, observations with extreme explanatory (x) values (far below or far above the sample mean) have the potential to be influential.
Intercept Coefficient (Y-Intercept)
The predicted value of the response (y) variable when the explanatory (x) variable has a value of 0, using the least squares regression line.
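A minimal sketch of the least squares computation, showing that the intercept a = ȳ - b·x̄ is exactly the prediction at x = 0. The function name and data are hypothetical. When x = 0 lies outside the observed x values, interpreting the intercept is itself an extrapolation.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least squares intercept a and slope b for y-hat = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)       # 1.0 2.0
print(a + b * 0)  # predicted y at x = 0 equals the intercept: 1.0
```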