Chapter 12 Flashcards
Descriptive statistics
used to describe the characteristics of a sample or a population
Inferential statistics
used to infer (that is, to estimate) population parameters (values that describe the population) from a subgroup of the population (a sample); recall that the corresponding value computed from the sample is called a sample statistic
Univariate statistics
used when we are considering one variable
bivariate statistics
used when considering two variables
multivariate statistics
used when examining three or more variables
Parametric statistics
have built-in assumptions about the data distribution that must be met if the statistic is to be used
non-parametric statistics
do not have built-in assumptions about the data distribution that must be met if the statistic is to be used
frequency distribution reports
how many cases take on each value of the variable
raw frequency
actual number of cases
relative frequency
the number of cases taking on a value, expressed as a percentage of all cases
cumulative frequency
the running tally of cases that take on the current and all preceding values of the variable
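The three kinds of frequency can be computed together in one pass. This is a minimal sketch. The function name `frequency_table` and the survey responses are hypothetical. The caller supplies the category order, because a cumulative frequency only makes sense when the values have a natural order.

```python
from collections import Counter

def frequency_table(values, order):
    """Return (value, raw, relative %, cumulative) rows in the given order."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0
    for category in order:
        raw = counts.get(category, 0)   # raw frequency: actual number of cases
        running += raw                  # cumulative: this value plus all preceding values
        rows.append((category, raw, 100 * raw / total, running))
    return rows

# Hypothetical responses to an ordinal survey item
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]
for row in frequency_table(responses, ["disagree", "neutral", "agree"]):
    print(row)
```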
Measures of dispersion
a group of statistics that indicate how well the measure of central tendency represents the distribution
mode
the most frequently occurring value in the distribution
variation ratio
the proportion of cases that do not fall within the modal category: variation ratio = 1 – (number of cases in modal category / total number of cases)
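The mode and the variation ratio can be sketched together for a nominal variable. The party-identification data and the function name are hypothetical. If two categories tie for the mode, this sketch reports only the first one encountered.

```python
from collections import Counter

def mode_and_variation_ratio(values):
    counts = Counter(values)
    # most_common(1) gives the single most frequent category (first one on ties)
    modal_value, modal_count = counts.most_common(1)[0]
    # Variation ratio: share of cases falling outside the modal category
    variation_ratio = 1 - modal_count / len(values)
    return modal_value, variation_ratio

party = ["Dem", "Rep", "Dem", "Ind", "Dem", "Rep", "Dem", "Ind"]  # hypothetical data
print(mode_and_variation_ratio(party))
```

Here 4 of the 8 cases are in the modal category, so the variation ratio is 1 – 4/8 = 0.5.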
median
the “middle”; it is the value of the observation that splits the distribution of cases in half
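Finding the median means sorting the cases and taking the middle one. The sketch below assumes interval-level numbers. For an even number of cases, it averages the two middle values. With ordinal categories, you would report one of the two middle categories instead, since categories cannot be averaged. Python's standard `statistics.median_low` does this.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd n: the single middle observation
    # even n: average the two middle observations (interval-level data only)
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 3]))
print(median([4, 1, 3, 2]))
```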
range
a measure of dispersion commonly used with ordinal-level variables. As the name implies, it is literally the span of values that the variable encompasses, from the lowest to the highest observed value
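For an ordinal variable, the range can be reported as the lowest and highest observed categories on the ordered scale. This is a minimal sketch. The five-point scale and the answers are hypothetical. For interval-level data, the range is usually given as a single number, `max(x) - min(x)`.

```python
# Hypothetical ordered response scale, lowest to highest
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

def ordinal_range(values, scale):
    # Position of each answer on the ordered scale
    ranks = [scale.index(v) for v in values]
    return scale[min(ranks)], scale[max(ranks)]

answers = ["agree", "neutral", "strongly agree", "agree"]
print(ordinal_range(answers, SCALE))  # ('neutral', 'strongly agree')
```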
outlier
a case that differs markedly from the others
mean
calculated by adding all values in a distribution and dividing by the total number of cases
skew the mean
what outliers do to the mean: they pull it in the direction of the extreme scores
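The pull of a single outlier on the mean is easy to see next to the median, which hardly moves. The income figures (in $1,000s) are hypothetical. `statistics.median` is from Python's standard library.

```python
import statistics

def mean(values):
    # Sum of all values divided by the number of cases
    return sum(values) / len(values)

incomes = [40, 45, 50, 55, 60]      # hypothetical incomes, $1,000s
with_outlier = incomes + [950]      # add one extreme case

print(mean(incomes), statistics.median(incomes))
print(mean(with_outlier), statistics.median(with_outlier))
```

Adding the one outlier moves the mean from 50 to 200. The median moves only from 50 to 52.5.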
standard deviation (S)
estimates the typical amount by which observations differ from the mean; it is the square root of the average squared deviation from the mean.
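The computation squares each deviation from the mean, averages the squares, and takes the square root. This is a sketch with hypothetical scores. The `sample` flag reflects the usual convention: divide by n – 1 when estimating from a sample, and by n for a full population.

```python
import math

def standard_deviation(values, sample=True):
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Sample estimate divides by n - 1; a full population divides by n
    denominator = n - 1 if sample else n
    return math.sqrt(sum(squared_deviations) / denominator)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(standard_deviation(scores, sample=False))
print(standard_deviation(scores))
```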
standardized scores (also known as z-scores)
scores expressed as the number of standard deviations they fall from the mean of the total distribution of scores
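A z-score subtracts the mean from each score and divides the result by the standard deviation. This sketch uses the standard-library `statistics.mean` and `statistics.pstdev`, and treats the scores as the whole distribution. Some texts use the sample standard deviation instead. The data are hypothetical.

```python
import statistics

def z_scores(values):
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # SD of the whole distribution of scores
    # Number of standard deviations each score lies above (+) or below (-) the mean
    return [(x - mean) / sd for x in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, SD 2
print(z_scores(scores))
```

For example, a score of 9 lies two standard deviations above the mean, so its z-score is 2.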
contingency tables (also referred to as cross-tabulation tables or cross-tabs)
In these tables, the cell in which any individual case is located is contingent upon its scores for each of the variables.
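Building a cross-tab amounts to counting how many cases share each pair of scores, one cell per combination of categories. The function name and the education/vote data are hypothetical. Each position in the two lists is one respondent.

```python
from collections import Counter

def crosstab(row_var, col_var):
    """Count cases in each (row category, column category) cell."""
    cells = Counter(zip(row_var, col_var))
    row_cats = sorted(set(row_var))
    col_cats = sorted(set(col_var))
    table = {r: {c: cells.get((r, c), 0) for c in col_cats} for r in row_cats}
    return table, row_cats, col_cats

education = ["college", "high school", "college", "high school", "college"]
vote = ["yes", "no", "yes", "yes", "no"]
table, rows, cols = crosstab(education, vote)
for r in rows:
    print(r, [table[r][c] for c in cols])
```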
scatter plots
graphs in which the point where an individual case lies is contingent upon its scores for each of the variables.
perfect correlation
exists when knowing the value on one variable always allows us to perfectly predict the value on the other.
Measures of association
indicate the strength of the relationship with a single numerical value.
proportional reduction in error (PRE) measures
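Pearson's correlation coefficient (r) is one widely used measure of association for two interval-level variables. The cards do not name it. Its value runs from –1 to +1, and a perfect correlation gives exactly ±1. The sketch below computes r from the deviations around each mean. The data are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Co-variation of the two variables around their means
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))  # perfect positive relationship
print(pearson_r(x, [2, 1, 4, 3, 5]))         # strong but imperfect
```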
A PRE measure is basically a before-and-after comparison: we compare the amount of error we have before knowing the value on the independent variable with the amount of remaining error after knowledge about the independent variable is taken into account. In other words, to what degree does knowledge about the independent variable reduce our error in predicting values of the dependent variable?
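Goodman and Kruskal's lambda is one common PRE measure for nominal variables, and it shows the before-and-after logic directly. Before using the independent variable, we predict the overall modal category of the dependent variable for every case. After, we predict the modal category within each category of the independent variable. Lambda is then (errors before – errors after) / errors before. The function name and data below are hypothetical.

```python
from collections import Counter, defaultdict

def goodman_kruskal_lambda(iv, dv):
    n = len(dv)
    # Errors before: guess the overall modal DV category for every case
    errors_before = n - max(Counter(dv).values())
    if errors_before == 0:
        raise ValueError("dependent variable has no variation")
    # Errors after: within each IV category, guess that category's modal DV value
    groups = defaultdict(list)
    for i, d in zip(iv, dv):
        groups[i].append(d)
    errors_after = sum(len(g) - max(Counter(g).values()) for g in groups.values())
    return (errors_before - errors_after) / errors_before

region = ["N", "N", "N", "S", "S", "S"]
vote = ["A", "A", "B", "B", "B", "B"]
print(goodman_kruskal_lambda(region, vote))
```

Here, knowing the region cuts prediction errors from 2 to 1, so lambda = 0.5. A value of 0 would mean the independent variable does not reduce prediction error at all, and a value of 1 would mean it eliminates the error entirely.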
the regression line
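the line that best fits the data, usually estimated by ordinary least squares (the line that minimizes the sum of squared vertical distances between the points and the line)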
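An ordinary least squares line can be computed with the textbook formulas. The slope is the sum of cross-products of deviations divided by the sum of squared x-deviations. The intercept makes the line pass through the point of means. This is a sketch with hypothetical data.

```python
def least_squares_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # squared x-deviations
    slope = sxy / sxx
    intercept = my - slope * mx  # line passes through (mean of x, mean of y)
    return intercept, slope

intercept, slope = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {intercept:.2f} + {slope:.2f}x")
```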
Analysis of variance (ANOVA)
a statistical technique in which the observed variance is partitioned into components (within-group and between-group variances); in political science, often used when the independent variables are measured at the nominal or ordinal level (particularly the former) and the dependent variable is measured at the interval level.
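The partition can be sketched for a one-way ANOVA. In the usual setup, each group is a category of a nominal independent variable and the values are scores on an interval-level dependent variable. The total sum of squares splits into between-group and within-group parts. The F statistic compares the two, each divided by its degrees of freedom. The function name and data are hypothetical.

```python
def one_way_anova(groups):
    """groups: dict mapping each category of the IV to a list of DV values."""
    all_values = [v for g in groups.values() for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    # Between-group: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
    # Within-group: how far each case sits from its own group mean
    ss_within = sum((v - means[name]) ** 2 for name, g in groups.items() for v in g)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between, ss_within, ss_total, f_stat

data = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
print(one_way_anova(data))
```

With these numbers, the between-group (54) and within-group (6) sums of squares add up to the total (60).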
Measures of variation
statistical measures for the dispersion of scores around the central tendency.