IDE 620 Week 2 Flashcards
A way of depicting frequency distributions for categorical (nominal) variables, such as religious affiliation, ethnic group, or state of residence
Bar Graph
Note that the bars do not touch in a bar graph, as they do in a histogram.
A distribution having two modes or peaks.
Bimodal Distribution
Strictly speaking, for a distribution to be called bimodal, the peaks should be the same height. However, it is quite common to call any two-humped distribution bimodal, even when the high points are not exactly equal.
Any of several methods, pioneered by John Tukey, of discovering unanticipated patterns and relationships, often by presenting quantitative data visually. The stem-and-leaf display and the box-and-whisker diagram are well-known examples.
Exploratory Data Analysis (EDA)
A tally of the number of times each score occurs in a group of scores. More formally, a way of presenting data that shows the number of cases having each of the attributes of a particular variable.
Frequency Distribution
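A minimal sketch of a frequency distribution in Python, tallying a hypothetical list of nominal responses (the data values are invented for illustration):

```python
from collections import Counter

# Hypothetical nominal (categorical) responses -- invented for illustration.
responses = ["Protestant", "Catholic", "None", "Catholic",
             "Jewish", "Protestant", "None", "Protestant"]

# Counter tallies the number of times each attribute occurs.
frequency_distribution = Counter(responses)

for attribute, count in frequency_distribution.most_common():
    print(f"{attribute}: {count}")
```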
A graph with frequency shown by the height of contiguous bars, used for variables measured at the interval and ratio levels.
Histogram
Because the data in a histogram are interval or ratio, the bars should touch; in a bar graph for nominal or ordinal data, the bars do not touch.
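A rough sketch of the bar graph/histogram distinction, assuming matplotlib is installed; the category counts and interval-level scores are invented:

```python
import matplotlib.pyplot as plt

# Invented data for illustration only.
categories = ["Protestant", "Catholic", "Jewish", "None"]
counts = [12, 9, 3, 6]                      # nominal data -> bar graph
scores = [55, 61, 64, 68, 70, 71, 73, 75,
          76, 78, 80, 82, 85, 88, 93]       # interval data -> histogram

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Bar graph: separated bars, because the categories are not contiguous.
ax1.bar(categories, counts, width=0.6)
ax1.set_title("Bar graph (nominal)")

# Histogram: contiguous bars, because the score scale is continuous.
ax2.hist(scores, bins=5, edgecolor="black")
ax2.set_title("Histogram (interval/ratio)")

plt.tight_layout()
plt.show()
```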
A graphic depiction of data relying on one or more lines.
Line graph
The lines can be linear or curvilinear. For example, the business cycle over the past 50 years might be plotted on a line graph.
The most common (most frequent) score in a set of scores.
Mode
A distribution of scores or measures that, when plotted on a graph, produces a nonsymmetrical curve.
Skewed Distribution
A positively (or upward or right) skewed distribution is one in which the infrequent scores are on the high or right side of the x-axis, such as the scores on a difficult test. A negatively (or downward or left) skewed distribution is one in which the rare values are on the low or left side of the x-axis, such as the scores on an easy test. One way to sort out which is which is to remember that a skewer is a pointy thing; when the pointy end of the distribution is on the right, it is right skewed, and conversely for left skewed.
The points falling between half a measurement unit below and half a unit above the number.
Real Limits (of a Number)
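A small illustration of real limits; the helper name real_limits is hypothetical, and the measurement unit defaults to 1:

```python
def real_limits(number, unit=1.0):
    """Return the real limits of a number measured to the nearest `unit`."""
    half = unit / 2
    return number - half, number + half

# A score of 5 measured to the nearest whole unit has real limits 4.5 to 5.5.
print(real_limits(5))          # (4.5, 5.5)

# A weight of 68.3 measured to the nearest tenth has real limits of roughly
# 68.25 to 68.35 (floating-point output may show a tiny rounding error).
print(real_limits(68.3, 0.1))
```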
The degree to which measures or scores are bunched on one side of a central tendency and trail out (become pointy, like a skewer) on the other.
Skewness
The more skewness in a distribution, the more variability in the scores.
Computer programs often compute indexes of skewness. Positive values indicate a positive or right skew. Negative values indicate a negative or left skew.
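A minimal sketch of one common skewness index (the Fisher-Pearson coefficient, g1); the right-skewed scores are invented, and statistical packages may use slightly different formulas:

```python
from statistics import mean, pstdev

def skewness_g1(scores):
    """Fisher-Pearson coefficient: mean cubed deviation / (population SD) ** 3."""
    m = mean(scores)
    s = pstdev(scores)
    n = len(scores)
    return sum((x - m) ** 3 for x in scores) / (n * s ** 3)

# Invented scores from a "difficult test": most scores low, a few high.
difficult_test = [35, 38, 40, 42, 44, 45, 47, 50, 72, 90]
print(skewness_g1(difficult_test))   # positive -> right (positively) skewed
```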
A way of recording the values of a variable, created by John Tukey, that presents raw numbers in a visual, histogram-like display. It is a histogram in which the bars are built out of numbers.
Stem-and-Leaf Display
A distribution with only one mode
Unimodal
Any of several statistical summaries that, in a single number, represent the typical or average number in a group of numbers. Examples include the mean, mode, and median.
Measures of central tendency
A batting average is a well-known measure of central tendency in the United States. A grade point average might be a more important example for many college students.
A variable that can take on any value within its range (an unlimited number of possible values)
Continuous variable
A variable that takes on only a limited set of separate, distinct values
Discrete variable
The variable you are measuring
Dependent variable
The variable that you manipulate
Independent variable
Person who created the stem-and-leaf display and EDA (exploratory data analysis)
John Tukey
Leftmost digits of a number
Leading digits (most significant digits)
Vertical axis of a stem-and-leaf display
Stem
Digits to the right of the leading digits
Trailing digits (less significant digits)
Horizontal axis of a stem-and-leaf display, containing the trailing digits
Leaves
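A minimal sketch of building a stem-and-leaf display, assuming two-digit scores so the leading digit is the stem and the trailing digit is the leaf (the scores are invented):

```python
from collections import defaultdict

# Invented two-digit test scores.
scores = [62, 65, 71, 73, 73, 78, 81, 84, 85, 85, 90, 93]

stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)   # leading digit -> trailing digit

# Stems form the vertical axis; leaves extend horizontally.
for stem in sorted(stems):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
# 6 | 25
# 7 | 1338
# 8 | 1455
# 9 | 03
```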
Average; the most popular measure of location or central tendency; has the desirable mathematical property of minimizing the sum of squared deviations.
Mean
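A quick check of that minimizing property, with invented scores: the sum of squared deviations about the mean is smaller than about any other candidate value.

```python
from statistics import mean

scores = [2, 4, 4, 5, 7, 8]
m = mean(scores)                       # 5.0

def sum_sq_dev(center):
    return sum((x - center) ** 2 for x in scores)

# The sum of squared deviations is smallest when taken about the mean.
for center in (4.0, m, 6.0):
    print(center, sum_sq_dev(center))  # 30, then 24 (the minimum), then 30
```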
Computed by taking the nth root of the product of n scores (e.g., the square root of the product of 2 scores, the cube root of the product of 3 scores, etc.).
Geometric mean
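A minimal sketch with invented growth factors; Python 3.8+ also provides statistics.geometric_mean as a built-in check:

```python
import math
from statistics import geometric_mean   # available in Python 3.8+

# Invented yearly growth factors (e.g., 1.10 = 10% growth).
factors = [1.10, 1.25, 0.96]

# nth root of the product of n scores.
by_hand = math.prod(factors) ** (1 / len(factors))
print(by_hand)
print(geometric_mean(factors))           # should match, within rounding
```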
Calculated by dividing n by the sum of the reciprocals of the n numbers.
Harmonic mean (never larger than the geometric or arithmetic mean)
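A minimal sketch with invented speeds (a classic use case for averaging rates); statistics.harmonic_mean offers a built-in check:

```python
from statistics import harmonic_mean, geometric_mean, mean

# Invented speeds in km/h over equal distances.
speeds = [40, 60]

# n divided by the sum of the reciprocals of the numbers.
n = len(speeds)
by_hand = n / sum(1 / x for x in speeds)

print(by_hand)                   # 48.0
print(harmonic_mean(speeds))     # 48.0
print(geometric_mean(speeds))    # ~48.99 -- larger than the harmonic mean
print(mean(speeds))              # 50.0  -- larger still
```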
The middle score or measurement in a set of ranked scores or measurements; the point that divides a distribution into two equal halves; the 50th percentile.
Median
A mean computed after removing the extreme observations.
Trimmed Mean
Thus, a trimmed mean is a measure of central tendency that allows the researcher to deal separately with a distribution’s outliers.
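A minimal sketch of a trimmed mean, with an invented data set containing one outlier; here 10% is trimmed from each end, a conventional but arbitrary choice:

```python
from statistics import mean

def trimmed_mean(scores, proportion=0.10):
    """Drop the highest and lowest `proportion` of scores, then average the rest."""
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)        # number trimmed from each end
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return mean(trimmed)

# Invented scores with one extreme outlier (250).
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 250]
print(mean(scores))            # 39.6  -- pulled up by the outlier
print(trimmed_mean(scores))    # 16.75 -- highest and lowest scores removed
```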
Anything that produces systematic error in a research finding; causes of bias can range from poor data collection to flawed measurements to inappropriate statistical analysis. While the distortions due to systematic error persist in the long run, random errors tend to balance out.
Bias
The effects of any factor that the researcher did not expect to influence the dependent variable.
Bias
A type of graph in which boxes and lines show a distribution’s shape, central tendency, and variability. The “boxplot,” as it is often called, gives an informative picture of the values of a single variable and is helpful for indicating whether a distribution is skewed and has outliers.
Box-and-whisker diagram
The upper and lower boundaries of each box in a box-and-whisker diagram
Hinges
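A minimal sketch of Tukey's hinges (the box boundaries), computed as medians of the lower and upper halves of the ordered data; the scores are invented, and some software instead uses the 25th and 75th percentiles, which can differ slightly:

```python
from statistics import median

# Invented scores.
scores = [3, 5, 7, 8, 9, 11, 13, 14, 18, 21, 40]

ordered = sorted(scores)
n = len(ordered)

# Tukey's hinges: medians of each half (the overall median is shared when n is odd).
lower_hinge = median(ordered[: (n + 1) // 2])
upper_hinge = median(ordered[n // 2 :])

print(lower_hinge, median(ordered), upper_hinge)   # 7.5 11 16
```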
The number of values “free to vary” when computing an inferential statistic. It’s the number of pieces of information that can vary independently of one another or, alternatively stated, the number of unconstrained observations used in calculating an estimate.
Degrees of freedom
A statistic showing the amount of variation or spread in the scores for, or values of, a variable.
Measure of dispersion
When the dispersion is large, the scores or values are widely scattered; when it is small, they are tightly clustered. Two commonly used measures of dispersion are the variance and the standard deviation. A measure of dispersion always implies the presence of a measure of central tendency, such as a mean. For example, the standard deviation measures deviation from the mean.
The mean value of a variable in repeated samplings or trials
Expected value
The mean of the sampling distribution of a statistic.
Expected value
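A small simulation sketch: the expected value of the sample mean equals the population mean, so the average of many sample means should land close to it (the population and sample size below are invented):

```python
import random
from statistics import mean

random.seed(1)

# Invented population of scores.
population = [2, 3, 5, 7, 8, 10, 12, 15]
print(mean(population))                        # population mean = 7.75

# Sampling distribution of the mean: draw many samples, record each sample mean.
sample_means = [mean(random.choices(population, k=4)) for _ in range(10_000)]

# The mean of the sampling distribution (the expected value) is close to 7.75.
print(mean(sample_means))
```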
The middle half of a distribution. A measure of dispersion calculated by taking the difference between the first and third quartiles (that is, the 25th and 75th percentiles). Also called “midspread.”
Interquartile Range (IQR)
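A minimal sketch of the interquartile range using statistics.quantiles; the scores are invented, and different programs use slightly different quartile conventions, so results can vary a bit:

```python
from statistics import quantiles

# Invented scores.
scores = [4, 7, 9, 10, 11, 12, 13, 15, 18, 22, 30, 45]

q1, q2, q3 = quantiles(scores, n=4)   # 25th, 50th, and 75th percentiles
iqr = q3 - q1                         # the middle half of the distribution
print(q1, q3, iqr)
```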
A subject or other unit of analysis that has an extreme value on a variable or a combination of variables or has a large residual value.
Outlier
Outliers are important because they can distort the interpretation of data or make a summary statistic (such as a mean) misleading. Outliers may also indicate that a sampling error has occurred by including a case from a population other than the target population.
Divisions of the total rank-ordered cases or observations in a study into four groups of equal size. Technically, the three points that divide a series of ordered scores into four groups.
Quartiles
A measure of variability, of the spread or the dispersion of values in a series of values.
Range
To get the range of a set of scores, you subtract the lowest value or score from the highest.
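A tiny illustration with invented scores:

```python
# Invented scores.
scores = [55, 61, 70, 72, 75, 80, 93]

# Range: highest value minus lowest value.
print(max(scores) - min(scores))   # 93 - 55 = 38
```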
A statistic that shows the spread, variability, or dispersion of scores in a distribution of scores. It is a measure of the average amount the scores in a distribution deviate from the mean.
Standard deviation
The standard deviation is the square root of the variance. As a variable, it is symbolized as SD, Sd, s, or lowercase sigma (σ).
A measure of the spread of scores in a distribution of scores, that is, a measure of dispersion.
Variance
The larger the variance, the farther the individual cases are from the mean. The smaller the variance, the closer the individual scores are to the mean.
Specifically, the variance is the sum of the squared deviations from the mean divided by the number of scores. That is, it is the average squared distance from the mean. (See sum of squares for an example.) Taking the square root of the variance gives you the standard deviation (i.e., it converts the variance into regular, nonsquared units). A variance cannot be less than zero, nor can the standard deviation.
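A minimal sketch computing the (population) variance by hand and confirming that the standard deviation is its square root; the scores are invented, and note that sample formulas divide by n - 1 instead of n:

```python
import math
from statistics import pvariance, pstdev

# Invented scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = sum(scores) / len(scores)                                # mean = 5.0
variance = sum((x - m) ** 2 for x in scores) / len(scores)   # mean squared deviation
sd = math.sqrt(variance)                                     # SD = sqrt(variance)

print(variance, sd)                          # 4.0 2.0
print(pvariance(scores), pstdev(scores))     # built-in checks: 4.0 2.0
```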