Chapters 2 & 3: Data Types and Visualization Flashcards
frequency distribution
a summary display for a distribution of data organized or summarized in terms of how often a category, score, or range of scores occurs.
simple frequency distribution
a summary display for (1) the frequency of each individual score or category (ungrouped data) in a distribution or (2) the frequency of scores falling within defined groups or intervals (grouped data) in a distribution.
Grouped data
a set of scores distributed into intervals, where the frequency of each score can fall into any given interval.
interval
a discrete range of values within which the frequency of a subset of scores is contained.
To construct a simple frequency distribution for grouped data (3 steps)
Step 1: Find the real range.
Step 2: Find the interval width.
Step 3: Construct the frequency distribution.
Interval width = real range / number of intervals
real range
one more than the difference between the largest and smallest values in a data set
interval width or class width
the range of values contained in each interval of a grouped frequency distribution.
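The three steps above can be sketched in Python. This is a minimal illustration, not the textbook's procedure verbatim: the function name `grouped_frequency_distribution`, the made-up scores, and the choice to round the interval width up to a whole number are all assumptions.

```python
import math

def grouped_frequency_distribution(scores, num_intervals):
    """Build a simple frequency distribution for grouped (whole-number) data."""
    low, high = min(scores), max(scores)
    # Step 1: the real range is one more than largest minus smallest.
    real_range = (high - low) + 1
    # Step 2: interval width = real range / number of intervals.
    # Rounding up is an assumption so that every score fits in some interval.
    width = math.ceil(real_range / num_intervals)
    # Step 3: count how many scores fall inside each interval.
    table = []
    for i in range(num_intervals):
        lower = low + i * width
        upper = lower + width - 1
        frequency = sum(lower <= s <= upper for s in scores)
        table.append((lower, upper, frequency))
    return real_range, width, table

# Made-up scores, for illustration only.
scores = [1, 3, 4, 4, 5, 7, 8, 8, 9, 10, 12, 12, 13, 15, 16, 18, 19, 20]
real_range, width, table = grouped_frequency_distribution(scores, 4)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

With these scores the real range is 20 and the interval width is 5, so the intervals are 1-5, 6-10, 11-15, and 16-20.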
Interval boundaries
the upper and lower limits for each interval in a grouped frequency distribution.
The lower boundary is the smallest value in each interval, and the upper boundary is the largest value in each interval.
four rules for creating a simple frequency distribution:
Each interval is defined (it has a lower and upper boundary). Intervals such as “or more” or “less than” should not be expressed.
Each interval is equidistant (the interval width is the same for each interval).
No interval overlaps (the same score cannot occur in more than one interval).
All values are rounded to the same degree of accuracy measured in the original data (or to the ones place for the data listed in Table 2.1).
open interval, or open class
an interval with no defined upper or lower boundary
Outliers
extreme scores that fall substantially above or below most of the scores in a particular data set.
cumulative frequency distribution
a summary display that distributes the sum of frequencies across a series of intervals.
relative frequency distribution
a summary display that distributes the proportion of scores in each interval. It is computed as the frequency in each interval divided by the total number of frequencies recorded.
proportion
a part or portion of all measured data. The sum of all proportions for a distribution of scores is 1.0.
relative percent distribution
a summary display that distributes the percentage of scores occurring in each interval relative to all scores distributed.
cumulative relative frequency distribution
a summary display that distributes the sum of relative frequencies across a series of intervals.
cumulative percent distribution
a summary display that distributes the sum of relative percents across a series of intervals.
Cumulative relative frequencies and cumulative percents are a sum of the proportion and percent of scores, respectively, across intervals. These sum to 1.00 or 100%, respectively.
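The cumulative, relative, and percent distributions defined above can all be derived from one list of interval frequencies. The sketch below uses made-up frequencies, and the variable names are illustrative only.

```python
from itertools import accumulate

# Made-up frequencies for four intervals, listed from lowest to highest.
frequencies = [2, 3, 5, 10]
total = sum(frequencies)

# Cumulative frequency: running sum of frequencies across intervals.
cumulative = list(accumulate(frequencies))
# Relative frequency: proportion of scores in each interval.
relative = [f / total for f in frequencies]
# Relative percent: the proportion expressed as a percentage.
relative_percent = [100 * f / total for f in frequencies]
# Cumulative relative frequency and cumulative percent: running sums
# of the proportions and percents, ending at 1.0 and 100 respectively.
cumulative_relative = [c / total for c in cumulative]
cumulative_percent = [100 * c / total for c in cumulative]

print(cumulative)
print(cumulative_relative)
print(cumulative_percent)
```

The last entry of the cumulative relative frequencies is 1.0 and the last cumulative percent is 100, matching the card above.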
percentile point
the value of a score on a measurement scale below which a specified percentage of scores in a distribution fall.
percentile rank
the percentage of scores with values that fall below a specified score in a distribution.
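A percentile rank can be computed directly from its definition. This is a minimal sketch using made-up scores; the function name `percentile_rank` is hypothetical, and it counts only scores strictly below the given score, as the definition states.

```python
def percentile_rank(scores, score):
    """Percentage of scores with values that fall below `score`."""
    below = sum(s < score for s in scores)
    return 100 * below / len(scores)

# Made-up exam scores, for illustration only.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 80))  # 5 of the 10 scores fall below 80
```

Read the other way around, 80 is the percentile point for the 50th percentile in this made-up data set.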
Ungrouped data
a set of scores or categories distributed individually, where the frequency for each individual score or category is counted
To construct a histogram, we follow three rules:
Rule 1: A vertical rectangle represents each interval, and the height of the rectangle equals the frequency recorded for each interval. This rule implies that the y-axis should be labeled as a number or count. The y-axis reflects the frequency of scores for each interval.
Rule 2: The base of each rectangle begins at the lower boundary and ends at the upper boundary of each interval. This rule means that histograms cannot be constructed for open intervals because open intervals do not have an upper or a lower boundary. Also, each rectangle should have the same interval width.
Rule 3: Each rectangle touches adjacent rectangles at the boundaries of each interval. Histograms are used to summarize continuous data, such as the time (in months) it takes to find employment. The adjacent rectangles touch because it is assumed that the data are continuous. In other words, it is assumed that the data were measured along a continuum.
histogram
a graphical display used to summarize the frequency of continuous data that are distributed in numeric intervals (grouped).
frequency polygon
a dot-and-line graph used to summarize the frequency of continuous data at the midpoint of each interval.
ogive
a dot-and-line graph used to summarize the cumulative percent of continuous data at the upper boundary of each interval.
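The points plotted in a frequency polygon and an ogive can be worked out before any drawing is done. The sketch below only computes the coordinates; the intervals and frequencies are made up, and the resulting points could be passed to any plotting tool.

```python
from itertools import accumulate

# Made-up grouped data: (lower boundary, upper boundary) and frequency.
intervals = [(1, 5), (6, 10), (11, 15), (16, 20)]
frequencies = [2, 3, 5, 10]
total = sum(frequencies)

# Frequency polygon: frequency plotted at the midpoint of each interval.
polygon_points = [((lower + upper) / 2, f)
                  for (lower, upper), f in zip(intervals, frequencies)]

# Ogive: cumulative percent plotted at the upper boundary of each interval.
cumulative = accumulate(frequencies)
ogive_points = [(upper, 100 * c / total)
                for (_, upper), c in zip(intervals, cumulative)]

print(polygon_points)
print(ogive_points)
```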
bar chart, or bar graph
a graphical display used to summarize the frequency of discrete and categorical data that are distributed in whole units or classes.
pie chart
a graphical display in the shape of a circle that is used to summarize the relative percent of discrete and categorical data into sectors.
Measures of central tendency
statistical measures for locating a single score that is most representative or descriptive of all scores in a distribution.
population size
the number of individuals who constitute an entire group or population. The population size is represented by a capital N.
sample size
the number of individuals who constitute a subset of those selected from a larger population. The sample size is represented by a lowercase n.
population mean
the sum of N scores (x) divided by N
sample mean
the sum of n scores (x) divided by n
weighted mean
(denoted Mw) is the combined mean of two or more groups of scores in which the number of scores in each group is disproportionate or unequal.
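A short sketch of the mean and the weighted mean follows. It assumes the usual weighted mean formula, the sum of each group mean times its group size divided by the total number of scores; the function names and the two groups are made up for illustration.

```python
def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def weighted_mean(group_means, group_sizes):
    """Combined mean of groups with unequal sizes: sum(M * n) / sum(n)."""
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / sum(group_sizes)

# Two made-up groups of unequal size.
group_a = [2, 4, 6]   # mean 4.0, n = 3
group_b = [10]        # mean 10.0, n = 1

m_w = weighted_mean([mean(group_a), mean(group_b)],
                    [len(group_a), len(group_b)])
print(m_w)                      # weighted mean of the two groups
print(mean(group_a + group_b))  # same value: mean of all scores pooled
```

Simply averaging the two group means would give 7.0, which overstates the influence of the smaller group; the weighted mean of 5.5 matches the mean of all four scores pooled together.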
median
the middle value in a distribution of data listed in numeric order
mode
the value in a data set that occurs most often or most frequently
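The median and mode can be checked with Python's standard `statistics` module. The scores are made up. For an even number of scores, `statistics.median` averages the two middle values, which is a common convention and not something the cards above specify.

```python
import statistics

# Made-up scores, for illustration only.
scores = [3, 1, 4, 1, 5, 9, 2]

ordered = sorted(scores)                 # scores listed in numeric order
median = statistics.median(scores)       # middle value of the ordered scores
modes = statistics.multimode(scores)     # every value tied for most frequent

# With an even number of scores, the two middle values are averaged.
even_median = statistics.median([1, 2, 3, 4])

print(ordered, median, modes, even_median)
```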
normal distribution
(also called the symmetrical, Gaussian, or bell-shaped distribution) is a theoretical distribution in which scores are symmetrically distributed above and below the mean, the median, and the mode at the center of the distribution.
skewed distribution
a distribution of scores that includes outliers or scores that fall substantially above or below most other scores in a data set
positively skewed distribution
a distribution of scores that includes one or a few scores that are substantially larger (toward the right tail in a graph) than most other scores.
negatively skewed distribution
a distribution of scores that includes one or a few scores that are substantially smaller (toward the left tail in a graph) than most other scores.
modal distribution
a distribution of scores in which one or more scores occur most often or most frequently.
unimodal distribution
a distribution of scores in which one score occurs most often or most frequently. A unimodal distribution has one mode.
bimodal distribution
a distribution of scores in which two scores occur most often or most frequently. A bimodal distribution has two modes.
multimodal distribution
a distribution of scores in which more than two scores occur most often or most frequently. A multimodal distribution has more than two modes.
nonmodal distribution
(also called a rectangular distribution) is a distribution of scores in which all scores occur at the same frequency. A nonmodal distribution has no mode.
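The modal categories above can be told apart by counting how many scores tie for the highest frequency. This is a minimal sketch; the function name `classify_modality` and the sample data are hypothetical, and treating a data set with a single distinct value as unimodal is an assumption.

```python
from collections import Counter

def classify_modality(scores):
    """Return (label, modes) based on how many scores tie for most frequent."""
    counts = Counter(scores)
    top = max(counts.values())
    modes = sorted(value for value, c in counts.items() if c == top)
    # Every distinct score occurs equally often: no mode (rectangular).
    if len(counts) > 1 and len(modes) == len(counts):
        return "nonmodal", []
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes

print(classify_modality([1, 2, 2, 3]))
print(classify_modality([1, 1, 2, 3, 3]))
print(classify_modality([1, 1, 2, 2, 3, 3, 4]))
print(classify_modality([1, 2, 3, 4]))
```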
How to determine lower and upper limits for a frequency distribution
LL (lower limit): multiply the bin width by the bin position
UL (upper limit): LL1 + (width - 1)
LL2 + (width - 1)…
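One way to read the rule above is sketched below. It assumes each upper limit is the lower limit plus (width - 1), that bin positions are counted from 0, and that scores are whole numbers; the function name `interval_limits` is hypothetical.

```python
def interval_limits(width, num_intervals):
    """List (lower limit, upper limit) pairs for equal-width bins."""
    limits = []
    for position in range(num_intervals):  # assumption: positions start at 0
        lower = width * position           # LL: bin width times bin position
        upper = lower + (width - 1)        # UL: assumed to be LL + (width - 1)
        limits.append((lower, upper))
    return limits

print(interval_limits(10, 3))
```

With a width of 10 and three bins, this gives 0-9, 10-19, and 20-29, so no interval overlaps the next.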