Data Analysis Flashcards
Quantitative or numerical variables
Result is a number (age, height, etc.)
Categorical or nonnumerical variables
Result is something other than a number (eye color, person voted for, etc.)
Frequency or count
Number of times a particular value of a variable appears in the data
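A minimal sketch of tallying frequencies in Python, assuming a small hypothetical list of eye colors (not from the cards). `collections.Counter` counts how many times each distinct value appears.

```python
from collections import Counter

# Hypothetical sample data: eye colors recorded for ten people
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "hazel", "brown", "blue", "green"]

# Counter maps each distinct value to the number of times it appears
frequencies = Counter(eye_colors)

for color, count in frequencies.most_common():
    print(color, count)  # brown 4, blue 3, green 2, hazel 1
```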
Relative frequency
Frequency of a value divided by the total number of data values (expressed as a fraction, decimal, or percent)
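A short sketch, using the same kind of hypothetical eye-color data, that divides each frequency by the total count and shows the result as a fraction, a decimal, and a percent. Using `fractions.Fraction` is just a choice to keep the fractions exact.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical sample data: eye colors recorded for ten people
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "hazel", "brown", "blue", "green"]
total = len(eye_colors)
counts = Counter(eye_colors)

# Relative frequency = frequency of the value / total number of data values
relative = {value: Fraction(count, total) for value, count in counts.items()}

for value, rel in relative.items():
    # The same quantity as a fraction, a decimal, and a percent
    print(value, rel, float(rel), f"{float(rel):.0%}")
```

The relative frequencies of all values always add up to 1 (100%).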
Histograms (4 things)
Show data grouped into intervals (often as relative frequencies in percent). Unlike bar graphs, there are NO gaps between adjacent bars, so a gap indicates no data in that interval. Useful for identifying the shape and spread of the data.
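A rough text-only sketch of how a histogram groups data into equal-width intervals, assuming hypothetical ages and a bin width of 10. The empty [40, 50) interval is what would appear as a gap between bars.

```python
# Hypothetical data: ages of 12 people
ages = [12, 15, 17, 21, 22, 24, 25, 31, 33, 38, 52, 58]

bin_width = 10
start = 10  # assumption: the first interval is [10, 20)

# Number of intervals needed to reach the largest value
num_bins = (max(ages) - start) // bin_width + 1
counts = [0] * num_bins
for age in ages:
    # Each age falls in exactly one interval [low, low + bin_width)
    counts[(age - start) // bin_width] += 1

for k, count in enumerate(counts):
    low = start + k * bin_width
    share = count / len(ages)  # relative frequency of the interval
    print(f"[{low}, {low + bin_width}): {'#' * count} {share:.0%}")
```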
Measures of central tendency
Goal: find the “center” of the data. Mean, median, and mode.
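A minimal sketch using Python's standard `statistics` module to compute all three measures for the example data set from the weighted-mean card.

```python
import statistics

data = [2, 4, 5, 5, 6, 6, 6, 7, 9]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # middle value of the ordered list
mode = statistics.mode(data)      # most frequent value

print(round(mean, 3), median, mode)  # 5.556 6 6
```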
Weighted mean
Multiply each DIFFERENT value by its frequency, add the products, then divide by the total number of values (the sum of the frequencies), not by the number of different values
Ex: 2, 4, 5, 5, 6, 6, 6, 7, 9
[(2) + (4) + 2(5) + 3(6) + (7) + (9)] / 9 = 50/9 ≈ 5.556
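A sketch of the weighted mean for the example data: each distinct value is weighted by its frequency, and the sum of products is divided by the total count (9), not the number of distinct values (6). Dividing by 6 would give about 8.333, larger than almost every value, which cannot be a center.

```python
from collections import Counter

data = [2, 4, 5, 5, 6, 6, 6, 7, 9]

# Pair each distinct value with its frequency (weight)
weights = Counter(data)  # {2: 1, 4: 1, 5: 2, 6: 3, 7: 1, 9: 1}

# Multiply each distinct value by its frequency and add the products...
weighted_sum = sum(value * freq for value, freq in weights.items())
# ...then divide by the sum of the frequencies (the total number of values)
total = sum(weights.values())
weighted_mean = weighted_sum / total

print(weighted_sum, total, round(weighted_mean, 3))  # 50 9 5.556
```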
Which measure of central tendency is least affected by outliers?
The median
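A quick check of this card, assuming a hypothetical outlier: replacing the largest value 9 with 90 pulls the mean far upward while the median does not move.

```python
import statistics

data = [2, 4, 5, 5, 6, 6, 6, 7, 9]
# Hypothetical outlier: replace the largest value (9) with 90
with_outlier = data[:-1] + [90]

mean_before, mean_after = statistics.mean(data), statistics.mean(with_outlier)
median_before, median_after = statistics.median(data), statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")  # 5.56 -> 14.56
print(f"median: {median_before} -> {median_after}")      # 6 -> 6
```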
Measures of position (6)
Least value, greatest value, median, quartiles, percentiles (99 percentiles divide the data into 100 groups)
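A sketch of these measures with Python's `statistics` module, assuming hypothetical data 1 through 200. `statistics.quantiles(data, n=100)` returns the 99 cut points that split the data into 100 groups; note that its default "exclusive" method is only one of several percentile conventions, so textbook values can differ slightly.

```python
import statistics

data = list(range(1, 201))  # hypothetical data: the values 1 through 200

least, greatest = min(data), max(data)
median = statistics.median(data)

# n=100 gives the 99 percentile cut points
percentiles = statistics.quantiles(data, n=100)

print(least, greatest, median)  # 1 200 100.5
print(len(percentiles))         # 99
```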
How to calculate the 1st and 3rd quartiles
In an ordered list: Q1 is the median of the lower half of the data (the values below the overall median), and Q3 is the median of the upper half (the values above it)
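A minimal sketch of the median-of-halves method. The helper name `quartiles` is hypothetical, and it assumes that for an odd number of values the overall median is left out of both halves (a common classroom convention; other conventions include it).

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: with an odd count, the middle value is excluded from both halves.
    Needs at least two values.
    """
    ordered = sorted(values)  # quartiles require an ordered list
    half = len(ordered) // 2
    lower = ordered[:half]                        # values below the median position
    upper = ordered[half + len(ordered) % 2:]     # values above the median position
    return statistics.median(lower), statistics.median(upper)

q1, q3 = quartiles([2, 4, 5, 5, 6, 6, 6, 7, 9])
print(q1, q3)  # 4.5 6.5
```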
Measures of dispersion (3)
Indicate how spread out the data are
Range, interquartile range, and standard deviation
Interquartile range
Difference between the 3rd and 1st quartiles, Q3 - Q1 (measures the spread of the middle half of the data; less affected by outliers than the range)
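A sketch comparing the range and the IQR, assuming the same median-of-halves quartile convention (odd count excludes the middle value) and a hypothetical outlier of 90. The range jumps from 7 to 88, while the IQR stays at 2.0.

```python
import statistics

def iqr(values):
    # Q3 - Q1, with quartiles taken as medians of the lower/upper halves
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + len(ordered) % 2:])
    return q3 - q1

data = [2, 4, 5, 5, 6, 6, 6, 7, 9]
with_outlier = data[:-1] + [90]  # hypothetical outlier replacing 9

print(max(data) - min(data), iqr(data))                          # 7 2.0
print(max(with_outlier) - min(with_outlier), iqr(with_outlier))  # 88 2.0
```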
How to find the standard deviation (5 steps)
- find the mean
- find the difference between the mean and each value
- square each difference
- find the average of the squared differences (this is the variance; for a sample rather than a whole population, divide by n - 1 instead of n)
- take the square root of the average
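The five steps above can be sketched directly in Python. This version divides by n (population standard deviation), matching the steps as written; the data set is a hypothetical one chosen so the answer comes out to exactly 2.

```python
import math

def standard_deviation(values):
    # Step 1: find the mean
    mean = sum(values) / len(values)
    # Step 2: find the difference between each value and the mean
    diffs = [x - mean for x in values]
    # Step 3: square each difference
    squared = [d * d for d in diffs]
    # Step 4: average the squared differences (the population variance)
    variance = sum(squared) / len(values)
    # Step 5: take the square root
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(standard_deviation(data))  # 2.0
```

Python's `statistics.pstdev` computes the same population value, while `statistics.stdev` uses the n - 1 sample version.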
How many SD is the mean away from the mean?
0 SD (the mean is zero distance from itself)
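A sketch of measuring distance from the mean in SD units (a z-score), assuming hypothetical data and the population SD. The helper `z_score` is a name introduced here for illustration; plugging in the mean itself gives 0.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
mu = statistics.mean(data)       # 5
sigma = statistics.pstdev(data)  # 2.0 (population SD)

def z_score(x):
    # Number of SDs that x lies above (+) or below (-) the mean
    return (x - mu) / sigma

print(z_score(mu), z_score(9), z_score(2))  # 0.0 2.0 -1.5
```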
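Almost all data fall within how many SD of the mean?
3 SD (for roughly bell-shaped data, about 99.7% of values; about 68% fall within 1 SD and 95% within 2 SD)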
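A sketch that checks these percentages against an ideal normal (bell-shaped) distribution using Python's `statistics.NormalDist`. It assumes the data are roughly normal; for strongly skewed data the shares can be quite different. The results come out to roughly 68%, 95%, and 99.7%.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # P(-k < Z < k): share of a normal distribution within k SD of the mean
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {shares[k]:.1%}")
```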