CH13 Univariate and Bivariate Analysis of Quantitative Data Flashcards
the set of different values of a variable that have been observed and how common each value is
distributions
a presentation of the possible values of a variable along with the number of observations for each value that was observed
frequency distribution
a presentation of the possible values of a variable along with the percentage of observations for each value that was observed
relative distribution
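A minimal Python sketch of the two cards above, assuming a small made-up sample (the `observations` list and the sibling-count framing are hypothetical). It counts how often each value occurs, then converts the counts to percentages.

```python
from collections import Counter

# Hypothetical sample: number of siblings reported by 10 respondents.
observations = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]

# Frequency distribution: each observed value with its number of observations.
frequency = Counter(observations)

# Relative distribution: each count as a percentage of all observations.
n = len(observations)
relative = {value: 100 * count / n for value, count in frequency.items()}

for value in sorted(frequency):
    print(f"{value}: {frequency[value]} observations, {relative[value]:.0f}%")
```

The percentages in a relative distribution should add up to 100, which is a quick way to check the table.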
ideal visual presentation for a categorical variable
bar graph
a visualization of the frequency distribution of a continuous variable, in which each bar represents a range of values
histogram
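A rough sketch of how a histogram groups a continuous variable into ranges before drawing bars. The `ages` data, the bin width of 10, and the starting edge of 10 are all assumptions chosen for illustration; a plotting library would normally handle this step.

```python
# Hypothetical ages of 12 respondents (a continuous variable).
ages = [18, 21, 22, 25, 29, 31, 34, 35, 38, 42, 47, 55]

bin_width = 10  # assumed width of each range
start = 10      # assumed lower edge of the first range

# Count how many observations fall in each range [low, low + bin_width).
counts = {}
for age in ages:
    low = start + ((age - start) // bin_width) * bin_width
    counts[low] = counts.get(low, 0) + 1

# Print one text bar per range, from lowest to highest.
for low in sorted(counts):
    print(f"{low} to under {low + bin_width}: {'#' * counts[low]}")
```

Unlike a bar graph, the bars here stand for adjacent ranges of one numeric scale, so their order and width carry meaning.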
a single value that summarizes some feature of a distribution
summary statistic
Summary statistics that indicate the middle of a distribution
measures of central tendency
the sum of all of a variable’s values divided by the number of observations
Use:
1) at ? level (unless skewed)
2) to report ? score (the fulcrum that exactly balances all scores)
3) to anticipate additional statistical analysis
mean
1) interval/ratio
2) typical
the middle value observed when observations are ranked from the lowest to the highest (useful because it is not affected by outliers)
Use:
1) at ? level
2) when ? level has skewed distribution
3) to report ? (always lies at exact center of distribution)
median
1) ordinal
2) interval/ratio
3) central score
the most common value of a variable; the closest measure to an “average” for a nominal variable
Use:
1) at ? level
2) for quick and easy measure for ? variables
3) to report most ? score
mode
1) nominal
2) ordinal/interval/ratio
3) most common score
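A short Python sketch comparing the three measures of central tendency on one made-up data set. The `incomes` list is hypothetical and includes one outlier on purpose, to show why the median is preferred for skewed distributions.

```python
import statistics

# Hypothetical incomes in thousands of dollars; the last value is an outlier.
incomes = [30, 35, 35, 40, 45, 50, 400]

mean = statistics.mean(incomes)      # sum of all values / number of observations
median = statistics.median(incomes)  # middle value once the values are ranked
mode = statistics.mode(incomes)      # most common value

print(f"mean = {mean:.1f}, median = {median}, mode = {mode}")
```

The single outlier pulls the mean well above the median, while the median and mode stay near the bulk of the observations.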
describes the amount by which each observation in a distribution varies, or differs, from the others
variation
3 measures of variation for quantitative variables
Range, the interquartile range, and the standard deviation
The simplest measure of variation; the difference between the maximum and minimum values of a distribution
range (R)
the difference between the third quartile and the first quartile (uses only the middle two quarters of the distribution); avoids the problem created by outliers by showing the range where the middle 50% of cases lie.
interquartile range
the points in a distribution corresponding to the first 25% of the cases, the first 50% of the cases, and the first 75% of the cases.
quartile (Q)
roughly the average distance between the value of each observation and the overall mean; formally, the square root of the average squared difference between the values of a distribution and the distribution’s mean
standard deviation (s)
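A minimal sketch of the three measures of variation using Python's standard `statistics` module. The `scores` data are made up. Quartile cut points depend on the interpolation rule, so values from `statistics.quantiles` may differ slightly from a textbook's hand method.

```python
import statistics

# Hypothetical test scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum.
value_range = max(scores) - min(scores)

# Quartiles: with n=4, quantiles() returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

# Standard deviation as defined on the card: square root of the average
# squared difference from the mean (divides by N).
sd_population = statistics.pstdev(scores)

# Sample version, which divides by N - 1 instead of N.
sd_sample = statistics.stdev(scores)

print(value_range, iqr, sd_population, sd_sample)
```

Whether to divide by N or N - 1 depends on whether the data are treated as a whole population or a sample; the card's wording matches the N version.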
When a variable is normally distributed, about 68% of the cases (roughly 2/3) lie within plus or minus 1 standard deviation of the distribution’s mean, about 95% lie within plus or minus 2 standard deviations, and about 99.7% lie within plus or minus 3 standard deviations; the distribution is always symmetric
normal distribution
General relationship in normal distribution
standard deviation percentages
+- 1s = about 68%
+- 2s = about 95%
+- 3s = about 99.7%
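The percentages on this card can be checked with a short sketch using `statistics.NormalDist` (Python 3.8 or later). It assumes a standard normal curve (mean 0, standard deviation 1); the shares are the same for any normal distribution.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of cases within k standard deviations of the mean, for k = 1, 2, 3.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within +- {k}s: {100 * shares[k]:.1f}%")
```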
Fundamental aspects of a normal curve
Bell Shaped
Unimodal
Symmetrical
Unskewed
Theoretical
Mode, Median, and Mean are equal
a presentation of the joint distribution of two or more variables as a table. The table presents the categories of one variable as rows and the categories of the other variable as columns
cross tabulation
The totals in the bottom row and right-most column of a cross tabulation
marginal frequencies
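A small sketch of a cross tabulation with its marginal frequencies, built with the standard library only. The `records` data and the gender/voted variables are hypothetical.

```python
from collections import Counter

# Hypothetical survey records: (gender, voted) for 8 respondents.
records = [
    ("female", "yes"), ("female", "yes"), ("female", "no"),
    ("male", "yes"), ("male", "no"), ("male", "no"),
    ("female", "yes"), ("male", "yes"),
]

# Cell counts: number of observations for each (row, column) combination.
cell_counts = Counter(records)
rows = sorted({gender for gender, _ in records})
cols = sorted({voted for _, voted in records})

# Marginal frequencies: totals for each row and each column.
row_totals = {r: sum(cell_counts[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(cell_counts[(r, c)] for r in rows) for c in cols}
grand_total = sum(row_totals.values())

# Print the table with row totals on the right and column totals at the bottom.
print("\t".join([""] + cols + ["total"]))
for r in rows:
    cells = [str(cell_counts[(r, c)]) for c in cols]
    print("\t".join([r] + cells + [str(row_totals[r])]))
print("\t".join(["total"] + [str(col_totals[c]) for c in cols] + [str(grand_total)]))
```

The row totals and the column totals each add up to the same grand total, the number of observations.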