2: Descriptive Stats For Distributions Part 1 Flashcards
Data matrices
Data organized into a grid format
- row= case
- column=variable
- cell=value
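A minimal sketch of a data matrix in Python. The column names, the values, and the helper `cell` are made up for illustration and are not from the original notes.

```python
# Hypothetical data matrix: each row is a case, each column is a variable.
columns = ["id", "age", "group"]
rows = [
    [1, 23, "control"],
    [2, 31, "treatment"],
    [3, 27, "control"],
]


def cell(row_index, column_name):
    """Return the value of one variable (column) for one case (row)."""
    return rows[row_index][columns.index(column_name)]


print(cell(1, "age"))  # 31
```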
Frequency tables
Showing the number of cases with each value of a variable
- can also report the % of cases with each value
- works best for categorical or discrete variables
- for quantitative variables that take many values, report counts binned into ranges
- hard to extract more detailed information or easily grasp shape of distribution
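A frequency table with counts and percentages could be sketched like this. The `colors` data and the function name `frequency_table` are illustrative assumptions, not part of the original notes.

```python
from collections import Counter


def frequency_table(values):
    """Map each distinct value to (count, percent of all cases)."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n) for value, count in counts.items()}


# Made-up categorical variable with 8 cases.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "blue"]
table = frequency_table(colors)

for value, (count, pct) in sorted(table.items()):
    print(f"{value:<6} {count:>3} {pct:5.1f}%")
```

Here red appears in 4 of 8 cases (50%), blue in 3 (37.5%), and green in 1 (12.5%).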
Categorical variables (nominal data)
Pie graphs visually represent proportions or %
Stemplots
Graphical display showing all values of a quantitative variable
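A rough sketch of a stemplot, assuming non-negative whole-number scores where the stem is the tens part and the leaf is the ones digit. The `scores` data and the function name `stemplot` are hypothetical.

```python
from collections import defaultdict


def stemplot(values):
    """Return stem-and-leaf lines: stem = value // 10, leaf = value % 10."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Walk every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


scores = [52, 67, 61, 75, 78, 71, 88, 83, 85, 94]
for line in stemplot(scores):
    print(line)
```

Every original value can be read back from the plot, for example stem 7 with leaves 1, 5, 8 stands for 71, 75, 78.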
Frequency histograms
Display the frequency distribution of one variable
-possible values of variable on X
-frequency of each value on Y
Height = frequency
-for continuous variables: values binned into ranges
-as number of measurements increases, a frequency histogram approximates a curved shape
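A minimal text histogram for a continuous variable, assuming equal-width bins. The `heights` data, the bin width of 10, and the function name `binned_frequencies` are illustrative choices only.

```python
def binned_frequencies(values, bin_width, start):
    """Count how many values fall in each bin [lower, lower + bin_width)."""
    counts = {}
    for v in values:
        lower = start + int((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Made-up continuous measurements.
heights = [150.5, 152.0, 158.3, 161.2, 163.9, 164.4, 167.8, 171.0, 172.5, 181.1]
freq = binned_frequencies(heights, bin_width=10, start=150)

# Bar length plays the role of bar height: one '#' per case in the bin.
for lower, count in freq.items():
    print(f"{lower}-{lower + 10}: {'#' * count}")
```

This sketch only lists bins that contain at least one value; an empty bin would simply be missing from the output.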
Normal distribution
Specific shape of frequency:
-bell shaped
-symmetrical
Positive skew
Tail to the right. (Most values on left)
- floor effects: cluster at low end
- high outliers
Negative skew
Tail to left (most values on right)
- ceiling effects (many cluster at high end)
- low outliers
Central tendency
- center of distributions
- mode, median, mean
Describing variability
Spread of distribution
-range, standard deviation
Mode
Most frequently occurring
- near center if normal
- may have multiple modes (e.g. bimodal), in which case the distribution is not normal
- values other than most frequent not considered
- limited application for continuous variables
Preferred when:
- discrete data
- nominal scale
- multimodal
Median
Value at midpoint
- resistant to outliers
- takes more info into account than mode, but still ignores magnitudes
Preferred when:
- skewed distribution
- ordinal scale
Mean
Average of all values
- sensitive to outliers
- most informative
Preferable when:
- interval or ratio
- normal distribution
Skewed distribution
Median remains stable near the center
-outliers pull the mean in the direction of the tail
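A small check of how one high outlier moves the mean but barely moves the median, using Python's standard `statistics` module. The `incomes` values are made up for illustration.

```python
import statistics

# Roughly symmetric made-up data.
incomes = [30, 32, 35, 36, 38, 40, 41]
# Same data plus one high outlier (creates a positive skew).
with_outlier = incomes + [400]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)  # 36 and 36
print(mean_after, median_after)    # 81.5 and 37.0
```

The mean jumps from 36 to 81.5, toward the right tail, while the median only shifts from 36 to 37.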
Multimodal
Modes may be most informative
-mean and median may not represent the typical value
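A sketch of a bimodal case, assuming Python 3.8 or later for `statistics.multimode`. The `ratings` data is hypothetical.

```python
import statistics

# Made-up bimodal data: values cluster at 2 and at 9.
ratings = [2, 2, 2, 3, 5, 8, 9, 9, 9]

modes = statistics.multimode(ratings)   # every value tied for most frequent
median = statistics.median(ratings)
mean = statistics.mean(ratings)

print(modes)   # [2, 9]
print(median)  # 5
print(round(mean, 2))
```

Both the median (5) and the mean (about 5.44) land between the two clusters, on values that rarely occur, so the two modes describe the typical values better here.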
Range
Max-min
- very sensitive to outliers
- values between extremes not accounted for
Preferred if Max and min relevant
Interquartile range
Defines the range of the middle 50% of values
- Q1: median of lower half
- Q3: median of upper half
- Q1 and Q3 are the boundaries of the middle 50%
- very resistant to outliers
- more info than range, but still ignores the magnitude of most values
IQR = Q3 - Q1
Simple rough estimate
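A sketch comparing the range and the IQR, using the median-of-halves definition of Q1 and Q3. One assumption to flag: when the number of scores is odd, this version leaves the overall median out of both halves, and other conventions exist. The `scores` data and the function name `quartiles` are illustrative.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # Assumption: with an odd count, the middle value is excluded from both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return statistics.median(lower), statistics.median(upper)


# Made-up scores with one high outlier (50).
scores = [1, 3, 4, 6, 7, 8, 10, 12, 50]

value_range = max(scores) - min(scores)
q1, q3 = quartiles(scores)
iqr = q3 - q1

print(value_range)  # 49
print(q1, q3, iqr)  # 3.5 11.0 7.5
```

The single outlier inflates the range to 49, while the IQR stays at 7.5.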
Standard deviation
Variance (S^2): average squared deviation of each score from the mean
Standard deviation (S) is the square root of the variance
Variance = sum((X - M)^2) / (N - 1)
-most widely used
-sensitive to outliers
-takes all scores into account
Preferred in most cases if normal distribution
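The variance formula with N - 1 in the denominator can be written out directly and compared against the standard library's `statistics.variance` and `statistics.stdev`, which also use N - 1. The `scores` data and the function name `sample_variance` are made up for this sketch.

```python
import math
import statistics


def sample_variance(scores):
    """sum((X - M)^2) / (N - 1), where M is the mean of the scores."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

variance = sample_variance(scores)
std_dev = math.sqrt(variance)

print(round(variance, 3))  # 4.571
print(round(std_dev, 3))   # 2.138
```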
Five number summary
Max score, third quartile (Q3), median, first quartile (Q1), min score
- median: resistant measure of center
- quartiles: resistant measure of spread
- max and min: sensitive measure of outer limits of spread
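A sketch of a five number summary, assuming quartiles are taken as medians of the lower and upper halves, with the overall median left out of both halves when the count is odd. The `scores` data and the function name `five_number_summary` are illustrative.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # Assumption: with an odd count, the middle value is excluded from both halves.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return (
        s[0],
        statistics.median(lower),
        statistics.median(s),
        statistics.median(upper),
        s[-1],
    )


# Made-up scores with one high outlier (50).
scores = [1, 3, 4, 6, 7, 8, 10, 12, 50]
summary = five_number_summary(scores)
print(summary)  # (1, 3.5, 7, 11.0, 50)
```

Only the max reacts to the outlier; the median and quartiles stay close to the bulk of the scores. These five numbers are the same ones a box plot draws.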
Box plots
Values of the variable on the Y axis
Ends of the box are the first and third quartiles
Line inside the box is the median
Whiskers extend to the max and min scores