Basic Descriptive Data Analysis Flashcards
frequency distribution table
rank-ordered scores showing the number of times each value occurred
what does a frequency distribution table contain for categorical data
raw and relative frequency
what does a frequency distribution table contain for continuous data
raw, relative, cumulative
raw frequency
how many observations fall into that category or interval, usually a whole number
example: 5 of the 27 people were age 0-10 (raw frequency = 5)
relative frequency
how the number of data points in a category or interval relates to the entire sample, in %
example: 5/27 = 18.5 %
cumulative frequency
running total of the relative frequency (%) up to and including the range you are looking at
class intervals (BIN)
defined range limits in which data is grouped
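The raw, relative, and cumulative frequency cards above can be sketched in a few lines of Python. The ages and the bin limits below are made up for illustration (chosen so that 5 of 27 people fall in 0-10, as in the example card), and `frequency_table` is a hypothetical helper name, not a library function.

```python
# Hypothetical ages for 27 people (invented data for illustration only).
ages = [3, 5, 7, 8, 9, 12, 14, 15, 18, 19, 21, 22, 24, 25, 27, 28,
        31, 33, 35, 36, 38, 41, 44, 47, 52, 55, 58]

# Assumed class intervals (bins), inclusive on both ends.
bins = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]

def frequency_table(values, bins):
    """Return rows of (label, raw count, relative %, cumulative %)."""
    n = len(values)
    running = 0
    rows = []
    for low, high in bins:
        raw = sum(low <= v <= high for v in values)  # raw frequency
        running += raw                               # running total of counts
        rows.append((f"{low}-{high}", raw, 100 * raw / n, 100 * running / n))
    return rows

table = frequency_table(ages, bins)
for label, raw, rel, cum in table:
    print(f"{label:>6}  raw={raw:2d}  relative={rel:5.1f}%  cumulative={cum:5.1f}%")
```

The last row's cumulative % should always reach 100 when the bins cover every value.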
categorical variables
separate categories with no numeric relationship to one another, though some (ordinal) can still be put in ranking order
what are two major statistical outputs of categorical data
frequency and percentage
what constitutes a normal distribution curve
bell shaped curve
symmetrical around the mean
what is the statistical significance of normal distribution?
many datasets follow a bell-shaped curve that is symmetrical around the mean
what does bimodal mean
bad, “two humps”
suggestive of 2 different populations
left skewed
negatively skewed
tail is to the left
right skewed
positively skewed
tail is to the right
stem and leaf plot
used with continuous data
good for showing individual data
bad for large amounts of data
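A minimal sketch of a stem and leaf plot, assuming whole-number scores where the tens digit is the stem and the ones digit is the leaf. The scores and the helper name `stem_and_leaf` are invented for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group integers by tens digit (stem), keeping ones digits (leaves) in order."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [12, 15, 21, 23, 23, 28, 34, 35, 41]  # hypothetical scores
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Every individual value can be read back from the plot, which is why it works well for small data sets and gets unwieldy for large ones.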
histogram
continuous data
we can get distribution curves with a histogram
good for showing midpoint of data and large amounts of data
bad for showing individual data
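A rough text-only sketch of a histogram, assuming equal-width bins. The heights and the helper name `bin_counts` are hypothetical; note that this simple version skips empty bins, which a real plotting tool would still draw.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall in each equal-width bin, keyed by bin start."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return {start: counts[start] for start in sorted(counts)}

heights_cm = [151, 158, 160, 162, 163, 165, 166, 167, 168, 170, 171, 174, 179, 183]
hist = bin_counts(heights_cm, 10)
for start, count in hist.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Only the count per bin survives, so the overall shape is visible but the individual values are not.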
mean
equals sum of all values / total number of values
when is mean commonly used
used for measuring central tendency
when is mean less helpful
when outliers present or with skewed distribution
median
equals the middle value of the ranked data (the average of the two middle values when the count is even)
when is median most helpful
more helpful than mean when outliers present or with skewed distribution
mode
equals value that occurs most often
less commonly used compared to mean and median
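The effect of an outlier on the mean versus the median can be checked with Python's standard `statistics` module. The numbers are invented for illustration.

```python
import statistics

values = [30, 32, 35, 35, 38, 40, 42]  # hypothetical data, no outlier
with_outlier = values + [400]          # same data plus one extreme score

mean_plain = statistics.mean(values)
median_plain = statistics.median(values)
mode_plain = statistics.mode(values)

mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print("no outlier:   mean", mean_plain, "median", median_plain, "mode", mode_plain)
print("with outlier: mean", mean_out, "median", median_out)
```

The single extreme score moves the mean a long way while the median barely changes, which is the reason the median is preferred when outliers are present.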
what is the relationship between mean, median and mode with symmetrical data
mean = median = mode
what is the relationship between mean, median and mode with right skewed data
mode < median < mean (the mean is pulled toward the right tail)
what is the relationship between mean, median and mode with left skewed data
mode > median > mean (the mean is pulled toward the left tail)
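A small check of the skew cards above, using invented data. The ordering of mean, median, and mode under skew is a rule of thumb that holds for typical single-peaked data like this, not a guarantee for every data set.

```python
import statistics

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]  # long tail to the right
left_skewed = [16 - v for v in right_skewed]     # mirror image, tail to the left

def center(values):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(values), statistics.median(values), statistics.mean(values)

r_mode, r_median, r_mean = center(right_skewed)
l_mode, l_median, l_mean = center(left_skewed)

print("right skewed: mode", r_mode, "median", r_median, "mean", r_mean)
print("left skewed:  mode", l_mode, "median", l_median, "mean", l_mean)
```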
range
nominal and ordinal
difference between the 2 extreme scores (max - min)
can be listed as an interval (min to max); misleading value if outliers are present
percentiles
describes a score's position within the distribution
interquartile range
Q3 - Q1
more helpful than range when outliers present
Q1
value that occurs at the first quarter mark (25%)
Q2
value that occurs at the second quarter mark = median
Q3
value that occurs at the third quarter mark (75%)
box plot
summary in 5 numbers
min, Q1, Q2/median, Q3, max
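A sketch of the five-number summary and the interquartile range using `statistics.quantiles` (Python 3.8+). Quartile conventions differ between textbooks and software; `method="inclusive"` is an assumption made here, and the data are invented.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the 'inclusive' quartile convention."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

data = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # hypothetical scores
lo, q1, q2, q3, hi = five_number_summary(data)
iqr = q3 - q1
print("min", lo, "Q1", q1, "Q2", q2, "Q3", q3, "max", hi, "IQR", iqr)

# Add one extreme score: the range changes a lot, the IQR only slightly.
data_out = data + [100]
lo_o, q1_o, _, q3_o, hi_o = five_number_summary(data_out)
range_out = hi_o - lo_o
iqr_out = q3_o - q1_o
print("with outlier: range", range_out, "IQR", iqr_out)
```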
pros and cons of box plots
pro: summarizes a large set of numbers
con: does not keep exact values
con: loses information within the data
standard deviation
typical distance of each point from the mean: the square root of the average squared distance from the mean
what is standard deviation helpful for
distinguishing statistically significant data points from random fluctuations
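A worked sketch of the standard deviation as the square root of the average squared distance from the mean, shown next to the mean absolute deviation, which is a different (usually smaller) measure. The data are invented, and the population form (dividing by n) is assumed; the sample form divides by n - 1.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
mean = statistics.mean(values)

# Population standard deviation: sqrt of the average squared deviation.
variance = sum((v - mean) ** 2 for v in values) / len(values)
pop_sd = math.sqrt(variance)

# Mean absolute deviation: average absolute distance from the mean.
mean_abs_dev = sum(abs(v - mean) for v in values) / len(values)

print("population SD:", pop_sd)
print("statistics.pstdev:", statistics.pstdev(values))
print("sample SD (n - 1):", statistics.stdev(values))
print("mean absolute deviation:", mean_abs_dev)
```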
proportional area under a normal curve: 68 %
within 1 standard deviation from the mean
proportional area under a normal curve: 95 %
within 2 standard deviations from the mean
proportional area under a normal curve: 99.7 %
within 3 standard deviations from the mean
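The 68 / 95 / 99.7 % areas can be checked against a standard normal curve with `statistics.NormalDist` (Python 3.8+). The helper name `area_within` is hypothetical. The card values are rounded: 2 standard deviations gives about 95.4 %, and exactly 95 % corresponds to about 1.96 standard deviations.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of a normal curve lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.1f} %")
```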
coefficient of variation
unitless measure that depicts the size of the SD relative to the mean (SD / mean)
especially helpful when comparing the variation of 2 or more variables measured in different units
often expressed as a %
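A sketch of the coefficient of variation for two variables measured in different units. The heights and weights are invented, `coefficient_of_variation` is a hypothetical helper, and the sample standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Return SD / mean as a percentage (sample SD assumed)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

heights_cm = [160, 165, 170, 175, 180]  # hypothetical heights
weights_kg = [55, 60, 70, 80, 85]       # hypothetical weights

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"height CV: {cv_height:.1f} %")
print(f"weight CV: {cv_weight:.1f} %")
```

Because the units cancel, the two percentages can be compared directly even though cm and kg cannot.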