lecture 5 Flashcards
what are guidelines for choosing K (the number of bins)
K = [10/3 log10 (n)] +1
log10 = log base ten
n = total number of observations
K is usually rounded to an integer
gives a rough indication of a reasonable number of bins, not an exact answer
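quick sketch of the rule above in Python. Assumptions: the square brackets are read as plain grouping, the result is rounded to the nearest integer (the card only says "often rounded"), and the function name `suggested_bin_count` is made up for illustration.

```python
import math


def suggested_bin_count(n: int) -> int:
    """Rule of thumb from the card: K = (10/3) * log10(n) + 1, rounded."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = (10 / 3) * math.log10(n) + 1
    # Assumption: round to the nearest integer; a course may round up or down instead.
    return round(k)


print(suggested_bin_count(100))  # (10/3) * 2 + 1 = 7.67, rounds to 8
```

treat the result as a starting point and adjust so the bins are easy to read.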
what makes a histogram easier to read
when bin endpoints fall on round tick values on the x axis
use more interpretable intervals
can choose the scale that best fits (consider the range of the data)
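how does a density scale change a histogram
vertical scale changes from counts to density
to get the proportion of observations in a bin, multiply the bar height by the bin width (interval width)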
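small sketch of the density scaling, assuming height = (count / n) / bin width, so that height times bin width gives the proportion in the bin. The bin edges and the helper name `density_heights` are hypothetical; the data are the five values from the sample mean card.

```python
def density_heights(data, edges):
    """Return the density-scale height of each bin.

    height = (count / n) / bin_width, so height * bin_width is the
    proportion of observations that fall in the bin.
    """
    n = len(data)
    last = len(edges) - 2
    heights = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Bins are [lo, hi); the last bin also includes its right edge.
        count = sum(1 for x in data if lo <= x < hi or (i == last and x == hi))
        heights.append(count / n / (hi - lo))
    return heights


data = [3, 5, 5, 6, 8]
edges = [2, 4, 6, 8]  # hypothetical bins of width 2
heights = density_heights(data, edges)

# Multiplying each height by its bin width recovers the proportions.
proportions = [h * (hi - lo) for h, lo, hi in zip(heights, edges[:-1], edges[1:])]
print(heights)
print(proportions)
```

the proportions add up to 1, which is the point of the density scale.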
what does a histogram show
shows spread and approximate min/max (axes need labels)
only see the data grouped into bins, not every value in the set
what can we deduce from a histogram
the number (or the proportion) of observations in a given interval
can add up heights of rectangles within the interval
tricky if only part of a rectangle is included
what do we lose with histograms
know nothing about the distribution of observations within a rectangle itself
lose access to the individual data values because we only see a frequency summary
what happens when all of the observations in the rectangle are included in the interval - histograms
add entire rectangle height
what happens when none of the observations in the rectangle are included in the interval - histograms
do not add rectangle at all
what happens when only a fraction of the obs in the rectangle is included in the interval - histograms
add the same fraction of the rectangle height
(e.g. half the height if the interval covers half the bin)
can only make an approximation
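the three cases (whole rectangle, none of it, a fraction of it) can be sketched in one function. Assumptions: observations are treated as spread evenly inside each bin, and the bin edges, counts, and the name `estimate_count` are hypothetical.

```python
def estimate_count(edges, counts, a, b):
    """Approximate the number of observations in [a, b] from bin counts.

    Each bin contributes its count times the fraction of the bin that
    overlaps [a, b]: 1 for a fully covered bin, 0 for a bin outside.
    """
    total = 0.0
    for lo, hi, count in zip(edges[:-1], edges[1:], counts):
        overlap = max(0.0, min(b, hi) - max(a, lo))
        total += count * overlap / (hi - lo)
    return total


edges = [0, 10, 20, 30]  # hypothetical bins
counts = [4, 10, 6]      # hypothetical frequencies

# Interval [10, 25]: all of the middle bin plus half of the last bin.
print(estimate_count(edges, counts, 10, 25))  # 10 + 0.5 * 6 = 13.0
```

the answer is only an estimate, since the histogram does not say where the observations sit inside a bin.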
what are numerical measures of central tendency
try to capture where the data are situated
where on the x axis the data are located
descriptive statistics
name measures of central tendency
mean/average
median
mode
what is the sample mean
sample average
add all values up and divide by the total number of obs
ex = 3,5,5,6,8
(3+5+5+6+8)/5 = 5.4
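the same calculation as a minimal Python sketch (the function name `sample_mean` is just for illustration):

```python
def sample_mean(values):
    """Add all observations up and divide by the number of observations."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


print(sample_mean([3, 5, 5, 6, 8]))  # 27 / 5 = 5.4
```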
what happens to the sample mean when there is one very large value in a set of small values
mean is pulled away from the smaller values towards the larger one
what happens to the sample mean when there is one very small value in a set of large values
mean is pulled away from the larger values towards the smaller one
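a small illustration of both pulls, using Python's standard `statistics.mean`. The extreme values (100 and 1) and the second data set are made-up numbers for illustration only.

```python
from statistics import mean

small_values = [3, 5, 5, 6, 8]
with_large = small_values + [100]  # one made-up very large value

large_values = [95, 97, 98, 99, 101]
with_small = large_values + [1]    # one made-up very small value

# One large value drags the mean above every one of the original values.
print(mean(small_values), mean(with_large))

# One small value drags the mean below every one of the original values.
print(mean(large_values), mean(with_small))
```

in both cases a single observation moves the mean outside the range of the other values.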
what are outliers
extreme obs in sample
what do outliers do
distort sample mean as a measure of central tendency
must be careful when outliers present
is sample mean a good measure of central tendency
maybe not in presence of outliers
what is the sample median
denoted m
the middle value once the observations are ordered from smallest to largest
about 50% of the data points in the sample are below the median and 50% are above
is the median always super simple
no
if n is odd = there is a single middle value
but if n is even = no single middle value, so more complicated
if there are ties in the data = multiple identical numerical values = the values around the middle may all be the same
have specific rules for these special cases
what happens if n is odd - median
m = the value in the ((n+1)/2)th position of the ordered data
what happens if n is even - median
m = average of the values in the (n/2)th and (n/2 + 1)th positions of the ordered data
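the odd and even rules can be sketched as follows. Positions on the cards count from 1, while Python lists count from 0, hence the `- 1`; the function name `sample_median` is just for illustration.

```python
def sample_median(values):
    """Median by position in the ordered data (positions count from 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values in positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(sample_median([3, 5, 5, 6, 8]))  # n = 5, position 3 -> 5
print(sample_median([8, 3, 6, 5]))     # n = 4, average of 5 and 6 -> 5.5
```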
is the sample median heavily influenced by outliers
no
median usually changes little or not at all when outliers are present
median is robust to outlying observations
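a quick check of robustness with the standard `statistics` module. The outlier 800 is a made-up value that replaces the largest observation from the sample mean card.

```python
from statistics import mean, median

original = [3, 5, 5, 6, 8]
with_outlier = [3, 5, 5, 6, 800]  # largest value replaced by a made-up outlier

# The middle position still holds the same value, so the median is unchanged.
print(median(original), median(with_outlier))

# The mean, by contrast, moves a long way towards the outlier.
print(mean(original), mean(with_outlier))
```

the median only depends on the middle of the ordered data, so changing an extreme value leaves it alone.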