Lecture 5 Flashcards
What are the guidelines for choosing K (the number of bins in a histogram)?
K = (10/3) log10(n) + 1
log10 = log base ten
n = total number of observations
K is usually rounded to an integer
Gives an indication of roughly the right number of bins (a guideline, not a strict rule)
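A minimal sketch of the K guideline as a function. The name `suggested_bins` is hypothetical, and rounding to the nearest integer is an assumption (the card only says K is "often rounded").

```python
import math

def suggested_bins(n):
    """Hypothetical helper: K = (10/3) * log10(n) + 1, rounded to the nearest integer."""
    if n < 1:
        raise ValueError("need at least one observation")
    return round((10 / 3) * math.log10(n) + 1)

for n in (20, 100, 1000):
    print(n, suggested_bins(n))  # prints 20 5, then 100 8, then 1000 11
```

For n = 100, log10(100) = 2, so K = 20/3 + 1 ≈ 7.67, which rounds to 8 bins.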
What makes a histogram easier to read?
Bin ends that fall on interval points (tick marks) on the x-axis
Intervals that are easy to interpret (e.g. round-number widths)
Choosing the scale that best fits the data (consider the range of the data)
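One possible way to get bin ends that land on readable interval points, sketched under our own assumptions (not from the lecture): the raw width (range / K) is rounded up to 1, 2, 5 or 10 times a power of ten, and the first edge starts at a multiple of that width. The helper name `nice_bin_edges` is hypothetical.

```python
import math

def nice_bin_edges(data, k):
    """Hypothetical helper: bin edges on round-number interval points.

    Assumes the data have a nonzero range and k >= 1.
    """
    lo, hi = min(data), max(data)
    raw_width = (hi - lo) / k
    # Power of ten at or just below the raw width, e.g. 7 -> 1, 23 -> 10
    power = 10 ** math.floor(math.log10(raw_width))
    # Round the width up to the first of 1, 2, 5, 10 times that power
    for step in (1, 2, 5, 10):
        width = step * power
        if width >= raw_width:
            break
    # Start at a multiple of the width at or below the minimum
    start = math.floor(lo / width) * width
    n_bins = math.ceil((hi - start) / width)
    return [start + i * width for i in range(n_bins + 1)]

data = [3, 5, 5, 6, 8, 12, 17, 23, 31, 38]
print(nice_bin_edges(data, 5))  # [0, 10, 20, 30, 40]
```

Here K = 5 asks for a width of 7, which is rounded up to 10, so only 4 bins come out: a slightly different bin count in exchange for edges at 0, 10, 20, 30, 40.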
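How does using density change a histogram?
The vertical scale changes: each height becomes a density (proportion of observations in the bin divided by the bin width)
Multiply a height by its bin width (interval width) to get the proportion of observations in that bin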
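A small sketch of density heights, assuming density = count / (n × bin width). The counts and unequal widths are made-up illustration values, and `density_heights` is a hypothetical name.

```python
def density_heights(counts, widths):
    """Density height of each bin = count / (n * bin width)."""
    n = sum(counts)
    return [c / (n * w) for c, w in zip(counts, widths)]

counts = [4, 6, 10]   # observations per bin (n = 20), illustrative only
widths = [5, 10, 20]  # unequal bin widths, illustrative only
heights = density_heights(counts, widths)
print(heights)  # [0.04, 0.03, 0.025]

# Height times bin width recovers the proportion of observations in each bin
proportions = [h * w for h, w in zip(heights, widths)]
print(proportions)  # approximately [0.2, 0.3, 0.5], summing to about 1
```

With unequal widths, the widest bin holds the most observations (10 of 20) yet has the lowest density, because its count is spread over a wider interval.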
What does a histogram show?
The spread of the data and its min/max (the axes need labels to read these)
Only the data grouped into bins (categories), not the whole set of individual values
What can we deduce from a histogram?
The number (or the proportion) of observations in a given interval
Add up the heights of the rectangles within the interval
Tricky if only part of a rectangle is included
What do we lose with histograms?
We know nothing about how the observations are distributed within each rectangle
We lose access to the fine (individual) data because we only see a frequency summary
Histograms: what happens when all of the observations in a rectangle are included in the interval?
Add the entire rectangle height
Histograms: what happens when none of the observations in a rectangle are included in the interval?
Do not add the rectangle at all
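Histograms: what happens when only a fraction of the observations in a rectangle is included in the interval?
Add the same fraction of the rectangle height (e.g. if half a bin is in the interval, add half its height)
Can only make an approximation, since we don't know how the observations are spread within the bin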
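The three rules (all, none, or a fraction of a rectangle) can be sketched as one function. This is a minimal sketch assuming a frequency histogram given as bin edges plus counts; `estimate_count` and the example numbers are hypothetical.

```python
def estimate_count(edges, counts, a, b):
    """Estimate how many observations fall in [a, b] from binned counts.

    Each bin contributes the fraction of its width that overlaps [a, b]
    (all, none, or part of its height). This assumes observations are
    spread evenly inside each bin, so the result is only an approximation.
    """
    total = 0.0
    for left, right, count in zip(edges[:-1], edges[1:], counts):
        overlap = max(0.0, min(b, right) - max(a, left))
        total += count * overlap / (right - left)
    return total

edges = [0, 10, 20, 30]
counts = [4, 6, 10]
print(estimate_count(edges, counts, 5, 20))  # 8.0: half of bin 1 (2) + all of bin 2 (6)
```

The first bin is half inside [5, 20], so it contributes half of its 4 observations; the second is fully inside; the third is not included at all.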
What are numerical measures of central tendency?
They try to capture where the data are situated, i.e. where on the x-axis the data are centered
They are descriptive statistics
Name the measures of central tendency.
mean/average
median
mode
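For illustration, the median and mode of the same sample used on the sample-mean card (3, 5, 5, 6, 8), using Python's standard `statistics` module.

```python
import statistics

sample = [3, 5, 5, 6, 8]
print(statistics.median(sample))  # 5: middle value of the sorted data
print(statistics.mode(sample))    # 5: most frequent value
```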
What is the sample mean?
The sample average
Add up all the observations and divide by the total number of observations: x̄ = (x1 + x2 + ... + xn) / n
Example: for 3, 5, 5, 6, 8
(3 + 5 + 5 + 6 + 8)/5 = 27/5 = 5.4
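The definition translates directly into code. A minimal sketch; `sample_mean` is a hypothetical name.

```python
def sample_mean(values):
    """Sample mean: sum of the observations divided by how many there are."""
    if not values:
        raise ValueError("the sample mean of an empty sample is undefined")
    return sum(values) / len(values)

print(sample_mean([3, 5, 5, 6, 8]))  # 5.4
```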
What happens to the sample mean when a set of small values contains one very large value?
The mean is pulled away from the smaller values toward the large one
What happens to the sample mean when a set of large values contains one very small value?
The mean is pulled away from the larger values toward the small one
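A quick check of both cards with Python's `statistics.mean`. The extreme values 100 and 1, and the large-value sample, are made up for illustration.

```python
import statistics

small = [3, 5, 5, 6, 8]
print(statistics.mean(small))          # 5.4
print(statistics.mean(small + [100]))  # about 21.17: pulled up toward 100

large = [50, 52, 55, 58]
print(statistics.mean(large))          # 53.75
print(statistics.mean(large + [1]))    # 43.2: pulled down toward 1
```

A single extreme value moves the mean well away from where most of the observations sit.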
What are outliers?
Extreme observations in a sample