Lecture 4 Flashcards
describe graphical methods
quantitative data = recorded on a meaningful numerical scale
can graph - summarize, describe and detect patterns in data
Methods describe shape of the data
no inference just summarize and display
shape of distribution
name 2 graphical methods for quantitative data
histograms
boxplots
describe histograms - set up
values of quantitative variable (single) separated into class intervals = each has same width
intervals = scale of horizontal axis
frequency or relative freq of obs in each class interval determined (count what falls into each interval) - each bar spans lower–>upper limit of its interval
Vertical bar placed Over each class interval - height of bar equal to class freq or relative freq
usually same width but can be different sometimes = more complicated
describe histograms of body temps
need labels - x, y and title
see where bulk of obs are
can make different intervals - can be better for data
what happens when we use smaller intervals (less wide bins)
narrower bins = lower max bar height (fewer obs fall into each bin)
more bins = more detailed picture
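The effect of bin width on bar height can be checked numerically. A minimal sketch, assuming made-up integer observations and a hypothetical helper `bin_counts` (neither comes from the lecture); it uses equal-width intervals with the largest value placed in the last interval:

```python
def bin_counts(data, k):
    """Count observations in k equal-width intervals spanning min..max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # Left-closed intervals; the largest value goes into the last interval.
        j = min(int((x - lo) / width), k - 1)
        counts[j] += 1
    return counts

# Made-up observations, for illustration only.
data = [0, 2, 3, 5, 6, 7, 9, 10, 10, 11, 12, 13, 13, 14, 15, 17, 18, 20, 22, 24]

coarse = bin_counts(data, 3)   # wide bins
fine = bin_counts(data, 6)     # bins half as wide

print(coarse, max(coarse))     # [6, 9, 5] 9
print(fine, max(fine))         # [3, 3, 4, 5, 2, 3] 5
```

Halving the bin width here drops the tallest bar from 9 to 5 obs, while the six bins show more detail about where the obs sit.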
what are important things to look at on histogram
x,y axis
labels
positions of bins
vertical height of bars
freq of obs falling into bin
name 8 steps to making histograms
1 - order obs from lowest to highest (min–>max)
2 - Xs = smallest value, Xl = largest value
3 - compute difference Xl-Xs
4 - decide on number of classes (intervals) k into which you divide obs
5 - compute l = (Xl-Xs)/k = length of each interval
6 - start with Xs, form k intervals, one after the other
7 - specify intervals [Xs, Xs + l), [Xs + l, Xs + 2l), …, [Xl - l, Xl]
8 - for each interval plot rectangle with height equal to number of obs (freq) or proportion of obs (relative freq) in that interval
can adjust - what is best to show for the data
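The 8 steps can be sketched in code. This is one possible reading of the steps, not the lecture's own implementation; `histogram_table` and the sample observations are hypothetical, and the "plot" in step 8 is replaced by a plain-text bar.

```python
def histogram_table(obs, k):
    """Return a list of (left, right, freq) rows for k equal-length intervals."""
    ordered = sorted(obs)                      # step 1: order min -> max
    xs, xl = ordered[0], ordered[-1]           # step 2: smallest and largest value
    length = (xl - xs) / k                     # steps 3-5: l = (Xl - Xs) / k
    edges = [xs + j * length for j in range(k + 1)]  # steps 6-7: k intervals from Xs
    edges[-1] = xl                             # guard against rounding at the top edge
    freq = [0] * k
    for x in ordered:
        for j in range(k):
            is_last = (j == k - 1)
            # First k-1 intervals are [a, b); the last one is [a, b].
            if edges[j] <= x < edges[j + 1] or (is_last and x == edges[j + 1]):
                freq[j] += 1
                break
    return [(edges[j], edges[j + 1], freq[j]) for j in range(k)]

# Made-up observations, for illustration only.
obs = [12, 3, 7, 0, 19, 8, 15, 4, 20, 11]
table = histogram_table(obs, k=4)

for left, right, f in table:                   # step 8: bar height = freq
    print(f"{left:>5} to {right:<5} {'#' * f}")
```

With these obs, Xs = 0, Xl = 20 and l = 5, so the class frequencies come out as 3, 2, 2, 3.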
describe step 4 for histograms - k value
Shouldn’t be too small or large
show shape of data well
describe step 8 for histograms - height
number of obs in data that fall into interval = freq
OR
proportion of obs (out of n = total number of obs) that fall into interval = relative freq (gives a better idea of the chance an obs falls into this interval)
y axis variable
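The two choices of bar height differ only by a division by n. A short sketch with hypothetical class frequencies:

```python
# Hypothetical class frequencies, for illustration only.
counts = [3, 2, 2, 3]
n = sum(counts)                       # total number of obs

rel_freq = [c / n for c in counts]    # proportion of obs in each interval

print(counts)                         # heights on a frequency scale
print(rel_freq)                       # heights on a relative frequency scale
print(sum(rel_freq))                  # relative frequencies add up to 1 (up to rounding)
```

The shape of the histogram is the same on both scales; only the labelling of the y axis changes.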
describe step 7 for histograms - intervals
[a,b) = includes a but NOT b
[a,b] = includes both
first (k-1) intervals closed on left, open on right, last interval closed on both sides
make a histogram for n=40
sort
Xs and Xl
decide on k
ex = [0,8), [8,16), [16,24), [24,32]
16 falls into 3rd interval [16,24), not the 2nd, because [8,16) excludes 16
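To check which interval a boundary value lands in, a small lookup helps. A minimal sketch, assuming the edges 0, 8, 16, 24, 32 from the example and the convention that only the last interval is closed on the right; `class_index` is a hypothetical helper name.

```python
from bisect import bisect_right

def class_index(x, edges):
    """Return the 1-based position of the interval containing x.

    Intervals are [edges[0], edges[1]), [edges[1], edges[2]), ...,
    with the last one closed on both sides.
    """
    if x < edges[0] or x > edges[-1]:
        raise ValueError("x lies outside the range covered by the intervals")
    if x == edges[-1]:
        return len(edges) - 1        # largest value belongs to the last interval
    return bisect_right(edges, x)    # number of edges that are <= x

edges = [0, 8, 16, 24, 32]
print(class_index(16, edges))        # 3
print(class_index(15.9, edges))      # 2
print(class_index(32, edges))        # 4
```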
describe convention in R - binning
DEFAULT: right = TRUE, histogram cells are intervals of form (a,b] = includes right end point but not left; exception = first cell also includes its left end point when include.lowest = TRUE
right = FALSE, intervals [a,b) and include.lowest = TRUE then means the last cell includes the highest value
must be careful when defining open/closed bc will get different histogram
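The difference between the two conventions only shows up for obs that sit exactly on a break. The Python sketch below imitates the two conventions described above; it is not R's `hist`, and `count_bins`, the breaks, and the data are assumptions made for illustration.

```python
def count_bins(data, breaks, right=True):
    """Count obs per cell under a right-closed or left-closed convention."""
    k = len(breaks) - 1
    counts = [0] * k
    for x in data:
        for j in range(k):
            a, b = breaks[j], breaks[j + 1]
            if right:
                # (a, b]; the first cell also includes the lowest break
                inside = (a < x <= b) or (j == 0 and x == a)
            else:
                # [a, b); the last cell also includes the highest break
                inside = (a <= x < b) or (j == k - 1 and x == b)
            if inside:
                counts[j] += 1
                break
    return counts

# Made-up data with several obs exactly on the breaks.
data = [0, 8, 8, 10, 16, 20, 24, 32]
breaks = [0, 8, 16, 24, 32]

right_closed = count_bins(data, breaks, right=True)
left_closed = count_bins(data, breaks, right=False)
print(right_closed)   # [3, 2, 2, 1]
print(left_closed)    # [1, 3, 2, 2]
```

Same data, same breaks, different bar heights: every obs equal to 8, 16, or 24 moves one cell over when the closed side changes.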
why is number of intervals = k important
if k too small = conveys little info
if k too large = info too detailed and won't show shape of distribution properly
no single incorrect choice though
what do we look at when inspecting or constructing a histogram
location on x axis
spread of data
shape of distribution
take note of vertical and horizontal scales
is vertical measuring freq, relative freq or density
label x axis properly
describe density plot
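density in interval j = (1/l) x (count in interval j)/n, where l = interval length and n = total number of obs
relative freq of obs per x increment
Divide by l = now proportion per unit of measurement on the x axis
standardization by bin width is important for probability = total area of the bars is 1, so area over a range estimates the chance an obs falls in that range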
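A quick numerical check of the density scale. A minimal sketch, assuming hypothetical class frequencies and an interval length of 5; `densities` is an illustrative helper, not something from the lecture.

```python
def densities(counts, length):
    """Density per interval: relative frequency divided by interval length."""
    n = sum(counts)
    return [(c / n) / length for c in counts]

# Hypothetical class frequencies and interval length, for illustration only.
counts = [3, 2, 2, 3]
length = 5

dens = densities(counts, length)
total_area = sum(d * length for d in dens)   # bar area = height x width

print(dens)         # roughly 0.06, 0.04, 0.04, 0.06
print(total_area)   # roughly 1
```

On this scale bar height alone is not a proportion; the area of a bar (height x length) is, which is why the areas add up to 1.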