lecture 7 Flashcards
what must we be careful with when computing sample variance
rounding
if we round the mean too early = can get a negative variance, which is impossible
so be careful
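A quick sketch of how this can happen, assuming the lecture means the shortcut formula s^2 = (sum of x^2 - n * xbar^2) / (n - 1). The data values and the function name `shortcut_variance` are made up for illustration.

```python
def shortcut_variance(data, mean):
    """Sample variance via the shortcut formula, using whatever mean the caller supplies."""
    n = len(data)
    sum_sq = sum(x * x for x in data)
    return (sum_sq - n * mean ** 2) / (n - 1)

data = [10.6, 10.7, 10.8]           # made-up sample
exact_mean = sum(data) / len(data)  # about 10.7
rounded_mean = round(exact_mean)    # 11, rounded far too early

print(shortcut_variance(data, exact_mean))    # small and positive
print(shortcut_variance(data, rounded_mean))  # negative, which is impossible for a variance
```

Keeping the mean at full precision until the last step avoids the problem.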
what do we need to describe a data set
provide
numerical measures of center = mean, median, mode
Numerical measures of spread = range, variance
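As a small illustration, these measures can be computed with Python's standard `statistics` module. The sample values are made up, and the variance shown is the sample variance (dividing by n - 1), which is assumed to be the convention used in the lecture.

```python
import statistics

sample = [2, 3, 3, 5, 7]  # made-up data

center = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
}
spread = {
    "range": max(sample) - min(sample),
    "variance": statistics.variance(sample),  # sample variance (divides by n - 1)
}
print(center)
print(spread)
```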
we must describe the distribution of the values (obs) in the sample
can be symmetric around a central point but not always
describe symmetric distributions
A sample has a perfectly symmetric distribution if its histogram has a symmetric shape around some x value
mean = median
how often does a sample have a perfectly symmetric distribution
RARE
in some cases the approximation is pretty good
so mean ~ median
around the same
unlikely to be numerically exactly the same in a sample - more plausible for a population
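A tiny check of the two cases, using made-up samples: one perfectly symmetric, one only approximately symmetric.

```python
import statistics

symmetric = [1, 2, 3, 4, 5]           # perfectly symmetric around 3
nearly_symmetric = [1, 2, 3, 4, 5.5]  # slightly off

for name, sample in [("symmetric", symmetric), ("nearly symmetric", nearly_symmetric)]:
    # mean and median coincide only in the perfectly symmetric case
    print(name, statistics.mean(sample), statistics.median(sample))
```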
how can data be skewed
left or right skew
depends on where the tail is = the side away from the center, where the freq of obs is low
describe right skew
positively (right) skewed data = histogram shows a long tail on the right
median < mean
mean is pulled to the right
right tail dominates
not outliers tho, since the tail drags out gradually
describe left skew
negatively (left) skewed data = histogram shows a long tail on the left
median > mean
mean is pulled to the left bc it is directly influenced by the values in the tail
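A minimal sketch of both skew directions, using made-up samples. The helper `mean_minus_median` is a hypothetical name; its sign shows which way the mean has been pulled.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]      # long tail on the right
left_skewed = [11 - x for x in right_skewed]  # mirror image, long tail on the left

def mean_minus_median(sample):
    """Positive when the mean sits to the right of the median, negative when to the left."""
    return statistics.mean(sample) - statistics.median(sample)

print(mean_minus_median(right_skewed))  # positive: median < mean
print(mean_minus_median(left_skewed))   # negative: median > mean
```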
can we compute the sample mean, median and variance if we only see the histogram
nooo
data grouped in intervals
by looking at the bar heights we can see how many obs are in each interval but do not know their exact location within it - hard to make precise conclusions, only rough calculations
only approx
also cannot draw inferences
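A rough sketch of what "only approx" means for the mean. The histogram below is hypothetical, and placing every obs at the midpoint of its interval is one common convention, not something the histogram itself tells us.

```python
# Hypothetical histogram: (left edge, right edge, number of obs in the interval)
bins = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
n = sum(count for _, _, count in bins)

# Approximation: pretend every obs sits at the midpoint of its interval
midpoint_mean = sum((lo + hi) / 2 * count for lo, hi, count in bins) / n

# Bounds: every obs at the left edge, or every obs at the right edge
lowest_mean = sum(lo * count for lo, _, count in bins) / n
highest_mean = sum(hi * count for _, hi, count in bins) / n

print(midpoint_mean, lowest_mean, highest_mean)
```

The true sample mean could be anywhere between the two bounds, so the midpoint value is a rough calculation only.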
from samples to populations
data representative of some underlying population
real goal is to understand characteristics of population based on sample
corresponding population quantities - things we measure
sample mean = Xbar –> pop mean = μ
sample variance = s^2 –> pop variance = σ^2
sample standard dev = s –> pop standard dev (s.d.) = σ
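A small sketch of the paired quantities using Python's `statistics` module. The data are made up, and the n - 1 versus N divisors are the usual convention, assumed here rather than taken from the lecture.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

# Treating the values as a sample: estimates of the population quantities
x_bar = statistics.mean(data)
s2 = statistics.variance(data)  # divides by n - 1
s = statistics.stdev(data)

# Treating the same values as if they were the whole population
mu = statistics.mean(data)
sigma2 = statistics.pvariance(data)  # divides by N
sigma = statistics.pstdev(data)

print(x_bar, s2, s)
print(mu, sigma2, sigma)
```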
what does sample have to be
representative of population
how to think about population - interpretation
as a sample with an extremely large sample size
keep collecting data until we have learnt everything about the way the data are generated
Very large, infinite sample size = we have learnt everything we possibly could learn
Ultimately - histogram where the bins get smaller and smaller until it becomes a nice curve
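A simulation sketch of this idea, assuming (purely for illustration) that the population is standard normal. The function `max_gap` is a hypothetical helper: it draws a sample, builds a histogram on the density scale, and reports the largest gap between a bar height and the bell curve at the bar's midpoint.

```python
import math
import random

def normal_pdf(x):
    """Density of the standard normal curve (the assumed population)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def max_gap(n_obs, n_bins, lo=-4.0, hi=4.0, seed=0):
    """Largest gap between histogram bar heights (density scale) and the curve."""
    rng = random.Random(seed)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for _ in range(n_obs):
        x = rng.gauss(0.0, 1.0)
        if lo <= x < hi:
            # min() guards against a floating-point index equal to n_bins
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    gaps = []
    for i, count in enumerate(counts):
        height = count / (n_obs * width)  # bar height so that total area is about 1
        midpoint = lo + (i + 0.5) * width
        gaps.append(abs(height - normal_pdf(midpoint)))
    return max(gaps)

print(max_gap(n_obs=200, n_bins=8))       # small sample, wide bins
print(max_gap(n_obs=100_000, n_bins=80))  # large sample, narrow bins
```

With a large sample and narrow bins the bars hug the curve closely, which is the sense in which the histogram "becomes" the population curve.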