Models, populations and estimation flashcards
What is a Bernoulli distribution?
A Bernoulli distribution is a distribution over a binary variable
(whose two values can always be written as {0, 1}).
- It has a single parameter, which we will denote by θ, which gives
the probability that the variable takes the value 1.
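A minimal base-R sketch of a Bernoulli distribution. It assumes an illustrative value of θ = 0.7, which is not from the notes. A Bernoulli distribution is a binomial distribution with `size = 1`, so `dbinom()` gives its probabilities and `rbinom()` draws values from it.

```r
# Illustrative parameter value (an assumption, not from the notes)
theta <- 0.7

# Bernoulli probabilities: P(X = 0) = 1 - theta, P(X = 1) = theta.
# A Bernoulli distribution is a binomial distribution with size = 1.
probs <- dbinom(c(0, 1), size = 1, prob = theta)
print(probs)  # 0.3 0.7

# Draw 20 Bernoulli values; each one is either 0 or 1
set.seed(1)
x <- rbinom(20, size = 1, prob = theta)
print(x)
```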
What is a normal distribution?
The normal distribution is a probability distribution over a
continuous variable.
- It has two parameters: the mean, usually denoted by µ, and the
standard deviation, usually denoted by σ.
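A short base-R sketch of the two parameters at work. `dnorm()` gives the density of a normal distribution with a given mean and standard deviation, and `pnorm()` gives cumulative probabilities. The particular values checked below were chosen only for illustration.

```r
# Standard normal: mean (mu) 0, standard deviation (sigma) 1
mu <- 0
sigma <- 1

# Density at the mean, i.e. the height of the peak of the bell curve
peak <- dnorm(mu, mean = mu, sd = sigma)
print(peak)  # about 0.399, which equals 1 / sqrt(2 * pi)

# Probability of lying within one standard deviation of the mean
within_1sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)
print(within_1sd)  # about 0.683

# The same proportion holds for any mean and sd, e.g. IQ (mean 100, sd 15)
within_1sd_iq <- pnorm(115, 100, 15) - pnorm(85, 100, 15)
print(within_1sd_iq)  # also about 0.683
```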
Populations and samples and inference
- Our statistical models have parameters that are assumed to have
fixed but unknown values.
- We must estimate these values from data.
- However, our estimates will always be subject to uncertainty.
- To introduce this major topic of statistical inference, we must first
consider the topic of populations and samples.
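Estimates are subject to uncertainty, and a small simulation shows why. It pretends that we know the true θ of a Bernoulli model, then draws several samples and estimates θ from each one. The true value of 0.7 and the sample size of 50 are illustrative assumptions. The estimate used here is the proportion of 1s, i.e. the sample mean.

```r
set.seed(2)
theta_true <- 0.7   # the "unknown" parameter (assumed for illustration)
n <- 50             # size of each sample (assumed)

# Draw five independent samples and estimate theta from each one
# as the proportion of 1s (the sample mean)
estimates <- replicate(5, mean(rbinom(n, size = 1, prob = theta_true)))
print(estimates)    # five estimates that differ from each other
```

Each run gives estimates near 0.7 but rarely exactly 0.7, and they differ from sample to sample.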
Statistics of samples can be used to estimate the true values of the
population's parameters.
- But statistics of samples can be very variable.
- By understanding how the statistics vary (i.e. knowing their
sampling distribution), we can reason about how close a sample
statistic is likely to be to the true value.
- For example, we can say things like (informally speaking): if the
true mean is 100, the mean of a sample of 10 values could be anywhere
from around 90 to around 110.
- With reasoning like this, if we did get a sample with a mean of 105, we
could ask (informally speaking) whether that is compatible with
the true mean being, say, 100.
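The informal reasoning above can be made concrete with a standard result. The mean of n independent draws from a normal distribution with mean µ and standard deviation σ is itself normally distributed, with mean µ and standard deviation σ/√n (the standard error). The base-R sketch below assumes σ = 15, as for IQ; the example in the notes does not state σ.

```r
mu <- 100      # true mean
sigma <- 15    # true standard deviation (assumed, as for IQ)
n <- 10        # sample size

# Standard deviation of the sampling distribution of the mean (standard error)
se <- sigma / sqrt(n)

# Range containing the middle 95% of possible sample means
range_95 <- qnorm(c(0.025, 0.975), mean = mu, sd = se)
print(range_95)   # roughly 90.7 to 109.3

# How unusual is a sample mean of 105 if the true mean is 100?
# Probability of a sample mean at least 5 away from 100, in either direction
p_two_sided <- 2 * pnorm(105, mean = mu, sd = se, lower.tail = FALSE)
print(p_two_sided)  # about 0.29
```

The 95% range matches the informal "around 90 to around 110". A two-sided probability of about 0.29 suggests that a sample mean of 105 would not be surprising if the true mean were 100.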
populations and samples
The concept of a population is a very important concept in
statistics.
- The population is a (possibly hypothetical, usually infinite) set
from which our data is assumed to be a sample.
- The statistical model is in fact a model of the population: it is a
model of the set from which our data is a sample.
- To understand how we infer the properties of the model from
data, we must understand how samples relate to populations.
Sampling distributions
We actually know that IQ in the population is normally
distributed with a mean of 100 and a standard deviation of 15. IQ
scores are designed (scaled) that way.
- But what if we didn’t know that and were trying to infer the
mean and the standard deviation of IQ in the population from a
sample?
- We get a sample and calculate its mean and standard deviation.
- How similar will the mean and standard deviation of this sample
be to the true mean and standard deviation?
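A minimal base-R sketch of this question. It draws one sample of each of three sizes from the IQ population, assumed normal with mean 100 and standard deviation 15, and compares each sample's mean and standard deviation with the true values. The sample sizes are illustrative assumptions.

```r
set.seed(3)
true_mean <- 100
true_sd <- 15
sizes <- c(10, 100, 1000)   # illustrative sample sizes

# For each sample size, draw one sample and record its mean and sd
results <- t(sapply(sizes, function(n) {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  c(n = n, mean = mean(x), sd = sd(x))
}))
print(round(results, 2))
```

Larger samples typically give a mean and standard deviation closer to 100 and 15, though any single run can vary.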
R code (in RStudio) for a bell-shaped curve
plot_normal(mean = 0, sd = 1)
plot_normal(mean = 100, sd = 15, xmin = 50, xmax = 150)
plot_normal_sample(1000, mean = 100, sd = 15, bins = 50)
# ^ this last command produces a histogram
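The functions above come from a course package that these notes do not name. The sketch below is a rough base-R equivalent of the same two plots, using only `curve()`, `dnorm()`, `rnorm()` and `hist()`. It is not how those course functions are implemented.

```r
# Bell-shaped curve of the normal density for IQ (mean 100, sd 15)
curve(dnorm(x, mean = 100, sd = 15), from = 50, to = 150,
      xlab = "IQ", ylab = "Density")

# Histogram of 1000 values sampled from the same distribution;
# breaks = 50 is only a suggestion to hist(), which may adjust it
set.seed(4)
iq_sample <- rnorm(1000, mean = 100, sd = 15)
hist(iq_sample, breaks = 50, xlab = "IQ",
     main = "1000 samples (mean 100, sd 15)")
```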
R code (in RStudio) for the statistical properties of a normal distribution
normal_percentiles(mean = 0, sd = 1)
normal_percentiles(mean = 100, sd = 15)
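The notes do not show what `normal_percentiles()` prints. A base-R sketch of the same idea uses `qnorm()`, which returns the value below which a given proportion of a normal distribution lies. The choice of percentiles here is an assumption.

```r
# Percentiles of interest (an illustrative choice)
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)

# Standard normal percentiles: about -1.96, -0.67, 0, 0.67, 1.96
z <- qnorm(probs, mean = 0, sd = 1)
print(round(z, 2))

# IQ percentiles: the same percentiles rescaled as 100 + 15 * z
iq <- qnorm(probs, mean = 100, sd = 15)
print(round(iq, 1))
```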
to create a scattergram
plot_repeat_normal_samples(N = 100, n = 10, mean = 100, sd = 15)
to create a histogram
hist_repeat_normal_samples(N = 1000, n = 10, mean = 100, sd = 15)
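A base-R sketch of the idea behind this histogram. It assumes that `N` is the number of repeated samples and `n` is the size of each sample; the notes do not say exactly what `hist_repeat_normal_samples()` computes. The sketch collects the mean of each sample and plots the resulting sampling distribution.

```r
set.seed(5)
N <- 1000   # number of repeated samples (assumed meaning of N)
n <- 10     # size of each sample (assumed meaning of n)

# Draw N samples of size n from a normal distribution (mean 100, sd 15)
# and keep the mean of each sample
sample_means <- replicate(N, mean(rnorm(n, mean = 100, sd = 15)))

# Histogram of the sampling distribution of the mean
hist(sample_means, breaks = 30, xlab = "Sample mean",
     main = "Sampling distribution of the mean (n = 10)")

# Its spread should be close to the theoretical standard error 15 / sqrt(n)
print(sd(sample_means))
print(15 / sqrt(n))
```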
samples from a normal distribution
plot_normal_sample(1000, mean = 100, sd = 15, bins = 50)
repeat normal samples
plot_repeat_normal_samples(N = 100, n = 10, mean = 100, sd = 15)
Sampling distributions from normal distributions: to create a histogram
hist_repeat_normal_samples(N = 1000, n = 10, mean = 100, sd = 15)