1. Introduction, Frequentist inference Flashcards
What is descriptive statistics/algorithmic
- Characterising datasets
- Dataset statistics: mean, standard deviation, median, etc.
- Reveals facts about the in-sample distribution
- “Is this what I expect to see?”
What is inductive statistics/inferential
- Uses samples to draw conclusions about populations
- Reveals probable facts about the out-of-sample distribution
- We have the sample (the data) and we want to know about the population.
Statistical inquiry is usually motivated by what? What categories are there?
A business question:
- Prediction: regression
- Decision problems: hypothesis testing
- Experimental design: not talked about in the class too much
Statistical model contains:
An assumption about a distribution with certain parameters
Some kind of assumption about the functional form
Data and evaluation are outside the model
Explain an experiment and an event:
- An experiment is any process, real or hypothetical, in which the possible outcomes can be identified ahead of time.
- An event is a well-defined set of possible outcomes of the experiment.
Throwing a die is an experiment; rolling a 1 would be an event.
Sample space:
The sample space is the set of all possible outcomes. Casting one die would have a sample space of 6 different outcomes.
An event is a subset of the sample space.
Random variable
A real-valued function X : S → R defined on the sample space.
Example: the sum of two dice. Casting two dice that both show one eye gives X = 2.
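A minimal sketch of this definition, enumerating the sample space of two dice and defining X as their sum (the names `S` and `X` are just illustrative, not from the course):

```python
from itertools import product

# Sample space S: all 36 ordered outcomes of casting two dice
S = list(product(range(1, 7), repeat=2))

# Random variable X: S -> R, here the sum of the two dice
def X(outcome):
    return sum(outcome)

values = sorted({X(s) for s in S})               # possible values: 2..12
p_two = sum(1 for s in S if X(s) == 2) / len(S)  # P(X = 2): only (1, 1) qualifies
```

Note that the event "X = 2" is a subset of the sample space, exactly as the previous card says.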
What do we want to know about random variables?
Full specification:
- What values can they assume, and how likely are they?
We are aiming to predict the distribution
Descriptive statistics
- What is their value typically?
Central tendency (where does the prob. tend to cluster)
- How certain are we about this typical value?
We are aiming to see how “spread out” the values are
Dispersion
Central tendency: what is the value going to be
Dispersion: how certain are we that it is going to be that value
Central tendency: Expectation
The expectation or mean is the sum of all possible values of a random variable,
weighted by their probability.
Dispersion: Variance and standard deviation
The variance is the expected squared deviation from the mean; the standard deviation is its square root.
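Both the expectation and the variance of a single fair die can be computed straight from their definitions as probability-weighted sums; this small sketch is illustrative:

```python
# Expectation and variance of a fair six-sided die,
# computed directly from the definitions.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mean = sum(v * p for v, p in zip(values, probs))               # E[X]
var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))  # Var[X] = E[(X - E[X])^2]
std = var ** 0.5                                               # standard deviation
```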
How many parameters do we need for a categorical choice between 10 alternatives?
Each choice needs a parameter, so 10 (said in class)
Christian said: you can get away with 9, because the probabilities must sum to 1.
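A tiny sketch of the sum-to-1 constraint (the specific probabilities here are made up): nine freely chosen probabilities determine the tenth.

```python
# Nine free parameters of a 10-way categorical distribution (made-up values)
free_params = [0.05, 0.10, 0.08, 0.12, 0.07, 0.15, 0.09, 0.11, 0.13]

# The 10th probability is not free: it must make the total equal 1
last = 1.0 - sum(free_params)
probs = free_params + [last]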
Parameters
A characteristic or combination of characteristics that determine the joint
distribution for the random variables of interest. E.g. for the normal distribution, the mean and standard deviation are the parameters.
Are parameters random variables?
Frequentist would say no
Bayesian would say yes
What questions to ask when making a statistical model:
Scope: What do we want to model? Identify the random variables of interest
Structure: How does it hang together? Specify the joint distribution of the random variables
Parameters: How can we fit it? Identify the parameters of the distribution that are assumed unknown
Optional (Bayesian): Specify the joint distribution for the unknown parameters
What does the frequentist approach to statistics assume?
- experiments are infinitely repeatable, and
- the underlying parameters remain constant.
In frequentism, probabilities can then be estimated as:
relative frequencies in a long run of experiments.
A typical quantity of interest is the standard error of an estimate. What is it?
This is the standard deviation of the sampling distribution of a statistic.
If we sample over and over again, we obtain the sampling distribution, and the standard deviation of the sampling distribution tells us how far our estimate typically is from the true value.
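The "sample over and over again" idea can be illustrated by simulation, assuming we repeatedly draw samples of fair die rolls (sample size and repetition count here are arbitrary choices, not from the course):

```python
import random
import statistics

random.seed(0)
n = 50        # sample size per experiment
reps = 2000   # "long run" of repeated experiments

# Repeatedly draw samples from a known population (fair die rolls)
# and record the sample mean of each experiment.
means = []
for _ in range(reps):
    sample = [random.randint(1, 6) for _ in range(n)]
    means.append(statistics.mean(sample))

# The standard deviation of this sampling distribution is the
# standard error of the mean; theory predicts sigma / sqrt(n).
se_empirical = statistics.stdev(means)
se_theory = (35 / 12) ** 0.5 / n ** 0.5
```

The two numbers should agree closely, which is exactly the frequentist reading of the standard error.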
What is theta?
A constant thing: a number we don't know, and can't know exactly, but want to know
Constant
What is theta hat?
Some sort of sample statistic (a quantity we have computed; any value we can compute) that we use as an estimate of theta
- This is a number we can know, hopefully close to theta
Constant
What is capital Theta hat?
The estimator, which is a random variable
A function of the sample data that produces theta hat; since the data are random, the estimator itself is a random variable
In Bayesian statistics, the parameter is a random variable itself
What does bias tell us? Write it out
Bias: How far off is the estimator on average?
We look at how the estimator behaves on average across repeated samples
What does variance tell us? Write it out
Variance: What is the spread of our estimates?
This is about the spread of the individual estimates across repeated samples
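Bias and variance of an estimator can be sketched by simulation; here the estimator is the sample proportion of heads for a coin with a hypothetical bias of 0.3 (all numbers chosen for illustration):

```python
import random

random.seed(1)
true_p = 0.3        # theta: the (normally unknown) coin bias
n, reps = 20, 5000  # flips per experiment, number of experiments

# Theta-hat: the sample proportion, recomputed across many experiments.
estimates = []
for _ in range(reps):
    flips = [random.random() < true_p for _ in range(n)]
    estimates.append(sum(flips) / n)

mean_est = sum(estimates) / reps
bias = mean_est - true_p  # how far off is the estimator on average?
variance = sum((e - mean_est) ** 2 for e in estimates) / reps  # spread of estimates
```

The sample proportion is an unbiased estimator, so the empirical bias comes out near zero while the variance stays positive.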
Explain frequentism in practice
The probabilistic properties of a procedure of interest are derived and then applied
verbatim to the procedure’s output for the observed data.
What is the problem with frequentism?
In deriving the inference procedures, we assume we know the data distribution F, but in reality we don’t.
What is the Plug-in principle? (not required to know in detail)
The frequentist accuracy estimate is itself estimated from the observed data. E.g. we use the flips we have already observed from a coin to calculate the standard error of our estimate of that coin's bias.
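A sketch of the plug-in idea for the coin example, with made-up flip data: the standard-error formula for a proportion needs the true p, which we don't know, so we plug in the observed p-hat instead.

```python
# Hypothetical observed coin flips (1 = heads)
flips = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
n = len(flips)
p_hat = sum(flips) / n  # observed proportion of heads

# The true SE would be sqrt(p * (1 - p) / n) with the unknown p;
# the plug-in estimate substitutes p_hat for p.
se_plugin = (p_hat * (1 - p_hat) / n) ** 0.5
```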
What is the delta method and Taylor-series approximations?
If you have a result about the standard error of a parameter estimate and you apply a function (a transformation) to it, what will the distribution of the transformed estimate look like?
The delta method approximates the transformation by a straight line, its derivative (a first-order Taylor series). If the initial estimate has a certain distribution and the transformation is very steep at that point, the transformed value will have a correspondingly larger spread.
TAKE AWAY: This is a type of reasoning the frequentists would apply.
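A numerical sketch of this reasoning, using the made-up transformation g(x) = x² applied to a coin-bias estimate: the delta method predicts SE(g(p̂)) ≈ |g'(p)| · SE(p̂) = 2p · sqrt(p(1-p)/n), which we can check against a simulation.

```python
import math
import random
import statistics

random.seed(2)
n, reps = 200, 4000
true_p = 0.4  # hypothetical coin bias

# Simulate the transformed estimate g(p_hat) = p_hat ** 2 many times.
transformed = []
for _ in range(reps):
    p_hat = sum(random.random() < true_p for _ in range(n)) / n
    transformed.append(p_hat ** 2)

# Delta method: SE of g(p_hat) ~ |g'(p)| * SE(p_hat), with g'(p) = 2p.
se_empirical = statistics.stdev(transformed)
se_delta = 2 * true_p * math.sqrt(true_p * (1 - true_p) / n)
```

The steeper g is at p (the larger |g'(p)|), the larger the predicted spread of the transformed estimate, which matches the intuition in the card above.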
What is a pivotal statistic?
A statistic whose distribution does not depend on the underlying distribution F. E.g. the t-statistic is normalised so that its distribution is independent of the unknown sigma. This removes the problem of not knowing the underlying distribution.
What steps do we take when we do simulation and the bootstrap?
- Modern computer power allows us to simulate the “infinite” sequence of experiments numerically. (This was a bold assumption back in the day, when they did it maybe 5 times by hand.)
- Create B new bootstrap samples from your existing data sample by resampling with replacement.
- Run the inference procedure for each of the bootstrap samples.
- Draw conclusions from the empirical distribution of the estimates.
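The steps above can be sketched as follows, with a made-up data sample and the sample mean as the inference procedure:

```python
import random
import statistics

random.seed(3)
data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.6]  # observed sample (made up)
B = 2000  # number of bootstrap samples

# Resample with replacement and re-run the estimator on each resample.
boot_means = []
for _ in range(B):
    resample = random.choices(data, k=len(data))
    boot_means.append(statistics.mean(resample))

# Draw conclusions from the empirical distribution of the estimates:
se_boot = statistics.stdev(boot_means)  # bootstrap standard error
boot_means.sort()
ci = (boot_means[int(0.025 * B)], boot_means[int(0.975 * B)])  # rough 95% percentile CI
```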
What is the problem with these methods? (plug-in, Taylor-series, bootstrap, pivotal)
There are some cases where these methods are simply not applicable.
Frequentist theory shows:
That certain procedures are asymptotically optimal
under certain assumptions.
- In parametric probability models, the maximum-likelihood estimate
is the optimum estimate in terms of asymptotically minimum standard error.
- Neyman-Pearson lemma:
The likelihood ratio test is uniformly most powerful (has lowest type-II error)
among hypothesis tests with a given type-I error rate.
This is nice because it gives you guidelines on what to do under these assumptions.