1. Introduction, Frequentist inference Flashcards
What is descriptive (algorithmic) statistics?
- Characterising datasets
- Dataset statistics: mean, standard deviation, median, etc.
- Reveals facts about the in-sample distribution
- “Is this what I expect to see?”
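A minimal sketch of the dataset statistics named above, using Python's standard `statistics` module. The `data` list is a hypothetical sample invented for illustration; it does not come from the course.

```python
import statistics

# Hypothetical dataset, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # arithmetic mean of the dataset
median = statistics.median(data)  # middle value of the sorted dataset
# pstdev treats the data as the whole dataset (divides by n),
# which matches the "in-sample" view of descriptive statistics.
stdev = statistics.pstdev(data)

print(mean, median, stdev)
```

These numbers describe only the data at hand; they say nothing yet about data we have not seen.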
What is inductive (inferential) statistics?
- Uses samples to draw conclusions about populations
- Reveals probable facts about the out-of-sample distribution
- We have the sample (the data) and we want to know about the population.
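A rough sketch of the sample-to-population idea, assuming a hypothetical normally distributed population (mean 50, standard deviation 10) that we would normally never observe in full. The sample mean estimates the population mean, and the standard error indicates how uncertain that estimate is.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (an assumption for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# What we actually get to see: a random sample of 100 units.
sample = rng.sample(population, 100)

sample_mean = statistics.mean(sample)
# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"estimate of population mean: {sample_mean:.2f} +/- {standard_error:.2f}")
print(f"population mean (normally unknown): {statistics.mean(population):.2f}")
```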
What usually motivates a statistical inquiry? What categories are there?
A business question:
- Prediction: regression
- Decision problems: hypothesis testing
- Experimental design: not covered much in class
Statistical model contains:
Assumption about a distribution with certain parameters
Some kind of assumption about the function
Data and evaluation are outside the model
Explain an experiment and an event:
- An experiment is any process, real or hypothetical, in which the possible outcomes can be identified ahead of time.
- An event is a well-defined set of possible outcomes of the experiment.
Throwing a die is an experiment; rolling a 1 would be an event.
Sample space:
The sample space is the set of all possible outcomes. Rolling one die gives a sample space of 6 different outcomes: {1, 2, 3, 4, 5, 6}.
An event is a subset of the sample space.
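A small sketch of these definitions using Python sets. The `probability` helper is hypothetical and assumes a fair die (equally likely outcomes), which the notes do not state.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space.
roll_one = {1}
roll_even = {2, 4, 6}

def probability(event):
    """P(event) under the assumption that all outcomes are equally likely."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

print(probability(roll_one), probability(roll_even))
```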
Random variable
A real-valued function X : S → ℝ defined on the sample space S.
Example: the sum of two dice. Rolling two dice that each show one pip gives X = 2.
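The two-dice example can be sketched as an ordinary function on the sample space. The names `sample_space` and `X` are chosen here for illustration.

```python
from itertools import product

# Sample space for rolling two dice: all ordered pairs (first, second).
sample_space = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: maps an outcome (a pair of faces) to their sum."""
    return sum(outcome)

# The set of values X can take on this sample space.
values = sorted({X(outcome) for outcome in sample_space})

print(X((1, 1)))
print(values)
```

The randomness lives in which outcome occurs; X itself is a fixed, deterministic mapping.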
What do we want to know about random variables?
Full specification:
- What values can they assume, and how likely are they?
We are aiming to predict the distribution
Descriptive statistics
- What is their value typically?
Central tendency (where does the prob. tend to cluster)
- How certain are we about this typical value?
We are aiming to see how “spread out” the values are.
Dispersion
Central tendency : what is it going to be
Dispersion : how certain are we that it is going to be that
Central tendency: Expectation
The expectation or mean is the sum of all possible values of a random variable,
weighted by their probability.
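As a worked sketch, assuming a fair six-sided die (each face has probability 1/6), the probability-weighted sum is:

```python
from fractions import Fraction

# Probability mass function of a fair die (fairness is an assumption).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Expectation: each possible value weighted by its probability, summed.
expectation = sum(value * prob for value, prob in pmf.items())

print(expectation)  # 7/2
```

Note that 7/2 is not itself a possible roll; the expectation is a weighted average, not the most likely value.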
Dispersion: Variance and standard deviation
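A sketch using the standard definitions (not spelled out on this card): the variance is the expected squared deviation from the mean, and the standard deviation is its square root. A fair six-sided die is assumed for illustration.

```python
from fractions import Fraction
import math

# Probability mass function of a fair die (fairness is an assumption).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(value * prob for value, prob in pmf.items())

# Variance: expected squared deviation from the mean.
variance = sum((value - mean) ** 2 * prob for value, prob in pmf.items())

# Standard deviation: square root of the variance, in the units of X.
std_dev = math.sqrt(variance)

print(variance, std_dev)
```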
How many parameters do we need for a categorical choice between 10 alternatives?
Each choice needs a parameter, so 10 (said in class).
Christian said: you can get away with 9, because the probabilities must sum to 1, so the tenth is determined by the other nine.
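A quick sketch of the 9-parameter argument. The probability values are hypothetical; the point is only that the tenth probability is not free once the other nine are fixed.

```python
from fractions import Fraction

# Hypothetical probabilities for the first nine alternatives (free parameters).
free_params = [Fraction(1, 20)] * 4 + [Fraction(1, 10)] * 5

# The tenth is forced by the constraint that all probabilities sum to 1.
last = 1 - sum(free_params)
assert last >= 0, "the nine free parameters must not sum to more than 1"

probs = free_params + [last]

print(len(free_params), last, sum(probs))
```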
Parameters
A characteristic or combination of characteristics that determines the joint
distribution for the random variables of interest. For example, for the normal distribution the mean is a parameter.
Are parameters random variables?
A frequentist would say no
A Bayesian would say yes
What questions to ask when making a statistical model:
Scope: What do we want to model? Identify the random variables of interest.
Structure: How does it hang together? Specify the joint distribution of the random variables.
Parameters: How can we fit it? Identify the parameters of the assumed distribution that are unknown.
Optional (Bayesian): Specify a joint distribution for the unknown parameters.
What does the frequentist approach to statistics assume?
- experiments are infinitely repeatable, and
- the underlying parameters remain constant.
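A simulation sketch of these two assumptions: the same experiment is repeated many times while the parameter stays fixed. `P_ONE` (the probability of rolling a 1 with a fair die) and `relative_frequency` are names made up for this illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

P_ONE = 1 / 6  # the underlying parameter: constant across all repetitions

def relative_frequency(n_repeats):
    """Repeat the experiment n_repeats times; return the share of successes."""
    hits = sum(rng.random() < P_ONE for _ in range(n_repeats))
    return hits / n_repeats

# With more repetitions, the relative frequency tends to settle near P_ONE.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

In this view the parameter is a fixed number; only the data vary from one repetition to the next.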