F5 Statistical programming in R Flashcards
What is the relation between probability theory and statistics?
Statistics is the application of probability theory to real-world data.
Do we like or do we hate R packages?
We hate them.
What do we assume about errors? It is related to the ability to obtain t- and p-values.
We assume homoscedasticity: the error variance is constant given x. If the assumption fails, use robust standard errors.
Var(u|x)=sigma^2
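A minimal sketch of the difference between classical and robust standard errors for the slope of a simple regression, in plain Python (standard library only) rather than R. The function name, the data, and the choice of the HC0 (White) estimator are assumptions made for illustration; other robust variants exist.

```python
import math

def slope_standard_errors(x, y):
    """Return (classical_se, robust_hc0_se) for the slope in y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # OLS estimates for slope and intercept
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    # Classical SE: relies on Var(u|x) = sigma^2 (homoscedasticity)
    sigma2_hat = sum(u ** 2 for u in resid) / (n - 2)
    classical = math.sqrt(sigma2_hat / sxx)

    # HC0 robust SE: each squared residual is weighted by (x_i - x_bar)^2,
    # so no constant-variance assumption is needed
    robust = math.sqrt(
        sum((xi - x_bar) ** 2 * u ** 2 for xi, u in zip(x, resid)) / sxx ** 2
    )
    return classical, robust

# Made-up illustrative data
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.4]
classical_se, robust_se = slope_standard_errors(x, y)
print(classical_se, robust_se)
```

When the two numbers differ noticeably, that is a hint that the constant-variance assumption may not hold.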
What are the five assumptions for OLS?
1) Linearity between the dependent variable and all independent variables (the underlying data generating process, DGP, is linear)
2) Independence: observations are independent and the error is mean-independent of x, E(u|x)=E(u). Normally distributed errors are used for inference
3) no outliers
4) Homoscedasticity – the error term has a constant variance
5) No multicollinearity
What is the difference between the variance, the standard deviation and the standard error?
Variance: Var(x) or sigma^2 is the expected value of the squared deviations from the mean.
Standard deviation: sigma is the square root of the variance.
Standard error: SE is the standard deviation of an estimator's sampling distribution, e.g. of a coefficient. For a sample mean it is sigma/sqrt(n).
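A small worked example of the three quantities, in plain Python (standard library only) rather than R. The sample values are made up, and the population variance (dividing by n) is used to match the sigma^2 definition; `statistics.variance` would divide by n-1 instead.

```python
import math
import statistics

# Made-up illustrative sample with mean 5
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)

variance = statistics.pvariance(sample)   # mean of squared deviations from the mean
std_dev = math.sqrt(variance)             # square root of the variance
std_error = std_dev / math.sqrt(n)        # standard error of the sample mean

print(variance, std_dev, std_error)
```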
What is the purpose of a link function?
A link function connects the expected value of the response to the linear predictor. For a categorical response, it transforms the probabilities of the levels to a continuous, unbounded scale.
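As an illustration, the logit link (the one used in logistic regression) can be sketched in plain Python, standard library only. The function names are chosen here for the example.

```python
import math

def logit(p):
    """Link: map a probability in (0, 1) to the whole real line (log-odds)."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link: map a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

print(logit(0.5))          # a probability of 0.5 corresponds to log-odds 0
print(inv_logit(2.0))      # large positive log-odds give a probability near 1
```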
What is the Poisson distribution used for?
Count variables: the number of events.
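A minimal sketch of the Poisson probability mass function, in plain Python (standard library only). The rate of 3.5 events is an assumed value for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.5  # assumed average number of events
for k in range(6):
    print(k, poisson_pmf(k, lam))
```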
What are five important distributions?
(Standard) normal distribution
Standard logistic distribution
Poisson distribution
Binomial distribution
Bernoulli distribution
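The last two distributions in the list are closely related: a Bernoulli trial is a binomial with a single trial. A short sketch in plain Python (standard library only; `math.comb` needs Python 3.8 or later), with made-up parameter values:

```python
import math

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """Bernoulli is the binomial special case with n = 1 (k is 0 or 1)."""
    return binomial_pmf(k, 1, p)

print(bernoulli_pmf(1, 0.3))     # probability of a single success
print(binomial_pmf(3, 10, 0.3))  # probability of 3 successes in 10 trials
```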
What regression model do you use if the dependent variable is ‘limited’ to binary outcomes or count variables?
Logistic regression (binary outcomes)
Poisson regression (count data)
What’s important for the outcome variable?
Important to respect the data generating process of the outcome variable.
Probability
Discrete values (countable, dichotomous outcomes)
USE GLM
Do you use ML or OLS for GLM?
ML: maximum likelihood. It chooses the parameter values that maximize the likelihood of observing the data.
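To make the idea concrete, here is a minimal sketch of maximum likelihood for the simplest count model: a Poisson distribution with a single rate parameter and no covariates. It is plain Python (standard library only); the counts are made up, and the grid search stands in for the numerical optimizers that GLM software uses in practice.

```python
import math

def poisson_log_likelihood(lam, counts):
    """Log-likelihood of the observed counts under a Poisson(lam) model."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

# Made-up illustrative counts
counts = [2, 0, 3, 1, 4]

# Evaluate the log-likelihood on a grid of candidate rates and keep the best one
grid = [i / 100 for i in range(1, 1001)]
best_lambda = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

sample_mean = sum(counts) / len(counts)
print(best_lambda, sample_mean)
```

For this model the maximizer coincides with the sample mean, which is a convenient sanity check on the grid search.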