F5 Statistical programming in R Flashcards

1
Q

What is the relation between probability theory and statistics?

A

Statistics is the application of probability theory to real-world data.

2
Q

Do we like or do we hate R packages?

A

We hate them.

3
Q

What do we assume about the errors? It is related to the ability to obtain t- and p-values.

A

We assume homoscedasticity: the variance of the error term is constant given the regressors.
Var(u|x)=sigma^2
If this does not hold, use robust standard errors.
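
As a rough illustration of the robust standard errors mentioned above, the sketch below computes White's HC0 sandwich estimator by hand in base R. The data are simulated and all names (x, y, fit, vcov_hc0) are assumptions for this example only. In practice a package such as sandwich is commonly used instead.

```r
# Simulated data (an assumption for illustration): the error sd grows with x,
# so Var(u|x) is not constant.
set.seed(7)
n <- 200
x <- runif(n, 1, 5)
y <- 1 + 2 * x + rnorm(n, sd = x)

fit <- lm(y ~ x)
X <- model.matrix(fit)   # n x 2 design matrix (intercept and x)
u <- residuals(fit)      # OLS residuals

# HC0 sandwich: (X'X)^-1 [sum of u_i^2 * x_i x_i'] (X'X)^-1
bread <- solve(crossprod(X))
meat  <- crossprod(X * u)   # each row of X is scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

se_classic <- sqrt(diag(vcov(fit)))   # assumes homoscedasticity
se_robust  <- sqrt(diag(vcov_hc0))    # valid under heteroscedasticity

print(rbind(classic = se_classic, robust = se_robust))
```

The two rows differ when the constant-variance assumption fails, and t- and p-values based on the classic row are then unreliable.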

4
Q

What are the five assumptions for OLS?

A

1) Linearity between the dependent and all independent variables (the data-generating process is linear)

2) Independence: observations are independent and the error is mean-independent of the regressors, E(u|x)=E(u). Normally distributed errors are additionally assumed for inference

3) no outliers

4) Homoscedasticity – the error term has a constant variance

5) No multicollinearity

5
Q

What is the difference between the variance, the standard deviation and the standard error?

A

Variance: Var(x) or sigma^2 is the expected value of the squared deviations from the mean.

Standard deviation: sigma is the square root of the variance.

Standard error: SE is the standard deviation of the sampling distribution of an estimator, such as a coefficient. For a sample mean, SE = sigma/sqrt(n).
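
The three quantities can be computed in base R as sketched below. The sample values are made up for illustration, and because sigma is unknown in practice, the sample standard deviation stands in for it.

```r
# Hypothetical sample; the values are invented for this example.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

variance <- var(x)               # sample variance (divides by n - 1)
std_dev  <- sd(x)                # square root of the variance
std_err  <- std_dev / sqrt(n)    # estimated standard error of the sample mean

print(c(variance = variance, sd = std_dev, se = std_err))
```

The standard error shrinks as n grows, while the standard deviation describes the spread of the data itself and does not.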

6
Q

What is the purpose of a link function?

A

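A link function maps the expected value of the response (for example, the probabilities of the levels of a categorical response variable) onto an unbounded continuous scale, where it can be modelled as a linear function of the predictors.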
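
A small sketch of the logit link in base R, assuming a few example probabilities chosen for illustration. qlogis() and plogis() are the logit and its inverse, and family objects expose the same functions.

```r
p <- c(0.1, 0.5, 0.9)      # example probabilities (illustrative values)

eta    <- qlogis(p)        # logit link: log(p / (1 - p)), unbounded scale
p_back <- plogis(eta)      # inverse link: back to the (0, 1) scale

# The same link as stored in the binomial family object used by glm()
eta_family <- binomial()$linkfun(p)

print(rbind(p = p, eta = eta, p_back = p_back))
```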

7
Q

What is the Poisson distribution used for?

A

Count variables: the number of events occurring in a fixed interval.
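
For a concrete feel, the sketch below evaluates the Poisson probability mass function in base R. The rate lambda = 2 is an arbitrary value assumed for illustration.

```r
lambda <- 2                          # assumed average number of events
k <- 0:10                            # possible counts to evaluate

pmf <- dpois(k, lambda = lambda)     # P(X = k)

# The same probability written out from the formula exp(-lambda) * lambda^k / k!
pmf_manual <- exp(-lambda) * lambda^k / factorial(k)

print(round(data.frame(k = k, prob = pmf), 4))
```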

8
Q

What are five important distributions?

A

(Standard) normal distribution
Standard logistic distribution
Poisson distribution
Binomial distribution
Bernoulli distribution

9
Q

What regression model do you use if the dependent variable is ‘limited’ to binary outcomes or count variables?

A

Logistic regression (binary outcomes)
Poisson regression (count data)
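
Both models can be fitted with glm() in base R, as in the minimal sketch below. The data are simulated, and the coefficients used to generate them are assumptions made only for this example.

```r
# Simulated data (illustrative): one predictor, one binary and one count outcome
set.seed(1)
n <- 500
x <- rnorm(n)
y_bin <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
y_cnt <- rpois(n, lambda = exp(0.3 + 0.5 * x))

# Logistic regression for the binary outcome (logit link)
fit_logit <- glm(y_bin ~ x, family = binomial(link = "logit"))

# Poisson regression for the count outcome (log link)
fit_pois <- glm(y_cnt ~ x, family = poisson(link = "log"))

print(coef(fit_logit))
print(coef(fit_pois))
```

The coefficients are on the link scale: log-odds for the logistic model and log expected counts for the Poisson model.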

10
Q

What’s important for the outcome variable?

A

It is important to respect the data-generating process of the outcome variable.

If the outcome is a probability or takes discrete values (counts, dichotomous outcomes), use a GLM.

11
Q

Do you use ML or OLS for GLM?

A

ML (maximum likelihood): the parameters are chosen to maximize the likelihood of observing the data.
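
To make the idea concrete, the sketch below maximizes a Poisson regression log-likelihood directly with optim() and compares the result with glm(). The data and the true coefficients are simulated assumptions, and the function name neg_loglik is hypothetical.

```r
# Simulated count data (illustrative)
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))

# Average negative log-likelihood of a Poisson regression with log link.
# Minimizing it is equivalent to maximizing the likelihood.
neg_loglik <- function(beta) {
  lambda <- exp(beta[1] + beta[2] * x)
  -mean(dpois(y, lambda = lambda, log = TRUE))
}

fit_ml  <- optim(par = c(0, 0), fn = neg_loglik, method = "BFGS")
fit_glm <- glm(y ~ x, family = poisson)

print(rbind(optim = fit_ml$par, glm = unname(coef(fit_glm))))
```

Both rows should agree closely, since glm() also returns the maximum likelihood estimate, found by iteratively reweighted least squares rather than a general-purpose optimizer.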
