Prelim Exam Prep Deck Flashcards
Sandwich (robust) estimator for variance is given by ….
Gibbs sampler is given by
What’s consistency?
What’s the Fisher information matrix?
M-estimator
Any solution $\hat\theta$ to $\sum_{i=1}^n \psi(Y_i, \theta) = 0$, where $\psi$ does not depend on $i$ or $n$.
The true parameter value $\theta_0$ satisfies $E[\psi(Y_i, \theta_0)] = 0$.
The asymptotic distribution (derived through a Taylor expansion) is of the form
$\hat\theta \sim AN(\theta_0, V(\theta_0)/n)$, where
$V = A^{-1} B A^{-T}$, with $A = E[-\dot\psi(Y_i, \theta_0)]$ and $B = E[\psi(Y_i, \theta_0)\psi(Y_i, \theta_0)^T]$.
See the sandwich variance estimator for more details.
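As a minimal numeric sketch (assuming Python/NumPy and simulated data): for the mean, $\psi(y, \theta) = y - \theta$, so $A = 1$ and $B = E[\psi^2]$, and the sandwich variance reduces to the usual robust variance of the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=5000)  # skewed data, true mean = 2

# M-estimator for the mean: psi(y, theta) = y - theta, so theta_hat = ybar
theta_hat = y.mean()

# Sandwich pieces: A = E[-psi'] = 1, B = E[psi^2] estimated empirically
n = len(y)
A = 1.0
B = np.mean((y - theta_hat) ** 2)
V = B / A**2                 # V = A^{-1} B A^{-T} (scalar case)
se = np.sqrt(V / n)          # robust (sandwich) standard error

print(theta_hat, se)
```

Because the data are skewed, this robust standard error is valid without assuming normality of the observations.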
What’s the gamma-exponential model?
prior: p(theta) = Gamma(alpha, beta)
likelihood: p(y | theta) = Exponential(theta)
posterior: p(theta | y) = Gamma(alpha + 1, beta + y)
(For n observations, the posterior is Gamma(alpha + n, beta + sum of y_i).)
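The conjugate update can be sanity-checked numerically; a sketch assuming NumPy and made-up hyperparameters, comparing a grid-based posterior mean against the Gamma(alpha + 1, beta + y) mean:

```python
import numpy as np

# Hypothetical prior hyperparameters and a single observation (rate parametrization)
alpha, beta, y = 2.0, 3.0, 1.5

theta = np.linspace(1e-6, 20.0, 200001)
dx = theta[1] - theta[0]

# Unnormalized posterior = prior * likelihood, normalized by a Riemann sum
prior = theta ** (alpha - 1) * np.exp(-beta * theta)
lik = theta * np.exp(-theta * y)
post = prior * lik
post /= post.sum() * dx

grid_mean = (theta * post).sum() * dx
conjugate_mean = (alpha + 1) / (beta + y)   # mean of Gamma(alpha + 1, beta + y)
print(grid_mean, conjugate_mean)
```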
What’s the Poisson-Gamma Model?
What are Jeffreys priors?
What’s the normal-normal model?
What’s the beta-binomial model?
What’s the Metropolis Algorithm?
What’s the Metropolis-Hastings algorithm?
What’s importance sampling?
What’s the normal pdf?
What’s the gamma pdf?
What’s the gamma mean?
What’s the weak law of large numbers?
What’s the CLT?
What’s asymptotic normality of the MLE?
What’s the Gauss-Markov Theorem?
The Gauss-Markov theorem says that, under certain conditions, the ordinary least squares (OLS) estimator of the coefficients of a linear regression model is the best linear unbiased estimator (BLUE): among all estimators that are unbiased and linear in the observed outcomes, it has the smallest variance.
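The BLUE property can be illustrated by simulation (a sketch assuming NumPy, with made-up data): both the OLS slope and the slope through the two endpoints are linear unbiased estimators, but OLS has smaller variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
beta0, beta1 = 1.0, 2.0

ols_slopes, endpoint_slopes = [], []
for _ in range(2000):
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, size=x.size)
    # OLS slope via the normal equations
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    ols_slopes.append(b[1])
    # Another linear unbiased estimator: slope through the two endpoints
    endpoint_slopes.append((y[-1] - y[0]) / (x[-1] - x[0]))

print(np.mean(ols_slopes), np.var(ols_slopes))
print(np.mean(endpoint_slopes), np.var(endpoint_slopes))
```

Both estimators average to the true slope of 2, but the OLS variance is far smaller, as Gauss-Markov guarantees.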
What are the conditions required for the Gauss-Markov Theorem to apply?
Errors must have zero mean, constant variance (homoscedasticity), and be uncorrelated with one another; uncorrelated (not necessarily independent) errors are the key condition.
What’s the OLS estimator?
What’s the sum of an infinite power series?
What’s the sum of an infinite geometric series?
What’s Small o: convergence in probability?
What’s Big O: stochastic boundedness?
What’s the EM algorithm?
What’s an orthogonal projection?
The orthogonal projection of a vector $s$ onto a given subspace $R$ is the vector $r \in R$ that is closest to $s$.
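When $R$ is the column space of a matrix $X$, the projection is $P = X(X'X)^{-1}X'$; a minimal sketch assuming NumPy:

```python
import numpy as np

# Subspace R = column space of X; project s onto it via P = X (X'X)^{-1} X'
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
s = np.array([1.0, 3.0, 2.0])

P = X @ np.linalg.solve(X.T @ X, X.T)
r = P @ s                               # closest point in R to s

print(np.allclose(P @ P, P))            # idempotent: projecting twice changes nothing
print(np.allclose(X.T @ (s - r), 0.0))  # residual is orthogonal to the subspace
```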
What does it mean for a projection to be idempotent?
What’s Newton’s Method?
When/ how might Monte Carlo Integration be useful for likelihood estimation?
What are M-estimators and how are they different from MLE?
What’s the Poisson PMF?
When might consistency/ asymptotic approximation of the MLE break down?
If the number of parameters increases with the sample size, the usual theorems about consistency and asymptotic normality of MLEs don't apply.
When might we use the delta method?
The Delta method is a theorem that can be used to derive the distribution of a function of an asymptotically normal variable.
It is often used to derive standard errors and confidence intervals for functions of parameters whose estimators are asymptotically normal.
It can be useful in contexts where asymptotic properties of the MLE break down, e.g., when the number of parameters increases with sample size and the Fisher information consequently doesn't approximate the asymptotic variance.
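A numeric sketch of the delta method (assuming NumPy, with simulated exponential data): for $g(\bar{X}) = \log \bar{X}$, the delta method gives $\mathrm{Var}(\log \bar{X}) \approx g'(\mu)^2 \cdot \mathrm{Var}(\bar{X}) = (1/\mu)^2 \cdot \mu^2/n = 1/n$, which matches the simulated spread.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, reps = 400, 2.0, 4000

# Sampling distribution of g(xbar) = log(xbar) for Exponential(mean=2) data
log_means = np.array([np.log(rng.exponential(mu, n).mean()) for _ in range(reps)])

# Delta method SD: sqrt((1/mu)^2 * mu^2 / n) = 1/sqrt(n)
delta_sd = 1.0 / np.sqrt(n)
print(log_means.std(), delta_sd)
```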
What are the assumptions for OLS regression?
- Linearity in parameters (variable transformation can be used to meet this assumption but may make interpretation difficult)
- Full column rank of X (when a matrix is rank deficient, its determinant is zero and its inverse does not exist):
  - If perfect multicollinearity exists between any two or more regressors, X does not have k linearly independent columns and is therefore not full rank.
  - If n < k, the rank of X is less than k, so X does not have full column rank. With k regressors, we therefore need at least k observations to estimate β by OLS.
  - For simple linear regression, if all values of the independent variable x are the same, then X has two columns: one of all 1's (for the intercept) and one of all c (the constant value of the regressor). In this case too, the matrix is not full rank, the inverse does not exist, and β cannot be determined.
- Zero conditional mean of errors
Note: Conditions 1-3 mean the OLS estimator is unbiased.
To have the OLS as the BLUE by the Gauss-Markov theorem, we additionally need:
- Homoscedasticity: the errors have constant variance, so the regressors carry no useful information about the size of the errors.
- Nonautocorrelation: errors are uncorrelated with other errors.
Additional assumption:
6. Normality
- When all the other assumptions are met along with the normality assumption, OLS estimates coincide with the Maximum Likelihood estimates, giving us certain useful properties to work with.
- The coefficient estimates for βᵢ can be shown to be linear functions of the errors ϵᵢ. A useful property to keep in mind: a linear function of a normally distributed random variable is also normally distributed. Assuming the ϵᵢ are normally distributed therefore makes the βᵢ estimates normally distributed as well. This makes it easier to calculate confidence intervals and p-values for the estimated β coefficients (commonly seen in R and Python model summaries). If the error normality condition is not satisfied, all the confidence intervals and p-values of individual t-tests for the β coefficients are unreliable.
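The rank-deficiency cases above (constant regressor, perfect multicollinearity) can be checked numerically; a sketch assuming NumPy:

```python
import numpy as np

# Constant regressor: the second column is 3 times the intercept column,
# so X is rank deficient and (X'X)^{-1} does not exist
X = np.column_stack([np.ones(5), np.full(5, 3.0)])

rank = np.linalg.matrix_rank(X)    # 1, not full column rank (k = 2)
det_xtx = np.linalg.det(X.T @ X)   # ~0, so the inverse is undefined
print(rank, det_xtx)
```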
Source: https://towardsdatascience.com/assumptions-in-ols-regression-why-do-they-matter-9501c800787d
What is Iteratively reweighted least squares?
The method of iteratively reweighted least squares (IRLS) is used to solve certain optimization problems with objective functions of the form of a p-norm, and can be used to estimate the beta coefficients for a generalized linear model.
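A minimal sketch of IRLS for logistic regression (assuming NumPy, with simulated data and hypothetical true coefficients): each iteration solves a weighted least squares problem with working weights $w_i = p_i(1 - p_i)$ and working response $z_i = \eta_i + (y_i - p_i)/w_i$.

```python
import numpy as np

def irls_logistic(X, y, iters=25):
    """Fit logistic regression coefficients by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))   # mean function (inverse logit)
        W = p * (1.0 - p)                # working weights
        z = eta + (y - p) / W            # working response
        # Weighted least squares step: solve (X'WX) beta = X'Wz
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
true_beta = np.array([-0.5, 1.0])       # made-up coefficients for the simulation
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))
print(irls_logistic(X, y))
```

The estimates land close to the simulated coefficients; this is the same algorithm GLM fitters typically run under the hood.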
What’s the invariance property of the MLE?
What’s Bayes Rule?
What’s the definition of correlation?
What’s standard brownian motion?
What’s the variance of the sum of two random variables?
What’s a Poisson process?
What’s the exponential distribution?
Note, the exponential distribution is memoryless.
What’s the connection between the exponential distribution and the gamma?
The sum of n i.i.d. Exponential(lambda) random variables is Gamma(n, lambda).
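A quick Monte Carlo check of this fact (a sketch assuming NumPy): the simulated sums should match the Gamma(n, lambda) mean n/lambda and variance n/lambda².

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam, reps = 5, 2.0, 100_000

# Sum of n i.i.d. Exponential(rate=lam) draws, replicated many times
sums = rng.exponential(1.0 / lam, size=(reps, n)).sum(axis=1)

# Gamma(shape=n, rate=lam) has mean n/lam and variance n/lam^2
print(sums.mean(), n / lam)        # both ~ 2.5
print(sums.var(), n / lam**2)      # both ~ 1.25
```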
What’s the general form for the f-statistic?
What’s the Kronecker product of A (an m × n matrix) and B (a p × q matrix)?
What’s the Bernoulli PMF?
f(x) = p^x (1-p)^(1-x) for x ∈ {0, 1}, with EX = p and VarX = p(1-p).
What’s the binomial PMF?
EX=np and VarX=np(1-p)
What’s variance in terms of expectation?
Note what variance reduces to when E(X) = 0.
What’s the law of total expectation?
What is E(X) for X ~ Beta(a, b)?
What’s a useful gamma function identity?
What’s the Bayes Factor for model comparison?
What’s the beta-binomial model’s marginal?
For a continuous-time Markov chain:
a) What’s the time to state change?
b) What’s the generator matrix?
c) How might we use the generator matrix to get the stationary probabilities?
What’s the difference between Bayesian and frequentist residuals in the context of Linear Models?
What’s the equivalent Bayesian setup of a frequentist linear regression?
What’s the equivalent Bayesian setup of a frequentist penalized regression?
What’s the beta function in terms of the gamma function?