Midterm Flashcards
Independence
Events A and B are independent if Pr(A & B) = Pr(A)*Pr(B), or equivalently (when Pr(B) > 0) Pr(A | B) = Pr(A)
- Determine if variables are correlated
- Avoid biased estimates
- Investigate autocorrelation, which could lead to inefficiency and incorrect standard errors
Conditional Expectation
- Calculating conditional variance
- Law of iterated expectations
Law of Iterated Expectations
E[E[g(x,y)|x1, x2]|x1] = E[g(x,y)|x1]
- Essential for deriving unbiased estimators
- How information affects predictions
Mean Independence
E[Y|X]=E[Y]
- E[u|X]=0
- Helps to justify assumptions in instrumental variable regression
Coefficient of Determination (R^2)
= SSE / SST = 1 - SSR / SST (explained sum of squares over total sum of squares)
The proportion of the variance in the dependent variable that is captured by the model.
- Predictive Power
- Comparing models
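A minimal numpy sketch of the decomposition (toy data; all names illustrative):

```python
import numpy as np

# Fit a simple OLS model and decompose the variation in y.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(size=100)
X = np.column_stack([np.ones_like(x), x])        # design matrix with intercept
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

print(sse / sst, 1 - ssr / sst)        # the two R^2 expressions coincide
```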
Leverage Point
Observation whose explanatory variables have the potential to exert an unusually strong effect on the fitted model.
- Can disproportionately influence the results of a regression model.
- Model stability
- Important to detect to ensure robust and reliable regression analysis
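A quick numpy illustration: leverage is the diagonal of the hat matrix, and an extreme x-value stands out (data and names are made up):

```python
import numpy as np

# Leverage of observation i = i-th diagonal of H = X (X'X)^{-1} X'.
# Values far above the average k/n flag potential leverage points.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
X[0, 1] = 10.0                          # plant one extreme x-value

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)
print(leverage[0], leverage.mean())     # large vs. the mean k/n = 2/50
```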
Frisch-Waugh Theorem
Simplifies a multiple linear regression by isolating the effect of a single variable (or block of variables) after partialling out the others.
Beta_1 = (X1’ M2 X1)^-1 X1’ M2 y, where M2 = I - X2 (X2’ X2)^-1 X2’ is the residual-maker for X2
- Useful when dealing with panel data sets and individual effects.
- How do individual variables contribute to the regression model?
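A short simulation (toy data, illustrative names) confirming that the partialled-out formula reproduces the full-regression coefficient:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])  # controls, incl. intercept
x1 = X2[:, 1] + rng.normal(size=n)                      # regressor of interest
y = 1.0 + 0.5 * x1 + 2.0 * X2[:, 1] + rng.normal(size=n)

# Coefficient on x1 from the full regression
beta_full = np.linalg.lstsq(np.column_stack([x1, X2]), y, rcond=None)[0][0]

# Frisch-Waugh: apply the residual-maker M2 = I - X2 (X2'X2)^{-1} X2'
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
beta_fw = (x1 @ M2 @ y) / (x1 @ M2 @ x1)

print(beta_full, beta_fw)   # equal up to floating-point error
```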
Positive Definite
A symmetric matrix A is positive definite if x’Ax > 0 for all x != 0
- Ensures OLS estimates have a unique and stable solution
- Guarantees the Var-Cov matrix is invertible
- Ensures the negative log-likelihood has a unique minimum, so the MLE is well defined
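A two-line numpy check (illustrative matrix): eigenvalues or a Cholesky attempt both diagnose positive definiteness:

```python
import numpy as np

# A symmetric matrix is positive definite iff all its eigenvalues are positive;
# np.linalg.cholesky raises LinAlgError if the matrix is not.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(np.linalg.eigvalsh(A))   # both positive -> positive definite
print(np.linalg.cholesky(A))   # succeeds, returning a lower-triangular factor
```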
Nonnegative Definite
A symmetric matrix A is nonnegative (positive semi-)definite if x’Ax >= 0 for all x
- A variance-covariance matrix must be nonnegative definite for meaningful statistical inference (no linear combination can have negative variance)
Square Root Matrix
R is a square root of the matrix A if A = RR’ (the Cholesky factor is one example)
- Used in Monte Carlo simulation to generate correlated random draws
- Essential for Generalized Least Squares
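A minimal Monte Carlo sketch: with A = RR’, transforming standard normal draws by R produces draws with variance A (numbers are illustrative):

```python
import numpy as np

# If A = R R', then z = R e with e ~ N(0, I) satisfies Var(z) = R R' = A.
A = np.array([[2.0, 0.8],
              [0.8, 1.0]])
R = np.linalg.cholesky(A)               # one valid square root of A

rng = np.random.default_rng(3)
e = rng.standard_normal((2, 100_000))   # independent standard normals
z = R @ e
print(np.cov(z))                        # close to A
```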
Level Curves
Sets of points where a function takes the same value.
For each c in R, the level set is {x : x’Ax = c}
- Visualizing constraints and trade-offs
Normal Distribution
f(y) = (1 / sqrt(2 pi sigma^2)) exp(-(y - mu)^2 / (2 sigma^2))
- Central Limit Theorem
- Standard statistical tests
- Symmetry and known properties make it easy to deal with
Multivariate Normal Distribution
A random vector is multivariate normal if it can be expressed as a non-stochastic linear transformation (plus a constant vector) of a vector of independent standard normal random variables.
- Modeling relationships between multiple dependent variables (time series)
- Enables factor analysis and multivariate hypothesis testing
Student’s t-Distribution
t(m) = z / sqrt(chi^2(m) / m), where z ~ N(0, 1) is independent of the chi^2(m) variable
- As sample size increases, it approaches the normal distribution
- Used for inference when the variance must be estimated, especially in small samples
- Central to testing the significance of individual coefficients in regression analysis
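A simulation sketch of the defining ratio (sample size and seed arbitrary):

```python
import numpy as np

# Build t(m) draws as z / sqrt(chi2(m) / m) with independent numerator/denominator.
rng = np.random.default_rng(4)
m = 5
z = rng.standard_normal(100_000)
chi2 = rng.chisquare(m, size=100_000)
t_draws = z / np.sqrt(chi2 / m)

print(np.var(t_draws), m / (m - 2))   # sample variance vs. theoretical m/(m-2)
```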
Snedecor’s F-Distribution
A random variable follows F(m1, m2) if it can be expressed as the ratio of two independent chi^2 random variables, each divided by its degrees of freedom.
- Comparing variances across groups
- Testing the significance of a regression model
- Test restrictions on multiple regression coefficients
- Testing whether a restricted (nested) model fits significantly worse than a more complex model
Neyman-Pearson Lemma
For a simple null against a simple alternative, the likelihood ratio test is most powerful: the optimal critical region contains no outcome that is relatively more likely under the null than under the alternative.
- Likelihood Ratio Tests
- Balances the tradeoff between Type I and Type II errors
Unbiased Test
The probability of rejection at each point in the null is no greater than the probability of rejection at each point in the alternative (power is at least the size).
- Provides a fair test that does not favour the null or alternative hypothesis.
Restricted Least Squares Estimator
Incorporates linear constraints (R beta = r) into the regression model.
- Testing hypotheses about relationships between variables
- Improves the efficiency of estimates when the restrictions are valid
Consistent Estimator
If the estimator converges in probability to the true value of the parameter.
- Essential for reliable long-term inference
- With sufficient data, the estimator will yield results close to the actual population parameter.
Law of Large Numbers
Sample averages converge to their population means.
- Provides the foundation for the consistency of estimators.
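A one-screen simulation (uniform draws chosen arbitrarily) showing the running mean settling at the population mean:

```python
import numpy as np

# Running sample mean of iid Uniform(0,1) draws converges to 0.5.
rng = np.random.default_rng(5)
draws = rng.uniform(0, 1, size=100_000)
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)
print(running_mean[[9, 99, 9_999, 99_999]])   # drifts toward 0.5
```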
Asymptotics
The behaviour of estimators and test statistics as the sample size approaches infinity.
Independent and Identically Distributed
Random variables that are independent of each other and drawn from the same probability distribution.
- Underpins standard results for OLS, MLE, and the LLN
- Simplifies assumptions, making it easier to derive consistent and unbiased estimators
Continuous Mapping Theorem
plim g(x_n) = g(plim x_n) for continuous g
- Allows the application of continuous functions to converging sequences of random variables while preserving convergence
Convergence in Distribution
Let z_n be a sequence of random variables with CDFs F_n. If F_n(x) -> F(x) at every continuity point of F, then z_n converges in distribution to F.
- Analyzing the behaviour of r.v.’s as the sample size increases.
- Ensures the distribution of an estimator or test statistic approaches a limiting distribution
- Fundamental for Central Limit Theorem.
Wald Test
Testing the significance of one or more coefficients in a regression model.
- Widely used in Generalized Linear Models and MLE
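A sketch of the standard Wald statistic W = (Rb - r)’ [R V R’]^-1 (Rb - r), which is chi^2(q) under H0; the numbers below are made up for illustration:

```python
import numpy as np

def wald_stat(b, V, R, r):
    # W = (R b - r)' [R V R']^{-1} (R b - r), chi^2 with rows(R) df under H0
    d = R @ b - r
    return d @ np.linalg.solve(R @ V @ R.T, d)

b = np.array([1.2, 0.4])                      # estimated coefficients
V = np.array([[0.04, 0.01],
              [0.01, 0.09]])                  # their estimated covariance
R = np.array([[0.0, 1.0]])                    # H0: second coefficient = 0
print(wald_stat(b, V, R, np.array([0.0])))    # compare with chi^2(1) cutoff
```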
Maximum Likelihood Estimator
The value of the parameter vector that maximizes the likelihood function.
- Produces efficient and consistent estimators.
- Central to Generalized Linear Models
- Incorporates non-linear relationships
Efficient Score Test (the Lagrange Multiplier)
Evaluates the score (the gradient of the log-likelihood) at the restricted estimate; tells us how the fit would change if we relaxed the constraint.
- Useful when the unrestricted model is complex and difficult to estimate
- Effective for detecting small deviations from the null hypothesis
Information Criteria
Balances model fit with model complexity
- Choosing the best model
- Time series
Akaike (AIC)
Penalizes the inclusion of additional parameters
- Prevents overfitting
- Time series
Bayes/Schwarz (BIC)
Imposes a stronger penalty for the number of parameters than AIC
- More conservative than AIC
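For a Gaussian linear model, both criteria reduce (up to an additive constant) to a fit term plus a parameter penalty; a small sketch with made-up inputs:

```python
import numpy as np

def aic_bic(ssr, n, k):
    # Gaussian linear model, constants dropped:
    #   AIC = n log(SSR/n) + 2k,   BIC = n log(SSR/n) + k log(n)
    # BIC's log(n) exceeds AIC's 2 once n > e^2 ~ 7.4, hence smaller models.
    base = n * np.log(ssr / n)
    return base + 2 * k, base + k * np.log(n)

print(aic_bic(ssr=120.0, n=100, k=3))   # only differences across models matter
```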
Rolling Binned Means Estimator
- Smooths time series data by calculating averages over sliding windows
- Reduces noise
- Mitigates the effects of outliers
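A minimal smoothing sketch via convolution (window length and data are arbitrary):

```python
import numpy as np

def rolling_mean(y, w):
    # Average over a sliding window of w observations.
    return np.convolve(y, np.ones(w) / w, mode="valid")

rng = np.random.default_rng(6)
t = np.linspace(0, 4 * np.pi, 500)
y = np.sin(t) + rng.normal(scale=0.5, size=t.size)   # noisy series
smooth = rolling_mean(y, w=25)                       # tracks sin(t) much closer
print(smooth[:5])
```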
Kernel
Weights observations based on their distance from a target point.
- Bias-variance trade off
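A Nadaraya-Watson-style sketch with a Gaussian kernel (bandwidth h and data invented); h is exactly where the bias-variance trade-off lives:

```python
import numpy as np

def nw_estimate(x0, x, y, h):
    # Weighted mean of y, with weights decaying in distance from x0.
    # Small h: low bias, high variance; large h: the reverse.
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(7)
x = rng.uniform(-3, 3, size=300)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
print(nw_estimate(0.0, x, y, h=0.3))   # near sin(0) = 0
```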
Curse of Dimensionality
The exponential increase in complexity and data requirements as the number of variables in a model grows
Ridge Regression
Adds an L2 penalty lambda * ||beta||^2 to the least-squares objective (equivalently, adds lambda*I to X’X in the normal equations), which shrinks the coefficients towards zero without making them exactly zero.
- Used to address multicollinearity
- Balances bias and variance
- Prevents overfitting
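A closed-form sketch (toy near-collinear design; lam values arbitrary):

```python
import numpy as np

def ridge(X, y, lam):
    # beta = (X'X + lam I)^{-1} X'y; lam = 0 recovers OLS.
    # (In practice the intercept column is usually left unpenalized.)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 1] + 0.01 * rng.normal(size=100)      # near-collinear pair
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=100)
print(ridge(X, y, lam=0.0))    # OLS: unstable split across the collinear pair
print(ridge(X, y, lam=10.0))   # shrunken, more stable coefficients
```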
Best Subset Selection
Identifies the best combination of predictors from a larger set of potential explanatory variables.
- Related to Information Criteria
LASSO
Some coefficients are set to zero, while others are shrunk towards the origin.
- Improves model prediction and interpretability
- Balances model complexity and predictive accuracy
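A short sketch assuming scikit-learn is available (data and alpha invented):

```python
import numpy as np
from sklearn.linear_model import Lasso   # assumes scikit-learn is installed

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)   # 2 true predictors

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)   # most coefficients exactly zero; the two signals survive
```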
Training and Validation Set
Sample is randomly divided into two parts: A training set and a validation set. Fit the model with the training set and use the validation set to compute MSE.
- Tests a model on unseen data.
Leave one out Cross Validation
Train the model on all but one observation. Test it on the remaining observation. Repeat for each observation. Average the results.
- Approximately unbiased estimate of model performance.
- Useful for small datasets.
K-Fold Cross Validation
Randomly divide the sample into K folds. Each fold serves once as the validation set, with the model fit on the remaining K-1 folds; average the MSE across the K folds.
- Works well for small and large datasets.
- More robust performance estimate
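A manual numpy implementation for OLS (K, seed, and data arbitrary), which also covers the validation-set and LOOCV ideas above as special cases:

```python
import numpy as np

def kfold_mse(X, y, K=5, seed=0):
    # Each fold serves once as validation; average MSE over the K folds.
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, K)
    mses = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
        mses.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(mses)

rng = np.random.default_rng(10)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
print(kfold_mse(X, y))           # K = len(y) would give LOOCV
```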
Sieve Estimation
Approximates the unknown function m(x) arbitrarily well by a linear combination from a growing family of basis functions.
- Uses a sequence of simpler models instead of a fixed functional form.
- Avoids overfitting
- As the number of observations grows, the sieve converges to the true model.
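A polynomial-sieve sketch (degree J and target function chosen for illustration); letting J grow slowly with n is what drives convergence:

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.uniform(-1, 1, size=500)
y = np.exp(x) + rng.normal(scale=0.2, size=x.size)   # true m(x) = exp(x), unknown

J = 4                                                # sieve dimension
X = np.column_stack([x ** j for j in range(J + 1)])  # polynomial basis
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta                                    # approximates exp(x)
print(np.max(np.abs(fitted - np.exp(x))))            # small approximation error
```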
Neural Nets
A machine learning model built from layers of interconnected units with non-linear activation functions.
- Used for modeling complex non-linear relationships
- Flexibility and High Predictive Power
Panel Data Fixed Effects
Controls for unobserved heterogeneity when this heterogeneity is constant over time and correlated with independent variables.
- Controls for time invariant factors