Glossary Flashcards
Random variable
A variable whose value depends on the outcome of a random experiment
Support (of a RV)
Set of possible values a RV can take
Probability mass/density function
PMF (discrete): each point gives the probability of that exact value. PDF (continuous): each point gives a density, and probabilities come from areas under the curve, not from single points
PMF = discrete, e.g. how many heads in 10 tosses of a coin
PDF = continuous, e.g. weight of females in California aged 18-25
PMF: fX(x) = P(X=x); PDF: P(a ≤ X ≤ b) = ∫ fX(x) dx over [a, b]
Cumulative distribution function
Integrate the PDF to get the CDF
Because it is cumulative, the CDF is non-decreasing; at each value of x it gives the probability of observing a value at or below that point
FX(x) = P(X≤x)
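A minimal sketch (using scipy.stats, which is not part of the notes; all parameters are invented) contrasting a PMF with a PDF and showing how each accumulates into a CDF:

```python
from scipy import stats

# Discrete: number of heads in 10 fair coin tosses (binomial PMF)
heads = stats.binom(n=10, p=0.5)
print(heads.pmf(4))   # P(X = 4)
print(heads.cdf(4))   # P(X <= 4): sum of the PMF up to 4

# Continuous: a normally distributed weight (the PDF is a density, not a probability)
weight = stats.norm(loc=60, scale=8)
print(weight.pdf(60))                   # density at 60, NOT P(X = 60)
print(weight.cdf(68) - weight.cdf(52))  # P(52 <= X <= 68) = area under the PDF
```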
Variance
The expected squared deviation from the mean: Var(X) = E[(X - E[X])²]
Statistic
A single measure of some attribute of a sample, calculated by applying a function to the set of data.
The function itself is independent of the sample’s distribution; that is, the function can be stated before realization of the data.
The mean of a population is not a statistic (it is not a function of random variables), but the mean of a randomly drawn sample is
Uniform distribution
Continuous RV between a and b - the density is the same at every point between a and b
Mean = (a+b)/2
Bernoulli distribution
Takes the value 1 with probability p and 0 with probability 1-p
Support {0,1}
Population mean = p
Population variance = p(1-p)
Sample mean = X¯
Sample variance = [n/(n-1)] · X̄(1 - X̄)
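A quick simulation check (numpy, with made-up p and n) that the sample mean of Bernoulli draws estimates p and that the sample variance equals [n/(n-1)] · X̄(1 - X̄):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 10_000
x = rng.binomial(1, p, size=n)      # n Bernoulli(p) draws

xbar = x.mean()
s2 = x.var(ddof=1)                  # unbiased sample variance
print(xbar, p)                              # sample mean ~ p
print(s2, n / (n - 1) * xbar * (1 - xbar))  # the two formulas agree exactly
print(p * (1 - p))                          # population variance for comparison
```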
Binomial distribution
X is the number of successes in n independent Bernoulli trials, each with success probability p
X~B(n,p)
E(X) = np
CLT: the binomial is discrete, but as you do more trials it starts to look continuous: for large n, X ≈ N(np, np(1-p)), or equivalently the sample proportion p̂ ≈ N(p, p(1-p)/n)
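Sketch of the normal approximation to the binomial (scipy, invented n, p and cut-off; the continuity correction is a standard refinement, not from the notes):

```python
from scipy import stats

n, p = 400, 0.25
binom = stats.binom(n, p)
approx = stats.norm(loc=n * p, scale=(n * p * (1 - p)) ** 0.5)

k = 110
print(binom.cdf(k))         # exact P(X <= k)
print(approx.cdf(k + 0.5))  # normal approximation (with continuity correction)
```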
Normal distribution (skewness and kurtosis)
Averages more common than extremes, bell-shaped diagram
Skewness = measure of asymmetry; kurtosis = fatness of the tails
z = (X - μ)/σ, or z = (μ̂ - μ)/SE(μ̂) with samples
Then use the tables: Φ(z) = P(Z ≤ z)
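The table lookup can be replaced by scipy.stats.norm (illustrative numbers only):

```python
from scipy import stats

x, mu, sigma = 1.96, 0.0, 1.0
z = (x - mu) / sigma
print(stats.norm.cdf(z))      # Phi(z) = P(Z <= z), ~0.975 here
print(stats.norm.ppf(0.975))  # inverse lookup: the z for a given probability
```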
Jensen’s inequality
If g(·) is concave, then E[g(X)] ≤ g(E[X]) (strict if g is strictly concave and X is not constant)
E.g. mean of the log < log of the mean
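A numerical illustration (made-up data) with the concave function log - the mean of the logs is below the log of the mean:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0, 50.0])
print(np.mean(np.log(x)))  # E[log X]
print(np.log(np.mean(x)))  # log(E[X]) -- larger
```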
Marginal probability
Marginal probability = the probability of an event occurring, p(A)
The probability that A=1, regardless of the value of B
Joint probability
Joint probability = p(A and B), the probability of event A and event B occurring
So, the joint distribution is the set of probabilities of possible pairs of values
Conditional probability
p(A|B), the probability of event A occurring, given that event B occurs: p(A|B) = p(A and B) / p(B)
Conditional mean
Mean of the conditional distribution → E[Y|X=x]
Law of iterated expectations
The mean of the conditional means is the mean → E[Y] = E[E[Y|X]]
E.g., suppose we are interested in average IQ generally, but we have measures of average IQ by gender. We could figure out the quantity of interest by weighting average IQ by the relative proportions of men and women
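The IQ example as arithmetic (all numbers are invented for illustration): E[Y] is the proportion-weighted sum of the conditional means.

```python
# E[Y] = sum over groups of P(group) * E[Y | group]
p_men, p_women = 0.49, 0.51     # relative proportions
iq_men, iq_women = 101.0, 99.5  # conditional means E[IQ | gender]

overall_iq = p_men * iq_men + p_women * iq_women
print(overall_iq)               # the unconditional mean E[IQ]
```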
Covariance
Do X and Y vary together?
σ(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]
Correlation
Measures only the linear relationship - may be 0 even if the relationship is perfect but non-linear
ρX,Y = cov(X,Y) / σXσY
(σ are standard deviations)
E[aX + bY] = aE[X] + bE[Y]
Var[aX + bY] = a²Var[X] + b²Var[Y] + 2abCov(X,Y)
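A simulation sketch (numpy, arbitrary a, b and correlation structure) checking the covariance, correlation and Var[aX + bY] formulas on correlated draws:

```python
import numpy as np

rng = np.random.default_rng(1)
n, a, b = 200_000, 2.0, -3.0
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)   # y is correlated with x

cov_xy = np.cov(x, y)[0, 1]
corr_xy = np.corrcoef(x, y)[0, 1]  # cov / (sd_x * sd_y)
lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov_xy
print(corr_xy)
print(lhs, rhs)                    # approximately equal
```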
Random sample
When a random experiment is repeated n times, we obtain n independent identically distributed (IID) random variables
Drawing one person tells us nothing about who else will be drawn; all are drawn from the same population
Sample mean
The mean of a subset of the population (and a random variable)
X̄ = (1/n) ΣXi
The larger the sample, the smaller the variance of the sample mean.
The sample mean is an unbiased estimator of the population mean (µ)
Law of large numbers
If Yn are IID, the sample mean converges in probability to the population mean as the sample size grows
As n → ∞, X̄ - μ → 0 (in probability)
Central limit theorem
If Yn are IID with mean µ and variance σ2 and n is large, then the sample mean (Ybar) is approximately normally distributed, with mean µ and variance σ2/n
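A simulation sketch (numpy; the exponential population and sample sizes are invented) of the LLN and CLT: the population is skewed, yet the sample mean is centred on μ with variance close to σ²/n.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = sigma2 = 1.0                   # Exponential(1): mean 1, variance 1
n, reps = 100, 50_000

means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
print(means.mean(), mu)             # LLN: centred on the population mean
print(means.var(), sigma2 / n)      # CLT: variance ~ sigma^2 / n
```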
Sample variance
s² = [1/(n-1)] Σ(Xi - X̄)²
We have lost one degree of freedom: if you have n-1 numbers and the mean, you can work out the last number. The last number is not independent - we only have n-1 independent observations
The sample variance is an unbiased estimator of population variance - PROOF
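Not the proof, just a simulation sketch (numpy, invented σ² and sample size): dividing by n-1 gives an approximately unbiased estimate of σ² across repeated samples, while dividing by n is biased low.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, n, reps = 4.0, 5, 200_000
samples = rng.normal(0.0, sigma2 ** 0.5, size=(reps, n))

print(samples.var(axis=1, ddof=1).mean())  # ~4.0 (unbiased, n-1 denominator)
print(samples.var(axis=1, ddof=0).mean())  # ~3.2 (biased low, n denominator)
```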
Degrees of freedom
The number of independent pieces of information remaining after other statistics have been estimated from the data (the mean uses the 1st moment, variance the 2nd, skewness the 3rd…)
Standard error
The standard deviation of an estimator's sampling distribution - how far, on average, the estimate lies from the true value
SE(X¯) = s /√n
If, under a null hypothesis, two samples are drawn from the same distribution, they share the same population mean and variance - so their sample variances can be pooled for the standard error.
SE = √[s²A/nA + s²B/nB] when pooling
SE = √[p(1-p)/n] for Bernoulli
SE = √[p₁(1-p₁)/n₁ + p₂(1-p₂)/n₂] when pooling Bernoulli
NB. Population: σ² → σ ; Sample: s²/n = Var(d̂) → s/√n = SE(d̂)
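A pooled standard error on invented summary statistics, following the formula above:

```python
import math

s2_a, n_a = 9.0, 120   # sample variance and size, group A
s2_b, n_b = 16.0, 150  # sample variance and size, group B

se_diff = math.sqrt(s2_a / n_a + s2_b / n_b)  # SE of the difference in means
print(se_diff)
```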
T-statistic
Due to the CLT, the sample mean is normally distributed when n is large
t = (X̄ - μ)/SE(X̄)
More generally, t = (β̂ - β₀)/se(β̂), where β̂ is an estimator of the parameter β in a statistical model, β₀ is a non-random, known constant, and se(β̂) is the standard error of the estimator β̂.
Confidence intervals
A 95% CI is constructed so that, across repeated samples, 95% of such intervals contain the true value - loosely, “there is a 95% probability that the true XXX is between A and B”
CI = [ μ^ ± z • SE(μ^) ]
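A 95% interval on made-up values of μ̂ and SE(μ̂), using scipy for the critical value:

```python
from scipy import stats

mu_hat, se = 2.4, 0.3
z = stats.norm.ppf(0.975)  # ~1.96 for 95% coverage
ci = (mu_hat - z * se, mu_hat + z * se)
print(ci)
```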
Hypothesis testing - what are the steps? [5]
1) State null and alternative hypotheses, decide if it’s a one- or two-tailed test
2) Suppose H0 is true: under H0, t = (X̄ - μ₀)/SE(X̄) ~ N(0,1), as it's a large sample
3) Decision rule: reject H0 at (e.g. 5)% significance level if |t|>z (two-tailed), or if t>z (one-tailed)
4) Carry out test: plug in values
5) Decide whether or not to reject H0
Significance level: the probability, computed under H0, below which a result is judged too extreme to be due to chance - reject H0 when the p-value falls below it
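The five steps on invented numbers: a two-tailed test of H0: μ = 0 against H1: μ ≠ 0 at the 5% level, using the large-sample normal approximation.

```python
from scipy import stats

xbar, mu0, se, alpha = 0.9, 0.0, 0.4, 0.05
t = (xbar - mu0) / se                   # step 2: t-statistic under H0
z_crit = stats.norm.ppf(1 - alpha / 2)  # step 3: two-tailed critical value (~1.96)
print(t, z_crit)                        # step 4: plug in values
print("reject H0" if abs(t) > z_crit else "do not reject H0")  # step 5
```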
Type I error
Rejecting the null when it is actually true, e.g. saying a drug works when it doesn't
P(Type I error) = significance level α, e.g. at α = 5% the null will be wrongly rejected 5% of the time when it is actually true
Type II error
Failing to reject the null even though it is false - e.g. saying a drug doesn't work when it does
P(Type II error) = β
Power of test = 1 - β
P value
Probability of obtaining a value at least as extreme as t under the null
P(|Z| ≥ |t|) = 2(1 - Φ(|t|)) for a two-sided test (1 - Φ(t) for one-sided)
Probability of observing this t-statistic if the null were true
The smallest significance level at which the null would be rejected
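The p-value formulas on an illustrative t-statistic:

```python
from scipy import stats

t = 2.25
p_two_sided = 2 * (1 - stats.norm.cdf(abs(t)))  # P(|Z| >= |t|)
p_one_sided = 1 - stats.norm.cdf(t)             # P(Z >= t)
print(p_two_sided, p_one_sided)
```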
Estimators
Should be:
- Unbiased - E[µ^] = µ
- Consistent - converges in probability to true value with higher n
- Efficient - low variance
Real equivalised household income
Economist’s guide to welfare:
X = Y/(P·C)
Y is nominal household income
P is cost of living index (inflation adjusted)
C represents changes in tastes (cost of achieving the same utility with different tastes - e.g. family type).
Equivalisation: the scale is normalised so a couple with no children has C = 1 (first adult 0.67 + spouse 0.33); other adults and older children add 0.33 each, younger children 0.2 - reflecting differing costs
Not an ideal measure - there are lots of other variables and externalities
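A sketch applying the scale above to an invented household, assuming income is deflated by the price index and divided by the equivalence scale (the household composition and numbers are made up):

```python
nominal_income = 42_000.0
price_index = 1.10  # cost-of-living index relative to the base year

# couple + one older child + one younger child, using the weights above
scale = 0.67 + 0.33 + 0.33 + 0.2
real_equivalised = nominal_income / (price_index * scale)
print(scale, real_equivalised)
```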
Log normal distribution
X may not be normally distributed, but ln(X) might be
Kolmogorov Smirnov test
Compares sample CDF to hypothesised population CDF - looks for longest distance between them
Tests for whether they have the same distribution - e.g. is sample log normally distributed?
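A sketch of the log-normal question using scipy's KS test on simulated incomes (all parameters invented; note that fitting the normal's mean and sd from the same data makes the printed p-value only approximate):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
income = rng.lognormal(mean=10.0, sigma=0.5, size=1_000)
log_income = np.log(income)

# compare the sample CDF of log-income with a fitted normal CDF
result = stats.kstest(log_income, "norm", args=(log_income.mean(), log_income.std()))
print(result.statistic, result.pvalue)  # statistic = largest gap between the CDFs
```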
Power laws
Describe tails of some distributions - e.g. income, stock returns
These distributions can have infinite variance, or even infinite mean, depending on the tail exponent - the tails decay too slowly
What criteria should social welfare functions meet? [7]
- Monotonicity - more is weakly better than less
- Anonymity - blind to names
- Symmetry - swapping two people’s incomes has no effect on social welfare
- Dalton’s principle / quasiconcavity - inequality is bad, so bend towards origin
- Homogeneity of degree 1 - double income, double welfare (not vital)
What are examples of inequality measures? [3]
- 90:10 ratio - only considers 2 data points, not overall inequality
- Gini - considers ranks, not absolute values
- Coefficient of variation - sensitive to extreme values at the top and bottom (though, unlike the variance, it is unchanged if everyone's income doubles)
Atkinson index
SWF where you must explicitly decide on an inequality aversion parameter, ε
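The standard Atkinson formula is not stated in the notes; this sketch uses the usual definition, A(ε) = 1 - [mean(y^(1-ε))]^(1/(1-ε)) / mean(y) for ε ≠ 1 (geometric mean in the ε = 1 case), on invented incomes:

```python
import numpy as np

def atkinson(y, eps):
    y = np.asarray(y, dtype=float)
    if eps == 1.0:
        ede = np.exp(np.mean(np.log(y)))  # equally distributed equivalent: geometric mean
    else:
        ede = np.mean(y ** (1 - eps)) ** (1 / (1 - eps))
    return 1 - ede / y.mean()

incomes = [10_000, 20_000, 30_000, 90_000]  # invented incomes
print(atkinson(incomes, eps=0.5), atkinson(incomes, eps=2.0))  # higher eps = more aversion
```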
Problems with measuring poverty? [4]
Tough to measure - UK uses 60% of median income
Relative
Skewed incentives - e.g. taxing the very poor to lift the fairly poor above the line
Ignores distribution of poor’s income
Manski’s law of decreasing credibility
The credibility of inference decreases with the strength of the assumptions maintained
i.e. always make minimal assumptions for greater credibility in results