Glossary Flashcards

1
Q

Random variable

A

A variable whose value depends on the outcome of a random experiment

2
Q

Support (of a RV)

A

Set of possible values a RV can take

3
Q

Probability mass/density function

A

PMF (discrete): each value's height is the probability of that exact value - e.g. the number of heads in 10 tosses of a coin

fX(x) = P(X = x)

PDF (continuous): the height is a density, not a probability - probabilities come from areas under the curve - e.g. weight of females in California aged 18-25

P(a ≤ X ≤ b) = area under fX(x) between a and b
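A minimal Python sketch (scipy assumed available; the distributions and numbers are illustrative, not from the cards) contrasting a PMF with a PDF:

```python
# PMF vs PDF sketch (illustrative distributions and parameters)
from scipy import stats

# PMF: number of heads in 10 fair coin tosses ~ Binomial(10, 0.5)
print(stats.binom.pmf(4, n=10, p=0.5))        # P(X = 4), an actual probability

# PDF: a continuous RV, e.g. weight ~ Normal(65, 10) (made-up parameters)
print(stats.norm.pdf(65, loc=65, scale=10))   # a density, NOT a probability
# Probabilities for a continuous RV come from areas under the PDF:
print(stats.norm.cdf(70, 65, 10) - stats.norm.cdf(60, 65, 10))  # P(60 <= X <= 70)
```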

4
Q

Cumulative distribution function

A

Integrate the PDF (or sum the PMF) to get the CDF

Because it accumulates probability, the CDF is non-decreasing; at each value of x it gives the probability of observing a value up to that point

FX(x) = P(X ≤ x)
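A small sketch (Python with numpy/scipy assumed; the standard normal is just an example) of getting the CDF by integrating the PDF:

```python
# CDF as the integral of the PDF, checked against the closed-form CDF
import numpy as np
from scipy import stats

x = 1.0
grid = np.linspace(-8, x, 100_000)               # from far in the left tail up to x
cdf_numeric = np.trapz(stats.norm.pdf(grid), grid)
print(cdf_numeric, stats.norm.cdf(x))            # both ≈ 0.841 = F_X(1) = P(X ≤ 1)
```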

5
Q

Variance

A

The expected squared deviation from the mean: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²

6
Q

Statistic

A

A single measure of some attribute of a sample, calculated by applying a function to the set of data.

The function itself is independent of the sample’s distribution; that is, the function can be stated before realization of the data.

The mean of a population is not a statistic (no RV), but the mean of a sample is (sample variables chosen randomly)

7
Q

Uniform distribution

A

Continuous RV on [a, b] - the density is constant, so every sub-interval of a given length is equally likely

Mean = (a+b)/2, Variance = (b−a)²/12

8
Q

Bernoulli distribution

A

Takes the value 1 with probability p and 0 with probability 1−p

Support {0,1}

Population mean = p
Population variance = p(1-p)

Sample mean = X¯
Sample variance = [n/(n−1)] · X¯(1 − X¯)

9
Q

Binomial distribution

A

X is the number of successes in n independent Bernoulli trials, each with success probability p

X~B(n,p)

E(X) = np, Var(X) = np(1−p)

CLT: the binomial is discrete, but as the number of trials grows it looks increasingly continuous: for large n, X is approximately N(np, np(1−p)), so the sample proportion X/n is approximately N(p, p(1−p)/n)
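A hedged sketch (numpy/scipy assumed; n and p are arbitrary) checking the binomial moments and its normal approximation:

```python
# Moments of a Binomial(n, p) and its normal approximation for large n
import numpy as np
from scipy import stats

n, p = 100, 0.3
rng = np.random.default_rng(0)
draws = rng.binomial(n, p, size=200_000)

print(draws.mean(), n * p)                   # E(X) = np
print(draws.var(), n * p * (1 - p))          # Var(X) = np(1-p)

# Normal approximation: P(X <= 35) vs N(np, np(1-p)) with continuity correction
print(stats.binom.cdf(35, n, p))
print(stats.norm.cdf(35.5, loc=n * p, scale=np.sqrt(n * p * (1 - p))))
```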

10
Q

Normal distribution (skewness and kurtosis)

A

Averages more common than extremes, bell-shaped diagram

Skewness = measure of asymmetry
Kurtosis = fatness of the tails

z = (X − μ)/σ, or z = (X¯ − μ)/SE(X¯) with samples
Then use the tables: Φ(z) = P(Z ≤ z), where Z is standard normal
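A quick sketch (scipy assumed; μ and σ are made-up) of standardising and looking up Φ(z) in place of the tables:

```python
# Standardise and read Φ(z) from scipy instead of the tables
from scipy import stats

mu, sigma = 100, 15          # illustrative population parameters
x = 120
z = (x - mu) / sigma
print(z, stats.norm.cdf(z))  # Φ(z) = P(Z <= z) for standard normal Z
```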

11
Q

Jensen’s inequality

A

If g(.) is concave, then E[g(X)] ≤ g(E[X]) (with strict inequality when g is strictly concave and X is non-degenerate)

E.g. mean of the log < log of the mean
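A minimal Monte Carlo sketch (numpy assumed; the lognormal is just a convenient positive RV) of Jensen's inequality for the concave log function:

```python
# E[log X] vs log E[X] for a positive random variable
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)

print(np.mean(np.log(x)))   # E[log X]  (mean of the log)
print(np.log(np.mean(x)))   # log E[X]  (log of the mean) - larger, as Jensen predicts
```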

12
Q

Marginal probability

A

Marginal probability = the probability of an event occurring, p(A)

The probability that A=1, regardless of the value of B

13
Q

Joint probability

A

Joint probability = p(A and B), the probability of event A and event B occurring

So, the joint distribution is the set of probabilities of possible pairs of values

14
Q

Conditional probability

A

p(A|B), the probability of event A occurring, given that event B occurs

15
Q

Conditional mean

A

Mean of the conditional distribution → E[Y|X=x]

16
Q

Law of iterated expectations

A

The mean of the conditional means is the mean → E[Y] = E[E[Y|X]]

E.g., suppose we are interested in average IQ generally, but we have measures of average IQ by gender. We could figure out the quantity of interest by weighting average IQ by the relative proportions of men and women

17
Q

Covariance

A

Do X and Y vary together?

σ(X,Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]

18
Q

Correlation

A

Measures only the linear relationship - it may be 0 even when the relationship is perfect but non-linear

ρ(X,Y) = cov(X,Y) / (σX σY)

(σ are standard deviations)
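A short sketch (numpy assumed; data simulated) of covariance and correlation, including a perfect non-linear relationship with near-zero correlation:

```python
# Covariance and correlation with numpy
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(size=10_000)

print(np.cov(x, y)[0, 1])        # cov(X, Y)
print(np.corrcoef(x, y)[0, 1])   # ρ = cov / (σx σy)

# Perfect but non-linear: Y = X², correlation ≈ 0 even though Y is determined by X
print(np.corrcoef(x, x**2)[0, 1])
```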

19
Q

E[aX + bY] =

Var[aX + bY] =

A

= aE[X] + bE[Y]

= a²Var[X] + b²Var[Y] + 2abCov(X,Y)

20
Q

Random sample

A

When a random experiment is repeated n times, we obtain n independent identically distributed (IID) random variables

Drawing one person tells us nothing about which others will be drawn (independence), and every draw comes from the same population (identically distributed)

21
Q

Sample mean

A

The mean of a subset of the population (and a random variable)

X¯ = (1/n) ΣXi

The larger the sample, the smaller the variance of the sample mean.

The sample mean is an unbiased estimator of the population mean (µ)

22
Q

Law of large numbers

A

If Yn are IID, the sample mean converges in probability to the population mean as the sample size grows

As n → ∞, X¯ − μ → 0 in probability
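A small simulation sketch (numpy assumed; the exponential distribution and its mean are illustrative) of the running sample mean converging to μ:

```python
# Running sample mean of IID draws converging to the population mean
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0
draws = rng.exponential(scale=mu, size=100_000)       # population mean = 2
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for n in (10, 100, 10_000, 100_000):
    print(n, running_mean[n - 1])                     # approaches mu as n grows
```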

23
Q

Central limit theorem

A

If Yn are IID with mean µ and variance σ², and n is large, then the sample mean (Y¯) is approximately normally distributed, with mean µ and variance σ²/n
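A quick simulation sketch (numpy assumed; a skewed exponential population is used deliberately) of sample means behaving like N(µ, σ²/n):

```python
# Distribution of sample means from a skewed population
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 50                        # exponential(1): mean 1, sd 1
means = rng.exponential(mu, size=(20_000, n)).mean(axis=1)

print(means.mean(), mu)                            # ≈ µ
print(means.std(ddof=1), sigma / np.sqrt(n))       # ≈ σ/√n
```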

24
Q

Sample variance

A

s² = [1/(n−1)] Σ(Xi − X¯)²

We have lost one degree of freedom: if you have n−1 of the numbers and the mean, you can work out the last number. The last number is not independent - we only have n−1 independent observations

The sample variance is an unbiased estimator of the population variance (be able to prove this)
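A minimal sketch (numpy assumed; parameters illustrative) showing why dividing by n−1 rather than n matters for unbiasedness:

```python
# Average of many small-sample variance estimates: n-1 vs n in the denominator
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0
samples = rng.normal(0, np.sqrt(true_var), size=(100_000, 5))   # many samples of n = 5

print(samples.var(axis=1, ddof=1).mean())   # divide by n-1: ≈ 4.0 on average (unbiased)
print(samples.var(axis=1, ddof=0).mean())   # divide by n:   ≈ 3.2 = (n-1)/n * 4 (biased down)
```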

25
Degrees of freedom
Depends on how many statistics have already been calculated from the data - each estimated statistic uses up a degree of freedom. The mean is the first moment, the variance the second, skewness the third...
26
Standard error
How far, on average, an estimate lies from the true value (a property of the estimator): SE(X¯) = s/√n

*If, under a null hypothesis, two samples are drawn from the same distribution, they share the same population mean and variance, and the sample variances can be pooled for the standard error.*

SE = √[s²A/nA + s²B/nB] *when pooling*
SE = √[p(1−p)/n] *for Bernoulli*
SE = √[p1(1−p1)/n1 + p2(1−p2)/n2] *when pooling Bernoulli*

*NB. Population: σ² → σ ; Sample: s²/n = Var(d^) → s/√n = SE(d^)*
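A short sketch (numpy assumed; both samples simulated) of the SE of a mean and of a difference in means using the formulas above:

```python
# Standard error of a sample mean and of a difference in means
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(10, 2, size=200)
b = rng.normal(10, 2, size=150)

se_a = a.std(ddof=1) / np.sqrt(a.size)                          # SE(X̄) = s/√n
se_diff = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
print(se_a, se_diff)
```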
27
T-statistic
Due to the CLT, the sample mean is approximately normally distributed when *n* is large:

t = (X¯ − μ) / SE(X¯)

More generally, t = (β^ − β0) / se(β^), where β^ is an estimator of the parameter β in a statistical model, β0 is a non-random, known constant, and se(β^) is the standard error of the estimator β^.
28
Confidence intervals
"There is a 95% probability that the true XXX is between *A* and *B*" CI = [ μ^ ± z • SE(μ^) ]
29
Hypothesis testing - what are the steps? [5]
1) State the null and alternative hypotheses; decide whether the test is one- or two-tailed
2) Suppose H0 is true: under H0, t = (X¯ − μ)/se(X¯) ~ N(0,1), as it is a large sample
3) Decision rule: reject H0 at the (e.g. 5%) significance level if |t| > z (two-tailed), or if t > z (one-tailed)
4) Carry out the test: plug in the values
5) Decide whether or not to reject H0

*Significance levels: if the probability of getting μ*
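A small sketch (numpy/scipy assumed; the data and the hypothesised mean are made-up) walking through the five steps for a two-tailed test:

```python
# Two-tailed test of H0: µ = 50 on a large simulated sample
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(50.5, 10, size=500)       # simulated data; true mean 50.5

mu_0 = 50.0                              # 1) H0: µ = 50 vs H1: µ ≠ 50 (two-tailed)
se = x.std(ddof=1) / np.sqrt(x.size)
t = (x.mean() - mu_0) / se               # 2) under H0, t ~ N(0,1) for large n
z_crit = stats.norm.ppf(0.975)           # 3) reject at 5% if |t| > 1.96
print(t, z_crit, abs(t) > z_crit)        # 4)-5) plug in and decide
```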
30
Type I error
When the null is rejected even though it is actually true, *e.g. saying a drug works when it doesn't*

P(Type I error) = significance level, α - *e.g. at a 6% significance level, the null will be wrongly rejected 6% of the time when it is true*
31
Type II error
Failing to reject the null even though it is false - *e.g. saying a drug doesn't work when it does*

P(Type II error) = β

**Power of test** = 1 − β
32
P value
Probability of obtaining a value at least as extreme as *t* if the null were true

P(|Z| ≥ |t|) = 2(1 − Φ(|t|)) for a two-sided test; P(Z ≥ t) = 1 − Φ(t) for one-sided

The smallest significance level at which the null would be rejected
33
Estimators
Should be:
* Unbiased - E[µ^] = µ
* Consistent - converges in probability to the true value as n grows
* Efficient - low variance
34
Real equivalised household income
Economist's guide to welfare: X = Y(P/C)
* Y is nominal household income
* P is a cost-of-living index (inflation adjustment)
* C represents differences in needs and tastes (the cost of achieving the same utility - e.g. family type)

Equivalisation: C = 1 (couple, no children), 0.67 (first adult), 0.33 (spouse), 0.33 (other adults and older kids), 0.2 (younger kids) - reflecting differing costs

Not an ideal measure - there are lots of other variables and externalities
35
Log normal distribution
X may not be normally distributed, but ln(X) might be
36
Kolmogorov Smirnov test
Compares the sample CDF to a hypothesised population CDF and looks for the largest distance between them

Tests whether they have the same distribution - e.g. is the sample log-normally distributed?
37
Power laws
Describe the tails of some distributions - e.g. income, stock returns

The tails are so heavy that the mean or variance may not be finite - they carry on indefinitely
38
What criteria should social welfare functions meet? [7]
1. Monotonicity - more is weakly better than less
2. Anonymity - blind to names
3. Symmetry - swapping two people's incomes has no effect on social welfare
4. Dalton's principle / quasiconcavity - inequality is bad, so indifference curves bend towards the origin
5. Homogeneity of degree 1 - double income, double welfare (not vital)
39
What are examples of inequality measures? [3]
1. 90:10 ratio - only considers 2 data points, not overall inequality
2. Gini coefficient - considers ranks, not absolute values
3. Coefficient of variation - sensitive to extreme values at the top and bottom of the distribution
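A hedged sketch (numpy assumed; incomes simulated) of a Gini coefficient using the standard rank formula, which may differ from the exact formula used in the course:

```python
# Gini coefficient from ranked incomes (standard rank formula)
import numpy as np

def gini(incomes):
    y = np.sort(np.asarray(incomes, dtype=float))      # incomes sorted ascending
    n = y.size
    ranks = np.arange(1, n + 1)
    return (2 * np.sum(ranks * y)) / (n * y.sum()) - (n + 1) / n

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10, sigma=0.8, size=10_000)
print(gini(incomes))          # roughly 0.4-0.45 for these made-up parameters
print(gini(incomes * 2))      # doubling all incomes leaves the Gini unchanged
```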
40
Atkinson index
SWF where you must explicitly decide on an inequality aversion parameter, ε
41
Problems with measuring poverty? [4]
1. Tough to measure - the UK uses 60% of median income
2. It is a relative measure
3. Skewed incentives - e.g. taxing the very poor to lift the fairly poor above the line
4. Ignores the distribution of income among the poor
42
Manski's law of decreasing credibility
The credibility of inference decreases with the strength of the assumptions maintained

*i.e. always make minimal assumptions for greater credibility in results*
43
Counterfactual
The outcome we did not see - what would have happened without treatment. A major issue in causal inference.

* E.g. the effect of class size on performance - how would children in big classes have done in small classes?
* E.g. people who have been to hospital report being less healthy - but how healthy would they have been without going to hospital?
44
Potential outcomes framework
Yi = Yi(1)Di + Yi(0)(1−Di)

Di = 0 = no treatment → observe Yi(0)

Treatment status determines which of the potential outcomes we observe
45
Causal effect in the potential outcomes framework
Causal effect = Yi(1) − Yi(0)

We only see one of these two values for each person/patient
46
Average treatment effect
ATE = E[Yi(1)] − E[Yi(0)]

Needs Y(1) and Y(0) for everyone, treated and untreated - so half the data is missing

By the LIE, each term is a weighted average over treatment status: e.g. E[Y(1)] = E[Y(1)|D=1]·P(D=1) + E[Y(1)|D=0]·P(D=0) (outcome for the treated × proportion treated + outcome the untreated would have had under treatment × proportion untreated)
47
Average effect of treatment on the treated
ATT = E[Yi(1) | Di=1] − E[Yi(0) | Di=1]

Only considers those who were treated

Needs both Y(1) and Y(0) for the treated - but Y(0) is not observed for them
48
Selection bias
Some members of the population are less likely to be included in a sample than others, due to certain attributes.

Observed difference in averages (µA − µB) = ATT ("causal effect") + selection bias

Selection bias = the untreated outcome of those who were treated − the untreated outcome of those who weren't:
Selection bias = E[Y(0)|D=1] − E[Y(0)|D=0]

*Think about the sign of the selection bias: e.g. those who went to hospital were sicker than those who did not → health of the treated had they been untreated < health of the untreated → selection bias < 0 → observed difference in averages < ATT*
49
Randomised controlled trials
Random assignment of treatment eliminates selection bias: potential outcomes are on average the same for the treated and untreated groups, since assignment of treatment is independent of potential outcomes

ATE = E[Y(1)] (outcome for the treated) − E[Y(0)] (outcome for the untreated) = ATT

Not always possible (ethics), so look for natural experiments that have the same effect
50
If you can't have a randomised controlled trial, how do you remove selection bias? [3]
1. Instrumental variables
2. Bounds
3. Conditional independence
51
Instrumental variables
A randomly assigned instrument affects assignment to treatment, which in turn affects the outcome: *Z affects X but doesn't affect Y other than through X*

E.g. military service is randomly assigned by a draft lottery, and military service affects earnings

If the instrument is binary, compare treatment status D when Z=0 with D when Z=1: this classifies people as never-takers, always-takers, compliers and defiers

*Generally very difficult to find a true IV*
52
Bounds
We may not know the exact value of Y(0) or Y(1), but there may be bounds (e.g. 0-100%)

The average then lies within a range, so upper and lower bounds for the ATE can be calculated

These can be tightened with assumptions, e.g. health of treated < health of untreated
53
Conditional independence assumption
Assignment to treatment is independent of potential outcomes, conditional on covariates

E.g. for all groups of men, average height is the same, independent of what group you're in
54
Internal validity
Validity of the sample observed and the conditions surrounding data collection - are the findings credible?

Must ensure that:
* Each individual's response to treatment is unaffected by others' responses
* There is no contamination
* Everyone complies
* There is no Hawthorne effect (people changing their behaviour just because they are in a trial)
55
External validity
A study has external validity if the findings for the sample can be generalised to the population

*Must be sure that there is not just a small-scale / local effect - that it holds for people who did not volunteer for the trial, for spillovers, in the long term, etc.*
56
Local average treatment effect
The effect of treatment on 'compliers'

LATE = E[Y | D=1, T=c] − E[Y | D=0, T=c], where T is type

*LATE = ATE if compliers are a random selection of the population*
57
Conditional expectation function (Regression in a population)
How the conditional mean of Y changes as X changes

Splits Yi into a part explained by Xi and a part that isn't (the residual): Yi = E[Yi|Xi] + ei
58
Decomposition property
Any random variable Y can be expressed as **Y = E[Y|X] + e** (the conditional expectation plus an error term), where e is a random variable satisfying:

i) E[e|X] = 0 (e is mean independent of X)
ii) E[h(X)e] = 0 (e is uncorrelated with any function of X)

*→ E[E(Y|X) | X] = E[Y|X]*

*From the LIE, which states that the mean of the conditional means is the mean: E[Y] = E[E[Y|X]]*
59
Prediction property
Let m(X) be any function of X; then E(Y|X) minimises E[(Y − m(X))²] over all choices of m(X)

So the conditional expectation is the best predictor of Y, where 'best' means minimum mean squared error
60
Regression
A linear regression is the best linear approximation to the CEF, in the least-squares sense

Yi = E[Yi|Xi] + ei becomes Yi = [β0 + β1Xi] + ui, where β0 and β1 are the values that minimise the expected squared error:

β0 = E[Yi] − β1E[Xi] (intercept)
β1 = cov(Yi,Xi) / var(Xi) (slope)
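A minimal sketch (numpy assumed; data simulated) of the slope and intercept formulas, checked against a least-squares fit:

```python
# β1 = cov(Y,X)/var(X), β0 = mean(Y) - β1*mean(X), on a simulated sample
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5_000)
y = 1.0 + 2.0 * x + rng.normal(size=5_000)      # true intercept 1, slope 2

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)                                    # ≈ 1 and 2
print(np.polyfit(x, y, deg=1))                   # least-squares check: [slope, intercept]
```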
61
R squared
Assesses the goodness of fit of the linear regression (population):

R² = ESS/TSS = 1 − SSR/TSS, where TSS = ESS + SSR

*A high R² value means more of the variation is captured by the model.*
62
TSS
Total sum of squares: Σ(Yi − E[Yi])²

Overall variability in Y
63
ESS
Explained sum of squares: Σ(Y^i − E[Y^i])²

Sum of squares of the predictions - the variability captured by the model
64
SSR
Sum of squared residuals: Σui²

The variability left over - a high SSR means a poor fit
65
Multiple regression
Can expand the linear regression to include more variables (more βs):

E[Y|X1,X2,...,Xk] = β0 + β1X1 + β2X2 + ... + βkXk

β0 = E[Y] − β1E[X1] − β2E[X2] − ... − βkE[Xk]
βk = cov(Y, X~k) / var(X~k)

where X~k is the residual from a regression of Xk on all the other regressors - the variation in Xk which cannot be explained by the other regressors
66
Elasticity of a regression
β1, β2, etc. are the slopes with respect to X1, X2, etc. - partial derivatives: how Y varies with one regressor, holding the others constant

Elasticity wrt X1: ε = ∂E[Y|X1,X2]/∂X1 · X1 / E[Y|X1,X2]

*NB: age helps earnings to start with, but then hinders - earnings eventually drop off. Adding an "age²" term helps find the turning point.*
67
Multicollinearity
When two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others

If Xk is perfectly explained by the other regressors, it has no independent contribution to the prediction problem and βk cannot be separately identified
68
Sample linear regression
Calculated for a given sample - it will differ across samples, even if they come from the same population

Add hats to the population regression to denote the sample estimates, and use sample averages rather than expectations: β^0 = Y¯ − β^1X¯
69
Standard error of the regression
Estimator of the standard deviation of the regression error term *u*: SER = √[SSR/(n−2)] for a regression with one regressor

It cannot be calculated for the population version (which relies on unknown population figures)

The *n−2* adjusts for the downward bias from using a sample: −1 as usual, and −1 more for the extra parameter estimated from the sample
70
Multiple regression with samples
Degrees of freedom = n − k − 1, since k + 1 parameters are estimated from the sample

More regressors = fewer degrees of freedom = bigger standard errors

*Beware of multicollinearity - a variable that adds no new information will simply raise R² without improving the model*
71
Adjusted R2
Adding regressors reduces the SSR and so raises R² - but it also reduces the degrees of freedom and so can increase the SER. The adjusted R² punishes additional regressors:

adjR² = 1 − [(n−1)/(n−k−1)]·(SSR/TSS)
72
F-test: joint hypothesis testing
Cannot just do individual t-tests - the chance of a Type I error grows with each extra test. Instead use the F-statistic, comparing restricted and unrestricted models:

* restricted model: impose the null (set the tested parameters to zero, or whatever the null says) and find its R²
* ur = unrestricted model: estimate freely and find its R²
* q = number of restrictions
* k = number of regressors in the unrestricted model

F = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − k − 1)]
73
Bonferroni test
Used when testing multiple variables: adjust the significance level to α/k, where k is the number of t-tests, so that each individual test rejects when P(|t| > tc) = α/k

*E.g. with 2 tests and an overall 5% significance level, use 0.05/2 = 0.025 for each, so tc ≈ 2.24...then do the t-tests as normal*
74
Dummy variable
Takes the value 0 or 1 in a regression - e.g. male/not male, black/not black

The coefficient therefore only shifts the intercept, not the slope
75
Omitted variable bias
An omitted variable adds information about Y but is left out (e.g. because it is not observable)

If the omitted variable is correlated with X1, its effect loads onto X1, so β1 is over- or understated - the latent effect of the omitted X

It also means E[u|X] ≠ 0, giving upward or downward bias

*Not solved by adding in loads of variables - that increases standard errors and reduces degrees of freedom. There is a trade-off between bias and variance*
76
Cross partial
An interaction effect: the effect on Y (wages) of X = 1 rather than X = 0 (being female rather than male) may differ depending on the value of another regressor
77
What problems do we encounter with regressions? [4]
1. We assume that the conditional distribution of the error term given X has a mean of 0 (so Cov(X,u) = 0 and E[u] = 0), but this may fail
2. **Causality** - e.g. what individuals would earn on average if we could change their schooling, holding everything else fixed - not causal if there is selection bias (the CIA does not hold) and other things affect earnings
3. **Bad controls** - controls may themselves be determined by other covariates, e.g. occupation determines wages, but education determines occupation
4. **Measurement errors** - if the regressor is observed with error, so we observe Xi = Xi* + εi where ε is correlated with the observed X, the CIA does not hold and we have attenuation bias
78
Attenuation bias
Bias towards zero in an estimated coefficient, which arises when a regressor is measured with error - the measurement error becomes part of the regression error term and is correlated with the observed regressor
79
Simultaneous causality
Where X causes Y and Y also causes X

*E.g. class size and test scores*
80
Two-stage least squares estimator
_Stage 1:_ decompose X into an exogenous component uncorrelated with the error and an endogenous component correlated with it: Xi = π0 + π1Zi + vi, where v is correlated with u but Z (the IV) is not. Find the OLS estimates of π.

_Stage 2:_ regress Y on the fitted values X^i = π^0 + π^1Zi (the uncorrelated component) to obtain an estimator of β1
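A hedged sketch (numpy assumed; the data-generating process, instrument and coefficients are all made-up) of carrying out the two stages by hand:

```python
# Manual 2SLS on simulated data with an endogenous X and a valid instrument Z
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # structural error
x = 0.8 * z + 0.9 * u + rng.normal(size=n)   # X correlated with u (endogenous) and with Z
y = 1.0 + 2.0 * x + u                        # true β1 = 2

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)        # biased above 2 since cov(x, u) > 0
pi = np.polyfit(z, x, 1)                            # stage 1: regress X on Z
x_hat = np.polyval(pi, z)                           # exogenous component of X
tsls = np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)   # stage 2: regress Y on fitted X
print(ols, tsls)                                    # OLS biased; 2SLS ≈ 2
```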
81
Time series data
Data on the same observational unit over time - e.g. GDP growth, the bank base rate

*E.g. the quantity theory of money says that money growth and inflation are correlated over time: in any given year they may differ, but over time they move together*
82
Lag
Time before present period - first lag is in previous period (t-1), second is before that (t-2), jth is j periods before the present (t-j)
83
Autocorrelation
Correlation of a series with its own lagged values - the first autocorrelation is corr(Yt, Yt−1), the jth is corr(Yt, Yt−j)

The jth sample autocorrelation is the jth sample autocovariance divided by the sample variance of Yt
84
Autocovariance
Covariance of a series with its own lagged values

In a stationary series the autocovariances do not depend on time; in a non-stationary series the joint distributions change over time
85
Autoregression
Where Yt is regressed against its own lagged values
86
Order of autoregression
Number of lags used as regressors
87
AR(1) model
First order autoregressive model: Yt = β0 + β1Yt−1 + ut

The betas do not have a causal interpretation - they only show the correlation

Can test H0: β1 = 0 to see whether Yt−1 is useful in predicting Yt
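A small sketch (numpy assumed; β0, β1 and T are arbitrary) simulating an AR(1) and recovering the coefficients by regressing Yt on its first lag:

```python
# Simulate an AR(1) and estimate β0, β1 from the lag regression
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, T = 0.5, 0.7, 5_000
y = np.zeros(T)
for t in range(1, T):
    y[t] = beta0 + beta1 * y[t - 1] + rng.normal()

slope, intercept = np.polyfit(y[:-1], y[1:], deg=1)   # regress Yt on Yt-1
print(intercept, slope)                                # ≈ 0.5 and 0.7
```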
88
AR(p) model
pth order autoregressive model - uses multiple lags of Y as regressors

Use t or F tests to determine the lag order p - check that each added coefficient is significantly non-zero

*But note that 5% of such tests reject by chance - t/F tests tend to lead to excessively long lag orders*
89
Information criterion
An alternative method for choosing the lag order: pick the p that minimises the AIC or BIC, which trade off model fit (R²) against the number of estimated parameters (parsimony)
90
ADL(p,r)
Autoregressive distributed lag model - *p* lags of *Y*, *r* lags of *X*

Uses other variables that may be useful predictors of Y beyond its own lags - e.g. to predict GDP growth, use prior growth, inflation, oil prices, etc.
91
Granger causality test
Tests whether lagged Xs should be included: an F test of the hypothesis that all estimated coefficients on the lagged Xs are zero

Not exactly causal - more a test of whether the Xs have marginal predictive power
92
Why will forecasts always be wrong? [2]
1. We are using an estimated model, so there are differences between the estimated and true betas
2. 'Stuff happens' - random errors
93
Pseudo out of sample forecasting (POOS)
Test the model by choosing a date near the end of the sample, forecasting the later observations using only data up to that date, and examining the forecast errors
94
Stationary series
A time series is stationary if its probability distribution does not change over time. E(Y) and Var(Y) do not change over time. *Non-stationary is the opposite - e.g. if data is different in the second half to the first half, then the historical data is not good for predicting the future*
95
Trend
A persistent long-term movement of a variable

* Deterministic - non-random; changes by the same amount each period / constant first difference
* Stochastic - trend plus randomness, e.g. a random walk, where the best prediction of tomorrow is today and the variance grows with time (wider over time) → non-stationary

*Can have a random walk with drift, where Y follows a random walk around a linear trend*
96
Stochastic domination
If *A(x) < B(x)* (A's CDF lies below B's everywhere), then A first-order stochastically dominates B

If the *area under A(x) < area under B(x)* up to every point, then A second-order stochastically dominates B
97
Problems with trends [3]
1. AR coefficients are biased towards zero
2. t-stats may not have standard normal distributions, even in large samples
3. Spurious regressions - series look correlated simply because both follow random walks
98
Unit root
Where β1 = 1 in the AR(1) model - a random walk with drift (β0 gives the drift)

*To avoid unit roots / random-walk stochastic trends, take first differences*
99
Dickey-Fuller test
Estimate ΔYt = β0 + δYt−1 + ut and do a t-test of the null that δ = 0 (i.e. β1 = 1) against the alternative that δ < 0 (stationary)

But t is not normally distributed under the null - so use Dickey-Fuller critical values; rejecting the null = stationary

*Similar for an AR(p) model - δ is the sum of the betas, minus 1*
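A minimal sketch (statsmodels assumed to be installed; both series are simulated) of an augmented Dickey-Fuller test on a stationary AR(1) versus a random walk:

```python
# ADF test: stationary AR(1) vs random walk
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
e = rng.normal(size=1_000)

stationary = np.zeros(1_000)
for t in range(1, 1_000):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]   # AR(1) with |β1| < 1
random_walk = np.cumsum(e)                           # unit root: β1 = 1

for series in (stationary, random_walk):
    stat, pvalue = adfuller(series)[:2]
    print(stat, pvalue)        # small p-value → reject the unit root → stationary
```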
100
Structural breaks
May know that data changes after a certain date (e.g. change of currency regime) - can do a test to see if coefficients change
101
Cointegration
If X and Y are cointegrated, they have the same stochastic trend. Computing Y-θX (where θ is the cointegrating coefficient) eliminates this stochastic trend. E.g. X is real wages, Y is productivity
102
How do you test for cointegration?
1. Test the order of integration of X and Y - if they are the same, estimate the cointegrating coefficient
2. Test the residuals (Y − θX) for stationarity with a Dickey-Fuller test
3. If they are stationary, we have cointegration
103
Order of integration
The number of times a series needs to be differenced to be stationary

*A random walk is order 1*
104
Excess sensitivity
A challenge to the random-walk (Hall) model: consumption responds to predictable changes in income by more than the model predicts

*May be due to credit-constrained individuals, who spend everything they receive*
105
Excess smoothness
When consumption is smoother than permanent income - it responds to income shocks by less than the Hall model predicts
106
Bayes' theorem
P(W | L) = P(L | W) · P(W) / P(L)
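A tiny worked sketch (the probabilities are invented) of applying the formula:

```python
# Bayes' theorem with made-up numbers: P(W|L) = P(L|W)·P(W) / P(L)
p_w = 0.01                      # prior P(W)
p_l_given_w = 0.95              # P(L | W)
p_l_given_not_w = 0.10          # P(L | not W)

p_l = p_l_given_w * p_w + p_l_given_not_w * (1 - p_w)   # law of total probability
p_w_given_l = p_l_given_w * p_w / p_l
print(p_w_given_l)              # ≈ 0.088: the posterior is far below P(L|W)
```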