ECON 326 FINAL Flashcards

1
Q

Why do we need econometrics? Class size example

A

Economic theory suggests important relationships with policy implications, but it rarely indicates the quantitative magnitude of causal effects. Ideally these magnitudes would be determined by a randomized controlled experiment, but almost always we only have observational (non-experimental) data.

Ex. A decrease in class size increases student achievement, so the provincial government should create policy to decrease class sizes. But by how much? That quantitative effect is what must be estimated.

2
Q

Random Sampling must satisfy

A

Random sampling must satisfy: no confounds; each member of the population has an equal chance of selection (ex. tasting the saltiness of a well-mixed soup).
* n > 25 so the Law of Large Numbers applies
* Random sample
* Identically distributed
* Independently distributed

3
Q

Explain difference b/w Independent & Identically distributed? Example of coin flip?

A
  • Independent: value of one doesn’t affect/depend on value of another
  • Identical: probability of outcomes is the same, same process used to collect data
  • (ex.Flipping coin, previous result doesn’t affect next flip, each flip has 50/50 probability distribution)
4
Q

Prove expectation of Y equals population regression equation

A

E[Yi|Xi] = E[β0 + β1Xi + ui | Xi] = β0 + β1Xi + E[ui|Xi] = β0 + β1Xi, since E[ui|Xi] = 0 under the first least squares assumption.

5
Q

3 Measures of Fit + Formula + Drawing

A
  1. Regression R² (0 = no fit, 1 = perfect fit): unitless fraction of the variance of Y that is explained by X; the higher the better. R² = ESS/TSS = 1 − SSR/TSS
  2. Standard Error of the Regression (SER): magnitude of the typical regression residual in units of Y / spread of the distribution of the residuals u; almost the sample standard deviation of the OLS residuals; the lower the better. SER = sqrt[ (1/(n−2)) Σ ûi² ]
  3. Root Mean Squared Error (RMSE): same as the SER but using 1/n instead of the degrees-of-freedom correction 1/(n−2). RMSE = sqrt[ (1/n) Σ ûi² ] (see the sketch below)
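A minimal R sketch of these measures, assuming the caschool regression used elsewhere in these cards:

# assume reg <- lm(testscr ~ str, data = caschool)
y     <- caschool$testscr
u_hat <- resid(reg)                                  # OLS residuals
n     <- length(u_hat)
SER   <- sqrt(sum(u_hat^2) / (n - 2))                # typical residual size, in units of Y
RMSE  <- sqrt(sum(u_hat^2) / n)                      # same, without the d.f. correction
R2    <- 1 - sum(u_hat^2) / sum((y - mean(y))^2)     # 1 - SSR/TSS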
6
Q

OLS Estimation + Proof

A
  • OLS estimation: minimizing squared error; makes the line fit the data so that the sum of squared differences b/w the regression line and the true data points is as small as possible.
  • Minimizes the average squared difference b/w the actual values Yi and the predictions Ŷi = b0 + b1Xi based on the estimated line.
  • Given n points (Xi, Yi), find the line of best fit Ŷi = b0 + b1Xi that minimizes the sum of squared errors in Y, Σ_{i=1}^{n} (Yi − Ŷi)² (the vertical distances b/w points & line); see the sketch below.
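As a sanity check on the minimization result, a small R sketch (assuming numeric vectors x and y) computes the OLS slope and intercept from the sample covariance and variance and compares them with lm():

b1 <- cov(x, y) / var(x)       # slope that minimizes the sum of squared residuals
b0 <- mean(y) - b1 * mean(x)   # intercept: the fitted line passes through (mean(x), mean(y))
coef(lm(y ~ x))                # should match c(b0, b1)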
7
Q

R Regression Code

A

library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()
regression1 <- lm(dependent ~ independent, data = caschool)
summary(regression1)
coeftest(regression1, vcov = vcovHC(regression1, type = "HC1"))  # heteroskedasticity-robust SEs

8
Q

Look at R code result, Interpret each element

A
9
Q

1st Least Squares Assumption for Causal Inference

A

Randomized controlled experiment: for a binary treatment, the causal effect is the expected difference in means b/w the treatment & control groups, which are divided by random assignment (by computer). Randomization ensures X is uncorrelated with all other determinants of Y, so there are no confounding variables (no OVB). All individual characteristics that make up u are distributed independently of X, so the conditional mean E(u|X = x) = 0; all other qualities and residuals cancel out across the two groups, implying β̂1 is an unbiased estimator of the causal effect.

See graph in doc

10
Q

2nd Least Squares Assumption for Causal Inference

A

Identically & independently distributed sampling allows the Central Limit Theorem (CLT) to deliver the sampling distribution of β̂0 & β̂1 under simple random sampling: all entities are selected from the same population (identically distributed) and at random, so the probability of selecting one school is uncorrelated with selecting another (independently distributed).

11
Q

3rd Least Squares Assumption for Causal Inference

A

Large outliers in X and/or Y are rare: E(X⁴) < ∞ and E(Y⁴) < ∞. A large outlier can strongly influence the results or create meaningless values of β̂1; usually X & Y are bounded and therefore have finite fourth moments.
Check a scatterplot and consider removing extreme values of X or Y, via either:
Trimming: drop, say, 1% of the data off both ends
Winsorizing: replace extreme values with less extreme values from within the data distribution, rather than removing them entirely, to mitigate their effect without discarding data points (see the sketch after this card).

See doc for graph.
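A rough R sketch of both fixes, assuming a data frame df with a variable x; the 1st/99th-percentile cutoffs are illustrative:

lo <- quantile(df$x, 0.01)
hi <- quantile(df$x, 0.99)
df_trim <- df[df$x >= lo & df$x <= hi, ]     # trimming: drop the extreme 1% in each tail
df_wins <- df
df_wins$x <- pmin(pmax(df_wins$x, lo), hi)   # winsorizing: cap values at the cutoffs instead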

12
Q

Interpret b0 and b1

A

b0 is the average value of Y when X=0

b1 is the change in Y associated with a 1-unit change in X, holding all other factors/variables constant.

13
Q

Heteroskedasticity means that:
A) homogeneity cannot be assumed automatically for the model.
B) the variance of the error term is not constant.
C) the observed units have different preferences.
D) agents are not all rational.

A

B

14
Q

The power of the test is:
A) dependent on whether you calculate a t or a t2 statistic.
B) one minus the probability of committing a type I error.
C) a subjective view taken by the econometrician dependent on the situation.
D) one minus the probability of committing a type II error.

A

D

15
Q

With i.i.d. sampling each of the following is true EXCEPT:
A) E(Ȳ) = μY.
B) var(Ȳ) = σ²Y/n.
C) E(Ȳ) < E(Y).
D) Ȳ is a random variable

A

C

16
Q

Central limit theorem states:
A) states conditions under which a variable involving the sum of Y1,…, Yn i.i.d. variables
becomes the standard normal distribution.
B) postulates that the sample mean is a consistent estimator of the population mean μY.
C) only holds in the presence of the law of large numbers.
D) states conditions under which a variable involving the sum of Y1,…, Yn i.i.d. variables
becomes the Student t distribution

A

A

17
Q

You have estimated a linear regression to understand the relationship between salary and
years of experience. You want to test the hypothesis:
* Null Hypothesis H0 : The effect of experience on salary is zero (β1=0).
* Alternative Hypothesis HA : Experience significantly affects salary (β1≠0).
Which of the following R commands will provide the t-statistic and p-value for this
hypothesis test?
A) summary(model)
B) coefficients(model)
C) confint(model)
D) t.test(company_data$salary, company_data$experience)

A

A

18
Q

Which command will predict sales if the advertising budget is 1000 units?
A) predict(model, newdata = data.frame(advertising = 1000))
B) predict(model, newdata = list(advertising = 1000))
C) model$predict(1000)
D) predict(model, advertising = 1000)

A

A

19
Q

Which command extracts the intercept and slope coefficients from the model?
A) coef(model)
B) summary(model)
C) model$coefficients
D) coefficients(model)

A

C

20
Q

Which R command will show the detailed results (coefficients, residuals, R-squared, etc.) of
the regression?
A) summary(model)
B) print(model)
C) model$coefficients
D) coefficients(model)

A

A

21
Q

Which of the following is the correct way to run a simple linear regression in R, where sales
is the dependent variable and advertising is the independent variable using the lm()
function?
A) lm(sales ~ advertising, data = dataset)
B) lm(advertising ~ sales, dataset)
C) lm(data = dataset, sales ~ advertising)
D) lm(dataset$sales, dataset$advertising)

A

A

22
Q

To infer the political tendencies of the students at your college/university, you sample 150
of them. Only one of the following is a simple random sample. You:
A) make sure that the proportion of minorities are the same in your sample as in the
entire student body.
B) call every fiftieth person in the student directory at 9 a.m. If the person does not answer
the phone, you pick the next name listed, and so on.
C) go to the main dining hall on campus and interview students randomly there.
D) have your statistical package generate 150 random numbers in the range from 1 to the
total number of students in your academic institution, and then choose the corresponding
names in the student telephone directory

A

D

23
Q

4 elements of an Ideal Randomized Controlled Experiment

A
  • Ideal: subjects follow treatment protocol, perfect compliance, no errors in reporting
  • Randomized: subjects from population of interest are randomly assigned to a treatment or control group so no confounding OVB
  • Controlled: control group permits measuring differential effect of treatment
  • Experiment: treatment assigned, subjects have no choice to avoid reverse causality & selection biases (those who are more likely to be in treatment group make up treatment group causing a bias)
24
Q

4th Least Squares Assumption for Causal Inference in Multiple Regression & how it can be violated & solutions

A

No perfect multicollinearity: violated when a regressor is an exact linear function of the other regressors (perfect multicollinearity) or when regressors are very highly correlated (imperfect multicollinearity).
1. Inserting the same variable twice: R reports NA for the duplicate; Stata drops it.
2. Dummy variable trap: when a set of dummy variables is mutually exclusive & exhaustive, one variable can be perfectly predicted from the others, so including all the dummies and a constant gives perfect multicollinearity (redundancy with the intercept term) and makes it impossible to interpret the individual effect of each dummy (ex. income v. provinces).
* Solution: modify list of regressors, omit intercept or omit a categorical group

25
5 Multiple Regression Model Measures of Fit
1. Actual = predicted + residual: Yi = Ŷi + ûi
2. SER: standard deviation of the residuals with a d.f. correction / average spread: SER = sqrt[ (1/(n−k−1)) Σ ûi² ], k = # of regressors
3. RMSE: standard deviation of the residuals without the d.f. correction: RMSE = sqrt[ (1/n) Σ ûi² ]
4. R²: fraction of the variance of Y explained by the X's. Issue: adding more regressors, even if only slightly correlated with Y, reduces SSR and so "improves" fit. R² = ESS/TSS = 1 − SSR/TSS, with ESS = Σ (Ŷi − Ȳ)², SSR = Σ (Yi − Ŷi)², TSS = Σ (Yi − Ȳ)²
5. Adjusted R²: includes a degrees-of-freedom correction that penalizes the inclusion of another regressor; doesn't necessarily increase when a regressor is added; always smaller than the unadjusted R²: R̄² = 1 − [(n−1)/(n−k−1)]·SSR/TSS
26
How to mathematically hold constant variables in multiple regression model
Take partial derivatives: holding X2 constant, β1 = ∂Y/∂X1; holding X1 constant, β2 = ∂Y/∂X2.
27
3 Solutions to Omitted Variable Bias
1. Randomized controlled experiment where treatment is randomly assigned: often not feasible
2. Cross-tabulation approach: control for the OVB by comparing cases with differing values of the independent variable but identical values of the confounding determinant; however, you will quickly run out of data
3. Add the confounder as a regressor so the regression doesn't omit it: multiple regression
28
Fill in Direction of Bias Table for Positive & Negative Correlation
Direction: determined by the relation b/w Z→X and b/w Z→Y; together they amplify or dampen the estimated relation b/w X→Y.
Downward bias: makes the relation seem more negative (Z increases X and decreases Y, or decreases X and increases Y).
Upward bias: makes the relation seem more positive (Z increases both X and Y, or decreases both).
Over- & underestimating: the estimate ends up farther from or closer to β1 = 0 than the true effect.
29
OVB impact on 1) Bad attention span on X-Media Usage --> Y-Academic Performance 2) Experience on X-Setup --> Y-Game Rank 3) PcETL on X-Class Size --> Y-TestScores
1. Downward bias: more negative than it actually is. Overestimates the power of media usage on academic performance; may falsely conclude that media usage has a greater effect on grades than in reality, since attention span plays a large role.
2. Upward bias: more positive than it actually is. Overestimates the ability of a good setup to increase rank, since experience plays a big factor too.
3. Direction: positive X→Y times positive Z→X, so an overestimate.
30
Omitted Variable Bias + Conditions + Formula
Omitted Variable Bias: β̂1 is biased and not consistent even if n is large, E(β̂1) ≠ β1. Arises when a variable Z in u is correlated with X & satisfies both:
1. Z is a determinant of Y (part of u)
2. Z is correlated with X: Corr(Z, X) ≠ 0, so ρXu ≠ 0
31
TestScore = 698.9 − 2.28·STR, SE(β̂1) = 0.52. Significant? Use hypothesis testing, the p-value, and a confidence interval to show it.
H0: β1 = 0, H1: β1 ≠ 0.
Hypothesis testing: t = (β̂1 − β1,0)/SE(β̂1) = (−2.28 − 0)/0.52 = −4.38; |t| = 4.38 > 2.58, so reject the null at the 1% level.
P-value: Pr(|t| > 4.38) is well below 5% (indeed below 1%), so reject the null.
Confidence interval: β̂1 ± 1.96·SE(β̂1) = (−2.28 − 1.96(0.52), −2.28 + 1.96(0.52)) = (−3.30, −1.26); since the interval doesn't include 0, the null is rejected.
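The same arithmetic in R, using the reported estimate and standard error:

b1  <- -2.28
se1 <- 0.52
t_stat <- (b1 - 0) / se1                        # about -4.38
p_val  <- 2 * pnorm(-abs(t_stat))               # two-sided p-value, far below 0.05
ci_95  <- c(b1 - 1.96 * se1, b1 + 1.96 * se1)   # (-3.30, -1.26), excludes 0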
32
5-step process for testing the slope of the population regression line
1. State the population object of interest & its estimator: β1, estimated by β̂1, assuming the 3 least squares assumptions hold.
2. Derive the sampling distribution (normal when n is large): β̂1 ~ N(β1, σ²β̂1), with mean E(β̂1) = β1 and variance Var(β̂1) = (1/n)·Var[(Xi − μX)ui] / [Var(Xi)]².
3. Standard error of the estimator = square root of the estimated variance: SE(β̂1) = sqrt(σ̂²β̂1).
4. Construct the t-statistic & confidence interval for hypothesis testing: t = (estimator − hypothesized value)/SE(estimator) = (β̂1 − β1,0)/SE(β̂1).
5. Significance/hypothesis decision: reject if |t| > 1.96 or p < the significance level.
33
Derive Residual, Mean and Variance of b1
See doc
34
Sampling Uncertainty + What do you need to derive to find it
* Sampling uncertainty: different samples yield different values of β̂0 & β̂1; it is quantified through hypothesis tests or confidence intervals, which requires finding the sampling distribution.
* Distribution of the OLS estimator: the sampling distribution of β̂1 in large samples is normal; derive its mean & variance to compute significance via t = β̂1/SE(β̂1).
* β̂1 ~ N(β1, σ²β̂1), and Z = (β̂1 − E(β̂1)) / sqrt(var(β̂1)) ~ N(0, 1)
35
Derive b_1, b_0
See doc
36
OLS Blue
OLS is BLUE: the Best Linear Unbiased Estimator, i.e. the most efficient (lowest-variance) estimator among linear unbiased estimators.
37
Causal Inference v. Prediction
* Place different requirements on the data, but use the same regression toolkit.
* Causal inference: learning the causal effect on Y of a change in X.
* Prediction: predicting the value of Y given X for an observation not in the data set.
38
Regression Error Types + Illustrate
Regression error / population error term (ui): consists of omitted factors (OVB: factors other than X that influence Y), the difference b/w the regression line and the true data point (& measurement error in Y).
* Unexplained variation (residuals): ûi = Yi − (b0 + b1Xi) = actual value of Y − predicted value of Y; SSR = Σ (Yi − Ŷi)²
* Explained variation: ESS = Σ (Ŷi − Ȳ)²
* Total variation: TSS = Σ (Yi − Ȳ)²
39
Ȳ = 434.49, sY = 294.67, n = 1744. Construct a 99% confidence interval.
Confidence interval: CI = [Ȳ − 2.58·294.67/√1744, Ȳ + 2.58·294.67/√1744] = [416.29, 452.69]. In 99% of repeated samples an interval constructed this way contains the true mean, so the true population average weekly earnings would be expected to lie in this interval. A 90% confidence interval would be narrower, using 1.64 in place of 2.58.
40
Is difference in average earnings statistically significant: see graph in tutorial 1 doc
Given the sample standard deviations, use the t-statistic: H0: μ<45 − μ>45 = 0, H1: μ<45 − μ>45 > 0. t = (Ȳ<45 − Ȳ>45) / sqrt(s²<45/2507 + s²>45/1237) = 4.62 > 2 (critical value), so the difference is significant and the null hypothesis is rejected.
41
Sir Francis Galton, a cousin of Charles Darwin, examined the relationship between the height of children and their parents towards the end of the 19th century. It is from this study that the name "regression" originated. You decide to update his findings by collecting data from 110 college students, and estimate the following relationship: Studenth = 19.6 + 0.73 × Midparh, R2 = 0.45, SER = 2.0, where Studenth is the height of students in inches, and Midparh is the average of the parental heights. (Following Galton's methodology, both variables were adjusted so that the average female height was equal to the average male height.) (a) Interpret the estimated coefficients. (b) What is the meaning of the regression R2? (c) What is the prediction for the height of a child whose parents have an average height of 70.06 inches? (d) What is the interpretation of the SER here? (e) Given the positive intercept and the fact that the slope lies between zero and one, what can you say about the height of students who have quite tall parents? Those who have quite short parents? (f) Galton was concerned about the height of the English aristocracy and referred to the above result as "regression towards mediocrity." Can you figure out what his concern was? Why do you think that we refer to this result today as "Galton's Fallacy"?
See doc
42
Sample Midterm Q1
See doc
43
Sample Midterm Q3
See doc
44
Imagine that you were told that the t-statistic for the slope coefficient of the regression line = 698.9 – 2.28 × STR was 4.38. What are the units of measurement for the t-statistic?
D) The t-statistic is unit free: the numerator (β̂1 − β1,0) and the denominator SE(β̂1) are measured in the same units, so the units cancel.
45
In general, the t-statistic has the following form:
C) t = (estimator − hypothesized value) / SE(estimator)
46
4 Reasons why Correlation doesn't imply causation
1. A hidden variable causes A & B to move together
2. Coincidence
3. B causes A (reverse causality)
4. Strong correlation but the causal effect is weak, or simultaneity: A causes B and B causes A
47
Interpret 95% Confidence Interval
Interval that contains the true value 95% of the time when repeatedly sampled.
48
Type 1, Type 2, P-value, Power
* Type 1 error (Pr = α = significance level): false positive, rejecting the null when it is true
* Type 2 error (Pr = β; power = 1 − β): false negative, failing to reject the null when it is false
* P-value / marginal significance level = Pr(|t| > |t^act|) = Pr(|Ȳ − μY,0|/(s/√n) > |Ȳ^act − μY,0|/(s/√n)): the probability of drawing a test statistic at least as extreme as the one actually observed; contains more information than a simple reject/fail-to-reject decision; reject if p < α
49
Process of Student t-distribution
Student t-distribution: used if the underlying distribution is normal, the data are i.i.d., and n < 25 or the population variance is unknown. t = (Ȳ − μY,0)/(sY/√n), or for two means t = (Ȳ1 − Ȳ2)/sqrt(s1²/n1 + s2²/n2).
1. Compute the t-statistic
2. Compute the degrees of freedom
3. Look up the 5% critical value
4. If the t-statistic exceeds this critical value, reject
Note: the difference of two means might not have a joint normal / Student t distribution even if each sample does, so use the s²/n formula and the large-sample normal approximation.
50
Sampling Distribution of Ȳ (the sample mean)
**Sampling distribution of Ȳ**: the distribution of Ȳ over the different possible samples of size n
* **Unbiased**: E(Ȳ) = μY
* **Efficient**: Ȳ has the smallest variance of all linear unbiased estimators
* **Consistent** (Law of Large Numbers): Ȳ →p μY; as n increases, the distribution of Ȳ becomes more tightly centered in an interval around μY, the true population value; guaranteed when the data are i.i.d. and the variance is finite
51
4 Moments of Statistics
1. Mean (1st moment): expected value of Y, E(Y) = μY, the long-run average over repeated realizations
2. Variance (2nd moment): E[(Y − μY)²] = σ²Y, the squared spread of the distribution. Sample variance: s² = (1/(n−1)) Σ (Yi − Ȳ)² estimates the population variance; it is an unbiased estimator if the sample is i.i.d. and the 4th moment is finite. Standard deviation: σY = sqrt(variance); sample standard error of Ȳ: SE(Ȳ) = sqrt(s²Y/n) = sY/√n
3. Skewness (3rd moment): E[(Y − μY)³]/σ³Y, the asymmetry of a distribution; 0 = symmetric, > 0 long right tail, < 0 long left tail
4. Kurtosis (4th moment): E[(Y − μY)⁴]/σ⁴Y; = 3 for the normal distribution, > 3 heavy tails (leptokurtic)
52
Covariance v. Correlation; Conditional Distribution v. Conditional Mean/Variance
Covariance: cov(X, Y) = E[(X − μX)(Y − μY)] = σXY, the linear association b/w X & Y; its units are units of X times units of Y. The covariance of a variable with itself is its variance: Cov(X, X) = E[(X − μX)²] = σ²X
Correlation: corr(X, Y) = σXY/(σXσY) = rXY, unit free and between −1 and 1
Conditional distribution: the distribution of Y given X
Conditional mean/variance: E(Y|X) and Var(Y|X), the mean/variance of the conditional distribution
53
Breusch-Pagan Test + What is this test and Why it works? + Construction
Breusch-Pagan Test: indicates whether the variance of the residuals depends systematically on the X explanatory variables. Process:
1. Obtain the residuals from the estimated equation
2. Form the auxiliary equation by regressing the squared residuals on the explanatory variables
3. Test the overall significance of that equation with a chi-square test
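A minimal sketch of the test in R with lmtest::bptest(), assuming the caschool regression used elsewhere in these cards; bptest() runs the auxiliary regression of the squared residuals on the regressors and reports the chi-square statistic and p-value:

library(lmtest)
reg <- lm(testscr ~ str, data = caschool)
bptest(reg)   # small p-value -> reject homoskedasticity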
54
White Test + What is this test and Why it works? + Construction
**White Test**: more general than Breusch-Pagan; detects both heteroskedasticity and model misspecification, and accounts for non-linear forms of heteroskedasticity by including squared and interaction terms. It examines whether the squared residuals are systematically related to the independent variables, their squares, and their cross-products.
**Limitations**: as the number of explanatory variables increases, the number of terms in the White auxiliary regression grows rapidly, making estimation computationally intensive.
Process:
1. Form the auxiliary equation by regressing the squared residuals on the explanatory variables
2. Explanatory variables: the independent variables from the original equation, their squared terms, and their interaction terms
3. Test the overall significance of that equation with a chi-square test
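One hedged way to run a White-style test in R is to give bptest() an auxiliary formula containing the regressors, their squares, and their cross-product; the variable names follow the caschool example:

library(lmtest)
reg <- lm(testscr ~ str + el_pct, data = caschool)
bptest(reg, ~ str + el_pct + I(str^2) + I(el_pct^2) + str:el_pct, data = caschool)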
55
Under imperfect multicollinearity:
a) the OLS estimator cannot be computed.
b) two or more of the regressors are highly correlated.
c) the OLS estimator is biased even in samples of n > 100.
d) the error terms are highly, but not perfectly, correlated.

In multiple regression, the R2 increases whenever a regressor is
a. added unless the coefficient on the added regressor is exactly zero.
b. added no matter the sign of the coefficient on the added regressor.
c. added unless there is heteroskedasticity.
d. added.
B, A
56
In the multiple regression model, you estimate the effect on Yi of a unit change in one of the Xi while holding all other regressors constant. This:
a) makes little sense, because in the real world all other variables change.
b) corresponds to the economic principle of mutatis mutandis.
c) leaves the formula for the coefficient in the single explanatory variable case unaffected.
d) corresponds to the economic principle of ceteris paribus.

Under the least squares assumptions for the multiple regression problem (zero conditional mean for the error term, all Xi and Yi being i.i.d., all Xi and ui having finite fourth moments, no perfect multicollinearity), the OLS estimators for the slopes and intercept
a. have an exact normal distribution for n > 25.
b. are BLUE.
c. have a normal distribution in small samples as long as the errors are homoskedastic.
d. are unbiased and consistent.
D, D
57
In the multiple regression model, the least squares estimator is derived by:
a. minimizing the sum of squared prediction mistakes.
b. setting the sum of squared errors equal to zero.
c. minimizing the absolute difference of the residuals.
d. forcing the smallest distance between the actual and fitted values.

When you have an omitted variable problem, the assumption that E(ui | Xi) = 0 is violated. This implies that
a. the sum of the residuals is no longer zero.
b. there is another estimator called weighted least squares, which is BLUE.
c. the sum of the residuals times any of the explanatory variables is no longer zero.
d. the OLS estimator is no longer consistent.
A, D
58
Omitted variable bias
a. will always be present as long as the regression R2 < 1.
b. is always there but is negligible in almost all economic examples.
c. exists if the omitted variable is correlated with the included regressor but is not a determinant of the dependent variable.
d. exists if the omitted variable is correlated with the included regressor and is a determinant of the dependent variable.
e. exists if the omitted variable is uncorrelated with the included regressor and is a determinant of the dependent variable.

The dummy variable trap is an example of
a. imperfect multicollinearity
b. something that is of theoretical interest only
c. perfect multicollinearity
d. something that does not happen a lot in regression models
e. something that is very common in regression models
D, C
59
Homoskedastic Data + Benefit + Formula + Graph
**Homoskedastic** (Var(u|X) constant): constant variance; the spread of the data points is constant across all values of X; the variance of the conditional distribution of u given X doesn't depend on X; plotting u against X shows no relation. Ex. everyone on TikTok gets the same number of views no matter how many followers they have.
- **Benefit**: more stable, trustworthy statistical models; proves OLS has the lowest variance among linear estimators
- **Homoskedasticity-only SE formula** (see doc): never use it unless the errors really are homoskedastic (that's why not to use Excel); it is usually the software default that must be overridden. Used with heteroskedastic data it gives misleading statistical inference.
60
Heteroskedasticity + Issue + SE Formula + Graph
**Heteroskedasticity** (Var(u|X) changing): changing variance; the spread of the data points changes across values of X; the variance of the conditional distribution of u given X depends on X; plotting u against X shows a relation (ex. more followers means more views on TikTok).
- **Issue**: predictions become less reliable; biased conclusions from the usual standard errors
- **Robust SE formula** (see doc): gives the same point estimates as OLS; when n is large and the errors are homoskedastic the two formulas are equivalent, since the variance estimates converge to the same expected values, but the robust version remains valid (and more stable) under heteroskedasticity.
61
How to code robust standard errors in R in 2 ways & in Stata?
**Basic method:**
regression1 <- lm(testscr ~ str, data = caschool)
test1 <- coeftest(regression1, vcov = vcovHC(regression1, type = "HC1"))
print(test1)
**Alternative version:**
library(estimatr)
regression2 <- lm_robust(testscr ~ str, data = caschool, se_type = "HC1")
summary(regression2)
**Stata:**
regress testscr str, robust
(Regression with robust standard errors, Number of obs = 420)
62
Cost = 7,311.17 + 3,985.20 × Reputation – 0.20 × Size+ 8,406.79 × Dpriv – 416.38 × Dlibart – 2,376.51 × Dreligion R2 = 0.72, SER = 3,773.35 a) Interpret the results. Do the coefficients have the expected sign? b) What is the forecasted cost for a liberal arts college, which has no religious affiliation, a size of 1,500 students and a reputation level of 4.5? (All liberal arts colleges are private.) c) To save money, you are willing to switch from a private university to a public university, which has a ranking of 0.5 less and 10,000 more students. What is the effect on your cost? Is it substantial? d) Eliminating the Size and Dlibart variables from your regression, the estimation regression becomes Predicted cost = 5,450.35 + 3,538.84 × Reputation + 10,935.70 × Dpriv – 2,783.31 × Dreligion; R^2= 0.72, SER = 3,792.68 Why do you think that the effect of attending a private institution has increased now? e) What can you say about causation in the above relationship? Is it possible that Cost affects Reputation rather than the other way around?
(a) An increase in reputation by one category increases the cost by roughly $3,985. The larger the size of the college/university, the lower the cost: an increase of 10,000 students results in a $2,000 lower cost. Private schools charge roughly $8,406 more than public schools. A school with a religious affiliation is approximately $2,376 cheaper, presumably due to subsidies, and a liberal arts college also charges roughly $416 less. There are no observations close to the origin, so there is no direct interpretation of the intercept. Other than perhaps the coefficient on liberal arts colleges, all coefficients have the expected sign. (b) $32,935. (c) Roughly $12,400. Over the four years of education this implies approximately $50,000, a substantial amount of money for the average household. (d) Private institutions are smaller, on average, and some of them are liberal arts colleges. Both of these variables had negative coefficients. (e) It is very possible that the university president and chief academic officer are influenced by the cost variable in answering the U.S. News and World Report survey. If this were the case, then the above equation would suffer from simultaneous causality bias, a topic covered in a later chapter, and this would pose a serious threat to the internal validity of the study.
63
New Assumptions LSA 4-5 for Homoskedasticity
Applied in fewer cases; if met, the math simplifies and stronger results can be proved.
* LSA #4: u is homoskedastic → the OLS estimator has the lowest variance ("best") among linear estimators
* LSA #5: u is distributed normally, N(0, σ²), and the OLS estimator remains consistent. Under the homoskedastic normal regression assumptions, the t-stat has a Student t distribution with d.f. = n − 2 (relevant if n < 50; otherwise it is approximately normal), and β̂0 and β̂1 are normally distributed for all n
64
Gauss-Markov Theorem BLUE Assumptions
Under LSA 1–4, β̂1 has the smallest variance among all linear unbiased estimators, so OLS is BLUE: the best (lowest-variance) linear unbiased estimator. The result applies specifically to linear estimators, not all estimators, and requires no normality assumption.
65
BUE Assumptions
Under LSA 1–5, OLS is BUE: the best unbiased (+ consistent) estimator among all estimators, not just linear ones, provided the errors are homoskedastic and normally distributed.
66
3 OLS Downsides
1. The Gauss-Markov Theorem isn't compelling: the homoskedasticity condition often doesn't hold, and the result covers only linear estimators, a small subset of estimators, so it is not plausible in many applications.
2. OLS is more sensitive to outliers than other estimators; the median is preferred over the mean when large outliers exist, as it has smaller variance, and other estimators can be more efficient (LAD: least absolute deviations).
3. In most econometric applications there is no good reason to assume u is homoskedastic and normal, and most datasets have n > 50, so we can rely on the CLT for hypothesis tests and confidence intervals.
67
Simple linear model - Individual Hypothesis Testing
Use heteroskedasticity-robust standard errors in R (coeftest() with vcovHC): the output reports the t-statistic and p-value for each individual coefficient's hypothesis test.
68
Joint Hypothesis Testing in Mutliple regression
Evaluates whether multiple coefficients in a regression model are simultaneously equal to specific values (e.g. all 0 = not causal); determines whether a group of variables has a combined significant impact on the dependent variable.
**F-Test**: compares the fit of the unrestricted model (no constraints) to the restricted model (constraints imposed): how much worse the model fits (the variance of the errors) when the restriction is imposed versus when it is not; it also corrects for correlation between the t-statistics.
q: the number of restrictions being tested (ex. Cobb-Douglas constant returns to scale: q = 1 for α + β = 1)
RSSr: residual sum of squares of the restricted model
RSSur: residual sum of squares of the unrestricted model
n: sample size
k: number of estimated parameters in the unrestricted model
69
F-Test Process With Large Sample + High/Low Value
* **Large-sample distribution of the F-stat**: the distribution of the average of q independently distributed squared standard normal random variables
* **Chi-squared distribution χ²q**: q degrees of freedom; the distribution of the sum of q independent squared standard normal random variables
* F is distributed as χ²q/q
* P-value from the F-stat: the tail probability of the χ²q/q distribution beyond the F-stat actually computed; if p < 0.05 reject; the smaller the better
* Null hypothesis of the F-test: the groups are similar (the restrictions hold)
* High F-value: the restriction significantly worsens the model's fit; reject the null hypothesis, indicating at least one of the tested coefficients is significant, so we succeed in supporting the alternative hypothesis. The variance between groups is much larger than the variance within groups, suggesting the groups are significantly different; this occurs when t1 and t2 are large.
* **Low F-value**: the restrictions don't significantly increase the RSS; we fail to reject the null and fail to support the alternative hypothesis. The between-group variance is similar to the within-group variance, so there is no strong evidence that the group means differ.
70
R Coding for Individual and Joint Hypothesis Testing
Individual hypothesis testing (robust SEs):
library(estimatr)
regression1 <- lm_robust(testscr ~ str + el_pct, data = caschool, se_type = "HC1")
summary(regression1)
# Stata: reg testscr str pctel, robust  (regression with robust standard errors)

Joint hypothesis testing (F-test, via car::linearHypothesis):
library(car)
regression2 <- lm(testscr ~ str + expn_stu + el_pct, data = caschool)
linearHypothesis(regression2, c("str = 0", "expn_stu = 0"), white.adjust = "hc1")
# Stata: reg testscr str expn_stu pctel, r  (regression with robust standard errors)
71
Cobb-Douglas production function Y = A·K^α·L^β, ln Y = ln A + α ln K + β ln L + u; test for constant returns to scale α + β = 1 (q = 1), H0: α + β = 1 against the alternative H1: α + β ≠ 1.
1. Estimate the unrestricted model (without imposing α + β = 1) and obtain its residual sum of squares RSSur
2. Estimate the restricted model (forcing α + β = 1) and obtain its residual sum of squares RSSr
3. Compute the F-statistic: F = [(RSSr − RSSur)/q] / [RSSur/(n − k)]
4. Compare the computed F-stat to the critical value from the F-distribution, or use the p-value approach, to determine significance
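A sketch of the same test using car::linearHypothesis(), assuming a data frame prod with output Y, capital K, and labour L (hypothetical names):

library(car)
cd <- lm(log(Y) ~ log(K) + log(L), data = prod)   # unrestricted model
linearHypothesis(cd, "log(K) + log(L) = 1")       # F-test of alpha + beta = 1 (q = 1)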
72
Testing single restrictions on multiple coefficients: Yi = β0 + β1X1i + β2X2i + ui, null H0: β1 = β2, alternative H1: β1 ≠ β2. This is not a joint hypothesis with multiple restrictions; it is a single restriction on multiple coefficients.
Rearrange ("transform") the regression so that the restriction becomes a restriction on a single coefficient in an equivalent regression, or
Perform the test directly: some software lets you test the restriction directly.
73
Model Specification + Process
**Model Specification**: how to decide what variables to include in a regression
1. Identify the variable of interest
2. Think of omitted causal effects that could result in omitted variable bias
3. Include control variables to represent the omitted causal effects, or variables correlated with them: effective if the conditional mean independence assumption plausibly holds, i.e. u is uncorrelated with the variable of interest once the control variables are included (no systematic bias)
4. Specify a range of plausible alternative models, including additional candidate variables
5. Estimate the base model and the plausible alternative specifications as sensitivity/robustness checks:
- Do the control variable(s) change the coefficient of interest?
- Are the control variables statistically significant?
- The goal is not just maximizing R²; the real objective is an unbiased estimator of the effect of the variable of interest. How well the regressors explain the variation in Y doesn't mean you have eliminated omitted variable bias, obtained an unbiased estimator of a causal effect, or found statistically significant variables in hypothesis tests.
74
5 Multicollinearity Consequences
1. Unbiased estimates remain: severe multicollinearity has no impact on bias; estimates will be unbiased as long as the BLUE assumptions hold
2. Increased variances and standard errors = lower efficiency: multicollinearity makes it harder to precisely separate the individual effects of correlated explanatory variables, and the variance of the estimated coefficients can be large
3. Lower computed t-scores + wider confidence intervals: t_k = (β̂_k − β_H0)/SE(β̂_k), so as standard errors increase the t-scores decrease and the confidence intervals β̂_k ± 1.96·SE(β̂_k) widen
4. Estimates sensitive to changes in specification: with multicollinearity, adding/dropping variables and/or observations can cause substantial fluctuations in the estimates, as OLS struggles to isolate the independent effects of correlated variables, producing unstable coefficient estimates
5. Overall fit & the coefficients of non-multicollinear variables largely unaffected: the adjusted R² won't decrease significantly, if at all, and a high F-statistic rejects the null hypothesis even when the individual t-tests show no significance
75
How to Detect Multicollinearity
* **Key Indicator**: Combination of high adjusted R2 and no statistically significant individual variables, Perfect Multicollinearity in R: Dropped variables (N/A) * **Simple Correlation Coefficients**: measures strength+direction of linear relation b/w independent variables (-1/+1), useful when organized in a table, **Limitation**: groups of variables can cause multicollinearity even if no single simple correlation coefficient is particularly high due to combined effects ← VIF better * **Variance Inflation Factors (VIF):** identifies how much variance of a regression coefficient is inflated due to multicollinearity, measuring the extent to which a given explanatory variable can be explained by all other explanatory variables in an equation, a regression/VIF for each K independent variable is calculated
76
How to Calculate VIF & Tolerance
**Tolerance** (the reciprocal): 1/VIF; values below 0.1 indicate severe multicollinearity.
1. Run an OLS regression that has X_k as a function of all the other explanatory variables in the equation (# of explanatory variables = # of auxiliary regressions/VIFs)
2. Calculate the variance inflation factor for the coefficient of interest: VIF_k = 1/(1 − R²_k), where R²_k is the R² of that auxiliary regression
3. Interpretation: a higher VIF means more multicollinearity
See doc for equations.
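A minimal R sketch using car::vif(), with the caschool regressors as stand-ins:

library(car)
reg <- lm(testscr ~ str + el_pct + expn_stu, data = caschool)
vif(reg)   # values well above roughly 5-10 suggest problematic multicollinearity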
77
When is Multicollinearity a concern? Should it be a concern?
Multicollinearity doesn't exist in every equation; it depends on the relation b/w the independent variables.
Important question: how severe it is, not merely whether it is present, which depends on the dataset and sample selection.
Testing for the presence of multicollinearity: there are no universally accepted statistical tests that definitively confirm or rule out multicollinearity.
* **Concern**: a higher VIF = stronger multicollinearity; values above 5 or 10 warrant investigation: the variable is highly predictable from the other independent variables, with potential distortion of the regression.
Factors: a higher R² in the auxiliary regression → a high VIF, which can indicate strong multicollinearity and reduce the reliability of the estimates.
78
3 Multicollinearity Remedies
1. **Do nothing**: the estimates remain unbiased and may still be significant and meet theoretical expectations; removing variables that belong in the model without justification and rerunning the regression multiple times can cause specification bias that fits the specific sample rather than being generalizable/functionally correct
2. **Drop redundant variable(s)**: variables that measure the same concept / are near-identical; this corrects a specification error, but only with strong theoretical/literature justification, not just statistical reasoning or arbitrarily (misspecification). With severe multicollinearity, dropping one of the highly correlated variables yields similar results
3. **Increase sample size (often impractical)**: reduces the variance of the coefficients, diminishing the impact of multicollinearity; more variation in the explanatory variables allows their individual effects to be distinguished more easily. Multicollinearity is more problematic in small samples, where correlations b/w independent variables become exaggerated
79
How to Detect/Identify Heteroskedasticity
1. Obvious specification errors: an incorrect functional form or omitted variables can mimic heteroskedasticity
2. Early warning signs: high residual variance in scatterplots
3. Graphs of the residuals: patterns that fan out or contract indicate non-constant variance
4. Formal tests: Breusch-Pagan, White
80
Consequences of Heteroskedasticity
1. Inefficient = NOT BLUE: OLS fails to produce minimum-variance estimates
2. Biased standard errors: lead to unreliable, incorrect statistical inference, hypothesis tests & confidence intervals
81
Remedies of Heteroskedasticity
1. If detected, check for model specification errors (the model is incorrectly formulated theoretically): omitted variables, incorrect functional form. If there are none, the heteroskedasticity is likely pure in nature
2. Corrected standard errors ("HC1"): adjust for heteroskedasticity, giving reliable, accurate inference; they remain biased in small samples but are more accurate than uncorrected SEs in large samples, and can be used in hypothesis tests, t-tests, and F-tests for valid statistical inference even when heteroskedasticity is present. Why use them under both homo- and heteroskedasticity: if the data are homoskedastic, the robust formula collapses to the homoskedastic one. Homoskedastic: Var(β̂1) = σ²u / Σ(Xi − X̄)²; robust: Var(β̂1) = Σ(Xi − X̄)²ûi² / [Σ(Xi − X̄)²]²
3. Redefining variables / rethinking the model based on a theoretical framework (ex. a linear to log-log specification to stabilize the variance). Issue: changes the functional form, alters the model's interpretation, and can misrepresent relationships. Aggregate models: the dependent variable makes different error-term variances more likely. Log / per-capita forms: large and small explanatory-variable values get equal weight because the dependent variable no longer varies over a wide range of sizes
82
BP & White Test in Doc
See doc
83
Misspecification + Solution
* Mis-specified: the functional form is wrong, causing a biased estimate on average
* Solution: estimate a regression function that is nonlinear in X (polynomials, logs, interactions), which is still estimated by OLS; only a model that is nonlinear in the parameters cannot be estimated by OLS
84
Nonlinear Models + Expected Value + Assumptions
**Nonlinear relation** b/w X→Y: the effect on Y of a change in X depends on the value of X; the marginal effect (1st derivative) of X is not constant.
**Expected change** in Y with a change in X1, holding all other Xi constant: ΔY = f(X1 + ΔX1, X2, ..., Xk) − f(X1, X2, ..., Xk); for predicted values, ΔŶ = f̂(X1 + ΔX1, X2, ..., Xk) − f̂(X1, X2, ..., Xk)
**Assumptions LSA 1–4:**
1. E(ui|X1i, ..., Xki) = 0, so f is the conditional expectation of Y given the X's
2. (X1i, ..., Xki, Yi) are i.i.d.
3. Big outliers are rare (finite fourth moments)
4. No perfect multicollinearity
85
Polynomial Regression
The population regression f(X) is approximated by a quadratic, cubic, or higher-degree polynomial: Yi = β0 + β1Xi + β2Xi² + ... + βrXi^r + ui
**Interpretation**:
1. dY/dX gives the marginal effect of X→Y from the estimated equation; don't extrapolate outside the range of the data
2. Plot the predicted values
Estimation and hypothesis testing proceed the same way as in the multiple regression model, using OLS under LSA 1–4.
**Code:**
caschool <- caschool %>% mutate(avginc2 = avginc * avginc)
reg_quad <- lm(testscr ~ avginc + avginc2, data = caschool)
coeftest(reg_quad, vcov = vcovHC(reg_quad, type = "HC1"))
caschool <- caschool %>% mutate(avginc3 = avginc * avginc2)
reg_cubic <- lm(testscr ~ avginc + avginc2 + avginc3, data = caschool)
coeftest(reg_cubic, vcov = vcovHC(reg_cubic, type = "HC1"))
86
Logarithmic Regression + Why Use it + Cases + Interpretation
**Why use it**
1. Logarithms convert non-linear relationships into linear ones: stock prices often exhibit exponential growth over time; taking logs makes the relation linear & easier to estimate in a regression model
2. Log differences approximate percentage changes: analysts care about returns, not raw price levels
3. Reduces skewness & makes data more normally distributed: stock data are often right-skewed; logs compress large values, moving the data closer to normal
4. Handles volatility more effectively: stabilizes the variance
5. Interpretation becomes easier: the coefficient can represent an elasticity, e.g. for log(price) or log(GDP)
6. Log differences avoid unit dependence: unit free, making models comparable across different financial markets
**Cases**
- **Linear-log**: a 1% increase in X is associated with a 0.01·β1 = β1/100 change in Y. The units of the error and the SER are the same as the units of Y.
- **Log-linear**: a one-unit change in X is associated with a 100·β1 % change in Y. The units of the error and the SER are fractional deviations.
- **Log-log**: a 1% change in X is associated with a β1 % change in Y (β1 is an elasticity).
87
Logarithmic Code + Mathematical interpretation of coefficient B1
See doc
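Since the answer points to the course doc, here is a hedged sketch of the three log specifications in R, using the caschool variables as stand-ins:

library(lmtest); library(sandwich)
lin_log <- lm(testscr ~ log(avginc), data = caschool)        # linear-log
log_lin <- lm(log(testscr) ~ avginc, data = caschool)        # log-linear
log_log <- lm(log(testscr) ~ log(avginc), data = caschool)   # log-log: slope is an elasticity
coeftest(log_log, vcov = vcovHC(log_log, type = "HC1"))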
88
Interaction b/w Independent Variables
**Interaction terms**: the product of 2+ variables in a model (e.g. Gender × Degree); a simple way to describe and study how combinations of explanatory variables affect the model, without running separate regressions when 2+ groups differ in how a continuous variable matters.
Case where ∂Y/∂X1 might depend on X2: the effect of education on income is not the same for men and women.
Income = β0 + β1·Gender + β2·Education + β3·(Gender × Education) + u
β3 > 0: education increases income more for men than for women
β3 < 0: education increases income more for women than for men
The interaction term captures how the effect of education on income differs by gender.
89
Interaction Variable Cases + Code
- **Binary × binary**
- **Continuous × binary**
- **Continuous × continuous**
See doc; a hedged R sketch follows below.
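A brief sketch of the three cases in R; the variable names (wage, female, union, educ, exper) are illustrative, not from the course data, and `a * b` expands to `a + b + a:b`:

lm(wage ~ female * union, data = df)   # binary x binary
lm(wage ~ educ * female, data = df)    # continuous x binary: the educ slope differs by gender
lm(wage ~ educ * exper, data = df)     # continuous x continuous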
90
Linear Probability Model (OLS) + Advantages/Disadvantages
**Linear Probability Model (OLS)**: a linear regression model where Y is binary (0 or 1); the predicted value Ŷ is the probability that Y = 1, and β1 is the change in that predicted probability for a unit change in X (a linear function of X).
Yi = β0 + β1Xi + ui; E(Y|X) = 1·P(Y=1|X) + 0·P(Y=0|X) = P(Y=1|X) = β0 + β1X under the LSA assumptions; β1 = [P(Y=1|X = x + Δx) − P(Y=1|X = x)] / Δx
**Advantages**: simple to estimate and interpret; inference is the same as for multiple regression (need heteroskedasticity-robust SEs). Real-life examples: medical diagnostics, denials of claims.
**Disadvantages**: it says the change in the predicted probability for a given change in X is the same for all values of X (while nonlinear relations and interactions exist), and the predicted probability can be less than 0 or greater than 1. Solution: nonlinear probability models.
91
Probit Regression + What Needs does it fulfill?
**Needed**: 1. P(Y=1|X) increasing in X for β1 > 0; 2. 0 ≤ P(Y=1|X) ≤ 1 for all X
**Probit regression**: models the probability that Y = 1 using the cumulative standard normal distribution function (requires a z-table): P(Y=1|X) = Φ(β0 + β1X); the z-value is β0 + β1X, and β1 is the change in the z-value for a unit change in X.
Why the cumulative normal distribution is used: its S-shape fulfills both requirements, it is easy to use, and it has a straightforward interpretation.
**Log likelihood in probit regression**: tells how well the model's predicted probabilities match the actual outcomes; a more positive (less negative) value indicates a better model; it will always be negative.
92
Logit Regression
**Logit regression:** models the probability of Y = 1 given X as the cumulative standard logistic distribution function, evaluated at z = β0 + β1X
- P(Y=1|X) = F(β0 + β1X) = 1 / (1 + e^−(β0 + β1X))
- **Why use it**: computationally faster & easier
- **Interpretation**: holding all other factors constant except the variable of interest, compare the predicted probabilities at two values of that variable to get the marginal effect.
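A minimal sketch of both nonlinear probability models in R via glm(), assuming a binary outcome deny and a regressor pirat (placeholder names borrowed from the usual mortgage-denial example):

probit <- glm(deny ~ pirat, family = binomial(link = "probit"), data = hmda)
logit  <- glm(deny ~ pirat, family = binomial(link = "logit"),  data = hmda)
summary(probit)   # coefficients are effects on the z-value, not on probabilities
predict(probit, newdata = data.frame(pirat = 0.3), type = "response")   # fitted P(Y = 1 | X)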
93
Maximum Likelihood Estimator (MLE)
Maximum Likelihood Estimator (MLE): for the coefficients of the probit model, the values (β̂0, β̂1) that maximize the likelihood function, i.e. the values that best describe the full distribution of the data
* Likelihood function: the conditional density of Y1, ..., Yn given X1, ..., Xn, treated as a function of the unknown parameters β0 and β1
* Large samples: the MLE is consistent, normally distributed, and efficient (smallest variance of all consistent estimators)
Probit and logit are estimated via maximum likelihood
* The coefficients are normally distributed for large n
* Large-n hypothesis testing and confidence intervals proceed as usual
94
Tutorial 8
See doc
95
List All LSA# & Importance of Each
LSA #1: E(u|X) = 0 (conditional mean zero): X is uncorrelated with the other determinants of Y, so β̂1 is an unbiased estimator of the causal effect (holds by design in a randomized controlled experiment).
LSA #2: (Xi, Yi) are i.i.d. (simple random sampling): lets the CLT deliver the normal sampling distribution of β̂0 & β̂1.
LSA #3: large outliers are rare (finite fourth moments of X and Y): otherwise a few extreme observations can dominate and distort the estimates.
LSA #4 (multiple regression): no perfect multicollinearity: otherwise the individual coefficients cannot be separately estimated.
Extra assumptions (the LSA #4–5 of the homoskedasticity cards): u homoskedastic → OLS is BLUE; u ~ N(0, σ²) → exact Student t distributions in small samples.