ECON 326 FINAL Flashcards

1
Q

Why do we need econometrics? Class size example

A

Economic theory suggests important relationships with policy implications, but it rarely indicates the quantitative magnitude of causal effects. Ideally these magnitudes would be determined by a randomized controlled experiment, but almost always we only have observational (non-experimental) data.

Ex. A decrease in class size increases student achievement, so the provincial government should create policy to decrease class sizes. But by how much? That quantitative effect is what must be estimated.

2
Q

Random Sampling must satisfy

A

Random sampling must satisfy: no confounds; each member of the population has an equal chance of selection (ex. tasting the saltiness of a well-mixed soup).
* n > 25 so the Law of Large Numbers applies
* Random sample
* Identically distributed
* Independently distributed

3
Q

Explain difference b/w Independent & Identically distributed? Example of coin flip?

A
  • Independent: value of one doesn’t affect/depend on value of another
  • Identical: probability of outcomes is the same, same process used to collect data
  • (ex.Flipping coin, previous result doesn’t affect next flip, each flip has 50/50 probability distribution)
4
Q

Prove expectation of Y equals population regression equation

A

E[Yi|Xi] = E[β0 + β1Xi + ui | Xi] = β0 + β1Xi + E[ui|Xi] = β0 + β1Xi, since E[ui|Xi] = 0 under the first least squares assumption.

5
Q

3 Measures of Fit + Formula + Drawing

A
  1. Regression R² (0 = no fit, 1 = perfect fit): unitless fraction of the variance of Y that is explained by X; the higher the better. R² = ESS/TSS = 1 − SSR/TSS
  2. Standard Error of the Regression (SER): magnitude of the typical regression residual in units of Y / spread of the distribution of the residuals u; almost the sample standard deviation of the OLS residuals; the lower the better. SER = sqrt[ (1/(n−2)) Σ ûi² ]
  3. Root Mean Squared Error (RMSE): same as the SER but using 1/n instead of the degrees-of-freedom correction 1/(n−2). RMSE = sqrt[ (1/n) Σ ûi² ] (see the sketch below)
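A minimal R sketch of these measures, assuming the caschool regression used elsewhere in these cards:

# assume reg <- lm(testscr ~ str, data = caschool)
y     <- caschool$testscr
u_hat <- resid(reg)                                  # OLS residuals
n     <- length(u_hat)
SER   <- sqrt(sum(u_hat^2) / (n - 2))                # typical residual size, in units of Y
RMSE  <- sqrt(sum(u_hat^2) / n)                      # same, without the d.f. correction
R2    <- 1 - sum(u_hat^2) / sum((y - mean(y))^2)     # 1 - SSR/TSS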
6
Q

OLS Estimation + Proof

A
  • OLS estimation: minimizing squared error; makes the line fit the data so that the sum of squared differences b/w the regression line and the true data points is as small as possible.
  • Minimizes the average squared difference b/w the actual values Yi and the predictions Ŷi = b0 + b1Xi based on the estimated line.
  • Given n points (Xi, Yi), find the line of best fit Ŷi = b0 + b1Xi that minimizes the sum of squared errors in Y, Σ_{i=1}^{n} (Yi − Ŷi)² (the vertical distances b/w points & line); see the sketch below.
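As a sanity check on the minimization result, a small R sketch (assuming numeric vectors x and y) computes the OLS slope and intercept from the sample covariance and variance and compares them with lm():

b1 <- cov(x, y) / var(x)       # slope that minimizes the sum of squared residuals
b0 <- mean(y) - b1 * mean(x)   # intercept: the fitted line passes through (mean(x), mean(y))
coef(lm(y ~ x))                # should match c(b0, b1)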
7
Q

R Regression Code

A

library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()
regression1 <- lm(dependent ~ independent, data = caschool)
summary(regression1)
coeftest(regression1, vcov = vcovHC(regression1, type = "HC1"))  # heteroskedasticity-robust SEs

8
Q

Look at R code result, Interpret each element

A
9
Q

1st Least Squares Assumption for Causal Inference

A

Randomized controlled experiment: for a binary treatment, the causal effect is the expected difference in means b/w the treatment & control groups, which are divided by random assignment (by computer). Randomization ensures X is uncorrelated with all other determinants of Y, so there are no confounding variables (no OVB). All individual characteristics that make up u are distributed independently of X, so the conditional mean E(u|X = x) = 0; all other qualities and residuals cancel out across the two groups, implying β̂1 is an unbiased estimator of the causal effect.

See graph in doc

10
Q

2nd Least Squares Assumption for Causal Inference

A

Identically & independently distributed sampling allows the Central Limit Theorem (CLT) to deliver the sampling distribution of β̂0 & β̂1 under simple random sampling: all entities are selected from the same population (identically distributed) and at random, so the probability of selecting one school is uncorrelated with selecting another (independently distributed).

11
Q

3rd Least Squares Assumption for Causal Inference

A

Large outliers in X and/or Y are rare: E(X⁴) < ∞ and E(Y⁴) < ∞. A large outlier can strongly influence the results or create meaningless values of β̂1; usually X & Y are bounded and therefore have finite fourth moments.
Check a scatterplot and consider removing extreme values of X or Y, via either:
Trimming: drop, say, 1% of the data off both ends
Winsorizing: replace extreme values with less extreme values from within the data distribution, rather than removing them entirely, to mitigate their effect without discarding data points (see the sketch after this card).

See doc for graph.
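A rough R sketch of both fixes, assuming a data frame df with a variable x; the 1st/99th-percentile cutoffs are illustrative:

lo <- quantile(df$x, 0.01)
hi <- quantile(df$x, 0.99)
df_trim <- df[df$x >= lo & df$x <= hi, ]     # trimming: drop the extreme 1% in each tail
df_wins <- df
df_wins$x <- pmin(pmax(df_wins$x, lo), hi)   # winsorizing: cap values at the cutoffs instead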

12
Q

Interpret b0 and b1

A

b0 is the average value of Y when X=0

b1 is the change in Y associated with a 1-unit change in X, holding all other factors/variables constant.

13
Q

Heteroskedasticity means that:
A) homogeneity cannot be assumed automatically for the model.
B) the variance of the error term is not constant.
C) the observed units have different preferences.
D) agents are not all rational.

A

B

14
Q

The power of the test is:
A) dependent on whether you calculate a t or a t2 statistic.
B) one minus the probability of committing a type I error.
C) a subjective view taken by the econometrician dependent on the situation.
D) one minus the probability of committing a type II error.

A

D

15
Q

With i.i.d. sampling each of the following is true EXCEPT:
A) E(Ȳ) = μY.
B) var(Ȳ) = σ²Y/n.
C) E(Ȳ) < E(Y).
D) Ȳ is a random variable

A

C

16
Q

Central limit theorem states:
A) states conditions under which a variable involving the sum of Y1,…, Yn i.i.d. variables
becomes the standard normal distribution.
B) postulates that the sample mean is a consistent estimator of the population mean μY.
C) only holds in the presence of the law of large numbers.
D) states conditions under which a variable involving the sum of Y1,…, Yn i.i.d. variables
becomes the Student t distribution

A

A

17
Q

You have estimated a linear regression to understand the relationship between salary and
years of experience. You want to test the hypothesis:
* Null Hypothesis H0 : The effect of experience on salary is zero (β1=0).
* Alternative Hypothesis HA : Experience significantly affects salary (β1≠0).
Which of the following R commands will provide the t-statistic and p-value for this
hypothesis test?
A) summary(model)
B) coefficients(model)
C) confint(model)
D) t.test(company_data$salary, company_data$experience)

A

A

18
Q

Which command will predict sales if the advertising budget is 1000 units?
A) predict(model, newdata = data.frame(advertising = 1000))
B) predict(model, newdata = list(advertising = 1000))
C) model$predict(1000)
D) predict(model, advertising = 1000)

A

A

19
Q

Which command extracts the intercept and slope coefficients from the model?
A) coef(model)
B) summary(model)
C) model$coefficients
D) coefficients(model)

A

C

20
Q

Which R command will show the detailed results (coefficients, residuals, R-squared, etc.) of
the regression?
A) summary(model)
B) print(model)
C) model$coefficients
D) coefficients(model)

A

A

21
Q

Which of the following is the correct way to run a simple linear regression in R, where sales
is the dependent variable and advertising is the independent variable using the lm()
function?
A) lm(sales ~ advertising, data = dataset)
B) lm(advertising ~ sales, dataset)
C) lm(data = dataset, sales ~ advertising)
D) lm(dataset$sales, dataset$advertising)

A

A

22
Q

To infer the political tendencies of the students at your college/university, you sample 150
of them. Only one of the following is a simple random sample. You:
A) make sure that the proportion of minorities are the same in your sample as in the
entire student body.
B) call every fiftieth person in the student directory at 9 a.m. If the person does not answer
the phone, you pick the next name listed, and so on.
C) go to the main dining hall on campus and interview students randomly there.
D) have your statistical package generate 150 random numbers in the range from 1 to the
total number of students in your academic institution, and then choose the corresponding
names in the student telephone directory

A

D

23
Q

4 elements of an Ideal Randomized Controlled Experiment

A
  • Ideal: subjects follow treatment protocol, perfect compliance, no errors in reporting
  • Randomized: subjects from population of interest are randomly assigned to a treatment or control group so no confounding OVB
  • Controlled: control group permits measuring differential effect of treatment
  • Experiment: treatment assigned, subjects have no choice to avoid reverse causality & selection biases (those who are more likely to be in treatment group make up treatment group causing a bias)
24
Q

4th Least Squares Assumption for Causal Inference in Multiple Regression & how it can be violated & solutions

A

No perfect multicollinearity: violated when a regressor is an exact linear function of the other regressors (perfect multicollinearity) or when regressors are very highly correlated (imperfect multicollinearity).
1. Inserting the same variable twice: R reports NA for the duplicate; Stata drops it.
2. Dummy variable trap: when a set of dummy variables is mutually exclusive & exhaustive, one variable can be perfectly predicted from the others, so including all the dummies and a constant gives perfect multicollinearity (redundancy with the intercept term) and makes it impossible to interpret the individual effect of each dummy (ex. income v. provinces).
* Solution: modify list of regressors, omit intercept or omit a categorical group

25
5 Multiple Regression Model Measures of Fit
1. Actual = predicted + residual: Yi = Ŷi + ûi
2. SER: standard deviation of the residuals with a d.f. correction / average spread: SER = sqrt[ (1/(n−k−1)) Σ ûi² ], k = # of regressors
3. RMSE: standard deviation of the residuals without the d.f. correction: RMSE = sqrt[ (1/n) Σ ûi² ]
4. R²: fraction of the variance of Y explained by the X's. Issue: adding more regressors, even if only slightly correlated with Y, reduces SSR and so "improves" fit. R² = ESS/TSS = 1 − SSR/TSS, with ESS = Σ (Ŷi − Ȳ)², SSR = Σ (Yi − Ŷi)², TSS = Σ (Yi − Ȳ)²
5. Adjusted R²: includes a degrees-of-freedom correction that penalizes the inclusion of another regressor; doesn't necessarily increase when a regressor is added; always smaller than the unadjusted R²: R̄² = 1 − [(n−1)/(n−k−1)]·SSR/TSS
26
How to mathematically hold constant variables in multiple regression model
Take partial derivatives: holding X2 constant, β1 = ∂Y/∂X1; holding X1 constant, β2 = ∂Y/∂X2.
27
3 Solutions to Omitted Variable Bias
1. Randomized controlled experiment where treatment is randomly assigned: often not feasible
2. Cross-tabulation approach: control for the OVB by comparing cases with differing values of the independent variable but identical values of the confounding determinant; however, you will quickly run out of data
3. Add the confounder as a regressor so the regression doesn't omit it: multiple regression
28
Fill in Direction of Bias Table for Positive & Negative Correlation
Direction: determined by the relation b/w Z→X and b/w Z→Y; together they amplify or dampen the estimated relation b/w X→Y.
Downward bias: makes the relation seem more negative (Z increases X and decreases Y, or decreases X and increases Y).
Upward bias: makes the relation seem more positive (Z increases both X and Y, or decreases both).
Over- & underestimating: the estimate ends up farther from or closer to β1 = 0 than the true effect.
29
OVB impact on 1) Bad attention span on X-Media Usage --> Y-Academic Performance 2) Experience on X-Setup --> Y-Game Rank 3) PcETL on X-Class Size --> Y-TestScores
1. Downward bias: more negative than it actually is. Overestimates the power of media usage on academic performance; may falsely conclude that media usage has a greater effect on grades than in reality, since attention span plays a large role.
2. Upward bias: more positive than it actually is. Overestimates the ability of a good setup to increase rank, since experience plays a big factor too.
3. Direction: positive X→Y times positive Z→X, so an overestimate.
30
Omitted Variable Bias + Conditions + Formula
Omitted Variable Bias: β̂1 is biased and not consistent even if n is large, E(β̂1) ≠ β1. Arises when a variable Z in u is correlated with X & satisfies both:
1. Z is a determinant of Y (part of u)
2. Z is correlated with X: Corr(Z, X) ≠ 0, so ρXu ≠ 0
31
TestScore = 698.9 − 2.28·STR, SE(β̂1) = 0.52. Significant? Use hypothesis testing, the p-value, and a confidence interval to show it.
H0: β1 = 0, H1: β1 ≠ 0.
Hypothesis testing: t = (β̂1 − β1,0)/SE(β̂1) = (−2.28 − 0)/0.52 = −4.38; |t| = 4.38 > 2.58, so reject the null at the 1% level.
P-value: Pr(|t| > 4.38) is well below 5% (indeed below 1%), so reject the null.
Confidence interval: β̂1 ± 1.96·SE(β̂1) = (−2.28 − 1.96(0.52), −2.28 + 1.96(0.52)) = (−3.30, −1.26); since the interval doesn't include 0, the null is rejected.
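The same arithmetic in R, using the reported estimate and standard error:

b1  <- -2.28
se1 <- 0.52
t_stat <- (b1 - 0) / se1                        # about -4.38
p_val  <- 2 * pnorm(-abs(t_stat))               # two-sided p-value, far below 0.05
ci_95  <- c(b1 - 1.96 * se1, b1 + 1.96 * se1)   # (-3.30, -1.26), excludes 0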
32
5-step process for testing the slope of the population regression line
1. State the population object of interest & its estimator: β1, estimated by β̂1, assuming the 3 least squares assumptions hold.
2. Derive the sampling distribution (normal when n is large): β̂1 ~ N(β1, σ²β̂1), with mean E(β̂1) = β1 and variance Var(β̂1) = (1/n)·Var[(Xi − μX)ui] / [Var(Xi)]².
3. Standard error of the estimator = square root of the estimated variance: SE(β̂1) = sqrt(σ̂²β̂1).
4. Construct the t-statistic & confidence interval for hypothesis testing: t = (estimator − hypothesized value)/SE(estimator) = (β̂1 − β1,0)/SE(β̂1).
5. Significance/hypothesis decision: reject if |t| > 1.96 or p < the significance level.
33
Derive Residual, Mean and Variance of b1
See doc
34
Sampling Uncertainty + What do you need to derive to find it
* Sampling uncertainty: different samples yield different values of β̂0 & β̂1; it is quantified through hypothesis tests or confidence intervals, which requires finding the sampling distribution.
* Distribution of the OLS estimator: the sampling distribution of β̂1 in large samples is normal; derive its mean & variance to compute significance via t = β̂1/SE(β̂1).
* β̂1 ~ N(β1, σ²β̂1), and Z = (β̂1 − E(β̂1)) / sqrt(var(β̂1)) ~ N(0, 1)
35
Derive b_1, b_0
See doc
36
OLS Blue
OLS is BLUE: the Best Linear Unbiased Estimator, i.e. the most efficient (lowest-variance) estimator among linear unbiased estimators.
37
Causal Inference v. Prediction
* Place different requirements on the data, but use the same regression toolkit.
* Causal inference: learning the causal effect on Y of a change in X.
* Prediction: predicting the value of Y given X for an observation not in the data set.
38
Regression Error Types + Illustrate
Regression error / population error term (ui): consists of omitted factors (OVB: factors other than X that influence Y), the difference b/w the regression line and the true data point (& measurement error in Y).
* Unexplained variation (residuals): ûi = Yi − (b0 + b1Xi) = actual value of Y − predicted value of Y; SSR = Σ (Yi − Ŷi)²
* Explained variation: ESS = Σ (Ŷi − Ȳ)²
* Total variation: TSS = Σ (Yi − Ȳ)²
39
Ȳ = 434.49, sY = 294.67, n = 1744. Construct a 99% confidence interval.
Confidence interval: CI = [Ȳ − 2.58·294.67/√1744, Ȳ + 2.58·294.67/√1744] = [416.29, 452.69]. In 99% of repeated samples an interval constructed this way contains the true mean, so the true population average weekly earnings would be expected to lie in this interval. A 90% confidence interval would be narrower, using 1.64 in place of 2.58.
40
Is difference in average earnings statistically significant: see graph in tutorial 1 doc
Given the sample standard deviations, use the t-statistic: H0: μ<45 − μ>45 = 0, H1: μ<45 − μ>45 > 0. t = (Ȳ<45 − Ȳ>45) / sqrt(s²<45/2507 + s²>45/1237) = 4.62 > 2 (critical value), so the difference is significant and the null hypothesis is rejected.
41
Sir Francis Galton, a cousin of Charles Darwin, examined the relationship between the height of children and their parents towards the end of the 19th century. It is from this study that the name "regression" originated. You decide to update his findings by collecting data from 110 college students, and estimate the following relationship: Studenth = 19.6 + 0.73 × Midparh, R2 = 0.45, SER = 2.0, where Studenth is the height of students in inches, and Midparh is the average of the parental heights. (Following Galton's methodology, both variables were adjusted so that the average female height was equal to the average male height.) (a) Interpret the estimated coefficients. (b) What is the meaning of the regression R2? (c) What is the prediction for the height of a child whose parents have an average height of 70.06 inches? (d) What is the interpretation of the SER here? (e) Given the positive intercept and the fact that the slope lies between zero and one, what can you say about the height of students who have quite tall parents? Those who have quite short parents? (f) Galton was concerned about the height of the English aristocracy and referred to the above result as "regression towards mediocrity." Can you figure out what his concern was? Why do you think that we refer to this result today as "Galton's Fallacy"?
See doc
42
Sample Midterm Q1
See doc
43
Sample Midterm Q3
See doc
44
Imagine that you were told that the t-statistic for the slope coefficient of the regression line = 698.9 – 2.28 × STR was 4.38. What are the units of measurement for the t-statistic?
D) The t-statistic is unit free: the numerator (β̂1 − β1,0) and the denominator SE(β̂1) are measured in the same units, so the units cancel.
45
In general, the t-statistic has the following form:
C) t = (estimator − hypothesized value) / SE(estimator)
46
4 Reasons why Correlation doesn't imply causation
1. A hidden variable causes A & B to move together
2. Coincidence
3. B causes A (reverse causality)
4. Strong correlation but the causal effect is weak, or simultaneity: A causes B and B causes A
47
Interpret 95% Confidence Interval
Interval that contains the true value 95% of the time when repeatedly sampled.
48
Type 1, Type 2, P-value, Power
* Type 1 error (Pr = α = significance level): false positive, rejecting the null when it is true
* Type 2 error (Pr = β; power = 1 − β): false negative, failing to reject the null when it is false
* P-value / marginal significance level = Pr(|t| > |t^act|) = Pr(|Ȳ − μY,0|/(s/√n) > |Ȳ^act − μY,0|/(s/√n)): the probability of drawing a test statistic at least as extreme as the one actually observed; contains more information than a simple reject/fail-to-reject decision; reject if p < α
49
Process of Student t-distribution
Student t-distribution: used if the underlying distribution is normal, the data are i.i.d., and n < 25 or the population variance is unknown. t = (Ȳ − μY,0)/(sY/√n), or for two means t = (Ȳ1 − Ȳ2)/sqrt(s1²/n1 + s2²/n2).
1. Compute the t-statistic
2. Compute the degrees of freedom
3. Look up the 5% critical value
4. If the t-statistic exceeds this critical value, reject
Note: the difference of two means might not have a joint normal / Student t distribution even if each sample does, so use the s²/n formula and the large-sample normal approximation.
50
Sampling Distribution of Ȳ (the sample mean)
**Sampling distribution of Ȳ**: the distribution of Ȳ over the different possible samples of size n
* **Unbiased**: E(Ȳ) = μY
* **Efficient**: Ȳ has the smallest variance of all linear unbiased estimators
* **Consistent** (Law of Large Numbers): Ȳ →p μY; as n increases, the distribution of Ȳ becomes more tightly centered in an interval around μY, the true population value; guaranteed when the data are i.i.d. and the variance is finite
51
4 Moments of Statistics
1. Mean (1st moment): expected value of Y, E(Y) = μY, the long-run average over repeated realizations
2. Variance (2nd moment): E[(Y − μY)²] = σ²Y, the squared spread of the distribution. Sample variance: s² = (1/(n−1)) Σ (Yi − Ȳ)² estimates the population variance; it is an unbiased estimator if the sample is i.i.d. and the 4th moment is finite. Standard deviation: σY = sqrt(variance); sample standard error of Ȳ: SE(Ȳ) = sqrt(s²Y/n) = sY/√n
3. Skewness (3rd moment): E[(Y − μY)³]/σ³Y, the asymmetry of a distribution; 0 = symmetric, > 0 long right tail, < 0 long left tail
4. Kurtosis (4th moment): E[(Y − μY)⁴]/σ⁴Y; = 3 for the normal distribution, > 3 heavy tails (leptokurtic)
52
Covariance v. Correlation; Conditional Distribution v. Conditional Mean/Variance
Covariance: cov(X, Y) = E[(X − μX)(Y − μY)] = σXY, the linear association b/w X & Y; its units are units of X times units of Y. The covariance of a variable with itself is its variance: Cov(X, X) = E[(X − μX)²] = σ²X
Correlation: corr(X, Y) = σXY/(σXσY) = rXY, unit free and between −1 and 1
Conditional distribution: the distribution of Y given X
Conditional mean/variance: E(Y|X) and Var(Y|X), the mean/variance of the conditional distribution
53
Breusch-Pagan Test + What is this test and Why it works? + Construction
Breusch-Pagan Test: indicates whether the variance of the residuals depends systematically on the X explanatory variables. Process:
1. Obtain the residuals from the estimated equation
2. Form the auxiliary equation by regressing the squared residuals on the explanatory variables
3. Test the overall significance of that equation with a chi-square test
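A minimal sketch of the test in R with lmtest::bptest(), assuming the caschool regression used elsewhere in these cards; bptest() runs the auxiliary regression of the squared residuals on the regressors and reports the chi-square statistic and p-value:

library(lmtest)
reg <- lm(testscr ~ str, data = caschool)
bptest(reg)   # small p-value -> reject homoskedasticity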
54
White Test + What is this test and Why it works? + Construction
**White Test**: more general than Breusch-Pagan; detects both heteroskedasticity and model misspecification, and accounts for non-linear forms of heteroskedasticity by including squared and interaction terms. It examines whether the squared residuals are systematically related to the independent variables, their squares, and their cross-products.
**Limitations**: as the number of explanatory variables increases, the number of terms in the White auxiliary regression grows rapidly, making estimation computationally intensive.
Process:
1. Form the auxiliary equation by regressing the squared residuals on the explanatory variables
2. Explanatory variables: the independent variables from the original equation, their squared terms, and their interaction terms
3. Test the overall significance of that equation with a chi-square test
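One hedged way to run a White-style test in R is to give bptest() an auxiliary formula containing the regressors, their squares, and their cross-product; the variable names follow the caschool example:

library(lmtest)
reg <- lm(testscr ~ str + el_pct, data = caschool)
bptest(reg, ~ str + el_pct + I(str^2) + I(el_pct^2) + str:el_pct, data = caschool)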
55
Under imperfect multicollinearity:
a) the OLS estimator cannot be computed.
b) two or more of the regressors are highly correlated.
c) the OLS estimator is biased even in samples of n > 100.
d) the error terms are highly, but not perfectly, correlated.

In multiple regression, the R2 increases whenever a regressor is
a. added unless the coefficient on the added regressor is exactly zero.
b. added no matter the sign of the coefficient on the added regressor.
c. added unless there is heteroskedasticity.
d. added.
B, A
56
In the multiple regression model, you estimate the effect on Yi of a unit change in one of the Xi while holding all other regressors constant. This:
a) makes little sense, because in the real world all other variables change.
b) corresponds to the economic principle of mutatis mutandis.
c) leaves the formula for the coefficient in the single explanatory variable case unaffected.
d) corresponds to the economic principle of ceteris paribus.

Under the least squares assumptions for the multiple regression problem (zero conditional mean for the error term, all Xi and Yi being i.i.d., all Xi and ui having finite fourth moments, no perfect multicollinearity), the OLS estimators for the slopes and intercept
a. have an exact normal distribution for n > 25.
b. are BLUE.
c. have a normal distribution in small samples as long as the errors are homoskedastic.
d. are unbiased and consistent.
D, D
57
In the multiple regression model, the least squares estimator is derived by:
a. minimizing the sum of squared prediction mistakes.
b. setting the sum of squared errors equal to zero.
c. minimizing the absolute difference of the residuals.
d. forcing the smallest distance between the actual and fitted values.

When you have an omitted variable problem, the assumption that E(ui | Xi) = 0 is violated. This implies that
a. the sum of the residuals is no longer zero.
b. there is another estimator called weighted least squares, which is BLUE.
c. the sum of the residuals times any of the explanatory variables is no longer zero.
d. the OLS estimator is no longer consistent.
A, D
58
Omitted variable bias
a. will always be present as long as the regression R2 < 1.
b. is always there but is negligible in almost all economic examples.
c. exists if the omitted variable is correlated with the included regressor but is not a determinant of the dependent variable.
d. exists if the omitted variable is correlated with the included regressor and is a determinant of the dependent variable.
e. exists if the omitted variable is uncorrelated with the included regressor and is a determinant of the dependent variable.

The dummy variable trap is an example of
a. imperfect multicollinearity
b. something that is of theoretical interest only
c. perfect multicollinearity
d. something that does not happen a lot in regression models
e. something that is very common in regression models
D, C
59
Homoskedastic Data + Benefit + Formula + Graph
**Homoskedastic** (Var(u|X) constant): constant variance; the spread of the data points is constant across all values of X; the variance of the conditional distribution of u given X doesn't depend on X; plotting u against X shows no relation. Ex. everyone on TikTok gets the same number of views no matter how many followers they have.
- **Benefit**: more stable, trustworthy statistical models; proves OLS has the lowest variance among linear estimators
- **Homoskedasticity-only SE formula** (see doc): never use it unless the errors really are homoskedastic (that's why not to use Excel); it is usually the software default that must be overridden. Used with heteroskedastic data it gives misleading statistical inference.
60
Heteroskedasticity + Issue + SE Formula + Graph
**Heteroskedasticity** (Var(u|X) changing): changing variance; the spread of the data points changes across values of X; the variance of the conditional distribution of u given X depends on X; plotting u against X shows a relation (ex. more followers means more views on TikTok).
- **Issue**: predictions become less reliable; biased conclusions from the usual standard errors
- **Robust SE formula** (see doc): gives the same point estimates as OLS; when n is large and the errors are homoskedastic the two formulas are equivalent, since the variance estimates converge to the same expected values, but the robust version remains valid (and more stable) under heteroskedasticity.
61
How to code robust standard errors in R in 2 ways & in Stata?
**Basic method:**
regression1 <- lm(testscr ~ str, data = caschool)
test1 <- coeftest(regression1, vcov = vcovHC(regression1, type = "HC1"))
print(test1)
**Alternative version:**
library(estimatr)
regression2 <- lm_robust(testscr ~ str, data = caschool, se_type = "HC1")
summary(regression2)
**Stata:**
regress testscr str, robust
(Regression with robust standard errors, Number of obs = 420)
62
Cost = 7,311.17 + 3,985.20 × Reputation – 0.20 × Size+ 8,406.79 × Dpriv – 416.38 × Dlibart – 2,376.51 × Dreligion R2 = 0.72, SER = 3,773.35 a) Interpret the results. Do the coefficients have the expected sign? b) What is the forecasted cost for a liberal arts college, which has no religious affiliation, a size of 1,500 students and a reputation level of 4.5? (All liberal arts colleges are private.) c) To save money, you are willing to switch from a private university to a public university, which has a ranking of 0.5 less and 10,000 more students. What is the effect on your cost? Is it substantial? d) Eliminating the Size and Dlibart variables from your regression, the estimation regression becomes Predicted cost = 5,450.35 + 3,538.84 × Reputation + 10,935.70 × Dpriv – 2,783.31 × Dreligion; R^2= 0.72, SER = 3,792.68 Why do you think that the effect of attending a private institution has increased now? e) What can you say about causation in the above relationship? Is it possible that Cost affects Reputation rather than the other way around?
(a) An increase in reputation by one category increases the cost by roughly $3,985. The larger the size of the college/university, the lower the cost: an increase of 10,000 students results in a $2,000 lower cost. Private schools charge roughly $8,406 more than public schools. A school with a religious affiliation is approximately $2,376 cheaper, presumably due to subsidies, and a liberal arts college also charges roughly $416 less. There are no observations close to the origin, so there is no direct interpretation of the intercept. Other than perhaps the coefficient on liberal arts colleges, all coefficients have the expected sign. (b) $32,935. (c) Roughly $12,400. Over the four years of education this implies approximately $50,000, a substantial amount of money for the average household. (d) Private institutions are smaller, on average, and some of them are liberal arts colleges. Both of these variables had negative coefficients. (e) It is very possible that the university president and chief academic officer are influenced by the cost variable in answering the U.S. News and World Report survey. If this were the case, then the above equation would suffer from simultaneous causality bias, a topic covered in a later chapter, and this would pose a serious threat to the internal validity of the study.
63
New Assumptions LSA 4-5 for Homoskedasticity
Applied in fewer cases; if met, the math simplifies and stronger results can be proved.
* LSA #4: u is homoskedastic → the OLS estimator has the lowest variance ("best") among linear estimators
* LSA #5: u is distributed normally, N(0, σ²), and the OLS estimator remains consistent. Under the homoskedastic normal regression assumptions, the t-stat has a Student t distribution with d.f. = n − 2 (relevant if n < 50; otherwise it is approximately normal), and β̂0 and β̂1 are normally distributed for all n
64
Gauss-Markov Theorem BLUE Assumptions
Under LSA 1–4, β̂1 has the smallest variance among all linear unbiased estimators, so OLS is BLUE: the best (lowest-variance) linear unbiased estimator. The result applies specifically to linear estimators, not all estimators, and requires no normality assumption.
65
BUE Assumptions
Under LSA 1–5, OLS is BUE: the best unbiased (+ consistent) estimator among all estimators, not just linear ones, provided the errors are homoskedastic and normally distributed.
66
3 OLS Downsides
1. The Gauss-Markov Theorem isn't compelling: the homoskedasticity condition often doesn't hold, and the result covers only linear estimators, a small subset of estimators, so it is not plausible in many applications.
2. OLS is more sensitive to outliers than other estimators; the median is preferred over the mean when large outliers exist, as it has smaller variance, and other estimators can be more efficient (LAD: least absolute deviations).
3. In most econometric applications there is no good reason to assume u is homoskedastic and normal, and most datasets have n > 50, so we can rely on the CLT for hypothesis tests and confidence intervals.
67
Simple linear model - Individual Hypothesis Testing
Use heteroskedasticity-robust standard errors in R (coeftest() with vcovHC): the output reports the t-statistic and p-value for each individual coefficient's hypothesis test.
68
Joint Hypothesis Testing in Mutliple regression
Evaluates whether multiple coefficients in a regression model are simultaneously equal to specific values (e.g. all 0 = not causal); determines whether a group of variables has a combined significant impact on the dependent variable.
**F-Test**: compares the fit of the unrestricted model (no constraints) to the restricted model (constraints imposed): how much worse the model fits (the variance of the errors) when the restriction is imposed versus when it is not; it also corrects for correlation between the t-statistics.
q: the number of restrictions being tested (ex. Cobb-Douglas constant returns to scale: q = 1 for α + β = 1)
RSSr: residual sum of squares of the restricted model
RSSur: residual sum of squares of the unrestricted model
n: sample size
k: number of estimated parameters in the unrestricted model
69
F-Test Process With Large Sample + High/Low Value
* **Large-sample distribution of the F-stat**: the distribution of the average of q independently distributed squared standard normal random variables
* **Chi-squared distribution χ²q**: q degrees of freedom; the distribution of the sum of q independent squared standard normal random variables
* F is distributed as χ²q/q
* P-value from the F-stat: the tail probability of the χ²q/q distribution beyond the F-stat actually computed; if p < 0.05 reject; the smaller the better
* Null hypothesis of the F-test: the groups are similar (the restrictions hold)
* High F-value: the restriction significantly worsens the model's fit; reject the null hypothesis, indicating at least one of the tested coefficients is significant, so we succeed in supporting the alternative hypothesis. The variance between groups is much larger than the variance within groups, suggesting the groups are significantly different; this occurs when t1 and t2 are large.
* **Low F-value**: the restrictions don't significantly increase the RSS; we fail to reject the null and fail to support the alternative hypothesis. The between-group variance is similar to the within-group variance, so there is no strong evidence that the group means differ.
70
R Coding for Individual and Joint Hypothesis Testing
Individual hypothesis testing (robust SEs):
library(estimatr)
regression1 <- lm_robust(testscr ~ str + el_pct, data = caschool, se_type = "HC1")
summary(regression1)
# Stata: reg testscr str pctel, robust  (regression with robust standard errors)

Joint hypothesis testing (F-test, via car::linearHypothesis):
library(car)
regression2 <- lm(testscr ~ str + expn_stu + el_pct, data = caschool)
linearHypothesis(regression2, c("str = 0", "expn_stu = 0"), white.adjust = "hc1")
# Stata: reg testscr str expn_stu pctel, r  (regression with robust standard errors)
71
Cobb-Douglas production function Y = A·K^α·L^β, ln Y = ln A + α ln K + β ln L + u; test for constant returns to scale α + β = 1 (q = 1), H0: α + β = 1 against the alternative H1: α + β ≠ 1.
1. Estimate the unrestricted model (without imposing α + β = 1) and obtain its residual sum of squares RSSur
2. Estimate the restricted model (forcing α + β = 1) and obtain its residual sum of squares RSSr
3. Compute the F-statistic: F = [(RSSr − RSSur)/q] / [RSSur/(n − k)]
4. Compare the computed F-stat to the critical value from the F-distribution, or use the p-value approach, to determine significance
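A sketch of the same test using car::linearHypothesis(), assuming a data frame prod with output Y, capital K, and labour L (hypothetical names):

library(car)
cd <- lm(log(Y) ~ log(K) + log(L), data = prod)   # unrestricted model
linearHypothesis(cd, "log(K) + log(L) = 1")       # F-test of alpha + beta = 1 (q = 1)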
72
Testing single restrictions on multiple coefficients: Yi = β0 + β1X1i + β2X2i + ui, null H0: β1 = β2, alternative H1: β1 ≠ β2. This is not a joint hypothesis with multiple restrictions; it is a single restriction on multiple coefficients.
Rearrange ("transform") the regression so that the restriction becomes a restriction on a single coefficient in an equivalent regression, or
Perform the test directly: some software lets you test the restriction directly.
73
Model Specification + Process
**Model Specification**: how to decide what variables to include in a regression
1. Identify the variable of interest
2. Think of omitted causal effects that could result in omitted variable bias
3. Include control variables to represent the omitted causal effects, or variables correlated with them: effective if the conditional mean independence assumption plausibly holds, i.e. u is uncorrelated with the variable of interest once the control variables are included (no systematic bias)
4. Specify a range of plausible alternative models, including additional candidate variables
5. Estimate the base model and the plausible alternative specifications as sensitivity/robustness checks:
- Do the control variable(s) change the coefficient of interest?
- Are the control variables statistically significant?
- The goal is not just maximizing R²; the real objective is an unbiased estimator of the effect of the variable of interest. How well the regressors explain the variation in Y doesn't mean you have eliminated omitted variable bias, obtained an unbiased estimator of a causal effect, or found statistically significant variables in hypothesis tests.
74
5 Multicollinearity Consequences
1. Unbiased estimates remain: severe multicollinearity has no impact on bias; estimates will be unbiased as long as the BLUE assumptions hold
2. Increased variances and standard errors = lower efficiency: multicollinearity makes it harder to precisely separate the individual effects of correlated explanatory variables, and the variance of the estimated coefficients can be large
3. Lower computed t-scores + wider confidence intervals: t_k = (β̂_k − β_H0)/SE(β̂_k), so as standard errors increase the t-scores decrease and the confidence intervals β̂_k ± 1.96·SE(β̂_k) widen
4. Estimates sensitive to changes in specification: with multicollinearity, adding/dropping variables and/or observations can cause substantial fluctuations in the estimates, as OLS struggles to isolate the independent effects of correlated variables, producing unstable coefficient estimates
5. Overall fit & the coefficients of non-multicollinear variables largely unaffected: the adjusted R² won't decrease significantly, if at all, and a high F-statistic rejects the null hypothesis even when the individual t-tests show no significance
75
How to Detect Multicollinearity
* **Key Indicator**: Combination of high adjusted R2 and no statistically significant individual variables, Perfect Multicollinearity in R: Dropped variables (N/A) * **Simple Correlation Coefficients**: measures strength+direction of linear relation b/w independent variables (-1/+1), useful when organized in a table, **Limitation**: groups of variables can cause multicollinearity even if no single simple correlation coefficient is particularly high due to combined effects ← VIF better * **Variance Inflation Factors (VIF):** identifies how much variance of a regression coefficient is inflated due to multicollinearity, measuring the extent to which a given explanatory variable can be explained by all other explanatory variables in an equation, a regression/VIF for each K independent variable is calculated
76
How to Calculate VIF & Tolerance
**Tolerance** (the reciprocal): 1/VIF; values below 0.1 indicate severe multicollinearity.
1. Run an OLS regression that has X_k as a function of all the other explanatory variables in the equation (# of explanatory variables = # of auxiliary regressions/VIFs)
2. Calculate the variance inflation factor for the coefficient of interest: VIF_k = 1/(1 − R²_k), where R²_k is the R² of that auxiliary regression
3. Interpretation: a higher VIF means more multicollinearity
See doc for equations.
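A minimal R sketch using car::vif(), with the caschool regressors as stand-ins:

library(car)
reg <- lm(testscr ~ str + el_pct + expn_stu, data = caschool)
vif(reg)   # values well above roughly 5-10 suggest problematic multicollinearity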
77
When is Multicollinearity a concern? Should it be a concern?
Multicollinearity doesn't exist in every equation; it depends on the relation b/w the independent variables.
Important question: how severe it is, not merely whether it is present, which depends on the dataset and sample selection.
Testing for the presence of multicollinearity: there are no universally accepted statistical tests that definitively confirm or rule out multicollinearity.
* **Concern**: a higher VIF = stronger multicollinearity; values above 5 or 10 warrant investigation: the variable is highly predictable from the other independent variables, with potential distortion of the regression.
Factors: a higher R² in the auxiliary regression → a high VIF, which can indicate strong multicollinearity and reduce the reliability of the estimates.
78
3 Multicollinearity Remedies
1. **Do nothing**: the estimates remain unbiased and may still be significant and meet theoretical expectations; removing variables that belong in the model without justification and rerunning the regression multiple times can cause specification bias that fits the specific sample rather than being generalizable/functionally correct
2. **Drop redundant variable(s)**: variables that measure the same concept / are near-identical; this corrects a specification error, but only with strong theoretical/literature justification, not just statistical reasoning or arbitrarily (misspecification). With severe multicollinearity, dropping one of the highly correlated variables yields similar results
3. **Increase sample size (often impractical)**: reduces the variance of the coefficients, diminishing the impact of multicollinearity; more variation in the explanatory variables allows their individual effects to be distinguished more easily. Multicollinearity is more problematic in small samples, where correlations b/w independent variables become exaggerated
79
How to Detect/Identify Heteroskedasticity
1. Obvious specification errors: an incorrect functional form or omitted variables can mimic heteroskedasticity
2. Early warning signs: high residual variance in scatterplots
3. Graphs of the residuals: patterns that fan out or contract indicate non-constant variance
4. Formal tests: Breusch-Pagan, White
80
Consequences of Heteroskedasticity
1. Inefficient = NOT BLUE: OLS fails to produce minimum-variance estimates
2. Biased standard errors: lead to unreliable, incorrect statistical inference, hypothesis tests & confidence intervals
81
Remedies of Heteroskedasticity
1. If detected, check for model specification errors (the model is incorrectly formulated theoretically): omitted variables, incorrect functional form. If there are none, the heteroskedasticity is likely pure in nature
2. Corrected standard errors ("HC1"): adjust for heteroskedasticity, giving reliable, accurate inference; they remain biased in small samples but are more accurate than uncorrected SEs in large samples, and can be used in hypothesis tests, t-tests, and F-tests for valid statistical inference even when heteroskedasticity is present. Why use them under both homo- and heteroskedasticity: if the data are homoskedastic, the robust formula collapses to the homoskedastic one. Homoskedastic: Var(β̂1) = σ²u / Σ(Xi − X̄)²; robust: Var(β̂1) = Σ(Xi − X̄)²ûi² / [Σ(Xi − X̄)²]²
3. Redefining variables / rethinking the model based on a theoretical framework (ex. a linear to log-log specification to stabilize the variance). Issue: changes the functional form, alters the model's interpretation, and can misrepresent relationships. Aggregate models: the dependent variable makes different error-term variances more likely. Log / per-capita forms: large and small explanatory-variable values get equal weight because the dependent variable no longer varies over a wide range of sizes
82
BP & White Test in Doc
See doc
83
Misspecification + Solution
* Mis-specified: the functional form is wrong, causing a biased estimate on average
* Solution: estimate a regression function that is nonlinear in X (polynomials, logs, interactions), which is still estimated by OLS; only a model that is nonlinear in the parameters cannot be estimated by OLS
84
Nonlinear Models + Expected Value + Assumptions
**Nonlinear relation** b/w X→Y: the effect on Y of a change in X depends on the value of X; the marginal effect (1st derivative) of X is not constant.
**Expected change** in Y with a change in X1, holding all other Xi constant: ΔY = f(X1 + ΔX1, X2, ..., Xk) − f(X1, X2, ..., Xk); for predicted values, ΔŶ = f̂(X1 + ΔX1, X2, ..., Xk) − f̂(X1, X2, ..., Xk)
**Assumptions LSA 1–4:**
1. E(ui|X1i, ..., Xki) = 0, so f is the conditional expectation of Y given the X's
2. (X1i, ..., Xki, Yi) are i.i.d.
3. Big outliers are rare (finite fourth moments)
4. No perfect multicollinearity
85
Polynomial Regression
The population regression f(X) is approximated by a quadratic, cubic, or higher-degree polynomial: Yi = β0 + β1Xi + β2Xi² + ... + βrXi^r + ui
**Interpretation**:
1. dY/dX gives the marginal effect of X→Y from the estimated equation; don't extrapolate outside the range of the data
2. Plot the predicted values
Estimation and hypothesis testing proceed the same way as in the multiple regression model, using OLS under LSA 1–4.
**Code:**
caschool <- caschool %>% mutate(avginc2 = avginc * avginc)
reg_quad <- lm(testscr ~ avginc + avginc2, data = caschool)
coeftest(reg_quad, vcov = vcovHC(reg_quad, type = "HC1"))
caschool <- caschool %>% mutate(avginc3 = avginc * avginc2)
reg_cubic <- lm(testscr ~ avginc + avginc2 + avginc3, data = caschool)
coeftest(reg_cubic, vcov = vcovHC(reg_cubic, type = "HC1"))
86
Logarithmic Regression + Why Use it + Cases + Interpretation
**Why use it**
1. Logarithms convert non-linear relationships into linear ones: stock prices often exhibit exponential growth over time; taking logs makes the relation linear & easier to estimate in a regression model
2. Log differences approximate percentage changes: analysts care about returns, not raw price levels
3. Reduces skewness & makes data more normally distributed: stock data are often right-skewed; logs compress large values, moving the data closer to normal
4. Handles volatility more effectively: stabilizes the variance
5. Interpretation becomes easier: the coefficient can represent an elasticity, e.g. for log(price) or log(GDP)
6. Log differences avoid unit dependence: unit free, making models comparable across different financial markets
**Cases**
- **Linear-log**: a 1% increase in X is associated with a 0.01·β1 = β1/100 change in Y. The units of the error and the SER are the same as the units of Y.
- **Log-linear**: a one-unit change in X is associated with a 100·β1 % change in Y. The units of the error and the SER are fractional deviations.
- **Log-log**: a 1% change in X is associated with a β1 % change in Y (β1 is an elasticity).
87
Logarithmic Code + Mathematical interpretation of coefficient B1
See doc
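Since the answer points to the course doc, here is a hedged sketch of the three log specifications in R, using the caschool variables as stand-ins:

library(lmtest); library(sandwich)
lin_log <- lm(testscr ~ log(avginc), data = caschool)        # linear-log
log_lin <- lm(log(testscr) ~ avginc, data = caschool)        # log-linear
log_log <- lm(log(testscr) ~ log(avginc), data = caschool)   # log-log: slope is an elasticity
coeftest(log_log, vcov = vcovHC(log_log, type = "HC1"))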
88
Interaction b/w Independent Variables
**Interaction terms**: the product of 2+ variables in a model (e.g. Gender × Degree); a simple way to describe and study how combinations of explanatory variables affect the model, without running separate regressions when 2+ groups differ in how a continuous variable matters.
Case where ∂Y/∂X1 might depend on X2: the effect of education on income is not the same for men and women.
Income = β0 + β1·Gender + β2·Education + β3·(Gender × Education) + u
β3 > 0: education increases income more for men than for women
β3 < 0: education increases income more for women than for men
The interaction term captures how the effect of education on income differs by gender.
89
Interaction Variable Cases + Code
- **Binary × binary**
- **Continuous × binary**
- **Continuous × continuous**
See doc; a hedged R sketch follows below.
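A brief sketch of the three cases in R; the variable names (wage, female, union, educ, exper) are illustrative, not from the course data, and `a * b` expands to `a + b + a:b`:

lm(wage ~ female * union, data = df)   # binary x binary
lm(wage ~ educ * female, data = df)    # continuous x binary: the educ slope differs by gender
lm(wage ~ educ * exper, data = df)     # continuous x continuous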
90
Linear Probability Model (OLS) + Advantages/Disadvantages
**Linear Probability Model (OLS)**: a linear regression model where Y is binary (0 or 1); the predicted value Ŷ is the probability that Y = 1, and β1 is the change in that predicted probability for a unit change in X (a linear function of X).
Yi = β0 + β1Xi + ui; E(Y|X) = 1·P(Y=1|X) + 0·P(Y=0|X) = P(Y=1|X) = β0 + β1X under the LSA assumptions; β1 = [P(Y=1|X = x + Δx) − P(Y=1|X = x)] / Δx
**Advantages**: simple to estimate and interpret; inference is the same as for multiple regression (need heteroskedasticity-robust SEs). Real-life examples: medical diagnostics, denials of claims.
**Disadvantages**: it says the change in the predicted probability for a given change in X is the same for all values of X (while nonlinear relations and interactions exist), and the predicted probability can be less than 0 or greater than 1. Solution: nonlinear probability models.
91
Probit Regression + What Needs does it fulfill?
**Needed**: 1. P(Y=1|X) increasing in X for β1 > 0; 2. 0 ≤ P(Y=1|X) ≤ 1 for all X
**Probit regression**: models the probability that Y = 1 using the cumulative standard normal distribution function (requires a z-table): P(Y=1|X) = Φ(β0 + β1X); the z-value is β0 + β1X, and β1 is the change in the z-value for a unit change in X.
Why the cumulative normal distribution is used: its S-shape fulfills both requirements, it is easy to use, and it has a straightforward interpretation.
**Log likelihood in probit regression**: tells how well the model's predicted probabilities match the actual outcomes; a more positive (less negative) value indicates a better model; it will always be negative.
92
Logit Regression
**Logit regression:** models the probability of Y = 1 given X as the cumulative standard logistic distribution function, evaluated at z = β0 + β1X
- P(Y=1|X) = F(β0 + β1X) = 1 / (1 + e^−(β0 + β1X))
- **Why use it**: computationally faster & easier
- **Interpretation**: holding all other factors constant except the variable of interest, compare the predicted probabilities at two values of that variable to get the marginal effect.
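A minimal sketch of both nonlinear probability models in R via glm(), assuming a binary outcome deny and a regressor pirat (placeholder names borrowed from the usual mortgage-denial example):

probit <- glm(deny ~ pirat, family = binomial(link = "probit"), data = hmda)
logit  <- glm(deny ~ pirat, family = binomial(link = "logit"),  data = hmda)
summary(probit)   # coefficients are effects on the z-value, not on probabilities
predict(probit, newdata = data.frame(pirat = 0.3), type = "response")   # fitted P(Y = 1 | X)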
93
Maximum Likelihood Estimator (MLE)
Maximum Likelihood Estimator (MLE): for the coefficients of the probit model, the values (β̂0, β̂1) that maximize the likelihood function, i.e. the values that best describe the full distribution of the data
* Likelihood function: the conditional density of Y1, ..., Yn given X1, ..., Xn, treated as a function of the unknown parameters β0 and β1
* Large samples: the MLE is consistent, normally distributed, and efficient (smallest variance of all consistent estimators)
Probit and logit are estimated via maximum likelihood
* The coefficients are normally distributed for large n
* Large-n hypothesis testing and confidence intervals proceed as usual
94
Tutorial 8
See doc
95
List All LSA# & Importance of Each
LSA #1: E(u|X) = 0 (conditional mean zero): X is uncorrelated with the other determinants of Y, so β̂1 is an unbiased estimator of the causal effect (holds by design in a randomized controlled experiment).
LSA #2: (Xi, Yi) are i.i.d. (simple random sampling): lets the CLT deliver the normal sampling distribution of β̂0 & β̂1.
LSA #3: large outliers are rare (finite fourth moments of X and Y): otherwise a few extreme observations can dominate and distort the estimates.
LSA #4 (multiple regression): no perfect multicollinearity: otherwise the individual coefficients cannot be separately estimated.
Extra assumptions (the LSA #4–5 of the homoskedasticity cards): u homoskedastic → OLS is BLUE; u ~ N(0, σ²) → exact Student t distributions in small samples.