Midterm Flashcards
Right vs left skewed distribution
Right-skewed: mode < median < mean (order left to right along the axis)
Left-skewed: mean < median < mode
What is the variance
The arithmetic average of the squared differences of the data values from the mean
How to calculate standard error
Standard deviation divided by the square root of the sample size
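The formula above can be sketched in Python with a made-up sample (values purely illustrative):

```python
import math
import statistics

# Made-up sample, purely for illustration
sample = [4.0, 7.0, 6.0, 5.0, 8.0, 6.0]

sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(len(sample))     # standard error of the mean
```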
What is standard deviation
Describes the spread of values in a continuous distribution - a sample or population
It is used as a descriptive statistic
What is a standard error
Is used to measure how accurately a sample statistic (such as the sample mean) represents the population
How do you calculate confidence bounds
Mean ± (t value × standard error)
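A minimal Python sketch of the confidence-bounds formula, with made-up mean, standard error, and t critical value:

```python
# All numbers are illustrative, not from real data
mean = 10.0
t_crit = 2.045           # t critical value for the relevant degrees of freedom
se = 1.2                 # standard error of the mean

lower = mean - t_crit * se
upper = mean + t_crit * se
```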
How to calculate standard error of the difference
Root of (se1 squared plus se2 squared)
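The same formula as a one-line Python sketch, with illustrative standard errors for two group means:

```python
import math

# Illustrative standard errors of two group means (made up)
se1, se2 = 0.8, 0.6
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
```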
How to calculate a chi square
Sum of ((observed frequency minus expected frequency) squared divided by expected frequency)
What does the chi square tell us about the null hypothesis
If the chi square statistic is larger than the critical value for the given degrees of freedom, the null hypothesis can be rejected
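The chi square sum can be sketched in Python with made-up observed and expected frequencies:

```python
# Made-up observed and expected frequencies, purely for illustration
observed = [18, 22, 20]
expected = [20, 20, 20]

# Sum of (O - E)^2 / E over all cells
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```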
Explain type I vs type II error
Type I error is one where you reject the null hypothesis when you shouldn't (false positive)
Type II error is one where you fail to reject the null hypothesis when you should (false negative)
What is the central limit theorem
Establishes that the means of repeated large samples are normally distributed even when the underlying distribution of the data is not normal
What is a confidence interval?
A range of values around the sample mean within which the population mean is believed to lie with a stated level of confidence (generally 95%)
Pearson’s correlation coefficient
Measures the association between two continuous variables
r is scaled between -1 and 1
r = covariance of x and y divided by (standard deviation of x × standard deviation of y)
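The covariance-over-spreads formula can be sketched directly in Python (the helper name `pearson_r` and the data are made up for illustration):

```python
import math

def pearson_r(x, y):
    """r = covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linear made-up data, so r comes out at 1
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```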
Correlation vs regressions
Correlation tells us how strongly associated two variables are
Regression can tell us, on average, how much a one unit increase in the independent variable changes the predicted value of the dependent variable
What does the line of best fit do
Minimizes the sum of squared vertical (Y) distances from each observation to the line
Why do we use y hat and how does it differ
Y hat means we are producing estimated y values
For actual values of y we need the error term, so y = a + bXi + ei
Standard error of the slope
Given by the root mean square error divided by the square root of the sum of squared deviations of x (roughly, RMSE over the standard deviation of x times √n)
T ratio
t= (b - ßH0)/s.e.
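The t ratio as a Python sketch, with a made-up slope estimate, null value, and standard error:

```python
# Illustrative numbers, not from real data
b = 1.8          # estimated slope
beta_h0 = 0.0    # value of the slope under the null hypothesis
se_b = 0.6       # standard error of the slope

t_ratio = (b - beta_h0) / se_b
```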
What are the five OLS assumptions
Linearity
Mean independence
Homoscedasticity
Uncorrelated disturbances
Normal disturbance
Explain linearity
Linearity - the dependent variable is a linear function of the x's plus a population error term, e.g. y = a + ß1x1 + ß2x2 + e
Pertains to linearity in the parameters
Explain mean independence
Zero conditional mean
The mean value of error does not depend on any of the x’s
Assume that E(e|x) = 0
Most important assumption because violations (1) can generate large bias in the estimates and (2) cannot be tested for without additional data. Common violations:
Omitted variable bias
Endogenous bias
Measurement error
Explain homoscedasticity
The variance of the error cannot depend on the x’s
The error variance (standard deviation squared) is constant
You want homoskedasticity
In a test for heteroskedasticity (e.g. Breusch-Pagan), the p value has to be > 0.05 to retain constant variance
Non constant variance
Biases the standard errors
Explain uncorrelated disturbances
The value of the error for any observation is uncorrelated with the value of the error for any other observation
Correlated errors can arise from connected observations, causal effects, or serial correlation
Correlated errors shrink standard errors: observations are assumed to be more independent than they are, creating a danger of Type I error
Explain normal disturbance
The disturbances, e, are distributed normally
Only the disturbances not the variables must be normally distributed
Normality is the least important assumption
How are the OLS assumptions related
Assumptions 1+2 give unbiased estimators
Assumptions 3+4 make OLS BLUE (best linear unbiased estimator): standard errors are at least as small as those produced by any other linear unbiased method
Assumption 5 implies that a t table or z table can be used to calculate p values
What does the dummy variable do
Helps with comparison of the means of y for different categories of x
Collider bias
Occurs when a treatment (independent) variable and outcome (dependent) variable or factors causing these each influence a common third variable and that variable (the collider) is controlled for by design or analysis
More general form of selection bias
Post treatment bias
While omitting relevant covariates can lead to omitted variable bias, including covariates that control for your causal mechanism can result in post-treatment bias
What is multicollinearity and what are the consequences
A situation where two or more explanatory variables in a multiple regression are highly linearly related
Does not bias your coefficient estimates
Inflates the standard errors of highly collinear variables
Induces unstable estimates
which OLS assumptions do time series tend to violate
Mean independence and the independence of errors
What is stationarity
A time series is weakly stationary if its mean and variance remain constant over time
What is a dummy variable
A variable that is coded 1 or 0
Regression outlier
An observation where the dependent value y is unusually extreme given its independent value x
In which direction is ß1 biased when x2 is omitted
ß2>0 and corr(x1,x2)>0 positive
ß2>0 and corr(x1,x2)<0 negative
ß2<0 and corr(x1,x2)>0 negative
ß2<0 and corr(x1,x2)<0 positive
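The first row of the table can be checked with a tiny made-up example (all numbers illustrative): the true model is y = 1·x1 + 2·x2 with ß2 > 0 and corr(x1, x2) > 0, so regressing y on x1 alone should bias the slope upward.

```python
# True model (made up): y = 1*x1 + 2*x2, with beta2 > 0
# and x2 positively correlated with x1 (here x2 = 0.5 * x1)
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [0.5 * a for a in x1]
y = [1.0 * a + 2.0 * b for a, b in zip(x1, x2)]

# Short regression of y on x1 alone: slope = cov(x1, y) / var(x1)
n = len(x1)
m1, my = sum(x1) / n, sum(y) / n
slope = sum((a - m1) * (c - my) for a, c in zip(x1, y)) / sum((a - m1) ** 2 for a in x1)
# slope comes out above the true beta1 = 1.0, i.e. the bias is positive
```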
Interpret the different log-linear relationships
Level level : y=a+ßx one unit change in x leads to a ß unit change in y
Log linear: log(y) = a+ßx one unit change in x leads to a 100*ß percent change in y
Linear log : y=a+ßlog(x) one percent change in x leads to a ß/100 unit change in y
Log log : log(y)=a+ßlog(x) one percent change in x leads to a ß percent change in y
How do we interpret squared terms in non linear regression
For y = b1 + b2x + b3x^2:
If b2 is negative and b3 (the coefficient on x^2) is positive then y is convex (smiley)
If b2 is positive and b3 is negative then y is concave (frowny)
What does a time counter do
It draws out the trend in time series data
What is a unit root
How much of y is explained by its previous value
The y is almost exactly the same as its previous value
Also known as a random walk
Random walk
Same value today as yesterday with just a bit of randomness
Weakly dependent time series
A covariance stationary time series is weakly dependent if the correlation between x_t and x_(t+h) goes to zero sufficiently quickly as h increases
What are the two types of panel data
True panels - longitudinal data measuring the same units repeatedly over time
pooled cross sections - random surveys in multiple years with a new random sample each time
pooled cross sections
Advantages
Are amenable to OLS with only minor complications
Increased sample size increases accuracy of estimators and adds statistical power
Pitfalls
Distributions may change in different years
Panel heteroskedasticity
Fixed effects/within model
Subtract off the mean value of each group from each observation in a group
Equivalent to adding a dummy variable for each group
Superpower: yields within estimation, in which only the variation within groups is used for the coefficients
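The within transformation (subtracting each group's mean) can be sketched with a made-up two-group panel:

```python
# Made-up panel: two groups, three observations each
data = {"A": [3.0, 5.0, 7.0], "B": [10.0, 12.0, 14.0]}

demeaned = {}
for group, values in data.items():
    group_mean = sum(values) / len(values)
    demeaned[group] = [v - group_mean for v in values]
# Group-level differences vanish; only within-group variation remains
```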
What is persistence
Persistence in time series refers to the continuity of an effect after the cause is removed
Often related to the notion of memory properties of time series
Has an effect on standard errors and can lead to false positives and negatives
If an infinitesimally small shock influences future predictions of the time series for a very long time, you have a persistent time series process
How do you deal with persistence
Use lagged data
Make sure to model the trend in your data
How do you interpret a marginal effects plot
The y axis is the marginal effect of x on y dy/ dx
And the x axis is now the value of the conditioning variable
How do you calculate vif
VIF = 1/(1 - r^2)
VIF = 1/tolerance
Tolerance = 1 - r^2
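The VIF formula as a Python sketch, using an illustrative R² from the auxiliary regression of one predictor on the others:

```python
# Illustrative R^2 from regressing one predictor on the others (made up)
r_squared = 0.75
tolerance = 1 - r_squared
vif = 1 / tolerance          # = 1 / (1 - R^2)
```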