Midterm Flashcards

1
Q

Right vs left skewed distribution

A

Right-skewed: mode, median, mean (ordered left to right along the axis; the tail pulls the mean right)
Left-skewed: mean, median, mode (the tail pulls the mean left)

2
Q

What is the variance

A

The arithmetic average of the squared differences of the data values from the mean

3
Q

How to calculate standard error

A

Standard deviation divided by the square root of the sample size
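A minimal Python sketch of the formula (the sample values are invented for illustration):

```python
import math
import statistics

def standard_error(sample):
    # SE = sample standard deviation / sqrt(n)
    return statistics.stdev(sample) / math.sqrt(len(sample))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error(data)  # ≈ 0.756
```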

4
Q

What is standard deviation

A

Describes the spread of values in a continuous distribution - a sample or population
It is used as a descriptive statistic

5
Q

What is a standard error

A

Used to measure how accurately a sample distribution represents a population

6
Q

How do you calculate confidence bounds

A

Mean ± (t value × standard error)
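A sketch putting the pieces together (sample values invented; the t value is assumed for df = 7 at the 95% level):

```python
import math
import statistics

def confidence_bounds(sample, t_value):
    # mean ± t * (s / sqrt(n))
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (m - t_value * se, m + t_value * se)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
lower, upper = confidence_bounds(data, 2.365)  # roughly (3.21, 6.79)
```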

7
Q

How to calculate standard error of the difference

A

Square root of (se1² + se2²)
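As a one-liner (the two standard errors here are invented):

```python
import math

def se_difference(se1, se2):
    # sqrt(se1^2 + se2^2)
    return math.sqrt(se1 ** 2 + se2 ** 2)

se_difference(3.0, 4.0)  # -> 5.0
```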

8
Q

How to calculate a chi square

A

Sum of ((observed frequency − expected frequency) squared / expected frequency): χ² = Σ (O − E)² / E
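A quick sketch of the sum (observed/expected counts invented):

```python
def chi_square(observed, expected):
    # sum over cells of (O - E)^2 / E
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = chi_square([10, 20, 30], [20, 20, 20])  # -> 10.0
```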

9
Q

What does the chi square tell us about the null hypothesis

A

If the chi-square statistic is larger than the critical value for the given degrees of freedom, the null hypothesis can be rejected

10
Q

Explain type I vs type II error

A

Type I error: you reject the null hypothesis when you shouldn’t (false positive)
Type II error: you fail to reject the null hypothesis when you should (false negative)

11
Q

What is the central limit theorem

A

Establishes that the means of repeated large samples are approximately normally distributed even when the underlying distribution of the data is not normal
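A simulation sketch (uniform draws chosen as a deliberately non-normal distribution; sample sizes invented):

```python
import random
import statistics

random.seed(0)
# means of many large samples from a uniform(0, 1) distribution
sample_means = [statistics.mean(random.uniform(0, 1) for _ in range(100))
                for _ in range(2000)]
# the means cluster around 0.5 and are approximately normal,
# with spread close to the theoretical SE (1/sqrt(12))/10 ≈ 0.029
```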

12
Q

What is a confidence interval?

A

A range around the sample estimate of the mean, together with a confidence level (generally 95%) that the interval contains the true population mean

13
Q

Pearson’s correlation coefficient

A

Measures the association between two continuous variables
r is scaled between -1 and 1
r = covariance of x and y divided by the product of their standard deviations
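The formula r = cov(x, y) / (s_x · s_y) can be sketched from scratch (toy vectors invented):

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

pearson_r([1, 2, 3], [2, 4, 6])   # ≈ 1 (perfect positive)
pearson_r([1, 2, 3], [6, 4, 2])   # ≈ -1 (perfect negative)
```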

14
Q

Correlation vs regressions

A

Correlation tells us how strongly associated two variables are
Regression can tell us, on average, how much a one-unit increase in the independent variable changes the predicted value of the dependent variable

15
Q

What does the line of best fit do

A

Minimizes the sum of squared vertical (Y) distances from each observation to the line

16
Q

Why do we use y hat and how does it differ

A

Y hat (ŷ) means we are producing estimated y values: ŷ = a + bXi
For actual values of y we need the error term, so y = a + bXi + ei

17
Q

Standard error of the slope

A

Given by the root mean square error of the regression divided by the spread of x: se(b) = RMSE / √Σ(xi − x̄)²

18
Q

T ratio

A

t = (b − β_H0) / s.e.(b)
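A one-line sketch of the formula (the coefficient, null value, and standard error are invented):

```python
def t_ratio(b, beta_h0, se):
    # t = (estimated coefficient - coefficient under H0) / standard error
    return (b - beta_h0) / se

t = t_ratio(0.8, 0.0, 0.2)  # -> 4.0
```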

19
Q

What are the five OLS assumptions

A

Linearity
Mean independence
Homoscedasticity
Uncorrelated disturbances
Normal disturbances

20
Q

Explain linearity

A

Linearity - the dependent variable is a linear function of the x’s plus a population error term, e.g. y = α + β1x1 + β2x2 + e

Pertains to linearity in the parameters

21
Q

Explain mean independence

A

Zero conditional mean

The mean value of the error does not depend on any of the x’s
Assume that E(e | x) = 0

Most important assumption because violations 1. can generate large bias in the estimates and 2. cannot be tested for without additional data

Violations include:
Omitted variable bias
Endogeneity bias
Measurement error

22
Q

Explain homoscedasticity

A

The variance of the error cannot depend on the x’s
The error variance (standard deviation squared) is constant
You want homoskedasticity
In a test for heteroskedasticity, the p value has to be > 0.05

Non-constant variance (heteroskedasticity)
Biases the standard errors

23
Q

Explain uncorrelated disturbances

A

The value of the error for any observation is uncorrelated with the value of the error for any other observation

Correlated errors can arise from connected observations, causal effects, or serial correlation

They shrink standard errors: observations are assumed to be more independent than they are, creating a Type I error danger

24
Q

Explain normal disturbance

A

The disturbances, e, are distributed normally

Only the disturbances not the variables must be normally distributed
Normality is the least important assumption

25
Q

How are the OLS assumptions related

A

Assumptions 1+2 give unbiased estimators
Assumptions 3+4 make OLS BLUE (best linear unbiased estimator): standard errors are at least as small as those produced by any other method
Assumption 5 implies that a t table or z table can be used to calculate p values

26
Q

What does the dummy variable do

A

Helps with comparison of the means of y for different categories of x

27
Q

Collider bias

A

Occurs when the treatment (independent) variable and the outcome (dependent) variable, or factors causing these, each influence a common third variable, and that variable (the collider) is controlled for by design or analysis

More general form of selection bias

28
Q

Post treatment bias

A

While omitting relevant covariates can lead to omitted variable bias, including covariates that control for your causal mechanism can result in post-treatment bias

29
Q

What is multicollinearity and what are the consequences

A

A situation where two or more explanatory variables in a multiple regression are highly linearly related

Does not bias the coefficient estimates
Inflates the standard errors of highly collinear variables
Induces unstable estimates

30
Q

which OLS assumptions do time series tend to violate

A

Mean independence and the independence of errors

31
Q

What is stationarity

A

A time series is weakly stationary if its mean and variance remain constant over time

32
Q

What is a dummy variable

A

A variable that is coded 1 or 0

33
Q

Regression outlier

A

An observation whose dependent value y is unusually extreme given its independent value x

34
Q

In which direction is β1 biased (omitted variable bias)

A

β2 > 0 and corr(x1, x2) > 0: positive bias

β2 > 0 and corr(x1, x2) < 0: negative bias

β2 < 0 and corr(x1, x2) > 0: negative bias

β2 < 0 and corr(x1, x2) < 0: positive bias

35
Q

Interpret the different log-linear relationships

A

Level-level: y = α + βx; a one-unit change in x leads to a β-unit change in y

Log-linear: log(y) = α + βx; a one-unit change in x leads to a 100·β percent change in y

Linear-log: y = α + β·log(x); a one-percent change in x leads to a β/100 unit change in y

Log-log: log(y) = α + β·log(x); a one-percent change in x leads to a β percent change in y
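For the log-linear case, the 100·β reading is an approximation that works for small coefficients; a small sketch (the coefficient value is invented) compares it with the exact percent change:

```python
import math

b = 0.05                              # assumed coefficient on x in log(y) = a + b*x
approx_pct = 100 * b                  # approximation: 5 percent
exact_pct = 100 * (math.exp(b) - 1)   # exact: ~5.13 percent
```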

36
Q

How do we interpret squared terms in non-linear regression

A

If b2 is negative and b3 (the coefficient on x²) is positive, then y is convex (smiley)
If b2 is positive and b3 (the coefficient on x²) is negative, then y is concave (frowny)

37
Q

What does a time counter do

A

It draws out the trend in time series data

38
Q

What is a unit root

A

How much of y is explained by the previous y
y is almost exactly the same as its previous value
Also known as a random walk

39
Q

Random walk

A

Same value today as yesterday with just a bit of randomness

40
Q

Weakly dependent time series

A

A covariance stationary time series is weakly dependent if the correlation between x_t and x_{t+h} goes to zero sufficiently quickly as h increases

41
Q

What are the two types of panel data

A

True panels - longitudinal data measuring the same units repeatedly over time

pooled cross sections - random surveys in multiple years with a new random sample each time

42
Q

pooled cross sections

A

Advantages
Amenable to OLS with only minor complications
Increased sample size increases the accuracy of estimators and adds statistical power

Pitfalls
Distributions may change in different years
Panel heteroskedasticity

43
Q

Fixed effects/within model

A

Subtract the mean value of each group from each observation in that group
Equivalent to adding a dummy variable for each group
Superpower: yields within estimation, in which only the variation within groups is used for the coefficients
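The demeaning (within) transformation can be sketched as follows (toy values and group labels invented):

```python
from collections import defaultdict

def within_transform(values, groups):
    """Subtract each group's mean from its own observations (demeaning)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [v - means[g] for v, g in zip(values, groups)]

# group "a" has mean 2, group "b" has mean 12
within_transform([1.0, 3.0, 10.0, 14.0], ["a", "a", "b", "b"])
# -> [-1.0, 1.0, -2.0, 2.0]
```

Only the variation around each group's own mean survives, which is exactly the variation the fixed-effects coefficients are estimated from.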

44
Q

What is persistence

A

Persistence in time series refers to the continuity of an effect after the cause is removed
Often related to the notion of memory properties of time series
Has an effect on standard errors and can lead to false positives and negatives

If the effect of an infinitesimally small shock influences future predictions of the time series for a very long time, you have a persistent time series process

45
Q

How do you deal with persistence

A

Use lagged data
Make sure to model the persistence in your data

46
Q

How do you interpret a marginal effects plot

A

The y axis is the marginal effect of x on y (dy/dx)

The x axis is the value of the conditioning variable

47
Q

How do you calculate VIF

A

VIF = 1 / (1 − R²)

Equivalently, VIF = 1 / tolerance, where tolerance = 1 − R²
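As a sketch (the R² value here is invented; in practice it comes from regressing one explanatory variable on the others):

```python
def vif(r_squared):
    # VIF = 1 / tolerance, where tolerance = 1 - R^2
    tolerance = 1 - r_squared
    return 1 / tolerance

vif(0.9)  # ≈ 10: high collinearity inflates the variance tenfold
```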