Week 1 and Week 2 Flashcards
What is probability theory good for?
Learn about a large group:
large group = population
> Too large to look at everyone
• Look at a small subgroup (a sample)
> describe the sample = descriptive statistics
• Infer properties of a population from a sample
> Inferential statistics
How to choose a smart sample?
Choose a random sample
How do we denote the sample set and the realized sample?
Sample set: Ω
Realized sample: ω_0
Denote the expectation of X
E[X]
How do we calculate variance? And why?
Var(X) = E[(X − E[X])^2]; it measures how accurately X(ω_0) is predicted by E[X].
If we calculate Var(X) and it’s LARGE, is E[X] a good prediction of X(ω_0)?
No, E[X] is not a good prediction of X(ω_0); it is a good prediction only if the variance is small.
How do we calculate standard deviation?
sd(X) = sqrt(Var(X))
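A minimal numpy sketch (the sample values are made up for illustration) of Var(X) = E[(X − E[X])^2] and sd(X) = sqrt(Var(X)), with the expectations estimated by sample averages:

```python
import numpy as np

# Hypothetical realizations of the random variable X
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

e_x = x.mean()                   # E[X], estimated by the sample mean
var_x = np.mean((x - e_x) ** 2)  # Var(X) = E[(X - E[X])^2]
sd_x = np.sqrt(var_x)            # sd(X) = sqrt(Var(X))

print(e_x, var_x, sd_x)          # 5.0 4.0 2.0
```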
Optimality of E[X]
E[X] = arg min_a E[(X − a)^2]
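A short sketch of the standard argument behind this, decomposing the squared error around E[X]:

```latex
\begin{aligned}
E[(X-a)^2] &= E\big[(X - E[X] + E[X] - a)^2\big] \\
           &= E\big[(X - E[X])^2\big] + 2\,(E[X]-a)\,\underbrace{E[X - E[X]]}_{=0} + (E[X]-a)^2 \\
           &= \operatorname{Var}(X) + (E[X]-a)^2 \;\ge\; \operatorname{Var}(X),
\end{aligned}
```

with equality exactly when a = E[X], so a = E[X] minimizes E[(X − a)^2].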
What does it mean if:
Cov(Y_1,Y_2)<0
Cov(Y_1,Y_2)>0
Cov(Y_1,Y_2)=0
Cov(Y_1,Y_2)<0 = negative relationship, negatively correlated
Cov(Y_1,Y_2)>0= positive relationship, positively correlated
Cov(Y_1,Y_2)=0 = uncorrelated
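A small numpy sketch (made-up data) illustrating the sign of the covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
noise = rng.normal(size=10_000)

y_pos = 2 * x + noise   # moves with x     -> Cov > 0 (positively correlated)
y_neg = -2 * x + noise  # moves against x  -> Cov < 0 (negatively correlated)
y_unc = noise           # unrelated to x   -> Cov ~ 0 (uncorrelated)

print(np.cov(x, y_pos)[0, 1])  # ~ 2
print(np.cov(x, y_neg)[0, 1])  # ~ -2
print(np.cov(x, y_unc)[0, 1])  # ~ 0
```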
Does a random sample give a representative sample?
Yes: on average, a random sample is representative of the population.
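A quick simulation sketch (hypothetical population, illustrative only): the means of repeated random samples scatter around the population mean, so random sampling is representative on average.

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=3.0, size=100_000)  # hypothetical population

# Draw many random samples and compare their means to the population mean
sample_means = [rng.choice(population, size=200, replace=False).mean()
                for _ in range(1_000)]

print(population.mean())      # population mean, ~3
print(np.mean(sample_means))  # average of the sample means, also ~3
```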
What do k and K denote?
k = observed X’s
K = all of the X’s
How is “we cannot predict the value of U by observing the regressors” denoted?
E[U | X_1, …, X_k] = 0
Assumption OLS-2 (exogeneity)
The linear regression model satisfies: E[U | X_1, …, X_k] = 0
The regressors are exogenous.
If E[U | X_1, …, X_k] = 0 holds, what does that say about the covariance?
Cov(U, X_j) = 0 for every j
- each regressor is uncorrelated with the unobserved component
- if we find a j with Cov(U, X_j) ≠ 0, exogeneity fails
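A sketch of the usual reasoning step (law of iterated expectations) for why E[U | X_1, …, X_k] = 0 implies Cov(U, X_j) = 0:

```latex
E[U] = E\big[\,E[U \mid X_1,\dots,X_k]\,\big] = 0, \qquad
E[U X_j] = E\big[\,X_j \, E[U \mid X_1,\dots,X_k]\,\big] = 0,
```

so Cov(U, X_j) = E[U X_j] − E[U] E[X_j] = 0 for each j.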
Assumption OLS-3
(Full rank, informal statement): The best linear prediction of Y is unique.
This assumption is often called the “no perfect collinearity assumption”
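A minimal numpy sketch (made-up data) of what perfect collinearity does: when one regressor is an exact linear function of another, the design matrix loses full column rank, X'X becomes singular, and the best linear prediction coefficients are no longer unique.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 3 * x1 + 1  # x2 is an exact linear function of x1 -> perfect collinearity

X = np.column_stack([np.ones(n), x1, x2])  # design matrix with intercept
print(np.linalg.matrix_rank(X))            # 2, not 3: X lacks full column rank
print(np.linalg.det(X.T @ X))              # ~0: X'X is (numerically) singular
```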