Week 11: Core Skills In Regression Flashcards

1
Q

What is regression known as?

A

Conditional Expectation Function = E(Y|X)

2
Q

What is a conditional expectation function?

A

It tells us the expected (predicted) value of Y for some set of X variables.

3
Q

Which variables do we include when using regression as a predictor?

A

All the variables, regardless of their statistical significance

4
Q

What is another main use of regression?

A

To find marginal effects

5
Q

Describe a marginal effect.

A

The impact of a one-unit change in X on E(Y|X).

6
Q

What is the marginal effect in linear regression?

A

The coefficient on that variable: β_j is the change in E(Y|X) for a one-unit change in X_j, holding the other variables constant.
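
As a short worked derivation, assuming a linear model with two regressors (the symbols X_1 and X_2 are illustrative and do not come from the cards), differentiating the conditional expectation with respect to one regressor leaves only its coefficient:

```latex
E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2
\quad\Longrightarrow\quad
\frac{\partial E(Y \mid X)}{\partial X_1} = \beta_1
```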

7
Q

What is differentiation used for in statistics?

A
  • Compute marginal effects from regressions
  • Find the minimum or maximum point of mathematical functions
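
Both uses can be illustrated with one hedged example, assuming a regression with a squared term (this specification is an illustration, not something stated on the card). The derivative gives a marginal effect that depends on X, and setting it to zero locates the turning point:

```latex
E(Y \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2
\quad\Longrightarrow\quad
\frac{\partial E(Y \mid X)}{\partial X} = \beta_1 + 2\beta_2 X,
\qquad
\beta_1 + 2\beta_2 X^{*} = 0
\;\Longrightarrow\;
X^{*} = -\frac{\beta_1}{2\beta_2}
```

The turning point X* is a maximum if β_2 < 0 and a minimum if β_2 > 0.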
8
Q

Define ‘estimand’.

A

The unknown parameter(s) that we aim to estimate [e.g. E(Y)]

9
Q

Define ‘estimator’.

A

Functions of sample data that we use to learn about the estimands.
[e.g. the sample mean estimator, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$]

10
Q

Define ‘estimate’.

A

Particular values of estimators that are realised in a given sample dataset.
[e.g. the mean computed from one particular sample, µ̂]
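
The distinction between estimand, estimator, and estimate can be sketched in R. This is a minimal illustration under assumed values: the population mean of 10, the standard deviation of 2, the sample size of 50, and the name `mean_estimator` are all hypothetical.

```r
set.seed(42)

# Estimand: the unknown E(Y). It is known here only because we simulate.
true_mean <- 10

# Estimator: a function of sample data (here, the sample mean).
mean_estimator <- function(y) sum(y) / length(y)

# One realised sample of n = 50 observations.
y <- rnorm(50, mean = true_mean, sd = 2)

# Estimate: the value the estimator takes in this particular sample.
estimate <- mean_estimator(y)
estimate
```

A different sample would give a different estimate from the same estimator, while the estimand stays fixed.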

11
Q

What are the estimands in regression?

A

The βs, the true population coefficients.

12
Q

What are the estimators in regression?

A

The OLS estimator: the formula that turns the sample data into coefficient values.

13
Q

What are the estimates in regression?

A

The β̂s, the estimated coefficients from our regression.

14
Q

Why is there uncertainty in statistics?

A

Because of sampling, we observe only one of the many possible samples from the full population, and so only one of many possible estimates. Our sample mean or regression coefficient is therefore an imprecise estimate of the true population estimand.

15
Q

Why is the sampling distribution of the estimator important?

A

The sampling distribution of the estimator shows the probability of different estimates over repeated samples.

16
Q

List the 4 sampling distribution facts.

A
  • Its mean is the true β (when the estimator is unbiased)
  • It is approximately normally distributed in large samples (Central Limit Theorem)
  • Its variance can be estimated from the sample variance
  • Its standard deviation is the standard error
17
Q

How do we create a sampling distribution in R?

A

By using simulation
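
A minimal sketch of such a simulation follows. The true slope of 2, the sample size, the number of repetitions, and the helper name `one_estimate` are assumptions chosen for illustration. Each repetition draws a new sample, fits OLS, and stores the slope.

```r
set.seed(1)

true_beta <- 2      # assumed true slope (the estimand)
n_obs     <- 100    # size of each sample
n_reps    <- 1000   # number of repeated samples

# Draw one sample, fit OLS, and return the estimated slope.
one_estimate <- function() {
  x <- rnorm(n_obs)
  y <- 1 + true_beta * x + rnorm(n_obs)
  coef(lm(y ~ x))[["x"]]
}

# The collection of estimates approximates the sampling distribution.
beta_hats <- replicate(n_reps, one_estimate())

mean(beta_hats)  # should be close to true_beta
sd(beta_hats)    # approximates the standard error of the slope
```

Calling `hist(beta_hats)` on the result should show a roughly bell-shaped distribution centred near the true slope.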

18
Q

How do we approximate the sampling distribution?

A

By calculating the standard error, which estimates the standard deviation of the sampling distribution.

19
Q

Define the sample variance.

A

An unbiased estimator of the variance of the true sampling distribution of any β̂_j from a multiple regression.

20
Q

When do we say β is statistically significant?

A

If an estimate at least this extreme would occur only 5% of the time or less under the null hypothesis, i.e. when |t| > 1.96.

21
Q

What does the standard error say?

A

The bigger it is, the more uncertainty we have about the true value of β.

22
Q

When do we reject the null hypothesis?

A

When α = 0.05, reject the null hypothesis when |t| > 1.96, i.e. when p < 0.05.
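
The decision rule can be sketched in R as follows. The data are simulated with an assumed true slope of 0.5, and the p-value uses the normal approximation that underlies the 1.96 cutoff; `summary(fit)` uses the t distribution instead, so its p-value will differ slightly.

```r
set.seed(2)

# Hypothetical data with a true slope of 0.5.
x <- rnorm(200)
y <- 1 + 0.5 * x + rnorm(200)
fit <- lm(y ~ x)

beta_hat <- coef(fit)[["x"]]
se_hat   <- sqrt(diag(vcov(fit)))[["x"]]

# t-statistic for the null hypothesis that the slope is 0.
t_stat <- beta_hat / se_hat

# Two-sided p-value from the standard normal distribution.
p_value <- 2 * pnorm(-abs(t_stat))

# Reject at the 5% level when |t| exceeds 1.96.
reject_null <- abs(t_stat) > 1.96
```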

23
Q

When do we fail to reject (rather than "accept") the null hypothesis?

A

If the 95% confidence interval for β contains 0, we cannot reject the null hypothesis that β = 0 at the 5% level.
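
A hedged sketch of the interval check, using simulated data in which x is assumed to have no true effect on y. The interval is built as the estimate plus or minus 1.96 standard errors, which is the large-sample approximation; `confint(fit)` would use the t distribution instead.

```r
set.seed(3)

# Hypothetical data in which x has no true effect on y.
x <- rnorm(200)
y <- 1 + rnorm(200)
fit <- lm(y ~ x)

beta_hat <- coef(fit)[["x"]]
se_hat   <- sqrt(diag(vcov(fit)))[["x"]]

# Approximate 95% confidence interval: estimate +/- 1.96 standard errors.
ci <- c(lower = beta_hat - 1.96 * se_hat,
        upper = beta_hat + 1.96 * se_hat)

# TRUE means the interval covers 0, so we cannot reject the null of no effect.
contains_zero <- ci[["lower"]] <= 0 && ci[["upper"]] >= 0
```

Checking whether the interval covers 0 is the same test as checking whether |t| is at most 1.96.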

24
Q

What is the Pseudo-Bayesian Approach?

A

An alternative approach that simulates the sampling distribution directly and uses the simulated draws to quantify uncertainty, rather than relying on an analytical standard error.

25
Q

How would we take n draws in R?

A

rnorm(n, mean = , sd = ), filling in the mean and standard deviation of the normal distribution to draw from.
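
For example, with illustrative argument values (n = 1000, a mean of 5, and a standard deviation of 2 are assumptions, not values from the card):

```r
set.seed(123)

# Take n = 1000 draws from a normal distribution with mean 5 and sd 2.
draws <- rnorm(1000, mean = 5, sd = 2)

length(draws)  # 1000
mean(draws)    # close to 5
sd(draws)      # close to 2
```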

26
Q

When is simulation most useful?

A

When we want to show uncertainty about a function of the coefficients, such as the predicted outcome for a given set of X variables.

27
Q

Write the 5 steps to using simulation.

A
  1. Estimate a regression model
  2. Create n simulations of the coefficients using the multivariate normal distribution [in R, use the sim() command in the arm package]
  3. For each of the n sets of simulated coefficients, calculate the function required, storing the results
  4. The 95% confidence interval runs from the 2.5th to the 97.5th percentile (the 0.025 and 0.975 quantiles) of the vector from (3) [95% of simulated values are contained within it]
  5. The standard error is the standard deviation of the vector from (3)
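
The five steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the course's own code: the data are simulated, and step 2 draws the coefficients by hand from a multivariate normal (via a Cholesky factor) as a simplified stand-in for the arm package's simulation function, which also accounts for uncertainty in the residual variance.

```r
set.seed(10)

# Step 1: estimate a regression model (hypothetical simulated data).
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Step 2: draw simulated coefficient vectors from a multivariate normal
# centred on the estimates, with the estimated covariance matrix.
n_sims <- 1000
mu     <- coef(fit)
Sigma  <- vcov(fit)
z      <- matrix(rnorm(n_sims * length(mu)), nrow = n_sims)
coefs  <- sweep(z %*% chol(Sigma), 2, mu, "+")   # n_sims rows, 3 columns

# Step 3: compute the function of interest for each simulation,
# here the predicted outcome at x1 = 1, x2 = 0 (intercept first).
values <- c(1, 1, 0)
pred.outcomes <- as.vector(values %*% t(coefs))

# Step 4: 95% interval from the 2.5th and 97.5th percentiles.
ci <- quantile(pred.outcomes, c(0.025, 0.975))

# Step 5: standard error as the standard deviation of the simulations.
se <- sd(pred.outcomes)
```

Because the prediction is computed separately for every simulated coefficient vector, the interval reflects the joint uncertainty in all the coefficients, including their covariances.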
28
Q

What does the command %*% do in R?

A

Carries out matrix multiplication, e.g. (pred.outcomes <- values %*% t(coefs))

29
Q

What is coefs?

A

A matrix with 1000 rows and five columns (each row is a simulation).

30
Q

What is values?

A

A vector of X values used for prediction: one row and five columns.
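
The dimensions in the last three cards can be checked with a small sketch. The numbers inside `coefs` and `values` are placeholders (random draws and arbitrary X values), used only to show how the shapes combine.

```r
set.seed(7)

# Stand-in for simulated coefficients: 1000 simulations of 5 coefficients.
# (Random numbers here; in practice these come from the simulation step.)
coefs <- matrix(rnorm(1000 * 5), nrow = 1000, ncol = 5)

# One row of X values for prediction; the leading 1 is for the intercept.
values <- matrix(c(1, 0.5, 2, 0, 1), nrow = 1)

# (1 x 5) %*% (5 x 1000) gives one predicted outcome per simulation.
pred.outcomes <- values %*% t(coefs)

dim(coefs)          # 1000 5
dim(values)         # 1 5
dim(pred.outcomes)  # 1 1000
```

Transposing `coefs` is what makes the inner dimensions match, so that each simulated coefficient vector is multiplied by the same row of X values.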