3. Testing and Evaluating Linear Models Flashcards

1
Q

What are the three parts of evaluation?

A

Evaluating individual coefficients, evaluating overall model quality, evaluating model assumptions

2
Q

What question would we ask to explore the significance of individual effects?

A

Is our predictor (x) informative about the outcome (y), i.e. is there a relationship between x and y?

3
Q

How do we evaluate individual coefficients?

A

With a hypothesis test: a statistical hypothesis is needed to make the research question testable against the data

4
Q

What are the steps involved in hypothesis testing?

A

Research question

Statistical hypothesis

Calculate estimate of effect of interest

Calculate appropriate t-statistic

Evaluate t-statistic against the null

5
Q

What should a good research question include?

A

Constructs under study
The relationship being tested
A direction of relationship
Target populations etc.

6
Q

What are the different types of hypothesis?

A

Null (H0) = Precise; states a specific value for the effect of interest

Alternative (H1) = Not specific; states that something other than the null value is more likely

H0: β1 = 0
H1: β1 ≠ 0

7
Q

What would a null hypothesis suggest about the relationship between x and y?

A

If x and y are unrelated, a change in x will not result in any change in y, so b1 will be equal to 0

8
Q

What is a p-value?

A

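The p-value is the probability, for a given statistical model, of obtaining a test statistic at least as extreme as the one observed when the null hypothesis is true.

E.g. p < .05 means that, if the null hypothesis were true, a result this extreme would occur less than 5% of the time, so with α = .05 we would reject the null. It is not the probability that the null hypothesis is true.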

9
Q

What is a t-statistic?

A

t is simply the calculated difference between the estimate and the null value, represented in units of standard error.

A test statistic describes how far the observed data are from what the null hypothesis predicts, so the larger its magnitude, the further the estimate is from what the null hypothesis would predict it to be.

t = estimated β / SE of estimated β (when the null value is 0)

(the smaller the SE, the more precise the estimate)

The greater the magnitude of t, the greater the evidence against the null hypothesis.
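
As a rough illustration of the ratio above, the sketch below computes the slope, its standard error, and the t-statistic for a regression with a single predictor. The data and the function name `slope_t_statistic` are made up for this example.

```python
from math import sqrt

def slope_t_statistic(x, y):
    """Return (b1, se_b1, t) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = s_xy / ss_x                      # least-squares slope
    b0 = mean_y - b1 * mean_x             # least-squares intercept
    ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 1 - 1                        # n - k - 1 with k = 1 predictor
    se_b1 = sqrt((ss_residual / df) / ss_x)
    t = b1 / se_b1                        # null value is 0, so t = (b1 - 0) / SE
    return b1, se_b1, t

x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical outcome values
b1, se_b1, t = slope_t_statistic(x, y)
print(round(b1, 3), round(se_b1, 3), round(t, 3))  # prints 0.6 0.283 2.121
```

Here the slope is about 2.1 standard errors away from 0.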

10
Q

How do we actually test the statistical significance of individual coefficients?

A

We select a significance level, α (typically .05)

Then we calculate the p-value associated with our test statistic (here the t-statistic for β)

If the associated p is smaller than α, then we reject the null.

If it is larger, then we fail to reject the null.
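
The decision rule above can be sketched as a small function. The name `decide` and the example p-values are hypothetical; the strict inequality follows the card's wording ("smaller"), although some texts also reject when p equals α.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the chosen significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Strict '<' mirrors the card; this is a convention, not a law.
    if p_value < alpha:
        return "reject the null"
    return "fail to reject the null"

print(decide(0.012))              # reject the null
print(decide(0.300))              # fail to reject the null
print(decide(0.030, alpha=0.01))  # fail to reject the null
```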

11
Q

What does it mean if the p-value is < α?

A

Reject the null

12
Q

What does it mean if the p-value is > α?

A

Fail to reject the null

13
Q

What sampling distribution is used for the null hypothesis?

A

t-distribution with n - k - 1 degrees of freedom (n = sample size, k = number of predictors)

We need a significance level and the corresponding critical value to compare with the observed t-value

14
Q

What is a critical value?

A

It marks the boundary of the rejection region(s) in the sampling distribution of the test statistic. It is also used to calculate the upper and lower bounds of a confidence interval.

15
Q

What are the different factors that impact SE value?

A

SE is smaller when residual variance (SS residual) is smaller
SE is smaller when sample size (n) is larger
SE is larger when the number of predictors (k) is larger
SE is larger when a predictor is strongly correlated with the other predictors (R2 of x_j regressed on the other predictors is high)
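
All four factors appear in the standard formula for the standard error of a coefficient, SE(b_j) = sqrt( [SS residual / (n - k - 1)] / [SS_xj * (1 - R2_xj)] ), where SS_xj is the sum of squared deviations of predictor x_j from its mean. A minimal sketch, changing one input at a time; the function name `coefficient_se` and all numbers are made up for illustration.

```python
from math import sqrt

def coefficient_se(ss_residual, n, k, ss_xj, r2_xj):
    """Standard error of b_j in a linear model with k predictors."""
    residual_variance = ss_residual / (n - k - 1)
    return sqrt(residual_variance / (ss_xj * (1.0 - r2_xj)))

# Hypothetical baseline, then one factor changed at a time.
baseline        = coefficient_se(ss_residual=2.4, n=5,  k=1, ss_xj=10.0, r2_xj=0.0)
less_residual   = coefficient_se(ss_residual=1.2, n=5,  k=1, ss_xj=10.0, r2_xj=0.0)
larger_sample   = coefficient_se(ss_residual=2.4, n=10, k=1, ss_xj=10.0, r2_xj=0.0)
more_predictors = coefficient_se(ss_residual=2.4, n=5,  k=2, ss_xj=10.0, r2_xj=0.0)
correlated      = coefficient_se(ss_residual=2.4, n=5,  k=1, ss_xj=10.0, r2_xj=0.75)

for name, se in [("baseline", baseline), ("less residual", less_residual),
                 ("larger n", larger_sample), ("more predictors", more_predictors),
                 ("correlated predictor", correlated)]:
    print(f"{name:22s} SE = {se:.3f}")
```

In practice these inputs move together (a larger sample also changes SS_xj and SS residual), so holding the others fixed is a simplification.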

16
Q

What is a t-distribution?

A

The sampling distribution of the standardized difference between a sample mean and the population mean when the population SD is not known and has to be estimated from the sample

Assumes a normally distributed population

17
Q

What is the confidence level that corresponds to a significance level α?

A

1 - alpha

18
Q

What does it mean if the confidence interval doesn’t include 0?

A

If the (1 - α) confidence interval for a coefficient does not include 0, then the coefficient is significantly different from 0 at level α
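
A minimal sketch of that check, assuming a two-sided interval built as estimate ± critical value × SE. The function names and the example slope are hypothetical; 3.182 is the tabled two-tailed .05 critical value of t with 3 degrees of freedom.

```python
def confidence_interval(estimate, se, critical_value):
    """Two-sided interval: estimate plus or minus critical_value * se."""
    margin = critical_value * se
    return estimate - margin, estimate + margin

def excludes_zero(lower, upper):
    """True when 0 lies outside the interval [lower, upper]."""
    return lower > 0 or upper < 0

# Hypothetical slope b1 = 0.6 with SE = 0.283 and df = 3.
lower, upper = confidence_interval(0.6, 0.283, 3.182)
print(round(lower, 3), round(upper, 3))
print(excludes_zero(lower, upper))  # the interval spans 0, so not significant
```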

19
Q

How can we compare the critical value and t-statistic to tell us if we can reject the null or not?

A

If the value of the test statistic is less extreme than the critical value, then the null hypothesis cannot be rejected.

Absolute value of t-statistic > critical value = Reject the null
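
The comparison can be sketched as below for a two-tailed test. The function name `reject_null` and the t-values are made up; the critical value is taken from a t table rather than computed.

```python
def reject_null(t_statistic, critical_value):
    """Two-tailed rule: reject when |t| is more extreme than the critical value."""
    return abs(t_statistic) > critical_value

T_CRIT_DF3 = 3.182  # two-tailed, alpha = .05, 3 degrees of freedom (t table)

print(reject_null(2.121, T_CRIT_DF3))  # False: less extreme than the critical value
print(reject_null(-4.5, T_CRIT_DF3))   # True: the sign does not matter, only magnitude
```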

20
Q

When are we more likely to find a significant effect?

A

When we have picked good variables (smaller residual SS) and we have a large sample

21
Q

How do we evaluate overall model performance?

A

The aim of our linear model is to describe y as a function of x.

That is, we are trying to explain variation in y using x, so we evaluate the model by assessing how much of that variation it accounts for.

22
Q

What does variation in y stand for?

A

Total variation of interest

23
Q

What is variation made up of?

A

Model and Residual Variance

24
Q

How do we measure total variation in the outcome?

A

Total sum of squares: SS total = SS model + SS residual

25
Q

What does R2 mean (coefficient of determination)?

A

Quantifies the amount of variability in the outcome accounted for by the predictors.

More variance accounted for, the better.

Represents the extent to which the prediction of y is improved when predictions are based on the linear relation between x and y.

R2 = SSmodel/SStotal or 1 - (SSresidual/SStotal)
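
Both forms of the formula can be checked on a small example. The data below are hypothetical, and the fitted values come from the least-squares line y = 2.2 + 0.6x for x = 1..5; the two forms agree only for a least-squares fit that includes an intercept.

```python
def sums_of_squares(y, y_hat):
    """Return (ss_total, ss_model, ss_residual) for outcomes y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_model = sum((fi - mean_y) ** 2 for fi in y_hat)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return ss_total, ss_model, ss_residual

y = [2, 4, 5, 4, 5]                  # hypothetical outcomes
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]    # least-squares fitted values

ss_total, ss_model, ss_residual = sums_of_squares(y, y_hat)
r2_from_model = ss_model / ss_total
r2_from_residual = 1 - ss_residual / ss_total
print(round(r2_from_model, 3), round(r2_from_residual, 3))  # prints 0.6 0.6
```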

26
Q

What is adjusted R2?

A

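It is the R2 value adjusted for the number of predictors, which matters most when there are two or more

Because of random sampling, R2 in a sample tends to overestimate the variance explained in the population

Adjusted for sample size (n) and number of predictors (k)

Adding IVs never decreases plain R2, even when they are uninformative; adjusted R2 can go down when an added IV contributes little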
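
The cards do not give the formula; a commonly used version is adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1). A small sketch with made-up values (the function name `adjusted_r2` is illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for sample size n and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R2 of 0.6 under different n and k (hypothetical values).
print(round(adjusted_r2(0.6, n=5, k=1), 3))    # small sample, one predictor
print(round(adjusted_r2(0.6, n=5, k=2), 3))    # extra predictor, same R2: larger penalty
print(round(adjusted_r2(0.6, n=500, k=1), 3))  # large sample: almost no adjustment
```

The penalty grows with k and shrinks with n.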

27
Q

Why is it important to compute adjusted R2 in a model with multiple predictors?

A

It corrects for the chance inflation of R2, which is larger in smaller samples and grows with the number of predictors

28
Q

What does comparing R2 and adjusted R2 tell us?

A

The key difference is that adjusted R2 penalizes each additional independent variable in the model, whereas R2 does not.

So if there is a big difference between them, i.e. adjusted R2 is a lot smaller, the additional predictors are adding little value to the model.

29
Q

If there is a smaller sample, what does this mean for fluctuations in adjusted R2?

A

In smaller samples, the chance fluctuations of R2 away from zero will be larger on average, so the adjustment is larger.

30
Q

If we have a highly correlated predictor, how does that impact the SE of coefficients?

A

It increases the SE: when a predictor overlaps heavily with the other predictors, we are less certain which variable is driving the effect
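
One common way to quantify this, not named in the cards, is the variance inflation factor, VIF = 1 / (1 - R2_xj), where R2_xj comes from regressing predictor x_j on the other predictors. The SE is multiplied by sqrt(VIF) relative to an uncorrelated predictor. A sketch with hypothetical R2_xj values:

```python
from math import sqrt

def variance_inflation_factor(r2_xj):
    """VIF for a predictor whose R2 on the other predictors is r2_xj."""
    if not 0.0 <= r2_xj < 1.0:
        raise ValueError("r2_xj must lie in [0, 1)")
    return 1.0 / (1.0 - r2_xj)

# How much the SE grows as the predictor becomes more redundant.
for r2_xj in (0.0, 0.5, 0.75, 0.9):
    vif = variance_inflation_factor(r2_xj)
    print(f"R2_xj = {r2_xj:.2f}  VIF = {vif:.2f}  SE multiplier = {sqrt(vif):.2f}")
```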