Week 3 Flashcards

1
Q

What is R squared?

A

R-squared measures goodness of fit: how well the regression line fits the data.
It gives the proportion of the variation in the DV that is explained by the IV(s).

2
Q

What is the formula of R-Squared? What are the components and their meaning?

A

The formula for R-squared is:
R^2 = (TSS - RSS) / TSS = 1 - RSS/TSS
RSS is the residual sum of squares: the variation of the DV that cannot be explained by the IVs, i.e. the squared residuals added up.
TSS is the total sum of squares: the total variation of the DV around its mean.
TSS - RSS is the variation of the DV that can be explained by the IVs.

3
Q

What are the formulas for RSS and TSS?

A

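RSS is:
the sum of the squared residuals over all observations, where each residual e_i is the observed y_i minus its fitted value.
So e_1^2 + e_2^2 + … + e_n^2
TSS is:
(y_1 - sample_mean)^2 + (y_2 - sample_mean)^2 + … + (y_n - sample_mean)^2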
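
As a quick check of the two sums, here is a minimal Python sketch. The data points and the fitted line y_hat = 2.2 + 0.6x are made-up illustration values, not taken from the course material.

```python
def residual_sum_of_squares(y, y_hat):
    """RSS: sum of squared residuals (observed minus fitted)."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


def total_sum_of_squares(y):
    """TSS: sum of squared deviations of y from its sample mean."""
    y_bar = sum(y) / len(y)
    return sum((yi - y_bar) ** 2 for yi in y)


# Hypothetical data; the fitted line is the least-squares line for it.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

rss = residual_sum_of_squares(y, y_hat)
tss = total_sum_of_squares(y)
r2 = (tss - rss) / tss
print(f"RSS={rss:.2f}  TSS={tss:.2f}  R^2={r2:.2f}")
```

For this toy data the line explains a bit more than half of the variation in y.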

4
Q

What do an R^2 of 0 and an R^2 of 1 mean?

A

R^2 = 0 means the model explains none of the variation of the DV (RSS = TSS).
R^2 = 1 means the model explains all of the variation of the DV (RSS = 0).
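
The two extremes can be reproduced with a small Python sketch. The y values are made up for illustration: a "model" that always predicts the sample mean gives R^2 = 0, and one that predicts every point exactly gives R^2 = 1.

```python
def r_squared(y, y_hat):
    """R^2 = (TSS - RSS) / TSS for observed y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return (tss - rss) / tss


y = [2, 4, 5, 4, 5]                    # hypothetical DV values
mean_only = [sum(y) / len(y)] * len(y)  # always predict the sample mean
perfect = list(y)                       # predict every point exactly

r2_mean_only = r_squared(y, mean_only)
r2_perfect = r_squared(y, perfect)
```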

5
Q

What is the problem with R^2?

A

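The problem is that adding another IV to the model never decreases R^2, and in practice almost always increases it, even if the new IV does not help to explain the DV. R^2 on its own is therefore not a reliable way to compare multiple linear regressions with different numbers of IVs.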

6
Q

So what should be used for goodness of fit for multiple linear regression?

A

The adjusted R^2 should be used. It penalizes each additional IV, so it only rises when the added IV improves the fit enough to outweigh that penalty.
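
A minimal sketch of the usual adjusted R^2 formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of IVs. The R^2 values and sample size below are hypothetical numbers chosen only to show the penalty at work.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p IVs (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical case: a second IV lifts R^2 only from 0.500 to 0.505.
adj_one_iv = adjusted_r_squared(0.500, n=20, p=1)
adj_two_ivs = adjusted_r_squared(0.505, n=20, p=2)
print(f"one IV: {adj_one_iv:.3f}, two IVs: {adj_two_ivs:.3f}")
```

Here plain R^2 goes up slightly, but the adjusted value goes down, which signals that the extra IV is not pulling its weight.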

7
Q

What does the standard error show?

A

It measures the precision of an estimated coefficient. The smaller the standard error, the more precisely the coefficient estimates the true population value. All else equal, it shrinks as the sample size grows, roughly in proportion to 1/sqrt(n).
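
To illustrate the link with sample size, here is a sketch for simple (one-IV) regression, using the textbook formula SE(slope) = sqrt((RSS / (n - 2)) / sum((x_i - x_mean)^2)). The data generator is an assumption made for illustration: x is spread evenly over [0, 1] and y follows 2x plus a fixed alternating +1/-1 disturbance, so the noise level stays the same while n changes.

```python
from math import sqrt


def slope_standard_error(x, y):
    """Standard error of the OLS slope in a one-IV regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(rss / (n - 2) / s_xx)


def make_sample(n):
    """Hypothetical data: x evenly spaced on [0, 1], y = 2x +/- 1."""
    x = [i / (n - 1) for i in range(n)]
    y = [2 * xi + (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
    return x, y


se_small = slope_standard_error(*make_sample(10))
se_large = slope_standard_error(*make_sample(40))
print(f"n=10: {se_small:.3f}   n=40: {se_large:.3f}")
```

Quadrupling the sample size roughly halves the standard error in this setup, in line with the 1/sqrt(n) rule of thumb.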

8
Q

What is the t value?

A

It tells us how far an estimated coefficient is from 0, measured in standard errors: t = coefficient / standard error. The larger the absolute value of the t statistic, the lower the p-value.
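
A small sketch of that relationship in Python. The coefficient and standard error are hypothetical, and the p-value uses a standard normal approximation to the t distribution (an assumption that is only reasonable for large samples; regression software uses the exact t distribution).

```python
from statistics import NormalDist


def t_statistic(coefficient, standard_error):
    """Number of standard errors the estimate lies away from 0."""
    return coefficient / standard_error


def approx_two_sided_p_value(t):
    """Two-sided p-value via a normal approximation (large samples only)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


t = t_statistic(2.0, 0.5)            # hypothetical estimate and SE
p_far_from_zero = approx_two_sided_p_value(t)
p_near_zero = approx_two_sided_p_value(1.0)
print(f"t={t:.1f}  p={p_far_from_zero:.5f}  (t=1.0 gives p={p_near_zero:.3f})")
```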

9
Q

What should be done when a categorical IV has more than 2 categories?

A

In this scenario, we choose one category as the base (reference) category and represent each other category with its own dummy variable. The coefficient for each non-base category is the estimated difference in the DV between that category and the base.
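
A minimal sketch of dummy coding in plain Python. The variable names and data (region, sales) are hypothetical. When the categorical IV is the only IV in the model, the coefficient of each dummy equals the difference between that category's mean and the base category's mean, which is what `category_effects` computes directly.

```python
from statistics import mean


def dummy_code(values, reference):
    """One 0/1 column per non-reference category, for each observation."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]


def category_effects(values, outcomes, reference):
    """Mean of each non-base category minus the mean of the base category."""
    groups = {}
    for v, y in zip(values, outcomes):
        groups.setdefault(v, []).append(y)
    base_mean = mean(groups[reference])
    return {lvl: mean(ys) - base_mean
            for lvl, ys in groups.items() if lvl != reference}


region = ["north", "south", "west", "north", "south", "west"]
sales = [10, 14, 9, 12, 16, 11]

dummies = dummy_code(region, reference="north")
effects = category_effects(region, sales, reference="north")
print(dummies[1])   # the second observation is in "south"
print(effects)
```

With "north" as the base, a positive value means the category sits above the base on average and a negative value means it sits below.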

10
Q

What is winsorizing?

A

Winsorizing is a way of limiting the impact of extreme values or possibly spurious outliers. The data are capped: the most extreme points, those that don't fit the rest of the data, are replaced with less extreme values from the same dataset (for example, the values at a chosen lower and upper cutoff).
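
A minimal sketch of one common way to winsorize, assuming the same fraction is capped in each tail. The function name, the 10% default, and the data are illustrative choices, not part of the course material.

```python
def winsorize(data, limit=0.1):
    """Cap the lowest and highest `limit` fraction of values.

    Capped values are replaced by the nearest value that is kept,
    so the result has the same length as the input.
    """
    n = len(data)
    k = int(limit * n)           # number of points to cap in each tail
    if k == 0:
        return list(data)
    ordered = sorted(data)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in data]


values = [3, 100, 5, 1, 7, 2, 9, 4, 8, 6]   # 100 is an obvious outlier
capped = winsorize(values, limit=0.1)
print(capped)
```

Unlike trimming, no observations are dropped: the outlier 100 is pulled in to the largest remaining value, and the smallest value is pulled up in the same way.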
