Exam 2 Flashcards
Correlation (definition, symbol, range, AKA)
A standardized measure that indicates how strongly two variables are related to each other.
Represented by r
Ranges from -1 to 1
AKA: Correlation Coefficient
Relationship between correlation and variability in the data
Inverse. As variability (scatter around the trend) in the data increases, the magnitude of the correlation decreases.
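A minimal sketch of both cards above (numpy assumed; the simulated data and names are illustrative): r is computed with np.corrcoef, and adding more noise to the same underlying line shrinks it.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)

    # Same underlying line y = 2x, with low vs. high noise (variability)
    y_low_noise  = 2 * x + rng.normal(0, 0.5, size=200)
    y_high_noise = 2 * x + rng.normal(0, 5.0, size=200)

    r_low  = np.corrcoef(x, y_low_noise)[0, 1]   # close to 1
    r_high = np.corrcoef(x, y_high_noise)[0, 1]  # much closer to 0
    print(r_low, r_high)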
Simple Linear (OLS) Regression Formula
Y = β0 + β1(X) + e
Y: the dependent variable; our best guess of Y given X
β0: intercept
β1: slope / regression coefficient
X: input
e: error term
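A minimal sketch of fitting this model (statsmodels assumed; the simulated data and variable names are illustrative, with true β0 = 3 and β1 = 1.5).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    y = 3 + 1.5 * x + rng.normal(0, 1, size=100)   # true β0 = 3, β1 = 1.5

    X = sm.add_constant(x)          # adds the intercept column for β0
    fit = sm.OLS(y, X).fit()
    print(fit.params)               # estimated β0 and β1
    print(fit.resid[:5])            # e: residuals (y - ŷ)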
e (names, definition, interpretation)
Error term or Residual
The difference between the actual observed value of the dependent variable and the value predicted by the regression model.
Represents dispersion/variability → Inverse relationship with correlation
β1 (names, interpretation)
Slope or regression coefficient
For a one-unit increase in X, Y changes by β1 units
Ordinary Least Squares Regression Method
Calculate the line that minimizes the sum of the squared residuals (SSR)
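A sketch of the closed-form solution for the one-predictor case, i.e. the intercept and slope that minimize the sum of squared residuals (numpy assumed; data and names illustrative).

    import numpy as np

    def ols_simple(x, y):
        # β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,  β0 = ȳ - β1·x̄
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b0 = y.mean() - b1 * x.mean()
        return b0, b1

    rng = np.random.default_rng(2)
    x = rng.normal(size=50)
    y = 1 + 2 * x + rng.normal(0, 1, size=50)
    print(ols_simple(x, y))   # ≈ (1, 2): the line minimizing SSR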
OLS Regression Assumptions (5)
- There is a linear relationship
- The observations are independent
- The errors (e) are normally distributed with mean 0
- The errors are homoscedastic (the variance of the errors doesn't change across values of X)
- The dependent variable is a continuous numeric value
SST (name, definition, formula)
Total Variability
The total squared deviation of the observed y values from their mean
Σ(y - ȳ)²
SSE (name, definition, formula)
Explained Variability
The amount of variability from the mean that is explained by the model
Σ(ŷ - ȳ)²
SSR (name, definition, formula)
Residual Variability
The amount of variability from the mean that cannot be explained by the model
Σ(y - ŷ)²
R Squared (Formula and definition)
SSE/SST
The proportion of variance in Y that can be explained by X
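A sketch verifying the decomposition and R² numerically, using the SSE = explained / SSR = residual naming from these cards (numpy and statsmodels assumed; data simulated).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.normal(size=100)
    y = 2 + 0.8 * x + rng.normal(0, 1, size=100)

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    y_hat = fit.fittedvalues

    sst = np.sum((y - y.mean()) ** 2)      # total variability
    sse = np.sum((y_hat - y.mean()) ** 2)  # explained variability
    ssr = np.sum((y - y_hat) ** 2)         # residual variability

    print(np.isclose(sst, sse + ssr))      # SST = SSE + SSR
    print(sse / sst, fit.rsquared)         # R² = SSE / SST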
Significance Testing the Regression Coefficient
Testing whether the coefficient is significantly different from 0.
Tells us whether the relationship between X and Y is significant.
A larger coefficient (in magnitude) and lower residual variability will decrease the p-value.
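A sketch of the t-test on β1 (statsmodels assumed; data simulated): t = β1 / SE(β1), and the p-value tests H0: β1 = 0.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.normal(size=100)
    y = 1 + 0.5 * x + rng.normal(0, 1, size=100)

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    # coefficient, its standard error, t statistic, and p-value for H0: β1 = 0
    print(fit.params[1], fit.bse[1], fit.tvalues[1], fit.pvalues[1])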
Multiple Regression Formula
Y = β0 + β1(X1) + β2(X2) +…+ βn(Xn) + e
βn in Multiple Regression
The effect of Xn on Y, HOLDING ALL OTHER VARIABLES CONSTANT
R2 in Multiple Regression
The proportion of the variance in Y that is explained by all independent variables in the model
Partial R2
The proportion of the variance in Y that is explained by one independent variable, HOLDING ALL OTHER VARIABLES CONSTANT
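A sketch (pandas + statsmodels formula API assumed; data and names illustrative) fitting a two-predictor model and computing a partial R² for x2 by comparing the full model to one with x2 dropped. The "full vs. reduced" formula used here is a common definition and is assumed to match the course's.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
    df["y"] = 1 + 2 * df.x1 + 0.5 * df.x2 + rng.normal(0, 1, size=200)

    full    = smf.ols("y ~ x1 + x2", data=df).fit()
    reduced = smf.ols("y ~ x1", data=df).fit()          # drop x2

    print(full.rsquared)                                # R²: variance explained by all predictors
    # Partial R² for x2: share of the variance left unexplained by the reduced
    # model that x2 accounts for, i.e. "holding x1 constant"
    partial_r2_x2 = (full.rsquared - reduced.rsquared) / (1 - reduced.rsquared)
    print(partial_r2_x2)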
Cohen’s D in Multiple Regression
Effect Size
Used when units are different among X variables.
< 0.2 = ignored
< 0.5 = small
< 0.8 = medium
< 1.3 = large
1.3+ = very large
Adjusted R2
Tells us the predictive power of the model for data outside the sample.
Decreases when a predictor is added that does not improve the model.
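A sketch of the standard adjusted-R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), checked against statsmodels (data and names illustrative).

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n, p = 100, 3
    X = rng.normal(size=(n, p))
    y = 1 + X @ np.array([1.0, 0.5, 0.0]) + rng.normal(0, 1, size=n)

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    adj = 1 - (1 - fit.rsquared) * (n - 1) / (n - p - 1)
    print(adj, fit.rsquared_adj)   # the two values should match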
Potential Issues with Multiple Regressions (2)
Multicollinearity
Overfitting
Multicollinearity
High correlation between two or more predictor variables creates redundancy
VIF = 1: no effect of multicollinearity
VIF > 1: Moderate effect of multicollinearity
VIF > 5: High effect of multicollinearity
VIF > 10: major effect of multicollinearity (X should be removed)
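A sketch of computing VIF for each predictor with statsmodels (the design matrix must include the constant; data and names illustrative).

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(7)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(0, 0.1, size=200)   # nearly redundant with x1
    x3 = rng.normal(size=200)

    X = sm.add_constant(np.column_stack([x1, x2, x3]))
    # Skip column 0 (the constant); x1 and x2 should show VIFs well above 10
    print([variance_inflation_factor(X, i) for i in range(1, X.shape[1])])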
Overfitting
When a model has so many X variables that it becomes overly complex, learning idiosyncratic patterns of a particular sample that may not generalize to the general population
Indicated by a high Cohen's f2 but a low change in adjusted R2
Cohen’s f2
How much R2 changes when the variable is added to the model
Adjusted R2 Δ
How much adjusted R2 changes when the variable is added to the model
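A sketch of one common way to compute Cohen's f² for an added variable, f² = (R²_full - R²_reduced) / (1 - R²_full), alongside the change in adjusted R² (this formula is an assumption about the course's exact definition; pandas + statsmodels formula API assumed).

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(8)
    df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
    df["y"] = 1 + 2 * df.x1 + 0.2 * df.x2 + rng.normal(0, 1, size=200)

    full    = smf.ols("y ~ x1 + x2", data=df).fit()
    reduced = smf.ols("y ~ x1", data=df).fit()

    f2 = (full.rsquared - reduced.rsquared) / (1 - full.rsquared)   # Cohen's f² for x2
    adj_r2_change = full.rsquared_adj - reduced.rsquared_adj        # adjusted R² Δ
    print(f2, adj_r2_change)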
Bias/Variance Trade Off
High-bias (Underfitting): A simple linear regression model trying to fit a complex nonlinear relationship may fail to capture the data’s structure, leading to errors.
High-variance (Overfitting): A high-degree polynomial regression may perfectly fit the training data but perform poorly on new data due to capturing noise.
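A sketch of the trade-off with numpy polynomial fits (data and names illustrative): a degree-1 line underfits a curved relationship, while a high-degree polynomial tends to chase noise in the training data and do worse on held-out data.

    import numpy as np

    rng = np.random.default_rng(9)
    x_train = np.sort(rng.uniform(-3, 3, 30))
    y_train = np.sin(x_train) + rng.normal(0, 0.3, 30)
    x_test  = np.sort(rng.uniform(-3, 3, 30))
    y_test  = np.sin(x_test) + rng.normal(0, 0.3, 30)

    for degree in (1, 3, 10):
        coefs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
        test_mse  = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        # high degrees usually lower train_mse but not test_mse
        print(degree, round(train_mse, 3), round(test_mse, 3))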
Categorical Significance (Reference)
Intercept = Reference group
Coefficients = Difference in group mean from reference group mean
Null Hypothesis = Difference is 0
Categorical Significance (GLHT Pairwise)
Coefficients = difference between each group
Null Hypothesis: difference between each group is 0
Categorical Significance (No Intercept)
Coefficients = group means
Null Hypothesis: each group mean is 0
Why do we use GLHT instead of running multiple tests?
When we run multiple hypothesis tests at a time, there is a higher chance of encountering a Type I error. GLHT automatically adjusts the p-values to account for this.
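The course's GLHT examples are presumably run in R (glht from multcomp). A rough Python analogue of the same ideas (a stand-in, not the same procedure): reference coding via a formula, and all pairwise group comparisons with a multiplicity-adjusted test (Tukey's HSD here). pandas and statsmodels assumed; data and names illustrative.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(10)
    df = pd.DataFrame({
        "group": np.repeat(["A", "B", "C"], 30),
        "y": np.concatenate([rng.normal(5, 1, 30),   # true group means 5, 6, 8
                             rng.normal(6, 1, 30),
                             rng.normal(8, 1, 30)]),
    })

    # Reference coding: intercept = group A mean, coefficients = differences from A
    print(smf.ols("y ~ C(group)", data=df).fit().params)

    # Pairwise differences between all groups, with adjusted p-values
    print(pairwise_tukeyhsd(df["y"], df["group"]).summary())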