Test Prep Flashcards

1
Q

What is the formula for the simple linear regression model?

A

y = β_0 + β_1·x_1 + ε

Where:
y: Dependent (response) variable
x_1: Independent (predictor) variable
β_0: Intercept
β_1: Slope (coefficient)
ε: Error term
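A minimal Python sketch (hypothetical house-price numbers for the intercept, slope, and predictor) showing how the equation turns an input into a prediction:

```python
# Simple linear regression prediction: y = b0 + b1 * x
# Hypothetical numbers for a house-price model.
b0 = 50_000   # intercept (beta_0): baseline price
b1 = 120      # slope (beta_1): price change per extra square foot
x = 1_500     # predictor (x_1): square feet

y_hat = b0 + b1 * x
print(y_hat)  # 230000
```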

2
Q

What does the dependent variable (y) represent in simple linear regression?

A

y is the dependent variable (response). It’s the outcome we are trying to predict or explain, like house price in a real estate model.

3
Q

What does the intercept (𝛽_0) mean in simple linear regression?

A

β_0 is the intercept. It’s the predicted value of y when x = 0. It shows where the line crosses the y-axis. For example, it could represent the price of a house with zero square feet.

4
Q

What does the slope (𝛽_1) mean in simple linear regression?

A

β_1 is the slope (coefficient). It shows how much y changes when x increases by 1 unit. For example, it might show how much house price increases for each extra square foot.

5
Q

What is the error term (ϵ) in simple linear regression?

A

ϵ represents the error or residual, the difference between the actual y and the predicted y. It accounts for variation in y that isn’t explained by x.

6
Q

What does the intercept tell us when the predictor variable (x) is zero?

A

It tells us the predicted value of y when x=0, essentially giving the baseline value of the response variable.

7
Q

What does covariance measure?

A

Covariance measures whether two variables tend to move in the same direction (positive) or in opposite directions (negative).

8
Q

What does a positive covariance indicate?

A

A positive covariance indicates that the two variables increase or decrease together.

9
Q

What does a negative covariance indicate?

A

A negative covariance indicates that as one variable increases, the other decreases.

10
Q

Why is covariance difficult to interpret?

A

Covariance is hard to interpret because its value depends on the scale of the variables and can be any large positive or negative number.
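A small sketch (made-up data) of why the raw covariance number is hard to read: rescaling one variable rescales the covariance by the same factor, even though the relationship itself is unchanged.

```python
# Sample covariance computed by hand on hypothetical paired data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 6.0]

def cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

print(cov(xs, ys))                     # positive: the variables move together
print(cov([x * 100 for x in xs], ys))  # same relationship, 100x the covariance
```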

11
Q

What does correlation measure?

A

Correlation measures both the direction and the strength of the relationship between two variables.

12
Q

What are the value boundaries of correlation?

A

Correlation is always between -1 and 1, with:

1 meaning a perfect positive relationship.
-1 meaning a perfect negative relationship.
0 meaning no relationship.

13
Q

How is correlation different from covariance?

A

Correlation is standardized and bounded between -1 and 1, making it easier to interpret than covariance, which has no fixed scale.
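A sketch (made-up data) showing that correlation is just covariance standardized by both standard deviations, which is why rescaling a variable leaves it unchanged:

```python
from math import sqrt

# Pearson correlation on hypothetical data that lies exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)

print(pearson(xs, ys))                     # 1.0: perfect positive relationship
print(pearson(xs, [y * 100 for y in ys]))  # still 1.0: scale doesn't matter
```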

14
Q

What does R-squared measure in a regression model?

A

R-squared measures the proportion of the total variability in the outcome variable that is explained by the predictor variable(s) in the model.

15
Q

What is the range of R-squared values?

A

R-squared values range from 0 to 1. A value of 0 means the model explains none of the variability, while 1 means the model explains all the variability.

16
Q

How is R-squared related to residuals?

A

R-squared is calculated as
1 − (Sum of Squared Residuals (SSR) / Total Sum of Squares (TSS)). It reflects how much of the data’s variation is captured by the model compared to the residuals.
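A quick sketch (hypothetical observed and predicted values) of R-squared computed as 1 − SSR/TSS:

```python
# R^2 = 1 - SSR/TSS on hypothetical observed values and model predictions.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [3.5, 4.5, 7.5, 8.5]

mean_y = sum(y) / len(y)
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
tss = sum((yi - mean_y) ** 2 for yi in y)              # total variation
r_squared = 1 - ssr / tss
print(r_squared)  # 0.95: small residuals relative to total variability
```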

17
Q

What does a high R-squared value indicate about residuals?

A

A high R-squared value indicates that the residuals are small, meaning the model’s predictions are close to the actual values and the model fits the data well.

18
Q

What does a low R-squared value indicate about residuals?

A

A low R-squared value indicates that the residuals are large, meaning the model’s predictions are far from the actual values and the model does not fit the data well.

19
Q

What are predicted values in a regression model?

A

Predicted values (ŷ) are the values estimated by the regression model for the outcome variable based on the predictor variables.

20
Q

What are observed values of y in a regression model?

A

Observed values (y) are the actual values of the outcome variable collected during data gathering.

21
Q

How is a residual calculated in a regression model?

A

Residuals are calculated as:

Residual = Observed value (y) − Predicted value (ŷ)
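A one-line sketch with made-up observed and predicted values:

```python
# Residual = observed - predicted, per data point (hypothetical values).
observed = [10.0, 12.0, 9.0]
predicted = [9.0, 13.0, 9.0]

residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
print(residuals)  # [1.0, -1.0, 0.0]
# +1.0: model under-predicted; -1.0: over-predicted; 0.0: exact
```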

22
Q

What does a positive residual indicate?

A

A positive residual indicates that the model under-predicted the outcome (the actual value is higher than the predicted value).

23
Q

What does a negative residual indicate?

A

A negative residual indicates that the model over-predicted the outcome (the actual value is lower than the predicted value).

24
Q

What is the range of correlation coefficients?

A

Correlation coefficients range from -1 to 1.

25
Q

What does a correlation coefficient of 1 indicate?

A

A correlation coefficient of 1 indicates a perfect positive correlation; as one variable increases, the other variable increases proportionally.

26
Q

What does a correlation coefficient of -1 indicate?

A

A correlation coefficient of -1 indicates a perfect negative correlation; as one variable increases, the other variable decreases proportionally.

27
Q

What does a correlation coefficient of 0 indicate?

A

A correlation coefficient of 0 indicates no linear relationship; the variables do not have a consistent pattern of moving together.

28
Q

How do you interpret a positive correlation coefficient (e.g., 𝑟 > 0)?

A

A positive correlation coefficient means that as one variable increases, the other variable also increases.

29
Q

How do you interpret a negative correlation coefficient (e.g., r<0)?

A

A negative correlation coefficient means that as one variable increases, the other variable decreases.

30
Q

What does a correlation coefficient close to 0 indicate?

A

A correlation coefficient close to 0 indicates a weak or negligible relationship between the variables.

31
Q

What does a correlation coefficient around 0.3 to 0.7 (or -0.3 to -0.7) suggest?

A

This suggests a moderate relationship between the variables. (Subject to interpretation)

32
Q

What does a correlation coefficient close to 1 or -1 indicate?

A

This indicates a strong relationship; the variables move closely in sync with each other.

33
Q

What should you remember about correlation in relation to causation?

A

Correlation does not imply causation; a high correlation doesn’t mean one variable causes the other.

34
Q

What kind of relationships does the correlation coefficient capture?

A

The correlation coefficient captures linear relationships only; non-linear relationships are not well represented.

35
Q

How can outliers affect the correlation coefficient?

A

Outliers can heavily influence the correlation, making it seem stronger or weaker than it actually is for most of the data.

36
Q

How do you interpret a positive slope coefficient (𝛽) in a regression model?

A

A positive slope coefficient means that as the predictor variable increases, the outcome variable is expected to increase as well. It shows the rate of increase in the outcome for every one-unit increase in the predictor.

37
Q

How do you interpret a negative slope coefficient (𝛽) in a regression model?

A

A negative slope coefficient means that as the predictor variable increases, the outcome variable is expected to decrease. It shows the rate of decrease in the outcome for every one-unit increase in the predictor.

38
Q

What does the magnitude of a slope coefficient tell you in a regression model?

A

The magnitude of a slope coefficient indicates how strong the relationship is between the predictor variable and the outcome variable. Larger coefficients mean a bigger effect on the outcome, while smaller coefficients indicate a weaker effect.

39
Q

How would you use the intercept and slope coefficients to make predictions in a regression model?

A

Use the intercept as the baseline value of the outcome when predictors are zero. Add the product of the slope coefficients and their corresponding predictor values to the intercept to make predictions about the outcome.
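A sketch of that recipe with hypothetical coefficients for a house-price model:

```python
# Prediction = intercept + sum of (coefficient * predictor value).
# All coefficient and predictor values below are hypothetical.
intercept = 50_000
coefs = {"sqft": 120, "bedrooms": 10_000}
house = {"sqft": 1_500, "bedrooms": 3}

prediction = intercept + sum(coefs[k] * house[k] for k in coefs)
print(prediction)  # 260000 = 50000 + 120*1500 + 10000*3
```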

40
Q

What does the t-distribution help with in regression analysis?

A

The t-distribution helps determine whether the estimates from the regression are statistically significant and reliable.

41
Q

What is a t-statistic in regression?

A

The t-statistic tells you whether a variable in your model has a strong impact on what you’re trying to predict. It does this by comparing the size of the variable’s effect to the variability of that effect (its standard error). A large t-statistic means the effect is strong and likely real, while a small one means it might just be random noise.

Example:
If you’re analyzing how hours studied affects exam scores:

The t-statistic for the coefficient of hours studied helps determine if the relationship you observe is likely due to a true impact of studying hours on scores, or if it might be just a coincidence in your sample.
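A sketch with hypothetical regression output; in practice the t-statistic is the coefficient estimate divided by its standard error:

```python
# t-statistic = coefficient estimate / standard error of the estimate.
# Hypothetical output for the "hours studied" predictor:
coefficient = 4.5      # estimated effect on exam score per extra hour
standard_error = 1.5   # variability of that estimate

t_stat = coefficient / standard_error
print(t_stat)  # 3.0 -- well above ~2, so the effect is unlikely to be noise
```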

42
Q

What does a p-value represent?

A

The p-value tells you how likely it is to observe your data (or something more extreme) if H₀ were true. A low p-value means that it’s unlikely the data you observed could have occurred just by random chance under H₀.

43
Q

How do you interpret a p-value of 0.03?

A

A p-value of 0.03 means that, if the null hypothesis were true, there would be only a 3% chance of observing an effect at least as extreme as the one in your data. Because that chance is small, the result is unlikely to be due to random chance alone.

44
Q

How does the t-distribution relate to p-values?

A

The t-distribution is used to calculate the p-value from the t-statistic, which tells you whether your results are likely due to chance or reflect a true effect.

Imagine you’re checking if a new study method really works. After analyzing the test score changes, you calculate a t-statistic of 2.45. Using the t-distribution, you find that there’s only a 3.7% chance you’d see such a result if the method didn’t actually improve scores. Since this is a small chance, you conclude the method likely does have an effect.

45
Q

What is the goal of the least squares method in relation to the arithmetic mean?

A

The goal is to find the line or model that minimizes the total squared differences between the observed values and the predicted values from the model.

46
Q

Why do we use squared differences in the least squares method?

A

Squaring makes all differences positive and penalizes larger errors more heavily than smaller ones, so the fitted model avoids big discrepancies. Squared errors are also mathematically convenient to minimize.

47
Q

How does the arithmetic mean act as the least squares deviation estimator?

A

The arithmetic mean minimizes the sum of squared differences between itself and all data points, making it the best estimate to represent the data.

48
Q

What happens if you use a number other than the arithmetic mean to calculate squared differences?

A

Using any number other than the arithmetic mean will result in larger total squared differences compared to using the mean.

49
Q

Why is the arithmetic mean considered the “best” average?

A

The arithmetic mean is the “best” average because it is the number that fits the data most evenly (it sits at the balance point of the data). It makes the total of all the squared gaps between itself and the data points as small as possible.

50
Q

What does “least squares” refer to?

A

“Least squares” refers to a mathematical method used to find the best-fitting line or model for a set of data points by minimizing the sum of the squared differences (errors) between the observed values and the values predicted by the model.
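A sketch of the closed-form least-squares estimates for simple linear regression, using made-up points that lie exactly on a line:

```python
# Least-squares estimates for y = b0 + b1 * x:
#   b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
#   b0 = y_mean - b1 * x_mean
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data lying exactly on y = 1 + 2x

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))
b0 = my - b1 * mx
print(b0, b1)  # 1.0 2.0 -- recovers the line's intercept and slope
```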

51
Q

How does choosing any number other than the mean affect squared deviations?

A

Choosing any number other than the mean results in a larger total of squared deviations compared to choosing the mean.

52
Q

What is the effect of the arithmetic mean on squared deviations?

A

The arithmetic mean minimizes the total squared deviations (differences) between itself and the data points. This means the sum of the squared differences between each data point and the mean is the smallest possible compared to any other number.
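A quick check (made-up data) that the mean gives the smallest total of squared deviations, and any other center does worse:

```python
# Sum of squared deviations from a chosen center, on hypothetical data.
data = [2.0, 4.0, 6.0, 8.0]
mean = sum(data) / len(data)  # 5.0

def ssd(center):
    return sum((x - center) ** 2 for x in data)

print(ssd(mean))  # 20.0 -- smallest possible total
print(ssd(4.0))   # 24.0 -- shifting off the mean always increases the total
print(ssd(6.0))   # 24.0
```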

53
Q

What does a statistically significant p-value (e.g., < 0.05) for a predictor mean?

A

It means there is evidence that the predictor has a significant effect on the outcome variable.

54
Q

What does a high R-squared value (close to 1) in regression output suggest?

A

It suggests that the predictor(s) explain a large portion of the variability in the outcome variable.

55
Q

What does a confidence interval that does not include zero around a coefficient indicate?

A

It indicates that the predictor has a statistically significant effect on the outcome variable: zero (no effect) is not a plausible value for the coefficient at that confidence level.

56
Q

If the R-squared value of your regression model is 0.75, what hypothesis could you make?

A

Hypothesis: “Approximately 75% of the variation in the outcome variable can be explained by the predictor(s).”

56
Q

How do you interpret a 95% confidence interval for a predictor of (3, 7)?

A

Interpretation: “The effect of the predictor on the outcome variable is likely between 3 and 7 units.”

57
Q

What is the purpose of a confidence interval in regression?

A

A confidence interval provides a range of values within which the true parameter value is likely to fall, accounting for uncertainty in the estimate.

58
Q

How do you construct a confidence interval around a regression parameter estimate?

A

Place a “window” (interval) around the parameter estimate, using a margin of error to define the range from a lower bound to an upper bound.

59
Q

How do you interpret the confidence interval for a regression parameter?

A

The interval represents a range of values within which you are confident the true parameter value lies. For example, a 95% confidence interval means that 95 out of 100 intervals constructed this way would contain the true parameter.

60
Q

If a regression coefficient estimate is +5 and the margin of error is ±2, what is the confidence interval?

A

The confidence interval is from 3 to 7.
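The arithmetic, sketched (note that this interval excludes zero, so the effect would be statistically significant):

```python
# Confidence interval = estimate +/- margin of error.
estimate = 5.0
margin_of_error = 2.0

ci = (estimate - margin_of_error, estimate + margin_of_error)
print(ci)  # (3.0, 7.0)
```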

61
Q

What does a confidence interval of 3 to 7 for a regression parameter tell you?

A

It suggests that the true effect of the predictor is likely between 3 and 7 units, and you can be confident about this range based on your sample data.

62
Q

What does a 95% confidence level mean when interpreting a confidence interval?

A

It means that if you took 100 different samples and constructed intervals in the same way, approximately 95 of those intervals would contain the true parameter value.

63
Q

Why is it important to understand the margin of error in a confidence interval?

A

The margin of error determines how wide the confidence interval is, reflecting the degree of uncertainty about the parameter estimate.

64
Q

What does the residual indicate in regression analysis?

A

The residual indicates the part of the outcome that the model could not predict, showing the error or difference between the actual and predicted values.

65
Q

What does Adjusted R-squared account for when comparing models?

A

Adjusted R-squared accounts for the number of predictors in the model and penalizes adding too many predictors, helping to prevent overfitting.

66
Q

Why might a simpler model perform better than a more complex one?

A

A simpler model is less likely to overfit the data, meaning it may perform better on new, unseen data.

67
Q

How can cross-validation help in comparing models?

A

Cross-validation tests how well each model performs on different subsets of data, providing a more reliable sense of how it will perform on new data.

68
Q

What is the trade-off when comparing a simple model versus a complex model?

A

A complex model may fit the current data better but risks overfitting, while a simpler model may be less accurate on the training data but generalize better to new data.

69
Q

What is overfitting in the context of regression models?

A

Overfitting occurs when a model is too complex and captures noise in the data rather than the true underlying pattern, leading to poor performance on new data.

70
Q

How does R-squared help in evaluating model performance?

A

R-squared shows how well the model explains the variation in the outcome. However, to compare models with different predictors, Adjusted R-squared and other criteria should also be considered.

71
Q

What are residuals in regression?

A

Residuals are the differences between the observed values and the predicted values from a regression model. They represent the part of the outcome that the model couldn’t predict, showing how far off the model’s predictions are from the actual data points.

72
Q

What does the F-test help you determine in model comparisons?

A

The F-test helps you determine if adding more predictors to your model significantly improves its ability to explain the outcome, comparing a more complex model to a simpler one.

73
Q

What question does the F-test answer when comparing models?

A

The F-test answers: “Does this model with more predictors do a better job than a simpler model?”

74
Q

In layman’s terms, how does the F-test work when adding predictors?

A

The F-test checks if adding a new predictor to your model meaningfully improves the predictions. If the F-test result is significant (low p-value), the new model is better.

75
Q

Why is Adjusted R-squared more reliable than regular R-squared?

A

Adjusted R-squared is more reliable because it only increases if new predictors truly improve the model, rather than just adding predictors that don’t help much.
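A sketch of the standard adjustment formula, Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), with hypothetical sample size and predictor counts:

```python
# Adjusted R^2 penalizes extra predictors:
#   adj = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
# n = number of observations, p = number of predictors (hypothetical values).
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R^2 of 0.80, but more predictors lowers the adjusted value:
print(adjusted_r2(0.80, n=50, p=2))   # ~0.791
print(adjusted_r2(0.80, n=50, p=10))  # ~0.749
```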

76
Q

What does Adjusted R-squared prevent in your model?

A

Adjusted R-squared prevents you from adding too many unnecessary variables, as it adjusts the fit measure to account for the number of predictors.

77
Q

How are the F-test and Adjusted R-squared useful together?

A

The F-test shows if adding predictors improves the model, while Adjusted R-squared ensures that you’re not adding predictors that don’t really help, giving you a better quality fit.

78
Q

In simple terms, what is the difference between the F-test and Adjusted R-squared?

A

The F-test checks if adding more predictors improves the model overall, while Adjusted R-squared gives a more accurate measure of how well the model explains the data without adding unnecessary complexity.

79
Q

What is the null hypothesis (H₀) in a linear regression model?

A

The null hypothesis states that the predictor has no effect on the outcome, meaning the coefficient (slope) is zero.

80
Q

What is the alternative hypothesis (H₁) in a linear regression model?

A

The alternative hypothesis states that the predictor does affect the outcome, meaning the coefficient (slope) is not zero.

81
Q

What is a t-test used for in regression analysis?

A

A t-test is used to check how far the estimated coefficient (slope) is from zero, determining if the predictor has a significant effect on the outcome.

82
Q

What do we do if the p-value is less than 0.05?

A

If the p-value is less than 0.05, we reject the null hypothesis, meaning we have evidence that the predictor affects the outcome.

82
Q

What does the p-value tell us in a hypothesis test?

A

The p-value tells us the probability that we would observe this coefficient if the null hypothesis (no effect) were true. A small p-value suggests that the predictor has a significant effect.

83
Q

What do we do if the p-value is greater than 0.05?

A

If the p-value is greater than 0.05, we fail to reject the null hypothesis, meaning we do not have enough evidence that the predictor affects the outcome.

84
Q

What does a small p-value suggest in hypothesis testing?

A

A small p-value suggests that the predictor has a significant effect on the outcome, and we should reject the null hypothesis.

85
Q

What does it mean to reject the null hypothesis?

A

Rejecting the null hypothesis means we’ve found evidence that the predictor affects the outcome (the slope is not zero).

86
Q

What does it mean to fail to reject the null hypothesis?

A

Failing to reject the null hypothesis means there is no evidence that the predictor has a significant effect on the outcome (the slope could be zero).

87
Q

What happens when you add an additional predictor to a regression model?

A

Adding another predictor provides more information to the model, helping explain the outcome more effectively by considering additional factors.

88
Q

Does adding a new predictor replace the existing ones?

A

No, each predictor keeps its own effect and gets its own coefficient (slope), showing how it affects the outcome independently of the others.

89
Q

Why might adding a predictor improve the model?

A

Adding a predictor can improve the model’s accuracy because it helps explain more of the variability in the outcome by accounting for additional factors.

90
Q

What does each new predictor get in a multiple regression model?

A

Each new predictor gets its own slope (coefficient), which shows how much the outcome changes with that predictor, assuming other factors are held constant.

91
Q

Can adding a predictor always help the model?

A

No, sometimes adding a predictor doesn’t help much if it doesn’t explain much more about the outcome. It may not improve the model’s predictions.

92
Q

What is the effect of adding a useful predictor to a model?

A

A useful predictor improves the model by explaining additional variability in the outcome, making predictions more accurate.

93
Q

How do predictors affect the outcome when there are multiple in the model?

A

Each predictor affects the outcome independently, and the model calculates how much the outcome changes with each predictor while keeping the others constant.

94
Q

What are qualitative input features in a multiple regression model?

A

Qualitative input features are categorical variables that represent groups or categories, such as gender, education level, or location.

95
Q

Why do we need to code qualitative features in regression analysis?

A

We need to code qualitative features because regression models require numerical input, and coding converts categories into a numerical format.

96
Q

What is dummy coding in regression analysis?

A

Dummy coding converts a qualitative feature into multiple binary variables, each representing a category, with one category serving as the reference group.

97
Q

How would you code the qualitative feature “Color” with categories Red, Blue, and Green using dummy coding?

A

You would create two dummy variables:

Color_Red: 1 if Red, 0 otherwise
Color_Blue: 1 if Blue, 0 otherwise (Green would be the reference category)
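A minimal sketch of dummy coding this Color example by hand:

```python
# Dummy coding "Color" with Green as the reference category:
# two binary columns encode the three categories.
colors = ["Red", "Blue", "Green", "Red"]

rows = [{"Color_Red": int(c == "Red"), "Color_Blue": int(c == "Blue")}
        for c in colors]
print(rows[0])  # {'Color_Red': 1, 'Color_Blue': 0}
print(rows[2])  # {'Color_Red': 0, 'Color_Blue': 0} -- Green, the reference
```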

98
Q

In a regression model, what does a coefficient for a dummy variable indicate?

A

It indicates how much the outcome variable is expected to change when that category is present compared to the reference category.

99
Q

What would a regression output look like for a qualitative feature “Location” coded as Urban, Suburban, and Rural?

A

You could have:

Intercept = $200,000
Coefficient for Location_Urban = +$50,000
Coefficient for Location_Suburban = +$30,000

100
Q

How would you interpret the coefficient for Location_Urban in the previous example?

A

If the house is in an Urban location, the predicted price is $200,000 + $50,000 = $250,000.
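A sketch of predictions using those hypothetical Location coefficients, with Rural as the reference category (both dummies zero):

```python
# Hypothetical dummy-variable coefficients from the Location example.
intercept = 200_000
coef_urban, coef_suburban = 50_000, 30_000

def predict(location):
    # Booleans act as 0/1 dummy values here.
    return (intercept
            + coef_urban * (location == "Urban")
            + coef_suburban * (location == "Suburban"))

print(predict("Urban"))     # 250000
print(predict("Suburban"))  # 230000
print(predict("Rural"))     # 200000 -- the reference category's baseline
```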

101
Q

What does coding qualitative features allow you to do in regression analysis?

A

Coding allows you to include non-numeric categories in the regression model, helping you understand their influence on the outcome variable.

102
Q

In layman’s terms, how can you think of qualitative features in a regression model?

A

Think of qualitative features as different flavors; coding them lets you see how each “flavor” affects the overall “taste” (outcome) you are predicting!