Test 1 Prep Flashcards

1
Q

What is the formula for the simple linear regression model?

A

$y = \beta_0 + \beta_1 x_1 + \epsilon$

Where:
y: Dependent (response) variable
x_1: Independent (predictor) variable
𝛽_0: Intercept
𝛽_1: Slope (coefficient)
ϵ: Error term (the variation in y not explained by x; the residuals are its sample estimates)
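To see what the error term does, here is a minimal Python sketch (Python is an assumption; the flashcards contain no code) that generates observations from the model: each y is the value on the line plus a random error. The parameter values and the normally distributed errors are hypothetical choices for illustration only.

```python
import random


def simulate(beta0: float, beta1: float, xs, sigma: float, seed: int = 0):
    """Generate y = beta0 + beta1 * x + eps, with eps drawn from Normal(0, sigma)."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameters, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate(beta0=2.0, beta1=0.5, xs=xs, sigma=0.3)

for x, y in zip(xs, ys):
    line = 2.0 + 0.5 * x  # the systematic part: beta0 + beta1 * x
    print(f"x={x}: line={line:.2f}, observed y={y:.2f}, error={y - line:+.2f}")
```

With `sigma=0.0` every observed y would sit exactly on the line; the error term is what scatters the points around it.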

2
Q

What does the dependent variable (y) represent in simple linear regression?

A

y is the dependent variable (response). It’s the outcome we are trying to predict or explain, like house price in a real estate model.

3
Q

What does the intercept (𝛽_0) mean in simple linear regression?

A

β_0 is the intercept. It’s the predicted value of y when x = 0. It shows where the line crosses the y-axis. For example, it could represent the price of a house with zero square feet, which shows that the intercept is not always meaningful on its own.

4
Q

What does the slope (𝛽_1) mean in simple linear regression?

A

β_1 is the slope (coefficient). It shows how much y changes when x increases by 1 unit. For example, it might show how much house price increases for each extra square foot.

5
Q

What is the error term (ϵ) in simple linear regression?

A

ϵ represents the error: the part of y that the regression line does not explain. Its observable counterpart, the residual, is the difference between the actual y and the predicted y (ŷ). It accounts for variation in y that isn’t explained by x.

6
Q

What does the intercept tell us when the predictor variable (x) is zero?

A

It tells us the predicted value of y when x=0, essentially giving the baseline value of the response variable.

7
Q

What does covariance measure?

A

Covariance measures whether two variables tend to move in the same direction (positive) or in opposite directions (negative).

8
Q

What does a positive covariance indicate?

A

A positive covariance indicates that the two variables tend to increase or decrease together.

9
Q

What does a negative covariance indicate?

A

A negative covariance indicates that as one variable increases, the other decreases.

10
Q

Why is covariance difficult to interpret?

A

Covariance is hard to interpret because its value depends on the units (scale) of the variables and can take any positive or negative value, so its size alone says little about how strong the relationship is.
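A small Python sketch of the scale problem, using made-up study-time data: recording the same times in minutes instead of hours multiplies the covariance by 60, even though the relationship is identical.

```python
from typing import Sequence


def sample_covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance: sum of (x - x_bar) * (y - y_bar), divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


# Hypothetical data: study time and exam score (score on an arbitrary scale)
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
minutes = [h * 60 for h in hours]  # the same study times, in minutes
score = [2.0, 4.0, 5.0, 4.0, 5.0]

print(sample_covariance(hours, score))    # 1.5
print(sample_covariance(minutes, score))  # 90.0
```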

11
Q

What does correlation measure?

A

Correlation measures both the direction and the strength of the linear relationship between two variables.

12
Q

What are the value boundaries of correlation?

A

Correlation is always between -1 and 1, with:

1 meaning a perfect positive relationship.
-1 meaning a perfect negative relationship.
0 meaning no linear relationship.

13
Q

How is correlation different from covariance?

A

Correlation is standardized and bounded between -1 and 1, making it easier to interpret than covariance, which has no fixed scale.
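A Python sketch of the standardization, reusing the same hypothetical study-time data: dividing the covariance by both standard deviations cancels the units, so measuring time in hours or in minutes gives the same correlation.

```python
import math
from typing import Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    # The (n - 1) divisors cancel, so raw sums of squares can be used directly.
    return sxy / math.sqrt(sxx * syy)


hours = [1.0, 2.0, 3.0, 4.0, 5.0]
minutes = [h * 60 for h in hours]
score = [2.0, 4.0, 5.0, 4.0, 5.0]

print(round(correlation(hours, score), 4))    # 0.7746
print(round(correlation(minutes, score), 4))  # 0.7746
```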

14
Q

What does R-squared measure in a regression model?

A

R-squared measures the proportion of the total variability in the outcome variable that is explained by the predictor variable(s) in the model.

15
Q

What is the range of R-squared values?

A

R-squared values range from 0 to 1. A value of 0 means the model explains none of the variability, while 1 means the model explains all the variability.

16
Q

How is R-squared related to residuals?

A

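R-squared is calculated as
$R^2 = 1 - \frac{SSR}{TSS}$, where $SSR = \sum_i (y_i - \hat{y}_i)^2$ is the sum of squared residuals and $TSS = \sum_i (y_i - \bar{y})^2$ is the total sum of squares. It compares the variation left in the residuals with the total variation of y around its mean: the smaller the residuals, the closer R-squared is to 1.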
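A minimal Python sketch of R² = 1 − SSR/TSS. The observed values are made up, and the predicted values are assumed to come from the fitted line ŷ = 2.2 + 0.6x.

```python
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """R^2 = 1 - SSR / TSS."""
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1 - ssr / tss


y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # fitted values from y-hat = 2.2 + 0.6x
print(round(r_squared(y, y_hat), 4))  # 0.6
```

Here SSR = 2.4 and TSS = 6, so the model explains 60% of the variation in y.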

17
Q

What does a high R-squared value indicate about residuals?

A

A high R-squared value indicates that the residuals are small relative to the total variation in the outcome, meaning the model’s predictions are close to the actual values and the model fits the data well.

18
Q

What does a low R-squared value indicate about residuals?

A

A low R-squared value indicates that the residuals are large relative to the total variation in the outcome, meaning the model’s predictions are often far from the actual values and the model explains little of the variation.

19
Q

What are predicted values in a regression model?

A

Predicted values (ŷ) are the values estimated by the regression model for the outcome variable based on the predictor variables.

20
Q

What are observed values of y in a regression model?

A

Observed values (y) are the actual values of the outcome variable collected during data gathering.

21
Q

How is a residual calculated in a regression model?

A

Residuals are calculated as:

Residual = Observed value (y) − Predicted value (ŷ)
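A short Python sketch of the residual formula, with hypothetical observed and predicted values. The sign of each residual tells you the direction of the miss.

```python
from typing import Sequence


def residuals(y: Sequence[float], y_hat: Sequence[float]) -> list:
    """Residual = observed - predicted, one per data point."""
    return [yi - fi for yi, fi in zip(y, y_hat)]


observed = [10.0, 12.0, 9.0]
predicted = [8.0, 12.0, 11.0]  # hypothetical model predictions

for r in residuals(observed, predicted):
    if r > 0:
        label = "under-predicted"  # actual value above the prediction
    elif r < 0:
        label = "over-predicted"   # actual value below the prediction
    else:
        label = "exact"
    print(r, label)
```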

22
Q

What does a positive residual indicate?

A

A positive residual indicates that the model under-predicted the outcome (the actual value is higher than the predicted value).

23
Q

What does a negative residual indicate?

A

A negative residual indicates that the model over-predicted the outcome (the actual value is lower than the predicted value).

24
Q

What is the range of correlation coefficients?

A

Correlation coefficients range from -1 to 1.

25
Q

What does a correlation coefficient of 1 indicate?

A

A correlation coefficient of 1 indicates a perfect positive linear correlation: all points lie exactly on a straight line with a positive slope, so as one variable increases, the other increases by a fixed amount per unit.

26
Q

What does a correlation coefficient of -1 indicate?

A

A correlation coefficient of -1 indicates a perfect negative linear correlation: all points lie exactly on a straight line with a negative slope, so as one variable increases, the other decreases by a fixed amount per unit.

27
Q

What does a correlation coefficient of 0 indicate?

A

A correlation coefficient of 0 indicates no linear relationship; the variables do not have a consistent pattern of moving together.

28
Q

How do you interpret a positive correlation coefficient (e.g., 𝑟 > 0)?

A

A positive correlation coefficient means that as one variable increases, the other variable tends to increase.

29
Q

How do you interpret a negative correlation coefficient (e.g., r<0)?

A

A negative correlation coefficient means that as one variable increases, the other variable tends to decrease.

30
Q

What does a correlation coefficient close to 0 indicate?

A

A correlation coefficient close to 0 indicates a weak or negligible relationship between the variables.

31
Q

What does a correlation coefficient around 0.3 to 0.7 (or -0.3 to -0.7) suggest?

A

This suggests a moderate relationship between the variables. (These cutoffs are rough guidelines and vary by field.)

32
Q

What does a correlation coefficient close to 1 or -1 indicate?

A

This indicates a strong relationship; the variables move closely in sync with each other.

33
Q

What should you remember about correlation in relation to causation?

A

Correlation does not imply causation; a high correlation doesn’t mean one variable causes the other.

34
Q

What kind of relationships does the correlation coefficient capture?

A

The correlation coefficient captures linear relationships only; non-linear relationships are not well represented.

35
Q

How can outliers affect the correlation coefficient?

A

Outliers can heavily influence the correlation, making it seem stronger or weaker than it actually is for most of the data.
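Both cautions (linear relationships only, and sensitivity to outliers) can be checked with a short Python sketch on made-up data: a perfect but curved relationship (y = x²) has a correlation of exactly 0, and adding one extreme point to perfectly linear data flips the correlation from 1 to a negative value.

```python
import math
from typing import Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# 1) Perfect but non-linear relationship: y = x**2 on symmetric x values
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(correlation(x_sym, [v ** 2 for v in x_sym]))  # 0.0

# 2) One outlier added to perfectly linear data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(correlation(x, y))                              # 1.0
print(round(correlation(x + [6.0], y + [-10.0]), 2))  # -0.44
```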

36
Q

How do you interpret a positive slope coefficient (𝛽) in a regression model?

A

A positive slope coefficient means that as the predictor variable increases, the outcome variable is expected to increase as well. It shows the rate of increase in the outcome for every one-unit increase in the predictor.

37
Q

How do you interpret a negative slope coefficient (𝛽) in a regression model?

A

A negative slope coefficient means that as the predictor variable increases, the outcome variable is expected to decrease. It shows the rate of decrease in the outcome for every one-unit increase in the predictor.

38
Q

What does the magnitude of a slope coefficient tell you in a regression model?

A

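The magnitude of a slope coefficient tells you how much the outcome is expected to change for a one-unit increase in the predictor: a larger magnitude means a bigger change per unit, a smaller one a smaller change. Because this size depends on the units of measurement (for example, price per square foot vs. price per square metre), compare magnitudes across predictors only when they are on comparable scales.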

39
Q

How would you use the intercept and slope coefficients to make predictions in a regression model?

A

Use the intercept as the baseline value of the outcome when predictors are zero. Add the product of the slope coefficients and their corresponding predictor values to the intercept to make predictions about the outcome.
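A minimal Python sketch of this recipe, ŷ = intercept + Σ slopeⱼ · xⱼ. The exam-score model and its coefficients are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def predict(intercept: float, slopes: Sequence[float], xs: Sequence[float]) -> float:
    """y-hat = intercept + sum of slope_j * x_j over all predictors."""
    if len(slopes) != len(xs):
        raise ValueError("need one predictor value per slope coefficient")
    return intercept + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical model: baseline 40, +5 points per hour studied, -2 per hour of phone use
print(predict(40.0, [5.0, -2.0], [6.0, 3.0]))  # 64.0
```

With no predictors at all, the prediction falls back to the intercept alone, matching the idea of the intercept as the baseline.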

40
Q

What does the t-distribution help with in regression analysis?

A

The t-distribution helps determine whether the estimates from the regression are statistically significant, that is, whether they differ from zero by more than sampling variability alone would explain.

41
Q

What is a t-statistic in regression?

A

The t-statistic is a coefficient’s estimate divided by its standard error. It measures how many standard errors the estimate lies away from zero, i.e., how big the variable’s effect is compared with the uncertainty in that estimate. A large (in absolute value) t-statistic means the effect is unlikely to be just random noise, while a small one means the observed effect could easily have arisen by chance.

Example:
If you’re analyzing how hours studied affects exam scores:

The t-statistic for the coefficient of hours studied helps determine if the relationship you observe is likely due to a true impact of studying hours on scores, or if it might be just a coincidence in your sample.
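For the hours-studied example, the slope’s t-statistic can be computed by hand in Python: estimate the slope, estimate its standard error from the residuals (using n − 2 degrees of freedom for simple regression), and divide. The five data points are made up for illustration.

```python
import math
from typing import Sequence


def slope_t_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """t = slope estimate / its standard error, for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx                  # least squares slope
    beta0 = y_bar - beta1 * x_bar      # least squares intercept
    ssr = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)                 # residual variance estimate
    se_beta1 = math.sqrt(s2 / sxx)     # standard error of the slope
    return beta1 / se_beta1


hours = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical hours studied
scores = [2.0, 4.0, 5.0, 4.0, 5.0]  # hypothetical exam scores (scaled)
print(round(slope_t_statistic(hours, scores), 2))  # 2.12
```

The t-statistic here tests the null hypothesis that the true slope is zero (studying has no linear effect on scores).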

42
Q

What does a p-value represent?

A

The p-value tells you how likely it is to observe your data (or something more extreme) if the null hypothesis (H₀, usually “no effect”) were true. A low p-value means that data like yours would be unlikely to occur just by random chance under H₀, which counts as evidence against H₀.

43
Q

How do you interpret a p-value of 0.03?

A

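A p-value of 0.03 means that, if the null hypothesis (no real effect) were true, there would be a 3% chance of observing an effect at least as extreme as the one in your sample. It is not the probability that the observed effect is due to chance.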

44
Q

How does the t-distribution relate to p-values?

A

The t-distribution is used to calculate the p-value from the t-statistic, which tells you whether your results are likely due to chance or reflect a true effect.

Imagine you’re checking if a new study method really works. After analyzing the test score changes, you calculate a t-statistic of 2.45. Using the t-distribution, you find that there’s only a 3.7% chance you’d see a result at least this extreme if the method didn’t actually improve scores. Since this chance is small, you conclude the method likely does have an effect.
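The step of turning a t-statistic into a p-value can be sketched with only the Python standard library by numerically integrating the t density (Simpson’s rule). This is a teaching sketch, not how statistics packages compute it, and the 9 degrees of freedom in the usage line are a hypothetical assumption, since the study-method example does not state them.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow of gamma() for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t_stat|) under H0, i.e. 1 - 2 * (area under the pdf from 0 to |t_stat|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area


# t = 2.45 from the study-method example; df = 9 is a hypothetical assumption
print(round(two_sided_p_value(2.45, df=9), 3))
```

For large degrees of freedom the result approaches the normal-distribution value (about 0.05 for t = 1.96).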

45
Q

What is the goal of the least squares method in relation to the arithmetic mean?

A

The goal is to find the line or model that minimizes the total squared differences between the observed values and the predicted values from the model.
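For a straight line, the minimizing coefficients have a closed form: β₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β₀ = ȳ − β₁x̄. A Python sketch on made-up data, which also checks that nudging the slope away from the fitted value increases the total squared error:

```python
from typing import Sequence, Tuple


def least_squares_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares estimates for y = beta0 + beta1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def sum_squared_errors(x, y, beta0: float, beta1: float) -> float:
    """Total squared difference between observed y and the line's predictions."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares_fit(x, y)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
print(sum_squared_errors(x, y, b0, b1) < sum_squared_errors(x, y, b0, b1 + 0.1))  # True
```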

46
Q

Why do we use squared differences in the least squares method?

A

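Squaring makes every difference positive, so over- and under-predictions cannot cancel out, and it penalizes large errors much more heavily than small ones, so the fitted model works hardest to avoid big discrepancies. It also makes the minimization mathematically convenient, giving closed-form formulas for the coefficients. The trade-off is that least squares is sensitive to outliers rather than robust to them.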

47
Q

How does the arithmetic mean act as the least squares deviation estimator?

A

The arithmetic mean minimizes the sum of squared differences between itself and all data points, which makes it the best single-number summary of the data in the least-squares sense.
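A quick Python check of this property on made-up data: compute the total squared deviation for several candidate centres, including the mean.

```python
from typing import Sequence


def sum_sq_dev(data: Sequence[float], c: float) -> float:
    """Sum of squared deviations of the data from a candidate centre c."""
    return sum((d - c) ** 2 for d in data)


data = [2.0, 4.0, 5.0, 4.0, 5.0]
mean = sum(data) / len(data)  # 4.0

for c in [3.0, 3.5, mean, 4.5, 5.0]:
    print(c, sum_sq_dev(data, c))
```

The total is smallest (6.0) at the mean. In general it equals the value at the mean plus n·(c − mean)², so it grows as c moves away from the mean in either direction.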

48
Q

What happens if you use a number other than the arithmetic mean to calculate squared differences?

A

Using any number other than the arithmetic mean will result in larger total squared differences compared to using the mean.

49
Q

Why is the arithmetic mean considered the “best” average?

A

The arithmetic mean is the “best” average in the least-squares sense: it sits at the center (the balance point) of the data, and it makes the total of all the squared gaps between itself and the data points as small as possible.

50
Q

What does “least squares” refer to?

A

A mathematical method used to find the best-fitting line or model for a set of data points by minimizing the sum of the squared differences (errors) between the observed values and the values predicted by the model.

51
Q

How does choosing any number other than the mean affect squared deviations?

A

Choosing any number other than the mean results in a larger total of squared deviations compared to choosing the mean.

52
Q

What is the effect of the arithmetic mean on squared deviations?

A

The arithmetic mean minimizes the total squared deviations (differences) between itself and the data points. This means the sum of the squared differences between each data point and the mean is the smallest possible compared to any other number.

53
Q

What does a statistically significant p-value (e.g., < 0.05) for a predictor mean?

A

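It means there is evidence, at the chosen significance level, that the predictor’s coefficient is not zero, i.e., that the predictor has a statistically significant association with the outcome variable. Significance alone does not tell you the effect is large or causal.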

54
Q

What does a high R-squared value (close to 1) in regression output suggest?

A

It suggests that the predictor(s) explain a large portion of the variability in the outcome variable.

55
Q

What does a confidence interval that does not include zero around a coefficient indicate?

A

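It indicates that zero is not a plausible value for the coefficient, so the predictor’s effect on the outcome variable is statistically significant at the matching level (a 95% interval that excludes zero corresponds to p < 0.05 in a two-sided test).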

56
Q

If the R-squared value of your regression model is 0.75, what hypothesis could you make?

A

Hypothesis: “Approximately 75% of the variation in the outcome variable can be explained by the predictor(s).”

56
Q

How do you interpret a 95% confidence interval for a predictor of (3, 7)?

A

Interpretation: “We are 95% confident that the true effect of the predictor on the outcome variable is between 3 and 7 units.”

57
Q

What is the purpose of a confidence interval in regression?

A

A confidence interval provides a range of values within which the true parameter value is likely to fall, accounting for uncertainty in the estimate.

58
Q

How do you construct a confidence interval around a regression parameter estimate?

A

Place a “window” (interval) around the parameter estimate, using a margin of error to define the range from a lower bound to an upper bound.
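A minimal Python sketch of the “window”, assuming the usual form where the margin of error is a critical value times the standard error. The numbers are hypothetical: a critical value of 2.0 stands in for the t-table value (about 1.96 for a 95% interval with a large sample).

```python
from typing import Tuple


def confidence_interval(estimate: float, std_error: float, t_crit: float) -> Tuple[float, float]:
    """Return (lower, upper) = estimate -/+ margin, where margin = t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Hypothetical numbers: coefficient 5.0, standard error 1.0, critical value 2.0
print(confidence_interval(5.0, 1.0, 2.0))  # (3.0, 7.0)
```

A larger standard error or a higher confidence level (larger critical value) widens the window.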

59
Q

How do you interpret the confidence interval for a regression parameter?

A

The interval represents a range of plausible values for the true parameter. The confidence level describes the method rather than one specific interval: for a 95% confidence interval, about 95 out of 100 intervals constructed this way from repeated samples would contain the true parameter.

60
Q

If a regression coefficient estimate is +5 and the margin of error is ±2, what is the confidence interval?

A

The confidence interval is from 3 to 7.

61
Q

What does a confidence interval of 3 to 7 for a regression parameter tell you?

A

It suggests that the true effect of the predictor is likely between 3 and 7 units, and you can be confident about this range based on your sample data.

62
Q

What does a 95% confidence level mean when interpreting a confidence interval?

A

It means that if you took 100 different samples and constructed intervals in the same way, approximately 95 of those intervals would contain the true parameter value.

63
Q

Why is it important to understand the margin of error in a confidence interval?

A

The margin of error determines how wide the confidence interval is, reflecting the degree of uncertainty about the parameter estimate.

64
Q

What does the residual indicate in regression analysis?

A

The residual indicates the part of the outcome that the model could not predict, showing the error or difference between the actual and predicted values.

65
Q

What does Adjusted R-squared account for when comparing models?

A

Adjusted R-squared accounts for the number of predictors in the model and penalizes adding too many predictors, helping to prevent overfitting.
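The standard formula is Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors (not counting the intercept). A Python sketch with hypothetical numbers shows the penalty growing as predictors are added while R² stays the same:

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R^2 of 0.75 on 50 observations, with 2 vs. 10 predictors (hypothetical)
print(round(adjusted_r_squared(0.75, n=50, p=2), 3))   # 0.739
print(round(adjusted_r_squared(0.75, n=50, p=10), 3))  # 0.686
```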

66
Q

Why might a simpler model perform better than a more complex one?

A

A simpler model is less likely to overfit the data, meaning it may perform better on new, unseen data.

67
Q

How can cross-validation help in comparing models?

A

Cross-validation tests how well each model performs on different subsets of data, providing a more reliable sense of how it will perform on new data.
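A minimal k-fold cross-validation sketch in Python comparing two models on made-up data: an intercept-only model (always predict the training mean) and a simple linear regression. Each fold is held out in turn, the model is fit on the rest, and the squared errors on the held-out points are averaged. For simplicity, folds are assigned round-robin by index rather than by random shuffling; that is a simplifying assumption, not a requirement of the method.

```python
def fit_mean(x, y):
    """Intercept-only model: always predict the training mean."""
    m = sum(y) / len(y)
    return lambda _x: m


def fit_line(x, y):
    """Simple linear regression via the closed-form least squares estimates."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return lambda new_x: b0 + b1 * new_x


def k_fold_mse(fit, x, y, k):
    """Average squared error on held-out folds (fold f = indices i with i % k == f)."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        held_out = [i for i in range(len(x)) if i % k == fold]
        model = fit([x[i] for i in train], [y[i] for i in train])
        errors.extend((y[i] - model(x[i])) ** 2 for i in held_out)
    return sum(errors) / len(errors)


x = [float(v) for v in range(1, 11)]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]  # hypothetical, roughly y = 2x
print("mean-only:", k_fold_mse(fit_mean, x, y, k=5))
print("line:     ", k_fold_mse(fit_line, x, y, k=5))
```

The model with the lower held-out error is the one expected to perform better on new data.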

68
Q

What is the trade-off when comparing a simple model versus a complex model?

A

A complex model may fit the current data better but risks overfitting, while a simple model may fit the current data less closely but tends to generalize better to new data.

69
Q

What is overfitting in the context of regression models?

A

Overfitting occurs when a model is too complex and captures noise in the data rather than the true underlying pattern, leading to poor performance on new data.

70
Q

How does R-squared help in evaluating model performance?

A

R-squared shows how well the model explains the variation in the outcome. However, to compare models with different predictors, Adjusted R-squared and other criteria should also be considered.

71
Q

What are residuals in regression?

A

Residuals are the differences between the observed values and the predicted values from a regression model. They represent the part of the outcome that the model couldn’t predict, showing how far off the model’s predictions are from the actual data points.