Test 1 Prep Flashcards

Question 1

Q

What is the formula for the simple linear regression model?

Answer

A

$y = \beta_0 + \beta_1 x_1 + \epsilon$

Where:
y: Dependent (response) variable
x_1: Independent (predictor) variable
𝛽_0: Intercept
𝛽_1: Slope (coefficient)
ϵ: Error term (variation in y not explained by x; estimated by the residual)
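The model above can be fitted by ordinary least squares with the closed-form estimates β̂_1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β̂_0 = ȳ − β̂_1·x̄. The sketch below assumes Python 3.8+ (for `statistics.fmean`). The function name `fit_simple_ols` and the square-footage/price numbers are hypothetical illustrations, not data from these cards.

```python
from statistics import fmean

def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals for y = b0 + b1*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: change in predicted y per one-unit increase in x
    b0 = y_bar - b1 * x_bar   # intercept: predicted y when x = 0
    return b0, b1

# Hypothetical data: house size in square feet (x) and price in $1000s (y)
sqft = [1000, 1500, 2000, 2500]
price = [200, 260, 330, 390]
b0, b1 = fit_simple_ols(sqft, price)
print(f"intercept={b0:.1f}, slope={b1:.3f}")   # intercept=71.0, slope=0.128
```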

Question 2

Q

What does the dependent variable (y) represent in simple linear regression?

Answer

A

y is the dependent variable (response). It’s the outcome we are trying to predict or explain, like house price in a real estate model.

Question 3

Q

What does the intercept (𝛽_0) mean in simple linear regression?

Answer

A

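β_0 is the intercept. It’s the predicted value of y when x = 0, i.e., where the line crosses the y-axis. For example, it could represent the price of a house with zero square feet, which shows that the intercept is not always meaningful on its own when x = 0 lies outside the range of the data.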

Question 4

Q

What does the slope (𝛽_1) mean in simple linear regression?

Answer

A

β_1 is the slope (coefficient). It shows how much the predicted y changes when x increases by 1 unit. For example, it might show how much house price increases for each extra square foot.

Question 5

Q

What is the error term (ϵ) in simple linear regression?

Answer

A

ϵ is the error term: the part of y that the line does not explain, i.e., variation in y that isn’t explained by x. In a fitted model it is estimated by the residual, the difference between the actual y and the predicted ŷ.

Question 6

Q

What does the intercept tell us when the predictor variable (x) is zero?

Answer

A

It tells us the predicted value of y when x=0, essentially giving the baseline value of the response variable.

Question 7

Q

What does covariance measure?

Answer

A

Covariance measures whether two variables tend to move in the same direction (positive) or in opposite directions (negative).

Question 8

Q

What does a positive covariance indicate?

Answer

A

A positive covariance indicates that the two variables tend to increase or decrease together.

Question 9

Q

What does a negative covariance indicate?

Answer

A

A negative covariance indicates that as one variable increases, the other tends to decrease.

Question 10

Q

Why is covariance difficult to interpret?

Answer

A

Covariance is hard to interpret because its value depends on the scale of the variables and can be any large positive or negative number.
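A small sketch of this scale dependence: measuring the same heights in centimetres instead of metres multiplies the covariance by 100, even though the relationship itself has not changed. The heights and weights are hypothetical, and the helper computes the sample covariance by hand (n − 1 denominator) rather than relying on any particular library version.

```python
from statistics import fmean

def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    x_bar, y_bar = fmean(x), fmean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

height_m = [1.60, 1.70, 1.80, 1.90]      # hypothetical heights in metres
weight_kg = [55, 65, 72, 80]             # hypothetical weights in kilograms
height_cm = [h * 100 for h in height_m]  # the same heights in centimetres

cov_m = sample_covariance(height_m, weight_kg)
cov_cm = sample_covariance(height_cm, weight_kg)
print(cov_m, cov_cm)  # same data, but the covariance is 100 times larger in cm
```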

Question 11

Q

What does correlation measure?

Answer

A

Correlation measures both the direction and the strength of the relationship between two variables.

Question 12

Q

What are the value boundaries of correlation?

Answer

A

Correlation is always between -1 and 1, with:

1 meaning a perfect positive relationship.
-1 meaning a perfect negative relationship.
0 meaning no linear relationship.

Question 13

Q

How is correlation different from covariance?

Answer

A

Correlation is standardized and bounded between -1 and 1, making it easier to interpret than covariance, which has no fixed scale.
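A minimal sketch of Pearson's correlation as covariance divided by both standard deviations. Rescaling a variable leaves r unchanged, and a perfectly linear relationship gives r = 1 even when it is not proportional (it has a nonzero intercept). The data and the function name `pearson_r` are illustrative assumptions.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)   # the n - 1 factors cancel, so r has no units

height_m = [1.60, 1.70, 1.80, 1.90]      # hypothetical heights in metres
weight_kg = [55, 65, 72, 80]             # hypothetical weights in kilograms
height_cm = [h * 100 for h in height_m]

print(pearson_r(height_m, weight_kg))
print(pearson_r(height_cm, weight_kg))          # same value: rescaling does not change r
print(pearson_r([1, 2, 3, 4], [5, 7, 9, 11]))   # 1.0: y = 3 + 2x is linear, not proportional
```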

Question 14

Q

What does R-squared measure in a regression model?

Answer

A

R-squared measures the proportion of the total variability in the outcome variable that is explained by the predictor variable(s) in the model.

Question 15

Q

What is the range of R-squared values?

Answer

A

R-squared values range from 0 to 1. A value of 0 means the model explains none of the variability, while 1 means the model explains all the variability.

Question 16

Q

How is R-squared related to residuals?

Answer

A

R-squared is calculated as
$R^2 = 1 - \mathrm{SSR}/\mathrm{TSS}$, where SSR is the sum of squared residuals and TSS is the total sum of squares. It reflects how much of the data’s variation is captured by the model rather than left in the residuals.
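A short sketch computing R² = 1 − SSR/TSS from observed values and predictions. The 0-to-1 range holds for an ordinary least squares fit with an intercept, evaluated on the data it was fitted to. For arbitrary predictions the same formula can go negative. The numbers are hypothetical.

```python
from statistics import fmean

def r_squared(y, y_hat):
    """R^2 = 1 - SSR / TSS for observed values y and predictions y_hat."""
    y_bar = fmean(y)
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    return 1 - ssr / tss

y = [3.0, 5.0, 7.0, 9.0]
print(r_squared(y, [3.0, 5.0, 7.0, 9.0]))   # 1.0: predictions match exactly
print(r_squared(y, [6.0, 6.0, 6.0, 6.0]))   # 0.0: no better than predicting the mean
print(r_squared(y, [3.5, 4.5, 7.5, 8.5]))   # 0.95 (SSR = 1, TSS = 20)
```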

Question 17

Q

What does a high R-squared value indicate about residuals?

Answer

A

A high R-squared value indicates that the residuals are small relative to the total variation in y, meaning the model’s predictions are close to the actual values and the model fits the data well.

Question 18

Q

What does a low R-squared value indicate about residuals?

Answer

A

A low R-squared value indicates that the residuals are large relative to the total variation in y, meaning the model’s predictions are far from the actual values and the model does not fit the data well.

Question 19

Q

What are predicted values in a regression model?

Answer

A

Predicted values (ŷ) are the values estimated by the regression model for the outcome variable based on the predictor variables.

Question 20

Q

What are observed values of y in a regression model?

Answer

A

Observed values (y) are the actual values of the outcome variable collected during data gathering.

Question 21

Q

How is a residual calculated in a regression model?

Answer

A

Residuals are calculated as:

Residual = Observed value (y) − Predicted value (ŷ)
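A tiny sketch of the residual formula and how to read its sign. A positive residual means the model under-predicted, and a negative one means it over-predicted. The observed and predicted values are hypothetical.

```python
def residuals(observed, predicted):
    """Residual = observed y minus predicted y-hat, element by element."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]

def describe(residual):
    """Label a single residual by its sign."""
    if residual > 0:
        return "under-predicted"   # actual value is above the prediction
    if residual < 0:
        return "over-predicted"    # actual value is below the prediction
    return "exact"

observed = [10.0, 12.0, 9.0]     # hypothetical observed y values
predicted = [11.0, 12.0, 7.5]    # hypothetical model predictions (y-hat)
for r in residuals(observed, predicted):
    print(r, describe(r))
# -1.0 over-predicted
# 0.0 exact
# 1.5 under-predicted
```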

Question 22

Q

What does a positive residual indicate?

Answer

A

A positive residual indicates that the model under-predicted the outcome (the actual value is higher than the predicted value).

Question 23

Q

What does a negative residual indicate?

Answer

A

A negative residual indicates that the model over-predicted the outcome (the actual value is lower than the predicted value).

Question 24

Q

What is the range of correlation coefficients?

Answer

A

Correlation coefficients range from -1 to 1.

Question 25

Q

What does a correlation coefficient of 1 indicate?

Answer

A

A correlation coefficient of 1 indicates a perfect positive linear correlation: all points lie exactly on a straight line with positive slope, so as one variable increases, the other increases by a constant amount per unit.

Question 26

Q

What does a correlation coefficient of -1 indicate?

Answer

A

A correlation coefficient of -1 indicates a perfect negative linear correlation: all points lie exactly on a straight line with negative slope, so as one variable increases, the other decreases by a constant amount per unit.

Question 27

Q

What does a correlation coefficient of 0 indicate?

Answer

A

A correlation coefficient of 0 indicates no linear relationship; the variables do not have a consistent pattern of moving together.

Question 28

Q

How do you interpret a positive correlation coefficient (e.g., 𝑟 > 0)?

Answer

A

A positive correlation coefficient means that as one variable increases, the other variable tends to increase.

Question 29

Q

How do you interpret a negative correlation coefficient (e.g., r<0)?

Answer

A

A negative correlation coefficient means that as one variable increases, the other variable tends to decrease.

Question 30

Q

What does a correlation coefficient close to 0 indicate?

Answer

A

A correlation coefficient close to 0 indicates a weak or negligible linear relationship between the variables.

Question 31

Q

What does a correlation coefficient around 0.3 to 0.7 (or -0.3 to -0.7) suggest?

Answer

A

This suggests a moderate linear relationship between the variables. (Cutoffs like these are conventions and vary by field.)

Question 32

Q

What does a correlation coefficient close to 1 or -1 indicate?

Answer

A

This indicates a strong linear relationship; the points lie close to a straight line, so the variables move closely in sync with each other.

Question 33

Q

What should you remember about correlation in relation to causation?

Answer

A

Correlation does not imply causation; a high correlation doesn’t mean one variable causes the other.

Question 34

Q

What kind of relationships does the correlation coefficient capture?

Answer

A

The correlation coefficient captures linear relationships only; non-linear relationships are not well represented.

Question 35

Q

How can outliers affect the correlation coefficient?

Answer

A

Outliers can heavily influence the correlation, making it seem stronger or weaker than it actually is for most of the data.
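A sketch illustrating the last two cards with a hand-written Pearson correlation. First, y = x² on symmetric x values is perfectly determined by x, yet r = 0 because the relationship is not linear. Second, adding a single outlier to perfectly linear data flips r from 1 to slightly negative. All data are made up for illustration.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# 1) A perfect but non-linear relationship: y = x^2 on symmetric x values
x_sym = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(x_sym, [v ** 2 for v in x_sym]))   # 0.0

# 2) One outlier added to otherwise perfectly linear data
x = list(range(1, 11))
y = list(range(1, 11))
print(pearson_r(x, y))                              # 1.0
print(pearson_r(x + [11], y + [-20]))               # about -0.165
```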

Question 36

Q

How do you interpret a positive slope coefficient (𝛽) in a regression model?

Answer

A

A positive slope coefficient means that as the predictor variable increases, the outcome variable is expected to increase as well. It shows the rate of increase in the outcome for every one-unit increase in the predictor.

Question 37

Q

How do you interpret a negative slope coefficient (𝛽) in a regression model?

Answer

A

A negative slope coefficient means that as the predictor variable increases, the outcome variable is expected to decrease. It shows the rate of decrease in the outcome for every one-unit increase in the predictor.

Question 38

Q

What does the magnitude of a slope coefficient tell you in a regression model?

Answer

A

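The magnitude of a slope coefficient tells you how large the expected change in the outcome is for a one-unit increase in the predictor: larger coefficients mean a bigger change per unit, smaller coefficients a smaller one. Because the coefficient depends on the units of measurement (e.g., per square foot vs. per 1,000 square feet), its size alone does not tell you how strong or reliable the relationship is; correlation, R-squared, or the t-statistic address that.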
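A sketch of why the slope's size depends on units. The same hypothetical house data give a slope of about 0.128 when size is in square feet, and about 128 when size is in thousands of square feet. The fit is equally good in both cases. The helper `ols_slope` is an illustrative name.

```python
from statistics import fmean

def ols_slope(x, y):
    """Least squares slope for y = b0 + b1*x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

price = [200, 260, 330, 390]          # hypothetical prices in $1000s
sqft = [1000, 1500, 2000, 2500]       # size in square feet
ksqft = [s / 1000 for s in sqft]      # the same sizes in 1,000s of square feet

print(ols_slope(sqft, price))    # about 0.128 ($1000s per extra square foot)
print(ols_slope(ksqft, price))   # about 128 ($1000s per extra 1,000 square feet)
```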

Question 39

Q

How would you use the intercept and slope coefficients to make predictions in a regression model?

Answer

A

Use the intercept as the baseline value of the outcome when predictors are zero. Add the product of the slope coefficients and their corresponding predictor values to the intercept to make predictions about the outcome.
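In formula form, this is ŷ = β_0 + β_1x_1 + β_2x_2 + …. The minimal sketch below implements it. The fitted coefficients (71 and 0.128 for a house-price model) are hypothetical.

```python
def predict(intercept, slopes, predictors):
    """y-hat = intercept + sum of (slope * predictor value)."""
    if len(slopes) != len(predictors):
        raise ValueError("need one slope per predictor value")
    return intercept + sum(b * x for b, x in zip(slopes, predictors))

# Hypothetical fitted model: price ($1000s) = 71 + 0.128 * square feet
print(predict(71, [0.128], [1800]))   # predicted price of an 1,800 sq ft house
```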

Question 40

Q

What does the t-distribution help with in regression analysis?

Answer

A

The t-distribution is the reference distribution used to judge whether regression coefficient estimates are statistically significantly different from zero, and to build confidence intervals around them.

Question 41

Q

What is a t-statistic in regression?

Answer

A

The t-statistic is a coefficient estimate divided by its standard error. It tells you how many standard errors the estimate lies away from zero (the value assumed under the null hypothesis of no effect). A large absolute t-statistic means the estimated effect is large relative to its uncertainty and unlikely to be just random noise, while a small one means the true effect could plausibly be zero.

Example:
If you’re analyzing how hours studied affects exam scores:

The t-statistic for the coefficient of hours studied helps determine if the relationship you observe is likely due to a true impact of studying hours on scores, or if it might be just a coincidence in your sample.
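A minimal sketch of the slope's t-statistic in simple regression: t = β̂_1 / SE(β̂_1), where SE(β̂_1) = sqrt((SSR / (n − 2)) / Σ(x − x̄)²). The hours and scores below are hypothetical, chosen only to illustrate the calculation.

```python
import math
from statistics import fmean

def slope_t_statistic(x, y):
    """Return (slope, standard error of slope, t-statistic) for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # squared residuals
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)   # n - 2 df: two parameters were estimated
    return b1, se_b1, b1 / se_b1             # t = estimate / standard error

hours = [1, 2, 3, 4, 5, 6]              # hypothetical hours studied
scores = [52, 55, 61, 60, 68, 71]       # hypothetical exam scores
b1, se, t = slope_t_statistic(hours, scores)
print(f"slope={b1:.2f}, SE={se:.3f}, t={t:.2f}")
```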

Question 42

Q

What does a p-value represent?

Answer

A

The p-value is the probability of observing data at least as extreme as yours if the null hypothesis (H₀, e.g., that a coefficient is zero) were true. A low p-value means data like yours would be unlikely under H₀, which counts as evidence against H₀.

Question 43

Q

How do you interpret a p-value of 0.03?

Answer

A

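A p-value of 0.03 means that if there were truly no effect (H₀ true), there would be a 3% chance of observing an effect at least as extreme as the one found. It is not the probability that the effect is due to chance or that H₀ is true.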

Question 44

Q

How does the t-distribution relate to p-values?

Answer

A

The t-distribution is used to convert the t-statistic into a p-value, which tells you how surprising your result would be if there were no true effect.

Imagine you’re checking whether a new study method really works. After analyzing the test score changes, you calculate a t-statistic of 2.45. Using the t-distribution, you find that there’s only a 3.7% chance you’d see a result at least this extreme if the method didn’t actually improve scores. Since this is a small chance, you conclude the method likely does have an effect.
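A dependency-free sketch of turning a t-statistic into a two-sided p-value. It integrates the Student's t density numerically with Simpson's rule. The card does not state the degrees of freedom, so df = 9 below is an assumption; the exact p-value changes with df. In practice, SciPy's `2 * scipy.stats.t.sf(abs(t), df)` computes the same quantity.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=10_000):
    """P(|T| >= |t_stat|) under H0, via Simpson's rule on [0, |t_stat|] (steps must be even)."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3   # P(-a < T < a), using the symmetry of the density
    return max(0.0, 1.0 - central)

print(round(two_sided_p_value(2.45, df=9), 3))   # assumed df = 9
```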

Question 45

Q

What is the goal of the least squares method in relation to the arithmetic mean?

Answer

A

The goal is to find the line or model that minimizes the total squared differences between the observed values and the predicted values from the model.

Question 46

Q

Why do we use squared differences in the least squares method?

Answer

A

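Squaring makes every difference positive, so over- and under-predictions cannot cancel out, and it penalizes large errors much more heavily than small ones (an error of 10 counts 100 times as much as an error of 1). It also makes the minimization mathematically convenient, giving closed-form solutions. The trade-off is that least squares is sensitive to outliers rather than robust: a single extreme point can pull the fitted line noticeably.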
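A small sketch of that trade-off. The mean is the value that minimizes squared deviations, and the median minimizes absolute deviations. One outlier pulls the mean far more than the median. The data are hypothetical.

```python
from statistics import fmean, median

print([e ** 2 for e in (1, 10)])   # [1, 100]: an error 10x larger counts 100x more

data = [10, 11, 12, 13, 14]
with_outlier = data + [100]

print(fmean(data), median(data))                  # 12.0 12
print(fmean(with_outlier), median(with_outlier))  # mean jumps to about 26.7; median is 12.5
```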

Question 47

Q

How does the arithmetic mean act as the least squares deviation estimator?

Answer

A

The arithmetic mean minimizes the sum of squared differences between itself and all data points, making it the best single-number summary of the data in the least-squares sense.
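A quick numeric check of this property. It tries the mean and a few nearby numbers and confirms that the mean gives the smallest total squared deviation. The data and candidate offsets are arbitrary illustrations.

```python
from statistics import fmean

def sum_sq_dev(data, c):
    """Total squared deviation of the data from a candidate value c."""
    return sum((x - c) ** 2 for x in data)

data = [2, 4, 4, 5, 7, 9]
m = fmean(data)
candidates = [m + d for d in (-1, -0.5, -0.1, 0, 0.1, 0.5, 1)]
best = min(candidates, key=lambda c: sum_sq_dev(data, c))
print(m, best)   # the best candidate is the mean itself
```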

Question 48

Q

What happens if you use a number other than the arithmetic mean to calculate squared differences?

Answer

A

Using any number other than the arithmetic mean will result in larger total squared differences compared to using the mean.
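A short derivation of why. Let c be any number and x̄ the mean. Split each deviation as x_i − c = (x_i − x̄) + (x̄ − c) and expand. The cross term vanishes because deviations from the mean sum to zero:

```latex
\sum_{i=1}^{n}(x_i - c)^2
  = \sum_{i=1}^{n}\big[(x_i - \bar{x}) + (\bar{x} - c)\big]^2
  = \sum_{i=1}^{n}(x_i - \bar{x})^2 + 2(\bar{x} - c)\sum_{i=1}^{n}(x_i - \bar{x}) + n(\bar{x} - c)^2
  = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - c)^2
```

The extra term n(x̄ − c)² is never negative and is zero only when c = x̄.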

Question 49

Q

Why is the arithmetic mean considered the “best” average?

Answer

A

The arithmetic mean is the “best” average in the least-squares sense: it is the number that fits the data most evenly, sitting at the center of the data. It makes the total of all the squared gaps between itself and the data points as small as possible.

Question 50

Q

What does “least squares” refer to?

Answer

A

A mathematical method used to find the best-fitting line or model for a set of data points by minimizing the sum of the squared differences (errors) between the observed values and the values predicted by the model.

Question 51

Q

How does choosing any number other than the mean affect squared deviations?

Answer

A

Choosing any number other than the mean results in a larger total of squared deviations compared to choosing the mean.

Question 52

Q

What is the effect of the arithmetic mean on squared deviations?

Answer

A

The arithmetic mean minimizes the total squared deviations (differences) between itself and the data points. This means the sum of the squared differences between each data point and the mean is the smallest possible compared to any other number.

Question 53

Q

What does a statistically significant p-value (e.g., < 0.05) for a predictor mean?

Answer

A

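It means the data provide evidence against the null hypothesis that the predictor’s coefficient is zero, i.e., the predictor has a statistically significant association with the outcome variable. Statistical significance alone does not mean the effect is large or causal.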

Question 54

Q

What does a high R-squared value (close to 1) in regression output suggest?

Answer

A

It suggests that the predictor(s) explain a large portion of the variability in the outcome variable.

Question 55

Q

What does a confidence interval that does not include zero around a coefficient indicate?

Answer

A

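It indicates that zero is not a plausible value for the coefficient, so the predictor’s effect on the outcome variable is statistically significant at the matching level (a 95% interval that excludes zero corresponds to p < 0.05 in a two-sided test).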

Question 56

Q

If the R-squared value of your regression model is 0.75, what hypothesis could you make?

Answer

A

Hypothesis: “Approximately 75% of the variation in the outcome variable can be explained by the predictor(s).”

Question 57

Q

How do you interpret a 95% confidence interval for a predictor of (3, 7)?

Answer

A

Interpretation: “We are 95% confident that the true effect of the predictor on the outcome variable lies between 3 and 7 units per one-unit increase in the predictor.”

Question 58

Q

What is the purpose of a confidence interval in regression?

Answer

A

A confidence interval provides a range of values within which the true parameter value is likely to fall, accounting for uncertainty in the estimate.

Question 59

Q

How do you construct a confidence interval around a regression parameter estimate?

Answer

A

Place a “window” (interval) around the parameter estimate, using a margin of error to define the range from a lower bound to an upper bound.
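A minimal sketch of the construction: interval = estimate ± margin of error, where margin = critical value × standard error. In simple regression the critical value typically comes from the t-distribution with n − 2 degrees of freedom, and it is close to 1.96 for 95% in large samples. The inputs below (estimate 5, standard error 1, critical value 2.0) are assumed round numbers that give a ±2 margin. The helper also checks whether the interval excludes zero.

```python
def confidence_interval(estimate, std_error, critical_value):
    """estimate +/- margin of error, with margin = critical value * standard error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin

def excludes_zero(interval):
    """True if zero lies outside the interval (no-effect value is not plausible)."""
    lower, upper = interval
    return lower > 0 or upper < 0

ci = confidence_interval(5.0, std_error=1.0, critical_value=2.0)   # hypothetical inputs
print(ci)                  # (3.0, 7.0)
print(excludes_zero(ci))   # True
```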

Question 60

Q

How do you interpret the confidence interval for a regression parameter?

Answer

A

The interval gives a range of plausible values for the true parameter. For example, a 95% confidence interval means that if you repeated the sampling many times and built an interval the same way each time, about 95% of those intervals would contain the true parameter.

Question 61

Q

If a regression coefficient estimate is +5 and the margin of error is ±2, what is the confidence interval?

Answer

A

The confidence interval is from 3 to 7.

Question 62

Q

What does a confidence interval of 3 to 7 for a regression parameter tell you?

Answer

A

It suggests that the true effect of the predictor is plausibly between 3 and 7 units, with the stated confidence level (e.g., 95%) describing how reliable the interval-building procedure is.

Question 63

Q

What does a 95% confidence level mean when interpreting a confidence interval?

Answer

A

It means that if you took 100 different samples and constructed intervals in the same way, approximately 95 of those intervals would contain the true parameter value.
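A simulation sketch of this long-run meaning. To keep it short, it uses a simpler setting than regression: the mean of normal data with a known standard deviation, using the z-interval mean ± 1.96·σ/√n. It draws many samples and counts how often the interval captures the true mean. All parameter values are arbitrary assumptions.

```python
import math
import random
from statistics import fmean

def coverage_rate(true_mean=50.0, sigma=10.0, n=30, reps=2000, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)                 # seeded so the result is reproducible
    margin = z * sigma / math.sqrt(n)         # known-sigma margin of error
    hits = 0
    for _ in range(reps):
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / reps

print(coverage_rate())   # close to 0.95
```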

Question 64

Q

Why is it important to understand the margin of error in a confidence interval?

Answer

A

The margin of error determines how wide the confidence interval is, reflecting the degree of uncertainty about the parameter estimate.

Answer 65

A

The residual indicates the part of the outcome that the model could not predict, showing the error or difference between the actual and predicted values.

Answer 66

A

Adjusted R-squared accounts for the number of predictors in the model and penalizes adding too many predictors, helping to prevent overfitting.
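A sketch of the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors (excluding the intercept). With R² held fixed, adding predictors lowers the adjusted value. The inputs are hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r_squared(0.75, n=100, p=1))    # about 0.747
print(adjusted_r_squared(0.75, n=100, p=10))   # about 0.722: same R^2, more predictors
```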

Answer 67

A

A simpler model is less likely to overfit the data, meaning it may perform better on new, unseen data.

Answer 68

A

Cross-validation tests how well each model performs on different subsets of data, providing a more reliable sense of how it will perform on new data.
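A minimal k-fold cross-validation sketch. It compares a mean-only model with a straight-line model by their average squared prediction error on held-out points. Assumptions: folds are assigned by taking every k-th point rather than by shuffling, and the data are a made-up line with alternating ±1 noise.

```python
from statistics import fmean

def fit_mean_model(x, y):
    """Baseline: always predict the training mean of y."""
    m = fmean(y)
    return lambda xi: m

def fit_line_model(x, y):
    """Simple least squares line fitted on the training data."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return lambda xi: b0 + b1 * xi

def k_fold_mse(x, y, fit, k=5):
    """Mean squared error on held-out points, averaged over k folds."""
    n = len(x)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))   # every k-th point is held out in this fold
        train_x = [x[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        errors += [(y[i] - model(x[i])) ** 2 for i in test_idx]
    return fmean(errors)

x = list(range(20))
y = [3 + 2 * xi + (1 if xi % 2 else -1) for xi in x]   # hypothetical noisy line
print(k_fold_mse(x, y, fit_mean_model), k_fold_mse(x, y, fit_line_model))
```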

Answer 69

A

A complex model may fit the current data better but risks overfitting, while a simple model may be less accurate but better generalizes to new data.

Answer 70

A

Overfitting occurs when a model is too complex and captures noise in the data rather than the true underlying pattern, leading to poor performance on new data.

Answer 71

A

R-squared shows how well the model explains the variation in the outcome. However, to compare models with different predictors, Adjusted R-squared and other criteria should also be considered.

Answer 72

A

Residuals are the differences between the observed values and the predicted values from a regression model. They represent the part of the outcome that the model couldn’t predict, showing how far off the model’s predictions are from the actual data points.

Brainscape's Knowledge GenomeTM
