Test Prep Flashcards

Question

What does a correlation coefficient of 1 indicate?

Answer 1

A correlation coefficient of 1 indicates a perfect positive correlation; as one variable increases, the other variable increases proportionally.

Answer 2

A correlation coefficient of -1 indicates a perfect negative correlation; as one variable increases, the other variable decreases proportionally.

Answer 3

A correlation coefficient of 0 indicates no linear relationship; the variables do not have a consistent pattern of moving together.
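The three cases above (1, -1, and 0) can be checked with a small Python sketch. The helper name `pearson_r` and the data are made up for illustration; the formula is the standard Pearson correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of r)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])   # y rises in a perfect line
r_neg = pearson_r(x, [10 - 3 * v for v in x])  # y falls in a perfect line
r_zero = pearson_r(x, [2, 1, 3, 1, 2])         # no linear trend in y

print(r_pos, r_neg, r_zero)
```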

Answer 4

A positive correlation coefficient means that as one variable increases, the other variable also increases.

Answer 5

A negative correlation coefficient means that as one variable increases, the other variable decreases.

Answer 6

A correlation coefficient close to 0 indicates a weak or negligible relationship between the variables.

Answer 7

This suggests a moderate relationship between the variables. (Subject to interpretation)

Answer 8

This indicates a strong relationship; the variables move closely in sync with each other.

Answer 9

Correlation does not imply causation; a high correlation doesn’t mean one variable causes the other.

Answer 10

The correlation coefficient captures linear relationships only; non-linear relationships are not well represented.

Answer 11

Outliers can heavily influence the correlation, making it seem stronger or weaker than it actually is for most of the data.
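Both limitations, non-linear patterns and outliers, can be seen in a short Python sketch. The data sets and the helper name `pearson_r` are hypothetical, chosen only to make the effects easy to see.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

# Non-linear: y is completely determined by x (y = x squared),
# yet the linear correlation is zero.
x_curve = [-2, -1, 0, 1, 2]
r_curve = pearson_r(x_curve, [v ** 2 for v in x_curve])

# Outlier: five points with no linear trend, then one extreme point added.
x_base, y_base = [1, 2, 3, 4, 5], [2, 1, 3, 1, 2]
r_without = pearson_r(x_base, y_base)
r_with = pearson_r(x_base + [20], y_base + [20])

print(r_curve, r_without, r_with)
```

In this made-up data, a single extreme point is enough to turn a correlation of zero into a strong positive one.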

Answer 12

A positive slope coefficient means that as the predictor variable increases, the outcome variable is expected to increase as well. It shows the rate of increase in the outcome for every one-unit increase in the predictor.

Answer 13

A negative slope coefficient means that as the predictor variable increases, the outcome variable is expected to decrease. It shows the rate of decrease in the outcome for every one-unit increase in the predictor.

Answer 14

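The magnitude of a slope coefficient indicates how much the outcome variable is expected to change for each one-unit increase in the predictor. Larger coefficients mean a bigger change in the outcome per unit, while smaller coefficients mean a smaller change. Because the size depends on the predictor's units, coefficients are only directly comparable when the predictors are on the same scale.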

Answer 15

Use the intercept as the baseline value of the outcome when predictors are zero. Add the product of the slope coefficients and their corresponding predictor values to the intercept to make predictions about the outcome.
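As a minimal sketch of that prediction rule in Python: the function name `predict` and the fitted values (intercept 50, slopes 4 and 2) are hypothetical, not taken from any real model.

```python
def predict(intercept, slopes, values):
    """Intercept plus the sum of each slope times its predictor value."""
    return intercept + sum(b * x for b, x in zip(slopes, values))

# Hypothetical model: score = 50 + 4 * hours_studied + 2 * hours_slept
intercept = 50.0
slopes = [4.0, 2.0]

baseline = predict(intercept, slopes, [0.0, 0.0])         # all predictors zero
predicted_score = predict(intercept, slopes, [5.0, 7.0])  # 50 + 20 + 14

print(baseline, predicted_score)
```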

Answer 16

The t-distribution helps determine if the estimates from the regression are statistically significant and reliable.

Answer 17

The t-statistic tells us whether a variable in the model has a detectable effect on what you're trying to predict. It does this by comparing how big the variable's estimated effect (its coefficient) is to the variability of that estimate (its standard error). A large t-statistic, in absolute value, means the effect is unlikely to be random noise, while a small one means it might just be noise.

Example: If you're analyzing how hours studied affects exam scores, the t-statistic for the coefficient of hours studied helps determine whether the relationship you observe is likely due to a true effect of study hours on scores, or whether it might be just a coincidence in your sample.
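A rough Python sketch of the hours-studied example, assuming a simple one-predictor regression. The data are invented, and the standard-error formula is the usual one for a simple linear regression slope.

```python
from math import sqrt

# Hypothetical data: hours studied and exam scores for six students
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 71]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
sxx = sum((x - mean_x) ** 2 for x in hours)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual variance uses n - 2 degrees of freedom (two estimated parameters)
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hours, scores))
std_error = sqrt(rss / (n - 2) / sxx)

# t-statistic: size of the effect relative to its variability
t_stat = slope / std_error

print(slope, std_error, t_stat)
```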

Answer 18

The p-value tells you how likely it is to observe your data (or something more extreme) if H₀ were true. A low p-value means that it’s unlikely the data you observed could have occurred just by random chance under H₀.

Answer 19

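A p-value of 0.03 means that, if the null hypothesis were true (no real effect), there would be a 3% chance of observing an effect at least as extreme as the one in your sample. It is not the probability that the observed effect is due to chance.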

Answer 20

The t-distribution is used to calculate the p-value from the t-statistic, which tells you whether your results are plausibly due to chance or reflect a true effect. Imagine you're checking if a new study method really works. After analyzing the test score changes, you calculate a t-statistic of 2.45. Using the t-distribution, you find that there's only a 3.7% chance you'd see a result at least this extreme if the method didn't actually improve scores. Since this is a small chance, you conclude the method likely does have an effect.

Answer 21

The goal is to find the line or model that minimizes the total squared differences between the observed values and the predicted values from the model.
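A small Python sketch of that goal for a straight line, using the standard closed-form least-squares formulas. The data and the function names `fit_line` and `sse` are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
best = sse(xs, ys, b0, b1)
nudged_slope = sse(xs, ys, b0, b1 + 0.1)      # any other line does worse
shifted_line = sse(xs, ys, b0 + 0.5, b1)

print(b0, b1, best, nudged_slope, shifted_line)
```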

Answer 22

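Squaring the differences makes every error positive (so errors in opposite directions cannot cancel out) and amplifies larger errors more than smaller ones, so the model works hardest to avoid the biggest discrepancies. A side effect is that the fit becomes sensitive to outliers.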

Answer 23

The arithmetic mean minimizes the sum of squared differences between itself and all data points, making it the best estimate to represent the data.

Answer 24

Using any number other than the arithmetic mean will result in larger total squared differences compared to using the mean.
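This can be checked directly with a few lines of Python. The data set and the helper name `sum_sq_dev` are hypothetical.

```python
def sum_sq_dev(data, center):
    """Total squared difference between each data point and `center`."""
    return sum((x - center) ** 2 for x in data)

data = [2, 4, 4, 5, 10]
mean = sum(data) / len(data)            # 5.0

at_mean = sum_sq_dev(data, mean)        # 9 + 1 + 1 + 0 + 25 = 36
at_four = sum_sq_dev(data, 4)           # 4 + 0 + 0 + 1 + 36 = 41
at_six = sum_sq_dev(data, 6)            # 16 + 4 + 4 + 1 + 16 = 41

print(at_mean, at_four, at_six)
```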

Answer 25

The arithmetic mean is the "best" average because it's the number that fits the data most evenly (it is located at the center of the data). It makes the total of all the squared gaps between itself and the data points as small as possible.

Answer 26

A mathematical method used to find the best-fitting line or model for a set of data points by minimizing the sum of the squared differences (errors) between the observed values and the values predicted by the model.

Answer 27

Choosing any number other than the mean results in a larger total of squared deviations compared to choosing the mean.

Answer 28

The arithmetic mean minimizes the total squared deviations (differences) between itself and the data points. This means the sum of the squared differences between each data point and the mean is the smallest possible compared to any other number.

Answer 29

It means there is evidence that the predictor has a significant effect on the outcome variable.

Answer 30

It suggests that the predictor(s) explain a large portion of the variability in the outcome variable.

Answer 31

It indicates that the predictor has a meaningful effect on the outcome variable.

Answer 32

Hypothesis: "Approximately 75% of the variation in the outcome variable can be explained by the predictor(s)."

Answer 33

Hypothesis: "The effect of the predictor on the outcome variable is likely between 3 and 7 units."

Answer 34

A confidence interval provides a range of values within which the true parameter value is likely to fall, accounting for uncertainty in the estimate.

Answer 35

Place a “window” (interval) around the parameter estimate, using a margin of error to define the range from a lower bound to an upper bound.
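A minimal Python sketch of placing that window. The estimate (5), standard error (1), and critical value (2, a rounded stand-in for the usual 95% value of about 1.96) are assumptions picked so the result matches the 3-to-7 example used on other cards.

```python
def confidence_interval(estimate, std_error, critical_value):
    """Return (lower, upper) bounds: estimate plus or minus the margin of error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin

# Hypothetical slope estimate of 5 with a standard error of 1
lower, upper = confidence_interval(5.0, 1.0, 2.0)

print(lower, upper)
```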

Answer 36

The interval represents a range of values within which you are confident the true parameter value lies. For example, a 95% confidence interval means that 95 out of 100 intervals constructed this way would contain the true parameter.

Answer 37

The confidence interval is from 3 to 7.

Answer 38

It suggests that the true effect of the predictor is likely between 3 and 7 units, and you can be confident about this range based on your sample data.

Answer 39

It means that if you took 100 different samples and constructed intervals in the same way, approximately 95 of those intervals would contain the true parameter value.
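The repeated-sampling idea can be simulated in Python. This sketch makes simplifying assumptions: the population is normal, its standard deviation is treated as known, and the true mean, sample size, and seed are arbitrary choices.

```python
import random
from math import sqrt

rng = random.Random(0)           # fixed seed so the run is repeatable
TRUE_MEAN, SIGMA, N = 10.0, 2.0, 30
Z_95 = 1.96                      # normal critical value for 95% confidence

def interval_covers_truth():
    """Draw one sample, build a 95% interval, report if it contains the truth."""
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    margin = Z_95 * SIGMA / sqrt(N)   # sigma treated as known for simplicity
    return mean - margin <= TRUE_MEAN <= mean + margin

trials = 1000
coverage = sum(interval_covers_truth() for _ in range(trials)) / trials

print(coverage)   # should land close to 0.95
```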

Answer 40

The margin of error determines how wide the confidence interval is, reflecting the degree of uncertainty about the parameter estimate.

Answer 41

The residual indicates the part of the outcome that the model could not predict, showing the error or difference between the actual and predicted values.

Answer 42

Adjusted R-squared accounts for the number of predictors in the model and penalizes adding too many predictors, helping to prevent overfitting.
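The penalty can be seen in the standard adjusted R-squared formula, sketched here in Python. The R-squared values, sample size, and predictor counts are invented for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical comparison on 50 observations:
# 2 predictors give R-squared 0.75; 10 predictors nudge it up to 0.76.
small_model = adjusted_r2(0.75, n=50, p=2)
big_model = adjusted_r2(0.76, n=50, p=10)

print(small_model, big_model)
```

Here the larger model has the higher R-squared but the lower adjusted R-squared, because the small gain in fit does not make up for the eight extra predictors.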

Answer 43

A simpler model is less likely to overfit the data, meaning it may perform better on new, unseen data.

Answer 44

Cross-validation tests how well each model performs on different subsets of data, providing a more reliable sense of how it will perform on new data.
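A bare-bones k-fold cross-validation sketch in Python, comparing a mean-only model with a straight-line model. The data, the fold assignment (every k-th point), and the function names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def cv_mse(xs, ys, k, use_slope):
    """Mean squared prediction error on held-out folds."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        if use_slope:
            b0, b1 = fit_line(train_x, train_y)
        else:
            b0, b1 = sum(train_y) / len(train_y), 0.0  # mean-only model
        # Score only on points the model did not see during fitting
        errors += [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test]
    return sum(errors) / len(errors)

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

mse_line = cv_mse(xs, ys, k=5, use_slope=True)
mse_mean = cv_mse(xs, ys, k=5, use_slope=False)

print(mse_line, mse_mean)
```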

Answer 45

A complex model may fit the current data better but risks overfitting, while a simple model may be less accurate but better generalizes to new data.

Answer 46

Overfitting occurs when a model is too complex and captures noise in the data rather than the true underlying pattern, leading to poor performance on new data.

Answer 47

R-squared shows how well the model explains the variation in the outcome. However, to compare models with different predictors, Adjusted R-squared and other criteria should also be considered.
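As a sketch, R-squared can be computed in Python as one minus the ratio of unexplained to total variation. The observed values are made up, and the predictions are those of the least-squares line fitted to them (intercept 2.2, slope 0.6, for x = 1 to 5).

```python
def r_squared(observed, predicted):
    """Share of the outcome's variation explained by the predictions."""
    mean_y = sum(observed) / len(observed)
    rss = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained
    tss = sum((y - mean_y) ** 2 for y in observed)                # total
    return 1 - rss / tss

observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(observed, predicted)
print(r2)   # about 0.6, i.e. roughly 60% of the variation is explained
```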

Answer 48

Residuals are the differences between the observed values and the predicted values from a regression model. They represent the part of the outcome that the model couldn't predict, showing how far off the model's predictions are from the actual data points.
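In Python, with hypothetical observed and predicted values, the residuals are just the element-wise differences:

```python
observed = [3.0, 5.0, 7.0]
predicted = [2.5, 5.5, 7.0]   # hypothetical model output

# Residual = observed value minus predicted value
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

print(residuals)   # positive: model predicted too low; negative: too high
```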

Answer 49

The F-test helps you determine if adding more predictors to your model significantly improves its ability to explain the outcome, comparing a more complex model to a simpler one.
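A sketch of the F-statistic for comparing two nested models, written in Python. The residual sums of squares and model sizes are invented numbers; turning the statistic into a p-value requires the F-distribution, which is left out here.

```python
def f_statistic(rss_simple, rss_complex, extra_predictors, n, p_complex):
    """F-statistic for a simple model nested inside a more complex one.

    rss_simple / rss_complex: residual sums of squares of the two models
    extra_predictors: how many predictors the complex model adds
    n: number of observations
    p_complex: predictors in the complex model (not counting the intercept)
    """
    improvement = (rss_simple - rss_complex) / extra_predictors
    noise = rss_complex / (n - p_complex - 1)
    return improvement / noise

# Hypothetical case: adding 2 predictors drops the RSS from 120 to 80
f_stat = f_statistic(120.0, 80.0, extra_predictors=2, n=50, p_complex=4)

print(f_stat)
```

A large value means the drop in error per added predictor is big compared with the error that remains.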

Answer 50

The F-test answers: "Does this model with more predictors do a better job than a simpler model?"

Answer 51

The F-test checks if adding a new predictor to your model meaningfully improves the predictions. If the F-test result is significant (low p-value), the new model is better.

Answer 52

Adjusted R-squared is more reliable for comparing models because, unlike R-squared (which never decreases when a predictor is added), it increases only if a new predictor improves the fit enough to outweigh the penalty for the extra term.

Answer 53

Adjusted R-squared prevents you from adding too many unnecessary variables, as it adjusts the fit measure to account for the number of predictors.

Answer 54

The F-test shows if adding predictors improves the model, while Adjusted R-squared ensures that you're not adding predictors that don’t really help, giving you a better quality fit.

Answer 55

The F-test checks if adding more predictors improves the model overall, while Adjusted R-squared gives a more accurate measure of how well the model explains the data without adding unnecessary complexity.

Answer 56

The null hypothesis states that the predictor has no effect on the outcome, meaning the coefficient (slope) is zero.

Answer 57

The alternative hypothesis states that the predictor does affect the outcome, meaning the coefficient (slope) is not zero.

Answer 58

A t-test is used to check how far the estimated coefficient (slope) is from zero, determining if the predictor has a significant effect on the outcome.

Answer 59

If the p-value is less than 0.05, we reject the null hypothesis, meaning we have evidence that the predictor affects the outcome.

Answer 60

The p-value tells us the probability of observing a coefficient at least this far from zero if the null hypothesis (no effect) were true. A small p-value suggests that the predictor has a significant effect.

Answer 61

If the p-value is greater than 0.05, we fail to reject the null hypothesis, meaning we do not have enough evidence that the predictor affects the outcome.

Answer 62

A small p-value suggests that the predictor has a significant effect on the outcome, and we should reject the null hypothesis.

Answer 63

Rejecting the null hypothesis means we've found evidence that the predictor affects the outcome.

Answer 64

Failing to reject the null hypothesis means there is not enough evidence that the predictor has a significant effect on the outcome (the slope could be zero). It does not prove that the predictor has no effect.

Answer 65

Adding another predictor provides more information to the model, helping explain the outcome more effectively by considering additional factors.

Answer 66

No, each predictor keeps its own effect and gets its own coefficient (slope), showing how it affects the outcome independently of the others.

Answer 67

Adding a predictor can improve the model’s accuracy because it helps explain more of the variability in the outcome by accounting for additional factors.

Answer 68

Each new predictor gets its own slope (coefficient), which shows how much the outcome changes with that predictor, assuming other factors are held constant.

Answer 69

No, sometimes adding a predictor doesn't help much if it doesn't explain much more about the outcome. It may not improve the model's predictions.

Answer 70

A useful predictor improves the model by explaining additional variability in the outcome, making predictions more accurate.

Answer 71

Each predictor affects the outcome independently, and the model calculates how much the outcome changes with each predictor while keeping the others constant.

Answer 72

Qualitative input features are categorical variables that represent groups or categories, such as gender, education level, or location.

Answer 73

We need to code qualitative features because regression models require numerical input, and coding converts categories into a numerical format.

Answer 74

Dummy coding converts a qualitative feature into multiple binary variables, each representing a category, with one category serving as the reference group.

Answer 75

You would create two dummy variables:
Color_Red: 1 if Red, 0 otherwise
Color_Blue: 1 if Blue, 0 otherwise
(Green would be the reference category)
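A minimal Python sketch of this coding. The function name `dummy_code` and the sample list of colors are hypothetical; the column names follow the card above.

```python
def dummy_code(values, categories, reference, prefix):
    """Build one 0/1 column per category, skipping the reference category."""
    columns = {}
    for category in categories:
        if category == reference:
            continue  # the reference group is the row where all dummies are 0
        columns[f"{prefix}_{category}"] = [
            1 if value == category else 0 for value in values
        ]
    return columns

colors = ["Red", "Blue", "Green", "Red"]
columns = dummy_code(colors, ["Red", "Blue", "Green"],
                     reference="Green", prefix="Color")

print(columns)
```

A Green row shows up as 0 in both columns, which is why it needs no column of its own.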

Answer 76

It indicates how much the outcome variable is expected to change when that category is present compared to the reference category.

Answer 77

You could have:
Intercept = $200,000
Coefficient for Location_Urban = +$50,000
Coefficient for Location_Suburban = +$30,000

Answer 78

If the house is in an Urban location, the predicted price is $200,000 + $50,000 = $250,000.
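The same arithmetic as a Python sketch, using the intercept and coefficients from the cards above. The reference category is not named on the cards, so the label "Rural" here is an assumption.

```python
INTERCEPT = 200_000
LOCATION_EFFECTS = {"Urban": 50_000, "Suburban": 30_000}

def predicted_price(location):
    """Intercept plus the dummy coefficient for the location, if any."""
    # Any location without a dummy coefficient is treated as the reference group
    return INTERCEPT + LOCATION_EFFECTS.get(location, 0)

urban = predicted_price("Urban")
suburban = predicted_price("Suburban")
reference = predicted_price("Rural")   # assumed name for the reference category

print(urban, suburban, reference)
```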

Answer 79

Coding allows you to include non-numeric categories in the regression model, helping you understand their influence on the outcome variable.

Answer 80

Think of qualitative features as different flavors; coding them lets you see how each "flavor" affects the overall "taste" (outcome) you are predicting!

(104 cards)