Key Terms Flashcards

1
Q

What is overfitting?

A

This occurs when the model fits the training data too closely and fails to generalize well to new, unseen data. It might capture noise or random fluctuations in the data rather than the underlying relationship.
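
A minimal sketch of the idea, assuming hypothetical straight-line data with noise and NumPy's `polyfit`. The degree-9 polynomial always fits the training points at least as closely as the line, but it typically does worse on new points because it has chased the noise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: a straight-line signal plus noise.
x_train = np.linspace(-1.0, 1.0, 12)
x_test = np.linspace(-0.95, 0.95, 50)
y_train = 2.0 * x_train + rng.normal(0.0, 0.3, x_train.size)
y_test = 2.0 * x_test + rng.normal(0.0, 0.3, x_test.size)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, deg=1)    # matches the true structure
flexible = np.polyfit(x_train, y_train, deg=9)  # enough freedom to chase noise

train_mse_simple = mse(simple, x_train, y_train)
train_mse_flexible = mse(flexible, x_train, y_train)
test_mse_simple = mse(simple, x_test, y_test)
test_mse_flexible = mse(flexible, x_test, y_test)

print(f"degree 1: train={train_mse_simple:.3f}  test={test_mse_simple:.3f}")
print(f"degree 9: train={train_mse_flexible:.3f}  test={test_mse_flexible:.3f}")
```

A training error that keeps falling while the test error stalls or rises is the usual warning sign.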

2
Q

What is underfitting?

A

This happens when the model is too simple to capture the underlying structure of the data. It fails to fit the training data adequately and performs poorly on both the training and test data.
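
As an illustration, assuming a hypothetical noise-free curved relationship, a straight line explains almost none of the variation even on the data it was fitted to, while a quadratic captures it fully.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 21)
y = x ** 2  # hypothetical, noise-free curved relationship

def r_squared(y_true, y_pred):
    """Share of the variation in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

line = np.polyfit(x, y, deg=1)   # too simple for this shape
curve = np.polyfit(x, y, deg=2)  # matches the true structure

r2_line = r_squared(y, np.polyval(line, x))
r2_curve = r_squared(y, np.polyval(curve, x))

print(f"R^2 of straight line: {r2_line:.3f}")
print(f"R^2 of quadratic:     {r2_curve:.3f}")
```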

3
Q

What is multicollinearity?

A

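This occurs when two or more independent variables in a model are highly correlated with one another. It makes it hard to separate their individual effects and tends to inflate the standard errors of the coefficient estimates.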
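
One common diagnostic is the variance inflation factor (VIF), computed as 1 / (1 - R²) from regressing each predictor on the others. The sketch below uses hypothetical simulated predictors and a hand-written `vif` helper (not a library function); values well above roughly 5 to 10 are often treated as a rule-of-thumb warning.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)  # nearly a copy of x1
x3 = rng.normal(size=n)              # unrelated to the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2_j)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return float(1.0 / (1.0 - r2))

vifs = [vif(X, j) for j in range(X.shape[1])]
for name, value in zip(["x1", "x2", "x3"], vifs):
    print(f"VIF({name}) = {value:.1f}")
```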

4
Q

What is heteroscedasticity?

A

This occurs when the variance (or standard deviation) of the errors is not constant across observations, for example when the spread changes over time or with the level of a predictor.
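
A rough way to see this, assuming hypothetical data whose error spread grows with the predictor, is to compare the residual variance in the lower and upper halves of the data. This is an informal check only, not a formal statistical test.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 400)
noise = rng.normal(0.0, 0.5 * x)  # error spread grows with x (assumption)
y = 3.0 + 2.0 * x + noise

slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (intercept + slope * x)

# x is sorted, so the first half holds the small-x observations.
half = x.size // 2
var_low = float(np.var(resid[:half]))
var_high = float(np.var(resid[half:]))

print(f"residual variance, low x:  {var_low:.2f}")
print(f"residual variance, high x: {var_high:.2f}")
```

With constant error variance the two numbers would be similar; a large gap suggests heteroscedasticity.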

5
Q

What are outliers?

A

Outliers are data points that lie far away from the rest of the data. They can have a disproportionate influence on the regression model, pulling the estimated line towards them. It’s important to identify and handle outliers appropriately to avoid biased results.
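
A small sketch of the pull, using made-up data that lie exactly on a line with slope 2 and then replacing a single value with an extreme one.

```python
import numpy as np

x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0      # points exactly on a line with slope 2
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0        # one extreme point (true value would be 19)

slope_clean, _ = np.polyfit(x, y_clean, deg=1)
slope_outlier, _ = np.polyfit(x, y_outlier, deg=1)

print(f"slope without outlier: {slope_clean:.2f}")   # 2.00
print(f"slope with outlier:    {slope_outlier:.2f}") # about 6.42
```

One point out of ten more than triples the estimated slope here, because it sits at the edge of the x range where leverage is highest.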

6
Q

What is nonlinearity?

A

If the relationship between the predictor variables and the response variable is not linear, fitting a linear regression model may not capture the true relationship. In such cases, you may need to consider using nonlinear regression techniques or transforming the variables.
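
As one example of transforming a variable, assume a hypothetical exponential relationship y = 2·exp(0.5x). Taking the log of y makes the relationship linear, so an ordinary straight-line fit recovers the parameters.

```python
import numpy as np

x = np.linspace(0.0, 5.0, 30)
y = 2.0 * np.exp(0.5 * x)  # hypothetical exponential relationship

# log(y) = log(2) + 0.5 * x, which is linear in x.
slope, intercept = np.polyfit(x, np.log(y), deg=1)
growth_rate = float(slope)
scale = float(np.exp(intercept))

print(f"estimated growth rate: {growth_rate:.3f}")
print(f"estimated scale:       {scale:.3f}")
```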

7
Q

What is missing data?

A

Missing data can introduce bias and reduce the effectiveness of regression analysis. It’s important to handle missing data appropriately, either through imputation or using techniques that can handle missing values directly.
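
Two simple options, sketched on a made-up predictor with gaps: dropping incomplete rows (listwise deletion) and filling gaps with the observed mean. Mean imputation is only the simplest form of imputation and can understate the variable's spread.

```python
import numpy as np

# Hypothetical columns: x (predictor) has gaps, y (response) is complete.
x = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])

observed = ~np.isnan(x)

# Option 1: listwise deletion keeps only rows where x was observed.
x_complete, y_complete = x[observed], y[observed]

# Option 2: mean imputation fills each gap with the mean of observed x.
x_imputed = np.where(observed, x, np.nanmean(x))

print("rows kept after deletion:", x_complete.size)
print("imputed x:", x_imputed)
```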

8
Q

What is Reverse Causality?

A

This occurs when the cause-effect relationship between two variables is incorrectly determined. For example, instead of A causing B, it’s actually B causing A. This can lead to erroneous conclusions if not properly addressed.

9
Q

What is Omitted-Factors Bias?

A

This happens when important variables that could influence the relationship between the variables of interest are left out of the analysis. As a result, the estimated effects of the included variables may be biased or misleading.
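
A simulated sketch, with all coefficients chosen for illustration: the true effect of `x` on `y` is 1.0, but a correlated factor `z` also drives `y`. Leaving `z` out pushes the estimate for `x` far from 1.0; including it brings the estimate back.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)                      # the factor that gets left out
x = 0.8 * z + 0.6 * rng.normal(size=n)      # x is correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)  # true effect of x is 1.0

def ols(y, *columns):
    """Least-squares coefficients; index 0 is the intercept."""
    design = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

b_short = float(ols(y, x)[1])    # z omitted
b_long = float(ols(y, x, z)[1])  # z included

print(f"coefficient on x, z omitted:  {b_short:.2f}")
print(f"coefficient on x, z included: {b_long:.2f}")
```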

10
Q

What is Self-Selection Bias?

A

This occurs when individuals or groups self-select into a study or treatment group, leading to non-random assignment. For instance, in a survey about job satisfaction, people who respond might be those with particularly strong or weak opinions, skewing the results.
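
The survey example can be simulated under assumed response rates (80% for people with extreme opinions, 20% for everyone else; both figures are invented for illustration). Extreme opinions end up heavily over-represented among respondents.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
scores = rng.integers(1, 6, size=n)      # satisfaction scores 1..5
extreme = (scores == 1) | (scores == 5)  # very weak or very strong opinions

# Assumed response behaviour: extreme opinions respond far more often.
p_respond = np.where(extreme, 0.8, 0.2)
responded = rng.random(n) < p_respond

share_extreme_population = float(extreme.mean())
share_extreme_respondents = float(extreme[responded].mean())

print(f"extreme share in population:  {share_extreme_population:.2f}")
print(f"extreme share in respondents: {share_extreme_respondents:.2f}")
```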

11
Q

What is Bad Data?

A

Poor-quality or inaccurate data can lead to flawed analysis and conclusions. This can include missing data, data entry errors, outliers, or biased data collection methods.

12
Q

What is Measurement Error?

A

Measurement error arises when the observed value of a variable differs from its true value. This can occur due to various reasons such as instrument malfunction, human error, or sampling variability. Measurement error can distort relationships between variables and lead to incorrect inferences.
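
One well-known distortion is attenuation: random noise in a predictor tends to shrink its estimated slope toward zero. The sketch assumes classical (independent, zero-mean) noise with the same variance as the true predictor, which should roughly halve a true slope of 2.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x_true = rng.normal(size=n)
y = 2.0 * x_true + 0.5 * rng.normal(size=n)  # true slope is 2.0
x_observed = x_true + rng.normal(size=n)     # noise variance = signal variance

slope_true, _ = np.polyfit(x_true, y, deg=1)
slope_observed, _ = np.polyfit(x_observed, y, deg=1)

print(f"slope using true x:     {slope_true:.2f}")
print(f"slope using observed x: {slope_observed:.2f}")
```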

13
Q

What are Common Modeling Mistakes?

A

This refers to errors or misconceptions in the modeling process, such as using the wrong statistical technique, misinterpreting results, or applying inappropriate assumptions.

14
Q

What is Using Mediating Factors or Outcomes as Control Variables?

A

Including variables that are on the causal pathway between the independent and dependent variables as control variables can bias estimates and lead to spurious relationships.
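
A simulated sketch with invented coefficients: `x` affects `y` only through a mediator `m`, so the total effect of `x` is 1.5 × 2.0 = 3.0. Adding `m` as a control makes the coefficient on `x` collapse toward zero, which would wrongly suggest that `x` does not matter.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=n)
m = 1.5 * x + rng.normal(size=n)  # mediator on the path from x to y
y = 2.0 * m + rng.normal(size=n)  # x affects y only through m

def ols(y, *columns):
    """Least-squares coefficients; index 0 is the intercept."""
    design = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

b_total = float(ols(y, x)[1])          # total effect of x
b_controlled = float(ols(y, x, m)[1])  # effect of x after controlling for m

print(f"coefficient on x, no mediator control:   {b_total:.2f}")
print(f"coefficient on x, mediator as a control: {b_controlled:.2f}")
```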

15
Q

What is Using an Improper Reference Group?

A

When using categorical variables in regression analysis, choosing an improper reference group can lead to misinterpretation of coefficients and incorrect conclusions.
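
Each dummy coefficient is a difference from the reference group, so the same data produce different-looking numbers under different baselines. The sketch uses made-up groups A, B and C with means 10, 12 and 15 and a hypothetical helper `fit_with_reference`.

```python
import numpy as np

groups = np.array(["A", "A", "B", "B", "C", "C"])
y = np.array([9.0, 11.0, 11.0, 13.0, 14.0, 16.0])  # group means 10, 12, 15

def fit_with_reference(reference):
    """Regress y on dummies for every group except the reference."""
    levels = [g for g in ["A", "B", "C"] if g != reference]
    dummies = [(groups == g).astype(float) for g in levels]
    design = np.column_stack([np.ones(len(y)), *dummies])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(["intercept", *levels], beta.round(6)))

ref_a = fit_with_reference("A")  # intercept is the mean of A
ref_c = fit_with_reference("C")  # intercept is the mean of C

print("reference A:", ref_a)
print("reference C:", ref_c)
```

With A as the reference, B's coefficient is +2; with C as the reference, it is -3. Both describe the same group mean of 12.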

16
Q

What is Over-weighting Groups (When Using Fixed Effects or Dummy Variables)?

A

In panel data analysis or regression models with dummy variables, over-weighting certain groups (e.g., certain time periods or categories) can distort results and lead to biased estimates.

17
Q

What is Relationship Identification?

A

To understand and quantify the relationship between a dependent variable and one or more independent variables.

18
Q

What is Prediction?

A

To predict the value of the dependent variable based on the values of the independent variables.

19
Q

What is Control?

A

To hold the effects of other independent variables constant when examining the relationship between the dependent variable and a variable of interest.

20
Q

What is Model Evaluation?

A

To assess the goodness-of-fit of the regression model and determine how well it explains the variation in the dependent variable.
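
One common goodness-of-fit measure is R², the share of the variation in the dependent variable that the fitted model accounts for. A minimal sketch on made-up data:

```python
import numpy as np

# Hypothetical observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

ss_res = float(np.sum((y - y_hat) ** 2))     # unexplained variation
ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variation
r_squared = 1.0 - ss_res / ss_tot

print(f"R^2 = {r_squared:.4f}")
```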

21
Q

What is Inference?

A

To make statistical inferences about the population parameters based on the sample data, such as testing hypotheses about the regression coefficients.
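
For example, testing whether a slope differs from zero usually starts from its t-statistic: the estimate divided by its standard error. The sketch below computes it by hand for a simple regression on made-up data, assuming the usual OLS error assumptions hold.

```python
import numpy as np

# Hypothetical observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (intercept + slope * x)

dof = x.size - 2              # two estimated parameters
sigma2 = resid @ resid / dof  # estimated error variance
se_slope = float(np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2)))
t_stat = float(slope / se_slope)

print(f"slope = {slope:.3f}, SE = {se_slope:.4f}, t = {t_stat:.1f}")
```

The t-statistic is then compared with a t distribution with `dof` degrees of freedom to obtain a p-value.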

22
Q

What is Forecasting?

A

To forecast future values of the dependent variable based on historical data.