Key Terms Flashcards
What is overfitting?
This occurs when the model fits the training data too closely and fails to generalize well to new, unseen data. It might capture noise or random fluctuations in the data rather than the underlying relationship.
What is underfitting?
This happens when the model is too simple to capture the underlying structure of the data. It fails to fit the training data adequately and performs poorly on both the training and test data.
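The two cards above can be contrasted with a small simulation. This is only a sketch under stated assumptions: the "true" relationship y = 2x + 1, the noise level, and the three toy models (`underfit`, `linear`, `overfit`) are all made up for illustration.

```python
import random

random.seed(0)

def make_data(n):
    """Noisy samples of the assumed true relationship y = 2x + 1."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 2) for x in xs]
    return xs, ys

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50)
test_x, test_y = make_data(50)

# Underfit: ignore x and always predict the training mean.
mean_y = sum(train_y) / len(train_y)
def underfit(x):
    return mean_y

# Reasonable fit: a straight line estimated from the training data.
a, b = ols(train_x, train_y)
def linear(x):
    return a + b * x

# Overfit: memorize the training set (1-nearest-neighbor lookup).
def overfit(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("underfit", underfit), ("linear", linear), ("overfit", overfit)]:
    print(name, "train MSE:", round(mse(model, train_x, train_y), 2),
          "test MSE:", round(mse(model, test_x, test_y), 2))
```

The memorizing model has zero training error but a nonzero test error, while the mean-only model has a high error on both sets.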
What is multicollinearity?
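Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with one another, which makes their individual coefficient estimates unstable and hard to interpret.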
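One common diagnostic is the variance inflation factor (VIF). The sketch below assumes a model with exactly two predictors, in which case the VIF reduces to 1 / (1 - r^2), where r is their correlation. The variables `x1`, `x2`, and `x3` are simulated, and the helper names are hypothetical.

```python
import random

random.seed(1)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(xs, ys):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1 / (1 - r ** 2)

n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + random.gauss(0, 0.1) for v in x1]  # nearly a copy of x1
x3 = [random.gauss(0, 1) for _ in range(n)]  # generated independently of x1

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
print("VIF for x1 with x2:", round(vif_collinear, 1))
print("VIF for x1 with x3:", round(vif_independent, 2))
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a warning sign, though the cutoff is a convention rather than a strict test.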
What is heteroscedasticity?
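Heteroscedasticity occurs when the variability (variance or standard deviation) of a variable, or of a regression model's error term, is not constant across observations, for example across time periods or across values of a predictor.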
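A minimal sketch of non-constant spread, assuming simulated data in which the noise grows with x. It fits a line and compares the residual variance in the lower and upper halves of x. This is an informal check in the spirit of a Goldfeld-Quandt comparison, not a formal test.

```python
import random
import statistics

random.seed(2)

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Assumed data-generating process: the noise standard deviation is 0.5 * x.
xs = sorted(random.uniform(0, 10) for _ in range(400))
ys = [3 + 2 * x + random.gauss(0, 0.5 * x) for x in xs]

a, b = ols(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# xs is sorted, so the first half of the residuals belongs to the small x values.
half = len(xs) // 2
var_low = statistics.pvariance(residuals[:half])
var_high = statistics.pvariance(residuals[half:])
print("residual variance, low x: ", round(var_low, 2))
print("residual variance, high x:", round(var_high, 2))
```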
What are outliers?
Outliers are data points that lie far away from the rest of the data. They can have a disproportionate influence on the regression model, pulling the estimated line towards them. It’s important to identify and handle outliers appropriately to avoid biased results.
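The pull an outlier exerts on a fitted line can be seen with a tiny made-up data set: ten points that lie exactly on y = 2x + 1, plus one hypothetical outlier at (20, 0).

```python
def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Ten clean points on the line y = 2x + 1.
clean_x = list(range(10))
clean_y = [2 * x + 1 for x in clean_x]

# The same data with a single far-away point added.
dirty_x = clean_x + [20]
dirty_y = clean_y + [0]

_, slope_clean = ols(clean_x, clean_y)
_, slope_dirty = ols(dirty_x, dirty_y)
print("slope without outlier:", round(slope_clean, 3))  # 2.0
print("slope with outlier:   ", round(slope_dirty, 3))  # roughly 0.08
```

One point out of eleven is enough to flatten the fitted slope almost completely, because it lies far from the other points in both x and y.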
What is nonlinearity?
If the relationship between the predictor variables and the response variable is not linear, fitting a linear regression model may not capture the true relationship. In such cases, you may need to consider using nonlinear regression techniques or transforming the variables.
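As an illustration of transforming a variable, the sketch below assumes an exponential relationship y = 3 * exp(0.5x) with no noise. A straight line fits the raw y poorly, but fits log(y) exactly.

```python
import math

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(xs, ys):
    """Share of the variance in ys explained by a straight-line fit on xs."""
    a, b = ols(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = list(range(10))
ys = [3 * math.exp(0.5 * x) for x in xs]  # assumed nonlinear relationship
log_ys = [math.log(y) for y in ys]        # log transform of the response

r2_raw = r_squared(xs, ys)
r2_log = r_squared(xs, log_ys)
intercept, slope = ols(xs, log_ys)

print("R^2, linear fit on y:     ", round(r2_raw, 3))
print("R^2, linear fit on log(y):", round(r2_log, 3))
print("recovered growth rate:", round(slope, 3), "scale:", round(math.exp(intercept), 3))
```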
What is missing data?
Missing data are values that were not recorded for some observations. They can introduce bias and reduce the effectiveness of regression analysis. It’s important to handle missing data appropriately, either through imputation or by using techniques that can handle missing values directly.
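Two simple ways of handling missing values are dropping incomplete cases and mean imputation. The sketch below uses a made-up list in which `None` marks a missing value. It also shows one known drawback of mean imputation: it shrinks the variance.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
values = [4.0, None, 6.0, None, 8.0]

# Option 1: listwise deletion keeps only the observed values.
complete = [v for v in values if v is not None]

# Option 2: mean imputation replaces each missing value with the observed mean.
observed_mean = statistics.mean(complete)
imputed = [observed_mean if v is None else v for v in values]

print("complete cases:", complete)
print("mean-imputed:  ", imputed)
print("variance (complete cases):", round(statistics.pvariance(complete), 3))
print("variance (mean-imputed):  ", round(statistics.pvariance(imputed), 3))
```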
What is Reverse Causality?
This occurs when the direction of the cause-effect relationship between two variables is misidentified. For example, instead of A causing B, it’s actually B causing A. This can lead to erroneous conclusions if not properly addressed.
What is Omitted-Factors Bias?
This happens when important variables that could influence the relationship between the variables of interest are left out of the analysis. As a result, the estimated effects of the included variables may be biased or misleading.
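A small simulation can make the bias visible. Everything here is an assumption for illustration: a confounder `z` drives both `x` and `y`, and the true effect of `x` on `y` is set to 1. The adjusted slope is obtained by first removing the part of `x` and `y` explained by `z`, which gives the same coefficient as including `z` in the regression.

```python
import random

random.seed(3)

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(xs, ys):
    """Part of ys that is left after a straight-line fit on xs."""
    a, b = ols(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

n = 2000
z = [random.gauss(0, 1) for _ in range(n)]                      # omitted factor
x = [zi + random.gauss(0, 1) for zi in z]                       # z influences x
y = [xi + 2 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]  # true effect of x is 1

# Leaving z out: the slope on x also picks up the effect of z.
_, slope_omitted = ols(x, y)

# Controlling for z: regress the z-residuals of y on the z-residuals of x.
_, slope_adjusted = ols(residuals(z, x), residuals(z, y))

print("slope with z omitted:   ", round(slope_omitted, 2))
print("slope with z controlled:", round(slope_adjusted, 2))
```

With these settings the slope that omits `z` lands near 2 rather than 1, while the adjusted slope stays close to the true value.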
What is Self-Selection Bias?
This occurs when individuals or groups self-select into a study or treatment group, leading to non-random assignment. For instance, in a survey about job satisfaction, people who respond might be those with particularly strong or weak opinions, skewing the results.
What is Bad Data?
Poor-quality or inaccurate data can lead to flawed analysis and conclusions. This can include missing data, data entry errors, outliers, or biased data collection methods.
What is Measurement Error?
Measurement error arises when the observed value of a variable differs from its true value. This can occur due to various reasons such as instrument malfunction, human error, or sampling variability. Measurement error can distort relationships between variables and lead to incorrect inferences.
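One well-known consequence is attenuation: random error in a predictor pulls its estimated slope toward zero. The sketch below assumes a true slope of 2 and measurement noise with the same variance as the predictor itself; all values are simulated.

```python
import random

random.seed(4)

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

n = 2000
true_x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * x + random.gauss(0, 1) for x in true_x]      # assumed true slope is 2
observed_x = [x + random.gauss(0, 1) for x in true_x]  # x recorded with error

_, slope_true = ols(true_x, y)
_, slope_observed = ols(observed_x, y)

print("slope using the true x:    ", round(slope_true, 2))
print("slope using the observed x:", round(slope_observed, 2))
```

Because the error variance equals the variance of the true predictor here, the estimated slope is expected to shrink to about half of its true value.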
What are Common Modeling Mistakes?
This refers to errors or misconceptions in the modeling process, such as using the wrong statistical technique, misinterpreting results, or applying inappropriate assumptions.
What is Using Mediating Factors or Outcomes as Control Variables?
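Including variables that lie on the causal pathway between the independent and dependent variables as control variables can bias estimates: the control absorbs the part of the effect that operates through the mediator, so the estimated effect of the independent variable is misleading.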
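The sketch below simulates a hypothetical causal chain x -> m -> y in which all of the effect of `x` on `y` runs through the mediator `m`, with an assumed total effect of 2. Controlling for `m` is done by removing the part of `x` and `y` explained by `m` before fitting.

```python
import random

random.seed(5)

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(xs, ys):
    """Part of ys that is left after a straight-line fit on xs."""
    a, b = ols(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
m = [xi + random.gauss(0, 1) for xi in x]      # x affects the mediator
y = [2 * mi + random.gauss(0, 1) for mi in m]  # the mediator affects y

# Total effect of x on y, with no control for the mediator.
_, total_effect = ols(x, y)

# Effect of x after controlling for the mediator.
_, controlled_effect = ols(residuals(m, x), residuals(m, y))

print("effect of x, mediator not controlled:", round(total_effect, 2))
print("effect of x, mediator controlled:    ", round(controlled_effect, 2))
```

With the mediator controlled, the estimated effect of `x` drops to roughly zero even though `x` does change `y` in this setup.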
What is Using an Improper Reference Group?
When using categorical variables in regression analysis, choosing an improper reference group can lead to misinterpretation of coefficients and incorrect conclusions.
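In a regression with a single categorical predictor coded as dummy variables, the intercept equals the mean of the reference group and each coefficient equals that group's mean minus the reference mean. The sketch below uses made-up groups and a hypothetical helper, `dummy_coefficients`, to show how the same data yield different coefficients under different reference groups.

```python
from statistics import mean

# Hypothetical outcome values for three groups.
groups = {
    "control": [10.0, 12.0],
    "low": [14.0, 16.0],
    "high": [20.0, 22.0],
}

def dummy_coefficients(groups, reference):
    """Return (intercept, coefficients) for dummy coding with the given reference."""
    base = mean(groups[reference])
    coefs = {name: mean(vals) - base
             for name, vals in groups.items() if name != reference}
    return base, coefs

print(dummy_coefficients(groups, "control"))
print(dummy_coefficients(groups, "high"))
```

Each coefficient is a comparison against the chosen reference, so a coefficient of 4 for "low" means "4 higher than control", not "4 higher than average".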