Class 8 - Predictive Analytics Using Linear Regression Flashcards

1
Q

What are extreme values, what are the potential reasons for them, and how do we address them?

A

Definition: Extreme values (or influential observations) are data points that are unusually large or small compared to the rest of the dataset.

These values can bias the results of a regression analysis and reduce the accuracy of predictions.

Potential Reasons:
Data Entry Errors: Mistakes made when entering data (e.g., typos).
Measurement Errors: Inaccurate data collection or recording.

Address Extreme Values:
Identify the Extreme Values:
Use graphs (e.g., scatterplots, boxplots) to visually detect outliers.

Transform the Data:
Apply transformations (e.g., log transformation) to reduce the impact of extreme values.
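
The identification and transformation steps can be sketched in a few lines of Python. This is a minimal illustration, not part of the original card: the 1.5 × IQR rule is the convention boxplots commonly use to draw their whiskers and is assumed here as a numeric stand-in for visual inspection; the function names and the `incomes` data are hypothetical, and `log1p` assumes the values are non-negative.

```python
import numpy as np

def flag_extreme_iqr(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (assumed boxplot-style rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def log_transform(values):
    """Apply log(1 + x) to compress large values; assumes values >= 0."""
    return np.log1p(np.asarray(values, dtype=float))

# Hypothetical data: nine ordinary values and one extreme value.
incomes = np.array([32, 35, 38, 40, 41, 43, 45, 47, 50, 900], dtype=float)

mask = flag_extreme_iqr(incomes)   # boolean array, True where a value is extreme
logged = log_transform(incomes)    # same length, compressed scale

print("Flagged values:", incomes[mask])
print("Range before:", incomes.max() - incomes.min())
print("Range after log:", logged.max() - logged.min())
```

The log transform keeps every observation but shrinks the gap between the extreme point and the rest, which is why it reduces that point's pull on a fitted regression line.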

2
Q

What are the ways we can deal with extreme values?

A

Winsorization:
Set the observations with values above or below certain thresholds to the threshold values.
e.g., set values above the 99th percentile to the 99th-percentile value, and values below the 1st percentile to the 1st-percentile value
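
Winsorization at the 1st and 99th percentiles can be sketched with NumPy as follows. The function name `winsorize` and the sample array are hypothetical, and the percentile cutoffs are parameters rather than fixed choices; this is one possible way to do it under those assumptions.

```python
import numpy as np

def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Cap values at the given lower and upper percentiles (no rows are dropped)."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    # Values below lo become lo; values above hi become hi.
    return np.clip(values, lo, hi)

# Hypothetical data: the integers 1..1000.
x = np.arange(1, 1001, dtype=float)
x_w = winsorize(x)

print("Observations kept:", len(x_w))
print("New min and max:", x_w.min(), x_w.max())
```

The sample size is unchanged; only the values in the two tails are replaced by the threshold values.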

Truncation:
Remove observations where the values are extreme.
e.g., remove observations where the variable is above the 99th percentile or below the 1st percentile (i.e., in the top or bottom 1% of values)
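
Truncation can be sketched in the same style. The helper name `truncation_mask` and the `x`, `y` arrays are hypothetical; the sketch assumes a regression setting, so the whole observation (both `x` and `y`) is dropped when `x` falls in either tail.

```python
import numpy as np

def truncation_mask(values, lower_pct=1.0, upper_pct=99.0):
    """Return True for observations to keep (inside the percentile bounds)."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    return (values >= lo) & (values <= hi)

# Hypothetical data: a predictor x and an outcome y of the same length.
x = np.arange(1, 1001, dtype=float)
y = 2.0 * x

keep = truncation_mask(x)
x_t, y_t = x[keep], y[keep]   # drop the same rows from both variables

print("Observations before:", len(x))
print("Observations after:", len(x_t))
```

Unlike winsorization, truncation shrinks the sample, so the trade-off is fewer observations in exchange for removing the extreme rows entirely.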
