7 - Tools for Machine Learning Flashcards
What was the initial proportion of incorrect decisions in the prior authorization process?
50 percent
What is the current proportion of incorrect decisions after model improvement?
30 percent
In an ideal world, what percentage of prior authorization decisions does Kamala hope to achieve?
0 percent
What approach does David prefer when building models in data science?
Start with a simple model
What is a residual in predictive modeling?
The difference between the true value of the outcome and what the model predicted
What was the average absolute difference between the predicted and actual health care expenditure in the initial model?
$12,000
What features did the linear regression model initially include?
- Demographic data
- Medical claims
- Prior authorization report
What were the top features selected by the linear regression model?
- Prior history of back pain
- Prior medication usage
- Age
How did the first team improve the model’s performance?
By examining residuals and adding more job-related features
What was the average residual after adding new features related to patients’ jobs?
$9,000
What is the purpose of interaction terms in feature engineering?
To represent the relationship between two main terms
What transformation techniques did the second team apply to improve model fitting?
- Square transformation
- Square root transformation
- Log transformation
What is a general rule of thumb when including polynomial terms in a regression model?
Always include the lower-order terms
What strategy did the third team use to address the issue of unrepresentative training data?
Weighted regression
What are outliers in the context of health care expenditure?
Patients who incur significantly higher health care spending than the majority
What is the effect of using least squares linear regression on outliers?
It is affected by outliers more than by non-outliers
What was the goal of the hackathon organized by David?
To improve the performance of the model predicting health care expenditure
What did the team find regarding the residuals of patients with different job types?
Higher residuals for patients in manual labor jobs
Fill in the blank: A _______ is a special term used to represent the relationship between two main effect variables.
interaction term
True or False: The model’s performance improved after adding features about patients’ jobs.
True
What are outliers in the context of health care expenditure?
Outliers are patients who incur significantly higher health care expenditures compared to the majority, such as those with hundreds of thousands of dollars in back pain-related spending over 3 years.
Why is least squares linear regression problematic when dealing with outliers?
Least squares is affected by outliers more than by nonoutliers, which can skew predictions and result in worse predictions for nonoutliers.
Explain the concept of squaring residuals in least squares regression.
Squaring residuals means that a residual of 4 is considered four times as bad as a residual of 2, disproportionately affecting the model’s predictions.
What is least absolute deviation?
Least absolute deviation is a method that uses the absolute value of residuals instead of squaring them, making it less sensitive to outliers.