7 - Tools for Machine Learning Flashcards

1
Q

What was the initial proportion of incorrect decisions in the prior authorization process?

A

50 percent

2
Q

What is the current proportion of incorrect decisions after model improvement?

A

30 percent

3
Q

In an ideal world, what percentage of incorrect prior authorization decisions does Kamala hope to achieve?

A

0 percent

4
Q

What approach does David prefer when building models in data science?

A

Start with a simple model

5
Q

What is a residual in predictive modeling?

A

The difference between the true value of the outcome and what the model predicted

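A quick sketch of this definition with made-up numbers:

```python
# Hypothetical true and predicted health care expenditures (dollars)
y_true = [10_000, 25_000, 7_500]
y_pred = [12_000, 20_000, 9_000]

# A residual is the true outcome minus the model's prediction
residuals = [t - p for t, p in zip(y_true, y_pred)]

# The average absolute residual summarizes how far off the model is
mean_abs_residual = sum(abs(r) for r in residuals) / len(residuals)
```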
6
Q

What was the average absolute difference between the predicted and actual health care expenditure in the initial model?

A

$12,000

7
Q

What features did the linear regression model initially include?

A
  • Demographic data
  • Medical claims
  • Prior authorization report
8
Q

What were the top features selected by the linear regression model?

A
  • Prior history of back pain
  • Prior medication usage
  • Age
9
Q

How did the first team improve the model’s performance?

A

By examining residuals and adding more job-related features

10
Q

What was the average residual after adding new features related to patients’ jobs?

A

$9,000

11
Q

What is the purpose of interaction terms in feature engineering?

A

To represent the relationship between two main terms

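A minimal sketch of building an interaction term; the feature names and values here are hypothetical:

```python
# Hypothetical main effects: age and a manual-labor job indicator
ages = [35, 52, 41]
manual_labor = [1, 0, 1]

# The interaction term is the product of the two main effect variables
age_x_labor = [a * m for a, m in zip(ages, manual_labor)]
```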
12
Q

What transformation techniques did the second team apply to improve model fitting?

A
  • Square transformation
  • Square root transformation
  • Log transformation
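These transformations are one-liners; a sketch with hypothetical feature values:

```python
import math

# Hypothetical feature values to transform before model fitting
values = [100.0, 400.0, 2500.0]

squared = [v ** 2 for v in values]          # square transformation
square_root = [math.sqrt(v) for v in values]  # square root transformation
logged = [math.log(v) for v in values]        # log transformation
```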
13
Q

What is a general rule of thumb when including polynomial terms in a regression model?

A

Always include the lower-order terms

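A sketch of the rule: when a squared term enters the model, the lower-order term stays in the design matrix alongside it:

```python
x = [1.0, 2.0, 3.0]

# When adding x**2 to a regression, keep the lower-order terms too:
# each row holds [intercept, x, x**2]
design_matrix = [[1.0, xi, xi ** 2] for xi in x]
```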
14
Q

What strategy did the third team use to address the issue of unrepresentative training data?

A

Weighted regression

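A minimal weighted least squares sketch for a single feature, with made-up data and weights. The closed form below is the standard textbook formula, not the team's actual code:

```python
# Toy data where y = 2x exactly; the weights upweight observations
# from a hypothetical under-represented group
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
ws = [1.0, 1.0, 3.0, 3.0]

wsum = sum(ws)
xbar = sum(w * x for w, x in zip(ws, xs)) / wsum
ybar = sum(w * y for w, y in zip(ws, ys)) / wsum

# Closed-form weighted least squares for one feature
slope = (sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
         / sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs)))
intercept = ybar - slope * xbar
```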
15
Q

What are outliers in the context of health care expenditure?

A

Patients who incur significantly higher health care spending than the majority

16
Q

How do outliers affect least squares linear regression?

A

The fitted model is affected by outliers more than by non-outliers, which can skew its predictions.

17
Q

What was the goal of the hackathon organized by David?

A

To improve the performance of the model predicting health care expenditure

18
Q

What did the team find regarding the residuals of patients with different job types?

A

Higher residuals for patients in manual labor jobs

19
Q

Fill in the blank: A _______ is a special term used to represent the relationship between two main effect variables.

A

interaction term

20
Q

True or False: The model’s performance improved after adding features about patients’ jobs.

A

True

21
Q

What are outliers in the context of health care expenditure?

A

Outliers are patients who incur significantly higher health care expenditures compared to the majority, such as those with hundreds of thousands of dollars in back pain-related spending over 3 years.

22
Q

Why is least squares linear regression problematic when dealing with outliers?

A

Least squares is affected by outliers more than by non-outliers, which skews the fit and results in worse predictions for non-outliers.

23
Q

Explain the concept of squaring residuals in least squares regression.

A

Squaring residuals means that a residual of 4 is considered four times as bad as a residual of 2, disproportionately affecting the model’s predictions.

24
Q

What is least absolute deviation?

A

Least absolute deviation is a method that uses the absolute value of residuals instead of squaring them, making it less sensitive to outliers.

25
Q

What is quantile regression?

A

Quantile regression focuses on estimating specific quantiles of the outcome, such as the median, which are robust to outliers.
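Both least absolute deviation and median regression rest on the fact that the median minimizes total absolute deviation; a sketch with a made-up outlier:

```python
# Skewed toy data with one large outlier
data = [1.0, 2.0, 3.0, 4.0, 100.0]

mean = sum(data) / len(data)           # pulled up by the outlier
median = sorted(data)[len(data) // 2]  # ignores the outlier's size

def total_abs_dev(center):
    # The quantity that least absolute deviation minimizes
    return sum(abs(x - center) for x in data)
```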
26
Q

How does mean wealth differ from median wealth when outliers are present?

A

Mean wealth can be significantly affected by outliers, while median wealth remains largely unchanged, providing a more stable measure.
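A sketch of this contrast with hypothetical wealth figures:

```python
wealth = [40_000, 45_000, 50_000, 55_000, 60_000]

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s, n = sorted(xs), len(xs)
    return (s[n // 2 - 1] + s[n // 2]) / 2 if n % 2 == 0 else s[n // 2]

# Adding one extreme outlier moves the mean dramatically
# but barely moves the median
with_outlier = wealth + [10_000_000]
```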
27
Q

What is K-nearest neighbor (K-NN) regression?

A

K-NN regression predicts outcomes based on the K most similar observations, using local information instead of the entire data set.

28
Q

How is the parameter K defined in K-NN regression?

A

K is the number of observations used to make a prediction; for example, K = 20 means the model looks at the 20 most similar data points.

29
Q

What is the common approach to defining similarity in K-NN?

A

The most common approach is to calculate the Euclidean distance between data points.

30
Q

What are the advantages of K-NN regression?

A

K-NN can handle nonlinear relationships and does not assume linearity, making it flexible for various types of data.
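A minimal K-NN regression sketch using Euclidean distance; all data are made up:

```python
import math

def knn_predict(train_X, train_y, query, k):
    # Euclidean distance from the query to every training observation
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    # Average the outcomes of the k most similar observations
    return sum(y for _, y in ranked[:k]) / k

# Hypothetical one-feature training data
train_X = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]
```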
31
Q

What is Naive Bayes?

A

Naive Bayes is a probabilistic machine learning model used for classification problems, based on Bayes' theorem.

32
Q

What assumption does Naive Bayes make about features?

A

Naive Bayes assumes that each feature is independent of all other features, conditional on the class.

33
Q

What is the primary output of a Naive Bayes model?

A

Naive Bayes predicts the probability of a binary outcome, such as whether health expenditure will exceed $20,000.

34
Q

What is the training process for Naive Bayes?

A

Training involves estimating the prior probability of each class and the conditional probability of each feature given the class.
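A sketch of that training process on a toy data set with one binary feature; all counts are hypothetical:

```python
# Toy binary data: feature = prior back pain, class = high spender
features = [1, 1, 0, 0, 1, 0]
classes = [1, 1, 0, 0, 0, 0]

n = len(classes)

# Priors: P(class)
prior_1 = classes.count(1) / n
prior_0 = classes.count(0) / n

# Conditionals: P(feature = 1 | class), estimated by frequency
cond_1 = sum(f for f, c in zip(features, classes) if c == 1) / classes.count(1)
cond_0 = sum(f for f, c in zip(features, classes) if c == 0) / classes.count(0)

# Scoring a new patient with feature = 1 (unnormalized posteriors)
score_1 = prior_1 * cond_1
score_0 = prior_0 * cond_0
prediction = 1 if score_1 > score_0 else 0
```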
35
Q

What is a Classification and Regression Tree (CART)?

A

CART modeling divides the population into smaller subpopulations and makes predictions on those smaller groups.

36
Q

How does the CART algorithm work?

A

CART starts with the entire population and repeatedly searches for the best variable and value to split the population until stopping rules are met.
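A sketch of the core split search for one variable, minimizing squared error; this is a simplification of the full recursive algorithm:

```python
def best_split(xs, ys):
    # Sum of squared errors around a group's mean
    def sse(group):
        if not group:
            return 0.0
        m = sum(group) / len(group)
        return sum((y - m) ** 2 for y in group)

    best_score, best_threshold = None, None
    # Try every observed value as a candidate threshold
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = sse(left) + sse(right)
        if best_score is None or score < best_score:
            best_score, best_threshold = score, t
    return best_threshold
```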
37
Q

What are the advantages of CART modeling?

A
  • No assumptions required about features
  • Can predict various types of variables
  • Less sensitive to outliers
  • Can handle missing data
  • Easily interpretable

38
Q

What are the limitations of CART modeling?

A

CART models can be sensitive to small changes in the data, do not produce coefficients like regression models, and require careful evaluation of variable importance.

39
Q

What is a random forest?

A

A random forest is a collection of multiple CART models built from random samples of the data.

40
Q

What is the significance of randomness in random forests?

A

Randomness is applied in two ways: in creating random samples of the data and in selecting candidate input variables at each split.
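A sketch of both sources of randomness; the feature names are hypothetical:

```python
import random

random.seed(0)
rows = list(range(10))
features = ["age", "claims", "job_type", "medication"]

# Randomness 1: a bootstrap sample of rows, drawn with replacement
bootstrap_rows = [random.choice(rows) for _ in rows]

# Randomness 2: only a random subset of features is considered at a split
split_candidates = random.sample(features, k=2)
```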
41
Q

What is the goal of boosting, bagging, and ensembling?

A

These techniques combine multiple machine learning models to improve predictive performance.

42
Q

What are ensembling methods?

A

Techniques for combining the predictions of several models into a single, more accurate prediction.

## Footnote
Ensembling methods can improve the robustness of predictions.

43
Q

What is model stacking?

A

A method where different models are created and their predictions are used as input variables to a new model that makes the final predictions.

## Footnote
The first models are known as level 1 models, and the final model is called the level 2 model.

44
Q

What is a level 1 model?

A

An initial model created in model stacking whose predictions serve as inputs to the level 2 model.

45
Q

What is the function of a level 2 model in stacking?

A

To combine the predictions from level 1 models to produce a final result.
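A deliberately oversimplified stacking sketch; a real level 2 model learns its weights from data rather than using a fixed blend, and all numbers here are hypothetical:

```python
# Hypothetical level 1 predictions for three patients (dollars)
linear_preds = [18_000.0, 9_000.0, 30_000.0]
knn_preds = [16_000.0, 11_000.0, 26_000.0]

def level2(p1, p2, w=0.5):
    # A trivially simple level 2 model: a fixed blend of level 1 outputs
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

stacked = level2(linear_preds, knn_preds)
```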
46
Q

What is bagging?

A

A special case of ensembling, called bootstrap aggregation, that involves resampling data sets from the original data set.

## Footnote
Bagging can improve prediction accuracy and help avoid overfitting.

47
Q

What does bootstrapping refer to?

A

Creating several data sets from the original data set by resampling observations with replacement.
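A sketch of bootstrapping and aggregation using a trivially simple "model" (the sample mean) in place of a real learner:

```python
import random

random.seed(1)
data = [3.0, 5.0, 7.0, 9.0, 11.0]  # true mean is 7.0

# Bootstrapping: build many data sets by resampling with replacement
samples = [[random.choice(data) for _ in data] for _ in range(200)]

# Bagging: fit one simple model per resample, then average their outputs
bagged_estimate = sum(sum(s) / len(s) for s in samples) / len(samples)
```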
48
Q

What is gradient boosted machine learning (GBML)?

A

A technique where models learn by giving misclassified observations more weight in the next training iteration.

49
Q

What is the purpose of boosting in machine learning?

A

To improve predictions by focusing on difficult-to-predict observations while maintaining performance on easier ones.
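A sketch of one boosting round with a hypothetical doubling rule; real algorithms such as AdaBoost use a principled weight update rather than a flat factor of two:

```python
# Observations the current model got wrong are doubled in weight
weights = [1.0, 1.0, 1.0, 1.0]
correct = [True, False, True, False]

raised = [w * (1.0 if ok else 2.0) for w, ok in zip(weights, correct)]
total = sum(raised)
next_weights = [w / total for w in raised]  # renormalized for the next round
```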
50
Q

Name two commonly used boosting algorithms.

A
  • AdaBoost
  • Arcboost

51
Q

What models were included in the ensemble described by David?

A
  • Linear regression model
  • K-NN model
  • Regression tree model

52
Q

What was the average residual achieved by the stacked ensemble?

A

$5,000

53
Q

True or False: The best individual model had an average residual of $8,500 before ensembling.

A

True

54
Q

Fill in the blank: The process of combining multiple models to improve prediction accuracy is called _______.

A

ensembling

55
Q

How does ensembling improve model performance?

A

By leveraging the strengths of different models to produce a more accurate and robust prediction.

56
Q

What analogy did David use to explain the benefits of ensembling?

A

A trivia game show partnership where two individuals complement each other's knowledge.

57
Q

What should be examined when analyzing the residuals of a model?

A

Patterns in the residuals

58
Q

What is a potential benefit of using metalearning methods like boosting, bagging, and ensembling?

A

Improved model performance.

59
Q

What is a common challenge when using K-nearest neighbor approaches?

A

Determining whether it is superior to parametric regression approaches in specific scenarios.

60
Q

What type of terms might be included in modeling based on data knowledge?

A
  • Polynomial terms
  • Interaction terms