Chapter 12 - Introduction to Predictive Modeling in the Life Insurance Industry Flashcards

1
Q

What is predictive modelling?

A

Uses statistics to predict outcomes

2
Q

What is the purpose of modelling?

A

To represent some actual process or phenomenon in the real world so that we can understand it, and then predict how it may work in a variety of applications (e.g. future mortality in u/w)

3
Q

What are some examples of modeling targets in insurance (5)?

A

1) Marketing - likelihood to buy, open, rate
2) Sales - Conversion rate
3) U/W - Risk class, mortality, morbidity
4) Servicing - Probability of lapse, call frequency
5) Claims - Claim frequency, claim severity

4
Q

What is the Framingham study?

A

Study that analyzed many variables to find the risk factors for coronary artery disease (CAD); it has been the baseline behind preferred risk U/W for decades

5
Q

What is triage U/W?

A

Utilizes predictive models - akin to the use of reflex testing for abnormal labs - rather than obtaining an u/w requirement. Aimed at predicting whether an additional requirement would provide useful info in u/wing the risk.
-Benefits: cost saving, time saving

6
Q

What is propensity scoring?

A

When more detailed information is collected (e.g. lab values), more complex models can be built to estimate the likelihood or propensity of a PI having a certain condition or disease.
-With large datasets, cases identified with high propensity could be flagged for APS or additional scrutiny (see the sketch below).
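A minimal propensity-scoring sketch in Python, assuming scikit-learn and entirely synthetic lab data; the feature names, the synthetic relationship, and the 0.80 flagging cutoff are hypothetical illustrations, not from the text:

```python
# Minimal propensity-scoring sketch on synthetic data (all names/values illustrative).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
labs = pd.DataFrame({
    "a1c": rng.normal(5.5, 0.8, n),      # hypothetical lab values
    "bmi": rng.normal(27, 4, n),
    "age": rng.integers(25, 70, n),
})
# Synthetic target: probability of having the condition rises with A1c and BMI.
p = 1 / (1 + np.exp(-(0.9 * (labs["a1c"] - 5.5) + 0.1 * (labs["bmi"] - 27) - 1)))
labs["has_condition"] = rng.binomial(1, p)

model = LogisticRegression(max_iter=1000).fit(
    labs[["a1c", "bmi", "age"]], labs["has_condition"]
)

# Flag high-propensity cases for an APS or additional scrutiny.
labs["propensity"] = model.predict_proba(labs[["a1c", "bmi", "age"]])[:, 1]
flagged = labs[labs["propensity"] > 0.80]  # hypothetical cutoff
print(len(flagged), "cases flagged")
```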

7
Q

What is the role of predictive modeling in decisions?

A

Models may be used to flag applications for immediate decline if they exceed a certain threshold.

8
Q

What is the role of predictive modeling in predicting mortality?

A

The most difficult and complex use of predictive models - creating a model to predict mortality would require a huge amount of data and is very challenging to do.
Issue - historical data may not represent current business. Past U/W requirements may have changed. Data may be missing.
-Where mortality models are created, they are typically used in a triage-type environment.

9
Q

What are some benefits of predictive modeling (4)?

A

1) Improved mortality, more competitive pricing - by identifying PIs who are worse or better risks than found through traditional U/W alone
2) Faster case processing and less invasive UW - modeling bypasses traditional methods, which can be lengthy and require more from the PI
3) Lower underwriting costs - by identifying cases where requirements can be waived
4) Better underwriter utilization - reduced requirements mean more simple cases can be processed without UW review, allowing UWs to use their expertise on more complex cases

10
Q

What are some challenges with predictive modeling (3)?

A

1) Data availability
2) Data quality
3) Model fitting and subject matter expertise

11
Q

What issues does data availability present in predictive modeling?

A

-Large number of death claims required
-The event being predicted can extend 20 years or more into the future
-This presents a lot of variables to be considered when using data; policies UW'd 20 yrs ago may not be as relevant in today's environment (guidelines have shifted, products changed, prior book of business may be different)
-Mortality outcomes on all applicants (not just those issued and accepted) would be required - this requires working with multiple internal and third-party sources to obtain this data for historical applicants.
-Challenges with predictor variables - infrequently occurring risks (diseases, conditions) seen rarely can be considered 'noise' in the modeling process.

12
Q

What can be done to address issues with data availability?

A

Alternative modeling targets that occur with greater frequency can be chosen when building a mortality model - examples of alternative targets could be the UW risk decision/risk class or whether a particular requirement provided protective value. This reduces the volume of data required since the target is identified at the time of underwriting.

13
Q

What issues does data quality present in predictive modeling?

A

Often data can be missing or corrupt - this can be handled by simply eliminating these cases; however, this can result in a substantial amount of missing data, impacting the power of the model. Also, cases with missing data may be biased, resulting in 'blind spots' in the model.
-Inconsistent formats, non-linear relationships, collinearity, and other technicalities pose data quality issues as well.

14
Q

What are some ways of handling missing data?

A

'Filling in' missing data by imputing the mean or median value and using that value for the cases where the value is missing - can significantly bias the data set (i.e. if a variable missing from a subset of cases is a specialized lab value that is only collected as a result of a lab reflex test). See the sketch below.
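A minimal sketch of the simple imputation approaches described above, assuming pandas and scikit-learn; the cholesterol values are made up for illustration:

```python
# Sketch of mean/median imputation for a missing lab value (illustrative data).
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"cholesterol": [180.0, 210.0, np.nan, 195.0, np.nan, 240.0]})

# Option 1: simple pandas fill with the median of the observed values.
df["chol_median_filled"] = df["cholesterol"].fillna(df["cholesterol"].median())

# Option 2: scikit-learn imputer (mean by default), reusable on new data.
imputer = SimpleImputer(strategy="mean")
df["chol_mean_filled"] = imputer.fit_transform(df[["cholesterol"]]).ravel()

print(df)
```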

15
Q

What issues does model fitting and subject matter expertise present in predictive modeling?

A

1) Underfitting the data - occurs when the model insufficiently represents real-world phenomena, with its predictions being too far from the actual data to be considered useful.
-Can be caused by an insufficient amount of data, by missing key factors, or by utilizing a model form that is too simple.
2) Overfitting - results when a model is developed to accurately predict the target for a particular dataset, but its predictions do not continue to hold into the future. The model must be shown to work not just on the data it was built upon, but on additional data that the model was not exposed to in the development process (see the sketch after this list).
-This can be done by partitioning data into separate sets:
a) a "build" dataset that is used to develop the model
b) a "validation" dataset that is used to test the model
-If the model shows considerably worse performance on the validation dataset, then the possibility of overfitting should be considered.
3) Blind spots - missing data can result in a model that is not predictive in certain areas because the model has no basis in its build data set.
-Caution should be used when using a model in conditions outside of what it was built upon.
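A minimal sketch of the build/validation split described in 2), assuming scikit-learn and synthetic data; the deliberately unconstrained decision tree is only there to make the overfitting gap visible:

```python
# Sketch: partition data into "build" and "validation" sets and compare
# performance to check for overfitting (synthetic data, illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_build, X_valid, y_build, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A deep, unconstrained tree is prone to overfitting the build data.
model = DecisionTreeClassifier(random_state=0).fit(X_build, y_build)

auc_build = roc_auc_score(y_build, model.predict_proba(X_build)[:, 1])
auc_valid = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
print(f"build AUC {auc_build:.3f} vs validation AUC {auc_valid:.3f}")
# A large gap between the two suggests the model is overfitted.
```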

16
Q

How are rules engines used in predictive modeling?

A

Predictive models are often paired with rules engines (software tools that automate decisions using "if/then" statements).
-These rules are critical for flagging situations that can be rare but very concerning for underwriting (particularly medical). In these situations, underwriting subject matter expertise can be used to match UW guidelines with models (see the sketch below).
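A minimal sketch of how a rules layer might sit alongside a model score; the field names, thresholds, and dispositions are hypothetical and not taken from any actual rules engine:

```python
# Sketch of a simple "if/then" rules layer applied alongside a model score
# (rule thresholds and field names are hypothetical).
def apply_rules(case: dict, model_score: float) -> str:
    # Hard rules catch rare but serious situations regardless of the score.
    if case.get("prior_cancer_dx"):
        return "refer_to_underwriter"
    if case.get("bmi", 0) > 45:
        return "refer_to_underwriter"
    # Otherwise the model score drives the triage decision.
    if model_score > 0.9:
        return "decline"
    if model_score < 0.2:
        return "waive_requirements"
    return "standard_underwriting"

print(apply_rules({"prior_cancer_dx": False, "bmi": 28}, model_score=0.15))
```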

17
Q

Name the 5 processes in predictive model development

A

1) Data sources for Underwriting
2) Feature Engineering
3) Feature Selection and Model Development
4) Model Evaluation and Validation
5) Model Implementation and Monitoring

18
Q

Describe the role of data sources for underwriting in model development (8 criteria)

A

Development of a model is based on empirical data analysis, identifying data we can observe today (health) that may have some relationship to the target variable that we cannot observe today (mortality).
Criteria for data used in developing a model:

1) Is there a logical hypothesis as to whether the data is related to the target variable of interest?
2) How well is the data correlated to the target variable?
3) Is the data definition stable and consistent over time?
4) How much coverage does the data have on policies in the study data set?
5) For variables that are binary or categorical in nature, is there sufficient variety across the data set?
6) Are there any regulatory or ethical concerns with using the data?
7) How much does the data cost to obtain for underwriting purposes?
8) How quickly can the data be obtained at the time of underwriting?

19
Q

What types of data can be obtained for life insurance mortality models?

A

Data provided by proposed insureds: tele-interview, lab values from medical tests; external data: prescription history, motor vehicle history, MIB codes

20
Q

What are the 3 components of the model development process?

A

1) Feature engineering
2) Feature selection and model development
3) Model validation

21
Q

What is feature engineering?

A

Variables used in the model are called features (in UW terms, think of them as risk factors).
-Generally created from raw variables that were captured in data collection (e.g. transforming DOB into age, build into BMI)
-Additional features can be created by combining multiple raw variables into a synthetic variable that can have greater correlation with the target than any of the raw variables independently (see the sketch below).
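A minimal pandas sketch of the feature engineering examples above (DOB to age, build to BMI); the valuation date and applicant values are made up:

```python
# Sketch of feature engineering: deriving age from date of birth and BMI from
# build (height/weight). Column names and values are illustrative.
import pandas as pd

apps = pd.DataFrame({
    "dob": ["1980-03-15", "1962-11-02"],
    "height_in": [70, 64],
    "weight_lb": [185, 150],
})

apps["dob"] = pd.to_datetime(apps["dob"])
as_of = pd.Timestamp("2024-01-01")                        # hypothetical valuation date
apps["age"] = ((as_of - apps["dob"]).dt.days // 365.25).astype(int)

# BMI = 703 * weight (lb) / height (in)^2
apps["bmi"] = 703 * apps["weight_lb"] / apps["height_in"] ** 2

print(apps[["age", "bmi"]])
```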

22
Q

What is principal components analysis in feature engineering?

A

Seeks to reduce the number of features when there are many correlated raw variables that, when combined, result in a much more powerful feature which is better correlated with the target.
-Each of the features is usually examined using a robust, systematic process to validate, improve, or remove features as appropriate (looking at each feature from a univariate perspective, considering the average, median, minimum/maximum, frequency, standard deviation, skewness, and other characteristics that give insight into whether a feature will be useful and effective in the model build).
-Can help identify data quality issues.
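A minimal PCA sketch, assuming scikit-learn and synthetic correlated variables; the number of components retained is illustrative:

```python
# Sketch of principal components analysis on correlated raw variables
# (synthetic data; component count is illustrative).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
base = rng.normal(size=(500, 1))
# Several raw variables that are strongly correlated with each other.
raw = np.hstack([base + 0.1 * rng.normal(size=(500, 1)) for _ in range(5)])

# Standardize, then reduce the correlated variables to a smaller set of features.
pca = PCA(n_components=2)
features = pca.fit_transform(StandardScaler().fit_transform(raw))

print("explained variance ratio:", pca.explained_variance_ratio_)
```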

23
Q

What data quality issues can exist in feature engineering?

A

1) Missing data - can result in a poorly performing or biased model. Can be addressed by simple deletion, mean imputation, or prediction.
2) Outliers - points in the data that require careful attention because they can have outsized impacts on the model, resulting in skewed or biased results (e.g. if a height was listed as 60 feet tall).
-Can be dealt with by keeping them, deleting them, transforming or grouping values, imputing an expected value, or treating a segment separately, potentially with a different model (see the sketch below).
-How they are dealt with is highly dependent on the type of outlier and the reason it was in the data.
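A minimal sketch of one outlier treatment mentioned above (capping/grouping extreme values rather than deleting the record), with a made-up data entry error:

```python
# Sketch of one way to handle an outlier: cap (winsorize) extreme values at a
# percentile instead of deleting the record (illustrative data).
import pandas as pd

heights_in = pd.Series([64, 66, 68, 70, 72, 71, 69, 720])  # 720 = data entry error
low, high = heights_in.quantile([0.01, 0.99])

capped = heights_in.clip(lower=low, upper=high)
print(capped.tolist())
```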

24
Q

What model types are used in model development (4)?

A

Approach depends upon the type of model to be used (see the sketch after this list).
1) Linear regression - most frequently used type of model to represent business processes. Uses various inputs to predict an outcome that is typically a continuous number. It assumes there is a linear relationship between each of the independent variables and the dependent variable.
2) Logistic regression - similar to the linear model, however the target variable is binary in nature. In insurance modeling, a form of this called "survival modeling" results in a survival function describing the probability that the event of interest occurs later than some given time t. The Cox proportional hazards model is an example of this.
3) Decision tree - divides a dataset into progressively smaller sub-segments, using the features in rules. Has the advantage of being simple to understand and interpret, however it may result in overfitting.
4) Random forests - uses multiple decision trees based on different subsets of the data/features and outputs the mode or mean prediction of the individual trees. Helps to reduce decision trees' limitations.
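A minimal sketch fitting three of the model types above (logistic regression, decision tree, random forest) on the same synthetic binary target using scikit-learn; the hyperparameters are illustrative:

```python
# Sketch: fit three model types on synthetic data and compare test performance.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=3000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC {auc:.3f}")
```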

25
Q

What is the Cox proportional hazards model?

A

A type of logistic regression; it is the most widely used statistical technique for estimating individual risk in studies of survival (a sketch using one common implementation follows).
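A minimal sketch of a Cox proportional hazards fit, assuming the lifelines package (one common open-source implementation, not necessarily what any insurer uses) and its bundled Rossi recidivism dataset as stand-in survival data:

```python
# Sketch of a Cox proportional hazards fit using the lifelines package.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()          # columns include 'week' (duration) and 'arrest' (event)
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()        # hazard ratios for each covariate
```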

26
Q

What is the role of evaluation in predictive modeling?

A

Once the model is selected, it needs to be evaluated and validated.
Evaluation is primarily concerned with determining the predictive power and relative error of a model

27
Q

What is the role of validation in predictive modeling?

A

Validation of the model primarily involves verifying that the model is robust, not over-fitted and thus can be relied upon to maintain its predictive power for some time into the future.

28
Q

What is cross-validation?

A

Assesses how the results of a model will generalize to an independent dataset. Helpful with smaller datasets that have a limited amount of data (see the sketch below).
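A minimal k-fold cross-validation sketch, assuming scikit-learn and synthetic data; the fold count and metric are illustrative:

```python
# Sketch of k-fold cross-validation, useful when the dataset is too small to
# hold out a large validation set (synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc"
)
print("fold AUCs:", scores.round(3), "mean:", scores.mean().round(3))
```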

29
Q

What are the 4 steps of model implementation and monitoring?

A

1) Integration and end-to-end testing
2) Reason codes for decision
3) Monitoring
4) Model Hold-outs

30
Q

Describe integration and end-to-end testing

A

Before a model is implemented, it needs to be tested within the context of the broader framework to evaluate the "end-to-end" impacts that should be expected from its implementation.
i.e. a model may be able to identify 30% of cases where a certain requirement could be waived; however, when implemented within the broader rule framework, this rate may be much lower - therefore the cost savings may not completely make up for the incremental mortality cost, and the model or rules may need to be revisited (a back-of-envelope sketch follows).
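A back-of-envelope sketch of the tradeoff described above; every number is hypothetical and only illustrates the calculation, not actual costs or rates:

```python
# Hypothetical end-to-end impact check: does the realized waiver rate still
# generate enough savings to cover the incremental mortality cost?
apps_per_year = 10_000
standalone_waiver_rate = 0.30          # model alone could waive the requirement
realized_waiver_rate = 0.12            # rate after the broader rule framework applies
requirement_cost = 50.0                # hypothetical cost per requirement ordered
incremental_mortality_cost = 40_000.0  # hypothetical added claims cost per year

savings = apps_per_year * realized_waiver_rate * requirement_cost
net = savings - incremental_mortality_cost
print(f"requirement savings: ${savings:,.0f}, net of mortality cost: ${net:,.0f}")
```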

31
Q

What are reason codes for decisions?

A

Without knowing how the model arrived at a particular score, confidence in the model can be reduced and it can be difficult to explain decisions to UWs, managers, and agents.
-Regulatory concern - some states require the insurer to provide an explanation of why a customer received an adverse UW decision.
-To address this, modeling software will usually provide reason codes in addition to a score. The codes are usually aligned with the most significant features that were used in arriving at the score.
-Some insurers use Shapley values (provide an intuitive, model-agnostic approach to interpreting model decisions that can be used to determine the relative weights of the inputs in determining the output) - not user friendly, require interpretation (see the sketch below).
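A minimal reason-code sketch: for a logistic model, coefficient * (value - mean) is a commonly used simplified stand-in for a per-feature Shapley contribution (this assumes roughly independent features; the data, feature names, and cutoffs are illustrative, not from the text):

```python
# Sketch of simple reason codes: rank the features that contributed most to a
# single applicant's score (synthetic data, illustrative feature names).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["a1c", "bmi", "nicotine"])
y = (X["a1c"] + 0.5 * X["nicotine"] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

applicant = X.iloc[[0]]
# Per-feature contribution relative to the population average.
contrib = model.coef_[0] * (applicant.values[0] - X.mean().values)
codes = pd.Series(contrib, index=X.columns).sort_values(key=np.abs, ascending=False)
print("top reason codes:\n", codes.head(3))
```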

32
Q

What is the role of monitoring?

A

Ongoing monitoring should be conducted to regularly consider if the model is performing as expected, and to address any results that seem surprising.
Any significant variances from expectations should be investigated for model implementation errors.

33
Q

What are model hold-outs?

A

An additional type of monitoring should be conducted if models are being used to waive requirements. This opens up the opportunity for anti-selection - where a PI can manipulate inputs to qualify to have requirements waived for a better rating class.
-A certain percentage of PIs who qualify for waived requirements should be randomly selected for traditional UW - if anti-selection is found, then rules may need to be re-evaluated (see the sketch below).
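A minimal sketch of a random hold-out: a small, hypothetical percentage of cases that qualify for waived requirements is routed back through traditional underwriting for audit:

```python
# Sketch of a random hold-out for anti-selection monitoring (rate is hypothetical).
import numpy as np

rng = np.random.default_rng(42)
HOLD_OUT_RATE = 0.05   # hypothetical 5% audit rate

def route_case(qualifies_for_waiver: bool) -> str:
    # A random slice of waiver-qualified cases still gets full underwriting.
    if qualifies_for_waiver and rng.random() < HOLD_OUT_RATE:
        return "traditional_underwriting_holdout"
    return "waive_requirements" if qualifies_for_waiver else "traditional_underwriting"

print([route_case(True) for _ in range(10)])
```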

34
Q

What is the underwriter's role in predictive models?

A

By 2030, predictive models may replace the traditional role of an UW.
A shift in role for underwriters toward development and management of models.
UWs can play a role in identifying when and how a model needs to be adjusted due to changes in risk appetite, UW guidelines, or UW requirements.
There will still be a need to underwrite larger and very complex cases. Likely the underwriter's mix of cases will shift to cases with older ages, higher face amounts, and more complexity than cases UW'd today.
-Could result in increased focus on training and changes in the concentration of training.

35
Q

What are some ethical and legal concerns with predictive models?

A

A lot of the data used in modeling is regulated. The Fair Credit Reporting Act (FCRA) protects consumer data. MVRs, Rx databases, and MIB checks are FCRA compliant.