Ch 12 Intro to Predictive Modeling in the Life Insurance Industry Flashcards
4 Benefits of Predictive Modeling
- Improved Mortality & more competitive pricing
- Faster case processing & less invasive underwriting
- Lower underwriting costs
- Better underwriter utilization
Challenges w/ Predictive Modeling for Underwriting
- Data Availability: need a large number of death claims/ data from 20 yrs ago may no longer be relevant/ data must be accessible in electronic format/ needed for both modeling targets and predictor variables
- Data Quality: missing or corrupt data/ inconsistent formats/ technical issues
underfitting
Model insufficiently represents real-world phenomena, w/ its predictions being too far from actual data to be considered useful. Caused by an insufficient amount of data, missing 1 or more key factors, or using a model form that is too simple. Misses important nuances in the relationship, making it less useful.
overfitting
Model is developed to accurately predict the target for a particular dataset, but its predictions don’t continue to hold into the future.
To check, divide the data into a build dataset to develop the model and a validation dataset to test it. The difference in performance between the two shows the power of the model. Worse performance on the validation dataset signals possible overfitting.
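A minimal sketch of the build/validation check, assuming made-up data and a deliberately overfit-prone "memorizer" model (both hypothetical, not from the source):

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical data: target = 2 * x plus random noise
points = []
for _ in range(200):
    x = random.random()
    points.append((x, 2.0 * x + random.gauss(0.0, 0.5)))

build, validation = points[:150], points[150:]  # 75% / 25% split

def memorizer(x):
    """Overfit-prone model: return the target of the nearest build point."""
    nearest = min(build, key=lambda p: abs(p[0] - x))
    return nearest[1]

def mse(model, data):
    """Mean squared error of the model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

build_error = mse(memorizer, build)            # zero: each build point is its own nearest neighbor
validation_error = mse(memorizer, validation)  # larger: the noise was memorized, not the trend
print(build_error, validation_error)
```

The gap between the two errors is the warning sign: perfect build performance that does not carry over to the validation data.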
blind spot
Missing data results in a model that isn’t predictive for certain age bands or other segments of the population, since the model had no basis for them in its build dataset.
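One way to look for blind spots is to count build records per segment. A small sketch, assuming hypothetical ages and 10-year bands:

```python
from collections import Counter

# Hypothetical ages of applicants in the build dataset
build_ages = [34, 41, 45, 52, 38, 47, 55, 36, 49, 58]

def age_band(age, width=10):
    """Return a label such as '30-39' for the band containing age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

bands_to_check = ["20-29", "30-39", "40-49", "50-59", "60-69"]
counts = Counter(age_band(a) for a in build_ages)

# A band with no build records is a blind spot for the model
blind_spots = [b for b in bands_to_check if counts[b] == 0]
print(blind_spots)
```

In practice a minimum count above zero would likely be used; the zero cutoff here is only for illustration.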
rules engine
Software tools that automate decisions; critical for flagging situations that are rare but concerning. Paired w/ predictive models.
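A sketch of how hard rules might sit in front of a model score. The rule names, thresholds, and decision labels are all hypothetical:

```python
def decide(applicant, model_score):
    """Apply hard rules first, then fall back on the model score."""
    # Hypothetical rules for rare but concerning situations
    if applicant.get("hazardous_occupation"):
        return "refer to underwriter"
    if applicant.get("face_amount", 0) > 1_000_000:
        return "refer to underwriter"
    # Hypothetical cutoff: a lower score means lower predicted risk
    if model_score < 0.2:
        return "accelerated approval"
    return "full underwriting"

print(decide({"face_amount": 2_000_000}, 0.05))
print(decide({"face_amount": 250_000}, 0.05))
```

The rules catch cases the model may rarely have seen, so a good score alone never overrides them.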
areas that use modeling targets
marketing, sales, UW, servicing, claims
applications of predictive modeling
Application Triage & Requirement Selection
Propensity Scoring
Approve/decline decisions
Mortality scoring
3 steps to develop models
feature engineering/ feature selection & model development/ model validation
feature engineering
creating features (variables used in the model) from raw variables captured in data collection
principal components analysis
reduces number of features when there are many correlated raw variables that, when combined, result in a much more powerful feature better correlated w/ target
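A hand-rolled sketch of the idea for just two correlated raw variables, using the closed-form eigenvalue of a 2x2 covariance matrix. The data is made up, and real work would normally standardize the variables and use a library:

```python
import math
from statistics import mean

# Hypothetical pair of strongly correlated raw variables
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

def covariance(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

a, b, c = covariance(x1, x1), covariance(x1, x2), covariance(x2, x2)

# Largest eigenvalue of the 2x2 covariance matrix [[a, b], [b, c]]
top = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

# Matching eigenvector (valid when b != 0), scaled to unit length
length = math.hypot(b, top - a)
w1, w2 = b / length, (top - a) / length

# One combined feature replaces the two raw variables
m1, m2 = mean(x1), mean(x2)
component = [w1 * (p - m1) + w2 * (q - m2) for p, q in zip(x1, x2)]
explained = top / (a + c)  # share of total variance kept by the one feature
print(round(explained, 3))
```

Here a single component keeps nearly all of the variance, which is why two correlated inputs can be collapsed into one feature.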
outliers
points in data that require careful attention; they can have outsized impacts on the model, resulting in skewed or biased results. Often exist due to data entry errors, measurement errors, or data processing errors.
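One common way to flag candidates for review is the 1.5 x IQR rule (an assumption here; the source does not name a method). The weights are hypothetical, with one likely data entry error:

```python
from statistics import quantiles

# Hypothetical weights in kg; 500 looks like a data entry error for 50
weights = [52, 48, 50, 51, 49, 47, 53, 50, 500, 51]

q1, _, q3 = quantiles(weights, n=4)  # quartile cut points
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Points outside the fences are flagged for review, not deleted automatically
outliers = [w for w in weights if w < low_fence or w > high_fence]
print(outliers)
```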
feature selection
select minimum number of features that retain most predictive value but are least correlated w/ each other
accelerates model development
improves interpretability
reduces overfitting
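A greedy sketch of that idea: rank features by correlation with the target, then skip any feature too correlated with one already chosen. The data, feature names, and the 0.9 threshold are all hypothetical:

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cross = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cross / math.sqrt(
        sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v)
    )

# Hypothetical target and candidate features
target = [1, 2, 3, 4, 5, 6]
features = {
    "bmi": [1, 2, 3, 4, 5, 7],
    "weight": [2, 4, 6, 8, 10, 13],  # nearly a multiple of bmi
    "age": [3, 1, 4, 2, 6, 5],
}

# Strongest relationship with the target first
ranked = sorted(features, key=lambda n: abs(pearson(features[n], target)), reverse=True)

selected = []
for name in ranked:
    # Keep a feature only if it is not redundant with one already kept
    if all(abs(pearson(features[name], features[s])) < 0.9 for s in selected):
        selected.append(name)

print(selected)
```

Only one of the two near-duplicate features survives, so the model keeps most of the predictive value with fewer inputs.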
linear regression
Uses various inputs to predict an outcome that is typically a continuous number; the most frequently used technique. Assumes a linear relationship between each independent variable and the dependent variable.
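A minimal sketch w/ a single input, fit by ordinary least squares on made-up data:

```python
from statistics import mean

# Hypothetical data: one input and a continuous target
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

x_bar, y_bar = mean(x), mean(y)

# Ordinary least squares for a single predictor
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
intercept = y_bar - slope * x_bar

def predict(value):
    """Predicted target for a new input value."""
    return intercept + slope * value

print(slope, intercept, predict(5.0))
```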
logistic regression
Similar to linear regression, except the target variable is binary in nature (meaning yes/no). A related form is survival modeling.
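A minimal sketch w/ one input and a yes/no target, fit by simple gradient ascent on the log-likelihood. The data, learning rate, and iteration count are assumptions for illustration:

```python
import math

# Hypothetical data: one input and a yes/no (1/0) target
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 1, 0, 1, 1]

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

intercept, slope = 0.0, 0.0
rate = 0.1  # assumed learning rate
for _ in range(2000):
    # Gradient of the log-likelihood with respect to each parameter
    errors = [b - sigmoid(intercept + slope * a) for a, b in zip(x, y)]
    intercept += rate * sum(errors)
    slope += rate * sum(e * a for e, a in zip(errors, x))

def predict_probability(value):
    """Predicted probability that the target is 'yes' for a new input."""
    return sigmoid(intercept + slope * value)

print(predict_probability(-2.0), predict_probability(2.0))
```

The output is a probability rather than a continuous amount, which is the practical difference from linear regression.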
Cox proportional hazards model
most widely used statistical technique for estimating individual risk in studies of survival
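For reference, the standard form of the Cox model, where $h_0(t)$ is the baseline hazard, $x_1, \dots, x_p$ are the individual's predictors, and $\beta_1, \dots, \beta_p$ are fitted coefficients (notation chosen here, not taken from the source):

```latex
h(t \mid x) = h_0(t)\,\exp\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\right)

\frac{h(t \mid x_a)}{h(t \mid x_b)}
  = \exp\left(\sum_{j=1}^{p} \beta_j \,(x_{a,j} - x_{b,j})\right)
```

The second line is the "proportional" part: the ratio of hazards for two individuals $a$ and $b$ does not depend on $t$.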