Section 6: Segmentation Flashcards
What is adverse selection and why is it problematic?
When higher-risk individuals are more likely to buy insurance due to asymmetric info, leading to losses if not priced appropriately.
“When improper classification causes loss of favorable risks and gain of unfavorable ones.”
📚 Source: Module 4, Section 4.6.1 (p.21–22)
How can big data help mitigate adverse selection?
By enabling more granular risk segmentation and targeted pricing based on behavioral and external data.
📚 Source: Module 4, p.22 + Big Data Paper p.9
What’s the risk of using overly granular pricing?
It may raise discrimination or regulatory concerns, especially if protected classes are disproportionately affected; very granular segments also carry less credibility and are more sensitive to large swings in subsequent years.
📚 Source: Big Data Paper p.6, 13–14
What’s the difference between risk and uncertainty in modeling?
Risk is quantifiable; uncertainty is not. Actuarial models work better when risks are measurable.
📚 Source: Big Data Paper p.7
Why is the ZIP code variable useful but ethically hard to model?
It is useful in predicting loss, but it can proxy for race or income, raising fairness issues.
📚 Source: Module 4, p.11–12 + Big Data Paper p.6
What is the purpose of using external data in pricing models?
To enrich internal data, improve predictive power, and overcome limitations of sparse in-house data.
📚 Source: Module 4, Section 4.3.1 (p.11)
What pricing response might you suggest if a competitor charges 10% less?
Consider segment-specific discounting, value-added services, or reviewing expense/reinsurance efficiency.
📚 Source: Module 4, Exercise 4.2 (p.10)
How do actuaries handle missing or default data?
Imputation (e.g., replacing missing values with the mean), explicitly modeling the missingness, or supplementing with external benchmarks.
📚 Source: Module 4, p.12
How do you model rare but severe losses?
Use credibility weighting with industry data and fat-tailed distributions like Pareto.
📚 Source: Module 4, Section 4.3.4 (p.14)
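The two ideas above can be sketched in a few lines. This is a minimal illustration, not the module's method: the credibility constant `k`, the claim counts, and the Pareto parameters are all hypothetical, and the blend shown is a simple limited-fluctuation-style weighting.

```python
import random

def credibility_blend(own_mean, industry_mean, n, k):
    """Limited-fluctuation style blend: Z = n / (n + k)."""
    z = n / (n + k)
    return z * own_mean + (1 - z) * industry_mean

def simulate_pareto(x_min, alpha, size, seed=42):
    """Inverse-transform sampling from a Pareto(x_min, alpha) severity."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so every draw is >= x_min.
    return [x_min / (1 - rng.random()) ** (1.0 / alpha) for _ in range(size)]

# Sparse in-house data (n = 50 claims) blended with an industry benchmark.
blended = credibility_blend(own_mean=12_000, industry_mean=9_000, n=50, k=200)
print(round(blended))  # Z = 0.2, so 0.2*12000 + 0.8*9000 = 9600

# Fat-tailed severities: small alpha means occasional extreme losses.
losses = simulate_pareto(x_min=1_000, alpha=1.5, size=10_000)
print(min(losses) >= 1_000)
```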
Why are GLMs popular in actuarial pricing?
They handle skewed distributions, provide interpretable outputs, and suit exposure-based data.
📚 Source: Module 5, Section 5.3 (p.9–15)
When might GLMs be insufficient?
When relationships are highly nonlinear or involve complex interactions—data mining may be better.
📚 Source: Module 5, Section 5.4 (p.19–22)
What is the role of PvO (Predicted vs Observed) charts?
To validate that model predictions align with actual outcomes across segments.
📚 Source: Module 4, Section 4.7.2 (p.24)
What do lift charts assess?
The effectiveness of a model at discriminating between high and low-risk segments.
📚 Source: Module 4, Section 4.7.3 (p.25)
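A lift statistic can be computed directly from predictions and outcomes. The sketch below (with made-up scores and losses) compares the mean loss in the model's top-ranked segment to the overall mean; a ratio well above 1 means the model discriminates between high and low risks.

```python
def decile_lift(scores, losses, top_frac=0.1):
    """Lift = mean loss in the model's top-ranked segment / overall mean loss."""
    ranked = sorted(zip(scores, losses), key=lambda p: p[0], reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    top_mean = sum(loss for _, loss in ranked[:k]) / k
    overall_mean = sum(losses) / len(losses)
    return top_mean / overall_mean

# Hypothetical predicted scores and observed losses for 10 policies.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
losses = [500, 400, 0, 300, 0, 100, 0, 0, 0, 0]
print(decile_lift(scores, losses, top_frac=0.2))  # 450/130 ≈ 3.46
```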
Why might you choose a simpler model over a better-fitting one?
To enhance interpretability, regulatory acceptance, and robustness against overfitting.
📚 Source: Module 5, Section 5.3.2 (p.12–13)
What are non-risk factors in pricing decisions?
Competition, brand strategy, customer loyalty, marketing channels, and regulation.
📚 Source: Module 4, Section 4.8 (p.28)
How can you use quote conversion data in pricing?
To model demand elasticity and optimize price points for profitability.
📚 Source: Module 4, Section 4.8.2 (p.30)
When should you conduct pricing impact analysis?
Before launching a new pricing strategy to assess profit, growth, and risk implications.
📚 Source: Module 4, Section 4.8.5 (p.34)
How should pricing approaches differ between stakeholders (e.g., consumers vs. regulators)?
Consumers prioritize affordability and fairness; regulators prioritize accessibility, transparency, and non-discrimination; companies prioritize sustainability and profitability.
📚 Source: Big Data Paper p.6–7
When is pricing with technical premiums not enough?
When market dynamics, strategic goals, or regulatory constraints require deviation.
📚 Source: Module 4, p.7–8
Why is exposure normalization key in experience studies?
It ensures fair comparison of loss rates across policies or time.
📚 Source: Module 4, Section 4.3.2 (p.12)
What’s the difference between disparate impact and disparate treatment?
Impact = unintentional unfair outcomes; Treatment = intentional bias.
📚 Source: Big Data Paper p.12
How should you approach using socio-economic data?
Carefully—balance predictive power with risk of bias and regulatory limits.
📚 Source: Big Data Paper p.40–43
Why is explainability important in model deployment?
To meet regulatory expectations and build consumer trust.
📚 Source: Big Data Paper p.8–9
What is the goal of simple tabular analysis in risk classification?
To explore patterns and relativities using summary statistics across one or two rating variables.
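A one-way relativity is just each class's mean loss cost divided by the overall mean. The sketch below uses a hypothetical age-band example, not data from the module:

```python
from collections import defaultdict

def one_way_relativities(records):
    """Relativity = class mean loss cost / overall mean loss cost."""
    totals, counts = defaultdict(float), defaultdict(int)
    for cls, loss in records:
        totals[cls] += loss
        counts[cls] += 1
    overall = sum(totals.values()) / sum(counts.values())
    return {cls: (totals[cls] / counts[cls]) / overall for cls in totals}

# Hypothetical loss costs by age band.
data = [("young", 800), ("young", 1200), ("adult", 400), ("adult", 600)]
print(one_way_relativities(data))  # young: 1000/750 ≈ 1.33, adult: 500/750 ≈ 0.67
```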
What’s a key limitation of tabular analysis?
It does not consider correlations between explanatory variables.
What is the goal of GLM parameter estimation?
To test whether predictors significantly affect the outcome (i.e., whether coefficients differ significantly from zero).
Which distributions are typically used in insurance GLMs?
Poisson for frequency, Gamma for severity.
What’s the difference between an offset and weights in a GLM?
An offset enters the linear predictor with a fixed coefficient of 1 (e.g., log exposure); weights scale each observation's contribution to the likelihood.
How is the predicted value retrieved in a GLM?
By applying the inverse of the link function.
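The two previous cards can be tied together with a small sketch. Assuming a Poisson frequency GLM with a log link and a log-exposure offset (the fitted coefficients below are hypothetical), the prediction is the inverse link, `exp`, applied to the linear predictor:

```python
import math

def predict_frequency(beta0, beta1, x, exposure):
    """Log-link Poisson GLM: eta = beta0 + beta1*x + log(exposure) (offset);
    the predicted mean is the inverse link applied to eta, i.e. exp(eta)."""
    eta = beta0 + beta1 * x + math.log(exposure)
    return math.exp(eta)

# Because the offset enters with coefficient 1, doubling the exposure
# exactly doubles the expected claim count.
mu_half = predict_frequency(beta0=-2.0, beta1=0.3, x=1.0, exposure=0.5)
mu_full = predict_frequency(beta0=-2.0, beta1=0.3, x=1.0, exposure=1.0)
print(round(mu_full / mu_half, 6))  # 2.0
```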
Which metrics are used to compare GLMs?
AIC, BIC, scaled deviance, the F-test, and the chi-square test.
What does a small AIC or BIC indicate?
A better-fitting model with fewer parameters.
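The penalty-for-parameters idea is easy to show numerically. Using the standard definitions (AIC = 2k − 2·logL, BIC = k·ln(n) − 2·logL) with made-up log-likelihoods:

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: penalizes each parameter by 2."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: heavier penalty, ln(n) per parameter."""
    return k * math.log(n) - 2 * log_lik

# Hypothetical: model B adds 3 parameters but barely improves the fit.
n = 1000
a = aic(log_lik=-520.0, k=4)   # simpler model -> 1048
b = aic(log_lik=-519.0, k=7)   # complex model -> 1052
print(a < b)  # True: the extra parameters are not worth the tiny gain
print(bic(-520.0, 4, n) < bic(-519.0, 7, n))  # True: BIC penalizes even harder
```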
What is cross-validation?
A technique that trains and validates the model across multiple data splits.
Why is residual analysis important in GLMs?
To detect patterns that show model misfit.
What is distributional bias?
When exposures are unevenly distributed across classes, impacting relativity calculations.
Why might a one-way approach fail?
It doesn’t account for interactions between variables.
What is the goal of minimum bias methods?
To adjust class relativities considering all combinations of risk factors.
What error metric is used to compare one-way and minimum bias methods?
Absolute error—lower is better.
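As an illustration of the minimum bias idea, here is a sketch of Bailey's multiplicative iteration for a two-factor grid (the 2×2 loss costs and exposures are hypothetical). Each pass re-solves one set of relativities so that fitted and observed losses balance in that dimension:

```python
def bailey_multiplicative(obs, weights, n_iter=50):
    """Bailey's multiplicative minimum bias for a 2-factor grid.
    obs[i][j] = observed loss cost; weights[i][j] = exposure."""
    rows, cols = len(obs), len(obs[0])
    x = [1.0] * rows  # row relativities
    y = [1.0] * cols  # column relativities
    for _ in range(n_iter):
        for i in range(rows):  # balance each row given current y
            num = sum(weights[i][j] * obs[i][j] for j in range(cols))
            den = sum(weights[i][j] * y[j] for j in range(cols))
            x[i] = num / den
        for j in range(cols):  # balance each column given current x
            num = sum(weights[i][j] * obs[i][j] for i in range(rows))
            den = sum(weights[i][j] * x[i] for i in range(rows))
            y[j] = num / den
    return x, y

# Hypothetical 2x2 loss-cost grid with uneven exposure across cells.
obs = [[100.0, 200.0], [150.0, 300.0]]
w = [[50.0, 10.0], [10.0, 50.0]]
x, y = bailey_multiplicative(obs, w)
# Fitted cell value is x[i] * y[j]; here the data are exactly multiplicative,
# so the iteration recovers the observed grid.
print(round(x[0] * y[0], 1))
```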
What is the main advantage of Decision Trees?
Easy to interpret and visualize.
What’s a key risk with Decision Trees and Random Forests?
Overfitting—models may capture noise instead of signal.
How do Gradient Boosting Machines (GBM) work?
By building trees sequentially, each correcting the previous.
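The sequential-correction idea can be shown with a toy boosting loop: each stage fits a one-split stump to the current residuals under squared loss and adds a damped version of it to the prediction. This is a bare sketch on made-up 1-D data, not a production GBM:

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on 1-D data (squared loss)."""
    best = None
    for split in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= split]
        right = [r for xi, r in zip(x, residuals) if xi > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda xi: lm if xi <= split else rm

def gbm(x, y, n_trees=20, lr=0.5):
    """Sequential boosting: each stump is fit to the previous residuals."""
    pred = [sum(y) / len(y)] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return pred

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 12.0, 30.0, 34.0]
pred = gbm(x, y)  # boosted fit is far closer to y than the plain mean
print([round(p, 1) for p in pred])
```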
Why use Random Forests?
For improved accuracy and flexibility across data types.
What are key ethical concerns in modeling?
Data quality, transparency, causality, and fairness.
What is a proxy variable?
A variable that mimics a disallowed or biased variable.
What’s the difference between interpretability and explainability?
Interpretable models are understandable by design (e.g., GLM coefficients); explainable models need post-hoc support to be understood.
What is lift in model performance?
The model’s ability to separate good and bad risks.
What does a ROC curve show?
The trade-off between true and false positives across thresholds.
What is AUROC?
Area Under ROC Curve—closer to 1 = better model.
What does the Gini index measure in modeling?
The ability to rank risks, not profitability.
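AUROC and Gini are linked by Gini = 2·AUC − 1. A small sketch with hypothetical scores, using the rank interpretation of AUC (the probability a random positive outscores a random negative, ties counting half):

```python
def auroc(scores, labels):
    """P(score of a random positive > score of a random negative)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores and claim indicators.
scores = [0.9, 0.8, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
auc = auroc(scores, labels)
print(round(auc, 4), round(2 * auc - 1, 4))  # 0.8333 0.6667 (Gini = 2*AUC - 1)
```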
What are common transformations for non-linear variables?
Binning, polynomial terms, and piecewise linear functions.
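Binning is the simplest of the three. A minimal sketch, with hypothetical age-band cut points:

```python
def bin_variable(x, edges):
    """Map a continuous value to a bin index given ascending cut points."""
    for i, edge in enumerate(edges):
        if x <= edge:
            return i
    return len(edges)

# Hypothetical age bands: <=25, 26-40, 41-65, 65+.
ages = [19, 33, 50, 72]
print([bin_variable(a, [25, 40, 65]) for a in ages])  # [0, 1, 2, 3]
```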
What’s the purpose of interaction terms?
To model the combined effect of variables.
What causes multicollinearity in GLMs?
Strong correlation between predictors.
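Multicollinearity is commonly diagnosed with the variance inflation factor, VIF = 1/(1 − R²). In the two-predictor case R² is just the squared correlation, which keeps the sketch self-contained; the age and years-licensed figures are made up:

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def vif_two_predictors(a, b):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 = corr(a, b)^2."""
    r2 = pearson_r(a, b) ** 2
    return 1.0 / (1.0 - r2)

age = [20, 30, 40, 50, 60]
years_licensed = [2, 12, 22, 32, 41]  # nearly collinear with age
print(vif_two_predictors(age, years_licensed) > 10)  # True: severe multicollinearity
```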
How do you detect aliasing?
Look for perfectly correlated predictors; aliasing makes parameters non-identifiable and destabilizes the model.
What are the high-level model-building steps?
Set goals, collect data, model, validate, implement, maintain.
When is regularization used?
To handle many variables and prevent overfitting.
In classification techniques, what does precision measure?
The proportion of predicted positives that are actually positive.
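Concretely, precision = TP / (TP + FP). A minimal sketch with made-up predicted and actual labels:

```python
def precision(predicted, actual):
    """Precision = TP / (TP + FP): of everything flagged positive, how much was right."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    return tp / (tp + fp)

pred   = [1, 1, 1, 0, 0, 1]
actual = [1, 0, 1, 0, 1, 1]
print(precision(pred, actual))  # 3 of 4 flagged positives are true: 0.75
```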