Use of Multivariate Models in Pricing Flashcards

1
Q

Information required to quote: Motor

A
  1. Information on the policy and coverage.
    - type of cover
    - payment frequency
    - voluntary excess
  2. Details of proposer
    - age
    - gender
    - marital status
    - occupation
  3. Details of driver
    - experience
    - age
    - relationship to the proposer
  4. Details of vehicles
    - make/model
    - parking location
    - value
    - safety features
2
Q

Information required to quote: Household insurance

A
  1. Policy details
    - type of cover
    - excess
    - special items
    - number of people in household, e.g. bachelor or family home
    - claims experience
  2. Proposer
    - age
    - gender
    - marital status
    - smoker
    - employment status
  3. House details
    - year of purchase
    - type of property, e.g. flat or bungalow
    - age of property
    - construction type
    - number of bedrooms
    - location
    - ownership type, e.g. leasehold
    - property value
    - security features, e.g. alarm
    - trees near the property
3
Q

Vehicle Classification Techniques: ABI classifies vehicles into 50 groups based on characteristics. Factors used to establish groupings:

A
  • damage and parts costs
  • repair times
  • new car values
  • body shells (aluminium or steel)
  • performance (acceleration, speed)
  • car security
  • safety features
  • engine size

Many insurers use the ABI groupings as a starting basis for categorising vehicles, then adjust based on their own experience.

4
Q

Risk Groupings: Motor

A
Claim type
Size of claim
Past claims experience
NCD status
Age
Gender
Vehicle group
Vehicle age
5
Q

Risk Groupings: Household

A
Claim type, e.g. fire, theft
Size of claim
Number of bedrooms
Location
Age
Sum insured
Past claims experience
6
Q

Pricing with limited data

A
  • Use other data, e.g. external data, adjusted historical data, or own data for similar business.
  • Include loadings or conservative assumptions.
  • Use ILFs/first-loss curves to estimate premiums for higher layers (see the sketch after this list).
  • Use qualitative methods, e.g. where risk perception is an important element in pricing.
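A minimal sketch of layer pricing with ILFs, in Python. The ILF values, limits and function name are illustrative assumptions, not figures from the course notes:

```python
# Hypothetical increased limit factors (ILFs), keyed by policy limit.
# Real ILFs would come from external curves or market data.
ILF = {1_000_000: 1.00, 2_000_000: 1.25, 5_000_000: 1.55}

def layer_premium(basic_limit_premium: float, attach: int, exhaust: int) -> float:
    """Premium for the layer between attach and exhaust, given the
    premium at the basic limit (1m here, where ILF = 1.00)."""
    return basic_limit_premium * (ILF[exhaust] - ILF[attach])

# e.g. the layer 1m xs 1m, with a basic-limit premium of 10,000:
print(layer_premium(10_000, 1_000_000, 2_000_000))  # 10000 * (1.25 - 1.00) = 2500.0
```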
7
Q

Risk factors that actually affect the claims cost: Motor

A

Driver
- driving style, experience, level of skill, powers of observation

Vehicle
- value of vehicle/repair cost, safety features (e.g. airbags, ABS), security features, performance, speed, size, weight

Environment
- the road, the time of day, natural hazards, e.g. ice

Exposure
- amount of driving (number of miles)

Third parties

Many of these factors cannot be quantified, change over time and cannot be stated reliably by the customer.

8
Q

Point-of-sale questions act as proxies for genuine risk factors.
The effectiveness of these proxies depends on:

A
  • How directly it measures a genuine risk factor, e.g. vehicle value.
  • Whether the proxy is a factual quantity known to the proposer, e.g. postcode, which can give more information on the area.
  • Whether the factor has an obvious direction, so that the proposer may misstate it to obtain a cheaper quote, e.g. annual mileage.
  • The extent to which the proxy overlaps with others, e.g. years since licence obtained correlates with age of driver.
9
Q

Risk factors that actually affect the claims cost: Household

A
  • Smoking is correlated with fire; many house fires are caused by cigarettes.
  • Some construction types are vulnerable to fire.
  • The presence of an alarm will impact theft.
  • Measures of exposure: sum insured, number of bedrooms, number of children.
  • Many customers do not realise they could claim for particular events. Employment status, number of prior claims and postcode act as proxies for this behaviour.
10
Q

External data that may be used to predict claims experience.

A

Proposer

  • previous insurance company
  • other insurance products held
  • customer lifetime value models: assess price elasticity, cross-selling, strategy towards lapsing insureds, how long they remain with the company, etc.
  • customer behaviour models
  • credit score/insurance score

Location, e.g. postcode

  • average wealth
  • subsidence/geological soil data
  • flood and theft data
  • census data

Insured asset (motor)

  • data from the insurers’ trade body (ABI), e.g. car group
  • data from the motor registration/licensing authority
  • additional vehicle data
  • data from inter-industry agreements to share claims information
  • aggregator sites may provide competitor rates
  • indices for inflation from government or consultancies
  • tax rates
11
Q

How external data actually affect the claims cost:

A
  • Questions relating to perils, e.g. flood data relates to the flood peril; ABI data can give information on performance and repair costs.
  • Questions that relate to risk exposure, e.g. licensing bodies may collect actual mileage data.
  • Questions that relate to customer discretionary behaviour, e.g. wealth, prior claims, cross-product data.

12
Q

Types of multivariate model

A
  • GLM (mainly used for personal lines)

- The link function is a log function, which allows a multiplicative relationship between factors (as illustrated below).
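As an illustration (standard GLM algebra, not specific to the course notes), the log link turns the additive linear predictor into a product of relativities:

$$\ln \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \quad\Longrightarrow\quad \mu = e^{\beta_0} \cdot e^{\beta_1 x_1} \cdot e^{\beta_2 x_2}$$

so each rating factor contributes a multiplier to the expected value.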

13
Q

Classification and approaches

A
  • Factors such as postcode have a large number of levels.
  • Classification may be used to produce smaller groupings so that they can be included in the GLM.
  • This improves predicted values by taking credibility into account.

Approaches

  • Spatial smoothing
  • Vehicle classification techniques based on ABI.
14
Q

Spatial smoothing and types

A
  • Assumes that physically close areas have similar experience.

Types

  1. Distance-based smoothing
  2. Adjacency-based smoothing
  • Too little spatial smoothing means nearby or neighbouring location codes have little influence, so random noise remains, reducing the predictiveness of the model.
  • Too much spatial smoothing can result in the blurring of experience, so that some of the true underlying residual variation is lost, again causing distortions.
  • Both under-smoothing and over-smoothing can result in poor pricing and anti-selection.
15
Q

Distance-based smoothing

A
  • The further away a location code, the less weight is given to its experience.
  • Ignores rural/urban and natural/artificial boundaries.
  • Used for weather-related perils.
  • Easy to understand and implement (no distributional assumptions).
  • Can be enhanced to include other dimensions, e.g. density. (See the sketch below.)
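A minimal sketch of distance-based smoothing, assuming per-location residuals (e.g. from a GLM), exposures and coordinates are available; the Gaussian kernel and bandwidth h are illustrative choices, not prescribed by the course notes:

```python
import numpy as np

def smooth(coords: np.ndarray, residuals: np.ndarray,
           exposure: np.ndarray, h: float) -> np.ndarray:
    """Exposure-weighted kernel smoothing: the further away a location,
    the less weight its experience receives."""
    smoothed = np.empty(len(residuals))
    for i, c in enumerate(coords):
        d2 = ((coords - c) ** 2).sum(axis=1)        # squared distances to location i
        w = exposure * np.exp(-d2 / (2 * h ** 2))   # kernel weight x credibility
        smoothed[i] = (w * residuals).sum() / w.sum()
    return smoothed
```

The bandwidth h plays the role of the degree of smoothing discussed in the previous card: too small and neighbours have little influence, too large and genuine residual variation is blurred.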
16
Q

Adjacency-based smoothing

A
  • Incorporates information about neighbouring location codes.
  • Complex to implement (iterative algorithm).
  • Natural and artificial boundaries are reflected in the smoothing.
  • Handles urban and rural differences better; more appropriate for non-weather-related perils.
17
Q

Forms of models

A
  1. Claims frequency
    - Error distribution: Poisson
    - Link function: log, ln(mu), giving a multiplicative model (the log link is a limiting case of the Box-Cox family)
    - Scale parameter: 1, or can be fitted
    - Variance function: mu
    - Weight: length of time the policy is on risk, i.e. number of exposures
    - Offset term: none, except if NCD levels are fixed
  2. Claim severity
    - Predicts claim size.
    - Error distribution: Gamma. Note that the Gamma distribution excludes zero-sized claims; such claims may be modelled separately.
    - Link function: log, ln(mu), giving a multiplicative model
    - Scale parameter: estimated
    - Variance function: mu^2
    - Weight: number of claims
    - Offset term: none
  3. Propensity
    - Models the behaviour of the policyholder.
    - Error distribution: binomial
    - Link function: logit, ln(mu/(1 - mu)), giving a multiplicative model with predictions in the range [0,1]
    - Scale parameter: 1
    - Variance function: mu(1 - mu)
    - Weight: 1
    - Offset term: none
  4. Claim numbers
  5. Total claims cost: Tweedie

(See page 774)
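A minimal sketch of the frequency and severity forms above using statsmodels; the dataframe df and its column names are assumptions. The frequency model uses a log(exposure) offset, a common formulation equivalent to weighting by time on risk:

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Frequency: Poisson error, log link, exposure entering as an offset.
freq = smf.glm("claim_count ~ C(vehicle_group) + C(driver_age_band)",
               data=df,
               family=sm.families.Poisson(sm.families.links.Log()),
               offset=np.log(df["exposure"])).fit()

# Severity: Gamma error, log link, weighted by claim numbers, fitted
# only to records with non-zero claims (Gamma excludes zero-sized claims).
claims = df[df["claim_count"] > 0]
sev = smf.glm("avg_claim_size ~ C(vehicle_group) + C(driver_age_band)",
              data=claims,
              family=sm.families.Gamma(sm.families.links.Log()),
              freq_weights=claims["claim_count"]).fit()
```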

18
Q

Claim type models: give an accurate assessment of risk and reflect product coverages.

A

Motor
- accidental damage, third party property damage, third party bodily injury, fire, theft, windscreen

Household
- accidental damage, fire, theft, burst pipes, flood, storm, subsidence, liability, personal possessions

19
Q

Initial analysis prior to multivariate modelling

A
  1. One-way analysis
    - Investigates the exposure and claims distributions and one-way statistics, e.g. frequency, loss ratio.
    - Indicates whether a variable has enough information to be included in the model.
    - E.g. if 90% of the exposure lies in one level of a factor, that factor may be unsuitable. Remove factors where a high proportion of the data has an unknown level, but retain low-exposure/low-claim-count levels by combining them with other levels (to avoid near-aliasing).
  2. Two-way analysis
    - Considers key statistics for combinations of two factors; useful where we suspect correlation.
  3. Correlation analysis
    - Assesses correlation within the portfolio.
    - Explains why multivariate results differ from univariate ones.
    - Indicates which factors may be affected by the removal or inclusion of other factors in the GLM.
    - Correlation statistic for categorical factors: Cramér’s V. A value of zero means knowledge of one of two factors gives no knowledge of the value of the other; a value of 1 means the value of one factor allows the value of the other to be deduced (highly correlated). (See the sketch after this list.)
  4. Distribution analysis
    - Mainly on the distribution of claim amounts.
    - Identifies unusual features or problems, e.g. large claims, distortions from reserves.
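A minimal sketch of Cramér’s V for two categorical rating factors (the formula is standard; the columns passed in would be assumptions):

```python
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """0: knowing one factor tells you nothing about the other;
    1: one factor's value can be deduced from the other."""
    table = pd.crosstab(x, y).to_numpy()
    n = table.sum()
    expected = table.sum(axis=1, keepdims=True) @ table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))
```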
20
Q

Model Combining

A
  • Fitting GLMs separately for frequency and severity provides an understanding of the way in which factors affect the cost of claims.
  • Further, it allows the identification and removal (via smoothing) of certain random effects from one element of the experience.
  • Ultimately, these models need to be combined to give an indication of loss cost (or “risk premium”) relativities.
21
Q

Combining models: multiplicative (one claim type)

A

The frequency multipliers for each factor can be multiplied by the severity multipliers for the same factors (adding the parameter estimates when using a log link function).
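As a worked illustration (standard log-link algebra, not taken from the notes): if a factor level has a frequency multiplier of 1.10 and a severity multiplier of 1.05, its risk premium multiplier is 1.10 × 1.05 = 1.155. Equivalently, on the log scale the parameter estimates add:

$$\beta_i^{RP} = \beta_i^{freq} + \beta_i^{sev} \quad\Longrightarrow\quad e^{\beta_i^{RP}} = e^{\beta_i^{freq}} \cdot e^{\beta_i^{sev}}$$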

22
Q

Combining models across claim types (additive), e.g. a single theoretical risk premium for many claim types

A
  • Select a dataset that reflects the future mix of business.
  • Calculate an expected claim frequency and severity by claim type for each record in the data.
  • Combine these fitted values, for each record, to derive the expected cost of claims (according to the individual GLMs) for each record.
  • Fit a further GLM to this total expected cost of claims, with this final GLM containing the union of all factors (and interactions) in all of the underlying models. (See the sketch below.)

Additive methods also allow us to incorporate non-proportional elements, preventing high-risk factors from being excessively loaded.
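A minimal sketch of these steps, assuming per-record fitted frequencies and severities by claim type are already in a dataframe df (the claim types, column and factor names are illustrative):

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Expected cost of claims per record: sum of frequency x severity
# across the hypothetical claim types.
df["risk_premium"] = sum(df[f"freq_{t}"] * df[f"sev_{t}"]
                         for t in ["ad", "fire", "theft"])

# Further GLM on the combined expected cost, containing the union
# of all factors used in the underlying claim-type models.
combined = smf.glm("risk_premium ~ C(vehicle_group) + C(driver_age_band) + C(area)",
                   data=df,
                   family=sm.families.Gamma(sm.families.links.Log())).fit()
```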

23
Q

Allocation of expenses for risks with high lapse rates

A
  • The amount added to each observation’s expected risk premium could be designed to vary according to the results of a separate retention study.
  • This would allow risks with a high propensity to lapse to receive a higher proportion of fixed expense than others.
  • A further GLM is fitted to the sum of the expected risk premium and a (lapse-dependent) expense load.
24
Q

Model Validation methods.

A

Validation samples can be withheld from the data used in modelling and then used to test how close the model predictions are to actual experience, and therefore how accurate they are likely to be for future rates.

Method 1: Actual vs expected
- Plot a graph of actual vs predicted claims and compare.

Method 2: Lift curves
- Calculate a lift curve for each model.
- The steeper the curve, the more effective the model (the policies with the highest expected claim frequency also have the highest observed claim frequency).
- Best for comparing two models of different forms. (See the sketch at the end of this card.)
- How:
  a) Obtain an out-of-sample model validation dataset.
  b) For each policy in the dataset, find its expected claim frequency using Model 1.
  c) Rank all policies in the validation dataset in order of expected claim frequency, according to Model 1.
  d) Split the ranked policies into groups 1 to X of equal exposure.
  e) Calculate the actual claim frequency for each group and plot this against the group number.
  f) Repeat this process for Model 2, and plot the results on the same chart.

Method 3: Gains curves
- Gini coefficient: the area enclosed by the model curve and the diagonal line (as a ratio of the triangle above the diagonal).
- The higher the Gini coefficient, the more predictive the model.
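A minimal sketch of steps (a) to (e), assuming an out-of-sample dataframe with exposure, claim counts and a column of expected frequencies per model (all column names are assumptions):

```python
import numpy as np
import pandas as pd

def lift_curve(valid: pd.DataFrame, pred_col: str, n_groups: int = 10) -> pd.Series:
    """Actual claim frequency per equal-exposure group, ranked by the
    model's expected claim frequency."""
    d = valid.sort_values(pred_col)
    cum = d["exposure"].cumsum() / d["exposure"].sum()
    d["group"] = np.minimum((cum * n_groups).astype(int) + 1, n_groups)
    return d.groupby("group").apply(
        lambda g: g["claim_count"].sum() / g["exposure"].sum())

# Plot lift_curve(valid, "pred_model1") and lift_curve(valid, "pred_model2")
# on the same chart; the steeper curve is the more predictive model.
```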
25
Q

Model implementation

A
  • Once the theoretical rates have been produced, they need to be compared with:
    a) the current rates (to see what the effect would be on a particular book of business)
    b) if possible, competitors’ rates.

- There are various graphical representations that can help with this. (See page 796.)

26
Q

Model validation: Gains curve: how to create

A
  • Sort the policies from high to low according to the fitted model values.
  • Plot the cumulative fitted values against the cumulative exposure.
  • Plot the cumulative observed values against the cumulative exposure.
  • A straight diagonal reference line should then be plotted.
  • This represents the cumulative fitted values for a model that assigns the same expected value to all policies.
  • A statistical measure for the lift produced by the model is the Gini coefficient. This can be thought of as the area enclosed by the model curve and the diagonal line, as a ratio of the triangle above the diagonal. (See the sketch below.)
  • A high Gini coefficient indicates a good model, i.e. one that distinguishes well between good and bad risks.
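A minimal sketch of a Gini coefficient computed from the gains curve, assuming arrays of per-policy fitted values and exposures:

```python
import numpy as np

def gini(fitted: np.ndarray, exposure: np.ndarray) -> float:
    order = np.argsort(-fitted)                          # sort high to low
    x = np.append(0.0, np.cumsum(exposure[order]) / exposure.sum())  # cumulative exposure
    y = np.append(0.0, np.cumsum(fitted[order]) / fitted.sum())      # cumulative fitted values
    area = np.trapz(y, x) - 0.5   # area between model curve and diagonal
    return area / 0.5             # as a ratio of the triangle above the diagonal
```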
27
Q

Model validation: Actual vs expected plots: how to create

A
  • Group the policies into bands by expected cost of claims.
  • The exposure in each group will differ, and may be small for some groups, e.g. where the expected claim cost is very high.
  • It may be necessary to rescale so that the averages of the actual and expected values are the same.
  • For each group, plot both the average expected cost of claims and the average actual cost of claims on the same graph.
  • The average expected values will form a straight line plot.
  • Average actual values below this line indicate where the model overestimates the true value; average actual values above the line indicate underestimation.
  • A perfect fit would see the average actual values matching the average expected values, i.e. sitting along the same straight line plot. (See the sketch below.)
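A minimal sketch of the banding and rescaling steps, assuming a validation dataframe with per-policy expected and actual claim costs (column names are assumptions; plotting is omitted):

```python
import pandas as pd

bands = pd.cut(valid["expected"], 10)   # equal-width bands by expected cost of claims
ave = valid.groupby(bands).agg(expected=("expected", "mean"),
                               actual=("actual", "mean"))
# Rescale so the overall average actual matches the overall average expected,
# then plot both columns against the band index.
ave["actual"] *= ave["expected"].mean() / ave["actual"].mean()
```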
28
Q

Offsetting

A
  • The main purpose of offsetting is to fix the relativities of a factor to a set of values that would differ from the naturally fitted values.
  • A classic example of this is the discount under an NCD scale.
  • The discount percentages may be fixed by marketing and hence cannot be fitted, but they need to be allowed for when fitting to the rest of the data.
  • Offsetting is also used in more complex models where a hierarchy of models is wanted.
  • This is achieved by fitting the first model, offsetting all the values and then fitting a second model to explain the remaining patterns in the data.
  • Offsetting can be useful for model validation:
  • We first divide the data into modelling and test sets.
  • After fitting a model using the modelling dataset, we fully offset it.
  • The predictiveness of the model can then be judged by how well it performs against the test set, by comparing the observed values directly to the fitted values from the offset model.
  • A poor fit may indicate that significant factors are missing from the model.
  • If there is little or no data available for a rating factor, e.g. when introducing a new rating factor, offsetting can be used to set the relativities for that factor based on benchmarks or actuarial judgement. (See the sketch below.)
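A minimal sketch of offsetting fixed NCD relativities in a frequency GLM, following the NCD example above; the dataframe df, the discount scale and the column names are all assumptions:

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical NCD discounts fixed by marketing, so not fitted.
ncd_discount = {0: 0.00, 1: 0.10, 2: 0.20, 3: 0.30}
ncd_factor = 1.0 - df["ncd_level"].map(ncd_discount)

# The fixed NCD relativities enter the offset on the log-link scale,
# alongside log(exposure); the remaining factors are fitted freely.
model = smf.glm("claim_count ~ C(vehicle_group) + C(driver_age_band)",
                data=df,
                family=sm.families.Poisson(),
                offset=np.log(df["exposure"]) + np.log(ncd_factor)).fit()
```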