Chapter 3 - Data Flashcards
What time frame of data do actuaries require for considering the future and why?
Actuaries usually require data about the present, to give an accurate starting point for projecting into the future, and past data, to use as a guide for constructing models and setting assumptions. Ex: exposed to risk or numbers of deaths/claims
Knowledge of structural drivers is also essential. Ex: a recent pandemic or social trends
Give examples of the different data that might be needed by a bank seeking to lend in the form of mortgages (data that is fact, uncertain, or judgement).
Facts obtained: Amount of the requested mortgage, address of the home on which the loan is secured, house purchase price
Reasonably accurate data: Past loan experience, ex: default rates for different types of borrowers or house price movements
Assumptions/Judgement: The bank's assessment of the current economy.
Define explicit assumption
Explicit assumptions are those that have been expressed and shared.
Define implicit assumptions
Implicit assumptions are those that haven’t been articulated. We make implicit assumptions based on our personal experience and position, often without even realising that that’s what we’re doing.
Describe the two main uses of data by actuaries
Necessary for actuarial tasks: ex: pension scheme set up
Also for model development to predict what might happen in the future
Compare the scientists' view of the world with the actuaries' view in terms of modelling and predictions.
Scientists often know future outcomes because the same thing will happen every time. For actuaries the situation is more difficult for two reasons:
First, the future is random. Model output is either a probability distribution or characteristics of a probability distribution.
The second difficulty is that the probability distribution or its characteristics are rarely known.
So, while physicists, chemists, economists and actuaries all have theories about how the world works, only the first two work in a reasonably stable, consistent and predictable environment.
Actuaries also have to recognise that observations from the past may or may not be representative of the conditions that will apply in the future.
Explain why insurance companies often collaborate in collecting data, even though collaborating with competitors is not intuitive.
Insurance companies have long recognised the need to acquire large amounts of data and often do so by collaborating in its collection.
This may be viewed as anti-competitive, but it may also be viewed as increasing competition in the market because new market entrants have much to learn from the other players.
When specifying data requirements, what must actuaries consider before data collection?
What data they need - Actuaries may often have the opportunity to be part of the process when a new product or system is introduced. This provides an opportunity to request data fields that may be useful for future analyses, so think of data needs for the future.
Must find out what data is available to you.
Have an idea of the nature of the solution in advance of data collection
Definitions of each field in data you’re collecting
Give examples of data that might be required and also desired from employees for a DB pension scheme calculation of PV of promised benefits
Date of birth, date hired, current status, salary history, benefit amount and annuity choice for actives.
Forecasts may also benefit from knowing: gender and job classification and time in that job. These may help because mortality and retirement rates may differ by both these factors. Future salary values may differ by job classification and time served in that position.
Why do employers collect data that they cannot discriminate based on?
It is collected to demonstrate compliance with anti-discrimination rules, to check that the data are representative of the population, and for reserving purposes. Insurers setting technical provisions can collect gender data: it is not permitted to affect pricing, but reserving needs to take account of gender.
Describe the balance equation between grouping data and credibility
Ideally data to be analysed should be split into homogeneous groups in a mortality investigation. There is a balance to be struck between splitting data into homogeneous groups and having sufficient data in a group
Where data is scarce, such as for numbers of deaths at young ages, splitting data into homogeneous groups may result in data groups that are too small to enable any credible analysis.
There is also a need to carry out sensitivity testing to check that if the data are grouped in a different way the same results are obtained.
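The trade-off between homogeneous groups and sufficient data can be sketched as below. This is only an illustration: the death records, the `MIN_DEATHS` threshold and the `thin_groups` helper are all made up for the example and are not a credibility standard.

```python
from collections import Counter

# Hypothetical death records: (age band, smoker status). Illustrative only.
deaths = (
    [("20-39", "smoker")] * 2
    + [("20-39", "non-smoker")] * 3
    + [("40-59", "smoker")] * 30
    + [("40-59", "non-smoker")] * 45
)

MIN_DEATHS = 10  # assumed minimum count for a usable group, not a standard


def thin_groups(records, key):
    """Return the groups whose death count falls below MIN_DEATHS."""
    counts = Counter(key(r) for r in records)
    return {group: n for group, n in counts.items() if n < MIN_DEATHS}


# Fine split: age band and smoker status together.
fine = thin_groups(deaths, key=lambda r: r)
# Coarse split: age band only.
coarse = thin_groups(deaths, key=lambda r: r[0])

print("Thin groups, fine split:  ", fine)
print("Thin groups, coarse split:", coarse)
```

Regrouping the same records under a different key is also a simple form of the sensitivity test: if conclusions change a lot between the two splits, the grouping itself is driving the result.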
When using industry wide data why wouldn’t data supplied by different organisations be comparable?
Heterogeneity due to:
Different geographical or socio-economic sections of the market; different sales methods; different practices, ex: underwriting; the nature of the data stored by different companies will not be the same; the coding used for the risk factors may vary
What problems can arise from using industry wide data?
Heterogeneity - data being on different bases
Data is less detailed or less flexible
External data are often more out of date than internal data
Data quality will depend on the quality of the data systems of all of its contributors
Not all organisations contribute - not representative.
What is a key difference between sampling and surveying?
Sampling = truly random selection, forced responses
Survey = biased by voluntary returns
Define stratified sampling and give an example
Stratified sampling (risk-based sampling) splits the population into segments and deliberately biases the sample towards large claims or other important segments.
Ex: A full valuation of insurance liabilities or a pension scheme may need to use whole population data in order to demonstrate sufficient accuracy whereas customer satisfaction analysis might use survey data as accuracy is less critical.
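The difference between a simple random sample and a sample biased towards large claims can be sketched as below. The claim amounts, the `LARGE` cut-off and both helper functions are hypothetical, and "biased to large claims" is read here as keeping every large claim plus a random sample of the rest, which is only one possible design.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical claim amounts: many small claims and a few large ones.
claims = [rng.uniform(100, 5_000) for _ in range(95)]
claims += [rng.uniform(50_000, 200_000) for _ in range(5)]

LARGE = 50_000  # assumed cut-off for a "large" claim


def simple_random_sample(population, n):
    """Every claim has the same chance of selection."""
    return rng.sample(population, n)


def stratified_sample(population, n_small):
    """Keep every large claim, then randomly sample the small ones."""
    large = [c for c in population if c >= LARGE]
    small = [c for c in population if c < LARGE]
    return large + rng.sample(small, n_small)


srs = simple_random_sample(claims, 20)
strat = stratified_sample(claims, 15)

print("Large claims in simple random sample:", sum(c >= LARGE for c in srs))
print("Large claims in stratified sample:   ", sum(c >= LARGE for c in strat))
```

Because the large claims are over-represented by design, any estimate built from the stratified sample would need to be reweighted by segment before being applied to the whole portfolio.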
Define Cross sectional vs longitudinal data
Cross sectional data means looking at multiple individuals over a short period of time, while longitudinal data looks at a (usually smaller) number of individuals over a longer period of time.
Define a record
A record is a collection of data referring to one individual or one contract. A field is a property that a record might have.
Define a relational database
A relational database uses multiple tables and cross-references records between them. This can reduce run times for searching or sorting records.
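A minimal sketch of cross-referencing records between tables, using Python's built-in `sqlite3` module and an in-memory database. The table names, field names and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Two tables: each policy record points at a policyholder record.
conn.executescript("""
    CREATE TABLE policyholder (
        holder_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE policy (
        policy_id   INTEGER PRIMARY KEY,
        holder_id   INTEGER NOT NULL REFERENCES policyholder(holder_id),
        sum_assured REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO policyholder VALUES (?, ?)",
    [(1, "A. Example"), (2, "B. Example")],
)
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?)",
    [(10, 1, 100000.0), (11, 1, 50000.0), (12, 2, 75000.0)],
)

# Cross-reference the tables: policies and total sum assured per holder.
rows = conn.execute("""
    SELECT h.name, COUNT(*), SUM(p.sum_assured)
    FROM policy AS p
    JOIN policyholder AS h ON h.holder_id = p.holder_id
    GROUP BY h.holder_id
    ORDER BY h.holder_id
""").fetchall()

for name, n_policies, total in rows:
    print(name, n_policies, total)
```

Each holder's name is stored once in `policyholder`, however many policies they hold, and the `holder_id` field is the cross-reference between a record in one table and a record in the other.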
What three steps help to ensure high-quality data before the data is used?
Prevention - eliminate errors before they arise. Ex: testing data capture forms, restricting entries to feasible values (gender M or F), automatic checks, typing in an email address twice
Detection - Study collected data for errors. Is it in line with my expectations?
Treatment - deal with errors that have been detected. May be possible to repair the data, or use imputation
Define imputation
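The assignment of a value to something by inference from the value of the products or processes to which it contributes. In a data context, this means filling in a missing or erroneous value with an estimate inferred from the other data available.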
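One simple form of imputation, replacing missing values with the mean of the observed values, could be sketched as follows. Mean imputation is chosen here only because it is short to show; the salary figures and the `impute_mean` helper are hypothetical.

```python
from statistics import mean

# Hypothetical salary field with two missing entries (None).
salaries = [42_000, None, 38_000, 40_000, None]


def impute_mean(values):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


imputed = impute_mean(salaries)
print(imputed)
```

Mean imputation keeps the average unchanged but understates the spread of the data, so imputed values are usually flagged so that later analyses can tell them apart from observed ones.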
Define deterministic data checks and give examples
Deterministic checks look for specific errors that are likely to occur.
Entries restricted to a specific list of possibilities such as male / female,
Entries are restricted to certain numerical ranges, such as range of allowable ages
Entries must bear specific relationships to each other. Ex: Surrender value < Death benefit
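The three kinds of deterministic check above can be sketched as a single validation function. The field names, the allowed age range and the `deterministic_errors` helper are assumptions made for the example.

```python
ALLOWED_SEX = {"M", "F"}   # entries restricted to a specific list
AGE_RANGE = (0, 120)       # assumed allowable age range


def deterministic_errors(record):
    """Return a list of rule violations found in one record."""
    errors = []
    if record["sex"] not in ALLOWED_SEX:
        errors.append("sex not in allowed list")
    if not AGE_RANGE[0] <= record["age"] <= AGE_RANGE[1]:
        errors.append("age outside allowed range")
    # Relationship check: surrender value must be below the death benefit.
    if not record["surrender_value"] < record["death_benefit"]:
        errors.append("surrender value not below death benefit")
    return errors


good = {"sex": "F", "age": 45, "surrender_value": 8_000, "death_benefit": 50_000}
bad = {"sex": "X", "age": 150, "surrender_value": 60_000, "death_benefit": 50_000}

print(deterministic_errors(good))
print(deterministic_errors(bad))
```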
Define an exploratory check with examples
Examines various global characteristics of the data to see if anything unusual has been recorded.
Ex: calculations of max, min, means, stdevs, histograms, correlation / scatter plots etc.
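An exploratory check on a single field might look like the sketch below, which computes summary statistics and flags values far from the mean. The ages, the threshold of 2 standard deviations and the helper names are illustrative assumptions only.

```python
from statistics import mean, stdev

# Hypothetical age field; 340 is probably a keying error for 34 or 40.
ages = [34, 41, 29, 38, 45, 36, 340]


def summarise(values):
    """Global characteristics of one field."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
    }


def far_from_mean(values, k=2.0):
    """Values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]


summary = summarise(ages)
outliers = far_from_mean(ages)
print(summary)
print("Values to investigate:", outliers)
```

A flagged value is a prompt to investigate rather than proof of an error. A single extreme value also inflates the standard deviation, so in practice a histogram or a check of the maximum is often the quicker signal.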
What data checks should you perform according to Mailander 2000?
Know where the data comes from, why and how it was captured
Understand the incentives inherent in the data’s original use
Examine several randomly selected records - anything unexpected?
Look for mistakes ex: blanks and duplicates
Ask for the definition of the critical data items. Ex: Does Smoker mean "are you a smoker" or "have you ever smoked tobacco"?
Develop ways to verify the data.
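Two of the checks in this list, looking for blanks and looking for duplicates, can be sketched as below. The records, the field names and the choice of `policy_id` as the identifying field are hypothetical.

```python
from collections import Counter

# Hypothetical records with one duplicated ID and two blank fields.
records = [
    {"policy_id": "P1", "smoker": "Y"},
    {"policy_id": "P2", "smoker": ""},
    {"policy_id": "P1", "smoker": "Y"},
    {"policy_id": "P3", "smoker": None},
]


def blank_fields(recs):
    """Return (row index, field name) for every missing or empty value."""
    return [
        (i, field)
        for i, rec in enumerate(recs)
        for field, value in rec.items()
        if value is None or str(value).strip() == ""
    ]


def duplicate_ids(recs, key="policy_id"):
    """Return identifiers that appear on more than one record."""
    counts = Counter(rec[key] for rec in recs)
    return sorted(k for k, n in counts.items() if n > 1)


blanks = blank_fields(records)
dupes = duplicate_ids(records)
print("Blank fields: ", blanks)
print("Duplicate IDs:", dupes)
```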