Chapter 3 - Data Flashcards
What time frame of data do actuaries require for considering the future and why?
Actuaries usually require data about the present, to give an accurate starting point for projecting into the future, and past data, to use as a guide for constructing models and setting assumptions. Ex: exposed to risk or numbers of deaths/claims.
Knowledge of structural drivers is also essential, ex: a recent pandemic or social trends.
Give examples of different data that might be needed by a bank seeking to lend in the form of mortgages. (Data that is fact, uncertainty and judgement?)
Facts obtained: amount of the requested mortgage, address of the home on which the loan is secured, house purchase price.
Reasonably accurate data: past loan experience, for example default rates for different types of borrowers, or house price movements.
Assumptions/Judgement: the bank's assessment of the current economy.
Define explicit assumption
Explicit assumptions are those that have been expressed and shared.
Define implicit assumptions
Implicit assumptions are those that haven’t been articulated. We make implicit assumptions based on our personal experience and position, often without even realising that that’s what we’re doing.
Describe the two main uses of data by actuaries
Necessary for actuarial tasks, ex: setting up a pension scheme
Also for model development to predict what might happen in the future
Compare the scientist's view of the world with the actuary's in terms of modelling and predictions.
Scientists often know future outcomes, as the same thing will happen every time. For actuaries the situation is more difficult for two reasons:
First, the future is random. Model output is either a probability distribution or characteristics of a probability distribution.
The second difficulty is that the probability distribution or its characteristics are rarely known.
So, while physicists, chemists, economists and actuaries all have theories about how the world works, only the first two work in a reasonably stable, consistent and predictable environment.
Actuaries also have to recognise that observations from the past may or may not be representative of the conditions that will apply in the future.
Explain why insurance companies often collaborate in collecting data. Is it not counterintuitive to collaborate with competitors?
Insurance companies have long recognised the need to acquire large amounts of data and often do so by collaborating in its collection.
This may be viewed as anti-competitive, but it may also be viewed as increasing competition in the market because new market entrants have much to learn from the other players.
When specifying data requirements, what must actuaries consider before data collection?
What data they need - Actuaries may often have the opportunity to be part of the process when a new product or system is introduced. This provides an opportunity to request data fields that may be useful for future analyses, so think of data needs for the future.
Must find out what data is available to you.
Have an idea of the nature of the solution in advance of data collection
Definitions of each field in data you’re collecting
Give examples of data that might be required, and data that might also be desired, from employees for a DB pension scheme calculation of the PV of promised benefits.
Date of birth, date hired, current status, salary history, benefit amount and annuity choice for actives.
Forecasts may also benefit from knowing: gender and job classification and time in that job. These may help because mortality and retirement rates may differ by both these factors. Future salary values may differ by job classification and time served in that position.
Why do employers collect data that they cannot discriminate based on?
It’s collected to demonstrate compliance with anti-discrimination requirements, to check that the data is representative of the population, and for reserving purposes. Insurers setting technical provisions can collect gender data: it is not permitted to affect pricing, but reserving needs to take account of gender.
Describe the balance equation between grouping data and credibility
Ideally data to be analysed should be split into homogeneous groups in a mortality investigation. There is a balance to be struck between splitting data into homogeneous groups and having sufficient data in a group
Where data is scarce, such as for numbers of deaths at young ages, splitting data into homogeneous groups may result in data groups that are too small to enable any credible analysis.
There is also a need to carry out sensitivity testing to check that if the data are grouped in a different way the same results are obtained.
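The trade-off between homogeneity and group size can be illustrated with a minimal sketch. The death records, the field layout and the `MIN_DEATHS` threshold below are all hypothetical, chosen only to show how a finer split leaves more groups with too few deaths.

```python
from collections import Counter

# Hypothetical death records: (age_band, sex, smoker_status)
deaths = [
    ("20-29", "F", "N"), ("20-29", "M", "S"),
    ("60-69", "F", "N"), ("60-69", "F", "N"), ("60-69", "F", "S"),
    ("60-69", "M", "N"), ("60-69", "M", "N"), ("60-69", "M", "S"),
    ("60-69", "M", "S"), ("60-69", "M", "N"),
]

MIN_DEATHS = 3  # illustrative threshold only, not an actuarial standard


def small_groups(records, key):
    """Return the groups whose death count falls below MIN_DEATHS."""
    counts = Counter(key(r) for r in records)
    return {group: n for group, n in counts.items() if n < MIN_DEATHS}


# Coarse grouping: by age band only
by_age = small_groups(deaths, key=lambda r: r[0])

# Fine grouping: by age band, sex and smoker status
by_age_sex_smoker = small_groups(deaths, key=lambda r: r)

print(by_age)
print(by_age_sex_smoker)
```

Regrouping the same records in different ways like this is also a simple form of the sensitivity test described above.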
When using industry wide data why wouldn’t data supplied by different organisations be comparable?
Heterogeneity due to:
Different geographical or socio-economic sections of the market
Different sales methods
Different practices, ex: underwriting
The nature of the data stored by different companies will not be the same
The coding used for the risk factors may vary
What problems can arise from using industry wide data?
Heterogeneity - data being on different bases
Data is less detailed or less flexible
External data are often more out of date than internal data
Data quality will depend on the quality of the data systems of all of its contributors
Not all organisations contribute - not representative.
What is a key difference between sampling and surveying?
Sampling = truly random selection, forced responses
Survey = biased by voluntary returns
Define stratified sampling and give an example
Stratified sampling (risk-based sampling) is deliberately biased towards large claims / important segments.
Ex: A full valuation of insurance liabilities or a pension scheme may need to use whole population data in order to demonstrate sufficient accuracy whereas customer satisfaction analysis might use survey data as accuracy is less critical.
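As a minimal sketch of sampling that is deliberately biased towards large claims, the function below splits claims into two strata by size and samples a higher fraction of the large stratum. The function name, the threshold and the sampling fractions are hypothetical choices for illustration.

```python
import random


def stratified_sample(claims, threshold, frac_small, frac_large, seed=0):
    """Sample claims by size stratum, taking a higher fraction of large claims."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    small = [c for c in claims if c < threshold]
    large = [c for c in claims if c >= threshold]
    n_small = round(len(small) * frac_small)
    n_large = round(len(large) * frac_large)
    return rng.sample(small, n_small), rng.sample(large, n_large)


# Hypothetical claim amounts: 100, 200, ..., 10000
claims = [100 * i for i in range(1, 101)]

# Review every large claim but only about 10% of small claims
small_sample, large_sample = stratified_sample(
    claims, threshold=9000, frac_small=0.1, frac_large=1.0
)
print(len(small_sample), len(large_sample))
```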
Define Cross sectional vs longitudinal data
Cross sectional data means looking at multiple individuals over a short period of time, while longitudinal data looks at a (usually smaller) number of individuals over a longer period of time.
Define a record
A record is a collection of data referring to one individual or one contract. A field is a property that a record might have.
Define a relational database
A relational database uses multiple table structures, cross-referencing records between tables. This can reduce run times for searching or sorting records.
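A small sketch using Python's built-in `sqlite3` module shows records cross-referenced between two tables. The table names, field names and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE policyholder (
        id INTEGER PRIMARY KEY,
        name TEXT,
        date_of_birth TEXT
    );
    CREATE TABLE policy (
        policy_no INTEGER PRIMARY KEY,
        holder_id INTEGER REFERENCES policyholder(id),
        sum_assured REAL
    );
    INSERT INTO policyholder VALUES
        (1, 'A. Smith', '1980-05-01'),
        (2, 'B. Jones', '1975-11-23');
    INSERT INTO policy VALUES
        (101, 1, 50000), (102, 1, 20000), (103, 2, 75000);
""")

# Each policy record refers to its policyholder record through holder_id,
# so personal details are stored once rather than repeated on every policy.
rows = conn.execute("""
    SELECT h.name, COUNT(*), SUM(p.sum_assured)
    FROM policy AS p
    JOIN policyholder AS h ON p.holder_id = h.id
    GROUP BY h.id
    ORDER BY h.id
""").fetchall()
conn.close()
print(rows)
```

Here each row of a table is a record and each column is a field.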
What three steps help to ensure high quality data before the data is used?
Prevention - eliminate errors before they arise. Ex: testing data capture forms, restricting entries to feasible values (gender M or F), automatic checks, asking for an email address to be typed twice.
Detection - Study collected data for errors. Is it in line with my expectations?
Treatment - deal with errors that have been detected. May be possible to repair the data, or use imputation
Define imputation
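The assignment of a value to something by inference from the value of the products or processes to which it contributes. In a data context, this means filling in a missing field with a value inferred from other records where the data is complete.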
Define deterministic data checks and give examples
Deterministic checks look for specific errors that are likely to occur.
Entries restricted to a specific list of possibilities such as male / female,
Entries are restricted to certain numerical ranges, such as range of allowable ages
Entries must bear specific relationships to each other. Ex: Surrender value < Death benefit
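The three kinds of deterministic check listed above might be coded along these lines. This is a minimal sketch: the field names and the allowable age range of 0 to 120 are assumptions, not taken from any particular system.

```python
def deterministic_checks(record):
    """Return a list of rule violations for one policy record (a dict)."""
    errors = []

    # 1. Entry restricted to a specific list of possibilities
    if record.get("sex") not in {"M", "F"}:
        errors.append("sex not in allowed list")

    # 2. Entry restricted to a numerical range (0-120 is an assumed range)
    age = record.get("age")
    if age is None or not (0 <= age <= 120):
        errors.append("age outside allowed range")

    # 3. Entries must bear a specific relationship to each other
    sv = record.get("surrender_value")
    db = record.get("death_benefit")
    if sv is None or db is None or not (sv < db):
        errors.append("surrender value not below death benefit")

    return errors


good = {"sex": "F", "age": 42, "surrender_value": 8000, "death_benefit": 50000}
bad = {"sex": "X", "age": 150, "surrender_value": 60000, "death_benefit": 50000}

print(deterministic_checks(good))
print(deterministic_checks(bad))
```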
Define an exploratory check with examples
Examines various global characteristics of the data to see if anything unusual has been recorded.
Ex: calculations of max, min, means, stdevs, histograms, correlation / scatter plots etc.
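A minimal sketch of an exploratory check using Python's standard `statistics` module follows. The ages are invented, and flagging values more than `k` standard deviations from the mean is just one simple rule of thumb, assumed here for illustration.

```python
import statistics


def summarise(values):
    """Global characteristics of a numeric field."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def unusual(values, k=2.0):
    """Values more than k sample standard deviations from the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [v for v in values if abs(v - m) > k * s]


# Hypothetical ages; 340 looks like a keying error for 34 or 40
ages = [34, 41, 29, 52, 47, 38, 45, 31, 40, 340]

print(summarise(ages))
print(unusual(ages))
```

A value flagged this way is not necessarily wrong. It is a prompt to look at the record more closely.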
What data checks should you perform according to Mailander 2000?
Know where the data comes from, why and how it was captured
Understand the incentives inherent in the data’s original use
Examine several randomly selected records - anything unexpected?
Look for mistakes ex: blanks and duplicates
Ask for the definition of the critical data items - ex: Does Smoker mean "are you a smoker?" or "have you ever smoked tobacco?"
Develop ways to verify the data.
When checking data, detail some other things you should do to verify it.
Check and reconcile data with other sources, ex: compare internal demographics.
Check that a liability or asset exists on a given date and that an appropriate value has been recorded
Check that a liability is held or an asset is owned on a given date
Check that when an event is recorded, the time of the event and the associated income or expenditure are allocated to the correct accounting period
Check data is complete, consistent and free from unusual values
Perform random spot checks on data for individual members/policies or assets.
Detail how you should go about data repair
Ideally return to the source, but this is expensive
May be able to work out what missing data should have been by imputation: filling in missing fields based on comparison with other records where data is complete. This can involve complicated multivariate analysis and assumes that data is missing at random, which is risky
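A much simpler form of imputation than the multivariate analysis mentioned above is to fill a missing value with the mean over complete records in the same group. The sketch below uses hypothetical member records and field names, and it marks imputed values so they can be traced later.

```python
from statistics import mean


def impute_by_group(records, group_field, target_field):
    """Fill missing target_field values with the group mean of complete records."""
    # Collect the known values for each group
    complete = {}
    for r in records:
        if r[target_field] is not None:
            complete.setdefault(r[group_field], []).append(r[target_field])

    filled = []
    for r in records:
        r = dict(r)  # copy, so the input records are not modified
        if r[target_field] is None and r[group_field] in complete:
            r[target_field] = mean(complete[r[group_field]])
            r["imputed"] = True  # flag the repaired value
        filled.append(r)
    return filled


# Hypothetical scheme members; member 3 has no salary recorded
members = [
    {"id": 1, "job_class": "clerical", "salary": 30000},
    {"id": 2, "job_class": "clerical", "salary": 34000},
    {"id": 3, "job_class": "clerical", "salary": None},
    {"id": 4, "job_class": "manager", "salary": 60000},
]

result = impute_by_group(members, "job_class", "salary")
print(result[2])
```

This relies on the missing record being similar to the complete records in its group, which is the same kind of risky assumption noted above.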
Also if you spot mistakes try to fix them - unrealistic to assume no mistakes.
What risks arise with the use of summarised data?
When valuing benefits for a scheme say, it may be appropriate to use summarised data instead of detailed membership data in some circumstances. It should be recognised that the reliability of the values will be reduced, as full validation of the data will be impossible. Summarised data may miss significant differences between the nature of benefits that have been grouped together so only suitable if such inaccuracy is recognised by the users of the results of the calculations.
How should insurers use summarised data provided by national agencies?
In some countries there are organisations that collect data from their member offices and then make available summaries of all the data to their members. This can be used to determine bases for pricing. It cannot be used in place of policy data to set provisions, but it is a starting point or a place to compare figures. Ex: CMIB figures
Explain what the CMIB does.
The Continuous Mortality Investigation Bureau (CMIB) of the Institute and Faculty of Actuaries in the UK does a large amount of work on mortality and morbidity statistics. The CMIB accepts data from insurers and publishes mortality tables based on it.
Why might industry-pooled mortality data from the CMIB or equivalent differ from the experience of an individual insurer's own population?
This is because underwriting standards or distribution channels can be different, and some insurers may have a particularly strong franchise with a certain occupation.
Distribution channels can have different mortality because policyholders self-select for life assurance. Some policyholders may have a policy because they sought financial advice; others may have one because they saw adverts on social media or on location.
What does data governance mean?
Data governance is the term used to describe the overall management of the availability, usability, integrity, and security of data employed in an organisation.