Exam 1 Flashcards
Why do we do predictive modeling?
To anticipate what may happen in the future, so that we can price premiums correctly and set aside enough reserves to pay claims
Input Variables
The predictors of the model
Output Variables
The responses of the model
ϵ
Random error term which is independent of X and has mean zero
Reducible Error
The part of the prediction error that comes from our estimate of f being imperfect. We can potentially reduce it by using the most appropriate statistical learning technique to estimate f
Irreducible Error
Y is also a function of ϵ, which by definition cannot be predicted using X. No matter how well we estimate f, we cannot reduce the error introduced by ϵ. This error is larger than 0
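As a sketch of how the two error types fit together, assume the model Y = f(X) + ϵ and treat both the estimate of f and X as fixed. The expected squared prediction error then splits into a reducible part and an irreducible part (the hat notation for estimates is an assumption, not used elsewhere in these cards):

```latex
E\big(Y - \hat{Y}\big)^2
  = E\big[f(X) + \epsilon - \hat{f}(X)\big]^2
  = \underbrace{\big[f(X) - \hat{f}(X)\big]^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}
```

The cross term vanishes because ϵ has mean zero and is independent of X.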
Prediction
The goal when one is not concerned with the exact form of the estimated f, provided that it yields accurate predictions for y
Inference
The goal when one wants to understand the relationship between x and y (how y changes as a function of x)
Relationship between prediction and inference
Linear models may allow for simple and interpretable inferences but may not yield predictions that are as accurate. Non-linear approaches can provide more accurate predictions, but this comes at the expense of being less interpretable
Flexible Models
Models that can take on many possible functional forms and therefore fit closely around the data they are trained on, which can result in accurate predictions
Overfitting
When the model follows the training data too closely, picking up random noise rather than the underlying pattern, so that it is far too specific to that data
Underfitting
The model is way too simplistic to capture the underlying patterns in the data
Training Data
Observations used to train or teach our method how to estimate f. Apply a statistical learning method to this first
Test Data
A separate data set used to see how well our estimate of f will behave with data that was not used for training (“Holdout Data”)
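A minimal Python sketch of splitting observations into training and test (holdout) sets. The function name `train_test_split`, the 30% test fraction, and the data are all made up for illustration:

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Randomly split paired arrays into training and test (holdout) sets."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))            # shuffled row indices
    n_test = int(round(test_fraction * len(X)))  # size of the holdout set
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Made-up data: 10 observations with 2 predictors each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.3)
```

The method is fit on `X_train` and `y_train` only; `X_test` and `y_test` are kept aside to check how the fitted model behaves on data it has not seen.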
Parametric Methods
Make an assumption about the functional form or shape of f and utilize a procedure that uses the training data to fit or train the model. This simplifies the problem to estimating a set of parameters, but the chosen form will usually not match the true unknown form of f.
Non-parametric Methods
No explicit assumptions about the functional form of f, so we seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. This has the potential to accurately fit a wide range of possible shapes for f, but a very large number of observations is required here for accuracy
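The contrast between the two approaches can be sketched in Python. Here the parametric method assumes f is linear, and k-nearest neighbors stands in as one example of a non-parametric method. The sine-shaped truth, the noise level, and the helper names are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, size=200))
y = np.sin(x) + rng.normal(0, 0.3, size=200)  # made-up non-linear truth plus noise

# Parametric: assume f is linear, so only two coefficients are estimated.
slope, intercept = np.polyfit(x, y, deg=1)

def predict_linear(x0):
    return intercept + slope * x0

# Non-parametric: k-nearest neighbors assumes nothing about the shape of f.
def predict_knn(x0, k=10):
    nearest = np.argsort(np.abs(x - x0))[:k]  # indices of the k closest training points
    return y[nearest].mean()                  # average their responses

x0 = 1.5
print(predict_linear(x0), predict_knn(x0), np.sin(x0))
```

Because the assumed linear form is far from the sine curve, the parametric fit is limited no matter how much data is collected, while the nearest-neighbor estimate can follow the curve but relies on having enough observations near `x0`.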
Supervised Models
Each observation of the predictors has an associated response measurement of y. We want to fit a model that relates response to the predictors, with the aim of accurately predicting the response for future observations
Unsupervised Models
For each observation we observe a vector of measurements of x but no associated y response. This is more for understanding relationships between variables or observations
Regression
Problems with a quantitative response
Classification
Problems with a qualitative (categorical) response
Mean Squared Error (MSE)
Most commonly used measure to evaluate method performance. Will be small if the predicted responses are close to the true responses. Want to choose the method that gives the lowest test MSE as opposed to lowest training MSE
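A minimal Python sketch of the calculation: MSE is the average of the squared differences between the true and predicted responses. The function name and the numbers are made up for illustration:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3
```

Computing this on the training data gives the training MSE; computing it on held-out data gives the test MSE.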
MSE and Overfitting Relationship
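A very small training MSE paired with a large test MSE is a sign of overfitting. The test MSE is almost always larger than the training MSE, so a tiny training MSE on its own does not mean the model is good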
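The relationship between training MSE and test MSE can be explored with a small simulation. This is a sketch under assumed conditions: the sine-shaped truth, the noise level, the sample sizes, and the polynomial degrees (standing in for increasing flexibility) are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n):
    """Draw n observations from a made-up model y = sin(3x) + noise."""
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(0, 0.3, size=n)
    return x, y

x_train, y_train = simulate(30)
x_test, y_test = simulate(200)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (1, 3, 9):  # higher degree = more flexible polynomial
    coefs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (
        mse(y_train, np.polyval(coefs, x_train)),  # training MSE
        mse(y_test, np.polyval(coefs, x_test)),    # test MSE
    )

for degree, (train_mse, test_mse) in results.items():
    print(degree, round(train_mse, 3), round(test_mse, 3))
```

The training MSE can only go down as the degree increases, since each polynomial contains the lower-degree ones as special cases. The test MSE carries no such guarantee, which is why it is the one used to compare methods.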
Variance
The amount by which our estimate of f would change if we estimated it using a different training data set. More flexible methods generally have higher variance
Bias
Refers to the error introduced by approximating a complex real-life problem with a much simpler model. More flexible methods tend to have lower bias
Bias-Variance Trade-Off
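The challenge of selecting a method that achieves low variance and low bias at the same time. As flexibility increases, bias tends to fall while variance tends to rise, so the method needs to be flexible enough to keep bias low but not so flexible that the added variance outweighs the gain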
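The trade-off can be written as a decomposition of the expected test MSE at a single point. This is a sketch in standard notation, where x_0 and y_0 denote a test observation and the hat marks the estimate of f (notation assumed here, not used elsewhere in these cards):

```latex
E\big(y_0 - \hat{f}(x_0)\big)^2
  = \operatorname{Var}\big(\hat{f}(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \operatorname{Var}(\epsilon)
```

All three terms are non-negative, so the expected test MSE can never fall below Var(ϵ), the irreducible error.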
How is predictive modeling used in ratemaking?
Actuaries look at historical data on past claims and on policyholder demographics, then adjust it to the future time period using industry benchmarks, informed assumptions, and the trends and patterns in the historical data. They then create a model that takes into consideration different characteristics of a policyholder, such as age, smoker class, and sex, to predict how likely a claim is to be filed (for life insurance, how likely the policyholder is to die). For example, a policyholder with a history of smoking has higher risk, so their rates might be higher.
How is predictive modeling used in reserving?
We would look at historical data on policyholders’ age, risk class, gender, health status, etc., then create a model that takes these factors into consideration. For example, if there are a lot of older policyholders with histories of chronic smoking, the likelihood of deaths might be higher, so more claims might need to be paid in the future and higher reserves should be set aside
How is predictive modeling used in planning?
A life insurance company might have a goal of gaining more customers between the ages of 20 and 30. It might create models to assess the conditions or reasons why people in that age range buy life insurance less often. Based on this, it might choose to adjust its existing products or create new ones tailored to these conditions and reasons
Difference between writing/running code in RStudio’s Console vs. Source Pane
In the console pane, code is executed as soon as you press Enter, and it cannot be edited afterward or saved as a script. This pane is suited to short, simple commands. In the source pane, multiple lines of code can be written as a script, for example to produce graphs of data, and the code can be edited, saved, and rerun.
Difference between a dataframe and a matrix?
A matrix is a two-dimensional data structure whose elements are all of the same type, while a dataframe is also two-dimensional but its columns can hold different types