Exam 1 Flashcards
Why do we do predictive modeling?
So we know what can potentially happen in the future to price our premiums correctly and set aside enough reserves to pay off claims
Input Variables
The predictors of the model
Output Variables
The responses of the model
ϵ
Random error term which is independent of X and has mean zero
Reducible Error
The portion of the error we can potentially reduce by improving the accuracy of our estimate of f, for example by using a more appropriate statistical learning technique
Irreducible Error
Y is also a function of ϵ, which by definition cannot be predicted using X. No matter how well we estimate f, we cannot reduce the error introduced by ϵ. This error is strictly greater than 0
Prediction
If one is not concerned with the exact form of the estimate of f, provided that it yields accurate predictions for Y
Inference
If one wants to understand the relationship between x and y (how y changes as a function of x)
Relationship between prediction and inference
Linear models may allow for simple and interpretable inference but may not yield accurate predictions. Non-linear approaches can provide more accurate predictions, but at the expense of being less interpretable
Flexible Models
A flexible model can fit many different possible functional forms for f, allowing it to follow the training data closely and potentially predict accurately
Overfitting
When the model follows the training data too closely, capturing noise along with the signal, so it is too specific to that particular data set
Underfitting
The model is way too simplistic to capture the underlying patterns in the data
Training Data
Observations used to train or teach our method how to estimate f. Apply a statistical learning method to this first
Test Data
Use a separate data set to see how well our estimate of f behaves on new data (“holdout data”)
Parametric Methods
Make an assumption about the functional form or shape of f and utilize a procedure that uses the training data to fit or train the model. This method simplifies the model but will not match the true unknown form of f.
Non-parametric Methods
No explicit assumptions about the functional form of f, so we seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. This has the potential to accurately fit a wide range of possible shapes for f, but a very large number of observations is required here for accuracy
Supervised Models
Each observation of the predictors has an associated response measurement of y. We want to fit a model that relates response to the predictors, with the aim of accurately predicting the response for future observations
Unsupervised Models
For each observation we observe a vector of measurements of x but no associated y response. This is more for understanding relationships between variables or observations
Regression
Problems with a quantitative response
Classification
Problems with a qualitative (categorical) response
Mean Squared Error (MSE)
Most commonly used measure to evaluate method performance. Will be small if the predicted responses are close to the true responses. Want to choose the method that gives the lowest test MSE as opposed to lowest training MSE
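As a minimal sketch of the MSE formula (pure Python, with made-up numbers purely for illustration):

```python
def mse(actual, predicted):
    """Mean squared error: average of the squared prediction errors."""
    return sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted)) / len(actual)

# Hypothetical responses and predictions
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.5, 6.0]
print(mse(y_true, y_pred))  # (0.25 + 0.25 + 1.0) / 3 = 0.5
```

Computing this on held-out test data rather than on the training data gives the test MSE used to compare methods.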
MSE and Overfitting Relationship
As flexibility increases, the training MSE decreases, but the test MSE may start to rise. The test MSE is almost always larger than the training MSE; a very small training MSE paired with a large test MSE signals overfitting
Variance
The amount by which the estimate of f would change if we estimated it using a different training data set. More flexible methods have higher variance
Bias
The error introduced by approximating a complicated real-life problem with a much simpler model. More flexible methods tend to have lower bias
Bias-Variance Trade-Off
Selecting a method that achieves low variance and low bias at the same time. Choose the level of flexibility where the decrease in bias is no longer worth the increase in variance, minimizing the total expected test error
How is predictive modeling used in ratemaking?
Actuaries look at historical data on past claims and policyholder demographics, then adjust to the future time period using industry benchmarks, informed assumptions, and trends and patterns from the historical data. They then build a model that takes into account factors such as age, smoker class, sex, etc. to predict whether a claim will be filed (i.e., whether the policyholder dies). If the policyholder has a history of smoking, they carry higher risk, so their rates might be higher.
How is predictive modeling used in reserving?
We would look at historical data on policyholders’ age, risk class, gender, health status, etc., then create a model that takes these factors into consideration. If there are many older policyholders with histories of chronic smoking, for example, the likelihood of deaths is higher, so more claims may need to be paid in the future and higher reserves should be set aside
How is predictive modeling used in planning?
A life insurance company might have a goal of gaining more customers between the ages of 20-30. It might create models to assess the conditions or reasons why people in that age range do not buy life insurance as often. Based on this, it might adjust its existing products or create new ones tailored to those conditions and reasons
Difference between writing/running code in RStudio’s Console vs. Source Pane
In the Console pane, code is executed immediately after you type it and cannot easily be edited or saved, so it is best for short, simple commands. In the Source pane, multiple lines of code can be written, edited, and saved as a script before being executed, which suits longer tasks such as building data graphs.
Difference between a dataframe and a matrix?
A matrix is a two-dimensional data structure containing elements of the same type while dataframes are also two-dimensional structures but contain elements of different types
Describe how the flexibility of a model is related to overfitting and underfitting
Flexibility refers to a model’s ability to fit many different possible functional forms. When a model is too flexible, it can overfit: it captures the noise in the training data along with the signal, which obscures the underlying patterns and makes it harder to generalize to new data. A model with too little flexibility can be too simple, leading to underfitting, where it fails to capture the underlying patterns in the training data.
Difference between prediction and inference
Prediction is trying to use the data model to make the most accurate predictions of outcomes based on input data while inference is trying to understand the relationship between the variables in the model, so finding the underlying patterns to see how the change of one variable affects the other
Difference between training data and test data
Training data is the part of the dataset used to build the model; from this data, the model learns the underlying patterns and relationships. The test data is the remaining part of the dataset, used afterwards to assess whether the model works well. The model works well when it produces accurate results and generalizes, neither overfitting nor underfitting, on the test data.
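A simple way to picture the split is to shuffle the observations and hold out a fraction as test data. This is a hypothetical pure-Python sketch (the 20 integers stand in for real observations, and the 80/20 ratio is just one common choice):

```python
import random

data = list(range(20))        # stand-in for 20 observations
random.seed(42)               # reproducible shuffle
random.shuffle(data)

split = int(0.8 * len(data))  # 80/20 train/test split
train, test = data[:split], data[split:]
print(len(train), len(test))  # 16 4
```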
Difference between supervised and unsupervised learning models
Supervised learning models are when each input value has a corresponding output value, which is used to accurately predict the response for future observations based on the patterns of responses to certain predictors. Unsupervised learning models focus on identifying the underlying patterns of the data without reference to a correct output, which is why the outputs aren’t labeled.
Difference between regression and classification models
Regression models are models that have a quantitative response (numerical outcomes) while classification models are models with a qualitative response (categorical outcomes).
Difference between independent and dependent variables
Independent variables are the predictors or inputs because they are manipulated to explore the effect on other variables while dependent variables are the response or output variables because they are the variables that change based on the independent variables
Difference between reducible and irreducible error
Reducible error is the portion of the total error that can be minimized as the model is improved to increase the accuracy of predictions. On the other hand, irreducible error is the portion of the total error that cannot be minimized, since it comes from the noise term ϵ, which is inherent in the data.
Difference between bias and variance
Bias and variance are both sources of reducible error. Bias refers to an oversimplified model, which can lead to underfitting. Variance is when the model is overly sensitive to small changes in the training data; such models capture noise along with the signal, which can lead to overfitting.
Simple Linear Regression
Assumes an approximately linear relationship between our independent and dependent variables and uses this assumption to find the line that best fits the data
Method of Least Squares
Produces the best-fit estimates of the betas by choosing the line that comes as close to all of the data points as possible, i.e., minimizes the sum of squared residuals
Residual
The difference between the model’s estimate ŷ_i for a given x_i and the actual y_i from the data: e_i = y_i − ŷ_i
Residual and bias relationship
If our model is unbiased, some residuals are positive and some are negative. Since we only care about finding the betas that define the line with the smallest residual magnitudes, we can square the residuals and get the same effect
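The least squares estimates have a closed form: β̂₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², β̂₀ = ȳ − β̂₁x̄. A minimal pure-Python sketch, using a toy data set that lies exactly on the line y = 2x:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # exactly y = 2x, so all residuals are zero

xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
# Slope: covariance of x and y over variance of x
beta1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
# Intercept: the fitted line passes through (xbar, ybar)
beta0 = ybar - beta1 * xbar
print(beta0, beta1)         # 0.0 2.0
```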
Standard Error
Used to measure how close our estimated betas are likely to be to the true betas; it underlies confidence intervals and hypothesis tests for the coefficients
Residual Standard Error
Attempts to estimate the standard deviation of ϵ, so how much our model will deviate from some unknown “true” regression line. So it measures the lack of fit of the model to the data. If it is small it could fit the data well
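For simple linear regression the RSE is √(RSS / (n − 2)). A small sketch with a made-up RSS, purely to show the arithmetic:

```python
import math

n, rss = 50, 120.0                 # hypothetical sample size and RSS
rse = math.sqrt(rss / (n - 2))     # RSE = sqrt(RSS / (n - 2))
print(round(rse, 3))               # sqrt(2.5) ≈ 1.581
```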
P-value
If the p-value is less than 0.05, we reject the null hypothesis and conclude a relationship actually exists. If it is more than 0.05, we fail to reject the null: the observed relationship could have happened by random chance
3 statistics to measure the quality of fit
After determining if there is a relationship or not, we look at the R^2, Residual Standard Error, and F-statistic to measure the quality of fit
R^2 Statistic
Same idea as the RSE but measured on a proportional basis rather than in the units of Y: the proportion of variance in Y that can be explained using X. Takes on a value between 0 and 1
Total Sum of Squares (TSS)
Total amount of variability inherent in the response before the regression model is performed
RSS
Measures the amount of variability left unexplained after performing the regression
TSS - RSS
Measures the amount of variability in the response that is explained (or removed) by building the model
What does an R^2 close to 1 indicate?
Large proportion of the variability in the response is explained by the regression
What does R^2 close to 0 indicate?
The regression does not explain much of the variability in the response. Either the linear model is incorrect, or the error variance is high, or both
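Tying the last few cards together, R² = (TSS − RSS) / TSS = 1 − RSS/TSS. A pure-Python sketch with invented responses and hypothetical model predictions:

```python
ys    = [1.0, 2.0, 3.0, 4.0]
preds = [1.1, 1.9, 3.2, 3.8]   # hypothetical model output

ybar = sum(ys) / len(ys)
tss = sum((y - ybar) ** 2 for y in ys)               # total variability
rss = sum((y - p) ** 2 for y, p in zip(ys, preds))   # unexplained variability
r2 = 1 - rss / tss
print(round(r2, 3))  # 0.98: the model explains 98% of the variability
```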
Requirements for a linear model
- Constant variance of the errors
- Linear relationship between the independent and response variables
- Normally distributed errors
- Independent observations (errors)
What is the correlation of X and Y used to measure?
The strength of the relationship between X and Y
Multiple Linear Regression
Estimates the effect of each independent variable on the response while holding the rest of the variables constant. The predictors are truly independent if we can solve for one variable, freeze the rest, and repeat the process
F-Statistic
Test used for multiple linear regression. Look for at least one independent variable to have a significant relationship with our response variable
What does it mean when the F-statistic is less than 1?
We fail to reject the null hypothesis; there is no evidence of a relationship
What does it mean when the F-statistic is greater than 1?
We reject the null hypothesis and at least one predictor variable is significantly related to the response variable
F-statistic relationship with dataset (n)
When there is a big dataset, an F-statistic only slightly bigger than 1 might still provide evidence against the null. If the dataset is small, we would need a much larger value to reject the null hypothesis
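The overall F-statistic is F = ((TSS − RSS)/p) / (RSS/(n − p − 1)), where p is the number of predictors. A sketch of the arithmetic with made-up numbers:

```python
n, p = 100, 3            # hypothetical: 100 observations, 3 predictors
tss, rss = 500.0, 200.0  # hypothetical total and residual sums of squares

# Explained variability per predictor, over unexplained variability
# per residual degree of freedom
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
print(round(f_stat, 1))  # 48.0, far above 1: reject the null
```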
Why do we still need to perform F-statistic when our p-value gives the results?
For multiple linear regression, when there are many predictors some individual p-values will appear significant by chance alone, so we use the F-statistic to feel comfortable about the model as a whole alongside the associated p-values
Forward Selection
Start with the null model and fit p simple linear regressions and then add to the null model the variable that results in the lowest RSS
Backward Selection
We start with all the variables and remove the variable with the largest p-value and stop when all the remaining variables have p-values below 0.05
Mixed Selection
A combination of forward and backward selection where we add and remove variables until all the variables in the model have a sufficiently low p-value.
Predictors with only two levels
A qualitative input that takes on two possible values, which we encode as a dummy (indicator) variable with two numerical values
Additive Assumption
The association between an independent variable and the response does not depend on the values of the other predictors. We can create a new interaction variable (the product of two predictors) and examine its significance to check whether the variables’ effects are truly additive
Modeling Non-Linear Relationships
Rely on polynomial regression to fit a better (curved) line
Outliers
Data points whose standardized residuals are larger than 2 in absolute value
High-Leverage Points
Measure of how far away the independent variable values of an observation are from other observations in the model
Difference between outliers and high-leverage points
Outliers are data points far off from the model, with large residuals, while high-leverage points need not have large residuals but have predictor values far from the center of the data
Collinearity
The situation in which two or more predictor variables are closely related to one another. This can make it difficult to separate out the individual effects of collinear variables on the response
Regularization of models
Constraining the coefficient estimates or equivalently shrinking them towards zero
Ridge Regression
When all the variables are kept but the betas are minimized by the shrinking penalty to push it towards 0. As lambda increases, the flexibility of the ridge regression fit decreases leading to decreased variance but increased bias
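For a single centered predictor, the ridge estimate reduces to β̂ = Σxy / (Σx² + λ), which makes the shrinkage visible: as λ grows the coefficient is pushed toward 0. A hypothetical toy sketch (real ridge regression works on many predictors at once):

```python
xs = [-1.0, 0.0, 1.0]
ys = [-2.0, 0.0, 2.0]      # ordinary least squares would give beta = 2

sxy = sum(x * y for x, y in zip(xs, ys))  # 4.0
sxx = sum(x * x for x in xs)              # 2.0

# Larger lambda -> larger shrinkage penalty -> smaller coefficient
for lam in [0.0, 1.0, 10.0]:
    print(lam, sxy / (sxx + lam))         # 2.0, then 4/3, then 1/3
```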
Lasso
When we don’t want to include all predictors, so some of the coefficient estimates will be exactly zero, removing the respective independent variables from the model
Regression splines
Divide the range of X into K regions; a polynomial is fit in each region and constrained so the pieces join smoothly at the region boundaries. More knots lead to a more flexible fit
Smoothing Splines
Minimizes the RSS criterion subject to a smoothness penalty. A tuning parameter controls the penalty on the second derivative (a measure of roughness), smoothing the spline and making it less wiggly
Local Regression
Able to overlap the polynomial model in a smooth way by computing the fit to a target point using only nearby training observations
Generalized Additive Models
A framework for extending the standard linear model by allowing non-linear functions of each variable while maintaining additivity