Exam 1 Flashcards

1
Q

Why do we do predictive modeling?

A

So that we know what can potentially happen in the future, allowing us to price our premiums correctly and set aside enough reserves to pay claims

1
Q

Input Variables

A

The predictors of the model

2
Q

Output Variables

A

The responses of the model

3
Q

ϵ

A

Random error term which is independent of X and has mean zero

4
Q

Reducible Error

A

The portion of the error we can potentially reduce by improving the accuracy of our estimate of f, e.g. by using the most appropriate statistical learning technique

5
Q

Irreducible Error

A

Y is also a function of ϵ, which by definition cannot be predicted using X. No matter how well we estimate f, we cannot reduce the error introduced by ϵ. This error is always greater than zero
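The reducible/irreducible split can be written out as the standard decomposition of the expected squared prediction error, with Ŷ = f̂(X):

```latex
E\big[(Y-\hat{Y})^2\big]
  = \underbrace{\big[f(X)-\hat{f}(X)\big]^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}
```

The first term shrinks as our estimate f̂ improves; the second is the floor set by ϵ no matter how good the model gets.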

6
Q

Prediction

A

When one is not concerned with the exact form of the estimated f, provided that it yields accurate predictions for y (f̂ can be treated as a black box)

7
Q

Inference

A

If one wants to understand the relationship between x and y (how y changes as a function of x)

8
Q

Relationship between prediction and inference

A

Linear models allow for simple and interpretable inference but may not yield very accurate predictions. Non-linear approaches can provide more accurate predictions, but this comes at the expense of being less interpretable

9
Q

Flexible Models

A

A model that can fit many different possible functional forms and so follow the data closely, which can yield very accurate predictions on the data it was fit to

10
Q

Overfitting

A

When the model is way too specific to the training data, following it so closely that it captures noise along with the underlying pattern

11
Q

Underfitting

A

The model is way too simplistic to capture the underlying patterns in the data

12
Q

Training Data

A

Observations used to train or teach our method how to estimate f. Apply a statistical learning method to this first

13
Q

Test Data

A

A separate dataset used to see how well our estimate of f behaves on data it has not seen ("holdout data")
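As a quick illustration (not from the course materials), a numpy sketch of splitting one dataset into training and holdout portions; the sizes and the 80/20 split are arbitrary choices:

```python
import numpy as np

# Hypothetical data: y depends linearly on x plus noise
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# Shuffle the indices, then hold out the last 20% as test data
idx = rng.permutation(n)
train_idx, test_idx = idx[:80], idx[80:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]
```

The model is fit on `(x_train, y_train)` only; `(x_test, y_test)` is touched once, to estimate out-of-sample performance.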

14
Q

Parametric Methods

A

Make an assumption about the functional form or shape of f and use a procedure that fits or trains the model on the training data. This simplifies the problem of estimating f, but the chosen form will generally not match the true unknown form of f.

15
Q

Non-parametric Methods

A

No explicit assumptions about the functional form of f, so we seek an estimate of f that gets as close to the data points as possible without being too rough or wiggly. This has the potential to accurately fit a wide range of possible shapes for f, but a very large number of observations is required here for accuracy

16
Q

Supervised Models

A

Each observation of the predictors has an associated response measurement of y. We want to fit a model that relates response to the predictors, with the aim of accurately predicting the response for future observations

17
Q

Unsupervised Models

A

For each observation we observe a vector of measurements of x but no associated y response. This is more for understanding relationships between variables or observations

18
Q

Regression

A

Problems with a quantitative response

19
Q

Classification

A

Problems with a qualitative (categorical) response

20
Q

Mean Squared Error (MSE)

A

Most commonly used measure to evaluate method performance. Will be small if the predicted responses are close to the true responses. Want to choose the method that gives the lowest test MSE as opposed to lowest training MSE
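The MSE formula is just the average squared distance between predictions and truth; a minimal numpy sketch (an illustration, not from the course):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of the squared differences
    # between the true and predicted responses
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

For example, `mse([1, 2, 3], [1, 2, 5])` averages the squared errors (0, 0, 4), giving 4/3. Computed on the training data this is the training MSE; computed on held-out data it is the test MSE we actually want to minimize.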

21
Q

MSE and Overfitting Relationship

A

As flexibility increases, training MSE keeps decreasing, but test MSE does not. A model whose training MSE is very small while its test MSE is much larger is overfitting; test MSE is almost always larger than training MSE

22
Q

Variance

A

The amount by which f would change if we estimated it using a different training data set. More flexible methods have higher variance

23
Q

Bias

A

Refers to the error introduced by approximating a complex real-world problem with a much simpler model. More flexible methods tend to have lower bias

24
Q

Bias-Variance Trade-Off

A

Selecting a method that achieves both low variance and low bias. As flexibility increases, bias decreases but variance increases, so we look for the level of flexibility that minimizes the total expected test error

25
Q

How is predictive modeling used in ratemaking?

A

Actuaries look at historical data on past claims and policyholder demographics, then project it to the future period using industry benchmarks, trends and patterns from the historical data, and informed assumptions. They then build a model that takes into consideration factors such as age, smoker class, sex, etc. to predict whether a claim will need to be paid, i.e. whether the policyholder dies. If a policyholder has a history of smoking, they carry higher risk, so their rates might be higher.

26
Q

How is predictive modeling used in reserving?

A

We would look at historical data on each policyholder's age, risk class, gender, health status, etc., then build a model that takes these factors into consideration. For example, if there are many older policyholders with histories of chronic smoking, the likelihood of deaths might be higher, so more claims might need to be paid in the future and higher reserves should be set aside

27
Q

How is predictive modeling used in planning?

A

A life insurance company might have a goal of gaining more customers between the ages of 20-30. It might build models to assess the conditions or reasons why people in that age range do not buy life insurance as often. Based on this, it might adjust existing products or create new ones tailored to those conditions and reasons

28
Q

Difference between writing/running code in RStudio’s Console vs. Source Pane

A

In the Console pane, code is executed immediately after you type it and cannot easily be edited or saved; it is best suited to short, one-off commands. In the Source pane, multiple lines of code can be written as a script (for example, to produce graphs of data), and that code can be edited, saved, and re-run.

29
Q

Difference between a dataframe and a matrix?

A

A matrix is a two-dimensional data structure whose elements are all the same type, while a dataframe is also two-dimensional but its columns can contain elements of different types

30
Q

Describe how the flexibility of a model is related to overfitting and underfitting

A

Flexibility refers to a model's ability to fit many different possible functional forms. When a model is too flexible, it can overfit, capturing the noise along with the signal in the training data; overfitting makes it harder to generalize, since the noise obscures the underlying patterns. A model with too little flexibility is too simple, which leads to underfitting: the model fails to capture the underlying patterns even in the training data.
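A numpy sketch of the flexibility effect (an illustration, not from the course): fitting polynomials of two degrees to the same noisy toy data. Because the degree-10 model nests the degree-1 model, its training MSE can only be lower, even though the wigglier fit may generalize worse:

```python
import numpy as np

# Hypothetical training data: a sine curve plus noise
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=20)

def train_mse(deg):
    # Fit a polynomial of the given degree, then score it
    # on the same points it was trained on
    coef = np.polyfit(x, y, deg)
    return float(np.mean((np.polyval(coef, x) - y) ** 2))

low, high = train_mse(1), train_mse(10)
```

`high <= low` always holds here, which is exactly why a low training MSE alone is not evidence that a model is good.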

31
Q

Difference between prediction and inference

A

Prediction is trying to use the data model to make the most accurate predictions of outcomes based on input data while inference is trying to understand the relationship between the variables in the model, so finding the underlying patterns to see how the change of one variable affects the other

32
Q

Difference between training data and test data

A

Training data is the part of the dataset used to build the model; from this data, the model learns the underlying patterns and relationships. The test data is the remaining part of the dataset, used afterwards to assess whether the model works well. The model works well when it produces accurate results on the test data, generalizes, and does not overfit or underfit.

33
Q

Difference between supervised and unsupervised learning models

A

Supervised learning models are ones where each input observation has a corresponding output value, used to accurately predict the response for future observations based on these patterns of responses to certain predictors. Unsupervised learning models instead focus on identifying the underlying patterns of the data without a labeled output, since there is no "correct" response to predict.

34
Q

Difference between regression and classification models

A

Regression models are models with a quantitative response (numerical outcomes), while classification models are models with a qualitative response (categorical outcomes).

35
Q

Difference between independent and dependent variables

A

Independent variables are the predictors or inputs because they are manipulated to explore the effect on other variables while dependent variables are the response or output variables because they are the variables that change based on the independent variables

36
Q

Difference between reducible and irreducible error

A

Reducible error is the portion of the total error that can be minimized by improving the model, increasing the accuracy of its predictions. Irreducible error, on the other hand, is the portion of the total error that cannot be minimized, since it comes from the inherent noise ϵ.

37
Q

Difference between bias and variance

A

Bias and variance both contribute to the reducible error. Bias comes from an oversimplified model, which can cause underfitting. Variance is when the model is sensitive to small changes in the training data; such a model captures the noise along with the signal, which can lead to overfitting.

38
Q

Simple Linear Regression

A

Assumes an approximately linear relationship between our independent and dependent variables and uses this assumption to find the line that best fits the data

39
Q

Method of Least Squares

A

Produces the estimates of the betas that minimize the residual sum of squares, i.e., the line that comes as close to all of the data points as possible
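For simple linear regression the least-squares betas have a closed form; a numpy sketch on tiny made-up data (an illustration, not from the course):

```python
import numpy as np

# Hypothetical data that lies exactly on y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Closed-form least-squares estimates:
#   b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²),   b0 = ȳ - b1·x̄
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
# Recovers b0 = 1, b1 = 2 exactly, since the data has no noise
```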

40
Q

Residual

A

The difference between the model's estimate ŷᵢ for a given xᵢ and the actual yᵢ from the data

41
Q

Residual and bias relationship

A

If our model is unbiased, some residuals are positive and some are negative. Since we only care about finding the betas that make the magnitudes of the residuals as small as possible, we can square the residuals and get the same effect.

42
Q

Standard Error

A

Measures how close our estimated betas are likely to be to the actual betas; used to judge the reliability of the estimates

43
Q

Residual Standard Error

A

Estimates the standard deviation of ϵ, i.e. how much our model will deviate from the unknown "true" regression line. It therefore measures the lack of fit of the model to the data; if it is small, the model fits the data well

44
Q

P-value

A

If the p-value is less than 0.05, we reject the null hypothesis and conclude a relationship actually exists. If it is more than 0.05, we fail to reject the null: the observed relationship could have happened by random chance

45
Q

3 statistics to measure the quality of fit

A

After determining if there is a relationship or not, we look at the R^2, Residual Standard Error, and F-statistic to measure the quality of fit

46
Q

R^2 Statistic

A

Same idea as the RSE but measured on a proportional basis rather than in the units of Y. It gives the proportion of the variance in Y that can be explained using X, and takes on a value between 0 and 1

47
Q

Total Sum of Squares (TSS)

A

Total amount of variability inherent in the response before the regression model is performed

48
Q

RSS

A

Measures the amount of variability left unexplained after performing the regression

49
Q

TSS - RSS

A

Measures the amount of variability in the response that is explained (or removed) by building the model
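These three quantities fit together as R² = (TSS − RSS)/TSS = 1 − RSS/TSS; a numpy sketch (an illustration, not from the course):

```python
import numpy as np

def r_squared(y, y_hat):
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    tss = np.sum((y - y.mean()) ** 2)   # total variability before the fit
    rss = np.sum((y - y_hat) ** 2)      # variability left unexplained
    return float(1.0 - rss / tss)       # proportion explained: (TSS - RSS) / TSS
```

A perfect fit gives R² = 1 (RSS = 0), while a model that just predicts the mean of y gives R² = 0 (RSS = TSS).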

50
Q

What does an R^2 close to 1 indicate?

A

Large proportion of the variability in the response is explained by the regression

51
Q

What does R^2 close to 0 indicate?

A

The regression does not explain much of the variability response. Either the linear model could be incorrect or the error variance is high or both

52
Q

Requirements for a linear model

A
  • Constant variance of the errors (homoscedasticity)
  • Linear relationship between the independent and response variables
  • Normally distributed errors
  • Observations (errors) should be independent of one another
53
Q

What is the correlation of X and Y used to measure?

A

The strength of the relationship between X and Y

54
Q

Multiple Linear Regression

A

Estimates the effect of each individual variable while holding the rest of the variables constant. The variables are truly independent if we can solve for one variable, freeze the rest, and repeat the process for each

55
Q

F-Statistic

A

Test used in multiple linear regression to check whether at least one independent variable has a significant relationship with the response variable

56
Q

What does it mean when the F-statistic is less than 1?

A

We fail to reject the null hypothesis; there is no evidence of a relationship

57
Q

What does it mean when the F-statistic is greater than 1?

A

We reject the null hypothesis and at least one predictor variable is significantly related to the response variable

58
Q

F-statistic relationship with dataset (n)

A

With a big dataset, an F-statistic only slightly bigger than 1 may still provide evidence against the null. If the dataset is small, we need a much larger F-statistic to reject the null hypothesis

59
Q

Why do we still need to perform F-statistic when our p-value gives the results?

A

In multiple linear regression, with many predictors some individual p-values will look significant by chance alone. The F-statistic tests the model as a whole, so we can feel comfortable about the overall model along with the associated p-values

60
Q

Forward Selection

A

Start with the null model and fit p simple linear regressions and then add to the null model the variable that results in the lowest RSS

61
Q

Backward Selection

A

We start with all the variables and remove the variable with the largest p-value and stop when all the remaining variables have p-values below 0.05

62
Q

Mixed Selection

A

A combination of forward and backward selection where we add and remove variables until all the variables in the model have a sufficiently low p-value.

63
Q

Predictors with only two variables

A

A qualitative predictor with only two levels can be coded as a dummy (indicator) variable that takes on two possible numerical values (e.g., 0 and 1)

64
Q

Additive Assumption

A

Association between an independent variable and the response does not depend on the values of other predictors. Can create a new variable involving both of the variables to analyze its overall summary to prove they are still independent of one another

65
Q

Modeling for Additive Assumption Data

A

When a straight line does not capture the relationship, rely on a polynomial regression (adding powers of the predictors) to model a better fit

66
Q

Outliers

A

Data points whose standardized residuals have absolute values larger than 2
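A rough numpy sketch of flagging residuals by that rule of thumb (an illustration, not from the course; a true standardized residual also accounts for leverage and the RSE, so dividing by the sample standard deviation is a simplification):

```python
import numpy as np

# Hypothetical residuals from some fitted model; the fourth one
# is far larger than the rest
resid = np.array([0.1, -0.2, 0.15, 3.0, -0.1])

# Crude standardization: divide by the sample standard deviation
std_resid = resid / resid.std(ddof=1)

# Flag points with |standardized residual| > 2 as potential outliers
outliers = np.flatnonzero(np.abs(std_resid) > 2)
```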

67
Q

High-Leverage Points

A

Measure of how far away the independent variable values of an observation are from other observations in the model

68
Q

Difference between outliers and high-leverage points

A

Outliers are data points far from the fitted model, with large residuals, while high-leverage points need not have large residuals but have unusual predictor values far from the other observations

69
Q

Collinearity

A

The situation in which two or more predictor variables are closely related to one another. This can make it difficult to separate out the individual effects of collinear variables on the response

70
Q

Regularization of models

A

Constraining the coefficient estimates or equivalently shrinking them towards zero

71
Q

Ridge Regression

A

All the variables are kept, but a shrinkage penalty pushes the beta estimates toward 0. As lambda increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias
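Ridge has a closed-form solution, which makes the shrinkage easy to see; a numpy sketch on made-up data (an illustration, not from the course):

```python
import numpy as np

# Hypothetical data: y depends on three predictors plus a little noise
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Increasing lambda shrinks the coefficient vector toward zero
norms = [float(np.linalg.norm(ridge(X, y, lam)))
         for lam in (0.0, 10.0, 100.0)]
```

With lam = 0 this reduces to ordinary least squares; as lam grows, the norm of the coefficient vector can only shrink, trading variance for bias.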

72
Q

Lasso

A

When we don’t want to include all predictors so some of the coefficient estimate will be exactly zero removing the respective independent variable from the model

73
Q

Regression splines

A

Divide the range of X into K regions; a polynomial is fit in each region and constrained so the pieces join smoothly at the region boundaries (knots). Using more knots leads to a more flexible fit

74
Q

Smoothing Splines

A

Minimizes the RSS criterion subject to a smoothness penalty. A tuning parameter controls how much the spline is smoothed, making it less wiggly by penalizing the second derivative

75
Q

Local Regression

A

Computes the fit at a target point using only nearby training observations, allowing the local polynomial models to overlap in a smooth way

76
Q

Generalized Additive Models

A

Provides a framework for extending the standard linear model by allowing non-linear functions of each predictor while maintaining additivity