Generalised Linear Model (week 6-8) Flashcards
What are GLMs used for?
General / health insurance pricing
GLM formula?
g(μ) = g(E(Y)) = α + β1X1 + … + βkXk = η
what does g represent?
link function
what is η
linear predictor
what is μ?
the mean of Y: μ = E(Y) = g^(-1)(η)
what is the symbol of the dispersion parameter?
φ
b''(θ) from the PDF represents
the variance function
what is the canonical link?
the link that transforms the mean μ to the natural parameter θ of the exponential family, i.e. g(μ) = θ
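The cards on the dispersion parameter, b''(θ) and the canonical link all refer to the exponential-family density. A sketch in LaTeX, assuming the formula sheet uses the standard exponential-dispersion form (the functions b and c are the usual textbook notation, not taken from these notes):

```latex
\begin{align*}
f(y;\theta,\phi) &= \exp\!\left\{ \frac{y\theta - b(\theta)}{\phi} + c(y,\phi) \right\} \\
E(Y) &= \mu = b'(\theta), \qquad \operatorname{Var}(Y) = \phi\, b''(\theta) \\
\text{canonical link: } g(\mu) &= \theta \qquad (\text{e.g. Poisson: } \theta = \log \mu)
\end{align*}
```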
Why do we need GLM?
When the dist is normal, we can use the PDF to calculate p-values or CIs. However, if the response is not normally distributed, is heteroskedastic, or has a non-linear relationship with the predictors, we use a GLM
what does the link function transform?
it transforms the mean (the predictions), not the observed dependent variable itself
a binomial (binary) response is modelled with what?
logistic regression
when do we use Poisson?
if we have a skewed discrete dist (count data)
- e.g. “number of times you …”
when to use negative binomial?
when the mean and variance differ (overdispersion), unlike Poisson where mean = variance
when to use the gamma dist?
continuous dist where the variable must be strictly positive (> 0)
how to do GLM? (long)
1. what dist is this?
2. look at the table on the formula sheet and see which μ you are using
3. write the likelihood function ∏(fy)
4. compute the log-likelihood function: change ∏(fy) to ∑log(fy)
5. fy is from formula sheet page 5 (don't forget the exp comes straight down when the log is applied, since log(exp(a)) = a)
6. substitute the fy (from step 5) using the μ from step 2
7. differentiate with respect to α and β and set to 0 (if differentiating leaves an x in front, the x stays and cannot be removed; if there is no x, all the α and β can be removed directly)
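The steps above can be sketched in Python for one concrete case. This assumes a Poisson response with log link and a single covariate, so μ_i = exp(α + βx_i); the data and the function name `fit_poisson_glm` are made up for illustration. Setting the derivatives of the log-likelihood to 0 gives Σ(y_i − μ_i) = 0 for α and Σx_i(y_i − μ_i) = 0 for β (the x stays in front), and since these have no closed-form solution in general, the sketch solves them numerically with Newton-Raphson.

```python
import math

def fit_poisson_glm(x, y, iters=50):
    """Newton-Raphson for a Poisson GLM with log link: log(mu_i) = alpha + beta * x_i."""
    alpha = math.log(sum(y) / len(y))  # start from the intercept-only fit
    beta = 0.0
    for _ in range(iters):
        mu = [math.exp(alpha + beta * xi) for xi in x]
        # Score equations: derivatives of the log-likelihood, which we want equal to 0
        s_a = sum(yi - mi for yi, mi in zip(y, mu))
        s_b = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative second derivatives)
        i_aa = sum(mu)
        i_ab = sum(xi * mi for xi, mi in zip(x, mu))
        i_bb = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i_aa * i_bb - i_ab ** 2
        # Newton step: solve the 2x2 system (information) * (step) = (score)
        alpha += (i_bb * s_a - i_ab * s_b) / det
        beta += (i_aa * s_b - i_ab * s_a) / det
    return alpha, beta

# Made-up data for illustration only
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]
alpha_hat, beta_hat = fit_poisson_glm(x, y)

# At the maximum likelihood estimate both score equations are (numerically) zero
mu_hat = [math.exp(alpha_hat + beta_hat * xi) for xi in x]
score_alpha = sum(yi - mi for yi, mi in zip(y, mu_hat))
score_beta = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu_hat))
```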
What are Information Criteria (IC) used for?
- Assessing goodness-of-fit and parameter parsimony
- Comparing different linear predictors / link functions
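How to choose a good model using IC?
find the one with the lowest IC
What are the 2 types of IC?
AIC and BIC (BIC penalises extra parameters more heavily, so it is more likely to underfit)
forward and backward selection when looking at the AIC / BIC
same idea: find the model with the lowest value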
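A minimal Python sketch of the comparison, using the standard definitions AIC = −2·logL + 2k and BIC = −2·logL + k·log(n), where k is the number of parameters and n the number of observations. The two candidate models and their log-likelihoods are made-up numbers, chosen to show that BIC can prefer the smaller model when AIC prefers the larger one.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: lower is better."""
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    """Bayesian information criterion: lower is better, heavier penalty per parameter."""
    return -2 * loglik + k * math.log(n)

# Hypothetical candidates: name -> (maximised log-likelihood, number of parameters)
candidates = {"small": (-105.0, 2), "large": (-100.0, 6)}
n = 50  # hypothetical sample size

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n))
```

With these numbers AIC picks the larger model and BIC the smaller one, which is the "more likely to underfit" behaviour of BIC.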
When are Pearson residuals vs deviance residuals used?
Pearson: when Y is normal
Deviance: when Y is not normal, since deviance residuals are usually closer to a normal dist
If Y is normally distributed, Pearson and deviance residuals are equal
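A small Python sketch of the two residual types, using the standard (unscaled) definitions: the Pearson residual is (y − μ)/√V(μ) and the deviance residual is sign(y − μ)·√(unit deviance). Only the normal and Poisson cases are shown, and the function names and numbers are made up for illustration.

```python
import math

def pearson_residual(y, mu, variance):
    """(y - mu) / sqrt(V(mu)), where variance is the variance function V."""
    return (y - mu) / math.sqrt(variance(mu))

def deviance_residual_normal(y, mu):
    """Normal unit deviance is (y - mu)^2, so the signed square root is y - mu."""
    return math.copysign(math.sqrt((y - mu) ** 2), y - mu)

def deviance_residual_poisson(y, mu):
    """Poisson unit deviance is 2 * [y * log(y / mu) - (y - mu)]."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return math.copysign(math.sqrt(2 * (term - (y - mu))), y - mu)

# Normal case (V(mu) = 1): the two residuals coincide
normal_pearson = pearson_residual(3.0, 2.0, lambda mu: 1.0)
normal_deviance = deviance_residual_normal(3.0, 2.0)

# Poisson case (V(mu) = mu): the two residuals differ
poisson_pearson = pearson_residual(3.0, 2.0, lambda mu: mu)
poisson_deviance = deviance_residual_poisson(3.0, 2.0)
```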
a positive trend means
in the plot of absolute standardised residuals vs scaled fitted values, the variance function
b''(θ) increases too slowly with the mean
a negative trend means
in the plot of absolute standardised residuals vs scaled fitted values, the variance function
b''(θ) increases too fast with the mean
Short-tailed line of business
takes less than a few years to settle all claims, e.g. motor, home, fire
Long-tailed line of business
takes more than a few years, e.g. workers' compensation, public & product liability
What are Outstanding Claim Liabilities (OCL)?
claims incurred before the valuation date but not yet paid by the valuation date
IBNR is
Incurred but not reported
How to estimate OCL
expressing past claims data as a run-off triangle, then applying reserving methods (Chain Ladder Method)
How to construct the claims run-off triangle
going across (left to right) is the development year (year paid); going down is the accident year (year the accident happened)
How to do the chain ladder method?
1. make the triangle cumulative
2. find the development factors (sum of column 2 / sum of column 1, but with both columns cut to the same length)
3. because the lengths are made equal, the last value of each column/row is left unused; multiply that value by the result of step 2 (start from the top-right corner, or column 9)
4. same as step 3, but this time also multiply by the step 2 results that came before, e.g. column 8 is multiplied by development factors 8-9 and 9-10
5. subtract the last value of each column/row
6. repeat as in step 4, so more and more development factors get multiplied in
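The steps above can be sketched in Python on a tiny 3×3 cumulative triangle. The numbers and the function name `chain_ladder` are made up for illustration; rows are accident years and columns are development years, with each later accident year having one fewer observed column.

```python
def chain_ladder(cum):
    """cum: cumulative run-off triangle, one list per accident year.

    Row i holds the n - i development years observed so far.
    Returns the development factors and the outstanding claims per accident year.
    """
    n = len(cum)
    factors = []
    for j in range(n - 1):
        # Only accident years observed in BOTH columns j and j+1 (same length)
        rows = range(n - 1 - j)
        factors.append(sum(cum[i][j + 1] for i in rows) / sum(cum[i][j] for i in rows))

    ocl = []
    for row in cum:
        latest = row[-1]            # last observed cumulative value
        ultimate = latest
        for f in factors[len(row) - 1:]:
            ultimate *= f           # roll forward with every remaining factor
        ocl.append(ultimate - latest)  # outstanding = ultimate minus paid to date
    return factors, ocl

# Made-up cumulative triangle
triangle = [
    [100, 150, 165],
    [110, 176],
    [120],
]
factors, ocl = chain_ladder(triangle)
total_ocl = sum(ocl)
```

The oldest accident year is treated as fully developed here, so its outstanding amount is 0; that is an assumption of this sketch (no tail factor).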
residual bootstrapping
allows for both process error and parameter error
what is process error?
process uncertainty: the inherent randomness of the future
what is parameter error?
uncertainty in the estimated parameters when fitting a model
how to do residual bootstrapping?
- bootstrap data: the residuals are picked randomly with replacement
- the bootstrapped residuals are combined with the fitted values to generate pseudo data
- a new model is fitted to the pseudo data
- the expected OCL for the pseudo data is estimated
- repeat!
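A rough Python sketch of the loop above. It assumes Pearson residuals with variance function V(μ) = μ, so pseudo data are rebuilt as fitted + residual·√fitted, and it takes the "refit and estimate" step as a callback named `estimate` (a hypothetical helper, not something from these notes). The toy example uses a constant-mean model instead of a full reserving model, purely to keep the sketch short.

```python
import math
import random

def residual_bootstrap(fitted, residuals, estimate, n_boot=1000, seed=0):
    """Return n_boot bootstrap estimates from resampled residuals."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_boot):
        # Pick residuals randomly with replacement
        resampled = rng.choices(residuals, k=len(fitted))
        # Combine with fitted values to make pseudo data
        # (inverts the Pearson residual r = (y - m) / sqrt(m))
        pseudo = [m + r * math.sqrt(m) for m, r in zip(fitted, resampled)]
        # Refit to the pseudo data and record the quantity of interest
        results.append(estimate(pseudo))
    return results

# Toy data and toy model: the fitted value is the sample mean for every point
observed = [8.0, 12.0, 9.0, 11.0, 10.0]
mean = sum(observed) / len(observed)
fitted = [mean] * len(observed)
residuals = [(y - m) / math.sqrt(m) for y, m in zip(observed, fitted)]

boot = residual_bootstrap(fitted, residuals, lambda data: sum(data) / len(data))
boot_mean = sum(boot) / len(boot)
```

In a reserving setting the callback would refit the reserving model to the pseudo triangle and return its OCL; the spread of the results then reflects parameter error.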
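what does pmax do?
sets a minimum: it takes the element-wise (parallel) maximum in R, so any value below the given floor is raised to it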
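`pmax` is an R function; a rough Python analogue for the common case of flooring a vector at a single value might look like the sketch below. The values and the floor of 0.01 are made up for illustration.

```python
def pmax(values, floor):
    """Element-wise maximum of each value and a scalar floor (like R's pmax(x, floor))."""
    return [max(v, floor) for v in values]

pseudo = [3.2, -0.7, 0.0, 5.1]
floored = pmax(pseudo, 0.01)
```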
What is logistic regression?
A GLM with binomial distribution and logit link function (the canonical link)
What is the estimated prob for logistic regression?
p = 1 / (1 + exp(-η)), where η = α + β1X1 + … + βkXk
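For the logit link, the estimated probability is the inverse logit of the linear predictor, p = 1 / (1 + exp(−η)). A minimal Python sketch, with made-up coefficients and a hypothetical function name:

```python
import math

def predicted_probability(alpha, betas, xs):
    """Inverse logit of the linear predictor eta = alpha + sum(beta_j * x_j)."""
    eta = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-eta))

# Made-up coefficients chosen so that eta = -1 + 0.5 * 2 + 2 * 0 = 0
p = predicted_probability(-1.0, [0.5, 2.0], [2.0, 0.0])
```

A linear predictor of 0 corresponds to a probability of exactly 0.5; larger η pushes p towards 1 and smaller η towards 0.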
What happens if we lower the threshold?
Increases true positives but also increases false positives
What is a false positive?
Predicted “yes” but actually no
What happens if we increase the threshold?
Reduces false positives but increases false negatives
What is a false negative?
Predicted “no” but actually yes
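The effect of moving the threshold can be seen by counting TP, FP, TN and FN at two cut-offs. A small Python sketch with made-up predicted probabilities and outcomes (the function name is hypothetical):

```python
def confusion_counts(probs, actual, threshold):
    """Count TP, FP, TN, FN when predicting "yes" for prob >= threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for p, y in zip(probs, actual):
        predicted_yes = p >= threshold
        if predicted_yes and y == 1:
            counts["TP"] += 1   # said yes, actually yes
        elif predicted_yes and y == 0:
            counts["FP"] += 1   # said yes, actually no
        elif not predicted_yes and y == 0:
            counts["TN"] += 1   # said no, actually no
        else:
            counts["FN"] += 1   # said no, actually yes
    return counts

# Made-up predictions and true labels
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
actual = [1, 1, 0, 1, 0, 0]

low = confusion_counts(probs, actual, 0.35)
high = confusion_counts(probs, actual, 0.65)
```

Here the lower threshold gives more true positives and more false positives, while the higher threshold removes the false positive at the cost of a false negative.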
Downside of logistic regression?
sensitive to class imbalance: the model may predict the majority class more frequently
How to solve the class imbalance?
Oversampling the minority class
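One simple way to oversample is to resample minority-class rows with replacement until every class is as large as the biggest one. A minimal Python sketch under that assumption; the data and the function name `oversample_minority` are made up, and other oversampling schemes exist.

```python
import random

def oversample_minority(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the shortfall with replacement from the same class
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# Made-up imbalanced data: 8 "no" rows and 2 "yes" rows
rows = [{"x": i, "y": 0} for i in range(8)] + [{"x": i, "y": 1} for i in range(2)]
balanced = oversample_minority(rows, lambda r: r["y"])
n_yes = sum(1 for r in balanced if r["y"] == 1)
n_no = sum(1 for r in balanced if r["y"] == 0)
```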
How to do logistic regression
1. oversample the minority class and summarise the data
2. model the data using data$explanatory var
3. fit with glm using family = binomial and link = logit
4. combine explanatory variables
5. using the new improved model, do model checking
6. check TP, FP, TN, FN
7. check the ratio between TP and FP