Generalised Linear Model (week 6-8) Flashcards

Question 1

Q

What are GLMs used for?

Answer

A

General / health insurance pricing

Question 2

Q

What is the GLM formula?

Answer

A

g(μ) = g(E(Y)) = α + β₁X₁ + … + βₖXₖ = η

Question 3

Q

what does g represent?

Answer

A

link function

Question 4

Q

What is η?

Answer

A

linear predictor

Question 5

Q

what is μ?

Answer

A

μ = E(Y) = g^(-1)(η), the mean of the response
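A minimal sketch of how η and μ relate, assuming a log link (so g^(-1) is exp). The coefficients and covariate values are made up for illustration, and the function names are hypothetical.

```python
import math

def linear_predictor(alpha, betas, xs):
    """eta = alpha + beta_1*x_1 + ... + beta_k*x_k"""
    return alpha + sum(b * x for b, x in zip(betas, xs))

def mean_from_eta(eta, inverse_link=math.exp):
    """mu = g^(-1)(eta); defaults to the inverse of the log link."""
    return inverse_link(eta)

# Hypothetical coefficients and covariates, for illustration only.
eta = linear_predictor(alpha=0.5, betas=[0.2, -0.1], xs=[1.0, 3.0])
mu = mean_from_eta(eta)
```

Passing a different inverse_link swaps the link function without touching the linear predictor.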

Question 6

Q

What is the symbol for the dispersion parameter?

Question 7

Q

What does b''(θ) from the PDF represent?

Answer

A

variance function

Question 8

Q

What is the canonical link?

Answer

A

The link that transforms the mean μ into the natural (canonical) parameter θ of the exponential family, i.e. g(μ) = θ

Question 9

Q

Why do we need GLM?

Answer

A

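When the response is normally distributed, we can use the normal PDF to calculate p-values or confidence intervals. However, if the response is not normally distributed, is heteroskedastic, or has a non-linear relationship with the predictors, we use a GLM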

Question 10

Q

What does the link function do?

Answer

A

It transforms the predicted mean μ = E(Y) onto the scale of the linear predictor; the observed dependent variable itself is not transformed

Question 11

Q

Which model is used for a binomial (binary) response?

Answer

A

logistic regression

Question 12

Q

When do we use Poisson?

Answer

A

When the response has a skewed discrete (count) distribution
- e.g. “number of times you …”

Question 13

Q

When do we use the negative binomial?

Answer

A

When the mean and variance differ (overdispersion), unlike Poisson, where the mean equals the variance

Question 14

Q

When do we use the gamma distribution?

Answer

A

Continuous distribution; the response variable must be strictly positive (> 0)

Question 15

Q

How do we fit a GLM? (long)

Answer

A

1. Identify the distribution
2. Look at the table (formula sheet) to see which μ to use
3. Write the likelihood function ∏ f_Y
4. Compute the log-likelihood function: change ∏ f_Y to ∑ log(f_Y)
5. f_Y comes from formula sheet page 5 (remember that taking the log cancels the exp, so the exponent can be brought down directly)
6. Substitute f_Y (from step 5), using μ from step 2
7. Differentiate with respect to α and β and set the derivatives to 0 (if a term has an x in front after differentiating, the x stays and cannot be removed; if there is no x, the α or β drops out directly)
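The last step can be sketched numerically, assuming a Poisson response with a log link and one covariate, so the score equations are ∑(y − μ) = 0 for α and ∑x(y − μ) = 0 for β. Newton-Raphson is used here only as a numerical stand-in for solving by hand; the data and function names are made up for illustration.

```python
import math

def score(alpha, beta, xs, ys):
    """Derivatives of the Poisson log-likelihood (log link) w.r.t. alpha and beta."""
    d_alpha = d_beta = 0.0
    for x, y in zip(xs, ys):
        mu = math.exp(alpha + beta * x)
        d_alpha += y - mu        # no x in front for alpha
        d_beta += x * (y - mu)   # the x in front stays for beta
    return d_alpha, d_beta

def fit_poisson(xs, ys, iterations=50):
    """Solve score = 0 by Newton-Raphson."""
    alpha = math.log(sum(ys) / len(ys))  # start from the intercept-only estimate
    beta = 0.0
    for _ in range(iterations):
        g_a, g_b = score(alpha, beta, xs, ys)
        mus = [math.exp(alpha + beta * x) for x in xs]
        # Entries of the negative Hessian of the log-likelihood.
        i_aa = sum(mus)
        i_ab = sum(x * m for x, m in zip(xs, mus))
        i_bb = sum(x * x * m for x, m in zip(xs, mus))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve the 2x2 system explicitly.
        alpha += (i_bb * g_a - i_ab * g_b) / det
        beta += (i_aa * g_b - i_ab * g_a) / det
    return alpha, beta

# Made-up counts, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 2, 4, 6, 12]
alpha_hat, beta_hat = fit_poisson(xs, ys)
```

At the fitted values both score components are numerically zero, which is exactly the "set to 0" condition.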

Question 16

Q

What are Information Criteria (IC) used for?

Answer

A

- Assess goodness of fit and parameter parsimony
- Compare models with different linear predictors or link functions

Question 17

Q

How do we choose the best model using an IC?

Answer

A

Pick the model with the lowest IC value

Question 18

Q

What are the 2 types of IC?

Answer

A

AIC and BIC (BIC penalises extra parameters more heavily, so it is more likely to underfit)
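A small sketch of the two criteria, using the standard definitions AIC = 2k − 2·logL and BIC = k·ln(n) − 2·logL. The log-likelihoods, parameter counts, and model names below are invented for illustration.

```python
import math

def aic(log_likelihood, num_params):
    return 2 * num_params - 2 * log_likelihood

def bic(log_likelihood, num_params, num_obs):
    return num_params * math.log(num_obs) - 2 * log_likelihood

# Hypothetical fits: (maximised log-likelihood, number of parameters).
candidates = {"small": (-120.0, 2), "large": (-116.0, 5)}
n = 100

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n))
```

With these made-up numbers AIC prefers the larger model while BIC prefers the smaller one, which is the sense in which BIC leans towards underfitting.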

Question 19

Q

How do forward and backward selection work when using AIC/BIC?

Answer

A

Same procedure, but at each step keep the model with the lowest AIC/BIC

Question 20

Q

When are Pearson residuals vs deviance residuals used?

Answer

A

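Pearson: when Y is normally distributed
Deviance: when Y is not normal, since deviance residuals are closer to a normal distribution
If Y is normally distributed, the Pearson and deviance residuals are equal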
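A sketch comparing the two residuals, assuming the usual definitions: Pearson = (y − μ)/√V(μ), deviance = sign(y − μ)·√(unit deviance). The dispersion parameter is ignored (taken as 1), and the observation and fitted mean are made up.

```python
import math

def pearson_residual(y, mu, variance):
    """(y - mu) / sqrt(V(mu))"""
    return (y - mu) / math.sqrt(variance(mu))

def poisson_deviance_residual(y, mu):
    """sign(y - mu) * sqrt(unit deviance) for a Poisson response."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    unit_deviance = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(unit_deviance, 0.0)), y - mu)

def normal_deviance_residual(y, mu):
    """The normal unit deviance is (y - mu)^2, so the residual is y - mu."""
    return math.copysign(math.sqrt((y - mu) ** 2), y - mu)

# Made-up observation and fitted mean.
y, mu = 7, 4.0
r_p = pearson_residual(y, mu, variance=lambda m: m)            # Poisson: V(mu) = mu
r_d = poisson_deviance_residual(y, mu)
r_p_normal = pearson_residual(y, mu, variance=lambda m: 1.0)   # normal: V(mu) = 1
r_d_normal = normal_deviance_residual(y, mu)
```

For the Poisson case the two residuals differ; for the normal case they coincide.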

Question 21

Q

When does a positive trend appear?

Answer

A

In the plot of absolute standardised residuals vs scaled fitted values,
when the variance function b''(θ) increases too slowly with the mean

Question 22

Q

When does a negative trend appear?

Answer

A

In the plot of absolute standardised residuals vs scaled fitted values,
when the variance function b''(θ) increases too fast with the mean

Question 23

Q

Short-tailed line of business

Answer

A

Takes only a few years to settle all claims, e.g. motor, home, fire

Question 24

Q

Long-tailed line of business

Answer

A

Takes more than a few years to settle all claims, e.g. workers’ compensation, public and product liability

Question 25

Q

What are Outstanding Claim Liabilities (OCL)?

Answer

A

Claims incurred before the valuation date but not yet paid by the valuation date

Question 26

Q

IBNR is

Answer

A

Incurred but not reported

Question 27

Q

How to estimate OCL

Answer

A

Express past claims data as a run-off triangle, then apply a reserving method (e.g. the Chain Ladder Method)

Question 28

Q

How to construct the Claims Run-Off Triangle

Answer

A

Going across (left to right) is the development year (the year the claim is paid); going down is the accident year (when the accident occurred)

Question 29

Q

How to apply the chain ladder method?

Answer

A

1. Convert the triangle to cumulative claims
2. Find the development factors (sum of column 2 / sum of column 1, but using the same number of rows in both columns)
3. Because the lengths are matched, the last value of each column/row is left unused; multiply it by the result from step 2 (start from the top-right corner, or column 9)
4. Same as step 3, but this time also multiply by the earlier results from step 2, e.g. column 8 is multiplied by the development factors 8-9 and 9-10
5. Subtract the last value of each column/row
6. Repeat as in step 4, so more and more development factors are multiplied in
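The steps above can be sketched as follows, assuming the triangle is already cumulative (step 1 done) and stored as a list of rows of decreasing length. The 3×3 triangle and the function name are made up for illustration.

```python
def chain_ladder(cumulative):
    """cumulative[i][j]: cumulative claims for accident year i, development year j.
    Row i is assumed to hold len(cumulative) - i observed entries."""
    n = len(cumulative)
    # Step 2: factor j -> j+1 is a ratio of column sums over rows observed in both columns.
    factors = []
    for j in range(n - 1):
        rows = range(n - 1 - j)
        factors.append(sum(cumulative[i][j + 1] for i in rows) /
                       sum(cumulative[i][j] for i in rows))
    # Steps 3-6: roll each row's latest value forward to ultimate,
    # then subtract the latest value to get the outstanding amount.
    outstanding = []
    for row in cumulative:
        ultimate = row[-1]
        for f in factors[len(row) - 1:]:
            ultimate *= f
        outstanding.append(ultimate - row[-1])
    return factors, outstanding

# Made-up cumulative triangle.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 176.0],
    [120.0],
]
factors, outstanding = chain_ladder(triangle)
total_ocl = sum(outstanding)
```

The oldest accident year needs no factors, while the newest one is multiplied by every factor, matching the "more and more development factors" remark.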

Question 30

Q

residual bootstrapping

Answer

A

allows for both process error and parameter error

Question 31

Q

what is process error?

Answer

A

Process uncertainty: the inherent randomness of future outcomes

Question 32

Q

what is parameter error?

Answer

A

Uncertainty in the estimated parameters when fitting a model

Question 33

Q

how to do residual bootstrapping?

Answer

A

1. Bootstrap the data: pick the residuals randomly with replacement
2. Combine the resampled residuals with the fitted values to generate pseudo data
3. Fit a new model to the pseudo data
4. Estimate the expected OCL for the pseudo data
5. Repeat!
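A generic sketch of the loop above. It is a simplification: raw residuals (y − fitted) are resampled, and the model and estimate are toy stand-ins (a constant-mean fit, with the mean as the quantity of interest) rather than a reserving model and the OCL. All names here are hypothetical.

```python
import random

def residual_bootstrap(ys, fit, estimate, num_samples=1000, seed=0):
    """fit(ys) -> fitted values; estimate(ys) -> refit and return the quantity of interest."""
    rng = random.Random(seed)
    fitted = fit(ys)
    residuals = [y - f for y, f in zip(ys, fitted)]
    estimates = []
    for _ in range(num_samples):
        # Step 1: resample residuals with replacement.
        resampled = rng.choices(residuals, k=len(residuals))
        # Step 2: combine with fitted values to build pseudo data.
        pseudo = [f + r for f, r in zip(fitted, resampled)]
        # Steps 3-4: refit on the pseudo data and record the estimate.
        estimates.append(estimate(pseudo))
    return estimates

# Toy stand-ins for the model and the estimate.
def mean_fit(ys):
    m = sum(ys) / len(ys)
    return [m] * len(ys)

def mean_estimate(ys):
    return sum(ys) / len(ys)

draws = residual_bootstrap([3.0, 5.0, 4.0, 8.0, 5.0], mean_fit, mean_estimate,
                           num_samples=200)
```

The spread of draws gives a picture of how much the estimate varies across pseudo data sets.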

Question 34

Q

what does pmax do?

Answer

A

Sets a minimum (floor): it returns the element-wise maximum, e.g. pmax(x, 0) replaces negative values with 0
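pmax is an R function; the sketch below is only a Python illustration of the same idea for a scalar floor (R's version also accepts several vectors, which is not reproduced here).

```python
def pmax(values, floor):
    """Element-wise maximum of each value and a scalar floor."""
    return [max(v, floor) for v in values]

clipped = pmax([-2.5, 0.0, 3.1], 0)
```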

Question 35

Q

Logistic regression

Answer

A

A GLM with a binomial distribution and the logit link function (the canonical link)
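A sketch of the logit link and its inverse, which maps the linear predictor back to a probability. The coefficients and covariate value are made up for illustration.

```python
import math

def logit(p):
    """Logit link: the log-odds of p."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Maps the linear predictor eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and covariate value.
alpha, beta, x = -1.0, 0.8, 2.0
p_hat = inverse_logit(alpha + beta * x)
```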

Question 36

Q

What is the estimated probability in logistic regression?

Question 37

Q

What happens if we lower the threshold?

Answer

A

Increases true positives but also increases false positives

Question 38

Q

What is a false positive?

Answer

A

When the prediction is “yes” but the actual outcome is no

Question 39

Q

What happens if we increase the threshold?

Answer

A

Reduces false positives but increases false negatives

Question 40

Q

What is a false negative?

Answer

A

When the prediction is “no” but the actual outcome is yes
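A small sketch of how the threshold moves the four counts. The predicted probabilities and true labels are made up, and the function name is hypothetical.

```python
def confusion_counts(probabilities, labels, threshold):
    """Count TP, FP, TN, FN when predicting "yes" if probability >= threshold."""
    tp = fp = tn = fn = 0
    for p, actual in zip(probabilities, labels):
        predicted = p >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Made-up predicted probabilities and true labels (1 = yes, 0 = no).
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
low = confusion_counts(probs, labels, threshold=0.35)
high = confusion_counts(probs, labels, threshold=0.65)
```

On this toy data the lower threshold gains a true positive at the cost of a false positive, while the higher threshold removes the false positive but creates a false negative.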

Question 41

Q

Downside of logistic regression?

Answer

A

Sensitive to class imbalance: the model may predict the majority class more frequently

Question 42

Q

How to solve the class imbalance?

Answer

A

Oversampling the minority class
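A minimal sketch of random oversampling, assuming binary labels 0/1 and that both classes are present. The rows and helper name are hypothetical; minority rows are duplicated at random until the classes are balanced.

```python
import random

def oversample_minority(rows, label_of, seed=0):
    """Randomly duplicate minority-class rows (with replacement) until classes balance."""
    rng = random.Random(seed)
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    minority, majority = sorted([positives, negatives], key=len)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

# Made-up (feature, label) rows with a 4:1 imbalance.
rows = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.9, 1)]
balanced = oversample_minority(rows, label_of=lambda r: r[1])
```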

Question 43

Q

How to do logistic regression

Answer

A

1. Oversample the minority class and summarise the data
2. Model the data, referring to explanatory variables as data$variable
3. Fit with glm using family = binomial and link = logit
4. Combine explanatory variables
5. Using the new, improved model, do model checking
6. Check TP, FP, TN, FN
7. Check the ratio between TP and FP
