Retail Credit Risk Flashcards
define retail lending
an exposure to an individual or small business, guaranteed by that individual or business
what are 4 examples of retail lending
credit cards
residential mortgages
small business facilities
installment loans
what are two characteristics of retail lending
low individual exposure
managed collectively rather than individually
what is a credit risk score
a total number of points that predicts a borrower’s future repayment performance based on historical information
what is a scorecard
a mathematical algorithm used to generate a score for rank-order risk analysis
what are scorecards used for
- lending decisions
- mitigation of portfolio credit risk
what are two benefits of using a scorecard
easy to interpret
easy to monitor
what are the 6 stages in model development
- business objectives
- data preparation
- model development
- model approval
- model deployment
- monitoring
what are 3 aspects of business objectives
- key issues
- expectations for the model
- structure
define key issues
trends, challenges and concerns outlined by the business
define structure
project team members, data and timeline
what are the 5 C’s of data preparation
- Comprehensiveness
- Clean
- Consistent
- Current
- Caretaking
define comprehensiveness
ensuring the data captures the full scope and complexity of the underlying information
define clean
ensuring the accuracy of the data
define consistent
ensuring the uniformity of the data across different sources.
define current
ensuring the data is up to date
define caretaking
the ongoing management of the data to preserve its quality
what are 6 aspects of the data preparation in the model development lifecycle
the 5 C’s
exclusion criteria
timeframe
defining the target and explanatory variables
segmentation (# of models)
sampling
what are three sources of exclusion criteria
scope
data errors
operational
what two periods are involved in the timeframe of model creation
- observation period
- performance period
what are two aspects of the observation periods
- for explanatory variables
- should be representative of the current/future environment
what are two aspects of performance periods
- for the target variable
- should be long enough to have a sufficient number of defaults.
what are the two modeling techniques
- industry standard
- other methodologies
compare the advantages of the two modeling techniques
industry-standard:
1. few variables
2. expert judgment
other methods:
1. many variables
2. one step for variable reduction and model fitting
3. adaptive learning
compare the disadvantages of the two modeling techniques
industry-standard:
1. few variables
2. distributional assumptions
3. separate steps for variable reduction and model fitting
other methods:
1. many variables
2. risk of overfitting
what are the 5 steps of the industry standard model development technique?
- variable transformation
- variable reduction
- model fitting
- scorecard scaling
- scorecard assessment
what technique can be used in variable transformation
weight of evidence
define variable reduction
removing any variable that cannot be used or doesn’t make sense
what are two techniques for variable reduction
- grouping
- variable clustering
what is grouping
creating bins within a variable
what are three benefits of grouping?
i. Accounts for non-linear relationship between the target and explanatory variables.
ii. Accounts for outliers
iii. Allows for the treatment of missing values as a separate category.
how should grouping be performed?
- Start by creating 20 equal bins.
- Calculate the WOE of each bin.
- Collapse bins with similar WOE.
- Remove variables with weak IV.
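The first three grouping steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bins are equal-width (the card does not say whether "equal" means width or frequency), defaults are coded 1 and non-defaults 0, "similar WOE" means an absolute difference below a hypothetical tolerance of 0.1, and empty cells are nudged by 0.5 so the logarithm stays defined. The function names are made up for this example, and the IV screening step is not shown.

```python
import math

def woe(good, bad, total_good, total_bad):
    # Assumption: nudge empty cells by 0.5 so ln() stays defined.
    if good == 0 or bad == 0:
        good, bad = good + 0.5, bad + 0.5
    return math.log((good / total_good) / (bad / total_bad))

def group_variable(values, defaults, n_bins=20, tol=0.1):
    """Return (lower_edge, goods, bads, woe) for each collapsed bin.

    Assumes both defaults (1) and non-defaults (0) are present.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant variable
    counts = [[0, 0] for _ in range(n_bins)]  # [goods, bads] per starting bin
    for v, d in zip(values, defaults):
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx][d] += 1
    total_good = sum(c[0] for c in counts)
    total_bad = sum(c[1] for c in counts)

    groups = []  # each item: [lower_edge, goods, bads]
    for i, (g, b) in enumerate(counts):
        if groups:
            prev = groups[-1]
            prev_woe = woe(prev[1], prev[2], total_good, total_bad)
            this_woe = woe(g, b, total_good, total_bad)
            if abs(prev_woe - this_woe) < tol:
                # Similar WOE: fold this bin into the previous group.
                prev[1] += g
                prev[2] += b
                continue
        groups.append([lo + i * width, g, b])
    return [(e, g, b, woe(g, b, total_good, total_bad)) for e, g, b in groups]
```

On a variable whose default rate drops from 50% to 10% halfway through its range, the 20 starting bins collapse to two groups, one with negative WOE and one with positive WOE.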
what is variable clustering
grouping correlated variables together such that variables within a cluster are highly correlated and variables in different clusters are uncorrelated, to reduce the multicollinearity of the model.
which two variables should represent the cluster when using variable clustering?
- the variable with the highest IV
- the variable with the lowest 1-R^2
what are two aspects of model fitting in the industry standard technique?
variable selection: forward, backward, ridge, lasso
assumptions that historical experiences predict future behaviour and that consumer behaviour will not change significantly
define scorecard scaling when using the industry standard technique
raw scores are scaled to a three digit number
what is the formula for the score in scorecard scaling
score = offset + factor⋅ln(2⋅odds) − PDO, which is equivalent to score = offset + factor⋅ln(odds) when factor = PDO/ln(2)
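A worked sketch of the scaling arithmetic may help here. It assumes that odds are non-default:default odds, that PDO means "points to double the odds", and that factor = PDO/ln(2); the anchor of 600 points at 50:1 odds with a PDO of 20 is a hypothetical choice for illustration, not something the cards specify.

```python
import math

def scaling_params(base_score, base_odds, pdo):
    """Derive offset and factor from an anchor point and the PDO."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset, factor

def score(odds, offset, factor):
    """Scaled score for the given odds."""
    return offset + factor * math.log(odds)

# Hypothetical anchor: 600 points at 50:1 odds, 20 points to double the odds.
offset, factor = scaling_params(base_score=600, base_odds=50, pdo=20)
at_base = score(50, offset, factor)      # 600 by construction
at_double = score(100, offset, factor)   # 620, one PDO higher
# The form on the card gives the same value as score(50, ...):
card_form = offset + factor * math.log(2 * 50) - 20
```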
what are the 3 types of scorecard assessment
- rank ordering
- population stability
- benchmarking
what are the 5 evaluation metrics used in rank ordering scorecard assessment
- KS statistic
- misclassification
- ROC curve
- accuracy ratio
- lift chart
what does population stability do
quantify population differences by measuring the shift between two sample distributions
what is the formula for the population stability index (PSI)
PSI = ∑[(N_bin − B_bin)⋅ln(N_bin/B_bin)]
what values of PSI indicate: no significant shift, a minor shift, a significant shift
<0.1: no significant shift
0.1-0.25: minor shift
>0.25: significant shift
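The PSI formula and its thresholds can be sketched as follows. The cards do not define the symbols, so this assumes N_bin is the share of the new sample falling in a bin and B_bin is the share of the baseline sample in the same bin, with every share strictly positive. The function names are illustrative only.

```python
import math

def psi(new_props, base_props):
    """Population stability index over matching bins of two distributions."""
    return sum((n - b) * math.log(n / b) for n, b in zip(new_props, base_props))

def interpret_psi(value):
    """Map a PSI value to the bands used on the card."""
    if value < 0.1:
        return "no significant shift"
    if value <= 0.25:
        return "minor shift"
    return "significant shift"

# Two bins: the new sample is split 50/50, the baseline 40/60.
example = psi([0.5, 0.5], [0.4, 0.6])
label = interpret_psi(example)
```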
what is benchmarking
comparing a scorecard to an existing scorecard
what is the KS statistic
the maximum difference between the CDFs of the distributions of defaults and non-defaults
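As a rough illustration, the KS statistic can be computed by walking through the scores in ascending order and tracking the two empirical CDFs. The sketch assumes defaults are coded 1 and non-defaults 0 and that both classes are present; the function name is hypothetical.

```python
def ks_statistic(scores, defaults):
    """Maximum gap between the score CDFs of defaults and non-defaults."""
    pairs = sorted(zip(scores, defaults))
    n_bad = sum(defaults)
    n_good = len(defaults) - n_bad
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all accounts tied at this score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best
```

A scorecard that separates the two groups perfectly gives 1.0; one with no separation gives a value near 0.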
what is misclassification
the classification errors (false positives and false negatives) at a given cutoff, summarized in the confusion matrix
what is the ROC curve
a plot of the true positive rate against the false positive rate across all score cutoffs; the area under the curve is the probability that a randomly chosen non-default will be ranked higher than a randomly chosen default
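The probability interpretation can be checked directly by comparing every non-default with every default. This brute-force sketch assumes a higher score means lower risk and counts ties as half a win; it is quadratic in the sample size, so it is meant for illustration rather than production use.

```python
def auc(scores, defaults):
    """Share of (non-default, default) pairs in which the non-default scores higher."""
    goods = [s for s, d in zip(scores, defaults) if d == 0]
    bads = [s for s, d in zip(scores, defaults) if d == 1]
    wins = sum(
        1.0 if g > b else 0.5 if g == b else 0.0
        for g in goods
        for b in bads
    )
    return wins / (len(goods) * len(bads))
```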
what is the formula of the accuracy ratio
AR=GINI/(Perfect GINI)
what is the GINI index
the area between the Lorenz and random curve
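The two cards above can be tied together with a small sketch. It assumes the Lorenz curve plots the cumulative share of defaults against the cumulative share of accounts, sorted from lowest score (riskiest) to highest, that scores are distinct, and that the Gini is the plain area between that curve and the diagonal, as the card states (some texts use twice this area). The function names are illustrative.

```python
def gini_area(scores, defaults):
    """Area between the Lorenz curve and the random (diagonal) curve."""
    order = sorted(zip(scores, defaults))  # riskiest (lowest score) first
    n, n_bad = len(order), sum(defaults)
    area, cum_bad = 0.0, 0
    for _, d in order:
        prev = cum_bad / n_bad
        cum_bad += d
        # Trapezoid of width 1/n under the Lorenz curve.
        area += (prev + cum_bad / n_bad) / (2 * n)
    return area - 0.5  # subtract the area under the diagonal

def accuracy_ratio(scores, defaults):
    """AR = Gini / perfect Gini."""
    p = sum(defaults) / len(defaults)  # overall default rate
    perfect = (1 - p) / 2              # Gini area of a perfect ranking
    return gini_area(scores, defaults) / perfect
```

Under these assumptions a perfect ranking gives AR = 1 and a random ranking gives AR close to 0.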
what is a lift chart
the cumulative % of defaults per decile divided by the total population % of defaults.
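One way to read this definition is as a cumulative lift per decile: the default rate among the riskiest k deciles divided by the overall default rate. The sketch below follows that reading and assumes low scores are riskiest, defaults are coded 1, and there are at least as many accounts as groups; the function name is hypothetical.

```python
def cumulative_lift(scores, defaults, n_groups=10):
    """Cumulative default rate of the riskiest k groups over the overall rate."""
    order = [d for _, d in sorted(zip(scores, defaults))]  # riskiest first
    n = len(order)
    overall = sum(order) / n
    lifts = []
    for k in range(1, n_groups + 1):
        cutoff = round(k * n / n_groups)  # accounts in the first k groups
        rate = sum(order[:cutoff]) / cutoff
        lifts.append(rate / overall)
    return lifts
```

The last value is always 1.0, because the full population has exactly the overall default rate.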
what does weight of evidence do
transforms explanatory variables into a set of groups based on the similarity of the target variable distributions.
what does WOE measure
how strong a group is at separating defaults from non-defaults
what does a negative WOE signify?
the group holds a larger share of all defaults than of all non-defaults
what is the formula for WOE
WOE = ln[(# non-defaults / total non-defaults) / (# defaults / total defaults)]
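A short worked example of the WOE formula, using made-up counts for three groups of a single variable (1,000 non-defaults and 100 defaults in total). The group names and numbers are hypothetical and only illustrate the sign convention.

```python
import math

def weight_of_evidence(non_defaults, defaults, total_non_defaults, total_defaults):
    """WOE of one group: ln(share of non-defaults / share of defaults)."""
    return math.log(
        (non_defaults / total_non_defaults) / (defaults / total_defaults)
    )

# Hypothetical groups: (non-defaults, defaults) per group.
groups = {"low": (100, 50), "mid": (300, 40), "high": (600, 10)}
woe_by_group = {
    name: weight_of_evidence(g, b, 1000, 100) for name, (g, b) in groups.items()
}
```

Here "low" and "mid" come out negative because they hold a larger share of the defaults than of the non-defaults, while "high" comes out positive.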
what is a variable’s information value
the predictive power of a single variable (its ability to separate defaults from non-defaults)
what is the formula for information value
IV = ∑[((# non-defaults / total non-defaults) − (# defaults / total defaults))⋅WOE_i]
what IV value ranges indicate: very weak, weak, moderate, strong
<0.02: very weak
0.02-0.1: weak
0.1-0.3: moderate
0.3+: strong
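The IV formula and the bands above can be sketched together. The example assumes each group has at least one default and one non-default, and the counts are hypothetical; the function names are illustrative only.

```python
import math

def information_value(groups):
    """IV of a variable given (non_defaults, defaults) counts per group."""
    total_good = sum(g for g, _ in groups)
    total_bad = sum(b for _, b in groups)
    iv = 0.0
    for g, b in groups:
        share_good, share_bad = g / total_good, b / total_bad
        # (share of non-defaults - share of defaults) times the group's WOE.
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv

def iv_strength(iv):
    """Map an IV to the bands used on the card."""
    if iv < 0.02:
        return "very weak"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "moderate"
    return "strong"

# Hypothetical variable with three groups.
example_iv = information_value([(100, 50), (300, 40), (600, 10)])
example_label = iv_strength(example_iv)
```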