Topics 19-20 Flashcards
Three features of a good rating system
A good rating system will possess the following three features, which together will help entities measure the appropriateness of their internal rating systems:
- Objectivity and Homogeneity. An objective rating system will produce judgments based only on considerations tied to credit risk, while a homogeneous system implies that ratings are comparable among market segments, portfolios, and customer types.
- Specificity. A rating system is specific if it measures the distance from a default event while ignoring other financial elements that are not directly tied to potential default.
- Measurability and Verifiability. Ratings must provide correct expectations related to default probabilities which are backtested on a continuous basis.
Key measures used to assess the risk of default: probability of default (PD), cumulative probability of default, marginal probability of default, annualized default rate (ADR)
Conditional (forward) PD
Conditional (forward) PD_t = (PDcum_t − PDcum_t−1) / (1 − PDcum_t−1), i.e., the probability of defaulting in year t given survival through year t − 1. Equivalently, in terms of counts, it is the number of names defaulting in year t divided by the number of names still performing at the start of year t (Names − cumulative defaults through t − 1).
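For illustration, a minimal Python sketch that computes the cumulative PD, marginal PD, conditional (forward) PD, and annualized default rate for a hypothetical cohort; the cohort size and yearly default counts are assumed:

```python
# Illustrative (hypothetical) cohort: 1,000 names, defaults observed per year.
names = 1000
defaults_per_year = [12, 18, 15]          # assumed counts for years 1-3

cum_defaults = 0
prev_cum_pd = 0.0
for t, d in enumerate(defaults_per_year, start=1):
    cum_defaults += d
    cum_pd = cum_defaults / names                             # cumulative PD through year t
    marginal_pd = cum_pd - prev_cum_pd                        # unconditional PD for year t
    forward_pd = (cum_pd - prev_cum_pd) / (1 - prev_cum_pd)   # conditional on surviving to t-1
    adr = 1 - (1 - cum_pd) ** (1 / t)                         # annualized default rate (discrete)
    print(f"Year {t}: cum={cum_pd:.4f}, marginal={marginal_pd:.4f}, "
          f"forward={forward_pd:.4f}, ADR={adr:.4f}")
    prev_cum_pd = cum_pd
```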
Compare agencies’ ratings to internal experts-based rating systems
In terms of the criteria for a good rating system, the following comparisons can be made between agencies’ ratings and internal experts-based rating systems:
- Objectivity and Homogeneity. Agencies’ ratings are 73% compliant, while internal experts-based rating systems are 30% compliant.
- Specificity. Agencies’ ratings are close to 100% compliant, while internal experts-based rating systems are 75% compliant.
- Measurability and Verifiability. Agencies’ ratings are 75% compliant, while internal experts-based rating systems are 25% compliant.
Distinguish between structural approaches and reduced-form approaches to predicting default
The foundation of a structural approach (e.g., the Merton model) is the financial and economic theoretical assumptions that describe the overall path to default. Under this approach, building a model involves estimating the formal relationships that link the relevant variables of the model. In contrast, reduced form models (e.g., statistical and numerical approaches) arrive at a final solution using the set of variables that is most statistically suitable without factoring in the theoretical or conceptual causal relationships among variables.
A significant model risk in reduced form approaches results from a model’s dependency on the sample used to estimate it. To derive valid results, there must be a strong level of homogeneity between the sample and the population to which the model is applied.
Reduced form models used for credit risk can be classified into statistical and numerical-based categories.
- Statistical-based models use variables and relations that are selected and calibrated by statistical procedures.
- Numerical-based approaches use algorithms that connect actual defaults with observed variables.
- Both approaches can aggregate profiles, such as industry, sector, size, location, capitalization, and form of incorporation, into homogeneous “top-down” segment classifications. A “bottom-up” approach may also be used, which would classify variables based on case-by-case impacts. While numerical and statistical methods are primarily considered bottom-up approaches, experts-based approaches tend to be the most bottom up.
Describe the Merton model to calculate default probability and the distance to default
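A minimal sketch of the standard Merton-style calculation of distance to default (DD) and the implied default probability, using assumed values for asset value, asset volatility, drift, debt face value, and horizon:

```python
from math import log, sqrt
from scipy.stats import norm

# Assumed (illustrative) inputs -- in practice the asset value and its
# volatility are not observable and must be backed out from equity prices.
V = 120.0      # market value of assets
F = 100.0      # face value of debt due at horizon T (the default point)
mu = 0.06      # expected asset return (drift)
sigma = 0.25   # asset volatility
T = 1.0        # horizon in years

# Distance to default: number of standard deviations between the expected
# log asset value at T and the default point.
dd = (log(V / F) + (mu - 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))

# Under the model's lognormal assumption, PD is the probability that
# assets end up below the default point at the horizon.
pd_merton = norm.cdf(-dd)

print(f"Distance to default: {dd:.2f}, implied PD: {pd_merton:.4f}")
```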
Challenges/limitations of using the Merton model
There are many challenges associated with using the Merton model:
- Neither the asset value itself nor its associated volatility are observed.
- The structure of the underlying debt is typically very complex, as it involves differing maturities, covenants, guarantees, and other specifications.
- Because variables change so frequently, the model must be recalibrated continuously.
- The model's main limitation is that it applies only to liquid, publicly traded firms.
- Using this approach for unlisted companies can be problematic due to unobservable prices and challenges with finding comparable prices.
- Due to its high sensitivity to market movements and underlying variables, the model tends to fall short of fully reflecting the dependence of credit risk on business and credit cycles.
Describe linear discriminant analysis (LDA), define the Z-score and its usage
- Linear discriminant analysis (LDA) is one of the most popular statistical methods used for developing scoring models. Altman's Z-score is the best-known example: each accounting ratio contributes to the overall score through an estimated weight (discriminant coefficient).
- LDA categorizes firms into two groups: the first represents performing (solvent) firms and the second represents defaulting (insolvent) firms.
- A Z cut-off point is used to differentiate both groups, although it is imperfect as both solvent and insolvent firms may have similar scores. This may lead to incorrect classifications.
- Another example of LDA is the RiskCalc® model, which was developed by Moody’s. It incorporates variables that span several areas, such as financial leverage, growth, liquidity, debt coverage, profitability, size, and assets. The model is tailored to individual countries.
- With LDA, one of the main goals is to optimize variable coefficients such that Z-scores minimize the inevitable “overlapping zone” between solvent and insolvent firms. For two groups of borrowers with similar Z-scores, the overlapping zone is a risk area where firms may end up incorrectly classified. Historical versions of LDA would sometimes allow for a gray area, giving three Z-score ranges for determining who would be granted funding: very safe borrowers, very risky borrowers, and a middle ground of borrowers that merited further investigation. Today, LDA incorporates the two additional objectives of measuring default probability and assigning ratings.
- Note that LDA models typically offer only two decisions: accept or reject. Modern internal rating systems, which are based on the concept of default probability, require more options for decisions.
- For Altman's original model, a score below 1.8 indicates the company is likely headed for bankruptcy, while companies with scores above 3 are unlikely to go bankrupt.
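For illustration, a small sketch of Altman's original Z-score for public manufacturing firms, using the widely cited coefficients; the input ratios for the hypothetical firm are assumed:

```python
def altman_z(wc_ta, re_ta, ebit_ta, mve_tl, sales_ta):
    """Altman's original Z-score for public manufacturing firms.

    wc_ta:    working capital / total assets
    re_ta:    retained earnings / total assets
    ebit_ta:  EBIT / total assets
    mve_tl:   market value of equity / book value of total liabilities
    sales_ta: sales / total assets
    """
    return (1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta
            + 0.6 * mve_tl + 1.0 * sales_ta)

# Assumed ratios for a hypothetical firm
z = altman_z(wc_ta=0.15, re_ta=0.20, ebit_ta=0.10, mve_tl=0.90, sales_ta=1.10)

if z < 1.8:
    zone = "distress zone (bankruptcy likely)"
elif z > 3.0:
    zone = "safe zone (bankruptcy unlikely)"
else:
    zone = "gray zone (further investigation needed)"
print(f"Z = {z:.2f} -> {zone}")
```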
Calibration of LDA models
The process of fitting empirical data into a statistical model is called calibration.
This process implies that more work is still needed, even after the scoring function is estimated and Z-scores are obtained, before the model can be used.
- If the model is used simply to accept or reject credit applications, calibration involves adjusting the Z-score cut-off to account for differences between sample and population default rates.
- In the case of the model being used to categorize borrowers into different ratings classes (thereby assigning default probabilities to borrowers), calibration will include a cut-off adjustment and a potential rescaling of Z-score default quantifications.
Because of the relative infrequency of actual defaults, a more accurate model can be derived by creating more balanced samples with relatively equal (in size) groups of performing and defaulting firms. However, the risk of equalizing the sample group sizes is that the model, when applied to a real population, will tend to overpredict defaults. To protect against this risk, the results obtained from the sample must be calibrated. If the model is only used to classify potential borrowers into performing versus defaulting firms, calibration involves only adjusting the Z cut-off using Bayes’ theorem to equate the frequency of defaulting borrowers per the model to the frequency in the actual population.
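One common way to implement this Bayes'-theorem correction is sketched below; it assumes the model outputs a PD estimated on a balanced sample and rescales it using the ratio of population to sample prior default frequencies (all numbers are illustrative):

```python
def calibrate_pd(pd_sample, prior_sample, prior_population):
    """Rescale a model PD estimated on a balanced sample to the population
    default frequency using Bayes' theorem (prior-odds correction)."""
    adj_default = pd_sample * (prior_population / prior_sample)
    adj_perform = (1 - pd_sample) * ((1 - prior_population) / (1 - prior_sample))
    return adj_default / (adj_default + adj_perform)

# Assumed numbers: model trained on a 50/50 balanced sample, while the
# true population default rate is 2%.
print(calibrate_pd(pd_sample=0.40, prior_sample=0.50, prior_population=0.02))
# -> roughly 0.013, i.e., the raw 40% score maps to about a 1.3% population PD
```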
Describe the application of a logistic regression model to estimate default probability
Logistic regression models (also known as LOGIT models), which are from the Generalized Linear Model (GLM) family, are statistical tools that are also used to predict default.
GLMs typically have three common elements:
- A systematic component, which specifies the variables used in a linear predictor function.
- A random component, which identifies both the target variable and its associated probability function.
- A link function, which is a function of the target variable mean that the model ties to the systematic component.
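As an illustration of a LOGIT default model, a minimal scikit-learn sketch; the borrower data and the two financial ratios used as predictors are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical borrower data: two financial ratios and an observed default flag.
X = np.array([[0.10, 2.5], [0.45, 0.8], [0.05, 3.1], [0.60, 0.5],
              [0.30, 1.2], [0.50, 0.7], [0.08, 2.8], [0.40, 1.0]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])   # 1 = default, 0 = performing

# The logit link ties the linear predictor (systematic component) to the
# mean of the Bernoulli target (random component).
model = LogisticRegression().fit(X, y)

new_borrower = np.array([[0.35, 1.1]])
print("Estimated PD:", model.predict_proba(new_borrower)[0, 1])
```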
Define and interpret cluster analysis
Both LDA and LOGIT methodologies are considered “supervised” due to having a defined dependent variable (the default event), while independent variables are applied to determine an ex ante prediction. When the dependent variable is not explicitly defined, the statistical technique is considered “unsupervised.”
Cluster analysis looks to identify groups of similar cases in a data set. Groups represent observation subsets that exhibit homogeneity (i.e., similarities) due to variables’ profiles that allow them to be distinguished from those found in other groups.
Two approaches can be used to implement cluster analysis:
- hierarchical/aggregative clustering and
- divisive/partitioned clustering.
With hierarchical clustering, cluster hierarchies are created and aggregated on a case-by-case basis to form a tree structure, with the clusters shown as leaves and the whole population shown as the root. Clusters are merged together beginning at the leaves, and branches are followed until arriving at the root. The end result of the analysis typically produces three forms:
- A small number of highly homogeneous, large clusters.
- Some small clusters with comprehensible and well-defined specificities.
- Single, very specific, nonaggregated units.
One of the key benefits of this method is the detection of anomalies. Many borrowers, such as merged (or demerged) companies, start-ups, and companies in liquidation, are unique. This analysis facilitates identifying these unique profiles and managing them separately from other observations.
Divisive clustering begins at the root and splits clusters based on algorithms that assign every observation to the specific cluster whose center (the average of all points in the cluster) is nearest. This approach serves to force the population into fewer cluster groups than would be found under aggregative clustering. On the other hand, high computational power is needed because expanding the number of observations has an exponential impact.
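A minimal scikit-learn sketch contrasting the two approaches on hypothetical borrower features; k-means is used here only as a stand-in for the partitioned, nearest-center approach described above:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Hypothetical borrower features, e.g., [leverage, profitability]
X = np.array([[0.20, 0.15], [0.25, 0.12], [0.70, -0.05],
              [0.65, -0.02], [0.30, 0.10], [0.80, -0.10]])

# Hierarchical (aggregative) clustering: builds the tree bottom-up from the leaves.
hier = AgglomerativeClustering(n_clusters=2).fit(X)

# Partitioned clustering: assigns each observation to the nearest cluster center.
part = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("Hierarchical labels:", hier.labels_)
print("Partitioned labels: ", part.labels_)
```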
As an example of applying cluster analysis, we can look to composite measures of profitability such as ROE and ROI. The task is to identify both specific aspects of a firm’s financial profile and latent (hidden) variables underlying the ratio system, such that the basic information from a firm’s financial statements can be extracted and used for modeling without redundant data and information.
Define and interpret principal component analysis
- Principal component analysis involves transforming an original tabular data set into a second, derived tabular data set.
- The performance of a given variable (equal to variance explained divided by total original variance) is referred to as communality, and the higher the communality (the more general the component is), the more relevant its ability to summarize an original set of variables into a new composed variable.
- The starting point is the extraction of the first component that achieves maximum communality. The second extraction will focus on the residuals not explained by the first component. This process will continue until we have a new principal components set, which will be orthogonal (statistically independent) by design and explain original variance in descending order. In terms of a stopping point, potential thresholds include reaching a minimum predefined variance level or a minimum communality that assures a reasonable level of information using the new set of components.
- An eigenvalue is a measure of the communality associated with an extracted component. The ideal first component is one that corresponds to the first eigenvalue of the set of variables; the second component will ideally correspond to the first eigenvalue extracted on the residuals. All original variables, once standardized, contribute a value of one to the total variance.
- An eigenvalue greater (less) than one implies that the component summarizes a share of the total variance that exceeds (falls short of) the information provided by a single original variable. Therefore, it is common that only principal components with eigenvalues greater than one are retained.
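A minimal sketch of the extraction process and the eigenvalue-greater-than-one retention rule, using hypothetical (simulated) financial ratios:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix of financial ratios (rows = firms, columns = ratios).
rng = np.random.default_rng(0)
ratios = rng.normal(size=(200, 5))
ratios[:, 1] = 0.8 * ratios[:, 0] + 0.2 * ratios[:, 1]   # induce some correlation

# Standardize so each original variable contributes a variance of one.
X = StandardScaler().fit_transform(ratios)

pca = PCA().fit(X)
eigenvalues = pca.explained_variance_

# Keep only components whose eigenvalue exceeds one, i.e., components that
# summarize more variance than a single original variable.
keep = eigenvalues > 1
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Components retained:", keep.sum())
```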
Describe factor analysis
Factor analysis is similar to principal component analysis, except that factor analysis is used to describe observed variables in terms of fewer unobserved variables called “factors” and can be seen as more efficient.
Factor analysis is often used as the second stage of principal component analysis. In terms of the process, step one is to standardize principal components. Then, the values of the new variables (factor loadings) should be standardized such that the mean equals zero and the standard deviation is equal to one. Even though factor loadings are not comparable (from a size and range perspective) to original variables, they are comparable to each other.
Factors will be contingent on the criteria used to conduct what is called the “rotation.” The varimax method is a rotation method used to target either small or large loadings of a particular variable associated with each factor. As a result of iteratively rotating factor pairs, the resulting solution yields results that make it feasible to identify each variable tied to a single factor. A final solution is reached once the last round provides no added benefit.
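A minimal sketch of factor analysis with a varimax rotation on hypothetical standardized ratios; this assumes a scikit-learn version that supports the rotation argument on FactorAnalysis:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Hypothetical standardized financial ratios (rows = firms, columns = ratios).
rng = np.random.default_rng(1)
X = StandardScaler().fit_transform(rng.normal(size=(300, 6)))

# Two unobserved factors, varimax-rotated so each variable tends to load
# mainly on a single factor (rotation support requires a recent scikit-learn).
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print("Loadings (variables x factors):")
print(np.round(fa.components_.T, 2))
```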
Canonical correlation method
- The canonical correlation method is a technique used to address the correspondence between a set of independent variables and a set of dependent variables.
- As an example, if an analyst wanted to understand what explains the default rate and its changes over various time horizons, they could look at the relationship between default-rate factors and financial-ratio factors to see what common dimensions exist between the two sets and the degree of variance they share.
- This analysis, which is a type of factor analysis, helps us find linear combinations of the two sets that have a maximum correlation with each other. From this analysis, we can determine how many factors are embedded in the set of dependent variables and what the corresponding factors are out of the independent variables that have maximum correlations with the factors from the dependent variable set. The factors from both sets are independent of one another.
- Although this method is very powerful, the disadvantages are that it is difficult to rigorously calculate scores for factors, and measuring the borrower profiles can only be done by proxy as opposed to measuring them in new independent and dependent factors.
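A minimal sketch of canonical correlation analysis using scikit-learn's CCA on hypothetical data, where X stands in for financial-ratio factors and Y for default-rate measures over different horizons:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)

# Hypothetical data for the same set of segments/periods:
# X = financial-ratio factors, Y = default-rate measures over two horizons.
X = rng.normal(size=(100, 4))
Y = 0.5 * X[:, :2] + rng.normal(scale=0.5, size=(100, 2))

# Find linear combinations (canonical variates) of the two sets that have
# maximum correlation with each other.
cca = CCA(n_components=2).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)

# Canonical correlations between paired variates.
corrs = [np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1] for i in range(2)]
print("Canonical correlations:", np.round(corrs, 2))
```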
Describe the use of a cash flow simulation model in assigning rating and default probability, and explain the limitations of the model
- A cash flow simulation model is most often used to assign ratings to companies that have nonexistent or relatively meaningless track records. Ideally, a cash flow simulation model sits in the middle ground between structural and reduced form models. The simulation is based on forecasting a firm’s pro forma financial reports and studying the volatility of its future performance.
- One of the biggest risks of cash flow simulation models is model risk, which stems from the fact that any model serves as a simplified version of reality. Defining default for the purposes of the model is also challenging, as it cannot always be known if and when a default will actually be filed in real-life circumstances.
- Therefore, the default threshold needs to be set such that it is not too early (the risk of generating too many defaults, resulting in transactions that are deemed risky when they are not truly risky) and not too late (the risk of capturing too few defaults, thereby understating the potential risk).
- Costs must also be taken into account, as models can cost a lot of money to build, maintain, and calibrate.
- Even given these issues, there are few feasible alternatives to a simulation model for a firm whose historical data cannot be observed.
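A stylized sketch of the idea, with all parameters assumed: simulate operating cash flows over a horizon and count the paths on which the cash balance breaches the default threshold:

```python
import numpy as np

rng = np.random.default_rng(3)

n_paths, years = 10_000, 5
cash_start = 20.0            # assumed opening cash balance
debt_service = 12.0          # assumed annual debt service
cf_mean, cf_vol = 15.0, 8.0  # assumed operating cash flow distribution

defaults = 0
for _ in range(n_paths):
    cash = cash_start
    for _ in range(years):
        cash += rng.normal(cf_mean, cf_vol) - debt_service
        if cash < 0:         # default threshold: cash shortfall after debt service
            defaults += 1
            break

print(f"Simulated {years}-year PD: {defaults / n_paths:.2%}")
```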
Heuristic and numerical methods in predicting defaults
In recent years, techniques based on artificial intelligence have also been applied to predicting default. The two primary approaches are:
- Heuristic methods. These methods are designed to mirror human decision-making processes and procedures. Trial and error is used to generate new knowledge rather than statistical modeling. These methods are also known as “expert systems,” with the goal of reproducing high-frequency, standardized decisions at the highest level of quality and at a low cost. The fundamental idea is to learn from both successes and errors.
- Numerical methods. The objective of these methods is to derive optimal solutions using “trained” algorithms and incorporate decisions based on relatively weak information in very complex environments. An example of this is a “neural network,” which is able to continuously update itself in order to incorporate modifications to the environment.
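As an illustration of a numerical method, a minimal sketch using a small neural network (scikit-learn's MLPClassifier as a simple stand-in) on hypothetical borrower data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical borrower features and default flags (same spirit as the
# LOGIT example above, but fit with a small neural network).
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 1.0).astype(int)

# A single hidden layer is enough for this toy example; real networks can be
# retrained as new data arrive to incorporate changes in the environment.
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)
print("Estimated PD for a new borrower:", net.predict_proba(X[:1])[0, 1])
```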