Module 5: Data and AI Model Governance Flashcards
How is data governance defined?
The Data Governance Institute defines data governance as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”
What are the three dimensions data sources are judged against?
Accuracy
Consistency
Integrity (data security)
What is data provenance?
The word “provenance” refers to the sequence of ownership and handling of items of value. Data provenance is therefore the documented record of where data originated and how it has been handled on its way into a model.
The use of alternative data has resulted in scrutiny of whether the vendors of such data and the builders of models based thereon have a legal right to use this data. Copyright and intellectual property lawsuits against vendors of Generative AI models are indicative of this issue.
Because a model built on data the firm had no right to use may have to be destroyed, no model can be approved for use under model risk-management guidelines without proof of ownership of, or a legal right to use, the input data.
What additional scrutiny applies to confidential or personally identifiable data?
Controls must be present, operational, and effective.
What is critical for good governance is to distinguish the classification of the data being used and to ensure that access and use are permissible.
Why is a metadata management strategy important?
A robust metadata management strategy should aim to ensure data is high-quality, consistent, and accurate across various systems. Data documentation, data mapping, data dictionary, data definitions, data process flows, data relationships to other data, and data structures are essential for robust metadata management. The use of a comprehensive metadata-management strategy should enable better-informed business decisions, which is an important objective of any data governance initiative.
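As an illustration only, the sketch below captures one data dictionary entry as structured metadata so that the definition, source system, classification, owner, and downstream uses travel with the data element. The field names and values are hypothetical assumptions, not a standard prescribed by this module.

```python
from dataclasses import dataclass, field

@dataclass
class DataDictionaryEntry:
    """One illustrative metadata record for a governed data element."""
    name: str                    # business name of the data element
    definition: str              # agreed business definition
    source_system: str           # system of record the element comes from
    classification: str          # e.g. "public", "confidential", "PII"
    owner: str                   # accountable data owner
    downstream_uses: list = field(default_factory=list)  # lineage / process flows

# Hypothetical entry for a customer balance field
entry = DataDictionaryEntry(
    name="customer_account_balance",
    definition="End-of-day ledger balance per customer account, in USD.",
    source_system="core_banking",
    classification="confidential",
    owner="Finance Data Office",
    downstream_uses=["credit_risk_model", "regulatory_reporting"],
)
print(entry.classification)  # classification drives access and use decisions
```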
What is a data security strategy for?
A solid data-protection strategy should be in place for safeguarding important information from corruption, malicious or accidental damage, compromise, or loss. The importance of data protection increases with the amount of data created and stored. A data retention policy should also be in place and adhered to.
What is the Gramm Leach Bliley Act (GLBA)?
The Gramm-Leach-Bliley Act (GLBA) requires financial institutions to provide customers with information about the institutions’ privacy practices and about their opt-out rights, and to implement security safeguards for customer information. (United States)
How can current regulations be boiled down to basic principles?
The majority of regulation can be boiled down to three basic principles: obtaining consent, minimizing the amount of data you hold, and ensuring the rights of data subjects.
What is the role of the board of directors in the context of data governance?
A company’s board of directors plays a crucial role in overseeing a firm’s data governance framework, and ensuring that the framework aligns with the organization’s strategic objectives, risk management policies, and compliance requirements.
Among other responsibilities, the board provides approval of the overall data governance framework and policies.
The board is also responsible for oversight of compliance and risk management, ensuring that the organization’s data-governance practices comply with legal, regulatory, and ethical standards. This includes overseeing compliance with data-protection laws (e.g., GDPR), industry standards, and internal policies.
The board also assesses and manages risks related to data breaches, data quality issues, and misuse of data. The board further ensures that the data-governance framework is regularly reviewed and updated to adapt to changing business needs, technologies, and regulatory requirements.
Which three elements are captured by quantitative risk modeling?
Quantity of interest: A numerical object whose value at a specific future point in time (the risk horizon) is uncertain.
Potential future scenarios: These scenarios represent possible values of the quantity of interest. They depict potential future outcomes, such as the value of a portfolio in ten business days conditioned on a specific investment decision. To facilitate quantitative analysis, each potential scenario is assigned a weight (probability), indicating its relative importance compared to other scenarios.
Risk measure: This summarizes the essential information derived from analyzing the potential future scenarios. An example is evaluating the value of a portfolio in ten business days using value at risk (VaR).
Even the most basic statistical risk measures can be useful within the context of QRMs. These statistical measures are often then mapped back to the quantity of interest.
In summary, risk models offer a structured approach to envisioning the future through scenario analysis; a minimal worked example follows.
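Below is a minimal worked sketch of the three elements on synthetic data: the quantity of interest is a portfolio value in ten business days, the scenario set is a weighted sample of possible values, and the risk measure is a 99% value at risk (VaR). The distribution, equal weights, and confidence level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Quantity of interest: portfolio value in ten business days (in millions),
# which is uncertain as of today (the risk horizon is ten business days).
current_value = 100.0

# Potential future scenarios: simulated ten-day returns, each assigned an
# equal probability weight (an assumption; weights need not be equal).
n_scenarios = 10_000
ten_day_returns = rng.normal(loc=0.0, scale=0.04, size=n_scenarios)
scenario_values = current_value * (1.0 + ten_day_returns)
weights = np.full(n_scenarios, 1.0 / n_scenarios)

# Risk measure: 99% VaR of the ten-day loss, i.e. the loss exceeded in only
# 1% of scenarios. With equal weights the plain empirical quantile suffices.
losses = current_value - scenario_values
var_99 = np.quantile(losses, 0.99)
print(f"10-day 99% VaR: {var_99:.2f} million")
```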
How is the effectiveness of quantitative risk modeling affected by practical challenges?
Completeness of scenario sets. It is challenging to anticipate every potential future scenario, especially regarding rare events. Historical data may not fully reflect these events, and capturing them through expert judgment can be difficult.
Feedback effects. The presence of feedback can complicate matters as scenarios and subsequent decisions may influence the behavior of other market participants. As a result, scenario sets and their weights may need ongoing updates.
Communication of results. Reports on QRMs should provide a summary of the main assumptions used, ensuring transparency and avoiding complacency, and should reflect perceived risk based on perceived uncertainty and exposure.
What is the difference between white-box and black-box testing?
White-box testing involves testers having access to internal data structures, algorithms, and the actual code. It may include line-by-line proofreading of the code.
Black-box testing treats the software as a closed box, without any knowledge of its internal implementation (partial knowledge is referred to as grey-box testing).
Employing both approaches can be beneficial, as white-box testing is considered more effective, whereas black-box testing reduces the likelihood of bias.
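As a minimal illustration of the two approaches (the function and test cases below are hypothetical, not drawn from the module), a black-box test checks only inputs against expected outputs, while a white-box test is written with knowledge of the internal branches so that each code path is exercised.

```python
def discount_factor(rate: float, years: float) -> float:
    """Toy function under test: discrete-compounding discount factor."""
    if rate < -1.0:
        raise ValueError("rate below -100% is not meaningful")
    return (1.0 + rate) ** (-years)

def test_black_box():
    # Black-box: only inputs and expected outputs, no knowledge of internals.
    assert abs(discount_factor(0.0, 5.0) - 1.0) < 1e-12
    assert discount_factor(0.05, 1.0) < 1.0

def test_white_box():
    # White-box: written knowing the internal branch that rejects rates < -100%.
    try:
        discount_factor(-1.5, 1.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for rate < -1.0")

test_black_box()
test_white_box()
print("all tests passed")
```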
Should imperfect data be used for model development?
Sometimes, the most valuable test configurations emerge from real-world situations. For instance, if the model implementation has reached a prototypical state where parameters and input data can be fed into it, establishing a preliminary process that automatically generates test results from available data is recommended. The less realistic the parameters and input data (e.g., missing trades, excessive trades, incorrect scales or mappings, extreme parameter values, parameters estimated from insufficient data, data provided by inexperienced users, variations in compilers or hardware), the better. Documenting the experience gained from these tests is invaluable.
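A minimal sketch of this idea follows; the trade fields and the specific perturbations (dropped trades, wrong scale, extreme maturities) are hypothetical assumptions used only to show how deliberately imperfect variants of real input data can be generated and run through a model implementation.

```python
import copy
import random

random.seed(1)

def perturb_inputs(trades: list[dict]) -> list[list[dict]]:
    """Generate deliberately imperfect variants of real input data for testing."""
    variants = []

    # Missing trades: drop a random subset.
    variants.append([t for t in trades if random.random() > 0.3])

    # Incorrect scale: notionals accidentally expressed in thousands.
    rescaled = copy.deepcopy(trades)
    for t in rescaled:
        t["notional"] *= 1_000
    variants.append(rescaled)

    # Extreme parameter values: absurdly long maturities.
    extreme = copy.deepcopy(trades)
    for t in extreme:
        t["maturity_years"] = 500.0
    variants.append(extreme)

    return variants

trades = [{"notional": 1_000_000.0, "maturity_years": 5.0} for _ in range(10)]
for i, variant in enumerate(perturb_inputs(trades)):
    # In practice each variant would be fed to the model implementation and the
    # resulting behaviour (errors, warnings, outputs) documented.
    print(f"variant {i}: {len(variant)} trades, first = {variant[0]}")
```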
What is the use test?
The use test examines the QRM within its context, considering human interaction, actual usage, acceptance of the model, interpretation of results, and the application of those results. The use test is closely intertwined with user testing, but its implications go beyond that. It serves as an ongoing validation tool, which may not be initiated until the QRM has been in use for a considerable period. The use test is qualitative in nature and does not lend itself to a schematic treatment. It represents a validation ideal rather than a specific tool. In essence, the use test evaluates adherence to a foundational principle.
The results of the use test are typically presented to senior management rather than documented as technical reports or spreadsheets.
What is model validation?
Model validation follows a lifecycle starting with the identification of the model. Once a model is identified, it is inventoried and scheduled for an initial baseline validation, which occurs prior to implementation and usage.
Once the model is in production, routine periodic validations occur to ensure that the model continues to perform as expected. These can include annual reviews and more in-depth periodic baseline revalidations.
A change-based validation will be triggered if the model owner makes a material change. Ultimately, the model may be retired, in which case it should be stored in a retired model inventory and then decommissioned.
Backtesting and performance monitoring occur throughout the model’s production usage; a minimal backtesting sketch follows this answer.
The model validation effort culminates in a validation report and a rating of whether the model has passed validation.
Findings or issues that need to be addressed by the model developers and/or owners may also result from the validation.
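As a hedged illustration of the backtesting referred to above, the sketch below counts how often realized losses exceed the model’s 99% VaR forecast and compares the count with the expected breach rate. The synthetic P&L series, the constant VaR forecast, and the simple exceedance count are assumptions of this sketch rather than a prescribed procedure.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic history: 250 daily P&L observations and the 99% VaR forecasts.
daily_pnl = rng.normal(loc=0.0, scale=1.0, size=250)
var_99_forecast = np.full(250, 2.33)  # illustrative constant forecast

# An exceedance occurs when the realized loss exceeds the forecast VaR.
losses = -daily_pnl
exceedances = int(np.sum(losses > var_99_forecast))
expected = 0.01 * len(daily_pnl)

print(f"observed exceedances: {exceedances}, expected: {expected:.1f}")
# Materially more exceedances than expected would be escalated as a
# performance-monitoring finding for the model owner.
```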
What determines the frequency of model validation?
The frequency and intensity of validation activities should be determined based on the risk ranking of the model. For example, a bank’s high-risk models may have a periodic baseline revalidation every two years, whereas lower-risk models may be on a three- or four-year frequency, and so on.
Which three goals does initial validation of a model serve?
Ensuring that the model’s operational feasibility has been checked, and that the model can run as intended without technical malfunctions
Ensuring that the model is properly documented, adheres to firm-wide standards (including but not limited to model development standards, documentation standards, implementation standards, model monitoring standards, and third-party governance standards), and includes executive summaries for essential documents
Ensuring that model users receive sufficient training to interpret and utilize the model’s results effectively
What is the primary objective of ongoing or periodic validation?
The primary objective of ongoing or periodic validation is to observe whether the model remains aligned with its intended purpose, the assumptions remain valid, the data are still appropriate, and performance monitoring indicates that the model continues to perform as expected. In addition, model methodology should be reassessed periodically to ensure that it is still in line with best practices and reflects the real world. The real world consistently challenges model assumptions, requiring ongoing validation to assess whether the original assumptions remain valid. Ongoing validation involves an iterative process between the modeling and validation cycles, adapting to changes in the model and repeating successful validation activities when necessary. Ongoing validation could also be supported using benchmark or challenger models.
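A minimal, hypothetical sketch of the benchmark/challenger idea: the production (champion) model and a simple challenger refit on recent data are scored on the same observations and their errors compared. The data, the assumed champion coefficient, and the choice of mean absolute error are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Recent observations: one feature and the realized outcome.
x = rng.uniform(0.0, 1.0, size=200)
y = 3.0 * x + rng.normal(scale=0.2, size=200)

# Champion: the production model's prediction (coefficient assumed fitted earlier).
champion_pred = 2.9 * x
# Challenger: a simple linear benchmark refit on the recent data.
coeffs = np.polyfit(x, y, deg=1)
challenger_pred = np.polyval(coeffs, x)

champion_mae = np.mean(np.abs(y - champion_pred))
challenger_mae = np.mean(np.abs(y - challenger_pred))
print(f"champion MAE: {champion_mae:.3f}, challenger MAE: {challenger_mae:.3f}")
# A challenger that persistently outperforms the champion would prompt a
# reassessment of the model methodology during periodic validation.
```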
What three general guidelines can support establishing a risk modeling culture?
Awareness: Be aware of the limitations and assumptions of risk modeling. Understand your company’s history with QRMs, the risks they entail, and the validation processes in place. Stay informed about market practices. Recognize that the world is constantly changing.
Transparency: Transparently communicate the assumptions, limitations, and documentation of QRMs. Provide detailed documentation with executive summaries. Document the decision-making process during model development and all validation activities, including unsuccessful attempts. Engage in open communication with end users.
Experience: Learn from past modeling endeavors and apply relevant lessons. Emphasize proper project management and develop prototypes early on. Collect data and continuously improve quantitative skills. Establish and maintain libraries of reusable code. Seek input from other modelers and consider external experts for validation activities.
What are the typical steps in model development and testing?
Definition of objectives and scope.
Data collection and preprocessing.
Exploratory data analysis.
Feature engineering.
Model selection.
Model training / model calibration (a minimal end-to-end sketch follows this list).
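The listed steps can be illustrated end to end with a short, generic sketch; scikit-learn, the synthetic classification data, and the particular estimator are assumptions chosen for illustration, not tooling prescribed by the module.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Objectives and scope (assumed): predict a binary flag from numeric features.
# Data collection and preprocessing: synthetic data stands in for collected data.
X, y = make_classification(n_samples=1_000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Exploratory data analysis would normally happen here (distributions, outliers,
# missing values) before committing to features and a model class.

# Feature engineering and model selection: scaling followed by a chosen estimator.
model = Pipeline([
    ("scale", StandardScaler()),    # feature engineering step
    ("clf", LogisticRegression()),  # selected model
])

# Model training / calibration on the training split.
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")
```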