Module 5: Data and AI Model Governance Flashcards
How is data governance defined?
The Data Governance Institute defines data governance as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”
What are the three dimensions data sources are judged against?
Accuracy
Consistency
Integrity (data security)
What is data provenance?
The word “provenance” refers to the sequence of ownership and handling of items of value. Data provenance is therefore the documented record of where data originated and how it has been handled on its way into a model.
The use of alternative data has resulted in scrutiny of whether the vendors of such data and the builders of models based thereon have a legal right to use this data. Copyright and intellectual property lawsuits against vendors of Generative AI models are indicative of this issue.
Because a model built on data the firm had no right to use may have to be destroyed, no model can be approved for use under model risk-management guidelines without proof of ownership of, or a legal right to use, the input data.
What additional scrutiny applies to confidential or personally identifiable data?
Controls must be present, operational, and effective.
What is critical for good governance is to distinguish the classification of the data being used and to ensure that access and use are permissible.
Why is a metadata management strategy important?
A robust metadata management strategy should aim to ensure data is high-quality, consistent, and accurate across various systems. Data documentation, data mapping, data dictionary, data definitions, data process flows, data relationships to other data, and data structures are essential for robust metadata management. The use of a comprehensive metadata-management strategy should enable better-informed business decisions, which is an important objective of any data governance initiative.
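As an illustration only, the sketch below captures one data dictionary entry as structured metadata so that the definition, source system, classification, owner, and downstream uses travel with the data element. The field names and values are hypothetical assumptions, not a standard prescribed by this module.

```python
from dataclasses import dataclass, field

@dataclass
class DataDictionaryEntry:
    """One illustrative metadata record for a governed data element."""
    name: str                    # business name of the data element
    definition: str              # agreed business definition
    source_system: str           # system of record the element comes from
    classification: str          # e.g. "public", "confidential", "PII"
    owner: str                   # accountable data owner
    downstream_uses: list = field(default_factory=list)  # lineage / process flows

# Hypothetical entry for a customer balance field
entry = DataDictionaryEntry(
    name="customer_account_balance",
    definition="End-of-day ledger balance per customer account, in USD.",
    source_system="core_banking",
    classification="confidential",
    owner="Finance Data Office",
    downstream_uses=["credit_risk_model", "regulatory_reporting"],
)
print(entry.classification)  # classification drives access and use decisions
```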
What is a data security strategy for?
A solid data-protection strategy should be in place for safeguarding important information from corruption, malicious or accidental damage, compromise, or loss. The importance of data protection increases with the amount of data created and stored. A data retention policy should also be in place and adhered to.
What is the Gramm Leach Bliley Act (GLBA)?
The Gramm-Leach-Bliley Act (GLBA) requires financial institutions to provide customers with information about the institutions’ privacy practices and about their opt-out rights, and to implement security safeguards for customer information. (United States)
How can current regulations be boiled down to basic principles?
The majority of regulation can be boiled down to three basic principles: obtaining consent, minimizing the amount of data you hold, and ensuring the rights of data subjects.
What is the role of the board of directors in the context of data governance?
A company’s board of directors plays a crucial role in overseeing a firm’s data governance framework, and ensuring that the framework aligns with the organization’s strategic objectives, risk management policies, and compliance requirements.
Among other responsibilities, the board provides approval of the overall data governance framework and policies.
The board is also responsible for oversight of compliance and risk management, ensuring that the organization’s data-governance practices comply with legal, regulatory, and ethical standards. This includes overseeing compliance with data-protection laws (e.g., GDPR), industry standards, and internal policies.
The board also assesses and manages risks related to data breaches, data quality issues, and misuse of data. The board further ensures that the data-governance framework is regularly reviewed and updated to adapt to changing business needs, technologies, and regulatory requirements.
Which three elements are captured by quantitative risk modeling?
Quantity of interest: A numerical object whose value at a specific future point in time (the risk horizon) is uncertain.
Potential future scenarios: These scenarios represent possible values of the quantity of interest. They depict potential future outcomes, such as the value of a portfolio in ten business days conditioned on a specific investment decision. To facilitate quantitative analysis, each potential scenario is assigned a weight (probability), indicating its relative importance compared to other scenarios.
Risk measure: This summarizes the essential information derived from analyzing the potential future scenarios. An example is evaluating the value of a portfolio in ten business days using value at risk (VaR).
Even the most basic statistical risk measures can be useful within the context of QRMs. These statistical measures are often then mapped back to the quantity of interest.
In summary, risk models offer a structured approach to envisioning the future through scenario analysis; a minimal worked example follows.
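Below is a minimal worked sketch of the three elements on synthetic data: the quantity of interest is a portfolio value in ten business days, the scenario set is a weighted sample of possible values, and the risk measure is a 99% value at risk (VaR). The distribution, equal weights, and confidence level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Quantity of interest: portfolio value in ten business days (in millions),
# which is uncertain as of today (the risk horizon is ten business days).
current_value = 100.0

# Potential future scenarios: simulated ten-day returns, each assigned an
# equal probability weight (an assumption; weights need not be equal).
n_scenarios = 10_000
ten_day_returns = rng.normal(loc=0.0, scale=0.04, size=n_scenarios)
scenario_values = current_value * (1.0 + ten_day_returns)
weights = np.full(n_scenarios, 1.0 / n_scenarios)

# Risk measure: 99% VaR of the ten-day loss, i.e. the loss exceeded in only
# 1% of scenarios. With equal weights the plain empirical quantile suffices.
losses = current_value - scenario_values
var_99 = np.quantile(losses, 0.99)
print(f"10-day 99% VaR: {var_99:.2f} million")
```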
How is the effectiveness of quantitative risk modeling affected by practical challenges?
Completeness of scenario sets. It is challenging to anticipate every potential future scenario, especially regarding rare events. Historical data may not fully reflect these events, and capturing them through expert judgment can be difficult.
Feedback effects. The presence of feedback can complicate matters as scenarios and subsequent decisions may influence the behavior of other market participants. As a result, scenario sets and their weights may need ongoing updates.
Communication of results. Reports on QRMs should provide a summary of the main assumptions used, ensuring transparency and avoiding complacency, and should reflect perceived risk based on perceived uncertainty and exposure.
What is the difference between white-box and black-box testing?
White-box testing involves testers having access to internal data structures, algorithms, and the actual code. It may include line-by-line proofreading of the code.
Black-box testing treats the software as a closed box, without any knowledge of its internal implementation (partial knowledge is referred to as grey-box testing).
Employing both approaches can be beneficial, as white-box testing is considered more effective, whereas black-box testing reduces the likelihood of bias.
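As a minimal illustration of the two approaches (the function and test cases below are hypothetical, not drawn from the module), a black-box test checks only inputs against expected outputs, while a white-box test is written with knowledge of the internal branches so that each code path is exercised.

```python
def discount_factor(rate: float, years: float) -> float:
    """Toy function under test: discrete-compounding discount factor."""
    if rate < -1.0:
        raise ValueError("rate below -100% is not meaningful")
    return (1.0 + rate) ** (-years)

def test_black_box():
    # Black-box: only inputs and expected outputs, no knowledge of internals.
    assert abs(discount_factor(0.0, 5.0) - 1.0) < 1e-12
    assert discount_factor(0.05, 1.0) < 1.0

def test_white_box():
    # White-box: written knowing the internal branch that rejects rates < -100%.
    try:
        discount_factor(-1.5, 1.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for rate < -1.0")

test_black_box()
test_white_box()
print("all tests passed")
```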
Should imperfect data be used for model development?
Sometimes, the most valuable test configurations emerge from real-world situations. For instance, if the model implementation has reached a prototypical state where parameters and input data can be fed into it, establishing a preliminary process that automatically generates test results from available data is recommended. The less realistic the parameters and input data (e.g., missing trades, excessive trades, incorrect scales or mappings, extreme parameter values, parameters estimated from insufficient data, data provided by inexperienced users, variations in compilers or hardware), the better. Documenting the experience gained from these tests is invaluable.
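A minimal sketch of this idea follows; the trade fields and the specific perturbations (dropped trades, wrong scale, extreme maturities) are hypothetical assumptions used only to show how deliberately imperfect variants of real input data can be generated and run through a model implementation.

```python
import copy
import random

random.seed(1)

def perturb_inputs(trades: list[dict]) -> list[list[dict]]:
    """Generate deliberately imperfect variants of real input data for testing."""
    variants = []

    # Missing trades: drop a random subset.
    variants.append([t for t in trades if random.random() > 0.3])

    # Incorrect scale: notionals accidentally expressed in thousands.
    rescaled = copy.deepcopy(trades)
    for t in rescaled:
        t["notional"] *= 1_000
    variants.append(rescaled)

    # Extreme parameter values: absurdly long maturities.
    extreme = copy.deepcopy(trades)
    for t in extreme:
        t["maturity_years"] = 500.0
    variants.append(extreme)

    return variants

trades = [{"notional": 1_000_000.0, "maturity_years": 5.0} for _ in range(10)]
for i, variant in enumerate(perturb_inputs(trades)):
    # In practice each variant would be fed to the model implementation and the
    # resulting behaviour (errors, warnings, outputs) documented.
    print(f"variant {i}: {len(variant)} trades, first = {variant[0]}")
```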
What is the use test?
The use test examines the QRM within its context, considering human interaction, actual usage, acceptance of the model, interpretation of results, and the application of those results. The use test is closely intertwined with user testing, but its implications go beyond that. It serves as an ongoing validation tool, which may not be initiated until the QRM has been in use for a considerable period. The use test is qualitative in nature and does not lend itself to a schematic treatment. It represents a validation ideal rather than a specific tool. In essence, the use test evaluates adherence to a foundational principle.
The results of the use test are typically presented to senior management rather than documented as technical reports or spreadsheets.
What is model validation?
Model validation follows a lifecycle starting with the identification of the model. Once a model is identified, it is inventoried and scheduled for an initial baseline validation, which occurs prior to implementation and usage.
Once the model is in production, routine periodic validations occur to ensure that the model continues to perform as expected. These can include annual reviews and more in-depth periodic baseline revalidations.
A change-based validation will be triggered if the model owner makes a material change. Ultimately, the model may be retired, in which case it should be stored in a retired model inventory and then decommissioned.
Backtesting and performance monitoring occur throughout the model’s production usage; a minimal backtesting sketch follows this answer.
The model validation effort culminates in a validation report and a rating of whether the model has passed validation.
Findings or issues that need to be addressed by the model developers and/or owners may also result from the validation.
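As a hedged illustration of the backtesting referred to above, the sketch below counts how often realized losses exceed the model’s 99% VaR forecast and compares the count with the expected breach rate. The synthetic P&L series, the constant VaR forecast, and the simple exceedance count are assumptions of this sketch rather than a prescribed procedure.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic history: 250 daily P&L observations and the 99% VaR forecasts.
daily_pnl = rng.normal(loc=0.0, scale=1.0, size=250)
var_99_forecast = np.full(250, 2.33)  # illustrative constant forecast

# An exceedance occurs when the realized loss exceeds the forecast VaR.
losses = -daily_pnl
exceedances = int(np.sum(losses > var_99_forecast))
expected = 0.01 * len(daily_pnl)

print(f"observed exceedances: {exceedances}, expected: {expected:.1f}")
# Materially more exceedances than expected would be escalated as a
# performance-monitoring finding for the model owner.
```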
What determines the frequency of model validation?
The frequency and intensity of validation activities should be determined based on the risk ranking of the model. For example, a bank’s high-risk models may have a periodic baseline revalidation every two years, whereas lower-risk models may be on a three- or four-year frequency, and so on.
Which three goals does initial validation of a model serve?
Ensuring that the model’s operational feasibility has been checked, and that the model can run as intended without technical malfunctions
Ensuring that the model is properly documented, adheres to firm-wide standards (including but not limited to model development standards, documentation standards, implementation standards, model monitoring standards, and third-party governance standards), and includes executive summaries for essential documents
Ensuring that model users receive sufficient training to interpret and utilize the model’s results effectively
What is the primary objective of ongoing or periodic validation?
The primary objective of ongoing or periodic validation is to observe whether the model remains aligned with its intended purpose, the assumptions remain valid, the data are still appropriate, and performance monitoring indicates that the model continues to perform as expected. In addition, model methodology should be reassessed periodically to ensure that it is still in line with best practices and reflects the real world. The real world consistently challenges model assumptions, requiring ongoing validation to assess whether the original assumptions remain valid. Ongoing validation involves an iterative process between the modeling and validation cycles, adapting to changes in the model and repeating successful validation activities when necessary. Ongoing validation could also be supported using benchmark or challenger models.
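A minimal, hypothetical sketch of the benchmark/challenger idea: the production (champion) model and a simple challenger refit on recent data are scored on the same observations and their errors compared. The data, the assumed champion coefficient, and the choice of mean absolute error are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Recent observations: one feature and the realized outcome.
x = rng.uniform(0.0, 1.0, size=200)
y = 3.0 * x + rng.normal(scale=0.2, size=200)

# Champion: the production model's prediction (coefficient assumed fitted earlier).
champion_pred = 2.9 * x
# Challenger: a simple linear benchmark refit on the recent data.
coeffs = np.polyfit(x, y, deg=1)
challenger_pred = np.polyval(coeffs, x)

champion_mae = np.mean(np.abs(y - champion_pred))
challenger_mae = np.mean(np.abs(y - challenger_pred))
print(f"champion MAE: {champion_mae:.3f}, challenger MAE: {challenger_mae:.3f}")
# A challenger that persistently outperforms the champion would prompt a
# reassessment of the model methodology during periodic validation.
```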
What three general guidelines can support establishing a risk modeling culture?
Awareness: Be aware of the limitations and assumptions of risk modeling. Understand your company’s history with QRMs, the risks they entail, and the validation processes in place. Stay informed about market practices. Recognize that the world is constantly changing.
Transparency: Transparently communicate the assumptions, limitations, and documentation of QRMs. Provide detailed documentation with executive summaries. Document the decision-making process during model development and all validation activities, including unsuccessful attempts. Engage in open communication with end users.
Experience: Learn from past modeling endeavors and apply relevant lessons. Emphasize proper project management and develop prototypes early on. Collect data and continuously improve quantitative skills. Establish and maintain libraries of reusable code. Seek input from other modelers and consider external experts for validation activities.
What are the typical steps in model development and testing?
Definition of objectives and scope.
Data collection and preprocessing.
Exploratory data analysis.
Feature engineering.
Model selection.
Model training / model calibration (a minimal end-to-end sketch follows this list).
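The listed steps can be illustrated end to end with a short, generic sketch; scikit-learn, the synthetic classification data, and the particular estimator are assumptions chosen for illustration, not tooling prescribed by the module.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Objectives and scope (assumed): predict a binary flag from numeric features.
# Data collection and preprocessing: synthetic data stands in for collected data.
X, y = make_classification(n_samples=1_000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Exploratory data analysis would normally happen here (distributions, outliers,
# missing values) before committing to features and a model class.

# Feature engineering and model selection: scaling followed by a chosen estimator.
model = Pipeline([
    ("scale", StandardScaler()),    # feature engineering step
    ("clf", LogisticRegression()),  # selected model
])

# Model training / calibration on the training split.
model.fit(X_train, y_train)
print(f"holdout accuracy: {model.score(X_test, y_test):.3f}")
```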