Security, compliance, and governance for AI solutions Flashcards
This course helps you understand common issues around security, compliance, and governance associated with artificial intelligence (AI) solutions.
What are the primary goals of security, compliance, and governance?
Security: Ensure that confidentiality, integrity, and availability are maintained for organizational data and information assets and infrastructure. This function is often called information security or cybersecurity in an organization.
Governance: Ensure that an organization can add value and manage risk in the operation of business.
Compliance: Ensure normative adherence to requirements across the functions of an organization.
What are some features of a defense in depth security strategy?
- A defense in depth security strategy uses multiple redundant defenses to protect your AWS accounts, workloads, data, and assets. It helps make sure that if any one security control is compromised or fails, additional layers exist to help isolate threats and prevent, detect, respond, and recover from security events.
- Applying a defense in depth security strategy to generative AI workloads, data, and information can help create the best conditions to achieve business objectives. Defense-in-depth security mitigates many of the common risks that any workload faces by layering controls, helping teams govern generative AI workloads using familiar tools.
- You can use a combination of strategies, including AWS and AWS Partner services and solutions, at each layer to improve the security and resiliency of your generative AI workloads.
What are the 7 layers of defense in depth security and their associated AWS services?
1 - Data Protection:
- Data at rest: Ensure that all data at rest is encrypted using AWS Key Management Service (AWS KMS) with an AWS managed or customer managed key. Make sure all data and models are versioned and backed up using Amazon Simple Storage Service (Amazon S3) versioning.
- Data in transit: Protect all data in transit between services using AWS Certificate Manager (ACM) and AWS Private Certificate Authority (AWS Private CA). Keep data within virtual private clouds (VPCs) using AWS PrivateLink.
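As a sketch of the data-at-rest controls above, the snippet below builds the request payloads for enabling S3 versioning and default SSE-KMS encryption on a bucket. The bucket name and KMS key ARN are hypothetical placeholders; with boto3 installed and credentials configured, you would pass these dicts to the S3 API calls shown in the comments.

```python
import json

# Hypothetical bucket and KMS key ARN -- replace with your own.
BUCKET = "example-ai-training-data"
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"

def versioning_config():
    """Payload for s3.put_bucket_versioning, so data and model artifacts are versioned."""
    return {"Status": "Enabled"}

def default_encryption_config(kms_key_arn):
    """Payload for s3.put_bucket_encryption, enforcing SSE-KMS by default."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": True,  # reduces KMS request costs
            }
        ]
    }

if __name__ == "__main__":
    # With boto3, you would apply these configurations like so:
    #   s3 = boto3.client("s3")
    #   s3.put_bucket_versioning(Bucket=BUCKET,
    #       VersioningConfiguration=versioning_config())
    #   s3.put_bucket_encryption(Bucket=BUCKET,
    #       ServerSideEncryptionConfiguration=default_encryption_config(KMS_KEY_ARN))
    print(json.dumps(default_encryption_config(KMS_KEY_ARN), indent=2))
```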
2 - Identity and Access Management: Identity and access management ensures that only authorized users, applications, or services can access and interact with the cloud infrastructure and its services. AWS offers several services that can be used for identity and access management. The fundamental service is AWS Identity and Access Management (IAM).
3 - Application Protection: This includes measures to protect against various threats, such as unauthorized access, data breaches, denial-of-service (DoS) attacks, and other security vulnerabilities. AWS offers several services to protect applications. These include the following:
- AWS Shield
- Amazon Cognito
- Others
4 - Network and Edge Protection: Security services are used to protect the network infrastructure and the boundaries of a cloud environment. Services are designed to prevent unauthorized access, detect and mitigate threats, and ensure the security of the cloud-based resources. AWS services that provide network and edge protection include the following:
- Amazon Virtual Private Cloud (Amazon VPC)
- AWS WAF
5 - Infrastructure Protection: Protect against various threats, such as unauthorized access, data breaches, system failures, and natural disasters. Ensure the availability, confidentiality, and integrity of the cloud-based resources and data. Some AWS services and features include the following:
- AWS Identity and Access Management (IAM)
- IAM user groups and network access control lists (network ACLs)
6 - Threat Detection and Incident Response: Identify and address potential security threats or incidents. AWS services that help with threat detection include:
- AWS Security Hub
- Amazon GuardDuty
AWS services for incident response include the following:
- AWS Lambda
- Amazon EventBridge
7 - Policies, Procedures, and Awareness: Implement a policy of least privilege, using AWS Identity and Access Management Access Analyzer to look for overly permissive accounts, roles, and resources. Then, restrict access using short-term credentials.
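To make least privilege concrete, here is a minimal sketch of an identity-based IAM policy that grants read-only access to a single bucket. The bucket ARN is hypothetical; the point is to start narrow and widen only when tools like IAM Access Analyzer or access logs show a legitimate need.

```python
import json

def least_privilege_policy(bucket_arn):
    """A minimal identity-based policy: read-only access to one bucket.

    Grant the narrowest scope first, then widen only when a legitimate
    need is demonstrated -- not the other way around.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }

policy = least_privilege_policy("arn:aws:s3:::example-model-artifacts")
print(json.dumps(policy, indent=2))
```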
Give an example of a governance framework:
Establish an AI governance board or committee: This cross-functional team should include representatives from various departments, such as legal, compliance, data privacy, and subject matter experts in AI development.
Define roles and responsibilities: Clearly outline the roles and responsibilities of the governance board, including oversight, policy-making, risk assessment, and decision-making processes.
Implement policies and procedures: Develop comprehensive policies and procedures that address the entire AI lifecycle, from data management to model deployment and monitoring.
What are the advantages of governance and compliance?
- Managing, optimizing, and scaling the organizational AI initiative is at the core of the governance perspective. Incorporating AI governance into an organization’s AI strategy is instrumental in building trust. Governance also helps in enabling the deployment of AI technologies at scale, and overcoming challenges to drive business transformation and growth.
- Governance and compliance are important for AI systems used in business to ensure responsible and trustworthy AI practices. As AI systems become more prevalent in decision-making processes, it is essential to have robust governance frameworks and compliance measures in place to mitigate risks. These risks include bias, privacy violations, and unintended consequences.
- Governance helps organizations establish clear policies, guidelines, and oversight mechanisms to ensure AI systems align with legal and regulatory requirements, in addition to ethical principles and societal values. Therefore, governance protects the organization from potential legal and reputational risks. It also fosters public trust and confidence in the responsible deployment of AI technologies within the business context.
Name the security standards used in AWS to secure AI systems:
_National Institute of Standards and Technology (NIST)_
The NIST 800-53 security controls are commonly used for U.S. federal information systems. Federal information systems typically need to undergo a formal evaluation and approval process to verify they have adequate safeguards in place to protect the confidentiality, integrity, and availability of the information and information systems.
_European Union Agency for Cybersecurity (ENISA)_
European Union Agency for Cybersecurity (ENISA) contributes to the EU’s cyber policy. It boosts trust in digital products, services, and processes by drafting cybersecurity certification schemes. It cooperates with EU countries and bodies and helps prepare for future cyber challenges.
_International Organization for Standardization (ISO)_
ISO/IEC 27001 is a widely adopted security standard that outlines recommended security management practices and comprehensive security controls, based on the guidance provided in the ISO/IEC 27002 best practice document.
_AWS System and Organization Controls (SOC)_
The AWS System and Organization Controls (SOC) Reports are independent assessments conducted by third parties that show how AWS has implemented and maintained key compliance controls and objectives.
_Health Insurance Portability and Accountability Act (HIPAA)_
AWS empowers covered entities and their business associates under the U.S. HIPAA regulations to use the secure AWS environment for processing, maintaining, and storing protected health information.
_General Data Protection Regulation (GDPR)_
The European Union’s GDPR safeguards the fundamental right of EU citizens to privacy and the protection of their personal information. The GDPR establishes stringent requirements that raise and unify the standards for data protection, security, and compliance across the EU.
_Payment Card Industry Data Security Standard (PCI DSS)_
The PCI DSS is a private information security standard that is managed by the PCI Security Standards Council. This council was established by a group of major credit card companies, including American Express, Discover Financial Services, JCB International, Mastercard, and Visa.
How does AI compliance differ from regular software development compliance?
_Complexity and opacity_
AI systems, especially large language models (LLMs) and generative AI, can be highly complex with opaque decision-making processes. This makes it challenging to audit and understand how they arrive at outputs, which is crucial for compliance.
_Dynamism and adaptability_
AI systems are often dynamic and can adapt and change over time, even after deployment. This makes it difficult to apply static standards, frameworks, and mandates.
_Emergent capabilities_
Emergent capabilities in AI systems refer to unexpected or unintended capabilities that arise from complex interactions within the AI system, in contrast to capabilities that are explicitly programmed or designed. As AI systems become more advanced, they might exhibit emergent capabilities that were not anticipated during the regulatory process. This requires ongoing monitoring and adaptability.
_Unique risks_
AI poses novel risks, such as algorithmic bias, privacy violations, misinformation, and AI-powered automation displacing human workers. Traditional requirements might not adequately address these. Algorithmic bias refers to the systematic errors or unfair prejudices that can be introduced into the outputs of AI and machine learning (ML) algorithms. The following are some examples:
- Biased training data: If the data used to train the AI model is not representative or contains historical biases, the model can learn and perpetuate those biases in its outputs.
- Human bias: The biases and assumptions of the human developers and researchers who create the AI systems can also get reflected in the final outputs.
_Algorithm accountability_
Algorithm accountability refers to the idea that algorithms, especially those used in AI systems, should be transparent, explainable, and subject to oversight and accountability measures. These safeguards are important because algorithms can have significant impacts on individuals and society. They can potentially perpetuate biases or make decisions that violate human rights or the principles of responsible AI. Examples of algorithm accountability laws include the European Union’s proposed Artificial Intelligence Act, which includes provisions for transparency, risk assessment, and human oversight of AI systems. In the United States, several states and cities have also passed laws related to algorithm accountability, such as New York City’s Automated Decision Systems Law.
The goal of these laws is to ensure that AI systems are designed and used in a way that respects human rights, promotes fairness and non-discrimination, and upholds the principles of responsible AI.
What is a regulated workload?
Regulated is a common term used to indicate that a workload might need special consideration because of some form of compliance that must be achieved.
This term often refers to customers who work in industries with high degrees of regulatory compliance requirements or high industrial demands.
Some example industries are as follows:
- Financial services
- Healthcare
- Aerospace
How do I know if I am using regulated workloads?
Questions to ask:
- Do you need to audit this workload?
- Do you need to archive this data for a period of time?
- Will the predictions created by your model constitute a record or other special data item?
- Do any of the systems you get the data from contain data classifications that are restricted by your organization's governance, but not a regulatory framework? For example, customer addresses.
Name and describe the AWS services available for governance and compliance:
_AWS Config_
AWS Config provides a detailed view of the configuration of AWS resources in your AWS account. This includes how the resources are related to one another and how they were configured in the past so that you can see how the configurations and relationships change over time.
When you run your applications on AWS, you usually use AWS resources, which you must create and manage collectively. As the demand for your application keeps growing, so does your need to keep track of your AWS resources. The following are some scenarios where AWS Config can help you oversee your application resources:
- Resource administration: Exercise governance over your resource configurations and detect resource misconfigurations.
- Auditing and compliance: You might work with data that requires frequent audits to ensure compliance with internal policies and best practices. AWS Config helps you demonstrate compliance by providing access to the historical configurations of your resources.
- Managing and troubleshooting configuration changes: You can view how any resource you intend to modify is related to other resources and assess the impact of your change.
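The idea of tracking how configurations change over time can be sketched as a diff between two resource snapshots. The security-group snapshots below are hypothetical and much simpler than real AWS Config configuration items, but they show the kind of change the service surfaces.

```python
def config_diff(before, after):
    """Return the keys whose values changed between two configuration
    snapshots -- similar in spirit to AWS Config's configuration history."""
    changed = {}
    for key in set(before) | set(after):
        if before.get(key) != after.get(key):
            changed[key] = {"before": before.get(key), "after": after.get(key)}
    return changed

# Hypothetical security-group snapshots captured at two points in time.
snapshot_t0 = {"GroupId": "sg-0123", "IngressPorts": [443], "Description": "api"}
snapshot_t1 = {"GroupId": "sg-0123", "IngressPorts": [443, 22], "Description": "api"}

# The newly opened port 22 surfaces as a change worth reviewing.
print(config_diff(snapshot_t0, snapshot_t1))
```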
_Amazon Inspector_
Amazon Inspector is a vulnerability management service that continuously scans your AWS workloads for software vulnerabilities and unintended network exposure.
Amazon Inspector automatically discovers and scans running AWS resources for known software vulnerabilities and unintended network exposure. Some of these resources include Amazon Elastic Compute Cloud (Amazon EC2) instances, container images, and Lambda functions. Amazon Inspector creates a finding when it discovers a software vulnerability or network configuration issue.
Some examples of these vulnerabilities include the following:
- Package vulnerability: There are software packages in your AWS environment that are exposed to common vulnerabilities and exposures (CVEs).
- Code vulnerability: There are lines in your code that attackers could exploit. Other vulnerabilities include data leaks, weak cryptography, and missing encryption.
- Network reachability: There are open network paths to Amazon EC2 instances in your environment.
Some features of Amazon Inspector include the following:
- Continuously scan your environment for vulnerabilities and network exposure.
- Assess vulnerabilities accurately and provide a risk score.
- Identify high-impact findings.
- Monitor and process findings with other services and systems.
Note: The risk score is based on security metrics from the National Vulnerability Database (NVD) and is adjusted according to your compute environment.
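The "identify high-impact findings" feature can be illustrated with a simple triage filter. The finding records below are hypothetical stand-ins (real Amazon Inspector findings carry many more fields), but the shape -- a type, a severity, and a numeric risk score -- matches the concepts above.

```python
# Hypothetical findings with a type, a severity, and an environment-adjusted score.
findings = [
    {"type": "PACKAGE_VULNERABILITY", "severity": "CRITICAL", "score": 9.8},
    {"type": "NETWORK_REACHABILITY", "severity": "MEDIUM", "score": 5.3},
    {"type": "CODE_VULNERABILITY", "severity": "HIGH", "score": 7.5},
]

def high_impact(findings, threshold=7.0):
    """Surface the findings to remediate first, highest risk score first."""
    hits = [f for f in findings if f["score"] >= threshold]
    return sorted(hits, key=lambda f: f["score"], reverse=True)

for f in high_impact(findings):
    print(f["severity"], f["type"], f["score"])
```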
_AWS Audit Manager_
AWS Audit Manager helps you continually audit your AWS usage to streamline how you manage risk and compliance with regulations and industry standards.
Audit Manager automates evidence collection so you can conveniently assess whether your policies, procedures, and activities (also known as controls) are operating effectively. When it’s time for an audit, Audit Manager helps you manage stakeholder reviews of your controls.
Some tasks you can perform with Audit Manager include the following:
- Upload and manage evidence from hybrid or multi-cloud environments.
- Support common compliance standards and regulations.
- Monitor your active assessments.
- Search for evidence.
- Ensure evidence integrity.
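One common way to support evidence integrity (independent of any particular service) is a cryptographic fingerprint: record a SHA-256 digest when evidence is collected, and recompute it at audit time to show the artifact was not altered in between. A minimal sketch:

```python
import hashlib

def evidence_fingerprint(evidence: bytes) -> str:
    """SHA-256 digest of an evidence artifact. Matching digests at collection
    time and audit time demonstrate the artifact is unchanged."""
    return hashlib.sha256(evidence).hexdigest()

report = b"Access review completed 2024-05-01; no violations found."
digest = evidence_fingerprint(report)
print(digest)

# Any modification, however small, yields a different digest.
assert evidence_fingerprint(report + b" ") != digest
```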
_AWS Artifact_
AWS Artifact provides on-demand downloads of AWS security and compliance documents, such as AWS ISO certifications, PCI reports, and SOC Reports.
You can submit the security and compliance documents to your auditors or regulators to demonstrate the security and compliance of your AWS infrastructure.
AWS customers are responsible for developing or obtaining documents that demonstrate the security and compliance of their companies. You will learn more about the responsibilities of customers in a later lesson about the Shared Responsibility Model.
_AWS CloudTrail_
AWS CloudTrail helps you perform operational and risk auditing, governance, and compliance of your AWS account. Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail. Events include actions taken in the AWS Management Console, AWS Command Line Interface (AWS CLI), and AWS SDKs and APIs.
Visibility into your AWS account activity is a key aspect of security and operational best practices. You can use CloudTrail to view, search, download, archive, analyze, and respond to account activity across your AWS infrastructure. You can identify who took which action, which resources were acted upon, and when the event occurred. These and other details can help you analyze and respond to activity in your AWS account.
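The "who took which action, on what, and when" questions map directly onto fields of a CloudTrail event record. The record below is heavily simplified (real events contain many more fields), but the field names shown are the ones CloudTrail uses.

```python
# A simplified CloudTrail event record (real records carry many more fields).
event = {
    "eventTime": "2024-05-01T12:34:56Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "PutObject",
    "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::111122223333:user/alice"},
    "requestParameters": {"bucketName": "example-training-data"},
}

def who_did_what(event):
    """Answer the core audit questions: who took which action, and when."""
    return {
        "who": event["userIdentity"]["arn"],
        "action": f'{event["eventSource"]}:{event["eventName"]}',
        "when": event["eventTime"],
    }

print(who_did_what(event))
```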
_AWS Trusted Advisor_
AWS Trusted Advisor helps you optimize costs, increase performance, improve security and resilience, and operate at scale in the cloud.
Trusted Advisor continuously evaluates your AWS environment using best practice checks across the categories of cost optimization, performance, resilience, security, operational excellence, and service limits. It then recommends actions to remediate any deviations from best practices.
Use cases for Trusted Advisor include:
- Optimizing cost and efficiency
- Assessing your AWS environment against security standards and best practices
- Improving performance
- Improving resilience
What are the data governance strategies for AI and generative AI workloads?
_Data quality and integrity_
To ensure the quality and integrity of your data, follow these steps:
- Establish data quality standards and processes to ensure the accuracy, completeness, and consistency of data used for AI and generative AI models.
- Implement data validation and cleansing techniques to identify and address data anomalies and inconsistencies.
- Maintain data lineage and provenance to understand the origin, transformation, and usage of data.
- Data lineage and provenance are concepts that describe the origins, history, and transformations of data as it flows through an organization.
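A minimal sketch of lineage tracking: record each transformation step -- operation, source, output, timestamp -- so you can later answer "where did this training set come from?" The dataset names and S3 path are hypothetical.

```python
from datetime import datetime, timezone

def record_step(lineage, operation, source, output):
    """Append one transformation step to a dataset's lineage trail."""
    lineage.append({
        "operation": operation,
        "source": source,
        "output": output,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return lineage

lineage = []
record_step(lineage, "ingest", "s3://example-raw/events.csv", "raw_events")
record_step(lineage, "deduplicate", "raw_events", "clean_events")
record_step(lineage, "train_test_split", "clean_events", "train_v1")

# The trail traces the training set back to its origin.
for step in lineage:
    print(step["operation"], "->", step["output"])
```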
_Data protection and privacy_
To ensure data protection and privacy, implement the following steps:
- Develop and enforce data privacy policies that protect sensitive or personal information.
- Implement access controls, encryption, and other security measures to safeguard data from unauthorized access or misuse.
- Establish data breach response and incident management procedures to mitigate the impact of any data security incidents.
_Data lifecycle management_
Some steps for data lifecycle management include the following:
- Classify and catalog data assets based on their sensitivity, value, and criticality to the organization.
- Implement data retention and disposition policies to ensure the appropriate storage, archiving, and deletion of data.
- Develop data backup and recovery strategies to ensure business continuity and data resilience.
_Responsible AI_
Some steps to ensure responsible AI include the following:
- Establish responsible frameworks and guidelines for the development and deployment of AI and generative AI models, addressing issues like bias, fairness, transparency, and accountability.
- Implement processes to monitor and audit AI and generative AI models for potential biases, fairness issues, and unintended consequences.
- Educate and train AI development teams on responsible AI practices.
_Governance structures and roles_
Follow these steps to establish governance structures and roles:
- Establish a data governance council or committee to oversee the development and implementation of data governance policies and practices.
- Define clear roles and responsibilities for data stewards, data owners, and data custodians to ensure accountable data management.
- Provide training and support to artificial intelligence and machine learning (AI/ML) practitioners and data users on data governance best practices.
_Data sharing and collaboration_
You can manage data sharing and collaboration as follows:
- Develop data sharing agreements and protocols to facilitate the secure and controlled exchange of data across organizational boundaries.
- Implement data virtualization or federation techniques to enable access to distributed data sources without compromising data ownership or control.
- Foster a culture of data-driven decision-making and collaborative data governance across the organization.
Name the concepts that are important considerations for the successful management and deployment of AI workloads:
_Data lifecycles_
Data lifecycles refer to the management of data throughout its entire lifespan, from creation to eventual disposal or archiving. In the context of AI workloads, the data lifecycle encompasses the following stages in the lifecycle of data used to train and deploy AI models:
- Collection
- Processing
- Storage
- Consumption
- Disposal or archiving
_Data logging_
Data logging involves the systematic recording of data related to the processing of an AI workload. This can include the following:
- Tracking inputs
- Tracking outputs
- Model performance metrics
- System events
Effective data logging is necessary for debugging, monitoring, and understanding the behavior of AI systems.
_Data residency_
Data residency refers to the physical location where data is stored and processed. In the context of AI workloads, data residency considerations might include the following:
- Compliance with data privacy regulations
- Data sovereignty requirements
- Proximity of data to the compute resources used for training and inference
_Data monitoring_
Data monitoring involves the ongoing observation and analysis of data used in AI workloads. This can include the following:
- Monitoring data quality
- Identifying anomalies (An anomaly is an unexpected data point that significantly deviates from the norm.)
- Tracking data drift (Data drift is observed when the distribution of the input data changes over time.)
Monitoring also helps to ensure that the data being used for training and inference remains relevant and representative.
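Two of the checks above -- anomaly detection and data drift -- can be sketched with simple statistics: flag points that sit far from the mean, and compare a feature's live mean against its training-time baseline. Real monitoring systems use more robust tests; the values here are illustrative.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=2.0):
    """Flag points that deviate from the mean by more than `threshold` std devs."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

def mean_shift(baseline, current):
    """Crude drift signal: relative change in a feature's mean between
    the training-time baseline and live traffic."""
    base = mean(baseline)
    return abs(mean(current) - base) / abs(base)

baseline = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]   # feature values at training time
live = [14.0, 15.5, 13.8, 14.9, 15.1, 14.4]     # the same feature in production

print("anomalies:", zscore_anomalies(baseline + [100.0]))  # the outlier is flagged
print("drift:", round(mean_shift(baseline, live), 2))      # ~44% shift in the mean
```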
_Data analysis_
Data analysis methods are used to understand the characteristics, patterns, and relationships within the data used for AI workloads. These methods help to gain insights into the data. They include the following:
- Statistical analysis
- Data visualization
- Exploratory data analysis (EDA): EDA is the task of discovering patterns, understanding relationships, validating assumptions, and identifying anomalies in data.
_Data retention_
Data retention policies define how long data should be kept for AI workloads. This can be influenced by factors such as the following:
- Regulatory requirements
- Maintaining historical data for model retraining
- Cost of data storage
Effective data retention strategies can help organizations manage the lifecycle of data used in their AI systems.
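A retention policy can be sketched as a lookup from data classification to a retention period, plus a rule that decides whether an asset should be retained or disposed of. The classifications and periods below are hypothetical; real schedules come from regulatory and business requirements.

```python
from datetime import date

# Hypothetical retention schedule, in days, by data classification.
RETENTION_DAYS = {"regulated": 7 * 365, "internal": 3 * 365, "transient": 90}

def retention_action(classification, created, today):
    """Decide whether a data asset should be retained or disposed of."""
    limit = RETENTION_DAYS[classification]
    age = (today - created).days
    return "retain" if age <= limit else "dispose"

today = date(2024, 6, 1)
print(retention_action("transient", date(2024, 1, 1), today))  # past its 90 days
print(retention_action("regulated", date(2020, 1, 1), today))  # within 7 years
```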
Name important governance strategies for AI:
_Policies_
Develop clear and comprehensive policies that outline the organization’s approach to generative AI, including principles, guidelines, and responsible AI considerations. Here are some common characteristics of policies:
- Policies should address areas such as data management, model training, output validation, safety, and human oversight.
- Policies should also cover aspects like intellectual property, bias mitigation, and privacy protection.
- Ensure these policies are regularly reviewed and updated to keep pace with evolving technology and regulatory requirements.
_Review cadence_
Implement a regular review process to assess the performance, safety, and responsible AI implications of the generative AI solutions. Here are some common tasks to include in the review process:
- The review process could involve a combination of technical, legal, and responsible AI reviews at different stages of the development and deployment lifecycle.
- Establish a clear timeline for these reviews, such as monthly, quarterly, or bi-annually, depending on the complexity and risk profile of the solutions.
- Ensure that the review process includes diverse perspectives from stakeholders, including subject matter experts, legal and compliance teams, and end-users.
_Review strategies_
Develop comprehensive review strategies that cover both technical and non-technical aspects of the generative AI solutions. Here is some suggested guidance for a review strategy:
- Technical reviews should focus on model performance, data quality, and the robustness of the underlying algorithms.
- Non-technical reviews should assess the solutions’ alignment with organizational policies, responsible AI principles, and regulatory requirements.
- Incorporate testing and validation procedures to validate the outputs of the generative AI solutions before deployment.
- Establish clear decision-making frameworks to determine when and how to intervene or modify the solutions based on the review findings.
_Transparency standards_
Commit to maintaining high standards of transparency in the development and deployment of generative AI solutions by ensuring the following:
- Publish information about the AI models, their training data, and the key decisions made during the development process.
- Provide clear and accessible documentation on the capabilities, limitations, and intended use cases of the generative AI solutions.
- Establish channels for stakeholders, including end-users, to provide feedback and raise concerns about the solutions.
_Team training requirements_
Ensure that all team members involved in the development and deployment of generative AI solutions are adequately trained on relevant policies, guidelines, and best practices. Some suggestions for team training include the following:
- Provide comprehensive training on bias mitigation and responsible AI practices.
- Encourage cross-functional collaboration and knowledge-sharing to foster a culture of responsible AI development.
- Consider implementing ongoing training and certification programs to keep team members up to date with the latest advancements and regulatory changes.
Name some key aspects to consider when monitoring an AI system.
_Performance metrics_
Monitor the performance of the AI system by tracking metrics, such as the following:
- Model accuracy: The proportion of correct predictions made by the model
- Precision: The ratio of true positive predictions to the total number of positive predictions made by the model
- Recall: The ratio of true positive predictions to the total number of actual positive instances in the data
- F1-score: The harmonic mean of precision and recall, which provides a balanced measure of model performance
- Latency: The time taken by the model to make a prediction, which is an important measure of a model’s practical performance
These metrics can help you assess the effectiveness of the AI model and identify areas for improvement.
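The four classification metrics above can all be computed from the confusion-matrix counts (true/false positives and negatives). A small worked example with binary labels:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # actual labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # model predictions
m = classification_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in m.items()})
```

Here the model makes one false positive and one false negative, so all four metrics come out to 0.75.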
_Infrastructure monitoring_
Monitor the underlying infrastructure that supports the AI system, including the following:
- Compute resources (for example, CPU, memory, GPU)
- Network performance
- Storage
- System logs
This can help you identify resource bottlenecks, capacity planning issues, and potential system failures.
_Monitoring for bias and fairness_
Regularly assess the AI system for potential biases and unfair outcomes, especially in sensitive domains such as healthcare, finance, and HR. This can help ensure the AI system is making fair and unbiased decisions.
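One widely used fairness check is demographic parity: compare the rate of favorable outcomes across groups and flag large gaps for investigation. The loan-approval data below is hypothetical, and a real assessment would use additional fairness metrics alongside this one.

```python
def positive_rate(outcomes):
    """Share of favorable (positive) decisions in a group."""
    return sum(outcomes) / len(outcomes)

def demographic_parity_gap(group_a, group_b):
    """Absolute difference in favorable-outcome rates between two groups.
    A large gap is a signal to investigate the model for bias."""
    return abs(positive_rate(group_a) - positive_rate(group_b))

# Hypothetical loan-approval decisions (1 = approved) for two demographic groups.
group_a = [1, 1, 0, 1, 1, 0, 1, 1]   # 75% approved
group_b = [1, 0, 0, 1, 0, 0, 1, 0]   # 37.5% approved

gap = demographic_parity_gap(group_a, group_b)
print(round(gap, 3))  # 0.375 -- a gap this large warrants investigation
```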
_Monitoring for compliance and responsible AI_
Ensure the AI system’s operations and outputs adhere to relevant regulations, industry standards, and responsible guidelines. Monitor for any potential violations or issues that could raise compliance or responsible AI concerns.