Monitoring ML solutions Flashcards
Why is privacy a critical consideration in AI, and how does it relate to Google’s AI principles?
Privacy is integral to ethical AI design because it:
Adheres to legal and regulatory standards.
Aligns with social norms and individual expectations.
Safeguards sensitive information.
Privacy is a cornerstone of Google’s fifth AI principle: Incorporate privacy design principles, ensuring AI systems respect user data.
What are sensitive attributes, and how do they impact AI system design?
Sensitive attributes include personally identifiable information (PII) and other critical data, such as:
PII: Names, addresses, SSNs.
Social Data: Ethnicity, religion.
Health Data: Diagnoses, genetic information.
Financial Data: Credit card details, income.
Biometric Data: Fingerprints, facial recognition.
AI systems must handle sensitive data with heightened security and legal compliance, as misuse can result in privacy violations and user mistrust.
What are common de-identification techniques in AI, and their benefits and drawbacks?
Redaction: Deletes sensitive values entirely; irreversible, and may reduce model utility.
Replacement: Substitutes sensitive values with surrogates; irreversible, can impact learning.
Masking: Hides part of a value (e.g., all but the last digits); retains the format but not the original value.
Tokenization: Maps values to unique tokens; reversible via the token mapping, so it remains vulnerable to attack.
Bucketing: Groups numeric values into ranges; reduces granularity.
Shifting: Shifts timestamps by a random offset; preserves sequence but can be reversed if the offset is learned.
Each technique balances privacy and utility based on context.
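A minimal Python sketch of three of these techniques (masking, tokenization, and bucketing); the helper names and formats are illustrative, not any particular library's API:

```python
import hashlib

def mask_email(email, visible=2):
    # Masking: hide part of the value but keep its structure.
    local, _, domain = email.partition("@")
    return local[:visible] + "*" * (len(local) - visible) + "@" + domain

_token_vault = {}  # tokenization is reversible only through this mapping

def tokenize(value, salt="demo-salt"):
    # Tokenization: map the value to a stable surrogate token.
    token = "tok_" + hashlib.sha256((salt + value).encode()).hexdigest()[:12]
    _token_vault[token] = value
    return token

def bucket_age(age, width=10):
    # Bucketing: replace a number with a coarse range.
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

print(mask_email("jane.doe@example.com"))   # ja******@example.com
print(tokenize("4111 1111 1111 1111"))      # tok_<hash prefix>
print(bucket_age(37))                       # 30-39
```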
Explain k-anonymity and l-diversity. How do they enhance privacy?
k-Anonymity: Ensures each record is indistinguishable from at least k-1 others, reducing re-identification risks.
l-Diversity: Ensures that each anonymized group contains at least l distinct values for the sensitive attribute, addressing the homogeneity that k-anonymity alone permits.
These methods collectively enhance privacy while maintaining data utility.
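A toy sketch that computes k and l for a small set of records grouped by quasi-identifiers; the record fields and values are made up for illustration:

```python
from collections import defaultdict

def k_and_l(records, quasi_ids, sensitive):
    # Group records by their quasi-identifier combination.
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[q] for q in quasi_ids)
        groups[key].append(r[sensitive])
    # k = size of the smallest group; l = fewest distinct sensitive values in any group.
    k = min(len(v) for v in groups.values())
    l = min(len(set(v)) for v in groups.values())
    return k, l

records = [
    {"age_range": "30-39", "zip3": "941", "diagnosis": "flu"},
    {"age_range": "30-39", "zip3": "941", "diagnosis": "asthma"},
    {"age_range": "40-49", "zip3": "941", "diagnosis": "flu"},
    {"age_range": "40-49", "zip3": "941", "diagnosis": "flu"},
]
print(k_and_l(records, ["age_range", "zip3"], "diagnosis"))  # (2, 1): 2-anonymous but only 1-diverse
```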
How does differential privacy protect individual data during analysis?
Differential privacy ensures that the inclusion or exclusion of any individual’s data minimally affects the analysis outcome by:
Adding calibrated noise.
Preventing sensitive attribute identification.
Providing strong, mathematically proven privacy guarantees governed by the privacy budget epsilon.
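A minimal sketch of the Laplace mechanism applied to a counting query, which has sensitivity 1; the function name and data are illustrative:

```python
import numpy as np

def dp_count(values, predicate, epsilon):
    # A counting query has sensitivity 1: adding or removing one person
    # changes the count by at most 1, so Laplace noise with scale 1/epsilon suffices.
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 38, 44]
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))  # noisy count of people aged 40+
```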
What are the trade-offs involved in setting epsilon for differential privacy?
Lower Epsilon: Stronger privacy, but the larger noise degrades data utility.
Higher Epsilon: Weaker privacy, but less noise and better model accuracy.
Selecting epsilon involves balancing privacy with analytical and model performance.
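A quick illustration of that trade-off: for a sensitivity-1 query the Laplace noise scale is 1/epsilon, so smaller epsilon means larger typical error:

```python
import numpy as np

rng = np.random.default_rng(0)
for epsilon in [0.1, 1.0, 10.0]:
    # Laplace noise for a sensitivity-1 query has scale 1/epsilon.
    draws = rng.laplace(scale=1.0 / epsilon, size=100_000)
    print(f"epsilon={epsilon:>4}: typical absolute error ~ {np.abs(draws).mean():.2f}")
# Small epsilon -> large noise (strong privacy, low utility); large epsilon -> the reverse.
```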
What is DP-SGD, and how does it enhance model training security?
Differentially Private Stochastic Gradient Descent (DP-SGD) integrates differential privacy into SGD by:
Gradient Clipping: Limits the influence of any individual sample on each update.
Noise Addition: Adds calibrated noise to the clipped gradients before each update.
DP-SGD can be implemented with libraries such as TensorFlow Privacy.
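A hedged sketch using TensorFlow Privacy's DPKerasSGDOptimizer; exact import paths and argument names can vary across tensorflow-privacy versions, and the model and hyperparameters are placeholders:

```python
import tensorflow as tf
import tensorflow_privacy

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2),
])

optimizer = tensorflow_privacy.DPKerasSGDOptimizer(
    l2_norm_clip=1.0,        # gradient clipping: bound each example's influence
    noise_multiplier=1.1,    # noise addition: Gaussian noise relative to the clip norm
    num_microbatches=32,     # batch size must be divisible by this
    learning_rate=0.05,
)

# Per-example losses are required so each example's gradient can be clipped.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, epochs=5)
```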
Describe federated learning and its advantages for privacy.
Federated learning trains models locally on user devices, sharing only gradients with central servers:
Preserves data privacy by avoiding raw data transfer.
Supports personalization, e.g., Gboard predictions.
Updates central models without exposing sensitive user inputs.
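A toy federated averaging (FedAvg-style) round in NumPy; the "local training" step is a stand-in for real on-device SGD, and all names are illustrative:

```python
import numpy as np

def local_update(global_weights, client_data, lr=0.1):
    # Toy client step: move weights toward the client's data mean (stand-in for local SGD).
    grad = global_weights - client_data.mean(axis=0)
    return global_weights - lr * grad

def federated_averaging(global_weights, client_datasets, rounds=5):
    for _ in range(rounds):
        # Each client trains locally; only the resulting updates leave the device.
        client_weights = [local_update(global_weights, d) for d in client_datasets]
        # The server averages the updates, weighted by client dataset size.
        sizes = np.array([len(d) for d in client_datasets], dtype=float)
        global_weights = np.average(client_weights, axis=0, weights=sizes)
    return global_weights

clients = [np.random.randn(n, 3) + i for i, n in enumerate([50, 80, 30])]
print(federated_averaging(np.zeros(3), clients))
```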
What are key privacy challenges in federated learning?
Membership Inference Attacks: Revealing whether specific data points were used in training.
Sensitive Property Breaches: Exposing private attributes.
Model Poisoning: Malicious users manipulate training data to degrade models.
How does secure aggregation enhance privacy in federated learning?
Secure aggregation encrypts user gradients before they are shared with the central server:
Ensures gradients are only decrypted after aggregation.
Protects individual data contributions.
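A toy illustration of the masking idea behind secure aggregation: clients add pairwise masks that cancel only when every masked update is summed (real protocols use cryptographic key agreement, not a shared seed):

```python
import random

def pairwise_masks(num_clients, dim, seed=42):
    # Each pair (i, j) agrees on a random mask; i adds it and j subtracts it,
    # so all masks cancel once every client's masked update is summed.
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_clients)]
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = [rng.uniform(-1, 1) for _ in range(dim)]
            masks[i] = [m + s for m, s in zip(masks[i], shared)]
            masks[j] = [m - s for m, s in zip(masks[j], shared)]
    return masks

updates = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.0]]      # true client gradients
masks = pairwise_masks(num_clients=3, dim=2)
masked = [[u + m for u, m in zip(upd, msk)] for upd, msk in zip(updates, masks)]

# The server only sees masked updates; their sum equals the sum of the true updates.
summed = [sum(col) for col in zip(*masked)]
print([round(v, 6) for v in summed])   # ~[0.3, 0.2]
```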
How does Google Cloud prevent training data extraction attacks in generative AI?
Google Cloud:
Excludes customer data from training foundation models.
Encrypts data at rest and in transit.
Ensures generated content cannot reveal specific training data.
What are the risks of training data extraction attacks, and how do they occur?
Risks:
Revealing sensitive information (e.g., addresses).
Violating user privacy.
These occur through iterative prompt crafting to extract memorized training examples from generative models.
How does Google ensure privacy compliance in its AI/ML systems?
Privacy by Default: No customer data in foundation models.
Encryption: TLS for data in transit; Customer-Managed Encryption Keys (CMEK) for data at rest.
Access Control: IAM for minimal privilege.
How does the Cloud Data Loss Prevention API support sensitive data protection?
The API:
Detects PII in structured/unstructured data.
Applies de-identification techniques like masking and tokenization.
Monitors re-identification risks.
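A hedged sketch of calling the Cloud DLP inspection API from Python with the google-cloud-dlp client library; the project ID and sample text are placeholders:

```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
parent = "projects/your-project-id"   # placeholder project ID

response = client.inspect_content(
    request={
        "parent": parent,
        "inspect_config": {
            "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
        },
        "item": {"value": "Contact Jane at jane.doe@example.com or 555-0100."},
    }
)

# Print the detected sensitive data types and how confident the detector is.
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood)
```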
Why is encryption critical for AI systems, and how does Google implement it?
Encryption ensures data security:
Default Encryption: For data at rest and in transit.
Cloud KMS: Centralized management of cryptographic keys.
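A hedged sketch of encrypting and decrypting data with a Cloud KMS key via the google-cloud-kms client; the project, location, key ring, and key names are placeholders for existing resources:

```python
from google.cloud import kms

client = kms.KeyManagementServiceClient()

# Placeholder resource name of an existing symmetric key.
key_name = client.crypto_key_path(
    "your-project-id", "us-central1", "your-key-ring", "your-key")

plaintext = b"model feature store credentials"
encrypted = client.encrypt(request={"name": key_name, "plaintext": plaintext})
decrypted = client.decrypt(request={"name": key_name, "ciphertext": encrypted.ciphertext})

assert decrypted.plaintext == plaintext
```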
What rules does IAM enforce to ensure secure access control in Google Cloud?
IAM enforces:
Least-privilege access.
Fine-grained roles for resources.
Audit trails to monitor actions.
What is differential privacy’s role in federated learning?
It prevents gradient leaks by:
Adding noise to gradients before aggregation.
Ensuring individual updates cannot be inferred.
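A minimal sketch of clipping a client update and adding Gaussian noise before it is sent for aggregation; the clip norm and noise multiplier are illustrative values:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    rng = rng or np.random.default_rng()
    # Clip the client update so no single user can dominate the aggregate.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # Add Gaussian noise calibrated to the clip norm before sending to the server.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

client_update = np.array([0.8, -2.4, 1.1])
print(privatize_update(client_update))
```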
What are the security concerns specific to generative AI models?
Memorization of sensitive data.
Output leakage via prompts.
Vulnerability to adversarial prompts.
How does Google secure generative AI inference pipelines?
Encrypts inputs and outputs in transit.
Stores tuned weights securely.
Provides CMEK for customer-managed encryption.
Summarize the privacy principles applied in AI/ML by Google.
Data Minimization: Collect only necessary data.
Transparency: Document usage and policies.
Security: Encrypt, monitor, and audit all interactions.
What is the relationship between AI safety and Google’s AI principles?
AI safety is grounded in Google’s AI principles, specifically:
Principle 3: “Be built and tested for safety,” emphasizing robust testing to minimize risks.
Principle 2: Avoid creating or reinforcing unfair bias.
Principle 4: “Be accountable to people,” promoting transparency and oversight.
AI safety overlaps with fairness and accountability, ensuring ethical use.
What makes safety more challenging in generative AI compared to discriminative AI models?
Unknown Output Space: Generative AI can produce unexpected and creative outputs, making prediction difficult.
Diverse Training Data: Models trained on large datasets might generate outputs significantly different from the input data.
Adversarial Inputs: Generative AI is more prone to malicious prompt exploitation.
Unlike discriminative models (e.g., classifiers), generative models require extensive safeguards to manage risks.
What are the two primary approaches to AI safety?
Technical Approach: Implements engineering solutions, such as model safeguards, input-output filters, and adversarial testing.
Institutional Approach (AI Governance): Focuses on industry-wide policies, national regulations, and ethical guidelines to govern AI use.
Both approaches complement each other.
What are input and output safeguards in generative AI systems?
Input Safeguards: Block or rewrite harmful prompts before processing.
Output Safeguards: Detect and mitigate unsafe outputs using classifiers, error messages, or response ranking based on safety scores.
These safeguards ensure compliance with safety standards.
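A toy sketch of the two safeguard layers: a keyword-based input filter and an output ranker driven by a safety scorer; the blocked terms, scorer, and threshold are placeholders for real safety classifiers:

```python
BLOCKED_TERMS = {"build a weapon", "steal credentials"}   # illustrative only

def input_safeguard(prompt):
    # Input safeguard: block harmful prompts before they reach the model.
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        return None, "This request can't be processed."
    return prompt, None

def output_safeguard(candidates, safety_scorer, threshold=0.8):
    # Output safeguard: rank candidate responses by safety score and return the safest acceptable one.
    best = max(candidates, key=safety_scorer)
    return best if safety_scorer(best) >= threshold else "I can't help with that."

# Toy scorer: penalize responses that mention blocked terms.
toy_scorer = lambda text: 0.0 if any(t in text.lower() for t in BLOCKED_TERMS) else 1.0

prompt, refusal = input_safeguard("Summarize today's security news.")
print(refusal or prompt)
print(output_safeguard(["Here is a summary...", "How to steal credentials..."], toy_scorer))
```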