6. Building Secure ML Pipelines Flashcards
What are the three types of encryption used in GCP?
Encryption at rest (Cloud Storage and BigQuery tables)
Encryption in transit (Transport Layer Security)
Encryption in use
What encryption is used to encrypt individual table values in BigQuery?
Authenticated Encryption with Associated Data encryption
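The defining property of AEAD is that the ciphertext is cryptographically bound to extra "associated data" (for example, a customer ID column), so decryption fails if either the key or the associated data is wrong. The idea can be sketched with a toy encrypt-then-MAC scheme in Python. This is a conceptual illustration only, not BigQuery's actual implementation (BigQuery uses Tink keysets via SQL functions such as AEAD.ENCRYPT); the helper names here are made up:

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Derive a pseudorandom keystream from key and nonce (toy construction).
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def aead_encrypt(key: bytes, plaintext: bytes, associated_data: bytes) -> bytes:
    nonce = os.urandom(16)
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    # The MAC covers the nonce, the ciphertext, AND the associated data,
    # so tampering with any of them is detected at decryption time.
    tag = hmac.new(key, nonce + ct + associated_data, hashlib.sha256).digest()
    return nonce + ct + tag

def aead_decrypt(key: bytes, blob: bytes, associated_data: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ct + associated_data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key or associated data")
    return bytes(p ^ k for p, k in zip(ct, _keystream(key, nonce, len(ct))))

key = os.urandom(32)
blob = aead_encrypt(key, b"ssn=123-45-6789", b"customer_id=42")
assert aead_decrypt(key, blob, b"customer_id=42") == b"ssn=123-45-6789"
```

Binding the row's identifier as associated data is what lets a table cell be decrypted only in its original context.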
What are the differences between server-side and client-side encryption for encryption at rest?
Server-side encryption:
Encryption that occurs after Cloud Storage receives your data but before the data is written to disk and stored. The keys are managed by Cloud Key Management Service.
Client-side encryption:
Encryption that occurs before data is sent to Cloud Storage or BigQuery; the data is still encrypted again on the server side. Client-side keys are managed by the user.
How does encryption in use work?
Confidential Computing protects your data in memory from compromise by encrypting it while it is being processed. You can encrypt your data in use with Confidential VMs and Confidential GKE Nodes.
What are the two levels of roles in Identity and Access Management in GCP?
Project‐level roles: Assign roles to a principal (user, group, or service account).
Resource‐level roles: Grant access to a specific resource to individual users. The resource must support resource‐level policies.
What are the three types of IAM roles available in Vertex AI?
Predefined roles: for example, Vertex AI Administrator and Vertex AI User
Basic roles: Owner, Editor, and Viewer at the project level.
Custom roles: let you choose a specific set of permissions and create your own role with those permissions
What are two types of Vertex AI notebooks with Vertex AI Workbench?
User‐managed notebook instances are highly customizable
Managed notebooks are less customizable.
Their advantages include integration with Cloud Storage and BigQuery from within JupyterLab, and automatic shutdown.
What are two ways to set up user access modes (permission) for both user‐managed and managed notebooks?
Single User Only access mode grants access only to the user that you specify.
Service Account access mode grants access to a service account. You can grant access to one or more users through this service account.
How can you run Vertex AI APIs in Google Colab?
Create a service account key for a service account with the Vertex AI administrator and Cloud Storage owner permissions.
Then provide the location of the JSON key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable to authenticate your Google Colab project.
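In practice this is a couple of lines in a Colab cell. The key path below is a hypothetical filename (in Colab you would typically upload the key file or mount Drive first), and the project/region values in the commented SDK call are placeholders:

```python
import os

# Path to the downloaded service-account JSON key (hypothetical filename).
key_path = "/content/vertex-sa-key.json"

# Google client libraries discover credentials through this environment
# variable (Application Default Credentials).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path

# With credentials in place, the Vertex AI SDK can then be initialized, e.g.:
# from google.cloud import aiplatform
# aiplatform.init(project="my-project", location="us-central1")
```

Any subsequent Vertex AI or Cloud Storage client created in the notebook will pick up these credentials automatically.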
What are Google Cloud shared responsibility and shared fate models?
Shared responsibility model: The cloud provider must monitor and respond to security threats related to the cloud itself and its underlying infrastructure. Meanwhile, end users are responsible for protecting data and other assets they store in any cloud environment.
Shared fate model: Focuses on how all parties can better interact to continuously improve security.
1. Security recommendations enabled by default
2. Risk protection program
3. Assured workloads and governance
What are the best practices to secure your Workbench instance?
Use a private IP address
Connect your instance to a VPC network in the same project
Use a Shared VPC network. You can use VPC Service Controls to allow or deny access to specific services
Which artifacts are protected when you use VPC Service Controls?
Training data for an AutoML model or custom model
Models that you created
Requests for online predictions
Results from a batch prediction request
How do you secure Vertex AI endpoints?
A public endpoint is publicly accessible from the Internet.
For private endpoints, use a private connection to talk to your endpoint without your data ever traversing the public Internet.
How do you secure your Vertex AI training jobs?
Using private IP addresses to connect to your training jobs provides more network security and lower network latency than using public IP addresses.
What is Federated Learning?
Federated learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on the device.
Each device's updated model is then sent to the server for consolidation.
This gives lower latency and less power consumption, all while ensuring privacy.
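The round trip described above can be sketched as federated averaging: each client trains on its own private data, only the updated weights (never the raw data) leave the device, and the server consolidates by averaging. A minimal toy sketch with a 1-D linear model and made-up client datasets:

```python
# Toy federated averaging: each client holds private data and computes a
# local model update; the server only ever sees weights, not data.

def local_update(w, client_data, lr=0.1):
    # One gradient-descent step for a 1-D linear model y = w * x
    # (mean squared error over this client's private data).
    grad = sum(2 * (w * x - y) * x for x, y in client_data) / len(client_data)
    return w - lr * grad

def federated_round(global_w, clients):
    # Each client trains on-device; only the updated weight leaves the device.
    updates = [local_update(global_w, data) for data in clients]
    # The server consolidates by averaging the client updates.
    return sum(updates) / len(updates)

# Three clients whose private data all follow y = 2 * x (data never shared).
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (4.0, 8.0)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
# w converges toward the true slope 2.0
```

Real systems (e.g. Gboard's on-device learning) add secure aggregation and client sampling on top of this basic averaging loop, but the privacy property is the same: raw training data never leaves the device.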