Architecting low-code ML solutions Flashcards
What are the three main stages of the ML workflow in Vertex AI?
Data Preparation, Model Development, and Model Serving.
What are the two types of data commonly dealt with in data preparation?
Structured data (easily stored in tables, e.g., numbers and text) and Unstructured data (cannot be easily stored in tables, e.g., images and videos).
What is the purpose of feature engineering in data preparation?
To process and transform data into useful features before model training.
What is Vertex AI Feature Store used for?
It’s a centralized repository to manage, serve, and share features, ensuring consistency across training and serving.
What are the main benefits of using Vertex AI Feature Store?
Features are shareable, reusable, scalable, and easy to use because they are managed and served from a single centralized repository.
Describe the process of model development in ML.
Model development involves training the model on data, evaluating the performance, and iterating as necessary to improve accuracy.
What is a confusion matrix, and what does it measure?
A confusion matrix is a table used to measure classification model performance by comparing predicted vs. actual values.
Define “precision” in the context of a classification model.
Precision is the ratio of true positives to the sum of true positives and false positives, measuring the accuracy of positive predictions.
Define “recall” in the context of a classification model.
Recall is the ratio of true positives to the sum of true positives and false negatives, measuring how well the model identifies all actual positives.
What is the trade-off between precision and recall?
Optimizing for precision reduces false positives, while optimizing for recall reduces false negatives, often requiring a balance based on the use case.
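The three cards above can be tied together with a short sketch: counting the four cells of a binary confusion matrix and deriving precision and recall from them (the sample labels below are illustrative, not from the source).

```python
# Minimal sketch: build 2x2 confusion-matrix counts for binary labels
# (1 = positive, 0 = negative) and derive precision and recall.

def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) by comparing predicted vs. actual values."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    """Accuracy of positive predictions: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Coverage of actual positives: TP / (TP + FN)."""
    return tp / (tp + fn)

actual    = [1, 1, 1, 0, 0, 0, 1, 0]   # hypothetical ground truth
predicted = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical model output
tp, fp, fn, tn = confusion_counts(actual, predicted)
```

Note the trade-off in the counts: lowering `fp` raises precision, while lowering `fn` raises recall, and tuning a decision threshold typically trades one for the other.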
What is the purpose of the “model serving” stage?
To deploy the model for use in making real-time or batch predictions.
What is MLOps?
MLOps is the practice of applying DevOps principles to machine learning, enabling automation and monitoring of ML systems for continuous integration, training, and deployment.
What are the two ways to build an end-to-end ML workflow in Vertex AI?
Codeless with AutoML in the Google Cloud Console or programmatically with Vertex AI Pipelines.
Describe the role of Vertex AI Pipelines.
It automates, monitors, and manages ML workflows programmatically, using pre-built SDKs to define each step of the pipeline as code.
What are activation functions, and why are they used?
Activation functions introduce non-linearity, allowing neural networks to solve complex problems beyond simple linear relationships.
What is the ReLU activation function?
ReLU (Rectified Linear Unit) turns negative inputs into zero and keeps positive inputs unchanged, commonly used in hidden layers.
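ReLU is simple enough to state in one line; a minimal sketch:

```python
def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)
```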
How is the softmax activation function different from sigmoid?
Softmax generates probabilities for multi-class classification, while sigmoid outputs a probability for binary classification.
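The contrast is visible in the function signatures: sigmoid maps a single score to one probability, while softmax maps a vector of scores to a probability distribution that sums to 1. A minimal sketch:

```python
import math

def sigmoid(z):
    """Binary classification: one score -> one probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Multi-class classification: a score per class -> probabilities summing to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]
```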
What is a loss function in neural networks?
A loss function measures the error between the predicted and actual outputs for a single instance, guiding learning adjustments.
What is the role of gradient descent in neural networks?
Gradient descent is an optimization method used to adjust weights by finding the minimum value of the cost function.
Define “epoch” in the context of neural network training.
An epoch is one complete pass through the training data, from calculating predictions to adjusting weights.
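Gradient descent and epochs can be seen together in a toy training loop. This is a hedged sketch, not Vertex AI code: it fits a one-parameter model `y_hat = w * x` to made-up data by minimizing mean squared error, where each epoch is one full pass over the training set.

```python
# Toy gradient descent: learn w in y_hat = w * x from data generated by y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0
learning_rate = 0.05

for epoch in range(100):               # 100 epochs = 100 full passes over the data
    grad = 0.0
    for x, y in data:                  # accumulate the gradient across all examples
        y_hat = w * x
        grad += 2 * (y_hat - y) * x    # d/dw of the squared error (w*x - y)^2
    grad /= len(data)                  # gradient of the mean squared error
    w -= learning_rate * grad          # step downhill, against the gradient
```

After enough epochs, `w` converges toward the true slope of 2.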
What is AutoML in Vertex AI?
AutoML is a no-code solution in Vertex AI that automates model training, tuning, and selection, so users can build models with little or no code.
What is a neural network’s “cost function”?
A cost function calculates the total error over the entire training set, used to optimize and adjust the model’s parameters.
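The loss/cost distinction from the two cards above can be sketched with squared error: the loss scores a single example, and the cost averages that loss over the whole training set.

```python
def squared_error(y_hat, y):
    """Loss: error between prediction and target for ONE example."""
    return (y_hat - y) ** 2

def mse_cost(predictions, targets):
    """Cost: mean of the per-example losses over the ENTIRE training set."""
    n = len(targets)
    return sum(squared_error(p, t) for p, t in zip(predictions, targets)) / n
```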
What is backpropagation in neural networks?
Backpropagation is the process of adjusting weights based on errors calculated by the cost function to improve model accuracy.
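A minimal sketch of one backpropagation step for a single sigmoid neuron with squared-error loss (a deliberately tiny stand-in for a full network): the chain rule carries the error backward from the loss to the weight and bias, which are then nudged to reduce the error.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(w, b, x, y, lr=0.1):
    """One forward + backward pass for a = sigmoid(w*x + b), loss L = (a - y)^2."""
    z = w * x + b
    a = sigmoid(z)            # forward pass: compute the prediction
    dL_da = 2 * (a - y)       # gradient of the loss w.r.t. the activation
    da_dz = a * (1 - a)       # derivative of sigmoid
    dL_dz = dL_da * da_dz     # chain rule: error signal at the neuron
    dL_dw = dL_dz * x         # gradient w.r.t. the weight
    dL_db = dL_dz             # gradient w.r.t. the bias
    return w - lr * dL_dw, b - lr * dL_db   # adjusted parameters
```

Repeating this step over many examples and epochs is what drives the accuracy improvement the card describes.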
Explain the difference between structured and unstructured data, and provide an example of a use case in Vertex AI where both data types are required. How would you handle this in the data preparation stage?
Structured data is highly organized and easily searchable in tabular formats (e.g., rows and columns in databases). Examples include spreadsheets, SQL databases, or CSV files. Unstructured data lacks a predefined format, making it more challenging to process; examples include images, audio files, and text. An example use case where both data types are required is a customer support chatbot that uses structured data (e.g., customer profiles, purchase history) and unstructured data (e.g., text messages). In the data preparation stage, structured data might go through normalization and transformation, while unstructured data could require feature extraction, such as converting text into embeddings or extracting key features from images using a pre-trained model.
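A hedged sketch of the preparation steps described above, with made-up chatbot data: min-max normalization for a structured numeric column, and a simple bag-of-words vector for unstructured text (a real pipeline would more likely use pre-trained embeddings, as the answer notes; bag-of-words stands in here for simplicity).

```python
def min_max_normalize(values):
    """Scale a structured numeric column into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def bag_of_words(text, vocabulary):
    """Turn free-form text into counts of each vocabulary word."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]

# Structured: purchase totals from a customer profile (hypothetical values)
purchase_totals = [10.0, 55.0, 100.0]

# Unstructured: a support message, featurized against a tiny vocabulary
message = "my order never arrived and my refund is missing"
vocab = ["refund", "order", "missing"]
```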