Chapter 3 Flashcards
What is the role of a data engineer in AI/ML products?
To power the data flow needed for product success and to maintain the ETL pipeline, the Extract, Transform, Load process used for data integration.
What does ETL stand for?
Extract, Transform, Load
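To make the three ETL steps concrete, here is a minimal sketch using only the Python standard library. The CSV contents, the `orders` table, and the function names are made up for illustration; a real pipeline would read from and write to actual systems.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """order_id,amount_usd,country
1,19.99,us
2,,de
3,5.00,US
"""

def extract(text):
    """Extract: read rows from the raw CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and normalize the fields."""
    cleaned = []
    for row in rows:
        if not row["amount_usd"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(row["amount_usd"]), row["country"].upper())
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount_usd REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for a warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount_usd, country FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Running the three functions once over a whole file like this corresponds to a batch update rather than a real-time one.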
How often are ETL pipelines generally updated?
In batches and not in real time
What is a data pipeline that is updated continuously used for?
To provide real-time insights for dashboards used by internal business users
What is MLOps?
A practice that combines machine learning and operations to maintain AI systems
What does IaaS stand for?
Infrastructure as a Service
Why is strategizing and planning for AI adoption crucial?
To avoid technical debt and ensure sustainable implementation
What is model decay?
The decline in model performance over time due to changes in underlying data
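Model decay can be illustrated with a toy example. The sketch below fits a simple threshold rule on made-up "training time" data, then scores it on made-up later data whose values have shifted upward. The data, the threshold rule, and the function names are assumptions chosen only to show the effect.

```python
def fit_threshold(values, labels):
    """Pick the midpoint between the two class means as a decision threshold."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, values, labels):
    """Fraction of points where 'value > threshold' matches the label."""
    hits = sum((v > threshold) == bool(y) for v, y in zip(values, labels))
    return hits / len(values)

# Made-up data at training time.
train_x = [30, 35, 40, 45, 60, 65, 70, 75]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Made-up data some time later: the same behavior now appears at higher values.
later_x = [50, 55, 60, 65, 80, 85, 90, 95]
later_y = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = fit_threshold(train_x, train_y)
acc_then = accuracy(threshold, train_x, train_y)
acc_later = accuracy(threshold, later_x, later_y)
print(threshold, acc_then, acc_later)
```

The model itself never changed; only the underlying data did, which is why performance is worth tracking after release.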
What is one deployment strategy involving a new model alongside an existing one?
Shadow deployment
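One way to picture a shadow deployment is shown below: users only ever receive the existing model's answer, while the new model's answer is logged for later comparison. Both models and the logging structure are hypothetical stand-ins.

```python
def current_model(x):
    """Hypothetical production model."""
    return x * 2

def candidate_model(x):
    """Hypothetical new model under evaluation."""
    return x * 2 + 1

shadow_log = []

def handle_request(x):
    """Serve the current model's answer and silently record the candidate's."""
    served = current_model(x)
    try:
        shadowed = candidate_model(x)
        shadow_log.append({"input": x, "served": served, "shadow": shadowed})
    except Exception:
        # A failing candidate must never affect the user-facing response.
        pass
    return served

responses = [handle_request(x) for x in [1, 2, 3]]
print(responses)
print(shadow_log)
```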
In A/B testing, what is the primary goal?
To compare the performance of two slightly different models
What is a gradual deployment strategy that tests new models on subsets of users called?
Canary deployment
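A canary rollout needs some rule for deciding which users see the new model. The sketch below assumes one common approach, hashing the user ID into a stable bucket; the function names and the 10% figure are illustrative, not taken from the chapter.

```python
import hashlib

def in_canary(user_id, percent):
    """Deterministically place a user in the canary group from a hash of their ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent

def route(user_id, percent):
    """Return which model should serve this user at the given rollout percentage."""
    return "new_model" if in_canary(user_id, percent) else "current_model"

users = [f"user-{i}" for i in range(1000)]
share = sum(in_canary(u, 10) for u in users) / len(users)
print(f"share of users routed to the new model: {share:.2%}")
```

Because the bucket depends only on the user ID, raising `percent` widens the canary group without moving anyone who was already in it.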
What platform does Databricks offer for managing the ML life cycle?
MLflow
What is the purpose of Google’s AI Platform?
To deploy production-level ML pipelines
What is Uber’s ML management tool called?
Michelangelo
What does Meta’s ML platform aim to achieve?
Reusability of ML algorithms and easy access to past projects
What service does Amazon provide for building and deploying ML models?
Amazon SageMaker
What tools did Airbnb use to orchestrate their ML platform?
Zipline, Redspot, DeepThought
What is the promise of AI rooted in?
Quantifying prediction and optimization
What percentage of Amazon’s sales come from their recommendation engine?
35%
What is a smart strategy for implementing AI/ML projects?
Start small, apply to a clear business goal, and track effectiveness
What is essential for justifying investment in AI projects?
Communicating the strength and capabilities of AI
How do we learn in the context of AI/ML projects?
Through iteration
What is the importance of iteration in learning?
Iteration builds confidence through successful task completion.
How does GE utilize AI for customer benefit?
GE offers cost savings to its customers.
What role does Highmark play in preventing future bottlenecks?
Highmark predicts fraud.
How did Amazon benefit from machine learning?
Amazon grew its revenues through ML.
What is the significance of AI in the context of the industrial revolution?
AI promises benefits to both companies and consumers.
What are the stages of the NPD cycle for AI/ML products?
Stages include discovery, define, design, implementation, marketing, training, and launch.
What is the focus during the discovery stage of NPD?
Identifying the market need and why AI should address it.
What is defined in the define stage of NPD?
Product requirements and screening ideas from the discovery stage.
What does the design stage of NPD involve?
Creating mockups and defining UI/UX elements.
What is the purpose of the implementation phase in NPD?
Materializing the planned product and achieving performance expectations.
What is a key consideration in marketing AI products?
Balancing communication about AI capabilities without overselling.
What is the focus of the training phase in NPD?
Training users and managing expectations regarding product performance.
What happens during the launch phase of NPD?
Officially releasing the product and assessing its performance against original metrics.
What is the Naive Bayes algorithm used for?
It’s used for classification problems by treating each feature as independent.
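The independence assumption is easiest to see in code: each feature contributes its own probability, and the contributions are simply multiplied (added in log space). This is a minimal sketch for categorical features with add-one smoothing; the toy data and function names are made up.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies for each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # distinct values seen per feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict(model, row):
    """Return the class with the highest log-probability, treating the
    features as independent given the class."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy data: (weather, day type) -> whether a delivery was late.
rows = [("rain", "weekday"), ("rain", "weekend"), ("sun", "weekday"),
        ("sun", "weekend"), ("rain", "weekday"), ("sun", "weekday")]
labels = ["late", "late", "on_time", "on_time", "late", "on_time"]

model = train_naive_bayes(rows, labels)
prediction = predict(model, ("rain", "weekend"))
print(prediction)
```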
What does the Support Vector Machine (SVM) algorithm do?
It finds a boundary that splits data into two classes, then uses that boundary to classify future data points.
What is linear regression used for?
Predicting future data points using one or more variables.
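For the single-variable case, a line can be fitted with the ordinary least squares closed-form formulas. The sketch below assumes one input variable and uses made-up points that lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
forecast = slope * 5 + intercept  # predict the value at x = 5
print(slope, intercept, forecast)
```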
What does logistic regression predict?
A future binary categorical state.
What is the function of decision trees in ML?
They predict both categorical and numerical values using a flowchart-like structure.
How does the random forest algorithm work?
It creates multiple decision trees from random samples and averages the predictions.
What is K-Nearest Neighbors (KNN) used for?
Predicting future values based on the characteristics of neighboring data points.
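A minimal sketch of the idea for classification: find the k closest training points and take a majority vote among their labels. The points, labels, and choice of k = 3 are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(points, labels, (2, 2), k=3)
near_b = knn_predict(points, labels, (6, 5), k=3)
print(near_a, near_b)
```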
What does clustering aim to achieve in ML?
Finding patterns or clusters in data without supervision.
What is the purpose of Principal Component Analysis (PCA)?
Reducing dimensions of large datasets while preserving information.
What do deep learning models mimic?
The way the human brain processes information through layers.
What is the goal of the implementation phase in the NPD process?
Achieving optimal performance based on the defined metrics.
What are neural networks primarily used for?
They serve as the building blocks of the models in AI/ML products.
What is the most important factor for AI/ML products?
Data accessibility.
What types of data might you initially start with for model training?
Third-party data or public data.
Why is partnering with customers important in AI/ML product development?
It helps build a product that can be successful with real-world data.
What is a potential risk of using pristine datasets for model training?
The model may perform poorly with real-world data it hasn’t seen before.
Why is having a variety of data crucial for model training?
To ensure good model performance, usability, and ethical behavior.
What is iterative hyperparameter tuning?
Repeatedly retraining a model with adjusted hyperparameters to improve its performance.
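The loop below sketches what this iteration can look like: try several values of one hyperparameter, score each on held-out data, and keep the best. It assumes a tiny nearest-neighbor classifier with k as the hyperparameter; the data, the candidate values, and the `TARGET_ACCURACY` benchmark are all made up.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, validation, k):
    """Share of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, point, k) == label for point, label in validation)
    return hits / len(validation)

# Made-up data; the point (5, 5) carries a deliberately noisy label.
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((5, 5), "a"),
         ((6, 6), "b"), ((6, 7), "b"), ((7, 6), "b")]
validation = [((2, 2), "a"), ((5, 6), "b"), ((6, 5), "b"), ((1, 3), "a")]

TARGET_ACCURACY = 0.9  # hypothetical benchmark agreed on beforehand

results = {}
for k in [1, 3, 5]:  # candidate hyperparameter values
    results[k] = accuracy(train, validation, k)

best_k = max(results, key=results.get)
meets_target = results[best_k] >= TARGET_ACCURACY
print(results, best_k, meets_target)
```

With this data, k = 1 is misled by the noisy point, while larger values of k outvote it.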
What informs ML engineers on how to tune hyperparameters?
Performance metrics and benchmarks from the define phase of the NPD process.
What are hyperparameters?
Settings that define how a model functions and optimizes performance.
What is an example of a hyperparameter in a decision tree model?
Maximum depth allowed for the decision tree.
What is the coefficient of determination also known as?
R-squared.
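R-squared compares a model's squared errors with those of a baseline that always predicts the mean: R² = 1 − SS_res / SS_tot. The sketch below computes it directly; the numbers are made up and unrelated to the scores quoted in these cards.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up values for illustration.
actual = [3, 5, 7, 9]
predicted = [2.5, 5.5, 6.5, 9.5]

score = r_squared(actual, predicted)
print(round(score, 3))
```

A score of 1 means the predictions match the actual values exactly, and a score of 0 means the model does no better than predicting the mean.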
What was the R-squared value for the OLS regression model tested?
0.85.
What R-squared value did the random forest model achieve?
0.963.
What hyperparameter was used in the KNN model?
6 neighbors.
What score did the KNN model achieve?
0.994.
What phenomenon occurs when a model performs exceptionally well on training data but poorly on new data?
Overfitting.
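A toy comparison shows the pattern. The "memorizer" below copies the label of the closest training point, so it is perfect on training data, noise included. The simple rule ignores the noise and does better on new data. The data and both models are made up for illustration.

```python
# Made-up data: the true label is 1 when x >= 5; the training point (4, 1) is noise.
train = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1)]
test = [(1.4, 0), (3.8, 0), (4.4, 0), (5.6, 1), (7.4, 1)]

def memorizer(x):
    """Overly flexible model: copy the label of the single closest training point."""
    _, label = min(train, key=lambda item: abs(item[0] - x))
    return label

def simple_rule(x):
    """Simpler model: one threshold that ignores the noisy point."""
    return 1 if x >= 5 else 0

def accuracy(model, data):
    """Share of points where the model's output matches the label."""
    return sum(model(x) == y for x, y in data) / len(data)

memorizer_train = accuracy(memorizer, train)
memorizer_test = accuracy(memorizer, test)
rule_train = accuracy(simple_rule, train)
rule_test = accuracy(simple_rule, test)
print(memorizer_train, memorizer_test, rule_train, rule_test)
```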
What should you be suspicious of when a model gets very close to a perfect score?
That the model may not generalize well to new datasets.
What should AI/ML enthusiasts look for in model performance over time?
Incremental improvement in performance.
What is the next step after comprehensive data training and model adjustment?
Moving forward to deployment.
Fill in the blank: The process of ideating your product, choosing the right model, and gauging performance is ______.
collaborative.