Machine Learning Engineering Associate 2 Flashcards
Data Transformation, Integrity and Feature Engineering
Data Wrangler
Visual data preparation tool in Amazon SageMaker for exploring, transforming, and analyzing data
Glue
Fully managed extract, transform, and load (ETL) service
Glue DataBrew
Visual data preparation tool that makes it easy to clean and normalize data
Kinesis
AWS family of services for collecting, processing, and analyzing streaming data in real time
Lambda
Serverless compute service for running code without provisioning servers
SageMaker Ground Truth
Fully managed data labeling service for building accurate training datasets
Class imbalance
Situation where classes in a dataset are not represented equally
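A common way to compensate for class imbalance is to weight each class inversely to its frequency. The sketch below is illustrative only: the function name `balanced_class_weights` and the sample labels are assumptions, and the formula used is the widely used "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical dataset: 10 positive examples, 90 negative examples.
labels = ["fraud"] * 10 + ["legit"] * 90
weights = balanced_class_weights(labels)
print(weights)  # the rare class receives the larger weight
```

Weights like these can be passed to a training algorithm that supports per-class or per-sample weighting; resampling (oversampling the minority class or undersampling the majority class) is an alternative approach.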
Server-side encryption
Data encryption performed by the storage service
Client-side encryption
Data encryption performed by the client before sending to storage
Data anonymization
Removing or irreversibly masking personally identifiable information in a dataset so that individuals cannot be re-identified
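One way to illustrate the card above is a minimal sketch that drops direct identifiers and replaces a record key with a salted one-way hash. The field names, the `anonymize_record` helper, and the choice of salted SHA-256 are all assumptions made for illustration. Strictly speaking, hashing a key is pseudonymization, which is weaker than full anonymization.

```python
import hashlib
import secrets

# Hypothetical names of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

def anonymize_record(record, salt, id_field="customer_id"):
    """Drop direct identifiers and replace the ID field with a salted SHA-256 digest."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if id_field in cleaned:
        raw = salt + str(cleaned[id_field]).encode("utf-8")
        cleaned[id_field] = hashlib.sha256(raw).hexdigest()
    return cleaned

salt = secrets.token_bytes(16)  # keep the salt secret and separate from the data
record = {"customer_id": 1042, "name": "Jane Doe", "email": "jane@example.com", "age": 37}
anonymized = anonymize_record(record, salt)
print(sorted(anonymized))  # only non-identifying fields and the hashed ID remain
```

Remaining fields such as age can still allow re-identification when combined, so a real pipeline would usually also generalize or suppress such quasi-identifiers.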
Supervised learning
ML approach where the model is trained on labeled data
Unsupervised learning
ML approach where the model finds patterns or structure in unlabeled data
Reinforcement learning
ML approach where an agent learns to make decisions by interacting with an environment
Feature importance
Measure of how much each feature contributes to the model’s predictions
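Feature importance can be estimated in several ways. The sketch below uses permutation importance as one example: shuffle a single feature column and measure how much the model's error increases. The toy `model`, the synthetic data, and the helper names are assumptions chosen so that the example runs with the standard library alone.

```python
import random

def model(row):
    # Hypothetical fixed model: uses features 0 and 1 and ignores feature 2.
    return 3.0 * row[0] + 0.5 * row[1]

def mse(rows, targets):
    """Mean squared error of the model over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, feature, rng):
    """Increase in MSE after shuffling one feature column."""
    baseline = mse(rows, targets)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(shuffled, targets) - baseline

rng = random.Random(0)
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
targets = [model(r) for r in rows]
importances = [permutation_importance(rows, targets, j, rng) for j in range(3)]
print(importances)  # feature 0 matters most; feature 2 contributes nothing
```

Because the toy model never reads feature 2, shuffling it leaves the error unchanged, which is the behavior an importance measure should report for an irrelevant feature.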
SHAP values
SHapley Additive exPlanations, a game-theoretic approach to explaining machine learning model outputs
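The game-theoretic idea can be sketched by computing exact Shapley values for a tiny model: average each feature's marginal contribution over every order in which features can be switched from a baseline value to their actual value. This is a simplified illustration, not the SHAP library's implementation. The toy `predict` function, the all-zero baseline, and the helper name `shapley_values` are assumptions, and the brute-force loop is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all feature orderings."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i from baseline to its actual value
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]

def predict(features):
    # Hypothetical toy model with an interaction between features 1 and 2.
    return 2.0 * features[0] + features[1] * features[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(predict, x, baseline)
print(phi)  # the contributions sum to predict(x) - predict(baseline)
```

The interaction term is split evenly between the two features that produce it, and the values add up to the difference between the prediction and the baseline prediction, which is the additivity property the name refers to.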
XGBoost
Optimized open-source implementation of gradient-boosted decision trees, known for speed and performance