Data Engineering Practice Exam 1 Flashcards
Confusion matrix
A table that illustrates the number or percentage of correct and incorrect predictions for each class by comparing an observation’s predicted class and its true class
Precision
The proportion of true positive predictions among all positive predictions
Recall
The proportion of actual positives that are correctly identified
Accuracy
The percentage of correct predictions out of all predictions made by the model
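A minimal Python sketch of how the four cards above fit together, assuming binary labels where 1 marks the positive class. The function name `binary_metrics` and the sample labels are illustrative only and do not come from the exam material.

```python
def binary_metrics(y_true, y_pred):
    """Return confusion-matrix counts and derived metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    # The four cells of a 2x2 confusion matrix.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision, "recall": recall, "accuracy": accuracy,
    }


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these sample labels the model reaches 70% accuracy while finding only half of the actual positives, which is why precision and recall are reported alongside accuracy.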
Root mean squared error (RMSE)
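A regression metric equal to the square root of the mean of the squared differences between predicted and actual values; it is expressed in the same units as the target and penalizes large errors more heavily than small ones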
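A short sketch of the RMSE calculation in plain Python. The function name `rmse` and the sample values are assumptions made for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each error, average the squares, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values: the errors are 1, 0, and -2.
print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```

Because the errors are squared before averaging, the single error of 2 contributes four times as much to the result as the error of 1.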
AUC-ROC curve
A plot of true positive rate against false positive rate across classification thresholds, with the area under the curve (AUC) summarizing how well the model separates the classes; often more informative than accuracy when classes are imbalanced
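One way to see what the area under the ROC curve measures: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison, which is simple but quadratic in the number of examples. The function name `roc_auc` and the sample scores are illustrative assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. No single threshold is involved, which is what "across various thresholds" refers to.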
Blue/green deployment
A strategy that deploys a new version of a model in parallel with the existing one, then shifts traffic to the new version (all at once or gradually) while monitoring its performance, keeping the old version available for rollback
Canary release
A deployment strategy that initially routes a small percentage of traffic to a new model version, then shifts the remaining traffic once the new version proves healthy
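A toy sketch of the traffic split behind a canary release, assuming requests carry a string ID and that routing is decided by hashing that ID into one of 100 buckets. This illustrates the idea only and is not how any particular AWS service implements it; `route_request` and the `req-N` IDs are hypothetical.

```python
import hashlib


def route_request(request_id, canary_percent):
    """Return "canary" for roughly canary_percent% of request IDs, else "stable"."""
    # Hash the ID so the same request always lands on the same variant.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Send about 10% of 10,000 hypothetical requests to the new model version.
counts = {"canary": 0, "stable": 0}
for i in range(10_000):
    counts[route_request(f"req-{i}", 10)] += 1
print(counts)
```

Raising `canary_percent` in steps up to 100 completes the rollout, and setting it back to 0 is the rollback.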
Amazon SageMaker Pipelines
A purpose-built workflow orchestration service to automate machine learning (ML) development
Amazon SageMaker Data Wrangler
A service that reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes
Pipe input mode
A data streaming method where data is pre-fetched from Amazon S3 at high concurrency and throughput and streamed into a named pipe
File input mode
A data input method that downloads the entire dataset to the training instance before starting the training job
FastFile mode
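A data access method that exposes Amazon S3 objects through a file system interface and streams their contents on demand, so training can start without first downloading the entire dataset; works best when data is read sequentially from larger files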
Amazon SageMaker Serverless Inference
A deployment option that automatically scales compute resources based on incoming requests; cost-effective for workloads with idle periods between traffic spikes
Amazon Bedrock
A fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API