Misc. Items from wrong answers Flashcards
Can you use AWS Rekognition for Semantic Segmentation?
No. Rekognition supports image classification (labels) and object detection, not the pixel-level masks that semantic segmentation requires.
What SageMaker DataWrangler scaler is best used to deal with outliers?
Robust Scaler
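A minimal sketch of why robust scaling copes with outliers, written in plain Python rather than in Data Wrangler itself. It assumes the usual definition (subtract the median, divide by the interquartile range); `robust_scale` is a hypothetical helper name.

```python
import statistics

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the values cannot be robust-scaled")
    return [(v - median) / iqr for v in values]

# The outlier (100) stays far away, but it does not squash the other values,
# because the median and IQR are barely affected by it.
scaled = robust_scale([1, 2, 3, 4, 100])
print(scaled)
```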
What is the three-step order to create and share a feature group in SageMaker Feature Store?
Create a feature group.
Associate the feature group to an online store.
Ingest the data into the offline store in streaming mode.
What is the SageMaker Feature Store option Online store for?
It provides low-latency, high-availability, real-time lookups of features. Good for inference.
What is the SageMaker Feature Store option Offline store for?
Historical data when sub-second retrieval is not needed. Supports Batch jobs. Used for exploration, model training, and batch inference.
Can SageMaker Feature Store have FeatureGroups that can support both online and offline modes?
Yes. A feature group can enable both stores, and records written to the online store are synced to the offline store.
What are the two ways that SageMaker Feature Store ingests data?
Streaming and Batches.
What process happens when you use SageMaker Feature Store Streaming Mode and you update a feature?
The records are pushed by calling a synchronous PutRecord API call.
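A rough sketch of what a streaming write looks like from the client side. The helper `to_feature_record`, the feature names, and the feature group name `customers-demo` are all hypothetical; the actual `put_record` call is left commented out because it needs AWS credentials and an existing feature group.

```python
def to_feature_record(row):
    """Convert a plain dict into the Record list shape that PutRecord expects."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
    ]

# Hypothetical feature values; every value is sent as a string.
record = to_feature_record(
    {"customer_id": 42, "avg_basket": 18.5, "event_time": "2024-01-01T00:00:00Z"}
)
print(record)

# With AWS credentials configured, the synchronous write would look like this:
# import boto3
# runtime = boto3.client("sagemaker-featurestore-runtime")
# runtime.put_record(FeatureGroupName="customers-demo", Record=record)
```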
What does the Split Data function in SageMaker Data Wrangler do?
It splits your dataset into two or three datasets.
What does the Randomized Split function in SageMaker Data Wrangler do?
Each split is a random non-overlapping sample of the original dataset
What does the Ordered Split function in SageMaker Data Wrangler do?
Splits the dataset based on the sequential order of the observations
What does the Stratified Split function in SageMaker Data Wrangler do?
Splits the dataset so that each category in the input column is proportionally represented in every split.
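To make the difference between the split types concrete, here is a small plain-Python sketch of an ordered split and a stratified split. It is an illustration of the ideas only, not how Data Wrangler implements them; the function names and the toy rows are assumptions.

```python
import random
from collections import defaultdict

def ordered_split(rows, train_fraction):
    """Keep the original order: the first rows train, the last rows test."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def stratified_split(rows, label_of, train_fraction, seed=0):
    """Split each label group separately so class proportions are preserved."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        cut = int(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Ordered: observations 0..7 go to training, 8..9 to testing.
first, last = ordered_split(list(range(10)), 0.8)

# Stratified: 8 rows of class "a" and 2 of class "b" keep their 4:1 ratio.
rows = [("a", i) for i in range(8)] + [("b", i) for i in range(2)]
train, test = stratified_split(rows, label_of=lambda r: r[0], train_fraction=0.5)
```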
Which split is good for time series data?
Ordered Split
Which split is good for preventing data leakage?
Ordered Split
Which split is good for reducing bias?
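Stratified Split. Keeping each category proportionally represented reduces sampling bias; an Ordered Split only preserves sequence.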
What is the best correlation metric to investigate non-linear relationships between numeric features?
Spearman
What is the chi-square correlation metric?
The chi square is a statistical test assessing association between categorical variables.
What is the Phi correlation metric?
The phi coefficient assesses correlation between binary variables.
What correlation metric should be used for linear relationships of numeric features?
Pearson
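A short sketch contrasting the two numeric metrics on a monotonic but non-linear relationship (y = x cubed). Both functions are written by hand for illustration, and the rank helper assumes there are no ties.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank from 1..n. Assumes no ties, which keeps the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman is Pearson computed on the ranks instead of the raw values."""
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print("pearson :", pearson(xs, ys))
print("spearman:", spearman(xs, ys))
```

Spearman reports a perfect monotonic relationship here, while Pearson falls below 1 because the relationship is not a straight line.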
Is Random Cut Forest a good algorithm for fraud detection?
Yes. RCF can identify outliers or unusual patterns within large datasets without the need for labeled data.
Is F1 good to use for classification models?
Yes
Is RMSE good for classification models?
No. RMSE is a regression metric.
What does accuracy measure?
Accuracy measures the percentage of cases that are classified correctly.
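A small hand-rolled sketch of accuracy next to F1 on an imbalanced toy dataset (2 positives out of 10). The labels are made up for illustration; the point is that a model that never predicts the positive class can still score high accuracy while its F1 is zero.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # one hit, one miss, one false alarm
always_negative = [0] * 10                # never predicts the positive class

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))
```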
What does SageMaker Model Registry do?
SageMaker Model Registry is a tool to catalog models, manage different versions, and deploy models to production.
Does SageMaker Clarify allow you to evaluate partial dependence plots (PDP)?
Yes.
Does SageMaker Model Registry have explainability analysis functionality?
No
How long can a SageMaker Batch transform job process for?
Up to multiple days.
How long can a SageMaker serverless endpoint process for?
Up to 60 seconds
How long can a SageMaker asynchronous endpoint process for?
Up to 1 hour
If you have a model that needs to be invoked once every 24 hours, what is the best endpoint/processing job to use?
Batch Transform
What are Bedrock Batch Inference Jobs?
Batch inference helps you process a large number of requests efficiently by sending a single request and generating the responses in an Amazon S3 bucket.
What are the steps to create a batch inference job in Bedrock?
Create a batch inference job in Bedrock
Upload the inputs to S3
Specify the stored file as an input to a CreateModelInvocationJob request
Specify the output location for the request as the target S3 bucket.
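The steps above can be sketched as a request builder for `CreateModelInvocationJob`. The bucket, role ARN, job name, and model ID below are placeholders, and the boto3 call is commented out because it needs credentials and real resources.

```python
def build_batch_inference_request(job_name, role_arn, model_id, input_uri, output_uri):
    """Assemble keyword arguments for a CreateModelInvocationJob request."""
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "modelId": model_id,
        # The input file uploaded to S3 beforehand.
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_uri}},
        # The S3 location where Bedrock writes the responses.
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_uri}},
    }

# All values here are hypothetical placeholders.
request = build_batch_inference_request(
    job_name="nightly-summaries",
    role_arn="arn:aws:iam::111122223333:role/example-bedrock-batch-role",
    model_id="example-model-id",
    input_uri="s3://example-bucket/input/records.jsonl",
    output_uri="s3://example-bucket/output/",
)

# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_model_invocation_job(**request)
```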
If your model training requires access to a proprietary python library, how can you grant this access?
Extend the prebuilt SageMaker scikit-learn framework container to include custom dependencies.
Does CodePipeline or CodeBuild allow you to run unit tests and produce an artifact that is ready for deployment?
CodeBuild
Does CodePipeline or CodeBuild allow you to invoke an AWS Lambda function to inform email recipients when the code is ready for deployment?
CodePipeline
If training is not time sensitive, what is the most cost-effective way to perform the training?
Use managed spot training by setting EnableManagedSpotTraining to true.
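A sketch of the spot-related fields in a `CreateTrainingJob` request. Only the fragment relevant to spot training is shown; the helper name and the checkpoint URI are hypothetical. Checkpointing is included on the assumption that the job should resume after a spot interruption.

```python
def spot_training_settings(max_runtime_s, max_wait_s, checkpoint_s3_uri):
    """Return the CreateTrainingJob fields that turn on managed spot training."""
    # The wait time covers both waiting for spot capacity and the run itself,
    # so it must be at least as long as the runtime limit.
    if max_wait_s < max_runtime_s:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }

settings = spot_training_settings(3600, 7200, "s3://example-bucket/checkpoints/")
print(settings)
```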
What does a SageMaker Inference Recommender Endpoint recommendation job do?
Inference Recommender automates load testing and tuning on multiple endpoint instance types and configurations. You can run an endpoint recommendation job to perform custom load tests by specifying instance types, traffic patterns, and production requirements for latency and throughput.
Accelerated computing EC2 instances refer to what kind of processor?
GPU. The category also covers other hardware accelerators, such as AWS Inferentia and Trainium.
If training data is located in a different Region from where SageMaker is being run, how would you access it?
An interface VPC endpoint using PrivateLink.
Are Gateway VPC Endpoints cross-region compatible?
No
Code running in SageMaker Studio requires credentials stored in Secrets Manager. How can they be accessed with least-privilege access?
The SageMaker Studio domain that runs the notebook needs permissions to access various AWS services, including Secrets Manager. You can grant these permissions by attaching a policy to the execution role of the domain.
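As an illustration of what a least-privilege policy for the execution role might look like, the sketch below builds an IAM policy document that allows reading one specific secret and nothing else. The secret ARN is a made-up placeholder and `secret_read_policy` is a hypothetical helper.

```python
import json

def secret_read_policy(secret_arn):
    """IAM policy document allowing read access to a single secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                # Scope to one secret rather than "*" to keep privileges minimal.
                "Resource": secret_arn,
            }
        ],
    }

policy = secret_read_policy(
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-db-creds-AbCdEf"
)
print(json.dumps(policy, indent=2))
```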
What hyperparameter can be adjusted for XGBoost to prevent overfitting?
max_depth (lower it to limit tree complexity)
Can Fast File Mode be used with Lustre?
No
What can be used in SageMaker for auditing and compliance?
SageMaker Lineage Tracking
What can be used in SageMaker to identify overfitting, saturated activation functions, and vanishing gradients?
SageMaker Debugger
Can you balance data in SageMaker Data Wrangler?
Yes. Use the Balance Data operation, for example to oversample the minority class.
How do you address an oscillating pattern of the loss values during training?
Reduce the learning rate.
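A toy sketch of why a smaller learning rate helps: gradient descent on f(x) = x**2. With a step that is too large, the parameter keeps jumping across the minimum and never settles; with a smaller step, it converges. The function and the chosen rates are illustrative assumptions, not a real training loop.

```python
def descend(start, learning_rate, steps):
    """Gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = start
    path = [x]
    for _ in range(steps):
        x = x - learning_rate * 2 * x
        path.append(x)
    return path

# Too large: x bounces between +1 and -1, so the loss never improves.
too_large = descend(1.0, learning_rate=1.0, steps=4)

# Smaller: each step shrinks x toward the minimum at 0.
smaller = descend(1.0, learning_rate=0.1, steps=50)

print(too_large)
print(smaller[-1])
```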
How can you identify whether a model is overfitting, underfitting, or both?
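By examining the training and validation loss curves over time. If validation loss stays well above training loss, or rises while training loss keeps falling, the model is overfitting. If both stay high, it is underfitting.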
Are embeddings tokens or vectors of numerical data?
vectors of numerical data
Can SageMaker Debugger react based on predefined thresholds for underutilization, overfitting, etc.?
Yes
What is the maximum input size for a SageMaker serverless endpoint?
4 MB
What is the maximum input size for a SageMaker real-time endpoint?
6 MB
What is the maximum input size for a SageMaker asynchronous endpoint?
1 GB
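The size and time limits quoted in these cards can be turned into a quick lookup, sketched below. The numbers are copied from the cards and may change over time; the 60-second timeout for real-time endpoints is an assumption that the cards do not state, and `candidate_options` is a hypothetical helper.

```python
# (max payload in MB, max processing time in seconds); None means no limit
# is modelled here. Values come from the cards, except the real-time
# timeout, which is an assumption.
LIMITS = {
    "serverless": (4, 60),
    "real-time": (6, 60),
    "asynchronous": (1024, 3600),
    "batch transform": (None, None),
}

def candidate_options(payload_mb, processing_seconds):
    """List the inference options whose limits fit the given request."""
    fits = []
    for name, (max_mb, max_seconds) in LIMITS.items():
        if max_mb is not None and payload_mb > max_mb:
            continue
        if max_seconds is not None and processing_seconds > max_seconds:
            continue
        fits.append(name)
    return fits

print(candidate_options(5, 30))
print(candidate_options(500, 1800))
```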
How can you support custom training and inference code in SageMaker when using a pre-built model?
Use SageMaker Script Mode
Do SageMaker Training Jobs track model versions?
No
Can SageMaker pipeline steps be used in other pipelines?
Yes
What does hyperband hyperparameter tuning do?
It allocates more resources to promising configurations while stopping underperforming configurations early
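A simplified sketch of successive halving, the building block that Hyperband repeats across several brackets. It is not the SageMaker implementation: the scoring function `toy_score` and the candidate configurations are made up, and a real tuner would train a model for `budget` epochs instead.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Repeatedly keep the top 1/eta of configs and give survivors more budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]   # underperformers are stopped early
        budget *= eta               # promising configs get more resources
    return survivors[0]

def toy_score(config, budget):
    """Hypothetical objective: best at lr = 0.1, improving with more budget."""
    return budget * (1 - abs(config["lr"] - 0.1))

candidates = [{"lr": 0.001}, {"lr": 0.01}, {"lr": 0.1}, {"lr": 0.5}]
best = successive_halving(candidates, toy_score)
print(best)
```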
Can Glue ETL jobs change JSON to Parquet?
Yes.
What is SageMaker Endpoint Data Capture?
It captures the requests sent to an endpoint and the predictions returned, storing them in S3 for monitoring and comparison.
What does cross validation help solve?
It helps in addressing both bias and variance
What technique should be applied to ensure all features are treated equally by the model?
Min-Max scaling ensures that all numerical features are on the same scale, typically between 0 and 1, preventing features with larger ranges from dominating the model’s learning process.
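A minimal hand-written sketch of min-max scaling. The income and age values are invented; the point is that two features with very different ranges end up on the same 0 to 1 scale.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant feature: nothing to spread out
    return [(v - low) / (high - low) for v in values]

incomes = min_max_scale([20_000, 50_000, 80_000])
ages = min_max_scale([20, 35, 50])
print(incomes, ages)
```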
High variance results in the model ________________
Overfitting. One solution is early stopping.
What does Grid Search Hyperparameter tuning do?
It enumerates every possible combination from a pre-defined “grid” of hyperparameter values.
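A short sketch of the enumeration itself, using `itertools.product`. The hyperparameter names and values are examples only, and `grid` is a hypothetical helper.

```python
from itertools import product

def grid(search_space):
    """Return every combination of the listed hyperparameter values."""
    names = list(search_space)
    return [dict(zip(names, combo)) for combo in product(*search_space.values())]

combos = grid({"max_depth": [3, 5, 7], "eta": [0.1, 0.3]})
for combo in combos:
    print(combo)
```

The number of combinations is the product of the list lengths (3 x 2 = 6 here), which is why grid search becomes expensive as more hyperparameters are added.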
What are Trackers in SageMaker Experiments?
They automatically log metadata for each trial, including hyperparameters, datasets, and changes to the code.
What does Directed Acyclic Graph (DAG) in SageMaker Pipelines do?
The Directed Acyclic Graph (DAG) defines the sequence and dependencies between the steps of the pipeline, ensuring that steps are executed in a specific order.
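To illustrate how a DAG fixes execution order, the sketch below uses Python's standard `graphlib` module on a made-up four-step pipeline. The step names are hypothetical and this is not the SageMaker Pipelines SDK; it only shows how dependencies determine the order.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it depends on.
dependencies = {
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}

# A topological order runs every step only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```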
Can AWS Glue save files as RecordIO-protobuf?
No.