Misc. Items from wrong answers Flashcards by Ozzy Campos

Can you use AWS Rekognition for Semantic Segmentation?

No. Rekognition supports image classification (assigning labels) and object detection with bounding boxes. It does not produce the pixel-level masks that semantic segmentation requires.

What SageMaker DataWrangler scaler is best used to deal with outliers?

Robust Scaler
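
To see why a robust scaler copes with outliers, here is a minimal pure-Python sketch of the usual idea: center on the median and divide by the interquartile range (IQR). The function name `robust_scale` and the sample data are illustrative assumptions, not Data Wrangler's own implementation.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the IQR (assumes a non-zero IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    return [(v - med) / iqr for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier
scaled = robust_scale(data)
print(scaled)
```

The median and IQR barely move when the outlier changes, so the ordinary values keep a sensible scale even though the outlier itself stays large.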

What is the three-step order to create and share a feature group in SageMaker Feature Store?

Create a feature group.

Associate the feature group with an online store.

Ingest the data into the offline store in streaming mode.

What is the SageMaker Feature Store option Online store for?

It provides low-latency, high-availability, real-time lookups of features. Good for inference.

What is the SageMaker Feature Store option Offline store for?

It stores historical data for cases where sub-second retrieval is not needed. It supports batch jobs and is used for exploration, model training, and batch inference.

Can SageMaker Feature Store have FeatureGroups that can support both online and offline modes?

Yes. When both stores are enabled, records written to the online store are also synced to the offline store.

What are the two ways that SageMaker Feature Store ingests data?

Streaming and batch.

What process happens when you use SageMaker Feature Store Streaming Mode and you update a feature?

The records are pushed by calling a synchronous PutRecord API call.
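
A minimal sketch of what such a call can look like with boto3 follows. The feature names and values are hypothetical, and `to_feature_store_record` and `put_record` are illustrative helper names; only the `sagemaker-featurestore-runtime` client and its `put_record` call come from the AWS SDK.

```python
def to_feature_store_record(features):
    """Convert a dict of feature values into the Record list that PutRecord takes."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
    ]


def put_record(feature_group_name, features):
    """Send one record synchronously (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above runs without AWS access

    client = boto3.client("sagemaker-featurestore-runtime")
    return client.put_record(
        FeatureGroupName=feature_group_name,
        Record=to_feature_store_record(features),
    )


# Hypothetical feature values, for illustration only.
record = to_feature_store_record(
    {"customer_id": "c-123", "order_total": 42.5, "event_time": "2024-01-01T00:00:00Z"}
)
print(record)
```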

What does the Split Data function in SageMaker Data Wrangler do?

It splits your dataset into two or three datasets.

What does the Randomized Split function in SageMaker Data Wrangler do?

Each split is a random non-overlapping sample of the original dataset

What does the Ordered Split function in SageMaker Data Wrangler do?

Splits the dataset based on the sequential order of the observations
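
As a rough illustration of an ordered split (not Data Wrangler's implementation), the sketch below cuts an already time-sorted list at a fixed position, so every training row precedes every test row. The `ordered_split` name and the sample data are assumptions.

```python
def ordered_split(rows, train_fraction=0.8):
    """Split rows by position; assumes rows are already sorted by time."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical daily observations, oldest first.
daily_sales = [("2024-01-%02d" % day, 100 + day) for day in range(1, 11)]
train, test = ordered_split(daily_sales, train_fraction=0.8)
print(train[-1], test[0])
```

Because nothing is shuffled, the model never trains on observations that come after the ones it is evaluated on.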

What does the Stratified Split function in SageMaker Data Wrangler do?

Splits the dataset so that each category in the input column is proportionally represented in every split.
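
A stratified split can be sketched in plain Python by splitting each category separately and then combining the pieces. This is only an illustration of the idea under assumed names (`stratified_split`, the `fraud`/`ok` labels), not how Data Wrangler does it internally.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, train_fraction=0.8, seed=0):
    """Split each category on its own so both splits keep the same class mix."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    train, test = [], []
    for members in groups.values():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical imbalanced dataset: 10% "fraud", 90% "ok".
rows = [("fraud", i) for i in range(10)] + [("ok", i) for i in range(90)]
train, test = stratified_split(rows, key=lambda r: r[0])
print(len(train), len(test))
```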

Which split is good for time series data?

Ordered Split

Which split is good for preventing data leakage?

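Ordered Split. With time-ordered data, training only on earlier observations keeps future information out of the training set.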

Which split is good for reducing bias?

Stratified Split

What is the best correlation metric to investigate non-linear relationships between numeric features?

Spearman (a rank correlation, so it captures monotonic relationships even when they are not linear)

What is the chi-square correlation metric?

The chi-square test is a statistical test that assesses the association between categorical variables.

What is the Phi correlation metric?

The phi coefficient assesses correlation between binary variables.

What correlation metric should be used for linear relationships of numeric features?

Pearson
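
The difference between Pearson and Spearman shows up on a relationship that is monotonic but not linear. The sketch below uses hand-written helpers (illustrative names, and a rank function that assumes no ties) on y = x^3.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1..n (this simple version assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic but clearly non-linear
print(pearson(xs, ys), spearman(xs, ys))
```

Spearman comes out at essentially 1 because the ordering is perfectly preserved, while Pearson is lower because the points do not lie on a straight line.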

Is Random Cut Forest a good algorithm for fraud detection?

Yes. RCF can identify outliers or unusual patterns within large datasets without the need for labeled data.

Is F1 good to use for classification models?

Yes

Is RMSE good for classification models?

No. RMSE (root mean squared error) is a regression metric.

What does accuracy measure?

Accuracy measures percent of correct classification cases.
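
A small sketch, using made-up labels, of how accuracy is computed and why F1 is often checked alongside it on imbalanced data. The helper names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced labels: one positive case in ten.
y_true = [0] * 9 + [1]
y_pred = [0] * 10  # a model that always predicts the negative class
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

The always-negative model scores 90% accuracy but an F1 of 0, because it never finds the positive case.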

What does SageMaker Model Registry do?

SageMaker Model Registry is a tool to catalog models, manage different versions, and deploy models to production.

Does SageMaker Clarify allow you to evaluate partial dependence plots (PDP)?

Yes.

Does SageMaker Model Registry have explainability analysis functionality?

No. Explainability analysis is provided by SageMaker Clarify.

How long can a SageMaker Batch transform job process for?

Up to multiple days.

How long can a SageMaker serverless endpoint process for?

Up to 60 seconds

How long can a SageMaker asynchronous endpoint process for?

Up to 1 hour

If you have a model that needs to be invoked once every 24 hours, what is the best endpoint/processing job to use?

Batch Transform

What are Bedrock Batch Inference Jobs?

Batch inference helps you process a large number of requests efficiently by sending a single request and generating the responses in an Amazon S3 bucket.

What are the steps to create a batch inference job in Bedrock?

To create a batch inference job in Bedrock:

Upload the inputs to S3.

Specify the stored file as the input to a CreateModelInvocationJob request.

Specify the target S3 bucket as the output location for the request.
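
These steps can be sketched with boto3 roughly as follows. The job name, role ARN, bucket names, and model ID are placeholders, and `build_batch_job_request` and `submit` are illustrative helpers; the request fields are those of the Bedrock `CreateModelInvocationJob` operation.

```python
def build_batch_job_request(job_name, role_arn, model_id, input_s3_uri, output_s3_uri):
    """Assemble the CreateModelInvocationJob parameters as a plain dict."""
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "modelId": model_id,
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_s3_uri}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_s3_uri}},
    }


def submit(request):
    """Start the job (requires boto3, AWS credentials, and Bedrock access)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    bedrock = boto3.client("bedrock")
    return bedrock.create_model_invocation_job(**request)


# Placeholder values, for illustration only.
request = build_batch_job_request(
    job_name="example-batch-job",
    role_arn="arn:aws:iam::111122223333:role/ExampleBedrockBatchRole",
    model_id="example-model-id",
    input_s3_uri="s3://example-input-bucket/records.jsonl",
    output_s3_uri="s3://example-output-bucket/results/",
)
print(request)
```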

If your model training requires access to a proprietary python library, how can you grant this access?

Extend the prebuilt SageMaker scikit-learn framework container to include custom dependencies.

Does CodePipeline or CodeBuild allow you to run unit tests and produce an artifact that is ready for deployment?

CodeBuild

Does CodePipeline or CodeBuild allow you to invoke an AWS Lambda function to inform email recipients when the code is ready for deployment?

CodePipeline

If training is not time sensitive, what is the most cost-effective way to perform the training?

Use Spot Instances by setting EnableManagedSpotTraining to True on the training job.
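
As a hedged sketch, the Spot-related part of a CreateTrainingJob request could be assembled like this. The helper name, time limits, and S3 URI are assumptions; the field names come from the CreateTrainingJob API, where the maximum wait time must be at least the maximum runtime.

```python
def spot_training_settings(max_runtime_seconds, max_wait_seconds, checkpoint_s3_uri):
    """Return the Spot-related fields of a CreateTrainingJob request."""
    if max_wait_seconds < max_runtime_seconds:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_seconds,
            "MaxWaitTimeInSeconds": max_wait_seconds,
        },
        # Checkpoints let training resume after a Spot interruption.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


# Placeholder limits and bucket, for illustration only.
settings = spot_training_settings(3600, 7200, "s3://example-bucket/checkpoints/")
print(settings)
```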

What does a SageMaker Inference Recommender Endpoint recommendation job do?

Inference Recommender automates load testing and tuning on multiple endpoint instance types and configurations. You can run an endpoint recommendation job to perform custom load tests by specifying instance types, traffic patterns, and production requirements for latency and throughput.

Accelerated computing EC2 instances refer to what kind of processor?

GPU

If training data is located in a different region from where SageMaker is being run, how would you access it?

A VPC Endpoint using PrivateLink.

Are Gateway VPC Endpoints cross-region compatible?

No. A gateway VPC endpoint only reaches the service in the same Region as the VPC.

Code running in SageMaker Studio requires credentials stored in Secrets Manager. How can they be accessed with least-privilege access?

The SageMaker Studio domain that runs the notebook needs permissions to access various AWS services, including Secrets Manager. You can grant these permissions by attaching a policy to the execution role of the domain.
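
A least-privilege policy for this case might look like the sketch below, which allows only `secretsmanager:GetSecretValue` on a single secret. The secret ARN is a placeholder and `secret_read_policy` is an illustrative helper name.

```python
import json


def secret_read_policy(secret_arn):
    """Build an IAM policy that can read one specific secret and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }


# Placeholder ARN, for illustration only.
policy = secret_read_policy(
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-secret-AbCdEf"
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Scoping the resource to one ARN, rather than using a wildcard, is what keeps the execution role to least privilege.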

What hyperparameter can be adjusted for XGBoost to prevent overfitting?

max_depth (lowering it limits tree depth and model complexity)

Can Fast File Mode be used with Lustre?

What can be used in SageMaker for auditing and compliance?

SageMaker Lineage Tracking

What can be used in SageMaker to identify overfitting, saturated activation functions, and vanishing gradients?

SageMaker Debugger

Can you balance data in SageMaker Data Wrangler?

Yes. Use the balance data operation, for example to oversample the minority class.

How do you address an oscillating pattern of the loss values during training?

Reduce the learning rate
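
One common cause of oscillation is a step size (learning rate) large enough that each update overshoots the minimum. The toy sketch below, which assumes plain gradient descent on f(x) = x^2, counts how often the parameter jumps from one side of the minimum to the other.

```python
def gradient_descent(learning_rate, start=1.0, steps=6):
    """Minimize f(x) = x**2 (gradient 2*x) and return every iterate."""
    x = start
    path = [x]
    for _ in range(steps):
        x = x - learning_rate * 2 * x
        path.append(x)
    return path


def sign_flips(path):
    """Count how often consecutive iterates land on opposite sides of the minimum."""
    return sum(1 for a, b in zip(path, path[1:]) if a * b < 0)


large = gradient_descent(0.9)  # overshoots the minimum on every step
small = gradient_descent(0.1)  # approaches the minimum from one side
print(sign_flips(large), sign_flips(small))
```

On real models the loss surface is far less regular, but the same overshooting shows up as a loss curve that bounces instead of settling.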

How can you identify whether a model is overfitting, underfitting, or both?

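By examining the training and validation loss curves over time. If training loss keeps falling while validation loss stays higher or starts rising, the model is overfitting. If both losses stay high, it is underfitting.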

Are embeddings tokens or vectors of numerical data?

vectors of numerical data

Can SageMaker Debugger react based on predefined thresholds for underutilization, overfitting, etc.?

Yes

What is the maximum input size for a SageMaker serverless endpoint?

4 MB

What is the maximum input size for a SageMaker real-time endpoint?

6 MB

What is the maximum input size for a SageMaker asynchronous endpoint?

1 GB

How can you support custom training and inference code in SageMaker when using a pre-built model?

Use SageMaker Script Mode

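Do SageMaker Training Jobs track model versions?

No. Training jobs produce model artifacts; version tracking is handled by SageMaker Model Registry.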

Can SageMaker pipeline steps be used in other pipelines?

Yes

What does hyperband hyperparameter tuning do?

It allocates more resources to promising configurations while stopping underperforming configurations early

Can Glue ETL jobs change JSON to Parquet?

Yes.

What is SageMaker Endpoint Data Capture?

It stores the incoming data and prediction which can be used for monitoring and comparison.

What does cross validation help solve?

It helps in addressing both bias and variance

What technique should be applied to ensure all features are treated equally by the model?

Min-Max scaling ensures that all numerical features are on the same scale, typically between 0 and 1, preventing features with larger ranges from dominating the model's learning process.
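
Min-Max scaling maps each value to (value - min) / (max - min). A minimal sketch with made-up income figures:

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20000, 50000, 80000]  # hypothetical feature values
scaled_incomes = min_max_scale(incomes)
print(scaled_incomes)
```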

High variance results in the model ________________

Overfitting. One remedy is early stopping.

What does Grid Search Hyperparameter tuning do?

It enumerates every possible combination from a pre-defined “grid” of hyperparameter values.
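
The enumeration itself can be sketched with `itertools.product`. The hyperparameter names and values below are just an example grid, and `grid` is an illustrative helper.

```python
from itertools import product


def grid(search_space):
    """Return every combination of the listed hyperparameter values."""
    names = list(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Example grid: 3 depths x 2 learning rates = 6 candidate configurations.
candidates = grid({"max_depth": [3, 5, 7], "eta": [0.1, 0.3]})
for candidate in candidates:
    print(candidate)
```

The number of candidates is the product of the list lengths, which is why grid search gets expensive quickly as hyperparameters are added.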

What are Trackers in SageMaker Experiments?

They automatically log metadata for each trial, including hyperparameters, datasets, and changes to the code.

What does Directed Acyclic Graph (DAG) in SageMaker Pipelines do?

The Directed Acyclic Graph (DAG) defines the sequence and dependencies between the steps of the pipeline, ensuring that steps are executed in a specific order.
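
To illustrate how a DAG fixes execution order, the sketch below uses Python's standard `graphlib` module (Python 3.9+) rather than the SageMaker SDK. The step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical step names).
dependencies = {
    "process": set(),
    "train": {"process"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}

# static_order() yields each step only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because the graph is acyclic, a valid order always exists; a cycle would make `static_order()` raise `graphlib.CycleError`.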

Can AWS Glue save files as RecordIO-protobuf?

No.