Misc. Items from wrong answers Flashcards

1
Q

Can you use AWS Rekognition for Semantic Segmentation?

A

No. Rekognition supports image classification (assigning labels) and object detection (labels with bounding boxes), but it does not produce pixel-level semantic segmentation masks.

2
Q

What SageMaker Data Wrangler scaler is best used to deal with outliers?

A

Robust Scaler
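A robust scaler centers each value on the median and divides by the interquartile range (IQR), so a single extreme value barely changes the scale applied to everything else. The sketch below is plain Python, not Data Wrangler's implementation; it assumes the 25th and 75th percentiles as the quartiles, which may differ from the transform's own settings.

```python
import statistics


def robust_scale(values):
    """Return (x - median) / IQR for each value, preserving input order."""
    median = statistics.median(values)
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined")
    return [(x - median) / iqr for x in values]


data = [10, 12, 11, 13, 12, 11, 500]  # 500 is the outlier
print(robust_scale(data))
# [-1.0, 0.0, -0.5, 0.5, 0.0, -0.5, 244.0]
```

The ordinary values stay in a narrow range around zero while the outlier lands far away, instead of squashing every other value toward zero as min-max scaling would.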

3
Q

What is the three-step order to create and share a feature group in SageMaker Feature Store?

A

1. Create a feature group.

2. Associate the feature group with an online store.

3. Ingest the data into the offline store in streaming mode.

4
Q

What is the SageMaker Feature Store online store used for?

A

Low-latency, high-availability storage that provides real-time lookups of the latest feature values. Good for real-time inference.

5
Q

What is the SageMaker Feature Store offline store used for?

A

Historical data, when sub-second retrieval is not needed. Supports batch jobs. Used for exploration, model training, and batch inference.

6
Q

Can a SageMaker Feature Store feature group use both the online and offline stores?

A

Yes. Records written to the online store are automatically replicated to the offline store, so the two stay in sync.
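As a sketch of a feature group that uses both stores, the snippet below only builds the keyword arguments for boto3's SageMaker `create_feature_group` call, so it runs without AWS access. The group name, feature names, role ARN, and bucket are hypothetical placeholders.

```python
def build_feature_group_request(name, role_arn, offline_s3_uri):
    """Keyword arguments for a feature group with both online and offline stores."""
    return {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": "customer_id",  # hypothetical feature
        "EventTimeFeatureName": "event_time",          # hypothetical feature
        "FeatureDefinitions": [
            {"FeatureName": "customer_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "String"},
            {"FeatureName": "avg_order_value", "FeatureType": "Fractional"},
        ],
        # Online store: low-latency lookups of the latest record per ID.
        "OnlineStoreConfig": {"EnableOnlineStore": True},
        # Offline store: historical records written to S3.
        "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": offline_s3_uri}},
        "RoleArn": role_arn,
    }


request = build_feature_group_request(
    "customers-demo",                                   # hypothetical name
    "arn:aws:iam::123456789012:role/FeatureStoreRole",  # hypothetical role
    "s3://example-bucket/feature-store/",               # hypothetical bucket
)
# With credentials configured, this would create the group:
# boto3.client("sagemaker").create_feature_group(**request)
```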

7
Q

What are the two ways that SageMaker Feature Store ingests data?

A

Streaming and batch.

8
Q

What happens when you update a feature using SageMaker Feature Store streaming mode?

A

The record is pushed to the feature group with a synchronous PutRecord API call.
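For streaming ingestion, PutRecord takes the record as a list of feature name/value pairs, with every value sent as a string. This sketch only builds that payload; the feature group and feature names are hypothetical, and the actual call is left as a comment.

```python
from datetime import datetime, timezone


def to_put_record_payload(features):
    """Convert a feature dict into the Record list used by PutRecord.

    Each value is sent as a string (ValueAsString), whatever its feature type.
    """
    return [{"FeatureName": name, "ValueAsString": str(value)}
            for name, value in features.items()]


event_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
record = to_put_record_payload({
    "customer_id": "C-1001",  # hypothetical ID
    "event_time": event_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
    "avg_order_value": 42.5,
})
# With credentials configured, a synchronous streaming write would be:
# boto3.client("sagemaker-featurestore-runtime").put_record(
#     FeatureGroupName="customers-demo", Record=record)
```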

9
Q

What does the Split Data function in SageMaker Data Wrangler do?

A

It splits your dataset into two or three datasets.

10
Q

What does the Randomized Split function in SageMaker Data Wrangler do?

A

Each split is a random, non-overlapping sample of the original dataset.
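A minimal plain-Python illustration of a randomized split, assuming the split fractions sum to 1. It is not Data Wrangler's implementation; the function name and seed handling are hypothetical.

```python
import random


def randomized_split(rows, fractions, seed=0):
    """Shuffle once, then cut into consecutive, non-overlapping pieces."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    splits, start = [], 0
    for i, fraction in enumerate(fractions):
        # The last split takes whatever remains, so no row is dropped.
        if i == len(fractions) - 1:
            end = len(shuffled)
        else:
            end = start + round(fraction * len(shuffled))
        splits.append(shuffled[start:end])
        start = end
    return splits


train, test = randomized_split(range(100), [0.8, 0.2])
print(len(train), len(test))  # 80 20
```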

11
Q

What does the Ordered Split function in SageMaker Data Wrangler do?

A

Splits the dataset based on the sequential order of the observations, without shuffling.
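An ordered split never shuffles: the first rows in the existing order (here, by date) go to training and the last rows to testing. A sketch with hypothetical daily data:

```python
from datetime import date, timedelta


def ordered_split(rows, train_fraction):
    """Keep the original order: earliest rows train, latest rows test."""
    cut = round(train_fraction * len(rows))
    return rows[:cut], rows[cut:]


# Hypothetical daily observations, already sorted by date.
start = date(2024, 1, 1)
rows = [(start + timedelta(days=i), float(i)) for i in range(10)]
train, test = ordered_split(rows, 0.8)
print(train[-1][0], test[0][0])  # 2024-01-08 2024-01-09
```

Because every training date precedes every test date, the model is evaluated only on observations from its "future".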

12
Q

What does the Stratified Split function in SageMaker Data Wrangler do?

A

Splits the dataset so that each value (class) in the selected input column keeps the same proportional representation in every split.
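A stratified split can be sketched by splitting each class separately and concatenating the results, so a rare class keeps the same share on both sides. Names and data are hypothetical, and Data Wrangler's own rounding may differ.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, label_of, train_fraction, seed=0):
    """Split each class separately so both sides keep the class proportions."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(groups):
        members = groups[label]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Hypothetical imbalanced data: 10 "fraud" rows and 90 "legit" rows.
rows = [(i, "fraud" if i < 10 else "legit") for i in range(100)]
train, test = stratified_split(rows, lambda r: r[1], 0.8)
print(Counter(label for _, label in train))  # 72 legit, 8 fraud
print(Counter(label for _, label in test))   # 18 legit, 2 fraud
```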

13
Q

Which split is good for time series data?

A

Ordered Split

14
Q

Which split is good for preventing data leakage?

A

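Ordered Split (for time-ordered data, training only on earlier observations keeps future information out of the model).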

15
Q

Which split is good for reducing bias?

A

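Stratified Split (it keeps each class's proportion the same across splits, so minority classes are not under-represented).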

16
Q

What is the best correlation metric to investigate non-linear relationships between numeric features?

A

Spearman (a rank correlation that captures monotonic relationships, including non-linear ones).
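The difference between Pearson and Spearman shows up on a monotonic but non-linear relationship. This sketch computes both from scratch (Spearman as Pearson applied to ranks, with ties averaged); in practice a statistics library would normally be used instead.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of a linear relationship."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson on the ranks (monotonic relationships)."""
    return pearson(ranks(x), ranks(y))


x = list(range(1, 11))
y = [math.exp(v) for v in x]  # monotonic but strongly non-linear
print(round(pearson(x, y), 3))   # well below 1
print(round(spearman(x, y), 3))  # 1.0
```

Pearson penalizes the curvature, while Spearman only asks whether y always rises with x, which it does.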

17
Q

What is the chi-square correlation metric?

A

The chi-square test is a statistical test that assesses the association between categorical variables.
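The chi-square statistic compares the observed counts in a contingency table with the counts expected if the two variables were independent. A from-scratch sketch with hypothetical counts; it returns only the statistic, not a p-value.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = plan type, columns = churned / stayed.
print(chi_square([[30, 10], [10, 30]]))  # 20.0
print(chi_square([[20, 20], [20, 20]]))  # 0.0 (no association)
```

The statistic is compared against a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom to judge significance.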

18
Q

What is the Phi correlation metric?

A

The phi coefficient assesses correlation between binary variables.
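For a 2x2 table, phi can be computed directly from the four cell counts. It ranges from -1 to 1, and its square equals the chi-square statistic divided by the sample size. Hypothetical counts:

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]] of two binary variables."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


print(phi_coefficient(30, 10, 10, 30))  # 0.5
print(phi_coefficient(10, 30, 30, 10))  # -0.5
```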

19
Q

Which correlation metric should be used for linear relationships between numeric features?

A

Pearson

20
Q

Is Random Cut Forest a good algorithm for fraud detection?

A

Yes. RCF can identify outliers or unusual patterns within large datasets without the need for labeled data.

21
Q

Is F1 good to use for classification models?

A

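Yes. F1 is the harmonic mean of precision and recall, which makes it useful when classes are imbalanced.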
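F1 combines precision and recall into a single number. A sketch computed from hypothetical confusion-matrix counts:

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 30 true positives, 10 false positives, 30 false negatives.
print(f1_score(tp=30, fp=10, fn=30))  # 0.6 (precision 0.75, recall 0.5)
```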

22
Q

Is RMSE good for classification models?

A

No. RMSE is a regression metric.

23
Q

What does accuracy measure?

A

Accuracy measures the percentage of predictions that are correct.
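Accuracy can be misleading on imbalanced data. In this hypothetical example, a model that always predicts the majority class scores 99% accuracy while catching none of the positives:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives the model caught."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical imbalanced labels: 990 negatives, 10 positives.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # a "model" that always predicts the majority class
print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```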

24
Q

What does SageMaker Model Registry do?

A

SageMaker Model Registry is a tool to catalog models, manage different versions, and deploy models to production.

25
Q

Does SageMaker Clarify allow you to evaluate partial dependence plots (PDP)?

A

Yes.

26
Q

Does SageMaker Model Registry have explainability analysis functionality?

A

No. Explainability analysis is done with SageMaker Clarify.

27
Q

How long can a SageMaker batch transform job run?

A

It can run for multiple days.

28
Q

How long can a SageMaker serverless endpoint take to process a request?

A

Up to 60 seconds

29
Q

How long can a SageMaker asynchronous endpoint take to process a request?

A

Up to 1 hour

30
Q

If you have a model that needs to be invoked once every 24 hours, what is the best endpoint/processing job to use?

A

Batch Transform

31
Q

What are Bedrock Batch Inference Jobs?

A

Batch inference processes a large number of requests efficiently: you submit a single job, and the responses are written to an Amazon S3 bucket.

32
Q

What are the steps to create a batch inference job in Bedrock?

A

1. Upload the inputs (a JSONL file of records) to S3.

2. Specify the stored file as the input to a CreateModelInvocationJob request.

3. Specify the target S3 bucket as the output location for the request.
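The Bedrock batch steps can be sketched as two pieces: a JSONL input file (one record per line, each with a `recordId` and a `modelInput`) and the arguments for `CreateModelInvocationJob`. The `modelInput` body, model ID, role, and S3 paths below are hypothetical; the real body must follow the chosen model's request format. The code only builds the payloads and does not call AWS.

```python
import json


def build_batch_input(prompts):
    """Serialize prompts as JSONL: one record per line with recordId and modelInput."""
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"REC{i:07d}",
            # Hypothetical body: must match the chosen model's request format.
            "modelInput": {"prompt": prompt},
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


def build_job_request(job_name, model_id, role_arn, input_uri, output_uri):
    """Keyword arguments for the CreateModelInvocationJob call."""
    return {
        "jobName": job_name,
        "modelId": model_id,
        "roleArn": role_arn,
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_uri}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_uri}},
    }


body = build_batch_input(["Summarize ticket 1", "Summarize ticket 2"])
request = build_job_request(
    "nightly-summaries",                                # hypothetical job name
    "example-model-id",                                 # hypothetical model ID
    "arn:aws:iam::123456789012:role/BedrockBatchRole",  # hypothetical role
    "s3://example-bucket/batch/input.jsonl",            # uploaded input file
    "s3://example-bucket/batch/output/",                # target output location
)
# With credentials configured, the job would be submitted with:
# boto3.client("bedrock").create_model_invocation_job(**request)
```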

33
Q

If your model training requires access to a proprietary Python library, how can you provide it?

A

Extend the prebuilt SageMaker scikit-learn framework container to include custom dependencies.

34
Q

Does CodePipeline or CodeBuild allow you to run unit tests and produce an artifact that is ready for deployment?

A

CodeBuild

35
Q

Does CodePipeline or CodeBuild allow you to invoke an AWS Lambda function to notify email recipients when the code is ready for deployment?

A

CodePipeline

36
Q

If training is not time sensitive, what is the most cost-effective way to perform the training?

A

Use Managed Spot Training by setting EnableManagedSpotTraining to true.
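Managed Spot Training is configured on the training job itself. This sketch builds only the spot-related `CreateTrainingJob` fields (the rest of the request is omitted), and the bucket is hypothetical. In the SageMaker Python SDK, the same settings are exposed on estimators as `use_spot_instances`, `max_run`, and `max_wait`.

```python
def spot_training_fields(max_runtime_s, max_wait_s, checkpoint_s3_uri):
    """CreateTrainingJob fields that turn on Managed Spot Training."""
    if max_wait_s < max_runtime_s:
        # The wait budget covers training time plus time spent waiting for Spot capacity.
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # Checkpoints let an interrupted job resume, if the script saves and loads them.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


fields = spot_training_fields(3600, 7200, "s3://example-bucket/checkpoints/")  # hypothetical bucket
# Merge into a full request (AlgorithmSpecification, RoleArn, and so on), then:
# boto3.client("sagemaker").create_training_job(**full_request)
```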

37
Q

What does a SageMaker Inference Recommender endpoint recommendation job do?

A

Inference Recommender automates load testing and tuning on multiple endpoint instance types and configurations. You can run an endpoint recommendation job to perform custom load tests by specifying instance types, traffic patterns, and production requirements for latency and throughput.

38
Q

Accelerated computing EC2 instances refer to what kind of processor?

A

GPUs, along with other hardware accelerators such as AWS Inferentia, AWS Trainium, and FPGAs.

39
Q

If training data is located in a different Region from where SageMaker is running, how would you access it?

A

A VPC interface endpoint using AWS PrivateLink.

40
Q

Are Gateway VPC Endpoints cross-region compatible?

A

No

41
Q

Code in SageMaker Studio needs credentials stored in AWS Secrets Manager. How can it access them with least-privilege permissions?

A

The SageMaker Studio domain that runs the notebook needs permissions to access AWS services, including Secrets Manager. Grant them by attaching a policy to the domain's execution role that allows reading only the specific secret the code needs.
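Least privilege here means the execution role can read only the secret the code needs. A sketch of such a policy document, built in Python; the ARN is a hypothetical placeholder. If the secret is encrypted with a customer managed KMS key, the role also needs `kms:Decrypt` on that key.

```python
import json


def secret_read_policy(secret_arn):
    """IAM policy document that allows reading one specific secret and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            }
        ],
    }


policy = secret_read_policy(
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-creds-AbCdEf"  # hypothetical ARN
)
print(json.dumps(policy, indent=2))
# In the notebook, the execution role then allows:
# boto3.client("secretsmanager").get_secret_value(SecretId="db-creds")["SecretString"]
```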

42
Q

What hyperparameter can be adjusted in XGBoost to prevent overfitting?

A

max_depth (lower it to build shallower trees).

43
Q

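Can Fast File mode be used with Amazon FSx for Lustre?

A

No. Fast File mode streams data from Amazon S3; FSx for Lustre is a separate file system data source.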

44
Q

What can be used in SageMaker for auditing and compliance?

A

SageMaker Lineage Tracking

45
Q

What can be used in SageMaker to identify overfitting, saturated activation functions, and vanishing gradients?

A

SageMaker Debugger

46
Q

Can you balance data in SageMaker Data Wrangler?

A

Yes, using the balance data operation to oversample the minority class.
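Random oversampling duplicates minority-class rows until the classes are the same size. This is a plain-Python sketch of simple random oversampling, not the Data Wrangler operation itself; data and names are hypothetical. Oversampling is normally applied to the training split only, so duplicated rows do not leak into the test set.

```python
import random
from collections import Counter


def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Sample with replacement to fill the gap (k=0 for the majority class).
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical data: 5 positives and 95 negatives.
rows = [(i, 1 if i < 5 else 0) for i in range(100)]
balanced = random_oversample(rows, lambda r: r[1])
print(Counter(label for _, label in balanced))  # 95 of each class
```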

47
Q

How do you address an oscillating pattern of the loss values during training?

A

Reduce the learning rate.
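A toy gradient descent on loss(w) = w**2 shows why a smaller learning rate helps: when the rate is too large, each update overshoots the minimum and the weight bounces from side to side. In this deterministic toy the loss still shrinks; with noisy minibatch gradients the same overshooting typically shows up as a zig-zagging loss curve, and at a rate above 1.0 here the updates diverge.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize loss(w) = w**2 (gradient 2*w) and record the weight after each step."""
    weights = []
    for _ in range(steps):
        w -= lr * 2 * w
        weights.append(w)
    return weights


def sign_flips(weights):
    """Count how often the weight jumps across the minimum at w = 0."""
    return sum(1 for a, b in zip(weights, weights[1:]) if (a > 0) != (b > 0))


too_high = gradient_descent(lr=0.95)  # each step multiplies w by about -0.9
smaller = gradient_descent(lr=0.1)    # each step multiplies w by 0.8
print(sign_flips(too_high), sign_flips(smaller))  # 19 0
print(too_high[-1] ** 2 > smaller[-1] ** 2)       # True
```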

48
Q

How can you identify whether a model is overfitting, underfitting, or both?

A

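By examining the training and validation loss curves over time. If training loss keeps falling while validation loss stalls or rises (a widening gap), the model is overfitting. If both losses stay high, it is underfitting.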
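Reading the curves can be turned into a rough rule of thumb. The sketch below is a hypothetical heuristic with made-up thresholds (`high_loss`, `patience`), not a SageMaker feature; real diagnosis usually also looks at the plotted curves.

```python
def diagnose(train_loss, val_loss, high_loss=0.5, patience=2):
    """Rough reading of per-epoch loss curves; thresholds are illustrative only."""
    # Both curves stuck high: the model has not learned enough (underfitting).
    if train_loss[-1] > high_loss and val_loss[-1] > high_loss:
        return "underfitting"
    # Epoch with the lowest validation loss.
    best = min(range(len(val_loss)), key=lambda i: val_loss[i])
    epochs_since_best = len(val_loss) - 1 - best
    # Validation got worse for several epochs while training kept improving.
    if epochs_since_best >= patience and train_loss[-1] < train_loss[best]:
        return "overfitting"
    return "no clear problem"


print(diagnose([1.0, 0.6, 0.4, 0.3, 0.2, 0.1], [1.0, 0.7, 0.5, 0.55, 0.6, 0.7]))  # overfitting
print(diagnose([1.2, 1.1, 1.05, 1.0], [1.25, 1.15, 1.1, 1.05]))                   # underfitting
```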

49
Q

Are embeddings tokens or vectors of numerical data?

A

Vectors of numerical data.

50
Q

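Can SageMaker Debugger take action based on predefined thresholds for underutilization, overfitting, etc.?

A

Yes. Built-in rules watch for conditions such as overfitting and low GPU utilization, and built-in actions can stop the training job or send a notification when a rule triggers.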

51
Q

What is the maximum input size for a SageMaker serverless endpoint?

A

4 MB

52
Q

What is the maximum input size for a SageMaker real-time endpoint?

A

6 MB

53
Q

What is the maximum input size for a SageMaker asynchronous endpoint?

A

1 GB
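The payload and processing-time limits from these cards can be combined into a quick lookup. A sketch under stated assumptions: the 60-second real-time timeout is not on the cards, 1 GB is treated as 1024 MB, and batch transform is always listed as the offline fallback without modelling its own limits.

```python
# (option, max payload in MB, max processing time in seconds)
LIMITS = [
    ("serverless endpoint", 4, 60),
    ("real-time endpoint", 6, 60),          # 60 s timeout is an assumption, not on the cards
    ("asynchronous endpoint", 1024, 3600),  # 1 GB treated as 1024 MB
]


def options_that_fit(payload_mb, processing_seconds):
    """List inference options whose payload and time limits fit one request."""
    fits = [name for name, max_mb, max_s in LIMITS
            if payload_mb <= max_mb and processing_seconds <= max_s]
    # Batch transform is the offline fallback; its own limits are not modelled here.
    fits.append("batch transform")
    return fits


print(options_that_fit(5, 10))      # ['real-time endpoint', 'asynchronous endpoint', 'batch transform']
print(options_that_fit(500, 1800))  # ['asynchronous endpoint', 'batch transform']
```

Among the options that fit, the choice then depends on the traffic pattern: serverless for intermittent traffic, real-time for steady low-latency traffic, asynchronous for large or slow requests, and batch transform for scheduled jobs such as a once-a-day run.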