Misc. Items from wrong answers Flashcards

1
Q

Can you use AWS Rekognition for Semantic Segmentation?

A

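No. Rekognition supports image classification and object detection, which assign labels (with bounding boxes) to an image. It does not produce the pixel-level class masks that semantic segmentation requires.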

2
Q

Which SageMaker Data Wrangler scaler is best for dealing with outliers?

A

Robust Scaler
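
A minimal sketch of why a robust scaler copes with outliers, assuming the usual definition (subtract the median, divide by the interquartile range). The helper names and sample data are hypothetical, and this is plain Python rather than Data Wrangler's own implementation.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale")
    med = median(values)
    return [(v - med) / iqr for v in values]

def min_max_scale(values):
    """Rescale to [0, 1] using the minimum and maximum (sensitive to outliers)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical data: five ordinary values and one extreme outlier.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
robust = robust_scale(data)
minmax = min_max_scale(data)

# Spread of the five ordinary values under each scaler.
robust_spread = robust[4] - robust[0]
minmax_spread = minmax[4] - minmax[0]
```

With min-max scaling the outlier squeezes the ordinary values into a tiny interval near 0, while the robust scaler keeps them well separated because the median and IQR barely move.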

3
Q

What is the three-step order to create and share a feature group in SageMaker Feature Store?

A

Create a feature group.

Associate the feature group to an online store.

Ingest the data into the offline store in streaming mode.

4
Q

What is the SageMaker Feature Store option Online store for?

A

It provides low-latency, high-availability, real-time lookups of features. Good for inference.

5
Q

What is the SageMaker Feature Store option Offline store for?

A

It stores historical data for cases where sub-second retrieval is not needed. It supports batch jobs and is used for exploration, model training, and batch inference.

6
Q

Can a SageMaker Feature Store feature group support both online and offline modes?

A

Yes. A feature group can enable both stores, and records written to the online store are synced to the offline store.

7
Q

What are the two ways that SageMaker Feature Store ingests data?

A

Streaming and batch.

8
Q

What process happens when you use SageMaker Feature Store Streaming Mode and you update a feature?

A

The records are pushed to the feature group through synchronous PutRecord API calls.
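
A hedged sketch of what a PutRecord payload looks like. The helper `to_feature_store_record`, the feature names, and the feature group name `customers-demo` are hypothetical; the boto3 call is left commented out because it needs AWS credentials and an existing feature group.

```python
def to_feature_store_record(row):
    """Convert a plain dict into the list-of-features shape that PutRecord takes."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
    ]

# Hypothetical row; a real feature group defines its own identifier and event-time features.
row = {
    "customer_id": "c-001",
    "event_time": "2024-01-01T00:00:00Z",
    "avg_basket": 42.5,
}
record = to_feature_store_record(row)

# With credentials configured, the synchronous call would look like this:
# import boto3
# runtime = boto3.client("sagemaker-featurestore-runtime")
# runtime.put_record(FeatureGroupName="customers-demo", Record=record)
```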

9
Q

What does the Split Data function in SageMaker Data Wrangler do?

A

It splits your dataset into two or three datasets.

10
Q

What does the Randomized Split function in SageMaker Data Wrangler do?

A

Each split is a random non-overlapping sample of the original dataset

11
Q

What does the Ordered Split function in SageMaker Data Wrangler do?

A

Splits the dataset based on the sequential order of the observations

12
Q

What does the Stratified Split function in SageMaker Data Wrangler do?

A

Splits the dataset so that each value in the input column is proportionally represented in every split.
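
The randomized, ordered, and stratified splits can be sketched in plain Python as follows. This is an illustration of the ideas only, not Data Wrangler's implementation; the function names, the 80/20 fraction, and the sample rows are assumptions.

```python
import random
from collections import defaultdict

def ordered_split(rows, train_frac=0.8):
    """First train_frac of the rows go to train, the rest to test (order preserved)."""
    cut = round(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

def randomized_split(rows, train_frac=0.8, seed=0):
    """Shuffle a copy, then cut; the two parts are random and non-overlapping."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return ordered_split(shuffled, train_frac)

def stratified_split(rows, label_of, train_frac=0.8, seed=0):
    """Split each label group separately so every label keeps its proportion."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    train, test = [], []
    for members in groups.values():
        part_train, part_test = randomized_split(members, train_frac, seed)
        train += part_train
        test += part_test
    return train, test

# Hypothetical dataset: 100 rows, 10% labeled "fraud".
rows = [(i, "fraud" if i % 10 == 0 else "ok") for i in range(100)]

o_train, o_test = ordered_split(rows)
r_train, r_test = randomized_split(rows)
s_train, s_test = stratified_split(rows, label_of=lambda r: r[1])
fraud_in_test = sum(1 for r in s_test if r[1] == "fraud")
```

In the stratified case the 10% fraud rate carries over to both splits, which a purely random cut does not guarantee on small or imbalanced data.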

13
Q

Which split is good for time series data?

A

Ordered Split

14
Q

Which split is good for preventing data leakage?

A

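Ordered Split, for time-ordered data: training on earlier observations and testing on later ones keeps future information out of the training set.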

15
Q

Which split is good for reducing bias?

A

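Stratified Split, which keeps each category proportionally represented in every split and so reduces sampling bias.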

16
Q

What is the best correlation metric to investigate non-linear relationships between numeric features?

A

Spearman (rank-based, so it captures monotonic relationships even when they are not linear)

17
Q

What is the chi-square correlation metric?

A

The chi-square test is a statistical test that assesses association between categorical variables.

18
Q

What is the Phi correlation metric?

A

The phi coefficient assesses correlation between binary variables.

19
Q

What correlation metric should be used for linear relationships of numeric features?

A

Pearson
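
A small sketch contrasting Pearson and Spearman on a monotonic but non-linear relationship. The helpers are hand-rolled for illustration, the rank function assumes there are no tied values, and the data is made up.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Rank positions starting at 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# y grows monotonically with x, but far from linearly.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 5 for x in xs]

linear_score = pearson(xs, ys)
rank_score = spearman(xs, ys)
```

Here the rank-based score reaches 1 because the ordering of x and y matches exactly, while the Pearson score stays noticeably below 1 because the points do not lie on a straight line.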

20
Q

Is Random Cut Forest a good algorithm for fraud detection?

A

Yes. RCF can identify outliers or unusual patterns within large datasets without the need for labeled data.

21
Q

Is F1 good to use for classification models?

A

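Yes. F1 is the harmonic mean of precision and recall, which makes it useful when classes are imbalanced.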

22
Q

Is RMSE good for classification models?

A

No. RMSE is a regression metric.

23
Q

What does accuracy measure?

A

Accuracy measures the percentage of cases that are classified correctly.
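
A short worked example of accuracy next to F1 on an imbalanced dataset. The function name and the labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, f1) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Hypothetical data: 5 positives out of 100, and a model that finds only one of them.
y_true = [1] * 5 + [0] * 95
y_pred = [1] + [0] * 99
accuracy, f1 = classification_metrics(y_true, y_pred)
```

Accuracy comes out at 0.96 even though four of the five positives were missed, while F1 is about 0.33, which is why F1 is often preferred when the positive class is rare.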

24
Q

What does SageMaker Model Registry do?

A

SageMaker Model Registry is a tool to catalog models, manage different versions, and deploy models to production.

25
Q

Does SageMaker Clarify allow you to evaluate partial dependence plots (PDP)?

A

Yes.

26
Q

Does SageMaker Model Registry have explainability analysis functionality?

A

No

27
Q

How long can a SageMaker Batch transform job process for?

A

Up to multiple days.

28
Q

How long can a SageMaker serverless endpoint process for?

A

Up to 60 seconds

29
Q

How long can a SageMaker asynchronous endpoint process for?

A

Up to 1 hour

30
Q

If you have a model that needs to be invoked once every 24 hours, what is the best endpoint/processing job to use?

A

Batch Transform

31
Q

What are Bedrock Batch Inference Jobs?

A

Batch inference helps you process a large number of requests efficiently by sending a single request and generating the responses in an Amazon S3 bucket.

32
Q

What are the steps to create a batch inference job in Bedrock?

A

Upload the inputs to S3.

Submit a CreateModelInvocationJob request that specifies the stored file as the input.

Specify the target S3 bucket as the output location in the same request.
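
A hedged sketch of the request these steps describe. The helper `build_batch_job_request` and every value passed to it (job name, role ARN, model ID, bucket paths) are placeholders; the boto3 call is commented out because it needs AWS credentials and inputs already uploaded to S3.

```python
def build_batch_job_request(job_name, role_arn, model_id, input_s3_uri, output_s3_uri):
    """Assemble keyword arguments for a CreateModelInvocationJob request."""
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "modelId": model_id,
        # The input file previously uploaded to S3.
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_s3_uri}},
        # Where the generated responses will be written.
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_s3_uri}},
    }

# All values below are placeholders for illustration.
request = build_batch_job_request(
    job_name="demo-batch-job",
    role_arn="arn:aws:iam::123456789012:role/demo-bedrock-batch-role",
    model_id="example-model-id",
    input_s3_uri="s3://demo-bucket/batch/input.jsonl",
    output_s3_uri="s3://demo-bucket/batch/output/",
)

# With credentials configured, the job would be submitted like this:
# import boto3
# bedrock = boto3.client("bedrock")
# response = bedrock.create_model_invocation_job(**request)
```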

33
Q

If your model training requires access to a proprietary Python library, how can you grant this access?

A

Extend the prebuilt SageMaker scikit-learn framework container to include custom dependencies.

34
Q

Does CodePipeline or CodeBuild allow you to run unit tests and produce an artifact that is ready for deployment?

A

CodeBuild

35
Q

Does CodePipeline or CodeBuild allow you to invoke an AWS Lambda function to inform email recipients when the code is ready for deployment?

A

CodePipeline

36
Q

If training is not time sensitive, what is the most cost-effective way to perform the training?

A

Use managed spot training by setting EnableManagedSpotTraining to true.

37
Q

What does a SageMaker Inference Recommender Endpoint recommendation job do?

A

Inference Recommender automates load testing and tuning on multiple endpoint instance types and configurations. You can run an endpoint recommendation job to perform custom load tests by specifying instance types, traffic patterns, and production requirements for latency and throughput.

38
Q

Accelerated computing EC2 instances refer to what kind of processor?

A

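GPU (the accelerated computing family also covers other hardware accelerators, such as AWS Trainium and Inferentia)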

39
Q

If training data is located in a different region from where SageMaker is being run, how would you access it?

A

A VPC Endpoint using PrivateLink.

40
Q

Are Gateway VPC Endpoints cross-region compatible?

A

No

41
Q

Code running in SageMaker Studio needs credentials stored in Secrets Manager. How can it access them with least-privilege access?

A

The SageMaker Studio domain that runs the notebook needs permissions to access various AWS services, including Secrets Manager. You can grant these permissions by attaching a policy to the execution role of the domain.

42
Q

What hyperparameter can be adjusted for XGBoost to prevent overfitting?

A

max_depth (lower values limit tree complexity)

43
Q

Can Fast File Mode be used with Lustre?

A

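No. Fast File mode streams training data from Amazon S3; FSx for Lustre is a separate data source option.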

44
Q

What can be used in SageMaker for auditing and compliance?

A

SageMaker Lineage Tracking

45
Q

What can be used in SageMaker to identify overfitting, saturated activation functions, and vanishing gradients?

A

SageMaker Debugger

46
Q

Can you balance data in SageMaker Data Wrangler?

A

Yes, by using the Balance Data operation, for example to oversample the minority class.

47
Q

How do you address an oscillating pattern of the loss values during training?

A

Reduce the learning rate.

48
Q

How can you identify whether a model is overfitting, underfitting, or both?

A

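By examining the training and validation loss curves over time. If training loss keeps falling while validation loss stays well above it or starts rising, the model is overfitting. If both losses stay high, it is underfitting.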
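
As a rough illustration of reading loss curves, the check below compares the final training and validation losses. The function name and both thresholds (`gap_tol`, `high_loss`) are arbitrary assumptions chosen for the example, not values from SageMaker or any library.

```python
def diagnose_fit(train_loss, val_loss, gap_tol=0.1, high_loss=0.5):
    """Classify a run from its per-epoch loss curves using simple thresholds."""
    final_train, final_val = train_loss[-1], val_loss[-1]
    # Both curves stuck at a high loss: the model has not learned the task.
    if final_train > high_loss and final_val > high_loss:
        return "underfitting"
    # Training loss is low but validation loss is much higher.
    if final_val - final_train > gap_tol:
        return "overfitting"
    return "reasonable fit"

# Hypothetical loss curves, one value per epoch.
overfit = diagnose_fit([1.0, 0.5, 0.1], [1.0, 0.7, 0.9])
underfit = diagnose_fit([1.0, 0.9, 0.8], [1.0, 0.95, 0.85])
healthy = diagnose_fit([1.0, 0.4, 0.2], [1.0, 0.45, 0.25])
```

In practice the whole curve matters (for example, the epoch at which validation loss starts to rise), so a final-value check like this is only a first pass.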

49
Q

Are embeddings tokens or vectors of numerical data?

A

vectors of numerical data

50
Q

Can SageMaker Debugger react based on predefined thresholds for underutilization, overfitting, etc.?

A

Yes

51
Q

What is the maximum input size for a SageMaker serverless endpoint?

A

4 MB

52
Q

What is the maximum input size for a SageMaker real-time endpoint?

A

6 MB

53
Q

What is the maximum input size for a SageMaker asynchronous endpoint?

A

1 GB
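
The size and time limits quoted in these cards can be folded into a small decision helper. This is a study aid only: the function name is hypothetical, the 60-second limit for real-time endpoints is an assumption not stated in the cards, and real choices also depend on traffic patterns and cost.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def suggest_inference_option(payload_bytes, processing_seconds):
    """Return the first option whose payload and time limits cover the request."""
    # Serverless: up to 4 MB payload and 60 seconds of processing.
    if payload_bytes <= 4 * MB and processing_seconds <= 60:
        return "serverless endpoint"
    # Real-time: up to 6 MB payload (60-second limit assumed here).
    if payload_bytes <= 6 * MB and processing_seconds <= 60:
        return "real-time endpoint"
    # Asynchronous: up to 1 GB payload and 1 hour of processing.
    if payload_bytes <= 1 * GB and processing_seconds <= 3600:
        return "asynchronous endpoint"
    # Anything larger or longer goes to a batch transform job.
    return "batch transform"

small = suggest_inference_option(1 * MB, 5)
medium = suggest_inference_option(5 * MB, 5)
large = suggest_inference_option(500 * MB, 600)
huge = suggest_inference_option(5 * GB, 10)
```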

54
Q

How can you support custom training and inference code in SageMaker when using a pre-built model?

A

Use SageMaker Script Mode

55
Q

Do SageMaker Training Jobs track model versions?

A

No

56
Q

Can SageMaker pipeline steps be used in other pipelines?

A

Yes

57
Q

What does Hyperband hyperparameter tuning do?

A

It allocates more resources to promising configurations while stopping underperforming configurations early
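
Hyperband builds on successive halving, and the sketch below shows a single round of that idea only: score every configuration on a small budget, keep the better half, and double the budget for the survivors. The function names, the toy loss function, and the candidate learning rates are all made up for illustration; this is not SageMaker's implementation.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Repeatedly keep the best 1/eta of the configs while growing the budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Lower loss is better; underperformers are dropped early.
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        survivors = scored[: max(1, len(scored) // eta)]
        # Promising configs get more resources in the next round.
        budget *= eta
    return survivors[0]

def toy_loss(learning_rate, budget):
    """Hypothetical loss: best at learning_rate 0.1, improves with more budget."""
    return abs(learning_rate - 0.1) + 1.0 / budget

candidates = [0.001, 0.01, 0.1, 0.5, 1.0, 0.05, 0.2, 0.3]
best = successive_halving(candidates, toy_loss)
```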

58
Q

Can Glue ETL jobs change JSON to Parquet?

A

Yes.

59
Q

What is SageMaker Endpoint Data Capture?

A

It stores the incoming data and prediction which can be used for monitoring and comparison.

60
Q

What does cross validation help solve?

A

It helps in addressing both bias and variance

61
Q

What technique should be applied to ensure all features are treated equally by the model?

A

Min-Max scaling ensures that all numerical features are on the same scale, typically between 0 and 1, preventing features with larger ranges from dominating the model’s learning process.

62
Q

High variance results in the model ________________

A

Overfitting. One remedy is early stopping.

63
Q

What does Grid Search Hyperparameter tuning do?

A

It enumerates every possible combination from a pre-defined “grid” of hyperparameter values.
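
Enumerating a grid can be sketched with `itertools.product`. The hyperparameter names and values below are examples only, and `enumerate_grid` is a hypothetical helper.

```python
from itertools import product

def enumerate_grid(grid):
    """Return one dict per combination of the listed hyperparameter values."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

# Example grid: 3 x 2 x 2 = 12 combinations.
grid = {
    "max_depth": [3, 5, 7],
    "eta": [0.1, 0.3],
    "subsample": [0.8, 1.0],
}
combos = enumerate_grid(grid)
```

The number of combinations is the product of the list lengths, which is why grid search becomes expensive as hyperparameters are added.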

64
Q

What are Trackers in SageMaker Experiments?

A

They automatically log metadata for each trial, including hyperparameters, datasets, and changes to the code.

65
Q

What does Directed Acyclic Graph (DAG) in SageMaker Pipelines do?

A

The Directed Acyclic Graph (DAG) defines the sequence and dependencies between the steps of the pipeline, ensuring that steps are executed in a specific order.
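
How a DAG fixes execution order can be sketched with the standard library's `graphlib` (Python 3.9+). The step names are hypothetical and this is not the SageMaker Pipelines SDK, only the ordering idea behind it.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical pipeline).
dependencies = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
```

If the dependencies contained a cycle, `TopologicalSorter` would raise `graphlib.CycleError`, which mirrors why a pipeline graph has to be acyclic.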

66
Q

Can AWS Glue save files as RecordIO-protobuf?

A

No.
