AWS ML Associate Flashcards

1
Q

Bias metric: Measure the imbalance of positive outcomes between different facet values.

A

Difference in proportions of labels (DPL)
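
A minimal sketch of the calculation, assuming DPL is defined as q_a - q_d (the share of positive labels in facet a minus the share in facet d). The function name `dpl` and the toy data are illustrative, not part of any AWS SDK.

```python
def dpl(labels, facets, facet_a, facet_d, positive=1):
    """Difference in proportions of labels: q_a - q_d."""
    def positive_rate(facet):
        # Collect the labels that belong to the requested facet value.
        group = [y for y, f in zip(labels, facets) if f == facet]
        if not group:
            raise ValueError(f"no rows for facet {facet!r}")
        return sum(1 for y in group if y == positive) / len(group)

    return positive_rate(facet_a) - positive_rate(facet_d)


# Toy data: facet "a" has 3 of 4 positive labels, facet "d" has 1 of 4.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
facets = ["a", "a", "a", "a", "d", "d", "d", "d"]
result = dpl(labels, facets, "a", "d")
print(result)  # 0.5
```

A value near 0 suggests balanced positive outcomes; values toward +1 or -1 suggest one facet receives positive labels far more often.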

2
Q

Explainability method: Identify the difference in the predicted outcome as an input feature changes.

A

Partial dependence plots (PDPs)
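
A rough sketch of how the points of a partial dependence curve can be computed: fix one feature at each grid value for every row, then average the predictions. `toy_model` and the sample rows are hypothetical stand-ins for a trained model and a dataset.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)          # copy so the dataset is untouched
            modified[feature_index] = value
            preds.append(model(modified))
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(x):
    # Hypothetical model: prediction = 2 * feature0 + feature1.
    return 2.0 * x[0] + x[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
curve = partial_dependence(toy_model, rows, 0, [0.0, 1.0, 2.0])
print(curve)  # [3.0, 5.0, 7.0]
```

Plotting the grid against `curve` gives the PDP for feature 0; the slope of 2 reflects that feature's effect in the toy model.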

3
Q

Explainability method: Quantify the contribution of each feature in a prediction.

A

Shapley values
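
A small sketch of exact Shapley values for a toy case, computed by averaging each feature's marginal contribution over every ordering. This brute-force approach only works for a handful of features; tools such as SageMaker Clarify rely on approximations. `toy_value` is a hypothetical payoff function.

```python
from itertools import permutations


def shapley_values(value_fn, n_features):
    """Average marginal contribution of each feature over all orderings."""
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        coalition = set()
        for i in order:
            before = value_fn(frozenset(coalition))
            coalition.add(i)
            after = value_fn(frozenset(coalition))
            totals[i] += after - before
    return [t / len(orderings) for t in totals]


def toy_value(coalition):
    # Hypothetical payoff: feature 0 adds 10, feature 1 adds 20,
    # and the two together add an interaction bonus of 6.
    v = 0.0
    if 0 in coalition:
        v += 10
    if 1 in coalition:
        v += 20
    if 0 in coalition and 1 in coalition:
        v += 6
    return v


values = shapley_values(toy_value, 2)
print(values)  # [13.0, 23.0]
```

The values sum to 36, the payoff of the full feature set, and the interaction bonus is split evenly between the two features.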

4
Q

What should you use for data processing if it involves TensorFlow or PyTorch?

A

SageMaker

5
Q

What is the simplest way to prevent internet and data access to inference containers?

A

SageMaker network isolation mode
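
A hedged sketch of where the setting lives: the SageMaker `CreateModel` API accepts an `EnableNetworkIsolation` flag. The helper below only builds the request; the model name, image URI, S3 path, and role ARN are placeholders, not real resources.

```python
def build_isolated_model_request(model_name, image_uri, model_data_url, role_arn):
    """Build CreateModel arguments with network isolation switched on."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
        },
        "ExecutionRoleArn": role_arn,
        # Blocks outbound network calls from the container.
        "EnableNetworkIsolation": True,
    }


request = build_isolated_model_request(
    "demo-model",                                                # placeholder
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest",  # placeholder
    "s3://example-bucket/model.tar.gz",                          # placeholder
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",       # placeholder
)
# With AWS credentials configured, this could be sent with:
#   boto3.client("sagemaker").create_model(**request)
print(request["EnableNetworkIsolation"])
```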

6
Q

Create a baseline to monitor a SageMaker model’s feature attribution drift. For instance, you want it to weigh personal income over credit history for loan approval. How do you do this?

A

Create a SHAP baseline using the ‘ModelExplainabilityMonitor’ class. This produces a feature attribution baseline, and the monitor raises an alert when the observed feature attributions drift away from it.

7
Q

Tool used to check for bias and explainability in datasets and models

A

SageMaker Clarify

8
Q

Tool used to visualize and analyze intermediate tensors, for example to identify specific poor classifications in a CNN and adjust the model to improve performance.

A

SageMaker with TensorBoard

9
Q

How do you strip PII from text-based user interactions?

A

Amazon Comprehend

10
Q

RNN training: Exploding gradients causing a convergence issue. What feature can help address this issue?

A

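SageMaker Debugger. Its built-in rules detect training problems such as exploding or vanishing gradients. (SageMaker Training Compiler, by contrast, optimizes DL models to accelerate training by using ML GPU instances more efficiently; it does not diagnose convergence issues.)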

11
Q

What instance types are supported by AWS Neuron SDKs for real-time inference on streaming video?

A

Inferentia instances (Inf2 family)

12
Q

What is used to centralize and standardize model documentation?

A

SageMaker Model Cards

13
Q

SageMaker Serverless Inference: What is the biggest consideration when deciding whether to use provisioned concurrency?

A

Low latency (avoiding cold starts)

14
Q

(CloudWatch) What feature in the Logs Insights page is helpful in finding infrastructure monitoring through-lines in your query results?

A

The Patterns tab

15
Q

What is the primary purpose of Capacity Blocks for machine learning (ML)?

A

Reserve GPU instances for short-duration machine learning workloads on a future date.

16
Q

When using an embedded question to query a vector database for RAG, what should be returned?

A

The full text - not embeddings - of the nearest neighbor documents to enhance the query
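
A toy sketch of the retrieval step: rank stored documents by cosine similarity to the query embedding, then hand back their text rather than their vectors. The in-memory `store`, the three-dimensional embeddings, and the `retrieve` helper are all illustrative assumptions; a real system would use a vector database and a proper embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, k=2):
    """Return the text of the k nearest documents, not their embeddings."""
    ranked = sorted(
        store,
        key=lambda item: cosine(query_vec, item["embedding"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:k]]


store = [
    {"text": "Refunds are processed within 5 days.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Our office is closed on Sundays.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Return items with the original receipt.", "embedding": [0.8, 0.3, 0.1]},
]
context = retrieve([1.0, 0.0, 0.0], store, k=2)
print(context)
```

The returned strings are what gets appended to the prompt so the language model can ground its answer.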

17
Q

How can you use SageMaker Model Monitor to re-train your model?

A

Enable Data Capture, and use that data to retrain the model.

18
Q

Exploratory data visualization that can be used to identify hidden patterns (relationship analysis), such as an increase in specific item purchases or periods of frequent transactions

A

Heat Map

19
Q

Exploratory data visualization that helps with distribution analysis by binning ranges of data

A

Histogram
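
A short sketch of the binning behind a histogram, assuming fixed-width bins. The `histogram` helper and the sample amounts are illustrative only.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall into each fixed-width bin."""
    counts = {}
    for v in values:
        # Lower edge of the bin that contains v.
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


amounts = [3, 7, 12, 18, 21, 25, 29, 41]
bins = histogram(amounts, 10)
print(bins)  # {0: 2, 10: 2, 20: 3, 40: 1}
```

Each key is the lower edge of a bin; the missing key 30 shows a range with no observations.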

20
Q

Which storage service should be used when you need concurrent access from multiple Amazon EC2 instances to a Windows File Server for distributed training of the model?

A

Amazon FSx for Windows File Server

21
Q

Best way to improve Kinesis ingestion performance

A

Batching
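
A minimal sketch of client-side batching, assuming the `PutRecords` limit of 500 records per request (check the current service quotas). The stream name and record contents are placeholders.

```python
def batch_records(records, max_batch=500):
    """Yield slices of at most max_batch records, one slice per PutRecords call."""
    for i in range(0, len(records), max_batch):
        yield records[i:i + max_batch]


# Placeholder records in the shape expected by the Kinesis PutRecords API.
records = [
    {"Data": f"event-{n}".encode(), "PartitionKey": str(n % 8)}
    for n in range(1200)
]
batches = list(batch_records(records))
print([len(b) for b in batches])  # [500, 500, 200]

# With AWS credentials configured, each batch could be sent with:
#   boto3.client("kinesis").put_records(StreamName="example-stream", Records=batch)
```

Sending three requests instead of 1,200 single-record calls cuts per-request overhead, which is the point of batching.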

22
Q

What type of variables should be on the x-axis of a bar chart?

A

Categorical variables

23
Q

Data format for ingesting streamed, unstructured data

A

JSON lines
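
A quick illustration of the format: one self-contained JSON object per line, so a consumer can parse a stream record by record. The sample events are made up.

```python
import json

# Records can have different fields, which suits loosely structured data.
events = [
    {"user": "u1", "action": "click", "ts": 1},
    {"user": "u2", "action": "purchase", "ts": 2, "amount": 19.99},
]

# Write: serialize each record onto its own line.
payload = "\n".join(json.dumps(event) for event in events)
print(payload)

# Read: parse line by line, skipping any blank lines.
parsed = [json.loads(line) for line in payload.splitlines() if line.strip()]
```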

24
Q

When to choose Lustre over EFS

A

Only when very high performance, high data volumes, and extremely low latency are required. EFS is the go-to for distributed training, and S3 is the go-to for general storage.

25
Q

Which text feature engineering technique would categorize customer feedback as positive or negative?

A

N-gram
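
A small sketch of n-gram extraction with simple whitespace tokenization (an assumption; real pipelines usually use a proper tokenizer). Bigrams keep phrases such as "not good" together, which single words would miss.

```python
def ngrams(text, n):
    """Return the list of word n-grams in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bigrams = ngrams("The service was not good", 2)
print(bigrams)  # ['the service', 'service was', 'was not', 'not good']
```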

26
Q

How can you reduce the dimensionality of the data while retaining most of the variation?

A

Principal Component Analysis (PCA)

27
Q

How can you save time by storing curated features that can be accessed to train new models?

A

SageMaker Feature Store

28
Q

When to use EMR over Glue?

A

Real-time or streaming workloads, or when you need more control over the cluster and frameworks. Glue is primarily for serverless batch ETL.

29
Q

What ETL solution allows you to do anomaly detection on real-time data?

A

Apache Spark on EMR

30
Q

How do you address class imbalances in text-based datasets?

A

Text-based data augmentation, such as synonym replacement or text paraphrasing
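
A toy sketch of synonym replacement for oversampling a minority class. The `synonyms` dictionary is hand-written for illustration; in practice it would come from a thesaurus or an embedding model.

```python
import random


def synonym_augment(sentence, synonyms, rng):
    """Replace each word that has known synonyms with a randomly chosen one."""
    out = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        out.append(rng.choice(options) if options else word)
    return " ".join(out)


# Hypothetical synonym table.
synonyms = {"great": ["excellent", "fantastic"], "product": ["item"]}
rng = random.Random(0)  # seeded so runs are repeatable
augmented = synonym_augment("Great product fast delivery", synonyms, rng)
print(augmented)
```

Each augmented sentence keeps the original label, giving the minority class extra training examples.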

31
Q

How can you use AWS Glue Data Quality to assess data before training an ML model?

A

Data validation rules (rulesets written in DQDL, the Data Quality Definition Language)

32
Q

What does a Class Imbalance metric of .9 mean?

A

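It means that the advantaged facet is heavily overrepresented in the data. CI ranges from -1 to +1, and a value near +1 means almost all samples belong to that facet. CI measures representation only, not how favorable the outcomes are.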
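
A worked sketch, assuming the usual definition CI = (n_a - n_d) / (n_a + n_d), where n_a and n_d are the sample counts of facet a and facet d. The counts are invented to reproduce a value of 0.9.

```python
def class_imbalance(n_a, n_d):
    """Class imbalance: (n_a - n_d) / (n_a + n_d), in the range [-1, 1]."""
    total = n_a + n_d
    if total == 0:
        raise ValueError("both facets are empty")
    return (n_a - n_d) / total


# 95 samples in facet a and 5 in facet d give (95 - 5) / 100.
ci = class_imbalance(95, 5)
print(ci)  # 0.9
```

A balanced dataset gives 0, and swapping the counts gives -0.9.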

33
Q

Which data formats do most Amazon SageMaker algorithms support for training?

A

CSV and RecordIO-protobuf

34
Q

What is used to compare the distribution of labels in your data to the expected proportions?

A

Difference in Proportions of Labels (DPL)

35
Q

Which built-in algorithm is used for text classification and Word2Vec?

A

BlazingText

36
Q

Which built-in algorithm is a great choice for a supervised text translation model?

A

Sequence-to-Sequence algorithm

37
Q

Which deep learning frameworks are supported in Amazon SageMaker reinforcement learning (RL)? (2)

A

TensorFlow and Apache MXNet

38
Q

Difference between Lex v2 and Kendra?

A

Lex is a chat bot, Kendra is natural language search.

39
Q

What approach is referred to as script mode when using Amazon SageMaker?

A

Using a prebuilt framework container and its dependencies, while providing your own custom training script.

40
Q

Training: What is the most appropriate data ingestion mode for a large data set of historical data?

A

Pipe mode.

File mode downloads the whole dataset before training starts, so it will not be as performant as streaming with Pipe mode. Fast File mode also streams from S3 but works best when the data is read sequentially.

41
Q

The model has billions of parameters, and training it on a single GPU would be infeasible due to memory constraints. How do you fix this?

A

Model parallelism

42
Q

Main Use-Cases for Trainium instances?

A

Large language model (LLM) and natural language processing (NLP) training

43
Q

What is it called when you use multiple models of different types and aggregate their predictions in a heterogeneous model group?

A

Stacking

44
Q

Which hyperparameter tuning method is best for finding optimum hyperparameter values with limited compute resources?

A

Hyperband

45
Q

CNN is failing to generalize well, although it performs well on training data. What method can be used to help it adapt to unforeseen patterns?

A

Dropout
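
A minimal sketch of inverted dropout on a list of activations, written in plain Python for illustration; deep learning frameworks provide this as a built-in layer. The `dropout` helper is hypothetical.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep = 1.0 - rate
    # Dividing by `keep` preserves the expected value of each activation.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)  # seeded so runs are repeatable
hidden = [1.0, 2.0, 3.0, 4.0]
print(dropout(hidden, 0.5, rng))                  # training: some units zeroed
print(dropout(hidden, 0.5, rng, training=False))  # inference: unchanged
```

Randomly silencing units during training keeps the network from depending on any single activation, which tends to reduce overfitting.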