Machine Learning Engineering Associate 2 Flashcards

Data Transformation, Integrity and Feature Engineering

1
Q

Data Wrangler

A

Visual data preparation tool in Amazon SageMaker for exploring, transforming, and analyzing data

2
Q

Glue

A

Fully managed extract, transform, and load (ETL) service

3
Q

Glue DataBrew

A

Visual data preparation tool in AWS Glue for cleaning and normalizing data without writing code

4
Q

Kinesis

A

AWS platform for collecting, processing, and analyzing streaming data in real time

5
Q

Lambda

A

Serverless compute service for running code without provisioning servers

6
Q

SageMaker Ground Truth

A

Fully managed data labeling service for building accurate training datasets

7
Q

Class imbalance

A

Situation where classes in a dataset are not represented equally
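One common way to compensate for class imbalance is to weight each class inversely to its frequency. The sketch below is illustrative only: the function name and the sample labels are assumptions, and the formula n_samples / (n_classes * class_count) is one widely used heuristic, not the only option.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 90 majority-class rows and 10 minority-class rows.
labels = ["ok"] * 90 + ["fraud"] * 10
weights = inverse_frequency_weights(labels)
print(weights)  # the rare class receives the larger weight
```

Passing such weights to a loss function makes mistakes on the rare class cost more than mistakes on the common one.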

8
Q

Server-side encryption

A

Data encryption performed by the storage service

9
Q

Client-side encryption

A

Data encryption performed by the client before sending to storage

10
Q

Data anonymization

A

Removing or irreversibly masking personally identifiable information (PII) in datasets so that individuals cannot be re-identified

11
Q

Supervised learning

A

ML approach where the model is trained on labeled data

12
Q

Unsupervised learning

A

ML approach where the model is trained on unlabeled data

13
Q

Reinforcement learning

A

ML approach where an agent learns to make decisions by interacting with an environment

14
Q

Feature importance

A

Measure of how much each feature contributes to the model’s predictions
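Permutation importance is one of several ways to estimate feature importance: shuffle one feature column and measure how much the error grows. The sketch below assumes a regression setting with mean squared error; the function name, the toy model, and the data are all hypothetical.

```python
import random


def permutation_importance(predict, X, y, feature_index, seed=0):
    """Increase in mean squared error after shuffling one feature column."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    column = [row[feature_index] for row in X]
    rng.shuffle(column)
    shuffled = [
        row[:feature_index] + [value] + row[feature_index + 1:]
        for row, value in zip(X, column)
    ]
    return mse(shuffled) - baseline


# Toy model that depends only on feature 0 and ignores feature 1.
def toy_predict(row):
    return 3 * row[0]


X = [[i, (i * 7) % 5] for i in range(20)]
y = [3 * row[0] for row in X]
print(permutation_importance(toy_predict, X, y, feature_index=0))
print(permutation_importance(toy_predict, X, y, feature_index=1))
```

Shuffling a feature the model ignores leaves the error unchanged, so its importance is zero.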

15
Q

SHAP values

A

SHapley Additive exPlanations: a game-theoretic approach to explaining the output of machine learning models

16
Q

XGBoost

A

Optimized implementation of gradient-boosted decision trees, known for training speed and strong predictive performance

17
Q

Epoch

A

One complete pass through the entire training dataset
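The relationship between epochs, batch size, and training steps can be sketched with a little arithmetic. The dataset size, batch size, and epoch count below are made-up values for illustration.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of mini-batch updates needed to see every example once."""
    return math.ceil(num_examples / batch_size)


# Hypothetical training setup.
num_examples, batch_size, epochs = 1000, 32, 5
per_epoch = steps_per_epoch(num_examples, batch_size)
total_steps = per_epoch * epochs
print(per_epoch, total_steps)  # 32 160
```

The last batch of each epoch is smaller than the others whenever the batch size does not divide the dataset size evenly.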

18
Q

Early stopping

A

Technique to stop training when performance on a validation set stops improving
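A minimal sketch of the stopping rule, assuming a list of per-epoch validation losses is already available. The parameter names `patience` and `min_delta` are conventions borrowed from common frameworks, not something the card specifies.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 0-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Never triggered: training ran through the final epoch.
    return len(val_losses) - 1


# Hypothetical validation losses; improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]
print(early_stopping_epoch(losses, patience=2))  # 4
```

In practice the weights from the best epoch (epoch 2 here) are usually restored after stopping.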

19
Q

Distributed training

A

Spreading the training process across multiple compute resources

20
Q

Hyperparameter tuning

A

Process of finding the best combination of hyperparameters for a model

21
Q

Transfer learning

A

Using knowledge gained from solving one problem to solve a related problem

22
Q

Dropout

A

Regularization technique where the outputs of randomly selected neurons are set to zero at each training step to reduce overfitting
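The idea can be sketched on a plain list of activations. This version uses "inverted" dropout, which rescales the surviving activations during training so nothing needs to change at inference time; that variant is an assumption here, as are the function and parameter names.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time every neuron is kept.
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the illustration repeatable
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, training=False))
```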

23
Q

Weight decay

A

Regularization technique that adds a penalty on the magnitude of the weights (typically the L2 norm) to the loss function to prevent overfitting
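As a sketch, an L2 penalty of (weight_decay / 2) * sum(w^2) adds weight_decay * w to each gradient, which shrinks every weight toward zero on each update. The function name and the numbers below are illustrative assumptions.

```python
def sgd_step_with_weight_decay(weights, grads, lr=0.1, weight_decay=0.01):
    """One SGD update on loss + (weight_decay / 2) * sum(w ** 2)."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# With zero data gradient, only the penalty acts: each weight shrinks by
# the factor (1 - lr * weight_decay).
weights = [1.0, -2.0]
updated = sgd_step_with_weight_decay(weights, grads=[0.0, 0.0], lr=0.1, weight_decay=0.5)
print(updated)
```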

24
Q

Random search

A

Randomly sampling hyperparameters from a defined search space
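A minimal sketch of random search, assuming a search space of two hyperparameters and a toy objective standing in for validation loss. The hyperparameter names, ranges, and objective are hypothetical.

```python
import random


def random_search(objective, n_trials, seed=0):
    """Sample hyperparameters at random and keep the lowest-scoring set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform sample between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            # Uniform integer sample between 2 and 10 inclusive.
            "max_depth": rng.randint(2, 10),
        }
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for a validation loss; lower is better."""
    return (params["learning_rate"] - 0.01) ** 2 + (params["max_depth"] - 6) ** 2


best_params, best_score = random_search(toy_objective, n_trials=50)
print(best_params, best_score)
```

In a real tuning job the objective would train a model with the sampled values and return its validation metric.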

25
Q

Bayesian optimization

A

Using a probabilistic model to guide the search for optimal hyperparameters

26
Q

Confusion matrix

A

Table showing correct and incorrect predictions for each class

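A small sketch of how such a table can be built from true and predicted labels. The layout (rows are true classes, columns are predicted classes) is one common convention, and the labels below are made up.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical binary labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(matrix)  # [[3, 1], [1, 3]]
```

The diagonal holds the correct predictions; the off-diagonal cells are the false positives and false negatives.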
27
Q

F1 score

A

Harmonic mean of precision and recall

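The definition can be worked through from raw counts of true positives, false positives, and false negatives. The function name and the counts are illustrative.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 0.8, recall = 0.5.
print(f1_score(tp=8, fp=2, fn=8))
```

Because the harmonic mean is pulled toward the smaller value, the result (about 0.615) sits below the arithmetic mean of 0.65.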
28
Q

ROC

A

Receiver operating characteristic curve: plot of the true positive rate against the false positive rate at all classification thresholds

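The points of an ROC curve can be computed by sweeping the decision threshold over the model's scores and recording the false positive rate and true positive rate at each one. This is a sketch with made-up scores; it assumes binary 0/1 labels with at least one example of each class.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical labels and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(y_true, scores):
    print(round(fpr, 2), round(tpr, 2))
```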
29
Q

AUC

A

Area under the ROC curve; a measure of the ability of a classifier to distinguish between classes

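AUC can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition (ties count as half); the labels and scores are made up, and the quadratic loop is for clarity rather than speed.

```python
def auc(y_true, scores):
    """Fraction of positive/negative pairs where the positive scores higher."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores: 8 of the 9 pairs are ranked correctly.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(y_true, scores))
```

A value of 1.0 means every positive outranks every negative, and 0.5 corresponds to random ranking.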
30
Q

Overfitting

A

Model performs well on training data but poorly on unseen data

31
Q

Underfitting

A

Model performs poorly on both training and unseen data

32
Q

Concept drift

A

Changes in the underlying relationships between input and output variables

33
Q

Data drift

A

Changes in the statistical properties of the input data

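One possible way to quantify data drift for a single numeric feature is the Population Stability Index (PSI), which compares binned distributions of a baseline sample and a recent sample. PSI is chosen here only for illustration; the bin count, the small floor used to avoid log(0), and the sample data are assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a recent sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the baseline range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion to avoid division by zero and log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: the recent sample is shifted upward.
baseline = [i / 100 for i in range(100)]
recent = [v + 0.5 for v in baseline]
print(population_stability_index(baseline, baseline))  # 0.0
print(population_stability_index(baseline, recent))
```

Identical distributions give a PSI of zero, and the value grows as the recent sample moves away from the baseline.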
34
Q

A/B testing

A

Experiment where two variants of a model are compared to determine which performs better

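Whether variant B really outperforms variant A is usually decided with a statistical test. The sketch below uses a two-proportion z-test on success counts, which is one common choice when the metric is a rate; the function name and the traffic numbers are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z statistic, two-sided p-value) for the difference in rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: model A converts 100 of 1000 requests, model B 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```

A small p-value suggests the difference in rates is unlikely to be due to chance alone.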
35
Q

CloudTrail

A

Service that records API calls and other account activity in AWS

36
Q

Cost Explorer

A

Tool for visualizing, understanding, and managing AWS costs and usage over time

37
Q

IAM roles

A

IAM identities with attached permission policies that trusted users, applications, or AWS services can assume to obtain temporary credentials

38
Q

Security groups

A

Stateful virtual firewalls for controlling inbound and outbound traffic to AWS resources

39
Q

Network ACLs

A

Optional, stateless layer of security that acts as a firewall for controlling traffic in and out of subnets

40
Q

Least privilege access

A

Principle of giving users the minimum levels of access necessary to complete their tasks
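As an illustration of least privilege, the sketch below builds an IAM policy document that allows a single action on a single S3 prefix instead of granting broad access. The bucket name and prefix are hypothetical placeholders.

```python
import json

# Hypothetical policy: read-only access to one prefix of one bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-training-data/train/*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Anything not explicitly allowed, such as writing or deleting objects, remains denied by default.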