Machine Learning Engineering Associate 2 Flashcards

Data Transformation, Integrity and Feature Engineering

1
Q

Data Wrangler

A

Visual data preparation tool in Amazon SageMaker for exploring, transforming, and analyzing data

2
Q

Glue

A

Fully managed extract, transform, and load (ETL) service

3
Q

Glue DataBrew

A

Visual data preparation tool that makes it easy to clean and normalize data

4
Q

Kinesis

A

Family of AWS services for collecting, processing, and analyzing streaming data in real time

5
Q

Lambda

A

Serverless compute service for running code without provisioning servers

6
Q

SageMaker Ground Truth

A

Fully managed data labeling service for building accurate training datasets

7
Q

Class imbalance

A

Situation where classes in a dataset are not represented equally

8
Q

Server-side encryption

A

Data encryption performed by the storage service

9
Q

Client-side encryption

A

Data encryption performed by the client before sending to storage

10
Q

Data anonymization

A

Removing or encrypting personally identifiable information from datasets

11
Q

Supervised learning

A

ML approach where the model is trained on labeled data

12
Q

Unsupervised learning

A

ML approach where the model is trained on unlabeled data

13
Q

Reinforcement learning

A

ML approach where an agent learns to make decisions by interacting with an environment

14
Q

Feature importance

A

Measure of how much each feature contributes to the model’s predictions

15
Q

SHAP values

A

SHapley Additive exPlanations, a game-theoretic approach to explaining machine learning model outputs

16
Q

XGBoost

A

Gradient-boosted decision tree algorithm known for fast training and strong predictive performance

17
Q

Epoch

A

One complete pass through the entire training dataset

18
Q

Early stopping

A

Technique to stop training when performance on a validation set stops improving
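
Example: a minimal sketch of the stopping rule, assuming a per-epoch list of validation losses and a hypothetical `patience` parameter (the number of non-improving epochs tolerated before stopping). The function name is illustrative and does not come from any particular library.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Never ran out of patience: training uses every epoch.
    return len(val_losses)


stop_epoch = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.61], patience=2)
```

In practice the weights from the best epoch (epoch 3 here) are usually the ones kept.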

19
Q

Distributed training

A

Spreading the training process across multiple compute resources

20
Q

Hyperparameter tuning

A

Process of finding the best combination of hyperparameters for a model

21
Q

Transfer learning

A

Using knowledge gained from solving one problem to solve a related problem

22
Q

Dropout

A

Technique where randomly selected neurons are ignored during training
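
Example: a minimal sketch of "inverted" dropout on a plain list of activations, assuming a drop rate in [0, 1). Kept values are scaled by 1 / (1 - rate) so the expected activation is unchanged. The helper name `dropout` is illustrative, not a framework API.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    keep_probability = 1.0 - rate
    return [
        a / keep_probability if rng.random() < keep_probability else 0.0
        for a in activations
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
dropped = dropout([1.0] * 10, rate=0.5, rng=rng)
```

At inference time dropout is switched off and activations pass through unchanged.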

23
Q

Weight decay

A

Regularization technique that penalizes large weights, typically by adding an L2 penalty on the weights to the loss function, to reduce overfitting
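
Example: a minimal sketch of the common L2 form, assuming the penalty is `weight_decay` times the sum of squared weights. Some frameworks include a factor of 1/2 or apply the decay directly in the optimizer update instead, so treat the exact formula as an assumption.

```python
def l2_penalized_loss(data_loss, weights, weight_decay):
    """Add an L2 penalty on the weights to the unregularized loss."""
    penalty = weight_decay * sum(w * w for w in weights)
    return data_loss + penalty


total_loss = l2_penalized_loss(data_loss=0.5, weights=[1.0, -2.0], weight_decay=0.01)
```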

24
Q

Random search

A

Randomly sampling hyperparameters from a defined search space
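
Example: a minimal sketch of random search, assuming a search space given as hypothetical `(low, high)` ranges sampled uniformly and an objective to minimize. The toy objective stands in for a real train-and-validate run.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample each hyperparameter uniformly and keep the lowest-scoring trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Stand-in for validation loss; smallest when learning_rate is 0.1.
    return (params["learning_rate"] - 0.1) ** 2


space = {"learning_rate": (0.001, 1.0)}
best_params, best_score = random_search(toy_objective, space, n_trials=50)
```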

25
Q

Bayesian optimization

A

Using a probabilistic model of past trial results to guide the search for optimal hyperparameters

26
Q

Confusion matrix

A

Table showing correct and incorrect predictions for each class
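
Example: a minimal sketch that builds the table from two label lists, assuming rows are actual classes and columns are predicted classes (some tools use the opposite orientation). The labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
matrix = confusion_matrix(y_true, y_pred, labels=["cat", "dog"])
# matrix == [[1, 1], [1, 2]]: the diagonal holds the correct predictions
```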

27
Q

F1 score

A

Harmonic mean of precision and recall
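
Example: a minimal sketch computing F1 from raw counts, where precision = TP / (TP + FP) and recall = TP / (TP + FN). Returning 0.0 when a denominator is zero is a convention assumed here.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


score = f1_score(tp=8, fp=2, fn=4)  # precision 0.8, recall 2/3
```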

28
Q

ROC

A

Receiver operating characteristic curve: graph of true positive rate against false positive rate for a classification model at all classification thresholds

29
Q

AUC

A

Area under the ROC curve, a measure of the ability of a classifier to distinguish between classes (1.0 is perfect, 0.5 is no better than random guessing)
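
Example: a minimal sketch of AUC for a binary classifier, using the equivalent pairwise view: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. It assumes labels are 0 or 1 and that both classes are present.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


area = auc(labels=[0, 0, 1, 1], scores=[0.1, 0.4, 0.35, 0.8])
```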

30
Q

Overfitting

A

Model performs well on training data but poorly on unseen data

31
Q

Underfitting

A

Model performs poorly on both training and unseen data

32
Q

Concept drift

A

Changes in the underlying relationships between input and output variables

33
Q

Data drift

A

Changes in the statistical properties of the input data

34
Q

A/B testing

A

Experiment where two variants of a model are compared to determine which performs better

35
Q

CloudTrail

A

Service that records API calls and other account activity in AWS

36
Q

Cost Explorer

A

Tool for visualizing, understanding, and managing AWS costs and usage over time

37
Q

IAM roles

A

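IAM identities with attached permission policies that define what actions are allowed or denied in AWS; they are assumed by trusted users, applications, or services to obtain temporary credentials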

38
Q

Security groups

A

Stateful virtual firewalls for controlling inbound and outbound traffic to AWS resources such as EC2 instances

39
Q

Network ACLs

A

Optional, stateless layer of security that acts as a firewall for controlling traffic in and out of subnets

40
Q

Least privilege access

A

Principle of giving users the minimum levels of access necessary to complete their tasks