Domain 1 Flashcards

1
Q

Explain the AI relationship Venn diagram

A

Nested circles: Artificial Intelligence contains Machine Learning, which contains Deep Learning

2
Q

Predictions that AI makes based on historical data

A

Inference

3
Q

When AI recognizes a deviation from the patterns seen in historical data

A

Anomaly detection

4
Q

What are some AWS services that could provide structured input data for training ML models?

A

RDS, Redshift

5
Q

What are some AWS services that could provide semi-structured input data for training ML models?

A

DynamoDB, Amazon DocumentDB (with MongoDB compatibility)

6
Q

For structured, semi-structured, unstructured, and time-series data, where should you export data for training models?

A

S3

7
Q

In machine learning, what describes the relationship between inputs and outputs?

A

An algorithm

8
Q

Describe the machine learning training process

A

Known data -> features -> algorithm -> output

9
Q

Describe the machine learning inference process, which comes after training

A

new data -> features -> model -> output
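
A minimal Python sketch of the training and inference flows, under stated assumptions: the fish-length data, the "nearest mean" algorithm, and the function names are all hypothetical and only illustrate the order of the steps.

```python
def extract_features(record):
    """Turn a raw record into a numeric feature vector."""
    return [float(record["length_cm"])]


def train(known_data):
    """Algorithm: learn the mean feature value per label (the model artifacts)."""
    sums, counts = {}, {}
    for record, label in known_data:
        x = extract_features(record)[0]
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, record):
    """Inference code: return the label whose learned mean is closest."""
    x = extract_features(record)[0]
    return min(model, key=lambda label: abs(model[label] - x))


# Training: known data -> features -> algorithm -> model
known_data = [
    ({"length_cm": 30}, "fish"),
    ({"length_cm": 34}, "fish"),
    ({"length_cm": 3}, "not fish"),
    ({"length_cm": 5}, "not fish"),
]
model = train(known_data)

# Inference: new data -> features -> model -> output
prediction = predict(model, {"length_cm": 28})
```

The same `extract_features` step runs in both flows, which is why new data must be shaped like the training data.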

10
Q

What two pieces combine to create a deployable model?

A

Inference code + model artifacts

11
Q

What type of inferencing provides low-latency, high throughput, and a persistent endpoint (also usually more expensive)?

A

Real-time

12
Q

What type of inferencing is performed offline, uses large datasets, and happens on an infrequent schedule?

A

Batch transform

13
Q

Training your model with data that is pre-labeled (pictures with fish/not fish)

A

Supervised Learning

14
Q

What is the challenge with supervised learning?

A

You need a lot of data and people to label it, which takes time and money

15
Q

What is Amazon SageMaker Ground Truth?

A

A service that helps you label training data

16
Q

What process uses data that has features but is not labeled and is good for pattern recognition, anomaly detection, and grouping data into categories?

A

Unsupervised learning

17
Q

What process uses both supervised and unsupervised learning, provides rewards to an agent when criteria are met, uses trial and error, allows the agent to make mistakes in order to learn, and has an end goal?

A

Reinforcement learning

18
Q

What service does Ground Truth use to crowdsource labeling through an affordable workforce?

A

Amazon Mechanical Turk

19
Q

A model telling you a fish is not a fish because it is out of water, a result of training being too specific and not having enough varied examples, is called what?

A

Overfitting

20
Q

What is it called when a model cannot determine a meaningful relationship between the input and output data, which happens when you haven’t trained the model long enough or with a large enough dataset?

A

Underfitting

21
Q

What is bias?

A

When a model discriminates against a specific group because of a lack of fair representation in the data used to train the model

22
Q

Also, if a model is showing bias, what can be done with features?

A

The weight of features that are introducing noise can be directly adjusted by the data scientists. For example, they could remove gender from consideration entirely.

23
Q

What should be identified at the beginning, before creating a model, to guard against issues such as age and sex discrimination?

A

Fairness constraints

24
Q

A type of machine learning that uses algorithmic structures called neural networks.

A

Deep learning

25
Q

The three types of layers in deep neural networks

A

input layer, several hidden layers, and an output layer of nodes

26
Q

Deep learning can excel at tasks like

A

image classification and natural language processing where there is a need to identify the complex relationship between data objects

27
Q

A big advantage of deep learning models for computer vision is that

A

they don’t need the relevant features given to them.

28
Q

Traditional machine learning algorithms will generally perform well and be efficient when

A

it comes to identifying patterns in structured, labeled data. Examples include classification and recommendation systems.

29
Q

On the other hand, deep learning solutions are more suitable for

A

unstructured data like images, videos, and text. Tasks for deep learning include image classification and natural language processing, where there is a need to identify the complex relationships between pixels and words.
Of the two approaches, only deep learning uses neural networks to simulate human intelligence.

30
Q

Gen AI uses transformer neural networks, which change an input sequence (in Gen AI, the prompt) into an output sequence (the response to your prompt). Earlier neural network architectures process the elements of a sequence one word at a time. Transformers process the sequence in parallel, which speeds up training and allows much bigger datasets to be used. They outperform other ML approaches to natural language processing. They excel at understanding human language, so they can read long articles and summarize them. They are also great at generating text similar to what a human would write. As a result, they are good at language translation and even at writing original stories, letters, articles, and poetry. They even know computer programming languages and can write code for software developers.

A

Gen AI Notes

31
Q

Consider these use cases for AI/ML

A

Increasing business efficiency
Solving complex problems
Making better decisions

32
Q

Consider AI/ML alternatives when

A

Costs outweigh benefits
Models cannot meet interpretability requirements
(you can’t know how a neural network made a decision, so use a rules-based system instead)
Systems must be deterministic (produce the same output for the same input) rather than probabilistic

33
Q

If your dataset consists of features or attributes as inputs with labeled target values as outputs, then you have a supervised learning problem. In this type of problem, you train your model with data containing known inputs and outputs.

A

supervised learning problem

34
Q

If your target values are categorical, for example, one or more discrete values, then you have a

A

classification problem (a type of supervised learning).

35
Q

If these target values you’re trying to predict are mathematically continuous, then you have a

A

regression problem.

36
Q

Binary classification

A

assigns an input to one of two mutually exclusive classes based on the input attributes.

37
Q

Multiclass classification

A

assigns an input to one of several classes based on the input attributes. An example is the prediction of the topic most relevant to a tax document.

38
Q

When your target values are mathematically continuous, then you have a

A

regression problem. Regression estimates the value of a dependent target variable based on one or more other variables.

39
Q

If we have multiple independent variables, such as weight and age, then we have a

A

multiple linear regression problem.
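
Written out, a multiple linear regression with these two independent variables might take the form below. The symbols are assumptions for illustration: y is the target value, the β terms are the learned coefficients, and ε is the error term.

```latex
y = \beta_0 + \beta_1 \cdot \text{weight} + \beta_2 \cdot \text{age} + \varepsilon
```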

40
Q

Cluster analysis is

A

a class of techniques that are used to classify data objects into groups, called clusters. It attempts to find discrete groupings within data. Members of a group are as similar as possible to one another, and as different as possible from members of other groups.

41
Q

In which technique do you define the features or attributes that you want the algorithm to use to determine similarity, then select a distance function to measure similarity and specify the number of clusters, or groups, you want for the analysis?

A

Cluster analysis
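
The three choices (features, distance function, number of clusters) can be sketched with a bare-bones k-means loop. This is an illustrative Python sketch, not a production implementation: the feature values are made up, and the naive "first k points" initialization is an assumption chosen to keep the example deterministic.

```python
def distance(a, b):
    """Distance function: Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignments)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Hypothetical features per object: (weight_kg, length_cm)
points = [(1.0, 10.0), (9.0, 80.0), (1.2, 12.0), (8.5, 78.0), (0.9, 11.0)]
centroids, assignments = k_means(points, k=2)
```

No labels are supplied anywhere, which is what makes this unsupervised.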

42
Q

The identification of rare items, events, or observations in the data that raise suspicion because they differ significantly from the rest of the data

A

Anomaly detection
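
One simple way to flag values that "differ significantly" is a z-score test. The Python sketch below assumes made-up transaction counts and an arbitrary threshold of 2 standard deviations; real services use more sophisticated methods.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily transaction counts with one unusual day
daily_transactions = [102, 98, 101, 97, 100, 103, 99, 250]
anomalies = find_anomalies(daily_transactions)
```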

43
Q

This service provides facial recognition, object detection, text detection, and content moderation

A

Amazon Rekognition

44
Q

Extracts text, handwriting, and other data from scanned documents

A

Amazon Textract

45
Q

Extracts key phrases, entities, and sentiment

A

Amazon Comprehend

46
Q

This service is pretrained to find PII

A

Amazon Comprehend

47
Q

Converts Text to Speech

A

Amazon Polly

48
Q

Converts Speech (Live and recorded) to Text

A

Amazon Transcribe

49
Q

This AWS service provides intelligent document search and responds to questions with appropriate context

A

Amazon Kendra

50
Q

Personalized product recommendations

A

Amazon Personalize

51
Q

Translates between 75 languages, built on a neural network

A

Amazon Translate

52
Q

Provided with historical time series data, this AWS service predicts future points in time series

A

Amazon Forecast

53
Q

Detects fraud through checking online transactions, product reviews, checkout and payments, new accounts, and account takeover

A

Amazon Fraud Detector

54
Q

What is the first step in the AI/ML process?

A

Identify the business goal

55
Q

When identifying the business goal, what two things should you do?

A

define success criteria
align stakeholders

56
Q

What is the second step in the AI/ML process?

A

Frame the ML problem

57
Q

When framing the ML problem, what four things should you do

A

Define the ML task, including inputs, outputs and metrics
Determine feasibility
Start with the simplest model options
Do a cost benefit analysis

58
Q

When approaching model selection, what should you do?

A

Start with the simplest options, such as hosted AI/ML services and pre-trained models. Fully customize only if needed.

59
Q

To collect training data, you need to know these three things

A

Data sources
Data ingestion, including ETL
Labels

60
Q

ETL includes

A

Gathering, transforming, and storing data in a new central location

61
Q

What is likely one of the most time intensive parts of processing data?

A

Labeling, as you likely don’t already have the data labeled and need to do that

62
Q

When pre-processing data, what types of things are you doing?

A

Looking for missing data, masking PII data, cleaning it, and splitting it.
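
A hedged Python sketch of the first three steps (splitting is left out here). The record layout, the field names `note` and `label`, and the email-only PII pattern are assumptions made for illustration.

```python
import re

# Simplified pattern for email addresses; real PII detection covers far more.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def preprocess(records):
    """Drop incomplete records, mask emails, and normalize labels."""
    cleaned = []
    for record in records:
        # Missing data: skip records with any empty field.
        if any(value is None or value == "" for value in record.values()):
            continue
        row = dict(record)
        # Mask PII: replace email addresses with a placeholder token.
        row["note"] = EMAIL_PATTERN.sub("[EMAIL]", row["note"])
        # Clean: normalize whitespace and case in the label.
        row["label"] = row["label"].strip().lower()
        cleaned.append(row)
    return cleaned


records = [
    {"note": "Contact jane@example.com for pickup", "label": " Fish "},
    {"note": "No contact given", "label": None},
    {"note": "Caught near the pier", "label": "NOT FISH"},
]
cleaned = preprocess(records)
```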

63
Q

What are the recommended splits for data?

A

80% for training the model
10% for model evaluation
10% for final testing before production deployment
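
A minimal Python sketch of such a split, assuming an in-memory list of rows. The function name and the fixed seed are illustrative choices, not part of any AWS API.

```python
import random


def split_dataset(rows, train=0.8, validation=0.1, seed=42):
    """Shuffle rows, then cut them into train / validation / test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains is the test set
    )


rows = list(range(1000))
train_set, validation_set, test_set = split_dataset(rows)
```

Shuffling before cutting helps keep each subset representative of the whole dataset.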

64
Q

Feature engineering

A

Selecting which characteristics of the dataset should be used as features to train the model. This is the subset that is relevant and contributes to minimizing the error rate of a trained model. You should reduce the features in your training data to only those that are needed for inference. Features can be combined to further reduce the number of features. Reducing the number of features reduces the amount of memory and computing power required for training.

65
Q

What service is a cloud-optimized ETL service, contains its own data catalog, and has built-in transformations (dropping duplicate records, splitting data, etc.)?

A

AWS Glue

66
Q

Describe the AWS Glue Data Catalog

A

Crawls source systems, discovers metadata and schemas, understands the source data. Only metadata is stored in the data catalog

67
Q

For AWS Glue ETL jobs, what is a common destination location for transformed data?

A

S3

68
Q

What service has data quality rules, visualization, and data preparation?

A

AWS Glue DataBrew

69
Q

What service helps you prepare a well-labeled dataset for use in supervised learning? It uses machine learning to label the items it can, then Amazon Mechanical Turk for those it can’t.

A

Amazon SageMaker Ground Truth

70
Q

What service can you use to simplify the feature engineering process, to import/prepare/transform/visualize and analyze features?

A

Amazon SageMaker Canvas

71
Q

Amazon SageMaker Feature Store

A

Amazon SageMaker Feature Store is a centralized store for features and associated metadata, so features can be easily discovered and reused. Feature Store makes it easy to create, share, and manage features for ML development. Feature Store accelerates this process by reducing repetitive data processing and curation work required to convert raw data into features for training an ML algorithm. You can create workflow pipelines that convert raw data into features and add them to feature groups.

72
Q

A machine learning algorithm updates a set of numbers in such a way that the inference matches an expected output. These numbers are

A

Parameters

73
Q

True or false: The training process requires you to run one training run.

A

False. This can’t be done in one iteration, because the algorithm has not learned yet. It has no knowledge of how changing weights will shift the output closer toward the expected value. Therefore, it watches the weights and outputs from previous iterations, and shifts the weights in a direction that lowers the error in the generated output. This iterative process stops either when a defined number of iterations have been run, or when the change in error is below a target value.
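
The iterative loop and its two stopping conditions can be sketched with plain gradient descent on a one-weight model. This is an illustrative Python sketch: the data, learning rate, and tolerance are assumptions, and real training algorithms are far more elaborate.

```python
def train(xs, ys, learning_rate=0.01, max_iterations=1000, tolerance=1e-9):
    """Fit y = weight * x by repeatedly nudging the weight to lower the error."""
    weight = 0.0
    previous_error = float("inf")
    for iteration in range(1, max_iterations + 1):
        predictions = [weight * x for x in xs]
        error = sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(xs)
        # Stop early when the change in error falls below the target value.
        if abs(previous_error - error) < tolerance:
            break
        # Shift the weight in the direction that lowers the error.
        gradient = sum(
            2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)
        ) / len(xs)
        weight -= learning_rate * gradient
        previous_error = error
    return weight, iteration


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # underlying relationship: y = 2x
weight, iterations_run = train(xs, ys)
```

Here the loop ends well before `max_iterations` because the error stops changing once the weight is close to 2.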

74
Q

What is known as running experiments?

A

There are usually multiple algorithms to consider for a model. The best practice is to run many training jobs in parallel, by using different algorithms and settings. This is known as running experiments, which helps you land on the best-performing solution

75
Q

Each algorithm has a set of external parameters that affect its performance. These are set by the data scientists before training the model. These include adjusting things like how many neural layers and nodes there will be in a deep learning model. The optimal values can only be determined by running multiple experiments with different settings.

A

Hyperparameters

76
Q

To run a training job, what do you give SageMaker?

A

The URL of the S3 bucket containing your training data. You also specify the compute resources you want to use for training and the output bucket for the model artifacts. You specify the algorithm by giving SageMaker the path to a Docker container image that contains the training algorithm. The image is stored in Amazon Elastic Container Registry (Amazon ECR) and can be a SageMaker-provided algorithm or deep learning container, or your own custom container with a custom algorithm. You also need to set the hyperparameters required by the algorithm.
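
As a rough Python sketch, those inputs map onto the request structure of the SageMaker CreateTrainingJob API. The helper function, bucket names, account ID, image URI, and instance settings below are hypothetical placeholders, and the IAM role is an additional required input not mentioned on the card. The code only builds the request and never calls AWS.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri, hyperparameters):
    """Assemble a CreateTrainingJob-style request dictionary."""
    return {
        "TrainingJobName": job_name,
        # Algorithm: path to the Docker image in Amazon ECR.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # Training data location in S3.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # Output bucket for the model artifacts.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Compute resources (placeholder values).
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # The API expects hyperparameter values as strings.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
    }


request = build_training_job_request(
    job_name="fish-classifier-001",
    image_uri="111122223333.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    hyperparameters={"epochs": 10, "learning_rate": 0.1},
)
# With credentials configured, this could be submitted via
# boto3.client("sagemaker").create_training_job(**request).
```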

77
Q

A capability of Amazon SageMaker that lets you create, manage, analyze, and compare your machine learning experiments. An experiment is a group of training runs, each with different inputs, parameters, and configurations. It features a visual interface to browse your active and past experiments, compare runs on key performance metrics, and identify the best-performing models.

A

Amazon SageMaker Experiments

78
Q

Amazon SageMaker automatic model tuning (AMT)

A

Also known as hyperparameter tuning, it finds the best version of a model by running many training jobs on your dataset. To do this, AMT uses the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that create the model that performs best, as measured by a metric that you choose.
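
The idea can be sketched in Python as an exhaustive search over small hyperparameter ranges. This is not the SageMaker API or AMT's actual search strategy: `validation_score` is a hypothetical stand-in for "run a training job and return the chosen metric".

```python
from itertools import product


def validation_score(learning_rate, depth):
    """Hypothetical metric; higher is better, peaking at (0.1, 4)."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 4) ** 2


# Ranges of hyperparameters that you specify.
ranges = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4, 8]}

best_score, best_params = float("-inf"), None
for lr, depth in product(ranges["learning_rate"], ranges["depth"]):
    score = validation_score(lr, depth)  # stands in for one training job
    if score > best_score:
        best_score = score
        best_params = {"learning_rate": lr, "depth": depth}
```

Each combination corresponds to one training job, which is why tuning cost grows quickly with the number of ranges.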

79
Q

What is the most cost efficient way to run your model?

A

Batch inference

80
Q

What are the ways you can deploy your model?

A

Batch inference
Real-time inference
Self-managed
Hosted (SageMaker inference)

81
Q

What options are available for Amazon SageMaker inference?

A

Batch transform (offline inference, large datasets)
Asynchronous (long processing times, large payloads)
Serverless (intermittent traffic, periods of no traffic)
Real-time (live predictions, sustained traffic, low latency, consistent performance)

82
Q

What service can you use to monitor your model and be notified of suspected drift in your deployed model?

A

Amazon SageMaker Model Monitor

83
Q

What is MLOps?

A

Infrastructure as code (IaC)
Rapid experimentation
Version control
Active performance monitoring
Automatic model retraining and validation when there are data and code changes

84
Q

What are the benefits of MLOps

A

Productivity
Repeatability
Reliability
Auditability
Data and model quality

85
Q

What service allows you to build and manage model pipelines, define them with the Python SDK or JSON, and automate data processing, training jobs, model creation, and model registration?

A

Amazon SageMaker Model Building Pipelines

86
Q

Name four repository options

A

CodeCommit
SageMaker Model Registry
SageMaker Feature Store
Third party

87
Q

Name four options for orchestration

A

SageMaker Pipelines
Amazon Managed Workflows for Apache Airflow
AWS Step Functions
Third party

88
Q

What is a confusion matrix?

A

A confusion matrix is a table, typically with the actual values across the top and the predicted values on the left. It is used to summarize the performance of a classification model when it’s evaluated against test data.
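
For a binary fish / not fish classifier, the four cells of the matrix can be tallied as in this Python sketch. The labels and predictions are made-up sample data.

```python
def confusion_counts(actual, predicted, positive="fish"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (TP) or a false alarm (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss (FN) or correct (TN).
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual = ["fish", "fish", "fish", "not fish", "not fish", "not fish"]
predicted = ["fish", "fish", "not fish", "fish", "not fish", "not fish"]
counts = confusion_counts(actual, predicted)
```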

89
Q

What is accuracy?

A

Accuracy is simply the percentage of correct predictions.

90
Q

What is precision?

A

Precision measures how well an algorithm predicts true positives out of all the positives that it identifies. The formula is the number of true positives divided by the number of true positives, plus the number of false positives.
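
The formula translates directly to code. In this Python sketch the counts are made-up, and returning 0.0 when nothing was predicted positive is a convention chosen here to avoid dividing by zero.

```python
def precision(true_positives, false_positives):
    """TP / (TP + FP): how many predicted positives were actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return 0.0  # assumed convention when no positives were predicted
    return true_positives / predicted_positives


score = precision(true_positives=8, false_positives=2)
```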

91
Q

What is Recall (TPR)?

A

If we want to minimize the false negatives, then we can use a metric known as recall. For example, we want to make sure that we don’t miss if someone has a disease and we say they don’t. The formula is the number of true positives divided by the number of true positives plus the number of false negatives.
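
A short Python sketch of the recall formula, using made-up screening numbers: 50 people have the disease and the model catches 45 of them.

```python
def recall(true_positives, false_negatives):
    """TP / (TP + FN): how many actual positives the model found."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        return 0.0  # assumed convention when there are no actual positives
    return true_positives / actual_positives


score = recall(true_positives=45, false_negatives=5)
```

The 5 missed cases are the false negatives that recall is designed to expose.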

92
Q

Can you optimize a model for both precision and recall?

A

No, but you can use F1

93
Q

What is F1?

A

Combines recall and precision into one figure (their harmonic mean), letting you balance the two
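
F1 is the harmonic mean of precision and recall. The Python sketch below uses made-up input values to show how a lopsided model scores lower than a balanced one.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention to avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.8, 0.8)  # both metrics equally good
lopsided = f1_score(1.0, 0.2)  # perfect precision, poor recall
```

The lopsided case has a simple average of 0.6, but its F1 is much lower, because the harmonic mean is pulled toward the weaker of the two metrics.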

94
Q

What is the false positive rate?

A

The number of false positives divided by the sum of the false positives and true negatives. In our example, this metric shows how the model handles the images that are not fish: it measures how many of the images that were not fish were predicted to be fish.

95
Q

What is the true negative rate?

A

Closely related to the false positive rate is the true negative rate, which is the ratio of the true negatives to the sum of the false positives and true negatives. It is a measure of how many of the predictions were of not fish out of the images that were not fish.
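
Side by side, the two rates share a denominator (all actual negatives), so they always sum to 1. FP and TN here denote the counts of false positives and true negatives.

```latex
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{TNR} = \frac{TN}{FP + TN}, \qquad
\mathrm{FPR} + \mathrm{TNR} = 1
```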

96
Q

What is the receiver operating characteristic (ROC)?

A