Filler Flashcards

Question 1

Q

SageMaker Canvas

Answer

A

No-code solution that brings together data preparation, model selection, and deployment. Uses Data Wrangler for data preparation and Autopilot for data cleansing and ML model selection.

Question 2

Q

SageMaker JumpStart

Answer

A

Evaluate, compare, and select foundation models and algorithms. Also provides customizable reference architectures.

Question 3

Q

If a question involves scanned PDFs whose embedded images need to be analyzed, this service is used

Answer

A

Rekognition

Question 4

Q

Knowledge cutoff

Answer

A

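This is a concern specific to generative AI: the model has no knowledge of anything that happened after its training data was collected.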

Question 5

Q

Data collection is imperative for demand prediction use cases.

Question 6

Q

Instruction dataset fine-tuning

Answer

A

Uses prompt-response pairs: specific instructions paired with the responses expected for them.

Question 7

Q

Domain adaptation fine-tuning

Answer

A

What you would think: the model is fine-tuned on domain-specific data so that it adapts to that domain.

Question 8

Q

Underfitting is matched with

Answer

A

High bias

Question 9

Q

MAPE and MAE are good metrics for these use cases

Answer

A

Monthly revenue
Forecasting
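
A minimal sketch of how MAPE (mean absolute percentage error) is computed for a forecast, shown next to mean absolute error (MAE), a closely related forecasting metric. The function names and the revenue figures are hypothetical and only for illustration.

```python
def mae(actual, forecast):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error; undefined when an actual value is 0."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly revenue figures, for illustration only.
actual_revenue = [100.0, 200.0, 400.0]
forecast_revenue = [110.0, 180.0, 400.0]

print(mae(actual_revenue, forecast_revenue))   # 10.0
print(mape(actual_revenue, forecast_revenue))  # about 6.67 (percent)
```

MAPE is scale-free, which makes it easy to compare across products or months, while MAE stays in revenue units.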

Question 10

Q

Accuracy and F1 are good metrics for

Answer

A

Classification

Question 11

Q

Which factors can directly influence the latency of a machine learning model’s inference? (Select TWO.)

Answer

A

Length of the generated output sequence
Length of the input data sequence

Question 12

Q

Chain of Thought prompting

Answer

A

The primary advantage of chain-of-thought prompting lies in its ability to produce detailed, sequential explanations, making it an effective tool for scenarios requiring deep reasoning and clear communication.

Question 13

Q

Tree-of-thought is a technique that involves organizing information in a hierarchical structure, just like a decision tree.

Answer

A

Tree-of-thought helps visualize relationships and pathways rather than breaking down complex problems into sequential, explainable steps.

Question 14

Q

Directional-stimulus

Answer

A

involves guiding the model’s responses based on specific cues or directions. This technique can influence the direction or focus of the responses but does not specifically enhance the model’s ability to deliver structured, step-by-step explanations.

Question 15

Q

Binary classification

Answer

A

is a supervised machine learning model specifically designed to distinguish between two distinct categories or classes. This model is widely used in various applications, such as sentiment analysis, fraud detection, and medical diagnosis, where the objective is to classify data points into one of two predefined categories.

Question 16

Q

Multiclass classification model

Answer

A

This option applies only when there are more than two categories to predict.

Question 17

Q

Ensemble learning

Answer

A

combines multiple models to improve overall performance and robustness.
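
A minimal sketch of one simple way to combine models, majority voting. This is only one ensemble strategy among several, and the three models and their predictions below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs from three models on the same four inputs.
model_a = ["spam", "ham", "spam", "ham"]
model_b = ["spam", "spam", "ham", "ham"]
model_c = ["ham", "spam", "spam", "ham"]

ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)  # ['spam', 'spam', 'spam', 'ham']
```

Each individual model is wrong somewhere in this toy data, but the vote smooths out the disagreements.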

Question 18

Q

Root mean squared error (RMSE)

Answer

A

This metric is typically used for regression models, not classification models.
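
A minimal sketch of the RMSE calculation on numeric predictions, which is why it fits regression rather than class labels. The function name and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean of the squared errors."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical regression targets and predictions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [13.0, 16.0, 30.0, 40.0]

print(rmse(actual, predicted))  # 2.5
```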

Question 19

Q

Recall

Answer

A

This metric measures the proportion of actual positive instances that the model correctly identifies: true positives divided by the sum of true positives and false negatives.

Question 20

Q

Precision

Answer

A

is incorrect because it is a metric that measures the proportion of predicted positive instances that are actually positive. Precision is particularly valuable in scenarios where the cost of false positives is high, such as in spam detection or targeted advertising.
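
A minimal sketch contrasting precision with recall, computed from confusion-matrix counts. The function names and the spam-filter counts are hypothetical.

```python
def precision(tp, fp):
    """Of everything predicted positive, the share that is truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything truly positive, the share the model found."""
    return tp / (tp + fn)


# Hypothetical spam-filter counts: true positives, false positives, false negatives.
tp, fp, fn = 8, 2, 8

print(precision(tp, fp))  # 0.8
print(recall(tp, fn))     # 0.5
```

Here the filter is usually right when it flags a message, yet it still misses half of the actual spam.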

Question 21

Q

Tokenization vs embeddings

Answer

A

Tokenization involves breaking down a sequence of text into smaller units called tokens, such as words, subwords, or characters. Embeddings are numerical vectors that represent those tokens.
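
A minimal sketch of the two steps, assuming whitespace tokenization and a randomly initialized embedding table. Real tokenizers usually split into subwords and real embeddings are learned during training, so every name and value here is illustrative.

```python
import random

sentence = "the cat sat on the mat"

# Tokenization: split the text into tokens (whitespace words in this sketch).
tokens = sentence.split()

# Map each distinct token to an integer ID.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]

# Embeddings: one vector per vocabulary entry. Random numbers stand in
# for the values a model would learn.
rng = random.Random(0)
embedding_dim = 4
embedding_table = [[rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab]

vectors = [embedding_table[i] for i in token_ids]

print(tokens)     # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(token_ids)  # [4, 0, 3, 2, 4, 1]
print(len(vectors), len(vectors[0]))  # 6 4
```

Both occurrences of "the" share the same ID and therefore the same vector.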

Question 22

Q

Amazon Textract is a fully managed AWS service that uses machine learning to extract written text, handwriting, tables, and other information from scanned documents and photos. It is used to process documents in formats that include PDFs, JPEGs, and PNGs, making it an effective solution for enterprises that manage large amounts of documents. Textract can recognize and extract critical features, including names, dates, amounts, and other structured data from various documents, including contracts, forms, and invoices, making the data machine-readable and suitable for further processing.

Question 23

Q

Amazon Kendra

Answer

A

This is an intelligent search service designed to help users find information across various data sources. While it can retrieve unstructured data, it does not focus on transforming or structuring data for analysis.

Question 24

Q

AWS Glue is a fully managed extract, transform, and load (ETL) service that can categorize, clean, and transform unstructured data, like medical records, into a structured format. It simplifies the process of preparing data for analysis, including healthcare research and predictive analytics, by automating schema discovery and code generation.

Question 25

Q

With SageMaker JumpStart, you can quickly evaluate, compare, and select pre-trained machine learning models based on predefined quality and reliability metrics. These models can be customized for your specific use case with your own data, and can be easily deployed into production using the user interface or SDK. In addition, you can share models and notebooks within your organization to streamline model building and deployment.

Question 26

Q

Amazon SageMaker Autopilot is incorrect because it primarily automates the process of building, training, and tuning machine learning models. It is designed to make it easier to create ML models without needing deep expertise in ML. However, it does not specifically provide pre-built solutions and models like SageMaker JumpStart does.

Question 27

Q

Generative Adversarial Networks (GANs) are a type of machine learning model designed to generate new data by learning from an existing dataset. GANs consist of two neural networks, the generator, and the discriminator, that work together in a competitive process. The generator creates synthetic data samples resembling the original training data, while the discriminator tries to distinguish between real and fake samples. As the two networks compete, the generator improves its ability to create realistic data, and the discriminator becomes better at identifying fake data. This adversarial training allows GANs to generate highly realistic data, such as images, audio, or text.

Question 28

Q

Recurrent neural network (RNN) is incorrect because it is primarily used for tasks that involve sequential or time-series data, such as speech recognition, language modeling, and time series forecasting.

Answer

A

RNNs are effective for learning temporal patterns in data but are not designed to generate new data based on an adversarial process.

Question 29

Q

Convolutional neural network (CNN) is incorrect because it is specialized for processing structured grid-like data, such as images. CNNs are primarily used in tasks like image classification, object detection, and facial recognition, where the model needs to learn spatial hierarchies of features. CNNs do not generate new data but extract important features from existing data.

Question 30

Q

An epoch is

Answer

A

a single pass through the entire training dataset.

Question 31

Q

Bidirectional Encoder Representations from Transformers (BERT), a bidirectional model, examines the context of an entire sequence before making predictions. It was trained on a plain text corpus and Wikipedia, utilizing 3.3 billion tokens (words) and 340 million parameters. BERT is capable of answering questions, predicting sentences, and translating texts.

Question 32

Q

learningRateWarmupSteps

Answer

A

This typically defines the number of steps over which the learning rate gradually increases before stabilizing. It does not directly address increasing accuracy.
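
A minimal sketch of a warmup schedule, assuming a linear ramp followed by a constant rate. Linear warmup is only one common shape and is an assumption here; the function name and the values are hypothetical.

```python
def warmup_learning_rate(step, base_lr, warmup_steps):
    """Linearly ramp the learning rate up to base_lr, then hold it there."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * (step + 1) / warmup_steps


# Hypothetical values: a base rate of 0.5 reached after 4 warmup steps.
schedule = [warmup_learning_rate(s, base_lr=0.5, warmup_steps=4) for s in range(6)]
print(schedule)  # [0.125, 0.25, 0.375, 0.5, 0.5, 0.5]
```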

Question 33

Q

learningRate

Answer

A

determines how much to adjust the model’s weights in response to errors but does not define how often the dataset is processed.
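
A minimal sketch of how the learning rate scales a single weight update, assuming plain gradient descent on the hypothetical loss f(w) = (w - 3)^2. The function name and the values are illustrative.

```python
def gradient_step(weight, learning_rate):
    """One plain gradient-descent update on the loss f(w) = (w - 3) ** 2."""
    gradient = 2 * (weight - 3)
    return weight - learning_rate * gradient


print(gradient_step(0.0, 0.25))  # 1.5 (moves toward the minimum at 3)
print(gradient_step(0.0, 0.5))   # 3.0 (lands on the minimum)
print(gradient_step(0.0, 1.0))   # 6.0 (overshoots the minimum)
```

The same error produces a small, exact, or overshooting correction depending only on the learning rate.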

Question 34

Q

batchSize

Answer

A

primarily defines how many training examples are processed in one iteration
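
A minimal sketch of how batch size relates to iterations and epochs, assuming the final partial batch is kept. The function name and the dataset size are hypothetical.

```python
import math


def iterations_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every training example once."""
    return math.ceil(num_examples / batch_size)


# Hypothetical training setup.
num_examples, batch_size, epochs = 1000, 32, 3

per_epoch = iterations_per_epoch(num_examples, batch_size)
print(per_epoch)           # 32
print(per_epoch * epochs)  # 96
```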

Question 35

Q

Collaborative filtering models is incorrect because these are used in recommendation systems to predict user preferences based on past behavior.

Question 36

Q

Prescriptive ML models is incorrect because these models are designed to recommend actions based on predictions and are typically used in decision-making processes.

Question 37

Q

Transfer Learning is incorrect because this type of learning uses a pre-trained model from one task or domain and applies it to a different but related task. Transfer learning is typically used when there is a shortage of labeled data in the target domain but an abundance of labeled data in a related domain.

Question 38

Q

Exploratory Data Analysis (EDA) is the process of analyzing and understanding the characteristics of the data before building an ML model. It involves tasks such as visualizing data distributions, calculating summary statistics, identifying missing values, and detecting outliers. EDA aims to gain insights into the data and identify potential issues or patterns that may impact the model’s performance.
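
A minimal sketch of three of the EDA tasks named above (summary statistics, missing values, outliers), using only the standard library. The column values are hypothetical, and the 1.5 × IQR rule is just one common outlier heuristic.

```python
import statistics

# Hypothetical "age" column with one missing value and one suspicious reading.
ages = [34, 29, None, 41, 38, 120, 33]

# Identify missing values.
missing = sum(1 for v in ages if v is None)
present = [v for v in ages if v is not None]

# Summary statistics.
mean_age = statistics.mean(present)
median_age = statistics.median(present)

# Detect outliers with the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
outliers = [v for v in present if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(missing)     # 1
print(median_age)  # 36.0
print(outliers)    # [120]
```

The mean is pulled well above the median by the single extreme value, which is the kind of issue EDA is meant to surface before training.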

Question 39

Q

Inpainting is incorrect because it is a technique that is typically used to fill in missing sections of an image.

Question 40

Q

Prompt engineering is a highly effective approach for tailoring the chatbot’s responses to adhere to the desired tone and style guidelines. By carefully crafting the prompts with examples and instructions that reflect the company’s guidelines, the model can generate responses that align with those requirements.

Question 41

Q

In cases where you want the model to be both precise and sensitive (high recall),

Answer

A

computing the F1-score is the way to go
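
A minimal sketch of the F1-score as the harmonic mean of precision and recall. The function name and the input values are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(0.5, 0.5))  # 0.5
print(f1_score(0.9, 0.1))  # about 0.18
```

Because the harmonic mean is dragged down by the weaker of the two values, a model cannot score well on F1 by being strong on precision alone or recall alone.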

Question 42

Q

Federal Risk and Authorization Management Program (FedRAMP) is

Answer

A

a US government program that standardizes the security assessment and authorization of cloud services used by federal agencies.

Question 43

Q

Transparency vs explainability

Answer

A

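Transparency is about the details of how a model was built and how it works; explainability is about conveying, at a conceptual level, why it produced a given output.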

Question 44

Q

Amazon SageMaker Model Parallelism is a feature designed to help

Answer

A

train large deep-learning models that cannot fit into the memory of a single GPU.