Domain 1: AI/ML Fundamentals 20% Flashcards
The field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, creation, and image recognition.
Artificial intelligence
_____ is to create a self-learning system that derives meaning from data.
The goal of AI
Uses of AI
- Respond to questions
- Create original content (text/images)
- Quickly process vast amounts of data
- Solve complex problems (fraud detection)
- Perform repetitive/monotonous tasks
- Find patterns in data
- Forecast trends
_____ is a branch of AI and computer science that focuses on the use of data and algorithms to imitate the way humans learn, gradually improving its accuracy. It is used to build computer systems that learn from data.
Machine learning
How are ML models trained?
By using large datasets to identify patterns and make predictions
_____ is a type of machine learning model that is inspired by the human brain and uses neural networks with many layers to process information.
Deep learning
_____ are some of the things that deep learning models can do.
Recognizing human speech and identifying objects in images
AI uses
- Predict pandemics
- Monitor assembly lines
- Monitor sensor data to determine when equipment might fail
- Product recommendation and support info (search to solution)
- Personalized content recommendations
- Forecast demand
- Detect fraud
- Assist with human resources (HR) tasks
- Translate text between languages
Using a technique called _____, an AI model can process historical data recorded over time, known as time series data, and predict future values.
regression analysis
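A minimal sketch of regression analysis on time series data. It assumes the values follow a straight-line trend; the function names and the demand figures are hypothetical. The time step index is the input, a least-squares line is fitted to the history, and the line is extended one step to forecast the next value.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    m = cov / var
    b = mean_y - m * mean_x
    return m, b

def forecast_next(series):
    """Predict the value at the next time step from a linear trend."""
    xs = list(range(len(series)))  # time steps 0, 1, 2, ...
    m, b = fit_line(xs, series)
    return m * len(series) + b

# Hypothetical monthly demand figures (illustrative only)
demand = [100, 110, 120, 130, 140]
print(forecast_next(demand))  # 150.0
```

Real time series often have seasonality or noise that a single straight line will not capture; the sketch only shows the core idea of learning from past values to predict a future one.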
Predictions that an AI model makes are called _____. Each one is an educated guess, so the model gives a probabilistic result.
inferences
A deviation from the expected pattern.
anomaly
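One simple way to flag anomalies is a z-score check: measure how many standard deviations each value is from the mean. This sketch assumes a threshold of 2 standard deviations and uses made-up sensor readings; production systems typically use more robust methods.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]

# Hypothetical sensor readings; 95 deviates from the expected pattern
readings = [20, 21, 19, 22, 20, 21, 95, 20]
print(find_anomalies(readings))  # [95]
```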
_____ use AI to process images and video for object identification and facial recognition, as well as classification, recommendation, monitoring, and detection.
Computer vision applications
_____ is what allows machines to understand, interpret, and generate human language in a natural-sounding way.
Natural language processing
_____ can have seemingly intelligent conversations and generate original content like stories, images, videos, and even music.
Generative AI
_____ is the science of developing algorithms and statistical models that computer systems use to perform complex tasks without explicit instructions.
Machine learning
Computer systems use ML algorithms to _____ and _____.
process large quantities of historical data and identify data patterns
Machine learning starts with a _____ that takes data as inputs, and generates an output.
mathematical algorithm
To train the ML algorithm to produce the output we expect, we give it known data, which consists of _____.
features
What is the task of the ML algorithm?
to find the correlation between the input data features and the known expected output
Adjustments are made to the ML model by changing _____ until the model reliably produces the expected output.
internal parameter values
When a trained model is able to make accurate predictions and produce output from new data that it hasn’t seen during training.
inference
This type of data is stored as rows in a table with columns, which can serve as the features for an ML model.
structured data
_____ can be stored in text files like CSV, or in relational databases like Amazon Relational Database Service (Amazon RDS) or Amazon Redshift.
structured data
_____ can be queried using Structured Query Language (SQL).
structured data
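A small sketch of structured data stored as rows and columns and queried with SQL. It uses Python's built-in SQLite module as a local stand-in for a relational database such as Amazon RDS; the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a relational store such as Amazon RDS
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, monthly_spend REAL)")
conn.executemany(
    "INSERT INTO customers (age, monthly_spend) VALUES (?, ?)",
    [(34, 120.5), (51, 80.0), (27, 200.25)],
)

# Each row is a record; each column can serve as a feature for an ML model
rows = conn.execute(
    "SELECT age, monthly_spend FROM customers WHERE age < 40 ORDER BY age"
).fetchall()
print(rows)  # [(27, 200.25), (34, 120.5)]
conn.close()
```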
_____ is the primary source for training data because it can store any type of data, is lower cost, and has virtually unlimited storage capacity.
Amazon S3
Unlike data in a table, _____ elements can have different attributes or missing attributes. An example is a text file that contains JSON, which stands for JavaScript Object Notation.
semi-structured data
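The sketch below uses hypothetical records to show what makes JSON semi-structured: the records share some attributes, while others are missing or extra, so code that reads them has to handle absent fields.

```python
import json

# Hypothetical JSON records: elements share some attributes but not all
raw = """
[
  {"id": 1, "name": "Ana", "email": "ana@example.com"},
  {"id": 2, "name": "Ben", "phone": "555-0100"},
  {"id": 3, "name": "Cy", "email": "cy@example.com", "tags": ["vip"]}
]
"""
records = json.loads(raw)

# Collect every attribute that appears in any record
all_keys = sorted({key for record in records for key in record})
print(all_keys)  # ['email', 'id', 'name', 'phone', 'tags']

# Missing attributes must be handled explicitly, e.g. with a default of None
for record in records:
    print(record["id"], record.get("email"))
```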
_____ and _____ (with MongoDB compatibility) are two examples of transactional databases built specifically for semi-structured data.
Amazon DynamoDB and Amazon DocumentDB
_____ is data that doesn’t conform to any specific data model and can’t be stored in table format. Some examples include images, video, and text files, or social media posts. It is typically stored as objects in an object storage system like Amazon S3.
Unstructured data
Breaks down text into individual units of words or phrases
tokenization
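A minimal word-level tokenizer, assuming simple lowercase word splitting with a regular expression. The `bigrams` helper (hypothetical) shows how adjacent tokens can form phrases. The tokenizers used by large language models usually split text into subword units instead, so this only illustrates the idea.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bigrams(tokens):
    """Join each pair of adjacent tokens into a two-word phrase."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

tokens = tokenize("Machine learning models learn from data!")
print(tokens)           # ['machine', 'learning', 'models', 'learn', 'from', 'data']
print(bigrams(tokens))  # ['machine learning', 'learning models', 'models learn', 'learn from', 'from data']
```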
_____ is important for training models that need to predict future trends. Each data record is labeled with a timestamp, and stored sequentially.
Time series data
Depending on the sampling rate, time series data captured for long periods can get quite large and be stored in _____ for model training.
Amazon S3
To create a machine learning model, we need to start with an algorithm which defines the _____.
mathematical relationship between outputs and inputs
The simple linear equation _____ defines the linear relationship between our independent variable, x, and the dependent variable, y.
y = mx + b
The slope, m, and intercept, b, are the model parameters that are adjusted iteratively during the training process to _____.
find the best-fitting model
To determine the best fitting model, we look for the parameter values that _____.
minimize the errors
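The training process described in the cards above can be sketched with gradient descent, one common way to find the parameter values that minimize the errors. This sketch assumes mean squared error as the error measure; the learning rate, number of epochs, and the data (generated from y = 2x + 1) are arbitrary choices for illustration.

```python
def mse(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b on the data."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.05, epochs=2000):
    """Adjust the parameters m and b step by step to reduce the MSE."""
    m, b = 0.0, 0.0  # start from an arbitrary guess
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the MSE with respect to m and b
        grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

# Hypothetical data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, b = train(xs, ys)
print(round(m, 3), round(b, 3), round(mse(m, b, xs, ys), 6))  # values close to 2, 1, and 0
```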
This training process produces model artifacts, which typically consist of trained parameters, a model definition that describes how to compute inferences, and other metadata.
model training
The _____, which are normally stored in Amazon S3, are packaged together with inference code to make a deployable model.
model artifacts
_____ is the software that implements the model by reading the artifacts.
Inference code
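A toy illustration of the split between model artifacts and inference code. The JSON file format, its field names, and the function names are hypothetical; in practice, artifacts are usually stored in Amazon S3 in a framework-specific format. Here the artifacts hold the trained parameters and a little metadata, and the inference code reads them back to compute predictions.

```python
import json
import os
import tempfile

def save_artifacts(path, m, b):
    """Write trained parameters plus minimal metadata to a JSON file."""
    artifacts = {"model_type": "linear", "params": {"m": m, "b": b}}
    with open(path, "w") as f:
        json.dump(artifacts, f)

def load_model(path):
    """Inference code: read the artifacts and return a predict function."""
    with open(path) as f:
        artifacts = json.load(f)
    m = artifacts["params"]["m"]
    b = artifacts["params"]["b"]
    return lambda x: m * x + b

path = os.path.join(tempfile.mkdtemp(), "model.json")
save_artifacts(path, m=2.0, b=1.0)
predict = load_model(path)
print(predict(10))  # 21.0
```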
The first is an endpoint that is always available to accept inference requests in real time. The second is a batch job that performs inference.
Two options for hosting a model
_____ is ideal for online inferences that have low latency and high throughput requirements. For this, your model is deployed on a persistent endpoint to handle a sustained flow of requests.
Real-time inference
_____ is suitable for offline processing when large amounts of data are available upfront, and you don’t need a persistent endpoint.
Batch
When you need a large number of inferences, and it’s okay to wait for the results, _____ can be more cost-effective.
batch processing
T/F: The main difference between real-time and batch is that with batch, the computing resources only run when processing the batch, and then they shut down.
True
T/F: With real-time inferencing, some compute resources are always running and available to process requests.
True
With _____, you train your model with data that is pre-labeled.
supervised learning
T/F: In supervised learning, the training data specifies both the input and the desired output of the algorithm.
True
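A minimal supervised learning sketch using a nearest-centroid classifier, chosen here only because it is short; the features, labels, and function names are hypothetical. Each training example pairs an input with its desired output (its label), and the model learns one average point per label.

```python
def train_centroids(examples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical pre-labeled data: (input features, desired output)
training_data = [
    ([1.0, 1.2], "cat"), ([0.8, 1.0], "cat"),
    ([3.0, 3.1], "dog"), ([3.2, 2.9], "dog"),
]
model = train_centroids(training_data)
print(predict(model, [0.9, 1.1]))  # cat
print(predict(model, [3.1, 3.0]))  # dog
```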
What is the challenge with supervised learning?
labeling
What solution helps with the challenge of labeling?
Amazon SageMaker Ground Truth
SageMaker Ground Truth can use a crowdsourcing service called _____ that provides access to a large pool of affordable labor spread across the globe.
Amazon Mechanical Turk
_____ algorithms train on data that has features but is not labeled. They can spot patterns, group the data into clusters, and split the data into a certain number of groups.
Unsupervised learning
_____ is useful for use cases such as pattern recognition, anomaly detection, and automatically grouping data into categories.
Unsupervised learning
T/F: Unsupervised learning algorithms can also be used to clean and process data for further modeling automatically.
True
T/F: Unsupervised learning is often used for anomaly detection.
True
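A short k-means clustering sketch, one common unsupervised algorithm. It assumes one-dimensional data and naive initialization (the first k points) to keep the code small. No labels are provided; the algorithm groups the values on its own.

```python
def kmeans(points, k, iterations=10):
    """Group unlabeled 1-D points into k clusters."""
    centers = points[:k]  # naive initialization: first k points
    for _ in range(iterations):
        # Assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled values: only features are given, no outputs
data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans(data, k=2)
print(centers)   # [1.5, 10.5]
print(clusters)  # [[1.0, 1.5, 2.0], [10.0, 10.5, 11.0]]
```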
_____ is a machine learning method that is focused on autonomous decision making by an agent. The agent takes actions within an environment to achieve specific goals. The model learns through trial and error, and training does not require labeled input. Actions that an agent takes that move it closer to achieving the goal are rewarded.
Reinforcement learning
T/F: To encourage learning during training, the learning agent must be allowed to sometimes pursue actions that might not result in rewards with reinforcement learning.
True
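A simplified illustration of reinforcement learning: an epsilon-greedy agent choosing among actions with hidden reward probabilities (a multi-armed bandit, which has only one state). It is not the algorithm AWS DeepRacer uses; the reward probabilities, `epsilon`, and step count are assumptions. The `epsilon` parameter is what lets the agent sometimes try actions that might not result in rewards.

```python
import random

def run_agent(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: may earn no reward
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        # The environment returns a reward of 1 with the action's hidden probability
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hypothetical environment: action 2 is the best, but the agent does not know that
values, counts = run_agent([0.2, 0.5, 0.8])
best = max(range(3), key=lambda a: values[a])
print(best, counts)
```

No labeled data is involved: the agent learns only from the rewards its own actions produce.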
To teach developers about developing a reinforcement learning model, Amazon offers a model race car called _____ that you can teach to drive on a racetrack. With this, the car is the agent, and the track is the environment.
AWS DeepRacer
T/F: Both unsupervised and reinforcement learning work without labeled data.
True
T/F: Unsupervised learning algorithms receive inputs with no specified outputs during the training process.
True
T/F: Reinforcement learning has a predetermined end goal. While it takes an exploratory approach, the explorations are continuously validated and improved to increase the probability of reaching the end goal.
True
When a model performs better on training data than it does on new data, it is called _____, and the model is said not to generalize well.
overfitting
The best way to correct a model that is overfitting _____.
is to train it with data that is more diverse
If you train your model for too long, it will start to overemphasize unimportant features called _____, which is another way of overfitting.
noise
_____ is a type of error that occurs when the model cannot determine a meaningful relationship between the input and output data.
Underfitting
_____ models give inaccurate results for both the training dataset and new data.
Underfit
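A sketch contrasting underfitting and overfitting on hypothetical data generated from y = 2x with alternating noise. The constant model ignores the input (underfit), the memorizer copies the nearest training label including its noise (overfit), and a least-squares line sits in between. The models and data are illustrative choices, not a standard benchmark.

```python
def mse(predict, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data: y = 2x plus alternating noise of +/-0.5
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 1.5, 4.5, 5.5, 8.5, 9.5]
# New data the models have not seen (noise-free y = 2x)
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]

# Underfit: ignores the input entirely and always predicts the training mean
mean_y = sum(train_y) / len(train_y)
def constant_model(x):
    return mean_y

# Overfit: memorizes the training set, including its noise
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Reasonable fit: least-squares line
mean_x = sum(train_x) / len(train_x)
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
b = mean_y - m * mean_x
def line_model(x):
    return m * x + b

results = {}
for name, model in [("underfit", constant_model), ("overfit", memorizer), ("line", line_model)]:
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, "train MSE:", round(results[name][0], 3), "test MSE:", round(results[name][1], 3))
```

The memorizer gets zero training error but does worse than the line on the new points, while the constant model does poorly on both.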
_____ is when there are disparities in the performance of a model across different groups. The results are skewed in favor of or against an outcome for a particular class.
Bias
The quality of a model depends on _____ and _____.
the underlying data quality and quantity
T/F: If a model is showing bias, the weight of features that are introducing noise can be directly adjusted by the data scientists.
True
_____, such as age and sex discrimination, should be identified at the beginning before creating a model.
Fairness constraints
Training data should be inspected and evaluated for potential bias, and models need to be continually evaluated by checking their results for _____.
fairness
Deep learning is a type of machine learning that uses algorithmic structures called _____.
neural networks
In deep learning models, we use software modules called _____ to simulate the behavior of neurons.
nodes
_____ comprise layers of nodes, including an input layer, several hidden layers, and an output layer of nodes.
Deep neural networks
Every node in the neural network autonomously assigns _____ to each feature.
weights
With neural networks, information flows through the network in a _____ direction, from input to output.
forward
- Every node autonomously assigns weights to each feature.
- Information flows forward through the network from input to output.
- During training, the difference between the predicted output and the actual output is calculated.
- The weights of the neurons are repeatedly adjusted to minimize the error.
How neural networks work
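The steps in this card can be sketched with a single artificial neuron (one node with a sigmoid activation) learning the logical OR function. This is a toy under stated assumptions: real deep networks have many layers and use backpropagation to pass the error back through them. The update rule below is the gradient of the cross-entropy loss; the learning rate and epoch count are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical training data: the logical OR of two binary features
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights = [0.0, 0.0]  # one weight per input feature
bias = 0.0
lr = 0.5

for _ in range(2000):
    for features, target in data:
        # Forward pass: information flows from the inputs to the output
        z = sum(w * f for w, f in zip(weights, features)) + bias
        output = sigmoid(z)
        # Difference between the predicted output and the actual output
        error = output - target
        # Adjust each weight in the direction that reduces the error
        weights = [w - lr * error * f for w, f in zip(weights, features)]
        bias -= lr * error

predictions = [round(sigmoid(sum(w * f for w, f in zip(weights, x)) + bias)) for x, _ in data]
print(predictions)  # [0, 1, 1, 1]
```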
_____ can excel at tasks like image classification and natural language processing where there is a need to identify the complex relationship between data objects.
Deep learning
What made deep learning a viable option?
low-cost cloud computing
Because anyone can now readily use powerful computing resources in the cloud, _____ have become the standard algorithmic approach to computer vision.
neural networks
A big advantage of deep learning models for computer vision is that _____.
they don’t need the relevant features given to them. They can identify patterns in images and extract the important features on their own.
The decision to use traditional machine learning or deep learning depends on _____.
the type of data you need to process
Traditional machine learning algorithms will generally perform well and be efficient when it comes to _____.
identifying patterns from structured data and labeled data
Deep learning solutions are more suitable for _____ data like images, videos, and text.
unstructured
Tasks for deep learning include _____.
image classification and natural language processing
Both types of machine learning use statistical algorithms, but only deep learning uses _____ to simulate human intelligence.
neural networks
Do deep learning models require a lot of manual work selecting and extracting features?
No, because they learn the relevant features from the data on their own.
_____ is accomplished by using deep learning models that are pre-trained on extremely large datasets containing strings of text or, in AI terms, _____.
Generative AI; sequences
Generative AI deep learning models use transformer neural networks, which change an input sequence (in generative AI, known as a _____) into an output sequence, which is the response to your _____.
prompt
Earlier neural networks, such as recurrent neural networks, process the elements of a sequence sequentially, _____.
one word at a time
Transformers process the sequence in _____, which speeds up the training and allows much bigger datasets to be used.
parallel
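A simplified contrast between sequential and parallel processing. `recurrent` (hypothetical) updates a state one token at a time, so each step waits for the previous one. `self_attention` is scaled dot-product self-attention without the learned query, key, and value projections, multiple heads, or positional encodings that real transformers use; each output position depends only on the input sequence, so all positions can be computed in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    d = len(vectors[0])
    def attend(query):
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)
        return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
    # Each position is computed independently of the others, so this could run in parallel
    return [attend(q) for q in vectors]

def recurrent(vectors, decay=0.5):
    """Toy recurrent update: step t cannot start until step t-1 has finished."""
    state = [0.0] * len(vectors[0])
    outputs = []
    for x in vectors:
        state = [decay * s + xi for s, xi in zip(state, x)]
        outputs.append(state)
    return outputs

# Hypothetical 2-dimensional embeddings for a three-token sequence
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(tokens))
print(recurrent(tokens))  # [[1.0, 0.0], [0.5, 1.0], [1.25, 1.5]]
```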
They outperform other ML approaches to natural language processing. They excel at understanding human language, so they can read long articles and summarize them. They are also great at generating text that reads the way a human would write it. As a result, they are good at language translation and even writing original stories, letters, articles, and poetry. They can even work with computer programming languages and write code for software developers.
Large language models
T/F: Complex models generally present a tradeoff between performance and interpretability.
True
T/F: Less complex models generally mean lower performance, but they are easier to interpret.
True
If a software application always produces the same output for the same input, it is said to be _____.
deterministic
A rule-based application is deterministic unless _____.
someone changes the rules
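A sketch contrasting a deterministic rule with a probabilistic output. The discount rule (10% off orders of 100 or more) and the next-word probabilities are hypothetical. The rule always returns the same result for the same input, while sampling from the probabilities can return different words on different runs.

```python
import random

def rule_based_discount(order_total):
    """Deterministic rule: the same input always produces the same output."""
    return 0.1 if order_total >= 100 else 0.0

def sample_next_word(probabilities, rng):
    """Probabilistic output: sample a word according to model-assigned probabilities."""
    words = list(probabilities)
    weights = [probabilities[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# The rule-based function returns 0.1 every time for an input of 150
print([rule_based_discount(150) for _ in range(3)])  # [0.1, 0.1, 0.1]

# Hypothetical next-word probabilities from a language model
probs = {"cat": 0.6, "dog": 0.3, "fish": 0.1}
rng = random.Random()
print([sample_next_word(probs, rng) for _ in range(5)])  # may differ from run to run
```

Changing the threshold inside `rule_based_discount` is the rule-based equivalent of "someone changes the rules": only then does the output for the same input change.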