Domain 1 Flashcards
Explain the AI relationship Venn diagram
Nested circles: Artificial Intelligence contains Machine Learning, which contains Deep Learning
Predictions that AI makes based on historical data
Inference
When AI recognizes a deviation from the patterns seen in historical data
Anomaly detection
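A minimal sketch of the idea, assuming a simple z-score rule (one of many possible techniques, not necessarily what any AWS service uses). The data and the `find_anomalies` helper are hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(history, new_values, threshold=3.0):
    """Return new values that sit far outside the historical pattern."""
    mu = mean(history)
    sigma = stdev(history)
    # Flag anything more than `threshold` standard deviations from the mean.
    return [v for v in new_values if abs(v - mu) > threshold * sigma]

# Hypothetical daily order counts observed in the past.
history = [100, 102, 98, 101, 99, 103, 97, 100]
print(find_anomalies(history, [101, 96, 250]))  # prints [250]
```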
What are some AWS services that could provide structured input data for training ML models?
RDS, Redshift
What are some AWS services that could provide semi-structured input data for training ML models?
DynamoDB, DocumentDB (with MongoDB compatibility)
For structured, semi-structured, unstructured, and time-series data, where should you export data for training models?
S3
In machine learning, what describes the relationship between inputs and outputs?
An algorithm
Describe the machine learning training process
Known data -> features -> algorithm -> output
Describe the machine learning inference process, which comes after training
new data -> features -> model -> output
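The two pipelines above can be sketched in a few lines, assuming a one-feature linear model fitted by least squares. The function names `train` and `predict` and the data are hypothetical; the returned dictionary stands in for the model artifacts and `predict` for the inference code.

```python
def train(known_inputs, known_outputs):
    """Training: fit y = slope * x + intercept to known data (least squares)."""
    n = len(known_inputs)
    mean_x = sum(known_inputs) / n
    mean_y = sum(known_outputs) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(known_inputs, known_outputs))
    sxx = sum((x - mean_x) ** 2 for x in known_inputs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}  # learned parameters

def predict(model, new_input):
    """Inference: apply the trained model to data it has not seen."""
    return model["slope"] * new_input + model["intercept"]

model = train([1, 2, 3, 4], [3, 5, 7, 9])  # known data follows y = 2x + 1
print(predict(model, 10))                   # prints 21.0
```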
What two components combine to create a model?
Inference code + model artifacts
What type of inferencing provides low-latency, high throughput, and a persistent endpoint (also usually more expensive)?
Real-time
What type of inferencing is performed offline, uses large datasets, and happens on an infrequent schedule?
Batch transform
Training your model with data that is pre-labeled (pictures with fish/not fish)
Supervised Learning
What is the challenge with supervised learning?
You need a lot of data and people to label it, which takes time and money
What is Amazon SageMaker Ground Truth?
A service that helps you label training data
What process uses data that has features but is not labeled and is good for pattern recognition, anomaly detection, and grouping data into categories?
Unsupervised learning
What process provides rewards to an agent when criteria are met, uses trial and error, allows the agent to make mistakes in order to learn, and has an end goal?
Reinforcement learning
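A minimal sketch of reward-driven trial and error, assuming tabular Q-learning on a made-up five-cell corridor. The environment, reward values, and hyperparameters are all hypothetical choices for illustration.

```python
import random

N_STATES = 5        # cells 0..4; cell 4 is the end goal
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action):
    """Environment: apply the move; reward the agent only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train_agent(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(200):                   # cap on steps per episode
            a = rng.randrange(2)               # trial and error: random moves
            next_state, reward = step(state, ACTIONS[a])
            # Nudge the estimate toward reward + discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if state == N_STATES - 1:          # goal reached, episode ends
                break
    return q

q = train_agent()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told which move is correct; it makes mistakes (moving left) and still learns from the reward alone that moving right pays off.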
What service, available through Ground Truth, uses crowdsourcing to label data via affordable labor?
Amazon Mechanical Turk
A model telling you a fish is not a fish because it is out of water, a result of training being too specific and not having enough varied examples, is called what?
Overfitting
What is it called when a model cannot determine a meaningful relationship between the input and output data, which happens when you haven’t trained the model long enough or with a large enough dataset?
Underfitting
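A rough sketch of how the two failure modes show up in error numbers, assuming two deliberately extreme toy models: one that memorizes the training set (overfits) and one that ignores the input entirely (underfits). The data and model names are hypothetical.

```python
def mse(model, xs, ys):
    """Mean squared error of a model over a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: roughly y = 2x, with noise in the training labels.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.0, 2.5, 3.5, 6.5, 7.5, 10.0]
test_x = [0.6, 1.6, 2.6, 3.6, 4.6]
test_y = [2 * x for x in test_x]

def memorizer(x):
    """Overfit: return the label of the closest training example, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

mean_y = sum(train_y) / len(train_y)

def constant(x):
    """Underfit: ignore the input and always predict the average label."""
    return mean_y

for name, model in (("memorizer", memorizer), ("constant", constant)):
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The memorizer scores a perfect zero error on the training data but a clearly worse error on unseen data; the constant model has a high error on both.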
What is bias?
When a model discriminates against a specific group because of a lack of fair representation in the data used to train the model
If a model is showing bias, what can be done with features?
The weight of features that are introducing noise can be directly adjusted by the data scientists. For example, gender could be removed from consideration entirely.
Items such as age and sex discrimination should be identified at the beginning, before creating a model.
Fairness constraints
A type of machine learning that uses algorithmic structures called neural networks.
Deep learning
The three types of layers in a deep neural network
input layer, several hidden layers, and an output layer of nodes
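A small sketch of data flowing through those layers, assuming fully connected layers with a sigmoid activation. The weights below are made-up, untrained numbers chosen only to show the structure.

```python
import math

def dense(inputs, weights, biases):
    """One layer of nodes: weighted sum of inputs plus bias, then sigmoid."""
    return [
        1 / (1 + math.exp(-(sum(w * x for w, x in zip(node_w, inputs)) + b)))
        for node_w, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Pass the input through each layer in order."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

# Hypothetical network: 3 inputs -> 2 hidden nodes -> 2 hidden nodes -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer 1
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),              # hidden layer 2
    ([[0.7, -0.3]], [0.2]),                                # output layer
]
output = forward([1.0, 2.0, 3.0], layers)  # the list passed in is the input layer
print(output)
```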
Deep learning can excel at tasks like
image classification and natural language processing where there is a need to identify the complex relationship between data objects
A big advantage of deep learning models for computer vision is that
they don’t need the relevant features given to them.
Traditional machine learning algorithms will generally perform well and be efficient when
it comes to identifying patterns from structured and labeled data. Examples include classification and recommendation systems.
On the other hand, deep learning solutions are more suitable for
unstructured data like images, videos, and text. Tasks for deep learning include image classification and natural language processing, where there is a need to identify the complex relationships between pixels and words.
Traditional machine learning and deep learning both learn patterns from data, but only deep learning uses neural networks to simulate human intelligence.
Gen AI uses transformer neural networks, which turn an input sequence (the prompt) into an output sequence (the response to your prompt). Earlier sequence models, such as recurrent neural networks, process the elements of a sequence one word at a time. Transformers process the sequence in parallel, which speeds up training and allows much bigger datasets to be used. They outperform other ML approaches to natural language processing. They excel at understanding human language, so they can read long articles and summarize them. They are also good at generating text similar to what a human would write, which makes them useful for language translation and for writing original stories, letters, articles, and poetry. They can also work with programming languages and write code for software developers.
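A simplified sketch of the attention step at the heart of a transformer, assuming scaled dot-product self-attention with no learned projections (real models add learned query, key, and value matrices, multiple heads, and more). The token vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Each token attends to every token in the sequence."""
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        weights = softmax(scores)
        # New representation: weighted mix of all token vectors.
        mixed = [sum(w * value[d] for w, value in zip(weights, embeddings))
                 for d in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three made-up token vectors
outputs, weights = self_attention(tokens)
print(weights)
```

The Python loop runs one token at a time, but no token's result depends on another token's result, which is why real implementations can compute the whole sequence at once as matrix operations.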
Gen AI Notes
Consider these use cases for AI/ML
Increasing business efficiency
Solving complex problems
Making better decisions
Consider AI/ML alternatives when
Costs outweigh benefits
Models cannot meet interpretability requirements (you can’t know how a neural network made a decision, so use a rules-based system instead)
Systems must be deterministic (produces same output with the same input) rather than probabilistic
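A sketch of what a rules-based alternative can look like, assuming a made-up loan rule. The function name `approve_loan` and the 40% threshold are hypothetical; the point is that the same input always gives the same output and the reason can be stated.

```python
def approve_loan(income, debt):
    """Hypothetical rule: approve when debt is under 40% of income."""
    ratio = debt / income
    if ratio < 0.4:
        return ("approved", f"debt-to-income ratio {ratio:.2f} is below 0.40")
    return ("declined", f"debt-to-income ratio {ratio:.2f} is not below 0.40")

print(approve_loan(50000, 10000))
print(approve_loan(50000, 30000))
```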
If your dataset consists of features or attributes as inputs with labeled target values as outputs, then you have a
supervised learning problem. In this type of problem, you train your model with data containing known inputs and outputs.
If your target values are categorical, for example, one or more discrete values, then you have a
classification problem (a type of supervised learning).
If these target values you’re trying to predict are mathematically continuous, then you have a
regression problem.
Binary classification
assigns an input to one of two mutually exclusive classes based on the input attributes.
Multiclass classification
assigns an input to one of several classes based on the input attributes. An example is the prediction of the topic most relevant to a tax document.
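A small sketch of the difference, assuming a model has already produced scores for an input. The function names, the 0.5 threshold, and the tax topics are hypothetical.

```python
def classify_binary(score, threshold=0.5):
    """Binary: one score, two mutually exclusive outcomes."""
    return "fish" if score >= threshold else "not fish"

def classify_multiclass(scores):
    """Multiclass: one score per class; pick the highest-scoring class."""
    return max(scores, key=scores.get)

print(classify_binary(0.8))  # prints fish
print(classify_multiclass({"income": 0.2, "deductions": 0.7, "filing": 0.1}))  # prints deductions
```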
When your target values are mathematically continuous, then you have a
regression problem. Regression estimates the value of a dependent target variable based on one or more other variables.
If we have multiple independent variables, such as weight and age, then we have a multiple linear regression problem. A