AI-900 Flashcards
This is often the foundation for an AI system, and is the way we “teach” a computer model to make predictions and draw conclusions from data.
Machine learning
Capabilities within AI to interpret the world visually through cameras, video, and images.
Computer vision
Capabilities within AI for a computer to interpret written or spoken language, and respond in kind.
Natural language processing
Capabilities within AI that deal with managing, processing, and using high volumes of data found in forms and documents.
Document intelligence
Capabilities within AI to extract information from large volumes of often unstructured data to create a searchable knowledge store.
Knowledge mining
Capabilities within AI that create original content in a variety of formats including natural language, image, code, and more.
Generative AI
the foundation for most AI solutions
Machine Learning
How does machine learning work?
Machines learn from data
Machine learning models try to capture the relationship between …
Data
Microsoft Azure provides the …
Azure Machine Learning service
Azure Machine Learning is …
a cloud-based platform for creating, managing, and publishing machine learning models.
Azure Machine Learning Studio offers multiple authoring experiences, such as …
Automated machine learning: this feature enables non-experts to quickly create an effective machine learning model from data.
Azure Machine Learning designer: a graphical interface enabling no-code development of machine learning solutions.
Data metric visualization: analyze and optimize your experiments with visualization.
Notebooks: write and run your own code in managed Jupyter Notebook servers that are directly integrated in the studio.
Automated machine learning:
this feature enables non-experts to quickly create an effective machine learning model from data.
Azure Machine Learning designer:
a graphical interface enabling no-code development of machine learning solutions.
Data metric visualization:
analyze and optimize your experiments with visualization
Notebooks:
write and run your own code in managed Jupyter Notebook servers that are directly integrated in the studio.
an area of AI that deals with visual processing. Let’s explore some of the possibilities that computer vision brings
Computer Vision
… app is a great example of the power of computer vision. Designed for the blind and low vision community, the Seeing AI app harnesses the power of AI to open up the visual world and describe nearby people, text and objects.
Seeing AI
common computer vision tasks.
Image classification, Object detection, Semantic segmentation, Image analysis
… an advanced machine learning technique in which individual pixels in the image are classified according to the object to which they belong. For example, a traffic monitoring solution might overlay traffic images with “mask” layers to highlight different vehicles using specific colors.
Semantic segmentation
… involves training a machine learning model to classify images based on their contents. For example, in a traffic monitoring solution you might use an image classification model to classify images based on the type of vehicle they contain, such as taxis, buses, cyclists, and so on.
Image classification
machine learning models are trained to classify individual objects within an image, and identify their location with a bounding box. For example, a traffic monitoring solution might use object detection to identify the location of different classes of vehicle.
Object detection
You can create solutions that combine machine learning models with advanced … techniques to extract information from images, including “tags” that could help catalog the image or even descriptive captions that summarize the scene shown in the image.
image analysis
… is a specialized form of object detection that locates human faces in an image. This can be combined with classification and facial geometry analysis techniques to recognize individuals based on their facial features.
Face detection
… is a technique used to detect and read text in images. You can use OCR to read text in photographs (for example, road signs or store fronts) or to extract information from scanned documents such as letters, invoices, or forms.
Optical character recognition
Use … to develop computer vision solutions. The service features are available for use and testing in the … and other programming languages.
Azure AI Vision, Azure Vision Studio
features of Azure AI Vision include:
Image Analysis: capabilities for analyzing images and video, and extracting descriptions, tags, objects, and text.
Face: capabilities that enable you to build face detection and facial recognition solutions.
Optical Character Recognition (OCR): capabilities for extracting printed or handwritten text from images, enabling access to a digital version of the scanned text.
… is the area of AI that deals with creating software that understands written and spoken language.
Natural language processing (NLP)
NLP software can:
Analyze and interpret text in documents, email messages, and other sources.
Interpret spoken language, and synthesize speech responses.
Automatically translate spoken or written phrases between languages.
Interpret commands and determine appropriate actions.
You can use …and … to build natural language processing solutions.
Microsoft’s Azure AI Language, Azure AI Speech
features of Azure AI Language include …
understanding and analyzing text, training conversational language models that can understand spoken or text-based commands, and building intelligent applications.
Azure AI Speech features include …
speech recognition and synthesis, real-time translations, conversation transcriptions, and more.
You can explore Azure AI Language features in the…and Azure AI Speech features in the….
Azure Language Studio, Azure Speech Studio
… is the area of AI that deals with managing, processing, and using high volumes of a variety of data found in forms and documents. Document intelligence enables you to create software that can automate processing for contracts, health documents, financial forms and more
Document Intelligence
You can use Microsoft’s… to build solutions that manage and accelerate data collection from scanned documents.
Azure AI Document Intelligence
Features of Azure AI Document Intelligence help…
automate document processing in applications and workflows, enhance data-driven strategies, and enrich document search capabilities.
You can use … to add intelligent document processing for invoices, receipts, health insurance cards, tax forms, and more. You can also use … to create custom models with your own labeled datasets. The service features are available for use and testing in the…and other programming languages.
prebuilt models, Azure AI Document Intelligence, Document Intelligence Studio
… is the term used to describe solutions that involve extracting information from large volumes of often unstructured data to create a searchable knowledge store.
Knowledge mining
One Microsoft knowledge mining solution is…, a private, enterprise, search solution that has tools for building indexes. The indexes can then be used for internal only use, or to enable searchable content on public facing internet assets.
Azure Cognitive Search
… can utilize the built-in AI capabilities of Azure AI services such as image processing, document intelligence, and natural language processing to extract data. The product’s AI capabilities makes it possible to index previously unsearchable documents and to extract and surface insights from large amounts of data quickly.
Azure Cognitive Search
… describes a category of capabilities within AI that create original content. People typically interact with generative AI that has been built into chat applications. Generative AI applications take in natural language input, and return appropriate responses in a variety of formats including natural language, image, code, and audio.
Generative AI
In Microsoft Azure, you can use the… to build generative AI solutions.
Azure OpenAI service
… is Microsoft’s cloud solution for deploying, customizing, and hosting generative AI models. It brings together the best of OpenAI’s cutting edge models and APIs with the security and scalability of the Azure cloud platform.
Azure OpenAI Service
Azure OpenAI supports many foundation model choices that can serve different needs. The service features are available for use and testing in the Azure … and other programming languages. You can use the Azure OpenAI Studio user interface to manage, develop, and customize generative AI models.
OpenAI Studio
The Challenges or Risks of AI include:
Bias can affect results
Errors may cause harm
Data could be exposed
Solutions may not work for everyone
Users must trust a complex system
Who’s liable for AI-driven decisions?
six principles, designed to ensure that AI applications provide amazing solutions to difficult problems without any unintended negative consequences.
Fairness, Reliability and safety, Privacy and security, Inclusiveness, Transparency, Accountability
AI systems should treat all people fairly. For example, suppose you create a machine learning model to support a loan approval application for a bank. The model should predict whether the loan should be approved or denied without bias. This bias could be based on gender, ethnicity, or other factors that result in an unfair advantage or disadvantage to specific groups of applicants.
Azure Machine Learning includes the capability to interpret models and quantify the extent to which each feature of the data influences the model’s prediction. This capability helps data scientists and developers identify and mitigate bias in the model.
Another example is Microsoft’s implementation of Responsible AI with the Face service, which retires facial recognition capabilities that can be used to try to infer emotional states and identity attributes. These capabilities, if misused, can subject people to stereotyping, discrimination or unfair denial of services.
Fairness
AI systems should perform …. For example, consider an AI-based software system for an autonomous vehicle; or a machine learning model that diagnoses patient symptoms and recommends prescriptions. Unreliability in these kinds of systems can result in substantial risk to human life.
AI-based software application development must be subjected to rigorous testing and deployment management processes to ensure that they work as expected before release.
Reliably and safely
AI systems should be … and respect …. The machine learning models on which AI systems are based rely on large volumes of data, which may contain personal details that must be kept private. Even after the models are trained and the system is in production, privacy and security need to be considered. As the system uses new data to make predictions or take action, both the data and decisions made from the data may be subject to privacy or security concerns.
Secure, privacy
Through …, AI systems should empower everyone and engage people. AI should bring benefits to all parts of society, regardless of physical ability, gender, sexual orientation, ethnicity, or other factors.
Inclusiveness
To achieve …, AI systems should be understandable. Users should be made fully aware of the purpose of the system, how it works, and what limitations may be expected.
Transparency
People should be … for AI systems. Designers and developers of AI-based solutions should work within a framework of governance and organizational principles that ensure the solution meets ethical and legal standards that are clearly defined.
Accountable
Machine learning is in many ways the intersection of two disciplines … and …
data science and software engineering
The goal of machine learning is to use data to create a … model that can be incorporated into a software application or service. Achieving this goal requires collaboration between data scientists, who explore and prepare the data before using it to train a machine learning model, and software developers, who integrate the models into applications where they’re used to predict new data values (a process known as …).
predictive, inferencing
Fundamentally, a machine learning model is a software application that encapsulates a …to calculate an output value based on one or more input values. The process of defining that … is known as …. After the … has been defined, you can use it to predict new values in a process called….
function, function, training, function, inferencing
The training data consists of past observations. In most cases, the observations include the observed … or …of the thing being observed, and the known … of the thing you want to train a model to predict (known as the…).
attributes, features, value, label
You’ll often see the features referred to using the shorthand variable name …, and the label referred to as …. Usually, an observation consists of multiple feature values, so x is actually a … (an array with multiple values), like this: [x1, x2, x3, …].
x, y, vector
An…is applied to the data to try to determine a relationship between the … and the …, and generalize that relationship as a calculation that can be performed on…to calculate…
algorithm, features, label, x, y
The specific algorithm used depends on the kind of … problem you’re trying to solve (more about this later), but the basic principle is to try to fit the data to a function in which the values of the features can be used to calculate the …
predictive, label.
The result of the algorithm is a … that encapsulates the calculation derived by the algorithm as a function; let’s call it f. In mathematical notation:
y = f(x)
model
The model is essentially a software program that encapsulates the … produced by the training process. You can input a set of …, and receive as an output a prediction of the corresponding …. Because the output from the model is a prediction that was calculated by the function, and not an observed value, you’ll often see the output from the function shown as…
function, feature values, label, ŷ
… is a general term for machine learning algorithms in which the training data includes both feature values and known label values.
Supervised machine learning
Supervised machine learning is used to train … by determining a relationship between the … and … in past observations, so that unknown … can be predicted for features in future cases.
models, features and labels, labels
…is a form of supervised machine learning in which the label predicted by the model is a numeric value.
Regression
… is a form of supervised machine learning in which the label represents a categorization, or class. There are two common … scenarios.
Classification, classification
In … classification, the label determines whether the observed item is (or isn’t) an instance of a specific class. Put another way, … classification models predict one of two mutually exclusive outcomes.
binary, binary
A … classification model predicts a …/… or …/… outcome for a single possible class.
Binary, true/false, positive/negative
… classification extends binary classification to predict a label that represents one of multiple possible classes.
Multiclass
In most scenarios that involve a known set of multiple classes, multiclass classification is used to predict … labels
mutually exclusive
… machine learning involves training models using data that consists only of feature values, without any known labels.
Unsupervised
… machine learning algorithms determine relationships between the features of the observations in the training data.
Unsupervised
There are some … algorithms that you can use to train multilabel classification models, in which there may be more than one valid label for a single observation.
Multiclass
Unsupervised machine learning algorithms determine … between the features of the observations in the training data.
relationships
The most common form of unsupervised machine learning is….
clustering
A … algorithm identifies similarities between observations based on their …, and groups them into discrete clusters.
clustering, features
… is similar to multiclass classification, in that it categorizes observations into discrete groups. The difference is that when using classification, you already know the classes to which the observations in the training data belong.
Clustering
In clustering, there’s no previously known … …, and the algorithm groups the data observations based purely on similarity of features.
cluster label
In some cases, … is used to determine the set of classes that exist before training a classification model.
clustering
… models are trained to predict numeric label values based on training data that includes both features and known labels.
Regression
The process for training a regression model (or indeed, any … machine learning model) involves multiple iterations in which you use an appropriate algorithm (usually with some parameterized settings) to train a model, evaluate the model’s … …, and refine the model by repeating the training process with different … and … until you achieve an acceptable level of predictive accuracy.
supervised, predictive performance, algorithms and parameters
Four key elements of the training process for supervised machine learning models
Split the training data (randomly) to create a dataset with which to train the model while holding back a subset of the data that you’ll use to validate the trained model.
Use an algorithm to fit the training data to a model. In the case of a regression model, use a regression algorithm such as linear regression.
Use the validation data you held back to test the model by predicting labels for the features.
Compare the known actual labels in the validation dataset to the labels that the model predicted. Then aggregate the differences between the predicted and actual label values to calculate a metric that indicates how accurately the model predicted for the validation data.
After each train, validate, and evaluate iteration, you can repeat the process with different … and … until an acceptable evaluation metric is achieved.
algorithms and parameters
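The first step above (randomly holding back a validation subset) could be sketched as follows. The function name `split_data`, the 30% validation fraction, and the sample observations are illustrative assumptions, not part of the course material.

```python
import random

def split_data(observations, validation_fraction=0.3, seed=0):
    """Randomly split observations into (training, validation) lists."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical (feature, label) observations
observations = [(x, 2 * x + 1) for x in range(10)]
train, validation = split_data(observations)
```

The training subset is then passed to the algorithm, and the validation subset is used only to compare predicted and actual labels.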
A … … algorithm works by deriving a function that produces a straight line through the intersections of the x and y values while minimizing the average distance between the line and the plotted points.
linear regression
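As a rough illustration of fitting a straight line, the sketch below uses ordinary least squares, which minimizes the squared vertical distance between the line and the points. That choice, the function name `fit_line`, and the sample data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data that lies exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

def predict(x):
    """The trained model: y-hat = f(x)."""
    return slope * x + intercept
```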
The … is the difference between the predicted (…) values and the actual (…) values from the validation dataset.
variance, ŷ, y
Based on the differences between the predicted and actual values, you can calculate some common metrics that are used to evaluate a regression model. They include:
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Coefficient of determination (R²)
The absolute error for each prediction, the distance either above or below the predicted outcome, can be summarized for the whole validation set as the …
mean absolute error (MAE)
It may be more desirable to have a model that is consistently wrong by a small amount than one that makes fewer, but larger errors. One way to produce a metric that “amplifies” larger errors is by squaring the individual errors and calculating the mean of the squared values. This metric is known as the … … …
mean squared error (MSE)
The mean squared error helps take the magnitude of errors into account, but because it squares the error values, the resulting metric no longer represents the quantity measured by the label. If we want to measure the error in terms of the number of ice creams, we need to calculate the … of the MSE, which produces a metric called, unsurprisingly, the … … … …
square root, Root Mean Squared Error
The … … … (more commonly referred to as R² or R-Squared) is a metric that measures the proportion of variance in the validation results that can be explained by the model, as opposed to some anomalous aspect of the validation data.
coefficient of determination
The calculation for … is more complex than for the previous metrics. It compares the sum of squared differences between predicted and actual labels with the sum of squared differences between the actual label values and the mean of actual label values.
… = 1 - ∑(y-ŷ)² ÷ ∑(y-ȳ)²
R², R²
The important point is that the result of R² is a value between … and … that describes the proportion of variance explained by the model. In simple terms, the closer to … this value is, the better the model is fitting the validation data.
0 and 1, 1
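The four regression metrics above can be computed directly from the actual (y) and predicted (ŷ) values. The sketch below is a minimal illustration; the function name and the sample numbers are assumptions, not data from the course.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R-squared for paired actual/predicted labels."""
    n = len(actual)
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mean_y = sum(actual) / n
    ss_residual = sum(e ** 2 for e in errors)          # sum of (y - y_hat)^2
    ss_total = sum((y - mean_y) ** 2 for y in actual)  # sum of (y - y_bar)^2
    r2 = 1 - ss_residual / ss_total
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical validation results
actual = [2, 4, 6, 8]
predicted = [3, 4, 5, 9]
metrics = regression_metrics(actual, predicted)
```

For these sample values the errors are -1, 0, 1 and -1, so MAE and MSE are both 0.75 and R² is 1 - 3 ÷ 20 = 0.85.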
In most real-world scenarios, a data scientist will use an iterative process to repeatedly train and evaluate a model, varying:
Feature selection and preparation (choosing which features to include in the model, and calculations applied to them to help ensure a better fit).
Algorithm selection (We explored linear regression in the previous example, but there are many other regression algorithms)
Algorithm parameters (numeric settings to control algorithm behavior, more accurately called hyperparameters to differentiate them from the x and y parameters).
Instead of calculating numeric values like a regression model, the algorithms used to train classification models calculate … … for class assignment, and the evaluation metrics used to assess model performance compare the … classes to the … classes.
probability values, predicted, actual
In most real scenarios, the data observations used to train and validate the binary model consist of … feature (x) values and a y value that is either … or ….
multiple, 1 or 0
To train a binary classification model, use an algorithm to fit the training data to a … that calculates the probability of the class label being true.
function, true
… is measured as a value between … and …, such that the total probability for all possible classes is ….
Probability, 0.0 and 1.0, 1.0
There are many algorithms that can be used for binary classification, such as logistic regression, which derives a … (S-shaped) function with values between … and ….
sigmoid, 0.0 and 1.0
Despite its name, in machine learning, … is used for classification, not regression. The important point is the logistic nature of the function it produces, which describes an S-shaped curve between a lower and upper value (0.0 and 1.0 when used for … …).
logistic regression, binary classification
The … … … used to train binary classification models describes the probability of y being true (y=1) for a given value of x. Mathematically, you can express the function like this:
f(x) = P(y=1 | x)
logistic regression function
For logistic regression models, with three of six observations in the training data, we know that y is definitely true, so the probability for those observations that y=1 is …, and for the other three, we know that y is definitely false, so the probability that y=1 is …. The S-shaped curve describes the probability distribution so that plotting a value of x on the line identifies the corresponding probability that y is 1.
1.0, 0.0
The diagram for a logistic regression model also includes a horizontal line to indicate the threshold at which a model based on this function will predict true (1) or false (0). The threshold lies at the … for y (P(y) = 0.5). For any values at this point or above, the model will predict true (1), while for any values below this point it will predict false (0). For example, for a patient with a blood glucose level of 90, the function would result in a probability value of 0.9. Since 0.9 is higher than the threshold of 0.5, the model would predict true (1); in other words, the patient is predicted to have diabetes.
mid-point
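The sigmoid function and the 0.5 threshold can be sketched as below. The weight and bias values are hypothetical, chosen only so that the curve's mid-point falls at x = 80; they are not the coefficients behind the course's diabetes example, so the probabilities differ from the 0.9 quoted above.

```python
import math

def sigmoid(z):
    """S-shaped logistic function with values between 0.0 and 1.0."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, weight, bias):
    """f(x) = P(y=1 | x) for a single feature x."""
    return sigmoid(weight * x + bias)

def predict_label(x, weight, bias, threshold=0.5):
    """Predict 1 (true) at or above the threshold, otherwise 0 (false)."""
    return 1 if predict_probability(x, weight, bias) >= threshold else 0

# Hypothetical trained parameters (assumption for illustration only)
WEIGHT, BIAS = 0.2, -16.0

high = predict_label(90, WEIGHT, BIAS)  # probability above 0.5, predicts 1
low = predict_label(70, WEIGHT, BIAS)   # probability below 0.5, predicts 0
```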
The first step in calculating evaluation metrics for a binary classification model is usually to create a matrix of the number of … and … predictions for each possible class label:
correct and incorrect
A confusion matrix shows the prediction totals where:
ŷ=0 and y=0: True negatives (TN)
ŷ=1 and y=0: False positives (FP)
ŷ=0 and y=1: False negatives (FN)
ŷ=1 and y=1: True positives (TP)
         ŷ=0    ŷ=1
y=0      TN     FP
y=1      FN     TP
The arrangement of the confusion matrix is such that correct (true) predictions are shown in a … line from … to … Often, color-intensity is used to indicate the number of predictions in each cell, so a quick glance at a model that predicts well should reveal a deeply shaded diagonal trend.
diagonal, top-left, bottom-right.
The simplest metric you can calculate from the confusion matrix is… - the proportion of predictions that the model got ….
accuracy, right
(TN+TP) ÷ (TN+FN+FP+TP)
Accuracy is calculated as:
(TN+TP) ÷ (TN+FN+FP+TP)
(2+3) ÷ (2+1+0+3)
= 5 ÷ 6
=0.83
So for our validation data, the diabetes classification model produced correct predictions 83% of the time.
What is the problem with relying on accuracy alone?
Suppose 11% of the population has diabetes. You could create a model that always predicts 0, and it would achieve an accuracy of 89%, even though it makes no real attempt to differentiate between patients by evaluating their features. What we really need is a deeper understanding of how the model performs at predicting 1 for positive cases and 0 for negative cases.
… is a metric that measures the proportion of positive cases that the model identified correctly. In other words, compared to the number of patients who have diabetes, how many did the model predict to have diabetes?
The formula is:
TP ÷ (TP+FN)
Recall
… is a similar metric to recall, but measures the proportion of predicted positive cases where the true label is actually positive. In other words, what proportion of the patients predicted by the model to have diabetes actually have diabetes?
The formula is:
TP ÷ (TP+FP)
Precision
… is an overall metric that combines recall and precision. The formula is:
(2 x Precision x Recall) ÷ (Precision + Recall)
F1-score
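The accuracy, recall, precision, and F1-score formulas can be checked with a few lines of code. The counts below (TN=2, FP=0, FN=1, TP=3) are the ones used in the accuracy calculation earlier in this deck; the function name is an assumption, and the sketch does not guard against division by zero.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return (accuracy, recall, precision, f1) from confusion matrix counts."""
    accuracy = (tn + tp) / (tn + fn + fp + tp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = (2 * precision * recall) / (precision + recall)
    return accuracy, recall, precision, f1

accuracy, recall, precision, f1 = classification_metrics(tn=2, fp=0, fn=1, tp=3)
print(round(accuracy, 2), recall, precision, round(f1, 2))  # 0.83 0.75 1.0 0.86
```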
Another name for recall is the … … …, and there’s an equivalent metric called the … … … that is calculated as FP ÷ (FP+TN).
true positive rate (TPR), false positive rate (FPR)
TPR and FPR metrics are often used to evaluate a model by plotting a … … … curve that compares the TPR and FPR for every possible threshold value between 0.0 and 1.0.
receiver operating characteristic (ROC)
The … curve for a perfect model would go straight up the TPR axis on the left and then across the FPR axis at the top. Since the plot area for the curve measures 1x1, the area under this perfect curve would be 1.0 (meaning that the model is correct 100% of the time). In contrast, a diagonal line from the bottom-left to the top-right represents the results that would be achieved by randomly guessing a binary label; producing an area under the curve of 0.5. In other words, given two possible class labels, you could reasonably expect to guess correctly 50% of the time.
ROC
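One way to picture the ROC curve and the area under it is to compute the (FPR, TPR) point at each threshold and sum the area with the trapezoid rule. This is a minimal sketch; the function names and the sample labels and probabilities are assumptions for illustration.

```python
def roc_points(actual, probabilities):
    """Return (FPR, TPR) points, starting at (0, 0), for each distinct threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(probabilities), reverse=True):
        tp = sum(1 for y, p in zip(actual, probabilities) if p >= threshold and y == 1)
        fp = sum(1 for y, p in zip(actual, probabilities) if p >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoid-rule area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical validation labels and predicted probabilities
actual = [0, 0, 1, 0, 1, 1]
probabilities = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
points = roc_points(actual, probabilities)
auc = area_under_curve(points)
```

For this sample the area is 8 ÷ 9 (about 0.89), between the 0.5 of random guessing and the 1.0 of a perfect model.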
As a supervised machine learning technique, … … follows the same iterative train, validate, and evaluate process as regression and binary classification, in which a subset of the training data is held back to validate the trained model.
Multiclass classification
Multiclass classification algorithms are used to calculate probability values for multiple class labels, enabling a model to predict the … … class for a given observation.
most probable
To train a multiclass classification model, we need to use an algorithm to fit the training data to a function that calculates a probability value for each possible class. There are two kinds of algorithm you can use to do this:
One-vs-Rest (OvR) algorithms
Multinomial algorithms
… algorithms train a binary classification function for each class, each calculating the probability that the observation is an example of the target class. Each function calculates the probability of the observation being a specific class compared to any other class.
One-vs-Rest
f0(x) = P(y=0 | x)
f1(x) = P(y=1 | x)
f2(x) = P(y=2 | x)
Each algorithm produces a sigmoid function that calculates a probability value between 0.0 and 1.0. A model trained using this kind of algorithm predicts the class for the function that produces the highest probability output.
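A One-vs-Rest prediction can be sketched as one sigmoid function per class, with the model choosing the class whose function returns the highest probability. The per-class weights and biases below are hypothetical values for a single feature, used only to show the mechanics.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_rest(x, class_parameters):
    """Return (predicted_class, probabilities) using one binary function per class."""
    probabilities = [sigmoid(weight * x + bias) for weight, bias in class_parameters]
    predicted = max(range(len(probabilities)), key=lambda c: probabilities[c])
    return predicted, probabilities

# Hypothetical (weight, bias) pairs for f0, f1 and f2
class_parameters = [(-2.0, 2.0), (0.5, -1.0), (2.0, -6.0)]

predicted_low, _ = predict_one_vs_rest(0.0, class_parameters)   # class 0
predicted_mid, _ = predict_one_vs_rest(2.2, class_parameters)   # class 1
predicted_high, _ = predict_one_vs_rest(4.0, class_parameters)  # class 2
```

Unlike a multinomial output, these per-class probabilities are calculated independently and need not add up to 1.0.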
A … algorithm creates a single function that returns a multi-valued output. The output is a vector (an array of values) that contains the probability distribution for all possible classes, with a probability score for each class which, when totaled, adds up to 1.0:
f(x) = [P(y=0|x), P(y=1|x), P(y=2|x)]
multinomial
An example of a multinomial function is a … function, which could produce an output like the following example:
[0.2, 0.3, 0.5]
The elements in the vector represent the probabilities for classes 0, 1, and 2 respectively; so in this case, the class with the highest probability is 2.
softmax
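A softmax function turns a list of raw class scores into probabilities that add up to 1.0. In the sketch below the raw scores are hypothetical, chosen as the logarithms of 0.2, 0.3 and 0.5 so that the output matches the example vector above.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution that sums to 1.0."""
    highest = max(scores)  # subtracting the maximum avoids overflow in exp()
    exponentials = [math.exp(s - highest) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]

# Hypothetical raw scores for classes 0, 1 and 2
scores = [math.log(0.2), math.log(0.3), math.log(0.5)]
probabilities = softmax(scores)
predicted_class = max(range(len(probabilities)), key=lambda c: probabilities[c])
```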
You can evaluate a multiclass classifier by calculating … classification metrics for each individual class. Alternatively, you can calculate … metrics that take all classes into account.
binary, aggregate
The confusion matrix for a multiclass classifier is similar to that of a binary classifier, except that it shows the number of predictions … … … of predicted (ŷ) and actual (y) class labels.
for each combination
To calculate the overall accuracy, recall, and precision metrics, you use the total of the … … … and … metrics:
Overall accuracy = (TN+TP) ÷ (TN+FN+FP+TP)
Overall recall = TP ÷ (TP+FN)
Overall precision = TP ÷ (TP+FP)
TP: True Positive, FP: False Positive, TN: True Negative, FN: False Negative
TP, TN, FP, and FN
The overall F1 score is based on … … and … ….
Overall precision and overall recall
Overall F1 = (2 x Precision x Recall) ÷ (Precision + Recall)
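The overall metrics can be sketched by counting TP, FP, FN and TN for each class in turn and totaling them. The confusion matrix below is hypothetical (rows are actual classes, columns are predicted classes), and the function name is an assumption.

```python
def overall_metrics(matrix):
    """matrix[actual][predicted] holds counts; returns overall metrics as a dict."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = fp = fn = tn = 0
    for c in range(n_classes):
        class_tp = matrix[c][c]
        class_fp = sum(matrix[r][c] for r in range(n_classes)) - class_tp  # column minus diagonal
        class_fn = sum(matrix[c]) - class_tp                               # row minus diagonal
        class_tn = total - class_tp - class_fp - class_fn
        tp, fp, fn, tn = tp + class_tp, fp + class_fp, fn + class_fn, tn + class_tn
    accuracy = (tn + tp) / (tn + fn + fp + tp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = (2 * precision * recall) / (precision + recall)
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}

# Hypothetical results for 7 observations across classes 0, 1 and 2
matrix = [
    [2, 0, 0],
    [0, 2, 0],
    [0, 1, 2],
]
results = overall_metrics(matrix)
```

Here the totals are TP=6, TN=13, FP=1 and FN=1, giving an overall accuracy of 19 ÷ 21 and overall recall and precision of 6 ÷ 7.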
Clustering is a form of … machine learning in which observations are grouped into clusters based on similarities in their data values, or features. This kind of machine learning is considered … because it doesn’t make use of previously known label values to train a model. In a clustering model, the … is the cluster to which the observation is assigned, based only on its features.
unsupervised, unsupervised, label
For example, suppose a botanist observes a sample of flowers and records the number of leaves and petals on each flower:
There are no known labels in the dataset, just two features. The goal is not to identify the different types (species) of flower; just to group similar flowers together based on the number of leaves and petals.
There are multiple algorithms you can use for clustering. One of the most commonly used algorithms is K-Means clustering, which consists of the following steps:
The feature (x) values are vectorized to define n-dimensional coordinates (where n is the number of features). In the flower example, we have two features: number of leaves (x1) and number of petals (x2). So, the feature vector has two coordinates that we can use to conceptually plot the data points in two-dimensional space ([x1, x2]).
You decide how many clusters you want to use to group the flowers; call this value k. For example, to create three clusters, you would use a k value of 3. Then k points are plotted at random coordinates. These points become the center points for each cluster, so they’re called centroids.
Each data point (in this case a flower) is assigned to its nearest centroid.
Each centroid is moved to the center of the data points assigned to it based on the mean distance between the points.
After the centroid is moved, the data points may now be closer to a different centroid, so the data points are reassigned to clusters based on the new closest centroid.
The centroid movement and cluster reallocation steps are repeated until the clusters become stable or a predetermined maximum number of iterations is reached.
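The K-Means steps above can be sketched in a few lines. As a simplifying assumption, this version picks k of the data points at random as the starting centroids (rather than arbitrary random coordinates); the function names and the sample points are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def k_means(points, k, max_iterations=100, seed=0):
    """Return (centroids, assignments) after iterating until the clusters are stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    assignments = None
    for _ in range(max_iterations):
        # Assign each point to its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # clusters are stable
        assignments = new_assignments
        # Move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignments

# Hypothetical [x1, x2] feature vectors forming two obvious groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignments = k_means(points, k=2)
```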
Since there’s no known label with which to compare the predicted cluster assignments, evaluation of a clustering model is based on how well the resulting clusters are … … … ….
separated from one another
There are multiple metrics that you can use to evaluate cluster separation, including:
Average distance to cluster center: How close, on average, each point in the cluster is to the centroid of the cluster.
Average distance to other center: How close, on average, each point in the cluster is to the centroid of all other clusters.
Maximum distance to cluster center: The furthest distance between a point in the cluster and its centroid.
Silhouette: A value between -1 and 1 that summarizes the ratio of distance between points in the same cluster and points in different clusters (The closer to 1, the better the cluster separation).
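As a rough illustration of the silhouette metric, the sketch below computes it by hand for a tiny made-up dataset. It assumes at least two clusters and uses the common definition (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the mean distance to the points in the nearest other cluster; a point that is alone in its cluster scores 0.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette value over all points (requires at least two clusters)."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster.
        same = [math.dist(p, q)
                for j, (q, l) in enumerate(zip(points, labels))
                if l == label and j != i]
        if not same:
            scores.append(0.0)  # single-point cluster
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical points forming two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])   # sensible clustering
mixed = silhouette_score(points, [0, 1, 0, 1])  # deliberately poor clustering
print(good, mixed)
```

The sensible clustering scores close to 1, while the deliberately mixed one scores much lower.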
… … is an advanced form of machine learning that tries to emulate the way the human brain learns.
Deep learning
The key to deep learning is the creation of an …… … that simulates electrochemical activity in biological neurons by using mathematical functions.
artificial neural network
Artificial neural networks are made up of multiple layers of neurons - essentially defining a … … …. This architecture is the reason the technique is referred to as deep learning and the models produced by it are often referred to as deep neural networks (DNNs). You can use deep neural networks for many kinds of machine learning problems, including regression and classification, as well as more specialized models for natural language processing and computer vision.
deeply nested function
Just like other machine learning techniques discussed in this module, deep learning involves fitting training data to a function that can predict a label (y) based on the value of one or more features (x). The function (f(x)) is the … … of a nested function in which each layer of the neural network encapsulates functions that operate on x and the weight (w) values associated with them. The algorithm used to train the model involves iteratively feeding the … … (x) in the training data forward through the layers to calculate output values for ŷ, validating the model to evaluate how far off the calculated ŷ values are from the known y values (which quantifies the level of error, or loss, in the model), and then modifying the weights (w) to reduce the loss. The trained model includes the final weight values that result in the most accurate predictions.
outer layer, feature values
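The training loop described in the card above (feed x forward, measure the loss, adjust the weights) can be illustrated with a deliberately simplified sketch. It assumes a single linear "neuron" ŷ = w·x + b, a mean squared error loss, and plain gradient descent; a real deep neural network has many layers and weights, and the data and hyperparameters below are made up.

```python
def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ≈ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: calculate the predicted values (ŷ).
        predictions = [w * x + b for x in xs]
        # Compare ŷ with the known y values.
        errors = [p - y for p, y in zip(predictions, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Modify the weights to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, b, loss

# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b, loss = train(xs, ys)
print(w, b, loss)
```

With this data the learned weights approach w = 2 and b = 1, and the loss approaches zero.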
This is an example of a classification problem, in which the machine learning model must predict the most probable class to which an observation belongs. A classification model accomplishes this by predicting a label that consists of the probability for … ….
each class
In other words, y is a vector of three probability values; one for each of the possible classes: [P(y=0|x), P(y=1|x), P(y=2|x)].
The process for inferencing a predicted penguin class using a deep learning network is:
The feature vector for a penguin observation is fed into the input layer of the neural network, which consists of a neuron for each x value. In this example, the following x vector is used as the input: [37.3, 16.8, 19.2, 30.0]
The functions for the first layer of neurons each calculate a weighted sum by combining the x value and w weight, and pass it to an activation function that determines if it meets the threshold to be passed on to the next layer.
Each neuron in a layer is connected to all of the neurons in the next layer (an architecture sometimes called a fully connected network) so the results of each layer are fed forward through the network until they reach the output layer.
The output layer produces a vector of values; in this case, using a softmax or similar function to calculate the probability distribution for the three possible classes of penguin. In this example, the output vector is: [0.2, 0.7, 0.1]
The elements of the vector represent the probabilities for classes 0, 1, and 2. The second value is the highest, so the model predicts that the species of the penguin is 1 (Gentoo).
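The inferencing steps above can be sketched as a small forward pass. The weights and biases below are made-up numbers chosen only for illustration (a trained network would learn them), the single hidden layer with a ReLU activation is an assumption, and the output is not expected to match the [0.2, 0.7, 0.1] vector in the example.

```python
import math

def dense(inputs, weights, biases):
    """Weighted sum for each neuron in a fully connected layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Activation function: pass positive values on, block the rest."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Convert raw output values into a probability distribution."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Feature vector from the example.
x = [37.3, 16.8, 19.2, 30.0]

# Hypothetical weights: 4 inputs -> 3 hidden neurons -> 3 output classes.
hidden_w = [[0.01, -0.02, 0.03, 0.01],
            [-0.03, 0.02, 0.01, 0.02],
            [0.02, 0.01, -0.01, -0.02]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.5, -0.2, 0.1],
            [-0.3, 0.8, 0.2],
            [0.1, 0.1, -0.4]]
output_b = [0.0, 0.0, 0.0]

hidden = relu(dense(x, hidden_w, hidden_b))
probabilities = softmax(dense(hidden, output_w, output_b))
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
print(probabilities, predicted_class)
```

The predicted class is simply the index of the largest probability, which is how class 1 is chosen in the example.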
Azure Machine Learning provides the following features and capabilities to support machine learning workloads:
Centralized storage and management of datasets for model training and evaluation.
On-demand compute resources on which you can run machine learning jobs, such as training a model.
Automated machine learning (AutoML), which makes it easy to run multiple training jobs with different algorithms and parameters to find the best model for your data.
Visual tools to define orchestrated pipelines for processes such as model training or inferencing.
Integration with common machine learning frameworks such as MLflow, which make it easier to manage model training, evaluation, and deployment at scale.
Built-in support for visualizing and evaluating metrics for responsible AI, including model explainability, fairness assessment, and others.
The primary resource required for Azure Machine Learning is an Azure Machine Learning …
workspace
Azure Machine Learning …; a browser-based portal for managing your machine learning resources and jobs.
studio
In Azure Machine Learning studio, you can (among other things):
Import and explore data.
Create and use compute resources.
Run code in notebooks.
Use visual tools to create jobs and pipelines.
Use automated machine learning to train models.
View details of trained models, including evaluation metrics, responsible AI information, and training parameters.
Deploy trained models for on-request and batch inferencing.
Import and manage models from a comprehensive model catalog.
… … imitates human behavior by relying on machines to learn and execute tasks without explicit directions on what to output.
Artificial Intelligence
… … algorithms take in data like weather conditions and fit models to the data, to make predictions like how much money a store might make in a given day.
Machine learning
… … models use layers of algorithms in the form of artificial neural networks to return results for more complex use cases. Many Azure AI services are built on deep learning models. You can check out this article to learn more about the difference between machine learning and deep learning.
Deep learning
… … models can produce new content based on what is described in the input. The OpenAI models are a collection of generative AI models that can produce language, code, and images.
Generative AI
Generative AI includes:
Generating natural language
Generating code
Generating images
Azure OpenAI Service consists of four components:
Pre-trained generative AI models
Customization capabilities; the ability to fine-tune AI models with your own data
Built-in tools to detect and mitigate harmful use cases so users can implement AI responsibly
Enterprise-grade security with role-based access control (RBAC) and private networks
Azure OpenAI supports many common AI workloads and solves for some new ones.
Common AI workloads include machine learning, computer vision, natural language processing, conversational AI, anomaly detection, and knowledge mining.
Other AI workloads Azure OpenAI supports can be categorized by tasks they support, such as:
Generating Natural Language
Text completion: generate and edit text
Embeddings: search, classify, and compare text
Generating Code: generate, edit, and explain code
Generating Images: generate and edit images
Azure AI services encompass all of what were previously known as … … and Azure Applied AI Services.
Cognitive Services
Azure AI services are tools for solving AI …
workloads
There are several overlapping capabilities between Azure AI Language service and Azure OpenAI Service, such as translation, … …, and keyword extraction
sentiment analysis
… is the process of optimizing a model’s performance.
Tuning
Azure OpenAI Service may be more beneficial for use-cases that require highly customized … …, or for exploratory research
generative models
When making business decisions about what type of model to use, it’s important to understand how time and compute needs factor into machine learning training. In order to produce an effective machine learning model, the model needs to be trained with a substantial amount of cleaned data. The ‘learning’ portion of training requires a computer to identify an algorithm that best fits the data. The complexity of the task the model needs to solve for and the desired level of model performance all factor into the … required to run through possible solutions for a best fit algorithm.
time
… models that represent the latest generative models for natural language and code.
GPT-4
… models that can generate natural language and code responses based on prompts.
GPT-3.5
… models that convert text to numeric vectors for analysis - for example comparing sources of text for similarity.
Embeddings
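To make "comparing sources of text for similarity" concrete, here is a minimal sketch using cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors are invented for illustration; real embedding models return vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three short texts.
cat = [0.9, 0.1, 0.2]
kitten = [0.8, 0.2, 0.3]
invoice = [0.1, 0.9, 0.7]

print(cosine_similarity(cat, kitten))   # closer to 1: similar meaning
print(cosine_similarity(cat, invoice))  # lower: less similar
```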
… models that generate images based on natural language descriptions
DALL-E
… models always have a probability of reflecting true values. Higher performing models, such as models that have been fine-tuned for specific tasks, do a better job of returning responses that reflect true values. It is important to review the output of generative AI models.
Generative AI
In the Azure OpenAI Studio, you can experiment with OpenAI models in …. In the … …, you can type in prompts, configure parameters, and see responses without having to code.
playgrounds, Completions playground
In the … playground, you can use the assistant setup to instruct the model about how it should behave. The assistant will try to mimic the responses you include in tone, rules, and format you’ve defined in your system message.
Chat
… … learning models are trained on words or chunks of characters known as tokens. For example, the word “hamburger” gets broken up into the tokens ham, bur, and ger, while a short and common word like “pear” is a single token.
These tokens are mapped into vectors for a machine learning model to use for training. When a trained … … model takes in a user’s input, it also breaks down the input into tokens.
Natural language, natural language
… … … models are excellent at both understanding and creating natural language.
Generative pre-trained transformer (GPT)
GPT tries to infer, or guess, the context of the user’s question based on the …
prompt
Natural language tasks include:
Summarizing text
Classifying text
Generating names or phrases
Translation
Answering questions
Suggesting content
… models have been trained on both natural language and billions of lines of code from public repositories.
GPT
What’s unique about the … model family is that it’s more capable across more languages than GPT models.
Codex
… can also summarize functions that are already written, explain SQL queries or tables, and convert a function from one programming language into another.
GPT
OpenAI Codex is:
OpenAI Codex is an artificial intelligence model developed by OpenAI. It parses natural language and generates code in response. It powers GitHub Copilot, a programming autocompletion tool for select IDEs, like Visual Studio Code and Neovim.
The main difference between Codex and ChatGPT is that Codex focuses on code generation, while ChatGPT is designed for conversational text generation. Codex is also described as significantly faster than ChatGPT at code generation. Both were developed by OpenAI.
GitHub … integrates the power of OpenAI Codex into a plugin for developer environments like Visual Studio Code.
Copilot
In addition to natural language capabilities, generative AI models can edit and create images. The model that works with images is called ….
DALL-E
Image capabilities generally fall into the three categories of:
image creation, editing an image, and creating variations of an image.
DALL-E can edit the image as requested by changing its style, adding or removing items, or generating new content to add. Edits are made by uploading the original image and specifying a transparent … that indicates what area of the image to edit
Mask
… … … are AI capabilities that can be built into web or mobile applications, in a way that’s straightforward to implement.
Azure AI services
The Azure AI … … service can be used to detect harmful content within text or images, including violent or hateful content, and report on its severity.
Content Safety
The Azure AI … service can be used to summarize text, classify information, or extract key phrases.
Language
Azure AI … service provides powerful speech to text and text to speech capabilities, allowing speech to be accurately transcribed into text, or text to natural sounding voice audio.
Speech
Azure AI services are based on three principles that dramatically improve speed-to-market:
Prebuilt and ready to use
Accessed through APIs
Available on Azure
Developers can access AI services through … …, client libraries, or integrate them with tools such as Logic Apps and Power Automate.
REST APIs
AI Services are managed in the same way as other Azure services, such as platform as a service (PaaS), infrastructure as a service (IaaS), or a … … service
managed database
The Azure platform and … … provide a consistent framework for all your Azure services, from creating or deleting resources, to availability and billing.
Resource Manager
There are two types of AI service resources … or …
multi-service or single-service.
… resource: a resource created in the Azure portal that provides access to multiple Azure AI services with a single key and endpoint. Use the resource Azure AI services when you need several AI services or are exploring AI capabilities. When you use an Azure AI services resource, all your AI services are billed together.
Multi-service
… resources: a resource created in the Azure portal that provides access to a single Azure AI service, such as Speech, Vision, Language, etc. Each Azure AI service has a unique key and endpoint. These resources might be used when you only require one AI service or want to see cost information separately.
Single-service
To create an Azure AI services resource, sign in to the Azure portal with … access and select Create a resource.
Contributor
Once you create an Azure AI service resource, you can build applications using the … …, software development kits (SDKs), or visual studio interfaces.
REST API
There are different studios for different Azure AI services, such as … …, Language Studio, Speech Studio, and the Content Safety Studio.
Vision Studio
Before you can use an AI service resource, you must associate it with the … you want to use on the Settings page. Select the resource, and then select Use Resource.
studio
Most Azure AI services are accessed through a … …, although there are other ways. The API defines what information is passed between two software components: the Azure AI service and whatever is using it.
RESTful API
Part of what an … does is to handle authentication. Whenever a request is made to use an AI services resource, that request must be authenticated. For example, your subscription and AI service resource are verified to ensure you have sufficient permissions to access the resource. This authentication process uses an endpoint and a resource key.
API
The … … protects the privacy of your resource.
resource key
When you write code to access the AI service, the keys and endpoint must be included in the … ….
authentication header
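As a hedged sketch of what this looks like in code, the example below builds (but does not send) a REST request using only the Python standard library. Azure AI services accept the resource key in the `Ocp-Apim-Subscription-Key` header; in this sketch the endpoint forms the base of the request URL. The endpoint, path, key, and payload shown are placeholders, not real values or a real API route.

```python
import json
import urllib.request

def build_request(endpoint, resource_key, path, payload):
    """Build a POST request that carries the resource key in the header."""
    url = endpoint.rstrip("/") + path
    headers = {
        "Ocp-Apim-Subscription-Key": resource_key,  # authenticates the call
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

# Placeholder values: substitute your own resource's endpoint and key,
# and the path documented for the specific service you are calling.
request = build_request(
    endpoint="https://example-resource.cognitiveservices.azure.com",
    resource_key="<your-resource-key>",
    path="/placeholder/path",
    payload={"text": "Hello"},
)
print(request.full_url)
print(request.get_method())
```

In real code the key would normally come from an environment variable or a secret store rather than being written into the source.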
… … is a technique that uses mathematics and statistics to create a model that can predict unknown values.
Machine learning
Mathematically, you can think of machine learning as a way of defining a … (let’s call it f) that operates on one or more … of something (which we’ll call x) to calculate a predicted … (y), like this:
f(x) = y
function, features, label
The specific operation that the f function performs on x to calculate y depends on a number of factors, including the type of … you’re trying to create and the specific algorithm used to train the model.
model
The … … … approach requires you to start with a dataset with known label values. Two types of supervised machine learning tasks include regression and classification.
supervised machine learning
… is used to predict a continuous value; like a price, a sales total, or some other measure.
Regression
… is used to determine a class label; an example of a binary class label is whether a patient has diabetes or not; an example of multi-class labels is classifying text as positive, negative, or neutral.
Classification
The … machine learning approach starts with a dataset without known label values. One type of unsupervised machine learning task is clustering.
unsupervised
… is used to determine labels by grouping similar information into label groups; like grouping measurements from birds into species.
Clustering
To use Azure Machine Learning, you first create a … resource in your Azure subscription.
workspace
You can then use this workspace to manage data, code, … , and other artifacts related to your machine learning workloads.
models
After you have created an Azure Machine Learning workspace, you can develop solutions with the Azure Machine Learning service either with developer tools or the Azure Machine Learning studio … …
web portal.
Azure Machine Learning … is a web portal for machine learning solutions in Azure.
studio
Azure Machine Learning includes an automated machine learning capability that automatically tries multiple pre-processing techniques and model-training algorithms in ….
These automated capabilities use the power of cloud … to find the best performing supervised machine learning model for your data.
parallel, compute
Automated machine learning allows you to train models without extensive data science or programming knowledge. For people with a data science and programming background, it provides a way to save time and resources by automating algorithm selection and … tuning.
hyperparameter
In Azure Machine Learning, operations that you run are called …. You can configure multiple settings for your job before starting an automated machine learning ….
jobs, run
The run configuration provides the information needed to specify your training … and Azure Machine Learning environment in your run … and run a training job.
script, configuration
You can think of the steps in a machine learning process as:
Prepare data:
Train model:
Evaluate performance:
Deploy a predictive service:
… …: Identify the features and label in a dataset. Pre-process, or clean and transform, the data as needed.
Prepare data:
… …: Split the data into two groups, a training and a validation set. Train a machine learning model using the training data set. Test the machine learning model for performance using the validation data set.
Train model:
… …: Compare how close the model’s predictions are to the known labels.
Evaluate performance:
Deploy a … …: After you train a machine learning model, you can deploy the model as an application on a server or device so that others can use it.
predictive service:
In Azure Machine Learning, data for model training and other operations is usually encapsulated in an object called a … ….
data asset.
The automated machine learning capability in Azure Machine Learning supports supervised machine learning models - in other words, models for which the training data includes known label values. You can use automated machine learning to train models for:
Classification (predicting categories or classes)
Regression (predicting numeric values)
Time series forecasting (predicting numeric values at a future point in time)
In Automated Machine Learning, you can select configurations for the primary metric, type of model used for training, exit criteria, and … ….
concurrency limits
Additional Machine Learning configuration settings include:
Primary metric
Explain best model
Use all supported models
Blocked models
Training Job Time
Metric score threshold
Max concurrent iterations
… will split data into a training set and a validation set.
AutoML
The best model is identified based on the … metric you specified.
evaluation
If you used … … to stop the job, the “best” model the job generated might not be the best possible model, just the best one found within the time allowed for this exercise.
exit criteria
A technique called … is used to calculate the evaluation metric. After the model is trained using a portion of the data, the remaining portion is used to iteratively test, or …, the trained model. The metric is calculated by comparing the predicted value from the test with the actual known value, or label.
cross-validation, cross-validate
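One common form of cross-validation is k-fold: the data is split into k parts, and each part takes a turn as the held-out validation set. The sketch below is only an illustration of that idea. The "model" is a stand-in that predicts the mean of the training labels, the score is the mean absolute error, and the label values are made up; it does not describe how automated machine learning performs the split internally.

```python
def k_fold_indices(n_samples, k):
    """Yield (training, validation) index lists for each of the k folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        training = [i for i in range(n_samples) if i < start or i >= stop]
        yield training, validation
        start = stop

def cross_validate(labels, k):
    """Average validation error of a mean-predicting stand-in model."""
    fold_errors = []
    for training, validation in k_fold_indices(len(labels), k):
        # "Train": the stand-in model just remembers the mean training label.
        prediction = sum(labels[i] for i in training) / len(training)
        # "Validate": compare the prediction with the held-out known labels.
        error = sum(abs(labels[i] - prediction) for i in validation) / len(validation)
        fold_errors.append(error)
    return sum(fold_errors) / len(fold_errors)

# Hypothetical label values.
labels = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, 13.0]
print(cross_validate(labels, k=5))
```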
The difference between the predicted and actual value, known as the …, indicates the amount of error in the model.
residuals
The performance metric … … … … (RMSE) is calculated by squaring the errors across all of the test cases, finding the mean of these squares, and then taking the square root.
root mean squared error
With root mean squared error, the … this value is, the more accurate the model’s predictions.
smaller
The … root mean squared error (NRMSE) standardizes the RMSE metric so it can be used for comparison between models which have variables on different scales.
normalized
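The residual, RMSE, and NRMSE cards above can be worked through with a small example. The values are made up, and dividing by the range of the true values is assumed here as the normalization; it is one common convention, and other definitions divide by the mean or standard deviation instead.

```python
import math

def residuals(y_true, y_pred):
    """Difference between each predicted value and the actual value."""
    return [p - t for t, p in zip(y_true, y_pred)]

def rmse(y_true, y_pred):
    """Square the errors, take the mean, then take the square root."""
    squared = [r ** 2 for r in residuals(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def nrmse(y_true, y_pred):
    """RMSE divided by the range of the true values (assumed convention)."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))

# Hypothetical known labels and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(residuals(y_true, y_pred))  # [-1.0, 0.0, 1.0, 0.0]
print(rmse(y_true, y_pred))       # sqrt(0.5), about 0.707
print(nrmse(y_true, y_pred))      # about 0.118
```

Here the squared errors are 1, 0, 1, and 0, so their mean is 0.5 and the RMSE is its square root.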
The … … shows the frequency of residual value ranges. Residuals represent variance between predicted and true values that can’t be explained by the model, in other words, errors. You should hope to see the most frequently occurring residual values clustered around zero. You want small errors with fewer errors at the extreme ends of the scale.
Residual Histogram
The … vs. … chart should show a diagonal trend in which the predicted value correlates closely to the true value. The dotted line shows how a perfect model should perform. The closer the line of your model’s average predicted value is to the dotted line, the better its performance. A histogram below the line chart shows the distribution of true values.
Predicted, True
In Azure Machine Learning, you can deploy a service as an … … … (ACI) or to an … … … (AKS) cluster.
Azure Container Instances, Azure Kubernetes Service
For production scenarios, an … deployment is recommended, for which you must create an … … … ….
AKS, inference cluster compute target