02-Machine Learning Flashcards
How do machines learn
Machines learn from data
What is Regression
Regression is a form of Machine Learning that predicts a numeric LABEL based on an item’s FEATURES
What type of Machine Learning technique is Regression
Regression is Supervised Machine Learning
What is Supervised Learning
Technique in which you train a model using data that includes both FEATURES and known values for the LABEL, so that the model learns to FIT the FEATURE combinations to the LABEL
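A minimal sketch of supervised regression, assuming a single FEATURE and a straight-line model fitted by ordinary least squares. The data (hours studied, exam score) and the function name `fit_line` are hypothetical and only illustrate how known LABELS guide the fit.

```python
def fit_line(features, labels):
    """Ordinary least squares for one feature: label = slope * feature + intercept."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    var = sum((x - mean_x) ** 2 for x in features)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: FEATURE = hours studied, LABEL = exam score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)   # training uses features AND known labels
predicted = slope * 5.0 + intercept          # predict the label for an unseen item
print(slope, intercept, predicted)
```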
What is Classification
Classification is a form of Machine Learning that predicts which category, or class, an item belongs to
What type of Machine Learning technique is Classification
Classification is Supervised Machine Learning
What is Clustering
Clustering is a form of Machine Learning that is used to GROUP SIMILAR ITEMS based on their features
What type of Machine Learning technique is Clustering
Clustering is Unsupervised Machine Learning
What is Unsupervised Machine Learning
Unsupervised Machine Learning is where you train a model to separate items into clusters based purely on their characteristics, or FEATURES. There is no previously known cluster value (or LABEL) from which to train the model.
What are Azure Machine Learning services
- AUTOMATED machine learning
- Azure Machine Learning DESIGNER
- Data and compute MANAGEMENT
- PIPELINES
What is Automated Machine Learning
Automated Machine Learning ALLOWS NON-EXPERTS to quickly create an effective machine learning model from data
What is Azure Machine Learning designer
Azure Machine Learning designer is a GUI that allows no-code development of machine learning solutions.
The Designer tool in Azure Machine Learning studio allows you to create and run pipelines by using DRAG & DROP INTERFACE to connect modules that define the steps and data flow for the pipeline.
What is Data and Compute management
Data and Compute management is CLOUD-BASED DATA STORAGE AND COMPUTE resources that professional data scientists can use to run data experiment code at scale.
"At scale" means they can run multiple training experiments in parallel while incurring costs only when the resources are actually used.
What are Pipelines
Pipelines are MULTI-STEP WORKFLOWS to PREPARE data, TRAIN models, and perform model MANAGEMENT tasks.
Data scientists, software engineers, and IT operations professionals can all use pipelines to define these workflows.
What is Forecasting
Forecasting is Regression with a TIME-SERIES element
What is Azure Machine Learning
Azure Machine Learning is a CLOUD SERVICE that you can use to TRAIN and MANAGE machine learning models.
You need COMPUTE on which to run the training process.
What is a Workspace
A Workspace is CREATED IN your AZURE SUBSCRIPTION so that you can use Azure Machine Learning.
It allows you to MANAGE data, compute resources, code, models, and other ARTIFACTS related to your machine learning workloads
What are Compute Targets
Compute Targets are cloud-based resources on which you can run MODEL TRAINING and DATA EXPLORATION processes.
What are four types of Compute Resources you can create
- Compute INSTANCES
- Compute CLUSTERS
- INFERENCE Clusters
- ATTACHED Compute
What are Compute Instances
Compute Instances are DEVELOPMENT WORKSTATIONS that data scientists can use to work with data and models
What are Compute Clusters
Compute Clusters are scalable CLUSTERS OF VIRTUAL MACHINES for on-demand processing of experiment code
What are Inference Clusters
Inference Clusters are DEPLOYMENT TARGETS FOR PREDICTIVE SERVICES that use your trained models
What is Attached Compute
Attached Compute LINKS TO EXISTING AZURE COMPUTE RESOURCES, such as Virtual Machines or Azure Databricks clusters
What is a dataset
A dataset is an object that ENCAPSULATES DATA for model training and other operations
What are experiments
Experiments are OPERATIONS you run in Azure Machine Learning
What is cross-validation
Cross-validation ITERATIVELY tests the trained model with data it wasn’t trained with and compares the predicted values with the actual known values.
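A rough sketch of k-fold cross-validation, one common way to do this. The helper `k_fold_indices`, the sample labels, and the "model" (which just predicts the mean of its training labels) are all assumptions chosen to keep the example small.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test

# Hypothetical known label values.
labels = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]

fold_errors = []
for train, test in k_fold_indices(len(labels), k=3):
    # Stand-in "training": the model predicts the mean of the training labels.
    prediction = sum(labels[i] for i in train) / len(train)
    # Test on held-out items: compare the prediction with the actual known values.
    mean_abs_error = sum(abs(labels[i] - prediction) for i in test) / len(test)
    fold_errors.append(mean_abs_error)

mean_error = sum(fold_errors) / len(fold_errors)
print(fold_errors, mean_error)
```

Each item is held out exactly once, so every prediction is scored on data the model did not see during that round.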
What are residuals
Residuals are the difference between the PREDICTED and ACTUAL value.
Amount of ERROR in the model.
The Root Mean Squared Error (RMSE) performance metric is calculated by SQUARING the errors across all the test cases, finding the MEAN of these squares, and then taking the SQUARE ROOT.
The smaller the value, the more accurate the model is at predicting.
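The square, mean, square-root calculation described above is commonly called root mean squared error (RMSE). A minimal sketch, assuming made-up predicted and actual values:

```python
import math

def rmse(predicted, actual):
    """Square each error, take the mean of the squares, then the square root."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical test cases.
predicted = [3.0, 5.0, 7.0]
actual = [2.0, 5.0, 9.0]

residuals = [p - a for p, a in zip(predicted, actual)]  # [1.0, 0.0, -2.0]
error = rmse(predicted, actual)                          # sqrt(5 / 3), about 1.29
print(residuals, error)
```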
Which are two ways you can deploy a Machine Learning model as a service
- Azure Container Instances (ACI)
- Azure Kubernetes Service (AKS)
What is Reinforcement Learning
In Reinforcement learning, the algorithm CHOOSES AN ACTION IN RESPONSE TO EACH DATA POINT.
Common in robotics, where the set of sensor readings at one point in time is a data point and the algorithm must CHOOSE THE ROBOT’S NEXT ACTION. Also used for Internet of Things applications.
The algorithm also RECEIVES A REWARD SIGNAL a short time later, indicating how good the decision was.
Based on this signal, the algorithm modifies its strategy in order to achieve the highest reward.
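A simplified sketch of the choose, reward, adjust loop, using an epsilon-greedy strategy over two actions. This is only one of many possible strategies, and it ignores the data point (sensor readings) entirely. The reward function, the epsilon value, and all names are hypothetical.

```python
import random

def reward_for(action):
    # Hypothetical environment: action 1 is better than action 0.
    return 1.0 if action == 1 else 0.2

rng = random.Random(0)
epsilon = 0.1            # fraction of steps spent exploring
values = [0.0, 0.0]      # running estimate of each action's reward
counts = [0, 0]

for step in range(200):
    if step < len(values):
        action = step                          # try every action once first
    elif rng.random() < epsilon:
        action = rng.randrange(len(values))    # explore: pick a random action
    else:
        action = values.index(max(values))     # exploit: pick the best estimate
    reward = reward_for(action)                # the reward signal
    counts[action] += 1
    # Modify the strategy: update the running mean reward for this action.
    values[action] += (reward - values[action]) / counts[action]

best_action = values.index(max(values))
print(best_action, values)
```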
What type of ML algorithm predicts values
Regression algorithms predict values. They make forecasts by estimating the relationship between values. Answers questions like: How much or how many?
What ML algorithm finds unusual occurrences
Anomaly Detection algorithms find unusual occurrences. They identify and predict rare or unusual data points
What ML algorithm discovers structure
Clustering algorithm. It separates similar data points into INTUITIVE GROUPS. Answers questions like: How is this organized?
What ML algorithm generates recommendations
Recommender algorithms predict what someone will be interested in. Answers the question: What will they be interested in?
What is K-Means
K-Means is a Clustering algorithm, unsupervised learning
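A minimal sketch of the K-Means loop, assuming 2-D points and K = 2. The sample points, the random seed, and the function names are hypothetical; production libraries add smarter initialization and convergence checks.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
rng = random.Random(42)
initial = rng.sample(points, 2)      # K = 2 randomly chosen starting coordinates
centroids, assignments = k_means(points, initial)
print(centroids, assignments)
```

No labels are passed in: the groups come purely from the distances between feature values.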
What ML algorithm classifies images
Image Classification is the ML algorithm that classifies images.
It classifies images with popular neural networks.
Answers questions like: What does this image represent?
What is a Confusion Matrix
TABULATION of the predicted and actual value counts for each possible class
What are true positives
Both predicted and actual values are TRUE
What are true negatives
Both predicted and actual values are FALSE
What are false negatives
Predicted value is FALSE but actual value is TRUE
What are false positives
Predicted value is TRUE but actual value is FALSE
What is accuracy
Ratio of CORRECT predictions (true positive + true negative) to the TOTAL number of predictions
What is precision
True Positives / (True Positives + False Positives). Of all the items PREDICTED positive, the fraction that are actually positive.
What is recall
True Positives / (True Positives + False Negatives). Of all the items that are ACTUALLY positive, the fraction that were correctly predicted.
What is F1 Score
An overall metric that combines PRECISION and RECALL
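The confusion-matrix counts and the metrics built from them can be sketched as follows. The actual and predicted values are made up, and F1 is computed here as the harmonic mean of precision and recall, which is its usual definition.

```python
def confusion_counts(actual, predicted):
    """Return (true positives, true negatives, false positives, false negatives)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn

# Hypothetical binary classification results.
actual    = [True, True, True,  False, False, False, False, True, True,  False]
predicted = [True, True, False, False, False, True,  False, True, False, False]

tp, tn, fp, fn = confusion_counts(actual, predicted)   # 3, 4, 1, 2
accuracy = (tp + tn) / (tp + tn + fp + fn)             # 0.7
precision = tp / (tp + fp)                             # 0.75
recall = tp / (tp + fn)                                # 0.6
f1 = 2 * precision * recall / (precision + recall)     # about 0.667
print(accuracy, precision, recall, f1)
```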
What is ROC curve
Receiver Operating Characteristic.
Plots the TRUE POSITIVE rate against the FALSE POSITIVE rate. The larger the area under the curve, the better the model is performing.
What is AUC
Area under the curve. Metric for binary classification.
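A small sketch of how ROC points and AUC can be computed by sweeping a threshold over predicted scores and applying the trapezoid rule. The labels, scores, and function names are hypothetical, and the code assumes at least one positive and one negative item.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; items scoring at or above it are predicted positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if a and s >= threshold)
        fp = sum(1 for a, s in zip(actual, scores) if not a and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical actual classes (1 = positive) and predicted probabilities.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

area = auc(roc_points(actual, scores))   # 8/9, about 0.889
print(area)
```

An AUC of 1.0 means every positive item scored above every negative item; 0.5 is what random guessing would produce.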
What are centroids
Centroids are the K COORDINATES that mark the cluster centers; they are randomly initialized
What is average distance to other center
This indicates HOW CLOSE, on average, EACH POINT in the cluster is TO THE CENTROIDS of all other clusters
What is average distance to cluster center
This indicates how close, on average, each point in the cluster is to the centroid of the cluster
What is number of points
The number of points assigned to the cluster
What is maximal distance to cluster center
The maximum of the distances between each POINT and the CENTROID of the point’s cluster.
If this number is high, the cluster may be widely dispersed.
This statistic in combination with the Average Distance to the Cluster Center helps you determine the cluster’s spread.
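The four cluster statistics above can be sketched as follows, assuming Euclidean distance, at least two clusters, and centroids taken as the mean of each cluster's points. The clusters and the function name `cluster_stats` are hypothetical.

```python
import math

def cluster_stats(clusters):
    """clusters: dict mapping a cluster name to its list of points (needs 2+ clusters)."""
    centroids = {
        name: tuple(sum(dim) / len(pts) for dim in zip(*pts))
        for name, pts in clusters.items()
    }
    stats = {}
    for name, pts in clusters.items():
        # Distances from each point to its own cluster's centroid.
        own = [math.dist(p, centroids[name]) for p in pts]
        # Distances from each point to the centroids of all other clusters.
        other = [math.dist(p, centroids[o]) for p in pts for o in centroids if o != name]
        stats[name] = {
            "number_of_points": len(pts),
            "average_distance_to_cluster_center": sum(own) / len(own),
            "maximal_distance_to_cluster_center": max(own),
            "average_distance_to_other_center": sum(other) / len(other),
        }
    return stats

clusters = {
    "A": [(0.0, 0.0), (0.0, 2.0)],
    "B": [(10.0, 0.0), (10.0, 2.0), (10.0, 4.0)],
}
stats = cluster_stats(clusters)
print(stats)
```

A small average distance to the cluster's own center combined with a large average distance to other centers suggests tight, well-separated clusters.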
What question does Anomaly Detection answer
Is this weird?
What question does Image Classification answer
What does this image represent?
What question does Two-Class Classification answer
Is this A or B?
What question does Multiclass Classification answer
Is this A or B or C or D?
What question does Recommenders answer
What will they be interested in?
What question does Text Analytics answer
What info is in this text?
What question does Regression answer
How much or how many?