Identify common machine learning techniques Flashcards
Machine Learning
The fundamental idea of machine learning is to use data from past observations to predict unknown outcomes or values.
A machine learning model is a software application that encapsulates a function to calculate an output value based on one or more input values.
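As a minimal sketch of the idea that a model encapsulates a function, the snippet below builds a function f(x) from two parameters. The linear form and the parameter values are assumptions chosen for illustration; a real model's function and parameters come from training.

```python
def make_linear_model(weight, bias):
    """Return a function f such that f(x) = weight * x + bias."""
    def f(x):
        return weight * x + bias
    return f

# Hypothetical parameters, standing in for values learned from past observations.
model = make_linear_model(weight=2.0, bias=1.0)

# The "model" is now just a function: input value in, predicted value out.
prediction = model(3.0)
print(prediction)
```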
Types of Machine Learning
Supervised machine learning is a general term for machine learning algorithms in which the training data includes both feature values and known label values.
-Regression is a form in which the label predicted by the model is a numeric value.
-Classification is a form in which the label represents a categorization, or class.
–Binary classification models predict one of two mutually exclusive outcomes. (true/false)
–Multiclass classification predicts a label that represents one of multiple possible classes. (genre of a movie)
Unsupervised machine learning involves training models using data that consists only of feature values without any known labels.
-A clustering algorithm identifies similarities between observations based on their features, and groups them into discrete clusters.
(Identify groups of similar customers based on demographic attributes)
Regression Evaluation Metrics
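Mean Absolute Error (MAE) = The average of the absolute differences between predicted and actual label values. (In an ice cream sales example, it indicates by how many ice creams, on average, each prediction was wrong.)
Mean Squared Error (MSE) = The average of the squared differences between predicted and actual label values. Squaring amplifies larger errors.
Root Mean Squared Error (RMSE) = The square root of the MSE, which expresses the error in the same units as the label.
Coefficient of determination (R2) = The proportion of variance in the actual label values that the model explains. It is typically between 0 and 1, and the closer to 1, the better the model fits the data.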
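The four regression metrics can be computed directly from their standard definitions. The sketch below uses only the standard library; the ice cream sales figures are made up for illustration, and the function name `regression_metrics` is hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE and R2 for paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = math.sqrt(mse)                      # same units as the label

    # R2 = 1 - (residual sum of squares / total sum of squares)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Made-up data: ice creams actually sold vs. ice creams predicted.
actual = [2, 7, 4, 9]
predicted = [3, 5, 4, 10]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

Note that this sketch does not guard against the case where all actual values are identical, in which the total sum of squares is zero and R2 is undefined.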
Binary classification Evaluation Metrics
Accuracy = Proportion of predictions that the model got right. (overall correctness, 80%)
Recall = Proportion of positive cases that the model identified correctly. (successful detection of a specific category, 75%)
Precision = Proportion of predicted positive cases where the true label is actually positive. (what proportion of the patients predicted by the model to have diabetes actually have diabetes?)
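These three metrics follow from counting true and false positives and negatives. The sketch below assumes labels are encoded as 1 (positive) and 0 (negative); the data is made up so that accuracy comes out at 80% and recall at 75%, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(actual, predicted):
    """Compute accuracy, recall and precision for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Return 0.0 when a denominator is zero (a convention chosen for this sketch).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision}

# Made-up labels: 1 = has diabetes, 0 = does not.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

scores = binary_metrics(actual, predicted)
print(scores)
```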
Multiclass classification - Algorithms
One-vs-Rest algorithms = Train a binary classification function for each class, each calculating the probability that the observation is an example of the target class.
Multinomial algorithms = Create a single function that returns a multi-valued output: a vector of class probabilities that sum to 1.
Regardless of which type of algorithm is used, the model uses the resulting function to determine the most probable class for a given set of features (x) and predicts the corresponding class label (y).
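The two approaches can be contrasted in a short sketch. It assumes a sigmoid for each One-vs-Rest binary function and a softmax for the multinomial function, which are common choices but are not specified in these notes. The raw scores and function names are hypothetical stand-ins for the output of trained functions.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict_one_vs_rest(scores):
    """Each class has its own binary function; pick the most probable class.

    The per-class probabilities are independent and need not sum to 1.
    """
    probs = {label: sigmoid(z) for label, z in scores.items()}
    return max(probs, key=probs.get), probs

def predict_multinomial(scores):
    """A single function returns a probability for every class (softmax).

    The probabilities sum to 1 across classes.
    """
    largest = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - largest) for label, z in scores.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs

# Hypothetical raw scores for one movie's features (x).
scores = {"comedy": 0.2, "drama": 1.5, "horror": -0.7}

ovr_label, ovr_probs = predict_one_vs_rest(scores)
multi_label, multi_probs = predict_multinomial(scores)
print(ovr_label, multi_label)
```

In both cases the predicted label (y) is simply the class with the highest probability.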
Clustering - Metrics
-Average distance to cluster center: How close, on average, each point in the cluster is to the centroid of the cluster.
-Average distance to other center: How close, on average, each point in the cluster is to the centroids of all other clusters.
-Maximum distance to cluster center: The furthest distance between a point in the cluster and its centroid.
-Silhouette: A value between -1 and 1 that summarizes the ratio of distance between points in the same cluster and points in different clusters (The closer to 1, the better the cluster separation).
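The metrics above can be sketched with the standard library. The snippet assumes Euclidean distance and uses the common per-point silhouette definition, (b - a) / max(a, b), averaged over all points. The two clusters and the function names are made up for illustration.

```python
import math

def centroid(points):
    """Mean position of a list of points (tuples of coordinates)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def avg_distance_to_center(points, center):
    return sum(math.dist(p, center) for p in points) / len(points)

def max_distance_to_center(points, center):
    return max(math.dist(p, center) for p in points)

def mean_silhouette(clusters):
    """Average silhouette over all points; clusters is a list of point lists."""
    values = []
    for i, cluster in enumerate(clusters):
        for j, p in enumerate(cluster):
            same = cluster[:j] + cluster[j + 1:]
            if not same:
                values.append(0.0)  # convention for single-point clusters
                continue
            # a: mean distance to the other points in the same cluster
            a = sum(math.dist(p, q) for q in same) / len(same)
            # b: mean distance to the points of the nearest other cluster
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for k, other in enumerate(clusters) if k != i
            )
            values.append((b - a) / max(a, b))
    return sum(values) / len(values)

# Two made-up, well-separated clusters.
cluster_a = [(0, 0), (0, 2), (2, 0), (2, 2)]
cluster_b = [(10, 10), (10, 12), (12, 10), (12, 12)]

center_a = centroid(cluster_a)
center_b = centroid(cluster_b)
avg_own = avg_distance_to_center(cluster_a, center_a)
avg_other = avg_distance_to_center(cluster_a, center_b)
max_own = max_distance_to_center(cluster_a, center_a)
score = mean_silhouette([cluster_a, cluster_b])
print(avg_own, avg_other, max_own, score)
```

Because the clusters are tight and far apart, the average distance to the cluster's own center is much smaller than the distance to the other center, and the silhouette is close to 1.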
Deep Learning
Deep learning is an advanced form of machine learning that tries to emulate the way the human brain learns.
Artificial neural networks are made up of multiple layers of neurons, essentially defining a deeply nested function. A network learns iteratively, using corrective feedback to adjust its parameters and improve its predictions.
The purpose of a loss function is to evaluate the aggregate difference between predicted and actual label values.
How training and validation datasets are used in machine learning
The process of training and validation involves iteratively adjusting the model’s parameters using optimization algorithms (such as gradient descent) based on the training dataset.
The validation dataset provides an unbiased evaluation during training, and the test dataset offers a final assessment of the model’s performance. The goal is to build a model that not only fits the training data well but also generalizes effectively to new, unseen data.
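A minimal sketch of this loop is shown below. It assumes a one-feature linear model, mean squared error as the loss, and plain batch gradient descent. The data, learning rate, and epoch count are made up for illustration, and the function names are hypothetical.

```python
def mse(w, b, data):
    """Mean squared error of the model y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_data, val_data, learning_rate=0.05, epochs=2000):
    """Fit w and b by gradient descent, tracking loss on both datasets."""
    w, b = 0.0, 0.0
    n = len(train_data)
    history = []
    for _ in range(epochs):
        # Gradients of the MSE loss, computed on the training data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # The validation data is only evaluated, never used for the update.
        history.append((mse(w, b, train_data), mse(w, b, val_data)))
    return w, b, history

# Made-up data following y = 2x + 1, split into three datasets.
train_data = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]
val_data = [(1.5, 4), (2.5, 6)]
test_data = [(5, 11)]

w, b, history = train(train_data, val_data)
test_loss = mse(w, b, test_data)  # final assessment on unseen data
print(w, b, test_loss)
```

Watching the validation loss alongside the training loss is what reveals whether the model is generalizing or only fitting the training data.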
Identify features and labels in a dataset for machine learning
In an email classification scenario, the features (x) provide information about each email, and the label (y) is the classification that the model is trying to predict.
During the training process, the model learns the relationships between the features and labels from the training dataset. Once trained, the model can use this knowledge to make predictions on new data by evaluating the input features and providing the corresponding predicted labels.
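Separating features from labels can be sketched as follows. The column names, the spam/not-spam label, and the rows are all hypothetical, chosen only to illustrate an email dataset.

```python
# Each row describes one email; "is_spam" is the known label (1 = spam).
emails = [
    {"num_links": 7, "has_attachment": 1, "num_exclamations": 12, "is_spam": 1},
    {"num_links": 0, "has_attachment": 0, "num_exclamations": 0, "is_spam": 0},
    {"num_links": 1, "has_attachment": 1, "num_exclamations": 1, "is_spam": 0},
]

feature_names = ["num_links", "has_attachment", "num_exclamations"]
label_name = "is_spam"

# X holds one list of feature values per email; y holds the matching labels.
X = [[row[name] for name in feature_names] for row in emails]
y = [row[label_name] for row in emails]

print(X)
print(y)
```

A new email to be classified would supply only the feature values; the trained model provides the predicted label.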
Azure Machine Learning
Azure Machine Learning is a cloud service for training, deploying, and managing machine learning models.
-Centralized storage and management of datasets for model training and evaluation.
-On-demand compute resources
-Automated machine learning (AutoML) (Automatically run multiple training jobs using different algorithms and parameters to find the best model)
-Visual tools to define orchestrated pipelines
-Integration with common machine learning frameworks such as MLflow
-Built-in support
-The primary resource required for Azure Machine Learning is an Azure Machine Learning workspace
-Azure Machine Learning studio is a browser-based portal for managing your machine learning resources and jobs.