Developing Machine Learning Solutions Flashcards
In this machine learning course, you will learn about the machine learning lifecycle, and how to use AWS services at every stage.
What is the ML development lifecycle?
The machine learning (ML) lifecycle refers to the end-to-end process of developing, deploying, and maintaining machine learning models.
The end-to-end machine learning lifecycle process includes the following phases:
- Business goal identification
- ML problem framing
- Data processing (data collection, data preprocessing, and feature engineering)
- Model development (training, tuning, and evaluation)
- Model deployment (inference and prediction)
- Model monitoring
- Model retraining
What is Amazon SageMaker?
Amazon SageMaker is a fully managed ML service. In a single unified visual interface, you can perform the following tasks:
- Collect and prepare data.
- Build and train machine learning models.
- Deploy the models and monitor the performance of their predictions.
What are some ways to use Amazon SageMaker?
The following are ways to use SageMaker to build your ML model:
- Pre-trained models require the least effort; they are ready to deploy, or to fine-tune and deploy, using SageMaker JumpStart.
- Built-in models available in SageMaker require more effort; they scale well when the dataset is large and significant resources are needed to train and deploy the model (see the sketch after this list).
- If no built-in solution works, you can develop a model that uses SageMaker's pre-made images for supported machine learning and deep learning frameworks, such as scikit-learn, TensorFlow, PyTorch, MXNet, or Chainer.
- You can build your own custom Docker image that is configured to install the necessary packages or software.
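A minimal sketch of the built-in-algorithm path, assuming the SageMaker Python SDK v2; the role ARN, S3 paths, algorithm version, and hyperparameters here are placeholders, not values from this course:

```python
# Illustrative sketch: training SageMaker's built-in XGBoost algorithm.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
region = session.boto_region_name

# Look up the prebuilt container image for a built-in algorithm.
image_uri = image_uris.retrieve(framework="xgboost", region=region, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output",  # placeholder bucket
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

# Launch the managed training job against data already staged in S3.
estimator.fit({"train": TrainingInput("s3://my-bucket/train.csv", content_type="text/csv")})
```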
What is bias in ML?
A bullseye is a nice analogy because, generally speaking, the center of the bullseye is where you aim your darts. The center of the bullseye in this situation is the label, or target (the value your model is trying to predict), and each dot is a result that your model produced during training.
Think about bias as the gap between your predicted value and the actual value, whereas variance describes how dispersed your predicted values are.
In ML, the ideal algorithm has low bias and can accurately model the true relationship. The ideal algorithm also has low variability, producing consistent predictions across different datasets.
What is variance in ML?
Think about bias as the gap between your predicted value and the actual value, whereas variance describes how dispersed your predicted values are. In ML, the ideal algorithm has low bias and can accurately model the true relationship, and it has low variability, producing consistent predictions across different datasets.
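To make the bullseye analogy concrete, here is a toy numeric illustration (not a formal bias-variance decomposition) of how you might measure bias and variance for a set of predictions; all values are made up:

```python
# Toy illustration of bias and variance.
import numpy as np

actual = 10.0  # the true target value (the bullseye)
# Predictions from the same model trained on five different datasets:
predictions = np.array([12.1, 11.8, 12.3, 11.9, 12.0])

bias = predictions.mean() - actual  # systematic gap from the target
variance = predictions.var()        # how dispersed the predictions are

print(f"bias: {bias:.2f}, variance: {variance:.3f}")
# High bias, low variance: the shots cluster tightly but away from the bullseye.
```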
What are the metrics used to evaluate classification?
- Accuracy
- Precision
- Recall
- F1
- AUC-ROC
What are the metrics used to evaluate regression?
- Mean squared error
- R squared
What are the elements of a confusion matrix?
True positive (TP)
If the actual label or class is “cat,” which is identified as “P” for positive in the confusion matrix, and the predicted label or class is also “cat,” then you have a true positive result. This is a good outcome for your model.
True Negative (TN)
Similarly, if you have an actual label of “not cat,” which is identified as “N” for negative in the confusion matrix, and the predicted label or class is also “not cat,” then you have a true negative. This is also a good outcome for your model. In both cases, your model predicted the correct outcome when using the testing data.
False positive (FP)
This is less than ideal and occurs when the actual class is negative, so “not cat,” but the predicted class is positive, so “cat.” This is called a false positive because the prediction is positive but incorrect.
False negative (FN)
This is also less than ideal. A false negative occurs when the actual class is positive, so “cat,” but the predicted class is negative, so “not cat.”
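A small example of computing these four counts with scikit-learn, using made-up labels where 1 means “cat” and 0 means “not cat”:

```python
# For binary labels [0, 1], sklearn orders the matrix as [[TN, FP], [FN, TP]].
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # actual:    1 = cat, 0 = not cat
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # predicted

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```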
What is accuracy in ML?
Accuracy
Formula for accuracy: (TP + TN) / (TP + TN + FP + FN)
Calculation for a model’s accuracy
To calculate the model’s accuracy, also known as its score, add up the correct predictions and then divide that number by the total number of predictions.
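A worked example using the made-up counts from the confusion matrix sketch above:

```python
# Accuracy: correct predictions over all predictions.
tp, tn, fp, fn = 3, 3, 1, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 6 / 8 = 0.75
```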
What is precision in ML?
Precision
Precision removes the negative predictions from the picture. Precision is the proportion of positive predictions that are actually correct. You can calculate it by taking the true positive count and dividing it by the total number of positive predictions (true positives plus false positives).
Formula for precision: TP / (TP + FP)
Calculation for precision
When the cost of false positives is high in your particular business situation, precision can be a good metric. Think about a classification model that identifies emails as spam or not. In this case, you do not want your model labeling a legitimate email as spam and preventing your users from seeing that email.
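Using the same made-up counts:

```python
# Precision: of the 4 positive predictions, 3 were actually positive.
tp, fp = 3, 1

precision = tp / (tp + fp)
print(precision)  # 3 / 4 = 0.75
```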
What is recall in ML?
Recall
In addition to precision, there is also recall (or sensitivity). Recall looks at the proportion of actual positives that are correctly identified as positive. Recall is calculated by dividing the true positive count by the sum of the true positives and false negatives. By looking at that ratio, you get an idea of how good the algorithm is at detecting, for example, cats.
Formula for recall: TP / (TP + FN)
Calculation for recall
Think about a model that needs to predict whether a patient has a terminal illness. In this case, using precision as your evaluation metric does not account for the false negatives in your model. It is vital to the success of the model that it not give false negative results. A false negative would be failing to identify a patient as having a terminal illness when the patient actually does have one. In this situation, recall is a better metric to use.
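Again with the same made-up counts:

```python
# Recall: of the 4 actual positives, the model found 3.
tp, fn = 3, 1

recall = tp / (tp + fn)
print(recall)  # 3 / 4 = 0.75
```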
What is AUC-ROC in ML?
AUC-ROC
Area under the receiver operating characteristic curve (AUC-ROC) is another evaluation metric. ROC is a probability curve, and AUC represents the degree or measure of separability.
AUC-ROC plots the true positive rate (sensitivity) against the false positive rate (which equals 1 minus specificity).
In general, AUC-ROC can show what the trade-off between true positives and false positives looks like at various thresholds. That means that when you calculate the AUC-ROC curve, you evaluate the confusion matrix at multiple thresholds and compare the results to one another to find the threshold you need for your business use case.
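A small example with scikit-learn, using made-up scores; roc_curve sweeps the threshold and returns the resulting false positive and true positive rates:

```python
# AUC from predicted probabilities rather than hard class labels.
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.8, 0.2, 0.1, 0.7, 0.6, 0.3]  # model probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
print(roc_auc_score(y_true, y_score))  # 0.875
```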
What is mean squared error?
Mean squared error
The general purpose of mean squared error (MSE) is the same as the classification metrics. You determine the prediction from the model and compare the difference between the prediction and the actual outcome.
Calculation for mean squared error
More specifically, you take the difference between the prediction and the actual value, square that difference, and then average the squared differences across all the observations.
The smaller the MSE, the better the model’s predictive accuracy.
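A worked example with made-up regression values:

```python
# MSE: the mean of the squared prediction errors.
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 8.0])

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.25 + 1.0) / 3 = 0.5
```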
What is R squared?
R squared
R squared is another commonly used metric with linear regression problems. R squared explains the fraction of variance accounted for by the model. It’s like a percentage, reporting a number from 0 to 1. When R squared is close to 1, it usually indicates that a lot of the variance in the data can be explained by the model itself.
MSE focuses on the average squared error of the model’s predictions to provide a measure of model performance. R squared provides a measure of the model’s goodness of fit to the data. Both are important but provide different perspectives.
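The same made-up values, scored with scikit-learn's r2_score:

```python
# R squared = 1 - (residual sum of squares / total sum of squares).
from sklearn.metrics import r2_score
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 8.0])

print(r2_score(y_true, y_pred))  # 1 - 1.5 / 8.0 = 0.8125
```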
What is model deployment?
Model deployment is the integration of the model and its resources into a production environment so that it can be used to create predictions.
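As an illustrative sketch (real-time endpoints are only one deployment option), deploying a trained SageMaker estimator with the SageMaker Python SDK might look like this; `estimator` is the object from the training sketch earlier, and the instance choices are placeholders:

```python
# Deploy the trained model behind a managed real-time HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# Send a request; the payload format depends on the model's serving container.
result = predictor.predict(b"0.5,1.2,3.4")

predictor.delete_endpoint()  # clean up to stop incurring charges
```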