Evaluation Flashcards
What is Evaluation?
Model Evaluation is the last stage of the AI project development cycle. Evaluation is the process of understanding the reliability of an AI model by feeding a test dataset into the model and comparing its outputs with the actual answers. Model Evaluation is an integral part of the model development process: it helps us find the model that best represents our data and judge how well the chosen model will work in the future. Different evaluation techniques can be used, depending on the type and purpose of the model.
What is underfitting?
When the model’s output does not match the true function at all, the model is said to be underfitting, and its accuracy is low.
What is a perfect fit?
When the model’s output matches the true function well, the model has optimum accuracy and is called a perfect fit.
What is overfitting?
The model tries to cover every data sample, even the samples that do not align with the true function, so its accuracy on new data is lower. This is why it is not recommended to evaluate a model on the same data used to build it: the model can simply remember the whole training set and predict the correct label for any point in that set while still performing poorly on unseen data. This is known as overfitting.
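A minimal sketch of how underfitting and overfitting can show up in practice, using a made-up noisy dataset and polynomial models of different degrees (the data, the degrees, and the train/test split are illustrative assumptions, not part of these notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy sine curve stands in for the "true function".
x = np.linspace(0, 3, 40)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=x.size)

# Hold out half of the points so the model is not evaluated on its own training data.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

for degree in (1, 3, 9):  # too simple, reasonable, too flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

High error on both sets suggests underfitting; very low training error paired with much higher test error is the usual signature of overfitting.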
What is Prediction and Reality?
To understand the efficiency of an AI model, we need to check whether the predictions it makes are correct or not. Thus, there are two conditions we need to ponder upon: Prediction and Reality. The Prediction is the output given by the machine, and the Reality is the real scenario on which the prediction has been made.
What is a confusion matrix?
The result of comparing the prediction with reality can be recorded in what we call the confusion matrix. The confusion matrix allows us to understand the prediction results. It is not an evaluation metric but a record which can help in evaluation. Prediction and Reality are mapped together in its four cells: True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN).
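A minimal sketch of tallying the four cells of a 2 x 2 confusion matrix from two lists of binary labels (the example lists and variable names are illustrative assumptions):

```python
# 1 = positive (e.g. "fire"), 0 = negative (e.g. "no fire")
reality    = [1, 0, 1, 1, 0, 0, 1, 0]
prediction = [1, 0, 0, 1, 0, 1, 1, 0]

tp = tn = fp = fn = 0
for real, pred in zip(reality, prediction):
    if real == 1 and pred == 1:
        tp += 1  # True Positive: predicted positive, and it was
    elif real == 0 and pred == 0:
        tn += 1  # True Negative: predicted negative, and it was
    elif real == 0 and pred == 1:
        fp += 1  # False Positive: predicted positive, but it was not
    else:
        fn += 1  # False Negative: a real positive the model missed

print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")
```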
What are the formulas for the different evaluation methods?
Accuracy: (TP + TN) / (TP + TN + FP + FN) * 100
Precision: TP / (TP + FP) * 100
Recall: TP / (TP + FN) * 100
F1 Score: 2 * (Precision * Recall) / (Precision + Recall)
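A small sketch of these formulas in code; the counts passed in are made-up examples, and a real implementation would also guard against division by zero:

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy, precision and recall as percentages; F1 on the same scale."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Example counts only (e.g. from the confusion-matrix sketch above)
print(evaluate(tp=3, tn=3, fp=1, fn=1))
```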
In which cases is False Negative risky?
Choosing between precision and recall depends on the situation in which the model has been deployed. In a case like a forest fire, a false negative can cost us a lot and is risky too. Imagine no alert being given even when there is a forest fire; the entire forest might burn down. Another case where a false negative can be risky is a viral outbreak. Imagine a deadly virus has started spreading and the model which is supposed to predict a viral outbreak does not detect it; the virus might spread widely and infect a lot of people.
In what cases is False Positive risky?
The False Positive condition can cost us a lot in cases such as mining. Imagine a model telling you that there is treasure at a point; you keep on digging, but it turns out to be a false alarm. Here the false positive case (predicting there is treasure when there is none) is very costly. Another case is a model which predicts whether a mail is spam or not. If the model wrongly flags genuine mail as spam, people would not look at it and might lose important information. Here too, the false positive condition (predicting the mail as spam when it is not) has a high cost.
What is F1 Score? What is an ideal situation in F1 Score?
If we want to know whether our model’s performance is good, we need both Precision and Recall. In some cases you might have high precision but low recall, or low precision but high recall. Since both measures are important, we need a parameter that takes both precision and recall into account. The F1 Score can be defined as the measure of balance between precision and recall. An ideal situation would be when we have a value of 1 (that is, 100%) for both Precision and Recall; in that case the F1 Score would also be an ideal 1 (100%), known as the perfect value for the F1 Score. As the values of both precision and recall range from 0 to 1, the F1 Score also ranges from 0 to 1. Thus, we can say that a model performs well if its F1 Score is high.
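A quick sketch of why the F1 Score, as the harmonic mean of precision and recall, punishes an imbalance between the two (the precision/recall pairs below are made up for illustration):

```python
def f1(precision, recall):
    # Harmonic mean: dominated by the smaller of the two values.
    return 2 * precision * recall / (precision + recall)

print(f1(1.0, 1.0))   # 1.0   -> the ideal case
print(f1(0.8, 0.8))   # 0.8   -> balanced, still good
print(f1(1.0, 0.02))  # ~0.04 -> perfect precision cannot hide terrible recall
```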
What is the importance of evaluation?
Evaluation is a process that critically examines a program. It involves collecting and analyzing information about a program’s activities, characteristics and outcomes.
What are the advantages of evaluating a model?
- Evaluation ensures that a model is working correctly and optimally.
- It is an initiative to understand how well the model achieves its goals.
- It helps to determine what works well and what could be improved in a program.
What are the reasons for inefficiency of an AI model?
- Lack of training data
- Unauthenticated data/ Wrong data
- Inefficient coding/ Wrong algorithms
- Less accuracy
- Not tested
- Not easy
What is accuracy?
Accuracy is defined as the percentage of correct predictions out of all the observations. A prediction is said to be correct if it matches with the reality. Here, we have two conditions where the prediction matches with the reality: True Positive and True Negative.
Is accuracy measure sufficient for evaluation?
Assume that the model always predicts that there is no fire, but in reality there is a 2% chance of a fire breaking out. Out of 100 cases, the model will be right for the 98 cases with no fire, but for the 2 cases where there actually was a forest fire, it still predicts no fire. Here:
True Positives = 0
True Negatives = 98
False Positives = 0
False Negatives = 2
Total observations = 100
Thus, Accuracy = (0 + 98) / 100 * 100 = 98%
This is a fairly high accuracy for an AI model. But the parameter is useless for us here, as the actual cases where the fire broke out are not taken into account. Hence, there is a need to look at other parameters which take such cases into account as well.
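A tiny, self-contained sketch of this forest-fire scenario (the counts follow the 2%-fire example above; the model never raises an alert):

```python
# 100 observations, 2 real fires, and a model that always predicts "no fire".
tp, tn, fp, fn = 0, 98, 0, 2

accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
recall = tp / (tp + fn) * 100

print(f"Accuracy: {accuracy:.0f}%")  # 98% -- looks impressive
print(f"Recall:   {recall:.0f}%")    # 0%  -- every actual fire was missed
```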
What is precision?
Precision is defined as the fraction of true positive cases out of all the cases where the prediction was positive. That is, it takes into account True Positives and False Positives. In the case of forest fire, assume that the model always predicts that there is a forest fire, irrespective of the reality. In this case, all the positive conditions are taken into account, that is, True Positives and False Positives, and the firefighters will always have to check whether the model’s prediction was a false alarm or not.
Is good precision equivalent to good model performance?
In the case of forest fire, if the precision is low (meaning there are more false alarms than actual cases), the firefighters would get complacent and might not go and check every time, assuming it is a false alarm. This makes precision an important criterion. If the precision is high, the true positive cases are more numerous and there are fewer false alarms.
What is Recall?
Another parameter for evaluating the model’s performance is Recall. It can be defined as the fraction of positive cases that are correctly identified. It mainly takes into account the cases where, in reality, there was a fire and the machine either detected it or missed it. That is, it considers True Positives (there was a forest fire in reality and the model predicted a forest fire) and False Negatives (there was a forest fire and the model did not predict it).
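A small sketch contrasting precision and recall for the opposite extreme discussed above: a model that always predicts "forest fire" on 100 observations, of which only 2 are real fires (all counts are illustrative):

```python
# The model never misses a fire, but almost every alert is a false alarm.
tp, tn, fp, fn = 2, 0, 98, 0

precision = tp / (tp + fp) * 100  # 2%   -- most alerts are false alarms
recall = tp / (tp + fn) * 100     # 100% -- no real fire is missed
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.0f}%  Recall: {recall:.0f}%  F1: {f1:.1f}")
```

Together with the 98%-accuracy example earlier, this is why precision, recall and F1 are reported alongside accuracy rather than relying on any single number.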