There are several metrics we can use for evaluating a machine learning (ML) model. One of them is accuracy.
We compute a model's accuracy by running it on a test set and taking the proportion of all samples that the model classified correctly.
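As a minimal sketch of that definition in plain Python, assume the ground-truth labels and the model's predictions are given as two equal-length lists of strings. The function name `accuracy` and the string labels are my own illustrative choices, not part of any particular library.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label matches the ground truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    # Count the positions where prediction and ground truth agree.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Usage: 3 of the 4 predictions match the ground truth.
print(accuracy(["benign", "malignant", "benign", "benign"],
               ["benign", "benign", "benign", "benign"]))  # 0.75
```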
I will try to show the shortcomings of the accuracy metric in a medical setting, where relying on it alone might have a serious impact on patients' fate.
I will cover some other metrics, including sensitivity, specificity, predictive values, and the ROC curve, in another post, as I want to keep this one compact.
Let's work through the example in the figure below to illustrate how accuracy is computed.
Let's assume a test set of 10 tumor tissue samples, each labeled as benign (green) or malignant (red).
According to the ground truth, the test set contains 3 malignant and 7 benign samples.
Let's assume we built a model (Model 1) and tested it against our test set.
Model 1 predicts every sample as benign. That is, the three malignant samples are erroneously classified as benign, while the seven benign samples are classified correctly. Model 1's accuracy is therefore 7 out of 10, or 0.7, even though it is not really classifying anything.
It is definitely not a useful model, but it gets all the benign samples right.
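Here is a small sketch reproducing Model 1's score in Python. Since the figure's sample order is not reproduced here, the ordering of the labels below is a hypothetical choice; accuracy does not depend on the order as long as each prediction lines up with its ground-truth label.

```python
# Ground truth for the 10 test samples: 3 malignant, 7 benign.
# The order is hypothetical; only the counts come from the example.
ground_truth = ["malignant"] * 3 + ["benign"] * 7

# Model 1 predicts "benign" for every sample, regardless of the input.
model_1 = ["benign"] * len(ground_truth)

# Count the positions where the prediction matches the ground truth.
correct_1 = sum(t == p for t, p in zip(ground_truth, model_1))
accuracy_1 = correct_1 / len(ground_truth)

print(correct_1, accuracy_1)  # 7 0.7
```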
Now let's assume we built another model, Model 2.
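Model 2 correctly classifies five samples as benign and two samples as malignant. It therefore misclassifies the remaining three: two benign samples as malignant and one malignant sample as benign. In total, seven samples are classified correctly, which makes Model 2's accuracy 0.7 as well.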
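The same computation for Model 2 can be sketched as follows. The individual predictions are hypothetical, since the figure is not reproduced here; they are only chosen to match the counts above (two of three malignant samples caught, five of seven benign samples correct).

```python
ground_truth = ["malignant"] * 3 + ["benign"] * 7

# Hypothetical Model 2 predictions reproducing the counts in the text:
# the first two malignant samples are caught, the third is missed,
# five benign samples are correct, and the last two are flagged as malignant.
model_2 = ["malignant", "malignant", "benign"] + ["benign"] * 5 + ["malignant"] * 2

# Count the positions where the prediction matches the ground truth.
correct_2 = sum(t == p for t, p in zip(ground_truth, model_2))
accuracy_2 = correct_2 / len(ground_truth)

print(correct_2, accuracy_2)  # 7 0.7
```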
As a result, we have two ML models with an accuracy of 0.7.
I think one can safely say that Model 2 is of more use than Model 1.
In summary, although both models have the same accuracy score, Model 1 would let every cancer patient go undetected, while Model 2 would catch at least two of them.
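To make the contrast concrete, the sketch below reports, for each model, both its accuracy and the number of malignant samples it labels as benign (the patients who would go undetected). The predictions are the same hypothetical orderings used for illustration, chosen only to match the counts in the example; the helper name `missed_malignant` is my own.

```python
ground_truth = ["malignant"] * 3 + ["benign"] * 7

# Hypothetical predictions matching the counts described for each model.
predictions = {
    "Model 1": ["benign"] * 10,
    "Model 2": ["malignant", "malignant", "benign"] + ["benign"] * 5 + ["malignant"] * 2,
}


def accuracy(y_true, y_pred):
    # Proportion of samples classified correctly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def missed_malignant(y_true, y_pred):
    # Malignant samples the model labeled benign, i.e. undetected cancers.
    return sum(t == "malignant" and p == "benign" for t, p in zip(y_true, y_pred))


results = {}
for name, y_pred in predictions.items():
    results[name] = (accuracy(ground_truth, y_pred), missed_malignant(ground_truth, y_pred))
    print(name, results[name])
# Model 1 (0.7, 3)
# Model 2 (0.7, 1)
```

Accuracy alone cannot tell these two models apart, whereas the count of missed malignant samples does.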
Thank you for reading this post. If you have anything to say/object/correct, please drop a comment down below.