Why Accuracy May Not Be the Best Metric to Evaluate Models (in Medical Settings)

There are several metrics we can use for evaluating a machine learning (ML) model. One of them is accuracy.

We compute the accuracy of a model by running it on a test set and taking the proportion of samples that it classifies correctly.
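As a minimal sketch of that definition, the function below counts matching labels and divides by the test set size. The function name `accuracy` and the toy labels are my own choices for illustration, not part of any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    # Count the positions where prediction and ground truth agree.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels, made up for illustration only.
toy_true = ["benign", "malignant", "benign", "benign"]
toy_pred = ["benign", "benign", "benign", "benign"]
toy_score = accuracy(toy_true, toy_pred)  # 3 of the 4 labels match
print(toy_score)
```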

I will try to show the shortcomings of the accuracy metric in a medical setting, where it might have a serious impact on the patients' fate.

I will cover some other metrics, including sensitivity, specificity, predictive values, and the ROC curve, in another post, as I want to keep this one compact.

Let's work with an example in the figure below so that we can illustrate the computation of accuracy.

Let's assume a test set of 10 tumor tissue samples, each labeled as benign (green) or malignant (red).

As the ground truth says, there are 3 malignant and 7 benign samples in the test set.

Let's assume we built a model (Model 1) and tested it against our test set.

Model 1 predicts that every sample is benign. That is, the three malignant samples are erroneously classified as benign, while the seven benign samples are classified correctly. Model 1's accuracy is therefore 7 out of 10, or 0.7, even though the model does no real classification: it gives the same answer for every input.
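The arithmetic for Model 1 can be checked with a few lines of Python. This is only a sketch: the order of the samples in the list is an assumption (it does not affect accuracy), and the variable names are mine.

```python
# Ground truth from the example: 3 malignant and 7 benign samples.
# The sample order is assumed; accuracy does not depend on it.
y_true = ["malignant"] * 3 + ["benign"] * 7

# Model 1 labels every sample as benign, whatever the input.
model1_pred = ["benign"] * len(y_true)

# Count the samples where the prediction matches the ground truth.
model1_correct = sum(t == p for t, p in zip(y_true, model1_pred))
model1_accuracy = model1_correct / len(y_true)

print(model1_correct, "of", len(y_true), "correct")
print("accuracy:", model1_accuracy)
```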

It is definitely not a useful model, but it gets all the benign samples right.

Let's assume we built another model, Model 2.

Model 2 correctly classifies five samples as benign and two samples as malignant. The remaining three samples, two benign and one malignant, are misclassified. In total, seven samples are classified correctly, which makes the accuracy 0.7 for Model 2 as well.
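The same check works for Model 2. The counts below follow from the example (two malignant and five benign samples correct, so one malignant and two benign samples wrong), but which particular samples are misclassified is my assumption, as are the variable names.

```python
# Ground truth from the example: 3 malignant and 7 benign samples.
y_true = ["malignant"] * 3 + ["benign"] * 7

# Model 2 predictions, with an assumed assignment of the errors:
# 2 malignant caught, 1 malignant missed, 5 benign correct, 2 benign flagged.
model2_pred = (
    ["malignant", "malignant", "benign"]  # the three malignant samples
    + ["benign"] * 5                      # benign samples predicted correctly
    + ["malignant"] * 2                   # benign samples flagged as malignant
)

# Count the samples where the prediction matches the ground truth.
model2_correct = sum(t == p for t, p in zip(y_true, model2_pred))
model2_accuracy = model2_correct / len(y_true)

print(model2_correct, "of", len(y_true), "correct")
print("accuracy:", model2_accuracy)
```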

As a result, we have two ML models with an accuracy of 0.7.

I think one can safely say that Model 2 may be of more use than Model 1.

In summary, Model 1 would let all three cancer patients go undetected, while Model 2 would catch two of them, even though both models have the same accuracy score.
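To make the difference concrete, the sketch below counts how many of the malignant samples each model actually flags. The helper `malignant_caught` is a hypothetical name of mine, and the placement of Model 2's errors is assumed, as the example only gives the counts.

```python
def malignant_caught(y_true, y_pred):
    """Number of truly malignant samples that the model also calls malignant."""
    return sum(
        t == "malignant" and p == "malignant" for t, p in zip(y_true, y_pred)
    )


# Ground truth from the example: 3 malignant and 7 benign samples.
y_true = ["malignant"] * 3 + ["benign"] * 7

# Model 1 calls everything benign.
model1_pred = ["benign"] * 10

# Model 2 (assumed error placement): 2 malignant caught, 1 missed,
# 5 benign correct, 2 benign flagged as malignant.
model2_pred = (
    ["malignant", "malignant", "benign"] + ["benign"] * 5 + ["malignant"] * 2
)

caught_by_model1 = malignant_caught(y_true, model1_pred)
caught_by_model2 = malignant_caught(y_true, model2_pred)

print("Model 1 caught", caught_by_model1, "of 3 malignant samples")
print("Model 2 caught", caught_by_model2, "of 3 malignant samples")
```

Both prediction lists match the ground truth on seven of the ten samples, so the two models are indistinguishable by accuracy alone; only the per-class count separates them.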

Thank you for reading this post. If you have anything to say/object/correct, please drop a comment down below.

The Good Class

by Fuat Akal
