{"id":1141,"date":"2025-01-13T07:02:33","date_gmt":"2025-01-13T07:02:33","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/01\/13\/your-classifier-is-broken-but-it-is-still-useful-bcc4e0636664\/"},"modified":"2025-01-13T07:02:33","modified_gmt":"2025-01-13T07:02:33","slug":"your-classifier-is-broken-but-it-is-still-useful-bcc4e0636664","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/01\/13\/your-classifier-is-broken-but-it-is-still-useful-bcc4e0636664\/","title":{"rendered":"Your Classifier Is Broken, But It Is Still Useful"},"content":{"rendered":"<div>\n<p>When you run a binary classifier over a population, the share of cases it flags as positive gives you an estimate of the proportion of actual positives in that population. That proportion is known as the <em>prevalence<\/em>.<\/p>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1024\/0*FkUEf10n-mUZGnwo\"><figcaption>Photo by <a href=\"https:\/\/unsplash.com\/@rodlong?utm_source=medium&amp;utm_medium=referral\">Rod Long<\/a> on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\">Unsplash<\/a><\/figcaption><\/figure>\n<p>But that estimate is <em>biased<\/em>, because no classifier is perfect. For example, if your classifier flags 20% of cases as positive, but its precision is known to be only 50%, you would expect the true prevalence to be 0.2 \u00d7 0.5 = 0.1, i.e. 10%. But that assumes perfect recall (all actual positives are flagged by the classifier). 
If the recall is less than 1, then you know the classifier missed some true positives, so you <em>also<\/em> need to normalize the prevalence estimate by the\u00a0recall.<\/p>\n<p>This leads to the common formula for getting the true prevalence Pr(y=1) from the positive prediction rate\u00a0Pr(\u0177=1):<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/556\/1%2AmXewU1XTK0tXzUE-RZAlbw.png?ssl=1\"><\/figure>\n<p>But suppose that you want to run the classifier more than once. For example, you might want to do this at regular intervals to detect trends in the prevalence. You can\u2019t use this formula anymore, because <em>precision depends on the prevalence<\/em>. To use the formula above you would have to re-estimate the precision regularly (say, with human eval), but <a href=\"https:\/\/stats.stackexchange.com\/questions\/273237\/estimating-prevalence-from-a-classifiers-precision-and-recall\">then you could just as well also re-estimate the prevalence itself<\/a>.<\/p>\n<p>How do we get out of this circular reasoning? It turns out that binary classifiers have other performance metrics (besides precision) that do not depend on the prevalence. 
These include not only the recall <em>R<\/em> but also the specificity <em>S<\/em>, and these metrics can be used to adjust Pr(\u0177=1) to get an unbiased estimate of the true prevalence using this formula (sometimes called <em>prevalence adjustment<\/em>):<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/512\/1%2AZH6ijmmuU52dtKglj68lhA.png?ssl=1\"><\/figure>\n<p>where:<\/p>\n<ul>\n<li>Pr(y=1) is the true prevalence<\/li>\n<li>\n<em>S<\/em> is the specificity<\/li>\n<li>\n<em>R<\/em> is the sensitivity or\u00a0recall<\/li>\n<li>Pr(\u0177=1) is the proportion of cases the classifier predicts as positive<\/li>\n<\/ul>\n<p>The proof is straightforward:<\/p>\n<figure><img data-recalc-dims=\"1\" decoding=\"async\" alt=\"\" src=\"https:\/\/i0.wp.com\/cdn-images-1.medium.com\/max\/1024\/1%2AEabcO0JgZnoDt5h4jFUfGg.png?ssl=1\"><\/figure>\n<p>Solving for Pr(y=1) yields the formula\u00a0above.<\/p>\n<p>Notice that this formula breaks down when the denominator <em>R<\/em> \u2212 (1 \u2212 <em>S<\/em>) becomes 0, that is, when the recall equals the false positive rate 1 \u2212 <em>S<\/em>. But remember what a typical ROC curve looks\u00a0like:<\/p>\n<figure><img decoding=\"async\" alt=\"\" src=\"https:\/\/cdn-images-1.medium.com\/max\/500\/0*G5OPX2M1dcS7s3K1\"><figcaption>From <a href=\"https:\/\/en.wikipedia.org\/wiki\/Receiver_operating_characteristic#\/media\/File:Roccurves.png\">https:\/\/en.wikipedia.org\/wiki\/Receiver_operating_characteristic#\/media\/File:Roccurves.png<\/a><\/figcaption><\/figure>\n<p>An ROC curve like this one plots recall <em>R<\/em> (aka true positive rate) against the false positive rate 1 \u2212 <em>S<\/em>, so a classifier for which <em>R<\/em> = 1 \u2212 <em>S<\/em> is a classifier falling on the diagonal of the ROC diagram. This is a classifier that is, essentially, guessing randomly. 
Positive and negative cases are equally likely to be flagged as positive by this classifier, so it is completely non-informative, and you can\u2019t learn anything from it, least of all the true prevalence.<\/p>\n<p>Enough theory, let\u2019s see if this works in practice:<\/p>\n<pre># randomly draw some covariate<br>x &lt;- runif(10000, -1, 1)<br> <br># map the covariate to a probability (inverse logit) and draw the outcome<br>p &lt;- plogis(x)<br>y &lt;- runif(10000) &lt; p<br> <br># fit a logistic regression model<br>m &lt;- glm(y ~ x, family = binomial)<br> <br># make deliberately bad predictions: flag a case as positive<br># only when the fitted probability is below 0.3<br>y_hat &lt;- predict(m, type = \"response\") &lt; 0.3<br> <br># get the recall (aka sensitivity) and specificity<br>cm &lt;- caret::confusionMatrix(factor(y_hat), factor(y), positive = \"TRUE\")<br>recall &lt;- unname(cm$byClass['Sensitivity'])<br>specificity &lt;- unname(cm$byClass['Specificity'])<br> <br># get the adjusted prevalence<br>(mean(y_hat) - (1 - specificity)) \/ (recall - (1 - specificity))<br> <br># compare with actual prevalence<br>mean(y)<\/pre>\n<p>In this simulation I get recall = 0.049 and specificity = 0.875. The predicted prevalence is a ridiculously biased 0.087, but the adjusted prevalence is essentially equal to the true prevalence (0.498).<\/p>\n<p>To sum up: this shows how, using a classifier\u2019s recall and specificity, you can adjust the predicted prevalence to track the true prevalence over time, assuming that recall and specificity are stable over time. 
<em>You cannot do this using precision and recall<\/em> because precision depends on the prevalence, whereas recall and specificity don\u2019t.<\/p>\n<p><em>Originally published at <\/em><a href=\"https:\/\/davidlindelof.com\/estimating-the-true-prevalence-from-a-biased-classifier\/\"><em>https:\/\/davidlindelof.com<\/em><\/a><em> on January 8,\u00a02025.<\/em><\/p>\n<hr>\n<p><a href=\"https:\/\/towardsdatascience.com\/your-classifier-is-broken-but-it-is-still-useful-bcc4e0636664\">Your Classifier Is Broken, But It Is Still Useful<\/a> was originally published in <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a> on Medium.<\/p>\n<\/div>\n<p>David Lindel\u00f6f<br \/>\n<a href=\"https:\/\/medium.com\/m\/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fyour-classifier-is-broken-but-it-is-still-useful-bcc4e0636664\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Your Classifier Is Broken, But It Is Still Useful When you run a binary classifier over a population you get an estimate of the proportion of true positives in that population. This is known as the prevalence. Photo by Rod Long on\u00a0Unsplash But that estimate is biased, because no classifier is perfect. 
For example, if [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,640,837,83,1260,1261],"tags":[267,899,1262],"class_list":["post-1141","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-bias","category-classification","category-data-science","category-prevalence","category-roc","tag-but","tag-classifier","tag-prevalence"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1141"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1141"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1141\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1141"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1141"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1141"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}