{"id":1925,"date":"2025-02-19T07:03:18","date_gmt":"2025-02-19T07:03:18","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/19\/honestly-uncertain\/"},"modified":"2025-02-19T07:03:18","modified_gmt":"2025-02-19T07:03:18","slug":"honestly-uncertain","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/19\/honestly-uncertain\/","title":{"rendered":"Honestly Uncertain"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\" id=\"9049\"><strong>Ethical issues aside, should you be honest when asked how certain you are about some belief? Of course,\u00a0<em>it depends<\/em>. In this blog post, you\u2019ll learn what it depends on.<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Different ways of evaluating probabilistic predictions come with dramatically different degrees of \u201coptimal honesty\u201d.<\/li>\n<li class=\"wp-block-list-item\">Perhaps surprisingly, the linear scoring function that assigns +1 to true and fully confident statements, 0 to admitted ignorance and -1 to wrong but fully confident statements incentivizes exaggerated, dishonest boldness. 
If you rate forecasts that way, you\u2019ll be surrounded by self-important fools and suffer from badly calibrated machine forecasts.<\/li>\n<li class=\"wp-block-list-item\">If you want people (or machines) to give their truly unbiased and honest assessment, your scoring function should penalize confident but wrong convictions more strongly than it rewards confident correct ones.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\" id=\"6d06\"><strong>A probabilistic quiz game<\/strong><\/h2>\n<p class=\"wp-block-paragraph\" id=\"6dcb\">David Spiegelhalter\u2019s fantastic new (as of 2025) book, \u201c<em>The Art of Uncertainty<\/em>\u201d \u2013 a must-read for everyone who deals with probabilities and their communication \u2013 features a short section on scoring rules. Spiegelhalter walks the reader through the quadratic scoring rule and briefly mentions that a linear scoring rule leads to dishonest behavior. I elaborate on that interesting point in this blog post.<\/p>\n<p class=\"wp-block-paragraph\" id=\"8151\">Let\u2019s set the stage: Just like in so many other scenarios and paradoxes, you find yourself on a TV show (yes, what an old-fashioned way to start). You have the opportunity to answer general-knowledge questions and win some cash. You are asked yes\/no questions such as:\u00a0<em>Is the area of France larger than the area of Spain? Was Marie Curie born earlier than Albert Einstein? Is Montreal\u2019s population larger than Kyoto\u2019s?<\/em><\/p>\n<p class=\"wp-block-paragraph\" id=\"c3b8\">Depending on your background, these questions might be obvious to you, or they might be difficult. In any case, you will have a subjective \u201c<em>best guess<\/em>\u201d in mind, and some degree of certainty. For example, I feel comfortable answering the first, slightly less so the second, and I have already forgotten the answer to the third, even though I looked it up to build the example. 
You might experience a similar level of confidence, or a very different one. Degrees of certainty are, of course, subjective.<\/p>\n<p class=\"wp-block-paragraph\" id=\"d2de\">The twist of the quiz: You are not supposed to give a binary yes\/no answer as in a multiple-choice test, but to honestly communicate your degree of conviction, that is, to produce the probability that you personally assign to the true answer being \u201cyes\u201d. The number 0 then means \u201cdefinitely not\u201d, 1 expresses \u201cdefinitely yes\u201d, and 0.5 reflects the degree of uncertainty corresponding to the toss of a fair coin: you have absolutely no idea. Let\u2019s call\u00a0<strong><em>P(A)<\/em><\/strong>\u00a0your true subjective conviction that statement\u00a0<strong><em>A<\/em><\/strong>\u00a0is true. That probability can take any value between 0 and 1, whereas the truth value of\u00a0<strong><em>A<\/em><\/strong>\u00a0is bound to be\u00a0<strong><em>either<\/em><\/strong>\u00a00 (false) or 1 (true). You can then communicate that number, but you don\u2019t have to, so we\u2019ll call\u00a0<strong><em>Q(A)<\/em><\/strong>\u00a0the probability that you eventually express in the quiz.<\/p>\n<p class=\"wp-block-paragraph\" id=\"71f0\">In general, not every probabilistic expression\u00a0<strong><em>Q<\/em><\/strong>\u00a0is met with the same excitement, because humans generally dislike uncertainty. We are much happier with the expert who gives us \u201c99.99%\u201d or \u201c0.01%\u201d probabilities for something to be or not to be the case, and we favor them considerably over the experts producing \u201c25%\u201d and \u201c75%\u201d\u00a0<em>maybe-ish<\/em>\u00a0assessments. From a rational perspective, more informative probabilities (\u201csharp predictions\u201d, close to 0 or close to 1) are preferable to uninformative ones (\u201cunsharp predictions\u201d, close to 0.5). However, a modest but truthful prediction is still worth more than a bold but unreliable one that would make you go all-in. 
We should therefore ensure that people do not lie about their degree of conviction, so that 99% of the \u201c99%-sure\u201d predictions actually come true, 12% of the \u201c12%-sure\u201d ones, and so on. How can the quiz master ensure that?<\/p>\n<h2 class=\"wp-block-heading\" id=\"c445\">The Linear Scoring Rule<\/h2>\n<p class=\"wp-block-paragraph\" id=\"71b8\">The most straightforward way to judge probabilistic statements is a linear scoring rule: In the best case, you are fully confident and right, which means\u00a0<strong><em>Q(A)=P(A)<\/em>=1<\/strong>\u00a0and\u00a0<strong>A<\/strong>\u00a0is true, or\u00a0<strong><em>Q(A)=P(A)<\/em>=0<\/strong>\u00a0and\u00a0<strong>A<\/strong>\u00a0is false. We then add the score\u00a0<strong>+1=r(Q=1, A=1)=r(Q=0, A=0)<\/strong>\u00a0to the balance. In the worst case, you were fully sure of yourself, but wrong; that is,\u00a0<strong><em>Q(A)=P(A)<\/em>=1<\/strong>\u00a0while\u00a0<strong><em>A<\/em><\/strong>\u00a0is false, or\u00a0<strong><em>Q(A)=P(A)<\/em>=0<\/strong>\u00a0while\u00a0<strong><em>A<\/em><\/strong>\u00a0is true. In that unfortunate case, we add\u00a0<strong>-1=r(Q=1, A=0)=r(Q=0, A=1)<\/strong>\u00a0to the score. Between these extreme cases, we draw a straight line. 
When you express maximal uncertainty via\u00a0<strong><em>Q(A)<\/em>=0.5<\/strong>, we have\u00a0<strong>0=r(Q=0.5, A=1)=r(Q=0.5, A=0)<\/strong>, and neither add nor subtract anything.<\/p>\n<p class=\"wp-block-paragraph\" id=\"5bfa\">The functional form of this linear reward function is not particularly spectacular, but its visualization will come in handy in the following:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"fbfbfb\" data-has-transparency=\"false\" style=\"--dominant-color: #fbfbfb;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"661\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KGSnNcCuRQh3Hyj7tzZU2w-1024x661.png?resize=1024%2C661&#038;ssl=1\" alt=\"\" class=\"wp-image-598071 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KGSnNcCuRQh3Hyj7tzZU2w-1024x661.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KGSnNcCuRQh3Hyj7tzZU2w-300x194.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KGSnNcCuRQh3Hyj7tzZU2w-768x496.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_KGSnNcCuRQh3Hyj7tzZU2w.png 1248w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Linear scoring function: You are rewarded +1 for being fully confident in a belief that turns out to be true, penalized with -1 for being equally confident in a wrong belief, and receive neither reward nor punishment when you are openly ignorant with Q=0.5. Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"995c\">No surprise here: If\u00a0<strong><em>A<\/em><\/strong>\u00a0is true, the best thing you could have done is to communicate \u201c<strong>Q=1<\/strong>\u201d; if\u00a0<strong><em>A<\/em><\/strong>\u00a0is false, the best strategy would have been to produce \u201c<strong>Q=0<\/strong>\u201d. 
That\u2019s what the black dots visualize: They point to the largest value that the reward function can attain for each possible truth value. That\u2019s a good start.<\/p>\n<p class=\"wp-block-paragraph\" id=\"fad3\">But you typically do\u00a0<strong><em>not<\/em><\/strong>\u00a0know with absolute certainty whether the answer is \u201c<em>yes, A is true<\/em>\u201d or \u201c<em>no, A is false<\/em>\u201d; you only have a subjective gut feeling. So what should you do? Should you just be honest and communicate your true belief, e.g.\u00a0<strong><em>P<\/em>=0.7<\/strong>\u00a0or\u00a0<strong><em>P<\/em>=0.1<\/strong>?<\/p>\n<p class=\"wp-block-paragraph\" id=\"d51f\">Let\u2019s set ethics aside and consider the reward that we want to maximize. It then turns out that you should not be honest.\u00a0<strong>When evaluated via the linear scoring rule, you should lie, and communicate Q<em>(A)<\/em>=0 when P<em>(A)<\/em>&lt;0.5 and Q<em>(A)<\/em>=1 when P<em>(A)<\/em>&gt;0.5.<\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"84e1\">To see this surprising result, let\u2019s compute the\u00a0<strong><em>expectation value<\/em><\/strong>\u00a0of the reward function, assuming that your belief is, on average, correct (cognitive psychology teaches us that this is an unrealistically optimistic assumption in the first place; we\u2019ll come back to that below). That is, we assume that in about 70% of the cases when you say\u00a0<strong><em>P<\/em>=0.7<\/strong>, the true answer is \u201c<em>yes, A is true<\/em>\u201d, and in about 75% of the cases when you say\u00a0<strong><em>P<\/em>=0.25<\/strong>, the true answer is \u201c<em>no, A is false<\/em>\u201d. 
The expected reward\u00a0<strong><em>R(P, Q)<\/em><\/strong>\u00a0is then a function of both the\u00a0<strong><em>honest subjective<\/em><\/strong>\u00a0probability\u00a0<strong><em>P<\/em><\/strong>\u00a0and the\u00a0<strong><em>communicated<\/em><\/strong>\u00a0probability\u00a0<strong><em>Q<\/em><\/strong>, namely the sum of the rewards\u00a0<strong><em>r(Q, A=1)<\/em><\/strong>\u00a0and\u00a0<strong><em>r(Q, A=0)<\/em><\/strong>, weighted by your belief:<\/p>\n<p class=\"wp-block-paragraph\" id=\"3d3c\"><strong><em>R(P, Q) = P * r(Q, A=1) + (1-P) * r(Q, A=0)<\/em><\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"5d94\">Here are the resulting curves\u00a0<strong><em>R(P,Q)<\/em><\/strong>\u00a0for four different values of the honest subjective probability\u00a0<strong><em>P<\/em><\/strong>:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f9f9f9\" data-has-transparency=\"false\" style=\"--dominant-color: #f9f9f9;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"607\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lGuw-ifVzawwDNnD2hQqVA-1024x607.png?resize=1024%2C607&#038;ssl=1\" alt=\"\" class=\"wp-image-598072 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lGuw-ifVzawwDNnD2hQqVA-1024x607.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lGuw-ifVzawwDNnD2hQqVA-300x178.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lGuw-ifVzawwDNnD2hQqVA-768x455.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lGuw-ifVzawwDNnD2hQqVA.png 1366w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Expected reward as a function of honest and communicated probabilities P and Q. 
Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"a7a2\">The maximal reward attainable in the long run is no longer 1 but\u00a0<strong><em>2|P-0.5|<\/em><\/strong>: ignorance comes at a cost. Clearly, the best strategy is to confidently communicate\u00a0<strong><em>Q<\/em>=1<\/strong>\u00a0as long as\u00a0<strong><em>P<\/em>&gt;0.5<\/strong>, and to communicate an equally confident\u00a0<strong><em>Q<\/em>=0<\/strong>\u00a0when\u00a0<strong><em>P<\/em>&lt;0.5<\/strong>\u00a0(see where the black dots lie in the figure).<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"f017\">Under a linear scoring rule, when it is more likely than not that the event occurs, pretend you are absolutely certain that it will occur. When it is even marginally more likely that it does not occur, be bold and proclaim \u201cthat can never happen\u201d. You will be wrong sometimes, but, on average, it\u2019s more profitable to be bold than to be honest.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"e505\">Even worse: What happens when you have absolutely no clue about the outcome, and your subjective belief is\u00a0<strong><em>P<\/em><\/strong>=0.5? Then you can play it safe and communicate that, or you can take the chance and communicate\u00a0<strong>Q=1<\/strong>\u00a0or\u00a0<strong>Q=0<\/strong>: the expected reward is zero either way.<\/p>\n<p class=\"wp-block-paragraph\" id=\"f5c3\">I find this a disturbing result: A linear reward function makes people go all-in! As a forecast consumer, you have no way to distinguish a slight tendency of 51% from a \u201cquite likely\u201d conviction of 95% or from an almost-certain 99.9999999%. 
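<\/p>
<p class=\"wp-block-paragraph\">To make the all-in incentive concrete, here is a minimal Python sketch (an illustration, not the author\u2019s code). It assumes the linear rule is r(Q, A) = 1 - 2|A - Q|, which reproduces the values +1, 0 and -1 described above; the function names are hypothetical. For this rule, the expected reward simplifies to R(P, Q) = (2P - 1)(2Q - 1), which is linear in Q, so a grid search over Q always ends up at the boundary Q=0 or Q=1.<\/p>

```python
# Minimal sketch (assumption): the linear rule is written as
# r(Q, A) = 1 - 2*|A - Q|, giving +1 for a fully confident correct answer,
# -1 for a fully confident wrong one and 0 for Q = 0.5.

def linear_reward(q, a):
    return 1 - 2 * abs(a - q)


def expected_reward(p, q):
    # R(P, Q) = P * r(Q, A=1) + (1 - P) * r(Q, A=0),
    # which for this rule equals (2P - 1) * (2Q - 1).
    return p * linear_reward(q, 1) + (1 - p) * linear_reward(q, 0)


def best_report(p, steps=1000):
    # Grid search over communicated probabilities Q = 0, 0.001, ..., 1.
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_reward(p, q))


for p in (0.1, 0.3, 0.55, 0.7, 0.9):
    q = best_report(p)
    print(f'P={p:.2f}  best Q={q:.2f}  expected reward={expected_reward(p, q):.3f}')
```

<p class=\"wp-block-paragraph\">For every belief away from 0.5, the printed optimum is Q=0 or Q=1, and the best expected reward equals 2|P - 0.5|, in line with the figure above.<\/p>
<p class=\"wp-block-paragraph\">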
In that quiz, the smart players will always go all-in.<\/p>\n<p class=\"wp-block-paragraph\" id=\"e3f0\">Worse, many situations in life reward unsupported confidence more than thoughtful and careful assessments. To put it cautiously, few people are heavily sanctioned for making clearly exaggerated claims\u2026<\/p>\n<p class=\"wp-block-paragraph\" id=\"7293\">A quiz show is one thing, but it\u2019s obviously quite a problem when people (or machines\u2026) are pushed not to communicate their true degree of conviction when it comes to estimating the risk of serious and dramatic events such as earthquakes, wars and catastrophes.<\/p>\n<p class=\"wp-block-paragraph\" id=\"e8a5\">How can we make them honest (in the case of people) or\u00a0<a href=\"https:\/\/medium.com\/@maltetichy\/calibration-and-sharpness-fd8270b71f07\">calibrated<\/a>\u00a0(in the case of machines)?<\/p>\n<h2 class=\"wp-block-heading\" id=\"e482\"><strong>Punishing confident wrongness: The Quadratic Scoring Rule<\/strong><\/h2>\n<p class=\"wp-block-paragraph\" id=\"1981\">If some expert estimates the probability for something to happen to be\u00a0<strong>P<\/strong>=55%, I want that expert to communicate\u00a0<strong>Q<\/strong>=55%, and not\u00a0<strong>Q<\/strong>=100%. For probabilities to have any value for our decisions, they should reflect the true level of conviction, and not an opportunistically optimized value.<\/p>\n<p class=\"wp-block-paragraph\" id=\"ca35\">Statisticians have formalized this reasonable request as\u00a0<strong><em>proper<\/em><\/strong>\u00a0scoring rules: A proper scoring rule is one that incentivizes forecasters to communicate their true degree of conviction, because its expected value, computed under the forecaster\u2019s own belief\u00a0<strong>P<\/strong>, is maximized by communicating\u00a0<strong>Q<\/strong>=<strong>P<\/strong>. In the long run, it thus favors calibrated probabilities, i.e. probabilities whose predicted events are realized with the predicted frequency. One might first wonder whether such a scoring rule can exist at all. 
Thankfully, it can!<\/p>\n<p class=\"wp-block-paragraph\" id=\"0d46\">One proper scoring rule is the\u00a0<strong><em>quadratic scoring rule<\/em><\/strong>, which is, up to an affine rescaling, the well-known\u00a0<strong><em>Brier score<\/em><\/strong>. For extreme communicated probabilities (<strong>Q<\/strong>=1,\u00a0<strong>Q<\/strong>=0), the values are the very same as for the linear scoring rule, but instead of a straight line, we draw a parabola between them. By doing that, we reward honest ignorance: +0.5 is awarded for a communicated probability of\u00a0<strong>Q<\/strong>=0.5.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"fafbfb\" data-has-transparency=\"false\" style=\"--dominant-color: #fafbfb;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"638\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_v4pWRg7c_nHFZ1gAmtACHw-1024x638.png?resize=1024%2C638&#038;ssl=1\" alt=\"\" class=\"wp-image-598073 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_v4pWRg7c_nHFZ1gAmtACHw-1024x638.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_v4pWRg7c_nHFZ1gAmtACHw-300x187.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_v4pWRg7c_nHFZ1gAmtACHw-768x479.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_v4pWRg7c_nHFZ1gAmtACHw.png 1264w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Quadratic reward as a function of outcome A and communicated probability Q. Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"1ec0\">This reward function treats gains and losses asymmetrically: When you increase your confidence from\u00a0<strong><em>Q<\/em><\/strong>=0.95 to\u00a0<strong><em>Q<\/em><\/strong>=0.98 (and\u00a0<strong><em>A<\/em><\/strong>\u00a0is true), the reward only increases marginally. 
On the other hand, when\u00a0<strong>A<\/strong>\u00a0is false, that same increase in confidence towards the wrong outcome pushes the reward down considerably. Clearly, the quadratic reward thereby nudges forecasters to be more cautious than the linear reward does. But will it suffice to make people honest?<\/p>\n<p class=\"wp-block-paragraph\" id=\"29db\">To see that, let\u2019s compute the expectation value of the quadratic reward as a function of both the true honest probability\u00a0<strong><em>P<\/em><\/strong>\u00a0and the communicated one\u00a0<strong><em>Q<\/em><\/strong>, just as we did in the linear case:<\/p>\n<p class=\"wp-block-paragraph\" id=\"b857\"><strong><em>R(P, Q) = P * r(Q, A=1) + (1-P) * r(Q, A=0)<\/em><\/strong><\/p>\n<p class=\"wp-block-paragraph\" id=\"cc74\">The resulting expected reward, for different values of the honest probability\u00a0<strong><em>P<\/em><\/strong>, is shown in the next figure:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f8f8f8\" data-has-transparency=\"false\" style=\"--dominant-color: #f8f8f8;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"639\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yo7Fh6c2i2_BYNdc9IY0mw-1024x639.png?resize=1024%2C639&#038;ssl=1\" alt=\"\" class=\"wp-image-598074 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yo7Fh6c2i2_BYNdc9IY0mw-1024x639.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yo7Fh6c2i2_BYNdc9IY0mw-300x187.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yo7Fh6c2i2_BYNdc9IY0mw-768x479.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yo7Fh6c2i2_BYNdc9IY0mw.png 1254w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"f901\">Now, the maxima of the 
curves lie exactly at\u00a0<strong><em>Q=P<\/em><\/strong>, so the optimal strategy is to honestly communicate one\u2019s own probability\u00a0<strong><em>P<\/em><\/strong>. Both exaggerated confidence and excessive caution are penalized. Of course, by knowing more in the first place, you\u2019ll be able to make sharper and more confident statements (more predictions\u00a0<strong>Q=P<\/strong>\u00a0that are either close to 1 or close to 0). But honest ignorance is now rewarded with +0.5. Better safe than sorry.<\/p>\n<p class=\"wp-block-paragraph\" id=\"4b58\">What do we learn from that? A reward that is maximized by honestly communicated probabilities sanctions \u201csurprises\u201d (<strong>Q<\/strong>&lt;0.5 and the event is actually true, or\u00a0<strong>Q<\/strong>&gt;0.5 and the event is actually false) quite strongly. When your tendency (<strong>Q<\/strong>&gt;0.5 or\u00a0<strong>Q<\/strong>&lt;0.5) turns out to be wrong, you lose more than you would have won had it been correct. 
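<\/p>
<p class=\"wp-block-paragraph\">The same numerical check for the quadratic rule, again as a minimal Python sketch with hypothetical names: it assumes r(Q, A) = 1 - 2(A - Q)^2, which gives +1 and -1 at the extremes and +0.5 at Q=0.5. Expanding the expectation yields R(P, Q) = 1 - 2P(1 - P) - 2(Q - P)^2, so communicating any Q other than P costs exactly 2(Q - P)^2 in expectation. The last lines quantify the gain-versus-loss asymmetry for a move from Q=0.95 to Q=0.98.<\/p>

```python
# Minimal sketch (assumption): the quadratic rule is written as
# r(Q, A) = 1 - 2*(A - Q)**2, an affine rescaling of the Brier score (A - Q)**2.

def quadratic_reward(q, a):
    return 1 - 2 * (a - q) ** 2


def expected_reward(p, q):
    # R(P, Q) = P * r(Q, A=1) + (1 - P) * r(Q, A=0)
    return p * quadratic_reward(q, 1) + (1 - p) * quadratic_reward(q, 0)


def expected_reward_closed_form(p, q):
    # Same quantity after expanding: 1 - 2P(1 - P) - 2(Q - P)**2.
    return 1 - 2 * p * (1 - p) - 2 * (q - p) ** 2


def best_report(p, steps=1000):
    # Grid search over communicated probabilities Q = 0, 0.001, ..., 1.
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_reward(p, q))


for p in (0.1, 0.3, 0.8):
    print(f'P={p:.2f}  best Q={best_report(p):.3f}')

# Asymmetry: raising Q from 0.95 to 0.98 gains little if A is true,
# but loses much more if A is false.
gain_if_true = quadratic_reward(0.98, 1) - quadratic_reward(0.95, 1)
change_if_false = quadratic_reward(0.98, 0) - quadratic_reward(0.95, 0)
print(f'gain if A is true: {gain_if_true:+.4f}, change if A is false: {change_if_false:+.4f}')
```

<p class=\"wp-block-paragraph\">The grid search returns Q=P for each belief, and raising Q from 0.95 to 0.98 gains only 0.0042 when A is true but loses 0.1158 when A is false.<\/p>
<p class=\"wp-block-paragraph\">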
At the same time, not knowing and being honest about it earns a non-negligible reward.<\/p>\n<h2 class=\"wp-block-heading\" id=\"dc3e\">Logarithmic reward<\/h2>\n<p class=\"wp-block-paragraph\" id=\"407b\">The quadratic reward function is not the only one that rewards honesty (there are infinitely many proper scoring rules): The logarithmic reward penalizes being confidently wrong (<strong>Q<\/strong>=0, but the truth is \u201c<em>yes<\/em>,\u00a0<strong><em>A<\/em><\/strong><em>\u00a0is true<\/em>\u201d;\u00a0<strong>Q<\/strong>=1, yet the truth is \u201c<em>no,\u00a0<\/em><strong><em>A<\/em><\/strong><em>\u00a0is false<\/em>\u201d) with a score of\u00a0<strong><em>-infinity<\/em><\/strong>: The score is simply the logarithm of the probability that had been predicted for the event that eventually occurred, which is why the plot is cut off on the y-axis:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"fafbfb\" data-has-transparency=\"false\" style=\"--dominant-color: #fafbfb;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"635\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lYjULjid_mp2PM_0bWaljQ-1024x635.png?resize=1024%2C635&#038;ssl=1\" alt=\"\" class=\"wp-image-598075 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lYjULjid_mp2PM_0bWaljQ-1024x635.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lYjULjid_mp2PM_0bWaljQ-300x186.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lYjULjid_mp2PM_0bWaljQ-768x476.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_lYjULjid_mp2PM_0bWaljQ.png 1264w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Logarithmic reward as a function of the communicated probability. 
Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"15e2\">The logarithmic reward breaks the symmetry between \u201c<em>having communicated a slightly too-high<\/em>\u201d and \u201c<em>having expressed a slightly too-low<\/em>\u201d probability: Deviations towards uninformative\u00a0<strong>Q<\/strong>=0.5 are penalized less than deviations towards informative\u00a0<strong>Q<\/strong>=0 or\u00a0<strong>Q<\/strong>=1, as the expectation values show:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f8f8f8\" data-has-transparency=\"false\" style=\"--dominant-color: #f8f8f8;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"653\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_mcwmCqDJSPljmLQtgs9jww-1024x653.png?resize=1024%2C653&#038;ssl=1\" alt=\"\" class=\"wp-image-598076 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_mcwmCqDJSPljmLQtgs9jww-1024x653.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_mcwmCqDJSPljmLQtgs9jww-300x191.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_mcwmCqDJSPljmLQtgs9jww-768x490.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_mcwmCqDJSPljmLQtgs9jww.png 1264w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"30a9\">The logarithmic scoring rule heavily penalizes assigning a probability of 0 to something that then, very surprisingly, happens: Somebody who assigned\u00a0<strong><em>Q<\/em><\/strong>=0 and has to admit after the fact \u201c<em>I really thought it was absolutely impossible<\/em>\u201d won\u2019t be invited to provide predictions ever again\u2026<\/p>\n<h2 class=\"wp-block-heading\" id=\"7fe8\">Incentivizing sandbagging: The Cubic Scoring 
Rule<\/h2>\n<p class=\"wp-block-paragraph\" id=\"4d51\">Scoring rules can push forecasters to be overconfident (see the linear scoring rule), they can be proper (see the quadratic and logarithmic scoring rules), but they can also punish \u201c<em>being boldly wrong<\/em>\u201d so thoroughly that forecasters would rather pretend they don\u2019t really know, even if they do. A\u00a0<strong><em>cubic scoring rule<\/em><\/strong>\u00a0would lead to such excessive caution:<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"fbfbfb\" data-has-transparency=\"false\" style=\"--dominant-color: #fbfbfb;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"645\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_0AQnXHKFzBBlYX5qBZzulw-1024x645.png?resize=1024%2C645&#038;ssl=1\" alt=\"\" class=\"wp-image-598077 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_0AQnXHKFzBBlYX5qBZzulw-1024x645.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_0AQnXHKFzBBlYX5qBZzulw-300x189.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_0AQnXHKFzBBlYX5qBZzulw-768x484.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_0AQnXHKFzBBlYX5qBZzulw.png 1254w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"dcdd\">The expected reward now leads people to communicate values that are less informative (closer to 0.5) than their true convictions: Instead of an honest\u00a0<strong>Q=P<\/strong>=0.2, the optimum is at\u00a0<strong>Q<\/strong>=0.333; instead of an honest\u00a0<strong>Q=P<\/strong>=0.4, the optimum is\u00a0<strong>Q<\/strong>=0.4495.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"f8f8f8\" 
data-has-transparency=\"false\" style=\"--dominant-color: #f8f8f8;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"658\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yGbRreaKLdbTzLoR7RYrUw-1024x658.png?resize=1024%2C658&#038;ssl=1\" alt=\"\" class=\"wp-image-598078 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yGbRreaKLdbTzLoR7RYrUw-1024x658.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yGbRreaKLdbTzLoR7RYrUw-300x193.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yGbRreaKLdbTzLoR7RYrUw-768x493.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_yGbRreaKLdbTzLoR7RYrUw.png 1258w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"e380\">In other words, if you want honest judgements, don\u2019t over-punish strong but eventually wrong convictions either; otherwise you\u2019ll be surrounded by indecisive and hesitant cowards\u2026<\/p>\n<h2 class=\"wp-block-heading\" id=\"e270\">Honest and communicated probabilities<\/h2>\n<p class=\"wp-block-paragraph\" id=\"592d\">The following plot recapitulates the argument by showing the optimal communicated probability\u00a0<strong>Q<\/strong>\u00a0as a function of the true belief\u00a0<strong>P<\/strong>. For a linear reward (Exponent 1), you will communicate either\u00a0<strong>Q<\/strong>=0 or\u00a0<strong>Q<\/strong>=1, disclosing nothing about your degree of conviction beyond which side of 0.5 it lies on. 
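<\/p>
<p class=\"wp-block-paragraph\">The recap figure can be reproduced in closed form under one assumption: that the rule with exponent k is r(Q, A) = 1 - 2|A - Q|^k. This matches the linear (k=1) and quadratic (k=2) rules above and reproduces the cubic optima Q=0.333 for P=0.2 and Q=0.4495 for P=0.4, but the exact cubic rule behind the figures is not spelled out, so treat it as an assumption. For k greater than 1, setting the derivative of R(P, Q) with respect to Q to zero gives P(1 - Q)^(k-1) = (1 - P)Q^(k-1), hence Q = 1 / (1 + ((1 - P)/P)^(1/(k-1))). A minimal Python sketch with hypothetical names:<\/p>

```python
# Minimal sketch (assumption): the exponent-k rule is r(Q, A) = 1 - 2*|A - Q|**k,
# so k = 1, 2, 3 correspond to the linear, quadratic and cubic rules.

def power_reward(q, a, k):
    return 1 - 2 * abs(a - q) ** k


def expected_reward(p, q, k):
    # R(P, Q) = P * r(Q, A=1) + (1 - P) * r(Q, A=0)
    return p * power_reward(q, 1, k) + (1 - p) * power_reward(q, 0, k)


def optimal_report(p, k):
    # Closed-form maximizer of R(P, Q) over Q in [0, 1].
    if p in (0.0, 1.0):
        return p
    if k == 1:
        # R is linear in Q: go all-in; at P = 0.5 every Q ties, return 0.5.
        return 0.5 if p == 0.5 else float(p > 0.5)
    # For k > 1, dR/dQ = 0 gives P*(1-Q)**(k-1) = (1-P)*Q**(k-1);
    # R is concave in Q, so this stationary point is the maximum.
    return 1 / (1 + ((1 - p) / p) ** (1 / (k - 1)))


for k in (1, 2, 3):
    reports = [round(optimal_report(p, k), 4) for p in (0.1, 0.2, 0.4, 0.6, 0.9)]
    print(f'exponent {k}: optimal Q = {reports}')
```

<p class=\"wp-block-paragraph\">For k=2 the formula reduces to Q=P; for k=3 the reports are pulled towards 0.5; for k=1 they jump to 0 or 1.<\/p>
<p class=\"wp-block-paragraph\">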
The quadratic reward (Exponent 2) makes you honest (<strong>Q<\/strong>=<strong>P<\/strong>), while the cubic reward (Exponent 3) leads you to set overly cautious\u00a0<strong>Q<\/strong>\u00a0values.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"fafafa\" data-has-transparency=\"false\" style=\"--dominant-color: #fafafa;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"641\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_F2jEj8ARLZWXFScPRDOXFg-1024x641.png?resize=1024%2C641&#038;ssl=1\" alt=\"\" class=\"wp-image-598079 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_F2jEj8ARLZWXFScPRDOXFg-1024x641.png 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_F2jEj8ARLZWXFScPRDOXFg-300x188.png 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_F2jEj8ARLZWXFScPRDOXFg-768x481.png 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_F2jEj8ARLZWXFScPRDOXFg.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Optimally communicated probability\u00a0<strong>Q<\/strong>\u00a0as a function of the true conviction\u00a0<strong>P<\/strong>, for different reward functions. A proper scoring rule ensures\u00a0<strong>Q=P<\/strong>. Image by the author.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"01dd\">In reality, our choices are often binary, and, depending on the \u201cfalse positive\u201d and \u201cfalse negative\u201d costs and the \u201ctrue positive\u201d and \u201ctrue negative\u201d rewards, we will set different thresholds on our subjective probability for taking a certain action. 
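<\/p>
<p class=\"wp-block-paragraph\">One simple way to formalize such a threshold is the classic cost-loss model, used here as an assumption rather than as the author\u2019s model: a precaution costs C whether or not the event happens (the false positive cost), and not taking it costs L if the event does happen (the false negative cost). Acting pays off in expectation when P times L exceeds C, that is, when P exceeds C\/L. A minimal Python sketch with hypothetical numbers:<\/p>

```python
# Minimal sketch of a cost-loss decision rule (assumed model, hypothetical numbers):
# acting costs 'cost' whether or not the event happens; not acting costs 'loss'
# only if the event happens.

def action_threshold(cost, loss):
    # Acting and not acting have equal expected cost when P * loss == cost.
    return cost / loss


def should_act(p, cost, loss):
    # Act when the expected loss avoided exceeds the cost of acting.
    return p * loss > cost


# A precaution costing 1 unit against a catastrophe that would cost 500 units:
print(action_threshold(1, 500))    # probability above which acting pays off
print(should_act(0.01, 1, 500))    # a 1% risk already justifies the precaution
```

<p class=\"wp-block-paragraph\">With a cost ratio of 1 to 500, the threshold is 0.002, so even a 1% risk justifies the precaution.<\/p>
<p class=\"wp-block-paragraph\">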
It is not at all irrational to plan thoroughly for a probability\u00a0<strong>P<\/strong>=0.01=1% catastrophe.<\/p>\n<h2 class=\"wp-block-heading\" id=\"f9a1\">If probabilities are subjective, how can they be \u201cwrong\u201d?<\/h2>\n<p class=\"wp-block-paragraph\" id=\"c995\">Scoring rules have two main applications: On a technical level, when training a probabilistic statistical or machine learning model on data, optimizing a proper scoring rule pushes the model towards calibrated and as-sharp-as-possible probabilistic forecasts. In a more informal setting, when several experts estimate the probability for something (typically dramatic) to happen, one wants to make sure that the experts are honest and don\u2019t try to overplay or downplay their subjective uncertainty (beware of group dynamics!). Superforecasters indeed use quadratic scoring rules to reflect on their degree of confidence and to train themselves to become better calibrated.<\/p>\n<p class=\"wp-block-paragraph\" id=\"ffbe\">Back to our initial quiz game. Before answering, you should definitely ask how you will be evaluated. The evaluation procedure does matter, even if you are told it does not. Similarly, when you are given a multiple-choice test, be sure to understand whether it might be worthwhile to check a box even if you are only very marginally certain about its correctness.<\/p>\n<p class=\"wp-block-paragraph\" id=\"9aa3\">But how can a quiz involving subjective probabilities be evaluated objectively at all? According to Bruno De Finetti, \u201c<em>probability does not exist<\/em>\u201d, so how can we judge the probabilities that people express? We don\u2019t judge people\u2019s taste either! 
David Spiegelhalter emphasizes in \u201cThe Art of Uncertainty\u201d that uncertainty is not \u201c<em>a property of the world, but of our relationship with the world<\/em>\u201d.<\/p>\n<p class=\"wp-block-paragraph\" id=\"45cf\">However,\u00a0<em>subjective<\/em>\u00a0does not mean\u00a0<em>unfalsifiable<\/em>.<\/p>\n<p class=\"wp-block-paragraph\" id=\"ce0f\">I might be 99% sure that France is larger than Spain, 75% sure that Marie Curie was born before Albert Einstein, and 55% sure that Montreal\u2019s population is larger than Kyoto\u2019s. The numbers that\u00a0<strong><em>you<\/em><\/strong>\u00a0assign to these statements will\u00a0<em>probably<\/em>\u00a0(pun intended) be different. Your relationship to the world differs from mine. That\u2019s OK.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\" id=\"cd1c\">We can both be right in the sense that we express calibrated probabilities, even if we assign\u00a0<strong>different<\/strong>\u00a0probabilities to the\u00a0<strong>same<\/strong>\u00a0events.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\" id=\"3617\">A more commonplace setting: When I enter a supermarket, I can assign quite informative (quite high or quite low) probabilities to my buying certain products, because I typically know well what I intend to buy. The data scientist working at the supermarket does not know my personal shopping list, even after having collected considerable personal data. The probability that they assign to my buying a bottle of orange juice will be quite different from the one that I assign to it, yet both probabilities can be \u201ccorrect\u201d in the sense that they are calibrated in the long run.<\/p>\n<p class=\"wp-block-paragraph\" id=\"39df\">Subjectivity does not mean arbitrariness: We can aggregate predictions and outcomes, and evaluate to what extent the predictions are calibrated. 
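<\/p>
<p class=\"wp-block-paragraph\">A small simulation makes this aggregation concrete. The Python sketch below is an illustration under stated assumptions, not the author\u2019s code: beliefs P are drawn uniformly from [0, 1], the forecaster is perfectly calibrated (each event occurs with probability P), and an honest forecaster (Q=P) is compared with an all-in one (Q=0 or Q=1). The linear and quadratic rules are written as 1 - 2|A - Q| and 1 - 2(A - Q)^2, and the logarithmic score is clipped at a hypothetical floor of 1e-9 so that a confident miss yields a large finite penalty instead of minus infinity.<\/p>

```python
# Minimal simulation sketch. Assumptions: beliefs are uniform on [0, 1], the
# forecaster is perfectly calibrated, and the log score is clipped at eps.
import math
import random


def linear(q, a):
    return 1 - 2 * abs(a - q)


def quadratic(q, a):
    return 1 - 2 * (a - q) ** 2


def logarithmic(q, a, eps=1e-9):
    # Log of the probability assigned to the outcome that actually occurred;
    # clipping turns minus infinity into a large finite penalty.
    prob = q if a == 1 else 1 - q
    return math.log(max(prob, eps))


def average_scores(n=100_000, seed=0):
    rng = random.Random(seed)
    totals = {}
    for _ in range(n):
        p = rng.random()                  # honest belief P
        a = 1 if p > rng.random() else 0  # event occurs with probability P
        reports = {'honest': p, 'all-in': 1.0 if p > 0.5 else 0.0}
        for forecaster, q in reports.items():
            for rule in (linear, quadratic, logarithmic):
                key = (rule.__name__, forecaster)
                totals[key] = totals.get(key, 0.0) + rule(q, a)
    return {key: total / n for key, total in totals.items()}


for (rule, forecaster), value in sorted(average_scores().items()):
    print(f'{rule:12s} {forecaster:7s} {value:8.3f}')
```

<p class=\"wp-block-paragraph\">Under these assumptions, the all-in forecaster beats the honest one on the linear rule (in expectation 0.5 versus 1\/3), whereas the honest forecaster wins on the quadratic rule (2\/3 versus 0.5) and, by a wide margin, on the logarithmic rule.<\/p>
<p class=\"wp-block-paragraph\">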
Scoring rules help us with precisely that task, because they grade honesty and information simultaneously: Each forecaster can be evaluated separately on their predicted probabilities. The forecaster who is best informed (producing probabilities close to 1 and close to 0) while also being honest will win the quiz. Different scoring rules can, however, rank sharp-but-slightly-uncalibrated predictions and less-sharp-but-calibrated predictions differently.<\/p>\n<p class=\"wp-block-paragraph\" id=\"18ec\">As mentioned above, honesty and\u00a0<a href=\"https:\/\/medium.com\/@maltetichy\/calibration-and-sharpness-fd8270b71f07\">calibration<\/a>\u00a0are not equivalent in practice. We might truly believe, for each of 100 events, that it will occur with 20% probability, yet the actual number of occurrences might differ significantly from 20. We might be honest about our belief and express\u00a0<strong>P=Q<\/strong>, but that belief itself is often uncalibrated! Kahneman and Tversky studied the cognitive biases that typically make us more confident than we should be. In a way, we often behave as if a linear scoring rule judged our predictions, which nudges us towards the bold side.<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/honestly-uncertain\/\">Honestly Uncertain<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Malte Tichy<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/honestly-uncertain\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Honestly Uncertain Ethical issues aside, should you be honest when asked how certain you are about some belief? Of course,\u00a0it depends. 
In this blog post, you\u2019ll learn on what. Different ways of evaluating probabilistic predictions come with dramatically different degrees of \u201coptimal honesty\u201d. Perhaps surprisingly, the linear function that assigns +1 to true and fully [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,83,229,520,1782,238,92],"tags":[1783,1784,1785],"class_list":["post-1925","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-data-science","category-math","category-predictive-modeling","category-probabilistic-forecast","category-statistics","category-thoughts-and-theory","tag-confident","tag-scoring","tag-than"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1925"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1925"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1925\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1925"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1925"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1925"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}