Tag: calibration
-
Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function
arXiv:2602.18573v1. Abstract: Machine-generated probability predictions are essential in modern classification tasks such as image classification. A model is well calibrated when its predicted probabilities correspond to observed event frequencies. Despite the need for multicategory recalibration methods, existing…
-
Nonparametric Distribution Regression Re-calibration
arXiv:2602.13362v1. Abstract: A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration, producing narrow but overconfident predictions. However, in safety-critical settings, trustworthy uncertainty estimates are often more valuable than narrow…
-
Design-marginal calibration of Gaussian process predictive distributions: Bayesian and conformal approaches
arXiv:2512.05611v1. Abstract: We study the calibration of Gaussian process (GP) predictive distributions in the interpolation setting from a design-marginal perspective. Conditioning on the data and averaging over a design measure $\mu$, we formalize $\mu$-coverage for central intervals and $\mu$-probabilistic calibration through…
-
Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification
arXiv:2511.20960v1. Abstract: Modern artificial intelligence systems make critical decisions yet often fail silently when uncertain. We develop a geometric framework for post-hoc calibration of neural network probability outputs, treating probability vectors as points on the $(c-1)$-dimensional probability simplex equipped with the Fisher–Rao metric.…
-
Enforcing Calibration in Multi-Output Probabilistic Regression with Pre-rank Regularization
arXiv:2510.21273v1. Abstract: Probabilistic models must be well calibrated to support reliable decision-making. While calibration in single-output regression is well studied, defining and achieving multivariate calibration in multi-output regression remains considerably more challenging. The existing literature on multivariate calibration primarily focuses on diagnostic tools…
-
Calibrating Generative Models
arXiv:2510.10020v1. Abstract: Generative models frequently suffer miscalibration, wherein class probabilities and other statistics of the sampling distribution deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying calibration constraints. To address the intractability of imposing these constraints exactly,…
-
CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference
arXiv:2508.17077v1. Abstract: Current experimental scientists have been increasingly relying on simulation-based inference (SBI) to invert complex non-linear models with intractable likelihoods. However, posterior approximations obtained with SBI are often miscalibrated, causing credible regions to undercover true parameters. We develop $\texttt{CP4SBI}$, a model-agnostic…
-
Accuracy Is Dead: Calibration, Discrimination, and Other Metrics You Actually Need
A deep dive into advanced evaluation for data scientists. By Pol Marin. First appeared on Towards Data Science.
-
Know What You Don’t Know: Uncertainty Calibration of Process Reward Models
arXiv:2506.09338v1. Abstract: Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poorly calibrated and often overestimate success probabilities. To address this, we present…
-
Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning
arXiv:2505.23783v1. Abstract: In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performances in classification. While calibration techniques are proposed to…
-
Evaluating Uncertainty in Deep Gaussian Processes
arXiv:2504.17719v1. Abstract: Reliable uncertainty estimates are crucial in modern machine learning. Deep Gaussian Processes (DGPs) and Deep Sigma Point Processes (DSPPs) extend GPs hierarchically, offering promising methods for uncertainty quantification grounded in Bayesian principles. However, their empirical calibration and robustness under distribution shift relative to baselines…
-
Advancing calibration for stochastic agent-based models in epidemiology with Stein variational inference and Gaussian process surrogates
arXiv:2502.19550v1. Abstract: Accurate calibration of stochastic agent-based models (ABMs) in epidemiology is crucial to make them useful in public health policy decisions and interventions. Traditional calibration methods, e.g., Markov Chain Monte Carlo (MCMC), that yield a…
-
Understanding Model Calibration: A Gentle Introduction & Visual Exploration
How Reliable Are Your Predictions? To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blog post we’ll take a look at the most commonly used definition for calibration and then dive…
-
Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction
arXiv:2502.05676v1. Abstract: Ensuring model calibration is critical for reliable predictions, yet popular distribution-free methods, such as histogram binning and isotonic regression, provide only asymptotic guarantees. We introduce a unified framework for Venn and Venn-Abers calibration, generalizing Vovk’s binary classification approach to arbitrary…
-
Model Calibration, Explained: A Visual Guide with Code Examples for Beginners
MODEL EVALUATION & OPTIMIZATION. When all models have similar accuracy, now what? You’ve trained several classification models, and they all seem to be performing well with high accuracy scores. Congratulations! But hold on: is one model truly better than the others? Accuracy alone doesn’t tell the…