Tag: label

Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models

arXiv:2601.22336v1. Abstract: Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label $Y \in \{0,1\}$, an assumption often violated by LLM…

February 2, 2026
An Adaptive Sampling Framework for Detecting Localized Concept Drift under Label Scarcity

arXiv:2511.02452v1. Abstract: Concept drift and label scarcity are two critical challenges limiting the robustness of predictive models in dynamic industrial environments. Existing drift detection methods often assume global shifts and rely on dense supervision, making them ill-suited for regression tasks…

November 5, 2025
Limitations of refinement methods for weak to strong generalization

arXiv:2508.17018v1. Abstract: Standard techniques for aligning large language models (LLMs) utilize human-produced data, which could limit the capability of any aligned LLM to human level. Label refinement and weak training have emerged as promising strategies to address this superalignment problem. In this work,…

August 26, 2025
Approaching the Harm of Gradient Attacks While Only Flipping Labels

arXiv:2503.00140v1. Abstract: Availability attacks are one of the strongest forms of training-phase attacks in machine learning, making the model unusable. While prior work in distributed ML has demonstrated such effect via gradient attacks and, more recently, data poisoning, we ask: can similar…

March 4, 2025
The Majority Vote Paradigm Shift: When Popular Meets Optimal

arXiv:2502.12581v1. Abstract: Reliably labelling data typically requires annotations from multiple human workers. However, humans are far from being perfect. Hence, it is a common practice to aggregate labels gathered from multiple annotators to make a more confident estimate of the true label. Among…

February 19, 2025
Ranking Basics: Pointwise, Pairwise, Listwise

Because thy neighbour matters. First, let’s talk about where ranking comes into play. Ranking is a big deal in e-commerce and search applications: essentially, any scenario where you need to organize documents based on a query. It’s a little different from classic classification or regression problems. For…

December 21, 2024