Tag: refinement

Limitations of refinement methods for weak to strong generalization

arXiv:2508.17018v1. Abstract: Standard techniques for aligning large language models (LLMs) utilize human-produced data, which could limit the capability of any aligned LLM to human level. Label refinement and weak training have emerged as promising strategies to address this superalignment problem. In this work,…

August 26, 2025
