R-Zero : Self-Evolving Reasoning LLM from Zero Data
R-Zero by Tencent introduces a way to train LLMs without any labelled data, aiming towards self-improving AI without human intervention. It works on a principle similar to GANs: it pairs a Challenger with a Solver, where the Challenger generates questions and the Solver tries to solve them.
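To make the Challenger/Solver idea concrete, here is a rough toy sketch in Python. It is not the paper's algorithm: the function names (`challenger`, `solver`, `co_evolve`), the addition task, the majority-vote pseudo-label, and the "bump difficulty or skill by one" update rules are all assumptions made up for illustration. It only shows the general shape of a loop where one side proposes questions, the other answers them, and no human-written labels are involved.

```python
import random
from collections import Counter

def challenger(difficulty, rng):
    """Toy Challenger (hypothetical): proposes an addition question
    whose two operands each have `difficulty` digits."""
    lo, hi = 10 ** (difficulty - 1), 10 ** difficulty
    return rng.randrange(lo, hi), rng.randrange(lo, hi)

def solver(question, skill, rng):
    """Toy Solver (hypothetical): always correct up to `skill` digits,
    otherwise correct only some of the time."""
    a, b = question
    if len(str(a)) <= skill or rng.random() < 0.3:
        return a + b
    return a + b + rng.randint(1, 9)  # a wrong answer

def co_evolve(rounds=5, samples=8, seed=0):
    """Run the toy loop and return (difficulty, skill, agreement) per round."""
    rng = random.Random(seed)
    difficulty, skill = 1, 1
    history = []
    for _ in range(rounds):
        question = challenger(difficulty, rng)
        answers = [solver(question, skill, rng) for _ in range(samples)]
        # No labels available: treat the most common answer as a pseudo-label
        # and measure how strongly the samples agree with it (an assumption).
        _, votes = Counter(answers).most_common(1)[0]
        agreement = votes / samples
        if agreement == 1.0:
            difficulty += 1  # question was too easy: Challenger pushes harder
        else:
            skill += 1       # Solver disagreed with itself: it "trains" a step
        history.append((difficulty, skill, agreement))
    return history

if __name__ == "__main__":
    for step in co_evolve():
        print(step)
```

In a real system both roles would be LLMs updated by some training procedure; here each is reduced to a single integer so the back-and-forth is easy to see.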
Paper : https://arxiv.org/abs/2508.05004?ref=mackenziemorehead.com
Video explanation : https://youtu.be/kNL6z0wxZ_o?si=iG8U7Go7YeiLsADe
submitted by /u/Technical-Love-8479