Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

arXiv:2512.22473v1 (Announce Type: new)

Abstract: Transformers empirically perform precise probabilistic reasoning in carefully constructed “Bayesian wind tunnels” and in large-scale language models, yet the mechanisms by which gradient-based learning creates the required internal geometry remain opaque. We provide a complete first-order analysis of how cross-entropy training…

December 30, 2025