Tag: gating
-
Mixture-of-Experts under Finite-Rate Gating: Communication–Generalization Trade-offs
arXiv:2602.15091v1 (Announce Type: new)
Abstract: Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel operating under a finite information rate. Within an information-theoretic learning framework, we specialize…
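The abstract treats the gate as a stochastic channel with a finite information rate. One way to make "information rate of a gate" concrete is the mutual information I(X; E) between the input X and the selected expert index E, which can never exceed log2(K) bits for K experts. The sketch below computes this quantity for a softmax gate, assuming X is uniform over a small set of inputs and that each row of `gate_logits` is that input's gate score vector. The function names and the temperature knob are illustrative assumptions, not the paper's definitions or method.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Numerically stable softmax: subtract the max scaled logit before exponentiating.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(p: Sequence[float]) -> float:
    # Shannon entropy in bits; zero-probability terms contribute nothing.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)


def gate_information_rate(
    gate_logits: Sequence[Sequence[float]], temperature: float = 1.0
) -> float:
    """Hypothetical helper: I(X; E) in bits for a stochastic softmax gate.

    Assumes X is uniform over the rows of gate_logits and the gate samples
    expert E ~ softmax(logits / temperature) independently for each input.
    """
    rows = [softmax(z, temperature) for z in gate_logits]
    n, k = len(rows), len(rows[0])
    # Marginal expert distribution p(e) = (1/n) * sum_x p(e | x).
    marginal = [sum(r[e] for r in rows) / n for e in range(k)]
    # Conditional entropy H(E | X), averaged over the uniform inputs.
    cond = sum(entropy_bits(r) for r in rows) / n
    # I(X; E) = H(E) - H(E | X), bounded above by log2(k).
    return entropy_bits(marginal) - cond


# Four inputs, each strongly preferring a different one of four experts.
logits = [[4.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
for t in (0.01, 1.0, 100.0):
    print(t, gate_information_rate(logits, temperature=t))
```

With near one-hot logits and a low temperature, the rate approaches the log2(4) = 2-bit ceiling; raising the temperature makes every gate distribution closer to uniform and pushes the rate toward 0 bits. This is only a toy illustration of a rate-limited gate, not the letter's learning-theoretic bound.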
-
Convergence Rates for Softmax Gating Mixture of Experts
arXiv:2503.03213v1 (Announce Type: new)
Abstract: Mixture of experts (MoE) has recently emerged as an effective framework to advance the efficiency and scalability of machine learning models by softly dividing complex tasks among multiple specialized sub-models termed experts. Central to the success of MoE is an adaptive softmax…
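The "soft division" of a task among experts can be sketched as a softmax-gated MoE regressor: an affine gate score w_k · x + b_k for each expert, normalized by a softmax, weights the outputs of the experts. The sketch below uses linear experts purely for illustration; the names `moe_predict`, `gate_w`, `gate_b`, `expert_w`, and `expert_b` are hypothetical, and this is a generic model, not the paper's specific setting or estimator.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over the gate scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def moe_predict(
    x: Sequence[float],
    gate_w: Sequence[Sequence[float]],
    gate_b: Sequence[float],
    expert_w: Sequence[Sequence[float]],
    expert_b: Sequence[float],
) -> float:
    # Gate: one affine score per expert, w_k . x + b_k, turned into weights by softmax.
    scores = [dot(w, x) + b for w, b in zip(gate_w, gate_b)]
    weights = softmax(scores)
    # Experts: linear maps f_k(x) = a_k . x + c_k (an illustrative choice).
    outputs = [dot(a, x) + c for a, c in zip(expert_w, expert_b)]
    # Prediction: gate-weighted combination of expert outputs.
    return sum(g * f for g, f in zip(weights, outputs))


# The gate strongly favors expert 0 at x = 1, so the output is close to 2.0.
print(moe_predict([1.0], [[5.0], [-5.0]], [0.0, 0.0], [[2.0], [-1.0]], [0.0, 0.0]))
```

Because the softmax weights are strictly positive and sum to one, every expert contributes to every prediction; equal gate scores reduce the model to a plain average of the expert outputs.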