{"id":10559,"date":"2026-02-18T07:02:35","date_gmt":"2026-02-18T07:02:35","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2026\/02\/18\/2602-15091\/"},"modified":"2026-02-18T07:02:35","modified_gmt":"2026-02-18T07:02:35","slug":"2602-15091","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2026\/02\/18\/2602-15091\/","title":{"rendered":"Mixture-of-Experts under Finite-Rate Gating: Communication&#8211;Generalization Trade-offs"},"content":{"rendered":"<p>    Mixture-of-Experts under Finite-Rate Gating: Communication&#8211;Generalization Trade-offs<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>arXiv:2602.15091v1 Announce Type: new<br \/>\nAbstract: Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel operating under a finite information rate. Within an information-theoretic learning framework, we specialize a mutual-information generalization bound and develop a rate-distortion characterization $D(R_g)$ of finite-rate gating, where $R_g:=I(X; T)$, yielding (under a standard empirical rate-distortion optimality condition) $\\mathbb{E}[R(W)] \\le D(R_g)+\\delta_m+\\sqrt{(2\/m)\\, I(S; W)}$. 
The analysis yields capacity-aware limits for communication-constrained MoE systems, and numerical simulations on synthetic multi-expert models empirically confirm the predicted trade-offs between gating rate, expressivity, and generalization.<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Ali Khalesi, Mohammad Reza Deylam Salehi<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2602.15091\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Mixture-of-Experts under Finite-Rate Gating: Communication&#8211;Generalization Trade-offs arXiv:2602.15091v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel operating under a finite information rate. Within an information-theoretic learning framework, we specialize 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,414,113,415,112],"tags":[1947,1933,1384],"class_list":["post-10559","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-cs-it","category-cs-lg","category-math-it","category-stat-ml","tag-gating","tag-rate","tag-under"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/10559"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=10559"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/10559\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=10559"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=10559"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=10559"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}