{"id":10055,"date":"2026-01-28T07:02:39","date_gmt":"2026-01-28T07:02:39","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2026\/01\/28\/2601-19156\/"},"modified":"2026-01-28T07:02:39","modified_gmt":"2026-01-28T07:02:39","slug":"2601-19156","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2026\/01\/28\/2601-19156\/","title":{"rendered":"Convergence of Muon with Newton-Schulz"},"content":{"rendered":"<p>    Convergence of Muon with Newton-Schulz<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>arXiv:2601.19156v1 Announce Type: new<br \/>\nAbstract: We analyze Muon as originally proposed and used in practice &#8212; using the momentum orthogonalization with a few Newton-Schulz steps. The prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a stationary point at the same rate as the SVD-polar idealization, up to a constant factor for a given number $q$ of Newton-Schulz steps. We further analyze this constant factor and prove that it converges to 1 doubly exponentially in $q$ and improves with the degree of the polynomial used in Newton-Schulz for approximating the orthogonalization direction. We also prove that Muon removes the typical square-root-of-rank loss compared to its vector-based counterpart, SGD with momentum. Our results explain why Muon with a few low-degree Newton-Schulz steps matches exact-polar (SVD) behavior at a much faster wall-clock time and explain how much momentum matrix orthogonalization via Newton-Schulz benefits over the vector-based optimizer. Overall, our theory justifies the practical Newton-Schulz design of Muon, narrowing its practice-theory gap.<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Gyu Yeol Kim, Min-hwan Oh<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2601.19156\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Convergence of Muon with Newton-Schulz arXiv:2601.19156v1 Announce Type: new Abstract: We analyze Muon as originally proposed and used in practice &#8212; using the momentum orthogonalization with a few Newton-Schulz steps. The prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,113,376,112],"tags":[4683,1720,4682],"class_list":["post-10055","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-cs-lg","category-math-oc","category-stat-ml","tag-muon","tag-newton","tag-schulz"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/10055"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=10055"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/10055\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=10055"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=10055"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=10055"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}