{"id":8687,"date":"2025-11-27T07:02:30","date_gmt":"2025-11-27T07:02:30","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/11\/27\/2511-20888\/"},"modified":"2025-11-27T07:02:30","modified_gmt":"2025-11-27T07:02:30","slug":"2511-20888","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/11\/27\/2511-20888\/","title":{"rendered":"Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets"},"content":{"rendered":"<p>    Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>arXiv:2511.20888v1 Announce Type: new<br \/>\nAbstract: This paper argues that DNNs implement a computational Occam&#8217;s razor &#8212; finding the `simplest&#8217; algorithm that fits the data &#8212; and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued function $f$ that can be $epsilon$-approximated with a binary circuit of size at most $cepsilon^{-gamma}$ becomes convex in the `Harder than Monte Carlo&#8217; (HTMC) regime, when $gamma&gt;2$, allowing for the definition of a HTMC norm on functions. In parallel one can define a complexity measure on the parameters of a ResNets (a weighted $ell_1$ norm of the parameters), which induce a `ResNet norm&#8217; on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of being optimal). ResNets thus appear as an alternative model for computation of real functions, better adapted to the HTMC regime and its convexity.<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Arthur Jacot<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2511.20888\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets arXiv:2511.20888v1 Announce Type: new Abstract: This paper argues that DNNs implement a computational Occam&#8217;s razor &#8212; finding the `simplest&#8217; algorithm that fits the data &#8212; and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,1190,113,112],"tags":[1191,4312,4311],"class_list":["post-8687","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-cs-cc","category-cs-lg","category-stat-ml","tag-circuit","tag-htmc","tag-resnets"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/8687"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=8687"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/8687\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=8687"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=8687"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=8687"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}