{"id":2939,"date":"2025-04-08T07:02:41","date_gmt":"2025-04-08T07:02:41","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/04\/08\/2504-04105\/"},"modified":"2025-04-08T07:02:41","modified_gmt":"2025-04-08T07:02:41","slug":"2504-04105","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/04\/08\/2504-04105\/","title":{"rendered":"Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes"},"content":{"rendered":"<p>    Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>arXiv:2504.04105v1 Announce Type: new<br \/>\nAbstract: We study $textit{gradient descent}$ (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter $eta$. We show that after at most $1\/gamma^2$ burn-in steps, GD achieves a risk upper bounded by $exp(-Theta(eta))$, where $gamma$ is the margin of the dataset. As $eta$ can be arbitrarily large, GD attains an arbitrarily small risk $textit{immediately after the burn-in steps}$, though the risk evolution may be $textit{non-monotonic}$.<br \/>\n  We further construct hard datasets with margin $gamma$, where any batch or online first-order method requires $Omega(1\/gamma^2)$ steps to find a linear separator. Thus, GD with large, adaptive stepsizes is $textit{minimax optimal}$ among first-order batch methods. Notably, the classical $textit{Perceptron}$ (Novikoff, 1962), a first-order online method, also achieves a step complexity of $1\/gamma^2$, matching GD even in constants.<br \/>\n  Finally, our GD analysis extends to a broad class of loss functions and certain two-layer networks.<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2504.04105\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes arXiv:2504.04105v1 Announce Type: new Abstract: We study $textit{gradient descent}$ (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter $eta$. We show that after at most $1\/gamma^2$ burn-in steps, GD [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,113,112],"tags":[2308,2307,813],"class_list":["post-2939","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-cs-lg","category-stat-ml","tag-gamma","tag-gd","tag-textit"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2939"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=2939"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2939\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=2939"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=2939"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=2939"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}