Tag: gd
-
Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization
arXiv:2509.17251v1 (announce type: new). Abstract: Existing theory suggests that for linear regression problems categorized by capacity and source conditions, gradient descent (GD) is always minimax optimal, while both ridge regression and online stochastic gradient descent (SGD) are polynomially suboptimal for certain categories of such problems.…
-
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
arXiv:2504.04105v1 (announce type: new). Abstract: We study $\textit{gradient descent}$ (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter $\eta$. We show that after at most $1/\gamma^2$ burn-in steps, GD…