Tag: sgd
-
Interactive Learning of Single-Index Models via Stochastic Gradient Descent
Interactive Learning of Single-Index Models via Stochastic Gradient Descent arXiv:2602.17876v1 Announce Type: new Abstract: Stochastic gradient descent (SGD) is a cornerstone algorithm for high-dimensional optimization, renowned for its empirical successes. Recent theoretical advances have provided a deep understanding of how SGD enables feature learning in high-dimensional nonlinear models, most notably the textit{single-index model} with i.i.d.…
-
On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization
On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization arXiv:2601.12238v1 Announce Type: new Abstract: While momentum-based acceleration has been studied extensively in deterministic optimization problems, its behavior in nonstationary environments — where the data distribution and optimal parameters drift over time — remains underexplored. We analyze the tracking performance of Stochastic Gradient Descent…
-
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes arXiv:2511.03952v1 Announce Type: new Abstract: We develop a high-dimensional scaling limit for Stochastic Gradient Descent with Polyak Momentum (SGD-M) and adaptive step-sizes. This provides a framework to rigourously compare online SGD with some of its popular variants. We show that the scaling limits of SGD-M coincide…
-
Effective continuous equations for adaptive SGD: a stochastic analysis view
Effective continuous equations for adaptive SGD: a stochastic analysis view arXiv:2509.21614v1 Announce Type: new Abstract: We present a theoretical analysis of some popular adaptive Stochastic Gradient Descent (SGD) methods in the small learning rate regime. Using the stochastic modified equations framework introduced by Li et al., we derive effective continuous stochastic dynamics for these methods.…
-
Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization
Optimal High-probability Convergence of Nonlinear SGD under Heavy-tailed Noise via Symmetrization arXiv:2507.09093v1 Announce Type: new Abstract: We study convergence in high-probability of SGD-type methods in non-convex optimization and the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts.…
-
Dimension-adapted Momentum Outscales SGD
Dimension-adapted Momentum Outscales SGD arXiv:2505.16098v1 Announce Type: new Abstract: We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target…