On Adaptive Stochastic Optimization for Streaming Data: A Newton's Method with O(dN) Operations
Antoine Godichon-Baggioni, Nicklas Werge; 26(59):1−49, 2025.
Abstract
Stochastic optimization methods face new challenges in the realm of streaming data, characterized by a continuous flow of large, high-dimensional data. While first-order methods, like stochastic gradient descent, are the natural choice for such data, they often struggle with ill-conditioned problems. In contrast, second-order methods, such as Newton's method, offer a potential solution but are computationally impractical for large-scale streaming applications. This paper introduces adaptive stochastic optimization methods that effectively address ill-conditioned problems while functioning in a streaming context. Specifically, we present adaptive inversion-free stochastic quasi-Newton methods with computational complexity matching that of first-order methods, $\mathcal{O}(dN)$, where $d$ represents the number of dimensions/features and $N$ the number of data points. Theoretical analysis establishes their asymptotic efficiency, and empirical studies demonstrate their effectiveness in scenarios with complex covariance structures and poor initializations. In particular, we demonstrate that our adaptive quasi-Newton methods can outperform or match existing first- and second-order methods.
[abs]
[pdf][bib]© JMLR 2025. (edit, beta) |