Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability
arXiv:2502.18582v1 Announce Type: new
Abstract: $\Phi$-equilibria, and the associated notion of $\Phi$-regret, are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations $\Phi$ begets stronger notions of rationality. Recently, Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC ’24; hereafter DFFPS) settled the existence of efficient algorithms when $\Phi$ contains only linear maps under a general, $d$-dimensional convex constraint set $\mathcal{X}$. In this paper, we significantly extend their work by resolving the case where $\Phi$ is $k$-dimensional; degree-$\ell$ polynomials constitute a canonical such example with $k = d^{O(\ell)}$. In particular, positing only oracle access to $\mathcal{X}$, we obtain two main positive results: i) a $\text{poly}(n, d, k, \log(1/\epsilon))$-time algorithm for computing $\epsilon$-approximate $\Phi$-equilibria in $n$-player multilinear games, and ii) an efficient online algorithm that incurs average $\Phi$-regret at most $\epsilon$ using $\text{poly}(d, k)/\epsilon^2$ rounds.
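The abstract cites degree-$\ell$ polynomials as a $k$-dimensional family with $k = d^{O(\ell)}$. One way to see the count, under an assumption made here purely for illustration: each deviation has the form $\phi(x) = K\,m(x)$ for a matrix $K$ and a feature map $m$ that lists every monomial of degree at most $\ell$ in the $d$ coordinates. There are $\binom{d+\ell}{\ell} \le (d+1)^\ell$ such monomials, since each factor $(d+i)/i$ of the binomial coefficient is at most $d+1$. The sketch that follows enumerates the monomials and checks the closed form; the helper names `monomial_exponents`, `feature_map`, and `feature_dim` are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def monomial_exponents(d: int, ell: int) -> list[tuple[int, ...]]:
    """Exponent vectors of all monomials in d variables with total degree <= ell."""
    exps = []
    for deg in range(ell + 1):
        # Each multiset of `deg` variable indices is exactly one monomial of degree `deg`.
        for idx in combinations_with_replacement(range(d), deg):
            e = [0] * d
            for i in idx:
                e[i] += 1
            exps.append(tuple(e))
    return exps


def feature_map(x: tuple[float, ...], exps: list[tuple[int, ...]]) -> list[float]:
    """Hypothetical feature map m(x): evaluate every monomial prod_i x_i ** e_i."""
    return [math.prod(xi ** ei for xi, ei in zip(x, e)) for e in exps]


def feature_dim(d: int, ell: int) -> int:
    """Closed form for the number of monomials of degree <= ell: C(d + ell, ell)."""
    return math.comb(d + ell, ell)


if __name__ == "__main__":
    for d, ell in [(3, 2), (10, 3), (20, 4)]:
        k = feature_dim(d, ell)
        assert k == len(monomial_exponents(d, ell))
        # C(d + ell, ell) <= (d + 1) ** ell, which is d^{O(ell)} for d >= 2.
        assert k <= (d + 1) ** ell
        print(f"d={d}, ell={ell}: k={k}")
```

If one instead counts the entries of $K \in \mathbb{R}^{d \times k}$ as the dimension of $\Phi$, the count grows by a factor of $d$, which is still $d^{O(\ell)}$.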
We also show nearly matching lower bounds in the online learning setting, thereby obtaining for the first time a family of deviations that captures the learnability of $\Phi$-regret.
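The online guarantee above bounds the average $\Phi$-regret. A minimal sketch of how that quantity can be measured, assuming linear utilities $\langle u_t, x \rangle$, deterministic play $x_1, \dots, x_T$, and a finite list of deviations: the average $\Phi$-regret is $\max_{\phi \in \Phi} \frac{1}{T} \sum_{t=1}^{T} \langle u_t, \phi(x_t) - x_t \rangle$. The $\Phi$ in the abstract is an infinite $k$-dimensional family, so the finite list here is purely illustrative, and the name `average_phi_regret` is hypothetical.

```python
from typing import Callable, Sequence

Vec = tuple[float, ...]
Deviation = Callable[[Vec], Vec]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def average_phi_regret(xs: Sequence[Vec], us: Sequence[Vec], phis: Sequence[Deviation]) -> float:
    """max over phi in phis of (1/T) * sum_t <u_t, phi(x_t) - x_t>."""
    T = len(xs)
    return max(
        sum(dot(u, phi(x)) - dot(u, x) for x, u in zip(xs, us)) / T
        for phi in phis
    )


if __name__ == "__main__":
    # Two actions; X is the probability simplex in R^2 with vertices e0 and e1.
    e0, e1 = (1.0, 0.0), (0.0, 1.0)
    phis = [
        lambda x: x,             # identity: always zero regret
        lambda x: e0,            # constant deviations: external regret
        lambda x: e1,
        lambda x: (x[1], x[0]),  # swap the two actions (a linear map)
    ]
    xs = [e0, e0, e1]
    us = [(0.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
    print(average_phi_regret(xs, us, phis[:3]))  # 2/3: external regret only
    print(average_phi_regret(xs, us, phis))      # 1.0: adding the swap deviation
```

Because the regret is a maximum over $\Phi$, enlarging the set of deviations can only increase it, which is the sense in which a richer $\Phi$ yields a stronger guarantee.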
From a technical standpoint, we extend the framework of DFFPS from linear maps to the more challenging case of maps with polynomial dimension. At the heart of our approach is a polynomial-time algorithm for computing an expected fixed point of any $\phi : \mathcal{X} \to \mathcal{X}$ based on the ellipsoid against hope (EAH) algorithm of Papadimitriou and Roughgarden (JACM ’08). In particular, our algorithm for computing $\Phi$-equilibria executes EAH in a nested fashion: each step of EAH is itself implemented by a separate call to EAH.
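The key primitive above is an expected fixed point of a possibly nonlinear $\phi$. Here we read this (as an assumption) as a distribution $\mu$ over $\mathcal{X}$ with $\lVert \mathbb{E}_{x \sim \mu}[\phi(x)] - \mathbb{E}_{x \sim \mu}[x] \rVert \le \epsilon$. With linear utilities this is enough, because $\mathbb{E}_{x \sim \mu}\langle u, \phi(x) - x \rangle = \langle u, \mathbb{E}_\mu[\phi(x)] - \mathbb{E}_\mu[x] \rangle$. The sketch that follows deliberately swaps in a simpler technique than EAH: iterate $x_{t+1} = \phi(x_t)$ from any $x_0 \in \mathcal{X}$ and let $\mu$ be uniform over $x_0, \dots, x_{T-1}$. The gap telescopes to $(x_T - x_0)/T$, whose norm is at most $\operatorname{diam}(\mathcal{X})/T$, so $T = \lceil \operatorname{diam}(\mathcal{X})/\epsilon \rceil$ suffices. This uses on the order of $1/\epsilon$ evaluations of $\phi$, in contrast to the $\log(1/\epsilon)$ dependence reported for the EAH-based equilibrium algorithm. The names `expected_fixed_point` and `fixed_point_gap` are hypothetical.

```python
import math
from typing import Callable

Vec = tuple[float, ...]


def expected_fixed_point(phi: Callable[[Vec], Vec], x0: Vec, diameter: float, eps: float) -> list[Vec]:
    """Support of a uniform distribution mu with ||E_mu[phi(x)] - E_mu[x]||_2 <= eps.

    Trajectory averaging, not ellipsoid against hope. Assumes phi maps X into X,
    x0 lies in X, and `diameter` upper-bounds the Euclidean diameter of X.
    """
    T = max(1, math.ceil(diameter / eps))
    traj = [x0]
    for _ in range(T - 1):
        traj.append(phi(traj[-1]))  # x_{t+1} = phi(x_t)
    return traj  # x_0, ..., x_{T-1}; mu is uniform over these points


def fixed_point_gap(phi: Callable[[Vec], Vec], support: list[Vec]) -> float:
    """Euclidean norm of E_mu[phi(x)] - E_mu[x] for mu uniform over `support`."""
    T, dim = len(support), len(support[0])
    images = [phi(x) for x in support]
    diff = [sum(y[i] - x[i] for x, y in zip(support, images)) / T for i in range(dim)]
    return math.hypot(*diff)


if __name__ == "__main__":
    # A nonlinear self-map of the unit square [0, 1]^2 (Euclidean diameter sqrt(2)).
    def phi(p: Vec) -> Vec:
        return (1.0 - p[1] ** 2, p[0])

    eps = 0.1
    mu = expected_fixed_point(phi, (0.5, 0.5), math.sqrt(2), eps)
    print(len(mu), fixed_point_gap(phi, mu) <= eps)  # 15 True
```

The construction only needs to evaluate $\phi$, never to solve for $\phi(x) = x$, which is why a distribution can be found even when an exact fixed point of a nonlinear map is hard to locate.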
Brian Hu Zhang, Ioannis Anagnostides, Emanuel Tewolde, Ratip Emin Berker, Gabriele Farina, Vincent Conitzer, Tuomas Sandholm