{"id":10731,"date":"2026-02-25T07:02:32","date_gmt":"2026-02-25T07:02:32","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2026\/02\/25\/2602-20297\/"},"modified":"2026-02-25T07:02:32","modified_gmt":"2026-02-25T07:02:32","slug":"2602-20297","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2026\/02\/25\/2602-20297\/","title":{"rendered":"Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation"},"content":{"rendered":"<p>    Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>arXiv:2602.20297v1 Announce Type: new<br \/>\nAbstract: We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly minimax-optimal worst-case regret bound $tilde{O}(dsqrt{H^3K})$, where $d$ is the feature dimension, $H$ is the horizon length, and $K$ is the number of episodes. We bridge this gap by providing the first gap-dependent regret bound for the nearly minimax-optimal algorithm LSVI-UCB++ (He et al., 2023). Our analysis yields improved dependencies on both $d$ and $H$ compared to previous gap-dependent results. Moreover, leveraging the low policy-switching property of LSVI-UCB++, we introduce a concurrent variant that enables efficient parallel exploration across multiple agents and establish the first gap-dependent sample complexity upper bound for online multi-agent RL with linear function approximation, achieving linear speedup with respect to the number of agents.<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Haochen Zhang, Zhong Zheng, Lingzhou Xue<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2602.20297\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation arXiv:2602.20297v1 Announce Type: new Abstract: We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,113,112],"tags":[1658,1657,4830],"class_list":["post-10731","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-cs-lg","category-stat-ml","tag-dependent","tag-gap","tag-nearly"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/10731"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=10731"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/10731\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=10731"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=10731"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=10731"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}