{"id":1667,"date":"2025-02-05T07:03:07","date_gmt":"2025-02-05T07:03:07","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/05\/2502-01672\/"},"modified":"2025-02-05T07:03:07","modified_gmt":"2025-02-05T07:03:07","slug":"2502-01672","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/05\/2502-01672\/","title":{"rendered":"Doubly Robust Monte Carlo Tree Search"},"content":{"rendered":"<p>    Doubly Robust Monte Carlo Tree Search<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>arXiv:2502.01672v1 Announce Type: new<br \/>\nAbstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS rollouts with DR estimation, offering theoretical guarantees of unbiasedness and variance reduction under specified conditions. Empirical evaluations in Tic-Tac-Toe and the partially observable VirtualHome environment demonstrate DR-MCTS&#8217;s superior performance over standard MCTS. In Tic-Tac-Toe, DR-MCTS achieves an 88% win rate compared to a 10% win rate for standard MCTS. In compound VirtualHome tasks, DR-MCTS attains a 20.7% success rate versus 10.3% for standard MCTS. Our scaling analysis reveals that DR-MCTS exhibits better sample efficiency, notably outperforming standard MCTS with larger language models while using a smaller model. These results underscore DR-MCTS&#8217;s potential for efficient decision-making in complex, real-world scenarios where sample efficiency is paramount.<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Manqing Liu, Andrew L. Beam<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2502.01672\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Doubly Robust Monte Carlo Tree Search arXiv:2502.01672v1 Announce Type: new Abstract: We present Doubly Robust Monte Carlo Tree Search (DR-MCTS), a novel algorithm that integrates Doubly Robust (DR) off-policy estimation into Monte Carlo Tree Search (MCTS) to enhance sample efficiency and decision quality in complex environments. Our approach introduces a hybrid estimator that combines MCTS [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,187,113,112],"tags":[1646,1645,1644],"class_list":["post-1667","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-cs-ai","category-cs-lg","category-stat-ml","tag-doubly","tag-dr","tag-mcts"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1667"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1667"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1667\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1667"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1667"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1667"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}