{"id":1382,"date":"2025-01-23T07:04:00","date_gmt":"2025-01-23T07:04:00","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/01\/23\/2501-12785\/"},"modified":"2025-01-23T07:04:00","modified_gmt":"2025-01-23T07:04:00","slug":"2501-12785","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/01\/23\/2501-12785\/","title":{"rendered":"On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration"},"content":{"rendered":"<p>On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration<\/p>\n<div>arXiv:2501.12785v1 Announce Type: new<br \/>\nAbstract: This paper tackles the efficiency and stability issues in learning from observations (LfO). We commence by investigating how reward functions and policies generalize in LfO. Subsequently, the built-in reinforcement learning (RL) approach in generative adversarial imitation from observation (GAIfO) is replaced with distributional soft actor-critic (DSAC). This change results in a novel algorithm called Mimicking Observations through Distributional Update Learning with adequate Exploration (MODULE), which combines soft actor-critic&#8217;s superior efficiency with distributional RL&#8217;s robust stability.<\/div>\n<p>Yirui Zhou, Xiaowei Liu, Xiaofeng Zhang, Yangchun Zhang<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2501.12785\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration arXiv:2501.12785v1 Announce Type: new Abstract: This paper tackles the efficiency and stability issues in learning from observations (LfO). 
We commence by investigating how reward functions and policies generalize in LfO. Subsequently, the built-in reinforcement learning (RL) approach in generative adversarial imitation from observation (GAIfO) [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,113,112],"tags":[1093,1433,1434],"class_list":["post-1382","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-cs-lg","category-stat-ml","tag-distributional","tag-observations","tag-update"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1382"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1382"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1382\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1382"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1382"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}