{"id":7236,"date":"2025-09-30T07:02:28","date_gmt":"2025-09-30T07:02:28","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/09\/30\/2509-22755\/"},"modified":"2025-09-30T07:02:28","modified_gmt":"2025-09-30T07:02:28","slug":"2509-22755","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/09\/30\/2509-22755\/","title":{"rendered":"Concept activation vectors: a unifying view and adversarial attacks"},"content":{"rendered":"<p>Concept activation vectors: a unifying view and adversarial attacks<\/p>\n<div>arXiv:2509.22755v1 Announce Type: new<br \/>\nAbstract: Concept Activation Vectors (CAVs) are a tool from explainable AI, offering a promising approach for understanding how human-understandable concepts are encoded in a model&#8217;s latent spaces. They are computed from hidden-layer activations of inputs belonging either to a concept class or to non-concept examples. Adopting a probabilistic perspective, the distribution of the (non-)concept inputs induces a distribution over the CAV, making it a random vector in the latent space. This enables us to derive mean and covariance for different types of CAVs, leading to a unified theoretical view. This probabilistic perspective also reveals a potential vulnerability: CAVs can strongly depend on the rather arbitrary non-concept distribution, a factor largely overlooked in prior work. We illustrate this with a simple yet effective adversarial attack, underscoring the need for a more systematic study.<\/div>\n<p>Ekkehard Schnoor, Malik Tiomoko, Jawher Said, Alex Jung, Wojciech Samek<br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2509.22755\">Go to original source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Concept activation vectors: a unifying view and adversarial attacks arXiv:2509.22755v1 Announce Type: new Abstract: Concept Activation Vectors (CAVs) are a tool from explainable AI, offering a promising approach for understanding how human-understandable concepts are encoded in a model&#8217;s latent spaces. They are computed from hidden-layer activations of inputs belonging either to a concept class or [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,113,420,112],"tags":[3922,1335,2355],"class_list":["post-7236","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-cs-lg","category-math-pr","category-stat-ml","tag-activation","tag-concept","tag-vectors"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/7236"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=7236"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/7236\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=7236"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=7236"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=7236"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}