Tag: activation

Concept activation vectors: a unifying view and adversarial attacks

arXiv:2509.22755v1. Announce Type: new.

Abstract: Concept Activation Vectors (CAVs) are a tool from explainable AI, offering a promising approach for understanding how human-understandable concepts are encoded in a model’s latent spaces. They are computed from hidden-layer activations of inputs belonging either to a concept class or…

September 30, 2025
