Category: superposition

Formulation of Feature Circuits with Sparse Autoencoders in LLM

Large language models (LLMs) have made impressive progress and can perform a variety of tasks, from generating human-like text to answering questions. However, understanding how these models work remains challenging, especially due to a phenomenon called superposition, where features are mixed into one…

February 20, 2025
Superposition: What Makes it Difficult to Explain Neural Network

When there are more features than model dimensions. It would be ideal if the world of neural networks represented a one-to-one relationship: each neuron activates on one and only one feature. In such a world, interpreting the model would be straightforward: this neuron fires for…

December 30, 2024
