Category: superposition

  • Formulation of Feature Circuits with Sparse Autoencoders in LLM

    Large language models (LLMs) have made impressive progress and can perform a variety of tasks, from generating human-like text to answering questions. However, understanding how these models work remains challenging, especially due to a phenomenon called superposition, where features are mixed into one…

  • Superposition: What Makes it Difficult to Explain Neural Network

    When there are more features than model dimensions. It would be ideal if neural networks had a one-to-one relationship between neurons and features: each neuron activates on one and only one feature. In such a world, interpreting the model would be straightforward: this neuron fires for…