Category: autoencoder

Sparse AutoEncoder: from Superposition to interpretable features

Disentangle features in complex neural networks with superposition. Complex neural networks, such as Large Language Models (LLMs), are often hard to interpret. One of the most important reasons for this difficulty is superposition: a phenomenon in which the neural network has fewer dimensions than the number of features it…

February 2, 2025
