2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy

Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU






Benjamin Marie




