2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy
Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU
Benjamin Marie