Tag: bit

Boost 2-Bit LLM Accuracy with EoRA

Quantization is one of the key techniques for reducing the memory footprint of large language models (LLMs). It works by converting the data type of model parameters from higher-precision formats such as 32-bit floating point (FP32) or 16-bit floating point (FP16/BF16) to lower-precision integer formats, typically INT8 or INT4.…
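The excerpt describes quantization as mapping floating-point weights to a low-bit integer format. The sketch below shows one common way to do that, symmetric per-tensor "absmax" quantization. It is an illustrative assumption, not the method used in the EoRA article. Production libraries usually use per-channel or per-group scales and more elaborate algorithms. The function names `quantize_absmax` and `dequantize` are hypothetical.

```python
def quantize_absmax(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric per-tensor absmax quantization (illustrative sketch only).

    Maps each float to a signed integer in [-qmax, qmax], where
    qmax = 2**(bits - 1) - 1 (127 for INT8, 7 for INT4).
    """
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer step and clip to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from integers and the shared scale."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.9, -0.07]
for bits in (8, 4):
    q, scale = quantize_absmax(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"INT{bits}: q={q} scale={scale:.5f} max_error={max_err:.5f}")
```

Because INT4 has only 15 usable levels here versus 255 for INT8, its step size (`scale`) is much larger, so the rounding error per weight grows. That lost accuracy is what methods targeting very low bit widths try to recover.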

May 15, 2025
2-Bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy

Very accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU. By Benjamin Marie, originally published on Towards Data Science.
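The "70B on a 24 GB GPU" claim follows from simple arithmetic on weight storage. The sketch below counts only the bits needed for the weights themselves, assuming exactly 70e9 parameters and decimal gigabytes. It ignores quantization metadata (such as codebooks or scales), the KV cache, activations, and runtime overhead, all of which add to real memory use. The helper `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Weight-only storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 70e9      # assumed parameter count for a "70B" model
gpu_gb = 24        # GPU memory from the post's teaser

for name, bits in [("FP16", 16), ("INT4", 4), ("2-bit", 2)]:
    gb = weight_footprint_gb(params, bits)
    verdict = "fits" if gb < gpu_gb else "does not fit"
    print(f"{name}: {gb:.1f} GB of weights, {verdict} in {gpu_gb} GB")

ratio = weight_footprint_gb(params, 16) / weight_footprint_gb(params, 2)
print(f"Nominal FP16 -> 2-bit reduction: {ratio:.1f}x")
```

At 4 bits the weights alone (35 GB) exceed 24 GB, while at 2 bits they take 17.5 GB. The nominal FP16-to-2-bit reduction is 8x. The 6.5x figure in the title is lower, which suggests that real checkpoints carry overhead this sketch does not model.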

February 1, 2025
