Quantization of large language models
The LLM.int8() paper (Dettmers et al. 2022) describes issues that arise when quantizing transformer-based large language models and proposes solutions. Notably, large transformers develop emergent outlier features: a small number of hidden dimensions whose activations have unusually large magnitude, which naive 8-bit quantization handles poorly. LLM.int8() addresses this with vector-wise quantization plus a mixed-precision decomposition, in which the outlier dimensions are multiplied in 16-bit floating point while the rest of the matrix multiplication runs in int8. More details are in the author’s blog post.
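The following is a minimal numpy sketch of that idea, not the paper's implementation: it uses symmetric absmax quantization per row/column and splits out "outlier" feature dimensions above an illustrative threshold of 6.0 (the function names and the threshold are assumptions for illustration only).

```python
import numpy as np

def absmax_quantize(x, axis, bits=8):
    """Symmetric absmax quantization along the given axis (sketch)."""
    qmax = 2 ** (bits - 1) - 1                        # 127 for int8
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)          # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def mixed_precision_matmul(X, W, threshold=6.0):
    """Sketch of a mixed-precision matmul: int8 for regular feature
    dimensions, full floating point for outlier dimensions."""
    # A feature dimension (column of X / row of W) counts as an outlier
    # if any activation in it exceeds the threshold in magnitude.
    outliers = np.any(np.abs(X) > threshold, axis=0)
    regular = ~outliers

    out = np.zeros((X.shape[0], W.shape[1]), dtype=np.float32)

    if regular.any():
        # Vector-wise quantization: per-row scales for X, per-column for W.
        qX, sX = absmax_quantize(X[:, regular], axis=1)
        qW, sW = absmax_quantize(W[regular, :], axis=0)
        # Accumulate the int8 product in int32, then dequantize with the
        # outer product of the row and column scaling constants.
        acc = qX.astype(np.int32) @ qW.astype(np.int32)
        out += acc * (sX * sW)

    if outliers.any():
        # Outlier dimensions stay in floating point (fp16 in the paper).
        out += X[:, outliers].astype(np.float32) @ W[outliers, :].astype(np.float32)

    return out

# Quick check against the full-precision product.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 64)).astype(np.float32)
X[:, 3] *= 20.0                       # inject an outlier feature dimension
W = rng.normal(size=(64, 16)).astype(np.float32)
print(np.max(np.abs(mixed_precision_matmul(X, W) - X @ W)))
```

Separating the outlier columns keeps the quantization scales of the regular part small, so the int8 product stays close to the full-precision result even when a few dimensions have very large activations.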
Bibliography
- Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer. 2022. "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale". arXiv:2208.07339.