Quantization

tags
Computer science, Neural networks

Quantization of large language models

The LLM.int8() paper (Dettmers et al. 2022) describes both the difficulties of quantizing transformer-based large language models and a practical solution. Notably, beyond a certain scale these models develop emergent outlier features: a small number of feature dimensions with very large activation magnitudes that naive 8-bit quantization destroys. LLM.int8() handles this with vector-wise quantization plus a mixed-precision decomposition, keeping the outlier dimensions in 16-bit floating point while multiplying everything else in int8. More details are in the author’s blog post.
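The core idea can be sketched in a few lines of NumPy. This is a toy illustration rather than the paper's actual CUDA kernels: the threshold of 6.0 follows the outlier criterion described in the paper, but the function name, the float32 fallback for outliers, and the handling of edge cases are simplifying assumptions.

```python
import numpy as np

def int8_matmul_with_outliers(X, W, threshold=6.0):
    """Sketch of an LLM.int8()-style mixed-precision matmul.

    Feature dimensions of X whose max absolute activation exceeds
    `threshold` are treated as outliers and multiplied in floating
    point; the rest are quantized to int8 with vector-wise absmax
    scales (per row of X, per column of W).
    """
    # Split feature dimensions into outlier and regular sets.
    outlier_cols = np.max(np.abs(X), axis=0) > threshold
    regular_cols = ~outlier_cols

    # Regular part: vector-wise absmax quantization to int8.
    X_r, W_r = X[:, regular_cols], W[regular_cols, :]
    sx = np.max(np.abs(X_r), axis=1, keepdims=True) / 127.0  # per-row scale of X
    sw = np.max(np.abs(W_r), axis=0, keepdims=True) / 127.0  # per-column scale of W
    X_q = np.round(X_r / sx).astype(np.int8)
    W_q = np.round(W_r / sw).astype(np.int8)
    # int8 x int8 matmul accumulated in int32, then dequantized.
    regular = (X_q.astype(np.int32) @ W_q.astype(np.int32)) * (sx * sw)

    # Outlier part: kept in floating point (fp16 in the paper, fp32 here).
    outlier = X[:, outlier_cols] @ W[outlier_cols, :]

    return regular + outlier

# Quick check against the full-precision result:
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16)).astype(np.float32)
X[:, 3] *= 20.0  # inject one large-magnitude "outlier" feature
W = rng.normal(size=(16, 8)).astype(np.float32)
print(np.max(np.abs(int8_matmul_with_outliers(X, W) - X @ W)))
```

Because the large-magnitude dimensions bypass quantization entirely, the absmax scales of the regular part stay small and the quantization error remains bounded, which is the reason the decomposition preserves accuracy at scale.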

Bibliography

  1. Dettmers, Tim; Lewis, Mike; Belkada, Younes; Zettlemoyer, Luke (2022). "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale". arXiv:2208.07339.