- tags: Transformers, NLP
- paper: (Chowdhery et al. 2022)
Architecture
PaLM uses a standard decoder-only Transformer architecture with the following modifications:
- SwiGLU activation functions
- Parallel layers: the attention and feed-forward blocks read the same layer-norm output and their results are summed, instead of being applied in sequence (see the first sketch after this list)
- Multi-query attention: keys and values are shared across all attention heads (see the second sketch after this list)
- RoPE embeddings
- Shared input-output embeddings
- No biases in any of the dense kernels or layer norms
- A 256k SentencePiece vocabulary generated from the training data
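A minimal PyTorch sketch of one parallel decoder block with SwiGLU and no bias terms. The standard serial block computes y = x + MLP(LN(x + Attn(LN(x)))); the parallel variant computes y = x + Attn(LN(x)) + MLP(LN(x)), which the paper reports is roughly 15% faster to train at large scale with negligible quality loss. All names and dimensions here are illustrative, and nn.MultiheadAttention is only a stand-in (it implements neither multi-query attention nor RoPE); causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelBlock(nn.Module):
    """Parallel-layers decoder block: y = x + Attn(LN(x)) + MLP(LN(x))."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        # bias=False on LayerNorm requires PyTorch >= 2.1; the paper drops
        # biases from all dense kernels and layer norms.
        self.ln = nn.LayerNorm(d_model, bias=False)
        # Stand-in attention; PaLM itself uses multi-query attention and RoPE.
        self.attn = nn.MultiheadAttention(d_model, n_heads, bias=False,
                                          batch_first=True)
        # SwiGLU feed-forward: silu(x @ W_gate) * (x @ W_up), projected back.
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln(x)  # one shared LayerNorm feeds both branches
        attn_out, _ = self.attn(h, h, h, need_weights=False)  # causal mask omitted
        mlp_out = self.w_down(F.silu(self.w_gate(h)) * self.w_up(h))
        return x + attn_out + mlp_out  # branches summed, not chained


# Usage on toy dimensions.
block = ParallelBlock(d_model=512, n_heads=8, d_ff=2048)
y = block(torch.randn(2, 16, 512))  # (batch, sequence, d_model)
```

The payoff of the parallel formulation is that the attention and MLP input matmuls share an operand and can be fused, which matters at the 540B scale.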
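Multi-query attention changes only the key/value projections: every head keeps its own queries, but all heads read from a single shared key head and value head, so the key/value cache shrinks by a factor of n_heads and autoregressive decoding becomes markedly cheaper. A sketch under the same illustrative conventions (names and dimensions are hypothetical; RoPE is noted but not implemented):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """Per-head queries, one shared key head and one shared value head."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.wq = nn.Linear(d_model, d_model, bias=False)      # per-head queries
        self.wk = nn.Linear(d_model, self.d_head, bias=False)  # one shared key head
        self.wv = nn.Linear(d_model, self.d_head, bias=False)  # one shared value head
        self.wo = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)  # (b, h, t, d)
        k = self.wk(x).unsqueeze(1)  # (b, 1, t, d): broadcast across all heads
        v = self.wv(x).unsqueeze(1)
        # Causal self-attention; RoPE would rotate q and k here.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))
```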
Parameter count
540B
Bibliography
- Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, et al. 2022. "PaLM: Scaling Language Modeling with Pathways". arXiv. http://arxiv.org/abs/2204.02311.