Transformers
Notes
Gopher
Sparrow
ChatGPT
BlenderBot 3
GPT
Gato
XLNet
XLM-RoBERTa
Wu Dao 2.0
Turing-NLG
Vision transformer
Transformer-XL
Trajectory transformer
T5
Switch transformer
Swin Transformer
SeeKeR
RoBERTa
Pegasus
PaLM
OPT: Open Pre-trained Transformer
Minerva
Megatron
mBART
LaMDA
Jurassic-1
Imagen
InstructGPT
GPT-Neo
Global context ViT
GPT-3
GPT-2
GLaM
Flamingo
ERNIE
ELECTRA
DQ-BART
DistilBERT
DialoGPT
Decision transformer
ALBERT
BERT
BLOOM
CTRL
Big Bird
DALL-E 2
DALL-E
CLIP
Chinchilla
Positional encoding
BART
Notes on: Memorizing Transformers by Wu, Y., Rabe, M. N., Hutchins, D., & Szegedy, C. (2022)
Notes on: Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Xiong, Y., Zeng, Z., Chakraborty, R., Tan, M., Fung, G., Li, Y., & Singh, V. (2021)
Notes on: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention by Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020)
Notes on: Pretrained Transformers as Universal Computation Engines by Lu, K., Grover, A., Abbeel, P., & Mordatch, I. (2021)
Notes on: Information-Theoretic Probing with Minimum Description Length by Voita, E., & Titov, I. (2020)