- tags: Transformers, NLP
- website: Wikipedia page for Wu Dao
Architecture
Like GPT, it is a decoder-only architecture, but it applies a different pre-training task.
Parameter count
1.75T