Language model architecture that use diffusion instead of autoregression. They generate text by iteratively denoising masked or noised tokens, which enables parallel decoding.
Language model architecture that use diffusion instead of autoregression. They generate text by iteratively denoising masked or noised tokens, which enables parallel decoding.
Loading comments...