# Diffusion models

tags: Generative modelling
papers: (Sohl-Dickstein et al. 2015), (Ho et al. 2020)

## Principle of diffusion

### Forward diffusion

An image $$x_0$$ of size $$N$$ by $$N$$ with $$c$$ channels is a vector in $$\mathbb{R}^{N \times N \times c}$$. It is diffused at each timestep $$t$$ to become $$x_t$$. The forward diffusion step is defined as follows:

$q(\boldsymbol{x}_t | \boldsymbol{x}_{t-1}) = \mathcal{N}(\boldsymbol{x}_t; \sqrt{1 - \beta_t} \boldsymbol{x}_{t-1}, \beta_t I)$

The probability of a sequence of images $$x_1, \ldots, x_T$$ is then

$q(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_T | \boldsymbol{x}_0) = \prod_{t=1}^T q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1})$

At each timestep a new diffused image is sampled from a Gaussian distribution centered at $$\sqrt{1 - \beta_t} \boldsymbol{x}_{t-1}$$ with covariance matrix $$\beta_t I$$. Because the covariance is diagonal, each pixel is perturbed independently.
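A single forward step can be sketched with NumPy as follows. The function name `forward_step` and the toy image shape are illustrative choices, not taken from the papers' reference code:

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).

    Because the covariance is a scaled identity, this amounts to adding
    independent Gaussian noise to each pixel of the scaled image.
    """
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8, 3))  # toy 8x8 RGB "image"
x1 = forward_step(x0, beta_t=0.02, rng=rng)
```

Iterating this function $$T$$ times yields one sample from the joint distribution $$q(x_1, \ldots, x_T | x_0)$$.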

This choice of scaled mean makes it possible to sample $$\boldsymbol{x}_t$$ directly for any timestep, without iterating through the intermediate steps:

$q(\mathbf{x}_t | \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t; \sqrt{\bar{\alpha}_t} \mathbf{x}_0, (1 - \bar{\alpha}_t)\mathbf{I})$

where $$\bar{\alpha}_t = \prod_{i = 1}^t \alpha_i$$ and $$\alpha_t = 1 - \beta_t$$.
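The closed form above translates directly into a one-shot sampler. This is a minimal sketch; the helper name `sample_xt` is a hypothetical choice:

```python
import numpy as np

def sample_xt(x0, betas, t, rng):
    """Sample x_t directly from q(x_t | x_0) using the closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    alphas = 1.0 - betas                # alpha_i = 1 - beta_i
    alpha_bar_t = np.prod(alphas[:t])   # bar{alpha}_t = prod_{i=1}^t alpha_i
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

rng = np.random.default_rng(0)
betas = np.full(100, 0.02)              # toy constant schedule
xt = sample_xt(rng.standard_normal((8, 8, 3)), betas, t=50, rng=rng)
```

This shortcut is what makes training practical: a random timestep $$t$$ can be drawn per example and $$x_t$$ produced in a single step.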

The forward process progressively destroys the data: in the limit of infinitely many timesteps, it maps the distribution of images to the standard normal distribution.
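This convergence can be checked numerically: $$\bar{\alpha}_T$$ shrinks toward zero, so $$q(x_T | x_0)$$ approaches $$\mathcal{N}(0, I)$$. The sketch below uses a linear schedule of the kind used by Ho et al. (2020); the exact endpoints are an assumption here:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # assumed linear beta schedule
alpha_bar = np.cumprod(1.0 - betas)    # bar{alpha}_t for t = 1..T

# alpha_bar[-1] is vanishingly small, so sqrt(alpha_bar_T) * x_0 ~ 0
# and the variance (1 - alpha_bar_T) ~ 1: x_T is essentially pure noise.
print(alpha_bar[-1])
```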

## Bibliography

1. Sohl-Dickstein, Jascha, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. "Deep Unsupervised Learning Using Nonequilibrium Thermodynamics". arXiv. http://arxiv.org/abs/1503.03585.
2. Ho, Jonathan, Ajay Jain, and Pieter Abbeel. 2020. "Denoising Diffusion Probabilistic Models". arXiv. http://arxiv.org/abs/2006.11239.