# Network Deconvolution by Ye, C., Evanusa, M., He, H., Mitrokhin, A., Goldstein, T., Yorke, J. A., Fermüller, C., … (2020)

tags
Convolutional neural networks, Neural network training
source
(Ye et al. 2020)

## Summary

This paper introduces the so-called Network Deconvolution operation, advertised as a way to remove pixel-wise and channel-wise correlations in deep neural networks.

The authors base their new operator on the optimal configuration for $$L_2$$ linear regression, where gradient descent converges in a single step if and only if

$$\frac{1}{N} X^T X = I,$$

where $$X$$ is the feature matrix and $$N$$ the number of samples.

This means that input features should be normalized and uncorrelated for gradient descent to converge the fastest. This can be achieved either by preconditioning the gradient with the inverse Hessian matrix, or by transforming the input features so as to normalize and decorrelate them.
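As a quick check of the one-step claim, here is a small sketch (variable names are ours, not the paper's): when $$\frac{1}{N}X^T X = I$$, a single gradient step with learning rate 1 on the least-squares loss lands exactly on the optimum.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 3

# Construct whitened features: orthonormal columns scaled so (1/N) X^T X = I.
Q, _ = np.linalg.qr(rng.normal(size=(N, d)))
X = np.sqrt(N) * Q
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# One gradient step on L(w) = 1/(2N) ||Xw - y||^2 with learning rate 1.
# The Hessian (1/N) X^T X is the identity, so this step is exact.
w = np.zeros(d)
grad = X.T @ (X @ w - y) / N   # here equals w - w_true
w = w - grad                   # lands exactly on w_true
```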

An algorithm to construct this deconvolution operator is introduced: $$D \approx (\mathrm{Cov} + \epsilon \cdot I)^{-\frac{1}{2}}$$, where $$\mathrm{Cov} = \frac{1}{N}(X-\mu)^T (X-\mu)$$.
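A minimal sketch of this operator, computing the inverse matrix square root directly via an eigendecomposition (the function name and this route are our assumptions; the paper uses faster approximations):

```python
import numpy as np

def deconvolution_operator(X, eps=1e-5):
    """Compute mu and D ~ (Cov + eps*I)^(-1/2) for features X of shape (N, d)."""
    N, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / N
    # Inverse matrix square root via symmetric eigendecomposition.
    w, V = np.linalg.eigh(cov + eps * np.eye(d))
    D = V @ np.diag(w ** -0.5) @ V.T
    return mu, D

# Applying D whitens the features: (X - mu) @ D has ~identity covariance.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
X[:, 1] += 0.8 * X[:, 0]          # introduce channel correlation
mu, D = deconvolution_operator(X)
Xw = (X - mu) @ D
```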

In practice, the deconvolution operator is approximated on a subsample of the data to accelerate computation. A running average of $$D$$ is maintained during training and frozen for evaluation.

Deconvolution is presented as a unification of commonly used normalization techniques such as channel-wise decorrelation or BatchNorm.

The authors report what looks like fairly consistent improvement over BatchNorm on image classification tasks. The margins are small, however (under 1% top-5 accuracy on ImageNet).