PyTorch is an automatic differentiation (autodiff) library for doing machine learning in Python.
I don’t know who originally made this list, and I don’t know how many of these tips have been addressed in recent versions. If some of these tricks are no longer valid, let me know:
- `DataLoader` has bad default settings: tune `num_workers > 0` and default to `pin_memory = True`.
- Set `torch.backends.cudnn.benchmark = True` to autotune cuDNN's kernel choice (this pays off when input shapes stay the same between iterations).
- Max out the batch size on each GPU to amortize compute.
- Do not forget `bias=False` in weight layers before BatchNorms: BatchNorm subtracts the mean right after them, so the bias is a no-op that bloats the model.
- Clear gradients with `for p in model.parameters(): p.grad = None` instead of zeroing them.
- Be careful to disable debug APIs in production (…).
- … `DataParallel`, even if not running distributed.
- Be careful to load-balance compute across all GPUs when input sizes vary; otherwise some GPUs will sit idle.
- Use a fused optimizer from NVIDIA's Apex (the default PyTorch optimizers iterate over individual parameters in a for loop, yikes).
- Use checkpointing to recompute memory-intensive but compute-cheap ops in the backward pass (e.g. activations, upsampling, …).
- Use `@torch.jit.script`, especially to fuse long sequences of pointwise ops, as in GELU.
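The `DataLoader` tip from the list above might look like the following minimal sketch. It assumes a toy in-memory `TensorDataset`. `make_loader` is a hypothetical helper, and `num_workers=4` is an arbitrary starting value to tune per machine, not a recommendation.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=64, num_workers=4, pin_memory=True):
    # Hypothetical helper. num_workers > 0 loads batches in background worker
    # processes instead of the main training process (the default is 0).
    # pin_memory=True returns batches in page-locked host memory, which speeds
    # up host-to-GPU copies and allows non_blocking transfers.
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )


# Toy dataset: 1024 samples with 16 features and a binary label.
dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
loader = make_loader(dataset)
```

When iterating, pinned batches can be moved with `x.to("cuda", non_blocking=True)` so the copy overlaps with compute. On platforms that spawn worker processes (Windows, macOS), the iteration has to sit under an `if __name__ == "__main__":` guard once `num_workers > 0`.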
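The `bias=False` tip can be illustrated with a conv/BatchNorm block. In this sketch, `conv_bn_relu` is a hypothetical helper. In training mode, BatchNorm subtracts the per-channel batch mean, which cancels any constant bias the convolution adds, and BatchNorm's own learnable shift takes over that role.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch: int, out_ch: int) -> nn.Sequential:
    # Hypothetical helper: the conv bias is disabled because BatchNorm2d
    # removes the per-channel mean right after it and adds its own shift.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


block = conv_bn_relu(3, 16)
n_params = sum(p.numel() for p in block.parameters())
print(n_params)  # 16*3*3*3 conv weights + 16 BN weights + 16 BN biases = 464
```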
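Here is a sketch of the gradient-clearing tip inside a minimal training loop, assuming a toy linear model trained with SGD (`clear_grads` is a hypothetical helper). Setting `.grad` to `None` skips writing zeros into every gradient buffer. The next `backward()` then allocates the gradient rather than accumulating into a zeroed tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 8), torch.randn(16, 2)


def clear_grads(module: nn.Module) -> None:
    # Drop the gradient tensors rather than filling them with zeros.
    for p in module.parameters():
        p.grad = None


losses = []
for _ in range(20):
    clear_grads(model)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()  # allocates fresh .grad tensors
    opt.step()
    losses.append(loss.item())
```

In newer releases, `optimizer.zero_grad(set_to_none=True)` does the same thing, and `set_to_none=True` has been the default since PyTorch 2.0. That may answer whether this particular tip is still needed.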
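The list does not spell out which debug APIs it means. Autograd anomaly detection is one example, though choosing it is our assumption. It records a forward-pass traceback for each autograd node and checks backward results for NaNs, which slows every step. The sketch below keeps it behind a hypothetical `DEBUG` flag.

```python
import torch

DEBUG = False  # hypothetical flag: turn on only while hunting a NaN or autograd error

# Anomaly detection records forward tracebacks and checks backward outputs
# for NaNs; both add overhead to every training step, so keep it off in prod.
torch.autograd.set_detect_anomaly(DEBUG)

w = torch.randn(4, requires_grad=True)
loss = (w * 2).sum()
loss.backward()
```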
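Apex requires a separate CUDA build, so this sketch swaps in PyTorch's own multi-tensor Adam instead (`foreach=True`, available in recent releases). It is not Apex's fused optimizer. It does address the same complaint: parameter updates are applied in batches with `_foreach_*` kernels rather than by a Python loop over individual parameters. `make_optimizer` is a hypothetical helper.

```python
import torch
import torch.nn as nn


def make_optimizer(model: nn.Module, lr: float = 1e-3) -> torch.optim.Optimizer:
    # Hypothetical helper. foreach=True selects the multi-tensor implementation,
    # which updates all parameters with a few batched kernels per step.
    return torch.optim.Adam(model.parameters(), lr=lr, foreach=True)


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = make_optimizer(model)
x, y = torch.randn(128, 32), torch.randn(128, 1)
for _ in range(10):
    opt.zero_grad(set_to_none=True)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```

On CUDA, recent releases also accept `fused=True` for Adam, which is closer in spirit to Apex's fused kernels.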
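Here is a sketch of activation checkpointing with `torch.utils.checkpoint.checkpoint`, assuming hypothetical `Block` and `Net` modules. Which segments to wrap is a per-model choice. In this example, each block keeps only its input between forward and backward and recomputes its intermediate activations during the backward pass, trading a little extra compute for memory.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    # Hypothetical block: a linear layer followed by a cheap pointwise activation.
    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.fc(x))


class Net(nn.Module):
    def __init__(self, dim: int = 64, depth: int = 4, use_checkpoint: bool = True):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
        self.use_checkpoint = use_checkpoint

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            if self.use_checkpoint and self.training:
                # Only the block's input is saved; its intermediates are
                # discarded after forward and recomputed during backward.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


net = Net()
out = net(torch.randn(8, 64))
out.sum().backward()
```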
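Here is a sketch of scripting a chain of pointwise ops, using the tanh approximation of GELU as the example (`gelu_tanh` is a hypothetical name). Whether and when the ops actually get fused depends on the device and the active fuser. On GPU, fusion typically kicks in after a few warm-up calls.

```python
import torch


@torch.jit.script
def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # tanh approximation of GELU: a chain of pointwise multiplies, adds and a
    # tanh that the TorchScript fuser can combine into fewer kernels on GPU.
    # 0.7978845608028654 is sqrt(2 / pi).
    return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * x * x * x)))


x = torch.randn(1024)
y = gelu_tanh(x)
```

In PyTorch 2.x, `torch.compile` is the more actively developed route to the same kind of pointwise fusion.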