Abstract: We make a few interesting observations regarding deep neural network (DNN) training:
1) DNNs are typically initialized with (a) random weights, in order to (b) break symmetry and (c) promote feature diversity.
We demonstrate that (a), (b), and (c) are not necessary at initialization in order to obtain high accuracy at the end of training (ICML 2020; see the first sketch below).
2) Large-batch training is commonly used to accelerate training.
We improve final accuracy by enlarging the batch with additional augmented copies of each sample, rather than with more distinct samples (CVPR 2020; see the second sketch below).
3) Quantizing a DNN trained in full precision while retaining high accuracy typically requires fine-tuning the model on a large dataset.
We generate synthetic data from the DNN's batch-norm statistics and use it to fine-tune the model, without any real data (CVPR 2020; see the third sketch below).
4) Asynchronous training degrades generalization, even after training has converged to a steady state.
We close this gap by adjusting hyper-parameters according to a theoretical framework aimed at maintaining minima stability (ICLR 2020; see the fourth sketch below).
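First sketch (observation 1): a minimal PyTorch example of a non-random, constant initializer applied to a small network. The constant value and the architecture are illustrative assumptions; the ICML 2020 paper's exact initialization scheme may differ.

```python
import torch.nn as nn

def constant_init(module, value=0.5):
    # Give every weight the same value: no randomness, no broken
    # symmetry, no feature diversity at initialization.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.constant_(module.weight, value)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
model.apply(constant_init)  # all filters start identical
```

Since plain gradient descent keeps identical filters identical, some stochastic element during training (e.g., dropout) presumably differentiates the features; that mechanism is an assumption here, not a claim about the paper's method.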
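Second sketch (observation 2): enlarging a batch with differently augmented copies of the same samples instead of more unique samples. The augmentation pipeline and the factor `m` are illustrative assumptions.

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
])

def augmented_batch(images, labels, m=4):
    # Grow the batch m-fold with augmented copies of the same samples,
    # rather than fetching m times more distinct samples.
    big_images = torch.cat(
        [torch.stack([augment(img) for img in images]) for _ in range(m)]
    )
    big_labels = labels.repeat(m)
    return big_images, big_labels  # shapes: (m*B, C, H, W), (m*B,)
```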
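Third sketch (observation 3): generating synthetic data from batch-norm statistics. Random inputs are optimized so that the activation statistics at every BN layer match that layer's stored running mean and variance. The loss and hyper-parameters are assumptions; the CVPR 2020 paper may use a different formulation.

```python
import torch
import torch.nn as nn

def bn_synthetic_data(model, n=256, shape=(3, 32, 32), steps=500, lr=0.1):
    model.eval()  # BN layers use their stored running statistics
    x = torch.randn(n, *shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    stats = []

    def hook(module, inputs, output):
        # Record the statistics the current batch induces at this layer,
        # alongside the statistics stored during real-data training.
        a = inputs[0]
        stats.append((a.mean(dim=(0, 2, 3)),
                      a.var(dim=(0, 2, 3), unbiased=False),
                      module.running_mean, module.running_var))

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    for _ in range(steps):
        stats.clear()
        opt.zero_grad()
        model(x)
        loss = sum(((m - rm) ** 2).mean() + ((v - rv) ** 2).mean()
                   for m, v, rm, rv in stats)
        loss.backward()
        opt.step()
    for h in handles:
        h.remove()
    return x.detach()  # synthetic batch for fine-tuning the quantized model
```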
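Fourth sketch (observation 4): a hyper-parameter adjustment for asynchronous (stale-gradient) SGD. The 1/(tau + 1) learning-rate scaling is an illustrative assumption standing in for the stability-preserving rule the ICLR 2020 paper derives from its theoretical framework.

```python
import torch

def delayed_sgd_step(params, stale_grads, base_lr, tau):
    # Apply a gradient computed tau steps ago. Scaling the learning
    # rate down with the staleness tau is meant to keep the update
    # within the stability region of the minimum; the 1/(tau + 1)
    # rule here is an assumption, not the paper's exact prescription.
    lr = base_lr / (tau + 1)
    with torch.no_grad():
        for p, g in zip(params, stale_grads):
            p.sub_(lr * g)
```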