arXiv:2305.14561v5 Announce Type: replace-cross Abstract: Compute-in-memory accelerators built upon non-volatile memory devices excel in energy efficiency and low latency when performing deep neural network (DNN) inference, thanks to their in-situ data processing capability. However, the stochastic nature and intrinsic variations of non-volatile memory devices often result in performance degradation during DNN inference. Introducing these non-ideal device behaviors during DNN training enhances robustness, but the drawbacks include limited accuracy improvement, reduced prediction confidence, and convergence issues. This stems from a mismatch between deterministic training and non-deterministic device variations: such training, although it accounts for variations, relies solely on the model's final output. In this work, inspired by control theory, we propose Negative Feedback Training (NeFT), a novel concept supported by theoretical analysis, to more effectively capture the multi-scale noisy information throughout the network. We instantiate this concept with two specific instances, oriented variational forward (OVF) and intermediate representation snapshot (IRS). Based on device variation models extracted from measured data, extensive experiments show that NeFT outperforms existing state-of-the-art methods, with up to a 45.08% improvement in inference accuracy, while reducing epistemic uncertainty, boosting output confidence, and improving convergence probability. These results underline the generality and practicality of the NeFT framework for increasing the robustness of DNNs against device variations. The source code for these two instances is available at https://github.com/YifanQin-ND/NeFT_CIM.
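To make the abstract's two ingredients concrete (training under injected device variations, and a feedback signal drawn from an intermediate representation rather than only the final output), here is a minimal PyTorch sketch. It is not the authors' implementation (see the linked repository for that); the log-normal variation model, the `NoisyLinear`/`TinyNeFTNet` classes, the auxiliary `aux_head`, and the `SIGMA`/`aux_weight` values are all illustrative assumptions used only to show the general shape of an IRS-style auxiliary loss.

```python
# Hypothetical sketch: weight-variation injection plus an auxiliary loss on an
# intermediate representation, loosely in the spirit of the abstract's IRS idea.
import torch
import torch.nn as nn
import torch.nn.functional as F

SIGMA = 0.1  # assumed std of multiplicative log-normal device variation


def inject_variation(weight: torch.Tensor, sigma: float = SIGMA) -> torch.Tensor:
    """Perturb weights with multiplicative log-normal noise (toy device-variation model)."""
    noise = torch.exp(torch.randn_like(weight) * sigma)
    return weight * noise


class NoisyLinear(nn.Linear):
    """Linear layer whose weights are re-perturbed on every forward pass."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, inject_variation(self.weight), self.bias)


class TinyNeFTNet(nn.Module):
    """Two noisy blocks plus an auxiliary head that reads the hidden representation."""

    def __init__(self, in_dim: int = 784, hidden: int = 256, n_classes: int = 10):
        super().__init__()
        self.block1 = NoisyLinear(in_dim, hidden)
        self.block2 = NoisyLinear(hidden, n_classes)
        self.aux_head = nn.Linear(hidden, n_classes)  # snapshot of the intermediate layer

    def forward(self, x: torch.Tensor):
        h = torch.relu(self.block1(x))
        return self.block2(h), self.aux_head(h)


def feedback_loss(main_logits, aux_logits, target, aux_weight: float = 0.3):
    """Main task loss plus a weighted feedback term from the intermediate snapshot."""
    return (F.cross_entropy(main_logits, target)
            + aux_weight * F.cross_entropy(aux_logits, target))


# Usage: one training step on random data.
model = TinyNeFTNet()
x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
main_logits, aux_logits = model(x)
loss = feedback_loss(main_logits, aux_logits, y)
loss.backward()
```

The auxiliary head supervises an intermediate layer directly, so gradient feedback no longer depends solely on the final output under noisy weights; how NeFT's OVF and IRS instances actually construct and weight such signals is detailed in the paper and repository.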