CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity
International Conference on Learning Representations (ICLR), 2024
Abstract: Sample efficiency is a crucial problem in deep reinforcement learning. Recent algorithms, such as REDQ and DroQ, found a way to improve the sample efficiency by increasing the update-to-data (UTD) ratio to 20 gradient update steps on the critic per environment sample. However, this comes at the expense of a greatly increased computational cost. To reduce this computational burden, we introduce CrossQ: a lightweight algorithm that makes careful use of Batch Normalization and removes target networks to surpass the state-of-the-art in sample efficiency while maintaining a low UTD ratio of 1. Notably, CrossQ does not rely on advanced bias-reduction schemes used in current methods. CrossQ's contributions are thus threefold: (1) state-of-the-art sample efficiency, (2) substantial reduction in computational cost compared to REDQ and DroQ, and (3) ease of implementation, requiring just a few lines of code on top of SAC.
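The abstract does not spell out how Batch Normalization and the removal of target networks fit together, so the following is only a minimal sketch of one plausible reading: the current and next-step inputs are normalized with statistics computed over their joint batch, and the bootstrap target is produced by the same critic weights rather than a separate target network. Everything here is an illustrative assumption and not the authors' implementation: the function names (`batch_norm`, `joint_normalize`, `q_values`, `critic_loss`), the linear critic, and the choice to normalize only the input features are hypothetical, and SAC's entropy term and twin critics are omitted for brevity.

```python
import math


def batch_norm(rows, eps=1e-5):
    """Normalize every feature column to zero mean and unit variance over the batch."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(dims)]
    return [
        [(r[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for r in rows
    ]


def joint_normalize(current, nxt):
    """Normalize current and next-step inputs with shared batch statistics.

    Assumption: both halves go through normalization as one batch, so the
    statistics reflect the mixture of the two input distributions.
    """
    joint = batch_norm(current + nxt)
    return joint[: len(current)], joint[len(current):]


def q_values(features, weights):
    """Toy linear critic (hypothetical): Q = w . x for each row."""
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def critic_loss(weights, obs_act, next_obs_act, rewards, gamma=0.99):
    """Mean squared TD error with the target computed by the same weights."""
    cur, nxt = joint_normalize(obs_act, next_obs_act)
    q = q_values(cur, weights)
    # No target network: the bootstrap value uses the same critic weights.
    # In a gradient-based implementation these targets would be held constant.
    q_next = q_values(nxt, weights)
    targets = [r + gamma * qn for r, qn in zip(rewards, q_next)]
    return sum((qi - t) ** 2 for qi, t in zip(q, targets)) / len(q)


# Tiny hand-made batch of (state, action) features, for illustration only.
obs_act = [[0.0, 1.0], [1.0, 0.0]]
next_obs_act = [[2.0, 3.0], [3.0, 2.0]]
rewards = [1.0, 0.0]

cur, nxt = joint_normalize(obs_act, next_obs_act)
loss = critic_loss([0.5, -0.5], obs_act, next_obs_act, rewards)
```

With a UTD ratio of 1, a loss of this kind would be evaluated once per environment sample, compared with the 20 critic updates per sample that the abstract attributes to REDQ and DroQ.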
BibTeX reference
@InProceedings{AAB24,
  author    = "A. Bhatt and D. Palenicek and B. Belousov and M. Argus and A. Amiranashvili and T. Brox and J. Peters",
  title     = "CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity",
  booktitle = "International Conference on Learning Representations (ICLR)",
  month     = " ",
  year      = "2024",
  url       = "http://lmbweb.informatik.uni-freiburg.de/Publications/2024/AAB24"
}