
CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity

A. Bhatt, D. Palenicek, B. Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, J. Peters
International Conference on Learning Representations (ICLR), 2024
Abstract: Sample efficiency is a crucial problem in deep reinforcement learning. Recent algorithms, such as REDQ and DroQ, found a way to improve the sample efficiency by increasing the update-to-data (UTD) ratio to 20 gradient update steps on the critic per environment sample. However, this comes at the expense of a greatly increased computational cost. To reduce this computational burden, we introduce CrossQ: a lightweight algorithm that makes careful use of Batch Normalization and removes target networks to surpass the state-of-the-art in sample efficiency while maintaining a low UTD ratio of 1. Notably, CrossQ does not rely on advanced bias-reduction schemes used in current methods. CrossQ's contributions are thus threefold: (1) state-of-the-art sample efficiency, (2) substantial reduction in computational cost compared to REDQ and DroQ, and (3) ease of implementation, requiring just a few lines of code on top of SAC.
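
The abstract names two ingredients: Batch Normalization in the critic and no target network. The sketch below illustrates how a temporal-difference loss could combine the two. It is not the authors' implementation. The NumPy-only setup, the helper names (`batch_norm`, `critic`, `make_params`, `td_loss_without_target_network`), the network size, and the choice to pass current and next inputs through the critic in one joint batch are all assumptions made for illustration. SAC's entropy term and twin critics are left out for brevity.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each feature using the statistics of the current batch."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def critic(params, obs, act):
    """Tiny Q-network: linear -> batch norm -> ReLU -> linear. Output shape: (batch,)."""
    x = np.concatenate([obs, act], axis=1)
    h = np.maximum(batch_norm(x @ params["w1"] + params["b1"]), 0.0)
    return (h @ params["w2"] + params["b2"]).squeeze(-1)


def make_params(rng, obs_dim, act_dim, hidden=8):
    """Hypothetical helper: random weights for the toy critic."""
    return {
        "w1": rng.normal(scale=0.5, size=(obs_dim + act_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }


def td_loss_without_target_network(params, obs, act, rew, next_obs, next_act,
                                   done, gamma=0.99):
    """Mean squared TD error where the bootstrap value comes from the same critic."""
    n = obs.shape[0]
    # Assumption: a single joint forward pass, so the batch-norm statistics
    # are computed over both the current and the next state-action pairs.
    q_all = critic(params,
                   np.concatenate([obs, next_obs], axis=0),
                   np.concatenate([act, next_act], axis=0))
    q, q_next = q_all[:n], q_all[n:]
    # No separate target parameters: the target reuses `params`. In an autodiff
    # framework this term would be wrapped in a stop-gradient.
    target = rew + gamma * (1.0 - done) * q_next
    return float(np.mean((q - target) ** 2))


# Usage with random data standing in for a replay-buffer batch.
rng = np.random.default_rng(0)
n, obs_dim, act_dim = 16, 3, 2
params = make_params(rng, obs_dim, act_dim)
loss = td_loss_without_target_network(
    params,
    obs=rng.normal(size=(n, obs_dim)),
    act=rng.normal(size=(n, act_dim)),
    rew=rng.normal(size=n),
    next_obs=rng.normal(size=(n, obs_dim)),
    next_act=rng.normal(size=(n, act_dim)),
    done=np.zeros(n),
)
```

With a UTD ratio of 1, a loss of this kind would be minimized once per environment step, compared with the 20 critic updates per step that the abstract attributes to REDQ and DroQ.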

BibTeX reference

  author       = "A. Bhatt and D. Palenicek and B. Belousov and M. Argus and A. Amiranashvili and T. Brox and J. Peters",
  title        = "CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity",
  booktitle    = "International Conference on Learning Representations (ICLR)",
  month        = " ",
  year         = "2024",
  url          = "http://lmbweb.informatik.uni-freiburg.de/Publications/2024/AAB24"
