Defending against universal perturbations with shared adversarial training
IEEE International Conference on Computer Vision (ICCV), 2019
Abstract: Classifiers such as deep neural networks have been shown to be vulnerable to adversarial perturbations on problems with high-dimensional input spaces. While adversarial training improves the robustness of image classifiers against such adversarial perturbations, it leaves them sensitive to perturbations on a non-negligible fraction of the inputs. In this work, we show that adversarial training is more effective in preventing universal perturbations, where the same perturbation needs to fool a classifier on many inputs. Moreover, we investigate the trade-off between robustness against universal perturbations and performance on unperturbed data, and we propose an extension of adversarial training that handles this trade-off more gracefully. We present results for image classification and semantic segmentation showing that universal perturbations that fool a model hardened with adversarial training become clearly perceptible and show patterns of the target scene.
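The distinction the abstract draws between per-input and universal perturbations can be illustrated with a minimal sketch. It assumes a toy linear logistic classifier, random data, and NumPy. It is not the paper's method, models, or experimental setup, and it does not implement shared adversarial training. All function names are hypothetical.

```python
import numpy as np

def loss_grad_wrt_input(w, X, y, delta):
    """Gradient of each sample's logistic loss w.r.t. its perturbed input.

    delta has shape (d,) for one shared perturbation or (n, d) for per-input ones.
    Labels y are in {-1, +1}. Returns an array of shape (n, d).
    """
    margins = y * ((X + delta) @ w)        # shape (n,)
    coeff = -y / (1.0 + np.exp(margins))   # d loss_i / d logit_i
    return coeff[:, None] * w[None, :]

def per_input_attack(w, X, y, eps, steps=10):
    """Signed-gradient ascent with a separate perturbation for every input."""
    delta = np.zeros_like(X)
    for _ in range(steps):
        g = loss_grad_wrt_input(w, X, y, delta)
        delta = np.clip(delta + (eps / 4) * np.sign(g), -eps, eps)
    return delta

def universal_attack(w, X, y, eps, steps=10):
    """Signed-gradient ascent on the summed loss with ONE perturbation for all inputs."""
    delta = np.zeros(X.shape[1])
    for _ in range(steps):
        g = loss_grad_wrt_input(w, X, y, delta).sum(axis=0)
        delta = np.clip(delta + (eps / 4) * np.sign(g), -eps, eps)
    return delta

def error_rate(w, X, y, delta):
    """Fraction of inputs misclassified after adding delta."""
    return float(np.mean(np.sign((X + delta) @ w) != y))

# Toy data (assumption): labels come from the classifier itself, so clean error is 0.
rng = np.random.default_rng(0)
w = rng.normal(size=20)
X = rng.normal(size=(200, 20))
y = np.sign(X @ w)

eps = 0.1  # L-infinity budget, chosen arbitrarily for illustration
print("clean error:    ", error_rate(w, X, y, 0.0))
print("per-input error:", error_rate(w, X, y, per_input_attack(w, X, y, eps)))
print("universal error:", error_rate(w, X, y, universal_attack(w, X, y, eps)))
```

In this linear toy setting the shared perturbation can push the decision in only one direction, so it fools at most the inputs that the per-input attack fools. This mirrors, in a very simplified form, why a universal perturbation is a more constrained adversary than a per-input one.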
BibTeX reference
@InProceedings{Bro19b,
  author    = "C. K. Mummadi and T. Brox and J. H. Metzen",
  title     = "Defending against universal perturbations with shared adversarial training",
  booktitle = "IEEE International Conference on Computer Vision (ICCV)",
  year      = "2019",
  url       = "http://lmbweb.informatik.uni-freiburg.de/Publications/2019/Bro19b"
}