Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning

MIT CSAIL; *Co-primary Authors
ICLR 2023 Spotlight (notable top 25%)

Contrastive Poisoning creates shortcuts in contrastive learning: it biases the model toward recognizing the poisoning patterns, rather than real image features, to achieve the contrastive objective.

Abstract

Indiscriminate data poisoning attacks are quite effective against supervised learning. However, not much is known about their impact on unsupervised contrastive learning (CL). This paper is the first to consider indiscriminate poisoning attacks on contrastive learning. We propose Contrastive Poisoning (CP), the first effective such attack on CL. We empirically show that Contrastive Poisoning not only drastically reduces the performance of CL algorithms, but also attacks supervised learning models, making it the most generalizable indiscriminate poisoning attack. We also show that CL algorithms with a momentum encoder are more robust to indiscriminate poisoning.

Highlights

New Problem


Previous studies have shown that indiscriminate data poisoning can be highly effective against supervised learning, e.g., reducing CIFAR-10 accuracy from 95% to 6%. However, these poisons are fragile: they can be defended against by unsupervised learning techniques like contrastive learning, which restores CIFAR-10 accuracy from 6% to about 80%. Our paper therefore addresses a new problem: poisoning contrastive learning models themselves, so that the attack survives this advanced defense.



New Techniques


Our idea is to learn poisoning perturbations that shortcut contrastive learning. Specifically, we co-learn the poisoning perturbations together with a neural network model, jointly minimizing the contrastive learning loss, as sketched below.
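
Concretely, one iteration of this co-learning alternates a model update and a perturbation update, both driven by the same contrastive loss. Below is a minimal PyTorch sketch of the idea, not the authors' released code: the names encoder, augment (a stochastic, differentiable augmentation; see the next section), deltas (one L∞-bounded perturbation per training sample, stored with requires_grad=False), and the SimCLR-style info_nce_loss are our illustrative assumptions.

import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    # SimCLR-style InfoNCE: a sample's two views are positives;
    # all other samples in the batch serve as negatives.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def poison_step(encoder, opt_model, augment, images, idx, deltas,
                eps=8 / 255, step=2 / 255):
    # (1) Model update: fit the reference encoder to the poisoned data.
    x = (images + deltas[idx]).clamp(0, 1)
    loss = info_nce_loss(encoder(augment(x)), encoder(augment(x)))
    opt_model.zero_grad()
    loss.backward()
    opt_model.step()

    # (2) Perturbation update: also *minimize* the contrastive loss,
    # but w.r.t. delta, so the perturbation itself becomes the shortcut
    # the encoder relies on. The gradient must flow through `augment`.
    delta = deltas[idx].clone().requires_grad_(True)
    x = (images + delta).clamp(0, 1)
    loss = info_nce_loss(encoder(augment(x)), encoder(augment(x)))
    grad = torch.autograd.grad(loss, delta)[0]
    deltas[idx] = (delta.detach() - step * grad.sign()).clamp(-eps, eps)

Because both updates descend the same loss, the perturbations converge to patterns that solve the contrastive objective "for free", which is exactly the shortcut a victim model latches onto.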

We emphasize two technical points that are key to the effectiveness of the learned contrastive poison:

  • First, we need to back-propagate gradients through the data augmentations, which therefore must be implemented as differentiable operations.
  • Second, for CL algorithms with a momentum encoder (e.g., MoCo), we also need to back-propagate through the momentum encoder (see the sketch after this list).
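
Here is a minimal sketch of both points, assuming the kornia library for tensor-based, differentiable augmentations and a MoCo-style momentum ("key") encoder; all names below are our illustrative choices, not the authors' implementation.

import torch
import torch.nn.functional as F
import kornia.augmentation as K

# (1) Differentiable augmentation: kornia ops act on tensors and stay in
# the autograd graph, so d(loss)/d(perturbation) survives cropping,
# flipping, and color jitter (PIL-based torchvision transforms would not).
augment = torch.nn.Sequential(
    K.RandomResizedCrop((32, 32), scale=(0.2, 1.0)),
    K.RandomHorizontalFlip(),
    K.ColorJitter(0.4, 0.4, 0.4, 0.1, p=0.8),
    K.RandomGrayscale(p=0.2),
)

def moco_logits(q_encoder, k_encoder, x_q, x_k, queue, t=0.2):
    # (2) Momentum-encoder path: reference MoCo computes the keys under
    # torch.no_grad(); for the attack we deliberately leave gradients on,
    # so the perturbation also receives gradient through k_encoder.
    q = F.normalize(q_encoder(x_q), dim=1)
    k = F.normalize(k_encoder(x_k), dim=1)    # NOT wrapped in no_grad
    l_pos = (q * k).sum(dim=1, keepdim=True)  # positive pair similarity
    l_neg = q @ queue.t()                     # negatives from the queue
    return torch.cat([l_pos, l_neg], dim=1) / t

The cross-entropy of these logits against label 0 is the usual MoCo loss; differentiating it with respect to the input perturbation now picks up contributions from both the query and the momentum encoder.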




Intriguing Results


Our results show that our method, Contrastive Poisoning (CP), is highly effective and robust. It is able to:
  • Poison different contrastive learning algorithms, different datasets, and different model architectures.
  • Poison both supervised learning and contrastive learning.
  • Remain effective even when the attacker does not know the victim's learning algorithm, model architecture, or downstream task.
  • Remain effective when only part of the dataset is poisoned.

BibTeX

@inproceedings{he2023indiscriminate,
    title={Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning},
    author={Hao He and Kaiwen Zha and Dina Katabi},
    booktitle={The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=f0a_dWEYg-Td}
}