
Raising the Cost of Malicious AI-Powered Image Editing

Hadi Salman*        Alaa Khaddaj*        Guillaume Leclerc*        Andrew Ilyas        Aleksander Mądry
hady@mit.edu        alaakh@mit.edu       gleclerc@mit.edu          ailyas@mit.edu      madry@mit.edu
MIT

arXiv:2302.06588v1 [cs.LG] 13 Feb 2023

Abstract
We present an approach to mitigating the risks of malicious image editing posed by large diffusion models. The key idea is to immunize images so as to make them resistant to manipulation by these models. This immunization relies on the injection of imperceptible adversarial perturbations designed to disrupt the operation of the targeted diffusion models, forcing them to generate unrealistic images. We provide two methods for crafting such perturbations, and then demonstrate their efficacy. Finally, we discuss a policy component necessary to make our approach fully effective and practical—one that calls on the organizations developing diffusion models, rather than individual users, to implement (and support) the immunization process.1

1 Introduction
Large diffusion models such as DALL·E 2 [RDN+22] and Stable Diffusion [RBL+22] are known for their
ability to produce high-quality photorealistic images, and can be used for a variety of image synthesis and
editing tasks. However, the ease of use of these models has raised concerns about their potential abuse,
e.g., by creating inappropriate or harmful digital content. For example, a malevolent actor might download
photos of people posted online and edit them maliciously using an off-the-shelf diffusion model (as in
Figure 1 top).
How can we address these concerns? First, it is important to recognize that it is, in some sense, impossible to completely eliminate such malicious image editing. Indeed, even without diffusion models in the picture, malevolent actors can still use tools such as Photoshop to manipulate existing images, or even
synthesize fake ones entirely from scratch. The key new problem that large generative models introduce is
that these actors can now create realistic edited images with ease, i.e., without the need for specialized skills
or expensive equipment. This realization motivates us to ask:
How can we raise the cost of malicious (AI-powered) image manipulation?
In this paper, we put forth an approach that aims to alter the economics of AI-powered image editing. At
the core of our approach is the idea of image immunization—that is, making a specific image resistant to AI-
powered manipulation by adding a carefully crafted (imperceptible) perturbation to it. This perturbation
would disrupt the operation of a diffusion model, forcing the edits it performs to be unrealistic (see Figure
1). In this paradigm, people can thus continue to share their (immunized) images as usual, while getting a
layer of protection against undesirable manipulation.
We demonstrate how one can craft such imperceptible perturbations for large-scale diffusion models
and show that they can indeed prevent realistic image editing. We then discuss in Section 5 complementary
technical and policy components needed to make our approach fully effective and practical.
* Equal contribution.
1 Code is available at https://github.com/MadryLab/photoguard.

Figure 1: Overview of our framework. An adversary seeks to modify an image found online. The adversary describes via a textual prompt the desired changes and then uses a diffusion model to generate a realistic image that matches the prompt (top). By immunizing the original image before the adversary can access it, we disrupt their ability to successfully perform such edits (bottom). (Example prompt: "Two men ballroom dancing".)

2 Preliminaries
We start by providing an overview of diffusion models as well as of the key concept we will leverage:
adversarial attacks.

2.1 Diffusion Models


Diffusion models have emerged recently as powerful tools for generating realistic images [SWM+15; HJA20]. These models are especially good at generating and editing images using textual prompts, and currently surpass other image generative models, such as GANs [GPM+14], in terms of the quality of the produced images.

Diffusion process. At their core, diffusion models employ a stochastic differential process called the diffusion process [SWM+15]. This process allows us to view the task of (approximate) sampling from a distribution of real images q(·) as a series of denoising problems. More precisely, given a sample x_0 ∼ q(·), the diffusion process incrementally adds noise to generate samples x_1, . . . , x_T over T steps, where x_{t+1} = a_t x_t + b_t ε_t, and ε_t is sampled from a Gaussian distribution.2 Note that, as a result, the sample x_T approaches a standard normal distribution N(0, I) as T → ∞. Now, if we reverse this process and are able to sample x_t given x_{t+1}, i.e., denoise x_{t+1}, we can ultimately generate new samples from q(·). This is done by simply starting from x_T ∼ N(0, I) (which corresponds to T being sufficiently large), and iteratively denoising these samples for T steps, to produce a new image x̃ ∼ q(·).

The element we need to implement this process is thus a neural network ε_θ that, given x_{t+1}, "predicts" the noise ε_t added to x_t at each time step t. Consequently, this denoising model ε_θ is trained to minimize the following loss function:

    \mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, \varepsilon \sim \mathcal{N}(0,1)} \left[ \|\varepsilon - \varepsilon_\theta(x_{t+1}, t)\|_2^2 \right],    (1)

where t is sampled uniformly over the T time steps. We defer discussion of details to Appendix B and refer the reader to [Wen21] for a more in-depth treatment of diffusion models.

2 Here, a_t and b_t are the parameters of the distribution q(x_{t+1} | x_t). Details are provided in Appendix B.
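To make the objective in (1) concrete, the following is a minimal PyTorch-style sketch of one training step of the denoising network. It is an illustration only, not the training code used in this paper: the network `eps_net`, the coefficient tensors `a` and `b`, and the optimizer are assumed to be provided by the caller.

```python
import torch

def diffusion_training_step(eps_net, x0, a, b, T, optimizer):
    """One simplified training step mirroring Eq. (1).

    Assumptions (placeholders, not from the paper's code):
      eps_net(x, t) predicts the noise injected at step t,
      a, b are 1-D tensors of coefficients a_t, b_t of x_{t+1} = a_t x_t + b_t eps_t.
    """
    t = int(torch.randint(0, T, (1,)))            # sample a step index uniformly

    # Naive forward diffusion up to step t (closed-form sampling also exists).
    x = x0
    for s in range(t):
        x = a[s] * x + b[s] * torch.randn_like(x)

    eps_t = torch.randn_like(x)                   # the noise added at step t
    x_next = a[t] * x + b[t] * eps_t              # x_{t+1}

    # Regress the network's prediction onto the injected noise (Eq. 1).
    t_batch = torch.full((x0.shape[0],), t, device=x0.device)
    loss = ((eps_t - eps_net(x_next, t_batch)) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    loss_value = loss.item()
    optimizer.step()
    return loss_value
```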

Figure 2: Diffusion models offer various capabilities, such as (1) generating images using textual prompts (top left), (2) generating variations of an input image using textual prompts (top right), and (3) editing images using textual prompts (bottom). (Example prompts: "A photo of Mount Everest surrounded by Cherry Blossom trees", "A photo of a black cow swimming on the beach", and "A photo of two men in a wedding" applied to an editable region.)

Latent diffusion models (LDMs). Our focus will be on a specific class of diffusion models called latent diffusion models (LDMs) [RBL+22].3 These models apply the diffusion process described above in the latent space instead of the input (image) space. As it turns out, this change enables more efficient training and faster inference, while maintaining the high quality of the generated samples.
Training an LDM is similar to training a standard diffusion model and differs mainly in one aspect. Specifically, to train an LDM, the input image x_0 is first mapped to its latent representation z_0 = E(x_0), where E is a given encoder. The diffusion process then continues as before (just in the latent space) by incrementally adding noise to generate samples z_1, . . . , z_T over T steps, where z_{t+1} = a_t z_t + b_t ε_t, and ε_t is sampled from a Gaussian distribution. Finally, the denoising network ε_θ is learned analogously to before but, again, now in the latent space, by minimizing the following loss function:

    \mathcal{L}(\theta) = \mathbb{E}_{t,\, z_0,\, \varepsilon \sim \mathcal{N}(0,1)} \left[ \|\varepsilon - \varepsilon_\theta(z_{t+1}, t)\|_2^2 \right].    (2)

Once the denoising network εθ is trained, the same generative process can be applied as before, starting
from a random vector in the latent space, to obtain a latent representation z̃ of the (new) generated image.
This representation is then decoded into an image x̃ = D(z̃) ∼ q(·), using the corresponding decoder D .

Prompt-guided sampling using an LDM. An LDM by default generates a random sample from the distribution of images q(·) it was trained on. However, it turns out that one can also guide the sampling using natural language. This can be accomplished by combining the latent representation z_T produced during the diffusion process with the embedding of the user-defined textual prompt t.4 The denoising network ε_θ is then applied to the combined representation for T steps, yielding z̃, which is then mapped to a new image using the decoder D as before.

3 Our methodology can be adjusted to other diffusion models. Our focus on LDMs is motivated by the fact that all popular open-sourced diffusion models are of this type.
4 Conditioning on the text embedding happens at every stage of the generation process. See [RBL+22] for more details.
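For readers who want to try prompt-guided sampling directly, here is a short illustrative sketch using Hugging Face's diffusers library (this is not the paper's released code, which lives in the photoguard repository; the sampling settings shown are assumptions).

```python
# Text-to-image sampling with a latent diffusion model via `diffusers` (illustrative sketch).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "A photo of Mount Everest surrounded by Cherry Blossom trees"
# guidance_scale and num_inference_steps follow Table 8 in the Appendix.
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=100).images[0]
image.save("generated.png")
```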

LDM capabilities. LDMs turn out to be powerful text-guided image generation and editing tools. In particular, LDMs can be used not only for generating images from textual prompts, as described above, but also for generating prompt-guided variations of an image or for editing a specific part of an image (see Figure 2). The latter two capabilities (i.e., generation of image variations and image editing) require a slight modification of the generative process described above. Specifically, to modify or edit a given image x, we condition the generative process on this image. That is, instead of applying our generative process of T denoising steps to a random vector in the latent space, we apply it to the latent representation obtained by running the latent diffusion process on our image x. To edit only part of the image, we additionally freeze the parts of the image that should remain unedited.
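As a hedged illustration of these two capabilities, the sketch below uses the image-to-image and inpainting pipelines of the diffusers library; the model identifiers, file names, and parameter values are illustrative assumptions, not the paper's setup.

```python
# Prompt-guided image variations and inpainting with `diffusers` (illustrative sketch).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionInpaintPipeline

device = "cuda"

# (2) Variations of an input image, conditioned on a prompt.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
init_image = Image.open("cow.png").convert("RGB").resize((512, 512))
variation = img2img(
    prompt="A photo of a black cow swimming on the beach",
    image=init_image,
    strength=0.7,            # how far the output may deviate from the input image
    guidance_scale=7.5,
).images[0]

# (3) Editing via inpainting: a binary mask marks the region the model may repaint.
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to(device)
mask = Image.open("mask.png").convert("L").resize((512, 512))   # white = editable region
edited = inpaint(
    prompt="A photo of two men in a wedding",
    image=Image.open("two_men.png").convert("RGB").resize((512, 512)),
    mask_image=mask,
).images[0]
```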

2.2 Adversarial Attacks


For a given computer vision model and an image, an adversarial example is an imperceptible perturbation of that image that manipulates the model's behavior [SZS+14; BCM+13]. In image classification, for example, an adversary can construct an adversarial example for a given image x that causes it to be classified as a specific target label y_targ (different from the true label). This is achieved by minimizing the loss of a classifier f_θ with respect to that image:

    \delta_{adv} = \arg\min_{\delta \in \Delta} \; \mathcal{L}(f_\theta(x + \delta), y_{targ}).    (3)

Here, ∆ is a set of perturbations that are small enough to be imperceptible—a common choice is to constrain the adversarial example to be close (in ℓ_p distance) to the original image, i.e., ∆ = {δ : ‖δ‖_p ≤ ε}. The canonical approach to constructing an adversarial example is to solve the optimization problem (3) via projected gradient descent (PGD) [Nes03; MMS+18].
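As a concrete illustration of solving problem (3) with PGD under an ℓ∞ constraint, here is a minimal sketch; the classifier, input, and target label are placeholders supplied by the caller.

```python
import torch
import torch.nn.functional as F

def targeted_pgd_linf(model, x, y_target, eps=8/255, step_size=2/255, steps=100):
    """Minimize the loss toward y_target over an l_inf ball of radius eps (Eq. 3)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y_target)
        loss.backward()
        with torch.no_grad():
            # Gradient *descent* step on the targeted loss, then project onto the l_inf ball.
            delta -= step_size * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()
```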

3 Adversarially Attacking Latent Diffusion Models


We now describe our approach to immunizing images, i.e., making them harder to manipulate using latent diffusion models (LDMs). At its core, our approach leverages techniques from the adversarial attacks literature [SZS+14; MMS+18; AMK+21] and adds adversarial perturbations (see Section 2.2) to immunize images. Specifically, we present two different methods to execute this strategy (see Figure 3): an encoder attack and a diffusion attack.

Encoder attack. Recall that an LDM, when applied to an image, first encodes the image using an encoder E into a latent vector representation, which is then used to generate a new image (see Section 2). The key idea behind our encoder attack is to disrupt this process by forcing the encoder to map the input image to some "bad" representation. To achieve this, we solve the following optimization problem using projected gradient descent (PGD):

    \delta_{encoder} = \arg\min_{\|\delta\|_\infty \le \epsilon} \; \|\mathcal{E}(x + \delta) - z_{targ}\|_2^2,    (4)

where x is the image to be immunized, and z_targ is some target latent representation (e.g., z_targ can be the representation, produced using the encoder E, of a gray image). Solutions to this optimization problem yield small, imperceptible perturbations δ_encoder which, when added to the original image, result in an (immunized) image that is similar to the (gray) target image from the perspective of the LDM's encoder. This, in turn, causes the LDM to generate an irrelevant or unrealistic image. An overview of this attack is shown in Figure 3 (left).5

5 See Algorithm 1 in the Appendix for the details of the encoder attack.
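A minimal PGD sketch of the encoder attack in (4) follows, assuming access to a function that maps an image to the LDM's latent representation (e.g., the VAE encoder of Stable Diffusion); the helper names are placeholders, the defaults follow Table 9, and Algorithm 1 in the Appendix gives the exact procedure.

```python
import torch

def encoder_attack(encode, x, x_target, eps=16/255, step_size=2/255, steps=200):
    """PGD on the encoder objective of Eq. (4).

    encode   : placeholder function mapping an image tensor to its latent E(x)
    x        : image to immunize, shape (1, 3, H, W), values in [0, 1]
    x_target : target image (e.g., a gray image) defining z_targ = E(x_target)
    """
    with torch.no_grad():
        z_target = encode(x_target)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = ((encode(x + delta) - z_target) ** 2).sum()
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()          # descend on the distance to z_targ
            delta.clamp_(-eps, eps)                         # project onto the l_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)        # keep the immunized image in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()
```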

Figure 3: Overview of our proposed attacks. When applying the encoder attack (left), our goal is to map the representation of the original image to the representation of a target image (a gray image). Our (more complex) diffusion attack (right), on the other hand, aims to break the diffusion process by manipulating the whole process so that it generates an image that resembles a given target image (a gray image).

Diffusion attack. Although the encoder attack is effective at forcing the LDM to generate images that are unrelated to the immunized ones, we still expect the LDM to use the textual prompt. For example, as shown in the encoder attack diagram in Figure 3, editing an immunized image of two men using the prompt "Two men in a wedding" still results in a generated image with two men wearing wedding suits, even though the image contains some visual artifacts indicating that it has been manipulated. Can we disturb the diffusion process even further, so that the diffusion model "ignores" the textual prompt entirely and generates a more obviously manipulated image?
It turns out that we are able to do so by using a more complex attack, one that targets the diffusion process itself instead of just the encoder. In this attack, we perturb the input image so that the final image generated by the LDM is a specific target image (e.g., random noise or a gray image). Specifically, we generate an adversarial perturbation δ_diffusion by solving the following optimization problem (again via PGD):

    \delta_{diffusion} = \arg\min_{\|\delta\|_\infty \le \epsilon} \; \|f(x + \delta) - x_{targ}\|_2^2.    (5)

Above, f is the LDM, x is the image to be immunized, and x_targ is the target image to be generated. An overview of this attack is depicted in Figure 3 (right).6 As we already mentioned, this attack targets the full diffusion process (which includes the text prompt conditioning), and tries to nullify not only the effect of the immunized image, but also that of the text prompt itself. Indeed, in our example (see Figure 3 (right)) no wedding suits appear in the edited image whatsoever.

It is worth noting that this approach, although more powerful than the encoder attack, is harder to execute. Indeed, to solve problem (5) using PGD, one needs to backpropagate through the full diffusion process (which, as we recall from Section 2.1, includes repeated application of the denoising step). This causes memory issues even on the largest GPU we used.7 To address this challenge, we backpropagate through only a few steps of the diffusion process, instead of the full process, and find that the resulting adversarial perturbations are still effective. We defer the details of our attacks to Appendix A.

6 See Algorithm 2 in the Appendix for the details of the diffusion attack.
7 We used an A100 GPU with 40 GB of memory.
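The following sketch illustrates the diffusion attack in (5) together with the memory-saving trick described above: the end-to-end pipeline is run with gradients enabled only for the last few denoising steps. The wrapper `run_ldm` is a hypothetical placeholder standing in for the full image-to-image diffusion pipeline; Algorithm 2 in the Appendix gives the exact procedure.

```python
import torch

def diffusion_attack(run_ldm, x, x_target, eps=16/255, step_size=2/255,
                     steps=200, grad_steps=4):
    """PGD on the end-to-end LDM output, Eq. (5).

    run_ldm(x, n_grad_steps) : hypothetical wrapper that runs the full image-to-image
        diffusion pipeline on x and keeps the computation graph only for the last
        `n_grad_steps` denoising iterations (earlier ones run under torch.no_grad()).
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        x_out = run_ldm(x + delta, n_grad_steps=grad_steps)
        loss = ((x_out - x_target) ** 2).sum()
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()   # push the generated output toward x_targ
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()
```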

4 Results
In this section, we examine the effectiveness of our proposed immunization method.

Figure 4: Given a source image (e.g., an image of a white cow on the beach) and a textual prompt (e.g., "black cow on the beach"), the SDM can generate a realistic image matching the prompt while remaining similar to the original image (middle column). However, when the source image is immunized, the SDM fails to do so (right-most column). (Prompts shown: "Black cow on the beach", "Brown cat playing poker".) More examples are in Appendix C.

Setup. We focus on the Stable Diffusion Model (SDM) v1.5 [RBL+22], though our methods can be applied to other diffusion models too. In each of the following experiments, we aim to disrupt the performance of the SDM by adding imperceptible noise (using either of our proposed attacks)—i.e., applying our immunization procedure—to a variety of images. The goal is to force the model to generate images that are unrealistic and unrelated to the original (immunized) image. We evaluate the performance of our method both qualitatively (by visually inspecting the generated images) and quantitatively (by examining image quality using standard metrics). We defer further experimental details to Appendix A.

4.1 Qualitative Results


Immunizing against generating image variations. We first assess whether we can disrupt the SDM’s
ability to generate realistic variations of an image based on a given textual prompt. For example, given an
image of a white cow on the beach and a prompt of “black cow on the beach”, the SDM should generate a
realistic image of a black cow on the beach that looks similar to the original one (cf. Figure 4). Indeed, the
SDM is able to generate such images. However, when we immunize the original images (using the encoder
attack), the SDM fails to generate a realistic variation—see Figure 4.

Immunizing against image editing. Now we consider the more challenging task of disrupting the ability of SDMs to edit images using textual prompts. The process of editing an image using an SDM involves inputting the image, a mask indicating which parts of the image should be edited, and a text prompt guiding how those parts should be manipulated. The SDM then generates an edited version based on that prompt. An example can be seen in Figure 2, where an image of two men watching a tennis game is transformed to resemble a wedding photo. This corresponds to inputting the original image, a binary mask that excludes only the men's heads from editing, and the prompt "A photo of two men in a wedding." However, when the image is immunized (using either the encoder or the diffusion attack), the SDM is unable to produce realistic image edits (cf. Figure 5). Furthermore, the diffusion attack results in more unrealistic images than the encoder attack.

Figure 5: Given a source image (e.g., an image of two men watching a tennis game) and a textual prompt (e.g., "two men in a wedding"), the SDM can edit the source image to match the prompt (second column). However, when the source image is immunized using the encoder attack, the SDM fails to do so (third column). Immunizing using the diffusion attack further reduces the quality of the edited image (fourth column). (Prompts shown: "Two men in a wedding", "A man sitting in a metro".) More examples are in Appendix C.

Method                                  FID ↓    PR ↑    SSIM ↑         PSNR ↑          VIFp ↑         FSIM ↑
Immunization baseline (Random noise)    82.57    1.00    0.75 ± 0.13    19.21 ± 4.00    0.43 ± 0.13    0.83 ± 0.08
Immunization (Encoder attack)           130.6    0.95    0.58 ± 0.11    14.91 ± 2.78    0.30 ± 0.10    0.73 ± 0.08
Immunization (Diffusion attack)         167.6    0.87    0.50 ± 0.09    13.58 ± 2.23    0.24 ± 0.09    0.69 ± 0.06

Table 6: We report various image quality metrics measuring the similarity between edits originating from immunized vs. non-immunized images. We observe that edits of immunized images are substantially different from those generated from the original (non-immunized) images. Note that the arrows next to the metrics denote increasing image similarity. Since our goal is to make the edits as different as possible from the edits obtained without immunization, lower image similarity is better. Confidence intervals denote one standard deviation over 60 images. Additional metrics are in Appendix C.1.

4.2 Quantitative Results


Image quality metrics. Figures 4 and 5 indicate that, as desired, edits of immunized images are noticeably different from those of non-immunized images. To quantify this difference, we generate 60 different edits of a variety of images using different prompts, and then compute several metrics capturing the similarity between the resulting edits of immunized versus non-immunized images:8 FID [HRU+17], PR [SBL+18], SSIM [WBS+04], PSNR, VIFp [SB06], and FSIM [ZZM+11].9 The better our immunization method is, the less similar the edits of immunized images are to those of non-immunized images.

The similarity scores, shown in Table 6, indicate that applying either of our immunization methods (the encoder or the diffusion attack) indeed yields edits that are different from those of non-immunized images (since, for example, FID is far from zero for both of these methods). As a baseline, we consider a naive immunization method that adds uniform random noise (of the same intensity as the perturbations used in our proposed immunization method). This baseline, as we verified, is not effective at disrupting the SDM, and yields edits almost identical to those of non-immunized images. Indeed, in Table 6, the similarity scores of this baseline indicate edits that are closer to those of non-immunized images compared to both of our attacks.
8 We use the implementations provided in: https://github.com/photosynthesis-team/piq.
9 We report additional metrics in Appendix C.1.
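As a rough illustration of how the pairwise metrics above can be computed with the piq package referenced in footnote 8, here is a hedged sketch; the tensor names are placeholders, exact function signatures may vary across piq versions, and FID/PR are omitted since they operate on feature sets extracted from the two image collections.

```python
# Pairwise similarity between edits of non-immunized vs. immunized images (sketch).
# `edits_clean` and `edits_immunized` are assumed to be float tensors in [0, 1]
# of shape (N, 3, H, W), aligned so that row i of each tensor uses the same prompt and seed.
import piq

def pairwise_similarity(edits_clean, edits_immunized):
    return {
        "SSIM": piq.ssim(edits_immunized, edits_clean, data_range=1.0).item(),
        "PSNR": piq.psnr(edits_immunized, edits_clean, data_range=1.0).item(),
        "VIFp": piq.vif_p(edits_immunized, edits_clean, data_range=1.0).item(),
        "FSIM": piq.fsim(edits_immunized, edits_clean, data_range=1.0).item(),
    }
```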

Image-prompt similarity. To further evaluate the quality of the generated/edited images after immunization (using the diffusion attack), we measure the similarity between the edited images and the textual prompt used to guide each edit, with and without immunization. Since the SDM uses the textual prompt to guide the generation of an image, the similarity between the generated image and the prompt should be high in the absence of immunization. After immunization (using the diffusion attack), however, the similarity should be low, since the immunization disrupts the full diffusion process and forces the diffusion model to ignore the prompt during generation. We use the same 60 edits as in our previous experiment, and we extract—using a pretrained CLIP model [RKH+21]—the embeddings of these images and of the textual prompts used to generate them. We then compute the cosine similarity between these two embeddings. As shown in Figure 7, the immunization process decreases the similarity between the generated images and the textual prompts used to generate them, as expected.

Figure 7: Image-prompt similarity. We plot the cosine similarity between the CLIP embeddings of the generated images and the text prompts, with and without immunization, as well as with a baseline immunization of adding small random noise to the original image. Error bars denote the interquartile range (IQR) over 60 runs.
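A sketch of this CLIP image-prompt similarity measurement using the Hugging Face transformers implementation of CLIP follows; the model variant is an assumption, since the paper does not specify which CLIP checkpoint was used.

```python
# Cosine similarity between CLIP embeddings of an edited image and its prompt (sketch).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image_path: str, prompt: str) -> float:
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    # Normalize both embeddings and take their dot product (cosine similarity).
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return float((img_emb * txt_emb).sum())
```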

5 A Techno-Policy Approach to Mitigation of AI-Powered Editing


In the previous sections we have developed an immunization procedure that, when applied to an image,
protects the immunized version of that image from realistic manipulation by a given diffusion model. Our
immunization procedure has, however, certain important limitations. We now discuss these limitations as
well as a combination of technical and policy remedies needed to obtain a fully effective approach to raising
the cost of malicious AI-powered image manipulation.

(Lack of) robustness to transformations. One of the limitations of our immunization method is that the
adversarial perturbation that it relies on may be ineffective after the immunized image is subjected to image
transformations and noise purification techniques. For instance, malicious actors could attempt to remove
the disruptive effect of that perturbation by cropping the image, adding filters to it, applying a rotation, or
other means. This problem can be addressed, however, by leveraging a long line of research on creating
robust adversarial perturbations, i.e., adversarial perturbations that can withstand a broad range of image
modifications and noise manipulations [EEF+18; KGB16; AEI+18; BMR+18].
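One standard recipe from that line of work is to optimize the perturbation in expectation over a family of random transformations (EOT-style PGD). Below is a hedged sketch layered on top of the encoder attack; the transformation sampler is a placeholder, and this is not an evaluation performed in this paper.

```python
import torch

def robust_encoder_attack(encode, x, z_target, sample_transform,
                          eps=16/255, step_size=2/255, steps=200, eot_samples=4):
    """Encoder attack averaged over random transformations (EOT-style sketch).

    sample_transform() : placeholder returning a random differentiable transform
                         (e.g., crop-and-resize, slight rotation, blur).
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = 0.0
        for _ in range(eot_samples):
            t = sample_transform()
            loss = loss + ((encode(t(x + delta)) - z_target) ** 2).sum()
        (loss / eot_samples).backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()
```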

Forward-compatibility of the immunization. While the immunizing adversarial perturbations we pro-


duce might be effective at disrupting the current generation of diffusion-based generative models, they are
not guaranteed to be effective against the future versions of these models. Indeed, one could hope to rely
here on the so-called transferability of adversarial perturbations [PMG16; LCL+17], but no perturbation will
be perfectly transferable.
To truly address this limitation, we thus need to go beyond purely technical methods and encourage—
or compel—via policy means a collaboration between organizations that develop large diffusion models,
end-users, as well as data hosting and dissemination platforms. Specifically, this collaboration would in-
volve the developers providing APIs that allow the users and platforms to immunize their images against
manipulation by the diffusion models the developers create. Importantly, these APIs should guarantee
“forward compatibility”, i.e., effectiveness of the offered immunization against models developed in the
future. This can be accomplished by planting, when training such future models, the current immunizing
adversarial perturbations as backdoors. (Observe that our immunization approach can provide post-hoc
“backward compatibility” too. That is, one can create immunizing adversarial perturbations that are effec-
tive for models that were already released.)

It is important to point out that we are leveraging here an incentive alignment that is fundamentally different from the one present in more typical applications of adversarial perturbations and backdoor attacks. In particular, the "attackers" here—that is, the parties that create the adversarial perturbations/execute the backdoor attack—are the same parties that develop the models being attacked. This crucial difference is, in particular, exactly what helps remedy the forward-compatibility challenges that turn out to be crippling, e.g., in the context of creating "unlearnable" images (i.e., images that are immune to being leveraged by, e.g., facial recognition models) [RHC+21].

6 Related Work
Data misuse after model training. Recent advances in ML-powered image generation and editing have
raised concerns about the potential misuse of personal data for generating fake images. This issue arose
first in the context of the development of generative adversarial networks (GANs) for image generation and
editing [GPM+14; MO14; SGZ+16; IZZ+17; ZPI+17; ZXL+17; KAL+18; BDS19; KLA19], and led to research
on methods for defending against such manipulation, such as attacking the GAN itself [RAS20; RBS20;
SZZ+21]. This problem has recently been exacerbated by the advent of (publicly available) diffusion models [RBL+22; RDN+22]. Indeed, one can now easily describe in text how one wants to manipulate an image, and immediately obtain a result of impressive quality (see Figure 2) that significantly outperforms previous methods, such as GANs.

Deepfake detection. A line of work related to ours aims to detect fake images rather than prevent their
generation. Deepfake detection methods include analyzing the consistency of facial expressions and iden-
tifying patterns or artifacts in the image that may indicate manipulation, and training machine learning
models to recognize fake images [KM18; ANY+18; NNN+19; ML21; RCV+19; DKP+19; LBZ+20; LYS+20;
BCM+21]. While some deepfake detection methods are more effective than others, no single method is fool-
proof. A potential way to mitigate this shortcoming could involve development of so-called watermarking
methods [CKL+97; NHZ+22]. These methods aim to ensure that it is easy to detect that a given output has
been produced using a generative model—such watermarking approaches have been recently developed
for a related context of large language models [KGW+23]. Still, neither deepfake detection nor watermark-
ing methods could protect images from being manipulated in the first place. A manipulated image can
hence cause harm before being flagged as fake. Also, given that our work is complementary to deepfake
detection and watermarking methods, it could, in principle, be combined with them.

Data misuse during model training. The abundance of readily available data on the Internet has played
a significant role in recent breakthroughs in deep learning, but has also raised concerns about the potential
misuse of such data when training models. Therefore, there has been an increasing interest in protection
against unauthorized data exploitation, e.g., by designing unlearnable examples [HME+21; FHL+21]. These
methods propose adding imperceptible backdoor signals to user data before uploading it online, so as to
prevent models from fruitfully utilizing this data. However, as pointed out by Radiya-Dixit et al. [RHC+21],
these methods can be circumvented, often simply by waiting until subsequently developed models can
avoid being fooled by the planted backdoor signal.

7 Conclusion
In this paper, we presented a method for raising the difficulty of using diffusion models for malicious image manipulation. Our method involves "immunizing" images through the addition of imperceptible adversarial perturbations. These added perturbations disrupt the inner workings of the targeted diffusion model and thus prevent it from producing realistic modifications of the immunized images.
We also discussed the complementary policy component that will be needed to make our approach fully effective. This component involves ensuring the cooperation of the organizations developing diffusion-based generative models in provisioning APIs that allow users to immunize their images against manipulation by such models (and future versions thereof).

8 Acknowledgements
Work supported in part by the NSF grants CNS-1815221 and DMS-2134108, and Open Philanthropy. This
material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) un-
der Contract No. HR001120C0015. Work partially done on the MIT Supercloud compute cluster [RKB+18].

References
[AEI+18] Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. “Synthesizing Robust Ad-
versarial Examples”. In: International Conference on Machine Learning (ICML). 2018.
[AMK+21] Naveed Akhtar, Ajmal Mian, Navid Kardan, and Mubarak Shah. “Threat of Adversarial At-
tacks on Deep Learning in Computer Vision: Survey”. In: (2021).
[ANY+18] Darius Afchar, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. “Mesonet: a compact
facial video forgery detection network”. In: 2018 IEEE international workshop on information
forensics and security (WIFS). IEEE. 2018, pp. 1–7.
[BCM+13] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov,
Giorgio Giacinto, and Fabio Roli. “Evasion attacks against machine learning at test time”. In:
Joint European conference on machine learning and knowledge discovery in databases (ECML-KDD).
2013.
[BCM+21] Nicolo Bonettini, Edoardo Daniele Cannas, Sara Mandelli, Luca Bondi, Paolo Bestagini, and
Stefano Tubaro. “Video face manipulation detection through ensemble of cnns”. In: 2020 25th
international conference on pattern recognition (ICPR). IEEE. 2021, pp. 5012–5019.
[BDS19] Andrew Brock, Jeff Donahue, and Karen Simonyan. “Large Scale GAN Training for High Fi-
delity Natural Image Synthesis”. In: International Conference on Learning Representations (ICLR).
2019.
[BMR+18] Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial
Patch. 2018. arXiv: 1712.09665 [cs.CV].
[BSM+15] Amnon Balanov, Arik Schwartz, Yair Moshe, and Nimrod Peleg. “Image quality assessment
based on DCT subband similarity”. In: IEEE International Conference on Image Processing (ICIP).
2015.
[CKL+97] Ingemar J Cox, Joe Kilian, F Thomson Leighton, and Talal Shamoon. “Secure spread spectrum
watermarking for multimedia”. In: IEEE transactions on image processing 6.12 (1997), pp. 1673–
1687.
[DKP+19] Ricard Durall, Margret Keuper, Franz-Josef Pfreundt, and Janis Keuper. “Unmasking deep-
fakes with simple features”. In: arXiv preprint arXiv:1911.00686 (2019).
[EEF+18] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Florian Tramer, Atul
Prakash, Tadayoshi Kohno, and Dawn Song. “Physical Adversarial Examples for Object De-
tectors”. In: CoRR (2018).
[FHL+21] Shaopeng Fu, Fengxiang He, Yang Liu, Li Shen, and Dacheng Tao. “Robust unlearnable ex-
amples: Protecting data privacy against adversarial learning”. In: International Conference on
Learning Representations. 2021.
[GPM+14] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair,
Aaron Courville, and Yoshua Bengio. “Generative adversarial nets”. In: neural information pro-
cessing systems (NeurIPS). 2014.
[HJA20] Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising Diffusion Probabilistic Models”. In:
Neural Information Processing Systems (NeurIPS). 2020.
[HME+21] Hanxun Huang, Xingjun Ma, Sarah Monazam Erfani, James Bailey, and Yisen Wang. “Un-
learnable Examples: Making Personal Data Unexploitable”. In: ICLR. 2021.
[HRU+17] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochre-
iter. “Gans trained by a two time-scale update rule converge to a local nash equilibrium”. In:
Neural Information Processing Systems (NeurIPS). 2017.
[IZZ+17] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. “Image-to-image translation
with conditional adversarial networks”. In: conference on computer vision and pattern recognition
(CVPR). 2017.

[KAL+18] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. “Progressive Growing of GANs
for Improved Quality, Stability, and Variation”. In: International Conference on Learning Repre-
sentations. 2018.
[KGB16] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. “Adversarial examples in the physical
world”. In: arXiv preprint arXiv:1607.02533 (2016).
[KGW+23] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein.
“A Watermark for Large Language Models”. In: 2023.
[KLA19] Tero Karras, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative
adversarial networks”. In: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition. 2019, pp. 4401–4410.
[KM18] Pavel Korshunov and Sébastien Marcel. “Deepfakes: a new threat to face recognition? assess-
ment and detection”. In: arXiv preprint arXiv:1812.08685 (2018).
[LBZ+20] Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and Baining Guo.
“Face x-ray for more general face forgery detection”. In: Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition. 2020, pp. 5001–5010.
[LCL+17] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. “Delving into Transferable Adversar-
ial Examples and Black-box Attacks”. In: International Conference on Learning Representations
(ICLR). 2017.
[LYS+20] Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, and Siwei Lyu. “Celeb-df: A large-scale challeng-
ing dataset for deepfake forensics”. In: Proceedings of the IEEE/CVF conference on computer vision
and pattern recognition. 2020, pp. 3207–3216.
[ML21] Yisroel Mirsky and Wenke Lee. “The creation and detection of deepfakes: A survey”. In: ACM
Computing Surveys (CSUR) 54.1 (2021), pp. 1–41.
[MMS+18] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu.
“Towards deep learning models resistant to adversarial attacks”. In: International Conference on
Learning Representations (ICLR). 2018.
[MO14] Mehdi Mirza and Simon Osindero. “Conditional generative adversarial nets”. In: arXiv preprint
arXiv:1411.1784 (2014).
[Nes03] Yurii Nesterov. Introductory Lectures on Convex Optimization. 2003.
[NHZ+22] Paarth Neekhara, Shehzeen Hussain, Xinqiao Zhang, Ke Huang, Julian McAuley, and Farinaz
Koushanfar. “FaceSigns: semi-fragile neural watermarks for media authentication and coun-
tering deepfakes”. In: arXiv preprint arXiv:2204.01960 (2022).
[NNN+19] Thanh Thi Nguyen, Cuong M. Nguyen, Dung Tien Nguyen, Duc Thanh Nguyen, and Saeid
Nahavandi. “Deep Learning for Deepfakes Creation and Detection”. In: (2019).
[PMG16] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. “Transferability in Machine Learn-
ing: from Phenomena to Black-box Attacks using Adversarial Samples”. In: ArXiv preprint
arXiv:1605.07277. 2016.
[RAS20] Nataniel Ruiz, Sarah Adel Bargal, and Stan Sclaroff. “Disrupting Deepfakes: Adversarial At-
tacks Against Conditional Image Translation Networks and Facial Manipulation Systems”. In:
(2020).
[RBK+18] Rafael Reisenhofer, Sebastian Bosse, Gitta Kutyniok, and Thomas Wiegand. “A Haar wavelet-
based perceptual similarity index for image quality assessment”. In: 2018.
[RBL+22] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. “High-
resolution image synthesis with latent diffusion models”. In: Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition. 2022, pp. 10684–10695.
[RBS20] Nataniel Ruiz, Sarah Bargal, and Stan Sclaroff. “Protecting Against Image Translation Deep-
fakes by Leaking Universal Perturbations from Black-Box Neural Networks”. In: (2020).

[RCV+19] Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias
Nießner. “Faceforensics++: Learning to detect manipulated facial images”. In: Proceedings of the
IEEE/CVF international conference on computer vision. 2019, pp. 1–11.
[RDN+22] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. “Hierarchical
text-conditional image generation with clip latents”. In: arXiv preprint arXiv:2204.06125 (2022).
[RHC+21] Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, and Florian Tramèr. “Data Poisoning
Won’t Save You From Facial Recognition”. In: arXiv, 2021.
[RKB+18] Albert Reuther, Jeremy Kepner, Chansup Byun, Siddharth Samsi, William Arcand, David Be-
stor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna
Klein, Lauren Milechin, Julia Mullen, Andrew Prout, Antonio Rosa, Charles Yee, and Peter
Michaleas. “Interactive supercomputing on 40,000 cores for machine learning and data anal-
ysis”. In: 2018 IEEE High Performance extreme Computing Conference (HPEC). IEEE. 2018, pp. 1–
6.
[RKH+21] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agar-
wal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. “Learning transferable
visual models from natural language supervision”. In: arXiv preprint arXiv:2103.00020. 2021.
[SB06] H.R. Sheikh and A.C. Bovik. “Image information and visual quality”. In: 2006.
[SBL+18] Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lučić, Olivier Bousquet, and Sylvain Gelly. “As-
sessing Generative Models via Precision and Recall”. In: Advances in Neural Information Pro-
cessing Systems (NeurIPS). 2018.
[SGZ+16] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen.
“Improved techniques for training gans”. In: neural information processing systems (NeurIPS).
2016.
[SWM+15] Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. “Deep Un-
supervised Learning Using Nonequilibrium Thermodynamics”. In: International Conference on
Machine Learning. 2015.
[SZS+14] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Good-
fellow, and Rob Fergus. “Intriguing properties of neural networks”. In: International Conference
on Learning Representations (ICLR). 2014.
[SZZ+21] Hui Sun, Tianqing Zhu, Zhiqiu Zhang, Dawei Jin Xiong, Wanlei Zhou, et al. “Adversarial
attacks against deep generative models on data: a survey”. In: arXiv preprint arXiv:2112.00247
(2021).
[WBS+04] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. “Image quality assessment: from
error visibility to structural similarity”. In: 2004.
[Wen21] Lilian Weng. “What are diffusion models?” In: lilianweng.github.io (July 2021). URL: https :
//lilianweng.github.io/posts/2021-07-11-diffusion-models/.
[XZM+14] Wufeng Xue, Lei Zhang, Xuanqin Mou, and Alan C. Bovik. “Gradient Magnitude Similarity
Deviation: A Highly Efficient Perceptual Image Quality Index”. In: 2014.
[ZL12] Lin Zhang and Hongyu Li. “SR-SIM: A fast and high performance IQA index based on spectral
residual”. In: 2012 19th IEEE International Conference on Image Processing. 2012.
[ZPI+17] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. “Unpaired image-to-image
translation using cycle-consistent adversarial networks”. In: international conference on computer
vision(ICCV). 2017.
[ZSL14] Lin Zhang, Ying Shen, and Hongyu Li. “VSI: A Visual Saliency-Induced Index for Perceptual
Image Quality Assessment”. In: 2014.
[ZXL+17] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and
Dimitris N Metaxas. “Stackgan: Text to photo-realistic image synthesis with stacked generative
adversarial networks”. In: Proceedings of the IEEE international conference on computer vision.
2017, pp. 5907–5915.

[ZZM+11] Lin Zhang, Lei Zhang, Xuanqin Mou, and David Zhang. “FSIM: A Feature Similarity Index
for Image Quality Assessment”. In: 2011.

A Experimental Setup
A.1 Details of the diffusion model we used
In this paper, we used the open-source Stable Diffusion model hosted on Hugging Face.10 We use the hyperparameters presented in Table 8 to generate images from this model. For a given image on which we want to test our immunization method, we first search for a good random seed that leads to a realistic modification of the image given some textual prompt. We then use the same seed when editing the immunized version of the same image with the diffusion model. This ensures that the immunized image is modified in the same way as the original image, and that any resulting unrealistic edits are due to the immunization and not to the random seed.

10 The model is available at: https://huggingface.co/runwayml/stable-diffusion-v1-5.

Table 8: Hyperparameters used for the Stable Diffusion model.

height    width    guidance_scale    num_inference_steps    eta
512       512      7.5               100                    1
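The seed-matching procedure described above can be implemented by passing an explicitly seeded generator to the pipeline; the sketch below uses the diffusers inpainting pipeline with the Table 8 hyperparameters, while the model id, file names, and seed value are illustrative placeholders rather than the paper's exact setup.

```python
# Reuse the same random seed when editing the original and the immunized image (sketch).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def edit(image_path, mask_path, prompt, seed=1234):
    generator = torch.Generator(device="cuda").manual_seed(seed)   # fixes the sampling noise
    return pipe(
        prompt=prompt,
        image=Image.open(image_path).convert("RGB").resize((512, 512)),
        mask_image=Image.open(mask_path).convert("L").resize((512, 512)),
        guidance_scale=7.5, num_inference_steps=100, eta=1.0,
        generator=generator,
    ).images[0]

# Same seed for both edits, so differences are attributable to the immunization alone.
edited_original = edit("original.png", "mask.png", "A photo of two men in a wedding")
edited_immunized = edit("immunized.png", "mask.png", "A photo of two men in a wedding")
```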

A.2 Details of our attacks


Throughout the paper, we use two different attacks: an encoder attack and a diffusion attack. These attacks are described in the main paper and are summarized here in Algorithm 1 and Algorithm 2, respectively. For both attacks, we use the same set of hyperparameters, shown in Table 9. The value of ε was chosen to be large enough to disturb the image, but small enough not to be noticeable to the human eye.

Table 9: Hyperparameters used for the adversarial attacks.

Norm    ε         step size    number of steps
ℓ∞      16/255    2/255        200

Algorithm 1 Encoder Attack on a Stable Diffusion Model

1: Input: Input image x, target image x_targ, Stable Diffusion model encoder E, perturbation budget ε, step size k, number of steps N.
2: Compute the embedding of the target image: z_targ ← E(x_targ)
3: Initialize the adversarial perturbation δ_encoder ← 0, and the immunized image x_im ← x
4: for n = 1 . . . N do
5:   Compute the embedding of the immunized image: z ← E(x_im)
6:   Compute the mean squared error: l ← ‖z_targ − z‖_2^2
7:   Update the adversarial perturbation: δ_encoder ← δ_encoder + k · sign(∇_{x_im} l)
8:   δ_encoder ← clip(δ_encoder, −ε, ε)
9:   Update the immunized image: x_im ← x_im − δ_encoder
10: end for
11: Return: x_im

Algorithm 2 Diffusion Attack on a Stable Diffusion Model

1: Input: Input image x, target image x_targ, Stable Diffusion model f, perturbation budget ε, step size k, number of steps N.
2: Initialize the adversarial perturbation δ_diffusion ← 0, and the immunized image x_im ← x
3: for n = 1 . . . N do
4:   Generate an image using the diffusion model: x_out ← f(x_im)
5:   Compute the mean squared error: l ← ‖x_targ − x_out‖_2^2
6:   Update the adversarial perturbation: δ_diffusion ← δ_diffusion + k · sign(∇_{x_im} l)
7:   δ_diffusion ← clip(δ_diffusion, −ε, ε)
8:   Update the immunized image: x_im ← x_im − δ_diffusion
9: end for
10: Return: x_im

B Extended Background for Diffusion Models

Overview of the diffusion process. At their heart, diffusion models leverage a statistical concept: the diffusion process [SWM+15; HJA20]. Given a sample x_0 from a distribution of real images q(·), the diffusion process works in two steps: a forward step and a backward step. During the forward step, Gaussian noise is added to the sample x_0 over T time steps, generating increasingly noisier versions x_1, . . . , x_T of the original sample x_0, until the sample is equivalent to an isotropic Gaussian distribution. During the backward step, the goal is to reconstruct the original sample x_0 by iteratively denoising the noised samples x_T, . . . , x_1. The power of diffusion models stems from the ability to learn the backward process using neural networks. This allows us to generate new samples from the distribution q(·) by first generating a random Gaussian sample and then passing it through the "neural" backward step.

Forward process. During the forward step, Gaussian noise is iteratively added to the original sample x_0. The forward process q(x_{1:T} | x_0) is assumed to follow a Markov chain, i.e., the sample at time step t depends only on the sample at the previous time step. Furthermore, the variance added at time step t is controlled by a schedule of variances {β_t}_{t=1}^T.11

    q(x_{1:T} | x_0) = \prod_{t=1}^{T} q(x_t | x_{t-1}); \qquad q(x_t | x_{t-1}) = \mathcal{N}\!\left(x_t;\, \sqrt{1 - \beta_t}\, x_{t-1},\, \beta_t I\right)    (6)
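A short sketch of the forward process in (6), assuming a simple linear variance schedule; the schedule values and tensor shapes are illustrative.

```python
import torch

def forward_diffusion(x0, betas):
    """Iteratively apply q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I), as in Eq. (6)."""
    xs = [x0]
    x = x0
    for beta in betas:
        x = torch.sqrt(1 - beta) * x + torch.sqrt(beta) * torch.randn_like(x)
        xs.append(x)
    return xs   # x_0, x_1, ..., x_T; x_T is close to standard Gaussian for large T

# Example: a linear schedule over T = 1000 steps (illustrative values).
betas = torch.linspace(1e-4, 0.02, 1000)
x0 = torch.randn(1, 3, 64, 64)          # stand-in for a (normalized) image
trajectory = forward_diffusion(x0, betas)
```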

Backward process. At the end of the forward step, the sample x_T looks as if it were sampled from an isotropic Gaussian p(x_T) = N(x_T; 0, I). Starting from this sample, the goal is to recover x_0 by iteratively removing the noise using neural networks. The joint distribution p_θ(x_{0:T}) is referred to as the reverse process and is also assumed to be a Markov chain.

    p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} | x_t); \qquad p_\theta(x_{t-1} | x_t) = \mathcal{N}\!\left(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma_\theta(x_t, t)\right)    (7)

Training a diffusion model. At their heart, diffusion models are trained similarly to variational autoencoders, i.e., by optimizing a variational lower bound. Additional tricks are employed to make the process faster. For an extensive derivation, refer to [Wen21].

    \mathbb{E}_{q(x_0)}\left[-\log p_\theta(x_0)\right] \le \mathbb{E}_{q(x_{0:T})}\left[\log \frac{q(x_{1:T} | x_0)}{p_\theta(x_{0:T})}\right] = \mathcal{L}_{VLB}    (8)

Latent Diffusion Models (LDMs). In this paper, we focus on a specific class of diffusion models, namely LDMs, which were proposed in [RBL+22] as models that apply the diffusion process described above in a latent space instead of the image space. This enables efficient training and inference of diffusion models. To train an LDM, the input image x_0 is first mapped to a latent representation z_0 = E(x_0), where E is an image encoder. This latent representation z_0 is then passed to the diffusion process to obtain a denoised z̃. The generated image x̃ is then obtained by decoding z̃ using a decoder D, i.e., x̃ = D(z̃).

11 The values of a_t and b_t from the main paper correspond to a_t = \sqrt{1 - \beta_t} and b_t = \sqrt{\beta_t}.

C Additional Results
C.1 Additional quantitative results
We presented in Section 4 several metrics to assess the similarity between the images generated with and
without immunization. Here, we report in Table 10 additional metrics to evaluate this: SR-SIM [ZL12],
GMSD [XZM+14], VSI [ZSL14], DSS [BSM+15], and HaarPSI [RBK+18]. Similarly, we indicate for each
metric whether a higher value corresponds to higher similarity (using ↑), or contrariwise (using ↓). We
again observe that applying the encoder attack already decreases the similarity between the generated
images with and without immunization, and applying the diffusion attack further decreases the similarity.

Table 10: Additional similarity metrics for Table 6. Errors denote standard deviation over 60 images.

Method                                  SR-SIM ↑       GMSD ↓         VSI ↑          DSS ↑          HaarPSI ↑
Immunization baseline (Random noise)    0.91 ± 0.04    0.20 ± 0.06    0.94 ± 0.03    0.35 ± 0.18    0.52 ± 0.15
Immunization (Encoder attack)           0.86 ± 0.05    0.26 ± 0.05    0.90 ± 0.03    0.19 ± 0.09    0.35 ± 0.11
Immunization (Diffusion attack)         0.84 ± 0.05    0.27 ± 0.04    0.89 ± 0.03    0.17 ± 0.08    0.31 ± 0.08

C.2 Generating Image Variations using Textual Prompts

Figure 11: Immunization against generating prompt-guided image variations. (Prompts shown: "An airplane flying under the moon", "A black cow on the beach", "A brown cat playing poker", "A bunny eating an apple", "A civilian airplane".)
C.3 Image Editing via Inpainting

Figure 12: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "A man in a wedding", "A man in the gym", "A man in New York City", "A man in a restaurant", "A man playing poker".)
Figure 13: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "A man in a farm", "A man in a restaurant", "A man in a store", "A man preparing dinner", "A man holding a microphone".)
Figure 14: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "A man holding a phone", "A man preparing dinner", "A man drinking hot coffee", "A man playing poker".)
Figure 15: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "A man sitting in a metro", "A man sitting in first class airplane", "A man sitting in the airport", "A man dancing on stage", "A man in a meeting", "A man riding a motorcycle".)
Figure 16: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "Two men ballroom dancing", "Two men cooking in the kitchen", "Two men grilling".)
Figure 17: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "Two men in a hot tub", "Two men in a wedding", "Two men in a wedding on a seafront", "Two men in jail", "Two men in the forest", "Two men on the grass".)
Figure 18: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "Two men in the zoo", "Two men playing guitar", "Two men sneaking into a building", "Two men street fighting".)
Figure 19: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "A man receiving an award", "Two men attending a wedding", "Two men in an airplane", "Two men in a restaurant".)
Figure 20: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "Two men in a restaurant", "Two men in Europe", "Two men in an airplane", "Two men in front of the Eiffel Tower", "Two men riding a motorcycle", "Two men wearing gray shirts in the fog".)
Figure 21: Immunization against image editing via prompt-guided inpainting. (Prompts shown: "Two men wearing green T-shirts", "Two men wearing red shirts in the fog".)
