
Artificial Intelligence Review (2025) 58:242

https://doi.org/10.1007/s10462-025-11248-0

Deep learning model inversion attacks and defenses: a comprehensive survey

Wencheng Yang1 · Song Wang2 · Di Wu1 · Taotao Cai1 · Yanming Zhu3 · Shicheng Wei1 ·
Yiying Zhang4 · Xu Yang5 · Zhaohui Tang1 · Yan Li1

Accepted: 25 April 2025


© The Author(s) 2025

Abstract
The rapid adoption of deep learning in sensitive domains has brought tremendous benefits. However, this widespread adoption has also given rise to serious vulnerabilities, particularly model inversion (MI) attacks, posing a significant threat to the privacy and integrity of personal data. The increasing prevalence of these attacks in applications such as biometrics, healthcare, and finance has created an urgent need to understand their mechanisms, impacts, and defense methods. This survey aims to fill the gap in the literature by providing a structured and in-depth review of MI attacks and defense strategies. Our contributions include a systematic taxonomy of MI attacks, extensive research on attack techniques and defense mechanisms, and a discussion about the challenges and future research directions in this evolving field. By exploring the technical and ethical implications of MI attacks, this survey aims to offer insights into the impact of AI-powered systems on privacy, security, and trust. In conjunction with this survey, we have developed a comprehensive repository to support research on MI attacks and defenses. The repository includes state-of-the-art research papers, datasets, evaluation metrics, and other resources to meet the needs of both novice and experienced researchers interested in MI attacks and defenses, as well as the broader field of AI security and privacy. The repository will be continuously maintained to ensure its relevance and utility. It is accessible at https://github.com/overgter/Deep-Learning-Model-Inversion-Attacks-and-Defenses.

Keywords Deep learning · Model inversion (MI) attacks · Privacy · Security

1 Introduction

As deep learning models are increasingly integrated into sensitive applications, robust privacy and security measures are critical. While the deep learning mechanism provides unprecedented predictive accuracy, it also introduces vulnerabilities related to how models encode and retain information about training data (Rigaki and Garcia 2023; Sen et al. 2024). Complex architectures, such as convolutional neural networks (CNNs) and transformers, are often effective at generalization, but may inadvertently remember details of training data. In privacy-sensitive applications, such memorization poses a significant risk, as indirect inferences about training data can lead to serious consequences.
Among major attacks that threaten deep learning systems (e.g., model extraction, model
inversion (MI), poisoning, and adversarial attacks) (He et al. 2020), the MI attack stands out
due to its ability to extract sensitive information from the training dataset and compromise
user privacy (Gong et al. 2023). MI attacks were first proposed by Fredrikson et al. (2014) and exploit the relationship between input data and learned model parameters to recover
sensitive information. The impact of MI attacks goes beyond violations of personal privacy,
undermining trust in machine/deep learning systems in critical applications such as biomet-
rics, healthcare analytics, and financial modeling. For example, using MI attacks on face
recognition systems, an adversary can approximate face features in the training dataset and
reconstruct the face image, thereby undermining the security of biometric systems (Yang
et al. 2023; Tran et al. 2024; Qiu et al. 2024). In healthcare systems, leaked sensitive patient
data may violate ethical standards and lead to legal consequences (Dao and Nguyen 2024;
Tang et al. 2024; Nguyen et al. 2024). Similarly, in financial systems, private transaction
data or credit scores could be compromised by MI attacks (Milner 2024). In addition to
privacy concerns, these attacks can erode users’ trust in machine/deep learning systems,
as users may lose confidence in the security of their personal or sensitive information. A
visualization example of the MI attack is shown in Fig. 1 (adapted from Nguyen et al. 2023).
From a technical perspective, MI attacks exploit the inherent vulnerability of deep
learning models to overfit or memorize patterns in training data (Titcombe et al. 2021).
Researchers have focused on understanding the factors that exacerbate this vulnerability
and developing robust defenses. There are several factors that affect the success of MI
attacks (Fredrikson et al. 2015; Shokri et al. 2017). First, access to the model: the attacker
needs to access the model’s outputs, architecture, or parameters. In white-box scenarios, the
attacker has full knowledge of the model, including its architecture and parameters, while in
black-box scenarios, only the output can be accessed. Second, model architecture: different
models retain different levels of information about training data. Complex models, such as
deep neural networks (DNNs), are particularly vulnerable to MI attacks because they are
more likely to memorize and store detailed information about their inputs. Third, data corre-
lation: MI attacks typically exploit the correlation between input data and model output. By
understanding how small changes in inputs affect model outputs, an adversary can fine-tune
the reconstruction of original data.

Fig. 1 A visualization example of the model inversion (MI) attack


Defending against MI attacks presents a unique challenge. First, designing a defense strategy requires balancing the utility and predictive power of the model with risk-reducing
privacy protection mechanisms. Second, MI attacks usually require only limited informa-
tion, such as access to model outputs or intermediate features, rather than full access to
model parameters or training data (Han et al. 2023; Fang et al. 2023b). Therefore, prevent-
ing them is non-trivial. In addition, privacy-preserving techniques tend to degrade model
performance, thus requiring a careful trade-off between accuracy and privacy.
MI attacks highlight a critical vulnerability in deep learning models, emphasizing the
need for strong safeguards to protect sensitive information. This survey explores the tax-
onomy of MI attacks and examines the defense strategies proposed to mitigate these threats.

1.1 Existing surveys on MI attacks and defenses

A number of existing surveys on MI attacks and defenses provide valuable insights into the
mechanisms, challenges, and countermeasures associated with this privacy threat. These
surveys are summarized below.
Zhang et al. (2022) found that training samples can be reconstructed from gradients by
techniques known as gradient inversion (GradInv) attacks. These attacks show how attackers
can exploit gradient data to endanger data privacy. The authors categorized GradInv attacks
into two main paradigms (i.e., iteration-based and recursion-based approaches), high-
lighting key techniques for gradient matching and data initialization to optimise recovery.
Dibbo (2023) presented a systematic review of MI attacks, providing a taxonomy to classify
MI attacks based on a variety of methods and features. The taxonomy highlights the unique
nature of MI attacks compared to other privacy threats (e.g., model extraction and mem-
bership inference), positioning MI attacks as attacks with unique complexity and impact.
The author also outlined key defense strategies and identified a number of open questions.
Yang et al. (2023) provided an in-depth study of gradient leakage attacks in federated learn-
ing (FL) and classified these attacks into optimisation-based and analysis-based attacks.
The former considers data reconstruction as an optimisation problem, while the latter solves
the problem through linear equation analysis. To address the limitations of these traditional
approaches, the authors proposed a novel generation-based paradigm that greatly improves
the accuracy and efficiency of reconstruction.
Fang et al. (2024) provided a comprehensive survey of MI attacks and defenses, shed-
ding light on different approaches and features used in DNN applications. Their study sys-
tematically categorizes MI attacks based on data types and target tasks, outlines early MI
techniques in traditional machine/deep learning, and then shifts the focus to sophisticated
modern approaches in DNNs. The authors also provided a defense taxonomy that explores
current efforts to reduce MI risks. Liu et al. (2024) presented a systematic evaluation of
data reconstruction attacks, criticizing previous studies for relying on empirical observa-
tions without sufficient theoretical foundations. The authors introduced an approach that
enables a theoretical assessment of data leakage and establishes bounds on reconstruction
error, especially for two-layer neural networks. This study identifies upper bounds on the
reconstruction error and information theoretic lower bounds, advancing the understanding
of attack and defense dynamics.
Shi et al. (2024) conducted a survey on gradient inversion attacks (GIAs) in FL that
addresses the critical need to expand from the traditional “honest but curious” server model to a threat model involving malicious servers and clients. The authors classified GIAs
according to the roles of these adversaries, demonstrating that traditional defenses are
often ineffective against more aggressive malicious actors. The authors also detailed vari-
ous attack strategies in their classification, specifically how malicious servers and clients
can bypass existing defenses, and highlighted gaps in FL’s ability to deal with such com-
plex threats. The work stresses the role of reconstruction methods, model architectures, and
evaluation metrics in shaping the effectiveness of defenses. This research underscores the
importance of developing stronger defenses against GIAs.
Differences from existing surveys. The main differences between our work and existing
surveys can be summarized as follows. First, although the existing surveys have advanced
the categorization of MI attacks and their impact, these reviews lack a unifying framework
that integrates the latest advances across various paradigms and applications. Second, they
fail to provide a comprehensive summary of emerging challenges and corresponding future
research directions. Third, many of these studies narrowly focus on specific techniques (e.g.,
only about gradient inversion attacks and defenses), leaving critical gaps in understanding
broader MI attacks and defenses.
Our survey aims to address the aforementioned limitations through a comprehensive and
integrated analysis of MI attacks, defenses and other aspects (e.g., applications, data types,
and evaluation metrics). By combining theoretical insights with practical considerations
in different applications, our work seeks to advance research on MI defense and guide the
development of robust privacy protection strategies. While almost all MI-related surveys
cover some common topics such as attacks, defenses, and future research, our work provides
a more in-depth review as it explores additional aspects that are not covered by existing sur-
veys, such as different recommendations regarding future research directions. Moreover, we
create a unified resource repository for studying MI attacks and defenses. This will benefit
both novice and experienced researchers in the field of deep learning-related security and
privacy. A comparison between existing surveys and our work is presented in Table 1.

1.2 Contributions of this work

The main contributions of this research are summarized below.


The first major contribution of this work is the development of a new, structured taxonomy
stemming from diverse techniques proposed in recent years. The new taxonomy provides
a clear, unified framework for understanding and categorizing MI attacks based on key
factors, such as methodology (e.g., gradient-based, generative model-based, and optimisa-
tion-based), data type (e.g., image, audio, text/tabular data), and application domain (e.g.,
biometric recognition, healthcare, finance). This structured organization not only enhances
the understanding of the differences and vulnerabilities of MI attacks in various scenarios,
but also highlights research gaps and lays the groundwork for exploring defense strategies.
Complementing the taxonomy, the comprehensive review systematically analyzes the state
of the art, including gradient inversion, generative model-based, and optimisation-based
attacks. Detailed case studies and comparisons of different strategies provide insight into the
impact, limitations, and practical applications of these strategies. The taxonomy and review
can serve as a valuable resource for researchers and practitioners to guide future research
and defense against MI attacks.

Table 1 A comparison of existing surveys and this work

Surveys             | Gradient inversion attacks | Generative model-based attacks | Different applications | Different data types | Defenses | Evaluation metrics | Datasets | Challenges and future research | Resource repository
Zhang et al. (2022) | Yes | No  | No  | No  | Yes | No  | No  | Yes | No
Dibbo (2023)        | Yes | Yes | Yes | No  | Yes | No  | No  | Yes | No
Yang et al. (2023)  | Yes | No  | No  | No  | No  | No  | No  | Yes | No
Fang et al. (2024)  | Yes | Yes | No  | Yes | Yes | No  | No  | Yes | Yes
Liu et al. (2024)   | Yes | Yes | No  | No  | Yes | No  | No  | No  | No
Shi et al. (2024)   | Yes | No  | No  | No  | Yes | No  | No  | Yes | No
This survey         | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes

In this table, “Yes” indicates that the topic is covered, while “No” means that it is not covered


The second contribution is a more in-depth and extensive review of defense mechanisms
designed to mitigate MI attacks than existing surveys. While this survey investigates and
analyzes current developments and existing approaches, specific ideas are also studied. The main defense mechanisms include feature/gradient perturbation, i.e., concealing or obscuring sensitive information (e.g., feature representations and gradients) to distort the data and thwart data reconstruction; differential privacy, where noise is injected into the training process to protect sensitive data; and cryptographic encryption, which allows training the model on encrypted data without exposing the original inputs. By analyzing these mecha-
nisms, this survey provides a roadmap for strengthening models against MI attacks while
maintaining effectiveness in real-world applications.
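To make the perturbation idea concrete, the following is a minimal sketch (a generic illustration, not a method taken from any specific paper surveyed here) that clips gradients and adds Gaussian noise before they leave the training process, loosely in the spirit of DP-SGD; the clipping bound and noise scale are hypothetical values.

```python
import torch


def perturb_gradients(model, clip_norm=1.0, sigma=0.1):
    """Clip the gradient norm and add Gaussian noise before sharing.

    A minimal sketch of gradient perturbation: the true gradients are distorted
    so that an adversary matching them can only reconstruct noisy data.
    `clip_norm` and `sigma` are illustrative values, not recommendations.
    """
    # Bound the contribution of each update before noising.
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    for p in model.parameters():
        if p.grad is not None:
            # Add zero-mean Gaussian noise scaled to the clipping bound.
            p.grad += torch.randn_like(p.grad) * sigma * clip_norm
```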
The third contribution is to identify open questions and suggest future research direc-
tions. Despite the progress in understanding and mitigating MI attacks, critical challenges
remain. These challenges include developing defenses that can balance privacy protection
with model utility, improving the robustness of deep learning models against increasingly
sophisticated MI attacks, and addressing vulnerabilities in emerging paradigms such as FL
and edge AI systems. This survey highlights these challenges and outlines viable research
directions to stimulate innovative research in defending MI attacks.
The final contribution is the creation of a unified resource repository for studying MI
attacks and defenses. We compile and standardize the name abbreviations of existing MI
attack and defense methods. In cases where the authors do not provide abbreviations, we
generate reasonable abbreviations based on our understanding to ensure clarity and usabil-
ity. These standardized abbreviations help researchers utilize these works in their own
research. In addition, we compile a list of datasets and evaluation metrics commonly used
in MI attack and defense research. To consolidate these resources, we develop a comprehen-
sive repository of state-of-the-art research articles, datasets, evaluation metrics, and other
important resources. This repository is intended to support both novice and experienced
researchers in the study of MI attacks, defenses, and the broader field of AI security and pri-
vacy. It will be continually maintained to ensure its relevance and accessibility. The reposi-
tory is available at https://github.com/overgter/Deep-Learning-Model-Inversion-Attacks-and-Defenses.

1.3 Literature selection methodology

This study employs a systematic method to identify, analyze, and synthesize relevant
research articles on deep learning model inversion attacks and defenses. A thorough search
is conducted across multiple reputable digital libraries, including IEEE Xplore, ACM Digi-
tal Library, SpringerLink, ScienceDirect, arXiv, and top-tier conferences such as CVPR,
NeurIPS, ICML, ECCV, USENIX, and NDSS. The literature search covers foundational
studies as well as latest advancements in the areas of interest. Both journal and conference
papers, as well as preprints and technical reports, are included so as to achieve a complete
coverage and a comprehensive overview.
To retrieve relevant studies, Boolean search queries are formulated with primary key-
words, including “Model Inversion Attack”, “Gradient Inversion Attack”, “Privacy Breach
in Deep Learning”, “Federated Learning Security”, “Privacy-Preserving AI”, “Generative
Model Inversion”, “GAN-based Model Inversion”, and “Data Reconstruction Attack”.
The Population, Concept, and Context framework (Ullah et al. 2023) is applied to refine the search scope. The population consists of deep learning architectures, including CNNs,
RNNs, transformers, and FL models. The concept focuses on techniques and defenses
related to MI attacks, while the context covers AI security, biometric recognition, healthcare
privacy, and cloud-based AI services.
A structured filtering process is implemented using Zotero 7.0 for reference manage-
ment and duplicate removal. The inclusion criteria ensure the selection of peer-reviewed
journal and conference papers on MI attacks and defenses, studies introducing novel attack
techniques such as GAN-based, black-box, and gradient inversion attacks, experimental
research evaluating attack effectiveness and defense robustness, and review papers sum-
marizing up-to-date developments on topics pertaining to deep learning MI attacks and
defenses. The exclusion criteria eliminate studies lacking empirical validation or theoreti-
cal discussions without experimental results, redundant works, and short papers, editorials,
and opinion articles without technical depth. As a result, about 500 research articles are
retrieved from various sources. Following a careful assessment of titles and abstracts and a full-text review, approximately 180 high-quality papers are selected for this
survey. The selected articles are collated into four themes. The first theme is about MI attack
methodologies. The second theme explores defensive countermeasures and examination
methods. The third theme focuses on evaluation metrics and datasets for MI attack research.
The fourth theme addresses emerging challenges and future research directions.

2 Fundamentals of MI attacks

MI attacks pose a significant threat to the privacy and security of deep learning models by
attempting to reconstruct sensitive input data from model parameters, outputs, or intermedi-
ate representations. Understanding the underlying mechanisms of MI attacks is crucial for
designing robust defense strategies. This section provides a comprehensive overview of MI
attacks, beginning with their definition and characteristics, followed by an analysis of how
these attacks exploit different components of deep learning models under various adver-
sarial knowledge scenarios.

2.1 Definition and characteristics

This subsection covers the mathematical formulation of deep learning models, their optimi-
zation process, and MI attacks that extract sensitive data from model parameters, outputs,
or representations.

2.1.1 Deep learning models

As a subset of machine learning, deep learning employs artificial neural networks with
multiple layers to model complex patterns in data. A deep learning model is composed of
interconnected layers of neurons, where each layer applies mathematical transformations to
input data to learn representations that are progressively more abstract (Goodfellow 2016).
The operation of a single layer can be mathematically described as

h^{(l)} = f\left(W^{(l)} \cdot h^{(l-1)} + b^{(l)}\right) \quad (1)


where h^{(l-1)} is the input from layer l-1 (i.e., the previous layer); W^{(l)} is the weight matrix of the l-th layer; b^{(l)} is the bias vector of the l-th layer; f(\cdot) is the activation function (e.g., ReLU, sigmoid); and h^{(l)} is the output of layer l (i.e., the current layer). For a network with multiple layers (e.g., n layers), the final prediction is

\hat{y} = \sigma\left(f_n \circ f_{n-1} \circ \cdots \circ f_1(x)\right) \quad (2)

where the symbol \circ denotes function composition, which means that the output of one function becomes the input to the next function; and \sigma(\cdot) is a decision function (e.g., softmax) for classification.
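As a concrete illustration of Equations (1)-(2), the following minimal sketch composes two affine-plus-activation layers and applies softmax as the decision function; the layer sizes and random weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# A minimal numeric sketch of Equations (1)-(2): each layer applies an affine
# map followed by an activation, and the network output is the composition of
# the layers. All sizes and the random weights are illustrative assumptions.
torch.manual_seed(0)
x = torch.randn(1, 16)                          # input h^(0)

W1, b1 = torch.randn(32, 16), torch.zeros(32)   # layer-1 parameters W^(1), b^(1)
W2, b2 = torch.randn(10, 32), torch.zeros(10)   # layer-2 parameters W^(2), b^(2)

h1 = torch.relu(x @ W1.T + b1)                  # h^(1) = f(W^(1) h^(0) + b^(1))
logits = h1 @ W2.T + b2                         # pre-activation of the final layer
y_hat = F.softmax(logits, dim=1)                # y_hat = sigma(f_2 o f_1 (x))
```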
Model parameters Given a deep learning model, weight (W ) represents the connec-
tion strength between neurons in adjacent layers and is updated iteratively to improve the
model’s predictions. The loss function ℓ quantifies the discrepancy between the model’s
predictions and the actual target values, guiding the optimization process. Gradient ∇W
can be expressed by

\nabla W = \frac{\partial \ell}{\partial W} \quad (3)

where \nabla W is the gradient of the loss function \ell with respect to the weight matrix W, indicating the direction and magnitude of change needed to minimize \ell (Zhu et al. 2019). The term \frac{\partial \ell}{\partial W} in Equation (3) denotes this partial derivative, capturing how small changes in W impact the loss.
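The gradient in Equation (3) is exactly what automatic differentiation frameworks compute during backpropagation. The following minimal sketch, with an illustrative two-layer classifier, shows how a gradient is obtained for every parameter.

```python
import torch
import torch.nn as nn

# A minimal sketch of Equation (3): autograd yields d(loss)/dW for every
# parameter. The two-layer classifier and its sizes are illustrative.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
x, y = torch.randn(4, 16), torch.randint(0, 10, (4,))

loss = nn.functional.cross_entropy(model(x), y)   # loss ell
loss.backward()                                   # populates p.grad = d(ell)/dp

for name, p in model.named_parameters():
    print(name, p.grad.shape)                     # one gradient per weight/bias
```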

2.1.2 MI attacks on deep learning models

MI attacks exploit a deep learning model’s parameters (e.g., gradients (Zhu et al. 2019; Zhao
et al. 2020)), output (e.g., confidence scores (Han et al. 2023)) or intermediate representa-
tions (Fang et al. 2023b) to reconstruct sensitive information about its inputs (e.g., training
data). Depending on the specific component being targeted, the optimisation process for MI
attacks varies:
MI attacks using parameters (e.g., gradients) The process involves solving an optimisa-
tion problem, typically as follows (quoted from Zhu et al. 2019):
(x'^{*}, y'^{*}) = \arg\min_{x', y'} \left\| \nabla W' - \nabla W \right\|^{2} = \arg\min_{x', y'} \left\| \frac{\partial \ell\left(F(x', W), y'\right)}{\partial W} - \nabla W \right\|^{2} \quad (4)

where x′∗ and y′∗ are the recovered (dummy) input and label, which are iteratively opti-
mised to minimize the difference between the dummy gradients and the real gradients; ∇W
represents the real gradients of the loss function with respect to the model weights W , com-
puted using the true training data (x, y); ∇W ′ represents the dummy gradients, computed
using the dummy data (x′ , y′ ); and ∥ · ∥2 is the objective function that aims to minimize the
squared difference (L2 norm) between the dummy gradients ∇W ′ and the real gradients
∇W . By minimizing this difference, the dummy data becomes increasingly similar to the
true training data.
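The following is a minimal sketch of the gradient-matching optimisation in Equation (4), in the style of DLG (Zhu et al. 2019); the tiny fully connected classifier, single-sample batch, soft dummy label, and Adam optimiser are illustrative assumptions rather than the original experimental setup.

```python
import torch
import torch.nn as nn

# A minimal sketch of Equation (4): optimise dummy data (x', y') so that the
# gradients they induce match the observed real gradients.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
params = list(model.parameters())

# "Real" gradients nabla_W that the attacker observes (e.g., shared in FL).
x, y = torch.randn(1, 16), torch.tensor([3])
real_grads = torch.autograd.grad(nn.functional.cross_entropy(model(x), y), params)

# Dummy input x' and soft dummy label y' to be optimised.
x_dummy = torch.randn(1, 16, requires_grad=True)
y_dummy = torch.randn(1, 10, requires_grad=True)
optimizer = torch.optim.Adam([x_dummy, y_dummy], lr=0.1)

for _ in range(200):
    optimizer.zero_grad()
    # ell(F(x', W), y') with a soft label.
    dummy_loss = torch.sum(
        -nn.functional.log_softmax(model(x_dummy), dim=1)
        * nn.functional.softmax(y_dummy, dim=1))
    dummy_grads = torch.autograd.grad(dummy_loss, params, create_graph=True)
    # || nabla_W' - nabla_W ||^2, the objective of Equation (4).
    grad_diff = sum(((dg - rg) ** 2).sum() for dg, rg in zip(dummy_grads, real_grads))
    grad_diff.backward()
    optimizer.step()
```

In published DLG-style attacks the optimisation is typically run with L-BFGS on image inputs; the vector inputs and Adam optimiser above are used only for brevity.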


MI attacks using outputs or intermediate representations These attacks rely on extracting meaningful information directly from confidence scores or representations of intermediate layers, often without requiring explicit optimisation (Han et al. 2023; Fang et al. 2023b). For instance, intermediate representations may be analyzed to reconstruct input features without the iterative process used for the MI attacks using parameters.
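As an illustration of this non-iterative style of attack, the sketch below trains an inversion decoder on auxiliary data so that it maps observed intermediate features back to inputs; the split point, network sizes, and random auxiliary data are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# A minimal sketch of inverting intermediate representations: the attacker
# trains a decoder on auxiliary data so that decoder(features(x)) ~= x, then
# applies it to observed features of private inputs. The split point, layer
# sizes, and random auxiliary data below are illustrative assumptions.
feature_extractor = nn.Sequential(nn.Linear(784, 128), nn.ReLU())   # observed layers
decoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 784))
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)

aux_data = [torch.rand(32, 784) for _ in range(100)]   # stand-in auxiliary dataset

for x_aux in aux_data:
    with torch.no_grad():
        feats = feature_extractor(x_aux)        # what the attacker can observe
    x_rec = decoder(feats)                      # reconstruction attempt
    loss = nn.functional.mse_loss(x_rec, x_aux)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At attack time, decoder(observed_features) approximates the private input.
```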

2.2 MI attacks under different model knowledge scenarios

Under different model knowledge scenarios, MI attacks may be carried out in different
ways; that is, access to the target model may be white-box, black-box or gray-box for adver-
saries (Pengcheng et al. 2018; Dibbo 2023).
White-box access. The adversary has full access to the model’s architecture and train-
ing algorithm. The adversary also has access to parameters (e.g., weights) or middle layer
activations.
Black-box access. The adversary interacts with the model only through the application
programming interface (API), providing input queries and observing corresponding out-
puts, such as prediction labels, class probabilities (softmax outputs), and logits (unnormal-
ized predictions). The adversary does not have direct access to the model’s parameters.
Gray-box access. It is also considered a semi-white-box scenario (Khosravy et al. 2021),
in which the adversary has partial access, such as access to the architecture (e.g., model
structure), but not access to the parameters or training details. In some cases, the adversary
may know specific layers or output formats.

3 Taxonomy of MI attacks

MI attacks have become a major privacy threat that utilizes various methods to reconstruct
sensitive data from machine/deep learning models (Pang et al. 2024). In this section, MI
attacks are classified into three strategies: gradient inversion attacks, generative model-
based attacks, and optimisation-based attacks, each of which has its own operational mecha-
nisms and applications. The classification of MI attacks is illustrated in Fig. 2. By exploring
the unique features, strengths, and limitations of these strategies, we provide this structured
taxonomy to facilitate a comprehensive understanding of the evolving MI attacks.

3.1 Gradient inversion attacks

FL (federated learning) is a decentralized approach where model training occurs across mul-
tiple devices, exchanging only model parameters instead of raw data to enhance privacy. In
contrast, centralized learning (CL) collects all data on a central server for training purposes,
offering efficiency but increasing privacy and security risks. The differences between FL
and CL are demonstrated in Fig. 3 (adapted from Dibbo 2023). FL is still vulnerable to gra-
dient inversion attacks, which aim to reconstruct private data from shared gradients during
training. In FL environments, attackers can exploit the gradients exchanged between clients
and the central server to recover sensitive input data, undermining the privacy benefits of
decentralization.
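The sketch below illustrates, under simplified assumptions, what a single federated round exposes: each client shares only gradients computed on its local data, yet these shared gradients are precisely the quantities that gradient inversion attacks attempt to invert.

```python
import torch
import torch.nn as nn

# A minimal sketch of one federated round. The linear model, synthetic client
# data, and single averaging step are simplified assumptions.
global_model = nn.Linear(16, 10)
params = list(global_model.parameters())
client_data = [(torch.randn(8, 16), torch.randint(0, 10, (8,))) for _ in range(3)]

client_grads = []
for x, y in client_data:
    loss = nn.functional.cross_entropy(global_model(x), y)
    grads = torch.autograd.grad(loss, params)
    client_grads.append(grads)              # this is what leaves each device

# Server averages the shared gradients and applies one update step.
with torch.no_grad():
    for i, p in enumerate(params):
        p -= 0.1 * torch.stack([g[i] for g in client_grads]).mean(dim=0)
```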


Fig. 2 The structured taxonomy of the MI attacks in this survey

Fig. 3 Illustration of the differences between centralized learning (left) and federated learning (right)

Geiping et al. (2020) conducted an in-depth investigation into the privacy risks in FL
by analyzing the security of sharing parameter gradients. Contrary to the belief that gradi-
ent sharing is a privacy-preserving measure, the authors demonstrated that high-resolution
images could be reconstructed from parameter gradients, endangering privacy. The authors
gave a detailed analysis of how architectural choices and parameters could influence the
reconstruction of input images. Although differential privacy and gradient compression
were previously considered strong defenses against gradient inversion attacks, Wu et al.
(2023) demonstrated that a simple adaptive method, leveraging a model trained on auxiliary data, could successfully invert gradients and compromise the privacy of both vision and lan-
guage tasks, even in the presence of these defenses, showing that existing countermeasures
may have underestimated the privacy risks in FL.
DLG and iDLG. There are two basic gradient inversion attacks, called DLG (Zhu et al.
2019) and iDLG (Zhao et al. 2020), which aim to reconstruct (steal) an FL client’s local
data instances using the communicated ∇W gradients. The attacker generates a pair of
dummy data x′ and dummy labels y′ , which are used to generate dummy gradients (or
weight update) ∇W ′ . Then by optimising the dummy gradients ∇W ′ to be close to the
real gradients ∇W , the dummy data x′ will get close to the real data x. A key difference
between DLG and its improved version iDLG lies in the way they extract the ground truth
labels (Palihawadana et al. 2023).
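The label-extraction idea behind iDLG can be illustrated with a simplified observation: with cross-entropy loss and a single sample, the gradient of the final layer's bias equals softmax(z) minus the one-hot label, and is therefore negative only at the ground-truth class. The sketch below uses this bias-gradient shortcut, a simplification of the sign-based rule iDLG applies to the last layer's gradients; the tiny classifier is an illustrative assumption.

```python
import torch
import torch.nn as nn

# A simplified illustration of analytic label recovery for a single-sample
# update: d(loss)/d(b_i) = softmax(z)_i - 1{i = y}, negative only at i = y.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
x, y = torch.randn(1, 16), torch.tensor([7])

nn.functional.cross_entropy(model(x), y).backward()

last_bias_grad = model[2].bias.grad            # gradient of the final layer's bias
inferred_label = int(torch.argmin(last_bias_grad))
print(inferred_label == int(y))                # True: the only negative entry marks y
```

Recovering the label analytically in this way removes it from the optimisation, which is what makes iDLG converge more reliably than DLG.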
The approximate gradient inversion attack (AGIC). Xu et al. (2022) proposed AGIC,
targeting FL models to reconstruct sensitive data through gradient or model updates. Unlike
traditional attacks that focus solely on single mini-batch updates, AGIC introduces a more
realistic scenario by considering multiple updates across several epochs. This approach esti-
mates gradient updates to avoid expensive simulations and enhances reconstruction qual-
ity by assigning increasing weights to layers in the neural network structure. The authors
demonstrated AGIC’s effectiveness through evaluations on CIFAR-10, CIFAR-100, and
ImageNet datasets. The method achieved as much as a 50% improvement in peak signal-to-
noise ratio (PSNR), while operating five times faster than earlier simulation-based attacks.
Refine gradient comparison and input regularization (RGCIR). Developed by Luo et al.
(2022), RGCIR is a more practical and effective method that addresses the limitations of
existing gradient inversion techniques by overcoming prior constraints, such as dependency
on batch normalization or limited applicability to small datasets. The authors utilized cosine
similarity to refine gradient comparison and input regularization to improve image recon-
struction fidelity. Additionally, the authors employed total variation denoising to enhance
the smoothness of reconstructed images. Experimental evaluations demonstrate that RGCIR
successfully recovers high-fidelity training data, even from large datasets like ImageNet,
marking a significant advancement in the field.
Logit maximization + model augmentation (LOMMA). Nguyen et al. (2023) re-exam-
ined current MI techniques, uncovering two main issues that limit their performance: sub-
optimal optimisation objectives and the problem of MI overfitting. The authors proposed
LOMMA with a revised optimisation objective that significantly improves the performance
of MI attacks, delivering notable enhancements across state-of-the-art MI algorithms. Addi-
tionally, they introduced a segmentation-adaptive model augmentation strategy to address
MI overfitting, a common issue where reconstructed images fail to accurately capture the
semantics of the training data.
An external gradient inversion attack (EGIA). Liang et al. (2023) introduced EGIA, which operates in a gray-box setting and targets publicly shared gradients transmitted through inter-
mediate nodes in FL systems. Their work shows that even if both the client and the server
are honest and fully trustworthy, an external adversary can still reconstruct private inputs
by intercepting the transmitted gradients. The authors conducted extensive experiments on
multiple real-world datasets and verified the effectiveness of EGIA. This work highlights
the evolving threats in FL, where privacy vulnerabilities persist even in seemingly secure
settings.


Stepwise gradient inversion (SGI). Ye et al. (2024) proposed SGI, a two-step approach
that significantly improves the quality of image reconstruction. The authors first modeled
the coefficient of variation (CV) of features, and then applied an evolutionary algorithm to
accurately recover labels, followed by a stepwise gradient inversion attack to optimise the
convergence of attack results. This method successfully recovers high-resolution images
from complex models and large batch sizes, revealing the inherent vulnerabilities of dis-
tributed learning.
Generative gradient inversion (GGI). Zhu et al. (2024a) introduced GGI to address the
limitations of traditional gradient inversion methods, which rely on gradient matching to
reconstruct virtual data but are often bogged down by high-dimensional search spaces.
Instead of directly optimising dummy images, GGI leverages low-dimensional latent vec-
tors generated through a pre-trained generator, significantly reducing the complexity of the
search process. Compared with existing methods, this approach not only improves the effi-
ciency of image reconstruction, but also enhances the quality of restored images.
Gradient inversion via neural architecture search (GI-NAS). Proposed by Yu et al. (2024), GI-NAS overcomes the shortfall of existing gradient inversion approaches, which
rely on explicit prior knowledge, such as pre-trained generative models. Unlike traditional
methods that use fixed architectures, GI-NAS adaptively searches for neural architectures
to exploit implicit prior knowledge, enhancing the adaptability and generalization of attacks
in various settings. Extensive experiments showed that GI-NAS outperforms many exist-
ing methods, especially under challenging conditions such as high-resolution images, large
batch sizes, and advanced defense mechanisms.

3.2 Generative model-based attacks

With the emergence of Generative Adversarial Networks (GANs) as highly effective gen-
erative frameworks, recent MI attacks exploit the generative capabilities of GANs to recon-
struct sensitive input data from the outputs or intermediate features of a trained machine
learning model, even in black-box scenarios. In these attacks, adversaries leverage the
learned representations of a model to recover private information, such as facial images, by
generating high-quality approximations of the original data. By employing an adversarial
process, in which a generator network endeavors to recreate input samples that align with the model’s internal representations while a discriminator network assesses the realism of these reconstructions, GAN-based MI attacks can effectively defeat privacy protections, thereby posing significant risks to data confidentiality (Pang et al. 2024).
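The core loop shared by many GAN-based MI attacks can be sketched as follows: a pre-trained generator is frozen and its latent code is optimised so that the target classifier assigns high confidence to the attacked class. The generator, target model, and simple prior term below are illustrative stand-ins for the components used by GMI-style attacks (which, for example, obtain the prior from a discriminator rather than an L2 penalty).

```python
import torch
import torch.nn as nn

# A minimal sketch of the generative MI loop: freeze a (pre-trained) generator
# G and search its latent space for a code z whose image G(z) the target model
# classifies as the attacked identity. All components are illustrative stand-ins.
latent_dim, num_classes, target_class = 64, 10, 3
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))
target_model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, num_classes))

z = torch.randn(1, latent_dim, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.05)

for _ in range(300):
    optimizer.zero_grad()
    x_fake = G(z)                                   # candidate reconstruction
    identity_loss = nn.functional.cross_entropy(
        target_model(x_fake), torch.tensor([target_class]))
    prior_loss = 0.01 * z.pow(2).mean()             # keep z in a plausible region
    (identity_loss + prior_loss).backward()
    optimizer.step()

reconstruction = G(z).detach()   # approximates private data of the target class
```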
Generative model inversion (GMI) and its improved version variational model inversion
(VMI). Zhang et al. (2020) presented GMI that effectively utilizes DNNs to reconstruct sen-
sitive training data. By leveraging GANs to learn informative priors from public datasets,
the GMI attack is an improvement over previous model inversion methods, which struggle
with complex architectures and high-dimensional data spaces. GMI substantially increases
reconstruction quality, especially in tasks like face recognition, where it achieves high rec-
ognition accuracy. The key innovation is the use of a generative model to guide the inver-
sion process, which offers a more realistic and semantically rich reconstruction of privacy
data than earlier approaches such as basic MI attacks that rely on optimisation techniques.
Wang et al. (2021) introduced VMI, a variational framework that formulates the MI attack
as a variational inference (VI) problem. In this framework, a generative model (usually a GAN) is trained on an auxiliary dataset so that the attack can efficiently learn a prior dis-
tribution similar to the target data. This approach is superior to previous MI techniques,
especially in terms of restoring more realistic and diverse samples while retaining a high
level of target accuracy. By optimising a variational target containing a Kullback–Leibler
divergence term, the attack allows for control over the trade-off between realism and accu-
racy, resulting in a more flexible and efficient reconstruction of sensitive data, such as faces
or medical images. The VMI approach is built on GAN-based MI techniques (e.g., Zhang
et al.’s GMI attack (Zhang et al. 2020)), but expands the functionality of these techniques
by integrating probabilistic models and exploiting normalizing flows in the latent space.
KEDMI. Proposed by Chen et al. (2021), KEDMI is a novel GAN optimised for specific
inversion tasks. The core of KEDMI is training a discriminator that not only differentiates
between real and generated samples, but also discriminates the soft labels provided by the
target model. This approach enables the generator to retain image statistics that are more
relevant to the class inference of the target model, which are also most likely to be present
in the unknown private training data. In addition, the authors introduced a distribution recovery method that does not recover individual data points, but rather restores the distri-
bution of training samples corresponding to a given label. Such a many-to-one assumption
is more realistic, since a classifier naturally correlates to multiple training samples. The
experiments showed that this knowledge-rich distribution inversion attack method signifi-
cantly improves the success of the attack and demonstrates its validity across a wide variety
of datasets and network architectures. This work strengthens the performance of MI attacks
and offers new ideas about improving GANs for specific attack tasks.
Pseudo label-guided MI (PLG-MI). Yuan et al. (2023) designed the PLG-MI attack,
which uses a conditional GAN (cGAN) to enhance the effectiveness of the MI attack by
increasing visual quality and the success of reconstructed data. A notable novelty of PLG-
MI is the use of a top-n selection scheme to generate pseudo-labels from public data, decou-
pling the way the cGAN is trained from different classes in the search space, and reducing
the scope of the optimisation problem. In addition, the authors introduced the max-margin
loss, which further improves the search process by focusing on the subspace of specific tar-
get categories. Extensive experiments showed that PLG-MI performs better than previous
GAN-based attacks with significant changes in distribution, enabling higher-fidelity recon-
struction and improved attack success across a variety of datasets and models.
Intermediate features enhanced generative model inversion (IF-GMI). Developed by Qiu
et al. (2024), IF-GMI seeks to enhance the performance of MI attacks by utilising intermedi-
ate features of the GAN architecture, which improves over existing MI attack approaches
that depend only on latent space representations in GANs. By breaking down the GAN
architecture and not only optimising the latent code but also the intermediate features, IF-
GMI significantly increases the representational power of the model, resulting in a more
realistic and semantically accurate reconstruction of private data. The authors implemented
L1 ball constraints during the optimisation to prevent the generation of unrealistic images,
which allowed the method to outperform existing MI attacks on multiple benchmarks, espe-
cially in out-of-distribution (OOD) scenarios.
MIRROR. An et al. (2022) proposed MIRROR, a leading-edge methodology in the area
of GAN-based MI attacks, demonstrating remarkable progress in the fidelity of reversed
samples. This work is based on the distinctive architecture of StyleGAN, which helps to break down input styles at different granularities. This enables the model to learn and regen-
erate these styles individually during training, a process that is especially beneficial for MI
attacks. Given the target labels, MIRROR takes a StyleGAN trained on public data and
applies either gradient descent or genetic search algorithms together with distribution-based
clipping to locate the optimal parameterisation of the styles. The samples thus generated not
only conform to the target labelling criteria set by the subject model, but are also recogniz-
able by humans. It is shown that MIRROR’s reversed samples exhibit higher fidelity than
existing techniques, advancing MI attacks in the privacy threat domain.
Plug and play attacks (PPA). Early studies (e.g., GMI Zhang et al. 2020, VMI Wang
et al. 2021) demonstrated the utility of GANs as image priors tailored to specific models,
but these methods are found to be resource-intensive, inflexible, and vulnerable to dataset
distributional shifts. Struppek et al. (2022) devised PPA, an innovative approach that decou-
ples the dependency between the target model and the image prior, allowing for the use of
a single pre-trained GAN to attack multiple targets with minimal adjustments. This study
revealed that these attacks remain robust under strong distributional shifts, while producing
high-quality images that expose sensitive class features.
Boundary-repelling model inversion (BREP-MI). Kahla et al. (2022) proposed the
BREP-MI algorithm, demonstrating the viability of MI using only the labelling information
of model predictions. In this context, the attacker only has access to the output labels of the
model and not any confidence scores or soft labels, thus making it much more challenging.
The key novelty of BREP-MI is its boundary exclusion mechanism, whereby the predictive
labels of the model are assessed in a spherical region, guiding the attacker to the direction of
the centroid of the estimated target category. The BREP-MI approach proved to be capable
of high-fidelity reconstruction of private training data (e.g., face images) even in the absence
of detailed model knowledge. The authors compared BREP-MI with existing white-box and
black-box MI attacks and found that BREP-MI performs better than black-box methods and
is close to the effectiveness of white-box attacks, but requires much less model information.
Coarse-to-fine model inversion (C2FMI) attacks. As the scope of the threat of MI
attacks expands, black-box scenarios in which attackers have limited access to models have
received much attention. Ye et al. (2023) introduced C2FMI, a two-phase attack aimed at
increasing the effectiveness of MI attacks under black-box situations. The first phase of
C2FMI involves an inversion network that ensures that the reconstructed image (known
as the attack image) is located near the stream shape of the training data. In the second
phase, the attack uses a black-box oriented tactic to refine these images so that they closely
resemble the original training data. Notably, C2FMI even outperforms some white-box
MI attacks, which manifests the validity of its two-phase design. This work also gave a
robustness analysis on assessing the consistency of C2FMI in comparison with existing MI
attacks, offering insights into the robustness of the attack. Moreover, the authors explored
the potential defenses against C2FMI.
RLBMI. Han et al. (2023) presented a Reinforcement Learning (RL)-based method for
black-box MI attacks to address the limitations of GANs. The authors designed the search
for private data in the latent space as a Markov decision process, using RL to guide the
exploration of this space and rewarding the confidence score of the generated images. This
new method enables effective navigation through the potential space and greatly increases
attack performance compared to existing black-box GAN-based attacks. Experiments per-
formed on a variety of datasets and models demonstrate that this approach delivers strong performance, and that it successfully reconstructs private training data in black-box scenar-
ios with fewer queries and higher accuracy than traditional black-box-based GAN methods.
LOKT. Nguyen et al. (2024) investigated the scenario of a challenging label-only MI
attack, where the adversary only gets access to the predictive labels of the model with no
additional confidence scores or model information. To tackle the issue, the authors proposed
a promising technique named LOKT, which facilitates the knowledge transfer from the
opaque target model to the agent model. This approach exploits generative modelling tech-
niques, particularly through using a target model-assisted conditional generative adversarial
network, to achieve effective knowledge transfer and advanced white-box MI attacks in a
previously restricted setting. The experimental results demonstrated that LOKT beats exist-
ing label-only MI attacks by over 15% on various benchmarks, while providing optimised
query budgets. This research highlights that even the least important information (e.g., hard
labels) can be exploited to reconstruct sensitive training data.
SecretGen. Yuan et al. (2022) developed SecretGen, a novel approach to recovering pri-
vate information from pre-trained models using the generative power of GANs. The work is
particularly relevant to GAN-based MI attacks as it solves the problem of extracting sensi-
tive data in the absence of ground truth tags. SecretGen is composed of a conditional GAN
that acts as a generative backbone, a pseudo-label predictor, and a latent vector selector. These
components work together to produce realistic images that may resemble private training
data for the target model. SecretGen works efficiently in both white- and black-box set-
tings, proving its usefulness in real-world scenarios. The extensive experiments showed that
SecretGen can achieve performance comparable to or better than existing methods, even if
the latter can access ground truth labels. In addition, the authors provided a comprehensive
analysis of SecretGen’s performance and its resistance to sanitised defenses, shedding light
on the importance of privacy-preserving practices in the era of transfer learning.
Gradient inversion over feature domains (GIFD). Fang et al. (2023b) presented GIFD to
augment privacy attacks in FL systems. In contrast to traditional gradient inversion attacks
that rely on pre-trained GANs running in the latent space, limiting their expressiveness and
generality, GIFD operates by breaking down the GAN model and performing optimisations
in the feature domain in the middle layer. This layered approach enables more accurate
gradient inversion, refining the optimisation progressively from the latent space to feature
layers that are closer to the output image. Additionally, the authors proposed an l1-sphere
constraint to mitigate the risk of generating unrealistic images, and extended the method to
an OOD setting, which addresses the mismatch between GANs’ training data and the FL
task. The experiments showed that GIFD attains superior pixel-level reconstruction, while
being effective under various defense strategies and batch sizes, rendering it a robust tool
for evaluating privacy vulnerabilities in FL systems.
Dynamic memory model inversion attacks (DMMIA). Built on the concept of leveraging
historical knowledge, Qi et al. (2023) designed DMMIA to enhance attack performance.
DMMIA employs two prototype-based representations: intraclass multicentre representa-
tion (IMR) and interclass discriminative representation (IDR). The IMR captures target-
related concepts through multiple learnable prototypes, and the IDR describes memory
samples as prototypes to extend privacy-relevant information. The use of these prototypes
allows DMMIA to generate more varied and discriminative outcomes than existing MI
attack methods, as evidenced by DMMIA’s excellent performance on various benchmark
datasets.


Patch-MI. Inspired by puzzle assembly, Jang et al. (2023) proposed Patch-MI to rein-
force MI attacks through patch-based reconstruction techniques. Traditional generative MI
attacks typically depend on auxiliary datasets, and the success depends on the similarity
among these datasets and the target dataset. When the distributions are different, these meth-
ods often yield impractical reconstructions. To resolve the issue, Patch-MI adopts a patch-
based discriminator in a GAN-like framework to reconstruct images even when auxiliary
datasets are dissimilar. Patch-MI also incorporates random transform blocks to increase the
ability to generalise reconstructed images, which eventually improves the accuracy of the
target classifiers. Patch-MI outperforms existing MI methods while preserving comparable
statistical quality, marking a big step forward in the field of MI attacks.
SIA-GAN. Challenging the presumed security of scrambling methods, Madono et al.
(2021) proposed the SIA-GAN model by mapping scrambled images back to their original
form through learning. The findings of the work indicate that certain transformations (e.g.,
block shuffling) hinder reconstruction, but the security of scrambling techniques has not
been adequately investigated.
FedInverse: a GAN-based framework for MI attack assessment. Wu et al. (2024) pre-
sented FedInverse to evaluate the risk of MI attacks in FL systems, in which an adversary
who masquerades as a benign participant can use a shared global model to reconstruct other
participants’ privacy data. In spite of continuous advances in defense mechanisms, FedIn-
verse reveals that current defenses are still ineffective against sophisticated MI attackers.
The authors optimised their approach using the Hilbert-Schmidt independence criterion to
increase the diversity of MI attacks generated. FedInverse is tested with three types of MI
attacks (i.e., GMI, KEDMI, and VMI) and demonstrates its effectiveness in reconstructing
private data from FL participants. It highlights the necessity of assessing the risk of privacy
breaches in FL.

3.3 Optimisation-based attacks

In MI attacks, optimisation-based methods refine sensitive data reconstruction by iteratively minimizing loss functions so that the models’ outputs or internal representations align with
specific criteria. These methods are highly versatile and suitable for a variety of scenarios,
such as white-box attacks (where model gradients are accessible) and black-box attacks
(where only predictions are used). The reconstruction is guided by well-designed loss func-
tions, including gradient matching loss (minimizing the difference between the observed
gradient and the gradient computed from the reconstructed inputs) and output matching loss
(minimizing the discrepancy between the reconstructed inputs and the model outputs from
the target inputs) (Guo et al. 2024). Utilizing an optimisation algorithm such as gradient
descent or stochastic gradient descent, these methods iteratively adjust the reconstructed
inputs to make high-fidelity data recovery (Zhu et al. 2019; Fang et al. 2024; Guo et al.
2024).
Gradient matching. An attacker can utilize the gradient updates shared during the FL
process to reconstruct individual training data. To be specific, the attacker minimizes the
difference between the observed gradient and the gradient computed from the reconstructed
input. Most of the attacks in the gradient inversion category, such as DLG (Zhu et al. 2019),
iDLG (Zhao et al. 2020), and LOMMA (Nguyen et al. 2023), fall into this category.


Output matching. An attacker reconstructs inputs that generate outputs similar to the
target model. The data is reconstructed through matching the confidence scores of the tar-
get model (Guo et al. 2024). For instance, the study in Fredrikson et al. (2014) shows that
adversarial access to linear classifiers exposes sensitive genomic information in personal-
ized medicine. Based on this notion, the authors introduced a novel MI attack that utilizes
confidence values output by machine learning models to make these attacks applicable to a
variety of settings.
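A minimal sketch of output matching is given below: a candidate input is optimised directly so that the target model's confidence for the attacked class is maximised, loosely in the spirit of confidence-based MI attacks; the model, iteration count, and L2 regulariser are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A minimal sketch of output matching: optimise a candidate input so that the
# target model's confidence for the attacked class is maximised.
target_model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
target_class = torch.tensor([5])

x_rec = torch.zeros(1, 784, requires_grad=True)
optimizer = torch.optim.SGD([x_rec], lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    confidence_loss = nn.functional.cross_entropy(target_model(x_rec), target_class)
    loss = confidence_loss + 1e-3 * x_rec.pow(2).sum()   # keep the input bounded
    loss.backward()
    optimizer.step()
```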

3.4 Miscellaneous MI attacks

The following MI attacks do not fit into those well-defined categories in Sects. 3.1–3.3, and
are therefore included in this section.
DeepInversion. Yin et al. (2020) introduced DeepInversion to synthesize high-quality
images from pre-trained neural networks that do not need access to the original training
data. Unlike traditional MI techniques which may require auxiliary information or shal-
low models, DeepInversion uses internal feature statistics stored in the batch normalization
layer to generate realistic images. DeepInversion is well suited for tasks such as data-free
knowledge refinement, pruning, and continuous learning. DeepInversion’s ability to syn-
thesize images with high levels of fidelity and contextual accuracy sets it apart from earlier
techniques such as the MI attacks of Fredrikson et al. (2015), which focus on reconstructing
base class images via gradient descent methods. While GANs are widely used for generative
modeling, DeepInversion provides an efficient alternative, making it an important contribu-
tor to privacy-relevant inversion attacks and knowledge transfer scenarios.
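The batch-norm statistics matching at the heart of DeepInversion can be sketched as follows: synthesized images are optimised so that their per-layer feature means and variances match the running statistics stored in the trained model's BN layers, alongside a classification loss. The small CNN, image size, and loss weights below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A minimal sketch of BN-statistics matching: forward hooks record how far the
# synthesized batch's feature statistics deviate from each BN layer's stored
# running statistics, and that deviation is minimised together with the loss.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
).eval()

bn_losses = []
def bn_hook(module, inputs, _output):
    feat = inputs[0]
    mean = feat.mean(dim=[0, 2, 3])
    var = feat.var(dim=[0, 2, 3], unbiased=False)
    bn_losses.append(((mean - module.running_mean) ** 2).sum()
                     + ((var - module.running_var) ** 2).sum())

for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.register_forward_hook(bn_hook)

x_syn = torch.randn(4, 3, 32, 32, requires_grad=True)     # images being synthesized
optimizer = torch.optim.Adam([x_syn], lr=0.05)
targets = torch.randint(0, 10, (4,))

for _ in range(100):
    optimizer.zero_grad()
    bn_losses.clear()
    ce = nn.functional.cross_entropy(model(x_syn), targets)
    loss = ce + 0.1 * sum(bn_losses)          # classification + BN-statistics prior
    loss.backward()
    optimizer.step()
```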
RL-GraphMI. Zhang et al. (2022) studied MI attacks against graph neural net-
works (GNNs), highlighting the privacy risks inherent in graph data because of its relational
structure. The authors emphasized that while MI attacks are effective in lattice-like domains,
their application to non-lattice structures (e.g., graphs) tends to produce sub-optimal results
due to the uniqueness of graphs. To tackle this problem, the authors introduced GraphMI, a
novel approach that incorporates a projected gradient module to manage graph edge discret-
ization and maintain feature sparsity, as well as a graph autoencoder to effectively exploit
graph topology and attributes. Furthermore, the authors introduced RL-GraphMI for hard-
labeled black-box settings, which uses gradient estimation and reinforcement learning to
assess the risk of MI associated with edge effects. Through extensive experiments on public
datasets, the authors validated RL-GraphMI, and also assessed two existing defense methods, differentially private training and graph preprocessing, finding that they are not sufficient to defend against privacy attacks.
Explainable artificial intelligence (XAI). MI attacks pose substantial privacy risks, particularly when model predictions are augmented with XAI interpretations. Zhao et al.
(2021) investigated these risks and identified a wide variety of attack architectures that
leverage model interpretation for the reconstruction of private image data, proving that such
interpretation may unintentionally expose sensitive information. The authors developed a multi-modal transposed CNN architecture that exhibits significantly higher inversion
performance than methods that rely solely on target model predictions. The results indicate
that the spatial knowledge embedded in image interpretation can be used to improve attack
efficiency. Moreover, the authors emphasized that even with an unexplained target model,
inversion performance can be improved by an attention-shifting approach that reconstructs
the target image without the target model by first inverting the interpretation from the proxy
model. This research shows that while XAI helps user understanding, it increases vulner-
ability to privacy attacks, and thus new privacy-preserving techniques are required to bal-
ance the dual requirements of AI interpretability and privacy.
Supervised model inversion (SMI). Tian et al. (2023) explored the usefulness of class
information in MI attacks and proposed SMI. This approach decreases the reliance on a
priori target information by learning pixel-level and data-to-category features from the vic-
tim model’s polled output and labeled auxiliary datasets. Their findings indicate that the
inversion samples generated by SMI are visually more convincing and richer in detail than
existing MI methods. In addition, this study found that the makeup of the auxiliary dataset
plays a vital role in determining the quality of the reconstructed samples, and that ground
truth labeling, while useful, is not essential for a successful attack.
Label smoothing model inversion (LSMI). Struppek et al. (2023) studied the effects of
label smoothing, a common regularization technique, on model vulnerability to MI attacks.
The findings suggest that traditional label smoothing softens class labels to enhance gener-
alization and calibration, but inadvertently increases the model’s vulnerability to MI attacks,
thereby magnifying privacy leakage. To address the issue, the authors introduced LSMI,
a new label smoothing approach that applies a negative smoothing factor. LSMI shows greater
resilience against MI attacks and hinders the extraction of category-relevant information,
outperforming existing MI defense methods.
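
The key mechanism of LSMI, smoothing with a negative factor, can be illustrated with the following minimal sketch; the exact loss formulation and the choice of the negative factor in Struppek et al. (2023) may differ, and the values below are placeholders.

import torch
import torch.nn.functional as F

def smoothed_targets(labels, num_classes, alpha):
    # Standard label smoothing uses alpha > 0; an LSMI-style negative alpha instead
    # concentrates extra mass on the true class and penalizes the remaining classes.
    one_hot = F.one_hot(labels, num_classes).float()
    return (1.0 - alpha) * one_hot + alpha / num_classes

def smoothed_cross_entropy(logits, labels, num_classes, alpha=-0.05):
    targets = smoothed_targets(labels, num_classes, alpha)
    return torch.sum(-targets * F.log_softmax(logits, dim=-1), dim=-1).mean()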
Ensemble model inversion (EMI). Wang and Kurz (2022) presented EMI, which enhances
traditional MI methods by simultaneously utilizing multiple trained models to infer the dis-
tribution of the original training data. Using ensemble models provides the adversary with
a much richer perspective, resulting in a higher quality reconstruction of training samples.
EMI significantly improves single-model inversion, allowing the generated samples to
display distinguishable features of dataset entities. Furthermore, the authors explored the
effectiveness of EMI in the absence of reliance on auxiliary datasets, showing high-quality
data-free MI. This study shows the importance of model diversity in ensembles and includes
additional constraints to boost prediction accuracy and reconstructed sample activation.

3.5 Summary

Gradient inversion attacks, generative model-based attacks, and optimisation-based attacks are at the core of MI. Each of them has different methods and applications.
Gradient inversion attacks utilize the gradient information shared during federated or
distributed training to reconstruct sensitive data. Methods such as DLG and iDLG allow for
the alignment of fake gradients with real ones, while more advanced methods like AGIC
improve scalability and accuracy. These attacks are efficient in either white- or gray-box
settings, but rely on gradient access, limiting their scope.
Generative model-based attacks take advantage of frameworks such as GAN to recon-
struct data by approximating data distributions guided by model outputs or intermediate
features. Techniques such as GMI and VMI improve fidelity by leveraging latent-space
priors or probabilistic modeling, and are thus effective in both white- and black-box set-
tings. Nevertheless, they typically need auxiliary datasets or pre-trained models, which adds
complexity and resource requirements.
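
The generative paradigm can be illustrated by a latent-space search against a pre-trained generator. The sketch below is a simplified GMI-style objective: generator and target_model are assumed to be pre-trained, the prior weight is a placeholder, and the full attacks add discriminator or variational terms that are omitted here.

import torch
import torch.nn.functional as F

def generative_mi_attack(generator, target_model, target_class, latent_dim=100,
                         steps=1000, lr=0.02, prior_weight=0.01):
    # Search the generator's latent space for a sample that the target model
    # classifies as the target identity with high confidence.
    z = torch.randn(1, latent_dim, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        candidate = generator(z)
        logits = target_model(candidate)
        identity_loss = F.cross_entropy(logits, torch.tensor([target_class]))
        prior_loss = z.norm(p=2)  # keep z close to the Gaussian latent prior
        (identity_loss + prior_weight * prior_loss).backward()
        optimizer.step()
    return generator(z).detach()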


Optimisation-based attacks, covering aspects of gradient inversion attacks and generative model-based attacks, iteratively refine the reconstructed data by keeping inputs consis-
tent with outputs, gradients, or intermediate features via tailored loss functions. While these
attacks have the accuracy of gradient inversion and the flexibility of generative methods,
they are computationally resource-intensive and rely on specific attack inputs, highlighting
the challenges of defending against various MI threats.
Figure 4 presents an illustrative example of gradient inversion attacks vs. generative
model-based attacks. Moreover, structured around a host of key factors, Table 2 presents a
detailed comparison of gradient inversion attacks and generative model-based attacks.

Fig. 4 A demonstrative example of gradient inversion attacks vs. generative model-based attacks

Table 2 Comparison of gradient inversion attacks and generative model-based attacks

Key factors | Gradient inversion attacks | Generative model-based attacks
Primary methodology | Gradient matching through optimisation | Leverages generative models (e.g., GANs)
Input dependency | Requires gradient updates | Uses outputs or intermediate representations
Flexibility | Limited to scenarios with gradient access | Applicable in both white-box and black-box settings
Computational complexity | Relatively low due to direct gradient optimisation | Higher, due to generative modeling and training
Auxiliary data requirement | Often unnecessary | Frequently relies on auxiliary datasets
Reconstruction fidelity | High for gradient-specific data | High, particularly for complex distributions
Application scenarios | Federated learning or gradient-sharing settings | General machine learning settings

3.6 MI in different applications

MI attacks pose a significant privacy threat across various domains where deep learning
models process sensitive data. These attacks exploit learned representations to reconstruct
private inputs, resulting in privacy and data breaches in biometric recognition systems,
healthcare, and financial services.

3.6.1 MI in biometric recognition systems

Biometric systems, especially those used for facial recognition (Zhang et al. 2024), fin-
gerprint authentication, and iris scanning, rely heavily on deep learning models to process
and match individuals’ unique biological traits. These systems are often deployed in high-
stakes environments such as border control, law enforcement, and secure access control. MI
attacks on these systems can result in the reconstruction of highly sensitive data (e.g., an
individual’s facial image or fingerprint). This poses severe privacy risks, as biometric data,
once compromised, cannot be easily changed or replaced like passwords or other traditional
security measures (Yang et al. 2019a, b, 2022). By reconstructing biometric data, adversar-
ies can bypass authentication systems, impersonate individuals, or misuse this data for mali-
cious purposes. Thus, studying MI attacks and defenses (Ahmad et al. 2022; Huang et al. 2024; Khosravy et al. 2022, 2021; Yoshimura et al. 2021) is critical to ensuring the security
and privacy protection of biometric systems.

3.6.2 MI in healthcare systems

Healthcare is another area where deep learning models are widely employed to aid in diag-
nosing diseases, predicting patient prognosis, and planning treatments (Ra et al. 2021; Pei
et al. 2022; Wei et al. 2023). These models often deal with highly sensitive medical records,
including patient history, imaging data (e.g., X-rays or MRIs) (Dao and Nguyen 2024), and
genetic data. MI attacks on healthcare models present a serious threat because it is likely that
adversaries can recover private health information from the model’s output. Exposure of
healthcare data through MI attacks can lead to many forms of harm, including identity theft,
insurance fraud, and discrimination due to medical conditions (Hatamizadeh et al. 2023).
Furthermore, improper handling of such sensitive data may have severe legal consequences
for healthcare providers, undermine patient trust, and jeopardize the credibility of AI-based
healthcare solutions. Understanding and mitigating MI attacks in this context is therefore
vital to patient privacy protection and ethical deployment of AI in healthcare.

3.6.3 MI in financial systems

In finance, deep learning models are increasingly used for tasks such as fraud detection,
credit scoring and risk evaluation (Milner 2024). These models run on large amounts of
sensitive financial data, including transaction history, personal financial information, and
credit history. MI attacks on financial models allow adversaries to infer private financial details or reconstruct specific transactions, resulting in financial fraud, identity theft, or other severe financial consequences for individuals and organizations. Personal financial
data, once exposed, can be used for unauthorized transactions or to compromise other sys-
tems. In addition, successful MI attacks could damage confidence in AI-based financial
services that rely on trust and confidentiality (Galloway et al. 2024). Thus, it is important for
financial institutions to take the potential impact of MI attacks seriously and develop defen-
sive measures to ensure the integrity of financial models and the security of customer data.

3.7 MI on different data types

MI attacks occur across different application domains, including biometric systems, healthcare systems, and financial systems. Each of these applications processes different data types, such as images, audio, and text/tabular data, all of which are susceptible to MI attacks, as demonstrated in Fig. 5. The latest advances in MI attacks further increase the vulnerability of these data types. MI attacks leverage the learned parameters, features or outputs of deep
learning models to reconstruct sensitive training data, thereby compromising the privacy
of user information. As deep learning systems increasingly penetrate areas such as visual
recognition, speech processing, natural language understanding, and analysis of structured
data, the possibility of privacy breaches by malicious attacks is increasing.

Fig. 5 MI attacks across various applications where different data types are processed

3.7.1 MI on images

MI attacks expose a critical privacy vulnerability in deep learning systems, particularly in applications dealing with image data. These attacks aim to reconstruct input images from
the outputs or parameters of trained models, compromising the confidentiality of training
data. In image-based systems, MI attacks can exploit gradient information (Geiping et al.
2020; Zhu et al. 2019; Zhao et al. 2020), or model outputs (e.g., confidence scores (Guo
et al. 2024)) to recover visual details of training samples, including personally identifiable
information. Addressing these risks has led to the development of defensive strategies, such
as feature perturbation (Sun et al. 2021), cryptographic techniques (Prakash et al. 2020) and
differential privacy (Li et al. 2024), against MI attacks targeting images.


3.7.2 MI on audios

MI attacks on audio data create distinct and pressing challenges for privacy protection in
machine learning systems. These attacks exploit model parameters or outputs to reconstruct
sensitive audio inputs, such as spoken phrases or unique acoustic features, potentially expos-
ing private or proprietary information. In view of the inherently rich temporal and spectral
characteristics of audio data, MI attacks can disclose subtle features of the training samples,
including speaker identity or confidential speech content. For example, Pizzi et al. (2023)
extended the application of MI attacks to automatic speech recognition systems, uncovering
the potential for voice extraction from trained models. The authors introduced a new tech-
nique called sliding model inversion, which augments traditional MI attacks by iteratively
inverting overlapping audio chunks. This approach makes effective use of the sequential
nature of audio data, accomplishing the reconstruction of audio samples and the extraction
of intermediate speech features, thus offering important lessons for speaker biometrics. The
authors demonstrated that inverted audio can be used to generate spoofed samples to impersonate a speaker and execute voice-protected commands in a secure system.

3.7.3 MI on texts and tabular data

MI attacks on texts and tabular data reveal privacy vulnerabilities in natural language
processing (NLP) models, in which sensitive information such as personally identifiable
information, confidential communications, or proprietary content can be reconstructed from
model outputs or learned representations. These attacks use patterns in embedding vec-
tors (Petrov et al. 2024), labeling probabilities, or gradients to infer training data, creat-
ing risks for applications that range from sentiment analysis to language generation. The
structural and context-dependent nature of textual data amplifies the likelihood of sensitive
reconstruction, especially if the training dataset contains private or domain-specific infor-
mation. MI-related studies on texts and tabular data are reviewed below.
DAGER. Petrov et al. (2024) proposed the DAGER algorithm, designed specifically for
accurate gradient inversion of large language models. DAGER addresses a major drawback
of earlier research that focused on image data. While initial attacks on the text domain
were restricted to approximate reconstruction of small or short input sequences, DAGER
overcomes these inadequacies by utilizing the low-rank structure of the gradient of the self-
attention layers and the discrete nature of the token embedding vectors. The authors showed
that DAGER can achieve accurate recovery of entire batches of input texts with the use of
efficient GPUs, outperforming existing approaches in terms of speed, scalability, and recon-
struction quality.
Text revealer. Text Revealer (Zhang et al. 2022) is the first systematic study of MI attacks
for text reconstruction in transformer-based classifiers. The study showed how to faithfully
reconstruct private training data by utilizing external datasets and GPT-2 to produce class-
domain text and then perturb it based on feedback from the target model. A large number of
experiments were conducted to demonstrate the effectiveness of their attack on text datasets
of varying lengths and the successful reconstruction of sensitive text information.
TabLeak. Proposed by Vero et al. (2023), TabLeak is the first comprehensive MI attack
tailored to tabular data. TabLeak is able to achieve over 90% reconstruction accuracy on
private data despite large batch sizes, thereby undermining assumptions about the security of FL
protocols such as FedSGD and FedAvg. Earlier research in other domains (e.g., images) has
found similar threats. Nevertheless, the distinctive challenges of dealing with mixed types
of tabular data with discrete and continuous features require new solutions such as soft-max
relaxation and entropy-based uncertainty quantification.
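
The softmax relaxation used for discrete features can be illustrated as follows; this is a sketch of the general idea rather than TabLeak's exact procedure, and the names and usage shown are hypothetical. Each categorical feature is represented by continuous logits during gradient-based reconstruction and is discretised only at the end.

import torch
import torch.nn.functional as F

def relax_categorical(logits, temperature=1.0, hard=False):
    # During gradient-based reconstruction, a categorical feature is kept continuous as a
    # probability vector; only the final result is snapped back to a one-hot encoding.
    probs = F.softmax(logits / temperature, dim=-1)
    if hard:
        return F.one_hot(probs.argmax(dim=-1), probs.shape[-1]).float()
    return probs

# Hypothetical usage inside a reconstruction loop:
# cat_logits = torch.zeros(1, num_categories, requires_grad=True)
# x_cat = relax_categorical(cat_logits)               # differentiable surrogate
# x_final = relax_categorical(cat_logits, hard=True)  # discretised output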

3.8 MI on emerging generative AI and foundation models

With the recent rise of generative AI and foundation models like diffusion models (e.g.,
Stable Diffusion, DALL-E) and Large Language Models (LLMs) such as GPT-4, increased
concerns about MI attacks have arisen. In contrast to traditional deep learning architec-
tures, these models produce highly complex outputs and are therefore particularly vulner-
able to data reconstruction attacks (Feretzakis et al. 2024). Attacks, such as MI attacks, seek
to reconstruct training data from model outputs, raising ethical and security concerns. MI
attacks leverage the model’s learnt representations to restore sensitive training data, posing
substantial privacy risks in applications involving text, images, and audios. It is worth not-
ing that MI attacks present significant privacy and security threats to generative AI models,
including diffusion models, LLMs, and other foundation models. Although privacy-preserv-
ing techniques continue to advance, many challenges remain.

3.8.1 MI on LLMs

As deep learning evolves, LLMs (e.g., OpenAI’s GPT, Anthropic’s Claude, and Meta’s
Llama) have greatly enhanced the ability to efficiently process a variety of downstream NLP
tasks and unify them into a generative pipeline. Nevertheless, unrestricted access to these
models also brings about potentially malicious attacks, such as MI attacks (Li et al. 2024).
Embedding techniques are able to transform textual data into rich, dense numerical rep-
resentations that capture semantic and syntactic properties, and thus have become a corner-
stone of LLM functionality (Liu et al. 2024). Embedding privacy is critical because these
embeddings often contain sensitive information about user data. Embedding inversion attacks
are a specific subset of MI attacks that focus on restoring the original input data (Li et al.
2024) from feature embeddings. For example, Pan et al. (2020) cast inversion as a multi-label classification task, predicting the set of words contained in a text sequence from its embedding. Morris et al. (2023) considered the issue of language model inversion, show-
ing that the next-token probabilities contain a large amount of information about the preced-
ing text. The approach demonstrates for the first time that language model predictions are
mostly reversible. In a number of cases, the authors were able to recover inputs that are similar to the original text, sometimes even recovering the input text completely.
Chen et al. (2024) studied the security of LLMs from the perspective of multilingual
embedding inversion. Specifically, the authors defined the problem of black-box multilin-
gual and cross-lingual inversion attacks with a special focus on cross-domain scenarios. The
results show that multilingual models are more vulnerable to inversion attacks than mono-
lingual models. Dai et al. (2025) discovered a realistic attack surface of LLMs: privacy
leakage of training data in decentralized training, and proposed the first activation inversion
attack (AIA). The AIA utilizes a public dataset to construct a shadow dataset consisting of
text labels and corresponding activations. Using this dataset, an attack model can be trained
so as to reconstruct the training data from the activations in decentralized training. The
authors conducted experiments on various LLMs and publicly available datasets to show the
vulnerability of decentralized training to AIA.
Shu et al. (2025) identified MI attacks in a split learning framework for LLMs, high-
lighting the necessity of security defenses. They introduced mutual information entropy for
the first time to understand the information propagation of transformer-based LLMs and
evaluated the privacy attack performance of LLM blocks. The authors proposed a two-stage
attack system where the first stage projects representations into the embedding space, and
the second stage uses a generative model to recover text from the embeddings. This work
comprehensively highlights the potential privacy risks of deploying personalized LLMs at the edge. To mitigate embedding inversion attacks, Liu et al. (2024) proposed Embed-
ding Guard (Eguard). Eguard employs transformer-based projection networks and textual
mutual information optimization to protect embeddings while retaining the utility of LLMs.
This approach greatly reduces privacy risks and protects more than 95% of the tokens from
being inverted, while maintaining high performance consistent with the original embed-
dings in downstream tasks.

3.8.2 MI on diffusion models

Diffusion models have emerged as favored models for generating exceptionally high-resolution image data. They generate higher quality samples and are much easier to scale and control than previous models such as GANs. As a result, they have rapidly become
the de facto approach for producing high-resolution images, and large-scale models such as
DALL-E have attracted considerable public interest (Carlini et al. 2023).
However, diffusion models like Stable Diffusion are also subject to MI attacks that can
extract training data from diffusion models. Chen et al. (2025) provided a comprehensive
review of recent advances in image inversion techniques, highlighting two main paradigms:
GAN inversion and diffusion model inversion. In the context of diffusion model inversion,
the authors explored training-free strategies, fine-tuning methods, and the design of addi-
tional trainable modules, and emphasized their unique advantages and limitations. Carlini
et al. (2023) demonstrated that diffusion models memorize individual images from training
data and emit these images at generation time. By using a generate-and-filter pipeline, the
authors pulled over a thousand training examples from state-of-the-art models, varying from
photos of individual people to company trademark logos. They also trained hundreds of dif-
fusion models in a variety of settings to analyze how different modeling and data decisions
impact privacy. The findings suggest that diffusion models are much less private than previ-
ous generative models such as GANs.
Huang et al. (2024) investigated the privacy leakage risk of gradient inversion attacks.
The authors designed a two-stage fusion optimization scheme that uses the trained genera-
tive model itself as prior knowledge to constrain the inversion search (latent) space, fol-
lowed by pixel-level fine-tuning. The results show that the proposed optimization scheme
can reconstruct an image almost identical to the original image. Wu et al. (2025) presented
a practical gradient inversion method, namely Deep Generative Gradient Inversion, which
utilizes the diffusion model’s prior knowledge to improve the reconstruction performance
for high-resolution datasets and larger batches. In addition, in order to address the spa-
tial variation problem caused by the pre-trained diffusion model, group consistency regularization terms are developed to constrain the distance between the reconstructed and aligned images.


Zhang et al. (2024) proposed a means of directing the inversion process of a diffusion model
toward a synthetic embedding of the core distribution. In addition, the authors introduced a
spatial regularization method to balance the attention to the concepts being composed. This
is devised as a post-training method that can be seamlessly integrated with other inversion
approaches. Experimental results show that the proposed method is effective in mitigating
the overfitting problem and generating more diverse and balanced combinations of concepts
in synthetic images.

4 Defenses against MI attacks

This section outlines prominent and emerging defense strategies against MI attacks, divided
into six categories: feature perturbation/obfuscation, gradient pruning, gradient perturba-
tion/obfuscation, differential privacy, cryptographic encryption, and model/architecture
enhancement. Each category has its unique approaches designed to protect data privacy
while preserving model utility and computing efficiency.

4.1 Feature perturbation/obfuscation

Feature perturbation and obfuscation are crucial in alleviating MI attacks as they are
designed to degrade the fidelity of the extracted data without undermining the utility of
the underlying machine learning model. These techniques utilize various strategies, such
as adding noise, transforming features, or hiding sensitive information, to distort data rep-
resentations and thwart adversarial inference. Several feature perturbation and obfuscation
approaches are summarized below.
Vicinal risk minimization (VRM). Ye et al. (2024) provided a thorough analysis of the
success factors of GIAs and explained privacy risks inherent in distributed learning frame-
works. The authors pointed out that while current defense strategies are abundant, they often
degrade the performance of global models or require too much computational resources.
This study also points to the gap in understanding the root cause of data leakage during
distributed training. GIAs also face challenges in terms of model robustness, especially as changes in model structure can affect attack results. In response, the authors proposed a
plug-and-play defense scheme that leverages VRM and data augmentation based on neigh-
borhood distributions, which as demonstrated, effectively enhances privacy without com-
promising model usability.
Defense by concealing sensitive samples (DCS2). Wu et al. (2024) put forward DCS2,
a defense strategy against MI attacks in FL. FL is vulnerable to attacks that take advantage
of low entanglement between the gradients, so the authors proposed a method to synthesize
hidden samples. These samples imitate sensitive data at the gradient level while appearing
different visually, thus obfuscating adversaries trying to reconstruct the data. Experimental
results show that DCS2 offers excellent protection for sensitive data without compromis-
ing the performance of FL systems, thereby setting a new standard for privacy-preserving
defenses in FL.
Automatic transformation search (ATS). Gao et al. (2021) discussed and addressed a
critical flaw in collaborative learning environments where the sharing of gradients can lead
to the reconstruction of sensitive training data. The authors introduced ATS to identify care-
fully selected transformation policies that not only protect data privacy but also maintain
the model’s efficacy while obfuscating sensitive information. This work offers a practical solution
to safeguard collaborative learning systems against gradient-based reconstruction attacks.
Soteria. Sun et al. (2021) proposed Soteria, a defense mechanism specifically designed to
address the leakage of data representations within gradients, which is believed to be the main
channel of privacy leakage in FL. The authors gave a comprehensive analysis of how data
representations are embedded in model updates and described the potential for MI attacks.
Soteria involves perturbing data representations to reduce the quality of reconstructed data
while preserving the performance of the FL system. An important contribution of this work
is the derivation of proven robustness and convergence guarantees for perturbing model
updates, thereby ensuring the effectiveness of the defense without compromising accuracy.
Empirical evaluations against attacks such as DLG using the MNIST and CIFAR10 datasets
demonstrate that Soteria significantly improves privacy protection. This research provides
valuable insights into the characterization of privacy breaches in FL and paves the way for
more advanced defense strategies.
Crafter. Wang et al. (2024) proposed Crafter, a feature crafting mechanism to preserve
identity information against adaptive MI attacks. Unlike traditional methods based on
adversarial games, Crafter misguides attackers by crafting features that direct them to non-
privacy priors. These carefully crafted features can effectively function as poison training
samples, restricting the attacker’s capability to reconstruct private identities, even when
adaptive counterattacks are employed. Experimental results show that Crafter performs bet-
ter than existing countermeasures, successfully defending against both basic and adaptive
MI attacks, while remaining functional for cloud-based deep learning tasks.
Sparse-coding architecture. Dibbo et al. (2024) explored a new approach to enhanc-
ing the robustness of neural networks against MI attacks, which utilize the output of the
network to reconstruct private training data. The authors proposed an innovative network
architecture that includes sparse coding layers, leveraging three decades of research on
sparse coding in areas ranging from image denoising to adversarial classification. This work
fills a gap in the understanding of how sparse coding can alleviate privacy vulnerabilities
in neural networks. The experimental results demonstrate that sparse-coding architectures
not only maintain a comparable or higher classification accuracy than existing defenses,
but also reduce the quality of training data reconstruction across multiple metrics (PSNR,
SSIM, FID) and datasets including CelebA, medical images, and CIFAR-10. In addition,
the authors provided the cluster-ready PyTorch codebase to facilitate further research and
standardization of defense evaluation against MI attacks.
Synthetic data generation with privacy-preserving techniques. Slokom et al. (2023)
found that training deep learning models on unprotected raw data may lead to the disclosure
of sensitive information, making individuals vulnerable to MI attribute inference attacks.
To reduce this risk, the authors designed a two-step method that couples synthetic data
generation with privacy-preserving techniques. Unlike previous methods that directly apply
privacy measures to the original dataset, this approach first replaces the original data with
synthetic data and then applies privacy-preserving techniques to maintain model perfor-
mance, aiming to strike a balance between privacy preservation and model accuracy. The
experiments show that this method decreases the success rate of MI attacks on “inclusive”
individuals (present in the training data) and “exclusive” individuals (not included in the
training data).


Image augmentation. Shin et al. (2023) empirically analyzed the use of image augmentation as a defense strategy against MI attacks. The study shows that while differential privacy is a common solution, it often involves a trade-off of privacy versus utility. In comparison, image augmentation techniques are promising in alleviating such attacks without significantly impairing model performance. Through experiments on the CIFAR-10 and CIFAR-100 datasets, the authors identified the optimal combination of augmentations for different image classes, showing that these strategies work better than differential privacy. This study suggests that image augmentation may be used as a feasible alternative to minimize the loss of utility when defending against MI attacks.
Additive noise. Titcombe et al. (2021) indicated that even with limited information about
data distribution, it is possible for a malicious computational server to successfully perform
MI attacks that compromise user data privacy. In response, the authors proposed a simple
noise addition method that can minimize the effectiveness of MI attacks while keeping an
acceptable accuracy trade-off on the MNIST dataset.
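
The underlying idea can be sketched as injecting calibrated noise into the intermediate representation before it leaves the client. This is an illustrative sketch only: the Laplacian distribution and the noise scale are assumptions here and may differ from the exact setup of Titcombe et al. (2021).

import torch

def noisy_intermediate(representation, scale=0.5):
    # Add zero-mean Laplacian noise to the intermediate activations before they are
    # sent to the (potentially malicious) computation server.
    noise = torch.distributions.Laplace(0.0, scale).sample(representation.shape)
    return representation + noise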
Privacy-guided training. Goldsteen et al. (2020) presented a privacy-guided training
method specifically designed for tree-based models. This approach lessens the impact of
sensitive features in the model, thus decreasing the risk of feature-based inference without
sacrificing the accuracy of the overall model. The authors remarked that sensitive features
do not have to play a prominent role in model training, as privacy can be preserved by
emphasizing less sensitive attributes. The authors showed the effectiveness of the proposed
method in mitigating the risk of MI in both black-box and white-box scenarios.
Statistical features via knowledge distillation. Gao et al. (2024) studied the vulnerability
of FL to gradient inversion attacks, which allow the server to reconstruct the client’s train-
ing data by reversing the uploaded gradients. To mitigate this risk, the authors proposed
a defense mechanism that exploits statistical information/features derived from the train-
ing data rather than the data itself. Motivated by statistical machine learning, this method
involves training lightweight local models via knowledge distillation so that the server can only access semantically meaningless statistics. The experimental results show that this
method is superior to existing defense strategies in terms of reconstruction accuracy, model
accuracy, and training efficiency.

4.2 Gradient pruning

Gradient pruning, also known as gradient compression or sparsification, has emerged as a privacy-preserving machine learning technique against MI attacks. By either selectively
pruning the gradient or reducing the gradient transmission, these methods are designed to
enhance privacy preservation, improve communication efficiency, and preserve model per-
formance. This section discusses gradient pruning methods for MI defense.
PATROL. Ding et al. (2024) proposed PATROL to deal with privacy risks in collab-
orative inference for edge devices. Although collaborative inference enables advanced use of DNNs by offloading intermediate results to the cloud, it is susceptible to MI attacks.
PATROL addresses this problem by reducing leakage of sensitive information through the
use of privacy-oriented pruning, deployment of more layers at the edge, and focusing on
task-relevant features. In addition, Lipschitz regularization (Virmaux and Scaman 2018)
and adversarial training increase the model’s robustness to MI attacks, effectively balancing
privacy, efficiency and utility.


Dual gradient pruning (DGP). Xue et al. (2024) investigated privacy challenges posed
by GIAs in collaborative learning. Given that existing defense methods (e.g., differential
privacy and cryptography) struggle to achieve a balance between privacy, utility and effi-
ciency, the authors proposed DGP, which enhances the traditional gradient pruning method
by improving the efficiency of communication in collaborative learning systems while
providing stronger privacy guarantees. The method aims to reduce the risk of data recov-
ery by selectively pruning the gradient to resist powerful GIAs without affecting model
performance. Through both theoretical analysis and extensive experiments, the authors
demonstrated that DGP not only provides strong GIA defense, but also reduces the commu-
nication cost significantly, making it an efficient solution for privacy-preserving collabora-
tive learning.
Guardian. Fan et al. (2024) developed Guardian to combat gradient leakage by opti-
mising two theoretically derived metrics: the performance maintenance metric and privacy
protection metric. These metrics are used to generate transformed gradients that minimize
privacy leakage while maintaining model accuracy. The authors introduced an innovative
initialization strategy to speed up the production of transformed gradients, thus increasing
the utility of Guardian. The authors provided theoretical convergence guarantees and con-
ducted extensive experiments on a variety of tasks, including tasks using visual transformer
architectures, showing that Guardian can effectively withstand state-of-the-art attacks with
no significant loss of accuracy. Guardian’s defense ability even in strict white-box scenarios
with Bayesian optimal adversaries highlights its potential for real-world applications.
Pruned frequency-based gradient defense (pFGD). Palihawadana et al. (2023) presented
pFGD, which incorporates frequency transformation techniques such as discrete cosine
transform and gradient pruning methods to strengthen privacy protection. The experimental
results on the MNIST dataset show that pFGD substantially reduces the risk from gradient
inversion attacks while exhibiting resilience and robustness.
Deep gradient compression (DGC). Lin et al. (2018) proposed DGC to keep model
accuracy while minimizing communication bandwidth. This approach reduces gradient
exchange by 99.9% via strategies such as momentum correction and local gradient clipping,
and proves its effectiveness on various datasets, including ImageNet and Penn Treebank,
with compression rates of up to 600 times. Although compression strategies like DGC contribute to scalability and efficiency, they also raise questions about how they affect vulnerability to adversaries, as compression may change the attack surface available to MI strategies. Thus, the intersection of gradient compression and MI remains an area of interest in protecting data privacy.
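
A minimal sketch of the idea shared by these methods is to keep only the largest-magnitude gradient entries before they are shared; the momentum correction and clipping of DGC and the privacy-oriented selection rules of DGP or PATROL are omitted, and the keep ratio below is a placeholder.

import torch

def prune_gradient(grad, keep_ratio=0.01):
    # Keep only the top-k gradient entries by magnitude and zero out the rest,
    # reducing both the communication volume and the information exposed to an inverter.
    flat = grad.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.abs().topk(k).values.min()
    mask = (flat.abs() >= threshold).float()
    return (flat * mask).view_as(grad)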

4.3 Gradient perturbation/obfuscation

Gradient perturbation and obfuscation can protect sensitive data during model training. Per-
turbation introduces small variations or noise into the gradient, masking sensitive data while
retaining utility, while obfuscation obscures the gradient, making it difficult to interpret.
These techniques are vital to defending against MI attacks. Some gradient perturbation and
obfuscation methods are reviewed below.
Model fragmentation, shuffle and aggregation. Masuda et al. (2021) used model frag-
mentation, shuffling, and aggregation to tackle MI attacks without compromising model
accuracy. In this work, each participant fragments, shuffles and aggregates models before
sharing them, thus making it difficult for adversaries to reconstruct data. The method main-
tains the integrity of the shared model in the defense of MI attacks, which is an important
step forward in enhancing the privacy of FL and reflects the ongoing requirement for robust
and effective privacy-preserving strategies in joint learning environments.
Autoencoder-based compression. Chen et al. (2024) presented an adaptive autoencoder-
based approach to compressing and perturbing model parameters before sharing them with
a server, thus improving privacy without sacrificing model performance. In contrast to tradi-
tional defense methods that introduce noise or sparse gradients (which might prevent model
convergence and increase communication costs), this approach allows clients to gain a com-
pressed representation of their local model parameters in just a few iterations. The empirical
evaluation shows that the proposed method reduces the communication rate by 4.1 times
compared to the joint averaging approach, while retaining almost the same model perfor-
mance as unmodified local updates. Furthermore, the results show that the method is effec-
tive in preventing information leakage, demonstrating the potential of autoencoder-based
compression techniques for achieving beneficial trade-offs between privacy preservation,
model efficiency, and communication costs in joint learning environments.
GradPrivacy. Lu et al. (2023) worked on a critical gap in privacy protection in collab-
orative learning systems, especially defenses against GIAs. While prior research mainly
focused on untrained models, the authors emphasized that trained models carry more infor-
mation and are equally vulnerable to attacks, and thus must be prioritized for protection.
To reduce privacy risks, the authors proposed GradPrivacy to protect privacy at all stages
of collaborative learning while not sacrificing model performance. GradPrivacy consists of
two main components: an amplitude perturbation module and a deviation correction mod-
ule. The former is used to perturb the gradient parameters associated with sensitive features
in order to impede gradient inversion. The latter aims to correct deviations in model updates
so that the model maintains its accuracy. Through extensive evaluation, the authors dem-
onstrated that GradPrivacy achieves an excellent balance between privacy and accuracy,
outperforming existing defenses by providing robust protection for well-trained models in
collaborative learning environments.
Quantization enabled FL. Ovi et al. (2023) showed that traditional defense mechanisms,
such as differential privacy, homomorphic encryption, and gradient pruning, tend to suffer
from drawbacks such as complex key generation, degraded performance, and difficulty in
selecting the optimum pruning ratio. To address these drawbacks, the authors designed a
joint learning scheme with hybrid quantization, in which different layers of a deep learning
model are quantized with different accuracies and modes. Their experimental evaluation
shows that this approach greatly improves robustness against iteration- and recursion-
based gradient inversion attacks, while keeping strong performance on multiple benchmark
datasets.

4.4 Differential privacy

Differential privacy (DP), which uses the idea of noise to limit information leakage, has
emerged as a foundational framework for mitigating privacy risks posed by MI attacks.
By introducing carefully calibrated noise into the training process or model output, DP
limits the possibility of reconstructing sensitive information by ensuring that the presence
or absence of any individual data point in the training set has a minimum impact on model
predictions. In the setting of MI attacks, DP-based defense mechanisms focus on techniques such as gradient noise addition, output perturbation, and privacy budget allocation to
obscure the link between training data and model response. Despite theoretical guarantees,
these defenses often confront challenges in balancing privacy with model utility, especially
in areas that require high accuracy or deal with complex data types. This section explores
some DP-related approaches to defending against MI attacks.
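
A common building block behind such defenses is DP-SGD-style gradient sanitisation, i.e., clipping each per-example gradient and adding calibrated Gaussian noise before the update. The sketch below is generic rather than the specific mechanism of any work reviewed here; the clipping norm and noise multiplier are placeholder values.

import torch

def sanitize_gradients(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    # per_example_grads: tensor of shape (batch, d) holding one flattened gradient per example.
    batch = per_example_grads.shape[0]
    norms = per_example_grads.norm(dim=1, keepdim=True)
    # Clip each per-example gradient to a maximum L2 norm of clip_norm.
    clipped = per_example_grads * (clip_norm / torch.clamp(norms, min=clip_norm))
    # Aggregate and add Gaussian noise calibrated to the clipping norm.
    noise = torch.normal(0.0, noise_multiplier * clip_norm, size=(per_example_grads.shape[1],))
    return (clipped.sum(dim=0) + noise) / batch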
Augmented DP. While transparency in deep learning enhances trust, accountability, and
fairness, traditional methods such as DP often fail to fully protect against MI attacks, par-
ticularly with high accuracy. Alufaisan et al. (2020) introduced an improved DP method
to create transparent, accurate models that are resistant to MI attacks, thus bridging this
gap. By improving the balance between privacy and utility, this approach provides strong
privacy guarantees without severely compromising model accuracy, thus addressing a com-
mon trade-off in privacy-preserving deep learning. This research highlights the significance
of designing methods that are both privacy-preserving and transparent.
Class-level and subclass-level DP. Zhang et al. (2020) indicated that standard record-
level DP, while popular for privacy-preserving deep learning, lacks robustness against MI
attacks. The authors enhanced existing MI attack techniques by demonstrating their ability
to reconstruct training images from deep learning models, stressing the need for more fine-
grained privacy protection mechanisms. To this end, the authors proposed a novel class-
level and subclass-level DP scheme designed to provide quantifiable privacy guarantees,
specifically aimed at defeating MI attacks. The experiments show that the proposed scheme
successfully strikes a balance between privacy protection and model accuracy, driving the
field toward more effective defenses against MI attacks in deep learning.
DP in healthcare models. Krall et al. (2020) developed a gradient-based scheme to pre-
serve DP in healthcare modeling, especially in an intensive care unit setting. This scheme
aims to mitigate the adversary’s ability to infer sensitive patient attributes by applying DP
techniques during gradient descent to reduce the risk of MI attacks. The experimental results
show that the scheme reduces the risk of MI while maintaining high model accuracy, high-
lighting its potential in the healthcare domain where data protection is critical.
Local DP. Li et al. (2024) studied the important issue of gradient privacy in joint learning,
where an attacker can infer local data from uploaded gradients. To address this challenge,
the authors proposed a privacy-enhancing approach that integrates local DP, parameter
sparsification, and weighted aggregation, which is particularly suitable for cross-silo set-
tings. Their approach uses DP by adding noise to local parameters before uploading, which
achieves local DP while adjusting the privacy budget dynamically to balance noise and
model accuracy. In addition, the authors introduced the Top-K method to optimise com-
munication costs based on the varying capabilities of clients and used weighted aggregation
to augment the robustness of the privacy framework. The experimental results show that
this approach effectively balances privacy, accuracy, communication cost, and robustness.

4.5 Cryptographic encryption

Recent advances in cryptography have heightened the security of federated and distributed
deep learning systems against MI attacks. By incorporating methods such as secure gradi-
ent aggregation, perceptual hashing, and homomorphic encryption, researchers are tackling
critical privacy challenges while maintaining model accuracy and efficiency. This section
analyzes these innovative approaches to demonstrate their potential to strengthen data pro-
tection in collaborative learning environments.
Secure aggregation. Yang et al. (2023) devised a secure aggregation approach to thwart-
ing MI attacks in FL environments. The method consists of encrypting the gradient before
sharing it, preventing the adversary from using the gradient information to reconstruct pri-
vate data. To improve efficiency, the authors developed a new method for producing shared
keys, where each client establishes keys with a subset of other clients instead of all clients
in the system. Simulation results show that the proposed method is effective against attacks
initiated by honest but curious parameter servers.
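
The principle behind secure gradient aggregation can be illustrated with pairwise additive masks that cancel when the server sums the client updates. The NumPy sketch below shows the masking idea only; the scheme of Yang et al. (2023) additionally handles key agreement among client subsets, and the function names and data structures here are hypothetical.

import numpy as np

def masked_update(client_id, gradient, peer_ids, pairwise_seeds):
    # gradient: float NumPy array. Each pair of clients (i, j) shares a seed; client i adds
    # the pairwise mask and client j subtracts it, so all masks cancel in the server-side sum.
    masked = gradient.copy()
    for peer in peer_ids:
        rng = np.random.default_rng(pairwise_seeds[frozenset((client_id, peer))])
        mask = rng.standard_normal(gradient.shape)
        masked += mask if client_id < peer else -mask
    return masked

Summing the masked updates of all clients recovers the exact sum of the raw gradients, while each individual update appears random to the server.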
Perceptual hashing. Prakash et al. (2020) proposed a privacy-preserving approach to
mitigate MI attacks using perceptual hashing of training images. Their approach converts
parts of each training image into a hashed form and then uses these perceptually hashed
images to train facial recognition models, thereby effectively mitigating the risk of recon-
structing the original image in an inversion attack. This method retains a high level of clas-
sification accuracy while providing a boost in privacy, as the hashed image, instead of the
original image, is returned in the adversarial scenario.
Certificateless additively homomorphic encryption (CAHE). Based on learning with
errors, Antwi-Boasiako et al. (2023) designed CAHE to resist MI attacks. Their work elimi-
nates the need for trusted third party and reduces reliance on centralized authorities, thus
providing better privacy protection, while addressing the high communication cost associ-
ated with transmitting cryptographic gradients in distributed deep learning systems. Their
results show that the accuracy of the CAHE-based model remains high (97.20%) while
ensuring privacy protection of participant data. Their solution also employs a partial gradi-
ent sharing algorithm, which improves communication efficiency and achieves an accuracy
of 97.17% and 97.12% in the case of topmost gradient selection and random gradient selec-
tion, respectively.

4.6 Model/architecture enhancement

Recent advances in privacy-preserving machine learning have highlighted the importance of improving model architectures to tackle security vulnerabilities, especially in federated
and collaborative learning environments. This section goes over model/architecture-based
approaches to defending against MI attacks.
PRECODE. Scheliga et al. (2022) introduced PRECODE, a privacy-enhancing module
intended to protect gradient leakage in collaborative neural network training. In contrast
to traditional gradient perturbation techniques that tend to degrade model performance or
increase training complexity, PRECODE involves random sampling using variational mod-
eling to effectively protect client data from gradient inversion attacks. The module is a
flexible extension for a variety of model architectures, providing robust protection without
compromising training efficiency or accuracy. Through extensive testing on multiple model
architectures and datasets, the authors showed that PRECODE reduces the attack success
rate to 0%, outperforming existing defense mechanisms, and demonstrating that PRECODE
is a promising and effective solution to enhancing privacy protection in distributed learn-
ing environments without the trade-offs typically associated with gradient perturbation
techniques.


Secure convolutional neural networks (SecCNN). Liu et al. (2024) proposed SecCNN to
integrate an upsampling layer into the CNN, offering an intrinsic defense against gradient
inversion attacks. This approach also leverages rank analysis to increase security without
compromising model accuracy or adding computational cost. The work advances current
defense strategies for FL, highlighting the potential of model architecture modifications as
an effective tool to combat MI attacks while maintaining computational efficiency.
Variational encoder-based personalized FL (RVE-PFL). Issa et al. (2024) devised the
RVE-PFL approach to alleviating MI attacks while preserving model utility. The personal-
ized variational encoder architecture assures privacy of heterogeneous data across clients and
efficiently aggregates data at the server level, differentiating between adversarial settings
and legitimate operations. Research shows that privacy-preserving techniques in FL typi-
cally weaken model performance, whereas RVE-PFL offers significant improvements in
both privacy and utility.
ResSFL. Li et al. (2022) designed a two-step framework called ResSFL to resolve the
vulnerability of split FL against MI attacks. The ResSFL framework utilizes attacker-aware
training in combination with a bottleneck layer to exploit an MI-resistant feature extractor,
which is subsequently used by the client for secure and efficient collaborative learning.
This approach guarantees robustness against MI attacks during the critical training phase
and significantly improves resilience with minimal computational overhead. An extensive
evaluation on datasets such as CIFAR-100 validates the efficacy of ResSFL, which achieves
an excellent trade-off between accuracy and resistance to MI attacks when
compared to baseline and contemporary approaches.

4.7 Miscellaneous MI defenses

Transfer learning-based defense against model inversion (TL-DMI). Many existing defense
methods rely on regularization techniques, which often reduce model performance by con-
flicting with training objectives. Ho et al. (2024) proposed a simple yet effective defense
method, called TL-DMI, to counter MI attacks using transfer learning (TL). This approach
reduces the attacker’s ability to reconstruct private data by limiting the number of layers
that encode sensitive information in the training dataset. Through analyzing Fisher informa-
tion, the authors theoretically justified this approach, demonstrating that TL can effectively
reduce MI attack success. Extensive experiments show that TL-DMI is resilient to MI
attacks and achieves robust privacy protection without impairing the utility of the model.
The simplicity and effectiveness of TL-DMI make it a compelling defense strategy for deep
learning privacy protection.
Bilateral dependency optimisation (BiDO). Peng et al. (2022) designed BiDO to defend against MI attacks, which recover the training data from the classifier and lead to privacy breaches. Traditional defense strategies focus on minimizing the dependency between inputs and outputs during training; however, this conflicts with the need to maximize this dependency for
accurate classification, thus necessitating a trade-off between the strength of the defense
and the utility of the model. BiDO addresses this problem by minimizing the dependency
between potential representations and inputs, and simultaneously maximizing the depen-
dency between potential representations and outputs. The authors proposed two implemen-
tations of BiDO: BiDO-COCO (using constrained covariance) and BiDO-HSIC (based
on the Hilbert-Schmidt independence criterion). The experiments show that BiDO tackles
MI attacks with minimal impact on classification accuracy, offering a new and effective
approach to balancing privacy and model performance.
Mutual information regularization-based defense (MID). Proposed by Wang et al. (2021), MID limits the information about the model input that is captured in the prediction, reducing the adversary’s ability to reconstruct private attributes. The authors showed that
MID achieves superior performance on a range of models and datasets, providing a theoreti-
cally sound alternative to traditional methods such as differential privacy.

4.8 Summary

Defense strategies against MI attacks have evolved into diverse approaches to dealing with
privacy vulnerabilities while attempting to strike a good balance between efficiency, practicality,
and robustness. These approaches are organized into six categories (see Sects. 4.1–4.6).
Some insights are: feature perturbation techniques (e.g., Crafter and Soteria) distort sensi-
tive data while maintaining accuracy; gradient perturbation strategies (e.g., GradPrivacy)
show promise in obfuscating data in collaborative learning processes; DP remains crucial,
and innovations such as class-level DP extend privacy guarantees; encryption techniques,
such as certificateless homomorphic encryption, secure aggregation, and perceptual hash-
ing, provide robust data protection by securing shared information; and model-enhancing
techniques such as PRECODE and ResSFL protect privacy through architectural enhance-
ments that defend against attacks without sacrificing performance.
Despite recent progress, it is challenging to achieve a balanced trade-off between pri-
vacy, utility, and computational costs, especially to counter advanced threats (Scheliga et
al. 2023) and address security concerns in IoT and edge computing environments. Most of
the MI defense mechanisms discussed earlier, except for gradient pruning methods such
as PATROL (Ding et al. 2024), DGP (Xue et al. 2024), Guardian (Fan et al. 2024), and
DGC (Lin et al. 2018), fail to consider the trade-off between efficiency and privacy pres-
ervation in IoT and edge computing settings. These environments are often constrained by
limited computational resources, making traditional, resource-intensive security defenses
impractical. Gradient pruning methods selectively prune gradients to enhance privacy and
communication efficiency while maintaining model performance. For instance, DGC (Lin
et al. 2018) reduces gradient exchange by 99.9% and still preserves model accuracy. How-
ever, nearly all existing gradient pruning methods have yet to be tested on real IoT or edge
computing devices. Therefore, there is a pressing need for practical pruning strategies and
lightweight MI defense mechanisms (e.g., efficient encryption techniques) to mitigate vul-
nerabilities while ensuring a feasible trade-off between defense performance and cost effec-
tiveness (e.g., computational efficiency) in IoT and edge computing environments.
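The core idea behind such gradient pruning can be sketched in a few lines: keep only the largest-magnitude fraction of each gradient tensor before it is shared. The snippet below is a simplified illustration; methods such as DGC additionally use momentum correction and local gradient accumulation, and the keep ratio shown here is an arbitrary example value.

```python
import torch

def sparsify_gradients(model, keep_ratio=0.001):
    # Zero all but the top keep_ratio fraction of gradient entries per tensor,
    # so only the surviving entries would need to be communicated.
    for p in model.parameters():
        if p.grad is None:
            continue
        magnitudes = p.grad.detach().abs().flatten()
        k = max(1, int(keep_ratio * magnitudes.numel()))
        threshold = magnitudes.topk(k).values.min()
        mask = (p.grad.detach().abs() >= threshold).to(p.grad.dtype)
        p.grad.mul_(mask)
```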

5 Evaluation metrics in MI attacks and defenses

In this section, the most common evaluation metrics used in MI attacks and defenses are
discussed, along with detailed explanations for each metric (Issa et al. 2024; Shi et al. 2024;
Zhang et al. 2025). The provided evaluation metrics range from simple pixel-based com-
parisons to complex feature and latent-space-based evaluations.

Mean squared error (MSE) (Bishop and Nasrabadi 2006; Dodge 2008): MSE is a stan-
dard metric for evaluating the quality of reconstructions in MI attacks. It quantifies the aver-
age squared difference between the reconstructed data and the actual target data. A lower
MSE indicates that the reconstructed data is closer to the original data, signifying a more
successful attack. MSE is particularly useful in image reconstruction or attribute inference
tasks, where fidelity to the original data is critical. MSE can be computed as:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2 \qquad (5)$$

where xi represents the predicted value and yi the ground truth value. Smaller MSE values
indicate better performance.
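A minimal NumPy sketch of this computation (array shapes and dtypes are our own assumptions):

```python
import numpy as np

def mse(x_rec, x_true):
    # Mean of squared element-wise differences; lower means a closer reconstruction.
    x_rec = np.asarray(x_rec, dtype=np.float64)
    x_true = np.asarray(x_true, dtype=np.float64)
    return float(np.mean((x_rec - x_true) ** 2))
```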
Learned perceptual image patch similarity (LPIPS) (Zhang et al. 2018): LPIPS measures
perceptual similarity by comparing representations in neural network feature spaces. It cap-
tures semantic and perceptual nuances better than traditional metrics. LPIPS is particularly
effective for evaluating image reconstructions in visually complex tasks. LPIPS is expressed
as:
$$\mathrm{LPIPS}(x, y) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\lVert \phi_l^{hw}(x) - \phi_l^{hw}(y) \right\rVert_2 \qquad (6)$$

where ϕl (·) represents features extracted from a pre-trained neural network at layer l; and
Hl and Wl are spatial dimensions of features at layer l.
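In practice, LPIPS is usually computed with the reference `lpips` package rather than re-implemented. The usage sketch below assumes that package and PyTorch are installed and that images are scaled to [-1, 1], as the reference implementation expects.

```python
import lpips
import torch

loss_fn = lpips.LPIPS(net="alex")        # AlexNet feature backbone
img0 = torch.rand(1, 3, 64, 64) * 2 - 1  # e.g., reconstructed image in [-1, 1]
img1 = torch.rand(1, 3, 64, 64) * 2 - 1  # e.g., target image in [-1, 1]
distance = loss_fn(img0, img1)           # lower = perceptually more similar
print(distance.item())
```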
Peak signal-to-noise ratio (PSNR) (Lin et al. 2005; Wang et al. 2004): PSNR is a tradi-
tional measure for quantifying the quality of reconstruction in lossy compression. It com-
pares the maximum possible signal power to the power of noise affecting the fidelity. PSNR
is obtained by:
$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right) \qquad (7)$$

where MAX denotes the maximum possible pixel value (e.g., 255 for 8-bit images), and
MSE the Mean Squared Error between the original and reconstructed images.
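A self-contained NumPy sketch (the 8-bit default for MAX is an assumption):

```python
import numpy as np

def psnr(x_rec, x_true, max_val=255.0):
    # Higher PSNR indicates higher reconstruction fidelity.
    err = np.mean((np.asarray(x_rec, dtype=np.float64)
                   - np.asarray(x_true, dtype=np.float64)) ** 2)
    return float("inf") if err == 0 else float(10.0 * np.log10(max_val ** 2 / err))
```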
Structural similarity index measure (SSIM) (Wang et al. 2003; Wang et al. 2004): Unlike
traditional error measures like MSE, SSIM is a perceptual metric for assessing image qual-
ity based on structural similarity. It measures the similarity between two images by compar-
ing their luminance, contrast, and structural information. SSIM assumes that human visual
systems are adapted to extract structural information from visual scenes. SSIM is calculated
by:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (8)$$

where µx , µy are mean intensities of images x and y , respectively; σx2 , σy2 are variances of
images x and y , respectively; σxy is the covariance between images x and y ; and C1 , C2 are
stabilizing constants.
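Rather than re-deriving the windowed statistics, SSIM is commonly computed with scikit-image; the snippet below is a usage sketch that assumes this library is available and that both images share the same dynamic range.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

original = np.random.rand(64, 64)
reconstructed = np.clip(original + 0.05 * np.random.randn(64, 64), 0.0, 1.0)
score = ssim(original, reconstructed, data_range=1.0)
print(score)  # 1.0 would mean structurally identical images
```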
Fréchet inception distance (FID) (Heusel et al. 2017): FID measures the similarity
between feature distributions of real and generated images using the Wasserstein-2 dis-
tance. FID is a widely used metric to evaluate the quality of generated images in tasks such
as GANs. FID is written as:

$$\mathrm{FID} = \lVert \mu_x - \mu_y \rVert_2^2 + \mathrm{Tr}\left(\Sigma_x + \Sigma_y - 2(\Sigma_x \Sigma_y)^{1/2}\right) \qquad (9)$$

where µx , µy are the mean vectors of features extracted from real and generated images,
respectively; Σx , Σy are covariance matrices of features extracted from real and generated
images, respectively; and Tr(Σx + Σy − 2(Σx Σy )1/2 ) computes the trace of the resulting
matrix from the covariance terms, capturing the spread and relationships between the real
and generated data distributions.
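A compact sketch of Eq. (9) given pre-extracted feature matrices (e.g., Inception-v3 activations); the feature-extraction step is omitted and assumed to have been performed elsewhere.

```python
import numpy as np
from scipy import linalg

def fid(feat_real, feat_gen):
    # feat_*: (n_samples, d) arrays of features from real and generated images.
    mu_x, mu_y = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    cov_x = np.cov(feat_real, rowvar=False)
    cov_y = np.cov(feat_gen, rowvar=False)
    cov_sqrt, _ = linalg.sqrtm(cov_x @ cov_y, disp=False)
    if np.iscomplexobj(cov_sqrt):
        cov_sqrt = cov_sqrt.real  # discard tiny imaginary parts from numerics
    return float(np.sum((mu_x - mu_y) ** 2)
                 + np.trace(cov_x + cov_y - 2.0 * cov_sqrt))
```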
Feature similarity index (FSIM) (Zhang et al. 2011; Shi et al. 2024): FSIM is a recent
metric that uses phase congruency (a measure of feature perception) and gradient magnitude
to evaluate image quality. FSIM captures human perception-based similarity and can be
computed as:

$$\mathrm{FSIM} = \frac{\sum_{p \in \Omega} PC_m(p) \cdot S(p)}{\sum_{p \in \Omega} PC_m(p)} \qquad (10)$$

where Ω is a set of pixels in the image; PCm (p) is the phase congruency value at pixel p
(perceptually motivated feature); and S(p) is the similarity of gradient magnitude and phase
congruency between two images.
Absolute variation distance (AVD) (Papadopoulos et al. 2024): AVD is a metric for eval-
uating data recovery and information leakage, specifically in the context of FL and inversion
attacks. It provides a method to compare the similarity between a reconstructed (or attacked)
image and its original counterpart by analyzing their spatial gradients. AVD is expressed as:

$$\mathrm{AVD}(v_{\mathrm{source}}, v_{\mathrm{target}}) = \left\lVert \left|\nabla v_{\mathrm{source}}\right| - \left|\nabla v_{\mathrm{target}}\right| \right\rVert + \left\lVert \left|\nabla^2 v_{\mathrm{source}}\right| - \left|\nabla^2 v_{\mathrm{target}}\right| \right\rVert \qquad (11)$$

where $\nabla v = \frac{\partial v}{\partial i} + \frac{\partial v}{\partial j}$ is the spatial gradient; $\nabla^2 v = \frac{\partial^2 v}{\partial i^2} + \frac{\partial^2 v}{\partial j^2}$ is the second-order spatial
gradient; and $v_{\mathrm{source}}$ and $v_{\mathrm{target}}$ are the source and target images, treated as 2D arrays of pixel
values.
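A NumPy sketch of Eq. (11); using `np.gradient` for the spatial derivatives and the Frobenius norm for the outer norm are our own assumptions rather than the authors' exact choices.

```python
import numpy as np

def avd(v_source, v_target):
    def first_order(v):
        gi, gj = np.gradient(np.asarray(v, dtype=np.float64))
        return gi + gj  # dv/di + dv/dj

    def second_order(v):
        gi, gj = np.gradient(np.asarray(v, dtype=np.float64))
        return np.gradient(gi, axis=0) + np.gradient(gj, axis=1)  # d2v/di2 + d2v/dj2

    term1 = np.linalg.norm(np.abs(first_order(v_source)) - np.abs(first_order(v_target)))
    term2 = np.linalg.norm(np.abs(second_order(v_source)) - np.abs(second_order(v_target)))
    return float(term1 + term2)
```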
Relative data leakage value (RDLV) (Hatamizadeh et al. 2023): RDLV is a metric for
quantifying and comparing the degree of data leakage in gradient inversion attacks within
FL systems. It is particularly useful for evaluating privacy risks across multiple clients
under varying privacy-preserving configurations. RDLV is defined as:

$$\mathrm{RDLV} = \frac{\mathrm{SSIM}(T_i, I_i) - \mathrm{SSIM}(T_i, P)}{\mathrm{SSIM}(T_i, P)} \qquad (12)$$

where Ti is the original training image; Ii the reconstructed image obtained from the gradi-
ent inversion attack; and P is the prior used in the gradient inversion attack (e.g., an initial-
ization image for reconstruction).
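A usage sketch of Eq. (12) built on scikit-image's SSIM, assuming all images are normalised to [0, 1]:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def rdlv(original, reconstruction, prior):
    # Positive values mean the reconstruction is closer to the training image
    # than the attack prior is, i.e., more private information has leaked.
    baseline = ssim(original, prior, data_range=1.0)
    return (ssim(original, reconstruction, data_range=1.0) - baseline) / baseline
```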
Image identifiability precision (IIP) (Yin et al. 2021): IIP quantifies the extent of “image-
specific” information disclosed through gradient inversion. It assesses how effectively a
specific image can be identified based solely on its reconstructed version among other simi-
lar images in the original dataset. Numerically, IIP is determined as the proportion of exact
matches between an original image and its closest neighbor in the reconstructed set.
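A simplified sketch of this matching step; for brevity it measures nearest neighbours in raw pixel space, whereas the original formulation operates on image features, so the distance space used here is an assumption.

```python
import numpy as np

def iip(originals, reconstructions):
    # Fraction of reconstructions whose nearest original image is the one they
    # were actually reconstructed from (arrays are assumed to be index-aligned).
    o = np.asarray(originals, dtype=np.float64).reshape(len(originals), -1)
    r = np.asarray(reconstructions, dtype=np.float64).reshape(len(reconstructions), -1)
    dists = ((r[:, None, :] - o[None, :, :]) ** 2).sum(axis=-1)
    return float(np.mean(dists.argmin(axis=1) == np.arange(len(r))))
```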

6 Datasets for MI attack research

Existing studies on MI attacks and defenses employ diverse datasets to evaluate attack effi-
cacy and defense mechanisms. These datasets contain different data types and complexi-
ties, facilitating a comprehensive assessment of privacy vulnerabilities and protections in
machine learning models. Below is a list of commonly used datasets in MI-related research.
MNIST—modified national institute of standards and technology database (LeCun
1998): MNIST is a large database of handwritten digits (0 through 9), consisting of 60,000
training images and 10,000 test images, each represented as 28x28 grayscale pixels. Works
using this database include (Wang et al. 2021; Zhao et al. 2020; Sun et al. 2021; Chen et al.
2021; Zhao et al. 2021; Titcombe et al. 2021; Peng et al. 2022; Scheliga et al. 2023; Nguyen
et al. 2023; Ren et al. 2023; Xu et al. 2024; Issa et al. 2024; Liu et al. 2024; Li et al. 2024;
Wang et al. 2024; Gao et al. 2024; Fan et al. 2024; Wu et al. 2024; Wang et al. 2022; Niu
et al. 2023; Liu et al. 2023; Wang et al. 2023; Wan et al. 2023; Yang et al. 2024; Xiao et al.
2024; Qi et al. 2024; Zhu et al. 2024b).
F-MNIST—fashion MNIST (Xiao et al. 2017): This dataset comprises 28x28 grayscale
images of 70,000 fashion products from 10 categories, with 7,000 images per category. The
training set has 60,000 images and the test set has 10,000 images. Works using this database
include (Gao et al. 2023; Chen et al. 2024; Issa et al. 2024; Wang et al. 2022; Xiao et al.
2024; Qi et al. 2024).
CIFAR-10—Canadian institute for advanced research—10 classes (Krizhevsky et al.
2010): This database consists of 60,000 color images (32x32) in 10 classes, with 6,000
images per class, divided into 50,000 training and 10,000 test images. Works using this
database include (Sun et al. 2021; Chen et al. 2021; Peng et al. 2022; Scheliga et al. 2022;
Lu et al. 2023; Scheliga et al. 2023; Nguyen et al. 2023; Ren et al. 2023; Xu et al. 2024; Hu
et al. 2024; Chen et al. 2024; Issa et al. 2024; Zhou et al. 2024a; Li et al. 2024; Wang et al.
2024; Gao et al. 2024; Fan et al. 2024; Wu et al. 2024; Xue et al. 2024; Wang et al. 2022;
Jiang et al. 2022; Yang et al. 2022; Wan et al. 2022; Zhang et al. 2023; Niu et al. 2023; Zhu
et al. 2023; Liu et al. 2023; Wan et al. 2023; Chu et al. 2023; Fan et al. 2023a; Noorbakhsh
et al. 2024; Yang et al. 2024; Qi et al. 2024; Zhu et al. 2024b).
CIFAR-100 (Krizhevsky and Hinton 2009): This dataset is similar to CIFAR-10 but
includes 100 distinct classes, each comprising 600 images. For each class, there are 500
images for training and 100 images for testing. The 100 classes are organized into 20
superclasses, with each image labeled at two levels: a “fine” label indicating its specific
class and a “coarse” label representing its broader superclass. Works using this database
include (Zhao et al. 2020; Gao et al. 2021; Scheliga et al. 2022; Lu et al. 2023; Ren et al.
2023; Hu et al. 2024; Chen et al. 2024; Zhou et al. 2024a; Xue et al. 2024; Wang et al. 2022;
Zhang et al. 2023; Wan et al. 2023; Fan et al. 2023a; Noorbakhsh et al. 2024; Wang et al.
2024; Zhu et al. 2024b).
LFW—Labeled faces in the wild (Huang et al. 2008): This dataset is a collection of face
photographs aimed at advancing research in unconstrained face recognition. Developed
and maintained by researchers at the University of Massachusetts, Amherst, the dataset
comprises 13,233 images of 5,749 individuals, with faces detected and centered using the
Viola-Jones face detector. These images were sourced from the web with 1,680 individuals
having two or more unique photographs included in the dataset. Works using this database
include (Zhao et al. 2020; Zhang et al. 2023; Cui et al. 2023).
CelebA—CelebFaces attributes dataset (Liu et al. 2015): This dataset is a large-scale
collection of over 200,000 celebrity images, each annotated with 40 binary attributes. It
includes 10,177 unique identities and 202,599 face images, with annotations for five facial
landmarks and 40 attributes per image. The dataset is immensely diverse, with variations in
pose and background clutter, making it suitable for a range of computer vision tasks such
as facial attribute recognition, face recognition, face detection, landmark localization, and
face editing or synthesis. Works using this database include (Wang et al. 2021; Chen et al.
2021; Zhao et al. 2021; Peng et al. 2022; Gao et al. 2023; Nguyen et al. 2023; MaungMaung
and Kiya 2023; Nguyen et al. 2024; Dibbo et al. 2024; Liu et al. 2024; Wu et al. 2024; Jiang
et al. 2022; Li et al. 2022; Cui et al. 2023; Yu et al. 2024).
ImageNet (Deng et al. 2009): ImageNet is a vast visual database created to support
research in visual object recognition. It includes over 14 million hand-annotated images,
with objects labeled to specify their contents, and bounding boxes provided for at least one
million images. Spanning more than 20,000 categories, such as “balloon” or “strawberry,”
each category typically contains several hundred images. ImageNet is extensively used for
training and benchmarking deep learning models. Works using this database include (Gao
et al. 2023; Fang et al. 2023b; MaungMaung and Kiya 2023; Li et al. 2022; Zhu et al.
2024b).
FFHQ—Flickr-faces-HQ (Karras 2019): This is a high-resolution image dataset featur-
ing 70,000 PNG images of human faces at a resolution of 1024×1024 pixels. The data-
set exhibits significant diversity in age, ethnicity, and backgrounds, along with extensive
coverage of accessories such as eyeglasses, sunglasses, and hats, making it well-suited for
applications in face-related research. Works using this database include (Nguyen et al. 2023;
Fang et al. 2023b; Nguyen et al. 2024).
ChestX-ray8 (Wang et al. 2017): This is a medical imaging dataset containing 108,948
frontal-view X-ray images from 32,717 patients, collected between 1992 and 2015. It
includes eight common disease labels extracted from radiological reports using NLP tech-
niques, making it a valuable resource for medical image analysis and disease diagnosis
research. Works using this database include (Wang et al. 2021; Chen et al. 2021).
UBMD—UCI bank marketing dataset (Moro et al. 2014): This dataset is designed to
predict the likelihood of clients subscribing to deposits. It consists of 41,188 instances with
17-dimensional bank data, providing a comprehensive basis for analysis and modeling.
Works using this database include (Yang et al. 2022).
LDC—Lesion disease classification (Tschandl et al. 2018): LDC has 8,000 training
images and 2,000 test images of skin lesions, intended for the classification of various skin
diseases. Works using this database include (Yang et al. 2022).

AT&T—the database of faces: This dataset contains 400 grayscale images of 40 subjects,
each with 10 images taken under varying conditions such as lighting, facial expressions, and
details (e.g., glasses). Each image is of 92x112 pixels with 256 gray levels, organized in 40
directories by subject, totaling 4.5 MB in size. Originally created between 1992 and 1994
for a face recognition project, this dataset provides a consistent dark background and frontal
face orientation, making it ideal for facial recognition research. Works using this database
include (Melis et al. 2019).
SVHN—street view house numbers (Netzer et al. 2011): SVHN is a benchmark dataset
for digit classification, consisting of 600,000 32×32 RGB images of printed digits (0–9)
cropped from house number plates. The images center around the target digit while retain-
ing nearby digits and other visual elements. This dataset is divided into three subsets: train-
ing, testing, and an additional set with 530,000 less challenging images to assist in the
training process. Works using this database include (Fan et al. 2023a).

7 Challenges and future research directions

While substantial progress has been made in understanding and mitigating MI attacks,
many challenges remain. This section discusses some of these challenges and correspond-
ing future research directions.

7.1 Balancing privacy and model utility

Balancing privacy and model utility is a pressing challenge in developing defenses against
MI attacks. Existing methods (e.g., Ho et al. 2024; Peng et al. 2022) often face a trade-
off: enhancing privacy typically leads to reduced model utility, and vice versa. Effective
defenses against MI attacks must innovate methods that reduce privacy leakage without
compromising model utility. Although continued advancements are needed to refine this
balance, approaches such as random erasing (Tran et al. 2024), adversarial noise (Wen et al.
2021), and mutual information regularization (Wang et al. 2021) look promising.
Future research: To address this challenge, research efforts can be directed at developing
adaptive techniques that dynamically adjust privacy-preserving mechanisms based on appli-
cation-specific requirements. Innovations such as multi-objective optimisation methods
would enable simultaneous minimization of privacy leakage and maximization of model
utility. Furthermore, collaboration across fields, including deep learning, cryptography, and
statistics, will be critical in creating flexible and robust frameworks that ensure privacy
without deteriorating model performance.

7.2 Defining realistic threat models

Defining realistic threat models for MI attacks is a complex task. As adversarial techniques
evolve, traditional threat models often fail to capture the breadth of potential risks, especially
in real-world scenarios characterized by noisy data, diverse datasets, and rapidly advancing
architectures. One of the primary difficulties in establishing comprehensive threat models
is accounting for the adaptability of MI attacks across diverse neural network architectures,
such as transformers and multimodal models. Current research shows that MI attacks can
exploit architecture-specific vulnerabilities, but the extent of these risks remains underex-
plored (Zhang et al. 2022; Li et al. 2024). Additionally, most existing threat models are
tested under controlled laboratory conditions, which do not always reflect the complexities
of real-world deployments. Practical factors, such as heterogeneous datasets, capabilities of
the adversary, and model structures, significantly affect the robustness and adaptability of
MI attacks (Ye et al. 2024; Wang et al. 2021).
Future research: It is important for researchers to establish dynamic and flexible
frameworks based on realistic threat models. Such frameworks should incorporate: (1)
diverse attack vectors, including those exploiting new architectures like transformers and
GANs (Chen et al. 2020); (2) robust testing against real-world conditions, such as datasets
that are noisier and more heterogeneous than commonly used ones (e.g., CelebA (Peng et al.
2022)) for MI attack evaluations; and (3) development of defensive strategies that balance
model utility and privacy under evolving threat scenarios (Wen et al. 2021; Tran et al. 2024).

7.3 Lack of scalability and generalizability of defenses

Defensive strategies against MI attacks often struggle with scalability and generalizabil-
ity, particularly in complex, large-scale systems. Many existing MI defenses, such as dif-
ferential privacy and homomorphic encryption, are effective in controlled scenarios but
face challenges when applied to large-scale datasets or sophisticated architectures. These
approaches often require substantial computational resources, which limit their applicability
in real-time or resource-constrained environments (Wang et al. 2015).
Another critical issue lies in the specificity of defenses, which are frequently tailored to
address vulnerabilities in particular architectures or datasets. For example, defenses such as
mutual information regularization are effective for certain neural networks but may fail to
generalize to architectures like transformers or graph neural networks. Similarly, the imple-
mentation of differential privacy often leads to degraded model utility when extended to
large-scale datasets, making it less feasible for practical applications (Wang et al. 2021;
Peng et al. 2022).
Future research: To address the challenge of the lack of scalability and generalizability,
researchers need to explore more adaptable mechanisms (Ho et al. 2024), such as TL-based
defenses and adversarial training, which aim to generalize across architectures and domains
while retaining performance (Wen et al. 2021). However, achieving scalable and generaliz-
able defenses that can be applied to large-scale systems without significant computational
trade-offs is a challenging issue, necessitating further investigation into lightweight, archi-
tecture-independent approaches.

7.4 Domain-specific challenges

MI attacks pose great risks across many application domains, demanding tailored defen-
sive strategies to mitigate their impact. For example, in healthcare, deep learning models
are often trained on sensitive data in diverse formats, such as electronic health records in
tabular form and medical X-ray images. While MI attacks need to adapt to these varying
data formats, effective defenses in this domain must also accommodate such diversity. At
the same time, it is necessary to strike a balance between preserving privacy and ensuring
accurate diagnostics and treatment recommendations, thereby safeguarding patient information (Kaissis et al. 2021).
IoT and smart systems represent another domain where MI attacks can exploit compu-
tationally constrained edge devices. These systems, such as smart home devices or indus-
trial IoT platforms, often lack the computational capacity for traditional, resource-intensive
defenses. Lightweight encryption or pruning methods can help mitigate these vulnerabili-
ties, but they must be adapted to the unique resource and latency requirements of edge com-
puting (Alhalabi 2023). Given these constraints, conventional encryption techniques may
be impractical, increasing the susceptibility of these devices to inference attacks. To address
this, lightweight encryption schemes and optimized security protocols are essential, as they
can enhance data protection while ensuring efficient transmission and real-time responsive-
ness (Al-Hejri et al. 2024).
Future research: To tackle these challenges, domain-specific defense schemes must be
developed. These schemes should involve privacy-preserving techniques supported by an
understanding of varying needs (e.g., different data types or resource-limited devices) in
each field, ensuring both robustness against MI attacks and usability in real-world scenarios.
Focusing on tailored approaches will make future studies better align the goals of privacy,
utility, and practicality in diverse application domains. In the case of IoT and edge comput-
ing, further exploration of lightweight cryptographic methods, secure data transmission pro-
tocols, and scalable privacy-preserving techniques is necessary to minimize vulnerabilities
without overburdening resource-limited devices (Singh et al. 2024), especially when attach-
ing AI accelerators to these devices is impractical. Additionally, interdisciplinary efforts
that integrate security, privacy, and real-world usability will be critical in ensuring that MI
defenses align with the practical needs of different applications.

7.5 Lack of standardized metrics, diverse datasets and open-source repositories for
MI attacks and defenses

The lack of standardized metrics for evaluating MI attacks and defenses continues to impede
progress in the field of MI-related research. Establishing clear and consistent metrics for
assessing data leakage, attack success rates, and defense robustness is essential for fair com-
parisons across research works. Existing evaluation methods vary drastically, as shown in
Sect. 5, making it challenging to identify universally effective solutions. For instance, the
fidelity-related metrics (e.g., MSE and PSNR) of reconstructed data are often employed to
assess MI attack success, but these metrics lack consistency in their implementation across
studies (Fredrikson et al. 2015).
The development of representative and diverse benchmark datasets is also important for
testing MI defenses under realistic conditions. Publicly available datasets, such as CIFAR-
10 or MNIST, are frequently used in MI research but fail to capture the complexities of
real-world situations, such as those in healthcare or finance, where representative datasets
are lacking, as shown in Sect. 6. Developing domain-specific datasets and evaluation pro-
tocols tailored to sensitive applications would significantly enhance the relevance of MI
research (Huang et al. 2021).
In addition, the adoption of open-source repositories for evaluating MI attacks and
defenses can facilitate reproducibility and collaboration within the research community.
Such repositories would allow researchers to compare methods directly and transparently,
leading to more robust and innovative solutions. Recent efforts, such as those incorporating
FL and differential privacy techniques, highlight the importance of comprehensive testing
environments for understanding trade-offs between privacy and utility (Zhou et al. 2024b).
Future research: Researchers should prioritize the design of standardized metrics to
quantify data leakage and assess defense robustness consistently across studies (Qiu et al.
2024). Creating representative domain-specific datasets, alongside synthetic datasets for
controlled testing, will enhance the realism and reproducibility of evaluations (Huang et al.
2021). Although a few open-source repositories exist (Zhou et al. 2024b), each has its own
limitations. There is still a need for more open-source repositories and simulation environ-
ments to enable transparent and collaborative testing of MI defenses under diverse condi-
tions. One of the main contributions of this work is the development of a comprehensive
repository of state-of-the-art research articles, datasets, evaluation metrics, and other essen-
tial resources to address this need.

7.6 Legal and ethical considerations

MI attacks pose profound legal and ethical challenges that demand interdisciplinary solu-
tions. Compliance with global privacy regulations such as GDPR, CCPA, and HIPAA is
imperative to address these challenges (Veale et al. 2018). These regulations require that
defensive strategies not only mitigate risks of data reconstruction but also align with the
principles of lawful and ethical data handling, ensuring individuals’ privacy rights are pre-
served (Nguyen 2024). Moreover, ethical AI development necessitates balancing robust
data protection with transparency, accountability, and fairness. This balance is particularly
crucial in sensitive domains like healthcare, where safeguarding patient data must coexist
with advancing diagnostic tools (Thapa 2024).
Another pressing concern is the potential misuse of MI-related research by malicious
actors, emphasizing the need for responsible dissemination of findings. This requires set-
ting clear guidelines on publishing sensitive results to avoid enabling adversarial exploita-
tion while maintaining the spirit of open research (Hung 2023). Additionally, collaboration
between technical, legal, and ethical experts is crucial to develop frameworks that holis-
tically address these challenges. Initiatives that integrate privacy-preserving AI technolo-
gies, compliance checks, and ethical design principles can help produce socially responsible
solutions for MI defense (Makhdoom et al. 2024).
Future research: Research efforts should be devoted to developing frameworks that
enable transparency and accountability without compromising data security, particularly
in high-stakes fields such as healthcare and finance (Thapa 2024). To mitigate the risks of
adversarial misuse, researchers must establish guidelines for responsibly disseminating sen-
sitive findings while encouraging open scientific collaboration (Hung 2023). Additionally,
interdisciplinary collaboration between technologists, ethicists, and legal experts will be
essential to advance socially responsible AI development. Such partnerships can support the
creation of adaptive and globally relevant frameworks that balance ethical considerations
with technical innovation (Makhdoom et al. 2024).

8 Conclusion

This survey has provided a comprehensive overview of MI attacks and corresponding
defense mechanisms, offering a structured taxonomy of MI attacks based on diverse tech-
niques and an in-depth review of their applications across key domains such as biometric
recognition, healthcare, and finance. Through detailed analysis and discussion, we empha-
size the increasing importance of developing robust defenses to protect sensitive data in deep
learning systems. While summarizing existing defense strategies, we recognize that many
current approaches face limitations in terms of scalability, generalizability, and maintaining
model utility. Furthermore, most defenses are evaluated under simplified threat models that
may not fully capture real-world adversarial capabilities. These limitations highlight the
need for continued research in designing more practical and adaptable solutions.
We also identify several promising directions for future work. These include defining
realistic threat models, enhancing the explainability of defense techniques, and addressing
domain-specific challenges. By tackling these issues, we can pave the way toward develop-
ing more secure deep learning systems that are resilient to MI attacks.

Acknowledgements This research was supported in part by an Australian Research Council (ARC) Discovery
Project Grant: DP230102828.

Author contributions Wencheng Yang and Song Wang wrote the main manuscript text. Shicheng Wei cre-
ated the resource repository on github for this paper. Di Wu, Taotao Cai, Yanming Zhu, Yiying Zhang, Xu
Yang, Zhaohui Tang, and Yan Li discussed and revised the manuscript. All authors reviewed the manuscript.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions

Data availability No datasets were generated or analysed during the current study.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as
you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons
licence, and indicate if changes were made. The images or other third party material in this article are
included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
Antwi-Boasiako E, Zhou S, Liao Y, Dong Y (2023) Privacy-preserving distributed deep learning via LWE-
based certificateless additively homomorphic encryption (cahe). J Inform Secur Appl 74:103462
Al-Hejri I, Azzedin F, Almuhammadi S, Eltoweissy M (2024) Lightweight secure and scalable scheme for
data transmission in the internet of things. Arab J Sci Eng 49(9):12919–12934
Alufaisan Y, Kantarcioglu M, Zhou Y (2020) Robust transparency against model inversion attacks. IEEE
Trans Dependable Secure Comput 18(5):2061–2073

Alhalabi B (2023) Ensembles of pruned deep neural networks for accurate and privacy preservation in iot
applications. Phd thesis, Birmingham City University. http://www.open-access.bcu.ac.uk/15070/
Ahmad S, Mahmood K, Fuller B (2022) Inverting biometric models with fewer samples: Incorporating the
output of multiple models. 2022 IEEE International Joint Conference on Biometrics (IJCB). IEEE,
Piscataway, USA, pp 1–11
An S, Tao G, Xu Q, Liu Y, Shen G, Yao Y, Xu J, Zhang X (2022) Mirror: Model inversion for deep learning
network with high fidelity. In: Proceedings of the 29th Network and Distributed System Security Sym-
posium. https://par.nsf.gov/servlets/purl/10376663
Bishop CM, Nasrabadi NM (2006) Pattern Recognition and Machine Learning vol. 4. Springer, Cham, Swit-
zerland. https://link.springer.com/book/9780387310732
Chen Y, Abrahamyan L, Sahli H, Deligiannis N (2024) Learned model compression for efficient and privacy-
preserving federated learning. Authorea Preprints
Carlini N, Hayes J, Nasr M, Jagielski M, Sehwag V, Tramer F, Balle B, Ippolito D, Wallace E (2023) Extract-
ing training data from diffusion models. In: 32nd USENIX Security Symposium (USENIX Security 23),
pp. 5253–5270. https://www.usenix.org/conference/usenixsecurity23/presentation/carlini
Chen S, Jia R, Qi G-J (2020) Improved techniques for model inversion attacks. https://openreview.net/forum?id=unRf7cz1o1
Chen S, Kahla M, Jia R, Qi G-J (2021) Knowledge-enriched distributional model inversion attacks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 16178–16187. http://openaccess.thecvf.com/content/ICCV2021/html/Chen_Knowledge-Enriched_Distributional_Model_Inversion_Attacks_ICCV_2021_paper.html
Chen Y, Lent H, Bjerva J (2024) Text embedding inversion attacks on multilingual language models. Preprint
at arXiv:2401.12192
Cui Y, Meerza SIA, Li Z, Liu L, Zhang J, Liu J (2023) Recup-fl: Reconciling utility and privacy in feder-
ated learning via user-configurable privacy defense. In: Proceedings of the ACM Asia Conference on
Computer and Communications Security, pp 80–94. ACM, Melbourne VIC Australia. https://doi.org/10.1145/3579856.3582819
Chu T, Yang M, Laoutaris N, Markopoulou A (2023) Priprune: Quantifying and preserving privacy in pruned
federated learning. Preprint at arXiv:2310.19958
Chen Y, Zhang J, Bi Y, Hu X, Hu T, Xue Z, Yi R, Liu Y, Tai Y (2025) Image inversion: a survey from gans to
diffusion and beyond https://doi.org/10.48550/arXiv.2502.11974. arXiv:2502.11974
Dibbo SV, Breuer A, Moore J, Teti M (2024) Improving robustness to model inversion attacks via sparse
coding architectures. European Conference on Computer Vision (ECCV 2024)
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image data-
base. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Ieee, Pis-
cataway, USA. https://ieeexplore.ieee.org/abstract/document/5206848/
Dibbo SV (2023) Sok: Model inversion attack landscape: Taxonomy, challenges, and future roadmap. In:
2023 IEEE 36th Computer Security Foundations Symposium (CSF), pp 439–456. IEEE, Piscataway,
USA. https://ieeexplore.ieee.org/abstract/document/10221914/
Dai C, Lu L, Zhou P (2025) Stealing training data from large language models in decentralized training
through activation inversion attack https://doi.org/10.48550/arXiv.2502.16086. arXiv:2502.16086
Dao T-N, Nguyen TP (2024) Performance analysis of gradient inversion attack in federated learning with
healthcare systems. REV J Electron Commun. https://doi.org/10.21553/rev-jec.338
Dodge Y (2008) The concise encyclopedia of statistics. Springer, Cham
Ding S, Zhang L, Pan M, Yuan X (2024) Patrol: Privacy-oriented pruning for collaborative inference against
model inversion attacks. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Com-
puter Vision, pp 4716–4725. https://openaccess.thecvf.com/content/WACV2024/html/Ding_PATROL_Privacy-Oriented_Pruning_for_Collaborative_Inference_Against_Model_Inversion_Attacks_WACV_2024_paper.html
Fan M, Chen C, Wang C, Li X, Zhou W, Huang J (2023) Refiner: Data refining against gradient leakage
attacks in federated learning. Preprint at arXiv:2212.02042
Fang H, Chen B, Wang X, Wang Z, Xia S-T (2023) Gifd: A generative gradient inversion method with feature
domain optimization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision,
pp 4967–4976. http://openaccess.thecvf.com/content/ICCV2023/html/Fang_GIFD_A_Generative_Gradient_Inversion_Method_with_Feature_Domain_Optimization_ICCV_2023_paper.html
Fredrikson M, Jha S, Ristenpart T (2015) Model inversion attacks that exploit confidence information and basic
countermeasures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communica-
tions Security, pp 1322–1333. ACM, Denver Colorado USA. https://doi.org/10.1145/2810103.2813677

Fan M, Liu Y, Chen C, Wang C, Qiu M, Zhou W (2024) Guardian: Guarding against gradient leakage with prov-
able defense for federated learning. In: Proceedings of the 17th ACM International Conference on Web
Search and Data Mining, pp 190–198. ACM, Merida Mexico. https://doi.org/10.1145/3616855.3635758
Fredrikson M, Lantz E, Jha S, Lin S, Page D, Ristenpart T (2014) Privacy in pharmacogenetics: An End-
to-End case study of personalized warfarin dosing. In: 23rd USENIX Security Symposium (USENIX
Security 14), pp 17–32. https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/fredrikson_matthew
Feretzakis G, Papaspyridis K, Gkoulalas-Divanis A, Verykios VS (2024) Privacy-preserving techniques in
generative ai and large language models: a narrative review. Information 15(11):697
Fang H, Qiu Y, Yu H, Yu W, Kong J, Chong B, Chen B, Wang X, Xia S-T, Xu K (2024) Privacy leakage on
DNNS: a survey of model inversion attacks and defenses. Preprint at arXiv:2402.04013
Geiping J, Bauermeister H, Dröge H, Moeller M (2020) Inverting gradients-how easy is it to break privacy
in federated learning? Adv Neural Inf Process Syst 33:16937–16947
Goldsteen A, Ezov G, Farkash A (2020) Reducing risk of model inversion using privacy-guided training.
Preprint at arXiv:2006.15877
Gao W, Guo S, Zhang T, Qiu H, Wen Y, Liu Y (2021) Privacy-preserving collaborative learning with auto-
matic transformation search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pp 114–123. http://openaccess.thecvf.com/content/CVPR2021/html/Gao_Privacy-Preserving_Collaborative_Learning_With_Automatic_Transformation_Search_CVPR_2021_paper.html
Gong H, Jiang L, Liu X, Wang Y, Gastro O, Wang L, Zhang K, Guo Z (2023) Gradient leakage attacks in
federated learning. Artif Intell Rev 56(S1):1337–1374. https://doi.org/10.1007/s10462-023-10550-z
Galloway T, Karakolios K, Ma Z, Perdisci R, Keromytis A, Antonakakis M (2024) Practical attacks against
DNS reputation systems. In: 2024 IEEE Symposium on Security and Privacy (SP), pp 233–233. IEEE
Computer Society, Piscataway, USA. https://tillsongalloway.com/sp2024winter-final30.pdf
Goodfellow I (2016) Deep learning. MIT press
Guo P, Zeng S, Chen W, Zhang X, Ren W, Zhou Y, Qu L (2024) A new federated learning framework against
gradient inversion attacks https://doi.org/10.48550/arXiv.2412.07187. arXiv:2412.07187
Gao W, Zhang X, Guo S, Zhang T, Xiang T, Qiu H, Wen Y, Liu Y (2023) Automatic transformation search
against deep leakage from gradients. IEEE Trans Pattern Anal Mach Intell 45(9):10650–10668
Gao K, Zhu T, Ye D, Zhou W (2024) Defending against gradient inversion attacks in federated learning via
statistical machine unlearning. Knowl Based Syst 299:111983
Han G, Choi J, Lee H, Kim J (2023) Reinforcement learning-based black-box model inversion attacks. In:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 20504–
20513. http://openaccess.thecvf.com/content/CVPR2023/html/Han_Reinforcement_Learning-Based_Black-Box_Model_Inversion_Attacks_CVPR_2023_paper.html
Huang Y, Gupta S, Song Z, Li K, Arora S (2021) Evaluating gradient inversion attacks and defenses in feder-
ated learning. Adv Neural Inf Process Syst 34:7232–7241
Ho S-T, Hao KJ, Chandrasegaran K, Nguyen N-B, Cheung N-M (2024) Model inversion robustness: can
transfer learning help? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pp 12183–12193. https://openaccess.thecvf.com/content/CVPR2024/html/Ho_Model_Inversion_Robustness_Can_Transfer_Learning_Help_CVPR_2024_paper.html
Huang J, Hong C, Chen LY, Roos S (2024) Gradient inversion of federated diffusion models. https://doi.org/10.48550/arXiv.2405.20380. arXiv:2405.20380
Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: A database forstudying
face recognition in unconstrained environments. In: Workshop on Faces in’Real-Life’Images: Detec-
tion, Alignment, and Recognition. https://inria.hal.science/inria-00321923/document
He Y, Meng G, Chen K, Hu X, He J (2020) Towards security threats of deep learning systems: a survey. IEEE
Trans Software Eng 48(5):1743–1770
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) Gans trained by a two time-scale
update rule converge to a local nash equilibrium. Adv Neural Inform Process Syst 30
Hung J (2023) Models as personal data. Available at SSRN 4504856
Hu H, Wang S, Dong T, Xue M (2024) Learn what you want to unlearn: unlearning inversion attacks against
machine unlearning. In: 2024 IEEE Symposium on Security and Privacy (SP), pp 262–262. IEEE,
Piscataway, USA. https://www.computer.org/csdl/proceedings-article/sp/2024/313000a262/1WPcZmo5z2w
Huang Y, Wang Y, Li J, Yang L, Song K, Wang L (2024) Adaptive hybrid masking strategy for privacy-
preserving face recognition against model inversion attack. Preprint at arXiv:2403.10558
Hatamizadeh A, Yin H, Molchanov P, Myronenko A, Li W, Dogra P, Feng A, Flores MG, Kautz J, Xu D
et al (2023) Do gradient inversion attacks make federated learning unsafe? IEEE Trans Med Imaging
42(7):2044–2056

Issa W, Moustafa N, Turnbull B, Choo K-KR (2024) Rve-pfl: Robust variational encoder-based personalised
federated learning against model inversion attacks. IEEE Transactions on Information Forensics and
Security
Jang J, Lyu H, Yang HJ (2023) Patch-mi: Enhancing model inversion attacks via patch-based reconstruction.
Preprint at arXiv:2312.07040
Jiang Y, Wang S, Valls V, Ko BJ, Lee W-H, Leung KK, Tassiulas L (2022) Model pruning enables efficient
federated learning on edge devices. IEEE Transact Neural Netw Learn Syst 34(12):10374–10386
Karras T (2019) A style-based generator architecture for generative adversarial networks. Preprint at
arXiv:1812.04948
Kahla M, Chen S, Just HA, Jia R (2022) Label-only model inversion attacks via boundary repulsion. In: Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 15045–15053. http://openaccess.thecvf.com/content/CVPR2022/html/Kahla_Label-Only_Model_Inversion_Attacks_via_Boundary_Repulsion_CVPR_2022_paper.html
Krall A, Finke D, Yang H (2020) Gradient mechanism to preserve differential privacy and deter against model
inversion attacks in healthcare analytics. In: 2020 42nd Annual International Conference of the IEEE
Engineering in Medicine & Biology Society (EMBC), pp 5714–5717. IEEE, Piscataway, USA
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images
Krizhevsky A, Nair V, Hinton G (2010) Cifar-10 (Canadian Institute for Advanced Research) 5(4):1. http://www.cs.toronto.edu/kriz/cifar.html
Khosravy M, Nakamura K, Hirose Y, Nitta N, Babaguchi N (2021) Model inversion attack: analysis under
gray-box scenario on deep learning based face recognition system. KSII Transact Internet Inform Syst
(TIIS) 15(3):1100–1118
Khosravy M, Nakamura K, Hirose Y, Nitta N, Babaguchi N (2022) Model inversion attack by integration of
deep generative models: privacy-sensitive face generation from a face recognition system. IEEE Trans
Inf Forensics Secur 17:357–372
Kaissis G, Ziller A, Passerat-Palmbach J, Ryffel T, Usynin D, Trask A, Lima I Jr, Mancuso J, Jungmann
F, Steinborn M-M (2021) End-to-end privacy preserving deep learning on multi-institutional medical
imaging. Nat Mach Intell 3(6):473–484
Li H, Chen Y, Luo J, Wang J, Peng H, Kang Y, Zhang X, Hu Q, Chan C, Xu Z, Hooi B, Song Y (2024) Privacy
in large language models: Attacks, defenses and future directions. https://doi.org/10.48550/arXiv.2310.10383. arXiv:2310.10383
Lin W, Dong L, Xue P (2005) Visual distortion gauge based on discrimination of noticeable contrast changes.
IEEE Trans Circuits Syst Video Technol 15(7):900–909
LeCun Y (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
Lin Y, Han S, Mao H, Wang Y, Dally B (2018) Deep gradient compression: Reducing the communication
bandwidth for distributed training. In: International Conference on Learning Representations. https://openreview.net/forum?id=SkhQHMW0W
Li O, Hao Y, Wang Z, Zhu B, Wang S, Zhang Z, Feng F (2024) Model inversion attacks through target-
specific conditional diffusion models. https://doi.org/10.48550/arXiv.2407.11424. arXiv:2407.11424
Liu H, Li B, Gao C, Xie P, Zhao C (2023) Privacy-encoded federated learning against gradient-based data
reconstruction attacks. IEEE Trans Inf Forensics Secur 18:5860–5875
Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In: Proceedings of the IEEE
International Conference on Computer Vision, pp 3730–3738. http://openaccess.thecvf.com/content_iccv_2015/html/Liu_Deep_Learning_Face_ICCV_2015_paper.html
Liang H, Li Y, Zhang C, Liu X, Zhu L (2023) EGIA: An external gradient inversion attack in federated learn-
ing. IEEE Transactions on Information Forensics and Security
Li J, Rakin AS, Chen X, He Z, Fan D, Chakrabarti C (2022) RESSFL: a resistance transfer framework for
defending model inversion attack in split federated learning. In: Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition, pp 10194–10202
Liu Y-H, Shen Y-C, Chen H-W, Chen M-S (2024) Construct a secure cnn against gradient inversion attack.
In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Cham, pp 250–261
Liu S, Wang Z, Lei Q (2024) Data reconstruction attacks and defenses: a systematic evaluation. Preprint at
arXiv:2402.09478
Liu R, Wang D, Ren Y, Wang Z, Guo K, Qin Q, Liu X (2024) Unstoppable attack: Label-only model inversion
via conditional diffusion model. IEEE Transactions on Information Forensics and Security
Lu J, Xue L, Wan W, Li M, Zhang LY, Hu S (2023) Preserving privacy of input features across all stages of
collaborative learning. In: 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications,
Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Net-
working (ISPA/BDCloud/SocialCom/SustainCom). IEEE, Piscataway, pp 191–198. https://ieeexplore.ieee.org/abstract/document/10491741/

Li W, Yu P, Cheng Y, Yan J, Zhang Z (2024) Efficient and privacy-enhanced federated learning based on
parameter degradation. IEEE Transactions on Services Computing, pp 1–16. https://doi.org/10.1109/TSC.2024.3399659
Liu T, Yao H, Wu T, Qin Z, Lin F, Ren K, Chen C (2024) Mitigating privacy risks in LLM embeddings from
embedding inversion. Preprint at https://doi.org/10.48550/arXiv.2411.05034
Luo Z, Zhu C, Fang L, Kou G, Hou R, Wang X (2022) An effective and practical gradient inversion attack.
Int J Intell Syst 37(11):9373–9389
Li Z, Zhang J, Liu L, Liu J (2022) Auditing privacy defenses in federated learning via generative gradient
leakage. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
pp 10132–10142. http://openaccess.thecvf.com/content/CVPR2022/html/Li_Auditing_Privacy_Defenses_in_Federated_Learning_via_Generative_Gradient_Leakage_CVPR_2022_paper.html
Makhdoom I, Abolhasan M, Lipman J, Shariati N, Franklin D, Piccardi M (2024) Securing personally iden-
tifiable information: a survey of Sota techniques, and a way forward. IEEE Access
Moro S, Cortez P, Rita P (2014) A data-driven approach to predict the success of bank telemarketing. Decis
Support Syst 62:22–31
Milner L (2024) Threat models to machine unlearning
MaungMaung A, Kiya H (2023) Generative model-based attack on learnable image encryption for privacy-
preserving deep learning. Preprint at https://doi.org/10.48550/arXiv.2303.05036
Masuda H, Kita K, Koizumi Y, Takemasa J, Hasegawa T (2021) Model fragmentation, shuffle and aggrega-
tion to mitigate model inversion in federated learning. In: 2021 IEEE International Symposium on Local
and Metropolitan Area Networks (LANMAN). IEEE, Piscataway, pp 1–6
Melis L, Song C, De Cristofaro E, Shmatikov V (2019) Exploiting unintended feature leakage in collabora-
tive learning. In: 2019 IEEE Symposium on Security and Privacy (SP). IEEE, Piscataway, pp 691–706. https://ieeexplore.ieee.org/abstract/document/8835269/
Madono K, Tanaka M, Onishi M, Ogawa T (2021) SIA-GAN: Scrambling inversion attack using generative
adversarial network. IEEE Access 9:129385–129393
Morris JX, Zhao W, Chiu JT, Shmatikov V, Rush AM (2023) Language model inversion. https://doi.org/10.48550/arXiv.2311.13647. arXiv:2311.13647
Nguyen N-B, Chandrasegaran K, Abdollahzadeh M, Cheung N-M (2023) Re-thinking model inversion
attacks against deep neural networks. In: Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pp 16384–16393
Nguyen B-N, Chandrasegaran K, Abdollahzadeh M, Cheung N-MM (2024) Label-only model inversion
attacks via knowledge transfer. Adv Neural Inform Process Syst 36
Nguyen K (2024) Enhancing data privacy in artificial intelligence
Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng AY (2011) Reading digits in natural images with unsuper-
vised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning, vol.
2011. Granada, London, p 4. http://research.google.com/pubs/archive/37648.pdf
Niu B, Wang X, Zhang L, Guo S, Cao J, Li F (2023) A sensitivity-aware and block-wise pruning method
for privacy-preserving federated learning. In: GLOBECOM 2023-2023 IEEE Global Communications
Conference. IEEE, Piscataway, pp 4259–4264. https://ieeexplore.ieee.org/abstract/document/10437766/
Nguyen TPV, Yang W, Tang Z, Xia X, Mullens AB, Dean JA, Li Y (2024) Lightweight federated learning for
STIS/HIV prediction. Sci Rep 14(1):6560
Noorbakhsh SL, Zhang B, Hong Y, Wang B (2024) Inf2Guard: An Information-Theoretic framework for
learning Privacy-Preserving representations against inference attacks. In: 33rd USENIX Security Sym-
posium (USENIX Security 24), pp 2405–2422. https://www.usenix.org/conference/usenixsecurity24/presentation/noorbakhsh
Ovi PR, Dey E, Roy N, Gangopadhyay A (2023) Mixed quantization enabled federated learning to tackle
gradient inversion attacks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition, pp 5046–5054
Pizzi K, Boenisch F, Sahin U, Böttinger K (2023) Introducing model inversion attacks on automatic speaker
recognition. Preprint at arXiv:2301.03206
Pang S, Chen Y, Deng J, Wu J, Bai Y, Xu W (2024) Adversarial for good-defending training data privacy
with adversarial attack wisdom. In: 2024 IEEE International Conference on Metaverse Computing,
Networking, and Applications (MetaCom), pp. 190–197. IEEE, Piscataway, USA. https://ieeexplore.ieee.org/abstract/document/10740022/
Petrov I, Dimitrov DI, Baader M, Müller MN, Vechev M (2024) DAGER: Exact gradient inversion for large
language models. Preprint at arXiv:2405.15586

Prakash P, Ding J, Li H, Errapotu SM, Pei Q, Pan M (2020) Privacy preserving facial recognition against
model inversion attacks. In: GLOBECOM 2020-2020 IEEE Global Communications Conference.
IEEE, Piscataway, pp 1–6
Pei W, Li Y, Siuly S, Wen P (2022) A hybrid deep learning scheme for multi-channel sleep stage classifica-
tion. Comput Mater Continua 71(1):889–905
Peng X, Liu F, Zhang J, Lan L, Ye J, Liu T, Han B (2022) Bilateral dependency optimization: Defending against
model-inversion attacks. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Dis-
covery and Data Mining. ACM, Washington, pp 1358–1367. https://doi.org/10.1145/3534678.3539376
Papadopoulos G, Satsangi Y, Eloul S, Pistoia M (2024) In: Goharian N, Tonellotto N, He Y, Lipani A,
McDonald G, Macdonald C, Ounis I (eds.) Absolute Variation Distance: An Inversion Attack Evalua-
tion Metric for Federated Learning. Lecture Notes in Computer Science, vol. 14611. Springer, Cham,
pp 243–256. https://doi.org/10.1007/978-3-031-56066-8_20
Palihawadana C, Wiratunga N, Kalutarage H, Wijekoon A (2023) Mitigating gradient inversion attacks in
federated learning with frequency transformation. In: European Symposium on Research in Computer
Security. Springer, Cham, pp 750–760
Pengcheng L, Yi J, Zhang L (2018) Query-efficient black-box attack by active learning. In: 2018 IEEE Inter-
national Conference on Data Mining (ICDM). IEEE, Piscataway, pp 1200–1205. https://ieeexplore.ieee.org/abstract/document/8594968/
Pan X, Zhang M, Ji S, Yang M (2020) Privacy risks of general-purpose language models. In: 2020 IEEE
Symposium on Security and Privacy (SP). IEEE, Piscataway, pp 1314–1331
Qi G, Chen Y, Mao X, Hui B, Li X, Zhang R, Xue H (2023) Model inversion attack via dynamic memory
learning. In: Proceedings of the 31st ACM International Conference on Multimedia, pp 5614–5622
Qiu Y, Fang H, Yu H, Chen B, Qiu M, Xia S-T (2024) A closer look at gan priors: Exploiting intermediate
features for enhanced model inversion attacks. In: ECCV 2024. ​h​t​t​p​s​:​​/​/​w​w​w​​.​e​c​v​a​.​​n​e​t​/​​p​a​p​e​r​​s​/​e​c​c​​v​_​2​0​2​
4​​/​p​a​p​​e​r​s​_​E​C​C​V​/​p​a​p​e​r​s​/​0​4​6​4​2​.​p​d​f
Qi T, Wang H, Huang Y (2024) Towards the robustness of differentially private federated learning. Proc
AAAI Conf Artif Intell 38:19911–19919. ​h​t​t​p​s​:​​/​/​o​j​s​​.​a​a​a​i​.​​o​r​g​/​​i​n​d​e​x​​.​p​h​p​/​​A​A​A​I​/​a​​r​t​i​c​​l​e​/​v​i​e​w​/​2​9​9​6​7
Qiu Y, Yu H, Fang H, Yu W, Chen B, Wang X, Xia S-T, Xu K (2024) MIBENCH: a comprehensive benchmark
for model inversion attack and defense https://doi.org/10.48550/arXiv.2410.05159. arXiv:2410.05159
Qiu P, Zhang X, Ji S, Fu C, Yang X, Wang T (2024) Hashvfl: Defending against data reconstruction attacks in
vertical federated learning. IEEE Transactions on Information Forensics and Security
Ren H, Deng J, Xie X, Ma X, Ma J (2023) Gradient leakage defense with key-lock module for federated
learning. Preprint at arXiv:2305.04095
Rigaki M, Garcia S (2023) A survey of privacy attacks in machine learning. ACM Comput Surv 56(4):1–34
Ra JS, Li T, Li Y (2021) A novel spectral entropy-based index for assessing the depth of anaesthesia. Brain
Inform 8(1):10. https://doi.org/10.1186/s40708-021-00130-8
Shin S, Boyapati M, Suo K, Kang K, Son J (2023) An empirical analysis of image augmentation against
model inversion attack in federated learning. Clust Comput 26(1):349–366
Slokom M, Wolf P-P, Larson M (2023) Exploring privacy-preserving techniques on synthetic data as a
defense against model inversion attacks. In: International Conference on Information Security. Springer,
Cham, pp 3–23
Struppek L, Hintersdorf D, Correia ADA, Adler A, Kersting K (2022) Plug & play attacks: Towards robust
and flexible model inversion attacks. Preprint at arXiv:2201.12179
Struppek L, Hintersdorf D, Kersting K (2023) Be careful what you smooth for: Label smoothing can be a
privacy shield but also a catalyst for model inversion attacks. Preprint at arXiv:2310.06549
Shi Y, Kotevska O, Reshniak V, Singh A, Raskar R (2024) Dealing doubt: Unveiling threat models in gradi-
ent inversion attacks under federated learning, a survey and taxonomy. Preprint at arXiv:2405.10376
Shu Y, Li S, Dong T, Meng Y, Zhu H (2025) Model inversion in split learning for personalized llms:
New insights from information bottleneck theory https://doi.org/10.48550/arXiv.2501.05965.
arXiv:2501.05965
Sun J, Li A, Wang B, Yang H, Li H, Chen Y (2021) Soteria: Provable defense against privacy leakage in
federated learning from representation perspective. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pp 9311–9319. ​h​t​t​p​:​/​​/​o​p​e​n​​a​c​c​e​s​s​​.​t​h​e​​c​v​f​.​c​​o​m​/​c​o​​n​t​e​n​t​/​​C​V​P​
R​​2​0​2​1​/​​h​t​m​l​/​​S​u​n​_​S​o​​t​e​r​i​​a​_​P​r​o​​v​a​b​l​e​​_​D​e​f​e​n​​s​e​_​A​​g​a​i​n​s​​t​_​P​r​i​​v​a​c​y​_​L​​e​a​k​a​​g​e​_​i​n​​_​F​e​d​e​​r​a​t​e​d​_​​L​e​a​r​​n​i​n​g​_​F​r​o​
m​_​C​V​P​R​_​2​0​2​1​_​p​a​p​e​r​.​h​t​m​l
Scheliga D, Mäder P, Seeland M (2022) Precode-a generic model extension to prevent deep gradient leak-
age. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp
1849–1858. ​h​t​t​p​:​/​​/​o​p​e​n​​a​c​c​e​s​s​​.​t​h​e​​c​v​f​.​c​​o​m​/​c​o​​n​t​e​n​t​/​​W​A​C​V​​2​0​2​2​/​​h​t​m​l​/​​S​c​h​e​l​i​​g​a​_​P​​R​E​C​O​D​​E​_​-​_​A​​_​G​e​n​e​r​​
i​c​_​M​​o​d​e​l​_​​E​x​t​e​n​​s​i​o​n​_​T​​o​_​P​r​​e​v​e​n​t​​_​D​e​e​p​​_​G​r​a​d​i​​e​n​t​_​​W​A​C​V​_​2​0​2​2​_​p​a​p​e​r​.​h​t​m​l
Scheliga D, Mäder P, Seeland M (2023) Dropout is not all you need to prevent gradient leakage. Proc AAAI
Conf Artif Intell 37:9733–9741. ​h​t​t​p​s​:​​/​/​o​j​s​​.​a​a​a​i​.​​o​r​g​/​​i​n​d​e​x​​.​p​h​p​/​​A​A​A​I​/​a​​r​t​i​c​​l​e​/​v​i​e​w​/​2​6​1​6​3

13
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
242 Page 48 of 52 W. Yang et al.

Singh S, Sharma PK, Moon SY, Park JH (2024) Advanced lightweight encryption algorithms for IoT devices:
survey, challenges and solutions. J Ambient Intell Human Comput 1–18
Shokri R, Stronati M, Song C, Shmatikov V (2017) Membership inference attacks against machine learning
models. In: 2017 IEEE Symposium on Security and Privacy (SP). IEEE, Piscataway, pp 3–18. ​h​t​t​p​s​:​​/​/​i​
e​e​​e​x​p​l​o​r​​e​.​i​e​​e​e​.​o​r​​g​/​a​b​s​​t​r​a​c​t​/​​d​o​c​u​​m​e​n​t​/​7​9​5​8​5​6​8​/
Sen J, Waghela H, Rakshit S (2024) Privacy in federated learning. Preprint at arXiv:2408.08904
Tian Z, Cui L, Zhang C, Tan S, Yu S, Tian Y (2023) The role of class information in model inversion attacks
against image deep learning classifiers. IEEE Transactions on Dependable and Secure Computing
Thapa B (2024) Assessing the viability of privacy, ethics, and utility in machine learning experiments via
analysis of structured data. Phd thesis, Marymount University
Titcombe T, Hall AJ, Papadopoulos P, Romanini D (2021) Practical defences against model inversion attacks
for split neural networks. Preprint at arXiv:2104.05743
Tran V-H, Nguyen N-B, Mai ST, Vandierendonck H, Cheung N-m (2024) Defending against model inversion
attacks via random erasing. https://doi.org/10.48550/arXiv.2409.01062. arXiv:2409.01062
Tschandl P, Rosendahl C, Kittler H (2018) The ham10000 dataset, a large collection of multi-source derma-
toscopic images of common pigmented skin lesions. Sci Data 5(1):1–9
Tang Z, Van Nguyen TP, Yang W, Xia X, Chen H, Mullens AB, Dean JA, Osborne SR, Li Y (2024) High secu-
rity and privacy protection model for STI/HIV risk prediction. Digit Health 10:20552076241298424.
https://doi.org/10.1177/20552076241298425
Ullah H, Manickam S, Obaidat M, Laghari SUA, Uddin M (2023) Exploring the potential of metaverse
technology in healthcare: applications, challenges, and future directions. IEEE Access 11:69686–69707
Vero M, Balunović M, Dimitrov DI, Vechev M (2023) Tableak: tabular data leakage in federated learning.
In: Proceedings of the 40th International Conference on Machine Learning. ICML’23, vol. 202. JMLR.
org, Honolulu, pp 35051–35083
Veale M, Binns R, Edwards L (2018) Algorithms that remember: model inversion attacks and data protection
law. Philosoph Transact R Soc A 376(2133):20180083. https://doi.org/10.1098/rsta.2018.0083
Virmaux A, Scaman K (2018) Lipschitz regularity of deep neural networks: analysis and efficient estimation.
Adv Neural Inform Process Syst 31
Wu D, Bai J, Song Y, Chen J, Zhou W, Xiang Y, Sajjanhar A (2024) Fedinverse: Evaluating privacy leakage
in federated learning. In: The Twelfth International Conference on Learning Representations. ​h​t​t​p​s​:​/​/​o​p​
e​n​r​e​v​i​e​w​.​n​e​t​/​f​o​r​u​m​?​i​d​= ​n​T​N​g​k​E​I​f​e​b​​​​
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to
structural similarity. IEEE Trans Image Process 13(4):600–612
Wu R, Chen X, Guo C, Weinberger KQ (2023) Learning to invert: Simple adaptive attacks for gradient
inversion in federated learning. In: Uncertainty in Artificial Intelligence, pp 2293–2303. PMLR, USA
Wan G, Du H, Yuan X, Yang J, Chen M, Xu J (2023) Enhancing privacy preservation in federated learning
via learning rate perturbation. In: Proceedings of the IEEE/CVF International Conference on Computer
Vision, pp 4772–4781. ​h​t​t​p​:​/​​/​o​p​e​n​​a​c​c​e​s​s​​.​t​h​e​​c​v​f​.​c​​o​m​/​c​o​​n​t​e​n​t​/​​I​C​C​V​​2​0​2​3​/​​h​t​m​l​/​​W​a​n​_​E​n​​h​a​n​c​​i​n​g​_​P​​r​i​v​a​c​​
y​_​P​r​e​s​​e​r​v​a​​t​i​o​n​_​​i​n​_​F​e​​d​e​r​a​t​e​​d​_​L​e​​a​r​n​i​n​​g​_​v​i​a​​_​L​e​a​r​n​​i​n​g​_​​R​a​t​e​_​​P​e​r​t​u​​r​b​a​t​i​o​​n​_​I​C​​C​V​_​2​0​2​3​_​p​a​p​e​r​.​h​t​m​l
Wang K-C, Fu Y, Li K, Khisti A, Zemel R, Makhzani A (2021) Variational model inversion attacks. Adv
Neural Inf Process Syst 34:9706–9719
Wang Y, Guo S, Deng Y, Zhang H, Fang Y (2024) Privacy-preserving task-oriented semantic communica-
tions against model inversion attacks. IEEE Transactions on Wireless Communications
Wang J, Guo S, Xie X, Qi H (2022) Protect privacy from gradient leakage attack in federated learning.
In: IEEE INFOCOM 2022-IEEE Conference on Computer Communications. IEEE, Piscataway, pp
580–589. ​h​t​t​p​s​:​​/​/​i​e​e​​e​x​p​l​o​r​​e​.​i​e​​e​e​.​o​r​​g​/​a​b​s​​t​r​a​c​t​/​​d​o​c​u​​m​e​n​t​/​9​7​9​6​8​4​1​/
Wang F, Hugh E, Li B (2024) More than enough is too much: Adaptive defenses against gradient leakage in
production federated learning. IEEE/ACM Transactions on Networking
Wu J, Hayat M, Zhou M, Harandi M (2024) Concealing sensitive samples against gradient leakage in feder-
ated learning. Proc AAAI Conf Artif Intell 38:21717–21725. ​h​t​t​p​s​:​​/​/​o​j​s​​.​a​a​a​i​.​​o​r​g​/​​i​n​d​e​x​​.​p​h​p​/​​A​A​A​I​/​a​​r​t​i​
c​​l​e​/​v​i​e​w​/​3​0​1​7​1
Wang S, Ji Z, Xiang L, Zhang H, Wang X, Zhou C, Li B (2024) Crafter: Facial feature crafting against
inversion-based identity theft on deep models. Preprint at arXiv:2401.07205
Wang Q, Kurz D (2022) Reconstructing training data from diverse ml models by ensemble inversion. In:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 2909–2917
Wang Z, Lee J, Lei Q (2023) Reconstructing training data from model gradient, provably. In: International
Conference on Artificial Intelligence and Statistics. PMLR, USA, pp 6595–6612. ​h​t​t​p​s​:​​/​/​p​r​o​​c​e​e​d​i​n​​g​s​.​
m​​l​r​.​p​r​​e​s​s​/​v​​2​0​6​/​w​a​​n​g​2​3​​g​.​h​t​m​l
Wu L, Liu Z, Pu B, Wei K, Cao H, Yao S (2025) DGGI: Deep generative gradient inversion with diffusion
model. Inform Fusion 113:102620

13
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Deep learning model inversion attacks and defenses: a comprehensive… Page 49 of 52 242

Wei S, Li Y, Yang W (2023) In: Li, Y, Huang, Z, Sharma, M, Chen, L, Zhou, R. (eds.) An Adaptive Feature
Fusion Network for Alzheimer’s Disease Prediction. Lecture Notes in Computer Science, vol 14305.
Springer, Singapore, pp 271–282. https://doi.org/10.1007/978-981-99-7108-4_23
Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM (2017) CHESTX-ray8: Hospital-scale chest X-ray
database and benchmarks on weakly-supervised classification and localization of common thorax
diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
2097–2106. ​h​t​t​p​:​/​​/​o​p​e​n​​a​c​c​e​s​s​​.​t​h​e​​c​v​f​.​c​​o​m​/​c​o​​n​t​e​n​t​_​​c​v​p​r​​_​2​0​1​7​​/​h​t​m​l​​/​W​a​n​g​_​​C​h​e​s​​t​X​-​r​a​​y​8​_​H​o​​s​p​i​t​a​l​​-​S​c​a​​l​
e​_​C​h​e​s​t​_​C​V​P​R​_​2​0​1​7​_​p​a​p​e​r​.​h​t​m​l
Wang Z, Simoncelli EP, Bovik AC (2003) Multiscale structural similarity for image quality assessment. In:
The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol 2. IEEE, Pisca-
taway, pp 1398–1402. ​h​t​t​p​s​:​​/​/​i​e​e​​e​x​p​l​o​r​​e​.​i​e​​e​e​.​o​r​​g​/​a​b​s​​t​r​a​c​t​/​​d​o​c​u​​m​e​n​t​/​1​2​9​2​2​1​6​/
Wang Y, Si C, Wu X (2015) Regression model fitting under differential privacy and model inversion attack.
In: Twenty-fourth International Joint Conference on Artificial Intelligence. ​h​t​t​p​s​:​​/​/​w​w​w​​.​i​j​c​a​i​​.​o​r​g​​/​P​r​o​c​​e​
e​d​i​n​​g​s​/​1​5​/​​P​a​p​e​​r​s​/​1​4​6​.​p​d​f
Wan Y, Xu H, Liu X, Ren J, Fan W, Tang J (2022) Defense against gradient leakage attacks via learning to
obscure data. Preprint at arXiv:2206.00769
Wen J, Yiu S-M, Hui LC (2021) Defending against model inversion attack by adversarial examples. In:
2021 IEEE International Conference on Cyber Security and Resilience (CSR). IEEE, Piscataway, pp
551–556. ​h​t​t​p​s​:​​/​/​i​e​e​​e​x​p​l​o​r​​e​.​i​e​​e​e​.​o​r​​g​/​a​b​s​​t​r​a​c​t​/​​d​o​c​u​​m​e​n​t​/​​9​5​2​7​9​​4​5​/​?​c​a​​s​a​_​t​​o​k​e​n​= ​​a​3​X​f​N​​J​B​s​8​5​Q​​A​A​A​A​​A​
:​B​y​Y​​_​p​7​v​1​​J​c​K​N​E​N​​g​Q​d​U​​3​x​0​a​0​​x​G​n​1​B​​8​E​V​8​z​b​​P​K​Y​G​​e​7​a​d​5​​Q​h​k​g​O​​r​h​d​1​v​a​​I​H​B​a​​g​q​Y​6​D​J​-​T​9​i​j​J​d​m​a​Q
Wang T, Zhang Y, Jia R (2021) Improving robustness to model inversion attacks via mutual information
regularization. Proc AAAI Conf Artif Intell 35:11666–11673
Xu J, Hong C, Huang J, Chen LY, Decouchant J (2022) Agic: Approximate gradient inversion attack on feder-
ated learning. In: 2022 41st International Symposium on Reliable Distributed Systems (SRDS). IEEE,
Piscataway, pp 12–22
Xue L, Hu S, Zhao R, Zhang LY, Hu S, Sun L, Yao D (2024) Revisiting gradient pruning: A dual realization
for defending against gradient attacks. Proc AAAI Conf Artif Intell 38:6404–6412. ​h​t​t​p​s​:​​/​/​o​j​s​​.​a​a​a​i​.​​o​r​g​/​​
i​n​d​e​x​​.​p​h​p​/​​A​A​A​I​/​a​​r​t​i​c​​l​e​/​v​i​e​w​/​2​8​4​6​0
Xiao D, Li J, Li M (2024) In: Luo B, Cheng L, Wu Z-G, Li H, Li C (eds.) Privacy-Preserving Federated
Compressed Learning Against Data Reconstruction Attacks Based on Secure Data. Communications in
Computer and Information Science, vol 1969. Springer, Singapore, pp 325–​3​3​9​.​h​t​t​p​s​:​/​/​d​o​i​.​o​r​g​/​1​0​.​1​0​0​
7​/​9​7​8​-​9​8​1​-​9​9​-​8​1​8​4​-​7​_​2​5​​​​
Xiao H, Rasul K, Vollgraf R (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning
algorithms https://doi.org/10.48550/arXiv.1708.07747. arXiv:1708.07747
Xu Y, Zhang S, Ding Y, Wang Z (2024) Secure distributed machine learning client selection algorithm based
on privacy leakage weight. In: 2024 5th International Seminar on Artificial Intelligence, Networking
and Information Technology (AINIT), pp 790–794. ​h​t​t​p​s​:​​/​/​d​o​i​​.​o​r​g​/​1​​0​.​1​1​​0​9​/​A​I​​N​I​T​6​1​​9​8​0​.​2​0​​2​4​.​1​​0​5​8​1​8​
3​8. ​h​t​t​p​s​:​​/​/​i​e​e​​e​x​p​l​o​r​​e​.​i​e​​e​e​.​o​r​​g​/​a​b​s​​t​r​a​c​t​/​​d​o​c​u​​m​e​n​t​/​1​0​5​8​1​8​3​8
Yuan X, Chen K, Zhang J, Zhang W, Yu N, Zhang Y (2023) Pseudo label-guided model inversion attack via
conditional generative adversarial network. Proc AAAI Conf Artif Intell 37:3349–3357. ​h​t​t​p​s​:​​/​/​o​j​s​​.​a​a​a​i​
.​​o​r​g​/​​i​n​d​e​x​​.​p​h​p​/​​A​A​A​I​/​a​​r​t​i​c​​l​e​/​v​i​e​w​/​2​5​4​4​2
Yu W, Fang H, Chen B, Sui X, Chen C, Wu H, Xia S-T, Xu, K (2024) GI-NAS: Boosting gradient inversion
attacks through adaptive neural architecture search. Preprint at arXiv:2405.20725
Yang X, Feng Y, Fang W, Shao J, Tang X, Xia S-T, Lu R (2022) An accuracy-lossless perturbation method for
defending privacy attacks in federated learning. In: Proceedings of the ACM Web Conference 2022, pp
732–742. ACM, Virtual Event, Lyon France. https://doi.org/10.1145/3485447.3512233
Yang H, Ge M, Xue D, Xiang K, Li H, Lu R (2023) Gradient leakage attacks in federated learning: Research
frontiers, taxonomy and future directions. IEEE Network
Yang C, Hang S, Ding Y, Li C, Liang H, Liu Z (2024) Gradient leakage defense in federated learning using
gradient perturbation-based dynamic clipping. In: 2024 IEEE International Conference on Web Ser-
vices (ICWS), pp 178–187. IEEE, Piscataway, USA. ​h​t​t​p​s​:​​/​/​i​e​e​​e​x​p​l​o​r​​e​.​i​e​​e​e​.​o​r​​g​/​a​b​s​​t​r​a​c​t​/​​d​o​c​u​​m​e​n​t​/​1​
0​7​0​7​5​8​5​/
Ye Z, Luo W, Naseem ML, Yang X, Shi Y, Jia Y (2023) C2FMI: Corse-to-fine black-box model inversion
attack. IEEE Trans Dependable Secure Comput 21(3):1437–1450
Ye Z, Luo W, Zhou Q, Zhu Z, Shi Y, Jia Y (2024) Gradient inversion attacks: Impact factors analyses and
privacy enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence
Ye Z, Luo W, Zhou Q, Tang Y (2024) High-fidelity gradient inversion in distributed learning. Proc AAAI
Conf Artif Intell 38:19983–19991

13
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
242 Page 50 of 52 W. Yang et al.

Yin H, Molchanov P, Alvarez JM, Li Z, Mallya A, Hoiem D, Jha NK, Kautz J (2020) Dreaming to distill:
Data-free knowledge transfer via deepinversion. In: Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition, pp 8715–8724. ​h​t​t​p​:​/​​/​o​p​e​n​​a​c​c​e​s​s​​.​t​h​e​​c​v​f​.​c​​o​m​/​c​o​​n​t​e​n​t​_​​C​V​P​R​​_​2​
0​2​0​​/​h​t​m​l​​/​Y​i​n​_​D​​r​e​a​m​​i​n​g​_​t​​o​_​D​i​s​​t​i​l​l​_​D​​a​t​a​-​​F​r​e​e​_​​K​n​o​w​l​​e​d​g​e​_​T​​r​a​n​s​​f​e​r​_​v​​i​a​_​D​e​​e​p​I​n​v​e​​r​s​i​o​​n​_​C​V​P​R​_​2​0​2​
0​_​p​a​p​e​r​.​h​t​m​l
Yin H, Mallya A, Vahdat A, Alvarez JM, Kautz J, Molchanov P (2021) See through gradients: Image batch
recovery via gradinversion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition, pp 16337–16346. ​h​t​t​p​:​/​​/​o​p​e​n​​a​c​c​e​s​s​​.​t​h​e​​c​v​f​.​c​​o​m​/​c​o​​n​t​e​n​t​/​​C​V​P​R​​2​0​2​1​/​​h​t​m​l​/​​Y​i​n​_​S​e​​e​_​T​
h​​r​o​u​g​h​​_​G​r​a​d​​i​e​n​t​s​_​​I​m​a​g​​e​_​B​a​t​​c​h​_​R​e​​c​o​v​e​r​y​​_​v​i​a​​_​G​r​a​d​​I​n​v​e​r​​s​i​o​n​_​C​​V​P​R​_​​2​0​2​1​_​p​a​p​e​r​.​h​t​m​l
Yoshimura S, Nakamura K, Nitta N, Babaguchi N (2021) Model inversion attack against a face recognition
system in a black-box setting. In: 2021 Asia-Pacific Signal and Information Processing Association
Annual Summit and Conference (APSIPA ASC). IEEE, Piscataway, pp 1800–1807
Yu H, Qiu Y, Fang H, Chen B, Yu S, Wang B, Xia S-T, Xu K (2024) Calor: Towards comprehensive model
inversion defense. Preprint at arXiv:2410.05814
Yang W, Wang S, Cui H, Tang Z, Li Y (2023) A review of homomorphic encryption for privacy-preserving
biometrics. Sensors 23(7):3566
Yang W, Wang S, Hu J, Ibrahim A, Zheng G, Macedo M, Johnstone M, Valli C (2019) A cancelable iris- and
steganography-based user authentication system for the internet of things. Sensors 19(13):2985. ​h​t​t​p​s​:​
/​/​d​o​i​.​o​r​g​/​1​0​.​3​3​9​0​/​s​1​9​1​3​2​9​8​5​​​​
Yang W, Wang S, Hu J, Zheng G, Valli C (2019) Security and accuracy of fingerprint-based biometrics: a
review. Symmetry 11(2):141. https://doi.org/10.3390/sym11020141
Yang W, Wang S, Kang JJ, Johnstone MN, Bedari A (2022) A linear convolution-based cancelable fingerprint
biometric authentication system. Comput Security 114:102583
Yuan Z, Wu F, Long Y, Xiao C, Li B (2022) In: Avidan, S, Brostow, G, Cissé, M, Farinella, G.M, Hassner,
T. (eds.) SecretGen: Privacy Recovery on Pre-trained Models via Distribution Discrimination. Lecture
Notes in Computer Science, vol 13665. Springer, Cham, pp 139–155. ​h​t​t​p​s​:​/​/​d​o​i​.​o​r​g​/​1​0​.​1​0​0​7​/​9​7​8​-​3​-​0​
3​1​-​2​0​0​6​5​-​6​_​9​​​​
Yang Z, Yang S, Huang Y, Martínez J-F, López L, Chen Y (2023) AAIA: an efficient aggregation scheme
against inverting attack for federated learning. Int J Inf Secur 22(4):919–930
Zhang K, Cheng S, Shen G, Ribeiro B, An S, Chen P-Y, Zhang X, Li N (2025) Censor: Defense against gradi-
ent inversion via orthogonal subspace Bayesian sampling. NDSS 2025
Zhang R, Guo S, Wang J, Xie X, Tao D (2022) A survey on gradient inversion: Attacks, defenses and future
directions. In: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
(IJCAI-22). ​h​t​t​p​s​:​​/​/​w​w​w​​.​i​j​c​a​i​​.​o​r​g​​/​p​r​o​c​​e​e​d​i​n​​g​s​/​2​0​2​​2​/​0​7​​9​1​.​p​d​f
Zhang R, Hidano S, Koushanfar F (2022) Text revealer: Private text reconstruction via model inversion
attacks against transformers. Preprint at arXiv:2209.10505
Zhu H, Huang L, Xie Z (2024) GGI: Generative gradient inversion attack in federated learning. In: 2024 6th
International Conference on Data-driven Optimization of Complex Systems (DOCS). IEEE, Piscat-
away, pp 379–384
Zhu H, Huang L, Xie Z (2024) Privacy attack in federated learning is not easy: an experimental study. Pre-
print at arXiv:2409.19301
Zhang J, Hou C, Yang X, Yang X, Yang W, Cui H (2024) Advancing face detection efficiency: utilizing clas-
sification networks for lowering false positive incidences. Array 22:100347
Zhang R, Isola P, Efros AA, Shechtman E, Wang O (2018) The unreasonable effectiveness of deep features as
a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni-
tion, pp 586–595. ​h​t​t​p​:​/​​/​o​p​e​n​​a​c​c​e​s​s​​.​t​h​e​​c​v​f​.​c​​o​m​/​c​o​​n​t​e​n​t​_​​c​v​p​r​​_​2​0​1​8​​/​h​t​m​l​​/​Z​h​a​n​g​​_​T​h​e​​_​U​n​r​e​​a​s​o​n​a​​b​l​e​_​E​
f​​f​e​c​t​​i​v​e​n​e​s​s​_​C​V​P​R​_​2​0​1​8​_​p​a​p​e​r​.​h​t​m​l
Zhang Y, Jia R, Pei H, Wang W, Li B, Song D (2020) The secret revealer: Generative model-inversion attacks
against deep neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pp 253–261. ​h​t​t​p​:​/​​/​o​p​e​n​​a​c​c​e​s​s​​.​t​h​e​​c​v​f​.​c​​o​m​/​c​o​​n​t​e​n​t​_​​C​V​P​R​​_​2​0​2​0​​/​h​t​m​l​​/​Z​h​a​n​g​​_​T​h​
e​​_​S​e​c​r​​e​t​_​R​e​​v​e​a​l​e​r​​_​G​e​n​​e​r​a​t​i​​v​e​_​M​o​​d​e​l​-​I​n​​v​e​r​s​​i​o​n​_​A​​t​t​a​c​k​​s​_​A​g​a​i​​n​s​t​_​​D​e​e​p​_​​N​e​u​r​a​​l​_​N​e​t​w​​o​r​k​s​​_​C​V​P​R​_​2​
0​2​0​_​p​a​p​e​r​.​h​t​m​l
Zhu L, Liu Z, Han S (2019) Deep leakage from gradients. Adv Neural Inform Process Syst 32
Zhang Z, Liu Q, Huang Z, Wang H, Lee C-K, Chen E (2022) Model inversion attacks against graph neural
networks. IEEE Trans Knowl Data Eng 35(9):8729–8741
Zhao B, Mopuri KR, Bilen H (2020) IDLG: Improved deep leakage from gradients. Preprint at
arXiv:2001.02610
Zhang Q, Ma J, Xiao Y, Lou J, Xiong L (2020) Broadening differential privacy for deep learning against
model inversion attacks. In: 2020 IEEE International Conference on Big Data (Big Data). IEEE, Pisca-
taway, pp 1061–1070

13
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Deep learning model inversion attacks and defenses: a comprehensive… Page 51 of 52 242

Zhu Z, Shi Y, Luo J, Wang F, Peng C, Fan P, Letaief KB (2023) Fedlp: Layer-wise pruning mechanism for
communication-computation efficient federated learning. In: ICC 2023-IEEE International Conference
on Communications. IEEE, Piscataway, pp 1250–1255. ​h​t​t​p​s​:​​/​/​i​e​e​​e​x​p​l​o​r​​e​.​i​e​​e​e​.​o​r​​g​/​a​b​s​​t​r​a​c​t​/​​d​o​c​u​​m​e​n​t​
/​​1​0​2​7​8​​5​6​3​/​?​c​​a​s​a​_​​t​o​k​e​n​​= ​U​4​r​-​​_​N​y​v​f​v​​8​A​A​A​​A​A​:​T​s​​C​O​g​S​k​​V​h​Z​7​5​K​​I​P​l​b​​O​f​B​e​Z​​v​M​E​p​F​​1​x​B​A​p​v​​M​d​Z​L​​R​
5​6​S​q​​9​K​U​C​x​​e​K​g​_​i​d​​H​P​9​Z​​0​U​a​h​T​o​-​c​C​O​_​a​i​m​A​B​A
Zhang Z, Tianqing Z, Ren W, Xiong P, Choo K-KR (2023) Preserving data privacy in federated learning
through large gradient pruning. Comput Security 125:103039
Zhang X, Wei X-Y, Wu J, Zhang T, Zhang Z, Lei Z, Li Q (2024) Compositional inversion for stable diffusion
models. Proc AAAI Conf Artif Intell 38:7350–7358. ​h​t​t​p​s​:​​/​/​o​j​s​​.​a​a​a​i​.​​o​r​g​/​​i​n​d​e​x​​.​p​h​p​/​​A​A​A​I​/​a​​r​t​i​c​​l​e​/​v​i​e​w​
/​2​8​5​6​5
Zhang L, Zhang L, Mou X, Zhang D (2011) FSIM: a feature similarity index for image quality assessment.
IEEE Trans Image Process 20(8):2378–2386
Zhao X, Zhang W, Xiao X, Lim B (2021) Exploiting explanations for model inversion attacks. In: Proceed-
ings of the IEEE/CVF International Conference on Computer Vision, pp 682–692. ​h​t​t​p​:​​​/​​/​o​p​e​n​a​c​c​e​s​​s​.​t​
h​​e​c​v​​f​.​​c​​o​m​/​c​​o​n​t​e​​n​​t​/​I​C​​C​V​2​​0​​2​1​/​h​​t​​m​l​/​​Z​h​​a​o​_​E​​x​p​l​o​i​​​t​i​n​g​_​​E​x​p​l​a​n​​a​​t​i​​o​n​​s​_​f​​o​r​​_​M​o​​d​e​l​_​I​​n​v​​e​r​​s​i​o​n​_​​A​t​t​a​c​​k​s​_​I​​C​
C​V​_​2​0​​2​1​_​p​a​p​e​r​.​h​t​m​l
Zhou S, Zhu T, Ye D, Zhou W, Zhao W (2024) Inversion-guided defense: Detecting model stealing attacks by
output inverting. IEEE Transactions on Information Forensics and Security
Zhou Z, Zhu J, Yu F, Li X, Peng X, Liu T, Han B (2024) Model inversion attacks: A survey of approaches and
countermeasures. https://doi.org/10.48550/arXiv.2411.10023. arXiv:2411.10023

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Authors and Affiliations

Wencheng Yang1 · Song Wang2 · Di Wu1 · Taotao Cai1 · Yanming Zhu3 · Shicheng Wei1 ·
Yiying Zhang4 · Xu Yang5 · Zhaohui Tang1 · Yan Li1

Wencheng Yang
wencheng.yang@unisq.edu.au
Song Wang
song.wang@latrobe.edu.au
Di Wu
di.wu@unisq.edu.au
Taotao Cai
taotao.cai@unisq.edu.au
Yanming Zhu
yanming.zhu@griffith.edu.au
Shicheng Wei
shicheng.wei@unisq.edu.au
Yiying Zhang
yiyingzhang@tust.edu.cn
Xu Yang
xu.yang@mju.edu.cn
Zhaohui Tang
zhaohui.tang@unisq.edu.au
Yan Li
yan.li@unisq.edu.au
1 University of Southern Queensland, Toowoomba, QLD 4350, Australia
2 La Trobe University, Melbourne, VIC 3083, Australia
3 Griffith University, Gold Coast, QLD 4222, Australia
4 Tianjin University of Science and Technology, Tianjin 300222, China
5 Minjiang University, Fuzhou 350108, Fujian, China
