
LoRA Unlearns More and Retains More (Student Abstract)

Atharv Mittal
Vision and Language Group, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India - 247667
atharv_m@mfs.iitr.ac.in
arXiv:2411.11907v1 [cs.LG] 16 Nov 2024

Abstract

Due to increasing privacy regulations and regulatory compliance, Machine Unlearning (MU) has become essential. The goal of unlearning is to remove information related to a specific class from a model. Traditional approaches achieve exact unlearning by retraining the model on the remaining dataset, but incur high computational costs. This has driven the development of more efficient unlearning techniques, including model sparsification techniques, which boost computational efficiency but degrade the model's performance on the remaining classes. To mitigate these issues, we propose a novel method, PruneLoRA, which introduces a new MU paradigm, termed prune first, then adapt, then unlearn. LoRA (Hu et al. 2022) reduces the need for large-scale parameter updates by applying low-rank updates to the model. We leverage LoRA to selectively modify a subset of the pruned model's parameters, thereby reducing the computational cost and memory requirements while improving the model's ability to retain performance on the remaining classes. Experimental results across various metrics show that our method outperforms other approximate MU methods and bridges the gap between exact and approximate unlearning. Our code is available at https://github.com/vlgiitr/LoRA-Unlearn.

Introduction

The process of removing specific data points or classes from trained machine learning models is known as machine unlearning. Its importance has intensified due to growing privacy concerns and the need to comply with evolving regulations, which enable users to request the removal of their personal data from models as part of the "right to be forgotten" in the General Data Protection Regulation (GDPR).
Machine unlearning techniques can be classified into two broad categories: exact and approximate unlearning. The exact approach typically involves retraining the entire model on a modified dataset that excludes the data to be forgotten. While this method guarantees the removal of the influence of a data instance from the model, it is highly computationally intensive for larger models. Approximate unlearning focuses on reducing the influence of targeted data points through efficient parameter updates. However, these methods often struggle to balance unlearning effectiveness with performance and computational efficiency.
One common approximate unlearning method is simple fine-tuning (FT), which fine-tunes the pre-trained model on the remaining dataset for a few training epochs, but it presents its own challenges. When a model is fine-tuned to forget specific information, it often suffers from catastrophic forgetting, i.e., it loses the ability to perform well on previously learned tasks, thus degrading its performance on the remaining classes. Fine-tuning can also be computationally expensive on very large models.
To address these limitations, Liu et al. (2024) explored model sparsification techniques, selectively removing specific weights or neurons within the model rather than updating the entire network before fine-tuning. By focusing on a subset of the model's parameters, they reduced overfitting and computational cost. However, despite offering improvements over standard fine-tuning, these methods still struggle to balance unlearning efficiency, computational cost, and overall model performance. Low-Rank Adaptation (LoRA) offers a solution that builds upon the principles of model sparsity while addressing its limitations, updating only a small subset of model parameters through low-rank matrix decomposition. Biderman et al. (2024) show that, in the context of LLMs, LoRA provides a form of regularization that mitigates "forgetting" of the source domain. Through rigorous experimentation, we show that using LoRA to update model parameters preserves model performance and substantially lowers computational cost.

Methodology

We evaluated four paradigms for machine unlearning:
1. Fine-tuning: The model is fine-tuned on the remaining dataset using standard gradient descent techniques.
2. Pruning + Fine-tuning: First, we apply model pruning to reduce the number of parameters; then, the pruned model is fine-tuned on the remaining dataset (Liu et al. 2024).
3. LoRA: Apply LoRA to selectively modify a subset of the model's parameters.
4. Pruning + LoRA: First prune the model, then add LoRA adapters and fine-tune.
For our experiments, we employed a ResNet-50 and a Vision Transformer (ViT) and trained both on the CIFAR-10 dataset.
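To make the prune first, then adapt, then unlearn paradigm concrete, the sketch below outlines one possible PyTorch implementation for the ResNet-50 case. It is a minimal illustration under stated assumptions, not the authors' exact code: the LoRAConv2d wrapper, the rank r = 8, the learning rate, and the retain_loader argument are ours.

```python
# Minimal sketch of the "prune first, then adapt, then unlearn" pipeline on ResNet-50.
# Assumptions: torchvision ResNet-50, a hand-rolled LoRA wrapper for conv layers,
# and a user-supplied dataloader over the retained classes (retain_loader).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision.models import resnet50


class LoRAConv2d(nn.Module):
    """Frozen conv layer plus a trainable low-rank update B(A(x))."""

    def __init__(self, conv: nn.Conv2d, r: int = 8):
        super().__init__()
        self.conv = conv
        self.lora_a = nn.Conv2d(conv.in_channels, r, kernel_size=conv.kernel_size,
                                stride=conv.stride, padding=conv.padding, bias=False)
        self.lora_b = nn.Conv2d(r, conv.out_channels, kernel_size=1, bias=False)
        nn.init.zeros_(self.lora_b.weight)      # adapter starts as a zero update

    def forward(self, x):
        return self.conv(x) + self.lora_b(self.lora_a(x))


def add_lora(module: nn.Module, r: int = 8):
    """Recursively wrap every Conv2d with a LoRA adapter."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(module, name, LoRAConv2d(child, r))
        else:
            add_lora(child, r)


def prune_lora_unlearn(retain_loader, epochs=5, device="cuda"):
    # In practice the trained CIFAR-10 checkpoint would be loaded here (omitted).
    model = resnet50(num_classes=10).to(device)

    # 1) Prune: structured L2 pruning of 50% of the output channels of each conv layer.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            prune.ln_structured(m, name="weight", amount=0.5, n=2, dim=0)
            prune.remove(m, "weight")           # bake the pruning mask into the weights

    # 2) Adapt: freeze the pruned base model and attach trainable LoRA adapters.
    for p in model.parameters():
        p.requires_grad = False
    add_lora(model)
    model.to(device)                            # move newly created adapters to the device

    # 3) Unlearn: fine-tune only the adapter parameters on the retained classes.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in retain_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
    return model
```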
Model         UA (5/10 ep)     MIA-Efficacy (5/10 ep)   RA (5/10 ep)     TA (5/10 ep)     RTE (secs/epoch)  GPU (GB)

ResNet-50
Retrain       100.00 / 100.00  100.00 / 100.00          98.02 / 98.02    96.70 / 96.70    -                 -
Finetune      100.00 / 100.00  100.00 / 100.00          92.52 / 96.27    88.29 / 91.44    137               6.9
Pruned        100.00 / 100.00  100.00 / 100.00          94.92 / 95.68    90.79 / 90.72    137               4.2
LoRA          97.22 / 100.00   100.00 / 100.00          96.90 / 97.19    95.03 / 93.49    122               5.8
Pruned LoRA   99.78 / 99.98    97.68 / 97.89            97.96 / 98.00    95.18 / 95.41    122               5.5

ViT
Retrain       100.00 / 100.00  100.00 / 100.00          96.90 / 96.90    84.92 / 84.92    -                 -
Finetune      100.00 / 100.00  100.00 / 100.00          87.64 / 86.80    79.79 / 79.94    132               3.2
Pruned        100.00 / 100.00  100.00 / 100.00          84.76 / 86.14    78.81 / 79.53    132               3.2
LoRA          87.66 / 95.58    98.14 / 99.58            97.72 / 97.81    85.33 / 85.16    48                0.7
Pruned LoRA   100.00 / 100.00  100.00 / 100.00          97.39 / 97.63    85.53 / 85.34    48                0.7

Table 1: Results of ResNet-50 and ViT when tested using various unlearning approaches (in percent accuracy). UA, MIA-Efficacy, RA and TA are reported after 5 / 10 epochs of unlearning; RTE is run time per epoch in seconds and GPU is memory usage in GB.

Our unlearning task focused on removing the influence of the forget class while maintaining performance on the remaining classes. To establish an exact unlearning baseline, we retrained both models on the remaining dataset for 200 and 90 epochs respectively. We used L2 pruning to prune 50% of specific layers in each model: convolutional layers were pruned in ResNet-50, while linear and attention layers were pruned in ViT. After final fine-tuning on the remaining dataset for 5/10 epochs, we evaluate the models on the following metrics (an illustrative computation sketch follows the list):
• Unlearning accuracy (UA): 1 - Acc(Df), where Acc(Df) is the accuracy of the unlearned model on the forget dataset.
• Membership inference attack (MIA-Efficacy): Applying the confidence-based MIA predictor to the unlearned model on the forgetting dataset (Df). A higher MIA-Efficacy implies less information about Df remains in the model.
• Remaining accuracy (RA): The accuracy of the unlearned model on the retain dataset.
• Testing accuracy (TA): The accuracy of the unlearned model on the testing dataset of the remaining classes.
• Run-time efficiency (RTE): The computational efficiency of the MU method (run-time cost).
• GPU Memory (GPU): The memory requirements of the MU method for a model.
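As referenced above, the accuracy-based metrics (UA, RA, TA) reduce to accuracy computations over the corresponding data splits; the sketch below is an illustrative implementation, assuming user-supplied dataloaders forget_loader, retain_loader and test_loader. MIA-Efficacy requires the separate predictor described in the appendix.

```python
# Illustrative computation of UA, RA and TA for an (unlearned) model.
# forget_loader, retain_loader and test_loader are assumed dataloaders over
# Df, Dr and the test split of the remaining classes, respectively.
import torch


@torch.no_grad()
def accuracy(model, loader, device="cuda"):
    """Fraction of correctly classified samples in the loader."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total


def unlearning_metrics(model, forget_loader, retain_loader, test_loader):
    ua = 1.0 - accuracy(model, forget_loader)   # Unlearning accuracy: 1 - Acc(Df)
    ra = accuracy(model, retain_loader)         # Remaining accuracy on Dr
    ta = accuracy(model, test_loader)           # Testing accuracy on remaining-class test set
    return {"UA": ua, "RA": ra, "TA": ta}
```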

Results

Table 1 presents the accuracy metrics for both models under the retraining baseline and the four unlearning paradigms. It is observed that all methods achieved perfect or near-perfect Unlearning Accuracy (UA) and Membership Inference Attack (MIA) efficacy, indicating successful removal of the target class information. For ResNet-50, PruneLoRA outperformed the other approximate methods, achieving the highest Remaining Accuracy (RA) and Testing Accuracy (TA) among them while maintaining near-perfect UA. For the ViT model, PruneLoRA significantly outperformed the other methods (except LoRA) in terms of RA and TA. Moreover, while LoRA demonstrated a drastically lower UA, PruneLoRA achieved perfect UA. These results suggest that PruneLoRA offers a balance between effective unlearning, retained model performance, and computational efficiency.

Future Scope

There is significant potential for further research and experimentation to strengthen and validate our hypothesis. A promising avenue for future research is the application of this method to Large Language Models (LLMs) and Vision-Language Models (VLMs). These models, with their vast parameter spaces, emphasize the need for efficient unlearning techniques. Although computational constraints limited our ability to explore this direction, scaling our approach to these larger models could help develop adaptable and privacy-preserving AI systems.

Conclusion

This study addresses the challenge of machine unlearning in light of growing privacy regulations and the need for adaptable AI systems. We present a novel approach, PruneLoRA, which leverages LoRA to fine-tune sparse (pruned) models. Our findings highlight the efficacy of LoRA, especially when combined with pruning, in achieving high unlearning performance with minimal computational cost and memory requirements while maintaining general accuracy on the remaining classes. These results advance research into parameter-efficient approximate machine unlearning techniques, laying the groundwork for applying these methods to complex models such as Large Language Models and Vision-Language Models.

References

Biderman, D.; Portes, J.; Ortiz, J. J. G.; Paul, M.; Greengard, P.; Jennings, C.; King, D.; Havens, S.; Chiley, V.; Frankle, J.; Blakeney, C.; and Cunningham, J. P. 2024. LoRA Learns Less and Forgets Less. Transactions on Machine Learning Research.
Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; and Chen, W. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations.
Liu, J.; Ram, P.; Yao, Y.; Liu, G.; Liu, Y.; Sharma, P.; Liu, S.; et al. 2024. Model Sparsity Can Simplify Machine Unlearning. In Advances in Neural Information Processing Systems, 36.
Appendix

Experiment Details
We trained ResNet-50 and Vision Transformer (ViT) models on CIFAR-10, using custom implementations. The ResNet-50 model was trained for 200 epochs and the ViT model for 90 epochs, both on a P100 GPU; they achieved test accuracies of 95.56% and 83.77% respectively.

Figure 2: Visual comparison of cross-entropy losses on the test and forget sets (R: ResNet-50, V: ViT). Panels: (a) Fine-tuned R, (b) PruneFT R, (c) LoRA R, (d) PruneLoRA R, (e) Fine-tuned V, (f) PruneFT V, (g) LoRA V, (h) PruneLoRA V. Forget losses must be higher than test losses for unlearning accuracy.

All further experiments were conducted on a T4 GPU. To allow a meaningful comparison between the various fine-tuning techniques employed, we consistently used the Adam optimizer with a learning rate of 10^-3, along with cross-entropy loss.
In the case of ResNet-50, we applied structured L2 pruning with a sparsity level of 0.5 across all convolutional layers. Additionally, LoRA was applied to these layers to enable efficient fine-tuning. It is worth noting that future work could explore reducing the number of layers to which LoRA is applied, potentially leading to further computational gains without sacrificing model performance.
For the ViT model, structured L2 pruning with 0.5 sparsity was applied to the last linear layer and the last attention layer. While initial experiments involved applying LoRA to multiple attention layers, we found that restricting LoRA to the last attention layer yielded the best results. This insight highlights the importance of targeted layer modification in enhancing the model's efficiency.
The exact architecture and implementation details for these experiments can be found in the public repository at https://github.com/vlgiitr/LoRA-Unlearn.
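As a sketch of how LoRA can be restricted to the last attention layer, the snippet below wraps only the final block's attention output projection. It assumes a timm-style ViT, where that projection is exposed as a plain nn.Linear at blocks[-1].attn.proj; the custom ViT used in our experiments may expose its layers differently.

```python
# Illustrative: attach LoRA only to the last attention layer of a ViT.
# The timm layer names below are assumptions, not the authors' exact code.
import timm
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable rank-r low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the base projection frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a zero update

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))


model = timm.create_model("vit_base_patch16_224", num_classes=10)
for p in model.parameters():                 # freeze every base parameter
    p.requires_grad = False

# Wrap only the output projection of the final block's self-attention.
last_attn = model.blocks[-1].attn
last_attn.proj = LoRALinear(last_attn.proj)

# Only the two LoRA factor matrices of the last block remain trainable.
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```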
Detailed metric settings
Details of MIA implementation: MIA is implemented using the prediction confidence-based attack method. There are mainly two phases during its computation: (1) training
phase, and (2) testing phase. To train an MIA model, we
first sample a balanced dataset from the remaining dataset
(Dr) and the test dataset (different from the forgetting
dataset Df) to train the MIA predictor. The learned MIA
is then used for MU evaluation in its testing phase. To
evaluate the performance of MU, MIA-Efficacy is obtained
by applying the learned MIA predictor to the unlearned
model on the forgetting dataset (Df). Our objective is to find
out how many samples in Df can be correctly predicted as
non-training samples by the MIA model.
MIA-Efficacy = TN / |Df|,

where TN refers to the true negatives predicted by our MIA predictor, i.e., the number of forgetting samples predicted as non-training examples, and |Df| refers to the size of the forgetting dataset.
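For illustration, a minimal sketch of this two-phase procedure is given below. It assumes the maximum softmax probability as the attack feature and a scikit-learn logistic-regression predictor, which may differ from our exact implementation.

```python
# Sketch of the confidence-based MIA used for MIA-Efficacy (an illustrative
# reconstruction under stated assumptions, not necessarily the exact implementation).
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression


@torch.no_grad()
def collect_confidences(model, loader, device="cuda"):
    """Max softmax probability of each sample, used as the MIA feature."""
    model.eval()
    confs = []
    for x, _ in loader:
        probs = torch.softmax(model(x.to(device)), dim=1)
        confs.append(probs.max(dim=1).values.cpu())
    return torch.cat(confs).numpy().reshape(-1, 1)


def mia_efficacy(model, retain_loader, test_loader, forget_loader):
    # Training phase: members (Dr, label 1) vs. non-members (test set, label 0).
    member = collect_confidences(model, retain_loader)
    non_member = collect_confidences(model, test_loader)
    n = min(len(member), len(non_member))             # balance the two classes
    X = np.concatenate([member[:n], non_member[:n]])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    attacker = LogisticRegression(max_iter=1000).fit(X, y)

    # Testing phase: forgotten samples predicted as non-members are true negatives.
    forget = collect_confidences(model, forget_loader)
    tn = int((attacker.predict(forget) == 0).sum())
    return tn / len(forget)                            # MIA-Efficacy = TN / |Df|
```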

Future Scope
Due to a lack of computational resources and funding, we were only able to perform a limited number of experiments. Without such constraints, another approach would be to include layer-specific adaptation strategies, where different layers or components of the model are subject to distinct unlearning approaches after studying the optimal unlearning strategy for the respective layers. Models can also be studied in continual learning contexts, where they repeatedly learn and unlearn information over time.
