BAM! Born-Again Multi-Task Networks for Natural Language Understanding

Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le


Abstract
It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.
Anthology ID:
P19-1595
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Editors:
Anna Korhonen, David Traum, Lluís Màrquez
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
5931–5937
Language:
URL:
https://aclanthology.org/P19-1595/
DOI:
10.18653/v1/P19-1595
Bibkey:
Cite (ACL):
Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, and Quoc V. Le. 2019. BAM! Born-Again Multi-Task Networks for Natural Language Understanding. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5931–5937, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
BAM! Born-Again Multi-Task Networks for Natural Language Understanding (Clark et al., ACL 2019)
Copy Citation:
PDF:
https://aclanthology.org/P19-1595.pdf
Video:
 https://aclanthology.org/P19-1595.mp4
Code
 google-research/google-research
Data
CoLAGLUEMRPCMultiNLISSTSST-2

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy