
BERT and ClinicalBERT

Timeline:
• 2018 Oct: BERT
• 2019 Jan: BioBERT
• 2019 Mar: SciBERT
• 2019 Apr: ClinicalBERT
BERT = Bidirectional Encoder Representations from Transformers

• BERT was published by Google in 2018. It obtained state-of-the-art accuracy on 11 different NLP tasks.
Why was BERT needed?
The lack of training data was a big challenge in NLP, as deep learning models require large amounts of annotated data to perform well.

To address this, researchers have developed pre-training techniques such as BERT to utilize unannotated text data.

These kinds of pre-trained models (e.g., BERT) can be fine-tuned on smaller task-specific datasets (e.g., MIMIC notes) to obtain fine-tuned models (e.g., BioBERT, SciBERT, ClinicalBERT) that achieve better accuracy in a specific domain.

Read more: https://towardsml.wordpress.com/2019/09/17/bert-explained-a-complete-guide-with-theory-and-tutorial/
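Below is a minimal sketch of this pre-train-then-fine-tune pattern using the Hugging Face transformers and datasets libraries; the library choice, the bert-base-uncased checkpoint, and the toy labeled examples are assumptions for illustration, not part of the original tutorial.

```python
# Minimal pre-train -> fine-tune sketch (assumed libraries: transformers, datasets).
# The pretrained weights come from large unannotated corpora; only the small
# labeled dataset below is task-specific.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tiny illustrative labeled dataset (a stand-in for e.g. MIMIC notes with labels).
data = Dataset.from_dict({
    "text": ["patient discharged in stable condition", "patient readmitted within 30 days"],
    "label": [0, 1],
})
data = data.map(lambda x: tokenizer(x["text"], truncation=True,
                                    padding="max_length", max_length=64))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()  # fine-tunes all BERT weights plus the new classification head
```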
What is the core idea behind BERT?
• BERT takes advantage of ideas from multiple earlier models:

(1) Predicting a word from its given context, as in Word2Vec CBOW

(2) A 2-layer bidirectional model, as in ELMo (a word embedding method for representing a sequence of words as a corresponding sequence of vectors)

(3) A Transformer instead of an RNN, as in GPT (Generative Pre-Training): BERT uses the Transformer proposed in "Attention Is All You Need" (2017) to replace the RNN (see the attention sketch below)
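As a minimal illustration of the Transformer's core operation (scaled dot-product attention from "Attention Is All You Need"), the sketch below uses PyTorch; the tensor shapes and names are illustrative only.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)             # each row sums to 1
    return weights @ v                              # (batch, seq, d_v)

# Every position attends to every other position in one step -- no recurrence.
q = k = v = torch.randn(1, 5, 64)   # batch=1, sequence length=5, d_model=64
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 5, 64])
```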
BERT: Pre-training and fine-tuning
The corpora BERT uses for pre-training are BooksCorpus and English Wikipedia.

https://arxiv.org/pdf/1810.04805.pdf
https://huggingface.co/blog/bert-101
BERT pre-training tasks

1. Masked language model (MLM): 15% of the words in a document are selected; of these,

80% are replaced with "[MASK]"

10% keep the original word

10% are replaced with a random word

e.g., To be or [MASK] to be, that is the question

2. Continuous sentence or not (next sentence prediction):

"To be or not to be, that is the question" vs. "To be or not to be, or to take arms against a sea of troubles"
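A minimal sketch of the 80/10/10 masking rule above, in plain Python; the token list and the [MASK] string are simplified stand-ins for BERT's real WordPiece vocabulary and token ids.

```python
import random

MASK = "[MASK]"
VOCAB = ["be", "or", "not", "to", "that", "question", "the", "is"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15):
    """BERT-style masking: 15% of tokens are selected; of those,
    80% -> [MASK], 10% -> unchanged, 10% -> random vocabulary word."""
    masked, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                       # model must predict the original token
            r = random.random()
            if r < 0.8:
                masked[i] = MASK                  # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = tok                   # 10%: keep the original word
            else:
                masked[i] = random.choice(VOCAB)  # 10%: replace with a random word
    return masked, labels

print(mask_tokens("to be or not to be , that is the question".split()))
```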
Fine-tuning tasks

https://arxiv.org/pdf/1810.04805.pdf
Why ClinicalBERT?
Directly applying BERT to biomedical NLP tasks is not promising because of the word distribution shift from general-domain corpora to biomedical-domain corpora.

Thus, other models, e.g., BioBERT [2], BlueBERT [3], SciBERT, and ClinicalBERT, pretrained on biomedical or clinical domain corpora, have been proposed.
ClinicalBERT fine-tuning

https://arxiv.org/pdf/1904.05342.pdf
Fine-tuning task: Readmission prediction

RP80: recall at precision 80%, which is used to control the false-positive rate.

https://arxiv.org/pdf/1904.05342.pdf
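A minimal sketch of how an RP80-style metric (recall at 80% precision) can be computed from predicted readmission probabilities with scikit-learn; the labels and scores below are made up for illustration only.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def recall_at_precision(y_true, y_score, precision_target=0.80):
    """Highest recall achievable while keeping precision >= precision_target."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    ok = precision >= precision_target
    return recall[ok].max() if ok.any() else 0.0

# Toy labels (1 = readmitted) and model scores, for illustration only.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.75])
print("RP80:", recall_at_precision(y_true, y_score))
```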
ClinicalBERT Tutorial
• Modified from Chris McCormick and Nick Ryan's SciBERT tutorial:

https://colab.research.google.com/drive/19loLGUDjxGKy4ulZJ1m3hALq2ozNyEGe#scrollTo=uXKyKe3NZONV
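As a quick starting point, the sketch below loads a ClinicalBERT checkpoint with Hugging Face transformers; the checkpoint name "emilyalsentzer/Bio_ClinicalBERT" is an assumed, commonly used Hub upload and may differ from the one used in the Colab notebook above.

```python
# NOTE: the checkpoint name is an assumption, not taken from the tutorial itself.
import torch
from transformers import AutoTokenizer, AutoModel

name = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

note = "Patient was admitted with chest pain and discharged after 3 days."
inputs = tokenizer(note, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, num_tokens, 768)
```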
BERT/ClinicalBERT Architecture

BERT_base has 13 hidden layers:

The first layer = the sub-word embedding layer = "input embeddings" (token embeddings + segment embeddings + position embeddings)

The last layer = contextual representations = the final output of BERT; we usually use this as embeddings for other tasks.
https://arxiv.org/abs/1810.04805
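A minimal sketch of inspecting these 13 hidden layers with transformers (output_hidden_states=True); the generic bert-base-uncased checkpoint and the example sentence are used here only for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("To be or not to be", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

hidden_states = out.hidden_states     # tuple of 13 tensors for BERT_base
print(len(hidden_states))             # 13 = input embeddings + 12 encoder layers
input_embeddings = hidden_states[0]   # first layer: token + segment + position embeddings
contextual = hidden_states[-1]        # last layer: contextual representations
print(input_embeddings.shape, contextual.shape)   # both (1, num_tokens, 768)
```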
BERT (ClinicalBERT)'s first layer: input embeddings
1. Token embeddings: A [CLS] token is added to the input word tokens at the beginning of the first sentence and a [SEP] token is inserted at the end of each sentence.

2. Segment embeddings: A marker indicating Sentence A or Sentence B is added to each token. This allows the encoder to distinguish between sentences.

3. Positional embeddings: A positional embedding is added to each token to indicate its position in the sentence.
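The sketch below shows how a BERT tokenizer produces these pieces for a sentence pair ([CLS]/[SEP] tokens, segment ids, and positions); the bert-base-uncased checkpoint and the example sentences are illustrative assumptions.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Sentence pair -> [CLS] sentence A [SEP] sentence B [SEP]
enc = tokenizer("He went to the clinic.", "He was discharged the same day.")

print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'he', 'went', ..., '[SEP]', 'he', 'was', ..., '[SEP]']

print(enc["token_type_ids"])   # segment ids: 0 for Sentence A tokens, 1 for Sentence B
print(list(range(len(enc["input_ids"]))))  # positions 0..n-1 feed the positional embeddings
```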
Tokenizer
The token embeddings are obtained from WordPiece. For example, the word "coronaviruses" is tokenized as "corona ##virus ##es".

For the named entity recognition task, [CLS] is added at the beginning of each sentence and [SEP] is added at the end of each sentence.

https://arxiv.org/pdf/1609.08144v2.pdf
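A short sketch of WordPiece subword splitting with a BERT tokenizer; the exact pieces shown in the comments depend on the vocabulary of the checkpoint used, so treat them as illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Rare words are split into subword pieces; continuation pieces carry the "##" prefix.
print(tokenizer.tokenize("coronaviruses"))
# e.g. ['corona', '##virus', '##es']  (exact pieces depend on the vocabulary)

print(tokenizer.tokenize("readmission prediction"))
# e.g. ['read', '##mission', 'prediction']
```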
Compare BERT and its variants

Get word and sentence embeddings from each model:

• Embeddings of ClinicalBERT
• Embeddings of BioBERT
• Embeddings of BlueBERT
• Embeddings of SciBERT
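A minimal sketch of pulling word and sentence embeddings from each variant for comparison; the Hugging Face Hub checkpoint names below are assumptions (commonly used author or community uploads) and are not taken from the slides.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed Hub checkpoints for each variant (verify before use).
CHECKPOINTS = {
    "BERT": "bert-base-uncased",
    "BioBERT": "dmis-lab/biobert-v1.1",
    "SciBERT": "allenai/scibert_scivocab_uncased",
    "ClinicalBERT": "emilyalsentzer/Bio_ClinicalBERT",
    # BlueBERT: add its Hub checkpoint name here if needed.
}

sentence = "The patient was readmitted with sepsis."

for name, ckpt in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    word_embeddings = out.last_hidden_state[0]        # one vector per token
    sentence_embedding = word_embeddings.mean(dim=0)  # simple mean pooling
    print(name, sentence_embedding.shape)
```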
References:

1. BERT: https://github.com/google-research/bert
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

2. BioBERT: https://github.com/dmis-lab/biobert
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2019). BioBERT: pre-trained biomedical language representation model for biomedical text mining. arXiv preprint arXiv:1901.08746.
Notes: this tutorial is built based on these references:

1. https://towardsml.wordpress.com/2019/09/17/bert-explained-a-complete-guide-with-theory-and-tutorial/
2. Transformer: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
3. https://www.youtube.com/watch?v=Po38Dl-XDd4
4. https://medium.com/@_init_/why-bert-has-3-embedding-layers-and-their-implementation-details-9c261108e28a
