ClinicalBERT
• BERT was published by Google in 2018. It achieved state-of-the-art accuracy on 11 different NLP tasks.
Why was BERT needed?
The lack of training data was a big challenge in NLP, as deep learning models require large amounts of annotated data to perform well.
Pre-trained models such as BERT can be fine-tuned on smaller task-specific datasets (e.g., MIMIC notes) to obtain fine-tuned models (e.g., BioBERT, SciBERT, ClinicalBERT) that achieve better accuracy in a specific domain.
https://arxiv.org/pdf/1810.04805.pdf
https://huggingface.co/blog/bert-101
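As a rough illustration of the fine-tuning workflow described above, here is a minimal sketch using the Hugging Face transformers and datasets libraries. The checkpoint name, the toy dataset, and the hyperparameters are placeholders chosen for the example, not the setup used in the cited papers.
```python
# Minimal fine-tuning sketch (assumed setup: Hugging Face transformers + datasets;
# the checkpoint, dataset, and hyperparameters are illustrative placeholders).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # swap in a domain checkpoint for clinical text
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Any dataset with "text" and "label" columns follows the same recipe.
train_ds = load_dataset("imdb", split="train[:1%]")
eval_ds = load_dataset("imdb", split="test[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_ds = train_ds.map(tokenize, batched=True)
eval_ds = eval_ds.map(tokenize, batched=True)

args = TrainingArguments(output_dir="finetune-out", num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds).train()
```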
BERT pre-training task
1. Uses a Masked Language Model (MLM) to train the model: 15% of the input tokens are selected for prediction, and 80% of those are replaced with the "[MASK]" token.
https://arxiv.org/pdf/1810.04805.pdf
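To make the masked-language-model objective concrete, the sketch below uses the standard Hugging Face fill-mask pipeline; the public bert-base checkpoint and the example sentence are assumptions for illustration, not taken from the slides.
```python
# Fill in a "[MASK]" token with a pretrained BERT masked language model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The patient was admitted to the [MASK] for chest pain."):
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```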
Why ClinicalBERT?
Directly applying BERT to biomedical NLP tasks is not promising because of the word-distribution shift from general-domain corpora to biomedical-domain corpora.
Thus, other models, e.g., BioBERT [2], BlueBERT [3], SciBERT, and ClinicalBERT, pretrained on biomedical-domain corpora, have been proposed.
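In practice, switching to one of these domain-adapted models is just a change of checkpoint name. The Hugging Face model IDs below are commonly used public releases and are an assumption here; the slides cite the papers rather than specific checkpoints.
```python
# Load general-domain BERT and two biomedical/clinical variants the same way.
from transformers import AutoModel, AutoTokenizer

checkpoints = {
    "BERT": "bert-base-uncased",
    "BioBERT": "dmis-lab/biobert-base-cased-v1.1",
    "ClinicalBERT": "emilyalsentzer/Bio_ClinicalBERT",
}

for name, ckpt in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)
    print(f"{name}: vocab size = {tokenizer.vocab_size}")
```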
ClinicalBERT finetuning
https://arxiv.org/pdf/1904.05342.pdf
Fine-tuning task: readmission prediction.
RP80: recall at a precision of 80%, which is used to control false positives.
https://arxiv.org/pdf/1904.05342.pdf
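A minimal sketch of how RP80 (recall at 80% precision) can be computed from predicted readmission probabilities, using scikit-learn's precision-recall curve; the toy labels and scores are illustrative only.
```python
# Recall at a fixed precision of 0.80, computed from predicted probabilities.
import numpy as np
from sklearn.metrics import precision_recall_curve

def recall_at_precision(y_true, y_prob, min_precision=0.80):
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    ok = precision >= min_precision            # operating points that keep false positives low
    return recall[ok].max() if ok.any() else 0.0

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                     # toy readmission labels
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.6])   # toy model scores
print("RP80:", recall_at_precision(y_true, y_prob))
```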
ClinicalBERT Tutorial
• Modified from Chris McCormick and Nick Ryan's SciBERT tutorial:
https://colab.research.google.com/drive/19loLGUDjxGKy4ulZJ1m3hALq2ozNyEGe#scrollTo=uXKyKe3NZONV
BERT/ClinicalBERT Architecture
BERT_base produces 13 hidden states (the embedding output plus the outputs of its 12 Transformer encoder layers):
The first = the sub-word embedding layer = "input embeddings" = token embeddings + segment embeddings + position embeddings.
The last layer = contextual representations = the final output of BERT; we usually use this as embeddings for other tasks.
https://arxiv.org/abs/1810.04805
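The hidden-state layout described above can be inspected directly; the sketch below (assuming the Hugging Face transformers API and the public bert-base checkpoint) prints the embedding-layer output and the final contextual representation.
```python
# Inspect the 13 hidden states: index 0 = input-embedding output, index -1 = final layer.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("The patient was discharged home.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

hidden_states = outputs.hidden_states
print(len(hidden_states))                                # 13 = embeddings + 12 encoder layers
print(hidden_states[0].shape, hidden_states[-1].shape)   # both (1, seq_len, 768)
```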
BERT's (and ClinicalBERT's) first layer: input embeddings
1. Token embeddings: a [CLS] token is added to the input word tokens at the beginning of the first sentence, and a [SEP] token is inserted at the end of each sentence.
2. Segment embeddings: a marker indicating sentence A or sentence B is added to each token. This allows the encoder to distinguish between sentences.
3. Positional embeddings: a positional embedding is added to each token to indicate its position in the sentence.
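To see the three components for a sentence pair, the sketch below (again assuming the Hugging Face BERT implementation) prints the tokens and segment IDs and sums the token, segment, and position embeddings, which is what the model's embedding layer does before layer normalization.
```python
# The three pieces of BERT's "input embeddings" for a sentence pair.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

enc = tokenizer("The patient has a fever.", "She was given antibiotics.", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]))  # [CLS] ... [SEP] ... [SEP]
print(enc["token_type_ids"])                                 # 0 = sentence A, 1 = sentence B

emb = model.embeddings
tok = emb.word_embeddings(enc["input_ids"])
seg = emb.token_type_embeddings(enc["token_type_ids"])
pos = emb.position_embeddings(torch.arange(enc["input_ids"].shape[1]).unsqueeze(0))
print((tok + seg + pos).shape)  # summed embeddings fed to the encoder (after LayerNorm/dropout)
```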
Tokenizer
The token embeddings are obtained from the WordPiece vocabulary. An example of a tokenized word: "coronaviruses" => "co ##rona ##virus ##es" (continuation sub-word pieces are prefixed with "##").
For the named entity recognition task, [CLS] is added at the beginning of each sentence and [SEP] is added at the end of each sentence.
https://arxiv.org/pdf/1609.08144v2.pdf
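A quick way to see WordPiece in action; the checkpoint choice is an assumption, and the exact pieces depend on each model's vocabulary.
```python
# WordPiece splits rare words into sub-word pieces; continuation pieces start with "##".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("coronaviruses"))  # exact pieces depend on the checkpoint's vocabulary
```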
Compare BERT and its variants
For each model, we get word embeddings and sentence embeddings: the embeddings of ClinicalBERT, BioBERT, BlueBERT, and SciBERT.
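A sketch of extracting word-level and sentence-level embeddings in the same way from any of the variants; mean pooling over the last hidden state is one simple choice for a sentence embedding, and the checkpoint name is an assumption as above.
```python
# Word embeddings = per-token vectors from the last hidden state;
# sentence embedding = mean pooling over those vectors (one common choice).
import torch
from transformers import AutoModel, AutoTokenizer

def embed(checkpoint, text):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    word_embeddings = out.last_hidden_state[0]        # (num_tokens, hidden_size)
    sentence_embedding = word_embeddings.mean(dim=0)  # (hidden_size,)
    return word_embeddings, sentence_embedding

words, sentence = embed("emilyalsentzer/Bio_ClinicalBERT",
                        "The patient was readmitted within 30 days.")
print(words.shape, sentence.shape)
```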
References:
1. BERT: https://github.com/google-research/bert
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
2. BioBERT: https://github.com/dmis-lab/biobert
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J.
(2019). BioBERT: pre-trained biomedical language
representation model for biomedical text mining. arXiv
preprint arXiv:1901.08746.
Note: this tutorial is built on the following references:
1. https://towardsml.wordpress.com/2019/09/17/bert-explained-a-complete-guide-with-theory-and-tutorial/
2. Transformer:
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L.,
Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you
need. In Advances in neural information processing systems
(pp. 5998-6008).
3. https://www.youtube.com/watch?v=Po38Dl-XDd4
4. https://medium.com/@_init_/why-bert-has-3-embedding-layers-and-their-implementation-details-9c261108e28a