Myers-Briggs Personality Classification and Personality-Specific
Language Generation Using Pre-trained Language Models

Sedrick Scott Keh
The Hong Kong University of Science and Technology
sskeh@connect.ust.hk

I-Tsun Cheng
The Hong Kong University of Science and Technology
ichengaa@connect.ust.hk

arXiv:1907.06333v1 [cs.LG] 15 Jul 2019

Abstract

The Myers-Briggs Type Indicator (MBTI) is a popular personality metric that uses four dichotomies as indicators of personality traits. This paper examines the use of pre-trained language models to predict MBTI personality types based on scraped labeled texts. The proposed model reaches an accuracy of 0.47 for correctly predicting all 4 types and 0.86 for correctly predicting at least 2 types. Furthermore, we investigate the possible uses of a fine-tuned BERT model for personality-specific language generation. This is a task essential for both modern psychology and for intelligent empathetic systems.

1 Introduction The earliest forms of personality recognition


Proposed by psychoanalyst Carl Jung, the Myers- systems employed a combination of SVM and
Briggs Type Indicator (MBTI) is one of the most feature engineering (Rangel et al., 2016). Other
commonly-used personality tests of the 21st methods also utilized part-of-speech frequen-
century. Its prevalence ranges from casual Inter- cies (Litvinova et al., 2015). More recently, newer
net users to large corporations who use it as part systems have begun to use deep learning tech-
of their recruitment process (Michael, 2003). niques such as convolutional neural networks
This test uses four metrics to capture abstract and recurrent neural networks (Shi et al., 2017).
ideas related to one’s personality. These four di- These have the advantage of being able to eas-
chotomies are as follows: ily extract meaningful features, as well as cap-
ture temporal relationships between neighbor-
Overall, personality classification has many complexities because there are countless factors involved. Furthermore, even a human being may not be able to accurately classify a personality given a text. However, we hypothesize that using pre-trained language models might allow us to pick up on various subtleties in how different personality types use language. The objective of this paper is thus two-fold: given a set of labeled texts, we first want to train a model that can predict the Myers-Briggs type, and second, use this model to generate a stream of text given a certain type.

2 Related Work

2.1 Personality Classification Systems

The earliest forms of personality recognition systems employed a combination of SVM and feature engineering (Rangel et al., 2016). Other methods also utilized part-of-speech frequencies (Litvinova et al., 2015). More recently, newer systems have begun to use deep learning techniques such as convolutional neural networks and recurrent neural networks (Shi et al., 2017). These have the advantage of being able to easily extract meaningful features, as well as capture temporal relationships between neighboring context words.

Today, state-of-the-art systems for personality classification employ a wide combination of deep learning techniques. C2W2S4PT is a model that combines a word-level bidirectional RNN and a character-level bidirectional RNN to create hierarchical word and sentence representations for personality modeling (Liu et al., 2016). Other models also incorporate audio and video inputs in CNNs to output more accurate predictions (Kampman et al., 2018).
2.2 Pre-trained Language Models

More recently, pre-trained language models such as BERT have begun to emerge (Devlin et al., 2018). BERT involves bidirectional training of transformers on two tasks simultaneously: masked language modelling and next sentence prediction. BERT has led to numerous improvements in various areas of natural language processing, such as question answering and natural language inference (Zhang and Xu, 2018).

However, to this day, very little research has been done on applying pre-trained language models to personality classification. As such, this paper introduces the use of BERT for classifying personality. Furthermore, we extend this by training the models to generate sentences given a personality type. Such a feature will be especially helpful for empathetic dialogue systems.
3 Data Scraping and Preprocessing

3.1 Scraping

For MBTI personality types, there is no standard dataset. Previous studies on the topic mostly scraped datasets from various social media platforms, including Twitter (Plank and Hovy, 2015) and Reddit (Gjurkovic and Šnajder, 2018).

For this paper, we scraped posts from the forums at https://www.personalitycafe.com. PersonalityCafe's MBTI forums are divided into 16 distinct sections (one for each MBTI type). We also analyzed and discovered that over 95% of users who post in a particular section identify as a member of that type, so these posts serve as a good general representation of how people of these personality types communicate.

When scraping from this forum, we only considered posts that were over 50 characters long, since posts that are too short likely do not contain any meaningful information. Overall, we scraped the 5,000 most recent posts from each forum (some forums had fewer than 5,000 posts), resulting in a sample size of 68,733 posts and 3,827,558 words. This data was divided in an 85-15 train-test split.
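As a rough illustration of this scraping step (not the authors' actual script), the sketch below pulls post bodies from a forum page with requests and BeautifulSoup; the URL pattern and the 'message-body' selector are hypothetical placeholders that would need to match the real page markup.

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical forum section URL; the real section URLs and HTML
    # structure on personalitycafe.com would need to be inspected first.
    SECTION_URL = "https://www.personalitycafe.com/forums/intj-forum.49/"
    MIN_CHARS = 50   # drop very short posts, as described in Section 3.1

    def scrape_section(url, pages=1):
        posts = []
        for page in range(1, pages + 1):
            html = requests.get(f"{url}page-{page}", timeout=30).text
            soup = BeautifulSoup(html, "html.parser")
            # 'message-body' is a placeholder CSS class for post content.
            for node in soup.find_all("div", class_="message-body"):
                text = node.get_text(" ", strip=True)
                if len(text) > MIN_CHARS:
                    posts.append(text)
        return posts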
3.2 Cleaning and Preprocessing

For data cleaning, we first removed all the symbols that were not letters, numbers, or punctuation, and then put spaces around punctuation marks to separate them as new tokens. We separated contractions such as 're or 'll into new tokens (e.g., you're = you + 're). Lastly, we converted all tokens to lowercase.

Another important preprocessing step was to get rid of the instances where MBTI types were explicitly mentioned and replace them with the placeholder "<type>", as these may distort the task or make it too easy for the classifier.
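A minimal sketch of this cleaning pipeline in Python is given below; the exact regular expressions are assumptions, since the paper describes the steps only at a high level.

    import re

    MBTI_TYPES = re.compile(r"\b[IE][NS][TF][JP]\b", re.IGNORECASE)

    def clean_post(text):
        # Replace explicit MBTI mentions (e.g. "INTJ") with a placeholder.
        text = MBTI_TYPES.sub("<type>", text)
        # Keep only letters, digits, punctuation, and whitespace.
        text = re.sub(r"[^A-Za-z0-9\s.,!?;:'\"<>-]", "", text)
        # Put spaces around punctuation so it becomes separate tokens.
        text = re.sub(r"([.,!?;:])", r" \1 ", text)
        # Split contractions like "you're" into "you" + "'re".
        text = re.sub(r"(\w)('re|'ll|'ve|'s|'d|'m|n't)\b", r"\1 \2", text)
        # Collapse repeated whitespace and lowercase everything.
        text = re.sub(r"\s+", " ", text)
        return text.lower().strip()

    print(clean_post("You're such an INTJ, aren't you?!"))
    # -> "you 're such an <type> , are n't you ? !"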
4 Classification Methodology

4.1 Tokenization

For tokenization, we used BERT's custom tokenizer, which includes masking and padding the sentences. Sentences that were too long were truncated at the indicated maximum sequence length, and sentences that were too short were padded with zeros. For classification tasks such as this, the start of the sentence is indicated with the special token "[CLS]" and the end of the sentence with the special token "[SEP]". This works because the model has been pre-trained and we are only fine-tuning over the pre-trained model (Devlin et al., 2018).
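A minimal sketch of this tokenization step, assuming the pytorch-pretrained-bert era API that the paper appears to use (the variable names are illustrative):

    from pytorch_pretrained_bert import BertTokenizer

    MAX_SEQ_LEN = 128
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    def encode(text):
        # Add the special classification and separator tokens.
        tokens = ["[CLS]"] + tokenizer.tokenize(text)[: MAX_SEQ_LEN - 2] + ["[SEP]"]
        ids = tokenizer.convert_tokens_to_ids(tokens)
        # Pad short sequences with zeros up to the maximum length.
        padding = [0] * (MAX_SEQ_LEN - len(ids))
        input_ids = ids + padding
        attention_mask = [1] * len(ids) + padding
        return input_ids, attention_mask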
4.2 BERT Model Fine-Tuning and Training

The fine-tuning model was created using PyTorch's custom "BertForSequenceClassification" model. Since the model was already pre-trained with 110 million parameters, the fine-tuning part essentially involves training the final set of layers on our own corpus of inputs and outputs. The main purpose of fine-tuning is to allow the BERT bidirectional transformer model to adapt to our corpus and classification task, since the model itself was pre-trained on different tasks. Here, the main architecture for the BERT model's fine-tuning for the sequence classification task consists of 12 hidden layers with hidden size 768 and a dropout probability of 0.1, as well as attention functions with an attention dropout of 0.1.

In the training part, we generated a dataloader and iterated over it. Training was done in batches (batch size 32). For the loss function, cross-entropy loss was used, and for the optimizer, we used the BertAdam optimizer with warmup proportion 0.1 and weight decay 0.01.
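A condensed sketch of this fine-tuning setup, again assuming the pytorch-pretrained-bert API; the dataset tensors (input_ids, masks, labels) are assumed to come from the tokenization step above.

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from pytorch_pretrained_bert import BertForSequenceClassification, BertAdam

    NUM_LABELS = 16      # one class per MBTI type
    BATCH_SIZE = 32
    EPOCHS = 30
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=NUM_LABELS).to(device)

    # input_ids, masks, labels: LongTensors built from the training split.
    loader = DataLoader(TensorDataset(input_ids, masks, labels),
                        batch_size=BATCH_SIZE, shuffle=True)

    num_steps = len(loader) * EPOCHS
    optimizer = BertAdam(model.parameters(), lr=1e-5, warmup=0.1,
                         weight_decay=0.01, t_total=num_steps)

    model.train()
    for _ in range(EPOCHS):
        for batch in loader:
            b_ids, b_mask, b_labels = (t.to(device) for t in batch)
            # When labels are given, the model returns the cross-entropy loss.
            loss = model(b_ids, attention_mask=b_mask, labels=b_labels)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()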
5 Classification Results and Analysis

5.1 Hyperparameter Optimization

There were 3 main parameters that we examined, namely learning rate, maximum sequence length, and number of epochs. Other parameters were mostly kept constant: we used the bert-base-uncased model, a training batch size of 32, an evaluation batch size of 8, and a warmup proportion of 0.1. The results are highlighted in Table 1.

Learn. Rate   Max Seq. Len.   Epochs   Acc.
0.001         128             5        0.0972
0.0001        128             5        0.4701
0.00001       128             5        0.4135
0.000001      128             5        0.1739
0.0000001     128             5        0.0901
0.00001       128             30       0.4797
0.00001       64              5        0.4138
0.00001       256             5        0.4146

Table 1: Classification accuracy of different parameter combinations.
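A compact sketch of how such an exploration could be run on top of the training routine above; the paper varies one parameter at a time rather than a full grid, and the train_and_evaluate helper is hypothetical, standing in for the fine-tuning and test-set evaluation code.

    # Hypothetical sweep over the three parameters examined in Table 1.
    learning_rates = [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]
    max_seq_lens = [64, 128, 256]
    epoch_counts = [5, 30]

    results = {}
    for lr in learning_rates:
        for max_len in max_seq_lens:
            for epochs in epoch_counts:
                acc = train_and_evaluate(lr=lr, max_seq_len=max_len,
                                         epochs=epochs, batch_size=32,
                                         warmup_proportion=0.1)
                results[(lr, max_len, epochs)] = acc

    best = max(results, key=results.get)
    print("best setting:", best, "accuracy:", results[best])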
5.2 Discussion

The best accuracy of the model was 0.4797, which was achieved with a learning rate of 10^-5, a maximum sequence length of 128, and 30 epochs. This model vastly outperforms other models and previous benchmarks. Table 2 shows how this BERT model compares to previous studies done on MBTI personality classification.

Method                               Dataset                              Acc.
Logistic Reg. (Plank & Hovy, 2015)   Twitter (2.1 million tweets)         0.190
SVM (Gjurkovic & Šnajder, 2018)      Reddit (22.9 million comments)       0.370
LSTM (Cui & Qi, 2017)                Kaggle dataset (8,675 sentences)     0.380
BERT (this work)                     PersonalityCafe forums (68k posts)   0.479

Table 2: Comparison of classification accuracy with previous studies.

Additionally, from the results of Table 1 (different parameter combinations), we can also analyze the effects of the various parameters on the overall accuracy. First, we see that the learning rate plays a very significant role in the accuracy of the model. As we lower the learning rate, the accuracy increases and then begins to decrease. At learning rates of 10^-3 and 10^-7, accuracy was around 0.09, which is basically close to random guessing.

For the number of epochs, there is also a relatively significant increase in accuracy: when the number of epochs was increased from 5 to 30 for the 10^-5 learning rate model, the accuracy increased from 0.4135 to 0.4797. Meanwhile, for the maximum sequence length, the difference is negligible, as changing from 128 to 64 or 256 only resulted in accuracy changes of less than 0.01.

5.3 Further Results: Other Ways to Measure Accuracy

Aside from the simple measure of accuracy (exact match of all 4 letters), since an MBTI personality consists of 4 letters, we can also consider the number of correctly classified letters (personality categories) per prediction. This makes sense because, for instance, if the true personality is INTJ, then a prediction of INTP (3 matches) would be a much better prediction than a prediction of ESTP (1 match). Thus, for our personality classifier, we also consider the number of correctly classified letters as a measure of accuracy:

At least 1 match   At least 2 matches   At least 3 matches   All 4 match
0.9813             0.8573               0.6606               0.4797

Table 3: Accuracy by number of correctly predicted letters.

From the table above, we can see that almost all the time there is at least 1 match. Probabilistically, the expected number of correctly predicted letters is 2.9789, which is the sum of the four per-category accuracies reported in Table 4.
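A minimal sketch of these letter-level metrics in plain Python, with illustrative names for the gold and predicted type strings:

    def letter_matches(true_type, pred_type):
        # Number of positions (out of 4) where the predicted letter is correct.
        return sum(t == p for t, p in zip(true_type, pred_type))

    def match_statistics(true_types, pred_types):
        counts = [letter_matches(t, p) for t, p in zip(true_types, pred_types)]
        n = len(counts)
        # "At least k matches" rates, as in Table 3.
        at_least = {k: sum(c >= k for c in counts) / n for k in range(1, 5)}
        expected = sum(counts) / n   # expected number of correctly predicted letters
        # Per-dichotomy (binary) accuracies, as in Table 4.
        per_category = [sum(t[i] == p[i] for t, p in zip(true_types, pred_types)) / n
                        for i in range(4)]
        return at_least, expected, per_category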
In addition, we can also consider the accuracy of the predictions for each of the four letter categories, to see whether the BERT language model works well as a binary classifier (Table 4):
E/I       N/S       F/T       P/J
0.7583    0.7441    0.7575    0.7190

Table 4: Individual category accuracies.

From Table 4, we notice that "E/I" and "F/T" have the highest accuracies, which indicates that these two are the dichotomies the BERT model finds easiest to discern. Comparatively, the model has a relatively harder time differentiating between "P/J", although not by that large a difference. These observations are consistent across other studies (Cui and Qi, 2017) and generally reflect observations about the MBTI metric.

Overall, the performance as a binary classifier is actually not that high, as some other models such as SVM were able to achieve an accuracy of around 0.80 (Gjurkovic and Šnajder, 2018). We believe that with more training data and a larger BERT model (bert-large), we can further increase the accuracy of our binary classifier model.
6 Language Generation from a Given Personality Type

6.1 Methodology

In this task, we want to be able to generate a stream of text from a given MBTI personality type. The language generation is built using PyTorch's "BertForMaskedLanguageModelling" custom model, which consists of the BERT transformer with a fully pre-trained masked language model head. Similar to the sequence classification task, the main architecture of the pre-trained BERT model contains 12 layers, a hidden size of 768, and around 110 million parameters. However, while the BERT model used in the classification task handles only lower-case words, the BERT model here accounts for upper-case words too, since we want the language generation model to learn when to generate upper-case and lower-case words.

The tokenizer implemented in this language model uses the same method as that in classification.

For each personality type, we train the language model on the corresponding texts we scraped. We compute the losses after each epoch and gather the final losses of each experiment run, one per personality type, as the results.
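A condensed sketch of this per-type fine-tuning, assuming the pytorch-pretrained-bert masked-LM class (BertForMaskedLM) and a standard BERT-style masking step; the paper names the model slightly differently, so the exact class, helper names, and masking details here are assumptions.

    import torch
    from pytorch_pretrained_bert import BertForMaskedLM, BertTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")  # cased, per Section 6.1

    def mask_tokens(input_ids, mask_prob=0.15):
        # Randomly replace a fraction of tokens with [MASK]; unmasked
        # positions get label -1 so they are ignored by the loss.
        labels = input_ids.clone()
        mask = torch.rand(input_ids.shape) < mask_prob
        labels[~mask] = -1
        masked_ids = input_ids.clone()
        masked_ids[mask] = tokenizer.vocab["[MASK]"]
        return masked_ids, labels

    def finetune_for_type(loader, epochs=10, lr=3e-5):
        # One independently fine-tuned masked LM per MBTI type.
        model = BertForMaskedLM.from_pretrained("bert-base-cased").to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for (batch_ids,) in loader:
                ids, labels = mask_tokens(batch_ids)
                loss = model(ids.to(device), masked_lm_labels=labels.to(device))
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
        return model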
6.2 Language Generation Results

Type   Loss       Type   Loss
ENFJ   0.015910   INFJ   0.032599
ENFP   0.021193   INFP   0.028531
ENTJ   0.029070   INTJ   0.028092
ENTP   0.030716   INTP   0.028124
ESFJ   0.017829   ISFJ   0.027062
ESFP   0.016334   ISFP   0.025123
ESTJ   0.016708   ISTJ   0.026620
ESTP   0.025886   ISTP   0.023900

Table 5: Language generation results (final loss per personality type).

For the hyperparameter settings, we used a training batch size of 16, a learning rate of 3·10^-5, 10 training epochs, a max sequence length of 128, and a warmup proportion of 0.1. We found this to be quite optimal in training the language models of the different personality types, as the loss decreases consistently. The results after 10 epochs for each personality type are shown in Table 5.
6.3 Discussion

The results show that ENFJ, ESFJ, ESFP, and ESTJ have the lowest losses among all the personality types, as their losses are all under 0.02 after training for 10 epochs. All of these types contain "E" as the value of the first dichotomy. Interestingly, this means BERT is better at generating language for extroverted personalities than for introverted personalities. This might be due to the fact that there is more data for extroverted personalities, as they tend to be more active on forums, and thus BERT is more capable of mimicking them.

Here is an example of a generated sentence from the model trained on ENFJ, which has the lowest loss of all. The first part of the quote is the input prompt and the remainder is generated by the model:

“I have no idea if he feels the same way, and I am too afraid to press it. Our relationship is very special to too lose them not much reasons if for anyone if or [UNK] or so Kahn else anything which is public or now not is or or ...”
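The paper does not spell out the decoding procedure used to produce such samples; one common way to sample text from a fine-tuned masked LM is to append a [MASK] slot and fill it in one token at a time, sketched below purely as an illustration (not necessarily the authors' exact method).

    import torch

    def generate_continuation(model, tokenizer, prompt, n_tokens=30, device="cpu"):
        model.eval()
        tokens = ["[CLS]"] + tokenizer.tokenize(prompt)
        for _ in range(n_tokens):
            # Append a [MASK] slot and let the masked LM predict it.
            ids = tokenizer.convert_tokens_to_ids(tokens + ["[MASK]", "[SEP]"])
            input_ids = torch.tensor([ids], device=device)
            with torch.no_grad():
                logits = model(input_ids)          # no labels -> prediction scores
            mask_pos = len(tokens)                 # index of the [MASK] token
            next_id = int(torch.argmax(logits[0, mask_pos]))
            tokens.append(tokenizer.convert_ids_to_tokens([next_id])[0])
        return " ".join(tokens[1:])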
Comparing the pairs that differ only in the second dichotomy, "N/S" (for example, ENFP and ESFP), indicates that S usually results in a lower loss than its counterpart. Intuition personalities are generally less dependent on the senses and logic, while sensing personalities usually rely more on facts, suggesting that BERT can more easily generate sensible and logical language as opposed to abstract language.

For the last two dichotomies, "F/T" and "J/P", the choice does not impact the performance of the language model much. The first two dichotomies are more dominant in affecting the results of language generation.
7 Conclusion and Future Work

In this paper, we explored the use of pre-trained language models (BERT) for personality classification. For the classification task, the proposed model achieved an accuracy of 0.479. Furthermore, for language generation, we achieved losses of around 0.02. Possible improvements to this model would include using larger and cleaner datasets. If computation and memory resources permit, we can also use a larger BERT model (bert-large, trained with 340 million parameters) instead of bert-base.
References

[Cui and Qi2017] Brandon Cui and Calvin Qi. 2017. Survey analysis of machine learning methods for natural language processing for MBTI personality type prediction.

[Devlin et al.2018] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.

[Gjurkovic and Šnajder2018] Matej Gjurkovic and Jan Šnajder. 2018. Reddit: A gold mine for personality prediction. NAACL HLT 2018, page 87.

[Kampman et al.2018] Onno Kampman, Elham J. Barezi, Dario Bertero, and Pascale Fung. 2018. Investigating audio, video, and text fusion methods for end-to-end automatic personality prediction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 606–611, Melbourne, Australia, July. Association for Computational Linguistics.

[Litvinova et al.2015] T. A. Litvinova, P. V. Seredin, and O. A. Litvinova. 2015. Using part-of-speech sequences frequencies in a text to predict author personality: A corpus study. Indian Journal of Science and Technology, 8:93.

[Liu et al.2016] Fei Liu, Julien Perez, and Scott Nowson. 2016. A recurrent and compositional model for personality trait recognition from short texts. In Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES), pages 20–29, Osaka, Japan, December. The COLING 2016 Organizing Committee.

[Michael2003] James Michael. 2003. Using the Myers-Briggs Type Indicator as a tool for leadership development? Apply with caution. Journal of Leadership & Organizational Studies, 10(1):68–81.

[Plank and Hovy2015] Barbara Plank and Dirk Hovy. 2015. Personality traits on Twitter—or—how to get 1,500 personality tests in a week. Pages 92–98.

[Rangel et al.2016] Francisco Rangel, Paolo Rosso, Ben Verhoeven, Walter Daelemans, Martin Potthast, and Benno Stein. 2016. Overview of the 4th author profiling task at PAN 2016: Cross-genre evaluations. In Working Notes Papers of the CLEF 2016 Evaluation Labs, CEUR Workshop Proceedings, pages 750–784.

[Shi et al.2017] Z. Shi, M. Shi, and C. Li. 2017. The prediction of character based on recurrent neural network language model. In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), pages 613–616, May.

[Zhang and Xu2018] Yuwen Zhang and Zhaozhuo Xu. 2018. BERT for question answering on SQuAD 2.0.
