Myers-Briggs Personality Classification and Personality-Specific Language Generation Using Pre-Trained Language Models
…four dichotomies as indicators of personality traits. This paper examines the use of pre-trained language models to predict MBTI personality types based on scraped labeled texts. The proposed model reaches an accuracy of 0.47 for correctly predicting all 4 types and 0.86 for correctly predicting at least 2 types. Furthermore, we investigate the possible uses of a fine-tuned BERT model for personality-specific language generation. This is a task essential for both modern psychology and for intelligent empathetic systems.

…personality given a text. However, we hypothesize that using pre-trained language models might allow us to pick up on various subtleties in how different personality types use language. The objective of this paper is thus two-fold: given a set of labeled texts, first, we want to train a model that can predict the Myers-Briggs type, and second, we want to use this model to generate a stream of text given a certain type.

2 Related Work

2.1 Personality Classification Systems
6.1 Methodology

In this task, we want to be able to generate a stream of text from a given MBTI personality type. The language generation is built using the PyTorch "BertForMaskedLanguageModelling" custom model, which consists of the BERT transformer with a fully pre-trained masked language model. As in the sequence classification task, the main architecture of the pre-trained BERT model contains 12 layers, a hidden size of 768, and around 110 million parameters. However, while the BERT model used in the classification task handles only lower-case words, the BERT model here accounts for upper-case words too, since we want the language generation model to learn when to generate upper-case and lower-case words.
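The paper does not include code, but, assuming the "BertForMaskedLanguageModelling" model corresponds to the BertForMaskedLM class in the Hugging Face Transformers library (an assumption on our part; the original work may have used the older pytorch-pretrained-bert package), a minimal sketch of setting up the cased masked language model and its tokenizer could look like this:

```python
# Minimal sketch (not the authors' code): load a cased BERT masked LM and its
# tokenizer so the model can learn upper-/lower-case usage during fine-tuning.
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
# bert-base-cased: 12 layers, hidden size 768, ~110M parameters.
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.train()  # enable dropout for fine-tuning
```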
The tokenizer implemented in this language model uses the same method as the one used in classification.

For each personality type, we train the language model on the corresponding texts we scraped. We compute the losses after each epoch and gather the final losses of the experiment run for each personality type as the results.
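A rough sketch of that per-type fine-tuning loop is shown below, assuming the scraped posts are held in a hypothetical texts_by_type dictionary; the masking probability, learning rate, batch size, and sequence length are illustrative guesses rather than the authors' reported settings:

```python
# Sketch of per-type masked-LM fine-tuning with per-epoch loss tracking.
# texts_by_type is a placeholder for the scraped forum posts, keyed by MBTI type.
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import BertTokenizer, BertForMaskedLM, DataCollatorForLanguageModeling

texts_by_type = {"ENFJ": ["example post ..."], "INTJ": ["example post ..."]}  # placeholder data

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

final_losses = {}  # final average loss per personality type
for mbti_type, posts in texts_by_type.items():
    model = BertForMaskedLM.from_pretrained("bert-base-cased")
    model.train()
    optimizer = AdamW(model.parameters(), lr=5e-5)
    encodings = [tokenizer(p, truncation=True, max_length=128) for p in posts]
    loader = DataLoader(encodings, batch_size=16, shuffle=True, collate_fn=collator)

    for epoch in range(10):  # the experiments below report losses after 10 epochs
        epoch_loss = 0.0
        for batch in loader:
            loss = model(**batch).loss  # masked-LM loss for this batch
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            epoch_loss += loss.item()
        final_losses[mbti_type] = epoch_loss / len(loader)
```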
The results show that ENFJ, ESFJ, ESFP, and ESTJ have the lowest losses among all the personality types, as their losses are all under 0.02 after training for 10 epochs. All of these types contain "E" as the value of the first dichotomy. Interestingly, this means BERT is better at generating language for extroverted personalities as opposed to introverted personalities. This might be because there is more data for extroverted personalities, as they tend to be more active on forums, and thus BERT is more capable of mimicking them.

Here is an example of a sentence generated by the model trained on ENFJ, which has the lowest loss of all. The input text is shown first, followed by the generated continuation:

“I have no idea if he feels the same way, and I am too afraid to press it. Our relationship is very special to too lose them not much reasons if for anyone if or [UNK] or so Kahn else anything which is public or now not is or or ...”
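The paper does not describe the exact decoding procedure behind this example. One common way to produce a continuation from a masked language model is to repeatedly append a [MASK] token to the prompt and fill it with the model's top prediction; the greedy sketch below illustrates that idea and should be read as an assumption, not the authors' method:

```python
# Greedy continuation with a fine-tuned masked LM: insert a [MASK] before the
# final [SEP], predict it, keep the predicted token, and repeat.
import torch

def generate_continuation(model, tokenizer, prompt, max_new_tokens=30):
    model.eval()
    ids = tokenizer.encode(prompt, add_special_tokens=True)  # [CLS] ... [SEP]
    for _ in range(max_new_tokens):
        masked = ids[:-1] + [tokenizer.mask_token_id] + ids[-1:]
        with torch.no_grad():
            logits = model(torch.tensor([masked])).logits
        next_id = int(logits[0, len(masked) - 2].argmax())  # prediction at the [MASK] position
        ids = ids[:-1] + [next_id] + ids[-1:]
    return tokenizer.decode(ids[1:-1])  # drop [CLS]/[SEP]

# Example usage:
# generate_continuation(model, tokenizer,
#     "I have no idea if he feels the same way, and I am too afraid to press it.")
```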
Comparing the pairs that differ only in the second dichotomy, "N/S" (for example, ENFP and ESFP), indicates that S usually results in a lower loss than its N counterpart. Intuitive personalities are generally less dependent on the senses and logic, while sensing personalities usually rely more on facts, suggesting that BERT can more easily generate sense-based and logical language as opposed to abstract language.

For the last two dichotomies, "F/T" and "J/P", the choice does not impact the performance of the language model much. The first two dichotomies are more dominant in affecting the results of language generation.
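One way to make this comparison concrete is to average the final per-type losses over the two sides of each dichotomy. The snippet below sketches that aggregation, reusing the hypothetical final_losses dictionary from the training sketch above; the values it would print are not the paper's actual numbers:

```python
# Average the final losses over each side of the four MBTI dichotomies
# (letter positions: 0 = E/I, 1 = N/S, 2 = F/T, 3 = J/P).
# Assumes final_losses contains entries for all 16 MBTI types.
from statistics import mean

dichotomies = [("E", "I"), ("N", "S"), ("F", "T"), ("J", "P")]
for position, (a, b) in enumerate(dichotomies):
    side_a = mean(loss for t, loss in final_losses.items() if t[position] == a)
    side_b = mean(loss for t, loss in final_losses.items() if t[position] == b)
    print(f"{a}: {side_a:.4f}  vs  {b}: {side_b:.4f}")
```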
7 Conclusion and Future Work

In this paper, we explored the use of pre-trained language models (BERT) for personality classification. For the classification task, the proposed model achieved an accuracy of 0.479. Furthermore, for language generation, we achieved losses of around 0.02. Possible improvements to this model include using larger and cleaner datasets. If computation and memory resources permit, we could also use a larger BERT model (bert-large, which has around 340 million parameters) instead of bert-base.
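Under the same assumed Transformers setup as the sketches above, moving to the larger cased checkpoint would only require changing the checkpoint name:

```python
# bert-large-cased: 24 layers, hidden size 1024, ~340M parameters.
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-large-cased")
model = BertForMaskedLM.from_pretrained("bert-large-cased")
```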