
Incel Radicalisations A Case Study

This thesis studies the online incel community through analyzing over 5 million posts scraped from the Incels.is website over 4 years. The study aims to test if active participation on the site increases the frequency of expressions of misogyny, harassment, and moral outrage, demonstrating a radicalization tendency or increased nihilism. Text classification and regression analyses uncover that longer participation duration and more posts positively predict greater amounts of moral outrage, misogynistic, harassing, and nihilistic content, indicating radicalization trajectories and increased nihilism through participation.


Linköping University | Department of Management and Engineering

Master’s thesis, 30 credits | Master’s programme

Spring 2022 | ISRN: LIU-IEI-FIL-A--22/03971--SE

Incels: Frustrated and Angry due to Deprivation of Intimacy
– A Case Study of the Radicalisation Trajectories of an Online Community on a Fringe Social Media Platform

Aron Kiss

Main Supervisor: Sarah Valdez


Technical Supervisor: Rubing Shen
Examiner: Etienne Ollion

Linköping University
SE-581 83 Linköping, Sweden
+46 013 28 10 00, www.liu.se
Table of Contents
List of tables and figures
Abstract
Acknowledgements
Introduction
Literature Review
Social Movements, Collective Action & Interpretative Frames
Social Movements in the Digitised World, Echo Chambers, and E-bile
Incels – a Social Movement Organisation
Misogyny, Harassment, and other General Incel Themes in Previous Literature
Incel and Jihadist Radicalisation, and Extremism
Data
Web Scraping
Classification themes
Methods
BERT - Bidirectional Encoder Representations from Transformers
Variables for Ordinary Least Squares Regression
Results
Web Scraping & BERT
Regression Results
Discussion
Limitations
Conclusions & Future Work
Appendices
Appendix A1 – Human labelling vs algorithmic classification
Appendix A2: Lorenz-curve
Appendix B1 - R code
Appendix B2 – Python code
References
Online resources used for developing programme code

List of tables and figures
Table 1 BERT Performance Statistics averaged over 5-fold cross-validation
Table 2 Statistics of Manually labelled vs BERT labelled messages
Table 3 Total of Posted Messaged & Count of Threads
Table 4 Monthly Breakdown Statistics
Table 5 User-level statistics
Table 6 Gini-indices
Table 7 OLS Regression Summary table
Table 9 Statistics on Human vs Algorithmic Labelling

Figure 1 Distribution of Posted Message Counts by Theme
Figure 2 Share of Posts vs Share of Users by Theme
Figure 3 Jaccard Similarity of top 50 users
Figure 4 Number of Active Users per Theme
Figure 5 Venn-diagram of Manually Labelled Messages
Figure 6 Lorenz-curve

Abstract

Technological advancements and affordability enable voicing of social injustice, feelings of
deprivation, and oppression. Spatial barriers no longer pose obstacles to connecting with like-minded
(or dissimilar) others to define and refine ingroup and outgroup. Some scholars anticipate that the
internet liberates the discussion of opinions, others claim social networking platforms play a role in
the polarisation of the public by creating echo chambers. However, it is recognised that ideas,
ideologies, and social movements spread across the internet at an unprecedented pace. Connecting
with others with whom one shares deprivation in a support network offers a sense of belonging. Broad
scholarly literature addresses opinion polarisation and potential radicalisation in online social media
platforms. However, quantifying radicalisation trajectories in fringe online communities like the
misogynist incels is still to be done.

In this thesis I study the online presence of the incel community. Incels are mostly young men who
feel stigmatised and need to hide their incel existence. Incels voice their feelings of deprivation of a
relationship and sex with a willing partner. This unfulfilled masculinity and sense of entitlement to sex
cause frustration and anger which are vented in online forums blaming primarily women and
feminism. Calls for action toward social change, and even for violence, are common. However, incels do not
unanimously consider violence a solution; many demonstrate the tame side of the so-called
blackpilled mindset, the acceptance of powerlessness, and nihilism. Regardless, some scholars view
the community as potentially dangerous to society, labelling them as terrorists.

This study investigates whether participating registered users of the Incels.is website display an
increasing tendency toward expressing utterances with the themes of misogyny, harassment, nihilism,
and moral outrage in their posted messages, and whether users gradually become more aligned with
the general perception of incels in previous scholarly work. In other words, this work tests whether
active participation increases the frequency of utterances of misogyny, harassment, and moral
outrage, thus demonstrating a radicalisation tendency or increased nihilism. To answer the research
question, I first scraped the Incels.is website, and retained ~5.38M posts published over 4 years for
analysis. Next, a subset of posts was manually labelled to train a supervised text classification model
(BERT). Finally, the results of the classification task were complemented with Ordinary Least Squares
regression (n = 4623). The analyses uncover temporal user-level radicalisation trajectories, and
increased nihilism. More specifically, the duration of active participation (in days) and the number of
posted messages positively predict the count of moral outrage, misogynistic, harassing, and nihilistic
content.

Acknowledgements

The completion of this thesis work marks the end of a long journey: a journey in a new
country, filled with new cultures, experiences, friends, and beautiful memories. And some hard times
before deadlines. I could not be standing here without the greatest support of my family, my partner,
my old and new friends. Besides the support and encouragement from my closest ones, I want to
extend my gratitude to all lecturers, PhDs, and our administrator at IAS, Madelene, the IAS Seminars
presenters, and so many more employees of LiU. I owe a great deal to my supervisors. Without their
guidance, insight, support, and inspiration, this work would be of lesser quality. I especially thank
Sarah Valdez as my main supervisor for her excellent recommendations to broaden the scope of my
topic, and to structure my thesis. I also thank Rubing Shen as my technical supervisor whose invaluable
insight on Supervised Machine Learning enabled this project to conclude.

Introduction

Broad scholarly literature addresses opinion polarisation and potential radicalisation in online
social media platforms like social networking sites and forums. However, there is a gap in the literature
to quantify radicalisation trajectories on fringe social media platforms which are likely echo chambers.
This gap is possibly due to constraints on data availability and analytical techniques. I argue that
inquiring into such platforms can enrich the field of echo chambers, online polarisation, and radicalisation.
Concerning analytical methods, advancements in machine learning techniques, including Natural
Language Processing have enabled analysing large corpora of textual data. Regarding accessible data,
however, scholarly preference for easy-to-query databases like Twitter and Reddit leaves room to
overlook the opportunities that social media platforms of understudied fringe communities have to offer.
Considering fringe social media sites as potential online echo chambers, textual data from these
platforms can shed new light on, or contrast with, previous theories and findings on radicalisation processes
and user behaviour. Should the researcher want to analyse data from a fringe social media platform,
they will likely find that no API is available for querying databases and accessing vast amounts of
data; rather, the researcher has to employ a web scraper. On the one hand, universal web scrapers
require less programming but tend to be less efficient. On the other hand, programming an effective
and efficient custom-built scraper can become a lengthy task. Besides the difficulties of data
availability, another challenge is to identify salient classification topics and themes to decompose
complex interpretative frames and linguistic patterns for quantitative analysis. This work has
overcome both these challenges. The most popular incels website, Incels.is, offers textual data
covering 4.5 years of recent posting activity. A custom-built web scraper collected ~6M posted
messages which were subsequently prepared for a supervised classification task by a BERT classifier.
Following active reading of posts and informed by the scholarly work carried out in this topic, I devised
four broad themes to capture salient topics of incels. Incels are often depicted as angry, frustrated,
and lonely. For this reason, and because Brady et al. (2021) propose that moral outrage content can
increase the influence of the poster as well as fortify intergroup boundaries, I created the first theme,
moral outrage. Messages were labelled with the moral outrage theme if they conveyed strong
emotions like blame, anger, disgust, or contempt or demanded punishment. The next category is
harassment, another broad theme as it encapsulates mentions or threats of violence, abuse, personal
attack, hate speech, racism, homophobia, etc. Attacks on women were coded as misogyny, unless
mentioning threats of violence or abuse. The misogyny theme captures a very broad notion of anti-
feminism, hatred for women, general sexism, pro-patriarchy, etc. Lastly, the theme nihilism largely
encapsulates the feelings of being unattractive and insignificant coupled with fatalistic resignation,
hopelessness, apathy, mentions of depression or loneliness. The Literature Review section provides
further insights, and the Methods section elaborates on these themes. The corpus of classified
messages permits the creation of variables to estimate user radicalisation trajectories in the Incels.is
website. Quantifying radicalisation processes in social media like Incels.is is a novel approach that can
generate further inquiry both from scholars and policy makers.
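
As a rough illustration of how a corpus of classified messages could be turned into user-level variables, the sketch below aggregates a few made-up labelled posts into the duration of active participation (in days), the number of posted messages, and the count of posts per theme. The `Post` record, the field names, the toy data, and the definition of duration as the number of days between a user's first and last post are all assumptions made for this example; it also assumes that a post can carry more than one theme. It is not the code from Appendix B, and the actual variable definitions are those given in the Methods section.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date

# The four themes named in the text; the identifiers are hypothetical.
THEMES = ("moral_outrage", "harassment", "misogyny", "nihilism")


@dataclass(frozen=True)
class Post:
    user: str
    posted_on: date
    themes: frozenset  # subset of THEMES assigned by the classifier


def user_level_variables(posts):
    """Aggregate labelled posts into one record per user."""
    by_user = defaultdict(list)
    for post in posts:
        by_user[post.user].append(post)

    rows = {}
    for user, user_posts in by_user.items():
        dates = [p.posted_on for p in user_posts]
        theme_counts = Counter(t for p in user_posts for t in p.themes)
        rows[user] = {
            # Assumed definition: days between first and last post.
            "days_active": (max(dates) - min(dates)).days,
            "n_posts": len(user_posts),
            **{theme: theme_counts.get(theme, 0) for theme in THEMES},
        }
    return rows


# Hypothetical toy data, for illustration only.
posts = [
    Post("user_a", date(2021, 1, 1), frozenset({"nihilism"})),
    Post("user_a", date(2021, 1, 11), frozenset({"misogyny", "moral_outrage"})),
    Post("user_b", date(2021, 3, 5), frozenset()),
]
variables = user_level_variables(posts)
print(variables["user_a"])
```

Each resulting record holds one user's predictors (`days_active`, `n_posts`) next to the per-theme counts, which is the general shape a user-level regression would need.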

The term incel is a portmanteau derived from involuntary celibate. Going through phases with
varying periods without physical and emotional intimacy on an involuntary basis is not unique to
individuals who identify as incels, nevertheless aggrieved sexlessness is a hallmark of incels. Identifying
as an incel is a contemporary phenomenon, arguably a product of social media. The incel community
or inceldom is part of the Manosphere, which can be defined as an online representation of a network
of typically anti-feminist, racist, and homophobic groups, communities, and subcultures of men of
mainly white descent (Ging, 2019). The principles of the Manosphere are believed to closely align with
hegemonic or traditional masculinities upheld by white, straight men who hold and recreate structural
oppression over subordinate identities who hold less power like women, sexual, racial, or ethnic
minorities (Glace et al., 2021). Men’s rights activists, including incels, claim that feminism and
neo-liberalism threaten male prerogatives and oppress men (Ging, 2019). However, incels were found to
showcase hybrid masculinities (Glace et al., 2021). This means that incels do not claim traditional toxic,
white, patriarchal power (Glace et al., 2021). Incels do not constitute dominant and competitive alpha
males, rather, they declare membership in a subjugated, genetically inferior group of males. They do
not predominantly consist of racist or homophobic members, unlike typical white, hegemonic
masculinities, but yearn for traditional roles of men and women. Moreover, incels fortify traditional
standards of masculinity by deriding weak, effeminate men, and voicing male biological superiority,
yet they admit being vulnerable and longing for affection. Most incels are young men who carry a
subordinate masculinity stigma (Scaptura & Boyle, 2020), thus feel the need to hide their incel
existence in real life social interactions. Incels perceive being deprived of a relationship and sex as
unfulfilled masculinity. An incel’s aggrieved entitlement to sex and intimacy causes frustration and
anger which are vented in online forums blaming primarily women and feminism. Calls for action toward
social change, and even violence, are common in the posted online messages. However, incels do not
unanimously consider violence a solution to their problems; many demonstrate helpless nihilism.
Irrespective of the disagreement among incels about violence, some scholars view the whole
community as potentially dangerous to society. Incels were accused of a handful of violent and
extremist attacks (e.g., Baele et al., 2019), therefore some call for considering incels terrorists
(Hoffman et al., 2020).

Previous literature has not attempted to quantify radicalisation trajectories on fringe social media
platforms which are likely to be echo chambers. I argue that examining such platforms can enrich the
field of echo chambers, online polarisation, and radicalisation. Besides the gap in the literature, Horta
Ribeiro et al. (2021), investigating the Manosphere, reported that discourse on the Incels.is website
shows higher scores in severely toxic and misogynistic language than other, more senior bases of the
Manosphere. This is a noteworthy claim, as less isolated, more senior incel platforms on Reddit
demonstrated a milder tone than the independent Incels.is website. Thus, I devise the following
research question:

Do the written expressions of members of the Incels.is website display increasing
tendency toward misogynistic, harassing, nihilistic, and moral outrage utterances
in their posted messages as users interact with other users and content of the
Incels.is forums?

In the following, the Literature Review section introduces relevant theories about social
movements, collective action, and interpretative framing. Then, incels are introduced as a social
movement organisation through the lens of the theory previously laid out. Furthermore, the incel
community is a contemporary phenomenon, a product of the internet era. Therefore, this work has
to consider the effects of echo chambers, online disinhibition, e-bile, and increasing opinion
polarisation, too. The following subsection reviews research on discursive themes like misogyny and
harassment prevalent in the incel community. The Literature Review section concludes with a brief
overview of the literature on online radicalisation, and extremism. Finally, this section spells out
hypotheses to test. Thereafter, I reason for the choice of data source, Incels.is which is a fringe social
media platform. I briefly describe the structure of the website and the web scraping technique. Then,
based on the introduced literature, I define the discursive themes I use to manually label a subset of
the collected textual data. Next, in the Methods section, I introduce the supervised machine learning
technique to address the natural language processing task at hand. This section concludes with a
description of variables I use in the proposed parametric regression technique, which is Ordinary
Least Squares. Afterwards, the Results section provides statistics on the corpus and user-level
variables, and an insight into the skewed distribution of posting behaviour. Then, I describe the results
of the regression models, and test the proposed hypotheses. Next, I discuss the implications of the
results along with limitations pertaining to the data and the applied methods. Finally, I conclude this
work with suggestions for future work. A word of warning: the reader is advised that due to the nature
of discursive themes among incels, there are a few instances of inappropriate or offensive language
in this work.

Literature Review

In this section I introduce relevant theories about social movements, collective action, and
interpretative frames. Thereafter, to accommodate technological advancements, I briefly inform the
reader about echo chamber effects and opinion polarisation, besides online toxic language: e-bile. The
subsequent subsections provide an overview of the incel community, and the literature covering
qualitative and quantitative approaches to the main discursive themes in incel communities. Then, I
touch on computational approaches to online radicalisation and extremism. Finally, I propose a gap in
the literature and address this gap by subsequent quantitative analysis.

Social Movements, Collective Action & Interpretative Frames


The turbulent changes from the 1960s prompted social scientists to study (social) movements and
collective action, as individuals, groups, and masses became increasingly mobilised for social action.
Social movements manifest to challenge the status quo of the social structure or to respond to issues
emerging from the changing environment. From single issue movements to more complex, evolving
ones, social movements represent the oppressed and deprived, and seek to find solutions to issues
pertaining to domains of culture, society, policy, and politics. Besides, movements aim to provide an
alternative way and justification for collective action. Furthermore, social movements seek to alter
and create new norms, values, and solutions by propelling change (della Porta & Diani, 2006). They do
so through interpretative (mental) frames. Compared to ideologies, frames are a more flexible cultural
product. Besides, frames are selective: focussing on one issue may result in overlooking another.
Consequently, an interpretative frame does not need to lay out a “whole coherent set of integrated
principles and assumptions but provides instead a key to make sense of the world” (della Porta &
Diani, 2006, p. 79). Frames guide perception and meaning attribution of social and material things,
events, and behaviours (Goffman, 1974) to help social actors navigate, position others and themselves
in everyday life. Social movements depict and address a selected social issue through frames that can
mobilise the individuals for collective action (della Porta & Diani, 2006). According to della Porta &
Diani’s (2006) summary, frames have three elements: once a social issue is recognised, the diagnostic
element (i) of a frame enables the identification of concerned actors, including the deprived or
oppressed; those who can or should act on the problem, as well as those who or what causes the
problem. The prognostic element (ii) of frames is to offer an alternative, and approaches to reach
change. Finally, interpretative frames need a motivational element (iii) to appear credible and relevant
to incentivise collective action and mobilisation.

Movements emerge to respond to social issues and may perish when the issue loses relevance or
becomes resolved (della Porta & Diani, 2006). The survival of social movements depends on the ability
to mobilise their supporters. As the social cause mobilises actors for social action, the framing of the
social problem and inherently the social cause is crucial. Thus, to remain credible and relevant,
movements need to adjust the framing of the social problem to reflect societal, cultural, and political
changes. For frame adjustment, movements can extend their focus and rhetoric on the concerned
social issue, include new issues, or alter their means and practices for mobilisation. By extending the
social problem, movements can not only sustain themselves but also expand and mobilise a larger
group of social actors. This technique of frame adjustment is termed frame-bridging (Snow et al. 1986;
as cited in della Porta & Diani, 2006).

Mobilisation, and ultimately social action, aims to relieve the oppressed and the deprived.
However, changes in the social structure can bring about novel problems. Axiomatically, the newly
deprived, once recognised, will claim victimhood, and declare a problem. Solving one’s problem may
not be in the interest of the other, thus a divide in society can manifest. A social problem is considered
divisive if the problem and the proposed solution to it polarise opinions and those who hold them.
The deeper rooted the problem in society, the more different the opinions of the proponents of each
side, the more difficult to reach consensus and more likely for conflict to arise (DiMaggio et al., 1996).

Social Movements in the Digitised World, Echo Chambers, and E-bile


The previous subsection showed that movements address social problems, offer solutions, and
execute their strategy by mobilising social actors. For mobilisation, a movement needs to inform the
actors of society about its presence, the justification for its existence, its goals, the solutions it
proposes as well as the means to reach social change. Ideally, the information flows in both directions
so movements can promptly respond to changes in the environment in which they exist.

Transmitting an interpretative frame and receiving feedback from social actors require means of information
exchange like networks of actors, printed media, broadcast services, telephone service, etc. Seminal
research has investigated information and innovation diffusion patterns and mechanisms, starting
from theories on opinion leaders (Katz & Lazarsfeld, 1955, as cited in Watts & Dodds, 2007), through
the impact of weak but long ties (Granovetter, 1973), and the importance of reaching a critical mass
(Oliver, Marwell, & Teixeira, 1985) of “easily influenced individuals influencing other easy-to-influence
people” to instigate cascades (Watts & Dodds, 2007, p. 454) to complex contagion models in which
close-knit clusters perform best in diffusing innovation (Centola & Macy, 2007). Although industrial
and technological advancements gradually enabled faster and more efficient means of information
exchange in the pre-internet era, the internet has delivered the greatest innovation in mass
communication thus far. By eradicating geographical boundaries, Web 2.0 enables connecting with
like-minded (or dissimilar) others, sharing content, and exchanging information in an instant. The
internet widened the horizons for social movements, too. Online activism provides an unparalleled
opportunity to reach wide audiences, to mobilise them, and receive instantaneous feedback for frame
adjustment.

A known issue is that the abundance of information on the internet not only enables but
necessitates avoiding some information due to the processing capacity of the human brain. People
have a limited capacity for a network of close peers (Dunbar, 1998), and an inherent preference to
associate and maintain connections with similar others, as the homophily principle posits (McPherson et
al., 2001). In addition, the internet and social media platforms facilitate discovering like-minded others
and enable the creation of homophilous connections. A consequence of the above factors is what
broad literature describes as the echo chamber effect of social media, a phenomenon of people
interacting through online networking, sharing, liking, and commenting with similar others with
congruent opinions. In an echo chamber, participants’ existing beliefs and personal biases receive a
feedback loop and reinforcement, while conflicting views may be avoided or not get challenged and
thus updated (Prasetya & Murata, 2020; Wojcieszak, 2010; Dubois & Blank, 2018). The issue does not
lie within the convergence of opinions of like-minded people but in the dispersion and divergence of
opposing opinions of dissimilar groups, which is polarisation (DiMaggio et al., 1996). Regarding
partisan polarisation in the United States, Levendusky & Malhotra (2016) found that people
overestimate both ingroup and outgroup members’ standings and perceive greater differences than
actually exist. The overestimation of differences can be an artefact of a loud minority; holders of more
extreme opinions tend to be more active in voicing their opinions, thus appear more salient than
people of less extreme views on the issue at hand. This process may reinforce intergroup boundaries
and aggravate differences.

In addition to homophily and confirmation bias (Nickerson, 1998) from feedback loops, Brady et al.
(2020) propose that the design of social media platforms plays a role in the formation of echo
chambers. They found that moral-emotional content is more likely to receive reactions than neutral
content. The authors define content as moralised if “it references ideas, objects, or events typically
construed in terms of the interests or good of a unit larger than the individual (e.g., society, culture,
one’s social network)” (p. 978). In the context of intergroup relations, moral-emotional content is
motivationally relevant as “it can affect an active or ongoing goal” and expressing such content may
increase the odds of capturing others’ attention and can enhance the reputation of the poster (Brady
et al. 2020, p. 993). Calling out transgressors of social norms (that is, moral outrage in a broad sense)
is a step toward preventing future deviant behaviour and restoring justice, as well as a signal of what
the poster considers morally right and wrong, which reinforces intergroup boundaries, and
strengthens group cohesion. Besides, expressing moral-emotional, especially moral outrage content
is less costly among like-minded others. Reinforcement from peers in the form of likes, shares, and
assenting responses promotes habit formation (Crockett, 2017), which is a “central component of
reinforcement learning” (Brady et al., 2021, p. 1). Thus, online media platforms cater for groups of
people with similar values and principles, in theory, providing a space for norm learning and formation
(Brady et al., 2020). In a later work, Brady et al. (2021) hypothesised that moral outrage is a major
emotion to maintain ingroup norms, as it can convey blame attribution and demands for repercussion
of a person or a group(s), events, or things. Moral outrage is present in a post when the poster’s textual
expression carries strong emotions such as anger, disgust, or contempt “in response to a perceived
violation of their personal morals” (2021, p. 2). The authors found evidence that positive social
reinforcement (from peers) of moral outrage content positively predicts future moral outrage.

A salient problem in social media is the degree of harassing, trolling, and bullying content. Jane
(2014) terms such content e-bile. She asserts that online posters use e-bile to “among other things:
register disagreement and disapproval; test and mark the boundaries of on-line communities;
compete and create; ward off boredom; prod for reaction; seek attention; and/or simply gain
enjoyment” (p. 542). What makes sharing such content prevalent is the social disinhibition effect that
Suler (2004) describes as a phenomenon individuals experience due to dissociative anonymity and
physical invisibility, coupled with asynchronicity between the publishing, receiving, and deciphering of
messages. Online social disinhibition is believed to result from the absence of the restraints that
people experience during in-person interactions.

Like ideologies, interpretative frames can be a means to further deepen the divide between
opinions of social actors and induce conflict (della Porta & Diani, 2006). When polarisation leads to
escalated conflict, scholars of violence and terrorism frequently use radicalisation and extremism
interchangeably, to the advantage of the latter term (Kundani, 2012). Mandel (2009, p. 111) sets forth
that “radicalization refers to an increase in and/or reinforcing of extremism in the thinking,
sentiments, and/or behaviour of individuals and/or groups of individuals”.

In summary, online social media platforms help overcome geographic obstacles and enable similar
others to connect. Platforms catering for like-minded people can behave like echo chambers; existing
beliefs and personal biases get reinforced, while incongruent opinions are avoided and do not get
challenged. Posters of moralised content, especially moral outrage, fortify intergroup boundaries by
calling out transgressors of ingroup norms. Sharing of such content can increase their influence and
reputation, while positive reinforcement from peers positively predicts future posting of moral
outrage. Sharing moral outrage and e-bile content for entertainment, or to increase or compete for
influence and reputation, at decreased risk thanks to similar others and online disinhibition, may result
in a distorted perception of peers and group norms. In turn, new members of such platforms learn
misleading group norms and a misleading outgroup image, and thus overestimate the polarisation between
ingroup and outgroup members, which may lead to conflict.

Incels – a Social Movement Organisation


The incel community emerged in the 1990s; it comprised young individuals who perceived
sexlessness as a pressing problem. Sexlessness still persists, and the framing of the problem has been
extended with misogynistic, anti-feminist ideas, and elements of hybrid masculinities. Involuntary
celibacy, in the traditional fashion, encompasses absence or dearth of sex, reciprocated attraction,
and romantic affection. Going through phases with varying periods without physical and emotional
intimacy on an involuntary basis (Donnelly & Burgess, 2008) is common. Nevertheless, aggrieved
sexlessness is a hallmark of incels. The word incel is a portmanteau derived from the term involuntary
celibate. Identifying as an incel is a contemporary phenomenon, arguably a product of social media.

Sexual transition in early adolescence, like establishing and maintaining an intimate relationship with
another individual, can be a pressing normative expectation of society. Although normative
expectations are subject to the individual’s age, historical time, and place, it is becoming a tendency
to complete sexual transition prior to marriage in most Western countries (Elder, 1998; Thornton,
1990). Young virgins who wish to complete sexual transition but are unable to do so can perceive this as lacking
agency to shape their own life courses. Generally, failing in self-verification may result in frustration
that turns into anxiety and depression (Stryker & Burke, 2000). Similarly, Donnelly et al. (2001) and
Donnelly & Burgess (2008) list outcomes of involuntary celibacy like sexual and emotional frustration,
depression, reduced self-confidence, and self-esteem, all of which can affect prospects of future
sexual activity.

The incel community is a social movement organisation. It came about as a support network open
for anyone to discuss their experiences of difficulties in dating, and to find and provide support. As an
organisation, it identifies and addresses an issue: a group of actors faces difficulties in dating. The
recognition of a deprived group and its problem fits the diagnostic element of interpretative frames
(della Porta & Diani, 2006). Over the course of the 2000s, a gradual radicalisation, involving
objectification of women, misogyny, and expressed support for patriarchist values, started to
dominate the discourse on incel forums. For this reason, incel forums have been considered part of
the Manosphere since the early-to-mid 2010s. The Manosphere is the online representation of a network of typically
anti-feminist, racist, and homophobic groups, communities, and subcultures of white men (Ging,
2019). Horta Ribeiro et al. (2021) studied the transformation and expansion of the Manosphere to
investigate whether its members migrated between social media platforms and consequently
influenced the discourse of these sites. They found evidence that migration has taken place between
older and newer social media platforms, which can explain how the initially inclusive support network
of incels turned into an anti-feminist community. This process aligns with the concept of frame
bridging (Snow et al., 1986; as cited in della Porta & Diani, 2006), a technique for social movements to
expand to maintain their relevance. In addition, the Manosphere provided the incel community with
a scapegoat: women and feminism are blamed for men failing in dating and in their masculinity. The
influence of the Manosphere also provided the incels with the prognostic element of interpretive
frames (della Porta & Diani, 2006): it offers an alternative and advocates for action in the form of the
return of traditional family and masculine values to replace neoliberal feminist ideology and feminist
oppression of men.

The principles of the Manosphere are believed to align with hegemonic or traditional masculinities
upheld by white, straight men who hold and recreate structural oppression over subordinate identities
like women, sexual, racial, and ethnic minorities (Glace et al., 2021). Men’s rights activists, including
incels, claim that feminism and neo-liberalism threaten male prerogatives and oppress men (Ging,
2019). However, some scholars of feminism argue that traditional masculinities may not be the best
fit for Generation Y and Generation Z individuals (Pascoe & Bridges, 2014; Ging, 2019; Glace et al.,
2021). Incels are believed to be of young age (mean age = 24.84 years, median = 23 years, n = 272;
Speckhard et al., 2021), with reduced or no sexual experience, thus an adjusted interpretative frame
of traditional masculinity that better aligns with newer generations should appear more credible and
relevant. Consequently, the Manosphere including the incel community needs to adjust its
represented norms and values via its activists by interacting with its target group. In framing theory,
this process is termed frame alignment (Snow et al. 1986; as cited in della Porta & Diani, 2006). The
following hybrid masculinities were theorised to fit the broader Manosphere by Ging (2019) and found
to be applicable to the incel community by Glace et al. (2021). First, discursive distancing specifies that
the members of the community claim to have reduced or no access to traditional manhood by not
having their privilege to sexual intercourse with the opposite sex fulfilled (Glace et al., 2021). Incels do
not constitute dominant and competitive alpha males; moreover, they admit to being vulnerable and
hurt. Second, the strategic borrowing element explains that men in general but particularly incels are
oppressed and victimised by women, and a feminist and misandrist society. With the claim of
oppressive feminist society, incels position themselves in an appropriated marginalised group which
Hoffman et al. (2020, p. 575) termed a “culture of martyrdom”. Lastly, incels fortify traditional
standards of masculinity by deriding weak, effeminate men, voicing male biological superiority, and
yearning for traditional roles of men and women (Glace et al., 2021).

For incels, the salient issue, the lack of physical and emotional intimacy, has endured. The way this
disenfranchisement from sex is expressed, coupled with the voiced blame attribution to women and
feminism, as well as the frequent utterances of misogyny (see Jaki et al., 2019) and toxic language (see
Horta Ribeiro et al., 2021), indicates an extensive and unambiguous conflict between incels (the
deprived) and women (the blamed). The “Black Pill philosophy” (Blackpill, 2022) is a dominant
interpretative framework the incel community proposes. Inherited from the prevalent red pill
framework of the Manosphere, the black pill also carries antifeminism, misogyny, men’s rights
activism, and claims of misandry by feminists and of domination over and oppression of men by a feminist society. The
black pill philosophy asserts constraints on domains of traditional values like the importance of
monogamy, avoidance of premarital sex, female subordinance, and male superiority, yet does not
prioritise addressing topics like immigration and economics; thus the pill frameworks do not offer a
coherent and comprehensive set of principles and values as ideologies do. A tamer, more passive side
of the black pill advocates helpless acceptance of sexlessness and nihilism. Besides a reduction in
uncertainty and peace of mind, the black pill offers incels comradery and shared fate. Incels believe
that they are genetically inferior and that no effort to change can markedly improve their emotional and
physical qualities enough to attain the affection and attraction of women and thus leave their incel status behind.
They advocate giving up on hope (cope) and LDAR (lay down and rot). A shared value is to call for
systemic change in society (occasionally through revolt and unrest), to abolish females’ oppression
of men. In extreme cases, incels glorify suicide, and fantasise about committing violent actions as
retribution against the unjust society before committing suicide (Baele et al., 2019). Hoffman et al. (2020)
summarise that although incels largely agree on the description of the roots of their inceldom, they
are rather divided on the prescription to solve their problem. In addition, Jaki et al. (2019) arrived at
the conclusion that there is no consensus among incels regarding the solution to their problem.

In general, a website offering a ready interpretative frame constructed and endorsed by like-
minded others can be appealing to the susceptible. For those who experience mental health problems
or social inhibition, the anonymity and physical invisibility that social media platforms offer may appear
particularly beneficial to find comradery.

There is no empirical data estimating the number of incels in society, but they are theorised to be
a geographically dispersed online fringe community. Incels rely on online social media platforms to
congregate, discuss matters pertaining to sexlessness and its proposed solution, and to form and reform
group boundaries. Incels report entitlement to sex, and to intimacy to a lesser extent. The perception
of being deprived from these may cause a feeling of failed self-verification and unfulfilled masculinity
which in turn can cause frustration and anger. Besides, they carry a stigma of a subordinate
masculinity (Scaptura & Boyle, 2020), thus feel the need to hide their incel existence. Jaki et al. (2019)
report incels claiming mental health issues like anxiety and depression. Incels on the Incels.is forums
are encouraged to discuss matters pertaining to inceldom. The website features a dedicated genre of
threads, called article, which propagates cherry-picked information (Blackpill, 2022; Baele et al.,
2019) in an attempt to substantiate and advocate the black pill philosophy and to explain why incels fail in dating:
how feminist society pushes the agenda of hypergamy, and how most women date tall, handsome men
only. Interestingly, according to Sharot & Sunstein (2020), people prefer to avoid negative information.
However, incels seem to seek negative information to find explanations for their failed self-verification.
As a consequence of stigma, online presence, and mental health issues, incels vent their anger and
frustration in online forums. They blame primarily women and feminism, often calling for action towards
social change; even violence is common in the posted online messages. Posts in incel forums often
carry sarcasm, irony, outrage, and blame, therefore it is difficult to precisely measure the level of
exaggeration or threat the community poses (Hoffman et al., 2020). Nevertheless, Jaki et al. (2019)
articulate that forums like Incels.is (Incels.me in 2019) reinforce toxic language, regardless of the
benign or malicious intent of the poster. Moreover, Hoffman et al. (2020, p. 574) state that fringe
online spaces provide a platform that connects people with a shared purpose regardless of location and
background, hence such a place “exacerbates the incel threat by accelerating two, mutually
reinforcing trends: online entrenchment both intensifies one’s grievances as well as pushes one
further toward violence”. Incels do not unanimously consider violence a solution; rather, many
demonstrate a state of powerless nihilism. Irrespective of the divide among incels about violence,
some scholars view the community as potentially dangerous to society, particularly because a handful
of violent and extremist attacks were perpetrated by incels (see Baele et al., 2019); therefore, some
call for considering incels terrorists (Hoffman et al., 2020).

Misogyny, Harassment, and other General Incel Themes in Previous Literature


Analysing textual data from incel blogs and websites is sensible, since incels presumably
congregate exclusively online. Utterances of incels have been empirically investigated by scholars like
Jaki et al. (2019), Horta Ribeiro et al. (2021), Farrell et al. (2019), Glace et al. (2021), and Baele et al.
(2019). Recurring themes found in these forums pertain to anti-feminism, misogyny, misandry by
women, toxic language, harassment, negativity, pessimism, suicide, importance of physical
appearance and reinforcing intergroup boundaries. The following subsection provides a summary on
Natural Language Processing (henceforth NLP) approaches to incel forums, misogyny, e-bile, and
finally online (jihadist) radicalisation and extremism.

Farrell et al. (2019) explore misogyny by examining some online bases of the Manosphere.
The authors identify incel communities on Reddit and use a lexicon-based NLP method for each
subreddit to measure misogyny in a broad sense. The calculated word frequencies on nine lexicons
measured prevalence of the following topics: stoicism, patriarchy, flipping the narrative, sexual
violence, physical violence, hostility, belittling, homophobia, and racism. Farrell and colleagues’
selection of categories gauges multiple dimensions of the Manosphere. Publishing their lexicons is
exemplary, yet it sheds light on inconsistencies in how they were built. They conclude that violent rhetoric
against women and anti-feminism have been increasing in the Manosphere. To automate the
identification of misogynistic content online, Anzovino et al. (2018) manually labelled 4.5K tweets. The following
labels were applicable within misogyny: Discredit, Sexual Harassment and Threats of Violence,
Stereotype and Objectification, Dominance, and Derailing. The authors applied multiple NLP
techniques to find the best performing classification algorithm. They concluded that the best
performing combination was Token N-Grams with Support Vector Machine algorithm (hereafter SVM)
(reported accuracy 0.7995). Frenda et al. (2019) investigated whether sexist and misogynistic
utterances are positively associated and reported that sexist posts and tweets more frequently
mentioned women, reflecting a patriarchist mentality in the corpora studied. The authors analysed
the IberEval 2018 (Fersini, Rosso, & Anzovino, 2018) and the EvalIta 2018 (Fersini, Nozza, & Rosso,
2018) training corpora for misogyny detection, besides a self-collected corpus of tweets about
harassment and attacks on women (titled SRW). They report that of the feature extraction methods
employed with SVM, Character N-Grams achieved the best accuracy scores with the external
corpora (IberEval: 0.7544; EvalIta: 0.7877), and a combination of Bag and Sequence of Words with their
own corpus (SRW: 0.8932); lexicon-based ones fared worst.
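
The lexicon-based word frequency approach discussed above can be illustrated with a minimal Python sketch. It is not the implementation of any of the cited studies: the function name `lexicon_rate`, the tokenisation rule, and the two-word placeholder lexicon are assumptions made purely for illustration, and the published lexicons are not reproduced here.

```python
import re
from collections import Counter


def lexicon_rate(text, lexicon):
    """Share of tokens in `text` that appear in `lexicon` (case-insensitive)."""
    # Assumption: tokens are runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[term] for term in lexicon)
    return hits / len(tokens)


# Hypothetical placeholder lexicon, used only to exercise the function.
placeholder_lexicon = {"alpha", "beta"}

posts = [
    "Alpha and beta are labels used in this toy example",
    "This post contains no lexicon terms",
]
rates = [lexicon_rate(post, placeholder_lexicon) for post in posts]
```

Because the score depends only on which words occur, not on how they are used, a sketch like this also makes the context-blindness of lexicon methods easy to see.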

Investigating online hatred towards women on the Incels.me website, Jaki et al. (2019) used the
Pattern toolkit (De Smedt & Daelemans, 2012) to crawl the Incels.me website and gathered around
65K messages by 1,250 unique users, posted in 3,500 threads. Their graph detailing the temporal
distribution of posted messages (Figure 1, “Timeline of messages posted on Incels.me (November 2017 –
April 2018)”, p. 5) suggests that their crawler collected only a fraction of the messages posted
over a sustained period; this matter will be addressed later in the Web Scraping subsection
of the Data section. The authors provide an overview of frequently used words and word combinations,
and reveal three main topics, misogyny, homophobia, and racism. They articulate that forums like
Incels.is (Incels.me in 2019) reinforce toxic language, regardless of the benign or malicious intent of
the poster. They also employ sentiment analysis and demographic profiling via the Textgain API. The latter
yields startling results: only 50% of the posters were labelled as males, 35% of them females, and for
15% of the posters the result was inconclusive. They admit that some automated techniques might
return biased results. A strength of the article is the effort to both qualitatively and quantitatively
evaluate the posted messages on the Incels.me forums, in a creative fashion.

Horta Ribeiro et al. (2021) found that Incels.is was more toxic than its eighteen analysed subreddit
counterparts. The authors employed the Perspective API¹ to obtain severe toxicity levels and applied
a lexicon-based word frequency method to measure misogyny. The lexicon Horta Ribeiro and
colleagues applied was tailored to measure misogyny in a broad sense only, by merging six lexicons obtained
from Farrell et al. (2019). Besides exhibiting more toxic language than its Reddit counterparts,
Incels.is was also found to score high on misogyny. To assess the influence of
the Manosphere on individual forums, they used the Jaccard similarity index and the overlap coefficient, and
discovered within-platform and cross-platform intersecting memberships and user migration between
2006 and 2019 in the Manosphere.
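
For reference, the two set-similarity measures named above can be written out in a few lines of Python. This is a generic sketch of the standard definitions, not the cited authors' code; the user sets `forum_a` and `forum_b` are invented toy data.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """|A ∩ B| / min(|A|, |B|); defined here as 0.0 when either set is empty."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Toy membership sets for two hypothetical forums.
forum_a = {"user1", "user2", "user3", "user4"}
forum_b = {"user3", "user4", "user5"}

j = jaccard_similarity(forum_a, forum_b)   # 2 shared users / 5 distinct users
o = overlap_coefficient(forum_a, forum_b)  # 2 shared users / 3 (smaller forum)
```

The overlap coefficient is less sensitive to size differences between communities, which is presumably why it is reported alongside the Jaccard index when a small forum is compared with a much larger one.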

Baele et al. (2019) utilised Topic Modelling and assert that incels exhibit a high-level separation
between ingroup and outgroup, a unidirectional, highly blame-attributing tendency against women,
and proclaim themselves an (appropriated) victimised group. This “radical dualism” (Strozier et al.,
2010; as cited in Baele et al., 2019, p. 4) is aligned with extremist worldviews. Based on posting
frequencies, they assumed that a loud minority was responsible for a large share of posts, possibly
driving the discourse on Incels.is. Baele et al. (2019) conclude that the Incels.me forums (hosted under
the .me domain during their data collection, Incels.is as of 11 April 2022) play a critical role in the propagation
of the incel worldview by providing a forum: “without a way to relate and discuss, these individuals would
have had no way to recognize themselves as “Incels” and learn the culture and particular idiom that
cements the Incel worldview” through echo chamber dynamics (Baele et al., 2019, p. 20).

Golbeck and her colleagues (2017) developed a labelled corpus of 35,000 tweets on violent online
harassment. Their categorisation for harassing content was based on threats or hate speech addressed
to a group of people based on their inherent traits, like race, sexual orientation, or gender, excluding
disagreement based on political leaning. Chakrabarty et al. (2019) explain another factor to consider
in studying potentially harmful content. They illustrate the problem with the following example:
“Obama is kinder to islam than any other future western leader is likely to be” vs. “you can not even
imagine how i think because i cannot imagine how anyone would take such a vile religion as islam”
(p.77). While the mention of Islam in the first instance carries neutral meaning, in the latter example
the context suggests a negative connotation. They advise against using bag-of-words models and

1
https://www.perspectiveapi.com/

simpler deep learning models for classification tasks like classifying abusive language, regardless of
whether it is harassment, hate speech, racism, or personal attacks. They emphasise the importance
of the context of the message, which can be indicative of the identity of the poster. Thus, self-attention
NLP models that do not account for grammar and word order are expected to perform poorly
compared to more advanced models in such situations as colloquial language. Chakrabarty et al.
(2019) advocate the application of contextual attention models, like Bidirectional
Long Short-Term Memory (Bi-LSTM) networks, which account for the importance of a word in a new
context based on its importance in prior context, unlike self-attention models. They report that
stacked Bi-LSTM networks outperform Bi-LSTM networks of simpler architecture. Similarly, Schmidt
and Wiegand (2017) provide a summary of hate speech research and note that there is no generally
accepted definition of this term. They suggest that bag-of-words approaches are insufficient to gauge
contextual elements for classifying hate speech. The authors discourage using lexicon-based
approaches. Though lexicons can be generally effective, they ignore context, are sensitive to spelling
mistakes and word knowledge. Ideally, lexicons should be available for inspection and tailored for the
specific task at hand.

Incel and Jihadist Radicalisation, and Extremism


In terms of incels, Hoffman et al. (2020) add that some demographics (young, male, unemployed)
of the community are in concordance with lone-wolf terrorism; moreover, the authors reiterate the
antagonism underlying the women vs. incels schism based on Baele and colleagues’ work (2019), and
call to consider incels terrorists. Attacks by incels are hard to fit into typically extreme far-right
ideology, although the manifesto of one incel, Elliot Rodger, which Hoffman et al. (2020) analysed, was found to be deeply
racist. Yet, the authors specify that other incel offenders in the study were of non-white descent. In
concordance with the claim of Hoffman et al. (2020), Jaki et al. (2019) found that the data collected
from the Incels.me (in 2019, Incels.is was hosted at Incels.me) website showed indicators of violent
extremism, namely utterances of “Incel Rebellion” and “Beta Uprising” (p.6), as well as references to
violence “kill, rape, and/or shoot” (p. 19), and the discovered group dynamics (stark separation of
ingroup and outgroup) coupled with the intensity and frequency of harassing and violent content. In
the context of neo-Nazi online forums, Wojcieszak (2010) found that opinion extremism levels were
positively associated with participation in online ideologically homogeneous groups; however, the
sample was small (n = 114), and the regression model yielded low explained variance (18-21%). In the
following, a brief overview introduces the computational approaches to the study of jihadist
extremism.

According to Lara-Cabrera et al. (2017), there is a general agreement on factors that may contribute
to jihadist radicalisation, like socioeconomic status, traumatic individual experiences and grievances,
and a need for belonging and appreciation. Online social platforms enable jihadist militants to more easily
recruit frustrated, angry, marginalised, deprived, and humiliated individuals, who are more susceptible
to becoming indoctrinated. The authors translated personality-related indicators like frustration and
introversion, besides attitudes and beliefs about feeling discriminated against as a Muslim, and utterances
about pro-jihadist views and anti-Western society stances, into indicators to analyse Twitter data. They
recognised strong positive association between expressed perception of discrimination and positive
ideas about Jihadism (r = 0.831), along with increased use of swearwords (p = 0.726). Camacho et al.
(2016) propose a theoretical framework employing community detection clustering techniques to
identify potentially extremist individuals. Nouh et al. (2019) also point out the possible shortcomings
of keyword-based text classification; due to the lack of context, these methods produce high false
positives. They analyse radical language found in propaganda material created by ISIS by using Term
Frequency-Inverse Document Frequency (TF-IDF), N-grams, and word2vec embeddings, and
use Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2001) to assess psychological,
emotional and personality categories. For binary classification of tweets, they considered a tweet
radical (pro-ISIS) if it promoted violence, racism, or supported violent behaviour. In conclusion, the us-
them dichotomy, the count of violent words, and expressed sad emotions are highlighted in the top
ten most important features in their models that may be relevant to this thesis given the applied
methods expounded in the next chapter. Agarwal & Sureka (2015) implemented K-Nearest
Neighbours (KNN) and LibSVM classification to identify hate- and extremism-promoting tweets and to
automate the detection of such content. The dataset contained 45M English-language tweets,
of which 10.5K were manually annotated; moreover, a random sample of 1M tweets was used for
training and validation. Compared to KNN, the LibSVM achieved higher F-Score (0.60, 0.83,
respectively). For a comprehensive review on the analysis, detection, and prediction of jihadist
radicalisation, the reader is advised to consult Fernandez et al. (2018).
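
Since TF-IDF weighting recurs among the feature extraction methods reviewed above, a minimal Python sketch of one common variant may be helpful. It assumes whitespace tokenisation, term frequency normalised by document length, and an unsmoothed natural-log inverse document frequency; the cited studies may have used different variants, and the three toy documents are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["we oppose them", "we support them", "we oppose violence"]
w = tf_idf(docs)
```

In this toy corpus, "we" occurs in every document and therefore receives a weight of zero, whereas "violence" occurs in only one document and receives the highest weight there. This is the sense in which TF-IDF highlights terms that distinguish a document from the rest of the corpus.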

Despite prior work to analyse opinion polarisation and radicalisation in online echo chambers,
efforts to quantify radicalisation trajectories in fringe communities like the misogynistic incels are still
to be done. Based on the introduced previous findings in the literature, I anticipate that echo chamber
effects are present in the Incels.is website, which is dedicated to a fringe group of actors, the members
of the incel social movement organisation. My research question is based on the identified prevalence
of misogyny and harassment themes, and the frequent mentions of a nihilistic approach to life. The final
theme investigated here is what Brady et al. (2021) call moral outrage content which is assumed to
reinforce intergroup boundaries. Hence, I posit the following hypotheses:

H1: Prolonged and repeated active participation (posting) in the Incels.is website is positively
associated with the number of shared messages containing utterances of:

a) Moral Outrage
b) Misogyny
c) Harassment
d) Nihilism

The following section gives an account of the data and employed methods to investigate the
research question of this work through descriptive statistics and hypothesis testing.

Data

In this section, I first provide reasoning for the choice of data source, then a brief description about
the collection, thereafter descriptive statistics on the data follow. I conclude this section with defining
the classification categories which constitute the four themes a posted message can have, besides
fitting none of these labels.

I believe that the Incels.is² website provides a unique source of textual data to study fringe
communities like incels. An advantage of choosing Incels.is for data collection is that registration³ with
the website is a prerequisite for posting. Moreover, posted messages cannot be edited or deleted by
the poster⁴ after the 30-minute edit window has passed, hence the forum provides a lasting opportunity
for historical text mining. Also, posts from deleted members appear to be retained⁵. Regarding
registration, prospective members of the community need to persuade the forum administrators to
approve the request. This procedure may be described as a gate-keeping process as the rules to use
the website are restrictive. The community only permits and welcomes heterosexual males who
identify as incels. Besides, it is observable that the community tends to report members who do not
follow the community guidelines or who articulate opinions not in concordance with the incel worldview.
In addition, there exist only a handful of exclusively incel forums and blogs (List of incel forums, 2021),
thus individuals seeking answers to why they lack the agency to fulfil their manhood
may turn to this website. Some users describe these forums as a place where they finally find
understanding, comradery, and most importantly explanations for their misfortunes in their love and
sex lives. These users frequently state that they were blackpilled, that is, that they came to an understanding of their
situation: the oppressive feminist ideology, the lookism which revolves around appearances, paired
with hopelessness which provides a reduction in uncertainty.

Furthermore, I anticipate that the discourse on this website is a more distilled form of incel
worldviews compared to forums in which anybody can post, as in the case of Reddit, for the following
reasons: the requirement of registration, and the instances of members reporting other members for not

2
The predecessors of Incels.is were hosted at Incels.co and Incels.me top-level domains (hereafter TLD). The
dataset demonstrates continuity between domains. In other words, posts and threads published at another
TLD, were backed up and restored at the new TLDs, thus, although Incels.is did not function in e.g., 2017,
messages from 2017 are still accessible.
3
My registration attempt as a researcher got rejected, furthermore, demonstration of non-incel values may
result in banning from posting.
4
The administrators can delete a post upon request.
5
There are multiple accounts with username Deleted member and an integer. Upon inspection, 610 such
unique usernames were found after data cleaning, and 452 of these entered the regression with the restricted
user database

aligning with the behaviour expected on Incels.is. The claim from Horta Ribeiro et al. (2021) that
de-platforming marginal communities like incels (i.e., banning of subreddits) may eventuate in increased
toxicity in a less controlled host environment like a dedicated website can substantiate my
expectation.

Web Scraping
The final data collection took place between 8th and 14th March 2022⁶, during which the Incels.is website
was systematically scraped. The website and most of its content are publicly accessible but posting
requires registration. The three publicly accessible forums were visited by a custom-built web scraper
programmed in R 4.1.2 (R Core Team, 2021), RStudio Build 485 (RStudio Team, 2022). In each forum
the thread titles and their corresponding URLs were gathered into a database, thereafter another web
scraper visited each thread. Then, a custom R function retained and stored the posted text messages
along with some basic information regarding the poster (username, posted messages count), besides
the date and time of posting. The earliest threads available on Incels.is date back to early November
2017 when the r/incels subreddit got banned, went defunct and Incels.me was created. The Incels.is
website features a tally for threads, posts, and members; as of 8th March 2022, the tally registered
15,604 members, and approximately 7.44 million posts replying to 331,325 threads.

The scraping of the Incels.is website returned 6,059,167 publicly accessible posts, published by
10,211 unique users (usernames), in 281,729 threads. The posts cover a 53-month period between 8th
November 2017 and 7th March 2022. The html parser of the scraping algorithm did not return emojis,
embedded images or videos. Several posts only contained quotes of previous posts, like retweets.
These quoted messages were removed from each post. Furthermore, some posts contained special
characters like non-Latin characters and symbols which are not meaningful for the choice of Natural
Language Processing algorithm, thus these were removed. The website and its posts are in English,
and I did not encounter posts in any other language during the manual
labelling process. Registered users are advised against sharing sensitive personal information and
are encouraged to use pseudonyms as usernames. Unique web pages of registered users (i.e., user
profiles) are not publicly accessible, hence sensitive personal information could not be collected
either. Moreover, as far as I observed, the usernames do not enable identification of users, hence
I deemed it unnecessary to anonymise the usernames.
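The quote removal and character filtering described above can be sketched as follows. This is a minimal Python illustration, not the custom R function used in this work: the `[quote]...[/quote]` markers and the set of retained characters are assumptions, since the thesis does not specify how quotes appear in the scraped text or which symbols were dropped.

```python
import re

# Hypothetical quote markers: an assumption made for illustration only.
QUOTE_RE = re.compile(r"\[quote\].*?\[/quote\]", flags=re.DOTALL | re.IGNORECASE)
# Assumed whitelist: basic Latin letters, digits, whitespace, common punctuation.
NON_LATIN_RE = re.compile(r"[^A-Za-z0-9\s.,;:!?()'-]")


def clean_post(text: str) -> str:
    """Remove quoted passages and non-Latin symbols, then collapse whitespace."""
    without_quotes = QUOTE_RE.sub(" ", text)
    latin_only = NON_LATIN_RE.sub(" ", without_quotes)
    return re.sub(r"\s+", " ", latin_only).strip()


raw = "[quote]earlier message[/quote] Good morning ☀ everyone ♞"
print(clean_post(raw))  # Good morning everyone
```

A post that consists only of a quote becomes an empty string under this sketch, which is one plausible way such posts drop out during data cleaning.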

After the first round of data cleaning, I retained 5,443,030 posts, posted by 10,166 unique
usernames in 281,729 threads. The NLP classification task was completed on a subset (i.e., retained
posts) of scraped messages, therefore descriptive statistics including theme distributions were

6
Posts published during this period were excluded from analysis

calculated. However, the distributions of two variables (further described in the Variables for Ordinary
Least Squares Regression section), the active days count and the post count of users, were highly
skewed, hence the dataset was sifted further. I removed users who posted fewer than 26 messages each
and were active on fewer than 5 days in the forums of the website. The medians of these two variables
were used as cut-off points for this decision. The final sifting excluded 5,542 users and 67,012 posts.
This is a decrease of roughly 1.2% in the number of posts, yet of roughly 55% in the userbase.
Descriptive statistics are available in Table 5.
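The size of the final sifting can be recomputed directly from the counts reported in this section. The short Python check below uses only those reported counts; the variable names are illustrative.

```python
posts_before = 5_443_030  # posts retained after the first cleaning round
users_before = 10_166     # unique usernames after the first cleaning round
posts_removed = 67_012    # posts excluded by the final sifting
users_removed = 5_542     # users excluded by the final sifting

posts_after = posts_before - posts_removed
users_after = users_before - users_removed
post_drop_pct = 100 * posts_removed / posts_before
user_drop_pct = 100 * users_removed / users_before

print(posts_after, users_after)                          # 5376018 4624
print(f"{post_drop_pct:.1f}% of posts removed")          # 1.2% of posts removed
print(f"{user_drop_pct:.1f}% of users removed")          # 54.5% of users removed
```

The remaining 5,376,018 posts and 4,624 users match the totals analysed in the Results section.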

The next subsection describes the labelling themes for posted message classification. Before this, a
word of caution on using crawlers and universal scrapers vs. a custom-built solution. As mentioned in
the Literature Review section, Jaki et al. (2019) reported having collected posts from Incels.me
between November 2017 and April 2018 with a universal crawler algorithm. Incels.is and
Incels.me are the same website hosted on different servers, and it is understood that changing the
top-level domain while moving servers did not result in the loss of previous threads and posts. Between
November 2017 and April 2018 Jaki and colleagues crawled approximately 65,000 posts from 3,500
threads posted by 1,250 unique users. In comparison, from the same period, I collected and after data
cleaning retained 598,527 posts in 35,825 threads by 2,825 unique users. According to the Internet
Archive, as of 26th April 2018, Incels.me had 654,906 posts in 36,819 threads (Wayback Machine:
Incels.me, 2018). The figures reported by Jaki et al. (2019) thus cover roughly a tenth of the posts
available at the time, which is a warning sign in favour of programming a custom web scraper for
websites without API access to avoid biased sampling. However, it is also possible that the authors
applied some filtering mechanism without detailing it in their work, which would explain the discrepancy.

Classification themes
This subsection provides a description of the labelling themes for the NLP classification task.
Finding measurable concepts for multidimensional social phenomena like the incel movement can be
challenging. For this reason, the definition of the label categories is informed by previous literature spelt
out in the Literature Review section. These categories are broad; thus, I refer to them as themes
hereafter. Variables created from these themes will serve as dependent variables in the regression
models.

A posted message was labelled moral outrage if it conveys an expression of moral outrage, such as
strong emotions like anger, disgust, or contempt; “feelings in response to a perceived violation of their
[posters’] personal morals” (Brady, et al., 2021, p. 2). Moreover, a post labelled as moral outrage can

convey blame attribution to, and demands for repercussions of, a person(s) or a group(s), events, or
things. An example:

Every fucking picture I see for Christmas is a COUPLE. A COUPLE. A COUPLE. A COUPLE. THIS IS
UNREAL. Just STOP. STOP STOP!!!! (ID: 2028558)

Next, harassment is another broad category to encapsulate mentions and threats of physical and
sexual violence, mental abuse, attack on a person, hate speech toward a person or a group based on
their physical appearance, sex, gender identity, sexual orientation, ethnicity, race, or religion. Thus, it
includes homophobia, transphobia, racism, and xenophobia, too. Moreover, incitement to violence or
suicide was labelled as harassment. The conceptualisation of this theme was also guided by the
Wikipedia article on regulating the behaviour of its editors (Wikipedia: No personal attacks, 2022).
It is important to note that posts attacking women in general were labelled as misogyny and
not harassment, unless the post also contained mentions of sexual or physical violence. An example
of a post labelled as harassment:

My advice to you: rope. [Hang yourself] (ID: 3116002)

The misogyny theme captures a broad concept of hatred for women, anti-feminism, general
sexism, belittling women and their experiences, advocation of patriarchy and traditional values,
challenging of legal age of consented sex, perception of misandry (hatred for men), and notions of
men being oppressed by women coined as narrative flipping (Farrell et al., 2019). For instance:

as much as i agree with you , there are exceptions and places where women know their places
(ID: 2937715)

The theme nihilism encapsulates the tame, non-violent side of the Black Pill framework. It
comprises expressions of the individuals feeling unattractive and insignificant coupled with fatalistic
resignation, hopelessness, powerlessness, apathy, mentions of depression, loneliness. Moreover, the
nihilism theme includes expressions of the suicide of the self (poster) but not the incitement for suicide
of others, or incitement for violence or abuse. Farrell et al. (2019) call this ‘stoicism’ (original
reference: Zuckerberg, 2018). An example as follows:

You'll never be able to cope for all those lost years. Once they're gone, they're gone. That's time
you'll NEVER get back. But that's alright, I'm hideous and suffering too. It's fucking over for the
both of us. (ID: 1389937)

The temporal prevalence of these four themes on the individual level provides a preliminary insight
into whether members of the Incels.is forums become more radical over time, that is, display a gradual
deviation from what are considered norms toward a more extreme position in misogyny, harassment, and
nihilism, the trademark of the incel worldview. Moreover, the presence of the moral outrage theme in a
comment can aid in measuring the intensity of radicalisation trajectories. Altogether, 3,048 posts were
manually labelled. During the labelling process, the aim was to obtain approximately 500 positive labels
for each theme. Every posted message can receive any combination of the thematic labels, including all
four labels or none. The least prevalent theme was harassment: 16% of the sampled
messages were harassment positive. The most ubiquitous theme was nihilism, which was deemed
applicable for 22% of the randomly drawn sample of messages.
A Venn-diagram in Figure 5 in Appendix 1 depicts the thematic overlaps of the manually labelled
messages. Besides, Table 8 in Appendix 1 provides information about the counts and shares of each
theme.

Methods

The manual labelling was followed by a text classification task. In this NLP task, the manually
labelled messages described in the previous section were used to train and validate supervised
machine learning (hereafter SML) models. After model hyperparameter tuning, the best performing
models were retained to classify (predict) the remainder of the messages in the dataset. Once the
classification tasks were completed, I computed statistics and created variables for Ordinary Least
Squares (hereafter OLS) regression (Gelman & Hill, 2006).

The confidence in the results of any analysis is reliant on the confidence in the data analysed.
Likewise, the quality of the training data is crucial for model performance in Machine Learning, too.
The more reliably hand-coded data are used for training, the better the model performance that can be expected.
However, accurate deciphering of short textual data (i.e., posted messages) is a challenging task.
Limited social or textual context, and limited knowledge of jargon or argot in the textual data may
result in biased annotation (Sap et al., 2019). Moreover, missing replacements of non-verbal cues in
written communication can further hinder the precise interpretation of the intended message, like

text accompanying emojis, memes, gifs, URLs that are aimed to help convey emotions and clarify the
message. Analytical means to measure the reliability of the concept captured by the variable are
available, like Cohen’s kappa statistic to measure interrater reliability (Cohen, 1960). Crucially, it
requires more than one annotator to gauge the accuracy of operationalisation. Thus, the quality of
the labelled data is positively correlated with the confidence in the accuracy of the findings. All in all,
sound concepts paired with suitable operationalisation, and consistent and measurable annotation
are key to creating valid and reliable variables that substantiate confidence in the findings. In this work,
I carried out the labelling procedure alone; hence interrater reliability could not be measured and the
bias in the manual coding was not accounted for.
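For reference, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance. The Python sketch below shows the calculation on two hypothetical annotators' binary labels; the label vectors are invented for illustration and are not data from this work, which had a single annotator.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels (1 = theme present, 0 = absent) for ten posts.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.583
```

In this toy example the raters agree on 8 of 10 posts, while chance agreement is 0.52, giving a kappa of about 0.58.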

BERT - Bidirectional Encoder Representations from Transformers


Previous research in similar domains found that NLP techniques without contextual attention, like
bag-of-words methods and lexicon-based approaches, perform worse than methods with contextual
attention (see Schmidt & Wiegand, 2017; Chakrabarty et al., 2019). Furthermore, Nelson et al. (2018)
recommend SML for classification tasks with complex themes and concepts. Additionally, these
models should be less sensitive to changes in jargon compared to fixed dictionary methods. In this
work I use a pre-trained Bidirectional Encoder Representations from Transformers (BERT hereafter).
The BERT model was developed by Devlin et al. (2019) and it represents a state-of-the-art model for
a set of language representation tasks, including classification. It employs the self-attention layers
introduced in Vaswani et al. (2017). The authors proposed the Transformer model which was found
more efficient than convolutional and recurrent layers in neural networks. Given the task at hand and
the machine available for computation, I chose the uncased BERT-Base model, the simplest pre-trained
model, with 12 layers (transformer blocks), a hidden size of 768, 12 self-attention heads, and 110M
parameters for English language tasks. To implement the classification task, I used a slightly modified
AugmentedSocialScientist package (Shen, 2021) run in Jupyter Notebooks (Kluyver, et al., 2016) under
Python 3.9.12 kernel (Van Rossum & Drake, 2009).

Since the BERT models are pre-trained, mostly, training in the traditional sense is not needed,
rather the researcher starts with fine-tuning the model with their own training data. However, the
lingo retains the phrase training in most cases, thus I also use the term model training in the following.
The manually labelled posts were 3048 in number and they constitute the training (fine-tuning) set for
the BERT model. For each theme, a separate single label classification training and evaluation were
conducted. Since the training set was only moderate in size, to measure the robustness of the model,
I used 5-fold cross-validation to ensure that the model performance was consistent. In 5-fold
cross-validation, the training dataset is divided into five pseudo-randomised folds of roughly equal size.
In each iteration, four folds (80% of the manually labelled dataset) are used to train the model, while
the remaining fold (20%) is used to evaluate model performance. This procedure is repeated until every
fold has served once as the evaluation set, which ensures that the utilisation of the training set for
pattern learning is balanced. A BERT model requires tokenised data; thus, I used a BERT tokenizer to
split each message in the dataset into WordPiece (sub-word) tokens. Regarding the maximum token length,
BERT-Base allows a maximum of 512 tokens per document (i.e., posted message). The tokenizer first
determines the longest message within the dataset; shorter messages receive padding tokens so that all
documents have the same length. Messages longer than 512 tokens are truncated to 512 tokens.
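The fold construction and the padding and truncation step described above can be sketched in plain Python. This is an illustration under stated assumptions and not the AugmentedSocialScientist implementation: the function names are hypothetical, the padding id 0 is assumed, and the sketch ignores the special tokens that a real BERT tokenizer adds at the start and end of each message.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)          # pseudo-randomise once
    folds = [indices[i::k] for i in range(k)]     # k folds of near-equal size
    for held_out in range(k):
        validation = folds[held_out]
        training = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield training, validation


def pad_or_truncate(token_ids, target_len, pad_id=0, max_len=512):
    """Bring one tokenised message to a common length, capped at max_len."""
    limit = min(target_len, max_len)
    clipped = token_ids[:limit]
    return clipped + [pad_id] * (limit - len(clipped))


splits = list(k_fold_indices(3048, k=5))
print([len(validation) for _, validation in splits])  # [610, 610, 610, 609, 609]
print(pad_or_truncate([5, 6, 7], target_len=5))       # [5, 6, 7, 0, 0]
```

With 3,048 labelled posts, each validation fold holds 609 or 610 posts, and every post is used for evaluation exactly once.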

For each iteration of the cross-validation process, model performance metrics were retained and
averaged. I experimented with two and three epochs for each theme. The observed learning curves
suggested that two epochs gave the best balance between training loss and validation loss for each
theme. The following hyperparameters were kept constant: training batch size = 32; learning rate = 5e-5
and epsilon = 1e-8 for the AdamW optimizer, as recommended by Devlin et al. (2019). To gauge model
performance, the averaged statistics of the 5-fold cross-validation are reported in Table 1. Considering
the relatively small size of the training data, the model performance is acceptable, although lower
than expected. Across all metrics, the misogyny classifier performed best; the other three
classifiers also fared better than chance. Based on the hyperparameter tuning, the best performing
models were chosen to complete the classification tasks. For each theme, a separate classification was
run on the remainder of the unlabelled dataset (~5.38M posts). A message was classified positive if
the predicted probability of class membership was equal to or larger than 50%
($P_{\text{theme}_i} \geq 0.5$). Statistics on themes are available in Table 8 in Appendix 1.
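The decision rule above can be sketched as follows, assuming each of the four single-label classifiers returns one probability per message. The theme keys, function names, and probabilities are hypothetical and serve only to show how a message can carry several labels or none.

```python
THEMES = ("moral_outrage", "harassment", "misogyny", "nihilism")


def assign_labels(probabilities, threshold=0.5):
    """Map each theme's predicted probability to a positive/negative label."""
    return {theme: probabilities[theme] >= threshold for theme in THEMES}


def has_no_theme(labels):
    """True when the message carries none of the four thematic labels."""
    return not any(labels.values())


# Hypothetical classifier outputs for a single message.
example = {"moral_outrage": 0.81, "harassment": 0.12, "misogyny": 0.50, "nihilism": 0.49}
labels = assign_labels(example)
print(labels)
print(has_no_theme(labels))  # False
```

Because each theme has its own classifier, a probability of exactly 0.5 counts as positive, and the labels are assigned independently of each other.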

Table 1 BERT Performance Statistics averaged over 5-fold cross-validation

Variables for Ordinary Least Squares Regression


Having completed the classification tasks, I calculated statistics on theme distribution both at the
aggregate level over time (monthly) and at the user level for the whole duration of membership on
Incels.is. These results are reported in the Results section. As a
reminder, the hypotheses are as follows:

H1: Prolonged and repeated active participation (posting) in the Incels.is website is positively
associated with the number of shared messages containing utterances of:

a) Moral Outrage
b) Misogyny
c) Harassment
d) Nihilism

To test these hypotheses, utterances of each theme were counted for each user. These aggregated
theme counts per user serve as dependent variables in the regression models. The predictor
variables are operationalised as follows: for each user, I counted the days on which the user posted
any message; this count yielded the active days count. This variable is unstandardised, as it is not
weighted by the number of daily posts, for instance. I argue that this measure is more informative than
the time between the user's first and last posts, as a user may receive a temporary ban from
the website moderator, or voluntarily abstain and resume activity later. It is important to note that the
active days count variable is directly affected by the user post count, because to be
considered active on a given day, the user must post at least one message (Pearson's r = 0.71, p <
0.01).
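The two user-level predictors and their correlation can be derived from a list of (username, timestamp) records. The Python sketch below is an illustration with invented records and hypothetical function names; it is not the code used in this work.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt


def user_activity(posts):
    """Return {username: (post_count, active_days_count)} from (user, ISO timestamp) pairs."""
    counts = defaultdict(int)
    days = defaultdict(set)
    for user, stamp in posts:
        counts[user] += 1
        days[user].add(datetime.fromisoformat(stamp).date())  # one entry per calendar day
    return {user: (counts[user], len(days[user])) for user in counts}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented records: user_a posts three times on two days, user_b once.
records = [
    ("user_a", "2020-04-10T09:15:00"),
    ("user_a", "2020-04-10T21:40:00"),
    ("user_a", "2020-04-12T08:00:00"),
    ("user_b", "2020-04-11T13:30:00"),
]
print(user_activity(records))  # {'user_a': (3, 2), 'user_b': (1, 1)}
```

Counting distinct calendar days rather than the span between first and last post is what keeps temporary bans or voluntary breaks from inflating the measure.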

The next independent variable is the post count of users, which is simply the aggregate count of
posts during the period the collected data covers, after data cleaning. Additionally, I created a
dichotomous control variable named top 50 to account for the 50 most active users. As stated
earlier, I had also removed users who posted fewer than 26 messages altogether and were active on fewer
than 5 days in the forums of the website. The medians of these two variables were used as cut-off points
for this decision. Out of the total 5.38M posts, these top 50 users posted 986,493 posts, which account
for 18.3% of the total posts. Thus, to better understand the radicalisation trajectories, differentiating
the most active user base from the average users may be meaningful.

In this subsection, I use Ordinary Least Squares (hereafter OLS) regression (Gelman & Hill, 2006).
For unbiased OLS regression coefficients, I assume a linear and additive relationship between
the variables. The equation below illustrates the estimation in an OLS model:

(1) $y_i = \beta_0 + \beta_1\,\mathit{post\_count}_i + \beta_2\,\mathit{active\_days}_i + \beta_3\,\mathit{top\_50}_i + e_i$

where $y_i$ is the outcome variable (the count of a given theme for user $i$), $\beta_0$ is the intercept,
the other $\beta$s are the coefficients, and $e_i$ is the error term. In OLS the aim is to find the best
fitting line, which is the regression line when the outcome variable is a function of the predictor
variables. The best fit is estimated with least squares estimation; the fit can be assessed through the
sum of squared errors and the R²,

the coefficient of determination (Gelman & Hill, 2006). The estimation was implemented in R with the
lm() function. A summary table is reported in the Results section (Table 7). This work considers
statistical significance at p<0.05.
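To make the estimation concrete, the following Python sketch fits equation (1) by solving the normal equations on a small set of invented users. It is an illustration only: the estimation in this work was done with R's lm(), and the data, function names, and coefficients below are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(rows, y):
    """Least squares coefficients [b0, b1, ...] via the normal equations X'X b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Invented users: (post_count, active_days, top_50).
users = [(3, 1, 0), (12, 4, 0), (40, 9, 0), (90, 30, 0),
         (500, 70, 1), (800, 65, 1), (6, 5, 0), (25, 6, 0)]
# Noise-free outcome built from known coefficients, so the fit should recover them.
theme_counts = [2 + 0.1 * pc + 0.3 * ad - 50 * top for pc, ad, top in users]
b0, b1, b2, b3 = fit_ols(users, theme_counts)
print(round(b0, 3), round(b1, 3), round(b2, 3), round(b3, 3))
```

On real data the outcome contains noise, so the recovered coefficients are estimates with standard errors, which is what the significance tests in the Results section refer to.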

Results

Web Scraping & BERT


First, I address the results of the manual labelling and the subsequent classification tasks. Table 2
suggests that the classification tasks would benefit from a larger training dataset. For the
harassment theme, the BERT model classified as positive about half the share of messages that the manual
labelling of the random sample found. Next, the moral outrage model identified approximately 25% fewer
messages as moral outrage positive than the human coder, in relative terms. Although the misogyny
classifier achieved the best performance, the model underpredicted the human coder by 16%. The closest
share between human and algorithmic classification was for the nihilism theme, where the model's share
was 6% lower than the human coder's. The performance of the models, and thus the confidence in the
classification tasks, are considered when evaluating results.

Table 2 Statistics of Manually labelled vs BERT labelled messages

After the final sifting (excluding a subset of users below the median active days count and post count),
this work analysed 5,376,018 posted messages (Table 3). Table 4 provides descriptive
statistics aggregated over months for the duration the collected data cover (53 months). The retained
messages were published by 4,624 unique usernames (Table 5) in 281,729 unique threads between 8th
November 2017 and 7th March 2022. The most popular thread attracted 6,556 posts. One can observe
that the distribution of Number of Posts per Thread is right-skewed; the median post count is 27 (mode
= 19). This means that half of all threads had 27 or fewer posts. Regarding the number of posts per
month, the lowest post count, 17,653, was recorded for March 2022. The data collection started on 8th
March 2022, thus for this month only one week's worth of data was collected⁷. Consequently, this
statistic should not misguide the perception of the average monthly posts. For this reason, Figure 1
depicts theme count distributions over time without the first and the last months (partial months) of

7
Posts published during the data collection period were not retained by excluding messages posted after
00:00 AM 8th March 2022.

the data collection. A constant pattern around the trends of themes is observable, however there is
prominent fluctuation in the number of shared messages, particularly noticeable in the overall
message count, as well as the number of messages without a theme. Nonetheless, Figure 2. is better
suited to demonstrate a rather steady thematic posting pattern. It shows that the share of themes
across messages and posters probably respond to seasonal trends (meaningful events to the
community), but the fluctuation is not as pronounced overall as Figure 1 suggests. To substantiate this
claim, Table 4 shows that the mean and median statistics of theme shares are very similar. We can
observe that on average, the registered users posted 101,434 messages a month. Of these posts on
average 13.58% contained moral outrage content (median = 13.46%), 8.5% of the posts were classified
harassment (median = 8.15%), on average 16.29% (median = 16.58%) were deemed misogynistic, and
20.71% (median = 20.67%) had utterances pertaining to the nihilistic philosophy many incels claim. At
the same time, the BERT classifiers found that on average 60.94% (median = 60.97%) of the posted
messages carried none of the four labels. Bear in mind that a post may have multiple themes, hence the
shares may add up to over 100%.

Table 3 Total of Posted Messages & Count of Threads

Table 4 Monthly Breakdown Statistics

Table 5 User-level statistics

Regarding the 4,624 users (after excluding users with low activity) whose statistics were used in
the regression models, the following was found (Table 5). On average, users were active on 106 days,
and of all users, the average duration of membership in the Incels.is forums was 358 days. Additionally,
the median values for these statistics were 39 days (mode = 6) and 189 days (mode = 9), respectively.
This indicates that 50% of the userbase were active on less than 40 days and were members for less
than 190 days. The most active user was a registered member for 1581 days, which is roughly the
whole period since the website was established. Moreover, this user posted every day except 61 days
in over 4 years. Table 5 shows that most statistics have higher mean than median values. This implies
that the user involvement in the forums is skewed. More specifically, to unveil the assumed inequality,
the Gini-coefficients were calculated for the count statistics of themes or lack thereof (neither theme).
The Gini-index (Allison, 1978) ranges from 0 to 1, 0 indicates complete equality, while 1 indicates total
inequality in posting across the sample population. All themes display high inequality in the
distribution of posted messages. To aid the interpretation of the Gini-coefficient, the Lorenz-curves of
all themes were drawn in Figure 6 in Appendix 2. The Gini-coefficients reported in Table 6 are between
0.75-0.78, suggesting that likely a small minority of the users posted a high share of messages
regardless of theme. To complement this finding, I use the Jaccard similarity index, which is calculated
as the size of the intersection of two sets over the size of their union. In practice, I compare the 50
most active users (top 50) across the whole period (over 53 months) against the 50 most active users
each month. Like the Gini-index, this index is bounded between 0 and 1. When the Jaccard similarity
index is low, the similarity between the two sets of users is low; conversely, when it is high, the two
userbases display high similarity or overlap. This statistic can be seen in Figure 3. From the start of
Incels.is, an increasing tendency is observable for the Jaccard indices until mid-2020; since then there
has been a decline across all themes. For instance, in August 2020 the Jaccard similarity index for
harassment-themed messages was 0.351, which can be interpreted as comparing the top 50 harassing content

posters in August 2020 to the overall top 50 harassing content users; the users shared by the two sets
made up 35.1% of their union in this month. In other words, there is a considerable overlap between the
monthly and the whole-duration 50 most active users in any theme. This complements the high inequality
in posting revealed by the Gini-indices, and is in line with previous literature (see Baele et al., 2019;
Hoffman et al., 2020). This finding suggests that there is a loud minority among the posters who
endure over time and constantly share high numbers of posts.
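Both inequality measures used above are short to compute. The Python sketch below uses a standard rank-based formula for the Gini coefficient and the set definition of the Jaccard index; the function names and the user sets are hypothetical, and the exact routines used in this work are not specified.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = equal, near 1 = concentrated)."""
    values = sorted(counts)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(gini([1, 1, 1, 1]))    # 0.0
print(gini([0, 0, 0, 10]))   # 0.75

# Two hypothetical top-50 lists that share 26 usernames.
overall_top = {f"user_{i}" for i in range(50)}
monthly_top = {f"user_{i}" for i in range(24, 74)}
print(round(jaccard(overall_top, monthly_top), 3))  # 0.351
```

As the last line shows, for two sets of 50 users a Jaccard index of 0.351 corresponds to 26 shared usernames, since 26 / (100 - 26) is approximately 0.351.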

Table 6 Gini-indices

Figure 1 Distribution of Posted Message Counts by Theme

Returning to Figure 2, it shows rather steady thematic posting patterns. A more interesting
finding, however, is that a high share of active users post messages with themes. For instance, in April
2020, of all active users 79% posted at least one message with moral outrage content, 82% posted at
least one misogyny-themed message, and 86% posted at least one message with nihilistic content.
However, these statistics do not account for within-user shares of themed messages. In other words,
a user with a 1% share of misogynistic posts out of their monthly total has the same representation as
another user with a 100% share of misogynistic posts out of their monthly total. In conclusion, this
variable and the graphs picturing its distribution reveal that most users share utterances of harassing,
misogynistic, morally outraged, and nihilistic content. However, they can also conceal the presence of
a loud minority responsible for a large share of posts (18.3% of posts by the top 50 users).

Figure 2 Share of Posts vs Share of Users by Theme

There is one notable peak in the number of posts (Figure 1) in April 2020. One explanation for this
peak in the monthly post total is that the rival r/IncelTears went defunct on 10th April 2020
(R/IncelTears, 2021). The Incel Wiki describes r/IncelTears as a blog filled with toxic masculinity and
incelphobia. The users of this subreddit would ridicule members of the incel community; thus, when it
shut down, the incel community on Incels.is responded with an increased tendency to post. It is worth
noting that prior to the peak in April, the monthly number of posts had been surging from around
December 2019. After this surge in April 2020, the number of posts and active users decreased slightly
but remained steady.

Figure 3 Jaccard Similarity of top 50 users

Figure 4 shows that the monthly active user count steeply increased from the start till the end of
Q2 in 2018, since mid-2018 there were a few peaks in the active userbase, yet it appears steady with
average 778 users. This graph also depicts linear trendlines (dashed lines) for the active userbase for

each theme, as well as for userbases who post messages with none of the recognised themes, and the
total of active users per month. These dashed linear trendlines demonstrate a slow decline; in other
words, there are fewer active users over time. However, this work does not include a formal time trend
analysis due to the restricted timeframe for completion, thus the magnitude and direction of these
trends are left for later analysis.

Figure 4 Number of Active Users per Theme

Regression Results
The OLS regression summary can be found in Table 7. The following hypotheses were tested,
and coefficients are considered significant at α = 0.05:

H1: Prolonged and repeated active participation (posting) in the Incels.is website is positively
correlated with the number of shared messages containing utterances of:

a) Moral Outrage
b) Misogyny
c) Harassment
d) Nihilism

The first theme is moral outrage. Moral outrage content is positively associated both with active
days count and the number of posts the user has. The intercept in this case is not statistically
significant at α = 0.05, which means that we have no confidence that this coefficient is different from

0. Considering the effect sizes, one can say that with an additional day of active participation on the
Incels.is website, the user is predicted to post 0.293 additional posts with moral outrage content, all
else held constant. A more intuitive reading is that three additional days of active use of the website
yield approximately one new message with the moral outrage theme, ceteris paribus. Similarly, the
coefficient of user post counts positively predicts moral outrage content. This effect size is smaller
(0.109), yet the unit of measurement is posted messages, not days as with the previous coefficient.
An intuitive way to express the impact of this coefficient is that every 10 posted messages
are predicted to include approximately one additional message with moral outrage content, all other
variables held constant. The control variable top 50 is a binary variable marking whether a user is one
of the 50 most active posters of the website. The effect size appears large, although for a binary
variable it only means that being a top 50 poster is associated with 53 additional posts containing
moral outrage content, all else equal. However, the coefficient is not statistically significant, thus
we cannot be confident that the effect is different from 0. In summary, the coefficients for both the
number of active days and the number of posts are positive and statistically significant, therefore
supporting H1a.
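The two intuitive readings above follow from multiplying each coefficient by the change in its predictor while the other predictors are held constant. As a worked check, using the reported coefficients and the symbols of equation (1):

```latex
\begin{align*}
\Delta\hat{y}_{\text{3 active days}} &= \beta_2 \times 3 = 0.293 \times 3 = 0.879 \approx 1 \\
\Delta\hat{y}_{\text{10 posts}} &= \beta_1 \times 10 = 0.109 \times 10 = 1.09 \approx 1
\end{align*}
```

Both products are close to one predicted additional message with moral outrage content, which is why the rounded statements in the text are reasonable approximations.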

Table 7 OLS Regression Summary table

For the misogyny theme the regression model yields a statistically significant constant term
(intercept). It is negative, exemplifying that linear regression coefficients are sometimes not sensible
when interpreted in real-life scenarios, since a count of posts cannot be negative. The coefficient for
the top 50 variable is also negative and significant. The interpretation of this term is the following:
in comparison to users who are not among the top 50 posters, a top 50 user is expected to have 190 fewer
utterances of misogyny, ceteris paribus. The active days count and the number of posts predictors are both

positive and significant, thus supporting H1b. Prolonged and repeated active participation yield an
increase in misogynistic content.

Regarding the harassment theme, the statistically significant top 50 binary variable indicates that
the 50 most active users should have approximately 70 more harassment-containing posts, all else
equal. The post count variable has the smallest effect, however for any user with average number of
posts (mean post count = 1163) the model predicts 72 messages with the harassing theme, ceteris
paribus. Similarly to the aforementioned coefficient, the number of active days variable is significant
and positive, therefore supporting H1c.

The top 50 variable has a very similar impact on the number of nihilism themed messages as in the
case of the number of misogyny-themed messages, on average one of the 50 most active users should
have 172 fewer nihilistic messages than a non-top 50 poster, all else equal. The number of active days
and the number of posts both have a positive association with the predicted nihilism theme count
variable, supporting H1d.

The count of posts with none of the themes is not part of the hypothesis testing. However, during the
data preparation process I recognised that it may be meaningful to include it in the analysis. This
theme stands out as its results appear counterintuitive. Based on the findings above, one could expect
that both the number of active days on the website and the number of posts have a positive association
with the number of messages carrying none of the four themes spelt out in this work. Yet, ceteris
paribus, the number of active days spent on the Incels.is forums reduces the tendency to post content
with none of the defined themes. Nonetheless, a one-unit increase in the user post count variable predicts
0.68 additional messages without any of the four themes, all else equal. In addition, as a baseline, the
top 50 active users are predicted to have approximately 202 more messages with no theme, compared
to other users, while all other variables are held constant.
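
As a rough illustration of how the two coefficients reported for the no-theme count combine, the sketch
below computes the difference in predicted counts between two users who are assumed to share the same
number of active days. Only the values 0.68 and 202 come from the text above. The function name and the
example inputs are hypothetical, and the intercept and the active days coefficient are left out because
they cancel in such a comparison.

```python
# Coefficients reported above for the count of messages with none of the four themes.
PER_POST = 0.68       # additional no-theme messages per additional post
TOP50_SHIFT = 202.0   # approximate baseline shift for a top 50 user


def predicted_difference(extra_posts, top50_change=0):
    """Difference in predicted no-theme counts between two users.

    Assumes both users have the same number of active days, so the intercept
    and the active days term cancel out of the linear prediction.
    top50_change is 1 if only the first user is in the top 50, -1 for the
    reverse case, and 0 if both users share the same status.
    """
    return PER_POST * extra_posts + TOP50_SHIFT * top50_change


if __name__ == "__main__":
    # Hypothetical comparison: 100 more posts, same top 50 status.
    print(predicted_difference(100))
    # Hypothetical comparison: same post count, only one user in the top 50.
    print(predicted_difference(0, top50_change=1))
```

The sketch only restates the linear reading of the coefficients. It says nothing about their uncertainty.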

In summary, the data suggest that active participation in the Incels.is forums increases tendencies
toward utterances of all four discursive themes, which are the measures of radicalisation trajectories
in this work. Although the top 50 user control variable makes only a minor contribution, it adjusts
the other coefficients so that they approximate the real-world effects more closely.

Discussion

This work has quantified radicalisation trajectories on the Incels.is website. The results are
indicative, suggesting that active days on the website and posting frequency both positively predict
subsequent utterances in the identified themes. An explanation according to Brady et al. (2021) is that
posting moral outrage content reinforces intergroup boundaries. In other words, Incels.is members
frequently include moral outrage utterances in their messages to indicate to their peers that they are
legitimate members of the community. Although Brady et al. (2021) asserted that moral outrage content
positively predicts further moral outrage posts, it should also be considered that members may post
moral outrage content to spark dispute and increase their own influence within the community. Jane’s
(2014) notion of e-bile suggests that these posts with strong emotions are published to gain
enjoyment and alleviate boredom. Similarly, the themes of misogyny, harassment, and nihilism can
serve to fortify group membership and, with the exception of nihilism, to exert dominance. Incels
cannot assert real-life dominance over men whom they perceive as having successful romantic and sex
lives. Moreover, the unparalleled data collected made it possible to quantitatively substantiate the
claim about a loud minority. In practice, this means that the discursive tone is highly influenced by a
minority group. The influence of this loud minority is yet to be evaluated. Nevertheless, the risk of a
loud minority acting similarly to the recruitment attempts of jihadist militants is alarming. Lara-Cabrera
et al. (2017) described this process as one in which benign-acting activists approach and attempt to
indoctrinate frustrated, vulnerable, and marginalised individuals, as they are more susceptible. Besides
the signs of an increasing tendency toward misogynistic and harassing content, the nihilism theme is
also worrying. Hopeless nihilism is the graveyard of ambition to take back control over one’s life and to
attempt change for the better. Incels may lose the ambition to complete sexual transition (ascend in
incel terminology) or seek professional help. Policy makers should consider this if they want to act on
incel radicalisation.

Limitations

Although the applied methods allow an insight into user radicalisation trajectories on Incels.is, the
results should be taken with a grain of salt and confirmed by future work. The regression estimators
are likely biased. First, the count of active days and the count of user posts variables show strong
positive correlation (Pearson’s r: 0.71). Second, the work would benefit from user demographics
and other survey data collected from users and not inferred from online user behaviour. Moreover,
my variables measure count data, which are known for skewed distributions and heteroscedastic
variance. More advanced regression methods, such as negative binomial or Poisson regression models,
can tackle this latter issue and yield higher confidence in the estimators. In addition, a further problem
lies within the data. The Incels.is website features a forum called Sewers, which is not publicly
accessible. What gives it away is a discrepancy in the number of posts, the number of threads, and more
importantly, the number of users displayed on Incels.is as opposed to the scraped dataset. The
difference is striking: over a million posts may be hidden in this subforum, which the Rules & FAQ
mentions, too (Rules and FAQ, 2017). These posts are invisible prior to registration, which implies that
a relatively large share of posted messages and users were excluded from the analysis. Moreover, the
manual labelling task can become overwhelming. One reason is the colloquial argot of incels. This
obstacle can be overcome by consulting Urban Dictionary (Urban Dictionary, n.d.), Squirrell’s (2018)
collection of inceldom-related terms, and search engines like Google. Besides occasional ambiguity
stemming from the identified themes being closely related, annotator bias was not addressed in this
work. Additionally, despite the strengths of contextual attention SML models like BERT, the human
brain makes unconscious judgements about patterns in the textual data that a machine cannot infer
and thus cannot reproduce in the classification task. Moreover, the performance metrics of the SML
model would benefit from extending the number of manually labelled messages. Finally, BERT models
are not a panacea. Although BERT is still evolving, other SML classification approaches should be
trialled, like the stacked Bi-LSTM models recommended by Chakrabarty et al. (2019).
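
To give a sense of scale for the correlation of 0.71 between active days and post count mentioned above,
the following minimal sketch computes a variance inflation factor (VIF). It assumes, for simplicity, that
only these two predictors are considered and ignores the top 50 variable, so the result is a lower bound
rather than the diagnostic of the fitted model. The function names are hypothetical. Collinearity of this
kind is usually discussed in terms of inflated standard errors of the coefficients.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(r):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


if __name__ == "__main__":
    # 0.71 is the correlation reported in the text; everything else is illustrative.
    print(round(vif_two_predictors(0.71), 2))  # prints 2.02
```

Under this simplifying assumption, the sampling variance of each of the two coefficients is roughly twice
what it would be with uncorrelated predictors.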

Conclusions & Future Work

This thesis aimed to address the understudied radicalisation trajectories of members of fringe social
movement organisations. Mapping possible radicalisation trajectories can facilitate a better
understanding of individual and group motivations and actions, and it may act as an early-warning
system to avoid escalation, as in the case of documented violence perpetrated by incels.
This work proposes a simple (perhaps rudimentary) way to quantify online radicalisation on echo
chamber-like online platforms. To measure echo chamber effects, at least two modes (cf. bimodality
principle, Esteban and Ray, 1994; as cited in DiMaggio et al., 1996) are needed for comparison. Future
research can thus apply a similar approach to feminist forums to compare the breadth and depth of
polarised opinions on each other (anti-feminists vs. radical feminists). Similarly, other bases of the
Manosphere can be analysed in the fashion laid out in this work to gauge whether different
communities display varying levels of radicalisation. Lastly, this work expanded on the premise of a
loud minority group on fringe social media platforms.

This work only scratched the surface of what the data I collected can uncover. A main constraint
was the timeframe to complete the thesis project. To further expand on this work, one could first
extend the number of annotated messages to re-train the model and increase confidence in the results
laid out in the Results section. In addition, negative binomial or Poisson regression models would
decrease the bias in the regression estimators. Furthermore, Growth Curve Models utilise repeated
measures, like the user-level monthly distribution of labels, and can thus account for latent variables.
The latter would improve our confidence in the regression estimators, as representative traditional
survey data to complement the textual analysis does not exist. Moreover, fluctuations in the prevalence
of themes can be addressed by models analysing time trends. Social network analysis is another
suitable method, as it can account for homophily and influence in ego networks. In other words, one can
differentiate whether actors choose to interact with similarly radicalised peers, or whether one becomes
more radicalised through interaction with a radicalised peer, and thus elaborate on the role of very
active users, the loud minorities on fringe social media. Finally, future work can address the role of the
loud minorities from the perspective of fringe platforms versus more popular platforms, like Reddit.
This is to investigate whether de-platforming marginalised communities (Horta Ribeiro et al., 2021)
increases polarisation by affording less room for differing opinions.
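
The suggestion above to move to negative binomial or Poisson regression can be motivated with a simple
check. A Poisson model assumes that the variance of a count equals its mean, so a variance-to-mean ratio
well above 1 hints at overdispersion, which is the situation a negative binomial model is designed for.
The sketch below is a minimal illustration under that assumption. The function name and the example
counts are invented for illustration and are not taken from the thesis data. The check looks at the raw
counts only and does not condition on the predictors.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by the mean of a list of non-negative counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


if __name__ == "__main__":
    # Hypothetical per-user theme counts, invented for illustration only.
    example_counts = [0, 1, 0, 3, 2, 40, 0, 5, 1, 120]
    ratio = dispersion_ratio(example_counts)
    # A ratio far above 1 would speak for a negative binomial model over a Poisson model.
    print(ratio > 1)
```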

Appendices

Appendix A1 – Human labelling vs algorithmic classification

Figure 5 Venn-diagram of Manually Labelled Messages

Table 8 Statistics on Human vs Algorithmic Labelling

Appendix A2 – Lorenz-curve

Figure 6 Lorenz-curve
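
Only the caption of the Lorenz curve is reproduced here. As a minimal sketch of how such a curve and
the related Gini coefficient could be computed from per-user post counts, the code below sorts users by
post count and accumulates their share of all posts. The function names and the example counts are
hypothetical and are not taken from the removed R code.

```python
def lorenz_points(counts):
    """Return (cumulative share of users, cumulative share of posts) pairs.

    Users are ordered from least to most active, starting at the origin (0, 0).
    """
    ordered = sorted(counts)
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one user and one post")
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(ordered, start=1):
        running += count
        points.append((i / n, running / total))
    return points


def gini(counts):
    """Gini coefficient as 1 minus twice the trapezoid area under the Lorenz curve."""
    pts = lorenz_points(counts)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return 1 - 2 * area


if __name__ == "__main__":
    # Hypothetical post counts for four users; one user writes every post.
    print(gini([0, 0, 0, 10]))  # prints 0.75
```

A value close to 0 would mean that posts are spread evenly across users, while a value close to 1 would
mean that a few users write almost everything, which is the pattern a loud minority would produce.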

Appendix B1 – R code
The majority of the computer code was written and executed in R 4.1.2 (R Core Team, 2021) using
RStudio Build 485 (RStudio Team, 2022), except for the BERT classification.

Computer code has been removed

Appendix B2 – Python code


The classification tasks were run in Jupyter Notebooks (Kluyver et al., 2016) under a Python 3.9.12 kernel
(Van Rossum & Drake, 2009).

Computer code has been removed

References
Agarwal, S., & Sureka, A. (2015). Using KNN and SVM Based One-Class Classifier for Detecting Online
Radicalization on Twitter. In R. Natarajan, G. Barua, & M. R. Patra (Eds.), Distributed
Computing and Internet Technology (pp. 431-442). Springer. doi:10.1007/978-3-319-14977-6

Allison, P. D. (1978). Measures of Inequality. American Sociological Review, 43(6), 865-880. Retrieved
from http://www.jstor.com/stable/2094626

Anzovino, M., Fersini, E., & Rosso, P. (2018). Automatic Identification and Classification of
Misogynistic Language on Twitter. In M. Silberztein, F. Atigui, E. Kornyshova, E. Métais, & F.
Meziane (Eds.), Natural Language Processing and Information Systems (Vol. 10859, pp. 57-
64). Springer. Retrieved from https://doi.org/10.1007/978-3-319-91947-8_6

Baele, S. J., Brace, L., & Coan, T. G. (2019). From “Incel” to “Saint”: Analyzing the violent worldview
behind the 2018 Toronto attack. Terrorism and Political Violence, 33(8), 1667-1691.
doi:10.1080/09546553.2019.1638256

Blackpill. (2022). Retrieved May 05, 2022, from Incels Wiki:
https://incels.wiki/w/Blackpill#Shitposting_vs_cult

Brady, W. J., Crockett, M. J., & Van Bavel, J. J. (2020). The MAD Model of Moral Contagion: The Role
of Motivation, Attention, and Design in the Spread of Moralized Content Online.
Perspectives on Psychological Science, 15(4), 978-1010. doi:10.1177/1745691620917336

Brady, W. J., McLoughlin, K., Doan, T. N., & Crockett, M. J. (2021). How social learning amplifies
moral outrage expression in online social networks. Science Advances, 7(33), 1-14.
doi:10.1126/sciadv.abe5641

Camacho, D., Gilperez-Lopez, I., Gonzalez-Pardo, A., Ortigosa, A., & Urruela, C. (2016). RiskTrack: A
New Approach for Risk Assessment on Radicalisation Based on Social Media Data.
Proceedings of the Workshop on Affective Computing and Context Awareness in Ambient
Intelligence (AfCAI), (pp. 1-10). Murcia, Spain. Retrieved from
http://ceur-ws.org/Vol-1794/afcai16-paper5.pdf

Centola, D., & Macy, M. (2007). Complex Contagions and the Weakness of Long Ties. American
Journal of Sociology, 113(3), 702-734. Retrieved from
https://www.jstor.org/stable/10.1086/521848

Chakrabarty, T., Gupta, K., & Muresan, S. (2019). Pay “Attention” to Your Context when Classifying
Abusive Language. Proceedings of the Third Workshop on Abusive Language Online, (pp. 70-
79). Florence, Italy.

Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological
Measurement, 20(1), 37-46. doi:10.1177/001316446002000104

Crockett, M. J. (2017). Moral outrage in the digital age. Nature Human Behaviour, 1(11), 769-771.
doi:10.1038/s41562-017-0213-3

De Smedt, T., & Daelemans, W. (2012). Pattern for Python. Journal of Machine Learning Research, 13,
2063-2067. Retrieved from
https://jmlr.csail.mit.edu/papers/volume13/desmedt12a/desmedt12a.pdf

della Porta, D., & Diani, M. (2006). Social Movements. An Introduction (2nd ed.). Oxford: Blackwell
Publishing.

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding. Retrieved Jan 31, 2022, from arXiv.org:
https://arxiv.org/abs/1810.04805v2

DiMaggio, P., Evans, J., & Bryson, B. (1996). Have Americans' Social Attitudes Become More
Polarized? American Journal of Sociology, 102(3), 690-755. Retrieved from
https://www.jstor.org/stable/2782461

Donnelly, D. A., & Burgess, E. O. (2008). The Decision to Remain in an Involuntarily Celibate
Relationship. Journal of Marriage and Family, 70(2), 519-535. doi:10.1111/j.1741-
3737.2008.00498.x

Donnelly, D., Burgess, E., Anderson, S., Davis, R., & Dillard, J. (2001). Involuntary Celibacy: A Life
Course Analysis. Journal of Sex Research, 38(2), 159-169. doi:10.1080/00224490109552083

Dubois, E., & Blank, G. (2018). The echo chamber is overstated: the moderating effect of political
interest and diverse media. Information, Communication & Society, 21(5), 729-745.
doi:10.1080/1369118X.2018.1428656

Dunbar, R. I. (1998). The Social Brain Hypothesis. Evolutionary Anthropology, 6(5), 178-190.

Elder, G. H. (1998). The Life Course as Developmental Theory. Child Development, 69(1), 1-12.
Retrieved from https://www.jstor.org/stable/1132065

Farrell, T., Fernandez, M., Novotny, J., & Alani, H. (2019). Exploring Misogyny across the Manosphere
in Reddit. 11th ACM Conference on Web Science (WebSci ’19), (pp. 87-96).
doi:10.1145/3292522.3326045

Fernandez, M., Asif, M., & Alani, H. (2018). Understanding the Roots of Radicalisation on Twitter.
WebSci '18: Proceedings of the 10th ACM Conference on Web Science, (pp. 1-10).
Amsterdam, Netherlands.

Fersini, E., Nozza, D., & Rosso, P. (2018). Overview of the Evalita 2018 Task on Automatic Misogyny
Identification (AMI). Proceedings of the 6th evaluation campaign of Natural Language
Processing and Speech tools for Italian. Turin, Italy.

Fersini, E., Rosso, P., & Anzovino, M. (2018). Overview of the Task on Automatic Misogyny
Identification at IberEval 2018. Proceedings of the Third Workshop on Evaluation of Human
Language Technologies for Iberian Languages (IberEval 2018), (pp. 214-228). Seville, Spain.

Frenda, S., Ghanem, B., Montes-Y-Gómez, M., & Rosso, P. (2019). Online Hate Speech against
Women: Automatic Identification of Misogyny and Sexism on Twitter. Journal of Intelligent
and Fuzzy Systems, 36(5), 4743-4752. Retrieved from https://doi.org/10.3233/JIFS-179023

Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. New
York: Cambridge University Press. doi:10.1017/CBO9780511790942

Ging, D. (2019). Alphas, Betas, and Incels: Theorizing the Masculinities of the Manosphere. Men and
Masculinities, 22(4), 638-657. doi:10.1177/1097184X17706401

Glace, A. M., Dover, T. L., & Zatkin, J. G. (2021). Taking the Black Pill: An Empirical Analysis of the
“Incel”. Psychology of Men & Masculinities, 22(2), 288-297.
doi:10.1037/men0000328

Goffman, E. (1974). Frame analysis: An essay on the organization of experience. Harvard University
Press.

Golbeck, J., Ashktorab, Z., Banjo, R. O., Berlinger, A., Buntain, C., . . . Wu, D. M.
(2017). A Large Human-Labeled Corpus for Online Harassment Research. WebSci’17, (pp.
229-233). Troy, NY, USA. Retrieved from http://dx.doi.org/10.1145/3091478.3091509

Goldberg, A., & Stein, S. (2018). Beyond Social Contagion: Associative Diffusion and the Emergence
of Cultural Variation. American Sociological Review, 83(5), 897-932.

Granovetter, M. S. (1973). The Strength of Weak Ties. The American Journal of Sociology, 78(6),
1360-1380.

Hoffman, B., Ware, J., & Shapiro, E. (2020). Assessing the Threat of Incel Violence. Studies in Conflict
& Terrorism, 43(7), 565-587. doi:10.1080/1057610X.2020.1751459

Horta Ribeiro, M., Blackburn, J., Bradlyn, B., De Cristofaro, E., Stringhini, G., Long, S., . . . Zannettou,
S. (2021). The Evolution of the Manosphere Across the Web. Proceedings of the Fifteenth
International AAAI Conference on Web and Social Media (ICWSM 2021), 15, pp. 196-201.

Jaki, S., De Smedt, T., Gwóźdź, M., Panchal, R., Rossa, A., & De Pauw, G. (2019). Online Hatred of
Women in the Incels.me Forum: Linguistic Analysis and Automatic Detection. Journal of
Language Aggression and Conflict, 7(2), 240-268. Retrieved from
https://doi.org/10.1075/jlac.00026.jak

Jane, E. A. (2014). “Your a Ugly, Whorish, Slut”. Feminist Media Studies, 14(4), 531-546. Retrieved
from https://doi.org/10.1080/14680777.2012.741073

Kluyver, T., Ragan-Kelley, B., Perez, F., Granger, B., Bussonnier, M., Frederic, J., . . . Willing, C. (2016).
Jupyter Notebooks – a publishing format for reproducible computational workflows. In F.
Loizides, & B. Schmidt (Eds.), Positioning and Power in Academic Publishing: Players, Agents
and Agendas (pp. 87-90). IOS Press.

Kundani, A. (2012). Radicalisation: the journey of a concept. Race & Class, 54(2), 3-25.
doi:10.1177/0306396812454984

Lara-Cabrera, R., González Pardo, A., Benouaret, K., Faci, N., Benslimane, D., & Camacho, D. (2017).
Measuring the Radicalisation Risk in Social Networks. IEEE Access, 5(17), 10892-10900.
doi:10.1109/ACCESS.2017.2706018

Levendusky, M. S., & Malhotra, N. (2016). (Mis)perceptions of Partisan Polarization in The American
Public. Public Opinion Quarterly, 80(Special Issue), 378-391. doi:10.1093/poq/nfv045

List of incel forums. (2021, Nov 15). Retrieved Jan 28, 2022, from Incels Wiki:
https://incels.wiki/w/List_of_incel_forums

Mandel, D. R. (2009). Radicalization: what does it mean? In T. M. Pick, A. Speckhard, & B. Jacuch
(Eds.), Home-Grown Terrorism: understanding the root causes of radicalisation among
groups with an immigrant heritage in Europe (pp. 101-113). Brussels: Institute of Physics
Press. Retrieved from
https://www.researchgate.net/publication/253313042_Radicalization_What_does_it_mean

McPherson, M., Smith-Lovin, L., & Cook, J. (2001). Birds of a Feather: Homophily in Social Networks.
Annual Review of Sociology, 27, 415-444.

Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2018). The Future of Coding: A Comparison of
Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods. Sociological
Methods & Research, 50(1), 202-237. Retrieved from
https://doi.org/10.1177/0049124118769114

Nickerson, R. S. (1998). Confirmation Bias: A Ubiquitous Phenomenon in Many Guises. Review of
General Psychology, 2(2), 175-220.

Nouh, M., Nurse, J. R., & Goldsmith, M. (Eds.). (2019). Understanding the Radical Mind: Identifying
Signals to Detect Extremist Content on Twitter. 2019 IEEE International Conference on
Intelligence and Security Informatics (ISI), (pp. 98-103). doi:10.1109/ISI.2019.8823548

Oliver, P. E., Marwell, G., & Teixeira, R. (1985). A Theory of the Critical Mass. I. Interdependence,
Group Heterogeneity, and the Production of Collective Action. American Journal of Sociology,
91(3), 522-556. Retrieved from http://www.jstor.org/stable/2780201

Pascoe, C. J., & Bridges, T. (2014). Hybrid Masculinities: New Directions in the Sociology of Men and
Masculinities. Sociology Compass, 8(3), 246-258. doi:10.1111/soc4.12134

Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2001). Linguistic Inquiry and Word Count. Mahwah,
NJ: Erlbaum Publishers.

Prasetya, H. A., & Murata, T. (2020). A model of opinion and propagation structure polarization in
social media. Computational Social Networks, 7(2), 1-35. doi:10.1186/s40649-019-0076-z

R Core Team. (2021, November 01). R: A language and environment for statistical computing. R
Foundation for Statistical Computing. Vienna, Austria. Retrieved from
https://www.R-project.org/

R/IncelTears. (2021). Retrieved May 16, 2022, from Incels Wiki: https://incels.wiki/w/R/IncelTears

RStudio Team. (2022). Integrated Development Environment for R. Boston, MA, USA: PBC. Retrieved
from http://www.rstudio.com

Rules and FAQ. (2017, November 09). Retrieved April 10, 2022, from Incels.is:
https://incels.is/threads/rules-and-faq.799/

Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019). The Risk of Racial Bias in Hate Speech
Detection. Proceedings of the 57th Annual Meeting of the Association for Computational
Linguistics, 1668-1678.

Scaptura, M. N., & Boyle, K. M. (2020). Masculinity Threat, "Incel" Traits, and Violent Fantasies
Among Heterosexual Men in the United States. Feminist Criminology, 15(3), 278-298.
doi:10.1177/1557085119896415

Schmidt, A., & Wiegand, M. (2017). A Survey on Hate Speech Detection using Natural Language
Processing. Proceedings of the Fifth International Workshop on Natural Language Processing
for Social Media, (pp. 1-10). Valencia, Spain.

Sharot, T., & Sunstein, C. R. (2020). How people decide what they want to know. Nature Human
Behaviour, 4, 14-19. doi:10.1038/s41562-019-0793-1

Shen, R. (2021). AugmentedSocialScientist. Retrieved May 15, 2022, from
https://github.com/such-as-ice/AugmentedSocialScientist

Speckhard, A., Ellenberg, M., Morton, J., & Ash, A. (2021). Involuntary Celibates’ Experiences of and
Grievance over Sexual Exclusion and the Potential Threat of Violence Among Those Active in
an Online Incel Forum. Journal of Strategic Security, 14(2), 89-121. Retrieved from
https://www.jstor.org/stable/10.2307/27026635

Squirrell, T. (2018). A definitive guide to Incels part two: the A-Z incel dictionary. Retrieved March
20, 2022, from
https://www.timsquirrell.com/blog/2018/5/30/a-definitive-guide-to-incels-part-two-the-blackpill-and-vocabulary

Stryker, S., & Burke, P. J. (2000). The Past, Present, and Future of an Identity Theory. Social
Psychology Quarterly, 63(4), 284-297. Retrieved from https://www.jstor.org/stable/2695840

Suler, J. (2004). The Online Disinhibition Effect. CyberPsychology & Behavior, 7(3), 321-326.
Retrieved from https://doi.org/10.1089/1094931041291295

Van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual. Scotts Valley, CA, USA.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., . . . Polosukhin, I. (2017).
Attention Is All You Need. 31st Conference on Neural Information Processing Systems (NIPS
2017), (pp. 1-11). Long Beach, CA.

Watts, D. J., & Dodds, P. S. (2007). Influentials, Networks, and Public Opinion Formation. Journal of
Consumer Research, 34(4), 441-458. doi:10.1086/518527

Wayback Machine: Incels.me. (2018, April 26). Retrieved May 22, 2022, from Internet Archive -
Wayback Machine: https://web.archive.org/web/20180426073335/https://incels.me/

Wikipedia:No personal attacks. (2022, Feb 15). Retrieved May 14, 2022, from Wikipedia:
https://en.wikipedia.org/wiki/Wikipedia:No_personal_attacks

Wojcieszak, M. (2010). ‘Don’t talk to me’: effects of ideologically homogeneous online groups and
politically dissimilar offline ties on extremism. New Media & Society, 12(4), 637-655.
doi:10.1177/1461444809342775

Online resources used for developing programme code

1. https://stackoverflow.com/questions/22272571/data-input-via-shinytable-in-r-shiny-application#25928643 # shiny
2. https://statisticsglobe.com/scale-colour-fill-brewer-rcolorbrewer-package-r#example-2-select-color-brewer-palette
3. https://r-lang.com/mode-in-r/
4. https://www.geeksforgeeks.org/how-to-calculate-jaccard-similarity-in-r/
5. https://r-charts.com/part-whole/ggvenndiagram/
6. https://ggplot2.tidyverse.org/reference/stat_ecdf.html
7. http://www.sthda.com/english/wiki/ggplot2-axis-scales-and-transformations
8. https://r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2.html
9. https://community.rstudio.com/t/adding-manual-legend-to-ggplot2/41651/2
10. http://www.sthda.com/english/wiki/ggplot2-legend-easy-steps-to-change-the-position-and-the-appearance-of-a-graph-legend-in-r-software
11. http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software
12. https://stackoverflow.com/questions/3462143/get-difference-between-two-lists
13. https://medium.com/swlh/k-fold-as-cross-validation-with-a-bert-text-classification-example-4017f76a863a
