Approaches To Identify Fake News: A Systematic Literature Review
All content following this page was uploaded by Dylan de Beer on 31 August 2020.
1 Introduction
Paskin (2018: 254) defines fake news as “particular news articles that originate either
on mainstream media (online or offline) or social media and have no factual basis, but
are presented as facts and not satire”. The importance of combatting fake news is starkly
illustrated during the current COVID-19 pandemic. Social networks are stepping up in
using digital fake news detection tools and educating the public towards spotting fake
news. At the time of writing, Facebook uses machine learning algorithms to identify
false or sensational claims used in advertising for alternative cures; it places potential
fake news articles lower in the news feed and provides users with tips on how to
identify fake news themselves (Sparks and Frishberg 2020). Twitter ensures that
searches on the virus result in credible articles and Instagram redirects anyone searching
for information on the virus to a special message with credible information (Marr 2020).
These measures are possible because different approaches exist that assist the detection
of fake news. For example, platforms based on machine learning use fake news
from the biggest media outlets to refine algorithms for identifying fake news (Macaulay
2018). Some approaches detect fake news by using metadata, such as comparing the
release time of an article with the timeline and the locations of its spread (Macaulay 2018).
The purpose of this research paper is to categorize, through a systematic literature
review, current approaches to counter the wide-ranging epidemic of fake news.
Fake news is not a new concept. Before the era of digital technology, it was spread
mainly through yellow journalism, with a focus on sensational news such as crime, gossip,
disasters and satirical news (Stein-Smith 2017). The prevalence of fake news relates
to the availability of mass media digital tools (Schade 2019). Since anyone can publish
articles via digital media platforms, online news articles include well-researched pieces
but also opinion-based arguments or simply false information (Burkhardt 2017). There
is no custodian of credibility standards for information on these platforms, which makes the
spread of fake news possible. To make things worse, it is by no means straightforward
to tell the difference between real news and semi-true or false news (Pérez-Rosas et al.
2018).
The nature of social media makes it easy to spread fake news, as a user potentially
sends fake news articles to friends, who then send them on to their friends, and so on.
Comments on fake news sometimes fuel its ‘credibility’, which can lead to rapid sharing
and further spreading of fake news (Albright 2017).
Social bots are also responsible for the spreading of fake news. Bots are sometimes
used to target super-users by adding replies and mentions to posts. Through these
actions, humans are manipulated into sharing the fake news articles (Shao et al. 2018).
Clickbait is another tool encouraging the spread of fake news. Clickbait is an advertising
tool used to get the attention of users. Sensational headlines or news items are often
used as clickbait that navigates the user to advertisements; more clicks on the advert
mean more money (Chen, Conroy, and Rubin 2015a).
Fortunately, tools have been developed for detecting fake news. For example, a tool
has been developed to identify fake news that spreads through social media by
examining lexical choices that appear in headlines and other intense language structures
(Chen, Conroy, and Rubin 2015b). Another tool, developed to identify fake news on
Twitter, has a component called the Twitter Crawler, which collects and stores tweets
in a database (Atodiresei, Tănăselea, and Iftene 2018). When Twitter users want to
check the accuracy of news they have found, they can copy a link into this application,
after which the link will be processed for fake news detection. This process is built on
an algorithm called Named Entity Recognition (NER) (Atodiresei, Tănăselea, and
Iftene 2018).
Many approaches are available to help the public identify fake news, and this paper
aims to enhance understanding of these by categorizing the approaches found in
existing literature.
3 Research Method
The purpose of this paper is to categorize approaches used to identify fake news. In
order to do this, a systematic literature review was done. This section presents the
search terms that were used, the selection criteria and the source selection.
Specific search terms, such as the following, were used to find relevant journal
articles:
(“what is fake news” OR “not genuine information” OR “counter fit news” OR “in-
accurate report*” OR “forged (NEAR/2) news” OR “mislead* information” OR “false
store*” OR “untrustworthy information” OR “hokes” OR “doubtful information” OR
“incorrect detail*” OR “false news” OR “fake news” OR “false accusation*” )
AND (“digital tool*” OR “digital approach” OR “automated tool*” OR “approach*”
OR “programmed tool*” OR “digital gadget*” OR “digital device*” OR “digital ma-
chan*” OR “digital appliance*” OR “digital gizmo” OR “IS gadget*” OR “IS tool*”
OR “IS machine*” OR “digital gear*” OR “information device*”)
AND (“fake news detection” OR “approaches to identify fake news” OR “methods
to identify fake news” OR “finding fake news” OR “ways to detect fake news”).
Figure 1 below gives a flowchart of the search process: the identification of articles, the
screening, the selection process and the number of included articles.
5 Findings
In this section of the article we list the categories of approaches that are used to identify
fake news. We also discuss how the different approaches interlink with each other and
how they can be used together to get a better result.
The following categories of approaches for fake news detection are proposed: (1)
language approach, (2) topic-agnostic approach, (3) machine learning approach, (4)
knowledge-based approach, (5) hybrid approach.
The five categories mentioned above are depicted in figure 2 below. Figure 2 shows
the relationship between the different approaches. The sizes of the ellipses are propor-
tional to the number of articles found (given as the percentage of total included articles)
in the systematic literature review that refer to that approach.
Fig. 2. Categories of fake news detection approaches resulting from the systematic literature
review
The approaches are discussed in depth below with some examples for illustration
purposes.
Language Approach
The language approach focuses on the use of words and letters in a word, how they are
structured, and how they fit together in a paragraph (Burkhardt 2017). The focus is
therefore on grammar and syntax (Burkhardt 2017).
There are currently three main methods that contribute to the language approach:
Bag of Words (BOW): In this approach, each word in a paragraph is considered to be of
equal importance and is treated as an independent entity (Burkhardt 2017). Individual
word frequencies are analysed to find signs of misinformation. These representations are
also called n-grams (Thota et al. 2018). This ultimately helps to identify patterns of word
use, and by investigating these patterns, misleading information can be identified. The
bag of words model is less practical because context is not considered when text is
converted into numerical representations, and the position of a word is not always taken
into consideration (Potthast et al. 2017).
Semantic Analysis: Chen, Conroy, and Rubin (2015b) explain that truthfulness can
be determined by comparing a personal experience (e.g. a restaurant review) with a profile
on the topic derived from similar articles. An honest writer will be more likely to make
similar remarks about a topic as other truthful writers. Different compatibility scores
are used in this approach.
Deep Syntax: The deep syntax method is carried out through Probabilistic Context
Free Grammars (PCFG) (Stahl 2018). PCFG executes deep
syntax tasks through parse trees that make Context Free Grammar analysis possible;
Probabilistic Context Free Grammar is an extension of Context Free Grammar (Zhou
and Zafarani 2018). Sentences are converted into a set of rewrite rules, and these rules
are used to analyse various syntax structures. The syntax can be compared to known
structures or patterns of lies, which can ultimately lead to telling the difference between
fake news and real news (Burkhardt 2017).
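As a toy illustration of the probabilistic part, the Python sketch below scores a parse tree as the product of its rewrite-rule probabilities. The three-rule grammar is invented for the example, not learned from deceptive text as a real deep-syntax system would require:

```python
# A toy PCFG: each rewrite rule (left-hand side -> right-hand side) carries a
# probability, and a parse tree's probability is the product of its rules.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("reports",)): 0.4,
    ("NP", ("sources",)): 0.6,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("claim",)): 1.0,
}

def tree_probability(tree):
    """tree is a nested tuple (symbol, child, ...); leaves are plain strings."""
    symbol, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULES[(symbol, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child)  # multiply in each subtree's rules
    return prob

parse = ("S", ("NP", "sources"), ("VP", ("V", "claim"), ("NP", "reports")))
```

Unusually low-probability structures relative to a reference grammar can then serve as a deception signal.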
Topic-Agnostic Approach
This category of approaches detects fake news by considering not the content of articles
but rather topic-agnostic features. The approach uses linguistic features and web mark-up
capabilities to identify fake news (Castelo et al. 2019). Some examples of topic-agnostic
features are 1) a large number of advertisements, 2) longer headlines with eye-catching
phrases, 3) text patterns that differ from mainstream news in order to induce emotive
responses, and 4) the presence of an author name (Castelo et al. 2019; Horne and Adalı 2017).
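Such content-independent signals can be sketched in Python. The `page` dict and the specific fields below are invented stand-ins for a parsed web page, not the actual features or code of Castelo et al.:

```python
def topic_agnostic_features(page):
    """Extract content-independent signals of the kind described above.
    `page` is a plain dict here, standing in for a parsed web page."""
    headline = page.get("headline", "")
    words = headline.split()
    return {
        "ad_count": len(page.get("ads", [])),            # many ads -> suspicious
        "headline_length": len(words),                   # long, wordy headlines
        "headline_all_caps_words": sum(w.isupper() for w in words),
        "has_author": bool(page.get("author")),          # fake pages often omit one
    }

page = {
    "headline": "You WON'T Believe What Happened Next",
    "ads": ["ad1", "ad2", "ad3"],
    "author": None,
}
features = topic_agnostic_features(page)
```

A classifier would then be trained on these features instead of the article text, which is what makes the approach robust to new topics.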
Machine Learning Approach
Machine learning algorithms can be used to identify fake news. This is achieved
by using different types of training datasets to refine the algorithms. Datasets enable
computer scientists to develop new machine learning approaches and techniques, and
they are used to train the algorithms to identify fake news. How are these datasets
created? One way is through crowdsourcing. Pérez-Rosas et al. (2018) created a fake
news dataset by first collecting legitimate information on six categories: sports,
business, entertainment, politics, technology and education. Crowdsourcing was then
used, and a task was set up which asked workers to generate a false version of the news
stories (Pérez-Rosas et al. 2018). Over 240 stories were collected and added to the fake
news dataset.
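To make the training step concrete, here is a minimal bag-of-words Naive Bayes classifier in pure Python, fitted on four invented headlines. It is a sketch of the general technique only; real systems train far richer models on large corpora such as the crowdsourced dataset described above:

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """A minimal word-count Naive Bayes classifier for short texts."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        scores = {}
        total_docs = sum(self.label_counts.values())
        for label, count in self.label_counts.items():
            score = math.log(count / total_docs)            # class prior
            total = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Laplace smoothing so unseen words do not zero out the score
                score += math.log((self.word_counts[label][word] + 1) /
                                  (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

texts = ["miracle cure shocks doctors", "secret trick banks hate",
         "parliament passes budget bill", "court upholds trade ruling"]
labels = ["fake", "fake", "real", "real"]
model = TinyNaiveBayes().fit(texts, labels)
```

With a labelled dataset in place, swapping in a stronger model is mostly a matter of replacing this classifier.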
A machine learning approach called the rumor identification framework has been
developed that legitimizes signals of ambiguous posts so that a person can easily
identify fake news (Sivasangari, Anand, and Santhya 2018). The framework will alert
people to posts that might be fake (Sivasangari, Anand, and Santhya 2018). It is built
to combat fake tweets on Twitter and focuses on four main areas: the metadata of the
tweet; the source of the tweet; and the date and the area of the tweet, i.e. when and
where the tweet was created (Sivasangari, Anand, and Santhya 2018). By studying
these four parts of a tweet, the framework can be implemented to check the accuracy
of the information and to separate the real from the fake (Sivasangari, Anand, and
Santhya 2018). Supporting this framework, the spread of gossip is collected to create
datasets with the use of the Twitter Streaming API (Sivasangari, Anand, and Santhya 2018).
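The spirit of such metadata checks can be sketched as follows. The field names and thresholds here are invented for illustration; they are not the actual rules of the rumor identification framework:

```python
def rumor_flags(tweet):
    """Collect warning signals from a tweet's metadata, source, date and place.
    `tweet` is a plain dict standing in for data from a streaming API."""
    flags = []
    if tweet.get("account_age_days", 0) < 30:
        flags.append("new account")            # source: very young account
    if not tweet.get("source_verified", False):
        flags.append("unverified source")      # source: no verification
    if tweet.get("retweets_per_hour", 0) > 1000:
        flags.append("unusually fast spread")  # date/metadata: spread velocity
    if tweet.get("geo") is None:
        flags.append("no location data")       # area: missing geolocation
    return flags
```

A post accumulating several flags would be surfaced to the user as potentially fake rather than blocked outright, matching the alerting behaviour described above.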
A possible solution to identify and prevent the spread of misleading information
through fake accounts, likes and comments is the Twitter Crawler (Atodiresei,
Tănăselea, and Iftene 2018): a machine learning approach that works by collecting
tweets and adding them to a database, making comparison between different tweets
possible.
Knowledge-Based Approach
Recent studies argue for the integration of machine learning and knowledge engineering
to detect fake news. The challenging problem for some of these fact-checking
methods is the speed at which fake news spreads on social media. Microblogging
platforms such as Twitter cause small pieces of false information to spread very quickly
to a large number of people (Qazvinian et al. 2011). The knowledge-based approach
aims to use external sources to verify whether news is fake or real, and to identify
the news before its spread accelerates. There are three main categories:
(1) Expert Oriented Fact Checking, (2) Computational Oriented Fact Checking, (3)
Crowd Sourcing Oriented Fact Checking (Ahmed, Hinkelmann, and Corradini 2019).
Expert Oriented Fact Checking. With expert-oriented fact checking it is necessary
to analyze and examine data and documents carefully (Ahmed, Hinkelmann, and
Corradini 2019). It requires professionals to evaluate the accuracy of the news manually
through research and other studies on the specific claim. Fact checking is the process
of assigning certainty to a specific element by comparing the accuracy of the text to
another which has previously been fact checked (Vlachos and Riedel 2014).
Computational Oriented Fact Checking. The purpose of computational oriented fact
checking is to provide users with an automated fact-checking process that is able to
identify whether a specific piece of news is true or false (Ahmed, Hinkelmann, and
Corradini 2019). An example of computational oriented fact checking is the use of
knowledge graphs and open web sources, based on practical referencing, to help
distinguish between real and fake news (Ahmed, Hinkelmann, and Corradini 2019). A
recent tool called ClaimBuster is an example of how fact checking can automatically
identify fake news (Hassan et al. 2017). This tool makes use of machine learning
techniques combined with natural language processing and a variety of database
queries. It analyses content from social media, interviews and speeches in real time,
compares claims with a repository of verified facts, and delivers the result to the reader
(Hassan et al. 2017).
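The core lookup behind such systems can be sketched with a toy triple store. The facts and the three-way verdict below are invented stand-ins for the kind of verified-fact repository and matching logic the paragraph describes, not ClaimBuster's actual implementation:

```python
# A toy knowledge base of (subject, predicate, object) triples, standing in
# for a repository of previously verified facts.
KNOWLEDGE = {
    ("earth", "orbits", "sun"),
    ("water", "boils_at_celsius", "100"),
}

def check_claim(subject, predicate, obj):
    """Compare one claim triple against the stored facts."""
    if (subject, predicate, obj) in KNOWLEDGE:
        return "supported"
    # The same subject/predicate with a different object contradicts the claim.
    if any(s == subject and p == predicate for s, p, _ in KNOWLEDGE):
        return "contradicted"
    return "unverifiable"
```

The hard parts in practice, extracting clean claim triples from free text and keeping the repository current, are exactly what the machine learning and NLP components are for.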
Crowd Sourcing Oriented Fact Checking. Crowdsourcing gives a group of people the
opportunity to make a collective decision by examining the accuracy of news (Pennycook
and Rand 2019). The accuracy of the news is completely based on the wisdom of the
crowd (Ahmed, Hinkelmann, and Corradini 2019). Fiskkit is an example of a platform
that can be used for crowdsourcing, where a group of people evaluates pieces of a
news article (Hassan et al. 2017). After one piece has been evaluated, the crowd moves
to the next piece for evaluation, until the entire news article has been evaluated and its
accuracy has been determined by the wisdom of the crowd (Hassan et al. 2017).
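The aggregation step of such piecewise crowd evaluation can be sketched as a simple majority vote. The rating labels and the minimum-rater threshold are illustrative choices, not any platform's actual policy:

```python
from collections import Counter

def crowd_verdict(ratings, min_raters=3):
    """Aggregate individual 'accurate'/'inaccurate' judgments for one piece
    of an article into a collective verdict."""
    if len(ratings) < min_raters:
        return "insufficient ratings"
    winner, count = Counter(ratings).most_common(1)[0]
    if count * 2 == len(ratings):   # exact split: the crowd has no wisdom to offer
        return "no consensus"
    return winner
```

An article's overall accuracy would then be some combination of the per-piece verdicts, e.g. the fraction of pieces judged accurate.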
Hybrid Approach
There are three generally agreed-upon elements of fake news articles: the first element
is the text of an article, the second is the response that the article receives, and the
third is the source that motivates the news article (Ruchansky, Seo, and Liu 2017).
A recent study proposes a hybrid model which helps to identify fake news on social
media by using a combination of human and machine learning (Okoro et al. 2018).
Humans only have a 4% chance of identifying fake news if they take a guess and can
only identify fake news 54% of the time (Okoro et al. 2018). The hybrid model has been
shown to increase this percentage (Okoro et al. 2018). To be effective, the hybrid model
combines social media news with machine learning and a network approach (Okoro
et al. 2018). The purpose of this model is to identify the probability that the news could
be fake (Okoro et al. 2018). Another hybrid model called CSI (Capture, Score, Integrate)
has been developed and functions on three main elements: (1) Capture - the process of
extracting representations of articles by using a Recurrent Neural Network (RNN);
(2) Score - to create a score and representation vector; (3) Integrate - to integrate the
outputs of Capture and Score, resulting in a vector which is used for classification
(Ruchansky, Seo, and Liu 2017).
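The combining step common to these hybrid models can be sketched as a weighted blend of the three signals: article text, audience response, and source. The weights and the 0.5 threshold below are arbitrary illustrative values, not the parameters of the published CSI model, whose components are learned RNN representations rather than hand-set scores:

```python
def hybrid_score(text_score, response_score, source_score,
                 weights=(0.4, 0.3, 0.3)):
    """Blend three per-article signals (each in [0, 1], higher = more suspect)
    into one probability-like fakeness score."""
    w_text, w_resp, w_src = weights
    score = (w_text * text_score +
             w_resp * response_score +
             w_src * source_score)
    label = "likely fake" if score > 0.5 else "likely real"
    return {"score": score, "label": label}
```

In a learned model the three sub-scores and the weights would all be fitted to labelled data instead of being fixed by hand.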
7 Conclusion
In this paper we discussed the prevalence of fake news and how technology has changed
over recent years, enabling us to develop tools that can be used in the fight against fake
news. We also explored the importance of identifying fake news, the influence that
misinformation can have on the public's decision making, and which approaches exist
to combat fake news. The current battle against fake news on COVID-19, and the
uncertainty surrounding it, shows that a hybrid approach towards fake news detection is
needed: human wisdom as well as digital tools need to be harnessed in this process.
Hopefully, some of these measures will stay in place, and digital media platform
owners and the public will take responsibility and work together in detecting and
combatting fake news.
8 References
Ahmed, Sajjad, Knut Hinkelmann, and Flavio Corradini. 2019. “Combining Machine Learning with Knowledge Engineering to Detect Fake News in Social Networks - a Survey.” AAAI Spring Symposium, 12.
Albright, Jonathan. 2017. “Welcome to the Era of Fake News.” Media and Communication 5 (2): 87. https://doi.org/10.17645/mac.v5i2.977.
Atodiresei, Costel-Sergiu, Alexandru Tănăselea, and Adrian Iftene. 2018. “Identifying Fake News and Fake Users on Twitter.” Procedia Computer Science 126: 451–61. https://doi.org/10.1016/j.procs.2018.07.279.
Burkhardt, Joanna M. 2017. “History of Fake News.” Library Technology Reports 53 (8): 37.
Castelo, Sonia, Thais Almeida, Anas Elghafari, Aécio Santos, Kien Pham, Eduardo Nakamura, and Juliana Freire. 2019. “A Topic-Agnostic Approach for Identifying Fake News Pages.” Companion Proceedings of the 2019 World Wide Web Conference - WWW ’19, 975–80. https://doi.org/10.1145/3308560.3316739.
Chen, Yimin, Niall J. Conroy, and Victoria L. Rubin. 2015a. “Misleading Online Content: Recognizing Clickbait as False News?” In Proceedings of the 2015 ACM Workshop on Multimodal Deception Detection - WMDD ’15, 15–19. Seattle, Washington, USA: ACM Press. https://doi.org/10.1145/2823465.2823467.
———. 2015b. “News in an Online World: The Need for an ‘Automatic Crap Detector.’” Proceedings of the Association for Information Science and Technology 52 (1): 1–4. https://doi.org/10.1002/pra2.2015.145052010081.
Hassan, Naeemul, Fatma Arslan, Chengkai Li, and Mark Tremayne. 2017. “Toward Automated Fact-Checking: Detecting Check-Worthy Factual Claims by ClaimBuster.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’17, 1803–12. Halifax, NS, Canada: ACM Press. https://doi.org/10.1145/3097983.3098131.
Horne, Benjamin D., and Sibel Adalı. 2017. “This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News.” International AAAI Conference on Web and Social Media, 8.
Macaulay, Thomas. 2018. “Can Technology Solve the Fake News Problem It Helped Create?” Techworld. 2018. https://www.techworld.com/startups/can-technology-solve-fake-news-problem-it-helped-create-3672139/.
Marr, Bernard. 2020. “Coronavirus Fake News: How Facebook, Twitter, And Instagram Are Tackling The Problem.” Forbes. 2020. https://www.forbes.com/sites/bernardmarr/2020/03/27/finding-the-truth-about-covid-19-how-facebook-twitter-and-instagram-are-tackling-fake-news/.
Okoro, E.M., B.A. Abara, A.O. Umagba, A.A. Ajonye, and Z.S. Isa. 2018. “A Hybrid Approach to Fake News Detection on Social Media.” Nigerian Journal of Technology 37 (2): 454. https://doi.org/10.4314/njt.v37i2.22.
Paskin, Danny. 2018. “Real or Fake News: Who Knows?” The Journal of Social Media in Society 7 (2): 252–73.
Pennycook, Gordon, and David G. Rand. 2019. “Fighting Misinformation on Social Media Using Crowdsourced Judgments of News Source Quality.” Proceedings of the National Academy of Sciences 116 (7): 2521–26. https://doi.org/10.1073/pnas.1806781116.
Pérez-Rosas, Verónica, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2018. “Automatic Detection of Fake News.” In Proceedings of the 27th International Conference on Computational Linguistics, 3391–3401. Santa Fe, New Mexico, USA: Association for Computational Linguistics. https://www.aclweb.org/anthology/C18-1287.
Potthast, Martin, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2017. “A Stylometric Inquiry into Hyperpartisan and Fake News.” ArXiv:1702.05638 [Cs], February. http://arxiv.org/abs/1702.05638.
Qazvinian, Vahed, Emily Rosengren, Dragomir R. Radev, and Qiaozhu Mei. 2011. “Rumor Has It: Identifying Misinformation in Microblogs.” Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 11.
Ruchansky, Natali, Sungyong Seo, and Yan Liu. 2017. “CSI: A Hybrid Deep Model for Fake News Detection.” Proceedings of the 2017 ACM Conference on Information and Knowledge Management - CIKM ’17, 797–806. https://doi.org/10.1145/3132847.3132877.
Schade, Ulrich. 2019. “Software That Can Automatically Detect Fake News.” Computers Science And Engineering, 3.
Shao, Chengcheng, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer. 2018. “The Spread of Low-Credibility Content by Social Bots.” Nature Communications 9 (1): 4787. https://doi.org/10.1038/s41467-018-06930-7.
Sivasangari, V., Pandian Vijay Anand, and R. Santhya. 2018. “A Modern Approach to Identify the Fake News Using Machine Learning.” International Journal of Pure and Applied Mathematics 118 (20): 10.
Sparks, Hannah, and Hannah Frishberg. 2020. “Facebook Gives Step-by-Step Instructions on How to Spot Fake News.” New York Post. 2020. https://nypost.com/2020/03/26/facebook-gives-step-by-step-instructions-on-how-to-spot-fake-news/.
Stahl, Kelly. 2018. “Fake News Detection in Social Media.” California State University Stanislaus, 6.
Stein-Smith, Kathy. 2017. “Librarians, Information Literacy, and Fake News.” Strategic Library 37.
Thota, Aswini, Priyanka Tilak, Simrat Ahluwalia, and Nibrat Lohia. 2018. “Fake News Detection: A Deep Learning Approach” 1 (3): 21.
Vlachos, Andreas, and Sebastian Riedel. 2014. “Fact Checking: Task Definition and Dataset Construction.” In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, 18–22. Baltimore, MD, USA: Association for Computational Linguistics. https://doi.org/10.3115/v1/W14-2508.
Yang, Yang, Lei Zheng, Jiawei Zhang, Qingcai Cui, Zhoujun Li, and Philip S. Yu. 2018. “TI-CNN: Convolutional Neural Networks for Fake News Detection.” ArXiv:1806.00749 [Cs], June. http://arxiv.org/abs/1806.00749.
Zhou, Xinyi, and Reza Zafarani. 2018. “Fake News: A Survey of Research, Detection Methods, and Opportunities.” ArXiv:1812.00315 [Cs], December. http://arxiv.org/abs/1812.00315.