ISSN: 0011-9342 | Year 2021

Design Engineering Issue: 8 | Pages: 10919 - 10926

Detection of Fake Twitter Profiles using SVM, Random Forest and Neural Network Models

Ms Sheetal Prasad1, Mr Antriksh Dwivedi2, Mrs. Reenie Tanya3, and Ms G. Stuthi4


1sp2722@srmist.edu.in, 2ad3232@srmist.edu.in, 3reeniet@srmist.edu.in, 4gs7996@srmist.edu.in
1Final Year Undergraduate, 2Final Year Undergraduate, 3Assistant Professor, 4Final Year Undergraduate
Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India

Abstract. Online social networks are one of the most important parts of our lives, and online
social media has changed the way we communicate. People increasingly use social media to
keep in touch with family, friends, and colleagues. Social media also creates a perfect
environment for hackers, who use fake accounts on platforms such as Facebook, Twitter, and
Instagram to access personal data. The purpose of our research is to create a model that
detects whether an account is fake, thus protecting people from fake accounts and preserving
their privacy. To do this, we use SVM, Random Forest, and neural network models.

Keywords: Social Network, media, Twitter, hacker, communication, SVM, Random Forest,
Neural Network.

1 Introduction
Online social networks represent one of the main parts of our lives. Most of us wake up in
the morning and the first thing we do is check our smartphone, looking at the most obvious
medium, social media. Online social networks know a lot about us: when we wake up, when
we leave our house, and who our close friends are. They know basically everything, and that
is a big problem: our sensitive and personal data are available on the internet. Online social
networks have lately become a target for hackers, who create fake accounts to carry out
abuse. Scammers make contact with people on the internet and later use their sensitive
information for blackmail, phishing, and cyberbullying.
Facebook started in 2004 with basic functionality and has grown exponentially over the years
into a social media giant with 2.7 billion monthly users at present. In 2012 Facebook
acquired Instagram, another fast-growing platform, and went on to dominate the market in
2014 with the WhatsApp acquisition. Facebook estimated in 2015 that up to 14 million of its
monthly active users were in fact fake. The importance and growth of social media are clear.
This growth makes our lives easier by giving us so many ways to communicate, but at the
same time we entrust all our personal data to the different online platforms. Many entities
now try to gather all this information in order to control people’s actions in previously
unimaginable ways. Twitter is a real-time microblogging platform, publicly launched in July
2006. Since then, Twitter has grown its user base to reach over 300 million monthly active
users. Of a reported 500 million registered users, 20 million are blatantly fake accounts or
bots, which comes out to roughly 4% of all Twitter accounts.

We use three algorithms, namely Support Vector Machine, Random Forest, and Neural
Network, and then compare them to find the algorithm with the highest efficiency.

A. Facebook removed 3.2 billion fake accounts


Facebook Inc (FB.O) removed 3.2 billion fake accounts between April and September 2019,
the largest such reduction in Facebook's history and double the number of fake accounts
removed over the same reporting period a year earlier. Nearly 1.3 billion fake accounts were
removed, compared to 1.7 billion fake accounts in the corresponding quarter of 2021. Across the
network, regardless of their activities and non-broadcast rulings (temporary / permanent)
based on the findings were broadcast.

B. Twitter and fake accounts


Fake news and spam have been plaguing social media giants like Facebook and Twitter for
some time. The closer a country comes to elections, the more important it becomes to curb
fake accounts that spread false propaganda. Facebook deletes nearly a million accounts
every day, and Twitter does something similar.
The Indian government has asked major social media companies such as Facebook, Twitter,
and Google to remove accounts containing fake profile photos of well-known personalities,
companies, and even general users within 24 hours of being notified by the user or by
someone acting on the user's behalf. This is part of the new IT rules, and social media giants
will have to act immediately upon receiving such a complaint, according to a Times of India
report. This move is intended to end the threat of identity theft on social media in India.

2 Related Work
We surveyed a number of papers on the concepts we would use in this project. Papers [1],
[16], and [20] introduced us to unsupervised machine learning techniques and supervised
machine learning algorithms. The upside of these papers was that the model could efficiently
differentiate between fake and real profiles using the technology available when they were
published. The downside is that, since the technologies have evolved over time, newer and
more efficient models can be
created. Papers [2], [13], and [19] covered a survey carried out on Facebook, based on
Facebook's web services and a social engineering experiment. They examined the factors
contributing to the successful integration of a fake profile into an existing friendship
network, as well as human behavior and the interaction patterns between common user
profiles and fake profiles. In [3] and [12], the authors used machine learning algorithms:
support vector machine, decision tree, artificial neural networks, and the Naive Bayes
classifier. The upside was that the project ran efficiently; the downside was that it uses
outdated machine learning algorithms and could be rerun with newer ones.

Papers [4] and [11] used Big Data, data science, identity deception, filtering, supervised
learning, and the Naive Bayes machine learning model. The upside was that the model
yielded a best F1 score of 49.75%; the downside is that this F1 score leaves room for
improvement in the future. Papers [5] and [17] used Naive Bayes classification alongside
decision trees, support vector machines, K-nearest neighbor, and logistic regression. The
upside was that the model was successful, but the downside was that the work could also be
extended to unsupervised learning techniques. Papers [6], [15], and [21] used data science
and the random forest algorithm, which accurately predicted fake accounts. The finding of
these papers was that this eliminated the need for manual prediction of a fake account.
Papers [7] and [10] used a supervised learning algorithm, a clustering algorithm, machine
learning, and a naive algorithm. Features in future versions may help improve the
performance of social media platform applications without significantly affecting results.
Papers [8] and [14] used machine learning algorithms including Random Forest, J48,
K-Nearest Neighbor, and Sequential Minimal Optimization. The immediate advantage was
that the model was exceptionally successful, but the downside was that it was not intended
for languages other than English. Papers [9] and [18] used machine learning algorithms such
as Random Forest, a classification algorithm, Multilayer Perceptron, and Logistic
Regression. The drawback was that it could be made more advanced with modern ML
algorithms.
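
For reference, the F1 score quoted in this section is the harmonic mean of precision and recall. The following is a minimal Python sketch of that standard definition. The function name and the example counts are illustrative assumptions and are not taken from any of the surveyed papers.

```python
def f1_score(tp, fp, fn):
    """Return the F1 score from true-positive, false-positive and
    false-negative counts (fake profiles treated as the positive class)."""
    if tp == 0:
        # No correct positive predictions: report 0 rather than divide by zero.
        return 0.0
    precision = tp / (tp + fp)  # share of flagged profiles that are really fake
    recall = tp / (tp + fn)     # share of fake profiles that were flagged
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 fakes caught, 2 real accounts flagged, 2 fakes missed.
print(f"{f1_score(tp=8, fp=2, fn=2):.2f}")
```

Unlike plain accuracy, this measure is not inflated when real accounts greatly outnumber fake ones, which may be why some of the surveyed work reports it.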
3 Existing System
The existing system generally uses a few criteria, such as the number of friends, the number
of pictures, and when the profile was created, to decide whether a general social media
account is fake.

4 Proposed System
The proposed model uses all the criteria mentioned above and adds further important criteria:
screen name, number of statuses, number of followers, number of friends, number of
favourites, listed count, date created, profile picture, whether or not the account is verified,
and the description.
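
As a rough illustration of how these criteria could be turned into model input, the sketch below maps one profile record to a flat numeric feature vector. The field names, the choice of derived features (name length, digit count, account age in days), and the ISO date format are all assumptions made for the example; the paper does not specify its dataset columns or encodings.

```python
from datetime import datetime, timezone

# Hypothetical field names, loosely modelled on the attributes listed above.
NUMERIC_FIELDS = [
    "statuses_count", "followers_count", "friends_count",
    "favourites_count", "listed_count",
]


def profile_to_features(profile, now=None):
    """Turn one profile record (a dict) into a flat list of floats."""
    now = now or datetime.now(timezone.utc)
    # Raw counts; missing values are treated as 0 (an assumption).
    features = [float(profile.get(name) or 0) for name in NUMERIC_FIELDS]

    # Screen name: length and number of digits as simple numeric proxies.
    screen_name = profile.get("screen_name") or ""
    features.append(float(len(screen_name)))
    features.append(float(sum(ch.isdigit() for ch in screen_name)))

    # Date created: encoded as account age in days (ISO 8601 string assumed).
    created = datetime.fromisoformat(profile["created_at"])
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)
    features.append(float((now - created).days))

    # Profile picture, verification status, and description length.
    features.append(1.0 if profile.get("has_profile_image") else 0.0)
    features.append(1.0 if profile.get("verified") else 0.0)
    features.append(float(len(profile.get("description") or "")))
    return features


example = {
    "screen_name": "user12345", "statuses_count": 3, "followers_count": 1,
    "friends_count": 900, "favourites_count": 0, "listed_count": 0,
    "created_at": "2020-01-01T00:00:00+00:00", "has_profile_image": False,
    "verified": False, "description": "",
}
print(profile_to_features(example, now=datetime(2021, 1, 1, tzinfo=timezone.utc)))
```

Each profile then becomes one row of a numeric matrix that any of the three classifiers can consume.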

5 Architecture Diagram

Figure 5.1

6 Proposed Model

Figure 6.1

Above is a bar graph with information on the data available for the accounts in the data set,
comprising criteria such as screen name, status count, followers count, friends count,
favourite count, listed count, created date, profile photo, whether or not the account is
verified, and description. A few observations can be easily derived, for example that most
accounts in the dataset are located in Chennai and that the users have updated their
application.

Figure 6.2

In the workflow, the system fetches the data from the user, cleans it, and prepares it for use.
The models are then trained on the dataset and evaluated. Each model is then deployed to
check whether a profile is fake or legitimate. Finally, the most crucial step is to compare the
working and the efficiencies of the models used, namely Neural Network, Support Vector
Machine, and Random Forest.
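
The cleaning and preparation steps of this workflow might look roughly like the sketch below. It assumes records are plain dicts with hypothetical keys (`screen_name`, `followers_count`, `friends_count`, `label`), drops incomplete or duplicate records, and holds out a fraction of the data for evaluation. The actual cleaning rules and split ratio used in the paper are not stated, so these are placeholders.

```python
import random

# Hypothetical required keys; the paper does not list its dataset schema.
REQUIRED = ("screen_name", "followers_count", "friends_count", "label")


def clean_records(records, required=REQUIRED):
    """Drop records with missing required fields or a repeated screen name."""
    seen, cleaned = set(), []
    for rec in records:
        if any(rec.get(key) is None for key in required):
            continue  # incomplete record
        if rec["screen_name"] in seen:
            continue  # duplicate account
        seen.add(rec["screen_name"])
        cleaned.append(rec)
    return cleaned


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


raw = [
    {"screen_name": "a", "followers_count": 1, "friends_count": 2, "label": 0},
    {"screen_name": "a", "followers_count": 1, "friends_count": 2, "label": 0},
    {"screen_name": "b", "followers_count": None, "friends_count": 5, "label": 1},
    {"screen_name": "c", "followers_count": 7, "friends_count": 9, "label": 1},
    {"screen_name": "d", "followers_count": 4, "friends_count": 4, "label": 0},
    {"screen_name": "e", "followers_count": 0, "friends_count": 800, "label": 1},
    {"screen_name": "f", "followers_count": 50, "friends_count": 60, "label": 0},
]
cleaned = clean_records(raw)
train, test = train_test_split(cleaned, test_fraction=0.2)
print(len(cleaned), len(train), len(test))
```

Fixing the random seed keeps the split identical across the three models, so their results are compared on the same held-out profiles.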

7 Modules Used
The dataset is used to train:
● Neural Network
● Support Vector Machine
● Random Forest

8 Conclusion

To distinguish between legitimate and fake profiles, detect fraud, and block spammers, we
ran multiple algorithms, namely Neural Network, Support Vector Machine, and Random
Forest, to obtain the result with the best accuracy. Based on the results, we conclude that the
most accurate algorithm is Random Forest, with an accuracy of 99.88%, whereas Neural
Network and Support Vector Machine have accuracies of 87.35% and 79.90% respectively.
This research can support the creation of apps and software in the future and help users
across the world have the best social experience with the least possible threat.
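
The comparison step can be sketched in a few lines of Python. The `accuracy` helper shows how a percentage accuracy is computed from true and predicted labels, and `rank_models` orders models by score. The dictionary holds the accuracies reported above; the helper names and the toy label lists are assumptions for illustration, not the authors' code.

```python
def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def rank_models(scores):
    """Return (model, score) pairs sorted from best to worst."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Accuracies (percent) as reported in the conclusion.
reported = {
    "Random Forest": 99.88,
    "Neural Network": 87.35,
    "Support Vector Machine": 79.90,
}

for name, score in rank_models(reported):
    print(f"{name}: {score:.2f}%")

# Toy labels only, to show the accuracy computation itself (1 = fake, 0 = real).
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```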

Figure 8.1

Figure 8.2

9 Figures and Tables


Figure 1.1 - Bar graph of the number of fake accounts removed (in millions) per quarter, from late 2017 to early 2021.
Figure 5.1 - Architecture diagram of the proposed model.
Figure 6.1 - Bar graph of the information in the combined datasets against each individual characteristic criterion.
Figure 6.2 - Workflow applied to each dataset in the module.
Figure 8.1 - Screenshot of the results obtained from the algorithms.
Figure 8.2 - Bar graph comparing the efficiencies of all the modules.

10 References

[1] Naman Singh, Tushar Sharma, Abha Thakral, Tanupriya Choudhury. Detection of Fake
Profile in Online Social Networking Using Machine Learning. International Conference on
Advances in Computing and Communication Engineering (ICACCE-2018).
[2] Katharina Krombholz, Dieter Merkl, Edgar Weippl. Fake Identities in Social Media: A
Case Study on the Sustainability of the Facebook Business Model. Journal of Service
Science Research, 2012.

[3] Suheel Yousuf Wani, Mudasir M Kirmani, Syed Imamul Ansarulla. Prediction of Fake
Profiles on Facebook using Supervised Machine Learning Techniques-A Theoretical
Model. International Journal of Computer Science and Information Technologies
(IJCSIT), Vol. 7.
[4] Estee Van Der Walt, Jan Eloff. Using Machine Learning to Detect Fake Identities: Bots
vs Humans. IEEE Access.
[5] Nidhi Patel, Rakesh Patel. A Survey on Fake Review Detection using Machine Learning
Techniques. 4th International Conference on Computing Communication and
Automation (ICCCA), 2018.
[6] S. P. Maniraj, Harie Krishnan G, Surya T, Pranav R. Fake Account Detection using
Machine Learning and Data Science. International Journal of Innovative Technology and
Exploring Engineering (IJITEE).
[7] Michael Fire, Dima Kagan, Aviad Elyashar, Yuval Elovici. Friend or foe? Fake profile
identification in online social networks. Springer-Verlag Wien, 2014.
[8] Patxi Galan-Garcia, Jose Gaviria de la Puerta, Carlos Laorden Gomez, Igor Santos,
Pablo Garcia Bringas. Supervised machine learning for the detection of troll profiles in
twitter social network: application to a real case of Cyberbullying. Oxford University
Press.
[9] Kristo Radion Purba, David Asirvatham, Raja Kumar Murugesan. Classification of
instagram fake users using supervised machine Learning Algorithms. International
Journal of Electrical and Computer Engineering (IJECE).
[10] D. M. Freeman, “Detecting clusters of fake accounts in online social networks”, 8th
ACM Workshop on Artificial Intelligence and Security, pp. 91–101.
[11] B. Hudson, B. R. Voter, “Profile characteristics of fake twitter accounts”, Big Data &
Society, 2016.
[12] X. Chen, O. Martinez, “In a world that counts: Clustering and detecting fake social
engagement at scale,” 25th International Conference on WWW, pp. 111–120.
[13] B. Hudson, “Fake twitter accounts: profile characteristics obtained using an activity-
based pattern detection approach”, International Conference on Social Media & Society,
ACM 2015.
[14] An Innovative Smart Soft Computing Methodology towards Disease (Cancer, Heart
Disease, Arthritis) Detection in an Earlier Stage and in a Smarter Way, International
Journal of Computer Science and Mobile Communication, 2014.
[15] J. J. Xu, “Automatically detecting criminal identity deception: an adaptive detection
algorithm,” Systems, Man and Cybernetics, IEEE, 2006.
[16] K. Dinakar, R. Reichart and H. Lieberman. Modeling the detection of textual
cyberbullying. In International Conference on Weblog and Social Media-Social Mobile
Web Workshop, 2011.
[17] S. R. Garner, et al. Weka: The waikato environment for knowledge analysis. In
Proceedings of the New Zealand Computer Science Research Students Conference, pp.
57–64. Citeseer, 1995.
[18] C. Laorden, P. Galán-García, I. Santos, B. Sanz, J. M. G. Hidalgo and P. G. Bringas.
Negobot: a conversational agent based on game theory for the detection of paedophile
behaviour. In International Joint Conference CISIS’12-ICEUTE’12-SOCO’12 Special
Sessions, pp. 261–270. Springer, 2013.
[19] J. Platt, et al. Fast training of support vector machines using sequential minimal
optimization. Advances in kernel methods—support vector learning 3, 1999.
[20] D. A. Simanjuntak, H. P. Ipung, C. Lim and A. S. Nugroho. Text classification
techniques used to faciliate cyber terrorism investigation. In Advances in Computing,
Control and Telecommunication Technologies (ACT), 2010 Second International
Conference on, pp. 198–200. IEEE, 2010.
[21] Y. Singh, A. Kaur and R. Malhotra. Comparative analysis of regression and machine
learning methods for predicting fault proneness models. International Journal of
Computer Applications in Technology, 35, 183–193, 2009.
