Irjet V7i12375
I. Introduction
Studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. In this project we therefore focus on building a model for automatic cyberbullying detection in social media text by modelling posts written by bullies on social networks.

2.1 Aim of the Project

The main aim of the cyberbullying detection model is to help improve manual monitoring for cyberbullying on social networks. In this project we fetch tweets from Twitter accounts, preprocess the tweets and images, and apply the generated model to detect whether they contain cyberbullying or not.
The objectives of the systems development and
event management are:
Profanity was found to be the most used content-based feature across the reviewed studies, with 22 papers using the presence of profanity in text as an indicator for cyberbullying. Studies such as Dinakar et al. (2011), Perez et al. (2012), Kontostathis et al. (2013), Nahar et al. (2013) and Bretschneider et al. (2014) created profanity lexicons using wordlists compiled by the researchers or sourced from external libraries such as noswearing.com and urbandictionary.com. By equating the presence of profanity to cyberbullying, the use of profanity lexicons alone fails to consider other key aspects of cyberbullying such as repetitiveness and the presence of a power differential. Rafiq et al. (2015) similarly cautioned against the use of profanity as the only feature for cyberbullying detection and argued that not all use of profanity and cyber-aggression constitutes bullying. Studies such as Nahar et al. (2013), Dadvar et al. (2014) and Bretschneider et al. (2014) incorporated other features such as pronouns in close proximity to profanity, since such personalised abusive content is potentially more indicative of cyberbullying than abusive terms on their own. For example, the phrase “the f**king train was delayed again” is definitely not cyberbullying although it contains profanity, but “you f**king idiot” could be. While this is an improvement, the pronoun + profanity feature still suffers from the same shortcomings as using profane terms alone.

Dadvar and De Jong (2012), Sood and Churchill (2012a), and Nahar et al. (2013).

Of the 41 studies using content-based features, 5 checked for the presence of cyberbullying keywords as part of the detection process. By cyberbullying keywords, we refer to non-profane words the use of which can indicate the presence of cyberbullying. These are often words associated with themes such as race, physical appearance, gender, and sexuality. As far back as the earliest study we discovered (i.e., Mahmud et al., 2008), cyberbullying keywords have
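The content-based features discussed in this section (presence of profanity, a pronoun close to profanity, and cyberbullying keywords) can be illustrated with a small sketch. The lexicons, the two-token proximity window, the tokenizer, and the function names below are hypothetical choices made for illustration; none of them comes from the reviewed studies.

```python
import re

# Hypothetical toy lexicons; real studies used much larger word lists.
PROFANITY = {"f**king"}
PRONOUNS = {"you", "your", "u", "ur"}
KEYWORDS = {"ugly", "fat"}  # non-profane words tied to physical appearance


def tokenize(text):
    """Lowercase the text and keep letters, apostrophes and masking asterisks."""
    return re.findall(r"[a-z*']+", text.lower())


def content_features(text, window=2):
    """Return three binary content-based features for one message."""
    tokens = tokenize(text)
    profane_at = [i for i, t in enumerate(tokens) if t in PROFANITY]
    pronoun_at = [i for i, t in enumerate(tokens) if t in PRONOUNS]
    # A pronoun counts as "close" when it is at most `window` tokens away.
    near = any(abs(i - j) <= window for i in profane_at for j in pronoun_at)
    return {
        "has_profanity": bool(profane_at),
        "pronoun_near_profanity": near,
        "has_keyword": any(t in KEYWORDS for t in tokens),
    }


train_example = content_features("the f**king train was delayed again")
insult_example = content_features("you f**king idiot")
```

Both example messages set `has_profanity`, but only the second sets `pronoun_near_profanity`, which mirrors the distinction drawn in the text. Neither feature captures repetition or a power differential.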
The linear kernel is a special case of the RBF kernel, and works best when the number of features is very large. The linear kernel was used on data sets acquired from Myspace, Kongregate and Slashdot. The datasets are available from the workshop on Content Analysis for the Web 2.0. They contain manually labeled data, which is used as a ground truth dataset. Data from 3 different social networking sites are included in the dataset: Slashdot (496 files, 140,000 comments total (one for each article)), Kongregate (12 files, 150,000 comments total (one for each

The maximum posterior class, or the most likely class, being in our case either bullying or not, would be:

$$c_{MAP} = \arg\max_{c \in C} P(c \mid d) = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)} = \arg\max_{c \in C} P(d \mid c)\,P(c)$$

where d denotes the post being classified and C is the set of classes (bullying or not).
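As a minimal sketch of the maximum-posterior rule, the example below scores each class by log P(c) plus the summed log P(w | c) of the words in a post and returns the highest-scoring class. The multinomial word model, the add-one smoothing, the toy training posts, and the function names are assumptions made for illustration; they are not taken from the project.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(posts, labels):
    """Estimate log P(c) and log P(w | c) with add-one smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(posts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / total) for w in vocab
        }
    return log_prior, log_likelihood


def map_class(text, log_prior, log_likelihood):
    """Return argmax over c of log P(c) + sum of log P(w | c).

    Words never seen in training are skipped.
    """
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w]
            for w in text.lower().split()
            if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy training data with two classes.
posts = ["you are an idiot", "you are so stupid",
         "the train was delayed", "see you at lunch"]
labels = ["bullying", "bullying", "not", "not"]
log_prior, log_likelihood = train_naive_bayes(posts, labels)
predicted = map_class("you idiot", log_prior, log_likelihood)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and dropping P(d) is safe because it is the same for every class.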
The corpus of data obtained to experiment with is the same as that used for J48. In this case, a true positive rate of 0.723, taking into account both textual and social features, was obtained. Without taking into account social features, the rate was 0.584, once again showing, as with similar tests performed with J48, that social features help improve the result. [18]

4.2 SVM Model:

SVM (Support Vector Machine) is a supervised learning algorithm, and is one of the most efficient and universal classification algorithms. Its goal is to find the optimal separating hyperplane which maximizes the margin of the training data. Initially the classifier is trained with labelled data before being used to classify test data to measure accuracy. Before the data can be used to train our classifier, it is imperative to process it. This consists of the following steps:
•Labelling of data
•Generation of vocabulary

The data used for testing purposes is also converted into a data matrix, and this data matrix is passed to the classifier. SVMs use sophisticated statistical learning theory to overcome the curse of dimensionality.

Instead of specifying the feature vector, kernel functions can be used to provide similarity between data points. There are various kernels that can be used with SVM, namely:
•RBF kernel (Radial basis function)
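The preparation steps and kernel functions described in this section can be sketched in plain Python as follows. The whitespace tokenizer, the term-count matrix, the gamma value, the toy posts, and all function names are assumptions chosen for illustration. The sketch stops short of training an SVM and only shows how a kernel scores similarity between rows of the data matrix.

```python
import math
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index (sorted order)."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def to_data_matrix(texts, vocab):
    """Turn each text into a row of term counts over the vocabulary.

    Tokens that are not in the vocabulary are ignored, so test data can be
    converted with the vocabulary built from the training data.
    """
    matrix = []
    for text in texts:
        counts = Counter(text.lower().split())
        matrix.append([counts.get(tok, 0) for tok in vocab])
    return matrix


def linear_kernel(x, y):
    """Similarity as the plain dot product of two rows."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Similarity that decays with the squared distance between two rows."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Step 1: labelled data (1 = bullying, 0 = not); toy examples only.
train_texts = ["you are an idiot", "the train was delayed", "you are so stupid"]
train_labels = [1, 0, 1]

# Step 2: vocabulary, then the data matrices for training and testing.
vocab = build_vocabulary(train_texts)
x_train = to_data_matrix(train_texts, vocab)
x_test = to_data_matrix(["you idiot"], vocab)

similarity_to_bullying = linear_kernel(x_test[0], x_train[0])
similarity_to_neutral = linear_kernel(x_test[0], x_train[1])
```

An SVM trained on `x_train` and `train_labels` would consult only such kernel values, which is why the feature vectors need not be specified explicitly once a kernel is chosen.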
who can then follow up with appropriate actions. Twitter does not allow access to a user's profile; for this we might create our own system which can identify such changes and determine how the bullying affected the person.

The “hat” notation x̂ indicates that 1 has been appended to the vector x. Hidden-layer activation functions h^(l)(x) often have the same form at each level, but this is not a requirement.
In contrast to graphical models such as Bayesian
networks where hidden variables are random
variables, the hidden units here are intermediate
deterministic computations, which is why they are
not represented as circles. However, the output
variables yk are drawn as circles because they can
be formulated probabilistically.
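A minimal sketch of the forward computation described above, assuming a single hidden layer with a logistic sigmoid activation and a softmax output. The weights, layer sizes, and function names are hypothetical. Each weight row has one extra entry that multiplies the appended 1, which plays the role of the bias.

```python
import math


def append_one(x):
    """Build x-hat by appending a constant 1 to the vector x."""
    return list(x) + [1.0]


def sigmoid(z):
    """Logistic activation, assumed here for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(weights, x_hat, activation):
    """Apply the activation to the dot product of each weight row with x-hat."""
    return [activation(sum(w * v for w, v in zip(row, x_hat))) for row in weights]


def softmax(scores):
    """Turn output scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, w_out):
    """Deterministic hidden units followed by probabilistic outputs y_k."""
    hidden = layer(w_hidden, append_one(x), sigmoid)
    scores = layer(w_out, append_one(hidden), lambda z: z)
    return softmax(scores)


# Hypothetical weights: 2 inputs + bias -> 2 hidden units + bias -> 2 outputs.
w_hidden = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.0]]
w_out = [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0]]
y = forward([1.0, 2.0], w_hidden, w_out)
```

The hidden values are plain numbers computed from the input, while the softmax output can be read as class probabilities, matching the distinction between deterministic hidden units and probabilistic outputs.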
V. Features
VII. Conclusion
VIII. References
[3] Anti-Defamation League. (2011) Glossary of Cyberbullying Terms. adl.org. [Online]. Available: http://www.adl.org/education/curriculum_connections/cyberbullying/glossary.pdf
[4] N. E. Willard, Cyberbullying and Cyberthreats: Responding to the Challenge of Online Social Aggression, Threats, and Distress. Research Press, 2007.
[14] An Effective Approach for Cyberbullying Detection and Avoidance, IEEE paper.
[15] Approaches to Automated Detection of Cyberbullying: A Survey, IEEE paper.
[16] Cyberbullying Detection System on Twitter, IEEE paper.