Insult Detection in Hindi: Course Project On Artificial Intelligence

This document summarizes a student project on detecting insults in Hindi language comments. The project aims to build a model using machine learning techniques like logistic regression and SVM to classify Hindi comments as insulting or not. The students created their own dataset by collecting Hindi comments online and translating English insults to Hindi. They used word features like n-grams and negation words to represent the data as vectors for training classifiers. Preliminary results showed the model had better accuracy at detecting direct insults compared to indirect or sarcastic insults in Hindi.

Course Project on Artificial Intelligence

Insult Detection in Hindi

Advisor: Prof. Amitabha Mukherjee

Chetan Dalal (11218) ∗, Shivyansh Tandon (11690) †

Indian Institute of Technology, Kanpur

Abstract: The aim of our project is to detect comments in Hindi that may be considered insulting to other participants
in a conversation. We use features such as skip-grams, n-grams, a negation feature and a second person
feature to build a vector model. We reduce the large number of features to an appropriate level with the
Chi-Squared test. We employ supervised learning methods, namely logistic regression and SVM, for training and testing on the
data. We also compare our results against insult detection in the English language. We create our own
dataset of insults in Hindi by using source material from various blogs and by translating available English
insults to Hindi via Google Translate. Since the direct translation was often poor, we had to
correct it manually to obtain a meaningful translation. We also present a qualitative study of the accuracy of
Google Translate when translating from English to Hindi.
Keywords: Sentiment Analysis • Insult Detection • Hindi corpus • Supervised classification

© Department of Computer Science & Engineering.

1. Introduction

The growth of the Internet has been immense in the past few years. According to the Telecom Regulatory Authority
of India (TRAI), the number of Internet subscribers in India was 164.81 million as of March 31, 2013, making
India the world's third largest Internet user base. Social networking sites like Facebook alone have over 100 million active
users in India (Times of India). Thus, detecting inappropriate use of language on the Internet that might harm
the user is of utmost importance. Many devices and operating systems now
support Hindi, and hence there is a need for an insult detection system in Hindi. Insults repel
new users and discourage regular users from participating in future discussions. It is also frustrating to
come across foul language when looking for something. Finally, it is not feasible for human moderators to monitor this
enormous amount of data.

We focus on comments that may be insulting or harmful to other participants. Insults can be of many types,
such as racial slurs, references to a handicap, foul language and provocative words. Indirect insults such as sarcasm,
disguised insults and crude words are not identified by our method. We aim at detecting direct and extreme insults.

∗ E-mail: chetand@iitk.ac.in, Department of Computer Science & Engineering
† E-mail: shivyans@iitk.ac.in, Department of Mathematics & Statistics


2. Related Works

Several works on sentiment analysis exist for the English language. Xiang et al. [2]
dealt with offensive tweets through topical feature discovery over a large-scale Twitter corpus, using a
Latent Dirichlet Allocation model. Razavi et al. [3] used an Insulting and Abusing Language
Dictionary along with features like frequency to run a three-level classification algorithm.

For Hindi, there has been work on sentiment analysis of movie reviews that uses a semi-supervised approach
to train a Deep Belief Network. A recent work on a Hindi Subjective Lexicon was done by Bakliwal et al., who
worked on Hindi polarity classification. They used Hindi WordNet together with the synonyms and antonyms of a
given word in Hindi.

These previous works have not been able to detect insults directed at third persons; the research has
focused on trivial and extreme second-person insults. Moreover, no research on insult detection has been done
for the Hindi language, owing to the small corpora available. We aim to create a classifier that detects peer-to-peer insults by
applying supervised learning techniques.

3. Datasets

One difficult part of our project was obtaining the dataset. Since no such dataset existed, we created
one manually. We tackled this problem in several ways.

• Collected comments from various Hindi blogs and forums (400 entries).

• Used Google Translate to convert the available Kaggle English dataset into Hindi (1000 entries) and
then manually corrected the translations to retain the context.

• Around 70% of the entries are negative strings (non-insults).

• Created a list of bad words in Hindi, collected from various Hindi websites.

• Did a qualitative study of how well Google Translate works from English to Hindi.

3.1. Google Translate: A Qualitative Study

We used Google Translate to convert the English dataset into Hindi. Contrary to what one might expect from
Google, the translation quality is limited, although it can at least help the user understand the general
meaning of a foreign text. Some languages produce better results than others, and it works especially well when the
target language is English and the source language is one of the languages of the European Union. The following were
our conclusions from our study of English to Hindi translation.

• Word Translation


– Some of the English insults were not translated, since the available corpus was small. (Fig 1A)

– The severity of several insults was reduced. There is also a many-to-one mapping of many insults
to a single mild insult. (Fig 1B)

– The meanings of many insults are lost in the literal translation of the statement. (Fig 1C)

• Sentence Translation: Short sentences are translated well (Fig 2A), while long ones lose their meaning (Fig 2B).

• Idioms: Some are translated well (Fig 3A), while others are translated word for word and lose their
meaning (Fig 3B).

Figure 1: Examples of insult detection in Hindi


Figure 2: Examples of insult detection in Hindi

4. Implementation

Our method follows a four-step process: normalization, feature extraction, feature selection and classification.

Figure 3: Implementation Model


4.1. Normalization

Raw source text cannot be used directly as input. The data in our dataset has to be cleaned before it can be
used for insult detection, which also reduces unnecessary computation. We must, however, be careful not to
lose useful information.

4.1.1. Removal of unwanted strings

First we remove the unwanted strings. These can be encoding residue such as \xa0, \xc2 and
\n, HTML tags, or English words that were not translated. We also remove words that occur only
once and words that occur too frequently: the latter are needed for grammar but carry no information about
whether a comment is an insult.
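The cleaning step described above could be sketched roughly as follows. This is only an illustration: the regular expressions, the function names and the frequency thresholds (`min_count`, `max_doc_fraction`) are assumptions, since the report does not give the exact patterns or cut-offs that were used.

```python
import re
from collections import Counter

# Assumed patterns: literal escape residue (e.g. the four characters \xa0),
# HTML tags, and untranslated Latin-script words.
ESCAPES = re.compile(r"\\x[0-9a-fA-F]{2}|\\n")
HTML_TAG = re.compile(r"<[^>]+>")
LATIN_WORD = re.compile(r"[A-Za-z]+")
WHITESPACE = re.compile(r"\s+")


def clean_comment(text):
    """Strip HTML tags, escape residue and Latin-script words from one comment."""
    text = HTML_TAG.sub(" ", text)
    # Escapes must go before Latin words, otherwise "\xa0" would lose its "xa".
    text = ESCAPES.sub(" ", text)
    text = LATIN_WORD.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def drop_rare_and_common(docs, min_count=2, max_doc_fraction=0.9):
    """Remove tokens seen fewer than min_count times or in too many documents."""
    total = Counter(tok for doc in docs for tok in doc)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    limit = max_doc_fraction * len(docs)
    return [
        [t for t in doc if total[t] >= min_count and doc_freq[t] <= limit]
        for doc in docs
    ]
```

Each comment would first pass through `clean_comment`, be split into words, and then the whole collection would be filtered with `drop_rare_and_common`.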

4.1.2. Stemming

The second task our code performs is reducing words to their root. Many words have similar
meanings but appear in different forms due to grammar or usage. Since this unnecessarily increases the number of
features, we reduce such words to their root. This lowers the number of features and enables the code to
produce good results on less data. (Fig 6A)
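As an illustration of root reduction, a very light suffix-stripping stemmer is sketched below. The report does not say which stemmer was used, so the suffix list, the `min_stem_len` guard and the function name are all hypothetical; a real Hindi stemmer would need a much more careful rule set.

```python
# Hypothetical list of common Hindi inflectional endings (not from the report).
SUFFIXES = ["ियों", "ियाँ", "ाओं", "ों", "ें", "ीं", "ा", "ी", "े", "ो"]


def stem(word, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word
```

With this sketch, the plural forms "किताबें" and "किताबों" both reduce to "किताब", so they map to a single feature instead of two.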

4.2. Feature Extraction

Words have no meaning for a computer and thus have to be converted into vector form before operations can be
performed on them. We convert the strings into vectors, which are then used by supervised machine learning algorithms.

4.2.1. Tokenizing

We split the data into tokens. The tokens can be characters, n-grams or words. The code uses words as tokens and
builds n-grams with n = 2, 3, 4, 5 for the feature vector.
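A minimal sketch of word n-gram generation is shown below. Splitting on whitespace and the function name `word_ngrams` are assumptions made for illustration; the default sizes follow the values 2 to 5 mentioned above.

```python
def word_ngrams(text, sizes=(2, 3, 4, 5)):
    """Return all word n-grams of the given sizes, joined by single spaces."""
    tokens = text.split()
    grams = []
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams
```

For a three-word comment and `sizes=(2, 3)`, this yields two bigrams followed by one trigram.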

4.2.2. Counting

We count the tokens generated in the previous step for each text string. This produces a (generally sparse) matrix
representing our data (text strings), in which the number of occurrences of each token is a feature of that
string. The size of the matrix is S x F, where S is the size of the training data and F is the size of the vocabulary.
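The S x F count matrix can be sketched as below. For readability the sketch uses dense Python lists, whereas a practical implementation would use a sparse format; the function name is hypothetical.

```python
from collections import Counter


def count_matrix(token_lists):
    """Build a vocabulary and an S x F matrix of token counts (dense, for clarity)."""
    vocab = sorted({tok for doc in token_lists for tok in doc})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for doc in token_lists:
        row = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        rows.append(row)
    return vocab, rows
```

Here each row corresponds to one text string and each column to one token of the vocabulary.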

4.2.3. Skip-Grams

The input data can contain related words that are far apart. So, in addition to n-grams, we use skip-grams, which
increases the size of our feature matrix. This feature has a parameter that determines the number of words
to skip between two words.
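One way to read the skip parameter is sketched below: each feature pairs a word with the word that follows it after exactly `skip` intermediate words. Whether the original code used exactly `skip` words or up to `skip` words is not stated, so this is an assumption.

```python
def skip_bigrams(tokens, skip=1):
    """Pair each token with the token that follows after exactly `skip` skipped words."""
    return [
        (tokens[i], tokens[i + skip + 1])
        for i in range(len(tokens) - skip - 1)
    ]
```

With `skip=0` this reduces to ordinary bigrams.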

4.2.4. Second Person Feature

We also add as features the words that occur after a second-person word. We chose this based on our
observation of the dataset, in which many insults share this structure. This feature is important as it improves
accuracy. (Fig 4A)
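A rough sketch of this feature follows. The list of second-person words, the `window` size and the `2P_` prefix are all illustrative assumptions; the report does not list the words or say how many following words were taken.

```python
# Hypothetical set of Hindi second-person words (not taken from the report).
SECOND_PERSON = {"तू", "तुम", "आप", "तेरा", "तेरी", "तुम्हारा", "तुम्हारी"}


def second_person_features(tokens, window=2):
    """Emit a marked feature for each of the `window` words after a second-person word."""
    feats = []
    for i, tok in enumerate(tokens):
        if tok in SECOND_PERSON:
            for nxt in tokens[i + 1:i + 1 + window]:
                feats.append("2P_" + nxt)
    return feats
```

The prefix keeps these features distinct from the plain word counts, so the classifier can weight a word differently when it is aimed at the reader.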


4.2.5. Negation Feature

We also added a feature for words with negative implications, which invert the meaning of a string.
We give extra weight to sentences that contain such words. This improves the results. (Fig 4B)
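The negation feature might look like the sketch below. The set of negation words and the `weight` value are assumptions chosen for illustration, not values from the report.

```python
# Hypothetical set of Hindi negation words (not taken from the report).
NEGATIONS = {"नहीं", "न", "ना", "मत"}


def negation_feature(tokens, weight=2.0):
    """Return a weighted count of negation words in a tokenized comment."""
    hits = sum(1 for tok in tokens if tok in NEGATIONS)
    return weight * hits
```

The returned value would be appended as one extra column of the feature vector.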

4.2.6. Normalization

Some words occur in nearly every comment and hence carry no significance. We remove such high-frequency
words at the beginning, and for the remaining features we measure the relevance of each word
using TF-IDF. (Fig 6B)

4.2.7. TF-IDF

Terms that appear frequently in a few comments but rarely in others tend to be more relevant
and specific to those comments, and therefore more useful for detecting insults. Hence we multiply each term count
by its corresponding inverse document frequency (IDF) and obtain a tf-idf vector for each string. This is also
called weighting each term by its inverse document frequency.
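The weighting can be sketched as follows. The report does not state which IDF variant was used; the smoothed form idf(t) = ln((1 + N) / (1 + df(t))) + 1 below is an assumption, where N is the number of strings and df(t) is the number of strings containing term t.

```python
import math


def tfidf(count_rows):
    """Multiply each raw count by a smoothed inverse document frequency."""
    n_docs = len(count_rows)
    n_feats = len(count_rows[0])
    # Document frequency: in how many strings does feature j occur at least once?
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_feats)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    return [[row[j] * idf[j] for j in range(n_feats)] for row in count_rows]
```

A term that occurs in every string keeps a weight of 1 per occurrence, while rarer terms receive a larger weight.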

4.3. Feature Selection

Since the number of generated features is high, computing directly on all of them would be inefficient, and many
of them may not matter in deciding whether a string is an insult. We use a feature selection algorithm, the
Chi-Squared test, to select the k best features. We chose k = 200 for the current dataset. We apply
this statistical method to the feature matrix constructed earlier.

4.3.1. Chi-Squared Test

This statistical test determines whether a pair of variables in the data is statistically dependent or independent.
Our method uses:

• Label of the string, i.e. insult or not

• Occurrence or non-occurrence of a feature

as the pair of variables. The features that score high in this test are selected for training the classifiers, and the
rest are discarded. This keeps only the relevant features and avoids
unnecessary computation.
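For the two binary variables above, the statistic can be computed from a 2 x 2 contingency table as chi2 = N (AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)), where A, B, C and D are the four cell counts. The sketch below follows that textbook formula; the function names are hypothetical, and library implementations may compute the score differently (for example from raw counts rather than occurrence).

```python
def chi2_score(present, labels):
    """Chi-squared statistic for feature occurrence vs. insult label (2 x 2 table)."""
    a = sum(1 for p, y in zip(present, labels) if p and y)          # present, insult
    b = sum(1 for p, y in zip(present, labels) if p and not y)      # present, non-insult
    c = sum(1 for p, y in zip(present, labels) if not p and y)      # absent, insult
    d = sum(1 for p, y in zip(present, labels) if not p and not y)  # absent, non-insult
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_k_best(rows, labels, k=200):
    """Return the column indices of the k highest-scoring features."""
    n_feats = len(rows[0])
    scores = [
        chi2_score([row[j] > 0 for row in rows], labels) for j in range(n_feats)
    ]
    ranked = sorted(range(n_feats), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

A feature that occurs only in insults scores high, while one that occurs equally often in both classes scores zero and is discarded.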

4.4. Classification

This is the final step. We now apply machine learning algorithms to learn a classifier. We use SVM
and Logistic Regression to train our classifiers and then combine the results of both algorithms to obtain a final
classifier. This classifier is then used to decide whether a given string is an insult or not.
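The sketch below illustrates the idea with two small linear models trained by stochastic gradient descent. The report does not say how the two outputs were combined, nor which solver or hyperparameters were used; averaging the squashed scores, the learning rates, the regularization strength and the epoch counts are all assumptions.

```python
import math


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Logistic regression via per-sample gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(dot(w, xi) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def train_linear_svm(X, y, lr=0.1, reg=0.01, epochs=200):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            s = 1 if yi == 1 else -1
            margin = s * (dot(w, xi) + b)
            w = [wj * (1 - lr * reg) for wj in w]  # shrink weights (regularization)
            if margin < 1:                         # inside the margin: push away
                w = [wj + lr * s * xj for wj, xj in zip(w, xi)]
                b += lr * s
    return w, b


def predict_combined(logreg, svm, x):
    """Average the two squashed scores (an assumed combination rule) and threshold."""
    p_lr = sigmoid(dot(logreg[0], x) + logreg[1])
    p_svm = sigmoid(dot(svm[0], x) + svm[1])
    return 1 if (p_lr + p_svm) / 2 >= 0.5 else 0
```

In use, both models would be trained on the selected feature columns, and `predict_combined` would label a new feature vector as insult (1) or non-insult (0).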


5. Results

Features                                   Accuracy (%)   Recall   Precision

Without second person, without negation    86.84          0.60     0.84
With second person, without negation       87.96          0.67     0.86
Without second person, with negation       86.96          0.63     0.86
With second person, with negation          87.71          0.68     0.84
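For reference, the three reported measures can be computed from true and predicted labels as in the sketch below, assuming label 1 means insult. Since around 70% of the dataset consists of non-insults, accuracy alone can look high even when many insults are missed, which is why recall and precision are reported alongside it. The function name is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, recall, precision) with label 1 treated as 'insult'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, recall, precision
```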

No previous results exist for Hindi to compare against; however, our results compare well
with those for insult detection in English. We attribute the high scores mainly to the many-to-one mapping
that occurs when translating from English to Hindi. We extended our dataset with more negative text
strings (non-insults) to check the robustness of our algorithm. The results still hold, which we attribute to the similar and
extreme nature of the Hindi insults.

6. Conclusion and Future Work

In our attempt to detect insulting comments, we have employed a supervised approach based on SVM and
Logistic Regression. We also handled special cases like negation and second person. We have created a good
dataset on which to build our algorithms further.

We aim to increase the size of our dataset in future, and also to look for alternative approaches that might be better
and faster than the current supervised approach. To make use of the large amount of online data available, we
also plan to create a model based on an unsupervised approach. There is also scope to identify insults that have
been tampered with (Fig 5).

7. Acknowledgement

We would like to thank our professor Dr. Amitabha Mukherjee. The project would not have been possible without
his encouragement and diligent efforts. We would also like to acknowledge the help of our TAs Ms. Sunaskhi
Gupta and Mr. Mamidela Seetha Ramaiah.

References

[1] Piyush Arora. Sentiment Analysis For Hindi Language. MS Thesis, IIIT-H, 2013.

[2] Xiang G., Hong J., & Rose, C. P. 2012. Detecting Offensive Tweets via Topical Feature Discovery over a Large
Scale Twitter Corpus. Proceedings of The 21st ACM Conference on Information and Knowledge Management,
Sheraton, Maui Hawaii, Oct. 29 - Nov. 2, 2012.

[3] Amir H. Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin. 2010. Offensive language detection using
multi-level classification. In Proceedings of the 23rd Canadian Conference on Artificial Intelligence, pages
16-27.

[4] D. Das and S. Bandyopadhyay. Labeling emotion in Bengali blog corpus: a fine grained tagging at sentence level.
In Proceedings of the Eighth Workshop on Asian Language Resources, pages 47-55, Beijing, China, August
2010. Coling 2010 Organizing Committee.

[5] Starter Code: http://www.kaggle.com/c/detectinginsults%25E2%2580%2593in-social-commentary/forums

[6] Datasets:

http://www.kaggle.com/c/detecting-insults-in-social-commentary/data

http://khabar.ndtv.com/news/zara-hatke/90-per-cent-indians-are-idiots-justice-katju-357932

http://ek-ziddi-dhun.blogspot.in/2008/08/blog-post_22.html

http://www.bbc.co.uk/hindi/

http://loksangharsha.blogspot.com/

http://www.noswearing.com/

http://www.youswear.com/
