DSF Proposal
Abstract—Humor is a complex and subjective human experience that is challenging to model and quantify. This project proposes a comprehensive approach to analyzing a large corpus of one million jokes sourced from Reddit, utilizing state-of-the-art machine learning algorithms and explainable AI techniques not only to detect and categorize humor but also to understand and predict its effectiveness. The project aims to address four primary tasks: identifying genuine jokes, detecting duplicate jokes, categorizing jokes into various humor types, and predicting the effectiveness of jokes. We plan to employ a variety of models, including deep neural networks such as BERT and GPT-3, enhanced with explainable AI methodologies such as SHAP, LIME, and the visualization of attention weights for greater transparency and interpretability. Metrics such as accuracy, F1 score, Jaccard index, and human-alignment score will be used to evaluate model performance and explanation effectiveness. The insights derived from this project are expected not only to advance the understanding of computational humor but also to enhance the development of AI-driven humor applications, improving user interaction and engagement on digital platforms.

I. BACKGROUND RESEARCH

This project aims to advance the field of humor detection by developing a sophisticated natural language processing system that utilizes state-of-the-art machine learning techniques, including Transformer architectures. Focusing on the analysis of jokes from diverse datasets, the project integrates advanced computational models with explainable AI to enhance both the accuracy and transparency of humor recognition.

Inácio et al. (2023) [5] explore the dynamics of humor recognition within Portuguese texts using a BERT-based classifier, achieving an impressive F1-score of 99.6%. Their research highlights a critical insight: while these models are adept at identifying stylistic cues like punctuation, they often overlook deeper elements of humor such as linguistic incongruity and contextual nuance. This finding underscores a potential area for improvement in current models, which could benefit from an enhanced understanding of the subtleties that define humor.

Peyrard et al. (2021) [4] demonstrate the capabilities of transformer models in distinguishing between humorous and serious sentences. Their work not only confirms the effectiveness of transformers in humor detection but also delves into the mechanics of how these models prioritize and interpret different elements of the text, particularly through the lens of attention mechanisms. This approach provides valuable insight into the specific aspects of text that are most influential in determining humor.

Miraj and Aono (2021) [3] present an innovative technique that combines BERT with other embedding technologies such as Word2Vec and FastText within a neural network ensemble. This method has proven to significantly reduce error rates, illustrating the potential of hybrid models to enhance the accuracy of humor detection systems.

In a similar vein, Weller and Seppi (2019) [1] utilize transformer architectures to assess the humor content of jokes collected from the Reddit platform. Their model leverages community ratings to gauge humor, achieving results that parallel human judgment. This not only showcases the model's efficacy but also its potential application in real-world scenarios where public perception is key.

Hasan et al. (2021) [2] develop a cutting-edge Humor Knowledge Enriched Transformer (HKT) that integrates multimodal data (text, audio, and visual inputs) with external humor knowledge. This model sets new benchmarks in performance across various datasets and offers a comprehensive framework for analyzing humor in diverse communicative settings.

II. DATASET

To ensure a comprehensive approach to humor detection and analysis, this project will utilize multiple datasets sourced from various platforms. Each dataset offers unique characteristics and content types, which are instrumental in training and evaluating the machine learning models effectively.

1. 1 Million Reddit Jokes Dataset [6]: A comprehensive collection of approximately 1 million jokes from the r/Jokes subreddit, used for recognizing diverse humor styles and for initial model training.

2. 200K Humor Detection Dataset [8]: Contains 200,000 labeled instances for binary classification of humor, ideal for refining models and evaluating performance in controlled environments.

3. Puns Dataset [9]: Specializes in pun-based humor with detailed annotations, well suited to developing models that analyze and understand wordplay.

4. Short Jokes Dataset [7]: Features a compact collection of jokes from a Kaggle competition, useful for quick model prototyping and iterative testing.

III. CHALLENGES

The most obvious challenge in a joke-detection mechanism is the absence of a fixed metric to search for. The mechanics of a joke depend heavily on the context of the environment, so a joke has many moving parts that all need to land correctly for it to succeed. The presence of casual, funny-sounding words may signal the onset of a joke in a sentence; however, the roots lie in the sentiment of the sentence being delivered.
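This reliance on sentiment rather than on fixed lexical cues can be illustrated with a toy lexicon-based scorer. The word lists and example sentences below are hypothetical stand-ins, not part of our actual pipeline; a real system would use a trained sentiment model.

```python
# Toy lexicon-based sentiment scorer: illustrates that a joke's setup and
# punchline often carry different sentiment, while no single "funny" word
# marks the text as a joke. Word lists here are illustrative only.
POSITIVE = {"great", "happy", "love", "wonderful", "delicious"}
NEGATIVE = {"hate", "terrible", "burnt", "awful", "divorce"}

def sentiment(text: str) -> int:
    """Return a crude polarity: +1 per positive word, -1 per negative word."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def sentiment_shift(setup: str, punchline: str) -> int:
    """Magnitude of the sentiment reversal between setup and punchline."""
    return abs(sentiment(setup) - sentiment(punchline))

setup = "I love cooking wonderful dinners for my family"
punchline = "they hate the burnt, terrible results"
print(sentiment(setup))                    # 2
print(sentiment(punchline))                # -3
print(sentiment_shift(setup, punchline))   # 5
```

A large setup-to-punchline shift is one crude proxy for the incongruity that sentiment-aware models would need to capture; neither sentence contains an inherently "funny" word.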
Another significant challenge is the explainability of predictions made by the complex machine learning models used in joke detection. Given the subjective nature of humor, it is crucial not only to predict whether something is humorous but also to understand and communicate why a particular text is considered a joke. This is particularly challenging with deep learning models, which are often seen as black boxes.

Fig. 1: System Flowchart

IV. METHODOLOGY

The goal of this project is to create a sophisticated system for humour analysis and detection by combining various machine learning approaches and datasets in a thorough methodology. Through a series of clearly defined steps, the methodology is designed to address the identification, classification, and evaluation of humour, utilising the power of both advanced deep learning networks and conventional machine learning models.

1) Input and Data Collection
We begin by aggregating jokes from various sources, including online platforms like Reddit. This stage involves intensive data cleaning to remove duplicates, fix formatting errors, and handle missing values, ensuring high-quality data for analysis.

2) Data Processing and Analysis
Post-cleaning, the data is processed through two main pathways: identifying genuine jokes using NLP techniques to filter out non-humorous content, and detecting duplicates using feature extraction methods like TF-IDF and Word2Vec. These methods enable the clustering of similar jokes, enhancing the dataset's utility for precise humor analysis.

3) Model Development
Utilizing Transformer architectures, we develop models fine-tuned for humor detection, employing strategies such as ensemble methods for optimal performance. Rigorous testing and validation ensure that the models effectively handle diverse humor types and accurately reflect nuanced humor distinctions.

4) Explainability and Visualization
A significant emphasis is placed on making the models explainable through the visualization of attention mechanisms and the use of SHAP and LIME. These tools help illuminate how specific text elements influence humor classification, making the models' decision-making processes transparent and understandable to end-users.

V. PRELIMINARY WORK

We conduct Exploratory Data Analysis on four of the above datasets and present our findings as follows:

A. Length of Jokes
The first and simplest metric we experiment with is the length of the joke. Is there any correlation between how long a statement runs and how funny it sounds? Indeed there is: an ideal joke is neither too short nor too long, averaging about 15 words (roughly 75 characters). Naturally, the distribution varies between extremely short puns and relatively long story-based jokes.

Fig. 2: Distribution of Joke Lengths for Puns

Fig. 3: Distribution of Joke Lengths for Storytelling-based Jokes

B. Commonly Occurring Words
Next, we map the most commonly occurring words as a word cloud to check whether any words appear frequently across different jokes. Surprisingly, no words stand out as 'funny' or universal to all jokes, suggesting that jokes vary in style, dialogue, and delivery from one another. Only function words, such as articles, common verbs, and markers of active or passive voice, top this statistic.

C. Sentiment Analysis
Lastly, we apply sentiment analysis to all the datasets in search of a more context-based technique for detecting a joke.

Fig. 5: Sentiment analysis on Jokes
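The length analysis described above can be sketched as follows. The three jokes are illustrative placeholders rather than samples from the actual datasets; in practice the same statistics would be computed over the full corpora (e.g. with pandas).

```python
# Minimal sketch of the joke-length EDA: compute word- and character-count
# statistics over a corpus of jokes. The jokes below are illustrative
# placeholders, not drawn from the datasets used in this project.
from statistics import mean, median

jokes = [
    "Why did the chicken cross the road? To get to the other side.",
    "I told my wife she was drawing her eyebrows too high. She looked surprised.",
    "What do you call a fish with no eyes? A fsh.",
]

word_counts = [len(joke.split()) for joke in jokes]
char_counts = [len(joke) for joke in jokes]

print("word counts:", word_counts)            # [13, 14, 11]
print("median words:", median(word_counts))   # 13
print("mean chars:", round(mean(char_counts), 1))
```

Over the full datasets, this average comes out to roughly 15 words / 75 characters, with puns clustering at the short end and storytelling-based jokes forming a long tail.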