DSF Proposal
Abstract—Humor is a complex and subjective human experience that is challenging to model and quantify. This project proposes a comprehensive approach to analyzing a large corpus of one million jokes sourced from Reddit, utilizing state-of-the-art machine learning algorithms and explainable AI techniques not only to detect and categorize humor but also to understand and predict its effectiveness. The project aims to address four primary tasks: identifying genuine jokes, detecting duplicate jokes, categorizing jokes into various humor types, and predicting the effectiveness of jokes. We plan to employ a variety of models, including deep neural networks such as BERT and GPT-3, enhanced with explainable AI methodologies such as SHAP, LIME, and the visualization of attention weights for greater transparency and interpretability. Metrics such as accuracy, F1 score, Jaccard index, and human-alignment score will be used to evaluate model performance and explanation effectiveness. The insights derived from this project are expected not only to advance the understanding of computational humor but also to enhance the development of AI-driven humor applications, improving user interaction and engagement on digital platforms.

I. BACKGROUND RESEARCH

This project aims to advance the field of humor detection by developing a sophisticated natural language processing system that utilizes state-of-the-art machine learning techniques, including Transformer architectures. Focusing on the analysis of jokes from diverse datasets, the project integrates advanced computational models with explainable AI to enhance both the accuracy and transparency of humor recognition.

Inácio et al. (2023) [5] explore the dynamics of humor recognition within Portuguese texts using a BERT-based classifier, achieving an impressive F1-score of 99.6%. Their research highlights a critical insight: while these models are adept at identifying stylistic cues like punctuation, they often overlook deeper elements of humor such as linguistic incongruity and contextual nuance. This finding underscores a potential area for improvement in current models, which could benefit from an enhanced understanding of the subtleties that define humor.

Peyrard et al. (2021) [4] demonstrate the capabilities of transformer models in distinguishing between humorous and serious sentences. Their work not only confirms the effectiveness of transformers in humor detection but also delves into the mechanics of how these models prioritize and interpret different elements of the text, particularly through the lens of attention mechanisms. This approach provides valuable insight into the specific aspects of text that are most influential in determining humor.

Miraj and Aono (2021) [3] present an innovative technique that combines BERT with other embedding technologies such as Word2Vec and FastText within a neural network ensemble. This method has proven to significantly reduce error rates, illustrating the potential of hybrid models to enhance the accuracy of humor detection systems.

In a similar vein, Weller and Seppi (2019) [1] utilize transformer architectures to assess the humor content of jokes collected from the Reddit platform. Their model leverages community ratings to gauge humor, achieving results that parallel human judgment. This not only showcases the model's efficacy but also its potential application in real-world scenarios where public perception is key.

Hasan et al. (2021) [2] develop a cutting-edge Humor Knowledge Enriched Transformer (HKT) that integrates multimodal data (text, audio, and visual inputs) with external humor knowledge. This model sets new benchmarks in performance across various datasets and offers a comprehensive framework for analyzing humor in diverse communicative settings.

II. DATASET

To ensure a comprehensive approach to humor detection and analysis, this project will utilize multiple datasets sourced from various platforms. Each dataset offers unique characteristics and content types, which are instrumental in training and evaluating the machine learning models effectively.

1. 1 Million Reddit Jokes Dataset [6]: A comprehensive collection of approximately 1 million jokes from the r/Jokes subreddit, used for recognizing diverse humor styles and for initial model training.

2. 200K Humor Detection Dataset [8]: Contains 200,000 labeled instances for binary classification of humor, ideal for refining models and evaluating performance in controlled environments.

3. Puns Dataset [9]: Specializes in pun-based humor with detailed annotations, well suited to developing models that analyze and understand wordplay.

4. Short Jokes Dataset [7]: Features a compact collection of jokes from a Kaggle competition, useful for quick model prototyping and iterative testing.

III. CHALLENGES

The most obvious challenge in a joke-detection mechanism is the absence of a fixed metric to search for. The mechanics of a joke depend heavily on the context of the environment, so a joke has many moving parts that all need to land correctly for it to succeed. The presence of casual, funny-sounding words may signal the onset of a joke in a sentence; however, the roots lie in the sentiment of the sentence being delivered.
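This reliance on sentiment rather than on fixed lexical cues can be illustrated with a toy lexicon-based scorer. The word lists and example sentences below are hypothetical stand-ins, not part of our actual pipeline; a real system would use a trained sentiment model.

```python
# Toy lexicon-based sentiment scorer: illustrates that a joke's setup and
# punchline often carry different sentiment, while no single "funny" word
# marks the text as a joke. Word lists here are illustrative only.
POSITIVE = {"great", "happy", "love", "wonderful", "delicious"}
NEGATIVE = {"hate", "terrible", "burnt", "awful", "divorce"}

def sentiment(text: str) -> int:
    """Return a crude polarity: +1 per positive word, -1 per negative word."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def sentiment_shift(setup: str, punchline: str) -> int:
    """Magnitude of the sentiment reversal between setup and punchline."""
    return abs(sentiment(setup) - sentiment(punchline))

setup = "I love cooking wonderful dinners for my family"
punchline = "they hate the burnt, terrible results"
print(sentiment(setup))                    # 2
print(sentiment(punchline))                # -3
print(sentiment_shift(setup, punchline))   # 5
```

A large setup-to-punchline shift is one crude proxy for the incongruity that sentiment-aware models would need to capture; neither sentence contains an inherently "funny" word.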
Another significant challenge is the explainability of predictions made by the complex machine learning models used in joke detection. Given the subjective nature of humor, it is crucial not only to predict whether something is humorous but also to understand and communicate why a particular text is considered a joke. This is particularly challenging with deep learning models, which are often seen as black boxes.

Fig. 1: System Flowchart

IV. METHODOLOGY

The goal of this project is to create a sophisticated system for humour analysis and detection by combining various machine learning approaches and datasets in a thorough methodology. Through a series of clearly defined steps, the methodology is designed to address the identification, classification, and evaluation of humour, utilising the power of both advanced deep learning networks and conventional machine learning models.

1) Input and Data Collection
We begin by aggregating jokes from various sources, including online platforms like Reddit. This stage involves intensive data cleaning to remove duplicates, fix formatting errors, and handle missing values, ensuring high-quality data for analysis.

2) Data Processing and Analysis
Post-cleaning, the data is processed through two main pathways: identifying genuine jokes using NLP techniques to filter out non-humorous content, and detecting duplicates using feature extraction methods like TF-IDF and Word2Vec. These methods enable the clustering of similar jokes, enhancing the dataset's utility for precise humor analysis.

3) Model Development
Utilizing Transformer architectures, we develop models fine-tuned for humor detection, employing strategies such as ensemble methods for optimal performance. Rigorous testing and validation ensure that the models effectively handle diverse humor types and accurately reflect nuanced humor distinctions.

4) Explainability and Visualization
A significant emphasis is placed on making the models explainable through the visualization of attention mechanisms and the use of SHAP and LIME. These tools help illuminate how specific text elements influence humor classification, making the models' decision-making processes transparent and understandable to end-users.

V. PRELIMINARY WORK

We conduct Exploratory Data Analysis on four of the above datasets and present our findings as follows:

A. Length of Jokes
The first and simplest metric we experiment with is the length of the joke. Is there any correlation between how long a statement runs and how funny it sounds? Indeed there is: an ideal joke is neither too short nor too long, averaging about 15 words (roughly 75 characters). Naturally, the distribution varies between extremely short puns and relatively long story-based jokes.

Fig. 2: Distribution of Joke Lengths for Puns

Fig. 3: Distribution of Joke Lengths for Storytelling-based Jokes

B. Commonly Occurring Words
Next, we map the most commonly occurring words as a word cloud to check whether any words appear frequently across different jokes. Surprisingly, no words stand out as 'funny' or universal to all jokes, suggesting that jokes vary in style, dialogue, and delivery from one another. Only function words, such as articles, common verbs, and markers of active or passive voice, top this statistic.

C. Sentiment Analysis
Lastly, we apply sentiment analysis to all the datasets in search of a more context-based technique for detecting a joke.

Fig. 5: Sentiment analysis on Jokes
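The length analysis described above can be sketched as follows. The three jokes are illustrative placeholders rather than samples from the actual datasets; in practice the same statistics would be computed over the full corpora (e.g. with pandas).

```python
# Minimal sketch of the joke-length EDA: compute word- and character-count
# statistics over a corpus of jokes. The jokes below are illustrative
# placeholders, not drawn from the datasets used in this project.
from statistics import mean, median

jokes = [
    "Why did the chicken cross the road? To get to the other side.",
    "I told my wife she was drawing her eyebrows too high. She looked surprised.",
    "What do you call a fish with no eyes? A fsh.",
]

word_counts = [len(joke.split()) for joke in jokes]
char_counts = [len(joke) for joke in jokes]

print("word counts:", word_counts)            # [13, 14, 11]
print("median words:", median(word_counts))   # 13
print("mean chars:", round(mean(char_counts), 1))
```

Over the full datasets, this average comes out to roughly 15 words / 75 characters, with puns clustering at the short end and storytelling-based jokes forming a long tail.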