Gaurav 2101EE31
USING MACHINE LEARNING
By: Gaurav Kumar Gupta (2101EE31)
DATA SOURCE
Two public datasets were combined to obtain 35,000+ samples.
The sources were chosen to complement each other: emotions
with few rows in one dataset are better covered by the other.
KEY INNOVATION
• Custom emotion mapping to unify similar labels.
• Handling emojis and punctuation.
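A minimal sketch of what a custom emotion mapping could look like when merging two datasets. The raw label names and the mapping itself are illustrative assumptions; the actual mapping used in this work is not shown on the slide.

```python
# Hypothetical mapping from raw dataset labels to a unified label set.
# These label names are assumptions for illustration only.
EMOTION_MAP = {
    "joy": "happy",
    "happiness": "happy",
    "sadness": "sad",
    "grief": "sad",
    "anger": "anger",
    "rage": "anger",
    "love": "love",
    "neutral": "neutral",
}


def unify_label(raw_label):
    """Return the unified label, or None if the raw label is not mapped."""
    return EMOTION_MAP.get(raw_label.strip().lower())


def unify_dataset(rows):
    """Map (text, raw_label) pairs to unified labels, dropping unmapped rows."""
    unified = []
    for text, raw_label in rows:
        label = unify_label(raw_label)
        if label is not None:
            unified.append((text, label))
    return unified
```

Applying the same mapping to both sources before concatenating them keeps the label set consistent across the combined data.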
APPROACH
Preprocessing
• Converting emojis/emoticons to text.
• Removing punctuation and repetitions of
special characters.
• Tokenization
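The three preprocessing steps above can be sketched as follows. The emoji lookup table is a tiny hypothetical example, and whitespace splitting stands in for whichever tokenizer was actually used.

```python
import re
import string

# Tiny illustrative lookup (an assumption); a real pipeline needs a larger table.
EMOJI_TO_TEXT = {
    ":)": "smile",
    ":(": "frown",
    "😊": "smile",
    "😢": "cry",
}


def preprocess(text):
    """Convert emojis to words, clean punctuation, and tokenize."""
    # 1. Replace known emojis/emoticons with words. This must happen before
    #    punctuation removal, or emoticons such as ":)" would be destroyed.
    for symbol, word in EMOJI_TO_TEXT.items():
        text = text.replace(symbol, f" {word} ")
    # 2. Collapse runs of a repeated special character ("!!!" becomes "!").
    text = re.sub(r"([^\w\s])\1+", r"\1", text)
    # 3. Replace remaining ASCII punctuation with spaces.
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)
    # 4. Tokenize on whitespace.
    return text.split()
```

For example, `preprocess("I am so happy!!! :)")` yields the tokens `I`, `am`, `so`, `happy`, `smile`.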
Feature Engineering
• TF-IDF with n-grams (unigrams + bigrams)
• Dimensionality control (at most 10,000
features)
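As a rough illustration, these two settings correspond to the `ngram_range` and `max_features` parameters of scikit-learn's `TfidfVectorizer`. The slide does not name the library used for this step, so scikit-learn is an assumption here, and the sample documents are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up sample documents for illustration.
docs = [
    "i am so happy today",
    "i am very angry",
    "so happy and so loved",
]

# Unigrams + bigrams, keeping at most the 10,000 most frequent terms.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=10000)
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# vocabulary_ maps each unigram/bigram to its column index in X.
has_bigram = "so happy" in vectorizer.vocabulary_
```

Capping `max_features` keeps the feature matrix a manageable size even though adding bigrams greatly increases the number of candidate terms.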
Model Selection
• Linear SVM (SGDClassifier)
• Optimized for F1-score (handles class imbalance)
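One way this setup could look in scikit-learn: `SGDClassifier` with hinge loss trains a linear SVM, and a grid search scored on macro F1 selects hyperparameters. The toy data, the `alpha` grid, `class_weight="balanced"`, and the 2-fold split are all assumptions for illustration, not the configuration actually used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data for illustration only.
texts = [
    "i am so happy", "what a happy day", "happy and glad", "feeling happy now",
    "i am so angry", "this makes me angry", "angry and furious", "so angry now",
]
labels = ["happy"] * 4 + ["anger"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=10000)),
    # loss="hinge" makes SGDClassifier a linear SVM trained with SGD.
    # class_weight="balanced" is an assumed way to counter class imbalance.
    ("clf", SGDClassifier(loss="hinge", class_weight="balanced", random_state=0)),
])

# Choose the regularization strength by macro-averaged F1 rather than accuracy.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__alpha": [1e-4, 1e-3]},
    scoring="f1_macro",
    cv=2,
)
search.fit(texts, labels)
predictions = search.predict(["so happy today", "very angry"])
```

Scoring on macro F1 weights every emotion equally, so a model cannot score well just by favoring the most frequent classes.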
RESULTS
Model Evaluation
• Accuracy: 67.3%
• F1-Score: 66.7% (macro avg)
• Best Class: Anger (F1=0.72)
• Challenging Class: Love (Recall=0.51)
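For reference, the sketch below shows how per-class recall, per-class F1, and the macro average are computed. It is a plain-Python illustration on made-up labels and does not reproduce the reported numbers.

```python
def per_class_scores(y_true, y_pred):
    """Compute precision, recall, and F1 for every label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_scores(y_true, y_pred)
    return sum(s["f1"] for s in scores.values()) / len(scores)


# Made-up example: one "love" sample is misclassified as "anger".
y_true = ["anger", "anger", "love", "love"]
y_pred = ["anger", "anger", "anger", "love"]
scores = per_class_scores(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

In this toy example the recall for "love" is 0.5 because half of its samples are missed, which is the kind of gap the "Challenging Class" line points to.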
Highlights
• Neutral emotion is confused with Happy.
• Strong Separation for Anger vs Sad.
• Accuracy of 75% can be achieved by
dropping neutral from the dataset.
THANK YOU
ANY QUESTIONS?