
SMU Classification: Restricted

ISSS609
Project Proposal

“Analyzing Beauty: Insights into Sephora Feedback via Sentiment Analysis”
Group 7
Alvin LIM Li Xian
Ayushi SHAKYA
Debanjan DATTA
Junzhe Huang
Neha GOYAL
NGUYEN Thuy Hanh Duyen

29 Sep 2024

1. Introduction
With social media platforms and e-commerce websites highly prevalent across consumer segments,
businesses now have access to a wealth of user-generated content that offers valuable insight into
customer sentiment. This project aims to develop a sentiment analysis tool that categorizes and
processes the sentiment behind user reviews, helping Sephora, a popular beauty e-commerce
platform, understand how its brands and products are perceived. The tool will enable businesses
to respond more effectively to customer feedback by classifying the sentiment of reviews and
comments as positive, negative, or neutral.
Project Components:

• Data Sources: The dataset (downloaded from Kaggle) was collected with a Python scraper
from the Sephora US website (March 2023) and contains two data tables, “Products” and
“review” (see Appendix I).
• Challenges: User review content often includes informal language, slang, and emojis,
making natural language processing (NLP) essential for cleaning and normalizing the data.
The model will be trained on English text only, so it will not be able to analyze
mixed-language posts and comments.
• Machine Learning Models: A mix of traditional machine learning techniques and deep
learning architectures will be applied for sentiment classification.
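The proposal does not state how ground-truth sentiment labels will be obtained. One common option, assumed here rather than taken from the proposal, is to derive labels from each review's star rating. The minimal sketch below assumes every review carries a 1-5 rating; the thresholds (4-5 positive, 3 neutral, 1-2 negative) and the helper name `rating_to_sentiment` are hypothetical.

```python
from collections import Counter
from typing import Optional

def rating_to_sentiment(rating: int, three_class: bool = True) -> Optional[str]:
    """Map a 1-5 star rating to a sentiment label (hypothetical thresholds).

    4-5 stars -> "positive", 1-2 stars -> "negative".
    3 stars -> "neutral" in the three-class setting; in the binary setting
    3-star reviews are treated as ambiguous and dropped (returns None).
    """
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral" if three_class else None

# Hypothetical ratings, not taken from the Sephora dataset.
ratings = [5, 4, 3, 1, 2, 5]
three_class_labels = [rating_to_sentiment(r) for r in ratings]
binary_labels = [lab for r in ratings
                 if (lab := rating_to_sentiment(r, three_class=False)) is not None]
print(Counter(three_class_labels))
print(Counter(binary_labels))
```

Dropping 3-star reviews in the binary setting is only one design choice; keeping them and assigning them to a class would change the class balance the models see.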
Benefits for different business functions include:

• Marketing teams can leverage consumer sentiment insights to fine-tune campaigns and
better target audiences
• Customer service teams can respond quickly to negative sentiments, addressing concerns
• Product development teams can gather feedback on product features and identify areas
for improvement and new value propositions
By providing insights into consumer opinions, this tool allows businesses to track and adjust their
strategies, addressing customer concerns and improving overall product satisfaction. Staying
attuned to consumer sentiment can help businesses remain responsive, customer-centric, and
competitive.

2. Proposed Methodology
Analyzing user reviews poses various challenges for sentiment analysis, largely because of the
subtle language nuances in the data. Ambiguity, sarcasm, and irony are difficult to detect because
they depend heavily on context that models may not capture explicitly. Traditional models in
particular often struggle with contextual understanding, leading to misclassified sentiment. Data
quality issues, such as comments containing noise, slang, abbreviations, and emojis, can also
affect model performance. Additionally, comments may be written in languages other than English,
which the models might not support. Finally, training Large Language Models (LLMs) like BERT [2]
and GPT [3] requires significant computational resources.

This project aims to develop and compare sentiment analysis models using traditional machine
learning techniques, deep learning architectures, and Large Language Models (LLMs). The
methodology is structured into three core steps: data collection and preprocessing, model
development, and comparison and evaluation. Figure 1 shows the flowchart for the proposed
methodology [1]:

Figure 1: Flowchart for sentiment analysis

Data Collection and Preprocessing

We will use datasets of user reviews and product information. Preprocessing will clean the text by
removing or normalizing noise such as punctuation, stop words, emojis, slang, and abbreviations,
normalizing words (stemming or lemmatization), and tokenizing the text. After preprocessing,
methods like word embeddings will be used to capture semantic relationships. The cleaned dataset
will then be split 80-20 into training data and held-out data, with the held-out portion divided into
validation and test sets as described in Section 4.
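One simple way to turn word embeddings into a fixed-length feature vector for a whole review, assumed here as an illustration, is to average the vectors of its tokens. In the sketch below, the tiny embedding table is hypothetical and stands in for pre-trained Word2Vec or GloVe vectors; reviews with similar wording should end up with a higher cosine similarity.

```python
import math
from typing import Dict, List

# Toy 3-dimensional vectors standing in for pre-trained Word2Vec or GloVe
# embeddings (real pre-trained vectors usually have 50-300 dimensions).
EMBEDDINGS: Dict[str, List[float]] = {
    "love":  [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "hate":  [-0.9, 0.1, 0.0],
    "awful": [-0.8, 0.0, 0.2],
    "serum": [0.0, 0.9, 0.3],
}
DIM = 3

def embed_review(tokens: List[str]) -> List[float]:
    """Average the vectors of in-vocabulary tokens; out-of-vocabulary tokens are skipped.

    Returns a zero vector when no token is in the vocabulary.
    """
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(column) / len(known) for column in zip(*known)]

def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

positive = embed_review(["love", "this", "serum"])
negative = embed_review(["hate", "this", "serum"])
similar = embed_review(["great", "serum"])
print(cosine_similarity(positive, similar), cosine_similarity(positive, negative))
```

Averaging discards word order, which is one reason the proposal also considers CNNs and transformers.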

Model Development
To demonstrate different approaches to sentiment classification, both traditional machine
learning models and deep learning models, including LLMs, will be implemented. Feature
extraction techniques will transform text data into numerical representations for the traditional
machine learning algorithms, allowing us to assess how engineered textual features affect model
performance. Deep learning models, including CNNs and transformers such as BERT and GPT,
will be used for their ability to capture complex contextual relationships. Comparing these
methods will reveal the trade-offs between computational efficiency and the ability to capture
nuanced sentiment.

Comparison & Evaluation:

The models will be evaluated using a standard set of performance metrics including accuracy,
precision, recall, F1-score, and confusion matrix analysis. The goal is to identify the strengths and
weaknesses of each approach based on their performance on the same test data.
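For reference, the metrics listed above can all be computed from a confusion matrix. The sketch below is written from scratch to make the definitions explicit; the label order and the toy predictions are hypothetical, and macro-averaging F1 across classes is one choice among several (weighted averaging is another). In practice, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` compute the same quantities.

```python
from typing import Dict, List, Sequence, Tuple

LABELS = ["negative", "neutral", "positive"]

def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: List[str]) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_scores(matrix: List[List[int]],
                     labels: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall and F1 per class (0.0 where a ratio is undefined)."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as this label, but wrong
        fn = sum(matrix[i]) - tp                 # truly this label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores

def accuracy(matrix: List[List[int]]) -> float:
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Hypothetical predictions for six test reviews.
y_true = ["positive", "positive", "neutral", "negative", "negative", "positive"]
y_pred = ["positive", "neutral", "neutral", "negative", "positive", "positive"]
cm = confusion_matrix(y_true, y_pred, LABELS)
scores = per_class_scores(cm, LABELS)
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(LABELS)
print(cm, round(accuracy(cm), 3), round(macro_f1, 3))
```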

3. Solution Details

Data Pre-processing
Data cleaning and preprocessing steps such as stop-word removal, stemming, lemmatization, and
tokenization will be applied before the data is used for model training.

Classic Machine Learning Models

Classic machine learning models are simple, lightweight models that can serve as baselines for
assessing the performance of more sophisticated models.

Feature Extraction:

Classic machine learning models require numerical inputs rather than raw text, so we will explore
multiple methods for converting text data into numerical representations:

a. TF-IDF (Term Frequency-Inverse Document Frequency): A statistical measure to
evaluate how important a word is within a document relative to the corpus.
b. Word Embeddings: Using pre-trained word embeddings such as Word2Vec or
GloVe to convert words into vectors that capture semantic meaning.
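A from-scratch sketch of TF-IDF as described in item a, using the textbook definitions tf = term count / document length and idf = ln(N / df), where N is the number of documents and df the number of documents containing the term. The tokenized reviews are hypothetical. Library implementations such as scikit-learn's `TfidfVectorizer` use smoothed and normalized variants by default, so their absolute values will differ.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = count of term in doc / number of tokens in doc
    idf = ln(N / df), where df is the number of docs containing the term
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized reviews (already preprocessed).
reviews = [
    ["serum", "love", "love"],
    ["serum", "break", "out"],
    ["moisturizer", "love"],
]
for w in tf_idf(reviews):
    print(w)
```

Under this definition a term that appears in every review gets a weight of zero, which is one reason library versions add smoothing.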

Model Selection:

Logistic Regression (LR): LR is a simple classification model that we will use as a baseline.
The LR model will be trained on the preprocessed text data to classify the sentiment of each text.
We will explore (1) a binary classifier that labels reviews as positive or negative, and (2) a
multiclass classifier that labels reviews as positive, neutral, or negative.
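To illustrate what the LR baseline learns, here is a minimal binary logistic regression trained by batch gradient descent on the log loss. The two-feature inputs (hypothetical counts of positive and negative cue words) stand in for TF-IDF or embedding features, and the learning rate and epoch count are illustrative assumptions; the project itself would more likely use scikit-learn's `LogisticRegression`.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X: List[List[float]], y: List[int],
                              lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent; y holds 0 (negative) or 1 (positive)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical features: [positive cue words, negative cue words].
X = [[2, 0], [1, 0], [3, 0], [0, 2], [0, 1], [0, 3]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, [2, 0]), 3), round(predict_proba(w, b, [0, 2]), 3))
```

For the three-class setting, the same model generalizes to multinomial (softmax) regression or can be wrapped in a one-vs-rest scheme; scikit-learn's `LogisticRegression` accepts multiclass targets directly.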

Support Vector Machine (SVM): SVM is a more powerful classifier that works well with
high-dimensional data, such as the large, sparse feature vectors that TF-IDF produces over a big
vocabulary. Similar to LR, the SVM model will be trained on the preprocessed text data to classify
the sentiment of the text as positive or negative. In addition, we will explore the One-vs-Rest (also
called One-vs-All) approach to perform 3-class classification with SVM.
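The One-vs-Rest scheme trains one binary SVM per class (that class against all others) and predicts the class whose classifier scores highest. The sketch below is a simplified, from-scratch illustration: each binary SVM uses Pegasos-style subgradient updates on the L2-regularized hinge loss with a fixed visiting order, and the regularization strength, epoch count, and 2-D features are all hypothetical. In practice, scikit-learn's `LinearSVC`, which uses one-vs-rest for multiclass problems by default, would be the natural choice.

```python
from typing import Dict, List, Sequence

def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 500) -> List[float]:
    """Linear SVM via Pegasos-style subgradient steps on the L2-regularized hinge loss.

    y holds +1 / -1. A constant 1.0 feature is appended so the bias is learned
    as the last weight (for simplicity it is regularized along with the others).
    Examples are visited in a fixed order rather than sampled at random.
    """
    X_aug = [list(xi) + [1.0] for xi in X]
    w = [0.0] * len(X_aug[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X_aug, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrinkage
            if margin < 1.0:  # hinge loss active: move towards the correct side
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def score(w: List[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def train_one_vs_rest(X, labels, classes) -> Dict[str, List[float]]:
    """Train one binary SVM per class: that class (+1) against all others (-1)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels])
            for c in classes}

def predict(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Pick the class whose binary classifier gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))

# Hypothetical 2-D features (e.g. positive-cue and negative-cue scores).
X = [[3, 0], [4, 1], [0, 3], [1, 4], [1, 1], [0.5, 0.5]]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
models = train_one_vs_rest(X, labels, ["negative", "neutral", "positive"])
print([predict(models, x) for x in ([4, 0], [0, 4], [0.5, 0.5])])
```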

Deep Learning Model

Deep learning models use multi-layer artificial neural networks to learn underlying patterns from
large amounts of data. They can handle more complex data than classic machine learning models
and are therefore less reliant on data preprocessing.

Model Selection:

Convolutional Neural Networks (CNN): CNNs are traditionally used for image recognition tasks,
but a 1-dimensional CNN can be applied to text for sentiment analysis. The CNN model will be
trained on the tokenized data to perform 3-class sentiment classification.
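A 1-dimensional CNN slides filters along the sequence of token embeddings, so each filter acts like a learned n-gram detector. The forward pass below is a minimal sketch with random, untrained weights and hypothetical sizes (4-dimensional embeddings, two filters of width 3, max-over-time pooling); actual training would be done with TensorFlow or PyTorch (for example `torch.nn.Conv1d`), as planned in Section 4.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

def conv1d_relu(seq: Matrix, kernel: Matrix, bias: float) -> List[float]:
    """Slide a (width x dim) kernel along a (length x dim) sequence ('valid' padding), then ReLU."""
    width = len(kernel)
    feature_map = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        total = sum(k * x for k_row, x_row in zip(kernel, window)
                    for k, x in zip(k_row, x_row))
        feature_map.append(max(0.0, total + bias))
    return feature_map

def softmax(logits: List[float]) -> List[float]:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cnn_forward(seq, kernels, biases, dense_w, dense_b) -> List[float]:
    # Max-over-time pooling: each filter contributes its strongest activation.
    pooled = [max(conv1d_relu(seq, k, b)) for k, b in zip(kernels, biases)]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]
    return softmax(logits)

# Random, untrained weights: this shows the data flow only, not a trained model.
rng = random.Random(0)
EMB_DIM, SEQ_LEN, N_FILTERS, WIDTH, N_CLASSES = 4, 6, 2, 3, 3

def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

seq = rand_matrix(SEQ_LEN, EMB_DIM)  # one embedded review of 6 tokens
kernels = [rand_matrix(WIDTH, EMB_DIM) for _ in range(N_FILTERS)]
biases = [0.0] * N_FILTERS
dense_w = rand_matrix(N_CLASSES, N_FILTERS)
dense_b = [0.0] * N_CLASSES
probs = cnn_forward(seq, kernels, biases, dense_w, dense_b)
print(dict(zip(["negative", "neutral", "positive"], probs)))
```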

Transformers: Transformers are neural networks that use self-attention mechanisms to process
text, allowing them to capture the context of each word within a sentence. They are among the
state-of-the-art approaches for sentiment analysis.

We will explore various transformer models, such as BERT (Bidirectional Encoder
Representations from Transformers) [2] and GPT (Generative Pre-trained Transformer) [3], to
perform sentiment analysis and compare their performance.
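Self-attention lets every token weigh every other token when building its representation. The sketch below shows single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python. The token vectors and identity projection matrices are hypothetical, and real BERT or GPT layers add multiple heads, learned projections, feed-forward sublayers, and (in GPT) a causal mask; the project would fine-tune pre-trained models rather than implement attention by hand.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        top = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)  # row i: how much token i attends to each token
    return matmul(weights, v), weights

# Three hypothetical 2-D token vectors and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```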

| Model | Advantages | Limitations |
|---|---|---|
| LR | Lightweight and fast; predictions are easy to interpret and explain | Not able to fully capture context; performance is highly dependent on data preprocessing and feature extraction |
| SVM | Needs more computational resources than LR but less than deep learning models; performs better than LR on complex data such as social media posts | Not able to fully capture context; performance is highly dependent on data preprocessing and feature extraction |
| CNN | Better at capturing context than classic machine learning models; requires less data preprocessing | Requires more computational resources for training |
| Transformers | Expected to have the best performance; able to process complex data and highly capable of understanding textual context | Most computationally intensive to train and fine-tune |

4. Proposed Experiments

For the sentiment analysis experiment, we plan to create a systematic process for data
preparation, model training, and evaluation.
Our first step will be text preprocessing, beginning with tokenization to break the text down into
individual words or tokens. Next, we will remove noise such as punctuation, emojis, and special
characters. Stop-word removal will follow, eliminating common words that do not add significant
meaning to the text. Finally, we will apply stemming or lemmatization to normalize words to their
root forms, ensuring consistency across the dataset. This process is critical for reducing noise and
improving the overall performance of the sentiment analysis models, as detailed in our
methodology.
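The four preprocessing steps above can be sketched in the stated order as follows. The stop-word list and suffix rules are hypothetical simplifications; the project plans to use NLTK and spaCy, where tools such as NLTK's `PorterStemmer` and `WordNetLemmatizer` or spaCy's token `lemma_` attribute would replace the naive stemmer used here. The character filter keeps only ASCII letters, digits, and apostrophes, which fits the English-only scope.

```python
import re
from typing import List

# Small illustrative stop-word list (hypothetical). Negations such as "not"
# are deliberately kept, since dropping them can flip a review's sentiment.
STOP_WORDS = {"a", "an", "and", "at", "i", "is", "it", "it's", "my",
              "of", "the", "this", "that", "to", "was"}

def tokenize(text: str) -> List[str]:
    """Step 1: lowercase and split on whitespace."""
    return text.lower().split()

def remove_noise(tokens: List[str]) -> List[str]:
    """Step 2: strip punctuation, emojis and other special characters; drop empty tokens."""
    cleaned = []
    for tok in tokens:
        tok = re.sub(r"[^a-z0-9']", "", tok).strip("'")
        if tok:
            cleaned.append(tok)
    return cleaned

def remove_stop_words(tokens: List[str]) -> List[str]:
    """Step 3: drop common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token: str) -> str:
    """Step 4: crude suffix stripping, a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def preprocess(text: str) -> List[str]:
    return [naive_stem(t) for t in remove_stop_words(remove_noise(tokenize(text)))]

print(preprocess("I LOVE this serum!!! 😍 It's not greasy and absorbs quickly."))
```

Standard English stop-word lists, including NLTK's, contain negations such as "not" and "no", so it may be worth customizing the list for sentiment analysis rather than using it unchanged.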
Next, we will partition the data into three sets: 80% for training, 10% for validation, and 10% for
testing. The validation set will be used to tune hyperparameters and select models, and the test
set to evaluate how well each model generalizes.
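A minimal sketch of the 80/10/10 partition. Stratifying by label, so that each split keeps the same class proportions, is an assumption rather than something the proposal specifies, but it matters if the label distribution is imbalanced. The label counts below are hypothetical. With scikit-learn, a similar result can be obtained by calling `train_test_split` twice with its `stratify` argument.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def stratified_split(labels: Sequence[str], train_frac: float = 0.8,
                     val_frac: float = 0.1,
                     seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Split row indices into train/validation/test, preserving class proportions.

    Whatever remains after the train and validation shares goes to the test set.
    """
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train_frac * len(indices))
        n_val = round(val_frac * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test

# Hypothetical, imbalanced label list for 100 reviews.
labels = ["positive"] * 70 + ["neutral"] * 10 + ["negative"] * 20
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```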
For the experiment, we will use Python, leveraging key libraries such as Pandas for data handling,
NLTK and spaCy for text preprocessing, and Scikit-learn for traditional machine learning models.
Advanced deep learning models will be implemented using frameworks like TensorFlow and
PyTorch, ensuring efficient model training on GPU resources available through Google Colab.
Our approach involves running two phases of experiments:
1. Phase 1: We will start with traditional machine learning models, such as Logistic
Regression (LR) and Support Vector Machine (SVM). These will be used as baselines to
classify sentiments using TF-IDF and Word2Vec for feature extraction.
2. Phase 2: We will then advance to more sophisticated models like Convolutional Neural
Networks (CNN) and transformers (such as BERT and GPT-3) to capture complex
relationships between words and improve contextual understanding. Fine-tuning will be
conducted for the transformers to optimize performance.
Evaluation metrics will include accuracy, precision, recall, and F1-score to compare the
performance of traditional models against deep learning models.
This step-by-step approach allows us to assess the efficiency and accuracy of various models,
ultimately identifying the most effective method for sentiment analysis of user reviews.

5. Project Schedule and Work Division

• Weeks 2-3: Literature review and dataset selection
Responsible: Neha Goyal, Ayushi Shakya
• Week 4: Data cleaning, feature extraction, and data partitioning
Responsible: Debanjan Datta
• Weeks 5-6: Model training for traditional machine learning models
Responsible: Ayushi Shakya
• Weeks 7-8: Fine-tuning and evaluation of LLMs
Responsible: Alvin Lim, Nguyen Thuy Hanh Duyen
• Week 9: Model comparison, analysis, and final report preparation
Responsible: Neha Goyal, Junzhe Huang

References
[1] Arun, K., & Srinagesh, A. (2020). Multi-lingual Twitter sentiment analysis using machine
learning. International Journal of Electrical and Computer Engineering (IJECE), 10(6), 5992-6000.
doi:10.11591/ijece.v10i6.pp5992-6000
[2] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep
bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of
the North American Chapter of the Association for Computational Linguistics (NAACL-HLT).
[3] Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language
understanding by generative pre-training. OpenAI.

Appendix I – Data Table Schema

The dataset consists of two data tables:

- “Products” contains information about all beauty products (over 8,000) from the
Sephora online store, including product and brand names, prices, ingredients, ratings,
and other product features.

- “review” contains user reviews (about 1 million, covering over 2,000 products) of all
products in the Skincare category, including reviewers' appearance attributes and
ratings of the reviews by other users.
