Amazon-reviews-classification

I have a dataset of Amazon customer reviews for Cell Phones and Accessories. The task is to build a model that classifies each review as 'positive', 'negative' or 'neutral'.

Approach used:-

The dataset is provided in JSON format, so I converted it to CSV for convenience. It has nearly 9.6 million rows and 9 columns, which is too large to load into memory in one go. To work around this, 'chunksize' is used to split the dataset into smaller datasets of 'chunksize' rows each; the number of columns stays the same, i.e. 9.
I chose chunksize = 10^6 (one million rows), which gives 10 chunks (smaller datasets).
Each chunk is itself a dataset and is appended to a list named 'chunks', so that operations can be performed on every chunk in a loop.
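
The chunking step above can be sketched roughly as follows. This is a minimal illustration, assuming the CSV is read with pandas `read_csv` and its `chunksize` parameter; the column names `reviewText` and `overall` and the tiny in-memory CSV are hypothetical stand-ins for the real file, and the actual notebook may differ.

```python
import io
import math

import pandas as pd

# With ~9.6 million rows and one million rows per chunk, the last chunk is
# partial, so the number of chunks is the ceiling of the ratio.
expected_chunks = math.ceil(9_600_000 / 10**6)  # 10

# Hypothetical stand-in for the converted CSV: 25 rows, 2 columns.
demo_csv = io.StringIO(
    "reviewText,overall\n"
    + "".join(f"review {i},{i % 5 + 1}\n" for i in range(25))
)

# read_csv with chunksize returns an iterator of DataFrames instead of one
# large DataFrame; every chunk has the same columns.
chunks = []
for chunk in pd.read_csv(demo_csv, chunksize=10):
    chunks.append(chunk)

print(len(chunks), [len(c) for c in chunks])
```

Keeping every chunk in a list still holds all rows in memory at once; processing each chunk inside the loop and discarding it afterwards keeps the peak memory closer to a single chunk.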

Model details:-

The model used for classification is Logistic Regression.
It is applied to each chunk (smaller dataset) separately, and the chunk with the best accuracy is chosen for the confusion matrix and classification report.
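
A rough sketch of the per-chunk selection described above, using scikit-learn. The TF-IDF features, the 75/25 train/test split, `max_iter=1000`, the helper names `evaluate_chunk` and `best_chunk`, and the toy reviews are all assumptions made for illustration; the README does not state how the text was vectorized or split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def evaluate_chunk(texts, labels, seed=0):
    """Fit TF-IDF + Logistic Regression on one chunk and return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


def best_chunk(chunks):
    """chunks: list of (texts, labels) pairs. Returns (index, accuracy) of the best one."""
    accuracies = [evaluate_chunk(texts, labels) for texts, labels in chunks]
    best = max(range(len(accuracies)), key=accuracies.__getitem__)
    return best, accuracies[best]


# Toy data (hypothetical): labels follow the README's -1 / 0 / 1 encoding.
toy_texts = [
    "great phone love it", "excellent case works great",
    "love this charger", "great value excellent",
    "terrible battery broke", "awful case broke fast",
    "terrible quality awful", "broke after a week terrible",
    "okay phone average", "average case nothing special",
    "it is okay", "average quality okay",
]
toy_labels = [1] * 4 + [-1] * 4 + [0] * 4
toy_chunks = [(toy_texts, toy_labels), (toy_texts[::-1], toy_labels[::-1])]

best_index, best_accuracy = best_chunk(toy_chunks)
print(best_index, best_accuracy)
```

Because each chunk is scored on its own held-out split, picking the chunk with the highest accuracy reports the most favourable of several runs rather than an average over them.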

Model metrics:-

Chunk 4 was chosen for the confusion matrix and classification report.

	precision	recall	f1-score	support
negative (-1)	0.78	0.68	0.73	43035
neutral (0)	0.49	0.20	0.28	18990
positive (1)	0.89	0.97	0.93	184920

accuracy			0.86	246945
macro avg	0.72	0.62	0.65	246945
weighted avg	0.84	0.86	0.84	246945
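
As a sanity check on the summary rows, the macro and weighted averages can be recomputed from the per-class rows of the report above. This is only a sketch using the rounded values from the table, so a recomputed figure can differ from the reported one in the last digit; the dictionary layout and function names are illustrative.

```python
# Per-class rows copied from the classification report for chunk 4.
report = {
    "negative (-1)": {"precision": 0.78, "recall": 0.68, "f1": 0.73, "support": 43035},
    "neutral (0)": {"precision": 0.49, "recall": 0.20, "f1": 0.28, "support": 18990},
    "positive (1)": {"precision": 0.89, "recall": 0.97, "f1": 0.93, "support": 184920},
}

total_support = sum(row["support"] for row in report.values())


def macro_avg(metric):
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(metric):
    """Mean over classes weighted by support: large classes dominate."""
    return sum(row[metric] * row["support"] for row in report.values()) / total_support


# Share of the largest class, i.e. the accuracy of always predicting it.
majority_share = max(row["support"] for row in report.values()) / total_support

for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_avg(metric), 2), round(weighted_avg(metric), 2))
print("majority share", round(majority_share, 2))
```

The positive class makes up about 75% of the test rows, so the weighted averages and the overall accuracy are driven mostly by that class, while the macro averages show the weaker results on the neutral class more clearly.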

Colab Notebook Link:-

https://colab.research.google.com/drive/1QKcnH5rMZQLJ3Q_S5NpB3jR8ZIG45wfh?usp=sharing

Further improvements to be done:-

Optimizing the model and improving its accuracy.
Trying other classification algorithms, such as Naive Bayes and SVM, for better performance and accuracy.
Deploying the project using Flask or another tool.

adarshcode/Amazon-reviews-classification
