
AI Project

The document outlines a project for performing sentiment analysis on text data, specifically classifying movie reviews as positive or negative. It details the tools required, including Python and various libraries, and provides a step-by-step workflow from data collection to model training and prediction. Additionally, it includes installation steps, code snippets for implementation, and instructions for running the code.

Sentiment Analysis Project

By: Talha Saeed


Goal

 Perform sentiment analysis on text data (positive/negative classification).


Tools

 - Python 3.x
 - Libraries: pandas, scikit-learn, nltk
Workflow

 - Data Collection: Use a small dataset (movie reviews).
 - Preprocessing: Clean and vectorize the text data.
 - Model Training: Train a Logistic Regression classifier.
 - Prediction: Classify new text inputs.
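The four workflow steps can be sketched end to end with scikit-learn's Pipeline. This is a minimal, self-contained illustration, not the project's script: the eight toy reviews and the names train_texts, train_labels and pipeline are hypothetical, NLTK stopword removal is left out so nothing needs downloading, and Pipeline (not used in the original code) is swapped in to chain vectorization and classification.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# 1. Data collection: hypothetical toy reviews standing in for the real dataset
train_texts = [
    "I loved this movie",
    "a great film with wonderful acting",
    "great story and great cast",
    "wonderful and moving",
    "I hated this movie",
    "a terrible film with awful acting",
    "boring story and bad cast",
    "awful and dull",
]
train_labels = ["positive"] * 4 + ["negative"] * 4

# 2-3. Preprocessing (lowercase + word counts) and model training in one object
pipeline = Pipeline([
    ("vectorize", CountVectorizer(lowercase=True)),
    ("classify", LogisticRegression(max_iter=1000)),
])
pipeline.fit(train_texts, train_labels)

# 4. Prediction: classify new text inputs
new_texts = ["great wonderful film", "boring and awful"]
predictions = pipeline.predict(new_texts).tolist()
print(predictions)  # → ['positive', 'negative']
```

One side benefit of a Pipeline is that when it is fitted on training data only, the vocabulary is learned from training text alone; the project script instead fits CountVectorizer on the whole dataset before splitting.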
Installation Steps

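 1. Install Python 3.x.
 2. Install the libraries: pip install pandas scikit-learn nltk
 3. The NLTK stopword list is fetched by the nltk.download('stopwords') call in the code, so the first run needs internet access.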
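After installing, a quick way to confirm the three libraries are present is to query their installed versions. A minimal sketch, assuming Python 3.8+ (for importlib.metadata); the function name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip
REQUIRED = ["pandas", "scikit-learn", "nltk"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```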
Complete Code Overview

 The code includes data loading, preprocessing, vectorization, model training, evaluation, and prediction.
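The loading step expects a CSV with a selected_text column (the text) and a sentiment column (the label). The sketch below reads a hypothetical three-row CSV from memory with io.StringIO instead of train.csv, to show the expected layout and why preprocessing must cope with missing values: pandas reads an empty cell as NaN (a float), which is why the script's preprocess_text returns "" for non-string input.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real script reads train.csv from dataset_path
csv_text = """selected_text,sentiment
I loved it,positive
,negative
So boring,negative
"""

df = pd.read_csv(io.StringIO(csv_text))

# The script relies on these two columns existing
missing = {"selected_text", "sentiment"} - set(df.columns)
if missing:
    raise ValueError(f"Dataset is missing columns: {missing}")

# The empty cell in row 2 is read as NaN, not as a string
print(df["selected_text"].isna().sum())  # → 1

# Mirror the script's handling: lowercase strings, map anything else to ""
cleaned = df["selected_text"].apply(lambda t: t.lower() if isinstance(t, str) else "")
print(cleaned.tolist())  # → ['i loved it', '', 'so boring']
```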
Executable Code
# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import nltk
from nltk.corpus import stopwords

# Download stopwords
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

# 1. Load your dataset
dataset_path = "C:\\Users\\Talha\\Downloads\\train.csv\\train.csv"
df = pd.read_csv(dataset_path, encoding='ISO-8859-1')

# 2. Preprocess the data
def preprocess_text(text):
    if isinstance(text, str):
        # Convert to lowercase
        text = text.lower()
        # Remove stopwords
        words = [word for word in text.split() if word not in stop_words]
        return " ".join(words)
    else:
        # Missing values (e.g. NaN) become an empty string
        return ""

df['selected_text'] = df['selected_text'].apply(preprocess_text)

# 3. Vectorize the text data
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['selected_text'])
y = df['sentiment']

# 4. Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5. Train the model
# max_iter is raised from the default of 100 so the lbfgs solver has room to converge
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 6. Evaluate the model
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")

# 7. Make predictions on new data
def predict_sentiment(text):
    try:
        text = preprocess_text(text)
        vectorized_text = vectorizer.transform([text])
        prediction = model.predict(vectorized_text)[0]
        return prediction
    except Exception as e:
        print(f"Error: {e}")
        return None

# Test the model with custom input
while True:
    user_input = input("Enter a sentence (or type 'exit' to quit): ")
    if user_input.strip().lower() == 'exit':
        break
    sentiment = predict_sentiment(user_input)
    print(f"Predicted Sentiment: {sentiment}")
How to Run the Code

 1. Save the code in a file (e.g., sentiment_analysis.py).
 2. Update dataset_path to point to your copy of train.csv.
 3. Run the file with Python: python sentiment_analysis.py
Key Features

 - Small, understandable dataset.
 - Basic preprocessing (lowercasing and stopword removal).
 - Logistic Regression model.
 - Interactive custom input for testing.
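The stopword removal listed above has a side effect worth knowing for sentiment: NLTK's English stopword list includes negation words such as "not" and "no", so "not good" is reduced to "good". The sketch below is a hedged illustration, not the project's code: STOP_WORDS is a small hand-picked stand-in for NLTK's much longer list (so it runs without nltk.download), and NEGATIONS is a hypothetical set one might exclude from removal.

```python
# Hand-picked stand-in for NLTK's English stopword list (assumption: the real
# list is much longer, but it also contains "not" and "no")
STOP_WORDS = {"this", "is", "a", "the", "not", "no", "was"}
# Hypothetical set of negation words to keep
NEGATIONS = {"not", "no"}


def preprocess_text(text, stop_words=STOP_WORDS):
    """Lowercase and drop stopwords; non-string input becomes ''."""
    if not isinstance(text, str):
        return ""
    return " ".join(w for w in text.lower().split() if w not in stop_words)


# With negations removed, the cue that flips the sentiment is lost
print(preprocess_text("This movie is NOT good"))  # → movie good

# Excluding negations from the stopword set keeps that cue
print(preprocess_text("This movie is NOT good", STOP_WORDS - NEGATIONS))  # → movie not good
```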