EXP5
Aim: To develop a content-based (text, emoticons, images, audio, video) social media analytics
model for business.
Objective: To build a comprehensive social media analytics model in Python that can analyze
various content types, including text, emoticons, images, audio, and video. The model aims to
extract valuable insights from these diverse sources, enabling businesses to understand user
sentiment, preferences, and engagement patterns across multiple media formats for informed
decision-making and targeted marketing strategies.
Theory:
Social media sentiment analysis is the process of collecting and analyzing information on the
emotions behind how people talk about your brand on social media. Rather than a simple count
of mentions or comments, sentiment analysis considers feelings and opinions. Social media
sentiment analysis is sometimes called “opinion mining.”
Sentiment analysis, a fundamental aspect of Natural Language Processing (NLP), entails the
classification of text based on polarity, typically categorized as positive, negative, or neutral.
Early approaches to sentiment analysis relied on rule-based methodologies, exemplified by
Python libraries such as TextBlob and NLTK-VADER.
A consistent observation is that the efficacy of sentiment classification improves with methods
capable of capturing contextual nuances. Various techniques for encoding or embedding text
have been developed to enhance context awareness, consequently leading to higher accuracy in
sentiment classification tasks.
What is TextBlob?
TextBlob is a Python library for processing textual data. It provides a simple API for common
natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction,
sentiment analysis, classification, and more. TextBlob is built on top of the Natural Language
Toolkit (NLTK) and provides an easier-to-use interface with additional functionality.
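A minimal illustration of the sentiment API (the sample sentence is made up): each TextBlob
exposes a sentiment named tuple with polarity in [-1, 1] and subjectivity in [0, 1].

from textblob import TextBlob

blob = TextBlob("This song is absolutely wonderful!")
print(blob.sentiment.polarity)      # positive score, close to 1.0
print(blob.sentiment.subjectivity)  # highly subjective, close to 1.0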
What is WordCloud?
A word cloud is a visual representation of text data. It is a collection of words depicted in
different sizes, where the size of each word indicates its importance or frequency within the
given text. The more a specific word appears in a source of textual data, the bigger and bolder it
appears in the word cloud. Word clouds are used to quickly identify the most common words in a
text and to help visualize the main themes of the text.
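As a quick sketch (the sample string is hypothetical), the wordcloud package turns raw text into
such an image:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

sample = "hello world hello music great song great great music"
wc = WordCloud(width=400, height=200, background_color='white').generate(sample)
plt.imshow(wc, interpolation='bilinear')   # 'great' and 'hello' render largest
plt.axis('off')
plt.show()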
Data Collection:-
Obtain a dataset containing text data for sentiment analysis. This could come from social media
platforms, review websites, or any other relevant source, such as exports from Netlytic or Octoparse.
Data Preprocessing:-
● Text Cleaning: Remove noise such as HTML tags, special characters, punctuation, and
stopwords.
● Tokenization: Split the text into individual words or tokens.
● Normalization: Convert the text to lowercase to ensure uniformity.
● Stemming or Lemmatization: Reduce words to their base or root form to improve
analysis accuracy.
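A minimal sketch of these four steps with NLTK (the sample comment is hypothetical; simple
whitespace splitting stands in for a full tokenizer):

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
nltk.download('stopwords'); nltk.download('wordnet')

raw = "<b>Loving this song!!!</b> Check it out: https://example.com"
clean = re.sub(r'<.*?>|http\S+|[^A-Za-z\s]', ' ', raw)   # cleaning: HTML tags, URLs, punctuation
tokens = clean.lower().split()                           # tokenization + lowercase normalization
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()
tokens = [lemmatizer.lemmatize(t) for t in tokens if t not in stop_words]  # stopwords + lemmatization
print(tokens)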
Feature Extraction:-
Convert the preprocessed text into numerical features that can be understood by machine
learning algorithms. Common techniques include:
● Bag of Words (BoW): Represent each document as a vector of word counts.
● Term Frequency-Inverse Document Frequency (TF-IDF): Assign weights to words based
on their importance in the document and across the entire corpus.
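Both encodings are a single call in scikit-learn; a toy sketch:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["perfect song great music", "great video great mood"]

bow = CountVectorizer()                      # Bag of Words: raw counts per document
print(bow.fit_transform(corpus).toarray())
print(bow.get_feature_names_out())

tfidf = TfidfVectorizer()                    # TF-IDF: counts reweighted by corpus rarity
print(tfidf.fit_transform(corpus).toarray())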
Post-processing:-
Optionally, perform additional steps such as thresholding or confidence scoring to refine the
sentiment predictions.
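For instance, polarity scores close to zero can be bucketed as Neutral with a small cut-off (the
0.05 value mirrors the neutral_threshold used in the code below, but is otherwise an illustrative
choice):

def label_sentiment(polarity, threshold=0.05):
    # scores within +/- threshold are too weak to call either way
    if polarity > threshold:
        return 'Positive'
    if polarity < -threshold:
        return 'Negative'
    return 'Neutral'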
Code:
!pip install snscrape
Successfully installed snscrape-0.7.0.20230622
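The JSON file loaded below was produced by an snscrape CLI run that is not shown in the original
notebook; a plausible reconstruction (the query string and result count are assumptions):

!snscrape --jsonl --max-results 500 twitter-search "hello world" > text-query-tweets.json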
import numpy as np
import nltk
nltk.download('stopwords')
import string
import re
import textblob
from textblob import TextBlob
import warnings
%matplotlib inline
import os
import pandas as pd
# Reads the json generated from the CLI commands above and creates a pandas dataframe
tweets_df = pd.read_json('text-query-tweets.json', lines=True)
tweets_df.head()
tweets_df.to_csv('text-query-tweets.csv', index=False)  # persist the scraped tweets (filename illustrative)
df = pd.read_csv("./Alan_walker_Hello_world.csv")
df.head(5)
   #  author               description                                         guid                        likecount
0  1  @steveodyuo192       Perfect song music..good job..from north eas...     Ugxa6AhoP2tgLxuTfx54AaABAg  0
1  2  @HieuHoang-          great                                               UgyXaMpozA_GxcuyXfR4AaABAg  0
2  3  @VideosGospels       🏛Romans 2:9 to 16. Tribulation and 🤮🤮 anguish ...   UgwH_Kryk9j51c3VBLp4AaABAg  0
3  4  @skibniewska3782     Hello, hello, hello, world.                         UgxWKbvkC816aLIU38J4AaABAg  0
4  5  @Thethunder007k      I love ❤❤ alan walker                               Ugzop-cnCZ6sHhvQ1PV4AaABAg  0
(other columns omitted)
print(df.shape)
(557, 12)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 557 entries, 0 to 556
Data columns (total 12 columns):
 #   Column   Non-Null Count   Dtype
...
df.pubdate.value_counts()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
#Heat Map for missing values
plt.figure(figsize=(17, 5))
sns.heatmap(df.isnull(), cbar=True, yticklabels=False)
plt.xlabel("Column_Name", size=14, weight="bold")
plt.title("Places of missing values in column",fontweight="bold",size=17)
plt.show()
import plotly.graph_objects as go
Top_Title_Of_tweet = df['title'].value_counts().head(10)
print(Top_Title_Of_tweet)
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = stopwords.words('english')   # list of English stopwords
!pip install tweet-preprocessor
Successfully installed tweet-preprocessor-0.6.0
# Remove unnecessary characters
punct = ['%', '/', ':', '\\', '&', ';', '?']
def remove_punctuations(text):
    for punctuation in punct:
        text = text.replace(punctuation, '')
    return text
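The notebook presumably applies this function to the comment text before deduplication (the
cell itself is not shown); a one-line sketch:

df['description'] = df['description'].astype(str).apply(remove_punctuations)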
df = df.drop_duplicates('description')
print(df.shape)
(502, 12)
df = df.reset_index(drop=True)
df.sample(5)
     #    author                description                                         guid
4    5    @Thethunder007k       I love ❤❤ alan walker                              Ugzop-cnCZ6sHhvQ1PV4AaABAg
396  418  @mei3094              I am not home. I am an immigrant. This is not ...  Ugw5i7Q3VV2iCfWWLKR4AaABAg
419  450  @ayanocybergod        Thanks for this video, u make good mood for me...  UgyyyykL52PuD3FQuo94AaABAg
89   94   @patrickparreno7741   Is this stupid heart                               UgzoM74YWPzx4wOHNLx4AaABAg
(other columns omitted)
sns.set_style('whitegrid')
%matplotlib inline
stop=stop +['hello','good','http','love','happy','Thank','heart','me','mood','video','india']
def plot_20_most_common_words(count_data, count_vectorizer):
    import matplotlib.pyplot as plt
    words = count_vectorizer.get_feature_names_out()
    total_counts = np.zeros(len(words))
    for t in count_data:
        total_counts += t.toarray()[0]
    # plot the 20 highest-count words as a bar chart
    top = sorted(zip(words, total_counts), key=lambda x: x[1], reverse=True)[:20]
    words, counts = zip(*top)
    sns.barplot(x=list(words), y=list(counts))
    plt.xticks(rotation=90)
    plt.show()
def get_subjectivity(text):
    return TextBlob(text).sentiment.subjectivity

def get_polarity(text):
    return TextBlob(text).sentiment.polarity
df['subjectivity'] = df['description'].apply(get_subjectivity)
df['polarity'] = df['description'].apply(get_polarity)
df.head()
[df.head() output: the same first five comments as above, now with two additional columns,
subjectivity and polarity]
neutral_threshold = 0.05
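The hidden cell presumably maps each polarity score to a label using neutral_threshold, as
described under Post-processing; a minimal sketch that reproduces the columns used below:

textblob_df = df.copy()
textblob_df['textblob_sentiment'] = textblob_df['polarity'].apply(
    lambda p: 'Positive' if p > neutral_threshold
    else ('Negative' if p < -neutral_threshold else 'Neutral'))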
textblob_df['textblob_sentiment'].value_counts()
Neutral 286
Positive 195
Negative 21
Name: textblob_sentiment, dtype: int64
df_positive = textblob_df[textblob_df['textblob_sentiment'] == 'Positive']
df_Very_positive = df_positive[df_positive['likecount'] > 0]
df_Very_positive.head()
df_negative = textblob_df[textblob_df['textblob_sentiment'] == 'Negative']
df_negative.head()
122 "lost in the dark , but ill never be alone" ca... Negative 0
df_neutral=textblob_df[textblob_df['textblob_sentiment']=='Neutral']
df_neutral.head()
from wordcloud import WordCloud
# build the positive-comment word cloud (the generation step was missing in the source; parameters illustrative)
word_cloud1 = WordCloud(background_color='white', stopwords=set(stop)).generate(
    ' '.join(df_positive['description'].astype(str)))
plt.imshow(word_cloud1, interpolation='bilinear')
plt.axis("off")
plt.show()