
Sin Review-2 PDF

The document summarizes an analysis of email ego networks. It discusses 5 papers on the topic from the literature. The proposed system would analyze email data to generate statistics on word usage and sentiment and to identify groups. It would extract emails from a CSV file, clean the text, and apply natural language processing techniques such as word clouds, sentiment analysis, and topic modeling. This would provide insights into an individual's role and interactions within a social network from their email data alone. The system aims to understand global network properties by examining sample ego networks.

Uploaded by

JOSHITHA

ANALYSIS OF EMAIL EGO NETWORK
Final Review Report
Submitted by

19BCE2378 – R.JOSHITHA
19BCE2397 – V.HIMA SINDHU
19BCE2407 – S.SANJANA

Prepared For
SOCIAL AND INFORMATION NETWORKS
CSE3021

Submitted To

Prof. Govinda K
School of Computer Science and Engineering
1. LITERATURE SURVEY

I Temporal Visualization and Analysis of Social Networks

Gloor, P., Laubacher, R., Zhao, Y., & Dynes, S. (2004, June). Temporal visualization
and analysis of social networks. In NAACSOS Conference, June (pp. 27-29).
Summary:-
The paper introduces a visual browser for the visualization and analysis of social
links (relationships). The browser displays how communication networks between
individuals evolve over time. The goal is to build an environment for analyzing
the dynamics of communication in social spaces while respecting individual privacy.
Methodology:-
They implemented a flexible and scalable architecture in which email messages are
processed locally in three steps. In the first step, the email messages and mailing
lists are parsed and stored in decomposed form in a database on the local
machine. In the second step, the database is queried to select the messages sent
or received by a group in a given time period. In the third step, the selected
communication flows are represented in the visual browser using the own net.
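The three-step pipeline described above (parse and store, query by group and time period, then display) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under our own assumptions. The table layout, the helpers `store_messages` and `query_flows`, and the sample mailbox are hypothetical and are not the authors' schema. The display step is reduced to returning sender-recipient edge counts that a browser could draw.

```python
import sqlite3
from datetime import date

def store_messages(conn, messages):
    """Step 1: store each message in decomposed form, one row per sender-recipient pair."""
    conn.execute("CREATE TABLE IF NOT EXISTS flows (sender TEXT, recipient TEXT, sent TEXT)")
    # ISO dates (YYYY-MM-DD) sort lexicographically in chronological order
    rows = [(m["from"], rcpt, m["date"].isoformat()) for m in messages for rcpt in m["to"]]
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?)", rows)
    conn.commit()

def query_flows(conn, group, start, end):
    """Step 2: select flows sent or received by a group within [start, end],
    aggregated into (sender, recipient, count) edges for display in Step 3."""
    marks = ",".join("?" for _ in group)
    sql = (
        "SELECT sender, recipient, COUNT(*) FROM flows "
        f"WHERE (sender IN ({marks}) OR recipient IN ({marks})) "
        "AND sent BETWEEN ? AND ? "
        "GROUP BY sender, recipient ORDER BY sender, recipient"
    )
    params = [*group, *group, start.isoformat(), end.isoformat()]
    return conn.execute(sql, params).fetchall()

# Hypothetical sample mailbox
messages = [
    {"from": "leader", "to": ["ann", "bob"], "date": date(2003, 1, 5)},
    {"from": "ann", "to": ["leader"], "date": date(2003, 1, 9)},
    {"from": "bob", "to": ["carl"], "date": date(2003, 3, 2)},
]
conn = sqlite3.connect(":memory:")
store_messages(conn, messages)
edges = query_flows(conn, ["leader"], date(2003, 1, 1), date(2003, 1, 31))
print(edges)  # [('ann', 'leader', 1), ('leader', 'ann', 1), ('leader', 'bob', 1)]
```

Keeping the data in a local database matches the paper's emphasis on local processing. Only the aggregated edges would need to leave the query step.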
Dataset details:-
The dataset consists of a one-year email archive of a 200-person global consulting
practice. The mailbox of the practice leader is used as an approximation of his ego
network; similarly, the mailbox of the practice coordinator is used as an estimate
of the coordinator's ego network.
Relevant Finding:-
For the test dataset, the temporal analysis conveys new insights that would have
been much more expensive to obtain with traditional means. The tool offers a fast
way to find periods of low and high centrality and to identify periods of high
productivity and information dissemination.
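Finding periods of low and high centrality can be illustrated by computing the ego's degree centrality separately for each month. This is a minimal sketch that assumes messages arrive as (sender, recipient, date) tuples. The helper `monthly_degree_centrality` is hypothetical. Degree centrality here means the number of distinct contacts divided by n - 1, where n is the number of people active that month. It is only one of several centrality measures such a tool could track.

```python
from collections import defaultdict
from datetime import date

def monthly_degree_centrality(flows, ego):
    """Return {(year, month): degree centrality of ego in that month's network}."""
    people = defaultdict(set)    # everyone who sent or received email in the month
    contacts = defaultdict(set)  # distinct people the ego exchanged email with
    for sender, recipient, sent in flows:
        key = (sent.year, sent.month)
        people[key].update((sender, recipient))
        if sender == ego:
            contacts[key].add(recipient)
        elif recipient == ego:
            contacts[key].add(sender)
    result = {}
    for key, members in sorted(people.items()):
        n = len(members)
        result[key] = len(contacts[key]) / (n - 1) if n > 1 else 0.0
    return result

# Hypothetical flows over two months
flows = [
    ("leader", "ann", date(2003, 1, 5)),
    ("bob", "leader", date(2003, 1, 9)),
    ("ann", "bob", date(2003, 1, 20)),
    ("ann", "bob", date(2003, 2, 3)),
    ("carl", "bob", date(2003, 2, 7)),
    ("leader", "carl", date(2003, 2, 8)),
]
for month, c in monthly_degree_centrality(flows, "leader").items():
    print(month, round(c, 2))  # (2003, 1) 1.0 then (2003, 2) 0.33
```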
Limitations/Future Work identified in the Survey :-
The continuing goals are to gain deeper insight into how the evolution of online
group dynamics correlates with innovation, and to develop a theory of member
roles in innovation communities.
II Analysis of Ego Network Structure in Online Social Networks
Arnaboldi, V., Conti, M., Passarella, A., & Pezzoni, F. (2012, September). Analysis of ego
network structure in online social networks. In 2012 International Conference on Privacy,
Security, Risk and Trust and 2012 International Conference on Social Computing (pp. 31-40).
IEEE.

Summary:-
Online Social Networks are becoming a fundamental medium for people to manage
their social lives; however, the structure of ego networks in these virtual
environments had not yet been investigated. The authors contribute to filling
this gap by analyzing a large data set of Facebook relationships.
Methodology:-
They filter the data to obtain the contact frequency of each relationship, and then
use different clustering techniques to check whether structures similar to those
found in offline social networks can be observed.
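The idea of looking for layered structure by clustering contact frequencies can be sketched with a one-dimensional k-means over each alter's contact frequency. This is an illustrative assumption, not the authors' procedure. The paper reports using several clustering techniques. The helper `kmeans_1d`, its evenly spaced initialization and the sample frequencies (contacts per month) below are ours.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster 1-D values into k groups with Lloyd's algorithm; returns (centroids, labels)."""
    data = sorted(values)
    # Initialize centroids at evenly spaced positions in the sorted data
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: centroid = mean of its members (kept unchanged if cluster is empty)
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels

# Hypothetical contact frequencies (contacts per month) for 12 alters of one ego
freqs = [30, 28, 25, 10, 9, 8, 7, 2, 1, 1, 1, 0.5]
centroids, labels = kmeans_1d(freqs, 3)
print([labels.count(c) for c in range(3)])  # [5, 4, 3]: outer, middle, inner layer sizes
```

A few alters with very high contact frequency form the innermost group, and the outer groups become progressively larger. This is the kind of layered pattern the comparison with offline networks looks for.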
Dataset details:-
A large data set of Facebook relationships.
Relevant Finding:-
The results show a striking similarity between the social structures in offline
and Online Social Networks.
Limitations/Future Work identified in the Survey :-
These results strongly suggest that, even if the ways to communicate and to
maintain social relationships are changing due to the diffusion of Online Social
Networks, the way people organize their social relationships seems to remain
unaltered.

III Tie Formation on Twitter: Homophily and Structure of Egocentric Networks

De Choudhury, M. (2011, October). Tie formation on Twitter: Homophily and structure of
egocentric networks. In 2011 IEEE Third International Conference on Privacy, Security, Risk
and Trust and 2011 IEEE Third International Conference on Social Computing (pp. 465-470).
Summary:-
In the context of tie formation in online social networks such as Twitter, the paper
studies different ego network topologies that exhibit homophily along different attributes.
Methodology :-
They proposed a variety of attributes along which homophily can be measured
between individuals, including demographic attributes, activity-specific attributes
and content-based attributes. They further categorize ego network structures as
generators, mediators and receptors based on a measure called the ego ratio.
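Homophily along a single attribute can be quantified in several ways. The simple measure sketched below is our own assumption, not the paper's exact definition. It compares the share of an ego's alters who match the ego on an attribute (for example, location) with the share expected by chance from the whole population. The function `homophily_ratio` and the sample data are hypothetical.

```python
def homophily_ratio(ego_value, alter_values, population_values):
    """Observed share of alters matching the ego, divided by the baseline share
    of matching users in the population. Values above 1 suggest homophily."""
    observed = sum(v == ego_value for v in alter_values) / len(alter_values)
    baseline = sum(v == ego_value for v in population_values) / len(population_values)
    return observed / baseline

# Hypothetical locations: 20% of all users are in NYC, but 3 of the ego's 5 alters are
population = ["NYC"] * 20 + ["LA"] * 30 + ["SF"] * 50
alters = ["NYC", "NYC", "NYC", "LA", "SF"]
print(round(homophily_ratio("NYC", alters, population), 2))  # 3.0
```

Computing one such score per attribute (demographic, activity-specific, content-based) would give an ego-level profile of which attributes drive tie formation.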
Dataset:-
A large Twitter dataset comprising about 29.5M tweets
Relevant finding:-
Mediators were observed to associate tie formation extensively with location,
interactiveness and sentiment homophily, whereas generators were driven largely
by information broadcasting behavior homophily.
Limitations/Future Work identified in the Survey :-
The findings have important implications for understanding the diverse range of
motivations behind user participation in social media sites.

IV A semantic model for academic social network analysis

Hu, J., Liu, M., & Zhang, J. (2014, August). A semantic model for academic social network
analysis. In Proceedings of the 2014 IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining (pp. 310-313). IEEE Press.

Summary:-
They have proposed a semantic model that can naturally represent various
academic social networks. They use a concise language to represent and query
academic social networks.
Dataset:-
They use data from social media platforms as their dataset.
Relevant Finding:-
They found various complex semantic relationships among social actors.
Limitations/Future Work identified in the Survey :-
This model can be used as the foundation for managing, manipulating and
querying academic social networks.

V Homogeneity of social networks by age and marital status: A multilevel analysis of ego-centered
networks

Kalmijn, M., & Vermunt, J. K. (2007). Homogeneity of social networks by age and marital
status: A multilevel analysis of ego-centered networks. Social networks, 29(1), 25-43.

Summary:-
The paper examines the homogeneity of social networks by age and marital status
and relates it to the question of whether there is selection in networks.
Methodology:-
They use a novel analytical approach: a latent class type random-effects model of
the multilevel structure of the network data, which allows homogeneity to be
described simply in terms of odds ratios.
Dataset:-
A representative survey containing data on contact and support networks
Relevant Findings:-
The analyses show that age boundaries are strong and that they partly explain
marital status boundaries.
Limitations/Future Work identified in the Survey :-
A remaining issue is the tendency of alters to be more similar to each other than
one would expect from their similarity to ego.
2. OVERVIEW OF PROPOSED SYSTEM
2.1 Introduction
Email ego networks have been studied in social network analysis to understand the
role an individual plays and how they interact with others. Typical analysis involves
calculating graph properties such as density, degree centrality, closeness
centrality, betweenness centrality, and the number of cliques and components,
and clustering alters into groups. The expectation is that the clusters would
correspond to natural groups such as family or groups based on school,
religion, or hobbies. However, these studies are usually limited in scope: often
only a small number of ego networks are examined, and ego network information is
typically obtained through a questionnaire that asks individuals about the people
they know and in what context.
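Several of the graph properties listed above can be computed for a single ego network in plain Python. The sketch assumes the ego network is given as an undirected adjacency dict. The helpers `density`, `degree_centrality` and `alter_groups` and the sample network are hypothetical. Clustering alters into groups is approximated here by taking the connected components of the network after the ego is removed. Closeness, betweenness and clique enumeration are more involved; a library such as NetworkX provides them (`closeness_centrality`, `betweenness_centrality`, `find_cliques`).

```python
from collections import deque

def density(adj):
    """Edges present divided by the n(n-1)/2 possible undirected edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def degree_centrality(adj):
    """Degree of each node normalized by n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def alter_groups(adj, ego):
    """Connected components of the alter-alter graph (ego removed), via BFS."""
    alters = set(adj[ego])
    seen, groups = set(), []
    for start in sorted(alters):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            component.add(v)
            for w in adj[v]:
                if w in alters and w not in seen:
                    seen.add(w)
                    queue.append(w)
        groups.append(sorted(component))
    return groups

# Hypothetical ego network: the ego knows three family members and two colleagues
edges = [("ego", "mom"), ("ego", "dad"), ("ego", "sis"), ("mom", "dad"),
         ("dad", "sis"), ("ego", "ann"), ("ego", "bob"), ("ann", "bob")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(round(density(adj), 2))          # 0.53
print(degree_centrality(adj)["ego"])   # 1.0
print(alter_groups(adj, "ego"))        # [['ann', 'bob'], ['dad', 'mom', 'sis']]
```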
Another question of interest is whether we can understand the global properties
of the network by looking at the network properties of a sample of ego networks.
Often, we may only have data on a sample of the population and would like to
understand the properties of the whole network from that sample. In addition,
certain measures may be too difficult to calculate directly for the entire network
but can be approximated from a sample of ego networks. In other cases, some
measures are not meant for a large network or may simply differ depending on the
ego. For instance, how nodes are clustered into groups may depend on the ego,
since each individual may group the people they know differently: people play
different roles in different people's lives.
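As a simple illustration of approximating a global property from a sample of ego networks, the sketch below estimates a network's average local clustering coefficient. Each ego's coefficient depends only on its own ego network, so it can be computed for a random sample of egos and averaged. The graph, the sample size and the helper names are hypothetical.

```python
import random

def local_clustering(adj, v):
    """Fraction of pairs of v's alters that are connected to each other."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return links / (k * (k - 1) / 2)

def estimate_avg_clustering(adj, sample_size, seed=0):
    """Average clustering computed only over a random sample of egos."""
    rng = random.Random(seed)
    egos = rng.sample(sorted(adj), sample_size)
    return sum(local_clustering(adj, v) for v in egos) / sample_size

def exact_avg_clustering(adj):
    """Average clustering over every node, for comparison."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Hypothetical ring lattice: 20 nodes, each linked to its 2 nearest neighbours on each side
n = 20
adj = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}
print(exact_avg_clustering(adj), estimate_avg_clustering(adj, 5))  # 0.5 0.5
```

On this regular toy graph every node has the same coefficient, so any sample gives the exact value. On a real email network the estimate would vary with the sample and improve as the sample grows.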
2.2 Framework, Architecture or Modules of the Proposed System

The key modules are:


• EPT Counter
The Email-Person-Topic Counter allows you to type in a series of search words
and select senders to see the count of emails sent containing those words.
• Wordcloud Generator
We developed a word cloud application: the user chooses the name of one of the top
contributors and sees a cloud of the most common words in his or her emails.
• Sentiment Analysis
The sentiment analysis provides the most distinctive view of the data. "Sentiment"
runs from -1 to 1, ranging from completely negative to completely positive
content. Sentiment was determined using TextBlob, a Python package that scores
text by comparing it with its own lexicon of positive and negative words.
Tools used are Pandas, Seaborn, Matplotlib, Wordcloud, TextBlob and Image (PIL/Pillow).

2.3 Proposed System Model


3. Proposed System Analysis and Design
Get the emails
The data is stored in a CSV file, so we first read it and populate a pandas
DataFrame with the given column names.
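This is a minimal sketch of the loading step. It assumes the Kaggle email export has columns such as `MetadataFrom`, `MetadataTo` and `ExtractedBodyText`; these column names are our assumption and should be checked against the actual file. The report does this with pandas (for example `pd.read_csv("emails.csv", usecols=[...])`). The sketch uses the standard-library `csv` module so that it runs without extra packages, and an in-memory sample stands in for emails.csv.

```python
import csv
import io

COLUMNS = ["Id", "MetadataFrom", "MetadataTo", "ExtractedBodyText"]  # assumed column names

def load_emails(handle, columns=COLUMNS):
    """Read rows from a CSV file handle, keeping only the given columns."""
    reader = csv.DictReader(handle)
    return [{col: row.get(col, "") for col in columns} for row in reader]

# Stand-in for open("emails.csv", newline="", encoding="utf-8")
sample = io.StringIO(
    "Id,MetadataFrom,MetadataTo,ExtractedBodyText,RawText\n"
    '1,"Mills, Cheryl D",H,"pls call me",raw\n'
    '2,H,"Abedin, Huma",thank you,raw\n'
)
emails = load_emails(sample)
print(len(emails), emails[0]["MetadataFrom"])  # 2 Mills, Cheryl D
```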
Cleaning the Data

Then we use two regular expressions to remove unnecessary or problematic elements
from the text. The first eliminates symbols, and the second strips specific phrases
from the emails, notably those that label an email as being investigated by the
House Benghazi investigation committee. While these regular expressions
certainly strip out unwanted material, they may also remove a few words that
were, in fact, innocuous and part of the true body text. However, as the words in
question carry no sentiment, we do not feel that we are risking the loss of
pertinent data.
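The two regular expressions could look roughly like the sketch below. Both patterns are assumptions for illustration. The first keeps only letters, digits, basic punctuation and whitespace. The second removes a "produced to the House Benghazi committee" stamp. Its exact wording in the released emails should be checked against the dataset before relying on the pattern. The order matters: the stamp pattern contains no characters that the symbol pass removes, so it still matches after the first pass.

```python
import re

# Assumed patterns, for illustration only
SYMBOLS = re.compile(r"[^A-Za-z0-9.,!?'\s]")
BENGHAZI_STAMP = re.compile(r"PRODUCED TO HOUSE SELECT BENGHAZI COMM\.", re.IGNORECASE)

def clean_text(text):
    """Apply the two cleaning passes in order: symbols first, then the stamp phrase."""
    text = SYMBOLS.sub(" ", text)
    text = BENGHAZI_STAMP.sub(" ", text)
    return text

raw = "PRODUCED TO HOUSE SELECT BENGHAZI COMM. Pls call me @ 3pm -- thx!"
print(" ".join(clean_text(raw).split()))  # Pls call me 3pm thx!
```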
Counts by keyword

Two functions are used to count the emails that contain particular keywords.
The first takes a single person and counts that person's emails containing each
of the given keywords. The second applies the first to all the selected email
senders.
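The two counting functions might look like this minimal sketch. Several details are our assumptions: the names `count_keywords_for_sender` and `count_keywords`, the dict-based email records, and the case-insensitive whole-word matching. Each email is counted at most once per keyword, following the description "count of emails that contain" each keyword.

```python
import re

def count_keywords_for_sender(emails, sender, keywords):
    """For one sender, count how many of their emails contain each keyword."""
    bodies = [e["body"].lower() for e in emails if e["sender"] == sender]
    counts = {}
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw.lower()) + r"\b")
        counts[kw] = sum(1 for b in bodies if pattern.search(b))
    return counts

def count_keywords(emails, senders, keywords):
    """Apply the single-sender count to every selected sender."""
    return {s: count_keywords_for_sender(emails, s, keywords) for s in senders}

# Hypothetical records
emails = [
    {"sender": "H", "body": "Pls ask POTUS about this"},
    {"sender": "H", "body": "Thank you, pls call"},
    {"sender": "Sidney Blumenthal", "body": "Barack and Obama aides met"},
]
print(count_keywords(emails, ["H", "Sidney Blumenthal"], ["pls", "POTUS", "Obama"]))
# {'H': {'pls': 2, 'POTUS': 1, 'Obama': 0}, 'Sidney Blumenthal': {'pls': 0, 'POTUS': 0, 'Obama': 1}}
```

The nested dict maps directly onto the grouped bar graph of the EPT Counter: one group per sender and one bar per keyword.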
Make Sentiment

Sentiment is determined with TextBlob. Two functions are needed: the first creates
the data for the first sentiment graph, which shows the density of sentiments
across a given sender's emails, and the second determines the sentiment by
recipient. Its extra argument, "personlist", is populated with the recipients
selected from the dropdown menu of the application.
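This is a minimal sketch of the two sentiment functions. It assumes each email record has `sender`, `recipient` and `body` fields. The report scores text with TextBlob, whose `TextBlob(text).sentiment.polarity` returns a value in [-1, 1]. So that the sketch runs without third-party packages, the scoring function is passed in as a parameter. The small word-list scorer `toy_polarity` is a hypothetical stand-in, not TextBlob's lexicon.

```python
def toy_polarity(text):
    """Hypothetical stand-in scorer: (positive - negative) / matched words, in [-1, 1]."""
    positive = {"thank", "thanks", "great", "good", "please"}
    negative = {"bad", "angry", "problem", "worried"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0

def sender_sentiments(emails, sender, score=toy_polarity):
    """Polarity of every email from one sender (input for the density graph)."""
    return [score(e["body"]) for e in emails if e["sender"] == sender]

def sentiment_by_recipient(emails, sender, personlist, score=toy_polarity):
    """Mean polarity of the sender's emails to each selected recipient."""
    result = {}
    for person in personlist:
        scores = [score(e["body"]) for e in emails
                  if e["sender"] == sender and e["recipient"] == person]
        result[person] = sum(scores) / len(scores) if scores else None
    return result

emails = [
    {"sender": "H", "recipient": "Cheryl Mills", "body": "Thanks, great work"},
    {"sender": "H", "recipient": "Huma Abedin", "body": "Please fix this problem"},
    {"sender": "H", "recipient": "Huma Abedin", "body": "Good news"},
]
print(sender_sentiments(emails, "H"))  # [1.0, 0.0, 1.0]
print(sentiment_by_recipient(emails, "H", ["Cheryl Mills", "Huma Abedin"]))
# {'Cheryl Mills': 1.0, 'Huma Abedin': 0.5}
```

With TextBlob installed, `score=lambda t: TextBlob(t).sentiment.polarity` (after `from textblob import TextBlob`) would replace the toy scorer.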
Visualisation

The sentiments obtained are grouped into three categories (positive, negative and
neutral) and visualized in a pie chart. The EPT Counter results have also been
visualized as a bar graph.
5.1 Implementation

1) Importing and Cleaning the Data :-
i. Download the dataset from Kaggle
ii. Import emails.csv into a DataFrame using pandas
iii. Remove all unnecessary characters using regex substitution
iv. Remove stopwords
v. Similarly, remove unnecessary blank spaces
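Steps iv and v above can be sketched as follows. The stop-word set here is a small hypothetical sample. In practice a fuller list would be used, for example NLTK's English stop words or the `STOPWORDS` set shipped with the wordcloud package.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "for", "on", "in"}  # sample only

def squeeze_spaces(text):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def remove_stopwords(text, stopwords=STOPWORDS):
    """Drop stop words (case-insensitively), keeping the other words in order."""
    return " ".join(w for w in text.split() if w.lower() not in stopwords)

raw = "  The   report on Libya\n\nis ready for   the Secretary  "
print(squeeze_spaces(raw))                    # The report on Libya is ready for the Secretary
print(remove_stopwords(squeeze_spaces(raw)))  # report Libya ready Secretary
```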

2) Building the word cloud
i. Extract Raw Text from the DataFrame
ii. Remove data not needed for the word cloud, such as the From and To fields,
case number, doc number, etc.
iii. Plot the word cloud using the string formed after cleaning rawText
3) Building the EPT Counter
i. Access the given database of emails
ii. Input the number of senders and their names to be analyzed
iii. Input the number of words to be analyzed (maximum of 5), followed by the words
themselves
iv. At the end of the program, a bar graph compares how many times each word
appears in each sender's emails

4) Sentiment Analysis
i. Get the names of all unique people who sent emails to Hillary and make a
list of the most frequent senders
ii. Compute the mean sentiment of each of these people and find who was most
positive toward Hillary; a bar chart visualizes the result
iii. Plot the total numbers of positive, negative and neutral cases using a pie chart
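Steps i and ii above amount to counting emails per sender, keeping the most frequent senders, and averaging their polarity. In pandas this would be a `groupby` followed by `mean`. This is a minimal standard-library sketch that assumes each record is a (sender, polarity) pair already scored with TextBlob. The names `top_senders` and `mean_sentiment` and the sample scores are hypothetical.

```python
from collections import Counter, defaultdict

def top_senders(records, n):
    """The n people who sent the most emails."""
    return [s for s, _ in Counter(s for s, _ in records).most_common(n)]

def mean_sentiment(records, senders):
    """Mean polarity for each selected sender."""
    scores = defaultdict(list)
    for sender, polarity in records:
        if sender in senders:
            scores[sender].append(polarity)
    return {s: sum(v) / len(v) for s, v in scores.items()}

# Hypothetical (sender, polarity) records
records = [("Cheryl Mills", 0.4), ("Cheryl Mills", 0.2), ("Huma Abedin", 0.1),
           ("Huma Abedin", 0.0), ("Huma Abedin", 0.2), ("Jake Sullivan", 0.5)]
frequent = top_senders(records, 2)
means = mean_sentiment(records, frequent)
print(frequent, max(means, key=means.get))  # ['Huma Abedin', 'Cheryl Mills'] Cheryl Mills
```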
5.2. RESULTS AND DISCUSSIONS
a) WordCloud

The word cloud shows the most frequent words in all of Hillary Clinton's emails.
Some of the most used words are as follows:-
• United states
• Government
• Said
• People
• One
• Work
• Obama etc…
b) EPT Counter

The graph above shows how each sender refers to the President. Only Mrs. Clinton
uses the abbreviation "POTUS", while Mr. Blumenthal is more likely than the others
to use just the president's first name. Overall, "Obama" is the most popular, while
very few senders speak informally in their emails by calling him "Barack".

In the graph above, we can see that Mrs. Clinton is quite courteous in her emails
(although she does prefer "pls" to "please"). We also see that Huma Abedin is
more concerned with administrative tasks than her counterpart or her superior,
Cheryl Mills.
In the above graphs, you can see that "Benghazi," "Libya," and "attack" appear
in surprisingly few emails. One would imagine that the majority of emails
concerning those topics remain unreleased or classified. But finer details can be
gleaned: although Jake Sullivan and Huma Abedin both held the title of Deputy Chief
of Staff, the content of these emails suggests that Huma was less involved in the
affairs in Benghazi.
c) Sentiment Analysis
The pie chart above shows the percentages of different sentiments in Hillary's
inbox.
Positive sentiment ranges from 0.1 to 1
Negative sentiment ranges from -1 to -0.1
Neutral sentiment ranges from -0.1 to 0.1
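The three ranges above translate into a small classification function. This is a minimal sketch. The stated ranges overlap at their endpoints, so the handling of the boundary values is our assumption: a score of exactly 0.1 counts as positive and exactly -0.1 as negative. The resulting percentages are what a pie chart, for example Matplotlib's `plt.pie`, would display.

```python
from collections import Counter

def label(polarity):
    """Map a polarity in [-1, 1] to the report's three categories."""
    if polarity >= 0.1:
        return "positive"
    if polarity <= -0.1:
        return "negative"
    return "neutral"

def sentiment_shares(polarities):
    """Percentage of emails in each category."""
    counts = Counter(label(p) for p in polarities)
    total = len(polarities)
    return {k: 100 * counts[k] / total for k in ("positive", "neutral", "negative")}

# Hypothetical polarity scores
print(sentiment_shares([0.5, 0.2, 0.0, 0.05, -0.3, 0.1, 0.9, -0.05]))
# {'positive': 50.0, 'neutral': 37.5, 'negative': 12.5}
```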

By analyzing all the data, we found that most of the emails have positive or
neutral sentiment. Only a very small percentage of emails had negative sentiment.

The above graph shows the average sentiment of the people who sent the most
emails to Hillary.
Mrs. Clinton's emails were largely positive, with peaks at 0.2 and 0.5. This isn't
surprising given all the "please" and "thank you" phrases that we found she uses in
her emails. On the right, we see the average sentiment of the emails sent to each of
the recipients along the x-axis. While Mrs. Clinton is positive with everyone, she
is most positive with Cheryl Mills.
The above graph shows the average sentiment of Hillary's emails sent to others.
