Sin Review-2 PDF
NETWORK
Final Review Report
Submitted by
19BCE2378 – R.JOSHITHA
19BCE2397 – V.HIMA SINDHU
19BCE2407 – S.SANJANA
Prepared For
SOCIAL AND INFORMATION NETWORKS
CSE3021
Submitted To
Prof. Govinda K
School of Computer Science and Engineering
1. LITERATURE SURVEY
Summary:-
Online Social Networks are becoming a fundamental medium for humans to
manage their social life. However, the structure of ego networks in these virtual
environments has not yet been investigated. The authors contribute to filling
this gap by analyzing a large data set of Facebook relationships.
Methodology:-
They filter the data to obtain the frequency of contact of the relationships, and
they check - by using different clustering techniques - whether structures similar
to those found in offline social networks can be observed.
Dataset details:-
A large data set of Facebook relationships.
Relevant Finding:-
The results show a striking similarity between the social structures in offline
and Online Social Networks.
Limitations/Future Work identified in the Survey :-
These results strongly suggest that, even if the ways to communicate and to
maintain social relationships are changing due to the diffusion of Online Social
Networks, the way people organize their social relationships seems to remain
unaltered.
Hu, J., Liu, M., & Zhang, J. (2014, August). A semantic model for academic social network
analysis. In Proceedings of the 2014 IEEE/ACM International Conference on Advances in
Social Networks Analysis and Mining (pp. 310-313). IEEE Press.
Summary:-
They have proposed a semantic model that can naturally represent various
academic social networks. They use a concise language to represent and query
academic social networks.
Dataset:-
They use social media platforms as the source of their dataset.
Relevant Finding:-
They have found various complex semantic relationships among social actors.
Limitations/Future Work identified in the Survey :-
This model can be used as the foundation for managing, manipulating and
querying academic social networks.
Homogeneity of social networks by age and marital status: A multilevel analysis of ego-centered
networks
Kalmijn, M., & Vermunt, J. K. (2007). Homogeneity of social networks by age and marital
status: A multilevel analysis of ego-centered networks. Social Networks, 29(1), 25-43.
Summary:-
The paper examines the homogeneity of social networks by age and marital
status, addressing the question of whether there is selection in networks.
Methodology:-
They use a novel analytical approach, adopting a latent-class-type random-effects
approach to the multilevel structure of the network data, which allows for
simple descriptions of homogeneity in terms of odds ratios.
Dataset:-
A representative survey containing data on contact and support networks.
Relevant Findings:-
The analyses show that age boundaries are strong and that they partly explain
marital status boundaries.
Limitations/Future Work identified in the Survey :-
The authors note a tendency of alters to be more similar to each other than one
would expect from their similarity to ego.
2. OVERVIEW OF PROPOSED SYSTEM
2.1 Introduction
Email ego networks have been studied in social network analysis to understand the role
an individual plays and how they interact with others. Typical analysis involves
calculating various graph properties such as density, degree centrality, closeness
centrality, betweenness centrality, and the number of cliques and components,
and clustering alters into groups. The expectation is that the clusters would
correspond to natural groups such as family or other groups based on school,
religion, and hobby. However, these studies are usually limited in scope. Often,
only a small number of ego networks are examined. Ego network information is
obtained via a questionnaire by asking individuals about people they know and in
what context.
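The graph properties listed above can be illustrated on a toy ego network. The following is a minimal, dependency-free sketch, not the code used in this project: the `contacts` graph and all function names are hypothetical, and only density, degree centrality, and the components among the alters are covered. A graph library such as NetworkX would normally be used for these and for the remaining measures.

```python
def ego_network(adjacency, ego):
    """Return the ego plus its alters, and the undirected edges among them."""
    nodes = {ego} | set(adjacency[ego])
    edges = {
        frozenset((u, v))
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes and u != v
    }
    return nodes, edges


def density(nodes, edges):
    """Fraction of possible undirected edges that are present."""
    n = len(nodes)
    possible = n * (n - 1) / 2
    return len(edges) / possible if possible else 0.0


def degree_centrality(nodes, edges):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(nodes)
    degree = {node: 0 for node in nodes}
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return {node: (d / (n - 1) if n > 1 else 0.0) for node, d in degree.items()}


def alter_components(nodes, edges, ego):
    """Connected components among the alters once the ego is removed."""
    alters = nodes - {ego}
    neighbours = {a: set() for a in alters}
    for edge in edges:
        if ego not in edge:
            u, v = tuple(edge)
            neighbours[u].add(v)
            neighbours[v].add(u)
    seen, components = set(), []
    for start in sorted(alters):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy contact graph (undirected, stored as adjacency sets).
contacts = {
    "ego": {"a", "b", "c", "d"},
    "a": {"ego", "b"},
    "b": {"ego", "a"},
    "c": {"ego", "d"},
    "d": {"ego", "c", "x"},
    "x": {"d"},
}
nodes, edges = ego_network(contacts, "ego")
print(density(nodes, edges))
print(alter_components(nodes, edges, "ego"))
```

In this toy graph the alters split into two components, which is the kind of grouping one would hope corresponds to natural groups such as family or school.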
Another question of interest is whether we can understand the global properties
of the network by looking at the network properties of a sample of ego networks.
Often, we may only have data on a sample of the population and we would like
to understand the properties of the whole network from that sample. In addition,
certain measures may be too difficult to calculate directly for the entire network
but can be approximated from a sample of ego networks. In other cases, some
measures are not necessarily meant for a large network or may simply be different
depending on the ego. For instance, how to cluster nodes into groups may differ
depending on the ego since each individual may want to group people they know
differently because people play different roles in different people's lives.
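As one possible illustration of approximating a global measure from a sample of ego networks, the sketch below estimates the average local clustering coefficient by averaging over randomly chosen egos. The choice of measure, the toy `graph`, and the function names are our assumptions for illustration only.

```python
import random


def local_clustering(adjacency, ego):
    """Fraction of pairs of the ego's alters that are themselves connected."""
    alters = sorted(adjacency[ego])
    k = len(alters)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if alters[j] in adjacency[alters[i]]
    )
    return links / (k * (k - 1) / 2)


def estimate_average_clustering(adjacency, sample_size, seed=0):
    """Average the local clustering over a random sample of egos."""
    rng = random.Random(seed)
    egos = rng.sample(sorted(adjacency), sample_size)
    return sum(local_clustering(adjacency, e) for e in egos) / sample_size


# Hypothetical toy graph (undirected, stored as adjacency sets).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
exact = estimate_average_clustering(graph, sample_size=len(graph))
approx = estimate_average_clustering(graph, sample_size=2)
print(exact, approx)
```

Sampling every node reproduces the exact average, while a smaller sample gives an estimate whose accuracy depends on how representative the sampled egos are.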
3.2. Framework, Architecture or Modules of the Proposed System
Two functions are used to create a count of emails that contain particular topics.
The first function takes a single person and creates a count of emails that contain
each of the given keywords. The second uses that function to complete that task
for all the selected email senders.
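The two counting functions described above could look roughly like the following sketch. The email representation (dictionaries with `sender` and `body` keys), the function names, and the case-insensitive substring matching are assumptions made for illustration. They are not taken from the project code.

```python
def count_topic_emails(emails, sender, keywords):
    """Count, per keyword, how many of one sender's emails mention it."""
    counts = {keyword: 0 for keyword in keywords}
    for email in emails:
        if email["sender"] != sender:
            continue
        body = email["body"].lower()
        for keyword in keywords:
            # Simple case-insensitive substring match (an assumption).
            if keyword.lower() in body:
                counts[keyword] += 1
    return counts


def count_topics_for_senders(emails, senders, keywords):
    """Apply the single-sender count to every selected sender."""
    return {s: count_topic_emails(emails, s, keywords) for s in senders}


# Hypothetical sample data.
emails = [
    {"sender": "A", "body": "Please call POTUS today"},
    {"sender": "A", "body": "Obama said thanks; please confirm"},
    {"sender": "B", "body": "Obama will attend"},
]
table = count_topics_for_senders(emails, ["A", "B"], ["Obama", "POTUS", "please"])
print(table)
```

Each email is counted at most once per keyword, so the result is a count of emails containing the keyword rather than a count of occurrences.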
Make Sentiment
Sentiment is determined with TextBlob. Two functions are necessary as the first
creates the data that will populate the first sentiment graph, which shows the
density of sentiments among a given sender's corpus of emails. The second
determines the sentiment by recipient. The extra argument, "personlist", is
populated with the selected recipients from the dropdown menu on the
application.
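A minimal sketch of the two sentiment functions is given below. To keep it runnable without extra packages, the polarity scorer is passed in as a parameter and a toy scorer stands in for TextBlob. With TextBlob installed, one could pass `lambda text: TextBlob(text).sentiment.polarity` instead. The function names and the `sender`, `recipient`, and `body` fields are assumptions.

```python
from statistics import mean


def sender_polarities(emails, sender, polarity_fn):
    """Polarity of every email written by `sender` (input for a density plot)."""
    return [polarity_fn(e["body"]) for e in emails if e["sender"] == sender]


def sentiment_by_recipient(emails, sender, personlist, polarity_fn):
    """Mean polarity of `sender`'s emails to each recipient in `personlist`."""
    result = {}
    for person in personlist:
        scores = [
            polarity_fn(e["body"])
            for e in emails
            if e["sender"] == sender and e["recipient"] == person
        ]
        if scores:  # skip recipients who received no emails
            result[person] = mean(scores)
    return result


def toy_polarity(text):
    """Stand-in scorer (hypothetical): +1 per 'thanks', -1 per 'bad', per word."""
    words = text.lower().split()
    return (words.count("thanks") - words.count("bad")) / max(len(words), 1)


# Hypothetical sample data.
emails = [
    {"sender": "H", "recipient": "C", "body": "thanks thanks"},
    {"sender": "H", "recipient": "C", "body": "bad news"},
    {"sender": "H", "recipient": "J", "body": "ok"},
]
print(sender_polarities(emails, "H", toy_polarity))
print(sentiment_by_recipient(emails, "H", ["C", "J", "Z"], toy_polarity))
```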
Visualisation
The sentiments obtained are visualized in a pie chart with the following
sections: positive, negative, and neutral. The EPT Counter results are also visualized.
5.1 Implementation
4) Sentiment Analysis
i. Get the names of all unique people who sent mails to Hillary and make a
list of the people with high frequency.
ii. Compute the mean sentiment of each person and find who was most
positive toward Hillary. A bar chart is plotted to visualize the
result.
iii. The total positive, negative, and neutral cases are plotted using a pie chart.
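Steps i and ii can be sketched as follows, assuming each email already carries a precomputed `polarity` score. The field names, the `min_emails` cut-off, and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def frequent_senders(emails, min_emails):
    """Senders with at least `min_emails` emails, most frequent first."""
    counts = Counter(e["sender"] for e in emails)
    return [s for s, n in counts.most_common() if n >= min_emails]


def mean_sentiment_per_sender(emails, senders):
    """Mean polarity of the emails sent by each of the given senders."""
    return {
        s: mean(e["polarity"] for e in emails if e["sender"] == s)
        for s in senders
    }


# Hypothetical sample data with precomputed polarity scores.
emails = [
    {"sender": "A", "polarity": 0.4},
    {"sender": "A", "polarity": 0.2},
    {"sender": "B", "polarity": 0.1},
    {"sender": "B", "polarity": -0.3},
    {"sender": "C", "polarity": 0.9},
]
top = frequent_senders(emails, min_emails=2)
means = mean_sentiment_per_sender(emails, top)
most_positive = max(means, key=means.get)
print(top, means, most_positive)
```

Sender "C" has the highest single score but is excluded because of the frequency cut-off, which is why the filter in step i matters for the result of step ii.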
5.2. RESULTS AND DISCUSSIONS
a) WordCloud
The word cloud shows the most frequently occurring words in all of Hillary Clinton's emails.
Some of the most used words are as follows:-
• United States
• Government
• Said
• People
• One
• Work
• Obama etc…
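The word frequencies behind such a word cloud can be approximated with a simple counter. The sketch below uses a tiny illustrative stop-word list and hypothetical sample texts. It counts single words only, so a phrase such as "United States" would need bigram counting.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "of", "to", "and", "in", "is"}


def top_words(texts, n):
    """Return the n most frequent non-stop-words across all texts."""
    words = []
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(w for w in tokens if w not in STOPWORDS)
    return Counter(words).most_common(n)


# Hypothetical sample texts.
texts = ["The government said the people work", "People said Obama said"]
print(top_words(texts, 2))
```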
b) EPT Counter
The graph above shows how each person refers to the President. Only Mrs. Clinton
uses the shortened "POTUS", while Mr. Blumenthal is more likely than the rest to
use just the president's first name. Overall, "Obama" is the most popular, while
very few speak informally in their emails by calling him "Barack".
In the graph above, we can see that Mrs. Clinton is quite courteous in her emails
(although she does prefer "pls" to "please"). We also see that Huma Abedin is
more concerned with administrative tasks than her counterpart or her superior,
Cheryl Mills.
In the above graphs, you can see that "Benghazi," "Libya," and "attack" appear
in surprisingly few emails. One would imagine that the majority of emails
concerning those topics remain unreleased or classified. But finer details can be
gleaned: though both Jake Sullivan and Huma Abedin held the title of Deputy Chief
of Staff, the content of these emails suggests that Huma was less involved in the
affairs in Benghazi.
c) Sentiment Analysis
The pie chart above shows the percentages of different sentiments in Hillary's
inbox.
Positive sentiment ranges from 0.1 to 1
Negative sentiment ranges from -1 to -0.1
Neutral sentiment ranges from -0.1 to 0.1
By analyzing all the data, we found that most of the mails have positive or
neutral sentiment. Only a very small percentage of mails had negative sentiment.
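The sentiment ranges above can be turned into labels and pie-chart percentages as in the following sketch. Assigning the boundary values of exactly 0.1 and -0.1 to the neutral class is our assumption, since the ranges overlap at those points. The sample scores are hypothetical.

```python
from collections import Counter


def label_sentiment(polarity, threshold=0.1):
    """Map a polarity in [-1, 1] to a label; +/-threshold counts as neutral."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(polarities):
    """Percentage of emails in each sentiment class."""
    if not polarities:
        return {}
    counts = Counter(label_sentiment(p) for p in polarities)
    total = len(polarities)
    return {
        label: 100 * counts[label] / total
        for label in ("positive", "negative", "neutral")
    }


# Hypothetical polarity scores.
shares = sentiment_shares([0.5, 0.2, 0.0, 0.05, -0.4])
print(shares)
```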
The above graph shows the average sentiment of the people who sent the
most emails to Hillary.
Mrs. Clinton's emails were largely positive, with peaks at 0.2 and 0.5. This isn't
surprising given all the "please" and "thank you's" that we discovered she uses in
her emails. On the right we see the average sentiment of emails sent to each of
the recipients along the x-axis. While Mrs. Clinton is positive with everyone, she
is the most positive with Cheryl Mills.
The above graph shows the average sentiment of Hillary's emails sent to others.