SNS Unit III
Extracting evolution of Web Community from a Series of Web Archive, Detecting communities in
social networks, Definition of community, Evaluating communities, Methods for community detection
and mining, Applications of community mining algorithms, Tools for detecting communities social
network infrastructures and communities, Big data and Privacy.
Types of Changes
⚫ Emerge
◦ A community c(tk) emerges in C(tk), when c(tk) shares no URLs with any
community in C(tk−1).
⚫ Dissolve
◦ A community c(tk−1) in C(tk−1) has dissolved when c(tk−1) shares no URLs with
any community in C(tk).
⚫ Growth and Shrink
◦ The community grows when new URLs appear in c(tk), and shrinks when
URLs disappear from c(tk−1).
⚫ Split
◦ A community c(tk−1) splits when it shares URLs with multiple communities in C(tk).
⚫ Merge
◦ When multiple communities (c(tk−1), d(tk−1), ...) share URLs with a single
community e(tk), these communities are merged into e(tk).
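The change types above can be sketched in code. This is a minimal illustration under stated assumptions, not the original authors' implementation: each snapshot is assumed to be a dictionary mapping a hypothetical community name to its set of URLs, and `classify_changes` is a name introduced here. Growth and shrinkage are left out because they are a matter of degree rather than a yes/no event.

```python
def classify_changes(prev, curr):
    """Label emerge / dissolve / split / merge events between two snapshots.

    prev plays the role of C(tk-1) and curr the role of C(tk);
    both map a community name to its set of URLs.
    """
    # For every earlier community, the later communities it shares URLs with.
    successors = {p: [c for c, urls in curr.items() if urls & prev[p]] for p in prev}
    # For every later community, the earlier communities it shares URLs with.
    predecessors = {c: [p for p, urls in prev.items() if urls & curr[c]] for c in curr}

    events = []
    for c, preds in predecessors.items():
        if not preds:
            events.append(("emerge", c))
        elif len(preds) > 1:
            events.append(("merge", tuple(sorted(preds)), c))
    for p, succs in successors.items():
        if not succs:
            events.append(("dissolve", p))
        elif len(succs) > 1:
            events.append(("split", p, tuple(sorted(succs))))
    return events


# Toy snapshots (made-up URLs, for illustration only).
prev = {"a": {"u1", "u2", "u3", "u4"}, "b": {"u5"}, "d": {"u6"}, "x": {"u9"}}
curr = {"a1": {"u1", "u2"}, "a2": {"u3", "u4"}, "e": {"u5", "u6"}, "n": {"u8"}}
events = classify_changes(prev, curr)
```

In this toy input, `a` splits into `a1` and `a2`, `b` and `d` merge into `e`, `n` emerges, and `x` dissolves.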
Evolution Metrics
Evolution metrics measure how a particular community c(tk) has evolved. The metrics are
defined by differences between c(tk) and its corresponding community c(tk−1).
Growth Rate
The growth rate, Rgrow(c(tk−1), c(tk)), represents the increase of URLs per unit time. It allows us
to find the fastest growing or shrinking communities.
Stability
Stability represents the number of disappeared, appeared, merged, and split URLs per unit time. A
stable community on a topic is the best starting point for finding interesting changes around the topic.
Disappearance rate
The number of URLs that disappeared from c(tk−1) per unit time. A higher disappearance rate means
that the community has lost URLs mainly by disappearance.
Merge rate
The number of URLs absorbed from other communities by merging per unit time. A higher merge
rate means that the community has obtained URLs mainly by merging.
Split Rate
The split rate, Rsplit (c(tk−1), c(tk)), is the number of split URLs from c(tk−1) per unit time. When
the split rate is low, c(tk) is larger than other split communities. Otherwise, c(tk) is smaller than
other split communities.
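The rate metrics described above can be illustrated with set arithmetic. The sketch below is one plausible reading of the prose definitions, not the exact formulas of the original work: communities are sets of URLs, `dt` is the elapsed time tk − tk−1, and the function names and the `all_prev_urls` / `all_curr_urls` arguments (the union of all community URLs at each time) are assumptions introduced here.

```python
def growth_rate(prev, curr, dt):
    """Increase in the number of URLs per unit time (negative when shrinking)."""
    return (len(curr) - len(prev)) / dt


def disappearance_rate(prev, all_curr_urls, dt):
    """URLs of prev that belong to no community at the later time, per unit time."""
    return len(prev - all_curr_urls) / dt


def split_rate(prev, curr, all_curr_urls, dt):
    """URLs of prev that moved to later communities other than curr, per unit time."""
    return len((prev & all_curr_urls) - curr) / dt


def merge_rate(prev, curr, all_prev_urls, dt):
    """URLs that curr absorbed from earlier communities other than prev, per unit time."""
    return len((curr & all_prev_urls) - prev) / dt


# Toy data (made-up URLs): u4 moved to another community, u5 vanished,
# u6 came from another earlier community, u7..u9 are new.
prev = {"u1", "u2", "u3", "u4", "u5"}
curr = {"u1", "u2", "u3", "u6", "u7", "u8", "u9"}
all_prev_urls = prev | {"u6"}
all_curr_urls = curr | {"u4"}
dt = 2.0  # e.g. two months between snapshots

rates = {
    "grow": growth_rate(prev, curr, dt),
    "disappear": disappearance_rate(prev, all_curr_urls, dt),
    "split": split_rate(prev, curr, all_curr_urls, dt),
    "merge": merge_rate(prev, curr, all_prev_urls, dt),
}
```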
Other Metrics
The novelty metric of a main line (c(ti), c(ti+1), ..., c(tj)) is calculated as follows.
Combining evolution metrics and relevance, evolution around a particular community can be
located. The size distribution of communities followed a power law, and its exponent did not
change much over time.
Definition of community
The word “community” intuitively means a subnetwork whose internal edges (intracommunity
edges) are denser than the edges connecting it to the rest of the network (intercommunity edges).
Definitions of community can be classified into the following three categories.
Local definitions
Global definitions
Definitions based on vertex similarity.
Local definitions
The attention is focused on the vertices of the subnetwork under investigation and on its
immediate neighborhood. Local definitions of community can be further divided into self-
referring ones and comparative ones.
The former considers the subnetwork alone, and the latter compares mutual connections of
the vertices of the subnetwork with the connections with external neighbors.
Examples of self-referring definitions are the clique (a maximal subnetwork where each
vertex is adjacent to all the others), the n-clique (a maximal subnetwork such that the distance
between each pair of vertices is not larger than n), and the k-plex (a maximal subnetwork such
that each vertex is adjacent to all the others except at most k of them).
Examples of comparative definitions are the LS set (a subnetwork where each vertex
has more neighbors inside than outside of the subnetwork) and the weak community (the total
internal degree of the vertices of the community exceeds the number of edges lying
between the community and the rest of the network).
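Some of these local definitions can be checked directly on a small graph. The sketch below is illustrative only: the graph is assumed to be a dictionary mapping each vertex to its set of neighbors, the function names are introduced here, and maximality (required by the clique definition) is deliberately not checked.

```python
def is_clique(adj, nodes):
    """True if every vertex in nodes is adjacent to all the others (maximality not checked)."""
    nodes = set(nodes)
    return all(nodes - {v} <= adj[v] for v in nodes)


def has_more_internal_neighbors(adj, nodes):
    """Comparative test from the text: each vertex has more neighbors inside than outside."""
    nodes = set(nodes)
    return all(len(adj[v] & nodes) > len(adj[v] - nodes) for v in nodes)


def is_weak_community(adj, nodes):
    """Total internal degree exceeds the number of edges leaving the subnetwork."""
    nodes = set(nodes)
    internal = sum(len(adj[v] & nodes) for v in nodes)
    external = sum(len(adj[v] - nodes) for v in nodes)
    return internal > external


# Two triangles joined by the single edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

checks = {
    "triangle_is_clique": is_clique(adj, {1, 2, 3}),
    "four_is_clique": is_clique(adj, {1, 2, 3, 4}),
    "triangle_internal": has_more_internal_neighbors(adj, {1, 2, 3}),
    "triangle_weak": is_weak_community(adj, {1, 2, 3}),
    "bridge_weak": is_weak_community(adj, {3, 4}),
}
```

Each triangle passes all three tests, while the pair {3, 4} around the bridge fails the weak-community test because it has more outgoing than internal connections.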
Global definitions
Global definitions of community characterize a subnetwork with respect to the network as
a whole. These definitions usually start from a null model, in other words, a
network which matches the original network in some of its topological features, but which
does not display community structure.
The simplest way to design a null model is to introduce randomness in the distribution of
edges among vertices.
The most popular null model consists of a randomized version of the original network,
where edges are rewired at random under the constraint that each vertex keeps its degree.
This null model is the basic concept behind the definition of modularity.
Definitions Based on Vertex Similarity
Definitions of the last category are based on the assumption that communities are groups of vertices
which are similar to each other. Some quantitative criterion is employed to evaluate the similarity
between each pair of vertices. Similarity measures are the basis of the method of hierarchical
clustering. Hierarchical clustering is a way to find several layers of communities that are composed
of vertices similar to each other.
Repeated merges of similar vertices based on some quantitative similarity measure generate
the structure shown in Fig. 3.a. This structure is called a dendrogram, and highly similar vertices are
connected in the lower part of the dendrogram. Subtrees obtained by cutting the dendrogram with
a horizontal line correspond to communities. Communities of different granularity are obtained
by changing the position of the horizontal line.
Figure 3.a Dendrogram
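The merge-and-cut procedure can be sketched as follows. The notes do not fix a similarity measure or a linkage rule, so this example assumes Jaccard similarity of closed neighborhoods and single linkage; both choices, and all function names, are illustrative assumptions.

```python
from itertools import combinations


def jaccard(adj, u, v):
    """Similarity of two vertices: overlap of their closed neighborhoods."""
    a, b = adj[u] | {u}, adj[v] | {v}
    return len(a & b) / len(a | b)


def single_linkage(adj):
    """Return the merge history (the dendrogram) as (similarity, cluster_a, cluster_b)."""
    clusters = [frozenset([v]) for v in adj]
    merges = []
    while len(clusters) > 1:
        # Cluster similarity = best vertex similarity across the two clusters.
        sim, a, b = max(
            ((max(jaccard(adj, u, v) for u in a for v in b), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        merges.append((sim, a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut(adj, merges, threshold):
    """Communities obtained by cutting the dendrogram at a similarity level."""
    clusters = {frozenset([v]) for v in adj}
    for sim, a, b in merges:
        if sim < threshold:
            break  # single-linkage merge similarities never increase
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters


# Two triangles joined by the single edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
merges = single_linkage(adj)
coarse = cut(adj, merges, 0.5)
fine = cut(adj, merges, 0.9)
```

Lowering the threshold corresponds to moving the horizontal line up the dendrogram: the cut at 0.5 yields the two triangles, while the cut at 0.9 yields smaller groups.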
Evaluating Communities
It is necessary to establish which partitions exhibit a real community structure. Therefore, a quality
function is needed for evaluating how good a partition is. The most popular quality function is the
modularity of Newman and Girvan:
$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(C_i, C_j)$$
where the sum runs over all pairs of vertices, A is the adjacency matrix, k_i is the degree of vertex
i, m is the total number of edges of the network, and \delta(C_i, C_j) is 1 if vertices i and j belong
to the same community and 0 otherwise.
Modularity can be rewritten as follows:
$$Q = \sum_{s=1}^{n_m} \left[ \frac{l_s}{m} - \left( \frac{d_s}{2m} \right)^2 \right]$$
where n_m is the number of communities, l_s is the total number of edges joining vertices of
community s, and d_s is the sum of the degrees of the vertices of s.
The first term of each summand is the fraction of edges of the network inside the community,
whereas the second term represents the expected fraction of edges that would be there if the
network were a random network with the same degree for each vertex.
Figure 3.b illustrates the meaning of modularity.
Figure 3.b Modularity
The latter formula implicitly shows the definition of a community: a subnetwork is a community if
the number of edges inside it is larger than the expected number in modularity’s null model.
The modularity of the whole network, taken as a single community, is zero. Modularity is always
smaller than one, and it can be negative as well.
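The per-community form of modularity is easy to compute by hand or in code. The following sketch assumes an undirected, unweighted network given as an edge list and a partition given as a list of vertex sets; the function name is introduced here for illustration.

```python
def modularity(edges, communities):
    """Sum over communities of l_s / m - (d_s / 2m)^2."""
    m = len(edges)
    q = 0.0
    for s in communities:
        # l_s: edges with both endpoints inside community s.
        l_s = sum(1 for u, v in edges if u in s and v in s)
        # d_s: sum of degrees of the vertices of s (each edge endpoint in s counts once).
        d_s = sum((u in s) + (v in s) for u, v in edges)
        q += l_s / m - (d_s / (2 * m)) ** 2
    return q


# Two triangles joined by the single edge 3-4 (m = 7).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
q_split = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
q_whole = modularity(edges, [{1, 2, 3, 4, 5, 6}])
```

For each triangle, l_s = 3 and d_s = 7, so each contributes 3/7 − (7/14)^2, giving Q = 5/14 for the two-community partition. Taking the whole network as one community gives 7/7 − (14/14)^2 = 0, in line with the statement above.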
Community detection methods can be broadly categorized into two types: agglomerative
methods and divisive methods. In agglomerative methods, edges are added one by one to a graph
that initially contains only the nodes, from the strongest edge to the weakest. Divisive
methods do the opposite: edges are removed one by one from the complete (original) graph.
Modularity Optimization
Modularity is a quality function for evaluating partitions. Therefore, the partition
corresponding to its maximum value on a given network should be the best one. This is the
main idea behind modularity optimization. It has been proved that modularity optimization is an
NP-hard problem. However, several algorithms can find fairly good approximations of the
modularity maximum in a reasonable time. One well-known algorithm for modularity optimization
is the CNM algorithm. Other examples are greedy algorithms and simulated annealing.
Spectral Algorithms
Spectral algorithms cut a given network into pieces so that the number of edges to be
cut is minimized. One of the basic algorithms is spectral graph bipartitioning.
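A minimal sketch of spectral bipartitioning is given below. It assumes a connected, undirected graph stored as a dictionary of neighbor sets, and splits the vertices by the sign of an approximate Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian). Plain power iteration is used here only to keep the example free of dependencies; a real analysis would normally call a numerical eigen-solver. The function name and the iteration count are assumptions.

```python
from math import sqrt


def spectral_bipartition(adj, iterations=500):
    """Split a connected graph in two by the signs of an approximate Fiedler vector."""
    nodes = sorted(adj)
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    # Twice the maximum degree is an upper bound on the largest Laplacian eigenvalue.
    shift = 2 * max(len(adj[v]) for v in nodes)
    x = [float(i) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant eigenvector
        # y = (shift * I - L) x, where (L x)_v = deg(v) * x_v - sum of x over neighbors.
        y = [
            (shift - len(adj[v])) * x[index[v]] + sum(x[index[u]] for u in adj[v])
            for v in nodes
        ]
        norm = sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    left = {v for v in nodes if x[index[v]] < 0}
    return left, set(nodes) - left


# Two triangles joined by the single edge 3-4: the minimum cut removes that edge.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
parts = spectral_bipartition(adj)
```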
Other Algorithms
There are many other algorithms for detecting communities, such as the methods focusing on
random walk, and the ones searching for overlapping cliques.
Network Reduction
Network reduction is an important step in analyzing social networks. The example discussed here
is taken from the work in which the network was constructed from the bibliography of the book
entitled “graph products: structure and recognition”. The bibliography contains 360 papers written
by 314 authors. Its corresponding network is a bipartite graph, in which each node denotes either
one author or one paper, and link (i, j) represents author i publishing a paper j. Community structure
is detected using a community mining algorithm called ICS. Each community contains some
papers and their corresponding coauthors. Most of the detected communities are self-connected
components.
Moreover, the clustered coauthor network can be reduced into a much smaller one by condensing
each community as one node. Finally, the top-level condensed network corresponding to a 3-
community structure is constructed by using ICS from the condensed network. From this a
dendrogram corresponding to the original coauthor network can be built.
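The condensing step can be illustrated independently of ICS, whose details are not given here. The sketch below assumes a community label is already known for every node and simply collapses each community into one node, keeping a count of the original links between communities; `condense` and the toy labels are hypothetical.

```python
from collections import Counter


def condense(edges, membership):
    """Collapse each community into one node.

    membership maps every original node to its community label. The result maps
    each pair of distinct communities to the number of original edges between them.
    """
    weights = Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu != cv:  # edges inside a community vanish in the condensed network
            weights[tuple(sorted((cu, cv)))] += 1
    return weights


# Toy network with three communities A, B and C.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6), (6, 7), (2, 7), (7, 8)]
membership = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "C", 8: "C"}
condensed = condense(edges, membership)
```

Applying a community mining algorithm again to the condensed network, and repeating, yields the successive levels from which a dendrogram can be assembled.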
Discovering Scientific Collaboration Groups from Social Networks
This section shows how community mining techniques can be applied to the analysis of scientific
collaborations among researchers.
Flink
Flink is a social network that describes the scientific collaborations among 681 Semantic Web
researchers. It is also a system for the extraction, aggregation, and visualization of online social
networks. Flink employs semantic technology for reasoning with personal information extracted
from a number of electronic information sources, including web pages, emails, publication archives,
and FOAF profiles. The acquired knowledge is used for social network analysis and for generating
a web-based presentation of the community (http://flink.semanticweb.org/).
Mining Communities from Distributed and Dynamic Networks
Many applications involve distributed and dynamically evolving networks, in which resources and
controls are not only decentralized but also updated frequently. One promising solution is based
on an Autonomy-Oriented Computing (AOC) approach, in which a group of self-organizing
agents is utilized. The agents rely only on their locally acquired information about networks.
Intelligent Portable Digital Assistants (or iPDAs for short) that people carry around can form a
distributed network, in which their users communicate with each other through calls or messages.
One useful function of iPDAs would be to find and recommend new friends with common
interests, or potential partners in research or business, to the users.
This can be implemented through the following steps:
(1) based on an iPDA user’s communication traces, selecting individuals who have frequently
contacted or been contacted by the user during a certain period of time;
(2) taking the selected individuals as the input to an AOC-based algorithm;
(3) ranking and recommending to the user new persons who might not be included in the current
acquaintance book.
In such a way, people can periodically receive recommendations about friends or partners from
their iPDAs.
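The three steps can be sketched as a small pipeline. The AOC-based algorithm itself is not specified in these notes, so step (2) is replaced here by a simple stand-in that scores the frequent contacts of the user's own frequent contacts. The trace format (a list of caller/callee pairs), the threshold `min_count`, and all names are assumptions, and the sketch is centralized, whereas real agents would see only local information.

```python
from collections import Counter


def frequent_contacts(trace, user, min_count):
    """Step (1): people who contacted, or were contacted by, the user at least min_count times."""
    counts = Counter()
    for caller, callee in trace:
        if caller == user:
            counts[callee] += 1
        elif callee == user:
            counts[caller] += 1
    return {person for person, n in counts.items() if n >= min_count}


def recommend(trace, user, acquaintances, min_count=2):
    """Steps (2)-(3): score and rank people who are not yet in the acquaintance book."""
    scores = Counter()
    for seed in frequent_contacts(trace, user, min_count):
        # Stand-in for the AOC-based algorithm: look one hop beyond each seed.
        for person in frequent_contacts(trace, seed, min_count):
            if person != user and person not in acquaintances:
                scores[person] += 1
    return [person for person, _ in scores.most_common()]


# Made-up communication trace of (caller, callee) pairs.
trace = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("cat", "ann"),
    ("bob", "dan"), ("dan", "bob"), ("cat", "dan"), ("dan", "cat"),
    ("cat", "eve"), ("eve", "cat"), ("ann", "fay"),
]
suggestions = recommend(trace, "ann", acquaintances={"bob", "cat", "fay"})
```

Here `dan` is reachable through both of ann's frequent contacts and is therefore ranked ahead of `eve`.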
Many tools are available for social network mining. Some of the tools used for analysis are
listed below:
A. Gephi :
Gephi is open-source software for network visualization and analysis. It helps data analysts
intuitively reveal patterns and trends, highlight outliers, and tell stories with their data. It
uses a 3D render engine to display large graphs in real time and to speed up the exploration.
(Network visualization, also known as graph visualization or link analysis, is the process of
visually presenting networks of connected entities as links and nodes. Nodes represent data
points, and links represent the connections between them.)
B. Graphviz:
Graphviz is an open source graph visualization platform. It has different graph design
programs suitable for viewing social networks in interactive mode. Graph visualization is a
way of representing structural information as diagrams.
C. SONDY: An Open Source Platform for Social Dynamics Mining and Analysis. Social
dynamics (or sociodynamics) is the study of the behavior of groups and of the interactions of
individual group members, aiming to understand the emergence of complex social behaviors
among microorganisms, plants and animals, including humans.
SONDY is a tool for analyzing trends and dynamics in online social network data. SONDY
helps end users, such as media analysts or journalists, to understand the interests and activity
of social network users by providing emerging topics and event detection, as well as network
analysis functionalities.
E. SocNetV
F. Cytoscape
Cytoscape is an open-source platform for visualizing large and complex networks in a few clicks.
It integrates complex networks with any type of attribute data. It can be used to analyse social
networks, bioinformatics data, and the semantic web.
G. NodeXL
NodeXL provides powerful features for analysing social networks, including identification of
influential nodes, brand evaluation based on product performance, content analysis, and campaign
analysis. It automates different kinds of analysis on social networks. NodeXL makes it easy
to explore, analyse, and visualize network graphs.
H. NetMiner
NetMiner is not an open-source tool; it is a commercial tool for analysing social network features
based on input data. It allows users to explore network data visually and interactively. Many
other tools are available for graph mining as well as community detection and topic detection.
In an era where information is king and digital platforms dominate our lives, social media has become
an indispensable part of our daily routines. From connecting with friends and family to following
our favorite brands and staying updated on current events, it’s hard to imagine a world without the
constant scroll of social media feeds. However, the sheer volume of data generated by billions of
users can be overwhelming. This is where big data revolutionizes how we use social media,
making it more personalized, engaging, and effective.
Personalization
Personalization is the cornerstone of social media’s future, and big data plays a pivotal role. By
analyzing past user interactions, interests, and preferences, social media algorithms can offer tailored
experiences. Personalization keeps users hooked, from suggesting friends and groups to
showing content that matches user tastes. For businesses, this level of personalization enhances
their ability to target advertisements effectively, resulting in higher conversion rates and better
return on investment.
Sentiment Analysis
Big data’s prowess extends to sentiment analysis on social media. By examining the language and
emotions expressed in posts and comments, platforms can gauge public opinion on various
subjects, products, or brands. This information is invaluable for businesses and marketers seeking
to understand their audience’s perception. Sentiment analysis can also serve as an early warning
system for potential PR crises, enabling swift intervention and issue resolution.
Influencer Marketing
In the world of social media, influencer marketing is a game-changer. Big data aids in identifying
the most influential users within a niche or industry by analyzing their follower count, engagement
rates, and content impact. This information empowers businesses to collaborate with these
influencers, facilitating product or service promotion to a highly targeted and engaged
audience.
Improved Customer Support
Big data isn’t just about marketing; it can significantly enhance customer support on social media
platforms. Chatbots, driven by artificial intelligence and machine learning, leverage data to
comprehend user queries and provide rapid responses. These chatbots handle routine inquiries,
freeing human customer support agents to focus on complex issues. This not only slashes response
times but also elevates the overall customer experience.
Content Optimization
For content creators and marketers, big data is a goldmine of insights. Businesses can fine-tune
content strategies by scrutinizing metrics like click-through rates, conversion rates, and audience
demographics. This data-driven approach ensures the creation of content that resonates with the
target audience. It also helps content creators allocate resources more effectively, guaranteeing
the production of content that genuinely connects with their audience.
Billions of users across various platforms produce an enormous amount of data every day. This data,
often called “big data,” holds immense potential for businesses, marketers, and individuals seeking
to make the most of their social media presence. The following paragraphs describe how big data
is harnessed and utilized in social media.
Utilizing Big Data in Social Media begins with data collection and aggregation. Social media
platforms are designed to gather extensive information about user behavior. Every click, like, share,
comment, and post generates valuable data points. This data includes user demographics, content
preferences, interaction patterns, and sentiment analysis based on the language used in posts and
comments.
Social media companies employ advanced data collection methods to capture information, which is
stored and organized in massive databases. The scale of data collected is astounding and grows
exponentially with each passing second. This raw data forms the foundation upon which the power
of big data analytics is harnessed.
Once the data is collected, the real magic of big data happens through analysis and deriving
actionable insights. Advanced algorithms and machine learning techniques are used to process and
make sense of this vast sea of information. Here’s how it’s done:
User Behavior Analysis: Data analytics tools dissect user behavior patterns.
They determine what content users engage with the most, when they are most active, and
how they navigate the platform.
Personalization: Social media platforms use big data to create highly personalized
experiences. By analyzing a user’s past interactions and preferences, algorithms suggest
friends, groups, and content tailored to the individual’s interests.
Sentiment Analysis: Natural language processing algorithms are employed to understand
the sentiment in posts and comments. This can help gauge public opinion on various topics,
products, or brands.
Content Optimization: Data analytics provides insights into content performance for
businesses and content creators. Metrics such as click-through rates, conversion rates, and
audience demographics guide the creation of content that resonates with the target audience.
Predictive Analysis: Predictive analytics uses historical data to forecast future trends,
helping businesses anticipate customer needs and market fluctuations.
Influencer Marketing
In the age of social media, influencer marketing has gained immense prominence. Big data is pivotal
in identifying and partnering with the right influencers. By analyzing influencers’ follower
counts, engagement rates, and content impact, businesses can make informed decisions about who
to collaborate with. This approach ensures that products or services are promoted to a highly
targeted and engaged audience, maximizing the impact of influencer marketing campaigns.
Targeted Advertising
One of the most powerful applications of Big Data in Social Media is in the realm of advertising.
The granular data collected about users allows for highly targeted ad campaigns. Businesses
can define their ideal audience based on age, location, interests, and online behavior. This
precision minimizes wasted advertising spend and increases the likelihood of conversions.
Customer Support
Big data isn’t limited to marketing; it also greatly enhances customer support on social media
platforms. Chatbots, driven by artificial intelligence and machine learning, leverage data to
understand user queries and provide rapid responses. These chatbots handle routine inquiries, freeing
human customer support agents to focus on complex issues. This not only slashes response times
but also elevates the overall customer experience.
Monitoring and Crisis Management
In the age of social media, news and public opinion can spread like wildfire. Big data analytics
help businesses and organizations monitor their online reputation. They can detect potential PR
crises early on by tracking mentions, comments, and sentiment. This enables proactive crisis
management, preventing issues from escalating and minimizing reputational damage.
Content Recommendations
Social media platforms use big data to power content recommendation systems. These systems
suggest content to users based on their past interactions and interests. This feature keeps users
engaged and encourages them to spend more time on the platform, ultimately benefiting its
advertising revenue.
Competitive Analysis
Businesses can gain a competitive edge by analyzing the social media strategies of their competitors.
Big data lets them track competitors’ engagement rates, content performance, and audience
demographics. This information helps in benchmarking and refining their social media strategies.