SNS Unit Iii

Uploaded by

Adithyaraaj R.P.
UNIT III – EXTRACTION AND MINING IN SOCIAL NETWORKING DATA

Extracting evolution of Web Community from a Series of Web Archive, Detecting communities in
social networks, Definition of community, Evaluating communities, Methods for community detection
and mining, Applications of community mining algorithms, Tools for detecting communities social
network infrastructures and communities, Big data and Privacy.

Extracting evolution of Web Community from a Series of Web Archive


The extraction of Web community evolution uses a Web community chart: a graph of communities in
which related communities are connected by weighted edges. The main advantage of the Web
community chart is that it records the relevance between communities.
Notations Used
⚫ t1, t2, ..., tn: Times at which each archive was crawled. Currently, a month is used as the unit
of time.
⚫ W(tk): The Web archive at time tk.
⚫ C(tk): The Web community chart at time tk.
⚫ c(tk), d(tk), e(tk), ...: Communities in C(tk).

Types of Changes
⚫ Emerge
◦ A community c(tk) emerges in C(tk), when c(tk) shares no URLs with any
community in C(tk−1).
⚫ Dissolve
◦ A community c(tk−1) in C(tk−1) has dissolved when c(tk−1) shares no URLs with
any community in C(tk).
⚫ Growth and Shrink
◦ A community grows when new URLs appear in c(tk), and shrinks when
URLs disappear from c(tk−1).
⚫ Split
◦ A community c(tk−1) is split when it shares URLs with multiple communities in C(tk).
⚫ Merge
◦ When multiple communities (c(tk−1), d(tk−1), ...) share URLs with a single
community e(tk), these communities are merged into e(tk).
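
The change types above can be illustrated with a small sketch. It assumes each community chart is given as a dictionary mapping a community name to its set of URLs; the function name `classify_changes` and the sample charts are hypothetical and chosen only for illustration. Growth and shrink are left out because they need the corresponding-community relation.

```python
def classify_changes(prev, curr):
    """Classify changes between C(t_{k-1}) (prev) and C(t_k) (curr).

    Both arguments map a community name to its set of URLs (an assumed
    representation chosen for this sketch).
    """
    events = {"emerge": [], "dissolve": [], "split": [], "merge": []}

    for name, urls in sorted(curr.items()):
        # Communities at t_{k-1} that share at least one URL with this one.
        sources = sorted(p for p, p_urls in prev.items() if urls & p_urls)
        if not sources:
            events["emerge"].append(name)
        elif len(sources) > 1:
            events["merge"].append((sources, name))

    for name, urls in sorted(prev.items()):
        # Communities at t_k that share at least one URL with this one.
        targets = sorted(c for c, c_urls in curr.items() if urls & c_urls)
        if not targets:
            events["dissolve"].append(name)
        elif len(targets) > 1:
            events["split"].append((name, targets))

    return events


# Hypothetical sample charts for two consecutive archives.
prev_chart = {"a": {"u1", "u2", "u3", "u4"}, "b": {"u5"}, "c": {"u6"}, "d": {"u9"}}
curr_chart = {"x": {"u1", "u2"}, "y": {"u3", "u4"}, "z": {"u5", "u6"}, "w": {"u7"}}
changes = classify_changes(prev_chart, curr_chart)
print(changes)
```

In this sample, community a splits into x and y, communities b and c merge into z, d dissolves, and w emerges.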
Evolution Metrics
Evolution metrics measure how a particular community c(tk) has evolved. The metrics are
defined by differences between c(tk) and its corresponding community c(tk−1).
Growth Rate
The growth rate, Rgrow(c(tk−1), c(tk)), represents the increase of URLs per unit time. It allows us
to find the fastest growing or shrinking communities.
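
As a rough illustration, the growth rate can be computed as the change in the number of URLs divided by the elapsed time. This is an assumption based on the phrase "increase of URLs per unit time"; the function name, the community names and the URL sets below are hypothetical.

```python
def growth_rate(prev_urls, curr_urls, t_prev, t_curr):
    """Increase in the number of URLs per unit time (months in this sketch)."""
    return (len(curr_urls) - len(prev_urls)) / (t_curr - t_prev)


# Hypothetical pairs (c(t_{k-1}), c(t_k)) of corresponding communities.
pairs = {
    "news": ({"n1", "n2"}, {"n1", "n2", "n3", "n4", "n5", "n6"}),
    "blogs": ({"b1", "b2", "b3", "b4"}, {"b1"}),
}

# Archives crawled at month 1 and month 4.
rates = {name: growth_rate(p, c, t_prev=1, t_curr=4) for name, (p, c) in pairs.items()}
fastest_growing = max(rates, key=rates.get)
fastest_shrinking = min(rates, key=rates.get)
print(rates, fastest_growing, fastest_shrinking)
```

A negative value indicates a shrinking community, so sorting by this value ranks communities from fastest shrinking to fastest growing.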

Stability
Stability represents the amount of disappeared, appeared, merged and split URLs per unit time. A stable
community on a topic is the best starting point for finding interesting changes around the topic.

Disappearance rate
The number of URLs that disappeared from c(tk−1) per unit time. A higher disappearance rate means that the
community has lost URLs mainly by disappearance.

Merge rate
The number of URLs absorbed from other communities by merging per unit time. A higher merge
rate means that the community has obtained URLs mainly by merging.
Split Rate
The split rate, Rsplit (c(tk−1), c(tk)), is the number of split URLs from c(tk−1) per unit time. When
the split rate is low, c(tk) is larger than other split communities. Otherwise, c(tk) is smaller than
other split communities.

Other Metrics
The novelty metric of a main line (c(ti), c(ti+1), ..., c(tj)) is calculated as follows.

Web Archives and Graphs


⚫ Web archiving is the process of collecting portions of the Web to ensure the information
is preserved in an archive.
⚫ Web crawlers are used for automated capture because of the massive size and amount of
information on the Web.
⚫ From each archive, a Web graph is built with URLs and links by extracting anchors from
all pages in the archive.
⚫ The graph includes not only URLs inside the archive, but also URLs outside it that are pointed to
by inside URLs.
⚫ Comparing these graphs shows that the Web is extremely dynamic.
The size distribution of communities also follows the power law, and its exponent did not change
much over time. Although the size distribution of communities is stable, the structure of
communities changes dynamically. The structure of the chart changes mainly by split and merge,
in which more than half of the communities are involved.

Split and Merged Communities


⚫ The distributions of the split rate and the merge rate both roughly follow the power law, and show
that the split or merge rate is small in most cases.
⚫ Their shapes and scales are also similar.
⚫ This symmetry is part of the reason why the size distribution of communities does not
change much.
Emerged and Dissolved Communities
⚫ The size distributions of emerged and dissolved communities also follow the power law.
⚫ These distributions contribute to preserving the size distribution of communities.
⚫ Small communities emerge and dissolve easily.
Growth Rate
⚫ The growth rate is small for most communities, and the graph has clear y-axis
symmetry.
⚫ This symmetry also helps preserve the size distribution of communities over time.

By combining evolution metrics and relevance, evolution around a particular community can be
located. The size distribution of communities followed the power law, and its exponent did not
change much over time.

Detecting communities in social networks


Detecting communities in given social networks is practically important for the following
reasons:
1. Communities can be used for information recommendation because members of the
communities often have similar tastes and preferences. Membership of detected communities will
be the basis of collaborative filtering.
2. Communities will help us understand the structures of given social networks. Communities are
regarded as components of given social networks, and they will clarify the functions and properties
of the networks.
3. Communities will play important roles when we visualize large-scale social networks.
Relations between the communities clarify the processes of information sharing and information
diffusion, and they may give us some insights into the future growth of the networks.

Definition of community

Community structure is a property of many networks: a particular network may contain multiple
communities such that nodes inside a community are densely connected. Communities can overlap,
so a node may belong to more than one of them. Think of your
Facebook or Instagram account and consider who you interact with daily. You might be heavily
interacting with your friends, colleagues, family members and a few other important people in
your life. They form a very dense community inside your social network. In general, nodes are
tightly connected in close-knit groups within communities and loosely connected between
communities.

The word “community” intuitively means a subnetwork in which the edges inside it
(intracommunity edges) are denser than the edges connecting it to the outside (intercommunity edges).
Definitions of community can be classified into the following three categories.
⚫ Local definitions
⚫ Global definitions
⚫ Definitions based on vertex similarity
Local definitions
⚫ The attention is focused on the vertices of the subnetwork under investigation and on its
immediate neighborhood. Local definitions of community can be further divided into
self-referring ones and comparative ones.
⚫ The former consider the subnetwork alone, and the latter compare the mutual connections of
the vertices of the subnetwork with their connections to external neighbors.
⚫ Examples of self-referring definitions are the clique (a maximal subnetwork where each
vertex is adjacent to all the others), the n-clique (a maximal subnetwork such that the distance
between each pair of vertices is not larger than n), and the k-plex (a maximal subnetwork such that
each vertex is adjacent to all the others except at most k of them).
⚫ Examples of comparative definitions are the LS set (a subnetwork where each vertex
has more neighbors inside than outside of the subnetwork) and the weak community (the total
degree of the vertices inside the community exceeds the number of edges lying
between the community and the rest of the network).
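
Two of the local definitions above can be checked mechanically. The sketch below is only illustrative: it assumes the network is stored as a dictionary of neighbor sets, the function names are hypothetical, and the clique check tests only mutual adjacency, not maximality.

```python
def is_clique(adj, nodes):
    """True if every vertex in `nodes` is adjacent to all the others.

    Maximality (no further vertex can be added) is not checked here.
    """
    nodes = set(nodes)
    return all(nodes - {v} <= adj[v] for v in nodes)


def is_weak_community(adj, nodes):
    """True if the total internal degree exceeds the number of boundary edges."""
    nodes = set(nodes)
    internal = sum(len(adj[v] & nodes) for v in nodes)  # each inner edge counted twice
    external = sum(len(adj[v] - nodes) for v in nodes)  # edges leaving the subnetwork
    return internal > external


# Hypothetical network: two triangles joined by the single edge 3-4.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(is_clique(adj, {1, 2, 3}), is_clique(adj, {3, 4, 5}))
print(is_weak_community(adj, {1, 2, 3}), is_weak_community(adj, {3, 4}))
```

Here the triangle {1, 2, 3} satisfies both definitions, while {3, 4} fails the weak-community test because it has two internal degree units against four boundary edges.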
Global definitions
⚫ Global definitions of community characterize a subnetwork with respect to the network as
a whole. These definitions usually start from a null model, in other words, a
network which matches the original network in some of its topological features, but which
does not display community structure.
⚫ The simplest way to design a null model is to introduce randomness in the distribution of
edges among vertices.
⚫ The most popular null model consists of a randomized version of the original network,
where edges are rewired at random under the constraint that each vertex keeps its degree.
This null model is the basic concept behind the definition of modularity.
Definitions Based on Vertex Similarity
Definitions of the last category are based on the assumption that communities are groups of vertices
that are similar to each other. Some quantitative criterion is employed to evaluate the similarity
between each pair of vertices. Similarity measures are the basis of the method of hierarchical
clustering. Hierarchical clustering is a way to find several layers of communities that are composed
of vertices similar to each other.
Repeated merges of similar vertices based on some quantitative similarity measure generate
the structure shown in Fig. 3.a. This structure is called a dendrogram, and highly similar vertices are
connected in the lower part of the dendrogram. Subtrees obtained by cutting the dendrogram with a
horizontal line correspond to communities. Communities of different granularity are obtained
by changing the position of the horizontal line.
Figure 3.a Dendrogram
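
The idea of merging similar vertices and then cutting the dendrogram can be sketched as follows. This is a minimal illustration under stated assumptions, not a method prescribed by these notes: vertex similarity is taken to be the Jaccard similarity of closed neighborhoods, clusters are compared by single linkage, and the `threshold` parameter plays the role of the horizontal cut line. All names are hypothetical.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of the closed neighborhoods of u and v."""
    a, b = adj[u] | {u}, adj[v] | {v}
    return len(a & b) / len(a | b)


def cut_dendrogram(adj, threshold):
    """Merge the most similar clusters until no pair reaches `threshold`."""
    clusters = [{v} for v in sorted(adj)]

    def linkage(c1, c2):
        # Single linkage: similarity of the most similar vertex pair.
        return max(jaccard(adj, u, v) for u in c1 for v in c2)

    while len(clusters) > 1:
        best, i, j = max(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break  # the cut line has been reached
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical network: two triangles joined by the single edge 3-4.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
fine = cut_dendrogram(adj, threshold=0.5)
coarse = cut_dendrogram(adj, threshold=0.3)
print(fine, coarse)
```

Lowering the threshold corresponds to cutting the dendrogram higher up, which yields fewer and larger communities.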

Evaluating Communities
It is necessary to establish which partitions exhibit a real community structure. Therefore, a quality
function is needed to evaluate how good a partition is. The most popular quality function is the
modularity of Newman and Girvan:

$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(C_i, C_j)$$

where the sum runs over all pairs of vertices, A is the adjacency matrix, k_i is the degree of vertex
i, m is the total number of edges of the network, and δ(C_i, C_j) is 1 if vertices i and j belong to
the same community and 0 otherwise.
Modularity can be rewritten as follows:

$$Q = \sum_{s=1}^{n_m} \left[ \frac{l_s}{m} - \left( \frac{d_s}{2m} \right)^2 \right]$$

where n_m is the number of communities, l_s is the total number of edges joining vertices of
community s, and d_s is the sum of the degrees of the vertices of s.
The first term of each summand is the fraction of edges of the network inside the community,
whereas the second term represents the expected fraction of edges that would be there if the
network were a random network with the same degree for each vertex.
Figure 3.b illustrates the meaning of modularity.
Figure 3.b Modularity
The latter formula implicitly shows the definition of a community: a subnetwork is a community if
the number of edges inside it is larger than the expected number in modularity’s null model.
The modularity of the whole network, taken as a single community, is zero. Modularity is always
smaller than one, and it can be negative as well.
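
The per-community form of modularity (edges inside each community minus the fraction expected under the null model) is easy to compute directly. The sketch below assumes an undirected, unweighted network given as an edge list and a dictionary assigning each vertex to a community; the function name and the sample network are illustrative.

```python
def modularity(edges, membership):
    """Sum over communities s of l_s / m - (d_s / 2m)^2."""
    m = len(edges)
    inside = {}  # l_s: number of edges with both ends in community s
    degree = {}  # d_s: sum of the degrees of the vertices of community s
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(s, 0) / m - (d / (2 * m)) ** 2 for s, d in degree.items())


# Hypothetical network: two triangles joined by the single edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {v: "A" for v in range(1, 7)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

For this network the two-triangle partition gives Q = 5/14 (about 0.357), while treating the whole network as a single community gives Q = 0, in line with the statement above.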

Community Detection Techniques

Community detection methods can be broadly categorized into two types: agglomerative methods and
divisive methods. In agglomerative methods, edges are added one by one to a graph that contains
only nodes, from the strongest edge to the weakest. Divisive methods do the opposite: edges are
removed one by one from the full graph.

Methods for community detection and mining


There are naive methods for dividing given networks into subnetworks, such as graph
partitioning, hierarchical clustering, and k-means clustering. The methods for detecting
communities are roughly classified into the following categories:

(1) divisive algorithms,
(2) modularity optimization,
(3) spectral algorithms, and
(4) other algorithms.
Divisive Algorithms
A simple way to identify communities in a network is to detect the edges that connect vertices of
different communities and remove them, so that the communities get disconnected from each
other.
The steps of the algorithm are as follows:
(1) Computation of the centrality of all edges,
(2) Removal of the edge with the largest centrality,
(3) Recalculation of centralities on the running network, and
(4) Iteration of the cycle from step (2).
The centrality typically used is edge betweenness: the number of shortest paths between all vertex
pairs that run along the edge.
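
The four steps above can be sketched in plain Python. This is a minimal illustration, not a tuned implementation: it assumes an undirected, unweighted network without self-loops stored as a dictionary of neighbor sets, computes edge betweenness with a breadth-first search from every vertex, and stops as soon as the network falls into more pieces. All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Number of shortest paths running along each edge (split evenly on ties)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate from the farthest vertices back
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every vertex pair was counted once from each endpoint.
    return {e: b / 2 for e, b in bet.items()}


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def divisive_step(adj):
    """Remove highest-betweenness edges until the network splits further."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)      # steps (1) and (3)
        if not bet:
            break                        # no edges left to remove
        u, v = max(bet, key=bet.get)     # step (2)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical network: two triangles joined by the single edge 3-4.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
bridge_score = edge_betweenness(adj)[frozenset((3, 4))]
parts = divisive_step(adj)
print(bridge_score, parts)
```

In the sample network all nine shortest paths between the two triangles run along the edge 3-4, so it is removed first and the two triangles become separate communities.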

Modularity Optimization
Modularity is a quality function for evaluating partitions. Therefore, the partition
corresponding to its maximum value on a given network should be the best one. This is the
main idea behind modularity optimization. It has been proved that modularity optimization is an
NP-hard problem. However, there are currently several algorithms that are able to find fairly good
approximations of the modularity maximum in a reasonable time. One of the best-known algorithms
for modularity optimization is the CNM algorithm. Other examples are greedy
algorithms and simulated annealing.

Spectral Algorithms
Spectral algorithms cut a given network into pieces so that the number of edges to be
cut is minimized. One of the basic algorithms is spectral graph bipartitioning.
Other Algorithms
There are many other algorithms for detecting communities, such as the methods focusing on
random walk, and the ones searching for overlapping cliques.

Applications of community mining algorithms


Social media mining is used across several industries including business development, social
science research, health services, and educational purposes.
Some applications of community mining, with respect to various tasks in social network analysis
are listed below:
1. Network Reduction

2. Discovering Scientific Collaboration Groups from Social Networks

3. Mining Communities from Distributed and Dynamic Networks

Network Reduction
Network reduction is an important step in analyzing social networks. The example discussed here
is taken from work in which the network was constructed from the bibliography of the book
entitled “Graph Products: Structure and Recognition”. The bibliography contains 360 papers written
by 314 authors. Its corresponding network is a bipartite graph, in which each node denotes either
one author or one paper, and link (i, j) represents author i publishing a paper j. Community structure
is detected using a community mining algorithm called ICS. Each community contains some
papers and their corresponding coauthors. Most of the detected communities are self-connected
components.
Moreover, the clustered coauthor network can be reduced into a much smaller one by condensing
each community as one node. Finally, the top-level condensed network corresponding to a 3-
community structure is constructed by using ICS from the condensed network. From this a
dendrogram corresponding to the original coauthor network can be built.
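
The condensing step can be illustrated with a short sketch. It is not the ICS algorithm, which these notes do not specify; it only shows how a network shrinks once a community assignment is available. The function name, the edge list and the membership dictionary are hypothetical.

```python
def condense(edges, membership):
    """Collapse each community into one node.

    Returns a dictionary mapping a pair of community labels to the number
    of original edges running between those two communities.
    """
    weights = {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu != cv:  # edges inside a community vanish in the condensed network
            key = tuple(sorted((cu, cv)))
            weights[key] = weights.get(key, 0) + 1
    return weights


# Hypothetical network: two triangles joined by the edges 3-4 and 2-5.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (2, 5), (4, 5), (4, 6), (5, 6)]
membership = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
condensed = condense(edges, membership)
print(condensed)
```

Applying a community mining algorithm again to the condensed network, and repeating, gives the successive levels from which a dendrogram can be built.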
Discovering Scientific Collaboration Groups from Social Networks
This section shows how community mining techniques can be applied to the analysis of scientific
collaborations among researchers.

Flink

Flink is a system for the extraction, aggregation and visualization of online social
networks. The network it presents describes the scientific collaborations among 681 Semantic Web
researchers. Flink employs semantic technology for reasoning with personal information extracted
from a number of electronic information sources including web pages, emails, publication archives
and FOAF profiles. The acquired knowledge is used for the purposes of social network
analysis and for generating a web-based presentation of the community
(http://flink.semanticweb.org/).
Mining Communities from Distributed and Dynamic Networks
Many applications involve distributed and dynamically evolving networks, in which resources and
controls are not only decentralized but also updated frequently. One promising solution is based
on an Autonomy-Oriented Computing (AOC) approach, in which a group of self-organizing
agents is used. The agents rely only on their locally acquired information about the networks.
Intelligent Portable Digital Assistants (or iPDAs for short) that people carry around can form a
distributed network, in which their users communicate with each other through calls or messages.
One useful function of iPDAs would be to find and recommend new friends with common
interests, or potential partners in research or business, to the users.
This can be implemented through the following steps:
(1) based on an iPDA user’s communication traces, selecting individuals who have frequently
contacted or been contacted by the user during a certain period of time;
(2) taking the selected individuals as the input to an AOC-based algorithm;
(3) ranking and recommending to the user new persons who might not be included in the current
acquaintance book.
In this way, people can periodically receive recommendations about friends or partners from
their iPDAs.
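
Step (1) can be sketched as a simple counting pass over the communication traces. The sketch assumes the traces have already been restricted to the period of interest and are given as (caller, callee) pairs; the function name, the `min_contacts` parameter and the sample traces are hypothetical, and the AOC-based algorithm of step (2) is not shown.

```python
from collections import Counter


def frequent_contacts(traces, user, min_contacts):
    """People who contacted, or were contacted by, `user` at least `min_contacts` times."""
    counts = Counter()
    for caller, callee in traces:
        if caller == user:
            counts[callee] += 1
        elif callee == user:
            counts[caller] += 1
    return sorted(person for person, n in counts.items() if n >= min_contacts)


# Hypothetical call and message log for one period.
traces = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "cid"),
    ("dan", "ann"), ("ann", "bob"), ("bob", "cid"),
]
selected = frequent_contacts(traces, "ann", min_contacts=2)
print(selected)
```

The selected individuals would then be passed to the community mining step as its starting set.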

Tools for detecting communities social network infrastructures and communities

There are many tools for social network mining. Some of the tools used for this analysis are
given below:

A. Gephi:

Gephi is open-source software for network visualization and analysis. It helps data analysts
intuitively reveal patterns and trends, highlight outliers and tell stories with their data. It
uses a 3D render engine to display large graphs in real time and to speed up exploration.

(Network visualization, also known as graph visualization or link analysis, is the process of
visually presenting networks of connected entities as links and nodes. Nodes represent data
points, and links represent the connections between them.)

B. Graphviz:

Graphviz is an open source graph visualization platform. It has different graph design
programs suitable for viewing social networks in interactive mode. Graph visualization is a
way of representing structural information as diagrams.

C. SONDY: An Open Source Platform for Social Dynamics Mining and Analysis. Social
dynamics (or sociodynamics) is the study of the behavior of groups and of the interactions of
individual group members, aiming to understand the emergence of complex social behaviors
among microorganisms, plants and animals, including humans.

SONDY is a tool for analyzing trends and dynamics in online social network data. SONDY
helps end users, such as media analysts or journalists, to understand the interests and activity
of social network users by providing emerging topics and event detection, as well as network
analysis functionalities.

D. NEO4J: Neo4j is a graph database. It is an integrated, disk-based, fully transactional
persistence engine that stores structured data in graphs instead of tables.

E. SocNetV

Social Network Visualizer (SocNetV) is a cross-platform, user-friendly, free and open-source
software application for social network analysis and visualization. A social network can be
drawn with a few clicks, and the application supports different file formats such as Pajek,
GraphML, and UCINET.

F. Cytoscape

Cytoscape is an open-source platform for visualizing large and complex networks in a few clicks.
It integrates complex networks with any type of attribute data. It is used to analyse social
networks, bioinformatics data and the semantic web.

G. NodeXL

NodeXL provides powerful features for analysing social networks, including identifying influential
nodes, brand evaluation based on product performance, content analysis, and campaign analysis. It
automates different kinds of analysis on social networks. NodeXL makes it easy
to explore, analyse and visualize network graphs.

H. NetMiner

NetMiner is not an open-source tool; it is a premium tool for analysing social network features
based on input data. It allows users to explore network data visually and interactively. Many
other tools are available for graph mining as well as community detection and topic detection.

How Can Big Data Make Your Social Media Better?

In an era where information is king and digital platforms dominate our lives, social media has become
an indispensable part of our daily routines. From connecting with friends and family to following
our favorite brands and staying updated on current events, it’s hard to imagine a world without the
constant scroll of social media feeds. However, the sheer volume of data generated by billions of
users can be overwhelming. This is where big data analytics revolutionizes how we use social media,
making it more personalized, engaging, and effective.

Enhanced User Experience


One of the most noticeable ways big data can improve your social media experience is by enhancing
user engagement. Social media platforms collect a staggering amount of data every moment:
likes, shares, comments, and content interactions. By harnessing the power of big data analytics,
social media companies can dissect this information to understand what resonates with users
and what falls flat. This insight helps curate user feeds, ensuring they are exposed to content that
aligns with their interests, preferences, and behaviors. Consequently, users spend more time on
the platform because they find the content more relevant and engaging.

Personalization

Personalization is the cornerstone of social media’s future, and big data plays a pivotal role. By
analyzing past user interactions, interests, and preferences, social media algorithms can offer tailored
experiences. Personalization keeps users hooked, from suggesting friends and groups to
showing content that matches user tastes. For businesses, this level of personalization enhances
their ability to target advertisements effectively, resulting in higher conversion rates and better
return on investment.

Sentiment Analysis

Big data’s prowess extends to sentiment analysis on social media. By examining the language and
emotions expressed in posts and comments, platforms can gauge public opinion on various
subjects, products, or brands. This information is invaluable for businesses and marketers seeking
to understand their audience’s perception. Sentiment analysis can also serve as an early warning
system for potential PR crises, enabling swift intervention and issue resolution.

Influencer Marketing

In the world of social media, influencer marketing is a game-changer. Big data aids in identifying
the most influential users within a niche or industry by analyzing their follower count, engagement
rates, and content impact. This information empowers businesses to collaborate with these
influencers, facilitating product or service promotion to a highly targeted and engaged
audience.

Improved Customer Support

Big data isn’t just about marketing; it can significantly enhance customer support on social media
platforms. Chatbots, driven by artificial intelligence and machine learning, leverage data to
comprehend user queries and provide rapid responses. These chatbots handle routine inquiries,
freeing human customer support agents to focus on complex issues. This not only slashes response
times but also elevates the overall customer experience.

Content Optimization

For content creators and marketers, big data is a goldmine of insights. Businesses can fine-tune
content strategies by scrutinizing metrics like click-through rates, conversion rates, and audience
demographics. This data-driven approach ensures the creation of content that resonates with the
target audience. It also helps content creators allocate resources more effectively, guaranteeing
the production of content that genuinely connects with their audience.

How Do You Use Big Data in Social Media?

Billions of users across various platforms produce an enormous amount of data every day. This data,
often called “big data,” holds immense potential for businesses, marketers, and individuals seeking
to make the most of their social media presence. Explore how big data is harnessed and utilized in
social media and how it can be a game-changer for your online endeavors.

Data Collection and Aggregation

Utilizing Big Data in Social Media begins with data collection and aggregation. Social media
platforms are designed to gather extensive information about user behavior. Every click, like, share,
comment, and post generates valuable data points. This data includes user demographics, content
preferences, interaction patterns, and sentiment analysis based on the language used in posts and
comments.
Social media companies employ advanced data collection methods to capture this information, which
is stored and organized in massive databases. The scale of data collected is astounding and grows
with each passing second. This raw data forms the foundation upon which the power
of big data analytics is harnessed.

Data Analysis and Insights

Once the data is collected, the real magic of big data happens through analysis and deriving
actionable insights. Advanced algorithms and machine learning techniques are used to process and
make sense of this vast sea of information. Here’s how it’s done:

⚫ User Behavior Analysis: Data analytics tools dissect user behavior patterns.
They determine what content users engage with the most, when they are most active, and
how they navigate the platform.
⚫ Personalization: Social media platforms use big data to create highly personalized
experiences. By analyzing a user’s past interactions and preferences, algorithms suggest
friends, groups, and content tailored to the individual’s interests.
⚫ Sentiment Analysis: Natural language processing algorithms are employed to understand
the sentiment in posts and comments. This can help gauge public opinion on various topics,
products, or brands.
⚫ Content Optimization: Data analytics provides insights into content performance for
businesses and content creators. Metrics such as click-through rates, conversion rates, and
audience demographics guide the creation of content that resonates with the target audience.
⚫ Predictive Analysis: Predictive analytics uses historical data to forecast future trends,
helping businesses anticipate customer needs and market fluctuations.

Influencer Marketing

In the age of social media, influencer marketing has gained immense prominence. Big data is pivotal
in identifying and partnering with the right influencers. By analyzing influencers’ follower
counts, engagement rates, and content impact, businesses can make informed decisions about who
to collaborate with. This approach ensures that products or services are promoted to a highly
targeted and engaged audience, maximizing the impact of influencer marketing campaigns.

Targeted Advertising

One of the most powerful applications of Big Data in Social Media is in the realm of advertising.
The granular data collected about users allows for highly targeted ad campaigns. Businesses
can define their ideal audience based on age, location, interests, and online behavior. This
precision minimizes wasted advertising spend and increases the likelihood of conversions.

Customer Support and Chatbots

Big data isn’t limited to marketing; it also greatly enhances customer support on social media
platforms. Chatbots, driven by artificial intelligence and machine learning, leverage data to
understand user queries and provide rapid responses. These chatbots handle routine inquiries, freeing
human customer support agents to focus on complex issues. This not only slashes response times
but also elevates the overall customer experience.

Monitoring and Crisis Management

In the age of social media, news and public opinion can spread like wildfire. Big data analytics
help businesses and organizations monitor their online reputation. They can detect potential PR
crises early on by tracking mentions, comments, and sentiment. This enables proactive crisis
management, preventing issues from escalating and minimizing reputational damage.

Content Recommendations

Social media platforms use big data to power content recommendation systems. These systems
suggest content to users based on their past interactions and interests. This feature keeps users
engaged and encourages them to spend more time on the platform, ultimately benefiting its
advertising revenue.

Competitive Analysis

Businesses can gain a competitive edge by analyzing the social media strategies of their competitors.
Big data lets them track competitors’ engagement rates, content performance, and audience
demographics. This information helps in benchmarking and refining their social media strategies.
