Unit 2 Complete Notes
Adjacency Matrix
In social media analytics, the adjacency matrix is a fundamental tool for representing and analyzing
network data.
What is an Adjacency Matrix?
• Essentially, it is a way to represent a network (a graph) in the form of a matrix (a table of
numbers).
• Rows and columns represent the nodes (users) in the network.
• The values within the matrix indicate whether or not there's a connection (edge) between
those nodes.
How it Works in Social Media:
• Binary Representation:
o Often, the matrix uses binary values:
▪ "1" indicates that there is a connection between two users.
▪ "0" indicates that there is no connection.
o For example, if user A follows user B, the cell corresponding to row A and column
B would have a "1".
• Data Representation:
o Provides a structured way to store and manipulate network data.
• Mathematical Analysis:
o Allows for the application of matrix algebra to analyze network properties.
o This is crucial for calculating centrality measures, detecting communities, and
performing other network analyses.
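• Computational Efficiency:
o Adjacency matrices are well-suited for computer processing: checking whether two
users are connected is a single lookup.
o For large social networks, where most pairs of users are not connected, a full matrix
of N x N cells wastes memory, so sparse formats are typically used in practice.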
• Pattern Recognition:
o By viewing the matrix, patterns of connections can sometimes be visually
recognized.
• Input for algorithms:
o Many network analysis algorithms require the network information to be in the
form of an adjacency matrix.
In simpler terms:
• Imagine a grid where each row and column is a person on a social media site.
• If two people are friends, you put a mark in the box where their row and column meet.
• This grid is the adjacency matrix.
By using adjacency matrices, social media analysts can gain valuable insights into the structure
and dynamics of online social networks.
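As a small illustration of the grid described above, the following Python sketch builds a binary
adjacency matrix from a list of "follows" relationships. The user names and the follows list are
made up for the example. Rows are the follower and columns are the account being followed,
matching the row A / column B convention used earlier.

```python
# Hypothetical users and "follows" relationships, chosen only for illustration.
users = ["A", "B", "C", "D"]
follows = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "A")]

# Map each user to a row/column index.
index = {user: i for i, user in enumerate(users)}

# Start with an all-zero N x N grid, then mark each "follows" edge with a 1.
n = len(users)
matrix = [[0] * n for _ in range(n)]
for follower, followed in follows:
    matrix[index[follower]][index[followed]] = 1

# Row sums count how many accounts a user follows;
# column sums count how many followers a user has.
following_count = {u: sum(matrix[index[u]]) for u in users}
follower_count = {u: sum(row[index[u]] for row in matrix) for u in users}

for user, row in zip(users, matrix):
    print(user, row)
```

Because following has a direction, this matrix need not be symmetric. For mutual friendships you
would set both matrix[a][b] and matrix[b][a] to 1.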
Key Components
When analyzing social networks, several key concepts help us understand the structure and
dynamics of these interconnected systems. Social network analysis explores how these elements
interact to create complex social structures, giving analysts insight into the dynamics of online
and offline social interactions.
1. Nodes:
• Nodes are the fundamental units of a network. They represent the individual actors within
the network.
3. Paths:
• A path is a sequence of ties that connects two nodes.
• Analyzing paths helps understand how information or influence flows through a network.
• Shortest paths are often of particular interest, as they represent the most efficient routes of
communication.
4. Connectivity:
• Connectivity refers to how well-connected the nodes in a network are.
• It can be measured in various ways:
o Overall network density: The proportion of existing ties to possible ties.
Density = Total no. of edges / Total possible no. of edges
For an undirected network, total possible no. of edges = N (N-1)/2
(for a directed network, such as follower relationships, it is N (N-1))
where,
N = Total no. of nodes in a graph
Note: Density will be 1 (100%) if all possible connections are present in the graph.
o The presence of connected components: Groups of nodes that are connected to
each other, but not to other parts of the network.
• High connectivity generally facilitates the spread of information and influence.
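The two connectivity measures above can be sketched in a few lines of Python. This is a minimal
example that assumes an undirected network given as an edge list; the node names and edges are
hypothetical.

```python
from collections import defaultdict

def density(num_nodes, edges):
    """Existing ties divided by possible ties, for an undirected network."""
    possible = num_nodes * (num_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0

def connected_components(nodes, edges):
    """Group nodes that can reach each other through some chain of ties."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical network: a triangle A-B-C and a separate pair D-E.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]

print(density(len(nodes), edges))          # 4 of 10 possible ties
print(connected_components(nodes, edges))  # two separate groups
```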
5. Influencers:
• Influencers are nodes that have a disproportionate impact on the network.
o Information Diffusion:
▪ Simulating how information spreads outward from a source user, layer by
layer, can approximate how viral content spreads.
o Network Visualization:
▪ BFS can be used to generate layers of a network, which can then be visualized.
• Why it is preferred:
o In social networks, we are often interested in the shortest paths and immediate
neighborhoods, which BFS excels at.
o Identifying Cycles:
▪ DFS can detect cycles (loops) in a network, which might indicate certain
patterns of interaction.
o Exploring Deep Connections:
▪ In some very specific cases, if there is a need to explore very deep
connections, DFS might be useful.
• Why it is less common:
o DFS can get "lost" in long, less relevant paths.
o It does not guarantee finding shortest paths, which are often the primary interest in
social network analysis.
o Social networks are generally very broad, and not very deep, so DFS is not usually
the most efficient way to analyze them.
Key Differences:
• Exploration Strategy:
o BFS: Layer by layer (breadth).
o DFS: Depth first.
• Path Finding:
o BFS: Finds shortest paths (when all connections count equally, i.e. unweighted networks).
o DFS: Does not guarantee shortest paths.
• Memory Usage:
o BFS: Can require more memory for large networks.
o DFS: Generally uses less memory.
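The differences listed above can be seen on a tiny example. The sketch below is illustrative
only: the five-user network is hypothetical, and the function names are chosen for this example.
It shows BFS returning the shortest path and the layers around a source user, while DFS returns
a valid but longer path.

```python
from collections import deque

# Hypothetical undirected network stored as adjacency lists (a ring of 5 users).
graph = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D", "A"],
}

def bfs_path(graph, start, goal):
    """Breadth-first search: explores layer by layer, so the first path found is shortest."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    if goal not in parent:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def dfs_path(graph, start, goal, visited=None):
    """Depth-first search: follows one branch as far as possible before backtracking."""
    if visited is None:
        visited = set()
    visited.add(start)
    if start == goal:
        return [start]
    for neighbor in graph[start]:
        if neighbor not in visited:
            rest = dfs_path(graph, neighbor, goal, visited)
            if rest is not None:
                return [start] + rest
    return None

def bfs_layers(graph, start):
    """Users grouped by their distance (number of hops) from the start user."""
    seen = {start}
    layers = [[start]]
    while True:
        next_layer = []
        for node in layers[-1]:
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_layer.append(neighbor)
        if not next_layer:
            return layers
        layers.append(next_layer)

print(bfs_path(graph, "A", "E"))   # ['A', 'E']
print(dfs_path(graph, "A", "E"))   # ['A', 'B', 'C', 'D', 'E']
print(bfs_layers(graph, "A"))      # [['A'], ['B', 'E'], ['C', 'D']]
```

The path DFS finds depends on the order in which neighbors are listed, which is why it cannot
be relied on for shortest paths.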
Centrality
Centrality measures are fundamental tools that help us understand the importance of individual
nodes (people, entities) within a network. They essentially quantify how "central" a node is, but
"central" can have different meanings.
Why Centrality Matters:
• Identifying Influence:
o Central nodes often have significant influence over the network.
• Betweenness Centrality:
o This measures how often a node lies on the shortest path between other nodes.
o Nodes with high betweenness centrality act as "bridges" between different parts of
the network.
o They control the flow of information or resources.
• Closeness Centrality:
o This measures how close a node is to all other nodes in the network.
o Nodes with high closeness centrality can quickly reach other nodes.
o They are efficient at spreading information.
• Eigenvector Centrality:
• PageRank:
o A variant of eigenvector centrality, famously used by Google, that also takes into
account the direction of links.
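To make betweenness and closeness concrete, here is a minimal sketch for a small unweighted,
undirected network. The network (two triangles joined through users C and D) is hypothetical.
Closeness is computed here as the number of reachable nodes divided by the sum of distances to
them, and betweenness uses Brandes' algorithm without normalization; both are common
conventions, not necessarily the ones intended by these notes.

```python
from collections import deque

# Hypothetical network: triangle A-B-C and triangle D-E-F, joined by the tie C-D.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E", "F"],
    "E": ["D", "F"],
    "F": ["D", "E"],
}

def closeness(graph, source):
    """Reachable nodes divided by total shortest-path distance to them."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def betweenness(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                       # nodes in the order BFS reaches them
        preds = {v: [] for v in graph}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {v: score[v] / 2 for v in graph}

bridge_scores = betweenness(graph)
print(bridge_scores)
print({v: round(closeness(graph, v), 3) for v in graph})
```

In this example only C and D lie on shortest paths between the two triangles, so they are the
only users with non-zero betweenness, and they also have the highest closeness.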
Key Considerations:
• Context Matters:
o The most appropriate centrality measure depends on the specific network and
research question.
• Network Type:
o Some measures are more suitable for directed networks (where connections have a
direction) than undirected networks.
• Weighted Networks:
o Centrality measures can be adapted to weighted networks, where connections have
different strengths.
Making connections
Making connections refers to the fundamental process of identifying and representing the
relationships between individuals, groups, or other entities within a network. It is the building
block upon which all other analyses are built.
In other words, it is the process of translating real-world social interactions into a format that can
be analyzed using network science methods. It is the critical first step that allows us to understand
the complex dynamics of social networks.
What Making Connections Entails:
• Data Collection:
o This is the first step, gathering the raw data that represents social interactions.
o Sources can include:
▪ Social media platforms (friendships, follows, mentions, shares).
▪ Communication logs (emails, messages).
▪ Collaboration records (co-authorships, project teams).
▪ Surveys or questionnaires.
• Defining Relationships:
o Clearly defining what constitutes a "connection" is crucial.
o This can vary depending on the context:
▪ "Friendship" on Facebook.
▪ "Following" on Twitter.
▪ "Co-worker" in a corporate network.
▪ "Interaction" based on frequency of comments.
• Representing Connections:
o Once connections are defined, they need to be represented in a way that can be
analyzed.
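As one possible illustration of these steps, the sketch below turns a raw comment log into an
edge list. Everything here is an assumption made for the example: the user names, the log
format, and the rule that two users count as "connected" once they have exchanged at least two
comments.

```python
from collections import Counter

# Hypothetical raw data: (commenter, post_author) pairs from a comment log.
comment_log = [
    ("ana", "ben"), ("ana", "ben"), ("ben", "ana"),
    ("cara", "ana"),
    ("dev", "cara"), ("dev", "cara"), ("dev", "cara"),
]

# Assumed definition of a connection for this sketch: at least MIN_COMMENTS
# comments exchanged between two users, in either direction.
MIN_COMMENTS = 2

# Count interactions per unordered pair of users.
pair_counts = Counter(tuple(sorted(pair)) for pair in comment_log)

# Keep only the pairs that satisfy the definition; these become the edges.
edges = sorted(pair for pair, count in pair_counts.items() if count >= MIN_COMMENTS)
nodes = sorted({user for pair in edges for user in pair})

print(edges)
print(nodes)
```

Changing the definition (for example raising MIN_COMMENTS) changes which connections exist,
which is why the definition needs to be fixed before any analysis.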
Link analysis
Link analysis is a core technique within social network analysis, focused on understanding the
relationships between entities by examining their connections.
Core Concept:
• At its heart, link analysis explores the connections (links) between entities (nodes) within
a network.
• This involves analyzing the patterns and properties of these links to gain insights into the
network's structure and dynamics.
Key Applications in Social Network Analysis:
• Identifying Influence:
o Link analysis helps determine which individuals or entities are most influential
within a network.
o This can involve analyzing the number of connections a person has, or their position
within the network's structure.
• Community Detection:
o By examining link patterns, analysts can identify clusters of tightly connected
individuals, revealing communities or sub-groups within the larger network.
• Information Flow:
o Link analysis helps trace how information spreads through a network.
o This is crucial for understanding how trends, ideas, or even misinformation
propagate.
• Anomaly Detection:
o Unusual link patterns can indicate suspicious activity, such as fake accounts or
coordinated manipulation campaigns.
• Relationship Mapping:
o It creates a visual representation of relationships, making it easier to understand
complex social structures.
Techniques and Measures:
• Centrality Measures:
o These metrics quantify the importance of nodes within a network.
▪ Degree centrality: The number of connections a node has.
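Degree centrality is simple enough to compute directly from an edge list. The sketch below uses
a hypothetical undirected network and also shows a commonly used normalized form (degree divided
by N-1), which is an addition for illustration rather than something defined in these notes.

```python
from collections import Counter

# Hypothetical undirected ties between four users.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

# Each tie adds one to the degree of both of its endpoints.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Normalized degree: the share of other users a node is connected to.
n = len(degree)
normalized = {node: d / (n - 1) for node, d in degree.items()}

most_connected = max(degree, key=degree.get)
print(dict(degree), most_connected)
```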
PageRank Algorithm
The PageRank algorithm, originally developed by Google's founders Larry Page and Sergey Brin,
is a way to measure the importance of nodes within a network. While famously used for ranking
web pages, its principles are highly applicable to social network analysis.
Core Idea:
• PageRank assigns a numerical value to each node in a network, representing its
importance.
• The core concept is that a node is considered important if it is linked to by other important
nodes.
• In simpler terms, it is like a measure of "influence" or "prestige" within the network.
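The idea that importance flows from important nodes can be sketched with the standard iterative
formulation of PageRank. The damping factor of 0.85, the fixed number of iterations, and the
four-user network below are assumptions chosen for this example, not values taken from these
notes.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively pass each node's score along its outgoing links.

    `links` maps every node to the list of nodes it points to.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline score.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                # Split this node's score equally among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its score over everyone.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical "follows" network: A, B and D all point to C; C points to D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": ["C"]}
ranks = pagerank(links)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

Here D ends up ranked well above A and B even though only one node links to it, because that
one node (C) is itself highly ranked.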
How It Works:
Random Graphs
Random graphs are valuable baseline models, providing a reference point for understanding the
structure and dynamics of online social interactions.
o Now, imagine creating a map like that, but instead of real friendships, you just draw
lines between dots completely randomly.
o That is a "random graph." It is a map where connections are made by chance.
o We use mathematical rules to define "how random" it is. For example, we might
say, "Each pair of dots has a 10% chance of being connected."
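The "10% chance for each pair" rule can be written down directly. The sketch below is a minimal
version of this kind of random graph; the function name, the network size of 100, and the fixed
seed are choices made for the example.

```python
import random
from itertools import combinations

def random_graph(num_nodes, p, seed=None):
    """Connect each pair of nodes independently with probability p."""
    rng = random.Random(seed)
    return [
        (u, v)
        for u, v in combinations(range(num_nodes), 2)
        if rng.random() < p
    ]

# 100 "dots", each pair connected with a 10% chance.
edges = random_graph(100, 0.10, seed=42)
possible_pairs = 100 * 99 // 2
print(len(edges), "edges out of", possible_pairs, "possible pairs")
```

On average about 10% of the possible pairs end up connected, but the exact number varies from
run to run unless the seed is fixed.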
Why do we use Random Graphs in Social Network Analysis?
• To Find What's "Normal":
o We use random graphs as a kind of "normal" or "expected" pattern.
o Then, we compare real social networks to these random ones.
o If a real network looks very different from a random one, it tells us that something
interesting is happening.
o For example, if in a real network, some people have way more connections than
others, and this is far more than what would happen in a random graph, it tells us
that those people are probably very influential.
• To See Patterns:
o Random graphs help us understand patterns in social networks.
o For instance:
▪ How connected are people? Do most people have a few friends, or are
there a few people with tons of friends?
▪ Do friends of friends know each other? Random graphs help us see if
people's social circles are tightly knit.
▪ How fast does information spread? We can simulate how information
travels through a random graph to see how quickly it could spread through
a real social network.
• To Spot Influence:
o We can use random graphs to see if someone's popularity is just luck, or if they are
genuinely influential.
o If someone has way more connections than you would expect in a random graph,
they are likely a key player.
In simpler terms:
• Random graphs are like a "control group" for social networks.
• They help us see what social networks look like when things happen by chance.
• By comparing real social networks to random ones, we can discover the hidden forces that
shape our online interactions.
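The "control group" comparison can be sketched for one of the questions above: do friends of
friends know each other? The example measures this with the average clustering coefficient (the
share of a user's friends who are also friends with each other) and compares a small network
against random graphs with the same number of users and the same density. The observed network,
the number of trials, and the seed are all hypothetical.

```python
import random
from itertools import combinations

def build_neighbors(nodes, edges):
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors

def average_clustering(neighbors):
    """Average share of each user's friend pairs that are themselves friends."""
    total = 0.0
    for friends in neighbors.values():
        k = len(friends)
        if k < 2:
            continue  # users with fewer than two friends contribute 0
        linked = sum(1 for a, b in combinations(friends, 2) if b in neighbors[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(neighbors)

# Hypothetical observed network: two tight groups of four, joined by one tie.
nodes = list(range(8))
observed_edges = (
    list(combinations([0, 1, 2, 3], 2))
    + list(combinations([4, 5, 6, 7], 2))
    + [(3, 4)]
)
observed = average_clustering(build_neighbors(nodes, observed_edges))

# Random baseline: same users, same chance of a tie between any pair.
rng = random.Random(0)
p = len(observed_edges) / (len(nodes) * (len(nodes) - 1) / 2)
trials = 200
baseline = 0.0
for _ in range(trials):
    random_edges = [pair for pair in combinations(nodes, 2) if rng.random() < p]
    baseline += average_clustering(build_neighbors(nodes, random_edges))
baseline /= trials

print("observed clustering:", observed)
print("random baseline:", round(baseline, 3))
```

A clustering value well above the random baseline suggests that the tightly knit circles are a
real feature of the network rather than a product of chance.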
Network evolution
Network evolution is about understanding how online social connections change over time. It is
not just a snapshot; it is a movie of how relationships form, grow, and sometimes fade.
Why Network Evolution Matters in Social Media:
• Social media is dynamic:
o People join and leave platforms.
o Friendships and follower relationships change.
o Interests and trends shift.
• Understanding these changes is crucial for:
o Tracking trends.
o Identifying influential users.
o Analyzing the spread of information.
o Predicting user behavior.
Key Aspects of Network Evolution:
• Growth and Preferential Attachment:
o This is the "rich get richer" phenomenon.
o Users with many followers tend to attract even more.
o This leads to the formation of "hub" users with massive influence.
o Analyzing this helps to predict how a social network is going to grow.
• Network Churn:
o People unfollow others or deactivate accounts.
o Relationships weaken over time.
o Analyzing churn helps understand user engagement and platform health.
• Community Evolution:
o Online communities form and dissolve.
o New communities emerge around trending topics.
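The "rich get richer" growth described under preferential attachment can be simulated in a few
lines. This is a simplified sketch under stated assumptions: each new user makes exactly one
connection, and the chance of picking an existing user is proportional to how many connections
that user already has. The function name, network size, and seed are chosen for the example.

```python
import random
from collections import Counter

def preferential_attachment(num_users, seed=None):
    """Grow a network one user at a time, favoring already well-connected users."""
    rng = random.Random(seed)
    edges = [(0, 1)]      # start with two connected users
    endpoints = [0, 1]    # each user appears once per connection they have
    for new_user in range(2, num_users):
        # Picking uniformly from `endpoints` is picking in proportion to degree.
        target = rng.choice(endpoints)
        edges.append((new_user, target))
        endpoints.extend([new_user, target])
    return edges

edges = preferential_attachment(1000, seed=1)
degree = Counter(user for edge in edges for user in edge)

print("largest hubs:", degree.most_common(3))
print("users with a single connection:", sum(1 for d in degree.values() if d == 1))
```

Most users end up with one or two connections while a few early users become hubs, which is the
pattern the text describes.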
Weighted Networks
Weighted networks add a layer of intensity or strength to the connections we analyze.
The Basic Idea:
• In a typical social network graph, a connection (edge) simply indicates that two users are
linked (e.g., friends, followers).
• A weighted network goes further: it assigns a numerical value (weight) to each connection.
This weight represents the strength or frequency of the interaction.
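A weighted network can be built by counting interactions. In the sketch below the interaction
log is hypothetical, and the weight of a connection is assumed to be the number of messages
exchanged between two users in either direction.

```python
from collections import Counter

# Hypothetical log of messages: (sender, receiver).
interactions = [("A", "B"), ("B", "A"), ("A", "B"), ("A", "C"), ("C", "B")]

# Weight of each connection = number of interactions between the two users.
weights = Counter(tuple(sorted(pair)) for pair in interactions)

# A user's "strength" is the total weight of all their connections.
strength = Counter()
for (u, v), w in weights.items():
    strength[u] += w
    strength[v] += w

print(dict(weights))
print(dict(strength))
```

An unweighted graph would treat all three connections as equal; the weights show that A and B
interact three times as often as the other pairs.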
Hypergraphs
Hypergraphs are a powerful, but less commonly used, tool in social media analytics. They excel
at representing group relationships, which are common in social media and go beyond simple
pairwise connections.
Understanding Hypergraphs:
• Beyond Pairs:
o Traditional graphs connect two nodes (users) at a time.
o Hypergraphs use "hyperedges" that can connect any number of nodes.
o Think of it like this: a regular edge is a line between two dots; a hyperedge is like
a group circle that can enclose many dots.
• Representing Group Interactions:
o Social media is full of group interactions:
▪ Online communities.
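One simple way to store a hypergraph is to keep each hyperedge as a set of users. The group
names and members below are hypothetical and only illustrate the "group circle" idea.

```python
# Each hyperedge is a named group that can contain any number of users.
hyperedges = {
    "group_chat": {"A", "B", "C"},
    "comment_thread": {"B", "C", "D", "E"},
    "direct_message": {"A", "E"},
}

# Reverse lookup: which groups does each user belong to?
membership = {}
for name, members in hyperedges.items():
    for user in members:
        membership.setdefault(user, set()).add(name)

def shared_groups(user_a, user_b):
    """Groups (hyperedges) that contain both users."""
    return membership[user_a] & membership[user_b]

print(sorted(shared_groups("B", "C")))
print({user: len(groups) for user, groups in sorted(membership.items())})
```

An ordinary graph would have to break the four-person comment thread into six separate pairwise
edges, losing the fact that it was a single group interaction.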
Network Datasets
When conducting social network analysis, having access to relevant and high-quality network
datasets is crucial. These datasets provide the raw material for exploring social structures and
dynamics.
Types of Network Datasets:
1. Government and Public Datasets:
• Data.gov:
o This U.S. government platform provides access to a wide range of public datasets,
some of which may contain network information (e.g., transportation networks,
government collaborations).
• European Data Portal:
o Similar to Data.gov, this portal offers access to public data from European
countries.
• National statistical offices:
o Many countries publish data on social and economic networks.
2. Biological and Scientific Datasets:
• Protein-Protein Interaction Networks:
o Many social science researchers publish data sets related to their studies.
• Information science datasets:
o Researchers in information science publish datasets related to information flow and
retrieval.
7. Social Media Datasets:
o These datasets capture interactions from platforms like Twitter, Facebook,
Instagram, and Reddit.
o They often include information about user connections (friendships, followers),
interactions (posts, comments, shares), and user attributes (profiles).
o Examples:
▪ Twitter datasets: Containing tweets, user networks, and hashtag
interactions.
▪ Reddit datasets: Including subreddit interactions and user comments.
8. Collaboration Networks:
o These datasets represent connections based on collaborative activities, such as
co-authorship networks (scientists who have published papers together) or project
collaboration networks.
9. Communication Networks:
o These datasets track communication patterns, such as email networks, phone call
networks, or instant messaging networks.
10. Citation Networks:
o These datasets represent the relationships between academic papers, where
citations indicate connections.
Important Considerations:
• Data Access and Licensing: Be aware of the terms of use and licensing agreements for
any dataset you use.
• Ethical Considerations: When working with social network data, it is crucial to respect
user privacy and adhere to ethical guidelines.
• Data Preprocessing: Network datasets often require significant preprocessing before they
can be analyzed.
By exploring these diverse sources, you can find network datasets that suit a wide range of research
and analysis needs.