SMA Unit 2

The document outlines the syllabus and objectives for a course on Social Media Analytics taught by Dr. Atul Pratap Singh at the Noida Institute of Engineering and Technology. It covers various units including sentiment mining, web mining, social media mining, text summarization, and recent trends in data analytics. The course aims to equip students with skills in data analysis, machine learning, and the application of modern tools for solving real-world problems.

Noida Institute of Engineering and Technology, Greater Noida

WEB-MINING

Unit: 2

Social Media Analytics(ACSAI0622N)

Dr. Atul Pratap Singh

Assistant Professor
B Tech CSE[AI] 6th Sem
CSE[AI]

06/19/2025 Dr. Atul Pratap Singh Social Media Analytics Unit 2

Faculty Introduction

Name: Atul Pratap Singh
Qualification: B.Tech, M.Tech, Ph.D
Designation: Assistant Professor
Department: CSE[AI]
Teaching Experience: 16.6 years

EVALUATION SCHEME


Syllabus

UNIT-I:SENTIMENT MINING

Overview: Text and Sentiment Mining, Semantic Analysis
Applications, Sentiment Analysis Process, Speech Analytics, Text
Representation (tokenization, stemming, stop words, TF-IDF), Feature
Vector Representation, NER, N-gram modelling, Text Clustering, Text
Classification, Topic Modelling (LDA, HDP), Sentiment Classification,
feature-based opinion mining, comparative sentence and relational
mining, Opinion summarization, Opinion spam detection.

Syllabus

UNIT-II:WEB MINING
Web Mining Overview, Web Structure Mining, Search
Engine, Web Analytics, Machine Learning for extracting
knowledge from the web, Inverted indices and Boolean
queries. PLSI, Query optimization, SEO, page ranking,
Social Graphs (Interaction, Latent and Following Graphs),
Ethics of Scraping, Static data extraction and Web Scraping
using Python


Syllabus

UNIT-III: MINING SOCIAL MEDIA

Introduction to Social Media Mining, Challenges in Social
Media Mining, Process of Social Media Mining, Essentials
of Social graphs and its types, Social Networks Measures,
Network Models, Information Diffusion in social media,
Behavioural Analytics, Influence and Homophily,
Recommendation in social media.

Syllabus

UNIT-IV: TEXT SUMMARIZATION

Introduction to Text Summarization, Text extraction,
classification and clustering, Anomaly and Trend Detection,
Text Processing, N-gram Frequency Count and Phrase
Mining, Page Rank and Text Rank Algorithm, LDA Topic
Modelling, Machine-Learned Classification and Semantic
Topic Tagging, Python libraries for Text
Summarization (NumPy, Pandas, NLTK, Matplotlib)

Syllabus

UNIT-V: RECENT TRENDS

Trend Analysis, Types of trend analysis, Recent Trends in
Text, Data Localization, Role of Web Mining in E-Commerce,
Social Media Analytics, Social Media Analytics tools.
Case Studies: Facebook Insights Using Python, Sentiment
and Text Mining of Twitter data and Google Analytics.

Branch Wise Application

1. Security
2. Digital Advertising
3. E-Commerce
4. Publishing
5. Massively Multiplayer Online Games
6. Backend Services and Messaging
7. Project Management & Collaboration
8. Real-time Monitoring Services
9. Live Charting and Graphing
10. Group and Private Chat

Course Objective

To understand text mining and social media data analytics activities,
and to handle the complexities of processing text and network data
from different data sources.

Course Outcomes (COs)

At the end of the course, the student will be able to:
• Design new solutions to opinion extraction, sentiment classification and data summarization problems.
• Apply a wide range of classification, clustering, estimation and prediction algorithms on web data.
• Perform social network analysis to identify important social actors, subgroups and network properties in social media sites.
• Interpret the terminology and metaphors of text summarization.
• Apply state-of-the-art mining tools and libraries on realistic data sets as a basis for business decisions and applications.

Program Outcomes (POs)

Engineering Graduates will be able to:
PO1 : Engineering Knowledge
PO2 : Problem Analysis
PO3 : Design/Development of solutions
PO4 : Conduct Investigations of complex problems
PO5 : Modern tool usage
PO6 : The engineer and society

Program Outcomes (POs)

Engineering Graduates will be able to:
PO7 : Environment and sustainability
PO8 : Ethics
PO9 : Individual and teamwork
PO10 : Communication
PO11 : Project management and finance
PO12 : Life-long learning

COs - POs Mapping

CO.K PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12

CO1 2 2 2 3 3 - - - - - - -

CO2 3 2 3 2 3 - - - - - - -

CO3 3 2 3 2 3 - - - - - - -

CO4 3 2 3 2 3 - - - - - - -

CO5 3 2 3 3 3 - - - - - - -

AVG 2.8 2.0 2.8 2.4 3.0 - - - - - - -


Program Specific Outcomes(PSOs)

S. No. | Program Specific Outcome (PSO) | PSO Description
1 | PSO1 | Should be able to understand the concepts of Data Science and their applications in the fields of Agriculture, Healthcare, Education, Environment and other relevant areas.
2 | PSO2 | Should have the ability to apply technical knowledge and use modern tools and technologies related to Data Science for solving real-world problems.
3 | PSO3 | Should have the capability to analyze, comprehend, design & develop data-based applications by working individually or in a team, thus demonstrating professional ethics & concern for societal well-being.

COs - PSOs Mapping

CO.K PSO1 PSO2 PSO3 PSO4

CO1 3 - - -

CO2 3 2 - -

CO3 3 3 - -

CO4 3 3 - -

CO5 3 3 - -


Program Educational Objectives (PEOs)

PEOs Description
• To produce graduates with a strong foundation of basic science, Statistics & Engineering and ability to use modern tools and technologies to solve real-world complex problems/to address ever changing industrial requirements globally.
• To produce graduates who can inculcate life-long learning for up-skilling and re-skilling and get a successful career as data scientist, entrepreneur and bureaucrat for goodwill of the society.
• To produce graduates who can exhibit professional ethics and moral values with capability of working as an individual and as a team to contribute towards the need of industry and society.

Pattern of External Exam Question Paper (100 marks)

Pattern of Online External Exam Question Paper (100 marks)

Prerequisite / Recap

• Students should have knowledge of data analysis tools and web technology.

• Students should have good knowledge of Python programming and Python coding experience.

• Basic computer knowledge and skills.

• Good problem-solving skills.

Brief Introduction about the Subject with videos

YouTube / other Video Links

• https://www.youtube.com/watch?v=Uqs0GewlMkQ
• https://www.youtube.com/watch?v=tUNwSH7671Y&t=2s
• https://slideplayer.com/slide/14222744/
• https://www.youtube.com/watch?v=KjWu1dZn00
• https://www.youtube.com/watch?v=ntOaoW0T604

Unit Content

• Web Search
• Data Mining and Machine Learning for extracting knowledge from the web
• Inverted indices and Boolean queries
• PLSI
• Query optimization
• Page ranking
• Essentials of Social graphs
• Social Networks
• Models
• Information Diffusion in social media
Unit Objective

1. Web mining can help you discover your customers' key initiatives and their financial situation.
2. Students will be able to understand mining tools that help identify various criminal activities.
3. Students will be able to define web searches.
4. Describe Data Mining and Social Networks.
5. Define Information Diffusion in social media.

Topic Objectives: (CO2)
The student will be able to:
• Define Machine Learning for extracting knowledge from the web.
• Give examples of Web Searches.
• Build Inverted indices and Boolean queries.
• Determine page ranking.
• Define Query optimization.


Web Search (CO2)

• A search engine is a software system designed to search for information on the World Wide Web, and it is usually the most productive way to find information on the internet. Results are generally presented as a list on what are often called search engine results pages (SERPs). The results may be a mix of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories.
• A number of search engines are available, and some may be familiar to you. Well-known web search engines include Google, Bing, Yahoo, Ask.com, and AOL.com. For the purpose of this course, we will use the Google Chrome web browser and search first with the Google search engine and then with Microsoft’s Bing search engine.
Data Mining(CO2)
Data mining is the process of sorting through large data sets to identify
patterns and relationships that can help solve business problems through data
analysis. Data mining techniques and tools enable enterprises to predict future
trends and make more-informed business decisions.
Social media data mining typically involves the collection, processing, and analysis of raw data obtained from social media platforms such as Facebook, Instagram, Twitter, TikTok, LinkedIn, YouTube, and others, to uncover meaningful patterns and trends, draw conclusions, and provide insightful and actionable information.
It harvests various types of social data that are either publicly available (e.g., age, gender, profession, geographic location) or generated daily on social media platforms (e.g., comments, likes, clicks).
Typically, the data represents people’s attitudes, connections, behavior, and feelings towards a certain topic, product, or service. Depending on the platform, this data may include the number of followers, comments, likes, or shares on Facebook; retweets or the number of impressions on Twitter; or engagement rates and hashtag usage on Instagram.
Data Mining(CO2)
In computing, data is information that has been translated into a form that is efficient for
movement or processing.
• For each feature type, there exists a set of permissible operations (statistics) that can be computed on the feature values and a set of transformations that are allowed.
• Nominal (categorical). These features take values that are often represented as strings. For instance, a customer’s name is a nominal feature. In general, only a few statistics can be computed on nominal features. Examples are the chi-square statistic (χ²) and the mode (the most common feature value). For example, one can find the most common first name among customers. The only possible transformation on the data is comparison. For example, we can check whether our customer’s name is John or not. Nominal feature values are often presented in a set format.
• Ordinal. Ordinal features place data on an ordinal scale. In other words, the feature values have an intrinsic order. In our example, Money Spent is an ordinal feature because a High value for Money Spent is more than a Low one.

Data Mining(CO2)

• Vector Space Model. In the vector space model, we are given a set of documents D. Each document is a set of words. The goal is to convert these textual documents to feature vectors. We can represent document i with the vector

$$d_i = (w_{1,i}, w_{2,i}, \ldots, w_{N,i}), \tag{5.1}$$

where $w_{j,i}$ represents the weight of word j in document i and N is the number of words used for vectorization. To compute $w_{j,i}$, we can set it to 1 when word j exists in document i and 0 when it does not. We can also set it to the number of times word j is observed in document i. A more general approach is the term frequency-inverse document frequency (TF-IDF) weighting scheme, in which

$$w_{j,i} = tf_{j,i} \times idf_j,$$

where $tf_{j,i}$ is the frequency of word j in document i and $idf_j$ is the inverse document frequency of word j across all documents,

$$idf_j = \log_2 \frac{|D|}{|\{\text{document} \in D \mid j \in \text{document}\}|}.$$
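
The TF-IDF weighting above can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization and lower-casing; the function name `tf_idf` and the sample documents are hypothetical and not part of the course material.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document, with w = tf * log2(|D| / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency inside this document
        weights.append({w: tf[w] * math.log2(n_docs / df[w]) for w in tf})
    return weights

# Hypothetical toy corpus.
docs = ["apple orange apple", "orange banana", "banana banana apple orange"]
vectors = tf_idf(docs)
```

A word that occurs in every document (here "orange") gets weight 0, because its inverse document frequency is log2(1) = 0; words concentrated in few documents are weighted higher.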
Data Mining(CO2)
Data Quality When preparing data for use in data mining algorithms, the following
four data quality aspects need to be verified:
• Noise is the distortion of the data. This distortion needs to be removed or its
adverse effect alleviated before running data mining algorithms because it may
adversely affect the performance of the algorithms. Many filtering algorithms are
effective in combating noise effects.
• Outliers are instances that are considerably different from other instances in the
dataset. Consider an experiment that measures the average number of followers of
users on Twitter. A celebrity with many followers can easily distort the average
number of followers per individual. Since the celebrities are outliers, they need to
be removed from the set of individuals to accurately measure the average number
of followers. Note that in special cases, outliers represent useful patterns, and the
decision to remove them depends on the context of the data mining problem.

Data Mining(CO2)

• Missing Values are feature values that are missing in instances. For
example, individuals may avoid reporting profile information on
social media sites, such as their age, location, or hobbies. To solve this
problem, we can (1) remove instances that have missing values, (2)
estimate missing values (e.g., replacing them with the most common
value), or (3) ignore missing values when running data mining
algorithms.
• Duplicate data occurs when there are multiple instances with the exact
same feature values. Duplicate blog posts, duplicate tweets, or profiles
on social media sites with duplicate information are all instances of
this phenomenon.
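
The data quality checks described above (duplicates, missing values, outliers) can be sketched with the Python standard library. This is only an illustration under stated assumptions: the profile records are made up, and the one-million-follower cut-off used to flag celebrities is an arbitrary threshold chosen for the example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical profile records: (user, followers, location); None marks a missing value.
profiles = [
    ("ana", 120, "Delhi"),
    ("raj", 95, None),
    ("raj", 95, None),               # exact duplicate of the previous record
    ("lee", 150, "Noida"),
    ("star", 2_000_000, "Delhi"),    # celebrity account, an outlier
]

# Duplicate data: keep the first occurrence of each identical record.
unique = list(dict.fromkeys(profiles))

# Missing values: estimate a missing location with the most common observed value.
known = [loc for _, _, loc in unique if loc is not None]
most_common_location = Counter(known).most_common(1)[0][0]
filled = [(u, f, loc if loc is not None else most_common_location)
          for u, f, loc in unique]

# Outliers: the mean is distorted by the celebrity, the median much less so.
followers = [f for _, f, _ in filled]
mean_all = mean(followers)
median_all = median(followers)
mean_without_outlier = mean(f for f in followers if f < 1_000_000)  # assumed cut-off
```

Comparing `mean_all` with `mean_without_outlier` shows how strongly a single celebrity account shifts the average, which is the effect described in the Twitter example.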
Data Mining(CO2)

• Interval. In interval features, in addition to their intrinsic ordering, differences are meaningful whereas ratios are meaningless. For interval features, addition and subtraction are allowed, whereas multiplication and division are not. Consider two time readings: 6:16 PM and 3:08 PM. The difference between these two readings is meaningful (3 hours and 8 minutes); however, the ratio 6:16 PM / 3:08 PM has no meaning (it does not equal 2).
• Ratio. Ratio features, as the name suggests, add the properties of multiplication and division. An individual’s income is an example of a ratio feature, where not only differences and additions are meaningful but ratios also have meaning (e.g., an individual’s income can be twice as much as John’s income).
• The process of cleaning raw data so that it can be used for machine learning is known as data preprocessing. It is the first step in a machine learning project and generally the most time-consuming one.
Data Mining(CO2)

Data Preprocessing. Often, the data provided for data mining is not immediately ready. Data
preprocessing (and transformation) prepares the data for mining. Typical data
preprocessing tasks are as follows:
• Aggregation. This task is performed when multiple features need to be combined into a
single one or when the scale of the features changes. For instance, when storing image
dimensions for a social media website, one can store by image width and height or
equivalently store by image area (width × height). Storing image area saves storage space
and tends to reduce data variance; hence, the data has higher resistance to distortion and
noise.
• Discretization. Consider a continuous feature such as money spent in our previous
example. This feature can be converted into discrete values – High, Normal, and Low –
by mapping different ranges to different discrete values. The process of converting
continuous features to discrete ones and deciding the continuous range that is being
assigned to a discrete value is called discretization.
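
A minimal sketch of the two tasks above, aggregation and discretization. The range boundaries (100 and 500) and the function names are assumptions made for illustration; in practice the ranges are chosen from the data or the business context.

```python
def aggregate_area(width, height):
    """Aggregation: combine two features (width, height) into a single one (area)."""
    return width * height

def discretize(amount, low_max=100.0, normal_max=500.0):
    """Discretization: map a continuous 'money spent' value to Low / Normal / High.

    The boundaries low_max and normal_max are illustrative assumptions.
    """
    if amount <= low_max:
        return "Low"
    if amount <= normal_max:
        return "Normal"
    return "High"

# Hypothetical amounts spent by four customers.
labels = [discretize(x) for x in (50, 100, 250, 900)]
area = aggregate_area(640, 480)
```

Choosing the boundaries is the actual discretization decision; the mapping itself is a simple range lookup.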
Data Mining(CO2)

• Feature Selection. Often, not all features gathered are useful. Some may be irrelevant,
or there may be a lack of computational power to make use of all the features, among
many other reasons. In these cases, a subset of features is selected that could ideally
enhance the performance of the selected data mining algorithm. In our example, the
customer’s name is irrelevant to the value of the class attribute and to the task
of predicting whether the individual will buy the given book or not.
• Feature Extraction. In contrast to feature selection, feature extraction converts the
current set of features to a new set of features that can perform the data mining task
better. A transformation is performed on the data, and a new set of features is extracted.
The example we provided for aggregation is also an example of feature extraction
where a new feature (area) is constructed from two other features (width and height).
• Sampling. Often, processing the whole dataset is expensive. With the massive growth
of social media, processing large streams of data is nearly impossible.
Data Mining(CO2)

Data Mining Algorithms

• Data mining algorithms can be divided into several categories. Here, we discuss two
well-established categories: supervised learning and unsupervised learning.
• In supervised learning, the class attribute exists, and the task is to predict the class
attribute value. Our previous example of predicting the class attribute “will buy” is an
example of supervised learning.
• In unsupervised learning, the dataset has no class attribute, and our task is to find
similar instances in the dataset and group them. By grouping these similar instances,
one can find significant patterns in a dataset. For example, unsupervised learning can
be used to identify events on Twitter, because the frequency of tweeting is different
for various events. By using unsupervised learning, tweets can be grouped based on
the times at which they appear, and hence the tweets’ corresponding real-world events
can be identified.
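
The idea of grouping tweets by the time at which they appear can be sketched as follows. This is a deliberately simple stand-in for a real clustering algorithm: tweets whose timestamps are separated by at most `max_gap` seconds are put into the same group. The function name, the gap value, and the timestamps are all assumptions for illustration.

```python
def group_by_time(timestamps, max_gap=60):
    """Group timestamps (in seconds) into bursts separated by more than max_gap."""
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)   # close enough to the previous tweet: same burst
        else:
            groups.append([t])     # large gap: start a new burst
    return groups

# Hypothetical tweet times, in seconds from the start of observation.
tweet_times = [5, 12, 30, 400, 410, 415, 900]
bursts = group_by_time(tweet_times)
```

No class labels are used: the groups emerge from the data alone, which is what makes this an unsupervised approach. Each resulting burst is a candidate real-world event.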
Machine Learning for extracting knowledge from the web(CO2)

• In machine learning-based algorithms, the information underlying the knowledge is extracted from the data themselves, which are explored and analyzed in search of recurring patterns or to discover hidden causal associations or relationships. The prediction model extracts knowledge through an inductive process: the input is the data and, possibly, examples of the expected output; the machine then learns the procedure to follow to obtain the same result.
• Machine learning-based algorithms develop their knowledge autonomously from the data patterns they receive, without needing explicit rules from the developer. In these models, the machine establishes by itself the patterns to follow to obtain the desired result; the factor that distinguishes artificial intelligence is therefore autonomy. During learning, the system receives a set of training data and estimates the relationships between the input and output data: these relationships are the parameters of the model estimated by the system.


Inverted Indices(CO2)
An inverted index is an index data structure storing a mapping from content, such as words or numbers, to its locations in a document or a set of documents. In simple words, it is a hashmap-like data structure that directs you from a word to a document or a web page.
• There are two types of inverted indexes. A record-level inverted index contains a list of references to documents for each word. A word-level inverted index additionally contains the positions of each word within a document. The latter form offers more functionality but needs more processing power and space to be created.
• Suppose we want to index the texts “hello everyone”, “this article is based on inverted index”, and “which is hashmap like data structure”. If we index by (text, word within the text), the index with location in text is:

Inverted indices(CO2)
hello (1, 1)
everyone (1, 2)
this (2, 1)
article (2, 2)
is (2, 3); (3, 2)
based (2, 4)
on (2, 5)
inverted (2, 6)
index (2, 7)
which (3, 1)
hashmap (3, 3)
like (3, 4)
data (3, 5)
structure (3, 6)
• The word “hello” is in document 1 (“hello everyone”) starting at word 1, so it has the entry (1, 1); the word “is” is in documents 2 and 3 at the 3rd and 2nd positions respectively (positions are counted in words).

• Steps to build an inverted index:
1. Fetch the document.
2. Remove stop words. Stop words are the most frequent, low-information words in a document, such as “I”, “the”, “we”, “is”, “an”.
3. Stem to the root word. When searching for “cat”, we want to see documents that have information about it, even if the word present in the document is “cats” or “catty” instead of “cat”. To relate such words, we chop off part of each word so that we get the “root word”. There are standard tools for this, such as Porter’s stemmer.
4. Record document IDs. If the word is already present, add a reference to the document to its index entry; otherwise create a new entry. Additional information such as the frequency and location of the word can also be stored.
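
The steps for building a word-level inverted index can be sketched as follows. This is a minimal illustration, not a production indexer: the stop-word set is just the five words listed in the text, and `naive_stem` is a toy stand-in for Porter's stemmer that only strips a plural "s". All names are hypothetical.

```python
from collections import defaultdict

STOP_WORDS = {"i", "the", "we", "is", "an"}  # the stop words named in the text

def naive_stem(word):
    """Toy stemmer: strip a plural 's'. A stand-in for Porter's stemmer, not a replacement."""
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "is", "us")):
        return word[:-1]
    return word

def build_inverted_index(documents):
    """Map each stemmed word to a list of (document number, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(documents, start=1):
        words = text.lower().replace(",", " ").split()
        for position, raw in enumerate(words, start=1):
            if raw in STOP_WORDS:
                continue  # stop words are skipped but still count towards positions
            index[naive_stem(raw)].append((doc_id, position))
    return dict(index)

texts = [
    "hello everyone",
    "this article is based on inverted index",
    "which is hashmap like data structure",
]
index = build_inverted_index(texts)
```

Looking up `index["hello"]` gives the (document, position) pairs for that word. Because "is" is treated as a stop word here, it is not indexed, unlike in the hand-built example where every word is kept.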


Boolean Queries(CO2)
• Boolean variables indicate the presence/absence of query terms
• Boolean operators: AND, OR, and NOT
• Boolean queries are arbitrary compositions of those, e.g.:
  • brutus AND caesar AND NOT Calpurnia
  • NOT ((duncan AND macbeth) OR (capulet AND montague))
  • …
• The query result is the (unordered) set of documents satisfying (i.e., “matching”) the query
• Extensions of Boolean retrieval (e.g., proximity, wildcards, fields) with rudimentary
ranking (e.g., weighted matches) exist
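
Boolean retrieval maps directly onto set operations: AND is intersection, OR is union, and NOT is the complement with respect to all documents. The sketch below evaluates the two example queries on a tiny, made-up collection; the term lists attached to each play are hypothetical and chosen only to make the result easy to check.

```python
def build_postings(documents):
    """Map each term to the set of document names that contain it."""
    postings = {}
    for name, text in documents.items():
        for term in set(text.lower().split()):
            postings.setdefault(term, set()).add(name)
    return postings

# Hypothetical toy collection: document name -> terms it contains.
plays = {
    "Julius Caesar": "brutus caesar calpurnia",
    "Antony and Cleopatra": "brutus caesar cleopatra",
    "Macbeth": "duncan macbeth",
    "Romeo and Juliet": "capulet montague romeo",
}
postings = build_postings(plays)
all_docs = set(plays)

def docs(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return postings.get(term.lower(), set())

# brutus AND caesar AND NOT Calpurnia
q1 = (docs("brutus") & docs("caesar")) - docs("Calpurnia")

# NOT ((duncan AND macbeth) OR (capulet AND montague))
q2 = all_docs - ((docs("duncan") & docs("macbeth")) | (docs("capulet") & docs("montague")))
```

Both results are plain sets, which reflects the point that a Boolean query returns an unordered set of matching documents without any ranking.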


PLSI(CO2)
Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing (PLSI, especially in information retrieval circles), is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis, from which PLSA evolved.
• Compared to standard latent semantic analysis, which stems from linear algebra and downsizes the occurrence tables (usually via a singular value decomposition), probabilistic latent semantic analysis is based on a mixture decomposition derived from a latent class model.

• Latent variable model for general co-occurrence data: associate each observation (w, d) with a class variable z ∈ Z = {z_1, …, z_K}.

• Generative model:
  • Select a document d with probability P(d)
  • Pick a latent class z with probability P(z|d)
  • Generate a word w with probability P(w|z)
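
The three generative steps above imply a joint distribution over documents and words. As a sketch of how they combine (this is the standard PLSA formulation; the notation follows the bullets above), summing over the unobserved class variable z gives:

```latex
P(d, w) \;=\; P(d) \sum_{z \in Z} P(w \mid z)\, P(z \mid d)
```

Each word in a document is thus modeled as a mixture over the K latent classes, with document-specific mixing weights P(z|d) and class-specific word distributions P(w|z).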
Query Optimization(CO2)
• Query optimization is the process of finding the most efficient way to execute a query, using techniques that improve query performance through rational use of system resources and performance metrics. The purpose of query tuning is to decrease the response time of the query, prevent excessive consumption of resources, and identify poor query performance.
• In the context of query optimization, query processing identifies how to retrieve data faster from SQL Server by analyzing the execution steps of the query, optimization techniques, and other information about the query.
• Query optimization tips for better performance
• Monitoring metrics can be used to evaluate query runtime, detect performance pitfalls, and show how they can be improved. For example, they include:
• Execution plan: The SQL Server query optimizer produces a step-by-step plan for running the query (for example, which indexes are scanned to retrieve data), and the plan provides a detailed overview of metrics during query execution.
• Input/Output statistics: Used to identify the number of logical and physical read operations during query execution, which helps users detect cache/memory capacity issues.

Query Optimization(CO2)
• Buffer cache: Holds recently used data pages in memory so that repeated reads avoid disk access; monitoring it helps keep memory usage on the server in check.
• Latency: Used to analyze the duration of queries or operations.
• Indexes: Used to accelerate read operations on the SQL Server.
• Memory-optimized tables: Used to store table data in memory to make read and write operations run faster.
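
The effect of an index on the execution plan can be observed with a small experiment. The text above refers to SQL Server; the sketch below instead uses SQLite through Python's built-in `sqlite3` module, purely because it runs without a server. The table, column, and index names are hypothetical, and SQLite's `EXPLAIN QUERY PLAN` output is much simpler than a SQL Server execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, likes INTEGER)")
conn.executemany(
    "INSERT INTO posts (author, likes) VALUES (?, ?)",
    [(f"user{i % 50}", i) for i in range(1000)],  # 1000 made-up rows, 50 authors
)

query = "SELECT id, likes FROM posts WHERE author = ?"

def plan(sql, params):
    """Return SQLite's query plan for the statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan detail

plan_before = plan(query, ("user7",))   # no index yet: full table scan
conn.execute("CREATE INDEX idx_posts_author ON posts (author)")
plan_after = plan(query, ("user7",))    # the planner can now search via the index

matching_rows = conn.execute(query, ("user7",)).fetchall()
```

Printing `plan_before` and `plan_after` shows the planner switching from scanning the whole table to searching through the index, which is the kind of change an execution plan is meant to reveal.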


Page ranking(CO2)
PageRank (by Google) is based on the following random walk:
• jump to a random vertex (each chosen with probability 1 / |V|) in the graph with probability ε
• follow a random outgoing edge (each chosen with probability 1 / out(v)) with probability (1 - ε)
• The PageRank score p(v) of vertex v is a measure of popularity and corresponds to its stationary visiting probability:

$$p(v) = (1 - \varepsilon) \cdot \sum_{(u,v) \in E} \frac{p(u)}{out(u)} + \frac{\varepsilon}{|V|}$$

PageRank scores correspond to the components of the dominant eigenvector π of
the transition probability matrix P, which can be computed using the power-iteration method.
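
The power-iteration idea can be sketched by applying the PageRank update repeatedly until the scores settle. This is a minimal illustration under simplifying assumptions: every vertex has at least one outgoing edge (no dangling pages), ε = 0.15 is an assumed jump probability, and a fixed number of iterations replaces a convergence test. The three-page graph is made up.

```python
def pagerank(graph, epsilon=0.15, iterations=100):
    """Power iteration for PageRank.

    graph maps each vertex to its list of out-neighbours; every vertex must
    appear as a key and have at least one outgoing edge (assumption).
    """
    vertices = list(graph)
    n = len(vertices)
    p = {v: 1.0 / n for v in vertices}           # start from the uniform distribution
    for _ in range(iterations):
        nxt = {v: epsilon / n for v in vertices}  # random-jump share
        for u in vertices:
            share = (1 - epsilon) * p[u] / len(graph[u])
            for v in graph[u]:
                nxt[v] += share                   # follow-an-edge share
        p = nxt
    return p

# Hypothetical web graph: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(web)
```

The scores sum to 1, as expected of a stationary distribution, and page C ranks highest because it receives links from both A and B.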


Daily Quiz (CO2)
1. It is a program in a search tool that helps the user search for information related to a specific topic.
A. Search engine
B. Search directory
C. Search box
D. None of these

2. Which one of the following refers to querying unstructured textual data?
A. Information access
B. Information update
C. Information retrieval
D. None of these

3. Which of the following is an essential process in which intelligent methods are applied to extract data patterns?
A. Warehousing
B. Data Mining
C. Text Mining
D. Data Selection
Daily Quiz(CO2)

4. For what purpose do analysis tools pre-compute summaries of huge amounts of data?
A. In order to maintain consistency
B. For authentication
C. For data access
D. To obtain the queries response

5. What are the functions of Data Mining?
A. Association and correlation analysis, classification
B. Prediction and characterization
C. Cluster analysis and evolution analysis
D. All of the above

6. In data mining, how many categories of functions are included?
A. 5
B. 4
C. 2
D. 3

Daily Quiz(CO2)
7. A data structure that maps terms back to the parts of a document in which they appear is called
a) Lexicon
b) Dictionary
c) Inverted index
d) All of the above

8. How can the information retrieval problem be defined formally?

a) a triple
b) a quadruple
c) a couple
d) None of the above

9. Which of the following is the local method for improving recall of an information retrieval system?
a) Query expansion
b) Relevance feedback
c) Ontology based model
d) None of the above
Daily Quiz(CO2)
10. Which social network is considered the most popular for business-to-business marketing?
a) Facebook
b) Orkut
c) Ryze
d) LinkedIn

11. The first step when marketing with social networks is to identify the goals.
a) True
b) False
c) Maybe
d) Maybe not

12. What is “social media optimization”?
a) Easily creating publicity via social networks
b) Writing clear content
c) Creating short content which is easily indexed
d) Hiring people to create content for social networks

Daily Quiz(CO2)
13. On which social network should you share content most frequently?
(A). Facebook
(B). Pinterest
(C). Twitter
(D). LinkedIn

14. Which is not a social media network?

(A). Facebook
(B). Wikipedia
(C). Twitter
(D). LinkedIn

15. The process of removing most common words (and, or, the, etc.) by an information retrieval
system before indexing is known as
a) Lemmatization
b) Stop word removal
c) Inverted indexing
d) Normalization
Daily Quiz(CO2)
16. PageRank is a metric for ________documents based on their quality
A. ranking hypertext
B. ranking document structure
C. ranking web content
D. None of these
17. The main purpose for structure mining is to extract previously unknown
relationships between
A. Web pages
B. Web hyperlinks
C. Web data
D. Web contents
18. Web structure mining is the process of discovering ____ information from the web
A. Semi structured
B. Unstructured
C. Structured
D. None of the above
Daily Quiz(CO2)

19. What is the minimum number of spanning trees in a connected graph?
A) 1
B) 2
C) 3
D) none of these

20. What will be the sum of the degrees of all vertices of an undirected graph G if it has n
vertices and e edges?
A) 2e
B) 2ne
C) ne
D) none of these
Essentials of Social graphs(CO2)

Social networks are naturally modeled as graphs, which we sometimes refer to as a social graph.
The entities are the nodes, and an edge connects two nodes if the nodes are related by the
relationship that characterizes the network. If there is a degree associated with the relationship,
this degree is represented by labeling the edges. Often, social graphs are undirected, as for the
Facebook friends graph. But they can be directed graphs, as for example the graphs of followers
on Twitter or Google+.

Nodes and Edges: A network is a graph, that is, a set of

• nodes, actors, or vertices (plural of vertex), and
• connections, edges, or ties.
In a social graph, nodes are people, and a connection between a pair of people denotes the
friendship, relationship, or social tie between them. In a web graph, nodes represent sites and a
connection between nodes indicates a web link between them. The size of the graph is the number
of nodes, |V| = n; the number of edges (the size of the edge set) is |E| = m.

• Degree and Degree Distribution: The number of edges connected to a node is the degree of
that node. The degree of a node $v_i$ is often denoted $d_i$. In the case of directed edges, nodes
have in-degrees (edges pointing toward the node) and out-degrees (edges pointing away from the
node). These values are denoted $d_i^{in}$ and $d_i^{out}$, respectively. In social media, degree
represents the number of friends a given user has. For example, on Facebook, degree represents
the user’s number of friends, and on Twitter in-degree and out-degree represent the number of
followers and followees, respectively. In any undirected graph, the summation of all node degrees
is equal to twice the number of edges.
• Theorem 2.1. The summation of degrees in an undirected graph is twice the number of edges:
$\sum_i d_i = 2|E|$. (2.3)
Proof. Any edge has two endpoints; therefore, when calculating the degrees $d_i$ and $d_j$ for any
connected nodes $v_i$ and $v_j$, the edge between them contributes 1 to both $d_i$ and $d_j$;
hence, if the edge is removed, $d_i$ and $d_j$ become $d_i - 1$ and $d_j - 1$, and the summation
$\sum_k d_k$ becomes $\sum_k d_k - 2$. Hence, by removal of all m edges, the degree summation
becomes smaller by 2m. However, we know that when all edges are removed the degree summation
becomes zero; therefore, the degree summation is $2 \times m = 2|E|$.
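
Theorem 2.1 can be checked numerically on a small example. The sketch below is illustrative only: the function name `degrees` and the four-edge sample graph are assumptions made for this example and do not come from the slides.

```python
def degrees(edges):
    """Return a dict mapping each node to its degree in an undirected graph."""
    degree = {}
    for u, v in edges:
        # Every edge contributes 1 to the degree of each of its two endpoints.
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


# Hypothetical undirected graph: a triangle a-b-c plus a pendant node d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
degree = degrees(edges)

# The degree summation equals twice the number of edges.
print(sum(degree.values()), 2 * len(edges))
```

Because each edge is counted once at each endpoint, the two printed numbers agree for any edge list passed in.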

• Graph Representation
• Adjacency Matrix: A simple way of representing graphs is to use an adjacency
matrix (also known as a sociomatrix). Figure 2.4 depicts an example of a graph
and its corresponding adjacency matrix. A value of 1 in entry $A_{i,j}$ of the adjacency
matrix indicates a connection between nodes $v_i$ and $v_j$, and a 0 denotes no
connection between the two nodes. When generalized, any real number can be
used to show the strength of the connection between two nodes.
• Adjacency List: In an adjacency list, every node is linked with a list of all the
nodes that are connected to it. The list is often sorted based on node order or some
other preference.
• Edge List: Another simple and common approach to storing large graphs is to
save all edges in the graph. This is known as the edge list representation.
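
The three representations can be derived from one another. The following is a minimal sketch for an undirected, unweighted graph; the helper name `build_representations` and the four-node sample graph are hypothetical and are not the graph of Figure 2.4.

```python
def build_representations(nodes, edge_list):
    """Build an adjacency matrix and an adjacency list from an edge list."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    adjacency = {v: [] for v in nodes}
    for u, v in edge_list:
        # Undirected graph: record the connection in both directions.
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
        adjacency[u].append(v)
        adjacency[v].append(u)
    for v in adjacency:
        adjacency[v].sort()  # keep each neighbor list in node order
    return matrix, adjacency


# Hypothetical sample graph given as an edge list.
nodes = ["v1", "v2", "v3", "v4"]
edge_list = [("v1", "v2"), ("v2", "v3"), ("v2", "v4"), ("v3", "v4")]
matrix, adjacency = build_representations(nodes, edge_list)

for row in matrix:
    print(row)
print(adjacency)
```

The matrix needs space quadratic in the number of nodes, whereas the adjacency list and the edge list grow with the number of edges, which is why the latter two are usually preferred for large, sparse social graphs.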

• Types of Graphs
In this section we discuss several basic types of graphs.
Null Graph. A null graph is a graph where the node set is empty (there are no nodes). Since there
are no nodes, there are also no edges. Formally, G(V, E), V = E = ∅. (2.11)
Empty Graph. An empty or edgeless graph is one where the edge set is empty: G(V, E), E = ∅. (2.12)
Note that the node set can be non-empty. A null graph is an empty graph, but not vice versa.
Directed/Undirected/Mixed Graphs. Graphs that we have discussed thus far rarely had directed edges.
As mentioned, graphs that only have directed edges are called directed graphs, and ones that only
have undirected edges are called undirected graphs.
Mixed graphs have both directed and undirected edges. In directed graphs, we can have two edges
between i and j (one from i to j and one from j to i), whereas in undirected graphs only one edge can
exist. As a result, the adjacency matrix for directed graphs is not in general symmetric (i connected
to j does not mean j is connected to i, i.e., $A_{i,j} \neq A_{j,i}$ in general), whereas the adjacency
matrix for undirected graphs is symmetric ($A = A^T$).

Weighted Graphs. A weighted graph is one in which edges are associated with
weights. For example, a graph could represent a map, where nodes are cities and
edges are routes between them. The weight associated with each edge represents the
distance between these cities. Formally, a weighted graph can be represented as G(V,
E, W), where W represents the weights associated with each edge, |W| = |E|.
Adjacent Nodes and Incident Edges. Two nodes v1 and v2 in graph G(V, E) are adjacent
when v1 and v2 are connected via an edge:
v1 is adjacent to v2 ≡ e(v1, v2) ∈ E. (2.13)
Two edges e1(a, b) and e2(c, d) are incident when they share one endpoint (i.e., are
connected via a node):
e1(a, b) is incident to e2(c, d) ≡ (a = c) ∨ (a = d) ∨ (b = c) ∨ (b = d). (2.14)

• Traversing an Edge.
• An edge in a graph can be traversed when one starts at one of its end-nodes, moves along the
edge, and stops at its other end-node. So, if an edge e(a, b) connects nodes a and b, then visiting e
can start at a and end at b.
• Alternatively, in an undirected graph we can start at b and end the visit at a.
• Walk, Path, Trail, Tour, and Cycle.
• A walk is a sequence of incident edges traversed one after another. In other words, if in a walk one
traverses edges $e_1(v_1, v_2), e_2(v_2, v_3), e_3(v_3, v_4), \ldots, e_n(v_n, v_{n+1})$, we have $v_1$
as the walk’s starting node and $v_{n+1}$ as the walk’s ending node. When a walk does not end
where it started ($v_1 \neq v_{n+1}$), it is called an open walk. When a walk returns to
where it started ($v_1 = v_{n+1}$), it is called a closed walk. Similarly, a walk can be denoted as a
sequence of nodes, $v_1, v_2, v_3, \ldots, v_n$. In this representation, the edges that are traversed are
$e_1(v_1, v_2), e_2(v_2, v_3), \ldots, e_{n-1}(v_{n-1}, v_n)$. The length of a walk is the number of
edges traversed during the walk and in this case is $n - 1$. A trail is a walk where no edge is
traversed more than once; therefore, all walk edges are distinct.

• A closed trail (one that ends where it started) is called a tour or circuit. A walk where nodes
and edges are distinct is called a path, and a closed path is called a cycle. The length of a
path or cycle is the number of edges traversed in the path or cycle. In a directed graph, we
have directed paths because traversal of edges is only allowed in the direction of the edges.
In Figure 2.7, v4, v3, v6, v4, v2 is a walk; v4, v3 is a path; v4, v3, v6, v4, v2 is a trail; and
v4, v3, v6, v4 is both a tour and a cycle. A graph has a Hamiltonian cycle if it has a cycle
that visits all the nodes in the graph. It has an Eulerian tour if it has a tour that traverses
every edge exactly once.
• Special Graphs: Using the general concepts defined thus far, many special graphs can be
defined. These special graphs can be used to model different problems. We review some
well-known special graphs and their properties in this section.
• Trees and Forests: Trees are special cases of undirected graphs. A tree is a connected graph
that has no cycle in it. In a tree, there is exactly one path between any pair of nodes. A
graph consisting of a set of disconnected trees is called a forest.

• Special Subgraphs: Some subgraphs are frequently used because of their properties. Two such
subgraphs are discussed here.
• Spanning Tree: For any connected graph, a spanning tree is a subgraph that is a tree and
includes all the nodes of the graph. When the original graph is not a tree, its
spanning tree includes all the nodes, but not all the edges. There may exist multiple spanning
trees for a graph. For a weighted graph and one of its spanning trees, the weight of that
spanning tree is the summation of the edge weights in the tree. Among the many spanning
trees found for a weighted graph, the one with the minimum weight is called the minimum
spanning tree (MST).
• Complete Graphs: A complete graph is a graph where for a set of nodes V, all possible edges
exist in the graph. In other words, all pairs of nodes are connected with an edge. Hence,
$|E| = \binom{|V|}{2}$. Complete graphs with n nodes are often denoted as $K_n$, for example
$K_1$, $K_2$, $K_3$, and $K_4$.
• Planar Graphs: A graph that can be drawn in such a way that no two edges cross each other
(other than at the endpoints) is called planar. A graph that is not planar is called nonplanar.

• Bipartite Graphs: A bipartite graph G(V, E) is a graph where the node set can be partitioned into two sets such
that, for all edges, one endpoint is in one set and the other endpoint is in the other set. In other words, edges
connect nodes in these two sets, but there exist no edges between nodes that belong to the same set. Formally,

$V = V_L \cup V_R$, (2.18) $V_L \cap V_R = \emptyset$, (2.19) $E \subseteq V_L \times V_R$.

• Regular Graphs: A regular graph is one in which all nodes have the same degree. A regular graph where all
nodes have degree 2 is called a 2-regular graph. More generally, a graph where all nodes have degree k is
called a k-regular graph.
• We discuss two traversal algorithms: depth-first search (DFS) and breadth-first search (BFS).
• Depth-First Search (DFS): Depth-first search (DFS) starts from a node $v_i$, selects one of its neighbors
$v_j \in N(v_i)$, and performs DFS on $v_j$ before visiting other neighbors in $N(v_i)$. In other words, DFS
explores as deep as possible in the graph using one neighbor before backtracking to other neighbors. Consider
a node $v_i$ that has neighbors $v_j$ and $v_k$; that is, $v_j, v_k \in N(v_i)$. Let $v_{j(1)} \in N(v_j)$ and
$v_{j(2)} \in N(v_j)$ denote neighbors of $v_j$ such that $v_i \neq v_{j(1)} \neq v_{j(2)}$. Then for a
depth-first search starting at $v_i$ that visits $v_j$ next, nodes $v_{j(1)}$ and $v_{j(2)}$ are visited before
visiting $v_k$. In other words, a deeper node $v_{j(1)}$ is preferred to a neighbor $v_k$ that is closer to $v_i$.
Depth-first search can be used both for trees and graphs, but is better visualized using trees.
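
The visiting order described above can be reproduced with a short sketch. This is one possible iterative formulation using an explicit stack, not the pseudocode from the source text; the function name `dfs` and the five-node graph (named after the nodes in the paragraph) are assumptions for illustration.

```python
def dfs(graph, start):
    """Return the nodes reachable from `start` in depth-first visiting order."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbors in reverse so the first listed neighbor is explored first.
        for neighbor in reversed(graph[node]):
            if neighbor not in visited:
                stack.append(neighbor)
    return order


# Hypothetical graph: vi has neighbors vj and vk; vj has neighbors vj1 and vj2.
graph = {
    "vi": ["vj", "vk"],
    "vj": ["vi", "vj1", "vj2"],
    "vk": ["vi"],
    "vj1": ["vj"],
    "vj2": ["vj"],
}
dfs_order = dfs(graph, "vi")
print(dfs_order)
```

Starting at vi and taking vj first, the deeper nodes vj1 and vj2 are visited before the closer neighbor vk.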

• Breadth-First Search (BFS): Breadth-first search (BFS) starts from a
node, visits all its immediate neighbors first, and then moves to the
second level by traversing their neighbors. Like DFS, the algorithm
can be used both for trees and graphs.
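
A minimal sketch of the level-by-level visiting order, using a queue. The function name `bfs` and the sample graph are hypothetical and chosen only for this illustration.

```python
from collections import deque


def bfs(graph, start):
    """Return the nodes reachable from `start` in breadth-first visiting order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                # Mark on enqueue so a node is never queued twice.
                visited.add(neighbor)
                queue.append(neighbor)
    return order


# Hypothetical graph: vi has neighbors vj and vk; vj has neighbors vj1 and vj2.
graph = {
    "vi": ["vj", "vk"],
    "vj": ["vi", "vj1", "vj2"],
    "vk": ["vi"],
    "vj1": ["vj"],
    "vj2": ["vj"],
}
bfs_order = bfs(graph, "vi")
print(bfs_order)
```

Both immediate neighbors of vi are visited before any second-level node, in contrast to depth-first search, which goes deep along one neighbor first.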

Algorithm 2.4 Dijkstra’s Shortest Path Algorithm
• Require: Start node s, weighted graph/tree G(V, E, W)
• return Shortest paths and distances from s to all other nodes.
• for v ∈ V do
•   distance[v] = ∞;
•   predecessor[v] = −1;
• end for
• distance[s] = 0;
• unvisited = V;
• while unvisited ≠ ∅ do
•   smallest = arg min_{v ∈ unvisited} distance[v];
•   if distance[smallest] == ∞ then
•     break;
•   end if
•   unvisited = unvisited \ {smallest};
•   currentDistance = distance[smallest];
•   for adjacent node to smallest: neighbor ∈ unvisited do
•     newPath = currentDistance + w(smallest, neighbor);
•     if newPath < distance[neighbor] then
•       distance[neighbor] = newPath;
•       predecessor[neighbor] = smallest;
•     end if
•   end for
• end while
• Return distance[] and predecessor[] arrays
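
The pseudocode can be sketched in Python as follows. One deliberate difference, stated as an assumption: instead of scanning the unvisited set for the arg min, this version uses a binary heap (`heapq`), which is a common implementation choice and yields the same distances. The graph, its weights, and the name `dijkstra` are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return (distance, predecessor) dicts for shortest paths from `source`.

    `graph` maps each node to a dict {neighbor: non-negative edge weight}.
    """
    distance = {v: float("inf") for v in graph}
    predecessor = {v: None for v in graph}
    distance[source] = 0
    visited = set()
    heap = [(0, source)]
    while heap:
        current_distance, smallest = heapq.heappop(heap)
        if smallest in visited:
            continue  # stale heap entry; node was already finalized
        visited.add(smallest)
        for neighbor, weight in graph[smallest].items():
            new_path = current_distance + weight
            if new_path < distance[neighbor]:
                distance[neighbor] = new_path
                predecessor[neighbor] = smallest
                heapq.heappush(heap, (new_path, neighbor))
    return distance, predecessor


# Hypothetical weighted undirected graph.
graph = {
    "s": {"a": 4, "b": 1},
    "a": {"s": 4, "b": 2, "c": 5},
    "b": {"s": 1, "a": 2, "c": 8},
    "c": {"a": 5, "b": 8},
}
distance, predecessor = dijkstra(graph, "s")
print(distance)
print(predecessor)
```

Following `predecessor` backward from any node reconstructs its shortest path to the source; unreachable nodes keep an infinite distance, which corresponds to the `break` in the pseudocode.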

• Algorithm Prim’s Algorithm
• Require: Connected weighted graph G(V, E, W)
• return Spanning tree T(Vs, Es)
• Vs = {a random node from V};
• Es = {};
• while V ≠ Vs do
•   e(u, v) = arg min_{(u,v), u ∈ Vs, v ∈ V − Vs} w(u, v);
•   Vs = Vs ∪ {v};
•   Es = Es ∪ {e(u, v)};
• end while
• Return tree T(Vs, Es) as the minimum spanning tree;
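
A minimal Python sketch of the same idea. Two assumptions differ from the pseudocode: the start node is passed in rather than chosen at random, and a heap of candidate edges replaces the explicit arg min. The function name `prim` and the sample graph are hypothetical.

```python
import heapq


def prim(graph, start):
    """Return the edges (u, v, weight) of a minimum spanning tree.

    `graph` maps each node to a dict {neighbor: weight} and must be connected.
    """
    in_tree = {start}
    tree_edges = []
    # Candidate edges leaving the tree, ordered by weight.
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    while frontier and len(in_tree) < len(graph):
        weight, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue  # both endpoints already in the tree; would create a cycle
        in_tree.add(v)
        tree_edges.append((u, v, weight))
        for nxt, w in graph[v].items():
            if nxt not in in_tree:
                heapq.heappush(frontier, (w, v, nxt))
    return tree_edges


# Hypothetical weighted undirected graph.
graph = {
    "s": {"a": 4, "b": 1},
    "a": {"s": 4, "b": 2, "c": 5},
    "b": {"s": 1, "a": 2, "c": 8},
    "c": {"a": 5, "b": 8},
}
mst = prim(graph, "s")
print(mst, sum(w for _, _, w in mst))
```

At every step the cheapest edge with exactly one endpoint inside the tree is added, so the result spans all nodes with |V| − 1 edges.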

• Algorithm Ford-Fulkerson Algorithm
• Require: Connected weighted graph G(V, E, W), source s, sink t
• return A maximum flow graph
• ∀(u, v) ∈ E, f(u, v) = 0
• while there exists an augmenting path p in the residual graph $G_R$ do
•   Augment flows by p
• end while
• Return flow value and flow graph;
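
The pseudocode leaves open how an augmenting path is found. The sketch below assumes breadth-first search for that step (the Edmonds-Karp variant of Ford-Fulkerson) and returns only the flow value, not the flow graph. The function name `max_flow` and the sample capacities are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Return the maximum flow value from `source` to `sink`.

    `capacity` maps each node to a dict {neighbor: edge capacity}.
    """
    # Residual graph: forward edges start at full capacity, reverse edges at 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left

        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Augment: reduce forward capacity, increase reverse capacity.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical directed capacities.
capacity = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
flow_value = max_flow(capacity, "s", "t")
print(flow_value)
```

The reverse edges in the residual graph allow later augmenting paths to undo earlier routing decisions, which is what makes the repeated augmentation reach the maximum.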
• Algorithm Bridge Detection Algorithm
• Require: Connected graph G(V, E)
• return Bridge edges
• bridgeSet = {}
• for e(u, v) ∈ E do
•   G′ = Remove e from G
•   Disconnected = False;
•   if BFS in G′ starting at u does not visit v then
•     Disconnected = True;
•   end if
•   if Disconnected then
•     bridgeSet = bridgeSet ∪ {e}
•   end if
• end for
• Return bridgeSet
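
A minimal Python sketch of this edge-removal test, assuming a simple undirected graph given as an edge list. The helper names `reachable` and `find_bridges` and the sample graph are hypothetical.

```python
from collections import deque


def reachable(adjacency, start, target, removed):
    """BFS from `start`, ignoring the edge `removed`; True if `target` is visited."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adjacency[u]:
            if {u, v} == removed:
                continue  # pretend this edge has been removed from the graph
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def find_bridges(edges):
    """Return the edges whose removal disconnects their two endpoints."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return [(u, v) for u, v in edges if not reachable(adjacency, u, v, {u, v})]


# Hypothetical graph: a triangle 1-2-3 with a pendant edge 3-4.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
bridges = find_bridges(edges)
print(bridges)
```

This runs one BFS per edge, so it costs on the order of |E| · (|V| + |E|); linear-time bridge-finding algorithms exist but are not what the pseudocode above describes.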
• Directed Edges and Directed Graphs:-

Edges can have directions. A directed edge is sometimes called an arc. Edges are represented
using their end-points, e.g., e(v2, v1). In undirected graphs, e(v1, v2) and e(v2, v1) denote the
same edge.

• Neighborhood and Degree (In-degree, Out-degree):-

For any node v, the set of nodes it is connected to via an edge is called its neighborhood and
is represented as N(v). The number of edges connected to a node is the degree of that node
(the size of its neighborhood). The degree of a node i is usually denoted $d_i$. In
directed graphs, the in-degree is the number of edges pointing toward a node, and the out-degree
is the number of edges pointing away from it.

Social Networks(CO2)
• Social networks have been a major part of everyone's lives since the evolution of the web into Web 2.0, which
emphasizes user-generated content, usability, and interoperability. A social network can be formally defined as a
platform to build social relations among people who share similar interests, backgrounds, or real-life connections.
According to a survey conducted by the Pew Research Center (2015), 72% of American adult internet users use
Facebook, as indicated in Table 1. This amounts to about 62% of the entire American adult population.
• Table 1. Percentage of Social Network Users among American Adult Internet Users

Social Network    Internet Users
Facebook          72%
Pinterest         31%
Instagram         28%
LinkedIn          25%
Twitter           23%

Because social networks have a huge number of users, they generate a
lot of data. Extracting knowledge from this data can
give us a lot of useful information. This is done through social web
mining algorithms and techniques. Social network mining is a hot
research topic since it combines two very interesting research
topics: web data mining and social network analysis. Social
network mining also draws on many other disciplines, such as
machine learning, network analysis, sociology,
ethnography, statistics, and many more.

Network Models (CO2)

• In social media, many social networks contain millions of nodes and billions of
edges. These complex networks have billions of friendships, the reasons for
existence of most of which are obscure. Humbled by the complexity of these
networks and the difficulty of independently analyzing each one of these friendships,
we can design models that generate, on a smaller scale, graphs similar to real-world
networks. On the assumption that these models simulate properties observed in
real-world networks well, the analysis of real-world networks boils down to a
cost-efficient measuring of different properties of simulated networks. In addition, these
models
  • allow for a better understanding of phenomena observed in real-world
  networks by providing concrete mathematical explanations and
  • allow for controlled experiments on synthetic networks when real-world networks are
  not available.
• We discuss three principal network models in this chapter: the random graph model,
the small-world model, and the preferential attachment model.

Properties of Real-World Networks

• Real-world networks share common characteristics. When designing network models, we aim
to devise models that can accurately describe these networks by mimicking these common
characteristics. To determine these characteristics, a common practice is to identify their
attributes and show that measurements for these attributes are consistent across networks. In
particular, three network attributes exhibit consistent measurements across real-world
networks: degree distribution, clustering coefficient, and average path length.
• Degree Distribution : Consider the distribution of wealth among individuals. Most individuals
have an average amount of capital, whereas a few are considered extremely wealthy. In fact, we
observe exponentially more individuals with an average amount of capital than wealthier ones.
Similarly, consider the population of cities. A few metropolitan areas are densely populated,
whereas other cities have an average population size. In social media, we observe the same
phenomenon regularly when measuring popularity or interestingness for entities.

• Clustering Coefficient: In real-world social networks, friendships are highly
transitive. In other words, friends of an individual are often friends with one
another. These friendships form triads of friendships that are frequently
observed in social networks. These triads result in networks with high average
[local] clustering coefficients.
• Average Path Length: In real-world networks, any two members of the
network are usually connected via short paths. In other words, the average path
length is small. This is known as the small-world phenomenon. In the
well-known small-world experiment conducted in the 1960s, Stanley Milgram
conjectured that people around the world are connected to one another
via a path of at most six individuals (i.e., the six degrees of separation).
Similarly, we observe small average path lengths in social networks.
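
Both attributes can be measured directly on a graph. The sketch below assumes the standard definition of the local clustering coefficient (the fraction of pairs of a node's neighbors that are themselves connected), which the slides do not spell out, and measures path lengths with breadth-first search over connected pairs. The function names and the four-node graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of pairs of `node`'s neighbors that are connected to each other."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: nodes with fewer than two neighbors score 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2 * links / (k * (k - 1))


def average_path_length(adjacency):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical friendship graph: a triad a-b-c plus d, who only knows c.
adjacency = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
coefficients = {v: local_clustering(adjacency, v) for v in adjacency}
average_clustering = sum(coefficients.values()) / len(coefficients)
path_length = average_path_length(adjacency)
print(coefficients, average_clustering, path_length)
```

In the sample graph, a and b sit inside a closed triad and score 1, while c scores lower because its neighbor d is not connected to a or b.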

• Random Graphs: We start with the most basic assumption on how friendships can be formed:
edges (i.e., friendships) between nodes (i.e., individuals) are formed randomly. The random graph
model follows this basic assumption. In reality, friendships in real-world
networks are far from random.
• By assuming random friendships, we simplify the process of friendship formation in real-world
networks, hoping that these random friendships ultimately create networks that exhibit common
characteristics observed in real-world networks. Formally, we can assume that for a graph with a fixed
number of nodes n, any of the $\binom{n}{2}$ possible edges can be formed independently, with
probability p. This graph is called a random graph and we denote it as the G(n, p) model.
• This model was first proposed independently by Edgar Gilbert [100] and Solomonoff and Rapoport
[262]. Another way of randomly generating graphs is to assume that both the number of nodes n and
the number of edges m are fixed. However, we need to determine which m edges are selected from
the set of $\binom{n}{2}$ possible edges. Let Ω denote the set of graphs with n nodes and m edges. To
generate a random graph, we can uniformly select one of the graphs in Ω. The number of graphs with
n nodes and m edges (i.e., |Ω|) is
$|\Omega| = \binom{\binom{n}{2}}{m}$. (4.3)
The uniform random graph selection probability is $1/|\Omega|$.
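
Both ways of generating a random graph can be sketched in a few lines. The function names `gnp` and `gnm`, the `seed` parameter, and the sample sizes are assumptions made for this illustration.

```python
import math
import random
from itertools import combinations


def gnp(n, p, seed=None):
    """G(n, p): include each of the C(n, 2) possible edges independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def gnm(n, m, seed=None):
    """Fixed n and m: choose m distinct edges uniformly from the C(n, 2) possible edges."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)


n, m = 10, 7
possible_edges = math.comb(n, 2)        # C(n, 2) possible edges
omega = math.comb(possible_edges, m)    # |Ω|, the number of graphs with n nodes and m edges

print(len(gnp(n, 0.3, seed=1)), "edges sampled by G(n, p)")
print(gnm(n, m, seed=1))
print(possible_edges, omega)
```

In G(n, p) the number of edges varies from run to run, with expected value $p\binom{n}{2}$, whereas the second generator always returns exactly m edges.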

• Small-World Model: The assumption behind the random graph model is that
connections in real-world networks are formed at random. Although unrealistic,
random graphs can model average path lengths in real-world networks properly,
but underestimate the clustering coefficient. To mitigate this problem, Duncan J.
Watts and Steven Strogatz in 1997 proposed the small-world model. In real-world
interactions, many individuals have a limited, and often fixed, number of
connections. Individuals connect with their parents, brothers, sisters, grandparents,
and teachers, among others. Thus, instead of assuming random connections, as we
did in random graph models, one can assume an egalitarian model in real-world
networks, where people have the same number of neighbors (friends). This again is
unrealistic; however, it models more accurately the clustering coefficient of
real-world networks. In graph theory terms, this assumption is equivalent to
embedding individuals in a regular network.

• Preferential Attachment Model: There exist a variety of scale-free
network-modeling algorithms. A well-established one is the model proposed by
Barabási and Albert [24]. The model is called preferential attachment or
sometimes the Barabási-Albert (BA) model and is as follows: When new
nodes are added to networks, they are more likely to connect to existing
nodes that many others have connected to. This connection likelihood is
proportional to the degree of the node that the new node is aiming to connect
to. In other words, a rich-get-richer phenomenon or aristocrat network is
observed where the higher the node’s degree, the higher the probability of
new nodes getting connected to it. Unlike random graphs, in which we assume
friendships are formed randomly, in the preferential attachment model we
assume that individuals are more likely to befriend gregarious others.

• Require: Graph G(V0, E0), where |V0| = m0 and $d_v \geq 1$ ∀ v ∈ V0, number of expected connections
m ≤ m0, time to run the algorithm t
• return A scale-free network
• // Initial graph with m0 nodes with degrees at least 1
• G(V, E) = G(V0, E0);
• for 1 to t do
•   V = V ∪ {vi}; // add new node vi
•   while $d_i \neq m$ do
•     Connect vi to a random node vj ∈ V, i ≠ j (i.e., E = E ∪ {e(vi, vj)}) with probability
      $P(v_j) = d_j / \sum_k d_k$
•   end while
• end for
• Return G(V, E)
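
A minimal sketch of this procedure. It relies on one common implementation trick, stated here as an assumption: a list in which every node appears once per incident edge, so that a uniform draw from the list selects a node with probability proportional to its degree. The function name, the integer node labels, and the triangle used as the initial graph are hypothetical.

```python
import random


def preferential_attachment(initial_edges, m, t, seed=None):
    """Grow a graph for t steps; each new node attaches to m distinct existing nodes.

    Requires m to be at most the number of nodes in the initial graph.
    """
    rng = random.Random(seed)
    edges = list(initial_edges)
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # choice from this list picks node v_j with probability d_j / sum_k d_k.
    endpoints = [node for edge in edges for node in edge]
    new_node = max(endpoints) + 1
    for _ in range(t):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        # The new node is added to `endpoints` only after its targets are
        # chosen, so it never connects to itself.
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
        new_node += 1
    return edges


# Hypothetical initial graph: a triangle on nodes 0, 1, 2 (m0 = 3).
ba_edges = preferential_attachment([(0, 1), (1, 2), (2, 0)], m=2, t=50, seed=42)

degree = {}
for u, v in ba_edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(len(ba_edges), max(degree.values()), min(degree.values()))
```

Nodes that join early tend to accumulate many more connections than late arrivals, which is the rich-get-richer effect behind the heavy-tailed degree distribution.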
Information Diffusion in social media. (CO2)
• Diffusion is the process by which information is spread from one place to another
through interactions. Its study draws on techniques from many fields, such as
sociology, epidemiology, and ethnography. Of course, everyone is interested in not
getting infected by a contagious disease. The diffusion process involves three main
elements:
• Sender. A sender (or a group of senders) is responsible for initiating the diffusion
process.
• Receiver. A receiver (or a group of receivers) receives the diffusion information from
the sender. Commonly, the number of receivers is higher than the number of senders.
• Medium. This is the channel through which the diffusion information is sent from the
sender to the receiver. This can be TV, newspaper, social media (e.g., a tweet on
Twitter), social ties, air (in the case of a disease spreading process), etc.

• From a network point of view: how is the diffusion process handed over? In fact,
social relations play a significant role. They are the channels by which social
contagion and persuasion are done. Particularly, the structural positions of persons
and their personal characteristics make some people more ready to adopt the
innovation than others. Networks with different patterns of connection have different
properties regarding how things are propagated, which have significant implications
for interventions into, for example, rumor propagation.

• A diffusion starts with an adopter (or a small number of adopters) who spreads the
innovation to others. Innovation typically represents newness; it is not the same thing
as invention. It is both a process and an outcome, and it involves discontinuous
change.

• Those who adopt early are often too innovative to be influential in a local
network. They contaminate their contacts who in turn contaminate their
contacts and so on. The more people a person is linked to, the greater the
chances that that person will adopt the innovation. At a larger scale, and since
communities are interlinked, it is very likely that an innovation jumps from
one community to another via boundary spanners (or bridges) and starts over
diffusing again. It is a characteristic of social networks.
• However, any diffusion process can be expedited, delayed, or even stopped if it
is discovered that the product (e.g., a video, an audio, a book, etc.) is faulty,
and it should be fixed and then released again. This process is called
an intervention. Intervention can be achieved via several methods such as
stopping the production of the product, limiting the distribution of the product,
restricting the exposure to the product, reducing the interest in the product, or
reducing interactions within the population. In any case, intervention processes
can damage the business of small companies, as many customers will no
longer trust the products that these companies produce.

MCQs(CO2)
1) What is the highest Google PageRank a web page can have?
A) 500
B) 100
C) 10
D) None of the above

2) Which of the following process is not involved in the data mining process?
A) Data exploration
B) Data transformation
C) Data archaeology
D) Knowledge extraction

3) Which of the following process uses intelligent methods to extract data patterns?
A) Data mining
B) Text mining
C) Warehousing
D) Data selection

4) What are the chief functions of the data mining process?
A) Prediction and characterization
B) Cluster analysis and evolution analysis
C) Association and correlation analysis, classification
D) All of the above

5) Data used to build a data mining model.

A) training data
B) hidden data
C) test data
D) validation data

6) Application of machine learning methods to large databases is called __________________

A) big data computing
B) artificial intelligence
C) data mining
D) internet of things
7) A person trained to interact with a human expert in order to capture their knowledge.
A) knowledge developer
B) knowledge programmer
C) knowledge engineer
D) knowledge extractor

8) Social network analysis is the process of investigating social structures through the use of ____ and ____

A) Edges, graph
B) Vector, graph
C) Network, graph
D) Vector, edges

9) ____________is a cloud-based text and social networks analyzer

A) Cytoscape
B) Gephi
C) Pajek
D) Netlytic
10) PageRank is a method
A) For rating the importance of web pages objectively and mechanically
B) Used in the Google search engine
C) A simple way to count the number of times a web page is cited
D) All of the above

11) In any directed graph if all edges are reciprocal, can have maximum of |E|=
A)1
B)0
C)2
D)None of the above

12) A pair of nodes is said to be structurally equivalent to the extent that

A) They occupy identical locations in a network
B) They are connected to exactly the same others
C) They have identical relations to all outside actors
D) All of the above
13) Which of the following is the most viral section of the internet?
A) Chat Messenger
B) Social networking sites
C) Tutorial sites
D) Chat-rooms

14) Which of the following is not an appropriate measure for securing social networking accounts?
A) Strong passwords
B) Link your account with a phone number
C) Never write your password anywhere
D) Always maintain a soft copy of all your passwords in your PC

15) ________________ is a popular tool to block social-media websites from tracking your browsing activities.
A ) Fader
B) Blur
C) Social-Media Blocker
D) Ad-blocker
16) Increase your security for social media account by always ____________ as you step away from the
system.
A) signing in
B) logging out
C) signing up
D) logging in

17) Which of the following activities is NOT a data mining task?

A) Predicting the future stock price of a company using historical records
B) Monitoring and predicting failures in a hydropower plant
C) Extracting the frequencies of a sound wave
D) Monitoring the heart rate of a patient for abnormalities

18) Social networks are a great distribution channel for ___________

A) Customer feedback
B) Viral content
C) Exclusive coupons
D) Marketing messages
19) Which of the following is valuable in increasing page rank?
A) paying for placement
B) static content
C) quantity of links from other highly ranked pages to your site
D) None of Above

20) ________________is cross-platform user friendly tool that allows you to draw social
network
A) VOSViewer
B) Social Network Visualizer
C) Commetrix
D) Cuttlefish

GLOSSARY QUESTION (CO2)
1. Software system designed to search for information on the WWW. Answer: web search
2. Website that brings people together to talk. Answer: social network
3. Inverted index can only be used for. Answer: Boolean queries
4. Includes the concept of social listening. Answer: social media analytics
5. Creation of knowledge from structured and unstructured sources. Answer: knowledge extraction

1. Is an important social process. Answer: diffusion of innovation
2. Web crawler is also called a. Answer: web spider
3. Process where intelligent methods are applied to extract data patterns. Answer: data mining
4. Clustering is a common data mining technique. Answer: unsupervised
5. Data mining is also known as. Answer: KDD
6. Integral part of every marketing strategy. Answer: social media
7. Merges the results of two (or more) queries. Answer: OR operator
8. Intersects the results of two (or more) queries. Answer: AND operator
9. Type of sites known as friend-of-a-friend sites. Answer: social networking sites

1. Are organized primarily around people. Answer: social networks
2. The size of the graph is. Answer: |V| = n
3. A directed edge is sometimes called an. Answer: arc
4. Edges can have. Answer: directions
5. Social media networks have very sparse. Answer: adjacency matrices
6. In a web graph, “nodes” represent. Answer: sites
7. Allows efficient, full-text searches in the database. Answer: the inverted index
8. Social media analytics collects and analyzes. Answer: audience data from social networks
9. Discover and extract information from the Web. Answer: web mining
10. Web mining is used to. Answer: predict user behavior


RECAP OF UNIT (CO2)

• Data Mining: the process of discovering hidden and actionable patterns from data.
• Aggregation: performed when multiple features need to be combined into a single one or when the scale of the features changes.
• A decision tree is learned from the dataset (training data with known classes).
• The learned tree is later applied to predict the class attribute value of new data (test data with unknown classes), for which only the feature values are known.
• A search engine is a software system designed to carry out web searches. The most productive way to conduct a search on the internet is through a search engine.
• Vector Space Model: we are given a set of documents D. Each document is a set of words.
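
As a minimal sketch of the vector space model recapped above, each document in D can be turned into a term-count vector over a shared vocabulary and compared by cosine similarity. Raw term counts, cosine similarity, and the three sample documents are assumptions made for illustration; the recap itself does not specify a weighting scheme or similarity measure.

```python
import math
from collections import Counter

def tf_vectors(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical document set D.
D = [
    "web mining finds patterns",
    "data mining finds patterns",
    "social network graph",
]
vocab, vectors = tf_vectors(D)
sim_01 = cosine(vectors[0], vectors[1])  # three of four words shared
sim_02 = cosine(vectors[0], vectors[2])  # no words shared
```

The first two documents share three of their four words and score 0.75, while the first and third share none and score 0.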


Thank You
