WEB MINING
Unit: 2
Social Media Analytics (ACSAI0622N)
Mr. M. Abdul Mateen Siddiqui
(Assistant Professor)
B. Tech. 6th Sem, Department of CSE (Cyber Security)
UNIT-I: SENTIMENT MINING
UNIT-II: WEB MINING
Web Mining Overview, Web Structure Mining, Search
Engine, Web Analytics, Machine Learning for extracting
knowledge from the web, Inverted indices and Boolean
queries. PLSI, Query optimization, SEO, page ranking,
Social Graphs (Interaction, Latent and Following Graphs),
Ethics of Scraping, Static data extraction and Web Scraping
using Python
1. Security
2. Digital Advertising
3. E-Commerce
4. Publishing
5. Massively Multiplayer Online Games
6. Backend Services and Messaging
7. Project Management & Collaboration
8. Real-time Monitoring Services
9. Live Charting and Graphing
10. Group and Private Chat
Apply state-of-the-art mining tools and libraries on realistic data sets as a basis
for business decisions and applications.
PO8 : Ethics
PO10 : Communication
CO.K PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 2 2 2 3 3 - - - - - - -
CO2 3 2 3 2 3 - - - - - - -
CO3 3 2 3 2 3 - - - - - - -
CO4 3 2 3 2 3 - - - - - - -
CO5 3 2 3 3 3 - - - - - - -
Program Specific Outcomes (PSOs)
S. No.  PSO Description
COs - PSOs Mapping
CO.K PSO1 PSO2 PSO3 PSO4
CO1 3 - - -
CO2 3 2 - -
CO3 3 3 - -
CO4 3 3 - -
CO5 3 3 - -
Program Educational Objectives (PEOs)
PEOs  Description
• To produce graduates with a strong foundation of basic science, statistics & engineering and the ability to use modern tools and technologies to solve real-world complex problems and to address ever-changing industrial requirements globally.
• To produce graduates who can inculcate life-long learning for up-skilling and re-skilling and build a successful career as a data scientist, entrepreneur or bureaucrat for the goodwill of society.
• To produce graduates who can exhibit professional ethics and moral values with the capability of working as an individual and as a team to contribute towards the needs of industry and society.
• Students should have knowledge of data analysis tools and web technology.
• Students should have good knowledge of Python programming and Python coding experience.
• https://www.youtube.com/watch?v=KjWu1dZn00
• https://www.youtube.com/watch?v=ntOaoW0T604
Unit Content
• Web Mining
• Web Structure Mining
• Search Engine
• Web Analytics
• Machine Learning for extracting knowledge from the web
• Inverted indices and Boolean queries.
• PLSI
• Query optimization
• Page ranking
• Social graphs
• Ethics of Scraping
• Static Data Extraction
• Web Scraping using Python
One example of web mining is to analyze website traffic and user behavior. By
analyzing clickstream data and other user interactions with a website,
organizations can gain insights into how users navigate their site, what content is
most popular, and where users are dropping off. This information can be used to
optimize website design and improve user experience.
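As a minimal sketch of this kind of analysis (assuming pandas and an invented clickstream table with user_id, page, and timestamp columns), the snippet below counts how popular each page is and on which page each user's recorded session ends, i.e. where users drop off:

```python
import pandas as pd

# Hypothetical clickstream log: one row per page view.
clicks = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, 3],
    "page":     ["/home", "/products", "/cart", "/home", "/products", "/home"],
    "timestamp": pd.to_datetime([
        "2025-01-01 10:00", "2025-01-01 10:01", "2025-01-01 10:02",
        "2025-01-01 11:00", "2025-01-01 11:03", "2025-01-01 12:00",
    ]),
})

# Most popular content: number of views per page.
popularity = clicks["page"].value_counts()

# Drop-off points: the last page each user visited in this log.
last_pages = (clicks.sort_values("timestamp")
                    .groupby("user_id")["page"]
                    .last()
                    .value_counts())

print(popularity)
print(last_pages)
```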
Web Structure Mining is one of the three different types of techniques in Web
Mining; in this section we focus purely on Web Structure Mining.
Web Structure Mining is the technique of discovering structure information
from the web. It uses graph theory to analyze the nodes and connections in the
structure of a website. The web graph consists of web pages as nodes and
hyperlinks as edges connecting related pages. Structure mining essentially
produces a structured summary of a particular website: it identifies the
relationships between web pages linked by information or by direct link
connections. Web structure mining can be very useful, for example, to
determine the connection between two commercial websites.
Depending upon the type of web structural data, Web Structure Mining
can be categorized into two types:
1. Extracting patterns from hyperlinks on the Web: The web works
through a system of hyperlinks using the Hypertext Transfer Protocol
(HTTP). A hyperlink is a structural component that connects one web page
to a different location. Any page can create a hyperlink to any other page,
and that page can in turn be linked to some other page. The intertwined,
self-referential nature of the web lends itself to some unique
network-analytical algorithms. The structure of web pages can also be
analyzed to examine the pattern of hyperlinks among pages.
2. Mining the document structure: This is the analysis of the tree-like structure of a web
page to describe HTML or XML tag usage.
There are several terms associated with Web Structure Mining:
Web Graph: the directed graph representing the Web.
Node: a node represents a web page in the graph.
Edge: an edge represents a hyperlink of a web page in the graph (web graph).
In-degree: the number of hyperlinks pointing to a particular node in the graph.
Out-degree: the number of links generated from a particular node in the graph.
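A small illustration of these terms, assuming the networkx library and a made-up four-page website: the directed web graph is built from hyperlinks, and the in-degree and out-degree are then read off per node.

```python
import networkx as nx

# Directed web graph: nodes are pages, edges are hyperlinks.
web = nx.DiGraph()
web.add_edges_from([
    ("home.html", "about.html"),
    ("home.html", "products.html"),
    ("products.html", "contact.html"),
    ("about.html", "home.html"),      # pages may link back (self-referential nature of the web)
])

for page in web.nodes:
    print(page,
          "in-degree:", web.in_degree(page),    # hyperlinks pointing to the page
          "out-degree:", web.out_degree(page))  # hyperlinks generated from the page
```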
Search Engine
A search engine is a software program that provides information
according to the user's query. It finds the various websites or web pages that are
available on the internet and returns results related to the search.
Product managers, data scientists, UX designers and others can use web
analytics if they’re looking to enhance their website or product experience
to meet customer needs. They need to know which website metrics to
track while also being mindful of the shortcomings of web analytics.
◼ Machine Learning-based algorithms autonomously develop their knowledge from the data
patterns they receive, without needing specific initial inputs from the developer. In these
models the machine can establish by itself the patterns to follow to obtain the desired result;
the real factor that distinguishes this kind of artificial intelligence is therefore autonomy. In the
learning process that characterizes these algorithms, the system receives a set of training
data and estimates the relationships between the input and output data: these relationships
represent the parameters of the model estimated by the system.
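As a toy illustration of "estimating the relationships between the input and output data" (a sketch only, using scikit-learn and invented numbers), the snippet below fits a linear model; the learned coefficients and intercept are the parameters the paragraph refers to.

```python
from sklearn.linear_model import LinearRegression

# Invented training data: input (e.g. number of backlinks) vs. output (e.g. daily visits).
X = [[1], [2], [3], [4], [5]]
y = [12, 19, 31, 42, 48]

model = LinearRegression().fit(X, y)   # training: estimate the parameters
print(model.coef_, model.intercept_)   # the learned relationship
print(model.predict([[6]]))            # apply the model to unseen input
```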
• Latent variable model for general co-occurrence data: associate each observation (w, d) with a
latent class variable z ∈ Z = {z_1, …, z_K}.
• Generative model:
  • Select a document d with probability P(d)
  • Pick a latent class z with probability P(z|d)
  • Generate a word w with probability P(w|z)
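• Combining the three steps, the joint probability of observing a (document, word) pair under PLSI is
P(d, w) = P(d) Σ_{z ∈ Z} P(z|d) P(w|z).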
On Page SEO
Involves optimization done on your side.
Changes are made to elements that appear on the different pages of your
website:
Title, tags, keyword placement, indexing, content planning, display ads,
internal links, visuals.
• SEO stands for "search engine optimization." In simple terms, SEO means the
process of improving your website to increase its visibility in Google, Microsoft
Bing, and other search engines whenever people search for relevant content.
Ultimately, the goal of search engine optimization is to help attract website
visitors who will become customers, clients or an audience that keeps coming back.
• Social SEO is the practice of adding text-based features like captions, alt-text,
and closed captions to your posts to help people browsing social platforms
easily find your content.
• To understand social SEO, you need to understand the basics of traditional SEO. In
digital marketing, SEO stands for search engine optimization. Search engines like
Google or Bing allow you to search for information and then serve up a list of web
results that point you to the content you’re looking for. (Or, at least, the content
algorithms think you would want to see based on the search phrase you used,
your location, previous searches, etc.)
"Scraping" in the web context refers to the process of using automated software
(bots) to extract data or content from a website, essentially "collecting" information
from a webpage by analyzing its underlying HTML code to retrieve specific details like
product prices, news articles, or contact information, which can then be stored and
used for various purposes like market research or price comparison.
Web scraping is an automatic method to obtain large amounts of data from websites.
Most of this data is unstructured data in HTML format, which is then converted
into structured data in a spreadsheet or a database so that it can be used in various
applications. There are many different ways to perform web scraping to obtain data
from websites. These include using online services, particular APIs, or even writing
your own web-scraping code from scratch. Many large websites, like Google, Twitter,
Facebook, Stack Overflow, etc., have APIs that allow you to access their data in a
structured format. This is the best option, but there are other sites that don't allow
users to access large amounts of data in a structured form or that are simply not
that technologically advanced. In that situation, it's best to use web scraping to
scrape the website for data.
Web scraping requires two parts, namely the crawler and the scraper. The crawler is
an artificial intelligence algorithm that browses the web to search for the particular
data required by following the links across the internet. The scraper, on the other
hand, is a specific tool created to extract data from the website. The design of the
scraper can vary greatly according to the complexity and scope of the project so that
it can quickly and accurately extract the data.
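A minimal scraper sketch in Python, assuming the requests and BeautifulSoup (bs4) libraries and a placeholder URL: it downloads a single page, parses the underlying HTML, and extracts headings and links (a crawler would then follow those links across the site).

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com"          # placeholder URL, not a real target
response = requests.get(url, timeout=10)
response.raise_for_status()          # fail loudly on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")

# Scraper part: extract specific details from the unstructured HTML.
headings = [h.get_text(strip=True) for h in soup.find_all(["h1", "h2"])]
links = [a["href"] for a in soup.find_all("a", href=True)]

print(headings)
print(links)                         # a crawler would queue these links and repeat
```

In practice, check a site's robots.txt and terms of service before scraping it; this is part of the ethics of scraping covered in this unit.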
• According to Matt Hartman, one way to think about the complex social graphs of
different platforms is to break them down into four fundamental components.
• The social graph is a graph that represents social relations between entities. In short, it
is a model or representation of a social network, where the word graph has been taken
from graph theory. The social graph has been referred to as "the global mapping of
everybody and how they're related".
• The social graph is an effective and widely used mathematical tool to represent the
relationships among users, which benefits the analysis of social interactions and the
characterization of user behavior. Usually, social networks can be modeled as undirected
graphs (e.g., friendship graph, interaction graph) or directed graphs (e.g., latent graph,
following graph); these are the four different types of social graphs considered here. Based
on these graph types, we discuss the connectivity and interaction among users. Moreover,
the huge size of the social graph challenges the effectiveness of analysis, so graph sampling
and crawling techniques have been proposed to deal with this problem. In this section, we
investigate several measurement, analysis, and modeling works related to the social graph.
• CENTRALIZED
• Today, the most popular social media sites are run by gigantic tech companies like Meta, Google,
Twitter, ByteDance, with Facebook getting the lion’s share. These platforms are centralized since
all of your interactions are hosted in the company’s servers.
• Pros:
• Production and running costs are covered by the platform's owners in order to attract users in the
first place
• If users forget their account credentials, they can ask for a password reset
• Cons:
• Users don’t get a say in how the platform should be run or how profit is shared
• DECENTRALIZED
• Unlike the centralized design, decentralized networks operate on independently run servers
and are usually powered by blockchain. Users can choose a server (service provider) to sign up
with, and then have access to the entire network across many different servers. A case in point
for the federated design is the email protocol: you can sign up with Gmail and still communicate
with a Yahoo user or with anyone who has an email address.
• Pros:
• Enable users to move seamlessly across platforms without rebuilding their social graph at each
destination.
• Cons:
• The production and running costs are split amongst a number of actors.
• Burden of responsibility when it comes to recovering a lost or stolen password
• Interaction Graph:
• This graph explicitly depicts observable interactions between users, like "likes",
comments, or direct messages on a social media platform. It shows who directly
interacts with whom, based on recorded actions.
• Latent Graph:
• This graph represents potential or hidden relationships between users that may not be
explicitly visible through direct interactions but can be inferred based on shared
interests, similar behavior, or other latent factors. It aims to uncover underlying
connections that might not be readily apparent in the observed interaction data.
• Following Graph:
• This graph specifically captures the "follow" relationships between users, where one
user actively chooses to see updates from another. This is particularly relevant on
platforms like Twitter where users follow others to see their content in their feed.
• Applications:
• Recommendation systems: Analyzing interaction and latent graphs
can be used to recommend content or people users might be
interested in based on their connections and behavior.
• Social influence analysis: Studying the structure of a following graph
can help identify influential users within a network.
• Link prediction: By identifying potential latent connections,
algorithms can predict future interactions between users who might
not be directly connected yet.
• Key Features:
1. Nodes: Represent individuals (users, employees, customers, etc.).
2. Edges: Represent interactions such as messages, comments, likes,
meetings, or shared activities.
3. Edge Weights: Often indicate the frequency or intensity of interactions
(e.g., number of messages exchanged, duration of calls).
4. Temporal Aspect: Some interaction graphs evolve over time, capturing
changing relationships.
Social Graphs
• Applications:
• Social Media Analysis: Understanding real engagement beyond just
followers or friends.
• Cybersecurity: Detecting unusual or suspicious communication
patterns.
• Organizational Analysis: Identifying key influencers or
communication bottlenecks in a company.
• Recommender Systems: Suggesting friends, collaborators, or content
based on past interactions.
• Example
• If the social graph consists of users A, B, and C, and:
• A follows B
• B follows C
• C does not follow anyone
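The tiny example above can be written down directly as a following graph; a sketch using networkx (an assumed library choice), printing who each user follows and who follows them:

```python
import networkx as nx

# Following graph from the example: an edge X -> Y means "X follows Y".
follows = nx.DiGraph([("A", "B"), ("B", "C")])   # C follows no one, so it has no outgoing edges

for user in follows.nodes:
    print(user,
          "follows:", list(follows.successors(user)),
          "followers:", list(follows.predecessors(user)))
```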
Daily Quiz
2. Which one of the following refers to querying the unstructured textual data?
A. Information access
B. Information update
C. Information retrieval
D. None of these
3. Which of the following is an essential process in which intelligent methods are applied to extract data
patterns?
A. Warehousing
B. Data Mining
C. Text Mining
D. Data Selection
4. For what purpose do the analysis tools pre-compute the summaries of the huge amount of data?
A. In order to maintain consistency
B. For authentication
C. For data access
D. To obtain the query responses
9. Which of the following is the local method for improving recall of an information retrieval system?
a) Query expansion
b) Relevance feedback
c) Ontology based model
d) None of the above
15. The process of removing most common words (and, or, the, etc.) by an
information retrieval system before indexing is known as
a) Lemmatization
b) Stop word removal
c) Inverted indexing
d) Normalization
20. What will be the sum of the degrees of all vertices of an undirected graph G
with n vertices and e edges?
A) 2e
B) 2ne
C) ne
D) none of these
2) Which of the following processes is not involved in the data mining process?
A) Data exploration
B) Data transformation
C) Data archaeology
D) Knowledge extraction
11) In any directed graph, if all edges are reciprocal, the reciprocity |E_r|/|E| can have a maximum of
A)1
B)0
C)2
D)None of the above
20) ________________ is a cross-platform, user-friendly tool that allows you to draw social
networks
A) VOSViewer
B) Social Network Visualizer
C) Commetrix
D) Cuttlefish
• Data Mining: the process of discovering hidden and actionable patterns from
data
• Aggregation – It is performed when multiple features need to be combined into a
single one or when the scale of the features changes
• A decision tree is learned from the dataset (training data with known classes).
• The learned tree is later applied to predict the class attribute value of new data
(test data with unknown classes), where only the feature values are known.
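A compact version of this train-then-predict workflow is sketched below, using scikit-learn's DecisionTreeClassifier on invented feature and class values:

```python
from sklearn.tree import DecisionTreeClassifier

# Training data with known classes.
X_train = [[25, 0], [40, 1], [35, 1], [22, 0]]   # invented feature values
y_train = ["no", "yes", "yes", "no"]             # known class attribute

tree = DecisionTreeClassifier().fit(X_train, y_train)   # learn the tree

# Test data with unknown classes: only the feature values are known.
X_test = [[30, 1]]
print(tree.predict(X_test))                              # predicted class attribute value
```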
• A search engine is a software system designed to carry out web searches. The
most productive way to conduct a search on the internet is through a search
engine
• Vector Space Model: In the vector space model, we are given a set of documents D.
Each document is a set of words.
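A minimal vector space sketch, assuming scikit-learn and three made-up documents: each document becomes a vector of term weights, and cosine similarity then compares the document vectors pairwise.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy document set D.
D = [
    "web mining extracts knowledge from the web",
    "search engines rank web pages",
    "social media analytics studies user behavior",
]

vectors = TfidfVectorizer().fit_transform(D)   # each document -> a term-weight vector
print(cosine_similarity(vectors))              # pairwise document similarity
```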