Chapter 3 of the Business Intelligence and Analytics textbook discusses text analytics and text mining, emphasizing the importance of extracting knowledge from unstructured data. It differentiates between text mining, web mining, and data mining, outlines the text mining process, and highlights various applications in fields such as law, finance, and medicine. The chapter also covers natural language processing (NLP) and its challenges, as well as tools available for text mining.


Business Intelligence and Analytics:

Systems for Decision Support

Global Edition
(10th Edition)

Chapter 3:
Text Analytics, Text Mining
Learning Objectives
- Describe text mining and understand the need for text mining
- Differentiate between text mining, Web mining, and data mining
- Understand the different application areas for text mining
- Know the process of carrying out a text mining project
- Understand the different methods to introduce structure to text-based data
(Continued…)
7-2 © Pearson Education Limited 2014
Text Mining Concepts
- 85-90 percent of all corporate data is in some kind of unstructured form (e.g., text)
- Unstructured corporate data is doubling in size every 18 months
- Tapping into these information sources is not optional; it is necessary to stay competitive
- Answer: text mining
  - A semi-automated process of extracting knowledge from unstructured data sources
  - a.k.a. text data mining or knowledge discovery in textual databases


Text Analytics and Text Mining
[Figure: TEXT ANALYTICS diagram. Component labels: Text Mining, Web Mining, Information Retrieval, Information Extraction, Data Mining. Underlying disciplines: Natural Language Processing, Linguistics, Machine Learning, Computer Science, Statistics, Management Science, Artificial Intelligence.]


Data Mining versus Text Mining
- Both seek novel and useful patterns
- Both are semi-automated processes
- The difference is the nature of the data:
  - Structured versus unstructured data
  - Structured data: in databases
  - Unstructured data: Word documents, PDF files, text excerpts, XML files, and so on
- Text mining: first impose structure on the data, then mine the structured data
Text Mining Concepts
- Benefits of text mining are obvious, especially in text-rich data environments
  - e.g., law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology (molecular interactions), technology (patent files), marketing (customer comments), etc.
- Electronic communication records (e.g., email)
  - Spam filtering
  - Email prioritization and categorization
  - Automatic response generation
Text Mining Application Areas
- Information extraction
- Topic tracking
- Summarization
- Categorization
- Clustering
- Concept linking
- Question answering


Natural Language Processing (NLP)
- Structuring a collection of text
  - Old approach: bag-of-words
  - New approach: natural language processing
- NLP is …
  - a very important concept in text mining
  - a subfield of artificial intelligence and computational linguistics
  - the study of "understanding" natural human language
- Syntax-based versus semantics-based text mining
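
A minimal sketch of what the bag-of-words approach keeps and what it throws away. The function name `bag_of_words` and the sample sentences are illustrative assumptions, not taken from the chapter.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


# Two sentences with opposite meanings built from the same words.
a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

# Word order is discarded, so the two bags are indistinguishable.
same_bag = a == b
print(same_bag)
```

Because the representation only counts words, anything carried by word order or grammar is lost, which is the gap that NLP-based structuring tries to close.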
Natural Language Processing (NLP)
- What is "understanding"?
  - Humans understand; what about computers?
  - Natural language is vague and context driven
  - True understanding requires extensive knowledge of a topic
  - Can (or will) computers ever understand natural language as accurately as we do?
Natural Language Processing (NLP)
- Challenges in NLP
  - Part-of-speech tagging
  - Text segmentation
  - Word sense disambiguation
  - Syntax ambiguity
  - Imperfect or irregular input
  - Speech acts
- Dream of the AI community
  - to have algorithms that are capable of automatically reading and obtaining knowledge from text
Natural Language Processing (NLP)
- WordNet
  - A laboriously hand-coded database of English words, their definitions, sets of synonyms, and various semantic relations between synonym sets
  - A major resource for NLP
  - Needs automation to be completed
- Sentiment analysis
  - A technique used to detect favorable and unfavorable opinions toward specific products and services
  - SentiWordNet

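
One simple way to detect favorable and unfavorable opinions is to sum word polarities from a lexicon. The sketch below assumes a tiny hand-made lexicon; `TOY_LEXICON`, its scores, and the function names are hypothetical and are not drawn from SentiWordNet or from the chapter.

```python
# Hypothetical toy lexicon: word -> polarity score. These scores are made up
# for illustration and are NOT taken from SentiWordNet.
TOY_LEXICON = {
    "great": 1.0,
    "good": 0.5,
    "poor": -0.5,
    "terrible": -1.0,
}


def sentiment_score(text: str, lexicon: dict[str, float] = TOY_LEXICON) -> float:
    """Sum the polarity of every known word; unknown words contribute 0."""
    tokens = text.lower().split()
    return sum(lexicon.get(tok.strip(".,!?"), 0.0) for tok in tokens)


def label(score: float) -> str:
    """Map a numeric score to favorable / unfavorable / neutral."""
    if score > 0:
        return "favorable"
    if score < 0:
        return "unfavorable"
    return "neutral"


print(label(sentiment_score("Great phone, good battery.")))
print(label(sentiment_score("Terrible service and poor support.")))
```

A word-sum like this ignores negation and context ("not good" still scores positive), which is one reason sentiment analysis is usually treated as an NLP problem rather than a lookup.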

NLP Task Categories
- Information retrieval, information extraction
- Named-entity recognition
- Question answering
- Automatic summarization
- Natural language generation & understanding
- Machine translation
- Foreign language reading & writing
- Speech recognition
- Text proofing, optical character recognition
Text Mining Applications
- Marketing applications
  - Enables better CRM
- Security applications
  - ECHELON, OASIS
  - Deception detection (…)
- Medicine and biology
  - Literature-based gene identification (…)
- Academic applications
  - Research stream analysis
Text Mining Process
[Figure: Context diagram for the text mining process. Activity A0: Extract knowledge from available data sources. Inputs: unstructured data (text), structured data (databases). Output: context-specific knowledge. Constraints: software/hardware limitations, privacy issues, linguistic limitations. Enablers: domain expertise, tools and techniques.]


Text Mining Process
- Task 1. Establish the Corpus: Collect & organize the domain-specific unstructured data
- Task 2. Create the Term-Document Matrix: Introduce structure to the corpus
- Task 3. Extract Knowledge: Discover novel patterns from the T-D matrix
- Feedback arrows connect the tasks

- The inputs to the process include a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
- The output of Task 1 is a collection of documents in some digitized format for computer processing
- The output of Task 2 is a flat file called the term-document matrix, where the cells are populated with the term frequencies
- The output of Task 3 is a number of problem-specific classification, association, and clustering models and visualizations

[Figure: The three-step text mining process]

Text Mining Process
- Step 1: Establish the corpus
  - Collect all relevant unstructured data (e.g., textual documents, XML files, emails, Web pages, short notes, voice recordings…)
  - Digitize and standardize the collection (e.g., all in ASCII text files)
  - Place the collection in a common place (e.g., in a flat file, or in a directory as separate files)
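
A minimal sketch of Step 1, assuming the collection has already been digitized as `.txt` files in a single directory. The function name `load_corpus` and the choice to drop non-ASCII characters are illustrative assumptions; the slide only asks that the collection be standardized.

```python
from pathlib import Path


def load_corpus(directory: str) -> dict[str, str]:
    """Read every .txt file in `directory` into a {filename: text} mapping.

    Standardization here means: decode as UTF-8, drop any non-ASCII
    characters, and collapse runs of whitespace into single spaces.
    """
    corpus = {}
    for path in sorted(Path(directory).glob("*.txt")):
        raw = path.read_text(encoding="utf-8", errors="ignore")
        ascii_only = raw.encode("ascii", errors="ignore").decode("ascii")
        corpus[path.name] = " ".join(ascii_only.split())
    return corpus
```

Calling `load_corpus` on a directory of text files returns one cleaned string per document, which is the "collection of documents in some digitized format" that the next step consumes.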

Text Mining Process
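- Step 2: Create the Term-by-Document Matrix (TDM)

[Figure: Example term-by-document matrix. Columns (terms): investment risk, project management, software engineering, development, SAP, ... Rows (documents): Document 1 through Document 6, ... Cells hold term occurrence counts; the column positions of the counts were lost in extraction. Counts per row: Document 1: 1, 1; Document 2: 1; Document 3: 3, 1; Document 4: 1; Document 5: 2, 1; Document 6: 1, 1.]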
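
A minimal sketch of building such a matrix with the standard library only. The mini-corpus is invented for illustration, and the sketch counts single words, whereas the figure uses multi-word terms such as "investment risk"; `build_tdm` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical mini-corpus; the texts are invented for illustration.
documents = {
    "Document 1": "investment risk and software development",
    "Document 2": "project management",
    "Document 3": "software software software SAP",
}


def build_tdm(docs: dict[str, str]) -> tuple[list[str], dict[str, list[int]]]:
    """Return (sorted vocabulary, {document: row of term counts})."""
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    vocabulary = sorted({term for c in counts.values() for term in c})
    rows = {name: [c[term] for term in vocabulary] for name, c in counts.items()}
    return vocabulary, rows


vocabulary, tdm = build_tdm(documents)
print(vocabulary)
for name, row in tdm.items():
    print(name, row)
```

Most cells are zero even in this tiny case, which is why the TDM is described as sparse.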

Text Mining Process
- Step 2: Create the Term-by-Document Matrix (TDM)
  - Should all terms be included?
    - Stop words, include words
    - Synonyms, homonyms
    - Stemming
  - What is the best representation of the indices (values in cells)?
    - Raw counts; binary frequencies; log frequencies
    - Inverse document frequency
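
A minimal sketch of two of the term-selection choices listed above, stop-word removal and stemming. The stop-word list and the suffix-stripping rule are deliberately crude, hypothetical stand-ins; the stemmer is not the Porter algorithm or any other published one.

```python
# Hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in"}

# Crude suffix list for illustration only; this is not the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case, drop stop words, then stem what remains."""
    tokens = text.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The models are modeling the modeled risks"))
```

After this step "models", "modeling", and "modeled" all map to one term, so they share a single column in the TDM instead of three.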
The best representation of the indices (values in cells)
- Binary frequencies: f(wf) = 1 if wf > 0, otherwise f(wf) = 0.
- Log frequencies: f(wf) = 1 + log(wf) if wf > 0, otherwise f(wf) = 0.
- Inverse document frequencies (idf):

\[
\mathrm{idf}(i,j) =
\begin{cases}
0 & \text{if } wf_{i,j} = 0 \\
\left(1 + \log(wf_{i,j})\right)\log\!\left(\dfrac{N}{df_i}\right) & \text{if } wf_{i,j} \ge 1
\end{cases}
\]

Where:
- i: ith word; j: jth document
- N: total number of documents
- wf_{i,j}: frequency of word i in document j
- df_i: the number of documents that include this word
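
The three weighting schemes can be sketched directly as functions. The function and parameter names are illustrative, and the natural logarithm is an assumption, since the slide does not state the base of the log.

```python
import math


def binary_freq(wf: int) -> int:
    """1 if the term occurs in the document, else 0."""
    return 1 if wf > 0 else 0


def log_freq(wf: int) -> float:
    """1 + log(wf) for wf > 0, else 0 (natural log assumed)."""
    return 1 + math.log(wf) if wf > 0 else 0.0


def idf_weight(wf: int, n_docs: int, df: int) -> float:
    """(1 + log(wf)) * log(N / df) for wf >= 1, else 0."""
    if wf == 0:
        return 0.0
    return (1 + math.log(wf)) * math.log(n_docs / df)
```

Note that a term appearing in every document gets an idf weight of 0, because log(N / N) = 0; the scheme rewards terms that are frequent in a document but rare across the collection.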
Text Mining Process
- Step 2: Create the Term-by-Document Matrix (TDM)
  - The TDM is a sparse matrix. How can we reduce the dimensionality of the TDM?
    - Manual: a domain expert goes through it
    - Eliminate terms with very few occurrences in very few documents (?)
    - Transform the matrix using singular value decomposition (SVD)
    - SVD is similar to principal component analysis
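
A minimal sketch of the second option, dropping rare terms. It thresholds only on the number of documents a term appears in; the helper name `prune_rare_terms`, the `min_docs` parameter, and the sample matrix are all hypothetical.

```python
def prune_rare_terms(
    vocabulary: list[str],
    rows: list[list[int]],
    min_docs: int = 2,
) -> tuple[list[str], list[list[int]]]:
    """Keep only the terms (columns) that occur in at least `min_docs` documents."""
    keep = [
        col
        for col in range(len(vocabulary))
        if sum(1 for row in rows if row[col] > 0) >= min_docs
    ]
    new_vocab = [vocabulary[col] for col in keep]
    new_rows = [[row[col] for col in keep] for row in rows]
    return new_vocab, new_rows


# Hypothetical 3-document, 4-term matrix (rows = documents, columns = terms).
vocab = ["risk", "sap", "software", "zebra"]
matrix = [
    [1, 0, 2, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
small_vocab, small_matrix = prune_rare_terms(vocab, matrix, min_docs=2)
print(small_vocab)
print(small_matrix)
```

Pruning removes columns outright, whereas SVD replaces the original term columns with a smaller set of derived dimensions; the two can be combined.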

Text Mining Process
- Step 3: Extract patterns/knowledge
  - Classification (text categorization)
  - Clustering (natural groupings of text)
    - Improve search recall
    - Improve search precision
    - Scatter/gather
    - Query-specific clustering
  - Association
  - Trend analysis (…)
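
Clustering and categorization both need a measure of how alike two documents are. The slide names no measure, so the sketch below assumes cosine similarity over term-count rows; the function names and sample rows are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target: str, rows: dict[str, list[float]]) -> str:
    """Name of the document whose row is closest to `target` by cosine similarity."""
    others = (name for name in rows if name != target)
    return max(others, key=lambda name: cosine_similarity(rows[target], rows[name]))


# Hypothetical term-count rows over the terms [risk, software, sap].
rows = {
    "Document 1": [2, 0, 0],
    "Document 2": [1, 0, 0],
    "Document 3": [0, 3, 1],
}
print(most_similar("Document 1", rows))
```

Cosine similarity compares direction rather than length, so a long and a short document about the same topic still score as close.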

Text Mining Tools
- Commercial software tools
  - IBM SPSS Modeler - Text Miner
  - SAS Enterprise Miner – Text Miner
  - Statistica Data Miner – Text Miner
  - ClearForest, …
- Free software tools
  - RapidMiner
  - GATE
  - Spy-EM, …
