Data Mining and Analysis: Fundamental Concepts and Algorithms


DATA MINING AND ANALYSIS
Fundamental Concepts and Algorithms

MOHAMMED J. ZAKI
Rensselaer Polytechnic Institute, Troy, New York

WAGNER MEIRA JR.
Universidade Federal de Minas Gerais, Brazil

32 Avenue of the Americas, New York, NY 10013-2473, USA

Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9780521766333

Copyright Mohammed J. Zaki and Wagner Meira Jr. 2014

This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.

First published 2014

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging in Publication Data

Zaki, Mohammed J., 1971–
Data mining and analysis: fundamental concepts and algorithms / Mohammed J. Zaki,
Rensselaer Polytechnic Institute, Troy, New York, Wagner Meira Jr.,
Universidade Federal de Minas Gerais, Brazil.
pages cm
Includes bibliographical references and index.
ISBN 978-0-521-76633-3 (hardback)
1. Data mining. I. Meira, Wagner, 1967– II. Title.
QA76.9.D343Z36 2014
006.3′12–dc23 2013037544

ISBN 978-0-521-76633-3 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party Internet Web sites referred to in this publication
and does not guarantee that any content on such Web sites is, or will remain,
accurate or appropriate.

Contents

Preface

1 Data Mining and Analysis
1.1 Data Matrix
1.2 Attributes
1.3 Data: Algebraic and Geometric View
1.4 Data: Probabilistic View
1.5 Data Mining
1.6 Further Reading
1.7 Exercises

PART I DATA ANALYSIS FOUNDATIONS

2 Numeric Attributes
2.1 Univariate Analysis
2.2 Bivariate Analysis
2.3 Multivariate Analysis
2.4 Data Normalization
2.5 Normal Distribution
2.6 Further Reading
2.7 Exercises

3 Categorical Attributes
3.1 Univariate Analysis
3.2 Bivariate Analysis
3.3 Multivariate Analysis
3.4 Distance and Angle
3.5 Discretization
3.6 Further Reading
3.7 Exercises

4 Graph Data
4.1 Graph Concepts
4.2 Topological Attributes
4.3 Centrality Analysis
4.4 Graph Models
4.5 Further Reading
4.6 Exercises

5 Kernel Methods
5.1 Kernel Matrix
5.2 Vector Kernels
5.3 Basic Kernel Operations in Feature Space
5.4 Kernels for Complex Objects
5.5 Further Reading
5.6 Exercises

6 High-dimensional Data
6.1 High-dimensional Objects
6.2 High-dimensional Volumes
6.3 Hypersphere Inscribed within Hypercube
6.4 Volume of Thin Hypersphere Shell
6.5 Diagonals in Hyperspace
6.6 Density of the Multivariate Normal
6.7 Appendix: Derivation of Hypersphere Volume
6.8 Further Reading
6.9 Exercises

7 Dimensionality Reduction
7.1 Background
7.2 Principal Component Analysis
7.3 Kernel Principal Component Analysis
7.4 Singular Value Decomposition
7.5 Further Reading
7.6 Exercises

PART II FREQUENT PATTERN MINING

8 Itemset Mining
8.1 Frequent Itemsets and Association Rules
8.2 Itemset Mining Algorithms
8.3 Generating Association Rules
8.4 Further Reading
8.5 Exercises

9 Summarizing Itemsets
9.1 Maximal and Closed Frequent Itemsets
9.2 Mining Maximal Frequent Itemsets: GenMax Algorithm
9.3 Mining Closed Frequent Itemsets: Charm Algorithm
9.4 Nonderivable Itemsets
9.5 Further Reading
9.6 Exercises

10 Sequence Mining
10.1 Frequent Sequences
10.2 Mining Frequent Sequences
10.3 Substring Mining via Suffix Trees
10.4 Further Reading
10.5 Exercises

11 Graph Pattern Mining
11.1 Isomorphism and Support
11.2 Candidate Generation
11.3 The gSpan Algorithm
11.4 Further Reading
11.5 Exercises

12 Pattern and Rule Assessment
12.1 Rule and Pattern Assessment Measures
12.2 Significance Testing and Confidence Intervals
12.3 Further Reading
12.4 Exercises

PART III CLUSTERING

13 Representative-based Clustering
13.1 K-means Algorithm
13.2 Kernel K-means
13.3 Expectation-Maximization Clustering
13.4 Further Reading
13.5 Exercises

14 Hierarchical Clustering
14.1 Preliminaries
14.2 Agglomerative Hierarchical Clustering
14.3 Further Reading
14.4 Exercises

15 Density-based Clustering
15.1 The DBSCAN Algorithm
15.2 Kernel Density Estimation
15.3 Density-based Clustering: DENCLUE
15.4 Further Reading
15.5 Exercises

16 Spectral and Graph Clustering
16.1 Graphs and Matrices
16.2 Clustering as Graph Cuts
16.3 Markov Clustering
16.4 Further Reading
16.5 Exercises

17 Clustering Validation
17.1 External Measures
17.2 Internal Measures
17.3 Relative Measures
17.4 Further Reading
17.5 Exercises

PART IV CLASSIFICATION

18 Probabilistic Classification
18.1 Bayes Classifier
18.2 Naive Bayes Classifier
18.3 K Nearest Neighbors Classifier
18.4 Further Reading
18.5 Exercises

19 Decision Tree Classifier
19.1 Decision Trees
19.2 Decision Tree Algorithm
19.3 Further Reading
19.4 Exercises

20 Linear Discriminant Analysis
20.1 Optimal Linear Discriminant
20.2 Kernel Discriminant Analysis
20.3 Further Reading
20.4 Exercises

21 Support Vector Machines
21.1 Support Vectors and Margins
21.2 SVM: Linear and Separable Case
21.3 Soft Margin SVM: Linear and Nonseparable Case
21.4 Kernel SVM: Nonlinear Case
21.5 SVM Training Algorithms
21.6 Further Reading
21.7 Exercises

22 Classification Assessment
22.1 Classification Performance Measures
22.2 Classifier Evaluation
22.3 Bias-Variance Decomposition
22.4 Further Reading
22.5 Exercises

Index

Preface

This book is an outgrowth of data mining courses at Rensselaer Polytechnic Institute
(RPI) and Universidade Federal de Minas Gerais (UFMG); the RPI course has been
offered every Fall since 1998, whereas the UFMG course has been offered since
2002. Although there are several good books on data mining and related topics, we
felt that many of them are either too high-level or too advanced. Our goal was to
write an introductory text that focuses on the fundamental algorithms in data mining
and analysis. It lays the mathematical foundations for the core data mining methods,
with key concepts explained when first encountered; the book also tries to build the
intuition behind the formulas to aid understanding.
The main parts of the book include exploratory data analysis, frequent pattern
mining, clustering, and classification. The book lays the basic foundations of these
tasks, and it also covers cutting-edge topics such as kernel methods, high-dimensional
data analysis, and complex graphs and networks. It integrates concepts from related
disciplines such as machine learning and statistics and is also ideal for a course on data
analysis. Most of the prerequisite material is covered in the text, especially on linear
algebra, and probability and statistics.
The book includes many examples to illustrate the main technical concepts. It also
has end-of-chapter exercises, which have been used in class. All of the algorithms in the
book have been implemented by the authors. We suggest that readers use their favorite
data analysis and mining software to work through our examples and to implement the
algorithms we describe in text; we recommend the R software or the Python language
with its NumPy package. The datasets used and other supplementary material such
as project ideas and slides are available online at the book’s companion site and its
mirrors at RPI and UFMG:

• http://dataminingbook.info
• http://www.cs.rpi.edu/~zaki/dataminingbook
• http://www.dcc.ufmg.br/dataminingbook

Having understood the basic principles and algorithms in data mining and data
analysis, readers will be well equipped to develop their own methods or use more
advanced techniques.


Figure 0.1. Chapter dependencies

Suggested Roadmaps
The chapter dependency graph is shown in Figure 0.1. We suggest some typical
roadmaps for courses and readings based on this book. For an undergraduate-level
course, we suggest the following chapters: 1–3, 8, 10, 12–15, 17–19, and 21–22. For an
undergraduate course without exploratory data analysis, we recommend Chapters 1,
8–15, 17–19, and 21–22. For a graduate course, one possibility is to quickly go over the
material in Part I or to assume it as background reading and to directly cover Chapters
9–22; the other parts of the book, namely frequent pattern mining (Part II), clustering
(Part III), and classification (Part IV), can be covered in any order. For a course on
data analysis the chapters covered must include 1–7, 13–14, 15 (Section 2), and 20.
Finally, for a course with an emphasis on graphs and kernels we suggest Chapters 4, 5,
7 (Sections 1–3), 11–12, 13 (Sections 1–2), 16–17, and 20–22.

Acknowledgments
Initial drafts of this book have been used in several data mining courses. We received
many valuable comments and corrections from both the faculty and students. Our
thanks go to

• Muhammad Abulaish, Jamia Millia Islamia, India
• Mohammad Al Hasan, Indiana University Purdue University at Indianapolis
• Marcio Luiz Bunte de Carvalho, Universidade Federal de Minas Gerais, Brazil
• Loïc Cerf, Universidade Federal de Minas Gerais, Brazil
• Ayhan Demiriz, Sakarya University, Turkey
• Murat Dundar, Indiana University Purdue University at Indianapolis
• Jun Luke Huan, University of Kansas
• Ruoming Jin, Kent State University
• Latifur Khan, University of Texas, Dallas
• Pauli Miettinen, Max-Planck-Institut für Informatik, Germany

• Suat Ozdemir, Gazi University, Turkey
• Naren Ramakrishnan, Virginia Polytechnic and State University
• Leonardo Chaves Dutra da Rocha, Universidade Federal de São João del-Rei, Brazil
• Saeed Salem, North Dakota State University
• Ankur Teredesai, University of Washington, Tacoma
• Hannu Toivonen, University of Helsinki, Finland
• Adriano Alonso Veloso, Universidade Federal de Minas Gerais, Brazil
• Jason T.L. Wang, New Jersey Institute of Technology
• Jianyong Wang, Tsinghua University, China
• Jiong Yang, Case Western Reserve University
• Jieping Ye, Arizona State University

We would like to thank all the students enrolled in our data mining courses at RPI
and UFMG, as well as the anonymous reviewers who provided technical comments
on various chapters. We appreciate the collegial and supportive environment within
the computer science departments at RPI and UFMG and at the Qatar Computing
Research Institute. In addition, we thank NSF, CNPq, CAPES, FAPEMIG, Inweb –
the National Institute of Science and Technology for the Web, and Brazil’s Science
without Borders program for their support. We thank Lauren Cowles, our editor at
Cambridge University Press, for her guidance and patience in realizing this book.
Finally, on a more personal front, MJZ dedicates the book to his wife, Amina,
for her love, patience and support over all these years, and to his children, Abrar and
Afsah, and his parents. WMJ gratefully dedicates the book to his wife Patricia; to his
children, Gabriel and Marina; and to his parents, Wagner and Marlene, for their love,
encouragement, and inspiration.
