
Contents

3.3.1 Crude Monte Carlo
3.3.2 Bootstrap Method
3.3.3 Variance Reduction
3.4 Monte Carlo for Optimization
3.4.1 Simulated Annealing
3.4.2 Cross-Entropy Method
3.4.3 Splitting for Optimization
3.4.4 Noisy Optimization
Exercises

4 Unsupervised Learning
4.1 Introduction
4.2 Risk and Loss in Unsupervised Learning
4.3 Expectation–Maximization (EM) Algorithm
4.4 Empirical Distribution and Density Estimation
4.5 Clustering via Mixture Models
4.5.1 Mixture Models
4.5.2 EM Algorithm for Mixture Models
4.6 Clustering via Vector Quantization
4.6.1 K-Means
4.6.2 Clustering via Continuous Multiextremal Optimization
4.7 Hierarchical Clustering
4.8 Principal Component Analysis (PCA)
4.8.1 Motivation: Principal Axes of an Ellipsoid
4.8.2 PCA and Singular Value Decomposition (SVD)
Exercises

5 Regression
5.1 Introduction
5.2 Linear Regression
5.3 Analysis via Linear Models
5.3.1 Parameter Estimation
5.3.2 Model Selection and Prediction
5.3.3 Cross-Validation and Predictive Residual Sum of Squares
5.3.4 In-Sample Risk and Akaike Information Criterion
5.3.5 Categorical Features
5.3.6 Nested Models
5.3.7 Coefficient of Determination
5.4 Inference for Normal Linear Models
5.4.1 Comparing Two Normal Linear Models
5.4.2 Confidence and Prediction Intervals
5.5 Nonlinear Regression Models
5.6 Linear Models in Python
5.6.1 Modeling
5.6.2 Analysis
5.6.3 Analysis of Variance (ANOVA)
5.6.4 Confidence and Prediction Intervals
5.6.5 Model Validation
5.6.6 Variable Selection
5.7 Generalized Linear Models
Exercises

6 Regularization and Kernel Methods
6.1 Introduction
6.2 Regularization
6.3 Reproducing Kernel Hilbert Spaces
6.4 Construction of Reproducing Kernels
6.4.1 Reproducing Kernels via Feature Mapping
6.4.2 Kernels from Characteristic Functions
6.4.3 Reproducing Kernels Using Orthonormal Features
6.4.4 Kernels from Kernels
6.5 Representer Theorem
6.6 Smoothing Cubic Splines
6.7 Gaussian Process Regression
6.8 Kernel PCA
Exercises

7 Classification
7.1 Introduction
7.2 Classification Metrics
7.3 Classification via Bayes’ Rule
7.4 Linear and Quadratic Discriminant Analysis
7.5 Logistic Regression and Softmax Classification
7.6 K-Nearest Neighbors Classification
7.7 Support Vector Machine
7.8 Classification with Scikit-Learn
Exercises

8 Decision Trees and Ensemble Methods
8.1 Introduction
8.2 Top-Down Construction of Decision Trees
8.2.1 Regional Prediction Functions
8.2.2 Splitting Rules
8.2.3 Termination Criterion
8.2.4 Basic Implementation
8.3 Additional Considerations
8.3.1 Binary Versus Non-Binary Trees
8.3.2 Data Preprocessing
8.3.3 Alternative Splitting Rules
8.3.4 Categorical Variables
8.3.5 Missing Values
8.4 Controlling the Tree Shape
8.4.1 Cost-Complexity Pruning

D.12 Pandas
D.12.1 Series and DataFrame
D.12.2 Manipulating Data Frames
D.12.3 Extracting Information
D.12.4 Plotting
D.13 Scikit-learn
D.13.1 Partitioning the Data
D.13.2 Standardization
D.13.3 Fitting and Prediction
D.13.4 Testing the Model
D.14 System Calls, URL Access, and Speed-Up

Bibliography

Index
Preface

In our present world of automation, cloud computing, algorithms, artificial intelligence, and big data, few topics are as relevant as data science and machine learning. Their recent popularity lies not only in their applicability to real-life questions, but also in their natural blending of many different disciplines, including mathematics, statistics, computer science, engineering, science, and finance.

To someone starting to learn these topics, the multitude of computational techniques and mathematical ideas may seem overwhelming. Some may be satisfied with learning only how to apply off-the-shelf recipes to practical situations. But what if the assumptions of the black-box recipe are violated? Can we still trust the results? How should the algorithm be adapted? To truly understand data science and machine learning, it is important to appreciate the underlying mathematics and statistics, as well as the resulting algorithms.
The purpose of this book is to provide an accessible, yet comprehensive, account of
data science and machine learning. It is intended for anyone interested in gaining a better
understanding of the mathematics and statistics that underpin the rich variety of ideas and
machine learning algorithms in data science. Our viewpoint is that computer languages
come and go, but the underlying key ideas and algorithms will remain forever and will
form the basis for future developments.
Before we turn to a description of the topics in this book, we would like to say a
few words about its philosophy. This book resulted from various courses in data science
and machine learning at the Universities of Queensland and New South Wales, Australia.
When we taught these courses, we noticed that students were eager not only to learn how
to apply algorithms but also to understand how these algorithms actually work. However,
many existing textbooks assumed either too much background knowledge (e.g., measure
theory and functional analysis) or too little (everything is a black box), and the information
overload from often disjointed and contradictory internet sources made it more difficult for
students to gradually build up their knowledge and understanding. We therefore wanted to
write a book about data science and machine learning that can be read as a linear story,
with a substantial “backstory” in the appendices. The main narrative starts very simply and
builds up gradually to quite an advanced level. The backstory contains all the necessary
