Week 05

The document discusses the analytics process and data mining, emphasizing the importance of data quality and the steps involved in data pre-processing. It outlines different machine learning approaches, including supervised and unsupervised learning, and introduces concepts of statistical learning and model selection. Additionally, it covers prediction errors, cross-validation methods, and the bias-variance trade-off in model evaluation.


BUSINESS INTELLIGENCE & ANALYTICS

Analytics process

Saji K Mathew, PhD


Professor, Department of Management Studies
INDIAN INSTITUTE OF TECHNOLOGY MADRAS
Mindfulness
"I just completed a thorough statistical examination of the life of President Bush. For fifty-eight years, close to 21,000 observations, he did not die once. I can hence pronounce him immortal, with a high degree of statistical significance."
Data mining process
How to decide on variables
Include a variable if:
- The variable is important in making a managerial decision (e.g., square-foot area of a sales outlet)
- The variable helps to control for important factors (e.g., seasonality)
- There aren't too many variables (parsimony, Occam's razor): is it really necessary to include this variable?
- Data for the variable are available
Data mining process
Inspect data
Data quality
Good data are characterized by (Han et al., 2012):
- Accuracy
- Completeness
- Consistency
- Timeliness
- Believability
- Interpretability
Data problems in the real world
- Missing data
- Noise
- Inconsistency
- Units of measurement

Solution: discrepancy detection and data pre-processing
Data pre-processing
- Data cleaning: missing values (ignore the tuple, replace manually, or replace following a method); noisy data (smoothing techniques)
- Data integration: database normalization
- Data transformation: scaling, normalization
- Data reduction: aggregation
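As a minimal illustration of the cleaning and transformation steps above, here is a sketch in plain Python of mean imputation for missing values and min-max scaling; the `area_sqft` column and its values are made up for illustration:

```python
# Sketch of two pre-processing steps: mean imputation for missing
# values and min-max scaling to [0, 1]. Data are illustrative only.

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

area_sqft = [1200, None, 1500, 900]   # one missing value
cleaned = impute_mean(area_sqft)      # missing entry replaced by the mean
scaled = min_max_scale(cleaned)       # all values now lie in [0, 1]
```

In practice one would pick the imputation method per variable (mean, median, model-based), as the "replacement following a method" bullet suggests.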
Data mining process
Machine learning
- In supervised learning, for each observation of the predictor measurement(s) xi, i = 1, ..., n, there is an associated response measurement yi. E.g., regression.
- In unsupervised learning, for every observation i = 1, ..., n, we observe a vector of measurements xi but no associated response yi. E.g., clustering.
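The supervised/unsupervised contrast can be sketched with toy data (all values invented for illustration): least-squares simple regression uses the responses y, while a one-dimensional two-group split uses only the x's:

```python
# Supervised vs. unsupervised learning on toy data (values invented).

# Supervised: each x has an associated response y, and we fit
# simple linear regression y = a + b*x by least squares.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
        / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

# Unsupervised: no responses at all -- just group the observations.
# Here, a crude 1-D "clustering" by a midpoint threshold.
points = [0.9, 1.1, 5.0, 5.2]
threshold = (min(points) + max(points)) / 2
clusters = [0 if p < threshold else 1 for p in points]
```

The point of the contrast: the regression step needs the `ys`; the grouping step never looks at any response.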
Statistical vs algorithmic (CS)
Explanatory vs predictive modeling

[Figure: explanatory vs. predictive modeling, illustrated with the number of credit cards in a family]
Statistical learning
More generally, suppose that we observe a quantitative response Y and p different predictors, X1, X2, ..., Xp. We assume that there is some relationship between Y and X = (X1, X2, ..., Xp), which can be written in the very general form

Y = f(X) + ε

Statistical learning refers to a set of approaches for estimating f.
- Training criterion: residual sum of squares, RSS = Σi (yi − f̂(xi))²
- Choice of models: prediction error vs. flexibility (the bias-variance trade-off)

LHS: Different fits (linear regression (orange)


RHS: Red Line: Test MSE, Grey line: Training MSE
Prediction
Three sources of error in predicted Y:
- Reducible error, due to inaccurate estimation of f
- Irreducible error, due to randomness (ε)
- Test data variation

Reducible error can be reduced by better learning techniques.
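The reducible/irreducible split above can be written as the standard decomposition of the expected squared prediction error, assuming Y = f(X) + ε with E(ε) = 0 and treating X and the fitted f̂ as fixed:

```latex
E\big[(Y - \hat{Y})^2\big]
  = E\big[(f(X) + \varepsilon - \hat{f}(X))^2\big]
  = \underbrace{\big[f(X) - \hat{f}(X)\big]^2}_{\text{reducible}}
  + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}
```

The cross term vanishes because E(ε) = 0; no amount of improvement to f̂ can remove Var(ε), which is why it is called irreducible.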
Cross validation
Training error vs. testing error (prediction error)
Mean Squared Error (MSE), a measure of testing error:

MSE = (1/n) Σi (yi − f̂(xi))²

Three kinds of cross validation:
- Test set approach
- Leave-one-out cross-validation (LOOCV)
- K-fold cross validation
The test set method
1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.
[Figure: linear regression example; test-set Mean Squared Error = 2.4]
LOOCV (Leave-one-out Cross Validation)
For k = 1 to R:
1. Let (xk, yk) be the kth record.
2. Temporarily remove (xk, yk) from the dataset.
3. Train on the remaining R − 1 datapoints.
4. Note your error on (xk, yk).
When you've done all points, report the mean error.
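A minimal sketch of the LOOCV loop, again assuming simple linear regression as the learner (`loocv_mse` is an illustrative helper, not from the slides):

```python
def loocv_mse(xs, ys):
    """Leave-one-out CV: for each record k, fit simple linear regression
    on the other R-1 points and record the squared error on (xk, yk)."""
    errs = []
    for k in range(len(xs)):
        # Steps 1-2: temporarily remove the kth record.
        tx = [x for i, x in enumerate(xs) if i != k]
        ty = [y for i, y in enumerate(ys) if i != k]
        # Step 3: train on the remaining R-1 datapoints.
        xb, yb = sum(tx) / len(tx), sum(ty) / len(ty)
        b = sum((x - xb) * (y - yb) for x, y in zip(tx, ty)) \
            / sum((x - xb) ** 2 for x in tx)
        a = yb - b * xb
        # Step 4: note the error on the held-out record.
        errs.append((ys[k] - (a + b * xs[k])) ** 2)
    return sum(errs) / len(errs)   # mean error over all points
```

Note the cost: the model is refit R times, once per record, which is why k-fold with small k is often preferred on large datasets.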
k-fold Cross Validation
Randomly break the dataset into k partitions (in our example we'll have k = 3 partitions, coloured red, green, and blue).
- For the red partition: train on all the points not in the red partition; find the test-set sum of errors on the red points.
- For the green partition: train on all the points not in the green partition; find the test-set sum of errors on the green points.
- For the blue partition: train on all the points not in the blue partition; find the test-set sum of errors on the blue points.
Then report the mean error.
[Figure: linear regression example; MSE (3-fold) = 2.05]
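The three-partition procedure generalizes to any k; here is a sketch in plain Python with simple linear regression as the learner (`kfold_mse` is an illustrative name, not from the slides):

```python
import random

def kfold_mse(xs, ys, k=3, seed=0):
    """k-fold CV: randomly partition the data into k folds; for each fold,
    train on the other folds, sum squared errors on the held-out fold,
    and report the mean per-point error over all folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k roughly equal partitions
    total, n = 0.0, 0
    for fold in folds:
        # Train on all the points not in this partition.
        train = [i for i in idx if i not in fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        xb, yb = sum(tx) / len(tx), sum(ty) / len(ty)
        b = sum((x - xb) * (y - yb) for x, y in zip(tx, ty)) \
            / sum((x - xb) ** 2 for x in tx)
        a = yb - b * xb
        # Find the test-set sum of errors on the held-out points.
        total += sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
        n += len(fold)
    return total / n
```

With k = 3 this mirrors the red/green/blue example above; LOOCV is the special case k = R.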
