Mlche Lec 1-31
CHE F315
Outline
15 August 2024 4
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers
Data → ML algorithm → Output (prediction, classification, pattern recognition)
Data
In the form of table (matrix)
Rows – instances/samples/measurements/observations/records/patterns/objects/events
Columns – attributes/variables/features
Feature extraction
Classification
Regression
Clustering
AI Vs ML
Applications
Spam filter
Biometric/Handwriting/Speech/Text recognition
Image processing
Stock market forecasting/demand forecasting
Medical diagnostics
Drug discovery and optimization
Bioinformatics
Robotics and automation
Data-driven process modeling
Process monitoring/fault detection
Soft sensing of key variables
Parameter optimization
Applications
Example
Ferreira, J., Pedemonte, M., & Torres, A. I. (2022). Development of a machine learning-
based soft sensor for an oil refinery’s distillation column. Computers & Chemical
Engineering, 161, 107756.
Example
Zhu, Q. X., Wang, X. W., Li, K., Xu, Y., & He, Y. L. (2022). Enhanced multicorrelation block
process monitoring and abnormity root cause analysis for distributed industrial process:
A visual data-driven approach. Journal of Process Control, 118, 1-15.
Handout
Outline
Thebelt, A., Wiebe, J., Kronqvist, J., Tsay, C., & Misener, R. (2022). Maximizing information from chemical
engineering data sets: Applications to machine learning. Chemical Engineering Science, 252, 117469.
Data preprocessing
Data preprocessing
Missing data imputation
Missing values in process industries are entries in the data set that have no connection with the real state of the process; they typically appear as values such as ±∞, 0, or NaN (not a number).
There are generally three patterns of missingness:
Missing completely at random (MCAR)
Missing at random (MAR)
Missing not at random (MNAR)
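As a rough illustration of the definition above, the following Python sketch flags NaN and infinite entries and replaces them with the mean of the valid readings. The function names and the sample readings are hypothetical, and mean replacement is only one simple option, most defensible when values are missing completely at random. A value of 0 is not flagged here because it can be a legitimate measurement.

```python
import math

def is_missing(value):
    """Flag None, NaN and +/-inf as missing entries."""
    return value is None or math.isnan(value) or math.isinf(value)

def impute_mean(column):
    """Replace missing entries with the mean of the valid ones."""
    valid = [v for v in column if not is_missing(v)]
    fill = sum(valid) / len(valid)
    return [fill if is_missing(v) else v for v in column]

# Hypothetical temperature readings with a NaN and an infinite entry
readings = [350.0, float("nan"), 352.0, float("inf"), 354.0]
cleaned = impute_mean(readings)
print(cleaned)
```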
A and C – missing values for single/multiple variables due to sensor failure
B – values of some variables missing at the same time instances (fault)
D – single variable showing regular missing values (multirate sampling)
Xu, S., Lu, B., Baldea, M., Edgar, T. F., Wojsznis, W., Blevins, T., & Nixon, M. (2015). Data cleaning in the process
industries. Reviews in Chemical Engineering, 31(5), 453-490.
Data preprocessing
Outlier detection and removal
• Observations or subsets of observations that do not show a consistent behavior with the rest of the data set from a statistical perspective
• Causes: malfunction of sensors and inappropriate treatment of missing data
Pani, A. K., & Mohanta, H. K. (2016). Online monitoring of cement clinker quality using multivariate statistics and Takagi-Sugeno fuzzy-inference technique. Control Engineering Practice, 57, 1-17.
Recap
EDA
Descriptive statistics
Summarizes the dataset
Data visualization
– Bar chart
– Pie chart
– histogram
Descriptive statistics
(Univariate)
Central tendency
Mean, median
Dispersion
Range, variance, standard deviation, quartiles and
interquartile range
Distribution
Skewness, kurtosis
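The univariate measures listed above can be computed with the Python standard library, as in this minimal sketch. The function name `describe` and the sample data are hypothetical. Population (divide by n) moments are assumed for variance, skewness, and kurtosis, and the "inclusive" quartile convention is one of several in use.

```python
import statistics

def describe(x):
    """Central tendency, dispersion and distribution-shape measures."""
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.pstdev(x)  # population standard deviation
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n  # fourth central moment
    return {
        "mean": mean, "median": statistics.median(x),
        "range": max(x) - min(x), "variance": sd ** 2, "std": sd,
        "q1": q1, "q3": q3, "iqr": q3 - q1,
        "skewness": m3 / sd ** 3, "kurtosis": m4 / sd ** 4,
    }

summary = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```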
Data preprocessing
Quartile-based identifier and boxplots:
Uses the interquartile distance Q as the scale parameter:
$Q = Q_3 - Q_1$
where $Q_1$ is the lower quartile, $x_{0.25}$, and $Q_3$ is the upper quartile, $x_{0.75}$.
$\mathrm{med} = (Q_1 + Q_3)/2$
For a symmetric data distribution, a point $x_k$ is flagged as an outlier if
$|x_k - \mathrm{med}| > 2Q$
A boxplot is a graphical demonstration of the quartile-based detector.
In the plot, any point that lies outside the upper or lower fence is considered an outlier.
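The quartile-based rule can be sketched in a few lines of Python. The function name and the sample data are hypothetical, and the "inclusive" quartile convention of `statistics.quantiles` is an assumption; other conventions shift the fences slightly. With med taken as the midpoint of the quartiles, the 2Q threshold coincides with the usual boxplot fences at 1.5Q beyond each quartile.

```python
import statistics

def quartile_outliers(x, k=2.0):
    """Flag points with |x_k - med| > k*Q, where Q = Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    q = q3 - q1              # interquartile distance
    med = (q1 + q3) / 2.0    # midpoint of the quartiles, as defined above
    return [v for v in x if abs(v - med) > k * q]

# Hypothetical measurements with one suspicious reading
data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 14.5]
flagged = quartile_outliers(data)
print(flagged)
```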
Statistics
(Bivariate/Multivariate)
Scatter plot
Covariance
Correlation
Heatmap
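For two variables, covariance and correlation can be computed directly from their definitions. The sketch below assumes the sample (n - 1) form of the covariance; the variable names and numbers are hypothetical. A heatmap is simply a colored display of these pairwise correlations for all variable pairs.

```python
import math

def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical reactor data
temperature = [300.0, 310.0, 320.0, 330.0, 340.0]
conversion = [0.52, 0.58, 0.63, 0.70, 0.74]
r = correlation(temperature, conversion)
print(f"covariance = {covariance(temperature, conversion):.4f}, r = {r:.4f}")
```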
Data preprocessing
Multivariate outlier detection
Mahalanobis distance
Minimum covariance determinant (MCD) estimator
Minimum volume ellipsoid (MVE) estimator
Smallest half volume
Recap
Missing value
Descriptive statistics
– Central tendency
– Dispersion
– Distribution
Pani, A. K., & Mohanta, H. K. (2016). Online monitoring of cement clinker quality using multivariate statistics and
Takagi-Sugeno fuzzy-inference technique. Control Engineering Practice, 57, 1-17.
Mahalanobis distance
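A minimal sketch of the Mahalanobis distance for two variables follows, using the closed-form inverse of a 2x2 covariance matrix so that no external library is needed. The function names and the example covariance are assumptions chosen for illustration. The distance is $\sqrt{(x - m)^\top S^{-1} (x - m)}$, where m is the mean and S the covariance matrix.

```python
import math

def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (s11, s12, s22) of (x1, x2) pairs."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return (m1, m2), (s11, s12, s22)

def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2-D point, using the closed-form 2x2 inverse."""
    s11, s12, s22 = cov
    d1, d2 = point[0] - mean[0], point[1] - mean[1]
    det = s11 * s22 - s12 ** 2
    return math.sqrt((s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det)

# Hypothetical positively correlated variables (unit variances, covariance 0.5)
cov = (1.0, 0.5, 1.0)
along = mahalanobis((1.0, 1.0), (0.0, 0.0), cov)     # follows the correlation
against = mahalanobis((1.0, -1.0), (0.0, 0.0), cov)  # violates the correlation
print(along, against)
```

Both test points are the same Euclidean distance from the mean, but the point that goes against the correlation structure is farther away in the Mahalanobis sense, which is what makes the measure useful for multivariate outlier detection.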
Useful References
https://www.machinelearningplus.com/statistics/mahalanobis-distance/
Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22, 85-126.
Chiang, L. H., Pell, R. J., & Seasholtz, M. B. (2003). Exploring process data with the use of robust outlier detection algorithms. Journal of Process Control, 13(5), 437-449.
Min-max
z-score
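The two scaling methods can be sketched as follows. Min-max scaling maps each variable to [0, 1], and z-score scaling gives zero mean and unit standard deviation. The function names and data are hypothetical, and the sample standard deviation is assumed for the z-score.

```python
import statistics

def min_max_scale(x):
    """Rescale values linearly so that min -> 0 and max -> 1."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

def z_score_scale(x):
    """Center on the mean and divide by the sample standard deviation."""
    mu = statistics.fmean(x)
    sigma = statistics.stdev(x)
    return [(v - mu) / sigma for v in x]

# Hypothetical pressure readings
pressure = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = min_max_scale(pressure)
standardized = z_score_scale(pressure)
print(scaled)
print(standardized)
```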
ACROSS
2. Mean and mode are examples of ______________ of univariate data.
4. Noisy data is (normal/abnormal) data.
7. The branch of statistics that is used for summarizing data is called ______ statistics.
10. Kurtosis characterizes the __________ of data.
12. The assumption to be tested on the data is called a ___________.
13. Raw facts are called _______.
14. Data wrangling refers to making data suitable for processing. (Yes/No)
15. Pairplot is used to visualize univariate data. (Yes/No)
DOWN
1. The averaged squared distance from the mean is called ____________.
3. The characteristics of Big Data are volume, velocity and __________________.
5. A dataset of two variables is called __________________ data.
6. Visualization helps in the presentation of data. (Yes/No)
8. Normalized covariance is called ________________.
9. Processed data is ________________.
11. Incorrect rejection of a true hypothesis is called a _____________ error.
Recap
Multivariate data
Euclidean and Mahalanobis distance
Multivariate outlier detection
Data transformation
Dimensionality reduction
As the number of dimensions increases, the time and computational complexity increase
• Variable (feature) selection
Reduces dataset size by removing irrelevant variables
• Variable (feature) extraction (transformation)
Feature selection
Measures of feature relevance
– Mutual information
– Correlation based similarity
– Distance-based similarity
A typical feature selection process consists of four steps:
– Generation of possible subsets
– Subset evaluation
– Stop searching based on some stopping criterion
– Validation of the result
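As one possible illustration of a relevance measure, the sketch below ranks candidate features by the absolute Pearson correlation with the target and keeps those above a threshold. The function names, the threshold of 0.8, and the data are all hypothetical. This simplified version scores each feature individually rather than searching over subsets, and it leaves out the final validation step.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_features(features, target, threshold=0.8):
    """Keep features whose |correlation| with the target meets the threshold."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: -scores[name])

# Hypothetical candidate features and target
features = {
    "temperature": [300, 310, 320, 330, 340],
    "pressure": [5, 1, 4, 2, 3],
}
target = [0.50, 0.56, 0.61, 0.68, 0.73]
selected = select_features(features, target)
print(selected)
```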
Dimensionality reduction
Wrapper based
– Stepwise forward selection
– Stepwise backward elimination
Probability
Why Probability in ML
Designing machines that learn from observed data
Uncertainty in learning from data
Observed data can be consistent with many models and
therefore which model is appropriate, given the data, is
uncertain
Predictions about future data and the future consequences
of actions are uncertain
Many aspects of learning and intelligence crucially depend
on the careful probabilistic representation of uncertainty.
Probabilistic framework describes how to represent and
manipulate uncertainty about models and predictions
Bayesian interpretation: probability is used to quantify uncertainty
Review of basics
Example
Probability distribution
Example
Probability distribution
Recap
Dimensionality reduction
– Variable selection
– Variable extraction
Variable selection
Filter based
– Mutual information
– Correlation based similarity
– Distance-based similarity
Wrapper based
– Forward selection
– Backward elimination
Embedded methods
– LASSO
– Elastic net
– Ridge regression
Recap
• Introduction
• Normal and faulty operation
• Traditional Monitoring Techniques
• Quality Control Charts
• Shewhart control charts (for subgroup data)
• Mean control chart
• Variability control chart (std dev and R)
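For subgroup data, the mean (X-bar) and range (R) chart limits are commonly computed from the average subgroup range and tabulated constants. The sketch below assumes subgroups of size 5, for which the standard constants are A2 = 0.577, D3 = 0 and D4 = 2.114. The function name and the data are hypothetical.

```python
# Standard Shewhart chart constants for subgroups of size n = 5
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return (LCL, center line, UCL) for the X-bar chart and the R chart."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("constants above are valid only for subgroups of size 5")
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * mean_range, grand_mean,
                 grand_mean + A2 * mean_range),
        "r": (D3 * mean_range, mean_range, D4 * mean_range),
    }

# Hypothetical subgroup measurements
subgroups = [
    [10.0, 11.0, 9.0, 10.0, 10.0],
    [10.0, 12.0, 10.0, 9.0, 9.0],
    [11.0, 10.0, 10.0, 8.0, 11.0],
]
limits = xbar_r_limits(subgroups)
print(limits)
```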
16 September 2024 4
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers
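$C^+(k) = \max[0,\; x(k) - (T + K) + C^+(k-1)]$
$C^-(k) = \max[0,\; (T - K) - x(k) + C^-(k-1)]$
where $C^+$ and $C^-$ denote the sums for the high and low directions, T is the target, and K is a constant, the slack parameter.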
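A tabular CUSUM can be sketched as below, assuming the usual one-sided recursions in which each sum accumulates deviations beyond target ± K and is reset to zero when it would go negative. An alarm is raised when either sum exceeds the decision limit H. The function name, the data, and the values of K and H are hypothetical.

```python
def tabular_cusum(x, target, k, h):
    """Return rows of (sample index, C+, C-, alarm flag)."""
    c_plus = c_minus = 0.0
    rows = []
    for i, xi in enumerate(x, start=1):
        c_plus = max(0.0, xi - (target + k) + c_plus)     # high-side sum
        c_minus = max(0.0, (target - k) - xi + c_minus)   # low-side sum
        rows.append((i, c_plus, c_minus, c_plus > h or c_minus > h))
    return rows

# Hypothetical measurements drifting upward after the third sample
rows = tabular_cusum([10.0, 10.2, 9.9, 10.6, 10.7, 10.8],
                     target=10.0, k=0.25, h=1.0)
for row in rows:
    print(row)
```

None of the individual readings is far from the target, yet the accumulated high-side sum crosses H at the sixth sample, which is the kind of small sustained shift a Shewhart chart tends to miss.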
Recap
• Introduction
• Normal and faulty operation
• Traditional Monitoring Techniques
• Quality Control Charts
• Shewhart control charts (for subgroup data)
• Shewhart control charts (for individual data)
• Limitations of Shewhart chart
• Western electric rules
• CUSUM chart
• For the ideal situation where the normally distributed and IID assumptions are valid, average run length (ARL) values have been tabulated for specified values of δ, K, and H.
Example
You are producing a chemical that
contains a small amount of
component Y that is important in the
use of the chemical. In addition,
you know that the optimum
concentration for that component is
0.16 wt. %. You want to control the
manufacturing process as close to
that as possible. You take a sample
every batch. The results for the last
25 batches are shown below.
https://www.spcforexcel.com/knowledge/variable-control-charts/keeping-process-target-cusum-charts
Remember that our action limits are +/- 0.1116. This value
is exceeded by SH(23), our 23rd sample. This tells us that
the process has moved significantly off target and needs to
be adjusted to return the process to the target value.
• The EWMA statistic is $z_i = \lambda x_i + (1 - \lambda) z_{i-1}$, where $\lambda$ is a constant, $0 \le \lambda \le 1$.
• The EWMA control chart consists of a plot of $z_i$ vs. i, as well as a target and upper and lower control limits.
• Note that the EWMA control chart reduces to a Shewhart chart for $\lambda = 1$.
• The EWMA calculations are initialized by setting z(0) = T.
• If the measurements satisfy the IID condition, the EWMA control limits can be derived.
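The EWMA chart can be sketched as follows, assuming the recursion z(i) = λ x(i) + (1 - λ) z(i-1) with z(0) = T and the steady-state limits T ± 3σ√(λ/(2 - λ)) that hold for IID measurements. The function name, default λ, and sample values are hypothetical.

```python
import math

def ewma_chart(x, target, sigma, lam=0.25, width=3.0):
    """Return (LCL, UCL, [(z_i, out_of_control_flag), ...])."""
    half_width = width * sigma * math.sqrt(lam / (2.0 - lam))
    lcl, ucl = target - half_width, target + half_width
    z = target  # initialization z(0) = T
    points = []
    for xi in x:
        z = lam * xi + (1.0 - lam) * z
        points.append((z, not lcl <= z <= ucl))
    return lcl, ucl, points

# Hypothetical data around a target of 0 with unit standard deviation
lcl, ucl, points = ewma_chart([2.0, 2.0], target=0.0, sigma=1.0, lam=0.5)
print(lcl, ucl, points)
```

Setting `lam=1.0` makes z(i) equal to x(i) and the limits equal to T ± 3σ, which is the Shewhart chart for individual measurements.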
Example
Multivariate process monitoring
Kourti, T., & MacGregor, J. F. (1995). Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemometrics and intelligent laboratory systems, 28(1),
Revision
CUSUM (Tabular)
EWMA
Function controlchart
'charttype' The name of a chart type chosen from among the
following:
'xbar' X-bar or mean
's' Standard deviation
'r' Range
'ewma' Exponentially weighted moving average
'i' Individual observation
'mr' Moving range of individual observations
'ma' Moving average of individual observations
'p' Proportion defective
'np' Number of defectives
'u' Defects per unit
'c' Count of defects
Alternatively this parameter can be a cell array listing
multiple compatible chart types. There are four sets
of compatible types: XBAR, S, R, and EWMA;
Example 17.2
MATLAB Use
controlchart(x,'charttype',{'s'});
controlchart(x,'charttype',{'xbar'});
Shewhart Chart
controlchart(runout,'rules','we2');
Example
CUSUM
EWMA
controlchart(x,'charttype',{'ewma'});
Multivariate Statistical Techniques
• For example, ten or more quality variables are typically measured for synthetic fibers.
Although applying univariate control charts to each individual variable is a possible solution, we will see that this is inefficient and can lead to erroneous conclusions.
For these situations, multivariable SPC techniques can
offer significant advantages over the single-variable
methods discussed in.
• In the statistics literature, these techniques are referred to
as multivariate methods, while the standard Shewhart
and CUSUM control charts are examples of univariate
methods.
Seborg, D. E., Edgar, T. F., Mellichamp, D. A., & Doyle III, F. J. (2016). Process dynamics and control. John Wiley & Sons.
MATLAB Statistics and Machine Learning Toolbox, Statistical Visualization functions:
gscatter
biplot
Control charts for inner (x1) and outer (x2) bearing diameters
Montgomery, D. C. (2019). Introduction to statistical quality control. John Wiley & Sons.
Examples
http://depts.washington.edu/control/LARRY/TE/download.html#updated_TE_code
Descriptions of process faults in TE process
Contents
Hotelling’s T2 chart
Multivariate EWMA
PCA
ICA
Clustering
Bivariate case
A control ellipse for two independent variables
A control ellipse for two dependent variables
Example
2 October 2024 5
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers
$\mathrm{UCL} = \chi^2_{\alpha,2}$, where $\chi^2_{\alpha,2}$ is the upper $\alpha$ percentage point of the chi-square distribution with 2 degrees of freedom
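For the bivariate case this limit has a closed form, because the chi-square distribution with 2 degrees of freedom has upper tail probability exp(-x/2), so the upper-α point is -2 ln α. The sketch below assumes individual observations and a known in-control mean and covariance matrix; the function names and the numerical values are hypothetical.

```python
import math

def chi2_ucl_2dof(alpha):
    """Upper-alpha point of the chi-square distribution with 2 dof."""
    return -2.0 * math.log(alpha)

def t2_statistic(x, mean, cov):
    """(x - mean)^T S^-1 (x - mean) for 2-D x, with cov = (s11, s12, s22)."""
    s11, s12, s22 = cov
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    det = s11 * s22 - s12 ** 2
    return (s22 * d1 ** 2 - 2.0 * s12 * d1 * d2 + s11 * d2 ** 2) / det

# Hypothetical in-control parameters and two new observations
mean, cov = (0.0, 0.0), (1.0, 0.5, 1.0)
ucl = chi2_ucl_2dof(0.05)
for obs in [(1.0, -1.0), (2.0, -2.0)]:
    t2 = t2_statistic(obs, mean, cov)
    print(obs, round(t2, 3), "out of control" if t2 > ucl else "in control")
```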
Multivariate EWMA
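The MEWMA is a logical extension of the univariate EWMA and is defined as
$Z_i = \lambda x_i + (1 - \lambda) Z_{i-1}$
where $0 \le \lambda \le 1$ and $Z_0 = 0$.
The quantity plotted on the control chart is
$T_i^2 = Z_i^\top \Sigma_{Z_i}^{-1} Z_i$
where the covariance matrix is
$\Sigma_{Z_i} = \frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2i}\right]\Sigma$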
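A minimal bivariate MEWMA sketch is given below. It assumes the standard recursion Z(i) = λ x(i) + (1 - λ) Z(i-1) with Z(0) = 0, an in-control mean of zero, a known covariance matrix Σ, and the exact covariance of Z(i), which is Σ scaled by λ/(2 - λ) · [1 - (1 - λ)^(2i)]. The function name and the data are hypothetical and are not taken from the example in the slides.

```python
def mewma_t2(observations, cov, lam=0.1):
    """MEWMA statistic T_i^2 for 2-D data; cov = (s11, s12, s22) of x."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    z1 = z2 = 0.0  # Z_0 = 0
    stats = []
    for i, (x1, x2) in enumerate(observations, start=1):
        z1 = lam * x1 + (1.0 - lam) * z1
        z2 = lam * x2 + (1.0 - lam) * z2
        # Covariance of Z_i is factor * cov, so its inverse is cov^-1 / factor
        factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        quad = (s22 * z1 ** 2 - 2.0 * s12 * z1 * z2 + s11 * z2 ** 2) / det
        stats.append(quad / factor)
    return stats

# Hypothetical observations with a shift in the second variable
data = [(0.1, -0.2), (0.0, 0.3), (0.2, 1.1), (0.1, 1.4), (0.3, 1.6)]
print([round(t, 2) for t in mewma_t2(data, cov=(1.0, 0.5, 1.0), lam=0.1)])
```

With `lam=1.0` the scaling factor becomes 1 and the statistic reduces to the ordinary quadratic form of each observation, which mirrors the way the univariate EWMA reduces to a Shewhart chart.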
Example
Bivariate normal distribution data (Lowry et al 1992)
-0.12 0.06
-0.10 0.14 0.49
-0.25 0.17 0.82
-0.20 0.20 2.55
2.26
-0.09 0.10
0.51
0.00 0.19 0.76
-0.03 0.40 3.54
0.04 0.53 5.62
0.19 0.64 7.66
14.48
0.32 0.88
Lowry, C. A., Woodall, W. H., Champ, C. W., & Rigdon, S. E. (1992). A multivariate exponentially
weighted moving average control chart. Technometrics, 34(1), 46-53.
Example
Control chart on pH and Viscosity data
https://www.spcforexcel.com/knowledge/variable-control-charts/hotelling-t2-control-chart/
Teaching on board
Variables Description
Ci Inlet concentration
Ti Input reactor temperature
C Output concentration
T Output reactor temperature
Qc Coolant flow rate
Tci Coolant inlet temperature
Tc Coolant outlet temperature
Basic statistics
Central tendency
Dispersion
Variance, standard deviation, covariance, correlation
Distribution
Probability distribution
Continuous
Normal
Discrete
Data preprocessing
Missing values
– Deletion
– Replacement
Outliers
Univariate
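$|x_k - \mu| > 3\sigma$ (3-sigma rule)
$|x_k - \mathrm{med}| > 3 \times 1.483\,\mathrm{MAD}$ (Hampel identifier; MAD = median absolute deviation)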
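A minimal Python sketch of the two univariate rules, using only the standard library. The function names and the data are hypothetical, and the population standard deviation is used for sigma as an assumption.

```python
import statistics

def three_sigma_outliers(x):
    """Flag x_k when |x_k - mean| > 3 * standard deviation."""
    mu = statistics.mean(x)
    sigma = statistics.pstdev(x)
    return [abs(v - mu) > 3 * sigma for v in x]

def hampel_outliers(x):
    """Flag x_k when |x_k - median| > 3 * 1.483 * MAD."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    return [abs(v - med) > 3 * 1.483 * mad for v in x]

# Hypothetical measurements with one gross error at the end
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0]
flags_3sigma = three_sigma_outliers(data)
flags_hampel = hampel_outliers(data)
```

In this made-up data set the gross error inflates the mean and standard deviation enough that the 3-sigma rule misses it, while the median-based rule flags it.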
Data preprocessing
Outliers
Multivariate
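– MD (Mahalanobis distance)
– MVT (multivariate trimming)
– MCD (minimum covariance determinant)
– MVE (minimum volume ellipsoid)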
Data preprocessing
Dimensionality reduction
Variable selection
Correlation
Similarity
Forward selection
Backward selection
Variable transformation
PCA, SVD, LDA, ICA
Data scaling
Data preprocessing
Dimensionality reduction
• Variable selection
• Variable transformation
Normalization
Normal/abnormal operation
Shewhart control chart
(for process mean)
CUSUM (Tabular)
EWMA
Unsupervised techniques
(Multivariate process monitoring)
Hotelling’s T2 chart
UCL = $\chi^2_{\alpha,p}$
Multivariate EWMA
PCA
PCA (Assumptions)
PCA (Modifications)
4 November 2024 6
BITS Pilani, Pilani Campus
CHE F315 Machine Learning for Chemical Engineers
PCA (Modifications)
Non-linear process
– Associative network
– Neural network using principal curve
– IT-net
– Kernel PCA
Kernel PCA
Müller, K. R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to kernel-based learning algorithms.
IEEE Transactions on Neural Networks, 12(2), 181-201.
https://www.cs.mcgill.ca/~dprecup/courses/ML/Lectures/ml-lecture13.pdf
Kernel PCA
Polynomial kernel: $k(x_i, x_j) = (x_i \cdot x_j + 1)^d$
Sigmoid kernel: $k(x_i, x_j) = \tanh(\beta_0 \langle x_i, x_j \rangle + \beta_1)$
Inverse multiquadric kernel: $k(x_i, x_j) = \dfrac{1}{\sqrt{\lVert x_i - x_j \rVert^2 + d^2}}$
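A minimal Python sketch of the three kernels and of the kernel (Gram) matrix that kernel PCA works on. It assumes the usual forms (x_i . x_j + 1)^d, tanh(beta0 <x_i, x_j> + beta1) and 1/sqrt(||x_i - x_j||^2 + d^2). Function names, default parameter values and sample points are hypothetical.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def polynomial_kernel(xi, xj, d=2):
    return (dot(xi, xj) + 1.0) ** d

def sigmoid_kernel(xi, xj, beta0=1.0, beta1=0.0):
    return math.tanh(beta0 * dot(xi, xj) + beta1)

def inverse_multiquadric_kernel(xi, xj, d=1.0):
    sq_dist = sum((p - q) ** 2 for p, q in zip(xi, xj))
    return 1.0 / math.sqrt(sq_dist + d * d)

def kernel_matrix(X, kernel):
    """N x N matrix with entry (i, j) equal to kernel(X[i], X[j])."""
    return [[kernel(a, b) for b in X] for a in X]

# Hypothetical samples
X = [(1.0, 2.0), (3.0, 4.0), (0.0, 1.0)]
K = kernel_matrix(X, polynomial_kernel)
```

Kernel PCA then centres this matrix in feature space and solves an eigenvalue problem on it; that step is not shown here.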
ICA
$X = AS + Y, \quad X \in \mathbb{R}^{n \times r}$
Palla, G. L. P., & Pani, A. K. (2023). Independent component analysis application for fault detection in process industries: Literature
review and an application case study for fault detection in multiphase flow systems. Measurement, 209, 112504.
Applications
ICA application on multiphase flow system
Palla, G. L. P., & Pani, A. K. (2023). Independent component analysis application for fault detection in process industries: Literature review
and an application case study for fault detection in multiphase flow systems. Measurement, 209, 112504.
Applications
KPCA application on Biological WWTP
(a) PCA monitoring charts, and (b) KPCA monitoring for the case of a linear decrease
in the nitrification rate (benchmark example).
Lee, J. M., Yoo, C., Choi, S. W., Vanrolleghem, P. A., & Lee, I. B. (2004). Nonlinear process monitoring using kernel
principal component analysis. Chemical Engineering Science, 59(1), 223-234.
Ge, Z., Song, Z., Ding, S. X., & Huang, B. (2017). Data mining and analytics in the process industry: The role of machine learning. IEEE
Access, 5, 20590-20616.
Introduction
An unsupervised machine learning task that automatically
divides the data into clusters (groups of similar items)
Techniques for finding subgroups, or clusters, in a data set
on the basis of the characteristics of the objects
Clustering enables a large set of diverse and varied data to
be represented in a smaller number of groups (reduces
complexity)
As a stand-alone tool to get insight into data distribution
As a preprocessing step for other algorithms
Done using a trial and error approach
Proximity measures
Distance
Dij is the distance between objects i and j
Properties of distance measure:
Data types: quantitative and categorical
Quantitative data
Minkowski distance
Euclidean, city block (Manhattan), Chebyshev (maximum
value)
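For reference, the Minkowski distance between objects i and j can be written as below. The symbols p (number of variables) and q (order of the distance) are notation assumed here, not taken from the slides.

```latex
D_{ij} = \left( \sum_{k=1}^{p} \lvert x_{ik} - x_{jk} \rvert^{q} \right)^{1/q}
% q = 1:          city block (Manhattan) distance
% q = 2:          Euclidean distance
% q -> infinity:  Chebyshev distance, \max_k \lvert x_{ik} - x_{jk} \rvert
```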
Proximity measures
Example
Two samples have values of (0,3) and (5,8). Compute the
distance between the two samples
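A short Python sketch that works this example for the three common distances. The helper names are hypothetical.

```python
def minkowski(a, b, q):
    """Minkowski distance of order q between two equal-length vectors."""
    return sum(abs(u - v) ** q for u, v in zip(a, b)) ** (1.0 / q)

def chebyshev(a, b):
    """Maximum absolute coordinate difference (limit of Minkowski as q grows)."""
    return max(abs(u - v) for u, v in zip(a, b))

a, b = (0, 3), (5, 8)
euclidean = minkowski(a, b, 2)   # sqrt(5^2 + 5^2) = sqrt(50), about 7.07
manhattan = minkowski(a, b, 1)   # 5 + 5 = 10
maximum = chebyshev(a, b)        # max(5, 5) = 5
```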
Proximity measures
Cosine similarity
Used to measure similarity between objects
Measures the cosine of the angle between two vectors
projected in a multi-dimensional space
Two samples have values of (1,1,0) and (0,1,1). Compute
the cosine similarity between the two samples
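A short Python sketch of this calculation, assuming the usual definition cos(theta) = (a . b) / (||a|| ||b||). The function name is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return dot / (norm_a * norm_b)

# Dot product is 1 and both norms are sqrt(2), so the similarity is 1/2
sim = cosine_similarity((1, 1, 0), (0, 1, 1))
```

A similarity of 0.5 corresponds to an angle of 60 degrees between the two samples.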
Techniques
• Partitioning methods
• Hierarchical methods
Creates hierarchical structure through decomposition
• Density-based methods
Identification of dense regions is the basis of cluster
formation
Useful in case of arbitrarily shaped clusters
• Grid-based methods
• Probabilistic model based methods
Partitioning methods
k-means
Step 1: Select K points in the data space and mark them as initial
centroids
loop
Step 2: Assign each point in the data space to the nearest
centroid to form K clusters
Step 3: Measure the distance of each point in the cluster from
the centroid
Step 4: Calculate the Sum of Squared Error (SSE) to measure
the quality of the clusters
Step 5: Identify the new centroid of each cluster as the mean of
the points assigned to it
Step 6: Repeat Steps 2 to 5 until the centroids do not
change
end loop
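The loop above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: squared Euclidean distance, the centroid taken as the mean of the assigned points, initial centroids supplied by the caller, and hypothetical data.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain K-means. points and centroids are lists of equal-length tuples."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d2.index(min(d2))].append(p)
        # Steps 3-4: SSE of the current assignment
        sse = sum(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c, members in zip(centroids, clusters)
            for p in members
        )
        # Step 5: new centroid = mean of the points in the cluster
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        # Step 6: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters, sse

# Hypothetical data: two well-separated groups, K = 2
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters, sse = kmeans(points, [(1, 1), (8, 8)])
```

The result depends on the initial centroids, so in practice the algorithm is often run from several starting points and the run with the lowest SSE is kept.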
Summary of K-means clustering algorithm
Step 1: Initialize X, K, $\mu_1(0), \ldots, \mu_K(0)$. Set t = 1.
Step 2: Classify N samples according to nearest $\mu_k$:
$x(i) \in$ cluster k if $\lVert x(i) - \mu_k(t-1) \rVert < \lVert x(i) - \mu_l(t-1) \rVert$ for all $l \ne k$
Identify $N_k(t-1)$; k = 1, …, K.
Step 3: Recompute $\mu_k$:
$\mu_k(t) = \dfrac{1}{N_k(t-1)} \sum_{i=1}^{N_k(t-1)} x(i)$, k = 1, …, K (sum over the samples in cluster k)
Example
1 2 4
2 4 6
3 6 8
4 10 4
5 12 4
Disadvantage
k-medoids
PAM Algorithm
Step 1: Randomly choose k points in the data set as the initial
representative points
loop
Step 2: Assign each of the remaining points to the cluster which
has the nearest representative point
Step 3: Randomly select a non-representative point o_r in each
cluster
Step 4: Swap the representative point o_j with o_r and compute the
new SSE after swapping
Step 5: If SSEnew < SSEold, then swap o_j with o_r to form the new
set of k representative objects
Step 6: Refine the k clusters on the basis of the nearest
representative point. Repeat until there is no change
end loop
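A simplified Python sketch of the swap idea behind PAM. It departs from the steps above in ways that are assumptions of this sketch: the candidate point is drawn from the whole data set rather than from each cluster, the loop runs for a fixed number of trial swaps instead of checking for no change, and the cost is the sum of squared distances to the nearest representative point. All names and data are hypothetical.

```python
import random

def total_cost(points, medoids):
    """Sum over all points of the squared distance to the nearest medoid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in medoids)
        for p in points
    )

def pam(points, k, n_trials=200, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)          # Step 1: initial representative points
    best = total_cost(points, medoids)       # Step 2 is implicit in total_cost
    for _ in range(n_trials):
        j = rng.randrange(k)                 # representative point to replace
        candidate = rng.choice(points)       # Step 3: random candidate point
        if candidate in medoids:
            continue
        trial = medoids[:j] + [candidate] + medoids[j + 1:]
        cost = total_cost(points, trial)     # Step 4: cost after the swap
        if cost < best:                      # Step 5: keep only improving swaps
            medoids, best = trial, cost
    return medoids, best

# Hypothetical data: two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
medoids, cost = pam(points, k=2)
```

Unlike K-means centroids, the representative points returned here are always actual data points.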
Techniques
Partitioning methods
– K-means/K-medoid
– Fuzzy K-means
– Mixture of Gaussian
– Spectral
Hierarchical methods
– Bottom up – Agglomerative
– Top down – Divisive
Density-based methods
– DBSCAN
– OPTICS
Grid-based methods
Probabilistic model based methods
800 1.8
700 1.4
500 1.5
k means clustering
Industrial applications
Point       1     2    3      4    5    6    7     8    9      10   11    12   13    14   15
Coordinate  2,10  2,6  11,11  6,9  6,4  1,2  5,10  4,9  10,12  7,5  9,11  4,6  3,10  3,8  6,11
Introduction
Classification
• K-NN
• Decision tree
• Random forest
• SVM
Regression
• Linear
• Non-linear
• ANN
• SVM
Linear regression
Regression
$b_0 = \dfrac{\sum y - b_1 \sum x}{n}$
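A minimal Python sketch of the closed-form least-squares fit of y = b0 + b1 x. The intercept follows b0 = (sum(y) - b1 sum(x)) / n. The slope expression is the standard least-squares result and is an assumption here, since it does not survive in the extracted slide text. The data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = (sy - b1 * sx) / n
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```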
Assumptions of regression
Linearity
– The relationship between X and Y is linear
Independence of Errors
– Error values are statistically independent
– Particularly important when data are collected over a
period of time
Normality of Error
– Error values are normally distributed for any given value
of X
Equal Variance (also called homoscedasticity)
– The probability distribution of the errors has constant
variance
Residual analysis
Recap
𝑌 = 𝛽𝑥 + 𝜀
When to apply:
• Errors should be independent, normal and randomly distributed
• Variance of errors should be constant
Polynomial regression
Model accuracy
Graphical
Statistical
• Standard regression
• Dimensionless
• Error index
Standard Regression
Graphical
Slope and y-Intercept
Ideal values: slope = 1 and y-intercept = 0
Error Index
Mean of squared errors (MSE): $\mathrm{MSE} = \dfrac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$
• RMSE and MAE values less than half of the standard deviation of the measured
data are acceptable
• RMSE and MAE both have the units of the difference between actual and predicted
values. For any model, RMSE is greater than or equal to MAE, because squaring
amplifies large errors.
• MAE has been stated to be a better performance measuring criterion than the
RMSE
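A minimal Python sketch of the three error indices, assuming MSE = (1/N) sum (y_i - yhat_i)^2, RMSE = sqrt(MSE) and MAE = (1/N) sum |y_i - yhat_i|. The measured and predicted values are hypothetical.

```python
import math

def mse(y, y_hat):
    """Mean of squared errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error, in the same units as y."""
    return math.sqrt(mse(y, y_hat))

def mae(y, y_hat):
    """Mean absolute error, in the same units as y."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

# Hypothetical measured and predicted values
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 8.0, 9.5]
errors = (mse(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
```

For these values MSE = 0.375, RMSE is about 0.612 and MAE = 0.5, so RMSE is the larger of the two unit-bearing indices.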
https://in.mathworks.com/discovery/overfitting.html
Linear Regression
𝑌 = 𝛽𝑥 + 𝜀
Linear Regression
Built-in function polyfit
MATLAB has a built-in function polyfit that fits a least-squares
nth order polynomial to data:
– p = polyfit(x, y, n)
• x: independent data
• y: dependent data
• n: order of polynomial to fit
• p: coefficients of polynomial
$f(x) = p_1 x^n + p_2 x^{n-1} + \dots + p_n x + p_{n+1}$
MATLAB’s polyval command can be used to compute a
value using the coefficients.
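To make the coefficient ordering concrete, here is a small Python sketch of what a polyval-style evaluation does: coefficients are listed from the highest power down to the constant term and evaluated by Horner's rule. This is an illustration written for these notes, not MATLAB's implementation, and the Python function name is hypothetical.

```python
def polyval(p, x):
    """Evaluate p[0]*x**n + p[1]*x**(n-1) + ... + p[n] by Horner's rule."""
    result = 0.0
    for coeff in p:
        result = result * x + coeff
    return result

# f(x) = 2x^2 - 3x + 1 evaluated at x = 4: 32 - 12 + 1
value = polyval([2.0, -3.0, 1.0], 4.0)
```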
Example
X = [1 2 4 5 7 9 11 13 14 16]
Y = [101 105 109 112 117 116 122 123 129 130]
Fit a straight line using the built-in function polyfit, or using MATLAB’s left divide operator (\)
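As a cross-check of either route, a Python sketch that fits a straight line to the X and Y data above by solving the 2x2 normal equations directly. The variable names are hypothetical, and this plain-Python route is an assumption of the sketch rather than the method used on the slide.

```python
x = [1, 2, 4, 5, 7, 9, 11, 13, 14, 16]
y = [101, 105, 109, 112, 117, 116, 122, 123, 129, 130]

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(u * v for u, v in zip(x, y))

# Normal equations for y = slope*x + intercept:
#   [sxx sx] [slope    ]   [sxy]
#   [sx  n ] [intercept] = [sy ]
det = sxx * n - sx * sx
slope = (n * sxy - sx * sy) / det
intercept = (sxx * sy - sx * sxy) / det
```

For this data the fit gives a slope of about 1.84 and an intercept of about 101.3, in the same highest-power-first order that a first-order polynomial fit would return.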
Linear Regression
>> plot(x1,x2,'.')
xlabel('x1')
ylabel('x2')
Compare