
Reaction Chemistry & Engineering — Review

Industrial data science – a review of machine learning applications for chemical and process industries†

Max Mowbray,a Mattia Vallerio,b Carlos Perez-Galvan,b Dongda Zhang,ac Antonio Del Rio Chanona c and Francisco J. Navarro-Brull*bc

Cite this: React. Chem. Eng., 2022, 7, 1471
Received 1st December 2021, Accepted 21st February 2022
DOI: 10.1039/d1re00541c
rsc.li/reaction-engineering
Open Access Article. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.

In the literature, machine learning (ML) and artificial intelligence (AI) applications tend to start with examples that are irrelevant to process engineers (e.g. classification of images between cats and dogs, house pricing, types of flowers, etc.). However, process engineering principles are also based on pseudo-empirical correlations and heuristics, which are a form of ML. In this work, industrial data science fundamentals will be explained and linked with commonly-known examples in process engineering, followed by a review of industrial applications using state-of-the-art ML techniques.

1 Introduction

The potential of data-driven applications in industrial processes has encouraged the industry to invest in machine learning teams, software, and infrastructure over the past years.1–3 Trying to mimic big technological companies whose profit is determined by better data-driven decisions than random ones (e.g. recommending films to watch or advertisements), process industries need to deal with the safety of such recommendations in a physical setting (rather than virtual) and the inevitable challenges imposed by physicochemical and engineering constraints.4–6 In the same spirit of mimicking big tech companies, the IT challenge focuses on the cost, complexity, and security risk of moving process data to the cloud when in reality the majority of it is needed mainly locally.7 On the other hand, chemical companies are continuously looking

a The University of Manchester, Manchester, M13 9PL, UK
b Solvay SA, Rue de Ransbeek 310, 1120, Brussels, Belgium. E-mail: francisco.navarro@solvay.com
c Imperial College London, London, SW7 2AZ, UK
† Electronic supplementary information (ESI) available: Annex I illustrates how to use machine learning to find meaningful correlations between several sensors (tags). Annex II describes sources of uncertainty in more detail. Annex III provides a glossary for machine learning terms. See DOI: 10.1039/d1re00541c

Max Mowbray is a Chemical Engineering PhD student. He completed undergraduate study at the University of Birmingham, where he was fortunate to develop perspective of a wide range of research opportunities in healthcare and energy. By the end of undergraduate study, he had cultivated a desire to positively contribute to industrial transformation, which naturally led to the pursuit of further study. Currently, he is undertaking postgraduate research at the University of Manchester, where he focuses on the development of data-driven methods for modelling and optimization of (bio)chemical process systems. His research extends from systems modelling to decision-making under uncertainty.

Mattia Vallerio graduated from Politecnico di Milano in chemical engineering in 2010. Afterwards, he moved to Belgium, where he was awarded a personal grant to pursue his PhD at KU Leuven in multiobjective optimization of (bio)chemical processes. After that he joined BASF Antwerp, first as an APC engineer and then as advanced data analytics lead. In this role he kick-started the industrial data science field within BASF and was at the forefront of the site digital transformation. He recently joined Solvay as an Advanced Process Control specialist. The focus of his work is on control and optimization of chemical processes.


at how to improve the environmental sustainability of their processes by better monitoring (maintenance) as well as yield and energy optimization. This begs the question: what are the machine learning applications that have worked so far in this Industry 4.0 revolution? What are the biggest challenges the industry is facing?

From a historical perspective, after the 1980s and 1990s, a new wave of technological innovations reflected by developments such as expert systems and neural networks promised to revolutionize the industry.8 Recently, applications long marked as 'grand challenges' have observed significant breakthroughs. For example, a solution (AlphaFold) for the task of protein structure prediction was recently proposed at CASP14, which was able to predict test protein structures with 90% accuracy. The solution could potentially provide a basis for future medical breakthroughs.9 Similar breakthroughs have been made in short-term weather prediction.10 Current hardware and telecommunications cost, as well as access to powerful software (either proprietary or open-source), has undoubtedly lowered the barriers to the realization of such advances. However, it is not trivial to balance the value and the cost-complexity of developing a reliable machine learning solution, which can be trusted and maintained in the long term. Thus, are these ML solutions

Carlos Perez-Galvan is an industrial data scientist at Solvay in Belgium. Currently, he focuses on the solution of optimal scheduling and utility network problems using machine learning and process systems engineering methods. His career in fast-moving consumer goods and manufacturing companies has led him to develop practical expertise in the fields of modeling, simulation, optimization and machine learning. During his PhD studies at University College London, Department of Chemical Engineering (CPSE), he tackled the problem of uncertainty in nonlinear dynamic systems. He graduated from Universidad Autonoma de Coahuila in Mexico as a chemical engineer in 2012.

Dr. Dongda Zhang is a University Lecturer at the Department of Chemical Engineering, the University of Manchester, and an Honorary Research Fellow at the Centre for Process Systems Engineering, Imperial College London. He holds a BSc degree (2011) from Tianjin University, an MSc (Distinction) degree (2013) from Imperial College London, and a PhD degree (2016) from the University of Cambridge. He currently leads research in Process Systems Engineering and Machine Learning at the Centre for Process Integration, the University of Manchester. His expertise includes developing industrially-focused data-driven and hybrid models for chemical and biochemical process simulation, optimisation and upscaling.

Antonio Del Rio Chanona is Head of the Optimisation and Machine Learning for Process Systems Engineering group at Imperial College London. He received his MEng from UNAM in Mexico, and his PhD from the University of Cambridge, which received the Danckwerts-Pergamon Prize as the best doctoral thesis of his year. He received an EPSRC fellowship to adopt automation and intelligent technologies into bioprocess scaleup and industrialization, and has received awards from the International Federation of Automatic Control (IFAC) and the Institution of Chemical Engineers (IChemE) in recognition of research in process systems engineering, industrialisation of bioprocesses, and the adoption of intelligent and autonomous learning algorithms in chemical engineering.

Francisco J. Navarro-Brull is an industrial data scientist, Imperial College London visiting researcher, and chemical engineer working at Solvay, a Belgian company providing advanced materials and specialty chemicals worldwide. Francisco has been leading the optimization and trouble-shooting of process engineering problems using machine learning on top of advanced process control. His PhD focused on the modeling and simulation of multiphase-flow sonoreactors, on which he holds two patents. He also visited Prof. Jensen's lab at MIT (USA) and co-created CAChemE.org, an open-source ChemE community based at the University of Alicante (Spain).


really needed in the process industries? Or are we sometimes reinventing the wheel without knowing?

There is a common consensus in the literature4,8,11 that addresses how:
• applying machine learning techniques without the proper process knowledge leads to correlations that can be either obvious or misleading;
• data science training for engineers can be more effective than educating data scientists in engineering topics.

The second point might be surprising, but process engineering principles were based on empirical correlations and rules-of-thumb in the past.4 And yet, the main resources in the literature for machine learning tend to provide examples that are irrelevant to process engineers. The novelty of this review is to explain the fundamentals of machine learning with commonly-known examples in process engineering, followed by a wide range of industrial applications, from simple to state-of-the-art.

2 Machine learning and process systems engineering: the intuition

Given the high cost of generating data during the design and optimization of processes, science and engineering are built on first-principle model equations and statistical methods (e.g. design of experiments with a surface response approach12). In this way, initial designs can be performed with preliminary calculations for sizing, fine-tuned with first-principle simulations, and validated with a minimum number of prototypes and experiments. Contrarily, machine learning assumes having access to a vast amount of data, with enough variability, to capture all the interactions within an empirical model (Fig. 1).

In reality, practical applications of machine learning borrow many of the ideas used in traditional methods, as the assumption of vast and information-rich data usually falls short. For example, the hypothesis when using machine learning is to utilize the abundance of data to avoid overfitting, so that models generalize. However, as with traditional methods, the concept of parsimony, i.e. the common practice to favor simpler models (e.g. regularization and other penalized methods in machine learning), should be adopted. To better understand these similarities, let us revisit the main types of machine learning: supervised, unsupervised, and reinforcement learning.

2.1 Supervised models

If the desired output or target is known (labeled) or measured, the problem is defined as a type of categorical, discrete, or continuous regression. For instance, the estimation of heat- and mass-transfer coefficients during chemical reactor design13 can be seen as a supervised model that predicts the output based on a non-linear continuous fitting (see Fig. 2). Traditional pseudo-empirical correlations reduce the dimensionality of the problem to a few relevant, dimensionless variables. In machine learning, variable selection based on the variability towards the target and feature engineering can achieve the same result. Notice that the dominant physics and range of operating conditions are always given in pseudo-empirical models. The risk of extrapolation errors due to a change of the flow regime, for example, is a problem that limits the application of machine learning as well. In addition, a purely data-driven approach has the risk of overfitting when the data split favors interpolation (e.g. a random split), as these highly non-linear approximation functions can easily capture the noise of the training data set.

The benefits of combining machine learning with physics have proven to improve model accuracy and interpretability.15 In this context, machine learning has also been commonly applied to explain the differences between first-principle models and the real plant or process (a.k.a. discrepancy models).16
Fig. 1 Contrary to the traditional approach, where first-principles models are used, machine learning fits empirical models using experimental data (training data). A proper data split is necessary to introduce the right amount of model complexity and avoid overfitting.

Fig. 2 Examples adapted from the literature where a non-linear model is fitted using a pseudo-empirical approach. Notice how dimensionless numbers (Re, Pr, Nu, etc.) achieve similar results to those techniques in unsupervised machine learning, namely: feature selection, feature engineering, and dimensionality reduction. The risk of extrapolation has always been present in pseudo-empirical correlations, as models are specific to similar systems and operating conditions (the same applies to data distributions in ML). Adapted from ref. 14 with permission.
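To make the link between pseudo-empirical correlations and supervised learning concrete, the sketch below fits a Dittus–Boelter-style correlation, Nu = a·Re^b·Pr^c, by linear regression on log-transformed dimensionless groups (a form of feature engineering), and compares it with a more flexible tree-based regressor trained on the same data. The data are synthetic and the correlation form is only illustrative; it is not taken from the applications reviewed here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic "experimental" data: turbulent heat transfer, Nu ~ 0.023 Re^0.8 Pr^0.4 plus noise
Re = rng.uniform(1e4, 1e5, 200)
Pr = rng.uniform(0.7, 10.0, 200)
Nu = 0.023 * Re**0.8 * Pr**0.4 * rng.lognormal(0.0, 0.05, 200)

# Pseudo-empirical fit: regress log(Nu) on log(Re), log(Pr) -> exponents b, c and prefactor a
X_log = np.column_stack([np.log(Re), np.log(Pr)])
corr = LinearRegression().fit(X_log, np.log(Nu))
b, c = corr.coef_
a = np.exp(corr.intercept_)
print(f"Fitted correlation: Nu = {a:.3f} Re^{b:.2f} Pr^{c:.2f}")

# Purely data-driven alternative on the raw variables (no feature engineering)
ml = RandomForestRegressor(n_estimators=200, random_state=0)
ml.fit(np.column_stack([Re, Pr]), Nu)

# Both interpolate well; only the correlation extrapolates sensibly outside the training range
Re_new, Pr_new = 5e5, 5.0   # beyond the Re range seen during training
print("Correlation prediction :", a * Re_new**b * Pr_new**c)
print("Random forest prediction:", ml.predict([[Re_new, Pr_new]])[0])
```

The last two lines illustrate the extrapolation warning made above: the tree-based model saturates at the edge of its training data, whereas the correlation, which encodes the assumed physics, keeps a physically sensible trend.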


2.2 Unsupervised models

Instead of predicting a label or a measurement, the desired outcome of these models is to identify patterns or groups which previously remained unknown.

The simplest form of an unsupervised model is, for example, a control chart (see Fig. 3). In statistical process control, measurements are categorized into two groups (in-control or out-of-control) by tracking how distant they are from the statistical model. No output is required during the training/fitting, while the information (or dimensionality) is reduced from several samples to a simpler model with two statistics, in the simplest case an average and its standard deviation. Flow maps achieved a similar goal as different fluid-dynamics patterns were discovered and grouped together via the similarity observed during experimentation.17,18 Classical dimensionless numbers (see Fig. 2) normalize inertial, viscous, thermal- and mass-transfer magnitudes. In machine learning terminology, the use of these will be called feature selection (only relevant variables are used), feature engineering (non-linear transformations such as ratios and products are calculated), and dimensionality reduction (a lower number of variables to project the data and make it easier to find patterns). In this regard, data-driven techniques are being used to discover and predict flow patterns (see Fig. 4) in microfluidic applications,19 as well as turbulent and porous flows.20,21

More generally in process engineering, dimensionality reduction naturally occurs with redundancy or an excess of sensors as well. For example, if several thermocouples are used to measure a critical temperature, these can be summarized by taking the average of all the sensor readings. The average is a linear combination of all these terms with equal weight. This way, the information is being reduced to one latent variable, the temperature we want to monitor. If a big variation exists between the average of the sensors and one thermocouple in particular, an alert can be triggered. This reasoning22 is the same as that behind principal component analysis (PCA), and it has been widely used for multivariate process analysis, monitoring, and control.23–26

Fig. 3 Control charts are a form of unsupervised models where only input training data is given and a statistical model is built (mean and standard deviation). Both univariate control charts (a and b) or multivariate charts using principal component analysis (c) can classify data points as in or out of control.

Fig. 4 The discovery of flow maps where different fluid-dynamic regimes were grouped can be seen today as an unsupervised model. Adapted from ref. 17 with permission.
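As a minimal numerical sketch of the reasoning above (simulated signals, not code from the reviewed applications), the snippet below summarizes several redundant thermocouples into one latent temperature by averaging, and flags any sensor that drifts too far from that consensus; replacing the average with the first principal component of the historical readings would give the PCA version of the same idea.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated readings from 4 redundant thermocouples around a setpoint of 80 °C
n_samples, n_sensors = 500, 4
true_temp = 80.0 + 0.5 * np.sin(np.linspace(0, 20, n_samples))
readings = true_temp[:, None] + rng.normal(0.0, 0.2, (n_samples, n_sensors))
readings[400:, 2] += 3.0          # sensor 2 starts drifting after sample 400

# Latent variable: equally weighted linear combination (the plain average)
latent_temp = readings.mean(axis=1)

# Alert when an individual sensor deviates from the consensus beyond 3 sigma
residuals = readings - latent_temp[:, None]
sigma = residuals[:300].std(axis=0)            # historical "in-control" period
alerts = np.abs(residuals) > 3.0 * sigma
print("First alert per sensor:",
      [int(np.argmax(alerts[:, j])) if alerts[:, j].any() else None for j in range(n_sensors)])
```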
2.3 Reinforcement learning

Up to this point, the examples given have assumed there is already data with enough variability for the purpose of estimation (model construction). However, it is often the case that a process will vary in time depending on the dynamics and control strategy implemented. For example, a PID controller is a feedback control loop that does not require any data (or model) to start (of course, the control performance will be very poor without a proper tuning of the parameters). Reinforcement learning (RL) is a type of machine learning method applied in the context of sequential decision-making under uncertainty (e.g. process control and optimization). As with the PID controller, where the objective is to minimize the present, past, and immediate future error between the setpoint and the process variable, RL requires the definition of a reward function. Tuning the PID parameters can be done by trial and error through a combination of the user and the controller (policy), or by various tuning methods and heuristics. In RL (Fig. 5), a similar process of controller tuning is conducted through either simulation of an approximate process model, or from process data, by a set of methods known broadly as generalized policy iteration. Other heuristic approaches such as apprenticeship27,28 and transfer learning29 may also be used to identify the tuned controller.

Fig. 5 Reinforcement learning can be described30 as a method to tune, enhance or substitute traditional control systems.


Being a data-driven approach, RL provides more flexibility to learn non-linear, non-deterministic, more complex, and multiple input and output behaviors. The similarities between RL and advanced process control, such as model predictive control or iterative learning control, have been covered in the literature.29,31,32 Despite its potential, RL is not exempt from open challenges, including guaranteeing constraints, interpretability, and safety of operations. This is covered in more detail later in this review.
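To make the analogy concrete, the toy sketch below (not taken from the review or from any cited work) tunes a single proportional gain on a simulated first-order process by repeatedly rolling out the closed loop and keeping the gain with the highest reward, where the reward is the negative sum of squared setpoint errors. Real RL methods (e.g. generalized policy iteration) are far more sophisticated, but the ingredients are the same: a policy, a reward, and improvement through simulated experience.

```python
import numpy as np

rng = np.random.default_rng(2)

def rollout(kp, setpoint=1.0, n_steps=100, dt=0.1, tau=1.0, noise=0.01):
    """Simulate a noisy first-order process under a proportional controller
    and return the reward (negative sum of squared tracking errors)."""
    y, reward = 0.0, 0.0
    for _ in range(n_steps):
        error = setpoint - y
        u = kp * error                      # policy: proportional control action
        y += dt / tau * (-y + u) + noise * rng.normal()
        reward -= error ** 2                # reward to be maximized
    return reward

# Crude policy improvement: random search over the gain, keep the best
best_kp, best_reward = 1.0, rollout(1.0)
for _ in range(200):
    kp = abs(best_kp + rng.normal(0.0, 0.5))   # perturb the current policy
    r = rollout(kp)
    if r > best_reward:
        best_kp, best_reward = kp, r

print(f"Tuned gain Kp = {best_kp:.2f}, reward = {best_reward:.2f}")
```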

3 Industrial applications in manufacturing

Oil and gas, chemical, and manufacturing industries store instrumentation and control data in what are known as operational historians. These time-series databases and their corresponding software collect, historize and utilize the streaming data from each sensor and actuator, commonly known as a 'tag' after the physical tags used to identify them at the plant level. Operational historians are usually at level 3 in the hierarchical view of automation infrastructures33 for ANSI/ISA-95 (see Fig. 6). Sensors and actuators in the field are operationalized by programmable logic controllers (PLC) and/or distributed control systems (DCS). Supervisory control and data acquisition (SCADA) software is often complemented with manufacturing execution systems (MES) that historize this operational data. Enterprise resource planning (ERP) data drives transactions and decisions that occur at a slower response time (months to years).33 Machine learning takes advantage of this vast amount of historical data for the following industrial applications: condition or predictive monitoring, quality prediction, process control and optimization, and scheduling.6 Before implementation and industrialization, a diagnostic study is often conducted (see Fig. 7 and 8), utilizing ML to accelerate the understanding and discovery of the root cause, which perhaps does not need a complex solution to be corrected.

Fig. 6 Simplified hierarchical view of automation infrastructures in the standard ANSI/ISA-95.

Fig. 7 Industrial data science workflow based on the IBM cross-industry standard process for data mining (CRISP-DM).

3.1 Process understanding

In any process or control-related problem, there will be a certain lack of information or wrong assumptions despite the amount of data stored or knowledge available. During the first phase, which can be called diagnostics, it is usually common to iterate through several data and modeling steps until the problem and potential solution are better understood. Diagnostics correspond to the beginning of any industrial application (see Fig. 7). Industrial data science can accelerate the process of discriminating which tags (sensors) can help explain the problem while capturing nonlinearities via data-driven modeling techniques (see Fig. 9). The general idea is always to perform simpler, more interpretable, tree-based models for screening, followed by more complex modeling techniques such as neural networks. Partition models (also known as decision trees) are common for screening, as they can handle tags with different units, the presence of missing values, and outliers while uncovering non-linear relationships. Tree-based models create simple if-then logic via data partitions that can better explain the target. As the model grows in complexity, a better fit is obtained (i.e. a higher number of splits or depth in the tree). A bootstrap forest (also known as a random forest34) consists of several of these trees that are generated by sampling the dataset (a subset of tags and timestamps). Combining the average of the models, a more exhaustive list of potential tags (features) is obtained and ranked according to their feature importance (see Fig. 9a). However, noise within the data can also be captured. Random numbers with several types of distributions (e.g. normal, uniform…) or the target time-shuffled can be intentionally added as additional predictors. This technique35,36 is used as a cut-off and allows a better separation between signal and noise, as well as the creation of simple tree models (Fig. 9b). Once the data set is better understood and prepared, neural networks (Fig. 9c) are used to capture higher-order non-linear interactions among tags.




Fig. 8 Classification of industrial data applications where offline analysis is commonly conducted to diagnose the problem being addressed, with the solution later implemented online.

Fig. 9 Common modeling steps using an industrial data set with hundreds of tags and a well-defined target (e.g. yield of the process). First, a screening of variables and selection of tags (sensors) using a random forest (a). Many tags will end up being weakly correlated to the target, perhaps trying to explain its noise. By adding known noise as an additional tag(s), the selection of tags with a certain contribution is facilitated. Then, a decision tree is used to obtain a robust, non-linear but interpretable model (b). And finally, neural networks (c) are used once data is cleaned and better understood to capture all the non-linearities present in the data.
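A minimal version of the screening step sketched in Fig. 9a and b can be written with scikit-learn as below. The column names, the synthetic data and the choice of cut-off are illustrative assumptions, not taken from the industrial case studies in this review: random noise columns are appended to the tag matrix, a random forest ranks all features, and only tags that rank above the best noise column are kept for a simple decision tree.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)

# Illustrative data: 300 timestamps, 8 process tags, a yield driven by only two of them
tags = pd.DataFrame(rng.normal(size=(300, 8)),
                    columns=[f"tag_{i}" for i in range(8)])
yield_ = 2.0 * tags["tag_0"] - 1.5 * tags["tag_3"] ** 2 + rng.normal(0.0, 0.3, 300)

# Add synthetic noise columns to act as a cut-off for feature importance
tags["noise_normal"] = rng.normal(size=300)
tags["noise_uniform"] = rng.uniform(size=300)
tags["yield_shuffled"] = rng.permutation(yield_.to_numpy())

forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(tags, yield_)
importance = pd.Series(forest.feature_importances_, index=tags.columns).sort_values(ascending=False)
cutoff = importance[["noise_normal", "noise_uniform", "yield_shuffled"]].max()
selected = importance[importance > cutoff].index.tolist()
print("Tags above the noise cut-off:", selected)

# Simple, interpretable follow-up model on the selected tags only
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(tags[selected], yield_)
print("Decision tree R^2:", round(tree.score(tags[selected], yield_), 2))
```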

To better illustrate common techniques in this iterative workflow, an example is provided in the annex, adapting the open-source column distillation data set [Kevin Dunn, learnche.org (CC BY-SA)]. The analysis has been obtained with commercial software [JMP Pro (SAS Institute Inc.)], while all the methods are accessible via open-source libraries [scikit-learn37] as well. The working principle of these modeling techniques needs to be understood to avoid common mistakes when dealing with time-series data. For example:
• Interpreting the contribution of the predictors as important towards the process design or process control. For example, the design of a reactor impeller might be critical in explaining the average quality of a product. However, if the impeller is not changed in operation, from a machine learning perspective it is not important at all. Conversely, if the current consumed by the motor was changing due to an increase/decrease of viscosity, then the current can appear as a predictor.
• Similarly, without considering the process knowledge and process dynamics, it is likely to confuse correlated effects that can be consequences instead of causes. In this regard, it is common to find measured disturbances or manipulated variables higher in the contribution. With chemical processes designed to keep critical process variables under control, inexperienced analysts will fail to interpret supervised and unsupervised analysis based on variability (e.g. the cooling flow rate in a jacketed reactor is more important than the reactor temperature itself, which is always constant).
• Not managing outliers, shutdowns, and other singularities in the data. As explained above, tree-based models are robust techniques for screening predictors as they partition the data independently from its distribution. Yet, the predictors will try to explain the major sources of variability, which might be meaningless (e.g. shutdowns can be explained with pump current).


The use of robust statistics, for example medians or interquartile ranges instead of averages and standard deviations, is a simple way to filter singular data events. However, outliers might carry crucial information as well (e.g. why the yield dropped at those specific timestamps). In this regard, gradient boosted trees are an alternative as they increase the importance of those points that could not be explained with prior models (see section 3.3.1 for more discussion).
• By default, in most common algorithms, data samples are assumed to be independent of each other. This assumption can be true if each sample contains information from batch-to-batch or during steady-state conditions. In the majority of cases, data pre-processing will be required to remove periods where time delays, dead-times, lags, and other process dynamics perturbations affect the target temporarily.38 Section 3.4 will describe the applications of machine learning for dynamic systems and process control. In any case, a proper time-split of the dataset between training/validation/test is needed to decrease the risk of models that were useful in the past only (they only learned how to interpolate the data).

3.1.1 Model interpretation and explainable AI. During diagnostics, machine learning models are primarily used as screening tools to identify which inputs (tags) are affecting the target of interest. For example, support vector machines (SVM) can also be used to improve process operations similarly to decision trees.39,40 Pragmatically, several models with their tuning parameters can be fitted (known as autoML).41,42 What is still relevant is: what question to ask the data, how to avoid over-fitting, and the use of explainable AI43 (data-driven techniques to interpret what more complex ML models are able to capture; see Fig. 10 as an example). For example, resampling inputs while maintaining their distribution (a.k.a. shuffling) will have a measurable impact on the prediction results. Given the non-linear interactions in the model, the interpretation of multidimensional local perturbations requires high-order polynomials,44 or even tree-based models can be used to approximate the response of a more complex model. The latter approach, known as TreeSHAP (SHapley Additive exPlanations), has gained popularity in the ML community and is starting to be applied in manufacturing environments.45–48

Fig. 10 The more traditional parallel coordinates plot (a) provides a multivariate data visualization of the distribution for different tags, which can be ordered by contribution to the model and colored by the target. In this example, pressure should be kept constant to achieve higher targeted yields (yellow vs. blue color). A machine learning model can be used to approximate and visualize the conditional relationship between yield and a given predictor (b). SHAP values (c) combine the visualization of the direction of interest (higher or lower values of the inputs, in blue to magenta) with their effect on the target — for instance, the small impact of the synthetic noise parameters (slope and SHAP value of shuffled yield in b and c, respectively).
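The 'shuffling' idea mentioned above corresponds to permutation importance, which scikit-learn exposes directly; the snippet below is a generic sketch with placeholder data and column names (flow, temperature, pressure, noise), not the analysis behind Fig. 10: each input is shuffled in turn while the others are left untouched, and the drop in validation score measures how much the model relies on that tag.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)

# Placeholder data: two informative tags, one weak tag, one pure-noise tag, and a target
X = pd.DataFrame(rng.normal(size=(400, 4)), columns=["flow", "temperature", "pressure", "noise"])
y = 1.5 * X["flow"] - 0.8 * X["temperature"] + rng.normal(0.0, 0.2, 400)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, shuffle=False, test_size=0.3)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)

# Shuffle each column several times and record the decrease in R^2 on held-out data
result = permutation_importance(model, X_valid, y_valid, n_repeats=20, random_state=0)
for name, mean_drop in sorted(zip(X.columns, result.importances_mean), key=lambda item: -item[1]):
    print(f"{name:12s} mean drop in R^2 = {mean_drop:.3f}")
```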

3.2 Condition monitoring and digital twins

Often marketed as predictive maintenance, the goal is to keep critical assets working as long as possible by anticipating the need for repairs (reliability increase and minimization of unplanned stops). If the assets are operated until failure and the time-to-event is recorded, lifetime distributions and survival analysis can be used for prediction instead. However, the limitation when trying to apply this approach is that, fortunately, these critical assets are designed and maintained to avoid downtime failures. Therefore, a more reasonable objective is what is called anomaly detection or condition monitoring, which promotes the early discovery or warning of uncommon operations. Three main methods exist.

3.2.1 Data-driven approach: statistical or machine learning. Instead of tracking time-series data independently in control charts, a common step is to monitor correlated variables. What is important in this approach is to have robust dimensionality reduction, clustering, and regression methods in order to deal with potential outliers and nonlinearities that are commonly found in the data sets (e.g. planned shutdowns).

Dimensionality reduction techniques such as PCA, or PLS in the case of regression, have been widely used for multivariate process analysis, monitoring, and control.23–26 Similarly, in machine learning the basic idea is to create a model with historical data—which is assumed to represent normal operation—so that an alert or anomaly will be triggered when something previously unseen happens. These models that learn the usual behavior of the asset are often marketed as digital twins, which, if accurate enough, can later be used for process optimization as well. From univariate control charts to parallel coordinate plots (see Fig. 10a), current technology is able to provide these visualizations in interactive dashboards which can be updated regularly or in real time.
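A bare-bones version of this idea — fit PCA on a period assumed to represent normal operation, then score new data with Hotelling's T² and the reconstruction (squared prediction) error — can be sketched with scikit-learn as below. The synthetic data, the number of components, and the 99th-percentile control limits are illustrative choices, not those used in the Tennessee Eastman example of Fig. 11.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)

# Synthetic historical data: 10 correlated tags under normal operation
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 10))
X_normal = latent @ mixing + 0.1 * rng.normal(size=(1000, 10))

scaler = StandardScaler().fit(X_normal)
pca = PCA(n_components=2).fit(scaler.transform(X_normal))

def monitoring_stats(X):
    """Hotelling's T^2 on the retained components and squared reconstruction error (SPE)."""
    Z = scaler.transform(X)
    scores = pca.transform(Z)
    t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)
    spe = np.sum((Z - pca.inverse_transform(scores))**2, axis=1)
    return t2, spe

# Control limits from the (assumed normal) training period
t2_ref, spe_ref = monitoring_stats(X_normal)
t2_lim, spe_lim = np.percentile(t2_ref, 99), np.percentile(spe_ref, 99)

# New data with a step change in one tag, akin to a feed-flow disturbance
X_new = latent[:100] @ mixing + 0.1 * rng.normal(size=(100, 10))
X_new[50:, 3] += 2.0
t2_new, spe_new = monitoring_stats(X_new)
print("Samples flagged:", np.where((t2_new > t2_lim) | (spe_new > spe_lim))[0])
```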



Fig. 11 A transition between two steady-state regimes for the Tennessee Eastman process (simulated data49) is detected using PCA. If the model is built using historical data before the perturbation (a), the step changes in the feed flow of chemical A (b and c) are found in the current dataset for the points highlighted in blue. If all of the historical data is used to build the model (d), the contribution of recent data points in blue (e) shows signals close to random noise. The plant-wide control in the simulation stabilizes the control loops and anomalies are only seen in the transition period, even though the plant is operating in a different state for chemical A.

Although classical statistical process control methods are out of the scope of this work, they should not be disregarded as a powerful way to provide descriptive statistics that can ease day-to-day decision-making in operations with little technological effort.

For example, in Fig. 11 diagnostic plots for the PCA-based multivariate control chart identify a large step change in the flow of a reactant into the reactor. This affects many variables across the Tennessee Eastman process plant, which are brought back to their original control limits, with the exception of the chemical A feed flow variable, where the step change was introduced (details can be found in ref. 49 and 50).

The addition of machine learning analysis using, for example, recent dimensionality reduction techniques adds another layer of powerful visualizations that can enhance monitoring activities. The reader is referred to Joswiak et al.,51 who recently published examples visualizing industrial chemical processes both with classical approaches (PCA and PLS) and with more recent and powerful techniques in machine learning (UMAP52 and HDBSCAN,53,54 particularly). The main advantage of these state-of-the-art techniques is the better separation (dimensionality reduction) and classification (clustering) of events when dealing with non-stationary multivariate processes (see Fig. 12). However, if processes are under control, PCA/PLS-based techniques provide faster, less complex, and more interpretable insights (e.g. understanding variable contributions for linearized systems). Isolation forests have also been explored in order to detect and explain sources of anomalies in industrial datasets.55

Fig. 12 A transition between two steady-state regimes for the Tennessee Eastman process (simulated data49) visualized with (a) PCA and (b) UMAP.52 UMAP is able to better reduce the number of tags into two dimensions. The reader is referred to ref. 51.

Autoencoders are a type of neural network (see Fig. 13) where the aim is to learn a compressed latent representation of the input in the hidden layers. The amount of information that these latent dimensions express is maximized by trying to recover the information given (notice that the inputs and outputs of the neural network are the same in Fig. 13a). By restricting the neural network to a reduced number of intermediate nodes (i.e. latent dimensions), intrinsic and not necessarily linear correlations are found in order to minimize the prediction error (Fig. 13b). This way, the variability and contribution in noisy inputs will only appear if a higher number of nodes is used (similar to having a higher number of principal components). Reducing the number of redundant sensors to look at while capturing the system dynamics is a necessary step for realistic industrial data
applications,56–58 a topic we will cover in more detail later in this manuscript.

Fig. 13 Similar to PCA, autoencoders are neural networks (a) that reduce the dimensionality of the data by restricting the number of nodes in the middle layers. The transition between two feeding steady-state regimes for a Tennessee Eastman process (simulated data49,50) is captured (b) while noisy and redundant measures are discarded.
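As a minimal sketch of this idea (using Keras as one possible toolkit; the layer sizes, the synthetic data and the alarm threshold are illustrative assumptions, not the configuration behind Fig. 13), an autoencoder is trained to reproduce its own inputs on data from normal operation, and the reconstruction error is then used as an anomaly score on new data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

rng = np.random.default_rng(6)

# Synthetic "normal operation" data: 12 tags driven by 3 underlying latent factors
latent = rng.normal(size=(2000, 3))
X_normal = latent @ rng.normal(size=(3, 12)) + 0.05 * rng.normal(size=(2000, 12))

# Autoencoder: inputs and outputs are the same; the bottleneck forces a compressed representation
autoencoder = keras.Sequential([
    keras.Input(shape=(12,)),
    layers.Dense(8, activation="tanh"),
    layers.Dense(3, activation="tanh"),   # latent dimensions (bottleneck)
    layers.Dense(8, activation="tanh"),
    layers.Dense(12, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_normal, X_normal, epochs=50, batch_size=64, verbose=0)

def anomaly_score(X):
    """Mean squared reconstruction error per sample."""
    return np.mean((X - autoencoder.predict(X, verbose=0)) ** 2, axis=1)

threshold = np.percentile(anomaly_score(X_normal), 99)
X_new = latent[:200] @ rng.normal(size=(3, 12))   # new data with a different correlation structure
print("Fraction of new samples flagged:", np.mean(anomaly_score(X_new) > threshold))
```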
One important use of anomaly detection is to minimize the risk of extrapolation in a regression model. This is a common problem if the model is to be utilized for simulation or optimization, where the combination of input values may not be physically realizable. One approach, shown in Fig. 14, is to use a regularized Hotelling T2, which can be used to find data-driven optimal values without the risk of extrapolation.59,60 First-principles, energy and mass balances can be used as additional restrictions in this regard. Finally, generative adversarial networks (GANs) represent the most recent development in the field of data-driven anomaly detection.61 GANs emerge from research in computer vision and image recognition, where two competing neural networks are pitted against each other. The first network, the generator (G), has the objective of capturing the distribution of the input dataset (in our case, process data) by identifying relevant features and generating new synthetic data, while the other network, referred to as the discriminator (D), has the task of correctly labelling the presented data (i.e. original vs. generated) based on the data generated by the generator. A schematic representation of this approach for time-series data can be seen in Fig. 15. See ref. 62 and 63 and references therein for early applications of GANs to time-series data.

Fig. 14 Scatterplot matrix showing historical data of a manufacturing process where the optimal prediction point is shown without extrapolation control (red) and with extrapolation control (green). A boosted neural network was previously trained to predict failure and quality. The reader is referred to ref. 60 for a detailed discussion.

3.2.2 Model-driven approach. Traditionally, KPIs of critical assets are monitored by tracking their efficiency or throughput via energy or mass balances (see Fig. 16a). In machine learning terminology this is covered by the feature engineering step, which can be implemented using templates for specific assets. A frequency analysis of rotary equipment can be seen as another kind of model-based approach, as it provides fingerprints that are connected to the performance of rotary machines (see Fig. 16b).

3.2.3 Network analysis approach. Process data contains sensor deviations and errors, known or unknown changes in operating modes or shutdowns, etc., which make the task of maintaining models online very challenging. Contextualizing information is crucial to minimize the number of false alerts and to increase the use of these tools for root-cause analysis.64–68 An anomaly that propagates and diverges through the process causes a higher-priority set of alarms than those created by unusual operations. Graph analysis can be used in this regard69–71 to include the topology of the plant and the relations among operating units (see Fig. 17). This approach can cover the entire plant for anomalous operations and reduce the number of false positives. This is a similar line of thinking to the use of knowledge graphs for complex analyses, which are able to provide an integrated view of macro-, meso- and microscale processes.72

3.3 Quality predictive models and inferential (or soft) sensors

In industry, quality measurements and KPIs are often manually sampled and then analyzed in the lab. Machine learning models can find process variables that correlate with such measurements, where both causes and consequences can be used to obtain an online estimation. Commonly known as inferential sensors or software sensors, one can also describe these models as semi-supervised learning, since the majority of process data does not contain the target (label) to predict in the first place.

In these types of applications, a common mistake is to rapidly discard consequences from the predictor list. For example, when analyzing the quality of a granular product (good if particles are a certain size or bad if particles are smaller), one can easily find that the pressure drop in a downstream filter appears as a predictor. While this is not the root cause of bad quality but rather a clear consequence, it can still be used for an online estimation, increasing the amount of data available for analysis beyond what lab analysis alone provides.



Fig. 15 Simplified schematic representation of the training (a) and use (b) of generative adversarial networks (GANs) for anomaly detection on time-series data. The generator (G) and discriminator (D) models are trained through iterations based on the performance feedback of the D model. Both models compete until satisfactory performance is achieved. Then, the D model can be used as an online classifier for anomaly detection (bottom scheme).

Fig. 16 Compressor characteristic (a) and spectrogram (b) are two traditional approaches to detect inefficient or anomalous operating modes. These calculations can be considered feature engineering to be combined with statistical or machine learning methods.

There is a famous machine learning problem where an algorithm mistakenly learned to classify images of huskies vs. wolves based on the snow in the background.73 As with the snow, consequences are often stronger or simpler predictors than other features that process experts were listing as root causes only. For this reason, soft sensor models need to be approached separately, as their main objective is only to provide online estimation and monitoring of quality, yield, and lab measurements.

As with other online sensors (e.g. NIR, near-infrared sensors), soft sensors require calibration and maintenance to ensure acceptable levels of accuracy and precision. In that regard, several techniques exist to handle prior knowledge or the lack of it (this being a form of uncertainty). An industrial example that illustrates the challenges when building soft sensors for continuous processes can be found in ref. 74. Its analysis (as detailed in ref. 75) combines data preparation, anomaly detection, multivariate regression and model interpretability, as discussed so far in this manuscript.

In this section, we will focus our discussion on estimating quality or yield for batch processes, which represents an additional challenge from the data analytics perspective.

3.3.1 Discrepancy models and boosting. Consider that in a production process, it is often desired to infer the end-quality of the product. For example, in ref. 76 the authors discuss the merits of monitoring melt viscosity, temperature profile, and flow index as indicators of product quality in the context of polymer processing. As a result, soft sensors may be constructed to infer these qualities from other available process measurements (such as screw speed, die melt temperature, feed rates, and pressures) either via first-principles, data-driven or hybrid modeling approaches.

Hybrid or grey-box models are commonly known in the literature.77–79 A combination of data-driven models with first-principles models can remove variability or capture unknown mechanisms, e.g. discrepancy models.16 For example, if a heat or mass balance can foresee issues in quality or productivity, predictors that are part of these terms will be immediately found. Simply removing them from the input list will not change the variability in the target, so a better approach is to focus on explaining the residuals. For example, if an oscillation in the yield is found to be correlated to the seasons due to better/worse cooling in winter/summer, it will be better to remove such an effect from the target (not from the list of inputs) and refocus the analysis on the remaining and unexplained variability. This is what boosted tree models achieve in machine learning (see Fig. 18), and the same approach can be used with neural networks, as mentioned in Fig. 14.
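A hedged sketch of this residual-focused (discrepancy) approach is given below: a simple first-principles-style prediction explains the bulk of the target, and a gradient boosted tree is then fitted only to what that model leaves unexplained. The "physics" here is a made-up linear heat-balance term and the data are synthetic; the point is only the structure of prior model plus ML on the residuals.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)

# Synthetic process data: a dominant (known) effect plus a weaker non-linear one
flow = rng.uniform(0.5, 2.0, 600)
coolant_temp = rng.uniform(10.0, 35.0, 600)         # seasonal variation
catalyst_age = rng.uniform(0.0, 1.0, 600)
yield_ = 80.0 - 0.6 * coolant_temp + 5.0 * flow - 8.0 * catalyst_age**2 + rng.normal(0.0, 0.5, 600)

# Step 1: first-principles / heat-balance style prediction (assumed known coefficients)
physics_pred = 80.0 - 0.6 * coolant_temp + 5.0 * flow

# Step 2: fit the discrepancy (residuals) with a boosted tree on the remaining inputs
residuals = yield_ - physics_pred
X = np.column_stack([flow, coolant_temp, catalyst_age])
discrepancy = GradientBoostingRegressor(random_state=0).fit(X, residuals)

# Hybrid prediction = physics + learned discrepancy
hybrid_pred = physics_pred + discrepancy.predict(X)
print("RMSE physics only :", round(np.sqrt(np.mean((yield_ - physics_pred) ** 2)), 3))
print("RMSE hybrid model :", round(np.sqrt(np.mean((yield_ - hybrid_pred) ** 2)), 3))
```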


Fig. 17 Process variables from a plant (a) contextualized using directed graphs (b) to reduce the number of false alerts and infer causality. Adapted with permission from ref. 70.

Fig. 18 By subsequently fitting the residuals of smaller trees, boosted trees can be used as discrepancy models where the first layers (a and b) capture the major variability within the data. Weaker but perhaps more interesting predictors can be identified by examining deeper layers (c). Following the example used earlier, the first two layers are able to identify major drivers separately: a) flow and temperature; b) pressure stability.

3.3.2 Batch-to-batch or iterative models. Utilizing all the data from batch processes represents a challenge, as the model output can be one measurement (e.g. predicting quality) while model inputs range from raw material properties to initial and evolving conditions that are or were changing during the batch. Different approaches on how to effectively reduce this apparent excess of data (dimensionality) while maintaining the information to understand, detect anomalies or use predictions for control can be found in ref. 24, 25 and 80.

A common approach is to summarize each batch using statistics and process knowledge (e.g. the peak temperature or its average rate of change during the reaction phase). In the literature, these are known as landmark points or fingerprints (see Fig. 19), but this usually assumes we know what the important features to generate are. Generalizing this approach, one can calculate common statistics (average, max, min, range, std, first, last, or their robust equivalents) for every sensor during every phase, for every batch and grade. In auto-machine learning, this is known as feature engineering, and a final feature selection is then made using only the best predictors.

Fig. 19 Model inputs for batch processes can be generated by summarizing the information, which is known as landmark points in the literature. Here, the maximum temperature reached during fermentation can be found to be correlated to the quality of the batch.
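The "statistics for every sensor during every phase, for every batch" step maps naturally onto a grouped aggregation; the sketch below uses pandas with made-up column names (batch_id, phase, temperature, pressure) purely to illustrate the feature-engineering pattern, not any data set from the reviewed applications.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)

# Illustrative long-format batch data: one row per timestamp, per batch
records = pd.DataFrame({
    "batch_id": np.repeat([f"B{i:03d}" for i in range(20)], 60),
    "phase": np.tile(np.repeat(["heating", "reaction", "cooling"], 20), 20),
    "temperature": rng.normal(80, 5, 1200),
    "pressure": rng.normal(2.0, 0.1, 1200),
})

# One summary row per batch: statistics of every sensor in every phase
features = (records
            .groupby(["batch_id", "phase"])[["temperature", "pressure"]]
            .agg(["mean", "max", "min", "std", "first", "last"])
            .unstack("phase"))
features.columns = ["_".join(col) for col in features.columns]   # flatten the column index
print(features.shape)                                # (batches, sensors x statistics x phases)
print(features.filter(like="temperature_max").head())
```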


Instead of trying to summarize the information in statistical calculations that can aggregate and dilute important information, functional principal components analysis (FPCA) is a data-driven method to capture the variation between a set of "functions", such as the profiles of temperature versus time for a set of batches. With FPCA, the functions are decomposed into the mean function and a series of "eigenfunctions" or functional principal components (FPCs). Each original function can be reconstituted as a combination of the mean function plus some amount of each functional principal component. The first step is to turn the semi-continuous data of the sensor value at each timepoint for each batch into a continuous function. This is done by fitting smoothing models, such as splines, to create continuous functions. This means it is possible to use both dense (observations are on the same equally spaced grid of time points for all batches) and sparse (batches have different numbers of observations and are unequally spaced over time) functional data. Then a functional principal components analysis is carried out. FPCA is analogous to standard PCA in that it seeks to reduce the data into a smaller number of components describing as much information in the data as possible. FPCA finds a set of component functions that explain the maximal amount of variation in the observed functions. These component functions can usually be interpreted as distinctive features that are seen in the process for some batches (see Fig. 20); for example, a temperature "spike" at a certain point in the process, or a "shoulder" in the cool-down part of the process. Finally, the results from the FPCA, especially the FPC scores, are saved and used as features for further analysis.


Fig. 20 Functional PCA summarizes the batch information into new coordinate variables that capture the variability seen during the batch. In the image, batch curves can be described again using a combination of components 1 and 2.

The FPC scores can be thought of as the "amount" of each characteristic functional component that there is in each function (batch).

FPCA requires the alignment of batches to remove variability in the time axis. Some reaction phases can take longer due to different kinetics or simply waiting times due to scheduling decisions. On some occasions, using conversion instead of time will automatically align the batches. When this information or other variables such as automation triggers81 are not measured or are unknown, dynamic time warping (DTW) techniques can be used to statistically align the batch trajectories (Fig. 21).80,82,282 DTW can also be used to classify anomalous batches and to identify correlating parameters (Fig. 22).82–86

Fig. 21 Alignment of several batches using the temperature profile and dynamic time warping (a: before and b: after the alignment).
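FPCA itself needs dedicated tooling, but because it is analogous to standard PCA, a rough stand-in can be sketched with numpy and scikit-learn: each (already aligned) batch temperature profile is interpolated onto a common time grid and PCA is applied to the matrix of profiles, so that each batch is summarized by a few component scores. The synthetic profiles below are only for illustration; a proper functional data analysis or DTW library would be used in practice.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(9)
common_grid = np.linspace(0.0, 1.0, 50)        # normalized batch time

profiles = []
for _ in range(30):                            # 30 batches with slightly different sampling
    t = np.sort(rng.uniform(0.0, 1.0, rng.integers(40, 80)))
    spike = rng.normal(1.0, 0.5)               # batch-to-batch variation: size of a temperature spike
    temp = 70 + 20 * np.sin(np.pi * t) + spike * np.exp(-((t - 0.6) / 0.05) ** 2)
    profiles.append(np.interp(common_grid, t, temp))   # resample onto the common grid

X = np.asarray(profiles)                       # one row per batch, one column per grid point
pca = PCA(n_components=2).fit(X)

scores = pca.transform(X)                      # "FPC-like" scores: 2 numbers summarizing each batch
print("Explained variance ratio:", np.round(pca.explained_variance_ratio_, 2))
print("Scores of first three batches:\n", np.round(scores[:3], 2))
# pca.components_ play the role of the component functions (e.g. the spike shape around t = 0.6)
```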
3.3.2.1 Iterative learning control. Generally, the model construction process and estimation of uncertainty are subject to a finite amount of data, which can lead to over- or under-estimation. Sampling and bootstrap techniques (see the next section) can be used to handle such a scenario, and this is often useful in the empirical estimation of the underlying distribution of the data. Various iterative-learning (control) methods also exist that help to adapt model estimates (or control inputs) when the model is used to predict the ongoing process.87,88 The inference of these batch properties can be used to inform process operation as well as optimization and control.89

3.3.3 Uncertainty. As demonstrated, data-driven models allow process engineers to screen and identify correlated or anomalous tags. However, the construction of a model is naturally subject to sources of uncertainty that can change over time. Despite the sources of uncertainty, we are often able to construct models that capture the underlying physics of the process in the domain of interest. For example, ref. 76 reports many examples of data-driven and first-principles models, in the context of polymer processing, that are able to successfully predict the desired property (e.g. melt viscosity, temperature profile, and flow index). More widely, this is primarily due to well-established statistical practices, as encompassed by data reconciliation and validation approaches,90,91 model selection and validation tools,92 data assimilation practice,93,94 and the field of estimation theory (which is generally concerned with identifying models of systems from data).95,96

In the following, we discuss data-driven techniques to briefly illustrate a general approach to reduce redundant tags with similar effect size, quantify the historical variability or uncertainty, and provide insight into possible future process conditions.

3.3.3.1 Effect size, variable, and model selection. Data-driven models are, by definition, determined by the selection of inputs and outputs. In the previous section, synthetic noise inputs were intentionally used as additional variables to find and remove those tags which showed a similar contribution towards the target.35,36 The idea behind this is that the model starts using noise as a predictor once overfitting has been reached. Another similar approach, known as dropout,97 consists in removing model parameters during training, which will also take care of redundant sensors that will appear as co-linear factors in screening models.




Fig. 22 A comparison between the (original) batch time vs. the dynamically warped time shows the rate at which the batches are progressing relative to the reference (batch ID 1). In this illustrative example, batches are getting shorter, so the rate is always positive.

Alternatively, one can fit predictive models by penalizing the weights (if the model is parametric) of pre-selected predictors, as well as the weights of their interactions with other variables (e.g. as expressed in high-order polynomials). In machine learning and statistical estimation, this penalization is also called model regularization. Two of the best-known methods of model regularization are Lasso regression,98 where the sum of the absolute values of the weights (known as the L1 norm) is penalized, and ridge regression,99 which penalizes the sum of the squares of all elements of the weight vector (known as the L2 norm) (Fig. 23). Other penalization formulas using a variety of norms or their combinations also exist (e.g. the elastic net100).

Fig. 23 Regularization is a technique that avoids over-fitting and co-linearity by penalizing a higher number and magnitude of regression terms. Ridge regression (left) penalizes the roots of squared magnitudes but is unable to remove irrelevant terms (e.g. noise) as it assumes variable selection has been done already. On the contrary, Lasso (right) minimizes absolute values, being able to shrink irrelevant (e.g. noise) coefficients to zero. The red arrow line indicates the penalization parameter, increasing towards the right.
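As a small scikit-learn sketch of the difference (synthetic data; the penalty strengths are arbitrary illustrative values): with a few informative inputs and several pure-noise inputs, Lasso drives the noise coefficients exactly to zero while ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(10)

# 5 informative tags + 5 pure-noise tags
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0.8, -1.2, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(0.0, 0.5, 200)

lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty

print("Lasso coefficients:", np.round(lasso.coef_, 2))   # noise terms shrunk exactly to zero
print("Ridge coefficients:", np.round(ridge.coef_, 2))   # noise terms small but non-zero
```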
Despite the screening methods discussed, which focus on identifying inputs with a high correlation to the outputs, the selection of the model class and the associated hyperparameters also provides a basis for the identification of a strong predictive soft sensor. Current trends encompassed by AutoML41 try to automate both the identification of features and the selection of models, including their hyperparameter tuning. However, these frameworks are often associated with a high computational expense, with further bottlenecks given by what metric to assess and how to partition the available data. Ultimately, several optimized models need to be interpreted and verified by a domain expert (process systems engineers, in this case).

3.3.3.2 Variability in process data. Process variables (flows, pressures, etc.) are likely to observe some form of variation. This may arise from the presence of unquantified disturbances, sub-optimal control, variability in an upstream process, imperfect system measurement, etc. Assuming process variables are random variables distributed according to a distribution of choice (this can also be estimated), computational simulations (known as Monte Carlo simulation) can provide a hypothesis about the resultant effects of their variation on end-product quality. The analysis can help determine the variables with the strongest correlation to end-quality variation, which may ultimately guide process operation. This is shown in Fig. 24.

Fig. 24 Propagation of input–output uncertainties. The lack of control in pressure is simulated by including a random normal distribution in the predictive model input, generating a distribution of yield (output).

Fig. 25 Uncertainty can be estimated by resampling the data points and then analyzing the distributions of the models obtained. Here, a residence time distribution curve is generated by constructing different models, subsampling the data and randomly changing the importance of each point (weighted bootstrap).
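A minimal Monte Carlo propagation in the spirit of Fig. 24 can be written in a few lines; the "predictive model" below is a stand-in function and the pressure distribution is an assumed normal, chosen only to show the pattern of sampling inputs and inspecting the resulting output distribution.

```python
import numpy as np

rng = np.random.default_rng(11)

def yield_model(pressure, temperature):
    """Stand-in for a previously fitted predictive model."""
    return 90.0 - 3.0 * (pressure - 2.0) ** 2 + 0.05 * (temperature - 80.0)

# Assumed variability: pressure poorly controlled, temperature well controlled
pressure_samples = rng.normal(loc=2.0, scale=0.15, size=10_000)
temperature_samples = rng.normal(loc=80.0, scale=0.5, size=10_000)

yield_samples = yield_model(pressure_samples, temperature_samples)
print(f"Predicted yield: mean = {yield_samples.mean():.2f}, "
      f"std = {yield_samples.std():.2f}, "
      f"5th-95th percentile = {np.percentile(yield_samples, [5, 95]).round(2)}")
```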


One can also augment data inputs and outputs with noisy replications of the original data to mimic process variation. This is thought to provide a form of regularization and mitigates the limits associated with small amounts of data.101 Such additional data can either be generated via knowledge of the physical process or statistically (via e.g. generative adversarial networks, GANs).40,102 A similar approach to ensure robustification is to resample training and validation data in order to analyze the distribution of model outputs (see Fig. 25). Resampling techniques103,104 also receive the name of bootstrap (as the bootstrap tree model used for screening) and include various methods (shuffling, random sampling with replacement, etc.). Such an approach acts as a form of regularization and leads to variants of well-known models, such as stochastic gradient boosted trees.105

All of these approaches act to robustify model construction; however, ultimately the construction process itself is always subject to finite data. As a result, cross-validation is used to assess model complexity and optimize it by evaluating the model performance using one (or numerous) validation datasets and different combinations of training and validation data (see Annex A). This reduces the risk of over-fitting to the correlation expressed in the finite amount of data and is a well-known practice within the domain of model construction.92

Fig. 26 Uncertainty can be estimated by comparing a model (or sample statistic) with its simulated distribution using resampling techniques. For example, the slope obtained in a linear model can be compared to a distribution of the same parameter that was generated by resampling the training data. Adapted from ref. 107 with permission.

Fig. 27 Figurative description of the Bayesian approach to express modeling uncertainty in neural networks. The top two subplots show the covariance between two-parameter distributions in the first and second layers of the network, respectively. The bottom subplot demonstrates the generation of a predictive distribution by Monte Carlo sampling the parametric distributions identified via approximate Bayesian inference.

3.3.3.3 Significance. By resampling data and ensembling the resultant models, the distribution of model parameters is obtained. If the correlations expressed in one model are not shared across the majority of the samples, a low probability of the event can be inferred (see Fig. 26). This approach follows the same ideas behind hypothesis testing,104,106,107 and is a common problem in manufacturing where rare or temporal events are often no longer present in recent data.
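A minimal sketch of the resampling idea behind Fig. 25 and 26: the snippet below bootstraps the slope of a simple linear fit on invented data, and the resulting distribution of the slope indicates whether the apparent correlation is shared across the majority of resamples. The data, seed and sample counts are assumptions made only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: a weak linear trend buried in noise (values hypothetical).
x = np.linspace(0.0, 10.0, 40)
y = 0.3 * x + rng.normal(scale=1.5, size=x.size)

n_boot = 2000
slopes = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, x.size, size=x.size)   # resample with replacement
    slopes[i] = np.polyfit(x[idx], y[idx], deg=1)[0]

lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"slope: {np.polyfit(x, y, 1)[0]:.3f}, 95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
# If the interval comfortably excludes zero, the correlation is shared across
# most of the resamples; otherwise the apparent trend has low significance.
```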


3.3.3.4 Uncertainty aware data-driven modeling. The expression of uncertainty can be captured via a model that predicts a distribution directly. As described above, the first example of this is the use of a combination of models that are created by resampling the training data; the ensemble of models that is created is then used to provide a bootstrap estimate of the uncertainty.40,108 This has been demonstrated in ANNs,109 hybrid approaches,110 and random forest models (see annex),108 amongst others.

Another approach to training ANNs is provided by the Bayesian learning paradigm. Bayesian neural networks (BNN) share the same topology as conventional neural networks, but instead of having point estimates for parameters, they instead have a distribution over parameters (Fig. 27). Treating the network parameters as random variables then allows for the generation of a predictive distribution (given a model input) via the Monte Carlo method. Similarly, Bayesian extensions to other models such as support vector machines (SVMs)111 exist.

One elegant approach is to identify a predictive model that expresses both a nominal and an uncertainty prediction in closed form.108,112 However, unlike the Bayesian paradigm, this approach produces an uncertainty estimate of the underlying data (i.e. the natural variance of the underlying data-generating process, otherwise known as aleatoric uncertainty113) and is not reflective of the uncertainty arising from the lack of information (or data, otherwise known as epistemic uncertainty114) used to train the model.

Gaussian processes (GPs) are non-parametric models, which means that the model structure is not defined a priori. This provides a highly flexible model class, as GPs enable the information expressed by the model to grow as more data is acquired. In GPs, given a model input, one can directly construct a predictive distribution (i.e. a distribution over target variables) analytically via Bayesian inference and exploitation of the statistical relationships between datapoints. Further, the uncertainty estimate of a GP expresses both aleatoric and epistemic uncertainty. The latter is reducible upon receipt of more data, but the former element is irreducible. This is expressed by Fig. 28.

Fig. 28 Expression of a Gaussian process posterior (i.e. its mean and uncertainty predictions) for the modeling of a smooth noiseless function. The figure demonstrates the effects of an increasing number of data points: a) 5 data points, b) 6 data points, c) 7 data points. Note how, as the number of data points increases, the uncertainty estimate (i.e. the 95% confidence interval) reduces and the mean GP prediction becomes a better estimate of the ground truth.

In the scope of practical use, it should be noted that the computational complexity of GPs grows cubically with the number of datapoints, so they either become intractable with large datasets or require the use of approximate Bayesian inference (as performed in variational GPs). For more detailed information on the mathematics underlying GPs, we direct the reader to ref. 115, and for an introductory tutorial, we recommend ref. 116.
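A minimal sketch of the GP behaviour illustrated in Fig. 28, using scikit-learn's GaussianProcessRegressor on an invented smooth function; the kernel choice and the toy data are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical smooth ground truth, sampled at a handful of points.
f = lambda x: np.sin(0.9 * x)
X_train = np.array([[-4.0], [-2.0], [0.0], [1.5], [3.0]])
y_train = f(X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-8)
gp.fit(X_train, y_train)

X_test = np.linspace(-5, 5, 200).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)   # posterior mean and uncertainty

# The 95% confidence band shrinks near the training points and widens away
# from them, mirroring the behaviour illustrated in Fig. 28.
upper, lower = mean + 1.96 * std, mean - 1.96 * std
print(f"max predictive std: {std.max():.3f}, min predictive std: {std.min():.4f}")
```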


3.4 Process control and process optimization

Despite functioning in narrow operational regions, process dynamics need to be considered if the aim is to use predictive models for control applications that are not maintained strictly at steady-state conditions (i.e. main flows and levels are fairly stable38,117,118).

System inertia or residence time (in chemical engineering), response time or time constant (in process control), and autocorrelation (in time series models) are different characteristics of dynamical systems. For example, transportation delay (also known as dead-time) will hinder any conclusion drawn from pure correlation analysis (e.g. upstream changes affecting the target hours or days later). In addition, applications of machine learning modifying operation parameters need to monitor the presence or creation of plant-wide oscillations given closed-loop process control or the presence of recycling streams.119,120

In this section, we now explore the use of data-driven methods not only as monitoring or supervisory systems, but for their direct application in process control and optimization. In both cases, we are concerned with the identification of a dynamical system. For more specific discussion regarding state-of-the-art, data-driven derivative-free approaches to optimization, we direct the interested reader to this work.121

3.4.1 Dynamical systems modeling and system identification. A simplified problem statement for the modeling of dynamical systems is: given a dataset of process trajectories that express temporal observations of the system state variable, x, and control inputs, u, identify either a function, fd, expressive of a mapping between system inputs and states at the current time index, t, and states at the next time index, t + 1, or a function, fc, that describes the total derivative of the system state with respect to time, as well as a mapping descriptive of the mechanism of system observation, g. A general definition of discrete-time process evolution and observation is provided as follows:

xt+1 = fd(xt, ut) + wt (System model) (1a)

yt = g(xt, ut) + et (Measurement model) (1b)

where yt is the measured variable, xt is the real system state, wt is an additive system disturbance and et is typically zero-mean Gaussian noise. An example of such a system is shown in Fig. 29, which shows a second-order system. The measured output y(t + 1) is, therefore, a function of u(t) but also of the inertia of the system. This is implicit and observed through the evolution of the state variable, x(t), which in this example corresponds to the measured y(t).

Fig. 29 A second-order linear dynamical system with one (a) observed state, y(t), and (b) control input, u(t). The discrete evolution of y(t + 1) can be approximated as a function of the cumulative sum (cusum) of the state (over a past horizon) and the most recent control input, instead of simply using the previous measurement. A comparison is shown in subfigure c – cusum in red vs. most recent state in green. The cusum is thought to properly account for the inertia of the system,122,123 whereas using the most recent state produces an essentially memoryless model. Training, validation, and test datasets are partitioned and evaluated using multi-step ahead prediction (recurrent) from an initial condition (d).

There are two primary approaches to the identification of such a function – first principles (white-box) and data-driven modeling (black-box). Generally, the benefits of first-principles approaches arise in the identification of a model structure, which is based on an understanding of the physical mechanisms driving the process. This tends to be highly useful when one would like to extrapolate away from the region of the process dynamics seen in the data. Given the remit of this paper, we focus on data-driven modeling approaches.

Particularly when interest lies in control applications, data-driven modeling of dynamical systems has been ruled by the field of system identification (SI). SI lies at the intersection of probability theory, statistical estimation theory, control theory, design of experiments, and realization theory. It follows then that the traditional ethos of SI, in the domain of PSE, constructs models that a) entail tractable parameter identification (i.e. that this estimation procedure is at the very least identifiable, but more preferably convex or analytical),124 b) are convenient for further use in process control and optimization, and c) apply the concept of Occam's razor.125 As a result, this means that the models identified in classical SI are often linear in the parameters,126 i.e. that process evolution can be described as a linear combination of basis functions of the system state and control input.‡ It is also worth emphasizing that such a class of models can still express nonlinearities, whilst typically gaining the ability to conduct estimation online, due to the efficiency of the algorithms available.127 As a result, these techniques are applied not only in the process industries, but are also widely used in navigation and robotics.128

‡ Note that, when the basis function selected is linear, the control will be able to guarantee stability, reachability, controllability, and observability.
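As a minimal illustration of eqn (1a) and of a model that is linear in the parameters, the sketch below simulates a hypothetical first-order discrete-time system and recovers its parameters in closed form via the normal equations (ordinary least squares); the "plant", its coefficients, the excitation signal and the noise level are invented for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a simple first-order discrete-time system (hypothetical "plant"):
# x[t+1] = 0.85*x[t] + 0.4*u[t] + disturbance
T = 200
u = rng.uniform(-1.0, 1.0, size=T)            # excitation signal
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = 0.85 * x[t] + 0.4 * u[t] + rng.normal(scale=0.02)

# The model is linear in its parameters, so theta = [a, b] can be estimated
# in closed form via the normal equations (ordinary least squares).
Phi = np.column_stack([x[:-1], u])             # regressors: current state and input
theta, *_ = np.linalg.lstsq(Phi, x[1:], rcond=None)
print(f"estimated a = {theta[0]:.3f}, b = {theta[1]:.3f}  (true values 0.85, 0.40)")
```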


Given the narrow operational region of the process industries, the field has historically been dominated by the prevalence of linear time-invariant (LTI) models of dynamical systems. The general idea here is to construct the evolution of the state (i.e. fd or fc), as well as its observation (i.e. g), as a linear combination of the current state and control input. The field of SI pioneered the efficient identification of the associated model parameters, θLTI, through the development of subspace identification methods.129 One of the foundational methods, provided independently by Ho and Kalman (and others), leverages the concepts of system controllability and observability to identify θLTI in closed form, given measurements of the system state in response to an impulse control input signal. The insight provided by this method is that the singular value decomposition (SVD) of the block Hankel matrix (composed of the output response) provides a basis decomposition equivalent to the controllability and observability matrices. This ultimately enables the identification of θLTI via a solution of the normal equations – hence mitigating the requirement for gradient-based (iterative search) optimization algorithms. Clearly, a number of assumptions are required from realization theory and on the data generation process. However, a body of algorithms has been developed since to account for stochasticity130 and other input signals.131

Given the relatively restrictive nature of LTI models, innovative model structures and various modeling paradigms have been exploited in order to approximate systems (common to PSE) that exhibit nonlinear or time-delay behavior. From the perspective of tackling nonlinearity, parametric and non-parametric models include (but are certainly not limited to) the Hammerstein and Wiener models and their structural variants,132 polynomials, nonlinear autoregressive models,133 and various kernel methods, such as Volterra series expansion models134 and radial basis functions.135 There have also been a number of methods developed to handle approximation of processes with time delay, such as first-order plus dead time (FOPDT)136 and second-order plus dead time (SOPDT) systems,137 as well as nonlinear autoregressive moving average models with exogenous inputs (NARMAX).133 Given the number and diversity of the models firmly rooted within the SI toolbox, as well as the inevitable sources of uncertainty arising in the construction of models, many of the same model validation practices are employed in SI as were discussed in section 3.3.3.124 With respect to parameter estimation, many algorithms have been developed to identify the associated model parameters in closed form. However, arguably, the more expressive or unconstrained the model structure becomes, the greater the dependence of parameter estimation on search-based maximum likelihood routines (otherwise known as the prediction error method (PEM) in the SI community). Perhaps the most obvious example of this is the training of neural networks, which are commonplace within the SI toolbox.138

3.4.2 Machine learning for dynamical systems modeling. The mention of neural networks seems to have brought us full circle to the field of machine learning (ML). It is therefore a good idea to make the point that ML and SI are not so distinct as one may think. In fact, both fields are deeply rooted in statistical theory and estimation practice. Perhaps the overarching difference between traditional ML and SI is that the developments of ML are somewhat unconstrained by the concerns relevant to SI. These concerns primarily relate to the use of the models derived for the purposes of control and optimization. However, there is a certain symbiosis observed currently in the advent of many learning-based system identification139 and control algorithms.140 A particular example is provided by reinforcement learning, the general process of which can be conceptualized as simultaneous system identification and learning of control and optimization. Further discussion of reinforcement learning is provided in section 3.4.5. In the following, we outline the second (and emerging) approach to data-driven modeling of dynamical systems as provided by the field of ML.

In keeping with the previous discussion, again in the ML paradigm, one can identify either discrete dynamics fd or continuous dynamics fc. However, what the use of ML implies is the availability of a large, diverse, and highly flexible class of models and estimation techniques (i.e. one can select from supervised, unsupervised, and reinforcement learning approaches). Hence, the selection of a) the most appropriate model type, b) structure, c) use of features (model inputs and outputs), d) training algorithm and e) partitioning of data and model evaluation metric can only be guided by cross-validation techniques, domain knowledge and certain qualities of the data available. In some sense, this prevents the admittance of general recommendations. However, in the following paragraphs, we explore some ideas as gathered from experience.

• Selection of model type: clearly, for certain systems, a given model class will be more effective at modeling the associated dynamics than others. For example, if the system observes smooth, Lipschitz-continuous behavior (e.g. as is generally the case if no phase transition is present in the process), and we are interested in identifying discrete dynamics fd, then the use of neural networks141 and Gaussian processes142 is particularly appealing, primarily because of the existing proofs pertaining to the universal approximation theorem, which considers continuous functions. If the data expresses discontinuities (as would be the case if generated from a process observing phase transitions), then perhaps the use of decision tree-based models would be more effective (as these models can be conceptualized as a weighted combination of step functions – although it should be noted that e.g. random forest models are often poor at generalizing predictions for the very same reason). Similarly, if the process dynamics are nonstationary, then perhaps the use of e.g. deep Gaussian processes143 would be more desirable, given the inability of single Gaussian processes to express nonstationary dynamics (given selection of a stationary covariance function). Alternatively, one could retain the use of GPs but instead consider the use of either input or output warping, which has been shown to remedy issues caused by non-stationarity among other features of the data available.144,145 Various other extensions for GPs also exist.146 If one would like to express continuous dynamics fc, then two approaches could be considered. Either one could predict the parameters of a mechanistic or first-principles model conditional to different points in the input space (i.e. construct a hybrid model), using a neural network, Gaussian process, etc.;79 or one could take the approach provided by neural ordinary differential equation (neural ODE) models,147 which directly learn the total derivative of the system. Despite the suitability of a given model class to a given dynamical system, innovative algorithms
can be conceptualized to handle the perceived weakness of a given model class for the problem at hand. For example, returning to the problem of nonstationary dynamics, one could conceivably partition the input space and switch between a number of Gaussian process models (with stationary covariance functions) depending on the current state of the system.148

• Selection of model structure: the choice of model structure pertains to decisions regarding the hyperparameters of a given model. For example, in polynomial models, the identification of higher-order terms describes the effects of interaction between input variables (i.e. enables the expression of nonlinear behavior). Similar considerations also apply when choosing activation functions in neural networks. Such a problem is not trivial, and even under the choice of the correct (parametric) model class, the predictive performance is often largely dependent on the quality of structure selection. At a high level, such a problem is negated in the setting of non-parametric models, or more specifically in the case of Gaussian processes. However, consideration is still required in the appropriate selection of a covariance function. This has led to the development of automated algorithmic frameworks, as demonstrated by algorithms such as sparse identification of nonlinear dynamics (SINDy),149 ALAMO150 and various hyperparameter optimization frameworks.41

• Selection of features: it is important to emphasize the use of feature selection (relating both to the input and output of the model). Perhaps the most important feature selection (in relation to the model input) is the determination of those process variables which have physical relationships to those states whose evolution we are interested in predicting. This is enabled both by operational knowledge as well as by building decision tree-based models on the data available and then conducting further analysis to identify important process variables.92 Further, even in systems that are assumed to be Markovian (i.e. where the dynamics are governed purely by the current state of the system and not by the past sequence of states), it is often the case that predictive capabilities are enhanced by the inclusion of system states at a window of previous time indices or incremental changes in the state. Intuitively, such an approach provides more information to the model. A similar idea exists in the use of a cumulative sum of past states over a horizon.122,123 Similarly, in the context of output feature selection and predicting discrete dynamics fd, one could construct a model, fΔ, to estimate the discrete increment in states between time indices (such that xt+1 = xt + fΔ(xt, ut)), which strikes similarities to the (explicit) Euler method. It is thought that the comparative advantage of such a scheme (over xt+1 = fd(xt, ut)) is that the information provided by the previous state is maximised. Recent work has developed this philosophy further via a Runge–Kutta (RK) and an implicit trapezoidal (IT) scheme,151 demonstrating that both schemes are able to predict stiff systems well (with the IT scheme performing better, as one would expect).

• Selection of training algorithm: this primarily quantifies the means of parameter estimation, i.e. the optimization algorithm, and, by extension, the statistical estimation framework used to formulate the inverse problem.152 Definition of the former typically considers the dimensionality of the parameter space, as well as the nonlinearity and differentiability of the model itself. Meanwhile, the latter is governed by the decision to operate within either a Bayesian or frequentist framework (e.g. see the discussion in the uncertainty appendix), which subsequently gives rise to an appropriate loss function for estimation (e.g. MSE). Further decisions regarding the addition of regularization terms into the loss function may also be considered. Recent works in the domain of physics-informed deep learning aim to extend the traditional bias-variance analysis to regularise predictions to satisfy known differential equations.153 This appears a promising approach to incorporate physical information into ML models beyond traditional hybrid modeling approaches; however, it is generally not known how well these approaches perform when assumptions regarding the system's behavior are inaccurate (i.e. depart from ideal behavior). The selection of a statistical estimation framework also has implications for the expression of various model uncertainties, as discussed previously in section 3.3.3.4. Clearly, uncertainties are important to consider (and propagate) in the (multi-step ahead) prediction of dynamical systems. Secondary to the points discussed, the training algorithm should also consider the ultimate purpose of the model. For example, if we are looking to make predictions for 'multiple steps' or many time indices ahead (e.g. predicting xt+3 = fd(fd(fd(xt))) from some initial state, xt), one should consider how the training algorithm can account for this (see ref. 154), as it is an extension of the previous problem of identifying discrete dynamics. This can also be approached by considering the selection of model structure and features (e.g. directly predicting multiple steps ahead).

• Selection of data partition and model evaluation metric: the blueprint for model training (i.e. training, validation, and testing92) necessitates the appropriate partitioning of data into respective sets. It is important in dynamical systems modeling that the datapoints for validation and testing are independent from those used in training. Therefore, generating partitions by randomly subsampling a dataset is not sufficient in the case of time-series data. To expand, consider data from a batch process. One should split the data such that separate (and entire) runs constitute the data in training, validation and testing (a minimal sketch of such a batch-wise split is given at the end of this subsection). Equally, the means of evaluation155 should be strictly guided by a model's intended use. Typically, in the use of models for dynamical systems, we are interested in predicting 'multiple steps'. In such a case, it is likely that model errors will propagate through predictions. Therefore, if intended for such use, quantification of the predictive accuracy of a single step ahead is unlikely to be a sufficient metric.

In view of the extensive discussion provided on dynamical systems modeling, the discussion now turns to data-driven control and optimization of processes with a focus on plant and process operation.
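As noted in the data-partition bullet above, a minimal sketch of a batch-wise split follows; the historian extract, column names and split ratios are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical historian extract: each row is one time step of one batch run.
df = pd.DataFrame({
    "batch_id": np.repeat(np.arange(10), 50),            # 10 batches, 50 steps each
    "temperature": np.random.default_rng(3).normal(350, 2, 500),
    "quality":     np.random.default_rng(4).normal(0.9, 0.05, 500),
})

# Split by whole batches, never by individual rows, so that validation and
# test trajectories are fully independent of the training trajectories.
batches = df["batch_id"].unique()
rng = np.random.default_rng(5)
rng.shuffle(batches)
train_ids, val_ids, test_ids = batches[:6], batches[6:8], batches[8:]

train = df[df["batch_id"].isin(train_ids)]
val   = df[df["batch_id"].isin(val_ids)]
test  = df[df["batch_id"].isin(test_ids)]
print(len(train), len(val), len(test))   # 300, 100, 100 rows
```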


3.4.3 Model predictive control. Model predictive control (MPC) is currently the benchmark scheme in the domain of advanced process control and optimization (APC). The general idea of MPC is to identify a discrete and finite sequence of control inputs that optimizes the temporal evolution of a dynamical system over a time horizon according to some objective function.156 MPC is reliant upon the identification of some finite-dimensional description of process evolution as a model. Various optimization schemes (such as direct single shooting, direct multiple shooting and direct collocation157) can be deployed to identify such a sequence of control inputs according to the description provided by the model. Additionally, if operational constraints are imposed upon the problem and the underlying model is a perfect description of the system, the solution identified will be (at least locally) optimal under both the dynamical model and the operational constraints, given that the control solution must satisfy the Karush–Kuhn–Tucker (KKT) conditions. However, the models we identify of our processes are not perfect descriptions, and processes are often influenced by various uncertainties and disturbances. MPC schemes handle this by incorporating state feedback. This means that at each discrete control interaction the MPC scheme is able to observe (measure) the current state of the system, and then (through optimization) identifies an optimal sequence of controls over a finite discrete time horizon – the first control identified within the sequence is then input to the system and the process repeated as the system evolves. This is expressed by Fig. 30, which specifically shows a receding horizon MPC, where the length of the finite discrete time horizon used in optimization is maintained as the process evolves.

Fig. 30 Demonstration of the use of state-feedback in receding horizon MPC for online optimization of an uncertain, nonlinear fed-batch process. Optimized forecast and evolution of a) the state trajectory, b) the control trajectory (composed of piecewise constant control inputs). See ref. 158 and 159 for more information on the system detailed.

To further explore the use of MPC and alternative data-driven methods with potential in the chemical process industries, we conceptualise a batch chemical process case study as outlined in ref. 160. Specifically, we are concerned with the following series reaction (catalysed by H2SO4) to produce some product C from a given reactant A:

2A →(k1A) B →(k2B) 3C (2)

where k1A and k2B are kinetic constants and B is an intermediate product. The reaction kinetics are first order, and the compositions of A, B and C are manipulated through control of the reactor temperature via a cooling jacket and also flowrates of A into the reactor (otherwise known as control inputs, u). At specific instances in time throughout the batch, the control element is able to change the setting of these control inputs. The objective of process operation is to maximise the production of C at the end of the batch operation, with a penalty for the absolute magnitude of changes in the controls between each control interaction. Given that the operation is fed-batch, there are a finite number of interactions the control element has available to maximize the process objective function.

In practice, we are able to identify a model describing the evolution of the underlying system composition and temperature (state, x) as a system of continuous differential equations. To deploy MPC, we can simply estimate the model parameters, discretize the model with respect to time via a given numerical method of choice and integrate it into one of the optimization schemes detailed previously. One can then optimize the process online by incorporating observation of the real system state as the process evolves and reoptimizing the control inputs over a given discrete time horizon (as displayed by Fig. 30).

There are a number of drivers within the domain of MPC research, including handling nonlinear dynamics,281 uncertainty, and improving dynamical models online (or from batch to batch) using data accrued from the ongoing process.
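To give a flavour of a receding-horizon loop for the series reaction of eqn (2), the sketch below uses a deliberately simplified, isothermal surrogate of the system (invented rate constants, an explicit-Euler discretization and the feed rate of A as the only control) together with a single-shooting optimization in SciPy. It illustrates the MPC mechanics only and is not the formulation used in the works cited above; real implementations would typically use dedicated tools (e.g. multiple shooting or collocation) as discussed earlier.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical, simplified kinetics for the series reaction 2A -> B -> 3C
# (isothermal, feed rate of A as the only control); all numbers are invented.
k1, k2, dt, H = 0.5, 0.3, 0.5, 6          # rate constants, step size, horizon

def step(x, u):
    """One explicit-Euler step of the (assumed) fed-batch model."""
    cA, cB, cC = x
    dcA = -k1 * cA + u                     # consumption of A plus feed
    dcB = 0.5 * k1 * cA - k2 * cB          # 2A -> B stoichiometry
    dcC = 3.0 * k2 * cB                    # B -> 3C stoichiometry
    return np.array([cA + dt * dcA, cB + dt * dcB, cC + dt * dcC])

def objective(u_seq, x0, u_prev):
    """Negative amount of C at the end of the horizon plus control-move penalty."""
    x = np.array(x0, dtype=float)
    penalty, last = 0.0, u_prev
    for u in u_seq:
        x = step(x, u)
        penalty += 0.1 * abs(u - last)
        last = u
    return -x[2] + penalty

# Receding-horizon loop: optimize, apply the first move, observe, repeat.
rng = np.random.default_rng(6)
x_plant, u_prev = np.array([1.0, 0.0, 0.0]), 0.0
for t in range(10):
    res = minimize(objective, x0=np.full(H, 0.1), args=(x_plant, u_prev),
                   bounds=[(0.0, 0.5)] * H, method="L-BFGS-B")
    u_apply = res.x[0]
    # "Plant": same model plus a small unmeasured disturbance (state feedback
    # at the next iteration corrects for the resulting mismatch).
    x_plant = step(x_plant, u_apply) + rng.normal(scale=0.005, size=3)
    x_plant = np.clip(x_plant, 0.0, None)
    u_prev = u_apply
    print(f"t={t}: u={u_apply:.3f}, cC={x_plant[2]:.3f}")
```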


3.4.4 Data-driven MPC. As alluded to, MPC algorithms exploit various types of models, commonly developed from first principles or based on process mechanisms.161 Many mechanistic and empirical models are, however, often too complex to be used online and, in addition, often have high development costs. Data-driven MPC, which uses black-box identification techniques to construct its models, has been exploited instead; such techniques include support vector machines,162 fuzzy models,163 neural networks (NNs),164 and Gaussian processes (GPs).165 More recently, GP-based MPC algorithms that take into account online learning have been proposed.166,167 These algorithms take information from new samples and update the existing data-driven model to account for better performance in terms of constraint satisfaction and objective function value.168 Similar ideas have been taken into account in recent distributionally robust variants.169 Additionally, the paradigm of MPC with learning is an MPC scheme with a nominal tracking objective and an additional learning objective.170 Generally, the construction of the learning term is based on an economic optimal experiment design criterion;170–174 furthermore, Gaussian processes have been used for optimal design of experiments.175 This framework allows gathering information from the system under consideration while at the same time optimizing it, ultimately trying to address the exploration–exploitation dilemma.

3.4.5 Reinforcement learning. The automated control of chemical processes has become paramount in today's competitive industrial setting. However, along with dynamic optimization, control is a challenging task, particularly for nonlinear and complex processes. This section introduces reinforcement learning as a tool to control and optimise chemical processes. While PID and model predictive (MPC) controllers dominate industrial practice, reinforcement learning is an attractive alternative,29,176 as it has the potential to outperform existing techniques in a variety of applications, such as online optimization and control of batch processes.177 We only discuss model-free reinforcement learning here, as model-based reinforcement learning is very closely related to data-driven MPC for chemical process applications, and a full discussion of this topic is out of the scope of this section.

3.4.5.1 Intuition. In any (discrete-time) sequential decision-making problem, there are three principal elements: an underlying system, a control element, and an objective function. The aim of the control element is to identify optimal control decisions, given observations or measurements of the underlying system. The underlying system then evolves (between control decisions) according to some dynamics. The optimality of the decisions selected by the control element and the evolution of the system is assessed by the objective function. This is a very high-level and general way to think of any decision-making process.

Under some assumptions, there is at least one sequence of decisions that is able to globally maximize a given objective function. If the evolution (or observation) of the underlying system is uncertain (stochastic), then this sequence of decisions must be reactive or conditional to the realisation of the uncertainty. In the RL paradigm, one assumes that all of the information regarding the realisation of the uncertainty and the current position of the system is expressed within the observation or measurement of the underlying system (i.e. the state). Hence, in order to act optimally within a sequential decision-making problem, the control element should be reactive to observations of the state (i.e. the control element should be a control policy, π). Here we note that implementation of an MPC scheme is essentially the identification of a control policy, as realizations of process uncertainty are accounted for via state feedback, as discussed in section 3.4.3.

RL describes a set of different methods capable of learning a functionalization of such a control policy, π(θ, ·), where θ are the parameters of the functionalization. Further, RL does so within a closed-loop feedback control framework, independently of explicit assumptions as to the form of process uncertainty or the underlying system dynamics. This is achieved generally via sampling the underlying system with different control strategies (known as exploration) and improving the functionalization thereafter by using feedback from the system and objective function (this process is known as generalized policy iteration178). An intuitive way to think about this is in terms of the design of experiments (DoE). Generally, DoE methodologies include elements that explore the design space and then subsequently exploit the knowledge that is derived from that exploration process. This process is often iterative. RL uses similar concepts but instead learns a control policy for a given sequential decision-making problem.

To further elucidate the benefits of RL, we now explore the conceptual fed-batch chemical process introduced in section 3.4.3. Now, assume we can estimate the uncertainties of the variables that constitute our dynamical model. If we were able to jointly express the uncertainties of the model, we could equivalently describe the discrete-time dynamical evolution of the system state (i.e. reactor composition and temperature) as a conditional probability density function. In practice, we cannot express this conditional probability density function in closed form; however, we can approximate it via Monte Carlo simulation (i.e. sampling). Here lies the fundamental advantage of RL: through simulation one can express any form of uncertainty associated with a model, and through generalized policy iteration an optimal control policy for the uncertain system can be learned. This removes the requirement to identify expressions descriptive of process uncertainty in closed form (as is required in stochastic and robust variants of MPC). The use of simulation is what makes RL an incredibly general paradigm for decision making, as it enables us to consider all types of model and process uncertainties jointly. In the following, we provide intuition as to how generalized policy iteration functions.
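The short sketch below illustrates, on an invented toy system, the Monte Carlo rollouts on which this simulation-based reasoning rests: a fixed, parameterized policy is rolled out many times through a model whose rate constant is itself uncertain, and its return is estimated by averaging. The dynamics, the noise levels, the reward and the policy parameters are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def rollout(policy_params, T=10):
    """Simulate one episode of a toy uncertain fed-batch system and return
    the total reward. Dynamics, noise and reward are purely illustrative."""
    x = np.array([1.0, 0.0])                       # e.g. [reactant, product]
    total_reward = 0.0
    for t in range(T):
        u = float(np.clip(policy_params[0] * x[0] + policy_params[1], 0.0, 0.5))
        k = rng.normal(0.4, 0.05)                  # uncertain rate constant
        converted = k * x[0]
        x = x + np.array([-converted + u, converted])
        total_reward += x[1] - 0.1 * abs(u)        # reward: product minus control cost
    return total_reward

# Monte Carlo estimate of a policy's performance under the modeled uncertainty.
theta = np.array([0.2, 0.05])                      # hypothetical policy parameters
returns = [rollout(theta) for _ in range(500)]
print(f"mean return {np.mean(returns):.2f} +/- {np.std(returns):.2f}")
```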


As the uncertainty of the process is realised through simulation, at each discrete time index, t ∈ {0,…, T − 1}, the process evolution is rated with respect to the process objective via a reward function, R(xt, ut, xt+1). The reward function provides a scalar feedback signal, Rt+1 (which is equivalent to the negative stage cost, as used in conventional controls terminology). This feedback signal can be used together with data descriptive of process evolution (i.e. {xt, ut, xt+1}t=0:T−1) via various different learning strategies to improve the policy of the control element. The general intuition of the application of RL to batch processing is provided by Fig. 31.

Fig. 31 a) A general feedback control framework for decision-making in an uncertain process. A control element interacts with an underlying system at discrete intervals in time, by changing control inputs to the system conditional to the observation of the system state. The system state then evolves in time, such that at the next time index it may be observed together with a scalar feedback signal indicative of the quality of process evolution with respect to control objectives. b) High-level intuition behind the policy optimization algorithm, REINFORCE. The system is sampled via different control strategies generated by the policy, which are exploratory and exploitative, and then the resultant data is used to improve the policy further.

Using the feedback provided by the system and the general algorithms that comprise the RL landscape, one may learn a functional parameterization of the optimal control policy for a given process. Such a parameterization is typically suited to end-to-end learning, e.g. recurrent or feed-forward neural networks. There are two main families of RL algorithms: those based on (approximate) dynamic programming, and those that use policy gradients to create optimal policies. A (condensed) schematic representation of the RL algorithm landscape is shown in Fig. 32. We give an overview of these two main families of methods in the following sections.

Fig. 32 An overview of the RL algorithm landscape. Methods such as Q learning, which provided foundational breakthroughs for the field, are based on principles common to dynamic programming. All of these methods aim to learn the state-(action) value function. Policy optimization algorithms provide an alternative approach and specifically parameterize a policy directly. Actor-critic methods combine both approaches to enhance sample efficiency by trading-off bias and variance in learning. Figure reproduced with permission from ref. 179.

3.4.6 Reinforcement learning – dynamic programming. RL approaches based on (approximate) dynamic programming are generally termed value-based methods. This is because (for complex and continuous problems) these methods use function approximations (e.g. neural networks) to approximate the value or the action-value function. Intuitively, the value function measures how good a specific state is under a given policy, whereas action-value methods measure how good a state–action pair is. RL algorithms use these value and action-value scores to compute optimal policies. To calculate either the value function or the action-value function, these methods use a recursion on the Bellman equation.
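A minimal, tabular sketch of the Bellman recursion underlying value-based methods (here, a Q-learning update with epsilon-greedy exploration) on an invented five-state decision problem; real process applications would replace the table with a function approximator, as discussed above. The environment, rewards and hyperparameters are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(8)

# Tiny illustrative problem: 5 discrete "inventory" states, 2 actions
# (0 = hold, 1 = process a unit); transitions and rewards are invented.
n_states, n_actions, gamma, alpha = 5, 2, 0.95, 0.1
Q = np.zeros((n_states, n_actions))

def env_step(s, a):
    """Hypothetical stochastic transition and reward."""
    if a == 1 and s > 0:
        return s - 1, 1.0                       # processing earns a reward
    s_next = min(s + (rng.random() < 0.5), n_states - 1)
    return int(s_next), 0.0                     # holding: stock may build up

s = 2
for step in range(20_000):
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next, r = env_step(s, a)
    # Bellman (Q-learning) recursion: bootstrap on the best next action-value.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.round(Q, 2))          # greedy policy: argmax over each row
```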


Reinforcement learning, in an approximate dynamic programming (ADP) philosophy, has been explored by the chemical process control community for some time now. For example, in ref. 180 a model-based strategy and a model-free strategy for the control of nonlinear processes were proposed, in ref. 181 ADP strategies were used to address fed-batch reactor optimization, and in ref. 182 mixed-integer decision problems were addressed with applications to scheduling. In ref. 183, with the inclusion of distributed optimization techniques, an input-constrained optimal control problem solution technique was presented, with ref. 184 and 185 using Gaussian processes in this line of research, among other works (e.g. ref. 186 and 187). All these approaches rely on the (approximate) solution of the Hamilton–Jacobi–Bellman equation and have been shown to be reliable and robust for several problem instances.

Some popular value-based RL algorithms include DQN,188 hindsight experience replay (HER),189 distributional reinforcement learning with quantile regression (QR-DQN),190 and Rainbow,191 which combines state-of-the-art improvements into DQN.

Fig. 33 The state trajectories generated in online optimization of an uncertain, nonlinear fed-batch biochemical process via RL and NMPC. In this case, the controller is able to observe a noisy measurement, y = [y1, y2], of the system state, x. Reproduced with permission from the authors.

Fig. 34 Comparison of the control trajectories generated via RL and NMPC in the same problem instance as in Fig. 33. The control trajectories are composed of piecewise constant control actions.

3.4.7 Reinforcement learning – policy optimization. RL algorithms based on policy optimization directly parametrize the policy by some function approximator (say a neural network); this is schematically represented in Fig. 35. Policy gradient methods are advantageous in many problem instances, and there have been many developments that have made them suitable for process optimization and control. For example, in ref. 192 the authors develop an approximate policy-based accelerated (APA) algorithm that allows RL algorithms to converge when using more aggressive learning rates, which significantly speeds up the learning process. Further, in ref. 193 a systematic incremental learning method is presented for RL in continuous spaces where the system is dynamic; this is the case in many chemical processes, where future ambient conditions and feeds are unknown and varying, amongst other developments.194,195

Recent research has been focusing on another side of RL for chemical process control, that of using policy gradients.29,196 Policy gradient methods directly estimate the control policy, without the need for a model or an online optimisation. Therefore, aside from the benefits of RL, policy gradient methods additionally exhibit the following advantages over action-value RL methods (e.g. deep Q-learning):

• Policy gradient methods enable the selection of control actions with arbitrary probabilities. In some cases (e.g. partially observable systems), the best policy may be stochastic.178

• In policy gradient methods, the approximate (possibly stochastic) policy can naturally approach a deterministic policy in deterministic systems,29 whereas action-value
methods (that use epsilon-greedy or Boltzmann functions) select a random control action with some heuristic rule.178

• Although it is possible to estimate the objective value of state–action pairs in continuous action spaces by function approximators, this does not help choose a control action. Therefore, an online optimization over the action space would have to be performed at each time step, which can be slow and inefficient. Policy gradient methods work directly with policies that output control actions, which is much faster and does not require an online optimization step.

• Policy gradient methods are guaranteed to converge at least to a locally optimal policy, even in high-dimensional continuous state and action spaces, unlike action-value methods, where convergence to local optima is not guaranteed.196

• In addition, policy gradients can establish a policy in a model-free fashion and excel in online computational time. This is because the online computations require only the evaluation of a policy, since all the computational cost is shifted offline.

The drawback of policy gradient methods is their inefficiency with respect to data, as value-based methods are much more data-efficient.

3.4.8 Reinforcement learning vs. NMPC. To demonstrate the performance of RL relative to current methods, in Fig. 33 and 34 we present one of the results from recent work.29 Here, the authors employ policy optimization based RL and provide a comparison of the performance to an advanced nonlinear model predictive control (NMPC) scheme. The figures show the distribution of process trajectories (i.e. states and controls) from an uncertain, nonlinear fed-batch process. The work shows that the performance of the RL is certainly comparable to NMPC, but accounts for process uncertainty slightly better. For example, Fig. 34 shows the distribution of control trajectories generated by the two approaches. The work employs a penalty for changing controls between successive control interactions. It can be seen that the RL policy generally observes smaller changes in the controls than the NMPC. In practice, this may lead to less wear of process valves and reduce process downtime.

The process systems engineering community has been dealing with stochastic systems for a long time. For example, nonlinear dynamic optimization and particularly nonlinear model predictive control (NMPC) are powerful methodologies to address uncertain dynamic systems; however, there are several properties that make their application less attractive. All the approaches in NMPC require the knowledge of a detailed (and finite-dimensional) model that describes the system dynamics, and even with a detailed model, NMPC only addresses uncertainty via its finite-horizon feedback. An approach that explicitly takes into account uncertainties is stochastic NMPC (sNMPC); however, this additionally requires an assumption for the uncertainty quantification and propagation, which is difficult to estimate or even validate. Furthermore, the online computational time is a bottleneck for real-time applications, since a nonlinear optimization problem has to be solved. In contrast, RL directly accounts for the effect of future uncertainty and its feedback in a proper 'closed-loop' manner, whereas conventional NMPC assumes open-loop control actions at future time points in the prediction, which can lead to overly conservative control actions.180

3.4.9 A framework for RL in process systems engineering. Using RL directly on a process to construct an accurate controller would necessitate prohibitive amounts of data, and therefore process models must be used for the initial part of the training. This can be a detailed "knowledge-based" model, a data-driven model, or a hybrid model.29

Fig. 35 A schematic representation of a framework for the application of RL to chemical process optimization. Initial policy learning is first conducted offline via simulation of an approximate process model. The policy is then transferred to the real system where it may be improved either via iterative improvement of the offline model or directly from the data accrued from process operation.
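As a minimal sketch of the policy-gradient learning discussed in section 3.4.7 (a bare-bones REINFORCE update with a running-average baseline), performed offline against an invented one-state stochastic model in the spirit of the offline stage of Fig. 35; the policy form, learning rate, dynamics and reward are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy "offline" training environment (dynamics and reward are invented):
# a single state x, a single control u, reward favouring x driven towards 1.
def episode(theta, T=8):
    x, log_grad, ret = 0.0, np.zeros_like(theta), 0.0
    for t in range(T):
        mean = theta[0] * x + theta[1]
        u = rng.normal(mean, 0.2)                            # stochastic (exploratory) policy
        log_grad += np.array([x, 1.0]) * (u - mean) / 0.2**2  # grad of log N(u | mean, 0.2)
        x = 0.8 * x + 0.3 * u + rng.normal(scale=0.02)        # uncertain dynamics
        ret += -(x - 1.0) ** 2                                # reward
    return ret, log_grad

theta, lr, baseline = np.array([0.0, 0.0]), 5e-4, 0.0
for it in range(2000):
    ret, g = episode(theta)
    baseline = 0.99 * baseline + 0.01 * ret                   # running-average baseline
    theta += lr * (ret - baseline) * g                        # REINFORCE update
print("learned policy parameters:", np.round(theta, 2))
```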


The main computational cost in RL is offline; hence, in addition to the use of models, it is possible to use an existing controller to warm-start the RL algorithm to alleviate the computational burden. RL algorithms are computationally expensive in their offline stage; initially, the agent (or the controller) explores the control action space randomly. In the case of process optimization and control, it is possible to use a preliminary controller, along with supervised learning or apprenticeship learning,28 to hot-start the policy and significantly speed up convergence.

The main idea here is to have data from some policy or state-feedback control (e.g. a PID controller or an (economic) model predictive controller) to compute control actions given observed states. The initial parameterization for the policy is trained in a supervised learning fashion, where the states are the inputs and the control actions are the outputs. Subsequently, this parameterized policy is used to initialize the policy, which is then trained by the RL algorithm to account for the full stochasticity of the system and avoid online numerical optimization, along with the previously mentioned benefits of RL. A general methodology for conducting policy pre-training in the setting of a computational model, and then in the true system, has been proposed in ref. 29, and is generally as follows:

Step 0, initialization. The algorithm is initialized by considering an initial policy network (e.g. an RNN policy network) with initialized parameters (preferably by apprenticeship learning) θ0.28

Step 1, preliminary learning (offline). It is assumed that a preliminary model can be constructed from previous process data; hence, the policy is learned by closed-loop simulations from this model. Given that the experiments are in silico, a large number of episodes and trajectories can be generated that correspond to different actions from the probability distribution of ut, and a specific set of parameters of the RNN, respectively. The resulting control policy is a good approximation of the optimal policy. Notice that if a stochastic preliminary model exists, this approach can immediately exploit it, contrary to traditional NMPC approaches. This finishes the in silico part of the algorithm; subsequent steps would be run on the true system. Therefore, emphasis after this step is given to sampling as little as possible, as every new sample results in a 'real' process sample.

Step 2, transfer learning. The policy can now be used on a 'real' process, and learning can ensue by adapting all the weights of the policy network according to the policy gradient algorithm. However, this may result in undesired effects. The control policy might have a deep structure; as a result, a large number of weights could be present. Thus, the optimization to update the policy may easily get stuck in a low-quality local optimum or completely diverge. To overcome this issue, the concept of transfer learning is adopted, which is not exclusive to RL.197 In transfer learning, a subset of training parameters is kept constant to avoid the use of a large number of epochs and episodes, applying knowledge that has been stored in a different but related problem. This technique originated from the task of image classification, where several examples exist, e.g. in ref. 198–200. See Fig. 36 for a schematic representation.

Fig. 36 Part of the network is kept frozen to adapt to new situations more efficiently.

Step 3, controlling the chemical process (online). In this step, RL is applied to the chemical process by using knowledge from the model in a proper closed-loop sense and accounting for the modeled stochastic behavior (which could be from any distribution of the disturbance model). Furthermore, the controller will continue to adapt and learn to better control and optimize the chemical process, addressing plant-model mismatch.159

3.4.10 Real-time optimization. Real-time optimization (RTO) systems are well-accepted by industrial practitioners, with numerous successful applications reported over the last few decades.201,202 These systems rely on knowledge-based (first principles) models, and in those processes where the optimization execution period is much longer than the closed-loop process dynamics, steady-state models are commonly employed to conduct the optimization.203

Traditionally, the model is updated in real-time using the available measurements, before repeating the optimization. This two-step RTO approach (also known as model parameter adaptation, MPA) is both intuitive and popular. Unfortunately, although MPA is largely the most widely used RTO strategy in the industry,202 it can be hindered from convergence to the actual plant optimum due to structural plant-model mismatch.204,205 This has motivated the development of alternative adaptation schemes in RTO, such as modifier adaptation.206

Similar to MPA, modifier adaptation (MA) embeds the existing available process model into a nonlinear optimization problem that is solved at each RTO execution. The key difference is that the process measurements are now used to update the so-called modifiers that are added to the cost and constraint functions in the optimization model, keeping the phenomenological model fixed at a given nominal condition. This methodology greatly alleviates the problem of offset from the actual plant optimum, by enforcing that the KKT conditions determined by the model match those of the plant upon convergence. However, this desirable property comes at the cost of having to estimate the cost and constraint gradients from process measurements.

The estimation of such plant gradients is a very difficult task to implement in practice, due to lack of information and measurement noise.207,208 These problems have a significant effect on the gradient estimation; consequently, they reduce the overall performance of the MA scheme. Recent advances in MA schemes are reviewed in the survey paper of ref. 209. Among them, there are MA-based algorithms that do not require the computation of plant derivatives. A nested MA scheme proposed in ref. 210 removes the need for estimating the plant gradients by embedding the modified optimization
model into an outer problem that optimizes over the gradient 3.5 Production scheduling and supply chain
modifiers using a derivative-free algorithm.211 combined MA
with a quadratic surrogate trained with historical data in an Planning and scheduling is the primary plant-wide decision-
algorithm called MAWQA. Likewise,212 investigated data- making strategy for the current process industries such as
driven approaches based on quadratic surrogates. the petroleum, chemical, pharmaceutical, and biochemical
Unfortunately, these procedures demand a series of time- industry. Optimal planning and scheduling can greatly
consuming experimental measurements in order to evaluate improve process efficiency and profit, reduce raw material
the gradients of a large set of functions and variables. Given the waste, energy and storage cost, and mitigate process
considerable impact on productivity, these implementations operational risks. Within the context of globalization and
This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.

are virtually absent in current industrial practice.202 circular economy, planning and scheduling have become
3.4.11 Real-time optimization via machine learning. The main contributions of ML to RTO have been primarily directed towards improving the modifier adaptation (MA) scheme. In ref. 213, the authors augment the conventional MA scheme (i.e. using zeroth- and first-order feedback from the plant) with a feedforward scheme, which provides a data-driven approach to handling non-stationarity in plant disturbances. Specifically, an ANN is constructed in order to classify the disturbance and suggest a suitable initial point for the MA scheme thereafter. The results presented in the work demonstrate impressive performance improvements when the feedforward classification structure is implemented. However, the results also detail the sensitivity of the method to low data regimes and the appropriate selection of ANN model structure.

An approach that efficiently handles low data regimes is provided by the augmentation of MA schemes with Gaussian processes (GPs). Here, (multiple) GPs are used to provide a mapping from control inputs to terms descriptive of mismatch in the constraints and in the objective function. This mitigates the requirement to identify zeroth- and first-order terms descriptive of the mismatch from plant measurements as in the original MA scheme.214 This approach was further extended in ref. 215, where a filtering scheme was proposed to reduce large changes in control inputs between RTO iterations, and in ref. 216, where a trust-region and Bayesian optimization were combined to balance exploration and exploitation of the GP models. Both works demonstrated good results; however, unlike the previous work of ref. 213, all of these works assume that the plant disturbance is stationary.
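The following sketch illustrates the flavour of the GP-augmented approach described above: a GP is fitted to map control inputs to the observed plant–model mismatch in a constraint, and its prediction (with uncertainty) can then be used to correct the model-based optimization. The data, kernel choice and library are placeholders for illustration, not those of the cited works.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Historical RTO iterates: applied control inputs and the measured mismatch
# between the plant constraint value and the model prediction (placeholder data).
U = np.array([[0.2], [0.4], [0.6], [0.8], [1.0]])        # control inputs
mismatch = np.array([0.05, 0.02, -0.01, -0.04, -0.08])   # g_plant(u) - g_model(u)

# GP mapping u -> constraint mismatch, with a noise term for measurement error.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(U, mismatch)

# At a candidate input, the corrected constraint is g_model(u) + predicted mismatch;
# the predictive standard deviation can be used to add a safety back-off.
u_new = np.array([[0.7]])
mean, std = gp.predict(u_new, return_std=True)
print(f"predicted mismatch: {mean[0]:.3f} +/- {std[0]:.3f}")
```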
Another recently proposed approach deployed RL for RTO.217 The approach was completely data-driven and did not require a description of plant dynamics. Whilst the work provided an interesting, innovative preliminary study, and performed comparably to a full-information nonlinear programming (NLP) model, further work should consider the issues of training an RL policy purely from a stationary data set (with no simulated description of plant dynamics). The nature of such a training scheme has the potential to drive the plant into dangerous operational regions due to the bias of the value function used in the approach. This is discussed further in section 4 within the context of safety. In addition, merging domain knowledge (via a model) and data is generally preferred to a purely data-driven approach.

3.5 Production scheduling and supply chain

Planning and scheduling is the primary plant-wide decision-making strategy for the current process industries such as the petroleum, chemical, pharmaceutical, and biochemical industries. Optimal planning and scheduling can greatly improve process efficiency and profit, reduce raw material waste, energy and storage cost, and mitigate process operational risks. Within the context of globalization and the circular economy, planning and scheduling have become increasingly challenging due to the varying demand on both product quantity and quality. Although many solution approaches have been proposed from the domain of process systems engineering, they are often not applicable for solving large-scale planning and scheduling problems due to the process complexity. Furthermore, unexpected uncertainties such as volatile customer demands, variations in process times, equipment malfunction, and fluctuations in socio-economics frequently arise in a manufacturing site, posing an intractable problem for the online decision-making of process scheduling and planning. As a result, developing a data-driven adaptive online planning and scheduling technique is of critical importance.

3.5.1 Reinforcement learning for process scheduling and planning. Traditionally, optimal scheduling plans are made using mathematical programming methods,218 in particular mixed integer linear programming (MILP) if only mass flow is considered, or mixed integer nonlinear programming (MINLP) if energy utilization is also taken into account. The general procedure to calculate an optimal scheduling solution is to first construct a process-wide model by considering material balance and energy balance, with binary variables (e.g. variables that can only take a value of 0 or 1) being assigned within the process model to explore different scheduling options. Then, MILP or MINLP is performed to calculate the optimal solution. However, given a large number of scheduling alternatives and complex model structures, mathematical programming is often extremely time-consuming, and thus not feasible for online scheduling.
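As a point of reference for the mathematical-programming route just described, the toy Pyomo model below assigns two tasks to discrete time slots on a single unit using binary variables, a no-overlap constraint and due dates; the data, names and the choice of the open-source CBC solver are assumptions for illustration only.

```python
import pyomo.environ as pyo

tasks, slots = ["A", "B"], [1, 2, 3]
due = {"A": 2, "B": 3}  # latest allowed slot for each task (placeholder data)

m = pyo.ConcreteModel()
# x[i, t] = 1 if task i is scheduled in time slot t (the binary decision variables).
m.x = pyo.Var(tasks, slots, domain=pyo.Binary)

# Each task runs exactly once, and at most one task per slot (single unit).
m.assign = pyo.Constraint(tasks, rule=lambda m, i: sum(m.x[i, t] for t in slots) == 1)
m.no_overlap = pyo.Constraint(slots, rule=lambda m, t: sum(m.x[i, t] for i in tasks) <= 1)
# Respect due dates.
m.due = pyo.Constraint(tasks, rule=lambda m, i: sum(t * m.x[i, t] for t in slots) <= due[i])

# Minimize total completion time.
m.obj = pyo.Objective(expr=sum(t * m.x[i, t] for i in tasks for t in slots))

pyo.SolverFactory("cbc").solve(m)  # assumes the CBC MILP solver is installed
print({(i, t): int(pyo.value(m.x[i, t])) for i in tasks for t in slots})
```

Industrial instances differ from this sketch mainly in scale: thousands of binary variables and constraints, which is what makes online re-solution impractical.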
To resolve this issue, some initial studies have been proposed since 2020 in which reinforcement learning is adopted to learn from training examples to solve the process model and to generate (approximate) optimal policies for online scheduling.219,220 Instead of using a surrogate model, the advantage of RL is that, upon its construction, it will rapidly amend the original optimal scheduling plan whenever a new disruption occurs during the process. Based on the case study provided in ref. 219, it is found that RL can outperform the traditional mathematical programming approach. Additionally, by analysing the optimal solutions proposed by RL models, new heuristics can be discovered. Nonetheless, it is worth emphasising that the use of RL for online scheduling is still in its infancy, and thus more thorough investigation must be conducted before it can actually be applied in the process industry. Basic intuition for the use of RL in the domain of batch chemical production scheduling follows.


Fig. 37 Handling control constraints innately in RL-based chemical production scheduling via identification of transformations of the control
prediction through standard operating procedures (i.e. precedence and disjunctive constraints and requirements for unit cleaning). a) Augmenting
the decision-making process by identifying the set of controls which satisfy the logic provided by standard operating procedure at each time index,
and b) implementation of a rounding policy to ensure that RL control selection satisfies the associated logic.

Briefly, the function of the scheduling element is to identify the sequencing of various production operations on available equipment to minimize some operational cost (which may consider resource consumption, tardiness, etc.). The sequencing of these operations may be subject to constraints that define: which operations may precede or succeed others in given equipment; limits of resources available for operation (including e.g. energy, raw material, storage etc.); and various constraints on unit availability. At given time intervals, then, the scheduling element should be able to predict the scheduling of future operations on equipment items, conditional on the current state of the plant. The state of the plant may consist of: inventory levels of raw material, intermediates and products; the amount of resource available for operation; unit availability and idling; and the time until client orders are due (obviously dependent on the problem instance). How one handles the various constraints imposed on the scheduling element is not clear. There is scope to handle them through a penalty function method; however, the number of constraints imposed is often large, which often causes difficulty for RL algorithms, as there are many discontinuities in the 'reward landscape'. Further, there are typically many operations that a given unit can process, and given the nature of RL (i.e. using a functional parameterization of a control policy), it is not clear how best to select controls. Fig. 37 and 38 show one idea proposed in recent work221 and a corresponding schedule generated for the case study detailed there.

The basic idea of that work is that the definition of many of the constraints imposed on scheduling problems is generally related to control selection and governed by standard operating procedures (SOPs) (i.e. the requirement for cleaning times, the presence of precedence constraints, etc.). These SOPs essentially define logic rules, fSOP, that govern the way in which the plant is operated and the set of operations one could schedule in units, t, given the current state of the plant, xt (see Fig. 37a). As a result, one can often pre-identify the controls which innately satisfy those constraints defined by SOPs and implement a rounding policy, fr, to alter the control predicted by the policy function so as to select one of those available controls (see Fig. 37b). Perhaps the largest downside of this approach is that derivative-free approaches to RL are most suitable. These algorithms are particularly suited when the effective dimensionality of the problem is low; however, the approach is known to become less efficacious when the effective dimensionality of the parameter space is large (as may be the case in the typical neural network models used in RL policy functionalization).
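A minimal sketch of the masking-and-rounding idea described above is given below; the SOP logic, state representation and policy output are hypothetical stand-ins for illustration, not the implementation of ref. 221.

```python
def feasible_controls(state):
    """Hypothetical SOP logic f_SOP: return the controls allowed in a unit
    given the current plant state (e.g. cleaning required before the next task)."""
    controls = {0: "idle", 1: "task_1", 2: "task_2", 3: "clean"}
    if state["needs_cleaning"]:
        allowed = {0, 3}          # only idling or cleaning is permitted
    else:
        allowed = {0, 1, 2}
    return allowed, controls

def rounding_policy(raw_action, allowed):
    """f_r: map the (continuous) policy prediction to the nearest allowed control."""
    return min(allowed, key=lambda a: abs(a - raw_action))

state = {"needs_cleaning": True}
allowed, controls = feasible_controls(state)
raw_action = 1.7                  # e.g. continuous output of an RL policy
action = rounding_policy(raw_action, allowed)
print(controls[action])           # -> 'clean'
```

Because the constraint logic is applied as a transformation of the control rather than as a reward penalty, every control implemented on the plant satisfies the SOPs by construction.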
Clearly, the discussion provided in the latter part of this section is just one approach to handling constraints in a very particular scheduling problem instance. There is a general need for further research in the application of RL to scheduling tasks in chemical processes. This poses a challenge that both the academic and industrial communities can combine efforts in approaching. For more information, we direct the reader to a recent review.222

3.5.2 Reinforcement learning for supply chain optimization. The operation of supply chains is subject to inherent uncertainty as derived from market mechanisms (i.e. supply and demand),223 transportation, supply chain structure and the interactions that take place between organizations, and various other exogenous uncertainties (such as global weather and humanitarian events).224

Fig. 38 Solving a MILP problem via RL to produce an optimal production schedule via the framework displayed in Fig. 37. A discrete time interval is equivalent to 0.5 days in this study.


Fig. 39 Solving a supply chain optimization problem via evolutionary RL methods. Reproduced with permission from ref. 225. The plots show the
training process of a) a hybrid stochastic search algorithm, b) evolutionary strategies, c) particle swarm optimization, d) artificial bee colony. The
algorithms demonstrate performance competitive with state-of-the-art RL approaches.

Due to the large uncertainties that exist within supply chains, there is an effort to ensure that organizational behavior is more cohesive and coordinated with other operators within the chain. For example, graph neural networks (GNNs)226,227 have been applied to help infer hidden relationships or behaviors within existing networks.228,229 Furthermore, the combination of an increasing degree of globalization and the availability of informative data sources has led to an interest in RL as a potential approach to supply chain optimization. This is again due to the presence of a wide range of uncertainties, combined with complex supply chain dynamics, which generally provide an obstacle to existing methods. The application of RL to supply chain optimization is similarly in its infancy; however, efforts such as OR-gym230 provide means for researchers to develop suitable algorithms for standard benchmark problems. Again, this area would largely benefit from greater collaboration between academia and industry. Fig. 39 shows some training results from the inventory management problem described in ref. 230, generated by different evolutionary RL approaches including particle swarm optimization (PSO),231 evolutionary strategies (ES),232 artificial bee colony (ABC)233 and a hybrid algorithm with a space reduction approach.234
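The generic agent–environment loop underlying such studies is sketched below. It assumes a Gym-style inventory-management environment (for example one of the OR-gym benchmarks mentioned above) is installed; the environment id shown is illustrative, and the random policy is a placeholder for an RL or evolutionary search algorithm.

```python
import gym
import numpy as np

# Assumed: an OR-gym style inventory-management environment is installed;
# the exact environment id depends on the package and is illustrative only.
env = gym.make("InvManagement-v1")

n_episodes, returns = 5, []
for _ in range(n_episodes):
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()   # placeholder for a learned policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
    returns.append(total_reward)

print("mean episodic profit:", np.mean(returns))
```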
4 Challenges and opportunities

In this manuscript, we have covered the intuition behind machine learning techniques and their application to industrial processes, which have traditionally stored vast amounts of manufacturing data in their operational historians.

More accessible and easier-to-use advanced analytical tools are evolving to the point where many data steps are, or will be, mostly automated, including the use of screening models via machine learning (i.e. AutoML). Therefore, process engineering expertise is and will be crucial to identify and define the manufacturing problems to solve, as well as to interpret the solutions found through data-driven approaches. In many situations, once the root cause of the problem is found, well-known solutions that can include new sensors and/or process control will be preferred over a complex approach that is difficult to maintain in the long run.

Advanced monitoring systems that notify suboptimal (or anomalous) behavior, list correlated factors, and allow engineers to interactively visualize process data will become the new standard in manufacturing environments. Historians with good-quality and well-structured manufacturing data (e.g. batch) will become a competitive advantage, especially if a data ownership culture at the plant level is well established.

Combined with process engineering and control knowledge, ML can be used for steady-state or batch-to-batch applications, where recommended set-points or recipe changes are suggested to operators/process engineers, similar to expert systems or pseudo-empirical correlations learned from historical data. However, if the ambition is closed-loop (dynamic) systems, both data-driven MPC and reinforcement learning are limited by the following two challenges.


Implementation

Data-driven solutions and their dedicated infrastructures are less reliable than process control strategies and their systems (DCS). This has been put forward by many studies, but particularly the recent study235 summarises the concerns for the deployment of RL machinery in engineering applications. We quote the following: "we [the scientific community] do not understand how the parts comprising deep RL algorithms impact agent [controller] training, either separately or as a whole. This unsatisfactory understanding suggests that we should re-evaluate the inner workings of our algorithms. Indeed, the overall question motivating our work is: how do the multitude of mechanisms used in deep RL training algorithms impact agent [controller] behavior?"

Probably the two main takeaways from the aforementioned analysis are: 1) heuristics and rules of thumb in the implementation of RL algorithms are of the utmost importance, and performance is very reliant on these details; and 2) large neural networks are limited by their interpretability and maintenance, and this should be further investigated.

Safety

The inclusion of safety or operational constraints is not straightforward. For example, existing methods for constrained reinforcement learning, often described as safe RL,236,237 that are based on policy gradients cannot guarantee strict feasibility of the policies they output, even when initialized with feasible initial policies.238 Various approaches have been proposed in the literature, where usually penalties are applied for the constraints. Such approaches can be very problematic, easily losing optimality or feasibility,239 especially in the case of a fixed penalty. The main approaches to incorporate constraints in this way make use of trust-region and fixed penalties,239,240 as well as cross entropy.238 As observed in ref. 239, when penalty methods are applied in policy optimization, the behaviour of the policy may change depending on the value of the penalty parameter. If a large value of the penalty parameter is used, then the policy tends to be over-conservative, resulting in feasible areas that are not optimal; on the other hand, when the value of the penalty parameter is too small, the policy tends to ignore the constraints as in the unconstrained optimization case.
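To make the role of the penalty parameter concrete, a fixed-penalty formulation of the constrained policy-optimization problem discussed above can be written as follows (generic notation, not reproduced from the cited works):

```latex
\max_{\theta} \;
\mathbb{E}\!\left[\sum_{t} r(x_t, u_t)\right]
\;-\; \rho \,
\mathbb{E}\!\left[\sum_{t} \max\bigl(0,\, g(x_t, u_t)\bigr)\right],
\qquad u_t \sim \pi_{\theta}(\cdot \mid x_t),
```

where g ≤ 0 encodes the safety or operational constraints. A large ρ yields over-conservative policies, while a small ρ effectively ignores the constraints, which is exactly the trade-off observed in ref. 239.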
5 Computational tools for data-driven modeling, control, and optimization

In this section, we provide signposts to some of the favorite computational tools of the Process Systems Engineering and Machine Learning group, University of Manchester, and the Safety Optimisation and Machine Learning for Process Systems Engineering group, Imperial College London, for select model and problem classes (see Table 1). Clearly, this list is not exhaustive, but we hope it is of use to those interested in a wide range of PSE applications, who can also benefit from a glossary explaining common machine learning terms (see Table 2).

Table 1 Computational tools used by the authors and colleagues for data-driven modeling, control, and optimization in Python and Julia. This list is
not exhaustive

Modeling
Model class Python packages Julia packages
Differential equations SciPy245 SciML246
Neural ODEs torchdiffeq,247 JAX248 DiffEqFlux249
Support vector machines Scikit-learn37 Julia statistics – SVM
Decision tree models Scikit-learn DecisionTree
Gaussian processes GPy,250 GPyTorch,251 GPflow252 AbstractGPs
Artificial neural networks PyTorch,253 Keras,254 JAX Flux,255 Knet256
Latent variable methods Scikit-learn, SciPy, UMAP257 MultivariateStats,258 UMAP
Explainable AI SHAP,259 LIME260 ShapML261
Classical Sys. ID SciPy, SysIdentPy262 Controlsystemidentification

Optimizationa
Problem class Python packages Julia packages
Linear programming SciPy, CVXPY,263 GEKKO264 JuMP265
Semidefinite programming CVXPY JuMP
Quadratic programming CVXPy, GEKKO JuMP
Nonlinear programming SciPy, Pyomo,266 NLOpt, GEKKO JuMP, Optim, NLOpt
Mixed integer programming Pyomo, GEKKO JuMP
Bayesian optimization GPyOpt,267 HEBO,145 BoTorch,268 GPflowOpt269 BayesianOptimization
MPC and dynamic opt. Pyomo, CasADi,270 GEKKO InfiniteOpt271,272
Automatic differentiation JAX, CasADi ForwardDiff,273 zygote274
Reinforcement learning Ray,275 RLlib,276 Gym277 ReinforcementLearning278
AutoML Ray Tune,279 Optuna280 AutoMLPipeline
a Generally we detail packages that interface with well-established solvers, such as Gurobi241 for mixed-integer problems and IPOPT242 for
nonlinear programming problems. This does not include commercial packages such as the MATLAB243 Toolbox, which also provides options
such as Aladin244 for distributed optimization.
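As a small illustration of the kind of workflow the packages in Table 1 support, the snippet below simulates a first-order batch reaction with SciPy (listed above for differential equations); the kinetics and parameter values are placeholders.

```python
import numpy as np
from scipy.integrate import solve_ivp

def batch_reactor(t, c, k):
    """dC/dt for a first-order reaction A -> B with rate constant k (1/h)."""
    cA, cB = c
    return [-k * cA, k * cA]

sol = solve_ivp(batch_reactor, t_span=(0.0, 10.0), y0=[1.0, 0.0],
                args=(0.5,), t_eval=np.linspace(0.0, 10.0, 11))

for t, cA in zip(sol.t, sol.y[0]):
    print(f"t = {t:4.1f} h, cA = {cA:.3f} mol L^-1")
```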


Table 2 Annex III – glossary of terms

Term Explanation
Anomaly detection Identifies data points, events, and/or observations that deviate from a dataset's normal behavior
AutoML (model selection) Systematic approach to select the best algorithm and its tuning parameters
Basis functions Basic transformations used as building blocks to capture higher complexity in the data using simpler structures.
For example, powers of x that, when added together, form polynomials
Bayesian inference Specifies how one should update one's beliefs (probability density function) about a random variable upon
observing data (new and historical)
Bias-variance trade-off Related to model complexity and generally analyzed on training data. If the model overfits the training data, it
will capture all of the variability (variance), while simpler models will underfit having a higher overall error (bias)
Bootstrap Resampling of the data to fit more robust models
Covariance Similarity in terms of correlation between two variables affected by noise
Cross validation Resampling technique mostly used when data availability is limited and to avoid overfitting. It consists of
dividing the dataset into multiple different subsets. N-1 of these subsets are used to train the model, while the
remaining one is used for validation. The chosen subset is changed iteratively till all subsets are used for
validation
Dimensionality reduction Techniques to reduce the number of input variables (e.g. tags) in a dataset by finding inner correlations (e.g.
linear correlation of multiple sensors measuring the same process temperature)
Dynamic programming Algorithmic technique for solving a sequential decision making problem by breaking it down into simpler
subproblems using a recursive relationship, known as the Bellman equation
Dynamic time warping Algorithm used to align and compare the similarity between two batches (or time series sequences) with different
duration
A common example is drying or reacting process, where time to finish depends on initial conditions and rate of
change
Feature engineering Generation of additional inputs (Xs) by transforming the original ones (usually tags). For example, the √pressure
helps to find a linear relationship with respect to the flow rate. These calculations can be done automatically or
by domain knowledge
Feature selection Reduction of model inputs (e.g. tags) based on its contribution towards an output (e.g. yield) or identified group
(e.g. normal/abnormal)
First-principle Based on fundamental principles like physics or chemistry
Functional principal Algorithm similar to PCA to reduce the number of co-linear inputs with minimal loss of information. The main
components difference is that FPCA also takes into consideration both time and space dependencies of these inputs
Gaussian processes Learning method for making predictions probabilistically in regression and classification problems
Generalized (model) Achieved when the model is able to generate accurate outcome (predictions) in unseen data
Gradient boosted trees Combination of decision trees that are built consecutively where each fits the residuals (unexplained variability)
Gradient methods Optimization approach that iteratively updates one or more parameters using the rate of change to increase or
decrease the goal (objective function)
Hyperparameter Parameter used to tune the model or optimization process e.g., weights in a weighted sum objective function
Input/s (model) Any variable that might be used by a model to generate predictions (as regressor or classifier, for example). These
are known with various names, X, factors, independent variables, features… and correspond to sensor readings
(tags) or their transformation (features)
Loss (or cost) function Objective function that has to be minimized in a machine learning algorithm, usually the aggregated difference
between predictions and reality
Machine learning Data-driven models able to find: 1) correlations and classifications, 2) groups (clusters) or 3) best strategy for
manipulated variables
These types are known by 1) supervised, 2) unsupervised, and 3) reinforcement learning
Model input Any variable that enters the model, also referred as features or Xs. Mostly, they correspond to sensor readings
(tags) or a calculation from those (engineered features)
Monte Carlo simulation Method used to generate different scenarios by varying one or more model parameters according to a chosen
distribution, e.g. normal
Neural networks Model that uses a composition of non-linear functions (e.g. linear with saturation, exponential…) in series so it
can approximate any input/output relationship
Non linear System in which the change of the output is not proportional to the change of the input
Output/s (model) Variable or measurement to predict in supervised models. It is often referred to as Y, y, target, dependent variable...
For example, y = f(x), where y is the output of the model
Partition the data Creation of subsets for fitting the model (training), avoiding overfitting (validation) and comparing the final
result with unseen data (test)
Piecewise linear Technique to approximate non-linear functions into smaller intervals that can be considered linear
Policy optimization Used in reinforcement learning, it finds the direction (gradient) at which the actions can improve the long-term
(gradient) cumulative goal (reward)
Predictive control Method that anticipates the behavior of the system, based on a model, several steps ahead so the optimal set of
actions (manipulated variables) are calculated and perform in each iteration
Principal component Dimensionality reduction technique that finds the correlation between input variables (tags or Xs), unveiling
analysis (PCA) hidden (latent) variables that can be used instead of all them independently
Random forest Learning algorithm that operates by subsampling the data and then constructing a multiple of decision trees in
order to obtain a combined (ensembled) model that is more robust to data
Regularization/penalization Mathematical method that introduces additional parameters in the objective/cost function to penalize the
possibility that the fitting parameters would assume extreme values (e.g. LASSO, Ridge Regression, etc.)

This journal is © The Royal Society of Chemistry 2022 React. Chem. Eng., 2022, 7, 1471–1509 | 1499
View Article Online

Review Reaction Chemistry & Engineering

Table 2 (continued)

Term Explanation
Reinforcement learning Fitting algorithm (training) that finds the best possible series of actions (policy) to maximize a goal (reward).
(RL) Tuning a PID can be seen as a reinforcement learning task, for example
Resampling Used when data availability is limited or contains minimal information. It consists of selecting several different
data subsets combinations out of the collected data. This allows a more robust estimate of model parameters,
estimating their uncertainty more accurately. A typical example in process engineering can be the analysis of
sporadic events like failures, start-ups or shut-down
Reward function Goal of the learning process, used in RL to find the set of actions that maximizes it. Similar to an objective
function in optimization, its definition will determine the solution found
Soft sensors Type of model which is able to infer and construct state variables (whose measurement is technically difficult or
relatively expensive, as for example a lab analysis) from variables that can be captured constantly from common
instruments such as thermocouples, pressure transmitters, ph-meters, etc.


Supervised If data contains an output or variable to predict (often called labels). Examples are regression or classification of
images where its group is known beforehand
Supervised learning/model Type of problem where the output of the system, sometimes called labels, is known in advance. For example, it
can be numeric (e.g. regression y = f(x), where y is the output) or categorical (e.g. logistic regression to predict if a
lab sample will be in or out of specification looking at measurements of pH or temperature)
Support vector machines Learning algorithm that identifies the best fit regressor (or classifier) considering a number of points within a
threshold (margin). Classical regression or classification, will try to minimize the error between prediction and
reality. A special type of variable transformation is used for its application to non-linear problems (known as the
Kernel trick)
Tags Unique identifier for an instrumentation signal, e.g., temperature at tray 20 of a distillation column or flow of
material x to reactor y
Test (data) Subset of data that a model does not use for its training or validation
Training (data) It is a data set of examples used during the learning process and is used to fit the parameters. The goal is to
produce a trained (fitted) model that generalizes well to new, unknown data
Tree-based models Model that uses a series of if-then rules to generate predictions (model output) from one (decision tree) or more
(random forest, boosted tree)
Unsupervised When data does not contain the output to predict, sometimes called unlabeled data. These models can still
learning/model obtain information by grouping (clustering) similar inputs by correlation or other similarities (e.g. control
chart only has data inputs but a model is able to classify them as in- or out-of-control/anomaly)
Validation (data) Subset of data used to avoid model overfitting

Disclaimer of liability

Authors and their institutions shall not assume any liability, for any legal reason whatsoever, including, without limitation, liability for the usability, availability, completeness, and freedom from defects of the reviewed examples as well as for related information, configuration, and performance data and any damage caused thereby.

Author contributions

Conceptualization M. M., A. D. R. C., and F. J. N. B.; data curation M. M. and F. J. N. B.; formal analysis M. M. and F. J. N. B.; investigation M. M. and F. J. N. B.; methodology M. M. and F. J. N. B.; project administration D. Z., A. D. R. C. and F. J. N. B.; resources M. M. and F. J. N. B.; software M. M., C. P. G. and F. J. N. B.; supervision D. Z., A. D. R. C., and F. J. N. B.; validation M. M., D. Z., and F. J. N. B.; visualization M. M., M. V., A. D. R. C., and F. J. N. B.; writing – original draft M. M., M. V., A. D. R. C., and F. J. N. B.; writing – review and editing M. M., C. P. G., M. V., D. Z., and F. J. N. B.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

The authors appreciate the support from JMP (SAS Institute Inc.) for facilitating the open access of this manuscript.

Notes and references

1 D. A. C. Beck, J. M. Carothers, V. R. Subramanian and J. Pfaendtner, Data science: Accelerating innovation and discovery in chemical engineering, AIChE J., 2016, 62, 1402–1416.
2 Industry 4.0: How to navigate digitization of the manufacturing sector, April 2015. [Online; accessed 13. Jul. 2020].
3 The potential of advanced process controls in energy and materials, Nov 2020. [Online; accessed 17. Sep. 2022].
4 P. M. Piccione, Realistic interplays between data science and chemical engineering in the first quarter of the 21st century: Facts and a vision, Chem. Eng. Res. Des., 2019, 147, 668–675.
5 N. Clarke, Analytics is not just about patterns in big data, ComputerWeekly.com, Nov 2016.
6 C. Shang and F. You, Data analytics and machine learning for smart process manufacturing: Recent advances and perspectives in the big data era, Engineering, 2019, 5(6), 1010–1016.
7 R. Carpi, A. Littmann and C. Schmitz, Chemicals manufacturing 2030: More of the same…but different, Aug 2019. [Online; accessed 13. Jul. 2020].


8 V. Venkatasubramanian, The promise of artificial 24 S. García-Muñoz, T. Kourti and J. F. MacGregor, Model


intelligence in chemical engineering: Is it here, finally?, predictive monitoring for batch processes, Ind. Eng. Chem.
AIChE J., 2019, 65, 466–478. Res., 2004, 43(18), 5929–5941.
9 J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. 25 S. García-Muñoz, T. Kourti, J. F. MacGregor, A. G. Mateos
Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek and A. and G. Murphy, Troubleshooting of an industrial batch
Potapenko, et al., Highly accurate protein structure prediction process using multivariate methods, Ind. Eng. Chem. Res.,
with alphafold, Nature, 2021, 596(7873), 583–589. 2003, 42(15), 3592–3601.
10 S. Ravuri, K. Lenc, M. Willson, D. Kangin, R. Lam, P. 26 F. Destro, P. Facco, S. García Muñoz, F. Bezzo and M.
Mirowski, M. Fitzsimons, M. Athanassiadou, S. Kashem Barolo, A hybrid framework for process monitoring:
and S. Madge, et al., Skillful precipitation nowcasting using Enhancing data-driven methodologies with state and
deep generative models of radar, 2021, arXiv preprint parameter estimation, J. Process Control, 2020, 92, 333–351.
arXiv:2104.00954. 27 B. D. Ziebart, A. L. Maas, J. A. Bagnell and A. K. Dey, et al.,


11 S. J. Qin and L. H. Chiang, Advances and opportunities in Maximum entropy inverse reinforcement learning, in Aaai,
machine learning for process data analytics, Comput. Chem. Chicago, IL, USA, 2008, vol. 8, pp. 1433–1438.
Eng., 2019, 126, 465–473. 28 M. Mowbray, R. Smith, E. A. Del Rio-Chanona and D.
12 R. Leardi, Experimental design in chemistry: A tutorial, Zhang, Using process data to generate an optimal control
Anal. Chim. Acta, 2009, 652, 161–172. policy via apprenticeship and reinforcement learning,
13 J. R. Couper, W. R. Penney, J. R. Fair and S. M. Walas, 17 - AIChE J., 2021, e17306.
chemical reactors, in Chemical Process Equipment (Third Edition), 29 P. Petsagkourakis, I. O. Sandoval, E. Bradford, D. Zhang and
ed. J. R. Couper, W. R. Penney, J. R. Fair and S. M. Walas, E. A. del Rio-Chanona, Reinforcement learning for batch
Butterworth-Heinemann, Boston, 3rd edn, 2012, pp. 591–653. bioprocess optimization, Comput. Chem. Eng., 2020, 133, 106649.
14 D. Vader, F. Incropera and R. Viskanta, Local convective 30 B. Douglas, Reinforcement Learning, Dec 2021. [Online;
heat transfer from a heated surface to an impinging, planar accessed 1. Dec. 2021].
jet of water, Int. J. Heat Mass Transfer, 1991, 34(3), 611–623. 31 I. A. Udugama, C. L. Gargalo, Y. Yamashita, M. A. Taube, A.
15 T. Bikmukhametov and J. Jäschke, Combining machine Palazoglu, B. R. Young, K. V. Gernaey, M. Kulahci and C.
learning and process engineering physics towards Bayer, The role of big data in industrial (bio)chemical process
enhanced accuracy and explainability of data-driven operations, Ind. Eng. Chem. Res., 2020, 59(34), 15283–15297.
models, Comput. Chem. Eng., 2020, 138, 106834. 32 D. Görges, Relations between model predictive control and
16 E. Bradford, L. Imsland, M. Reble and E. A. del Rio- reinforcement learning, IFAC-PapersOnLine, 20th IFAC
Chanona, Hybrid gaussian process modeling applied to World Congress, 2017, vol. 50, 1, pp. 4920–4928.
economic stochastic model predictive control of batch 33 M. Foehr, J. Vollmar, A. Calà, P. Leitão, S. Karnouskos and
processes, in Recent Advances in Model Predictive Control, A. W. Colombo, Engineering of Next Generation Cyber-
Springer, 2021, pp. 191–218. Physical Automation System Architectures, SpringerLink,
17 J. Mandhane, G. Gregory and K. Aziz, A flow pattern map 2017, pp. 185–206.
for gas–liquid flow in horizontal pipes, Int. J. Multiphase 34 L. Breiman, Random Forests, Mach. Learn., 2001, 45, 5–32.
Flow, 1974, 1(4), 537–553. 35 Y. Wu, D. D. Boos and L. A. Stefanski, Controlling Variable
18 S. Corneliussen, J.-P. Couput, E. Dahl, E. Dykesteen, K.-E. Selection by the Addition of Pseudovariables, J. Am. Stat.
Frøysa, E. Malde, H. Moestue, P. O. Moksnes, L. Scheers Assoc., 2007, 102, 235–243.
and H. Tunheim, Handbook of Multiphase Flow Metering, 36 S. Janitza, C. Strobl and A.-L. Boulesteix, An AUC-based
Norwegian Society for Oil and Gas Measurement, 2015. permutation variable importance measure for random
19 J. Zhang, S. Zhang, J. Zhang and Z. Wang, Machine forests, BMC Bioinf., 2013, 14, 119.
Learning Model of Dimensionless Numbers to Predict Flow 37 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B.
Patterns and Droplet Characteristics for Two-Phase Digital Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V.
Flows, Appl. Sci., 2021, 11, 4251. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M.
20 P. G. Constantine, Z. del Rosario and G. Iaccarino, Data- Brucher, M. Perrot and E. Duchesnay, Scikit-learn: Machine
driven dimensional analysis: algorithms for unique and learning in Python, J. Mach. Learn. Res., 2011, 12, 2825–2830.
relevant dimensionless groups, 2017, arXiv:1708.04303. 38 J. D. Kelly and J. D. Hedengren, A steady-state detection
21 X. Xie, W. K. Liu and Z. Gan, Data-driven discovery of (SSD) algorithm to detect non-stationary drifts in processes,
dimensionless numbers and scaling laws from experimental J. Process Control, 2013, 23, 326–331.
measurements, Dec 2021. [Online; accessed 30. Jan. 2022]. 39 G. T. Jemwa and C. Aldrich, Improving process operations
22 K. Dunn, Extracting value from data, in Process Improvement using support vector machines and decision trees, AIChE J.,
Using Data, [Online; accessed 30. Jan. 2022, ch. 6.3]. 2005, 51, 526–543.
23 J. F. MacGregor, H. Yu, S. García Muñoz and J. Flores- 40 M. Mowbray, T. Savage, C. Wu, Z. Song, B. A. Cho, E. A. Del
Cerrillo, Data-based latent variable methods for process Rio-Chanona and D. Zhang, Machine learning for
analysis, monitoring and control, Comput. Chem. Eng., biochemical engineering: A review, Biochem. Eng. J.,
2005, 29, 1217–1223. 2021, 108054.


41 F. Hutter, L. Kotthoff and J. Vanschoren, Automated machine 59 J. Ash, L. Lancaster and C. Gotwalt, A method for
learning: methods, systems, challenges, Springer Nature, 2019. controlling extrapolation when visualizing and optimizing
42 C. Thon, B. Finke, A. Kwade and C. Schilde, Artificial the prediction profiles of statistical and machine learning,
Intelligence in Process Engineering, Adv. Intell. Syst., Discovery Summit Europe 2021 Presentations, 2021.
2021, 3, 2000261. 60 J. Ash, L. Lancaster and C. Gotwalt, A method for controlling
43 C. Molnar, Interpretable machine learning, Lulu. com, 2020. extrapolation when visualizing and optimizing the prediction
44 JMP, Profilers: Jmp 12, https://www.jmp.com/support/help/ profiles of statistical and machine learning models, 2022.
Profilers.shtml#377608, 2021. 61 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
45 S. M. Lundberg and S.-I. Lee, A unified approach to S. Ozair, A. Courville and Y. Bengio, Generative adversarial nets,
interpreting model predictions, in Proceedings of the 31st Adv. Neural Inf. Process. Syst., 2014, 27, 2672–2680.
international conference on neural information processing 62 M. Nixon and S. Xu, Anomaly Detection in Process Data
systems, 2017, pp. 4768–4777. Using Generative Adversarial Networks (GAN), Aug 2021.
46 S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. [Online; accessed 1. Dec. 2021].
Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal and S.-I. 63 A. Geiger, D. Liu, S. Alnegheimish, A. Cuesta-Infante and K.
Lee, From local explanations to global understanding with Veeramachaneni, Tadgan: Time series anomaly detection
explainable ai for trees, Nat. Mach. Intell., 2020, 2(1), 56–67. using generative adversarial networks, arXiv, 2020, preprint,
47 J. Senoner, T. Netland and S. Feuerriegel, Using explainable arXiv:2009.07769, https://arxiv.org/abs/2009.07769.
artificial intelligence to improve process quality: Evidence 64 F. Yang and D. Xiao, Progress in root cause and fault
from semiconductor manufacturing, Management Science, propagation analysis of large-scale industrial processes,
2021, 1–20. J. Control. Sci. Eng., 2012, 2012, 1–10.
48 J. Wang, J. Wiens and S. Lundberg, Shapley flow: A graph- 65 F. Yang, S. Shah and D. Xiao, Signed directed graph based
based approach to interpreting model predictions, in modeling and its validation from process knowledge and
International Conference on Artificial Intelligence and process data, Int. J. Appl. Math. Comput. Sci., 2012, 22, 41–53.
Statistics, PMLR, 2021, pp. 721–729. 66 N. F. Thornhill and A. Horch, Advances and new directions
49 Fault Detection and Diagnosis of the Tennessee Eastman in plant-wide disturbance detection and diagnosis, Control
Process using Multivariate Control Charts (2020-US-45MP- Eng. Pract., 2007, 15, 1196–1206.
606), Oct 2020. [Online; accessed 19. Dec. 2020]. 67 M. Bauer and N. F. Thornhill, A practical method for
50 J. Ash and J. Ding, Fault Detection and Diagnosis of the identifying the propagation path of plant-wide
Tennessee Eastman Process using Multivariate Control Charts, disturbances, J. Process Control, 2008, 18, 707–719.
ResearchGate, 2022. 68 V. Venkatasubramanian, R. Rengaswamy and S. N. Kavuri,
51 M. Joswiak, Y. Peng, I. Castillo and L. H. Chiang, A review of process fault detection and diagnosis: Part II:
Dimensionality reduction for visualizing industrial Qualitative models and search strategies, Comput. Chem.
chemical process data, Control Eng. Pract., 2019, 93, 104189. Eng., 2003, 27, 313–326.
52 L. McInnes, J. Healy and J. Melville, Umap: Uniform manifold 69 M. A. Kramer and B. L. Palowitch, A rule-based approach to
approximation and projection for dimension reduction, 2020. fault diagnosis using the signed directed graph, AIChE J.,
53 L. McInnes, J. Healy and S. Astels, hdbscan: Hierarchical 1987, 33, 1067–1078.
density based clustering, J. Open Source Softw., 2017, 2(11), 205. 70 C. Palmer and P. W. H. Chung, Creating signed directed
54 R. J. Campello, D. Moulavi and J. Sander, Density-based graph models for process plants, Ind. Eng. Chem. Res.,
clustering based on hierarchical density estimates, in 2000, 39(7), 2548–2558.
Pacific-Asia conference on knowledge discovery and data 71 C. Reinartz, D. Kirchhübel, O. Ravn and M. Lind, Generation
mining, Springer, 2013, pp. 160–172. of signed directed graphs using functional models [U+204E][U
55 M. Carletti, C. Masiero, A. Beghi and G. A. Susto, +204E] this work is supported by the danish hydrocarbon
Explainable machine learning in industry 4.0: Evaluating research and technology centre, IFAC-PapersOnLine, 5th IFAC
feature importance in anomaly detection to enable root Conference on Intelligent Control and Automation Sciences
cause analysis, in 2019 IEEE International Conference on ICONS 2019, 2019, vol. 52, 11, pp. 37–42.
Systems, Man and Cybernetics (SMC), IEEE, 2019, pp. 21–26. 72 T. Savage, J. Akroyd, S. Mosbach, N. Krdzavac, M. Hillman
56 S. J. Qin, Y. Liu and Y. Dong, Plant-wide troubleshooting and M. Kraft, Universal Digital Twin – integration of
and diagnosis using dynamic embedded latent feature national-scale energy systems and climate data, 2021,
analysis, Comput. Chem. Eng., 2021, 107392. submitted for publication. Preprint available at https://
57 S. J. Qin, Y. Dong, Q. Zhu, J. Wang and Q. Liu, Bridging como.ceb.cam.ac.uk/preprints/279/.
systems theory and data science: A unifying review of 73 M. T. Ribeiro, S. Singh and C. Guestrin, why should i trust
dynamic latent variable analytics and process monitoring, you?, Explaining the predictions of any classifier, 2016.
Annu. Rev. Control, 2020, 50, 29–48. 74 B. Braun, I. Castillo, M. Joswiak, Y. Peng, R. Rendall, A.
58 Q. Zhu, S. J. Qin and Y. Dong, Dynamic latent variable Schmidt, Z. Wang, L. Chiang and B. Colegrove, Data
regression for inferential sensor modeling and monitoring, science challenges in chemical manufacturing, IFAC
Comput. Chem. Eng., 2020, 137, 106809. preprints, 2020.


75 S. J. Qin, S. Guo, Z. Li, L. H. Chiang, I. Castillo, B. Braun 89 D. Bonvin and G. François, Control and optimization of batch
and Z. Wang, Integration of process knowledge and chemical processes, tech. rep., Butterworth-Heinemann, 2017.
statistical learning for the dow data challenge problem, 90 J. A. Romagnoli and M. C. Sánchez, Data processing and
Comput. Chem. Eng., 2021, 153, 107451. reconciliation for chemical process operations, Elsevier, 1999.
76 C. Abeykoon, Design and applications of soft sensors in 91 J. Loyola-Fuentes, M. Jobson and R. Smith, Estimation of
polymer processing: A review, IEEE Sens. J., 2019, 19, fouling model parameters for shell side and tube side of
2801–2813. crude oil heat exchangers using data reconciliation and
77 R. Oliveira, Combining first principles modelling and parameter estimation, Ind. Eng. Chem. Res., 2019, 58(24),
artificial neural networks: a general framework, Comput. 10418–10436.
Chem. Eng., 2004, 28(5), 755–766. 92 J. Friedman, T. Hastie and R. Tibshirani, et al., The elements
78 M. Von Stosch, R. Oliveira, J. Peres and S. F. de Azevedo, of statistical learning, Springer series in statistics, New York,
Hybrid semi-parametric modeling in process systems 2001, vol. 1.


engineering: Past, present and future, Comput. Chem. Eng., 93 M. Asch, M. Bocquet and M. Nodet, Data assimilation:
2014, 60, 86–101. methods, algorithms, and applications, SIAM, 2016.
79 F. Vega, X. Zhu, T. R. Savage, P. Petsagkourakis, K. Jing and 94 R. Arcucci, J. Zhu, S. Hu and Y.-K. Guo, Deep data
D. Zhang, Kinetic and hybrid modelling for yeast assimilation: integrating deep learning with data
astaxanthin production under uncertainty, Biotechnol. assimilation, Appl. Sci., 2021, 11(3), 1114.
Bioeng., 2021, 118, 4854–4866. 95 S. Arridge, P. Maass, O. Öktem and C.-B. Schönlieb, Solving
80 S. Wold, N. Kettaneh-Wold, J. MacGregor and K. Dunn, 2.10 inverse problems using data-driven models, Acta Numer.,
- batch process modeling and mspc, in Comprehensive 2019, 28, 1–174.
Chemometrics, ed. S. D. Brown, R. Tauler and B. Walczak, 96 A. M. Stuart, Inverse problems: a bayesian perspective, Acta
Elsevier, Oxford, 2009, pp. 163–197. Numer., 2010, 19, 451–559.
81 S. García-Muñoz, M. Polizzi, A. Prpich, C. Strain, A. Lalonde 97 N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R.
and V. Negron, Experiences in batch trajectory alignment Salakhutdinov, Dropout: A simple way to prevent neural
for pharmaceutical process improvement through networks from overfitting, J. Mach. Learn. Res., 2014, 15(56),
multivariate latent variable modelling, J. Process Control, 1929–1958.
2011, 21(10), 1370–1377, Special issue: selected papers from 98 R. Tibshirani, Regression shrinkage and selection via the
two joint IFAC conferences: 9th International Symposium lasso, J. R. Stat. Soc. Series B Stat. Methodol., 1996, 58(1),
on Dynamics and Control of Process Systems and the 11th 267–288.
International Symposium on Computer Applications in 99 A. E. Hoerl and R. W. Kennard, Ridge regression: Biased
Biotechnology, Leuven, Belgium, July 5–9, 2010. estimation for nonorthogonal problems, Technometrics,
82 F. Zuecco, M. Cicciotti, P. Facco, F. Bezzo and M. Barolo, 1970, 12(1), 55–67.
Backstepping methodology to troubleshoot plant-wide 100 H. Zou and T. Hastie, Regularization and variable selection
batch processes in data-rich industrial environments, via the elastic net, J. R. Stat Soc. Series B Stat. Methodol.,
Processes, 2021, 9(6), 1074. 2005, 67(2), 301–320.
83 M. Spooner, D. Kold and M. Kulahci, Harvest time 101 M. Laskin, K. Lee, A. Stooke, L. Pinto, P. Abbeel and A.
prediction for batch processes, Comput. Chem. Eng., Srinivas, Reinforcement learning with augmented data,
2018, 117, 32–41. 2020, arXiv preprint arXiv:2004.14990.
84 M. Spooner and M. Kulahci, Monitoring batch processes 102 J. Yoon, D. Jarrett and M. Van der Schaar, Time-series
with dynamic time warping and k-nearest neighbours, generative adversarial networks, 2019.
Chemom. Intell. Lab. Syst., 2018, 183, 102–112. 103 S. Lahiri and S. Lahiri, Resampling methods for dependent
85 J. M. González-Martínez, A. Ferrer and J. A. Westerhuis, data, Springer Science & Business Media, 2003.
Real-time synchronization of batch trajectories for on-line 104 Resampling — Elements of Data Science, May 2021, [Online;
multivariate statistical process control using dynamic time accessed 30. Nov. 2021].
warping, Chemom. Intell. Lab. Syst., 2011, 105(2), 195–206. 105 J. H. Friedman, Stochastic gradient boosting, Comput. Stat.
86 M. Spooner, D. Kold and M. Kulahci, Selecting local Data Anal., 2002, 38(4), 367–378.
constraint for alignment of batch process data with 106 A. B. Downey, Think stats, O'Reilly Media, Inc., 2011.
dynamic time warping, Chemom. Intell. Lab. Syst., 107 There is still only one test, Nov 2021, [Online; accessed 30.
2017, 167, 161–170. Nov. 2021].
87 J. H. Lee and K. S. Lee, Iterative learning control applied to 108 J. W. Coulston, C. E. Blinn, V. A. Thomas and R. H. Wynne,
batch processes: An overview, Control Eng. Pract., Approximating prediction uncertainty for random forest
2007, 15(10), 1306–1318. regression models, Photogramm. Eng. Remote Sens.,
88 M. Barton, C. A. Duran-Villalobos and B. Lennox, 2016, 82(3), 189–197.
Multivariate batch to batch optimisation of fermentation 109 B. Lakshminarayanan, A. Pritzel and C. Blundell, Simple
processes to improve productivity, J. Process Control, and scalable predictive uncertainty estimation using deep
2021, 108, 148–156. ensembles, 2016, arXiv preprint arXiv:1612.01474.


110 J. Pinto, C. R. de Azevedo, R. Oliveira and M. von Stosch, A 128 A. Simpkins, System identification: Theory for the user, 2nd
bootstrap-aggregated hybrid semi-parametric modeling edition (ljung, l.; 1999) [on the shelf], IEEE Robot. Autom.
framework for bioprocess development, Bioprocess Biosyst. Mag., 2012, 19(2), 95–96.
Eng., 2019, 42(11), 1853–1865. 129 M. Verhaegen, Subspace techniques in system
111 W. Chu, S. S. Keerthi and C. J. Ong, Bayesian support vector identification, in Encyclopedia of Systems and Control,
regression using a unified loss function, IEEE Trans. Neural Springer, 2015, pp. 1386–1396.
Netw., 2004, 15(1), 29–44. 130 P. Van Overschee and B. De Moor, Subspace algorithms for
112 M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. the stochastic identification problem, Automatica, 1993, 29(3),
Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, 649–660.
U. R. Acharya, V. Makarenkov and S. Nahavandi, A review 131 T. Katayama, et al., Subspace methods for system
of uncertainty quantification in deep learning: identification, Springer, 2005, vol. 1.
Techniques, applications and challenges, arXiv, 2020, 132 A. Wills, T. B. Schön, L. Ljung and B. Ninness,
preprint, arXiv:2011.06225, https://arxiv.org/abs/ Identification of hammerstein–wiener models, Automatica,
2011.06225. 2013, 49(1), 70–81.
113 R.-R. Griffiths, A. A. Aldrick, M. Garcia-Ortegon and V. 133 S. Chen and S. A. Billings, Representations of non-linear
Lalchand, et al., Achieving robustness to aleatoric systems: the narmax model, Int. J. Control, 1989, 49(3),
uncertainty with heteroscedastic bayesian optimisation, 1013–1032.
Mach. Learn.: Sci. Technol., 2021, 3(1), 015004. 134 C. Gao, L. Jian, X. Liu, J. Chen and Y. Sun, Data-driven
114 A. Kendall and Y. Gal, What uncertainties do we need in modeling based on volterra series for multidimensional
bayesian deep learning for computer vision?, 2017. blast furnace system, IEEE Trans. Neural Netw.,
115 C. K. Williams and C. E. Rasmussen, Gaussian processes for 2011, 22(12), 2272–2283.
machine learning, MIT press Cambridge, MA, 2006, vol. 2. 135 M. Pottmann and D. E. Seborg, A nonlinear predictive
116 R. Turner and M. P. Deisenroth, Ml tutorial: Gaussian control strategy based on radial basis function models,
processes (richard turner). Comput. Chem. Eng., 1997, 21(9), 965–980.
117 M. Elie, Discovering hidden relationships in production 136 Q. Bi, W.-J. Cai, E.-L. Lee, Q.-G. Wang, C.-C. Hang and Y. Zhang,
data (EU2018 113), Discovery Summit Europe, JMP (SAS), Robust identification of first-order plus dead-time model from
Mar 2018. [Online; accessed 30. Jan. 2022]. step response, Control Eng. Pract., 1999, 7(1), 71–77.
118 V. Mattia and S. Salvador, DOE for World-Scale 137 G. P. Rangaiah and P. R. Krishnaswamy, Estimating second-
Manufacturing Processes: Can We Do Better? (2019-EU- order plus dead time model parameters, Ind. Eng. Chem.
45MP-073), Discovery Summit Europe, JMP (SAS), Mar 2019. Res., 1994, 33(7), 1867–1871.
[Online; accessed 30. Jan. 2022]. 138 S. Chen, S. A. Billings and P. Grant, Non-linear system
119 M. Shoukat Choudhury, V. Kariwala, N. F. Thornhill, H. identification using neural networks, Int. J. Control,
Douke, S. L. Shah, H. Takada and J. F. Forbes, Detection 1990, 51(6), 1191–1214.
and diagnosis of plant-wide oscillations, Can. J. Chem. Eng., 139 M. Forgione, A. Muni, D. Piga and M. Gallieri, On the
2007, 85(2), 208–219. adaptation of recurrent neural networks for system
120 W. L. Luyben, Snowball effects in reactor/separator identification, 2022.
processes with recycle, Ind. Eng. Chem. Res., 1994, 33(2), 140 L. Hewing, K. P. Wabersich, M. Menner and M. N.
299–305. Zeilinger, Learning-based model predictive control: Toward
121 D. van de Berg, T. Savage, P. Petsagkourakis, D. Zhang, N. safe learning in control, Annu. Rev. Control Robot. Auton.
Shah and E. A. del Rio-Chanona, Data-driven optimization Syst., 2020, 3, 269–296.
for process systems engineering applications, Chem. Eng. 141 K. Hornik, M. Stinchcombe and H. White, Multilayer
Sci., 2021, 117135. feedforward networks are universal approximators, Neural
122 Q.-G. Wang and Y. Zhang, Robust identification of Netw., 1989, 2(5), 359–366.
continuous systems with dead-time from step responses, 142 M. P. Deisenroth, R. D. Turner, M. F. Huber, U. D.
Automatica, 2001, 37(3), 377–390. Hanebeck and C. E. Rasmussen, Robust filtering and
123 H. Schaeffer and S. G. McCalla, Sparse model selection via smoothing with gaussian processes, IEEE Trans. Autom.
integral terms, Phys. Rev. E, 2017, 96, 023302. Control, 2011, 57(7), 1865–1871.
124 L. Ljung, Perspectives on system identification, Annu. Rev. 143 A. Damianou and N. D. Lawrence, Deep gaussian processes, in
Control, 2010, 34(1), 1–12. Artificial intelligence and statistics, PMLR, 2013, pp. 207–215.
125 M. Viberg, Subspace methods in system identification, IFAC 144 E. Snelson, C. E. Rasmussen and Z. Ghahramani, Warped
Proceedings Volumes, 1994, 27(8), 1–12. gaussian processes, Adv. Neural Inf. Process. Syst., 2004, 16,
126 K. J. Åström and P. Eykhoff, System identification–a survey, 337–344.
Automatica, 1971, 7(2), 123–162. 145 A. I. Cowen-Rivers, W. Lyu, R. Tutunov, Z. Wang, A. Grosnit,
127 F. Tasker, A. Bosse and S. Fisher, Real-time modal R. R. Griffiths, A. M. Maraval, H. Jianye, J. Wang, J. Peters
parameter estimation using subspace methods: theory, and H. B. Ammar, An empirical study of assumptions in
Mech. Syst. Signal Process, 1998, 12(6), 797–808. bayesian optimisation, 2021.


146 A. McHutchon and C. Rasmussen, Gaussian process 162 X.-C. Xi, A.-N. Poo and S.-K. Chou, Support vector
training with input noise, Adv. Neural Inf. Process. Syst., regression model predictive control on a HVAC plant,
2011, 24, 1341–1349. Control Eng. Pract., 2007, 15(8), 897–908.
147 R. T. Chen, Y. Rubanova, J. Bettencourt and D. Duvenaud, 163 K. Kavsek-Biasizzo, I. Skrjanc and D. Matko, Fuzzy
Neural ordinary differential equations, 2018, arXiv preprint predictive control of highly nonlinear pH process, Comput.
arXiv:1806.07366. Chem. Eng., 1997, 21, S613–S618.
148 S. T. Bukkapatnam and C. Cheng, Forecasting the evolution 164 S. Piche, B. Sayyar-Rodsari, D. Johnson and M. Gerules,
of nonlinear and nonstationary systems using recurrence- Nonlinear model predictive control using neural networks,
based local gaussian process models, Phys. Rev. E, IEEE Control Systems Magazine, 2000, 20(3), 53–62.
2010, 82(5), 056206. 165 J. Kocijan, R. Murray-Smith, C. E. Rasmussen and A.


149 S. L. Brunton, J. L. Proctor and J. N. Kutz, Discovering Girard, Gaussian process model based predictive control, in
Open Access Article. Published on 21 April 2022. Downloaded on 7/18/2024 9:03:24 AM.

governing equations from data by sparse identification of American Control Conference (ACC), IEEE, 2004, vol. 3, pp.
nonlinear dynamical systems, Proc. Natl. Acad. Sci. U. S. A., 2214–2219.
2016, 113(15), 3932–3937. 166 E. Bradford, L. Imsland and E. A. del Rio-Chanona,
150 Z. T. Wilson and N. V. Sahinidis, The alamo approach to Nonlinear model predictive control with explicit back-offs
machine learning, Comput. Chem. Eng., 2017, 106, 785–795. for gaussian process state space models, in 58th
151 D. Machalek, T. Quah and K. M. Powell, A novel implicit Conference on Decision and Control (CDC), IEEE, 2019, pp.
hybrid machine learning model and its application for 4747–4754.
reinforcement learning, Comput. Chem. Eng., 2021, 107496. 167 M. Maiworm, D. Limon, J. M. Manzano and R. Findeisen,
152 J. W. Myers, K. B. Laskey and T. S. Levitt, Learning bayesian Stability of gaussian process learning based output
networks from incomplete data with stochastic search feedback model predictive control, IFAC-PapersOnLine,
algorithms, 2013, arXiv preprint arXiv:1301.6726. 2018, vol. 51, 20, pp. 455–461, 6th IFAC Conference on
153 M. Raissi, P. Perdikaris and G. E. Karniadakis, Physics- Nonlinear Model Predictive Control NMPC 2018.
informed neural networks: A deep learning framework for 168 E. Bradford, L. Imsland, D. Zhang and E. A. del Rio
solving forward and inverse problems involving nonlinear Chanona, Stochastic data-driven model predictive control
partial differential equations, J. Comput. Phys., 2019, 378, using gaussian processes, Comput. Chem. Eng., 2020, 139,
686–707. 106844.
154 M. Raissi, P. Perdikaris and G. E. Karniadakis, Multistep 169 Z. Zhong, E. A. del Rio-Chanona and P. Petsagkourakis,
neural networks for data-driven discovery of nonlinear Data-driven distributionally robust mpc using the wasserstein
dynamical systems, 2018, arXiv preprint arXiv:1801.01236. metric, 2021.
155 L. Zhang and S. Garcia-Munoz, A comparison of different 170 X. Feng and B. Houska, Real-time algorithm for self-
methods to estimate prediction uncertainty using partial reflective model predictive control, J. Process Control,
least squares (pls): a practitioner's perspective, Chemom. 2018, 65, 68–77.
Intell. Lab. Syst., 2009, 97(2), 152–158. 171 C. A. Larsson, C. R. Rojas, X. Bombois and H. Hjalmarsson,
156 J. B. Rawlings, D. Q. Mayne and M. Diehl, Model predictive Experimental evaluation of model predictive control with
control: theory, computation, and design, Nob Hill Publishing excitation (mpc-x) on an industrial depropanizer, J. Process
Madison, WI, 2017, vol. 2. Control, 2015, 31, 1–16.
157 M. Kelly, An introduction to trajectory optimization: How to 172 B. Houska, D. Telen, F. Logist, M. Diehl and J. F. V.
do your own direct collocation, SIAM Rev., 2017, 59(4), Impe, An economic objective for the optimal experiment
849–904. design of nonlinear dynamic processes, Automatica,
158 E. A. del Rio-Chanona, N. R. Ahmed, D. Zhang, Y. Lu and 2015, 51, 98–103.
K. Jing, Kinetic modeling and process analysis for 173 D. Telen, B. Houska, M. Vallerio, F. Logist and J. Van Impe,
desmodesmus sp. lutein photo-production, AIChE J., A study of integrated experiment design for nmpc applied
2017, 63(7), 2546–2554. to the droop model, Chem. Eng. Sci., 2017, 160, 370–383.
159 M. Mowbray, P. Petsagkourakis, E. A. D. R. Chanona, R. 174 C. A. Larsson, M. Annergren, H. Hjalmarsson, C. R. Rojas,
Smith and D. Zhang, Safe chance constrained X. Bombois, A. Mesbah and P. E. Modén, Model predictive
reinforcement learning for batch process control, 2021, control with integrated experiment design for output error
arXiv preprint arXiv:2104.11706. systems, in 2013 European Control Conference (ECC), 2013,
160 E. Bradford and L. Imsland, Economic stochastic model pp. 3790–3795.
predictive control using the unscented kalman filter, IFAC- 175 S. Olofsson, M. Deisenroth and R. Misener, Design of
PapersOnLine, 2018, vol. 51, 18, pp. 417–422. experiments for model discrimination hybridising
161 Z. K. Nagy, B. Mahn, R. Franke and F. Allgöwer, Real-time analytical and data-driven approaches, in Proceedings of the
implementation of nonlinear model predictive control of 35th International Conference on Machine Learning, ed. J. Dy
batch processes in an industrial framework, in Assessment and A. Krause, Stockholmsmässan, Stockholm Sweden,
and Future Directions of Nonlinear Model Predictive Control, PMLR, 10–15 Jul 2018, vol. 80 of Proceedings of Machine
Springer, 2007, pp. 465–472. Learning Research, pp. 3908–3917.
