Reaction Chemistry & Engineering

REVIEW

Industrial data science – a review of machine learning applications for chemical and process industries†

Max Mowbray,a Mattia Vallerio,b Carlos Perez-Galvan,b Dongda Zhang,ac
Received 1st December 2021, Accepted 21st February 2022
DOI: 10.1039/d1re00541c
rsc.li/reaction-engineering

In the literature, machine learning (ML) and artificial intelligence (AI) applications tend to start with examples that are irrelevant to process engineers (e.g. classification of images between cats and dogs, house pricing, types of flowers, etc.). However, process engineering principles are also based on pseudo-empirical correlations and heuristics, which are a form of ML. In this work, industrial data science fundamentals will be explained and linked with commonly-known examples in process engineering, followed by a review of industrial applications using state-of-the-art ML techniques.
1 Introduction

The potential of data-driven applications in industrial processes has encouraged the industry to invest in machine learning teams, software, and infrastructure over the past years.1–3 Trying to mimic big technological companies whose profit is determined by better data-driven decisions than random ones (e.g. recommending films to watch or advertisements), process industries need to deal with the safety of such recommendations in a physical setting (rather than virtual) and the inevitable challenges imposed by the physicochemical and engineering constraints.4–6 In the same spirit of mimicking big tech companies, the IT challenge focuses on the cost, complexity, and security risk of moving process data to the cloud when in reality the majority of it is needed mainly locally.7 On the other hand, chemical companies are continuously looking

a The University of Manchester, Manchester, M13 9PL, UK
b Solvay SA, Rue de Ransbeek 310, 1120, Brussels, Belgium. E-mail: francisco.navarro@solvay.com
c Imperial College London, London, SW7 2AZ, UK
† Electronic supplementary information (ESI) available: Annex I illustrates how to use machine learning to find meaningful correlations between several sensors (tags). Annex II describes sources of uncertainty in more detail. Annex III provides a glossary for machine learning terms. See DOI: 10.1039/d1re00541c
This journal is © The Royal Society of Chemistry 2022 React. Chem. Eng., 2022, 7, 1471–1509 | 1471
at how to improve the environmental sustainability of their processes by better monitoring (maintenance) as well as yield and energy optimization. This begs the question: what are the machine learning applications that have worked so far in this Industry 4.0 revolution? What are the biggest challenges the industry is facing?

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. Open Access Article. Published on 21 April 2022.

From a historical perspective, after the 1980s and 1990s, a new wave of technological innovations reflected by developments such as expert systems and neural networks promised to revolutionize the industry.8 Recently, applications long marked as 'grand challenges' have observed significant breakthroughs. For example, a solution (AlphaFold) for the task of protein structure prediction was recently proposed at CASP14, which was able to predict test protein structures with 90% accuracy. The solution could potentially provide a basis for future medical breakthroughs.9 Similar breakthroughs have been made in short-term weather prediction.10 Current hardware and telecommunications cost, as well as access to powerful software (either proprietary or open-source), has undoubtedly lowered the barriers to the realization of such advances. However, it is not trivial to balance the value and the cost-complexity of developing a reliable machine learning solution, which can be trusted and maintained in the long term. Thus, are these ML solutions
really needed in the process industries? Or are we sometimes reinventing the wheel without knowing?

There is a common consensus in the literature4,8,11 that addresses how:
• applying machine learning techniques without the proper process knowledge leads to correlations that can be either obvious or misleading.
• data science training for engineers can be more effective than educating data scientists in engineering topics.

models (e.g. regularization and other penalized methods in machine learning), should be adopted. To better understand these similarities, let us revisit the main types of machine learning: supervised, unsupervised, and reinforcement learning.

2.1 Supervised models

If the desired output or target is known (labeled) or measured,
2.2 Unsupervised models

Instead of predicting a label or a measurement, the desired outcome of these models is to identify patterns or groups which remained previously unknown.

The simplest form of an unsupervised model is, for example, a control chart (see Fig. 3). In statistical process control, measurements are categorized into two groups (in-control or out-of-control) by tracking how distant they are from the

data-driven techniques are being used to discover and predict flow patterns (see Fig. 4) in microfluidic applications,19 as well as turbulent and porous flows.20,21

More generally in process engineering, dimensionality reduction naturally occurs with redundancy or excess of sensors as well. For example, if several thermocouples are used to measure a critical temperature, these can be summarized by taking the average of all the sensor readings. The average is a linear combination of all these terms with equal weight. This
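Both ideas can be illustrated together in a short sketch (entirely synthetic data; the tag values, limits, and upset are hypothetical): a Shewhart-style control chart flags out-of-control points by their distance from the in-control mean, and the equal-weight average of redundant thermocouples is essentially what the first principal component recovers.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulated critical temperature (deg C): steady, then a late upset
t_true = np.full(200, 80.0)
t_true[180:] += np.linspace(0.0, 5.0, 20)

# Four redundant thermocouples = shared signal + independent sensor noise
sensors = t_true[:, None] + rng.normal(0.0, 0.3, size=(200, 4))

# Control chart on the equal-weight average: flag points beyond
# mean +/- 3*sigma, with limits estimated from the in-control history
avg = sensors.mean(axis=1)                 # linear combination, equal weights
mu, sigma = avg[:150].mean(), avg[:150].std()
out_of_control = np.abs(avg - mu) > 3 * sigma

# PCA recovers essentially the same summary: with redundant sensors the
# first principal component is close to an equal-weight combination
pca = PCA(n_components=1).fit(sensors)
weights = pca.components_[0]
print("upset flagged:", bool(out_of_control[-1]))
print("PC1 weights:", np.round(weights, 2))
```

The near-equal PC1 weights make the point of the text: with truly redundant sensors, the "learned" summary and the engineer's plain average coincide.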
3 Industrial applications in manufacturing
Fig. 8 Classification of industrial data applications where offline analysis is commonly conducted to diagnose the problem being addressed, with the solution later implemented online.

Fig. 9 Common modeling steps using an industrial data set with hundreds of tags and a well-defined target (e.g. yield of the process). First, a screening of variables and selection of tags (sensors) using random forest (a). Many tags will end up being weakly correlated to the target, perhaps trying to explain its noise. By adding known noise as an additional tag(s), the selection of tags with a certain contribution is facilitated. Then, a decision tree is used to obtain a robust, non-linear but interpretable model (b). Finally, neural networks (c) are applied, once the data is cleaned and better understood, to capture all the non-linearities present in the data.
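The noise-tag screening step described in the Fig. 9 caption can be sketched as follows (entirely synthetic data; the two informative tags and the 2x margin are illustrative assumptions, not taken from the case study):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 500

# Hypothetical historian extract: 10 tags, only the first two informative
X = rng.normal(size=(n, 10))
process_yield = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.5, n)

# Append a tag of pure noise as the screening reference
X_aug = np.column_stack([X, rng.normal(size=n)])

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_aug, process_yield)
importances = rf.feature_importances_
noise_importance = importances[-1]

# Keep only tags that clearly beat the known-noise tag
selected = [i for i in range(10) if importances[i] > 2 * noise_importance]
print("selected tags:", selected)
```

Tags whose importance is comparable to the known-noise tag are most likely fitting noise themselves and can be dropped before the decision-tree and neural-network stages.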
[scikit-learn37] as well. The working principle of these modeling techniques needs to be understood to avoid common mistakes when dealing with time-series data. For example:

• Interpreting the contribution of the predictors as important towards the process design or process control. For example, the design of a reactor impeller might be critical in explaining the average quality of a product. However, if the impeller is not changed in operation, from a machine learning perspective it is not important at all. Conversely, if the current consumed by the motor was changing due to an increase/decrease of viscosity, then the current can appear as a predictor.

• Similarly, without considering the process knowledge and process dynamics, it is likely to confuse correlated effects that can be consequences instead of causes. In this regard, it is common to find measured disturbances or manipulated variables higher in the contribution. With chemical processes designed to keep critical process variables under control, inexperienced analysts will fail to interpret supervised and unsupervised analysis based on variability (e.g. the cooling flow rate in a jacketed reactor appears more important than the reactor temperature itself, which is always constant).

• Not managing outliers, shutdowns, and other singularities in the data. As explained above, tree-based models are robust techniques for screening predictors as they partition the data independently from its distribution. Yet, the predictors will try to explain the major sources of variability, which might be meaningless (e.g. shutdowns can be explained with pump current). The use of robust statistics, using, for example, medians or interquartile ranges instead of averages and standard deviations, is a simple way to filter
singular data events. However, outliers might carry crucial information as well (e.g. why the yield dropped at those specific time stamps). In this regard, gradient boosted trees are an alternative, as they increase the importance of those points that could not be explained with prior models (see section 3.3.1 for more discussion).

• By default in most common algorithms, data samples are assumed to be independent of each other. This assumption can be true if each sample contains information from batch-to-batch or during steady-state conditions. In the majority of cases, data pre-processing will be required to remove periods where time delays, dead-times, lags, and other process dynamics perturbations affect the target temporarily.38 Section 3.4 will describe the applications of machine learning for dynamic systems and process control. In any case, a proper time-split of the dataset between training/validation/test is needed to decrease the risk of models that were useful in the past only (they only learned how to interpolate the data).

3.1.1 Model interpretation and explainable AI. During diagnostics, machine learning models are primarily used as screening tools to identify which inputs (tags) are affecting the target of interest. For example, support vector machines (SVM) can also be used to improve process operations similarly to decision trees.39,40 Pragmatically, several models with their tuning parameters can be fitted (known as autoML).41,42 What is still relevant is: what question to ask the data, how to avoid over-fitting, and the use of explainable AI43 (data-driven techniques to interpret what more complex ML models are able to capture; see Fig. 10 as an example). For example, resampling inputs while maintaining their distribution (a.k.a. shuffling) will have a measurable impact on the prediction results. Given the non-linear interactions in the model, the interpretation of multidimensional local perturbations requires high-order polynomials,44 or even tree-based models can be used to approximate the response of a more complex model. The latter approach, known as TreeSHAP (SHapley Additive exPlanations), has gained popularity in the ML community as it is starting to be applied in manufacturing environments.45–48
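The input-shuffling idea just described is available off the shelf in scikit-learn as permutation importance; a minimal sketch on synthetic data (the tags and target function are invented for illustration, and a chronological split mirrors the time-split advice above):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Five synthetic tags; only the first two drive the target
X = rng.normal(size=(400, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.1, 400)

# shuffle=False keeps the chronological order (a time-split, as advised above)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, shuffle=False)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Shuffle one input at a time (its marginal distribution is preserved) and
# measure the drop in test score - the resampling idea described above
result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"tag {i}: {imp:+.3f}")
```

Inputs whose shuffling barely moves the score contribute little to the fitted model, regardless of how non-linear the model itself is.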
Fig. 11 A transition between two steady-state regimes for the Tennessee Eastman process (simulated data49) is detected using PCA. If the model is built using historical data before the perturbation (a), the step changes in the feed flow of chemical A (b and c) are found in the current dataset for the points highlighted in blue. If all of the historical data is used to build the model (d), the contribution of recent data points in blue (e) shows signals close to random noise. The plant-wide control in the simulation stabilizes the control loops and anomalies are only seen in the transition period, even though the plant is operating in a different state for chemical A.
Although classical statistical process control methods are out of the scope of this work, they should not be disregarded as a powerful way to provide descriptive statistics that can ease day-to-day decision-making in operations with little technological effort.

For example, in Fig. 11 diagnostic plots for the PCA-based multivariate control chart identify a large step change in the flow of a reactant into the reactor. This affects many variables across the Tennessee Eastman process plant, which are brought back to their original control limits, with the exception of the chemical A feed flow variable, where the step change was introduced (details can be found in ref. 49 and 50).

The addition of machine learning analysis using, for example, recent dimensionality reduction techniques adds another layer of powerful visualizations that can enhance monitoring activities. The reader is referred to Joswiak et al.,51 who recently published examples visualizing industrial chemical processes both with classical approaches (PCA and PLS) and with more recent and powerful techniques in machine learning (UMAP52 and HDBSCAN,53,54 particularly). The main advantage of these state-of-the-art techniques is the better separation (dimensionality reduction) and classification (clustering) of events when dealing with non-stationary multivariate processes (see Fig. 12). However, if processes are under control, PCA/PLS-based techniques provide faster, less complex, and more interpretable insights (e.g. understanding variable contributions for linearized systems). Isolation forests have also been explored in order to detect and explain sources of anomalies in industrial datasets.55

Fig. 12 A transition between two steady-state regimes for the Tennessee Eastman process (simulated data49) visualized with (a) PCA and (b) UMAP.52 UMAP is able to better reduce the number of tags into two dimensions. The reader is referred to ref. 51.

Autoencoders are a type of neural network (see Fig. 13) where the aim is to learn a compressed latent representation of the input in the hidden layers. The amount of information that these latent dimensions express is maximized by trying to recover the information given (notice that inputs and outputs in the neural network are the same in Fig. 13a). By restricting the neural network to a reduced number of intermediate nodes (i.e. latent dimensions), intrinsic and not necessarily linear correlations are found in order to minimize the prediction error (Fig. 13b). This way, the variability and contribution of noisy inputs will only appear if a higher number of nodes is used (similar to having a higher number of principal components). Reducing the number of redundant sensors to look at while capturing the system dynamics is a necessary step for realistic industrial data
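A bottleneck autoencoder of this kind can be sketched with an off-the-shelf multilayer perceptron trained to reproduce its own input (synthetic data; six redundant "sensors" driven by two hidden latent states, with an assumed 2-node bottleneck):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)

# Six "sensors" driven by two underlying latent process states + noise
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 6))
X = StandardScaler().fit_transform(latent @ mixing + 0.05 * rng.normal(size=(1000, 6)))

# Autoencoder: the network is trained to reproduce its own input through
# a 2-node bottleneck (encoder -> latent layer -> decoder)
ae = MLPRegressor(hidden_layer_sizes=(8, 2, 8), activation="tanh",
                  solver="lbfgs", max_iter=2000, random_state=0)
ae.fit(X, X)

reconstruction_error = np.mean((ae.predict(X) - X) ** 2)
print(f"mean reconstruction error: {reconstruction_error:.3f}")
```

Because the six standardized sensors are driven by only two latent states, a 2-node bottleneck reconstructs them well; widening the bottleneck would start reproducing the sensor noise as well, mirroring the principal-components analogy in the text.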
reduce the dimensionality of the data by restricting the number of nodes in the middle layers. The transition between two feeding

throughput via energy or mass balances (see Fig. 16a). In machine learning terminology this is covered by the feature
This journal is © The Royal Society of Chemistry 2022 React. Chem. Eng., 2022, 7, 1471–1509 | 1479
View Article Online
Fig. 15 This figure shows a simplified schematic representation of training (a) and use (b) of generative adversarial networks (GANs) for anomaly detection on time-series data. Generator (G) and discriminator (D) models are trained through iterations based on the performance feedback of the D model. Both models compete until satisfactory performance is achieved. Then, the D model can be used as an online classifier for anomaly detection (bottom scheme).

Fig. 16 Compressor characteristic (a) and spectrogram (b) are two traditional approaches to detect inefficient or anomalous operating modes. These calculations can be considered feature engineering to be combined with statistical or machine learning methods.
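As a sketch of the spectrogram-based feature engineering of Fig. 16b (synthetic vibration signal; the sampling rate, tone frequencies, and fault onset are assumed purely for illustration):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 1000.0                         # vibration sampling rate in Hz (assumed)
t = np.arange(0.0, 5.0, 1.0 / fs)

# Healthy tone at 50 Hz; a 120 Hz component appears halfway through
signal = np.sin(2 * np.pi * 50 * t)
signal[t >= 2.5] += 0.8 * np.sin(2 * np.pi * 120 * t[t >= 2.5])

f, seg_t, Sxx = spectrogram(signal, fs=fs, nperseg=256)

# Engineered feature: band power around 120 Hz per time segment, ready to
# feed a statistical or machine learning monitoring model
band = (f > 100) & (f < 140)
band_power = Sxx[band].sum(axis=0)
print("band power before vs after onset:",
      band_power[seg_t < 2.0].mean(), band_power[seg_t > 3.0].mean())
```

The per-segment band power is exactly the kind of physically meaningful derived tag that can then be fed to a control chart or classifier instead of the raw waveform.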
between huskies or wolves as a function of snow in the background.73 As with the snow, consequences are often stronger or simpler predictors than perhaps other features that process experts were listing as root-causes only. For this reason, soft sensor models need to be approached separately, as their main objective is only to provide online estimation and monitoring of quality, yield, and lab measurements.

As with other online sensors (e.g. NIR, near-infrared sensors), soft sensors require calibration and maintenance to ensure acceptable levels of accuracy and precision. In that regard, several techniques exist to handle prior knowledge or the lack of it (this being a form of uncertainty). An industrial example that illustrates the challenges when building soft sensors for continuous processes can be found in ref. 74. Its analysis (as detailed in ref. 75) combines data preparation, anomaly detection, multivariate regression, and model interpretability, as discussed so far in this manuscript.

In this section, we will focus our discussion on estimating quality or yield for batch processes, which represents an additional challenge from the data analytics perspective.

3.3.1 Discrepancy models and boosting. Consider that in a production process, it is often desired to infer the end-quality of the product. For example, in ref. 76, the authors discuss the merits of monitoring melt viscosity, temperature profile, and flow index as indicators of product quality in the context of polymer processing. As a result, soft sensors may be constructed to infer these qualities from other available process measurements (such as screw speed, die melt temperature, feed rates, and pressures) either via first-principles, data-driven, or hybrid modeling approaches.

Hybrid or grey-box models are commonly known in the literature.77–79 A combination of data-driven models with first-principles models can remove variability or capture unknown mechanisms, e.g. discrepancy models.16 For example, if a heat or mass balance can foresee issues in quality or productivity, predictors that are part of these terms will be immediately found. Simply removing them from the input list will not change the variability of the target, so a better approach is to focus on explaining the residuals. For example, if an oscillation in the yield is found to be correlated to seasons due to better/worse cooling in winter/summer, it will be better to remove such an effect from the target (not from the list of inputs) and refocus the analysis on the remaining and unexplained variability. This is what boosted tree models achieve in machine learning (see Fig. 18), and the same approach can be used in neural networks as mentioned in Fig. 14.
coordinate variables that capture variability seen during the batch. In the image, batch curves can be described again using a combination of components 1 and 2.

scores can be thought of as the "amount" of each characteristic functional component that there is in each function (batch).

FPCA requires the alignment of batches to remove variability in the time axis. Some reaction phases can take longer due to different kinetics or simply waiting times due to scheduling decisions. On some occasions, using conversion instead of time will automatically align the batches. When this information or other variables such as automation triggers81 are not measured or unknown, dynamic time warping (DTW) techniques can be used to statistically align the batch trajectories (Fig. 21).80,82,282 DTW can also be used to classify anomalous batches and to identify correlating parameters (Fig. 22).82–86

3.3.2.1 Iterative learning control. Generally, the model construction process and estimation of uncertainty are subject to a finite amount of data, which can lead to over- or under-estimation. Sampling and bootstrap techniques (see next section) can be used to handle such a scenario, and this is often useful in estimating the underlying distribution of the data empirically. Various iterative-learning (control) methods also exist that help to adapt model

reports many examples of data-driven and first-principles models, in the context of polymer processing, that are able to successfully predict the desired property (e.g. melt viscosity, temperature profile, and flow index). More widely, this is primarily due to well-established statistical practices, as encompassed by data reconciliation and validation approaches,90,91 model selection, validation tools,92 data assimilation practice,93,94 and the field of estimation theory (which is generally concerned with identifying models of systems from data).95,96

In the following, we discuss data-driven techniques to briefly illustrate a general approach to reduce redundant tags with similar effect size and quantify the historical variability or uncertainty, to provide insight into possible future process conditions.

3.3.3.1 Effect size, variable, and model selection. Data-driven models are, by definition, determined by the selection of inputs and outputs. In the previous section, synthetic noise inputs were intentionally used as additional variables to find and remove those tags which showed a similar contribution towards the target.35,36 The idea behind this is that the model starts using noise as a predictor once overfitting has been reached. Another similar approach, known as dropout,97 consists of removing model parameters during training, which will also take care of redundant sensors that would appear as co-linear factors in screening models. Alternatively,
Fig. 21 Alignment of several batches using the temperature profile and dynamic time warping (a before and b after the alignment).
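For illustration, the core of DTW is a short dynamic program; the sketch below aligns two synthetic batch temperature profiles run at different speeds (a textbook implementation, not the exact algorithms of refs. 80 and 82):

```python
import numpy as np

def dtw_path(a, b):
    """Textbook O(len(a)*len(b)) dynamic-programming DTW alignment."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack to recover the warping path (pairs of aligned indices)
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        steps = [s for s in [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
                 if s[0] >= 0 and s[1] >= 0]
        i, j = min(steps, key=lambda s: cost[s])
    return cost[n, m], path[::-1]

# Two "batches" with the same temperature profile run at different speeds
batch_a = 20.0 + 60.0 * np.linspace(0.0, 1.0, 50) ** 2    # fast batch
batch_b = 20.0 + 60.0 * np.linspace(0.0, 1.0, 80) ** 2    # slow batch

distance, path = dtw_path(batch_a, batch_b)
print("warped distance:", round(distance, 1))
print("first/last aligned pairs:", path[0], path[-1])
```

The returned path pairs each sample of the fast batch with the matching phase of the slow one, which is exactly the alignment shown schematically in Fig. 21.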
Fig. 26 Uncertainty can be estimated by comparing a model (or sample statistic) with its simulated distribution using resampling techniques. For

expressed in the finite amount of data and is a well-known practice within the domain of model construction.92
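The resampling idea in the Fig. 26 caption reduces to a few lines: bootstrap the historical sample, recompute the statistic many times, and read confidence limits off the simulated distribution (synthetic yield data; the numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# 40 historical yield measurements (synthetic, in percent)
yields = rng.normal(92.0, 1.5, size=40)

# Bootstrap: resample with replacement and recompute the statistic many
# times to simulate its sampling distribution
boot_means = np.array([
    rng.choice(yields, size=len(yields), replace=True).mean()
    for _ in range(5000)
])

# Percentile confidence interval for the mean yield
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {yields.mean():.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The same recipe applies to any statistic (medians, model coefficients, control limits) for which a closed-form sampling distribution is unavailable.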
Fig. 27 Figurative description of the Bayesian approach to expressing modeling uncertainty in neural networks. The top two subplots show the covariance between two parameter distributions in the first and second layers of the network, respectively. The bottom subplot demonstrates the generation of a predictive distribution by Monte Carlo sampling of the parametric distributions identified via approximate Bayesian inference.
Fig. 28 Expression of a Gaussian process posterior (i.e. its mean and uncertainty predictions) for the modeling of a smooth noiseless function.
The figure demonstrates the effects of an increasing number of data points: a) 5 data points, b) 6 data points, c) 7 data points. Note how as the
number of data points increases, the uncertainty estimate (i.e. the 95% confidence interval) reduces and the mean GP prediction becomes a better
estimate of the ground truth.
estimate of the uncertainty.40,108 This has been demonstrated in ANNs,109 hybrid approaches,110 and random forest models (see annex),108 amongst others.

Another approach to training ANNs is provided by the Bayesian learning paradigm. Bayesian neural networks (BNN) share the same topology as conventional neural networks, but instead of having point estimates for parameters, they have a distribution over parameters (Fig. 27). Treating the network parameters as random variables then allows for the generation of a predictive distribution (given a model input) via the Monte Carlo method. Similarly, Bayesian extensions to other models such as support vector machines (SVMs)111 exist.

One eloquent approach is to identify a predictive model that expresses both a nominal and an uncertainty prediction in closed form.108,112 However, unlike the Bayesian paradigm, this approach produces an uncertainty estimate of the underlying data (i.e. the natural variance of the underlying data-generating process, otherwise known as aleatoric uncertainty113) and is not reflective of the uncertainty arising from the lack of information (or data, otherwise known as epistemic uncertainty114) used to train the model.

Gaussian processes (GPs) are non-parametric models, which means that the model structure is not defined a priori. This provides a highly flexible model class, as GPs enable the information expressed by the model to grow as more data is acquired. In GPs, given a model input, one can directly construct a predictive distribution (i.e. a distribution over target variables) analytically via Bayesian inference and exploitation of the statistical relationships between datapoints. Further, the uncertainty estimate of a GP expresses both aleatoric and epistemic uncertainty. The latter is reducible upon receipt of more data, but the former element is irreducible. This is expressed by Fig. 28.

In the scope of practical use, it should be noted that the computational complexity of GPs grows cubically with the number of datapoints, so they either become intractable with large datasets or require the use of approximate Bayesian inference (as performed in variational GPs). For more detailed information on the mathematics underlying GPs, we direct the reader to ref. 115, and for an introductory tutorial, we recommend ref. 116.

3.4 Process control and process optimization

Despite functioning in narrow operational regions, process dynamics need to be considered if the aim is to use predictive models for control applications that are not maintained strictly at steady-state conditions (i.e. main flows and levels are fairly stable38,117,118).

System inertia or residence time (in chemical engineering), response time or time constant (in process control), and autocorrelation (in time-series models) are different characteristics of dynamical systems. For example, transportation delay (also known as dead-time) will hinder any conclusion drawn from pure correlation analysis (e.g. upstream changes affecting the target hours or days later). In addition, applications of machine learning modifying operation parameters need to monitor the presence or creation of plant-wide oscillations given closed-loop process control or the presence of recycling streams.119,120

In this section, we now explore the use of data-driven methods not only as monitoring or supervisory systems, but for their direct application in process control and optimization. In both cases, we are concerned with the identification of a dynamical system. For a more specific discussion regarding state-of-the-art, data-driven derivative-free approaches to optimization, we direct the interested reader to ref. 121.

3.4.1 Dynamical systems modeling and system identification. A simplified problem statement for the modeling of dynamical systems is: given a dataset of process trajectories that express temporal observations of the system state variable, x, and control inputs, u, identify either a function, fd, expressive of a mapping between system inputs and states at the current time index, t, and states at the next time index, t + 1, or a function, fc, that describes the total derivative of the system state with respect to time, as well as
Fig. 29 A second-order linear dynamical system with (a) one observed state, y(t), and (b) control input, u(t). The discrete evolution of y(t + 1) can be approximated as a function of the cumulative sum (cusum) of the state (over a past horizon) and the most recent control input, instead of simply using the previous measurement. A comparison is shown in subfigure (c) – cusum in red vs. most recent state in green. The cusum is thought to properly account for the inertia of the system,122,123 whereas using the most recent state produces an essentially memoryless model. Training, validation, and test datasets are partitioned and evaluated using multi-step-ahead prediction (recurrent) from an initial condition (d).
a mapping descriptive of the mechanism of system observation, g. A general definition of discrete-time process evolution and observation is provided as follows:

x_{t+1} = f_d(x_t, u_t) + w_t  (System model)  (1a)

y_t = g(x_t, u_t) + e_t  (Measurement model)  (1b)

where y_t is the measured variable, x_t is the real system state, w_t is an additive system disturbance, and e_t is typically zero-mean Gaussian noise. An example of such a system is shown in Fig. 29, which shows a second-order system. The measured output y(t + 1) is, therefore, a function of u(t) but also of the inertia of the system. This is implicit and observed through the evolution of the state variable, x(t), which in this example corresponds to the measured y(t).

There are two primary approaches to the identification of such a function – first principles (white-box) and data-driven modeling (black-box). Generally, the benefits of first-principles approaches arise in the identification of a model structure, which is based on an understanding of the physical mechanisms driving the process. This tends to be highly useful when one would like to extrapolate away from the region of the process dynamics seen in the data. Given the remit of this paper, we focus on data-driven modeling approaches.

Particularly when interest lies in control applications, data-driven modeling of dynamical systems has been ruled by the field of system identification (SI). SI lies at the intersection of probability theory, statistical estimation theory, control theory, design of experiments, and realization theory. It follows then that the traditional ethos of SI, in the domain of PSE, constructs models that a) entail tractable parameter identification (i.e. that this estimation procedure is at the very least identifiable, but more preferably convex or analytical),124 b) are convenient for further use in process control and optimization, and c) apply the concept of Occam's razor.125 As a result, this means that the models identified in classical SI are often linear in the parameters,126 i.e. that process evolution can be described as a linear combination of basis functions of the system state and control input.‡ It is also worth emphasizing that such a class of models can still express nonlinearities, whilst typically gaining the ability to conduct estimation online, due to the efficiency of the algorithms available.127 As a result, these techniques are applied not only in the process industries, but are also widely used in navigation and robotics.128

Given the narrow operational region of the process industries, the field has historically been dominated by the prevalence of linear time-invariant (LTI) models of dynamical systems. The general idea here is to construct the evolution of the state (i.e. fd or fc), as well as its observation (i.e. g), as a linear combination of the current state and control input. The field of SI pioneered the efficient identification of the associated model parameters, θLTI, through the development of subspace identification methods.129 One of the foundational methods, provided independently by Ho and Kalman (and others), leverages the concepts of system

‡ Note that, when the basis function selected is linear, the control will be able to guarantee stability, reachability, controllability, and observability.
1486 | React. Chem. Eng., 2022, 7, 1471–1509 This journal is © The Royal Society of Chemistry 2022
controllability and observability to identify θLTI in closed form, given measurements of the system state in response to an impulse control input signal. The insight provided by this method is that the singular value decomposition (SVD) of the block Hankel matrix (composed of the output response) provides a basis decomposition equivalent to the controllability and observability matrices. This ultimately enables the identification of θLTI via a solution of the normal equations – hence mitigating the requirement for gradient-based (iterative search) optimization algorithms. Clearly, a number of assumptions are required from realization theory and on the data generation process. However, a body of algorithms has been developed since to account for stochasticity130 and other input signals.131

Given the relatively restrictive nature of LTI, innovative model structures and various modeling paradigms have been exploited in order to approximate systems (common to PSE) that exhibit nonlinear or time-delay behavior. From the perspective of tackling nonlinearity, parametric and non-parametric models include (but are certainly not limited to) the Hammerstein and Wiener models and their structural variants,132 polynomials, nonlinear autoregressive models,133 and various kernel methods, such as Volterra series expansion models134 and radial basis functions.135 There have also been a number of methods developed to handle approximation of processes with time delay, such as first-order plus dead time (FOPDT)136 and second-order plus dead time (SOPDT) systems,137 as well as nonlinear autoregressive moving average models with exogenous inputs (NARMAX).133 Given the number and diversity of the models firmly rooted within the SI toolbox, as well as the inevitable sources of uncertainty arising in the construction of models, many of the same model validation practices are employed in SI as were discussed in section 3.3.3.124 With respect to parameter estimation, many algorithms have been developed to identify the associated model parameters in closed form. However, arguably, the more expressive or unconstrained the model structure becomes, the greater the dependence of parameter estimation on search-based maximum likelihood routines (otherwise known as the prediction error method (PEM) in the SI community). Perhaps the most obvious example of this is the training of neural networks, which are commonplace within the SI toolbox.138

3.4.2 Machine learning for dynamical systems modeling. The mention of neural networks seems to have brought us full circle to the field of machine learning (ML). It is therefore a good idea to make the point that ML and SI are not so distinct as one may think. In fact, both fields are deeply rooted in statistical theory and estimation practice. Perhaps the overarching difference between traditional ML and SI is that the developments of ML are somewhat unconstrained by the concerns relevant to SI. These concerns primarily relate to the use of the models derived for the purposes of control and optimization. However, there is a certain symbiosis observed currently in the advent of many learning-based system identification139 and control algorithms.140 A particular example is provided by reinforcement learning, the general process of which can be conceptualized as simultaneous system identification and learning of control and optimization. Further discussion of reinforcement learning is provided in section 3.4.5. In the following, we outline the second (and emerging) approach to data-driven modeling of dynamical systems as provided by the field of ML.

In keeping with the previous discussion, again in the ML paradigm, one can identify either discrete dynamics fd or continuous dynamics fc. However, what the use of ML implies is the availability of a large, diverse, and highly flexible class of models and estimation techniques (i.e. one can select from supervised, unsupervised, and reinforcement learning approaches). Hence, selection of a) the most appropriate model type, b) structure, c) use of features (model inputs and outputs), d) training algorithm and e) partitioning of data and model evaluation metric can only be guided by cross-validation techniques, domain knowledge and certain qualities of the data available. In some sense, this prevents the admittance of general recommendations. However, in the following paragraphs, we explore some ideas as gathered from experience.

• Selection of model type: clearly, for certain systems, a given model class will be more effective at modeling the associated dynamics than others. For example, if the system observes smooth, Lipschitz continuous behavior (e.g. as is generally the case if no phase transition is present in the process), and we are interested in identifying discrete dynamics fd, then the use of neural networks141 and Gaussian processes142 is particularly appealing, primarily because of the existing proofs pertaining to the universal approximation theorem, which considers continuous functions. If the data expresses discontinuities (as would be the case if generated from a process observing phase transitions), then perhaps the use of decision tree-based models would be more effective (as these models can be conceptualized as a weighted combination of step functions – although it should be noted that e.g. random forest models are often poor at generalizing predictions for the very same reason). Similarly, if the process dynamics are nonstationary, then perhaps the use of e.g. deep Gaussian processes143 would be more desirable, given the inability of single Gaussian processes to express nonstationary dynamics (given selection of a stationary covariance function). Alternatively, one could retain the use of GPs but instead consider the use of either input or output warping, which has been shown to remedy issues caused by non-stationarity among other features of the data available.144,145 Various other extensions for GPs also exist.146 If one would like to express continuous dynamics fc, then two approaches could be considered. Either one could predict the parameters of a mechanistic or first-principles model conditional to different points in the input space (i.e. construct a hybrid model), using a neural network, Gaussian process, etc.;79 or one could take the approach provided by neural ordinary differential equation (neural ODE) models,147 which directly learn the total derivative of the system. Despite the suitability of a given model class to a given dynamical system, innovative algorithms
can be conceptualized to handle the perceived weakness of a given model class to the problem at hand. For example, returning to the problem of nonstationary dynamics, one could conceivably partition the input space and switch between a number of Gaussian process models (with stationary covariance functions) depending on the current state of the system.148

• Selection of model structure: the choice of model structure pertains to decisions regarding the hyperparameters of a given model. For example, in polynomial models, the identification of higher-order terms describes the effects of interaction between input variables (i.e. enables the expression of nonlinear behavior). Similar considerations also apply when choosing activation functions in neural networks. Such a problem is not trivial, and even under the choice of the correct (parametric) model class, the predictive performance is often largely dependent on the quality of structure selection. At a high level, such a problem is negated in the setting of non-parametric models, or more specifically in the case of Gaussian processes. However, consideration is still required in the appropriate selection of a covariance function. This has led to the development of automated algorithmic frameworks, as demonstrated by algorithms such as sparse identification of nonlinear dynamics (SINDy),149 ALAMO150 and various hyperparameter optimization frameworks.41

• Selection of features: it is important to emphasize the use of feature selection (relating both to the input and output of the model). Perhaps the most important feature selection (in relation to the model input) is the determination of those process variables which have physical relationships to those states whose evolution we are interested in predicting. This is enabled both by operational knowledge as well as by building decision tree-based models on the data available and then conducting further analysis to identify important process variables.92 Further, even in systems that are assumed to be Markovian (i.e. where the dynamics are governed purely by the current state of the system and not by the past sequence of states), it is often the case that predictive capabilities are enhanced by the inclusion of system states at a window of previous time indices, or of incremental changes in the state. Intuitively, such an approach provides more information to the model. A similar idea exists in the use of a cumulative sum of past states over a horizon.122,123 Similarly, in the context of output feature selection and predicting discrete dynamics fd, one could construct a model, fΔ, to estimate the discrete increment in states between time indices (such that xt+1 = xt + fΔ(xt, ut)), which strikes similarities to the (explicit) Euler method. It is thought that the comparative advantage of such a scheme (over xt+1 = fd(xt, ut)) is that the information provided by the previous state is maximised. Recent work has developed this philosophy further via Runge–Kutta (RK) and implicit trapezoidal (IT) schemes,151 demonstrating that both schemes are able to predict stiff systems well (with the IT scheme performing better, as one would expect).

• Selection of training algorithm: this primarily quantifies the means of parameter estimation, i.e. the optimization algorithm, and, by extension, the statistical estimation framework used to formulate the inverse problem.152 Definition of the former typically considers the dimensionality of the parameter space, as well as the nonlinearity and differentiability of the model itself. Meanwhile, the latter is governed by the decision to operate within either a Bayesian or frequentist framework (e.g. see discussion in the uncertainty appendix), which subsequently gives rise to an appropriate loss function for estimation (e.g. MSE). Further decisions regarding the addition of regularization terms into the loss function may also be considered. Recent works in the domain of physics-informed deep learning aim to extend the traditional bias-variance analysis to regularise predictions to satisfy known differential equations.153 This appears to be a promising approach to incorporate physical information into ML models beyond traditional hybrid modeling approaches; however, it is generally not known how well these approaches perform when assumptions regarding the system's behavior are inaccurate (i.e. depart from ideal behavior). The selection of a statistical estimation framework also has implications for the expression of various model uncertainties, as discussed previously in section 3.3.3.4. Clearly, uncertainties are important to consider (and propagate) in the (multi-step ahead) prediction of dynamical systems. Secondary to the points discussed, the training algorithm should also consider the ultimate purpose of the model. For example, if we are looking to make predictions for 'multiple steps' or many time indices ahead (e.g. predicting xt+3 = fd(fd(fd(xt))) from some initial state, xt), one should consider how the training algorithm can account for this (see ref. 154), as it is an extension of the previous problem of identifying discrete dynamics. This can also be approached by considering the selection of model structure and features (e.g. directly predicting multiple steps ahead).

• Selection of data partition and model evaluation metric: the blueprint for model training (i.e. training, validation, and testing92) necessitates the appropriate partitioning of data into respective sets. It is important in dynamical systems modeling that the datapoints for validation and testing are independent from those used in training. Therefore, generating partitions by randomly subsampling a dataset is not sufficient in the case of time-series data. To expand, consider data from a batch process. One should split the data such that separate (and entire) runs constitute the data in training, validation and testing. Equally, the means of evaluation155 should be strictly guided by a model's intended use. Typically, in the use of models for dynamical systems, we are interested in predicting 'multiple steps'. In such a case, it is likely that model errors will propagate through predictions. Therefore, if intended for such use, quantification of the predictive accuracy of a single step ahead is unlikely to be a sufficient metric.

In view of the extensive discussion provided on dynamical systems modeling, the discussion now turns to data-driven control and optimization of processes with a focus on plant and process operation.
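The batch-wise partitioning just described can be sketched in a few lines; the run labels, 60/20/20 fractions, and function name below are illustrative assumptions rather than recommendations from the text:

```python
import random

def split_by_run(run_ids, frac_train=0.6, frac_val=0.2, seed=0):
    """Partition time-series datapoints by entire runs (batches), so that
    validation and test runs are fully independent of training runs."""
    runs = sorted(set(run_ids))
    random.Random(seed).shuffle(runs)
    n_train = int(frac_train * len(runs))
    n_val = int(frac_val * len(runs))
    train = set(runs[:n_train])
    val = set(runs[n_train:n_train + n_val])
    # Assign every datapoint the partition of its parent run.
    return ["train" if r in train else "val" if r in val else "test"
            for r in run_ids]

# Ten datapoints from five batch runs; every point of a run lands in
# the same partition (3 runs train, 1 validation, 1 test).
labels = split_by_run(["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"])
```

Library equivalents exist (e.g. group-aware splitters in scikit-learn), but the essential point is that the unit of splitting is the run, not the individual time point.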
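The incremental formulation xt+1 = xt + fΔ(xt, ut) and its multi-step rollout, discussed under feature selection above, can be illustrated with a deliberately small sketch; the scalar linear increment model and the toy system generating the data are assumptions made purely for illustration:

```python
def fit_increment_model(xs, us, xs_next):
    """Closed-form least-squares fit of a scalar increment model
    f_delta(x, u) = a*x + b*u, so that x_{t+1} ~ x_t + f_delta(x_t, u_t).
    Solves the 2x2 normal equations directly (stdlib only)."""
    dxs = [xn - x for x, xn in zip(xs, xs_next)]
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxd = sum(x * d for x, d in zip(xs, dxs))
    sud = sum(u * d for u, d in zip(us, dxs))
    det = sxx * suu - sxu * sxu
    return ((sxd * suu - sud * sxu) / det,   # a
            (sxx * sud - sxu * sxd) / det)   # b

def rollout(x0, controls, a, b):
    """Multi-step prediction: apply x <- x + f_delta(x, u) recursively."""
    traj = [x0]
    for u in controls:
        traj.append(traj[-1] + a * traj[-1] + b * u)
    return traj

# Noise-free data from a known system x_{t+1} = x_t + (-0.1*x_t + 0.5*u_t).
xs, us = [1.0], [0.0, 0.2, 0.0, 0.4, 0.1]
for u in us:
    xs.append(xs[-1] + (-0.1 * xs[-1] + 0.5 * u))
a, b = fit_increment_model(xs[:-1], us, xs[1:])
```

On real process data the linear fΔ would be replaced by a richer regressor (a neural network or GP playing the same role), and the recursive rollout is exactly the multi-step prediction through which model errors propagate.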
3.4.3 Model predictive control. Model predictive control (MPC) is currently the benchmark scheme in the domain of advanced process control and optimization (APC). The general idea of MPC is to identify a discrete and finite sequence of control inputs that optimizes the temporal evolution of a dynamical system over a time horizon according to some objective function.156 MPC is reliant upon the identification of some finite-dimensional description of process evolution as a model. Various optimization schemes (such as direct single-shooting, direct multiple-shooting and direct collocation157) can be deployed to identify such a sequence of control inputs according to the description provided by the model. Additionally, if operational constraints are imposed upon the problem and the underlying model is a perfect description of the system, the solution identified will be (at least locally) optimal under both the dynamical model and operational constraints, given that the control solution must satisfy the Karush–Kuhn–Tucker (KKT) conditions. However, the models we identify of our processes are not perfect descriptions, and processes are often influenced by various uncertainties and disturbances. MPC schemes handle this by incorporating state feedback. This means that at each discrete control interaction the MPC scheme is able to observe (measure) the current state of the system, and then (through optimization) identifies an optimal sequence of controls over a finite discrete time horizon – the first control identified within the sequence is then input to the system and the process is repeated as the system evolves. This is expressed by Fig. 30, which specifically shows a receding horizon MPC, where the length of the finite discrete time horizon used in optimization is maintained as the process evolves.

To further explore the use of MPC and alternative data-driven methods with potential in the chemical process industries, we conceptualise a batch chemical process case study as outlined in ref. 160. Specifically, we are concerned with the following series reaction (catalysed by H2SO4) to produce some product C from a given reactant A:

2A →(k1A) B →(k2B) 3C    (2)

where k1A and k2B are kinetic constants and B is an intermediate product. The reaction kinetics are first order, and the compositions of A, B and C are manipulated through control of the reactor temperature via a cooling jacket and also the flowrates of A into the reactor (otherwise known as control inputs, u). At specific instances in time throughout the batch, the control element is able to change the setting of these control inputs. The objective of process operation is to maximise the production of C at the end of the batch operation, with a penalty for the absolute magnitude of changes in controls between each control interaction. Given that the operation is fed-batch, there are a finite number of interactions the control element has available to maximize the process objective function.

In practice, we are able to identify a model describing the evolution of the underlying system composition and temperature (state, x) as a system of continuous differential equations. To deploy MPC, we can simply estimate the model parameters, discretize the model with respect to time via a given numerical method of choice and integrate it into one of the optimization schemes detailed previously. One can then optimize the process online by incorporating observation of the real system state as the process evolves and reoptimizing the control inputs over a given discrete time horizon (as displayed by Fig. 30).

There are a number of drivers within the domain of MPC research, including handling nonlinear dynamics,281 uncertainty, and improving dynamical models online (or from batch to batch) using data accrued from the ongoing process.

3.4.4 Data-driven MPC. As alluded to, MPC algorithms exploit various types of models, commonly developed by first principles or based on process mechanisms.161 Many mechanistic and empirical models are, however, often too
Fig. 30 Demonstration of the use of state-feedback in receding horizon MPC for online optimization of an uncertain, nonlinear fed-batch
process. Optimized forecast and evolution of a) the state trajectory, b) the control trajectory (composed of piecewise constant control inputs). See
ref. 158 and 159 for more information on the system detailed.
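The receding-horizon loop of Fig. 30 can be made concrete with a direct single-shooting sketch on a toy version of the series reaction above; the kinetic expressions, temperature bounds, and the crude random-search optimizer are invented for illustration and are not the model or parameters of ref. 160:

```python
import random

def step(state, T, dt=0.1):
    """One explicit-Euler step of a toy batch reactor 2A -> B -> 3C,
    with placeholder temperature-dependent rate constants."""
    A, B, C = state
    k1, k2 = 1.0 * T, 0.5 * T          # illustrative kinetics only
    return (A + dt * (-2.0 * k1 * A),
            B + dt * (k1 * A - k2 * B),
            C + dt * (3.0 * k2 * B))

def simulate(state, controls):
    for T in controls:
        state = step(state, T)
    return state

def mpc_action(state, horizon=5, n_samples=200, seed=0):
    """Direct single-shooting with random search: sample candidate
    control sequences, simulate each over the horizon, and keep only
    the FIRST input of the best sequence (receding horizon)."""
    rng = random.Random(seed)
    best_u, best_obj = 0.1, float("-inf")
    for _ in range(n_samples):
        seq = [rng.uniform(0.1, 1.0) for _ in range(horizon)]
        obj = simulate(state, seq)[2]       # maximize final amount of C
        if obj > best_obj:
            best_obj, best_u = obj, seq[0]
    return best_u

# Receding-horizon loop: optimize, apply the first control, re-measure.
state = (1.0, 0.0, 0.0)
for t in range(10):
    state = step(state, mpc_action(state, seed=t))
```

In practice the random search is replaced by a gradient-based NLP solver acting on the discretized model (as in the shooting and collocation schemes cited above), but the structure of the loop – optimize over the horizon, apply the first input, observe the state, repeat – is unchanged.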
complex to be used online and in addition often have high development costs. Data-driven MPC, which uses black-box identification techniques to construct its models, has been exploited instead; such techniques include support vector machines,162 fuzzy models,163 neural networks (NNs),164 and Gaussian processes (GPs).165 More recently, GP-based MPC algorithms that take into account online learning have been proposed.166,167 These algorithms take information from new samples and update the existing data-driven model to account for better performance in terms of constraint satisfaction and objective function value.168 Similar ideas have been taken into account in recent distributionally robust variants.169 Additionally, the paradigm of MPC with learning is an MPC scheme with a nominal tracking objective and an additional learning objective.170 Generally, the construction of the learning term is based on an economic optimal experiment design criterion;170–174 furthermore, Gaussian processes have been used for optimal design of experiments.175 This framework allows gathering information from the system under consideration while at the same time optimizing it, ultimately trying to address the exploration–exploitation dilemma.

3.4.5 Reinforcement learning. The automated control of chemical processes has become paramount in today's competitive industrial setting. However, along with dynamic optimization, control is a challenging task, particularly for nonlinear and complex processes. This section introduces reinforcement learning as a tool to control and optimise chemical processes. While PID and model predictive (MPC) controllers dominate industrial practice, reinforcement learning is an attractive alternative,29,176 as it has the potential to outperform existing techniques in a variety of applications, such as online optimization and control of batch processes.177 We only discuss model-free reinforcement learning here, as model-based reinforcement learning is very closely related to data-driven MPC for chemical process applications, and a full discussion of this topic is out of the scope of this section.

3.4.5.1 Intuition. In any (discrete-time) sequential decision-making problem, there are three principal elements: an underlying system, a control element, and an objective function. The aim of the control element is to identify optimal control decisions, given observations or measurements of the underlying system. The underlying system then evolves (between control decisions) according to some dynamics. The optimality of the decisions selected by the control element and the evolution of the system is assessed by the objective function. This is a very high-level and general way to think of any decision-making process.

Under some assumptions, there is at least one sequence of decisions that is able to globally maximize a given objective function. If the evolution (or observation) of the underlying system is uncertain (stochastic), then this sequence of decisions must be reactive or conditional to the realisation of the uncertainty. In the RL paradigm, one assumes that all of the information regarding realisation of the uncertainty and the current position of the system is expressed within observation or measurement of the underlying system (i.e. the state). Hence, in order to act optimally within a sequential decision-making problem, the control element should be reactive to observations of state (i.e. the control element should be a control policy, π). Here we note that implementation of an MPC scheme is essentially the identification of a control policy, as realizations of process uncertainty are accounted for via state feedback, as discussed in section 3.4.3.

RL describes a set of different methods capable of learning a functionalization of such a control policy, π(θ, ·), where θ are the parameters of the functionalization. Further, RL does so within a closed-loop feedback control framework, independently of explicit assumptions as to the form of process uncertainty or the underlying system dynamics. This is achieved generally via sampling the underlying system with different control strategies (known as exploration) and improving the functionalization thereafter by using feedback from the system and objective function (this process is known as generalized policy iteration178). An intuitive way to think about this is in terms of the design of experiments (DoE). Generally, DoE methodologies include elements that explore the design space and then subsequently exploit the knowledge that is derived from that exploration process. This process is often iterative. RL uses similar concepts but instead learns a control policy for a given sequential decision-making problem.

To further elucidate the benefits of RL, we now explore the conceptual fed-batch chemical process introduced in section 3.4.3. Now, assume we can estimate the uncertainties of the variables that constitute our dynamical model. If we were able to jointly express the uncertainties of the model, we could equivalently describe the discrete-time dynamical evolution of the system state (i.e. reactor composition and temperature) as a conditional probability density function. In practice, we cannot express this conditional probability density function in closed form; however, we can approximate it via Monte Carlo simulation (i.e. sampling). Here lies the fundamental advantage of RL: through simulation one can express any form of uncertainty associated with a model, and through generalized policy iteration an optimal control policy for the uncertain system can be learned. This removes the requirement to identify expressions descriptive of process uncertainty in closed form (as is required in stochastic and robust variants of MPC). The use of simulation is what makes RL an incredibly general paradigm for decision making, as it enables us to consider all types of model and process uncertainties jointly. In the following, we provide intuition as to how generalized policy iteration functions.

As the uncertainty of the process is realised through simulation, at each discrete time index, t ∈ {0,…, T − 1}, process evolution is rated with respect to the process objective via a reward function, R(xt, ut, xt+1). The reward function provides a scalar feedback signal, Rt+1 (that is equivalent to the negative stage cost, as used in conventional controls terminology). This feedback signal can be used together with data descriptive of process evolution (i.e. {xt, ut,
Fig. 32 An overview of the RL algorithm landscape. Methods such as Q learning, which provided foundational breakthroughs for the field, are
based on principles common to dynamic programming. All of these methods aim to learn the state-(action) value function. Policy optimization
algorithms provide an alternative approach and specifically parameterize a policy directly. Actor-critic methods combine both approaches to
enhance sample efficiency by trading-off bias and variance in learning. Figure reproduced with permission from ref. 179.
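The value-based branch of the landscape in Fig. 32 can be illustrated by tabular Q-learning on a deliberately tiny two-state problem; the environment, rewards, and hyperparameters below are invented for illustration:

```python
import random

def q_learning(episodes=2000, alpha=0.2, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a toy 2-state, 2-action MDP: in state 0,
    action 1 moves to state 1 (reward 0); in state 1, action 1 reaches
    the goal (reward +1, terminal); action 0 always stays put."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        s = 0
        for _ in range(10):                        # cap episode length
            # epsilon-greedy exploration
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            if s == 0:
                s2, r, done = (1 if a == 1 else 0), 0.0, False
            else:
                s2, r, done = 1, (1.0 if a == 1 else 0.0), a == 1
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])  # temporal-difference update
            if done:
                break
            s = s2
    return Q

Q = q_learning()
# The learned greedy policy takes action 1 in both states, and
# Q[0][1] approaches gamma * Q[1][1] (the discounted goal value).
```

The same temporal-difference update, with a neural network replacing the table, is the core of the DQN family of algorithms listed in the text below the figure.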
Fig. 33 The state trajectories generated in online optimization of an uncertain, nonlinear fed-batch biochemical process via RL and NMPC. In this
case, the controller is able to observe a noisy measurement, y = [y1, y2], of the system state, x. Reproduced with permission from the authors.
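Ahead of the policy-optimization discussion in section 3.4.7 below, the core policy-gradient update can be shown on a one-step (bandit-style) problem with a Bernoulli policy; the task and learning rate are invented for illustration:

```python
import math
import random

def reinforce(steps=3000, lr=0.1, seed=0):
    """REINFORCE on a two-action bandit: action 1 pays reward 1,
    action 0 pays 0. The policy is Bernoulli with a single logit
    theta; for this parameterization, grad log pi(a) = a - p."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-theta))    # P(action = 1)
        a = 1 if rng.random() < p else 0
        reward = float(a)                      # only action 1 is rewarded
        theta += lr * reward * (a - p)         # score-function (REINFORCE) update
    return 1.0 / (1.0 + math.exp(-theta))

p_best = reinforce()   # probability of the rewarded action after training
```

With a neural network producing the action distribution and the batch objective replacing the bandit payoff, this same score-function update underlies the policy-gradient methods compared with NMPC in Fig. 33 and 34.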
187). All these approaches rely on the (approximate) solution of the Hamilton–Jacobi–Bellman equation and have been shown to be reliable and robust for several problem instances.

Some popular value-based RL algorithms include DQN,188 hindsight experience replay (HER),189 distributional reinforcement learning with quantile regression (QR-DQN),190 and rainbow,191 which combines state-of-the-art improvements into DQN.

3.4.7 Reinforcement learning – policy optimization. RL algorithms based on policy optimization directly parametrize the policy by some function approximator (say, a neural network); this is schematically represented in Fig. 35. Policy gradient methods are advantageous in many problem instances, and there have been many developments that have made them suitable for process optimization and control. For example, in ref. 192 the authors develop an approximate policy-based accelerated (APA) algorithm that allows the RL algorithms to converge when using more aggressive learning rates, which significantly speeds up the learning process. Further, a systematic incremental learning method is presented193 for RL in continuous spaces where the system is dynamic; this is the case in many chemical processes, where future ambient conditions and feeds are unknown and varying, amongst other developments.194,195

Recent research has been focusing on another side of RL for chemical process control, that of using policy gradients.29,196 Policy gradient methods directly estimate the control policy, without the need for a model or an online optimisation. Therefore, aside from the benefits of RL, policy gradient methods additionally exhibit the following advantages over action-value RL methods (e.g. deep Q-learning):

• Policy gradient methods enable the selection of control actions with arbitrary probabilities. In some cases (e.g. partially observable systems), the best policy may be stochastic.178

• In policy gradient methods, the approximate (possibly stochastic) policy can naturally approach a deterministic policy in deterministic systems,29 whereas action-value
Fig. 34 Comparison of the control trajectories generated via RL and NMPC in the same problem instance as in Fig. 33. The control trajectories are
composed of piecewise constant control actions.
Fig. 35 A schematic representation of a framework for the application of RL to chemical process optimization. Initial policy learning is first
conducted offline via simulation of an approximate process model. The policy is then transferred to the real system where it may be improved
either via iterative improvement of the offline model or directly from the data accrued from process operation.
methods (that use epsilon-greedy or Boltzmann functions) select a random control action with some heuristic rule.178

• Although it is possible to estimate the objective value of state-action pairs in continuous action spaces by function approximators, this does not help choose a control action. Therefore, an online optimization over the action space would have to be performed at each time step, which can be slow and inefficient. Policy gradient methods work directly with policies that output control actions, which is much faster and does not require an online optimization step.

• Policy gradient methods are guaranteed to converge at least to a locally optimal policy, even in high-dimensional continuous state and action spaces, unlike action-value methods, where convergence to local optima is not guaranteed.196

• In addition, policy gradients can establish a policy in a model-free fashion and excel in terms of online computational time. This is because the online computations require only evaluation of a policy, since all the computational cost is shifted offline.

The drawback of policy gradient methods is their inefficiency with respect to data, as value-based methods are much more data-efficient.

3.4.8 Reinforcement learning vs. NMPC. To demonstrate the performance of RL relative to current methods, in Fig. 33 and 34 we present one of the results from recent work.29 Here, the authors employ policy-optimization-based RL and provide a comparison of its performance to an advanced nonlinear model predictive control (NMPC) scheme. The figures show the distribution of process trajectories (i.e. states and controls) from an uncertain, nonlinear fed-batch process. The work shows that the performance of the RL is certainly comparable to NMPC, but it accounts for process uncertainty slightly better. For example, Fig. 34 shows the distribution of control trajectories generated by the two approaches. The work employs a penalty for changing controls between successive control interactions. It can be seen that the RL policy generally observes smaller changes in the controls than the NMPC. In practice, this may lead to less wear of process valves and reduce process downtime.

The process systems engineering community has been dealing with stochastic systems for a long time. For example, nonlinear dynamic optimization and particularly nonlinear model predictive control (NMPC) are powerful methodologies to address uncertain dynamic systems; however, there are several properties that make their application less attractive. All approaches in NMPC require knowledge of a detailed (and finite-dimensional) model that describes the system dynamics, and even with a detailed model, NMPC only addresses uncertainty via its finite-horizon feedback. An approach that explicitly takes into account uncertainties is stochastic NMPC (sNMPC); however, this additionally requires an assumption for the uncertainty quantification and propagation, which is difficult to estimate or even validate. Furthermore, the online computational time is a bottleneck for real-time applications, since a nonlinear optimization problem has to be solved. In contrast, RL directly accounts for the effect of future uncertainty and its feedback in a proper 'closed-loop' manner, whereas conventional NMPC assumes open-loop control actions at future time points in the prediction, which can lead to overly conservative control actions.180

3.4.9 A framework for RL in process systems engineering. Using RL directly on a process to construct an accurate controller would necessitate prohibitive amounts of data, and therefore process models must be used for the initial part of the training. This can be a detailed "knowledge-based" model, a data-driven model, or a hybrid model.29

The main computational cost in RL is offline; hence, in addition to the use of models, it is possible to use an existing controller to warm-start the RL algorithm to alleviate the computational burden. RL algorithms are computationally
This journal is © The Royal Society of Chemistry 2022 React. Chem. Eng., 2022, 7, 1471–1509 | 1493
expensive in their offline stage; initially, the agent (or the controller) explores the control action space randomly. In the case of process optimization and control, it is possible to use a preliminary controller, along with supervised learning or apprenticeship learning,28 to hot-start the policy and significantly speed up convergence.
The main idea here is to have data from some policy or state-feedback control (e.g. a PID controller or an (economic) model predictive controller) to compute control actions given observed states. The initial parameterization of the policy is trained in a supervised learning fashion where the states are the inputs and the control actions are the outputs. Subsequently, this parameterized policy is used to initialize the policy, which is then trained by the RL algorithm to account for the full stochasticity of the system and avoid online numerical optimization, along with the previously mentioned benefits of RL. A general methodology for conducting policy pre-training in the setting of a computational model, and then in the true system, has been proposed in ref. 29, and is generally as follows:
Step 0, initialization. The algorithm is initialized by considering an initial policy network (e.g. an RNN policy network) with parameters θ0 initialized preferably by apprenticeship learning.28
Step 1, preliminary learning (offline). It is assumed that a preliminary model can be constructed from previous process data; hence, the policy is learned by closed-loop simulations from this model. Given that the experiments are in silico, a large number of episodes and trajectories can be generated that correspond to different actions from the probability distribution of ut and a specific set of parameters of the RNN, respectively. The resulting control policy is a good approximation of the optimal policy. Notice that if a stochastic preliminary model exists, this approach can immediately exploit it, contrary to traditional NMPC approaches. This finishes the in silico part of the algorithm; subsequent steps would be run on the true system. Therefore, emphasis after this step is given to sampling as little as possible, as every new sample results in a ‘real’ process sample.
Step 2, transfer learning. The policy can now be used on a ‘real’ process, and learning can ensue by adapting all the weights of the policy network according to the policy gradient algorithm. However, this may result in undesired effects. The control policy might have a deep structure and, as a result, a large number of weights could be present. Thus, the optimization to update the policy may easily get stuck in a low-quality local optimum or completely diverge. To overcome this issue the concept of transfer learning is adopted, which is not exclusive to RL.197 In transfer learning, a subset of training parameters is kept constant to avoid the use of a large number of epochs and episodes, applying knowledge that has been stored in a different but related problem. This technique originated from the task of image classification, where several examples exist, e.g. in ref. 198–200. See Fig. 36 for a schematic representation.
Fig. 36 Part of the network is kept frozen to adapt to new situations more efficiently.
Step 3, controlling the chemical process (online). In this step RL is applied to the chemical process by using knowledge from the model in a proper closed-loop sense and accounting for the modeled stochastic behavior (which could be from any distribution of the disturbance model). Furthermore, the controller will continue to adapt and learn to better control and optimize the chemical process, addressing plant-model mismatch.159
3.4.10 Real-time optimization. Real-time optimization (RTO) systems are well-accepted by industrial practitioners, with numerous successful applications reported over the last few decades.201,202 These systems rely on knowledge-based (first principles) models, and in those processes where the optimization execution period is much longer than the closed-loop process dynamics, steady-state models are commonly employed to conduct the optimization.203
Traditionally, the model is updated in real-time using the available measurements, before repeating the optimization. This two-step RTO approach (also known as model parameter adaptation, MPA) is both intuitive and popular. Unfortunately, although MPA is largely the most widely used RTO strategy in the industry,202 it can be hindered from converging to the actual plant optimum due to structural plant-model mismatch.204,205 This has motivated the development of alternative adaptation schemes in RTO, such as modifier adaptation.206
Similar to MPA, modifier adaptation (MA) embeds the existing process model into a nonlinear optimization problem that is solved at each RTO execution. The key difference is that the process measurements are now used to update the so-called modifiers that are added to the cost and constraint functions in the optimization model, keeping the phenomenological model fixed at a given nominal condition. This methodology greatly alleviates the problem of offset from the actual plant optimum, by enforcing that the KKT conditions determined by the model match those of the plant upon convergence. However, this desirable property comes at the cost of having to estimate the cost and constraint gradients from process measurements.
The estimation of such plant gradients is a very difficult task to implement in practice, due to lack of information and measurement noise.207,208 These problems have a significant effect on the gradient estimation and consequently reduce the overall performance of the MA scheme. Recent advances in MA schemes are reviewed in the survey paper of ref. 209. Among them, there are MA-based algorithms that do not require the computation of plant derivatives. A nested MA scheme proposed by ref. 210 removes the need for estimating the plant gradients by embedding the modified optimization
model into an outer problem that optimizes over the gradient modifiers using a derivative-free algorithm. Ref. 211 combined MA with a quadratic surrogate trained with historical data in an algorithm called MAWQA. Likewise, ref. 212 investigated data-driven approaches based on quadratic surrogates. Unfortunately, these procedures demand a series of time-consuming experimental measurements in order to evaluate the gradients of a large set of functions and variables. Given the considerable impact on productivity, these implementations are virtually absent in current industrial practice.202
3.4.11 Real-time optimization via machine learning. The main contributions of ML to RTO have been primarily directed towards improving the modifier adaptation (MA) scheme. In ref. 213, the authors augment the conventional MA scheme (i.e. using zeroth and first-order feedback from the plant) with a feedforward scheme, which provides a data-driven approach to handling non-stationarity in plant disturbances. Specifically, an ANN is constructed in order to classify the disturbance and suggest a suitable initial point for the MA scheme thereafter. The results presented in the work demonstrate impressive performance improvements when the feedforward classification structure is implemented. However, the results also detail the sensitivity of the method to low data regimes and to the appropriate selection of the ANN model structure.
An approach that efficiently handles low data regimes is provided by the augmentation of MA schemes with Gaussian processes (GPs). Here, (multiple) GPs are used to provide a mapping from control inputs to terms descriptive of mismatch in the constraints and in the objective function. This mitigates the requirement to identify zeroth and first-order terms descriptive of the mismatch from plant measurements as in the original MA scheme.214 This approach was further extended in ref. 215, where a filtering scheme was proposed to reduce large changes in control inputs between RTO iterations; and in ref. 216, where a trust-region and Bayesian optimization were combined to balance exploration and exploitation of the GP models. Both works demonstrated good results; however, unlike the previous work of ref. 213, all of these works assume that the plant disturbance is stationary.
Another approach proposed recently deployed RL for RTO.217 The approach was completely data-driven and did not require a description of plant dynamics. Whilst the work provided an interesting, innovative preliminary study, and performed comparably to a full information nonlinear programming (NLP) model, further work should consider the issues of training an RL policy purely from a stationary data set (with no simulated description of plant dynamics). The nature of such a training scheme has the potential to drive the plant into dangerous operational regions due to the bias of the value function used in the approach. This is discussed further in section 4 within the context of safety. In addition, merging domain knowledge (via a model) and data is generally preferred to a purely data-driven approach.
3.5 Production scheduling and supply chain
Planning and scheduling is the primary plant-wide decision-making strategy for current process industries such as the petroleum, chemical, pharmaceutical, and biochemical industries. Optimal planning and scheduling can greatly improve process efficiency and profit; reduce raw material waste, energy and storage costs; and mitigate process operational risks. Within the context of globalization and the circular economy, planning and scheduling have become increasingly challenging due to the varying demand on both product quantity and quality. Although many solution approaches have been proposed from the domain of process systems engineering, they are often not applicable for solving large-scale planning and scheduling problems due to the process complexity. Furthermore, unexpected uncertainties such as volatile customer demands, variations in processing times, equipment malfunction, and fluctuations in socio-economics frequently arise in a manufacturing site, posing an intractable problem for the online decision-making of process scheduling and planning. As a result, developing a data-driven adaptive online planning and scheduling technique is of critical importance.
3.5.1 Reinforcement learning for process scheduling and planning. Traditionally, optimal scheduling plans are made using mathematical programming methods,218 in particular mixed integer linear programming (MILP) if only mass flow is considered, or mixed integer nonlinear programming (MINLP) if energy utilization is also taken into account. The general procedure to calculate an optimal scheduling solution is to first construct a process-wide model by considering material and energy balances, with binary variables (i.e. variables that can only take a value of 0 or 1) being assigned within the process model to explore different scheduling options. Then, MILP or MINLP is performed to calculate the optimal solution. However, given a large number of scheduling alternatives and complex model structures, mathematical programming is often extremely time-consuming, thus not feasible for online scheduling.
To resolve this issue, some initial studies have been proposed since 2020 in which reinforcement learning is adopted to learn from training examples to solve the process model and to generate (approximate) optimal policies for online scheduling.219,220 Instead of using a surrogate model, the advantage of RL is that, upon its construction, it will rapidly amend the original optimal scheduling plan whenever a new disruption occurs during the process. Based on the case study provided,219 it is found that RL can outperform the traditional mathematical programming approach. Additionally, by analysing the optimal solutions proposed by RL models, new heuristics can be discovered. Nonetheless, it is worth emphasising that the use of RL for online scheduling is still in its infancy, thus more thorough investigation must be conducted before it can be applied in the process industry. Basic intuition for the use of RL in the domain of batch chemical production scheduling follows.
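The MILP intuition above can be made concrete with a toy assignment problem. The sketch below (all processing times are hypothetical) enumerates the binary task-to-unit assignments of a tiny instance to minimize makespan; a realistic instance would instead hand the same binary variables to an MILP/MINLP solver rather than enumerate.

```python
from itertools import product

# Toy instance (hypothetical numbers): three batch tasks, two units,
# unit-dependent processing times in hours.
proc_time = {("T1", "U1"): 3.0, ("T1", "U2"): 4.0,
             ("T2", "U1"): 2.0, ("T2", "U2"): 2.5,
             ("T3", "U1"): 4.0, ("T3", "U2"): 3.0}
tasks, units = ["T1", "T2", "T3"], ["U1", "U2"]

best_makespan, best_assign = float("inf"), None
# Enumerate every binary assignment x[task, unit]; tasks on the same
# unit run back-to-back, so a unit's finish time is its summed load.
for assign in product(units, repeat=len(tasks)):
    load = {u: 0.0 for u in units}
    for task, u in zip(tasks, assign):
        load[u] += proc_time[(task, u)]
    makespan = max(load.values())
    if makespan < best_makespan:
        best_makespan, best_assign = makespan, dict(zip(tasks, assign))

print(best_assign, best_makespan)  # {'T1': 'U1', 'T2': 'U1', 'T3': 'U2'} 5.0
```

An RL scheduler, by contrast, would replace the enumeration with a learned policy that proposes the next assignment directly from the plant state, which is what makes rapid online re-scheduling after a disruption possible.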
Fig. 37 Handling control constraints innately in RL-based chemical production scheduling via identification of transformations of the control
prediction through standard operating procedures (i.e. precedence and disjunctive constraints and requirements for unit cleaning). a) Augmenting
the decision-making process by identifying the set of controls which satisfy the logic provided by standard operating procedure at each time index,
and b) implementation of a rounding policy to ensure that RL control selection satisfies the associated logic.
Briefly, the function of the scheduling element is to identify the sequencing of various production operations on available equipment to minimize some operational cost (that may consider resource consumption, tardiness, etc.). The sequencing of these operations may be subject to constraints that define: which operations may precede or succeed others in given equipment; limits on the resources available for operation (including e.g. energy, raw material, storage, etc.); and various constraints on unit availability. At given time intervals then, the scheduling element should be able to predict the scheduling of future operations on equipment items, conditional on the current state of the plant. The state of the plant may consist of: inventory levels of raw materials, intermediates and products; the amount of resource available for operation; unit availability and idling; and the time until client orders are due (obviously dependent on the problem instance). How one handles the various constraints imposed on the scheduling element is not clear. There is scope to handle them through a penalty function method; however, the number of constraints imposed is often large, which often provides difficulty for RL algorithms, as there are many discontinuities in the ‘reward landscape’. Further, there are typically many operations that a given unit can process, and given the nature of RL (i.e. using a functional parameterization of a control policy), it is not clear how best to select controls. Fig. 37 and 38 show one idea proposed in recent work221 and a corresponding schedule generated for the case study detailed there.
The basic idea of that work is that the definition of many of the constraints imposed on scheduling problems is related to control selection and governed by standard operating procedures (SOPs) (i.e. the requirement for cleaning times, the presence of precedence constraints, etc.). These SOPs essentially define logic rules, fSOP, that govern the way in which the plant is operated and the set of operations one could schedule in units at time t, given the current state of the plant, xt (see Fig. 37a). As a result, one can often pre-identify the controls which innately satisfy those constraints defined by SOPs and implement a rounding policy, fr, to alter the control predicted by the policy function to select one of those available controls (see Fig. 37b). Perhaps the largest downside of this approach is that derivative-free approaches to RL are most suitable. These algorithms are particularly suited when the effective dimensionality of the problem is low. However, the approach is known to become less efficacious when the effective dimensionality of the parameter space is large (as may be the case in the typical neural network models used in RL policy functionalization).
Clearly, the discussion provided in the latter part of this section is just one approach to handling constraints in a very particular scheduling problem instance. There is a general need for further research in the application of RL to scheduling tasks in chemical processes. This poses a challenge that both the academic and industrial communities can combine efforts in approaching. For more information, we direct the reader to a recent review.222
3.5.2 Reinforcement learning for supply chain optimization. The operation of supply chains is subject to inherent uncertainty derived from market mechanisms (i.e. supply and demand),223 transportation, supply chain structure and the interactions that take place between organizations, and various other exogenous uncertainties (such as global weather and humanitarian events).224
Fig. 38 Solving a MILP problem via RL to produce an optimal production schedule via the framework displayed in Fig. 37. A discrete time interval is equivalent to 0.5 days in this study.
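A minimal sketch of the masking-and-rounding idea of Fig. 37 follows; the state fields, integer control encoding, and SOP rule here are hypothetical simplifications for illustration, not the implementation of ref. 221.

```python
import numpy as np

def f_sop(x):
    """Toy SOP logic (hypothetical): return the admissible controls for
    a unit given plant state x. Control 0 denotes idling/cleaning; a
    production task may only start if the unit does not need cleaning."""
    controls = {0}
    if not x["needs_cleaning"]:
        controls |= set(x["available_tasks"])
    return sorted(controls)

def f_r(u_pred, feasible):
    """Rounding policy: map the real-valued policy output to the
    nearest control that satisfies the SOP logic."""
    feasible = np.asarray(feasible)
    return int(feasible[np.argmin(np.abs(feasible - u_pred))])

x = {"needs_cleaning": False, "available_tasks": [2, 3, 5]}
u = f_r(u_pred=3.7, feasible=f_sop(x))  # policy suggested 3.7 -> task 3
print(u)
```

Because infeasible controls are filtered out before implementation, the agent never has to learn the SOP constraints through penalties, at the cost of a non-differentiable rounding step, which is why derivative-free training is the natural fit noted above.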
Fig. 39 Solving a supply chain optimization problem via evolutionary RL methods. Reproduced with permission from ref. 225. The plots show the
training process of a) a hybrid stochastic search algorithm, b) evolutionary strategies, c) particle swarm optimization, d) artificial bee colony. The
algorithms demonstrate performance competitive with state-of-the-art RL approaches.
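To give a flavor of the derivative-free methods compared in Fig. 39, the sketch below implements a basic evolutionary-strategies update on a toy two-parameter "policy" with a known reward peak; it is an illustrative sketch, not the hybrid algorithm of ref. 225, and the reward function is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta):
    # Toy stand-in for a supply chain return: single peak at theta = [2, -1].
    return -float(np.sum((theta - np.array([2.0, -1.0])) ** 2))

theta = np.zeros(2)                   # policy parameters to optimize
sigma, alpha, n = 0.1, 0.02, 50       # noise scale, step size, population size
for _ in range(300):
    eps = rng.standard_normal((n, 2))                 # population of perturbations
    R = np.array([reward(theta + sigma * e) for e in eps])
    R = (R - R.mean()) / (R.std() + 1e-8)             # fitness shaping
    theta = theta + alpha / (n * sigma) * eps.T @ R   # ES gradient estimate

print(theta)  # drifts toward the peak at [2, -1]
```

Only reward evaluations are needed, never gradients of the supply chain simulator, which is what makes this family applicable to the discontinuous, uncertain objectives discussed in this section.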
Due to the large uncertainties that exist within supply chains, there is an effort to ensure that organizational behavior is more cohesive and coordinated with other operators within the chain. For example, graph neural networks (GNNs)226,227 have been applied to help infer hidden relationships or behaviors within existing networks.228,229 Furthermore, the combination of an increasing degree of globalization and the availability of informative data sources has led to an interest in RL as a potential approach to supply chain optimization. This is again due to the presence of a wide range of uncertainties, combined with complex supply chain dynamics, which generally provide obstacles to existing methods. The application of RL to supply chain optimization is similarly in its infancy; however, efforts such as OR-gym230 provide means for researchers to develop suitable algorithms on standard benchmark problems. Again, this area would largely benefit from greater collaboration between academia and industry. Fig. 39 shows some training results from the inventory management problem described in ref. 230, generated by different evolutionary RL approaches including particle swarm optimization (PSO),231 evolutionary strategies (ES),232 artificial bee colony (ABC)233 and a hybrid algorithm with a space reduction approach.234
4 Challenges and opportunities
In this manuscript, we have covered the intuition behind machine learning techniques and their application to industrial processes, which have traditionally stored vast amounts of manufacturing data in their operational historians.
More accessible and easier-to-use advanced analytical tools are evolving to the point where many data steps are or will be mostly automated, including the use of screening models via machine learning (i.e. AutoML). Therefore, process engineering expertise is and will be crucial to identify and define the manufacturing problems to solve, as well as to interpret the solutions found through data-driven approaches. In many situations, once the root cause of the problem is found, well-known solutions that include new sensors and/or process control will be preferred over a complex approach that is difficult to maintain in the long run.
Advanced monitoring systems that notify of suboptimal (or anomalous) behavior, list correlated factors, and allow engineers to interactively visualize process data will become the new standard in manufacturing environments. Historians with good-quality and well-structured manufacturing data (e.g. batch) will become a competitive advantage, especially if a data ownership culture at the plant level is well-established.
Combined with process engineering and control knowledge, ML can be used for steady-state or batch-to-batch applications, where recommended set-points or recipe changes are suggested to operators/process engineers, similar to expert systems or pseudo-empirical correlations learned from historical data. However, if the ambition is closed-loop (dynamic) systems, both data-driven MPC and reinforcement learning are limited by the following two challenges.
Implementation
Data-driven solutions and their dedicated infrastructures are less reliable than process control strategies and their systems (DCS). This has been put forward by many studies, but particularly the recent study235 summarises the concerns for the deployment of RL machinery into engineering applications. We quote the following: “we [the scientific community] do not understand how the parts comprising […]”
[…] of the policies they output even when initialized with feasible initial policies.238 Various approaches have been proposed in the literature, where usually penalties are applied for the constraints. Such approaches can be very problematic, easily losing optimality or feasibility,239 especially in the case of a fixed penalty. The main approaches to incorporate constraints in this way make use of trust-region and fixed penalties,239,240 as well as cross entropy.238 As observed in ref. 239, when penalty methods are applied in policy optimization, […]
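The fixed-penalty pitfall can be seen in a one-dimensional toy sketch (all numbers invented): with a small penalty weight the penalized optimum still violates the constraint, while a sufficiently large weight recovers the feasible boundary.

```python
import numpy as np

u = np.linspace(-1.0, 3.0, 4001)          # candidate control values
reward = -(u - 2.0) ** 2                  # unconstrained optimum at u = 2
violation = np.maximum(0.0, u - 1.0)      # amount by which u <= 1 is broken

u_star = {}
for rho in (0.5, 4.0):                    # two fixed penalty weights
    # Maximize the penalized objective reward - rho * violation on the grid.
    u_star[rho] = float(u[np.argmax(reward - rho * violation)])

print(u_star)  # small rho leaves an infeasible optimum; large rho gives u = 1
```

In policy optimization the same trade-off appears in high dimension: too small a fixed penalty yields infeasible policies, too large a penalty distorts the reward landscape and hampers learning, which motivates the trust-region and adaptive alternatives cited above.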
Table 1 Computational tools used by the authors and colleagues for data-driven modeling, control, and optimization in Python and Julia. This list is
not exhaustive
Modeling
Model class Python packages Julia packages
Differential equations SciPy245 SciML246
Neural ODEs torchdiffeq,247 JAX248 DiffEqFlux249
Support vector machines Scikit-learn37 Julia statistics – SVM
Decision tree models Scikit-learn DecisionTree
Gaussian processes GPy,250 GPyTorch,251 GPflow252 AbstractGPs
Artificial neural networks PyTorch,253 Keras,254 JAX Flux,255 Knet256
Latent variable methods Scikit-learn, SciPy, UMAP257 MultivariateStats,258 UMAP
Explainable AI SHAP,259 LIME260 ShapML261
Classical Sys. ID SciPy, SysIdentPy262 ControlSystemIdentification
Optimizationa
Problem class Python packages Julia packages
Linear programming SciPy, CVXPY,263 GEKKO264 JuMP265
Semidefinite programming CVXPY JuMP
Quadratic programming CVXPY, GEKKO JuMP
Nonlinear programming SciPy, Pyomo,266 NLOpt, GEKKO JuMP, Optim, NLOpt
Mixed integer programming Pyomo, GEKKO JuMP
Bayesian optimization GPyOpt,267 HEBO,145 BoTorch,268 GPflowOpt269 BayesianOptimization
MPC and dynamic opt. Pyomo, CasADi,270 GEKKO InfiniteOpt271,272
Automatic differentiation JAX, CasADi ForwardDiff,273 Zygote274
Reinforcement learning Ray,275 RLlib,276 Gym277 ReinforcementLearning278
AutoML Ray Tune,279 Optuna280 AutoMLPipeline
a Generally we detail packages that interface with well-established solvers, such as Gurobi241 for mixed-integer problems and IPOPT242 for nonlinear programming problems. This does not include commercial packages such as the MATLAB243 toolboxes, which also provide options such as ALADIN244 for distributed optimization.
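As a minimal illustration of the optimization rows above, the hypothetical blending problem below is solved with SciPy's linprog (one of the listed linear programming packages); the costs and constraints are invented for the example.

```python
from scipy.optimize import linprog

# Hypothetical blending LP: feedstocks A and B cost 2 and 3 per tonne;
# at least 10 t must be produced, and the blend must contain at least
# 25% B. Minimize c^T x subject to A_ub x <= b_ub, x >= 0.
c = [2.0, 3.0]
A_ub = [[-1.0, -1.0],   # x_A + x_B >= 10            (demand, sign flipped)
        [0.25, -0.75]]  # x_B >= 0.25 (x_A + x_B)    (quality, rearranged)
b_ub = [-10.0, 0.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2,
              method="highs")
print(res.x, res.fun)  # optimal blend [7.5, 2.5] at cost 22.5
```

Both inequality constraints are active at the optimum, a typical LP vertex solution; the same model rewritten in JuMP or Pyomo would hand the problem to the solvers listed in the footnote.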
Term Explanation
Anomaly detection Identifies data points, events, and/or observations that deviate from a dataset's normal behavior
AutoML (model selection) Systematic approach to select the best algorithm and its tuning parameters
Basis functions Basic transformations used as building blocks to capture higher complexity in the data using simpler structures. For example, powers of x that, when added together, form polynomials
Bayesian inference Specifies how one should update one's beliefs (probability density function) about a random variable upon
observing data (new and historical)
Bias-variance trade-off Related to model complexity and generally analyzed on training data. If the model overfits the training data, it will capture all of the variability (variance), while simpler models will underfit, having a higher overall error (bias)
Bootstrap Resampling of the data to fit more robust models
Covariance Similarity in terms of correlation between two variables affected by noise
Cross validation Resampling technique mostly used when data availability is limited and to avoid overfitting. It consists of dividing the dataset into multiple different subsets: N-1 of these subsets are used to train the model, while the remaining one is used for validation. The chosen subset is changed iteratively until all subsets have been used for validation
Dimensionality reduction Techniques to reduce the number of input variables (e.g. tags) in a dataset by finding inner correlations (e.g.
linear correlation of multiple sensors measuring the same process temperature)
Dynamic programming Algorithmic technique for solving a sequential decision making problem by breaking it down into simpler
subproblems using a recursive relationship, known as the Bellman equation
Dynamic time warping Algorithm used to align and compare the similarity between two batches (or time series sequences) with different durations. A common example is a drying or reaction process, where the time to finish depends on the initial conditions and rate of change
Feature engineering Generation of additional inputs (Xs) by transforming the original ones (usually tags). For example, the square root of pressure helps to find a linear relationship with respect to the flow rate. These calculations can be done automatically or guided by domain knowledge
Feature selection Reduction of model inputs (e.g. tags) based on its contribution towards an output (e.g. yield) or identified group
(e.g. normal/abnormal)
First-principle Based on fundamental principles like physics or chemistry
Functional principal components Algorithm similar to PCA to reduce the number of co-linear inputs with minimal loss of information. The main difference is that FPCA also takes into consideration both the time and space dependencies of these inputs
Gaussian processes Learning method for making predictions probabilistically in regression and classification problems
Generalized (model) Achieved when the model is able to generate accurate outcomes (predictions) on unseen data
Gradient boosted trees Combination of decision trees that are built consecutively where each fits the residuals (unexplained variability)
Gradient methods Optimization approach that iteratively updates one or more parameters using the rate of change to increase or
decrease the goal (objective function)
Hyperparameter Parameter used to tune the model or optimization process e.g., weights in a weighted sum objective function
Input/s (model) Any variable that might be used by a model to generate predictions (as regressor or classifier, for example). These are known by various names (X, factors, independent variables, features…) and correspond to sensor readings (tags) or their transformations (features)
Loss (or cost) function Objective function that has to be minimized in a machine learning algorithm, usually the aggregated difference
between predictions and reality
Machine learning Data-driven models able to find: 1) correlations and classifications, 2) groups (clusters), or 3) the best strategy for manipulated variables. These types are known as 1) supervised, 2) unsupervised, and 3) reinforcement learning, respectively
Model input Any variable that enters the model, also referred to as features or Xs. Mostly, they correspond to sensor readings (tags) or calculations from those (engineered features)
Monte Carlo simulation Method used to generate different scenarios by varying one or more model parameters according to a chosen
distribution, e.g. normal
Neural networks Model that uses a composition of non-linear functions (e.g. linear with saturation, exponential…) in series so it
can approximate any input/output relationship
Non-linear System in which the change of the output is not proportional to the change of the input
Output/s (model) Variable or measurement to predict in supervised models. It is often referred to as Y, y, target, dependent variable...
For example, y = f(x), where y is the output of the model
Partition the data Creation of subsets for fitting the model (training), avoiding overfitting (validation) and comparing the final
result with unseen data (test)
Piecewise linear Technique to approximate non-linear functions into smaller intervals that can be considered linear
Policy optimization Used in reinforcement learning, it finds the direction (gradient) at which the actions can improve the long-term
(gradient) cumulative goal (reward)
Predictive control Method that anticipates the behavior of the system, based on a model, several steps ahead, so that the optimal set of actions (manipulated variables) is calculated and performed at each iteration
Principal component analysis (PCA) Dimensionality reduction technique that finds the correlation between input variables (tags or Xs), unveiling hidden (latent) variables that can be used instead of all of them independently
Random forest Learning algorithm that operates by subsampling the data and then constructing multiple decision trees in order to obtain a combined (ensembled) model that is more robust to the data
Regularization/penalization Mathematical method that introduces additional parameters in the objective/cost function to penalize the
possibility that the fitting parameters would assume extreme values (e.g. LASSO, Ridge Regression, etc.)
Table 2 (continued)
Term Explanation
Reinforcement learning Fitting algorithm (training) that finds the best possible series of actions (policy) to maximize a goal (reward).
(RL) Tuning a PID can be seen as a reinforcement learning task, for example
Resampling Used when data availability is limited or contains minimal information. It consists of selecting several different data subset combinations out of the collected data. This allows a more robust estimate of the model parameters, estimating their uncertainty more accurately. A typical example in process engineering is the analysis of sporadic events like failures, start-ups or shut-downs
Reward function Goal of the learning process, used in RL to find the set of actions that maximizes it. Similar to an objective
function in optimization, its definition will determine the solution found
This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.
Soft sensors: Type of model able to infer state variables (whose measurement is technically difficult or relatively expensive, for example a lab analysis) from variables that can be captured continuously from common
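The resampling entry above can be illustrated with a bootstrap estimate of parameter uncertainty. The data below are a synthetic stand-in for a soft-sensor calibration (the linear relation between an online measurement x and a lab value y is invented for the example):

```python
import numpy as np

# Hypothetical data: an easy online measurement x (e.g. a temperature)
# related linearly to a sparse lab value y, corrupted by noise.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=50)
y = 3.0 * x + 5.0 + rng.normal(scale=2.0, size=50)

# Resampling (bootstrap): refit the model on many subsets drawn with
# replacement to estimate the uncertainty of the fitted slope.
slopes = []
for _ in range(500):
    i = rng.integers(0, 50, size=50)   # sample rows with replacement
    slope, intercept = np.polyfit(x[i], y[i], 1)
    slopes.append(slope)

print("slope:", round(float(np.mean(slopes)), 2),
      "+/-", round(float(np.std(slopes)), 2))
```

The spread of the resampled slopes quantifies how much the parameter would vary if the (limited) data had come out slightly differently, without collecting any new samples.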
Open Access Article. Published on 21 April 2022. Downloaded on 7/18/2024 9:03:24 AM.
and S. Madge, et al., Skillful precipitation nowcasting using deep generative models of radar, 2021, arXiv preprint
Enhancing data-driven methodologies with state and parameter estimation, J. Process Control, 2020, 92, 333–351.
41 F. Hutter, L. Kotthoff and J. Vanschoren, Automated machine learning: methods, systems, challenges, Springer Nature, 2019.
42 C. Thon, B. Finke, A. Kwade and C. Schilde, Artificial Intelligence in Process Engineering, Adv. Intell. Syst., 2021, 3, 2000261.
43 C. Molnar, Interpretable machine learning, Lulu.com, 2020.
44 JMP, Profilers: Jmp 12, https://www.jmp.com/support/help/Profilers.shtml#377608, 2021.
45 S. M. Lundberg and S.-I. Lee, A unified approach to interpreting model predictions, in Proceedings of the 31st international conference on neural information processing systems, 2017, pp. 4768–4777.
46 S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal and S.-I. Lee, From local explanations to global understanding with explainable ai for trees, Nat. Mach. Intell., 2020, 2(1), 56–67.
47 J. Senoner, T. Netland and S. Feuerriegel, Using explainable artificial intelligence to improve process quality: Evidence from semiconductor manufacturing, Management Science, 2021, 1–20.
48 J. Wang, J. Wiens and S. Lundberg, Shapley flow: A graph-based approach to interpreting model predictions, in International Conference on Artificial Intelligence and Statistics, PMLR, 2021, pp. 721–729.
49 Fault Detection and Diagnosis of the Tennessee Eastman Process using Multivariate Control Charts (2020-US-45MP-606), Oct 2020. [Online; accessed 19. Dec. 2020].
50 J. Ash and J. Ding, Fault Detection and Diagnosis of the Tennessee Eastman Process using Multivariate Control Charts, ResearchGate, 2022.
51 M. Joswiak, Y. Peng, I. Castillo and L. H. Chiang, Dimensionality reduction for visualizing industrial chemical process data, Control Eng. Pract., 2019, 93, 104189.
52 L. McInnes, J. Healy and J. Melville, Umap: Uniform manifold approximation and projection for dimension reduction, 2020.
53 L. McInnes, J. Healy and S. Astels, hdbscan: Hierarchical density based clustering, J. Open Source Softw., 2017, 2(11), 205.
54 R. J. Campello, D. Moulavi and J. Sander, Density-based clustering based on hierarchical density estimates, in Pacific-Asia conference on knowledge discovery and data mining, Springer, 2013, pp. 160–172.
55 M. Carletti, C. Masiero, A. Beghi and G. A. Susto, Explainable machine learning in industry 4.0: Evaluating feature importance in anomaly detection to enable root cause analysis, in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), IEEE, 2019, pp. 21–26.
56 S. J. Qin, Y. Liu and Y. Dong, Plant-wide troubleshooting and diagnosis using dynamic embedded latent feature analysis, Comput. Chem. Eng., 2021, 107392.
57 S. J. Qin, Y. Dong, Q. Zhu, J. Wang and Q. Liu, Bridging systems theory and data science: A unifying review of dynamic latent variable analytics and process monitoring, Annu. Rev. Control, 2020, 50, 29–48.
58 Q. Zhu, S. J. Qin and Y. Dong, Dynamic latent variable regression for inferential sensor modeling and monitoring, Comput. Chem. Eng., 2020, 137, 106809.
59 J. Ash, L. Lancaster and C. Gotwalt, A method for controlling extrapolation when visualizing and optimizing the prediction profiles of statistical and machine learning models, Discovery Summit Europe 2021 Presentations, 2021.
60 J. Ash, L. Lancaster and C. Gotwalt, A method for controlling extrapolation when visualizing and optimizing the prediction profiles of statistical and machine learning models, 2022.
61 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative adversarial nets, Adv. Neural Inf. Process. Syst., 2014, 27, 2672–2680.
62 M. Nixon and S. Xu, Anomaly Detection in Process Data Using Generative Adversarial Networks (GAN), Aug 2021. [Online; accessed 1. Dec. 2021].
63 A. Geiger, D. Liu, S. Alnegheimish, A. Cuesta-Infante and K. Veeramachaneni, Tadgan: Time series anomaly detection using generative adversarial networks, arXiv, 2020, preprint, arXiv:2009.07769, https://arxiv.org/abs/2009.07769.
64 F. Yang and D. Xiao, Progress in root cause and fault propagation analysis of large-scale industrial processes, J. Control. Sci. Eng., 2012, 2012, 1–10.
65 F. Yang, S. Shah and D. Xiao, Signed directed graph based modeling and its validation from process knowledge and process data, Int. J. Appl. Math. Comput. Sci., 2012, 22, 41–53.
66 N. F. Thornhill and A. Horch, Advances and new directions in plant-wide disturbance detection and diagnosis, Control Eng. Pract., 2007, 15, 1196–1206.
67 M. Bauer and N. F. Thornhill, A practical method for identifying the propagation path of plant-wide disturbances, J. Process Control, 2008, 18, 707–719.
68 V. Venkatasubramanian, R. Rengaswamy and S. N. Kavuri, A review of process fault detection and diagnosis: Part II: Qualitative models and search strategies, Comput. Chem. Eng., 2003, 27, 313–326.
69 M. A. Kramer and B. L. Palowitch, A rule-based approach to fault diagnosis using the signed directed graph, AIChE J., 1987, 33, 1067–1078.
70 C. Palmer and P. W. H. Chung, Creating signed directed graph models for process plants, Ind. Eng. Chem. Res., 2000, 39(7), 2548–2558.
71 C. Reinartz, D. Kirchhübel, O. Ravn and M. Lind, Generation of signed directed graphs using functional models, IFAC-PapersOnLine, 5th IFAC Conference on Intelligent Control and Automation Sciences ICONS 2019, 2019, vol. 52, 11, pp. 37–42.
72 T. Savage, J. Akroyd, S. Mosbach, N. Krdzavac, M. Hillman and M. Kraft, Universal Digital Twin – integration of national-scale energy systems and climate data, 2021, submitted for publication. Preprint available at https://como.ceb.cam.ac.uk/preprints/279/.
73 M. T. Ribeiro, S. Singh and C. Guestrin, "Why should I trust you?": Explaining the predictions of any classifier, 2016.
74 B. Braun, I. Castillo, M. Joswiak, Y. Peng, R. Rendall, A. Schmidt, Z. Wang, L. Chiang and B. Colegrove, Data science challenges in chemical manufacturing, IFAC preprints, 2020.
75 S. J. Qin, S. Guo, Z. Li, L. H. Chiang, I. Castillo, B. Braun and Z. Wang, Integration of process knowledge and statistical learning for the dow data challenge problem, Comput. Chem. Eng., 2021, 153, 107451.
76 C. Abeykoon, Design and applications of soft sensors in polymer processing: A review, IEEE Sens. J., 2019, 19, 2801–2813.
77 R. Oliveira, Combining first principles modelling and artificial neural networks: a general framework, Comput. Chem. Eng., 2004, 28(5), 755–766.
78 M. Von Stosch, R. Oliveira, J. Peres and S. F. de Azevedo,
89 D. Bonvin and G. François, Control and optimization of batch chemical processes, tech. rep., Butterworth-Heinemann, 2017.
90 J. A. Romagnoli and M. C. Sánchez, Data processing and reconciliation for chemical process operations, Elsevier, 1999.
91 J. Loyola-Fuentes, M. Jobson and R. Smith, Estimation of fouling model parameters for shell side and tube side of crude oil heat exchangers using data reconciliation and parameter estimation, Ind. Eng. Chem. Res., 2019, 58(24), 10418–10436.
92 J. Friedman, T. Hastie and R. Tibshirani, et al., The elements of statistical learning, Springer series in statistics, New York,
110 J. Pinto, C. R. de Azevedo, R. Oliveira and M. von Stosch, A bootstrap-aggregated hybrid semi-parametric modeling framework for bioprocess development, Bioprocess Biosyst. Eng., 2019, 42(11), 1853–1865.
111 W. Chu, S. S. Keerthi and C. J. Ong, Bayesian support vector regression using a unified loss function, IEEE Trans. Neural Netw., 2004, 15(1), 29–44.
112 M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, U. R. Acharya, V. Makarenkov and S. Nahavandi, A review of uncertainty quantification in deep learning: Techniques, applications and challenges, arXiv, 2020, preprint, arXiv:2011.06225, https://arxiv.org/abs/2011.06225.
113 R.-R. Griffiths, A. A. Aldrick, M. Garcia-Ortegon and V. Lalchand, et al., Achieving robustness to aleatoric uncertainty with heteroscedastic bayesian optimisation, Mach. Learn.: Sci. Technol., 2021, 3(1), 015004.
114 A. Kendall and Y. Gal, What uncertainties do we need in bayesian deep learning for computer vision?, 2017.
115 C. K. Williams and C. E. Rasmussen, Gaussian processes for machine learning, MIT Press, Cambridge, MA, 2006, vol. 2.
116 R. Turner and M. P. Deisenroth, Ml tutorial: Gaussian processes (richard turner).
117 M. Elie, Discovering hidden relationships in production data (EU2018 113), Discovery Summit Europe, JMP (SAS), Mar 2018. [Online; accessed 30. Jan. 2022].
118 V. Mattia and S. Salvador, DOE for World-Scale Manufacturing Processes: Can We Do Better? (2019-EU-45MP-073), Discovery Summit Europe, JMP (SAS), Mar 2019. [Online; accessed 30. Jan. 2022].
119 M. Shoukat Choudhury, V. Kariwala, N. F. Thornhill, H. Douke, S. L. Shah, H. Takada and J. F. Forbes, Detection and diagnosis of plant-wide oscillations, Can. J. Chem. Eng., 2007, 85(2), 208–219.
120 W. L. Luyben, Snowball effects in reactor/separator processes with recycle, Ind. Eng. Chem. Res., 1994, 33(2), 299–305.
121 D. van de Berg, T. Savage, P. Petsagkourakis, D. Zhang, N. Shah and E. A. del Rio-Chanona, Data-driven optimization for process systems engineering applications, Chem. Eng. Sci., 2021, 117135.
122 Q.-G. Wang and Y. Zhang, Robust identification of continuous systems with dead-time from step responses, Automatica, 2001, 37(3), 377–390.
123 H. Schaeffer and S. G. McCalla, Sparse model selection via integral terms, Phys. Rev. E, 2017, 96, 023302.
124 L. Ljung, Perspectives on system identification, Annu. Rev. Control, 2010, 34(1), 1–12.
125 M. Viberg, Subspace methods in system identification, IFAC Proceedings Volumes, 1994, 27(8), 1–12.
126 K. J. Åström and P. Eykhoff, System identification–a survey, Automatica, 1971, 7(2), 123–162.
127 F. Tasker, A. Bosse and S. Fisher, Real-time modal parameter estimation using subspace methods: theory, Mech. Syst. Signal Process, 1998, 12(6), 797–808.
128 A. Simpkins, System identification: Theory for the user, 2nd edition (ljung, l.; 1999) [on the shelf], IEEE Robot. Autom. Mag., 2012, 19(2), 95–96.
129 M. Verhaegen, Subspace techniques in system identification, in Encyclopedia of Systems and Control, Springer, 2015, pp. 1386–1396.
130 P. Van Overschee and B. De Moor, Subspace algorithms for the stochastic identification problem, Automatica, 1993, 29(3), 649–660.
131 T. Katayama, et al., Subspace methods for system identification, Springer, 2005, vol. 1.
132 A. Wills, T. B. Schön, L. Ljung and B. Ninness, Identification of hammerstein–wiener models, Automatica, 2013, 49(1), 70–81.
133 S. Chen and S. A. Billings, Representations of non-linear systems: the narmax model, Int. J. Control, 1989, 49(3), 1013–1032.
134 C. Gao, L. Jian, X. Liu, J. Chen and Y. Sun, Data-driven modeling based on volterra series for multidimensional blast furnace system, IEEE Trans. Neural Netw., 2011, 22(12), 2272–2283.
135 M. Pottmann and D. E. Seborg, A nonlinear predictive control strategy based on radial basis function models, Comput. Chem. Eng., 1997, 21(9), 965–980.
136 Q. Bi, W.-J. Cai, E.-L. Lee, Q.-G. Wang, C.-C. Hang and Y. Zhang, Robust identification of first-order plus dead-time model from step response, Control Eng. Pract., 1999, 7(1), 71–77.
137 G. P. Rangaiah and P. R. Krishnaswamy, Estimating second-order plus dead time model parameters, Ind. Eng. Chem. Res., 1994, 33(7), 1867–1871.
138 S. Chen, S. A. Billings and P. Grant, Non-linear system identification using neural networks, Int. J. Control, 1990, 51(6), 1191–1214.
139 M. Forgione, A. Muni, D. Piga and M. Gallieri, On the adaptation of recurrent neural networks for system identification, 2022.
140 L. Hewing, K. P. Wabersich, M. Menner and M. N. Zeilinger, Learning-based model predictive control: Toward safe learning in control, Annu. Rev. Control Robot. Auton. Syst., 2020, 3, 269–296.
141 K. Hornik, M. Stinchcombe and H. White, Multilayer feedforward networks are universal approximators, Neural Netw., 1989, 2(5), 359–366.
142 M. P. Deisenroth, R. D. Turner, M. F. Huber, U. D. Hanebeck and C. E. Rasmussen, Robust filtering and smoothing with gaussian processes, IEEE Trans. Autom. Control, 2011, 57(7), 1865–1871.
143 A. Damianou and N. D. Lawrence, Deep gaussian processes, in Artificial intelligence and statistics, PMLR, 2013, pp. 207–215.
144 E. Snelson, C. E. Rasmussen and Z. Ghahramani, Warped gaussian processes, Adv. Neural Inf. Process. Syst., 2004, 16, 337–344.
145 A. I. Cowen-Rivers, W. Lyu, R. Tutunov, Z. Wang, A. Grosnit, R. R. Griffiths, A. M. Maraval, H. Jianye, J. Wang, J. Peters and H. B. Ammar, An empirical study of assumptions in bayesian optimisation, 2021.
146 A. McHutchon and C. Rasmussen, Gaussian process training with input noise, Adv. Neural Inf. Process. Syst., 2011, 24, 1341–1349.
147 R. T. Chen, Y. Rubanova, J. Bettencourt and D. Duvenaud, Neural ordinary differential equations, 2018, arXiv preprint arXiv:1806.07366.
148 S. T. Bukkapatnam and C. Cheng, Forecasting the evolution of nonlinear and nonstationary systems using recurrence-based local gaussian process models, Phys. Rev. E,
149 S. L. Brunton, J. L. Proctor and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. U. S. A., 2016, 113(15), 3932–3937.
150 Z. T. Wilson and N. V. Sahinidis, The alamo approach to machine learning, Comput. Chem. Eng., 2017, 106, 785–795.
151 D. Machalek, T. Quah and K. M. Powell, A novel implicit hybrid machine learning model and its application for reinforcement learning, Comput. Chem. Eng., 2021, 107496.
152 J. W. Myers, K. B. Laskey and T. S. Levitt, Learning bayesian networks from incomplete data with stochastic search algorithms, 2013, arXiv preprint arXiv:1301.6726.
153 M. Raissi, P. Perdikaris and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys., 2019, 378, 686–707.
154 M. Raissi, P. Perdikaris and G. E. Karniadakis, Multistep neural networks for data-driven discovery of nonlinear dynamical systems, 2018, arXiv preprint arXiv:1801.01236.
155 L. Zhang and S. Garcia-Munoz, A comparison of different methods to estimate prediction uncertainty using partial least squares (pls): a practitioner's perspective, Chemom. Intell. Lab. Syst., 2009, 97(2), 152–158.
156 J. B. Rawlings, D. Q. Mayne and M. Diehl, Model predictive control: theory, computation, and design, Nob Hill Publishing, Madison, WI, 2017, vol. 2.
157 M. Kelly, An introduction to trajectory optimization: How to do your own direct collocation, SIAM Rev., 2017, 59(4), 849–904.
158 E. A. del Rio-Chanona, N. R. Ahmed, D. Zhang, Y. Lu and K. Jing, Kinetic modeling and process analysis for desmodesmus sp. lutein photo-production, AIChE J., 2017, 63(7), 2546–2554.
159 M. Mowbray, P. Petsagkourakis, E. A. D. R. Chanona, R. Smith and D. Zhang, Safe chance constrained reinforcement learning for batch process control, 2021, arXiv preprint arXiv:2104.11706.
160 E. Bradford and L. Imsland, Economic stochastic model predictive control using the unscented kalman filter, IFAC-PapersOnLine, 2018, vol. 51, 18, pp. 417–422.
161 Z. K. Nagy, B. Mahn, R. Franke and F. Allgöwer, Real-time implementation of nonlinear model predictive control of batch processes in an industrial framework, in Assessment and Future Directions of Nonlinear Model Predictive Control, Springer, 2007, pp. 465–472.
162 X.-C. Xi, A.-N. Poo and S.-K. Chou, Support vector regression model predictive control on a HVAC plant, Control Eng. Pract., 2007, 15(8), 897–908.
163 K. Kavsek-Biasizzo, I. Skrjanc and D. Matko, Fuzzy predictive control of highly nonlinear pH process, Comput. Chem. Eng., 1997, 21, S613–S618.
164 S. Piche, B. Sayyar-Rodsari, D. Johnson and M. Gerules, Nonlinear model predictive control using neural networks, IEEE Control Systems Magazine, 2000, 20(3), 53–62.
American Control Conference (ACC), IEEE, 2004, vol. 3, pp. 2214–2219.
166 E. Bradford, L. Imsland and E. A. del Rio-Chanona, Nonlinear model predictive control with explicit back-offs for gaussian process state space models, in 58th Conference on Decision and Control (CDC), IEEE, 2019, pp. 4747–4754.
167 M. Maiworm, D. Limon, J. M. Manzano and R. Findeisen, Stability of gaussian process learning based output feedback model predictive control, IFAC-PapersOnLine, 2018, vol. 51, 20, pp. 455–461, 6th IFAC Conference on Nonlinear Model Predictive Control NMPC 2018.
168 E. Bradford, L. Imsland, D. Zhang and E. A. del Rio Chanona, Stochastic data-driven model predictive control using gaussian processes, Comput. Chem. Eng., 2020, 139, 106844.
169 Z. Zhong, E. A. del Rio-Chanona and P. Petsagkourakis, Data-driven distributionally robust mpc using the wasserstein metric, 2021.
170 X. Feng and B. Houska, Real-time algorithm for self-reflective model predictive control, J. Process Control, 2018, 65, 68–77.
171 C. A. Larsson, C. R. Rojas, X. Bombois and H. Hjalmarsson, Experimental evaluation of model predictive control with excitation (mpc-x) on an industrial depropanizer, J. Process Control, 2015, 31, 1–16.
172 B. Houska, D. Telen, F. Logist, M. Diehl and J. F. V. Impe, An economic objective for the optimal experiment design of nonlinear dynamic processes, Automatica, 2015, 51, 98–103.
173 D. Telen, B. Houska, M. Vallerio, F. Logist and J. Van Impe, A study of integrated experiment design for nmpc applied to the droop model, Chem. Eng. Sci., 2017, 160, 370–383.
174 C. A. Larsson, M. Annergren, H. Hjalmarsson, C. R. Rojas, X. Bombois, A. Mesbah and P. E. Modén, Model predictive control with integrated experiment design for output error systems, in 2013 European Control Conference (ECC), 2013, pp. 3790–3795.
175 S. Olofsson, M. Deisenroth and R. Misener, Design of experiments for model discrimination hybridising analytical and data-driven approaches, in Proceedings of the 35th International Conference on Machine Learning, ed. J. Dy and A. Krause, Stockholmsmässan, Stockholm Sweden, PMLR, 10–15 Jul 2018, vol. 80 of Proceedings of Machine Learning Research, pp. 3908–3917.
176 N. P. Lawrence, M. G. Forbes, P. D. Loewen, D. G. McClement, J. U. Backstrom and R. B. Gopaluni, Deep reinforcement learning with shallow controllers: An experimental application to pid tuning, 2021.
177 H. Yoo, H. E. Byun, D. Han and J. H. Lee, Reinforcement learning for batch process control: Review and perspectives, Annu. Rev. Control, 2021, 52, 108–119.
178 R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 2nd edn, 2018.
179 E. Pan, P. Petsagkourakis, M. Mowbray, D. Zhang and E. A. del Rio-Chanona, Constrained model-free reinforcement learning for process optimization, Comput. Chem. Eng., 2021, 154, 107462.
180 J. M. Lee and J. H. Lee, Approximate dynamic programming-based approaches for input-output data-driven control of nonlinear processes, Automatica, 2005, 41(7), 1281–1288.
181 C. Peroni, N. Kaisare and J. Lee, Optimal control of a fed-batch bioreactor using simulation-based approximate dynamic programming, IEEE Trans. Control Syst. Technol., 2005, 13(5), 786–790.
182 J. H. Lee and J. M. Lee, Approximate dynamic programming based approach to process control and scheduling, Comput. Chem. Eng., 2006, 30(10–12), 1603–1618.
183 W. Tang and P. Daoutidis, Distributed adaptive dynamic programming for data-driven optimal control, Syst. Control. Lett., 2018, 120, 36–43.
184 S. Sæmundsson, K. Hofmann and M. P. Deisenroth, Meta reinforcement learning with latent variable gaussian processes, 2018.
185 S. Kamthe and M. Deisenroth, Data-efficient reinforcement learning with probabilistic model predictive control, in Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, ed. A. Storkey and F. Perez-Cruz, Proceedings of Machine Learning Research, Playa Blanca, Lanzarote, Canary Islands, PMLR, 2018, vol. 84, pp. 1701–1710.
186 D. Chaffart and L. A. Ricardez-Sandoval, Optimization and control of a thin film growth process: A hybrid first principles/artificial neural network based multiscale modelling approach, Comput. Chem. Eng., 2018, 119, 465–479.
187 H. Shah and M. Gopal, Model-Free Predictive Control of Nonlinear Processes Based on Reinforcement Learning, IFAC-PapersOnLine, 2016, vol. 49, 1, pp. 89–94.
188 V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg and D. Hassabis, Human-level control through deep reinforcement learning, Nature, 2015, 518, 529–533.
189 M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel and W. Zaremba, Hindsight experience replay, arXiv, 2017, preprint, arXiv:1707.01495, https://arxiv.org/abs/1707.01495.
190 W. Dabney, M. Rowland, M. G. Bellemare and R. Munos, Distributional reinforcement learning with quantile regression, arXiv, 2017, preprint, arXiv:1710.10044, https://arxiv.org/abs/1710.10044.
191 M. Hessel, J. Modayil, H. van Hasselt, T. Schaul, G. Ostrovski, W. Dabney, D. Horgan, B. Piot, M. Azar and D. Silver, Rainbow: Combining improvements in deep reinforcement learning, Thirty-second AAAI conference on artificial intelligence, 2018, vol. 393, pp. 3215–3222.
192 X. Wang, Y. Gu, Y. Cheng, A. Liu and C. L. P. Chen, Approximate policy-based accelerated deep reinforcement learning, IEEE Transactions on Neural Networks and Learning Systems, 2019, pp. 1–11.
193 Z. Wang, H. Li and C. Chen, Incremental reinforcement learning in continuous spaces via policy relaxation and importance weighting, IEEE Transactions on Neural Networks and Learning Systems, 2019, pp. 1–14.
194 Y. Hu, W. Wang, H. Liu and L. Liu, Reinforcement learning tracking control for robotic manipulator with kernel-based dynamic model, IEEE Transactions on Neural Networks and Learning Systems, 2019, pp. 1–9.
195 W. Meng, Q. Zheng, L. Yang, P. Li and G. Pan, Qualitative measurements of policy discrepancy for return-based deep q-network, IEEE Transactions on Neural Networks and Learning Systems, 2019, pp. 1–7.
196 R. S. Sutton, D. McAllester, S. Singh and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, MIT Press, Cambridge, MA, USA, 1999, pp. 1057–1063.
197 P. Facco, E. Tomba, F. Bezzo, S. García-Muñoz and M. Barolo, Transfer of process monitoring models between different plants using latent variable techniques, Ind. Eng. Chem. Res., 2012, 51(21), 7327–7339.
198 A. Krizhevsky, I. Sutskever and G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in Advances in Neural Information Processing Systems 25, ed. F. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger, Curran Associates, Inc., 2012, pp. 1097–1105.
199 O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg and L. Fei-Fei, ImageNet Large Scale Visual Recognition Challenge, Int. J. Comput. Vis., 2015, 115(3), 211–252.
200 J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng and T. Darrell, DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, 2013.
201 M. L. Darby, M. Nikolaou, J. Jones and D. Nicholson, RTO: An overview and assessment of current practice, J. Process Control, 2011, 21(6), 874–884.
202 M. M. Câmara, A. D. Quelhas and J. C. Pinto, Performance evaluation of real industrial RTO systems, Processes, 2016, 4(4), 1–20.
203 T. E. Marlin and A. N. Hrymak, Real-time operations optimization of continuous processes, in AIChE Symposium Series - CPC-V, 1997, vol. 93, pp. 156–164.
204 P. Tatjewski, Iterative optimizing set-point control – The basic principle redesigned, IFAC Proceedings Volumes, 2002, 35(1), 49–54.
205 B. Chachuat, B. Srinivasan and D. Bonvin, Adaptation strategies for real-time optimization, Comput. Chem. Eng., 2009, 33(10), 1557–1567.
206 A. Marchetti, B. Chachuat and D. Bonvin, Modifier-adaptation methodology for real-time optimization, Ind. Eng. Chem. Res., 2009, 48(13), 6022–6033.
207 T. Piotr, et al., Iterative algorithms for multilayer optimizing control, World Scientific, 2005.
220 T. J. Ikonen, K. Heljanko and I. Harjunkoski, Reinforcement learning of adaptive online rescheduling timing and computing time allocation, Comput. Chem. Eng., 2020, 141, 106994.
221 M. Mowbray, D. Zhang and E. A. Del Rio Chanona, Distributional Reinforcement Learning for Scheduling of (Bio)chemical Production Processes, 2022, arXiv preprint arXiv:2203.00636.
222 C. Waubert de Puiseau, R. Meyes and T. Meisen, On reliability of reinforcement learning based production scheduling systems: a comparative survey, J. Intell. Manuf.,
236 J. García and F. Fernández, A comprehensive survey on safe reinforcement learning, J. Mach. Learn. Res., 2015, 16(42), 1437–1480.
237 P. Petsagkourakis, I. O. Sandoval, E. Bradford, F. Galvanin, D. Zhang and E. A. del Rio-Chanona, Chance constrained policy optimization for process control and optimization, 2020.
238 M. Wen, Constrained Cross-Entropy Method for Safe Reinforcement Learning, Neural Information Processing Systems (NIPS), 2018.
239 J. Achiam, D. Held, A. Tamar and P. Abbeel, Constrained Policy Optimization, 2017, arXiv preprint 1705.10528.
240 C. Tessler, D. J. Mankowitz and S. Mannor, Reward Constrained Policy Optimization, 2018, arXiv preprint 1805.11074, pp. 1–15.
241 Gurobi Optimization, LLC, Gurobi Optimizer Reference Manual, 2021.
242 A. Wächter and L. T. Biegler, On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming, Math. Program., 2006, 106(1), 25–57.
243 The Mathworks, Inc., Natick, Massachusetts, MATLAB version 9.11 (R2021b), 2021.
244 A. Engelmann, Y. Jiang, H. Benner, R. Ou, B. Houska and T. Faulwasser, Aladin-α – an open-source matlab toolbox for distributed non-convex optimization, 2021.
245 P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt and SciPy 1.0 Contributors, SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nat. Methods, 2020, 17, 261–272.
246 C. Rackauckas and Q. Nie, Differentialequations.jl – a performant and feature-rich ecosystem for solving differential equations in julia, J. Open Res. Softw., 2017, 5, 1–10.
247 R. T. Q. Chen, Y. Rubanova, J. Bettencourt and D. Duvenaud, Neural ordinary differential equations, Adv. Neural Inf. Process. Syst., 2018, 31, 6571–6583.
248 J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne and Q. Zhang, JAX: composable transformations of Python+NumPy programs, 2018.
249 C. Rackauckas, M. Innes, Y. Ma, J. Bettencourt, L. White and V. Dixit, Diffeqflux.jl - A julia library for neural differential equations, arXiv, 2019, preprint, arXiv:1902.02376, https://arxiv.org/abs/1902.02376.
250 GPy, GPy: A gaussian process framework in python, http://github.com/SheffieldML/GPy, since 2012.
251 J. R. Gardner, G. Pleiss, D. Bindel, K. Q. Weinberger and A. G. Wilson, Gpytorch: Blackbox matrix-matrix gaussian process inference with gpu acceleration, in Advances in Neural Information Processing Systems, 2018.
252 A. G. D. G. Matthews, M. van der Wilk, T. Nickson, K. Fujii, A. Boukouvalas, P. León-Villagrá, Z. Ghahramani and J. Hensman, GPflow: A Gaussian process library using TensorFlow, J. Mach. Learn. Res., 2017, 18, 1–6.
253 A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala, Pytorch: An imperative style, high-performance deep learning library, in Advances in Neural Information Processing Systems 32, ed. H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox and R. Garnett, Curran Associates, Inc., 2019, pp. 8024–8035.
254 F. Chollet, et al., Keras, https://keras.io, 2015.
255 M. Innes, Flux: Elegant machine learning with julia, J. Open Source Softw., 2018, 3, 60.
256 D. Yuret, Knet: beginning deep learning with 100 lines of julia, in Machine Learning Systems Workshop at NIPS, 2016, vol. 2016, p. 5.
257 L. McInnes, J. Healy, N. Saul and L. Grossberger, Umap: Uniform manifold approximation and projection, J. Open Source Softw., 2018, 3(29), 861.
258 D. Lin, Multivariatestats documentation, 2018.
259 S. M. Lundberg and S.-I. Lee, A unified approach to interpreting model predictions, in Advances in Neural Information Processing Systems 30, ed. I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan and R. Garnett, Curran Associates, Inc., 2017, pp. 4765–4774.
260 M. T. Ribeiro, S. Singh and C. Guestrin, "Why should I trust you?": Explaining the predictions of any classifier, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016, pp. 1135–1144.
261 E. Štrumbelj and I. Kononenko, Explaining prediction models and individual predictions with feature contributions, Knowl. Inf. Syst., 2014, 41(3), 647–665.
262 W. R. Lacerda, L. P. C. da Andrade, S. C. P. Oliveira and S. A. M. Martins, Sysidentpy: A python package for system identification using narmax models, J. Open Source Softw., 2020, 5(54), 2384.
263 S. Diamond and S. Boyd, Cvxpy: A python-embedded modeling language for convex optimization, J. Mach. Learn. Res., 2016, 17(1), 2909–2913.
264 L. Beal, D. Hill, R. Martin and J. Hedengren, Gekko optimization suite, Processes, 2018, 6(8), 106.
265 I. Dunning, J. Huchette and M. Lubin, Jump: A modeling language for mathematical optimization, SIAM Rev., 2017, 59(2), 295–320.
266 W. E. Hart, J.-P. Watson and D. L. Woodruff, Pyomo: modeling and solving mathematical programs in python, Math. Program. Comput., 2011, 3(3), 219–260.
267 The GpyOpt authors, GPyOpt: A bayesian optimization framework in python, http://github.com/SheffieldML/GPyOpt, 2016.
1508 | React. Chem. Eng., 2022, 7, 1471–1509 This journal is © The Royal Society of Chemistry 2022
View Article Online
268 M. Balandat, B. Karrer, D. Jiang, S. Daulton, B. Letham, Stoica, Ray: A distributed framework for emerging ai
A. G. Wilson and E. Bakshy, Botorch: A framework for applications, 2018.
efficient monte-carlo bayesian optimization, Adv. Neural Inf. 276 E. Liang, R. Liaw, P. Moritz, R. Nishihara, R. Fox, K.
Process. Syst., 2020, 33, 21524–21538. Goldberg, J. E. Gonzalez, M. I. Jordan and I. Stoica,
269 N. Knudde, J. van der Herten, T. Dhaene and I. Couckuyt, Rllib: Abstractions for distributed reinforcement learning,
Gpflowopt: A bayesian optimization library using 2018.
tensorflow, 2017, arXiv preprint arXiv:1711.03845. 277 G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J.
270 J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings and M. Schulman, J. Tang and W. Zaremba, Openai gym, 2016,
Diehl, CasADi – A software framework for nonlinear arXiv preprint arXiv:1606.01540.
This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.
optimization and optimal control, Math. Program. Comput., 278 J. Tian and other contributors, Reinforcementlearning.jl: A
2019, 11(1), 1–36. reinforcement learning package for the julia programming
Open Access Article. Published on 21 April 2022. Downloaded on 7/18/2024 9:03:24 AM.
This journal is © The Royal Society of Chemistry 2022 React. Chem. Eng., 2022, 7, 1471–1509 | 1509