Zhang 2018
Abstract: This paper studies product yields forecasting for the fluid catalytic cracking unit (FCCU). Conventional product yields forecasting is usually based on a mechanistic model, which may ignore some significant factors due to manual approximations. Deep learning methods can extract features automatically from data, without prior knowledge. Considering the bidirectional temporal features and the spatial features of the FCCU, a deep bidirectional long short-term memory (DBLSTM) network is proposed for product yields forecasting. The bidirectional structure can capture the bidirectional temporal features of the FCCU by considering previous as well as future information over a period of time. Significant spatial features of the sensors at each time step can be extracted automatically through a deep structure built by stacking multiple bidirectional structures. Moreover, the deep bidirectional LSTM network can deal with long-term dependencies by integrating the deep bidirectional structure with the LSTM cell. To avoid overfitting, the regularizations adopted in this paper are dropout and early stopping. The efficacy of the DBLSTM approach is demonstrated on process data from an actual FCCU in China. Through a comparison of the mean squared error on product yields forecasting, the DBLSTM approach is shown to be superior to traditional regression models and other recurrent models.
Key Words: product yields forecasting, bidirectional temporal features, spatial features, DBLSTM, FCCU
2 Related Works

In this section, previous works related to RNN are reviewed, including the deep bidirectional RNN and LSTM cells. RNN, as a class of deep learning models and a purely data-driven model, has been used extensively in sequential data modeling for speech recognition and natural language processing [9, 10]. One shortcoming of conventional RNNs is that they can only use previous information. However, in off-line modeling of an FCC unit, the data are collected over a period of time from the real-time database, so previous information as well as future information is available within that period. To extract bidirectional temporal information, a bidirectional RNN (BRNN) can read the data in both directions with two separate hidden layers, which are fed to the same output layer [11]. Besides, spatial features at different locations of the FCCU should be extracted as well. Since a deep structure can extract high-level representations [12], the deep bidirectional RNN (DBRNN) is utilized for extracting spatial features.
2.1 Deep Bidirectional RNN

The lth layer of input-to-hidden connections is parameterized by the weight matrices $\overrightarrow{U}^l$ and $\overleftarrow{U}^l$. The lth layer of hidden-to-hidden recurrent connections is parameterized by the weight matrices $\overrightarrow{W}^l$ and $\overleftarrow{W}^l$. The lth layer of hidden-to-output connections is parameterized by the weight matrices $\overrightarrow{V}^l$ and $\overleftarrow{V}^l$, and the deep bidirectional propagation in this model is defined as follows:

$\overrightarrow{h}^l(t)_{input} = \overrightarrow{W}^l \overrightarrow{h}^l(t-1) + \overrightarrow{U}^l \tilde{Y}^{l-1}(t) + \overrightarrow{b}^l_{in}$, (1)
$\overrightarrow{h}^l(t) = f_1(\overrightarrow{h}^l(t)_{input})$, (2)
$\overleftarrow{h}^l(t)_{input} = \overleftarrow{W}^l \overleftarrow{h}^l(t+1) + \overleftarrow{U}^l \tilde{Y}^{l-1}(t) + \overleftarrow{b}^l_{in}$, (3)
$\overleftarrow{h}^l(t) = f_1(\overleftarrow{h}^l(t)_{input})$, (4)
$\tilde{Y}^l(t) = f_2(\overrightarrow{V}^l \overrightarrow{h}^l(t) + \overleftarrow{V}^l \overleftarrow{h}^l(t) + b^l_{out})$. (5)

The final output is $\tilde{Y}^N$, where N is the total number of layers and l = 1, 2, ..., N. The only difference from the BRNN is that the input of the lth layer is the output of the (l − 1)th layer.
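To make this propagation concrete, here is a minimal NumPy sketch of one bidirectional layer following equations (1)-(5); the function name, argument layout, and zero initial states are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def brnn_layer(X, Wf, Uf, bf, Wb, Ub, bb, Vf, Vb, b_out,
               f1=np.tanh, f2=lambda z: z):
    """One bidirectional layer, equations (1)-(5). X has shape (T, d_in)."""
    T, d_h = X.shape[0], Wf.shape[0]
    h_fwd = np.zeros((T, d_h))              # forward states, eqs. (1)-(2)
    h_bwd = np.zeros((T, d_h))              # backward states, eqs. (3)-(4)
    for t in range(T):                      # left-to-right pass
        prev = h_fwd[t - 1] if t > 0 else np.zeros(d_h)
        h_fwd[t] = f1(Wf @ prev + Uf @ X[t] + bf)
    for t in reversed(range(T)):            # right-to-left pass
        nxt = h_bwd[t + 1] if t < T - 1 else np.zeros(d_h)
        h_bwd[t] = f1(Wb @ nxt + Ub @ X[t] + bb)
    # eq. (5): merge both directions into the layer output at every step
    return np.stack([f2(Vf @ h_fwd[t] + Vb @ h_bwd[t] + b_out)
                     for t in range(T)])
```

Stacking such layers, with the output of one layer serving as the input of the next, gives the DBRNN described above.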
2.2 LSTM Cell

A simple RNN is trained via backpropagation through time, which can be seen as training a very deep network unfolded along the time steps; hence the gradients tend to vanish or explode. The long short-term memory (LSTM) network can deal with this problem and prevent back-propagated errors from vanishing or exploding [13]. A single LSTM cell is shown in Fig. 2, where '◦' denotes the Hadamard product.

The LSTM cell has input to input gate, forget gate, output gate, and cell state connections, parameterized by the weight matrices $U_i$, $U_f$, $U_o$, $U_c$, respectively. It also has hidden state to input gate, forget gate, output gate, and cell state connections, parameterized by the weight matrices $W_i$, $W_f$, $W_o$, $W_c$, respectively. The LSTM cell is described by equations (6)-(11):

$i(t) = \sigma(U_i X(t) + W_i h(t-1))$, (6)
$f(t) = \sigma(U_f X(t) + W_f h(t-1))$, (7)
$o(t) = \sigma(U_o X(t) + W_o h(t-1))$, (8)
$\tilde{c}(t) = \tanh(U_c X(t) + W_c h(t-1))$, (9)
$c(t) = f(t) \circ c(t-1) + i(t) \circ \tilde{c}(t)$, (10)
$h(t) = o(t) \circ \tanh(c(t))$, (11)

where '◦' denotes the Hadamard product, σ is the sigmoid function, tanh is the hyperbolic tangent function, i(t) is the input gate, f(t) is the forget gate, o(t) is the output gate, c(t) is the final cell state, and h(t) is the hidden state. i(t) determines how much of X(t) matters, f(t) decides how much of c(t − 1) (the last cell state) should be forgotten, o(t) determines how much of c(t) should be exposed, and c̃(t) shows how to compute the new cell state.
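For illustration, a single update of equations (6)-(11) can be written in a few lines of NumPy; the dictionary layout for the weights is our assumption, and, like the equations above, the sketch omits bias terms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, U, W):
    """One LSTM cell update, equations (6)-(11). U and W map the gate
    names ('i', 'f', 'o', 'c') to the corresponding weight matrices."""
    i = sigmoid(U['i'] @ x_t + W['i'] @ h_prev)        # input gate, eq. (6)
    f = sigmoid(U['f'] @ x_t + W['f'] @ h_prev)        # forget gate, eq. (7)
    o = sigmoid(U['o'] @ x_t + W['o'] @ h_prev)        # output gate, eq. (8)
    c_tilde = np.tanh(U['c'] @ x_t + W['c'] @ h_prev)  # candidate, eq. (9)
    c = f * c_prev + i * c_tilde        # Hadamard products, eq. (10)
    h = o * np.tanh(c)                  # hidden state, eq. (11)
    return h, c
```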
3 Formulation of Product Yields Forecasting Problem

There are three subsystems in an FCC unit: the reactor-regenerator system, the fractionating system, and the absorption-stabilization system. Here, however, we consider the FCC unit as a whole, defined as a multi-input-multi-output (MIMO) system (shown in Fig. 3). The input variables mainly include three categories: feedstock, temperature, and pressure, which are the major concerns affecting product yields. Product flowrate means the flow rate of Gasoline, liquefied petroleum gas (LPG), diesel, and coke. Table 1 presents the physical unit of each category.
[Fig. 3: Main categories of process variables in FCC unit]

Table 1: Physical unit of input and output variables

                   Category            Notation   Physical Unit
Input Variables    Temperature         T          °C
                   Feedstock           Q          t/h
                   Pressure            P          kPa
Output Variables   Product Flowrate    R          t/h
                   Product Yields      Y          %

We consider a system with m-dimensional inputs and n-dimensional outputs. In order to establish a relationship between history inputs and future outputs, the time delay should be considered.

The purpose of training the model is to find the nonlinear map as accurately as possible (shown in Fig. 4(a)). Finally, the short-term forecasting process can be described according to the nonlinear map f(·):

$\tilde{Y}(t) = f[x(t), y(t-1)]$, (18)
$\tilde{Y}(t+p) = f[x(t+p), \tilde{Y}(t+p-1)]$, (19)

where p = 1, 2, ..., P, and P is the total number of forecasting steps, satisfying $P \in \{P \mid 0 < P \le \tau, P \in \mathbb{Z}\}$. Accordingly, a recursive forecasting of production yields can be achieved (shown in Fig. 4(b)), where X and Y denote the inputs and outputs of the model, respectively, and L is the total number of time steps of the inputs. In the recursive forecasting process, data can only be obtained from the real-time database before time step $k_2 + t - 1$, which means the product yields forecasting can run ahead of the real-time database.
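The recursion of equations (18)-(19) amounts to feeding each prediction back as the next step's last-step output; a sketch, assuming a fitted one-step map f (the names and signature here are hypothetical):

```python
def recursive_forecast(f, x_future, y_last, P):
    """Recursive short-term forecasting, equations (18)-(19).

    f        : fitted one-step map, f(x_t, y_prev) -> y_t (hypothetical)
    x_future : inputs x(t), ..., x(t+P-1)
    y_last   : last measured output y(t-1)
    P        : number of forecasting steps (0 < P <= tau)
    """
    predictions = []
    y_prev = y_last
    for p in range(P):
        y_prev = f(x_future[p], y_prev)   # prediction is fed back as input
        predictions.append(y_prev)
    return predictions
```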
The bidirectional input weight matrices and state weight matrices are:

$\overrightarrow{W}^l = [\overrightarrow{W}_i^l, \overrightarrow{W}_c^l, \overrightarrow{W}_f^l, \overrightarrow{W}_o^l]$, $\overleftarrow{W}^l = [\overleftarrow{W}_i^l, \overleftarrow{W}_c^l, \overleftarrow{W}_f^l, \overleftarrow{W}_o^l]$, (20)

$\overrightarrow{U}^l = [\overrightarrow{U}_i^l, \overrightarrow{U}_c^l, \overrightarrow{U}_f^l, \overrightarrow{U}_o^l]$, $\overleftarrow{U}^l = [\overleftarrow{U}_i^l, \overleftarrow{U}_c^l, \overleftarrow{U}_f^l, \overleftarrow{U}_o^l]$, (21)

where l denotes the lth layer of the deep structure, l = 1, 2, ..., N. The lth layer of the forward deep LSTM is:

$\overrightarrow{i}^l(t) = \sigma(\overrightarrow{U}_i^l \tilde{Y}^{l-1}(t) + \overrightarrow{W}_i^l \overrightarrow{h}^l(t-1))$, (22)
$\overrightarrow{f}^l(t) = \sigma(\overrightarrow{U}_f^l \tilde{Y}^{l-1}(t) + \overrightarrow{W}_f^l \overrightarrow{h}^l(t-1))$, (23)
$\overrightarrow{o}^l(t) = \sigma(\overrightarrow{U}_o^l \tilde{Y}^{l-1}(t) + \overrightarrow{W}_o^l \overrightarrow{h}^l(t-1))$, (24)
$\tilde{\overrightarrow{c}}^l(t) = \tanh(\overrightarrow{U}_c^l \tilde{Y}^{l-1}(t) + \overrightarrow{W}_c^l \overrightarrow{h}^l(t-1))$, (25)
$\overrightarrow{c}^l(t) = \overrightarrow{f}^l(t) \circ \overrightarrow{c}^l(t-1) + \overrightarrow{i}^l(t) \circ \tilde{\overrightarrow{c}}^l(t)$, (26)
$\overrightarrow{h}^l(t) = \overrightarrow{o}^l(t) \circ \tanh(\overrightarrow{c}^l(t))$. (27)

Similarly, the lth layer of the backward deep LSTM is given as follows:

$\overleftarrow{i}^l(t) = \sigma(\overleftarrow{U}_i^l \tilde{Y}^{l-1}(t) + \overleftarrow{W}_i^l \overleftarrow{h}^l(t+1))$, (28)
$\overleftarrow{f}^l(t) = \sigma(\overleftarrow{U}_f^l \tilde{Y}^{l-1}(t) + \overleftarrow{W}_f^l \overleftarrow{h}^l(t+1))$, (29)
$\overleftarrow{o}^l(t) = \sigma(\overleftarrow{U}_o^l \tilde{Y}^{l-1}(t) + \overleftarrow{W}_o^l \overleftarrow{h}^l(t+1))$, (30)
$\tilde{\overleftarrow{c}}^l(t) = \tanh(\overleftarrow{U}_c^l \tilde{Y}^{l-1}(t) + \overleftarrow{W}_c^l \overleftarrow{h}^l(t+1))$, (31)
$\overleftarrow{c}^l(t) = \overleftarrow{f}^l(t) \circ \overleftarrow{c}^l(t+1) + \overleftarrow{i}^l(t) \circ \tilde{\overleftarrow{c}}^l(t)$, (32)
$\overleftarrow{h}^l(t) = \overleftarrow{o}^l(t) \circ \tanh(\overleftarrow{c}^l(t))$. (33)

The output of the lth layer is:

$\tilde{Y}^l(t) = F(\overrightarrow{V}^l \overrightarrow{h}^l(t) + \overleftarrow{V}^l \overleftarrow{h}^l(t) + b_{out}^l)$. (34)

Since l = 1, 2, ..., N, the final output is

$\tilde{Y}(t) = \tilde{Y}^N(t) = F(\overrightarrow{V}^N \overrightarrow{h}^N(t) + \overleftarrow{V}^N \overleftarrow{h}^N(t) + b_{out}^N)$, (35)

where F(·) is the output activation function. Because product yields forecasting is a regression problem and continuity is required, a linear function is selected as the output activation function.

So far, we have obtained the model of the proposed DBLSTM by equations (22)-(35). Via the DBLSTM, the temporal features are extracted automatically by the recurrent structure, the spatial features at each time step are extracted automatically through the deep structure, and the long-term dependency problem is handled by the LSTM cell.
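Since the experiments in Section 5 are implemented in Keras, one plausible realization of this stacked structure is sketched below; the 72-50-50-50-4 shape and the dropout rates are taken from Section 5.2, while the exact layer arrangement (and whether the 50 units are per direction) is our assumption.

```python
from keras.models import Sequential
from keras.layers import LSTM, Bidirectional, Dense, Dropout

# A sketch of a 72-50-50-50-4 DBLSTM; the 10 time steps and the dropout
# rates follow Section 5.2, the rest is an illustrative assumption.
model = Sequential()
model.add(Bidirectional(LSTM(50, return_sequences=True),
                        input_shape=(10, 72)))
model.add(Dropout(0.1))                            # dropout after layer 1
model.add(Bidirectional(LSTM(50, return_sequences=True)))
model.add(Dropout(0.3))                            # dropout after layer 2
model.add(Bidirectional(LSTM(50)))                 # last hidden state only
model.add(Dense(4, activation='linear'))           # linear output, eq. (35)
```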
4.2 The Training of DBLSTM

(1) Model Loss

When training the model, the mean squared error (MSE) is utilized as the loss to measure how far the predicted production yields are from the real production yields. In the training phase, the data are divided into a training set, a validation set, and a test set, which give the learning MSE, validation MSE, and test MSE, defined by equations (36)-(38):

$\mathrm{Learning\,MSE} = \frac{1}{n} \frac{1}{L_{train}} \sum_{i=1}^{n} \sum_{t=1}^{L_{train}} (\tilde{Y}_i^N(t) - Y_i(t))^2$, (36)

$\mathrm{Validation\,MSE} = \frac{1}{n} \frac{1}{L_{val}} \sum_{i=1}^{n} \sum_{t=1}^{L_{val}} (\tilde{Y}_i^N(t) - Y_i(t))^2$, (37)

$\mathrm{Test\,MSE} = \frac{1}{n} \frac{1}{L_{test}} \sum_{i=1}^{n} \sum_{t=1}^{L_{test}} (\tilde{Y}_i^N(t) - Y_i(t))^2$, (38)

where $L_{train}$, $L_{val}$, and $L_{test}$ are the lengths of the training set, validation set, and test set, respectively, n is the dimension of the output, $\tilde{Y}_i^N$ is the ith dimensional final output of the DBLSTM, i.e., the predicted production yields, and $Y_i$ is the ith dimensional real product yields. In the short-term forecasting process described in equations (18) and (19), we use the forecast MSE to measure the error between the short-term forecasting value and the real value:

$\mathrm{Forecast\,MSE} = \frac{1}{n} \frac{1}{L_{fore}} \sum_{i=1}^{n} \sum_{t=1}^{L_{fore}} (\tilde{Y}_i^N(t) - Y_i(t))^2$, (39)

where $L_{fore}$ is the length of the forecast set.
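Equations (36)-(39) are the same computation applied to different data splits, which a one-line NumPy helper makes explicit (arrays are assumed to have shape (L, n)):

```python
import numpy as np

def split_mse(Y_pred, Y_true):
    """MSE of equations (36)-(39): mean over all n outputs and L steps."""
    return np.mean((Y_pred - Y_true) ** 2)  # equals the (1/n)(1/L) double sum
```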
(2) Optimization Algorithm

Adam [14] is chosen as the optimization algorithm, and its update rules are given by equations (40)-(45):

$g_{k+1} = \nabla_\theta f_{k+1}(\theta_k)$, (40)
$m_{k+1} = \beta_1 \cdot m_k + (1 - \beta_1) \cdot g_{k+1}$, (41)
$v_{k+1} = \beta_2 \cdot v_k + (1 - \beta_2) \cdot g_{k+1}^2$, (42)
$\hat{m}_{k+1} = m_{k+1} / (1 - \beta_1^{k+1})$, (43)
$\hat{v}_{k+1} = v_{k+1} / (1 - \beta_2^{k+1})$, (44)
$\theta_{k+1} = \theta_k - \alpha \cdot \hat{m}_{k+1} / (\sqrt{\hat{v}_{k+1}} + \epsilon)$, (45)

where $g_{k+1}$ is the stochastic gradient w.r.t. $\theta_k$ at epoch k + 1, $g_{k+1}^2$ denotes the Hadamard product $g_{k+1} \circ g_{k+1}$, m is the first moment vector, v is the second moment vector, k is the kth epoch, α is the learning rate, $\beta_1$ is the exponential decay rate for the first moment estimates, $\beta_2$ is the exponential decay rate for the second moment estimates, and ε is a very small value that avoids division by zero. The parameter settings are as follows: α = 0.001, $\beta_1$ = 0.9, $\beta_2$ = 0.999, ε = 10⁻⁸, and the initial values of m, v, θ, k are set to zero.
(3) Regularization

To avoid overfitting, one regularization adopted in this paper is dropout [15]. The term dropout means temporarily removing units in a neural network along with their incoming and outgoing connections.
The removed units are chosen randomly with a fixed probability p during training, and thus we obtain a thinner network. At test time, dropout is no longer applied; instead, the outputs of all units are multiplied by the retention probability (1 − p), which gives the final outputs. In this way, different epochs produce various network structures with their corresponding overfittings, and the effect of dropout is to average these different networks, which counteracts their respective overfitting. Besides, since the update of the weights may depend on a few neurons with fixed relationships, dropout can also prevent the co-adaptation of neurons.
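The training/testing behavior described above can be sketched as follows (a plain illustration with drop probability p, not the paper's code):

```python
import numpy as np

def dropout_forward(x, p, training):
    """Drop units with probability p in training; scale by (1 - p) at test."""
    if training:
        mask = np.random.rand(*x.shape) >= p   # keep each unit w.p. (1 - p)
        return x * mask                        # the 'thinner' network
    return x * (1.0 - p)                       # expected-value scaling
```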
Apart from dropout, the other regularization used in this paper is early stopping [16]. With early stopping, training stops when the validation loss stops decreasing, so it decides the number of epochs in the experiments reported later.
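In Keras, which the experiments in Section 5 use, this corresponds to an EarlyStopping callback monitoring the validation loss; the patience value below is our guess, as the paper does not report one.

```python
from keras.callbacks import EarlyStopping

# stop training once the validation loss stops decreasing
early_stop = EarlyStopping(monitor='val_loss', patience=5)
```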
5 Experiments

In this section, the performance of the proposed DBLSTM as well as other methods is evaluated by conducting experiments on an actual FCC unit. First, the data acquisition from the real FCC unit is detailed. Then, the experiment is designed with fair settings. Finally, the comparison of MSE as well as the product yields forecasting results are presented.

5.1 Data Acquisition

Our data are all obtained from the real-time database of an actual FCC unit of Sinopec Jiujiang Company, China. In order to make the best use of the data, most sensor measurements in the three subsystems are chosen as input variables, as shown in Table 2.

Table 2: Tags of input variables

NO.   Category                          Tag                      Total number
1     Reactor-regenerator system        T1~T16, Q1~Q10, P1~P3    29
2     Fractionating system              T17~T35                  19
3     Absorption-stabilization system   T36~T55                  20

In Table 2, T denotes temperature, Q denotes feedstock, and P denotes pressure; their physical units have been shown in Table 1. The tags of the output variables are given in Table 3:

Table 3: Tags of output variables

NO.   Category           Tag      Total number
1     Product Flowrate   R1~R4    4
2     Product Yields     Y1~Y4    4

Product flowrate is obtained directly from the real-time database of the FCC unit, but product yields are what we finally need. The relationship between product flowrate and product yields is defined by the following prior knowledge:

$Y(i) = \frac{R(i)}{Q_8 + Q_9 + Q_{10}} \times 100\%$, (46)

where i = 1, 2, ..., n, Y(i) is the yield of the ith product (its physical unit can be seen in Table 1), and R(i) is the flow rate of the ith product, that is, the flow rate of Gasoline, liquefied petroleum gas (LPG), diesel, and coke, respectively. $Q_8$ is the fresh feed flow rate, $Q_9$ is the incoming residuum flow rate, and $Q_{10}$ is the incoming sump oil flow rate. Gasoline, LPG, and diesel are the three main products and coke is the byproduct, so here n = 4. Hence, according to equations (16) and (17), the input dimension is 72 (the 68 sensor tags in Table 2 plus the 4 last-step product yields) and the output dimension is 4.
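As a quick numerical illustration of equation (46) (the flow rates below are made-up values, not plant data):

```python
# Made-up flow rates in t/h, purely to illustrate equation (46).
R = [45.0, 16.0, 28.0, 9.0]         # Gasoline, LPG, diesel, coke
Q8, Q9, Q10 = 90.0, 15.0, 5.0       # fresh feed, residuum, sump oil

total_feed = Q8 + Q9 + Q10          # 110.0 t/h
yields = [r / total_feed * 100.0 for r in R]
print(yields)                        # ~[40.9, 14.5, 25.5, 8.2], in %
```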
In this paper, the data were obtained from the real-time database in May 2017. After data preprocessing, which includes outlier deletion and normalization, 13100 samples are left in total: 11000 samples in the training set, 1000 samples in the validation set, 1000 samples in the test set, and 100 samples in the forecast set.

5.2 Experimental Design

Since support vector regression (SVR) and the multilayer perceptron (MLP) are state-of-the-art regression models, and RNN, BRNN, and DBRNN are earlier recurrent models, they are chosen to be compared with the proposed DBLSTM for production yields forecasting.

For a fair comparison, the following settings are used: the structures of MLP, DBRNN, and DBLSTM are all set to 72-50-50-50-4, and the hidden neurons of RNN and BRNN are all set to 50 as well. Apart from SVR, the number of epochs of the other methods is decided by early stopping. The batch size of MLP and the recurrent models is set to 10. The time steps of the recurrent models are all set to 10. We choose Adam as the optimization algorithm when training MLP and the recurrent models. In the deep structures (MLP, DBRNN, and DBLSTM), the dropout probabilities of the first layer and the second layer are 0.1 and 0.3, respectively.

Note that MLP and the recurrent models are implemented in Keras, SVR runs in Matlab, and all experiments are carried out on a PC with an Intel Core i5-6400 CPU.
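Putting the stated settings together (MSE loss, Adam, batch size 10, epochs decided by early stopping), a hedged sketch of the training call might look like this, reusing the model and callback sketched earlier; the array names and the epoch upper bound are hypothetical.

```python
# MSE loss, Adam, batch size 10, early stopping (settings from Section 5.2);
# X_train, Y_train, X_val, Y_val are hypothetical array names.
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, Y_train,
          batch_size=10,
          epochs=500,                      # upper bound; early stopping
          callbacks=[early_stop],          # decides the actual epoch count
          validation_data=(X_val, Y_val))
```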
5.3 Experimental Results on Product Yields Forecasting

Table 4 presents the comparison of MSE on the different data sets. In general, the models are divided into regression models and recurrent models. SVR and the three-layered MLP are regression models, and they are evaluated by the learning MSE (LMSE) on the training set, the validation MSE (VMSE) on the validation set, and the test MSE (TMSE) on the test set. Since regression models cannot capture temporal features, we do not consider their forecast MSE (FMSE) on the forecast set. The recurrent models are RNN, BRNN, DBRNN, and DBLSTM, which are evaluated by LMSE, VMSE, TMSE, and FMSE; the most important indicator is FMSE. DBRNN1 and DBLSTM1 denote the deep structures without dropout, and DBRNN2 and DBLSTM2 denote the deep structures with dropout. The number of epochs of MLP and the recurrent models is decided by early stopping. As Table 4 shows, the recurrent models are superior to the traditional regression models, and DBLSTM with dropout (DBLSTM2) outperforms the other deep bidirectional recurrent models in FMSE.

Hence, DBLSTM with dropout is the adopted model for product yields forecasting. The model is tested with real inputs and real last-step outputs. Because the test data are too large to show in full, the test results of 100 samples are illustrated in Fig. 7. In the short-term forecasting process, the DBLSTM model is used to generate predicted outputs as a part of the
inputs (equation (19)). In this way, a recursive forecasting is realized, and the forecasting results are shown in Fig. 8. Since the errors between the real outputs and the predicted outputs accumulate, we only consider short-term forecasting; here the total number of forecasting steps is 20.

Table 4: MSE comparison of different methods

Category            Methods   LMSE   VMSE   TMSE   FMSE   Epoch
Regression Models   SVR       0.30   4.6    3.2    --     15715
                    MLP       0.47   0.48   0.40   --     61
Recurrent Models    RNN       0.26   0.59   0.40   5.91   179
                    BRNN      0.11   0.57   0.35   1.51   159
                    DBRNN1    0.026  0.47   0.21   1.33   193
                    DBRNN2    0.09   0.33   0.18   1.03   386
                    DBLSTM1   0.023  0.45   0.23   1.28   353
The Model           DBLSTM2   0.09   0.41   0.17   0.68   263
[Fig. 6: Model testing: product yields estimation using real last-step outputs (real vs. predicted values of Gasoline, LPG, Diesel Oil, and Coke, in %, over 100 time steps)]

[Fig. 8: Short-term forecasting results (real vs. forecasting values of Gasoline, LPG, Diesel Oil, and Coke, in %, over 20 time steps)]
6 Conclusion

This paper proposed a DBLSTM approach for product yields forecasting based on an FCC unit. Bidirectional temporal features, as well as spatial features of the sensors in the FCC unit, can be extracted automatically by the DBRNN structure, and long-term dependencies can be captured by the LSTM cells. The comparison of MSE verifies that DBLSTM is superior to traditional regression models and other recurrent models in product yields forecasting. In future work, an innovation of the RNN structure will be further considered for online yields forecasting.

References

[1] J. Gary, G. Handwerk, and M. Kaiser, Petroleum Refining Technology and Economics, Florida: CRC Press, chapter 1, 2007.
[2] R. Sadeghbeigi, Fluid Catalytic Cracking Handbook: An Expert Guide to the Practical Operation, Design, and Optimization of FCC Units, Elsevier, 2012.
[3] P. Kadlec, B. Gabrys, and S. Strandt, Data-driven soft sensors in the process industry, Computers and Chemical Engineering, 33(4): 795-814, 2009.
[4] W. Yan, H. Shao, and X. Wang, Soft sensing modeling based on support vector machine and Bayesian model selection, Computers and Chemical Engineering, 28(8): 1489-1498, 2004.
[5] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, pp. 239, 2013.
[6] G. E. Hinton, S. Osindero, and Y. W. Teh, A fast learning algorithm for deep belief nets, Neural Computation, 18(7): 1527-1554, 2006.
[9] B. Peng, K. Yao, L. Jing, and K. Wong, Recurrent neural networks with external memory for spoken language understanding, arXiv: Computation and Language, 25-35, 2015.
[10] X. Li and X. Wu, Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition, International Conference on Acoustics, Speech, and Signal Processing, 4520-4524, 2015.
[11] M. Schuster and K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, 45(11): 2673-2681, 1997.
[12] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521(7553): 436-444, 2015.
[13] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 9(8): 1735-1780, 1997.
[14] D. P. Kingma and J. L. Ba, Adam: a method for stochastic optimization, International Conference on Learning Representations, 2015.