
Proceedings of the 37th Chinese Control Conference

July 25-27, 2018, Wuhan, China

Product Yields Forecasting for FCCU via Deep Bi-directional LSTM Network

Xu Zhang1, Yuanyuan Zou1, Shaoyuan Li1∗, Shenghu Xu2
1. Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing,
Ministry of Education of China, Shanghai 200240, China
E-mail: syli@sjtu.edu.cn
2. Sinopec Jiujiang Company, Jiujiang 332000, China

Abstract: This paper studies product yields forecasting for the fluid catalytic cracking unit (FCCU). Conventional product yields forecasting is usually based on mechanism models, which may ignore significant factors because of manual approximations. Deep learning methods can extract features automatically from data, without prior knowledge. Considering the bidirectional temporal features and the spatial features of the FCCU, a deep bidirectional long short-term memory (DBLSTM) network is proposed for product yields forecasting. The bidirectional structure captures the bidirectional temporal features of the FCCU by considering previous as well as future information over a period of time. Significant spatial features of the sensors at each time step are extracted automatically through a deep structure built by stacking multiple bidirectional structures. Moreover, the deep bidirectional LSTM network can deal with long-term dependencies by integrating the deep bidirectional structure with the LSTM cell. To avoid overfitting, the regularization methods adopted in this paper are dropout and early stopping. The efficacy of the DBLSTM approach is demonstrated on process data from an actual FCCU in China. In a comparison of mean squared error on product yields forecasting, the DBLSTM approach is superior to traditional regression models and other recurrent models.
Key Words: product yields forecasting, bidirectional temporal features, spatial features, DBLSTM, FCCU

1 Introduction

Fluid catalytic cracking (FCC) is an important secondary process in petroleum refining [1]. The main purpose of the FCCU is to convert high-boiling petroleum fractions into high-value transportation fuels (gasoline, jet fuel, and diesel) under suitable temperature, pressure, and catalyst conditions [2]. Product yields forecasting plays a significant role in quality monitoring as well as process safety.

Currently, product yields forecasting can be achieved by mechanism models and by data-driven methods. In the FCCU, since some microchemical reaction equations need manual approximations based on prior knowledge, significant factors may be ignored and mechanism models may not work well. Data-driven methods are independent of prior knowledge and have proved effective in the process industries [3]. The most representative examples are support vector regression (SVR) and the artificial neural network (ANN). The real-time database of the FCCU provides a large number of samples, but as the number of samples increases, the computational complexity of SVR grows exponentially [4]. A shallow ANN is generally effective for modeling as long as there are enough samples [5], but it contains only a small number of nonlinear operations, so it does not have the capacity to describe the microchemical reactions, the fractional distillation process, or the absorption-stabilization process of the FCCU. Deep ANNs suffered from uncontrolled convergence speed and local optima, both of which troubled researchers until the rise of deep learning [6].

Recently, deep learning has also been shown to be an effective data-driven modeling method [7, 8]. In [7], the heavy diesel 95% cut point of a crude distillation unit is estimated by developing regression models with a deep belief network (DBN). In [8], the oxygen content of flue gases in 1000-MW ultra-supercritical units is estimated via a new soft sensor modeling method that integrates denoising autoencoders with a neural network (DAE-NN). But both of these ignore the temporal features of industrial data. In the FCCU, data over a period of time can be obtained from the real-time database, which means that the bidirectional temporal features within that period should be considered. Moreover, sensors at different locations of the FCCU have their own spatial features, which should also be extracted. In this paper, we propose a deep bidirectional long short-term memory (DBLSTM) network for product yields forecasting of the FCCU, which integrates a deep bidirectional recurrent neural network (DBRNN) with the LSTM cell for time series modeling. The main contributions of this paper are as follows:

(1) Bidirectional temporal features of the FCCU are extracted automatically by considering previous as well as future information over a period of time using a bidirectional RNN (BRNN). (2) Significant spatial features of the sensors in the FCCU at each time step are extracted automatically through a deep structure built by stacking multiple BRNNs. (3) To handle long-term dependencies, the LSTM is chosen as the cell of the DBRNN, thus forming the DBLSTM. Process data from a real FCCU are used to verify the effectiveness of the DBLSTM, and the approach is also compared with other state-of-the-art methods.

This paper is organized as follows. In Section 2, previous works related to the RNN and the LSTM cell are reviewed. The FCC process and the product yields forecasting problem are described in Section 3. The modeling and training procedures of the DBLSTM for the FCCU are detailed in Section 4. Experiments demonstrating the efficacy of the proposed approach are presented in Section 5. The final section gives concluding remarks.

This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 61590924 and 61773162, and by the Natural Science Foundation of Shanghai (18ZR1420000).

2 Related Works

In this section, previous works related to the RNN are reviewed, including the deep bidirectional RNN and the LSTM cell. The RNN, as a class of deep learning model as well as a purely data-driven model, has been extensively used in sequential data modeling for speech recognition and natural language processing [9, 10]. One shortcoming of the conventional RNN is that it can only use previous information. However, in offline modeling for the FCC unit, data are collected over a period of time from the real-time database. Accordingly, previous information as well as future information can be obtained within that period. To extract bidirectional temporal information, the bidirectional RNN (BRNN) can read the data in both directions with two separate hidden layers, which are fed to the same output layer [11]. Besides, the spatial features at different locations of the FCCU should be extracted as well. Since a deep structure can extract high-level representations [12], the deep bidirectional RNN (DBRNN) is utilized for extracting spatial features.

2.1 Deep Bidirectional RNN

Bidirectional temporal features as well as spatial features at each time step can be extracted by the DBRNN. The lth layer of the DBRNN is illustrated in Fig. 1; the input of the lth layer is the output of the (l − 1)th layer.

[Fig. 1: Deep bidirectional recurrent neural network]

The lth layer's input-to-hidden connections are parameterized by the weight matrices $\overrightarrow{U}^l$ and $\overleftarrow{U}^l$, its hidden-to-hidden recurrent connections by $\overrightarrow{W}^l$ and $\overleftarrow{W}^l$, and its hidden-to-output connections by $\overrightarrow{V}^l$ and $\overleftarrow{V}^l$. Deep bidirectional propagation in this model is defined as follows:

$\overrightarrow{h}^l(t)_{input} = \overrightarrow{W}^l \overrightarrow{h}^l(t-1) + \overrightarrow{U}^l \tilde{Y}^{l-1}(t) + \overrightarrow{b}^l_{in},$   (1)
$\overrightarrow{h}^l(t) = f_1(\overrightarrow{h}^l(t)_{input}),$   (2)
$\overleftarrow{h}^l(t)_{input} = \overleftarrow{W}^l \overleftarrow{h}^l(t-1) + \overleftarrow{U}^l \tilde{Y}^{l-1}(t) + \overleftarrow{b}^l_{in},$   (3)
$\overleftarrow{h}^l(t) = f_1(\overleftarrow{h}^l(t)_{input}),$   (4)
$\tilde{Y}^l(t) = f_2(\overrightarrow{V}^l \overrightarrow{h}^l(t) + \overleftarrow{V}^l \overleftarrow{h}^l(t) + b^l_{out}),$   (5)

where, for the backward layer, t is indexed along the reversed sequence. The final output is $\tilde{Y}^N$, where N is the total number of layers and l = 1, 2, · · · , N. The only difference from the BRNN is that the input of the lth layer is the output of the (l − 1)th layer.
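Equations (1)-(5) translate directly into code. The following is a minimal NumPy sketch of one BRNN layer's forward pass; the choice of tanh for both f1 and f2, the helper name brnn_layer, and all dimensions and random weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def brnn_layer(Y_prev, Wf, Uf, bf, Wb, Ub, bb, Vf, Vb, b_out):
    """Forward pass of one BRNN layer, following equations (1)-(5).

    Y_prev: (L, d_in) output sequence of layer l-1.
    Returns the (L, d_out) output sequence of layer l.
    """
    L = Y_prev.shape[0]
    H = Wf.shape[0]
    h_fwd = np.zeros((L, H))  # forward hidden states
    h_bwd = np.zeros((L, H))  # backward hidden states
    # Forward direction, equations (1)-(2); tanh plays the role of f1.
    for t in range(L):
        prev = h_fwd[t - 1] if t > 0 else np.zeros(H)
        h_fwd[t] = np.tanh(Wf @ prev + Uf @ Y_prev[t] + bf)
    # Backward direction over the reversed sequence, equations (3)-(4).
    for t in reversed(range(L)):
        nxt = h_bwd[t + 1] if t < L - 1 else np.zeros(H)
        h_bwd[t] = np.tanh(Wb @ nxt + Ub @ Y_prev[t] + bb)
    # Combine both directions at each time step, equation (5); f2 = tanh here.
    return np.tanh(h_fwd @ Vf.T + h_bwd @ Vb.T + b_out)

# Illustrative usage: stack layers by feeding one layer's output to the next.
rng = np.random.default_rng(0)
L_steps, d_in, H, d_out = 10, 8, 16, 8
Y0 = rng.standard_normal((L_steps, d_in))
params = [rng.standard_normal(s) * 0.1 for s in
          [(H, H), (H, d_in), (H,), (H, H), (H, d_in), (H,),
           (d_out, H), (d_out, H), (d_out,)]]
Y1 = brnn_layer(Y0, *params)  # Y1 would be the input of the next BRNN layer
```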
2.2 LSTM Cell

The simple RNN is trained via backpropagation through time, which can be seen as a very deep network unfolded along the time steps; this causes vanishing and exploding gradients. The long short-term memory (LSTM) network can deal with this problem and prevent back-propagated errors from vanishing or exploding [13]. A single LSTM cell is shown in Fig. 2, where '◦' denotes the Hadamard product.

[Fig. 2: LSTM cell]

In Fig. 2, the input gate i(t), forget gate f(t), output gate o(t), and cell state c(t) are newly added to the basic RNN cell, which has only a hidden state h(t). The LSTM cell has input connections to the input gate, forget gate, output gate, and cell state, parameterized by the weight matrices $U_i$, $U_f$, $U_o$, $U_c$, respectively. It also has hidden-state connections to the input gate, forget gate, output gate, and cell state, parameterized by the weight matrices $W_i$, $W_f$, $W_o$, $W_c$, respectively. The LSTM cell is described by equations (6)-(11):

$i(t) = \sigma(U_i X(t) + W_i h(t-1)),$   (6)
$f(t) = \sigma(U_f X(t) + W_f h(t-1)),$   (7)
$o(t) = \sigma(U_o X(t) + W_o h(t-1)),$   (8)
$\tilde{c}(t) = \tanh(U_c X(t) + W_c h(t-1)),$   (9)
$c(t) = f(t) \circ c(t-1) + i(t) \circ \tilde{c}(t),$   (10)
$h(t) = o(t) \circ \tanh(c(t)),$   (11)

where '◦' denotes the Hadamard product, σ is the sigmoid function, and tanh is the hyperbolic tangent function. i(t) determines how much of X(t) matters, f(t) decides how much of the previous cell state c(t − 1) should be forgotten, o(t) determines how much of c(t) should be exposed, and c̃(t) gives the candidate for the new cell state.

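As a concrete illustration of equations (6)-(11), the sketch below performs one LSTM step in NumPy. The dictionary layout, helper names, and dimensions are our assumptions; biases are omitted, matching the formulation above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(X_t, h_prev, c_prev, U, W):
    """One LSTM step, equations (6)-(11). U and W are dicts holding the
    input weight matrices Ui..Uc and hidden weight matrices Wi..Wc."""
    i = sigmoid(U['i'] @ X_t + W['i'] @ h_prev)        # input gate,  eq. (6)
    f = sigmoid(U['f'] @ X_t + W['f'] @ h_prev)        # forget gate, eq. (7)
    o = sigmoid(U['o'] @ X_t + W['o'] @ h_prev)        # output gate, eq. (8)
    c_tilde = np.tanh(U['c'] @ X_t + W['c'] @ h_prev)  # candidate,   eq. (9)
    c = f * c_prev + i * c_tilde                       # cell state,  eq. (10)
    h = o * np.tanh(c)                                 # hidden state, eq. (11)
    return h, c

# Illustrative usage with random weights.
H, m = 16, 76
rng = np.random.default_rng(0)
U = {k: rng.standard_normal((H, m)) * 0.1 for k in 'ifoc'}
W = {k: rng.standard_normal((H, H)) * 0.1 for k in 'ifoc'}
h, c = lstm_step(rng.standard_normal(m), np.zeros(H), np.zeros(H), U, W)
```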
3 Formulation of the Product Yields Forecasting Problem

There are three subsystems in the FCC unit: the reactor-regenerator system, the fractionating system, and the absorption-stabilization system. Here, however, we consider the FCC unit as a whole and define it as a multi-input multi-output (MIMO) system (shown in Fig. 3). The input variables mainly fall into three categories, feedstock, temperature, and pressure, which are the major concerns affecting product yields. Product flowrate means the flow rates of gasoline, liquefied petroleum gas (LPG), diesel, and coke. Table 1 presents the physical unit of each category.

[Fig. 3: Main categories of process variables in the FCC unit]

Table 1: Physical units of input and output variables

  Category           Variable            Notation   Physical Unit
  Input variables    Temperature         T          °C
                     Feedstock           Q          t/h
                     Pressure            P          kPa
  Output variables   Product flowrate    R          t/h
                     Product yields      Y          %

We consider a system with m-dimensional inputs and n-dimensional outputs. In order to establish a relationship between historical inputs and future outputs, the time delay should be considered. In other words, if the inputs start from time step k1, the corresponding outputs should start from time step k2, satisfying k2 = k1 + τ. Taking the time delay into consideration, the m-dimensional input variables X and the n-dimensional output variables Y can be described as follows:

$X_{m \times L} = \begin{pmatrix} x_1(k_1) & x_1(k_1+1) & \cdots & x_1(k_1+L-1) \\ x_2(k_1) & x_2(k_1+1) & \cdots & x_2(k_1+L-1) \\ \vdots & \vdots & \ddots & \vdots \\ x_m(k_1) & x_m(k_1+1) & \cdots & x_m(k_1+L-1) \end{pmatrix},$   (12)

$Y_{n \times L} = \begin{pmatrix} y_1(k_2) & y_1(k_2+1) & \cdots & y_1(k_2+L-1) \\ y_2(k_2) & y_2(k_2+1) & \cdots & y_2(k_2+L-1) \\ \vdots & \vdots & \ddots & \vdots \\ y_n(k_2) & y_n(k_2+1) & \cdots & y_n(k_2+L-1) \end{pmatrix},$   (13)

where k2 = k1 + τ, k1 ∈ {k1 | k1 ≥ 1, k1 ∈ Z}, k2 ∈ {k2 | k2 ≥ 1, k2 ∈ Z}, τ ∈ {τ | τ ≥ 1, τ ∈ Z}, and L is the length of the data as well as the total number of time steps. At time step t, the m-dimensional input variables and n-dimensional output variables are

$x(t) = (x_1(k_1+t-1), \cdots, x_m(k_1+t-1))^T,$   (14)
$y(t) = (y_1(k_2+t-1), \cdots, y_n(k_2+t-1))^T,$   (15)

where t = 1, · · · , L. Since x(t) alone cannot give the state of the system at time step t, y(t − 1) is chosen as the state, which also enables recursive forecasting later. Thus, the final inputs and outputs of the model are

$X(t) = [x(t); y(t-1)],$   (16)
$Y(t) = y(t).$   (17)

In order to compute the model loss later in Section 4, we define Yi(t) = yi(k2 + t − 1), i = 1, 2, · · · , n.
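To make equations (12)-(17) concrete, the sketch below builds the model inputs X(t) = [x(t); y(t−1)] and targets Y(t) = y(t) from raw records; the function name, array layout, and the value of the delay are assumptions for illustration only.

```python
import numpy as np

def build_dataset(x_raw, y_raw, tau):
    """Pair inputs starting at k1 with outputs starting at k2 = k1 + tau.

    x_raw: (T, m) raw input records; y_raw: (T, n) raw yield records.
    Returns X of shape (L, m + n) with rows [x(t); y(t-1)], and Y of shape (L, n).
    """
    x = x_raw[:-tau]                 # x(k1), x(k1+1), ...  (equation (12))
    y = y_raw[tau:]                  # y(k2), y(k2+1), ..., k2 = k1 + tau  (eq. (13))
    X = np.hstack([x[1:], y[:-1]])   # X(t) = [x(t); y(t-1)], equation (16)
    Y = y[1:]                        # Y(t) = y(t), equation (17)
    return X, Y
```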
Supposing that there exists a nonlinear map f(·) between X(t) and Y(t), the purpose of training the model is to find this nonlinear map as accurately as possible (shown in Fig. 4(a)). Finally, the short-term forecasting process can be described in terms of the nonlinear map f(·):

$\tilde{Y}(t) = f[x(t), y(t-1)],$   (18)
$\tilde{Y}(t+p) = f[x(t+p), \tilde{Y}(t+p-1)],$   (19)

where p = 1, 2, · · · , P, and P is the total number of forecasting steps, satisfying P ∈ {P | 0 < P ≤ τ, P ∈ Z}. Accordingly, a recursive forecasting of production yields can be achieved (shown in Fig. 4(b)), where X′ and Y′ denote the inputs and outputs of the model, respectively, and L is the total number of input time steps. In the recursive forecasting process, data can only be obtained from the real-time database before time step k2 + t − 1, which means the product yields forecast can run ahead of the real-time database.

[Fig. 4: The training and recursive forecasting of the product yields model. (a) Training process. (b) Recursive forecasting process.]
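A minimal sketch of the recursive loop in equations (18)-(19) follows, assuming a fitted one-step model f (for instance, the trained DBLSTM) that maps a vector [x(t); y(t−1)] to a prediction ŷ(t); all names here are illustrative.

```python
import numpy as np

def recursive_forecast(f, x_future, y_last, P):
    """Roll the one-step model forward for P steps (the paper requires P <= tau).

    f:        callable mapping a (m + n,) vector [x(t); y(t-1)] to an (n,) prediction
    x_future: (P, m) known future inputs x(t+1), ..., x(t+P)
    y_last:   (n,) last measured output available from the real-time database
    """
    forecasts = []
    y_prev = y_last
    for p in range(P):
        # Equation (18) for the first step, equation (19) afterwards: the model's
        # own previous prediction replaces the unavailable measurement.
        y_prev = f(np.concatenate([x_future[p], y_prev]))
        forecasts.append(y_prev)
    return np.stack(forecasts)
```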
4 DBLSTM Network

In this section, the modeling and training procedures of the DBLSTM for the FCC unit are presented in detail, including the DBLSTM formulation, the model loss, the optimization method, and the regularization.

4.1 DBLSTM for Product Yields Forecasting

Substituting the LSTM cell for the hidden state of the DBRNN, the lth layer of the DBLSTM is shown in Fig. 5.

[Fig. 5: The lth layer of the DBLSTM]

The input-to-input-gate, input-to-cell, input-to-forget-gate, and input-to-output-gate connection weights are defined as the input weight matrices. The state-to-input-gate, state-to-cell, state-to-forget-gate, and state-to-output-gate connection weights are defined as the state weight matrices. Therefore,
the bidirectional input weight matrices and state weight matrices are

$\overrightarrow{W}^l = \{\overrightarrow{W}^l_i, \overrightarrow{W}^l_c, \overrightarrow{W}^l_f, \overrightarrow{W}^l_o\}, \quad \overleftarrow{W}^l = \{\overleftarrow{W}^l_i, \overleftarrow{W}^l_c, \overleftarrow{W}^l_f, \overleftarrow{W}^l_o\},$   (20)
$\overrightarrow{U}^l = \{\overrightarrow{U}^l_i, \overrightarrow{U}^l_c, \overrightarrow{U}^l_f, \overrightarrow{U}^l_o\}, \quad \overleftarrow{U}^l = \{\overleftarrow{U}^l_i, \overleftarrow{U}^l_c, \overleftarrow{U}^l_f, \overleftarrow{U}^l_o\},$   (21)

where l indexes the layers of the deep structure, l = 1, 2, · · · , N. The lth layer of the forward deep LSTM is

$\overrightarrow{i}^l(t) = \sigma(\overrightarrow{U}^l_i \tilde{Y}^{l-1}(t) + \overrightarrow{W}^l_i \overrightarrow{h}^l(t-1)),$   (22)
$\overrightarrow{f}^l(t) = \sigma(\overrightarrow{U}^l_f \tilde{Y}^{l-1}(t) + \overrightarrow{W}^l_f \overrightarrow{h}^l(t-1)),$   (23)
$\overrightarrow{o}^l(t) = \sigma(\overrightarrow{U}^l_o \tilde{Y}^{l-1}(t) + \overrightarrow{W}^l_o \overrightarrow{h}^l(t-1)),$   (24)
$\overrightarrow{\tilde{c}}^l(t) = \tanh(\overrightarrow{U}^l_c \tilde{Y}^{l-1}(t) + \overrightarrow{W}^l_c \overrightarrow{h}^l(t-1)),$   (25)
$\overrightarrow{c}^l(t) = \overrightarrow{f}^l(t) \circ \overrightarrow{c}^l(t-1) + \overrightarrow{i}^l(t) \circ \overrightarrow{\tilde{c}}^l(t),$   (26)
$\overrightarrow{h}^l(t) = \overrightarrow{o}^l(t) \circ \tanh(\overrightarrow{c}^l(t)).$   (27)

Similarly, the lth layer of the backward deep LSTM is given as follows:

$\overleftarrow{i}^l(t) = \sigma(\overleftarrow{U}^l_i \tilde{Y}^{l-1}(t) + \overleftarrow{W}^l_i \overleftarrow{h}^l(t-1)),$   (28)
$\overleftarrow{f}^l(t) = \sigma(\overleftarrow{U}^l_f \tilde{Y}^{l-1}(t) + \overleftarrow{W}^l_f \overleftarrow{h}^l(t-1)),$   (29)
$\overleftarrow{o}^l(t) = \sigma(\overleftarrow{U}^l_o \tilde{Y}^{l-1}(t) + \overleftarrow{W}^l_o \overleftarrow{h}^l(t-1)),$   (30)
$\overleftarrow{\tilde{c}}^l(t) = \tanh(\overleftarrow{U}^l_c \tilde{Y}^{l-1}(t) + \overleftarrow{W}^l_c \overleftarrow{h}^l(t-1)),$   (31)
$\overleftarrow{c}^l(t) = \overleftarrow{f}^l(t) \circ \overleftarrow{c}^l(t-1) + \overleftarrow{i}^l(t) \circ \overleftarrow{\tilde{c}}^l(t),$   (32)
$\overleftarrow{h}^l(t) = \overleftarrow{o}^l(t) \circ \tanh(\overleftarrow{c}^l(t)).$   (33)

The output of the lth layer is

$\tilde{Y}^l(t) = F(\overrightarrow{V}^l \overrightarrow{h}^l(t) + \overleftarrow{V}^l \overleftarrow{h}^l(t) + b^l_{out}).$   (34)

Since l = 1, 2, ..., N, the final output is

$\tilde{Y}(t) = \tilde{Y}^N(t) = F(\overrightarrow{V}^N \overrightarrow{h}^N(t) + \overleftarrow{V}^N \overleftarrow{h}^N(t) + b^N_{out}),$   (35)

where F(·) is the output activation function. Because product yields forecasting is a regression problem and continuity is required, a linear function is selected as the output activation function.

So far, we have obtained the model of the proposed DBLSTM through equations (22)-(35). Via the DBLSTM, the temporal features are extracted automatically by the recurrent structure, the spatial features at each time step are extracted automatically by the deep structure, and the long-term dependency problem is handled by the LSTM cell.
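Since the paper's models are implemented in Keras (Section 5.2), the sketch below shows one way the 72-50-50-50-4 DBLSTM with a linear output could be assembled. The specific API calls are our assumption, not the authors' code; also note that Keras's Bidirectional wrapper concatenates the two directions rather than mixing them through separate output matrices as in equation (34), and whether 50 counts units per direction is likewise assumed.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Dropout

TIME_STEPS, N_IN, N_OUT = 10, 72, 4  # settings reported in Section 5.2

model = Sequential([
    # Three stacked bidirectional LSTM layers with 50 units each.
    Bidirectional(LSTM(50, return_sequences=True),
                  input_shape=(TIME_STEPS, N_IN)),
    Dropout(0.1),                       # dropout after the first layer (Sec. 5.2)
    Bidirectional(LSTM(50, return_sequences=True)),
    Dropout(0.3),                       # dropout after the second layer (Sec. 5.2)
    Bidirectional(LSTM(50)),
    Dense(N_OUT, activation='linear'),  # linear output for regression, eq. (35)
])
```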
4.2 The Training of DBLSTM

(1) Model Loss

When training the model, the mean squared error (MSE) is utilized as the loss to measure how far the predicted production yields are from the real production yields. In the training phase, the data are divided into a training set, a validation set, and a test set, which give the learning MSE, the validation MSE, and the test MSE, defined by equations (36)-(38):

$Learning\ MSE = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_{train}}\sum_{t=1}^{L_{train}}\big(\tilde{Y}^N_i(t) - Y_i(t)\big)^2,$   (36)
$Validation\ MSE = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_{val}}\sum_{t=1}^{L_{val}}\big(\tilde{Y}^N_i(t) - Y_i(t)\big)^2,$   (37)
$Test\ MSE = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_{test}}\sum_{t=1}^{L_{test}}\big(\tilde{Y}^N_i(t) - Y_i(t)\big)^2,$   (38)

where L_train, L_val, and L_test are the lengths of the training set, validation set, and test set, respectively, n is the output dimension, Ỹ^N_i is the ith dimension of the final output of the DBLSTM, i.e., the predicted production yields, and Y_i is the ith dimension of the real product yields. In the short-term forecasting process described in equations (18) and (19), we use the forecast MSE to measure the error between the short-term forecast value and the real value:

$Forecast\ MSE = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_{fore}}\sum_{t=1}^{L_{fore}}\big(\tilde{Y}^N_i(t) - Y_i(t)\big)^2,$   (39)

where L_fore is the length of the forecast set.

(2) Optimization Algorithm

The optimization method used in this paper is Adam [14], which possesses advantages over other stochastic optimization methods. Let θ_k be the parameter to be optimized and f(θ_k) the loss function. The update of θ_{k+1} using Adam can be expressed as follows:

$g_{k+1} = \nabla_\theta f_{k+1}(\theta_k),$   (40)
$m_{k+1} = \beta_1 \cdot m_k + (1 - \beta_1) \cdot g_{k+1},$   (41)
$v_{k+1} = \beta_2 \cdot v_k + (1 - \beta_2) \cdot g^2_{k+1},$   (42)
$\hat{m}_{k+1} = m_{k+1}/(1 - \beta_1^{k+1}),$   (43)
$\hat{v}_{k+1} = v_{k+1}/(1 - \beta_2^{k+1}),$   (44)
$\theta_{k+1} = \theta_k - \alpha \cdot \hat{m}_{k+1}/(\sqrt{\hat{v}_{k+1}} + \epsilon),$   (45)

where g_{k+1} is the stochastic gradient w.r.t. θ_k at epoch k + 1, g²_{k+1} denotes the element-wise (Hadamard) square of the gradient, m is the first moment vector, v is the second moment vector, k is the epoch index, α is the learning rate, β_1 and β_2 are the exponential decay rates for the first and second moment estimates, respectively, and ε is a very small value that avoids division by zero. The parameter settings are α = 0.001, β_1 = 0.9, β_2 = 0.999, ε = 10⁻⁸, and the initial values of m, v, θ, k are set to zero.
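Equations (40)-(45) can be written out directly. The sketch below performs one Adam update for a parameter vector using the hyperparameter settings above; the function name and calling convention are illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, k, alpha=0.001, beta1=0.9,
              beta2=0.999, eps=1e-8):
    """One Adam update, equations (40)-(45); k starts at 0, so the bias
    corrections use k + 1 as in the paper."""
    m = beta1 * m + (1 - beta1) * grad          # first moment,  eq. (41)
    v = beta2 * v + (1 - beta2) * grad**2       # second moment, eq. (42)
    m_hat = m / (1 - beta1**(k + 1))            # bias correction, eq. (43)
    v_hat = v / (1 - beta2**(k + 1))            # bias correction, eq. (44)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # update, eq. (45)
    return theta, m, v
```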
(3) Regularization

To avoid overfitting, one regularization method adopted in this paper is dropout [15]. Dropout means temporarily removing units in a neural network along with their incoming and outgoing connections. The removed units are chosen randomly with a fixed probability p during training, which yields a thinned network. At test time dropout is no longer applied, but the output of each unit is multiplied by the retention probability (1 − p), which gives the final outputs. In this way, different training epochs produce different network structures, each with its own overfitting, and dropout effectively averages these networks so that their overfitting is counteracted. Besides, since the weight updates might otherwise come to depend on a few neurons with fixed relationships, dropout also prevents such co-adaptation among neurons.

Apart from dropout, the other regularization method used in this paper is early stopping [16]. With early stopping, training stops when the validation loss stops decreasing; it therefore decides the number of epochs in the experiments below.
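In Keras terms, the two regularizers amount to the Dropout layers already shown in the model sketch of Section 4.1 plus an EarlyStopping callback watching the validation loss. Continuing that sketch, with X_train, Y_train, X_val, Y_val as prepared arrays, training could be wired as below; the patience value is an assumption, since the paper does not report one.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Adam configured with the settings reported in Section 4.2; MSE loss as in
# equations (36)-(38).
model.compile(optimizer=Adam(learning_rate=0.001, beta_1=0.9,
                             beta_2=0.999, epsilon=1e-8),
              loss='mse')

early_stop = EarlyStopping(monitor='val_loss',  # stop when validation MSE stalls
                           patience=5,          # assumed; not given in the paper
                           restore_best_weights=True)

model.fit(X_train, Y_train,
          validation_data=(X_val, Y_val),
          epochs=1000,                          # upper bound; early stopping decides
          batch_size=10,                        # batch size from Section 5.2
          callbacks=[early_stop])
```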
5 Experiments

In this section, the performance of the proposed DBLSTM and of other methods is evaluated through experiments on an actual FCC unit. First, data acquisition from the real FCC unit is detailed. Then, the experiment is designed with fair settings. Finally, the comparison of MSE as well as the product yields forecasting results are presented.

5.1 Data Acquisition

Our data are all obtained from the real-time database of an actual FCC unit at Sinopec Jiujiang Company, China. In order to make the best use of the data, most sensor measurements in the three subsystems are chosen as input variables, as shown in Table 2.

Table 2: Tags of input variables

  NO.  Category                         Tags                      Total number
  1    Reactor-regenerator system       T1-T16, Q1-Q10, P1-P3     29
  2    Fractionating system             T17-T35                   19
  3    Absorption-stabilization system  T36-T55                   20

In Table 2, T denotes temperature, Q denotes feedstock, and P denotes pressure; their physical units are shown in Table 1. Tags of the output variables are given in Table 3:

Table 3: Tags of output variables

  NO.  Category          Tags   Total number
  1    Product flowrate  R1-R4  4
  2    Product yields    Y1-Y4  4

Product flowrate is directly obtained from the real-time database of the FCC unit, but product yields are what we finally need. The relationship between product flowrate and product yields is defined by the following prior knowledge:

$Y(i) = \frac{R(i)}{Q_8 + Q_9 + Q_{10}} \times 100\%,$   (46)

where i = 1, 2, · · · , n, Y(i) is the yield of the ith product, and the physical unit of product yields can be seen in Table 1. R(i) is the flow rate of the ith product, i.e., the flow rate of gasoline, liquefied petroleum gas (LPG), diesel, and coke, respectively. Q8 is the fresh feed flow rate, Q9 is the incoming residuum flow rate, and Q10 is the incoming sump oil flow rate. Gasoline, LPG, and diesel are the three main products and coke is the byproduct, so here n = 4. Hence, the input dimension is 72 and the output dimension is 4 according to equations (16) and (17).

In this paper, the data were obtained from the real-time database in May 2017. After data preprocessing, which includes outlier deletion and normalization, 13100 samples remain in total: 11000 samples in the training set, 1000 in the validation set, 1000 in the test set, and 100 in the forecast set.
where i = 1, 2, · · · , n, Y (i) is the yields of ith product, and large to show, test results of 100 samples are illustrated in
the physical unit of product yields can also be seen in Table Fig. 7. In the short-term forecasting process, the DBLSTM
Hence, the DBLSTM with dropout is the adopted model for product yields forecasting. The model is tested with real inputs and real last-step outputs. Because the test data are too large to show in full, the test results of 100 samples are illustrated in Fig. 6. In the short-term forecasting process, the DBLSTM model is used to generate predicted outputs as part of the inputs (equation (19)). In this way, recursive forecasting is realized, and the forecasting results are shown in Fig. 7. Since the errors between real outputs and predicted outputs accumulate, we only consider short-term forecasting; here the total number of forecasting steps is 20.

Table 4: MSE comparison of different methods

  Category    Methods  LMSE   VMSE  TMSE  FMSE  Epoch
  Regression  SVR      0.30   4.6   3.2   --    15715
  Models      MLP      0.47   0.48  0.40  --    61
  Recurrent   RNN      0.26   0.59  0.40  5.91  179
  Models      BRNN     0.11   0.57  0.35  1.51  159
              DBRNN1   0.026  0.47  0.21  1.33  193
              DBRNN2   0.09   0.33  0.18  1.03  386
              DBLSTM1  0.023  0.45  0.23  1.28  353
  The Model   DBLSTM2  0.09   0.41  0.17  0.68  263

[Fig. 6: Model testing: product yields estimation using real last-step outputs. Panels: Gasoline (%), LPG (%), Diesel Oil (%), Coke (%) versus time step; real value vs. predicted value.]

[Fig. 7: Online forecasting: product yields forecasting using estimated last-step outputs. Panels: Gasoline (%), LPG (%), Diesel Oil (%), Coke (%) versus time step; real value vs. forecasting value.]

6 Conclusion

This paper proposed a DBLSTM approach for product yields forecasting of an FCC unit. Bidirectional temporal features, as well as spatial features of the sensors in the FCC unit, can be extracted automatically by the DBRNN, and long-term dependencies can be captured by the LSTM cells. The comparison of MSE verifies that the DBLSTM is superior to traditional regression models and other recurrent models in product yields forecasting. In future work, an innovation of the RNN structure will be further considered for online yields forecasting.

References

[1] J. Gary, G. Handwerk, and M. Kaiser, Petroleum Refining: Technology and Economics, Florida: CRC Press, chapter 1, 2007.
[2] R. Sadeghbeigi, Fluid Catalytic Cracking Handbook: An Expert Guide to the Practical Operation, Design, and Optimization of FCC Units, Elsevier, 2012.
[3] P. Kadlec, B. Gabrys, and S. Strandt, Data-driven soft sensors in the process industry, Computers and Chemical Engineering, 33(4): 795-814, 2009.
[4] W. Yan, H. Shao, and X. Wang, Soft sensing modeling based on support vector machine and Bayesian model selection, Computers and Chemical Engineering, 28(8): 1489-1498, 2004.
[5] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, pp. 239, 2013.
[6] G. E. Hinton, S. Osindero, and Y. W. Teh, A fast learning algorithm for deep belief nets, Neural Computation, 18(7): 1527-1554, 2006.
[7] C. Shang, F. Yang, D. Huang, and W. Lyu, Data-driven soft sensor development based on deep learning technique, Journal of Process Control, 24(3): 223-233, 2014.
[8] W. Yan, D. Tang, and Y. Lin, A data-driven soft sensor modeling method based on deep learning and its application, IEEE Transactions on Industrial Electronics, 64(5): 4237-4245, 2017.
[9] B. Peng, K. Yao, L. Jing, and K. Wong, Recurrent neural networks with external memory for spoken language understanding, arXiv: Computation and Language, 25-35, 2015.
[10] X. Li and X. Wu, Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition, International Conference on Acoustics, Speech, and Signal Processing, 4520-4524, 2015.
[11] M. Schuster and K. Paliwal, Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, 45(11): 2673-2681, 1997.
[12] Y. LeCun, Y. Bengio, and G. E. Hinton, Deep learning, Nature, 521(7553): 436-444, 2015.
[13] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 9(8): 1735-1780, 1997.
[14] D. P. Kingma and J. L. Ba, Adam: a method for stochastic optimization, International Conference on Learning Representations, 2015.
[15] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 15(1): 1929-1958, 2014.
[16] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Cambridge, Massachusetts: The MIT Press, 2016.
