CISC 867 Deep Learning: 12. Recurrent Neural Networks

- Recurrent neural networks are useful for processing sequential data like time series.
- The Jena Climate dataset contains over 8 years of weather observations recorded every 10 minutes, consisting of 14 measured features like temperature, pressure, and humidity.
- To train a model to forecast temperature 24 hours ahead, the input would be a sequence of 720 past observations (5 days' worth) and the target output would be the temperature 24 hours after the last observation in the input sequence.


CISC 867 Deep Learning

12. Recurrent Neural Networks

Credits: Vassilis Athitsos, Yu Li

1
Sequential Data

• Sequential data, as the name implies, are sequences.


• What is the difference between a sequence and a set?
• A set is a collection of elements, with no inherent order.
– {1,2,3} = {3,1,2} = {2,1,3}, the order in which we write the elements
does not matter.
• A sequence 𝑿 is a set of elements, together with a total
order imposed on those elements.
– A total order describes, for any two elements 𝒙𝟏 , 𝒙𝟐 , which of them
comes before and which comes after.
– Sequences (1,2,3), (3,1,2), (2,1,3) are all different from each other,
because the order of elements matters.

2
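
• A quick illustration (not from the original slides): in Python, sets compare equal regardless of order, while tuples (sequences) do not.

print({1, 2, 3} == {3, 1, 2})   # True: the same set
print((1, 2, 3) == (3, 1, 2))   # False: different sequences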
Time Series

• A time series is a sequence of vectors.


• Each vector can be thought of as an observation, or
measurement, that corresponds to one specific moment
in time.
• Examples:
– Stock market prices (for a single stock, or for multiple stocks).
– Heart rate of a patient over time.
– Position of one or multiple people/cars/airplanes over time.
– Speech: represented as a sequence of audio measurements at
discrete time steps.
– A musical melody: represented as a sequence of pairs (note,
duration).

3
Dimensionality of Vectors

• In the simplest case, a time series is just a sequence of numbers.
• An example is the sequence of daily high temperatures (in
Fahrenheit) in Arlington, from January 1 to January 4, 2022.
– We get the time series (74, 40, 54, 54).
– This time series has length 4.
• In general, a time series is a sequence of vectors.
– All vectors in the time series must have the same dimensionality.
• For example, we can take the previous sequence of daily high
temperatures, and include the daily low temperature as well.
– We get the time series ((74,24), (40,19), (55, 25), (64,33)).
– It has length 4, and every element is a 2D vector.

4
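
• As a small illustration (not from the original slides), the length-4 time series above can be stored as a NumPy array with one row per day:

import numpy as np

# Each row is one day's (high, low) temperature in Fahrenheit.
series = np.array([[74, 24], [40, 19], [55, 25], [64, 33]])
print(series.shape)   # (4, 2): length 4, each element a 2D vector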
Time Series Terminology

• We can use the term “sequence” to refer to a time series.


– This is correct usage. Any time series is a sequence. The reverse is not true: there are sequences (for example, strings) that are NOT time series.
• The length of a sequence is the number of elements in the
sequence.
– For example, sequence ((74,24), (40,19), (55, 25), (64,33)) has length 4.
• A feature refers to a specific dimension of the vectors that the
time series contains.
– For example, in an earlier slide we said that the first feature in sequence
((74,24), (40,19), (55, 25), (64,33)) is the daily high temperature. The
second feature is the daily low temperature.
• We can refer to an element of a time series as a “feature
vector”.
5
Strings and Time Series

• Strings are an example of sequential data: a string is a sequence of characters from some alphabet.
• Strings are sequential: the order of the characters matters.
– Strings “mile” and “lime” are not equal.
– Compare to sets {‘m’, ‘i’, ‘l’, ‘e’} and {‘l’, ‘i’, ‘m’, ‘e’}, which are equal.
• Strings are not time series, because their elements are
characters (symbols from a finite and discrete alphabet)
and not vectors.
• However, we can easily convert a string dataset to a time
series dataset.
– We map each character to a one-hot vector.
– The dimensionality of these one-hot vectors is equal to the number of
letters in the alphabet.
6
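
• A minimal sketch of this conversion (not from the original slides), using a small made-up alphabet:

import numpy as np

alphabet = "eilm"                      # hypothetical alphabet for this example
index = {ch: i for i, ch in enumerate(alphabet)}

def to_one_hot(s):
    # Map each character of s to a one-hot vector of dimension len(alphabet).
    x = np.zeros((len(s), len(alphabet)))
    for t, ch in enumerate(s):
        x[t, index[ch]] = 1.0
    return x

print(to_one_hot("mile").shape)   # (4, 4): a time series of length 4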
Text and Time Series

• Text is another example of sequential data: a piece of text data can be seen as a sequence of letters, or as a sequence of words.
• Using one-hot vectors we can convert a text dataset to a
time series dataset.
– We can map each letter to a one-hot vector, or each word to a one-hot
vector. Mapping words is more common.
– There are other methods as well for converting a piece of text to a
time series.
– We will cover this topic in detail in a few weeks.

7
Example: The Jena Climate Dataset

• The Jena Climate dataset is a weather time series dataset.


– Publicly available at: https://www.kaggle.com/mnassrib/jena-climate
• The data was recorded at the Weather Station of the Max
Planck Institute for Biogeochemistry in Jena, Germany.
• 8 years of data: January 1, 2009 to December 31, 2016.
• A feature vector was recorded every 10 minutes during those
eight years.
• Each feature vector is 14-dimensional.
• These are some of the recorded features:
– Air temperature.
– Atmospheric pressure.
– Humidity.
– Wind direction…

8
Jena Climate Dataset: A Closer Look

• You can download the dataset from: https://www.kaggle.com/mnassrib/jena-climate
• The dataset is saved in a CSV file with 15 columns:
– 0: Date and Time
– 1: Atmospheric pressure, in millibars.
– 2: Temperature in Celsius.
– 3: Temperature in Kelvin.
– 4: Temperature in Celsius relative to humidity. According to the
dataset web page, “Dew Point is a measure of the absolute amount of
water in the air, the DP is the temperature at which the air cannot hold
all the moisture in it and water condenses.”
– 5: Relative humidity.
– 6: Saturation vapor pressure.
– 7: Vapor pressure.

9
Jena Climate Dataset: A Closer Look

• The dataset is saved in a CSV file with 15 columns:


– 8: Vapor pressure deficit.
– 9: Specific humidity.
– 10: Water vapor concentration.
– 11: Airtight.
– 12: Wind speed.
– 13: Maximum wind speed.
– 14: Wind direction in degrees.

• As you see, some of these features (like “saturation vapor pressure”, “airtight”) are pretty esoteric to non-specialists, whereas others (like temperature, wind speed) have a meaning that we can all understand.

10
Reading the Data

fname = "jena_climate_2009_2016.csv"
with open(fname) as f:
data = f.read()
lines = data.split("\n")
lines = lines[1:] # The first line in the file is header information

temperature = np.zeros((len(lines),))
raw_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
values = [float(x) for x in line.split(",")[1:]]
temperature[i] = values[1]
raw_data[i] = values

• This code creates two time series: temperature, and raw_data.


11
Reading the Data

for i, line in enumerate(lines):
    values = [float(x) for x in line.split(",")[1:]]
    temperature[i] = values[1]
    raw_data[i] = values

• Variable temperature is a 1D time series.


– temperature[i] is the i-th temperature observation, in Celsius.
• Variable raw_data is a 14-dimensional time series.
– raw_data[i] is a 14-dimensional vector.
– From the original 15 columns of the dataset, we exclude column 0,
which was the date and time.

12
Visualizing the Data

plt.plot(range(0, len(temperature)), temperature)

plots all 420,451 values in the temperature array (one observation every ten minutes, for eight years).

plt.plot(range(0, 1440), temperature[:1440])

plots the first 1440 values, corresponding to the first 10 days.

13
Inputs and Target Outputs

• Let’s look at our forecasting task description again: given data from the last five days, the goal is to predict the temperature exactly 24 hours from now.
• What will be the input to this “forecasting” system?
• What will be the output of the system?

14
Inputs and Target Outputs

• Let’s look at our forecasting task description again: given data from the last five days, the goal is to predict the temperature exactly 24 hours from now.
• What will be the input to this “forecasting” system?
• What will be the output of the system?
• The input will be a sequence of feature vectors containing
all values from a period of five days.
– Number of columns: 14, since we have 14 features in our data.
– Number of rows: 5 days * 24 hours * 6 observations per hour = 720.
– Shape of the input: 720x14, which gives 10080 numbers.
• The output will be a single number: the temperature (in
Celsius) 24 hours after the last observation in the input.
15
Creating a Training Set

• Our training data is a 14-dimensional time series of length 210,225.
• We want to extract a random training example of length
720.
• So, we pick a random start point in the time series, and
we get the next 720 elements.
• What is the smallest and largest legal value for the start
point?
• Smallest: 0
• Largest: timeseries length – 720 – 24*6.
– Why? We need enough room to choose 720 elements, plus enough
room to look 24 hours past the last element, to get the target value
that we aim to forecast.

16
Creating a Training Set

• What is the smallest and largest legal value for the start
point?
• Smallest: 0
• Largest: timeseries length – 720 – 24*6.
– Why? We need enough room to choose 720 elements, plus enough
room to look 24 hours past the last element, to get the target value that
we aim to forecast.
[Figure: a random start point selects an input window of length 720; the target value is 24 hours (24*6 time steps) after the last element of the input.]
17
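
• A minimal sketch of this sampling procedure (not the course’s exact data pipeline), assuming the raw_data and temperature arrays built earlier:

import numpy as np

input_length = 720   # 5 days * 24 hours * 6 observations per hour
delay = 24 * 6       # the target is 24 hours after the last input observation

def random_example(raw_data, temperature, rng=np.random.default_rng()):
    # Largest legal start point: leave room for 720 inputs plus 24 hours.
    max_start = len(raw_data) - input_length - delay
    start = rng.integers(0, max_start + 1)
    x = raw_data[start : start + input_length]           # shape (720, 14)
    y = temperature[start + input_length - 1 + delay]    # temperature 24 hours later
    return x, y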
A Simple RNN Model

• This is an example model that is small enough to draw


easily:
– The input to the model is a time series 𝒙 of length 3.
– Each element of 𝒙 has two dimensions.

• So, 𝒙 = ((𝑥1,1, 𝑥1,2), (𝑥2,1, 𝑥2,2), (𝑥3,1, 𝑥3,2)).


[Figure: unrolled RNN. Inputs 𝑥1,1, 𝑥1,2, 𝑥2,1, 𝑥2,2, 𝑥3,1, 𝑥3,2 feed second-layer units 𝑈2,1, 𝑈2,2, 𝑈2,3, which share the input weights 𝑤2,1,1, 𝑤2,1,2 and the recurrent weight 𝑢2,1, and produce outputs 𝑧2,1, 𝑧2,2, 𝑧2,3.]
18
A Simple RNN Model

• Previously, we used to draw input layers on the left, and output layers on the right.
• Here, it is easier to draw input layers at the bottom, and output layers at the top.

[Figure: the same unrolled RNN as on slide 18.]
19
A Simple RNN Model

• There is nothing different about the input layer; it is as usual.
• 𝒙 = ((𝑥1,1, 𝑥1,2), (𝑥2,1, 𝑥2,2), (𝑥3,1, 𝑥3,2)), so we need six input units to represent the input.

[Figure: the same unrolled RNN as on slide 18.]
20
A Simple RNN Model

• The second layer is a recurrent layer.


– Because of this layer, this network is not a feedforward neural
network.
– In a feedforward neural network, the inputs to a layer come from the
outputs of the previous layer.
– Here, some inputs to the second layer come from the second layer itself.

[Figure: the same unrolled RNN as on slide 18.]
21
A Simple RNN Model

• Notice that unit 𝑈2,2 receives inputs not only from input units, but also
from second-layer unit 𝑈2,1 .
• Similarly, unit 𝑈2,3 receives inputs not only from input units, but also
from second-layer unit 𝑈2,2 .
• These connections between units of the same layer are called recurrent
connections.
[Figure: the same unrolled RNN as on slide 18.]
22
A Simple RNN Model

• Notice that all units in the second layer share the same weights.
• The weights connecting two input units to a second-layer unit are
denoted with the same two symbols.
• There is also a new symbol, the recurrent weight 𝑢2,1 for the recurrent
connections between 𝑈2,1 and 𝑈2,2 , and between 𝑈2,2 and 𝑈2,3 .

[Figure: the same unrolled RNN as on slide 18.]
23
Computing the Output

• The outputs of the second layer play two roles:


– They are used as inputs to other units in the second layer.
– They are also the outputs of the entire network.
• In more complicated models, we could have more layers on
top.
[Figure: the same unrolled RNN as on slide 18.]
24
Computing the Output

• Computing the output of each unit needs to follow the order of the time steps.
• First we compute, from bottom to top, the output of all units that correspond to time step 1.

[Figure: the same unrolled RNN as on slide 18.]
25
Computing the Output

• Computing the output of each unit needs to follow the order of the time steps.
• Second we compute, from bottom to top, the output of all units that correspond to time step 2.
– This way we can use 𝑧2,1 from time step 1.

[Figure: the same unrolled RNN as on slide 18.]
26
Computing the Output

• Computing the output of each unit needs to follow the order of the time steps.
• Third we compute, from bottom to top, the output of all units that correspond to time step 3.
– This way we can use 𝑧2,2 from time step 2.

[Figure: the same unrolled RNN as on slide 18.]
27
Order of Output Computations

• In a feedforward neural network, we simply followed the order of layers, from input to output.
• In an RNN, we first follow the order of time steps.
– Within a single time step, we follow the order of layers, from input to output.

[Figure: the same unrolled RNN as on slide 18.]
28
Translating to Keras

model = keras.Sequential([keras.Input(shape=(3,2)),
keras.layers.SimpleRNN(1)])

• This piece of code implements our network.


– Parameter 1 for the SimpleRNN layer specifies one unit per time step.

[Figure: the same unrolled RNN as on slide 18.]
29
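
• As a quick check (not from the original slides), we can build this toy model in Keras and count its parameters; the SimpleRNN(1) layer has 4: the two input weights, the recurrent weight, and a bias.

from tensorflow import keras

model = keras.Sequential([keras.Input(shape=(3, 2)),
                          keras.layers.SimpleRNN(1)])
model.summary()   # the SimpleRNN layer reports 4 parameters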
Translating to Keras

model = keras.Sequential([keras.Input(shape=(3,2)),
keras.layers.SimpleRNN(3)])

• Here the SimpleRNN layer has three units per time step.
• We do not show the connections and weights anymore.
– Within a time step, all three 2nd layer units are connected to both inputs.
– All 2nd layer units from the previous step are inputs to all 2nd layer units
in the next step.
[Figure: unrolled RNN with three second-layer units per time step (𝑈2,𝑡,1, 𝑈2,𝑡,2, 𝑈2,𝑡,3) on top of two input units per time step.]
30
Simplifying the Drawings

• When we draw an RNN, we typically do not show each individual unit.
• Instead, we group units into blocks, such that all units in a block belong to the same layer and the same time step.

[Figure: the same network as on slide 30.]
31
Simplifying the Drawings

• When we draw an RNN, we typically do not show each individual unit.
• Instead, we group units into blocks, such that all units in a block belong to the same layer and the same time step.
• In this simplified drawing:
– 𝐿1,1 groups together units 𝑈1,1,1 and 𝑈1,1,2 .
– 𝐿1,2 groups together units 𝑈2,1,1 , 𝑈2,1,2 , and 𝑈2,1,3 .

[Figure: the same network drawn with blocks: input blocks 𝐿1,1, 𝐿2,1, 𝐿3,1 below second-layer blocks 𝐿1,2, 𝐿2,2, 𝐿3,2.]
32
Simplifying the Drawings

• Now that we have simplified the drawing, we can draw connections again.
• An arrow means that all units of one group are connected
to all units of the other group.
• Of course, now it is not clear how many units are in each
layer.
– When we simplify, some details are inevitably lost.

[Figure: the block diagram from slide 32, now with arrows between the blocks.]
33
Simplifying the Drawings

• We can always make up conventions to provide more information.
• For example, here we show for each block:
– The type of layer that it belongs to.
– The number of units.

[Figure: three time steps, each drawn as an Input(2) block feeding a Recurrent(3) block.]
34
Simplifying the Drawings

• This is a common way to draw RNNs.
• Since the structure at each time step looks the same, we just show three steps:
– “Previous”, “current”, and “next”.
– Oftentimes more detail is shown in the current step.

[Figure: Input(2) feeding SimpleRNN(3), shown at time steps 𝑡−1, 𝑡, 𝑡+1.]
35
An RNN Network for Jena Climate

model = keras.Sequential([keras.Input(shape=input_shape),
                          keras.layers.SimpleRNN(16),
                          keras.layers.Dense(1),])

model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])

callbacks = [keras.callbacks.ModelCheckpoint("jena_LSTM1_16.keras",
                                             save_best_only=True)]

history_lstm = model.fit(training_inputs, training_targets, epochs=20,
                         validation_data=(validation_inputs, validation_targets),
                         callbacks=callbacks)

• This code trains a network with an RNN layer.

36
An RNN Network for Jena Climate

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.SimpleRNN(16),
keras.layers.Dense(1),])

• The highlighted line creates the recurrent layer.


– It has 16 units for each time step.
– There are 120 time steps (not shown in this drawing).

Recurrent(16) Recurrent(16) Recurrent(16)

Input(14) Input(14) Input(14) 37


An RNN Network for Jena Climate

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.SimpleRNN(16),
keras.layers.Dense(1),])

• We have a fully connected output layer.
• This layer only connects to the recurrent units of the LAST TIME STEP.

Dense(1)
Recurrent(16) Recurrent(16) Recurrent(16)
Input(14) Input(14) Input(14)
38


Detour: The return_sequences option

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.SimpleRNN(16, return_sequences=False),
keras.layers.Dense(1),])

• SimpleRNN layers have an option called return_sequences.
– The default value is False.
– This specifies that the output of the layer is just the output of the last time step.

Dense(1)
Recurrent(16) Recurrent(16) Recurrent(16)
Input(14) Input(14) Input(14)
39


Detour: The return_sequences option

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.SimpleRNN(16, return_sequences=True),
keras.layers.Flatten(),
keras.layers.Dense(1),])

• If return_sequences is True, the output of the layer is the output of all time steps.

Dense(1)
Recurrent(16) Recurrent(16) Recurrent(16)
Input(14) Input(14) Input(14)
40


Detour: The return_sequences option

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.SimpleRNN(16, return_sequences=True),
keras.layers.Flatten(),
keras.layers.Dense(1),])

• Note the flattening step in this case, between the SimpleRNN layer and the Dense layer.

Dense(1)
Recurrent(16) Recurrent(16) Recurrent(16)
Input(14) Input(14) Input(14)
41


Detour: The return_sequences option

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.SimpleRNN(16, return_sequences=True),
keras.layers.Flatten(),
keras.layers.Dense(1),])

• We will revisit the return_sequences=True option later.
– For temperature forecasting, it gives worse accuracy.

Dense(1)
Recurrent(16) Recurrent(16) Recurrent(16)
Input(14) Input(14) Input(14)
42
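
• A small sketch (not from the original slides) of how the two settings change the output shape, assuming inputs with 120 time steps of 14 features as in the Jena model:

from tensorflow import keras

inputs = keras.Input(shape=(120, 14))
last_only = keras.layers.SimpleRNN(16, return_sequences=False)(inputs)
all_steps = keras.layers.SimpleRNN(16, return_sequences=True)(inputs)
print(last_only.shape)   # (None, 16): only the last time step
print(all_steps.shape)   # (None, 120, 16): one 16-dimensional output per time step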


LSTM Layer

• LSTM stands for Long Short-Term Memory.


• Like SimpleRNN, an LSTM layer is a recurrent layer.
• LSTM layers are used widely in practice.
• However, the description of an LSTM layer is more
complicated than that of a SimpleRNN layer.
– An LSTM layer produces a secondary output, shown in red. This is
called the carry, and it is computed using some special rules.

LSTM(M)

Input(N)
𝑡−1 𝑡 𝑡+1 43
SimpleRNN Computations at Time t

• The output of a simple RNN layer at time 𝑡 depends on:


– An N-dimensional input vector 𝑥𝑡 from the input layer.
– An M-dimensional vector 𝑧𝑡−1 , produced by the RNN layer at time 𝑡 −
1.
– 𝑊𝑧 , which is an 𝑀 × 𝑁 matrix of weights applied to 𝑥𝑡 .
– 𝑈𝑧 , which is an 𝑀 × 𝑀 matrix of weights applied to 𝑧𝑡−1 .
– 𝐵𝑧 , which is an 𝑀-dimensional vector of bias weights.
• The output 𝑧𝑡 is computed as:

𝑧𝑡 = tanh(𝑊𝑧 𝑥𝑡 + 𝑈𝑧 𝑧𝑡−1 + 𝐵𝑧)

– Note that tanh is the default activation function for a SimpleRNN layer, but another function could be substituted.

[Figure: at time 𝑡, the SimpleRNN(M) block receives 𝑥𝑡 from the Input(N) block and 𝑧𝑡−1 from the previous time step, and produces 𝑧𝑡.]
44
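
• A minimal NumPy sketch of this computation (not from the original slides), unrolling the formula above over an input sequence:

import numpy as np

def simple_rnn_step(x_t, z_prev, Wz, Uz, Bz):
    # One time step: z_t = tanh(Wz x_t + Uz z_{t-1} + Bz)
    return np.tanh(Wz @ x_t + Uz @ z_prev + Bz)

def simple_rnn(x, Wz, Uz, Bz):
    # x has shape (T, N); Wz is (M, N), Uz is (M, M), Bz is (M,).
    z = np.zeros(Wz.shape[0])        # initial state z_0 = 0
    for x_t in x:
        z = simple_rnn_step(x_t, z, Wz, Uz, Bz)
    return z                          # output of the last time step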
LSTM Computations at Time t

• In addition to the inputs and weights of a simple RNN layer, an LSTM layer also has:
– An 𝑀-dimensional carry vector 𝑐𝑡−1 , produced at time 𝑡 − 1.
– 𝑉𝑧 , which is an 𝑀 × 𝑀 matrix of weights applied to 𝑐𝑡−1 .
• The output 𝑧𝑡 is now computed with a new formula:

𝑧𝑡 = tanh(𝑊𝑧 𝑥𝑡 + 𝑈𝑧 𝑧𝑡−1 + 𝑉𝑧 𝑐𝑡−1 + 𝐵𝑧)

[Figure: at time 𝑡, the LSTM(M) block receives 𝑥𝑡 from the Input(N) block, plus 𝑧𝑡−1 and the carry 𝑐𝑡−1 from the previous time step, and produces 𝑧𝑡 and a new carry 𝑐𝑡.]
45
LSTM Computations at Time t

• To complete the description of the LSTM layer, we must specify how to compute the carry vector 𝑐𝑡 at time 𝑡.
• To do that, we use some additional weight matrices:
– 𝑊𝑖 , 𝑊𝑓 , 𝑊𝑘 are three 𝑀 × 𝑁 weight matrices applied to 𝑥𝑡 .
– 𝑈𝑖, 𝑈𝑓, 𝑈𝑘 are three 𝑀 × 𝑀 weight matrices applied to 𝑧𝑡−1.
– 𝐵𝑖 , 𝐵𝑓 , 𝐵𝑘 are three 𝑀-dimensional vectors of bias weights.
• Then, we compute:

𝑖𝑡 = 𝜎(𝑊𝑖 𝑥𝑡 + 𝑈𝑖 𝑧𝑡−1 + 𝐵𝑖)
𝑘𝑡 = 𝜎(𝑊𝑘 𝑥𝑡 + 𝑈𝑘 𝑧𝑡−1 + 𝐵𝑘)
𝑓𝑡 = 𝜎(𝑊𝑓 𝑥𝑡 + 𝑈𝑓 𝑧𝑡−1 + 𝐵𝑓)
𝑐𝑡 = 𝑖𝑡 ∗ 𝑘𝑡 + 𝑐𝑡−1 ∗ 𝑓𝑡

Symbol ∗ means “pointwise multiplication”.

[Figure: the LSTM(M) block at time 𝑡, as on slide 45.]
46
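
• A minimal NumPy sketch following this slide’s simplified description of the LSTM computations; the standard Keras LSTM layer is organized somewhat differently, so treat this only as an illustration of the formulas above:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, z_prev, c_prev, Wz, Uz, Vz, Bz, Wi, Ui, Bi, Wk, Uk, Bk, Wf, Uf, Bf):
    z_t = np.tanh(Wz @ x_t + Uz @ z_prev + Vz @ c_prev + Bz)   # layer output
    i_t = sigmoid(Wi @ x_t + Ui @ z_prev + Bi)                  # new information
    k_t = sigmoid(Wk @ x_t + Uk @ z_prev + Bk)                  # importance of new information
    f_t = sigmoid(Wf @ x_t + Uf @ z_prev + Bf)                  # importance of old information
    c_t = i_t * k_t + c_prev * f_t                              # new carry
    return z_t, c_t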
An Intuitive Interpretation

𝑖𝑡 = 𝜎(𝑊𝑖 𝑥𝑡 + 𝑈𝑖 𝑧𝑡−1 + 𝐵𝑖)
𝑘𝑡 = 𝜎(𝑊𝑘 𝑥𝑡 + 𝑈𝑘 𝑧𝑡−1 + 𝐵𝑘)
𝑓𝑡 = 𝜎(𝑊𝑓 𝑥𝑡 + 𝑈𝑓 𝑧𝑡−1 + 𝐵𝑓)
𝑐𝑡 = 𝑖𝑡 ∗ 𝑘𝑡 + 𝑐𝑡−1 ∗ 𝑓𝑡

• There is a somewhat intuitive interpretation that motivated these formulas. According to this interpretation:
– 𝑖𝑡 represents new information, computed at time 𝑡.
– 𝑘𝑡 represents the importance of each dimension in 𝑖𝑡 .
– 𝑐𝑡−1 represents old information, computed from previous time steps.
– 𝑓𝑡 represents the importance of each dimension in 𝑐𝑡−1 .
• If all values of 𝑘𝑡 are 1, and all values of 𝑓𝑡 are 0, then 𝑐𝑡 = 𝑖𝑡 .
– New information 𝑖𝑡 replaces old information 𝑐𝑡−1, which is “forgotten”.

47
An Intuitive Interpretation

𝑖𝑡 = 𝜎(𝑊𝑖 𝑥𝑡 + 𝑈𝑖 𝑧𝑡−1 + 𝐵𝑖)
𝑘𝑡 = 𝜎(𝑊𝑘 𝑥𝑡 + 𝑈𝑘 𝑧𝑡−1 + 𝐵𝑘)
𝑓𝑡 = 𝜎(𝑊𝑓 𝑥𝑡 + 𝑈𝑓 𝑧𝑡−1 + 𝐵𝑓)
𝑐𝑡 = 𝑖𝑡 ∗ 𝑘𝑡 + 𝑐𝑡−1 ∗ 𝑓𝑡

• There is a somewhat intuitive interpretation that motivated these formulas. According to this interpretation:
– 𝑖𝑡 represents new information, computed at time 𝑡.
– 𝑘𝑡 represents the importance of each dimension in 𝑖𝑡 .
– 𝑐𝑡−1 represents old information, computed from previous time steps.
– 𝑓𝑡 represents the importance of each dimension in 𝑐𝑡−1 .
• If all values of 𝑘𝑡 are 0, and all values of 𝑓𝑡 are 1, then 𝑐𝑡 = 𝑐𝑡−1 .
– New information 𝑖𝑡 is ignored, old information 𝑐𝑡−1 is retained in 𝑐𝑡.

48
An Intuitive Interpretation

𝑖𝑡 = 𝜎(𝑊𝑖 𝑥𝑡 + 𝑈𝑖 𝑧𝑡−1 + 𝐵𝑖)
𝑘𝑡 = 𝜎(𝑊𝑘 𝑥𝑡 + 𝑈𝑘 𝑧𝑡−1 + 𝐵𝑘)
𝑓𝑡 = 𝜎(𝑊𝑓 𝑥𝑡 + 𝑈𝑓 𝑧𝑡−1 + 𝐵𝑓)
𝑐𝑡 = 𝑖𝑡 ∗ 𝑘𝑡 + 𝑐𝑡−1 ∗ 𝑓𝑡

• In the typical case, individual values of 𝑘𝑡 and 𝑓𝑡 will range between 0 and 1.
• Then, each dimension of vector 𝑐𝑡 will be a weighted sum of:
– new information from the corresponding dimension of 𝑖𝑡 , with weight
specified by the corresponding dimension of 𝑘𝑡 .
– old information from the corresponding dimension of 𝑐𝑡−1 , with weight
specified by the corresponding dimension of 𝑓𝑡 .

49
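
• A tiny worked example (not from the original slides) of the carry update 𝑐𝑡 = 𝑖𝑡 ∗ 𝑘𝑡 + 𝑐𝑡−1 ∗ 𝑓𝑡, with M = 2:

import numpy as np

i_t = np.array([0.5, 0.5])     # new information
k_t = np.array([1.0, 0.0])     # keep the new value in dimension 1, ignore it in dimension 2
c_prev = np.array([1.0, 2.0])  # old information
f_t = np.array([0.0, 1.0])     # forget the old value in dimension 1, keep it in dimension 2
print(i_t * k_t + c_prev * f_t)   # [0.5 2. ]: dimension 1 is new, dimension 2 keeps the old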
The Vanishing Gradient Problem

• An additional justification for the LSTM architecture is the “vanishing gradient” problem.
– Under some neural network architectures, some weights do not get sufficiently updated during backpropagation, due to very small gradients.
– Consequently, backpropagation does not learn good values for those weights.

[Figure: the same unrolled RNN as on slide 18.]
50


The Vanishing Gradient Problem

• To understand the problem, consider the toy RNN model below.
– Let’s assume that the only output of the model is 𝑧2,3 , so that the model
estimates a single number.
– Consider how weight 𝑤2,1,1 (as an example) gets updated during
training.
[Figure: the same unrolled RNN as on slide 18.]
51


The Vanishing Gradient Problem

• Weight 𝑤2,1,1 influences the output in multiple ways.


– It gets multiplied by input 𝑥1,1 during the first time step.
– It gets multiplied by input 𝑥2,1 during the second time step.
– It gets multiplied by input 𝑥3,1 during the third time step.

[Figure: the same unrolled RNN as on slide 18.]
52


The Vanishing Gradient Problem

• During backpropagation, we update 𝑤2,1,1 based on the three different ways in which it influenced the output.
• However, the third time step will often influence
disproportionately how 𝑤2,1,1 is updated.
• Why?
[Figure: the same unrolled RNN as on slide 18.]
53


The Vanishing Gradient Problem

𝜕𝐸/𝜕𝑤2,1,1 = (𝜕𝐸/𝜕𝑧2,3) (𝜕𝑧2,3/𝜕𝑎2,3) (𝜕𝑎2,3/𝜕𝑤2,1,1)

• We start by applying the chain rule to compute the partial derivative of the loss 𝐸 with respect to 𝑤2,1,1.
– Here 𝑎2,3 denotes the weighted sum of inputs (the pre-activation) of unit 𝑈2,3, and 𝑧2,3 is its output after the activation function.
[Figure: the same unrolled RNN as on slide 18.]
54


The Vanishing Gradient Problem

𝜕𝑎2,3/𝜕𝑤2,1,1 = 𝑥3,1 + (𝜕𝑎2,3/𝜕𝑧2,2) (𝜕𝑧2,2/𝜕𝑎2,2) (𝜕𝑎2,2/𝜕𝑤2,1,1)

𝜕𝑎2,3/𝜕𝑤2,1,1 = 𝑥3,1 + (𝜕𝑎2,3/𝜕𝑧2,2) (𝜕𝑧2,2/𝜕𝑎2,2) (𝑥2,1 + (𝜕𝑎2,2/𝜕𝑧2,1) (𝜕𝑧2,1/𝜕𝑎2,1) (𝜕𝑎2,1/𝜕𝑤2,1,1))
[Figure: the same unrolled RNN as on slide 18.]
55


The Vanishing Gradient Problem

• Combining the calculations from the previous slides we get:

𝜕𝐸/𝜕𝑤2,1,1 =
(𝜕𝐸/𝜕𝑧2,3) (𝜕𝑧2,3/𝜕𝑎2,3) 𝑥3,1   (influence of 3rd time step: product of 3 numbers)
+ (𝜕𝐸/𝜕𝑧2,3) (𝜕𝑧2,3/𝜕𝑎2,3) (𝜕𝑎2,3/𝜕𝑧2,2) (𝜕𝑧2,2/𝜕𝑎2,2) 𝑥2,1   (influence of 2nd time step: product of 5 numbers)
+ (𝜕𝐸/𝜕𝑧2,3) (𝜕𝑧2,3/𝜕𝑎2,3) (𝜕𝑎2,3/𝜕𝑧2,2) (𝜕𝑧2,2/𝜕𝑎2,2) (𝜕𝑎2,2/𝜕𝑧2,1) (𝜕𝑧2,1/𝜕𝑎2,1) 𝑥1,1   (influence of 1st time step: product of 7 numbers)

• The influence of each step is the product of many terms, which typically are between 0 and 1.
56
The Vanishing Gradient Problem

• We can extrapolate the previous formula to our temperature forecasting RNN.
• There, the input length is 120 steps.
• For any weight 𝑤𝑖 connecting the SimpleRNN layer to the input, the partial derivative 𝜕𝐸/𝜕𝑤𝑖 will be a sum of 120 terms.
• Each term of 𝜕𝐸/𝜕𝑤𝑖 will correspond to the influence of a single time step.
– The influence of time step 120 will be a product of 3 numbers.
– The influence of time step 119 will be a product of 5 numbers.
– The influence of time step 118 will be a product of 7 numbers.

– The influence of time step 1 will be a product of 241 numbers.

57
The Vanishing Gradient Problem

• Each term of 𝜕𝐸/𝜕𝑤𝑖 will correspond to the influence of a single time step.
– The influence of time step 120 will be a product of 3 numbers.
– The influence of time step 119 will be a product of 5 numbers.
– The influence of time step 118 will be a product of 7 numbers.

– The influence of time step 1 will be a product of 241 numbers.
• So, the influence of time step 1 will be a product of 241
numbers, which will usually be between 0 and 1.
• This will be a very small quantity.
• Overall, the influence of a time step drops exponentially as we
move from the end towards the beginning of the input time
series.

58
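
• A quick numerical illustration (not from the original slides), assuming each factor in the product is around 0.9:

print(0.9 ** 3)     # about 0.73  (influence of time step 120)
print(0.9 ** 241)   # about 9e-12 (influence of time step 1): effectively vanished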
LSTMs and Vanishing Gradients

• The carry output can (potentially) remember information from earlier time steps.
• This allows calculations from earlier time steps to influence the
output more heavily than in a SimpleRNN layer.
– Influencing the output more heavily means higher contributions to the
partial derivatives of weights.
– That way, the model can learn to give more importance to earlier time
steps.

LSTM(M)

Input(N)
𝑡−1 𝑡 𝑡+1 59
Detour: ResNet

• The vanishing gradient problem is not particular to RNNs.


• Any deep network involves a sequence of calculations,
mapping inputs to outputs.
– Calculations earlier in the sequence end up making smaller
contributions to partial derivatives of weights.
• For convolutional neural networks (CNNs), a popular method
for resolving the vanishing gradient problem is ResNet.
• We will not discuss ResNet in this class.
– The method is somewhat similar to LSTM, by providing a way for
earlier calculations to be “remembered” in later layers.
• If you are interested in learning more about ResNet, a good
starting point is the Wikipedia article:
https://en.wikipedia.org/wiki/Residual_neural_network

60
GRU

• GRU stands for Gated Recurrent Unit.


• GRU layers are yet another type of recurrent layer.
• GRU layers can be used instead of SimpleRNN or LSTM.
• You can think of a GRU layer as an approach that is more complicated than a SimpleRNN layer and simpler than an LSTM layer.
• We will not discuss GRU layers any further in this class.
• As usual, the Wikipedia article is a good starting point
for more info:

https://en.wikipedia.org/wiki/Gated_recurrent_unit

61
Recurrent Dropout

• Dropout can be used with recurrent layers (such as SimpleRNN, LSTM, GRU).
• However, the picture is more complicated, because the
same weights are used in multiple time steps.
• In practice, better results are usually obtained if the same
weights are “dropped” at each time step.
• A normal Keras dropout layer does not know how to do
that.
• To use dropout properly with recurrent layers, you
should use the optional parameters dropout and
recurrent_dropout.

62
Recurrent Dropout

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.LSTM(32, dropout=0.3,
recurrent_dropout=0.25),
keras.layers.Dropout(0.5),
keras.layers.Dense(1),])

• This piece of code shows an example of how to combine different dropouts.
• The LSTM layer specifies a dropout value of 0.3.
– This means that 30% of the weights between the input layer and the
LSTM layer will be dropped for each training object.
• The LSTM layer specifies a recurrent_dropout value of 0.25.
– This means that 25% of the weights applied to outputs and carry
values from the previous time step will be dropped for each training
object.

63
Recurrent Dropout

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.LSTM(32, dropout=0.3,
recurrent_dropout=0.25),
keras.layers.Dropout(0.5),
keras.layers.Dense(1),])

• Notice that we still use a regular Keras dropout layer between the LSTM layer and the fully connected output layer.
– Here we specify that, for each training object, 50% of the weights
connecting the LSTM outputs to the Dense layer will be dropped.
• Optional parameters dropout and recurrent_dropout specify
how to do dropout of weights incoming to the recurrent layer.
• For weights outgoing from the layer and incoming to a fully
connected layer, a regular dropout layer should be used.

64
Bidirectional Layers

• A recurrent layer processes information from time step to time step, in chronological order.
• Would it make a difference if information was processed
in reverse chronological order?
– It might.
• How can we know which order is better?
– We usually don’t.

Recurrent

Input
𝑡−1 𝑡 𝑡+1 65
Bidirectional Layers

• A bidirectional layer processes information in both chronological and anti-chronological order.
• Essentially, a bidirectional layer consists of two recurrent
layers, each processing information in different order.
• The output of the bidirectional layer is simply the
merged output of both layers.
[Figure: two recurrent layers (SimpleRNN, LSTM, or GRU) drawn side by side. The left layer processes inputs 𝒙1, 𝒙2, 𝒙3 in chronological order (𝑡 = 1, 2, 3) and produces outputs 𝒛1, 𝒛2, 𝒛3; the right layer processes the same inputs in REVERSE chronological order and produces outputs 𝒛4, 𝒛5, 𝒛6.]
66

[Same figure.]
67

Both layers receive the exact same inputs.
68

These two recurrent layers, combined, form what we call a bidirectional layer.
69

The output of the bidirectional layer is the concatenation of the outputs of the two recurrent layers.
70
Bidirectional Layers in Keras

model = keras.Sequential([keras.Input(shape=input_shape),
keras.layers.Bidirectional(keras.layers.LSTM(32)),
keras.layers.Dense(1),])

• The Bidirectional layer takes as argument a recurrent layer.


• The Bidirectional layer creates two replicas of the recurrent
layer.
– The first replica processes information in chronological order.
– The second replica processes information in antichronological order.
• The output of the Bidirectional layer is the concatenated
output of the two replicas.

71
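
• A small sketch (not from the original slides) confirming that the outputs are concatenated, assuming inputs with 120 time steps of 14 features:

from tensorflow import keras

inputs = keras.Input(shape=(120, 14))
outputs = keras.layers.Bidirectional(keras.layers.LSTM(32))(inputs)
print(outputs.shape)   # (None, 64): 32 units from each direction, concatenated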
RNN Summary

• Recurrent layers process information one time step at a time.


• The simplest recurrent layer is SimpleRNN.
• The LSTM layer is a more complicated recurrent layer that allows the model to learn when to remember old information and when to replace that “memory” with new information.
• Recurrent dropout must be handled differently than regular
dropout, using optional parameters when creating the recurrent
layers.
– dropout for weights connecting the previous layer to the recurrent layer.
– recurrent_dropout for weights connecting outputs of the recurrent layer from the previous time step to the current time step.
• Bidirectional layers allow processing information both in
chronological and in antichronological order.

72
