
ADVANCED INFORMATION RETRIEVAL
Chapter 02: Modeling - Neural Network Model

Neural Network Model

A neural network is an oversimplified representation of the neuron interconnections in the human brain:
nodes are processing units
edges are synaptic connections
the strength of a propagating signal is modelled by a weight assigned to each edge
the state of a node is defined by its activation level
depending on its activation level, a node might issue an output signal

Neural Networks

Complex learning systems recognized in animal brains
Single neuron has a simple structure
Interconnected sets of neurons perform complex learning tasks
Human brain has about 10^15 synaptic connections
Artificial Neural Networks attempt to replicate the non-linear learning found in nature

[Figure: biological neuron, showing dendrites, cell body, and axon]

Neural Networks (contd)

Dendrites gather inputs from other neurons and combine the information
Then generate a non-linear response when a threshold is reached
Signal sent to other neurons via the axon

[Figure: artificial neuron with inputs x1, x2, ..., xn feeding a combination function]

Artificial neuron model is similar
Data inputs (xi) are collected from upstream neurons and fed into a combination function (sigma)

Neural Networks (contd)

Activation function reads the combined input and produces a non-linear response (y)
Response channeled downstream to other neurons

What problems are Neural Networks applicable to?

Quite robust with respect to noisy data
Can learn and work around erroneous data
Results opaque to human interpretation
Often require long training times

Input and Output Encoding

Neural Networks require attribute values encoded to [0, 1]
Numeric:
Apply min-max normalization to continuous variables:

    X* = (X - min(X)) / range(X) = (X - min(X)) / (max(X) - min(X))

Works well when min and max are known
Also assumes new data values occur within the min-max range
Values outside the range may be rejected or mapped to the min or max
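A rough Python sketch of this normalization (the min of 18, max of 65, and value 25 below are assumed example values, not from the slides; out-of-range values are mapped to the min or max, one of the options mentioned above):

    def min_max_normalize(x, x_min, x_max):
        # min-max normalization: X* = (X - min(X)) / (max(X) - min(X))
        x_star = (x - x_min) / (x_max - x_min)
        # map values that fall outside the training min-max range to 0 or 1
        return min(1.0, max(0.0, x_star))

    print(min_max_normalize(25, x_min=18, x_max=65))   # ~0.149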

Input and Output Encoding (contd)

Output:
Neural Networks always return continuous values in [0, 1]
Many classification problems have two outcomes
Solution uses a threshold, established a priori, in a single output node to separate the classes
For example, target variable is "leave" or "stay"
Threshold rule: classify as "leave" if output >= 0.67
Single output node value of 0.72 classifies the record as "leave"
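A minimal sketch of this thresholding step, using the slide's own numbers (the function name is just illustrative):

    def classify(output, threshold=0.67):
        # single output node; threshold fixed a priori
        return "leave" if output >= threshold else "stay"

    print(classify(0.72))   # "leave", as in the example above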

Simple Example of a Neural Network

[Figure: feedforward network with input nodes 1, 2, 3 connected to hidden nodes A and B by weights W1A, W1B, W2A, W2B, W3A, W3B; hidden nodes A and B connected to output node Z by weights WAZ and WBZ; bias weights W0A, W0B, W0Z]

Neural Network consists of a layered, feedforward, completely connected network of nodes
Feedforward restricts the network flow to a single direction
Flow does not loop or cycle
Network composed of two or more layers

Simple Example of a Neural Network (contd)

Most networks have Input, Hidden, and Output layers
Network may contain more than one hidden layer
Network is completely connected
Each node in a given layer is connected to every node in the next layer
Every connection has a weight (Wij) associated with it
Weight values are randomly assigned between 0 and 1 by the algorithm
Number of input nodes depends on the number of predictors
Number of hidden and output nodes is configurable

Simple Example of a Neural Network (contd)

[Figure: same network diagram as above]

Combination function produces a linear combination of the node inputs and connection weights as a single scalar value:

    net_j = Σ_i W_ij x_ij = W_0j x_0j + W_1j x_1j + ... + W_Ij x_Ij

For node j, x_ij signifies its ith input
W_ij is the weight associated with the ith input to node j
There are I + 1 inputs to node j
x_1, x_2, ..., x_I are the inputs from upstream nodes
x_0 is a constant input with value 1.0
Each node j therefore has an extra (bias) input W_0j x_0j = W_0j
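A short Python sketch of the combination function; the weights and inputs plugged in below are the Node A values introduced on the next slide, so the result matches net_A = 1.32:

    def combination(weights, inputs):
        # weights[0] is the bias W_0j paired with the constant input x_0 = 1.0;
        # weights[1:] are W_1j ... W_Ij for the upstream inputs x_1 ... x_I
        return weights[0] * 1.0 + sum(w * x for w, x in zip(weights[1:], inputs))

    print(combination([0.5, 0.6, 0.8, 0.6], [0.4, 0.2, 0.7]))   # 1.32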

Simple Example of a Neural Network (contd)

Example input values and weights:

    x0 = 1.0    W0A = 0.5    W0B = 0.7    W0Z = 0.5
    x1 = 0.4    W1A = 0.6    W1B = 0.9    WAZ = 0.9
    x2 = 0.2    W2A = 0.8    W2B = 0.8    WBZ = 0.9
    x3 = 0.7    W3A = 0.6    W3B = 0.4

The scalar value computed for hidden layer Node A equals:

    net_A = Σ_i W_iA x_iA = W_0A(1.0) + W_1A x_1A + W_2A x_2A + W_3A x_3A
          = 0.5 + 0.6(0.4) + 0.8(0.2) + 0.6(0.7) = 1.32

For Node A, net_A = 1.32 is the input to the activation function
Neurons fire in biological organisms
Signals are sent between neurons when the combination of inputs crosses a threshold

Simple Example of a Neural Network (contd)

Firing response not necessarily linearly related to the increase in input stimulation
Neural Networks model this behavior using a non-linear activation function
Sigmoid function most commonly used:

    y = 1 / (1 + e^(-x))

In Node A, the sigmoid function takes net_A = 1.32 as input and produces output:

    y = 1 / (1 + e^(-1.32)) = 0.7892

Simple Example of a Neural Network (contd)

Node A outputs 0.7892 along its connection to Node Z, where it becomes a component of net_Z
Before net_Z is computed, the contribution from Node B is required:

    net_B = Σ_i W_iB x_iB = W_0B(1.0) + W_1B x_1B + W_2B x_2B + W_3B x_3B
          = 0.7 + 0.9(0.4) + 0.8(0.2) + 0.4(0.7) = 1.5

and,

    f(net_B) = 1 / (1 + e^(-1.5)) = 0.8176

Node Z combines the outputs from Node A and Node B through net_Z

Simple Example of a Neural Network (contd)

Inputs to Node Z are not data attribute values
Rather, they are the outputs of the sigmoid function in the upstream nodes:

    net_Z = Σ_i W_iZ x_iZ = W_0Z(1.0) + W_AZ x_AZ + W_BZ x_BZ
          = 0.5 + 0.9(0.7892) + 0.9(0.8176) = 1.9461

finally,

    f(net_Z) = 1 / (1 + e^(-1.9461)) = 0.8750

Value 0.8750 is the output from the Neural Network on this first pass
Represents the predicted value for the target variable, given the first observation
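The whole first pass can be checked with a few lines of Python; this sketch simply strings together the combination function and the sigmoid with the example's weights:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def net(weights, inputs):
        # weights[0] is the bias; weights[1:] pair with the inputs
        return weights[0] + sum(w * v for w, v in zip(weights[1:], inputs))

    x = [0.4, 0.2, 0.7]                          # x1, x2, x3
    w_A = [0.5, 0.6, 0.8, 0.6]                   # W0A, W1A, W2A, W3A
    w_B = [0.7, 0.9, 0.8, 0.4]                   # W0B, W1B, W2B, W3B
    w_Z = [0.5, 0.9, 0.9]                        # W0Z, WAZ, WBZ

    out_A = sigmoid(net(w_A, x))                 # net_A = 1.32   -> 0.7892
    out_B = sigmoid(net(w_B, x))                 # net_B = 1.5    -> 0.8176
    out_Z = sigmoid(net(w_Z, [out_A, out_B]))    # net_Z = 1.9461 -> 0.8750
    print(round(out_A, 4), round(out_B, 4), round(out_Z, 4))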

Sigmoid Activation Function

Sigmoid function combines nearly linear, curvilinear, and nearly constant behavior, depending on the input value
Function nearly linear for domain values -1 < x < 1
Becomes curvilinear as values move away from the center
At extreme values, f(x) is nearly constant
Moderate increments in x produce a variable increase in f(x), depending on the location of x
Sometimes called a squashing function
Takes real-valued input and returns values in [0, 1]

Back-Propagation

Neural Networks are a supervised learning method
Require a target variable
Each observation passed through the network results in an output value
Output value compared to the actual value of the target variable
(Actual - Output) = Error
Prediction error analogous to residuals in regression models
Most networks use Sum of Squared Errors (SSE) to measure how well predictions fit the target values:

    SSE = Σ_Records Σ_OutputNodes (actual - output)^2
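In code, SSE over a small data set could be sketched as follows (the records below are placeholder values, not from the slides):

    def sse(records):
        # each record pairs the actual target values with the network outputs,
        # one value per output node
        return sum((actual - output) ** 2
                   for actuals, outputs in records
                   for actual, output in zip(actuals, outputs))

    print(sse([([0.8], [0.875]), ([0.2], [0.35])]))   # 0.028125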

Back-Propagation (contd)

Squared prediction errors are summed over all output nodes and all records in the data set
Model weights are constructed that minimize SSE
Actual values that minimize SSE are unknown
Weights are estimated, given the data set

Back-Propagation Rules

Back-propagation percolates the prediction error for a record back through the network
Partitioned responsibility for the prediction error is assigned to the various connections
Back-propagation rules defined (Mitchell):

    w_ij,NEW = w_ij,CURRENT + Δw_ij, where Δw_ij = η δ_j x_ij

    η = learning rate
    x_ij signifies the ith input to node j
    δ_j represents the responsibility for a particular error belonging to node j

Back-Propagation Rules (contd)

Error responsibility is computed using the partial derivative of the sigmoid function with respect to net_j
Values take one of two forms:

    δ_j = output_j (1 - output_j)(actual_j - output_j)             for output layer nodes
    δ_j = output_j (1 - output_j) Σ_DOWNSTREAM W_jk δ_k            for hidden layer nodes

where Σ_DOWNSTREAM W_jk δ_k refers to the weighted sum of the error responsibilities for the nodes downstream of node j

Rules show why input values require normalization
Large input values x_ij would dominate the weight adjustment
Error propagation would be overwhelmed, and learning stifled
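A direct transcription of these rules into Python might look like the following sketch (the function names are mine, not from the slides or from Mitchell):

    def delta_output(output, actual):
        # error responsibility for an output layer node
        return output * (1.0 - output) * (actual - output)

    def delta_hidden(output, downstream):
        # downstream is a list of (W_jk, delta_k) pairs for the nodes fed by node j
        return output * (1.0 - output) * sum(w * d for w, d in downstream)

    def weight_update(w_current, eta, delta_j, x_ij):
        # w_ij,NEW = w_ij,CURRENT + eta * delta_j * x_ij
        return w_current + eta * delta_j * x_ij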

Example of Back-Propagation

[Figure: same network diagram as above]

Recall that the first pass through the network yielded output = 0.8750
Assume actual target value = 0.8, and learning rate η = 0.1
Prediction error = 0.8 - 0.8750 = -0.075
Neural Networks use stochastic back-propagation
Weights are updated after each record is processed by the network
Adjusting the weights using back-propagation is shown next

Error responsibility for Node Z, an output node, is found first:

    δ_Z = output_Z (1 - output_Z)(actual_Z - output_Z)
        = 0.875(1 - 0.875)(0.8 - 0.875) = -0.0082

Example of Back-Propagation (contd)

Now adjust the "constant" (bias) weight w_0Z using the rules:

    Δw_0Z = η δ_Z (1) = 0.1(-0.0082)(1) = -0.00082
    w_0Z,NEW = w_0Z,CURRENT + Δw_0Z = 0.5 - 0.00082 = 0.49918

Move upstream to Node A, a hidden layer node
Only node downstream from Node A is Node Z:

    δ_A = output_A (1 - output_A) Σ_DOWNSTREAM W_jk δ_k
        = 0.7892(1 - 0.7892)(0.9)(-0.0082) = -0.00123

Example of Back-Propagation (contd)

Adjust weight w_AZ using the back-propagation rules:

    Δw_AZ = η δ_Z (output_A) = 0.1(-0.0082)(0.7892) = -0.000647
    w_AZ,NEW = w_AZ,CURRENT + Δw_AZ = 0.9 - 0.000647 = 0.899353

Connection weight between Node A and Node Z adjusted from 0.9 to 0.899353

Next, Node B is a hidden layer node
Only node downstream from Node B is Node Z:

    δ_B = output_B (1 - output_B) Σ_DOWNSTREAM W_jk δ_k
        = 0.8176(1 - 0.8176)(0.9)(-0.0082) = -0.0011

Example of Back-Propagation (contd)

Adjust weight w_BZ using the back-propagation rules:

    Δw_BZ = η δ_Z (output_B) = 0.1(-0.0082)(0.8176) = -0.00067
    w_BZ,NEW = w_BZ,CURRENT + Δw_BZ = 0.9 - 0.00067 = 0.89933

Connection weight between Node B and Node Z adjusted from 0.9 to 0.89933

Similarly, application of the back-propagation rules continues back to the input layer nodes
Weights {w1A, w2A, w3A, w0A} and {w1B, w2B, w3B, w0B} are updated by the same process

Example of Back-Propagation (contd)

Now all network weights in the model have been updated
Each iteration is based on a single record from the data set
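The weight updates above can be checked numerically with a self-contained sketch (the rounded outputs 0.7892, 0.8176, and 0.875 are taken from the forward pass):

    eta = 0.1
    out_A, out_B, out_Z = 0.7892, 0.8176, 0.875

    d_Z = out_Z * (1 - out_Z) * (0.8 - out_Z)    # delta_Z  ~ -0.0082
    d_A = out_A * (1 - out_A) * 0.9 * d_Z        # delta_A  ~ -0.00123
    d_B = out_B * (1 - out_B) * 0.9 * d_Z        # delta_B  ~ -0.0011

    print(round(0.5 + eta * d_Z * 1.0, 5))       # w0Z  -> 0.49918
    print(round(0.9 + eta * d_Z * out_A, 6))     # wAZ  -> 0.899353
    print(round(0.9 + eta * d_Z * out_B, 5))     # wBZ  -> 0.89933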

Summary

Network calculated the predicted value for the target variable
Prediction error derived
Prediction error percolated back through the network
Weights adjusted to generate a smaller prediction error
Process repeats record by record

Termination Criteria

Many passes through the data set are performed
Constantly adjusting the weights to reduce the prediction error
When to terminate?

Stopping criterion may be computational "clock" time?
Short training times likely result in a poor model

Terminate when SSE reaches a threshold level?
Neural Networks are prone to overfitting
Memorizing patterns rather than generalizing

And ...

Learning Rate

Recall the learning rate η (Greek "eta") is a constant:

    0 < η ≤ 1, where η = learning rate

Small Learning Rate

Helps adjust weights toward the global minimum for SSE
With a small learning rate, weight adjustments are small
Network takes an unacceptably long time converging to a solution

Large Learning Rate

Suppose the algorithm is close to the optimal solution
With a large learning rate, the network is likely to overshoot the optimal solution
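A toy illustration (assumed one-weight loss, not from the slides): gradient descent on a single weight with error (w - 1)^2, comparing a small and a large learning rate:

    def descend(eta, w=0.0, steps=8, w_opt=1.0):
        trace = [w]
        for _ in range(steps):
            w -= eta * 2 * (w - w_opt)       # gradient of (w - w_opt)^2
            trace.append(round(w, 3))
        return trace

    print(descend(eta=0.05))   # small eta: slow, steady approach toward 1.0
    print(descend(eta=0.95))   # large eta: repeatedly overshoots the optimum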

Neural Network for IR

From the work by Wilkinson & Hingston, SIGIR'91

[Figure: three-layer network with query term nodes (k1, ..., ka, kb, kc, ..., kt) on the left, document term nodes (ka, kb, kc, ...) in the middle, and document nodes (d1, ..., dj, dj+1, ..., dN) on the right]

Neural Network for IR

Three-layer network
Signals propagate across the network
First level of propagation:
Query terms issue the first signals
These signals propagate across the network to reach the document nodes
Second level of propagation:
Document nodes might themselves generate new signals which affect the document term nodes
Document term nodes might respond with new signals of their own

Quantifying Signal Propagation

Normalize signal strength (MAX = 1)
Query terms emit an initial signal equal to 1
Weight associated with an edge from a query term node ki to a document term node ki:

    W_iq = w_iq / sqrt(Σ_i w_iq^2)

Weight associated with an edge from a document term node ki to a document node dj:

    W_ij = w_ij / sqrt(Σ_i w_ij^2)

Quantifying Signal Propagation (contd)

After the first level of signal propagation, the activation level of a document node dj is given by:

    Σ_i W_iq W_ij = (Σ_i w_iq w_ij) / (sqrt(Σ_i w_iq^2) * sqrt(Σ_i w_ij^2))

which is exactly the ranking of the Vector model

New signals might be exchanged among document term nodes and document nodes in a process analogous to a feedback cycle
A minimum threshold should be enforced to avoid spurious signal generation
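The first level of propagation can be sketched directly in Python (the index weights below are hypothetical, not from the slides); each document node ends up with activation Σ_i W_iq W_ij, i.e. the vector model's cosine ranking:

    import math

    def rank(query_w, docs_w):
        # query_w: term -> w_iq;  docs_w: doc -> {term -> w_ij}
        q_norm = math.sqrt(sum(w * w for w in query_w.values()))
        scores = {}
        for doc, terms in docs_w.items():
            d_norm = math.sqrt(sum(w * w for w in terms.values()))
            dot = sum(query_w[t] * terms.get(t, 0.0) for t in query_w)
            scores[doc] = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    query = {"ka": 1.0, "kb": 0.5}
    docs = {"d1": {"ka": 0.8, "kc": 0.3},
            "d2": {"ka": 0.4, "kb": 0.9}}
    print(rank(query, docs))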

Conclusions

The model provides an interesting formulation of the IR problem
The model has not been tested extensively
It is not clear what improvements the model might provide
