Lecture 30, 31
CS G553
Mapping Design to Reconfigurable Platform: Logic Synthesis, LUT-Based Technology Mapping
FPGA Physical Design Flow
FPGA Logic Optimization and Synthesis
[Figure: a netlist of basic gates (inputs a, b, c; output f) mapped to a netlist of logic blocks (LUTs)]
Logic Synthesis in FPGAs
o Logic Optimization
o Technology Mapping
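[Figure: three networks computing f from inputs a, b, c, d, e]
Definitions
o Technology Mapping: implementing the optimized network with the logic cells of the target device (e.g., K-input LUTs)
o Logic Optimization: technology-independent simplification and restructuring of the Boolean network
o Logic Synthesis = Logic Optimization + Tech Mapping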
FPGAs vs. Custom Logic
Objective Function for Mapping
Minimize area
o in terms of number of LUTs
Minimize power
o in terms of switching activity in individual LUTs.
Maximize performance
o in terms of the depth of the LUT implementation (number of LUT levels on the longest path)
Any combination of the above (multi-objective)
o combined with different weights
LUT-Based Technology Mapping
TYPES OF ALGORITHMS
Area
o Chortle
o Schlag
o Chortle-crf
o Kong and Chang approach
o MIS-fpga
o Xmap
Routability
o Bhat and Hill
Power
o Very little work
Delay
o FlowMap
o Chortle-d
o DAG-map
o MIS-pga-delay
Types of Algorithms
Classification:
1. Based on objective functions:
o Area-driven
o Performance-driven
o Routability-driven
o Power-driven
2. Based on input network:
o Combinational
• Assumes fixed positions for sequential elements
• Only considers the combinational logic between sequential elements
o Sequential
• May relocate FFs during mapping (retiming)
• Can explore a much larger solution space (better quality)
3. Based on employed transformation technique:
o Structural
o Functional
Types of Algorithms
Structural TM:
o Does not modify the input netlist (except for logic duplication).
o Covers the netlist with the logic cells (e.g., K-LUTs) of the target FPGA
o Efficient for large designs
• Most algorithms are structural.
Functional TM:
o Boolean transformation/decomposition of the input design into a set
of interconnected logic cells
o Mixes Boolean optimization with covering
o Can potentially explore a larger solution space than structural
mapping
o Time-consuming
• Restricted to small designs (or small portions of a large design)
LUT-Based Technology Mapping: Definitions
Fan-in of v or input(v):
o the set of nodes whose outputs are inputs of v
Fan-out of v or output(v):
o the set of nodes that use the output of v as an input
Primary Input (PI):
o a node with no predecessor
Primary Output (PO):
o a node with no successor
Level of a node:
o the length of the longest path from a PI to that node
Depth of a graph:
o the largest level of any node in the graph
K-bounded Boolean network:
o a network in which |input(v)| ≤ K for every node v
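These definitions map directly onto a few graph helpers. The sketch below is a minimal illustration and does not come from any mapping tool. It assumes the network is given as a hypothetical `fanins` dictionary that maps each node name to the list of its fanin nodes, with primary inputs mapped to an empty list. From that it computes output(v), node levels, graph depth, and the K-bounded test.

```python
def fanouts(fanins):
    """output(v): the nodes that use v's output as an input."""
    out = {v: [] for v in fanins}
    for v, ins in fanins.items():
        for u in ins:
            out[u].append(v)
    return out

def levels(fanins):
    """Level = length of the longest path from a primary input (PIs have level 0)."""
    memo = {}

    def level(v):
        if v not in memo:
            ins = fanins[v]
            memo[v] = 0 if not ins else 1 + max(level(u) for u in ins)
        return memo[v]

    return {v: level(v) for v in fanins}

def depth(fanins):
    """Largest level of any node in the graph."""
    return max(levels(fanins).values())

def is_k_bounded(fanins, k):
    """True if |input(v)| <= k for every node v."""
    return all(len(ins) <= k for ins in fanins.values())

# Hypothetical example network: a, b, c are PIs, z is the only PO.
net = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["x", "c"], "z": ["x", "y"]}
print(levels(net))                       # {'a': 0, 'b': 0, 'c': 0, 'x': 1, 'y': 2, 'z': 3}
print(depth(net), is_k_bounded(net, 2))  # 3 True
```

Node x has fan-out 2 (it feeds y and z), so this example network is not a tree.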
LUT-Based Technology Mapping: Definitions
A tree (fanout-free circuit) is one in which each node has a fan-out of at most one.
o for every node v, |output(v)| ≤ 1
Forest:
o a set of independent trees
Leaf-DAG:
o A combinational circuit in which
• the only gates with a fan-out greater than one are the primary inputs
LUT-Based Technology Mapping: Definitions
Cone Cv at node v:
o the subgraph with root v that spans from v back toward the Primary Inputs
K-feasible cone:
o Cv is K-feasible at node v if:
• |input(Cv)| ≤ K and
• any path connecting a node in Cv and v lies entirely in Cv
[Figure: a K-feasible cone at v]
Fanout-free cone:
o a cone in which the fanouts of every node other than the root are inside the cone
• For each node v, there is a unique maximum fanout-free cone (MFFCv), i.e., the fanout-free cone rooted at v that contains every other fanout-free cone rooted at v
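Both cone definitions can be checked mechanically. The sketch below uses the same hypothetical `fanins` dictionary representation as before. `is_k_feasible` tests both conditions of a K-feasible cone. `mffc` grows the maximum fanout-free cone outward from the root. Treating primary inputs as cone inputs rather than cone members is an assumption of this sketch.

```python
def fanouts(fanins):
    out = {v: [] for v in fanins}
    for v, ins in fanins.items():
        for u in ins:
            out[u].append(v)
    return out

def transitive_fanin(fanins, v):
    """v together with every node that has a path to v."""
    seen, stack = set(), [v]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(fanins[x])
    return seen

def cone_inputs(fanins, cone):
    """input(C): nodes outside the cone that drive a node inside it."""
    return {u for v in cone for u in fanins[v] if u not in cone}

def is_k_feasible(fanins, cone, root, k):
    tfi = transitive_fanin(fanins, root)
    if root not in cone or not cone <= tfi:
        return False
    if len(cone_inputs(fanins, cone)) > k:      # condition 1: at most k inputs
        return False
    # Condition 2: an outside node that lies on a path from a cone node to
    # the root would make that path leave the cone.
    return all(not (transitive_fanin(fanins, w) & cone) for w in tfi - cone)

def mffc(fanins, root):
    """Add a fanin only when all of its fanouts are already in the cone (PIs excluded)."""
    fo = fanouts(fanins)
    cone, changed = {root}, True
    while changed:
        changed = False
        for v in list(cone):
            for u in fanins[v]:
                if u not in cone and fanins[u] and all(w in cone for w in fo[u]):
                    cone.add(u)
                    changed = True
    return cone

net = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["x", "c"], "z": ["x", "y"]}
print(sorted(mffc(net, "z")), sorted(mffc(net, "y")))  # ['x', 'y', 'z'] ['y']
print(is_k_feasible(net, {"z", "y"}, "z", 2))          # True: inputs are x and c
print(is_k_feasible(net, {"z", "x"}, "z", 3))          # False: path x -> y -> z leaves the cone
```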
LUT-Based Technology Mapping
LUT-technology mapping problem:
o Covering a Boolean network with a set of K-feasible cones.
• The Boolean network is usually 2-bounded (if not, it is first converted into a 2-bounded network)
Example: find the largest logic cone that will fit into the LUT:
o d = a + b
o s = d’
o q = g’ + h
o r = q + s’
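Collapsing the example cone shows why it fits in a single LUT. Because s’ = d = a + b, the cone simplifies to r = g’ + h + a + b, which depends on only four inputs. The check below assumes K = 4 (the slide does not state K) and compares truth tables by brute force. `support` is a hypothetical helper that lists the variables a function actually depends on.

```python
from itertools import product

# The four example nodes written as Boolean functions.
def d(a, b): return a or b
def s(a, b): return not d(a, b)                    # s = d'
def q(g, h): return (not g) or h                   # q = g' + h
def r(a, b, g, h): return q(g, h) or not s(a, b)   # r = q + s'

def collapsed(a, b, g, h):
    """Whole cone as one function: r = g' + h + a + b."""
    return (not g) or h or a or b

def support(f, n):
    """Indices of the variables that f actually depends on."""
    deps = set()
    for bits in product([False, True], repeat=n):
        for i in range(n):
            flipped = list(bits)
            flipped[i] = not flipped[i]
            if f(*bits) != f(*flipped):
                deps.add(i)
    return deps

inputs = list(product([False, True], repeat=4))    # order (a, b, g, h), h fastest
assert all(r(*x) == collapsed(*x) for x in inputs)

# Truth table (LUT contents) of the single 4-input LUT implementing r.
lut_bits = [int(r(*x)) for x in inputs]
print(len(support(r, 4)), lut_bits)  # 4 inputs; only (a, b, g, h) = (0, 0, 1, 0) gives 0
```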
Chortle
Chortle Algorithm
Chortle [Francis90]:
o Developed by Francis et al., University of Toronto, 1990
o Optimal in terms of area (number of LUTs)
Inputs:
o Fanout-free tree of a combinational network
o K-input LUTs
Procedure:
o Dynamic programming:
• Computes and records solutions to all sub-problems proceeding
from smallest to largest sub-problem.
– Recording the solution to each sub-problem eliminates the need to
recalculate it as part of the solution of any larger sub-problem.
Chortle Algorithm
Input DAG:
o Assumption: A tree
o If not:
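• Split the network at every node with multiple fanouts, giving a forest of maximal fanout-free trees
• Each such node becomes the root of its own tree, and its output is treated as an input by the trees it feeds
[Figure: Node Splitting at a multiple-fanout node n]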
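Here is a sketch of the forest conversion. It assumes the hypothetical `fanins` dictionary representation (primary inputs map to an empty list) plus an explicit set of primary outputs. Every primary output and every gate with more than one fanout becomes a tree root. Each tree then collects the single-fanout gates beneath its root. The traversal stops at primary inputs and at other roots, which act as inputs to the tree.

```python
def fanout_free_forest(fanins, outputs):
    """Split a network into maximal fanout-free trees.

    fanins:  dict node -> list of fanin nodes (primary inputs map to [])
    outputs: set of primary-output nodes
    Returns dict tree_root -> set of gate nodes in that tree.
    """
    fanout_count = {v: 0 for v in fanins}
    for ins in fanins.values():
        for u in ins:
            fanout_count[u] += 1
    # Tree roots: primary outputs and every gate that drives more than one node.
    roots = {v for v in fanins
             if fanins[v] and (v in outputs or fanout_count[v] > 1)}
    forest = {}
    for root in sorted(roots):
        tree, stack = set(), [root]
        while stack:
            v = stack.pop()
            tree.add(v)
            for u in fanins[v]:
                # Stop at primary inputs and at other roots: they become tree inputs.
                if fanins[u] and u not in roots:
                    stack.append(u)
        forest[root] = tree
    return forest

net = {"a": [], "b": [], "c": [], "x": ["a", "b"], "y": ["x", "c"], "z": ["x", "y"]}
forest = fanout_free_forest(net, {"z"})
print(forest)  # x (fanout 2) is split off: tree x = {x}, tree z = {y, z}
```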
Chortle Algorithm
Input DAG:
o Assumption: fan-in of every node ≤ K
o If the fan-in exceeds K, a decomposition algorithm is applied:
• Considers all possible decompositions of every node
[Figure: Node Decomposition, K = 3]
Chortle Algorithm
Post-Order Tree Traversal
o Visit Left subtree
o Visit Right subtree
o Visit Root
Definition 1:
A mapping of a node n in a tree T:
o circuit of K-input lookup tables that implements the sub-tree of T that
is rooted at n and extends to the leaf nodes of T
Definition 2:
The root lookup table of a mapping of node n:
o has as its single output the Boolean function of the node n
Chortle Algorithm
Definition 3:
Utilization of the LUT at the root of the subtree (node n):
o The number of inputs, out of its K inputs, that are actually used in the circuit
o U ∈ {2, …, K}
o U = K for a fully utilized LUT
Chortle Algorithm
Utilization Division of an LUT:
o Node n has fanin nodes n1, n2, …, nf
o The LUT of n includes all fanin edges of n and some subtrees Si rooted at ni
o Utilization Division:
• Distribution of the LUT's inputs among the subtrees
o μ(u1, u2, …, uf), with u1 + u2 + … + uf = U
o UD: how the inputs of this LUT are divided among the fanin edges of node n
[Figure: utilization division (1, 1, 2)]
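Enumerating the utilization divisions of a node is a small combinatorial step. The sketch below assumes that every ui is at least 1, which is consistent with the rule that a fanin with ui = 1 enters the LUT as a single input. It also assumes the ui sum to U. `utilization_divisions` is a hypothetical helper.

```python
def utilization_divisions(U, f):
    """All tuples (u1, ..., uf) with every ui >= 1 and u1 + ... + uf == U."""
    if f == 1:
        return [(U,)] if U >= 1 else []
    divisions = []
    for u1 in range(1, U - f + 2):          # leave at least 1 input for each remaining fanin
        for rest in utilization_divisions(U - u1, f - 1):
            divisions.append((u1,) + rest)
    return divisions

# A node with 3 fanins whose LUT uses U = 4 inputs:
print(utilization_divisions(4, 3))  # [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```

The division (1, 1, 2) from the figure is one of the three candidates that Chortle must evaluate for this node.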
Chortle Algorithm
It can be shown that:
o For any node n, with fanin nodes n1, …, nf, if we have previously
calculated minMap(ni, Ui) for all Ui from 2 to K for every node ni, then
we can calculate minMap(n, U) for all U from 2 to K.
Chortle Algorithm
MapTree(T,K)
o For each node (n) in postorder traversal of Tree (T)
• For each utilization (U) = 2 to K of node n
– CurrentBestCost = ∞
– CurrentBestMap = Ø
– For each utilization division μ(left,right) such that left + right = U
• Construct the minimum-cost mapping, M, for the subtree rooted at n
• Calculate Cost(M)
• If Cost(M) < CurrentBestCost
CurrentBestMap = M
CurrentBestCost = Cost(M)
• MinMap(n,U) = CurrentBestMap
o Return MinMap(root,K)
Chortle Algorithm
Construct minMap:
o Combine the constructed root LUT with the previously computed mappings minMap(ni, ui)
• If ui = 1, minMap(ni, K) must be used instead of minMap(ni, 1)
• Otherwise (ui ≥ 2), the root LUT of minMap(ni, ui) is eliminated because it is absorbed into the constructed LUT.
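The MapTree loop and the construction rule above combine into a short dynamic program. This sketch records only the LUT count of MinMap(n, U), not the mapping itself. As on the slides, it assumes a fanout-free tree in which every node already has fan-in ≤ K. Following the worked example, where MinMap(n1,3) = MinMap(n1,2), MinMap(n, U) is read as the best mapping whose root LUT uses at most U inputs. The `Node` class and the `map_tree` function are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    fanins: tuple = ()  # empty tuple => primary input (leaf of the tree)

def utilization_divisions(U, f):
    """All (u1, ..., uf) with every ui >= 1 and sum == U."""
    if f == 1:
        return [(U,)] if U >= 1 else []
    return [(u1,) + rest
            for u1 in range(1, U - f + 2)
            for rest in utilization_divisions(U - u1, f - 1)]

def postorder(n):
    """Fanins first, then the node itself (a tree has no shared nodes)."""
    for child in n.fanins:
        yield from postorder(child)
    yield n

def map_tree(root, K):
    """min_cost[name][U] = fewest K-LUTs when the root LUT uses at most U inputs."""
    min_cost = {}

    def fanin_cost(ni, ui):
        if not ni.fanins:                  # primary input: one LUT input, no LUT
            return 0 if ui == 1 else math.inf
        if ui == 1:                        # output of ni enters as one input: minMap(ni, K)
            return min_cost[ni.name][K]
        return min_cost[ni.name][ui] - 1   # root LUT of minMap(ni, ui) is absorbed

    for n in postorder(root):
        if not n.fanins:
            continue
        min_cost[n.name] = {}
        for U in range(2, K + 1):
            best = math.inf
            for used in range(len(n.fanins), U + 1):
                for div in utilization_divisions(used, len(n.fanins)):
                    cost = 1 + sum(fanin_cost(ni, ui) for ni, ui in zip(n.fanins, div))
                    best = min(best, cost)
            min_cost[n.name][U] = best
    return min_cost

# Example tree: n1 = A*B, n2 = C*D, n3 = n2*E, n4 = n3 + F, n5 = n1 + n4
# (gate types do not affect the LUT count).
A, B, C, D, E, F = (Node(x) for x in "ABCDEF")
n1, n2 = Node("n1", (A, B)), Node("n2", (C, D))
n3 = Node("n3", (n2, E))
n4 = Node("n4", (n3, F))
n5 = Node("n5", (n1, n4))
table = map_tree(n5, K=4)
print(table["n3"], table["n5"][4])  # {2: 2, 3: 1, 4: 1} 2
```

Applied to the Chortle example with K = 4, the sketch reproduces the values on the example slides. MinMap(n3, U) costs 2, 1, 1 LUTs for U = 2, 3, 4, and MinMap(n5, 4) costs 2 LUTs. Using memoized tables instead of stored maps keeps the sketch short. Recovering the actual LUT netlist would also require remembering the winning division for each (n, U).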
Chortle: Example
Given:
o F = A*B + (C*D*E) + F
Decompose
o F = (A*B) + (((C*D)*E) + F)
• n1 = A*B, n2 = C*D, n3 = n2*E, n4 = n3 + F, n5 = n1 + n4
Given
o K=4
Find Optimal Implementation
o MinMap(n5,2)
o MinMap(n5,3)
o MinMap(n5,4)
Chortle: Example
For n=n1
o For U=2
• For μn1(1,1)
– CurrentBestMap=n1(A,B)
– CurrentBestCost(M) = 1 LUT
• MinMap(n1,2) = n1(A,B)
– MinCost = 1 LUT
o For U=3 : Same as U=2
• MinMap(n1,3) = MinMap(n1,2)=n1(A,B)
– MinCost = 1 LUT
o For U=4 : Same as U=2
• MinMap(n1,4) = MinMap(n1,2)=n1(A,B)
– MinCost = 1 LUT
Chortle: Example
For n=n2
o For U=2
• For μn2(1,1)
– CurrentBestMap=n2(C,D)
– CurrentBestCost(M) = 1 LUT
• MinMap(n2,2) = n2(C,D)
– MinCost = 1 LUT
o For U=3 : Same as U=2
• MinMap(n2,3) = MinMap(n2,2)=n2(C,D)
– MinCost = 1 LUT
o For U=4 : Same as U=2
• MinMap(n2,4) = MinMap(n2,2)=n2(C,D)
– MinCost = 1 LUT
Chortle: Example
For n=n3
o For U=2
• For μn3(1,1)
– CurrentBestMap
=n3(MinMap(n2,K=4),E)
=n3(n2(C,D),E)
– CurrentBestCost(M)
=MinCost(n2) + 1=2 LUTs
• MinMap(n3,2) = n3(n2(C,D),E)
– MinCost = 2 LUTs
Chortle: Example
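For n=n3 (continued)
o For U=3
• For μn3(2,1)
– CurrentBestMap
=n3(MinMap(n2,2),E)
=n3(C,D,E)
– CurrentBestCost(M) = 1 LUT
• MinMap(n3,3) = n3(C,D,E)
– MinCost = 1 LUT
o For U=4 : Same as U=3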
Chortle: Example
For n=n4
o For U=2
• For μ(left,right)=μ(1,1)
– CurrentBestMap
=n4(MinMap(n3,K=4),F)
=n4(n3(C,D,E),F)
– CurrentBestCost(M) = 2 LUTs
• MinMap(n4,2) = n4(n3(C,D,E),F)
– MinCost = 2 LUTs
Chortle: Example
For n=n4 (continued)
o For U=3
• For μn4(2,1):
– CurrentBestMap
=n4(MinMap(n3,2),F)
=n4(n2(C,D),E,F)
– CurrentBestCost(M) = 2 LUTs
• For μn4(1,2): same as μn4(1,1)
– Cost = 2 LUTs (tie)
• MinMap(n4,3) = n4(n2(C,D),E,F)
– MinCost = 2 LUTs
Chortle: Example
o MinMap(n5,4)
• = n5(A,B,n3(C,D,E),F)
– MinCost = 2 LUTs
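Writing out the construction rule for the root with U = 4 makes this final cost explicit. In the block below, c(n, U) is a shorthand introduced here for the LUT count of MinMap(n, U). A fanin that enters with ui = 1 contributes c(ni, K). A fanin with ui ≥ 2 contributes c(ni, ui) − 1, because its root LUT is absorbed. Primary inputs contribute nothing. All values come from the example slides except c(n4, 4) = 1, the mapping n4(C,D,E,F). That value is not shown above; it follows from the same rule applied to μn4(3,1).

$$
\begin{aligned}
\mu_{n5}(1,3):&\quad 1 + c(n_1,4) + \bigl(c(n_4,3) - 1\bigr) = 1 + 1 + 1 = 3\\
\mu_{n5}(2,2):&\quad 1 + \bigl(c(n_1,2) - 1\bigr) + \bigl(c(n_4,2) - 1\bigr) = 1 + 0 + 1 = 2\\
\mu_{n5}(3,1):&\quad 1 + \bigl(c(n_1,3) - 1\bigr) + c(n_4,4) = 1 + 0 + 1 = 2
\end{aligned}
$$

The minimum is 2 LUTs. Division μn5(2,2) gives the mapping n5(A,B,n3(C,D,E),F). Division μn5(3,1) ties with n5(A,B,n4) plus n4(C,D,E,F).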
Chortle: Example
Optimal Solution
Chortle-crf
Chortle-crf Algorithm
Chortle
o Chortle uses exhaustive search to find the optimal gate-level
decomposition of every node in a fanout-free tree
Chortle-crf [Francis91]:
o Developed by Francis et al., University of Toronto, 1991
Inputs:
o SOP representation of a single-output function
o K-input LUTs
Features:
o About 28 times faster than Chortle
o 14% fewer LUTs than Chortle
Chortle-crf Algorithm
Procedure
o Bin packing and dynamic programming to choose gate-level
decomposition
o Exploitation of reconvergent paths
o Replication of logic at fanout nodes
Bin Packing Problem
http://www.cs.gsu.edu/~cscskp/Algorithms/NP/node11.html
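Bin Packing: Formulation
$$
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{n} Y_i \\
\text{s.t.}\quad & \sum_{j=1}^{n} W_j X_{ij} \le C\,Y_i, && i \in N = \{1, 2, \dots, n\} \\
& \sum_{i=1}^{n} X_{ij} = 1, && j \in N \\
& X_{ij} \in \{0, 1\},\ Y_i \in \{0, 1\}, && i, j \in N
\end{aligned}
$$
where W_j is the size of item j, C is the bin capacity, Y_i = 1 if bin i is used, and X_ij = 1 if item j is packed into bin i.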
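Bin packing is NP-hard, so mapping tools rely on heuristics such as first-fit decreasing. For very small instances, though, the optimum of the formulation above can still be found by exhaustive search, which gives a reference point for the heuristics. The sketch below uses a hypothetical function `min_bins`. It branches on X_ij by trying each open bin and then a new bin. Any branch that already uses as many bins as the best solution found so far is pruned. It assumes every W_j ≤ C.

```python
def min_bins(sizes, capacity):
    """Exact optimum of the bin-packing formulation by exhaustive search (tiny inputs only)."""
    sizes = sorted(sizes, reverse=True)
    best = [len(sizes)]                  # n bins always suffice when every W_j <= C

    def place(j, loads):
        if len(loads) >= best[0]:
            return                       # cannot beat the best solution found so far
        if j == len(sizes):
            best[0] = len(loads)
            return
        for i in range(len(loads)):      # X_ij = 1 for an already-open bin i
            if loads[i] + sizes[j] <= capacity:
                loads[i] += sizes[j]
                place(j + 1, loads)
                loads[i] -= sizes[j]
        loads.append(sizes[j])           # or open a new bin (Y_i = 1)
        place(j + 1, loads)
        loads.pop()

    place(0, [])
    return best[0]

print(min_bins([3, 2, 2, 2, 2], 5))  # 3 bins for boxes of sizes 3, 2, 2, 2, 2 with K = 5
```

The search is exponential in n. It is only useful for checking heuristic results on small examples, never inside a mapper.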
Chortle-crf Algorithm
Example:
o K=3
o f = ab + cd
o # of inputs = 4 > K
• Cannot use a single LUT.
o Decomposition:
• f1 = ab
• f2 = cd
• f = f1 + f2 (3 LUTs)
o Alternative decomposition:
• f1 = ab
• f = f1 + cd (2 LUTs: the LUT for f uses only 3 inputs, f1, c and d)
Chortle-crf Algorithm
Map the trees: (Step-A)
o Traverse the network from the inputs to the outputs.
o At each node v, a circuit implementing the cone (from v to PIs) is
constructed.
• The circuit is called Best Circuit (BC) at v.
Objectives in constructing BC:
o minimize the number of LUTs (area)
o maximize the number of unused inputs of the output LUT
• This allows subsequent nodes to be implemented without extra LUTs.
Points:
o The order of traversal ensures that the immediate fanin circuits have
been constructed.
o Output LUTs of the fanin BCs will be referred to as fanin LUTs.
Chortle-crf Algorithm
Example:
o K=5
o An OR node and its fanin LUTs
[Figure: an OR node and its fanin LUTs f, g, h, i, j]
Chortle-crf Algorithm: Decomposition
Decomposition: (Step-B)
o Goal:
• To construct a tree of LUTs that implements
1. both the functions of fanin LUTs and
2. a decomposition of the node.
Two Steps:
1. Two-level decomposition
2. Convert it to multi-level decomposition
Chortle-crf Algorithm: Decomposition
[Figure: two-level and multi-level decompositions of the OR node with fanin LUTs f, g, h, i, j]
Two-Level Decomposition
Bin packing:
o Bins: second-level lookup tables
o Boxes: fanin lookup tables
o The capacity of each bin: K
o Size of each box (fanin lookup table): its number of used inputs
• Example (K = 5):
• sizes of the boxes f, g, h, i, j (I level): 3, 2, 2, 2, and 2
• final contents of the packed bins (II level): 5, 4, and 2
FirstFitDecreasing
{
start with an empty bin list
while there are unpacked boxes
{
if the largest unpacked box will not fit within any bin in the bin list
{
create an empty bin and add it to the end of the bin list
}
pack the largest unpacked box into the first bin it will fit within
}
}
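The FirstFitDecreasing pseudocode can be sketched in Python as follows. This is an illustration, not Chortle-crf's implementation. Boxes are represented only by their sizes (number of used inputs), so shared inputs and the identities of f, g, h, i, j are ignored. The names `first_fit_decreasing` and `capacity` are hypothetical.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack boxes (fanin LUTs, sized by used inputs) into bins (second-level LUTs)."""
    bins = []                                   # each bin: list of box sizes
    for size in sorted(sizes, reverse=True):    # largest unpacked box first
        for b in bins:
            if sum(b) + size <= capacity:       # first bin it will fit within
                b.append(size)
                break
        else:                                   # fits nowhere: new bin at the end of the list
            bins.append([size])
    return bins

# Slide example: K = 5, fanin LUTs f, g, h, i, j use 3, 2, 2, 2, 2 inputs.
bins = first_fit_decreasing([3, 2, 2, 2, 2], capacity=5)
print(bins, [sum(b) for b in bins])  # [[3, 2], [2, 2], [2]] [5, 4, 2]
```

Python's `for ... else` expresses the "open a new bin" branch directly. The `else` runs only when the loop finishes without a `break`, which happens when no existing bin had room.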
Step 1: Two-Level Decomposition
Packing:
o Combining two LUTs, LUT1 (implementing f1) and LUT2 (implementing f2), into a new LUTr that implements the function f = f1 Ø f2, where Ø is the function implemented in the fan-out node (e.g., OR)
• Uses the first-fit decreasing (FFD) method
• Can also use best-fit decreasing (BFD)
[Figure: boxes (fanin LUTs f, g, h, i, j) packed into bins]
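Here is a minimal sketch of the best-fit decreasing alternative, under the same size-only assumption as FFD (shared inputs between boxes are ignored). Boxes are taken largest first. Each box goes into the open bin with the least remaining room that can still hold it, and a new bin is opened only if none can. The name `best_fit_decreasing` is hypothetical.

```python
def best_fit_decreasing(sizes, capacity):
    """Each box, largest first, goes to the tightest bin that still fits it."""
    bins = []
    for size in sorted(sizes, reverse=True):
        best = None
        for b in bins:
            room = capacity - sum(b)
            if size <= room and (best is None or room < capacity - sum(best)):
                best = b                    # tighter fit found (ties keep the earlier bin)
        if best is None:
            bins.append([size])             # no bin has room: open a new one
        else:
            best.append(size)
    return bins

print(best_fit_decreasing([3, 2, 2, 2, 2], capacity=5))  # [[3, 2], [2, 2], [2]]
```

On the slide's example (K = 5, sizes 3, 2, 2, 2, 2), BFD produces the same bins as FFD. The two heuristics differ only when a later bin is a tighter fit than the first bin that fits.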
Chortle-crf: Multi-Level Decomposition
Multi-level Decomposition
o The first-level node is implemented with a tree of LUTs:
• Inputs to the leaf LUTs of the 1st-level tree = outputs of the 2nd-level LUTs of the two-level decomposition
o Reduction of the number of LUTs:
• by using unused pins of the 2nd-level LUTs to implement a portion of the first-level LUTs
[Figure: 2nd-level and 1st-level LUTs]
Algorithm MultiLevel
{
while there is more than one unconnected LUT do
{
if there are no free inputs among
the remaining unconnected LUTs
{
create an empty LUT and add
it to the end of the LUT list
}
connect the most filled unconnected LUT
to the next unconnected LUT with a free input
}
}
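Here is a minimal sketch of Algorithm MultiLevel. It assumes each second-level LUT is described only by its number of used inputs, as produced by the bin packing, and that the list is kept sorted from most to least filled. The output of the most filled unconnected LUT is wired to a free input of the next unconnected LUT. An empty LUT is added only when no free input remains. The function name and the returned triple are hypothetical.

```python
def multi_level(bin_loads, k):
    """Connect second-level LUTs (given by used-input counts) into a single tree.

    Returns (total LUT count, list of (source_lut, destination_lut), root_lut).
    """
    luts = [{"id": i, "used": u} for i, u in enumerate(bin_loads)]
    unconnected = sorted(luts, key=lambda lut: -lut["used"])   # most filled first (stable)
    edges = []
    while len(unconnected) > 1:
        src, others = unconnected[0], unconnected[1:]
        if not any(lut["used"] < k for lut in others):
            # No free input left: create an empty LUT at the end of the list.
            new = {"id": len(luts), "used": 0}
            luts.append(new)
            others.append(new)
        dst = next(lut for lut in others if lut["used"] < k)   # next LUT with a free input
        dst["used"] += 1                                        # src output takes one input of dst
        edges.append((src["id"], dst["id"]))
        unconnected = sorted(others, key=lambda lut: -lut["used"])
    return len(luts), edges, unconnected[0]["id"]

print(multi_level([5, 4, 2], k=5))  # (3, [(0, 1), (1, 2)], 2)
print(multi_level([5, 5], k=5))     # (3, [(0, 2), (1, 2)], 2)
```

For the bins 5, 4, 2 with K = 5, the sketch chains the three LUTs without adding a new one. The OR node therefore costs 3 LUTs, instead of the 4 that a separate first-level root LUT would need. When every bin is full (5, 5), one extra LUT becomes the root.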
Chortle-crf: Reconvergent Paths
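o Fanin LUTs that share inputs can be packed into one LUT whose input count is smaller than the sum of their sizes, because each shared input is counted only once
[Figure: fanin LUTs with shared inputs (boxes f, g, h, i, j); reconvergent paths realized within one LUT (bin)]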
Chortle-crf: Logic Replication
Chortle-crf
o Basic Xilinx tech mapping follows Chortle
• with modification to handle registers.
Chortle-d
Chortle-d:
o Considers delay as objective
o FlowMap solves the depth-minimization problem optimally.
The End
Questions?
Bin Packing
First fit decreasing algorithm
o Bins: A B C D E F
o With the first fit decreasing algorithm we sort the blocks into descending order first: 6 5 4 3 3 3 2 2 1
o Now we use the first fit algorithm: each block, in this order, is placed in the first bin that still has room for it
[Figure: step-by-step packing of the sorted blocks into bins A to F]
o Final contents: A: 6; B: 5, 1; C: 4, 2; D: 3, 3; E: 3, 2
o We have packed them into 5 bins.
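Replaying the appendix example in code makes the bin capacity explicit. The slides do not state the capacity. A capacity of 6 is assumed here because it reproduces the packing shown, with exactly 5 bins, A to E. The helper repeats the first-fit decreasing sketch under hypothetical names.

```python
from string import ascii_uppercase

def first_fit_decreasing(sizes, capacity):
    """Sort blocks in descending order, then put each into the first bin with room."""
    bins = []
    for size in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + size <= capacity:
                b.append(size)
                break
        else:
            bins.append([size])
    return bins

# Block sizes from the slides; capacity=6 is an assumption, not stated on the slides.
packing = first_fit_decreasing([5, 6, 4, 3, 3, 3, 2, 2, 1], capacity=6)
for name, contents in zip(ascii_uppercase, packing):
    print(name, contents)
print(len(packing), "bins")  # 5 bins
```

The total block size is 29. With capacity 6, at least ⌈29/6⌉ = 5 bins are needed, so first-fit decreasing happens to be optimal on this example.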