Lecture 30,31

The document describes reconfigurable computing and the mapping of designs to reconfigurable platforms like FPGAs. It discusses the FPGA design flow involving synthesis, placement, packing, and routing. It covers logic optimization and synthesis techniques like technology mapping and how designs are mapped to lookup tables (LUTs). Different algorithms for LUT-based technology mapping are also summarized, including area-focused algorithms like Chortle that use dynamic programming to find an optimal mapping of logic cones to LUTs.

Uploaded by

Piyush Parashar

Reconfigurable Computing

CS G553

Dr. A. Amalin Prince


BITS - Pilani K K Birla Goa Campus
Department of Electrical and Electronics Engineering

Lecture 30, 31
Mapping Design to Reconfigurable Platform: Logic Synthesis, LUT-Based Technology Mapping

CS G553 2
FPGA Physical Design Flow

Design Entry → Synthesis → Logic Optimization → Mapping to k-LUTs → Packing LUTs to CLBs → Placement → Routing → Simulation → Configure an FPGA

FPGA Logic Optimization and Synthesis
Netlist of basic gates (inputs a, b, c; output f) → technology-independent logic optimization → technology mapping to lookup tables (LUTs) → packing of LUTs into logic blocks (CLBs) → netlist of logic blocks

Logic Synthesis in FPGAs

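o Logic Optimization
o Technology Mapping
(Figure: three versions of a network with inputs a, b, c, d, e and output f)

 Definitions
o Technology Mapping: covering the optimized network with the logic cells of the target device (here, K-input LUTs)
o Logic Optimization: technology-independent restructuring of the Boolean network
o Logic Synthesis = Logic Optimization + Tech Mapping
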
FPGAs vs. Custom Logic

 Cost metric for custom static gates is the literal count:
o ax + bx’ has four literals and requires 8 transistors.
 Cost metric for FPGAs is the number of logic elements (LEs):
o All functions that fit in an LE have the same cost.

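The claim that every function fitting in an LE costs the same can be made concrete by modelling a K-input LUT as a 2^K-bit truth table. This is a minimal Python sketch under the assumption that an LE is just a bare K-input LUT (real logic elements also contain flip-flops and other logic); `lut_mask` and `lut_eval` are hypothetical helpers, not a vendor API.

```python
def lut_mask(func, k):
    """Return the 2**k-bit truth-table mask that configures a k-input LUT.

    Bit i of the mask stores func evaluated on the input vector whose
    binary encoding is i (input 0 is the least-significant bit).
    """
    mask = 0
    for i in range(2 ** k):
        bits = [(i >> j) & 1 for j in range(k)]
        if func(*bits):
            mask |= 1 << i
    return mask


def lut_eval(mask, inputs):
    """Evaluate a LUT: the input bits form an address into the mask."""
    addr = sum(bit << j for j, bit in enumerate(inputs))
    return (mask >> addr) & 1


# ax + bx' (four literals in static CMOS) and a 3-input XOR each
# occupy exactly one 3-input LUT: same cost, different mask.
mux = lut_mask(lambda a, b, x: (a & x) | (b & (1 - x)), 3)
xor3 = lut_mask(lambda a, b, c: a ^ b ^ c, 3)
print(mux, xor3)  # 172 150
```

Both functions cost one LUT; only the configuration bits differ.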
Objective Function for Mapping

 Minimize area
o in terms of number of LUTs
 Minimize power
o in terms of switching activity in individual LUTs.
 Maximize performance
o in terms of the depth of the LUT implementation (number of LUT levels on the longest path)
 Any combination of the above (multi-objective)
o combined with different weights

LUT Based Technology Mapping

 A major obstacle in applying conventional technology mapping approaches to LUT circuits is the large number of different functions that a LUT can implement.
o A K-input LUT can implement 2^(2^K) different Boolean functions.
• Rule-based methods develop a set of rules that encapsulates the complete functionality of a LUT.
• In library-based systems, the library representing a K-input LUT need not include all 2^(2^K) different functions.
• Cell generators avoid the problem of large libraries by using matching algorithms that simply test network sub-functions against the parameters defining the cell family.
– The number of sub-functions that must be considered is reduced by using the network itself to direct the search. However, the cell families used by these approaches do not completely encompass the functionality of a K-input LUT.

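The double-exponential growth of 2^(2^K) is what makes a complete LUT library impractical. A quick Python check (the helper name is illustrative):

```python
def num_lut_functions(k):
    """Number of distinct Boolean functions a k-input LUT can implement."""
    return 2 ** (2 ** k)


for k in range(2, 7):
    print(k, num_lut_functions(k))
# 2 16
# 3 256
# 4 65536
# 5 4294967296
# 6 18446744073709551616
```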
LUT-Based Technology Mapping

TYPES OF ALGORITHMS
 Area
o Chortle
o Chortle-crf
o Kong and Chang approach
o MIS-fpga
o Xmap
 Routability
o Bhat and Hill
o Schlag
 Power
o Very few works
 Delay
o FlowMap
o Chortle-d
o DAG-map
o MIS-pga-delay

Types of Algorithms

 Classification:
1. Based on objective functions:
o Area-driven
o Performance-driven
o Routability-driven
o Power-driven
2. Based on input network:
o Combinational
• Assumes fixed positions for sequential elements
• Only considers the combinational logic between sequential elements
o Sequential
• May relocate FFs during mapping (retiming)
• Can explore a much larger solution space (better quality)
3. Based on employed transformation technique:
o Structural
o Functional

Types of Algorithms
 Structural TM:
o Does not modify the input netlist (except logic duplication).
o Covering a netlist with logic cells (e.g., K-LUTs) of the target FPGAs
o Efficient for large designs
• Most algorithms are structural.
 Functional TM:
o Boolean transformation/decomposition of the input design into a set
of interconnected logic cells
o Mixes Boolean optimization with covering
o Can potentially explore a larger solution space than structural
mapping
o Time-consuming
• ⇒ Restricted to small designs (or small portions of a large design)

LUT-Technology Mapping- Definitions
 Fan-in of v or input(v):
o the set of nodes whose outputs are inputs of v
 Fan-out of v or output(v):
o the set of nodes, which use the output of v as inputs
 Primary Input:
o a node with no predecessor
 Primary output:
o a node with no successor
 Level of a node:
o the length of the longest path from PI to that node
 Depth of a graph:
o the largest level of a node in the graph
 K-bounded Boolean network:
o if |input(v)|≤ K for all nodes in the graph
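The definitions above can be exercised on a small network. This is a minimal Python sketch, assuming the network is stored as a dictionary from each node to its fan-in list; `NETWORK` and all helper names are hypothetical.

```python
from functools import lru_cache

# Hypothetical example network: node -> list of fan-in nodes.
# Primary inputs have no fan-in.
NETWORK = {
    "a": [], "b": [], "c": [], "d": [],
    "n1": ["a", "b"],
    "n2": ["c", "d"],
    "n3": ["n1", "n2"],
    "n4": ["n3", "a"],
}


def fanout(net, v):
    """output(v): the nodes that use v as an input."""
    return [u for u, ins in net.items() if v in ins]


def primary_inputs(net):
    return [v for v, ins in net.items() if not ins]


def primary_outputs(net):
    return [v for v in net if not fanout(net, v)]


def levels(net):
    """Level = length of the longest path from a PI (PIs are at level 0)."""
    @lru_cache(maxsize=None)
    def level(v):
        return 0 if not net[v] else 1 + max(level(u) for u in net[v])
    return {v: level(v) for v in net}


def depth(net):
    return max(levels(net).values())


def is_k_bounded(net, k):
    return all(len(ins) <= k for ins in net.values())


print(levels(NETWORK)["n4"], depth(NETWORK), is_k_bounded(NETWORK, 2))  # 3 3 True
```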
LUT-Technology Mapping- Definitions
 A tree or fan-out-free circuit is one in which each node has a fan-out of at most one.
o for every node v, |output(v)| ≤ 1
 Forest:
o an independent set of trees
 Leaf-DAG:
o a combinational circuit in which
• the only nodes with a fan-out greater than one are the primary inputs

LUT-Technology Mapping- Definitions
 Cone Cv at node v:
o the tree with root v that spans from v to the primary inputs.

 K-feasible cone:
o Cv is K-feasible at node v if:
• |input(Cv)| ≤ K, and
• any path connecting a node in Cv and v lies entirely in Cv
(Figure: a K-feasible cone at v)

 Fanout-free cone:
o a cone in which the fanouts of every node other than the root are inside the cone
• For each node v, there is a unique maximum fanout-free cone (MFFCv), i.e. the fanout-free cone rooted at v that contains every other fanout-free cone rooted at v

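The maximum fanout-free cone can be computed by growing a cone from v and repeatedly absorbing any fan-in node whose fanouts all lie inside it. A sketch in Python, assuming a dictionary from each node to its fan-in list (`NET` and the helper names are hypothetical). Primary inputs are kept outside the cone, and `is_k_feasible` checks only the input-count condition; the path condition holds automatically for a fanout-free cone, since no node other than the root has a fanout outside it.

```python
# Hypothetical network: node -> list of fan-in nodes (PIs have none).
NET = {
    "a": [], "b": [], "c": [], "d": [],
    "n1": ["a", "b"],
    "n2": ["b", "c"],
    "n3": ["n1", "n2"],
    "n4": ["n2", "d"],
    "n5": ["n3", "n4"],
}


def fanout(net, v):
    return [u for u, ins in net.items() if v in ins]


def mffc(net, root):
    """Maximum fanout-free cone of root (primary inputs excluded)."""
    cone = {root}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for v in list(cone):
            for u in net[v]:
                if u in cone or not net[u]:
                    continue
                # u may join only if every one of its fanouts is already inside.
                if all(w in cone for w in fanout(net, u)):
                    cone.add(u)
                    changed = True
    return cone


def cone_inputs(net, cone):
    """input(C): signals feeding the cone from outside it."""
    return {u for v in cone for u in net[v] if u not in cone}


def is_k_feasible(net, cone, k):
    return len(cone_inputs(net, cone)) <= k


c3 = mffc(NET, "n3")
print(sorted(c3), sorted(cone_inputs(NET, c3)))  # ['n1', 'n3'] ['a', 'b', 'n2']
```

Node n2 stays out of MFFC(n3) because it also fans out to n4.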
LUT-Technology Mapping
 LUT-technology mapping problem:
o Covering a Boolean network with a set of K-feasible cones.
• The Boolean network is usually 2-bounded (if not, it is converted to 2-
bounded)

(Figure: graph covering with cones, and the resulting LUT mapping) [Chen06]

LUT-based logic synthesis

 Find the largest logic cone that will fit into the LUT:

o d = a + b
o s = d’
o q = g’ + h
o r = q + s’

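Collapsing the cone rooted at r shows why it fits into one LUT, assuming 4-input LUTs (K is not stated on the slide). The sketch below checks by exhaustive simulation that the four equations collapse to r = a + b + g’ + h, a function of only four inputs; the function names are illustrative.

```python
from itertools import product


def r_network(a, b, g, h):
    """The slide's equations, evaluated gate by gate (1 = true)."""
    d = a | b            # d = a + b
    s = 1 - d            # s = d'
    q = (1 - g) | h      # q = g' + h
    return q | (1 - s)   # r = q + s'


def r_collapsed(a, b, g, h):
    """The whole cone rooted at r as one function: s' = d = a + b,
    so r = a + b + g' + h."""
    return a | b | (1 - g) | h


# The cone has 4 distinct inputs {a, b, g, h}, so with K = 4 it fits one LUT.
assert all(r_network(*v) == r_collapsed(*v) for v in product((0, 1), repeat=4))
```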
Chortle

Chortle Algorithm

 Chortle [Francis90]:
o Developed by Francis et al, University of Toronto in 1990
o Optimal in terms of area (number of LUTs)
 Inputs:
o Fanout-free tree of a combinational network
o K-input LUTs

 Procedure:
o Dynamic programming:
• Computes and records solutions to all sub-problems, proceeding from the smallest to the largest sub-problem.
– Recording the solution to each sub-problem eliminates the need to recalculate it as part of the solution of any larger sub-problem.

Chortle Algorithm

 The purpose of the algorithm is to find the minimum-cost circuit of K-input LUTs that implements an arbitrary Boolean network represented by a graph G.
o The algorithm begins by converting the graph G into a forest of maximal fanout-free trees.
o Each of these trees is then mapped to find the minimum-cost circuit that implements the tree.
o A circuit implementing the entire graph G is formed by combining the circuits that implement each tree in the forest.

Chortle Algorithm

 Input DAG:
o Assumption: the input is a tree
o If not:
• Convert it to a forest of maximal fanout-free trees
(Figure: Node Splitting; node n, which fans out to a, b and c, before and after splitting into na, nb, nc)

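Node splitting can be sketched as follows: every gate with fan-out greater than one (and every primary output) becomes the root of its own tree, and other trees see it as a leaf input. This Python version is an illustration assuming the network is a dictionary from each node to its fan-in list; `NET` and `split_into_trees` are hypothetical names, and the sketch does not reproduce Chortle's own data structures.

```python
# Hypothetical network: node -> list of fan-in nodes (PIs have none).
# n2 fans out to both n3 and n4, so it must become a tree root.
NET = {
    "a": [], "b": [], "c": [], "d": [],
    "n1": ["a", "b"],
    "n2": ["b", "c"],
    "n3": ["n1", "n2"],
    "n4": ["n2", "d"],
    "n5": ["n3", "n4"],
}


def fanout(net, v):
    return [u for u, ins in net.items() if v in ins]


def split_into_trees(net):
    """Partition the network into maximal fanout-free trees.

    Roots are primary outputs and every gate with fan-out > 1; when a
    tree reaches another root, that root is treated as a leaf input.
    """
    roots = [v for v in net if net[v] and len(fanout(net, v)) != 1]
    trees = {}
    for r in roots:
        members, stack = [], [r]
        while stack:
            v = stack.pop()
            members.append(v)
            for u in net[v]:
                if net[u] and u not in roots:  # stop at PIs and other roots
                    stack.append(u)
        trees[r] = members
    return trees


print(split_into_trees(NET))
# {'n2': ['n2'], 'n5': ['n5', 'n4', 'n3', 'n1']}
```

Every non-root gate has fan-out exactly one, so it lands in exactly one tree.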
Chortle Algorithm
 Input DAG:
o Assumption: the fan-in of any node is ≤ K
o If the fan-in is > K, a decomposition algorithm is applied:
• it considers all possible decompositions of every node
(Figure: Node Decomposition, K = 3)

Chortle Algorithm
 Post-Order Tree Traversal
o Visit Left subtree
o Visit Right subtree
o Visit Root

Definition 1:
 A mapping of a node n in a tree T:
o a circuit of K-input lookup tables that implements the sub-tree of T that is rooted at n and extends to the leaf nodes of T

Definition 2:
 The root lookup table of a mapping of node n:
o has as its single output the Boolean function of the node n

Chortle Algorithm
Definition 3:
 Utilization of LUT at root of subtree (node n):
o The number of inputs, out of K inputs, that are actually used in the
circuit.
o U ∈ {2, …, K}
o U = K for a fully utilized LUT

 Minimum cost of the sub-circuit rooted at n:
o MinMap(n, U):
• optimal solution for node n for each U ∈ {2, …, K}

Chortle Algorithm
 Utilization Division of an LUT:
o Node n has fanin nodes n1, n2, …, nf
o The LUT of n includes all fanin edges of n and some subtrees Si rooted at ni
o Utilization Division:
• Distribution of the inputs to the LUT among the subtrees
o μ = (u1, u2, …, uf), with u1 + u2 + … + uf = U
o UD: how the inputs of this LUT are divided among the fanin edges of node n
(Figure: example utilization division (1, 1, 2))

Chortle Algorithm

 It can be shown that:
o For any node n with fanin nodes n1, …, nf, if we have previously calculated minMap(ni, Ui) for all Ui from 2 to K for every node ni, then we can calculate minMap(n, U) for all U from 2 to K.

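For the binary trees produced by decomposition (fan-in 2), the statement above can be written as a recurrence. This is a sketch under two assumptions: cost(n, U) denotes the cost of the cheapest mapping of n whose root LUT uses at most U inputs, and c(x, u) is a hypothetical helper for the cost that child x contributes when it occupies u inputs of the root LUT of n.

```latex
\mathrm{cost}(n, U) = \min\Big( \mathrm{cost}(n, U-1),\;
  \min_{\substack{u_L + u_R = U \\ u_L,\, u_R \ge 1}}
  \big[\, 1 + c(n_L, u_L) + c(n_R, u_R) \,\big] \Big),
  \qquad \mathrm{cost}(n, 1) = \infty

c(x, u) =
\begin{cases}
0 & x \text{ a primary input},\ u = 1 \\
\infty & x \text{ a primary input},\ u \ge 2 \\
\mathrm{cost}(x, K) & x \text{ internal},\ u = 1 \quad (\text{$x$ keeps its own LUT}) \\
\mathrm{cost}(x, u) - 1 & x \text{ internal},\ u \ge 2 \quad (\text{root LUT of $x$ absorbed})
\end{cases}
```

The leading 1 counts the root LUT of n; the last case subtracts the root LUT of the child because it is merged into the root LUT of n.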
Chortle Algorithm
 MapTree(T, K)
o For each node n in a postorder traversal of tree T
• For each utilization U = 2 to K of node n
– CurrentBestCost = ∞
– CurrentBestMap = Ø
– For each utilization division μ(left, right) such that left + right = U
• Construct the minimum-cost mapping M for the subtree rooted at n
• Calculate Cost(M)
• If Cost(M) < CurrentBestCost:
CurrentBestMap = M
CurrentBestCost = Cost(M)
– MinMap(n, U) = CurrentBestMap
o Return MinMap(root, K)

Chortle Algorithm

 Construct minMap:
o Combine the constructed root LUT with the mappings minMap(ni, ui) that have been previously computed
• If ui = 1, minMap(ni, K) must be used instead of minMap(ni, 1)
• Otherwise, the root LUT of minMap(ni, ui) is eliminated because it is absorbed into the constructed LUT.

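The MapTree recursion and the minMap construction rule can be sketched in Python for the binary trees produced by decomposition. This is an illustrative sketch, not Francis et al.'s implementation: it assumes every internal node has exactly two fanins, treats MinMap(n, U) as the cheapest mapping whose root LUT uses at most U inputs (which is how the "Same as U=2" entries of the example behave), and uses hypothetical names (`Node`, `Mapping`, `child_part`, `map_tree`).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # [] for a primary input


@dataclass
class Mapping:
    root_inputs: list  # signals feeding the root LUT
    other_luts: list   # (output name, inputs) for every non-root LUT

    @property
    def cost(self):
        return 1 + len(self.other_luts)


def child_part(c, uc, k, best):
    """Root-LUT inputs and extra LUTs contributed by child c using uc inputs."""
    if not c.children:                 # primary input: occupies exactly one input
        return ([c.name], []) if uc == 1 else None
    if uc == 1:                        # c keeps its own LUT: use MinMap(c, K)
        m = best[(c.name, k)]
        return [c.name], [(c.name, m.root_inputs)] + m.other_luts
    m = best[(c.name, uc)]             # c's root LUT is absorbed into n's LUT
    return list(m.root_inputs), list(m.other_luts)


def map_tree(root, k):
    """Chortle-style DP; best[(n, U)] = cheapest mapping of the subtree
    at n whose root LUT uses at most U inputs."""
    best = {}

    def visit(n):
        for c in n.children:           # post-order: children before the node
            if c.children:
                visit(c)
        left, right = n.children       # assumes fan-in 2 after decomposition
        for u in range(2, k + 1):
            cand = best.get((n.name, u - 1))   # "at most U" reuses U - 1
            for ul in range(1, u):     # utilization divisions (ul, u - ul)
                pl = child_part(left, ul, k, best)
                pr = child_part(right, u - ul, k, best)
                if pl is None or pr is None:
                    continue
                m = Mapping(pl[0] + pr[0], pl[1] + pr[1])
                if cand is None or m.cost < cand.cost:
                    cand = m
            best[(n.name, u)] = cand

    visit(root)
    return best[(root.name, k)]


# The example tree used in this lecture, with K = 4.
A, B, C, D, E, F = (Node(x) for x in "ABCDEF")
n1 = Node("n1", [A, B])
n2 = Node("n2", [C, D])
n3 = Node("n3", [n2, E])
n4 = Node("n4", [n3, F])
n5 = Node("n5", [n1, n4])
print(map_tree(n5, 4).cost)  # 2
```

On the example tree (K = 4) the sketch returns a cost of 2 LUTs. It may break ties differently from the slides: it keeps a root LUT over A, B and n4, with n4 = C, D, E, F in a second LUT, whereas the slides give n5(A,B,n3(C,D,E),F); both use 2 LUTs.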
Chortle: Example

 Given:
o F = A*B + (C*D*E) + F
 Decompose
o F = (A*B) + ((C*D)*E) + F
o i.e. tree nodes n1 = A*B, n2 = C*D, n3 = n2*E, n4 = n3 + F, n5 = n1 + n4
 Given:
o K = 4
 Find the optimal implementation:
o MapTree(n5, 2)
o MapTree(n5, 3)
o MapTree(n5, 4)

Chortle: Example

 For n=n1
o For U=2
• For μn1(1,1)
– CurrentBestMap=n1(A,B)
– CurrentBestCost(M) = 1 LUT
• MinMap(n1,2) = n1(A,B)
– MinCost = 1 LUT
o For U=3 : Same as U=2
• MinMap(n1,3) = MinMap(n1,2)=n1(A,B)
– MinCost = 1 LUT
o For U=4 : Same as U=2
• MinMap(n1,4) = MinMap(n1,2)=n1(A,B)
– MinCost = 1 LUT

Chortle: Example

 For n=n2
o For U=2
• For μn2(1,1)
– CurrentBestMap=n2(C,D)
– CurrentBestCost(M) = 1 LUT
• MinMap(n2,2) = n2(C,D)
– MinCost = 1 LUT
o For U=3 : Same as U=2
• MinMap(n2,3) = MinMap(n2,2)=n2(C,D)
– MinCost = 1 LUT
o For U=4 : Same as U=2
• MinMap(n2,4) = MinMap(n2,2)=n2(C,D)
– MinCost = 1 LUT

Chortle: Example

 For n=n3
o For U=2
• For μn3(1,1)
– CurrentBestMap
=n3(MinMap(n2,K=4),E)
=n3(n2(C,D),E)
– CurrentBestCost(M)
=MinCost(n2) + 1=2 LUTs
• MinMap(n3,2) = n3(n2(C,D),E)
– MinCost = 2 LUTs

Chortle: Example

 For n=n3 (continued)


o For U=3
• For μn3(2,1)
– CurrentBestMap
=n3(MinMap(n2,2),E)
=n3(C,D,E)
– CurrentBestCost(M) = 1 LUT
• For μn3(1,2): same as μn3(1,1)
• MinMap(n3,3) = n3(C,D,E)
– MinCost = 1 LUT
o For U=4: same as U=3

Chortle: Example

 For n=n4
o For U=2
• For μ(left,right)=μ(1,1)
– CurrentBestMap
=n4(MinMap(n3,K=4),F)
=n4(n3(C,D,E),F)
– CurrentBestCost(M) = 2 LUTs
• MinMap(n4,2) = n4(n3(C,D,E),F)
– MinCost = 2 LUTs

Chortle: Example
 For n=n4 (continued)
o For U=3
• For μn4(2,1):
– CurrentBestMap
=n4(MinMap(n3,2),F)
=n4(n2(C,D),E,F)
– CurrentBestCost(M) = 2 LUTs
• For μn4(1,2): same as μn4(1,1)
– Cost = 2 LUTs (tie)

• MinMap(n4,3) = n4(n2(C,D),E,F)
– MinCost = 2 LUTs

Chortle: Example

o MinMap(n5,4)
• = n5(A,B,n3(C,D,E),F)
– MinCost = 2 LUTs

Chortle: Example

(Figure: optimal solution)

Chortle-crf

Chortle-crf Algorithm

 Chortle
o Chortle uses exhaustive search to find the optimal gate-level
decomposition of every node in a fanout-free tree

 Chortle-crf [Francis91]:
o Developed by Francis et al, University of Toronto in 1991
 Inputs:
o SOP representation of a single output function
o K-input LUTs

 Features:
o 28× faster than Chortle
o 14% fewer LUTs than Chortle

Chortle-crf Algorithm

 Procedure
o Bin packing and dynamic programming to choose gate-level
decomposition
o Exploitation of reconvergent paths
o Replication of logic at fanout nodes

Bin Packing Problem

 Bin packing problem:


o Placing n objects into a number of bins (at most n bins).
o Each object has a weight (Wi > 0)
o Each bin has a limited capacity (Ci > 0)
o Find the best assignment of objects to bins such that
• The total weight of the objects in each bin does not exceed its capacity
• The number of bins used is minimized
o Let Yi = 1 if (bin i) is used
o Let Xij = 1 if (object j) is assigned to (bin i)

http://www.cs.gsu.edu/~cscskp/Algorithms/NP/node11.html

Bin Packing: Formulation

Obj (j): object; Bin (i): bin

$$\min Z = \sum_{i=1}^{n} Y_i$$
subject to
$$\sum_{j=1}^{n} W_j X_{ij} \le C\, Y_i, \quad \forall i \in N = \{1, 2, \ldots, n\}$$
$$\sum_{i=1}^{n} X_{ij} = 1, \quad \forall j \in N$$

Chortle-crf Algorithm

 Example:
o K = 3
o f = ab + cd
o Number of inputs = 4 > K
• ⇒ Cannot use a single LUT.
o Decomposition:
• f1 = ab,
• f2 = cd,
• f = f1 + f2

o Alternative decomposition:
• f1 = ab,
• f = f1 + cd

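The two decompositions can be compared directly. With K = 3 as in the example, the first decomposition needs three LUTs (f1, f2 and f), while the alternative needs only two because f1 + cd has exactly three inputs. A small Python check of the alternative; the `luts` dictionary is an illustrative netlist format, not a tool's input syntax.

```python
from itertools import product

K = 3  # LUT size from the slide example


def f(a, b, c, d):
    return (a & b) | (c & d)


# Alternative decomposition: LUT "f1" computes ab, LUT "f" computes f1 + cd.
luts = {"f1": ("a", "b"), "f": ("f1", "c", "d")}


def f_decomposed(a, b, c, d):
    f1 = a & b
    return f1 | (c & d)


assert all(len(ins) <= K for ins in luts.values())
assert all(f(*v) == f_decomposed(*v) for v in product((0, 1), repeat=4))
print(len(luts), "LUTs")  # 2 LUTs
```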
Chortle-crf Algorithm
 Map the trees: (Step-A)
o Traverse the network from inputs to output.
o At each node v, a circuit implementing the cone (from v to PIs) is
constructed.
• The circuit is called Best Circuit (BC) at v.
 Objectives in constructing BC:
o minimize number of LUTs (area)
o maximize number of unused inputs at the output LUTs
• Allows subsequent nodes to be implemented without extra LUTs.
 Points:
o The order of traversal ensures that the immediate fanin circuits have
been constructed.
o Output LUTs of the fanin BCs will be referred to as fanin LUTs.

Chortle-crf Algorithm

 Example:
o K=5
o An OR node and its fanin LUTs

(Figure: an OR node with fanin LUTs f, g, h, i, j)

Chortle-crf Algorithm: Decomposition

 Decomposition: (Step-B)
o Goal:
• To construct a tree of LUTs that implements
1. both the functions of fanin LUTs and
2. a decomposition of the node.
 Two Steps:
1. Two-level decomposition
2. Convert it to multi-level decomposition

Chortle-crf Algorithm: Decomposition

(Figure: two-level and multi-level decompositions of the node with fanin LUTs f, g, h, i, j)

Two-Level Decomposition
 Bin packing:
o Bins: second-level lookup tables
o Boxes: fanin lookup tables
o The capacity of each bin: K
o Size of each box (fanin lookup table): its number of used inputs
• Example (K = 5): the first-level boxes f, g, h, i, j have sizes 3, 2, 2, 2, and 2
• Final contents of the packed second-level bins: 5, 4, and 2

FirstFitDecreasing
{
    start with an empty bin list
    while there are unpacked boxes
    {
        if the largest unpacked box will not fit within any bin in the bin list
        {
            create an empty bin and add it to the end of the bin list
        }
        pack the largest unpacked box into the first bin it will fit within
    }
}

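The FirstFitDecreasing procedure translates directly into Python. In this sketch boxes are represented only by their sizes (numbers of used inputs) and every bin has capacity K; the function name is hypothetical. For the second call, the bin capacity of 7 is an assumption: it is not stated in the extracted slides, but it reproduces the 5-bin result of the first fit decreasing example at the end of this lecture.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack box sizes into bins of the given capacity using FFD.

    Returns a list of bins, each a list of the box sizes it holds.
    """
    bins, loads = [], []
    for size in sorted(sizes, reverse=True):     # largest unpacked box first
        if size > capacity:
            raise ValueError(f"box of size {size} exceeds capacity {capacity}")
        for i, load in enumerate(loads):
            if load + size <= capacity:          # first bin it fits within
                bins[i].append(size)
                loads[i] += size
                break
        else:                                    # fits nowhere: open a new bin
            bins.append([size])
            loads.append(size)
    return bins


print([sum(b) for b in first_fit_decreasing([3, 2, 2, 2, 2], 5)])  # [5, 4, 2]
print(len(first_fit_decreasing([6, 5, 4, 3, 3, 3, 2, 2, 1], 7)))  # 5
```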
Step 1: Two-Level Decomposition

 Packing:
o Combining two LUTs, LUT1 (implementing f1) and LUT2 (implementing f2), into a new LUTr that implements the function f = f1 Ø f2, where Ø is the function implemented in the fan-out node (e.g. OR)
• Uses the first-fit decreasing (FFD) method
• Can also use best-fit decreasing (BFD)

(Figure: fanin LUTs f, g, h, i, j as boxes packed into bins)

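Best-fit decreasing differs from FFD only in the choice of bin: instead of the first bin with room, it picks the bin that would be left fullest. A sketch under the same size-only representation; the function name is hypothetical.

```python
def best_fit_decreasing(sizes, capacity):
    """Pack box sizes using BFD: each box goes to the fullest bin it fits in."""
    bins, loads = [], []
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"box of size {size} exceeds capacity {capacity}")
        fits = [i for i, load in enumerate(loads) if load + size <= capacity]
        if fits:
            # Fullest feasible bin; ties go to the earliest such bin.
            i = max(fits, key=lambda j: loads[j])
            bins[i].append(size)
            loads[i] += size
        else:
            bins.append([size])
            loads.append(size)
    return bins


print(best_fit_decreasing([3, 2, 2, 2, 2], 5))  # [[3, 2], [2, 2], [2]]
```

For the box sizes 3, 2, 2, 2, 2 with K = 5, BFD gives the same bins (5, 4 and 2 used inputs) as FFD.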
Two-Level Decomposition

(Figure: pseudo-code for two-level decomposition)

Chortle-crf: Multi-Level Decomposition

 Multi-level Decomposition
o The first-level node is implemented with a tree of LUTs:
• Inputs to the leaf LUTs of the first-level tree = outputs of the second-level LUTs of the two-level decomposition
o Reduction of the number of LUTs:
• by using unused pins of the second-level LUTs to implement a portion of the first-level LUTs.
(Figure: second-level LUTs feeding the first-level tree)

Algorithm MultiLevel
{
    while there is more than one unconnected LUT do
    {
        if there are no free inputs among the remaining unconnected LUTs
        {
            create an empty LUT and add it to the end of the LUT list
        }
        connect the most filled unconnected LUT to the next unconnected LUT with a free input
    }
}

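Algorithm MultiLevel can be simulated on LUT input counts alone. This Python sketch assumes the second-level LUTs are given as their numbers of used inputs in bin-list order, and interprets "the next unconnected LUT with a free input" as the first other unconnected LUT, in list order, that still has a free input; `multi_level` is a hypothetical name. Connecting a LUT consumes one input of the target LUT.

```python
def multi_level(loads, k):
    """Connect second-level LUTs into one tree (sketch of Algorithm MultiLevel).

    loads: used inputs of each second-level LUT, in bin-list order.
    Returns (total LUTs, used inputs of the root LUT, edges), where each
    edge (src, dst) means the output of LUT src drives an input of LUT dst.
    """
    used = list(loads)
    unconnected = list(range(len(used)))
    edges = []
    while len(unconnected) > 1:
        if not any(used[i] < k for i in unconnected):
            used.append(0)                       # new empty LUT at end of list
            unconnected.append(len(used) - 1)
        src = max(unconnected, key=lambda i: used[i])   # most filled
        dst = next(i for i in unconnected if i != src and used[i] < k)
        used[dst] += 1                           # src output takes one input of dst
        unconnected.remove(src)
        edges.append((src, dst))
    return len(used), used[unconnected[0]], edges


print(multi_level([5, 4, 2], 5))  # (3, 3, [(0, 1), (1, 2)])
```

For the bins 5, 4 and 2 from the two-level example (K = 5), the sketch ends with 3 LUTs; a plain two-level implementation would add a separate first-level LUT, for 4 in total.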
Chortle-crf: Multi-Level Decomposition

(Figure: pseudo-code for multi-level decomposition)

Chortle-crf: Reconvergent Paths

 Exploiting reconvergent paths (RP)


o Reconvergent paths: two paths in the graph that start at the same node and terminate at the same node
o If two boxes share the same input, there is a pair of RPs
o If the number of distinct inputs to these two boxes is ≤ K
• ⇒ they can be packed into one bin.

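The effect of shared inputs on packing can be illustrated by letting a bin's load be its number of distinct input signals, so a shared input is counted once. This is a simplified first-fit variant written for illustration; it is not Chortle-crf's Exhaustive Reconvergent Search or Maximum Share Decreasing procedure, and the signal names and function name are hypothetical.

```python
def pack_with_sharing(boxes, k):
    """First-fit packing where a bin's load is its number of DISTINCT inputs.

    boxes: one set of input signal names per fanin LUT.
    Returns the list of bins, each a set of input signals.
    """
    bins = []
    for box in sorted(boxes, key=len, reverse=True):
        for b in bins:
            if len(b | box) <= k:    # shared inputs are counted once
                b |= box
                break
        else:
            bins.append(set(box))
    return bins


boxes = [{"x1", "x2", "x3"}, {"x3", "x4"}, {"x4", "x5"}]
bins = pack_with_sharing(boxes, 5)
print(len(bins), sorted(bins[0]))  # 1 ['x1', 'x2', 'x3', 'x4', 'x5']
```

Counting sizes alone (3, 2, 2) with K = 5 would need two bins; because the boxes share x3 and x4, all three fit into one.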
Chortle-crf: Reconvergent Paths
(Figure: Local Reconvergent Paths; left, fanin LUTs f, g, h, i, j with shared inputs; right, the reconvergent paths realized within one LUT)

Chortle-crf: Reconvergent Paths

(Figure: Exploiting Reconvergent Paths; with forced merge, 2 LUTs; without forced merge, 3 LUTs)

Chortle-crf: Reconvergent Paths

(Figure: pseudo-code for Exhaustive Reconvergent Search)

Chortle-crf: Reconvergent Paths

(Figure: pseudo-code for Maximum Share Decreasing)

Chortle-crf: Logic Replication
 Improvement
o Logic replication at fan-out nodes reduces the number of LUTs
• The previous version of Chortle partitioned the circuit into fanout-free trees.

(Figure: Replication of Logic at a Fanout Node; without replicated logic, 3 LUTs; with replicated logic, 2 LUTs)

Chortle-crf: Logic Replication

(Figure: Replication of the Root LUT; without and with replication)

Chortle-crf: Logic Replication

(Figure: pseudo-code for Root-LUT Replication)

Chortle-crf

 -c using only the constructive bin packing approach

 -cr using the reconvergent optimization

 -cf using the replication optimization

 -crf using both reconvergent and replication

Chortle-crf
o Basic Xilinx tech mapping follows Chortle
• with modification to handle registers.

Chortle-d

 Chortle-d:
o Considers delay (depth) as the objective
o FlowMap solves the depth-minimization problem optimally (in polynomial time).

The End

 Questions ?

 Thank you for your attention

Bin Packing
First fit decreasing algorithm

Bins: A B C D E F
With the first fit decreasing algorithm we sort the blocks into descending order first:
6 5 4 3 3 3 2 2 1
Now we use the first fit algorithm: each block, in this order, is placed in the first bin that has room for it.
We have packed them into 5 bins.
