
MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Programmable Interconnects


Hyoukjun Kwon (hyoukjun@gatech.edu), Ananda Samajdar (anandsamajdar@gatech.edu), and Tushar Krishna (tushar@ece.gatech.edu)
Georgia Institute of Technology, Atlanta, Georgia

ABSTRACT

The microarchitecture of DNN inference engines is an active research topic in the computer architecture community because DNN accelerators are needed to maximize performance/watt for mass deployment across phones, cars, and so on. This has led to a flurry of ASIC DNN accelerator proposals in academia over recent years. Industry is also investing heavily, with every major company developing its own neural network accelerator, which has resulted in a myriad of dataflow patterns. We claim that dataflows essentially lead to different kinds of data movement within an accelerator. Thus, to support arbitrary dataflows in accelerators, we propose to make the interconnects programmable. We achieve this by augmenting all compute elements (multipliers and adders) and on-chip buffers with tiny switches, which can be configured at compile time or runtime. Our design, MAERI, connects these switches via a new configurable and non-blocking tree topology to provide not only programmability but also high throughput.

ACM Reference Format:
Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2018. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Programmable Interconnects. In Proceedings of SysML '18. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

The microarchitecture of deep neural network (DNN) inference engines is an active research area in the computer architecture community. GPUs provide efficient training platforms with their massive parallelism, multi-core CPUs provide platforms for algorithmic exploration, and FPGAs provide power-efficient and configurable platforms for algorithmic exploration and acceleration. However, for mass deployment across various domains (phones, cars, etc.), DNN accelerators are needed to maximize performance/watt. This has led to a flurry of ASIC proposals for DNN accelerators over recent years [3–7, 11]. Industry is also heavily investing, with every major company developing its own spatial DNN accelerator [1, 8, 9]. One of the practical and open challenges for DNN accelerator designs is programmability, because DNNs can be partitioned in myriad ways (within [4] and across layers [2]) to exploit data reuse, which leads to different dataflow patterns within accelerators.

The DNN Data Flow Graph (DFG) is fundamentally a multi-dimensional multiply-accumulate calculation, as Figure 1 demonstrates. Each dataflow is essentially some transformation of this loop [10, 12], with different optimization potential depending on the neural network layer. Unfortunately, most DNN accelerators cannot exploit the potential of each dataflow because they internally support only a fixed dataflow pattern, the result of a careful co-design of the PEs and the network-on-chip (NoC) (e.g., the TPU [9]). Because of this inflexibility, mapping different dataflows onto an accelerator can lead to compute resource underutilization. Therefore, each new optimization has required a new accelerator design for that optimization [2, 7, 11, 13], which makes the hardening of accelerator designs challenging and uneconomical.

Our insight in this work is that different dataflows essentially lead to different data movement patterns within accelerators. Thus, to support arbitrary dataflows in spatial accelerators, we propose MAERI (Multiply-Accumulate Engine with Reconfigurable Interconnect)¹, a DNN accelerator with programmable interconnects. MAERI augments all compute elements (multipliers and adders) and on-chip buffers with tiny switches, which can be programmed/configured at compile- or run-time to support myriad dataflow scenarios. We connect these switches via a new configurable and non-blocking tree topology.

¹ A version of this paper will appear in Proc. of the 23rd ACM Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Mar 2018.

2 MAERI BUILDING BLOCKS

Figure 1 shows MAERI's building blocks:

• Prefetch buffer (PB): serves as a cache of DRAM, which stores input activations, weights, intermediate partial sums that could not be fully accumulated, and output activations.
• Activation Units: Lookup tables are used to implement different activation functions (such as ReLU).
• Distribution Tree: A fat-tree is used to distribute activations and weights from the PB to the multipliers.
• Simple Switch (SS): Each node in the distribution tree is a simple 2:1 switch to unicast/multicast inputs/weights.
• Augmented Reduction Tree (ART): A fat-tree augmented with forwarding links is used to reduce partial sums and send outputs to the activation units. An ART with N leaves can support 1 to N/2 simultaneous reductions and is provably non-blocking.
• Adder Switch (AS): Each node in the ART is an adder augmented with a switch to allow data forwarding to peers or to parents.

Figure 1: An overview of MAERI. MAERI is designed to efficiently handle CONV, LSTM, POOL and FC layers. It can also handle cross-layer and sparse mappings. We implement this flexibility using configurable distribution and reduction trees within the fabric. (The figure shows a VGG-16 layer topology, the accelerator controller, prefetch buffer, activation units, the distribution tree of simple 1:2 switches, the multiplier switches with local buffers and forwarding links, and the Augmented Reduction Tree (ART) of adder switches, alongside the full convolution loop nest below and the dataflow optimizations applied to it: loop ordering, loop blocking/tiling, and loop unrolling.)

    for (ki = 0; ki < K; ki++) {              // Filter; Loop_K
      for (ci = 0; ci < C; ci++) {            // Input channel; Loop_C
        for (yi = 0; yi < Y; yi++) {          // Image row; Loop_Y
          for (xi = 0; xi < X; xi++) {        // Image column; Loop_X
            for (ri = 0; ri < R; ri++) {      // Weight filter row; Loop_R
              for (si = 0; si < S; si++) {    // Weight filter column; Loop_S
                O[ki][xi][yi] += W[ki][ci][ri][si] * I[ci][yi][xi]; }}}}}}
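The Introduction's claim that each dataflow is essentially a transformation of this loop can be made concrete with a small example. The sketch below is purely illustrative and is not MAERI's specific mapping: it reorders the loop nest and tiles the filter (K) and image-column (X) dimensions with hypothetical tile sizes TK and TX. Which loops are tiled, reordered, or unrolled onto parallel hardware is exactly what distinguishes one dataflow from another. The indexing mirrors the simplified form used in Figure 1.

    /* Illustrative tiled/reordered variant of the Figure 1 loop nest (C99).
     * TK and TX are hypothetical tile sizes, not parameters from the paper. */
    #define TK 4   /* tile of the filter (K) dimension       */
    #define TX 8   /* tile of the image-column (X) dimension */

    void conv_tiled(int K, int C, int Y, int X, int R, int S,
                    float O[K][X][Y], float W[K][C][R][S], float I[C][Y][X])
    {
        for (int kk = 0; kk < K; kk += TK)              /* tiled Loop_K */
          for (int xx = 0; xx < X; xx += TX)            /* tiled Loop_X */
            for (int ci = 0; ci < C; ci++)              /* Loop_C       */
              for (int yi = 0; yi < Y; yi++)            /* Loop_Y       */
                for (int ki = kk; ki < kk + TK && ki < K; ki++)
                  for (int xi = xx; xi < xx + TX && xi < X; xi++)
                    for (int ri = 0; ri < R; ri++)      /* Loop_R       */
                      for (int si = 0; si < S; si++)    /* Loop_S       */
                        O[ki][xi][yi] += W[ki][ci][ri][si] * I[ci][yi][xi];
    }

A fixed-function accelerator effectively bakes one such ordering into its PE array and NoC; MAERI's programmable switches are meant to let the ordering change per layer without a new hardware design.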

Figure 2: Programming the switches to map a CONV layer in MAERI. W, X, and O represent weights, input activations, and output activations. (The walk-through maps a 2×2 filter (W00, W01, W10, W11) over a 4×4 input activation (X00–X33), computing Oij = W00 × Xij + W01 × Xi(j+1) + W10 × X(i+1)j + W11 × X(i+1)(j+1). Panels: (0) the target CONV layer; (1) an instruction stream from the prefetch buffer programs the switches to create virtual neurons; (2) virtual neuron construction (VN 0–2, four multipliers each); (3) weight and input activation distribution to each VN; (4) output activation calculation as the input windows slide, with outputs such as O00/O10/O20 and O01/O11/O21 reduced and returned to the prefetch buffer.)

• Multiplier Switch (MS): Each multiplier is augmented with a switch to allow data forwarding to neighboring multipliers and data reception from the PB or neighboring MSes.

The distribution and reduction trees provide full non-blocking bandwidth to the compute blocks, but can be pruned to reduce that bandwidth if required to save area and power.

3 MAPPING DATAFLOWS OVER MAERI

The entire accelerator is controlled by a programmable controller which manages the reconfiguration of all three sets of switches (MS, AS, and SS) for mapping the target dataflow. This is done by creating Virtual Neurons (VNs) over the multipliers and adders. The flexibility of the interconnects allows us to create VNs of any size, which provides the ability to map arbitrary dataflows simultaneously. Figure 2 shows a walk-through example of mapping a convolutional layer; recurrent, max-pool, fully-connected, sparse, and other layers can be mapped similarly. (A small illustrative sketch of the VN sizing and arithmetic follows Section 4.)

4 EVALUATIONS AND CONCLUSIONS

MAERI is a spatial accelerator for mapping the arbitrary dataflows that arise in DNNs, whether from the network topology or the chosen mapping, by placing tiny programmable switches next to each on-chip compute and memory engine. It provides 130–283% better utilization across multiple dataflow mappings over baselines with rigid NoC fabrics. MAERI's interconnects are 10–100× smaller than conventional NoCs.
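To make the virtual-neuron construction of Section 3 and the Figure 2 walk-through concrete, the following is a minimal functional sketch, in software only; it is not MAERI's hardware or the controller's instruction format, and names such as N_MULT and vn_size are illustrative assumptions. It sizes VNs from the filter dimensions, checks the ART bound from Section 2 (an ART with N leaves supports at most N/2 simultaneous reductions), and reproduces the per-VN arithmetic Oij = W00 × Xij + W01 × Xi(j+1) + W10 × X(i+1)j + W11 × X(i+1)(j+1) for the 2×2-filter example, with VN 0–2 producing one output row each as in the figure.

    #include <stdio.h>

    /* Functional sketch of the Figure 2 mapping (illustrative, not MAERI RTL):
     * a 2x2 filter slid over a 4x4 input, mapped onto virtual neurons (VNs). */
    #define R 2          /* filter rows                         */
    #define S 2          /* filter columns                      */
    #define N_MULT 12    /* multiplier switches in this example */
    #define IN_DIM 4     /* input activation rows/columns       */

    int main(void)
    {
        /* Each VN groups one filter's worth of multipliers, so its
         * partial sums reduce to a single output inside the ART. */
        int vn_size = R * S;                     /* 4 multipliers per VN */
        int num_vns = N_MULT / vn_size;          /* 12 / 4 = 3 (VN 0-2)  */

        /* ART property from Section 2: N leaves support at most N/2
         * simultaneous reductions, so cap the VN count accordingly. */
        if (num_vns > N_MULT / 2)
            num_vns = N_MULT / 2;

        float W[R][S] = { {1, 2}, {3, 4} };      /* stand-in weight values */
        float X[IN_DIM][IN_DIM];                 /* stand-in input values  */
        for (int i = 0; i < IN_DIM; i++)
            for (int j = 0; j < IN_DIM; j++)
                X[i][j] = (float)(i * IN_DIM + j);

        /* Each step, VN v computes one output:
         * Oij = W00*Xij + W01*Xi(j+1) + W10*X(i+1)j + W11*X(i+1)(j+1),
         * with VN 0-2 assigned to output rows 0-2 as in Figure 2
         * (O00/O10/O20 first, then O01/O11/O21, and so on). */
        for (int j = 0; j + 1 < IN_DIM; j++) {   /* slide the window (column) */
            for (int v = 0; v < num_vns; v++) {  /* one output row per VN     */
                int i = v;
                float o = 0.0f;
                for (int r = 0; r < R; r++)      /* the VN's R*S multipliers  */
                    for (int s = 0; s < S; s++)
                        o += W[r][s] * X[i + r][j + s];
                printf("VN %d -> O%d%d = %.1f\n", v, i, j, o);
            }
        }
        return 0;
    }

In the actual design, the weights and input windows in this sketch would be delivered by the distribution tree and the accumulation would be performed by the adder switches of the ART; the sketch reproduces only the arithmetic and the sizing.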

REFERENCES
[1] Filipp Akopyan, Jun Sawada, Andrew Cassidy, Rodrigo Alvarez-Icaza, John
Arthur, Paul Merolla, Nabil Imam, Yutaka Nakamura, Pallab Datta, Gi-Joon Nam,
Brian Taba, Michael Beakes, Bernard Brezzo, Jente B. Kuang, Rajit Manohar,
William P. Risk, Bryan Jackson, and Dharmendra S. Modha, Truenorth: Design
and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip,
TCADICS 34 (2015), no. 10, 1537–1557.
[2] Manoj Alwani, Han Chen, Michael Ferdman, and Peter Milder, Fused-layer CNN
accelerators, 49th Annual IEEE/ACM International Symposium on Microarchitec-
ture (MICRO), 2016.
[3] Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen,
and Olivier Temam, Diannao: A small-footprint high-throughput accelerator for
ubiquitous machine-learning, ASPLOS, 2014, pp. 269–284.
[4] Yu-Hsin Chen, Joel Emer, and Vivienne Sze, Eyeriss: A spatial architecture for
energy-efficient dataflow for convolutional neural networks, ISCA, 2016.
[5] Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tian-
shi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam, Dadiannao: A machine-
learning supercomputer, MICRO, 2014, pp. 609–622.
[6] Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaob-
ing Feng, Yunji Chen, and Olivier Temam, Shidiannao: Shifting vision processing
closer to the sensor, ISCA, 2015.
[7] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz,
and William J Dally, Eie: efficient inference engine on compressed deep neural
network, ISCA, 2016.
[8] Intel, Intel’s new self-learning chip promises to accelerate
artificial intelligence, https://newsroom.intel.com/editorials/
intels-new-self-learning-chip-promises-accelerate-artificial-intelligence/.
[9] Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal,
Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al.,
In-datacenter performance analysis of a tensor processing unit, Proceedings of the
44th Annual International Symposium on Computer Architecture, ACM, 2017,
pp. 1–12.
[10] Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li,
Flexflow: A flexible dataflow accelerator architecture for convolutional neural net-
works, HPCA, 2017.
[11] Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rang-
harajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W Keckler, and
William J Dally, Scnn: An accelerator for compressed-sparse convolutional neural
networks, Proceedings of the 44th Annual International Symposium on Computer
Architecture, ACM, 2017, pp. 27–40.
[12] Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong,
Optimizing fpga-based accelerator design for deep convolutional neural networks,
FPGA, 2015, pp. 161–170.
[13] Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo,
Tianshi Chen, and Yunji Chen, Cambricon-x: An accelerator for sparse neural
networks, MICRO, 2016.
