research-article

Yield and Speedup Improvements in Extensible Processors by Allocating Extra Cycles to Some Custom Instructions

Authors:

Ali Afzali-Kusha,

Massoud Pedram

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 21, Issue 2

Article No.: 28, Pages 1 - 25

https://doi.org/10.1145/2830566

Published: 28 January 2016

Abstract

In this article, we investigate the application of two techniques for mitigating the impact of process variations on the custom functional unit (CFU) of extensible processors: using extra cycles for the CFU and extending the clock period of the extensible processor. The former technique provides an extra clock cycle to those custom instructions (CIs) whose timing yields are less than one. For this purpose, we use a lookup table (LUT) for each fabricated processor, and a post-fabrication analysis determines which CIs need an extra clock cycle. As a result, CI timing violations are prevented, and all manufactured extensible processors work with a predefined clock cycle time. To study how the objective function used during the CI selection phase affects the efficacy of the suggested architectural technique, we investigate three different objective functions. In the second technique, the clock period is extended to guarantee a design yield of one. Our results demonstrate that combining both techniques helps increase the speedup achieved by the extensible processor. To assess the efficacy of the proposed methods, we use several benchmarks from different application domains. The results reveal that the suggested techniques provide considerable improvements in the speedups of extensible processors compared to approaches that do not consider the impact of process variations.
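The extra-cycle idea described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Gaussian delay model for each CI, assumes that one extra cycle is always enough for a CI that misses the clock, and uses hypothetical names (`CustomInstruction`, `timing_yield`, `build_lut`, `extended_clock`, `speedup`) and made-up numbers. Timing yield is estimated by Monte Carlo sampling. The per-chip LUT records 1 or 2 cycles per CI from that chip's measured post-fabrication delays. For comparison, `extended_clock` shows the clock-stretching alternative.

```python
import random
from dataclasses import dataclass


@dataclass
class CustomInstruction:
    """Hypothetical CI model; every field is an illustrative assumption."""
    name: str
    nominal_delay_ns: float  # nominal CFU critical-path delay
    sigma_ns: float          # std. dev. of the delay under process variation
    exec_count: int          # dynamic executions, e.g., from profiling
    sw_cycles: int           # cycles the replaced base-ISA code would take


def timing_yield(ci: CustomInstruction, clock_ns: float,
                 trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(delay <= clock), assuming a Gaussian delay."""
    rng = random.Random(seed)
    ok = sum(rng.gauss(ci.nominal_delay_ns, ci.sigma_ns) <= clock_ns
             for _ in range(trials))
    return ok / trials


def build_lut(measured_ns: dict[str, float], clock_ns: float) -> dict[str, int]:
    """Per-chip LUT: 1 cycle if the measured CI delay fits, else 2 (one extra cycle).

    Assumes a single extra cycle is always enough; otherwise the chip is rejected.
    """
    lut = {}
    for name, delay in measured_ns.items():
        if delay <= clock_ns:
            lut[name] = 1
        elif delay <= 2 * clock_ns:
            lut[name] = 2
        else:
            raise ValueError(f"{name}: {delay} ns exceeds two clock cycles")
    return lut


def extended_clock(measured_ns: dict[str, float], clock_ns: float) -> float:
    """Alternative technique: stretch the clock so every CI fits in one cycle."""
    return max(clock_ns, max(measured_ns.values()))


def speedup(base_cycles: int, cis: list[CustomInstruction],
            lut: dict[str, int]) -> float:
    """Cycle-count speedup over the unextended processor at the same clock."""
    saved = sum(ci.exec_count * (ci.sw_cycles - lut[ci.name]) for ci in cis)
    if saved >= base_cycles:
        raise ValueError("saved cycles cannot reach the baseline cycle count")
    return base_cycles / (base_cycles - saved)


if __name__ == "__main__":
    cis = [CustomInstruction("ci_mac", 4.0, 0.3, 1000, 6),
           CustomInstruction("ci_sad", 4.8, 0.3, 500, 9)]
    clock = 5.0
    for ci in cis:
        print(ci.name, "yield:", timing_yield(ci, clock))
    chip = {"ci_mac": 4.1, "ci_sad": 5.3}  # hypothetical measured delays (ns)
    lut = build_lut(chip, clock)            # {'ci_mac': 1, 'ci_sad': 2}
    print(lut, speedup(20_000, cis, lut))
    print("extended clock (ns):", extended_clock(chip, clock))
```

In this sketch the LUT lets each chip keep the predefined clock and pay the extra cycle only on the slow CIs. Clock extension instead slows every instruction on the chip, so comparing the two techniques fairly would require execution time (cycles × clock period) rather than the cycle-only speedup computed here.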




Published In

ACM Transactions on Design Automation of Electronic Systems Volume 21, Issue 2

January 2016

422 pages

ISSN:1084-4309

EISSN:1557-7309

DOI:10.1145/2888405

Editor:
Naehyuck Chang
Korea Advanced Institute of Science and Technology, Korea


Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 28 January 2016

Accepted: 01 September 2015

Revised: 01 April 2015

Received: 01 January 2015

Published in TODAES Volume 21, Issue 2

Qualifiers

Research-article
Research
Refereed


Article Metrics

Total citations: 0
Total downloads: 90
Downloads (last 12 months): 1
Downloads (last 6 weeks): 0

Reflects downloads up to 23 Feb 2025.
