research-article

Yield and Speedup Improvements in Extensible Processors by Allocating Extra Cycles to Some Custom Instructions

Published: 28 January 2016

Abstract

In this article, we investigate techniques for mitigating the impact of process variations on the custom functional unit (CFU) of extensible processors. The techniques are using extra cycles for the CFU and extending the clock period of the extensible processor. The first technique provides an extra clock cycle to those custom instructions (CIs) whose timing yields are smaller than one. For this purpose, we use a lookup table (LUT) for each fabricated processor. A post-fabrication analysis determines which CIs need an extra clock cycle. Consequently, CI timing violations are prevented, and all manufactured extensible processors work with a predefined clock cycle time. To study the effect of the objective function (used during the CI selection phase) on the efficacy of the suggested architectural technique, we investigate three different objective functions. In the second technique, the clock period is extended to guarantee a design yield of one. Our results demonstrate that combining both techniques helps increase the speedup achieved by the extensible processor. To assess the efficacy of the proposed methods, several benchmarks from different application domains are used. The results show that the suggested techniques provide considerable improvements in the speedups of the extensible processors compared with approaches that do not consider the impact of process variations.
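The per-chip LUT idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the delay values, and the rule that a CI whose measured delay exceeds even the one-extra-cycle budget is rejected are all assumptions made for illustration; the abstract only states that a post-fabrication analysis decides which CIs receive an extra cycle.

```python
def build_extra_cycle_lut(ci_delays_ns, clock_period_ns, nominal_cycles):
    """Build a hypothetical per-chip LUT: CI name -> 0 or 1 extra cycle.

    ci_delays_ns   : measured post-fabrication delay of each CI on this chip
    clock_period_ns: the predefined clock period shared by all chips
    nominal_cycles : number of cycles each CI was designed to take
    """
    lut = {}
    for name, delay in ci_delays_ns.items():
        budget = nominal_cycles[name] * clock_period_ns
        if delay <= budget:
            lut[name] = 0  # meets timing, no extra cycle needed
        elif delay <= budget + clock_period_ns:
            lut[name] = 1  # timing violation avoided with one extra cycle
        else:
            # Assumption: one extra cycle is the only remedy modeled here.
            raise ValueError(f"{name} exceeds the one-extra-cycle budget")
    return lut


def effective_latency(name, nominal_cycles, lut):
    """Cycles the CI actually takes on this chip after consulting the LUT."""
    return nominal_cycles[name] + lut[name]


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the article.
    delays = {"ci_a": 1.8, "ci_b": 2.3, "ci_c": 4.1}
    cycles = {"ci_a": 1, "ci_b": 1, "ci_c": 2}
    lut = build_extra_cycle_lut(delays, 2.0, cycles)
    for ci in delays:
        print(ci, lut[ci], effective_latency(ci, cycles, lut))
```

In this sketch every chip keeps the same clock period, and only the CIs that are slow on that particular chip pay a latency penalty, which is the trade-off the abstract contrasts with extending the clock period for the whole processor.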


Published In

ACM Transactions on Design Automation of Electronic Systems  Volume 21, Issue 2
January 2016
422 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2888405
Editor: Naehyuck Chang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 January 2016
Accepted: 01 September 2015
Revised: 01 April 2015
Received: 01 January 2015
Published in TODAES Volume 21, Issue 2

Author Tags

  1. Extensible processor
  2. process variation
  3. speedup
  4. yield

Qualifiers

  • Research-article
  • Research
  • Refereed
