Abstract
Digital signal processors (DSPs) with very long instruction word (VLIW) data-path architectures are increasingly being deployed in embedded devices for video and other multimedia processing applications. To reduce the power consumption and design cost of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to reduce the number of read/write ports associated with register files. This presents challenges for compilers attempting to generate efficient code. In this paper, we present an instruction scheduling method and phase ordering framework for such an architecture based on the well-known PALF scheme. The PALF scheme first performs bank partitioning, followed by register allocation and then instruction scheduling. Our contribution includes the insertion of a pseudo instruction scheduler that performs bank assignment analysis before PALF assignment. We also enhance the PALF scheme by utilizing the program graph annotated with the cycle information generated by our pseudo scheduler. Finally, a ping-pong-aware scheduling policy is used in the scheduling phases to address the limited temporal connectivity among register banks in DSP processors. Experiments were performed using a compiler based on the Open64 compiler infrastructure and an instruction set simulator for Parallel Architecture Core (PAC) DSP processors. Preliminary experiments with the EEMBC and MiBench benchmarks show that a compiler based on our proposed scheme for handling the hardware constraints of VLIW scheduling on distributed register files outperforms one based on the PALF scheme.
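The abstract does not spell out the ping-pong constraint or the scheduler's heuristics. The following is therefore only a toy sketch of why bank assignment and instruction scheduling interact, under loudly stated assumptions:

- one cluster has two functional units, given the hypothetical labels `"LS"` and `"AU"`;
- there are two register banks, 0 and 1;
- every instruction has unit latency;
- as an assumed ping-pong rule, no two instructions issued in the same cycle may access the same bank.

The scheduler is a plain critical-path list scheduler. It is not the paper's algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instr:
    name: str
    unit: str   # hypothetical functional-unit label, e.g. "LS" (load/store) or "AU" (arithmetic)
    bank: int   # register bank (0 or 1) accessed, assumed fixed by an earlier bank assignment
    deps: List[str] = field(default_factory=list)  # names of instructions whose results it uses


def heights(instrs: List[Instr]) -> Dict[str, int]:
    """Critical-path height of each instruction (longest chain to a sink), assuming unit latency."""
    succs: Dict[str, List[str]] = {i.name: [] for i in instrs}
    for i in instrs:
        for d in i.deps:
            succs[d].append(i.name)
    memo: Dict[str, int] = {}

    def h(n: str) -> int:
        if n not in memo:
            memo[n] = 1 + max((h(s) for s in succs[n]), default=0)
        return memo[n]

    return {i.name: h(i.name) for i in instrs}


def schedule(instrs: List[Instr]) -> Dict[str, int]:
    """Cycle-by-cycle list scheduling of a dependence DAG under a toy ping-pong rule:
    in any cycle each unit issues at most one instruction, and no two
    instructions issued in the same cycle may access the same register bank."""
    prio = heights(instrs)
    issued: Dict[str, int] = {}  # instruction name -> issue cycle
    pending = list(instrs)
    cycle = 0
    while pending:
        # Ready = every producer was issued in an earlier cycle (unit latency).
        ready = [i for i in pending
                 if all(d in issued and issued[d] < cycle for d in i.deps)]
        ready.sort(key=lambda i: -prio[i.name])  # stable sort: ties keep program order
        used_units, used_banks = set(), set()
        for i in ready:
            if i.unit not in used_units and i.bank not in used_banks:
                issued[i.name] = cycle
                used_units.add(i.unit)
                used_banks.add(i.bank)
        pending = [i for i in pending if i.name not in issued]
        cycle += 1
    return issued


if __name__ == "__main__":
    prog = [Instr("ld_x", "LS", 0), Instr("mul", "AU", 0), Instr("ld_y", "LS", 1)]
    print(schedule(prog))  # {'ld_x': 0, 'mul': 1, 'ld_y': 1}
```

With `mul` in bank 0, it collides with `ld_x` in cycle 0 and slips to cycle 1. Assigning `mul` to bank 1 instead lets both issue in cycle 0. This coupling between bank choice and issue cycle is the kind of information a pseudo-scheduling pass could expose to bank partitioning. The paper's own pseudo scheduler and ping-pong-aware policy are not reproduced here.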
References
Capitanio A, Dutt N, Nicolau A (1992) Partitioned register files for VLIWs: a preliminary analysis of tradeoffs. In: Proceedings of the 25th annual international symposium on microarchitecture, December, pp 292–300
Texas Instruments (2000) TMS320C64x technical overview. Texas Instruments, Feb
CEVA (2004) CEVA-X1620 datasheet. CEVA
Chang D, Baron M (2004) Taiwan’s roadmap to leadership in design. Microprocessor Report, In-Stat/MDR, December
Leupers R (2000) Instruction scheduling for clustered VLIW DSPs. In: Proceedings of international conference on parallel architecture and compilation techniques, October, pp 291–300
Qian Y, Carr S, Sweany PH (2002) Optimizing loop performance for clustered VLIW architectures. In: International conference on parallel architectures and compilation techniques, September
Lin Y-C, You Y-P, Lee J-K (2007) PALF: compiler supports for irregular register files in clustered VLIW DSP processors. In: Concurrency and computation: practice and experience
Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49(2):291–307
Pothen A, Simon HD, Liou K-P (1990) Partitioning sparse matrices with eigenvectors of graphs. SIAM J Matrix Anal Appl, July
Karypis G, Kumar V (1999) Multilevel k-way hypergraph partitioning. In: Proceedings of the 36th annual ACM/IEEE design automation conference
Wu C-J, Chen S-Y, Lee J-K (2006) Copy propagation optimizations for VLIW DSP processors with distributed register files. Lang Compilers Parallel Comput. doi:10.1007/978-3-540-72521-3_19
Lu C-H, Lin Y-C, You Y-P, Lee J-K (2008) LC-GRFA: global register file assignment with local consciousness for VLIW DSP processors with non-uniform register files. Concurr Comput Pract Exp. doi:10.1002/cpe.v21:1
Wu C-J, Lu C-H, Lee J-K (2009) Expression rematerialization for VLIW DSP processors with distributed register files. In: International workshop on compilers for parallel computing, January
Lin TJ, Chang CC, Lee CC, Jen CW (2003) An efficient VLIW DSP architecture for baseband processing. In: Proceedings of the 21st international conference on computer design
Lin T-J, Chao C-M, Liu C-H, Hsiao P-C, Chen S-K, Lin L-C, Liu C-W, Jen C-W (2005) Computer architecture: a unified processor architecture for RISC & VLIW DSP. In: Proceedings of the 15th ACM Great Lakes symposium on VLSI, April
Guthaus MR, Ringenberg JS, Ernst D, Austin TM, Mudge T, Brown RB (2001) MiBench: a free, commercially representative embedded benchmark suite. In: IEEE 4th annual workshop on workload characterization, December
EDN Embedded Microprocessor Benchmark Consortium (2011) http://www.eembc.org
Cutcutache I, Wong W-F (2008) Fast, frequency-based, integrated register allocation and instruction scheduling. In: Software: practice and experience
Ivanov DS (2010) Register allocation with instruction scheduling for VLIW-architectures. Program Comput Softw. doi:10.1134/S0361768810060058
Kim D-H, Lee H-J (2009) Fine-grain register allocation and instruction scheduling in a reference flow. Comput J. doi:10.1093/comjnl/bxp056
Ellis JR (1986) Bulldog: a compiler for VLIW architectures. MIT Press, Cambridge
Lowney PG, Freudenberger SM, Karzes TJ, Lichtenstein WD, Nix RP, O’Donnell JS, Ruttenberg JC (1993) The multiflow trace scheduling compiler. J Supercomput 7:51–142
Capitanio A, Dutt N, Nicolau A (1993) Design considerations for limited connectivity VLIW architectures. Technical Report, Department of Information and Computer Science, University of California, Irvine
Mercaldi M, Swanson S, Petersen A, Putnam A, Schwerin A, Oskin M, Eggers SJ (2006) Instruction scheduling for a tiled dataflow architecture. In: Proceedings of the 12th international conference on architectural support for programming languages and operating systems
Swanson S, Michelson K, Schwerin A, Oskin M (2003) WaveScalar. In: Proceedings of the international symposium on microarchitecture
Desoli G (1998) Instruction assignment for clustered VLIW DSP compilers: a new approach. Technical Report, Hewlett-Packard Laboratories
Ozer E, Banerjia S, Conte TM (1998) Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures. In: Proceedings of the 31st annual international symposium on microarchitecture, November
Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science. doi:10.1126/science.220.4598.671
Chaitin GJ, Auslander MA, Chandra AK, Cocke J, Hopkins ME, Markstein PW (1981) Register allocation via coloring. Comput Lang 6:47–57
Chaitin GJ (1982) Register allocation and spilling via graph coloring. In: Proceedings of the ACM SIGPLAN 1982 symposium on compiler construction, pp 201–207
Briggs P, Cooper KD, Torczon L (1992) Rematerialization. In: Conference on programming language design and implementation
About this article
Cite this article
Wu, CJ., Lin, YT. & Lee, JK. Instruction scheduling methods and phase ordering framework for VLIW DSP processors with distributed register files. J Supercomput 61, 1024–1047 (2012). https://doi.org/10.1007/s11227-011-0671-8
DOI: https://doi.org/10.1007/s11227-011-0671-8