Improved SIMD Architecture For High Performance Video Processors

The document describes a proposed improved SIMD architecture for high performance video processors. The architecture includes two novel features: 1) a parallel memory structure that supports variable block sizes and word lengths, providing flexibility for data access, and 2) a configurable SIMD structure that allows for random register file access and slightly different operations in ALUs. These features help address bottlenecks in traditional SIMD implementations of video codec kernels like H.264/AVC. Simulation results show the new architecture provides a speedup of 2.1-4.6x over conventional SIMD for kernel functions, and is projected to provide an overall speedup of 2.46x for H.264/AVC encoding.

Uploaded by

VICKY PAWAR

Improved SIMD Architecture for High Performance Video Processors


Vicky Pawar
Student, Department of Electronic Engineering,
Shree L.R.Tiwari College of Engineering,
Mumbai, India.
pkpawar26@gmail.com

Shivam Mandal
Student, Department of Electronic Engineering,
Shree L.R.Tiwari College of Engineering,
Mumbai, India.
shivam.mandal000@gmail.com

Prof. Rohini Rathod
Department of Electronic Engineering,
Shree L.R.Tiwari College of Engineering,
Mumbai, India.
rohini.jadhav@slrtce.in

Abstract – SIMD execution is without doubt an efficient way to exploit the data level parallelism in image and video
applications. However, SIMD execution bottlenecks must be tackled in order to achieve high execution efficiency. In
this paper, we first analyze the implementation of two major kernel functions of H.264/AVC, namely SATD and subpel
interpolation, in conventional SIMD architectures to identify the bottlenecks in traditional approaches. Based on the
analysis results, we propose a new SIMD architecture with two novel features: (1) a parallel memory structure with
variable block size and word length support; and (2) a configurable SIMD structure. The proposed parallel memory
structure gives programmers great flexibility to access data in different block sizes and different word
lengths. The configurable SIMD structure allows almost “random” register file access and slightly different
operations in the ALUs inside the SIMD unit. The new features greatly benefit the realization of H.264/AVC kernel
functions. For instance, fractional motion estimation, particularly the half to quarter pixel interpolation, can now be
executed with minimal or no additional memory access. Compared with conventional SIMD systems, the
proposed SIMD architecture achieves a further speedup of 2.1X to 4.6X when implementing H.264/AVC kernel
functions. Based on Amdahl’s law, the overall speedup of the H.264/AVC encoding application is projected to be
2.46X. We expect that significant improvement can also be achieved when applying the proposed architecture to other
image and video processing applications.
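
The Amdahl's law projection mentioned in the abstract can be illustrated with a minimal sketch. The text does not state what fraction of encoding time the accelerated kernels occupy, so `kernel_fraction` below is a hypothetical input, and the function name is chosen for illustration only.

```python
def amdahl_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when `kernel_fraction` of the run time is accelerated
    by `kernel_speedup` and the remainder is unchanged (Amdahl's law)."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical kernel fraction; only the 2.1X and 4.6X kernel speedups
    # come from the abstract.
    assumed_fraction = 0.8
    for kernel_speedup in (2.1, 4.6):
        overall = amdahl_speedup(assumed_fraction, kernel_speedup)
        print(f"kernel {kernel_speedup}X -> overall {overall:.2f}X")
```

The overall figure is always below the kernel speedup unless the kernels account for the entire run time, which is why the projected application speedup is smaller than the best kernel speedup.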

Keywords – Configurable SIMD, Parallel memory structure, SIMD bottlenecks, video codec processor
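
SATD is named in the abstract as one of the two kernels analyzed, but it is not defined in the text. The sketch below assumes the common definition: the sum of absolute values of a 4x4 Hadamard-transformed difference block. It omits any normalization a particular encoder may apply, and the names `H4`, `matmul4` and `satd4x4` are illustrative.

```python
# Assumed definition: SATD = sum of |H * D * H| over a 4x4 block, where D is
# the difference between the current and reference blocks and H is the 4x4
# Hadamard matrix. Encoder-specific normalization is left out.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 integer matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def satd4x4(cur, ref):
    """Sum of absolute transformed differences of two 4x4 pixel blocks."""
    diff = [[cur[r][c] - ref[r][c] for c in range(4)] for r in range(4)]
    # H4 is symmetric, so it serves as both the row and the column transform.
    transformed = matmul4(matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)
```

Every row and column passes through the same additions and subtractions, which is what makes the kernel a natural fit for SIMD lanes. Switching between the row pass and the column pass is the kind of step that typically needs data rearrangement in a vector implementation.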

I. INTRODUCTION

With the extensive use of image and video information in modern computer applications, the development of high
performance image and video processing units has attracted much interest from both academic researchers and VLSI
system designers. Among the image and video processing operations performed in general computer applications, video
coding is the most computation-intensive and is often used as the benchmark to measure the performance of a video
processor. For the rest of this paper, we focus on the realization of the state-of-the-art video coding standard
H.264/AVC [1] and use it as an example to illustrate the merit of the proposed video processor design.

To deal with the extremely high computational complexity of video coding, one common approach is to exploit the
data level parallelism (DLP) in the execution. Unlike application-specific ASIC designs, a general purpose video
processor should provide great flexibility for programmers while exploiting the parallelism in the execution. For
this reason, the Single Instruction Multiple Data (SIMD) architecture is most suitable and is widely adopted. Two
popular examples are Intel’s MMX/SSE1/SSE2/SSE3 [2] and Motorola’s AltiVec [3], where multimedia SIMD instruction
set extensions have been added for efficient realization of video processing applications.

In recent years, many researchers have studied how much performance can be gained by using SIMD instructions in
modern video codecs [4]-[9]. Simulation results using the reference model demonstrate a speedup of 2-12X. A basic
requirement for employing SIMD instructions is that multiple data elements can be fed perfectly into vector
registers so that the same computation operation can be applied to all of them. Although much research effort [6]
[10]-[12] has been made to address the problem, there are often overheads and performance bottlenecks when aligning
the multiple data to feed into vector registers. Extra memory loads and stores, unpacking, packing and shuffling are
often required, which prevents SIMD execution from achieving peak performance. Besides, memory misalignment, strided
memory access, memory latency, random register file access and branch misprediction also prevent the processor from
fetching data in a timely fashion to achieve peak throughput [13]-[15].

To address the aforementioned problems, our team has designed and implemented a new SIMD based video processor with
the architecture shown in Fig. 1. Our video processor is a 5-stage pipeline, multi-threaded, multi-issue, semi
out-of-order superscalar processor. It supports a maximum of 4 threads of execution simultaneously. A maximum of 4
instructions can be

II. SINGLE ERROR CORRECTION (SEC) REED SOLOMON (RS) CODES

Reed Solomon codes are a subclass of non-binary BCH codes constructed with symbols from a Galois Field GF(q), where
q is typically a power of two. For q = 2^m, m-bit symbols are used to construct the code. An RS code has the
following parameters: maximum block length n = q - 1, number of parity check symbols n - k = 2t and minimum distance
dmin = 2t + 1. All those parameters are expressed in terms of q-ary symbols. As a symbol has m bits, the maximum
block length in bits is m(q - 1) and the number of parity check bits is 2mt. When t = 1, the minimum distance is
three and therefore the code can correct single symbol errors. These errors can affect multiple bits as long as all
of them belong to the same symbol. RS codes are commonly expressed as RS(n, k, m). The parity check matrix for an RS
code is constructed as shown in equation (1) where a is a primitive element in GF(q):

III. PAGE STYLE

All paragraphs must be indented. All paragraphs must be justified, i.e. both left-justified and right-justified.

1. Text Font of Entire Document
Type your main text in 10-point Times, single-spaced. Do not use double-spacing. Your paper must be in two column
format with a space of 0.2" between columns. Be sure your text is fully justified, that is, flush left and flush
right. Please do not place any additional blank lines between paragraphs.

2. Title and Author Details
Title must be in 20 pt Bold Times New Roman font, Sentence case. Author name must be in 11 pt Bold Times New Roman
font. Author affiliation must be in 9 pt regular. Email address must be in 9 pt Times New Roman Regular font.
All titles and author’s details must be in single-column format placed in cells of table with borders set to no line
and centered.
Every word in a title must be capitalized except for short minor words such as “a”, “an”, “and”, “as”, “at”, “by”,
“for”, “from”, “if”, “in”, “into”, “on”, “or”, “of”, “the”, “to”, “with”.
Author details must not show any professional title (e.g. Professor), any academic title (e.g. Dr.) or any
membership of any professional organization.
To avoid confusion, the family name must be written as the last part of each author name.
Each affiliation must include, at the very least, the name of the company and the name of the country where the
author is based (e.g. XXX Private Ltd, India).

3. Section Headings
No more than 3 levels of headings should be used. Every word in a heading must be capitalized except for short minor
words as listed in Section III-2.
3.1 Level-1 Heading: A level-1 heading must be in Small Caps, centered and numbered using uppercase Roman numerals.
For example, see heading “I. Introduction” of this document. Headings must be in 12pt bold with small caps font. The
two level-1 headings which must not be numbered are “Acknowledgment” and “References”.
3.2 Level-2 Heading: A level-2 heading must be in bold, left-justified and numbered using an uppercase alphabetic
letter followed by a period. Headings must be in 10pt bold font. For example, see heading “3. Section Headings”
above.
3.3 Level-3 Heading: A level-3 heading must be indented, in bold and numbered. The level-3 heading must end with a
colon. The body of the level-3 section immediately follows the level-3 heading in the same paragraph. Headings must
be in 10pt bold font. For example, this paragraph begins with a level-3 heading.

IV. FIGURES AND TABLES

Figures and tables must be centered in the columns. Large figures and tables may span across both columns. Any table
or figure that takes up more than 1 column width must be positioned either at the top or at the bottom of the page.
Graphics may be full color. Graphics must be drawn using solid colors which contrast well both on screen and on a
black-and-white hardcopy. The caption of the graph must be in 10 pt Times New Roman regular font. Single-line
captions must be centered whereas multi-line captions must be justified. Captions with figure numbers must be placed
after their associated figures.
Please check all figures in your paper both on screen and on a black-and-white hardcopy. When you check your paper
on a black-and-white hardcopy, please ensure that:
a. the colors used in each figure contrast well,
b. the image used in each figure is clear,
c. all text labels in each figure are legible.

1. Figure Captions
Figures must be numbered using numerals. Figure captions must be in 10 pt Times New Roman regular font. Single-line
captions must be centered whereas multi-line captions must be justified. Captions with figure numbers must be placed
after their associated figures, as shown in Fig.1. Authors are advised not to include low
resolution and poor quality images, since it reduces the credibility of the journal.

Fig.1. Process Design Gap (Figure shows drawn Vs printed gap increases as we move towards smaller device size)

2. Table Captions
Tables must be numbered using numbers. Table captions must be centred and in 10 pt Times New Roman Regular font.
Every word in a table caption must be in regular font. Captions with table numbers must be placed before their
associated tables. Tables should not be images. The contents of the table must be in 10 point Times New Roman
regular font.

Table I: FONT SIZES FOR PAPERS
Font   Appearance (in Times New Roman or Times)
Size   Regular                                  Bold            Italic
10     table caption, figure caption,                           reference item (partial)
       reference item
10     author email address, cell in a table    abstract body   abstract heading (also in Bold)
12     level-1 heading (in Small Caps),                         level-2 heading, level-3 heading,
       paragraph                                                author affiliation
11     author name
20     title

V. SOME HELPFUL HINTS

1. Equations
Equations should be numbered consecutively throughout the paper. The equation number is enclosed in parentheses and
placed flush right, as in (1). Your equation should be typed using the Times New Roman font (please no other font).
To create multileveled equations, it may be necessary to treat the equation as a graphic and insert it into the text
after your paper is styled.
If you are using Word, use either the Microsoft Equation Editor or the MathType add-on (http://www.mathtype.com).

f_n(x) \frac{d^n y}{dx^n} + \cdots + f_1(x) \frac{dy}{dx} + f_0(x) y = h(x)    (1)

2. Abbreviations and Acronyms
Define abbreviations and acronyms the first time they are used in the text, even after they have already been
defined in the abstract. Abbreviations such as SI, ac, and dc do not have to be defined. Abbreviations that
incorporate periods should not have spaces: write “V.L.S.I.,” not “V. L. S. I.” Do not use abbreviations in the
title unless they are unavoidable (for example, “International Journal of Scientific Research Engineering
Technology” in the title of this article).

3. Links and Bookmarks
All hypertext links and section bookmarks will be removed from papers during the processing of papers for
publication. If you need to refer to an Internet email address or URL in your paper, you must type out the address
or URL fully in Regular font.

VI. CONCLUSION (12, BOLD, SMALL CAPS)

A conclusion section is not required. Although a conclusion may review the main points of the paper, do not
replicate the abstract as the conclusion. A conclusion might elaborate on the importance of the work or suggest
applications and extensions.

APPENDIX (12, BOLD, SMALL CAPS)

The heading of the Appendix section must not be numbered. Appendixes, if needed, appear before the acknowledgment.

ACKNOWLEDGMENT

The heading of the Acknowledgment section must not be numbered. The preferred spelling of the word “acknowledgment”
in American English is without an “e” after the “g.” Use the singular heading even if you have many
acknowledgments. Avoid expressions such as “One of us (S.B.A.) would like to thank ... .” Instead, write “F. A.
Author thanks ... .” Sponsor and financial support acknowledgments are placed in the unnumbered footnote on the
first page.

REFERENCES (12, BOLD, SMALL CAPS)

The heading of the References section must not be numbered. All reference items must be in 9 pt font. Please use
Regular and Italic styles to distinguish different fields as shown in the References section. Number the reference
items consecutively in square brackets (e.g. [1]).
When referring to a reference item, please simply use the reference number, as in [2]. Do not use “Ref. [3]” or
“Reference [3]” except at the beginning of a sentence, e.g. “Reference [3] shows …”. Multiple references are each
numbered with separate brackets (e.g. [2], [3], [4]–[6]).

[1] David Z. Pan, Bei Yu, and Jhih-Rong Gao, “Design for Manufacturing With Emerging Nanolithography,” IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 32, No. 10, October 2013.
[2] M. Lu, et al., “Novel customized manufacturable DFM
solutions,” Proc. SPIE Photo mask Technology 2012,
vol. 8522, pp. 852223, December 2012.
[3] Sergio Gomez and Francesc Moll. “Lithography aware
regular cell design based on a predictive technology
model.” J. Low Power Electronics, 6(4):1–14, 2010
[4] B. Le Gratiet, F. Sundermann, J. Massin, et al.,
“Improved CD control for 45-40 nm CMOS logic
patterning: anticipation for 32-28 nm”, In proceedings
of SPIE Vol. 7638,76380A (2010)
[5] Shi-Hao Chen, Ke-Cheng Chu, Jiing-Yuan Lin and
Cheng-Hong Tsai “DFM/DFY practices during physical
designs for timing, signal integrity, and power” 2007
IEEE conference.
[6] Wing Chiu Tam and Shawn Blanton “To DFM or Not to
DFM” IEEE Asia Pacific Conference on Circuits and
Systems, 2006.
[7] Raina Rajesh “What is DFM & DFY and Why Should I
Care?” INTERNATIONAL TEST CONFERENCE 2009
[8] Garg Manish, Kumar Aatish “Litho-driven Layouts for
Reducing Performance Variability” IEEE 2005
[9] Daehyun Jang, Naya Ha, Joo-Hyun Park, Seung-Weon
Paek “DFM Optimization of Standard Cells
Considering Random and Systematic Defect”
International SoC Design Conference 2008
[10] Sergio Gomez, Francesc Moll, Antonio Rubio “Design
Guidelines towards Compact Litho-Friendly Regular
cells” SPIE Photomask Technology 2012
[11] "Design for Manufacturability"
http://www.mentor.com/blogs/
[12] “Litho Friendly Design kit, a tool of DFM strategy”,
(http://www.eetimes.com/electrical-
engineers/education-training/tech-
papers/4130133/Litho-Friendly-Design-Kit-A-Tool-of-
DFM-Strategy).
[13] Y. Borodovsky, “Lithography 2009 overview of
opportunities,” in Proc. Semicon West, 2009.
[14] J. A. Torres, “Layout verification in the era of process
uncertainty: Target process variability bands versus
actual process variability bands,” in Proc. SPIE Design
Manufacturability through Design-Process Integration
II, vol. 6925. 2008, pp. 692509-1–692509-8.
[15] A. Carlson and T.-J. Liu, “Negative and iterated spacer
lithography processes for low variability and ultra dense
integration,” in Proc. SPIE Optical Microlithography
XXI, vol. 6924. 2008, pp. 69240B-1–69240B-9.

AUTHOR PROFILE (12, BOLD, SMALL CAPS)


<Author Photo>
Author’s Name (11, Bold)
Author’s Detail (Regular, 9)
