MEL G642-Compre Solution - 2 2016-17

The document contains a question paper for the subject VLSI Architecture with 4 questions. Question 1 has multiple parts asking about instruction coding format, pipelined processor design with and without forwarding, hazards in a code sequence, and effect of adding load/store instructions. Question 2 asks about branch penalty, name dependence, instruction level parallelism. Question 3 contrasts DSP and GPP, discusses multiply accumulate operation in DSPs, distinctive DSP addressing modes and functional blocks. Question 4 describes a CISC instruction and asks for its flowchart, exception states, and control word generation logic.

Uploaded by

Gaurav Patil

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

(PILANI, K.K.BIRLA GOA & HYDERABAD CAMPUSES)


II SEMESTER 2016-17
MEL G642 VLSI ARCHITECTURE 13th May 2017
(CLOSED BOOK) MM: 40 Duration 3 Hours
_____________________________________________________________________________________

Q1. Consider a 32-bit RISC processor (with a register file containing 32 registers) that has only the
following three instructions in its instruction set: (i) ADD Rd, Rs1, Rs2 (ii) SUB Rd, Rs1, Rs2 (iii) BEQ Di,
Rs1, Rs2. Here Rs1 and Rs2 are source registers and Rd is the destination register. The ADD and SUB
instructions perform addition and subtraction operations. The BEQ instruction is a conditional branch
instruction which causes branching when the contents of its two source registers are equal. The 8-bit
branching distance Di (relative to the current value of the program counter) is provided by a bit-field in the
binary code of the BEQ instruction.

(a) Suggest an instruction coding format for the above instruction set and also binary codes for the
three instructions in view of ease of implementation. (1.5)
(b) Design the architectural schematic diagrams of a 4-stage (FETCH, DECODE-OPERAND READ,
EXECUTE, WRITE-BACK) pipelined implementation of this instruction set (i) without internal
forwarding of operands and (ii) with internal forwarding of operands, clearly depicting the different
fields of the pipeline registers (and what they contain), the different functional blocks used in the
pipeline stages, and the control circuit. (3+3)
(c) Following code is to be executed on this processor:
ADD R10, R6, R4
SUB R10, R10, R5
BEQ 40, R10, R5
(i) Enumerate all the hazards and their types in the above code. (2)
(ii) Give a clock cycle-by-clock cycle account of execution of this code on your 4-stage
pipelined implementations of the processor without and with internal forwarding of
operands (1.5+1.5)
(d) Now LOAD and STORE instructions are added to the instruction set, and data memory access
(for reading or writing) is organized through the addition of two pipeline stages, MEM1 and
MEM2, between the EXECUTE and WRITE-BACK stages. How will the execution time of the code in
part (c) be affected when there is no internal forwarding of operands?
Give a cycle-by-cycle description of execution of the code. (1.5)

Q2. What is branch penalty? How can it be reduced / minimized? Give example. What is name
dependence or anti-dependence? Give an example. How is it tackled to gain execution efficiency? What
is Instruction Level Parallelism (ILP) ? How is it exploited in computer architecture? (6)
Q3.

(a) Contrast the design objectives of DSP processors and General Purpose Processors (GPPs). (2)
(b) What is the single most important DSP operation that influences the micro-architecture of DSP
processors? How is it accelerated in DSP processors? (2)
(c) Name and describe two distinctive data addressing modes that are supported only by DSP
Processors and not by GPPs and why ? (2)
(d) Name and briefly describe (functionally) the distinctive functional blocks of a DSP data path and
DSP address path that are typically not found in GPPs. Also draw the overall architectural
diagram of a DSP processor. (3)
(e) What is fractional data type? Why is it used in DSP processors? How do you convert a 16-bit
integer multiplier to a 16-bit fractional multiplier? (2)
(f) What special variants of commonly used arithmetic operations are supported by a DSP
processor? How are they implemented by the main functional blocks of the data path ? (2)

Q4. A CISC processor features an instruction CMX Rx Ry. This instruction compares the magnitudes
(absolute values) of integer data (assume 2’s complement representation) stored in registers Rx and
Ry. The instruction exchanges the stored data values in the registers if the magnitude of data stored
in register Ry happens to be smaller than the magnitude of data stored in register Rx.

(a) Write level II flowcharts for this instruction using the data path diagram given at the end of the
question paper (4)
(b) Assuming that no external interrupts of any kind occur during the execution of the above
instruction (including program or data memory access related interrupts), name the flowchart
states that can potentially cause exception processing to initiate immediately upon their
completion and why ? (1)
(c) Draw the schematic diagram of the next control word address generation logic of a CISC
processor which can handle deferred external interrupts and immediate external interrupts.
(2)

Execution Unit Block Diagram for Q4. Part (a)


ALU: Arithmetic and Logic Unit DO : Data Out buffer
IRF: Instruction Register for Fetch K: Constant generator
IRE: Instruction Register for Execution DI: Data Input register
T1, T2 : Temporary registers PC : Program Counter
R0 - Rn : Programmer’s registers AO : Address Out buffer

Rules of Operation for the Execution Unit:


1. A transfer from source to bus to destination takes one state time
2. A source can drive up to three destination loads
3. Inputs to the ALU are from the A internal bus and either K (values 0, +1, -1) or the B
internal bus
4. When the ALU is a destination, T1 is automatically loaded from the ALU output
5. A transfer to AO activates the on-chip external bus controller
6. ALU supports addition and subtraction (B input – A input) operations on 2’s complement
binary integers, and can set condition codes reflecting condition of the ALU result :
V(arithmetic overflow), N(ALU result negative), Z(ALU result zero) when desired, or
leave the condition codes unaltered if so desired.
7. All memory addresses are represented as positive integers in 2’s complement binary
representation
Q3.

(a) Contrast the design objectives of DSP processors and General Purpose Processors (GPPs). (2)

Solution:
GPP
• GPP designers aim for ultimate performance and ultimate flexibility, together with a compiler-friendly
instruction set.
• The instruction set must be general because the application is unknown and the programmer's
behavior is unknown.
DSP
• DSP designers think of application and cost first; the challenge is to be efficient.
• Flexibility should be sufficient rather than ultimate.
• The goal of the DSP designer is to reach the highest performance per unit of silicon area, the highest
performance per unit of power consumption, and the highest performance per unit of design cost.

(b) What is the single most important DSP operation that influences the micro-architecture of DSP
processors? How is it accelerated in DSP processors? (2)
Solution:
The most important DSP operation is the Multiply and Accumulate (MAC) operation. The
architectural enhancements that support the MAC operation are:
1. A MAC instruction supported by a dedicated MAC unit that performs the multiply and accumulate operation
2. Multiple data memories
3. Direct memory access capability for the MAC unit
4. Auto-increment addressing mode
5. Modulo/circular addressing mode
6. Hardware loop control
7. Guard bits and saturation arithmetic in the MAC unit to handle iterative loops and avoid exceptions
(which would affect the real-time constraints)
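The role of guard bits and saturation in item 7 can be illustrated with a small behavioural model. This is only a sketch under assumed parameters (16-bit operands, a 32-bit result, and 8 guard bits); the function name `mac_with_guard` and the bit widths are illustrative and are not taken from the lecture material.

```python
def mac_with_guard(samples, coeffs, guard_bits=8):
    """Behavioural model of a MAC loop: 16-bit x 16-bit products are summed
    in a (32 + guard_bits)-bit accumulator, then saturated to 32 bits."""
    acc_bits = 32 + guard_bits
    acc_min = -(1 << (acc_bits - 1))
    acc_max = (1 << (acc_bits - 1)) - 1
    acc = 0
    for x, h in zip(samples, coeffs):
        acc += x * h  # one multiply-accumulate step
        # Guard bits absorb intermediate growth; clamp only if they overflow too.
        acc = max(acc_min, min(acc_max, acc))
    # Final saturation to the 32-bit result range instead of raising an exception.
    out_min, out_max = -(1 << 31), (1 << 31) - 1
    return max(out_min, min(out_max, acc))
```

With these assumptions, a sum that temporarily exceeds the 32-bit range but ends inside it is returned exactly, while a sum that ends outside the range is clamped to the nearest 32-bit limit.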

(c) Name and describe two distinctive data addressing modes that are supported only by DSP
Processors and not by GPPs and why ? (2)
Solution:
1. Modulo/circular addressing mode
Most DSP operations are convolution-based (FIR filter, IIR filter,
autocorrelation, cross-correlation, etc.). Example (FIR filter): $y(n) = \sum_{k=0}^{N-1} h_k \, x(n-k)$
Since these algorithms work on a sliding window of samples, physically shifting the samples for every output sample
computation is expensive in terms of time. Modulo addressing avoids this overhead: the address pointer
wraps around within a fixed-size buffer, so the stored samples never move. In DSP processors, modulo addressing is implemented in
hardware and is present in the address generation unit (AGU). [Refer Lecture-DSP_Introduction for more details.]
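A software model may help show what the AGU does in hardware. The sketch below is an assumption-laden illustration, not the lecture's implementation: the class `CircularBuffer` and the function `fir_step` are hypothetical names, and the modulo operation stands in for the wrap-around that a DSP performs at no cycle cost.

```python
class CircularBuffer:
    """Fixed-size sample buffer addressed modulo its length."""

    def __init__(self, length):
        self.data = [0] * length
        self.length = length
        self.head = 0  # index of the newest sample

    def push(self, sample):
        # Advance the pointer with wrap-around; no stored sample is moved.
        self.head = (self.head + 1) % self.length
        self.data[self.head] = sample

    def tap(self, k):
        # Return x(n-k): the sample that arrived k steps before the newest one.
        return self.data[(self.head - k) % self.length]


def fir_step(buf, coeffs, sample):
    """Insert one new input sample and compute one FIR output sample."""
    buf.push(sample)
    return sum(h * buf.tap(k) for k, h in enumerate(coeffs))
```

Feeding a unit impulse through `fir_step` with coefficients `[1, 2, 3]` returns the coefficients in order, which is a quick check that the wrap-around indexing is right.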
2. Bit reversed addressing mode.
The DFT is one of the most widely used operations in DSP. The DFT can be computed using the FFT, which
requires fewer computational steps than direct evaluation.
• The Discrete Fourier Transform (DFT) allows for spectral analysis in the frequency domain.

– It is computed as

$y(k) = \sum_{n=0}^{N-1} x(n) \, W_N^{nk}$

o for k = 0, 1, … , N-1, where $W_N = e^{-j 2\pi / N}$

o x is the input sequence in the time domain

o y is an output sequence in the frequency domain

• The Inverse Discrete Fourier Transform (IDFT) is computed as

$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} y(k) \, W_N^{-nk}$, for n = 0, 1, … , N-1

• The Fast Fourier Transform (FFT) provides an efficient method for computing the DFT.

The FFT can be computed by the decimation-in-time (DIT) or decimation-in-frequency (DIF) method.

In the DIT FFT, the input samples have to be reordered into bit-reversed order before they are supplied, whereas in the DIF FFT the input
samples are supplied in natural order but the output samples have to be reordered. To speed up this
reordering, DSP processors support a hardwired bit-reversed addressing mode.
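The index sequence that bit-reversed addressing produces can be sketched in software as follows. The function names `bit_reverse` and `bit_reversed_order` are illustrative, and the sketch assumes the transform length is a power of two, as in a radix-2 FFT.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # move the current LSB into the result
        index >>= 1
    return result


def bit_reversed_order(n):
    """Return the bit-reversed index sequence for a length-n transform.
    Assumes n is a power of two."""
    num_bits = n.bit_length() - 1
    return [bit_reverse(i, num_bits) for i in range(n)]
```

For an 8-point transform this gives the order 0, 4, 2, 6, 1, 5, 3, 7. A DSP address generation unit produces the same sequence directly, without spending instructions on the bit manipulation shown here.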

(d) Name and briefly describe (functionally) the distinctive functional blocks of a DSP data path and
DSP address path that are typically not found in GPPs. Also draw the overall architectural
diagram of a DSP processor. (3)

Solution:
The DSP data path has
1. Register File
a. A large register file, generally with more than 64 registers. Some
special DSP processors have 512 registers
2. ALU
a. Performs special operations with and without saturation arithmetic: absolute value,
select larger value, select smaller value, difference of two absolute values,
absolute value of the difference, etc.
b. Has guard bits (generally one guard bit) and saturation arithmetic units.
3. MAC
a. Performs iterative computation; has guard bits and saturation arithmetic units
b. Performs multiplication (integer, fractional, signed, unsigned, double and single
precision) and the MAC operation
c. Also performs scaling operations
4. Other accelerated instruction execution units

DSP address path

1. Multiple address generating units (AGUs)


a. Special registers and multiple address pointers
b. Address calculation units

[Figure: data path and address path components]

[Figure: overall architecture of a DSP]

(e) What is fractional data type? Why is it used in DSP processors? How do you convert a 16-bit
integer multiplier to a 16-bit fractional multiplier? (2)

Solution:
Fractional: an n-bit value between -1 and $1 - 2^{-n+1}$, i.e. in the range $[-1,\ 1 - 2^{-n+1}]$
Why important?
• For computationally intensive applications (like DSP), the fractional data type favors faster
execution without taking exceptions.
• Easy to implement in data path hardware
• Short physical critical path
• Low hardware (memory) cost and low power, provided the precision is acceptable

Steps:

1. Supply the 16-bit fractional inputs to the integer multiplier
2. Shift the result one bit to the left (discarding the additional sign bit); the newly introduced LSB
is filled with zero.
3. One special case: before shifting the result to the left, check its two most significant bits. If they
are the same, no overflow will occur; proceed to step 2. If they are not the same,
saturation arithmetic has to be performed. This is the function of the saturation arithmetic
unit.

[Refer Lecture-ASIP_DSP_Implementation slide from 16-24 for more details]
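The three steps above can be sketched for the common Q15 format (16-bit words, 1 sign bit, 15 fraction bits). This is a behavioural model under assumptions: the name `q15_multiply` is hypothetical, and keeping the upper 16 bits by truncation is one possible choice; a real data path may round instead.

```python
Q15_MAX = 32767  # largest 16-bit fractional value, 1 - 2**-15


def q15_multiply(a, b):
    """Multiply two Q15 values (16-bit signed integers) using an integer
    multiply, a one-bit left shift, and saturation for the special case."""
    product = a * b  # step 1: integer product with two sign bits
    sign_bit = (product >> 31) & 1
    next_bit = (product >> 30) & 1
    if sign_bit != next_bit:
        # Step 3: the two MSBs differ only for (-1) * (-1) = +1,
        # which is not representable, so saturate to the largest value.
        return Q15_MAX
    shifted = product << 1  # step 2: drop the extra sign bit, LSB becomes 0
    return shifted >> 16    # keep the upper 16 bits (truncation assumed)
```

For example, 16384 represents 0.5, and `q15_multiply(16384, 16384)` returns 8192, which represents 0.25.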

(f) What special variants of commonly used arithmetic operations are supported by a DSP
processor? How are they implemented by the main functional blocks of the data path ? (2)

Solution:

Operations generally supported by DSP


1. Addition and subtraction with and without saturation arithmetic
2. absolute value finding
3. Select larger value
4. Select smaller value
5. Difference of two absolute values
6. Absolute of the difference etc.
7. Performs multiplication (integer, fractional, signed, unsigned, double and single precision)
8. MAC operation
9. Double precision arithmetic
10. Data format conversion
11. scaling operation
12. Guarding and saturation arithmetic.

[Implementation examples are given in the lecture slide]
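As a rough illustration of a few of the variants listed above, the sketch below models saturating addition, subtraction, absolute value, and absolute difference on 16-bit words. The function names and the 16-bit default are assumptions made for this example and are not taken from the lecture slides.

```python
def saturate(value, bits=16):
    """Clamp value to the signed range of the given word length."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def add_sat(a, b, bits=16):
    """Addition that clamps at the range limits instead of wrapping around."""
    return saturate(a + b, bits)


def sub_sat(a, b, bits=16):
    """Subtraction that clamps at the range limits instead of wrapping around."""
    return saturate(a - b, bits)


def abs_sat(a, bits=16):
    """Absolute value; the most negative input saturates to the largest positive value."""
    return saturate(abs(a), bits)


def abs_diff(a, b, bits=16):
    """Absolute value of the difference, with saturation."""
    return saturate(abs(a - b), bits)
```

In hardware these would typically share one adder plus a saturation stage on its output; here the shared `saturate` helper plays that role.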
