MEL G642-Compre Solution - 2 2016-17
Q1. Assume a 32-bit RISC processor (with a register file containing 32 registers) that has only the
following three instructions in its instruction set: (i) ADD Rd, Rs1, Rs2 (ii) SUB Rd, Rs1, Rs2 (iii) BEQ Di,
Rs1, Rs2. Here Rs1 and Rs2 are source registers and Rd is the destination register. ADD and SUB
instructions perform addition and subtraction operations. Instruction BEQ is a conditional branch
instruction which causes branching when the contents of its two source registers are equal. The 8-bit
branching distance Di (relative to the current value of the program counter) is provided by a bit-field in the
binary code of the BEQ instruction.
(a) Suggest an instruction coding format for the above instruction set and also binary codes for the
three instructions in view of ease of implementation. (1.5)
(b) Design the architectural schematic diagrams of a 4-stage (FETCH, DECODE-OPERAND READ,
EXECUTE, WRITE-BACK) pipelined implementation of this instruction set (i) without internal
forwarding of operands and (ii) with internal forwarding of operands, clearly depicting the different
fields of the pipeline registers (and what they contain), the different functional blocks used in the
pipeline stages, and the control circuit. (3+3)
(c) The following code is to be executed on this processor:
ADD R10, R6, R4
SUB R10, R10, R5
BEQ 40, R10, R5
(i) Enumerate all the hazards and their types in the above code. (2)
(ii) Give a clock cycle-by-clock cycle account of execution of this code on your 4-stage
pipelined implementations of the processor without and with internal forwarding of
operands (1.5+1.5)
(d) Now LOAD and STORE instructions are added to the instruction set, and data memory access
(for reading or writing) is organized through the addition of two pipeline stages MEM1 and
MEM2 between the EXECUTE and WRITE-BACK stages. How will the execution time of the code in
part (c) be affected in the case when there is no internal forwarding of operands?
Give a cycle-by-cycle description of execution of the code. (1.5)
Q2. What is branch penalty? How can it be reduced / minimized? Give an example. What is name
dependence or anti-dependence? Give an example. How is it tackled to gain execution efficiency? What
is Instruction Level Parallelism (ILP)? How is it exploited in computer architecture? (6)
Q3.
(a) Contrast the design objectives of DSP processors and General Purpose Processors (GPPs). (2)
(b) What is the single most important DSP operation that influences the micro-architecture of DSP
processors? How is it accelerated in DSP processors? (2)
(c) Name and describe two distinctive data addressing modes that are supported only by DSP
processors and not by GPPs, and explain why. (2)
(d) Name and briefly describe (functionally) the distinctive functional blocks of a DSP data path and
DSP address path that are typically not found in GPPs. Also draw the overall architectural
diagram of a DSP processor. (3)
(e) What is fractional data type? Why is it used in DSP processors? How do you convert a 16-bit
integer multiplier to a 16-bit fractional multiplier? (2)
(f) What special variants of commonly used arithmetic operations are supported by a DSP
processor? How are they implemented by the main functional blocks of the data path ? (2)
Q4. A CISC processor features an instruction CMX Rx Ry. This instruction compares the magnitudes
(absolute values) of integer data (assume 2’s complement representation) stored in registers Rx and
Ry. The instruction exchanges the stored data values in the registers if the magnitude of data stored
in register Ry happens to be smaller than the magnitude of data stored in register Rx.
(a) Write level II flowcharts for this instruction using the data path diagram given at the end of the
question paper (4)
(b) Assuming that no external interrupts of any kind occur during the execution of the above
instruction (including program or data memory access related interrupts), name the flowchart
states that can potentially cause exception processing to initiate immediately upon their
completion, and explain why. (1)
(c) Draw the schematic diagram of the next control word address generation logic of a CISC
processor which can handle deferred external interrupts and immediate external interrupts.
(2)
(a) Contrast the design objectives of DSP processors and General Purpose Processors (GPPs). (2)
Solution:
GPP
GPP designers aim for ultimate performance and ultimate flexibility, together with a
compiler-friendly instruction set.
The instruction set must be general because both the application and the programmer's
behavior are unknown.
DSP
DSP designers consider the application and cost first; the challenge is to be efficient.
Flexibility should be sufficient rather than ultimate.
The goal of the DSP designer is to reach the highest performance per unit of silicon area, per
unit of power consumption, and per unit of design cost.
(b) What is the single most important DSP operation that influences the micro-architecture of DSP
processors? How is it accelerated in DSP processors? (2)
Solution:
The most important DSP operation is the Multiply and Accumulate (MAC) operation. The
architectural enhancements that support the MAC operation are:
1. A MAC instruction supported by a MAC unit that performs the multiply and accumulate operation
2. Multiple data memories
3. Direct memory access capability for the MAC unit
4. Auto-increment addressing mode
5. Modulo/circular addressing mode
6. Hardware loop control
7. Guard bits and saturation arithmetic in the MAC unit to handle iterative loops and avoid overflow
exceptions (which would disturb the real-time constraints)
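The role of guard bits and saturation in item 7 can be illustrated with a minimal Python sketch. It assumes 16-bit operands, a 40-bit accumulator (a 32-bit product plus 8 guard bits), and saturation to 32 bits when the result leaves the accumulator. These widths and the names `saturate` and `mac_dot` are illustrative assumptions, not taken from a specific processor.

```python
ACC_BITS = 40  # assumed accumulator width: 32-bit product plus 8 guard bits


def saturate(value, bits):
    """Clamp a signed integer to the range of a bits-wide two's complement word."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def mac_dot(coeffs, samples):
    """Accumulate sum(c * x) one product per iteration, as a MAC unit would."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x                    # multiply and accumulate in one step
        acc = saturate(acc, ACC_BITS)   # guard bits absorb intermediate overflow
    return saturate(acc, 32)            # saturate when the result leaves the accumulator
```

With guard bits, a running sum may temporarily exceed the 32-bit range and still produce the correct final value; without them, the loop would have to check for overflow on every iteration.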
(c) Name and describe two distinctive data addressing modes that are supported only by DSP
processors and not by GPPs, and explain why. (2)
Solution:
1. Modulo/circular addressing mode
Most DSP operations are based on convolution (FIR filter, IIR filter,
autocorrelation, cross-correlation, etc.). Example: $y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k)$
Since these are data-shifting algorithms, shifting the stored samples for every output sample
computation is expensive in terms of time. Modulo addressing avoids this overhead: the address
pointer wraps around a fixed-size buffer, so the samples stay in place. In DSP processors, modulo addressing is implemented in
hardware and is present in the AGU. [Refer Lecture-DSP_Introduction for more details.]
2. Bit-reversed addressing mode
The DFT is one of the most widely used operations in DSP. It can be computed using the FFT, which
requires fewer computational steps than direct evaluation.
The Discrete Fourier Transform (DFT) allows for spectral analysis in the frequency domain.
It is computed as $X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}$, for $k = 0, 1, \ldots, N-1$.
The Fast Fourier Transform (FFT) provides an efficient method for computing the DFT.
In the DIT FFT, the input samples have to be supplied in bit-reversed order, whereas in the DIF FFT the input
samples are supplied in natural order but the output samples come out in bit-reversed order and have to be reordered. To speed up this
reordering, DSP processors support a hardwired bit-reversed addressing mode.
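Both addressing modes can be sketched in software to show what the address generation hardware does. The following Python sketch is illustrative only: the names `CircularFIR`, `bit_reverse`, and `bit_reversed_order` are hypothetical, and a real DSP performs the index wrap and the bit reversal in the AGU at no cycle cost.

```python
class CircularFIR:
    """FIR filter whose delay line is addressed modulo its length."""

    def __init__(self, coeffs):
        self.h = list(coeffs)
        self.n = len(self.h)
        self.buf = [0] * self.n   # delay line; samples are never shifted
        self.head = 0             # index of the newest sample

    def step(self, x):
        # Overwrite the oldest sample; the modulo wrap replaces data shifting.
        self.head = (self.head + 1) % self.n
        self.buf[self.head] = x
        acc = 0
        idx = self.head
        for k in range(self.n):
            acc += self.h[k] * self.buf[idx]   # h(k) * x(n - k)
            idx = (idx - 1) % self.n           # step back with wrap-around
        return acc


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(samples):
    """Return samples reordered by bit-reversed index (length must be a power of two)."""
    n = len(samples)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [samples[bit_reverse(i, bits)] for i in range(n)]
```

For an 8-point FFT, `bit_reversed_order(list(range(8)))` gives the order 0, 4, 2, 6, 1, 5, 3, 7, which is the input order expected by a DIT FFT.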
(d) Name and briefly describe (functionally) the distinctive functional blocks of a DSP data path and
DSP address path that are typically not found in GPPs. Also draw the overall architectural
diagram of a DSP processor. (3)
Solution:
DSP data path has
1. Register File
a. Multiple registers are present, generally more than 64. Some
special DSP processors have 512 registers.
2. ALU
a. Performs special operations with and without saturation arithmetic: absolute value,
select larger value, select smaller value, difference of two absolute values,
absolute value of the difference, etc.
b. Has guard bits (generally one guard bit) and saturation arithmetic units.
3. MAC
a. Performs iterative computing; has guard bits and saturation arithmetic units.
b. Performs multiplication (integer, fractional, signed, unsigned, double and single
precision) and the MAC operation.
c. Also performs scaling operations.
4. Other accelerated instruction execution units
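The special ALU operations listed above can be sketched as follows. This is a minimal Python illustration assuming a 16-bit data word; the function names are hypothetical, and `add_wrap` is included only to contrast saturation with ordinary two's complement wrap-around.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def sat16(value):
    """Clamp a result to the 16-bit signed range instead of letting it wrap."""
    return max(INT16_MIN, min(INT16_MAX, value))


def add_sat(a, b):
    """Addition with saturation."""
    return sat16(a + b)


def add_wrap(a, b):
    """Addition without saturation: the result wraps as in two's complement hardware."""
    return ((a + b + 0x8000) & 0xFFFF) - 0x8000


def abs_sat(a):
    """Absolute value; abs(-32768) does not fit in 16 bits, so it saturates."""
    return sat16(abs(a))


def abs_diff(a, b):
    """Absolute value of the difference, saturated to 16 bits."""
    return sat16(abs(a - b))
```

Adding 30000 and 10000 shows the difference: `add_sat` clamps at 32767, while `add_wrap` returns a large negative number, which in a signal path would appear as a severe glitch.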
(e) What is fractional data type? Why is it used in DSP processors? How do you convert a 16-bit
integer multiplier to a 16-bit fractional multiplier? (2)
Solution:
Fractional: an n-bit value between $-1$ and $1 - 2^{-n+1}$, i.e. in the range $[-1,\ 1 - 2^{-n+1}]$
Why Important?
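For computationally intensive applications (like DSP), the fractional data type favors faster
execution because the product of two fractions stays within the fractional range (apart from
the single case $(-1) \times (-1)$) and so does not raise overflow exceptions.
The data path hardware is easy to implement.
The physical critical path is short.
Hardware (memory) cost and power are low, provided the precision is acceptable for the application.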
Steps:
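One common way to obtain a fractional product from an integer multiplier, sketched here as an assumption rather than as the steps of the original solution, is: (1) form the 32-bit integer product, which for two Q15 operands is in Q30 format with two sign bits; (2) shift it left by one bit to drop the redundant sign bit, giving Q31; (3) saturate the single overflow case $(-1) \times (-1)$; (4) keep the upper 16 bits as the Q15 result. The name `q15_mul` is hypothetical, and the sketch truncates rather than rounds.

```python
def q15_mul(a, b):
    """Multiply two Q15 fractions given as integers in [-32768, 32767]."""
    product = a * b              # integer product: Q30 with two sign bits
    product <<= 1                # drop the redundant sign bit: Q30 -> Q31
    if product > 0x7FFFFFFF:     # only (-1) * (-1) can overflow
        product = 0x7FFFFFFF     # saturate to the largest positive value
    return product >> 16         # keep the upper 16 bits (truncation)
```

For example, 0.5 is 16384 in Q15, and `q15_mul(16384, 16384)` returns 8192, which is 0.25 in Q15.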
(f) What special variants of commonly used arithmetic operations are supported by a DSP
processor? How are they implemented by the main functional blocks of the data path ? (2)
Solution: