Dspa Word File
ARCHITECTURES (R18A0427)
Lecture Notes B. TECH
UNIT-1
1.1 What is DSP?
DSP is the technique of performing mathematical operations on signals in the digital domain.
Since real-time signals are analog in nature, the analog signal must first be converted to digital form,
processed in the digital domain, and then converted back to the analog domain. Thus an ADC is
required at the input side and a DAC at the output end. A typical DSP system is
shown in figure 1.1.
A computer or a processor is used for digital signal processing. The anti-aliasing filter is an LPF
that passes only signal components with frequencies less than or equal to half the sampling frequency, in order
to avoid the aliasing effect. Similarly, at the other end, a reconstruction filter is used to reconstruct the
analog signal from the staircase output of the DAC (Figure 1.2).
1.4 The Sampling Process
The ADC process involves sampling the signal and then quantizing each sample to a digital value. In
order to avoid the aliasing effect, the signal has to be sampled at a rate at least equal to the Nyquist rate:
fs ≥ 2fm
where fs is the sampling frequency and fm is the maximum frequency component in the message
signal. If the signal is sampled at a rate less than the Nyquist rate, the higher
frequency components of the signal cannot be reconstructed properly. The plots of the reconstructed
outputs for various conditions are shown in figure 1.4.
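The effect of sampling below the Nyquist rate can be illustrated with a small numerical sketch. The function names and the 8 kHz example values below are illustrative assumptions and are not taken from these notes.

```python
def satisfies_nyquist(fm, fs):
    """True if the sampling frequency fs is at least twice the maximum frequency fm."""
    return fs >= 2 * fm


def alias_frequency(f_signal, fs):
    """Apparent frequency (Hz) of a sinusoid at f_signal after sampling at fs."""
    f = f_signal % fs          # sampling cannot distinguish f from f + k*fs
    return min(f, fs - f)      # fold the result into the range 0 .. fs/2


# A 3 kHz tone sampled at 8 kHz is preserved; a 5 kHz tone is not.
print(satisfies_nyquist(3000, 8000), alias_frequency(3000, 8000))
print(satisfies_nyquist(5000, 8000), alias_frequency(5000, 8000))
```

In this sketch the 5 kHz tone violates the Nyquist condition and shows up at 3 kHz after sampling, which is why the anti-aliasing filter must remove it before the ADC.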
1.5 Discrete Time Sequences
A sequence that repeats itself after every period N is called a periodic sequence.
A periodic sequence x(n) with period N satisfies x(n) = x(n+N), n = ..., -1, 0, 1, 2, ...
Frequency response gives the frequency domain equivalent of a discrete time sequence. It is denoted
as X(e^{jθ}) = Σ_{n=-∞}^{∞} x(n) e^{-jnθ}
The frequency response of a discrete sequence involves both magnitude response and phase response.
From the above expression it is clear that we can use the DFT to find the frequency response of a
discrete sequence.
A system that satisfies the superposition theorem is called a linear system, and a system that
has the same input-output relation at all times is called a time-invariant system. Systems that satisfy
both properties are called LTI systems.
An LTI system is characterized by its impulse response (unit sample response) in the time domain and
by its system function in the frequency domain.
1.7.1 Convolution
Convolution is the operation that relates the input and output of an LTI system to its unit sample
response. For the input x(n) and the impulse response h(n), the output y(n) of the system
is given as
y(n) = x(n) * h(n) = Σ_{k=-∞}^{∞} h(k) x(n-k)
where x(n) is the input of the system, h(n) is the impulse response of the system, and y(n) is the
output of the system.
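As a minimal sketch of the convolution sum for finite-length sequences, the loop below evaluates y(n) = Σ h(k) x(n-k) directly. The function name and the sample sequences are illustrative assumptions.

```python
def convolve(x, h):
    """Linear convolution of two finite sequences: y(n) = sum over k of h(k) * x(n - k)."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):      # x(n - k) is taken as zero outside its range
                y[n] += h[k] * x[n - k]
    return y


print(convolve([1, 2, 3], [1, 1]))       # [1, 3, 5, 3]
```

Each output sample needs one multiply and one accumulate per term, which is the reason the MAC operation is central to DSP architectures.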
1.7.2 Z Transformation
Z Transformations are used to find the frequency response of the system. The Z Transform of
a discrete sequence x(n) is given by X(Z) = Σ_{n=-∞}^{∞} x(n) Z^{-n}
1.8 Digital Filters
Filters are used to remove the unwanted components in a sequence. They are characterized by
the impulse response h(n). The general difference equation for an Nth order filter is given by
y(n) = Σ_{k=1}^{N} a_k y(n-k) + Σ_{k=0}^{N} b_k x(n-k)
A typical digital filter structure is shown in figure 1.7.
The values of the filter coefficients vary with the type of the filter. Design of a digital filter
involves determining the filter coefficients. Based on the length of the impulse response, digital filters
are classified into two categories, viz. Finite Impulse Response (FIR) filters and Infinite Impulse
Response (IIR) filters.
1.8.1 FIR Filters
FIR filters have impulse responses of finite length. In FIR filters the present output depends
only on the past and present values of the input sequence and not on previous outputs.
Thus they are non-recursive and hence inherently stable. FIR filters can be designed to have a linear
phase response, so they are well suited to applications that require linear phase.
The difference equation of an FIR filter is represented as
y(n) = Σ_{k=0}^{N} b_k x(n-k)
The major drawback of FIR filters is that they require a larger number of filter coefficients than IIR
filters to realize a desired response. Thus the computation time required is also greater.
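The non-recursive nature of an FIR filter can be seen in a short sketch: each output is a weighted sum of present and past inputs only. The function name and the two-tap averaging coefficients are illustrative assumptions.

```python
def fir_filter(b, x):
    """Direct-form FIR filter: y(n) = sum over k of b[k] * x(n - k), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:               # only present and past inputs are used
                acc += bk * x[n - k]
        y.append(acc)
    return y


# Two-tap moving average applied to a short ramp.
print(fir_filter([0.5, 0.5], [2, 4, 6]))   # [1.0, 3.0, 5.0]
```

Since no previous output is fed back, the impulse response is simply the coefficient list and ends after len(b) samples.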
Stability of IIR filters depends on the number and the values of the filter coefficients. The major
advantage of IIR filters over FIR filters is that they require fewer coefficients for the
same desired response, and thus less computation time.
Design procedure of an FIR filter involves the determination of the filter coefficients bk.
Direct IIR filter design methods are based on least squares fit to a desired frequency response. These
methods allow arbitrary frequency response specifications.
1.9.1 Decimation
Decimation is the process of dropping samples without violating the sampling theorem. The
factor by which the signal is decimated is called the decimation factor and is denoted by M. It is
given by
M = (input sampling rate) / (output sampling rate)
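A minimal sketch of decimation by an integer factor M is given below: every M-th sample is kept, and the largest usable M is limited by the requirement that the reduced sampling rate still be at least twice the maximum signal frequency. The function names and example rates are illustrative assumptions, and the band-limiting filter that normally precedes decimation is omitted.

```python
def decimate(x, M):
    """Keep every M-th sample of x (the other M - 1 samples are dropped)."""
    return x[::M]


def max_decimation_factor(fs, fm):
    """Largest integer M for which fs / M is still at least 2 * fm."""
    return fs // (2 * fm)


print(decimate(list(range(10)), 3))          # [0, 3, 6, 9]
print(max_decimation_factor(48000, 4000))    # 6
```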
Problems:
1. Obtain the transfer function of the IIR filter whose difference equation is given by
y(n) = 0.9 y(n-1) + 0.1 x(n)
Solution:
y(n) = 0.9 y(n-1) + 0.1 x(n)
Taking the Z transform of both sides,
Y(Z) = 0.9 Z^{-1} Y(Z) + 0.1 X(Z)
Y(Z) [1 - 0.9 Z^{-1}] = 0.1 X(Z)
The transfer function of the system is given by the expression
H(Z) = Y(Z)/X(Z)
= 0.1 / [1 - 0.9 Z^{-1}]
Realization of the IIR filter with the above difference equation is shown in the figure.
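The difference equation of this problem can also be run directly as a recursion, which gives a quick check on the transfer function. The sketch below is an illustration only; the function name and default arguments are assumptions. The impulse response should be 0.1(0.9)^n, and the step response should settle at H(1) = 0.1/(1 - 0.9) = 1.

```python
def iir_first_order(x, a1=0.9, b0=0.1):
    """Run y(n) = a1 * y(n-1) + b0 * x(n) with zero initial condition."""
    y = []
    prev = 0.0
    for xn in x:
        prev = a1 * prev + b0 * xn      # the previous output is fed back
        y.append(prev)
    return y


impulse_response = iir_first_order([1, 0, 0, 0])
step_response = iir_first_order([1] * 200)
print(impulse_response)                 # approximately 0.1, 0.09, 0.081, 0.0729
print(step_response[-1])                # approaches the DC gain of 1
```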
UNIT-2
Architectures for Programmable Digital Signal Processing Devices
2.1 Basic Architectural Features
A programmable DSP device should provide instructions similar to a conventional
microprocessor. The instruction set of a typical DSP device should include the following,
a. Arithmetic operations such as ADD, SUBTRACT, MULTIPLY etc
b. Logical operations such as AND, OR, NOT, XOR etc
c. Multiply and Accumulate (MAC) operation
d. Signal scaling operation
In addition to the above provisions, the architecture should also include,
a. On chip registers to store intermediate results
b. On chip memories to store signal samples (RAM)
c. On chip memories to store filter coefficients (ROM)
2.2 DSP Computational Building Blocks
Each computational block of the DSP should be optimized for functionality and speed. At
the same time, the design should be sufficiently general so that it can be easily integrated with other
blocks to implement overall DSP systems.
2.2.1 Multipliers
The advent of single chip multipliers paved the way for implementing DSP functions on a
VLSI chip. Parallel multipliers have nowadays replaced the traditional shift and add multipliers. Parallel
multipliers take a single processor cycle to fetch and execute the instruction and to store the result.
They are also called array multipliers. The key features to be considered for a multiplier are:
a. Accuracy
b. Dynamic range
c. Speed
The number of bits used to represent the operands decides the accuracy and the dynamic range
of the multiplier, whereas the speed is decided by the architecture employed. If the multipliers are
implemented in hardware, the speed of execution will be very high but the circuit complexity will
also increase considerably. Thus there is a tradeoff between the speed of execution and the
circuit complexity, and the choice of the architecture normally depends on the application.
2.2.4 Speed
The conventional shift and add technique of multiplication requires n cycles to perform the
multiplication of two n bit numbers, whereas in parallel multipliers the time required is the
longest path delay in the combinational circuit used. As DSP applications generally require very high
speed, it is desirable to have multipliers operating at the highest possible speed by using a parallel
implementation.
2.2.6 Shifters
Shifters are used to scale operands or results either down or up. The following
scenarios show the necessity of a shifter:
a. While performing the addition of N numbers each n bits long, the sum can grow up to n + log2 N
bits long. If the accumulator is n bits long, an overflow error will occur. This can be overcome
by using a shifter to scale down the operands by log2 N bits.
b. Similarly, while calculating the product of two n bit numbers, the product can grow up to 2n bits
long. Generally the lower n bits are neglected and the sign bit is shifted to save the sign of the
product.
c. Finally, in the case of addition of two floating-point numbers, one of the operands has to be shifted
appropriately to make the exponents of the two numbers equal.
From the above cases it is clear that a shifter is required in the architecture of a DSP.
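The bit growth in case (a) can be checked with a short sketch. The helper names are illustrative assumptions; unsigned operands are used for the worst-case check to keep the arithmetic simple.

```python
def growth_bits(count):
    """Extra bits needed when `count` numbers are summed: ceil(log2(count))."""
    return (count - 1).bit_length()


def accumulator_bits(n_bits, count):
    """Accumulator width that holds the sum of `count` numbers of n_bits each."""
    return n_bits + growth_bits(count)


n, N = 16, 64
width = accumulator_bits(n, N)
worst_case_sum = N * (2 ** n - 1)        # all operands at their maximum value
print(width)                             # 22
print(worst_case_sum < 2 ** width)       # True: the sum fits in 22 bits
```

Scaling each operand down by growth_bits(N) bits before accumulation keeps the sum within n bits, at the cost of the discarded low-order bits.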
In conventional microprocessors, normal shift registers are used for the shift operation. As this
requires one clock cycle for each shift, it is not desirable for DSP applications, which generally
involve many shifts. In other words, since speed is the crucial issue for DSP applications, several shifts
have to be accomplished in a single execution cycle. This can be done using a barrel shifter,
which connects the input lines representing a word to a group of output lines with the required shift
determined by its control inputs. For an input of length n, log2 n control lines are required, and an
additional control line is required to indicate the direction of the shift.
The block diagram of a typical barrel shifter is as shown in figure 2.3.
Figure 2.4 depicts the implementation of a 4 bit shift right barrel shifter. Shift to right by 0, 1, 2 or 3
bit positions can be controlled by setting the control inputs appropriately.
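The behaviour of a shift-right barrel shifter can be modelled in software as log2 n multiplexer stages, where stage i shifts by 2^i positions when control bit i is set. This staged model is an assumption made for illustration; the circuit of figure 2.4 may be organised differently, but the input/output behaviour is the same.

```python
def barrel_shift_right(word, shift, width=4):
    """Logical right shift of a `width`-bit word by `shift` positions, zero filled."""
    if not 0 <= shift < width:
        raise ValueError("shift must be in the range 0 .. width - 1")
    word &= (1 << width) - 1             # keep only `width` input bits
    stage = 0
    while (1 << stage) < width:          # one stage per control line
        if (shift >> stage) & 1:         # control bit `stage` selects a shift by 2**stage
            word >>= (1 << stage)
        stage += 1
    return word


for s in range(4):
    print(s, format(barrel_shift_right(0b1011, s), "04b"))
```

In hardware all stages are combinational, so any shift count completes in one cycle, unlike a shift register that needs one clock per bit position.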
Although addition and multiplication are two different operations, they can be performed in parallel.
While the multiplier is computing a product, the accumulator can accumulate the product of the
previous multiplication. Thus if N products are to be accumulated, N-1 multiplications can overlap
with N-1 additions. During the very first multiplication the accumulator is idle, and during the last
accumulation the multiplier is idle. Thus N+1 clock cycles are required to compute the sum of N
products.
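The N+1 cycle count can be reproduced with a small cycle-by-cycle model of the overlap between the multiplier and the accumulator. The function and variable names are illustrative assumptions; the model ignores operand fetch.

```python
def pipelined_mac(a, b):
    """Return (sum of products, cycles used) for a two-stage multiply/accumulate pipeline."""
    acc = 0
    product_reg = None                   # product computed in the previous cycle
    cycles = 0
    for i in range(len(a) + 1):
        if product_reg is not None:
            acc += product_reg           # accumulator stage: add the previous product
        # multiplier stage: compute the next product, if any operands remain
        product_reg = a[i] * b[i] if i < len(a) else None
        cycles += 1
    return acc, cycles


print(pipelined_mac([1, 2, 3], [4, 5, 6]))   # (32, 4)
```

For N = 256 the model gives 257 cycles, so with a 100 nsec MAC execution time the total is 25.7 usec.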
Shifters
Shifters can be provided at the input of the MAC to normalize the data and at the output to
denormalize it.
Guard bits
As the normalization process does not yield an accurate result, it is not desirable for some
applications. In such cases an alternative is to provide additional bits, called guard bits, in
the accumulator so that no overflow error occurs. Here the add/subtract unit also has to be
modified appropriately to manage the additional bits of the accumulator.
Saturation Logic
Overflow/underflow will occur if the result goes beyond the most positive number or below
the most negative number the accumulator can handle. The overflow/underflow error can be
resolved by loading the accumulator with the most positive number it can handle at the time of
overflow and the most negative number it can handle at the time of underflow. This method is
called saturation logic. A schematic diagram of saturation logic is shown in figure 2.7. In
saturation logic, as soon as an overflow or underflow condition is detected, the accumulator is
loaded with the most positive or most negative number, overriding the result computed by the MAC
unit.
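Saturation logic can be sketched as a clamp on the accumulator value. The 16-bit two's complement width and the function name are illustrative assumptions.

```python
def saturate(value, bits=16):
    """Clamp `value` to the range of a signed two's complement number of `bits` bits."""
    most_positive = (1 << (bits - 1)) - 1    # 32767 for 16 bits
    most_negative = -(1 << (bits - 1))       # -32768 for 16 bits
    if value > most_positive:
        return most_positive                 # overflow: load the most positive number
    if value < most_negative:
        return most_negative                 # underflow: load the most negative number
    return value


print(saturate(30000 + 10000))    # 32767 instead of a wrapped negative value
print(saturate(-30000 - 10000))   # -32768
print(saturate(1234))             # 1234, unchanged
```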
Status Flags
ALU includes circuitry to generate status flags after arithmetic and logic operations. These flags
include sign, zero, carry and overflow.
Overflow Management
Depending on the status of overflow and sign flags, the saturation logic can be used to limit the
accumulator content.
Register File
Instead of moving data in and out of the memory during the operation, for better speed, a large set of
general purpose registers is provided to store the intermediate results.
In order to increase the speed of operation, separate memories are used to store program and
data, and a separate set of data and address buses is provided for each memory. This architecture is
called the Harvard Architecture. It is shown in figure 2.10.
Although the usage of separate memories for data and instructions speeds up the processing,
it does not completely solve the problem. As many of the DSP instructions require more than one
operand, use of a single data memory means the operands are fetched one after the other, thus increasing
the processing delay. This problem can be overcome by using two separate data memories for
storing the operands separately, so that both operands can be fetched together in a single clock cycle
(Figure 2.11).
Fig 2.11 Harvard Architecture with Dual Data Memory
Although the above architecture improves the speed of operation, it requires more hardware
and interconnections, thus increasing the cost and complexity of the system. Therefore there is
a trade-off between cost and speed when selecting the memory architecture for a DSP.
There are four special cases in this addressing mode. They are
a. SAR < EAR & updated PNTR > EAR
b. SAR < EAR & updated PNTR < SAR
c. SAR >EAR & updated PNTR > SAR
d. SAR > EAR & updated PNTR < EAR
The buffer length in the first two cases will be (EAR-SAR+1), whereas for the next two cases it will
be (SAR-EAR+1).
The pointer updating algorithm
Fig 2.12 Special Cases in Circular Addressing Mode
The block diagram of a typical address generation unit is as shown in figure 2.13.
Solution:-
y(n)= ∑h(i) x(n-i) n=0,1,2…
In order to implement the above operation in a DSP, the architecture requires the
following features
2. It is required to find the sum of 64 16-bit numbers. How many bits should
the accumulator have so that the sum can be computed without the occurrence
of overflow error or loss of accuracy?
The sum of 64 16-bit numbers can grow up to (16 + log2 64) = 22 bits long. Hence
the accumulator should be 22 bits long in order to avoid an overflow error.
3. If a sum of 256 products is to be computed using a pipelined MAC unit, and if the MAC
execution time of the unit is 100 nsec, what will be the total time required to complete
the operation?
As N = 256 in this case, the MAC unit requires N+1 = 257 execution cycles. As the single MAC
execution time is 100 nsec, the total time required will be (257 * 100 nsec) = 25.7 usec.
4. Consider a MAC unit whose inputs are 16 bit numbers. If 256 products are to
be summed up in this MAC, how many guard bits should be provided for the
accumulator to prevent overflow condition from occurring?
As it is required to calculate the sum of 256 16-bit numbers, the sum can be as
long as (16 + log2 256) = 24 bits. Hence the accumulator should be capable of handling
these 24 bits. Thus the guard bits required will be (24-16) = 8 bits.
The block diagram of the modified MAC after considering the guard or extension bits is shown in
the figure.
5. What are the memory addresses of the operands in each of the following cases of indirect
addressing modes? In each case, what will be the content of the addreg after the memory
access? Assume that the initial contents of the addreg and the offsetreg are 0200h and
0010h, respectively.
a. ADD *addreg
b. ADD +*addreg
c. ADD offsetreg+,*addreg
d. ADD *addreg,offsetreg-
6. A DSP has a circular buffer with the start and the end addresses as 0200h and 020Fh
respectively. What would be the new values of the address pointer of the buffer if, in the
course of address computation, it gets updated to
a. 0212h
b. 01FCh
Buffer Length = (EAR-SAR+1) = 020Fh - 0200h + 1 = 10h
a. New Address Pointer = Updated Pointer - Buffer Length = 0212h - 10h = 0202h
b. New Address Pointer = Updated Pointer + Buffer Length = 01FCh + 10h = 020Ch
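The pointer correction used in this solution can be written as a short routine. The sketch below covers only the case SAR < EAR, and the function name is an illustrative assumption.

```python
def wrap_pointer(updated, sar, ear):
    """Bring an updated pointer back inside the circular buffer [sar, ear], for sar < ear."""
    length = ear - sar + 1
    if updated > ear:                # ran past the end address: wrap to the start
        return updated - length
    if updated < sar:                # ran below the start address: wrap to the end
        return updated + length
    return updated                   # still inside the buffer


print(hex(wrap_pointer(0x0212, 0x0200, 0x020F)))   # 0x202
print(hex(wrap_pointer(0x01FC, 0x0200, 0x020F)))   # 0x20c
```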
9. Compute the indices for an 8-point FFT using Bit reversed Addressing Mode
Start with index 0. Therefore the first index would be (000)
Next index can be calculated by adding half the FFT length, in this case it is (100)
to the previous index. i.e. Present Index= (000)+B (100)= (100)
Similarly the next index can be calculated as
Present Index= (100)+B (100)= (010)
The process continues till all the indices are calculated. The following table summarizes
the calculation.
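The index sequence of this problem can be generated programmatically. The sketch below implements the "+B" operation as an addition whose carry propagates from the MSB towards the LSB, by bit-reversing both operands, adding normally, and reversing the result. The function names are illustrative assumptions.

```python
def reverse_bits(value, width):
    """Reverse the order of the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(a, b, width):
    """Add a and b with the carry propagating from the MSB towards the LSB."""
    mask = (1 << width) - 1
    total = (reverse_bits(a, width) + reverse_bits(b, width)) & mask
    return reverse_bits(total, width)


def bit_reversed_indices(n_points):
    """Index sequence for an n_points FFT (n_points must be a power of two)."""
    width = n_points.bit_length() - 1
    half = n_points // 2                 # the increment is half the FFT length
    index, sequence = 0, []
    for _ in range(n_points):
        sequence.append(index)
        index = reverse_carry_add(index, half, width)
    return sequence


print([format(i, "03b") for i in bit_reversed_indices(8)])
# ['000', '100', '010', '110', '001', '101', '011', '111']
```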
UNIT-3
Programmable Digital Signal Processors
3.1 Introduction:
Leading manufacturers of integrated circuits such as Texas Instruments (TI), Analog Devices and
Motorola manufacture digital signal processor (DSP) chips. These manufacturers have developed a
range of DSP chips of varied complexity.
The TMS320 family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit floating-
point. These DSPs possess the operational flexibility of high-speed controllers and the numerical
capability of array processors.
Accumulators A and B store the output from the ALU or the multiplier/adder block and provide a
second input to the ALU. Each accumulator is divided into three parts: guard bits (bits 39-32), high-
order word (bits 31-16), and low-order word (bits 15-0), which can be stored and retrieved
individually. Each accumulator is memory-mapped and partitioned, and can be configured as the
destination register. The guard bits are used as a head margin for computations.
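The three-part partition of a 40-bit accumulator can be illustrated with simple shifts and masks. This is only a sketch of the bit layout stated above; the function name and the example value are assumptions.

```python
def split_accumulator(acc):
    """Split a 40-bit accumulator value into (guard bits, high-order word, low-order word)."""
    acc &= (1 << 40) - 1             # keep 40 bits
    guard = (acc >> 32) & 0xFF       # bits 39-32
    high = (acc >> 16) & 0xFFFF      # bits 31-16
    low = acc & 0xFFFF               # bits 15-0
    return guard, high, low


guard, high, low = split_accumulator(0x123456789A)
print(hex(guard), hex(high), hex(low))   # 0x12 0x3456 0x789a
```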
Figure 3.2. Functional diagram of the central processing unit of the TMS320C54xx processors.
Barrel shifter: provides the capability to scale the data during an operand read or write.
No overhead is required to implement the shift needed for the scaling operations. The '54xx barrel
shifter can produce a left shift of 0 to 31 bits or a right shift of 0 to 16 bits on the input data. The shift
requirements are defined in the shift count field of the instruction, the shift count field of status
register ST1, or in the temporary register T. Figure 3.3 shows the functional diagram of the barrel
shifter of TMS320C54xx processors.
The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle.
The LSBs of the output are filled with 0s, and the MSBs can be either zero filled or sign extended,
depending on the state of the sign-extension mode bit in the status register ST1. An additional shift
capability enables the processor to perform numerical scaling, bit extraction, extended arithmetic, and
overflow prevention operations.
Figure 3.3. Functional diagram of the barrel shifter
Multiplier/adder unit: The kernel of the DSP device architecture is the multiplier/adder unit. The
multiplier/adder unit of TMS320C54xx devices performs 17 x 17 2's complement multiplication with
a 40-bit addition effectively in a single instruction cycle.
In addition to the multiplier and adder, the unit consists of control logic for integer and
fractional computations and a 16-bit temporary storage register, T. Figure 3.4 shows the functional
diagram of the multiplier/adder unit of TMS320C54xx processors.
The compare, select, and store unit (CSSU) is a hardware unit specifically incorporated to accelerate
the add/compare/select operation. This operation is essential to implement the Viterbi algorithm used
in many signal-processing applications.
The exponent encoder unit supports the EXP instruction, which stores in the T register
the number of leading redundant bits of the accumulator content. This information is useful while
shifting the accumulator content for the purpose of scaling.
Figure 3.4. Functional diagram of the multiplier/adder unit of TMS320C54xx processors.
3.3.3 Internal Memory and Memory-Mapped Registers:
The amount and the types of memory of a processor have direct relevance to the efficiency and
performance obtainable in implementations with the processor. The '54xx memory is organized into
three individually selectable spaces: program, data, and I/O spaces. All '54xx devices contain both
RAM and ROM. RAM can be either dual-access type (DARAM) or single-access type (SARAM).
The on-chip RAM for these processors is organized in pages having 128 word locations on each page.
The '54xx processors have a number of CPU registers to support operand addressing and
computations. The CPU registers and peripheral registers are all located on page 0 of the data
memory. Figure 3.5(a) and (b) show the internal CPU registers and peripheral registers with their
addresses. The processor mode status (PMST) register
is used to configure the processor. It is a memory-mapped register located at address 1Dh on page
0 of the RAM. A part of the on-chip ROM may contain a boot loader and look-up tables for functions such
as sine, cosine, μ-law, and A-law.
HM: Hold mode, indicates whether the processor continues internal execution or halts it when
acknowledging a hold request from the external interface.
IPTR: Interrupt vector pointer, points to the 128-word program page where the interrupt vectors
reside.
MP/MC: Microprocessor/Microcomputer mode,
MP/MC=0, the on-chip ROM is enabled.
MP/MC=1, the on-chip ROM is not enabled.
OVLY: RAM overlay, OVLY enables on-chip dual-access data RAM blocks to be mapped into
program space.
AVIS: It enables/disables the internal program address to be visible at the address pins.
DROM: Data ROM, DROM enables on-chip ROM to be mapped into data space.
CLKOFF: CLKOUT off.
Data addressing modes provide various ways to access operands to execute instructions and place
results in the memory or the registers. The '54xx devices offer seven basic addressing modes:
1. Immediate addressing.
2. Absolute addressing.
3. Accumulator addressing.
4. Direct addressing.
5. Indirect addressing.
6. Memory mapped addressing
7. Stack addressing.
If CPL = 0, DP is selected.
If CPL = 1, SP is selected.
It should be remembered that when SP is used instead of DP, the effective address is computed by
adding the 7-bit offset to SP.
Figure 3.7 Block diagram of the direct addressing mode for TMS320C54xx Processors.
3.4.1 Indirect Addressing:
TMS320C54xx devices have eight 16-bit auxiliary registers (AR0 - AR7) and two auxiliary register
arithmetic units (ARAU0 & ARAU1).
Indirect addressing is used to access memory locations in fixed step sizes. The AR0 register is used for
indexed and bit-reversed addressing modes.
– operand addressing
MOD: type of indirect addressing
ARF: AR used for addressing
ARP depends on the (CMPT) bit in ST1
CMPT = 0, Standard mode, ARP set to zero
CMPT = 1, Compatibility mode, particular AR selected by ARP
Table 3.2 Indirect addressing options with a single data-memory operand.
Circular Addressing:
Used in convolution, correlation and FIR filters.
A circular buffer is a sliding window that contains the most recent data. A circular buffer of size R must
start on an N-bit boundary, where 2^N > R.
Effective base address (EFB): obtained by zeroing the N LSBs of a user-selected AR (ARx).
If 0 ≤ index + step < BK: index = index + step;
else if index + step ≥ BK: index = index + step - BK;
else if index + step < 0: index = index + step + BK
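The index update rule and the effective base address can be sketched as follows. The function names are illustrative assumptions; BK is the buffer size and N is taken as the smallest integer with 2^N > BK.

```python
def circular_update(index, step, bk):
    """Apply the three-case circular index update for a buffer of size bk."""
    new_index = index + step
    if 0 <= new_index < bk:
        return new_index
    if new_index >= bk:
        return new_index - bk        # wrapped past the end of the buffer
    return new_index + bk            # new_index < 0: wrapped below the start


def effective_base_address(arx, bk):
    """Zero the N LSBs of arx, where N is the smallest integer with 2**N > bk."""
    n = bk.bit_length()
    return arx & ~((1 << n) - 1)


print(circular_update(6, 3, 8))                  # 1
print(circular_update(1, -3, 8))                 # 6
print(hex(effective_base_address(0x0123, 8)))    # 0x120
```

The accessed address is the effective base address plus the index, so a buffer of size 8 whose auxiliary register holds 0123h occupies the block starting at 0120h in this sketch.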
Bit-Reversed Addressing:
o Used for FFT algorithms.
o AR0 specifies one half of the size of the FFT.
o The value of AR0 = 2^(N-1), where N is an integer and the FFT size is 2^N.
o AR0 + AR (selected register) = bit-reversed addressing.
o The carry bit propagates from left to right.
Dual-Operand Addressing:
Dual data-memory operand addressing is used for instructions that simultaneously
perform two reads (32-bit read) or a single read (16-bit read) and a parallel store (16-bit
store) indicated by two vertical bars, ||. These instructions access operands using indirect addressing
mode.
If, in an instruction with a parallel store, the source operand and the destination operand point to the
same location, the source is read before writing to the destination. Only 2 bits are available in the
instruction code for selecting each auxiliary register in this mode. Thus, just four of the auxiliary
registers, AR2-AR5, can be used. The ARAUs, together with these registers, provide the capability to
access two operands in a single cycle. Figure 3.11 shows how an address is generated using dual data-
memory operand addressing.
3.4.6. Memory-Mapped Register Addressing:
Used to modify the memory-mapped registers without affecting the current data page
pointer (DP) or stack-pointer (SP)
o Overhead for writing to a register is minimal
o Works for direct and indirect addressing
o Scratch-pad RAM located on data PAGE0 can be modified
STM #x, DIRECT
STM #tbl, AR1
2. Assuming the current contents of AR3 to be 200h, what will be its contents after
each of the following TMS320C54xx addressing modes is used? Assume that
the contents of AR0 are 20h.
a. *AR3 + 0B
b. *AR3 – 0B
Solution:
a. AR3 ← AR3 + AR0 with reverse carry propagation; AR3 = 200h + 20h (with reverse carry
propagation) = 220h.
b. AR3 ← AR3 - AR0 with reverse carry propagation; AR3 = 200h - 20h (with reverse carry
propagation) = 23Fh.
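Both results can be checked numerically. The sketch below models reverse carry (and borrow) propagation on 16-bit registers by bit-reversing the operands, performing ordinary arithmetic modulo 2^16, and reversing the result; this equivalent model and the function names are assumptions made for illustration.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def reverse_bits(value, width=WIDTH):
    """Reverse the order of the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(ar, ar0):
    """AR + AR0 with the carry propagating from left to right."""
    return reverse_bits((reverse_bits(ar) + reverse_bits(ar0)) & MASK)


def reverse_carry_sub(ar, ar0):
    """AR - AR0 with the borrow propagating from left to right."""
    return reverse_bits((reverse_bits(ar) - reverse_bits(ar0)) & MASK)


print(hex(reverse_carry_add(0x200, 0x20)))   # 0x220
print(hex(reverse_carry_sub(0x200, 0x20)))   # 0x23f
```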
2 In the fetch phase, an instruction word is fetched from the program bus, PB,
and loaded into the instruction register, IR. These two phases form the
instruction fetch sequence.
3 During the decode stage, the contents of the instruction register, IR, are
decoded to determine the type of memory access operation and the control
signals required for the data-address generation unit and the CPU.
4 The access phase outputs the read operand's address on the data address bus, DAB. If
a second operand is required, the other data address bus, CAB, is also loaded with
an appropriate address. Auxiliary registers in indirect addressing mode and the
stack pointer (SP) are also updated.
5 In the read phase the data operand(s), if any, are read from the data buses,
DB and CB. This phase completes the two-phase read process and starts the two-
phase write process. The data address of the write operand, if any, is loaded
into the data write address bus, EAB.
6 The execute phase writes the data using the data write bus, EB, and
completes the operand write sequence. The instruction is executed in this
phase.
UNIT-IV
Analog Devices Family of DSP Devices:
ALU Block Diagram:
Shifter Instructions:
[Figure: block diagram; surviving labels: instruction, generator (two blocks), sequencer, PMA and DMA
buses, PMD and DMD buses, bus exchange]
On the ADSP-2100, the four memory buses are extended off-chip for direct connection to
external memories. The program memory data (PMD) bus serves primarily to transfer instructions
from off-chip memory to the internal instruction register. Instructions are fetched and loaded into
the instruction register during one processor cycle; they execute during the following cycle while the
next instruction is being fetched. The instruction register introduces a single level of pipelining in the
program flow. Instructions loaded into the instruction register are also written into the cache
memory, to be described below.
The next instruction address is generated by the program sequencer depending on the
current instruction and internal processor status. This address is placed on the program memory
address (PMA) bus. The program sequencer uses features such as conditional branching, loop
counters and zero-overhead looping to minimize program flow overhead. The program memory
address (PMA) bus is 14 bits wide, allowing direct access to up to 16K words of instruction code
and 16K words of data. The state of the PMDA pin distinguishes between code and data access of
program memory. The program memory data (PMD) bus, like the processor’s instruction words, is
24 bits wide.
The data memory address (DMA) bus is 14 bits wide allowing direct access of up to 16K
words of data. The data memory data (DMD) bus is 16 bits wide. The data memory data (DMD) bus
provides a path for the contents of any register in the processor to be transferred to any other
register, or to any external data memory location, in a single cycle. The data memory address can
come from two sources: an absolute value specified in the instruction code (direct addressing) or
the output of a data address generator (indirect addressing). Only indirect addressing is supported
for data fetches via the program memory bus.
The program memory data (PMD) bus can also be used to transfer data to and from the
computational units through direct paths or via the PMD-DMD bus exchange unit. The PMD-DMD
bus exchange unit permits data to be passed from one bus to the other. It contains
hardware to overcome the 8-bit width discrepancy between the two buses when necessary.
Each computational unit contains a set of dedicated input and output registers.
Computational operations generally take their operands from input registers and load the result
into an output register. The registers act as a stopover point for data between the external
memory and the computational circuitry, effectively introducing one pipeline level on input and
one level on output. The computational units are arranged side by side rather than in cascade. To
avoid excessive pipeline delays when a series of different operations are performed, the internal
result (R) bus allows any of the output registers to be used directly (without delay) as the input to
another computation.
For a wide variety of calculations, it is desirable to fetch two operands at the same time—
one from data memory and one from program memory. Fetching data from program memory,
however, makes it impossible to fetch the next instruction from program memory on the same
cycle; an additional cycle would be required. To avoid this overhead, the ADSP-2100 incorporates
an instruction cache which holds sixteen words. The benefit of the cache architecture is most
apparent when executing a program loop that can be totally contained in the cache memory. In
this situation, the ADSP-2100 works like a three-bus system with an instruction fetch and two
operand fetches taking place at the same time. Many algorithms are readily coded in loops of
sixteen instructions or less because of the parallelism and high-level syntax of the ADSP-2100
assembly language.
Here’s how the cache functions: Every instruction loaded into the instruction register is
also written into cache memory. As additional instructions are fetched, they overwrite the current
contents of cache in a circular fashion. When the current instruction does a program memory data
access, the cache automatically sources the instruction register if its contents are valid. Operation
of the cache is completely transparent to user.
There are two independent data address generators (DAGs). As a pair, they allow the
simultaneous fetch of data stored in program and in data memory for executing dual-operand
instructions in a single cycle. One data address generator (DAG1) can supply addresses to the data
memory only; the other (DAG2) can supply addresses to either the data memory or the program
memory. Each DAG can handle linear addressing as well as modulo addressing for circular buffers.
With its multiple bus structure, the ADSP-2100 supports a high degree of operational
parallelism. In a single cycle, the ADSP-2100 can fetch an instruction, compute the next instruction
address, perform one or two data transfers, update one or two data address pointers and perform
a computation. Every instruction executes in a single cycle.
Figure 1.2, on the next page, is a simplified representation of the ADSP-2100 in a system
context. The figure shows the two external memories used by the processor. Program memory
stores instructions and is also used to store data. Data memory stores only data. The data memory
address space may be shared with memory-mapped peripherals, if desired. Both memories may
be accessed by external devices, such as a system host, if desired. Figure 1.2 also shows the
processor control interface signals (RESET, HALT and TRAP), the four interrupt request lines, the bus
request and bus grant lines (BR and BG) and the clock input (CLKIN) and output (CLKOUT).
[Figure 1.2: ADSP-2100 system diagram; surviving labels: clock (CLKIN, CLKOUT), program memory
(ADDR, DATA), RESET, HALT, TRAP, IRQ, BG]
The ADSP-2181 is a single-chip microcomputer optimized fordigital signal processing ( DSP) and
other high speed numeric processing applications.
The ADSP-2181 combines the ADSP-2100 family base archi- tecture (three computational units,
data address generators and a program sequencer) with two serial ports, a 16-bit internal DMA
port, a byte DMA port, a programmable timer, Flag I/O,extensive interrupt capabilities, and on-
chip program and data memory.
The ADSP-2181 integrates 80K bytes of on-chip memory configured as 16K words (24-bit) of
program RAM and 16K words (16-bit) of data RAM. Power-down circuitry is also provided to meet
the low power needs of battery operated portable equipment. The ADSP-2181 is available in
128-lead TQFP and 128-lead PQFP packages.
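The 80K-byte figure follows directly from the two word widths quoted above. The short Python check below is only an illustrative sketch; the function name on_chip_bytes is made up for this note and is not part of any Analog Devices tool.

```python
def on_chip_bytes(words, bits_per_word):
    """Return the storage, in bytes, of `words` words of `bits_per_word` bits."""
    return words * bits_per_word // 8

K = 1024
program_ram = on_chip_bytes(16 * K, 24)   # 16K words x 24 bits = 48K bytes
data_ram = on_chip_bytes(16 * K, 16)      # 16K words x 16 bits = 32K bytes
total_kbytes = (program_ram + data_ram) // K
print(total_kbytes)  # 80
```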
In addition, the ADSP-2181 supports new instructions, which include bit manipulations (bit set,
bit clear, bit toggle, bit test), new ALU constants, a new multiplication instruction (x squared),
biased rounding, result-free ALU operations, I/O memory transfers and global interrupt masking
for increased flexibility.
Fabricated in a high speed, double metal, low power CMOS process, the ADSP-2181 operates
with a 25 ns instruction cycle time. Every instruction can execute in a single processor cycle.
The ADSP-2181's flexible architecture and comprehensive instruction set allow the processor to
perform multiple operations in parallel. In one processor cycle the ADSP-2181 can:
• Generate the next program address
• Fetch the next instruction
• Perform one or two data moves
[Figure 1. ADSP-2181 Block Diagram: ALU, MAC and shifter with their input and output registers, serial ports with companding circuitry and transmit/receive registers, timer, internal DMA port, interrupts, and the R bus]
The ADSP-2181 instruction set provides flexible data moves and multifunction (one
or two data moves with a computation) instructions. Every instruction can be executed
in a single processor cycle. The ADSP-2181 assembly language uses an algebraic syntax
for ease of coding and readability. A comprehensive set of development tools
supports program development.
Figure 1 is an overall block diagram of the ADSP-2181. The processor
contains three independent computational units: the ALU, the multiplier/accumulator
(MAC) and the shifter. The computational units process 16-bit data directly and have
provisions to support multiprecision computations. The ALU performs a standard
set of arithmetic and logic operations; division primitives are also supported. The MAC
performs single-cycle multiply, multiply/add and multiply/subtract operations with 40
bits of accumulation. The shifter performs logical and arithmetic shifts, normalization,
denormalization and derive exponent operations. The shifter can be used to efficiently
implement numeric format control including multiword and block floating-point
representations.
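The value of the 40-bit accumulator can be seen with a small software model. The Python sketch below is an assumption-laden illustration, not the processor's actual data path: it models integer multiply/accumulate only (no fractional mode, rounding or saturation), and the helper names to_signed and mac are invented for this note. It shows a running sum of 16-bit by 16-bit products that no longer fits in 32 bits but is still held exactly in 40 bits.

```python
def to_signed(value, bits):
    """Wrap an integer into the two's-complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mac(acc, x, y, acc_bits=40):
    """One multiply/accumulate step: acc + x*y, wrapped to acc_bits bits."""
    return to_signed(acc + to_signed(x, 16) * to_signed(y, 16), acc_bits)

# Four worst-case positive products of signed 16-bit operands.
acc = 0
for x, y in [(32767, 32767)] * 4:
    acc = mac(acc, x, y)

# The sum is outside the signed 32-bit range but inside the 40-bit range.
overflows_32 = not (-(1 << 31) <= acc < (1 << 31))
```

The extra bits above 32 act as guard bits: several full-scale products can be summed before the accumulator itself can overflow.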
The internal result (R) bus connects the computational units so that the output of
any unit may be the input of any unit on the next cycle.
A powerful program sequencer and two dedicated data address generators ensure
efficient delivery of operands to these computational units. The sequencer supports
conditional jumps, subroutine calls and returns in a single cycle. With internal loop
counters and loop stacks, the ADSP-2181 executes looped code with zero overhead;
no explicit jump instructions are required to maintain loops.
Two data address generators (DAGs) provide addresses for simultaneous dual
operand fetches (from data memory and program memory). Each DAG maintains and
updates four address pointers. Whenever a pointer is used to access data
(indirect addressing), it is post-modified by the value of one of four possible modify
registers. A length value may be associated with each pointer to implement
automatic modulo addressing for circular buffers.
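The post-modify and modulo behaviour described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the function name post_modify and the explicit base parameter are hypothetical (the hardware works from the pointer and its length value), and a length of 0 is taken here to mean linear addressing.

```python
def post_modify(index, modify, length, base=0):
    """Return the pointer value after post-modifying `index` by `modify`.

    length == 0 selects linear addressing; otherwise the pointer wraps
    inside the circular buffer [base, base + length).
    """
    nxt = index + modify
    if length == 0:
        return nxt
    return base + (nxt - base) % length

# Walk a 5-word circular buffer that starts at address 0x100.
addr = 0x100
visited = []
for _ in range(7):
    visited.append(addr)                        # the access uses the current pointer
    addr = post_modify(addr, 1, 5, base=0x100)  # then the pointer is updated

print([hex(a) for a in visited])
```

After the fifth access the pointer wraps back to the start of the buffer without any compare-and-branch code, which is the point of hardware modulo addressing.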
Efficient data transfer is achieved with the use of five internal buses:
• Program Memory Address (PMA) Bus
• Program Memory Data (PMD) Bus
• Data Memory Address (DMA) Bus
• Data Memory Data (DMD) Bus
• Result (R) Bus
The two address buses (PMA and DMA) share a single external address bus, allowing
memory to be expanded off-chip, and the two data buses (PMD and DMD) share a
single external data bus. Byte memory space and I/O memory space also share the
external buses.
Program memory can store both instructions and data, permitting the ADSP-2181
to fetch two operands in a single cycle, one from program memory and one from
data memory. The ADSP-2181 can fetch an operand from program memory and the
next instruction in the same cycle.
In addition to the address and data bus for external memory connection, the
ADSP-2181 has a 16-bit Internal DMA port (IDMA port) for connection to external
systems. The IDMA port is made up of 16 data/address pins and five control pins.
The IDMA port provides transparent, direct access to the DSP's on-chip program
and data RAM. An interface to low cost byte-wide memory is provided by the Byte
DMA port (BDMA port). The BDMA port is bidirectional and can directly address up
to four megabytes of external RAM or ROM for off-chip storage of program overlays
or data tables. The byte memory and I/O memory space interface supports slow
memories and I/O memory-mapped peripherals with programmable wait state
generation.
External devices can gain control of external buses with bus request/grant
signals (BR, BGH and BG). One execution mode (Go Mode) allows the ADSP-2181 to
continue running from on-chip memory. Normal execution mode requires the
processor to halt while buses are granted.
The ADSP-2181 can respond to 13 possible interrupts, eleven of which are
accessible at any given time. There can be up to six external interrupts (one
edge-sensitive, two level-sensitive and three configurable) and seven internal
interrupts generated by the timer, the serial ports (SPORTs), the Byte DMA port
and the power-down circuitry. There is also a master RESET signal.
The two serial ports provide a complete synchronous serial interface with
optional companding in hardware and a wide variety of framed or frameless data
transmit and receive modes of operation. Each port can generate an internal
UNIT - V
Interfacing Memory & Parallel I/O Peripherals to DSP Devices
5.1 Introduction: A typical DSP system consists of a DSP with external memory, input devices
and output devices. Memory and I/O devices usually come from manufacturers other than the
DSP manufacturer, and a wide variety of such devices is available, so the signals generated
by the DSP may not suit the memory and I/O devices to be connected to it. Interfacing devices
are therefore needed; their purpose is to use the DSP signals to generate the appropriate
signals for setting up communication with the memory. A DSP with interface is shown in
fig. 5.1.
External memory is off-chip and is slower than on-chip memory. External interfacing is
required to establish communication between the memory and the DSP. External memory can
provide a large memory space. Its purpose is to store variable data and to serve as scratch
pad memory.
Program memory can be ROM, Dual Access RAM (DARAM), Single Access RAM
(SARAM), or a combination of all these. The program memory can be extended externally
to 8192K words, that is, 128 pages of 64K words each. The arrangement of memory and
DSP in the case of Single Access RAM (SARAM) and Dual Access RAM (DARAM) is
shown in fig. 7.3. One set of address bus and data bus is available in the case of SARAM,
and two sets of address bus and data bus are available in the case of DARAM. With DARAM,
the DSP can thus access two memory locations simultaneously.
There are 3 bits available in the memory-mapped register PMST for the purpose of on-chip
memory mapping. The first is the microprocessor / microcomputer mode bit (MP/MC). If this
bit is 0, the on-chip ROM is enabled and addressable, and if this bit is 1, the on-chip ROM
is not available. The bit can be manipulated by software and is set to the value on the
corresponding pin at system reset. The second bit is OVLY (RAM overlay). It enables on-chip
DARAM data memory blocks to be mapped into program space. If this bit is 0, on-chip RAM is
addressable in data space but not in program space, and if it is 1, on-chip RAM is mapped
into both program and data space. The third bit is DROM. It enables on-chip DARAM 4-7 to be
mapped into data space. If this bit is 0, on-chip DARAM 4-7 is not mapped into data space,
and if this bit is 1, on-chip DARAM 4-7 is mapped into data space. On-chip data memory is
partitioned into several regions as shown in table 7.1. Data memory can be on-chip or
off-chip.
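The effect of the three PMST bits can be summarised in a small decoder. The Python sketch below is illustrative only: the bit positions (MP/MC at bit 6, OVLY at bit 5, DROM at bit 3) are an assumption that should be confirmed against the device data sheet, and the function and key names are invented for this note.

```python
# Assumed bit positions within PMST; confirm against the device data sheet.
MP_MC_BIT = 6
OVLY_BIT = 5
DROM_BIT = 3

def describe_pmst(pmst):
    """Translate the three mapping bits into the behaviour described in the text."""
    mp_mc = (pmst >> MP_MC_BIT) & 1
    ovly = (pmst >> OVLY_BIT) & 1
    drom = (pmst >> DROM_BIT) & 1
    return {
        "on_chip_rom_enabled": mp_mc == 0,       # 0 = ROM enabled and addressable
        "ram_in_program_space": ovly == 1,       # 1 = RAM in program and data space
        "daram_4_7_in_data_space": drom == 1,    # 1 = DARAM 4-7 mapped into data space
    }

print(describe_pmst(0))
print(describe_pmst((1 << MP_MC_BIT) | (1 << OVLY_BIT) | (1 << DROM_BIT)))
```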
The on-chip memory of the TMS320C54xx can be both program and data memory. It
enhances the speed of program execution through parallelism; that is, multiple data access
capability is provided for concurrent memory operations. Up to three reads and one write
can be performed in a single cycle. External memory can be interfaced to the DSP through a
16- to 23-bit address bus and a 16-bit data bus. Interfacing signals are generated
by the DSP to refer to external memory. The signals required by the memory are typically
Chip Select, Output Enable and Write Enable. As an example of on-chip memory sizes, the
TMS320C5416 has 16K ROM, 64K DARAM and 64K SARAM.
Extended external Program Memory is interfaced with 23 address lines i.e., 8192K
locations. The external memory thus interfaced is divided into 128 pages, with 64K words
per page.
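The relation between the 23 address lines, the 128 pages and the 64K words per page can be checked numerically. In the Python sketch below, the function name split_program_address is hypothetical, and treating the upper 7 address bits as the page number is an assumption that follows from the page size given above.

```python
PAGE_BITS = 16       # 64K words per page
ADDRESS_BITS = 23    # extended program address width

def split_program_address(addr):
    """Split a 23-bit program address into (page, offset within the page)."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 23 bits")
    return addr >> PAGE_BITS, addr & ((1 << PAGE_BITS) - 1)

total_words = 1 << ADDRESS_BITS      # 8192K locations
pages = total_words >> PAGE_BITS     # 128 pages

print(total_words // 1024, pages)
print(split_program_address(0x012345))
```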
The Read/Write signal is low when the DSP is writing and high when the DSP is reading. The
strobe interfacing signals, Memory Strobe and I/O Strobe, are both active low. They remain
low during the entire read or write operation on memory and on I/O respectively.
External bus interfacing signals 1 to 8 are all unidirectional except the data bus, which is
bidirectional. The address lines are outgoing signals, and all other control signals are also
outgoing signals.
The Data Ready signal is used when a slow device is to be interfaced. Hold Request and
Hold Acknowledge are used in conjunction with a DMA controller. There are two
interrupt-related signals, Interrupt Request and Interrupt Acknowledge; both are active low.
Interrupt Request is typically used for data exchange, for example with an ADC or another
processor. The TMS320C5416 has 14 hardware interrupts, serving user interrupts, the
McBSP, DMA and the timer. The External Flag is an active-high, asynchronous, outgoing
control signal. It initiates an action in the peripheral device or informs it about the
completion of a transaction. Branch Control Input is an active-low, asynchronous, incoming
control signal. A low on this signal makes the DSP respond or attend to the peripheral
device. It informs the DSP about the completion of a transaction.
Output Enable enables the availability of data from a memory location onto the data bus.
The address bus is unidirectional and carries the address into the memory IC. The data bus
is bidirectional. The Chip Select, Write Enable and Output Enable control signals may be
active high or active low, and they carry signals into the memory ICs. The task of the
memory interface is to use DSP signals and generate the appropriate signals for setting up
communication with the memory. The logical spacing of interface is shown in fig. 7.4.
The timing sequence of memory access is shown in fig. 7.5. There are two read
operations, both referring to program memory; for these, the Read/Write signal is high and
Program Memory Select is low. There is one write operation referring to external data
memory; for it, Data Memory Select is low and the Read/Write signal is low. Because the reads
and the write address memory devices, Memory Strobe is low. Internal program memory reads
take one clock cycle, and external data memory accesses require two clock cycles.
5.5 Parallel I/O Interface: I/O devices are interfaced to the DSP using unconditional I/O
mode, programmed I/O mode or interrupt I/O mode. Unconditional I/O does not require
any handshaking signals. The DSP assumes that the I/O device is ready and transfers the data
at its own speed. Programmed I/O requires handshaking signals. The DSP waits for the I/O
readiness signal, which is one of the handshaking signals. After completing the transaction,
the DSP conveys this to the I/O device through another handshaking signal. Interrupt I/O
also requires handshaking signals. The I/O device interrupts the DSP to indicate that it is
ready. The DSP acknowledges the interrupt and attends to it. Thus, the DSP need not wait for
the I/O device to respond; it can continue executing as long as there is no interrupt.
5.6 Programmed I/O Interface: The timing diagram in the case of programmed I/O is
shown in fig. 7.6. I/O strobe and I/O space select are issued by the DSP. Two clock cycles
each are required for I/O read and I/O write operations.
An example of interfacing an ADC to the DSP in programmed I/O mode is shown in fig. 7.7.
The ADC has a start of conversion (SOC) signal which initiates the conversion. In programmed
I/O mode, the external flag signal is issued by the DSP to start the conversion. The ADC
issues end of conversion (EOC) after completing the conversion; this signal drives the
branch control input of the DSP. The DSP then issues the address of the ADC, the I/O strobe
and the read/write signal as high to read the data. An address decoder translates this
information into an active-low read signal to the ADC. The ADC supplies the data on the data
bus and the DSP reads it. After reading, the DSP issues start of conversion once again after
the sample interval has elapsed. Note that there are no address lines for the ADC; the
decoded address selects the ADC. During conversion, the DSP waits, checking the branch
control input status for zero. The flow chart of the activities in programmed I/O is shown
in fig. 7.8.
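The sequence just described (start conversion, poll the branch control input, read, repeat) can be mimicked with a toy simulation. The Python sketch below is not DSP code and makes several assumptions: the SimulatedADC class, its fixed conversion delay and the method names are all invented for illustration, and the wait for the sample interval between conversions is not modelled.

```python
class SimulatedADC:
    """Toy ADC: a conversion finishes a fixed number of polls after SOC."""

    def __init__(self, samples, polls_per_conversion=3):
        self._samples = iter(samples)
        self._delay = polls_per_conversion
        self._remaining = None
        self._value = None

    def start_conversion(self):
        """Stands in for the DSP's external flag driving SOC."""
        self._value = next(self._samples)
        self._remaining = self._delay

    def bio_is_low(self):
        """EOC wired to the branch control input: True once conversion is done."""
        if self._remaining is None:
            return False
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read(self):
        """Data placed on the bus when the decoded address selects the ADC."""
        return self._value


def acquire(adc, count):
    """Programmed I/O loop: start, poll until ready, read, repeat."""
    data = []
    polls = 0
    for _ in range(count):
        adc.start_conversion()          # external flag -> SOC
        while not adc.bio_is_low():     # busy-wait on the branch control input
            polls += 1
        data.append(adc.read())         # I/O read through the address decoder
    return data, polls


data, polls = acquire(SimulatedADC([10, 20, 30]), 3)
print(data, polls)
```

The polls counter makes the cost of programmed I/O visible: every unsuccessful poll is processor time spent waiting, which is what interrupt I/O avoids.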
The registers used in managing interrupts are the Interrupt Flag Register (IFR) and the
Interrupt Mask Register (IMR). IFR records pending external and internal interrupts. A one
in any bit position implies a pending interrupt; once an interrupt is received, the
corresponding bit is set. IMR is used to mask or unmask an interrupt. A one implies that the
corresponding interrupt is unmasked. Both these registers are memory-mapped registers.
One flag, the global enable bit (INTM) in the ST1 register, is used to enable or disable all
interrupts globally. If INTM is zero, all unmasked interrupts are enabled. If it is one, all
maskable interrupts are disabled.
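The way IFR, IMR and INTM combine can be written as a single logical test. The Python sketch below is a simplified illustration: the function name serviceable_interrupts is hypothetical, both registers are assumed to be 16 bits wide, and priority resolution among several eligible interrupts is not modelled.

```python
def serviceable_interrupts(ifr, imr, intm):
    """Return the bit positions of interrupts the CPU may service.

    ifr:  pending flags (1 = pending)
    imr:  mask register (1 = unmasked)
    intm: global bit (0 = maskable interrupts enabled, 1 = disabled)
    """
    if intm:
        return []
    active = ifr & imr   # pending AND unmasked
    return [bit for bit in range(16) if (active >> bit) & 1]

print(serviceable_interrupts(0b1010, 0b0110, 0))  # [1]
print(serviceable_interrupts(0b1010, 0b0110, 1))  # []
```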
In DMA, data transfer can take place between memory and peripherals, which are either
internal or external devices. A DMA controller manages the DMA operation, so the DSP is
relieved of the task of data transfer. Because the transfer is direct, its speed is high. In
the TMS320C54xx, there are up to 6 independent programmable DMA channels. Each channel
connects a certain source and destination. Only one channel at a time can be used for data
transfer, not all six simultaneously. These channels can be prioritized. The speed of
transfer, measured as the number of clock cycles for one DMA transfer, depends on
several factors such as source and destination location, external interface conditions,
number of active DMA channels, wait states and bank switching time. The time for data
transfer between two internal memory locations is 4 cycles for each word.
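As a worked example of the 4-cycles-per-word figure, the Python sketch below estimates the time for an internal-to-internal block transfer. The block size of 256 words and the 100 MHz clock are arbitrary assumptions chosen for illustration, and the function name is invented; transfers involving external memory or several active channels would take longer, as noted above.

```python
CYCLES_PER_WORD_INTERNAL = 4   # from the text: internal-to-internal transfer

def internal_transfer_time(words, clock_hz):
    """Return (cycles, seconds) for an internal-to-internal DMA block move."""
    cycles = words * CYCLES_PER_WORD_INTERNAL
    return cycles, cycles / clock_hz

# Assumed example: 256 words with a 100 MHz clock.
cycles, seconds = internal_transfer_time(256, 100e6)
print(cycles, seconds)
```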
There are five channel context registers for each DMA channel. They are the Source
Address Register (DMSRC), Destination Address Register (DMDST), Element Count Register
(DMCTR), Sync Select & Frame Count Register (DMSFC) and Transfer Mode Control Register
(DMMCR). There are also four reload registers. The context registers DMSRC and DMDST
hold the source and destination addresses. DMCTR holds the number of data elements in a
frame. DMSFC specifies the sync event used to trigger the DMA transfer and the word size
for the transfer, and holds the frame count. DMMCR controls the transfer mode by specifying
the source and destination spaces as program memory, data memory or I/O space. The source
address reload and destination address reload registers are used to reload the source
address and destination address. Similarly, the count reload and frame count reload
registers are used to reload the count and the frame count. Additional DMA registers that
are common to all channels are the Source Program Page Address register (DMSRCP), the
Destination Program Page Address register (DMDSTP), the Element Index Address register and
the Frame Index Address register.
The number of memory-mapped registers needed for DMA is 6x(5+4), plus some registers common
to all channels, amounting to a total of 62 registers. However, only 3 (+1 for priority
related) are available. They are the DMA Priority & Enable Control Register
(DMPREC), the DMA Sub-bank Address Register (DMSA), the DMA Sub-bank Data Register with
auto increment (DMSDI) and the DMA Sub-bank Data Register (DMSDN). To access each of
the DMA registers, a register sub-addressing technique is employed. The schematic of
the arrangement is shown in fig. 7.13. The set of DMA registers of all channels (62) is
made available in a set of memory locations called the sub-bank. This avoids the need for 62
memory-mapped registers. The contents of either DMSDI or DMSDN give the code (1's and
0's) to be written to a DMA register, and the contents of DMSA give the unique sub-address
of the DMA register to be accessed. A mux routes either DMSDI or DMSDN to the sub-bank.
The memory location to be written
DMSDI is used when an automatic increment of the sub address is required after
each access. Thus it can be used to configure the entire set of registers. DMSDN is used
when single DMA register access is required. The following examples bring out clearly the
method of accessing the DMA registers and transfer of data in DMA mode.
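Before those examples, the sub-addressing idea itself can be illustrated with a toy software model. The Python sketch below is not device code and is separate from the lecture's own examples: the class DmaSubBank, the sub-address values 5 and 20 and the data words are all hypothetical. It only shows how a write through DMSDI advances the sub-address while a write through DMSDN leaves it unchanged.

```python
class DmaSubBank:
    """Toy model of register sub-addressing through DMSA, DMSDI and DMSDN."""

    def __init__(self, size=62):
        self.regs = [0] * size   # sub-bank holding all DMA registers
        self.dmsa = 0            # sub-address of the register to access

    def write_dmsdn(self, value):
        """Write the selected register; the sub-address is left unchanged."""
        self.regs[self.dmsa] = value & 0xFFFF

    def write_dmsdi(self, value):
        """Write the selected register, then auto-increment the sub-address."""
        self.regs[self.dmsa] = value & 0xFFFF
        self.dmsa += 1


bank = DmaSubBank()

# Configure three consecutive registers with one sub-address setup.
bank.dmsa = 5
for word in (0x1000, 0x2000, 0x0040):
    bank.write_dmsdi(word)
next_sub_address = bank.dmsa

# Update a single register twice; the sub-address stays where it is.
bank.dmsa = 20
bank.write_dmsdn(0xABCD)
bank.write_dmsdn(0x1234)
```

With auto increment, a whole run of registers needs only one write to DMSA; without it, each register would need its own sub-address write.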