DSP PPT Mod3
REFERENCE BOOKS:
1. "Digital Signal Processing: A Practical Approach", Ifeachor
E. C., Jervis B. W., Pearson Education, PHI, 2002.
2. "Digital Signal Processors", B. Venkataramani and M. Bhaskar,
TMH, 2nd ed., 2010.
3. "Architectures for Digital Signal Processing", Peter Pirsch,
John Wiley, 2008.
Learning objectives:
To understand the architecture and data addressing modes of
TMS320C54xx processors.
Memory space, program control, instructions and programming of
TMS320C54xx processors.
On-chip peripherals and interrupts of TMS320C54xx processors.
Pipeline operation of TMS320C54xx processors.
Lesson plan:
Sl No   Topic          Date planned   Date engaged   Hours
1.      Introduction                                 1st
The '54xx CPU is common to all the '54xx devices. The '54xx
CPU contains a 40-bit arithmetic logic unit (ALU); two 40-bit
accumulators (A and B); a barrel shifter; a 17 x 17-bit multiplier; a
40-bit adder; a compare, select and store unit (CSSU); an exponent
encoder (EXP); a data address generation unit (DAGEN); and a
program address generation unit (PAGEN).
Figure 3.2 shows the functional diagram of the central processing
unit of the TMS320C54xx processors.
The ALU performs 2's complement arithmetic operations and
bit-level Boolean operations on 16-, 32-, and 40-bit words.
It can also function as two separate 16-bit ALUs and perform two
16-bit operations simultaneously.
Figure 3.2 shows the functional diagram of the ALU of the
TMS320C54xx family of devices.
Accumulators A and B store the output from the ALU or the
multiplier/adder block and provide a second input to the ALU. Each
accumulator is divided into three parts: guard bits (bits 39-32),
high-order word (bits 31-16), and low-order word (bits 15-0),
which can be stored and retrieved individually.
Each accumulator is memory-mapped and partitioned. Either one can be
configured as the destination register. The guard bits provide
headroom for computations.
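The three-part partition of an accumulator can be illustrated with a minimal Python sketch. It only models the bit layout given above (guard bits 39-32, high word 31-16, low word 15-0); the function names `split_accumulator` and `join_accumulator` are illustrative and not part of any TI tool.

```python
MASK40 = (1 << 40) - 1  # an accumulator holds 40 bits

def split_accumulator(acc):
    """Split a 40-bit accumulator value into (guard, high, low) fields."""
    acc &= MASK40                 # keep only the 40 accumulator bits
    guard = (acc >> 32) & 0xFF    # guard bits, bits 39-32
    high = (acc >> 16) & 0xFFFF   # high-order word, bits 31-16
    low = acc & 0xFFFF            # low-order word, bits 15-0
    return guard, high, low

def join_accumulator(guard, high, low):
    """Rebuild the 40-bit value from its three fields."""
    return ((guard & 0xFF) << 32) | ((high & 0xFFFF) << 16) | (low & 0xFFFF)

guard, high, low = split_accumulator(0xFF12345678)
print(hex(guard), hex(high), hex(low))
```

Storing the high and low words separately, as the STH and STL instructions do, corresponds to writing the `high` and `low` fields to two memory locations.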
The ALU supports both saturation logic and sign extension. Saturation
logic prevents the result from overflowing or underflowing by
holding it at the maximum or minimum value.
The ALU also uses several status flags:
OVM (overflow mode bit): determines what is loaded into the
accumulator when an overflow occurs.
If OVM=0, the overflowed result is placed in the accumulator without
any modification.
If OVM=1, the accumulator is loaded with either the most positive or
the most negative 32-bit value, depending on the direction of overflow.
TC (test/control flag): holds the result of a
bit test instruction.
C (carry flag): set or reset after an arithmetic operation.
After an addition, C is set to '1' if there is a carry.
After a subtraction, C is reset to '0' if there is a borrow.
OVA (overflow flag of accumulator A):
OVA=0 if there is no overflow.
OVA=1 if there is an overflow in accumulator A.
OVB (overflow flag of accumulator B):
OVB=0 if there is no overflow.
OVB=1 if there is an overflow in accumulator B.
SXM (sign-extension mode bit):
SXM=0: sign extension is not done.
SXM=1: data is sign extended before being used by the ALU.
ZA/ZB: Determines whether the output is present in accumulator A
or B
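The effect of the OVM bit described above can be sketched in Python. This is a simplified behavioural model, not the hardware: it treats overflow as a result that no longer fits in 32 bits, clamps only when `ovm` is 1, and returns the overflow indication that would be latched in OVA or OVB. The names `alu_add` and `acc_hex` are illustrative.

```python
MAX32 = 0x7FFFFFFF     # most positive 32-bit value
MIN32 = -0x80000000    # most negative 32-bit value
MASK40 = (1 << 40) - 1

def alu_add(a, b, ovm):
    """Add two signed operands and return (result, overflow_flag).

    ovm=0: the overflowed result is kept unmodified.
    ovm=1: the result is clamped to MAX32 or MIN32.
    """
    result = a + b
    overflow = result > MAX32 or result < MIN32
    if overflow and ovm:
        result = MAX32 if result > MAX32 else MIN32
    return result, overflow

def acc_hex(value):
    """Show a signed value as the 40-bit pattern held in an accumulator."""
    return format(value & MASK40, "010X")

for ovm in (0, 1):
    result, overflow = alu_add(0x7FFFFFFF, 1, ovm)
    print("OVM =", ovm, "accumulator =", acc_hex(result), "overflow =", overflow)
```

With OVM=0 the guard bits absorb the extra range, so the sum is still numerically correct in 40 bits; with OVM=1 the value is pinned at the 32-bit limit.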
Barrel shifter
Figure 3.3 Functional diagram of the barrel shifter of the TMS320C54xx processors.
The barrel shifter provides the capability to scale the data during an
operand read or write.
No overhead is required to implement the shift needed for the
scaling operations.
The barrel shifter is used for:
Prescaling the input data from memory or from the
accumulator before an ALU operation.
Performing a logical or arithmetic shift of the accumulator value.
Normalizing the accumulator value.
Postscaling the accumulator value before storing it
into memory.
The '54xx barrel shifter can produce a left shift of 0 to 31 bits or a
right shift of 0 to 16 bits on the input data.
The shift count is specified in the shift count field of the
instruction, the shift count field (ASM) of status register ST1, or
the temporary register T.
The barrel shifter and the exponent encoder normalize the
values in an accumulator in a single cycle. The LSBs of the output
are filled with zeros, and the MSBs can be either zero filled or sign
extended, depending on the state of the sign-extension mode bit in
the status register ST1.
An additional shift capability enables the processor to perform
numerical scaling, bit extraction, extended arithmetic, and overflow
prevention operations.
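The shift range and the fill rules described above can be modelled with a short Python sketch. It assumes a 16-bit memory operand shifted into a 40-bit result, with `sxm` standing in for the sign-extension mode bit of ST1. The function name `barrel_shift` and the convention that a negative count means a right shift are choices made for this illustration.

```python
MASK40 = (1 << 40) - 1

def barrel_shift(value16, shift, sxm):
    """Shift a 16-bit operand into a 40-bit result.

    shift: -16..31; positive shifts left, negative shifts right.
    sxm=1: the operand is sign extended; sxm=0: the MSBs are zero filled.
    The LSBs vacated by a left shift are always filled with zeros.
    """
    if not -16 <= shift <= 31:
        raise ValueError("shift must be in the range -16..31")
    value = value16 & 0xFFFF
    if sxm and value & 0x8000:
        value -= 0x10000           # treat the operand as a negative number
    if shift >= 0:
        result = value << shift
    else:
        result = value >> -shift   # arithmetic shift for negative Python ints
    return result & MASK40

print(format(barrel_shift(0x8000, 4, sxm=1), "010X"))
print(format(barrel_shift(0x8000, 4, sxm=0), "010X"))
```

The two calls shift the same operand by the same amount; only the treatment of the upper bits differs with `sxm`.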
Multiplier/Adder Unit
The kernel of the DSP device architecture is the multiplier/adder unit.
The multiplier/adder unit of TMS320C54xx devices performs a 17 x
17-bit 2's complement multiplication with a 40-bit addition, effectively in
a single instruction cycle.
In addition to the multiplier and adder, the unit contains control
logic for integer and fractional computations and a 16-bit temporary
storage register, T.
Figure 3.4 shows the functional diagram of the multiplier/adder unit
of TMS320C54xx processors.
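A multiply-accumulate step can be sketched in Python as follows. The sketch assumes signed 16-bit operands and a 40-bit accumulator. The `frct` argument models fractional mode as a one-bit left shift of the product, which is how fractional (Q15) multiplication is commonly handled; treat that detail as an assumption to check against the device documentation. All names are illustrative.

```python
MASK40 = (1 << 40) - 1

def to_signed16(x):
    """Interpret the low 16 bits of x as a 2's complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def wrap40(value):
    """Wrap an integer into the signed 40-bit accumulator range."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value

def mac(acc, x, h, frct=0):
    """Return acc + x*h, as one multiply/accumulate step."""
    product = to_signed16(x) * to_signed16(h)
    if frct:
        product <<= 1   # assumed fractional mode: drop the extra sign bit
    return wrap40(acc + product)

# 0.5 * 0.5 in Q15 (0x4000 * 0x4000) gives 0.25 in Q31 when frct=1.
print(hex(mac(0, 0x4000, 0x4000, frct=1)))
```

In integer mode (`frct=0`) the same call returns the plain product added to the accumulator.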
The compare, select, and store unit (CSSU) is a hardware unit
specifically incorporated to accelerate the add/compare/select
operation.
This operation is essential to implement the Viterbi algorithm used in
many signal-processing applications.
The exponent encoder unit supports the EXP instruction, which
stores in the T register the number of leading redundant bits of
the accumulator content.
This information is useful when shifting the accumulator content for
the purpose of scaling.
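The idea of counting redundant leading bits and then shifting by that count can be sketched in Python. This model assumes that the value placed in T is the number of redundant sign bits of the 40-bit accumulator minus 8 (so that a positive count normalizes to 32 bits and a negative count means the guard bits are in use), and that a zero accumulator gives 0. That offset is an assumption of the sketch and should be verified against the instruction set reference. The function names are illustrative.

```python
MASK40 = (1 << 40) - 1

def exp_encode(acc40):
    """Return the shift count for normalizing a 40-bit accumulator value."""
    acc = acc40 & MASK40
    if acc == 0:
        return 0
    sign = (acc >> 39) & 1
    redundant = 0
    for bit in range(38, -1, -1):       # scan downward from below the sign bit
        if (acc >> bit) & 1 != sign:
            break
        redundant += 1
    return redundant - 8                # assumed offset for the 8 guard bits

def normalize(acc40):
    """Shift the accumulator by the encoded count; return (result, count)."""
    count = exp_encode(acc40)
    value = acc40 & MASK40
    if value & (1 << 39):
        value -= 1 << 40                # signed interpretation
    shifted = value << count if count >= 0 else value >> -count
    return shifted & MASK40, count

result, count = normalize(0xFFFFFFFFCB)
print(format(result, "010X"), count)
```

After normalization, bit 31 holds the sign and bit 30 holds the first significant bit of the value.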
3.3.3 Internal Memory and Memory-Mapped Registers:
The amount and the types of memory of a processor have direct
relevance to the efficiency and performance obtainable in
implementations with the processors.
The '54xx memory is organized into three individually selectable
spaces: program, data, and I/O spaces.
All '54xx devices contain both RAM and ROM. RAM can be either
dual-access type (DARAM) or single-access type (SARAM).
The on-chip RAM for these processors is organized in pages having
128 word locations on each page.
The '54xx processors have a number of CPU registers to support
operand addressing and computations.
The CPU registers and peripheral registers are all located on page 0
of the data memory.
Figures 3.5(a) and (b) show the internal CPU registers and peripheral
registers with their addresses. The processor mode status (PMST)
register is used to configure the processor. It is a memory-mapped
register located at address 1Dh on page 0 of the RAM.
A part of the on-chip ROM may contain a boot loader and look-up tables
for functions such as sine, cosine, µ-law, and A-law.
Figure 3.5(a) Internal memory-mapped registers of TMS320C54xx processors.
Figure 3.5(b) Peripheral registers of the TMS320C54xx processors.
Status registers (ST0, ST1):
ST0: Contains the status flags (OVA, OVB, C, TC) produced by
arithmetic operations and bit manipulations.
ST1: Contains the status of various conditions and modes. Bits of
the ST0 and ST1 registers can be set or cleared with the SSBX and RSBX
instructions.
PMST: Contains memory-setup status and control information.
Status register0 diagram:
IPTR: Interrupt vector pointer. The 9-bit IPTR field of PMST points to the 128-word
program page where the interrupt vectors reside.
MP/MC: Microprocessor/microcomputer mode. If MP/MC=0, the on-chip
ROM is enabled. If MP/MC=1, the on-chip ROM is not available.
OVLY: RAM overlay. OVLY enables on-chip dual-access data RAM
blocks to be mapped into program space.
OVLY=0: the on-chip RAM is addressable in data space but not in program
space.
OVLY=1: the on-chip RAM is mapped into data space and program space.
AVIS: Address visibility mode. It enables or disables visibility of the internal program
address at the address pins.
AVIS=0: the external address lines do not change with the internal program
address.
AVIS=1: the internal program address appears at the pins
of the '54x so that it can be traced.
DROM: Data ROM. DROM enables on-chip ROM to be mapped into data
space.
DROM=0: the on-chip ROM is not mapped into data space.
DROM=1: a portion of the on-chip ROM is mapped into data space.
CLKOFF: CLKOUT off. When the CLKOFF bit is 1, the output of
CLKOUT is disabled and remains at a high level.
SMUL: Saturation on multiplication.
SMUL=1: saturation of a multiplication result occurs before performing the
accumulation in a MAC instruction.
SST: Saturation on store.
SST=1: saturation of the data from the accumulator is enabled before storing
in memory. The saturation is performed after the shift operation.
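The PMST fields listed above can be unpacked with a small Python sketch. The bit positions used here (IPTR in bits 15-7, then MP/MC, OVLY, AVIS, DROM, CLKOFF, SMUL and SST in bits 6 down to 0) are an assumption of this sketch, since the register diagram is not reproduced in these notes; check them against the device data sheet. The function names are illustrative.

```python
def decode_pmst(pmst):
    """Return the PMST bit fields as a dictionary (assumed bit layout)."""
    return {
        "IPTR": (pmst >> 7) & 0x1FF,   # 9-bit interrupt vector pointer
        "MP/MC": (pmst >> 6) & 1,
        "OVLY": (pmst >> 5) & 1,
        "AVIS": (pmst >> 4) & 1,
        "DROM": (pmst >> 3) & 1,
        "CLKOFF": (pmst >> 2) & 1,
        "SMUL": (pmst >> 1) & 1,
        "SST": pmst & 1,
    }

def interrupt_vector_base(pmst):
    """Start address of the 128-word page that holds the interrupt vectors."""
    return decode_pmst(pmst)["IPTR"] << 7   # 128 words = 2**7 per page

fields = decode_pmst(0xFFE0)
print(fields)
print(hex(interrupt_vector_base(0xFFE0)))
```

Because each vector page is 128 words long, the 9-bit IPTR value supplies the upper address bits and the lower 7 bits select a location within the page.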
3.4 Data Addressing Modes of TMS320C54X Processors:
Data addressing modes provide various ways to access operands to execute
instructions and place results in the memory or the registers. The '54xx devices
offer seven basic addressing modes:
1. Immediate addressing.
2. Absolute addressing.
3. Accumulator addressing.
4. Direct addressing.
5. Indirect addressing.
6. Memory-mapped register addressing.
7. Stack addressing.
3.4.1 Immediate addressing:
The instruction contains the specific value of the operand. The operand can
be short (3, 5, 8, or 9 bits in length) or long (16 bits in length). An
instruction with a short operand occupies one memory location, and one with a
long operand occupies two memory locations. This addressing mode can
be used to initialize registers and memory locations.
Example: LD #20, DP
         RPT #0FFFFh
Table 3.2 Indirect addressing options with a single data-memory operand.
Operand syntax   Function
*(lk)            addr = lk
Name     Function
Opcode   This field contains the operation code for the instruction.
Xmod     Defines the type of indirect addressing mode used for accessing the Xmem operand.
Xar      Xmem AR selection field; defines the AR that contains the address of Xmem.
Ymod     Defines the type of indirect addressing mode used for accessing the Ymem operand.
Yar      Ymem AR selection field; defines the AR that contains the address of Ymem.
Figure 3.11 Block diagram of the indirect addressing options with a dual data-memory
operand.
3.4.6 Memory-Mapped Register Addressing:
Used to modify the memory-mapped registers without affecting the current data-page
pointer (DP) or stack pointer (SP).
– Overhead for writing to a register is minimal
– Works for direct and indirect addressing
– Scratch-pad RAM located on data page 0 can be modified
• STM #x, DIRECT
• STM #tbl, AR1
Solution:
AR3 = 1020h means that it currently points to location 1020h.
Setting the lower 6 bits to zero gives the start address of the buffer,
1000h. Replacing the same bits with BK gives the end address,
1040h.
The instruction LD *AR3+0%, A modifies AR3 by adding AR0
to it and applying the circular modification. It yields
AR3 = circ(1020h + 0025h) = circ(1045h) = 1045h - 40h = 1005h.
Thus location 1005h is the one pointed to by AR3.
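The circular update in this solution can be reproduced with a short Python sketch. It uses the values from the solution (AR3 = 1020h, AR0 = 0025h, BK = 40h). The number of low-order bits that form the index is derived here as the smallest N with 2^N > BK, which is an assumption of the sketch; for these values it gives the same buffer start address, 1000h. The function name `circular_update` is illustrative.

```python
def circular_update(ar, step, bk):
    """Return the auxiliary register value after a circular post-modify.

    ar:   current 16-bit auxiliary register value
    step: signed increment (for example the content of AR0)
    bk:   circular buffer size
    """
    n = bk.bit_length()            # smallest N with 2**N > bk (assumed rule)
    mask = (1 << n) - 1
    base = ar & ~mask & 0xFFFF     # buffer start: low N bits cleared
    index = (ar & mask) + step     # position inside the buffer after the step
    if index >= bk:
        index -= bk                # wrap past the end of the buffer
    elif index < 0:
        index += bk                # wrap past the start of the buffer
    return base | index

print(hex(circular_update(0x1020, 0x25, 0x40)))  # 0x1005
```

The subtraction of BK in the wrap branch is the same step as `1045h - 40h` in the worked solution.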
Presented by:
Bhargavi K Rao
Assistant Professor
Dept. of ECE
TEXT BOOK:
1. “Digital Signal Processing”, Avatar Singh and S. Srinivasan, Thomson
Learning, 2004.
Lesson Plan
TOPICS HOURS
Arithmetic operations. 1st
Load and store instructions. 2nd
Logical operations. 3rd
Program-control operations 4th
Programs 5th
Programs 6th
Programs 7th
On chip peripherals 8th
Pipeline operation 9th
Pipeline operation 10th
Contents
Arithmetic operations
Load and store instructions
Logical operations
Program-control operations
Programs
On chip peripherals
Pipeline operation
Assembly language instructions can be classified as:
Arithmetic operations.
Load and store instructions.
Logical operations.
Program-control operations
Arithmetic operations
BANZ: Branch on Auxiliary Register Not Zero
SSBX: Set Status Register Bit
STL: Store Accumulator Low Into Memory
STH : Store Accumulator High Into Memory
Example 3: Write a program to compute multiply and accumulate
using the indirect addressing mode.
Let auxiliary register AR2 address the data using the
indirect addressing mode.
AR2 is initialized to 310h, the location where x(n) is stored, and is
advanced to the next address after each multiply operation
[310h, 311h, 312h].
AR3 is used as the pointer to access the coefficients starting at h.
At the end of three multiply operations, AR2 points to 313h, the
address at which the lower 16 bits of y(n) are to be stored.
The higher 16 bits of y(n) are to be stored in 314h.
* y(n) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2)
* h(0), h(1) and h(2) are stored in data memory locations starting
at location h.
* x(n), x(n-1) and x(n-2) are stored in data memory locations 310h,
311h, and 312h respectively.
* y(n) is saved in data memory locations 313h and 314h.
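The data flow of this example can be traced with a Python sketch before writing the assembly program. The sketch models data memory as a dictionary, uses two pointer variables in place of AR2 and AR3 with post-increment, and stores the low and high words of the result the way STL and STH would. The coefficient address 400h is a hypothetical stand-in for the label h, and the sample values are made up for illustration.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a 2's complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def fir3(memory, x_addr=0x310, h_addr=0x400):
    """Compute y(n) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) with two pointers."""
    ar2, ar3 = x_addr, h_addr      # stand-ins for AR2 (data) and AR3 (coefficients)
    acc = 0
    for _ in range(3):
        acc += to_signed16(memory[ar2]) * to_signed16(memory[ar3])
        ar2 += 1                   # post-increment, as in *AR2+
        ar3 += 1                   # post-increment, as in *AR3+
    memory[ar2] = acc & 0xFFFF              # low word of y(n), location 313h
    memory[ar2 + 1] = (acc >> 16) & 0xFFFF  # high word of y(n), location 314h
    return acc

memory = {0x310: 1, 0x311: 2, 0x312: 3,     # x(n), x(n-1), x(n-2)
          0x400: 4, 0x401: 5, 0x402: 6}     # h(0), h(1), h(2)
y = fir3(memory)
print(y, hex(memory[0x313]), hex(memory[0x314]))
```

After the loop the data pointer has advanced to 313h, which is why the result can be stored without reloading the pointer.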
Write a program to compute multiply and accumulate using the
MAC instruction.
* y(n) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2)
* h(0), h(1) and h(2) are stored in program memory locations starting
at location h.
* x(n), x(n-1) and x(n-2) are stored in data memory locations starting at
x.
* y(n) is to be saved in locations y and y+1.
The MAC instruction multiplies the contents of two data-memory
locations and adds the result to the previous contents of the
accumulator being used (note that only auxiliary registers AR2-AR5
can be used). This instruction is repeated twice using the RPT
instruction.
After each MAC instruction, the auxiliary registers being
used should be incremented by 1.
Finally, the result is stored in the memory locations pointed to by “y”
using STL and STH.
On chip peripherals
On-chip peripherals facilitate interfacing with external devices.
The peripherals are:
General-purpose I/O pins
A software-programmable wait-state generator
Hardware timer
Host port interface (HPI)
Clock generator
Serial port
There are two general-purpose I/O pins:
BIO: input pin used to monitor the status of external devices.
XF: output pin, software controlled, used to signal external devices.
Software-programmable wait-state generator:
Extends external bus cycles by up to seven machine cycles.
Hardware Timer
An on-chip down counter.
Used to generate a signal to initiate an interrupt or any other process.
Consists of three memory-mapped registers:
The timer register (TIM)
Timer period register (PRD)
Timer control register (TCR)
and the following elements:
Prescaler block (PSC)
TDDR (timer divide-down ratio)
TIN & TOUT
The timer register (TIM) is a 16-bit memory-mapped register that
decrements at every pulse from the prescaler block (PSC).
The timer period register (PRD) is a 16-bit memory-mapped
register whose contents are loaded into the TIM whenever the TIM
decrements to zero or the device is reset (SRESET).
The timer can also be independently reset using the TRB signal.
The timer control register (TCR) is a 16-bit memory-mapped
register that contains status and control bits.
The table shows the functions of the various bits in the TCR.
When TRB is set, the TIM is loaded with the value in the
PRD, and the PSC is loaded with the value in TDDR. TRB is
always read as 0.
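The interaction of TIM, PRD, PSC and TDDR can be explored with a simplified Python model. It assumes that the PSC counts down once per clock cycle and that the TIM counts down once each time the PSC passes zero, so one timer event occurs every (TDDR + 1) x (PRD + 1) clock cycles. The exact cycle on which the hardware reloads and signals is not modelled, and the class and method names are illustrative.

```python
class Timer:
    """Simplified model of the on-chip timer (not cycle accurate)."""

    def __init__(self, prd, tddr):
        self.prd = prd & 0xFFFF    # 16-bit period register
        self.tddr = tddr & 0xF     # divide-down ratio (assumed 4 bits wide)
        self.interrupts = 0
        self.reload()

    def reload(self):
        """Effect of setting TRB: TIM is loaded from PRD, PSC from TDDR."""
        self.tim = self.prd
        self.psc = self.tddr

    def clock(self):
        """Advance the model by one clock cycle."""
        if self.psc > 0:
            self.psc -= 1          # prescaler still counting down
            return
        self.psc = self.tddr       # prescaler reached zero: reload it
        if self.tim > 0:
            self.tim -= 1          # one pulse to the main counter
        else:
            self.tim = self.prd    # main counter reached zero: reload
            self.interrupts += 1   # and signal a timer event

timer = Timer(prd=2, tddr=1)
for _ in range(60):
    timer.clock()
print(timer.interrupts)  # 10, since (1 + 1) * (2 + 1) = 6 cycles per event
```

Choosing PRD and TDDR for a desired event rate then amounts to factoring the required number of clock cycles per event into these two terms.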