Adsp-2136x 2137x 214xx PGR Rev2.4
Programming Reference
Includes ADSP-2136x, ADSP-2137x,
and ADSP-214xx SHARC Processors
Disclaimer
Analog Devices, Inc. reserves the right to change this product without
prior notice. Information furnished by Analog Devices is believed to be
accurate and reliable. However, no responsibility is assumed by Analog
Devices for its use; nor for any infringement of patents or other rights of
third parties which may result from its use. No license is granted by
implication or otherwise under the patent rights of Analog Devices, Inc.
PREFACE
Purpose of This Manual ............................................................ xxxiii
Intended Audience .................................................................... xxxiii
Manual Contents ...................................................................... xxxiv
What’s New in This Manual ...................................................... xxxvi
Technical Support ..................................................................... xxxvi
Supported Processors ................................................................ xxxvii
Product Information ................................................................ xxxvii
Analog Devices Web Site ................................................... xxxviii
EngineerZone .................................................................... xxxviii
Notation Conventions ............................................................... xxxix
Register Diagram Conventions ....................................................... xl
INTRODUCTION
SHARC Design Advantages ........................................................... 1-1
Architectural Overview ................................................................. 1-3
Processor Core ......................................................................... 1-3
Dual Processing Elements .................................................... 1-3
Program Sequence Control .................................................. 1-6
REGISTER FILES
Features ........................................................................................ 2-1
Functional Description ................................................................. 2-1
Core Register Classification ..................................................... 2-2
Register Types Overview ......................................................... 2-2
Data Registers ......................................................................... 2-5
Data Register Neighbor Pairing ............................................... 2-5
Complementary Data Register Pairs ......................................... 2-5
Data and Complementary Data Register Access Priorities ......... 2-6
Data and Complementary Data Register Transfers ................... 2-7
Data and Complementary Data Register Swaps ........................ 2-7
System Register Bit Manipulation ............................................ 2-8
Combined Data Bus Exchange Register ................................... 2-9
PX to DREG Transfers ..................................................... 2-10
Immediate 40-bit Data Register Load ................................ 2-11
PX to Memory Transfers ................................................... 2-11
PX to Memory LW Transfers ............................................. 2-12
Uncomplementary UREG to Memory LW Transfers .......... 2-13
PROCESSING ELEMENTS
Features ........................................................................................ 3-1
Functional Description ................................................................. 3-2
Single Cycle Processing ............................................................ 3-3
Data Forwarding in Processing Units ....................................... 3-3
Data Format for Computation Units ........................................ 3-4
Arithmetic Status ..................................................................... 3-4
Computation Status Update Priority ................................... 3-5
SIMD Computation and Status Flags .................................. 3-5
Arithmetic Logic Unit (ALU) ................................................... 3-5
Functional Description ....................................................... 3-6
ALU Instruction Types ........................................................ 3-7
Compare Accumulation Instruction ................................. 3-7
Fixed-to-Float Conversion Instructions ............................ 3-7
Fixed-to-Float Conversion Instructions with Scaling ........ 3-8
Reciprocal/Square Root Instructions ................................ 3-8
Divide Instruction ........................................................... 3-8
Clip Instruction .............................................................. 3-8
Multiprecision Instructions ............................................. 3-8
PROGRAM SEQUENCER
Features ........................................................................................ 4-1
Functional Description ................................................................. 4-4
Instruction Pipeline ................................................................ 4-5
VISA Instruction Alignment Buffer (IAB) ........................... 4-7
Linear Program Flow .......................................................... 4-8
Direct Addressing ............................................................... 4-9
Variation In Program Flow .......................................................... 4-10
Functional Description ......................................................... 4-10
Hardware Stacks ............................................................... 4-10
PC Stack Access ............................................................ 4-12
PC Stack Status ............................................................ 4-12
PC Stack Manipulation ................................................. 4-13
PC Stack Access Priorities ............................................. 4-13
Status Stack Access ........................................................ 4-14
Status Stack Status ........................................................ 4-15
Instruction Driven Branches ............................................. 4-15
Direct Versus Indirect Branches ......................................... 4-17
Restrictions for VISA Operation ................................... 4-18
Delayed Branches (DB) ................................................. 4-19
Branch Listings ............................................................. 4-19
TIMER
Features ........................................................................................ 5-1
Functional Description ................................................................. 5-1
Timer Interrupts ........................................................................... 5-4
MEMORY
Features ........................................................................................ 7-1
Von Neumann Versus Harvard Architectures .................................. 7-2
Super Harvard Architecture ..................................................... 7-2
Functional Description ................................................................. 7-4
Address Decoding of Memory Space ........................................ 7-4
I/O Processor Space ................................................................. 7-5
IOP Peripheral Registers ...................................................... 7-6
IOP Core Registers ............................................................. 7-7
Writes to IOP Peripheral Registers ....................................... 7-7
Back to Back Writes to IOP Peripheral Registers .............. 7-8
Alternate Writes to IOP Peripheral Registers .................... 7-8
COMPUTATION TYPES
ALU Fixed-Point Computations .................................................. 11-1
Rn = Rx + Ry ........................................................................ 11-2
Rn = Rx – Ry ........................................................................ 11-3
Rn = Rx + Ry + CI ................................................................ 11-4
Rn = Rx – Ry + CI – 1 ........................................................... 11-5
Rn = MRF + Rx * Ry (mod1)
Rn = MRB + Rx * Ry (mod1)
MRF = MRF + Rx * Ry (mod1)
MRB = MRB + Rx * Ry (mod1) ............................................. 11-51
Rn = MRF – Rx * Ry (mod1)
Rn = MRB – Rx * Ry (mod1)
MRF = MRF – Rx * Ry (mod1)
MRB = MRB – Rx * Ry (mod1) ............................................. 11-52
Rn = SAT MRF (mod2)
Rn = SAT MRB (mod2)
MRF = SAT MRF (mod2)
MRB = SAT MRB (mod2) ..................................................... 11-53
Rn = RND MRF (mod3)
Rn = RND MRB (mod3)
MRF = RND MRF (mod3)
MRB = RND MRB (mod3) ................................................... 11-54
MRF = 0
MRB = 0 ............................................................................... 11-55
MRxF/B = Rn
Rn = MRxF/B ........................................................................ 11-56
Multiplier Floating-Point Computations ................................... 11-57
Fn = Fx * Fy ....................................................................... 11-57
Shifter/Shift Immediate Computations ...................................... 11-58
Modifiers ............................................................................ 11-58
Rn = LSHIFT Rx BY Ry
Rn = LSHIFT Rx BY <data8> ................................................ 11-59
Rn = Rn OR LSHIFT Rx BY Ry
Rn = Rn OR LSHIFT Rx BY <data8> .................................... 11-60
Rn = ASHIFT Rx BY Ry
Rn = ASHIFT Rx BY <data8> ................................................ 11-61
Rn = Rn OR ASHIFT Rx BY Ry
Rn = Rn OR ASHIFT Rx BY <data8> .................................... 11-62
Rn = ROT Rx BY Ry
Rn = ROT Rx BY <data8> ...................................................... 11-63
Rn = BCLR Rx BY Ry
Rn = BCLR Rx BY <data8> .................................................... 11-64
Rn = BSET Rx BY Ry
Rn = BSET Rx BY <data8> ..................................................... 11-65
Rn = BTGL Rx BY Ry
Rn = BTGL Rx BY <data8> .................................................... 11-66
BTST Rx BY Ry
BTST Rx BY <data8> ............................................................. 11-67
Rn = FDEP Rx BY Ry
Rn = FDEP Rx BY <bit6>:<len6> ........................................... 11-68
Rn = Rn OR FDEP Rx BY Ry
Rn = Rn OR FDEP Rx BY <bit6>:<len6> ............................... 11-70
Rn = FDEP Rx BY Ry (SE)
Rn = FDEP Rx BY <bit6>:<len6> (SE) ................................... 11-72
Rn = Rn OR FDEP Rx BY Ry (SE)
Rn = Rn OR FDEP Rx BY <bit6>:<len6> (SE) ....................... 11-74
Rn = FEXT Rx BY Ry
Rn = FEXT Rx BY <bit6>:<len6> ........................................... 11-76
Rn = FEXT Rx BY Ry (SE)
Rn = FEXT Rx BY <bit6>:<len6> (SE) ................................... 11-78
Rn = EXP Rx ....................................................................... 11-80
Rn = EXP Rx (EX) .............................................................. 11-81
REGISTERS
Notes on Reading Register Drawings ............................................ A-2
Mode Control 1 Register (MODE1) ............................................. A-3
Mode Control 2 Register (MODE2) ............................................. A-7
Program Sequencer Registers ........................................................ A-8
Fetch Address Register (FADDR) ............................................ A-9
Decode Address Register (DADDR) ........................................ A-9
Program Counter Register (PC) ............................................ A-10
Program Counter Stack Register (PCSTK) ............................ A-10
Program Counter Stack Pointer Register (PCSTKP) .............. A-11
Loop Registers ........................................................................... A-11
Loop Address Stack Register (LADDR) ................................. A-11
Loop Counter Register (LCNTR) ......................................... A-12
Current Loop Counter Register (CURLCNTR) .................... A-12
NUMERIC FORMATS
IEEE Single-Precision Floating-Point Data Format ........................ C-1
Extended-Precision Floating-Point Format .................................... C-3
Short Word Floating-Point Format ............................................... C-4
Packing for Floating-Point Data ................................................... C-4
Fixed-Point Formats ..................................................................... C-6
GLOSSARY
INDEX
Thank you for purchasing and developing systems using SHARC® processors
from Analog Devices, Inc.
Intended Audience
The primary audience for this manual is a programmer who is familiar
with Analog Devices processors. The manual assumes the audience has a
working knowledge of the appropriate processor architecture and instruction
set. Programmers who are unfamiliar with Analog Devices processors
can use this manual, but should supplement it with other texts, such as
hardware and programming reference manuals that describe their target
architecture.
Manual Contents
This manual provides detailed information about the SHARC processor
family in the following chapters. Please note that there are differences in
this section from previous manual revisions.
• Chapter 1, “Introduction”
Provides an architectural overview of the SHARC processors.
• Chapter 2, “Register Files”
Describes the core register files including the data exchange register
(PX).
• Chapter 3, “Processing Elements”
Describes the arithmetic/logic units (ALUs), multiplier/accumulator
units, and shifter. The chapter also discusses data formats, data
types, and register files.
• Chapter 4, “Program Sequencer”
Describes the operation of the program sequencer, which controls
program flow by providing the address of the next instruction to be
executed. The chapter also discusses loops, subroutines, jumps,
interrupts, exceptions, and the IDLE instruction.
• Chapter 5, “Timer”
Describes the operation of the processor’s core timer.
• Chapter 6, “Data Address Generators”
Describes the Data Address Generators (DAGs), addressing modes,
how to modify DAG and pointer registers, memory address alignment,
and DAG instructions.
• Chapter 7, “Memory”
Describes aspects of processor memory including internal memory,
address and data bus structure, and memory accesses.
Technical Support
You can reach Analog Devices processors and DSP technical support in
the following ways:
• Post your questions in the processors and DSP support community
at EngineerZone®:
http://ez.analog.com/community/dsp
Supported Processors
The name “SHARC” refers to a family of high-performance, floating-point
embedded processors. Refer to the CCES or VisualDSP++ online help for
a complete list of supported processors.
Product Information
Product information can be obtained from the Analog Devices Web site
and the CCES or VisualDSP++ online help.
EngineerZone
EngineerZone is a technical support forum from Analog Devices, Inc. It
allows you direct access to ADI technical support engineers. You can
search FAQs and technical information to get quick answers to your
embedded processing and DSP design questions.
Use EngineerZone to connect with other DSP developers who face similar
design challenges. You can also use this open forum to share knowledge
and collaborate with the ADI support team and your peers. Visit
http://ez.analog.com to sign up.
Notation Conventions
Text conventions in this manual are identified and described as follows.
Example          Description

File > Close     Titles in reference sections indicate the location of an item
                 within the IDE environment’s menu system (for example, the
                 Close command appears on the File menu).

{this | that}    Alternative required items in syntax descriptions appear within
                 curly brackets and separated by vertical bars; read the example
                 as this or that. One or the other is required.

[this | that]    Optional items in syntax descriptions appear within brackets
                 and separated by vertical bars; read the example as an optional
                 this or that.

[this,…]         Optional item lists in syntax descriptions appear within
                 brackets delimited by commas and terminated with an ellipsis;
                 read the example as an optional comma-separated list of this.

.SECTION         Commands, directives, keywords, and feature names are in text
                 with letter gothic font.

Caution          Caution: Device damage may result if ...
                 A Caution identifies conditions or inappropriate usage of the
                 product that could lead to undesirable results or product
                 damage. In the online version of this book, the word Caution
                 appears instead of this symbol.

Warning          Warning: Injury to device users may result if ...
                 A Warning identifies conditions or inappropriate usage of the
                 product that could lead to conditions that are potentially
                 hazardous for device users. In the online version of this book,
                 the word Warning appears instead of this symbol.
[Figure: example register diagram showing a 16-bit register, bits 15 through 0, all cleared at reset (Reset = 0x0000).]
4. Dual address generators. The processor has two data address generators
(DAGs) that provide immediate or indirect (pre- and post-modify)
addressing. Modulus, bit-reverse, and broadcast operations are supported
with no constraints on data buffer placement.
5. Efficient program sequencing. In addition to zero-overhead loops,
the processor supports single-cycle setup and exit for loops. Loops
are both nestable (six levels in hardware) and interruptible. The
processors support both delayed and non-delayed branches.
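The modulus (circular buffer) addressing mentioned in item 4 can be pictured with a small C model. This is only a sketch under stated assumptions: the struct, its field names, and the function `dag_post_modify` are hypothetical, the fields loosely mirror the index, modify, length, and base roles of a DAG register set, and the wrap logic assumes the modify value is smaller in magnitude than the buffer length. It is not a description of the hardware implementation.

```c
#include <stdint.h>

/* Hypothetical model of one DAG register set (names are illustrative only). */
typedef struct {
    uint32_t i;  /* index: address used for the current access */
    int32_t  m;  /* modify: signed step applied after the access */
    uint32_t l;  /* length of the circular buffer; 0 selects linear addressing */
    uint32_t b;  /* base address of the circular buffer */
} dag_regs;

/* Return the address for this access, then post-modify the index.
 * Assumption: |m| < l, so a single correction brings the index back in range. */
uint32_t dag_post_modify(dag_regs *d)
{
    uint32_t addr = d->i;
    int64_t next = (int64_t)d->i + d->m;

    if (d->l != 0) {
        if (next >= (int64_t)d->b + d->l) {
            next -= d->l;            /* ran past the end: wrap to the start */
        } else if (next < (int64_t)d->b) {
            next += d->l;            /* ran before the base: wrap to the end */
        }
    }
    d->i = (uint32_t)next;
    return addr;
}
```

With a three-word buffer and a modify value of 1, successive calls return the base address, base + 1, base + 2, and then the base address again.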
Architectural Overview
The SHARC processors form a complete system-on-a-chip, integrating a
large, high speed SRAM and I/O peripherals supported by I/O buses. The
following sections summarize the features of each functional block.
Processor Core
The processor core consists of two processing elements (each with three
computation units and a data register file), a program sequencer, two
DAGs, a timer, and an instruction cache. All processing occurs in the
processor core. The following list and Figure 1-1 describe some of the
features of the SHARC core processor.
[Figure 1-1: block diagram of the SHARC core. Blocks shown: 5-stage program sequencer; DAG1 and DAG2 (16x32); PM and DM address buses (32 bits); PM and DM data buses (64 bits); system interface; USTAT (4x32-bit); PX (64-bit); processing elements PEx and PEy, each with a 16x40-bit register file (Rx/Fx and Sx/SFx, with a swap path between them), ALU, multiplier, and shifter; STKYx and STKYy.]
generate an interrupt and asserts their timer expired output. The count
register is automatically reloaded from a 32-bit period register and the
countdown resumes immediately.
Instruction cache. The program sequencer includes a 32-word instruction
cache that effectively provides three-bus operation for fetching an
instruction and two data values. The cache is selective; only instructions
whose fetches conflict with data accesses using the PM bus are cached.
This caching allows full speed execution of core, looped operations such as
digital filter multiply-accumulates, and FFT butterfly processing. For
more information on the cache, refer to “Operating Modes” on page 4-88.
Data bus exchange. The data bus exchange (PX) register permits data to be
passed between the 64-bit PM data bus and the 64-bit DM data bus, or
between the 40-bit register file and the PM/DM data bus. These registers
contain hardware to handle the data width difference. For more
information, see “Register Files” on page 2-1.
JTAG Port
The JTAG port supports the IEEE 1149.1 Joint Test Action Group (JTAG)
standard for system test. This standard defines a method for serially
scanning the I/O status of each component in a system. Emulators use the
JTAG port to monitor and control the processor during emulation.
Emulators using this port provide full speed emulation with
access to inspect and modify memory, registers, and processor stacks.
JTAG-based emulation is non-intrusive and does not affect target system
loading or timing.
Core Buses
The processor core has two buses—PM data and DM data. The PM bus is
used to fetch instructions from memory, but may also be used to fetch
data. The DM bus can only be used to fetch data from memory. In conjunction
with the cache, this Super Harvard Architecture allows the core
to fetch an instruction and two pieces of data in the same cycle that a data
I/O Buses
The I/O buses are used solely by the IOP to facilitate DMA transfers.
These buses give the I/O processor access to internal memory for DMA
without delaying the processor core (in the absence of memory block con-
flicts). One of the I/O buses is used for all peripherals (SPORT, SPI, IDP,
UART, TWI etc.) while the second I/O bus is only used for the external
port. The address bus is 19 bits wide, and both I/O data buses are 32 bits
wide.
Bus capacities. The PM and DM address buses are both 32 bits wide,
while the PM and DM data buses are both 64 bits wide.
These two buses provide a path for the contents of any register in the
processor to be transferred to any other register or to any data memory
location in a single cycle. When fetching data over the PM or DM bus, the
address comes from one of two sources: an absolute value specified in the
instruction (direct addressing) or the output of a data address generator
(indirect addressing). These two buses share the same port of the memory.
Each of the four memory blocks can be accessed by either of the two
dedicated core and I/O buses, assuming the accesses are conflict free.
Data transfers. Nearly every register in the processor core is classified as a
universal register (Ureg). Instructions allow the transfer of data between
any two universal registers or between a universal register and memory.
This support includes transfers between control registers, status registers,
and data registers in the register file. The bus connect (PX) registers permit
data to be passed between the 64-bit PM data bus and the 64-bit DM data
bus, or between the 40-bit register file and the PM/DM data bus. These
registers contain hardware to handle the data width difference. For more
information, see “Processing Element Registers” on page A-14.
GPIO Flags 4 11 15 15
Data Sizes
64-bit (LW) No Yes Yes Yes
48-bit (NW) Yes Yes Yes Yes
40-bit (NW) Yes Yes Yes Yes
32-bit (NW) Yes Yes Yes Yes
16-bit (SW) Yes Yes Yes Yes
Development Tools
The processor is supported by a complete set of software and hardware
development tools, including Analog Devices’ emulators and the CrossCore
Embedded Studio or VisualDSP++ development environment. (The
emulator hardware that supports other Analog Devices processors also
emulates the processor.)
The development environments support advanced application code
development and debug with features such as:
• Create, compile, assemble, and link application programs written
in C++, C, and assembly
• Load, run, step, halt, and set breakpoints in application programs
• Read and write data and program memory
• Read and write core and peripheral registers
• Plot memory
Analog Devices DSP emulators use the IEEE 1149.1 JTAG test access
port to monitor and control the target board processor during emulation.
The emulator provides full speed emulation, allowing inspection and
modification of memory, registers, and processor stacks. Nonintrusive
in-circuit emulation is assured by the use of the processor JTAG
interface; the emulator does not affect target system loading or timing.
Software tools also include Board Support Packages (BSPs). Hardware
tools also include standalone evaluation systems (boards and extenders). In
addition to the software and hardware development tools available from
Analog Devices, third parties provide a wide range of tools supporting the
SHARC processors. Third party software tools include DSP libraries,
real-time operating systems, and block diagram design tools.
Features
The register files have the following features.
• The non memory-mapped registers are called universal registers
and can be used by almost all instructions
• Data registers are used for computation units
• Complementary data registers are used for the complementary
computation units
• System registers are used for bit manipulation
Functional Description
The following sections provide a functional description of the register
files.
ASTATx Element x arithmetic status flags, bit test flag, and so on.
csreg ASTATy Element y arithmetic status flags, bit test flag, and so on.
Data Registers
Each of the processor’s processing elements has a data register file, which
is a set of data registers that transfers data between the data buses and the
computational units. These registers also provide local storage for
operands and results.
Each of the two register files consists of 16 primary registers and 16
alternate (secondary) registers. The data registers are 40 bits wide. Within these
registers, 32-bit data is left-justified. If an operation specifies a 32-bit data
transfer to these 40-bit registers, the eight LSBs are ignored on register
reads, and the LSBs are cleared to zeros on writes.
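The left-justified placement of 32-bit data in a 40-bit register can be modeled in a few lines of C. This is a minimal sketch, assuming the 40-bit register is held in the low 40 bits of a `uint64_t`; the helper names `dreg_write32` and `dreg_read32` are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* A 40-bit data register modeled in the low 40 bits of a 64-bit integer. */
#define DREG_MASK 0xFFFFFFFFFFull

/* 32-bit write: the value occupies bits 39-8 and the eight LSBs are cleared. */
uint64_t dreg_write32(uint32_t value)
{
    return ((uint64_t)value << 8) & DREG_MASK;
}

/* 32-bit read: bits 39-8 are returned and the eight LSBs are ignored. */
uint32_t dreg_read32(uint64_t dreg)
{
    return (uint32_t)((dreg & DREG_MASK) >> 8);
}
```

In this model, writing 0xABBAABBA produces the 40-bit value 0xABBAABBA00, and a 32-bit read of 0xABBAABBA7F returns 0xABBAABBA because the low byte is dropped.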
Program memory data accesses and data memory accesses to and from the
register file(s) occur on the PM data (PMD) bus and DM data (DMD)
bus, respectively. One PMD bus access for each processing element and/or
one DMD bus access for each processing element can occur in one cycle.
Transfers between the register files and the DMD or PMD buses can
move up to 64 bits of valid data on each bus.
Note that 16 data registers are sufficient to store the intermediate results of
an FFT radix-4 butterfly stage.
the two processing elements. Identical instructions execute on the PEx and
PEy units; the difference is the data. The data registers for PEy operations
are identified (implicitly) from the PEx registers in the instruction. This
implicit relationship between PEx and PEy data registers corresponds to
the complementary register pairs in Table 2-3.
R0 R1 S0 S1
R2 R3 S2 S3
R4 R5 S4 S5
R6 R7 S6 S7
R8 R9 S8 S9
1 For fixed-point operations, the prefixes are Rx (PEx) or Sx (PEy). For floating-point operations,
the prefixes are Fx (PEx) or SFx (PEy)
R7 <-> S7;
R2 <-> S0;
R1 = MODE1;
R1 = BSET R1 by 21; /* sets PEYEN bit */
R1 = BSET R1 by 24; /* sets CBUFEN bit */
MODE1 = R1;
R1 = dm(SYSCTL);
R1 = BSET R1 by 11; /* sets IMDW2 bit 11 */
R1 = BSET R1 by 12; /* sets IMDW3 bit 12 */
dm(SYSCTL) = R1;
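BTST R1 by 11; /* bit 11 is set, so the SZ bit is cleared */
IF SZ jump func; /* jump is taken only if the tested bit is zero */
BTST R1 by 12; /* bit 12 is set, so the SZ bit is cleared */
IF SZ jump func;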
The core has four user status registers (USTAT4–1) also classified as system
registers but for general-purpose use. These registers allow flexible
manipulation/testing of single or multiple individual bits in a register without
affecting neighbor bits as shown in the following example.
USTAT1= dm(SYSCTL);
BIT SET USTAT1 IMDW2|IMDW3; /* sets bits 12-11 */
dm(SYSCTL)=USTAT1;
USTAT1= dm(SYSCTL);
BIT TST USTAT1 IMDW2|IMDW3; /* test bits 12-11 */
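The effect of the bit operations in the example above can be mirrored in C. This is an illustrative sketch only: the macros use the bit positions 11 and 12 given in the example comments, the function names `bit_set` and `bit_tst` are hypothetical, and the test result is assumed to be true only when every bit in the mask is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the comments in the example above. */
#define IMDW2 (1u << 11)
#define IMDW3 (1u << 12)

/* Model of a bit-set operation: only the masked bits change. */
uint32_t bit_set(uint32_t reg, uint32_t mask)
{
    return reg | mask;
}

/* Model of a bit-test operation.
 * Assumption: the result is true only if all bits in the mask are set. */
bool bit_tst(uint32_t reg, uint32_t mask)
{
    return (reg & mask) == mask;
}
```

Starting from a register value of 0x00000001, setting IMDW2|IMDW3 gives 0x00001801; bit 0 is untouched, which is the "without affecting neighbor bits" property described above.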
[Figure: layout of the PX register, showing PX2 and PX1 as 32-bit registers (bits 31-0 each) within the combined PX register.]
The USTAT4-1 and PX2-1 registers allow load and store operations from
memory. However, direct computations using universal registers are not
supported, so the data must first be moved to a data register.
The alignment of PX1 and PX2 within PX appears in Figure 2-2. The combined
PX register is a universal register (UREG) that is accessible for
register-to-register or memory-to-register transfers.
PX to DREG Transfers
The PX register to data register transfers are either 40-bit transfers for the
combined PX or 32-bit transfers for PX1 or PX2. Figure 2-2 shows the bit
alignment and gives an example of instructions for register-to-register
transfers. The figure also shows that during a transfer between PX1 or PX2 and
a data register (Dreg), the bus transfers the upper 32 bits of the register file and
zero-fills the eight least significant bits (LSBs). During a transfer between
the combined PX register and a register file, the bus transfers the upper 40
bits of PX and zero-fills the lower 24 bits.
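The alignment rules for transfers between PX and a data register can be written out as a small C model. This is a sketch under stated assumptions: the combined PX register is modeled as a `uint64_t`, a 40-bit data register is held in the low 40 bits of a `uint64_t`, and the function names are hypothetical. It illustrates the bit alignment described above, not the bus hardware.

```c
#include <stdint.h>

/* Combined PX to data register: the upper 40 bits of PX (bits 63-24)
 * become the 40-bit register value. */
uint64_t px_to_dreg(uint64_t px)
{
    return px >> 24;
}

/* Data register to combined PX: the 40 bits land in the upper 40 bits
 * of PX and the lower 24 bits are zero-filled. */
uint64_t dreg_to_px(uint64_t dreg)
{
    return (dreg & 0xFFFFFFFFFFull) << 24;
}

/* PX1 or PX2 to data register: the 32 bits occupy the upper 32 bits of
 * the 40-bit register and the eight LSBs are zero-filled. */
uint64_t px32_to_dreg(uint32_t half)
{
    return (uint64_t)half << 8;
}
```

For example, a combined PX value of 0x123456789ABCDEF0 yields the 40-bit register value 0x123456789A in this model, and moving that value back gives 0x123456789A000000.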
All transfers between the PX register (or any other internal register or
memory) and any I/O processor register are 32-bit transfers (least
PX to Memory Transfers
The PX register-to-internal memory transfers over the DMD or PMD bus
are either 48-bit transfers for the combined PX or 32-bit transfers (on bits
31-0 of the bus) for PX1 or PX2. Figure 2-3 shows these transfers.
Figure 2-3 also shows that during a transfer between PX1 or PX2 and internal
memory, the bus transfers the lower 32 bits of the register. During a
transfer between the combined PX register and internal memory, the bus
transfers the upper 48 bits of PX and zero-fills the lower 16 bits.
PX to Memory LW Transfers
Figure 2-4 shows the transfer size between PX and internal memory over
the PMD or DMD bus when using the long word (LW) option.
The LW notation in Figure 2-4 shows an important feature of PX
register-to-internal memory transfers over the PM or DM data bus for the
combined PX register. The PX register transfers to memory are 48-bit
(three column) transfers on bits 63-16 of the PM or DM data bus, unless a
long word transfer is used, or the transfer is forced to be 64-bit (four
column) with the LW (long word) mnemonic.
The LW mnemonic affects data accesses that use the NW (normal word)
addresses irrespective of the settings of the PEYEN (processor element Y
enable) and IMDWx (internal memory data width) bits.
PX = PM (0xB8000)(LW);
[Figure 2-4: a DM (LW) or PM (LW) data bus transfer moves 64 bits (bits 63-0) between the data bus and the combined PX register.]
I0 = 0X4F800;
M0 = 0X1;
L0 = 0x0;
DM(I0,M0) = 0xabbaabba;
Operating Modes
The following sections detail the operation of the register files.
Note that there is a one-cycle latency from the time when writes are
made to the MODE1 register until an alternate register set can be
accessed.
The alternate register sets for data and results are shown in Figure 2-5. For
more information on alternate data address generator registers, see
“Alternate (Secondary) DAG Registers” on page 6-28. Bits in the MODE1 register
can activate independent alternate data register sets: the lower half (R0–R7)
and the upper half (R8–R15). To share data between contexts, a program
places the data to be shared in one half of either the current
processing element’s register file or the opposite processing element’s
register file and activates the alternate register set of the other half. For
information on how to activate alternate data registers, see the description
of the MODE1 register below. The register files consist of a primary set of 16
x 40-bit registers and an alternate set of 16 x 40-bit registers.
[Figure 2-5: data register files. PEx holds R0–R15 (Rx/Fx, 16x40-bit) and PEy holds S0–S15 (Sx/SFx, 16x40-bit), with a swap path between them. The PEx unit registers are available in SISD mode; the PEy unit registers are available in SIMD mode.]
• Transfers between DAG and other system registers and the PX reg-
ister as shown in the following example:
I0 = PX; /* Moves PX1 to I0 */
PX = I0; /* Loads both PX1 and PX2 with I0 */
LCNTR = PX; /* Loads LCNTR with PX1 */
PX = PC; /* Loads both PX1 and PX2 with PC */
PX = USTAT1;
[Figure: transfer between USTAT1/USTAT2 and PX1/PX2, 32 bits (bits 31-0) each.]
The PEx and PEy processing elements perform numeric processing for
processor algorithms. Each element contains a data register file and three
computation units—an arithmetic/logic unit (ALU), a multiplier, and a
barrel shifter. Computational instructions for these elements include both
fixed-point and floating-point operations, and each computational
instruction executes in a single cycle.
Features
The processing elements have the following features.
• Data Formats. The units support 32-bit fixed-point data, 32-bit
single-precision IEEE floating-point data, and 40-bit extended-precision
floating-point data.
• Arithmetic/logic unit. The ALU performs arithmetic and logic
operations on fixed-point and floating-point data.
• Multiplier. The multiplier performs floating-point and fixed-point
multiplication and executes fixed-point multiply/add and multi-
ply/subtract operations.
• Barrel Shifter. The barrel shifter performs bit shifts and bit,
bit field, and bit stream manipulation on 32-bit operands. The
shifter can also derive exponents.
• Multifunction. The ALU and Multiplier support simultaneous
operations for fixed- and floating-point data formats. The
fixed-point multiplier can return results as 32 or 80 bits.
Functional Description
The computational units in a processing element handle different types of
operations.
Data flow paths through the computation units are arranged in parallel, as
shown in Figure 3-1. The output of any computation unit may serve as
the input of any computation unit on the next instruction cycle. Data
moving in and out of the computation units goes through a 10-port regis-
ter file, consisting of 16 primary and 16 alternate registers. Two ports on
the register file connect to the PM and DM data buses, allowing data
transfers between the computation units and memory (and anything else)
connected to these buses.
[Figure 3-1: PEx computation units. Register file (Rx/Fx, 16 x 40-bit),
multiplier with 80-bit MRF and MRB result registers, shifter, and ALU,
with the ASTATx and STKYx status registers.]
The next example shows the same operation without data forwarding.
R5=dm(i2,m2); /* DAG memory load */
Nop;
R5=R5+1; /* r5 used for ALU */
Arithmetic Status
The multiplier and ALU each provide exception information when exe-
cuting floating-point or fixed-point operations (see Table 3-10 on
page 3-43 and Table 3-11 on page 3-44). Each unit updates overflow,
underflow, and invalid operation flags in the processing element’s arith-
metic status (ASTATx and ASTATy) registers and sticky status (STKYx and
STKYy) registers. An underflow, overflow, or invalid operation from any
unit also generates a maskable interrupt. There are three ways to use float-
ing-point or fixed-point exceptions from computations in program
sequencing.
• Enable interrupts and use an interrupt service routine (ISR) to han-
dle the exception condition immediately. This method is
appropriate if it is important to correct all exceptions as they occur.
• Use conditional instructions to test the exception flags in the
ASTATx or ASTATy registers after the instruction executes. This
method permits monitoring each instruction’s outcome.
• Use the bit test (BTST) instruction to examine exception flags in the
STKY register after a series of operations. If any flags are set, some of
the results are incorrect. Use this method when exception handling
is not critical.
Functional Description
ALU instructions take one or two inputs: X input and Y input. These
inputs (known as operands) can be any data registers in the register file.
Most ALU operations return one result. However, in add/subtract opera-
tions, the ALU operation returns two results and in compare operations
the ALU returns no result (only flags are updated). ALU results can be
returned to any location in the register file.
If the ALU operation is fixed-point, the inputs are treated as 32-bit
fixed-point operands. The ALU transfers the upper 32 bits from the
source location in the register file. For fixed-point operations, the result(s)
are 32-bit fixed-point values. Some floating-point operations (LOGB, MANT
and FIX) can also yield fixed-point results.
The processor transfers fixed-point results to the upper 32 bits of the data
register and clears the lower eight bits of the register. The format of
fixed-point operands and results depends on the operation. In most arith-
metic operations, there is no need to distinguish between integer and
fractional formats. Fixed-point inputs to operations such as scaling a float-
ing-point value are treated as integers. For purposes of determining status
such as overflow, fixed-point arithmetic operands and results are treated as
two’s-complement numbers.
Bits 31–24 in the ASTATx/y registers store the flag results of up to eight
ALU compare operations. These bits form a right-shift register. When the
processor executes an ALU compare operation, it shifts the eight bits
toward the LSB (bit 24 is lost). Then it writes the MSB, bit 31, with the
result of the compare operation. If the X operand is greater than the Y
operand in the compare instruction, the processor sets bit 31. Otherwise,
it clears bit 31.
Applications can use the accumulated compare flags to implement two-
and three-dimensional clipping operations.
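The compare accumulation behavior described above can be sketched in a few lines of Python. This is only a behavioral model, not processor code: the function name update_cacc and the use of plain integers for ASTAT and the operands are assumptions made for illustration.

```python
def update_cacc(astat: int, x: int, y: int) -> int:
    """Model one ALU compare: shift ASTAT bits 31-24 right, then set bit 31 if x > y."""
    cacc = (astat >> 24) & 0xFF      # the eight accumulated compare flags
    cacc >>= 1                       # shift toward the LSB; the old bit 24 is lost
    if x > y:
        cacc |= 0x80                 # newest result enters at bit 31
    return (astat & 0x00FFFFFF) | (cacc << 24)


astat = 0
for x, y in [(5, 3), (1, 2), (7, 7), (9, 0)]:
    astat = update_cacc(astat, x, y)

# The newest compare result is the leftmost bit of the printed field.
print(format(astat >> 24, "08b"))
```

The lower 24 bits of the modeled ASTAT value are left untouched, so the sketch only illustrates the shift-register behavior of bits 31–24.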
The ALU supports conversion between floating and fixed point as shown
in the following example.
Fn = FLOAT Rx; /* floating-point */
Rn = FIX Fx; /* fixed-point */
Divide Instruction
Clip Instruction
The clip instruction (CLIP) is very similar to the multiplier saturate
(SAT) instruction. However, the clipping (saturation) level is an
operand within the instruction.
Rn = CLIP Rx by Ry; /* clip level stored in Ry register */
Multiprecision Instructions
The add with carry and the subtract with borrow instructions allow the
implementation of 64-bit operations.
Arithmetic Status
ALU operations update seven status flags in the processing element’s arith-
metic status (ASTATx and ASTATy) registers. The following bits in ASTATx or
ASTATy registers flag the ALU status (a 1 indicates the condition) of the
most recent ALU operation.
• ALU result zero or floating-point underflow, (AZ)
• ALU overflow, (AV)
• ALU result negative, (AN)
• ALU fixed-point carry, (AC)
• ALU input sign for ABS, MANT operations, (AS)
• ALU floating-point invalid operation, (AI)
• Last ALU operation was a floating-point operation, (AF)
• Compare accumulation register results of last eight compare opera-
tions, (CACC)
ALU operations also update four sticky status flags in the processing ele-
ment’s sticky status (STKYx and STKYy) registers. The following bits in
STKYx or STKYy flag the ALU status (a 1 indicates the condition). Once set,
a sticky flag remains high until explicitly cleared.
• ALU floating-point underflow, (AUS)
• ALU floating-point overflow, (AVS)
Multiplier
The multiplier performs fixed-point or floating-point multiplication and
fixed-point multiply/accumulate operations. Fixed-point multiply/accu-
mulates are available with cumulative addition or cumulative subtraction.
Multiplier floating-point instructions operate on 32-bit or 40-bit float-
ing-point operands and output 32-bit or 40-bit floating-point results.
Multiplier fixed-point instructions operate on 32-bit fixed-point data and
produce 80-bit results. Inputs are treated as fractional or integer, unsigned
or two’s-complement. Multiplier instructions include:
• Floating-point multiplication
• Fixed-point multiplication
• Fixed-point multiply/accumulate with addition, rounding optional
• Fixed-point multiply/accumulate with subtraction, rounding
optional
• Rounding multiplier result register
• Saturating multiplier result register
• Fixed point multi-precision arithmetic (signed/signed, unsigned/
unsigned or unsigned/signed options)
Functional Description
The multiplier takes two inputs, X and Y. These inputs (also known as
operands) can be any data registers in the register file. The multiplier can
accumulate fixed-point results in the local multiplier result (MRF) registers
or write results back to the register file. The results in MRF can also be
rounded or saturated in separate operations. Floating-point multiplies
yield floating-point results, which the multiplier writes directly to the reg-
ister file.
For fixed-point multiplies, the multiplier reads the inputs from the upper
32 bits of the data registers. Fixed-point operands may be either both in
integer format, or both in fractional format. The format of the result
matches the format of the inputs. Each fixed-point operand may be either
an unsigned number or a two’s-complement number. If both inputs are
fractional and signed, the multiplier automatically shifts the result left one
bit to remove the redundant sign bit.
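The one-bit left shift for signed fractional inputs can be illustrated with a small Python model. It is a sketch under stated assumptions: the inputs are taken to be 1.31 signed fractions held in 32-bit words, the helper names (to_signed32, mul_ssf, frac_to_float) are hypothetical, and Python integers stand in for the wider multiplier result register.

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit word as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def mul_ssf(rx: int, ry: int) -> int:
    """Signed x signed fractional multiply with the redundant sign bit removed."""
    product = to_signed32(rx) * to_signed32(ry)   # 2.62 format: two sign bits
    return product << 1                           # 1.63 format after the shift


def frac_to_float(value: int, frac_bits: int) -> float:
    """Convert a fixed-point fraction with frac_bits fractional bits to float."""
    return value / float(1 << frac_bits)


half = 0x40000000                  # 0.5 in 1.31 format
quarter = mul_ssf(half, half)      # 0.5 * 0.5
print(frac_to_float(quarter, 63))
```

Without the shift, the product of two 1.31 values would carry two sign bits, and its binary point would sit one position lower than that of the inputs.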
[Figure 3-3: MRF register bit layout, with field boundaries at bits 79,
63, 31, and 0.]
The MRF register (Figure 3-3) is comprised of the MR2F, MR1F, and MR0F reg-
isters, which individually can be read from or written to the register file.
Each of these registers has the same format. When data is read from MR2F
(guard bits), it is sign-extended to 32 bits. The processor zero-fills the
eight LSBs of the 40-bit register file location when data is written from
MR2F, MR1F, or MR0F to a register file location. When the processor writes
data into MR2F, MR1F, or MR0F from the 32 MSBs of a register file location,
the eight LSBs are ignored. Data written to MR1F register is sign-extended
to MR2F, repeating the MSB of MR1F in the 16 bits of the MR2F register.
Data written to the MR0F register is not sign-extended.
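The read and write rules for the three parts of MRF can be modeled as follows. This is an illustrative sketch: the class and method names are hypothetical, and the placement of MR2F in bits 79–64, MR1F in bits 63–32, and MR0F in bits 31–0 is an assumption taken from the register widths given in the text.

```python
class MRF:
    """Behavioral model of the 80-bit multiplier result register."""

    def __init__(self) -> None:
        self.mr2 = 0    # 16 guard bits (assumed bits 79-64)
        self.mr1 = 0    # 32 bits (assumed bits 63-32)
        self.mr0 = 0    # 32 bits (assumed bits 31-0)

    def write_mr0(self, value: int) -> None:
        self.mr0 = value & 0xFFFFFFFF          # no sign extension

    def write_mr1(self, value: int) -> None:
        self.mr1 = value & 0xFFFFFFFF
        # The MSB of MR1F is repeated in all 16 bits of MR2F.
        self.mr2 = 0xFFFF if self.mr1 & 0x80000000 else 0x0000

    def write_mr2(self, value: int) -> None:
        self.mr2 = value & 0xFFFF

    def read_mr2(self) -> int:
        # Reads of the guard bits are sign-extended to 32 bits.
        return (self.mr2 | 0xFFFF0000) if self.mr2 & 0x8000 else self.mr2

    def value80(self) -> int:
        return (self.mr2 << 64) | (self.mr1 << 32) | self.mr0


mrf = MRF()
mrf.write_mr1(0x80000000)          # negative value: MR2F becomes all ones
print(hex(mrf.read_mr2()), hex(mrf.value80()))
```

Because a write to MR0F does not sign-extend, a program that loads a full 80-bit value would normally write MR0F and MR1F first and then MR2F if the guard bits need a different value.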
Note that the multiply result register (MRF, MRB) is not an orthogonal
register in the instruction set. Only specific instructions decode it as
an operand or as a result register (it is not a universal register). For
more information, see “Multiplier Fixed-Point Computations” on
page 11-49.
[Figure: a transfer from MR1F or MR0F to a 40-bit register file location
places the 32-bit value in the upper 32 bits and zeros in the lower
8 bits.]
The clear operation (MRF = 0) resets the specified MRF register to zero.
Often, it is best to perform this operation at the start of a multiply/accu-
mulate operation to remove the results of the previous operation.
The RND operation (MRF = RND MRF) applies only to fractional results;
integer results are not affected. This operation performs a round to
nearest of the 80-bit MRF value at bit 32, that is, at the MR1F–MR0F
boundary.
Rounding a fixed-point result occurs as part of a multiply or multiply/
accumulate operation or as an explicit operation on the MRF register. The
rounded result in MR1F can be sent to the register file or back to the same
MRF register. To round a fractional result to zero (truncation) instead of to
nearest, a program transfers the unrounded result from MR1F, discarding
the lower 32 bits in MR0F.
The multiplier supports the following data operations for 64-bit data.
MRF = Rx * Ry (SSF); /* signed x signed/fractional */
MRF = Rx * Ry (SUF); /* signed x unsigned/fractional */
MRF = Rx * Ry (USF); /* unsigned x signed/fractional */
MRF = Rx * Ry (UUF); /* unsigned x unsigned/fractional */
The SAT operation (MRF = SAT MRF) sets MRF to a maximum value if the MRF
value has overflowed. Overflow occurs when the MRF value is greater than
the maximum value for the data format—unsigned or two’s-complement
and integer or fractional—as specified in the saturate instruction. The six
possible maximum values appear in Table 3-4. The result from MRF satura-
tion can be sent to the register file or back to the same MRF register.
Arithmetic Status
Multiplier operations update four status flags in the processing
element’s arithmetic status registers (ASTATx and ASTATy). The following
flags report the condition of the most recent multiplier operation (a 1
indicates the condition).
• Multiplier result negative, (MN)
• Multiplier overflow, (MV)
• Multiplier underflow, (MU)
• Multiplier floating-point invalid operation, (MI)
Multiplier operations also update four “sticky” status flags in the process-
ing element’s sticky status (STKYx and STKYy) registers. Once set (a 1
indicates the condition), a sticky flag remains set until explicitly cleared.
The bits in the STKYx or STKYy registers are as follows.
• Multiplier fixed-point overflow, (MOS)
• Multiplier floating-point overflow, (MVS)
• Multiplier underflow, (MUS)
• Multiplier floating-point invalid operation, (MIS)
Note that (SF) is the default format for one-input operations, and (SSF) is the default format for
two-input operations.
Barrel Shifter
The barrel shifter is a combination of logic with X inputs and Y outputs
and control logic that specifies how to shift data between input and out-
put within one cycle.
The shifter performs bit-wise operations on 32-bit fixed-point operands.
Shifter operations include the following.
• Bit wise operations such as shifts and rotates from off-scale left to
off-scale right
• Bit wise manipulation operations, including bit set, clear, toggle,
and test
• Bit field manipulation operations, including extract and deposit
• Bit stream manipulation operations using a bit FIFO
• Bit field conversion operations including exponent extract, number
of leading 1s or 0s
• Pack and unpack conversion between 16-bit and 32-bit
floating-point
• Optional immediate data for one input within the instruction
Functional Description
The shifter takes one to three inputs: X, Y, and Z. The inputs (known as
operands) can be any register in the register file. Within a shifter instruc-
tion, the inputs serve as follows.
• The X input provides data that is operated on.
• The Y input specifies shift magnitudes, bit field lengths, or bit
positions.
• The Z input provides data that is operated on and updated.
The shifter does not make use of the ALU carry bit; it uses its own
status bits.
The shift compute instruction uses a data register for the Y input. The
data register operates based on the instruction’s 12-bit field for the bit
position start (bit6) and the bit field length (len6). Other instructions
may use only the 8-bit field.
The shift immediate instruction uses immediate data for the Y input. This
input comes from the instruction’s 12-bit field for the bit position start
(bit6) and the bit field length (len6). Other instructions may use only the
8-bit field.
As shown in Figure 3-4, the shifter fetches input operands from the upper
32 bits of a register file location (bits 39–8) or from an immediate value in
the instruction.
The X input and Z input are always 32-bit fixed-point values. The Y input
is a 32-bit fixed-point value or an 8-bit field (SHF8), positioned in the reg-
ister file. These inputs appear in Figure 3-4.
Some shifter operations produce 8 or 6-bit results. As shown in
Figure 3-4, the shifter places these results in the SHF8 field or the bit6
field and sign-extends the results to 32 bits. The shifter always returns a
32-bit result.
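[Figure 3-4: register file fields for shifter instructions. 32-bit
operands occupy bits 39–8 of the register file location; the SHF8 field
occupies bits 15–8.]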
The shifter supports bit field deposit and bit field extract instructions for
manipulating groups of bits within an input. The Y input for bit field
instructions specifies two 6-bit values, bit6 and len6, which are posi-
tioned in the Ry register as shown in Figure 3-5. The shifter interprets
bit6 and len6 as positive integers. The bit6 value is the starting bit posi-
tion for the deposit or extract, and the len6 value is the bit field length,
which specifies how many bits are deposited or extracted.
[Figure 3-5: Ry register layout for bit field instructions. The 12-bit
Y input holds len6 in bits 19–14 and bit6 in bits 13–8.]
Field deposit (FDEP) instructions take a group of bits from the input regis-
ter (starting at the LSB of the 32-bit integer field) and deposit the bits as
directed anywhere within the result register. The bit6 value specifies the
starting bit position for the deposit. Figure 3-6 shows how the inputs,
bit6 and len6, work in the following field deposit instruction.
Rn = FDEP Rx By Ry;
Figure 3-7 shows bit placement for the following field deposit instruction.
R0 = FDEP R1 By R2;
[Figure 3-6: field deposit inputs. In Ry, len6 is the number of bits to
take from Rx, starting from the LSB of the 32-bit field, and bit6 is the
starting bit position for the deposit in Rn, referenced from the LSB of
the 32-bit field.]
[Figure 3-7: bit placement for the field deposit instruction.]
Figure 3-8 shows bit placement for the following field extract instruction.
R3 = FEXT R4 By R5;
[Figure 3-8: bit placement for the field extract instruction, showing
the starting bit position for extraction and the reference point.]
For the FEXT instruction, bits to the left of the extracted field are
cleared in the destination register. For the FDEP instruction, bits to
the left and to the right of the deposited field are cleared in the
destination register. Therefore programs can use the (SE) option, which
sign extends the left bits, or programs can use a logical OR instruction
with the source register, which does not clear the bits across the
shifted field.
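The deposit and extract behavior can be sketched in Python on the 32-bit integer field alone. This is a simplified model, not the instruction definition: the helper names are hypothetical, bit6 and len6 are assumed to sit in bits 5–0 and 11–6 of the 32-bit Y value (bits 13–8 and 19–14 of the 40-bit register), and bits shifted past bit 31 are assumed to be dropped.

```python
MASK32 = 0xFFFFFFFF


def pack_y(bit6: int, len6: int) -> int:
    """Build the 12-bit Y input from a start position and a field length."""
    return ((len6 & 0x3F) << 6) | (bit6 & 0x3F)


def fdep(rx: int, ry: int) -> int:
    """Rn = FDEP Rx BY Ry: deposit the low len6 bits of Rx at position bit6."""
    bit6 = ry & 0x3F
    len6 = (ry >> 6) & 0x3F
    field = (rx & MASK32) & ((1 << len6) - 1)   # take len6 bits from the LSB
    return (field << bit6) & MASK32             # all other result bits are zero


def fext(rx: int, ry: int) -> int:
    """Rn = FEXT Rx BY Ry: extract len6 bits of Rx starting at position bit6."""
    bit6 = ry & 0x3F
    len6 = (ry >> 6) & 0x3F
    return ((rx & MASK32) >> bit6) & ((1 << len6) - 1)


y = pack_y(bit6=8, len6=8)
print(hex(fdep(0x1234ABCD, y)), hex(fext(0x1234ABCD, y)))
```

In this model, extracting a field that was just deposited with the same Y value returns the original low-order bits, provided the field fits inside the 32-bit word.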
I13 = buffer_base;
M13 = 1;
BFFWRP = 0x0; /* initialize Bit Fifo */
R10 = pm(I13,M13);
If NOT SF BITDEP R10 by 32,
R10 = PM(I13,M13); /* appends R10 to BFF */
DM(Var_1) = R6;
If NOT SF BITDEP R10 by 32, R10 = PM(I13,M13);
R6 = BITEXT(3); /* extracts 3 bits */
DM(Var_2) = R6;
The bit extracts are in variable quantities, but the deposit is always in
32-bits whenever the total number of bits in the bit FIFO increases
beyond 32.
I13 = buffer_base;
M13 = 1;
BFFWRP=0x0;
R10 = dm(_var1); /* get the variable */
BITDEP R10 by 6; /* append it to BFF */
If SF R10 = BITEXT(32),
pm(I13,M13) = R10; /* if the balance > 32,
transfer a word */
R10 = dm(Var_1);
BITDEP R10 by 3;
If NOT SF R10 = BITEXT(32), pm(I13,M13) = R10;
If the program vectors to an ISR during bit FIFO operations, and the ISR
uses the bit FIFO for other purposes, then the state of the bit FIFO has
to be preserved if the program needs to resume the previous bit FIFO
operations after returning from the ISR. This is shown in
Listing 3-3.
R2 = BITEXT 32;
In the same fashion the bit FIFO can be used to extract and create differ-
ent headers in a kind of time-division multiplex fashion by storing and
restoring the bit FIFO between two different sequences of bit FIFO
operations.
If a bit FIFO related instruction is interrupted and the ISR uses the
bit FIFO, the state of the bit FIFO must be preserved and restored
by the ISR.
that would have underflowed, the exponent clears to zero and the mantissa
(including a “hidden” 1) right-shifts the appropriate amount. The packed
result is a denormal, which can be unpacked into a normal IEEE float-
ing-point number.
The shifter instructions may help to perform data compression, convert-
ing 32-bit into 16-bit floating point, storing the data into short word
space, and, if required, fetching and converting them back for further
processing.
Arithmetic Status
Shifter operations update four status flags in the processing element’s
arithmetic status registers (ASTATx and ASTATy) where a 1 indicates the
condition. The bits that indicate shifter status for the most recent
shifter operation are as follows.
• Shifter overflow of bits to left of MSB, (SV)
• Shifter result zero, (SZ)
• Shifter input sign for exponent extract only, (SS)
• Shifter bit FIFO status (SF)
Note that the shifter does not generate an exception handle.
The shifter FIFO bit (SF in ASTATx/y registers) reflects the status
flag. Note this bit is a read-only bit unlike other flags in the
ASTATx/y registers. The value is pushed into the stack during a PUSH
operation but a POP operation does not restore this ASTAT bit.
Multifunction Computations
The processor supports multiple parallel (multifunction) computations by
using the parallel data paths within its computational units. These instruc-
tions complete in a single cycle, and they combine parallel operation of
the multiplier and the ALU or they perform dual ALU functions. The
multiple operations work as if they were in corresponding single function
computations. Multifunction computations also handle flags in the same
way as the single function computations, except that in the dual
add/subtract computation, the ALU flags from the two operations are
ORed together.
To work with the available data paths, the computational units constrain
which data registers hold the four input operands for multifunction com-
putations. These constraints limit which registers may hold the X input
and Y input for the ALU and multiplier.
Listing 3-4. MAC and Parallel Read With Software Pipeline Coding
B1=B0;
F12=F12-F12, F2 = DM(I0,M1), F4 = PM(I8,M8); /* first data */
Lcntr=N, do (pc,4) until lce; /* loop body */
F12=F2*F4, F8=F8+F12, F3 = DM(I0,M1), F4 = PM(I8,M8);
F12=F3*F4, F8=F8+F12, DM(I1,M1)=F3, F4 = PM(I8,M8);
F12=F2*F4, F8=F8+F12, F2 = DM(I0,M1), F4 = PM(I8,M8);
F12=F3*F4, F8=F8+F12, DM(I1,M1)=F8, F4 = PM(I8,M8);
RTS(db), F8=F8+F12; /* last MAC */
Nop;
Nop;
[Figure: input registers for multifunction computations. Multiplier
inputs come from R0–R3 (F0–F3) and R4–R7 (F4–F7); ALU inputs come from
R8–R11 (F8–F11) and R12–R15 (F12–F15). Results can be written to any
register.]
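The register constraints for multifunction inputs can be expressed as a simple check. The sketch below is an illustration only: the function name is hypothetical, and the assignment of each register group to an X or Y input (multiplier X from R0–R3, multiplier Y from R4–R7, ALU X from R8–R11, ALU Y from R12–R15) is an assumption read from the register groups in the figure.

```python
MULT_X = range(0, 4)     # R0-R3  (F0-F3),   assumed multiplier X input
MULT_Y = range(4, 8)     # R4-R7  (F4-F7),   assumed multiplier Y input
ALU_X = range(8, 12)     # R8-R11 (F8-F11),  assumed ALU X input
ALU_Y = range(12, 16)    # R12-R15 (F12-F15), assumed ALU Y input


def multifunction_inputs_ok(mult_x: int, mult_y: int, alu_x: int, alu_y: int) -> bool:
    """Return True if all four input register numbers fall in their allowed group."""
    return (mult_x in MULT_X and mult_y in MULT_Y
            and alu_x in ALU_X and alu_y in ALU_Y)


# F12=F2*F4, F8=F8+F12 from Listing 3-4: multiplier reads F2 and F4,
# ALU reads F8 and F12.
print(multifunction_inputs_ok(2, 4, 8, 12))
```

Result registers are not checked because the figure marks the outputs as "Any Register".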
Operating Modes
The MODE1 register controls the operating mode of the processing ele-
ments. Table A-1 on page A-4 lists the bits in the MODE1 register. The bits
are described in the following sections.
ALU Saturation
When the ALUSAT bit in the MODE1 register is set (= 1), the ALU is in satu-
ration mode. In this mode, positive fixed-point overflows return the
maximum positive fixed-point number (0x7FFF FFFF), and negative
overflows return the maximum negative number (0x8000 0000).
When the ALUSAT bit is cleared (= 0), fixed-point results that overflow
are not saturated; the upper 32 bits of the result are returned
unaltered.
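The effect of the ALUSAT bit on a fixed-point add can be modeled as follows. This is a sketch, not the hardware definition: the function names are hypothetical, and the non-saturating case is assumed to behave as ordinary 32-bit two's-complement wraparound.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def to_signed32(value: int) -> int:
    """Interpret a 32-bit word as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def alu_add(rx: int, ry: int, alusat: bool) -> int:
    """Model Rn = Rx + Ry on 32-bit fixed-point data, with or without saturation."""
    total = to_signed32(rx) + to_signed32(ry)
    if alusat:
        # Saturation mode: clamp to the most positive or most negative value.
        total = max(INT32_MIN, min(INT32_MAX, total))
    return total & 0xFFFFFFFF       # otherwise the result simply wraps


print(hex(alu_add(0x7FFFFFFF, 1, alusat=True)),
      hex(alu_add(0x7FFFFFFF, 1, alusat=False)))
```

With saturation enabled the positive overflow stays at 0x7FFFFFFF; with it disabled the same add wraps to 0x80000000, which reads as a large negative number.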
In fixed-point to floating-point conversion, the rounding boundary
is always 40 bits, even if the RND32 bit is set.
For more information on this standard, see Appendix C, Numeric For-
mats. This format is IEEE 754/854 compatible for single-precision
floating-point operations in all respects except for the following.
• The processor does not provide inexact flags. An inexact flag is an
exception flag whose bit position is inexact. The inexact exception
occurs if the rounded result of an operation is not identical to the
exact (infinitely precise) result. Thus, an inexact exception always
occurs when an overflow or an underflow occurs.
• NAN (Not-A-Number) inputs generate an invalid exception and
return a quiet NAN (all 1s).
• Denormal operands, using denormalized (or tiny) numbers, flush
to zero when input to a computational unit and do not generate an
underflow exception. A denormal operand is one of the floating-point
operands with an absolute value too small to represent
with full precision in the significand. The denormal exception
occurs if one or more of the operands is a denormal number. This
exception is never regarded as an error.
• The processor supports round-to-nearest and round-toward-zero
modes, but does not support round toward +infinity and
round toward -infinity.
Rounding Mode
The TRUNC bit in the MODE1 register determines the rounding mode for all
ALU operations, all floating-point multiplies, and fixed-point multiplies
of fractional data. The processor supports two rounding modes—
round-toward-zero and round-toward-nearest. The rounding modes com-
ply with the IEEE 754 standard and have the following definitions.
Unlike other registers that have alternates, both the MRF and MRB registers
are coded into instructions, without regard to the state of the MODE1 regis-
ter as shown in the following example.
MRB = MRB - R3 * R2 (SSFR);
MRF = MRF + R4 * R12 (UUI);
With this arrangement, programs can use the result registers as primary
and alternate accumulators, or programs can use these registers as two par-
allel accumulators. This feature facilitates complex math. The MODE1
register controls the access to alternate registers. In SIMD mode, swapping
also occurs with the PEY unit based registers (MSF and MSB).
SIMD Mode
The SHARC core contains two sets of computational units and associated
register files. As shown in Figure 1-1 on page 1-4, these two processing
elements (PEx and PEy) support SIMD operation.
The MODE1 register controls the operating mode of the processing ele-
ments. The PEYEN bit (bit 21) in the MODE1 register enables or disables the
PEy processing element. When PEYEN is cleared (0), the processor operates
in SISD mode, using only PEx. When the PEYEN bit is set (1), the proces-
sor operates in SIMD mode, using both the PEx and PEy processing
elements. There is a one cycle delay after PEYEN is set or cleared, before the
mode change takes effect.
For shift immediate instructions the Y input is driven by immediate data
from the instructions (and has no complement data as a register does). If
using SIMD mode, the immediate data are valid for both PEx and PEy
units as shown in Listing 3-6.
The two processing elements are symmetrical; each contains these func-
tional blocks:
• ALU
• Multiplier primary and alternate result registers
• Shifter
• Data register file and alternate register file
Arithmetic Interrupts
The following sections describe how the processor core handles arithmetic
interrupts. Note that the shifter does not generate interrupts for exception
handling.
ALU Interrupts
Table 3-10 provides an overview of the ALU interrupts.
Multiplier Interrupts
Table 3-11 provides an overview of the multiplier interrupts.
Interrupt Acknowledge
After an exception has been detected, the ISR needs to clear the
flag bit as shown in Listing 3-7.
ISR_ALU_Exception:
bit tst STKYx AVS; /* check condition */
IF TF jump ALU_Float_Overflow;
bit tst STKYx AOS; /* check condition */
IF TF jump ALU_Fixed_Overflow;
ALU_Fixed_Overflow:
bit clr STKYx AOS; /* clear sticky bit */
rti;
ALU_Float_Overflow:
bit clr STKYx AVS; /* clear sticky bit */
rti;
Features
The sequencer controls the following operations.
• Loops. One sequence of instructions executes several times with
zero overhead.
• Subroutines. The processor temporarily breaks sequential flow to
execute instructions from another part of program memory.
• Jumps. Program flow is permanently transferred to another part of
program memory.
[Figure: program sequencer block diagram. Blocks include the instruction
latch, conditional logic, interrupt control (latch, mask, and mask
pointer), the loop sequencer with loop address stack (LADDR, 6 x 32) and
loop count stack (LCNTR, 6 x 32), the status stack (15 x 3 x 32), the
PC stack (PCSTK, 30 x 26) with PCSTKP, and the FADDR, DADDR, and PC
registers. Next address sources are direct branch, PC-relative branch,
next fetch (+3 for VISA, +1 for ISA), IDLE, indirect branch (DAG2),
RTS/RTI, top of loop, and IVT branch.]
Functional Description
The sequencer uses the blocks shown in Figure 4-2 to execute instruc-
tions. The sequencer’s address multiplexer selects the value of the next
fetch address from several possible sources. These registers contain the
24-bit addresses of the instructions currently being fetched, decoded, and
executed.
Instruction Pipeline
The program sequencer determines the next instruction address by exam-
ining both the current instruction being executed and the current state of
the processor. If no conditions require otherwise, the processor fetches
and executes instructions from memory in sequential order.
To achieve a high execution rate while maintaining a simple programming
model, the processor employs a five-stage interlocked pipeline, shown in
Table 4-1, to process instructions. All possible hazards are controlled
by hardware.
The legacy Instruction Set Architecture (ISA) instructions are addressed
using normal word (NW) address space, whereas Variable Instruction Set
Architecture (VISA) instructions are addressed using short word (SW)
address space. Switching between the traditional ISA and VISA
instruction spaces is not controlled by bit settings in any register.
Instead, the transition occurs automatically when a branch (JUMP/CALL
or an interrupt) takes execution from ISA address space to VISA address
space or vice versa.
Note that the processor always emerges from reset in ISA mode, so
the interrupt vector table must always reside in ISA address space.
The processor controls the fetch address, decode address, and program
counter (FADDR, DADDR, and PC) registers which store the Fetch1, decode,
and execution phase addresses of the pipeline.
Fetch1
  Description: In this stage, the appropriate instruction address is
  chosen from various sources and driven out to memory. The instruction
  address is matched with the cache to generate a condition for cache
  miss/hit. The next NW address is auto-incremented by one.
  VISA: The next SW address is auto-incremented by three for every
  48-bit fetch.
Fetch2
  Description: This stage is the data phase of the instruction fetch
  memory access, wherein the data address generator (DAG) performs some
  amount of pre-decode. Based on a cache condition, the instruction is
  read from cache or driven from the memory instruction data bus.
  VISA: Stores 3 x 16-bit instruction data into the IAB buffer and
  presents one instruction per cycle to the decoder.
Decode
  Description: The instruction is decoded and various conditions that
  control instruction execution are generated. The main active units in
  this stage are the DAGs, which generate the addresses for various
  types of functions like data accesses (load/store) and indirect
  branches. DAG premodify (M+I) operation is performed. For a cache
  miss, instruction data read from memory are loaded into the cache.
  VISA: Decodes the VISA instruction and stores its length information
  in short words.
Execute
  Description: The operations specified in the instruction are executed
  and the results written back to memory or the universal registers. For
  an interrupt branch, the IVT address is forwarded to the Fetch1 stage.
  ISA instructions always increment the PC value by 1 each cycle.
  VISA: When executing VISA instructions, the PC value is incremented
  by 1, 2, or 3 depending on length information from the instruction
  decode.
[Figure: instruction alignment buffer (IAB). 48-bit fetches from memory
pass through a delay register; 16-bit fields are concatenated and
passed to the decoder.]
In VISA mode, the situation is different since the instruction fetch rate is
always 48 bits but the consumption rate can vary. In Table 4-3, the
instruction fetch (48-bit) stalls because the IAB FIFO is filling up. After
decoding the next instructions, the IAB indicates space for new instruc-
tions which tells the sequencer to continue fetching by increasing the
program counter.
On block space boundaries, the instruction fetch does not halt and
continues to fetch the next address.
Execute n n+1 n+2 n+3 n+4 n+5 n+6 n+7 n+8 n+9
Address n n+1 n+2 n+3 n+4 n+5 n+6 n+7 n+8 n+9 n+10
Decode n n+1 n+2 n+3 n+4 n+5 n+6 n+7 n+8 n+9 n+10 n+11
Fetch2 n n+1 n+2 n+3 n+4 n+5 n+6 n+7 n+8 n+9 n+10 n+11 n+12
Direct Addressing
Similar to the DAGs, the sequencer also provides the data address for
direct addressing types as shown in the following example.
R0 = DM(0x90500); /* sequencer generated data address */
PM(0x90600) = R7; /* sequencer generated data address */
Functional Description
In order to manage these variations, the processor uses several mecha-
nisms, primarily hardware stacks, which are described in the following
sections.
Hardware Stacks
If the programmed flow varies (non-sequential and interrupted), the pro-
cessor requires hardware or software mechanisms (stacks, Table 4-4) to
support changes of the regular program flow. The SHARC core supports
three hardware stack types which are implemented outside of the memory
space and are used and accessed for any non-sequential process. The stack
types are:
• Program count stack – Used to store the return address (call, IVT
branch, do until).
• Status stack – Used to store some context of status registers.
• Loop stack (address and count) – Used for hardware looping
(unnested and nested). This stack is described in “Loop Stack” on
page 4-48 and in the “Loop Sequencer” section later in this chapter.
The SHARC processor does not have a general-purpose hardware stack.
However, the DAG architecture allows a software stack implementation
by using post (push) and pre-modify (pop) DAG instruction types.
Automated Access
Manual Access
PC Stack Access
For the ADSP-2137x processors and later, the PC register size has
been enlarged to 26 bits. This allows reads and writes of the formerly
hidden bits, giving software full control of the stack registers.
PC Stack Status
The PC stack is 30 locations deep. The stack is full when all entries are
occupied, is empty when no entries are occupied, and is overflowed if a
push occurs when the stack is full. The following bits in the STKYx register
indicate the PC stack full and empty states.
• PC stack full. Bit 21 (PCFL) indicates that the PC stack is full (if 1)
or not full (if 0)—not a sticky bit, cleared by a pop.
PC Stack Manipulation
The PCSTK register contains the top entry on the PC stack. This register is
readable and writable by the core. Reading from and writing to PCSTK does
not move the PC stack pointer. Only a stack push or pop performed with
explicit instructions moves the stack pointer. The PCSTK register contains
the value 0x3FF FFFF when the PC stack is empty. A write to PCSTK has
no effect when the PC stack is empty. “Program Counter Stack Register
(PCSTK)” on page A-10 lists the bits in the PCSTK register.
The address of the top of the PC stack is available in the PC stack pointer
(PCSTKP) register. The value of PCSTKP is zero when the PC stack is empty,
is 1 through 30 when the stack contains data, and is 31 when the stack
overflows. A write to PCSTKP takes effect after one cycle of delay. If the PC
stack is overflowed, a write to PCSTKP has no effect. For example, writing 3 to
PCSTKP deletes all entries except the three oldest.
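The register behavior described above can be summarized in a short Python model. It is a sketch under stated assumptions: the class and method names are hypothetical, the one-cycle delay of a PCSTKP write is not modeled, and writing a PCSTKP value larger than the current depth is left out because the text does not describe it.

```python
PC_STACK_DEPTH = 30
PCSTK_EMPTY = 0x3FFFFFF  # value read from PCSTK when the stack is empty


class PCStackModel:
    """Toy model of PCSTK/PCSTKP; not cycle accurate."""

    def __init__(self):
        self.entries = []        # oldest entry first
        self.overflowed = False

    @property
    def pcstkp(self):
        # 0 when empty, 1..30 when holding data, 31 after an overflow.
        return 31 if self.overflowed else len(self.entries)

    @property
    def pcstk(self):
        return self.entries[-1] if self.entries else PCSTK_EMPTY

    @property
    def pcfl(self):
        return len(self.entries) == PC_STACK_DEPTH

    def push(self, address):
        if self.pcfl:
            self.overflowed = True   # push onto a full stack overflows
        else:
            self.entries.append(address)

    def write_pcstk(self, value):
        if self.entries:             # no effect on an empty stack
            self.entries[-1] = value

    def write_pcstkp(self, depth):
        if self.overflowed:
            return                   # no effect once overflowed
        if not 0 <= depth <= len(self.entries):
            raise ValueError("growing the stack via PCSTKP is not modeled")
        del self.entries[depth:]     # keep only the oldest entries


stack = PCStackModel()
stack.write_pcstk(0x90000)           # ignored: the stack is empty
for address in range(0x90100, 0x90105):
    stack.push(address)
stack.write_pcstkp(3)
```

After the write of 3 to PCSTKP, only the three oldest return addresses remain and PCSTK reads back the third one.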
The sequencer’s status stack eases the return from branches by eliminating
some service overhead like register saves and restores as shown in the fol-
lowing example.
CALL fft1024; /* Where fft1024 is an address label */
fft1024:push sts; /* save MODE1/ASTATx/y registers */
instruction;
instruction;
pop sts; /* restore MODE1/ASTATx/y registers */
rts;
For some interrupts (IRQ2–0 and timer expired), the sequencer automatically
pushes the ASTATx, ASTATy, and MODE1 registers onto the status stack.
When the sequencer pushes an entry onto the status stack, the processor
uses the MMASK register to clear the corresponding bits in the MODE1 register.
All other bit settings remain the same. See the example in “Interrupt Mask
Mode” on page 4-40.
The sequencer automatically pops the ASTATx, ASTATy, and MODE1 registers
from the status stack during the return from interrupt instruction (RTI).
In one other case, JUMP (CI), the sequencer pops the stack. For more infor-
mation, see “Interrupt Self-Nesting” on page 4-36. Only the IRQ2–0 and
timer expired interrupts cause the sequencer to push an entry onto the
status stack. All other interrupts require either explicit saves and restores
of affected registers or an explicit push or pop of the stack (PUSH/POP STS).
Pushing the ASTATx, ASTATy, and MODE1 registers preserves the status and
control bit settings. This allows a service routine to alter these bits with
the knowledge that the original settings are automatically restored upon
return from the interrupt.
The top of the status stack contains the current values of ASTATx, ASTATy,
and MODE1. Explicit PUSH or POP instructions (not reading and writing these
registers) are used to move the status stack pointer.
As shown in the following example, do not use the (DB) modifier in
instructions exiting from IRQx or timer ISRs (RTI; and JUMP (CI);).
The status stack is fifteen locations deep. The stack is full when all entries
are occupied, is empty when no entries are occupied, and is overflowed if a
push occurs when the stack is already full. Bits in the STKYx register indicate
the status stack full and empty states as described below.
• Status stack overflow. Bit 23 (SSOV) indicates that the status stack
is overflowed (if 1) or not overflowed (if 0)—a sticky bit.
• Status stack empty. Bit 24 (SSEM) indicates that the status stack is
empty (if 1) or not empty (if 0)—not sticky, cleared by a push.
Both ASTATx and ASTATy register values are pushed/popped regardless of
SISD/SIMD mode.
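The difference between the sticky overflow flag and the non-sticky empty flag can be illustrated with a minimal Python model. The class name is hypothetical, and raising an error on a pop of an empty stack is an assumption, since the text does not describe that case.

```python
STATUS_STACK_DEPTH = 15


class StatusStackModel:
    """Toy model of the status stack flags (illustrative only)."""

    def __init__(self):
        self.entries = []
        self.ssov = False        # sticky: stays set once an overflow occurs

    @property
    def ssem(self):
        # Not sticky: follows the current fill level.
        return not self.entries

    def push(self, mode1, astatx, astaty):
        if len(self.entries) == STATUS_STACK_DEPTH:
            self.ssov = True     # push onto a full stack overflows
            return
        # Both ASTATx and ASTATy are saved regardless of SISD/SIMD mode.
        self.entries.append((mode1, astatx, astaty))

    def pop(self):
        if not self.entries:
            raise IndexError("pop of an empty status stack is not modeled")
        return self.entries.pop()


status = StatusStackModel()
for level in range(16):          # one push more than the stack can hold
    status.push(mode1=level, astatx=0, astaty=0)
status.pop()
```

In this model the overflow flag remains set after the pop, while the empty flag simply reports whether any entries are left.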
In processors with 5-stage pipelines, the instruction driven branch
(CALL, JUMP, DO UNTIL) occurs in the Address phase of the
sequencer, while the interrupt (IVT) branch occurs in the Execute
phase. This is different from 3-stage pipelines, where all branches
occur in the Execute stage of the pipeline.
• A JUMP or a CALL instruction transfers program flow to another
memory location. The difference between a JUMP and a CALL is that
a CALL automatically pushes the return address (the next sequential
address after the CALL instruction) onto the PC stack. This push
makes the address available for the CALL instruction’s matching
return instruction (RTS) in the subroutine, allowing an easy return
from the subroutine.
• An RTS instruction causes the sequencer to fetch the instruction at
the return address, which is stored at the top of the PC stack. The
two types of return instructions are return from subroutine (RTS)
and return from interrupt (RTI). While the RTS instruction only
pops the return address off the PC stack, the RTI pops the return
address and:
1. Clears the interrupt’s bit in the interrupt latch register (IRPTL)
and the interrupt mask pointer register (IMASKP). This allows
another interrupt to be latched in the IRPTL register and the inter-
rupt mask pointer (IMASKP) register.
2. Pops the status stack if the ASTATx/y and MODE1 status registers
have been pushed for the interrupts for the IRQ2-0 signals or for
the core timer.
The following are parameters that can be specified for branching
instructions.
Indirect branches are JUMP or CALL instructions that use a dynamic address
that comes from the DAG2. Note that this is useful for reconfigurable
routines and jump tables.
For more information refer to the instruction set types (9a/b and 10a).
Two instruction examples that cause an indirect branch are:
JUMP (M8, I12); /* where (M8, I12) are DAG2 registers */
CALL (M9, I13); /* where (M9, I13) are DAG2 registers */
I9 = my_jump_table;
M9 = 2;
JUMP (M9, I9);
my_jump_table:
JUMP function0;
JUMP function1;
JUMP function2;
. . .
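To make the jump table lookup concrete, the following Python sketch computes the branch target for the listing above. It assumes that the indirect branch target is the sum of the M and I registers and that each table entry occupies exactly one address; the table base address 0x90100 is a made-up value for illustration.

```python
def indirect_branch_target(i_reg, m_reg):
    """Assumed target computation for JUMP (Mx, Iy): index plus modifier."""
    return i_reg + m_reg


MY_JUMP_TABLE = 0x90100  # hypothetical address of my_jump_table

# One table entry per address (assumption: no compressed instructions).
table = {MY_JUMP_TABLE + k: f"function{k}" for k in range(3)}

target = indirect_branch_target(i_reg=MY_JUMP_TABLE, m_reg=2)
selected = table[target]
```

With the modifier set to 2, the lookup lands on the third table entry, which in turn jumps to function2.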
Branch Listings
As shown in Table 4-5 and Table 4-6, the processor aborts the three
instructions after the branch, which are in the Fetch1, Fetch2 and Decode
stages. For a CALL instruction, the address of the instruction after the CALL
is the return address. During the three lost (no-operation) cycles, the first
instruction at the branch address passes through the Fetch2, Decode and
Address phases of the instruction pipeline.
In the tables that follow, shading indicates aborted instructions, which are
followed by NOP instructions.
Table 4-7. Pipelined Execution Cycles for Delayed Branch (JUMP or CALL)
Cycles 1 2 3 4 5 6 7
In JUMP and CALL instructions that use the delayed branch (DB) modifier,
one instruction cycle is lost in the instruction pipeline. This is because the
processor executes the two instructions after the branch and the third is
aborted while the instruction pipeline fills with instructions from the new
location. This is shown in the sample code below.
jump (pc, 3) (db);
instruction 1;
instruction 2;
As shown in Table 4-7 and Table 4-8, the processor executes the two
instructions after the branch and the third is aborted, while the instruc-
tion at the branch address is being processed at the Fetch2, Decode and
Address stages of the instruction pipeline. In the case of a CALL instruc-
tion, the return address is the third address after the branch instruction.
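The cost difference between immediate and delayed branches, and the resulting return address of a CALL, can be captured in two small Python helpers. This is a sketch with hypothetical function names, and it assumes one address per instruction.

```python
def branch_slots(delayed):
    """Return (executed, aborted) counts for the three instructions
    that follow a JUMP or CALL in the pipeline."""
    return (2, 1) if delayed else (0, 3)


def call_return_address(call_address, delayed):
    """Return address pushed by a CALL located at call_address."""
    # Immediate CALL: next sequential address.
    # Delayed CALL: third address after the branch instruction.
    return call_address + (3 if delayed else 1)


immediate = branch_slots(delayed=False)
with_db = branch_slots(delayed=True)
```

For a CALL at a hypothetical address 0x90100, the helper gives 0x90101 for the immediate form and 0x90103 for the (DB) form.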
While delayed branches use the instruction pipeline more efficiently than
immediate branches, delayed branch code can be harder to implement
because of the instructions between the branch instruction and the actual
branch. This is described in more detail in “Restrictions when Using
Delayed Branches” on page 4-23.
Note that during a delayed branch, a program can read the PC stack regis-
ter or PC stack pointer register. This read shows the return address on the
PC stack has already been pushed or popped, even though the branch has
not yet occurred.
This example shows that when a program pushes the PCSTK during a
delayed slot, the PC stack pointer is pushed onto the PCSTK.
The following instructions are executed prior to executing the RTS.
pop PCSTK;
RTS (db);
nop;
nop;
Manipulation of these stacks by using PUSH/POP instructions and
explicit writes to these stacks may affect the correct loop function.
Writes to the PCSTK or PCSTKP Registers
The following two situations may arise when programs attempt to write to
the PC stack inside a delayed branch.
1. If programs write into the PCSTK inside a jump, one of the follow-
ing situations can occur.
a. The PC stack cannot hold a value that has already been
pushed onto the PC stack.
When the PC stack contains a value and a program writes
that same value onto the stack (via PCSTK), the original value
is overwritten by the new value of the PCSTK register.
b. The PC stack is empty.
Programs cannot write to the PC stack when they are inside
a jump. In this case the PC stack remains empty.
2. Write to the PCSTK inside a call.
If a program writes to the PC stack inside of a call, the value that is
pushed onto the PC stack because of that call is overwritten by the
value written onto the PC stack. Therefore, when a program
performs an RTS, the program returns to the address pushed onto
the PC stack and not to the address pushed while branching to the
subroutine as shown below.
The value 0x90103 is pushed onto the PC stack, while the value
0x90200 is written to the PCSTK register. Accordingly, the value
0x90103 is overwritten by the value 0x90200 in the PC stack
because values that are pushed onto the stack have lower priority
than values written to the PCSTK register. Therefore, when the program
executes an RTS, the return address is 0x90200 and not 0x90103.
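The priority rule can be expressed as a small Python function. It is a simplified sketch: the function name and list representation are hypothetical, and only the cases described above are covered (a write inside a jump with an empty or non-empty stack, and a write inside a call).

```python
def pc_stack_after_delayed_branch(stack, is_call, return_address=None,
                                  pcstk_write=None):
    """Return the PC stack (oldest first) after a delayed branch whose
    delay slots may contain a write to PCSTK."""
    result = list(stack)
    if is_call:
        result.append(return_address)    # push caused by the CALL
    if pcstk_write is not None and result:
        # The written value wins over whatever is on top of the stack.
        # On an empty stack the write has no effect.
        result[-1] = pcstk_write
    return result


after_call = pc_stack_after_delayed_branch(
    [], is_call=True, return_address=0x90103, pcstk_write=0x90200)
```

In the call case the top of the stack ends up holding 0x90200, so a later RTS would return there instead of to 0x90103.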
Operating Mode
This section provides information on the operating mode that controls
variations in program flow.
Interrupt Categories
Interrupt Vector Table
Table 4-9, Table 4-10, and Table 4-11 show the pipelined execution
cycles for interrupt processing.
Table 4-9. Pipelined Execution Cycles for Interrupt Based During Single
Cycle Instruction
Cycles 1 2 3 4 5 6 7
Table 4-10. Pipelined Execution Cycles for Interrupt During Delayed Branch
Instruction
Cycles 1 2 3 4 5 6 7 8 9 10
n is the delayed branch instruction, j is the jump address, and v is the interrupt vector.
1. Cycle1: Interrupt occurs.
2. Cycle2: Interrupt is latched and recognized, but not processed.
3. Cycle3: n+3 beyond delay slot, interrupt processing delayed.
4. Cycle4: Interrupt processing delayed.
5. Cycle5: Interrupt processed.
6. Cycle6: j pushed onto PC stack, fetch of vector address starts.
For most interrupts, both internal and external, only one instruction is
executed after the interrupt occurs (and four instructions are aborted),
before the processor fetches and decodes the first instruction of the service
routine. There is also a five cycle latency associated with the IRQ2–0
interrupts.
If nesting is enabled and a higher priority interrupt occurs immediately
after a lower priority interrupt, the service routine of the higher priority
interrupt is delayed until the first instruction of the lower priority inter-
rupt’s service routine is executed. For more information, see “Interrupt
Nesting Mode” on page 4-41.
Interrupt Processing
The next several sections discuss the ways in which the SHARC core pro-
cesses interrupts.
According to the IVT, the core supports different groups of interrupts,
such as:
• Reset – hardware/software
• Emulator – debugger, breakpoints, BTC
• Core timer – high, low priority
• Illegal memory access – forced long word, illegal IOP space
• Stack exceptions – PC, loop, status
Between servicing and returning, the sequencer clears the latch bit of the
in-progress ISR every cycle until the RTI (return from interrupt) instruc-
tion is executed. When using an ISR, writes into an IOP control register
or a buffer to clear the interrupt cause some latency. During this delay,
the interrupt may be generated a second time. For more information, see
the processor-specific hardware reference manual.
Latching Interrupts
Interrupt Acknowledge
Interrupt Self-Nesting
When an interrupt occurs, the sequencer sets the corresponding bit in the
IRPTL register. During execution of the service routine, the sequencer
keeps this bit cleared which prevents the same interrupt from being
latched while its service routine is executing. If necessary, programs may
reuse an interrupt while it is being serviced. Using a jump clear interrupt
instruction (JUMP (CI)) in the interrupt service routine clears the
interrupt, allowing its reuse while the service routine is executing.
The JUMP (CI) instruction reduces an interrupt service routine to a nor-
mal subroutine, clearing the appropriate bit in the interrupt latch and
interrupt mask pointer registers and popping the status stack. After the
JUMP (CI) instruction, the processor stops automatically clearing the
interrupt’s latch bit, allowing the interrupt to latch again (Figure 4-5).
When returning from a subroutine that was entered with a JUMP (CI)
instruction, a program must use a return loop reentry instruction, RTS
(LR), instead of an RTI instruction. For more information, see “Restric-
tions on Ending Loops” on page 4-55. The following example shows an
interrupt service routine that is reduced to a subroutine with the (CI)
modifier.
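Independently of the assembly listing, the latching behavior described in this section can be sketched as a small Python model. The class, its method names, and the interrupt bit number are hypothetical, and the model assumes an interrupt for which the sequencer pushed the status stack on entry.

```python
class InterruptLatchModel:
    """Toy model of IRPTL/IMASKP handling around JUMP (CI)."""

    def __init__(self):
        self.irptl = 0
        self.imaskp = 0
        self.held_clear = 0      # latch bits the sequencer keeps cleared
        self.status_stack = []

    def request(self, bit):
        """Try to latch an interrupt; return True if it was latched."""
        mask = 1 << bit
        if self.held_clear & mask:
            return False         # same interrupt is being serviced
        self.irptl |= mask
        return True

    def enter_isr(self, bit, status):
        mask = 1 << bit
        self.irptl &= ~mask      # latch bit kept cleared during service
        self.imaskp |= mask
        self.held_clear |= mask
        self.status_stack.append(status)

    def jump_ci(self, bit):
        mask = 1 << bit
        self.irptl &= ~mask      # clear latch and mask pointer bits
        self.imaskp &= ~mask
        self.held_clear &= ~mask # sequencer stops clearing the latch bit
        return self.status_stack.pop()


BIT = 8                          # hypothetical interrupt bit position
model = InterruptLatchModel()
model.request(BIT)
model.enter_isr(BIT, status=("MODE1", "ASTATx", "ASTATy"))
blocked = model.request(BIT)     # rejected while the ISR is running
model.jump_ci(BIT)
relatched = model.request(BIT)   # accepted after JUMP (CI)
```

The second request is dropped while the service routine is active, and the same request succeeds once the routine has been reduced to a subroutine.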
[Figure: No Interrupt Self-Nesting; axis: priority; levels: Main, ISR]
The sequencer supports placing the processor in a low power halted state
called idle. The processor is in this state until an interrupt occurs. The
execution of the ISR releases the processor from the idle state. When
executing an IDLE instruction (Figure 4-2 on page 4-4, Table 4-12), the
sequencer fetches one more instruction at the current fetch address and
then suspends operation. The processor’s internal clock and core timer (if
enabled) continue to run while in the idle state. When an interrupt
occurs, the processor responds normally after a five cycle latency to fetch
the first instruction of the interrupt service routine.
The processor’s I/O processor is not affected by the IDLE instruction.
DMA transfers to or from internal memory continue uninterrupted.
The debugger allows you to single step over the IDLE instruction in single
step mode. This feature is enabled by the emulator interrupt
Decode n–2 n–1 idle n+1 n+1 n+2 n+3 v v+1 v+2
nop nop nop
Certain processor operations that span more than one cycle or which
occur at a certain state of the instruction pipeline that involves a change of
program flow can delay interrupt processing. If an interrupt occurs during
one of these operations, the processor synchronizes and latches the inter-
rupt, but delays its processing. The operations that have delayed interrupt
processing are:
• The first of the two cycles used to perform a program memory data
access and an instruction fetch (a bus conflict) when the instruc-
tion is not cached.
• Any cycle in which the core access of internal memory is delayed
due to a conflict with the DMA, or the access to the mem-
ory-mapped registers is delayed due to wait states.
• A branch (JUMP or CALL) instruction and the following two cycles,
whether they are instructions (in a delayed branch) or a NOP (in a
non-delayed branch).
• In addition to the above, the cycle in which a branch is in the
Address stage of the pipeline along with the last instruction of a
counter based loop in the Fetch1 stage.
• The first four of the five cycles used to fetch and execute the first
instruction of an interrupt service routine.
• In the case of arithmetic loops, the cycle in which the loop aborts
and the following three cycles.
• In the case of counter based loops:
• The cycle in which the counter-expired condition tests true
and the following three cycles in the case of loops having
fewer than four instructions in the body.
• ALU saturation
• SIMD
• Circular buffering
The system needs to disable ALU saturation, SIMD, and bit-reversing for
I8 after pushing the status stack then pushing the MMASK register (these bit
locations should be set to 1).
The value in the MODE1 register after PUSH STS instruction is:
• Secondary registers for DAG2 (high)
• Interrupt nesting enabled
• Circular buffering enabled
The other settings that were previously set in the MODE1 register remain the
same. The only bits that are affected are those that are set both in the
MMASK and in MODE1 registers. These bits are cleared after the status stack is
pushed.
If the program does not make any changes to the MMASK register, the
default setting automatically disables SIMD when servicing any of
the hardware interrupts mentioned above, or during any push of
the status stack.
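The masking rule (bits set in both MMASK and MODE1 are cleared after the push, all others are unchanged) amounts to a single bitwise operation. The Python sketch below uses named masks whose bit positions are assumptions for illustration only; consult the register descriptions for the actual layout.

```python
# Illustrative bit masks; the positions are assumptions for this sketch.
BR8 = 1 << 0       # bit-reversing for I8
SRD2H = 1 << 5     # secondary registers for DAG2 (high)
NESTM = 1 << 11    # interrupt nesting enable
ALUSAT = 1 << 13   # ALU saturation
PEYEN = 1 << 21    # SIMD mode
CBUFEN = 1 << 24   # circular buffering


def mode1_after_status_push(mode1, mmask):
    """Clear every MODE1 bit whose MMASK bit is set; keep the rest."""
    return mode1 & ~mmask


mmask = BR8 | ALUSAT | PEYEN
mode1_before = BR8 | SRD2H | NESTM | ALUSAT | PEYEN | CBUFEN
mode1_after = mode1_after_status_push(mode1_before, mmask)
```

With this mask, ALU saturation, SIMD, and bit-reversing for I8 are disabled after the push, while the DAG2 secondary registers, interrupt nesting, and circular buffering keep their settings.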
priority interrupts are latched as they occur, but the processor processes
them according to their priority after the nested routines finish.
The IMASKP bits in the IMASKP register and the MSKP bits in the LIRPTL reg-
ister list the interrupts in priority order and provide a temporary interrupt
mask for each nesting level.
[Figure: Interrupt Nesting (NESTM bit = 1); axis: priority; levels: Main, ISR1, ISR2]
as they occur and the processor processes them in the order of their prior-
ity, after the active routine finishes.
Programs should change the interrupt nesting enable (NESTM) bit only
while outside of an interrupt service routine or during the reset service
routine.
The MSKP bits in the LIRPTL register and the entire set of IMASKP
registers are for interrupt controller use only. Modifying these bits
interferes with the proper operation of the interrupt controller.
Loop Sequencer
The main role of the sequencer is to generate the address for the next
instruction fetch. In normal program flow, the next fetch address is the
previous fetch address plus one (plus three in VISA). When the program
deviates from this standard course (for example, with calls, returns, jumps,
or loops), the program sequencer uses special logic. In cases of program
loops, the sequencer logic:
• Updates the PC stack with the top of loop address.
• Updates the loop stack with the address of the last instruction of
the loop.
• Initializes the LCNTR/CURLCNTR registers and updates the loop
counter stack, if the loop is counter based (do until lce).
• Generates the loop-back (go to the beginning of loop) and loop
abort (come out of loop, fetch next instruction from “last instruc-
tion of loop plus one” address) signals, according to defined
termination condition.
• Generates the abort signals to suppress some of the extra fetched
instructions (in case of special loops, some unwanted instructions
may get fetched).
• Provides correct instructions (via loop buffer) to the instruction
bus (in case of one and two instruction loops).
• Handles interrupts without distorting the intended loop-sequenc-
ing (until or unless interrupt service routine deliberately
manipulates the status of loop-sequencer resources).
• Handles the branches from within the loop to outside the loop or
to some other instruction, within the loop. Updates the loop
resources if a branch is paired with an abort option.
• Handles the different types of returns from a subroutine and manages
loop-sequencer resources accordingly.
• Provides access to non-loop related instructions (like write, read,
push, pop).
Restrictions
There are some restrictions that apply to loop instructions. These restric-
tions can be classified as general (for example applicable to counter,
arithmetic and short loops), or specific (for example arithmetic only, or
short loops only).
Functional Description
A loop occurs when a DO/UNTIL instruction causes the processor to repeat a
sequence of instructions until a condition tests true, or indefinitely by using
FOREVER as the termination condition. Unlike other processors, the SHARC
processors automatically evaluate the loop termination condition and
modify the program counter (PC) register appropriately. This allows zero
overhead looping.
A DO UNTIL instruction may be broadly classified as counter based,
arithmetic, or indefinite.
DO/UNTIL Termination; => pushes loop count onto loop count stack
instruction 1; => pushes top loop address onto PC stack
instruction 2;
...
...
Instruction n; => pushes end loop address onto loop address stack
condition tests true, the sequencer terminates the loop and fetches the
next instruction after the end of the loop, popping the loop and PC stacks.
Table 4-13 and Table 4-14 show the instruction pipeline states for loop
iteration and termination.
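The stack activity of a counter based loop can be traced with a short Python simulation. It is a functional sketch only: pipeline timing is ignored, one address per instruction is assumed, and the function name and addresses are hypothetical.

```python
def run_counter_loop(do_address, end_address, count):
    """Simulate a counter based DO/UNTIL loop at the address level.

    Returns (trace of executed addresses, next address after the loop,
    final stack state)."""
    if count < 1:
        raise ValueError("count must be at least 1 in this model")
    stacks = {"pc": [], "loop_address": [], "loop_count": []}
    stacks["pc"].append(do_address + 1)         # top-of-loop address
    stacks["loop_address"].append(end_address)  # last instruction of loop
    stacks["loop_count"].append(count)

    trace = []
    pc = stacks["pc"][-1]
    while True:
        trace.append(pc)
        if pc != stacks["loop_address"][-1]:
            pc += 1                             # sequential flow in the body
        elif stacks["loop_count"][-1] == 1:     # counter expired
            for stack in stacks.values():
                stack.pop()                     # pop loop and PC stacks
            return trace, pc + 1, stacks
        else:
            stacks["loop_count"][-1] -= 1
            pc = stacks["pc"][-1]               # loop back to top of loop


trace, next_address, stacks = run_counter_loop(0x100, 0x103, count=2)
```

For a three-instruction body executed twice, the trace visits the body addresses twice, then continues at the address after the loop with all three stacks empty again.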
Loop Stack
The loop controller supports a stack that automatically saves the various
loop addresses and loop counts. This is required for nesting operations,
including loop abort calls or jumps.
The loop controller uses the loop and program stack for its operation.
Manipulation of these stacks by using PUSH/POP instructions
and explicit writes to these stacks may affect the correct functioning
of the loop.
The sequencer keeps the loop counter stack synchronized with the
loop address stack. Both stacks always have the same number of
locations occupied. Because these stacks are synchronized, the same
empty and overflow status flags from the STKYx register apply to
both stacks.
• Loop stacks overflowed. Bit 25 (LSOV) indicates that the loop
counter stack and loop stack are overflowed (if set to 1) or not
overflowed (if set to 0)— LSOV is a sticky bit.
• Loop stacks empty. Bit 26 (LSEM) indicates that the loop counter
stack and loop stack are empty (if set to 1) or not empty (if set to
0)—not sticky, cleared by a PUSH.
Table A-7 on page A-23 lists the bits in the STKYx register.
Unlike previous SHARC processors with a 3-stage pipeline, the LCNTR reg-
ister in 5-stage processors no longer changes value unless explicitly loaded
as shown in the following example.
R12=0x8;
LCNTR = R12, do (PC,7) until lce;
nop;
nop;
nop;
nop;
nop;
dm(I0,M0) = LCNTR;
dm(I0,M0) = LCNTR;
/* 3-stage products: LCNTR is 8 in first 7 iterations, in the
last iteration it is 1.
For 5-stage products: LCNTR is always 8. */
During the normal execution of the counter based loop, CURLCNTR is dec-
remented in every iteration of the loop, when the end-of-loop instruction
is fetched. Therefore, the NOT LCE condition changes accordingly. Since
there are two cycles of latency for the NOT LCE condition to change after
CURLCNTR value has changed, an instruction with a branch on the NOT LCE
condition also has two cycles of latency. For all other instructions, the
latency is one cycle. The following is an example.
LCNTR = <COUNT>, DO End UNTIL LCE;
...
Instr(e-4); /* In last iteration CURLCNTR = 1 */
IF NOT LCE CALL (sub1); /* In all iterations branch is taken */
IF NOT LCE CALL (sub2); /* In all iterations branch is taken.
However, a non-branch instruction
aborts only in the last iteration */
IF NOT LCE <any type>; /* Branch aborts only in the last
iteration */
End: Instr(e)
Note that the latency is counted in terms of machine cycles and not in
terms of instruction cycles. Therefore, if the pipeline is stalled for some
reason (for example for a DMA) the behavior is different from that shown
in the example.
Arithmetic Loops
Arithmetic loops are loops where the termination condition in the
DO/UNTIL loop is anything other than LCE. In this type of loop, where the
body has more than one instruction, the termination condition is checked
when the second instruction of the loop body is fetched. In loops that
contain a single instruction, the termination condition is checked in every
cycle after the DO/UNTIL instruction is executed. An example of an arithmetic
loop is given below.
R7 = 14;
R6 = 10;
R5 = 6;
Address b+1 b+2 b+3 b+4 b+5 nop nop b+6 b+7
Decode b+2 b+3 b+4 b+5 bnop b+1 nop b+6 b+7 b+8
b is the first instruction of the body of the loop and b+6 is the instruction after the loop
1. Cycle2: Loop back, next fetch instruction is b.
2. Cycle4: Termination condition tests true, loop-back aborts, PC and loop stacks popped.
Indefinite Loops
A DO FOREVER instruction executes a loop indefinitely, until an interrupt
or reset intervenes as shown below.
DO label UNTIL FOREVER; /* pushed LCNTR onto Loop count stack */
R6 = DM(I0,M0); /* pushed to PC stack */
R6 = R6 - 1;
IF EQ CALL SUB;
nop;
label: nop; /* pushed to loop address stack */
instructions in the last four instructions of a loop. This is required for two
reasons:
• To handle interrupts when the sequencer is fetching and executing
the last few instructions.
• To reliably detect the fetch of the last instruction.
The assembler automatically identifies the last four instructions of a hard-
ware loop and treats them appropriately.
In cases of short loops (loops with a body shorter than four instructions),
the above rule extends to state that all the instructions in the loop are left
uncompressed as shown in the following example.
[130000] LCNTR = N, DO the_end UNTIL LCE;
[130001] R0 = R0 + 1; /* short compute */
[130002] R0 = R0 + 1; /* short compute */
[130003] R0 = R0 + 1; /* compute */
[130006] R0 = R0 + 1; /* compute */
[130009] R0 = R0 + 1; /* compute */
[13000C] the_end:R0 = R0 + 1; /* compute */
Short loops that iterate fewer than the minimum number of times incur
up to three cycles of overhead, because there can be up to three
aborted instructions after the last iteration to clear the instruction
pipeline.
Table 4-16 summarizes all the cases of the loops and the way the termina-
tion condition is checked.
1 1, 2, 3 CURLCNTR==1 3
2 1 CURLCNTR==1 2
3 1 CURLCNTR==1 3
1 The termination condition is always checked when the last instruction of the loop is fetched,
(when the instruction that is four instructions before the end-of-loop is executed).
The following sections provide more detail for these types of loops.
Table 4-17 through Table 4-21 show the instruction pipeline execution
for counter based single instruction loops. Table 4-22 through Table 4-24
show the pipeline execution for counter based two instruction loops.
Table 4-25 and Table 4-26 show the pipeline execution for counter based
three instruction loops.
n is the loop start instruction and n+2 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+1. n+1 locked in decode stage.
2. Cycle2: Loop count (LCNTR) equals 5, Decode stalls.
3. Cycle3: n+1 stays in decode, n+1 put into fetch stage.
4. Cycle4: Last instruction fetched, counter expired tests true, n+1 stays in decode.
5. Cycle5: Loop back aborts, PC and Loop stacks popped, the instruction after the loop (n+2) is
put in fetch2.
6. Cycle6: Decode stage updates from fetch2.
n is the loop start instruction and n+2 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+1. n+1 locked in decode stage.
2. Cycle2: Loop count (LCNTR) equals 4, decode stalls.
3. Cycle3: LCNTR equals 4, n+1 stays in decode, last instruction fetched, counter expired tests true.
4. Cycle4: n+1 stays in decode, loop back aborts, PC and Loop stacks popped, the next instruction
after the loop (n+2) is put into fetch.
5. Cycle5: Decode stage updates from fetch2.
Decode n+1 n+1 n+1 n+1 nop nop nop n+2 n+3
nop
Fetch2 n+2 n+3 n+3 n+1 n+1 n+1 n+2 n+3 n+4
Fetch1 n+3 n+1 n+1 n+1 n+1 n+2 n+3 n+4 n+5
n is the loop start instruction and n+2 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+1. n+1 locked in decode stage.
2. Cycle2: Loop count (LCNTR) equals 3, decode stalls.
3. Cycle3: n+1 stays in decode, n+1 put in fetch1 stage.
4. Cycle4: n+1 stays in decode, n+1 put in fetch1 stage.
5. Cycle5: Last instruction fetched, counter expired tests true.
6. Cycle6: Loop-back aborts, PC and loop stacks popped, n+2 put in fetch1.
Decode n+1 nop n+1 nop nop nop n+2 n+3 n+4
Fetch2 n+2 n+3 n+3 n+1 n+1 n+2 n+3 n+4 n+5
Fetch1 n+3 n+1 n+1 n+1 n+2 n+3 n+4 n+5 n+6
n is the loop start instruction and n+2 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+1. n+1 locked in decode stage 2.
2. Cycle2: Loop count (LCNTR) equals 2, decode stalls.
3. Cycle3: n+1 stays in decode, n+1 put in fetch1 stage.
4. Cycle4: Last instruction fetched, counter expired tests true.
5. Cycle5: Loop-back aborts, PC and loop stacks popped.
Decode n+1 n+1nop n+1 nop n+1 nop n+1 nop n+2 n+3 n+4
n is the loop start instruction and n+2 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+1.
2. Cycle2: Loop count (LCNTR) equals 1, decode stalls.
3. Cycle3: Last instruction fetched, counter expired tests true.
4. Cycle5: Loop-back aborts, PC and loop stacks popped, n+2 put in fetch1 stage.
Decode n+1 n+2nop n+2 n+1 n+2 n+1 n+2 n+3 n+4
Fetch2 n+2 n+3 n+3 n+2 n+1 n+2 n+3 n+4 n+5
Fetch1 n+3 n+2 n+2 n+1 n+2 n+3 n+4 n+5 n+6
Note: n is the loop start instruction and n+3 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+2.
2. Cycle2: Loop count (LCNTR) equals 3, decode stalls.
3. Cycle3: Next fetch address determined as n+1, n+3 and n+2 held in Fetch2 and Fetch1
respectively.
4. Cycle4: n+1 supplied from loop buffer into decode, PC stack supplies top of loop address.
5. Cycle5: Last instruction fetched, counter expired tests true.
6. Cycle6: Loop-back aborts, PC and loop stacks popped.
Decode n+1 n+2 nop n+2 n+1 n+2 n+3 n+4 n+5
n is the loop start instruction and n+3 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+2.
2. Cycle2: Loop count (LCNTR) equals 2, decode stalls.
3. Cycle3: n+3, and n+2 held in fetch2 and fetch1 respectively counter expired tests true.
4. Cycle4: n+1 supplied from loop buffer into decode, loop-back aborts, PC and loop stacks
popped.
n is the loop start instruction and n+3 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+2.
2. Cycle2: Loop count (LCNTR) equals 1, decode stalls.
3. Cycle3: Last instruction fetched, counter expired tests true.
4. Cycle4: loop-back aborts, PC and loop stacks popped.
n is the loop start instruction and n+4 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+1.
2. Cycle2: Loop count (LCNTR) equals 2, fetch address determined by the given rule.
3. Cycle3: Last instruction fetched, counter expired tests true.
4. Cycle4: loop-back aborts, PC and loop stacks popped.
Decode n+1 n+2 n+3 nop nop nop n+4 n+5 n+6
Fetch2 n+2 n+3 n+1 n+2 n+3 n+4 n+5 n+6 n+7
Fetch1 n+3 n+1 n+2 n+3 n+4 n+5 n+6 n+7 n+8
n is the loop start instruction and n+4 is the instruction after the loop.
1. Cycle1: Next fetch address determined as n+1.
2. Cycle2: Loop count (LCNTR) equals 1, fetch address determined by the given rule.
3. Cycle4: Last instruction fetched, counter expired tests true.
4. Cycle5: loop-back aborts, PC and loop stacks popped.
n is the loop start instruction and n+5 is the instruction after the loop
1. Cycle2: Loop count (LCNTR) equals 1, decode stalls
2. Cycle3: Last instruction fetched, Counter expired tests true
3. Cycle4: Loop-back aborts, PC and loop stacks popped
Nested Loops
Signal processing algorithms like FFTs and matrix multiplications require
nested loops. Nested loop constructs are built using multiple DO/UNTIL
instructions. For counter based loops, two separate loop counters operate
within the loop sequencer:
• The loop counter (LCNTR) register is the top-level entry of the loop
counter stack.
• The current loop counter (CURLCNTR) register counts iterations of the
current loop.
The CURLCNTR register tracks iterations for a loop being executed, and the
LCNTR register holds the count value before the loop is executed. The two
counters let the processor maintain the count for an outer loop, while a
program is setting up the count for an inner loop.
The loop logic decrements the value of CURLCNTR for each loop iteration.
Because the sequencer tests the termination condition four instruction
cycles before the end of the loop, the loop counter also is decremented
before the end of the loop. If a program reads CURLCNTR during these last
four loop instructions, the value is already the count for the next iteration.
The loop counter stack is popped four instructions before the end of the
last loop iteration. When the loop counter stack is popped, the new top
entry of the stack becomes the CURLCNTR value—the count in effect for the
executing loop. Two examples of nested loops are shown in Listing 4-1
and Listing 4-2.
A DO/UNTIL instruction pushes the value of LCNTR onto the loop counter
stack, making that value the new CURLCNTR value. The following procedure
and Figure 4-7 demonstrate this process for a set of nested loops. The pre-
vious CURLCNTR value is preserved one location down in the stack.
1. The processor is not executing a loop, and the loop counter stack is
empty (LSEM bit =1). The program sequencer loads LCNTR with
AAAA AAAA.
2. The processor is executing a single loop. The program sequencer
loads LCNTR with the value BBBB BBBB (LSEM bit =0).
3. The processor is executing two nested loops. The program
sequencer loads LCNTR with the value CCCC CCCC.
4. The processor is executing three nested loops. The program
sequencer loads LCNTR with the value DDDD DDDD.
5. The processor is executing four nested loops. The program
sequencer loads LCNTR with the value EEEE EEEE.
6. The processor is executing five nested loops. The program
sequencer loads LCNTR with the value FFFF FFFF.
7. The processor is executing six nested loops. The loop counter stack
(LCNTR) is full (LSOV bit =1).
A read of LCNTR when the loop counter stack is full results in invalid data.
When the loop counter stack is full, the processor discards any data writ-
ten to LCNTR.
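The push procedure above can be sketched as a small behavioral model. This is only an illustration in Python, not a description of the hardware: the class and method names are hypothetical, the six-entry depth is taken from step 7, and the model ignores pipeline timing entirely.

```python
class LoopCounterStack:
    """Toy model of the loop counter stack (hypothetical names, no timing)."""

    DEPTH = 6  # six nested loops fill the stack, per step 7 above

    def __init__(self):
        self.entries = []  # counts of active loops; the last item is CURLCNTR
        self.lcntr = None  # count staged in LCNTR for the next DO/UNTIL

    @property
    def lsem(self):
        """Stack-empty status bit: 1 when no loop is active."""
        return int(not self.entries)

    @property
    def lsov(self):
        """Stack-full status bit: 1 when six loops are active."""
        return int(len(self.entries) == self.DEPTH)

    @property
    def curlcntr(self):
        """Count of the loop currently executing, or None outside a loop."""
        return self.entries[-1] if self.entries else None

    def load_lcntr(self, value):
        """Stage a count for the next loop; discarded when the stack is full."""
        if self.lsov:
            return
        self.lcntr = value

    def do_until(self):
        """Push the staged LCNTR value, making it the new CURLCNTR."""
        if self.lcntr is None:
            raise RuntimeError("no valid LCNTR value to push")
        self.entries.append(self.lcntr)
        self.lcntr = None


stack = LoopCounterStack()
for count in (0xAAAAAAAA, 0xBBBBBBBB, 0xCCCCCCCC,
              0xDDDDDDDD, 0xEEEEEEEE, 0xFFFFFFFF):
    stack.load_lcntr(count)  # program loads LCNTR
    stack.do_until()         # DO/UNTIL pushes it onto the stack
```

After the six pushes the model reports the stack as full, and a further load of LCNTR is dropped, which mirrors the behavior stated for step 7.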
[Figure 4-7 shows the loop counter stack contents for steps 1 through 7,
with the positions of LCNTR and CURLCNTR at each step.]
Figure 4-7. Pushing the Loop Counter Stack for Nested Loops
Restrictions on Ending Nested Loops
• Nested loops with an arithmetic loop as the outer loop must place
the end address of the outer loop at least two addresses after the
end address of the inner loop.
• Nested loops with an arithmetic based loop as the outer loop that
use the loop abort instruction, JUMP (LA), to abort the inner loop,
may not use JUMP (LA) to the last instruction of the outer loop.
Loop Abort
The following sections describe different scenarios of how a hardware loop
is aborted or interrupted. As previously discussed, instruction and inter-
rupt driven branch mechanisms execute differently, causing different
effects for aborting loops.
The hardware for counter-based loops uses the current counter register,
CURLCNTR, such that it is decremented when the last instruction of the loop
is in the Fetch1 stage of the pipeline. This is done so that branching to the
beginning of the loop for the next iteration can occur without wasting any
cycles. In the case of a CALL or interrupt, this poses a problem since some
instructions are replaced with NOPs before branching to a subroutine or an
ISR, and these instructions are fetched again when the control returns. If
one of the instructions happens to be the end-of-loop instruction, the
CURLCNTR may be decremented twice. To avoid this, after the control
returns, the hardware freezes that counter for the number of fetches equal
to the number of instructions replaced with NOPs.
A special case of loop termination is the loop abort instruction, JUMP (LA).
This instruction causes an automatic loop abort when it occurs inside a
loop. When the loop aborts, the sequencer pops the PC and loop address
stacks once. If the aborted loop was nested, the single pop of the stacks
leaves the correct values in place for the outer loop. However, because
only one pop is performed, the loop abort cannot be used to jump more
than one level of loop nesting as shown in Listing 4-3.
For a branch (call), three instructions in the various stages of the pipeline
(Decode through Fetch1) are replaced with NOP instructions. Accordingly,
the hardware loop logic freezes the CURLCNTR for three fetch cycles on
return from a subroutine. The hardware determines this based on the
sequencer executing a RTS instruction. The immediate CALL may be one of
the last three instructions of a loop except for one instruction loops, or
two instruction one iteration loops as shown in Listing 4-4.
SUB: instruction;
instruction;
instruction;
RTS (LR); /* ensures proper re-entry in loop */
Table 4-28 shows a pipeline where a CALL is in the last but one instruction
of a loop. E = end-of-loop instruction, B = top-of-loop instruction.
Decode CALL E
Fetch2 CALL E B
For servicing the interrupt, four instructions in the various stages of the
pipeline (Address through Fetch1) are replaced with NOP instructions.
Accordingly, the hardware loop logic freezes the CURLCNTR for four fetch
cycles on return from an ISR. The hardware determines this based on the
sequencer executing a RTI instruction.
Table 4-29 shows a pipeline where an interrupt is being serviced in a loop.
E = end-of-loop instruction, B = top-of-loop instruction. E–1 is the return
address.
Decode E–1 E
Fetch2 E–1 E B
Note that there is one situation where an ISR returns into the loop body
using the RTS instruction: when JUMP (CI) is used to convert an ISR to a
normal subroutine. Therefore, the return instruction alone cannot be used
to determine whether the sequencer branched off to a subroutine or an ISR.
For this reason, the hardware sets an additional (hidden) bit in the PCSTK
register before branching off to an ISR so that on return, with either an
RTI or a JUMP (CI) + RTS, CURLCNTR can be frozen for four fetch cycles.
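The freeze rule described above can be summarized in a few lines of Python. This is a simplified sketch: the function name and the string labels for the entry and return types are hypothetical, and the "hidden bit" is modeled as a plain boolean.

```python
FLUSHED_ON_CALL = 3       # Decode through Fetch1 replaced with NOPs
FLUSHED_ON_INTERRUPT = 4  # Address through Fetch1 replaced with NOPs


def curlcntr_freeze_cycles(entry, return_instruction):
    """Return how many fetch cycles CURLCNTR is frozen on return to a loop.

    entry              -- "call" or "interrupt" (how the loop was left)
    return_instruction -- "rts" or "rti" (how control comes back)
    """
    if entry not in ("call", "interrupt"):
        raise ValueError("entry must be 'call' or 'interrupt'")
    if return_instruction not in ("rts", "rti"):
        raise ValueError("return_instruction must be 'rts' or 'rti'")
    # The hidden bit is set before branching to an ISR, so the decision does
    # not depend on which return instruction is eventually executed.
    hidden_isr_bit = entry == "interrupt"
    return FLUSHED_ON_INTERRUPT if hidden_isr_bit else FLUSHED_ON_CALL
```

An ISR converted to a subroutine with JUMP (CI) returns with RTS, yet the model still yields four cycles, because the decision follows the hidden bit rather than the return instruction.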
SUB: instruction;
instruction;
instruction;
RTS (LR); /* ensures proper re-entry in loop */
In RTOS based systems, a fundamental requirement for context switching
enforces a save of all core registers on the software stack, including the
core stack registers.
The SHARC processor prohibits any modification of loop resources, such
as the PCSTK, LADDR, and CURLCNTR registers within the loop (including
subroutines and ISRs starting from a loop) as doing this may adversely
affect the proper function of the looping operation for reasons described
below.
Short loops (those with 1, 2, or 3 instructions in the loop body with a
small iteration count) are handled differently in hardware from other
loops. The exact characterization of the loop, short or otherwise, is
determined when the loop startup instruction (DO … UNTIL termination) is
executed and is retained during execution of the loop. This start information
is not stored in a state register that is popped and pushed along with the
LADDR/CURLCNTR and PCSTK registers. During normal nesting of the loops
within a short loop, hardware recreates this information based on the stack
Use the following sequence to pop and push LADDR/CURLCNTR and PCSTK
inside an active loop to temporarily vacate the stacks. A code example is
shown in Listing 4-6.
1. Pop LOOP and PCSTK after storing the value of the CURLCNTR, LADDR,
and PC registers.
2. Use the empty entry/entries of stacks.
3. Recreate the loops by performing the following steps in the
prescribed sequence.
a. Push LOOP stack.
b. Load the value of CURLCNTR.
c. Load the LADDR.
d. Push the PCSTK.
e. Load the PC with the stored value.
Sequence a–b–c is critical and must be followed strictly. Any number of
unrelated instructions may be executed between the steps of the a–b–c
sequence.
Listing 4-6. Sequence for Pop and Push of Two-deep Nested Loops
In Listing 4-6, LADDR is restored after CURLCNTR. This ensures that when
LADDR is restored, the correct value of loop count is available. At the time
of LADDR restoration, the hardware recreates the information about the
exact characterization of the loop.
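As an illustration of the ordering constraint only, the following Python sketch checks that a recorded list of restore steps keeps a, b, and c in the required relative order while tolerating unrelated steps in between. The step labels are hypothetical names chosen for this example; they are not assembler mnemonics.

```python
# Steps a, b, c from the procedure above, in their required relative order.
REQUIRED_ORDER = ("push_loop", "load_curlcntr", "load_laddr")


def restore_order_ok(steps):
    """Return True if steps a-b-c all occur and occur in the required order.

    Unrelated steps may appear anywhere between them.
    """
    positions = []
    for name in REQUIRED_ORDER:
        if name not in steps:
            return False
        positions.append(steps.index(name))
    # Each step has a distinct index, so sorted order means a before b before c.
    return positions == sorted(positions)


good = ["push_loop", "unrelated", "load_curlcntr", "load_laddr",
        "push_pcstk", "load_pc"]
bad = ["push_loop", "load_laddr", "load_curlcntr", "push_pcstk", "load_pc"]
```

Here `good` passes and `bad` fails because LADDR is loaded before CURLCNTR, which is the case the text warns about: the loop count would not yet be available when the hardware recreates the loop characterization.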
• Use of the JUMP (CI) + RTS (LR) instruction for returning from an
ISR to a counter-based loop may not work if the ISR involves sav-
ing and restoring the PCSTK.
Therefore, in application code that requires that the LADDR/CURLCNTR and
PCSTK be saved and restored, in addition to following the sequence
Cache Control
This section describes cache control, which is used for internal and
external instruction fetches.
Functional Description
Cache performance (hits) improves if code is executed periodically or
repetitively (for example, as function calls, PC-relative negative jumps, or
loops). For linear program flow, cache entries are only filled (misses) and,
depending on the code size, overwritten.
Cache Miss
In the instruction PM(Ip,Mq) = UREG, the data access over the PMD bus
conflicts with the fetch of instruction n+2 (shown in Table 4-30). In this
case the data access completes first. This is true of any program memory
data access type instruction. This stall occurs only when the instruction to
be fetched is not cached.
Decode n n+1
Note that the cache stores the fetched instruction (n+2), not the instruc-
tion requiring the program memory data access.
When the processor first encounters a bus conflict, it must stall for one
cycle while the data is transferred, and then fetch the instruction in the
following cycle. To prevent the same delay from happening again, the pro-
cessor automatically writes the fetched instruction to the cache. The
sequencer checks the instruction cache on every data access using the PM
bus. If the instruction needed is in the cache, a cache hit occurs. The
instruction fetch from the cache happens in parallel with the program
memory data access, without incurring a delay.
If the instruction needed is not in the cache, a cache miss occurs, and the
instruction fetch (from memory) takes place in the cycle following the
program memory data access, incurring one cycle of overhead. The
fetched instruction is loaded into the cache (if the cache is enabled and
not frozen), so that it is available the next time the same instruction (that
requires program memory data) is executed.
Figure 4-8 shows a block diagram of the 2-way set associative instruction
cache. The cache holds 32 instruction-address pairs. These pairs (or cache
entries) are arranged into 16 (15–0) cache sets according to the four least
significant bits (3–0) of their address. The two entries in each set (entry 0
and entry 1) have a valid bit, indicating whether the entry contains a valid
instruction. The least recently used (LRU) bit for each set indicates which
entry was not placed in the cache last (0 = entry 0 and 1 = entry 1).
The cache places instructions in entries according to the four LSBs of the
instruction’s address. When the sequencer checks for an instruction to
fetch from the cache, it uses the four address LSBs as an index to a cache
set. Within that set, the sequencer checks the addresses of the two entries
as it looks for the needed instruction. If the cache contains the instruction,
the sequencer uses the entry and updates the LRU bit (if necessary) to indicate
the entry that did not contain the needed instruction.
When the cache does not contain a needed instruction, it loads a new
instruction and address and places them in the least recently used entry of
the appropriate cache set. The cache then toggles the LRU bit, if necessary.
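The lookup and replacement policy described above can be sketched as a small Python model. It is a simplification under stated assumptions: the class name is hypothetical, the full address is stored as the tag, an empty entry (None) stands in for a cleared valid bit, and the freeze and disable modes are not modeled.

```python
class InstructionCache:
    """Toy 2-way set-associative cache: 16 sets x 2 entries = 32 entries."""

    SETS = 16

    def __init__(self):
        # tags[set][entry] holds a cached address, or None if not valid.
        self.tags = [[None, None] for _ in range(self.SETS)]
        # lru[set] is the entry (0 or 1) that was least recently used.
        self.lru = [0] * self.SETS

    def access(self, address):
        """Look up an instruction address; return True on hit, False on miss.

        On a miss the address is loaded into the least recently used entry.
        """
        index = address & 0xF  # four address LSBs select the set
        entries = self.tags[index]
        for entry in (0, 1):
            if entries[entry] == address:
                self.lru[index] = 1 - entry  # the other entry is now LRU
                return True
        victim = self.lru[index]
        entries[victim] = address
        self.lru[index] = 1 - victim  # toggle: the other entry is now LRU
        return False


cache = InstructionCache()
trace = [cache.access(a) for a in (0x104, 0x104, 0x204, 0x104)]
```

In this trace the first access misses, the repeat hits, a second address in the same set misses without displacing the first, and the first address still hits afterwards.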
cache for a miss and executed internally for the next hit. For more infor-
mation, see the processor-specific hardware reference manual.
Block Conflicts
A block conflict occurs when multiple data accesses are made to the same
block in memory from which the instructions are executed. For more
information, see Chapter 7, Memory.
The caching of instructions happens in the Fetch and Decode stages of the
instruction pipeline.
• Fetch1 Stage – The core launches the instruction fetch address in
the Fetch1 stage. In this stage, the PM address is matched with the
existing addresses in the cache. If the address is found in the cache,
then a cache hit occurs, else a cache miss occurs. In case of a cache
miss, the PM address is loaded into the cache in this stage.
For execution from internal memory, the PM address matching
happens only when the instruction fetch conflicts with a PM data
access (PMD).
For execution from external memory, the address is matched for all
instructions that are fetched.
• Fetch2 Stage – In case of a cache miss, the instruction data is
driven by the memory PMD in this stage. In the case of a cache hit,
the instruction PMD is read out from the cache in this stage.
• Decode Stage – In case of a cache miss, the instruction read from
the 48-bit PMD memory in the Fetch2 stage is loaded into the
cache in this stage.
Table 4-31, Table 4-32 and Table 4-33 illustrate the pipeline versus cache
operation.
If the cache hit immediately follows a cache miss of the same address
(Table 4-33), then the instruction would not have been loaded into the
cache by then. In this case, the instruction is driven directly from the
input instruction load bus of the cache instead of the cache itself.
Table 4-34 and Table 4-35 illustrate the pipeline versus cache operation
in external memory.
Cache Efficiency
Cache operation is usually efficient and requires no intervention. How-
ever, certain ordering in the sequence of instructions can work against the
cache’s architecture, reducing its efficiency. When the order of PM data
accesses and instruction fetches continuously displaces cache entries and
loads new entries, the cache does not operate efficiently. Rearranging the
order of these instructions remedies this inefficiency. Optionally, a
dummy PM read can be inserted to trigger the cache.
When a cache miss occurs, the needed instruction is loaded into the cache
so that if the same instruction is needed again, it will be available (that is,
a cache hit will occur). However, if another instruction whose address is
mapped to the same set displaces this instruction and loads a new
instruction, a cache miss occurs. The LRU bits help to reduce the occur-
rence of a cache miss since at least two other instructions, mapped to the
same set, are needed before an instruction is displaced. If three instruc-
tions mapped to the same set are all needed repeatedly, cache efficiency
(that is, the cache hit rate) can go to zero. To keep this from happening,
move one or more instructions to a new address that is mapped to a differ-
ent cache set.
An example of inefficient cache code appears in Table 4-36. The PM bus
data access at address 0x101 in the loop, OUTER, causes a bus conflict and
also causes the cache to load the instruction being fetched at 0x104 (into
set 4). Each time the program calls the subroutine, INNER, the program
memory data accesses at 0x201 and 0x211 displace the instruction at
0x104 by loading the instructions at 0x204 and 0x214 (also into set 4).
If the program rarely calls the INNER subroutine during the OUTER loop exe-
cution, the repeated cache loads do not greatly influence performance. If
the program frequently calls the subroutine while in the loop, cache ineffi-
ciency has a noticeable effect on performance. To improve cache efficiency
on this code (if for instance, execution of the OUTER instruction of the loop
is time critical), rearrange the order of some instructions. Moving the
subroutine call up one location (starting at 0x201) also works. By using
that order, the two cached instructions end up in cache set 5, instead of set
4.
0x0103 f3 = f2 * f2;
0x0105 r1 = r0-r15;
...
...
...
0x021F rts;
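The set numbers quoted for this example follow directly from the four address LSBs. The short Python sketch below reproduces them; the three-address offset between a PM data access and the instruction that gets cached is inferred from the addresses in the example (0x101 and 0x104), and the shifted addresses 0x202 and 0x212 are an assumption about what moving the subroutine by one location would produce.

```python
FETCH_OFFSET = 3  # inferred: a PM data access at 0x101 caches the fetch at 0x104


def cache_set(address):
    """Return the cache set selected by the four address LSBs."""
    return address & 0xF


def cached_instruction(pm_access_address):
    """Return the address of the instruction cached because of a bus conflict."""
    return pm_access_address + FETCH_OFFSET


original = [cache_set(cached_instruction(a)) for a in (0x101, 0x201, 0x211)]
shifted = [cache_set(cached_instruction(a)) for a in (0x202, 0x212)]
```

With the original layout all three cached instructions compete for set 4, which holds only two entries; after the shift the two subroutine entries land in set 5 and no longer displace the entry for the OUTER loop.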
Operating Modes
The following sections describe the cache operating modes.
Cache Restrictions
The following restrictions on cache use should be noted.
• If the cache freeze bit of the MODE2 register is set by instruction n,
then this feature is effective from the n+2 instruction onwards.
This results from the effect latency of the MODE2 register.
• When a program changes the cache mode, an instruction contain-
ing a program memory data access must not be placed directly after
a cache enable or cache disable instruction. This is because the pro-
cessor must wait at least one cycle before executing the PM data
access. A program should have a NOP (no operation) or other
non-conflicting instruction inserted after the cache enable or cache
disable instruction.
Cache Disable
The cache disable bit (bit 4, CADIS) directs the sequencer to disable the
cache (if 1) or enable the cache (if 0).
Note that the FLUSH CACHE instruction has a 1 cycle instruction latency
when the next instruction/data is executed from internal memory and a 2 cycle
instruction latency when the next instruction/data is executed from external
memory.
Cache Freeze
The cache freeze bit (bit 19, CAFRZ) directs the sequencer to freeze the con-
tents of the cache (if 1) or let new entries displace the entries in the cache
(if 0).
Freezing the cache prevents any changes to its contents: a cache miss does
not result in a new instruction being stored in the cache. Disabling the
cache stops its operation completely: all instruction fetches conflicting
with program memory data accesses are delayed. These functions are
selected by the CADIS (cache enable/disable) and CAFRZ (cache freeze) bits
in the MODE2 register.
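For reference, the bit positions given above translate into the following masks. This Python sketch only decodes a MODE2 value; the function name is hypothetical, and treating the disable bit as taking precedence over the freeze bit is an assumption based on the statement that disabling stops cache operation completely.

```python
CADIS = 1 << 4   # cache disable, MODE2 bit 4
CAFRZ = 1 << 19  # cache freeze, MODE2 bit 19


def cache_state(mode2):
    """Classify the cache state implied by a MODE2 register value."""
    if mode2 & CADIS:
        return "disabled"  # assumed to override the freeze bit
    if mode2 & CAFRZ:
        return "frozen"    # hits still served, no new entries loaded
    return "enabled"
```

For example, `cache_state(CAFRZ)` gives "frozen" and `cache_state(0)` gives "enabled".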
I/O Flags
There are 16 general-purpose I/O flags in SHARC processors. Each FLAG
pin (3–0) has four dedicated signals. All flag pins can be multiplexed with
parallel/external port pins. The FLAG4-15 pins are also accessible to the sig-
nal routing unit (SRU). A flag pin can be routed to a DAI/DPI pin and
therefore operate in parallel to the parallel/external port. Refer to the
product-specific hardware reference manual for more information.
Programs cannot change the output selects of the FLAGS register and
provide a new value in the same instruction. Instead, programs
must use two write instructions: the first to change the output
select of a particular FLAG pin, and the second to provide the new
value, as shown below.
bit set flags FLG2O; /* set flag2 as output */
bit clr flags FLG2; /* set flag2 output low */
The FLAGS register is used to control all FLAG15-0 pins. Based on FLAG reg-
ister effect latency and internal timings there must be at least 4 wait states
in order to toggle the same flag correctly as shown in the following exam-
ple. For more information refer to the specific product data sheet.
The processor records status for the PEx element in the ASTATx and STKYx
registers and the PEy element in the ASTATy and STKYy registers.
Condition from   Description               True if       Mnemonic
ALU              ALU ≥ 0                   footnote 3    GE
ALU              ALU ≤ 0                   footnote 4    LE
ALU              ALU carry                 AC = 1        AC
ALU              ALU not carry             AC = 0        NOT AC
ALU              ALU overflow              AV = 1        AV
ALU              ALU not overflow          AV = 0        NOT AV
Multiplier       Multiplier overflow       MV = 1        MV
Multiplier       Multiplier not overflow   MV = 0        NOT MV
Multiplier       Multiplier sign           MN = 1        MS
Multiplier       Multiplier not sign       MN = 0        NOT MS
1 ALU greater than (GT) is true if: [AF and (AN xor (AV and ALUSAT)) or (AF and AN)] or AZ = 0
2 ALU less than (LT) is true if: [AF and (AN xor (AV and ALUSAT)) or (AF and AN and AZ)] = 1
3 ALU greater equal (GE) is true if: [AF and (AN xor (AV and ALUSAT)) or (AF and AN and AZ)] = 0
4 ALU lesser or equal (LE) is true if: [ AF and (AN xor (AV and ALUSAT)) or (AF and AN)] or AZ = 1
5 For ADSP-214xx processors and later.
6 Does not have a complement.
Operating Modes
The following sections describe the operating modes for conditional
instruction execution.
Even though the processor has dual processing elements PEx and
PEy, the sequencer does not have dual sets of stacks.
The sequencer has one PC stack, one loop address stack, and one loop
counter stack. The status bits for stacks are in the STKYx register and are
not duplicated in the STKYy register.
1 Complementary pairs are registers with SIMD complements, including PEx/y data registers and
USTAT1/2, USTAT3/4, ASTATx/y, STKYx/y, and PX1/2 Uregs.
2 Uncomplementary registers are Uregs that do not have SIMD complements.
In SIMD mode, two independent bit tests can occur from individual reg-
isters as shown in the following example.
bit set mode1 PEYEN;
nop;
r2=0x80000000;
ustat1=r2;
bit TST ustat1 BIT_31; /* test bit 31 in ustat1/ustat2 */
if TF call SUB; /* branch if both cond are true */
if TF r10=r10+1; /* compute on any cond */
Conditional Compute
This section lists the various register file move types and illustrates
them with examples.
1 In SISD mode, the conditional applies only to the entire operation and is only tested against
PEx’s flags. When the condition tests true, the entire operation occurs.
2 In SIMD mode, the conditional applies separately to the explicit and implicit transfers. Where
the condition tests true (PEx for the explicit and PEy for the implicit), the operation occurs in
that processing element.
In all the cases described above, the behavior is the same: the transfer
occurs only if the condition in PEx is true.
For this instruction, the processors are operating in SIMD mode, a register
in the PEx data register file is the explicit register, and I0 is pointing to
an even address in internal memory (on ADSP-214xx products, external memory
is also allowed). Indirect addressing is shown in the instructions in the
example. However, the same results occur using direct addressing. The
data movement resulting from the evaluation of the conditional test in the
PEx and PEy processing elements is shown in Table 4-45.
IF EQ DM(I0,M0) = R2;
1 In NW space n = 1, in SW space n = 2
1 In NW space n = 1, in SW space n = 2
For the following instructions the processors are operating in SIMD mode
and the explicit register is either a PEx register or PEy register. I0 points
to IOP memory space. This example shows indirect addressing. However,
the same results occur using direct addressing.
IF EQ DM(I0,M0) = R2;
IF EQ DM(I0,M0) = S2;
In the case of memory-to-DAG register moves, the transfer does not occur
when both PEx and PEy are false. Otherwise, if either PEx or PEy is true,
transfers to the DAG register occur. For example:
if EQ m13 = dm(i0,m1);
Conditional Branches
1 (true) IF exe
PEx        PEy        Branch       ELSE compute
0 (false)  1 (true)   IF not exe   PEx exe, PEy not exe
1 (true)   0 (false)  IF not exe   PEx not exe, PEy exe
1 (true)   1 (true)   IF exe       PEx not exe, PEy not exe
For more information and examples, see the following instruction refer-
ence pages.
• “Type 8a ISA/VISA (cond + branch)” on page 9-32
• “Type 9a ISA/VISA (cond + Branch + comp/else comp)” on
page 9-35
• “Type 10a ISA (cond + branch + else comp + mem data move)” on
page 9-40
• “Type 11a ISA/VISA (cond + branch return + comp/else comp)
Type 11c VISA (cond + branch return)” on page 9-44
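The SIMD rows of the conditional branch table above can be expressed as a small truth function. This Python sketch uses hypothetical names; the result it gives when both conditions are false is an extrapolation of the same rule, since that row does not appear in the table as reproduced here.

```python
def simd_branch_with_else(cond_pex, cond_pey):
    """Model a SIMD conditional branch with an ELSE compute.

    The branch is taken only when the condition is true in both processing
    elements; the ELSE compute runs in each element whose condition is false.
    """
    return {
        "branch": cond_pex and cond_pey,
        "else_pex": not cond_pex,
        "else_pey": not cond_pey,
    }
```

For instance, `simd_branch_with_else(False, True)` reports no branch, with the ELSE compute executing in PEx only.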
When this occurs, the instruction that is to update the value and the fol-
lowing instruction, (if not dependent on the new value), are allowed to
execute. If the following instruction needs the updated value, then that
instruction and the instructions that follow it in the earlier stages of the
instruction pipeline are stalled.
The conditions under which data/control hazard stalls occur are described
in the following sections.
M0 = 1;
DM(I2, M0) = R1; /* stalls for 2 cycles */
L2 = 1;
DM(I2, M0) = R1; /* stalls for 2 cycles */
M3 = 1;
DM(I3, M0) = R1; /* no stalls */
In the example shown in Table 4-51, M0 is written back at the end of the
execution stage, while the DM access instruction reads M0 in the Decode
stage to generate the address. The first instruction is allowed to execute
normally, while the remaining instructions are delayed by two cycles.
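The stall count can be reasoned about as the distance between the Decode-stage read and the Execute-stage write-back. The Python sketch below encodes that reasoning; the function name is hypothetical, and the zero-stall result for a distance of three or more instructions is an assumption extrapolated from the one-apart and two-apart cases.

```python
def dag_load_stall_cycles(instructions_after_load, uses_loaded_register=True):
    """Estimate stall cycles for an indirect access after a DAG register load.

    instructions_after_load -- 1 if the access immediately follows the load,
                               2 if one unrelated instruction sits in between
    uses_loaded_register    -- False when the access does not depend on the
                               register that was just loaded
    """
    if instructions_after_load < 1:
        raise ValueError("the access must come after the load")
    if not uses_loaded_register:
        return 0
    return max(0, 3 - instructions_after_load)
```

With this model, the immediately following access costs two cycles, one intervening instruction reduces it to one, and an access that does not use the loaded register (such as the M3 case above) costs none.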
Table 4-51. Indirect Access One Cycle After DAG Register Load
Cycles 1 2 3 4 5
Execute M0 = 1
Table 4-52. Indirect Access Two Cycles After DAG Register Load
Cycles 1 2 3 4 5
1 Three stage pipeline. These products are not included in this manual.
2 Five stage pipeline. These products are all included in this manual.
Branch Stalls
A data stall can also occur when a register in a DAG is loaded and either of
the following two instructions shown in the code examples below attempts
to generate an indirect target address based on that DAG register for a
branch such as a JUMP or CALL. This happens because the address genera-
tion requires the values of the related DAG register to be read in the
Decode stage, while the load of any register completes in the Execute stage
of the pipeline. The JUMP or CALL itself has three cycles of overhead as
described in “Instruction Driven Branches” on page 4-15.
M8 = 1;
JUMP (M8,I9); /* stalls for two cycles */
In the example shown in Table 4-54, M8 is written back at the end of the
Execute stage of the pipeline, while the following JUMP (or CALL) instruc-
tion has to read M8 in the Decode stage to generate the target address. The
first instruction is allowed to complete normally, while all following
instructions are stalled for two cycles.
In the following code example, an unrelated instruction is inserted
between the write instruction to the DAG register and the jump instruc-
tion requiring address generation. In this instance, the pipeline stalls for
only one cycle.
M8 = 1;
R0 = 0x8; /* any unrelated instruction */
JUMP (M8,I9); /* stalls for one cycle */
Table 4-54. Indirect Branch One Cycle After DAG Register Load
Cycles 1 2 3 4 5 6 7 8 9
j = Branch address
1. Cycle2: Stall cycle
2. Cycle3: Stall cycle
3. Cycle4: I9 + M8 computed
3. The pipeline stalls for two cycles when a branch instruction, condi-
tional on NOT LCE (loop counter not expired), is in the Decode
stage and is immediately followed by any instruction involving a
change in an LCE (loop counter expired) condition, due to the exe-
cution of a DO/UNTIL, POP/PUSH, JUMP(LA) or load of the CURLCNTR
register. A one cycle stall occurs when the instruction is an opera-
tion other than a branch.
Table 4-56. Indirect Branch Two Cycles After DAG Register Load
Cycles 1 2 3 4 5 6
Also note that when this kind of instruction sequence has other reasons to
stall the pipeline, all the stalls arising out of different kinds of dependen-
cies may not merge and some stalls appear as redundant stall cycles.
To maximize the frequency of operation, the pipeline stalls when the
processor executes certain sequences of instructions. When a compute
operation involving any fixed-point operand register follows a
floating-point multiply operation, the pipeline stalls for one cycle while
the instruction involving the fixed-point register is in the Decode stage,
as shown in the following example. Note that the actual register used for
the operation is not relevant.
F0 = F0*F4;
F5 = FLOAT R1; /* stalls the pipe when in decode */
F0 = F0*F4;
R5 = LSHIFT R10 by 2; /* stalls the pipe when in decode */
F0 = F0*F4;
R5 = R5-1; /* stalls the pipe when in decode */
Loop Stalls
1. A JUMP(LA) stalls the instruction pipeline for one cycle when it is in
the Address stage of the instruction pipeline.
2. When the length of the counter based loop is one, two or four
instructions, the pipeline is stalled by one cycle after the DO/UNTIL
instruction.
3. A one cycle stall is incurred when a RTS (return from subroutine) or
RTI (return from interrupt) instruction causes the sequencer to
return to the last instruction of a loop instruction, and the RTI/RTS
is in the Address stage of the instruction pipeline. This is to avoid
the coincidence of two implicit operations of the PCSTK—one due
to the RTI/RTS instruction and the other due to the possible termi-
nation of the loop. The pipeline stalls so that the pop operation
from the RTI/RTS is executed first.
CJUMP Instruction
The following code examples show a two cycle data hazard stall that
occurs when DAG1 attempts to generate addresses based on the I6 register
or when either or both of the I6 or I7 registers are used as a source of some
data transfer operation immediately after a CJUMP instruction. This occurs
because the CJUMP instruction modifies the I6 register.
Example 1
CJUMP(_SUB1)(DB); /* executes R2 = I6,I6 = I7,
jump(_sub1) (db) */
DM(I6,M0) = R2; /*stalls for two cycles */
Example 2
CJUMP(_SUB1)(DB); /* executes R2 = I6,I6 = I7,
jump(_sub1) (db) */
R2 = I7; /* stalls for two cycles */
The CJUMP instruction is intended to be used by the compiler only.
Normally the compiler uses the following sequence of instructions
when calling a subroutine, which does not stall the pipeline.
CJUMP (_SUB1) (DB); /* executes R2 = I6, I6 = I7,
jump(_sub1) (db) */
DM(I7,M0) = R2; /* stores previous I6 */
DM(I7,M0) = PC; /* stores return_address–1 */
RFRAME Instruction
A data hazard stall occurs when DAG1 attempts to generate addresses
based on the I6 or I7 registers or when either or both of the I6 and I7 registers
are used as a source of some data transfer operation immediately after a
RFRAME instruction. This occurs because RFRAME modifies the I6 and I7
registers. In this situation, the pipeline is stalled for two cycles.
RFRAME; /* executes I7 = I6, I6 = dm(0,I6); */
DM(I6,M0) = R2; /* stalls for two cycles */
Sequencer Interrupts
This section describes the interrupts that are triggered by the sequencer
itself.
External Interrupts
For external interrupts (IRQ2–0, DAI, DPI) the processor supports two
types of interrupt sensitivity—edge-sensitive and level-sensitive. The
interrupt overview is shown in Table 4-57.
Software Interrupts
Software interrupts (or programmed exceptions) are instructions which
explicitly generate an exception. The interrupt overview is shown in
Table 4-58.
The IRPTL register provides four software interrupts. When a program sets
the latch bit for one of these interrupts (SFT0I, SFT1I, SFT2I, or SFT3I),
the sequencer services the interrupt, and the processor branches to the cor-
responding interrupt routine. Software interrupts have the same behavior
as all other maskable interrupts. For more information, see Appendix B,
Core Interrupt Control.
If programs force an interrupt by writing to a bit in the IRPTL register, the
processor recognizes the interrupt in the following cycle, and four cycles
of branching to the interrupt vector follow the recognition cycle.
Summary
To manage events, the sequencer’s interrupt controller handles interrupt
processing, determines whether an interrupt is masked, and generates the
appropriate interrupt vector address. With selective caching, the instruc-
tion cache lets the processor access data in program memory and fetch an
instruction (from the cache) in the same cycle. The DAG2 data address
generator outputs program memory data addresses.
Figure 4-2 on page 4-4 identifies all the functional blocks and their rela-
tionship to one another in detail.
The sequencer evaluates conditional instructions and loop termination
conditions by using information from the status registers. The loop
address stack and loop counter stack support nested loops. The status
stack stores status registers for implementing nested interrupt routines.
“Program Sequencer Registers” on page A-8 lists the registers within and
related to the program sequencer. All registers in the program sequencer
are universal registers (Uregs), so they are accessible to other universal reg-
isters and to data memory. All of the sequencer’s registers and the top of
stacks are readable and writable, except for the Fetch1, decode, and PC
registers. Pushing or popping the PC stack is done with a write to the PC
stack pointer, which is readable and writable. Pushing or popping the loop
address stack requires explicit instructions.
A set of system control registers configures or provides input to the
sequencer. A bit manipulation instruction permits setting, clearing, tog-
gling, or testing specific bits in the system registers. For information on
this instruction (bit) and the instruction set, see Chapter 9, Instruction
Set Types, and Chapter 11, Computation Types. Writes to some of these
registers do not take effect on the next cycle. For example, after a write to
the MODE1 register enables ALU saturation mode, the change takes effect
two cycles after the write. Also, some of these registers do not update on
the cycle immediately following a write. An extra cycle is required before a
register read returns the new value.
Features
The timer has the following features.
• Simple programming model of three registers for interval timer
• Provides high or low priority interrupt
• When the counter expires, the timer expired pin (TMREXP) is asserted
• The timer halts when the core is in emulation space
Functional Description
The bits that control the timer are given as follows:
• Timer enable. MODE2 Bit 5 (TIMEN). This bit directs the processor to
enable (if 1) or disable (if 0) the timer.
• Timer count. (TCOUNT) This register contains the decrementing
timer count value, counting down the cycles between timer
interrupts.
To start and stop the timer, programs use the MODE2 register’s TIMEN bit.
With the timer disabled (TIMEN = 0), the program loads TCOUNT with an
initial count value and loads TPERIOD with the number of cycles for the
desired interval. Then, the program enables the timer (TIMEN=1) to begin
the count.
On the core clock cycle after TCOUNT reaches zero, the timer automatically
reloads TCOUNT from the TPERIOD register. The TPERIOD value specifies the
frequency of timer interrupts. The number of cycles between interrupts is
TPERIOD + 1. The maximum value of TPERIOD is 2^32 – 1.
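The relationship between TPERIOD and the interrupt interval can be worked through numerically. The Python helpers below are a sketch with hypothetical names, and the 400 MHz core clock used in the example is an arbitrary illustrative value, not a figure taken from this manual.

```python
MAX_TPERIOD = 2**32 - 1  # largest value the 32-bit TPERIOD register can hold


def cycles_between_interrupts(tperiod):
    """Return the number of core clock cycles between timer interrupts."""
    if not 0 <= tperiod <= MAX_TPERIOD:
        raise ValueError("TPERIOD must fit in 32 bits")
    return tperiod + 1


def tperiod_for_interval(cclk_hz, interval_s):
    """Return the TPERIOD value that gives the requested interrupt interval."""
    cycles = round(cclk_hz * interval_s)
    tperiod = cycles - 1  # the interval is TPERIOD + 1 cycles
    if not 0 <= tperiod <= MAX_TPERIOD:
        raise ValueError("interval not representable in TPERIOD")
    return tperiod


# Example: 1 ms interval at an assumed 400 MHz core clock.
example_tperiod = tperiod_for_interval(400_000_000, 0.001)
```

The subtraction of one is the point of the example: loading the raw cycle count into TPERIOD would make every interval one cycle too long.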
The timer decrements the TCOUNT register during each clock cycle. When
the TCOUNT value reaches zero, the timer generates an interrupt and asserts
the TMREXP output pin high for several cycles (when the timer is enabled),
as shown in Figure 5-1. For more information about TMREXP pin muxing
refer to system design chapter in the processor-specific hardware reference.
Programs can read and write the TPERIOD and TCOUNT registers by using
universal register transfers. Reading the registers does not affect the timer.
Note that an explicit write to TCOUNT takes priority over the sequencer’s
loading TCOUNT from TPERIOD and the timer’s decrementing of TCOUNT.
Also note that TCOUNT and TPERIOD are not initialized at reset. Programs
should initialize these registers before enabling the timer.
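The count, reload, and interrupt behavior described above can be sketched as a small software model. This is only an illustration under stated assumptions: the function name simulate_timer is hypothetical, the timer is assumed to be already enabled, and the TIMEN start and stop latencies are not modeled.

```python
MASK32 = 0xFFFFFFFF  # TCOUNT and TPERIOD are 32-bit registers


def simulate_timer(tcount, tperiod, cycles):
    """Return the cycle numbers on which the timer-expired event occurs.

    Simplified model (assumption): one loop iteration is one core clock
    cycle, and the timer is already enabled.
    """
    events = []
    for cycle in range(cycles):
        if tcount == 0:
            # TCOUNT is zero: interrupt and TMREXP, then reload from TPERIOD.
            events.append(cycle)
            tcount = tperiod & MASK32
        else:
            # Otherwise the timer decrements TCOUNT on this cycle.
            tcount = (tcount - 1) & MASK32
    return events


expirations = simulate_timer(tcount=3, tperiod=3, cycles=12)
print(expirations)
```

In this model, successive expirations are TPERIOD + 1 cycles apart, which matches the interval stated in the text.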
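[Figure 5-1: timer block diagram. The 32-bit TPERIOD and TCOUNT
registers feed a multiplexer; TCOUNT is decremented each cycle; when
TCOUNT = 0, the timer generates an interrupt and asserts the TMREXP
pin.]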
Setting the TIMEN bit in the MODE2 register starts the timer, and
clearing it stops the timer. The latency of this bit is two core clock
cycles when the counter starts and one core clock cycle when the counter
stops, as shown in Figure 5-2.
[Figure 5-2: timer enable and disable timing relative to CCLK. The
timer is disabled by clearing TIMEN in MODE2.]
Timer Interrupts
The timer expired event (TCOUNT decrements to zero) generates two
interrupts, TMZHI and TMZLI. For information on latching and masking
these interrupts to select timer expired priority, see “Latching Interrupts”
on page 4-35.
The timer interrupt overview is shown in Table 5-1.
One event can cause multiple interrupts. The timer decrementing to zero
causes two timer expired interrupts to be latched, TMZHI (high priority)
and TMZLI (low priority). This feature allows selection of the priority for
the timer interrupt. Programs should unmask the timer interrupt with the
desired priority and leave the other one masked. If both interrupts are
unmasked, the processor services the higher priority interrupt first and
then services the lower priority interrupt.
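The priority selection described above can be illustrated with a minimal sketch. The function name and the boolean "unmasked" parameters are hypothetical; the real mask bits live in the interrupt mask register, which is not modeled here.

```python
def timer_interrupts_to_service(tmzhi_unmasked, tmzli_unmasked):
    """Return the timer interrupts serviced after one timer-expired event.

    The event latches both interrupts; only unmasked ones are serviced,
    and the higher priority interrupt (TMZHI) is serviced first.
    """
    latched_in_priority_order = ["TMZHI", "TMZLI"]
    unmasked = {"TMZHI": tmzhi_unmasked, "TMZLI": tmzli_unmasked}
    return [name for name in latched_in_priority_order if unmasked[name]]


# Recommended use: unmask exactly one of the two interrupts.
print(timer_interrupts_to_service(True, False))
print(timer_interrupts_to_service(False, True))
```

Unmasking both interrupts is possible, but the same event is then serviced twice, once at each priority.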
Features
The data address generators have the following features.
• Supply address and post-modify. Provides an address during a data
move and auto-increments the stored address for the next move.
• Supply pre-modified (indexed) address. Provides a modified
address during a data move without incrementing the stored
address.
• Modify address. Increments the stored address without performing
a data move.
• Bit-reverse address. Provides a bit-reversed address during a data
move without reversing the stored address, as well as an instruction
to explicitly bit-reverse the supplied address.
• Broadcast data loads. Performs dual data moves to complementary
registers in each processing element to support single-instruction
multiple-data (SIMD) mode.
Functional Description
As shown in Figure 6-1, each DAG has four types of registers. These regis-
ters hold the values that the DAG uses for generating addresses. The four
types of registers are:
• Index registers (I0–I7 for DAG1 and I8–I15 for DAG2). An index
register holds an address and acts as a pointer to memory. For
example, the DAG interprets DM(I0,0) and PM(I8,0) syntax in an
instruction as addresses.
• Modify registers (M0–M7 for DAG1 and M8–M15 for DAG2). A
modify register provides the increment or step size by which an
index register is pre- or post-modified (indexed) during a register
move. For example, the DM(I0,M1) instruction directs the DAG to
output the address in register I0 then modify the contents of I0
using the M1 register.
• Length and base registers (L0–L7 and B0–B7 for DAG1 and L8–
L15 and B8–B15 for DAG2). Length and base registers set the
range of addresses and the starting address for a circular buffer. For
more information on circular buffers, see “Circular Buffer Pro-
gramming Model” on page 6-21.
[Figure 6-1: DAG block diagram, showing the 64-bit bus connections, the
modular (circular buffer) logic and adder, the multiplexers controlled
by MODE1, bit-reverse mode for I0/I8 updates, the optional BITREV
instruction path for all I registers, and the outputs to the STKYx
register and the interrupt logic.]
I7=NW_addr;
dm(i7,0)=r8; /* 32-bit transfer */
I7=SW_addr;
dm(i7,0)=r14; /* 16-bit transfer */
32-Bit Alignment
The DAGs align normal word (32-bit) addressed transfers to the low order
bits of the buses. These transfers between memory and 32-bit DAG1 or
DAG2 registers use the 64-bit DM and PM data buses. Figure 6-2 illus-
trates these transfers.
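[Figure 6-2: 32-bit transfer on the DM or PM data bus. The register
value (bits 31–0) occupies bus bits 31–0; bus bits 63–32 are
0x0000 0000.]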
40-Bit Alignment
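[Figure: 40-bit transfer on the DM or PM data bus. The 32-bit register
value occupies bus bits 39–8; bus bits 63–40 are 0x00 00 00 and bus
bits 7–0 are 0x00.]
[Figure: 64-bit transfer on the DM or PM data bus. Two 32-bit register
values occupy bus bits 63–32 and bus bits 31–0.]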
Both DAGs are identical in their operation modes and can access the
entire memory-mapped space. However, the following differences should
be noted.
• Only DAG1 supports compiler-specific instructions such as RFRAME
and CJUMP.
• Only DAG2 supports flow control instructions for indirect
branches. Additionally, DAG2 accesses can cause cache misses or
hits during execution from internal memory.
If the long word transfer specifies an odd numbered DAG register (I1 or
B3), the odd numbered register value transfers on the lower half of the
64-bit bus, and the odd numbered register – 1 value (I0 or B2 in this
example) transfers on the upper half (bits 63–32) of the bus.
In both the even and odd numbered cases, the explicitly specified DAG
register sources or sinks bits 31–0 of the long word addressed memory.
The forced long word (LW) mnemonic affects only normal word
accesses and overrides all other factors (PEYEN, IMDWx).
All long word accesses load or store two consecutive 32-bit data values.
The register file source or destination of a long word access is a set of two
neighboring data registers (Table 6-1) in a processing element. In a forced
long word access (using the LW mnemonic), the even (normal word
address) location moves to or from the explicit register in the neigh-
bor-pair, and the odd (normal word address) location moves to or from
the implicit register in the neighbor-pair. In Listing 6-1 the following long
word moves could occur.
DM(0x98000) = R0 (LW);
R15 = DM(0x98003)(LW);
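The register and address pairing in a forced long word access can be sketched as follows. This is an illustrative model only: the function name is hypothetical, registers are represented by their numbers, and the neighbor of a register is assumed to be the register whose number differs in the least significant bit (R0/R1, R14/R15), in line with the I0/I1 and B2/B3 examples above.

```python
def forced_lw_access(explicit_reg, address):
    """Map the two normal word locations of a forced LW access to registers.

    Returns {address: register_number}. The even location pairs with the
    explicit register and the odd location pairs with its neighbor.
    """
    even_address = address & ~1       # long word starts at the even address
    neighbor_reg = explicit_reg ^ 1   # assumption: neighbor differs in bit 0
    return {even_address: explicit_reg, even_address + 1: neighbor_reg}


# DM(0x98000) = R0 (LW);  stores R0 and R1
print({hex(a): f"R{r}" for a, r in forced_lw_access(0, 0x98000).items()})
# R15 = DM(0x98003)(LW);  loads R15 and R14
print({hex(a): f"R{r}" for a, r in forced_lw_access(15, 0x98003).items()})
```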
The forced long word (LW) mnemonic can be used for context switching
between tasks in system applications. It affects only normal word address
accesses and overrides all other factors (PEYEN, IMDWx bit settings), as
shown in Listing 6-2.
pm(i15,m15)=i0(lw);
/*until*/
pm(i15,m15)=i6(lw);
dm(i7,m7)=i8(lw);
/*until*/
dm(i7,m7)=i14(lw);
i0=pm(i15,m15)(lw);
/*until*/
i6=pm(i15,m15)(lw);
i8=dm(i7,m7)(lw);
/*until*/
i14=dm(i7,m7)(lw);
Pre-Modify Instruction
As shown in Figure 6-5, the DAGs support two types of modified address-
ing, pre- and post-modify. Modified addressing is used to generate an
address that is incremented by a value or a register.
[Figure 6-5: pre-modify and post-modify addressing. Pre-modify outputs
I + M with no I register update. Post-modify first outputs I, then
updates I with I + M.]
The DAG pre-modify addressing type can be used to emulate the pop
(restore of registers) from a SW stack.
Post-Modify Instruction
The DAGs support post-modify addressing. Modified addressing is used
to generate an address that is incremented by a value or a register. In
post-modify addressing, the DAG outputs the I register value unchanged,
then adds an M register or immediate value, updating the I register value.
The DAG post-modify addressing type can be used to emulate the push
(save of registers) to a SW stack.
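The difference between the two addressing types can be sketched in a few lines. The function names are hypothetical, addresses are treated as 32-bit values, and circular buffering is deliberately left out of this model.

```python
MASK32 = 0xFFFFFFFF  # DAG index registers are 32 bits wide


def pre_modify(index, modifier):
    """Return (address_output, new_index) for pre-modify addressing.

    The DAG outputs I + M and leaves the I register unchanged.
    """
    return (index + modifier) & MASK32, index


def post_modify(index, modifier):
    """Return (address_output, new_index) for post-modify addressing.

    The DAG outputs I unchanged, then updates I with I + M.
    """
    return index, (index + modifier) & MASK32


print([hex(v) for v in pre_modify(0x1000, 4)])
print([hex(v) for v in post_modify(0x1000, 4)])
```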
Modify Instruction
The DAGs support two operations that modify an address value in an
index register without outputting an address. These two operations,
address bit-reversal and address modify, are useful for bit-reverse address-
ing and maintaining pointers.
The MODIFY instruction modifies addresses in any DAG index register
(I0-I15) without accessing memory.
MODIFY(I1,4);
If the I register’s corresponding B and L registers are set up for
circular buffering, a MODIFY instruction performs the specified buffer
wraparound (if needed).
The MODIFY instruction executes independently of the state of the CBUFEN
bit: it always performs a circular buffer modify of the index register
if the corresponding B and L registers are configured.
B0 = 0x40000;
L0 = 0x10000;
I0 = 0x4ffff;
I1 = modify(I0, 2); // I1 == 0x40001
Bit-Reverse Instruction
The BITREV instruction modifies and bit-reverses addresses in any DAG
index register (I0–I15) without accessing memory. This instruction is
independent of the bit-reverse mode. The BITREV instruction adds a 32-bit
immediate value to a DAG index register, bit-reverses the result, and
writes the result back to the same index register. The following example
adds 4 to I1, bit-reverses the result, and updates I1 with the new value:
BITREV(I1,4);
The processor also supports a bit-reverse mode. For more information, see
“Operating Modes” on page 6-18.
I6 = BITREV(I1,0);
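The effect of the BITREV instruction on a 32-bit index value can be modeled as shown here. The function names are hypothetical; the sketch only mirrors the described steps (add the immediate, reverse all 32 bits, write the result back).

```python
MASK32 = 0xFFFFFFFF


def bitrev32(value):
    """Reverse the order of the 32 bits in value."""
    value &= MASK32
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bitrev_instruction(index, immediate):
    """Model BITREV(Ix, imm): add the immediate, then bit-reverse the sum."""
    return bitrev32((index + immediate) & MASK32)


# Bit 2 of the sum (value 4) moves to bit 29 of the result.
print(hex(bitrev_instruction(0, 4)))
```

Applying bitrev32 twice returns the original value, which is a convenient sanity check for the model.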
For examples of data flow paths for single and dual-data transfers, see
Chapter 2, Register Files.
The processor can use its complementary registers explicitly in SIMD
mode. They support single data access as shown in the example below.
S8 = DM(I4,M3);
PM (I12,M13) = S12;
COMP, S8 = DM(I5,M5);
COMP, DM(I5,M5) = S14;
However, transfers to the same DAG registers are not allowed and the
assembler returns an error message.
DM(M2,I1) = I0; /* generates asm error */
Instruction Summary
Table 6-2 lists the instruction types associated with DAG transfer instruc-
tions. Note that instruction set types may have more options (conditions
or compute). For more information see Chapter 9, Instruction Set Types.
In these tables, note the meaning of the following symbols:
DM(Mb,Ia)=UREG(LW);
PM(Md,Ic)=UREG(LW);
UREG=DM(Mb,Ia)(LW);
UREG=PM(Mc,Id)(LW);
DREG=DM(Ia,<data6>);
DREG=PM(Ic,<data6>);
Ia=MODIFY(Ia,Mb); //ADSP-214xx
Ic=MODIFY(Ic,Md); //ADSP-214xx
UREG=DM(<data32>,Ia)(LW);
UREG=PM(<data32>,Ic)(LW);
UREG=DM(<data7>,Ia)(LW);
UREG=PM(<data7>,Ic)(LW);
Ia=MODIFY(Ia,<data32>); //ADSP-214xx
Ic=MODIFY(Ic,<data32>); //ADSP-214xx
Ia=BITREV(Ia,<data32>); //ADSP-214xx
Ic=BITREV(Ic,<data32>); //ADSP-214xx
Operating Modes
This section describes all modes related to the DAG which are enabled by
a control bit in the MODE1, MODE2 and SYSCTL registers.
M5=1;
USTAT1 = dm(SYSCTL);
bit set USTAT1 IMDW0; /* Blk0 access 40-bit precision */
dm(SYSCTL) = USTAT1;
NOP; /* effect latency */
DM(I5,M5)=R0, PM(I9,M9)=R4; /* DAG1 32-bit, DAG2 40-bit */
Note that the sequencer uses 48-bit memory accesses for instruction
fetches. Programs can make 48-bit accesses with PX register moves, which
default to 48 bits. For more information, see Chapter 2, Register Files.
When using circular buffers, the DAGs can generate an interrupt on buf-
fer overflow (wraparound). For more information, see “DAG Status” on
page 6-31.
The DAGs support addressing circular buffers. This is defined as address-
ing a range of addresses which contain data that the DAG steps through
repeatedly, wrapping around to repeat stepping through the range of
addresses in a circular pattern. To address a circular buffer, the DAG steps
the index pointer (I register) through the buffer, post-modifying and
updating the index on each access with a positive or negative modify value
(M register or immediate value). If the index pointer falls outside the buf-
fer, the DAG subtracts or adds the buffer length to the index value,
wrapping the index pointer back within the start and end boundaries of
the buffer. The DAG’s support for circular buffer addressing appears in
Figure 6-1 on page 6-3, and an example of circular buffer addressing
appears in Figure 6-6 and Figure 6-7.
The starting address that the DAG wraps around is called the buffer’s base
address (B register). There are no restrictions on the value of the base
address for a circular buffer.
[Figure 6-6: circular buffer of length 11 (locations 0–10) stepped with
a positive modifier of 4. The columns show the sequence, in order, of
locations accessed in one pass: 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7.
Location 0 is the base address. The sequence repeats on subsequent
passes.]
Figure 6-7 shows a circular buffer with the same syntax as in Figure 6-6,
but with a negative modifier (M1=–4).
[Figure 6-7: the same circular buffer of length 11 stepped with a
negative modifier of –4. The locations are accessed in the order
0, 7, 3, 10, 6, 2, 9, 5, 1, 8, 4.]
After circular buffering is set up, the DAGs use the modulus logic in
Figure 6-1 on page 6-3 to process circular buffer addressing.
Using circular buffering with odd length in SIMD mode allows the
implicit move to exceed the circular buffer limits.
Wraparound Addressing
When circular buffering is enabled, on the first post-modify access to the
buffer, the DAG outputs the I register value on the address bus then mod-
ifies the address by adding the modify value. If the updated index value is
within limits of the buffer, the DAG writes the value to the I register. If
the updated value is outside the buffer limits, the DAG subtracts (for pos-
itive M) or adds (for negative M) the L register value before writing the
updated index value to the I register. In equation form, these post-modify
and wraparound operations work as follows.
• If M is positive:
• Inew = Iold + M if Iold + M < buffer base + length (end of buffer)
• Inew = Iold + M – L if Iold + M ≥ buffer base + length (end of buffer)
• If M is negative:
• Inew = Iold + M if Iold + M ≥ buffer base (start of buffer)
• Inew = Iold + M + L if Iold + M < buffer base (start of buffer)
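The post-modify and wraparound rules can be checked with a short model. The function names are hypothetical, and the sketch assumes, as the text requires elsewhere in this chapter, that the size of the modify value is less than the buffer length.

```python
def circular_post_modify(i_old, m, base, length):
    """Return the updated index after one circular-buffer post-modify."""
    i_new = i_old + m
    if m >= 0 and i_new >= base + length:
        i_new -= length   # stepped past the end of the buffer
    elif m < 0 and i_new < base:
        i_new += length   # stepped below the start of the buffer
    return i_new


def access_sequence(start, m, base, length, count):
    """Return the addresses output over count post-modify accesses."""
    addresses = []
    index = start
    for _ in range(count):
        addresses.append(index)   # the DAG outputs I before modifying it
        index = circular_post_modify(index, m, base, length)
    return addresses


# Length-11 buffer at base 0, stepped by +4 and by -4.
print(access_sequence(0, 4, 0, 11, 11))
print(access_sequence(0, -4, 0, 11, 11))
```

With B = 0x40000, L = 0x10000, and I = 0x4ffff, a modify value of 2 gives 0x40001 in this model, which agrees with the MODIFY listing earlier in this chapter.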
The DAGs use all four types of DAG registers for addressing circular buf-
fers. These registers operate as follows for circular buffering.
• The index (I) register contains the value that the DAG outputs on
the address bus.
• The modify (M) register contains the post-modify value (positive or
negative) that the DAG adds to the I register at the end of each
memory access. The M register can be any M register in the same
DAG as the I register and does not have to have the same number.
The modify value can also be an immediate value instead of an M
register. The size of the modify value, whether from an M register or
immediate, must be less than the length (L register) of the circular
buffer.
• The length (L) register sets the size of the circular buffer and the
address range that the DAG circulates the I register through.
The L register must be positive and cannot have a value greater
than 2^31 – 1. If an L register’s value is zero, its circular buffer
operation is disabled.
• The DAG compares the base (B) register, or the B register plus the L
register, to the modified I value after each access. When the B regis-
ter is loaded, the corresponding I register is simultaneously loaded
with the same value. When I is loaded, B is not changed. Programs
can read the B and I registers independently.
Clearing the CBUFEN bit disables circular buffering for all data load and
store operations. The DAGs perform normal post-modify load and store
accesses, ignoring the B and L register values. Note that a write to a B regis-
ter modifies the corresponding I register, independent of the state of the
CBUFEN bit.
Broadcast load mode performs memory reads only. Broadcast mode
operates only with data registers (DREG) or complement data
registers (CDREG). Enabling either DAG register to perform a broadcast
load has no effect on register stores or loads to universal
registers (Ureg). For example:
R0=DM(I1,M1); /* I1 load to R0 and S0 */
S10=PM(I9,M9); /* I9 load to S10 and R10 */
Explicit (PEx) load                 Implicit (PEy) load
Rx = dm(i1,ma);                     Sx = dm(i1,ma);
Rx = pm(i9,mb);                     Sx = pm(i9,mb);
Rx = dm(i1,ma), Ry = pm(i9,mb);     Sx = dm(i1,ma), Sy = pm(i9,mb);
The PEYEN bit (SISD/SIMD mode select) does not influence broadcast
operations. Broadcast loading is particularly useful in SIMD
applications where the algorithm needs identical data loaded into
each processing element. For more information on SIMD mode (in
particular, a list of complementary data registers), see “Data
Register Neighbor Pairing” on page 2-5.
Bit-Reverse Mode
Bit-reverse mode is useful for FFT calculations. For a
decimation-in-time (DIT) FFT, all inputs must be scrambled before
running the FFT so that the output samples are directly interpretable.
For a decimation-in-frequency (DIF) FFT, the process is reversed. This
mode automates bit reversal; no specific instruction is required.
The BR0 and BR8 bits in the MODE1 register enable the bit-reverse addressing
mode where addresses are output in reverse bit order. When BR0 is set
(= 1), DAG1 bit-reverses 32-bit addresses output from I0. When BR8 is set
(= 1), DAG2 bit-reverses 32-bit addresses output from I8. The DAGs
bit-reverse only the address output from I0 or I8; the contents of these
registers are not reversed. Bit-reverse addressing mode affects post-modify
operations.
Listing 6-7 demonstrates how bit-reverse mode affects address output.
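The following sketch models how bit-reversed address output combined with post-modify can step through an N-point buffer in scrambled order. It is not the manual's listing. The setup shown (index register loaded with the bit-reversed base address, modify value equal to the bit-reversed N/2) is a commonly used arrangement and is an assumption here, as are the function names. N must be a power of two and the base address a multiple of N.

```python
MASK32 = 0xFFFFFFFF


def bitrev32(value):
    """Reverse the order of the 32 bits in value."""
    value &= MASK32
    result = 0
    for _ in range(32):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_addresses(base, n):
    """Return the addresses output over n post-modify accesses."""
    index = bitrev32(base)        # assumed setup: I0 = reversed base
    modifier = bitrev32(n // 2)   # assumed setup: M = reversed N/2
    addresses = []
    for _ in range(n):
        addresses.append(bitrev32(index))    # output is reversed, I0 is not
        index = (index + modifier) & MASK32  # post-modify of I0
    return addresses


offsets = [a - 0x98000 for a in bit_reversed_addresses(0x98000, 8)]
print(offsets)
```

For an 8-point buffer the offsets come out in the familiar bit-reversed order used to scramble FFT inputs.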
SIMD Mode
When the PEYEN bit in the MODE1 register is set (=1), the processors are in
single-instruction, multiple-data (SIMD) mode. In SIMD mode, many
data access operations differ from the processor’s default single-instruc-
tion, single-data (SISD) mode. These differences relate to doubling the
amount of data transferred for each data access.
For example, processing two channels in parallel requires a more complex
data layout since all inputs and outputs for the two channels have to be
interleaved—that is all even array elements represent one channel while all
odd elements represent the other.
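The interleaved layout for two channels can be illustrated as follows. The helper names are hypothetical, and plain Python lists stand in for memory buffers.

```python
def interleave(channel_a, channel_b):
    """Build one buffer with channel A at even and channel B at odd indices."""
    if len(channel_a) != len(channel_b):
        raise ValueError("both channels must have the same length")
    buffer = []
    for a, b in zip(channel_a, channel_b):
        buffer.extend((a, b))
    return buffer


def deinterleave(buffer):
    """Split an interleaved buffer back into its two channels."""
    return buffer[0::2], buffer[1::2]


mixed = interleave([1, 2, 3], [10, 20, 30])
print(mixed)
print(deinterleave(mixed))
```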
address, and the implicit transfer is between the implicit register and the
implicit address.
SISD — —
SIMD SW 16-bit PM(Ic, Md) DM(Ia+2, Mb) PM(Md, Ic) DM(Mb+2, Ia)
PM(Ic+2, Md) PM(Md+2, Ic)
In SIMD mode, both aligned (explicit even address) and unaligned
(explicit odd address) transfers are supported.
R0=DM(I1,M1); /* I1 points to NW space */
S0=DM(I1+1,M1); /* implicit instruction */
R10=PM(I10,M11); /* I10 points to SW space */
S10=PM(I10+2,M11); /* implicit instruction */
When the processors are in SIMD mode and a DAG register is the
destination of a transfer from a register file data register, the
processor executes the explicit move only if the condition in PEx is
true; the implicit move is not performed. The same applies when both
the source and the destination are DAG registers.
BIT SET MODE1 PEYEN; /* SIMD */
NOP; /* effect latency */
I8 = R5; /* Loads I8 with R5 */
[Figure: alternate DAG register groups. In DAG1, the SRD1L bit covers
I0–I3, M0–M3, L0–L3, and B0–B3, and the SRD1H bit covers I4–I7, M4–M7,
L4–L7, and B4–B7. In DAG2, the SRD2L bit covers the group that starts
at I8, M8, L8, and B8.]
Example 1
BIT SET MODE1 SRD1L; /* Activate alternate dag1 lo regs */
NOP; /* Wait for access to alternates */
R0 = DM(i0,m1);
Example 2
BIT SET MODE1 SRD1L; /*activate alternate dag1 lo registers */
R13 = R12 + R11; /* Any unrelated instruction */
R0 = DM(I0,M1);
DAG Interrupts
The DAG interrupt overview is shown in Table 6-5.
There is one set of registers (I7 and I15) in each DAG that can generate an
interrupt on circular buffer overflow (address wraparound). For more
information, see “DAG Status” on page 6-31.
When a program needs to use I7 or I15 without circular buffering and the
processor has the circular buffer overflow interrupts unmasked, the pro-
gram should disable the generation of these interrupts by setting the
B7/B15 and L7/L15 registers to values that prevent the interrupts from
occurring. If, for example, I7 were accessing the address range 0x1000 –
0x2000, the program could set B7 = 0x0000 and L7 = 0xFFFF. Because the
processor generates the circular buffer interrupt based on the wraparound
equations on page 6-23, setting the L register to zero does not necessarily
achieve the desired results. If the program is using either of the circular
buffer overflow interrupts, it should avoid using the corresponding I regis-
ter(s) (I7 or I15) where interrupt branching is not needed.
There are two special situations to be aware of when using circular buffers:
1. In the case of circular buffer overflow interrupts, if CBUFEN = 1 and
register L7 = 0 (or L15 = 0), then the CB7I (or CB15I) interrupt
occurs at every change of I7 (or I15), after the index register (I7 or
I15) crosses the base register (B7 or B15) value. This behavior is
independent of the context of both primary and alternate DAG
registers.
2. When a LW access, SIMD access, or normal word access with the
LW option crosses the end of the circular buffer, the processor com-
pletes the access before responding to the end of buffer condition.
Enable interrupts and use an interrupt service routine (ISR) to handle the
overflow condition immediately. This method is appropriate if it is
important to handle all overflows as they occur; for example in a
“ping-pong” or swap I/O buffer pointers routine.
DAG Status
The DAGs can provide buffer overflow information when executing circu-
lar buffer addressing for the I7 or I15 registers. When a buffer overflow
occurs (a circular buffering operation increments the I register past the
end of the buffer or decrements below the start of the buffer), the appro-
priate DAG updates a buffer overflow flag in a sticky status (STKYx)
register. Use the BIT TST instruction to examine overflow flags in the STKY
register after a series of operations. If an overflow flag is set, the buffer has
overflowed or wrapped around at least once. This method is useful when
overflow handling is not time sensitive.
SISD Mode
Programs can use odd or even modify values (1, 2, 3, …) to step through a
buffer in single- or dual-data, SISD or broadcast load mode regardless of
the data word size (long word, extended-precision normal word, normal
word, or short word).
Note that programs must step through a buffer twice, once for addressing
even short word addresses and once for addressing odd short word
addresses.
Features
The following are the memory interface features.
• Four independent internal memory blocks comprised of RAM and
ROM.
• Each block can be configured for different combinations of code
and data storage.
• Each block consists of four columns and each column is 16 bits
wide.
• Each block maps to separate regions in memory address space and
can be accessed as 16-bit, 32-bit, 48-bit, or 64-bit words.
• Each block also has its own two-deep self clearing shadow write
buffers with automatic hit detection and data forwarding logic for
read access.
• Memory aliasing allows the same memory space to be accessed using
different word sizes.
The following code examples and Table 7-1 illustrate the differences
between Harvard and Super Harvard capabilities.
Standard Harvard Architecture
When instructions and data passing over the PM bus cause a conflict, the
conflict cache resolves it using hardware that acts as a third bus feeding
the sequencer’s pipeline with instructions.
Processor core and I/O processor accesses to internal memory are com-
pletely independent and transparent to one another. Each block of
memory can be accessed by the processor core and I/O processor in every
cycle, provided the accesses are to different blocks of the memory.
Functional Description
The following sections provide detail about the processor’s memory
function.
Figure 7-1 shows how the memory map addresses the different memory
regions.
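[Figure 7-1: memory map address fields (bits 31–0). If bits 17–16 = 00,
the address selects an IOP peripheral register; if bits 17–16 = 11, it
selects an IOP core register.]
[Figure 7-3. Memory and Internal Buses Block Diagram (All Other
SHARC Products): internal memory connected to the core bus, the
peripheral bus, and the peripheral DMA bus.]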
If the core issues writes through the bridge continuously, it stalls for
one core cycle on each write starting with the second. Therefore, each
write takes two cycles except for the first, which takes just one.
When the core issues a write once in every cycle of the PCLK clock (every
alternate CCLK cycle), the writes occur without stalls.
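The cost of continuous back-to-back writes follows directly from the rule above (one cycle for the first write, two for each later write). A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def back_to_back_write_cycles(num_writes):
    """Return the core cycles taken by num_writes continuous bridge writes.

    The first write takes one cycle; every later write takes two
    (one cycle for the write plus one stall cycle).
    """
    if num_writes <= 0:
        return 0
    return 1 + 2 * (num_writes - 1)


for count in (1, 2, 4):
    print(count, back_to_back_write_cycles(count))
```

Spacing the writes one PCLK cycle apart takes a similar number of core cycles but leaves the intervening cycles free for other instructions instead of stalls.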
Accesses to IOP registers (from the processor core) should not use
Type 1 (dual access) or LW or forced LW instructions.
Note that an atomic write and read from the same IOP peripheral register
takes 11 (best case) or 13 (worst case) CCLK cycles. The following addi-
tional information about access to peripheral data buffers should be noted.
• Attempting to write to a full (or read from empty) peripheral data
buffer causes the core to hang indefinitely, unless the BHD (buffer
hang disable) bit for that peripheral is set.
• In case of a full transmit buffer, the held-off I/O processor register
read or write access incurs one extra core-clock cycle.
• Interrupted IOP register reads and writes, if preceded by another
write, create one additional core stall cycle.
To prevent out of order instruction execution the above code can be mod-
ified to:
N:r0=SPIEN;
N+1:dm(SPICTL)=r0;
N+2:nop; nop; nop; nop; nop;
N+7:bit CLR FLAGS FLG0;
or:
N:r0=SPIEN;
N+1:dm(SPICTL)=r0;
N+2:r10=dm(SPICTL); /* dummy read forces previous write
to complete */
N+3:bit CLR FLAGS FLG0;
On-Chip Buses
The processor has up to four sets of internal buses connected to its sin-
gle-ported memory, the program memory (PM), data memory (DM), and
I/O processor (IOP) buses. The IOP bus is designed to run only at half
the core clock frequency. The three buses share the single port on each of
the four memory blocks. Memory accesses from the processor’s core (com-
putational units, data address generators, or program sequencer) use the
PM or DM buses, while the I/O processor uses the IOP bus for memory
accesses. The I/O processor can access external memory devices. For more
information about the external memory and I/O capabilities of the proces-
sor, see the product-specific hardware reference. Figure 7-2 on page 7-6
and Figure 7-3 on page 7-7 show the bus structures of the
ADSP-21362/3/4/5/6 processors and the ADSP-21367/8/9 and later
products respectively.
[Figure: 48-bit word rotations (rotation 0 through rotation 3) by
address across the four 16-bit memory columns.]
For long word (64 bits), normal word (32 bits), and short word (16 bits)
memory accesses, the processor selects from fixed columns in memory. No
rotations of words within columns occur for these data types.
[Figure: memory columns 3 through 0, each 16 bits wide.]
[Figure 7-6. Mixed Instructions and Data With One Unused Location]
[Figure 7-7. Mixed Instructions and Data With Two Unused Locations]
m = B + (3/2 × (n – B)) + 1
where:
• n is the first unused address after the end of 48-bit words
• B is the base normal word 48-bit address of the internal memory
block
• m is the first 32-bit normal word address to use after the end of
48-bit words. For the ADSP-21367 memory layout:
• block 0 = 0x80000 <= n <= 0x93FFF
• block 1 = 0xA0000 <= n <= 0xB3FFF
• block 2 = 0xC0000 <= n <= 0xC1554
• block 3 = 0xE0000 <= n <= 0xE1554
Note that the linker verifies the wrapping rules of different output
sections and returns an overlap error message during project build
if the rules are violated.
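As a worked example of the equation above, the following sketch computes m for a given n and B. The function name is hypothetical. The text does not say how to round when n – B is odd, so the sketch rounds the 3/2 product down; that choice is an assumption.

```python
def first_free_normal_word(n, base):
    """Return m, the first 32-bit normal word address after the 48-bit words.

    n    -- first unused address after the end of the 48-bit words
    base -- base address B of the internal memory block
    Assumption: the 3/2 product is rounded down when n - base is odd.
    """
    if n < base:
        raise ValueError("n must not be below the block base address")
    return base + (3 * (n - base)) // 2 + 1


# 0x100 48-bit words starting at the block 0 base address 0x80000.
print(hex(first_free_normal_word(0x80100, 0x80000)))
```

In practice the linker performs this check, so the calculation is mainly useful when laying out memory sections by hand.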
Data can be read from internal memory in either of the following ways.
1. From the shadow write FIFO (caused by an immediate read of the
same data after a write)
2. From the memory block
Interrupts
Table 7-3 provides an overview of interrupts associated with the SHARC
memory.
starts at the first internal RAM normal word address. If the boot mode is
selected to reserved boot mode on ROM based versions, the vector table
starts in ROM normal word address.
The internal interrupt vector table (IIVT) bit in the SYSCTL register over-
rides the default placement of the vector table. If IIVT is set (=1), the
interrupt vector table starts at internal RAM regardless of the booting
mode. If IIVT is cleared (=0), the IIVT starts in the internal ROM.
For information about processor booting, see the processor-specific hard-
ware manual.
The I/O processor’s DMA controller cannot generate the IICDI interrupt.
For more information, see “Mode Control 2 Register
[Figure: long word memory layout by address, showing WORD X0–X2 and
WORD Y0–Y2.]
The following code example shows the access for even and odd addresses.
When accessing an odd address, the sticky bit is set to indicate the
unaligned access.
bit set mode2 U64MAE; /* set bit for aligned or
unaligned 64-bit access*/
r0 = 0x11111111;
r1 = 0x22222222;
pm(0x98200) = r0(lw); /* even address in 32-bit, access
is aligned */
pm(0x98201) = r0(lw); /* odd address in 32-bit, sticky
bit is set */
Where a cross (†) appears in the PEx registers in any of the following
figures, it indicates that the processor zero-fills or sign-extends
the most significant 16 bits of the data register while loading the
short word value into a 40-bit data register. Zero-filling or
sign-extending depends on the state of the SSE bit in the MODE1
system register. For short word transfers, the least significant 8 bits of
the data register are always zero.
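The placement of a short word in a 40-bit data register can be modeled as shown below. The function name is hypothetical, and the sketch assumes that a set SSE bit selects sign extension and a cleared SSE bit selects zero fill.

```python
def load_short_word(short_word, sse):
    """Return the 40-bit register value after loading a 16-bit short word.

    Bits 23-8 hold the short word, bits 39-24 are sign-extended or
    zero-filled, and bits 7-0 are always zero.
    Assumption: sse=True means sign extension is enabled.
    """
    sw = short_word & 0xFFFF
    upper = 0xFFFF if (sse and (sw & 0x8000)) else 0x0000
    return (upper << 24) | (sw << 8)


print(f"{load_short_word(0x8001, sse=True):010X}")
print(f"{load_short_word(0x8001, sse=False):010X}")
```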
data bus. The processor drives the other short word lanes of the data buses
with zeros.
In SISD mode, the instruction accesses the PEx registers to transfer data
from memory. This instruction accesses WORD X0, whose short word
address has “00” for its least significant two bits of address. Other loca-
tions within this row have addresses with least significant two bits of “01”,
“10”, or “11” and select WORD X1, WORD X2, or WORD X3 from memory
respectively. The syntax targets register RX in PEx.
[Figure: short word memory access in SISD mode, single-data transfer.
Memory rows hold short words (WORD X…, WORD Y…) by address; the PEx
registers RA and RX and the PEy registers SA and SX are each shown as
bits 39–24, 23–8, and 7–0.]
Other instructions with similar data flows for SISD, short word,
single-data transfers are:
UREG = PM(SHORT WORD ADDRESS);
UREG = DM(SHORT WORD ADDRESS);
PM(SHORT WORD ADDRESS) = UREG;
DM(SHORT WORD ADDRESS) = UREG;
[Figure: short word memory access in SIMD mode, dual-data transfer,
with the same memory and register layout.]
Other instructions with similar data flows for SIMD, short word,
dual-data transfers are:
DREG = PM(SHORT WORD ADDRESS), DREG = DM(SHORT WORD ADDRESS);
PM(SHORT WORD ADDRESS) = DREG, DM(SHORT WORD ADDRESS) = DREG;
[Figure: two further short word memory access diagrams with the same
memory and register layout.]
For normal word accesses, the processor zero-fills the least signifi-
cant 8 bits of the data register on loads and truncates these bits on
stores to memory.
… … … … … … … …
… … … … … … … …
… … … … … … … …
ADDRESS
WORD Y3 WORD Y2 WORD X3 WORD X2
RA RY RX
39-24 23-8 7-0 39-24 23-8 7-0
WORD X0 0X00
SA SX
39-24 23-8 7-0 39-24 23-8 7-0
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SISD, NORMAL WORD, SINGLE-DATA TRANSFERS ARE:
… … … … … … … …
… … … … … … … …
… … … … … … … …
ADDRESS
WORD Y3 WORD Y2 WORD X3 WORD X2
RA RX
39-24 23-8 7-0 39-24 23-8 7-0
SA SX
39-24 23-8 7-0 39-24 23-8 7-0
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SISD, NORMAL WORD, DUAL-DATA TRANSFERS ARE:
DREG = PM(NORMAL WORD ADDRESS), DREG = DM(NORMAL WORD ADDRESS);
PM(NORMAL WORD ADDRESS) = DREG, DM(NORMAL WORD ADDRESS) = DREG;
… … … … … … … …
… … … … … … … …
… … … … … … … …
ADDRESS
WORD Y3 WORD Y2 WORD X3 WORD X2
RA RX
39-24 23-8 7-0 39-24 23-8 7-0
WORD X0 0X00
SA SX
39-24 23-8 7-0 39-24 23-8 7-0
WORD X1 0X00
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SIMD, NORMAL WORD, SINGLE-DATA TRANSFERS ARE:
UREG = PM(NORMAL WORD ADDRESS);
UREG = DM(NORMAL WORD ADDRESS);
PM(NORMAL WORD ADDRESS) = UREG;
DM(NORMAL WORD ADDRESS) = UREG;
[Figure: data flow for SIMD, normal word, dual-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SIMD, NORMAL WORD,
DUAL-DATA TRANSFERS ARE:
DREG = PM(NORMAL WORD ADDRESS), DREG = DM(NORMAL WORD ADDRESS);
PM(NORMAL WORD ADDRESS) = DREG, DM(NORMAL WORD ADDRESS) = DREG;
[Figure: data flow for SISD or SIMD, extended-precision normal word, single-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SISD OR SIMD, EXT. PREC. NORMAL WORD, SINGLE-DATA
TRANSFERS ARE:
UREG = PM(EXTENDED PRECISION NORMAL WORD ADDRESS);
UREG = DM(EXTENDED PRECISION NORMAL WORD ADDRESS);
PM(EXTENDED PRECISION NORMAL WORD ADDRESS) = UREG;
DM(EXTENDED PRECISION NORMAL WORD ADDRESS) = UREG;
[Figure: data flow for SISD, extended-precision normal word, dual-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SISD, EXTENDED PRECISION NORMAL WORD, DUAL-DATA
TRANSFERS ARE:
DREG = PM(EXT. PREC. NORMAL WORD ADDRESS), DREG = DM(EXT. PREC. NORMAL WORD ADDRESS);
PM(EXT. PREC. NORMAL WORD ADDRESS) = DREG, DM(EXT. PREC. NORMAL WORD ADDRESS) = DREG;
[Figure: data flow for SISD or SIMD, long word, single-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SISD OR SIMD, LONG WORD, SINGLE-DATA TRANSFERS ARE:
UREG = PM(LONG WORD ADDRESS);
UREG = DM(LONG WORD ADDRESS);
PM(LONG WORD ADDRESS) = UREG;
DM(LONG WORD ADDRESS) = UREG;
[Figure: data flow for SISD, long word, dual-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SISD, LONG WORD, DUAL-DATA TRANSFERS ARE:
DREG = PM(LONG WORD ADDRESS), DREG = DM(LONG WORD ADDRESS);
PM(LONG WORD ADDRESS) = DREG, DM(LONG WORD ADDRESS) = DREG;
[Figure: data flow for broadcast, short word, single-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR BROADCAST, SHORT WORD, SINGLE-DATA TRANSFERS ARE:
DREG = PM(SHORT WORD ADDRESS);
DREG = DM(SHORT WORD ADDRESS);
[Figure: data flow for broadcast, normal word, single-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR BROADCAST, NORMAL WORD, SINGLE-DATA TRANSFERS ARE:
DREG = PM(NORMAL WORD ADDRESS);
DREG = DM(NORMAL WORD ADDRESS);
[Figure: data flow for broadcast, normal word, dual-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR BROADCAST, NORMAL WORD, DUAL-DATA TRANSFERS ARE:
DREG = PM(NORMAL WORD ADDRESS), DREG = DM(NORMAL WORD ADDRESS);
[Figure: data flow for broadcast, extended-precision normal word, single-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR BROADCAST, EXTENDED NORMAL WORD, SINGLE-DATA TRANSFERS ARE:
DREG = PM(EP NORMAL WORD ADDRESS);
DREG = DM(EP NORMAL WORD ADDRESS);
[Figure: data flow for broadcast, extended-precision normal word, dual-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR BROADCAST, EXTENDED NORMAL WORD,
DUAL-DATA TRANSFERS ARE:
DREG = PM(EP NORMAL WORD ADDRESS), DREG = DM(EP NORMAL WORD ADDRESS);
[Figure: data flow for broadcast, long word, single-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR BROADCAST, LONG WORD, SINGLE-DATA TRANSFERS ARE:
[Figure: data flow for broadcast, long word, dual-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR BROADCAST, LONG WORD, DUAL-DATA TRANSFERS ARE:
In case of conflicting dual access to the data register file, the
processor only performs the access with higher priority. For more
information on how the processor prioritizes accesses, see “Register
Files” in Chapter 2, Register Files.
[Figure: data flow for SISD, mixed word, dual-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SISD, MIXED WORD, DUAL-DATA TRANSFERS ARE:
DREG = PM(SHORT, NORMAL, EP NORMAL, LONG ADD), DREG = DM(SHORT, NORMAL, EP NORMAL, LONG ADD);
PM(SHORT, NORMAL, EP NORMAL, LONG ADD) = DREG, DM(SHORT, NORMAL, EP NORMAL, LONG ADD) = DREG;
[Figure: data flow for SIMD, mixed word, dual-data transfers]
OTHER INSTRUCTIONS WITH SIMILAR DATA FLOWS FOR SIMD, MIXED WORD,
DUAL-DATA TRANSFERS ARE:
DREG = PM(ADDRESS), DREG = DM(ADDRESS);
PM(ADDRESS) = DREG, DM(ADDRESS) = DREG;
Features
The JTAG port has the following features.
• Boundary scan support for PCB interconnect test
• Standard emulation support: start, stop, and single step
• Enhanced standard emulation with instruction and data breakpoints,
event count, and valid and invalid address range detection
• Enhanced emulation support: statistical profiling for benchmarking,
and background telemetry channel (BTC) for on-the-fly memory debug
• User breakpoint support: user instruction for breakpoint
Functional Description
The following sections provide descriptions about JTAG functionality.
An ADI specific pin (EMU) is used in the JTAG emulators from Analog
Devices. This pin is not defined in the IEEE-1149.1 specification. Refer
to the IEEE 1149.1 JTAG specification for detailed information on the
JTAG interface.
Target systems must have a 14-pin connector in order to accept the Ana-
log Devices Tools product line of JTAG emulator in-circuit probe, a
14-pin plug. For more information refer to Engineer-to-Engineer note
EE-68.
[Figure: serial scan paths between TDI and TDO through the boundary register, bypass register, and instruction register]
TAP Controller
The TAP controller is a synchronous, 16-state, finite-state machine con-
trolled by the TCK and TMS pins. Transitions to the various states in the
diagram occur on the rising edge of TCK and are defined by the state of the
TMS pin, here denoted by either a logic 1 or logic 0 state. For full details of
the operation, see the JTAG standard. Figure 8-2 shows the state diagram
for the TAP controller.
[Figure 8-2: TAP controller state diagram. States: Test-Logic-Reset,
Run-Test/Idle, Select-DR-Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR,
Exit2-DR, Update-DR, Select-IR-Scan, Capture-IR, Shift-IR, Exit1-IR,
Pause-IR, Exit2-IR, Update-IR. Transitions are labeled with the TMS
value (0 or 1).]
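The TAP controller state machine can be written as a lookup table. The following Python sketch encodes the transitions defined by the IEEE 1149.1 standard; the names TAP_NEXT, tap_step, and tap_run are hypothetical helpers introduced for illustration and are not part of any emulator API.

```python
# Next-state table for the 16-state TAP controller (IEEE 1149.1).
# Each entry maps a state to (next state if TMS=0, next state if TMS=1).
# One transition is taken per rising edge of TCK.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def tap_step(state: str, tms: int) -> str:
    """Return the state entered on the next rising edge of TCK for a TMS value."""
    return TAP_NEXT[state][tms]

def tap_run(state: str, tms_bits) -> str:
    """Apply a sequence of TMS values, one per TCK rising edge."""
    for tms in tms_bits:
        state = tap_step(state, tms)
    return state
```

For example, tap_run("Test-Logic-Reset", [0, 1, 0, 0]) walks through Run-Test/Idle, Select-DR-Scan, and Capture-DR into Shift-DR. Holding TMS at 1 for five TCK cycles returns the controller to Test-Logic-Reset from any state.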
Instruction Registers
Information in this section describes the control (JTAG) registers. The
instruction register is used to determine the action to be performed and
the data register to be accessed. There are two types of instructions, one
for boundary scan mode and the other for emulation mode. This register
selects the performed test and/or the access of the test data register. The
instruction register is 5 bits long with no parity bit.
No special values need to be written into any register prior to the selection
of any instruction. Other registers, reserved for use by Analog Devices,
exist. However, this group of registers should not be accessed as they can
cause damage to the part.
Breakpoints
This section explains the different types of breakpoints and the
conditions under which they are hit.
Software Breakpoints
Software breakpoints are implemented by the processor as a special type of
instruction. The instruction, EMUIDLE, is not a public instruction and is
only decoded by the processor when specific bits are set in emulation con-
trol. If the processor encounters the EMUIDLE instruction and the specific
bits are not set in emulation control, then the processor executes a NOP
instruction. The EMUIDLE instruction triggers a high emulator interrupt.
When EMUIDLE is executed, the emulation clock counter halts
immediately.
Automatic Breakpoints
The IDDE (tools environment) automatically places software breakpoints
(EMUIDLE) at the labels (_main) and (___lib_prog_term). Placing the
(_main) label at the beginning of user code simplifies start-up: after
reset, the start-up code (initialization such as DDR2/SDRAM or the
run-time environment) executes until the breakpoint at (_main) is hit,
before the program enters user code.
For more information, refer to the tools documentation.
Hardware Breakpoints
Hardware breakpoints allow much greater flexibility than the software
breakpoints provided by the EMUIDLE instruction. As such, they require
much more design thought and resources within the processor. At the
simplest level, hardware breakpoints are helpful when debugging ROM code,
where the emulation software cannot replace instructions with the EMUIDLE
instruction. As the capabilities of the hardware breakpoint unit
increase, so do the benefits to the developer. At a minimum, an effective
hardware breakpoint unit can trigger a break on load, store, and fetch
activities.
Additionally, address ranges, both inclusive (bounded) and exclusive
(unbounded), should be included.
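The difference between an inclusive and an exclusive address range can be sketched as a simple comparison. The Python function below is a hypothetical model, not the processor's logic: range_hit and its parameters are illustrative names, and treating the start and end addresses as part of the range is an assumption.

```python
def range_hit(addr: int, start: int, end: int, exclusive: bool = False) -> bool:
    """Return True if an access to addr should trigger the range breakpoint.

    Inclusive (bounded): hit when start <= addr <= end.
    Exclusive (unbounded): hit when addr lies outside that range.
    Counting the endpoints as inside the range is an assumption of this sketch.
    """
    inside = start <= addr <= end
    return (not inside) if exclusive else inside
```

An exclusive range is useful for catching stray accesses: the breakpoint fires on any address outside the region a routine is expected to touch.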
Operating Modes
The following sections detail the operation of the JTAG port.
shift register so that data can be read from or written to them through a
serial test access port (TAP).
The SHARC processors contain a test access port compatible with the
industry-standard IEEE 1149.1 (JTAG) specification. Only the IEEE
1149.1 features specific to the processors are described here. For more
information, see the IEEE 1149.1 specification and the other documents
listed in “References” on page 8-22.
The boundary scan allows a variety of functions to be performed on each
input and output signal of the SHARC processors. Each input has a latch
that monitors the value of the incoming signal and can also drive data into
the chip in place of the incoming value. Similarly, each output has a latch
that monitors the outgoing signal and can also drive the output in place of
the outgoing value. For bidirectional pins, the combination of input and
output functions is available.
To protect the internal logic when the boundary outputs are overdriven
or signals are received on the boundary inputs, make sure
that nothing else drives data on the processor’s output pins.
Boundary Scan Description Language (BSDL) is a subset of VHDL that is
used to describe how JTAG (IEEE 1149.1) is implemented in a particular
device. For a device to be JTAG compliant, it must have an associated
BSDL file. For the SHARC processors, BSDL files are available on the
Analog Devices Inc., web site.
• Examine registers
• Perform cycle counting
The processor must be halted to send data and commands, but once an
operation is completed by the emulator, the system is set running at full
speed with no impact on system timing. The emulator does not impact
target loading or timing. The emulator’s in-circuit probe connects to a
variety of host computers (USB or PCI) with plug-in boards.
Emulation Control
The processor is free running. In order to observe the state of the core, the
emulator must first halt instruction execution and enter emulation mode.
In this mode, the emulation software sets up a halt condition by selecting
the EMUCTL register and enabling bits 1–0 and 5.
The emulator then returns to run-test-idle. At this point, the processor is
not halted. In the next scan, the emulator selects the EMUIR register, and
shifts in the NOP instruction. At the very beginning of the scan, the TMS sig-
nal rises, and at this point, before the scan has ended, the processor halts.
When the emulator finishes the scan by returning to run-test-idle, the
processor executes a NOP instruction. Note that the EMUCTL register is only
accessible via the TAP.
Conditional Breakpoints
The breakpoint sets are grouped into four types:
• 4x instruction breakpoints (IA)
• 2x data breakpoints for DM bus (DA)
• 1x data breakpoints for PM bus (PA)
• 1x data breakpoints for DMA (I/O)
The individual breakpoint signals in each group are logically ORed
together to create a composite breakpoint signal per group.
Each breakpoint group has an enable bit in the EMUCTL/BRKCTL register.
When set, these bits add the specified breakpoint group into the genera-
tion of the effective breakpoint signal. If cleared, the specified breakpoint
group is not used in the generation of the effective breakpoint signal. This
allows the user to trigger the effective breakpoint from a subset of the
breakpoint groups.
These composite signals can be optionally ANDed or ORed together to
create the effective breakpoint event signal used to generate an emulator
interrupt. The ANDBKP bit in the BRKCTL register selects the function used.
Note that programs must load this register with a value greater than or
equal to zero for proper breakpoint generation when bit 25 (UMODE bit) in
the BRKCTL register is set.
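The way the group signals combine into the effective breakpoint signal can be sketched as follows. This Python model is illustrative only: the function name, the dictionary layout, and the group labels used as keys are assumptions, and leaving disabled groups out of the AND as well as the OR is how this sketch reads the description above.

```python
from typing import Dict, List

def effective_breakpoint(groups: Dict[str, List[bool]],
                         enabled: Dict[str, bool],
                         andbkp: bool) -> bool:
    """Combine individual breakpoint signals into the effective breakpoint signal.

    groups:  individual breakpoint signals per group, e.g. {"IA": [..4 signals..]}
    enabled: per-group enable bits; disabled groups do not take part
    andbkp:  True to AND the composite signals, False to OR them
    """
    # Signals within a group are ORed into one composite signal per group.
    composites = [any(signals) for name, signals in groups.items()
                  if enabled.get(name, False)]
    if not composites:
        return False  # no group enabled, so no breakpoint event
    return all(composites) if andbkp else any(composites)
```

With the IA and DA groups enabled and only an instruction breakpoint asserted, the OR setting produces a breakpoint event, while the AND setting waits until a DM data breakpoint is asserted as well.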
Statistical Profiling
Statistical profiling allows the emulation software to sample the
processor’s PC value while the processor is running. By sampling at
random intervals, a profile can be created that helps the developer tune
performance-critical code sections. Statistical profiling can also help
find dead code and inform code partitioning decisions. Fundamentally,
statistical profiling is supported by one additional JTAG shift register
called EMUPC and a register which latches the sampled PC. The EMUPC
register is a 24-bit serial shift register which
If the UMODE bit in the BRKCTL register is set, all address breakpoint
registers can be written in user space.
For more information, see “Breakpoint Control Register (BRKCTL)” on
page A-47.
Programming Examples
Listing 8-1 is an example that shows how to trigger an exception for a
valid address.
r5 = 0x15;
dm(EMUN) = r5; /* set event count */
USTAT1 = dm(BRKCTL);
BIT SET USTAT1 ENBDA; /* enable the dm access break points */
dm(BRKCTL) = USTAT1;
ISR_BKPI:
r4 = dm(EEMUSTAT); /* read status bits */
rti; /* status register cleared */
dm(DMA2S) = r4;
dm(DMA2E) = r5;
scanned in, while in user space, the instruction is taken from an emulation
instruction register, rather than from the PMD bus. In user space, the
program counter also stops incrementing. All other aspects of instruction
execution are the same in both modes.
Control for breakpoints is also available in emulation space. The emula-
tion control register has equivalent control bits to the BRKCTL register to
control breakpoints. The control of breakpoints can be flipped back and
forth between emulation space and the core by flipping the (UMODE) bit 25
in the BRKCTL register.
Note that the EMUCTL and BRKCTL register bit settings are almost identical.
The EMUCTL register is accessed by the debugger over the TAP while the
BRKCTL register access is user code specific.
JTAG Interrupts
Table 8-4 provides an overview of the interrupts associated with the JTAG
port.
Interrupt Types
Four different types of interrupts/breakpoints are generated.
tCK is specified in ns, and 5 extra tCK cycles are required to take the
TAP from the Capture-DR state to the Select-DR-Scan state. For example,
if TCK is running at 50 MHz, the throughputs for INDATA and OUTDATA are
approximately 43 Mbits/sec and 39 Mbits/sec, respectively. See Figure 8-2
on page 8-4 for other read/write data.
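The INDATA figure quoted above can be checked with simple arithmetic. The Python sketch below is a rough model, not the manual's formula: it assumes a 32-bit data word per scan (not stated in this passage) and adds the 5 extra TCK cycles mentioned in the text. The function name is hypothetical. The overhead for OUTDATA is not given in this passage, so it is not modeled.

```python
def scan_throughput(tck_hz: float, data_bits: int, overhead_cycles: int) -> float:
    """Return useful bits per second when each scan of data_bits costs extra TCK cycles."""
    return tck_hz * data_bits / (data_bits + overhead_cycles)

# Assumed: 32-bit data word; 5 overhead cycles as stated in the text; TCK at 50 MHz.
indata_bps = scan_throughput(50e6, 32, 5)
```

Under these assumptions, indata_bps is about 43.2 Mbits/sec, which agrees with the approximate value given for INDATA.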
References
• IEEE Standard 1149.1-1990. Standard Test Access Port and
Boundary-Scan Architecture. To order a copy, contact the IEEE
society.
• Maunder, C.M. and R. Tulloss. Test Access Ports and Boundary
Scan Architectures. IEEE Computer Society Press, 1991.
• Parker, Kenneth. The Boundary Scan Handbook. Kluwer Aca-
demic Press, 1992.
• Bleeker, Harry P. van den Eijnden, and F. de Jong. Boundary-Scan
Test—A Practical Approach. Kluwer Academic Press, 1993.
• Hewlett-Packard Co. HP Boundary-Scan Tutorial and BSDL Ref-
erence Guide. (HP part# E1017-90001) 1992.
cond Status condition (see condition codes in Table 4-37 on page 4-92)
(LW) Long Word (forces long word access in normal word range)
5b VISA
Type 1a Syntax
Compute + parallel memory (data and program) transfer.
Type 1b Syntax
Parallel data memory and program memory transfers with register file,
without the Type 1 compute operation.
SISD Mode
In SISD mode, the Type 1 instruction provides parallel accesses to data
and program memory from the register file. The specified I registers
address data and program memory. The I values are post-modified and
updated by the specified M registers. Pre-modify offset addressing is not
supported. For more information on register restrictions, see Chapter 6,
Data Address Generators.
SIMD Mode
In SIMD mode, the Type 1 instruction provides the same parallel accesses
to data and program memory from the register file as are available in SISD
mode, but provides these operations simultaneously for the X and Y pro-
cessing elements.
The X element uses the specified I registers to address data and program
memory, and the Y element adds one to the specified I registers to address
data and program memory.
The I values are post-modified and updated by the specified M registers.
Pre-modify offset addressing is not supported. For more information on
register restrictions, see Chapter 6, Data Address Generators.
The X element uses the specified Dreg registers, and the Y element uses the
complementary registers (Cdreg) that correspond to the Dreg registers. For
a list of complementary registers, see Table 2-3 on page 2-6.
Broadcast Mode
If the broadcast read bits—BDCST1 (for I1) or BDCST9 (for I9)—are set, the
Y element uses the specified I register without adding one.
The following code compares the Type 1 instruction’s explicit and
implicit operations in SIMD and Broadcast modes.
Examples
R7=BSET R6 BY R0, DM(I0,M3)=R5, PM(I11,M15)=R4;
R8=DM(I4,M1), PM(I12,M12)=R0;
When the processors are in SISD mode, the first instruction in this exam-
ple performs a computation along with two memory writes. DAG1 is used
When the processors are in broadcast mode (the BDCST1 bit is set in the
MODE1 system register), the R0 (PEx) data register in this example is loaded
with the value from data memory utilizing the I1 register from DAG1,
and S0 (PEy) is loaded with the same value.
Type 2a Syntax
Compute operation, condition
IF COND compute ;
Type 2b Syntax
Compute operation, without the Type 2 condition
compute ;
Type 2c Syntax
Short (16-bit) compute operation, without the Type 2 condition
short compute ;
SISD Mode
In SISD mode, the Type 2 instruction provides a conditional compute
instruction. The instruction is executed if the specified condition tests
true.
SIMD Mode
In SIMD mode, the Type 2 instruction provides the same conditional
compute instruction as is available in SISD mode, but provides the opera-
tion simultaneously for the X and Y processing elements. The instruction
is executed in a processing element if the specified condition tests true in
that element independent of the condition result for the other element.
The following pseudo code compares the Type 2 instruction’s explicit and
implicit operations in SIMD mode.
Examples
IF MV R6=SAT MRF (UI);
When the processors are in SISD mode, the condition is evaluated in the
PEx processing element. If the condition is true, the computation is per-
formed and the result is stored in register R6.
When the processors are in SIMD mode, the condition is evaluated on
each processing element, PEx and PEy, independently. The computation
executes on both PEs, either one PE, or neither PE dependent on the out-
come of the condition. If the condition is true in PEx, the computation is
performed and the result is stored in register R6. If the condition is true in
PEy, the computation is performed and the result is stored in register S6.
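The per-element behavior of the conditional compute can be modeled as follows. This Python sketch is illustrative only: the function and parameter names are hypothetical, the computed values are passed in rather than derived from MRF, and only the destination registers R6 and S6 from the example above are modeled.

```python
def conditional_compute(regs: dict, cond_pex: bool, cond_pey: bool,
                        result_pex: int, result_pey: int, simd: bool) -> dict:
    """Model IF cond R6 = <compute> in SISD and SIMD modes.

    PEx writes R6 when its condition is true. In SIMD mode, PEy
    independently writes S6 when its own condition is true.
    """
    updated = dict(regs)
    if cond_pex:
        updated["R6"] = result_pex
    if simd and cond_pey:
        updated["S6"] = result_pey
    return updated
```

In SIMD mode, the four combinations of cond_pex and cond_pey correspond to the computation executing on both PEs, on either one PE, or on neither PE.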
Type 3a Syntax
Transfer operation between data or program memory and universal regis-
ter, condition, compute operation
, PM(Ic, Md)
, PM(Md, Ic)
Type 3b Syntax
Transfer operation between data or program memory and universal regis-
ter, optional condition, without the Type 3 optional compute operation
PM(Ic, Md)
PM(Md, Ic)
Type 3c Syntax
Transfer operation between data memory and data register, without the
Type 3 optional condition, without the Type 3 optional compute
operation
dreg = DM(Ia,Mb);
SISD Mode
In SISD mode, the Type 3a and 3b instructions provide access between
data or program memory and a universal register. The specified I register
addresses data or program memory. The I value is either pre-modified (M,
I order) or post-modified (I, M order) by the specified M register. If it is
post-modified, the I register is updated with the modified value. If a com-
pute operation is specified, it is performed in parallel with the data access.
The optional (LW) in this syntax lets programs specify long word address-
ing, overriding default addressing from the memory map. If a condition is
specified, it affects the entire instruction. Note that the Ureg may not be
from the same DAG (that is, DAG1 or DAG2) as Ia/Mb or Ic/Md. For
more information on register restrictions, see Chapter 6, Data Address
Generators.
SIMD Mode
In SIMD mode, the Type 3a and 3b instructions provide the same access
between data or program memory and a universal register as is available in
SISD mode, but provides this operation simultaneously for the X and Y
processing elements.
The X element uses the specified I register to address data or program
memory. The I value is either pre-modified (M, I order) or post-modified
(I, M order) by the specified M register. The Y element adds one/two (for
normal/short word access) to the specified I register (before pre-modify or
post-modify) to address data or program memory. If the I value is
post-modified, the I register is updated with the modified value from the specified
M register. The optional (LW) in this syntax lets programs specify long
word addressing, overriding default addressing from the memory map.
For the universal register, the X element uses the specified Ureg register,
and the Y element uses the corresponding complementary register (Cureg).
For a list of complementary registers, see Table 2-3 on page 2-6. Note that
the Ureg may not be from the same DAG (DAG1 or DAG2) as Ia/Mb or
Ic/Md.
Broadcast Mode
If the broadcast read bits—BDCST1 (for I1) or BDCST9 (for I9)—are set, the
Y element uses the specified I and M registers without implicit address
addition.
The following code compares the Type 3 instruction’s explicit and
implicit operations in SIMD mode.
Examples
R6=R3-R11, DM(I0,M1)=ASTATx;
IF NOT SV F8=CLIP F2 BY F14, F7=PM(I12,M12);
When the processors are in SISD mode, the computation and a data mem-
ory write in the first instruction are performed in PEx. The second
instruction stores the result of the computation in F8, and the result of the
program memory read into F7 if the condition’s outcome is true.
When the processors are in SIMD mode, the result of the computation in
PEx in the first instruction is stored in R6, and the result of the parallel
computation in PEy is stored in S6. In addition, there is a simultaneous
data memory write of the values stored in ASTATx and ASTATy. The condi-
tion is evaluated on each processing element, PEx and PEy,
independently. The computation executes on both PEs, either one PE, or
neither PE, dependent on the outcome of the condition. If the condition
is true in PEx, the computation is performed, the result is stored in regis-
ter F8 and the result of the program memory read is stored in F7. If the
condition is true in PEy, the computation is performed, the result is stored
in register SF8, and the result of the program memory read is stored in
SF7.
IF NOT SV F8=CLIP F2 BY F14, F7=PM(I9,M12);
When the processors are in broadcast mode (the BDCST9 bit is set in the
MODE1 system register) and the condition tests true, the computation is
performed and the result is stored in register F8. Also, the result of the
program memory read via the I9 register from DAG2 is stored in F7. The
SF7 register is loaded with the same value from program memory as F7.
Type 4a Syntax
Index-relative transfer between data or program memory and register file,
optional condition, optional compute operation
, PM(Ic, <data6>)
, PM(<data6>, Ic)
PM(Ic, <data6>) ;
PM(<data6>, Ic) ;
Type 4b Syntax
Index-relative transfer between data or program memory and register file,
optional condition, without the Type 4 optional compute operation
PM(Ic, <data6>)
PM(<data6>, Ic)
PM(Ic, <data6>) ;
PM(<data6>, Ic) ;
SISD Mode
In SISD mode, the Type 4 instruction provides access between data or
program memory and the register file. The specified I register addresses
data or program memory. The I value is either pre-modified (data order, I)
or post-modified (I, data order) by the specified immediate data. If it is
post-modified, the I register is updated with the modified value. If a com-
pute operation is specified, it is performed in parallel with the data access.
If a condition is specified, it affects the entire instruction. For more infor-
mation on register restrictions, see Chapter 6, Data Address Generators.
SIMD Mode
In SIMD mode, the Type 4 instruction provides the same access between
data or program memory and the register file as is available in SISD mode,
Broadcast Mode
If the broadcast read bits—BDCST1 (for I1) or BDCST9 (for I9)—are set, the
Y element uses the specified I and M registers without adding one.
The following pseudo code compares the Type 4 instruction’s explicit and
implicit operations in SIMD mode.
, cdreg = DM(Ia+k, 0) ;
PM(Ic+k, 0) ;
Examples
IF FLAG0_IN F1=F5*F12, F11=PM(I10,6);
R12=R3 AND R1, DM(6,I1)=R6;
When the processors are in SISD mode, the computation and program
memory read in the first instruction are performed in PEx if the condi-
tion’s outcome is true. The second instruction stores the result of the
logical AND in R12 and writes the value within R6 into data memory.
When the processors are in broadcast mode (the BDCST9 bit is set in the
MODE1 system register) and the condition tests true, the computation is
performed, the result is stored in register F1, and the program memory
value is read into register F11 via the I9 register from DAG2. The SF11
register is also loaded with the same value from program memory as F11.
Type 5a Syntax
Type 5b Syntax
Transfer between two universal registers or swap between a data register in
each processing element, optional condition, without the Type 5 optional
compute operation
SISD Mode
In SISD mode, the Type 5 instruction provides transfer (=) from one uni-
versal register to another or provides a swap (<->) between a data register
in the X processing element and a data register in the Y processing ele-
ment. If a compute operation is specified, it is performed in parallel with
the data access. If a condition is specified, it affects the entire instruction.
SIMD Mode
In SIMD mode, the Type 5 instruction provides the same transfer (=)
from one register to another as is available in SISD mode, but provides
Examples
IF TF MRF=R2*R6(SSFR), M4=R0;
LCNTR=L7;
R0 <-> S1;
When the processors are in SISD mode, the condition in the first instruc-
tion is evaluated in the PEx processing element. If the condition is true,
MRF is loaded with the result of the computation and a register transfer
occurs between R0 and M4. The second instruction initializes the loop
Syntax
, PM(Ic, Md)
PM(Ic, Md) ;
SISD Mode
In SISD mode, the Type 6 instruction provides an immediate shift, which
is a shifter operation that takes immediate data as its Y-operand. The
immediate data is one 8-bit value or two 6-bit values, depending on the
operation. The X-operand and the result are register file locations.
For more information on shifter operations, see “Shifter/Shift Immediate
Computations” on page 11-58. For more information on register restric-
tions, see Chapter 6, Data Address Generators.
If an access to data or program memory from the register file is specified,
it is performed in parallel with the shifter operation. The I register
addresses data or program memory. The I value is post-modified by the
specified M register and updated with the modified value. If a condition
is specified, it affects the entire instruction.
SIMD Mode
In SIMD mode, the Type 6 instruction provides the same immediate shift
operation as is available in SISD mode, but provides this operation simul-
taneously for the X and Y processing elements.
Broadcast Mode
If the broadcast read bits—BDCST1 (for I1) or BDCST9 (for I9)—are set, the
Y element uses the specified I and M registers without adding one.
The following code compares the Type 6 instruction’s explicit and
implicit operations in SIMD mode.
, cdreg = DM(Ia+k, 0) ;
PM(Ic+k, 0) ;
For a broadcast mode memory read, k=0.
In SIMD mode, k=1 for a normal word (NW) access and k=2 for a short word (SW) access.
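The implicit address offset k used by the Y element can be expressed as a small helper. The Python sketch below is a hypothetical model with illustrative function names. It covers only the cases named above (broadcast read, SIMD normal word, SIMD short word) and ignores post-modify and circular buffering.

```python
def pey_offset(broadcast: bool, short_word: bool) -> int:
    """Return k, the implicit offset the Y element adds to the I register value."""
    if broadcast:
        return 0                    # broadcast read: both elements use the same address
    return 2 if short_word else 1   # SIMD: SW access k=2, NW access k=1

def simd_addresses(i_value: int, broadcast: bool = False, short_word: bool = False):
    """Return the (PEx, PEy) addresses for one access through an index register."""
    return i_value, i_value + pey_offset(broadcast, short_word)
```

For a normal word access with the I register holding 0x1000 (an arbitrary illustrative value), PEx uses 0x1000 and PEy uses 0x1001. With the broadcast bit set, both elements read 0x1000.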
Examples
IF GT R2 = LSHIFT R6 BY 0x4, DM(I4,M4)=R0;
IF NOT SZ R3 = FEXT R1 BY 8:4;
When the processors are in SISD mode, the computation and data mem-
ory write in the first instruction are performed in PEx if the condition’s
outcome is true. In the second instruction, register R3 is loaded with the
result of the computation if the outcome of the condition is true.
When the processors are in SIMD mode, the condition is evaluated on
each processing element, PEx and PEy, independently. The computation
and data memory write execute on both PEs, either one PE, or neither PE
dependent on the outcome of the condition. If the condition is true in
PEx, the computation is performed, the result is stored in register R2, and
the data memory value is written from register R0. If the condition is true
in PEy, the computation is performed, the result is stored in register S2,
and the value within S0 is written into data memory. The second instruc-
tion’s condition is also evaluated on each processing element, PEx and
PEy, independently. If the outcome of the condition is true, register R3 is
loaded with the result of the computation on PEx, and register S3 is
loaded with the result of the computation on PEy.
R2 = LSHIFT R6 BY 0x4, F3=DM(I1,M3);
When the processors are in broadcast mode (the BDCST1 bit is set in the
MODE1 system register), the computation is performed, the result is stored
in R2, and the data memory value is read into register F3 via the I1 register
from DAG1. The SF3 register is also loaded with the same value from data
memory as F3.
Type 7a Syntax
Type 7b Syntax
Index register modify, optional condition, without the Type 7 optional
compute operation
SISD Mode
In SISD mode, the Type 7 instruction provides an update of the specified
Ia/Ic register by the specified Mb/Md register. If the destination register is
not specified, Ia/Ic is used as the destination register. Unless the destination
I register is specified or implied to be the same as the source I register, the
source I register is left unchanged. The M register is always left unchanged. If a
compute operation is specified, it is performed in parallel with the data
access. If a condition is specified, it affects the entire instruction. For
more information on register restrictions, see Chapter 6, Data Address
Generators.
SIMD Mode
In SIMD mode, the Type 7 instruction provides the same update of the
specified I register by the specified M register as is available in SISD
mode, but provides additional features for the optional compute operation.
If a compute operation is specified, it is performed simultaneously on the
X and Y processing elements in parallel with the transfer. If a condition is
specified, it affects the entire instruction. The instruction is executed in a
processing element if the specified condition tests true in that element
independent of the condition result for the other element.
The index register modify operation, in SIMD mode, occurs based on the
logical ORing of the outcome of the conditions tested on both PEs. In the
second instruction, the index register modify also occurs based on the log-
ical ORing of the outcomes of the conditions tested on both PEs. Because
both threads of a SIMD sequence may be dependent on a single DAG
index value, either thread needs to be able to cause a modify of the index.
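As a minimal sketch of the behavior described above, the following Python model evaluates the condition separately for each processing element and applies the index modify when the condition is true in either element. The function name and the returned tuple are hypothetical, and circular buffer wraparound is deliberately not modeled.

```python
def simd_conditional_modify(cond_pex, cond_pey, index, modifier):
    """Model a conditional Type 7 instruction in SIMD mode (illustrative only).

    Returns (compute_runs_on_pex, compute_runs_on_pey, new_index).
    """
    # Each processing element executes the compute only if its own
    # condition tests true, independent of the other element.
    compute_runs_on_pex = bool(cond_pex)
    compute_runs_on_pey = bool(cond_pey)

    # The index register modify is shared, so it is applied when the
    # condition is true in PEx OR in PEy.
    if cond_pex or cond_pey:
        index = index + modifier

    return compute_runs_on_pex, compute_runs_on_pey, index


if __name__ == "__main__":
    # Condition true only in PEy: PEy computes and the index is still modified.
    print(simd_conditional_modify(False, True, index=0x40, modifier=2))
```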
Examples
IF NOT FLAG2_IN R4=R6*R12(SUF), MODIFY(I10,M8);
IF FLAG2_IN R4=R6*R12(SUF), I9 = MODIFY(I10,M8);
IF NOT LCE MODIFY(I3,M1);
IF NOT LCE I0 = MODIFY(I3,M1);
MODIFY(I10,M9);
I15 = MODIFY(I11,M12);
I0 = MODIFY(I2,M2);
I3 = MODIFY(I3,M5); /* Semantically same as MODIFY(I3,M5) */
The COND field selects whether the operation specified in the COMPUTE field
and the branch are executed. If COND is true, the compute and branch are
executed. If no condition is specified, COND defaults to the true condition,
and the compute and branch are executed.
The ELSE field selects the case in which the condition is not true; in this
case, the computation is performed. An ELSE clause always requires a
condition.
The COMPUTE field specifies a compute operation using the ALU, multi-
plier, or shifter. Because there are a large number of options available for
computations, these operations are described separately in Chapter 11,
Computation Types.
• “Type 8a ISA/VISA (cond + branch)” on page 9-32
• “Type 9a ISA/VISA (cond + Branch + comp/else comp)” on
page 9-35
• “Type 10a ISA (cond + branch + else comp + mem data move)” on
page 9-40
• “Type 11a ISA/VISA (cond + branch return + comp/else comp)
Type 11c VISA (cond + branch return)” on page 9-44
• “Type 12a ISA/VISA (do until loop counter expired)” on
page 9-48
• “Type 13a ISA/VISA (do until termination)” on page 9-49
11c VISA
Syntax
(CI)
(DB, LA)
(DB, CI)
(PC, <reladdr24>)
SISD Mode
In SISD mode, the Type 8 instruction provides a jump or call to the spec-
ified address or PC-relative address. The PC-relative address is a 24-bit,
twos-complement value. The Type 8 instruction supports the following
modifiers.
• (DB)—delayed branch—starts a delayed branch
• (LA)—loop abort—causes the loop stacks and PC stack to be
popped when the jump is executed. Use the (LA) modifier if the
jump transfers program execution outside of a loop. Do not use
(LA) if there is no loop or if the jump address is within the loop.
The clear interrupt (CI) modifier clears the status of the current interrupt
without leaving the interrupt service routine. This feature reduces the
interrupt routine to a normal subroutine
and allows the interrupt to occur again, as a result of a different event or
task in the SHARC processor system. The jump (CI) instruction should
be located within the interrupt service routine. For more information on
interrupts, see Chapter 4, Program Sequencer.
To reduce the interrupt service routine to a normal subroutine, the jump
(CI) instruction clears the appropriate bit in the interrupt latch register
(IRPTL) and interrupt mask pointer (IMASKP). The processor then allows
the interrupt to occur again.
When returning from a reduced subroutine, programs must use the (LR)
modifier of the RTS if the interrupt occurs during the last two instruc-
tions of a loop. For related information, see “Type 11a ISA/VISA (cond +
branch return + comp/else comp) Type 11c VISA (cond + branch return)”
on page 9-44.
SIMD Mode
In SIMD mode, the Type 8 instruction provides the same jump or call
operation as in SISD mode, but provides additional features for handling
the optional condition.
If a condition is specified, the jump or call is executed if the specified
condition tests true in both the X and Y processing elements.
SIMD Explicit Operation (Program Sequencer Operation Stated in the Instruction Syntax)
IF (PEx AND PEy COND) JUMP <addr24>           (DB) ;
                           (PC, <reladdr24>)  (LA)
                                              (CI)
                                              (DB, LA)
                                              (DB, CI)
Examples
IF AV JUMP(PC,0x00A4) (LA);
CALL init (DB); /* init is a program label */
JUMP (PC,2) (DB,CI); /* clear current int. for reuse */
When the processors are in SISD mode, the first instruction performs a
jump to the PC-relative address depending on the outcome of the
condition tested in PEx. In the second instruction, a call to the program label
init occurs. A PC-relative jump takes place in the third instruction.
When the processors are in SIMD mode, the first instruction performs a
jump to the PC-relative address depending on the logical ANDing of the
outcomes of the conditions tested in both PEs. In SIMD mode, the second
and third instructions operate the same as in SISD mode. In the
second instruction, a call to the program label init occurs. A PC-relative
jump takes place in the third instruction.
Type 9a Syntax
(CI)
(DB, LA)
(DB, CI)
Type 9b Syntax
Indirect (or PC-relative) jump/call, optional condition, without the
Type 9 optional compute operation
(CI)
(DB, LA)
(DB, CI)
(PC, <reladdr6>)
SISD Mode
In SISD mode, the Type 9 instruction provides a jump or call to the spec-
ified PC-relative address or pre-modified I register value. The PC-relative
address is a 6-bit, two’s-complement value. If an I register is specified, it is
modified by the specified M register to generate the branch address. The I
register is not affected by the modify operation. The Type 9 instruction
supports the following modifiers:
• (DB)—delayed branch—starts a delayed branch
• (LA)—loop abort—causes the loop stacks and PC stack to be
popped when the jump is executed. Use the (LA) modifier if the
jump transfers program execution outside of a loop. Do not use
(LA) if there is no loop or if the jump address is within the loop.
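The two ways of forming a Type 9 branch address described above can be sketched in Python. This is an illustrative model only: the function names are hypothetical, and it assumes that the sign-extended offset is added to the address of the branch instruction itself, which should be checked against the program sequencer description.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set: value is negative
        value -= 1 << bits
    return value


def pc_relative_target(pc, reladdr, bits=6):
    """Branch target for (PC, <reladdr>); assumes the offset is relative to pc."""
    return pc + sign_extend(reladdr, bits)


def indirect_target(i_reg, m_reg):
    """Branch target for (Md, Ic): I pre-modified by M.

    Only the sum is returned; the caller's I register value is not changed,
    which mirrors the statement that the I register is not affected.
    """
    return i_reg + m_reg


if __name__ == "__main__":
    print(sign_extend(0x3F, 6))                  # 6-bit pattern 111111
    print(hex(pc_relative_target(0x100, 17)))    # as in CALL(PC,17)
    print(hex(indirect_target(0x2000, 4)))
```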
SIMD Mode
In SIMD mode, the Type 9 instruction provides the same jump or call
operation as is available in SISD mode, but provides additional features
for the optional condition.
If a condition is specified, the jump or call is executed if the specified
condition tests true in both the X and Y processing elements.
Note that for the compute, the X element uses the specified registers and
the Y element uses the complementary registers. For a list of complemen-
tary registers, see Table 2-3 on page 2-6.
The following code compares the Type 9 instruction’s explicit and
implicit operations in SIMD mode.
Examples
JUMP(M8,I12), R6=R6-1;
IF EQ CALL(PC,17)(DB), ELSE R6=R6-1;
When the processors are in SISD mode, the indirect jump and compute in
the first instruction are performed in parallel. In the second instruction, a
call occurs if the condition is true, otherwise the computation is
performed.
When the processors are in SIMD mode, the indirect jump in the first
instruction occurs in parallel with both processing elements executing
computations. In PEx, R6 stores the result, and S6 stores the result in PEy.
In the second instruction, the condition is evaluated independently on
each processing element, PEx and PEy. The call executes based on the log-
ical ANDing of the PEx and PEy conditional tests. So, the call executes if
the condition tests true in both PEx and PEy. Because the ELSE inverts the
conditional test, the computation is performed independently on either
PEx or PEy based on the negative evaluation of the condition code seen by
that processing element. If the computation is executed, R6 stores the
result of the computation in PEx, and S6 stores the result of the computa-
tion in PEy.
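The SIMD behavior of the second example, IF EQ CALL(PC,17)(DB), ELSE R6=R6-1, can be sketched as a small Python model. The function name and argument layout are hypothetical; the model only captures which parts of the instruction execute, not timing or the delayed branch.

```python
def type9_simd_call_else(cond_pex, cond_pey, r6, s6):
    """Model IF cond CALL ..., ELSE R6=R6-1 in SIMD mode (illustrative only).

    Returns (call_taken, new_r6, new_s6).
    """
    # The call is a single program sequencer action, so it needs the
    # condition to be true in both processing elements.
    call_taken = bool(cond_pex and cond_pey)

    # The ELSE compute runs per element, wherever the condition is false.
    if not cond_pex:
        r6 -= 1        # PEx uses the specified register R6
    if not cond_pey:
        s6 -= 1        # PEy uses the complementary register S6

    return call_taken, r6, s6


if __name__ == "__main__":
    # Condition true in PEx only: no call, and only PEy decrements S6.
    print(type9_simd_call_else(True, False, r6=5, s6=9))
```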
Type 10a ISA (cond + branch + else comp + mem data move)
Indirect (or PC-relative) jump or optional compute operation with trans-
fer between data memory and register file. This instruction is not
supported for VISA instructions.
Syntax
SISD Mode
In SISD mode, the Type 10a instruction provides a conditional jump to
either specified PC-relative address or pre-modified I register value. In
parallel with the jump, this instruction also provides a transfer between
data memory and a data register with optional parallel compute operation.
For this instruction, the IF condition and ELSE keywords are not optional
and must be used. If the specified condition is true, the jump is executed.
If the specified condition is false, the data memory transfer and optional
compute operation are performed in parallel. Only the compute operation
is optional in this instruction.
The PC-relative address for the jump is a 6-bit, twos-complement value. If
an I register is specified (Ic), it is modified by the specified M register (Md)
to generate the branch address. The I register is not affected by the modify
operation. For this jump, programs may not use the delayed branch (DB),
loop abort (LA), or clear interrupt (CI) modifiers.
For the data memory access, the I register (Ia) provides the address. The I
register value is post-modified by the specified M register (Mb) and is
updated with the modified value. Pre-modify addressing is not available
for this data memory access.
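The difference between the pre-modified branch address and the post-modified data address can be sketched as follows. This Python model is illustrative only: the function name, the dictionary standing in for data memory, and the result layout are all hypothetical, and the optional compute is omitted.

```python
def type10a_sisd(cond, ic, md, ia, mb, data_memory):
    """Model IF cond JUMP(Md, Ic), ELSE Rn = DM(Ia, Mb) in SISD mode.

    Returns a dict with the branch target (or None), the loaded value
    (or None), and the value of Ia after the instruction.
    """
    if cond:
        # Pre-modify: the branch address is Ic + Md; Ic itself is unchanged.
        return {"branch_target": ic + md, "loaded": None, "ia": ia}

    # Post-modify: Ia supplies the address, then Ia is updated with Ia + Mb.
    value = data_memory[ia]
    return {"branch_target": None, "loaded": value, "ia": ia + mb}


if __name__ == "__main__":
    memory = {0x100: 42}   # hypothetical data memory contents
    print(type10a_sisd(True, 0x2000, 4, 0x100, 1, memory))
    print(type10a_sisd(False, 0x2000, 4, 0x100, 1, memory))
```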
SIMD Mode
In SIMD mode, the Type 10a instruction provides the same conditional
jump as is available in SISD mode, but the jump is executed if the
specified condition tests true in both the X and Y processing elements.
In parallel with the jump, this instruction also provides a transfer between
data memory and a data register in the X and Y processing elements. An
optional parallel compute operation for the X and Y processing elements is
also available.
For this instruction, the IF condition and ELSE keywords are not optional
and must be used. If the specified condition is true in both processing
elements, the jump is executed. The data memory transfer and optional
compute operation specified with the ELSE are performed in an element
when the condition tests false in that element.
Note that for the compute, the X element uses the specified Dreg register
and the Y element uses the complementary Cdreg register. For a list of
complementary registers, see Table 2-3 on page 2-6. Only the compute
operation is optional in this instruction.
The addressing for the jump is the same in SISD and SIMD modes, but
addressing for the data memory access differs slightly. For the data mem-
ory access in SIMD mode, X processing element uses the specified I
register (Ia) to address memory. The I register value is post-modified by
the specified M register (Mb) and is updated with the modified value. The
Y element adds one to the specified I register to address memory.
Pre-modify addressing is not available for this data memory access.
The following pseudo code compares the Type 10a instruction’s explicit
and implicit operations in SIMD mode.
Broadcast Mode
If the broadcast read bits—BDCST1 (for I1) or BDCST9 (for I9)—are set, the
Y element uses the specified I register without adding one.
Examples
IF TF JUMP(M8, I8), ELSE R6=DM(I6, M1);
When the processors are in SISD mode, the indirect jump in the first
instruction is performed if the condition tests true. Otherwise, R6 stores
the value of a data memory read. The second instruction is much like the
first; however, it also includes an optional compute, which is performed in
parallel with the data memory read.
When the processors are in SIMD mode, the indirect jump in the first
instruction executes depending on the outcome of the condition in both
processing elements. The condition is evaluated independently on each
processing element, PEx and PEy. The indirect jump executes based on
the logical ANDing of the PEx and PEy conditional tests. So, the indirect
jump executes if the condition tests true in both PEx and PEy. The data
memory read is performed independently on either PEx or PEy based on
the negative evaluation of the condition code seen by that PE.
The second instruction is much like the first instruction. The second
instruction, however, includes an optional compute also performed in par-
allel with the data memory read independently on either PEx or PEy and
based on the negative evaluation of the condition code seen by that pro-
cessing element.
IF TF JUMP(M8,I8), ELSE R6=DM(I1,M1);
When the processors are in broadcast mode (the BDCST1 bit is set in the
MODE1 system register), the instruction performs an indirect jump if the
condition tests true. Otherwise, R6 stores the value of a data memory read
via the I1 register from DAG1. The S6 register is also loaded with the
same value from data memory as R6.
(DB, LR)
, ELSE compute
(LR)
(DB, LR)
SISD Mode
In SISD mode, the Type 11 instruction provides a return from a subrou-
tine (RTS) or return from an interrupt service routine (RTI). A return
causes the processor to branch to the address stored at the top of the PC
stack. The difference between RTS and RTI is that the RTS instruction
only pops the return address off the PC stack, while the RTI does that
plus:
• Pops the status stack if the ASTAT and MODE1 status registers have been
pushed—if the interrupt was IRQ2-0 or the timer interrupt
• Clears the appropriate bit in the interrupt latch register (IRPTL)
and the interrupt mask pointer (IMASKP)
The return executes when the optional IF condition is true (or if no
condition is specified). If a compute operation is specified without the ELSE, it
is performed in parallel with the return. If a compute operation is specified
with the ELSE, it is performed only when the IF condition is false. Note
that a condition must be specified if an ELSE compute clause is specified.
RTS supports two modifiers, (DB) and (LR); RTI supports one modifier,
(DB). If the delayed branch (DB) modifier is specified, the return is
delayed; otherwise, it is non-delayed.
If the return is not a delayed branch and occurs as one of the last three
instructions of a loop, programs must use the loop reentry (LR) modifier
with the subroutine’s RTS instruction. The (LR) modifier assures proper
reentry into the loop. For example, the processor checks the termination
condition in counter-based loops by decrementing the current loop
counter (CURLCNTR) during execution of the instruction two locations
before the end of the loop. In this case, the RTS (LR) instruction prevents
the loop counter from being decremented again, avoiding the error of dec-
rementing twice for the same loop iteration.
Programs must also use the (LR) modifier for RTS when returning from a
subroutine that has been reduced from an interrupt service routine with a
jump (CI) instruction. This case occurs when the interrupt occurs during
the last two instructions of a loop. For a description of the jump (CI)
instruction, see “Type 8a ISA/VISA (cond + branch)” on page 9-32 or
“Type 9a ISA/VISA (cond + Branch + comp/else comp)” on page 9-35.
SIMD Mode
In SIMD mode, the Type 11 instruction provides the same return opera-
tions as are available in SISD mode, except that the return is executed if
the specified condition tests true in both the X and Y processing
elements.
In parallel with the return, this instruction also provides a parallel compute
or ELSE compute operation for the X and Y processing elements. If a con-
dition is specified, the optional compute is executed in a processing
element if the specified condition tests true in that processing element. If
a compute operation is specified with the ELSE, it is performed in an ele-
ment when the condition tests false in that element.
Note that for the compute, the X element uses the specified registers, and
the Y element uses the complementary registers. For a list of complemen-
tary registers, see Table 2-3 on page 2-6.
The following pseudo code compares the Type 11 instruction’s explicit
and implicit operations in SIMD mode.
IF (PEx AND PEy COND) RTI (DB)  , (if PEx COND) compute ;
                                , ELSE (if NOT PEx) compute
IF (PEx AND PEy COND) RTI (DB)  , (if PEy COND) compute ;
                                , ELSE (if NOT PEy) compute
Examples
RTI, R6=R5 XOR R1;
IF le RTS(DB);
IF sz RTS, ELSE R0=LSHIFT R1 BY R15;
When the processors are in SISD mode, the first instruction performs a
return from interrupt and a computation in parallel. The second instruc-
tion performs a return from subroutine only if the condition is true. In the
third instruction, a return from subroutine is executed if the condition is
true. Otherwise, the computation executes.
When the processors are in SIMD mode, the first instruction performs a
return from interrupt and both processing elements execute the computa-
tion in parallel. The result from PEx is placed in R6, and the result from
PEy is placed in S6. The second instruction performs a return from
subroutine (RTS) if the condition tests true in both PEx and PEy. In the third
instruction, the condition is evaluated independently on each processing
element, PEx and PEy. The RTS executes based on the logical ANDing of
the PEx and PEy conditional tests. So, the RTS executes if the condition
tests true in both PEx and PEy. Because the ELSE inverts the conditional
test, the computation is performed independently on either PEx or PEy
based on the negative evaluation of the condition code seen by that pro-
cessing element. The R0 register stores the result in PEx, and S0 stores the
result in PEy if the computations are executed.
Syntax
Examples
LCNTR=100, DO fmax UNTIL LCE; /* fmax is a program label */
LCNTR=R12, DO (PC,16) UNTIL LCE;
The processor (in SISD or SIMD) executes the action at the indicated
address for the duration of the loop.
Syntax
(PC, <reladdr24>)
SISD Mode
In SISD mode, the Type 13 instruction sets up a conditional program
loop. The loop start address is pushed on the PC stack. The loop end
address and the termination condition are pushed on the loop stack. The
end address can be either a label for an absolute 24-bit program memory
address or a PC-relative, 24-bit twos-complement address. The loop exe-
cutes until the termination condition tests true.
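The stack activity described above can be sketched in Python. This is a simplified illustration, not the hardware behavior in full: the function names and the use of Python lists as stacks are hypothetical, and the PC-relative helper assumes that the offset is relative to the address of the DO UNTIL instruction.

```python
def relative_end_address(do_address, reladdr24):
    """End address for DO (PC, <reladdr24>); 24-bit two's-complement offset.

    Assumption: the offset is added to the address of the DO UNTIL instruction.
    """
    offset = reladdr24 & 0xFFFFFF
    if offset & 0x800000:          # negative offset
        offset -= 0x1000000
    return do_address + offset


def do_until(loop_start, loop_end, termination, pc_stack, loop_stack):
    """Record a loop setup: start address on the PC stack, end address and
    termination condition on the loop stack."""
    pc_stack.append(loop_start)
    loop_stack.append((loop_end, termination))


if __name__ == "__main__":
    pc_stack, loop_stack = [], []
    end = relative_end_address(0x100, 7)       # as in DO (PC,7) UNTIL AC
    do_until(loop_start=0x101, loop_end=end, termination="AC",
             pc_stack=pc_stack, loop_stack=loop_stack)
    print(pc_stack, loop_stack)
```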
SIMD Mode
In SIMD mode, the Type 13 instruction provides the same conditional
program loop as is available in SISD mode, except that in SIMD mode the
loop executes until the termination condition tests true in both the X and
Y processing elements.
The following code compares the Type 13 instruction’s explicit and
implicit operations in SIMD mode.
SIMD Explicit Operation (Program Sequencer Operation Stated in the Instruction Syntax)
DO <addr24>          UNTIL (PEx AND PEy) termination ;
   (PC, <reladdr24>)
Examples
DO end UNTIL FLAG1_IN; /* end is a program label */
DO (PC,7) UNTIL AC;
When the processors are in SISD mode, the end program label in the first
instruction specifies the end address for the loop, and the loop is executed
until the instruction’s condition tests true. In the second instruction, the
end address is given in the form of a PC-relative address. The loop
executes until the instruction’s condition tests true.
When the processors are in SIMD mode, the end program label in the first
instruction specifies the end address for the loop, and the loop is executed
until the instruction’s condition tests true in both PEx and PEy. In the
second instruction, the end address is given in the form of a PC-relative
address. The loop executes until the instruction’s condition tests true in
both PEx and PEy.
PM(<addr32>)
PM(<addr32>) (LW);
SISD Mode
In SISD mode, the Type 14 instruction sets up an access between data or
program memory and a universal register, with direct addressing. The
entire data or program memory address is specified in the instruction.
Addresses are 32 bits wide (0 to 2^32 - 1). The optional (LW) in this syntax
lets programs specify long word addressing, overriding default addressing
from the memory map.
SIMD Mode
In SIMD mode, the Type 14 instruction provides the same access between
data or program memory and a universal register, with direct addressing,
as is available in SISD mode, except that addressing differs slightly, and
the transfer occurs in parallel for the X and Y processing elements.
For the memory access in SIMD mode, the X processing element uses the
specified 32-bit address to address memory. The Y element adds k to the
specified 32-bit address to address memory.
For the universal register, the X element uses the specified Ureg, and the Y
element uses the complementary register (Cureg) that corresponds to the
Ureg register specified in the instruction. For a list of complementary
registers, see Table 2-3 on page 2-6. Note that only the Cureg subset registers
that have complementary registers are affected by SIMD mode.
The following code compares the Type 14 instruction’s explicit and
implicit operations in SIMD mode.
Examples
DM(temp)=MODE1; /* temp is a program label */
LCNTR=PM(0x90500);
When the processors are in SISD mode, the first instruction performs a
direct memory write of the value in the MODE1 register into data memory
with the data memory destination address specified by the program label,
temp. The second instruction initializes the LCNTR register with the value
found in the specified address in program memory.
PM(<data32>, Ic)
PM(<data32>, Ic)
PM(<data7>, Ic)
PM(<data7>, Ic)
SISD Mode
In SISD mode, the Type 15 instruction sets up an access between data or
program memory and a universal register, with indirect addressing using I
registers. The I register is pre-modified with an immediate value specified
in the instruction. The I register is not updated. Address modifiers are 32
bits wide (0 to 2^32 - 1). The Ureg may not be from the same DAG (that is,
DAG1 or DAG2) as Ia/Mb or Ic/Md.
SIMD Mode
In SIMD mode, the Type 15 instruction provides the same access between
data or program memory and a universal register, with indirect addressing
using I registers, as is available in SISD mode, except that addressing dif-
fers slightly, and the transfer occurs in parallel for the X and Y processing
elements.
The X processing element uses the specified I register—pre-modified with
an immediate value—to address memory. The Y processing element adds
k to the pre-modified I value to address memory. The I register is not
updated.
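The address calculation described above can be written as a short Python sketch. The function name is hypothetical, and k is passed in as a parameter rather than derived, because its value depends on the access type and on broadcast mode.

```python
def type15_addresses(i_reg, immediate, k):
    """Model Type 15 addressing in SIMD mode (illustrative only).

    Returns (pex_address, pey_address, i_reg_after).
    """
    pex_address = i_reg + immediate    # I pre-modified by the immediate value
    pey_address = pex_address + k      # PEy adds k to the pre-modified value
    return pex_address, pey_address, i_reg   # the I register is not updated


if __name__ == "__main__":
    # Same shape as DM(24,I5)=TCOUNT with a hypothetical I5 value and k = 1.
    print([hex(v) for v in type15_addresses(0x2000, 24, 1)])
```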
The Ureg specified in the instruction is used for the X processing element
transfer and may not be from the same DAG (that is, DAG1 or DAG2) as
Ia/Mb or Ic/Md. The Y element uses the complementary register (Cureg)
that corresponds to the Ureg register specified in the instruction. For a list
of complementary registers, see Table 2-3 on page 2-6. Note that only the
Cureg subset registers that have complementary registers are affected by
SIMD mode. For more information on register restrictions, see
Chapter 6, Data Address Generators.
The following code compares the Type 15 instruction’s explicit and
implicit operations in SIMD mode.
Examples
DM(24,I5)=TCOUNT;
USTAT1=PM(offs,I13); /* “offs” is a user-defined constant */
When the processors are in SISD mode, the first instruction performs a
data memory write, using indirect addressing and the Ureg timer register,
TCOUNT. The DAG1 register I5 is pre-modified with the immediate value
of 24. The I5 register is not updated after the memory access occurs. The
second instruction performs a program memory read, using indirect
addressing and the system register, USTAT1. The DAG2 register I13 is
pre-modified with the immediate value of the defined constant, offs. The
I13 register is not updated after the memory access occurs.
PM(Ic, Md)
PM(Ic, Md)
SISD Mode
In SISD mode, the Type 16 instruction sets up a write of 32-bit immedi-
ate data to data or program memory, with indirect addressing. The data is
placed in the most significant 32 bits of the 40-bit memory word. The
least significant 8 bits are loaded with 0s. The I register is post-modified
and updated by the specified M register.
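The placement of the 32-bit immediate within the 40-bit memory word can be illustrated with a one-line shift. The following Python sketch uses a hypothetical function name and models only the bit placement described above.

```python
def immediate_to_word40(data32):
    """Place 32-bit immediate data in the upper 32 bits of a 40-bit word.

    The least significant 8 bits of the result are zero.
    """
    return (data32 & 0xFFFFFFFF) << 8


if __name__ == "__main__":
    word = immediate_to_word40(19304)      # as in DM(I4,M0)=19304
    print(f"{word:010X}")                  # ten hex digits = 40 bits
```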
SIMD Mode
In SIMD mode, the Type 16 instruction provides the same write of 32-bit
immediate data to data or program memory, with indirect addressing, as is
available in SISD mode, except that addressing differs slightly, and the
transfer occurs in parallel for the X and Y processing elements.
The X processing element uses the specified I register to address memory.
The Y processing element adds k to the I register to address memory. The
I register is post-modified and updated by the specified M register.
Examples
DM(I4,M0)=19304;
PM(I14,M11)=count; /* count is user-defined constant */
When the processors are in SISD mode, the two immediate memory
writes are performed on PEx. The first instruction writes to data memory
and the second instruction writes to program memory. DAG1 and DAG2
are used to indirectly address the locations in memory to which values are
written. The I4 and I14 registers are post-modified and updated by M0 and
M11 respectively.
When the processors are in SIMD mode, the two immediate memory
writes are performed in parallel on PEx and PEy. The first instruction
writes to data memory and the second instruction writes to program mem-
ory. DAG1 and DAG2 are used to indirectly address the locations in
memory to which values are written. The I4 and I14 registers are
post-modified and updated by M0 and M11 respectively.
ureg = <data32> ;
ureg = <data16> ;
SISD Mode
In SISD mode, the Type 17 instruction writes 16-bit/32-bit immediate
data to a universal register. If the register is 40 bits wide, the data is placed
in the most significant 32 bits, and the least significant 8 bits are loaded
with 0s.
SIMD Mode
In SIMD mode, the Type 17 instruction provides the same write of 32-bit
immediate data to universal register as is available in SISD mode, but pro-
vides parallel writes for the X and Y processing elements.
The X element uses the specified Ureg, and the Y element uses the
complementary Cureg. Note that only the Cureg subset registers that have
complementary registers are affected by SIMD mode. For a list of
complementary registers, see Table 2-3 on page 2-6.
Examples
ASTATx=0x0;
M15=mod1; /* mod1 is user-defined constant */
When the processors are in SISD mode, the two instructions load imme-
diate values into the specified registers.
When the processors are in SIMD mode, the two instructions behave
differently because of their register selections. The ASTATx (system)
register is included in the Cureg subset, so in the first instruction the
immediate data writes to the system register ASTATx and its complementary
register ASTATy are performed in parallel on PEx and PEy respectively. The
M15 register is not included in the Cureg subset, so the second instruction
operates the same in SIMD and SISD mode.
22c VISA
23–24 Reserved
Syntax
CLR
TGL
TST
XOR
SISD Mode
In SISD mode, the Type 18 instruction provides a bit manipulation oper-
ation on a system register. This instruction can set, clear, toggle or test
specified bits, or compare (XOR) the system register with a specified data
value. In the first four operations, the immediate data value is a mask.
The set operation sets all the bits in the specified system register that are
also set in the specified data value. The clear operation clears all the bits
that are set in the data value. The toggle operation toggles all the bits that
are set in the data value. The test operation sets the bit test flag (BTF in
ASTATx/y) if all the bits that are set in the data value are also set in the
system register. The XOR operation sets the bit test flag (BTF in ASTATx/y) if
the system register value is the same as the data value.
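The five operations can be summarized with a small Python model. This is a sketch under stated assumptions: system registers are treated as 32 bits wide, the bit test flag is modeled as cleared when a TST or XOR comparison fails, and the function name bit_op is hypothetical.

```python
MASK32 = 0xFFFFFFFF   # assumption: system registers are 32 bits wide


def bit_op(op, sreg, data):
    """Model BIT SET/CLR/TGL/TST/XOR on a system register value.

    Returns (new_sreg, btf). btf is None when the operation does not
    produce a bit test result.
    """
    sreg &= MASK32
    data &= MASK32
    btf = None
    if op == "SET":
        sreg |= data                      # set every bit set in the mask
    elif op == "CLR":
        sreg &= ~data & MASK32            # clear every bit set in the mask
    elif op == "TGL":
        sreg ^= data                      # toggle every bit set in the mask
    elif op == "TST":
        btf = (sreg & data) == data       # all mask bits set in the register?
    elif op == "XOR":
        btf = sreg == data                # register equal to the data value?
    else:
        raise ValueError(f"unknown operation: {op}")
    return sreg, btf


if __name__ == "__main__":
    print(bit_op("SET", 0x00000000, 0x00000070))   # sets bits 4, 5, and 6
    print(bit_op("TST", 0x00002000, 0x00002000))   # tests bit 13
```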
For more information on shifter operations, see Chapter 11, Computa-
tion Types. For more information on system registers, see Appendix A,
Registers.
SIMD Mode
In SIMD mode, the Type 18 instruction provides the same bit manipula-
tion operations as are available in SISD mode, but provides them in
parallel for the X and Y processing elements.
The X element operation uses the specified Sreg, and the Y element
operation uses the complementary Csreg. For a list of complementary registers,
see Table 2-3 on page 2-6.
The following code compares the Type 18 instruction’s explicit and
implicit operations in SIMD mode.
Examples
BIT SET MODE2 0x00000070;
BIT TST ASTATx 0x00002000;
When the processors are in SISD mode, the first instruction sets all of the
bits in the MODE2 register that are also set in the data value, bits 4, 5, and 6
in this case. The second instruction sets the bit test flag (BTF in ASTATx) if
all the bits set in the data value, just bit 13 in this case, are also set in the
system register.
Syntax
Examples
MODIFY (I4, 304);
/* operation is the same as I4=MODIFY(I4,304) */
BITREV (I7, space);
/* “space” is a user-defined constant,
operation is the same as I7=BITREV(I7,space) */
I3 = MODIFY (I2,0x123);
I9 = MODIFY (I9,0x1);
I2 = BITREV (I1,122);
I15 = BITREV (I12,0x10);
Syntax
Examples
PUSH LOOP, PUSH STS;
POP PCSTK, FLUSH CACHE;
In SISD and SIMD, the first instruction pushes the loop stack and status
stack. The second instruction pops the PC stack and flushes the cache.
NOP ;
NOP
IDLE ;
EMUIDLE ;
(PC, <reladdr24>)
RFRAME ;
RFRAME ;
This chapter lists the various instruction type opcodes and their ISA or
VISA operation. The instruction types linked into normal word space are
valid ISA opcodes and if linked into short word space they become valid
VISA opcodes (valid for the ADSP-214xx processors). Note that all VISA
instructions are first MSB aligned, then decoded, then executed (therefore
starting with bit 47).
DATAEX 6a For two 6-bit immediate Y input data or the 12-bit immediate for bit
FIFO, the DATAEX field adds 4 MSBs to the DATA field, creating a
12-bit immediate value. The six LSBs are the shift value (bit6) and the
six MSBs are the length value (len6)
TERM 13a Termination condition codes 0–31 (see Table 10-4 on page 10-33)
The letter after the instruction in the next sections denotes the instruction
size as follows: a = 48-bit, b = 32-bit, c = 16-bit.
For ISA/VISA instructions bits 47–40 are used to decode the instruction
set types and for VISA instructions bits 36–34 are optionally decoded.
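The MSB alignment and field decoding described above can be sketched in Python. The helper names msb_align and field are hypothetical, and the example opcodes are arbitrary bit patterns chosen for illustration, not real instructions.

```python
INSTRUCTION_BITS = 48


def msb_align(opcode, size_bits):
    """Left-justify a 16-, 32-, or 48-bit opcode in a 48-bit word so that
    its first bit lands on bit 47."""
    if size_bits not in (16, 32, 48):
        raise ValueError("instruction size must be 16, 32, or 48 bits")
    return (opcode << (INSTRUCTION_BITS - size_bits)) & ((1 << INSTRUCTION_BITS) - 1)


def field(word, high, low):
    """Extract bits high..low (inclusive) from an aligned 48-bit word."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


if __name__ == "__main__":
    aligned = msb_align(0xABCD, 16)        # arbitrary 16-bit pattern
    print(f"{aligned:012X}")               # twelve hex digits = 48 bits
    print(f"{field(aligned, 47, 40):02X}") # bits 47-40 select the type
```

After alignment, the same field positions apply whatever the original instruction size, which is why the diagrams for the 32-bit and 16-bit forms also start at bit 47.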
Type 1a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
001 DMD DMI DMM PMD DM DREG PMI PMM PM DREG
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
Type 1b
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
001 DMD DMI DMM PMD DM DREG
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
Type 2b
47 46 45 44 43 42 41 40 39
000 00001 1
38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
COMPUTE
Type 2c
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
Type 3a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
Type 3b
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
010 U I M COND
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
G D L UREG 0111111
Type 3c
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
Type 4b
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
011 0 I G D U COND
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
Type 5a
Ureg = Ureg transfer
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
011 1 0 SRC UREG HIGH COND SRC UREG LOW DEST UREG
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
SRC UREG LOW DEST UREG 0111111
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
DREG 0111111
Type 6a
with mem data move
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SHIFTIM
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SHIFTIM
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
Type 7b
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
Is M IdIs 0111111
Type 8a
direct branch
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ADDR
PC-relative branch
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
RELADDR
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
Type 9b
with indirect branch
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33
32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
RELADDR J CI 0111111
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
Type 11a
branch return from subroutine
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
COMPUTE
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
Type 12a
with immediate loop counter load
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
RELADDR
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
RELADDR
Type 13a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
RELADDR
Type 14a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ADDR
(lower 24 bits)
Type 15a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
DATA
(lower 24 bits)
Type 15b
47 46 45 44 43 42 41 40 39 38 37 36 35 34
1001 I D L G 010
33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
UREG DATA
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
100 1 I M G DATA
(upper 8 bits)
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
DATA
(lower 24 bits)
Type 16b
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
1001 I M G 001
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
DATA
Type 17a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
DATA
(lower 24 bits)
Type 17b
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
DATA
Type 18a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
DATA
(lower 24 bits)
Type 19a
with modify
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
DATA
(lower 24 bits)
with bit-reverse
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
DATA
(lower 24 bits)
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
000 10111 LPU LPO SPU SPO PPU PPO FC
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Type 21a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
000 00000 0
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Type 21c
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
000 00000 0 0 1
Type 22a
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Type 22c
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ADDR
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
RELADDR
RFRAME
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24
23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Type 25c
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
Register Opcodes
The SHARC core classifies the following register types.
• universal register (UREG)
• data register (DREG) subgroup of UREG
• system register (SREG) subgroup of UREG
• non-universal register
When operating in SIMD mode, most of the register types use
complementary registers (CDREG, CSREG, UUREG). One exception is the
combined PX register, whose halves PX1 and PX2 are classified as
complementary universal registers (CUREG). Understanding this
classification is required to read the instruction coding for universal
registers in the tables in the following sections.
0010 R2 I2 M2 L2 B2 S2 MODE1
0011 R3 I3 M3 L3 B3 S3 PC MMASK
0010 S2 I2 M2 L2 B2 R2 MODE1
0011 S3 I3 M3 L3 B3 R3 PC MMASK
EQ 00000 NE 10000
LT 00001 GE 10001
LE 00010 GT 10010
1 For ADSP-21368/ADSP-2146x valid bus master condition, for ADSP-214xx valid bit shifter
FIFO.
2 COND selects whether the operation specified in the COMPUTE field is executed. If
COND is true, the compute is executed. If no condition is specified, COND defaults to the
TRUE condition and the compute is executed.
This chapter describes the fields from the instruction set types (COM-
PUTE, SHORT COMPUTE and SHIFT IMMEDIATE). The 23-bit
compute field is a mini instruction within the ADSP-21xxx instruction.
You can specify a value in this field for a variety of compute operations,
which include the following.
• Single-function operations involve a single computation unit.
• Shift immediate functions (type 6a only)
• Short compute functions (type 2c only)
• Multifunction operations specify parallel operation of the multi-
plier and the ALU or two operations in the ALU.
• The MR register transfer is a special type of compute operation used
to access the fixed-point accumulator in the multiplier.
For each instruction, the assembly language syntax, including options, and
its related functionality are described. All related status flags are listed.
Rn = Rx + Ry
Function
Adds the fixed-point fields in registers Rx and Ry. The result is placed in
the fixed-point field in register Rn. The floating-point extension field in
Rn is set to all 0s. In saturation mode (the ALU saturation mode bit in
MODE1 set) positive overflows return the maximum positive number
(0x7FFF FFFF), and negative overflows return the minimum negative
number (0x8000 0000).
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
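The flag definitions above can be made concrete with a small Python model of the 32-bit adder. This is a sketch, not a description of the hardware: the function name is hypothetical, and computing AZ and AN from the saturated value that is returned is an assumption, since this section does not say whether they reflect the result before or after saturation.

```python
MASK32 = 0xFFFFFFFF


def alu_add(rx, ry, carry_in=0, saturate=False):
    """Model Rn = Rx + Ry (+ CI) on 32-bit patterns; returns (rn, flags)."""
    rx &= MASK32
    ry &= MASK32
    total = rx + ry + carry_in
    raw = total & MASK32
    ac = total >> 32                       # carry out of the MSB stage
    carry_into_msb = ((rx & 0x7FFFFFFF) + (ry & 0x7FFFFFFF) + carry_in) >> 31
    av = ac ^ carry_into_msb               # XOR of the two top carries
    rn = raw
    if saturate and av:
        # A positive overflow wraps to a negative raw value, and vice versa.
        rn = 0x7FFFFFFF if raw & 0x80000000 else 0x80000000
    # Assumption: AZ and AN are derived from the returned value.
    flags = {"AZ": int(rn == 0), "AN": rn >> 31, "AV": av, "AC": ac}
    return rn, flags
```

With saturation off, 0x7FFF FFFF + 1 wraps to 0x8000 0000 with AV set. With saturation on, the same operands return 0x7FFF FFFF.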
Rn = Rx – Ry
Function
Subtracts the fixed-point field in register Ry from the fixed-point field in
register Rx. The result is placed in the fixed-point field in register Rn. The
floating-point extension field in Rn is set to all 0s. In saturation mode (the
ALU saturation mode bit in MODE1 set) positive overflows return the maxi-
mum positive number (0x7FFF FFFF), and negative overflows return the
minimum negative number (0x8000 0000).
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
Rn = Rx + Ry + CI
Function
Adds with carry (AC from ASTAT) the fixed-point fields in registers Rx and
Ry. The result is placed in the fixed-point field in register Rn. The float-
ing-point extension field in Rn is set to all 0s. In saturation mode (the
ALU saturation mode bit in MODE1 set) positive overflows return the maxi-
mum positive number (0x7FFF FFFF), and negative overflows return the
minimum negative number (0x8000 0000).
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
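A typical use of add with carry is multiword arithmetic. The sketch below models a 64-bit addition built from two 32-bit adds, where the AC flag from the low-word add feeds the CI input of the high-word add. The function names and the (high, low) operand layout are assumptions made for illustration, and saturation mode is assumed to be off.

```python
MASK32 = 0xFFFFFFFF


def add32(rx, ry, ci=0):
    """32-bit add; returns (result, AC), AC being the carry out of bit 31."""
    total = (rx & MASK32) + (ry & MASK32) + ci
    return total & MASK32, total >> 32


def add64(x_hi, x_lo, y_hi, y_lo):
    """Add two 64-bit values held as (high, low) 32-bit halves."""
    lo, ac = add32(x_lo, y_lo)        # models Rn = Rx + Ry
    hi, _ = add32(x_hi, y_hi, ac)     # models Rn = Rx + Ry + CI
    return hi, lo
```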
Rn = Rx – Ry + CI – 1
Function
Subtracts with borrow (AC – 1 from ASTAT) the fixed-point field in register
Ry from the fixed-point field in register Rx. The result is placed in the
fixed-point field in register Rn. The floating-point extension field in Rn is
set to all 0s. In saturation mode (the ALU saturation mode bit in MODE1
set) positive overflows return the maximum positive number
(0x7FFF FFFF), and negative overflows return the minimum negative
number (0x8000 0000).
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
Rn = (Rx + Ry)/2
Function
Adds the fixed-point fields in registers Rx and Ry and divides the result by
2. The result is placed in the fixed-point field in register Rn. The float-
ing-point extension field in Rn is set to all 0s. Rounding is to nearest
(IEEE) or by truncation, as defined by the rounding mode bit in the MODE1
register.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
COMP(Rx, Ry)
Function
Compares the signed fixed-point field in register Rx with the fixed-point
field in register Ry. Sets the AZ flag if the two operands are equal, and the
AN flag if the operand in register Rx is smaller than the operand in register
Ry.
The ASTAT register stores the results of the previous eight ALU compare
operations in CACC bits 31–24. These bits are shifted right (bit 24 is
overwritten) whenever a fixed-point or floating-point compare instruction
is executed.
ASTATx/y Flags
AZ Set if the signed operands in registers Rx and Ry are equal, otherwise cleared
AN Set if the signed operand in the Rx register is smaller than the operand in the
Ry register, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
CACC The MSB of CACC is set if the X operand is greater than the Y operand
(its value is the AND of the complements of AZ and AN); otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
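The compare flags and the compare accumulation history can be modeled as follows. This is a sketch under assumptions: the function names are hypothetical, CACC is represented as an 8-bit value whose bit 7 stands for ASTAT bit 31, and the new MSB is computed as "neither AZ nor AN", which is what "X greater than Y" implies.

```python
def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def comp(rx, ry, cacc):
    """Model COMP(Rx, Ry); cacc is the 8-bit history (ASTAT bits 31-24)."""
    x, y = to_signed(rx), to_signed(ry)
    az = int(x == y)
    an = int(x < y)
    greater = int(not az and not an)               # X > Y
    # Shift right: the oldest result falls off bit 24, the newest enters bit 31.
    cacc = ((cacc >> 1) | (greater << 7)) & 0xFF
    return az, an, cacc
```

After eight compares, the history holds one "greater than" bit per compare, with the most recent result in the MSB.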
COMPU(Rx, Ry)
Function
Compares the unsigned fixed-point field in register Rx with the
fixed-point field in register Ry. Sets the AZ flag if the two operands are
equal, and the AN flag if the operand in register Rx is smaller than the
operand in register Ry. This operation performs a magnitude comparison
of the fixed-point contents of Rx and Ry.
The ASTAT register stores the results of the previous eight ALU compare
operations in CACC bits 31–24. These bits are shifted right (bit 24 is
overwritten) whenever a fixed-point or floating-point compare instruction
is executed.
ASTATx/y Flags
AZ Set if the unsigned operands in registers Rx and Ry are equal, otherwise
cleared
AN Set if the unsigned operand in the Rx register is smaller than the operand in
the Ry register, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
CACC The MSB of CACC is set if the X operand is greater than the Y operand
(its value is the AND of the complements of AZ and AN); otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
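The difference between the unsigned and the signed compare shows up when the most significant bit of an operand is set. The following sketch models only the AZ and AN results of the unsigned compare. The function name is hypothetical.

```python
def compu(rx, ry):
    """Model the AZ and AN results of COMPU(Rx, Ry) on 32-bit patterns."""
    x, y = rx & 0xFFFFFFFF, ry & 0xFFFFFFFF   # treated as magnitudes
    return int(x == y), int(x < y)
```

For Rx = 0xFFFF FFFF and Ry = 1, this model reports Rx as the larger operand (AN cleared), whereas a signed compare would treat Rx as –1 and set AN.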
Rn = Rx + CI
Function
Adds the fixed-point field in register Rx with the carry flag from the ASTAT
register (AC). The result is placed in the fixed-point field in register Rn.
The floating-point extension field in Rn is set to all 0s. In saturation mode
(the ALU saturation mode bit in MODE1 set) positive overflows return the
maximum positive number (0x7FFF FFFF).
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
Rn = Rx + CI – 1
Function
Adds the fixed-point field in register Rx with the borrow from the ASTAT
register (AC – 1). The result is placed in the fixed-point field in register Rn.
The floating-point extension field in Rn is set to all 0s. In saturation mode
(the ALU saturation mode bit in MODE1 set) positive overflows return the
maximum positive number (0x7FFF FFFF).
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
Rn = Rx + 1
Function
Increments the fixed-point operand in register Rx. The result is placed in
the fixed-point field in register Rn. The floating-point extension field in
Rn is set to all 0s. In saturation mode (the ALU saturation mode bit in
MODE1 set), overflow causes the maximum positive number (0x7FFF FFFF)
to be returned.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
Rn = Rx – 1
Function
Decrements the fixed-point operand in register Rx. The result is placed in
the fixed-point field in register Rn. The floating-point extension field in
Rn is set to all 0s. In saturation mode (the ALU saturation mode bit in
MODE1 set), underflow causes the minimum negative number
(0x8000 0000) to be returned.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
Rn = –Rx
Function
Negates the fixed-point operand in Rx by two’s-complement. The result is
placed in the fixed-point field in register Rn. The floating-point extension
field in Rn is set to all 0s. Negation of the minimum negative number
(0x8000 0000) causes an overflow. In saturation mode (the ALU satura-
tion mode bit in MODE1 set), overflow causes the maximum positive
number (0x7FFF FFFF) to be returned.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
Rn = ABS Rx
Function
Determines the absolute value of the fixed-point operand in Rx. The
result is placed in the fixed-point field in register Rn. The floating-point
extension field in Rn is set to all 0s. The ABS of the minimum negative
number (0x8000 0000) causes an overflow. In saturation mode (the ALU
saturation mode bit in MODE1 set), overflow causes the maximum positive
number (0x7FFF FFFF) to be returned.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Set if the XOR of the carries of the two most significant adder stages is 1,
otherwise cleared
AC Set if the carry from the most significant adder stage is 1, otherwise cleared
AS Set if the fixed-point operand in Rx is negative, otherwise cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS Sticky indicator for AV bit set
AIS No effect
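The overflow case shared by negation and ABS can be illustrated with a short Python model. It is a sketch only: the function names are hypothetical, and only the result, AV, and (for ABS) AS are modeled.

```python
MASK32 = 0xFFFFFFFF
MIN_NEG = 0x80000000
MAX_POS = 0x7FFFFFFF


def negate(rx, saturate=False):
    """Model Rn = -Rx; returns (rn, av)."""
    rx &= MASK32
    rn = (-rx) & MASK32
    av = int(rx == MIN_NEG)      # 0x8000 0000 has no positive counterpart
    if av and saturate:
        rn = MAX_POS
    return rn, av


def absolute(rx, saturate=False):
    """Model Rn = ABS Rx; returns (rn, av, as_flag)."""
    rx &= MASK32
    as_flag = rx >> 31           # AS: the input operand was negative
    if as_flag:
        rn, av = negate(rx, saturate)
    else:
        rn, av = rx, 0
    return rn, av, as_flag
```

Without saturation, negating 0x8000 0000 returns 0x8000 0000 again, which is why the overflow flag is needed to detect this case.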
Rn = PASS Rx
Function
Passes the fixed-point operand in Rx through the ALU to the fixed-point
field in register Rn. The floating-point extension field in Rn is set to all 0s.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
Rn = Rx AND Ry
Function
Logically ANDs the fixed-point operands in Rx and Ry. The result is
placed in the fixed-point field in Rn. The floating-point extension field in
Rn is set to all 0s.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
Rn = Rx OR Ry
Function
Logically ORs the fixed-point operands in Rx and Ry. The result is placed
in the fixed-point field in Rn. The floating-point extension field in Rn is
set to all 0s.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
Rn = Rx XOR Ry
Function
Logically XORs the fixed-point operands in Rx and Ry. The result is
placed in the fixed-point field in Rn. The floating-point extension field in
Rn is set to all 0s.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
Rn = NOT Rx
Function
Logically complements the fixed-point operand in Rx. The result is placed
in the fixed-point field in Rn. The floating-point extension field in Rn is
set to all 0s.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
Rn = MIN(Rx, Ry)
Function
Returns the smaller of the two fixed-point operands in Rx and Ry. The
result is placed in the fixed-point field in register Rn. The floating-point
extension field in Rn is set to all 0s.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
Rn = MAX(Rx, Ry)
Function
Returns the larger of the two fixed-point operands in Rx and Ry. The
result is placed in the fixed-point field in register Rn. The floating-point
extension field in Rn is set to all 0s.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
Rn = CLIP Rx BY Ry
Function
Returns the fixed-point operand in Rx if the absolute value of the operand
in Rx is less than the absolute value of the fixed-point operand in Ry. Oth-
erwise, returns |Ry| if Rx is positive, and –|Ry| if Rx is negative. The result
is placed in the fixed-point field in register Rn. The floating-point exten-
sion field in Rn is set to all 0s.
ASTATx/y Flags
AZ Set if the fixed-point output is all 0s, otherwise cleared
AN Set if the most significant output bit is 1, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS No effect
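The clip rule above translates directly into code. In this sketch the function names are hypothetical, and treating Rx = 0 as positive is an assumption that only matters when Ry is also 0.

```python
def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def clip(rx, ry):
    """Model Rn = CLIP Rx BY Ry on 32-bit two's-complement patterns."""
    x, y = to_signed(rx), to_signed(ry)
    limit = abs(y)
    if abs(x) < limit:
        result = x                               # inside the clip range
    else:
        result = limit if x >= 0 else -limit     # clamp, keeping the sign of Rx
    return result & 0xFFFFFFFF
```

Note that the sign of Ry does not affect the result, because only its absolute value is used as the limit.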
Fn = Fx + Fy
Function
Adds the floating-point operands in registers Fx and Fy. The normalized
result is placed in register Fn. Rounding is to nearest (IEEE) or by trunca-
tion, to a 32-bit or to a 40-bit boundary, as defined by the rounding mode
and rounding boundary bits in MODE1. Post-rounded overflow returns
±infinity (round-to-nearest) or ±NORM.MAX (round-to-zero).
Post-rounded denormal returns ±zero. Denormal inputs are flushed to
±zero. A NAN input returns an all 1s result.
ASTATx/y Flags
AZ Set if the post-rounded result is a denormal (unbiased exponent < –126) or
zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Set if the post-rounded result overflows (unbiased exponent > +127), other-
wise cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, or if they are opposite-signed
infinities, otherwise cleared
STKYx/y Flags
AUS Sticky indicator for AZ bit set
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
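The special-case handling described above (denormal flushing, NAN results, and overflow) can be sketched in Python. This model makes several assumptions: operands are 32-bit IEEE single-precision patterns, only round-to-nearest is modeled, AV is set only when finite operands overflow, and AN simply mirrors the sign bit of the result. The function names are hypothetical.

```python
import math
import struct

ALL_ONES = 0xFFFFFFFF


def bits_to_float(bits):
    return struct.unpack(">f", struct.pack(">I", bits & ALL_ONES))[0]


def flush(bits):
    """Flush a denormal pattern (zero exponent field) to a signed zero."""
    return bits & 0x80000000 if (bits & 0x7F800000) == 0 else bits


def fadd(fx_bits, fy_bits):
    """Model Fn = Fx + Fy for 32-bit operands; returns (fn_bits, flags)."""
    total = bits_to_float(flush(fx_bits)) + bits_to_float(flush(fy_bits))
    flags = {"AZ": 0, "AN": 0, "AV": 0, "AI": 0}
    if math.isnan(total):          # NAN input or opposite-signed infinities
        flags["AI"] = 1
        return ALL_ONES, flags
    try:
        # Packing rounds the double-precision sum to single precision.
        bits = struct.unpack(">I", struct.pack(">f", total))[0]
    except OverflowError:          # post-rounded overflow
        flags["AV"] = 1
        bits = 0x7F800000 | (0x80000000 if total < 0 else 0)
    if (bits & 0x7F800000) == 0:   # denormal or zero result
        flags["AZ"] = 1
        bits &= 0x80000000         # return a signed zero
    flags["AN"] = bits >> 31       # assumption: AN mirrors the sign bit
    return bits, flags
```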
Fn = Fx – Fy
Function
Subtracts the floating-point operand in register Fy from the floating-point
operand in register Fx. The normalized result is placed in register Fn.
Rounding is to nearest (IEEE) or by truncation, to a 32-bit or to a 40-bit
boundary, as defined by the rounding mode and rounding boundary bits
in MODE1. Post-rounded overflow returns ±infinity (round-to-nearest) or
±NORM.MAX (round-to-zero). Post-rounded denormal returns ±zero.
Denormal inputs are flushed to ±zero. A NAN input returns an all 1s
result.
ASTATx/y Flags
AZ Set if the post-rounded result is a denormal (unbiased exponent < –126) or
zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Set if the post-rounded result overflows (unbiased exponent > +127), other-
wise cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, or if they are like-signed infini-
ties, otherwise cleared
STKYx/y Flags
AUS Sticky indicator for AZ bit set
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
Fn = ABS (Fx + Fy)
Function
Adds the floating-point operands in registers Fx and Fy, and places the
absolute value of the normalized result in register Fn. Rounding is to near-
est (IEEE) or by truncation, to a 32-bit or to a 40-bit boundary, as defined
by the rounding mode and rounding boundary bits in MODE1.
Post-rounded overflow returns +infinity (round-to-nearest) or
+NORM.MAX (round-to-zero). Post-rounded denormal returns +zero.
Denormal inputs are flushed to ±zero. A NAN input returns an all 1s
result.
ASTATx/y Flags
AZ Set if the post-rounded result is a denormal (unbiased exponent < –126) or
zero, otherwise cleared
AN Cleared
AV Set if the post-rounded result overflows (unbiased exponent > +127), other-
wise cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, or if they are opposite-signed
infinities, otherwise cleared
STKYx/y Flags
AUS Sticky indicator for AZ bit set
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
Fn = ABS (Fx – Fy)
Function
Subtracts the floating-point operand in Fy from the floating-point oper-
and in Fx and places the absolute value of the normalized result in register
Fn. Rounding is to nearest (IEEE) or by truncation, to a 32-bit or to a
40-bit boundary, as defined by the rounding mode and rounding bound-
ary bits in MODE1. Post-rounded overflow returns +infinity
(round-to-nearest) or +NORM.MAX (round-to-zero). Post-rounded
denormal returns +zero. Denormal inputs are flushed to ±zero. A NAN
input returns an all 1s result.
ASTATx/y Flags
AZ Set if the post-rounded result is a denormal (unbiased exponent < –126) or
zero, otherwise cleared
AN Cleared
AV Set if the post-rounded result overflows (unbiased exponent > +127), other-
wise cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, or if they are like-signed infini-
ties, otherwise cleared
STKYx/y Flags
AUS Sticky indicator for AZ bit set
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
Fn = (Fx + Fy)/2
Function
Adds the floating-point operands in registers Fx and Fy and divides the
result by 2, by decrementing the exponent of the sum before rounding.
The normalized result is placed in register Fn. Rounding is to nearest
(IEEE) or by truncation, to a 32-bit or to a 40-bit boundary, as defined by
the rounding mode and rounding boundary bits in MODE1. Post-rounded
overflow returns ±infinity (round-to-nearest) or ±NORM.MAX
(round-to-zero). Post-rounded denormal results return ±zero. A denormal
input is flushed to ±zero. A NAN input returns an all 1s result.
ASTATx/y Flags
AZ Set if the post-rounded result is a denormal (unbiased exponent < –126) or
zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, or if they are opposite-signed
infinities, otherwise cleared
STKYx/y Flags
AUS Sticky indicator for AZ bit set
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
COMP(Fx, Fy)
Function
Compares the floating-point operand in register Fx with the float-
ing-point operand in register Fy. Sets the AZ flag if the two operands are
equal, and the AN flag if the operand in register Fx is smaller than the oper-
and in register Fy.
The ASTAT register stores the results of the previous eight ALU compare
operations in CACC bits 31–24. These bits are shifted right (bit 24 is
overwritten) whenever a fixed-point or floating-point compare instruction
is executed.
ASTATx/y Flags
AZ Set if the operands in registers Fx and Fy are equal, otherwise cleared
AN Set if the operand in the Fx register is smaller than the operand in the Fy reg-
ister, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, otherwise cleared
CACC The MSB of CACC is set if the X operand is greater than the Y operand (its
value is the AND of the complements of AZ and AN); otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
Fn = –Fx
Function
Complements the sign bit of the floating-point operand in Fx. The com-
plemented result is placed in register Fn. A denormal input is flushed to
±zero. A NAN input returns an all 1s result.
ASTATx/y Flags
AZ Set if the result operand is a ±zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Set if the input operand is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
Fn = ABS Fx
Function
Returns the absolute value of the floating-point operand in register Fx by
setting the sign bit of the operand to 0. Denormal inputs are flushed to
+zero. A NAN input returns an all 1s result.
ASTATx/y Flags
AZ Set if the result operand is +zero, otherwise cleared
AN Cleared
AV Cleared
AC Cleared
AS Set if the input operand is negative, otherwise cleared
AI Set if the input operand is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
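Both the sign complement and ABS act only on the sign bit, apart from the special inputs. The following sketch models the two operations on 32-bit single-precision bit patterns. The function names are hypothetical, the 32-bit format is an assumption, and flags are not modeled.

```python
SIGN = 0x80000000
ALL_ONES = 0xFFFFFFFF


def is_nan(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def flush(bits):
    """Flush a denormal pattern (zero exponent field) to a signed zero."""
    return bits & SIGN if (bits & 0x7F800000) == 0 else bits


def fneg(bits):
    """Model Fn = -Fx: complement the sign bit."""
    if is_nan(bits):
        return ALL_ONES            # a NAN input returns all 1s
    return flush(bits) ^ SIGN


def fabs_bits(bits):
    """Model Fn = ABS Fx: force the sign bit to 0."""
    if is_nan(bits):
        return ALL_ONES
    return flush(bits) & ~SIGN & ALL_ONES
```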
Fn = PASS Fx
Function
Passes the floating-point operand in Fx through the ALU to the float-
ing-point field in register Fn. Denormal inputs are flushed to ±zero. A
NAN input returns an all 1s result.
ASTATx/y Flags
AZ Set if the result operand is a ±zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Set if the input operand is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
Fn = RND Fx
Function
Rounds the floating-point operand in register Fx to a 32-bit boundary.
Rounding is to nearest (IEEE) or by truncation, as defined by the round-
ing mode bit in MODE1. Post-rounded overflow returns ±infinity
(round-to-nearest) or ±NORM.MAX (round-to-zero). A denormal input
is flushed to ±zero. A NAN input returns an all 1s result.
ASTATx/y Flags
AZ Set if the result operand is a ±zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Set if the post-rounded result overflows (unbiased exponent > +127), other-
wise cleared
AC Cleared
AS Cleared
AI Set if the input operand is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
Fn = SCALB Fx BY Ry
Function
Scales the exponent of the floating-point operand in Fx by adding to it the
fixed-point two’s-complement integer in Ry. The scaled floating-point
result is placed in register Fn. Overflow returns ±infinity (round-to-near-
est) or ±NORM.MAX (round-to-zero). Denormal returns ±zero.
Denormal inputs are flushed to ±zero. A NAN input returns an all 1s
result.
ASTATx/y Flags
AZ Set if the result is a denormal (unbiased exponent < –126) or zero, otherwise
cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Set if the result overflows (unbiased exponent > +127), otherwise cleared
AC Cleared
AS Cleared
AI Set if the input is a NAN, otherwise cleared
STKYx/y Flags
AUS Sticky indicator for AZ bit set
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
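The SCALB behavior described above can be illustrated with a small host-side model. This is a minimal Python sketch, not SHARC code; it assumes 32-bit (single-precision) operands, and the names `scalb`, `F32_MAX`, `F32_MIN_NORMAL`, and `round_to_zero` are hypothetical helpers introduced only for illustration. Flag updates are not modeled.

```python
import math

F32_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest normal single-precision value
F32_MIN_NORMAL = 2.0 ** -126                # smallest normal single-precision value

def scalb(fx: float, ry: int, round_to_zero: bool = False) -> float:
    """Hypothetical model of Fn = SCALB Fx BY Ry for 32-bit operands."""
    if math.isnan(fx):
        return math.nan                      # the hardware returns an all-1s pattern
    if abs(fx) < F32_MIN_NORMAL:
        fx = math.copysign(0.0, fx)          # denormal input flushed to +/-zero
    if math.isinf(fx) or fx == 0.0:
        return fx
    try:
        result = math.ldexp(fx, ry)          # add Ry to the exponent
    except OverflowError:
        result = math.copysign(math.inf, fx)
    if abs(result) > F32_MAX:                # overflow
        limit = F32_MAX if round_to_zero else math.inf
        return math.copysign(limit, fx)
    if abs(result) < F32_MIN_NORMAL:         # denormal result returns +/-zero
        return math.copysign(0.0, fx)
    return result
```

For example, `scalb(1.5, 3)` scales 1.5 by 2^3, while a scale factor of 200 overflows the single-precision range and returns infinity or NORM.MAX depending on the rounding mode.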
Rn = MANT Fx
Function
Extracts the mantissa (fraction bits with explicit hidden bit, excluding the
sign bit) from the floating-point operand in Fx. The unsigned-magnitude
result is left-justified (1.31 format) in the fixed-point field in Rn. Round-
ing modes are ignored and no rounding is performed because all results
are inherently exact. Denormal inputs are flushed to ±zero. A NAN or an
infinity input returns an all 1s result (–1 in signed fixed-point format).
ASTATx/y Flags
AZ Set if the result is zero, otherwise cleared
AN Cleared
AV Set if the input operand is an infinity, otherwise cleared
AC Cleared
AS Set if the input is negative, otherwise cleared
AI Set if the input operand is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
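As an illustration of the 1.31 result format, the following Python sketch models MANT on the host. It assumes a single-precision input (the 40-bit extended format is not modeled), and the function name `mant` is hypothetical.

```python
import struct

def mant(fx: float) -> int:
    """Hypothetical model of Rn = MANT Fx; returns the 32-bit fixed-point result."""
    bits = struct.unpack(">I", struct.pack(">f", fx))[0]
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        return 0xFFFFFFFF                    # NAN or infinity: all 1s
    if exponent == 0:
        return 0                             # zero, or denormal flushed to zero
    # Make the hidden bit explicit and left-justify the 24-bit mantissa (1.31 format).
    return ((1 << 23) | fraction) << 8
```

The sign of the input does not appear in the result; on the processor it is reported through the AS flag instead.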
Rn = LOGB Fx
Function
Converts the exponent of the floating-point operand in register Fx to an
unbiased two’s-complement fixed-point integer. The result is placed in the
fixed-point field in register Rn. Unbiasing is done by subtracting 127
from the floating-point exponent in Fx. If saturation mode is not set, a
±infinity input returns a floating-point +infinity and a ±zero input returns
a floating-point –infinity. If saturation mode is set, a ±infinity input
returns the maximum positive value (0x7FFF FFFF), and a ±zero input
returns the maximum negative value (0x8000 0000). Denormal inputs are
flushed to ±zero. A NAN input returns an all 1s result.
ASTATx/y Flags
AZ Set if the fixed-point result is zero, otherwise cleared
AN Set if the result is negative, otherwise cleared
AV Set if the input operand is an infinity or a zero, otherwise cleared
AC Cleared
AS Cleared
AI Set if the input is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
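The unbiasing step and the special cases can be sketched as follows. This Python model is an illustration only: it assumes a single-precision input, the name `logb` and its `saturate` parameter are hypothetical, and it assumes that a "floating-point infinity" result appears in the 32-bit field as the IEEE single-precision bit pattern.

```python
import struct

def logb(fx: float, saturate: bool = False) -> int:
    """Hypothetical model of Rn = LOGB Fx; returns the 32-bit register pattern."""
    bits = struct.unpack(">I", struct.pack(">f", fx))[0]
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF and fraction:
        return 0xFFFFFFFF                    # NAN: all 1s
    if exponent == 0xFF:                     # +/-infinity input
        # ASSUMPTION: 0x7F800000 stands for a floating-point +infinity result.
        return 0x7FFFFFFF if saturate else 0x7F800000
    if exponent == 0:                        # +/-zero, or denormal flushed to zero
        # ASSUMPTION: 0xFF800000 stands for a floating-point -infinity result.
        return 0x80000000 if saturate else 0xFF800000
    return (exponent - 127) & 0xFFFFFFFF     # unbiased exponent, two's complement
```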
Rn = FIX Fx
Rn = TRUNC Fx
Rn = FIX Fx BY Ry
Rn = TRUNC Fx BY Ry
Function
Converts the floating-point operand in Fx to a two’s-complement 32-bit
fixed-point integer result.
If the MODE1 register TRUNC bit=1, the Fix operation truncates the mantissa
towards –infinity. If the TRUNC bit=0, the Fix operation rounds the man-
tissa towards the nearest integer.
The trunc operation always truncates toward 0. The TRUNC bit does not
influence operation of the trunc instruction.
If a scaling factor (Ry) is specified, the fixed-point two’s-complement inte-
ger in Ry is added to the exponent of the floating-point operand in Fx
before the conversion.
The result of the conversion is right-justified (32.0 format) in the
fixed-point field in register Rn. The floating-point extension field in Rn is
set to all 0s.
In saturation mode (the ALU saturation mode bit in MODE1 set) positive
overflows and +infinity return the maximum positive number
(0x7FFF FFFF), and negative overflows and –infinity return the mini-
mum negative number (0x8000 0000).
For the Fix operation, rounding is to nearest (IEEE) or by truncation, as
defined by the rounding mode bit in MODE1. A NAN input returns a float-
ing-point all 1s result. If saturation mode is not set, an infinity input or a
result that overflows returns a floating-point result of all 1s.
ASTATx/y Flags
AZ Set if the fixed-point result is zero, otherwise cleared
AN Set if the fixed-point result is negative, otherwise cleared
AV Set if the conversion causes the floating-point mantissa to be shifted left,
that is, if the floating-point exponent + scale bias is >157 (127 + 31 – 1) or if
the input is ±infinity, otherwise cleared
AC Cleared
AS Cleared
AI Set if the input operand is a NAN or, when saturation mode is not set, either
input is an infinity or the result overflows, otherwise cleared
STKYx/y Flags
AUS Sticky indicator; set if the pre-rounded result is between –1.0 and 1.0
(excluding –1.0, 0, and 1.0), otherwise not affected
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
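The difference between FIX and TRUNC, the optional scale factor, and the saturation behavior can be compared in a short host-side model. This is a minimal Python sketch under stated assumptions: the names `fix`, `trunc`, `trunc_bit`, and `saturate` are hypothetical, results are returned as signed Python integers (so an all-1s result reads as -1), and denormal handling and flags are not modeled.

```python
import math

INT_MAX = 0x7FFFFFFF
INT_MIN = -0x80000000

def _scale(fx, ry):
    """Add Ry to the exponent of Fx before the conversion."""
    try:
        return math.ldexp(fx, ry)
    except OverflowError:
        return math.copysign(math.inf, fx)

def _convert(scaled, rounder, saturate):
    if math.isnan(scaled):
        return -1                            # NAN: all 1s
    if math.isinf(scaled) or not INT_MIN <= rounder(scaled) <= INT_MAX:
        if not saturate:
            return -1                        # overflow or infinity: all 1s
        return INT_MAX if scaled > 0 else INT_MIN
    return rounder(scaled)

def fix(fx, ry=0, trunc_bit=False, saturate=True):
    """FIX: round to nearest (ties to even), or toward -infinity if TRUNC bit=1."""
    rounder = math.floor if trunc_bit else round
    return _convert(_scale(fx, ry), rounder, saturate)

def trunc(fx, ry=0, saturate=True):
    """TRUNC: always truncates toward zero, independent of the TRUNC bit."""
    return _convert(_scale(fx, ry), math.trunc, saturate)
```

With an input of -2.7, `fix(..., trunc_bit=True)` gives -3 while `trunc(...)` gives -2, which is the distinction the text above draws between the two operations.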
Fn = FLOAT Rx BY Ry
Fn = FLOAT Rx
Function
Converts the fixed-point operand in Rx to a floating-point result. If a scal-
ing factor (Ry) is specified, the fixed-point two’s-complement integer in
Ry is added to the exponent of the floating-point result. The final result is
placed in register Fn. Rounding is to nearest (IEEE) or by truncation, as
defined by the rounding mode, to a 40-bit boundary, regardless of the val-
ues of the rounding boundary bits in MODE1. The exponent scale bias may
cause a floating-point overflow or a floating-point underflow. Overflow
generates a return of ±infinity (round-to-nearest) or ±NORM.MAX
(round-to-zero); underflow generates a return of ±zero.
Fn = RECIPS Fx
Function
Creates an 8-bit accurate seed for 1/Fx, the reciprocal of Fx. The mantissa
of the seed is determined from a ROM table using the 7 MSBs (excluding
the hidden bit) of the Fx mantissa as an index. The unbiased exponent of
the seed is calculated as the two’s-complement of the unbiased Fx expo-
nent, decremented by one; that is, if e is the unbiased exponent of Fx,
then the unbiased exponent of Fn = –e – 1. The sign of the seed is the sign
of the input. A ±zero returns ±infinity and sets the overflow flag. If the
unbiased exponent of Fx is greater than +125, the result is ±zero. A NAN
input returns an all 1s result.
The following code performs floating-point division using an iterative
convergence algorithm.1 The result is accurate to one LSB in whichever
format mode, 32-bit or 40-bit, is set. The following inputs are required:
F0=numerator, F12=denominator, F11=2.0. The quotient is returned in
F0. (The two indented instructions can be removed if only a ±1 LSB accu-
rate single-precision result is necessary.) Note that, in the algorithm
example’s comments, references to R0, R1, R2, and R3 do not refer to
data registers. Rather, they refer to variables in the algorithm.
F0=RECIPS F12, F7=F0;     /* Get 8-bit seed R0=1/D */
F12=F0*F12;               /* D' = D*R0 */
F7=F0*F7, F0=F11-F12;     /* F0=R1=2-D', F7=N*R0 */
F12=F0*F12;               /* F12=D'=D'*R1 */
F7=F0*F7, F0=F11-F12;     /* F7=N*R0*R1, F0=R2=2-D' */
   F12=F0*F12;            /* F12=D'=D'*R2 */
   F7=F0*F7, F0=F11-F12;  /* F7=N*R0*R1*R2, F0=R3=2-D' */
F0=F0*F7;                 /* F0=N*R0*R1*R2*R3 */
1 Cavanagh, J. 1984. Digital Computer Arithmetic. McGraw-Hill. Page 284.
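The convergence scheme used by the division listing can be checked on a host with a short Python sketch. This is not SHARC code: `recips_seed` is a hypothetical stand-in that approximates 1/D to roughly 8 bits by rounding the mantissa (the real seed comes from the ROM table), and `divide` and `iterations` are names introduced for illustration. Arithmetic is done in double precision.

```python
import math

def recips_seed(d: float) -> float:
    """ASSUMPTION: stand-in for RECIPS, 1/d kept to about 8 significant bits."""
    mantissa, exponent = math.frexp(1.0 / d)
    return math.ldexp(round(mantissa * 256) / 256, exponent)

def divide(n: float, d: float, iterations: int = 3) -> float:
    """Iterative-convergence division: N/D = N*R0*R1*R2*R3."""
    r = recips_seed(d)          # R0
    q = n * r                   # N*R0
    dp = d * r                  # D' = D*R0, close to 1
    for _ in range(iterations):
        r = 2.0 - dp            # Rk = 2 - D'
        q *= r                  # accumulate the quotient
        dp *= r                 # D' moves quadratically toward 1
    return q
```

If D' = 1 - e after seeding, each pass squares the error term (e, e^2, e^4, e^8), which is why an 8-bit seed needs only two passes for a 32-bit result and a third for the 40-bit format.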
ASTATx/y Flags
AZ Set if the floating-point result is ±zero (unbiased exponent of Fx is greater
than +125), otherwise cleared
AN Set if the input operand is negative, otherwise cleared
AV Set if the input operand is ±zero, otherwise cleared
AC Cleared
AS Cleared
AI Set if the input operand is a NAN, otherwise cleared
STKYx/y Flags
AUS Sticky indicator for AZ bit set
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
Fn = RSQRTS Fx
Function
Creates a 4-bit accurate seed for 1/(Fx)½, the reciprocal square root of Fx.
The mantissa of the seed is determined from a ROM table, using the LSB
of the biased exponent of Fx concatenated with the six MSBs (excluding
the hidden bit) of the mantissa of Fx as an index.
The unbiased exponent of the seed is calculated as the two’s-complement
of the unbiased Fx exponent, shifted right by one bit and decremented by
one; that is, if e is the unbiased exponent of Fx, then the unbiased expo-
nent of Fn = –INT[e/2] – 1.
The sign of the seed is the sign of the input. The input ±zero returns
±infinity and sets the overflow flag. The input +infinity returns +zero. A
NAN input or a negative nonzero input returns a result of all 1s.
The following code calculates a floating-point reciprocal square root
(1/(x)½) using a Newton-Raphson iteration algorithm.1 The result is accu-
rate to one LSB in whichever format mode, 32-bit or 40-bit, is set.
To calculate the square root, simply multiply the result by the original
input. The following inputs are required: F0=input, F8=3.0, F1=0.5. The
result is returned in F4. (The four indented instructions can be removed if
only a ±1 LSB accurate single-precision result is necessary.)
1 Cavanagh, J. 1984. Digital Computer Arithmetic. McGraw-Hill. Page 278.
F4=RSQRTS F0;             /* Fetch 4-bit seed */
F12=F4*F4;                /* F12=X0^2 */
F12=F12*F0;               /* F12=C*X0^2 */
F4=F1*F4, F12=F8-F12;     /* F4=.5*X0, F12=3-C*X0^2 */
F4=F4*F12;                /* F4=X1=.5*X0(3-C*X0^2) */
F12=F4*F4;                /* F12=X1^2 */
F12=F12*F0;               /* F12=C*X1^2 */
F4=F1*F4, F12=F8-F12;     /* F4=.5*X1, F12=3-C*X1^2 */
F4=F4*F12;                /* F4=X2=.5*X1(3-C*X1^2) */
   F12=F4*F4;             /* F12=X2^2 */
   F12=F12*F0;            /* F12=C*X2^2 */
   F4=F1*F4, F12=F8-F12;  /* F4=.5*X2, F12=3-C*X2^2 */
   F4=F4*F12;             /* F4=X3=.5*X2(3-C*X2^2) */
Note that this code segment can be made into a subroutine by adding an
RTS(DB) clause to the third-to-last instruction.
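The Newton-Raphson recurrence in the listing, X(k+1) = 0.5 * X(k) * (3 - C * X(k)^2), can be exercised on a host with the Python sketch below. It is an illustration under stated assumptions: `rsqrts_seed` is a hypothetical stand-in that keeps about 4 bits of 1/sqrt(C) by rounding the mantissa (the real seed comes from the ROM table), and the arithmetic runs in double precision.

```python
import math

def rsqrts_seed(c: float) -> float:
    """ASSUMPTION: stand-in for RSQRTS, 1/sqrt(c) kept to about 4 significant bits."""
    mantissa, exponent = math.frexp(1.0 / math.sqrt(c))
    return math.ldexp(round(mantissa * 16) / 16, exponent)

def rsqrt(c: float, iterations: int = 3) -> float:
    """Reciprocal square root by Newton-Raphson iteration."""
    x = rsqrts_seed(c)                       # X0
    for _ in range(iterations):
        x = 0.5 * x * (3.0 - c * x * x)      # X(k+1) = .5*X(k)*(3 - C*X(k)^2)
    return x

def sqrt_via_rsqrt(c: float) -> float:
    """Square root obtained by multiplying the result by the original input."""
    return c * rsqrt(c)
```

Each pass roughly squares the relative error, so a 4-bit seed needs one more pass than the 8-bit RECIPS seed to reach the same accuracy.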
ASTATx/y Flags
AZ Set if the floating-point result is +zero (Fx = +infinity), otherwise cleared
AN Set if the input operand is –zero, otherwise cleared
AV Set if the input operand is ±zero, otherwise cleared
AC Cleared
AS Cleared
AI Set if the input operand is negative and nonzero, or a NAN, otherwise
cleared
STKYx/y Flags
AUS No effect
AVS Sticky indicator for AV bit set
AOS No effect
AIS Sticky indicator for AI bit set
Fn = Fx COPYSIGN Fy
Function
Copies the sign of the floating-point operand in register Fy to the float-
ing-point operand from register Fx without changing the exponent or the
mantissa. The result is placed in register Fn. A denormal input is flushed
to ±zero. A NAN input returns an all 1s result.
ASTATx/y Flags
AZ Set if the floating-point result is ±zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
Fn = MIN(Fx, Fy)
Function
Returns the smaller of the floating-point operands in register Fx and Fy. A
NAN input returns an all 1s result. The MIN of +zero and –zero returns
–zero. Denormal inputs are flushed to ±zero.
ASTATx/y Flags
AZ Set if the floating-point result is ±zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
Fn = MAX(Fx, Fy)
Function
Returns the larger of the floating-point operands in registers Fx and Fy. A
NAN input returns an all 1s result. The MAX of +zero and –zero returns
+zero. Denormal inputs are flushed to ±zero.
ASTATx/y Flags
AZ Set if the floating-point result is ±zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
Fn = CLIP Fx BY Fy
Function
Returns the floating-point operand in Fx if the absolute value of the oper-
and in Fx is less than the absolute value of the floating-point operand in
Fy. Else, returns | Fy | if Fx is positive, and –| Fy | if Fx is negative. A
NAN input returns an all 1s result. Denormal inputs are flushed to ±zero.
ASTATx/y Flags
AZ Set if the floating-point result is ±zero, otherwise cleared
AN Set if the floating-point result is negative, otherwise cleared
AV Cleared
AC Cleared
AS Cleared
AI Set if either of the input operands is a NAN, otherwise cleared
STKYx/y Flags
AUS No effect
AVS No effect
AOS No effect
AIS Sticky indicator for AI bit set
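The CLIP selection rule can be written out as a few lines of Python. This is a minimal host-side sketch; the name `clip` is hypothetical, and denormal flushing and flag updates are not modeled.

```python
import math

def clip(fx: float, fy: float) -> float:
    """Hypothetical model of Fn = CLIP Fx BY Fy."""
    if math.isnan(fx) or math.isnan(fy):
        return math.nan                      # the hardware returns an all-1s pattern
    if abs(fx) < abs(fy):
        return fx                            # Fx is inside the limit
    return math.copysign(abs(fy), fx)        # |Fy| with the sign of Fx
```

Note that the sign of Fy never affects the result; only its magnitude sets the limit.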
Modifiers
Some of the instructions accept the mod1, mod2, and mod3 modifiers. Each
modifier is enclosed in parentheses and consists of three or four letters
that indicate whether:
• The x-input is signed (S) or unsigned (U).
• The y-input is signed or unsigned.
• The inputs are in integer (I) or fractional (F) format.
• The result written to the register file is rounded-to-nearest (R).
“Multiplier Instruction Summary” on page 3-18 provides information on
multiplier instructions. Table 3-6 on page 3-20 lists the mod1 through
mod3 options and the corresponding opcode values.
Rn = Rx * Ry (mod1)
MRF = Rx * Ry (mod1)
MRB = Rx * Ry (mod1)
Function
Multiplies the fixed-point fields in registers Rx and Ry.
If rounding is specified (fractional data only), the result is rounded. The
result is placed either in the fixed-point field in register Rn or one of the
MR accumulation registers.
If Rn is specified, only the portion of the result that has the same format
as the inputs is transferred (bits 31–0 for integers, bits 63–32 for frac-
tional). The floating-point extension field in Rn is set to all 0s. If MRF or
MRB is specified, the entire 80-bit result is placed in MRF or MRB.
ASTATx/y Flags
MN Set if the result is negative, otherwise cleared
MV Set if the upper bits are not all zeros (signed or unsigned result) or ones (signed
result); number of upper bits depends on format; for a signed result,
fractional=33, integer=49; for an unsigned result, fractional=32, integer=48
MU Set if the upper 48 bits of a fractional result are all zeros (signed or unsigned
result) or ones (signed result) and the lower 32 bits are not all zeros; integer
results do not underflow
MI Cleared
STKYx/y Flags
MUS No effect
MVS No effect
MOS Sticky indicator for MV bit set
MIS No effect
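Which part of the product reaches Rn depends on the data format, as described above. The Python sketch below illustrates this for unsigned operands only (integer and fractional, which the letter scheme above would spell as UUI and UUF). The name `mul_to_rn` is hypothetical, and signed operands, rounding, and the MR accumulation registers are deliberately left out.

```python
MASK32 = 0xFFFFFFFF

def mul_to_rn(rx: int, ry: int, fractional: bool) -> int:
    """Unsigned 32x32 multiply; returns the 32 bits transferred to Rn."""
    product = (rx & MASK32) * (ry & MASK32)  # 64-bit unsigned product
    if fractional:
        return (product >> 32) & MASK32      # bits 63-32 for fractional data
    return product & MASK32                  # bits 31-0 for integer data
```

For instance, 0x8000 0000 read as the unsigned fraction 0.5 multiplied by itself yields 0x4000 0000, which is 0.25 in the same format.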
Rn = MRF + Rx * Ry (mod1)
Rn = MRB + Rx * Ry (mod1)
MRF = MRF + Rx * Ry (mod1)
MRB = MRB + Rx * Ry (mod1)
Function
Multiplies the fixed-point fields in registers Rx and Ry, and adds the prod-
uct to the specified MR register value. If rounding is specified (fractional
data only), the result is rounded. The result is placed either in the
fixed-point field in register Rn or one of the MR accumulation registers,
which must be the same MR register that provided the input. If Rn is speci-
fied, only the portion of the result that has the same format as the inputs is
transferred (bits 31–0 for integers, bits 63–32 for fractional). The float-
ing-point extension field in Rn is set to all 0s. If MRF or MRB is specified, the
entire 80-bit result is placed in MRF or MRB.
ASTATx/y Flags
MN Set if the result is negative, otherwise cleared
MV Set if the upper bits are not all zeros (signed or unsigned result) or ones (signed
result); number of upper bits depends on format; for a signed result,
fractional=33, integer=49; for an unsigned result, fractional=32, integer=48
MU Set if the upper 48 bits of a fractional result are all zeros (signed or unsigned
result) or ones (signed result) and the lower 32 bits are not all zeros; integer
results do not underflow
MI Cleared
STKYx/y Flags
MUS No effect
MVS No effect
MOS Sticky indicator for MV bit set
MIS No effect
Rn = MRF – Rx * Ry (mod1)
Rn = MRB – Rx * Ry (mod1)
MRF = MRF – Rx * Ry (mod1)
MRB = MRB – Rx * Ry (mod1)
Function
Multiplies the fixed-point fields in registers Rx and Ry, and subtracts the
product from the specified MR register value. If rounding is specified (frac-
tional data only), the result is rounded. The result is placed either in the
fixed-point field in register Rn or in one of the MR accumulation registers,
which must be the same MR register that provided the input. If Rn is speci-
fied, only the portion of the result that has the same format as the inputs is
transferred (bits 31–0 for integers, bits 63–32 for fractional). The float-
ing-point extension field in Rn is set to all 0s. If MRF or MRB is specified, the
entire 80-bit result is placed in MRF or MRB.
ASTATx/y Flags
MN Set if the result is negative, otherwise cleared
MV Set if the upper bits are not all zeros (signed or unsigned result) or ones (signed
result); number of upper bits depends on format; for a signed result,
fractional=33, integer=49; for an unsigned result, fractional=32, integer=48
MU Set if the upper 48 bits of a fractional result are all zeros (signed or unsigned
result) or ones (signed result) and the lower 32 bits are not all zeros; integer
results do not underflow
MI Cleared
STKYx/y Flags
MUS No effect
MVS No effect
MOS Sticky indicator for MV bit set
MIS No effect
Function
If the value of the specified MR register is greater than the maximum value
for the specified data format, the multiplier sets the result to the maxi-
mum value. Otherwise, the MR value is unaffected. The result is placed
either in the fixed-point field in register Rn or one of the MR accumulation
registers, which must be the same MR register that provided the input. If
Rn is specified, only the portion of the result that has the same format as
the inputs is transferred (bits 31–0 for integers, bits 63–32 for fractional).
The floating-point extension field in Rn is set to all 0s. If MRF or MRB is
specified, the entire 80-bit result is placed in MRF or MRB.
ASTATx/y Flags
MN Set if the result is negative, otherwise cleared
MV Cleared
MU Set if the upper 48 bits of a fractional result are all zeros (signed or unsigned
result) or ones (signed result) and the lower 32 bits are not all zeros; integer
results do not underflow
MI Cleared
STKYx/y Flags
MUS No effect
MVS No effect
MOS No effect
MIS No effect
Function
Rounds the specified MR value to nearest at bit 32 (the MR1–MR0 bound-
ary). The result is placed either in the fixed-point field in register Rn or
one of the MR accumulation registers, which must be the same MR register
that provided the input. If Rn is specified, only the portion of the result
that has the same format as the inputs is transferred (bits 31–0 for inte-
gers, bits 63–32 for fractional). The floating-point extension field in Rn is
set to all 0s. If MRF or MRB is specified, the entire 80-bit result is placed in
MRF or MRB.
ASTATx/y Flags
MN Set if the result is negative, otherwise cleared
MV Set if the upper bits are not all zeros (signed or unsigned result) or ones (signed
result); number of upper bits depends on format; for a signed result,
fractional=33, integer=49; for an unsigned result, fractional=32, integer=48
MU Set if the upper 48 bits of a fractional result are all zeros (signed or unsigned
result) or ones (signed result) and the lower 32 bits are not all zeros; integer
results do not underflow
MI Cleared
STKYx/y Flags
MUS No effect
MVS No effect
MOS Sticky indicator for MV bit set
MIS No effect
MRF = 0
MRB = 0
Function
Sets the value of the specified MR register to zero. All 80 bits (MR2, MR1, MR0)
are cleared.
ASTATx/y Flags
MN Cleared
MV Cleared
MU Cleared
MI Cleared
STKYx/y Flags
MUS No effect
MVS No effect
MOS No effect
MIS No effect
MRxF/B = Rn
Rn = MRxF/B
Function
A transfer to an MR register places the fixed-point field of register Rn in the
specified MR register. The floating-point extension field in Rn is ignored. A
transfer from an MR register places the specified MR register in the
fixed-point field in register Rn. The floating-point extension field in Rn is
set to all 0s.
ASTATx/y Flags
MN Cleared
MV Cleared
MU Cleared
MI Cleared
STKYx/y Flags
MUS No effect
MVS No effect
MOS No effect
MIS No effect
Fn = Fx * Fy
Function
Multiplies the floating-point operands in registers Fx and Fy and places
the result in the register Fn.
ASTATx/y Flags
MN Set if the result is negative, otherwise cleared
MV Set if the unbiased exponent of the result is greater than 127, otherwise cleared
MU Set if the unbiased exponent of the result is less than –126, otherwise cleared
MI Set if either input is a NAN or if the inputs are ±infinity and ±zero, otherwise
cleared
STKYx/y Flags
MUS Sticky indicator for MU bit set
MVS Sticky indicator for MV bit set
MOS No effect
MIS Sticky indicator for MI bit set
Modifiers
Some of the instructions in this group accept the following modifiers
enclosed in parentheses.
• (SE) = Sign extension of deposited or extracted field
• (EX) = Extended exponent extract
• (NU) = No update (bit FIFO)
“Shifter Instruction Summary” on page 3-31 provides information on
shifter instructions. Table 3-8 on page 3-31 lists the options.
Rn = LSHIFT Rx BY Ry
Rn = LSHIFT Rx BY <data8>
Function
Logically shifts the fixed-point operand in register Rx by the 32-bit value
in register Ry or by the 8-bit immediate value in the instruction. The
shifted result is placed in the fixed-point field of register Rn. The float-
ing-point extension field of Rn is set to all 0s. The shift values are
two’s-complement numbers. Positive values select a left shift, negative val-
ues select a right shift. The 8-bit immediate data can take values between
–128 and 127 inclusive, allowing for a shift of a 32-bit field from off-scale
right to off-scale left.
ASTATx/y Flags
SZ Set if the shifted result is zero, otherwise cleared
SV Set if the input is shifted to the left by more than 0, otherwise cleared
SS Cleared
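The signed shift count can be illustrated with a small Python model of the 32-bit fixed-point field. This is a sketch only; the name `lshift` is hypothetical and flags are not modeled.

```python
MASK32 = 0xFFFFFFFF

def lshift(rx: int, shift: int) -> int:
    """Logical shift: positive counts shift left, negative counts shift right."""
    rx &= MASK32
    if shift >= 0:
        return (rx << shift) & MASK32 if shift < 32 else 0   # off-scale left
    return rx >> -shift if shift > -32 else 0                # off-scale right
```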
Rn = Rn OR LSHIFT Rx BY Ry
Rn = Rn OR LSHIFT Rx BY <data8>
Function
Logically shifts the fixed-point operand in register Rx by the 32-bit value
in register Ry or by the 8-bit immediate value in the instruction. The
shifted result is logically ORed with the fixed-point field of register Rn
and then written back to register Rn. The floating-point extension field of
Rn is set to all 0s. The shift values are two’s-complement numbers. Posi-
tive values select a left shift, negative values select a right shift. The 8-bit
immediate data can take values between –128 and 127 inclusive, allowing
for a shift of a 32-bit field from off-scale right to off-scale left.
ASTATx/y Flags
SZ Set if the shifted result is zero, otherwise cleared
SV Set if the input is shifted left by more than 0, otherwise cleared
SS Cleared
Rn = ASHIFT Rx BY Ry
Rn = ASHIFT Rx BY <data8>
Function
Arithmetically shifts the fixed-point operand in register Rx by the 32-bit
value in register Ry or by the 8-bit immediate value in the instruction.
The shifted result is placed in the fixed-point field of register Rn. The
floating-point extension field of Rn is set to all 0s. The shift values are
two’s-complement numbers. Positive values select a left shift, negative val-
ues select a right shift. The 8-bit immediate data can take values between
–128 and 127 inclusive, allowing for a shift of a 32-bit field from off-scale
right to off-scale left.
ASTATx/y Flags
SZ Set if the shifted result is zero, otherwise cleared
SV Set if the input is shifted left by more than 0, otherwise cleared
SS Cleared
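For comparison with the logical shift, the following Python sketch models the arithmetic shift on the 32-bit fixed-point field, where right shifts replicate the sign bit. The name `ashift` is hypothetical and flags are not modeled.

```python
MASK32 = 0xFFFFFFFF

def ashift(rx: int, shift: int) -> int:
    """Arithmetic shift: positive counts shift left, negative counts shift right."""
    value = rx & MASK32
    if value & 0x80000000:
        value -= 1 << 32                     # reinterpret as a signed 32-bit value
    if shift >= 0:
        return (value << shift) & MASK32 if shift < 32 else 0
    # Python's >> on a negative int is arithmetic; 31 is enough to fill with sign bits.
    return (value >> min(-shift, 31)) & MASK32
```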
Rn = Rn OR ASHIFT Rx BY Ry
Rn = Rn OR ASHIFT Rx BY <data8>
Function
Arithmetically shifts the fixed-point operand in register Rx by the 32-bit
value in register Ry or by the 8-bit immediate value in the instruction.
The shifted result is logically ORed with the fixed-point field of register
Rn and then written back to register Rn. The floating-point extension
field of Rn is set to all 0s. The shift values are two’s-complement numbers.
Positive values select a left shift, negative values select a right shift. The
8-bit immediate data can take values between –128 and 127 inclusive,
allowing for a shift of a 32-bit field from off-scale right to off-scale left.
ASTATx/y Flags
SZ Set if the shifted result is zero, otherwise cleared
SV Set if the input is shifted left by more than 0, otherwise cleared
SS Cleared
Rn = ROT Rx BY Ry
Rn = ROT Rx BY <data8>
Function
Rotates the fixed-point operand in register Rx by the 32-bit value in regis-
ter Ry or by the 8-bit immediate value in the instruction. The rotated
result is placed in the fixed-point field of register Rn. The floating-point
extension field of Rn is set to all 0s. The shift values are two’s-complement
numbers. Positive values select a rotate left; negative values select a rotate
right. The 8-bit immediate data can take values between –128 and 127
inclusive, allowing for a rotate of a 32-bit field from full right wrap
around to full left wrap around.
ASTATx/y Flags
SZ Set if the rotated result is zero, otherwise cleared
SV Cleared
SS Cleared
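A rotate differs from a shift in that bits leaving one end re-enter at the other. The Python sketch below models this for the 32-bit fixed-point field; the name `rot` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rot(rx: int, shift: int) -> int:
    """Rotate: positive counts rotate left, negative counts rotate right."""
    rx &= MASK32
    n = shift % 32          # Python's % yields 0..31 even for negative counts
    return ((rx << n) | (rx >> (32 - n))) & MASK32
```

A rotate right by one is therefore the same operation as a rotate left by 31.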
Rn = BCLR Rx BY Ry
Rn = BCLR Rx BY <data8>
Function
Clears a bit in the fixed-point operand in register Rx. The result is placed
in the fixed-point field of register Rn. The floating-point extension field
of Rn is set to all 0s. The position of the bit is the 32-bit value in register
Ry or the 8-bit immediate value in the instruction. The 8-bit immediate
data can take values between 31 and 0 inclusive, allowing for any bit
within a 32-bit field to be cleared. If the bit position value is greater than
31 or less than 0, no bits are cleared.
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if the bit position is greater than 31, otherwise cleared
SS Cleared
Rn = BSET Rx BY Ry
Rn = BSET Rx BY <data8>
Function
Sets a bit in the fixed-point operand in register Rx. The result is placed in
the fixed-point field of register Rn. The floating-point extension field of
Rn is set to all 0s. The position of the bit is the 32-bit value in register Ry
or the 8-bit immediate value in the instruction. The 8-bit immediate data
can take values between 31 and 0 inclusive, allowing for any bit within a
32-bit field to be set. If the bit position value is greater than 31 or less
than 0, no bits are set.
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if the bit position is greater than 31, otherwise cleared
SS Cleared
Rn = BTGL Rx BY Ry
Rn = BTGL Rx BY <data8>
Function
Toggles a bit in the fixed-point operand in register Rx. The result is placed
in the fixed-point field of register Rn. The floating-point extension field
of Rn is set to all 0s. The position of the bit is the 32-bit value in register
Ry or the 8-bit immediate value in the instruction. The 8-bit immediate
data can take values between 31 and 0 inclusive, allowing for any bit
within a 32-bit field to be toggled. If the bit position value is greater than
31 or less than 0, no bits are toggled.
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if the bit position is greater than 31, otherwise cleared
SS Cleared
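BCLR, BSET, and BTGL share the same bit-position handling, which can be sketched in Python as follows. The function names mirror the mnemonics but are hypothetical host-side helpers; flags are not modeled.

```python
MASK32 = 0xFFFFFFFF

def _bit(pos: int) -> int:
    """Mask for one bit; a position outside 0..31 selects no bit."""
    return (1 << pos) if 0 <= pos <= 31 else 0

def bclr(rx: int, pos: int) -> int:
    return rx & MASK32 & ~_bit(pos)          # clear the selected bit

def bset(rx: int, pos: int) -> int:
    return (rx & MASK32) | _bit(pos)         # set the selected bit

def btgl(rx: int, pos: int) -> int:
    return (rx & MASK32) ^ _bit(pos)         # toggle the selected bit
```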
BTST Rx BY Ry
BTST Rx BY <data8>
Function
Tests a bit in the fixed-point operand in register Rx. The SZ flag is set if
the bit is a 0 and cleared if the bit is a 1. The position of the bit is the
32-bit value in register Ry or the 8-bit immediate value in the instruction.
The 8-bit immediate data can take values between 31 and 0 inclusive,
allowing for any bit within a 32-bit field to be tested. If the bit position
value is greater than 31 or less than 0, no bits are tested.
ASTATx/y Flags
SZ Cleared if the tested bit is a 1; set if the tested bit is a 0 or if the
bit position is greater than 31
SV Set if the bit position is greater than 31, otherwise cleared
SS Cleared
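Because BTST produces no register result, its outcome is visible only in the flags. The Python sketch below returns the SZ and SV values described above; the name `btst` is hypothetical, and the flag values for a negative bit position are an assumption, since this section does not state them.

```python
MASK32 = 0xFFFFFFFF

def btst(rx: int, pos: int) -> tuple:
    """Return (SZ, SV) for a hypothetical model of BTST Rx BY pos."""
    sv = pos > 31                            # SV: bit position greater than 31
    in_range = 0 <= pos <= 31
    bit_is_one = in_range and bool((rx & MASK32) >> pos & 1)
    # ASSUMPTION: for pos < 0 no bit is tested and SZ is reported as set here.
    sz = not bit_is_one
    return sz, sv
```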
Rn = FDEP Rx BY Ry
Rn = FDEP Rx BY <bit6>:<len6>
Function
Deposits a field from register Rx to register Rn. (See Figure 11-1.) The
input field is right-aligned within the fixed-point field of Rx. Its length is
determined by the len6 field in register Ry or by the immediate len6 field
in the instruction. The field is deposited in the fixed-point field of Rn,
starting from a bit position determined by the bit6 field in register Ry or
by the immediate bit6 field in the instruction. Bits to the left and to the
right of the deposited field are set to 0. The floating-point extension field
of Rn (bits 7–0 of the 40-bit word) is set to all 0s. Bit6 and len6 can take
values between 0 and 63 inclusive, allowing for deposit of fields ranging in
length from 0 to 32 bits, and to bit positions ranging from 0 to off-scale
left.
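[Figure 11-1: field alignment for the deposit operation. Ry holds len6 and
bit6 (tick marks at bits 39, 19, 13, 7, and 0 of the 40-bit word). In Rx,
len6 = number of bits to take from Rx, starting from the LSB of the 32-bit
field. Rn receives the deposit field.]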
Example
If len6=14 and bit6=13, then the 14 bits of Rx are deposited in Rn bits
34–21 (of the 40-bit word).
 39       31       23       15       7      0
|--------|--------|--abcdef|ghijklmn|--------| Rx
                     \-------------/
                         14 bits
 39       31       23       15       7      0
|00000abc|defghijk|lmn00000|00000000|00000000| Rn
      \--------------/
                     |
          bit position 13 (from reference point)
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if any bits are deposited to the left of the 32-bit fixed-point output field
(that is, if len6 + bit6 > 32), otherwise cleared
SS Cleared
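The deposit operation and its SV condition can be modeled on the 32-bit fixed-point field with a few lines of Python. This is a minimal sketch; the name `fdep` is hypothetical, and bit positions are counted within the 32-bit field (add 8 to obtain the 40-bit word positions used in the example above).

```python
MASK32 = 0xFFFFFFFF

def fdep(rx: int, bit6: int, len6: int) -> tuple:
    """Return (Rn, SV) for a hypothetical model of Rn = FDEP Rx BY bit6:len6."""
    length = min(len6, 32)                   # at most 32 bits can be taken from Rx
    field = rx & ((1 << length) - 1)         # right-aligned input field
    rn = (field << bit6) & MASK32            # bits pushed past bit 31 are lost
    sv = len6 + bit6 > 32                    # part of the field is off-scale left
    return rn, sv
```

With len6=14 and bit6=13, a field of all 1s lands in bits 26–13 of the 32-bit field, which corresponds to bits 34–21 of the 40-bit word.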
Rn = Rn OR FDEP Rx BY Ry
Rn = Rn OR FDEP Rx BY <bit6>:<len6>
Function
Deposits a field from register Rx to register Rn. The field value is logically
ORed bitwise with the specified field of register Rn and the new value is
written back to register Rn. The input field is right-aligned within the
fixed-point field of Rx. Its length is determined by the len6 field in regis-
ter Ry or by the immediate len6 field in the instruction.
The field is deposited in the fixed-point field of Rn, starting from a bit
position determined by the bit6 field in register Ry or by the immediate
bit6 field in the instruction. Bit6 and len6 can take values between 0 and
63 inclusive, allowing for deposit of fields ranging in length from 0 to 32
bits, and to bit positions ranging from 0 to off-scale left.
Example
 39       31       23       15       7      0
|--------|--------|--abcdef|ghijklmn|--------| Rx
                     \-------------/
                        len6 bits
 39       31       23       15       7      0
|abcdefgh|ijklmnop|qrstuvwx|yzabcdef|ghijklmn| Rn old
      \--------------/
                     |
          bit position bit6 (from reference point)
 39       31       23       15       7      0
|abcdeopq|rstuvwxy|zabtuvwx|yzabcdef|ghijklmn| Rn new
          OR result
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if any bits are deposited to the left of the 32-bit fixed-point output field
(that is, if len6 + bit6 > 32), otherwise cleared
SS Cleared
Rn = FDEP Rx BY Ry (SE)
Rn = FDEP Rx BY <bit6>:<len6> (SE)
Function
Deposits and sign-extends a field from register Rx to register Rn. (See
Figure 11-2.) The input field is right-aligned within the fixed-point field
of Rx. Its length is determined by the len6 field in register Ry or by the
immediate len6 field in the instruction. The field is deposited in the
fixed-point field of Rn, starting from a bit position determined by the bit6
field in register Ry or by the immediate bit6 field in the instruction. The
MSBs of Rn are sign-extended by the MSB of the deposited field, unless
the MSB of the deposited field is off-scale left. Bits to the right of the
deposited field are set to 0. The floating-point extension field of Rn (bits
7–0 of the 40-bit word) is set to all 0s. Bit6 and len6 can take values
between 0 and 63 inclusive, allowing for deposit of fields ranging in
length from 0 to 32 bits into bit positions ranging from 0 to off-scale left.
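[Figure 11-2: field alignment for the sign-extended deposit operation. Ry
holds len6 and bit6 (tick marks at bits 39, 19, 13, 7, and 0 of the 40-bit
word). In Rx, len6 = number of bits to take from Rx, starting from the LSB
of the 32-bit field.]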
Example
 39       31       23       15       7      0
|--------|--------|--abcdef|ghijklmn|--------| Rx
                     \-------------/
                        len6 bits
 39       31       23       15       7      0
|aaaaaabc|defghijk|lmn00000|00000000|00000000| Rn
\----/\--------------/
 sign                |
 extension     bit position bit6
               (from reference point)
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if any bits are deposited to the left of the 32-bit fixed-point output field
(that is, if len6 + bit6 > 32), otherwise cleared
SS Cleared
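The sign-extension rule, including the case where the field MSB falls off-scale left, can be sketched in Python as follows. The name `fdep_se` is hypothetical, and bit positions are counted within the 32-bit fixed-point field.

```python
MASK32 = 0xFFFFFFFF

def fdep_se(rx: int, bit6: int, len6: int) -> int:
    """Hypothetical model of Rn = FDEP Rx BY bit6:len6 (SE); returns Rn."""
    length = min(len6, 32)
    field = rx & ((1 << length) - 1)         # right-aligned input field
    rn = field << bit6
    msb = bit6 + length - 1                  # where the field's MSB lands in Rn
    if length > 0 and msb < 32 and (rn >> msb) & 1:
        # Field MSB is inside the 32-bit field and set: copy it into the bits above.
        rn |= MASK32 & ~((1 << (msb + 1)) - 1)
    return rn & MASK32
```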
Rn = Rn OR FDEP Rx BY Ry (SE)
Rn = Rn OR FDEP Rx BY <bit6>:<len6> (SE)
Function
Deposits and sign-extends a field from register Rx to register Rn. The
sign-extended field value is logically ORed bitwise with the value of regis-
ter Rn and the new value is written back to register Rn. The input field is
right-aligned within the fixed-point field of Rx. Its length is determined
by the len6 field in register Ry or by the immediate len6 field in the
instruction. The field is deposited in the fixed-point field of Rn, starting
from a bit position determined by the bit6 field in register Ry.
The bit position can also be determined by the immediate bit6 field in the
instruction. Bit6 and len6 can take values between 0 and 63 inclusive to
allow the deposit of fields ranging in length from 0 to 32 bits into bit posi-
tions ranging from 0 to off-scale left.
Example
 39       31       23       15       7      0
|--------|--------|--abcdef|ghijklmn|--------| Rx
                     \-------------/
                        len6 bits
 39       31       23       15       7      0
|aaaaaabc|defghijk|lmn00000|00000000|00000000|
\----/\--------------/
 sign                |
 extension     bit position bit6
               (from reference point)
 39       31       23       15       7      0
|abcdefgh|ijklmnop|qrstuvwx|yzabcdef|ghijklmn| Rn old
 39       31       23       15       7      0
|vwxyzabc|defghijk|lmntuvwx|yzabcdef|ghijklmn| Rn new
          OR result
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if any bits are deposited to the left of the 32-bit fixed-point output field
(that is, if len6 + bit6 > 32), otherwise cleared
SS Cleared
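The deposit-and-sign-extend behavior described above can be sketched in Python. This is a minimal model under stated assumptions, not the processor's implementation: it works only on the 32-bit fixed-point field (the 8-bit extension field is ignored), it assumes len6 does not exceed 32, and the function names `fdep_se` and `or_fdep_se` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def fdep_se(rx, bit6, len6):
    """Sketch of Rn = FDEP Rx BY <bit6>:<len6> (SE) on the 32-bit field."""
    if not (0 <= bit6 <= 63 and 0 <= len6 <= 32):
        raise ValueError("model assumes 0 <= bit6 <= 63 and 0 <= len6 <= 32")
    field = rx & ((1 << len6) - 1)      # right-aligned input field
    value = field << bit6               # deposit at bit position bit6
    top = bit6 + len6                   # first bit position above the field
    # Sign-extend only when the field's MSB lands inside the 32-bit field.
    if len6 > 0 and top < 32 and (field >> (len6 - 1)) & 1:
        value |= MASK32 & ~((1 << top) - 1)
    result = value & MASK32
    flags = {"SZ": result == 0, "SV": top > 32, "SS": False}
    return result, flags

def or_fdep_se(rn, rx, bit6, len6):
    """Sketch of Rn = Rn OR FDEP Rx BY <bit6>:<len6> (SE)."""
    deposited, flags = fdep_se(rx, bit6, len6)
    result = (rn | deposited) & MASK32
    flags["SZ"] = result == 0           # SZ reflects the output operand
    return result, flags
```

For instance, depositing the 4-bit field 0xA at bit position 8 fills bits 31–12 with copies of the field's MSB, while a field that runs past bit 31 sets SV and is not sign-extended.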
Rn = FEXT Rx BY Ry
Rn = FEXT Rx BY <bit6>:<len6>
Function
Extracts a field from register Rx to register Rn. (See Figure 11-3.) The
output field is placed right-aligned in the fixed-point field of Rn. Its
length is determined by the len6 field in register Ry or by the immediate
len6 field in the instruction. The field is extracted from the fixed-point
field of Rx starting from a bit position determined by the bit6 field in reg-
ister Ry or by the immediate bit6 field in the instruction. Bits to the left of
the extracted field are set to 0 in register Rn. The floating-point extension
field of Rn (bits 7–0 of the 40-bit word) is set to all 0s. Bit6 and len6 can
take values between 0 and 63 inclusive, allowing for extraction of fields
ranging in length from 0 to 32 bits, and from bit positions ranging from 0
to off-scale left.
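[Figure: field extraction operands. Ry carries the len6 and bit6 fields (bit ruler 39, 19, 13, 7, 0). The extract field is taken from Rx and written right-aligned to Rn.]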
Example
39 31 23 15 7 0
|-----abc|defghijk|lmn-----|--------|--------| Rx
\--------------/
len6 bits |
bit position bit6
(from reference point)
39 31 23 15 7 0
|00000000|00000000|00abcdef|ghijklmn|00000000| Rn
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if any bits are extracted from the left of the 32-bit fixed-point input field
(that is, if len6 + bit6 > 32), otherwise cleared
SS Cleared
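As a rough illustration of the extraction described above, the following Python sketch models FEXT on the 32-bit fixed-point field only. The function name `fext` is hypothetical, and the model assumes that bit positions off-scale left read as zeros, which the text does not state explicitly.

```python
MASK32 = 0xFFFFFFFF

def fext(rx, bit6, len6):
    """Sketch of Rn = FEXT Rx BY <bit6>:<len6> on the 32-bit field."""
    if not (0 <= bit6 <= 63 and 0 <= len6 <= 63):
        raise ValueError("bit6 and len6 must be between 0 and 63")
    shifted = (rx & MASK32) >> bit6          # bring the field down to bit 0
    result = shifted & ((1 << len6) - 1)     # keep len6 bits, zero the rest
    flags = {"SZ": result == 0, "SV": len6 + bit6 > 32, "SS": False}
    return result, flags
```

Extracting 16 bits starting at bit position 8 from 0x00ABCD00 yields 0xABCD in this model.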
Rn = FEXT Rx BY Ry (SE)
Rn = FEXT Rx BY <bit6>:<len6> (SE)
Function
Extracts and sign-extends a field from register Rx to register Rn. The out-
put field is placed right-aligned in the fixed-point field of Rn. Its length is
determined by the len6 field in register Ry or by the immediate len6 field
in the instruction. The field is extracted from the fixed-point field of Rx
starting from a bit position determined by the bit6 field in register Ry or
by the immediate bit6 field in the instruction. The MSBs of Rn are
sign-extended by the MSB of the extracted field, unless the MSB is
extracted from off-scale left.
The floating-point extension field of Rn (bits 7–0 of the 40-bit word) is
set to all 0s. Bit6 and len6 can take values between 0 and 63 inclusive,
allowing for extraction of fields ranging in length from 0 to 32 bits and
from bit positions ranging from 0 to off-scale left.
Example
39 31 23 15 7 0
|-----abc|defghijk|lmn-----|--------|--------| Rx
\--------------/
len6 bits |
bit position bit6
(from reference point)
39 31 23 15 7 0
|aaaaaaaa|aaaaaaaa|aaabcdef|ghijklmn|00000000| Rn
\-------------------/
sign extension
ASTATx/y Flags
SZ Set if the output operand is 0, otherwise cleared
SV Set if any bits are extracted from the left of the 32-bit fixed-point input field
(that is, if len6 + bit6 > 32), otherwise cleared
SS Cleared
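The sign-extending variant can be sketched the same way. This Python model is an illustration under assumptions: it covers the 32-bit fixed-point field only, `fext_se` is a hypothetical name, and off-scale-left bit positions are assumed to read as zeros. Sign extension is skipped when the field's MSB would come from off-scale left, as the text describes.

```python
MASK32 = 0xFFFFFFFF

def fext_se(rx, bit6, len6):
    """Sketch of Rn = FEXT Rx BY <bit6>:<len6> (SE) on the 32-bit field."""
    if not (0 <= bit6 <= 63 and 0 <= len6 <= 63):
        raise ValueError("bit6 and len6 must be between 0 and 63")
    field = ((rx & MASK32) >> bit6) & ((1 << len6) - 1)
    msb_in_range = bit6 + len6 <= 32         # MSB is not off-scale left
    if 0 < len6 < 32 and msb_in_range and (field >> (len6 - 1)) & 1:
        field |= MASK32 & ~((1 << len6) - 1) # copy the MSB into upper bits
    result = field & MASK32
    flags = {"SZ": result == 0, "SV": bit6 + len6 > 32, "SS": False}
    return result, flags
```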
Rn = EXP Rx
Function
Extracts the exponent of the fixed-point operand in Rx. The exponent is
placed in the shf8 field in register Rn. The exponent is calculated as the
two’s-complement of:
# leading sign bits in Rx – 1
ASTATx/y Flags
SZ Set if the extracted exponent is 0, otherwise cleared
SV Cleared
SS Set if the fixed-point operand in Rx is negative (bit 31 is a 1), otherwise
cleared
Rn = EXP Rx (EX)
Function
Extracts the exponent of the fixed-point operand in Rx, assuming that the
operand is the result of an ALU operation. The exponent is placed in the
shf8 field in register Rn. If the AV status bit is set, a value of +1 is placed in
the shf8 field to indicate an extra bit (the ALU overflow bit). If the AV sta-
tus bit is not set, the exponent is calculated as the two’s-complement of:
# leading sign bits in Rx – 1
ASTATx/y Flags
SZ Set if the extracted exponent is 0, otherwise cleared
SV Cleared
SS Set if the exclusive OR of the AV status bit and the sign bit (bit 31) of the
fixed-point operand in Rx is equal to 1, otherwise cleared
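The exponent calculation for EXP and EXP (EX) can be sketched in Python as follows. This is an illustrative model, not the hardware algorithm: the function names are hypothetical, the result is returned as a signed Python integer rather than as the shf8 field of Rn, and `to_shf8` shows one assumed way to view it as an 8-bit two's-complement value.

```python
MASK32 = 0xFFFFFFFF

def leading_sign_bits(rx):
    """Count how many bits, from bit 31 down, equal the sign bit."""
    x = rx & MASK32
    sign = x >> 31
    count = 0
    for bit in range(31, -1, -1):
        if (x >> bit) & 1 != sign:
            break
        count += 1
    return count

def exp(rx):
    """Sketch of Rn = EXP Rx: two's complement of (leading sign bits - 1)."""
    e = -(leading_sign_bits(rx) - 1)
    flags = {"SZ": e == 0, "SV": False, "SS": bool((rx >> 31) & 1)}
    return e, flags

def exp_ex(rx, av):
    """Sketch of Rn = EXP Rx (EX): +1 if AV is set, else same as EXP."""
    e = 1 if av else -(leading_sign_bits(rx) - 1)
    sign = bool((rx >> 31) & 1)
    flags = {"SZ": e == 0, "SV": False, "SS": bool(av) != sign}
    return e, flags

def to_shf8(e):
    """View a signed exponent as an 8-bit two's-complement field value."""
    return e & 0xFF
```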
Rn = LEFTZ Rx
Function
Extracts the number of leading 0s from the fixed-point operand in Rx.
The extracted number is placed in the bit6 field in Rn.
ASTATx/y Flags
SZ Set if the MSB of Rx is 1, otherwise cleared
SV Set if the result is 32, otherwise cleared
SS Cleared
Rn = LEFTO Rx
Function
Extracts the number of leading 1s from the fixed-point operand in Rx.
The extracted number is placed in the bit6 field in Rn.
ASTATx/y Flags
SZ Set if the MSB of Rx is 0, otherwise cleared
SV Set if the result is 32, otherwise cleared
SS Cleared
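A short Python sketch of the two leading-bit counts, with the flags as listed above. The function names `leftz` and `lefto` are chosen for illustration, and the model returns the count directly rather than placing it in the bit6 field of Rn.

```python
MASK32 = 0xFFFFFFFF

def leftz(rx):
    """Sketch of Rn = LEFTZ Rx: number of leading 0s in the 32-bit operand."""
    x = rx & MASK32
    count = 32 - x.bit_length()
    flags = {"SZ": bool(x >> 31), "SV": count == 32, "SS": False}
    return count, flags

def lefto(rx):
    """Sketch of Rn = LEFTO Rx: number of leading 1s in the 32-bit operand."""
    x = rx & MASK32
    count = 32 - (x ^ MASK32).bit_length()   # leading 1s are leading 0s of ~x
    flags = {"SZ": not (x >> 31), "SV": count == 32, "SS": False}
    return count, flags
```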
Rn = FPACK Fx
Function
Converts the IEEE 32-bit floating-point value in Fx to a 16-bit float-
ing-point value stored in Rn. The short float data format has an 11-bit
mantissa with a four-bit exponent plus sign bit. The 16-bit floating-point
numbers reside in the lower 16 bits of the 32-bit floating-point field.
The result of the FPACK operation is:
Condition Result
135 < exp¹ Largest magnitude representation
120 < exp ≤ 135 Exponent is the MSB of the source exponent concatenated with the three LSBs
of the source exponent; the packed fraction is the rounded upper 11 bits of
the source fraction
109 < exp ≤ 120 Exponent = 0; the packed fraction is the upper bits (source exponent – 110)
of the source fraction prefixed by zeros and the “hidden” 1; the packed
fraction is rounded
exp < 110 Packed word is all zeros
¹ exp = source exponent; the sign bit remains the same in all cases
The short float type supports gradual underflow. This method sacrifices
precision for dynamic range. When packing a number which would have
underflowed, the exponent is set to zero and the mantissa (including “hid-
den” 1) is right-shifted the appropriate amount. The packed result is a
denormal which can be unpacked into a normal IEEE floating-point
number.
ASTATx/y Flags
SZ Cleared
SV Set if overflow occurs, cleared otherwise
SS Cleared
Fn = FUNPACK Rx
Function
Converts the 16-bit floating-point value in Rx to an IEEE 32-bit floating-point
value stored in Fn.
Condition Result
0 < exp¹ ≤ 15 Exponent is the three LSBs of the source exponent prefixed by the MSB
of the source exponent and four copies of the complement of the MSB;
the unpacked fraction is the source fraction with 12 zeros appended
exp = 0 Exponent is (120 – N) where N is the number of leading zeros in the
source fraction; the unpacked fraction is the remainder of the source
fraction with zeros appended to pad it and the “hidden” 1 stripped away
¹ exp = source exponent; the sign bit remains the same in all cases
The short float type supports gradual underflow. This method sacrifices
precision for dynamic range. When packing a number that would have
underflowed, the exponent is set to 0 and the mantissa (including “hid-
den” 1) is right-shifted the appropriate amount. The packed result is a
denormal, which can be unpacked into a normal IEEE floating-point
number.
ASTATx/y Flags
SZ Cleared
SV Cleared
SS Cleared
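The unpacking rules above can be followed step by step in Python. This sketch assumes the 16-bit layout implied by the text (sign in bit 15, the 4-bit exponent in bits 14–11, the 11-bit fraction in bits 10–0) and additionally assumes that an all-zero exponent and fraction unpacks to zero, a case the table does not spell out. The names `funpack` and `bits_to_float` are hypothetical.

```python
import struct

def funpack(rx):
    """Sketch of Fn = FUNPACK Rx; returns the IEEE 754 binary32 bit pattern."""
    half = rx & 0xFFFF
    sign = half >> 15
    exp4 = (half >> 11) & 0xF
    frac11 = half & 0x7FF
    if exp4 > 0:
        msb = exp4 >> 3
        inv = msb ^ 1
        # MSB, four copies of its complement, then the three LSBs
        exp8 = (msb << 7) | ((inv * 0b1111) << 3) | (exp4 & 0x7)
        frac23 = frac11 << 12                 # append 12 zeros
    elif frac11 == 0:
        exp8, frac23 = 0, 0                   # assumption: packed zero is zero
    else:
        n = 11 - frac11.bit_length()          # leading zeros in the fraction
        exp8 = 120 - n
        rest_bits = 10 - n                    # bits below the "hidden" 1
        rest = frac11 & ((1 << rest_bits) - 1)
        frac23 = rest << (23 - rest_bits)     # pad with zeros on the right
    return (sign << 31) | (exp8 << 23) | frac23

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a Python float for inspection."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

In this model the packed word 0x3800 (exponent 0111, zero fraction) unpacks to 0x3F800000, which is 1.0, and the denormal 0x0400 unpacks to a normal number with exponent 120.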
BITDEP Rx by Ry|<bitlen12>
Function
Deposits the number of bits specified by Ry or by the immediate bitlen12
field from Rx into the bit FIFO. The bits read from Rx are right-justified.
The write pointer is incremented by the number of bits appended. To
understand the BITDEP instruction, it is easiest to observe how the data
register and bit FIFO behave during instruction execution. If the data
register, Rx (40 bits), contains:
39 32
|--------|
31 23 15 7 0
|--------|----abcd|efghijkl|--------|
\-----------/
bitlen bits
And, the bit FIFO (64 Bits), before instruction execution contains:
63 55 47 39 32
|qwertyui|opasdfgh|lmn-----|--------|
^- BFFWRP – Write Pointer
31 23 15 7 0
|--------|--------|--------|--------|
Then, after instruction execution, the bit FIFO (64 Bits) contains:
63 55 47 39 32
|qwertyui|opasdfgh|lmnabcde|fghijkl-|
^- BFFWRP – Write Pointer
31 23 15 7 0
|--------|--------|--------|--------|
ASTATx/y Flags
SF Set if updated BFFWRP>= 32, otherwise cleared
SZ Cleared
SV Set if any bits are deposited to the left of the 32-bit fixed-point output field
(that is, if Ry or bitlen12 > 32), otherwise cleared
SS Cleared
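The before-and-after FIFO pictures above can be reproduced with a small Python model. This is a sketch under assumptions: the FIFO is held as a 64-bit integer that fills from bit 63 downward, unused bits below the write pointer are assumed to be zero, and requests that exceed 32 bits or overflow the 64-bit FIFO raise an error instead of modeling the SV case. The name `bitdep` and its argument order are chosen for illustration.

```python
MASK64 = (1 << 64) - 1

def bitdep(fifo, wrp, rx, bitlen):
    """Sketch of BITDEP: append the low bitlen bits of rx at the write pointer.

    Returns (new_fifo, new_write_pointer, flags).
    """
    if not 0 <= bitlen <= 32:
        raise ValueError("model covers deposits of 0 to 32 bits only")
    if wrp + bitlen > 64:
        raise ValueError("deposit would overflow the 64-bit FIFO")
    field = rx & ((1 << bitlen) - 1)          # right-justified bits from Rx
    new_wrp = wrp + bitlen
    new_fifo = (fifo | (field << (64 - new_wrp))) & MASK64
    flags = {"SF": new_wrp >= 32, "SZ": False, "SS": False}
    return new_fifo, new_wrp, flags
```

Starting from 19 valid bits, depositing 12 bits moves the write pointer to 31, which matches the example above; SF stays cleared until the pointer reaches 32.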
Rn = BFFWRP
Function
Transfers write pointer value to Rn.
Examples
For bit FIFO examples, see the BITDEP instruction “BITDEP Rx by
Ry|<bitlen12>” on page 11-86.
ASTATx/y Flags
SZ Cleared
SV Cleared
SS Cleared
SF Not affected
BFFWRP = Rn|<data7>
Function
Updates the write pointer from Rn or from the specified 7-bit immediate data.
Only the 7 least significant bits of Rn are written.
The maximum value that may be written into BFFWRP is 64.
Examples
For bit FIFO examples, see the BITDEP instruction “BITDEP Rx by
Ry|<bitlen12>” on page 11-86.
ASTATx/y Flags
SF is set if the updated BFFWRP is greater than or equal to 32, and cleared
otherwise. SV is set if the written value is greater than 64; otherwise SV is
cleared. Flags SZ and SS are cleared.
SZ Cleared
SF Set if updated BFFWRP ≥ 32, otherwise cleared
SV Set if written <data7> > 64, otherwise cleared
SS Cleared
Rn = BITEXT Rx|<bitlen12>(NU)
Function
Extracts the number of bits specified by Rx or by the immediate bitlen12
field from the bit FIFO and places the data in Rn. The bits in Rn are
right-justified. The write pointer is decremented by the number of bits read,
and the remaining content of the bit FIFO is left-shifted so that it is
MSB-aligned. The optional modifier NU (no update, or query only) returns the
requested number of bits as usual but does not modify the bit FIFO or the
write pointer. To understand the BITEXT instruction, it is easiest to observe
how the data register and bit FIFO behave during instruction execution. If
the bit FIFO (64 bits) contains:
63 55 47 39
|abcdefgh|ijklmn--|--------
\-----------/ ^ - BFFWRP Pointer
bitlen bits
31 23 15 7 0
|--------|--------|--------|--------|--------|
ASTATx/y Flags
A value of more than 32 in the lower 6 bits of Rx or in the bitlen immediate
field is prohibited, and use of such a value sets SV. An attempt to extract
more bits than the bit FIFO holds leaves the write pointer and the bit FIFO
contents undefined; SV is set in that case. SF is set if the write pointer is
greater than or equal to 32. SZ is set if the output is zero, otherwise
cleared. SS is cleared. When the NU modifier is used, SV, SZ, and SS are
affected as described above and the SF flag is not updated.
SZ Set if output is zero, otherwise cleared
SF Set if updated BFFWRP ≥ 32, otherwise cleared. If the NU modifier is used, SF
reflects the un-updated write pointer status
SV Set if an attempt is made to extract more bits than those in bit FIFO, other-
wise cleared
SS Cleared
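The extraction side of the bit FIFO can be sketched in the same style. This Python model is an illustration under assumptions: the FIFO is a 64-bit integer that is MSB-aligned, and requests that the text describes as prohibited or undefined (more than 32 bits, or more bits than the FIFO holds) raise an error rather than modeling SV. The name `bitext` and the `nu` keyword are hypothetical.

```python
MASK64 = (1 << 64) - 1

def bitext(fifo, wrp, bitlen, nu=False):
    """Sketch of Rn = BITEXT <bitlen> with optional (NU).

    Returns (rn, new_fifo, new_write_pointer, flags).
    """
    if not 0 <= bitlen <= 32:
        raise ValueError("extracting more than 32 bits is prohibited")
    if bitlen > wrp:
        raise ValueError("FIFO holds fewer bits than requested (undefined)")
    rn = (fifo & MASK64) >> (64 - bitlen)     # top bitlen bits, right-justified
    if not nu:
        fifo = (fifo << bitlen) & MASK64      # keep remaining data MSB-aligned
        wrp -= bitlen
    flags = {"SZ": rn == 0, "SF": wrp >= 32, "SV": False, "SS": False}
    return rn, fifo, wrp, flags
```

With NU set, the same bits are returned but the FIFO and write pointer come back unchanged, so a program can peek at upcoming bits before committing to consume them.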
Multifunction Computations
Multifunction instructions are parallelized single ALU and multiplier
instructions. For the functional description, the status flags, and the input
operand constraints of parallel multiplier and ALU instructions, see “ALU
Fixed-Point Computations” on page 11-1 and “Multiplier Fixed-Point
Computations” on page 11-49. This section lists all possible instruction
syntax options.
Note that the MRB register is not supported in multifunction
instructions.
Note that both instructions above are typically used for fixed- or float-
ing-point FFT butterfly calculations.
Short Compute
The following compute instructions are supported as type 2c instructions
in VISA space under the condition that one source and one destination
register must be identical.
Rn = Rn + Rx
Rn = Rn – Rx
Rn = PASS Rx
COMP (Rn, Rx)
Rn = NOT Rx
Rn = Rn AND Rx
Rn = Rx + 1
Rn = Rn OR Rx
Rn = Rx – 1
Rn = Rn XOR Rx
Rn = Rn * Rx (SSI)
Fn = Fn + Fx
Fn = Fn – Fx
Fn = Fn * Fx
COMP (Fn, Fx)
Fn = FLOAT Rx
This chapter lists the opcodes associated with the computation types
described in Chapter 11, Computation Types. Table 12-1 provides a sum-
mary of computation type bits and Table 12-2 provides a summary of the
shift immediate computation type.
Single Computation
Multiple Computation
Data Move
0 xxxxxx Fixed
Single-Function Opcodes
In single computation operations the compute field of a single-function
operation is made up of the following bit fields.
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
CU OPCODE Rn Rx Ry
Bits Description
CU Specifies the computation unit for the compute operation, where: 00=ALU,
01=Multiplier, and 10=Shifter
Opcode Specifies the compute operation
Rn Specifies register for the compute result
Rx Specifies register for the compute’s x operand
Ry Specifies register for the compute’s y operand
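To make the field description concrete, the following Python sketch packs the fields into a 23-bit compute word. The bit positions are an assumption, since the flattened diagram above does not preserve them: bit 22 is taken as 0 for a single-function operation, CU as bits 21–20, the opcode as bits 19–12, Rn as bits 11–8, Rx as bits 7–4, and Ry as bits 3–0. The function name `encode_compute` is hypothetical.

```python
CU_ALU, CU_MULTIPLIER, CU_SHIFTER = 0b00, 0b01, 0b10

def encode_compute(cu, opcode, rn, rx, ry):
    """Pack a single-function compute field (assumed bit layout, see above)."""
    if cu not in (CU_ALU, CU_MULTIPLIER, CU_SHIFTER):
        raise ValueError("CU must be 00 (ALU), 01 (multiplier) or 10 (shifter)")
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode is an 8-bit value")
    for reg in (rn, rx, ry):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers are 0 to 15")
    return (cu << 20) | (opcode << 12) | (rn << 8) | (rx << 4) | ry

# Example: R1 = R2 + R3 uses ALU opcode 0000 0001.
word = encode_compute(CU_ALU, 0b00000001, rn=1, rx=2, ry=3)
```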
ALU Opcodes
Table 12-3 and Table 12-4 summarize the syntax and opcodes for the
fixed-point and floating-point ALU operations, respectively.
Rn = Rx + Ry 0000 0001
Rn = Rx – Ry 0000 0010
Rn = Rx + Ry + CI 0000 0101
Rn = Rx – Ry + CI – 1 0000 0110
Rn = Rx + CI 0010 0101
Rn = Rx + CI – 1 0010 0110
Rn = Rx + 1 0010 1001
Rn = Rx – 1 0010 1010
Rn = – Rx 0010 0010
Rn = Rx OR Ry 0100 0001
Fn = Fx + Fy 1000 0001
Fn = Fx – Fy 1000 0010
Multiplier Opcodes
This section describes the multiplier operations. These tables use the fol-
lowing symbols to indicate location of operands and other features:
• y = y-input (1 = signed, 0 = unsigned)
• x = x-input (1 = signed, 0 = unsigned)
• f = format (1 = fractional, 0 = integer)
• r = rounding (1 = yes, 0 = no)
Table 12-5 and Table 12-6 summarize the syntax and opcodes for the
fixed-point and floating-point multiplier operations.
Mod1 Modifiers
The Mod1 modifiers in Table 12-7 are optional. Each modifier is enclosed
in parentheses and consists of three or four letters that indicate whether:
• The x-input is signed (S) or unsigned (U).
• The y-input is signed or unsigned.
• The inputs are in integer (I) or fractional (F) format.
• The result written to the register file will be rounded-to-nearest
(R).
(SSI) _ _11 0_ _0
(SUI) _ _01 0_ _0
(USI) _ _10 0_ _0
(UUI) _ _00 0_ _0
(SSF) _ _11 1_ _0
(SUF) _ _01 1_ _0
(USF) _ _10 1_ _0
(UUF) _ _00 1_ _0
(SSFR) _ _11 1_ _1
(SUFR) _ _01 1_ _1
(USFR) _ _10 1_ _1
(UUFR) _ _00 1_ _1
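The modifier table above follows a regular pattern that can be expressed as a small Python helper. The bit positions are read off the table patterns (`_ _yx f_ _r`), so y is taken as opcode bit 5, x as bit 4, f as bit 3, and r as bit 0; the function name `mod1_bits` is hypothetical, and combinations that do not appear in the table (such as rounding with integer format) are rejected.

```python
def mod1_bits(modifier):
    """Return the opcode bits selected by a Mod1 modifier such as "(SSFR)"."""
    letters = modifier.strip("()").upper()
    if len(letters) not in (3, 4):
        raise ValueError("Mod1 has three or four letters")
    x_in, y_in, fmt = letters[0], letters[1], letters[2]
    rounded = letters[3:] == "R"
    if x_in not in "SU" or y_in not in "SU" or fmt not in "IF":
        raise ValueError("unknown Mod1 modifier: " + modifier)
    if len(letters) == 4 and not rounded:
        raise ValueError("fourth letter must be R")
    if rounded and fmt != "F":
        raise ValueError("the table lists rounding only with fractional format")
    y = 1 if y_in == "S" else 0
    x = 1 if x_in == "S" else 0
    f = 1 if fmt == "F" else 0
    r = 1 if rounded else 0
    return (y << 5) | (x << 4) | (f << 3) | r
```

The result gives only the modifier bits; the remaining opcode bits (shown as underscores in the table) come from the multiplier operation itself.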
Mod3 Modifiers
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Table 12-10 indicates how the opcode specifies the MR register, and Dreg
specifies the data register. D determines the direction of the transfer (0 =
to register file, 1 = to MR register).
0000 MR0F
0001 MR1F
0010 MR2F
0100 MR0B
0101 MR1B
0110 MR2B
Table 12-11 shows opcodes which are merged for shifter computations
and shifter immediate operations. For shifter computations, the
entire 8-bit opcode is valid; for shift immediate (type 6) instructions,
only the upper 6 bits (MSBs) are valid.
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 OPCODE DATA Rn Rx
Bits Description
Rx Specifies input register
Rn Specifies result register
Data Immediate <data7>, <data8>, <bit6>:<len6>, <bitlen12>
For immediate data wider than 8 bits (<bit6>:<len6>, <bitlen12>), refer to the DATAEX field in
Table 10-1 on page 10-1.
OPCODE Specifies the immediate operation
11 10 9 8 7 6 5 4 3 2 1 0
OP Rn Rx
OP Operation OP Operation
0000 Rn = Rn + Rx 1000 Fn = Fn + Fx
0001 Rn = Rn – Rx 1001 Fn = Fn – Fx
0010 Rn = PASS Rx 1010 Fn = FLOAT Rx
0011 COMP (Rn, Rx) 1011 COMP (Fn, Fx)
0100 Rn = NOT Rx 1100 Rn = Rn AND Rx
0101 Rn = Rx + 1 1101 Rn = Rn OR Rx
0110 Rn = Rx – 1 1110 Rn = Rn XOR Rx
0111 Rn = Rn * Rx (SSI) 1111 Fn = Fn * Fx
Multifunction Opcodes
Multifunction opcodes are described in the following sections.
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 00 0111 Rs Ra Rx Ry
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 00 1111 Fs Fa Fx Fy
Bits Description
Rx Specifies fixed-point X input ALU register
Ry Specifies fixed-point Y input ALU register
Rs Specifies fixed-point ALU subtraction result
Ra Specifies fixed-point ALU addition result
Fx Specifies floating-point X input ALU register
Fy Specifies floating-point Y input ALU register
Fs Specifies floating-point ALU subtraction result
Fa Specifies floating-point ALU addition result
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Bits Description
Rxa Specifies fixed-point X input ALU register (R11–8)
Rya Specifies fixed-point Y input ALU register (R15–12)
Rs Specifies fixed-point ALU subtraction result
Ra Specifies fixed-point ALU addition result
Fxa Specifies floating-point X input ALU register (F11–8)
Fya Specifies floating-point Y input ALU register (F15–12)
Fs Specifies floating-point ALU subtraction result
Fa Specifies floating-point ALU addition result
Bits Description
Fxm Specifies floating-point X input multiply register (F3–0)
Fym Specifies floating-point Y input multiply register (F7–4)
Fm Specifies floating-point multiply result register
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Bits Description
Rxa Specifies fixed-point X input ALU register (R11–8)
Rya Specifies fixed-point Y input ALU register (R15–12)
Ra Specifies fixed-point ALU result
Fxa Specifies floating-point X input ALU register (F11–8)
Fya Specifies floating-point Y input ALU register (F15–12)
Fa Specifies floating-point ALU result
Bits Description
Rm Specifies fixed-point multiply result register
Fxm Specifies floating-point X input multiply register (F3–0)
Fym Specifies floating-point Y input multiply register (F7–4)
Fm Specifies floating-point multiply result register
Table 12-12 provides the syntax and opcode for each of the parallel multi-
plier and ALU instructions for both fixed-point and floating-point
versions.
file is provided with the VisualDSP tools and can be found in the
VisualDSP/processortype/include directory.
[Figure: MODE1 register bit map. All bits reset to 0. Fields shown in the diagram:]
Bits 31–16:
CBUFEN Circular Buffer Addressing Enable
RND32 Rounding for 32-Bit Floating-Point Data Select
BDCST1 Broadcast Register Loads Indexed With I1 Enable
BDCST9 Broadcast Register Loads Indexed With I9 Enable
CSEL Bus Master Code Selection (ADSP-21368/2146x only)
PEYEN Processor Element Y Enable
Bits 15–0:
TRUNC Truncation Rounding Mode Select
SSE Fixed-point Sign Extension Select
ALUSAT ALU Saturation Select
IRPTEN Global Interrupt Enable
NESTM Nesting Multiple Interrupts Enable
SRRFL Secondary Registers Register File Low Enable
SRRFH Secondary Registers Register File High Enable
SRD2L Secondary Registers DAG2 Low Enable
SRD2H Secondary Registers DAG2 High Enable
SRD1L Secondary Registers DAG1 Low Enable
SRD1H Secondary Registers DAG1 High Enable
SRCU Secondary MR Registers Enable
BR0 Bit-Reverse Addressing for I0
BR8 Bit-Reverse Addressing for I8
2 SRCU MRx Result Registers Swap Enable. Swaps the contents of the
MRF and MRB registers if set (= 1), so that they can be used as
foreground and background registers. In SIMD mode, the MSF and MSB
registers are also swapped.
This works similarly to the data register swap instructions
Rx<->Sx.
3 SRD1H Secondary Registers For DAG1 High Enable. Enables (use secondary
if set, = 1) or disables (use primary if cleared, = 0) secondary DAG1
registers for the upper half (I, M, L, B7–4) of the address generator.
4 SRD1L Secondary Registers For DAG1 Low Enable. Enables (use secondary
if set, = 1) or disables (use primary if cleared, = 0) secondary DAG1
registers for the lower half (I, M, L, B3–0) of the address generator.
5 SRD2H Secondary Registers For DAG2 High Enable. Enables (use secondary
if set, = 1) or disables (use primary if cleared, = 0) secondary DAG2
registers for the upper half (I, M, L, B15–12) of the address generator.
6 SRD2L Secondary Registers For DAG2 Low Enable. Enables (use secondary
if set, = 1) or disables (use primary if cleared, = 0) secondary DAG2
registers for the lower half (I, M, L, B11–8) of the address generator.
7 SRRFH Secondary Registers For Register File High Enable. Enables (use sec-
ondary if set, = 1) or disables (use primary if cleared, = 0) secondary
data registers for the upper half (R15-R8/S15-S8) of the computa-
tional units.
9–8 Reserved
10 SRRFL Secondary Registers For Register File Low Enable. Enables (use sec-
ondary if set, = 1) or disables (use primary if cleared, = 0) secondary
data registers for the lower half (R7-R0/S7-S0) of the computational
units.
12 IRPTEN Global Interrupt Enable. Enables (if set, = 1) or disables (if cleared,
= 0) all maskable interrupts.
13 ALUSAT ALU Saturation Select. Selects whether the computational units satu-
rate results on positive or negative fixed-point overflows (if 1) or
return unsaturated results (if 0).
14 SSE Fixed-point Sign Extension Select. Selects whether the core units
sign-extend short-word (16-bit) data (if 1) or zero-fill the upper 16
bits (if 0).
15 TRUNC Truncation Rounding Mode Select. Selects whether the ALU or mul-
tiplier units round results with round-to-zero (if 1) or round-to-near-
est (if 0).
18–17 CSEL Bus Master Selection. These bits indicate whether the processor has
control of the external bus as follows:
00 = processor is bus master
01, 10, 11 = processor is not bus master.
The bus master condition (BM) indicates whether the SHARC pro-
cessor is the current bus master in EP shared systems (for example
ADSP-21368/2146x with shared SDRAM/DDR2 memory). To
enable the use of this condition, bits 17 and 18 of MODE1 must
both be zeros; otherwise the condition is always evaluated as false.
20–19 Reserved
31–25 Reserved
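The bit assignments listed in the table above lend themselves to a small decoding helper. The Python sketch below uses only the bit numbers given in the rows shown here (other MODE1 bits are deliberately left out), and the names `MODE1_BITS`, `decode_mode1`, and `bus_master_condition_usable` are hypothetical.

```python
# Bit numbers taken from the table rows above.
MODE1_BITS = {
    "SRCU": 2, "SRD1H": 3, "SRD1L": 4, "SRD2H": 5, "SRD2L": 6,
    "SRRFH": 7, "SRRFL": 10, "IRPTEN": 12, "ALUSAT": 13,
    "SSE": 14, "TRUNC": 15,
}

def decode_mode1(value):
    """Return (set of enabled single-bit fields, 2-bit CSEL value)."""
    enabled = {name for name, bit in MODE1_BITS.items() if (value >> bit) & 1}
    csel = (value >> 17) & 0b11               # CSEL occupies bits 18-17
    return enabled, csel

def bus_master_condition_usable(value):
    """The BM condition is usable only when bits 17 and 18 are both zero."""
    return ((value >> 17) & 0b11) == 0
```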
[Figure: MODE2 register bit map. Reset value 0x4200 0000 (bits 30 and 25 set). Fields shown in the diagram:]
Bits 31–16:
U64MAE Unaligned 64-Bit Memory Access Enable
IIRAE Illegal IOP Register Access Enable
CAFRZ Cache Freeze
Bits 15–0:
EXTCADIS External Cache Only Disable
TIMEN Timer Enable
CADIS Cache Disable
IRQ2E Interrupt Request Sensitivity Select
IRQ1E Interrupt Request Sensitivity Select
IRQ0E Interrupt Request Sensitivity Select
0 IRQ0E Sensitivity Select. Selects sensitivity for the flag configured as IRQ0
as edge-sensitive (if set, = 1) or level-sensitive (if cleared, = 0).
1 IRQ1E Sensitivity Select. Selects sensitivity for the flag configured as IRQ1
as edge-sensitive (if set, = 1) or level-sensitive (if cleared, = 0).
2 IRQ2E Sensitivity Select. Selects sensitivity for the flag configured as IRQ2
as edge-sensitive (if set, = 1) or level-sensitive (if cleared, = 0).
3 Reserved
4 CADIS Cache Disable. This bit disables the instruction cache (if set, = 1)
or enables the cache (if cleared, = 0). If this bit is set, caching of
instructions from both internal memory and external memory is
disabled (see bit 6).
5 TIMEN Timer Enable. Enables the core timer (starts, if set, = 1) or disables
the core timer (stops, if cleared, = 0).
6 EXTCADIS External Cache Only Disable. Disables the caching of the instruc-
tions coming from external memory (if set, =1) or enables caching
of the instructions coming from external memory (if cleared, = 0
and CADIS bit 4 = 0). This bit can only be used with the
ADSP-214xx products.
18–7 Reserved
19 CAFRZ Cache Freeze. Freezes the instruction cache (retain contents if set,
= 1) or thaws the cache (allow new input if cleared, = 0).
20 IIRAE Illegal I/O Processor Register Access Enable. Enables (if set, = 1)
or disables (if cleared, = 0) detection of I/O processor register
accesses. If IIRAE is set, the processor flags an illegal access by set-
ting the IIRA bit in the STKYx register.
31–22 Reserved
• Timer
• Interrupt mask and latch (for more information, see “Core Interrupt
Control” in Appendix B, Core Interrupt Control)
251 Set to 1 when a CALL pushes the return address under the situation when the
loop termination condition tests true in the cycle CALL is in the Address stage of
the pipeline OR when the push is result of servicing an interrupt.
Loop Registers
The loop registers are used to set up and track loops in programs. These
registers are described below.
Timer Registers
The SHARC processors contain a timer used to generate interrupts from
the core. These registers are described below.
Programs cannot change the output selects of the FLAGS register and
provide a new value in the same instruction. Instead, programs
must use two write instructions: the first to change the output
select of a particular FLAG pin, and the second to provide the new
value, as shown in the example below.
bit set FLAGS FLG1O; /* set Flag1 IO output */
bit set FLAGS FLG1; /* set Flag1 level 1 */
30–0 (Even bits) FLGx FLAGx Value. Indicates the state of the FLAGx pin—high (if
set, = 1) or low (if cleared, = 0).
31–1 (Odd bits) FLGxO FLAGx Output Select. Selects the I/O direction for the
FLAGx pin: the flag is programmed as an output (if set, = 1)
or input (if cleared, = 0).
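The even/odd bit pairing and the two-write rule can be illustrated with a short Python model. It assumes that FLAGx occupies bit 2x for its value and bit 2x + 1 for its output select, which is consistent with the even/odd description above but is not stated per flag in this table; the names `flag_masks` and `drive_flag` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def flag_masks(x):
    """Return (value_mask, output_select_mask) for FLAGx, assuming bit 2x / 2x+1."""
    if not 0 <= x <= 15:
        raise ValueError("flag index must be 0 to 15")
    return 1 << (2 * x), 1 << (2 * x + 1)

def drive_flag(flags_reg, x, level):
    """Model the two separate writes needed to drive FLAGx as an output.

    Returns the register value after the first write (output select set)
    and after the second write (pin level set or cleared).
    """
    value_mask, out_mask = flag_masks(x)
    after_select = (flags_reg | out_mask) & MASK32
    if level:
        after_value = after_select | value_mask
    else:
        after_value = after_select & ~value_mask
    return after_select, after_value & MASK32
```

For FLAG1 this mirrors the two `bit set FLAGS` instructions shown earlier: the first write sets FLG1O, and the second sets FLG1.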
computation units. These registers also provide local storage for operands
and results in SIMD mode.
The S prefix on register names does not affect the 32-bit or 40-bit data
transfer; the naming convention determines how the ALU, multiplier, and
shifter treat the data and determines which processing element’s data
registers are being used.
[Figure: ASTATx and ASTATy register bit map. Fields shown in the diagram:]
SF Shifter Bit FIFO
SS Shifter Input Sign
SZ Shifter Zero
SV Shifter Overflow
AF ALU Floating-Point Operation
MI Multiplier Floating-Point Invalid Operation
MU Multiplier Floating-Point Underflow
MV Multiplier Overflow
MN Multiplier Negative
AI ALU Floating-Point Invalid Operation
AS ALU Sign Input (for ABS and MANT)
AC ALU Fixed-Point Carry
AN ALU Negative
AV ALU Overflow
AZ ALU Zero/Floating-Point Underflow
If these registers are loaded manually, there is a one cycle effect
latency before the new value in the ASTATx register can be used in a
conditional instruction.
1 AV ALU Overflow. Indicates if the last ALU operation’s result overflowed (if
set, = 1) or did not overflow (if cleared, = 0). The ALU updates AV for all
fixed-point and floating-point ALU operations. For fixed-point results,
the processor sets AV and the AOS bit in the STKYx/y register when the
XOR of the two most significant bits (MSBs) is a 1. For floating-point
results, the processor sets AV and the AVS bit in the STKYx/y register
when the rounded result overflows (unbiased exponent > 127).
2 AN ALU Negative. Indicates if the last ALU operation’s result was negative (if
set, = 1) or positive (if cleared, = 0). The ALU updates AN for all
fixed-point and floating-point ALU operations.
3 AC ALU Fixed-Point Carry. Indicates if the last ALU operation had a carry
out of the MSB of the result (if set, = 1) or had no carry (if cleared, = 0).
The ALU updates AC for all fixed-point operations. The processor clears
AC during the fixed-point logic operations: PASS, MIN, MAX, COMP,
ABS, and CLIP. The ALU reads the AC flag for the fixed-point accumu-
late operations: Addition with Carry and Fixed-point Subtraction with
Carry.
4 AS ALU Sign Input (for ABS and MANT). Indicates if the last ALU ABS or
MANT operation’s input was negative (if set, = 1) or positive (if cleared,
= 0). The ALU updates AS only for fixed- and floating-point ABS and
MANT operations. The ALU clears AS for all operations other than ABS
and MANT.
Table A-6. ASTATx and ASTATy Register Bit Descriptions (RW) (Cont’d)
Bit Name Description
12 SZ Shifter Zero. Indicates if the last shifter operation’s result was zero
(if set, = 1) or non-zero (if cleared, = 0). The shifter updates SZ for all
shifter operations. The processor also sets SZ if the shifter operation per-
forms a bit test on a bit outside of the 32-bit fixed-point field.
13 SS Shifter Input Sign. Indicates if the last shifter operation’s input was nega-
tive (if set, = 1) or positive (if cleared, = 0). The shifter updates SS for all
shifter operations.
14 (RO) SF Shifter Bit FIFO. Indicates the current value of Bit FIFO Write Pointer.
SF is set when write pointer is greater than or equal to 32, otherwise it is
cleared.
(for all ADSP-214xx processors only)
17–15 Reserved
18 BTF Bit Test Flag for System Registers. Indicates if the system register bit is
true (if set, = 1) or false (if cleared, = 0). The processor sets BTF when the
bit(s) in a system register and value in the Bit Tst instruction match. The
processor also sets BTF when the bit(s) in a system register and value in
the Bit Xor instruction match.
23–19 Reserved
31–24 CACC Compare Accumulation Shift Register. Bit 31 of CACC indicates which
operand was greater during the last ALU compare operation: X input (if
set, = 1) or Y input (if cleared, = 0). The other seven bits in CACC form a
right-shift register, each storing a previous compare accumulation result.
With each new compare, the processor right shifts the values of CACC,
storing the newest value in bit 31 and the oldest value in bit 24.
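The CACC shift behavior can be sketched in a few lines of Python. This is a minimal model of the description above, operating on a 32-bit ASTAT value held as an integer; the function name `update_cacc` is hypothetical.

```python
def update_cacc(astat, x_greater):
    """Shift CACC (bits 31-24) right by one and store the new result in bit 31."""
    cacc = (astat >> 24) & 0xFF
    cacc = (cacc >> 1) | (0x80 if x_greater else 0x00)
    return (astat & 0x00FFFFFF) | (cacc << 24)
```

After eight compares, the result of the first compare has moved down to bit 24 and is discarded on the next update.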
[Figure: STKY register bit map. Reset value 0x0540 0000 (bits 26, 24, and 22 set). Fields shown in the diagram:]
Bits 31–16:
LSEM Loop Stack Empty
LSOV Loop Stack Overflow
SSEM Status Stack Empty
SSOV Status Stack Overflow
PCEM PC Stack Empty
PCFL PC Stack Full
U64MA Unaligned 64-Bit Memory Access
IIRA Illegal Access Occurred
CB15S DAG2 Circular Buffer 15 Overflow
CB7S DAG1 Circular Buffer 7 Overflow
Bits 15–0:
MIS Multiplier Floating-Point Invalid Operation
MUS Multiplier Floating-Point Underflow
MVS Multiplier Floating-Point Overflow
MOS Multiplier Fixed-Point Overflow
AIS ALU Floating-Point Invalid Operation
AOS ALU Fixed-Point Overflow
AVS ALU Floating-Point Overflow
AUS ALU Floating-Point Underflow
[Figure: second STKY register bit map, with the same reset pattern and fields in the lower 16 bits only:]
MIS Multiplier Floating-Point Invalid Operation
MUS Multiplier Floating-Point Underflow
MVS Multiplier Floating-Point Overflow
MOS Multiplier Fixed-Point Overflow
AIS ALU Floating-Point Invalid Operation
AOS ALU Fixed-Point Overflow
AVS ALU Floating-Point Overflow
AUS ALU Floating-Point Underflow
0 (WC) AUS ALU Floating-Point Underflow. A sticky indicator for the ALU AZ bit.
For more information, see “AZ” on page A-17.
1 (WC) AVS ALU Floating-Point Overflow. A sticky indicator for the ALU AV bit.
For more information, see “AV” on page A-17.
2 (WC) AOS ALU Fixed-Point Overflow. A sticky indicator for the ALU AV bit. For
more information, see “AV” on page A-17.
4–3 Reserved
5 (WC) AIS ALU Floating-Point Invalid Operation. A sticky indicator for the ALU
AI bit. For more information, see “AI” on page A-18.
6 (WC) MOS Multiplier Fixed-Point Overflow. A sticky indicator for the multiplier
MV bit. For more information, see “MV” on page A-18.
7 (WC) MVS Multiplier Floating-Point Overflow. A sticky indicator for the multi-
plier MV bit. For more information, see “MV” on page A-18.
8 (WC) MUS Multiplier Floating-Point Underflow. A sticky indicator for the multi-
plier MU bit. For more information, see “MU” on page A-19.
Table A-7. STKYx and STKYy Register Bit Descriptions (RW) (Cont’d)
Bit Name Description
9 (WC) MIS Multiplier Floating-Point Invalid Operation. A sticky indicator for the
multiplier MI bit. For more information, see “MI” on page A-19.
16–10 Reserved
19 IIRA Illegal IOP Register Access. Indicates whether the core has accessed
the IOP register space (if set, = 1) or not (if cleared, = 0).
21 (RO) PCFL PC Stack Full. Indicates if the PC stack is full (if 1) or not full (if 0)—
Not a sticky bit, cleared by a Pop.
22 (RO) PCEM PC Stack Empty. Indicates if the PC stack is empty (if 1) or not empty
(if 0). Not sticky; cleared by a push. Set by default.
23 (RO) SSOV Status Stack Overflow. Indicates if the status stack is overflowed (if 1)
or not overflowed (if 0)—sticky bit.
24 (RO) SSEM Status Stack Empty. Indicates if the status stack is empty (if 1) or not
empty (if 0)—not sticky, cleared by a push. Set by default.
25 (RO) LSOV Loop Stack Overflow. Indicates if the loop counter stack and loop stack
are overflowed (if 1) or not overflowed (if 0)—sticky bit.
26 (RO) LSEM Loop Stack Empty. Indicates if the loop counter stack and loop stack
are empty (if 1) or not empty (if 0)—not sticky, cleared by a push. Set
by default.
31–27 Reserved
Miscellaneous Registers
The following sections provide descriptions of the miscellaneous
registers.
4 SYSRST Software Reset. Resets the processor in the same manner as the soft-
ware reset bit in the SYSCTL register. The SYSRST bit must be
cleared by the emulator.
0 = Normal operation
1 = Reset
5 ENBRKOUT Enable the Emulation Status Pin. Enables EMU pin operation.
Whenever the core enters emulation space, the emulator is notified by
assertion of the EMU pin.
0 = EMU pin at high impedance state
1 = EMU pin enabled
6 IOSTOP Stop IOP DMAs in EMU Space. Disables all DMA requests when the
processors are in emulation space. Data that is currently in the external
port, link port, or SPORT DMA buffers is held there unless the inter-
nal DMA request was already granted. IOSTOP causes incoming data
to be held off and outgoing data to cease. Because SPORT receive data
cannot be held off, it is lost and the overrun bit is set.
0 = I/O continues
1 = I/O stops
7 Reserved
8 NEGPA1¹ Negate program memory data address breakpoint. Enables breakpoint
events if the address is greater than the end register value OR less than
the start register value. This function is useful for detecting index range
violations in user code.
0 = Disable breakpoint
1 = Enable breakpoint
9 NEGDA1 Negate data memory address breakpoint #1. See NEGPA1 bit description.
10 NEGDA2 Negate data memory address breakpoint #2. See NEGPA1 bit description.
11 NEGIA1 Negate instruction address breakpoint #1. See NEGPA1 bit description.
12 NEGIA2 Negate instruction address breakpoint #2. See NEGPA1 bit description.
13 NEGIA3 Negate instruction address breakpoint #3. See NEGPA1 bit description.
14 NEGIA4 Negate instruction address breakpoint #4. See NEGPA1 bit description.
16 Reserved
18 ENBDA Enable data memory address breakpoints. See ENBPA bit descrip-
tion.
20–21 Reserved
23–22 PA1MODE PA1 breakpoint triggering mode. Trigger on the following conditions:
00 = Breakpoint is disabled
01 = WRITE accesses only
10 = READ accesses only
11 = Any access
25–24 DA1MODE DA1 breakpoint triggering mode. See PA1MODE bit description.
27–26 DA2MODE DA2 breakpoint triggering mode. See PA1MODE bit description.
29–28 IO1MODE IO1 breakpoint triggering mode. See PA1MODE bit description.
31–30 Reserved
33 Reserved
34 NOBOOT No boot on reset. Forces the processor not to boot from any external
DMA source and instead halts the core at the internal reset vector location.
If this bit is set, the emulator has control over the DSP and the external
boot is aborted during debug sessions.
0 = Disable
1 = Force no boot mode
35 Reserved
36 BHO Buffer Hang Override. The global BHO control bit overrides all buffer
hang disable (BHD) bits in the peripherals’ control registers.
0 = No effect
1 = Override peripheral BHD operation
37 Reserved
1 Instruction address and program memory breakpoint negates have an effect latency of 4 core
clock cycles.
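The negate and triggering-mode fields described above can be pictured with a small host-side model. The sketch below is only an illustration: the type and function names are hypothetical, and the non-negated case is assumed to trigger on an inclusive start-to-end address range, which is not stated in this table.

```c
#include <stdint.h>
#include <stdbool.h>

/* Two-bit triggering mode, encoded as in the PA1MODE description. */
typedef enum {
    BP_MODE_DISABLED = 0, /* 00 = breakpoint is disabled */
    BP_MODE_WRITE    = 1, /* 01 = WRITE accesses only */
    BP_MODE_READ     = 2, /* 10 = READ accesses only */
    BP_MODE_ANY      = 3  /* 11 = any access */
} bp_mode;

/* Hypothetical software view of one address breakpoint. */
typedef struct {
    uint32_t start;   /* start register value */
    uint32_t end;     /* end register value */
    bp_mode  mode;    /* triggering mode field */
    bool     negate;  /* NEGxx bit */
} bp_config;

/* Returns true if an access to addr would raise a breakpoint event. */
static bool bp_hits(const bp_config *bp, uint32_t addr, bool is_write)
{
    bool access_ok;

    switch (bp->mode) {
    case BP_MODE_WRITE: access_ok = is_write;  break;
    case BP_MODE_READ:  access_ok = !is_write; break;
    case BP_MODE_ANY:   access_ok = true;      break;
    default:            access_ok = false;     break;
    }
    if (!access_ok)
        return false;

    /* Assumption: without negation the range is inclusive at both ends. */
    bool in_range = (addr >= bp->start) && (addr <= bp->end);

    /* Negated: address greater than end OR less than start. */
    return bp->negate ? !in_range : in_range;
}
```

With negation enabled and the range set to the bounds of an array, any access outside the array triggers the event, which is the index-range check mentioned in the NEGPA1 description.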
1 EMUREADY Indicates that the core has finished executing the previous
emulator instructions
2 INIDLE Indicates that the core was in IDLE prior to the latest emulator
interrupt
7–4 Reserved
before a read of the register returns the updated value. This is referred to
as a read latency of one cycle.
Note that the effect latency and read latency are counted in processor
cycles rather than instruction cycles. Therefore, there may be situations
in which the effect latency is not observed, such as when the
pipeline stalls or when an interrupt breaks the normal sequence of
instructions. In these cases, the effect latency and the read latency are
interpreted as the maximum number of instructions that remain unaffected
by the new settings after a write to a register.
In the SHARC 5-stage pipeline products, effect latencies were intention-
ally added in direct core writes to various registers for backward
compatibility to the 3-stage pipeline products (though these latencies are
not necessitated by the architecture as such). In some cases it is done by
adding stall(s) to the pipeline, whereas in other cases, the execution (actual
write-back to concerned registers) is delayed.
Table A-10 and Table A-11 summarize the number of extra cycles
(latency) for a write to take effect (effect latency) and for a new value to
appear in the register (read latency). A 0 (zero) indicates that the write
takes effect or appears in the register on the next cycle after the write
instruction is executed, and a 1 indicates one extra cycle.
PC Execute address 24 -- --
1 All bits except CAFRZ, U64MAE, IIRAE have one cycle of effect latency.
2 Bits 29–20 are the various mask pointer bits. These bits have one cycle of read latency. Other bits
do not have read latency.
• When the contents of the ASTAT registers are updated by any opera-
tion other than a compute operation, the following instruction
stalls for a cycle, if it performs a conditional branch and the condi-
tion is anything other than NOT LCE. An example is when ASTAT is
explicitly loaded or when the sequencer branches to, or returns
from an ISR involving a PUSH/POP of the status stack.
• The effect latency in the case of a FLAGS register is felt when a con-
ditional instruction dependent on the FLAGS register values is
executed after modifications to the FLAGS register.
BIT SET FLAGS 0x1; /* set FLAG0 */
IF FLAG0_IN R0 = R0+1; /* conditional compute – aborts */
IF FLAG0_IN R0 = R0+1; /* conditional compute – executes */
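The behavior in the fragment above (the first dependent instruction still sees the old value, the second sees the new one) can be mimicked with a small host-side model. This is a sketch only: the structure and function names are hypothetical, and it counts instructions rather than processor cycles, so it ignores stalls and interrupts.

```c
#include <stdint.h>

/* Hypothetical model of a register whose writes take effect late. */
typedef struct {
    uint32_t visible;    /* value that dependent instructions observe */
    uint32_t pending;    /* value written but not yet in effect */
    int      countdown;  /* instructions until pending is visible; -1 = none */
} latency_reg;

/* Write a value that takes effect after effect_latency extra instructions. */
static void reg_write(latency_reg *r, uint32_t value, int effect_latency)
{
    r->pending = value;
    r->countdown = effect_latency;
}

/* Call once per following instruction; returns the value that
   instruction observes. */
static uint32_t reg_observe(latency_reg *r)
{
    if (r->countdown == 0) {
        r->visible = r->pending;  /* the write takes effect now */
        r->countdown = -1;
    } else if (r->countdown > 0) {
        r->countdown--;           /* still inside the latency window */
    }
    return r->visible;
}
```

With an effect latency of 1, the first call to `reg_observe` after a write returns the old value and the second returns the new one, matching the "aborts" and "executes" comments in the assembly fragment. With an effect latency of 0, the first call already returns the new value.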
Interrupt Registers
This section provides information on the registers that are used to config-
ure and control interrupts.
Register diagram fields (all bits reset to 0):
Bit Name Description
0 EMUI Emulator Interrupt
1 RSTI Reset
2 IICDI Illegal Input Condition Detected
3 SOVFI Stack Full/Overflow
4 TMZHI Timer Expired High Priority
5 SPERRI SPORT Error Interrupt
6 BKPI Hardware Breakpoint Interrupt
7 Reserved
8 IRQ2I IRQ2_I Hardware Interrupt
9 IRQ1I IRQ1_I Hardware Interrupt
10 IRQ0I IRQ0_I Hardware Interrupt
11 P0I Programmable Interrupt 0
12 P1I Programmable Interrupt 1
13 P2I Programmable Interrupt 2
14 P3I Programmable Interrupt 3
15 P4I Programmable Interrupt 4
16 P5I Programmable Interrupt 5
17 P14I Programmable Interrupt 14
18 P15I Programmable Interrupt 15
19 P16I Programmable Interrupt 16
20 CB7I DAG1 Circular Buffer 7 Overflow
21 CB15I DAG1 Circular Buffer 15 Overflow
22 TMZLI Timer Expired Low Priority
23 FIXI Fixed-point Overflow
24 FLTOI Floating-point Overflow
25 FLTUI Floating-point Underflow
26 FLTII Floating-point Invalid Operation
27 EMULI Emulator Interrupt
28 SFT0I User Software Interrupt 0
29 SFT1I User Software Interrupt 1
30 SFT2I User Software Interrupt 2
31 SFT3I User Software Interrupt 3
0 (RO) EMUI Emulator Interrupt. An EMUI occurs when the external emulator triggers
an interrupt or the core hits an emulator breakpoint.
Note that this interrupt has the highest priority; it is read-only and
non-maskable.
1 (RO) RSTI Reset Interrupt. An RSTI occurs when an external device asserts the
RESET pin or after a software reset (SYSCTL register). Note that this
interrupt is read-only and non-maskable.
4 TMZHI Core Timer Expired High Priority. A TMZHI occurs when the timer
decrements to zero. Note that this event also triggers a TMZLI. Since
the timer expired event (TCOUNT decrements to zero) generates two
interrupts, TMZHI and TMZLI, programs should unmask the timer
interrupt with the desired priority and leave the other one masked.
7 Reserved
22 TMZLI Core Timer Expired (Low Priority) Interrupt. A TMZLI occurs when
the timer decrements to zero. (Refer to TMZHI)
23 FIXI Fixed-Point Overflow Interrupt. Refer to the status registers for the
execution units (ASTATx/y, STKYx/y).
24 FLTOI Floating-Point Overflow Interrupt. Refer to the status registers for the
execution units (ASTATx/y, STKYx/y).
The MSKP bits in the LIRPTL register, and the entire IMASKP register, are
for interrupt controller use only. Modifying these bits interferes with
interrupt controller operation.
Register diagram fields (all bits reset to 0):
Bit Name Description
0 P6I Programmable Interrupt 6
1 P7I Programmable Interrupt 7
2 P8I Programmable Interrupt 8
3 P9I Programmable Interrupt 9
4 P10I Programmable Interrupt 10
5 P11I Programmable Interrupt 11
6 P12I Programmable Interrupt 12
7 P13I Programmable Interrupt 13
8 P17I Programmable Interrupt 17
9 P18I Programmable Interrupt 18
10 P6IMSK Programmable Interrupt 6 Mask
11 P7IMSK Programmable Interrupt 7 Mask
12 P8IMSK Programmable Interrupt 8 Mask
13 P9IMSK Programmable Interrupt 9 Mask
14 P10IMSK Programmable Interrupt 10 Mask
15 P11IMSK Programmable Interrupt 11 Mask
16 P12IMSK Programmable Interrupt 12 Mask
17 P13IMSK Programmable Interrupt 13 Mask
18 P17IMSK Programmable Interrupt 17 Mask
19 P18IMSK Programmable Interrupt 18 Mask
20 P6IMSKP Programmable Interrupt 6 Mask Pointer
21 P7IMSKP Programmable Interrupt 7 Mask Pointer
22 P8IMSKP Programmable Interrupt 8 Mask Pointer
23 P9IMSKP Programmable Interrupt 9 Mask Pointer
24 P10IMSKP Programmable Interrupt 10 Mask Pointer
25 P11IMSKP Programmable Interrupt 11 Mask Pointer
26 P12IMSKP Programmable Interrupt 12 Mask Pointer
27 P13IMSKP Programmable Interrupt 13 Mask Pointer
28 P17IMSKP Programmable Interrupt 17 Mask Pointer
29 P18IMSKP Programmable Interrupt 18 Mask Pointer
31–30 Reserved
Memory-Mapped Registers
This section describes all IOP core registers which are memory mapped in
the core clock domain.
The SYSCTL register has an effect latency of 1 cycle. If a program writes
to the SYSCTL or BRKCTL register before it accesses external memory, it
must perform at least two non-external accesses before the external
access.
Register diagram fields (all bits reset to 0): SRST (Software Reset), IIVT
(Internal Interrupt Vector Table), DCPR (DMA Channel Priority Rotating), and
IMDW0 through IMDW3 (Internal Memory Block 0 through 3 Data Width).
Bits 29–16 hold processor-specific bit settings. See the product-specific
hardware reference.
0 SRST Software Reset. When set (= 1), this bit resets the processor; the processor
responds to the non-maskable RSTI interrupt and clears (= 0) SRST.
Unlike a hardware reset, a software reset does not reset the PLL and
power management (PMCTL register). The part also boots after a
software reset. After one core clock cycle (effect latency), the registers
are put in their default settings. The RESETOUT pin is asserted for
2 PCLK cycles.
0 = No software reset
1 = Software reset
1 Reserved
2 IIVT Internal Interrupt Vector Table. If this bit is set (= 1), the IVT starts at
an internal RAM address; if cleared (= 0), it starts at an internal ROM
address. The default IIVT bit setting is enabled (= 1) with any valid boot
mode (BOOT_CFGx pins).
If the reserved boot mode is selected, the IIVT bit is cleared (= 0).
6–3 Reserved
7 DCPR DMA Channel Priority Rotating. This bit enables or disables priority
rotation among DMA channels on the DMA peripheral bus (IOD or
IOD0). Permits core writes.
0 = Arbiter uses fixed priority
1 = Arbiter uses rotating priority
8 Reserved
9 IMDW0 Internal Memory Data Width 0. Selects the data access size for internal
memory block0 as 48- or 32-bit data. Permits core writes.
0 = Data bus width is 32 bits
1 = Data bus width is 48 bits
10 IMDW1 Internal Memory Data Width 1. Selects the data access size for internal
memory block1 as 48- or 32-bit data. Permits core writes.
0 = Data bus width is 32 bits
1 = Data bus width is 48 bits
11 IMDW2 Internal Memory Data Width 2. Selects the data access size for internal
memory block2 as 48- or 32-bit data. Permits core writes.
0 = Data bus width is 32 bits
1 = Data bus width is 48 bits
12 IMDW3 Internal Memory Data Width 3. Selects the data access size for internal
memory block3 as 48- or 32-bit data. Permits core writes.
0 = Data bus width is 32 bits
1 = Data bus width is 48 bits
15–13 Reserved
31–30 Reserved
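The bit positions in the listing above translate directly into masks. The following sketch shows one way a program might compose a new register value in C; the macro and function names are hypothetical (not from a vendor header), and the actual read and write of the memory-mapped register is deliberately left out.

```c
#include <stdint.h>

/* Masks built from the bit numbers listed above (illustrative names). */
#define SYSCTL_SRST     (1u << 0)          /* software reset */
#define SYSCTL_IIVT     (1u << 2)          /* internal interrupt vector table */
#define SYSCTL_DCPR     (1u << 7)          /* rotating DMA channel priority */
#define SYSCTL_IMDW(n)  (1u << (9u + (n))) /* data width of block n, n = 0..3 */

/* Returns sysctl with the data width of one internal memory block set to
   48 bits (wide48 != 0) or 32 bits (wide48 == 0); other bits are unchanged. */
static uint32_t sysctl_set_block_width(uint32_t sysctl, unsigned block,
                                       int wide48)
{
    uint32_t mask = SYSCTL_IMDW(block & 3u);
    return wide48 ? (sysctl | mask) : (sysctl & ~mask);
}
```

For example, starting from 0, selecting 48-bit data for block 0 yields 0x200. As noted at the start of this section, a program that writes the new value to SYSCTL should perform at least two non-external accesses before its next external memory access.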
Register diagram fields, bits 31–16:
NEGIA4 Negate Instruction Address Breakpoint #4
NEGIO1 Negate I/O Address Breakpoint #1
ENBPA Enable Program Memory Address Breakpoints
ENBDA Enable Data Memory Breakpoints
ENBIA Enable Instruction Address Breakpoints
ANDBKP AND composite breakpoints
UMODE Enable User Mode Breakpoint
ENBIO1 IOD1 (EP DMA Bus) Breakpoint Enable
ENBIO0 IOD0 (Peripheral DMA Bus) Breakpoint Enable
Reset value per the diagram: bit 0 = 1, all other bits = 0.
9–8 Reserved
18 Reserved
23–22 Reserved
31–28 Reserved
Register diagram fields (all bits reset to 0). STATIO1 is in bits 31–16; the
remaining fields are in bits 15–0:
STATIO1 DMA EP Address Breakpoint Status
STATIO0 DMA Peripheral Address Breakpoint Status
EEMUINENS EEMUIN Interrupt Enable
EEMUENS Enhanced Emulation Feature Enable Status
EEMUINFULL EEMUIN FIFO Full Status
EEMUOUTFULL EEMUOUT FIFO Full Status
EEMUOUTRDY EEMUOUT Valid Data Status
EEMUOUTIRQEN EEMUOUT Interrupt Enable
STATPA Program Memory Breakpoint Status
STATDA0 DM Breakpoint #0 Status
STATDA1 DM Breakpoint #1 Status
STATIA0 Instruction Breakpoint #0 Status
STATIA1 Instruction Breakpoint #1 Status
STATIA2 Instruction Breakpoint #2 Status
STATIA3 Instruction Breakpoint #3 Status
Register Listing
Table A-17 lists all available core non memory-mapped registers and their
reset values. Table A-19 on page A-56 lists all memory-mapped I/O regis-
ters, their reset values and their addresses.
1 All PSAx registers are cleared for the ADSP-2137x products only.
Interrupt Acknowledge
When an interrupt is triggered, the sequencer typically finishes the current
instruction and jumps to the IVT (interrupt vector table). From the IVT,
execution then typically vectors to the ISR. The sequencer jumps
into this routine, executes it, and then exits the routine
by executing the RTI (return from interrupt) instruction. However, this
rule does not apply in all cases. The two interrupt acknowledge
mechanisms used in an ISR for the core are listed below and in
Table B-1:
• RTI instruction
• Clear status bit + RTI instruction
The arithmetic exception unit (computation units) is designed such that,
in order to terminate correctly, the ISR must read the status register to
identify the source of the interrupt. Afterwards, programs must write to
that status bit (clear mechanism) in order to terminate the interrupt
properly.
If the acknowledge mechanism rules are not followed correctly, unwanted
and sporadic interrupts will occur.
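The difference between the two acknowledge mechanisms can be sketched with a simplified host-side model. The model assumes that the interrupt request stays asserted for as long as the corresponding sticky status bit is set; that is a simplification made for illustration, and all names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of an exception source with a sticky status bit. */
typedef struct {
    uint32_t sticky_status;  /* models a sticky status register */
    unsigned isr_entries;    /* number of times the ISR was entered */
} exc_model;

typedef void (*isr_fn)(exc_model *m, uint32_t bit);

/* "Clear status bit + RTI": read the status, clear the source, return. */
static void isr_clear_then_return(exc_model *m, uint32_t bit)
{
    m->isr_entries++;
    if (m->sticky_status & bit)
        m->sticky_status &= ~bit;
}

/* "RTI" only: returns without clearing the sticky bit. */
static void isr_return_only(exc_model *m, uint32_t bit)
{
    (void)bit;
    m->isr_entries++;
}

/* Re-enters the ISR while the request is still asserted, up to a limit.
   Returns the number of ISR entries. */
static unsigned run_until_quiet(exc_model *m, uint32_t bit, isr_fn isr,
                                unsigned max_entries)
{
    m->isr_entries = 0;
    while ((m->sticky_status & bit) != 0u && m->isr_entries < max_entries)
        isr(m, bit);
    return m->isr_entries;
}
```

In this model, the ISR that clears the status bit is entered once, while the ISR that only returns is re-entered until the limit is reached, which corresponds to the unwanted interrupts described above.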
Interrupt Priority
The core related interrupts have a fixed priority and cannot be changed (as
the programmable interrupts for peripherals can).
7 0x1C Reserved
The hidden bit effectively increases the precision of the floating-point sig-
nificand to 24 bits from the 23 bits actually stored in the data format. It
also ensures that the significand of any number in the IEEE normalized
number format is always greater than or equal to one and less than two.
Bit:   31 | 30 ... 23 | 22 ... 0
Field: s  | e7 ... e0 | 1 . f22 ... f0
The IEEE Standard also provides several special data types in the sin-
gle-precision floating-point format:
• An exponent value of 255 (all ones) with a non-zero fraction is a
not-a-number (NAN). NANs are usually used as flags for data flow
control, for the values of uninitialized variables, and for the results
of invalid operations such as 0 × ∞.
• Infinity is represented as an exponent of 255 and a zero fraction.
Note that because the fraction is signed, both positive and negative
infinity can be represented.
• Zero is represented by a zero exponent and a zero fraction. As with
infinity, both positive zero and negative zero can be represented.
The IEEE single-precision floating-point data types supported by the pro-
cessor and their interpretations are summarized in Table C-1.
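The special data types listed above can be recognized directly from the bit fields of a 32-bit word. The sketch below is a host-side illustration in C with hypothetical names. A zero exponent with a non-zero fraction is reported here as an IEEE denormal; how the processor itself interprets such inputs is given by Table C-1, not by this sketch.

```c
#include <stdint.h>

typedef enum {
    FP_CLASS_ZERO,
    FP_CLASS_DENORMAL,
    FP_CLASS_NORMAL,
    FP_CLASS_INFINITY,
    FP_CLASS_NAN
} fp_class;

/* Classify a raw IEEE single-precision bit pattern. */
static fp_class classify_single(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* e7..e0, bits 30..23 */
    uint32_t fraction = bits & 0x7FFFFFu;       /* f22..f0, bits 22..0 */

    if (exponent == 255u)
        return fraction ? FP_CLASS_NAN : FP_CLASS_INFINITY;
    if (exponent == 0u)
        return fraction ? FP_CLASS_DENORMAL : FP_CLASS_ZERO;
    return FP_CLASS_NORMAL;
}

/* Sign bit (bit 31): 0 = positive, 1 = negative. */
static int sign_of_single(uint32_t bits)
{
    return (int)(bits >> 31);
}
```

For instance, 0x7F800000 and 0xFF800000 classify as positive and negative infinity, and 0x80000000 classifies as negative zero.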
Extended-Precision Floating-Point Format
The extended-precision floating-point format is 40 bits wide, with the
same 8-bit exponent as in the IEEE standard format but with a 32-bit sig-
nificand. This format is shown in Figure C-2. In all other respects, the
extended-precision floating-point format is the same as the IEEE standard
format.
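Figure C-2. 40-bit extended-precision floating-point format
Bit:   39 | 38 ... 31 | 30 ... 0
Field: s  | e7 ... e0 | 1 . f30 ... f0

16-bit short floating-point format
Bit:   15 | 14 ... 11 | 10 ... 0
Field: s  | e3 ... e0 | 1 . f10 ... f0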
120 < exp ≤ 135 Exponent is the most significant bit (MSB) of the source exponent
concatenated with the three least significant bits (LSBs) of the source
exponent. The packed fraction is the rounded upper 11 bits of the source
fraction.
109 < exp ≤ 120 Exponent = 0. Packed fraction is the upper bits (source exponent – 110)
of the source fraction prefixed by zeros and the “hidden” one. The
packed fraction is rounded.
0 < exp ≤ 15 Exponent is the 3 LSBs of the source exponent prefixed by the MSB of the
source exponent and four copies of the complement of the MSB. The
unpacked fraction is the source fraction with 12 zeros appended.
exp = 0 Exponent is (120 – N) where N is the number of leading zeros in the source
fraction. The unpacked fraction is the remainder of the source fraction with
zeros appended to pad it and the “hidden” one stripped away.
The short float type supports gradual underflow. This method sacrifices
precision for dynamic range. When packing a number which would have
underflowed, the exponent is set to zero and the mantissa (including
hidden 1) is right-shifted the appropriate amount. The packed result is a
denormal, which can be unpacked into a normal IEEE floating-point
number.
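The two unpacking rules above (0 < exp ≤ 15 and exp = 0) can be followed step by step in a host-side model. The sketch below is an illustration in C, not the processor's implementation: the function name is hypothetical, and mapping an all-zero exponent and fraction to a signed zero is an assumption.

```c
#include <stdint.h>

/* Unpack a 16-bit short float (1 sign, 4 exponent, 11 fraction bits)
   into an IEEE single-precision bit pattern. */
static uint32_t unpack_short_float(uint16_t packed)
{
    uint32_t sign   = ((uint32_t)packed >> 15) & 1u;
    uint32_t exp4   = ((uint32_t)packed >> 11) & 0xFu;
    uint32_t frac11 = (uint32_t)packed & 0x7FFu;
    uint32_t exp8;
    uint32_t frac23;

    if (exp4 != 0u) {
        /* 0 < exp <= 15: MSB, four copies of its complement, 3 LSBs. */
        uint32_t msb    = exp4 >> 3;
        uint32_t middle = msb ? 0x0u : 0xFu;
        exp8   = (msb << 7) | (middle << 3) | (exp4 & 0x7u);
        frac23 = frac11 << 12;            /* append 12 zeros */
    } else if (frac11 != 0u) {
        /* exp = 0 (denormal): exponent is 120 - N, where N is the number
           of leading zeros in the 11-bit fraction. */
        uint32_t n = 0u;
        while (((frac11 << n) & 0x400u) == 0u)
            n++;
        exp8 = 120u - n;
        /* Strip the leading one, left-align the remainder, pad with zeros. */
        frac23 = ((frac11 << (n + 1u)) & 0x7FFu) << 12;
    } else {
        /* Assumption: zero exponent and zero fraction unpack to zero. */
        exp8   = 0u;
        frac23 = 0u;
    }
    return (sign << 31) | (exp8 << 23) | frac23;
}
```

As a check on the exponent rule, a packed exponent of 0111 becomes 0111 1111 (127), so the packed word 0x3800 unpacks to 0x3F800000, which is 1.0. The smallest denormal, 0x0001, has ten leading zeros in its fraction and unpacks with an exponent of 110.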
Fixed-Point Formats
The processor supports two 32-bit fixed-point formats—fractional and
integer. In both formats, numbers can be signed (two’s-complement) or
unsigned. The four possible combinations are shown in Figure C-4. In the
fractional format, there is an implied binary point to the left of the most
significant magnitude bit. In integer format, the binary point is under-
stood to be to the right of the LSB. Note that the sign bit is negatively
weighted in a two’s-complement format.
If one operand is signed and the other unsigned, the result is signed. If
both inputs are signed, the result is signed and automatically shifted left
one bit. The LSB becomes zero and bit 62 moves into the sign bit posi-
tion. Normally bit 63 and bit 62 are identical when both operands are
signed. (The only exception is full-scale negative multiplied by itself.)
Thus, the left-shift normally removes a redundant sign bit, increasing the
precision of the most significant product. Also, if the data format is frac-
tional, a single bit left-shift renormalizes the MSP to a fractional format.
The signed formats with and without left-shifting are shown in
Figure C-5.
ALU outputs have the same width and data format as the inputs. The
multiplier, however, produces a 64-bit product from two 32-bit inputs. If
both operands are unsigned integers, the result is a 64-bit unsigned
integer. If both operands are unsigned fractions, the result is a 64-bit
unsigned fraction. These formats are shown in Figure C-5.
The multiplier has an 80-bit accumulator to allow the accumulation of
64-bit products. For more information on the multiplier and accumula-
tor, see “Multiplier” on page 3-13.
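The left shift applied to signed products can be reproduced on a host with 64-bit arithmetic. The following C sketch is illustrative only (the function names are hypothetical, and results are returned as raw bit patterns): it multiplies two signed 32-bit fractional operands and shifts the 64-bit product left by one bit.

```c
#include <stdint.h>

/* Multiply two signed fractional (1.31) operands and apply the one-bit
   left shift, returning the 64-bit result as a raw bit pattern. */
static uint64_t mul_signed_frac_bits(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;  /* two sign bits, 62 fraction bits */
    return (uint64_t)product << 1;              /* LSB becomes 0, bit 62 moves to bit 63 */
}

/* Most significant product (MSP): upper 32 bits of the shifted result. */
static uint32_t product_msp_bits(uint64_t product_bits)
{
    return (uint32_t)(product_bits >> 32);
}
```

For example, 0.5 × 0.5 (0x40000000 × 0x40000000) gives an MSP of 0x20000000, which is 0.25 in the signed fractional format. The exception mentioned above also shows up: full-scale negative multiplied by itself gives an MSP of 0x80000000, because bit 63 and bit 62 of the unshifted product differ in that one case.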
Figure C-4. 32-bit fixed-point formats (bit weights)
Format               Bit 31   Bit 30   Bit 29   ...   Bit 2    Bit 1    Bit 0
Signed integer       -2^31    2^30     2^29     ...   2^2      2^1      2^0
Signed fractional    -2^0     2^-1     2^-2     ...   2^-29    2^-30    2^-31
Unsigned integer     2^31     2^30     2^29     ...   2^2      2^1      2^0
Unsigned fractional  2^-1     2^-2     2^-3     ...   2^-30    2^-31    2^-32
In the signed formats, bit 31 is the sign bit. The binary point is to the
right of bit 0 in the integer formats, to the right of the sign bit in the
signed fractional format, and to the left of bit 31 in the unsigned
fractional format.

Figure C-5. 64-bit fixed-point product formats (bit weights)
Format                              Bit 63   Bit 62   Bit 61   ...   Bit 2    Bit 1    Bit 0
Unsigned integer                    2^63     2^62     2^61     ...   2^2      2^1      2^0
Unsigned fractional                 2^-1     2^-2     2^-3     ...   2^-62    2^-63    2^-64
Signed integer, no left shift       -2^63    2^62     2^61     ...   2^2      2^1      2^0
Signed fractional, with left shift  -2^0     2^-1     2^-2     ...   2^-61    2^-62    2^-63
In the signed formats, bit 63 is the sign bit.
Alternate Registers.
See index registers on page G-7.
Arithmetic Logic Unit (ALU).
This part of a processing element performs arithmetic and logic operations
on fixed-point and floating-point data.
Asynchronous Transfers.
Communications in which data can be transmitted intermittently rather
than in a steady stream.
Barrel Shifter.
This part of a processing element completes logical shifts, arithmetic
shifts, bit manipulation, field deposit, and field extraction operations on
32-bit operands. Also, the shifter can derive exponents.
Base Address.
The starting address of a circular buffer to which the DAG wraps around.
This address is stored in a DAG Bx register.
Base Register.
A base (Bx) register is a data address generator (DAG) register that sets up
the starting address for a circular buffer.
Bit-Reverse Addressing.
The data address generator (DAG) provides a bit-reversed address during
a data move without reversing the stored address.
Boot Modes.
The boot mode determines how the processor starts up (loads its initial
code). The ADSP-2136x processors can boot from their SPI port or through
their parallel port via an EPROM.
Broadcast Data Moves.
The data address generator (DAG) performs dual data moves to comple-
mentary registers in each processing element to support SIMD mode.
Bus Slave or Slave Mode.
The ADSP-21368/ADSP-2146x processors can be a bus slave to another
processor. The current processor becomes a bus slave when the BR signal of
the requester is asserted.
Cache Entry.
The smallest unit of memory that is transferred to/from the next level of
memory from/to a cache as a result of a cache miss.
Cache Hit.
A memory access that is satisfied by a valid, present entry in the cache.
Cache Miss.
A memory access that does not match any valid entry in the cache.
DMA Chaining.
The processor supports chaining together multiple DMA sequences. In
chained DMA, the I/O processor loads the next transfer control block
(DMA parameters) into the DMA parameter registers when the current
DMA finishes and auto-initializes the next DMA sequence.
DMA Parameter Registers.
These registers function similarly to data address generator registers, set-
ting up a memory access process. These registers include internal index
registers, internal modify registers, count registers, chain pointer registers,
external index registers, external modify registers, and external count
registers.
DMA TCB Chain Loading.
This is the process that the I/O processor uses for loading the TCB of the
next DMA sequence into the parameter registers during chained DMA.
Edge-Sensitive Interrupt.
The processor detects this type of interrupt if the input signal is high
(inactive) on one cycle and low (active) on the next cycle when sampled on
the rising edge of clock.
Endian Format, Little Versus Big.
The processor uses big-endian format—moves data starting with most-sig-
nificant-bit and finishing with least-significant-bit—in almost all
instances. There are some exceptions (such as serial port operations) which
provide both little-endian and big-endian format support to ensure their
compatibility with different devices.
Flag Update.
The processor’s update to status flags occurs at the end of the cycle in
which the status is generated and is available on the next cycle.
General-Purpose Input/Output Pins.
See programmable flag pins.
Harvard Architecture.
Processors use memory architectures that have separate buses for program
and data storage. The two buses let the processor get a data word and an
instruction simultaneously.
I/O Processor Register.
One of the control, status, or data buffer registers of the processor's
on-chip I/O processor.
IDLE.
An instruction that causes the processor to cease operations, holding its
current state until an interrupt occurs. Then, the processor services the
interrupt and continues normal execution.
Index Registers.
An index register is a data address generator (DAG) register that holds an
address and acts as a pointer to memory.
Indirect Branches.
These are JUMP or CALL instructions that use a dynamic—changes at run-
time—address that comes from the PM data address generator.
Inexact Flags.
An exception flag whose bit position is inexact.
Input Clock.
Device that generates a steady stream of timing signals to provide the fre-
quency, duty cycle, and stability to allow accurate internal clock
multiplication via the phase locked loop (PLL) module.
Interleaved Data.
SIMD mode requires a special memory layout since the implicit modifier
is 1 or 2 based on NW or SW addresses. This then requires data to be in
an interleaved organization in the memory layout.
Internal Memory Space.
Internal memory space refers to the processor’s on-chip SRAM and mem-
ory-mapped registers.
Interrupts.
Subroutines in which a runtime event (not an instruction) triggers the exe-
cution of the routine.
JTAG Port.
This port supports the IEEE standard 1149.1 Joint Test Action Group
(JTAG) standard for system test. This standard defines a method for seri-
ally scanning the I/O status of each component in a system. This interface
is also used for processor debug.
Jumps.
Program flow transfers permanently to another part of program memory.
Latency.
Latency of memory access is the time between when an address is posted
on the address bus and the core receives data on the corresponding data
bus.
Length Registers.
A length register is a data address generator (DAG) register that sets up the
range of addresses for a circular buffer.
Level-Sensitive Interrupts.
The processor detects this type of interrupt if the signal input is low
(active) when sampled on the rising edge of clock.
Loops.
One sequence of instructions executes several times with zero overhead.
Memory Blocks and Banks.
The processor’s internal memory is divided into blocks that are each
associated with different data address generators. The processor’s external
memory space is divided into banks, which may be addressed by either
data address generator.
Modified Addressing.
The DAG generates an address that is incremented by a value or a register.
Modify Instruction.
The data address generator (DAG) increments the stored address without
performing a data move.
Modify Registers.
A modify register is a data address generator (DAG) register that provides
the increment or step size by which an index register is pre- or post-modi-
fied during a register move.
Multifunction Computations.
Using the many parallel data paths within its computational units, the
processor supports parallel execution of multiple computational instruc-
tions. These instructions complete in a single cycle, and they combine
parallel operation of the multiplier and the ALU or dual ALU functions.
The multiple operations perform the same as if they were in correspond-
ing single-function computations.
Multiplier.
This part of a processing element does floating-point and fixed-point mul-
tiplication and executes fixed-point multiply/add and multiply/subtract
operations.
Nonzero numbers.
Nonzero, finite numbers are divided into two classes: normalized and
denormalized.
Neighbor Data Registers.
In long word addressed accesses, the processor moves data to or from two
neighboring data registers. The least significant 32 bits move to or from
the explicit (named) register in the neighbor register pair. In forced long
word accesses (normal word address with LW mnemonic), the processor
converts the normal word address to long word, placing the even normal
word location in the explicit register and the odd normal word location in
the other register in the neighbor pair.
Peripherals.
This refers to everything outside the processor core. The SHARC proces-
sors’ peripherals include internal memory, parallel port, I/O processor,
JTAG port, and any external devices that connect to the processor.
Detailed information about the peripherals is found in the product-spe-
cific hardware reference.
Peripheral Clock.
The peripheral clock controls the processor’s peripherals and is defined as
(Peripheral) Clock Period = 2 × tCCLK.
Post-Modify Addressing.
The data address generator (DAG) provides an address during a data move
and auto-increments the stored address for the next move.
Precision.
The precision of a floating-point number depends on the number of bits
after the binary point in the storage format for the number. The processor
supports two high precision floating-point formats: 32-bit IEEE sin-
gle-precision floating-point (which uses 8 bits for the exponent and 24
bits for the mantissa) and a 40-bit extended precision version of the IEEE
format.
Pre-Modify Addressing.
The data address generator (DAG) provides a modified address during a
data move without incrementing the stored address.
Register File.
This is the set of registers that transfer data between the data buses and the
computation units and DAGs. These registers also provide local storage
for operands and results.
Register Swaps.
This special type of register-to-register move instruction uses the special
swap operator, <->. A register-to-register swap occurs when registers in
different processing elements exchange values.
ROM (Read-Only Memory).
A data storage device manufactured with fixed contents. This term is most
often used to refer to non-volatile semiconductor memory.