ARM Architecture
Dr.R.Murugan
Assistant Professor
Department of Electronics and Communication
Engineering
National Institute of Technology Silchar
Agenda
Introduction
Architecture
Programmers Model
Instruction Set
GPIO Ports
History of ARM
• ARM started as a new, powerful CPU design to replace the 8-bit 6502 in
  Acorn Computers machines (Cambridge, UK, 1985); the name originally stood
  for Acorn RISC Machine
• First models had only a 26-bit program counter, limiting the memory space
  to 64 MB (not much by today's standards, but a lot at that time)
• 1990 spin-off: ARM renamed Advanced RISC Machines
• ARM now focuses on embedded CPU cores
• IP licensing: almost every silicon manufacturer sells some microcontroller
  with an ARM core. Some even compete with their own designs.
• Processing power with low current consumption
  – Good MIPS/Watt figure
  – Ideal for portable devices
• Compact code: 16-bit opcodes (Thumb)
• New cores with added features
  – Harvard architecture (ARM9, ARM11, Cortex)
  – Floating point arithmetic
  – Vector computing (VFP, NEON)
  – Java bytecode execution (Jazelle)
Facts
• 32-bit CPU
• 3-operand instructions (typical): ADD Rd,Rn,Operand2
• RISC design…
  – Few, simple instructions
  – Load/store architecture (instructions operate on registers, not memory)
  – Large register set
  – Pipelined execution
• …although with some CISC touches…
  – Multiplication and Load/Store Multiple are complex instructions (they take
    many more cycles than regular RISC instructions)
• …and some very specific details
  – No hardware stack for return addresses; a link register is used instead
  – PC accessible as a regular register
  – Conditional execution of all instructions (in ARM state)
  – Flags altered or not by data processing instructions (selectable)
  – Concurrent shifts/rotations (performed in the same instruction as other processing)
  – …
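Topologies
[Figure: two bus topologies.
 Von Neumann (ARM7s and older): instructions and data share a single AHB bus to memory and I/O.
 Harvard (ARM9s and newer): separate instruction and data paths, each with its own cache (I-cache and D-cache), meet at a bus interface that connects to memory and I/O over the AHB bus.
 I/O is memory-mapped.]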
Architecture
[Figure: core datapath block diagram.
 Blocks: address register, address incrementer, register bank (including the PC), multiplier, barrel shifter, ALU, instruction decoder with control lines, instruction register, Thumb-to-ARM translator, write data register, read data register.
 Buses: PC bus, ALU bus, A bus, B bus, D[31:0].]
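ARM Pipelining examples
ARM7TDMI pipeline: 3 stages (Fetch, Decode, Execute), each taking 1 clock cycle
ARM9TDMI pipeline: 5 stages (Fetch, Decode, Execute, Memory, Write), each taking 1 clock cycle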
ARM7TDMI Pipelining (I)
ARM7TDMI Pipelining (II)
• More complex instructions:
ARM9
Arithmetic and Carry Flag
Data Sizes and Instruction Sets
Processor Modes
The Registers
The ARM Register Set
[Figure: the register set visible in each processor mode, with a single cpsr and one spsr per exception mode.]
Special Registers
SP (R13): Stack Pointer. The ARM instruction set has no dedicated hardware
stack. Even so, R13 is conventionally reserved as the pointer for the
program-managed stack
CPSR: Current Program Status Register. Holds the currently visible status register
SPSR: Saved Program Status Register. Holds a copy of the previous status
register while executing exception or interrupt routines
- It is copied back to CPSR on the return from the exception or interrupt
- No SPSR available in User or System modes
Register Organization Summary
Mode        Shared with User mode                          Banked (private to the mode)
User, SYS   r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr     none (no spsr)
FIQ         r0-r7, r15 (pc), cpsr                          r8-r12, r13 (sp), r14 (lr), spsr
IRQ         r0-r12, r15 (pc), cpsr                         r13 (sp), r14 (lr), spsr
SVC         r0-r12, r15 (pc), cpsr                         r13 (sp), r14 (lr), spsr
Undef       r0-r12, r15 (pc), cpsr                         r13 (sp), r14 (lr), spsr
Abort       r0-r12, r15 (pc), cpsr                         r13 (sp), r14 (lr), spsr
Program Status Registers
Bit:     31 30 29 28 | 27 ... 8  | 7 6 5 | 4 ... 0
Content:  N  Z  C  V | undefined | I F T | mode
Fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
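The bit layout of the status registers can be made concrete with a small decoder. The following is a minimal host-side sketch in C, not part of the original slides; the type `psr_fields` and the function names are hypothetical, and the sample value 0x600000D3 is chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a program status register (CPSR or SPSR). */
typedef struct {
    bool n, z, c, v;    /* condition flags, bits 31..28 */
    bool irq_disabled;  /* I, bit 7 */
    bool fiq_disabled;  /* F, bit 6 */
    bool thumb;         /* T, bit 5 */
    uint8_t mode;       /* mode field, bits 4..0 */
} psr_fields;

/* Split a 32-bit PSR value into the fields shown in the layout. */
psr_fields psr_decode(uint32_t psr)
{
    psr_fields f;
    f.n = ((psr >> 31) & 1u) != 0u;
    f.z = ((psr >> 30) & 1u) != 0u;
    f.c = ((psr >> 29) & 1u) != 0u;
    f.v = ((psr >> 28) & 1u) != 0u;
    f.irq_disabled = ((psr >> 7) & 1u) != 0u;
    f.fiq_disabled = ((psr >> 6) & 1u) != 0u;
    f.thumb = ((psr >> 5) & 1u) != 0u;
    f.mode = (uint8_t)(psr & 0x1Fu);
    return f;
}

/* Usage: 0x600000D3 has Z and C set, I and F set, T clear, mode bits 0x13. */
void psr_decode_example(void)
{
    psr_fields f = psr_decode(0x600000D3u);
    assert(!f.n && f.z && f.c && !f.v);
    assert(f.irq_disabled && f.fiq_disabled && !f.thumb);
    assert(f.mode == 0x13u);
}
```

The I and F bits are interrupt masks, so a set bit means the corresponding interrupt is disabled.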
Program Counter (R15)
Exception Handling
Instruction Set (for ARM state)
Conditional Execution and Flags
Condition Codes
Examples of conditional execution
Use a sequence of several conditional instructions
if (a==0) func(1);
CMP r0,#0      @ compare a (held in r0) with 0
MOVEQ r0,#1    @ if equal, load the argument 1
BLEQ func      @ if equal, call func
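How the condition field decides whether MOVEQ and BLEQ execute can be sketched as a predicate over the N, Z, C and V flags. This is an illustrative C model, not taken from the slides; `flags`, `cond_passed` and `cmp_with_zero` are hypothetical names, and the numeric condition codes follow the standard ARM encoding (0x0 = EQ through 0xE = AL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags;

/* True if an instruction with condition field `cond` (bits 31..28) executes. */
bool cond_passed(uint8_t cond, flags f)
{
    switch (cond & 0xFu) {
    case 0x0: return f.z;                 /* EQ */
    case 0x1: return !f.z;                /* NE */
    case 0x2: return f.c;                 /* CS/HS */
    case 0x3: return !f.c;                /* CC/LO */
    case 0x4: return f.n;                 /* MI */
    case 0x5: return !f.n;                /* PL */
    case 0x6: return f.v;                 /* VS */
    case 0x7: return !f.v;                /* VC */
    case 0x8: return f.c && !f.z;         /* HI */
    case 0x9: return !f.c || f.z;         /* LS */
    case 0xA: return f.n == f.v;          /* GE */
    case 0xB: return f.n != f.v;          /* LT */
    case 0xC: return !f.z && f.n == f.v;  /* GT */
    case 0xD: return f.z || f.n != f.v;   /* LE */
    case 0xE: return true;                /* AL */
    default:  return false;               /* 0xF is reserved; modeled here as "never" */
    }
}

/* Flags left by CMP a,#0 (computes a - 0 and discards the result). */
flags cmp_with_zero(uint32_t a)
{
    flags f;
    f.n = (a >> 31) != 0u;
    f.z = (a == 0u);
    f.c = true;   /* a - 0 never borrows; ARM sets C = NOT borrow */
    f.v = false;  /* subtracting 0 cannot overflow */
    return f;
}

/* Usage: with a == 0 the EQ-conditioned MOVEQ and BLEQ both execute. */
void conditional_example(void)
{
    flags f = cmp_with_zero(0u);
    assert(cond_passed(0x0u, f));
    assert(!cond_passed(0x1u, f));
}
```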
Data processing Instructions
Consist of:
Arithmetic: ADD ADC SUB SBC RSB RSC
Logical: AND ORR EOR BIC
Comparisons: CMP CMN TST TEQ
Data movement: MOV MVN
[Figure: shift and rotate operations, showing how bits move between the destination and the carry flag (CF), with 0 shifted in where applicable.]
Using the Barrel Shifter: The Second Operand
[Figure: Operand 1 feeds the ALU directly; Operand 2 passes through the barrel shifter before reaching the ALU; the ALU produces the result.]
Register, optionally with shift operation
  Shift value can be either:
    5-bit unsigned integer
    Specified in bottom byte of another register
  Used for multiplication by a power of 2
  Example: ADD R1, R2, R3, LSL #2
Immediate value
  8-bit number, with a range of 0-255
  Rotated right through an even number of positions
  Allows an increased range of 32-bit constants to be loaded directly into registers
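The effect of a shifted register operand, as in ADD R1, R2, R3, LSL #2, can be sketched in C. This is a simplified model and not the slides' own material: the names `shift_type`, `barrel_shift` and `add_shifted` are hypothetical, shift amounts are limited to 0-31, and an amount of 0 is treated as "no shift" (the real encoding gives some zero amounts special meanings, which are ignored here).

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_type;

/* Apply a shift of 0..31 positions to the Operand2 register value. */
uint32_t barrel_shift(uint32_t value, shift_type type, unsigned amount)
{
    assert(amount < 32u);
    if (amount == 0u) {
        return value;
    }
    switch (type) {
    case SHIFT_LSL:
        return value << amount;
    case SHIFT_LSR:
        return value >> amount;
    case SHIFT_ASR:
        /* Logical shift, then fill the vacated top bits with the sign bit. */
        return (value >> amount) |
               ((value & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0u);
    case SHIFT_ROR:
        return (value >> amount) | (value << (32u - amount));
    }
    return value;
}

/* Model of ADD Rd, Rn, Rm, <shift> #amount: the shift happens on the way into the ALU. */
uint32_t add_shifted(uint32_t rn, uint32_t rm, shift_type type, unsigned amount)
{
    return rn + barrel_shift(rm, type, amount);
}
```

With R2 = 10 and R3 = 3, `add_shifted(10, 3, SHIFT_LSL, 2)` yields 22, that is R2 + 4 * R3, which is the "multiplication by a power of 2" use mentioned on the slide.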
Immediate constants (1)
Bits 11-8: rot
Bits 7-0:  immed_8
[Figure: immed_8 passes through the shifter, which applies ROR by rot x 2.]
Quick Quiz: 0xe3a004ff is MOV r0, #???
The 4-bit rotate value (0-15) is multiplied by two to give a range of 0-30 in steps of 2
The rule to remember is “8 bits rotated right by an even number of bit positions”.
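The rot/immed_8 rule can be checked with a short C sketch that expands a 12-bit operand field and searches for an encoding of a given constant. This is an illustration rather than the slides' own code; `imm12_decode` and `imm12_encode` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n positions (n taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Expand the 12-bit field: rot in bits 11..8, immed_8 in bits 7..0. */
uint32_t imm12_decode(uint16_t imm12)
{
    unsigned rot = (imm12 >> 8) & 0xFu;
    uint32_t imm8 = imm12 & 0xFFu;
    return ror32(imm8, 2u * rot);
}

/* Search for a rot/immed_8 pair producing `value`; returns false if none exists. */
bool imm12_encode(uint32_t value, uint16_t *imm12)
{
    assert(imm12 != NULL);
    for (unsigned rot = 0u; rot < 16u; rot++) {
        /* Rotating left by 2*rot undoes the rotate right applied by the core. */
        uint32_t imm8 = ror32(value, (32u - 2u * rot) & 31u);
        if (imm8 <= 0xFFu) {
            *imm12 = (uint16_t)((rot << 8) | imm8);
            return true;
        }
    }
    return false;
}
```

Applied to the quiz word, the low 12 bits 0x4ff have rot = 4 and immed_8 = 0xff, so `imm12_decode(0x4ff)` gives 0xFF000000.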
Immediate constants (2)
Examples:
ror #0: range 0-0x000000ff, step 0x00000001
Loading 32 bit constants
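No single ARM instruction can hold an arbitrary 32-bit immediate, so the
assembler provides the pseudo-instruction LDR Rd,=const. It will either:
Produce a MOV or MVN instruction, when the constant can be encoded as an immediate,
or
Generate a LDR instruction with a PC-relative address to read the constant
from a literal pool (constant data area embedded in the code).
For example
LDR r0,=0xFF => MOV r0,#0xFF
LDR r0,=0x55555555 => LDR r0,[PC,#Imm12]
…
…
DCD 0x55555555
This is the recommended way of loading constants into a register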
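The choice an assembler makes for LDR Rd,=const can be sketched as a three-way decision. The C below is a hedged model of that decision only, assuming the usual rule that a constant is loadable by MOV if it is an 8-bit value rotated right by an even amount, and by MVN if its bitwise complement is; `load_strategy` and `choose_load` are hypothetical names, not an assembler API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { LOAD_MOV, LOAD_MVN, LOAD_LITERAL_POOL } load_strategy;

/* True if v is an 8-bit value rotated right by an even number of positions. */
static bool is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0u; rot < 32u; rot += 2u) {
        /* Rotate left by `rot` to undo a rotate right by `rot`. */
        uint32_t undone = rot ? (v << rot) | (v >> (32u - rot)) : v;
        if (undone <= 0xFFu) {
            return true;
        }
    }
    return false;
}

/* Decide how LDR Rd,=constant could be realised. */
load_strategy choose_load(uint32_t constant)
{
    if (is_arm_immediate(constant)) {
        return LOAD_MOV;
    }
    if (is_arm_immediate((uint32_t)~constant)) {
        return LOAD_MVN;
    }
    return LOAD_LITERAL_POOL;
}

/* Usage: the two constants from the slide. */
void choose_load_example(void)
{
    assert(choose_load(0xFFu) == LOAD_MOV);
    assert(choose_load(0x55555555u) == LOAD_LITERAL_POOL);
}
```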
Loading addresses: ADR
Data processing instr. FLAGS
Flags are changed only if the S bit of the op-code is set:
  Mnemonics ending with “s”, like “movs”, and comparisons: cmp, cmn, tst, teq
N and Z have the expected meaning for all instructions
  N: bit 31 (sign) of the result
  Z: set if result is zero
Logical instructions (AND, EOR, TST, TEQ, ORR, MOV, BIC, MVN)
  V: unchanged
  C: from barrel shifter if shift ≠ 0. Unchanged otherwise
Arithmetic instructions (SUB, RSB, ADD, ADC, SBC, RSC, CMP, CMN)
  V: Signed overflow from ALU
  C: Carry (bit 32 of result) from ALU
When PC is the destination register (exception return)
  CPSR is copied from SPSR. This includes all the flags.
  No change in user or system modes
  Example: SUBS PC,LR,#4 @ return from IRQ
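The flag rules for arithmetic instructions can be sketched for ADDS and SUBS. This C model is illustrative only; `flags`, `adds` and `subs` are hypothetical names. One detail worth stating as an assumption drawn from the ARM architecture rather than from the slide: a subtraction is performed as a + NOT(b) + 1, so after SUBS or CMP the C flag is set when no borrow occurred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } flags;

/* Model of ADDS: returns a + b and updates N, Z, C, V. */
uint32_t adds(uint32_t a, uint32_t b, flags *f)
{
    assert(f != NULL);
    uint64_t wide = (uint64_t)a + (uint64_t)b;
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) != 0u;
    f->z = (r == 0u);
    f->c = ((wide >> 32) & 1u) != 0u;               /* bit 32 of the result */
    f->v = (((~(a ^ b)) & (a ^ r)) >> 31) != 0u;    /* same-sign inputs, different-sign result */
    return r;
}

/* Model of SUBS: returns a - b and updates N, Z, C, V. */
uint32_t subs(uint32_t a, uint32_t b, flags *f)
{
    assert(f != NULL);
    uint32_t r = a - b;
    f->n = (r >> 31) != 0u;
    f->z = (r == 0u);
    f->c = (a >= b);                                /* C = NOT borrow */
    f->v = (((a ^ b) & (a ^ r)) >> 31) != 0u;       /* different-sign inputs, result sign differs from a */
    return r;
}
```

For example, `adds(0x7FFFFFFF, 1, &f)` returns 0x80000000 with N and V set and C clear, while `subs(5, 5, &f)` returns 0 with Z and C set.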
Multiply
Syntax:
MUL{<cond>}{S} Rd, Rm, Rs                  Rd = Rm * Rs
MLA{<cond>}{S} Rd, Rm, Rs, Rn              Rd = (Rm * Rs) + Rn
[U|S]MULL{<cond>}{S} RdLo, RdHi, Rm, Rs    RdHi,RdLo := Rm * Rs
[U|S]MLAL{<cond>}{S} RdLo, RdHi, Rm, Rs    RdHi,RdLo := (Rm * Rs) + RdHi,RdLo
Cycle time
  Basic MUL instruction
    2-5 cycles on ARM7TDMI
    1-3 cycles on StrongARM/XScale
    2 cycles on ARM9E/ARM102xE
  +1 cycle for ARM9TDMI (over ARM7TDMI)
  +1 cycle for accumulate (not on 9E, though result delay is one cycle longer)
  +1 cycle for “long”
Above are “general rules” - refer to the TRM for the core you are using
for the exact details
Branch instructions
Bits 31-28: Cond
Bits 27-25: 1 0 1
Bit 24:     L
Bits 23-0:  Offset
The processor core shifts the offset field left by 2 positions, sign-extends
it and adds it to the PC
  ± 32 Mbyte range
  How to perform longer branches or absolute address branches?
  solution: LDR PC,…
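The offset arithmetic for B and BL can be sketched in C. The model below is illustrative and not from the slides; `branch_target` and `branch_encode` are hypothetical names. It assumes the ARM-state convention that the PC value used in the addition is the address of the branch instruction plus 8, and it encodes with the "always" condition (0xE) only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Target of a B/BL instruction word located at instr_addr. */
uint32_t branch_target(uint32_t instr_addr, uint32_t instr)
{
    uint32_t off24 = instr & 0x00FFFFFFu;
    uint32_t offset = off24 << 2;          /* shift left by 2 positions */
    if (off24 & 0x00800000u) {
        offset |= 0xFC000000u;             /* sign-extend the 26-bit byte offset */
    }
    return instr_addr + 8u + offset;       /* PC reads as instruction address + 8 */
}

/* Build a B (link == false) or BL (link == true) word; false if target is out of range. */
bool branch_encode(uint32_t instr_addr, uint32_t target, bool link, uint32_t *instr)
{
    assert(instr != NULL);
    uint32_t delta = target - (instr_addr + 8u);
    if (delta & 3u) {
        return false;                      /* offset must be a multiple of 4 */
    }
    /* In range only if bits 31..25 are all copies of the sign bit. */
    uint32_t top = delta & 0xFE000000u;
    if (top != 0u && top != 0xFE000000u) {
        return false;
    }
    *instr = 0xEA000000u | (link ? 0x01000000u : 0u) | ((delta >> 2) & 0x00FFFFFFu);
    return true;
}
```

A branch to itself at any address encodes as 0xEAFFFFFE: the offset field is -2 words, which cancels the +8 of the PC.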
ARM Branches and Subroutines
BL <subroutine>
  Stores return address in LR
  Returning implemented by restoring the PC from LR
  For non-leaf subroutines, LR will have to be stacked

                    func1                      func2
  :                 STMFD sp!,{regs,lr}        :
  :                 :                          :
  BL func1          BL func2                   :
  :                 :                          :
  :                 LDMFD sp!,{regs,pc}        MOV pc, lr

Syntax:
LDR{<cond>}{<size>} Rd, <address>
STR{<cond>}{<size>} Rd, <address>
e.g. LDREQB
Address accessed
Pre or Post Indexed Addressing?
Pre-indexed: STR r0,[r1,#12]
[Figure: the base register r1 holds 0x200 and the offset is 12, so the source register r0 (0x5) is stored at address 0x20c.]
Base-update possible:
LDM r10!,{r0-r6}
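The difference between pre-indexed and post-indexed stores can be sketched against a small simulated memory. This C model is illustrative and not from the slides: the memory array, `str_pre`, `str_post` and `load_word` are hypothetical, and the writeback form `STR r0,[r1,#12]!` and post-indexed form `STR r0,[r1],#12` are standard ARM syntax that the surviving slide text does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_BASE  0x200u
#define MEM_WORDS 16u                      /* simulated memory 0x200..0x23f */
static uint32_t mem[MEM_WORDS];

/* Pointer to the simulated word at a word-aligned address inside the window. */
static uint32_t *word_at(uint32_t addr)
{
    assert(addr >= MEM_BASE && addr < MEM_BASE + 4u * MEM_WORDS);
    assert((addr & 3u) == 0u);
    return &mem[(addr - MEM_BASE) / 4u];
}

uint32_t load_word(uint32_t addr)
{
    return *word_at(addr);
}

/* STR rd,[rn,#off] (writeback false) or STR rd,[rn,#off]! (writeback true). */
void str_pre(uint32_t rd, uint32_t *rn, int32_t off, bool writeback)
{
    uint32_t addr = *rn + (uint32_t)off;   /* offset applied before the access */
    *word_at(addr) = rd;
    if (writeback) {
        *rn = addr;
    }
}

/* STR rd,[rn],#off: access at the base address, then update the base. */
void str_post(uint32_t rd, uint32_t *rn, int32_t off)
{
    *word_at(*rn) = rd;
    *rn += (uint32_t)off;
}
```

With r1 = 0x200 and r0 = 0x5, `str_pre(5, &r1, 12, false)` stores 0x5 at 0x20c and leaves r1 unchanged, matching the slide's example; `str_post(5, &r1, 12)` stores at 0x200 and leaves r1 = 0x20c.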
LDM/STM for Stack Operations
Traditionally, a stack grows down in memory, with the last “pushed”
value at the lowest address. The ARM also supports ascending stacks,
where the stack structure grows up through memory.
The value of the stack pointer can either:
• Point to the last occupied address (Full stack)
– and so needs pre-decrementing/incrementing (i.e. before the push)
• Point to an unoccupied address (Empty stack)
– and so needs post-decrementing/incrementing (i.e. after the push)
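The four combinations of full/empty and descending/ascending can be sketched as single-word push and pop operations on a simulated memory. This is an illustrative C model, not the slides' code; `stack_type`, `stack_push` and `stack_pop` are hypothetical names, and the memory window 0x3e8 to 0x418 is chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { STACK_FD, STACK_ED, STACK_FA, STACK_EA } stack_type;

#define MEM_BASE  0x3e8u
#define MEM_WORDS 13u                      /* simulated memory 0x3e8..0x418 inclusive */
static uint32_t mem[MEM_WORDS];

/* Pointer to the simulated word at a word-aligned address inside the window. */
static uint32_t *word_at(uint32_t addr)
{
    assert(addr >= MEM_BASE && addr < MEM_BASE + 4u * MEM_WORDS);
    assert((addr & 3u) == 0u);
    return &mem[(addr - MEM_BASE) / 4u];
}

/* Push one word. Full stacks move SP before the store, empty stacks after it. */
void stack_push(stack_type t, uint32_t *sp, uint32_t value)
{
    switch (t) {
    case STACK_FD: *sp -= 4u; *word_at(*sp) = value; break;  /* pre-decrement */
    case STACK_ED: *word_at(*sp) = value; *sp -= 4u; break;  /* post-decrement */
    case STACK_FA: *sp += 4u; *word_at(*sp) = value; break;  /* pre-increment */
    case STACK_EA: *word_at(*sp) = value; *sp += 4u; break;  /* post-increment */
    }
}

/* Pop one word: the exact inverse of stack_push for the same stack type. */
uint32_t stack_pop(stack_type t, uint32_t *sp)
{
    uint32_t value = 0u;
    switch (t) {
    case STACK_FD: value = *word_at(*sp); *sp += 4u; break;
    case STACK_ED: *sp += 4u; value = *word_at(*sp); break;
    case STACK_FA: value = *word_at(*sp); *sp -= 4u; break;
    case STACK_EA: *sp -= 4u; value = *word_at(*sp); break;
    }
    return value;
}
```

Starting from SP = 0x400, two pushes leave SP at 0x3f8 for both descending types and at 0x408 for both ascending types; what differs between full and empty is whether SP points at the last stored word or at the next free one.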
Stack Examples
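[Figure: memory between 0x3e8 and 0x418 after storing r0, r1, r3, r4 and r5 from an old SP of 0x400, shown for each of the four stack types. The ascending stacks grow from 0x400 toward 0x418 and the descending stacks grow toward 0x3e8; in every case r0 ends up at the lowest address and r5 at the highest, and the new SP is marked.]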
LDM/STM Alias Names
LDM/STM: ^ modifier
The ^ modifier changes the behavior of LDM and STM. There are 2 cases:
PSR Transfer Instructions
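Bit:     31 30 29 28 | 27 ... 8  | 7 6 5 | 4 ... 0
Content:  N  Z  C  V | undefined | I F T | mode
Fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
Syntax:
MRS{<cond>} Rd,<psr>              @ Rd = <psr>
MSR{<cond>} <psr[_fields]>,Rm     @ <psr[_fields]> = Rm
where
<psr> = CPSR or SPSR
[_fields] = any combination of ‘fsxc’
In User Mode, all bits can be read but only the condition flags (_f) can be
written.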
Software Interrupt (SWI)
Bits 31-28: condition field
Bits 27-24: 1 1 1 1
Bits 23-0:  comment field (ignored by the processor)
Thumb State
Thumb is a 16-bit instruction set
  Optimized for code density from C code (~65% of ARM code size)
  Improved performance from memory with a narrow data bus
  Subset of the functionality of the ARM instruction set
Core has additional execution state - Thumb
  Switch between ARM and Thumb via the BX Rn instruction (Branch and eXchange).
  If Rn.0 is 1 (odd address) the processor will change to Thumb state.
Example: the 16-bit Thumb instruction ADD r2,#1 (bits 15-0) corresponds to the
32-bit ARM instruction ADDS r2,r2,#1 (bits 31-0)
Thumb instruction set limitations:
  Conditional execution only for branches
  Source and destination registers identical
  Only Low registers (R0-R7) used
  Constants are of limited size
  Inline barrel shifter not used
  No MSR, MRS instructions
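The state switch performed by BX Rn can be sketched in a few lines of C. This is an illustrative model, not the slides' code; `cpu_state` and `bx` are hypothetical names, and the model looks only at bit 0 of the register, as the slide describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;   /* address execution continues from */
    bool thumb;    /* true = Thumb state, false = ARM state */
} cpu_state;

/* Model of BX Rn: bit 0 of Rn selects the state and is not part of the address. */
cpu_state bx(uint32_t rn)
{
    cpu_state s;
    s.thumb = (rn & 1u) != 0u;
    s.pc = rn & ~1u;
    return s;
}

/* Usage: an odd target address enters Thumb state at the even address below it. */
void bx_example(void)
{
    cpu_state s = bx(0x8001u);
    assert(s.thumb && s.pc == 0x8000u);
}
```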
Atomic data swap
Exception / Interrupt Return
Coprocessors
Coprocessor instructions:
Coprocessor data operation: CDP
Coprocessor Load/Store: LDC, STC
Coprocessor register transfer: MRC, MCR
(some coprocessors, like CP14 and CP15, only support MRC and MCR)
ARM7TDMI-S processor GPIO Ports
LPC2148 has two 32-bit General Purpose I/O ports
1. PORT0
2. PORT1
PORT0 is a 32-bit port
Of these 32 pins, 28 can be configured as either general-purpose input
or output.
1 of these 32 pins (P0.31) can be configured as general-purpose output only.
3 of these 32 pins (P0.24, P0.26 and P0.27) are reserved and therefore not
available for use. These pins also do not appear in the pin diagram.
PORT1 is also a 32-bit port. Only 16 of these 32 pins (P1.16 – P1.31) are
available for use as general-purpose input or output.
Almost every pin of these two ports has some alternate function available. For
example, P0.0 can be configured as the TXD pin for UART0 or as a PWM1 pin.
The functionality of each pin can be selected using the Pin Function Select
Registers.
LPC2148 Pin Diagram
Pin Function Select Registers
Pin Function Select Registers are 32-bit registers. These registers are used to
select or configure specific pin functionality.
There are 3 Pin Function Select Registers in LPC2148:
1. PINSEL0: used to configure PORT0 pins P0.0 to P0.15.
2. PINSEL1: used to configure PORT0 pins P0.16 to P0.31.
3. PINSEL2: used to configure PORT1 pins P1.16 to P1.31.
55 55
Fast and Slow GPIO Registers
There are 5 Fast GPIO Registers (also called Enhanced GPIO Features Registers) and
4 Slow GPIO Registers (also called Legacy GPIO Registers) available to control PORT0
and PORT1.
The Slow Registers provide backward compatibility with earlier devices in the family,
so existing code can be reused.
Slow GPIO Registers
IOxPIN (GPIO Port Pin value register)
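How the slow GPIO registers are typically used together can be sketched on a simulated port. Only IOxPIN is named in the text above; the direction, set and clear registers modeled here (IOxDIR, IOxSET, IOxCLR in the LPC214x user manual) and their write-one-to-act behaviour are assumptions from that manual, and `gpio_port` and the function names are hypothetical. The sketch runs on a host and touches no hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated state of one port. */
typedef struct {
    uint32_t dir;  /* direction bits: 1 = output, 0 = input (IOxDIR-like) */
    uint32_t pin;  /* current pin levels, as read through IOxPIN */
} gpio_port;

/* Configure the pins in `mask` as outputs. */
void gpio_make_output(gpio_port *p, uint32_t mask)
{
    assert(p != NULL);
    p->dir |= mask;
}

/* IOxSET-like write: 1 bits drive output pins high, 0 bits have no effect. */
void gpio_write_set(gpio_port *p, uint32_t mask)
{
    assert(p != NULL);
    p->pin |= (mask & p->dir);
}

/* IOxCLR-like write: 1 bits drive output pins low, 0 bits have no effect. */
void gpio_write_clr(gpio_port *p, uint32_t mask)
{
    assert(p != NULL);
    p->pin &= ~(mask & p->dir);
}

/* IOxPIN-like read of the current pin levels. */
uint32_t gpio_read_pin(const gpio_port *p)
{
    assert(p != NULL);
    return p->pin;
}
```

Separate set and clear registers let code change individual pins without a read-modify-write of the whole port, which is the usual motivation for this register style.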