04_ARM_Architecture_Overview.ppt
Overview
ARM architecture versions, example cores and features:
§ 4T: ARM7TDMI, ARM922T
  § Thumb instruction set
§ 5TE: ARM926EJ-S, ARM946E-S, ARM966E-S
  § Improved ARM/Thumb Interworking
  § DSP instructions
  § Extensions: Jazelle (5TEJ)
§ 6: ARM1136JF-S, ARM1176JZF-S, ARM11 MPCore
  § SIMD Instructions
  § Unaligned data support
  § Extensions: Thumb-2 (6T2), TrustZone (6Z), Multicore (6K)
§ 7: Cortex-A8/R4/M3/M1
  § Thumb-2
  § Extensions:
    § v7A (applications): NEON
    § v7R (real time): HW Divide
    § v7M (microcontroller): HW Divide and Thumb-2 only
Confidential
ARM Architecture profiles
§ Application profile (ARMv7-A, e.g. Cortex-A8)
§ Memory management support (MMU)
§ Highest performance at low power
§ Influenced by multi-tasking OS requirements
§ TrustZone and Jazelle-RCT for a safe, extensible system
Programmer’s Model
Data Sizes and Instruction Sets
§ When used in relation to the ARM:
§ Halfword means 16 bits (two bytes)
§ Word means 32 bits (four bytes)
§ Doubleword means 64 bits (eight bytes)
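The size terms above can be checked against fixed-width integer formats. The following is a minimal sketch; the dictionary names and the choice of `struct` format codes are illustrative assumptions, not part of the original slides.

```python
import struct

# ARM data-size terms from the list above, in bytes.
ARM_DATA_SIZES = {"halfword": 2, "word": 4, "doubleword": 8}

# Standard-size struct codes with matching widths. The "<" prefix selects
# little-endian standard sizes; this pairing is an illustrative assumption.
STRUCT_CODES = {"halfword": "H", "word": "I", "doubleword": "Q"}


def size_in_bits(name):
    """Return the width in bits of an ARM data-size term."""
    return ARM_DATA_SIZES[name] * 8


def sizes_match():
    """Check that each struct code has the width the ARM term implies."""
    return all(
        struct.calcsize("<" + code) == ARM_DATA_SIZES[name]
        for name, code in STRUCT_CODES.items()
    )
```

For example, `size_in_bits("word")` gives 32, which is the width of one ARM register.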
Processor Modes
§ The ARM has seven basic operating modes:
§ Each mode has access to its own stack and a different subset of registers
§ Some operations can only be carried out in a privileged mode
Mode              Description
Exception modes:
Supervisor (SVC)  Entered on reset and when a Software Interrupt (SWI) instruction is executed
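The current mode is held in the low five bits of the CPSR. The sketch below decodes that field for the seven basic modes. The encodings are the standard ones from the ARM Architecture Reference Manual rather than values listed on this slide, and the function names are hypothetical.

```python
# CPSR mode field (bits [4:0]) encodings for the seven basic modes.
# Values are from the ARM Architecture Reference Manual, not this slide.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_mode(cpsr):
    """Return the mode name encoded in the low five bits of a CPSR value."""
    return MODES.get(cpsr & 0x1F, "unknown")


def is_privileged(cpsr):
    """Every basic mode except User is privileged."""
    return decode_mode(cpsr) not in ("User", "unknown")
```

A CPSR value of `0xD3` decodes to Supervisor mode, the mode entered on reset.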
The ARM Register Set
ARM has 37 registers, all 32 bits long. A subset of these registers is accessible in each mode.

Mode       Registers (exception modes list their own copies)
User mode  r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
IRQ        r13 (sp), r14 (lr), spsr
FIQ        r8-r12, r13 (sp), r14 (lr), spsr
Undef      r13 (sp), r14 (lr), spsr
Abort      r13 (sp), r14 (lr), spsr
SVC        r13 (sp), r14 (lr), spsr

cpsr fields: N Z C V Q, IT (de), J, GE[3:0], IT (cond_abc), E A I F T, mode; remaining bits undefined
cpsr byte fields: f (flags), s (status), x (extension), c (control)
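The figure of 37 registers follows from the per-mode layout in the register diagram. The sketch below is a minimal check of that arithmetic; the `physical_name` helper and its suffix naming are hypothetical, added only to show how a register name resolves to a mode-specific copy.

```python
# Registers visible in User mode: r0..r15.
USER_VISIBLE = [f"r{i}" for i in range(16)]

# Registers that each exception mode replaces with its own copy,
# as read from the register-set diagram.
BANKED = {
    "FIQ": [f"r{i}" for i in range(8, 15)],  # r8..r14
    "IRQ": ["r13", "r14"],
    "Undef": ["r13", "r14"],
    "Abort": ["r13", "r14"],
    "SVC": ["r13", "r14"],
}


def total_registers():
    """Count general-purpose plus status registers."""
    general = len(USER_VISIBLE) + sum(len(regs) for regs in BANKED.values())
    status = 1 + len(BANKED)  # cpsr plus one spsr per exception mode
    return general + status


def physical_name(mode, reg):
    """Resolve a register name to the copy used in the given mode.

    The "<reg>_<mode>" suffix is a hypothetical naming choice for this sketch.
    """
    if reg in BANKED.get(mode, []):
        return f"{reg}_{mode.lower()}"
    return reg
```

Here `total_registers()` evaluates to 31 general-purpose registers plus 6 status registers.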
Data alignment
§ Prior to architecture v6, data accesses must be appropriately aligned for the access size
§ Unaligned addresses will produce unexpected/undefined results
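"Appropriately aligned" means the address is a multiple of the access size. A minimal sketch of that check follows; the function name is hypothetical and the access sizes are assumed to be powers of two (2, 4 or 8 bytes for halfword, word and doubleword).

```python
def is_aligned(address, size):
    """Return True if address is a multiple of size (a power of two, in bytes)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    # For a power-of-two size, the low bits of an aligned address are all zero.
    return (address & (size - 1)) == 0
```

For example, address `0x8002` is aligned for a halfword access but not for a word access.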
Exception Handling
§ When an exception occurs, the core:
  § Copies CPSR into SPSR_<mode>
  § Sets appropriate CPSR bits
    § Change to ARM state
    § Change to exception mode
    § Disable interrupts (if appropriate)
  § Stores the return address in LR_<mode>
  § Sets PC to vector address

Vector table:
Address  Exception
0x1C     FIQ
0x18     IRQ
0x14     (Reserved)
0x10     Data Abort
0x0C     Prefetch Abort
0x08     Software Interrupt
0x04     Undefined Instruction
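The exception entry sequence can be modelled in a few lines. This is a simplified sketch, not a description of any real core's implementation: the `Core` class and `take_exception` function are hypothetical, the mode encodings and the T, F and I bit positions are the standard CPSR values rather than values given on this slide, and the return address is passed in by the caller because the exact offset depends on the exception type.

```python
from dataclasses import dataclass, field

# Vector addresses from the table above.
VECTORS = {
    "undef": 0x04, "swi": 0x08, "prefetch_abort": 0x0C,
    "data_abort": 0x10, "irq": 0x18, "fiq": 0x1C,
}
# Mode entered for each exception (standard CPSR mode encodings).
EXC_MODE = {
    "undef": 0x1B, "swi": 0x13, "prefetch_abort": 0x17,
    "data_abort": 0x17, "irq": 0x12, "fiq": 0x11,
}
T_BIT, F_BIT, I_BIT = 1 << 5, 1 << 6, 1 << 7


@dataclass
class Core:
    cpsr: int = 0x10                           # User mode, ARM state
    pc: int = 0
    spsr: dict = field(default_factory=dict)   # keyed by mode number
    lr: dict = field(default_factory=dict)     # keyed by mode number


def take_exception(core, kind, return_address):
    mode = EXC_MODE[kind]
    core.spsr[mode] = core.cpsr       # copy CPSR into SPSR_<mode>
    cpsr = core.cpsr & ~T_BIT         # change to ARM state
    cpsr = (cpsr & ~0x1F) | mode      # change to exception mode
    cpsr |= I_BIT                     # disable IRQ
    if kind == "fiq":
        cpsr |= F_BIT                 # FIQ entry also disables FIQ
    core.cpsr = cpsr
    core.lr[mode] = return_address    # store return address in LR_<mode>
    core.pc = VECTORS[kind]           # set PC to the vector address
```

Taking an IRQ from User mode in Thumb state (`cpsr = 0x30`) leaves the old CPSR in the IRQ-mode SPSR, clears the T bit, and sets the PC to `0x18`.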
Introduction to Instruction Sets
Thumb Instruction Set
§ Thumb is a 16-bit instruction set
§ Optimized for code density from C code (~65% of ARM code size)
§ Improved performance from narrow memory
§ Subset of the functionality of the ARM instruction set
Thumb-2 Performance / Density
(Chart: Thumb-2 plotted against Performance and Code density.)
Processor Cores
ARM7TDMI Processor
§ Architecture v4T
§ 3-stage pipeline
§ Single interface to memory
ARM926EJ-S Processor
§ Architecture v5TE
§ 5-stage pipeline
§ Single-cycle 32x16 multiplier
§ Caches and TCMs
§ Memory management unit (MMU)
§ 2 AHB memory interfaces
§ Jazelle technology
ARM1176JZ(F)-S Processor Core
§ TrustZone
§ 8-stage pipeline
§ Branch prediction
§ Four AXI memory ports
§ IEM (Intelligent Energy Management)
§ Integrated VFP coprocessor
ARM11 MPCore Processor
§ 1 – 4 MP11 processors
§ Cache coherency
§ Distributed interrupt controller
ARM Cortex-M3 Processor
§ Architecture v7-M (Thumb-2 only): very different from previous ARM processors
§ No CPSR register
§ Vector table contains addresses, not instructions
§ Processor automatically saves/restores state in exceptions
§ Only 2 processor modes (Thread/Handler)
§ No Coprocessor 15
§ 3-stage pipeline with static branch prediction
§ Atypical Implementation
§ Fixed memory map
§ Integrated interrupt controller
§ Serial-Wire Debug
ARM Cortex-A8 Processor
§ Architecture v7-A
§ 14-stage pipeline
§ NEON media processor
The Instruction Pipeline
Optimal Pipelining
(Pipeline diagram, cycles 1 to 9. F = Fetch, D = Decode, E = Execute.)
Operation  Stages
ADD        F D E
SUB        F D E
ORR        F D E
AND        F D E
ORR        F D E
EOR        F D E
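The cycle count for optimal pipelining can be worked out with a small model. The sketch below assumes an idealised 3-stage pipeline (Fetch, Decode, Execute) in which every stage takes one cycle and nothing stalls; the function names are hypothetical.

```python
STAGES = ("F", "D", "E")


def pipeline_rows(instructions):
    """Return (name, {cycle: stage}) pairs for a stall-free pipeline."""
    rows = []
    for i, name in enumerate(instructions):
        # Instruction i is fetched in cycle i + 1, then advances one stage per cycle.
        rows.append((name, {i + 1 + s: STAGES[s] for s in range(len(STAGES))}))
    return rows


def total_cycles(n):
    """Cycles needed to complete n instructions with no stalls."""
    return n + len(STAGES) - 1 if n else 0


def render(instructions):
    """Draw the diagram as text (columns line up for up to 9 cycles)."""
    cycles = total_cycles(len(instructions))
    lines = ["Op   " + " ".join(str(c) for c in range(1, cycles + 1))]
    for name, stages in pipeline_rows(instructions):
        cells = " ".join(stages.get(c, ".") for c in range(1, cycles + 1))
        lines.append(f"{name:<5}" + cells)
    return "\n".join(lines)
```

With the six instructions from the diagram, `total_cycles(6)` is 8: once the pipeline is full, one instruction completes per cycle.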
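(Pipeline diagram, cycles 1 to 9: the BL at 0x8000 branches to 0x8FEC, so the instructions fetched at 0x8004 and 0x8008 do not complete.)
Address  Operation   Stages
0x8000   BL 0x8FEC   F D E, followed by two further cycles
0x8004   SUB         F D
0x8008   ORR         F
0x8FEC   AND         F D E
0x8FF0   ORR         F D E
0x8FF4   EOR         F D E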
Cortex-A8 Integer Pipeline
(Pipeline diagram.)
§ Stages: F0 F1 F2 (Instruction Fetch), D0 D1 D2 D3 D4 (Instruction Decode), E0 E1 E2 E3 E4 E5 (execute pipes)
§ Instruction Fetch: AGU, RAM + TLB, Branch Pred., Queue
§ Instruction Decode: Early DEC, DEC, DEC Queue, SEQ, Scoreboard & Issue Logic, Pending and Replay Queue, Remap, Route, Regfile
§ ALU/MUL PIPE0: Shift, ALU, SAT, BP Update, WB; MUL1, MUL2, ADD, WB
§ ALU PIPE1: Shift, ALU, SAT, BP Update, WB
§ LOAD/STORE: AGU, RAM + TLB, Format Fwd, BP Update, WB
§ Marked spans: Branch Mispredict Penalty, Replay Penalty
Reference Slides
Reference Material
§ ARM ARM (“Architecture Reference Manual”)
§ ARM DDI 0100E covers v5TE DSP extensions
§ Can be purchased from booksellers: ISBN 0-201-73719-1 (Addison-Wesley)
§ Available for download from ARM’s website
§ ARM v7-M ARM available for download from ARM’s website
§ Contact ARM if you need a different version (v6, v7-AR, etc.)
Naming Conventions
§ ARMx5z (e.g. ARM1156T2-S) indicates cache, MPU and error-correcting memory
§ ARMx7z (e.g. ARM1176JZ-S) indicates AXI bus, physically mapped caches and MMU
Which architecture is my processor?
Processor core                              Architecture
§ ARM7TDMI family                           v4T
  § ARM720T, ARM740T
§ ARM9TDMI family                           v4T
  § ARM920T, ARM922T, ARM940T
§ ARM9E family                              v5TE, v5TEJ
  § ARM946E-S, ARM966E-S, ARM926EJ-S
§ ARM10E family                             v5TE, v5TEJ
  § ARM1020E, ARM1022E, ARM1026EJ-S
§ ARM11 family                              v6
  § ARM1136J(F)-S                           v6
  § ARM1156T2(F)-S                          v6T2
  § ARM1176JZ(F)-S                          v6Z
  § ARM11 MPCore                            v6
§ Cortex family
  § ARM Cortex-A8                           v7-A
  § ARM Cortex-R4(F)                        v7-R
  § ARM Cortex-M3                           v7-M
  § ARM Cortex-M1                           v6-M
§ For ARM processor naming conventions and features, please see the Appendix
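The core-to-architecture table lends itself to a simple lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the assignment of v5TEJ to the "EJ" cores within the ARM9E and ARM10E families is an assumption based on the Jazelle (5TEJ) extension named in the overview.

```python
# Core name -> architecture, following the table above.
# Splitting "v5TE, v5TEJ" by the J suffix is an assumption of this sketch.
CORE_ARCHITECTURE = {
    "ARM7TDMI": "v4T", "ARM720T": "v4T", "ARM740T": "v4T",
    "ARM9TDMI": "v4T", "ARM920T": "v4T", "ARM922T": "v4T", "ARM940T": "v4T",
    "ARM946E-S": "v5TE", "ARM966E-S": "v5TE", "ARM926EJ-S": "v5TEJ",
    "ARM1020E": "v5TE", "ARM1022E": "v5TE", "ARM1026EJ-S": "v5TEJ",
    "ARM1136J-S": "v6", "ARM1156T2-S": "v6T2", "ARM1176JZ-S": "v6Z",
    "ARM11 MPCore": "v6",
    "Cortex-A8": "v7-A", "Cortex-R4": "v7-R",
    "Cortex-M3": "v7-M", "Cortex-M1": "v6-M",
}


def normalize(name):
    """Drop spaces and the optional "(F)" marker, and ignore case."""
    return name.replace(" ", "").replace("(F)", "").upper()


_LOOKUP = {normalize(k): v for k, v in CORE_ARCHITECTURE.items()}


def architecture_of(name):
    """Return the architecture version for a core name, or None if unknown."""
    return _LOOKUP.get(normalize(name))
```

Normalising the name lets spellings such as `ARM1176JZ(F)-S` and `ARM1176JZ-S` resolve to the same entry.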
ARMv4T Cores:
               7TDMI        720T         740T         920T                  940T     SA1100
Architecture   von Neumann  von Neumann  von Neumann  Harvard               Harvard  Harvard
Associativity  N/A          4-way        4-way        64-way                64-way   32-way
TCM            No           No           No           No                    No       No
Replacement    N/A          Random       Random       Random / Round Robin  Random   Round Robin
ARMv5 Cores:
Core      Architecture  Cache (Instr / Data)  Cache line    TCM (Instr / Data)  Replacement
926EJ-S   Harvard       4-128K / 4-128K       8 words/line  0-1024K / 0-1024K   Random / Round Robin
946E-S    Harvard       0-1024K / 0-1024K     8 words/line  0-1024K / 0-1024K   Random / Round Robin
966E-S    Harvard       None                  -             0-64M / 0-64M       N/A
968E-S    Harvard       None                  -             0-64M / 0-64M       N/A
1026EJ-S  Harvard       0-128K / 0-128K       8 words/line  0-1024K / 0-1024K   Random / Round Robin
XScale    Harvard       32K / 32K             8 words/line  No                  Random / Round Robin
Write Strategy: Write Through / Write Back for five of the cores, N/A for one
ARMv6 Cores:
              1136J(F)-S  1156T2(F)-S  1176JZ(F)-S  11 MPCore
Architecture  Harvard     Harvard      Harvard      Harvard
Cortex Cores:
              Cortex-M3  Cortex-M1  Cortex-R4       Cortex-A8
Architecture  Harvard    Harvard    Harvard         Harvard
MMU/MPU       MPU        None       MPU (optional)  MMU
TrustZone Computing
§ TrustZone adds a “parallel world” to allow trusted programs and data to be safely separated from the OS and applications
§ Features:
  § New Secure Monitor Mode: gatekeeper for secure state
  § New S-bit in CP15 to indicate when the processor is running in a secured state
  § Security state exposed on external bus accesses to permit security-aware memory and peripherals
  § Ability to restrict debug to non-secure state
NEON Media Processor Features
§ Single Instruction Multiple Data (SIMD) Media Processor
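The SIMD idea is that one operation acts on several packed data elements at once. The sketch below illustrates it with four unsigned 8-bit lanes packed into a 32-bit word. It is a conceptual model only: the function name is hypothetical, NEON itself works on wider (64-bit and 128-bit) registers, and wrapping per-lane addition is just one of several possible lane behaviours.

```python
def simd_add_u8x4(a, b):
    """Add four packed unsigned 8-bit lanes of two 32-bit words.

    Each lane wraps independently; a carry never crosses into the next lane.
    """
    result = 0
    for lane in range(4):
        shift = lane * 8
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= ((x + y) & 0xFF) << shift
    return result
```

Compared with a plain 32-bit add, the top lane of `0xFF000001 + 0x01000001` wraps to `0x00` here instead of overflowing the word.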
End