Adv Comp Arch Q3'11
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
SoC Design
Development Tools
ARM Ltd
Founded in November 1990
Spun out of Acorn Computers
Initial funding from Apple, Acorn and VLSI
ARM’s Activities
Connected Community
Development Tools
Software IP
Processors
Memory
System Level IP: Data Engines, SoC Fabric, 3D Graphics
Physical IP
ARM Connected Community – 700+
Huge Range of Applications
IR Fire Detector
Utility Meters
Exercise Machines
Intelligent Vending
Intelligent Toys
Energy Efficient Appliances
Tele-parking
World’s Smallest ARM Computer?
Battery, Solar Cells
Wireless Sensor Network
Sensors, timers
Processor, SRAM and PMU
World’s Largest ARM Computer?
From 1 mm³ to 1 km³
Cost: from 10¢ to $1000
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
SoC Design
Development Tools
ARM Cortex Processors (v7)
Cortex-M0
12k gates...
Relative Performance*
Processor             Max Freq (MHz)   Min Power (mW/MHz)
Cortex-M0                   50              0.012
Cortex-M3                  150              0.06
ARM7                       184              0.35
ARM926                     470              0.235
ARM1026                    540              0.36
ARM1136                    610              0.335
ARM1176                    750              0.568
Cortex-A8                 1100              0.43
Cortex-A9 Dual-core       2000              0.5
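As a rough illustration of how the two columns relate, you can multiply a core's mW/MHz figure by a clock frequency to get an approximate power figure. The C sketch below does this for each row. It assumes power scales linearly with frequency at the listed "Min Power" rate. That is a simplification, because real power also depends on process, voltage and workload. The `CoreRow` type and the `estimated_mw` helper are illustrative names and are not part of any ARM tool.

```c
#include <stdio.h>

/* One row of the performance table: core name, max clock, best-case power rate. */
typedef struct {
    const char *name;
    double max_mhz;
    double min_mw_per_mhz;
} CoreRow;

/* Values copied from the table above. */
static const CoreRow kCores[] = {
    {"Cortex-M0", 50, 0.012},
    {"Cortex-M3", 150, 0.06},
    {"ARM7", 184, 0.35},
    {"ARM926", 470, 0.235},
    {"ARM1026", 540, 0.36},
    {"ARM1136", 610, 0.335},
    {"ARM1176", 750, 0.568},
    {"Cortex-A8", 1100, 0.43},
    {"Cortex-A9 Dual-core", 2000, 0.5},
};

/* Rough estimate only: assumes power grows linearly with clock frequency. */
static double estimated_mw(const CoreRow *core, double mhz) {
    return core->min_mw_per_mhz * mhz;
}

/* Prints the estimate for every core at its listed maximum frequency. */
static void print_estimates(void) {
    for (size_t i = 0; i < sizeof kCores / sizeof kCores[0]; i++) {
        printf("%-20s %8.1f mW at %4.0f MHz\n", kCores[i].name,
               estimated_mw(&kCores[i], kCores[i].max_mhz), kCores[i].max_mhz);
    }
}
```

Under that linear assumption, the Cortex-M0 row gives 0.012 × 50 = 0.6 mW and the Cortex-A8 row gives 0.43 × 1100 = 473 mW at their listed maximum frequencies.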
Cortex family
               Cortex-A8            Cortex-R4          Cortex-M3
Architecture   v7A                  v7R                v7M
Memory unit    MMU                  MPU (optional)     MPU (optional)
Bus            AXI                  AXI                AHB Lite & APB
Features       VFP & NEON support   Dual Issue
Cortex-M0 DesignStart
ARM Cortex-M4
“32-bit/DSC” applications
Efficient digital signal control
ARM Cortex-M3
“16/32-bit” applications
Performance efficiency
ARM Cortex-M0
“8/16-bit” applications
Low-cost & simplicity
Cortex-M0 DesignStart (2)
Minimum usable
ARMv6-M instruction set architecture
NVIC Interrupt controller
Interrupt line configurations: 1 to 32 (Cortex-M0), 16 only (DesignStart)
Debug (SWD, JTAG) option
Up to 4 breakpoints, 2 watchpoints
Low power optimisations (ACG)
Multiple power domain support with WIC
Fast multiplier (1 cycle) option
System timer
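The NVIC enables each interrupt line through write-1-to-set and write-1-to-clear registers. On ARMv6-M these are ISER at 0xE000E100 and ICER at 0xE000E180. On hardware, CMSIS-based code would normally call `NVIC_EnableIRQ()` and `NVIC_DisableIRQ()`. The sketch below is only a host-side software model of that register behaviour, so the semantics can be tried without a board. `NvicModel` and the `nvic_*` helpers are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical software model of the ARMv6-M NVIC enable state
 * (IRQ lines 0..31 fit in one register). It is not a hardware driver. */
typedef struct {
    uint32_t enabled;   /* bit n set => IRQ n enabled */
} NvicModel;

/* ISER: writing 1 to a bit enables that IRQ; 0 bits have no effect. */
static void nvic_write_iser(NvicModel *nvic, uint32_t value) {
    nvic->enabled |= value;
}

/* ICER: writing 1 to a bit disables that IRQ; 0 bits have no effect. */
static void nvic_write_icer(NvicModel *nvic, uint32_t value) {
    nvic->enabled &= ~value;
}

static void nvic_enable_irq(NvicModel *nvic, unsigned irq) {
    nvic_write_iser(nvic, 1u << (irq & 31u));
}

static void nvic_disable_irq(NvicModel *nvic, unsigned irq) {
    nvic_write_icer(nvic, 1u << (irq & 31u));
}

/* Reading either register returns the current enable state. */
static int nvic_is_enabled(const NvicModel *nvic, unsigned irq) {
    return (int)((nvic->enabled >> (irq & 31u)) & 1u);
}
```

With the write-1-to-set design, software can enable one line without a read-modify-write of the whole register. That means it cannot accidentally undo a change made to another line in between.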
ARM and Thumb Performance
(Chart: Dhrystone 2.1/sec @ 20 MHz, y-axis 0 to 30000, comparing ARM and Thumb code for three memory configurations: 32-bit, 16-bit, and 16-bit with 32-bit stack)
The Thumb-2 instruction set
Variable-length instructions
ARM instructions are a fixed length of 32 bits
Thumb instructions are a fixed length of 16 bits
Thumb-2 instructions can be either 16-bit or 32-bit
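Because Thumb-2 mixes both sizes, a decoder must look at the first halfword to tell how long the instruction is. In the ARMv7-M encoding, a first halfword with bits [15:11] equal to 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction. Any other value is a complete 16-bit instruction. A minimal C sketch of that rule follows; the function names are illustrative.

```c
#include <stdint.h>

/* Size in bytes (2 or 4) of the Thumb-2 instruction whose first halfword is given.
 * Bits [15:11] of 0b11101, 0b11110 or 0b11111 mark the first half of a
 * 32-bit instruction. */
static unsigned thumb2_instr_size(uint16_t first_halfword) {
    unsigned top5 = (unsigned)first_halfword >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* Counts instructions in a halfword stream, stepping 1 or 2 halfwords at a time. */
static unsigned thumb2_count(const uint16_t *code, unsigned n_halfwords) {
    unsigned count = 0, i = 0;
    while (i < n_halfwords) {
        i += thumb2_instr_size(code[i]) / 2u;
        count++;
    }
    return count;
}
```

For example, `0x4770` (BX LR) is a 16-bit instruction, while `0xF000` is the first half of a 32-bit BL.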
Processor Modes
The ARM has seven basic operating modes:
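The seven classic modes are User, FIQ, IRQ, Supervisor, Abort, Undefined and System. The current mode is held in the mode field M[4:0] of the CPSR. The C sketch below decodes that field. It covers only these seven; later architecture versions add further modes, which it reports as "Unknown". `arm_mode_name` is an illustrative name.

```c
#include <stdint.h>

/* Decodes CPSR bits [4:0] into one of the seven classic ARM operating modes. */
static const char *arm_mode_name(uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "Unknown";   /* not one of the seven modes above */
    }
}

/* Of the seven modes, only User is unprivileged. */
static int arm_mode_is_privileged(uint32_t cpsr) {
    return (cpsr & 0x1Fu) != 0x10u;
}
```

For example, a CPSR value of 0x600000D3 decodes to Supervisor, the mode the core enters on reset.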
The ARM Register Set
(Diagram: the single cpsr shared by all modes, and the banked spsr registers of the exception modes)
Exception Handling
Cortex-M3 Programmer’s Model
Fully programmable in C
(Diagram: register set including r0, r1, the Main stack pointer and xPSR)
Conditional Execution and Flags
ARM instructions can be made to execute conditionally by appending the appropriate condition code suffix to the mnemonic.
This improves code density and performance by reducing the number of forward branch instructions.
With a branch:
    CMP   r3, #0
    BEQ   skip
    ADD   r0, r1, r2
skip

With conditional execution:
    CMP   r3, #0
    ADDNE r0, r1, r2
By default, data processing instructions do not affect the condition code flags; adding the “S” suffix (for example, SUBS) makes them set the flags. Comparison instructions such as CMP always set the flags and do not need “S”.
loop
    …
    SUBS  r1, r1, #1      ; decrement r1 and set flags
    BNE   loop            ; branch if Z flag is clear
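The condition check itself is simple to model in C. The sketch below computes the N, Z, C and V flags that CMP sets (a - b, with the result discarded) and evaluates the standard 4-bit condition field. It then uses them to reproduce the branch-free `CMP r3,#0` / `ADDNE r0,r1,r2` sequence. This is an illustrative model: `cmp_flags`, `condition_passed` and `addne_example` are hypothetical names, and treating condition 0b1111 as "always" is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* N, Z, C, V condition flags as held in the CPSR (xPSR on Cortex-M). */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Flags set by CMP a, b: computes a - b and keeps only the flags. */
static Flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) != 0;                     /* result negative */
    f.z = (r == 0);                           /* result zero */
    f.c = (a >= b);                           /* carry = NOT borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;   /* signed overflow */
    return f;
}

/* True if an instruction with this 4-bit condition field would execute. */
static bool condition_passed(unsigned cond, Flags f) {
    switch (cond & 0xFu) {
    case 0x0: return f.z;                     /* EQ */
    case 0x1: return !f.z;                    /* NE */
    case 0x2: return f.c;                     /* CS/HS */
    case 0x3: return !f.c;                    /* CC/LO */
    case 0x4: return f.n;                     /* MI */
    case 0x5: return !f.n;                    /* PL */
    case 0x6: return f.v;                     /* VS */
    case 0x7: return !f.v;                    /* VC */
    case 0x8: return f.c && !f.z;             /* HI */
    case 0x9: return !f.c || f.z;             /* LS */
    case 0xA: return f.n == f.v;              /* GE */
    case 0xB: return f.n != f.v;              /* LT */
    case 0xC: return !f.z && (f.n == f.v);    /* GT */
    case 0xD: return f.z || (f.n != f.v);     /* LE */
    default:  return true;                    /* AL; 0xF treated as always here */
    }
}

/* C model of: CMP r3,#0 ; ADDNE r0,r1,r2 (returns the new r0). */
static uint32_t addne_example(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3) {
    Flags f = cmp_flags(r3, 0);
    if (condition_passed(0x1, f)) {           /* NE */
        r0 = r1 + r2;
    }
    return r0;
}
```

With r3 = 0 the NE test fails and r0 keeps its old value. With any non-zero r3, r0 becomes r1 + r2, exactly as in the branching version, but without a branch.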
Data Processing Instructions
Consist of:
Arithmetic: ADD ADC SUB SBC RSB RSC
Logical: AND ORR EOR BIC
Comparisons: CMP CMN TST TEQ
Data movement: MOV MVN
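To make the list concrete, here is a C sketch of what each data processing operation computes. It is keyed by the 4-bit opcode field of the ARM (A32) encoding. It models only the 32-bit result, not the flag updates, and it assumes the second operand has already passed through the barrel shifter. `DpOpcode` and `dp_result` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM data processing opcodes (instruction bits [24:21]). */
typedef enum {
    OP_AND = 0x0, OP_EOR = 0x1, OP_SUB = 0x2, OP_RSB = 0x3,
    OP_ADD = 0x4, OP_ADC = 0x5, OP_SBC = 0x6, OP_RSC = 0x7,
    OP_TST = 0x8, OP_TEQ = 0x9, OP_CMP = 0xA, OP_CMN = 0xB,
    OP_ORR = 0xC, OP_MOV = 0xD, OP_BIC = 0xE, OP_MVN = 0xF
} DpOpcode;

/* ALU result for Rn and the (already shifted) second operand.
 * *writes_rd reports whether Rd is written: the comparisons
 * (TST, TEQ, CMP, CMN) compute a value but only update the flags. */
static uint32_t dp_result(DpOpcode op, uint32_t rn, uint32_t op2,
                          bool carry_in, bool *writes_rd) {
    uint32_t c = carry_in ? 1u : 0u;
    *writes_rd = !(op >= OP_TST && op <= OP_CMN);
    switch (op) {
    case OP_AND: case OP_TST: return rn & op2;
    case OP_EOR: case OP_TEQ: return rn ^ op2;
    case OP_SUB: case OP_CMP: return rn - op2;
    case OP_RSB:              return op2 - rn;              /* reverse subtract */
    case OP_ADD: case OP_CMN: return rn + op2;
    case OP_ADC:              return rn + op2 + c;          /* add with carry */
    case OP_SBC:              return rn - op2 - (1u - c);   /* Rn - Op2 - NOT(C) */
    case OP_RSC:              return op2 - rn - (1u - c);   /* Op2 - Rn - NOT(C) */
    case OP_ORR:              return rn | op2;
    case OP_MOV:              return op2;                   /* Rn ignored */
    case OP_BIC:              return rn & ~op2;             /* bit clear */
    case OP_MVN:              return ~op2;                  /* move NOT */
    }
    return 0;
}
```

The comparisons reuse the arithmetic and logical datapath: CMP computes the same value as SUB, and TST the same value as AND. They leave Rd unwritten and always update the flags instead.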
Using a Barrel Shifter: The 2nd Operand
Immediate value
8-bit number, with a range of 0-255, rotated right by an even number of bit positions (0-30)
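In ARM (A32) state, the 12-bit immediate field holds an 8-bit value and a 4-bit rotate count, and the barrel shifter rotates the 8-bit value right by twice that count. So 0x104 is encodable (0x41 rotated right by 30), but 0x101 is not, because its set bits span more than 8 positions. The C sketch below searches the 16 possible rotations. `ror32` and `encode_arm_imm` are illustrative names. Thumb-2 uses a different modified-immediate scheme, which this sketch does not cover.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by r bits (r taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned r) {
    r &= 31u;
    return r ? (x >> r) | (x << (32u - r)) : x;
}

/* If value == ror32(imm8, 2 * rot) for some imm8 <= 0xFF and rot in 0..15,
 * stores them and returns true; otherwise returns false. */
static bool encode_arm_imm(uint32_t value, uint32_t *imm8, unsigned *rot) {
    for (unsigned r = 0; r < 16u; r++) {
        /* rotating left by 2r (= right by 32 - 2r) undoes a right rotation by 2r */
        uint32_t candidate = ror32(value, 32u - 2u * r);
        if (candidate <= 0xFFu) {
            *imm8 = candidate;
            *rot = r;
            return true;
        }
    }
    return false;
}
```

ARM assemblers make a similar check, for example when the `LDR Rd,=constant` pseudo-instruction decides whether a single MOV or MVN can be used instead of a literal-pool load.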
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
SoC Design
Development Tools
The ARM7TDMI Core
(Block diagram: address register and address incrementer driving A[31:0] (ABE); register bank with PC update; multiplier; barrel shifter; 32-bit ALU on the A, B and ALU buses; write data and read data registers on D[31:0] (DBE); instruction decoder with Thumb instruction decompression; and control logic. External signals: MCLK, nWAIT, nRW, MAS[1:0], BIGEND, ISYNC, nIRQ, nFIQ, nRESET, ABORT, nTRANS, nMREQ, SEQ, LOCK, nM[4:0], nOPC, nCPI, CPA, CPB.)
Cortex-M3 Datapath
(Datapath diagram: instruction data (I_HRDATA) feeds instruction decode; register bank, barrel shifter, ALU and Mul/Div unit; address incrementer and address register driving I_HADDR; writeback path; INTADDR.)
Pipeline changes for ARM9TDMI
ARM7TDMI (3 stages)
Fetch: instruction fetch
Decode: Thumb→ARM decompress, ARM decode, register select
Execute: register read, shift, ALU, register write
ARM9TDMI (5 stages)
FETCH: instruction fetch
DECODE: ARM or Thumb instruction decode, register decode, register read
EXECUTE: shift + ALU
MEMORY: memory access
WRITE: register write
Cortex-M3 Pipeline
Cortex-M3 has a 3-stage fetch-decode-execute pipeline
Similar to ARM7
Cortex-M3 does more in each stage to increase overall performance
Fetch: Instruction Fetch (Prefetch)
Decode: Decode & Register Read
Execute: includes Multiply & Divide and Write
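One simple way to compare the pipelines on these slides is an idealised cycle count. The stage counts are 3 for ARM7TDMI and Cortex-M3, 5 for ARM9TDMI, 6 for ARM10, and 13 integer stages for Cortex-A8. The C sketch below assumes one instruction enters per cycle with no stalls. It also assumes each taken branch is resolved in a given stage and discards the instructions already fetched behind it. These are modelling assumptions, not a description of any particular core; real designs use techniques such as prefetching and branch prediction to reduce that cost. `pipeline_cycles` is an illustrative name.

```c
/* Idealised pipeline timing (assumed model, not a specific ARM core):
 * - fill the pipeline once, then retire one instruction per cycle;
 * - each taken branch, resolved in stage `resolve_stage`, discards the
 *   (resolve_stage - 1) younger instructions already in the pipeline. */
static unsigned long pipeline_cycles(unsigned stages, unsigned long n_instr,
                                     unsigned long taken_branches,
                                     unsigned resolve_stage) {
    if (n_instr == 0) {
        return 0;
    }
    unsigned long cycles = stages + n_instr - 1;      /* fill + steady state */
    cycles += taken_branches * (resolve_stage - 1);   /* flush penalty */
    return cycles;
}
```

With no branches, 1000 instructions take 1002 cycles on a 3-stage model and 1012 on a 13-stage one. The fill cost is small, so a deeper pipeline mainly pays off by allowing a higher clock frequency. The trade-off is that each taken branch costs more.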
ARM10 vs. ARM11 Pipelines
ARM10 (6 stages)
FETCH: instruction fetch, branch prediction
ISSUE: ARM or Thumb instruction decode
DECODE: register read
EXECUTE: shift + ALU; multiply
MEMORY: memory access; multiply add
WRITE: register write
ARM11
(Diagram only partly recoverable: stages include Data Address, Data Cache 1 and Data Cache 2)
Full Cortex-A8 Pipeline Diagram
13-Stage Integer Pipeline, 10-Stage NEON Pipeline
An Example AMBA System
AHB: High Performance ARM processor, High Bandwidth External Memory Interface, APB Bridge
APB: UART, Timer, Keypad
AHB Structure
(Diagram: Masters #1 to #3 drive address/control (HADDR) and write data (HWDATA) through multiplexors selected by the Arbiter; the Decoder uses the address to select one of Slaves #1 to #4, whose read data (HRDATA) is multiplexed back to the masters.)
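The two central blocks can each be sketched in a few lines of C. The decoder turns HADDR into a slave select, and the arbiter chooses which requesting master is granted the bus. The memory map below is entirely hypothetical. The AHB protocol leaves the arbitration priority scheme to the system designer, so fixed priority is used here only as one simple choice. `ahb_decode` and `ahb_arbitrate` are illustrative names.

```c
#include <stdint.h>

/* Hypothetical address regions for Slaves #1 to #4 (illustrative values only). */
typedef struct {
    uint32_t base;
    uint32_t size;
} SlaveRegion;

static const SlaveRegion kSlaves[4] = {
    {0x00000000u, 0x00040000u},   /* Slave #1, e.g. ROM (hypothetical) */
    {0x20000000u, 0x00010000u},   /* Slave #2, e.g. SRAM (hypothetical) */
    {0x40000000u, 0x00001000u},   /* Slave #3, e.g. APB bridge (hypothetical) */
    {0x60000000u, 0x00100000u},   /* Slave #4, e.g. external memory (hypothetical) */
};

/* Decoder: index (0..3) of the slave whose region contains haddr, or -1.
 * Unsigned wrap-around makes haddr < base fail the size check. */
static int ahb_decode(uint32_t haddr) {
    for (int i = 0; i < 4; i++) {
        if (haddr - kSlaves[i].base < kSlaves[i].size) {
            return i;
        }
    }
    return -1;
}

/* Arbiter: fixed priority, Master #1 highest. Bit i of `requests` is set
 * when master i+1 requests the bus; returns the granted index or -1. */
static int ahb_arbitrate(uint32_t requests) {
    for (int i = 0; i < 3; i++) {
        if (requests & (1u << i)) {
            return i;
        }
    }
    return -1;
}
```

In hardware both blocks are combinational logic. An address that matches no region is normally routed to a default slave, which returns an error response.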
AHB basic signal timing
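(Timing diagram: HCLK, HADDR, HWRITE, HWDATA, HRDATA and HREADY for transfers A, B and C. HADDR and HWRITE carry A, B, C in successive address phases; HWDATA and HRDATA carry A, B one cycle later in the data phases, so each data phase overlaps the next transfer's address phase.)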
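The overlap of address and data phases can be modelled cycle by cycle. The C sketch below assumes a single master issuing back-to-back transfers with no IDLE or BUSY cycles. Each transfer's data phase takes one cycle plus any wait states the slave inserts by holding HREADY low. The next transfer's address phase is held over those same cycles. `ahb_timeline` is an illustrative name.

```c
/* Simplified AHB pipelining model (assumed: one master, back-to-back
 * transfers, no IDLE/BUSY, no error/retry responses).
 * Cycle 0 is the address phase of transfer 0. Transfer i's data phase lasts
 * 1 + wait[i] cycles, during which transfer i+1's address phase is held.
 * done_cycle[i] receives the 0-based cycle in which transfer i completes;
 * the return value is the total number of cycles used. */
static unsigned ahb_timeline(const unsigned *wait, unsigned n, unsigned *done_cycle) {
    if (n == 0) {
        return 0;
    }
    unsigned cycle = 0;
    for (unsigned i = 0; i < n; i++) {
        cycle += 1u + wait[i];     /* data phase, stretched by HREADY low */
        done_cycle[i] = cycle;
    }
    return cycle + 1u;             /* cycles are numbered from 0 */
}
```

With no wait states, transfers A, B and C complete in cycles 1, 2 and 3, for four cycles in total. Without the overlap, each transfer would need two cycles, six in total. Two wait states on B push the completion of both B and C back by two cycles.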
Mali200 + GP2 SoC Integration
Shipped as synthesizable Verilog
Mali 200 + GP2 requires a single instance in the SoC, with a small number of connections to be made.
(Diagram: Mali 200 and Mali GP2 with Clock, Reset, IRQs and IDLEs signals and an AXI fabric connection)
Typical GPU SoC Design
(Diagram: ARM1176JZF (D and I ports), Mali 200, Mali GP2 and Mali MMU on a local AXI interconnect; PL390 GIC (Int, nRst); PL111 CLCD; L230; PL340 SDRAMC with DDR PHY; Sys Ctrl on APB.)
Designed and optimised for AMBA: provides easier integration with ARM cores and fabric IP
Unified Memory Architecture
Physical IP*
Classic (180nm to 90nm):
Access to ARM Physical IP
Everything needed to implement a chip
High-quality libraries and memories
DesignStart:
Free access to ARM processor IP
ARM926EJ™ hardened from 180nm to 90nm for major foundry processes
Separate license needed to produce silicon
SoC designs can be done with these models
ARM PIPD Logic Product Families
High Speed (SC12)
ECO Kits (SC9 / SC10 tapped)
Low Power
Low Area
(Chart: product families plotted against process geometry: 180nm, 130nm, 90nm, 65nm, 45nm, 32nm, 28nm)
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
SoC Design
Development Tools
ARM Debug Architecture
Ethernet
Debugger (+ optional trace tools)
Keil Development Tools for ARM
University Resources
http://www.arm.com/support/university/
University@arm.com
Your Future at ARM…
Graduate and Internship/Co-op Opportunities
Engineering: Memory, Validation, Performance, DFT, R&D, GPU and more!
Sales and Marketing: Corporate and Technical
Corporate: IT, Patents, Services (Training and Support), and Human Resources
Keep in Touch!
www.arm.com/about/careers
TI Panda Board
OMAP4430 Processor
1 GHz Dual-core ARM Cortex-A9 (NEON+VFP)
C64x+ DSP
PowerVR SGX 3D GPU
1080p Video Support
POP Memory
1 GB LPDDR2 RAM
USB Powered
< 4W maximum consumption (OMAP is a small % of that)
Many adapter options (car, wall, battery, solar, …)
Project Ideas Using Panda
OS Projects
OS porting to ARM/Cortex (TI OMAP)
MythTV system
“Super-Panda” – stack of Pandas as a compute engine and task distribution
Linux applications
Fin
Nokia N95 Multimedia Computer
OMAP™ 2420 Applications Processor
ARM1136™ processor-based SoC, developed using the Magma® Blast® family and winner of the 2005 INSIGHT Award for ‘Most Innovative SoC’
ST WLAN Solution
Ultra-low power 802.11b/g WLAN chip with ARM9™ processor-based MAC
Targeting community development
$149: personally affordable
> 1000 participants and growing
Wikis, blogs, promotion of community activity
Active & technical community
Freedom to innovate
Addressing open source community needs
Open access to hardware documentation
Instant access to >10 million lines of code
Opportunity to tinker and learn
Free software
Fast, low power, flexible expansion
OMAP3530 Processor
600MHz Cortex-A8
NEON+VFPv3
16KB/16KB L1$
256KB L2$
430MHz C64x+ DSP
32K/32K L1$
48K L1D
32K L2
PowerVR SGX GPU
64K on-chip RAM
POP Memory
128MB LPDDR RAM
256MB NAND flash
Peripheral I/O
DVI-D video out
SD/MMC+
S-Video out
USB 2.0 HS OTG
I2C, I2S, SPI, MMC/SD
JTAG
Stereo in/out
Alternate power
RS-232 serial
USB Powered
2W maximum consumption (OMAP is a small % of that)
Many adapter options: car, wall, battery, solar, …
And more… On-going collaboration at BeagleBoard.org
Live chat via IRC for 24/7 community support
Links to software projects to download
Other Features
4 LEDs: USR0, USR1, PMU_STAT, PWR
2 buttons: USER, RESET
4 boot sources: SD/MMC, NAND flash, USB, Serial
Project Ideas Using Beagle
OS Projects
OS porting to ARM/Cortex (TI OMAP)
MythTV system
“Super-Beagle” – stack of Beagles as a compute engine and task distribution
Linux applications