Adv Comp Arch Q3'11

Uploaded by

auryb
Copyright
© All Rights Reserved

The ARM Architecture

1
Agenda
 Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
SoC Design
Development Tools

2
ARM Ltd
 Founded in November 1990
 Spun out of Acorn Computers
 Initial funding from Apple, Acorn and VLSI

 Designs the ARM range of RISC processor cores
 Licenses ARM core designs to semiconductor partners who fabricate and sell to their customers
 ARM does not fabricate silicon itself

 Also develops technologies to assist with the design-in of the ARM architecture
 Software tools, boards, debug hardware
 Application software
 Bus architectures
 Peripherals, etc

3
ARM’s Activities

[Figure: ARM's activities]
Connected Community
Development Tools
Software IP
Processors
System Level IP: Memory, Data Engines, SoC Fabric, 3D Graphics
Physical IP

4
ARM Connected Community – 700+

5
Huge Range of Applications

[Figure: example products]
IR Fire Detector
Utility Meters
Exercise Machines
Intelligent toys
Intelligent Vending
Energy Efficient Appliances
Tele-parking

Equipment Adopting 32-bit ARM Microcontrollers

6
World’s Smallest ARM Computer?
Wireless Sensor Network
 Battery and solar cells
 Sensors, timers
 Cortex-M0 + 16KB RAM, 65nm
 UWB radio antenna
 10 kB storage memory, ~3fW/bit
 12µAh Li-ion battery
 Processor, SRAM and PMU (figure panels A, B, C)
 Wirelessly networked into large scale sensor arrays
Cortex-M0; 65¢
University of Michigan

7
World’s Largest ARM Computer?

4200 ARM powered Neutrino Detectors

70 bore holes 2.5km deep

60 detectors per string starting 1.5km down

1km³ of active telescope

Work supported by the National Science Foundation and University of Wisconsin-Madison

8
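From 1mm³ to 1km³

[Figure: range of ARM-based systems from 1mm³ (10¢) to 1km³ ($1000)]
Segments shown, upper row: Mobile, Home, Mobile Computing, Server
Segments shown, lower row: Embedded, Consumer, Enterprise, PC, HPC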

9
Agenda
Introduction to ARM Ltd
 ARM Architecture/Programmers Model
Data Path and Pipelines
SoC Design
Development Tools

10
ARM Cortex Processors (v7)

 ARM Cortex-A family (v7-A):
 Applications processors for full OS and 3rd party applications
 Cores shown: Cortex-A15 (x1-4, ...2.5GHz), Cortex-A9 (x1-4), Cortex-A8, Cortex-A5 (x1-4)

 ARM Cortex-R family (v7-R):
 Embedded processors for real-time signal processing, control applications
 Cores shown: Heron (1-2), Cortex-R4

 ARM Cortex-M family (v7-M):
 Microcontroller-oriented processors for MCU and SoC applications
 Cores shown: Cortex-M4, Cortex™-M3, SC300™, Cortex-M1, Cortex-M0 (12k gates...)

11
Relative Performance*
[Chart: Max Frequency (MHz) per processor, axis 0 to 2500]

Processor              Max Freq (MHz)   Min Power (mW/MHz)
Cortex-M0              50               0.012
Cortex-M3              150              0.06
ARM7                   184              0.35
ARM926                 470              0.235
ARM1026                540              0.36
ARM1136                610              0.335
ARM1176                750              0.568
Cortex-A8              1100             0.43
Cortex-A9 Dual-core    2000             0.5

*Represents attainable speeds in 130, 90, 65, or 45nm processes

12
Cortex family
Cortex-A8
 Architecture v7A
 MMU
 AXI
 VFP & NEON support
Cortex-R4
 Architecture v7R
 MPU (optional)
 AXI
 Dual Issue
Cortex-M3
 Architecture v7M
 MPU (optional)
 AHB Lite & APB

13
Cortex-M0 DesignStart

ARM Cortex-M4
“32-bit/DSC” applications
Efficient digital signal control

ARM Cortex-M3
“16/32-bit” applications
Performance efficiency

ARM Cortex-M0
“8/16-bit” applications
Low-cost & simplicity

14
Cortex-M0 DesignStart (2)

ARM Cortex-M0 processor features | Full product options | “M0_DS” implementation
Zero jitter 32-bit RISC core  
AMBA AHB-lite interface  

Minimum usable
ARMv6-M instruction set architecture  
NVIC Interrupt controller  
Interrupt line configurations 1 to 32 16 only
Debug (SWD, JTAG) option 
Up to 4 breakpoints, 2 watchpoints 
Low power optimisations (ACG) 
Multiple power domain support with WIC 
Fast multiplier (1 cycle) option 
System timer  

Area (gates) 12k – 25k 16K

15
ARM and Thumb Performance

[Chart: Dhrystone 2.1/sec @ 20MHz (axis 0 to 30000) for ARM and Thumb code, plotted against memory width (zero wait state): 32-bit, 16-bit, and 16-bit with 32-bit stack]

16
The Thumb-2 instruction set
 Variable-length instructions
 ARM instructions are a fixed length of 32 bits
 Thumb instructions are a fixed length of 16 bits
 Thumb-2 instructions can be either 16-bit or 32-bit

 Thumb-2 gives approximately 26% improvement in code density over ARM

 Thumb-2 gives approximately 25% improvement in performance over Thumb
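
A minimal sketch of how a decoder can tell 16-bit from 32-bit encodings in a Thumb-2 instruction stream. It relies on the architectural rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction. The function names are illustrative only, and the halfword list is assumed to be well formed.

```python
def thumb2_instruction_length(first_halfword: int) -> int:
    """Return the instruction length in bytes (2 or 4) from its first halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    # These three prefixes mark the first half of a 32-bit encoding.
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_instructions(halfwords):
    """Group a list of 16-bit halfwords into 1- or 2-halfword instructions."""
    instructions = []
    i = 0
    while i < len(halfwords):
        count = thumb2_instruction_length(halfwords[i]) // 2
        instructions.append(tuple(halfwords[i:i + count]))
        i += count
    return instructions


if __name__ == "__main__":
    stream = [0x4608, 0xF000, 0xF800, 0xE7FE]
    for instr in split_instructions(stream):
        print([hex(h) for h in instr])
```

Mixing the two lengths in one stream is what lets Thumb-2 keep most of the density of 16-bit code while still reaching the wider encodings when needed.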

17
Processor Modes
 The ARM has seven basic operating modes:

 User : unprivileged mode under which most tasks run

 FIQ : entered when a high priority (fast) interrupt is raised

 IRQ : entered when a low priority (normal) interrupt is raised

 Supervisor : entered on reset and when a Software Interrupt instruction is executed

 Abort : used to handle memory access violations

 Undef : used to handle undefined instructions

 System : privileged mode using the same registers as user mode
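
As an illustration, the current mode can be read from the low five bits of the CPSR. The sketch below uses the architectural mode encodings for the seven modes listed above; the helper names `mode_name` and `is_privileged` are made up for this example.

```python
# CPSR[4:0] mode field encodings for the seven basic modes.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undef",
    0b11111: "System",
}


def mode_name(cpsr: int) -> str:
    """Return the mode name encoded in a CPSR value, or 'unknown'."""
    return MODES.get(cpsr & 0x1F, "unknown")


def is_privileged(cpsr: int) -> bool:
    """Every listed mode except User is privileged."""
    name = mode_name(cpsr)
    return name not in ("User", "unknown")


if __name__ == "__main__":
    for value in (0x00000010, 0x600000D3, 0x0000001F):
        print(hex(value), mode_name(value), is_privileged(value))
```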

18
The ARM Register Set

Current Visible Registers (User Mode)
r0 to r12, r13 (sp), r14 (lr), r15 (pc), cpsr

Banked out Registers
Mode     Banked registers
FIQ      r8 to r12, r13 (sp), r14 (lr), spsr
IRQ      r13 (sp), r14 (lr), spsr
SVC      r13 (sp), r14 (lr), spsr
Undef    r13 (sp), r14 (lr), spsr
Abort    r13 (sp), r14 (lr), spsr
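
The banking scheme can be sketched as a lookup from (mode, register number) to a physical register. This is an illustrative model only: the `rN_mode` names follow the usual ARM naming convention, and the table and functions are assumptions made for this example.

```python
# Register numbers that have a private (banked) copy in each mode.
BANKED = {
    "User": set(),
    "System": set(),              # System shares the User registers
    "FIQ": {8, 9, 10, 11, 12, 13, 14},
    "IRQ": {13, 14},
    "SVC": {13, 14},
    "Undef": {13, 14},
    "Abort": {13, 14},
}


def physical_register(mode: str, n: int) -> str:
    """Name the physical register seen as r<n> while in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    if n in BANKED[mode]:
        return f"r{n}_{mode.lower()}"
    return f"r{n}"


def total_registers() -> int:
    """Count distinct physical registers, including cpsr and the spsr copies."""
    general = {physical_register(m, n) for m in BANKED for n in range(16)}
    spsr_copies = sum(1 for m in BANKED if m not in ("User", "System"))
    return len(general) + 1 + spsr_copies  # +1 for cpsr


if __name__ == "__main__":
    print(physical_register("FIQ", 8), physical_register("IRQ", 8))
    print(total_registers())
```

Because FIQ banks r8 to r12 as well, a fast interrupt handler can use those registers without saving them first.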

19
Exception Handling

 When an exception occurs, the ARM:
 Copies CPSR into SPSR_<mode>
 Sets appropriate CPSR bits
 Change to ARM state
 Change to exception mode
 Disable interrupts (if appropriate)
 Stores the return address in LR_<mode>
 Sets PC to vector address

 To return, exception handler needs to:
 Restore CPSR from SPSR_<mode>
 Restore PC from LR_<mode>
This can only be done in ARM state.

Vector Table
0x1C FIQ
0x18 IRQ
0x14 (Reserved)
0x10 Data Abort
0x0C Prefetch Abort
0x08 Software Interrupt
0x04 Undefined Instruction
0x00 Reset
Vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices
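
The entry and return sequences on this slide can be sketched as a small simulation. This is a simplified model under stated assumptions: the `Cpu` class and function names are invented for the example, the caller supplies the return address (the real LR offset depends on the exception type), and Reset is left out because no state is preserved on reset.

```python
T_BIT, F_BIT, I_BIT = 1 << 5, 1 << 6, 1 << 7
MODE_MASK = 0x1F
MODE_BITS = {"fiq": 0b10001, "irq": 0b10010, "svc": 0b10011,
             "abt": 0b10111, "und": 0b11011}
MODE_NAMES = {bits: name for name, bits in MODE_BITS.items()}
VECTORS = {"Undefined Instruction": 0x04, "Software Interrupt": 0x08,
           "Prefetch Abort": 0x0C, "Data Abort": 0x10,
           "IRQ": 0x18, "FIQ": 0x1C}
EXCEPTION_MODE = {"Undefined Instruction": "und", "Software Interrupt": "svc",
                  "Prefetch Abort": "abt", "Data Abort": "abt",
                  "IRQ": "irq", "FIQ": "fiq"}


class Cpu:
    def __init__(self, cpsr, pc):
        self.cpsr, self.pc = cpsr, pc
        self.spsr, self.lr = {}, {}   # banked copies, keyed by mode


def take_exception(cpu, kind, return_address, high_vectors=False):
    mode = EXCEPTION_MODE[kind]
    cpu.spsr[mode] = cpu.cpsr                        # CPSR -> SPSR_<mode>
    cpsr = cpu.cpsr & ~(MODE_MASK | T_BIT) & 0xFFFFFFFF  # ARM state
    cpsr |= MODE_BITS[mode] | I_BIT                  # new mode, IRQs off
    if kind == "FIQ":
        cpsr |= F_BIT                                # FIQ also masks FIQs
    cpu.cpsr = cpsr
    cpu.lr[mode] = return_address                    # LR_<mode>
    base = 0xFFFF0000 if high_vectors else 0x00000000
    cpu.pc = base + VECTORS[kind]                    # PC -> vector address


def return_from_exception(cpu):
    mode = MODE_NAMES[cpu.cpsr & MODE_MASK]
    cpu.pc = cpu.lr[mode]                            # PC <- LR_<mode>
    cpu.cpsr = cpu.spsr[mode]                        # CPSR <- SPSR_<mode>


if __name__ == "__main__":
    cpu = Cpu(cpsr=0x10, pc=0x8000)
    take_exception(cpu, "IRQ", return_address=0x8004)
    print(hex(cpu.pc), hex(cpu.cpsr))
    return_from_exception(cpu)
    print(hex(cpu.pc), hex(cpu.cpsr))
```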

20
Cortex-M3 Programmer’s Model
 Fully programmable in C

 Stack-based exception model

 Only two processor modes
 Thread Mode for User tasks
 Handler Mode for OS tasks and exceptions

 Vector table contains addresses

[Figure: register set]
r0 to r12, sp (Main), sp (Process), lr, r15 (pc), xPSR
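
A rough sketch of what "stack-based exception model" and "vector table contains addresses" mean in practice: on exception entry the core pushes eight registers onto the current stack and loads the handler address from the vector table, indexed by exception number. The model below is simplified (it ignores stack alignment adjustment and the EXC_RETURN value placed in lr), and every name in it is an assumption made for illustration.

```python
# Order of the stacked frame, from the lowest address upwards.
STACKED = ["r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr"]


def exception_entry(regs, sp, memory, vector_table, exception_number):
    """Push the frame and return (new_sp, handler_address)."""
    sp -= 4 * len(STACKED)
    for i, name in enumerate(STACKED):
        memory[sp + 4 * i] = regs[name]
    # Table entries are addresses; bit 0 only marks Thumb state.
    handler = vector_table[exception_number] & ~1
    return sp, handler


def exception_return(sp, memory):
    """Pop the frame and return (restored_registers, new_sp)."""
    restored = {name: memory[sp + 4 * i] for i, name in enumerate(STACKED)}
    return restored, sp + 4 * len(STACKED)


if __name__ == "__main__":
    regs = {"r0": 1, "r1": 2, "r2": 3, "r3": 4, "r12": 5,
            "lr": 0x101, "pc": 0x200, "xpsr": 0x01000000}
    table = [0x20001000, 0x101] + [0x301] * 14   # hypothetical contents
    memory = {}
    sp, handler = exception_entry(regs, 0x20001000, memory, table, 15)
    print(hex(sp), hex(handler))
```

Since the hardware saves and restores the caller-saved registers itself, a handler can be an ordinary C function, which is what makes the core "fully programmable in C".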

21
Conditional Execution and Flags
 ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
 This improves code density and performance by reducing the number of forward branch instructions.

        CMP   r3,#0                 CMP   r3,#0
        BEQ   skip                  ADDNE r0,r1,r2
        ADD   r0,r1,r2
skip

 By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using “S”. CMP does not need “S”.

loop
        SUBS  r1,r1,#1      ; decrement r1 and set flags
        BNE   loop          ; if Z flag clear then branch
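
The two fragments above can be mimicked with a small model of the N, Z, C and V flags and the standard condition codes. This is a sketch for illustration; `sub_flags`, `cond_passed` and the two wrapper functions are names invented here, not part of any ARM tool.

```python
from collections import namedtuple

MASK = 0xFFFFFFFF
Flags = namedtuple("Flags", "n z c v")

CONDITIONS = {
    "EQ": lambda f: f.z,            "NE": lambda f: not f.z,
    "CS": lambda f: f.c,            "CC": lambda f: not f.c,
    "MI": lambda f: f.n,            "PL": lambda f: not f.n,
    "VS": lambda f: f.v,            "VC": lambda f: not f.v,
    "HI": lambda f: f.c and not f.z,
    "LS": lambda f: (not f.c) or f.z,
    "GE": lambda f: f.n == f.v,     "LT": lambda f: f.n != f.v,
    "GT": lambda f: (not f.z) and f.n == f.v,
    "LE": lambda f: f.z or f.n != f.v,
    "AL": lambda f: True,
}


def cond_passed(cond, flags):
    return bool(CONDITIONS[cond](flags))


def sub_flags(a, b):
    """32-bit a - b, with the flags CMP or SUBS would produce."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                          # C set means no borrow
    v = ((a ^ b) & (a ^ result)) >> 31       # signed overflow
    return result, Flags(n, z, c, v)


def add_if_r3_nonzero(r0, r1, r2, r3):
    _, flags = sub_flags(r3, 0)              # CMP   r3,#0
    if cond_passed("NE", flags):             # ADDNE r0,r1,r2
        r0 = (r1 + r2) & MASK
    return r0


def countdown(r1):
    iterations = 0
    while True:
        r1, flags = sub_flags(r1, 1)         # SUBS r1,r1,#1
        iterations += 1
        if not cond_passed("NE", flags):     # BNE  loop
            return iterations


if __name__ == "__main__":
    print(add_if_r3_nonzero(7, 1, 2, 0), add_if_r3_nonzero(7, 1, 2, 5))
    print(countdown(3))
```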

22
Data processing Instructions
 Consist of :
 Arithmetic: ADD ADC SUB SBC RSB RSC
 Logical: AND ORR EOR BIC
 Comparisons: CMP CMN TST TEQ
 Data movement: MOV MVN

 These instructions only work on registers, NOT memory.


 Syntax:

<Operation>{<cond>}{S} Rd, Rn, Operand2

 Comparisons set flags only - they do not specify Rd


 Data movement does not specify Rn
 Second operand is sent to the ALU via barrel shifter.
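
To make the instruction classes concrete, here is a minimal model of the data processing operations and the flags they would set when the "S" suffix is used. It is a sketch under assumptions: Operand2 is taken as an already-shifted 32-bit value, logical operations leave C and V as passed in (the real C flag comes from the barrel shifter carry-out), and all names are illustrative.

```python
from collections import namedtuple

MASK = 0xFFFFFFFF
Result = namedtuple("Result", "value writes_rd n z c v")

# Comparisons behave like another operation whose result is discarded.
COMPARISONS = {"CMP": "SUB", "CMN": "ADD", "TST": "AND", "TEQ": "EOR"}


def add_with_carry(x, y, carry_in):
    total = x + y + carry_in
    result = total & MASK
    carry_out = total >> 32
    overflow = ((~(x ^ y)) & (x ^ result)) >> 31 & 1
    return result, carry_out, overflow


def data_processing(op, rn, operand2, carry=0, overflow=0):
    rn &= MASK
    operand2 &= MASK
    writes_rd = op not in COMPARISONS
    op = COMPARISONS.get(op, op)
    arithmetic = {
        "ADD": (rn, operand2, 0),
        "ADC": (rn, operand2, carry),
        "SUB": (rn, ~operand2 & MASK, 1),
        "SBC": (rn, ~operand2 & MASK, carry),
        "RSB": (operand2, ~rn & MASK, 1),
        "RSC": (operand2, ~rn & MASK, carry),
    }
    logical = {
        "AND": rn & operand2,
        "ORR": rn | operand2,
        "EOR": rn ^ operand2,
        "BIC": rn & ~operand2 & MASK,
        "MOV": operand2,              # data movement ignores Rn
        "MVN": ~operand2 & MASK,
    }
    if op in arithmetic:
        value, c, v = add_with_carry(*arithmetic[op])
    else:
        value, c, v = logical[op], carry, overflow
    return Result(value, writes_rd, value >> 31, int(value == 0), c, v)


if __name__ == "__main__":
    print(data_processing("SUB", 5, 5))
    print(data_processing("CMP", 0, 1))
    print(data_processing("ADD", 0x7FFFFFFF, 1))
```

Subtraction is expressed as addition of the inverted operand plus one, which is why the C flag after SUB or CMP means "no borrow".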

23
Using a Barrel Shifter: The 2nd Operand

[Figure: Operand 1 feeds the ALU directly; Operand 2 passes through the Barrel Shifter before the ALU; the ALU produces the Result]

Register, optionally with shift operation
 Shift value can be either:
 5 bit unsigned integer
 Specified in bottom byte of another register.
 Used for multiplication by constant

Immediate value
 8 bit number, with a range of 0-255.
 Rotated right through even number of positions
 Allows increased range of 32-bit constants to be loaded directly into registers
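
The immediate rule (an 8-bit value rotated right by an even number of positions) can be checked with a short search. This sketch returns the first encoding found; the function names are made up for the example.

```python
MASK = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK


def encode_immediate(constant):
    """Return (imm8, rotate_right) such that ror32(imm8, rotate_right)
    equals constant, or None if no such encoding exists."""
    constant &= MASK
    for rotation in range(0, 32, 2):
        # Rotating left by `rotation` undoes a rotate right by `rotation`.
        imm8 = ror32(constant, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None


if __name__ == "__main__":
    for value in (0xFF, 0x104, 0xFF000000, 0x101, 0x102):
        print(hex(value), encode_immediate(value))
```

Constants such as 0x101 or 0x102 have no encoding, so a compiler or assembler has to build them another way, for example with two instructions or a literal load.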

24
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
 Data Path and Pipelines
SoC Design
Development Tools

25
The ARM7TDM Core
[Block diagram: ARM7TDM core]
Blocks: Address Register, Address Incrementer, Register Bank, Multiplier, Barrel Shifter, 32 Bit ALU, Write Data Register, Read Data Register, Instruction Decoder, Decode Stage, Instruction Decompression, Control Logic
Internal buses: A bus, B bus, ALU bus, PC bus, Incrementer bus, PC Update
Address and data: A[31:0] (enable ABE), D[31:0] (enable DBE)
Control signals: BIGEND, MCLK, nWAIT, nRW, MAS[1:0], ISYNC, nIRQ, nFIQ, nRESET, ABORT, nTRANS, nMREQ, SEQ, LOCK, nM[4:0], nOPC, nCPI, CPA, CPB

26
Cortex-M3 Datapath
[Block diagram: Cortex-M3 datapath]
Blocks: Instruction Decode, Register Bank, Barrel Shifter, ALU, Mul/Div, Address Register, Address Incrementer, Read Data Register, Write Data Register, Writeback
Signals: I_HADDR, I_HRDATA, D_HADDR, D_HRDATA, D_HWDATA, INTADDR

27
Pipeline changes for ARM9TDMI

ARM7TDMI
 FETCH: Instruction Fetch
 DECODE: Thumb to ARM decompress, ARM decode, Reg Select
 EXECUTE: Reg Read, Shift, ALU, Reg Write

ARM9TDMI
 FETCH: Instruction Fetch
 DECODE: ARM or Thumb Inst Decode, Reg Decode, Reg Read
 EXECUTE: Shift + ALU
 MEMORY: Memory Access
 WRITE: Reg Write
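
As a back-of-the-envelope illustration, an ideal pipeline with no stalls, branches or interlocks completes N instructions in (stages + N - 1) cycles. The sketch below applies that formula to the two stage lists above; it is not a model of real ARM7TDMI or ARM9TDMI timing, and the function names are invented for the example.

```python
ARM7_STAGES = ["FETCH", "DECODE", "EXECUTE"]
ARM9_STAGES = ["FETCH", "DECODE", "EXECUTE", "MEMORY", "WRITE"]


def ideal_cycles(stages, instruction_count):
    """Cycles to finish a stall-free instruction sequence."""
    if instruction_count == 0:
        return 0
    return len(stages) + instruction_count - 1


def stage_of(stages, instruction_index, cycle):
    """Stage occupied by an instruction (0-based) in a cycle (0-based),
    or None if it has not entered or has already left the pipeline."""
    position = cycle - instruction_index
    if 0 <= position < len(stages):
        return stages[position]
    return None


if __name__ == "__main__":
    for name, stages in (("ARM7TDMI", ARM7_STAGES), ("ARM9TDMI", ARM9_STAGES)):
        print(name, ideal_cycles(stages, 10), "cycles for 10 instructions")
    print([stage_of(ARM9_STAGES, i, 2) for i in range(4)])
```

Both pipelines retire at most one instruction per cycle once full; splitting the work into five shorter stages is what allows a higher clock frequency, at the cost of a longer fill time.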

28
Cortex-M3 Pipeline
 Cortex-M3 has 3-stage fetch-decode-execute pipeline
 Similar to ARM7
 Cortex-M3 does more in each stage to increase overall performance

[Figure: Cortex-M3 pipeline stages]
 1st Stage - Fetch: Instruction Fetch (Prefetch)
 2nd Stage - Decode: Instruction Decode & Register Read, AGU, Branch
 3rd Stage - Execute: Address Phase & Write Back, Data Phase Load/Store & Branch, Multiply & Divide, Shift, ALU & Branch, Write
 Branch forwarding & speculation
 Execute stage branch (ALU branch & Load Store Branch)

29
ARM10 vs. ARM11 Pipelines
ARM10
 FETCH: Branch Prediction, Instruction Fetch
 ISSUE: ARM or Thumb Instruction Decode
 DECODE: Reg Read
 EXECUTE: Shift + ALU, Multiply
 MEMORY: Memory Access, Multiply Add
 WRITE: Reg Write

ARM11
 Fetch 1, Fetch 2, Decode, Issue, then one of three parallel paths:
 Shift, ALU, Saturate
 MAC 1, MAC 2, MAC 3
 Address, Data Cache 1, Data Cache 2
 Write back

30
Full Cortex-A8 Pipeline Diagram
13-Stage Integer Pipeline 10-Stage NEON Pipeline

Architectural register file

NEON register file


31
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
 SoC Design
Development Tools

32
An Example AMBA System

[Figure: example AMBA system]
AHB: High Performance ARM processor, High Bandwidth External Memory Interface, High-bandwidth on-chip RAM, DMA Bus Master
APB Bridge: connects AHB to APB
APB: UART, Timer, Keypad, PIO

AHB
 High Performance
 Pipelined
 Burst Support
 Multiple Bus Masters
APB
 Low Power
 Non-pipelined
 Simple Interface

33
AHB Structure

[Figure: AHB structure]
Blocks: Arbiter, Master #1 to Master #3, Slave #1 to Slave #4, Decoder
Multiplexed paths: Address/Control (HADDR) and Write Data (HWDATA) from the masters to the slaves; Read Data (HRDATA) from the slaves back to the masters

34
AHB basic signal timing

[Timing diagram: signals shown against HCLK]
Phases: Address Phase A, then Data Phase A overlapping Address Phase B, then Data Phase B
HADDR: A, B, C
HWRITE: A, B, C
HWDATA: A, B
HRDATA: A, B
HRESP: OKAY A, OKAY B
HREADY
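
The overlap of one transfer's data phase with the next transfer's address phase can be sketched cycle by cycle. This is a simplified model, assuming back-to-back transfers from one master; the optional `wait_states` argument (a hypothetical parameter of this example) shows how a slave holding HREADY low stretches both the data phase and the overlapping address phase.

```python
def ahb_schedule(transfers, wait_states=None):
    """Return one (address_phase, data_phase, hready) tuple per HCLK cycle."""
    wait_states = wait_states or {}
    if not transfers:
        return []
    # First cycle: only the address phase of the first transfer.
    rows = [(transfers[0], None, True)]
    for i, current in enumerate(transfers):
        following = transfers[i + 1] if i + 1 < len(transfers) else None
        waits = wait_states.get(current, 0)
        for k in range(waits + 1):
            # HREADY is low during wait states and high in the final cycle.
            rows.append((following, current, k == waits))
    return rows


if __name__ == "__main__":
    for cycle, row in enumerate(ahb_schedule(["A", "B", "C"], {"A": 1})):
        print(cycle, row)
```

With no wait states, three transfers complete in four cycles, which is the pipelining benefit listed for AHB on the earlier slide.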

35
Mali200 + GP2 SoC Integration

 Shipped as synthesizable Verilog

 Mali 200 + GP2 requires a single instance in the SoC, with a small number of connections to be made.

 IDLEs can be used for gating the Mali200 and GP2 core clock

[Figure: Mali 200 and Mali GP2 block with its connections: Clock, Reset, IRQs, IDLEs, AXI Fabric, APB, AXI MMU]

36
Typical GPU SoC Design
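[Block diagram: typical GPU SoC]
Blocks: ARM1176JZF (D and I ports), PL390 GIC, Mali 200, Mali GP2, Mali MMU, L230, Local AXI Interconnect, CLCD PL111, Sys Ctrl, SDRAMC PL340, DDR PHY, PL301 High-performance matrix, APB Peripheral Sub-System
Signals and ports: Int, nRst, APB, M (master) and S (slave) ports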

 Designed and optimised for AMBA: provides easier integration with ARM cores and fabric IP
 Unified Memory Architecture

37
Physical IP*
 Classic (180nm to 90nm): Access to ARM Physical IP
 Everything needed to implement a chip
 High-quality libraries and memories

 DesignStart: Free access to ARM processor IP
 ARM926EJ™ hardened from 180nm to 90nm for major foundry processes
 Separate license needed to produce silicon
 SoC designs can be done with these models

* Material is currently limited to research programs

38
ARM PIPD Logic Product Families
[Chart: logic product families plotted against process geometry (180nm, 130nm, 90nm, 65nm, 45nm, 32nm, 28nm), ranging from High Speed to Low Power / Low Area]
 High Performance (Advantage-HS / SAGE-HS) (SC12)
 High Density (Advantage™), High Density (SAGE-X™) (SC9 / SC10, SC9 / SC10 tapped)
 Ultra High Density (Metro™) (SC7 / SC8)
 Add-ons shown: Power Management Kits, ECO Kits

39
Agenda
Introduction to ARM Ltd
ARM Architecture/Programmers Model
Data Path and Pipelines
SoC Design
 Development Tools

40
ARM Debug Architecture
 EmbeddedICE Logic
 Provides breakpoints and processor/system access
 JTAG interface (ICE)
 Converts debugger commands to JTAG signals
 Embedded trace Macrocell (ETM)
 Compresses real-time instruction and data access trace
 Contains ICE features (trigger & filter logic)
 Trace port analyzer (TPA)
 Captures trace in a deep buffer

[Figure: Debugger (+ optional trace tools) connected over Ethernet to the JTAG port and Trace Port; on chip: TAP controller, ETM, EmbeddedICE Logic, ARM core]

41
Keil Development Tools for ARM

 Includes ARM macro assembler, compilers (ARM RealView C/C++ Compiler, Keil CARM Compiler, or GNU compiler), ARM linker, Keil uVision Debugger and Keil uVision IDE
 Keil uVision Debugger accurately simulates on-chip peripherals (I2C, CAN,
UART, SPI, Interrupts, I/O Ports, A/D and D/A converters, PWM, etc.)
 Evaluation Limitations
 16K byte object code + 16K data limitation
 Some linker restrictions such as base addresses for code/constants
 GNU tools provided are not restricted in any way
 http://www.keil.com/demo/

42
Keil Development Tools for ARM

43
University Resources

 http://www.arm.com/support/university/

 University@arm.com

44
Your Future at ARM…
 Graduate and Internship/Co-op Opportunities
 Engineering: Memory, Validation, Performance, DFT, R&D, GPU and more!
 Sales and Marketing: Corporate and Technical
 Corporate: IT, Patents, Services (Training and Support), and Human Resources

 Incredible Culture and Comprehensive Benefit Package
 Competitive Reward
 Work/Life Balance
 Personal Development
 Brilliant Minds and Innovative Solutions

 Keep in Touch!
 www.arm.com/about/careers

45
TI Panda Board
OMAP4430 Processor
 1 GHz Dual-core ARM Cortex-A9 (NEON+VFP)
 C64x+ DSP
 PowerVR SGX 3D GPU
 1080p Video Support

POP Memory
 1 GB LPDDR2 RAM

USB Powered
 < 4W max consumption (OMAP small % of that)
 Many adapter options (Car, wall, battery, solar, ..)

46
Project Ideas Using Panda
 OS Projects
 OS porting to ARM/Cortex (TI OMAP)
 MythTV system
 “Super-Panda” – stack of Pandas as compute engine and task distribution
 Linux applications

 NEON Optimization Projects


 Codec optimization in ffmpeg (pick your favorite codec)
 Voice and image recognition
 Open-source Flash player optimizations (swfdec)

47
Fin

48
Nokia N95 Multimedia Computer
OMAP™ 2420 Applications Processor
ARM1136™ processor-based SoC, developed using Magma® Blast® family and winner of 2005 INSIGHT Award for ‘Most Innovative SoC’

Symbian OS™ v9.2
Operating System supporting ARM processor-based mobile devices, developed using ARM® RealView® Compilation Tools

S60™ 3rd Edition
S60 Platform supporting ARM processor-based mobile devices

Mobiclip™ Video Codec
Software video codec for ARM processor-based mobile devices

ST WLAN Solution
Ultra-low power 802.11b/g WLAN chip with ARM9™ processor-based MAC

Connect. Collaborate. Create.


49
Beagle Board

50
Targeting community development
 $149: personally affordable
 > 1000 participants and growing: active & technical community
 Wikis, blogs, promotion of community activity
 Freedom to innovate
 Addressing open source community needs
 Open access to hardware documentation
 Instant access to >10 million lines of code
 Opportunity to tinker and learn
 Free software

51
Fast, low power, flexible expansion
OMAP3530 Processor
 600MHz Cortex-A8
 NEON+VFPv3
 16KB/16KB L1$
 256KB L2$
 430MHz C64x+ DSP
 32K/32K L1$
 48K L1D
 32K L2
 PowerVR SGX GPU
 64K on-chip RAM

POP Memory
 128MB LPDDR RAM
 256MB NAND flash

Peripheral I/O
 DVI-D video out
 SD/MMC+
 S-Video out
 USB 2.0 HS OTG
 I2C, I2S, SPI, MMC/SD
 JTAG
 Stereo in/out
 Alternate power
 RS-232 serial

USB Powered
 2W maximum consumption
 OMAP is small % of that
 Many adapter options
 Car, wall, battery, solar, …

[Figure: board photo with 3” dimension marker]
52
And more…

On-going collaboration at BeagleBoard.org
 Live chat via IRC for 24/7 community support
 Links to software projects to download

Other Features
 4 LEDs
 USR0
 USR1
 PMU_STAT
 PWR
 2 buttons
 USER
 RESET
 4 boot sources
 SD/MMC
 NAND flash
 USB
 Serial

Peripheral I/O
 DVI-D video out
 SD/MMC+
 S-Video out
 USB HS OTG
 I2C, I2S, SPI, MMC/SD
 JTAG
 Stereo in/out
 Alternate power
 RS-232 serial

[Figure: board photo with 3” dimension marker]

53
Project Ideas Using Beagle
 OS Projects
 OS porting to ARM/Cortex (TI OMAP)
 MythTV system
 “Super-Beagle” – stack of Beagles as compute engine and task distribution
 Linux applications

 NEON Optimization Projects


 Codec optimization in ffmpeg (pick your favorite codec)
 Voice and image recognition
 Open-source Flash player optimizations (swfdec)

54
