Efficient Embedded Systems
• Arm is committed to making the language we use inclusive, meaningful, and respectful.
Our goal is to remove and replace non-inclusive language from our vocabulary to reflect
our values and represent our global ecosystem.
• Arm is working actively with our partners, standards bodies, and the wider ecosystem to
adopt a consistent approach to the use of inclusive language and to eradicate and
replace offensive terms. We recognise that this will take time. This course contains
references to non-inclusive language; it will be updated with newer terms as those
terms are agreed and ratified with the wider community.
• Contact us at education@arm.com with questions or comments about this course. You
can also report non-inclusive and offensive terminology usage in Arm content at
terms@arm.com.
1 © 2021 Arm
Introduction to Embedded Systems Design
Learning Objectives
At the end of this lecture, you should be able to:
• Outline what is meant by embedded systems, give examples of their applications, and state
their benefits.
• Describe the attributes of embedded systems.
• Outline the constraints on embedded systems including their impact.
Outline
• Introduction to Embedded Systems
• Example Embedded Systems:
– Bike Computer
– Motor Control Unit
– Gasoline automobile engine control unit
• Options for Building Embedded Systems
• Benefits of embedded systems
• Embedded System Functions
• Attributes of embedded systems
• MCU Hardware & Software for Concurrency
• Embedded System Constraints and their impacts
Introduction
• What is an Embedded System?
• Application-specific computer system
• Built into a larger system
• Why add a computer to the larger system?
• Better performance
• More functions and features
• Lower cost
• More dependability
• Economics
• Microcontrollers (used for embedded computers) are high-volume, so recurring cost is low
• Nonrecurring cost dominated by software development
• Networks
• Often an embedded system will use multiple processors communicating across a network to
lower parts and assembly costs and improve reliability
Introduction to embedded systems
• What is an embedded system?
• Application-specific computer system
• Built into a larger system
• Often with real-time computing constraints
• Why add an embedded computer to a larger system?
• Better performance
• More functions and features
• Lower cost, e.g., through automation
• More dependability
[Figure: an embedded system containing an embedded computer (software and hardware) that takes input from the environment and sends output to the environment]
Example embedded system: bike computer
• Functions
• Speed and distance measurement
• Constraints
• Size
• Cost
• Power and energy
• Weight
• Inputs
• Wheel rotation indicator
• Mode key
• Output
• Liquid Crystal Display
• Use Low Performance Microcontroller
• 8-bit, 10 MIPS
[Figure: bike computer. Input: wheel rotation, mode key. Output: display speed and distance]
Motor Control Unit
• Functions
• Motor control
• System communications
• Current monitoring
• Rotation speed detection
• Constraints
• Reliability in harsh environment
• Cost
• Weight
• Microcontroller: 32-bit, 256 KB flash memory, 80 MHz
Gasoline automobile engine control unit
• Functions
• Fuel injection
• Air intake setting
• Spark timing
• Exhaust gas recirculation
• Electronic throttle control
• Knock control
• Constraints
• Reliability in harsh environment
• Cost
• Weight
• Many inputs and outputs
• Discrete sensors & actuators
• Network interface to rest of car
• Use high-performance microcontroller
• E.g. 32-bit, 3 MB flash memory, 150-300 MHz
Options for Building Embedded Systems
Implementation | Design Cost | Unit Cost | Upgrades & Bug Fixes | Size | Weight | Power | System Speed
Dedicated Hardware
ASIC | high ($500K/mask set) | very low | hard | tiny - 1 die | very low | low | extremely fast
Programmable logic – FPGA, PLD | low | mid | easy | small | low | medium to high | very fast
Generic Hardware
… memory + peripherals | [cells lost except "med." and "moderate"]
Microcontroller (int. memory & peripherals) | low | mid to low | easy | small | low | medium | slow to moderate
Benefits of embedded systems
• Greater performance and efficiency
• Software makes it possible to provide sophisticated controls
• Lower costs
• Less expensive components can be used
• Manufacturing costs reduced
• Operating costs reduced
• Maintenance costs reduced
• More features
• Many not possible or practical with other approaches
• Better dependability
• Adaptive system which can compensate for failures
• Better diagnostics to improve repair time
Embedded System Functions
• Closed-loop control system
• Monitor a process, adjust an output to maintain desired set point (temperature, speed, direction, etc.)
• Sequencing
• Step through different stages based on environment and system
• Signal processing
• Remove noise, select desired signal features
• Communications and networking
• Exchange information reliably and quickly
Attributes of embedded systems
• Interfacing with larger system and environment
• Analog signals for reading sensors
– Typically use a voltage to represent a physical value
• Power electronics for driving motors, solenoids
• Digital interfaces for communicating with other digital devices
– Simple – switches
– Complex – displays
• Concurrent, reactive behaviors
• Must respond to sequences and combinations of events
• Real-time systems have deadlines on responses
• Typically must perform multiple separate activities concurrently
Attributes of embedded systems
• Fault handling
• Many systems must operate independently for long periods of time, requiring them to handle
likely faults without crashing
• Often fault-handling code is larger and more complex than the normal-case code
• Diagnostics
• Help service personnel determine problems quickly
Example Analog Sensor - Depth Gauge
[Figure: a pressure sensor outputs voltage V_sensor to an analog-to-digital converter (ADC) referenced to V_ref; the ADC outputs ADC_Code]
// Your software
ADC_Code = adc_read();
V_sensor = ADC_Code * V_ref / ADC_MASK;
Pressure_kPa = 250 * (V_sensor / V_supply + 0.04);
Depth_ft = 33 * (Pressure_kPa - Atmos_Press_kPa) / 101.3;
[Figure: ADC transfer function mapping input voltages from Ground to V_ref onto output codes 000..000 to 111..111]
[Figure: Typical Absolute Pressure vs. Output; x-axis Pressure [kPa] from 0 to 260, y-axis from 0.0 to 3.0]
• The ADC output code depends on V_sensor and V_ref
• Code can convert that integer to something more useful
• first a float representing the voltage,
• then another float representing pressure,
• finally another float representing depth
Microcontroller vs. Microprocessor
• Both have a CPU core to execute instructions
• Microcontroller has peripherals for concurrent embedded interfacing and control
• Analog
• Non-logic level signals
• Timing
• Clock generators
• Communications
• Reliability and safety
CPUs → MCUs → embedded systems
• Microprocessor (CPU)
• Defined typically as a single processor core that supports at least instruction fetching, decoding, and
executing
• Normally can be used for general-purpose computing, but needs to be supported with memories
and Inputs/Outputs (I/Os)
[Figure: microprocessor containing register banks and an ALU]
CPUs → MCUs → embedded systems
• Microcontroller (MCU)
• Typically has a single processor core
• Has memory blocks, Digital IOs, Analog IOs, and other basic peripherals
• Typically used for basic control purpose, such as embedded applications
[Figure: microcontroller in which a microprocessor, program memory, and data memory are connected by a system bus]
CPUs → MCUs → embedded systems
• Embedded System
• Typically implemented using MCUs
• Often integrated into a larger mechanical or electrical system
• Usually has real-time constraints
MCU Hardware & Software for Concurrency
• CPU executes instructions from one or more threads of execution
• Specialized hardware peripherals add dedicated concurrent processing
• Watchdog timer
• Analog interfacing
• Timers
• Communications with other devices
• Detecting external signal events
• LCD driver
• Peripherals use interrupts to notify CPU of events
[Figure: Cortex-M core and interrupt controller connected by a peripheral bus to Timers, ADC, GPIO, UART, and I2C]
Concurrent Hardware & Software Operation
[Figure: timeline with columns Main (software), Timer Peripheral (hardware), Timer ISR (software), A/D Converter Peripheral (hardware), and ADC ISR (software). Main starts the timer; each timer interrupt leads the Timer ISR to start the ADC; a series of ADC interrupts follows, ending with ADC_done = 1. The pattern repeats three times.]
• Embedded systems rely on both MCU hardware peripherals and software to get
everything done on time
Constraints
• Cost
• Competitive markets penalize products which don’t deliver adequate value for the cost
• Environment
• Temperatures may range from -40°C to 125°C, or even more
Impact of Constraints
• Microcontrollers used (rather than microprocessors)
• Include peripherals to interface with other devices, respond efficiently
• On-chip RAM, ROM reduce circuit board complexity and cost
• Programming language
• Programmed in C rather than Java (smaller and faster code, so less expensive MCU)
• Some performance-critical code may be in assembly language
• Operating system
• Typically no OS, but instead a simple scheduler (or even just interrupts + main code,
i.e., a foreground/background system)
• If OS is used, likely to be a lean RTOS
Building embedded systems using MCUs
• In most embedded systems, MCUs are chosen to be the best solution, since they offer:
• Low development and manufacturing cost
• Easy porting and updating
• Light footprint
• Relatively low power consumption
• Satisfactory performance for low-end products
Curriculum Overview
• Introductory Course: Building an Embedded System with an MCU
• Microcontroller concepts
• Software design basics
• Processor core architecture and interrupt system
• C as implemented in assembly language
• Peripherals and interfacing
Why Are We…?
• Using C instead of Java (or Python, or your other favorite language)?
• C is the de facto standard for embedded systems because of:
– Precise control over what the processor is doing.
– Modest requirements for ROM, RAM, and MIPS, so much cheaper system
– Predictable behavior, with no preemption by an OS or language runtime (e.g. garbage collection)
• Learning assembly language?
• The compiler translates C into assembly language. To understand whether the compiler is doing a
reasonable job, you need to understand what it has produced.
• Sometimes we may need to improve performance by writing assembly versions of functions.
The Arm trademarks featured in this presentation are registered
trademarks or trademarks of Arm Limited (or its subsidiaries) in
the US and/or elsewhere. All rights reserved. All other marks
featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks
Software Design Basics
Learning Objectives
At the end of this lecture, you should be able to:
• Outline the meaning of concurrency and give examples.
• Define the following terms: TRelease(i), TLatency(i), TResponse(i), TTask(i) and
TISR(i).
• Describe the following two scheduling approaches: hardware interrupt and software
scheduler.
• Explain static, dynamic run-to-completion and dynamic pre-emptive scheduling.
• Describe the following: Cyclic Executive with Interrupts, Run-To-Completion Scheduler
and Preemptive Scheduler.
• Identify the components of an RTOS.
• Outline the steps in the Waterfall and V software development models.
Overview
• Concurrency
• How do we make things happen at the right times?
• Software Engineering for Embedded Systems
• How do we develop working code quickly?
CONCURRENCY
MCU Hardware & Software for Concurrency
• CPU executes instructions from one or more threads of execution
• Specialized hardware peripherals add dedicated concurrent processing
• Watchdog timer
• Analog interfacing
• Timers
• Communications with other devices
• Detecting external signal events
• LCD driver
• Peripherals use interrupts to notify CPU of events
[Figure: Cortex-M core and interrupt controller connected by a peripheral bus to Timers, ADC, GPIO, UART, and I2C]
Concurrent Hardware & Software Operation
[Figure: timeline with columns Main (software), Timer Peripheral (hardware), Timer ISR (software), A/D Converter Peripheral (hardware), and ADC ISR (software). Main starts the timer; each timer interrupt leads the Timer ISR to start the ADC; a series of ADC interrupts follows, ending with ADC_done = 1. The pattern repeats three times.]
• Embedded systems rely on both MCU hardware peripherals and software to get
everything done on time
CPU Scheduling
• MCU’s Interrupt system provides a basic scheduling approach for CPU
• “Run this subroutine every time this hardware event occurs”
• Is adequate for simple systems
• How do we make the processor responsive? (How do we make it do the right things at
the right times?)
• If we have more software threads than hardware threads, we need to share the processor.
Definitions
[Figure: timeline starting at TRelease, showing other processing, the scheduler, and the task or ISR code (Ttask), with Latency and Response Time marked along the time axis]
• TRelease(i) = Time at which task (or interrupt) i requests service/is released/is ready to run
• TLatency(i) = Delay between release and start of service for task i
• TResponse(i) = Delay between request for service and completion of service for task i
• TTask(i) = Time needed to perform computations for task i
• TISR(i) = Time needed to perform interrupt service routine i
Scheduling Approaches
• Rely on MCU’s hardware interrupt system to run right code
• Event-triggered scheduling with interrupts
• Works well for many simple systems
Event-Triggered Scheduling using Interrupts
Bike Computer Functions
Reset:
    Configure timer, inputs and outputs
    cur_time = 0;
    rotations = 0;
    tenth_miles = 0;
    while (1) {
        sleep;
    }

ISR 1: Wheel rotation
    rotations++;
    if (rotations > R_PER_MILE/10) {
        tenth_miles++;
        rotations = 0;
    }
    speed = circumference / (cur_time - prev_time);
    compute avg_speed;
    prev_time = cur_time;
    return from interrupt

ISR 2: Mode Key
    mode++;
    mode = mode % NUM_MODES;
    return from interrupt;

ISR 3: Time of Day Timer
    cur_time++;
    lcd_refresh--;
    if (lcd_refresh == 0) {
        convert tenth_miles and display
        convert speed and display
        if (mode == 0)
            convert cur_time and display
        else
            convert avg_speed and display
        lcd_refresh = LCD_REF_PERIOD
    }
A More Complex Application
Application Software Tasks
• Dec: Decode GPS sentence to find current vehicle position.
• Check: Check to see if approaching any pothole locations. Takes longer as the number of
potholes in database increases.
• Rec: Record position to flash memory. Takes a long time if erasing a block.
• Sw: Read user input switches. Run 10 times per second
• LCD: Update LCD with map. Run 4 times per second
[Figure: timeline showing the Dec, Check, Rec, Sw, and LCD tasks]
How do we schedule these tasks?
• Task scheduling: Deciding which task should be running now
• Two fundamental questions:
• Do we run tasks in the same order every time?
Yes: Static schedule (cyclic executive, round-robin)
No: Dynamic, prioritized schedule
• Can one task preempt another, or must it wait for completion?
Yes: Preemptive
No: Non-preemptive (cooperative, run-to-completion)
Static Schedule (Cyclic Executive)
Dec Check Rec Sw LCD Dec
Static Schedule Example
[Figure: static schedule timeline; the response time runs from "GPS Data Arrives" to "Checking complete"]
Dynamic Scheduling
• Allow schedule to be computed on-the-fly
• Based on importance or something else
• Simplifies creating multi-rate systems
Dynamic RTC Schedule
[Figure: dynamic run-to-completion schedule timeline; the response time runs from "GPS Data Arrives" to "Checking complete"]
Task State and Scheduling Rules
• Scheduler chooses among Ready tasks for execution based on priority
• Scheduling Rules
• If no task is running, scheduler starts the highest priority ready task
• Once started, a task runs until it completes
• Tasks then enter waiting state until triggered or released again
[Figure: task state diagram with Waiting and Ready (ready to run) states; transitions labeled "Task is released" and "Start highest priority ready task"]
Dynamic Preemptive Schedule
[Figure: dynamic preemptive schedule timeline with the response time marked]
Comparison of Response Times
[Figure: tasks on the response path under each schedule]
Static: Rec, Sw, LCD, Dec, Check
Dynamic Run-to-Completion: Rec, Dec, Check
Dynamic Preemptive: Dec, Check
• Pros
• Preemption offers best response time
Can do more processing (support more potholes, or higher vehicle speed)
Or can lower processor speed, saving money, power
• Cons
• Requires more complicated programming, more memory
• Introduces vulnerability to data race conditions
Common Schedulers
• Cyclic executive - non-preemptive and static
• Run-to-completion - non-preemptive and dynamic
• Preemptive and dynamic
Cyclic Executive with Interrupts
• Two priority levels
• main code – foreground
• Interrupts – background
• Example of a foreground / background system
• Main user code runs in foreground
• Interrupt routines run in background (high priority)
• Run when triggered
• Handle most urgent work
• Set flags to request processing by main loop

/* Request flags: set by the ISRs, polled by the main loop.
   volatile because they change outside the main loop's control flow. */
volatile BOOL DeviceARequest, DeviceBRequest, DeviceCRequest;

void interrupt HandleDeviceA() {
    /* do A's urgent work */
    ...
    DeviceARequest = TRUE;
}

void main(void) {
    while (TRUE) {
        if (DeviceARequest) {
            /* Clear the flag before servicing, assuming FinishDeviceA()
               does not clear it itself */
            DeviceARequest = FALSE;
            FinishDeviceA();
        }
        if (DeviceBRequest) {
            DeviceBRequest = FALSE;
            FinishDeviceB();
        }
        if (DeviceCRequest) {
            DeviceCRequest = FALSE;
            FinishDeviceC();
        }
    }
}
Run-To-Completion Scheduler
• Use a scheduler function to run task functions at the right rates
• Table stores information per task
Period: How many ticks between each task release
Release Time: how long until task is ready to run
ReadyToRun: task is ready to run immediately
• Scheduler runs forever, examining schedule table which indicates tasks which are ready to run (have been
“released”)
• A periodic timer interrupt triggers an ISR, which updates the schedule table
Decrements “time until next release”
If this time reaches 0, set that task’s ReadyToRun flag and reload its time with the period
• Priority is typically static, so can use a table with highest priority tasks first for a fast, simple scheduler
implementation.
Preemptive Scheduler
• Task functions need not run to completion, but can be interleaved with each other
• Simplifies writing software
• Improves response time
• Introduces new potential problems
• Worst case response time for highest priority task does not depend on other tasks, only
ISRs and scheduler
• Lower priority tasks depend only on higher priority tasks
Task State and Scheduling Rules
• Scheduler chooses among Ready tasks for execution based on priority
• Scheduling Rules
• A task’s activities may lead it to waiting (blocked)
• A waiting task never gets the CPU. It must be signaled by an ISR or another task.
• Only the scheduler moves tasks between ready and running
[Figure: task state diagram with Waiting, Ready, and Running states. Waiting to Ready: "What the task needs happens". Ready to Running: "This is highest priority ready task". Running to Ready: "This isn’t highest priority ready task". Running to Waiting: "Task needs something to happen".]
What’s an RTOS?
• What does Real-Time mean?
• Can calculate and guarantee the maximum response time for each task and interrupt service routine
• This “bounding” of response times allows use in hard-real-time systems (which have deadlines which
must be met)
• What’s in the RTOS
• Task Scheduler
Preemptive, prioritized to minimize response times
Interrupt support
• Core Integrated RTOS services
Inter-process communication and synchronization (safe data sharing)
Time management
• Optional Integrated RTOS services
I/O abstractions?
memory management?
file system?
networking support?
GUI??
Comparison of Timing Dependence
[Figure: code on which a task's timing depends, for Non-preemptive Static, Non-preemptive Dynamic, and Preemptive Dynamic schedulers. Each column stacks the Device A through Device Z ISRs above task code (Task 1 Code through Task 6 Code); other labels include "Slowest Task" and "Task 3 Max".]
Comparison of RAM Requirements
[Figure: RAM requirements for Non-preemptive Static, Non-preemptive Dynamic, and Preemptive Dynamic schedulers]
Good Enough Software, Soon Enough
• How do we make software correct enough without going bankrupt?
• Need to be able to develop (and test) software efficiently
• Follow a good plan
• Start with customer requirements
• Design architectures to define the building blocks of the systems (tasks, modules, etc.)
• Add missing requirements
Fault detection, management and logging
Real-time issues
Compliance with a firmware standards manual
Fail-safes
• Create detailed design
Implement the code, following a good development process
Perform frequent design and code reviews
Perform frequent testing (unit and system testing, preferably automated)
Use revision control to manage changes
• Perform postmortems to improve development process
What happens when the plan meets reality?
• We want a robust plan which considers likely risks
• What if the code turns out to be a lot more complex than we expected?
• What if there is a bug in our code (or a library)?
• What if the system doesn’t have enough memory or throughput?
• What if the system is too expensive?
• What if the lead developer quits?
• What if the lead developer is incompetent, lazy, or both (and won’t quit!)?
• What if the rest of the team gets sick?
• What if the customer adds new requirements?
• What if the customer wants the product two months early?
• Successful software engineering depends on balancing many factors, many of which are
non-technical!
Risk Reduction
• Plan the work to accommodate risks
Software Lifecycle Concepts
• Coding is the most visible part of a software development process but is not the only
one
• The software will likely be enhanced over time - Extensive downstream modification
and maintenance!
• Corrections, adaptations, enhancements & preventive maintenance
Requirements
• Ganssle’s Reason #5 for why embedded projects fail: Vague Requirements
• Types of requirements
• Functional - what the system needs to do
• Nonfunctional - emergent system behaviors such as response time, reliability, energy efficiency, safety, etc.
• Constraints - limit design choices
• Representations
• Text – Liable to be incomplete, bloated, ambiguous, even contradictory
• Diagrams (state charts, flow charts, message sequence charts)
– Concise
– Can often be used as design documents
• Traceability
• Each requirement should be verifiable with a test
• Stability
• Requirements churn leads to inefficiency and often “recency” problem (most recent requirement change is
assumed to be most important)
Design Before Coding
[Figure: Architectural Design → Detailed Design → Coding → Test the Code]
Design Before Coding
• How much of the system do you design before coding?
[Figure: alternative ways of ordering and overlapping Architectural Design (AD), Detailed Design (DD), Coding (C), and Test (T), including prototyping]
Development Models
• How do we schedule these pieces?
Waterfall (Idealized)
• Plan the work, and then work the plan
• BUFD: Big Up-Front Design
[Figure: waterfall stages, including Analysis, Specification, and Operation and Maintenance]
Waterfall (As Implemented)
• Reality: We are not omniscient, so there is plenty of backtracking
[Figure: waterfall stages with backtracking: Analysis, Specification, Design, Implementation, Verification, Operation and Maintenance]
V Model Overview
[Figure: V model. Left side, each step followed by a review: Requirements Analysis, Requirements Specification, Architectural Design, Detailed Design, Coding. Right side: S/W Unit Testing, Integration Testing, Functional Testing. The two sides are linked by "Validation provided by testing".]
• Themes:
• Link front and back ends of life-cycle for efficiency
• Provide “traceability” to ensure nothing falls through the cracks
1. Requirements Specification and Validation Plan
• Result of Requirements Analysis
• Should contain:
• Introduction with goals and objectives of system
• Description of problem to solve
• Functional description
provides a “processing narrative” per function
lists and justifies design constraints
explains performance requirements
• Behavioral description shows how system reacts to internal or external events and situations
State-based behavior
General control flow
General data flow
• Validation criteria
tell us how we can decide that a system is acceptable. (Are we done yet?)
is the foundation for a validation test plan
• Bibliography and Appendix refer to all documents related to project and provide supplementary information
2. Architectural (High-Level) Design
• Architecture defines the structure of the system
• Components
• Externally visible properties of components
• Relationships among components
3. Detailed Design
• Describe aspects of how system behaves
• Flow charts for control or data
• State machine diagram
• Event sequences
State Machine for Parsing NMEA-0183
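[Figure: state machine with states Start, Talker + Sentence Type, Sentence Body, Checksum 1, and Checksum 2]
• Start: on $, append char to buf. and go to Talker + Sentence Type
• Talker + Sentence Type: on any char. except *, \r or \n, append char to buf. and inc. counter
• Talker + Sentence Type: on *, \r or \n, non-text, or counter>6, return to Start
• Talker + Sentence Type: when buf==$SDDBT, $VWVHW, or $YXXDR, enqueue all chars. from buf and go to Sentence Body
• Sentence Body: on any char. except *, enqueue char
• Sentence Body: on \r or \n, return to Start
• Sentence Body: on *, enqueue char and go to Checksum 1
• Checksum 1: on any char., save as checksum1 and go to Checksum 2
• Checksum 2: on any char., save as checksum2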
Flowcharts
Sequence of Interactions between Components
[Figure: sequence diagram with columns Main (software), Timer Peripheral (hardware), Timer ISR (software), A/D Converter Peripheral (hardware), and ADC ISR (software). Main starts the timer; the timer interrupt leads the Timer ISR to start the ADC; a series of ADC interrupts follows, ending with ADC_done = 1.]
4. Coding and Code Inspections
• Coding driven directly by Detailed Design Specification
• Use a version control system while developing the code
• Follow a coding standard
• Eliminate stylistic variations which make understanding code more difficult
• Avoid known questionable practices
• Spell out best practices to make them easier to follow
• Perform code reviews
• Test effectively
• Automation
• Regression testing
Peer Code Review
• Inspect the code before testing it
5. Software Testing
• Testing IS NOT “the process of verifying the program works correctly”
• The program probably won’t work correctly in all possible cases
Professional programmers have 1-3 bugs per 100 lines of code after it is “done”
• Testers shouldn’t try to prove the program works correctly (impossible)
If you want and expect your program to work, you’ll unconsciously miss failures because human beings are
inherently biased
Approaches to Testing
• Incremental Testing
• Code a function and then test it (module/unit/element testing)
• Then test a few working functions together (integration testing)
Continue enlarging the scope of tests as you write new functions
– Incremental testing requires extra code for the test harness
A driver function calls the function to be tested
A stub function might be needed to simulate a function called by the function under test, and which returns or
modifies data.
The test harness can automate the testing of individual functions to detect later bugs
Why Test Incrementally?
• Finding out what failed is much easier
• With Big Bang, since no function has been thoroughly tested, most probably have bugs
• Now the question is “Which bug in which module causes the failure I see?”
• Errors in one module can make it difficult to test another module
Errors in fundamental modules (e.g. kernel) can appear as bugs in many other dependent modules
6. Perform Project Retrospectives
• Goals – improve your engineering processes
• Extract all useful information learned from the just-completed project – provide “virtual experience”
to others
• Provide positive non-confrontational feedback
• Document problems and solutions clearly and concisely for future use
Example Postmortem Structure
• Product
• Bugs
• Software design
• Hardware design
• Process
• Code standards
• Code interfacing
• Change control
• How we did it
• Team coordination
• Support
• Tools
• Team burnout
• Change orders
• Personnel availability
The Arm Cortex-M4 Processor Architecture
Learning Objectives
At the end of this lecture, you should be able to:
• Outline the features and benefits of the Arm Cortex-M4 processor.
• Outline the functions of the Cortex-M4 processor components including Nested
Vectored Interrupt Controller (NVIC), Wakeup Interrupt Controller (WIC), Memory
Protection Unit (MPU), Bus Interconnect and Debug System.
• Describe the Cortex-M4 processor core registers including their functions.
Module syllabus
• Arm Architectures and Processors
• What is Arm Architecture
• Arm Processor Families
• Arm Cortex-M Series
• Cortex-M4 Processor
• Arm Processor vs. Arm Architectures
• Arm Cortex-M4 Processor
• Cortex-M4 Processor Overview
• Cortex-M4 Block Diagram
• Cortex-M4 Registers
Arm architectures and processors
• Arm architecture is a family of RISC-based processor architectures
• Well-known for its power efficiency
• Hence widely used in mobile devices, e.g., smartphones and tablets
• Designed and licensed by Arm to a wide eco-system of partners
• Arm Holdings
• The company that designs Arm-based processors
• Arm does not manufacture, but it licenses designs to semiconductor partners who add their own
Intellectual Property (IP) on top of Arm’s IP, which they then fabricate and sell to customers.
• Arm also offers IP other than processors, such as physical IPs, interconnect IPs, graphics cores and
development tools.
Arm processor families
• Cortex-A series (Application)
• High performance processors capable of full Operating System (OS) support
• Applications include smartphones, digital TV, smart books
• Cortex-R series (Real-time)
• High performance and reliability for real-time applications
• Applications include automotive braking system, powertrains
• Cortex-M series (Microcontroller)
• Cost-sensitive solutions for deterministic microcontroller applications
• Applications include microcontrollers, smart sensors
[Figure: Arm processor families]
Cortex-A: Cortex-A73, Cortex-A72, Cortex-A57, Cortex-A53, Cortex-A35, Cortex-A32, Cortex-A17, Cortex-A15, Cortex-A9, Cortex-A8, Cortex-A7, Cortex-A5
Cortex-R: Cortex-R8, Cortex-R7, Cortex-R5, Cortex-R4
Cortex-M: Cortex-M7, Cortex-M23, Cortex-M33, Cortex-M4, Cortex-M3, Cortex-M0+, Cortex-M0
SecurCore: SC000, SC300
Classic: Arm11, Arm9, Arm7
[Figure: IP libraries (Cortex-A9, Cortex-R5, Cortex-M4, Arm7, Arm9, and Arm11 processors; DRAM, FLASH, and SRAM controllers; AXI, AHB, and APB buses) combined into an Arm-based SoC with an Arm processor, ROM, RAM, system bus, and peripherals]
Arm processors vs. Arm architectures
• Arm architecture
• Describes the details of instruction set, programmer’s model, exception model, and memory map
• Documented in the Architecture Reference Manual
• Arm processor
• Developed using one of the Arm architectures
• More implementation details, such as timing information
• Documented in processor’s Technical Reference Manual
[Figure: Arm architecture versions with example processors]
Armv4/v4T Architecture
Armv5/v5E Architecture
Armv6 Architecture: Armv6-M (e.g. Cortex-M0, M1)
Armv7 Architecture: Armv7-A (e.g. Cortex-A9), Armv7-R (e.g. Cortex-R4), Armv7-M (e.g. Cortex-M4)
Armv8 Architecture: Armv8-A (e.g. Cortex-A53, Cortex-A57), Armv8-R, Armv8-M
[Table fragment; column headers missing]
Cortex-M4 | Armv7E-M | Harvard | Entire | Entire | 1 cycle | Yes | Yes | Yes | Optional
Cortex-M7 | Armv7E-M | Harvard | Entire | Entire | 1 cycle | Yes | Yes | Yes | Optional
Cortex-M4 processor overview
• Cortex-M4 Processor
• Introduced in 2010
• Designed with a large variety of highly efficient signal processing features
• Features extended single-cycle multiply accumulate instructions, optimized SIMD arithmetic,
saturating arithmetic and an optional Floating Point Unit.
• High performance efficiency
• 1.25 DMIPS/MHz (Dhrystone Million Instructions Per Second / MHz) at the order of µWatts / MHz
• Low power consumption
• Longer battery life – especially critical in mobile products
• Enhanced determinism
• The critical tasks and interrupt routines can be served quickly in a known number of cycles
10 © 2021 Arm
Cortex-M4 processor features
• 32-bit Reduced Instruction Set Computing (RISC) processor
• Harvard architecture
• Separated data bus and instruction bus
• Instruction set
• Includes the entire Thumb®-1 (16-bit) and Thumb®-2 (16/ 32-bit) instruction sets
• 3-stage + branch speculation pipeline
• Performance efficiency
• 1.25 – 1.95 DMIPS/MHz (Dhrystone Million Instructions Per Second / MHz)
• Supported interrupts
• Non-Maskable Interrupt (NMI) + 1 to 240 physical interrupts
• 8 to 256 interrupt priority levels
11 © 2021 Arm
Cortex-M4 processor features
• Supports sleep modes
• Up to 240 wake-up interrupts
• Integrated WFI (Wait For Interrupt) and WFE (Wait For Event) instructions and sleep on
exit capability (to be covered in more detail later)
• Sleep & deep sleep signals
• Optional retention mode with Arm Power Management Kit
• Enhanced instructions
• Hardware divide (2-12 Cycles)
• Single-cycle 16, 32-bit MAC, single-cycle dual 16-bit MAC
• 8, 16-bit SIMD arithmetic
12 © 2021 Arm
Cortex-M4 processor features
• Debug
• Optional JTAG & Serial-Wire Debug (SWD) Ports
• Up to 8 breakpoints and 4 watchpoints
• Memory Protection Unit (MPU)
• Optional 8 region MPU with sub regions and background region
13 © 2021 Arm
Cortex-M4 processor features
The Cortex-M4 processor is designed to meet the challenges of low dynamic power constraints while
retaining light footprints
180ULL ultra low power process –151 µW/MHz
90LP low power process – 32.82 µW/MHz
40LP low power process – 12.26 µW/MHz
14 © 2021 Arm
Cortex-M4 block diagram
[Figure: Arm Cortex-M4 microprocessor block diagram]
Processor core, with optional FPU
Nested Vectored Interrupt Controller (NVIC), with optional WIC
Optional Embedded Trace Macrocell
Optional memory protection unit
Optional Debug Access Port
Optional Serial Wire Viewer
Optional flash patch
Optional data watchpoints
Bus matrix, with code interface and SRAM and peripheral interface
17 © 2021 Arm
Cortex-M4 block diagram
• Bus interconnect
• Allows data transfer to take place on different buses simultaneously
• Provides data transfer management, e.g. write buffer, bit-oriented operations (bit-band)
• May include bus bridges (e.g. AHB-to-APB bus bridge) to connect different buses into a
network using a single global memory space
• Includes the internal bus system, the data path in the processor core, and the AHB LITE
interface unit
• Debug subsystem
• Handles debug control, program breakpoints, and data watchpoints
• When a debug event occurs, it can put the processor core in a halted state, so developers can
analyse the status of the processor, such as register values and flags, at that point.
18 © 2021 Arm
Arm Cortex-M4 processor registers
• Processor registers
• The internal registers are used to store and process temporary data within the processor core
• All registers are inside the processor core, hence they can be accessed quickly
• Load-store architecture
To process memory data, they have to be first loaded from memory to registers,
processed inside the processor core using register data only, and then written back
to memory if needed
• Cortex-M4 registers
• Register bank
Sixteen 32-bit registers (thirteen are used for general-purpose)
• Special registers
19 © 2021 Arm
Cortex-M4 registers
Register bank
R0 – R7: low registers (general purpose registers)
R8 – R12: high registers (general purpose registers)
R13 (banked): Stack Pointer (SP); MSP is the Main Stack Pointer
Special registers
Program Status Registers (xPSR): APSR (Application PSR), EPSR (Execution PSR), IPSR (Interrupt PSR)
Interrupt mask registers: PRIMASK, FAULTMASK, BASEPRI
Stack definition: CONTROL
20 © 2021 Arm
Cortex-M4 registers
• R0 – R12: general purpose registers
• Low registers (R0 – R7) can be accessed by any instruction
• High registers (R8 – R12) sometimes cannot be accessed e.g. by some Thumb (16-bit) instructions
[Figure: PUSH and POP of data on the stack]
21 © 2021 Arm
Cortex-M4 registers
• Program Counter (PC)
• Records the address of the current instruction code
[Figure: memory holding code and data]
22 © 2021 Arm
Cortex-M4 registers
• R14: Link Register (LR)
• The LR is used to store the return address of a subroutine or a function call
• The program counter (PC) will load the value from LR after a function is finished
[Figure: calling a subroutine from the main program and returning]
Call: 1. Save current PC to LR. 2. Load PC with the starting address of the subroutine.
Return: Load PC with the address in LR to return to the main program.
[Figure: APSR bit fields: N, Z, C, V, Q, Reserved]
24 © 2021 Arm
Cortex-M4 registers
• APSR
• N: negative flag – set to one if the result from ALU is negative
• Z: zero flag – set to one if the result from ALU is zero
• C: carry flag – set to one if an unsigned overflow occurs
• V: overflow flag – set to one if a signed overflow occurs
• Q: sticky saturation flag – set to one if saturation has occurred in saturating arithmetic
instructions, or overflow has occurred in certain multiply instructions
• IPSR
• ISR number – current executing interrupt service routine number
• EPSR
• T: Thumb state – always one since Cortex-M4 only supports the Thumb state (more on
processor states in the next module)
• IC/IT: Interrupt-Continuable Instruction (ICI) bit, IF-THEN instruction status bit
25 © 2021 Arm
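The APSR flag definitions above can be made concrete with a small host-side model. This is a minimal sketch, not processor code: the type `apsr_flags_t` and the function `flags_after_add` are hypothetical names chosen for illustration, and the model covers only a 32-bit addition such as ADDS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the four arithmetic flags of the APSR. */
typedef struct {
    bool n; /* result is negative (bit 31 set) */
    bool z; /* result is zero */
    bool c; /* unsigned overflow (carry out of bit 31) */
    bool v; /* signed overflow */
} apsr_flags_t;

/* Model of the flags a 32-bit addition (as in ADDS) would produce. */
static apsr_flags_t flags_after_add(uint32_t a, uint32_t b)
{
    uint32_t result = a + b; /* unsigned arithmetic wraps modulo 2^32 */
    apsr_flags_t f;

    f.n = (result >> 31) != 0u;
    f.z = (result == 0u);
    f.c = (result < a); /* the sum wrapped around, so a carry occurred */
    /* Signed overflow: both operands have a sign that the result lacks. */
    f.v = (((a ^ result) & (b ^ result)) >> 31) != 0u;
    return f;
}
```

For example, adding 1 to 0xFFFFFFFF sets Z and C but not V, while adding 1 to 0x7FFFFFFF sets N and V but not C.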
Cortex-M4 registers
• Exception mask registers
• 1-bit PRIMASK
If set to one, will block all the interrupts apart from the non-maskable interrupt
(NMI) and the hard fault exception
• 1-bit FAULTMASK
If set to one, will block all the interrupts apart from NMI
• BASEPRI (a priority field of up to 8 bits, not a single bit)
If set to a non-zero value, will block all interrupts of the same or lower priority level (only
allowing for interrupts with higher priorities); a value of zero disables this masking
• CONTROL: special register
• 1-bit stack definition
Set to one to use the process stack pointer (PSP)
Clear to zero to use the main stack pointer (MSP)
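The effect of the exception mask registers can be sketched as a simple decision function. This is a simplified host-side model under stated assumptions: the function name `irq_is_masked` is hypothetical, only interrupts with configurable priority are considered (not NMI or hard fault), and a lower priority number means a higher priority, as on Cortex-M processors.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if an interrupt with the given priority value would be
 * blocked by the current mask register settings.
 * basepri == 0 means that BASEPRI masking is disabled.
 */
static bool irq_is_masked(bool primask, bool faultmask,
                          uint8_t basepri, uint8_t priority)
{
    if (primask || faultmask) {
        return true; /* all configurable interrupts are blocked */
    }
    if (basepri != 0u && priority >= basepri) {
        return true; /* same or lower priority than the BASEPRI level */
    }
    return false;
}
```

With BASEPRI set to 0x40, an interrupt of priority 0x80 is blocked while one of priority 0x20 can still be taken.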
26 © 2021 Arm
Cortex-M4 registers
[Figure: bit layouts of the PRIMASK, FAULTMASK, BASEPRI and CONTROL registers; the remaining bits of each register are reserved]
27 © 2021 Arm
Useful resources
• Architecture Reference Manual:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403c/index.html
• Cortex-M4 Technical Reference Manual:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439d/DDI0439D_cortex_m4_processor_r0p1_trm.pdf
• Cortex-M4 Devices Generic User Guide:
http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/DUI0553A_cortex_m4_dgug.pdf
28 © 2021 Arm
The Arm Cortex-M4 Processor
Architecture
© 2021 Arm
Learning Objectives
At the end of this lecture, you should be able to:
• Outline the Cortex-M4 processor memory map and its memory regions including their
functions.
• Describe bit-band operation and describe its benefits.
• Define Endianness and the concepts of Little-endian and big-endian.
• Explain key features of the Thumb instruction sets.
2 © 2021 Arm
Module syllabus
• Cortex-M4 Memory Map
• Cortex-M4 Memory Map
• Bit-band Operations
• Cortex-M4 Program Image and Endianness
• Arm Cortex-M4 Processor Instruction Set
• Arm and Thumb Instruction Set
• Cortex-M4 Instruction Set
3 © 2021 Arm
Arm Cortex-M4 memory map
• Note that, despite the default definitions, the actual usage of the memory map can also be
flexibly defined by the user, apart from some fixed memory addresses such as the internal
private peripheral bus.
4 © 2021 Arm
Arm Cortex-M4 memory map
[Figure: default memory map, from the highest address to the lowest]
Vendor specific memory: 0xE0100000 – 0xFFFFFFFF. Reserved for other purposes
Private Peripheral Bus (PPB): 0xE0000000 – 0xE00FFFFF. Private peripherals, e.g. NVIC, SCS
(Vendor specific memory and the PPB together occupy 512MB)
External device: 0xA0000000 – 0xDFFFFFFF, 1GB. Mainly used for external peripherals, e.g. SD card
External RAM: 0x60000000 – 0x9FFFFFFF, 1GB. Mainly used for external memories, e.g. external DDR, FLASH, LCD
Peripherals: 0x40000000 – 0x5FFFFFFF, 512MB. Mainly used for on-chip peripherals, e.g. AHB, APB peripherals
SRAM: 0x20000000 – 0x3FFFFFFF, 512MB. Mainly used for data memory, e.g. on-chip SRAM, SDRAM
Code: 0x00000000 – 0x1FFFFFFF, 512MB. Mainly used for program code, e.g. on-chip FLASH
Contents of the PPB:
External PPB: ROM table, external PPB, embedded trace macrocell, trace port interface unit
Internal PPB: reserved, System Control Space (including the Nested Vectored Interrupt Controller (NVIC)), reserved, flash patch and breakpoint unit, data watchpoint and trace unit, instrumentation trace macrocell
5 © 2021 Arm
Arm Cortex-M4 memory map
• Code region
• Primarily used to store program code
• Can also be used for data memory
• On-chip memory, such as on-chip FLASH
• SRAM region
• Primarily used to store data, such as heaps and stacks
• Can also be used for program code
• On-chip memory; despite its name “SRAM”, the actual device could be SRAM, SDRAM, etc
• Peripheral region
• Primarily used for peripherals, such as Advanced High-performance Bus (AHB) or Advanced Peripheral
Bus (APB) peripherals
• On-chip peripherals
6 © 2021 Arm
Arm Cortex-M4 memory map
• External RAM region
• Primarily used to store large data blocks or memory caches
• Off-chip memory, slower than on-chip SRAM region
• External device region
• Primarily used to map to external devices
• Off-chip devices, such as SD card
• Private Peripheral Bus (PPB)
• Provides access to internal and external processor resources
7 © 2021 Arm
Cortex-M4 memory map example
[Figure: memory map example for a chip]
On the silicon: the Cortex-M4 PPB (NVIC, SCS, Debug CTRL) and an AHB/APB bus connecting on-chip FLASH (Code Region), on-chip SRAM (SRAM Region), Timer, UART and GPIO (Peripheral Region), an external memory interface (External RAM Region) and an external device interface (External Device Region)
Off chip: external SRAM, FLASH, external LCD, SD card
Memory map:
Vendor specific memory: 0xE0100000 – 0xFFFFFFFF
Private Peripheral Bus (PPB): 0xE0000000 – 0xE00FFFFF
External device: 0xA0000000 – 0xDFFFFFFF
External RAM: 0x60000000 – 0x9FFFFFFF
Peripherals: 0x40000000 – 0x5FFFFFFF
SRAM: 0x20000000 – 0x3FFFFFFF
Code: 0x00000000 – 0x1FFFFFFF
8 © 2021 Arm
Bit-band operations
• Bit-band operations allow a single load/store operation to access a single bit in the memory,
for example, to change a single bit of one 32-bit data:
• Normal operation without bit-band (read-modify-write)
Read the value of 32-bit data
Modify a single bit of the 32-bit value (keep other bits unchanged)
Write the value back to the address
• Bit-band operation
Directly write a single bit (0 or 1) to the “bit-band alias address” of the data
9 © 2021 Arm
Bit-band operation example
• For example, in order to set bit[3] in word data in address 0x20000000:
;Read-Modify-Write Operation
10 © 2021 Arm
Bit-band operation example
;Bit-band Operation
• Bit-band operation
• Directly set the bit by writing ‘1’ to address 0x2200000C, which is the alias address of the fourth bit of
the 32-bit data at 0x20000000
• In effect, this single instruction is mapped to 2 bus transfers: read data from 0x20000000 to the buffer,
and then write to 0x20000000 from the buffer with bit [3] set
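The assembly listings for this example did not survive in the slide text, so the following is only a C sketch of the read-modify-write sequence described above. The function names are hypothetical, and an ordinary variable stands in for the word at 0x20000000 so that the code can run on a host machine.

```c
#include <stdint.h>

/* Read-modify-write: three separate steps on the whole 32-bit word. */
static void set_bit_rmw(volatile uint32_t *word, unsigned bit)
{
    uint32_t value = *word;          /* 1. read the 32-bit value */
    value |= (UINT32_C(1) << bit);   /* 2. modify a single bit   */
    *word = value;                   /* 3. write the value back  */
}

/* Hypothetical demo: a local variable stands in for the word at 0x20000000. */
static uint32_t demo_set_bit(uint32_t initial, unsigned bit)
{
    volatile uint32_t word = initial;
    set_bit_rmw(&word, bit);
    /*
     * On a device with bit-banding, the same effect on bit[3] would be a
     * single store of 1 to the alias address 0x2200000C (target-only,
     * not executed in this sketch).
     */
    return word;
}
```

Calling `demo_set_bit(0xF0, 3)` returns 0xF8: bit[3] is set and the other bits are unchanged.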
11 © 2021 Arm
Bit-band alias address
• Each bit of the 32-bit data is one-to-one mapped to the bit-band alias address
• For example, the fourth bit (bit [3]) of the data at 0x20000000 is mapped to the bit-band alias address
at 0x2200000C
• Hence, to set bit [3] of the data at 0x20000000, we only need to write ‘1’ to address 0x2200000C
• In Cortex-M4, there are two pre-defined bit-band alias regions: one for SRAM region, and one for
peripherals region
Real 32-bit data Bit-band alias
address address
0x20000008 0x22000100
0x20000004 0x22000080
0x20000000 0x22000000
0x2200000C
0x22000018
12 © 2021 Arm
Bit-band alias address
• SRAM region
• 32MB memory space (0x22000000 – 0x23FFFFFF) is used as the bit-band alias region for
1MB data (0x20000000 – 0x200FFFFF)
• Peripherals region
• 32MB memory space (0x42000000 – 0x43FFFFFF) is used as the bit-band alias region for
1MB data (0x40000000 – 0x400FFFFF)
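The mapping between a bit in the 1MB bit-band region and its alias address follows from the sizes above: each bit occupies one 32-bit word in the alias region, so each byte of data maps to 32 bytes of alias space. A minimal sketch, assuming word-aligned data addresses; the function name `bitband_alias` and the macro names are hypothetical.

```c
#include <stdint.h>

#define SRAM_BITBAND_BASE   UINT32_C(0x20000000)
#define SRAM_ALIAS_BASE     UINT32_C(0x22000000)
#define PERIPH_BITBAND_BASE UINT32_C(0x40000000)
#define PERIPH_ALIAS_BASE   UINT32_C(0x42000000)
#define BITBAND_REGION_SIZE UINT32_C(0x00100000) /* 1MB of bit-band data */

/*
 * Returns the bit-band alias address for bit `bit` (0..31) of the
 * word-aligned address `addr`, or 0 if the request is outside both
 * bit-band regions.
 */
static uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    if (bit > 31u || (addr & 3u) != 0u) {
        return 0u;
    }
    if (addr >= SRAM_BITBAND_BASE &&
        addr - SRAM_BITBAND_BASE < BITBAND_REGION_SIZE) {
        return SRAM_ALIAS_BASE + (addr - SRAM_BITBAND_BASE) * 32u + bit * 4u;
    }
    if (addr >= PERIPH_BITBAND_BASE &&
        addr - PERIPH_BITBAND_BASE < BITBAND_REGION_SIZE) {
        return PERIPH_ALIAS_BASE + (addr - PERIPH_BITBAND_BASE) * 32u + bit * 4u;
    }
    return 0u;
}
```

This reproduces the addresses used on the previous slides: bit [3] of 0x20000000 maps to 0x2200000C, and bit [0] of 0x20000004 maps to 0x22000080.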
[Figure: read-modify-write in the main program interrupted by an ISR. Bit [1] modified by ISR is overwritten by the main program]
14 © 2021 Arm
Cortex-M4 program image
• The program image in Cortex-M4 contains
• Vector table – includes the starting addresses of exceptions (vectors) and the initial value of
the main stack pointer (MSP)
• C start-up routine
• Program code – application code and data
• C library code – program codes for C library functions
[Figure: program image in the code region, starting at 0x00000000]
Vector table, from 0x00000000 upwards: Initial MSP value, Reset vector, NMI vector, Hard fault vector, MemManage fault, Bus fault, Usage fault, Reserved, SVCall, Debug monitor, Reserved, PendSV, SysTick, External Interrupts
Above the vector table: start-up routine, program code and C library code
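The position of each entry in the vector table can be expressed as an index, with each entry occupying one 32-bit word. A minimal sketch, assuming the table starts at 0x00000000 as in the figure; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Index of each entry in the vector table (one 32-bit word per entry). */
enum vector_index {
    VECTOR_INITIAL_MSP        = 0,
    VECTOR_RESET              = 1,
    VECTOR_NMI                = 2,
    VECTOR_HARD_FAULT         = 3,
    VECTOR_MEM_MANAGE         = 4,
    VECTOR_BUS_FAULT          = 5,
    VECTOR_USAGE_FAULT        = 6,
    /* indices 7 to 10 are reserved */
    VECTOR_SVCALL             = 11,
    VECTOR_DEBUG_MONITOR      = 12,
    /* index 13 is reserved */
    VECTOR_PENDSV             = 14,
    VECTOR_SYSTICK            = 15,
    VECTOR_FIRST_EXTERNAL_IRQ = 16
};

/* Address of a vector table entry, assuming the table is at 0x00000000. */
static uint32_t vector_address(enum vector_index index)
{
    return (uint32_t)index * 4u;
}
```

For example, the reset vector is read from address 0x00000004 and the SysTick vector from 0x0000003C.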
15 © 2021 Arm
Cortex-M4 program image
16 © 2021 Arm
Cortex-M4 endianness
• Endian refers to the order of bytes stored in memory
• Little endian: lowest byte of a word-size data is stored in bit 0 to bit 7
• Big endian: lowest byte of a word-size data is stored in bit 24 to bit 31
• Cortex-M4 supports both little endian and big endian
• However, endianness only exists in the hardware level
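The byte order of a system can be observed from C by looking at the individual bytes of a word in memory. This is a minimal host-side sketch with hypothetical function names; the result of `is_little_endian` depends on the machine that runs it.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the byte at the lowest address is the least significant byte. */
static int is_little_endian(void)
{
    const uint32_t word = UINT32_C(0x12345678);
    uint8_t bytes[4];

    memcpy(bytes, &word, sizeof bytes); /* copy the word as it sits in memory */
    return bytes[0] == 0x78;
}

/* Reverse the byte order of a word, e.g. to convert between the two formats. */
static uint32_t swap_bytes32(uint32_t v)
{
    return ((v & UINT32_C(0x000000FF)) << 24) |
           ((v & UINT32_C(0x0000FF00)) << 8)  |
           ((v & UINT32_C(0x00FF0000)) >> 8)  |
           ((v & UINT32_C(0xFF000000)) >> 24);
}
```

On a little-endian system the word 0x12345678 is stored as the bytes 0x78, 0x56, 0x34, 0x12 at increasing addresses; on a big-endian system the order is reversed.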
18 © 2021 Arm
Arm and Thumb instruction set
• Mix of Arm and Thumb-1 Instruction sets
• Benefit from both 32-bit Arm (high performance) and 16-bit Thumb-1 (high code density)
• A multiplexer is used to switch between two states: Arm state (32-bit) and Thumb state (16-bit),
which requires a switching overhead
0
Arm
Incoming Instructions
Instruction
Instructions Executing
Thumb remap decoder
1
to Arm
T bit, 0: select Arm,
1: select Thumb
• Thumb-2 instruction set
• Consists of both 32-bit Thumb instructions and original 16-bit Thumb-1 instruction sets
• Compared to 32-bit Arm instructions set, code size is reduced by ~26%, with similar performance
• Capable of handling all processing requirements in one operation state
19 © 2021 Arm
Cortex-M4 instruction set
• Cortex-M4 processor
• Armv7-M architecture
• Supports 32-bit Thumb-2 instructions
• Can handle all processing requirements in one operation state (Thumb state)
• Compared with traditional Arm processors (which use state switching), advantages include:
No state switching overhead – both execution time and instruction space are saved
No need to separate Arm code and Thumb code source files, which makes the
development and maintenance of software easier
Easier to get optimized efficiency and performance
20 © 2021 Arm
Cortex-M4 instruction set
• Arm assembly syntax:
label
mnemonic operand1, operand2, … ; Comments
• Label is used as a reference to an address location
• Mnemonic is the name of the instruction
• Operand1 is the destination of the operation
• Operand2 is normally the source of the operation
• Comments are written after “ ; ”, which does not affect the program, e.g.:
MOVS R3, #0x11 ;Set register R3 to 0x11
• Assembly code can be assembled by either Arm assembler (armasm) or assembly tools from a
variety of vendors (e.g. GNU tool chain). When using the GNU tool chain, the syntax for labels
and comments is slightly different.
21 © 2021 Arm
Arm Cortex M4 Instruction Set - Overview
Instructions supported by the Cortex-M4 processor can be grouped as follows:
• Memory access instructions
• General data processing instructions
• Multiply and divide instructions
• Saturating instructions
• Packing and unpacking instructions
• Bitfield instructions
• Branch and control instructions
• Miscellaneous instructions
• Floating-point instructions
22 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
ADC, ADCS {Rd,} Rn, Op2 Add with Carry N,Z,C,V
ADD, ADDS {Rd,} Rn, Op2 Add N,Z,C,V
ADD, ADDW {Rd,} Rn, #imm12 Add N,Z,C,V
ADR Rd, label Load PC-relative Address
AND, ANDS {Rd,} Rn, Op2 Logical AND N,Z,C
ASR, ASRS Rd, Rm, <Rs|#n> Arithmetic Shift Right N,Z,C
B label Branch
BFC Rd, #lsb, #width Bit Field Clear
BFI Rd, Rn, #lsb, #width Bit Field Insert
BIC, BICS {Rd,} Rn, Op2 Bit Clear N,Z,C
BKPT #imm Breakpoint
BL label Branch with Link
BLX Rm Branch indirect with Link
BX Rm Branch indirect
23 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
24 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
LDRD Rt, Rt2, [Rn, #offset] Load Register with two words
25 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
LDRSB, LDRSBT Rt, [Rn, #offset] Load Register with Signed Byte
LDRSH, LDRSHT Rt, [Rn, #offset] Load Register with Signed Halfword
26 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
MUL, MULS {Rd,} Rn, Rm Multiply, 32-bit result N,Z
NOP No Operation
27 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
QASX {Rd, } Rn, Rm Saturating Add and Subtract with Exchange
REVSH Rd, Rn Reverse byte order in bottom halfword and sign extend
28 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
RRX, RRXS Rd, Rm Rotate Right with Extend N,Z,C
SHASX {Rd,} Rn, Rm Signed Halving Add and Subtract with Exchange
29 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
SHSAX {Rd,} Rn, Rm Signed Halving Subtract and Add with Exchange
SMLABB, SMLABT, SMLATB, SMLATT Rd, Rn, Rm, Ra Signed Multiply Accumulate Long (halfwords) Q
SMLAWB, SMLAWT Rd, Rn, Rm, Ra Signed Multiply Accumulate, word by halfword Q
SMMLA Rd, Rn, Rm, Ra Signed Most significant word Multiply Accumulate
30 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
SMMLS, SMMLR Rd, Rn, Rm, Ra Signed Most significant word Multiply Subtract
31 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
STM Rn{!}, reglist Store Multiple registers, increment after
32 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
SUB, SUBW {Rd,} Rn, #imm12 Subtract N,Z,C,V
33 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
UHASX {Rd,} Rn, Rm Unsigned Halving Add and Subtract with Exchange
UHSAX {Rd,} Rn, Rm Unsigned Halving Subtract and Add with Exchange
34 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
UMLAL RdLo, RdHi, Rn, Rm Unsigned Multiply with Accumulate (32 x 32 + 64), 64-
bit result
UMULL RdLo, RdHi, Rn, Rm Unsigned Multiply (32 x 32), 64-bit result
UQASX {Rd,} Rn, Rm Unsigned Saturating Add and Subtract with Exchange
UQSAX {Rd,} Rn, Rm Unsigned Saturating Subtract and Add with Exchange
USADA8 {Rd,} Rn, Rm, Ra Unsigned Sum of Absolute Differences and Accumulate
USAT Rd, #n, Rm {,shift #s} Unsigned Saturate Q
35 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
UXTAB16 {Rd,} Rn, Rm,{,ROR #} Rotate, dual extend 8 bits to 16 and Add
UXTAH {Rd,} Rn, Rm,{,ROR #} Rotate, unsigned extend and Add Halfword
UXTB {Rd,} Rm {,ROR #n} Zero extend a Byte
36 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
Compare two floating-point registers, or one floating- FPSCR
VCMPE.F32 Sd, <Sm | #0.0>
point register and zero with Invalid Operation check
VCVT.S32.F32 Sd, Sm Convert between floating-point and integer
VCVT.S16.F32 Sd, Sd, #fbits Convert between floating-point and fixed point
Convert between floating-point and integer with
VCVTR.S32.F32 Sd, Sm
rounding
VCVT<B|H>.F32.F16 Sd, Sm Converts half-precision value to single-precision
37 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
VMOV Sm, Sm1, Rt, Rt2 Copy 2 Arm core registers to 2 single precision registers
VMRS Rt, FPSCR Move FPSCR to Arm core register or APSR N,Z,C,V
38 © 2021 Arm
Cortex-M4 instruction set
Mnemonic Operands Brief description Flags
Note: full explanation of each instruction can be found in Cortex-M4 Devices’ Generic User Guide (Ref-4)
39 © 2021 Arm
Cortex-M4 instruction set
• Cortex-M4 suffix
• Some instructions can be followed by suffixes to update processor flags or execute the instruction on a
certain condition
S: Update APSR (flags). Example: ADDS R1, #0x21 (add 0x21 to R1 and update APSR)
EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE: Condition execution, e.g. EQ = equal, NE = not equal, LT = less than. Example: BNE label (branch to the label if not equal)
40 © 2021 Arm
Data insertion and alignment
• Insert data inside programs
• DCD: insert a word-size data
• DCB: insert a byte-size data
• ALIGN:
– used before inserting a word-size data
– Uses a number to determine the alignment size
• For example:
…
ALIGN 4 ; Align to a word boundary
MY_DATA DCD 0x12345678 ; Insert a word-size data
MY_STRING DCB "Hello", 0 ; Null terminated string
…
41 © 2021 Arm
Useful resources
• Architecture Reference Manual:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403c/index.html
• Cortex-M4 Technical Reference Manual:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439d/DDI0439D_cortex_m4_processor_r0p1_trm.pdf
• Cortex-M4 Devices Generic User Guide:
http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/DUI0553A_cortex_m4_dgug.pdf
42 © 2021 Arm
The Arm trademarks featured in this presentation are registered
trademarks or trademarks of Arm Limited (or its subsidiaries) in
the US and/or elsewhere. All rights reserved. All other marks
featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks
© 2021 Arm
C as Implemented in
Assembly Language
© 2021 Arm
Learning Objectives
At the end of this lecture, you should be able to:
• Outline the compiler build stages.
• Identify the role of each register according to the AAPCS Core Register Use standard.
• Identify the type of memory (read-only or read/write) suitable for a given type of
information.
• Outline the variable class qualifiers.
• Outline the function of the linker.
• Identify how pointers are used in a C program.
• Identify and define simple 1-dimension and 2-dimension arrays.
• Explain the terms and functions of prolog and epilog.
2 © 2021 Arm
Module Syllabus
• We program in C for convenience
• There are no MCUs which execute C, only machine code
• So we compile the C to assembly code, a human-readable representation of machine code
• We need to know what the assembly code implementing the C looks like
• To use the processor efficiently
• To analyze the code with precision
• To find performance and other problems
• An overview of what C gets compiled to
• C start-up module, subroutines calls, stacks, data classes and layout, pointers, control flow, etc.
3 © 2021 Arm
C Programmer’s World: a comprehensive set of features
• As many functions and variables as you want!
• All the memory you could ask for!
• So many data types! Integers, floating point, etc.
• So many data structures! Arrays, lists, trees, sets, dictionaries
• So many control structures! Subroutines, if/then/else, loops, etc.
• Iterators! Polymorphism!
4 © 2021 Arm
Processor’s World
• Data types
• Integers
• More if you’re lucky!
• Instructions
• Math: +, -, *, /
• Logic: AND, OR
• Shift, rotate
• Move, swap
• Compare
• Jump, branch
[Figure: grid of raw memory contents (numbers and characters)]
5 © 2021 Arm
Compiler Stages
• Parser
• reads in C code
• checks for syntax errors
• forms intermediate code (tree representation)
• High-Level Optimizer
• Modifies intermediate code (processor-independent)
• Code Generator
• Creates assembly code step by step from each node of the intermediate code
• Allocates variable uses to registers
• Low-Level Optimizer
• Modifies assembly code (parts are processor-specific)
• Assembler
• Creates object code (machine code)
• Linker/Loader
• Creates executable image from object file
6 © 2021 Arm
Examining Assembly Code before Debugger
• Compiler can generate assembly code listing for reference
• Select in project options
7 © 2021 Arm
Examining Disassembled Program in Debugger
8 © 2021 Arm
A Word on Code Optimizations
• Compiler and rest of toolchain try to optimize code:
• Simplifying operations
• Removing “dead” code
• Using registers
• These optimizations often get in way of understanding what the code does
• Fundamental trade-off: Fast or comprehensible code?
• Compiler optimization levels: Level 0 to Level 3
• Code examples here may use “volatile” data type modifier to reduce compiler optimizations and improve
readability
9 © 2021 Arm
Application Binary Interface
• Defines rules which allow separately developed functions to work together
• Arm Architecture Procedure Call Standard (AAPCS)
• Which registers must be saved and restored
• How to call procedures
• How to return from procedures
• C Library ABI (CLIBABI)
• C Library functions
• Runtime ABI (RTABI)
• Runtime helper functions: 32/32 integer division, memory copying, floating-point operations, data
type conversions, etc.
10 © 2021 Arm
Using Registers
© 2021 Arm
AAPCS Register Use Conventions
• Make it easier to create modular, isolated and integrated code
• Scratch registers are not expected to be preserved upon returning from a called subroutine
• r0-r3
• Preserved (“variable”) registers are expected to have their original values upon returning from a called
subroutine
• r4-r8, r10-r11
12 © 2021 Arm
AAPCS Core Register Use
Register Synonym Special Role in the procedure call standard
r15 PC The Program Counter.
r14 LR The Link Register.
r13 SP The Stack Pointer.
r12 IP The Intra-Procedure-call scratch register.
r11 v8 Variable-register 8.
r10 v7 Variable-register 7.
r9 v6,SB,TR Platform register. The meaning of this register is defined by the platform standard.
r8 v5 Variable-register 5.
r7 v4 Variable register 4.
r6 v3 Variable register 3.
r5 v2 Variable register 2.
r4 v1 Variable register 1.
r3 a4 Argument / scratch register 4.
r2 a3 Argument / scratch register 3.
r1 a2 Argument / result / scratch register 2.
r0 a1 Argument / result / scratch register 1.
• r4-r8, r10-r11: Must be saved, restored by callee-procedure, it may modify them. Calling subroutine expects these to retain their value.
• r0-r3: Don’t need to be saved. May be used for arguments, results, or temporary values.
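The register roles in the table can be related to ordinary C functions. In this minimal sketch the function names are hypothetical; the comments describe where the AAPCS places word-sized integer arguments and results, which only applies when the code is compiled for an Arm target.

```c
#include <stdint.h>

/*
 * Under the AAPCS, the four arguments a, b, c and d arrive in r0, r1, r2
 * and r3, and the result is returned in r0.
 */
int32_t sum4(int32_t a, int32_t b, int32_t c, int32_t d)
{
    return a + b + c + d;
}

/*
 * With five word-sized arguments, the first four still use r0-r3 and the
 * fifth (e) is passed on the stack.
 */
int32_t sum5(int32_t a, int32_t b, int32_t c, int32_t d, int32_t e)
{
    return sum4(a, b, c, d) + e;
}
```

Because r0-r3 are scratch registers, `sum5` cannot assume they still hold its own arguments after the call to `sum4` returns; a compiler typically keeps `e` in a preserved register or on the stack across the call.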
13 © 2021 Arm
Memory requirements
© 2021 Arm
What Memory Does a Program Need?
• Five possible types of ‘information’
• Code
• Read-only static data
• Writable static data
– Initialized
– Zero-initialized
– Uninitialized
• Heap
• Stack
• What goes where?
• Code is obvious
• And the others?
int a, b;
const char c=123;
int d=31;
void main(void) {
    int e;
    char f[32];
    e = d + 7;
    a = e + 29999;
    strcpy(f,"Hello!");
}
15 © 2021 Arm
What Memory Does a Program Need?
• Can the information change?
• No? Put it in read-only, nonvolatile memory
– Instructions
– Constant strings
– Constant operands
– Initialization values
• Yes? Put it in read/write memory
– Variables
– Intermediate computations
– Return address
– Other housekeeping data
int a, b;
const char c=123;
int d=31;
void main(void) {
    int e;
    char f[32];
    e = d + 7;
    a = e + 29999;
    strcpy(f,"Hello!");
}
16 © 2021 Arm
What Memory Does a Program Need?
• How long does the data need to exist? Reuse memory if possible.
• Statically allocated
Exists from program start to end
Each variable has its own fixed location
Space is not reused
• Automatically allocated
Exists from function start to end
Space can be reused
• Dynamically allocated
Exists from explicit allocation to explicit de-allocation
Space can be reused
int a, b;
const char c=123;
int d=31;
void main(void) {
    int e;
    char f[32];
    e = d + 7;
    a = e + 29999;
    strcpy(f,"Hello!");
}
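The three allocation lifetimes can be seen side by side in one small function. This is a minimal sketch with hypothetical names (`call_count`, `greeting`, `copy_greeting_length`); the slide's example program does not use dynamic allocation, so `malloc` and `free` are added here only to illustrate that case.

```c
#include <stdlib.h>
#include <string.h>

static int call_count;                    /* statically allocated, zero-initialized */
static const char greeting[] = "Hello!";  /* read-only static data */

/* Returns the length of a heap copy of the greeting, or -1 on failure. */
static int copy_greeting_length(void)
{
    char local[32];   /* automatically allocated: exists until the function returns */
    char *heap_copy;  /* will point to dynamically allocated memory */
    int length;

    call_count++;     /* the static variable keeps its value across calls */
    strcpy(local, greeting);

    heap_copy = malloc(strlen(local) + 1u); /* explicit allocation */
    if (heap_copy == NULL) {
        return -1;
    }
    strcpy(heap_copy, local);
    length = (int)strlen(heap_copy);
    free(heap_copy);                        /* explicit de-allocation */
    return length;
}

/* Accessor so that the static counter can be observed. */
static int get_call_count(void)
{
    return call_count;
}
```

The space used by `local` and by the heap block can be reused after each call, whereas `call_count` and `greeting` keep their fixed locations for the whole run of the program.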
17 © 2021 Arm
Program Memory Use
RAM: Zero-Initialized Data, Initialized Data, Stack
Flash ROM: Constant Data, Initialization Data, Startup and Runtime Library Code
int a, b;
const char c=123;
int d=31;
void main(void) {
    int e;
    char f[32];
    e = d + 7;
    a = e + 29999;
    strcpy(f,"Hello!");
}
18 © 2021 Arm
Activation Record
• Activation records are located on the stack
• Calling a function creates an activation record
• Returning from a function deletes the activation record
• Automatic variables and housekeeping information are stored in a function’s activation record
• Not all fields (LS, RA, Arg) may be present for each activation record
[Figure: stack layout, from lower address (top) to higher address (bottom)]
(Free stack space)
Activation record for current function: Local storage (<- Stack ptr), Return address, Arguments
Activation record for caller function: Local storage, Return address, Arguments
Activation record for caller’s caller function: Local storage, Return address, Arguments
Activation record for caller’s caller’s caller function: Local storage, Return address, Arguments
19 © 2021 Arm
Type and Class Qualifiers
• Const
• Never written by program, can be put in ROM to save RAM
• Volatile
• Can be changed outside of normal program flow: ISR, hardware register
• Compiler must be careful with optimizations
• Static
• Declared within function, retains value between function invocations
• Scope is limited to function
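The three qualifiers can be illustrated in a few lines of C. This is a minimal host-side sketch: all names are hypothetical, and `fake_isr` is an ordinary function standing in for a real interrupt service routine.

```c
#include <stdbool.h>
#include <stdint.h>

/* const: never written by the program, so it may be placed in ROM. */
static const uint8_t bit_masks[4] = { 1u, 2u, 4u, 8u };

/* volatile: may change outside of normal program flow (e.g. in an ISR). */
static volatile bool data_ready;

static uint8_t bit_mask(unsigned index)
{
    return bit_masks[index & 3u];
}

static uint32_t next_id(void)
{
    static uint32_t id; /* static: keeps its value between invocations */
    id++;
    return id;
}

/* Stand-in for an interrupt service routine that signals new data. */
static void fake_isr(void)
{
    data_ready = true;
}

/* Returns true once for each time the flag has been set. */
static bool poll_data_ready(void)
{
    if (data_ready) {   /* volatile forces a fresh read of the flag */
        data_ready = false;
        return true;
    }
    return false;
}
```

Without `volatile`, a compiler optimizing a polling loop around `data_ready` could legally read the flag once and never notice a later change made by the ISR.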
20 © 2021 Arm
Linker Map File
• Contains extensive information on functions and variables
• Value, type, size, object
21 © 2021 Arm
C Run-Time Start-Up Module
• After reset, MCU must:
• Initialize hardware
• Peripherals, etc.
• Set up stack pointer
• Initialize C or C++ runtime environment
• Set up heap memory
• Initialize variables
[Figure: memory contents set up by the start-up module]
RAM: Zero-Initialized Data (a, b), filled with zeros; Initialized Data (d), set from the Initialization Data; Stack (e, f)
Flash ROM: Initialization Data (31); Constant Data (c: 123, Hello!); Startup and Runtime Library Code
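The "initialize variables" step can be sketched as two memory operations: copy the initialization data from Flash ROM into the initialized data area, and fill the zero-initialized area with zeros. This is a host-side simulation, not real start-up code: the arrays below are hypothetical stand-ins for regions that a linker would normally define.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the Initialization Data in Flash ROM (initial value of d). */
static const uint32_t flash_init_image[1] = { 31u };

/* Stand-in for the Initialized Data area in RAM (holds d). */
static uint32_t ram_data[1];

/* Stand-in for the Zero-Initialized Data area in RAM (holds a and b),
 * deliberately filled with garbage to mimic RAM contents after reset. */
static uint32_t ram_zero_init[2] = { 0xDEADBEEFu, 0xDEADBEEFu };

/* Simplified model of the variable initialization done before main(). */
static void runtime_startup(void)
{
    /* Copy initial values from Flash ROM into RAM. */
    memcpy(ram_data, flash_init_image, sizeof ram_data);
    /* Fill the zero-initialized area with zeros. */
    memset(ram_zero_init, 0, sizeof ram_zero_init);
}
```

After `runtime_startup()` runs, the stand-in for d holds 31 and the stand-ins for a and b hold 0, matching the figure.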
22 © 2021 Arm
Accessing data in Memory
© 2021 Arm
Accessing Data
• What does it take to get at a variable in memory?
• Depends on location, which depends on storage type (static, automatic, dynamic)
int siA;
void static_auto_local() {
    int aiB;
    static int siC=3;
    int * apD;
    int aiE=4, aiF=5, aiG=6;
    siA = 2;
    aiB = siC + siA;
    apD = & aiB;
    (*apD)++;
    apD = &siC;
    (*apD) += 9;
    apD = &siA;
    apD = &aiE;
    apD = &aiF;
    apD = &aiG;
    (*apD)++;
    aiE+=7;
    *apD = aiE + aiF;
}
24 © 2021 Arm
Static Variables
• Static var can be located anywhere in 32-bit memory space, so need a 32-bit pointer
• Can’t fit a 32-bit pointer into a 16-bit instruction (or a 32-bit instruction), so save the pointer separate from
instruction, but nearby so we can access it with a short PC-relative offset
• Load the pointer into a register (r0)
• Can now load variable’s value into a register (r1) from memory using that pointer in r0
• Similarly can store a new value to the variable in memory
[Figure: code and data in memory]
Load r0 with pointer to variable
Load r1 from [r0]
Use value of variable
Label: 32-bit pointer to Variable
Variable
25 © 2021 Arm
Static Variables
• Key
• variable’s value
• variable’s address
• address of copy of variable’s address
• Addresses of siA and siC are stored as literals to be loaded into pointers
• Variables siC and siA are located in .data section with initial values
AREA ||.text||, CODE, READONLY, ALIGN=2
;;;20 siA = 2;
00000e 2102 MOVS r1,#2 ; r1 = 2
000010 4a37 LDR r2,|L1.240| ; r2 = &siA
000012 6011 STR r1,[r2,#0] ; *r2 = r1
;;;21 aiB = siC + siA;
000014 4937 LDR r1,|L1.244| ; r1 = &siC
000016 6809 LDR r1,[r1,#0] ; r1 = *r1
000018 6812 LDR r2,[r2,#0] ; r2 = *r2
00001a 1889 ADDS r1,r1,r2 ; r1 = r1 + r2
...
|L1.240|
DCD ||siA||
|L1.244|
DCD ||siC||
AREA ||.data||, DATA, ALIGN=2
||siC||
DCD 0x00000003
||siA||
DCD 0x00000000
26 © 2021 Arm
Automatic Variables Stored on Stack
• Automatic variables are stored in a function’s activation record (unless optimized and promoted to register)
• Activation records are located on the stack
• Calling a function creates an activation record, allocating space on stack
• Returning from a function deletes the activation record, freeing up space on stack
• Variables in C are implicitly automatic; there is no need to specify the auto keyword
int main(void) {
    auto vars;
    a();
}
void a(void) {
    auto vars;
    b();
}
void b(void) {
    auto vars;
    c();
}
void c(void) {
    auto vars;
    …
}
27 © 2021 Arm
Automatic Variables
int main(void) {
    auto vars;
    a();
}
void a(void) {
    auto vars;
    b();
}
void b(void) {
    auto vars;
    c();
}
void c(void) {
    auto vars;
    …
}
[Figure: stack layout, from lower address (top) to higher address (bottom)]
(Free stack space)
Activation record for current function C: Local storage (<- Stack pointer while executing C), Saved regs, Arguments (optional)
Activation record for caller function B: Local storage (<- Stack pointer while executing B), Saved regs, Arguments (optional)
Activation record for caller’s caller function A: Local storage (<- Stack pointer while executing A), Saved regs, Arguments (optional)
Activation record for caller’s caller’s caller function main: Local storage (<- Stack pointer while executing main), Saved regs, Arguments (optional)
28 © 2021 Arm
Addressing Automatic Variables
• Program must allocate space on stack for variables
• Stack addressing uses an offset from the stack pointer: [sp, #offset]
• Items on the stack are word aligned
• In instructions, one byte used for offset, which is multiplied by four
• Possible offsets: 0, 4, 8, …, 1020
• Maximum range addressable this way is 1024 bytes
(Figure: stack table with columns Address and Contents; addresses SP, SP+0x4, SP+0x8, SP+0xC, SP+0x10, SP+0x14, SP+0x18, SP+0x1C, SP+0x20.)
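The offset rule above can be checked with a small host-side helper. This is a minimal sketch that models only the 8-bit scaled offset described on the slide; the function names are made up for illustration and are not part of any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an SP-relative byte offset fits the encoding described above:
   word aligned and representable as an 8-bit value multiplied by four. */
bool sp_offset_encodable(uint32_t byte_offset)
{
    return (byte_offset % 4u == 0u) && (byte_offset / 4u <= 255u);
}

/* Value stored in the 8-bit offset field (hypothetical helper name).
   Only meaningful when sp_offset_encodable() is true. */
uint8_t sp_offset_field(uint32_t byte_offset)
{
    return (uint8_t)(byte_offset / 4u);
}
```

For example, the offset 0xc used for aiB in the listings of this lecture gives a field value of 3, while an offset of 1024 is out of range.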
Automatic Variables
Address    Contents
SP         aiG
SP+4       aiF
SP+8       aiE
SP+0xC     aiB
SP+0x10    r0
SP+0x14    r1
SP+0x18    r2
SP+0x1C    r3
SP+0x20    lr

;;;14 void static_auto_local( void ) {
000000 b50f PUSH {r0-r3,lr}
;;;15 int aiB;
;;;16 static int siC=3;
;;;17 int * apD;
;;;18 int aiE=4, aiF=5, aiG=6;
000002 2104 MOVS r1,#4         ; Initialize aiE
000004 9102 STR r1,[sp,#8]
000006 2105 MOVS r1,#5         ; Initialize aiF
000008 9101 STR r1,[sp,#4]
00000a 2106 MOVS r1,#6         ; Initialize aiG
00000c 9100 STR r1,[sp,#0]
…
;;;21 aiB = siC + siA;
…
00001c 9103 STR r1,[sp,#0xc]   ; Store value for aiB
Using Pointers
Using Pointers to Automatics
• C Pointer: a variable which holds the data’s address
• aiB is on stack at SP+0xc
• Compute r0 with variable’s address from stack pointer and offset (0xc)
• Load r1 with variable’s value from memory
• Operate on r1, save back to variable’s address

;;;22 apD = & aiB;
00001e a803 ADD r0,sp,#0xc
;;;23 (*apD)++;
000020 6801 LDR r1,[r0,#0]
000022 1c49 ADDS r1,r1,#1
000024 6001 STR r1,[r0,#0]
Using Pointers to Statics
• Load r0 with variable’s address from address of copy of variable’s address
• Load r1 with variable’s value from memory
• Operate on r1, save back to variable’s address

;;;24 apD = &siC;
000026 4833 LDR r0,|L1.244|
;;;25 (*apD) += 9;
000028 6801 LDR r1,[r0,#0]
00002a 3109 ADDS r1,r1,#9
00002c 6001 STR r1,[r0,#0]
|L1.244|
        DCD ||siC||
        AREA ||.data||, DATA, ALIGN=2
||siC||
        DCD 0x00000003
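The C source behind the listings on the last few slides can be pieced together from the ;;; source-line annotations. The version below is a reconstruction under two assumptions that the slides do not confirm: siA is declared as a file-scope static (its declaration is not shown), and the function returns aiB so the result can be inspected (the slides declare it void).

```c
static int siA;   /* assumed file-scope static; the listing shows initial value 0 */

/* Reconstructed from source lines 14 to 25 of the listings.
   Returning aiB is an assumption made for illustration. */
int static_auto_local(void)
{
    int aiB;                        /* automatic: lives in the activation record */
    static int siC = 3;             /* static: lives in .data, keeps its value between calls */
    int *apD;
    int aiE = 4, aiF = 5, aiG = 6;  /* automatics initialized on the stack */

    siA = 2;
    aiB = siC + siA;
    apD = &aiB;                     /* pointer to an automatic: SP plus offset */
    (*apD)++;
    apD = &siC;                     /* pointer to a static: address loaded from a literal */
    (*apD) += 9;

    (void)aiE; (void)aiF; (void)aiG;   /* not used further in the lines shown */
    return aiB;
}
```

Because siC is static, its value carries over: the first call leaves siC at 12, so a second call starts from that value rather than from 3.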
Array Access
Array Access
• What does it take to get at an array element in memory?
• Depends on how many dimensions
• Depends on element size and row width
• Depends on location, which depends on storage type (static, automatic, dynamic)

uint8 buff2[3];
uint16 buff3[5][7];

uint32 arrays(uint8 n, uint8 j) {
    volatile uint32 i;
    i = buff2[0] + buff2[n];
    i += buff3[n][j];
    return i;
}
Accessing 1-D Array Elements
• Need to calculate element address: sum of:
• array start address
• offset: index * element size
• buff2 is array of unsigned characters

Address      Contents
buff2        buff2[0]
buff2 + 1    buff2[1]
buff2 + 2    buff2[2]
Accessing 2-D Array Elements
uint16 buff3[5][7]
• var[rows][columns]
• Sizes
• Element: 2 bytes
• Row: 7*2 bytes = 14 bytes (0xe)
• Offset based on row index and column index
• column offset = column index * element size
• row offset = row index * row size

Address               Contents
buff3, buff3+1        buff3[0][0]
buff3+2, buff3+3      buff3[0][1]
(etc.)
buff3+10, buff3+11    buff3[0][5]
buff3+12, buff3+13    buff3[0][6]
buff3+14, buff3+15    buff3[1][0]
buff3+16, buff3+17    buff3[1][1]
(etc.)
buff3+68, buff3+69    buff3[4][6]
Code to Access 2-D Array
;;; i += buff3[n][j];
Register contents before the statement: r0 = i, r1 = j, r2 = n
Instruction         Register written                                       Description
MOVS r3,#0xe        r3 = 0xe                                               Load row size
MULS r3,r2,r3       r3 = n*0xe                                             Multiply by row number
LDR r4,|L1.276|     r4 = &buff3                                            Load address of buff3
ADDS r3,r3,r4       r3 = &buff3+n*0xe                                      Add buff3 address to row offset
LSLS r4,r1,#1       r4 = j<<1                                              Multiply column number by 2 (buff3 is uint16 array)
LDRH r3,[r3,r4]     r3 = *(uint16 *)(&buff3+n*0xe+(j<<1)) = buff3[n][j]    Load halfword r3 with element at r3+r4 (buff3 + row offset + col offset)
ADDS r0,r3,r0       r0 = i+buff3[n][j]                                     Add r3 to r0 (i)
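The address arithmetic performed by the instruction sequence above can be written out in C. This is a sketch for illustration; offset_2d and buff3_element are hypothetical helper names, and uint16_t from <stdint.h> stands in for the slides' uint16 type.

```c
#include <stddef.h>
#include <stdint.h>

uint16_t buff3[5][7];

/* Byte offset of var[row][col] for an array with `cols` columns of
   `elem_size`-byte elements: row offset plus column offset. */
size_t offset_2d(size_t row, size_t col, size_t cols, size_t elem_size)
{
    size_t row_size = cols * elem_size;        /* 7 * 2 = 14 bytes (0xe) for buff3 */
    return row * row_size + col * elem_size;
}

/* Same calculation applied to buff3, returning the element's address. */
uint16_t *buff3_element(size_t n, size_t j)
{
    uint8_t *base = (uint8_t *)buff3;          /* byte pointer to the array start */
    return (uint16_t *)(base + offset_2d(n, j, 7, sizeof(uint16_t)));
}
```

For buff3[4][6] the offset is 4*14 + 6*2 = 68, which matches the last row of the address table for buff3.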
Function Prolog and Epilog
Prolog and Epilog
• A function’s Prolog and Epilog are responsible for creating and destroying its activation record
• Remember AAPCS
• Scratch registers r0-r3 are not expected to be preserved upon returning from a called subroutine, can
be overwritten
• Preserved (“variable”) registers r4-r8, r10-r11 must have their original values upon returning from a
called subroutine
• Prolog must save preserved registers on stack
• Epilog must restore preserved registers from stack
• Prolog also may
• Handle function arguments
• Allocate temporary storage space on stack (subtract from SP)
• Epilog
• May de-allocate stack space (add to SP)
• Returns control to calling function
Return Address
• Return address stored in LR by bl, blx instructions
Function Prolog and Epilog
• Save r4 (preserved register) and link register (return address)
• Allocate 32 (0x20) bytes on stack for array x by subtracting from SP
• Compute return value, placing in return register r0

fun4 PROC
;;;102 int fun4(char a, int b, char c) {
;;;103 volatile int x[8];
00010a b510 PUSH {r4,lr}
00010c b088 SUB sp,sp,#0x20
...
Calling Functions
AAPCS Core Register Use
Register Synonym Special Role in the procedure call standard
r15 PC The Program Counter.
r14 LR The Link Register.
r13 SP The Stack Pointer.
r12 IP The Intra-Procedure-call scratch register.
r11 v8 Variable-register 8.
r10 v7 Variable-register 7.
r9 v6,SB,TR Platform register. The meaning of this register is defined by the platform standard.
r8 v5 Variable-register 5.
r7 v4 Variable register 4.
r6 v3 Variable register 3.
r5 v2 Variable register 2.
r4 v1 Variable register 1.
r3 a4 Argument / scratch register 4.
r2 a3 Argument / scratch register 3.
r1 a2 Argument / result / scratch register 2.
r0 a1 Argument / result / scratch register 1.
Function Arguments and Return Values
• First, pass the arguments
• How to pass them?
Much faster to use registers than stack
But quantity of registers is limited
• Basic rules
Process arguments in order they appear in source code
Round size up to be a multiple of 4 bytes
Copy arguments into core registers (r0-r3), aligning doubles to even registers
Copy remaining arguments onto stack, aligning doubles to even addresses
Specific rules in AAPCS, Section 5.5
• Second, call the function
• Usually as subroutine with branch link (bl) or branch link and exchange instruction (blx)
• Exceptions in AAPCS
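The basic rules above can be sketched as a small allocator that decides where each argument goes. This is a simplified model under stated assumptions, not the full procedure call standard: it handles only 4-byte and 8-byte scalar arguments, it never splits an argument between registers and stack, and the names ArgLocation and assign_args are invented for this example.

```c
#include <stddef.h>

/* Where one argument ends up: reg is 0..3 for r0..r3, or -1 if on the stack;
   stack_offset is the byte offset from SP at the call, or -1 if in registers. */
typedef struct {
    int reg;
    int stack_offset;
} ArgLocation;

/* sizes[] holds each argument's size in bytes, in source order:
   4 for int, pointer, or char (after rounding up), 8 for double or long long. */
void assign_args(const size_t *sizes, size_t count, ArgLocation *out)
{
    int next_reg = 0;      /* next free core register */
    int next_stack = 0;    /* next free stack offset */

    for (size_t i = 0; i < count; i++) {
        int words = (int)((sizes[i] + 3u) / 4u);    /* round size up to a multiple of 4 bytes */
        if (sizes[i] == 8u && (next_reg % 2) != 0)
            next_reg++;                              /* align doubles to an even register */
        if (next_reg + words <= 4) {
            out[i].reg = next_reg;
            out[i].stack_offset = -1;
            next_reg += words;
        } else {
            next_reg = 4;                            /* no further arguments go in registers */
            if (sizes[i] == 8u)
                next_stack = (next_stack + 7) & ~7;  /* align doubles on the stack */
            out[i].reg = -1;
            out[i].stack_offset = next_stack;
            next_stack += words * 4;
        }
    }
}
```

For a call such as f(int, double, int), this model places the first argument in r0, the double in r2 and r3 (r1 is skipped for alignment), and the last argument on the stack.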
Return Values
• Callee passes Return Value in register(s) or stack
• Registers
• Stack
• Caller function allocates space for return value, then passes pointer to space as an argument to callee
• Callee stores result at location indicated by pointer
Call Example
int fun2(int arg2_1, int arg2_2) {
    int i;
    arg2_2 += fun3(arg2_1, 4, 5, 6);
    …
}

• Argument 4 into r3
• Argument 3 into r2
• Argument 2 into r1
• Argument 1 into r0
• Call fun3 with BL instruction
• Result was returned in r0, so add to r4 (arg2_2 += result)

fun2 PROC
;;;85 int fun2(int arg2_1, int arg2_2) {
...
0000e0 2306 MOVS r3,#6
0000e2 2205 MOVS r2,#5
0000e4 2104 MOVS r1,#4
0000e6 4630 MOV r0,r6
0000e8 f7fffffe BL fun3
0000ec 1904 ADDS r4,r0,r4
Call and Return Example
int fun3(int arg3_1, int arg3_2, int arg3_3, int arg3_4) {
    return arg3_1*arg3_2*arg3_3*arg3_4;
}

• Save r4 and Link Register on stack
• r0 = arg3_1*arg3_2
• r0 *= arg3_3
• r0 *= arg3_4
• Restore r4 and return from subroutine
• Return value is in r0

fun3 PROC
;;;81 int fun3(int arg3_1, int arg3_2, int arg3_3, int arg3_4) {
0000ba b510 PUSH {r4,lr}
0000c0 4348 MULS r0,r1,r0
0000c2 4350 MULS r0,r2,r0
0000c4 4358 MULS r0,r3,r0
0000c6 bd10 POP {r4,pc}
Control Flow
Control Flow: Conditionals and Loops
• How does the compiler implement conditionals and loops?
for (i = 0; i < 10; i++){
    x += i;
}

if (x){
    y++;
} else {
    y--;
}

switch (x) {
case 1:
    y += 3;
    break;
case 31:
    y -= 5;
    break;
default:
    y--;
    break;
}

while (x<10) {
    x = x + 1;
}

do {
    x += 2;
} while (x < 20);
Control Flow: If/Else
(Flowchart: Condition; the T branch leads to action_if, the F branch leads to action_else.)
;;;39 if (x){
000056 2900 CMP r1,#0
000058 d001 BEQ |L1.94|
;;;40 y++;
00005a 1c52 ADDS r2,r2,#1
00005c e000 B |L1.96|
|L1.94|
;;;41 } else {
;;;42 y--;
00005e 1e52 SUBS r2,r2,#1
|L1.96|
;;;43 }
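To make the branch structure of the listing easier to follow, the same if/else can be written in C with explicit gotos, one per branch instruction. This is an illustrative rewrite, not compiler output; the function and label names are made up, and x and y are passed as parameters where the listing keeps them in r1 and r2.

```c
/* if (x) { y++; } else { y--; } written with the same branches as the listing. */
int if_else_branches(int x, int y)
{
    if (x == 0)          /* CMP r1,#0 ; BEQ |L1.94| : skip to else when x is zero */
        goto else_part;
    y++;                 /* ADDS r2,r2,#1 */
    goto done;           /* B |L1.96| : jump over the else part */
else_part:
    y--;                 /* SUBS r2,r2,#1 */
done:
    return y;
}
```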
Control Flow: Switch
(Flowchart: Evaluate the expression; if it equals const1 (T) perform action1, otherwise (F) compare with const2; if that matches (T) perform action2, otherwise (F) perform action3.)
;;;45 switch (x) {
000060 2901 CMP r1,#1
000062 d002 BEQ |L1.106|
000064 291f CMP r1,#0x1f
000066 d104 BNE |L1.114|
000068 e001 B |L1.110|
|L1.106|
;;;46 case 1:
;;;47 y += 3;
00006a 1cd2 ADDS r2,r2,#3
;;;48 break;
00006c e003 B |L1.118|
|L1.110|
;;;49 case 31:
;;;50 y -= 5;
00006e 1f52 SUBS r2,r2,#5
;;;51 break;
000070 e001 B |L1.118|
|L1.114|
;;;52 default:
;;;53 y--;
000072 1e52 SUBS r2,r2,#1
;;;54 break;
000074 bf00 NOP
|L1.118|
000076 bf00 NOP
;;;55 }
Iteration: While
Iteration: For
(Flowchart labels: init_expression, loop_body, cond_expression, and Condition with T and F branches.)
;;;61 for (i = 0; i < 10; i++){
000080 2300 MOVS r3,#0
000082 e001 B |L1.136|
|L1.132|
;;;62 x += i;
000084 18c9 ADDS r1,r1,r3
000086 1c5b ADDS r3,r3,#1      ;61
|L1.136|
000088 2b0a CMP r3,#0xa        ;61
00008a d3fb BCC |L1.132|
;;;63 }
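The listing places the loop test after the body and jumps to it first. The same shape can be written in C with gotos. This is an illustrative rewrite with made-up names; unsigned types are an assumption, chosen because the listing uses the unsigned branch BCC.

```c
/* for (i = 0; i < 10; i++) { x += i; } with the test placed after the body,
   as in the listing. */
unsigned int for_loop_branches(unsigned int x)
{
    unsigned int i = 0;      /* MOVS r3,#0 */
    goto test;               /* B |L1.136| : check the condition before the first pass */
body:
    x += i;                  /* ADDS r1,r1,r3 */
    i++;                     /* ADDS r3,r3,#1 */
test:
    if (i < 10)              /* CMP r3,#0xa ; BCC |L1.132| */
        goto body;
    return x;
}
```

With this layout each iteration executes only one branch (the conditional one at the bottom), at the cost of a single extra branch on loop entry.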
Iteration: Do/While
(Flowchart: loop_body, then Condition; the T branch loops back to loop_body, the F branch exits.)
;;;65 do {
00008c bf00 NOP
|L1.142|
;;;66 x += 2;
00008e 1c89 ADDS r1,r1,#2
;;;67 } while (x < 20);
000090 2914 CMP r1,#0x14
000092 d3fc BCC |L1.142|
Interrupts and Low-Power Features
Learning Objectives
At the end of this lecture, you should be able to:
• Outline the different types of interrupts and their function.
• Describe the steps taken by the processor in handling interrupts and exceptions.
• Outline the differences between interrupts and exceptions.
• Describe the functions of the Nested Vectored Interrupt Controller (NVIC).
Module syllabus
• Interrupts
• What are interrupts?
• Why use interrupts?
• Interrupts
• Entering an exception handler
• Exiting an exception handler
• Microcontroller interrupts
• Timing analysis
• Program design with interrupts
• Sharing data safely between ISRs and other threads
Example system with interrupt
• Goal: to change the color of the RGB LED when the switch is pressed
• Need to add an external switch
• (resistor is internal – a pull-up resistor)
How can we detect when a switch is pressed?
• One option is polling, i.e., using software to check the switch regularly. However, polling is:
• Slow – the software needs to check to see if the switch is pressed
• Wasteful of CPU time – the faster the response needed, the more often the software needs to check
• Not scalable – it’s difficult to build a multi-activity system that can respond quickly. The system’s response time depends on all other processing it has to do.
• A better option is an interrupt, i.e., using special hardware in the MCU to detect the event and run an ISR in response. An interrupt is:
• Efficient – the code runs only when necessary
• Fast – it’s a hardware mechanism
• Scalable:
ISR response time doesn’t depend on most other processing
Code modules can be developed independently
Interrupt or Exception Processing Sequence
• Main code is running.
• When the interrupt trigger occurs:
• The processor does some hardwired processing
• The processor executes the ISR, including return-from-interrupt instruction at the end
• Then the processor resumes running main code
(Figure: timeline of main code execution over time.)
Interrupts
• Hardware-triggered asynchronous software routine
• Triggered by hardware signal from peripheral or external device
• Asynchronous – can happen anywhere in the program (unless interrupt is disabled)
• Software routine – Interrupt Service Routine runs in response to interrupt
Example program requirements and design
RGB LED
Example exception handler
• Now we will examine the processor’s response to exception in detail:
Use debugger for detailed processor view
• Can see registers, stack, source code, disassembly (object code)
• Note: compiler may generate code for function entry
• Place breakpoint on the handler function declaration line in source code, not at the first line of code
Entering exception handler: CPU hardwired exception processing
• Finish current instruction
• Except for lengthy instructions
• Push context (registers) onto current stack (MSP or PSP)
• xPSR, Return Address, LR(R14), R12, R3, R2, R1, R0
• Switch to handler/privileged mode, use MSP
• Load PC with address of interrupt handler
• Load LR with EXC_RETURN code
• Load IPSR with exception number
• Start executing code of interrupt handler
1. Finish current instruction
• Most instructions are short and finish quickly.
• Some instructions may take many cycles to execute.
• Load Multiple (LDM), Store Multiple (STM), Push, Pop, MULS (32 cycles for some CPU core
implementations)
• This will delay interrupt response significantly.
• If one of these is executing when the interrupt is requested, the processor:
• abandons the instruction
• responds to the interrupt
• executes the ISR
• returns from interrupt
• restarts the abandoned instruction
2. Push context onto current stack
<previous> SP points here before interrupt
(Memory addresses decrease going down the table.)
SP + 0x1C    xPSR
SP + 0x18    PC
SP + 0x14    LR
SP + 0x10    R12
SP + 0x0C    R3
SP + 0x08    R2
SP + 0x04    R1
SP + 0x00    R0       SP points here upon entering ISR
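The stacked context can be described in C as a structure whose members follow the order of the table, lowest address first. This is a sketch for illustration; the type name ExceptionFrame and the helper function are not taken from the slides or from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* The eight words pushed on exception entry, lowest address (SP + 0x00) first. */
typedef struct {
    uint32_t r0;      /* SP + 0x00 */
    uint32_t r1;      /* SP + 0x04 */
    uint32_t r2;      /* SP + 0x08 */
    uint32_t r3;      /* SP + 0x0C */
    uint32_t r12;     /* SP + 0x10 */
    uint32_t lr;      /* SP + 0x14: LR of the interrupted code */
    uint32_t pc;      /* SP + 0x18: return address */
    uint32_t xpsr;    /* SP + 0x1C */
} ExceptionFrame;

/* Return address of the interrupted code, given a pointer to the stacked frame. */
uint32_t stacked_return_address(const ExceptionFrame *frame)
{
    return frame->pc;
}
```

A fault handler that is given the stack pointer value from exception entry could use such a structure to report where the interrupted code was executing.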
Context saved on stack
SP value is reduced since
registers have been pushed
onto stack
3. Switch to handler/privileged mode
• Handler mode always uses Main SP
(State diagram: Reset enters Thread Mode, which uses MSP or PSP. Starting exception processing switches to Handler Mode, which uses MSP. When exception processing is completed, the processor returns to Thread Mode.)
Handler and privileged mode
Update IPSR with exception number
4. Load PC with address of exception handler
• Which Program Counter is selected from the vector table depends on which exception is used

Memory Address    Value
0x0000_0000       Initial Stack Pointer
0x0000_0004       Reset
0x0000_0008       NMI_IRQHandler
…
                  IRQ0_Handler
                  IRQ1_Handler
…
(Figure: each vector points to the code label of the same name: Reset:, NMI_IRQHandler:, …, IRQ0_Handler:, IRQ1_Handler:)
Examine vector table with debugger
• Why is the vector odd?
• LSB of address indicates that handler uses Thumb code

Exception number    IRQ number    Vector        Offset
                                  Initial SP    0x00
1                                 Reset         0x04
2                   -14           NMI           0x08
3                   -13           HardFault     0x0C
4 to 10                           Reserved      0x10 onward
11                  -5            SVCall        0x2C
12 to 13                          Reserved
14                  -2            PendSV        0x38
15                  -1            SysTick       0x3C
16                  0             IRQ0          0x40
17                  1             IRQ1          0x44
18                  2             IRQ2          0x48
.                   .
16+n                n             IRQn          0x40+4n
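The relationships in the vector table (IRQ number to exception number, exception number to offset, and the odd vector value) reduce to a few lines of arithmetic. The helper names below are invented for this sketch.

```c
#include <stdint.h>

/* Byte offset of an exception's vector: four bytes per entry,
   with entry 0 holding the initial SP. */
uint32_t vector_offset(uint32_t exception_number)
{
    return 4u * exception_number;
}

/* Exception number for an IRQ number (negative for system exceptions):
   IRQn is exception 16 + n. */
uint32_t exception_number_for_irq(int32_t irq_number)
{
    return (uint32_t)(irq_number + 16);
}

/* Handler address with the Thumb bit (LSB) cleared; the stored vector is odd. */
uint32_t handler_address(uint32_t vector_value)
{
    return vector_value & ~1u;
}
```

For example, IRQ2 is exception 18 and its vector sits at offset 0x48, while SysTick (IRQ number -1) is exception 15 at offset 0x3C.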
Upon entry to handler
5. Load LR with EXC_RETURN code
• EXC_RETURN value generated by CPU to provide information on how to return
• Which SP to restore registers from? MSP (0) or PSP (1)
Previous value of SPSEL
• Which mode to return to? Handler (0) or Thread (1)
Another exception handler may have been running when this exception was requested
Updated LR with EXC_RETURN code
6. Start executing exception handler
• Exception handler starts running, unless preempted by a higher-priority exception
After handler has saved more context
Exiting an exception handler
1. Execute instruction triggering exception return processing
2. Select return stack, restore context from that stack
3. Resume execution of code at restored address
1. Execute instruction for exception return
• No “return from interrupt” instruction
• Use regular instruction instead
• BX LR - Branch to address in LR by loading PC with LR contents
• POP …, PC - Pop address from stack into PC
• … with a special value EXC_RETURN loaded into the PC to trigger exception handling processing
• BX LR used if EXC_RETURN is still in LR
• If EXC_RETURN has been saved on stack, then use POP
What will be popped from stack?
• R4: 0x4040_0000
• PC: 0xFFFF_FFF9
2. Select stack, restore context
• Check EXC_RETURN (bit 2) to determine from which SP to pop the context
EXC_RETURN Return Stack Description
0xFFFF_FFF1 0 (MSP) Return to exception handler with MSP
0xFFFF_FFF9 0 (MSP) Return to thread with MSP
0xFFFF_FFFD 1 (PSP) Return to thread with PSP
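The three EXC_RETURN values in the table can be decoded with two bit tests. Bit 2 is named on the slide; the use of bit 3 for the return mode is inferred here from the table values (0xFFFF_FFF1 versus 0xFFFF_FFF9), and the function names are made up for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 selects the stack used to restore context: 0 = MSP, 1 = PSP. */
bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* Bit 3 selects the mode to return to: 0 = Handler, 1 = Thread. */
bool exc_return_to_thread(uint32_t exc_return)
{
    return (exc_return & (1u << 3)) != 0u;
}
```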
Resume executing previous main thread code
• Exception handling registers have been restored: R0, R1, R2, R3, R12, LR, PC, xPSR
• SP is back to previous value
• Back in thread mode
• Next instruction to execute is at 0x0000_0A70
Microcontroller interrupts
• Types of interrupts:
• Hardware interrupts
Asynchronous: not related to what code the processor is currently executing
Examples: an interrupt signal is asserted, a character is received on a serial port, or an ADC finishes a conversion
• Exceptions, faults, software interrupts
Synchronous: are the result of specific instructions executing
Examples: undefined instructions, overflow occurs for a given instruction
• We can enable and disable (mask) most interrupts as needed (maskable), others are non-maskable
• Interrupt service routine (ISR)
• Subroutine which processor is forced to execute to respond to a specific event
• After ISR completes, MCU goes back to previously executing code
Nested vectored interrupt controller
• NVIC manages and prioritizes external interrupts
• Interrupts are types of exceptions
• Exceptions 16 through 16+N
• Modes
• Thread Mode: entered on reset
• Handler Mode: entered on executing an exception
• Privilege level
• Stack pointers
• Main Stack Pointer, MSP
• Process Stack Pointer, PSP
• Exception states: Inactive, Pending, Active, Active and Pending (A&P)
(Figure: the Port Module, Next Module, and Another Module connect to the NVIC, which connects to the Arm Cortex core.)
NVIC registers and state
• Enable - allows interrupt to be recognized
• Accessed through two registers (set bits for interrupts)
Set enable with NVIC_ISER, clear enable with NVIC_ICER
• CMSIS Interface: NVIC_EnableIRQ(IRQnum), NVIC_DisableIRQ(IRQnum)
• Pending - interrupt has been requested but is not yet serviced
• CMSIS: NVIC_SetPendingIRQ(IRQnum), NVIC_ClearPendingIRQ(IRQnum)
Core exception mask register
• Similar to “Global interrupt disable” bit in other MCUs
• PRIMASK - Exception mask register (CPU core)
• Bit 0: PM Flag
Set to 1 to prevent activation of all exceptions with configurable priority
Clear to 0 to allow activation of all exceptions
• Access using CPS, MSR and MRS instructions
• Use to prevent data race conditions with code needing atomicity
• CMSIS-CORE API
• void __enable_irq() - clears PM flag
• void __disable_irq() - sets PM flag
• uint32_t __get_PRIMASK() - returns value of PRIMASK
• void __set_PRIMASK(uint32_t x) - sets PRIMASK to x
Prioritization
• Exceptions are prioritized to order the response to simultaneous requests (smaller number = higher priority)
• Priorities of some exceptions are fixed
• Reset: -3, highest priority
• NMI: -2
• Hard Fault: -1
• Priorities of other (peripheral) exceptions are adjustable
• Value is stored in the interrupt priority register (IPR0-7)
• 0x00
• 0x40
• 0x80
• 0xC0
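The four values listed above correspond to four priority levels stored in the top two bits of an 8-bit field. The sketch below computes such a field and its position inside an IPR register on the host. It assumes four 8-bit fields per 32-bit register (consistent with IPR0-7 covering 32 interrupts); the function names are hypothetical, and real code would normally go through the CMSIS function NVIC_SetPriority instead.

```c
#include <stdint.h>

/* Priority level 0..3 mapped to the value stored in the 8-bit field:
   0x00, 0x40, 0x80, 0xC0 (only the top two bits are used). */
uint8_t priority_field(uint8_t level)
{
    return (uint8_t)((level & 0x3u) << 6);
}

/* Assumed layout: four 8-bit fields per 32-bit IPR register. */
uint32_t ipr_index(uint32_t irq) { return irq / 4u; }
uint32_t ipr_shift(uint32_t irq) { return (irq % 4u) * 8u; }

/* New IPR register value with the field for `irq` replaced by `level`. */
uint32_t ipr_with_priority(uint32_t ipr_value, uint32_t irq, uint8_t level)
{
    uint32_t shift = ipr_shift(irq);
    ipr_value &= ~(0xFFu << shift);                          /* clear the old field */
    ipr_value |= (uint32_t)priority_field(level) << shift;   /* insert the new one */
    return ipr_value;
}
```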
Special cases of prioritization
• Simultaneous exception requests?
• Lowest exception type number is serviced first
• New exception requested while a handler is executing?
• New priority higher than current priority?
New exception handler preempts current exception handler
• New priority lower than or equal to current priority?
New exception held in pending state
Current handler continues and completes execution
Previous priority level restored
New exception handled if priority level allows
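The two decisions described above, whether a new request preempts the running handler and which of two simultaneous requests is serviced first, can be modelled in a few lines. This is a simplified host-side model with invented names; it reads the "lowest exception type number" rule as the tie-breaker between requests of equal priority.

```c
#include <stdbool.h>

typedef struct {
    int number;      /* exception type number */
    int priority;    /* smaller value = higher priority */
} ExceptionRequest;

/* A new request preempts only if its priority is strictly higher;
   otherwise it stays pending until the current handler completes. */
bool preempts(ExceptionRequest new_req, ExceptionRequest current)
{
    return new_req.priority < current.priority;
}

/* Of two simultaneous requests, the higher priority is serviced first;
   on a tie, the lower exception number wins. */
ExceptionRequest first_serviced(ExceptionRequest a, ExceptionRequest b)
{
    if (a.priority != b.priority)
        return (a.priority < b.priority) ? a : b;
    return (a.number < b.number) ? a : b;
}
```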
Timing analysis: big picture timing behavior
• Switch was pressed for about 0.21 s
• ISR runs in response to switch signal’s falling edge
• Main seems to be running continuously (signal toggles between 1 and 0)
• Does it really? You will investigate this in the lab exercise.
Interrupt response latency
• Latency = time delay
• Why do we care?
• This is overhead which wastes time, and increases as the interrupt rate rises.
• This delays our response to external events, which may or may not be acceptable for the application,
such as sampling an analog waveform.
Maximum interrupt rate
• We can only handle so many interrupts per second
• $F_{Max\_Int}$: maximum interrupt frequency
• $F_{CPU}$: CPU clock frequency
• $C_{ISR}$: Number of cycles ISR takes to execute
• $C_{Overhead}$: Number of cycles of overhead for saving state, vectoring, restoring state, etc.
• $F_{Max\_Int} = F_{CPU} / (C_{ISR} + C_{Overhead})$
• Note that model applies only when there is one interrupt in the system
• When processor is responding to interrupts, it isn’t executing our other code
• $U_{Int}$: Utilization (fraction of processor time) consumed by interrupt processing
• $U_{Int} = 100\% \cdot F_{Int} \cdot (C_{ISR} + C_{Overhead}) / F_{CPU}$
• CPU looks like it’s running the other code with CPU clock speed of $(1 - U_{Int}) \cdot F_{CPU}$
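The formulas above translate directly into code, which makes it easy to try out numbers. The function names and the example figures (48 MHz clock, 168-cycle ISR, 32 cycles of overhead) are illustrative assumptions, not values from the lecture.

```c
/* Maximum interrupt frequency in Hz, for a single interrupt source. */
double max_interrupt_rate(double f_cpu, double c_isr, double c_overhead)
{
    return f_cpu / (c_isr + c_overhead);
}

/* Fraction of processor time (0.0 to 1.0) consumed by interrupt processing
   when interrupts arrive at f_int Hz. */
double interrupt_utilization(double f_int, double f_cpu, double c_isr, double c_overhead)
{
    return f_int * (c_isr + c_overhead) / f_cpu;
}

/* Apparent clock speed left over for the other code. */
double effective_cpu_rate(double utilization, double f_cpu)
{
    return (1.0 - utilization) * f_cpu;
}
```

With the assumed figures, each interrupt costs 200 cycles, so the maximum rate is 240 kHz. At 24 kHz the interrupt uses about 10% of the processor, and the remaining code sees an effective clock of roughly 43.2 MHz.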
Program design with interrupts
• How much work to do in ISR?
• Trade-off: faster response for ISR code will delay completion of other code
• In system with multiple ISRs with short deadlines, perform critical work in ISR and buffer partial
results for later processing
• Should ISRs re-enable interrupts?
• How to communicate between ISR and other threads?
• Data buffering
• Data integrity and race conditions
Volatile data – can be updated outside of the program’s immediate control
Non-atomic shared data – can be interrupted partway through read or write, is vulnerable to
race conditions
Volatile data
• Compilers assume that variables in memory don’t change spontaneously, and optimize based on that
belief
• Don’t reload a variable from memory if current function hasn’t changed it
• Read variable from memory into register (faster access)
• Write back to memory at the end of the procedure, or before a procedure call, or when compiler runs
out of free registers
• This optimization can fail
• Example: reading from input port, polling for key press
while (SW_0) ; will read from SW_0 once and reuse that value
Will generate an infinite loop triggered by SW_0 being true
• Variables for which it fails
• Memory-mapped peripheral register – register changes on its own
• Global variables modified by an ISR – ISR changes the variable
• Global variables in a multithreaded application – another thread or ISR changes the variable
The volatile directive
• We need to tell compiler which variables may change outside of its control
• Use volatile keyword to force compiler to reload these vars from memory for each use
• Now each C source read of a variable (e.g. status register) will result in an assembly language LDR
instruction
• See explanation in Nigel Jones’ “Volatile,” Embedded Systems Programming July 2001
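A typical use of volatile is a flag that an ISR sets and the main code polls. The sketch below runs on a host, so the "ISR" is an ordinary function that would be called by hand; all names are hypothetical, and the poll limit exists only so the loop terminates in a test.

```c
#include <stdbool.h>

/* Shared between the ISR and the main code, so it is declared volatile. */
volatile bool switch_pressed = false;

/* Hypothetical ISR: on a real target this would be the port interrupt handler. */
void switch_isr(void)
{
    switch_pressed = true;
}

/* Polls the flag up to max_polls times. Because the flag is volatile,
   every test re-reads it from memory instead of reusing a register copy. */
bool wait_for_switch(unsigned int max_polls)
{
    while (max_polls-- > 0u) {
        if (switch_pressed)
            return true;
    }
    return false;
}
```

Without the volatile qualifier, an optimizing compiler would be allowed to read switch_pressed once and treat the result as constant for the whole loop.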
Non-atomic shared data
• We want to keep track of current time and date
• Use 1 Hz interrupt from timer
• System:
• current_time structure tracks time and days since some reference event
• current_time’s fields are updated by periodic 1Hz timer ISR

void GetDateTime(DateTimeType * DT){
    DT->day = current_time.day;
    DT->hour = current_time.hour;
    DT->minute = current_time.minute;
    DT->second = current_time.second;
}

void DateTimeISR(void){
    current_time.second++;
    if (current_time.second > 59){
        current_time.second = 0;
        current_time.minute++;
        if (current_time.minute > 59) {
            current_time.minute = 0;
            current_time.hour++;
            if (current_time.hour > 23) {
                current_time.hour = 0;
                current_time.day++;
                … etc.
}
Example: checking the time
• Problem: An interrupt at the wrong time will lead to half-updated data in DT
• Failure Case
• current_time is {10, 23, 59, 59} (10th day, 23:59:59)
• Task code calls GetDateTime(), which copies the current_time fields to DT: day = 10, hour = 23
• A timer interrupt occurs, which updates current_time to {11, 0, 0, 0}
• GetDateTime() resumes executing, copying the remaining current_time fields to DT: minute = 0,
second = 0
• DT now has a time stamp of {10, 23, 0, 0}.
• The system thinks time just jumped backwards one hour!
• Fundamental problem: “race condition”
• Preemption enables ISR to interrupt other code and possibly overwrite data
• Must ensure atomic (indivisible) access to the object
Native atomic object size depends on processor’s instruction set and word size
32 bits for Arm
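The failure case can be reproduced on a host by calling the ISR by hand halfway through the copy. This is a simulation under stated assumptions: the field types of DateTimeType are guessed, the ISR stops at the day rollover (whatever follows on the slide is elided and not reproduced), and GetDateTime_interrupted is a made-up variant of GetDateTime() that stands in for an unlucky preemption.

```c
/* Assumed definition; the slides do not show the type. */
typedef struct {
    int day, hour, minute, second;
} DateTimeType;

DateTimeType current_time = {10, 23, 59, 59};   /* 10th day, 23:59:59 */

void DateTimeISR(void)
{
    current_time.second++;
    if (current_time.second > 59) {
        current_time.second = 0;
        current_time.minute++;
        if (current_time.minute > 59) {
            current_time.minute = 0;
            current_time.hour++;
            if (current_time.hour > 23) {
                current_time.hour = 0;
                current_time.day++;
            }
        }
    }
}

/* Copies the fields like GetDateTime(), but calls the ISR halfway through
   to stand in for a timer interrupt arriving at the worst moment. */
void GetDateTime_interrupted(DateTimeType *DT)
{
    DT->day = current_time.day;
    DT->hour = current_time.hour;
    DateTimeISR();                        /* simulated preemption */
    DT->minute = current_time.minute;
    DT->second = current_time.second;
}
```

After the call, current_time holds {11, 0, 0, 0} while the copy holds the inconsistent time stamp {10, 23, 0, 0}.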
Examining the problem more closely
• Protect any data object which both:
• Requires multiple instructions to read or write (non-atomic access), and
• Is potentially written by an ISR
• How many tasks/ISRs can write to the data object?
• If one, then we have one-way communication
Must ensure the data isn’t overwritten partway through being read
Writer and reader don’t interrupt each other
• If more than one, we
Must ensure the data isn’t overwritten partway through being read
– Writer and reader don’t interrupt each other
Must ensure the data isn’t overwritten partway through being written
– Writers don’t interrupt each other
Definitions
• Race condition: Anomalous behavior due to unexpected critical dependence on the relative timing of
events. Result of example code depends on the relative timing of the read and write operations.
• Critical section: A section of code which creates a possible race condition. The code section can only be
executed by one process at a time. Some synchronization mechanism is required at the entry and exit of
the critical section to ensure exclusive use.
Solution: briefly disable preemption
• Prevent preemption within critical section
• If an ISR can write to the shared data object, need to disable interrupts
• save current interrupt masking state in m
• disable interrupts
• Restore previous state afterwards (interrupts may have already been disabled for another reason)
• Use CMSIS-CORE to save, control and restore interrupt masking state
• Avoid disabling interrupts where possible: disabling delays response to all other processing requests
• Use as few instructions as possible inside the critical section, to make the time as short as possible

void GetDateTime(DateTimeType * DT){
    uint32_t m;

    m = __get_PRIMASK();
    __disable_irq();

    DT->day = current_time.day;
    DT->hour = current_time.hour;
    DT->minute = current_time.minute;
    DT->second = current_time.second;
    __set_PRIMASK(m);
}
General Purpose I/O
Learning Objectives
At the end of this lecture, you should be able to:
• Explain the concept of GPIO alternative functions and outline its advantages.
• Explain the functions and relevance of pull-up and pull-down resistors used in IO pins.
• Describe a simple input synchronisation circuitry consisting of two D-flipflops and
explain its importance.
• Describe the GPIO code structure and outline its layers.
• Write C program to set the GPIO mode and turn on the on-board LED.
Overview
• How do we make a program light up LEDs in response to a switch?
• GPIO
• Basic Concepts
• Port Circuitry
• Alternate Functions
• Peripheral Access In C
• Circuit Interfacing
• Inputs
• Outputs
• Additional Port Configuration
Basic Concepts
GPIO Alternative Functions
• Pins may have different features
• Advantages:
• Saves space on the package
• Improves flexibility
Pull-Up & Pull-Down Resistors
• Ensure a known value on the input if a pin is left floating
• In our example, we want the switch SW1 to pull the pin to ground, so we enable the pull-up
(Figure: pull-down and pull-up resistor circuits.)
Input Synchronization
• External signals are asynchronous to internal
clock
Drivers Layer: How It Works
void gpio_set(Pin pin, int value)
1) mask = 1 << pin index
2) tmp = port_struct->data_reg & ~mask
3) tmp |= value << pin index
4) port_struct->data_reg = tmp
Drivers Layer: How It Works
int gpio_get(Pin pin)
1) mask = 1 << pin index
2) tmp = port_struct->data_reg & mask
3) tmp >>= pin index
4) return tmp
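The four numbered steps of gpio_set and gpio_get can be tried out on a host by replacing the port register with an ordinary variable. This is a sketch, not the course's driver: data_reg stands in for port_struct->data_reg, the functions take a plain pin index instead of the Pin type, and the _sim suffix marks the names as hypothetical.

```c
#include <stdint.h>

/* Stand-in for a port's data register; on a real MCU this would be a
   volatile memory-mapped register reached through port_struct. */
uint32_t data_reg = 0;

void gpio_set_sim(unsigned int pin_index, int value)
{
    uint32_t mask = 1u << pin_index;               /* 1) build the mask */
    uint32_t tmp = data_reg & ~mask;               /* 2) clear the pin's bit */
    tmp |= (uint32_t)(value != 0) << pin_index;    /* 3) insert the new value, normalized to 0 or 1 */
    data_reg = tmp;                                /* 4) write back */
}

int gpio_get_sim(unsigned int pin_index)
{
    uint32_t mask = 1u << pin_index;               /* 1) build the mask */
    uint32_t tmp = data_reg & mask;                /* 2) isolate the pin's bit */
    tmp >>= pin_index;                             /* 3) move it to bit 0 */
    return (int)tmp;                               /* 4) return 0 or 1 */
}
```

Normalizing value in step 3 is a small addition to the steps on the slide: it keeps a caller who passes, say, 2 from disturbing a neighbouring pin's bit.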
C Interface: GPIO Configuration
/*! This enum describes the directional setup of a GPIO pin. */
typedef enum {
Reset, //!< Resets the pin-mode to the default value.
Input, //!< Sets the pin as an input with no pull-up or pull-down.
Output, //!< Sets the pin as a low impedance output.
PullUp, //!< Enables the internal pull-up resistor and sets as input.
PullDown //!< Enables the internal pull-down resistor and sets as input.
} PinMode;
Pseudocode for Program
Make LED1 and LED2 outputs
Make switch an input with a pull-up resistor
do forever {
if switch is not pressed {
Turn off LED1
Turn on LED2
} else {
Turn off LED2
Turn on LED1
}
}
C Code
gpio_set_mode(P_LED1, Output); // Set LED pins to outputs
gpio_set_mode(P_LED2, Output);
gpio_set_mode(P_SW, PullUp); // Switch pin to resistive pull-up
while (1) {
if (gpio_get(P_SW)) {
// Switch is not pressed (active-LOW), turn LED1 off and LED2 on.
gpio_set(P_LED1, 0);
gpio_set(P_LED2, 1);
} else {
// Switch is pressed, turn off LED2 and LED1 on.
gpio_set(P_LED2, 0);
gpio_set(P_LED1, 1);
}
}
Interfacing
Inputs: What’s a One? A Zero?
• Input signal’s value is determined by voltage
Outputs: What’s a One? A Zero?
• Nominal output voltages
• 1: VDD-0.5 V to VDD
• 0: 0 V to 0.5 V
(Figure: Vout versus Iout for a logic 0 output.)
Output Example: Driving LEDs
• Need to limit current to a value which is safe for
both LED and MCU port driver
• Use current-limiting resistor
• R = (VDD – VLED)/ILED
• Set ILED = 4mA
• VLED depends on type of LED (mainly color)
• Red: ~1.8V
• Blue: ~2.7V
• Solve for R given VDD = ~3.0V
• Red: 300 Ω
• Blue: 75 Ω
• Demonstration code in Basic Light Switching Example
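The resistor values above follow directly from the formula R = (VDD – VLED)/ILED. A one-line helper (the name is made up for this sketch) reproduces them:

```c
/* Current-limiting resistor in ohms: R = (VDD - VLED) / ILED,
   with voltages in volts and current in amperes. */
double led_resistor_ohms(double vdd, double vled, double iled)
{
    return (vdd - vled) / iled;
}
```

With VDD = 3.0 V and ILED = 4 mA, a red LED (about 1.8 V) needs about 300 Ω and a blue LED (about 2.7 V) about 75 Ω. In practice the nearest standard resistor value would be chosen.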
Output Example: Driving a Speaker
• Create a square wave with a GPIO output
• Use capacitor to block DC value
• Use resistor to reduce volume if needed
void beep(void) {
unsigned int period = 20;
while (1) {
gpio_toggle(P_SPEAKER);
delay_ms(period/2);
}
}
Analog Interfacing
Learning Objectives
At the end of this lecture, you should be able to:
• Describe the theory of analog signal to digital signal conversion and vice versa.
• Describe the two common types of analog-to-digital converters (ADCs).
• Explain the properties of analog-to-digital conversion including range, resolution,
quantization and sampling.
• Explain Nyquist criterion and its application in sampling frequency of the ADC.
• Explain the function of a sample and hold device in ADCs.
• Describe the mode of operations of the ADC flash and ADC successive approximation
converters.
• Describe the functions of the digital-to-analog converter.
• Describe the application of ADC in power failure detection and in battery monitoring.
Module syllabus
• Converting Between Analog and Digital Values
• Analog to Digital Conversion Concepts
• Analog Interfacing Peripherals
• Digital to Analog Converter
• Analog Comparator
• Analog to Digital Converter
Why It’s Needed
• Embedded systems often need to measure values of physical parameters
• These parameters are usually continuous (analog) and not in a digital form which computers (which
operate on discrete data values) can process
• Temperature
– Thermometer (do you have a fever?)
– Thermostat for building, fridge, freezer
– Car engine controller
– Chemical reaction monitor
– Safety (e.g. microprocessor thermal management)
• Light (or infrared or ultraviolet) intensity
– Digital camera
– IR remote control receiver
– Tanning bed
– UV monitor
• Rotary position
– Wind gauge
– Knobs
• Pressure
– Blood pressure monitor
– Altimeter
– Car engine controller
– Scuba dive computer
– Tsunami detector
• Acceleration
– Air bag controller
– Vehicle stability
– Video game remote
• Mechanical strain
• Other
– Touch screen controller
– EKG, EEG
– Breathalyzer
Converting Between Analog and Digital
Values
Example Analog Sensor - Depth Gauge
[Figure: a pressure sensor exposed to water pressure feeds V_sensor to an analog-to-digital converter referenced to V_ref; the ADC maps voltages between ground and V_ref to output codes 000..000 through 111..111. Inset plot: Typical Absolute Pressure vs. Output, output 0.0 to 3.0 against Pressure [kPa] from 0 to 260.]
// Your software
ADC_Code = adc_read();
V_sensor = ADC_Code * V_ref / ADC_MASK;
Pressure_kPa = 250 * (V_sensor / V_supply + 0.04);
Depth_ft = 33 * (Pressure_kPa - Atmos_Press_kPa) / 101.3;
• ADC produces an integer code based on V_sensor and V_ref
• Code can convert that integer to something more useful
• first a float representing the voltage,
• then another float representing pressure,
• finally another float representing depth
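The conversion chain on this slide can be wrapped in one function. The following is a minimal sketch, assuming a 12-bit converter (so ADC_MASK is 0x0FFF) and passing the voltages and atmospheric pressure in as parameters; the function name and parameter list are illustrative and not part of the course code.

```c
#include <stdint.h>

/* Assumption: 12-bit ADC, so the full-scale code is 2^12 - 1. */
#define ADC_MASK 0x0FFFu

/* Convert a raw ADC code to depth in feet using the slide's equations:
 * code -> sensor voltage -> pressure in kPa -> depth in feet. */
float depth_ft_from_code(uint16_t adc_code, float v_ref, float v_supply,
                         float atmos_press_kpa)
{
    float v_sensor = (float)adc_code * v_ref / (float)ADC_MASK;
    float pressure_kpa = 250.0f * (v_sensor / v_supply + 0.04f);
    return 33.0f * (pressure_kpa - atmos_press_kpa) / 101.3f;
}
```

Keeping each intermediate value in its own variable mirrors the three floats listed on the slide and makes each stage easy to inspect in a debugger.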
Getting From Analog to Digital
• A Comparator tells us “Is Vin > Vref?”
• Compares an analog input voltage with an analog reference voltage and determines which is larger, returning a 1-bit number
• E.g. Indicate if depth > 100ft
• Set Vref to voltage pressure sensor returns with 100 ft depth.
[Figure: comparator with inputs Vin and Vref and a 1-bit output]
• An Analog to Digital converter [AD or ADC] tells us how large Vin is as a fraction of Vref.
• Reads an analog input signal (usually a voltage) and produces a corresponding multi-bit number at the output.
• E.g. calculate the depth
[Figure: A/D converter with inputs Vin, Vref and Clock and a multi-bit output]
Waveform Sampling and Quantization
[Figure: analog waveform sampled over time and quantized; axes are digital value and time]
Forward Transfer Function Equations
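What code n will the ADC use to represent voltage Vin?
n = converted code
Vin = sampled input voltage
V+ref = upper voltage reference
V-ref = lower voltage reference
N = number of bits of resolution in ADC
General Equation
$n = \left\lfloor \frac{V_{in} - V_{-ref}}{V_{+ref} - V_{-ref}} \cdot 2^N + \frac{1}{2} \right\rfloor$
Simplification with V-ref = 0 V
$n = \left\lfloor \frac{V_{in}}{V_{+ref}} \cdot 2^N + \frac{1}{2} \right\rfloor$
Example with Vin = 3.3 V, V+ref = 5 V, N = 10
$n = \left\lfloor \frac{3.3\,\mathrm{V}}{5\,\mathrm{V}} \cdot 2^{10} + \frac{1}{2} \right\rfloor = 676$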
Inverse Transfer Function
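What range of voltages Vin_min to Vin_max does code n represent?
n = converted code
Vin_min = minimum input voltage for code n
Vin_max = maximum input voltage for code n
V+ref = upper voltage reference
V-ref = lower voltage reference
N = number of bits of resolution in ADC
General Equation
$V_{in\_min} = \frac{n - \frac{1}{2}}{2^N} \left(V_{+ref} - V_{-ref}\right) + V_{-ref}$
$V_{in\_max} = \frac{n + \frac{1}{2}}{2^N} \left(V_{+ref} - V_{-ref}\right) + V_{-ref}$
Simplification with V-ref = 0 V
$V_{in\_min} = \frac{n - \frac{1}{2}}{2^N} V_{+ref}$
$V_{in\_max} = \frac{n + \frac{1}{2}}{2^N} V_{+ref}$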
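The inverse transfer function gives the band of input voltages that maps to one code. A minimal sketch follows; the struct and function names are illustrative only.

```c
#include <stdint.h>

typedef struct {
    double v_min;   /* lowest input voltage that produces the code */
    double v_max;   /* highest input voltage that produces the code */
} VoltageRange;

/* Voltage band represented by `code` for an ideal n_bits converter.
 * Assumes n_bits < 32. */
VoltageRange adc_inverse(uint32_t code, double v_ref_lo, double v_ref_hi,
                         unsigned n_bits)
{
    double full_scale = (double)(1u << n_bits);
    double span = v_ref_hi - v_ref_lo;
    VoltageRange r;
    r.v_min = ((double)code - 0.5) / full_scale * span + v_ref_lo;
    r.v_max = ((double)code + 0.5) / full_scale * span + v_ref_lo;
    return r;
}
```

With a 10-bit converter and a 5 V reference, code 676 covers roughly 3.2983 V to 3.3032 V, a band one LSB (about 4.9 mV) wide.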
What if the Reference Voltage is not known?
• Example - running off an unregulated battery (to save power)
• Measure a known voltage and an unknown voltage
$V_{unknown} = V_{known} \cdot \frac{n_{unknown}}{n_{known}}$
• Many MCUs include an internal fixed voltage source which ADC can measure for this purpose
• Can also solve for Vref
$V_{ref} = V_{known} \cdot \frac{2^N}{n}$
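Both ratio formulas translate directly into code. This is a minimal sketch with hypothetical function names; the 1.0 V internal reference used in the note below is an assumed example value, not a figure from the slides.

```c
#include <stdint.h>

/* Unknown voltage from two readings taken against the same (unknown) Vref. */
double v_unknown_from_ratio(uint32_t n_unknown, uint32_t n_known, double v_known)
{
    return v_known * (double)n_unknown / (double)n_known;
}

/* Reference voltage recovered from the code read for a known voltage.
 * Assumes n_bits < 32 and n_known > 0. */
double vref_from_known(uint32_t n_known, double v_known, unsigned n_bits)
{
    return v_known * (double)(1u << n_bits) / (double)n_known;
}
```

For example, if a 12-bit ADC reads 1365 for an assumed 1.0 V internal reference, Vref works out to about 3.0 V, and a channel that reads 2730 is at 2.0 V.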
Analog to Digital conversion concepts
A/D – Flash Conversion
• A multi-level voltage divider is used to set voltage levels over the complete range of conversion.
• A comparator is used at each level to determine whether the voltage is lower or higher than the level.
• The series of comparator outputs are encoded to a binary number in digital logic (a priority encoder)
• Components used
• 2^N resistors
• 2^N - 1 comparators
• Note
• This particular resistor divider generates voltages which are not offset by ½ bit, so maximum error is 1 bit
• We could change this offset voltage by using
[Figure: 3-bit flash converter: a ladder of eight resistors R from 1 V to ground gives taps at 7/8 V down to 1/8 V; seven comparators compare the input against the taps and feed an encoder, which outputs 3 in the example shown]
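The flash conversion described above can be simulated in a few lines. This is a minimal sketch, assuming equal resistors so that tap k sits at k/2^N of the reference, as in the figure; the function name is illustrative.

```c
#include <stdint.h>

/* Simulate an n_bits flash ADC with 2^n_bits - 1 comparators.
 * Assumes n_bits < 32 and equally spaced ladder taps. */
uint32_t flash_adc(double v_in, double v_ref, unsigned n_bits)
{
    uint32_t levels = 1u << n_bits;
    uint32_t code = 0;
    for (uint32_t k = 1; k < levels; k++) {
        double tap = v_ref * (double)k / (double)levels;
        if (v_in > tap) {
            code = k;   /* priority encoder: keep the highest comparator that is on */
        }
    }
    return code;
}
```

In hardware all comparators switch at the same time, which is why a flash converter is fast; the loop here only models the priority encoder's result.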
[Figure: voltage versus time from start of conversion, steps T1 to T6, with trial codes 000000, 100000 and 100100]
• Repeat
ADC Performance Metrics
• Number of bits sets the resolution, which limits the overall accuracy.
• Linearity measures how well the transition voltages lie on a straight line.
Sampling Problems
• When sampling varying signals there is an upper limit on the bandwidth of the converter which is set by
the ‘Nyquist criterion’
• Nyquist criterion
• Fsample >= 2 * Fmax frequency component
• Frequency components above ½ Fsample are aliased, distorting the measured signal
Inputs
• Differential
• Use two channels, and compute difference between them
• Very good noise immunity
• Some sensors offer differential outputs (e.g. Wheatstone Bridge)
• Multiplexing
• Typically share a single ADC among multiple inputs
• Need to select an input, allow time to settle before sampling
• Signal Conditioning
• Amplify and filter input signal
• Protect against out-of-range inputs with clamping diodes
Sample and Hold Devices
• Some A/D converters require the input analog signal
to be held constant during conversion, (e.g.
successive approximation devices)
• In other cases, peak capture or sampling at a
specific point in time necessitates a sampling
device.
• This function is accomplished by a sample and hold
device as shown to the right.
• These devices are incorporated into some A/D
converters.
Analog Interfacing Peripherals
GPIO Alternative Functions
• Pins may have different features
Digital to Analog Converter
Example: Waveform Generation
• DAC can be used to generate arbitrary waveforms
• Pre-generate lookup table
• Update DAC output value
• Delay
• Repeat
C Code – Initialization
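// Fill the lookup table with one sine period scaled to the DAC range 0..MAX_DAC_CODE.
void sinewave_init(void) {
    int n;
    for (n = 0; n < NUM_STEPS; n++) {
        sine_table[n] = MAX_DAC_CODE * (1 + sin(n*2*PI/NUM_STEPS)) / 2;
    }
}
// Defined after sinewave_init so that the call below sees its declaration.
void tone_init(void) {
    dac_init();
    sinewave_init();
}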
C Code – Playback
void tone_play(int period_us, int num_cycles, wavetype wave) {
    int sample, step;
    while(num_cycles-- > 0) {
        for (step = 0; step < NUM_STEPS; step++) {
            switch(wave) {
                case SINE: sample = sine_table[step]; break;
                case SQUARE: sample = step < NUM_STEPS / 2 ? 0 : MAX_DAC_CODE; break;
                case RAMP: sample = (step * MAX_DAC_CODE) / NUM_STEPS; break;
                default: sample = 0; break; // unknown wave type: avoid an uninitialized sample
            }
            dac_set(sample);
            delay_us(period_us); // delay per step, so one cycle lasts NUM_STEPS * period_us
        }
    }
}
Analog Comparator
Example: Power Failure Detection
Comparator Overview
C Code – Comparator
void comparator_isr(int state) {
if (state) {
// Sense > Vref, turn off LEDs.
leds_set(0, 0, 0);
} else {
// Sense < Vref, turn on red LED.
leds_set(1, 0, 0);
}
}
int main(void) {
comparator_init();
comparator_set_trigger(CompBoth); // ISR on both rising and falling edges.
comparator_set_callback(comparator_isr);
…
}
Analog to Digital Converter
Example: Battery Monitoring
Battery Discharge
[Figure: Example Battery Discharge Curve; y-axis Voltage [V] from 2 to 3, x-axis Capacity [%] from 100 down to 0]
Timer Peripherals
Learning Objectives
At the end of this lecture, you should be able to:
• Describe the mode of operation and function of the interrupt timer.
• Explain the three modes of operation of the standard timer including ‘compare mode’,
‘capture mode’ and pulse width modulation mode.
Outline
• Types of Timer Peripherals
• Interrupt Timer
• PWM Module
• Low-Power Timer
• Real-Time Clock
• SYSTICK
Types of Timer Peripherals
• Interrupt Timer
• Can periodically generate interrupts or trigger DMA (direct memory access) transfers
• PWM Module
• Connected to I/O pins, has input capture and output compare support
• Can generate PWM signals
• Can generate interrupt requests
• Low-Power Timer
• Can operate as timer or counter in all power modes
• Can wake up system with interrupt
• Can trigger hardware
• Real-Time Clock
• Powered by external 32.768 kHz crystal
• Tracks elapsed time (seconds) in register
• Can set alarm
• Can generate 1Hz output signal and/or interrupt
• Can wake up system with interrupt
• SysTick
• Part of CPU core’s peripherals
• Can generate periodic interrupt
Timer/Counter Peripheral Introduction
[Figure: presettable binary counter driven by events or a clock, with a reload value input, a current count output, and reload, PWM and interrupt signals]
• Common peripheral for microcontrollers
• Based on presettable binary counter, enhanced with configurability
• Count value can be read and written by MCU
• Count direction can often be set to up or down
• Counter’s clock source can be selected
Counter mode: count pulses which indicate events (e.g. odometer pulses)
Timer mode: clock source is periodic, so counter value is proportional to elapsed time (e.g. stopwatch)
• Counter’s overflow/underflow action can be selected
Generate interrupt
Reload counter with special value and continue counting
Toggle hardware output signal
Interrupt Timer
Interrupt Timer
[Figure: timer count versus time with interrupts marked, annotated in sequence:]
• Write 1000 to timer max
• Enabling timer loads counter with 1000, starts counting
• Timer counts down to 0
• Interrupt generated, counter reloads with 1000, starts counting
• Write 700 to timer max
• Interrupt generated, counter reloads with 700, starts counting (repeats at each later interrupt)
• Load start value from register
• Read/write timer start value
Configuring the Interrupt Timer
• Setup timer, set to tick at 10Hz
• timer_init(CLK_FREQ / 10);
• Set interrupt
• timer_set_callback(timer_isr);
• Enable module
• timer_enable();
• Disable module
• timer_disable();
Example: Stopwatch
• Measure time with 100 us resolution
• Display elapsed time, updating screen every 10ms
• Controls
• S1: toggle start/stop
• Use interrupt timer
• Counter increment every 100 us
Set timer to expire every 100 us
Calculate max value, e.g. at 24 MHz = round (100 us * 24MHz -1) = 2399
• LCD Update every 10 ms
Update LCD every nth ISR
n = 10 ms/100us = 100
Don’t update LCD in ISR! Too slow.
Instead set flag in ISR, poll it in main loop
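The stopwatch structure above can be sketched as follows. This is a minimal sketch under stated assumptions: all function and variable names are hypothetical, and on real hardware stopwatch_isr would be registered as the timer callback and the LCD update would go where stopwatch_poll returns 1.

```c
#include <stdint.h>

#define LCD_UPDATE_DIVIDER 100u   /* 10 ms / 100 us */

static volatile uint32_t elapsed_ticks;        /* elapsed time in 100 us units */
static volatile uint8_t lcd_update_pending;    /* flag set by ISR, cleared by main loop */
static uint32_t isr_count;

/* Reload value for a given tick period: round(period * f_clk) - 1. */
uint32_t timer_reload_value(uint32_t clk_hz, uint32_t period_us)
{
    uint64_t ticks = ((uint64_t)clk_hz * period_us + 500000u) / 1000000u;
    return (uint32_t)(ticks - 1u);
}

/* Timer ISR: keep it short, only count and set a flag. */
void stopwatch_isr(void)
{
    elapsed_ticks++;
    if (++isr_count >= LCD_UPDATE_DIVIDER) {
        isr_count = 0;
        lcd_update_pending = 1;
    }
}

/* Called from the main loop: returns 1 and the tick count when the LCD is due. */
int stopwatch_poll(uint32_t *ticks_out)
{
    if (!lcd_update_pending) {
        return 0;
    }
    lcd_update_pending = 0;
    *ticks_out = elapsed_ticks;
    return 1;
}
```

The shared variables are declared volatile because they are written in interrupt context and read in the main loop. At 24 MHz and a 100 us period, timer_reload_value gives 2399, matching the calculation on the slide.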
Timer / PWM Module
Timer / PWM Module
• Core Counter
• Clock options - external or internal
• Prescaler to divide clock
• Can reload with set value, or overflow and wrap around
• N channels
• 3 modes
Capture Mode: Capture timer’s value when input signal changes
Output Compare: Change an output signal when timer reaches certain value
PWM: Generate pulse-width-modulated signal. Width of pulse is proportional to specified value.
• Possible triggering of interrupt, hardware trigger on overflow
• One I/O pin per channel
Major Channel Modes
• Input Capture Mode
• Capture timer’s value when input signal changes
Rising edge, falling edge, both
• How long after I started the timer did the input change?
Measure time delay
Input Capture Mode
[Figure: input capture timing: after the module is enabled the internal counter value rises; each event on the external signal fires an interrupt with the counter value saved]
Wind Speed Indicator (Anemometer)
• Rotational speed (and pulse
frequency) is proportional to wind
velocity
• Two measurement options:
• Frequency (best for high speeds)
• Width (best for low speeds)
• Can solve for wind velocity v
$v_{wind} = \frac{K \cdot f_{clk}}{T_{anemometer}}$
TPM Capture Mode for Anemometer
• Configuration
• Set up module to count at given speed from internal clock
• Set up channel for input capture on rising edge
• Operation: Repeat
• First interrupt - on rising edge
Reconfigure channel for input capture on falling edge
Clear counter, start it counting
• Second interrupt - on falling edge
Read capture value, save for later use in wind speed calculation
Reconfigure channel for input capture on rising edge
Clear counter, start it counting
Output Compare Mode
[Figure: output compare timing: after the timer is enabled the timer value ramps up to overflow repeatedly; when it matches the compare value the output pin is toggled, cleared or set]
• Action on match
– Toggle
– Clear
– Set
• When counter matches value …
• Output signal is generated
• Interrupt is called (if enabled)
Pulse-Width Modulation
• Uses of PWM
• Digital power amplifiers are more efficient and less expensive than analog power amplifiers
Applications: motor speed control, light dimmer, switch-mode power conversion
Load (motor, light, etc.) responds slowly, averages PWM signal
• Digital communication is less sensitive to noise than analog methods
PWM provides a digital encoding of an analog value
Much less vulnerable to noise
• PWM signal characteristics
• Modulation frequency – how many
pulses occur per second (fixed)
• Period – 1/(modulation frequency)
• On-time – amount of time that each
pulse is on (asserted)
• Duty-cycle – on-time/period
• Adjust on-time (hence duty cycle) to
represent the analog value
PWM Mode
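[Figure: PWM timing: after the timer is enabled the timer value ramps up to overflow repeatedly; the output pin changes state when the timer matches the compare value]
$\text{Duty Cycle} = \frac{\text{Compare Value}}{\text{Max Value}} \cdot 100\%$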
PWM to Drive Servo Motor
• Servo PWM signal
• 20 ms period
• 1 to 2 ms pulse width
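The servo timing above maps onto a PWM compare value through the duty-cycle relation (compare value / max value = pulse width / period). The sketch below is illustrative only: the function names are hypothetical, and the linear mapping of 0 to 180 degrees onto 1 to 2 ms is an assumption that varies between servos.

```c
#include <stdint.h>

#define SERVO_PERIOD_US     20000u   /* 20 ms period */
#define SERVO_MIN_PULSE_US   1000u   /* 1 ms pulse */
#define SERVO_MAX_PULSE_US   2000u   /* 2 ms pulse */

/* Pulse width for an angle, assuming 0..180 degrees maps linearly to 1..2 ms. */
uint32_t servo_pulse_us(uint32_t angle_deg)
{
    if (angle_deg > 180u) {
        angle_deg = 180u;
    }
    return SERVO_MIN_PULSE_US +
           (SERVO_MAX_PULSE_US - SERVO_MIN_PULSE_US) * angle_deg / 180u;
}

/* Compare value that yields the requested pulse width when the timer's
 * maximum (overflow) value corresponds to one 20 ms period. */
uint32_t pwm_compare_value(uint32_t pulse_us, uint32_t max_value)
{
    return (uint32_t)(((uint64_t)pulse_us * max_value) / SERVO_PERIOD_US);
}
```

For a timer whose maximum value is 60000, a 1.5 ms pulse needs a compare value of 4500, which is a 7.5% duty cycle.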
Low Power Timer
Low Power Timer Overview
[Figure: processor current versus time, comparing an always-on processor with sleeping mode using the low-power timer; the interrupt routine and the average current are marked]
• Features
• Count time or external pulses
• Generate interrupt when counter matches compare value
• Interrupt wakes MCU from any low power mode
• Current draw can be reduced to microamps or even nanoamps!
• Use the WFI instruction (Wait For Interrupt)
• Puts CPU in low power mode until interrupt request
Serial Communications
Learning Objectives
At the end of this lecture, you should be able to:
• Outline the concepts of both serial and parallel communication and give examples of
their applications.
• Explain the difference between synchronous and asynchronous serial communication.
• Compare serial and parallel communication including their advantages and
disadvantages.
• Outline the difference between synchronous half-duplex and full-duplex serial buses
• Describe UART and its communication protocol.
• Describe SPI communication protocol.
• Describe I2C communication protocol.
Overview
• Serial communications
• Concepts
• Tools
• Software: polling, interrupts, and buffering
• Protocols:
• UART
• SPI
• I2C
Why Communicate Serially?
• Native word size is multi-bit (8, 16, 32, etc.)
• Often it’s not feasible to support sending all the word’s bits at the same time
• Cost and weight: more wires needed, larger connectors needed
• Mechanical reliability: more wires => more connector contacts to fail
• Timing Complexity: some bits may arrive later than others due to variations in capacitance and
resistance across conductors
• Circuit complexity and power: may not want to have 16 different radio transmitters + receivers in the
system
Example System: Voyager Spacecraft
• Launched in 1977
• Constraints: Reliability, power, size, weight, reliability, reliability, etc.
• “Uplink communications are via S-band (16-bits/sec command rate) while an X-band transmitter provides
downlink telemetry at 160 bits/sec normally and 1.4kbps for playback of high-rate plasma wave data. All
data are transmitted from and received at the spacecraft via the 3.7 meters high-gain antenna (HGA).”
http://voyager.jpl.nasa.gov/spacecraft/index.html
• Uplink – to spacecraft
• Downlink – from spacecraft
Example System
[Figure: MCU wired separately to four peripherals, each with its own Data, Rd and Wr lines]
Parallel Buses
• All devices use buses to share data, read, and write signals
• MCU uses individual select lines to address each peripheral
• MCU requires fewer pins for data, but still one per data bit
• MCU can communicate with only one peripheral at a time
Synchronous Serial Data Transmission
[Figure: a shift register of four D flip-flops loads parallel data in (D3 to D0) and shifts it out as serial data on each Clk; a second four-stage shift register takes the serial data in and presents D3 to D0 as parallel data out]
Synchronous Full-Duplex Serial Data Bus
• Now can use two serial data lines - one for reading, one for writing.
• Allows simultaneous send and receive full-duplex communication
Synchronous Half-Duplex Serial Data Bus
Asynchronous Serial Communication
[Figure: asynchronous serial frame with start bit, data bits and stop bit; data sampling times at the receiver fall at Tbit*1.5, Tbit*2.5, and so on up to Tbit*9.5 after time zero]
Error Detection
• Can send additional information to verify data was received correctly
• Need to specify which parity to expect: even, odd or none.
• Parity bit is set so that total number of “1” bits in data and parity is even (for even
parity) or odd (for odd parity)
• 01110111 has 6 “1” bits, so parity bit will be 1 for odd parity, 0 for even parity
• 01100111 has 5 “1” bits, so parity bit will be 0 for odd parity, 1 for even parity
• Single parity bit detects if 1, 3, 5, 7 or 9 bits are corrupted, but doesn’t detect an even
number of corrupted bits
• Stronger error detection codes (e.g. Cyclic Redundancy Check) exist and use multiple
bits (e.g. 8, 16), and can detect many more corruptions.
• Used for CAN, USB, Ethernet, Bluetooth, etc.
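The parity rule above can be written as a short function. This is a minimal sketch for 8-bit data; the enum and function names are illustrative.

```c
#include <stdint.h>

typedef enum { PARITY_EVEN, PARITY_ODD } ParityMode;

/* Parity bit to send with `data` so that the total number of 1 bits
 * (data plus parity) is even or odd, as selected by `mode`. */
uint8_t parity_bit(uint8_t data, ParityMode mode)
{
    uint8_t ones = 0;
    for (int i = 0; i < 8; i++) {
        ones += (data >> i) & 1u;
    }
    uint8_t even_bit = ones & 1u;   /* 1 only when the data has an odd count of 1s */
    return (mode == PARITY_EVEN) ? even_bit : (uint8_t)(even_bit ^ 1u);
}
```

The two examples from the slide correspond to 0x77 (six 1 bits) and 0x67 (five 1 bits).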
Tools for Serial Communications Development
• Tedious and slow to debug serial protocols with just an oscilloscope
• Instead use a logic analyzer to decode bus traffic
• Worth its weight in gold!
• Saleae 8-Channel Logic Analyzer
• $150 (www.saleae.com)
• Plugs into PC’s USB port
• Decodes SPI, asynchronous serial, I2C, 1-Wire, CAN, etc.
• Build your own: with Logic Sniffer or related open-source project
Software Structure –
Handling asynchronous
Communication
Software Structure
• Communication is asynchronous to program
• Don’t know what code the program will be executing …
when the next item arrives
when current outgoing item completes transmission
when an error occurs
• Need to synchronize between program and serial communication interface somehow
• Options
• Polling
Wait until data is available
Simple, but wastes processor time
• Interrupt
CPU interrupts program when data is available
Efficient, but more complex
Serial Communications and Interrupts
• Want to provide multiple threads of control in the program
• Main program (and subroutines it calls)
• Transmit ISR – executes when serial interface is ready to send another character
• Receive ISR – executes when serial interface receives a character
• Error ISR(s) – execute if an error occurs
[Figure: the main program or other threads call send_string and get_string]
Code to Implement Queues
[Figure: queue running from older data to newer data]
• Enqueue at tail: tail is the index of the next free entry
Defining the Queues
typedef struct {
uint8_t *data; //!< Array of data, stored on the heap.
uint32_t head; //!< Index in the array of the oldest element.
uint32_t tail; //!< Index in the array of the youngest element.
uint32_t size; //!< Size of the data array.
} Queue;
Initialization and Status Inquiries
int queue_init(Queue *queue, uint32_t size) {
    queue->data = (uint8_t*)malloc(sizeof(uint8_t) * size);
    queue->head = 0;
    queue->tail = 0;
    queue->size = size;
    …
}
• Sending data:
queue_enqueue(…, c)
• Receiving data:
queue_dequeue(…, &c)
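Since only part of the queue code survives here, the following is a self-contained sketch of one way such a byte queue could work. It is not the course's implementation: it treats tail as the index of the next free entry, keeps one slot unused to tell full from empty, and the return conventions (1 on success, 0 on failure) are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t *data;   /* storage allocated on the heap */
    uint32_t head;   /* index of the oldest element */
    uint32_t tail;   /* index of the next free entry */
    uint32_t size;   /* number of slots in data */
} Queue;

int queue_init(Queue *queue, uint32_t size) {
    queue->data = (uint8_t *)malloc(sizeof(uint8_t) * size);
    queue->head = 0;
    queue->tail = 0;
    queue->size = size;
    return queue->data != NULL;     /* 1 if the allocation succeeded */
}

int queue_is_empty(const Queue *queue) {
    return queue->head == queue->tail;
}

int queue_is_full(const Queue *queue) {
    /* One slot stays unused so that full and empty look different. */
    return (queue->tail + 1u) % queue->size == queue->head;
}

int queue_enqueue(Queue *queue, uint8_t item) {
    if (queue_is_full(queue)) {
        return 0;
    }
    queue->data[queue->tail] = item;
    queue->tail = (queue->tail + 1u) % queue->size;
    return 1;
}

int queue_dequeue(Queue *queue, uint8_t *item) {
    if (queue_is_empty(queue)) {
        return 0;
    }
    *item = queue->data[queue->head];
    queue->head = (queue->head + 1u) % queue->size;
    return 1;
}
```

With one producer (for example a receive ISR that only enqueues) and one consumer (the main loop that only dequeues), each side writes only its own index, which is what makes this layout attractive for interrupt-driven serial code. Whether extra protection is needed depends on the target and is outside this sketch.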
Software Structure –
Parsing Messages
Decoding Messages
• Two types of messages
• Actual binary data sent
– First identify message type
– Second, based on this message type, copy binary data from message fields into variables
May need to use pointers and casting to get code to translate formats correctly and safely
• ASCII text characters representing data sent
First identify message type
Second, based on this message type, translate (parse) the data from the ASCII message format into a binary format
Third, copy the binary data into variables
Example Binary Serial Data: TSIP
Example ASCII Serial Data: NMEA-0183
$IDMSG,D1,D2,D3,D4,…,Dn*CS\r\n
• $ denotes the start of a message
• ID is a two letter mnemonic to describe the source of data, e.g. GP signifies GPS
• MSG is a three letter mnemonic to describe the message content.
• Commas are used to delimit the data fields.
• Dn represents each of the data fields.
• * is used to separate the data from the checksum.
• CS contains two ASCII characters representing the hex value of the checksum.
• \r\n is the carriage return character followed by the new line character to denote the
end of a message.
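The CS field can be checked in code. In NMEA-0183 the checksum is the XOR of every character between the $ and the *, excluding both. The sketch below uses hypothetical function names and a made-up two-character sentence body purely for testing.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR of all characters between '$' and '*'. Returns -1 if the sentence
 * does not start with '$' or contains no '*'. */
int nmea_compute_checksum(const char *sentence)
{
    if (sentence == NULL || sentence[0] != '$') {
        return -1;
    }
    uint8_t sum = 0;
    const char *p = sentence + 1;
    while (*p != '\0' && *p != '*') {
        sum ^= (uint8_t)*p++;
    }
    return (*p == '*') ? (int)sum : -1;
}

/* Value of one ASCII hex digit, or -1 if the character is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Returns 1 if the two hex characters after '*' match the computed checksum. */
int nmea_checksum_ok(const char *sentence)
{
    int expected = nmea_compute_checksum(sentence);
    if (expected < 0) {
        return 0;
    }
    const char *star = sentence;
    while (*star != '*') {
        star++;
    }
    int hi = hex_value(star[1]);
    if (hi < 0) {
        return 0;
    }
    int lo = hex_value(star[2]);
    if (lo < 0) {
        return 0;
    }
    return ((hi << 4) | lo) == expected;
}
```

For the toy sentence `$AB*03` the checksum is 'A' XOR 'B' = 0x03, so nmea_checksum_ok accepts it and rejects `$AB*04`.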
State Machine for Parsing NMEA-0183
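[Figure: state diagram for the parser, summarized below]
• Start: on $, append char to buf and go to Talker + Sentence Type
• Talker + Sentence Type: on any char. except *, \r or \n, append char to buf and inc. counter
• Talker + Sentence Type: on *, \r or \n, non-text, or counter>6, go back to Start
• Talker + Sentence Type: when buf==$SDDBT, $VWVHW, or $YXXDR, enqueue all chars. from buf and go to Sentence Body
• Sentence Body: on any char. except *, enqueue char
• Sentence Body: on \r or \n, abandon the sentence
• Sentence Body: on *, enqueue char and go to Checksum 1
• Checksum 1: on any char., save as checksum1 and go to Checksum 2
• Checksum 2: on any char., save as checksum2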
Parsing
switch (parser_state) {
    case TALKER_SENTENCE_TYPE:
        switch (msg[i]) {
            case '*':
            case '\r':
            case '\n':
                // Sentence ended before the talker and type were complete.
                parser_state = START;
                break;
            default:
                if (Is_Not_Character(msg[i]) || n>6) {
                    parser_state = START;
                } else {
                    buf[n++] = msg[i];
                }
                break;
        }
        if ((n==6) && … ){
            parser_state = SENTENCE_BODY;
        }
        break;
    case SENTENCE_BODY:
        break;
Asynchronous serial
(UART) Communications
Transmitter Basics
[Figure: transmitted frame with data bits, each bit lasting Tbit, shown against time zero and the data sampling time at the receiver]
• If no data to send, keep sending 1 (stop bit) – idle line
• When there is a data word to send
• Send a 0 (start bit) to indicate the start of a word
• Send each data bit in the word (use a shift register for the transmit buffer)
• Send a 1 (stop bit) to indicate the end of the word
Receiver Basics
[Figure: received frame with data bits; data sampling times at the receiver fall at Tbit*1.5, Tbit*2.5, and so on up to Tbit*10.5 after time zero]
• Wait for a falling edge (beginning of a Start bit)
• Then wait ½ bit time
• Do the following for as many data bits in the word
Wait 1 bit time
Read the data bit and shift it into a receive buffer (shift register)
• Wait 1 bit time
• Read the bit
if 1 (Stop bit), then OK
if 0, there’s a problem!
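The transmit and receive steps can be modelled in software by treating a frame as an array of line levels, one entry per bit time (the value the receiver would see when sampling mid-bit). This is a sketch with illustrative names; it assumes 8 data bits sent least significant bit first with one stop bit, which is a common convention but one that both ends must agree on, as the slides note.

```c
#include <stdint.h>

#define FRAME_BITS 10   /* 1 start + 8 data + 1 stop (assumed format) */

/* Build the line levels for one frame: start bit 0, data LSB first, stop bit 1. */
void uart_encode_frame(uint8_t byte, uint8_t bits[FRAME_BITS])
{
    bits[0] = 0;                               /* start bit */
    for (int i = 0; i < 8; i++) {
        bits[1 + i] = (byte >> i) & 1u;        /* data bits, LSB first */
    }
    bits[9] = 1;                               /* stop bit */
}

/* Decode one frame. Returns the byte, or -1 on a bad start or stop bit. */
int uart_decode_frame(const uint8_t bits[FRAME_BITS])
{
    if (bits[0] != 0) {
        return -1;                             /* no start bit */
    }
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++) {
        byte |= (uint8_t)(bits[1 + i] << i);   /* shift each sample into place */
    }
    if (bits[9] != 1) {
        return -1;                             /* framing error: stop bit is 0 */
    }
    return byte;
}
```

A hardware UART does the same thing with a shift register and a bit-time counter; the array here only stands in for the sampled line.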
For this to work…
• Transmitter and receiver must agree on several things (protocol)
• Order of data bits
• Number of data bits
• What a start bit is (1 or 0)
• What a stop bit is (1 or 0)
• How long a bit lasts
Transmitter and receiver clocks must be reasonably close, since the only timing reference is the start of the start bit
Input Data Oversampling
Baud Rate
• Need to divide high frequency clock down to desired baud rate * oversampling factor
• Example
• 24MHz -> 4800 baud with 16x oversampling
• Division factor = 24E6/(4800*16) = 312.5. Must round to closest integer value ( 312 or 313), will have a
slight frequency error.
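The divider calculation and the resulting frequency error can be computed as below. This is a minimal sketch with hypothetical function names; it rounds to the nearest integer, with exact halves rounded up.

```c
#include <stdint.h>

/* Nearest integer divider for the requested baud rate and oversampling factor. */
uint32_t baud_divisor(uint32_t clk_hz, uint32_t baud, uint32_t oversampling)
{
    uint32_t denom = baud * oversampling;
    return (clk_hz + denom / 2u) / denom;
}

/* Percentage error of the baud rate actually produced by that divider. */
double baud_error_percent(uint32_t clk_hz, uint32_t baud, uint32_t oversampling)
{
    uint32_t div = baud_divisor(clk_hz, baud, oversampling);
    double actual = (double)clk_hz / ((double)div * (double)oversampling);
    return (actual - (double)baud) / (double)baud * 100.0;
}
```

For 24 MHz, 4800 baud and 16x oversampling the ideal factor is 312.5; rounding up to 313 gives about 4792 baud, an error of roughly -0.16%, which is well within what asynchronous reception tolerates.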
Using the UART
• When can we transmit?
• Transmit peripheral must be ready for data
• Can poll the status register
• Or we can use an interrupt, in which case we will need to queue up data
• When can we receive a byte?
• Receive peripheral must have data
• Can poll the status register
• Or we can use an interrupt, and again we will need to queue the data
Software for Polled Serial Comm.
void test_polled() {
uart_init(9600);
uart_enable();
while(1) {
uart_tx(uart_rx()); // echoes the received character back
}
}
Example Receiver: Display Data on LCD
line = col = 0;
while (1) {
c = uart_rx();
lcd_set_cursor(col, line);
lcd_put_char(c);
col++;
if (col > 7) {
col = 0;
line++;
if (line > 1) {
line = 0;
}
}
}
Software for Interrupt-Driven Serial Comm.
• Use interrupts
Interrupt Handler
Queue rx_queue;
void uart_rx_isr(uint8_t rx) {
// Store the received character
queue_enqueue(&rx_queue, rx);
}
int main() {
queue_init(&rx_queue, 128);
uart_init(9600);
uart_set_rx_callback(uart_rx_isr);
uart_enable();
…
}
USB to UART Interface
• PCs haven’t had external asynchronous serial interfaces for a while, so how do we communicate with a UART?
[Figure: PC connects over USB (D+ / D-) to a USB to UART bridge; the bridge and the MCU UART are cross-connected, TX to RX and RX to TX]
• USB to UART interface
• USB connection to PC
• Logic level (0-3.3V) to microcontroller’s UART (not RS232 voltage levels)
• USB01A USB to serial adapter
• http://www.pololu.com/catalog/product/391
• Can also supply 5V, 3.3V from USB
Building on Asynchronous Comm.
• Problem #1
• Logic-level signals (0 to 1.65V, 1.65V to 3.3V) are sensitive to noise and signal degradation
• Problem #2
• Point-to-point topology does not support a large number of nodes well
Need a dedicated wire to send information from one device to another
Need a UART channel for each device the MCU needs to talk to
Single transmitter, single receiver per data wire
Solution to Noise: Higher Voltages
• Use higher voltages to improve noise margin:
+3 V to +15 V, -3 V to -15 V
• Example IC (Maxim MAX3232) uses charge pumps to generate higher voltages from
3.3V supply rail
Solution to Noise: Differential Signaling
Solutions to Poor Scaling
• Approaches
• Allow one transmitter to drive multiple receivers (multi-drop)
• Connect all transmitters and all receivers to same data line (multi-point network). Need to add a
medium access control technique so all nodes can share the wire
• Example Protocols
• RS-232: higher voltages, point-to-point
• RS-422: higher voltages, differential data transmission, multi-drop
• RS-485: higher voltages, multi-point
SPI Communications
Hardware Architecture
• All chips share bus signals
• Clock SCK
• Data lines MOSI (master out, slave in) and MISO
(master in, slave out)
Serial Data Transmission
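[Figure: a shift register of four D flip-flops loads parallel data in (D3 to D0) and shifts it out as serial data on each Clk; a second four-stage shift register takes the serial data in and presents D3 to D0 as parallel data out]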
• Use shift registers and a clock signal to convert between serial and parallel formats
• Synchronous: an explicit clock signal is sent along with the data signal
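The shift-register exchange can be simulated with two bytes standing in for the two shift registers. This is a minimal sketch, assuming 8-bit transfers sent most significant bit first; the function and parameter names are illustrative.

```c
#include <stdint.h>

/* Clock 8 bits through two linked shift registers. On every clock the top bit
 * of each register is driven onto its output line (MOSI or MISO) and shifted
 * into the bottom of the other register. */
void spi_exchange(uint8_t *master_reg, uint8_t *subordinate_reg)
{
    for (int clk = 0; clk < 8; clk++) {
        uint8_t mosi = (*master_reg >> 7) & 1u;
        uint8_t miso = (*subordinate_reg >> 7) & 1u;
        *master_reg = (uint8_t)((*master_reg << 1) | miso);
        *subordinate_reg = (uint8_t)((*subordinate_reg << 1) | mosi);
    }
}
```

After eight clocks the two registers have swapped contents, which is why every SPI write is also a read: the master has to clock out a byte, even a dummy one, to receive one.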
SPI Example: Secure Digital Card Access
• SD cards have two communication modes
• Native 4-bit
• Legacy SPI 1-bit
• VDD from 2.7 V to 3.6 V
[Figure: SD card pinout, native / SPI signal names: DAT3 / CS, CMD / DI, GND, VDD, CLK / SCLK, GND, DAT0 / DO, DAT1 / X, DAT2 / X]
I2C Communications
I2C Bus Overview
• “Inter-Integrated Circuit” bus
• Multiple devices connected by a shared serial bus
• Bus is typically controlled by master device, subordinates respond when addressed
• I2C bus has two signal lines
• SCL: Serial clock
• SDA: Serial data
• Full details available in “The I2C-bus Specification”
I2C Bus Connections
Master Writing Data to Subordinate
[Figure: bus sequence for a write: Start, Address + Write, ACK, Data, ACK, Repeated Start]
Master Reading Data from Subordinate
[Figure: SDA and SCL waveforms for a read: Start, Address + Read, ACK, Data, ACK, Data, NACK, Stop]
I2C Addressing
• Each device (IC) has seven-bit address
• Different types of device have different default addresses
• Sometimes can select a secondary default address by tying a device pin to a different logic level
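On the wire, the seven-bit address travels in the upper seven bits of the first byte, with the read/write flag in bit 0 (0 for write, 1 for read). The helper below is a sketch with hypothetical names; 0x48 in the note is just an example address.

```c
#include <stdint.h>

#define I2C_WRITE 0u
#define I2C_READ  1u

/* First byte of an I2C transfer: 7-bit address in bits 7..1, R/W flag in bit 0. */
uint8_t i2c_address_byte(uint8_t addr7, uint8_t rw)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (rw & 1u));
}
```

A device at address 0x48 is therefore addressed with 0x90 for writing and 0x91 for reading. Code that writes `ADDRESS | WRITE` directly, without a shift, presumably stores the address constant already shifted left by one; datasheets differ on which form they quote.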
Enabling i2c
void temperature_init(void) {
i2c_init();
i2c_enable();
}
void temperature_enable(void) {
// Start conversion.
i2c_start();
i2c_tx(subordinate_ADDRESS | WRITE);
i2c_tx(START_CONVERT_T);
i2c_stop();
}
Reading 12 bits from a temperature sensor
float temperature_read(void) {
    short temp;
    i2c_start();
    i2c_tx(subordinate_ADDRESS | WRITE);
    i2c_tx(READ_TEMPERATURE);
    i2c_start(); // repeated start, then address the device for reading
    i2c_tx(subordinate_ADDRESS | READ);
    temp = i2c_rx() << 8;
    i2c_ack();
    temp |= i2c_rx();
    i2c_nack();
    i2c_stop();
    // The 12-bit reading is left-justified in 16 bits; shift it down,
    // keeping the sign (assumes an arithmetic right shift).
    temp >>= 4;
    // Convert from fixed-point to floating point.
    return temp / (float)16;
}
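The arithmetic at the end of temperature_read can be isolated and tested without hardware. This sketch assumes what that code implies: a signed 12-bit reading, left-justified in two bytes, with 4 fractional bits (1/16 degree per count) and the low four bits zero. The function name is illustrative.

```c
#include <stdint.h>

/* Convert the two bytes read from the sensor into a temperature value. */
float temperature_from_raw(uint8_t msb, uint8_t lsb)
{
    uint16_t raw = (uint16_t)((msb << 8) | lsb);
    int16_t signed_raw = (int16_t)raw;           /* reinterpret as two's complement */
    /* Dividing by 16 drops the four unused low bits; unlike >> on a negative
     * value, division has fully defined behaviour in C. */
    int16_t counts = (int16_t)(signed_raw / 16);
    return (float)counts / 16.0f;                /* 4 fractional bits */
}
```

The bytes 0x19 0x00 decode to 25.0, and 0xE7 0x00 decode to -25.0.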
PROTOCOL COMPARISON
Factors to Consider
• How fast can the data get through?
• Depends on raw bit rate, protocol overhead in packet
• How many hardware signals do we need?
• May need clock line, chip select lines, etc.
• How do we connect multiple devices (topology)?
• Dedicated link and hardware per device - point-to-point
• One bus for manager transmit/subordinate receive, one bus for subordinate transmit/manager
receive
• All transmitters and receivers connected to same bus – multi-point
Protocol Trade-Offs
Protocol | Speed | Signals Req. for Bidirectional Communication with N devices | Device Addressing | Topology
UART (Point to Point) | Fast – Tens of Mbit/s | 2*N (TxD, RxD) | None | Point-to-point full duplex
UART (Multi-drop) | Fast – Tens of Mbit/s | 2 (TxD, RxD) | Added by user in software | Multi-drop
SPI | Fast – Tens of Mbit/s | 3+N for SCLK, MOSI, MISO, and one SS per device | Hardware chip select signal per device | Multi-point full-duplex, multi-drop half-duplex buses
I2C | Moderate – 100 kbit/s, 400 kbit/s, 1 Mbit/s, 3.4 Mbit/s. Packet overhead. | 2 (SCL, SDA) | In packet | Multi-point half-duplex bus
Programming for Power-
Efficient Computing :
High Level Techniques
Learning Objectives
At the end of this lecture, you should be able to:
• Describe the risks with optimizing code at high level.
• Describe the advantages of high-level code optimisation.
• Identify different types of data structures and their mode of data access.
• Explain binary search algorithm.
• Describe the advantages and drawbacks of using optimized libraries in programming.
Outline
• Low Power Computing
• Optimization & The Software Development Process
• Optimization Risks
• High Level Optimizations with Examples
• Better Algorithms – Search Example
• Making Searches Faster
• More Data Structures
• Optimization: Binary Search
• Use of Optimized Libraries
• Precision
• Open-source Software
Low Power Computing
• Power = Energy / Time
• Personal Computers operate to a maximum power budget.
• Mobile Computers operate to maximum power and energy budgets.
• There are many ways in which we can reduce power consumption:
Reduce the amount of computations necessary e.g. optimize programs to remove
redundant operations, use optimized libraries, reduce memory transactions by
reusing local data, use better algorithms.
Toolchain optimization e.g. help the compiler do a better job through directives to
exploit hardware-specific features.
Exploit power efficient hardware-assisted techniques e.g. big.LITTLE
multiprocessing, Dynamic Voltage and Frequency Scaling (DVFS).
Improve hardware occupancy e.g. better scheduling algorithms, hide latency.
Optimization & The Software Development Process
• Many opportunities for optimization
Multiple levels of design hierarchy.
Requirements, algorithm, architecture, design, implementation…
[Figure: development process stages, beginning with Requirements, then Algorithm and Architecture]
Optimization Risks
• Unpredictability of development effort needed
• Balancing act
– Pro: expected performance gain
– Cons: additional development time required, increased schedule risk
• Difficulties in prediction
– How much gain will we get after this optimization? Will it be optimized enough so we
can stop optimizing the program?
– How long will it take to perform this optimization?
– How many more optimizations will we need?
• Bug fixes, feature changes, feature additions, upgrades
• Basis for follow-up and evolved products, platform for range of products
• What if you’ve forgotten how your optimized code works?
• What if someone else needs to maintain your optimized code?
• Optimization often hurts code maintainability
• Need to optimize in a way which retains maintainability
High Level Optimizations: Do Less Work
• Fundamental concept: perform less computation
• Lazy (or deferred) execution: don’t compute data until needed
• Early decisions: for decisions based on computations, may be able to use intermediate results
• Applied broadly
• Many algorithms implement these concepts
• Compilers try to apply these in optimization passes
• Role of developer
• Implement concepts directly in source code
• Help the compiler apply these concepts
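As a small illustration of an early decision, the sketch below compares squared distances, so the square root is never computed just to decide which point is closer. The names `point2d_t`, `dist_squared` and `closer_to_origin` are hypothetical and are not part of the lecture's example program.

```c
#include <stdbool.h>

/* Hypothetical 2-D point type used only for this illustration. */
typedef struct {
    float x;
    float y;
} point2d_t;

/* Squared distance from the origin: an intermediate result that is
   cheaper than the true distance because it needs no sqrt call. */
static float dist_squared(const point2d_t *p) {
    return p->x * p->x + p->y * p->y;
}

/* Early decision: sqrt is monotonically increasing for non-negative
   inputs, so comparing squared distances gives the same answer as
   comparing the distances themselves. */
bool closer_to_origin(const point2d_t *a, const point2d_t *b) {
    return dist_squared(a) < dist_squared(b);
}
```

The true distance only needs to be computed, once, if the caller actually wants to report it.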
Example Program: “Nearby Points of Interest”
• Find distance and bearing from current position to
the closest position of a fixed set of positions.
• Positions are described as coordinates on the
surface of the Earth (latitude, longitude)
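[Figure: path from (lat1, lon1) to (lat2, lon2), labelled with distance d and bearing a]
$$d = \operatorname{acos}\bigl(\sin(\mathit{lat}_1)\,\sin(\mathit{lat}_2) + \cos(\mathit{lat}_1)\,\cos(\mathit{lat}_2)\,\cos(\mathit{lon}_2 - \mathit{lon}_1)\bigr) \times 6371$$
$$a = \operatorname{atan2}\bigl(\cos(\mathit{lat}_1)\,\sin(\mathit{lat}_2) - \sin(\mathit{lat}_1)\,\cos(\mathit{lat}_2)\,\cos(\mathit{lon}_2 - \mathit{lon}_1),\; \sin(\mathit{lon}_2 - \mathit{lon}_1)\,\cos(\mathit{lat}_2)\bigr) \times \frac{180}{\pi}$$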
Example Program: “Nearby Points of Interest” - Core Code:
Calculate Distance
float Calc_Distance( PT_T * p1, const PT_T * p2) {
    // calculates distance in kilometers between locations
    // Lat and Lon are in degrees; 6371 is the Earth's mean radius in km
    return acos(sin(p1->Lat*PI/180)*
                sin(p2->Lat*PI/180) +
                cos(p1->Lat*PI/180)*cos(p2->Lat*PI/180)*
                cos(p2->Lon*PI/180 - p1->Lon*PI/180)) * 6371;
}
Example Program: “Nearby Points of Interest”
• Calc_Distance is called on every point, returning closest_d (distance in km)
• This distance has two uses
• To identify closest point
• To be returned to calling function
Optimized Code
• Eliminates NPoints-1 floating point multiplies
float Calc_Distance_in_Radians( PT_T * p1, const PT_T * p2) {
    // calculates distance in radians between locations
    return acos(p1->SinLat * p2->SinLat +
                p1->CosLat * p2->CosLat * cos(p2->Lon - p1->Lon)); // no *6371 here
}
void Find_Nearest_Point( . . . ) {
    while (strcmp(points[i].Name, "END")) {
        d = Calc_Distance_in_Radians(&ref, &(points[i]) );
        . . .
    }
    *distance = closest_d*6371; // scale to km once, after the search
    . . .
}
Taking it Further
• Can we make distance comparisons without using acos?
• How is acos related to its argument X?
• Can we call acos just once – to compute the distance to the closest point?
float Calc_Distance_in_Radians( PT_T * p1, const PT_T * p2) {
// calculates distance in radians between locations
return acos(p1->SinLat * p2->SinLat +
p1->CosLat * p2->CosLat * cos(p2->Lon - p1->Lon));
}
Taking it Further
float Calc_acos_arg ( PT_T * p1, const PT_T * p2) {
    return (
        p1->SinLat * p2->SinLat +
        p1->CosLat * p2->CosLat
        *cos(p2->Lon - p1->Lon));
}
[Figure: plot of acos(X) for X from -1 to 1, vertical axis from 0 to 3.5]
• acos always decreases when input X increases
• Nearest point will have minimum distance and maximum X
• So search for point with maximum argument to acos function
• After finding nearest point (max X), compute distance_km = acos(X) * 6371
Even More Optimized Code
• Eliminates NPoints-1 floating point acos calculations.
float Calc_Distance_inverse ( PT_T * p1, const PT_T * p2) {
    // calculates the argument to acos (cosine of the angular distance) between locations
    return (p1->SinLat * p2->SinLat +
            p1->CosLat * p2->CosLat * cos(p2->Lon - p1->Lon)); // no acos here
}
void Find_Nearest_Point( . . . ) {
    while (strcmp(points[i].Name, "END")) {
        d = Calc_Distance_inverse(&ref, &(points[i]) );
        if (d>closest_d) closest_d = d; // larger argument means shorter distance
        i++;
    }
    *distance = acos(closest_d)*6371; // single acos call, for the closest point only
    . . .
}
Better Algorithms –
Search Example
Making Searches Faster
• Improve the data organization, possibly also enabling
a better algorithm
• Example data structure: List
• Each node holds a data element.
– May be connected to one or two other nodes with pointers: one successor (next) and,
optionally, one predecessor (prev)
• Sequential access to data.
– Must start at current node (or start node) and traverse list by
visiting nodes via next or prev pointers.
• Examples: linked list, queue, circular queue, double-ended
queue.
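Sequential access in a list can be sketched as follows. The node type `node_t` and the function `list_find` are hypothetical names chosen for this illustration; only the successor pointer is modelled.

```c
#include <stddef.h>

/* Hypothetical singly linked list node holding one integer. */
typedef struct node {
    int data;
    struct node *next; /* successor, or NULL at the end of the list */
} node_t;

/* Sequential access: start at the head and follow next pointers until
   the key is found. In the worst case every node is visited. */
const node_t *list_find(const node_t *head, int key) {
    for (const node_t *n = head; n != NULL; n = n->next) {
        if (n->data == key) {
            return n;
        }
    }
    return NULL; /* key not present */
}
```

The number of nodes visited grows linearly with the position of the key, which is what motivates the alternative data structures that follow.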
More Data Structures
• Tree - hierarchical
• Each node holds a data element. May be connected
to other nodes with pointers: up to one parent, down to N children.
• Sequential access to data. Must traverse by visiting
nodes, but additional connections reduce number of
intermediate nodes.
• Hierarchical structure. May be represented explicitly
with pointers or implicitly with index location of
element.
• Array – random access
• Same time to access each element
• Flat structure
• May be multidimensional
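One way to represent a tree implicitly with index locations is to store a binary tree in an array, as sketched below. The layout (root at index 0, children of node i at 2i+1 and 2i+2) is a common convention and an assumption here, and the function names are hypothetical.

```c
#include <stddef.h>

/* Implicit binary tree: node i lives at array index i, so the
   hierarchy is encoded by index arithmetic instead of pointers. */
size_t tree_parent(size_t i)      { return (i - 1) / 2; } /* valid for i > 0 */
size_t tree_left_child(size_t i)  { return 2 * i + 1; }
size_t tree_right_child(size_t i) { return 2 * i + 2; }

/* Number of edges between node i and the root at index 0. */
size_t tree_depth(size_t i) {
    size_t depth = 0;
    while (i > 0) {
        i = tree_parent(i);
        depth++;
    }
    return depth;
}
```

Because the elements sit in an array, each node is reachable in constant time once its index is known, and no memory is spent on pointers.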
Optimization: Binary Search
[Figure: binary search shown in four stages, Step 1 to Step 4]
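A minimal sketch of binary search over a sorted array is given below. The function name `binary_search` and its signature are assumptions made for this illustration. Each iteration halves the remaining search range, so the number of iterations grows with the logarithm of the array size rather than with the size itself.

```c
#include <stddef.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent. */
long binary_search(const int *a, size_t n, int key) {
    size_t lo = 0;
    size_t hi = n; /* search range is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* avoids overflow of lo + hi */
        if (a[mid] == key) {
            return (long)mid;
        } else if (a[mid] < key) {
            lo = mid + 1; /* discard the lower half */
        } else {
            hi = mid;     /* discard the upper half */
        }
    }
    return -1;
}
```

The array must already be sorted; the cost of keeping it sorted has to be weighed against the faster lookups.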
Precision
• Match the data types and approximation methods to the range and accuracy
needs of your algorithm.
• You can trade off accuracy/precision for speed, code size, energy/power.
• Floating point arithmetic is needed when you are dealing with a large range
of data values that fixed-point arithmetic cannot deal with.
• Single-precision floating point uses 32 bits (1 sign bit, 8 exponent bits and 23 stored
fraction bits, giving a 24-bit significand with the implicit leading bit), whereas
double-precision floating point uses 64 bits (1 sign bit, 11 exponent bits and 52 stored
fraction bits, giving a 53-bit significand).
• Floating point is slow if there is no hardware support (must be emulated in
software)
• The IEEE standard for floating point arithmetic (IEEE 754) is a technical
standard for floating-point computation widely implemented in hardware.
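Where the value range is small and known, fixed-point arithmetic can replace floating point using only integer instructions. The sketch below assumes a Q16.16 format; the names `q16_16_t`, `Q16_ONE`, `q16_from_int` and `q16_mul` are hypothetical.

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits and 16 fraction bits in an int32_t. */
typedef int32_t q16_16_t;

#define Q16_ONE (1 << 16)

q16_16_t q16_from_int(int32_t v) {
    return (q16_16_t)(v * Q16_ONE);
}

/* Multiply in 64 bits, then shift back to drop the extra 16 fraction bits.
   Assumes the mathematical result fits in the Q16.16 range. Right-shifting
   a negative value is implementation-defined in C; this sketch assumes the
   common arithmetic-shift behaviour. */
q16_16_t q16_mul(q16_16_t a, q16_16_t b) {
    int64_t product = (int64_t)a * (int64_t)b;
    return (q16_16_t)(product >> 16);
}
```

The trade-off is the one named above: the range is limited to roughly ±32768 and the resolution to 1/65536, in exchange for avoiding floating-point emulation on cores without hardware support.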
Open Source Software
• Many optimized libraries are available as Open Source Software (OSS).
• You need to know about the type of license you are signing up to before you
use any OSS.
• Permissive
– License requirements are minimal.
– Broad grant of rights (with no conditions for particular licensing terms).
– Includes MIT, BSD, and Apache 2.0 licenses.
• Copyleft
– Source code must be made available for binary distribution.
– Original work, any modifications, any derivative work must remain under the same license.
– Includes GPL, LGPL, and MPL licenses.
The Arm trademarks featured in this presentation are registered
trademarks or trademarks of Arm Limited (or its subsidiaries) in
the US and/or elsewhere. All rights reserved. All other marks
featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks
Programming for Power-Efficient Computing: Low Level Techniques
Learning Objectives
At the end of this lecture, you should be able to:
• Outline the stages of the compiler
• Identify the two types of scalar optimizations and their features.
• Explain machine-independent code optimization and its advantages.
• Explain machine-dependent code optimization and its advantages.
• Describe how the following could hinder compiler optimization: excessive variable
scope and Automatic Promotions in Arithmetic Expressions.
• Compare and contrast optimisation for power, energy, speed and code size.
Outline
• What should the compiler be able to do?
• What could stop the compiler from optimizing?
• How can we tell if the compiler has applied the optimizations?
• How can we modify the source code to help the compiler optimize?
Starting Points for Efficient Code
• Write correct code, then optimize.
• Consider the use of optimized libraries for critical paths in your program.
Review of Compiler Stages
• Parser
• reads in source code e.g. C code, checks for lexical and syntax errors,
• forms intermediate code (tree representation)
• High-Level Optimizer
• Modifies intermediate code (processor-independent)
• Code Generator
• Creates assembly code step-by-step from each node of the intermediate code
• Allocates variable uses to registers
• Low-Level Optimizer
• Modifies assembly code (parts are processor-specific)
• Assembler
• Creates object code (machine code)
• Linker/Loader
• Creates executable image from object file
Don’t Handcuff the Compiler
Approach
• Which optimizations is the compiler capable of?
• How can we modify the source code to help the compiler optimize?
What should the
compiler be able to do?
Scalar Optimizations
• What should you expect your compiler to be able to do?
• Machine-Independent (MI)
• Eliminate code with no effect
• Specialize computations
• Eliminate redundant computations
• Machine-Dependent (MD)
• Take advantage of special hardware features
• Manage or hide latency
• Manage limited machine resources
MI: Eliminate code with no effect
• Useless code (based on data-flow graph - DFG)
– Mark critical operations:
sets return value for procedure; input/output statement; modifies non-local data
– Find operations which define data (operands) used by these critical operations
• Useless control flow (based on control flow graph - CFG)
– Fold redundant branches
– Remove empty blocks
– Combine blocks
[Figure: before-and-after control flow graphs of blocks Bi and Bj for each transformation]
• Constant propagation
• If a variable has a known value at a given point in the program, it might be possible to
specialize operations based on this knowledge, e.g. perform calculations at compile
time rather than runtime.
• Peephole optimization
• Recognize patterns of assembly instructions which can be replaced with a faster set
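The effect of constant propagation, together with the removal of code that can then never execute, can be sketched with a before-and-after pair. Both functions are hypothetical and written by hand; an actual compiler performs the transformation on its intermediate code, and whether it does so depends on the compiler and the optimization level.

```c
/* As written by the developer. */
int scale_before(int x) {
    const int gain = 4;
    const int debug = 0;
    if (debug) {       /* condition is known to be false at compile time */
        x = x + 1000;  /* useless code: can never execute */
    }
    return x * gain * 2;
}

/* What an optimizer could reduce it to: the constants are propagated,
   4 * 2 is folded to 8, and the dead branch is removed. */
int scale_after(int x) {
    return x * 8;
}
```

Both versions return the same value for every input that does not overflow, but the second performs one multiply and no branch.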
MI: Enabling Transformations
• Goal is to make code more amenable to other optimizations
• Loop unrolling
• replicate loop body
• Increases speed by reducing loop overhead
• At the expense of increased binary code size
Before:
for (i = 0; i < 100; i++)
    function_call();
After:
for (i = 0; i < 100; i += 2) {
    function_call();
    function_call();
}
• Loop unswitching
• hoist loop-invariant control-flow operations out of loop
Before:
for (i=0; i<N; i++) {
    if (x>y)
        a[i] = b[i] * x;
    else
        a[i] = b[i] * y;
}
After:
if (x>y) {
    for (i=0; i<N; i++)
        a[i] = b[i] * x;
} else {
    for (i=0; i<N; i++)
        a[i] = b[i] * y;
}
• Renaming
What could stop the
compiler from
optimizing?
Excessive Variable Scope
• Avoid declaring variables as globals or statics when they could be locals
• Globals and statics are allocated permanent storage in memory, not reusable stack
space
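The difference in scope can be sketched with two versions of the same loop. The function and variable names are hypothetical, and whether the global version is actually slower depends on the compiler and the optimization level.

```c
/* Global loop counter: has permanent storage in memory, and the compiler
   may have to keep that copy up to date because other code can see it. */
int g_i;

int sum_with_global(const int *a, int n) {
    int sum = 0;
    for (g_i = 0; g_i < n; g_i++) {
        sum += a[g_i];
    }
    return sum;
}

/* Local loop counter: visible only inside the function, so the compiler
   is free to keep it in a register for its whole lifetime. */
int sum_with_local(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Both functions return the same sum; only the first leaves a visible side effect in `g_i`, which is exactly what restricts the compiler.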
Automatic Promotions in Arithmetic Expressions
• How are expressions with mixed data types evaluated?
float f;
char c;
int r;
r = f * c;
Resulting Object Code
float f;
char c;
int r;
r = f * c;
• Call routine to convert c to float
• Call routine to perform floating point multiply with f
• Call routine to convert result from floating point to integer
• Store result in r
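The hidden conversions can be made visible by writing them out as explicit casts. The two hypothetical functions below compute the same result; the second only spells out the steps listed above.

```c
/* The expression as written in the source code. */
int mul_implicit(float f, char c) {
    int r = f * c;
    return r;
}

/* The same computation with each implicit conversion written out. */
int mul_explicit(float f, char c) {
    float c_as_float = (float)c;    /* convert c to float */
    float product = f * c_as_float; /* floating-point multiply */
    return (int)product;            /* convert to integer (truncates toward zero) */
}
```

On a core without floating-point hardware each of these steps becomes a library call, so keeping such expressions in a single integer type avoids the cost entirely when the algorithm allows it.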
ANSI C Standard for Argument Promotions
• Integral function arguments smaller than an int for non-prototyped functions are
promoted to ints
• Extra time converting to int
• Extra space on stack
• So prototype all functions
• Function:
int Find_Average(char a, char b, char c, char d) {
...
}
• Correct, complete prototype:
int Find_Average(char a, char b, char c, char d);
• Parameter names are optional but good for documentation/maintainability.
• Where should the prototype go?
• If program is broken into modules, put prototype in header (.h) file
• Otherwise put prototype near top of C code file, before the function is called
Compiler Optimization Levels e.g. Arm Compiler
• Optimizing for code size vs. speed
• -Otime: the compiler aggressively optimizes for time, e.g. aggressive inlining, at the expense of a
possible increase in image (binary) size.
• -Ospace: the compiler optimizes for image size at the expense of a possible increase in execution time.
• Optimization levels:
• -O0 : Minimum optimization. Turns off most optimizations. Best possible debug view.
• -O1 : Restricted optimization e.g. removes unused inline functions. Turns off optimizations that seriously
degrade the debug view.
• -O2 : High optimization. e.g. compiler automatically inlines functions if optimizing for time. The compiler
may perform optimizations that cannot be described by debug information.
• -O3 : Maximum optimization e.g. loop unrolling if optimizing for time. This can give significant
performance benefits at a small code size cost, but at the risk of a longer build time.
How can we tell if the compiler has applied the optimizations?
Profiling Computer Programs
• Code profilers are tools that instrument computer programs (either the program source
code or its binary form(s)).
• Code profilers can measure the frequency and execution time of certain instructions or
function calls, memory accesses and memory usage, event trace etc.
• Code profilers are crucial for code optimization e.g. for speed, memory footprint,
energy/power.
• Some of the most popular profilers include:
– Statistical profilers such as Shark (OSX), oprofile (Linux), VTune (Intel) and Streamline (Arm).
– Intermediate language instrumentation tools such as OpenPAT.
– Runtime instrumentation tools such as Valgrind and DynamoRIO.
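Source-level instrumentation can be sketched by hand with the standard C `clock()` function. This is only an illustration of what a profiler automates, not how any of the tools above work internally; the names `profile_t`, `work`, `work_profiled` and `work_average_seconds` are hypothetical, and `clock()` may be unavailable or too coarse on a bare-metal target.

```c
#include <time.h>

/* Hypothetical per-function profile record. */
typedef struct {
    unsigned long calls; /* how many times the function ran */
    clock_t ticks;       /* total processor time spent inside it */
} profile_t;

static profile_t work_profile;

/* Function under test: sums the integers 0..n-1. */
static unsigned long work(unsigned long n) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* Instrumented wrapper: records call count and elapsed processor time. */
unsigned long work_profiled(unsigned long n) {
    clock_t start = clock();
    unsigned long result = work(n);
    work_profile.ticks += clock() - start;
    work_profile.calls++;
    return result;
}

/* Average time per call in seconds, or 0 if the function never ran. */
double work_average_seconds(void) {
    if (work_profile.calls == 0) {
        return 0.0;
    }
    return (double)work_profile.ticks / CLOCKS_PER_SEC / (double)work_profile.calls;
}
```

Comparing the recorded averages before and after a change in source code or compiler options gives a first indication of whether an optimization had any effect.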
How can we modify the
source code to help the
compiler optimize?
Iterative Optimization Process
1. Read the Compiler Manual
2. Apply the necessary hints/directives for the compiler to achieve the sought
optimizations
Is Optimizing For Power The Same As Optimizing For Energy? (1/2)
• Not always. Optimizing for power aims to reduce the rate at which energy is consumed
(energy per unit time), whereas optimizing for energy aims to reduce the total energy
consumed.
• Remember: power is linked to heat, energy is the total you pay for in the end.
• Assume the power consumption is defined by the number of instructions executed at any
moment in time. By rescheduling instructions on a VLIW processor, for instance, or on any
parallel computer system, we can obtain different power and energy consumption profiles.
[Figure: power-versus-time profiles for different instruction schedules, each annotated with its average power and its sum (proportional to energy)]
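Under the simplifying assumption above (power in a time step equals the number of instructions executed in it), the peak power, average power and energy of a schedule can be computed as sketched below. The type `schedule_stats_t` and the function `analyze_schedule` are hypothetical, and any schedule passed in is made-up illustration data rather than a measurement.

```c
#include <stddef.h>

/* Summary of one schedule, with power measured in instructions per step. */
typedef struct {
    unsigned peak_power;  /* maximum instructions in any time step */
    unsigned energy;      /* total instructions (sum over all steps) */
    double average_power; /* energy divided by number of time steps */
} schedule_stats_t;

schedule_stats_t analyze_schedule(const unsigned *instr_per_step, size_t steps) {
    schedule_stats_t s = {0, 0, 0.0};
    for (size_t t = 0; t < steps; t++) {
        s.energy += instr_per_step[t];
        if (instr_per_step[t] > s.peak_power) {
            s.peak_power = instr_per_step[t];
        }
    }
    if (steps > 0) {
        s.average_power = (double)s.energy / (double)steps;
    }
    return s;
}
```

For example, the schedules {4, 4, 0, 0} and {2, 2, 2, 2} have the same energy (8) and the same average power over four steps, but the first has twice the peak power of the second.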
Is Optimizing For Power The Same As Optimizing For Energy? (2/2)
• Suppose we can execute certain instructions faster (e.g. in half the time) by using certain
hardware features e.g. exploiting special SIMD hardware.
[Figure: three power-versus-time profiles, each annotated with its average power and its sum (proportional to energy):
– Scheduling (a)
– Scheduling (d) exploits certain hardware features (using the same power budget)
– Scheduling (e) exploits certain hardware features (at a higher power budget this time);
still less energy consumed overall compared to (a)]
Is Optimizing For Power/Energy The Same As Optimizing For
Speed? (1/2)
• Not necessarily. Higher performance often leads to more energy consumption, e.g. through
increasing clock frequency.
• However, eliminating redundant instructions or optimizing the memory hierarchy, for
instance, reduces the overall execution time (i.e. increases execution speed) and, as a
consequence, often reduces energy. But even this is not always guaranteed.
• Consider the following code:
for (i=0; i<10; i++) {
    x=2*y;
    z[i] = w[i] + 1;
}
• “x = 2 * y” is loop invariant so we can take it out of the loop and execute it just once to save
time.
Is Optimizing For Power/Energy The Same As Optimizing For
Speed? (2/2)
Left (invariant computed inside the loop):
for (i=0; i<10; i++) {
    x=2*y;
    z[i] = w[i] + 1;
}
Right (invariant hoisted out of the loop):
x=2*y;
for (i=0; i<10; i++) {
    z[i] = w[i] + 1;
}
• In a VLIW architecture, however, the code to the left might well be quicker to run, as the
processor may be able to execute x = 2 * y in parallel with z[i] = w[i] + 1, so there
would be no need to execute it separately as in the code to the right.
• The code to the left, however, would consume more energy, as x = 2 * y would be executed
10 times. So here, quicker execution does not mean less energy consumption, as redundant
computation is needed to run the code quicker!
• Other similar examples could include memory prefetching and branch prediction.
• In general, optimizing for power/energy on the back of optimizing for speed is more
demanding as we often have to compensate for the power/energy cost of extra hardware,
or extra redundancy.
Is Optimizing For Power/Energy The Same As Optimizing For Code
Size?
• Not always. Reducing code size does result in less memory usage, which can reduce
power/energy consumption, especially since memory accesses are often
power/energy hungry.
• Eliminating redundant variables for instance reduces code size and power/energy
consumption.
• But reducing code size might also be achieved at the expense of more computations,
which consume extra power/energy as illustrated in the variable value-swap codes below.
Left (three variables):
temp = x;
x = y;
y = temp;
Right (two variables):
x = x + y;
y = x - y;
x = x - y;
• Only two variables are used in the code to the right (instead of three in the code to the
left).
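The two swaps can be written as functions to make the trade-off concrete. The function names are hypothetical, and unsigned operands are an assumption made here so that the arithmetic version is well defined even when the sum wraps around.

```c
/* Swap using a temporary: three variables, three copies, no arithmetic. */
void swap_with_temp(unsigned *x, unsigned *y) {
    unsigned temp = *x;
    *x = *y;
    *y = temp;
}

/* Swap using arithmetic: no temporary, but three additions/subtractions.
   Unsigned types are used because unsigned overflow wraps around in C,
   whereas signed overflow would be undefined behaviour.
   The two pointers must not refer to the same object. */
void swap_with_arith(unsigned *x, unsigned *y) {
    *x = *x + *y;
    *y = *x - *y;
    *x = *x - *y;
}
```

The arithmetic version saves one variable but replaces plain copies with three arithmetic operations, which is the extra computation referred to above.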