ARM Cortex-M4
ARM does not manufacture microcontroller silicon chips; it only designs and provides IP cores (Intellectual Property) for microprocessors.
The business model through which it sells these designs to various manufacturers is known as IP licensing.
ARM Embedded System
CLASSIC ARM PROCESSOR
"The Thumb instruction set is a subset of the most commonly used 32-bit ARM
instructions. Thumb instructions are 16 bits long, and have a corresponding 32-bit ARM
instruction that has the same effect on the processor model."
Code density refers loosely to how many microprocessor instructions it takes to perform a
requested action, and how much space each instruction takes up. Generally speaking, the less
space an instruction takes and the more work per instruction that a microprocessor can do, the
more dense its code is.
INTRODUCTION to ARM Cortex-M4
All internal registers, both general-purpose and special-function, are 32 bits wide.
Likewise, the data paths and the functional units, such as the ALU that performs arithmetic operations (addition, subtraction, multiplication, etc.) and logical and comparison operations (AND, OR, less than, greater than, etc.), are 32 bits wide.
Hence, the width of the functional units, data paths, internal registers, interfacing buses and memory addresses are the main factors that make the ARM Cortex-M4 a 32-bit processor.
The width of the processor's addresses defines the maximum address range it can handle.
For example, ARM Cortex-M4 microcontrollers use 32-bit addresses, so they can address 2^32 bytes = 4 GB of memory space.
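The arithmetic behind the 4 GB figure can be checked with a short C sketch. The helper names `address_space_bytes` and `highest_address` are hypothetical; they simply compute 2^n and 2^n - 1 for an n-bit byte address, using a 64-bit type because 2^32 does not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct byte addresses for an n-bit address: 2^n.
   uint64_t is used because 2^32 does not fit in a uint32_t. */
uint64_t address_space_bytes(unsigned address_bits)
{
    assert(address_bits < 64);
    return (uint64_t)1 << address_bits;
}

/* Highest valid byte address for an n-bit address: 2^n - 1. */
uint64_t highest_address(unsigned address_bits)
{
    return address_space_bytes(address_bits) - 1;
}
```

For 32 address bits this gives 4 294 967 296 bytes (4 × 1024 × 1024 × 1024), with addresses running from 0x00000000 to 0xFFFFFFFF.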
ARM Cortex-M4 is a processor core specifically designed for:
Efficient, high-performance execution of signal processing tasks with low power consumption.
Low interrupt latency.
Floating-point arithmetic (through the optional FPU).
Real-time applications that require a very fast interrupt response and cannot afford delays.
Automotive: Used in automotive control systems, including engine control units (ECUs) and advanced driver assistance systems (ADAS).
Industrial: Employed in industrial automation, motor control, and sensor processing.
Consumer Electronics: Found in audio processing, wearable devices, and home automation systems.
Medical: Applied in medical devices requiring reliable signal processing and control.
Key Features
Architecture:
Based on the ARMv7-M architecture.
Harvard architecture with separate instruction and data buses.
Thumb-2 instruction set for improved code density and performance.
Performance:
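Typical implementations run at up to roughly 200 MHz; the actual maximum clock depends on the silicon vendor.
32-bit processor core with a 3-stage pipeline (fetch, decode, execute).
Single-cycle 32-bit multiply and hardware divide (2 to 12 cycles) for efficient mathematical operations.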
Memory:
Up to 4 GB addressable memory space.
Support for various memory configurations, including on-chip Flash, SRAM, vendor-specific fast core-coupled memory, and external memory interfaces.
Peripherals:
Extensive range of integrated peripherals, added by the silicon vendor around the core, including timers, analog-to-digital converters, digital-to-analog converters, communication interfaces (USART, I2C, SPI), and more.
Interrupt handling through the Nested Vectored Interrupt Controller (NVIC) for fast interrupt response.
Power Efficiency:
Designed for low-power operation with various power-saving modes.
Suitable for battery-operated devices and applications requiring energy efficiency.
Development Tools: Supported by a wide range of development tools including ARM's Keil MDK,
IAR Embedded Workbench, and open-source tools like GCC.
RTOS Support: Compatible with various real-time operating systems (RTOS) such as FreeRTOS,
RTX, and others.
FPU (Floating Point Unit): Optional unit for single-precision (32-bit) floating-point calculations.
Hardware support for addition, subtraction, multiplication, division and square root.
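Because the optional FPU handles single precision only, C code for a Cortex-M4 with FPU usually keeps arithmetic in `float` and writes constants with the `f` suffix; `double` operations are carried out by much slower software routines. A minimal sketch of a multiply-accumulate loop written that way (the name `dot_f32` is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Dot product kept entirely in single precision, so that on a Cortex-M4
   with FPU each multiply and add can map to a hardware float instruction. */
float dot_f32(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    float acc = 0.0f;                 /* 'f' suffix: a float literal, not a double */
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];           /* float * float stays in single precision */
    }
    return acc;
}
```

Writing `0.0` instead of `0.0f`, or declaring `acc` as `double`, would promote the arithmetic to double precision under C's rules.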
Registers: Includes a set of general-purpose and special-function registers for data storage and manipulation.
Harvard Architecture: Separate instruction and data buses for simultaneous access.
Pipeline:
Fetch Stage: Retrieves instructions from memory.
Decode Stage: Decodes the fetched instructions.
Execute Stage: Executes the decoded instructions using the ALU and FPU.
Memory Interfaces:
AHB-Lite Bus Interface: High-performance bus interface for connecting to system memory and peripherals.
APB (Advanced Peripheral Bus) Interface: Used for connecting to lower-speed peripherals.
NVIC: Manages interrupt handling with low latency and flexible priority levels.
Interrupt Lines: Supports up to 240 interrupt lines.
Priority Levels: Configurable priority levels for interrupts; the silicon vendor implements between 3 and 8 priority bits, giving 8 to 256 levels.
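Each NVIC priority field is 8 bits wide, but only the most significant bits are implemented; lower values mean higher urgency. A sketch of the conversion from a logical priority to the stored 8-bit value, assuming hypothetical helper names (CMSIS's `NVIC_SetPriority` performs an equivalent shift internally):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a logical priority (0 = most urgent) into the 8-bit value stored
   in an NVIC priority field, where only the top 'prio_bits' bits exist. */
uint8_t nvic_encode_priority(unsigned logical_prio, unsigned prio_bits)
{
    assert(prio_bits >= 3 && prio_bits <= 8);
    assert(logical_prio < (1u << prio_bits));
    return (uint8_t)(logical_prio << (8u - prio_bits));
}

/* Number of distinct priority levels for a given number of implemented bits. */
unsigned nvic_priority_levels(unsigned prio_bits)
{
    return 1u << prio_bits;
}
```

For example, with 4 implemented bits there are 16 levels, and logical priority 5 is stored as 0x50.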
Integrated Peripherals:
Timers: Multiple timer modules for various timing and counting functions.
ADC (Analog-to-Digital Converter): Converts analog signals to digital data.
DAC (Digital-to-Analog Converter): Converts digital data to analog signals.
Communication Interfaces: Includes USART, I2C, SPI for serial communication.
Clock Control Unit: Manages the clock signals for the processor and peripherals.
Reset Control Unit: Handles reset signals to initialize or restart the processor and peripherals.
Power Management:
Power Control Unit: Manages power-saving modes and dynamic voltage and frequency scaling
(DVFS) for energy efficiency.
Trace Interface: Includes the Instrumentation Trace Macrocell (ITM) and the Data Watchpoint and Trace (DWT) unit for real-time trace and debugging.
The ITM is a debugging component that enables various data and event tracing functionalities.
The DWT uses a set of comparators and counters to monitor and capture data events and program execution metrics.
Data Watchpoints: Allow the monitoring of specific memory addresses for read or write operations.
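One of the DWT counters is a free-running 32-bit cycle counter (CYCCNT), commonly read before and after a code section to measure its execution time. The sketch below shows only the host-testable part, wrap-safe elapsed-cycle arithmetic and conversion to microseconds; enabling and reading the counter on the target is not shown, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit counter.
   Unsigned subtraction is modulo 2^32, so a single wrap-around is handled. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to whole microseconds for a given core clock. */
uint64_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz > 0);
    return ((uint64_t)cycles * 1000000u) / core_hz;
}
```

Because the subtraction wraps, a measurement that starts at 0xFFFFFF00 and ends at 0x00000100 still reports 0x200 cycles, provided the measured section is shorter than one full counter period.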
Flash Patch and Breakpoint (FPB): A debugging unit that allows developers to set breakpoints and patch program code in flash memory.
Breakpoint Support: The FPB unit can set hardware breakpoints in flash memory, which can be used
to halt program execution at specific points for debugging.
Patch Functionality: FPB allows for patching instructions in flash memory with new instructions,
enabling developers to test fixes or changes without rewriting the flash memory.
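Hardware Comparators: The Cortex-M4 FPB provides up to eight comparators: six instruction comparators that can be used as hardware breakpoints and two literal comparators used for patching constant data. This is sufficient for most debugging tasks.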
The Trace Port Interface Unit (TPIU) collects trace data from the ITM, DWT and ETM and outputs it via a trace port.
The TPIU acts as a bridge between the trace sources within the Cortex-M4 (ITM, DWT, ETM) and
the external trace capture tools.
Processor Core:
ARMv7-M Architecture: The Cortex-M4 is based on the ARMv7-M architecture, which is optimized for low power and
high performance.
32-bit Core: It is a 32-bit processor, capable of handling 32-bit wide data paths and operations.
Pipeline:
3-Stage Pipeline: The processor employs a 3-stage pipeline (fetch, decode, execute), which helps improve instruction
throughput and performance.
Instruction Set:
Thumb-2 Technology: Utilizes the Thumb-2 instruction set, providing a mix of 16-bit and 32-bit instructions. This
enhances code density and performance, making it efficient in terms of memory usage and processing power.
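In Thumb-2, the first halfword of each instruction tells the processor how long the instruction is: if its top five bits are 0b11101, 0b11110 or 0b11111, a second halfword follows; otherwise the instruction is 16 bits. The sketch below (hypothetical helper names) applies that rule to count instructions in a stream of halfwords.

```c
#include <stddef.h>
#include <stdint.h>

/* True if 'hw' is the first halfword of a 32-bit Thumb-2 instruction. */
int thumb_is_32bit(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Count instructions in a stream of halfwords.
   Returns -1 if the stream ends in the middle of a 32-bit instruction. */
long thumb_count_instructions(const uint16_t *code, size_t n_halfwords)
{
    long count = 0;
    size_t i = 0;
    while (i < n_halfwords) {
        i += thumb_is_32bit(code[i]) ? 2 : 1;   /* skip 1 or 2 halfwords */
        count++;
    }
    return (i == n_halfwords) ? count : -1;
}
```

For the sequence PUSH {R7, LR} (0xB580), BL (0xF000 0xF800), POP {R7, PC} (0xBD80), three instructions occupy 8 bytes, whereas three fixed-length 32-bit ARM instructions would occupy 12 bytes.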
Flash Memory
Non-volatile Storage: Used to store the firmware code and can be programmed and erased in blocks.
Size: Varies from model to model, ranging from a few kilobytes to several megabytes.
1. Harvard Architecture:
Separate Instruction and Data Buses: The Cortex-M4 uses a Harvard architecture with separate buses for
instructions and data, allowing for simultaneous access to instructions and data, which improves performance.
2. Memory Map:
Unified Address Space: The Cortex-M4 has a unified address space for code, data, and peripherals,
simplifying the memory management.
DSP Instructions: Includes specialized DSP instructions such as Single Instruction Multiple Data
(SIMD) instructions, which enable efficient processing of mathematical and signal processing
tasks.
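To make the SIMD idea concrete, the following C sketch emulates on a host what the DSP instruction SMLAD computes: it multiplies the two signed 16-bit halves of each 32-bit operand pairwise and adds both products to a 32-bit accumulator. The helper names are hypothetical and the Q (overflow) flag is not modelled; on a real Cortex-M4 a single SMLAD instruction, or an intrinsic such as CMSIS's `__SMLAD`, would be used instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v without implementation-defined casts. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (int32_t)v - (int32_t)((v & 0x8000u) << 1);
}

/* Pack two signed 16-bit values into one word (lo in bits 15:0, hi in 31:16). */
uint32_t pack16x2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Host emulation of SMLAD: acc + x.lo * y.lo + x.hi * y.hi.
   The result is truncated to 32 bits (the final conversion to int32_t is
   implementation-defined in ISO C; GCC and Clang wrap). */
int32_t smlad_emulated(uint32_t x, uint32_t y, int32_t acc)
{
    int64_t sum = (int64_t)acc
                + (int64_t)sext16(x) * sext16(y)
                + (int64_t)sext16(x >> 16) * sext16(y >> 16);
    return (int32_t)(uint32_t)sum;
}

/* Dot product of two int16 arrays, two sample pairs per step.
   n is assumed even; an odd final element is ignored in this sketch. */
int32_t dot_i16_pairs(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        acc = smlad_emulated(pack16x2(a[i], a[i + 1]),
                             pack16x2(b[i], b[i + 1]), acc);
    }
    return acc;
}
```

Each call handles two multiply-accumulates, which is why such instructions roughly halve the loop iterations of a 16-bit filter or dot product.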
Optional Floating Point Unit (FPU): The Cortex-M4 optionally includes a hardware Floating
Point Unit (FPU) for single-precision (32-bit) floating-point arithmetic, accelerating computation-
heavy applications.
3. Interrupt Handling:
Nested Vectored Interrupt Controller (NVIC): The NVIC supports low-latency interrupt
processing with up to 240 interrupt lines and priority levels, enabling responsive and flexible
interrupt handling.
Power Management
Debug Interface:
Integrated Debug and Trace.
6) 3 × 12-bit, 2.4 MSPS A/D converters: up to 24 channels and 7.2 MSPS in triple interleaved mode; 2 × 12-bit D/A converters.
9) Debug mode:
Serial wire debug (SWD) and JTAG interfaces, Embedded Trace Macrocell.
10) Up to 140 I/O ports with interrupt capability:
Up to 136 fast I/Os up to 84 MHz
Up to 138 5 V-tolerant I/Os
11) Up to 15 communication interfaces:
Up to 3 × I2C interfaces (supporting System Management Bus (SMBus) and Power Management Bus (PMBus))
Up to 4 USARTs
Up to 3 SPIs (42 Mbit/s), with muxed full-duplex I2S to achieve audio-class accuracy via the internal audio PLL or an external clock
2 × CAN interfaces (2.0B Active)
SDIO (Secure Digital Input Output) interface
12) Advanced connectivity:
USB 2.0 full-speed device/host/OTG controller with on-chip PHY
USB 2.0 high-speed/full-speed device/host/OTG controller with dedicated DMA, on-chip full-speed PHY and ULPI
10/100 Ethernet MAC with dedicated DMA (supports IEEE 1588v2 hardware, MII/RMII)
Data path: the register bank is connected to the ALU by the A and B data buses.
The A bus carries the first operand (Rn) directly to the ALU.
The B bus carries the second operand (Rm) to the ALU through the barrel shifter.
The ALU and the barrel shifter are combinational circuits, so the shift and the ALU operation are both completed in one cycle.
As a load/store architecture, ARM separates memory access from data processing:
1. Load/store: a load instruction copies data from memory to registers, and a store instruction copies data from registers to memory.
2. ALU operations, which only occur between registers.
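BARREL SHIFTER
Example: R5 = 5 and R7 = 8 before execution.
MOV R7, R5, LSL #2 ; R7 = R5 << 2 = 5 × 4 = 20
After execution, R7 = 20 (R5 is unchanged).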
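• Classic ARM has no separate shift instructions; shifts are applied to the second operand of data-processing instructions.
• The barrel shifter provides the mechanism to carry out these shift operations.
• ARM places the barrel shifter in the data path, in front of the ALU.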
BARREL SHIFTER OPERATIONS
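The four basic barrel-shifter operations can be modelled in C on a host. In this sketch the names `shift_type` and `barrel_shift` are hypothetical, and only shift amounts 1 to 31 are handled; the special encodings (shift by 32, RRX) and the carry-out flag are left out. `barrel_shift(5, SHIFT_LSL, 2)` reproduces the `MOV R7, R5, LSL #2` example above and returns 20.

```c
#include <assert.h>
#include <stdint.h>

/* Shift types applied by the barrel shifter to the second operand. */
typedef enum { SHIFT_LSL, SHIFT_LSR, SHIFT_ASR, SHIFT_ROR } shift_type;

/* Model of the barrel shifter for shift amounts 1..31 only. */
uint32_t barrel_shift(uint32_t value, shift_type type, unsigned amount)
{
    assert(amount >= 1 && amount <= 31);
    switch (type) {
    case SHIFT_LSL:   /* logical shift left: zeros enter on the right */
        return value << amount;
    case SHIFT_LSR:   /* logical shift right: zeros enter on the left */
        return value >> amount;
    case SHIFT_ASR:   /* arithmetic shift right: the sign bit is copied in */
        if (value & 0x80000000u) {
            return (value >> amount) | ~(0xFFFFFFFFu >> amount);
        }
        return value >> amount;
    case SHIFT_ROR:   /* rotate right: bits leaving on the right re-enter on the left */
        return (value >> amount) | (value << (32u - amount));
    }
    return value;
}
```

An arithmetic shift right of 0xFFFFFFF0 (-16) by 2 gives 0xFFFFFFFC (-4), i.e. signed division by 4 for this value.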
Cortex-M4 Register Bank
FAULTMASK Register
Purpose: Masks all exceptions except for NMI.
Configuration: The FAULTMASK register has a single bit (FAULTMASK[0]).
0: No effect.
1: Masks all exceptions except NMI.
BASEPRI Register
Purpose: Sets the base priority for exception handling, masking all exceptions whose priority value is numerically equal to or greater than BASEPRI (that is, of equal or lower urgency).
Configuration: The BASEPRI register can hold an 8-bit value.
0x00: No effect; all interrupts are allowed.
Other values: Masks interrupts whose priority level is equal to or lower than the level specified by BASEPRI (numerically equal or higher values).
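A simplified model of these masking rules can be written as a C predicate for an exception with a configurable priority (lower value = more urgent): FAULTMASK = 1 masks everything except NMI, and a non-zero BASEPRI masks any exception whose priority value is numerically greater than or equal to it. The function name is hypothetical, and the sketch ignores PRIMASK, priority grouping (subpriority), and the separate requirement that a pending exception must outrank the one currently active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: is an exception with 8-bit priority value 'prio' masked?
   NMI is never masked by these registers and is not modelled here. */
bool exception_masked(uint8_t prio, uint8_t basepri, bool faultmask)
{
    if (faultmask) {
        return true;                  /* FAULTMASK = 1: all except NMI masked */
    }
    if (basepri != 0 && prio >= basepri) {
        return true;                  /* same or lower urgency than BASEPRI */
    }
    return false;                     /* BASEPRI = 0x00 disables this masking */
}
```

With BASEPRI = 0x40, exceptions at priority values 0x40 and above are held off while more urgent ones (for example 0x20) can still be taken.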
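CONTROL Register: Controls the stack pointer selection and the execution privilege level.
CONTROL[0] (nPRIV): 0 = Thread mode is privileged; 1 = Thread mode is unprivileged.
CONTROL[1] (SPSEL): 0 = the Main Stack Pointer (MSP) is used; 1 = the Process Stack Pointer (PSP) is used in Thread mode.
CONTROL[2] (FPCA): 1 = a floating-point context is active (set when a floating-point instruction has been executed).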
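The three CONTROL bits can be decoded with simple masks. In this sketch the struct and function names are hypothetical; on a target, the raw value would be read with the MRS instruction (for example through CMSIS's `__get_CONTROL()`). Bit meanings: bit 0 nPRIV (1 = unprivileged Thread mode), bit 1 SPSEL (1 = PSP in Thread mode), bit 2 FPCA (1 = floating-point context active).

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the Cortex-M4 CONTROL register (bits 2:0). */
typedef struct {
    bool unprivileged_thread;  /* CONTROL[0] nPRIV: 1 = Thread mode unprivileged */
    bool uses_psp;             /* CONTROL[1] SPSEL: 1 = Thread mode uses PSP */
    bool fp_context_active;    /* CONTROL[2] FPCA: 1 = FP context active */
} control_bits;

control_bits decode_control(uint32_t control)
{
    control_bits c;
    c.unprivileged_thread = (control & (1u << 0)) != 0;
    c.uses_psp            = (control & (1u << 1)) != 0;
    c.fp_context_active   = (control & (1u << 2)) != 0;
    return c;
}
```

A value of 0x3, for instance, describes unprivileged Thread-mode code running on the process stack, a typical setup for RTOS tasks.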
The AHB is a high performance bus designed to interface memory and fast I/Os directly to the CPU.
The APB is designed for lower-speed, low-power memory and peripherals.
Optional Memory Protection Unit (MPU) that can enhance the reliability and security of the system by:
Defining Memory Regions: The MPU can define up to 8 regions, each with specific access permissions
and attributes.
Access Control: It can control access permissions (read, write, execute) for different regions of memory.
Fault Handling: Generates faults when illegal memory accesses are detected, allowing for robust error handling.
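In the ARMv7-M MPU used by the Cortex-M4, each region's size must be a power of two from 32 bytes up to 4 GB, it is encoded in a SIZE field as size = 2^(SIZE+1), and the region base address must be aligned to the region size. A small C sketch of those checks, with hypothetical helper names (programming the actual MPU registers is not shown):

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the MPU SIZE field for a region of 'size' bytes, or -1 if the size
   is not a power of two between 32 bytes and 4 GB. Region size = 2^(SIZE+1). */
int mpu_size_field(uint64_t size)
{
    if (size < 32 || size > (1ull << 32) || (size & (size - 1)) != 0) {
        return -1;
    }
    int exponent = 0;
    while ((1ull << exponent) < size) {
        exponent++;                   /* exponent = log2(size) */
    }
    return exponent - 1;
}

/* A region base address must be a multiple of the region size. */
bool mpu_base_aligned(uint32_t base, uint64_t size)
{
    return size != 0 && ((uint64_t)base % size) == 0;
}
```

For example, a 1 KB region uses SIZE = 9, and a 64 KB region may start at 0x20000000 but not at 0x20000100.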
The non-memory-mapped region consists of the CPU's internal general-purpose and special-function registers.
These registers do not have addresses; they are accessed by their register names in assembly language (the special registers through the MRS and MSR instructions).
The addressable memory space of a microcontroller or microprocessor depends on their address bus width.
For instance, an ARM Cortex-M4 based microcontroller uses 32-bit addresses, so its addressable memory space is 2^32 bytes, which is equal to 4 gigabytes.
Each byte of this memory space has a unique address, and the Cortex-M4 can read data from or write data to each memory location.
Memory Mapped Peripherals Registers
In contrast to the CPU's internal registers, microcontrollers also have a memory-mapped I/O region that belongs to the microcontroller's peripherals, such as GPIO, ADC, UART, SPI, I2C, timers and any other peripherals supported by the specific device.
These peripheral registers have addresses and are read and written like ordinary memory locations.
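Memory-mapped peripheral registers are normally accessed in C through `volatile` pointers, so that the compiler performs every read and write exactly as written. The sketch below shows the usual read-modify-write helpers; the function names are hypothetical, and in the usage example a plain variable stands in for a real register so the code can run on a host. On a device, the pointer would be built from a peripheral register address taken from the vendor's reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Set the bits given in 'mask' without disturbing the other bits. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg = *reg | mask;              /* read, modify, write back */
}

/* Clear the bits given in 'mask'. */
void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg = *reg & ~mask;
}

/* Write 'value' into a 'width'-bit field starting at bit 'shift'. */
void reg_write_field(volatile uint32_t *reg, unsigned shift,
                     unsigned width, uint32_t value)
{
    assert(width >= 1 && width < 32 && shift + width <= 32);
    uint32_t field_mask = ((1u << width) - 1u) << shift;
    *reg = (*reg & ~field_mask) | ((value << shift) & field_mask);
}
```

Because each helper reads the whole register and writes it back, it is not atomic; if an interrupt handler also modifies the same register, the sequence needs protection.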
Byte: 8 bits; Halfword: 2 bytes (16 bits); Word: 4 bytes (32 bits)
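Cortex-M4 data accesses are usually little-endian (big-endian is an implementation option), so the least significant byte of a halfword or word is stored at the lowest address. A host-side sketch, assuming little-endian storage and a hypothetical byte array standing in for memory, shows how one word contains two halfwords and four bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a halfword (2 bytes) stored little-endian at byte offset 'addr'. */
uint16_t read_half_le(const uint8_t *mem, size_t addr)
{
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/* Read a word (4 bytes) stored little-endian at byte offset 'addr'. */
uint32_t read_word_le(const uint8_t *mem, size_t addr)
{
    assert(addr % 4 == 0);           /* keep word accesses aligned in this sketch */
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

With the bytes 0x78, 0x56, 0x34, 0x12 at addresses 0 to 3, the word at address 0 is 0x12345678, the halfword at address 0 is 0x5678 and the halfword at address 2 is 0x1234.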
GPIO Port Mapping
The Cortex-M4 supports various operating modes and states that help manage power consumption and performance.
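Operating Modes: There are two modes.
1. Thread Mode:
Description: The mode in which the processor executes normal application code.
Context: Entered on reset and on return from an exception to application code.
Privileges: Can operate at the privileged or unprivileged level, as selected by CONTROL[0] (nPRIV).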
2. Handler Mode:
Description: The mode in which the processor executes exception or interrupt handlers.
Context: Entered when an interrupt or exception occurs.
Privileges: Always operates in a privileged level.
Processor States: The Cortex-M4 has several processor states related to power and operational modes.
1. Active State:
Description: The normal operational state where the CPU executes instructions.
Power Consumption: Full power.
2. Sleep State:
Sleep-on-Exit:
Description: The processor enters sleep mode automatically after exiting an interrupt service routine.
Normal Sleep (Sleep-Now):
Description: Entered by executing the WFI (Wait For Interrupt) or WFE (Wait For Event) instructions.
Power Consumption: Reduced power consumption; the core clock is stopped, but the system clock continues
running.
SysTick Timer: Can be used to implement a tick-less idle mode in an RTOS to reduce power consumption.
NVIC: Manages low-latency interrupt handling.
Sleep and Deep Sleep Modes: Utilize the WFI and WFE instructions to manage low power states.
Clock Gating: The processor can gate clocks to unused peripherals to save power.
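SysTick is a 24-bit down-counter; to generate a periodic tick it is loaded with (core clock / tick rate) - 1. A minimal sketch of that calculation, with a hypothetical helper that rejects periods that do not fit in 24 bits (CMSIS's `SysTick_Config` performs a similar check):

```c
#include <assert.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* 24-bit counter */

/* Compute the SysTick reload value for a tick rate of 'tick_hz'.
   Returns -1 if the required period does not fit in 24 bits. */
int64_t systick_reload(uint32_t core_hz, uint32_t tick_hz)
{
    assert(tick_hz > 0);
    uint32_t ticks = core_hz / tick_hz;      /* core cycles per tick */
    if (ticks == 0 || ticks - 1u > SYSTICK_MAX_RELOAD) {
        return -1;
    }
    return (int64_t)ticks - 1;
}
```

A 1 kHz tick from a 168 MHz core needs a reload value of 167999, while a 1 Hz tick from a 100 MHz core would need more than 24 bits; a tick-less idle mode has to respect the same limit when it programs long sleep periods.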
An embedded system using a Cortex-M4 might have a power management strategy as follows:
Normal Operation: The system runs in Active State executing application code in Thread Mode.
Idle Periods: When the system is idle, it enters Sleep State using the WFI instruction to reduce power
consumption while waiting for an interrupt.
Low Power Modes: During longer periods of inactivity, the system enters Deep Sleep State.
Interrupt Handling: When an interrupt occurs, the processor switches to Handler Mode, processes the
interrupt, and depending on the configuration, either returns to Active State or remains in Sleep State if Sleep-
on-Exit is enabled.
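Which kind of sleep WFI produces is selected in the System Control Register (SCR): bit 1 is SLEEPONEXIT and bit 2 is SLEEPDEEP. The sketch below computes an SCR value for the strategy described above; the helper name is hypothetical, and on a target the result would be written to SCB->SCR before executing WFI. How deep sleep maps to a vendor's Stop or Standby mode depends on additional vendor-specific registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCR_SLEEPONEXIT (1u << 1)   /* re-enter sleep after the last ISR returns */
#define SCR_SLEEPDEEP   (1u << 2)   /* WFI/WFE enter deep sleep instead of sleep */

/* Return 'scr' updated for the requested low-power behaviour;
   all other bits are preserved. */
uint32_t scr_configure(uint32_t scr, bool sleep_on_exit, bool deep_sleep)
{
    scr &= ~(SCR_SLEEPONEXIT | SCR_SLEEPDEEP);
    if (sleep_on_exit) {
        scr |= SCR_SLEEPONEXIT;
    }
    if (deep_sleep) {
        scr |= SCR_SLEEPDEEP;
    }
    return scr;
}
```

Short idle periods would use `scr_configure(scr, false, false)` (normal sleep), while long periods of inactivity would set `deep_sleep`.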
EXCEPTIONS and INTERRUPTS
Exceptions:
Exceptions are anomalies or unusual conditions that occur during the execution of a program.
They can be due to various reasons like programming errors, hardware failures, or resource limitations.
In the context of processors, exceptions (also called traps or faults) are events that disrupt the normal flow of
execution and are typically used to handle unusual conditions such as hardware malfunctions, illegal
instructions, or other types of errors.
EXCEPTION HANDLING:
Exception handling in ARM processors is a fundamental part of how ARM architectures deal with unexpected
or exceptional conditions during execution.
2. Switch the Mode: The processor switches to a specific mode corresponding to the exception (e.g., Undefined mode on classic ARM processors; a Cortex-M4 enters Handler mode for every exception).
3. Vector Table Lookup: The processor uses the exception type to look up the appropriate handler address in the vector table.
4. State Save: The current program counter (PC) and processor status are saved.
6. Exception Handling: The handler processes the exception (e.g., logging or correcting the error).
7. Return to Normal Execution: The processor restores the saved state and resumes normal execution.
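On the Cortex-M4, the state-save step is performed by hardware: on exception entry the processor pushes a basic frame of eight words (R0-R3, R12, LR, PC, xPSR) onto the current stack, or an extended frame of 26 words when a floating-point context is active (lazy stacking can defer writing the FP registers but still reserves their space). A small sketch of the resulting frame sizes; the helper name is hypothetical and the optional 4-byte alignment padding is ignored.

```c
#include <stdbool.h>
#include <stddef.h>

#define BASIC_FRAME_WORDS  8   /* R0-R3, R12, LR, PC, xPSR */
#define FP_EXTRA_WORDS    18   /* S0-S15, FPSCR, one reserved word */

/* Bytes pushed by hardware on exception entry (alignment padding ignored). */
size_t exception_frame_bytes(bool fp_context_active)
{
    size_t words = BASIC_FRAME_WORDS + (fp_context_active ? FP_EXTRA_WORDS : 0);
    return words * 4;
}
```

The 32-byte versus 104-byte difference is one reason stack sizes for tasks that use the FPU are usually chosen larger.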
Interrupts: Signals emitted by hardware or software indicating an event that needs immediate attention.
They temporarily halt the CPU's current activities to execute a function called an interrupt handler or
interrupt service routine (ISR).
Interrupt Latency: The amount of time between the generation of an interrupt and the start of its handling.
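Interrupt latency is usually quoted in clock cycles and converted to time by dividing by the core clock. ARM quotes 12 cycles from interrupt to the first handler instruction for the Cortex-M4 under ideal conditions (zero-wait-state memory, no competing higher-priority activity); the sketch below, with a hypothetical helper name, converts such a figure to nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a latency in clock cycles to nanoseconds (rounded down). */
uint64_t latency_ns(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz > 0);
    return ((uint64_t)cycles * 1000000000u) / core_hz;
}
```

At 100 MHz, 12 cycles correspond to 120 ns; at 168 MHz, to about 71 ns. Flash wait states and other bus activity can lengthen the real figure.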
Interrupt handling in ARM processors involves managing signals from hardware devices that require immediate
attention.
This allows peripheral devices like timers, keyboards, and network cards to signal the processor when they need to be serviced.
Steps in Handling an Interrupt
1. Interrupt Occurrence: A hardware device triggers an interrupt.
2. Mode Switch: The processor switches to IRQ or FIQ mode (classic ARM); a Cortex-M4 instead enters Handler mode.
3. Vector Table Lookup: The processor fetches the address of the interrupt handler from the vector table.
4. State Save: The current state (registers) is saved so that the interrupted task can resume correctly later.
5. Handler Execution: The interrupt handler is executed to service the interrupt.
6. State Restore: The saved state is restored.
7. Return to Normal Execution: The processor resumes execution of the interrupted task.
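On the Cortex-M4, the return step is triggered by loading a special EXC_RETURN value into the PC; the processor places this value in LR on exception entry. Its low bits tell the hardware which mode and stack to return to and which frame size to unstack: bit 2 selects PSP (1) or MSP (0), bit 3 selects Thread (1) or Handler (0) mode, and bit 4 = 0 means the frame includes FP registers. A decoding sketch with hypothetical names, covering commonly seen values such as 0xFFFFFFF1, 0xFFFFFFF9 and 0xFFFFFFFD:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool to_thread_mode;   /* bit 3: 1 = return to Thread mode, 0 = Handler mode */
    bool uses_psp;         /* bit 2: 1 = unstack from PSP, 0 = from MSP */
    bool extended_frame;   /* bit 4: 0 = frame includes FP registers */
} exc_return_info;

/* EXC_RETURN values have bits 31:5 all set to one. */
bool is_exc_return(uint32_t value)
{
    return (value & 0xFFFFFFE0u) == 0xFFFFFFE0u;
}

exc_return_info decode_exc_return(uint32_t value)
{
    exc_return_info info;
    info.to_thread_mode = (value & (1u << 3)) != 0;
    info.uses_psp       = (value & (1u << 2)) != 0;
    info.extended_frame = (value & (1u << 4)) == 0;
    return info;
}
```

For example, 0xFFFFFFFD returns to Thread mode using the PSP with a basic frame, which is the usual case when an RTOS task is resumed.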
ARM Cortex-M4 processor offers flexible exception and interrupt management,
providing efficient and versatile control over how the system responds to both
synchronous (exceptions) and asynchronous (interrupts) events.
1. Sleep Modes
Sleep Mode: The CPU clock is stopped while peripheral clocks continue to run. This mode allows the processor
to wake up quickly when needed.
Deep Sleep Mode: Both the CPU and peripheral clocks are stopped, significantly reducing power consumption.
The wake-up time is longer than in Sleep Mode but still relatively fast.
Stop Mode: Most of the system clocks are stopped, reducing power consumption further. The system requires a
longer time to wake up compared to Sleep and Deep Sleep modes.
Standby Mode: The highest power-saving mode, where most of the system is powered down, retaining only
essential information in SRAM. This mode has the longest wake-up time.
2. Dynamic Voltage and Frequency Scaling (DVFS)
Cortex-M4 can adjust its operating voltage and frequency according to the required performance, allowing
the system to reduce power consumption when full performance is not necessary.
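Why scaling voltage together with frequency pays off can be seen from the standard first-order model of dynamic CMOS power, where α is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. This is general background rather than a Cortex-M4 specific figure; the worked ratio assumes, for illustration, that halving the clock allows the supply to drop to 80 % of its original value.

```latex
\begin{aligned}
P_{\text{dyn}} &\approx \alpha\, C\, V^{2} f \\
\frac{P'_{\text{dyn}}}{P_{\text{dyn}}} &= \frac{\alpha\, C\, (0.8V)^{2}\, (f/2)}{\alpha\, C\, V^{2} f} = 0.64 \times 0.5 = 0.32
\end{aligned}
```

Halving the frequency alone would only halve dynamic power; lowering the voltage as well brings it to about a third, because power depends on the square of the voltage.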
5. Clock Gating
Cortex-M4 can gate clocks to unused peripherals, stopping the clock supply to these peripherals and thus
reducing dynamic power consumption.
6. Low-Power Run and Low-Power Sleep Modes
These modes allow the system to operate at reduced clock frequencies and voltages while maintaining
functionality, balancing performance and power consumption.
Thumb-2 Instruction Set: The Cortex-M4 uses the Thumb-2 instruction set, providing a balance between
code density and performance, contributing to power efficiency.
Digital Signal Processing (DSP) Instructions: The integrated DSP instructions and single-cycle multiply-
accumulate (MAC) operations allow efficient execution of complex algorithms, reducing the need for higher
frequency operation.