ARM Cortex-M memory model
Introduction
The ARM Cortex-M memory model is a crucial aspect of embedded system design, shaping
how data is managed, accessed, and protected within microcontroller-based applications.
This comprehensive guide delves into key concepts of memory management, including
memory space, alignment, endianness, semaphores, synchronization, atomicity, memory
attributes, caches, and the Memory Protection Unit (MPU). By understanding these building
blocks, developers can create efficient, secure, and high-performance embedded systems,
ensuring memory operations run smoothly and predictably every time.
Understanding Address Alignment in ARM Cortex-M Processors
ARM Cortex-M processors are designed with a carefully structured address space and alignment
rules to ensure efficient and reliable operation. These rules govern two critical processes:
instruction execution and instruction fetching. Let's explore these concepts step by step.
ARM Cortex-M processors feature a 4 GB address space (32-bit), spanning from 0x00000000 to
0xFFFFFFFF. This space is divided into regions designated for different purposes, such as code,
SRAM, peripherals, and system memory. Each region has specific access requirements and
alignment constraints, critical for maintaining system stability and performance.
Within this space, alignment rules determine how the processor accesses and interprets memory
addresses during instruction execution and fetching. These rules ensure that memory accesses
remain efficient and compliant with the architecture's Thumb state requirements.
• Word-Aligned Access:
o Addresses divisible by 4 (0b...00 in binary).
o Required for certain 32-bit operations or memory accesses.
• Half-Word Aligned Access:
o Addresses divisible by 2 (0b...0 in binary).
o Typical for Thumb and Thumb-2 instructions.
• Aligned Access: Occurs when the address adheres to the alignment rules of the instruction.
These operations execute smoothly without issue.
• Unaligned Access:
o Some instructions (e.g., LDR, STR) support unaligned access and execute without
error.
o Others (e.g., LDM, STM) do not support unaligned access. If such instructions
encounter unaligned addresses, they trigger a UsageFault.
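This distinction matters in everyday C as well. A minimal host-runnable sketch of the portable idiom for reading from a possibly unaligned buffer (the function name `read_u32` is illustrative; the expected values assume a little-endian machine):

```c
#include <stdint.h>
#include <string.h>

/* Reading a 32-bit value from a possibly unaligned buffer.
   A direct pointer cast compiles to a single LDR, which ARMv7-M
   cores execute even when unaligned (unless UNALIGN_TRP is set),
   but it is undefined behavior in C. memcpy is the portable idiom:
   the compiler emits byte loads or an unaligned LDR as appropriate. */
uint32_t read_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* safe at any alignment */
    return v;
}
```

On Cortex-M, the compiler typically folds this `memcpy` into one instruction when it can prove the access is allowed, so there is no penalty for writing it safely.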
While instruction execution offers some flexibility, instruction fetching adheres to stricter rules.
Fetching instructions involves reading the next instruction from memory, and this process always
requires half-word alignment.
At first glance, addresses with PC[0] = 1 appear unaligned, since they are not divisible by 2.
However:
1. During Fetching: The processor automatically resets bit 0 to 0 internally to ensure proper
alignment.
2. During Execution: The T-bit is preserved for instructions like BX, BLX, or POP {…, PC}, which
rely on it to maintain the Thumb state.
This behavior ensures that instructions fetched for execution are always properly aligned, while
the T-bit remains intact for Thumb state transitions.
ARM Cortex-M processors manage a fine balance between execution and fetching:
• Instruction Execution: Flexible, supporting both aligned and unaligned access, depending
on the instruction and system configuration.
• Instruction Fetching: Strictly enforces half-word alignment by clearing PC[0] during fetch
operations.
This dual approach ensures system integrity, efficient execution, and adherence to architectural
requirements. Developers have the tools to manage and monitor unaligned access, while the
architecture handles alignment and Thumb state transitions seamlessly.
Key Takeaway
The ARM Cortex-M address space and alignment rules are integral to the processor's operation:
• Instruction execution offers flexibility with aligned and unaligned access options.
• Instruction fetching enforces strict half-word alignment while preserving the T-bit for
Thumb state transitions.
By understanding these principles, developers can optimize their applications and leverage the full
potential of ARM Cortex-M processors in embedded systems.
Understanding Endianness
Endianness determines the order in which bytes are arranged within a word in memory. In little-
endian systems, the least significant byte (LSB) is stored at the lowest memory address, whereas,
in big-endian systems, the most significant byte (MSB) occupies the lowest address. For example,
consider a 32-bit value 0x12345678 stored in memory:
• Little-Endian:
Memory: 0x00 → 0x78, 0x01 → 0x56, 0x02 → 0x34, 0x03 → 0x12
• Big-Endian:
Memory: 0x00 → 0x12, 0x01 → 0x34, 0x02 → 0x56, 0x03 → 0x78
Endianness applies at the granularity of the element being transferred: byte accesses are
unaffected, while halfword and word loads and stores have their bytes ordered according to the
selected scheme.
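The layout above is easy to verify at run time. A small host-runnable sketch using a union (the function name is illustrative):

```c
#include <stdint.h>

/* Inspecting byte order at run time: store 0x12345678 and look at
   the byte placed at the lowest address of the word. */
union word_bytes {
    uint32_t word;
    uint8_t  bytes[4];
};

int is_little_endian(void)
{
    union word_bytes u = { .word = 0x12345678u };
    return u.bytes[0] == 0x78;   /* LSB at lowest address => little-endian */
}
```

On a Cortex-M device configured little-endian (the common case, including STM32 parts), this returns 1.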
ARM Cortex-M processors, including those in the STM32 family, support selectable endianness,
determined at reset by a configuration input; it cannot be changed by software afterwards. This
feature comes with specific rules:
• Data Access: The selected endianness applies only to data accesses. Instruction fetches
always follow little-endian format.
• System Control Space (SCS): Accesses to the SCS, including critical system registers, are
fixed to little-endian.
• Configuration: The read-only AIRCR.ENDIANNESS bit in the Application Interrupt and Reset
Control Register reports the current data-access endianness. It is set at reset and cannot be
changed at run time.
Instruction alignment and byte ordering follow the processor’s little-endian conventions. A 32-bit
Thumb instruction is treated as two 16-bit halfwords (hw1 and hw2):
In memory, hw1 occupies the lower halfword address and hw2 the next one, each with its own
bytes stored in little-endian order.
Peripheral Endianness
While ARM Cortex-M processors support configurable data endianness, peripherals in the system
may follow a fixed or independent convention.
• SPI and Communication Protocols: SPI peripherals typically shift data MSB-first. Strictly,
this is a bit-transmission order rather than a memory endianness, but when multi-byte
values cross the interface, software on a little-endian processor must ensure the byte
order on the wire matches what the peer expects.
• DMA Transfers: Peripherals like the DMA2D controller can swap byte order during
transfers, which is especially useful in graphics systems. For instance, pixel data in formats
such as ARGB8888 or RGB565 might have a different endianness requirement between
memory and display controllers. The DMA controller handles these transformations
automatically, optimizing performance and reducing the need for software intervention.
• Cases Without Built-in Support: When a peripheral lacks native support for byte-order
transformations, software-level adjustments are required. ARM Cortex-M processors
provide efficient instructions for this purpose:
o REV: Reverses the byte order of a 32-bit word.
o REV16: Reverses the byte order of each halfword in a 32-bit word.
o REVSH: Reverses the byte order of a 16-bit signed halfword and sign-extends it to
32 bits.
Key Takeaway
ARM Cortex-M processors and peripherals provide flexible options for managing endianness,
enabling seamless data handling even in systems with mixed endianness conventions. By
leveraging built-in features like configurable DMA transfers or byte-swapping instructions,
developers can ensure efficient and accurate data processing across heterogeneous systems.
Semaphores and Synchronization: The Memory Backbone of RTOS
Efficient synchronization is crucial for managing shared resources in RTOS environments. At its
core lies atomicity, a fundamental concept underpinning both synchronization primitives like
semaphores and memory operations. In ARM architectures, atomicity is more than a design
principle—it’s intricately embedded in the memory model and instruction set architecture (ISA).
Defining Atomicity
In the context of the ARM architecture, atomicity refers to the indivisibility of memory operations.
An operation is atomic if it is completed entirely or not at all, leaving the memory in a consistent
state throughout.
1. Single-Copy Atomicity
o A memory access (read or write) is single-copy atomic if:
▪ After a series of writes, the value of the operand is always the result of one
complete write—not a mix of two writes.
▪ A read operation on the operand either retrieves the value before a write or
the value after the write, never an inconsistent mix of both.
o Example: In the ARMv7-M architecture, byte, halfword (aligned), and word (aligned)
accesses are guaranteed to be single-copy atomic.
2. Multi-Copy Atomicity
o In multiprocessing systems, multi-copy atomicity ensures that:
▪ All writes to a memory location are observed in the same order by all
processors.
▪ A write is not considered complete until all observers (e.g., cores or devices)
recognize it.
o Not all memory types support multi-copy atomicity; for example, writes to normal
memory are not multi-copy atomic.
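The practical consequence of single-copy atomicity stopping at word size: a 64-bit object on ARMv7-M is moved as two 32-bit accesses, so an interrupt between the halves can produce a "torn" mix of old and new values. A host-runnable sketch of the classic retry-loop defense (the type and function names are illustrative):

```c
#include <stdint.h>

/* A 64-bit counter an ISR updates as two 32-bit words. Byte,
   aligned-halfword, and aligned-word accesses are single-copy
   atomic on ARMv7-M; a 64-bit read is not, so the reader must
   detect and retry a torn read. */
typedef struct { volatile uint32_t lo, hi; } split64_t;

uint64_t read_consistent(const split64_t *c)
{
    uint32_t h1, l, h2;
    do {                 /* retry until hi is stable across the read */
        h1 = c->hi;
        l  = c->lo;
        h2 = c->hi;
    } while (h1 != h2);
    return ((uint64_t)h1 << 32) | l;
}
```

This pattern assumes the high word changes rarely (e.g. a tick counter); when that does not hold, a critical section is the simpler fix.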
Synchronization mechanisms like semaphores rely on atomically reading and modifying data to
prevent interference during critical sections. ARM implements semaphores using hardware-level
atomic instructions that ensure mutual exclusion, preventing other tasks from accessing a shared
resource. This operation is vital for avoiding race conditions and maintaining consistency in real-
time systems. ARM's approach to semaphores and other synchronization primitives evolved
through key architectural advancements.
In simpler systems, atomicity was achieved by disabling interrupts using instructions like CPSID.
This approach worked by temporarily halting other operations during a critical section.
While effective for uniprocessor systems, this method had limitations in multiprocessor or multi-
bus systems:
• Other processors or DMA masters could still access memory, leading to potential
contention.
• It was inefficient, blocking all interrupts and delaying unrelated tasks.
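On a single-core Cortex-M this interrupt-masking pattern remains the simplest critical section. A minimal sketch using CMSIS intrinsics (which emit CPSID/MSR under the hood); saving and restoring PRIMASK, rather than blindly re-enabling interrupts, keeps the function safe when called from an already-masked context:

```c
#include <stdint.h>
#include "cmsis_compiler.h"   /* __get_PRIMASK, __disable_irq, __set_PRIMASK */

/* Increment a counter shared with an ISR (target-only sketch). */
volatile uint32_t shared_count;

void shared_count_inc(void)
{
    uint32_t primask = __get_PRIMASK();  /* remember the current mask    */
    __disable_irq();                     /* CPSID i                      */
    shared_count++;                      /* critical section             */
    __set_PRIMASK(primask);              /* restore, not blanket enable  */
}
```

The limitations listed above still apply: this protects only against interrupts on the same core, not against DMA masters or other processors.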
The introduction of the SWP (Swap) instruction provided a hardware-based solution for atomicity:
• Mechanism: SWP locked the system bus during a read-modify-write operation, preventing
other masters from accessing memory.
• Usage: Commonly used for implementing binary semaphores.
• Drawbacks:
o Performance Bottlenecks: The bus lock delayed all memory operations, even those
unrelated to the semaphore.
o Scalability Issues: In systems with high memory contention, SWP's blocking nature
hindered real-time performance.
As systems grew more complex, it became clear that SWP was not scalable, prompting ARM to
seek a more efficient solution.
With the ARMv6 architecture, ARM introduced the LDREX/STREX instruction pair, which
revolutionized synchronization mechanisms in multiprocessor systems. These instructions are
designed to provide atomic operations on memory without requiring a bus lock, allowing for
improved scalability and efficiency.
How It Works:
1. LDREX (Load-Exclusive):
o This instruction reads a value from memory and simultaneously marks the memory
address with an "exclusive access monitor." This monitor tracks whether any other
processor or memory bus operation modifies the memory address after it has been
loaded.
o In essence, LDREX reserves the address for the current processor, signaling that a
subsequent STREX operation will commit a new value at that address only if no other
operation has intervened.
2. STREX (Store-Exclusive):
o After loading the value with LDREX, the STREX instruction attempts to store a new
value to the same memory address.
o STREX succeeds (i.e., the store operation is performed) only if the exclusive access
monitor confirms that no other processor or operation has modified the memory
location in the meantime.
o If another processor or bus operation has updated the address in question, the
exclusive monitor becomes invalid, causing STREX to fail (store not completed). The
CPU can then retry the operation or handle the failure according to the
synchronization protocol.
The memory monitor in LDREX and STREX ensures atomicity by preventing concurrent writes to
the same address across processors. It doesn't require a global bus lock, allowing other operations
to continue without delay, making it non-blocking and improving performance in multi-core
systems.
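The LDREX/STREX sequence described above can be sketched with the CMSIS exclusive-access intrinsics. The semaphore encoding (1 = free, 0 = taken) and the function name are assumptions for illustration; this is target-only code:

```c
#include <stdint.h>
#include "cmsis_compiler.h"   /* __LDREXW, __STREXW, __CLREX, __DMB */

/* Try to take a binary semaphore. Returns 1 on success, 0 if the
   semaphore was already taken or if another master intervened. */
int sem_try_take(volatile uint32_t *sem)
{
    if (__LDREXW(sem) == 0u) {   /* already taken                     */
        __CLREX();               /* drop the exclusive reservation    */
        return 0;
    }
    if (__STREXW(0u, sem) != 0u) /* monitor invalidated: store failed */
        return 0;                /* caller may retry                  */
    __DMB();                     /* order later accesses after the take */
    return 1;
}
```

Note that on failure the caller simply retries or backs off; no bus is ever locked, which is exactly the scalability advantage over SWP.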
The monitor is scalable through two mechanisms: a local monitor and a global monitor. The local
monitor tracks exclusive access for a specific processor, ensuring atomicity within that processor's
context. The global monitor ensures consistency across multiple processors, allowing each core to
access memory without contention. This two-tier approach improves scalability and performance,
especially in multi-core environments.
Figure 2: Local Monitor Finite-State-Machine
On Cortex-M microcontrollers, ARM also introduced bit-banding that could be used to simplify
synchronization:
• Mechanism: Bit-banding maps each bit of a specific memory region (bit-band region) to an
alias memory address in a separate address space. Writing to the alias address directly and
atomically sets or clears the corresponding bit in the original memory location—eliminating
the need for traditional bitwise operations.
• Use Case: In synchronization, tasks or clients can treat individual bits in a shared variable
as semaphores. By writing to their corresponding alias addresses, these bits can be
atomically toggled, ensuring proper synchronization without the complexity of manual
masking or bit manipulation.
Example:
On an ARM Cortex-M3, the bit-band region starts at 0x20000000 (SRAM), while its alias
region begins at 0x22000000. A single bit in the bit-band region is linked to a specific alias
address. Writing 1 or 0 to the alias address directly sets or clears the bit. For instance, a
shared variable can be mapped, and each bit can act as a flag or semaphore for different
tasks.
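The alias-address arithmetic is fixed by the architecture and can be computed in portable C. A host-runnable sketch of the mapping (the helper name is illustrative):

```c
#include <stdint.h>

/* SRAM bit-band mapping on Cortex-M3/M4:
     alias = alias_base + (byte_offset * 32) + (bit_number * 4)
   Each bit of the bit-band region gets its own word-sized alias. */
#define SRAM_BB_BASE   0x20000000u
#define SRAM_BB_ALIAS  0x22000000u

static inline uint32_t bitband_alias(uint32_t addr, uint32_t bit)
{
    return SRAM_BB_ALIAS + ((addr - SRAM_BB_BASE) * 32u) + (bit * 4u);
}

/* On target, writing 1 to *(volatile uint32_t *)bitband_alias(a, b)
   atomically sets bit b of the byte at address a; writing 0 clears it. */
```

For instance, bit 7 of the byte at 0x20000004 maps to alias address 0x2200009C.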
Figure 5: Bit-band mapping
Atomicity or exclusivity?
Exclusive instructions (e.g., LDREX/STREX) and atomic operations both ensure safe and reliable
access to shared resources, providing mechanisms to prevent race conditions. While atomicity in
the ARM architecture offers hardware-backed guarantees for indivisible memory operations,
LDREX/STREX uses a reservation-based approach to achieve atomicity in software. In smaller
systems, their behavior may appear similar, but as systems scale, challenges like performance
bottlenecks and contention arise. This raises the question: Are exclusive instructions sufficient for
ensuring atomicity, or do atomic operations offer a more scalable and efficient alternative?
• Scope: Atomicity guarantees for aligned memory accesses are inherently provided by the
hardware. These operations (e.g., single-copy atomic loads/stores) ensure that reads or
writes happen as indivisible units, requiring no software intervention.
• Consistency Across Systems: In multi-core systems, atomicity depends on the memory
type:
o Strongly-Ordered or Device memory ensures global consistency using multi-copy
atomicity.
o Normal memory does not guarantee multi-copy atomicity, leading to potential
inconsistencies between cores.
• Performance: These operations are fast and efficient, as no retries are required—atomicity
is achieved directly through the hardware memory subsystem.
In ARMv8.1-A and beyond, new atomic instructions like LDADD and CAS replace the reservation
model by directly performing RMW operations atomically in hardware, with no retry loop and no
exclusive monitor to invalidate.
Key Comparisons
The choice of approach depends on the application and system scale. For simple tasks, disabling
interrupts and re-enabling them after critical operations may be sufficient. For multi-core or highly
concurrent systems, developers and silicon designers must balance complexity, performance, and
reliability to optimize atomicity for their specific needs.
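From C, all of these mechanisms hide behind the same portable source: C11 atomics. On ARMv8.1-A the operations below can compile to single LDADD and CAS instructions, while on ARMv7-M the same code lowers to a LDREX/STREX retry loop. A host-runnable sketch (the function names are illustrative):

```c
#include <stdatomic.h>

static atomic_uint next_ticket;   /* shared ticket counter */

unsigned take_ticket(void)
{
    /* atomic read-modify-write: returns the previous value */
    return atomic_fetch_add(&next_ticket, 1u);
}

int try_claim(atomic_uint *flag)
{
    unsigned expected = 0u;
    /* succeeds only if *flag was 0; maps to CAS where available */
    return atomic_compare_exchange_strong(flag, &expected, 1u);
}
```

Writing the portable form and letting the compiler pick the best instruction for the target is usually the right trade-off between complexity and performance.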
Memory Attributes in ARM Cortex Processors
Memory attributes play a vital role in embedded systems, defining how processors access, order,
and synchronize memory regions. By tailoring these attributes, developers can optimize
performance, ensure data consistency, and safeguard against unexpected behavior.
1. Normal Memory:
o Used for program code and general data storage.
o Examples: Flash, SRAM, ROM, DRAM.
o Accesses can be reordered and buffered for performance optimization.
o Suitable for storage without side effects.
2. Device Memory:
o Designed for peripherals like FIFOs, interrupt controllers, and configuration
registers.
o Accesses can have side effects (e.g., modifying peripheral state).
o Enforces stricter rules to ensure system correctness.
3. Strongly-Ordered Memory:
o The strictest memory type: accesses reach memory in program order, and each
access must complete before the next begins.
o Always non-cacheable, and writes are not buffered.
o Example: certain system control and configuration registers.
Each memory type has additional attributes influencing access behavior, including:
• Cacheability: Determines if the memory region can be cached. A region can be:
o Write-Through Cacheable: Any write operation is immediately reflected in both the
cache and the main memory.
o Write-Back Cacheable: Writes are initially done to the cache and later written back
to memory, with options for:
▪ Write-Allocate: The cache is loaded with data on a write miss.
▪ No Write-Allocate: Data is not loaded into the cache on a write miss, avoiding
the need to update the cache for every write.
o Non-cacheable: Data is directly accessed from or written to memory, bypassing the
cache, ensuring no cache interference.
• Shareability: Indicates whether the memory is shared across multiple cores or processors.
o Shareable: The memory can be accessed by multiple processors, ensuring coherent
data sharing.
o Non-shareable: The memory is exclusive to a single processor, useful for local or
private data, where accessing it from different cores may lead to inconsistencies.
• Bufferability: Allows write operations to be buffered before being written to memory.
o For regions like Device memory, write operations can be buffered to optimize
performance, but the buffering must respect the order, size, and number of accesses
specified by the program.
o In Normal memory, buffering improves throughput, but in certain cases (such as
with Strongly-ordered memory), buffering may be restricted to maintain strict
access order.
In ARM Cortex processors, these attributes are managed using the Memory Protection Unit
(MPU) or the Memory Attribute Indirection Register (MAIR).
Memory attributes affect performance, determinism, and correctness. For example, Normal
memory allows reordering for higher throughput but isn't suitable for peripherals needing precise
access order. Strongly-ordered memory prioritizes correctness over performance. In embedded
systems, choosing the right memory attribute is crucial—Normal memory works well for program
storage (e.g., Flash or SRAM), while Device memory is used for peripheral registers (e.g., UART,
GPIO). ARM tools and system registers help configure these attributes to meet system
requirements.
Memory Barriers
To address synchronization and memory access reordering problems, ARM processors implement
several types of memory barriers. These barriers offer solutions for ensuring that operations are
completed in the correct order and that side effects from previous instructions are visible to
subsequent ones. Specifically:
• Data Memory Barrier (DMB): Ensures memory accesses (loads and stores) are completed
in the correct order. It guarantees that operations before the barrier finish before any that
follow it.
• Data Synchronization Barrier (DSB): A stronger barrier that makes sure all memory
operations and context changes are fully completed before any subsequent instructions
are executed. It’s crucial for synchronizing with peripherals or ensuring that state changes
are visible before proceeding.
• Instruction Synchronization Barrier (ISB): Forces the pipeline to flush, ensuring any
changes to system control settings or context are fully applied before fetching the next
instructions. This ensures consistency after altering processor states or peripherals.
• Speculative Store Bypass Barriers (SSBB, PSSBB): Prevent speculative loads from
bypassing earlier stores to the same location, ensuring they do not observe stale
data.
• Consumption of Speculative Data Barrier (CSDB): Ensures that speculative load
instructions don’t affect the results of later operations, particularly when a load has been
speculatively executed but not yet completed.
These barriers solve synchronization issues, particularly in cases of memory reordering or when
working with peripheral devices. They ensure that the software behaves predictably across
different execution stages.
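A typical use of these barriers: a peripheral must not be started until all preceding buffer writes have actually reached memory. A target-only sketch using the CMSIS barrier intrinsics; the register name and address are hypothetical, standing in for a real device's "go" register:

```c
#include <stdint.h>
#include "cmsis_compiler.h"   /* __DSB (CMSIS barrier intrinsic) */

/* Hypothetical memory-mapped "start transfer" register. */
#define DMA_START (*(volatile uint32_t *)0x40026000u)

void start_transfer(uint32_t *buf, uint32_t word)
{
    buf[0] = word;   /* Normal memory: write may be buffered/reordered */
    __DSB();         /* drain all outstanding writes before proceeding */
    DMA_START = 1u;  /* Device memory: kicks off the transfer          */
}
```

Without the DSB, the processor could issue the Device-memory write before the buffered Normal-memory write became visible, and the peripheral would read stale data.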
• Instruction Fetches: Instruction fetches must access only Normal memory. Fetching
instructions from Device or Strongly-ordered memory is architecturally unpredictable.
• Access Privileges: Memory regions can be restricted based on the privilege level:
o Privileged Accesses: Allowed during privileged execution (e.g., supervisor mode).
o Unprivileged Accesses: Allowed when running in non-privileged mode.
o A MemManage exception occurs if the processor tries to access a region with
insufficient privileges.
The XN attribute marks memory regions as non-executable, preventing code execution and
triggering a MemManage exception if execution is attempted. This protects against attacks like
code injection, ensuring that code cannot run from non-executable areas such as peripheral
memory or mounted devices (e.g., USB or SD cards).
Key Takeaway
Memory attributes in ARM Cortex processors define how memory regions are accessed, ordered,
and synchronized, affecting performance and correctness. Memory barriers, like DMB and DSB,
ensure operations occur in the correct order and synchronize with peripherals. Access can also be
restricted based on privilege levels, ensuring safe and predictable system behavior. These features
help optimize memory usage in embedded systems.
Protected Memory System Architecture in ARM Cortex-M Processors
In ARM Cortex-M processors, the MPU serves as an optional but vital component that controls
access rights to various memory regions. Its primary purpose is to protect system memory by
defining access permissions for different regions within the address space. This protection ensures
that memory is accessed only by authorized software, helping to prevent errors, crashes, or
unauthorized access to sensitive system data.
The MPU works by dividing the memory into regions and assigning specific attributes to each
region. These attributes dictate whether a region can be read from, written to, or executed,
depending on the access level (privileged or unprivileged). The number of regions the MPU can
manage varies across ARM architectures, from 16 regions in ARMv7-M to a more flexible
configuration in ARMv8-M.
The MPU in ARMv7-M processors supports up to 16 memory regions, and in architectures like
ARMv8-M, the regions are defined by a base and limit address. This allows developers to easily
configure the system memory map and protect critical sections of memory from unauthorized
access.
In ARMv7-M (Cortex-M3, M4, and M7), each region can be subdivided into up to eight
subregions, provided the region is large enough (at least 256 bytes). This flexibility ensures that
developers can fine-tune the memory protection based on the needs of their application.
However, in ARMv8-M (Cortex-M33), the regions are more flexible, and subregions are no longer
used, allowing for simpler and more flexible memory configurations.
When enabled, the MPU plays a central role in defining the system’s memory map. It manages
access rights to physical memory addresses, ensuring that the processor enforces proper access
control based on the configured memory regions. For example, the Private Peripheral Bus (PPB)
and system space always have default memory attributes, and any unauthorized access triggers a
fault.
For the MPU to function, it must first be enabled by setting a global enable bit in the control
register. If the MPU is not enabled, the processor will bypass the MPU configuration and follow
the default memory map. When enabled, the MPU checks memory accesses against the defined
permissions. If an access attempt violates the defined permissions, a fault is raised, ensuring that
only authorized software can interact with critical system resources.
When the MPU is disabled, memory accesses are not subject to the protection rules. This means
that both privileged and unprivileged accesses bypass permission checks and use the default
memory map. In this state, instruction accesses that attempt to execute from regions marked as
"Execute Never" will trigger a MemManage fault. However, data accesses do not undergo any
permission checks, and therefore cannot cause aborts.
The behavior of the system when the MPU is disabled also affects caching and speculative
operations. Cacheability is controlled by specific bits in the Control Register (CCR), and program
flow prediction or speculative fetches continue to operate based on the default memory
configuration. These features ensure that the processor maintains predictable behavior even
when the MPU is not actively enforcing memory protection.
Configuring the MPU involves interacting with a set of control registers that require privileged
access for reading and writing. If an unprivileged access is attempted on the MPU registers, a
BusFault is triggered. These registers include:
• MPU Type Register: This register provides details about the number of supported regions
and whether the MPU is present in the system.
• MPU Control Register: The MPU_CTRL register includes the global enable bit, which must
be set to activate the MPU.
• MPU Region Number Register: This register selects the current region, which links to its
associated base address and attributes.
• MPU Region Base Address Register: Defines the starting address of the region.
• MPU Region Attribute and Size Register: This controls various attributes of the region, such
as its size, access permissions, memory type, and sub-region access.
Each region in the MPU has its own enable bit. When a region is enabled, its associated access
rights and attributes are enforced. In ARMv7-M implementations without an MPU, only the
MPU_TYPE register is implemented (its region count reads as zero), and all other MPU registers
are reserved.
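Putting the registers together, a minimal ARMv7-M configuration sketch using the CMSIS register definitions. The region choice (32 KB of read-only, executable Flash at 0x08000000) is illustrative, not a recommendation; target-only code:

```c
#include "core_cm4.h"   /* CMSIS: MPU register/bit definitions, __DSB, __ISB */

void mpu_setup(void)
{
    MPU->RNR  = 0u;            /* select region 0                        */
    MPU->RBAR = 0x08000000u;   /* base address (must be size-aligned)    */
    MPU->RASR = (0x6u << MPU_RASR_AP_Pos)    /* AP=110: read-only, all levels */
              | (1u   << MPU_RASR_C_Pos)     /* C=1: cacheable (Flash)        */
              | (14u  << MPU_RASR_SIZE_Pos)  /* size = 2^(14+1) = 32 KB       */
              | MPU_RASR_ENABLE_Msk;         /* region enable bit             */

    /* Global enable; PRIVDEFENA keeps the default map as a background
       region for privileged code. */
    MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
    __DSB();   /* ensure the MPU register writes have completed */
    __ISB();   /* flush the pipeline so the new map takes effect */
}
```

The trailing DSB/ISB pair is the standard sequence: without it, instructions already in the pipeline could execute under the old memory map.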
Key Takeaway
The MPU is crucial for managing memory access rights in ARM Cortex-M processors, dividing
memory into protected regions. By configuring and enabling the MPU, developers can enforce
security, prevent errors, and ensure the integrity of embedded systems. Understanding the MPU’s
configuration and behavior is essential for building secure, efficient systems on ARM Cortex-M
processors.
Cache Management: Optimizing Performance with Smart Caching
In modern processors, caches play a vital role in improving memory access speeds by storing
frequently accessed data closer to the CPU. Caches operate on two key principles: spatial locality
(nearby data is often accessed together) and temporal locality (recently accessed data is likely to
be used again soon). However, as workloads grow more dynamic, managing these caches
efficiently requires more than just traditional algorithms like LRU or FIFO.
Caches are often organized in a hierarchical memory system, with multiple levels of cache (L1, L2,
L3) to balance speed and capacity. While the L1 cache is closest to the CPU and the fastest, it is
limited in size, whereas L3 is larger but slower. The challenge arises when multiple agents (like
DMA or external processors) update memory simultaneously, leading to potential cache
coherency issues.
To maintain consistency, software often employs cache maintenance operations, ensuring that
data changes made in one part of the system are visible throughout the memory hierarchy.
Without this, a breakdown in coherency can occur, where outdated data is accessed from the
cache instead of the most recent version in memory.
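On a Cortex-M7 with the data cache enabled, the canonical coherency pattern around a DMA transfer looks like the sketch below, using the CMSIS cache-maintenance functions. Buffer placement and alignment (32-byte cache lines) are the caller's responsibility; target-only code:

```c
#include <stdint.h>
#include "core_cm7.h"   /* CMSIS: SCB_CleanDCache_by_Addr, SCB_InvalidateDCache_by_Addr */

/* Before DMA reads the buffer: push dirty cache lines out to RAM
   so the DMA engine sees the CPU's latest writes. */
void dma_send_prepare(uint8_t *buf, int32_t len)
{
    SCB_CleanDCache_by_Addr((uint32_t *)buf, len);
    /* ... start the DMA transfer from buf ... */
}

/* After DMA has written the buffer: discard stale cache lines so
   the CPU's next reads fetch the fresh data from RAM. */
void dma_receive_complete(uint8_t *buf, int32_t len)
{
    SCB_InvalidateDCache_by_Addr((uint32_t *)buf, len);
    /* ... CPU may now read buf ... */
}
```

Getting the direction wrong (cleaning where an invalidate is needed, or vice versa) is a classic source of the coherency breakdowns described above.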
Google’s CacheNet and NVIDIA’s AI-driven GPU caches are real-world examples of this approach,
where deep learning models optimize cache management dynamically. These systems
continuously learn and adapt to evolving access patterns, offering significant improvements over
static algorithms.
Modern processors, like ARM's Cortex-A series, combine these ideas with software prefetch
hints. With Preload Data (PLD) and Preload Instruction (PLI) hints, software can signal future
memory accesses so data is brought into faster cache levels ahead of use. Predictive or learned
policies can refine these preload strategies based on observed access patterns, further optimizing
performance in real time.
Conclusion
Mastering the ARM Cortex-M memory model is essential for creating reliable and optimized
embedded systems. By grasping concepts like memory space, alignment, and endianness, along
with the role of memory attributes, semaphores, and caches, developers can design systems that
effectively manage resources and maintain data consistency. The MPU adds an additional layer of
protection, safeguarding critical memory regions from unauthorized access. With a solid
understanding of these memory mechanisms, developers can leverage ARM Cortex-M processors
to their fullest potential, ensuring performance, security, and stability across applications.