Unit 4
Department of E&TC
The Cortex family has three main categories: Cortex-A, Cortex-R and Cortex-M.
Cortex-A series applications range from handsets, smartphones and computers to high-end
broadcasting and networking equipment.
Cortex-A5 is the entry-level Cortex-A processor, with low power consumption and
adequate performance.
Higher-performance Cortex-A cores, such as the A15 and A17 and the A53 and A35 series, are
used in premium handsets in multicore configurations.
The A17 can be configured with up to four cores. With these flexible options the Cortex-A
series has taken over the mobile market, with rapid growth in both production volume and
core performance.
ARM Processor Families: CORTEX-R
The Cortex-R series mostly targets real-time applications such as automotive, medical and
server-side applications in communication equipment.
Cortex-R applications are those that must keep working reliably when faults occur. Examples:
airbag and brake systems in automotive applications, critical ECG monitoring in medical
equipment, transmission control in networking equipment, and protection against total system
failure in server-side technologies.
•Cortex-R4: aimed at automotive applications, with a clock frequency of up to 600 MHz. It has an
8-stage dual-issue pipeline and a low-latency interrupt system that can interrupt multi-cycle
operations to serve an incoming interrupt. The Cortex-R4 can also be implemented in a
dual-core configuration, with a second R4 running in redundant lock-step. This enables
fault-detection logic and makes it well suited to safety-critical systems.
•Cortex-R5 is widely used in networking and data storage applications, offering increased
efficiency, reliability and enhanced error management in real-time systems.
•Cortex-R7 comes with an increased performance clock ticking at the frequency of 1GHz.
It also has a fully integrated generic interrupt controller (GIC) supporting complex
priority-based interrupt handling.
This series of processors is not suitable for rich operating systems such as Android and
Linux.
ARM Processor Families: CORTEX-M
Cortex-M is the well-known microcontroller series. The Cortex-M3 and Cortex-M4 are built on
the ARMv7-M architecture, while the smallest cores, such as the Cortex-M0+,
are built on the ARMv6-M architecture.
The first Cortex-M, the Cortex-M3, was released in 2004. The Cortex-M35P (2018)
is a more recent core in the series; the P stands for
physical security.
The Cortex-M3 and Cortex-M4 are very similar cores with a 3 stage pipeline and multiple
32-bit buses, clock speed up to 200MHz.
The Cortex-M4 is additionally optimized to handle DSP algorithms and consumes 3
times less power compared to the Cortex-M3.
Applications include microcontrollers, mixed signal devices, smart sensors, automotive body
electronics and airbags.
Cortex Microcontroller Software Interface Standard (CMSIS)
CMSIS-Core
CMSIS-Core (Cortex-M) implements the basic run-time system for a Cortex-M device and
gives the user access to the processor core and the device peripherals.
A native Cortex-M application with CMSIS uses the software component CMSIS-CORE,
which should be used together with the software component Device-Startup.
CMSIS-Core defines:
• A Hardware Abstraction Layer (HAL) for Cortex-M processor registers, with standardized
definitions for the SysTick, NVIC, System Control Block registers, MPU registers, FPU registers,
and core access functions.
• System exception names, to interface to system exceptions without compatibility issues.
• Methods to organize header files that make it easy to learn new Cortex-M microcontroller
products and improve software portability. This includes naming conventions for device-specific
interrupts.
• Methods for system initialization to be used by each MCU vendor. For example, the
standardized SystemInit() function is essential for configuring the clock system of the device.
• Intrinsic functions used to generate CPU instructions that are not supported by standard C
functions.
• A variable to determine the system clock frequency, which simplifies the setup of the SysTick
timer.
Cortex Microcontroller Software Interface
Standard (CMSIS)
•CMSIS offers a very simple configuration interface between microcontroller core and
peripheral units.
•This speeds up both the peripheral configuration process and the
time to market.
CMSIS application software components
CMSIS-CORE:
CMSIS-CORE is a standardized API for the Cortex-M processor core and peripherals.
CMSIS-SVD:
CMSIS-System View Description (SVD) describes how to display peripherals in IDEs.
Create this XML-based file to let tools display the peripherals in debuggers and to
create header files with peripheral register and interrupt definitions.
CMSIS-Pack:
CMSIS-Pack is a delivery mechanism for device support and software components.
Development tools and web infrastructures use the information included in packs to
extract device parameters, software components, and evaluation board configurations.
CMSIS-DAP:
CMSIS-DAP is standardized firmware for a debug probe unit (usually on a separate
chip) that connects to the Arm CoreSight Debug Access Port of your SoC design. It is
well suited for integration on low-cost evaluation boards.
CMSIS-RTOS:
CMSIS-RTOS is a common API for real-time operating systems. It provides a
standardized programming interface that is portable to many RTOS and enables
software components that can work across multiple RTOS systems.
CMSIS-Driver:
CMSIS-Driver is a definition of peripheral driver interfaces for common
peripherals such as USB, Ethernet, SPI, and other types. Using this driver
interface, middleware becomes reusable across supported devices.
CMSIS-DSP:
CMSIS-DSP is a library collection with over 60 signal processing functions.
CMSIS-NN:
CMSIS-NN is a collection of efficient neural network kernels developed to
maximize the performance and minimize the memory footprint of neural
networks on the Cortex-M processor cores.
CMSIS-Zone:
CMSIS-Zone is a tool that helps to describe system resources and to partition
these resources into multiple projects and execution areas. This is required for
complex microcontrollers that contain multiple cores, memory protection units,
or TrustZone for Armv8-M
Typical project setup using a CMSIS device-driver package (figure)
Benefits of CMSIS core
• A project for a Cortex-M microcontroller device can be migrated to another
device from the same vendor with a different Cortex-M processor very easily.
Often microcontroller vendors provide devices with Cortex-M0/M0+/M3/M4
with the same peripheral and same pin out, and the change required is just
replacing a couple of CMSIS files in the project.
• CMSIS-Core makes it easier for a Cortex-M microcontroller project to be
migrated to another device from a different vendor. Obviously, peripheral setup
and access code will need to be modified, but processor core access functions are
based on the same CMSIS source code and do not require changes.
• CMSIS allows software to be much more future proof because embedded software
developed today can be reused on other Cortex-M products in the future.
The CMSIS-Core also allows faster time to market because:
• It is easier to reuse software code from previous projects.
• Since all CMSIS-compliant device drivers have a similar structure, learning to
use a new Cortex-M microcontroller is much easier.
• The CMSIS code has been tested by many silicon vendors and software
developers around the world.
• CMSIS is supported by multiple compiler
tool-chain vendors.
• CMSIS has a small memory footprint (less
than 1KB for all core access functions and a
few bytes of RAM for several variables).
For developers of embedded OS and middleware, the advantage of CMSIS is
significant:
• By using processor core access functions from CMSIS, embedded OS, and
middleware can work with device-driver libraries from various
microcontroller vendors, including future products that are yet to be
released.
• Since CMSIS is designed to work with various toolchains, many software
products can be designed to be toolchain independent.
• Without CMSIS, middleware might need to include a small set of driver
functions for accessing processor peripherals such as the interrupt
controller. Such an arrangement increases the program size, and might
cause compatibility issues with other software products
ARM CORTEX M4 microprocessor
• The Cortex-M3 and Cortex-M4 are processors
designed by ARM
• Cortex-M4 processors use a 32-bit architecture.
• Internal registers in the register bank, the data path,
and the bus interfaces are all 32 bits wide.
• The Instruction Set Architecture (ISA) in the
Cortex-M processors is called the Thumb ISA and is
based on Thumb-2 Technology which supports a
mixture of 16-bit and 32-bit instructions.
ARM CORTEX- Instruction set
ARM CORTEX M4 Features
• Three-stage pipeline design
• Harvard bus architecture with unified memory space: instructions and data use the
same address space
• 32-bit addressing, supporting 4GB of memory space
• On-chip bus interfaces based on ARM AMBA (Advanced Microcontroller Bus
Architecture) Technology, which allow pipelined bus operations for higher
throughput
• An interrupt controller called NVIC (Nested Vectored Interrupt Controller)
supporting up to 240 interrupt requests and from 8 to 256 interrupt priority levels
(dependent on the actual device implementation)
• Support for various features for OS (Operating System) implementation such as a
system tick timer, shadowed stack pointer
• Sleep mode support and various low power features
• Support for an optional MPU (Memory Protection Unit) to provide memory
protection features like programmable memory, or access permission control
• Support for bit-data accesses in two specific memory regions using a feature called
Bit Band
• The option of being used in single processor or multi-processor designs
ARM CORTEX M4 Block diagram
ARM CORTEX M4 Architecture
ARM CORTEX M compatibility
Programmer’s model : Registers
Allowed Register Names as Assembly Code
Programmer’s model : Special registers
• In ARM assembler, the symbol PSR is used when accessing the xPSR.
• The EPSR cannot be accessed directly by software using MRS (it reads as
zero) or MSR.
• The IPSR is read-only and can be read from the combined PSR (xPSR).
Programmer’s model : APSR
Programmer’s model : PSR’s of
ARM series
Memory system: Memory system
features
Bit-band memory region
Access to this region is carried out via the system interface bus
Memory Map
Stack Memory
A stack is a memory usage mechanism that allows a portion of memory to be used as a
Last-In-First-Out data storage buffer.
ARM processors use the main system memory for stack memory operations,
and have the PUSH instruction to store data in stack and the POP instruction
to retrieve data from stack.
In simple applications without an OS, both Thread mode and Handler mode can
use MSP only.
After an interrupt event is triggered, the processor first pushes a number of registers into
the stack before entering the Interrupt Service Routine (ISR).
This register state saving operation is called “Stacking,” and at the end of the ISR, these
registers are restored to the register bank and this operation is called “Unstacking.”
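The Last-In-First-Out behaviour of PUSH and POP described above can be modelled on a host machine. The following is a minimal sketch, assuming a full-descending stack (the stack pointer is decremented before each store, as on Cortex-M processors). The names demo_stack_t, stack_init, stack_push and stack_pop and the 8-word size are hypothetical and are not part of any ARM or CMSIS library.

```c
#include <stdint.h>

#define STACK_WORDS 8u  /* size of the simulated stack region (assumption) */

/* Host-side model of a stack region. 'sp' is an index into 'mem' and
 * stands in for the stack pointer register. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;  /* STACK_WORDS means "empty": one past the highest word */
} demo_stack_t;

static void stack_init(demo_stack_t *s)
{
    s->sp = STACK_WORDS;
}

/* PUSH: decrement the stack pointer first, then store the word.
 * Returns 0 on success, -1 if the stack region is full. */
static int stack_push(demo_stack_t *s, uint32_t value)
{
    if (s->sp == 0u) {
        return -1;
    }
    s->sp--;
    s->mem[s->sp] = value;
    return 0;
}

/* POP: read the word at the stack pointer, then increment the pointer.
 * Returns 0 on success, -1 if the stack is empty. */
static int stack_pop(demo_stack_t *s, uint32_t *value)
{
    if (s->sp == STACK_WORDS) {
        return -1;
    }
    *value = s->mem[s->sp];
    s->sp++;
    return 0;
}
```

Pushing 0x11 and then 0x22 and popping twice returns 0x22 first and 0x11 second, which is the order in which stacked registers are restored during unstacking.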
Stack Pointer
When embedded systems use an embedded OS, they often use separate memory
areas for application stack and the kernel stack. As a result, the PSP is used and
switching of SP selection takes place in exception entry and exception exit. The automatic
“Stacking” and “Unstacking” stages use PSP.
Exceptions and interrupts
Nested vectored interrupt controller
(NVIC)
Nested vectored interrupt controller
(NVIC)-Vector Table
Exception Vector Table
Fault Exception
Debug
Reset and reset sequence
The SysTick timer
•The Cortex-M processors have a small integrated timer called the SysTick (System
Tick) timer.
•It is integrated as a part of the NVIC and can generate the SysTick
exception (exception type #15).
•The SysTick timer is a simple 24-bit down counter, and can run at the processor
clock frequency or from a reference clock (normally an on-chip clock
source).
•The reason for having the timer inside the processor is to help software portability.
•Since all the Cortex-M processors have the same SysTick timer, an OS written for
one Cortex-M3/M4 microcontroller can be reused on other Cortex-M3/M4
microcontrollers.
•If you do not need an embedded OS in your application, the SysTick timer can be
used as a simple timer peripheral for periodic interrupt generation, delay
generation, or timing measurement.
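Because the counter is only 24 bits wide, the reload value for a periodic interrupt has to be checked against that limit. The sketch below shows the arithmetic, assuming the counter counts from the reload value down to zero so that one period lasts reload + 1 clock cycles. The function name systick_reload is hypothetical; it only computes the value and does not touch any hardware register.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* largest value a 24-bit counter holds */

/* Computes the reload value that gives 'tick_hz' interrupts per second
 * from a counter clock of 'clock_hz'.
 * Returns 0 and writes *reload on success, or -1 if the request cannot
 * be met (zero rate, rate above the clock, or value wider than 24 bits). */
static int systick_reload(uint32_t clock_hz, uint32_t tick_hz, uint32_t *reload)
{
    uint32_t ticks;

    if (tick_hz == 0u || clock_hz < tick_hz) {
        return -1;
    }
    ticks = clock_hz / tick_hz;          /* clock cycles per period */
    if ((ticks - 1u) > SYSTICK_MAX_RELOAD) {
        return -1;                       /* does not fit in 24 bits */
    }
    *reload = ticks - 1u;                /* counter runs reload..0 */
    return 0;
}
```

With a 168 MHz counter clock, a 1 kHz tick needs a reload value of 167999, which fits. A 1 Hz tick would need 167999999, which exceeds 24 bits, so a slower clock source or a software divider would be needed.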
Introduction to STM32F40XX
STM32F40XX Applications
STM32F40XX Block diagram
Memory and bus architecture
The main system consists of a 32-bit multilayer AHB bus matrix that
interconnects the bus masters and the bus slaves.
Memory organization
• Program memory, data memory, registers and
I/O ports are organized within the same linear
4 Gbyte address space.
• The bytes are coded in memory in little endian
format.
• The lowest numbered byte in a word is
considered the word’s least significant byte
and the highest numbered byte, the word’s
most significant.
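The little-endian byte order described above can be illustrated with a short host-side sketch. The helper names store_le32 and load_le32 are hypothetical; they place the bytes explicitly, so the result does not depend on the byte order of the machine running the example.

```c
#include <stdint.h>

/* Stores a 32-bit word into buf[0..3] in little-endian order:
 * the least significant byte goes to the lowest-numbered byte. */
static void store_le32(uint8_t buf[4], uint32_t word)
{
    buf[0] = (uint8_t)(word & 0xFFu);
    buf[1] = (uint8_t)((word >> 8) & 0xFFu);
    buf[2] = (uint8_t)((word >> 16) & 0xFFu);
    buf[3] = (uint8_t)((word >> 24) & 0xFFu);
}

/* Rebuilds the 32-bit word from four little-endian bytes. */
static uint32_t load_le32(const uint8_t buf[4])
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

Storing 0x12345678 leaves 0x78 in byte 0 and 0x12 in byte 3, matching the rule that the lowest-numbered byte is the least significant.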
Embedded SRAM
• The STM32F405xx/07xx feature 4 Kbytes of backup SRAM plus 192 Kbytes of
system SRAM.
• The embedded SRAM can be accessed as bytes, half-words (16 bits) or full words
(32 bits).
• Read and write operations are performed at CPU speed with 0 wait state.
• The embedded SRAM is divided into up to three blocks:
– SRAM1 and SRAM2, mapped at address 0x2000 0000 and accessible by all AHB
masters.
– SRAM3 (available on STM32F42xxx and STM32F43xxx), mapped at address
0x2002 0000 and accessible by all AHB masters.
– CCM (core coupled memory), mapped at address 0x1000 0000 and accessible only
by the CPU through the D-bus.
• The AHB masters support concurrent SRAM accesses (from the Ethernet
or the USB OTG HS): for instance, the Ethernet MAC can read/write
from/to SRAM2 while the CPU is reading/writing from/to SRAM1 or
SRAM3.
• The CPU can access the SRAM1, SRAM2, and SRAM3 through the
System Bus or through the I-Code/D-Code buses when boot from SRAM is
selected or when physical remap is selected (SYSCFG memory remap
register (SYSCFG_MEMRMP) in the SYSCFG controller).
• To get the max performance on SRAM execution, physical remap should
be selected (boot or software selection).
Flash Memory
• The Flash memory interface manages CPU AHB I-Code and D-Code
accesses to the Flash memory.
• It implements the erase and program Flash memory operations and the
read and write protection mechanisms.
• It accelerates code execution with a system of instruction prefetch and
cache lines.
• The Flash memory is organized as follows:
– A main memory block divided into sectors.
– System memory from which the device boots in System memory boot
mode.
– 512 OTP (one-time programmable) bytes for user data.
– Option bytes to configure read and write protection, BOR level, watchdog
software/hardware and reset when the device is in Standby or Stop mode.
Bit-Band Region
• The Cortex®-M4 with FPU memory map includes two bit-band regions.
• These regions map each word in an alias region of memory to a bit in a bit-band region of
memory. Writing to a word in the alias region has the same effect as a read-modify-write
operation on the targeted bit in the bit-band region.
• In the STM32F4xx devices both the peripheral registers and the SRAM are mapped to a
bit-band region, so that single bit-band write and read operations are allowed.
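The mapping from a bit in a bit-band region to its alias word can be written as a small address calculation. The sketch below assumes the standard Cortex-M layout: a 1 MB SRAM bit-band region at 0x20000000 with its alias at 0x22000000, a 1 MB peripheral bit-band region at 0x40000000 with its alias at 0x42000000, and the formula alias = alias_base + byte_offset × 32 + bit × 4. The function name bitband_alias is hypothetical, and the code only computes addresses; it does not access memory.

```c
#include <stdint.h>

#define SRAM_BB_BASE    0x20000000u  /* SRAM bit-band region */
#define SRAM_BB_ALIAS   0x22000000u  /* SRAM alias region */
#define PERIPH_BB_BASE  0x40000000u  /* peripheral bit-band region */
#define PERIPH_BB_ALIAS 0x42000000u  /* peripheral alias region */
#define BB_REGION_SIZE  0x00100000u  /* each bit-band region is 1 MB */

/* Returns the alias word address for bit 'bit' (0..7) of the byte at
 * 'addr', or 0 if the byte is outside both bit-band regions or the
 * bit number is out of range. */
static uint32_t bitband_alias(uint32_t addr, uint32_t bit)
{
    if (bit > 7u) {
        return 0u;
    }
    if (addr >= SRAM_BB_BASE && (addr - SRAM_BB_BASE) < BB_REGION_SIZE) {
        return SRAM_BB_ALIAS + ((addr - SRAM_BB_BASE) * 32u) + (bit * 4u);
    }
    if (addr >= PERIPH_BB_BASE && (addr - PERIPH_BB_BASE) < BB_REGION_SIZE) {
        return PERIPH_BB_ALIAS + ((addr - PERIPH_BB_BASE) * 32u) + (bit * 4u);
    }
    return 0u;
}
```

For example, bit 2 of the SRAM byte at 0x20000300 maps to the alias word at 0x22006008. Writing 1 or 0 to that word sets or clears only that bit.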
Memory Protection Unit
SYSCLK of STM32F4XX
• The SYSCLK is the main system clock derived from either the HSI clock, HSE
clock, or from the PLL clock.
• The SYSCLK then branches off to the bus and peripheral clocks, which feed peripherals
such as GPIO ports, UARTs and SPI interfaces.
• The first main division of the SYSCLK is the AHB bus clock. AHB1 and AHB2 receive
the same clock signal with the same frequency, so the clock for both buses is
referred to as HCLK.
• This bus then splits into the APB1 bus and the APB2 bus.
• The APB1 and the APB2 bus are distinct buses that have different clock frequency
capabilities.
• The APB2 bus is the higher-speed bus, which can handle double
the frequency that the APB1 bus can handle. On an STM32F407G board, the APB2
bus can run at up to 84 MHz and the APB1 bus at up to 42 MHz.
• The PCLK1 clock signal is the clock signal that drives the APB1 bus.
• The PCLK2 clock signal is the clock signal that drives the APB2 bus.
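The relationship between SYSCLK, HCLK, PCLK1 and PCLK2 can be sketched as plain arithmetic. The example below assumes each bus clock is obtained by an integer prescaler (SYSCLK to HCLK, then HCLK to PCLK1 and PCLK2) and uses the 42 MHz and 84 MHz limits quoted above. The type bus_clocks_t and the functions bus_clocks and bus_clocks_valid are hypothetical names, not part of any vendor library.

```c
#include <stdint.h>

#define PCLK1_MAX_HZ 42000000u  /* APB1 limit quoted for the STM32F407 */
#define PCLK2_MAX_HZ 84000000u  /* APB2 limit quoted for the STM32F407 */

typedef struct {
    uint32_t hclk_hz;   /* AHB clock */
    uint32_t pclk1_hz;  /* APB1 clock */
    uint32_t pclk2_hz;  /* APB2 clock */
} bus_clocks_t;

/* Derives the bus clocks from SYSCLK and three prescaler values.
 * A prescaler of 0 is invalid; all outputs are then reported as 0. */
static bus_clocks_t bus_clocks(uint32_t sysclk_hz, uint32_t ahb_div,
                               uint32_t apb1_div, uint32_t apb2_div)
{
    bus_clocks_t c = { 0u, 0u, 0u };

    if (ahb_div == 0u || apb1_div == 0u || apb2_div == 0u) {
        return c;
    }
    c.hclk_hz  = sysclk_hz / ahb_div;   /* SYSCLK -> HCLK */
    c.pclk1_hz = c.hclk_hz / apb1_div;  /* HCLK -> PCLK1 */
    c.pclk2_hz = c.hclk_hz / apb2_div;  /* HCLK -> PCLK2 */
    return c;
}

/* Returns 1 if both APB clocks respect their limits, otherwise 0. */
static int bus_clocks_valid(const bus_clocks_t *c)
{
    return (c->pclk1_hz <= PCLK1_MAX_HZ) && (c->pclk2_hz <= PCLK2_MAX_HZ);
}
```

With an assumed SYSCLK of 168 MHz and prescalers of 1, 4 and 2, the result is HCLK = 168 MHz, PCLK1 = 42 MHz and PCLK2 = 84 MHz, which is within both limits. Dropping the APB1 prescaler to 2 would give 84 MHz on APB1 and fail the check.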
Embedded Flash Unit
Nested vectored interrupt controller
(NVIC)
The STM32F407xx embed a nested vectored interrupt controller able
to manage 16 priority levels, and handle up to 82 maskable interrupt
channels plus the 16 interrupt lines of the Cortex®-M4 with FPU core.
This hardware block provides flexible interrupt management features
with minimum interrupt latency.
• Closely coupled NVIC gives low-latency interrupt processing
• Interrupt entry vector table address passed directly to the core
• Allows early processing of interrupts
• Processing of late arriving, higher-priority interrupts
• Support for tail chaining
• Processor state automatically saved on interrupt entry and restored on interrupt exit,
with no instruction overhead
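Sixteen priority levels correspond to 4 implemented priority bits. The sketch below assumes the usual Cortex-M arrangement in which those bits occupy the upper part of each 8-bit priority field, and that a lower number means a higher priority. The names PRIO_BITS, nvic_encode_priority and nvic_decode_priority are hypothetical; the code only shows the bit placement and does not write to the NVIC.

```c
#include <stdint.h>

#define PRIO_BITS 4u  /* 16 levels = 2^4 implemented priority bits (assumption) */

/* Converts a priority level (0 = highest, 15 = lowest) into the value
 * stored in an 8-bit priority field. Levels above 15 are clamped to 15. */
static uint8_t nvic_encode_priority(uint32_t level)
{
    const uint32_t max_level = (1u << PRIO_BITS) - 1u;

    if (level > max_level) {
        level = max_level;
    }
    return (uint8_t)(level << (8u - PRIO_BITS));
}

/* Recovers the priority level from an 8-bit priority field. */
static uint32_t nvic_decode_priority(uint8_t field)
{
    return (uint32_t)field >> (8u - PRIO_BITS);
}
```

Level 5 is stored as 0x50 and level 15 as 0xF0. The lower four bits of the field are unused in this arrangement.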
External interrupt/event controller
(EXTI)
The external interrupt/event controller consists of 23 edge-detector lines
used to generate interrupt/event requests.
Each line can be independently configured to select the trigger
event (rising edge, falling edge, both) and can be masked
independently.
A pending register maintains the status of the interrupt
requests.
The EXTI can detect an external line with a pulse width
shorter than the internal APB2 clock period.
Up to 140 GPIOs can be connected to the 16 external interrupt
lines.
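Many GPIOs can share 16 lines because pin number n of every port is multiplexed onto EXTI line n. The sketch below assumes the STM32F4 arrangement in which the port is selected through four SYSCFG_EXTICR registers, each holding four 4-bit fields, with ports GPIOA to GPIOI numbered 0 to 8. The type exti_sel_t and the functions exti_select and exti_apply are hypothetical, and the code operates on a plain variable rather than a real register.

```c
#include <stdint.h>

typedef struct {
    uint32_t reg_index;  /* 0..3, standing for SYSCFG_EXTICR1..4 */
    uint32_t shift;      /* bit position of the 4-bit field */
    uint32_t value;      /* port number written into the field */
} exti_sel_t;

/* Works out which field selects the source port for a given pin.
 * port: 0 = GPIOA ... 8 = GPIOI, pin: 0..15.
 * Returns 0 on success, -1 for an out-of-range port or pin. */
static int exti_select(uint32_t port, uint32_t pin, exti_sel_t *out)
{
    if (port > 8u || pin > 15u) {
        return -1;
    }
    out->reg_index = pin / 4u;         /* four EXTI lines per register */
    out->shift     = (pin % 4u) * 4u;  /* four bits per EXTI line */
    out->value     = port;
    return 0;
}

/* Returns 'reg' with the selected 4-bit field replaced by the port number. */
static uint32_t exti_apply(uint32_t reg, const exti_sel_t *sel)
{
    return (reg & ~(0xFu << sel->shift)) | (sel->value << sel->shift);
}
```

For pin PD13, the port number 3 goes into the fourth register (index 3) at bit position 4. Only one port can drive a given EXTI line at a time, so PA13 and PD13 cannot both be used as interrupt sources.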
Clocks
Three different clock sources can be used to drive the system clock (SYSCLK):
HSI oscillator clock
HSE oscillator clock
Main PLL (PLL) clock
The devices have the two following secondary clock sources:
• 32 kHz low-speed internal RC (LSI RC) which drives the independent watchdog and,
optionally, the RTC used for Auto-wakeup from the Stop/Standby mode.
• 32.768 kHz low-speed external crystal (LSE crystal) which optionally drives the RTC
clock (RTCCLK)
Each clock source can be switched on or off independently when it is not used, to
optimize power consumption.
The clock controller gives the application a high degree of flexibility in the
choice of the external crystal or oscillator used to run the core and peripherals at the
highest frequency, and guarantees the appropriate frequency for peripherals that need
a specific clock, such as Ethernet, USB OTG FS and HS, I2S and SDIO.
• Several prescalers are used to configure the AHB frequency,
the high-speed APB (APB2) and the low-speed APB (APB1)
domains.
• The maximum frequency of the AHB domain is 168 MHz.
• The maximum allowed frequency of the high-speed APB2
domain is 84 MHz.
• The maximum allowed frequency of the low-speed APB1
domain is 42 MHz.
• All peripheral clocks are derived from the system clock
(SYSCLK) except for:
1. The USB OTG FS clock (48 MHz), the random analog
generator (RNG) clock (≤ 48 MHz) and the SDIO clock (≤ 48
MHz) which are coming from a specific output of PLL
(PLL48CLK)
2. The I2S clock
To achieve high-quality audio performance, the I2S clock can be derived either from a
specific PLL (PLLI2S) or from an external clock mapped on the I2S_CKIN pin.
3. The USB OTG HS (60 MHz) clock which is provided from the external PHY
4. The Ethernet MAC clocks (TX, RX and RMII) which are provided from the external
PHY. When the Ethernet is used, the AHB clock frequency must be at least 25 MHz.
• The RCC feeds the external clock of the Cortex System Timer (SysTick) with the
AHB clock (HCLK) divided by 8. The SysTick can work either with this clock or
with the Cortex clock (HCLK), configurable in the SysTick control and status
register.
• The timer clock frequencies are automatically set by hardware. There are two cases:
1. If the APB prescaler is 1, the timer clock frequencies are set to the same frequency as
that of the APB domain to which the timers are connected.
2. Otherwise, they are set to twice (×2) the frequency of the APB domain to which the
timers are connected.
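The two cases for the timer clock can be expressed as a single function. This is a minimal sketch of the rule stated above; the function name timer_clock_hz is hypothetical, and pclk_hz is the frequency of the APB domain after its prescaler.

```c
#include <stdint.h>

/* Returns the timer clock for a timer on an APB domain running at
 * 'pclk_hz' with the given APB prescaler:
 *   prescaler == 1 -> same frequency as the APB domain
 *   otherwise      -> twice the APB domain frequency */
static uint32_t timer_clock_hz(uint32_t pclk_hz, uint32_t apb_prescaler)
{
    return (apb_prescaler == 1u) ? pclk_hz : (2u * pclk_hz);
}
```

With APB1 at 42 MHz and a prescaler of 4, the timers on APB1 are clocked at 84 MHz. With APB2 at 84 MHz and a prescaler of 2, the timers on APB2 are clocked at 168 MHz.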
HSE clock
• The high speed external clock signal (HSE) can be generated from two
possible clock sources:
1. HSE external crystal/ceramic resonator
2. HSE external user clock
• The resonator and the load capacitors have to be placed as close as possible
to the oscillator pins in order to minimize output distortion and startup
stabilization time.
• The loading capacitance values must be adjusted according to the selected
oscillator.
External source (HSE bypass)
• In this mode, an external clock source must be provided. You select this mode by
setting the HSEBYP and HSEON bits in the RCC clock control register (RCC_CR).
The external clock signal (square, sine or triangle) with ~50% duty cycle has to
drive the OSC_IN pin while the OSC_OUT pin should be left Hi-Z.
External crystal/ceramic resonator (HSE crystal)
• The HSE has the advantage of producing a very accurate rate on the main clock.
• The HSERDY flag in the RCC clock control register (RCC_CR) indicates if the
high-speed external oscillator is stable or not.
• At startup, the clock is not released until this bit is set by hardware.
• An interrupt can be generated if enabled in the RCC clock interrupt register
(RCC_CIR).
• The HSE Crystal can be switched on and off using the HSEON bit in the RCC
clock control register (RCC_CR).
HSI clock and LSE Clock
• HSI clock
• The HSI clock signal is generated from an internal 16 MHz RC oscillator and can
be used directly as a system clock, or used as PLL input.
• The HSI RC oscillator has the advantage of providing a clock source at low cost (no
external components). It also has a faster startup time than the HSE crystal
oscillator. However, even with calibration, its frequency is less accurate than that of an
external crystal oscillator or ceramic resonator.
• LSE clock
• The LSE clock is generated from a 32.768 kHz low-speed external crystal or
ceramic resonator.
• It has the advantage of providing a low-power but highly accurate clock source to the
real-time clock peripheral (RTC) for clock/calendar or other timing functions.
• The LSE oscillator is switched on and off using the LSEON bit in RCC Backup
domain control register (RCC_BDCR).
• The LSERDY flag in the RCC Backup domain control register (RCC_BDCR)
indicates if the LSE crystal is stable or not. At startup, the LSE crystal output clock
signal is not released until this bit is set by hardware. An interrupt can be generated
if enabled in the RCC clock interrupt register (RCC_CIR).
LSI Clock
• The LSI RC acts as a low-power clock source that can be kept running in
Stop and Standby mode for the independent watchdog (IWDG) and
Auto-wakeup unit (AWU).
• The clock frequency is around 32 kHz.
• The LSI RC can be switched on and off using the LSION bit in the RCC
clock control & status register (RCC_CSR).
• The LSIRDY flag in the RCC clock control & status register (RCC_CSR)
indicates if the low speed internal oscillator is stable or not.
• At startup, the clock is not released until this bit is set by hardware. An
interrupt can be generated if enabled in the RCC clock interrupt register
(RCC_CIR).
System clock (SYSCLK) selection
PLL Configuration
The STM32F4xx devices feature two PLLs:
• A main PLL (PLL) clocked by the HSE or HSI oscillator and featuring two different
output clocks:
– The first output is used to generate the high speed system clock (up to 168 MHz)
– The second output is used to generate the clock for the USB OTG FS (48 MHz), the random
analog generator (≤48 MHz) and the SDIO (≤ 48 MHz).
• A dedicated PLL (PLLI2S) used to generate an accurate clock to achieve high-quality audio
performance on the I2S interface.
• Since the main-PLL configuration parameters cannot be changed once PLL is enabled, it is
recommended to configure PLL before enabling it (selection of the HSI or HSE oscillator as
PLL clock source, and configuration of division factors M, N, P, and Q).
• The PLLI2S uses the same input clock as PLL (PLLM[5:0] and PLLSRC bits are common to
both PLLs). However, the PLLI2S has dedicated enable/disable and division factors (N and R)
configuration bits. Once the PLLI2S is enabled, the configuration parameters cannot be
changed.
• The two PLLs are disabled by hardware when entering Stop and Standby modes, or when an
HSE failure occurs when HSE or PLL (clocked by HSE) are used as system clock. RCC PLL
configuration register (RCC_PLLCFGR) and RCC clock configuration register (RCC_CFGR)
can be used to configure PLL and PLLI2S, respectively.
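The effect of the division factors M, N, P and Q on the two main-PLL outputs can be checked with a short calculation. The sketch below assumes the usual relations for this PLL: f_VCO = f_in × N / M, system clock output = f_VCO / P, and 48 MHz domain output = f_VCO / Q, with P restricted to 2, 4, 6 or 8. The type pll_out_t and the function pll_compute are hypothetical, other range limits on M, N and Q are not checked here, and nothing is written to RCC_PLLCFGR.

```c
#include <stdint.h>

typedef struct {
    uint32_t vco_hz;     /* VCO output frequency */
    uint32_t sysclk_hz;  /* first output: high-speed system clock */
    uint32_t pll48_hz;   /* second output: USB OTG FS, RNG, SDIO */
} pll_out_t;

/* Computes the main-PLL output frequencies from the input clock
 * (HSE or HSI) and the division factors M, N, P and Q.
 * Returns 0 on success, -1 if a divider is zero or P is not 2, 4, 6 or 8. */
static int pll_compute(uint32_t src_hz, uint32_t m, uint32_t n,
                       uint32_t p, uint32_t q, pll_out_t *out)
{
    uint64_t vco;

    if (m == 0u || q == 0u) {
        return -1;
    }
    if (p != 2u && p != 4u && p != 6u && p != 8u) {
        return -1;
    }
    /* 64-bit intermediate avoids overflow in src_hz * n */
    vco = ((uint64_t)src_hz * n) / m;

    out->vco_hz    = (uint32_t)vco;
    out->sysclk_hz = (uint32_t)(vco / p);
    out->pll48_hz  = (uint32_t)(vco / q);
    return 0;
}
```

As an illustration, an assumed 8 MHz HSE with M = 8, N = 336, P = 2 and Q = 7 gives a 336 MHz VCO, a 168 MHz system clock and a 48 MHz clock for the USB OTG FS. The same outputs result from the 16 MHz HSI with M = 16.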
RCC clock control register (RCC_CR)
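1. Enable the HSE and wait for it to become ready
RCC->CR |= (1U << 16);             /* HSEON (bit 16): switch the HSE oscillator on */
while (!(RCC->CR & (1U << 17))) {} /* wait until hardware sets HSERDY (bit 17) */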