CH1 - ARM - PPT New New
Syllabus
Overview of C Compilers and
optimization, basic C data types, C
looping structures, register allocation,
function calls, pointer aliasing,
structure arrangement, unaligned
data and endianness
Overview of C Compiler
• Optimizing C programs involves:
- Reducing memory requirements
- Increasing execution speed
• Optimizing code reduces source readability.
• Usually, it’s only worth optimizing functions that are frequently executed and important for performance.
• The profiling tools of ARM simulators can be used to find frequently executed instructions and functions.
Overview of C Compiler
Example: Case 1
• Compilers have to translate C functions into assembler so that they work for all possible inputs.
• In practice, many of the input combinations are not possible or won’t occur.
• The memclr function clears N bytes of memory at address data.
• The value of N may or may not be 0. The compiler does not know this, so it needs to test N before the first iteration.
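The memclr routine is described but not listed in this text. The sketch below is one plausible version, assuming the signature memclr(char *data, int N). Because the loop condition is tested before the first iteration, the function is safe for N == 0, which is exactly the check the compiler cannot leave out.

```c
/* Clears N bytes of memory starting at address data (assumed signature).
   The condition N > 0 is tested before every iteration, including the
   first, so a call with N == 0 clears nothing. */
void memclr(char *data, int N)
{
    for (; N > 0; N--)
    {
        *data = 0;
        data++;
    }
}
```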
Overview of C Compiler
Example: Case 2
Overview of C Compiler
Example: Case 3
• The compiler is unaware of whether N is a multiple of four or not. If it is a multiple of four, the compiler can repeat the loop body four times or store four bytes at a time using an int store.
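When the caller can guarantee that N is a positive multiple of four, that knowledge can be written into the code by hand. A minimal sketch, assuming a hypothetical memclr4 variant whose caller promises N > 0 and N % 4 == 0: the loop body is repeated four times and the test on N moves to the end of the loop. The int-store alternative would additionally require data to be word aligned, so this sketch keeps byte stores.

```c
/* Hypothetical variant of memclr: the caller guarantees that N is
   greater than 0 and a multiple of 4. */
void memclr4(char *data, unsigned int N)
{
    do
    {
        /* Loop body repeated four times: one loop test per four bytes. */
        data[0] = 0;
        data[1] = 0;
        data[2] = 0;
        data[3] = 0;
        data += 4;
        N -= 4;
    } while (N != 0);
}
```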
Overview of C Compiler
• To write efficient C code, you must be aware of the limits of the specific C compiler and of the processor architecture the compiler is mapping to.
Basic data types
Local Variable types
Function Arguments
Signed versus Unsigned types
Efficient use of C types
Register Allocation
• The compiler attempts to allocate a processor register to each local variable used in a C function.
• It will try to use the same register for different local variables if the uses of the variables do not overlap.
• Extra variables are stored on the processor stack.
- These are called spilled or swapped-out variables.
- They are slower to access than variables held in registers.
Register Allocation
Register keyword
• The register keyword in C hints to the compiler that it should allocate the given variable to a register.
• However, different compilers treat this
keyword in different ways, and
different architectures have a different
number of available registers (for
example, Thumb and ARM).
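A minimal illustration of the register keyword follows; sum_array is a hypothetical function. The keyword is only a hint, so a compiler is free to ignore it, but standard C does forbid taking the address of a variable declared register (writing &i below would not compile).

```c
/* Hypothetical example: the loop counter and accumulator carry the
   register storage-class hint. The compiler may or may not honour it. */
int sum_array(const int *a, int n)
{
    register int i;
    register int sum = 0;
    for (i = 0; i < n; i++)
    {
        sum += a[i];
    }
    return sum;
}
```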
Function Calls
• The ARM Procedure Call Standard
(APCS) defines how to pass function
arguments and return values in ARM
registers.
• The first four integer arguments are
passed in the first four ARM registers:
r0, r1, r2, and r3. Subsequent integer
arguments are placed on the stack.
• Integer return values are passed in r0.
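The sketch below maps the rule onto a call with six integer arguments. The function name add6 is hypothetical, and the register assignments in the comment describe ARM targets following the procedure call standard; they do not apply when the same code is compiled for another architecture.

```c
/* Hypothetical six-argument function. On ARM, a..d arrive in r0..r3,
   e and f are passed on the stack, and the int result is returned in r0. */
int add6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}
```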
Function Calls
Four Register Rule
• Functions with four or fewer arguments are far more efficient
to call than functions with five or more arguments. For
functions with four or fewer arguments, the compiler can pass
all the arguments in registers.
• For functions with more arguments, both the caller and callee
must access the stack for some arguments.
• If a function needs more than four arguments, then it is
almost always more efficient to use structures. Group related
arguments into structures and pass a structure pointer
rather than multiple arguments.
Function Calls
• The next example illustrates the benefits of
using a structure pointer. First we show a
typical routine to insert N bytes from array
data into a queue. We implement the queue
using a cyclic buffer with start address Q_start
(inclusive) and end address Q_end (exclusive).
Circular queue without structure
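The routine described above might look like the following sketch. Only Q_start and Q_end come from the text; the function name queue_bytes_v1 and the current write position Q_ptr are assumptions, and the do-while loop assumes the caller passes N > 0. With five arguments, the fifth (N) has to be passed on the stack on ARM.

```c
/* Inserts N bytes from data into the cyclic buffer [Q_start, Q_end).
   Q_ptr is the current write position; the updated position is returned.
   Assumes N > 0. Five arguments: on ARM, N is passed on the stack. */
char *queue_bytes_v1(char *Q_start, char *Q_end, char *Q_ptr,
                     const char *data, unsigned int N)
{
    do
    {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end)
        {
            Q_ptr = Q_start;   /* wrap around to the start of the buffer */
        }
    } while (--N);
    return Q_ptr;
}
```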
Circular queue with structure to pass arguments
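A sketch of the same cyclic-buffer insert with the queue state grouped into a structure. The Queue type, its Q_ptr field and the name queue_bytes_v2 are assumptions; Q_start and Q_end keep the meaning given in the text. The call now needs only three arguments (a structure pointer, data and N), which all fit in r0 to r2 on ARM. The loop again assumes N > 0.

```c
/* Queue state grouped into one structure (hypothetical layout). */
typedef struct
{
    char *Q_start;   /* buffer start address (inclusive) */
    char *Q_end;     /* buffer end address (exclusive) */
    char *Q_ptr;     /* current write position */
} Queue;

/* Three arguments, all passed in registers on ARM. Assumes N > 0. */
void queue_bytes_v2(Queue *queue, const char *data, unsigned int N)
{
    char *Q_ptr = queue->Q_ptr;   /* work on local copies */
    char *Q_end = queue->Q_end;
    do
    {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end)
        {
            Q_ptr = queue->Q_start;   /* wrap around */
        }
    } while (--N);
    queue->Q_ptr = Q_ptr;         /* store the new position back */
}
```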
Comparison
• The second version has only three
function arguments rather than five.
• Each call to the function requires only
three register setups.
• There is a net saving of two
instructions in function call overhead.
Calling functions efficiently
• Try to restrict functions to four
arguments. This will make them more
efficient to call. Use structures to
group related arguments and pass
structure pointers instead of multiple
arguments.
• Define small functions in the same
source file and before the functions
that call them. The compiler can then
optimize the function call or inline the
small function.
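The advice above can be sketched with two hypothetical functions, clamp_u8 and clamp_buffer. The small helper is defined before its caller in the same file, so the compiler sees its body at the call site and may inline it; declaring it static also tells the compiler that no other source file calls it.

```c
/* Small helper defined before its caller, in the same source file. */
static int clamp_u8(int x)
{
    if (x < 0)
        return 0;
    if (x > 255)
        return 255;
    return x;
}

/* Caller: the call to clamp_u8 is a candidate for inlining. */
void clamp_buffer(int *buf, int n)
{
    for (int i = 0; i < n; i++)
    {
        buf[i] = clamp_u8(buf[i]);
    }
}
```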
Pointer Aliasing
• Two pointers are said to alias when they point to the same address.
• In this example, step is loaded twice, so the code the compiler generates is inefficient.
• For efficiency, the compiler should load the value of step only once, into a local variable.
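The code this slide refers to is not reproduced in the text. A sketch consistent with the names used in this section (timer1, step) follows; timer2 and timers_v1 are assumed names. Because step may point at the same int as timer1, the value must be reloaded after the first write.

```c
/* Adds *step to two timers. *step has to be read again for the second
   addition, because the write to *timer1 might have changed it. */
void timers_v1(int *timer1, int *timer2, int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}
```

Called as timers_v1(&a, &b, &a), the second addition sees the already updated a, which is why keeping *step in a register across the first write would be wrong in general.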
Pointer aliasing: Common subexpression elimination
• The compiler optimization called common subexpression elimination allows a subexpression to be evaluated only once and its value reused for the second occurrence.
• However, the compiler can’t use this optimization here: the pointers timer1 and step might alias one another.
• In other words, the compiler cannot be sure that the write to timer1 doesn’t affect the read from step.
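When the programmer knows that step never aliases timer1 or timer2, the subexpression can be eliminated by hand by copying the value into a local variable. A minimal sketch, with timers_v2 and timer2 as assumed names:

```c
/* Reads *step once into a local variable, so it is loaded only once.
   This is only a safe rewrite if step does not alias timer1 or timer2. */
void timers_v2(int *timer1, int *timer2, int *step)
{
    int s = *step;   /* single load of the shared value */
    *timer1 += s;
    *timer2 += s;
}
```

With aliased arguments such as timers_v2(&a, &b, &a), b receives the old value of a rather than the updated one, which is why the compiler cannot make this transformation on its own.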
Aliasing in structure
• The same issue arises if you use structure accesses rather than direct pointer accesses:
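A sketch of the structure-access form, using hypothetical State and Timers types and a hypothetical timers_v3 function. Because state->step and timers->timer1 are both int objects, a compiler may have to assume they could occupy the same memory and read state->step again after writing timers->timer1; copying state->step into a local variable removes that doubt when the programmer knows they are distinct.

```c
/* Hypothetical structure types for the timer example. */
typedef struct { int step; } State;
typedef struct { int timer1, timer2; } Timers;

/* state->step may be read twice, in case it overlaps timers->timer1. */
void timers_v3(State *state, Timers *timers)
{
    timers->timer1 += state->step;
    timers->timer2 += state->step;
}
```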
Structure arrangement
• To improve the memory usage, we
should reorder the elements as follows:
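The element order from the slide is not reproduced in the text. As an illustration under common ABI assumptions (char 1 byte, short 2 bytes, int 4 bytes, each aligned to its own size, as on ARM), the same four elements take 12 bytes in one order and 8 bytes when the smaller elements come first. The struct names are hypothetical.

```c
/* Mixed order: padding is inserted so that b is 4-byte aligned and
   d is 2-byte aligned. Layout: a@0, pad 1-3, b@4, c@8, pad 9, d@10,
   total 12 bytes. */
struct mixed_order
{
    char  a;
    int   b;
    char  c;
    short d;
};

/* Same elements in order of increasing size: no internal padding.
   Layout: a@0, c@1, d@2, b@4, total 8 bytes. */
struct size_order
{
    char  a;
    char  c;
    short d;
    int   b;
};
```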
Structure Arrangement
Packed Structure
• The armcc compiler provides the keyword __packed, which removes all padding.
• Enum types: consider a type Bool defined as an enum with the values FALSE (0) and TRUE (1). armcc treats Bool as a one-byte type because it only uses the values 0 and 1, so Bool takes up only 8 bits of space in a structure.
• However, gcc treats Bool as a word, so it takes up 32 bits of space in a structure.
• To avoid ambiguity, it is best to avoid using enum types in structures used in the API to your code.
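armcc's __packed keyword is specific to that compiler. As a hedged sketch only, GCC and Clang express the same idea with __attribute__((packed)); the struct and the packed_get_b helper below are hypothetical. Removing the padding saves space, but b now sits at offset 1, so the compiler has to use access sequences that tolerate the misalignment.

```c
#include <stddef.h>

/* GCC/Clang counterpart of an armcc __packed structure:
   no padding is inserted, so b starts at offset 1. */
struct __attribute__((packed)) packed_record
{
    char  a;
    int   b;   /* unaligned member */
    char  c;
    short d;
};

/* Reads b by value through the structure. Taking &rec->b instead would
   produce an int pointer to an unaligned address, which breaks the
   compiler's alignment assumption for int pointers. */
int packed_get_b(const struct packed_record *rec)
{
    return rec->b;
}
```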
Load and store alignment restrictions for ARMv5TE
[Table: Transfer size | Instruction | Byte address]
Efficient Structure Arrangement
• Lay structures out in order of increasing element size. Start the structure with the smallest elements and finish with the largest.
• Avoid very large structures. Instead, use a hierarchy of smaller structures.
• Manually add padding to the structures so that the layout of the structure does not depend on the compiler.
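The three guidelines can be combined in one small API structure. In this sketch the type api_header_t and its fields are hypothetical: the elements are ordered by increasing size, an explicit padding byte fills the gap the compiler would otherwise insert silently, and C11 _Static_assert checks make the build fail if a compiler ever lays the structure out differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical API structure, smallest elements first, with the
   padding written out explicitly so every field is naturally aligned. */
typedef struct
{
    uint8_t  flags;     /* offset 0 */
    uint8_t  pad0;      /* offset 1: explicit padding, always 0 */
    uint16_t length;    /* offset 2 */
    uint32_t address;   /* offset 4 */
} api_header_t;

/* Compile-time checks that the layout does not depend on the compiler. */
_Static_assert(offsetof(api_header_t, length) == 2, "length at offset 2");
_Static_assert(offsetof(api_header_t, address) == 4, "address at offset 4");
_Static_assert(sizeof(api_header_t) == 8, "api_header_t is 8 bytes");
```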
Unaligned data access and Endianness
• Unaligned data and endianness are two issues that can complicate memory accesses and portability.
• The ARM load and store instructions assume that the address is a multiple of the size of the type we are loading or storing.
• For a load or store to an address that is not aligned to its type, the behaviour depends on the particular implementation. The core may generate a data abort or load a rotated value.
• For well-written, portable code, we should avoid unaligned accesses.
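Where data really may be misaligned (for example, fields inside a byte stream), one portable approach is sketched below with hypothetical helper names: test the address with is_aligned, and copy the bytes with memcpy instead of dereferencing a misaligned pointer. The loaded value is in the processor's own byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if address p is a multiple of alignment bytes. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Loads a 32-bit value from an address that may not be 4-byte aligned.
   memcpy makes no alignment assumption about p, so this is valid C for
   any address, and the compiler must choose a safe access sequence. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```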
Unaligned data access and Endianness
• C compilers assume that a pointer is aligned to the type it points to.
• If a pointer isn’t aligned, then the program may give unexpected results.
• This is sometimes an issue when porting code to the ARM from processors that do allow unaligned accesses.
• For armcc, the __packed directive tells the compiler that a data item can be positioned at any byte alignment.
• This is useful for porting code, but using __packed will impact performance.
Endianness
• The ARM core can be configured
to work in little-endian (least
significant byte at lowest
address) or big-endian (most
significant byte at lowest
address) modes.
• Little-endian mode is usually
the default.
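To see which mode a program is running in, it can store a known 32-bit value and inspect the byte at the lowest address. A minimal sketch, with host_is_little_endian as a hypothetical name:

```c
#include <stdint.h>

/* Returns 1 if the processor is running little-endian (least significant
   byte at the lowest address), 0 if it is running big-endian. */
int host_is_little_endian(void)
{
    uint32_t value = 0x11223344u;
    const unsigned char *bytes = (const unsigned char *)&value;
    return bytes[0] == 0x44;   /* LSB stored at the lowest address? */
}
```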
Avoiding endianness