
CH1 - ARM - PPT New New

Uploaded by

Malhar Kulkarni
Copyright © All Rights Reserved
UNIT 1

Programming in C for ARM

Syllabus
Overview of C compilers and optimization, basic C data types, C looping structures, register allocation, function calls, pointer aliasing, structure arrangement, unaligned data and endianness
Overview of C Compiler
• Optimizing C programs aims at:
- Reducing memory requirements
- Increasing execution speed
• Optimized code is usually less readable than the original source.
• Usually, it's only worth optimizing functions that are frequently executed and important for performance.
• The profiling tools of ARM simulators can be used to find the most frequently executed instructions and functions.
Overview of C Compiler
Example: Case 1
• Compilers have to translate C functions into assembler so that they work for all possible inputs.
• In practice, many input combinations are not possible or won't occur.
• The memclr function clears N bytes of memory at address data.
• The value of N may or may not be 0. The compiler does not know this, so it needs to test N before the first iteration.
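A minimal sketch of a byte-by-byte memclr of the kind Case 1 describes. The exact signature `void memclr(char *data, int N)` is an assumption here. Because N may be 0, the loop condition is checked before the first store, which is exactly the test the compiler cannot skip:

```c
#include <assert.h>
#include <stddef.h>

/* Clear N bytes of memory starting at address data, one byte per store.
   N may be 0 (or negative), so the loop condition is tested before the
   first store: the compiler cannot assume the body runs at least once. */
void memclr(char *data, int N)
{
    assert(N <= 0 || data != NULL);   /* only dereference a valid pointer */
    for (; N > 0; N--) {
        *data = 0;
        data++;
    }
}
```

Calling `memclr(buf, 0)` leaves `buf` untouched; `memclr(buf, 4)` zeroes its first four bytes.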
Overview of C Compiler
Example: Case 2

The compiler does not know whether the data array pointer is four-byte aligned. If it is four-byte aligned, the compiler could clear four bytes at a time using an int store rather than a char store.
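A sketch of what the programmer can write when they know something the compiler does not. `memclr_aligned` is a hypothetical name, and the caller is assumed to guarantee that `data` is four-byte aligned. Whole words are cleared with one 32-bit store each, and any remaining tail bytes are cleared individually:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of memclr for callers that guarantee data is
   4-byte aligned. Each 32-bit store clears four bytes; the remaining
   0..3 bytes are cleared one at a time. */
void memclr_aligned(char *data, int N)
{
    assert(((uintptr_t)data & 3u) == 0);   /* precondition: 4-byte aligned */
    uint32_t *words = (uint32_t *)data;
    for (; N >= 4; N -= 4) {
        *words++ = 0;                      /* four bytes per store */
    }
    data = (char *)words;
    for (; N > 0; N--) {
        *data++ = 0;                       /* leftover tail bytes */
    }
}
```

The buffer passed in should really be word storage (for example a `uint32_t` array viewed through a `char *`), so that the 32-bit stores are both aligned and type-correct.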
Overview of C Compiler
Example: Case 3

The compiler does not know whether N is a multiple of four. If it is, the compiler could repeat the loop body four times (unroll the loop) or store four bytes at a time using an int.
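A sketch of the unrolled form, assuming the caller guarantees that N is a multiple of four (`memclr_unrolled` is a hypothetical name). The loop test and the counter and pointer updates now run once per four bytes instead of once per byte:

```c
#include <assert.h>

/* Hypothetical variant for callers that guarantee N is a multiple of 4.
   The loop body is repeated four times, so loop overhead is paid once
   per four bytes cleared. */
void memclr_unrolled(char *data, int N)
{
    assert(N % 4 == 0);          /* precondition: N is a multiple of four */
    for (; N > 0; N -= 4) {
        data[0] = 0;
        data[1] = 0;
        data[2] = 0;
        data[3] = 0;
        data += 4;
    }
}
```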
Overview of C Compiler

Compilers must be conservative and assume that N can be any value and that data can have any alignment.
This ensures the generated code works safely and reliably under all possible conditions.
Overview of C Compiler
• To write efficient C code, you must be aware of the limits of the specific C compiler and of the processor architecture the compiler is targeting.

• Examples of ARM C compilers:
- armcc from the ARM Developer Suite
- GNU C compiler (used in the LPCXpresso developer suite)
Basic data types
Local variable types
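A hedged illustration of the usual point about local variable types on ARM: registers are 32 bits wide, so an 8-bit `char` loop counter has to be wrapped back into the range 0 to 255 after each increment, which typically costs an extra instruction per iteration. The function name `checksum_v2` and the fixed length of 64 words are assumptions of this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Sum 64 words of data. The loop counter is unsigned int rather than
   char: a 32-bit counter maps directly onto an ARM register, while a
   char counter would need to be reduced to 0..255 on every increment. */
int checksum_v2(const int *data)
{
    unsigned int i;
    int sum = 0;

    assert(data != NULL);
    for (i = 0; i < 64; i++) {
        sum += data[i];
    }
    return sum;
}
```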
Function arguments
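A hedged sketch of the usual advice for argument types: arguments arrive in 32-bit registers, so `short` parameters and return values may force the compiler to insert sign-extension or truncation instructions at the call boundary. Using `int` avoids this. The name `add_v2` is illustrative:

```c
#include <assert.h>

/* With short a, short b and a short result, the compiler may need extra
   instructions to narrow values to 16 bits. Passing and returning int
   keeps the values in full 32-bit registers. */
int add_v2(int a, int b)
{
    return a + (b >> 1);   /* b >> 1 halves b when b is non-negative */
}
```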
Signed versus unsigned types
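A sketch of why signedness matters for division. C requires signed division to round toward zero, so dividing a possibly negative value by 2 cannot be a single arithmetic shift; the compiler typically adds a correction step first. Unsigned division by 2 is a plain logical shift. The function names are hypothetical:

```c
#include <assert.h>

/* Signed: (a + b) / 2 must round toward zero, so for negative sums the
   compiler adds a correction before shifting (usually one extra
   instruction). Overflow of a + b is ignored in this sketch. */
int average_signed(int a, int b)
{
    return (a + b) / 2;
}

/* Unsigned: division by 2 compiles to a single logical shift right. */
unsigned int average_unsigned(unsigned int a, unsigned int b)
{
    return (a + b) / 2;
}
```

For example, `average_signed(-3, 0)` gives -1 (rounded toward zero), whereas a plain arithmetic shift of -3 would give -2.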
Efficient use of C types
Register Allocation
• The compiler attempts to allocate a processor register to each local variable used in a C function.
• It will try to use the same register for different local variables if their lifetimes do not overlap.
• Variables that do not fit in the available registers are stored on the processor stack:
- These are called spilled (or swapped-out) variables.
- They are slower to access than variables held in registers.
Register Allocation
Register keyword
• The register keyword in C hints to the compiler that the given variable should be allocated to a register.
• However, different compilers treat this keyword in different ways, and different architectures have different numbers of available registers (for example, Thumb and ARM).
Function Calls
• The ARM Procedure Call Standard (APCS) defines how to pass function arguments and return values in ARM registers.
• The first four integer arguments are passed in the first four ARM registers: r0, r1, r2, and r3. Subsequent integer arguments are placed on the stack.
• Integer return values are passed in r0.
Function Calls
Four Register Rule
• Functions with four or fewer arguments are far more efficient
to call than functions with five or more arguments. For
functions with four or fewer arguments, the compiler can pass
all the arguments in registers.
• For functions with more arguments, both the caller and callee
must access the stack for some arguments.
• If a function needs more than four arguments, then it is
almost always more efficient to use structures. Group related
arguments into structures and pass a structure pointer
rather than multiple arguments.

Function Calls
• The next example illustrates the benefits of
using a structure pointer. First we show a
typical routine to insert N bytes from array
data into a queue. We implement the queue
using a cyclic buffer with start address Q_start
(inclusive) and end address Q_end (exclusive).

Circular queue without structure
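A sketch of the five-argument routine described above, with the cyclic buffer running from Q_start (inclusive) to Q_end (exclusive). The name `queue_bytes_v1` and the requirement that N is greater than zero are assumptions of this sketch:

```c
#include <assert.h>

/* Insert N bytes from data into the cyclic buffer [Q_start, Q_end),
   starting at Q_ptr, and return the updated write pointer.
   With five arguments, the fifth (N) is passed on the stack under the
   ARM calling convention. */
char *queue_bytes_v1(char *Q_start, char *Q_end, char *Q_ptr,
                     const char *data, unsigned int N)
{
    assert(N > 0);                 /* do-while always writes one byte */
    do {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end) {
            Q_ptr = Q_start;       /* wrap around to the buffer start */
        }
    } while (--N);
    return Q_ptr;
}
```

Writing six bytes into a four-byte buffer wraps once, leaving the write pointer two bytes past the start.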
Circular queue with structure to pass arguments
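A sketch of the structure-pointer version: the three queue pointers are grouped into one structure, so the function takes three arguments, all of which fit in r0 to r2. The type name `Queue` and the function name `queue_bytes_v2` are hypothetical:

```c
#include <assert.h>

/* Queue state grouped into a single structure. */
typedef struct {
    char *Q_start;   /* buffer start (inclusive) */
    char *Q_end;     /* buffer end (exclusive) */
    char *Q_ptr;     /* current write position */
} Queue;

/* Three arguments: all passed in registers under the ARM calling convention. */
void queue_bytes_v2(Queue *queue, const char *data, unsigned int N)
{
    char *Q_ptr = queue->Q_ptr;    /* local copies can live in registers */
    char *Q_end = queue->Q_end;

    assert(N > 0);                 /* do-while always writes one byte */
    do {
        *(Q_ptr++) = *(data++);
        if (Q_ptr == Q_end) {
            Q_ptr = queue->Q_start;
        }
    } while (--N);
    queue->Q_ptr = Q_ptr;          /* store the new position back once */
}
```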
Comparison
• The second version has only three function arguments rather than five.
• Each call to the function requires only three register setups.
• There is a net saving of two instructions in function call overhead.
Calling functions efficiently
• Try to restrict functions to four arguments. This makes them more efficient to call. Use structures to group related arguments and pass structure pointers instead of multiple arguments.
• Define small functions in the same source file and before the functions that call them. The compiler can then optimize the function call or inline the small function.
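A sketch of the second guideline: a small helper defined in the same source file, before the function that calls it. The names `clamp_to_byte` and `sum_clamped` are hypothetical, and the `static` qualifier is an extra assumption of this sketch; it gives the helper internal linkage, so the compiler can inline it and, if every call is inlined, omit the standalone copy:

```c
#include <assert.h>
#include <stddef.h>

/* Small helper defined before its caller in the same file. */
static int clamp_to_byte(int x)
{
    if (x < 0) return 0;
    if (x > 255) return 255;
    return x;
}

/* Caller: the compiler can see clamp_to_byte's body and inline it here. */
int sum_clamped(const int *v, size_t n)
{
    int total = 0;

    assert(n == 0 || v != NULL);
    for (size_t i = 0; i < n; i++) {
        total += clamp_to_byte(v[i]);
    }
    return total;
}
```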

Pointer Aliasing
Two pointers are said to alias when they point to the same address.

• In a function, the compiler often doesn't know which pointers can alias and which pointers can't.
• The following function increments two timer values by a step amount:

• Here the step value is loaded twice, which is inefficient.
• Ideally, the compiler would load the step value once and reuse it.
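A sketch of such a function, using pointer arguments (the name `timers_v1` and its signature are assumptions here). The second call in the usage below shows why the compiler must reload `*step`: when `timer1` and `step` point to the same int, the first addition changes the step value seen by the second:

```c
#include <assert.h>

/* Add *step to both timers. timer1 may point to the same int as step,
   so after the store to *timer1 the compiler must load *step again. */
void timers_v1(int *timer1, int *timer2, int *step)
{
    *timer1 += *step;
    *timer2 += *step;   /* *step is read a second time here */
}
```

With `a = 5, b = 10`, the call `timers_v1(&a, &b, &a)` leaves `a == 10` and `b == 20`, because the second addition sees the updated step.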
Pointer aliasing: Common subexpression elimination
• The compiler optimization called common subexpression elimination allows a subexpression to be evaluated only once and its value reused for the second occurrence.
• However, the compiler can't use this optimization here, because the pointers timer1 and step might alias one another.
• In other words, the compiler cannot be sure that the write to timer1 doesn't affect the read from step.
Aliasing in structures
• If you use structure accesses rather than direct pointer accesses, the compiler still evaluates state->step twice.
• To rectify this, create a new local variable to hold the value of state->step, so the compiler only performs a single load.
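A sketch of the structure version that this slide refers to. The type definitions `State` and `Timers` are assumptions chosen to match the timers_v3 code that follows. A compiler that cannot prove the store to `timers->timer1` leaves `state->step` unchanged must reload `state->step` for the second addition:

```c
#include <assert.h>

/* Hypothetical types for the structure version (also usable with timers_v3). */
typedef struct { int step; } State;
typedef struct { int timer1, timer2; } Timers;

/* state->step is read twice: the compiler may have to assume the store
   to timers->timer1 could change it. */
void timers_v2(State *state, Timers *timers)
{
    timers->timer1 += state->step;
    timers->timer2 += state->step;
}
```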
Aliasing in Structure (conti)
• In the code for timers_v3, we use a local variable step to hold the value of state->step.
void timers_v3(State *state, Timers *timers)
{
    int step = state->step;   /* state->step is loaded only once */
    timers->timer1 += step;
    timers->timer2 += step;
}
Avoiding Pointer Aliasing
• Do not rely on the compiler to eliminate common subexpressions involving memory accesses. Instead, create new local variables to hold the expression.
• This ensures the expression is evaluated only once.
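A sketch of the same advice applied to the pointer version (the name `timers_local` is hypothetical). The step value is copied into a local once, so both additions use the register copy. Note that this deliberately changes the result when `timer1` and `step` alias: both timers then receive the original step value:

```c
#include <assert.h>

/* Load *step once into a local variable and reuse it. */
void timers_local(int *timer1, int *timer2, const int *step)
{
    int s = *step;      /* single load */
    *timer1 += s;
    *timer2 += s;
}
```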
Structure Arrangement
• The way we lay out a frequently used structure can have a significant impact on its performance and code density.
• Two issues:
- Alignment of the structure entries
- The overall size of the structure
• ARM compilers align the start address of a structure to a multiple of the width of its largest element, and insert padding so that each entry is aligned to its own size (memory alignment and padding).
Structure Arrangement
Example: Little-endian memory system
Padding is added to ensure the int starts at location 4.
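A sketch of a structure with this kind of layout; the member names a, b, c and d are assumptions. On ARM, and on most common ABIs where int is four-byte aligned, the compiler pads after the leading char so that the int starts at offset 4, bringing 8 bytes of data up to 12 bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure: a at 0, padding at 1..3, b at 4,
   c at 8, padding at 9, d at 10. sizeof is 12. */
struct padded {
    char  a;
    int   b;
    char  c;
    short d;
};
```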
Structure arrangement
• To improve memory usage, we should reorder the elements from smallest to largest.

• This reduces the structure size from 12 bytes to 8 bytes.
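A sketch of the reordered layout, using the same hypothetical members as a padded 12-byte structure with a char, an int, a char and a short. With the smaller elements first, every member is already naturally aligned and no padding is needed:

```c
#include <assert.h>
#include <stddef.h>

/* Members ordered by increasing size: a at 0, c at 1, d at 2, b at 4.
   No padding is inserted, so sizeof is 8. */
struct reordered {
    char  a;
    char  c;
    short d;
    int   b;
};
```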
Structure Arrangement
Packed Structure
• The armcc compiler provides a keyword, __packed, that removes all padding.

• Packed structures are slow and inefficient to access.
• Only use the __packed keyword where space is far more important than speed and we can't reduce padding by rearranging the elements.
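armcc spells this keyword `__packed`; GCC and Clang use `__attribute__((packed))` instead, which is the form this sketch uses (the structure name is hypothetical). All padding disappears, so the int lands at offset 1 and must be accessed with unaligned or byte-by-byte loads, which is why packed structures are slower:

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang packed structure: a at 0, b at 1, c at 5, d at 6; 8 bytes. */
struct __attribute__((packed)) packed_example {
    char  a;
    int   b;
    char  c;
    short d;
};
```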
Structure arrangement

The armcc compiler treats an enum type such as Bool, which only uses the values 0 and 1, as a one-byte type, so Bool takes up only 8 bits of space in a structure.
However, gcc treats Bool as a word, so it takes up 32 bits of space in a structure.
To avoid ambiguity, it is best to avoid using enum types in structures used in the API to your code.
Load and store alignment restrictions for ARMv5TE

Transfer size   Instruction          Byte address
1 byte          LDRB, LDRSB, STRB    any byte address alignment
2 bytes         LDRH, LDRSH, STRH    multiple of 2 bytes
4 bytes         LDR, STR             multiple of 4 bytes
8 bytes         LDRD, STRD           multiple of 8 bytes
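A small helper that checks an address against these rules before an access (the name `is_aligned` is hypothetical). An address is suitably aligned for an n-byte transfer when it is a multiple of n:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if address p meets the alignment required for a transfer of
   'size' bytes (1, 2, 4 or 8, as in the ARMv5TE table), else 0. */
int is_aligned(const void *p, size_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return ((uintptr_t)p % size) == 0;
}
```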
Structure arrangement

Efficient Structure Arrangement
• Lay structures out in order of increasing element size. Start the structure with the smallest elements and finish with the largest.
• Avoid very large structures. Instead, use a hierarchy of smaller structures.
• Manually add padding to the structures so that the layout of the structure does not depend on the compiler.
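A sketch of the manual-padding guideline, using a hypothetical API structure. The padding bytes the compiler would otherwise insert silently are written out as named members, and a C11 `_Static_assert` (an addition of this sketch) fails the build if some compiler still lays the structure out differently:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical API structure with explicit padding:
   a at 0, pad0 at 1..3, b at 4, c at 8, pad1 at 9, d at 10. */
struct api_record {
    char  a;
    char  pad0[3];   /* explicit padding so b starts at offset 4 */
    int   b;
    char  c;
    char  pad1;      /* explicit padding so d starts at offset 10 */
    short d;
};

_Static_assert(sizeof(struct api_record) == 12, "unexpected compiler padding");
```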
Unaligned Data Access and Endianness
• Unaligned data and endianness are two issues that can complicate memory accesses and portability.
• The ARM load and store instructions assume that the address is a multiple of the size of the type we are loading or storing.
• If you load or store to an address that is not aligned to its type, then the behavior depends on the particular implementation. The core may generate a data abort or load a rotated value.
• For well-written, portable code, we should avoid unaligned accesses.
Unaligned Data Access and Endianness
• C compilers assume that a pointer is aligned.
• If a pointer isn't aligned, then the program may give unexpected results.
• This is sometimes an issue when porting code to the ARM from processors that do allow unaligned accesses.
• For armcc, the __packed directive tells the compiler that a data item can be positioned at any byte alignment.
• This is useful for porting code, but using __packed has a performance cost.
Endianness
• The ARM core can be configured to work in little-endian (least significant byte at lowest address) or big-endian (most significant byte at lowest address) mode.
• Little-endian mode is usually the default.
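A sketch of checking the configured byte order at run time (the name `is_little_endian` is hypothetical). It stores a known 32-bit value and inspects the byte at the lowest address using `memcpy`, which avoids any type-punning issues:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if the core is little-endian (least significant byte at the
   lowest address), 0 if it is big-endian. */
int is_little_endian(void)
{
    uint32_t value = 0x01020304u;
    unsigned char first;

    memcpy(&first, &value, 1);          /* byte at the lowest address */
    assert(first == 0x04 || first == 0x01);
    return first == 0x04;
}
```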
Avoiding endianness

• If speed is not critical, the functions above can be used.
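A sketch of byte-wise read functions of this kind (the names `read_u32_le` and `read_u32_be` are hypothetical). Each reads one byte at a time through an `unsigned char` pointer and combines the bytes with shifts and ORs, so it works at any alignment and returns the same value whichever endianness the core is configured for:

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from p (any alignment). */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit big-endian value from p (any alignment). */
uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

Reading from `bytes + 1` in the usage below is deliberately unaligned; the byte-wise loads make that safe.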
Endianness and Alignment
• Avoid using unaligned data if you can.
• Use the type char * for data that can be at any byte alignment. Access the data by reading bytes and combining them with logical operations. Then the code won't depend on alignment or on the ARM endianness configuration.
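The same byte-wise approach works for stores. A sketch of a little-endian write (the name `write_u32_le` is hypothetical) that splits a value into bytes with shifts and masks, so the result in memory does not depend on the alignment of the destination or on the core's endianness:

```c
#include <assert.h>
#include <stdint.h>

/* Store value at p as four little-endian bytes, one byte at a time. */
void write_u32_le(unsigned char *p, uint32_t value)
{
    assert(p != 0);
    p[0] = (unsigned char)(value & 0xFFu);
    p[1] = (unsigned char)((value >> 8) & 0xFFu);
    p[2] = (unsigned char)((value >> 16) & 0xFFu);
    p[3] = (unsigned char)((value >> 24) & 0xFFu);
}
```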
