EEC 214 Lecture 3
Computer
Microprocessor
CALL INSTRUCTIONS AND STACK
Another control transfer instruction is the CALL instruction, which is used to call a subroutine.
Subroutines are often used to perform tasks that need to be performed frequently. This makes a
program more structured in addition to saving memory space.
In the AVR there are four instructions for calling a subroutine: CALL (long call), RCALL (relative
call), ICALL (indirect call to Z), and EICALL (extended indirect call to Z). The choice of which
one to use depends on the target address. Each instruction is explained next.
If the stack is a section of RAM, there must be a register inside the CPU to point to it. The
register used to access the stack is called the SP (stack pointer) register. The SP is implemented
as two registers in the I/O memory space: SPH holds the high byte of the SP, and SPL holds the
low byte.
When the AVR is powered up, the SP register contains the value 0, which is the address of R0.
Therefore, we must initialize the SP at the beginning of the program so that it points to
somewhere in the internal SRAM.
In the AVR, the stack grows from higher memory locations to lower memory locations (when we
push onto the stack, the SP decrements). So, it is common to initialize the SP to the uppermost
memory location.
Different AVRs have different amounts of RAM. In the AVR assembler RAMEND represents the
address of the last RAM location. So, if we want to initialize the SP so that it points to the last
memory location, we can simply load RAMEND into the SP. Notice that SP is made of two
registers, SPH and SPL. So, we load the high byte of RAMEND into SPH, and the low byte of
RAMEND into the SPL.
Example 8 shows how to initialize the SP and use the PUSH and POP instructions. In the
example you can see how the stack changes when the PUSH and POP instructions are executed.
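The initialization and the PUSH and POP behavior described above can be sketched as follows. This is a minimal illustration, not the original Example 8; it assumes an ATmega32 (RAMEND = 0x085F), and the register choices and data values are arbitrary.

```asm
.INCLUDE "M32DEF.INC"
.ORG 0
        LDI  R16, HIGH(RAMEND)   ;R16 = high byte of RAMEND
        OUT  SPH, R16            ;SPH = high byte of SP
        LDI  R16, LOW(RAMEND)    ;R16 = low byte of RAMEND
        OUT  SPL, R16            ;SPL = low byte, so SP = 0x085F

        LDI  R20, 0x10           ;arbitrary test values
        LDI  R22, 0x30
        PUSH R20                 ;location 0x085F = 0x10, SP = 0x085E
        PUSH R22                 ;location 0x085E = 0x30, SP = 0x085D
        LDI  R20, 0              ;overwrite both registers
        LDI  R22, 0
        POP  R22                 ;R22 = 0x30, SP = 0x085E
        POP  R20                 ;R20 = 0x10, SP = 0x085F
HERE:   RJMP HERE                ;stay here
```

The values come back in the reverse order of the pushes, because the last byte pushed is the first one popped.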
When a subroutine is called, the processor first saves the address of the instruction just below the
CALL instruction on the stack, and then transfers control to that subroutine. This is how the CPU
knows where to resume when it returns from the called subroutine.
For the AVRs whose program counter is not longer than 16 bits (e.g., ATmega128, ATmega32),
the value of the program counter is broken into 2 bytes. The lower byte is pushed onto the stack
first, and then the higher byte is pushed.
For the AVRs whose program counters are longer than 16 bits but not longer than 24 bits, the
value of the program counter is broken up into 3 bytes. The lowest byte is pushed first, then the
middle byte is pushed, and finally the highest byte is pushed. So, in both cases, the lower bytes
are pushed first, and the highest byte ends up at the lowest stack address.
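As a rough sketch of how the return address is saved, the following program calls a subroutine on a device with a 16-bit program counter. It assumes an ATmega32 with the code placed at address 0; the label MY_SUB and the addresses in the comments are illustrative only.

```asm
.INCLUDE "M32DEF.INC"
.ORG 0
        LDI  R16, HIGH(RAMEND)   ;address 0x0000
        OUT  SPH, R16            ;address 0x0001
        LDI  R16, LOW(RAMEND)    ;address 0x0002
        OUT  SPL, R16            ;address 0x0003, SP = 0x085F
        CALL MY_SUB              ;addresses 0x0004-0x0005 (2-word instruction)
                                 ;return address 0x0006 is saved, SP = 0x085D
        NOP                      ;address 0x0006: execution resumes here
HERE:   RJMP HERE                ;stay here

MY_SUB: NOP                      ;body of the subroutine
        RET                      ;reload PC with 0x0006, SP = 0x085F again
```

CALL occupies two program words, so the saved address is that of the instruction two words after the start of the CALL.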
Note that the stack must not be placed in the general purpose register area or in the I/O memory
area. So, the SP must be set to point above 0x60, where the internal SRAM begins on devices
such as the ATmega32.
We must remember that upon calling a subroutine, the stack keeps track of where the CPU should
return after completing the subroutine. For this reason, we must be very careful when
manipulating the stack contents.
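One common way to respect this rule is to match every PUSH inside a subroutine with a POP before the RET, so that the return address is on top of the stack again when RET executes. A minimal sketch, assuming an ATmega32; MY_SUB, LOOP, and the register usage are hypothetical:

```asm
.INCLUDE "M32DEF.INC"
.ORG 0
        LDI  R16, HIGH(RAMEND)   ;initialize the SP first
        OUT  SPH, R16
        LDI  R16, LOW(RAMEND)
        OUT  SPL, R16
        LDI  R20, 0x55           ;value the caller wants preserved
        CALL MY_SUB              ;R20 is still 0x55 after the return
HERE:   RJMP HERE                ;stay here

MY_SUB: PUSH R20                 ;save the caller's R20 (above the return address)
        LDI  R20, 10             ;use R20 as a scratch loop counter
LOOP:   DEC  R20
        BRNE LOOP
        POP  R20                 ;restore R20; return address is on top again
        RET                      ;return to the caller
```

If the POP were omitted, RET would take the saved R20 value as part of the return address and the program would resume at the wrong location.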
The Z register must contain the address of the target subroutine when the ICALL instruction is
executed. Because the Z register is 16 bits wide, the ICALL instruction can call only subroutines
that are within the lowest 64K words of the program memory. (The target address calculation in
ICALL is the same as for the IJMP instruction.)
The ICALL and EICALL instructions can be used to implement function pointers.
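A minimal sketch of an indirect call through Z, assuming an ATmega32; the subroutine name LED_ON and its body are made up for illustration:

```asm
.INCLUDE "M32DEF.INC"
.ORG 0
        LDI  R16, HIGH(RAMEND)   ;initialize the SP (ICALL saves a return address)
        OUT  SPH, R16
        LDI  R16, LOW(RAMEND)
        OUT  SPL, R16
        LDI  ZL, LOW(LED_ON)     ;Z = word address of the subroutine
        LDI  ZH, HIGH(LED_ON)
        ICALL                    ;call the subroutine that Z points to
HERE:   RJMP HERE                ;stay here

LED_ON: SBI  DDRB, 0             ;make PB0 an output
        SBI  PORTB, 0            ;drive PB0 high
        RET
```

Loading a different label into Z before the ICALL selects a different subroutine at run time, which is the effect of a function pointer.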
1. The crystal frequency: The frequency of the crystal oscillator connected to the XTAL1 and
XTAL2 input pins is one factor in the time delay calculation. The duration of the clock period
for the instruction cycle is a function of this crystal frequency.
2. Indeed, one way to increase performance without losing code compatibility with the older
generation of a given family is to reduce the number of instruction cycles it takes to execute
an instruction. One might wonder how microprocessors such as AVR are able to execute an
instruction in one cycle. There are three ways to do that:
(a) Use Harvard architecture to get the maximum amount of code and data into the CPU,
(b) use RISC architecture features such as fixed-size instructions, and finally
(c) use pipelining to overlap fetching and execution of instructions.
The idea of pipelining in its simplest form is to allow the CPU to fetch and execute at the same
time, as shown in Figure 12. (One instruction is fetched while the previous instruction executes.)
In this way, the execution of many instructions is overlapped. One limitation of pipelining is that
the speed of execution is limited to the slowest stage of the pipeline. Compare this to making
pizza.
As shown in Figure 13, in the AVR, each instruction is executed in 3 stages: operand fetch, ALU
operation execution, and result write back. In step 1, the operand is fetched. In step 2, the
operation is performed; for example, the two numbers are added. In step 3, the result is written
into the destination register. It should be noted that in many computer architecture books, the
process stage is referred to as execute, and write back is called write.
This is called a branch penalty. The penalty is an extra instruction cycle to fetch the instruction
from the target location instead of executing the instruction right below the branch. Remember
that the instruction below the branch has already been fetched and is next in line to be executed
when the CPU branches to a different address.
Some instructions take two, three, or four machine cycles. These are JMP, CALL, RET, and all
the conditional branch instructions such as BRNE, BRLO, and so on.
The conditional branch instruction can take only one machine cycle if it does not jump. For
example, the BRNE will jump if Z = 0, and that takes two machine cycles. If Z = 1, then it falls
through and it takes only one machine cycle.
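These cycle counts are what determine the length of a software delay loop. The sketch below assumes an ATmega32 running from an 8 MHz clock (one machine cycle = 0.125 us) and the usual counts for a device with a 16-bit program counter (LDI, NOP, and DEC take 1 cycle each; RET takes 4); the loop count of 255 is arbitrary.

```asm
.INCLUDE "M32DEF.INC"
.ORG 0
        LDI  R16, HIGH(RAMEND)   ;initialize the SP before using CALL
        OUT  SPH, R16
        LDI  R16, LOW(RAMEND)
        OUT  SPL, R16
        CALL DELAY
HERE:   RJMP HERE                ;stay here

DELAY:  LDI  R20, 0xFF           ;1 cycle, loop count = 255
AGAIN:  NOP                      ;1 cycle
        NOP                      ;1 cycle
        DEC  R20                 ;1 cycle
        BRNE AGAIN               ;2 cycles when it jumps, 1 when it falls through
        RET                      ;4 cycles
;Loop: 255 x 5 cycles, minus 1 for the final fall-through = 1274 cycles
;Total for DELAY: 1 (LDI) + 1274 + 4 (RET) = 1279 cycles
;At 8 MHz: 1279 x 0.125 us = 159.875 us (CALL overhead not included)
```

The last pass through BRNE costs one cycle less than the others because the branch is not taken, which is why 1 is subtracted from the loop total.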
Each port has three I/O registers associated with it, as shown in Table 2. They are designated as
PORTx, DDRx, and PINx. For example, for Port B we have PORTB, DDRB, and PINB. Notice
that DDR stands for Data Direction Register, and PIN stands for Port INput pins. Also notice
that each of the I/O registers is 8 bits wide, and each port has a maximum of 8 pins; therefore,
each bit of the I/O registers affects one of the pins (see Figure 2; the content of bit 0 of DDRB
represents the direction of the PB0 pin, and so on).
It must be noted that unless we set the DDRx bits to one, the data will not go from the port
register to the pins of the AVR.
The pins of the AVR microcontrollers can be in four different states according to the values of
PORTx and DDRx, as shown in Figure 5.
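The four combinations of a DDRx bit and the matching PORTx bit can be sketched on four pins of one port. This is a minimal illustration assuming an ATmega32 with the internal pull-ups left enabled (the default); the choice of Port B and of pins PB0 to PB3 is arbitrary.

```asm
.INCLUDE "M32DEF.INC"
        LDI  R16, 0b00001100     ;DDRB: PB3 and PB2 outputs, all other pins inputs
        OUT  DDRB, R16
        LDI  R16, 0b00001010     ;PORTB bits select the state of each pin:
        OUT  PORTB, R16
                                 ;PB3: DDR = 1, PORT = 1 -> output, driven high
                                 ;PB2: DDR = 1, PORT = 0 -> output, driven low
                                 ;PB1: DDR = 0, PORT = 1 -> input, pull-up enabled
                                 ;PB0: DDR = 0, PORT = 0 -> input, high impedance
HERE:   RJMP HERE                ;stay here
```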
.INCLUDE "M32DEF.INC"
.EQU MYTEMP = 0x100 ;save it here
LDI R16,0x00 ;R16 = 00000000 (binary)
OUT DDRA,R16 ;make Port A an input port (0 for In)
NOP ;synchronizer delay
IN R16,PINA ;move from pins of Port A to R16
STS MYTEMP,R16 ;save it in MYTEMP
Synchronizer delay
The input circuit of the AVR has a delay of 1 clock cycle. In other words, the PIN register
represents the data that was present at the pins one clock cycle earlier. In the above code, when
the instruction “IN R16,PINA” is executed, the PINA register contains the data that was present
at the pins one clock cycle before. That is why the NOP is placed before the “IN R16,PINA”
instruction. (If the NOP is omitted, the value read is the state of the pins from when the port was
still configured as an output.)
For example, the following code will continuously send out the alternating values of 0x55 and
0xAA to Port B:
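A minimal sketch of such a program, assuming an ATmega32. The DELAY subroutine, the labels, and the loop counts are hypothetical; they only slow the toggling down enough to be observable.

```asm
.INCLUDE "M32DEF.INC"
.ORG 0
        LDI  R16, HIGH(RAMEND)   ;initialize the SP because CALL is used
        OUT  SPH, R16
        LDI  R16, LOW(RAMEND)
        OUT  SPL, R16
        LDI  R16, 0xFF           ;R16 = 11111111 (binary)
        OUT  DDRB, R16           ;make Port B an output port (1 for Out)
L1:     LDI  R16, 0x55           ;R16 = 0x55
        OUT  PORTB, R16          ;put 0x55 on the Port B pins
        CALL DELAY
        LDI  R16, 0xAA           ;R16 = 0xAA
        OUT  PORTB, R16          ;put 0xAA on the Port B pins
        CALL DELAY
        RJMP L1                  ;repeat forever

DELAY:  LDI  R20, 0xFF           ;outer loop count (arbitrary)
D1:     LDI  R21, 0xFF           ;inner loop count (arbitrary)
D2:     DEC  R21
        BRNE D2                  ;inner loop
        DEC  R20
        BRNE D1                  ;outer loop
        RET
```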
.INCLUDE "M32DEF.INC"
.EQU MYTEMP=0x100 ;save it here
LDI R16,0x00 ;R16 = 00000000 (binary)
OUT DDRB,R16 ;make Port B an input port (0 for In)
NOP
IN R16,PINB ;move from pins of Port B to R16
STS MYTEMP,R16 ;save it in MYTEMP
.INCLUDE "M32DEF.INC"
.EQU MYTEMP = 0x100 ;save it here
LDI R16,0x00 ;R16 = 00000000 (binary)
OUT DDRC,R16 ;make Port C an input port (0 for In)
NOP
IN R16,PINC ;move from pins of Port C to R16
STS MYTEMP,R16 ;save it in MYTEMP
.INCLUDE "M32DEF.INC"
.EQU MYTEMP = 0x100 ;save it here
LDI R16,0x00 ;R16 = 00000000 (binary)
OUT DDRD,R16 ;make Port D an input port (0 for In)
NOP
IN R16,PIND ;move from pins of Port D to R16
STS MYTEMP,R16 ;save it in MYTEMP