SYSTEM SOFTWARE

Module 2
ASSEMBLERS-2

Dr. C.K. SRINIVAS Professor DEPT OF CSE BITM, BELLARI
Machine-Independent Assembler Features


Key points of this section:
1) The implementation of literals within an assembler,
2) Two assembler directives (EQU and ORG),
3) The use of expressions in assembler language,
4) Program blocks and control sections.

Literals
It is often convenient for the programmer to write the value of a constant operand as part of the
instruction that uses it. This avoids having to define the constant elsewhere in the program and
make up a label for it. Such an operand is called a literal.

Consider the following example, in which the constant is defined separately and referenced by its label:

        :
        LDA     FIVE
        :
FIVE    WORD    5
        :

With a literal, the value of the constant is written as part of the instruction itself:

        :
        LDA     =X'05'
        :

A literal is identified by the prefix =, followed by a specification of the literal value.


Literals vs. Immediate Operands


It is important to understand the difference between a literal and immediate operand.
1. With immediate addressing, the operand value is assembled as part of the machine instruction.
2. With a literal, the assembler generates the specified value as a constant at some other memory
location. The address of this generated constant is used as target address for the machine
instruction.

All of the literal operands used in a program are gathered together into one or more literal pools.
Normally literals are placed into a pool at the end of the program. In some cases, it is desirable to
place literals into a pool at some other location in the object program.
When the assembler encounters an LTORG statement, it creates a literal pool that contains all of
the literal operands used since the previous LTORG (or the beginning of the program). This
literal pool is placed in the object program at the location where the LTORG directive was
encountered.

Of course, literals placed in a pool by LTORG will not be repeated in the pool at the end of the
program. If we had not used the LTORG statement, the literal =C’EOF’ would be placed in the
pool at the end of the program. Most assemblers recognize duplicate literals – that is, the same
literal used in more than one place in the program – and store only one copy of the specified data
value.

How to find the duplicate literals?


The easiest way to recognize duplicate literals is to compare the character strings that define
them (for example, the string =X'05'). The basic data structure the assembler uses to handle
literal operands is the literal table, LITTAB. For each literal used, this table contains the
literal name, the operand value and length, and the address assigned to the operand when it is
placed in a literal pool. LITTAB is often organized as a hash table, using the literal name or
value as the key.

Format of LITTAB

NAME        OPERAND VALUE   LENGTH   ADDRESS
=X'05'      05              1        1076
=C'EOF'     454F46          3        002D


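During Pass 1, the assembler searches LITTAB for the specified literal name (or value) each time
a literal operand is recognized. If the literal is already present in the table, no action is
needed. If it is not present, the literal is added to LITTAB (leaving the address unassigned).
When Pass 1 encounters an LTORG statement or the end of the program, the assembler scans LITTAB
and assigns an address to each literal that has not yet been placed in a pool.

During Pass 2, the operand address for each literal operand encountered is obtained by searching
LITTAB. If a literal value represents an address in the program, the assembler must also generate
the appropriate Modification record.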
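
The LITTAB handling described above can be sketched in Python. This is only an illustration under stated assumptions: the class and function names (Littab, Literal, literal_value, place_pool) are hypothetical, only =C'...' and =X'...' literals are handled, and a Python dict stands in for the hash table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Literal:
    value_hex: str                  # operand value as hexadecimal digits
    length: int                     # length in bytes
    address: Optional[int] = None   # unassigned until a literal pool is created


def literal_value(name: str) -> str:
    """Convert a literal such as =C'EOF' or =X'05' to its hex operand value."""
    kind, body = name[1], name[3:-1]
    if kind == "C":
        return body.encode("ascii").hex().upper()
    if kind == "X":
        return body.upper()
    raise ValueError(f"unsupported literal: {name}")


class Littab:
    def __init__(self) -> None:
        self.table: Dict[str, Literal] = {}   # dict used as the hash table

    def reference(self, name: str) -> None:
        """Pass 1: add the literal only if it is not already in the table."""
        if name not in self.table:
            value = literal_value(name)
            self.table[name] = Literal(value, len(value) // 2)

    def place_pool(self, locctr: int) -> int:
        """At LTORG or END: give each unplaced literal an address; return new LOCCTR."""
        for lit in self.table.values():
            if lit.address is None:
                lit.address = locctr
                locctr += lit.length
        return locctr

    def address_of(self, name: str) -> int:
        """Pass 2: look up the operand address of a literal."""
        addr = self.table[name].address
        if addr is None:
            raise KeyError(f"literal {name} was never placed in a pool")
        return addr


littab = Littab()
littab.reference("=C'EOF'")
next_loc = littab.place_pool(0x002D)   # LTORG encountered with LOCCTR = 002D
littab.reference("=X'05'")
littab.reference("=X'05'")             # duplicate literal: no second entry
end_loc = littab.place_pool(0x1076)    # pool at the end of the program
```

Because the table is keyed by the literal's defining string, the second reference to =X'05' finds the existing entry, so only one copy of the value is stored.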

Symbol-Defining Statements

The user-defined symbols in assembler language programs appear as labels on instructions or
data areas. The value of such a label is the address assigned to the statement on which it appears.
Most assemblers provide an assembler directive that allows the programmer to define symbols
and specify their values. The assembler directive generally used is EQU.

The general form: symbol EQU value

This statement defines the given symbol (enters it into SYMTAB) and assigns to it the value
specified. The value may be given as a constant or as an expression involving constants and
previously defined symbols. One common use of EQU is to establish symbolic names that can be
used for improved readability in place of numeric values.
For example, the statement
        +LDT    #4096
can be written as
MAXLEN  EQU     4096
        +LDT    #MAXLEN
When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB (with
value 4096).
Another common use of EQU is in defining mnemonic names for registers.

For example:
A EQU 0
X EQU 1
L EQU 2

These statements cause the symbols A, X, L to be entered into SYMTAB with their
corresponding values 0, 1, 2.

Another common assembler directive is ORG.

Its form is: ORG value

where value is a constant or an expression involving constants and previously defined symbols.
When this statement is encountered during assembly of a program, the assembler resets its
location counter (LOCCTR) to the specified value. Since the values of symbols are taken from
LOCCTR, the ORG statement affects the values of all labels defined until the next ORG.


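Example:
Consider a symbol table STAB with 100 entries, each laid out as follows:
SYMBOL field – 6 bytes, VALUE field – one word (3 bytes), FLAGS field – 2 bytes.

                SYMBOL      VALUE     FLAGS
STAB
(100 entries)
                . . .       . . .     . . .

Using indexed addressing with EQU:

Reserve space for the table (100 entries of 11 bytes each):
STAB    RESB    1100
Refer to each field by its offset from STAB:
SYMBOL  EQU     STAB
VALUE   EQU     STAB+6
FLAGS   EQU     STAB+9
Ex: To fetch the VALUE field of the entry selected by register X:
        LDA     VALUE,X

Using ORG (LOCCTR is used to address the fields):

STAB    RESB    1100
        ORG     STAB
SYMBOL  RESB    6
VALUE   RESW    1
FLAGS   RESB    2
        ORG     STAB+1100
(*The last ORG sets LOCCTR back to the address that follows the table.)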
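
As a rough check that the EQU and ORG approaches give the same symbol values, the following Python sketch runs a tiny Pass 1 over both versions of the STAB definition. The function name assemble_symbols and the tuple format are hypothetical, the ORG sequence is the usual textbook formulation and is an assumption here, and only the directives needed for this example are handled.

```python
def assemble_symbols(lines, start=0):
    """Tiny Pass 1 over (label, op, operand) tuples; returns (SYMTAB, final LOCCTR)."""
    symtab, locctr = {}, start

    def value(expr):
        # Supports NAME, NUMBER and NAME+NUMBER; symbols must already be defined.
        return sum(int(t) if t.isdigit() else symtab[t] for t in expr.split("+"))

    for label, op, operand in lines:
        if op == "EQU":
            symtab[label] = value(operand)   # value comes from the expression
            continue
        if op == "ORG":
            locctr = value(operand)          # reset the location counter
            continue
        if label:
            symtab[label] = locctr           # ordinary label: value is LOCCTR
        if op == "RESB":
            locctr += int(operand)
        elif op == "RESW":
            locctr += 3 * int(operand)       # a SIC word is 3 bytes
    return symtab, locctr


with_equ = [("STAB", "RESB", "1100"),
            ("SYMBOL", "EQU", "STAB"),
            ("VALUE", "EQU", "STAB+6"),
            ("FLAGS", "EQU", "STAB+9")]

with_org = [("STAB", "RESB", "1100"),
            ("", "ORG", "STAB"),
            ("SYMBOL", "RESB", "6"),
            ("VALUE", "RESW", "1"),
            ("FLAGS", "RESB", "2"),
            ("", "ORG", "STAB+1100")]   # last ORG puts LOCCTR back after the table

equ_tab, equ_end = assemble_symbols(with_equ)
org_tab, org_end = assemble_symbols(with_org)
```

In both cases SYMBOL, VALUE and FLAGS end up at offsets 0, 6 and 9 from STAB, and LOCCTR ends at STAB+1100.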

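Notice that the two-pass assembler design requires that all symbols be defined during Pass 1, so
a symbol used on the right-hand side of an EQU must already be defined.
Example:

Allowed:
ALPHA   RESW    1
BETA    EQU     ALPHA

Not allowed:
BETA    EQU     ALPHA
ALPHA   RESW    1
(*BETA cannot be assigned a value when the EQU statement is encountered, because ALPHA is not
yet defined.)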


Another example:
The following sequence of statements cannot be resolved by an ordinary two-pass assembler,
regardless of how the work is divided between the passes.

ALPHA   EQU     BETA
BETA    EQU     DELTA
DELTA   RESW    1


Expressions
Most assemblers allow the use of expressions. Each such expression must be evaluated by the
assembler to produce a single operand address or value.
Expressions are classified as either absolute expressions or relative expressions.

Relative: means relative to the beginning of the program. Labels on instructions and data areas,
and references to the location counter value, are relative terms.
Absolute: means independent of program location. A constant is an absolute term.

Note: A symbol whose value is given by EQU (or some similar assembler directive) may be
either an absolute term or a relative term, depending on the expression used to define its value.
An absolute expression contains only absolute terms, or contains relative terms that occur in
pairs in which the two terms of each pair have opposite signs. None of the relative terms may
enter into a multiplication or division operation.
A relative expression is one in which all of the relative terms except one can be paired as
described above; the remaining unpaired relative term must have a positive sign.

Example: 107 MAXLEN EQU BUFEND-BUFFER

Both BUFEND and BUFFER are relative terms, each representing an address within the
program. However, the expression represents an absolute value: the difference between the two
addresses which is the length of the buffer area in bytes.

Example: BUFEND + BUFFER, 100 - BUFFER, and 3 * BUFFER represent neither absolute
values nor locations within the program. Because such expressions are very unlikely to be of any
use, they are considered errors.
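
The pairing rule can be sketched in Python as a count of signed relative terms. This is a simplified illustration, not a full expression parser: the function name classify is hypothetical, parentheses are not supported, and the symbol types are supplied by the caller.

```python
import re


def classify(expr, symbol_types):
    """Return 'A' (absolute), 'R' (relative) or 'invalid' for a flat expression.

    symbol_types maps each symbol to 'A' or 'R'; numeric constants are absolute.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/]", expr)
    net_relative = 0     # relative terms with '+' minus relative terms with '-'
    sign = 1
    for i, tok in enumerate(tokens):
        if tok == "+":
            sign = 1
        elif tok == "-":
            sign = -1
        elif tok in ("*", "/"):
            continue
        elif not tok.isdigit() and symbol_types[tok] == "R":
            neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
            if "*" in neighbours or "/" in neighbours:
                return "invalid"     # relative term in multiplication/division
            net_relative += sign
    if net_relative == 0:
        return "A"                   # every relative term is paired
    if net_relative == 1:
        return "R"                   # one unpaired relative term, positive sign
    return "invalid"


types = {"BUFEND": "R", "BUFFER": "R", "MAXLEN": "A"}
results = {e: classify(e, types) for e in
           ["BUFEND-BUFFER", "BUFFER+MAXLEN", "BUFEND+BUFFER",
            "100-BUFFER", "3*BUFFER"]}
```

A net count of zero corresponds to an absolute expression and a net count of one to a relative expression; anything else is flagged as an error.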
To determine the type of an expression, we must keep track of the types of all symbols defined
in the program. With this information, the assembler can easily determine the type of each
expression used as an operand and generate Modification records in the object program for
relative values.
SYMTAB (with symbol types)

Symbol    Type   Value
RETADR    R      30
BUFFER    R      36
BUFEND    R      1036
MAXLEN    A      1000

SYMTAB

Name      Value
COPY      0
FIRST     0
CLOOP     6
ENDFIL    1A
RETADR    30
LENGTH    33
BUFFER    36
BUFEND    1036
MAXLEN    1000
RDREC     1036
RLOOP     1040
EXIT      1056
INPUT     105C
WREC      105D
WLOOP     1062

LITTAB

Name      Operand value   Length   Address
C'EOF'    454F46          3        002D
X'05'     05              1        1076


Program Blocks

Program blocks are segments of code that are rearranged within a single object program unit,
whereas control sections are segments that are translated into independent object program
units. Program blocks allow the generated machine instructions and data to appear in the object
program in a different order from the source statements, by using separate blocks for code,
data, stack, and larger data blocks.

Fig shows our example program, as it might be written using program blocks.

Block name   Block number   Address   Length
(default)    0              0000      0066
CDATA        1              0066      000B
CBLKS        2              0071      1000

In this case three blocks are used:


1. The first (unnamed) program block contains the executable instructions of the program.
2. The second (named CDATA) contains all data areas that are a few words or less in length.
3. The third (named CBLKS) contains all data areas that consist of larger blocks of memory.

The assembler directive USE indicates which portions of the source program belong to the
various blocks.
At the beginning of the program, statements belong to the default (unnamed) block.
Line 92 signals the beginning of CDATA.
Line 103 begins the CBLKS block.
Line 123 resumes the default block.
Line 183 resumes CDATA.
Line 208 resumes the default block.
Line 252 resumes CDATA.

The assembler accomplishes this logical rearrangement of code by maintaining, during Pass 1, a
separate location counter for each program block. Thus each label in the program is assigned an
address that is relative to the start of the block that contains it. At the end of Pass 1, the latest
value of the location counter for each block indicates the length of that block. The assembler can
then assign to each block a starting address in the object program (beginning with relative
location 0).

For code generation during Pass 2, the assembler needs the address for each symbol relative to
the start of the object program (not the start of an individual program block). This is easily found
from the information in SYMTAB. The assembler simply adds the location of the symbol,
relative to the start of its block, to the assigned block starting address.
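
A minimal Python sketch of these two steps, using the block lengths from the table above. The function names and the (block, offset) tuple stored in SYMTAB are assumptions made for illustration.

```python
def assign_block_starts(block_lengths):
    """block_lengths: {block name: length}, in block-number order.

    Returns ({block name: starting address}, total program length).
    """
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address
        address += length        # the next block begins where this one ends
    return starts, address


def absolute_address(symbol, symtab, starts):
    """Add the symbol's offset within its block to the block's starting address."""
    block, offset = symtab[symbol]
    return starts[block] + offset


# Final location counter value of each block at the end of Pass 1.
block_lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
starts, program_length = assign_block_starts(block_lengths)

symtab = {"LENGTH": ("CDATA", 0x0003)}   # (block, address relative to block start)
length_address = absolute_address("LENGTH", symtab, starts)
```

With these lengths, CDATA starts at 0066, CBLKS at 0071, and the total program length is 1071, which matches the length field of the Header record.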


Fig 2.12


Fig 2.12 shows this process applied to our sample program. Notice that the symbol MAXLEN
(line 107) is shown without a block number. It is an absolute symbol.

Consider an example:

20   0006   0        LDA     LENGTH      032060

SYMTAB shows the value of the operand (LENGTH) as relative location 0003 within
program block 1 (CDATA). The starting address of CDATA is 0066. Thus the desired target
address for this instruction is 0003 + 0066 = 0069.

Example of Address Calculation

The value of the operand (LENGTH):
Address 0003 relative to block 1 (CDATA)
Address 0003 + 0066 = 0069 relative to the program
When this instruction is executed:
PC = 0000 (starting address of the default block) + 0009 = 0009
disp = 0069 - 0009 = 0060

opcode   n  i  x  b  p  e  disp
000000   1  1  0  0  1  0  060

SYMTAB entry:

Label name   Block number   Address   Flag
LENGTH       1              0003
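
The arithmetic above can be checked with a short Python sketch that assembles a SIC/XE format 3 instruction using PC-relative addressing. The function name format3_pc_relative is hypothetical, and only simple addressing (n = i = 1) is covered.

```python
def format3_pc_relative(opcode, target, next_instr, indexed=False):
    """Assemble a format 3 instruction with PC-relative addressing.

    opcode is the 8-bit opcode with its two low-order bits zero.
    next_instr is the address of the following instruction (the PC value).
    """
    disp = target - next_instr
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement out of PC-relative range")
    n, i, x, b, p, e = 1, 1, int(indexed), 0, 1, 0
    first_byte = opcode | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1) | e
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"


LDA = 0x00
target = 0x0066 + 0x0003      # start of CDATA + offset of LENGTH = 0069
object_code = format3_pc_relative(LDA, target, next_instr=0x0009)
```

The displacement is masked to 12 bits, so a backward reference (negative displacement) comes out in two's complement form.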

Object Program
It is not necessary to physically rearrange the generated code in the object program. The
assembler simply inserts the proper load address in each Text record, and the loader then loads
the code from each record into the correct place.

H^COPY  ^000000^001071
T^000000^1E^172063^4B2021^032060^290000^332006^4B203B^3F2FEE^032055^0F2056^010003
T^00001E^09^0F2048^4B2029^3E203F
T^000027^1D^B410^B400^B440^75101000^E32038^332FFA^DB2032^A004^332008^57A02F^B850
T^000044^09^3B2FEA^13201F^4F0000
T^00006C^01^F1
T^00004D^19^B410^772017^E3201B^332FFA^53A016^DF2012^B850^3B2FEE^4F0000
T^00006D^04^454F46^05
E^000000


Fig: Program blocks loaded in memory (one area of the figure is marked as not present in the
object program).

Control Sections:

A control section is a part of the program that maintains its identity after assembly; each
control section can be loaded and relocated independently of the others. Different control
sections are most often used for subroutines or other logical subdivisions. The programmer can
assemble, load, and manipulate each of these control sections separately.

Because of this, there should be some means for linking control sections together. For
example, instructions in one control section may refer to the data or instructions of other control
sections. Since control sections are independently loaded and relocated, the assembler is unable
to process these references in the usual way. Such references between different control sections
are called external references.

The assembler generates information about each external reference that will allow the
loader to perform the required linking. When a program is written using multiple control
sections, the beginning of each control section is indicated by the assembler directive CSECT.


The syntax is:
secname CSECT
The assembler establishes a separate location counter for each control section.

Control sections differ from program blocks in that they are handled separately by the assembler.
Symbols that are defined in one control section may not be used directly by another control
section; they must be identified as external references for the loader to handle. External
references are indicated by two assembler directives:

EXTDEF (external definition):

The EXTDEF statement in a control section names symbols, called external symbols, that are
defined in this section and may be used by other control sections. Control section names do not
need to be named in an EXTDEF statement, because they are automatically considered to be
external symbols.

EXTREF (external Reference):


It names symbols that are used in this section but are defined in some other control
section. The order in which these symbols are listed is not significant. The assembler must
include proper information about the external references in the object program that will cause the
loader to insert the proper value where they are required.


Handling External Reference

Case 1

15    0003   CLOOP    +JSUB   RDREC           4B100000

• The operand RDREC is an external reference.
   o The assembler has no idea where RDREC will be located, so it inserts an address of zero.
   o Only extended format can be used, to provide enough room for the actual address (that
     is, relative addressing for an external reference is invalid).
• The assembler generates information for each external reference that will allow the loader
  to perform the required linking.

Case 2

190   0028   MAXLEN   WORD    BUFEND-BUFFER   000000

• There are two external references in the expression, BUFEND and BUFFER.
• The assembler inserts a value of zero.
• The assembler passes information to the loader, instructing it to:
   o add to this data area the address of BUFEND
   o subtract from this data area the address of BUFFER

Case 3

On line 107, BUFEND and BUFFER are defined in the same control section and the expression
can be calculated immediately.

107 1000 MAXLEN EQU BUFEND-BUFFER


Object Code for the example program:


The assembler must also include information in the object program that will cause the loader to
insert the proper values where they are required. To do this, the assembler uses two new record
types in the object program, the Define record and the Refer record, and a revised version of the
Modification record.

A define record gives information about the external symbols that are defined in this control
section, i.e., symbols named by EXTDEF.

Define record (EXTDEF)

• Col. 1       D
• Col. 2-7     Name of external symbol defined in this control section
• Col. 8-13    Relative address within this control section (hexadecimal)
• Col. 14-73   Repeat information in Col. 2-13 for other external symbols

A refer record lists the symbols that are used as external references by the control section, i.e.,
symbols named by EXTREF.

Refer record (EXTREF)

• Col. 1       R
• Col. 2-7     Name of external symbol referred to in this control section
• Col. 8-73    Names of other external reference symbols

The new items in the revised Modification record specify the modification to be performed:
adding or subtracting the value of some external symbol. The symbol used for modification may
be defined either in this control section or in another section.

Modification record (revised)
• Col. 1       M
• Col. 2-7     Starting address of the field to be modified, relative to the beginning of the
               control section (hexadecimal)
• Col. 8-9     Length of the field to be modified, in half-bytes (hexadecimal)
• Col. 10      Modification flag (+ or -)
• Col. 11-16   External symbol whose value is to be added to or subtracted from
               the indicated field
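
The column layouts above can be turned into small formatting helpers, sketched here in Python. The function names are hypothetical, field separators (^) are omitted, and the address used for BUFFER in the Define record is only an illustrative value. The Modification records correspond to line 15 (+JSUB RDREC at 0003, so the address field starts at 0004 and is 5 half-bytes long) and line 190 (a full word at 0028).

```python
def define_record(symbols):
    """symbols: list of (name, relative address) pairs named by EXTDEF."""
    return "D" + "".join(f"{name:<6}{addr:06X}" for name, addr in symbols)


def refer_record(names):
    """names: symbols named by EXTREF; no address information is available."""
    return "R" + "".join(f"{name:<6}" for name in names)


def modification_record(start, half_bytes, sign, symbol):
    """Revised Modification record: field address, length, flag, external symbol."""
    return f"M{start:06X}{half_bytes:02X}{sign}{symbol:<6}"


define = define_record([("BUFFER", 0x000033)])        # illustrative address
refer = refer_record(["RDREC", "WRREC"])
mod_jsub = modification_record(0x0004, 5, "+", "RDREC")
mod_word = [modification_record(0x0028, 6, "+", "BUFEND"),
            modification_record(0x0028, 6, "-", "BUFFER")]
```

Symbol names are padded with blanks to six columns, so a five-character name such as RDREC is followed by one space.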


The object program is shown below. There is a separate object program for each of the
control sections. In the Define Record and refer record the symbols named in EXTDEF and
EXTREF are included.

In the case of Define, the record also indicates the relative address of each external
symbol within the control section.

For EXTREF symbols, no address information is available. These symbols are simply
named in the Refer record.


Handling Expressions in Multiple Control Sections:

The existence of multiple control sections that can be relocated independently of one
another makes the handling of expressions more complicated. As before, all the relative terms in
an expression must be paired (for an absolute expression), or all except one must be paired (for
a relative expression).

In a program with multiple control sections there is an additional restriction:

• Both terms in each pair of an expression must be within the same control section.
   o If two terms represent relative locations within the same control section, their
     difference is an absolute value (regardless of where the control section is located).
        Legal: BUFEND-BUFFER (both are in the same control section)
   o If the terms are located in different control sections, their difference has a value
     that is unpredictable.
        Illegal: RDREC-COPY (the terms are in different control sections). Its value is the
        difference between the load addresses of the two control sections, which depends on
        the way run-time storage is allocated; it is unlikely to be of any use.

• How this restriction is enforced:
   o When an expression involves external references, the assembler cannot determine
     whether or not the expression is legal.
   o The assembler evaluates all of the terms it can, combines these to form an initial
     expression value, and generates Modification records.
   o The loader checks the expression for errors and finishes the evaluation.
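
How the loader might finish the evaluation can be sketched in Python by applying revised Modification records to code already in memory. Everything here is an assumption for illustration: the function name apply_modification, the load address of 0 for the control section, and the external symbol addresses in estab are all hypothetical, and error checking of the expression is omitted.

```python
def apply_modification(memory, start, half_bytes, sign, symbol, estab):
    """Add or subtract an external symbol's address in a field of loaded code.

    memory is a bytearray; start is the field address; half_bytes is the
    field length in half-bytes, counted from the low-order end of the field.
    """
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    delta = estab[symbol] if sign == "+" else -estab[symbol]
    # Only the low-order half_bytes nibbles change; higher bits are preserved.
    updated = (field & ~mask) | (((field & mask) + delta) & mask)
    memory[start:start + nbytes] = updated.to_bytes(nbytes, "big")


# Hypothetical load addresses of the external symbols.
estab = {"RDREC": 0x4063, "BUFEND": 0x5033, "BUFFER": 0x4033}

memory = bytearray(0x30)                       # control section loaded at 0
memory[0x03:0x07] = bytes.fromhex("4B100000")  # line 15: +JSUB RDREC
memory[0x28:0x2B] = bytes.fromhex("000000")    # line 190: WORD BUFEND-BUFFER

apply_modification(memory, 0x04, 5, "+", "RDREC", estab)
apply_modification(memory, 0x28, 6, "+", "BUFEND", estab)
apply_modification(memory, 0x28, 6, "-", "BUFFER", estab)

jsub_code = memory[0x03:0x07].hex().upper()
maxlen_word = memory[0x28:0x2B].hex().upper()
```

The initial value of zero inserted by the assembler becomes the address of RDREC in the first case and the difference BUFEND-BUFFER in the second.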

ASSEMBLER DESIGN
This section discusses:
o The structure and logic of a one-pass assembler. Such assemblers are used when it is
necessary or desirable to avoid a second pass over the source program.
o The notion of a multi-pass assembler, an extension of the two-pass assembler that allows
forward references during symbol definition.

One-Pass Assembler

The main problem in designing a one-pass assembler is resolving forward references. The
problem can be reduced, but not eliminated:
• Forward references to data items can be eliminated by requiring that all storage
  reservation statements appear at the beginning of the program rather than at the end.
• Forward references to labels on instructions (forward jumps) cannot be avoided.
• The assembler must therefore make special provision for handling forward references to
  instruction labels, while prohibiting forward references to data items.


There are two types of one-pass assemblers:


1. One that produces object code directly in memory for immediate execution (Load-and-go
assemblers).
2. The other type produces the usual kind of object code for later execution.

Load-and-Go Assembler

• A load-and-go assembler generates its object code in memory for immediate execution.
• No object program is written out, and no loader is needed.
• It is useful in a system oriented toward program development and testing, where programs
  are re-assembled nearly every time they are run; the efficiency of the assembly process is
  therefore an important consideration.

Forward Reference in One-Pass Assemblers: In a load-and-go assembler, when a forward
reference is encountered, the assembler:

• Omits the operand address if the symbol has not yet been defined.
• Enters the undefined symbol into SYMTAB and flags it as undefined.
• Adds the address of the operand field to a list of forward references associated with
  the SYMTAB entry.
• When the definition of the symbol is encountered, scans the reference list and inserts the
  address into each instruction on the list.
• At the end of the program, reports an error if any SYMTAB entries are still marked as
  undefined.
• If there are no errors, searches SYMTAB for the symbol named in the END statement and
  jumps to this location to begin execution.
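
The forward-reference lists can be sketched in Python as follows. This is a simplified model under stated assumptions: the class name OnePassSymtab is hypothetical, memory is modelled as a dict from operand-field address to the address stored there, and the definition address used for ENDFIL (2024) is assumed for illustration.

```python
class OnePassSymtab:
    def __init__(self):
        self.defined = {}   # symbol -> address
        self.pending = {}   # undefined symbol -> list of operand-field addresses

    def reference(self, symbol, operand_addr, memory):
        """Called when an instruction uses symbol as its operand."""
        if symbol in self.defined:
            memory[operand_addr] = self.defined[symbol]
        else:
            memory[operand_addr] = 0      # operand address omitted for now
            self.pending.setdefault(symbol, []).append(operand_addr)

    def define(self, symbol, address, memory):
        """Called when symbol appears as a label: patch all waiting references."""
        self.defined[symbol] = address
        for operand_addr in self.pending.pop(symbol, []):
            memory[operand_addr] = address

    def undefined_symbols(self):
        """Symbols still undefined; non-empty at END means an error."""
        return sorted(self.pending)


memory = {}
symtab = OnePassSymtab()
symtab.reference("RDREC", 0x2013, memory)
symtab.reference("ENDFIL", 0x201C, memory)
symtab.reference("WRREC", 0x201F, memory)
pending_after_line_40 = {s: list(refs) for s, refs in symtab.pending.items()}

symtab.define("ENDFIL", 0x2024, memory)   # assumed definition address
still_undefined = symtab.undefined_symbols()
```

A real assembler would keep the reference list as a linked list hanging off the SYMTAB entry; a Python list per symbol plays the same role here.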

After scanning line 40 of the program:

40    2021            J       CLOOP       3C2012

Up to this point, the symbol RDREC has been referred to once, at location 2013; ENDFIL at
location 201C; and WRREC at location 201F. None of these symbols has been defined yet. The
figure shows how the pending references, along with their addresses, are recorded in the symbol
table.

Fig : object code in memory and symbol table entries for the program after scanning line 40.


The status after scanning line 160, which has encountered the definition of RDREC and
ENDFIL, is as given below:

If a one-pass assembler needs to generate an object program:

• If the operand contains an undefined symbol, the assembler uses 0 as the address and
  writes the Text record to the object program.
• Forward references are entered into lists, as in the load-and-go assembler.
• When the definition of a symbol is encountered, the assembler generates another Text
  record with the correct operand address for each entry in the reference list.
• When the program is loaded, the placeholder address 0 is overwritten by the later Text
  record that carries the correct operand address.


Object Code Generated by One-Pass Assembler:

Multi-Pass Assembler:

• For a two-pass assembler, forward references in symbol definitions are not allowed:
        ALPHA   EQU     BETA
        BETA    EQU     DELTA
        DELTA   RESW    1
   o Symbol definition must be completed in Pass 1.
• Prohibiting forward references in symbol definitions is not a serious inconvenience.
   o Forward references tend to create difficulty for a person reading the program.

Implementation Issues for a Modified Two-Pass Assembler:

When a forward reference is encountered in a symbol-defining statement:

• For the symbol being defined, we store in SYMTAB:
   o the symbol name
   o the defining expression
   o the number of undefined symbols in the defining expression
• Each undefined symbol (marked with a flag *) has an associated list of the symbols whose
  values depend on it.
• When a symbol is defined, we can recursively evaluate the symbol expressions that depend on
  the newly defined symbol.
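
The bookkeeping described above can be sketched in Python. This is an illustration under several assumptions: the class name MultiPassSymtab is hypothetical, expressions are evaluated left to right without operator precedence, a set of missing symbols stands in for the count of undefined symbols, and the location counter value 1034 (hex) at BUFFER is assumed.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[-+*/]")


def evaluate(expr, values):
    """Evaluate left to right (no operator precedence; enough for this sketch)."""
    tokens = TOKEN.findall(expr)

    def term(tok):
        return int(tok) if tok.isdigit() else values[tok]

    result = term(tokens[0])
    for op, tok in zip(tokens[1::2], tokens[2::2]):
        rhs = term(tok)
        if op == "+":
            result += rhs
        elif op == "-":
            result -= rhs
        elif op == "*":
            result *= rhs
        else:
            result //= rhs
    return result


class MultiPassSymtab:
    def __init__(self):
        self.values = {}       # defined symbol -> value
        self.waiting = {}      # undefined symbol -> (expression, missing symbols)
        self.dependents = {}   # symbol -> symbols whose expressions need it

    def define_expr(self, name, expr):
        symbols = [t for t in TOKEN.findall(expr) if not t[0].isdigit() and t not in "+-*/"]
        missing = {s for s in symbols if s not in self.values}
        if not missing:
            self.define_value(name, evaluate(expr, self.values))
            return
        self.waiting[name] = (expr, missing)
        for s in missing:
            self.dependents.setdefault(s, []).append(name)

    def define_value(self, name, value):
        self.values[name] = value
        for dep in self.dependents.pop(name, []):
            expr, missing = self.waiting[dep]
            missing.discard(name)
            if not missing:                    # all operands are now known
                del self.waiting[dep]
                self.define_value(dep, evaluate(expr, self.values))


st = MultiPassSymtab()
st.define_expr("HALFSZ", "MAXLEN/2")
st.define_expr("MAXLEN", "BUFEND-BUFFER")
st.define_expr("PREVBT", "BUFFER-1")
pending_before_buffer = sorted(st.waiting)
st.define_value("BUFFER", 0x1034)           # BUFFER RESB 4096 (assumed LOCCTR)
st.define_value("BUFEND", 0x1034 + 4096)    # BUFEND EQU *
```

Defining BUFFER immediately resolves PREVBT; defining BUFEND resolves MAXLEN, which in turn resolves HALFSZ.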


Multi-Pass Assembler Example Program

Multi-Pass Assembler (Figure 2.21 of Beck): Example for forward reference in Symbol Defining
Statements:


Questions

1. List the various assembler features that are machine-dependent and machine-independent.
Explain any one of each.
2. In a two-pass assembler, list the different databases used in each pass. Explain the contents
and uses of each database.
3. Compare a two-pass assembler with a single-pass assembler. How are forward references
handled in a one-pass assembler?
4. What is LTORG? When is it used? Explain with an example.
5. When is a multi-pass assembler required? Show the step-by-step procedure to evaluate the
following statements. Show the symbol table after each scan.
1. HALFSZ EQU MAXLEN/2
2. MAXLEN EQU BUFEND-BUFFER
3. PREVBT EQU BUFFER-1
4. BUFFER RESB 4096
5. BUFEND EQU *
6. Explain the need for the BASE and NOBASE directives with examples.
7. Explain program relocation. Also explain how the problems of relocation are solved.
8. What is a program block? How are multiple program blocks handled by assemblers?
9. What are the different ways of specifying an operand value in a source statement? Give their
formats. (12M)
10. Compare a two-pass assembler with a single-pass assembler. How are forward references
handled in a one-pass assembler? (10M)
11. What is the difference between a literal and an immediate operand? How does the assembler
handle literal operands? (4M)
12. Explain the following assembler directives with an example of each:
(i) EQU (ii) BASE (iii) ORG (iv) USE (v) NOBASE (5M)
13. Write short notes on multi-pass assemblers. (5M)
14. Give the difference between program blocks and control sections, and explain in detail the
processing of control sections. (10M)
15. With the required data structures and processing logic, explain the implementation of
literals within an assembler. (10M)
16. Give the format of the following records necessary to obtain object code:
i. Header record ii. Text record iii. Refer record
iv. Define record v. Modification record (revised)
vi. End record. (12M)
