System Software Ch3

Uploaded by

psychoboy6232
Copyright
© © All Rights Reserved
3 Assemblers

Syllabus
Elements of assembly language programming, Design of the assembler, Assembler design criteria,
Types of assemblers, Two-pass assemblers, One-pass assemblers, Single pass assembler for Intel
x86, Algorithm of single pass assembler, Multi-pass assemblers, Advanced assembly process,
Variants of assemblers, Design of two pass assembler.

Contents
3.1  Basic Assembler Functions         Summer-19, Marks 3
3.2  Assembly Language Programming
3.3  Simple Assembly Scheme            Winter-17, Marks 3
3.4  Pass Structure of Assemblers      Summer-12, Winter-15, 17, Summer-18, Marks 7
3.5  Design of a Two Pass Assembler    Summer-17, 18, 19, Winter-17, 18, 19, Marks 7
3.6  Error Reporting
3.7  Database                          Summer-19, Marks 4
3.8  Forward Reference                 Summer-18, 19, Marks 3
3.9  Literals                          Winter-17, Marks 3
3.10 Expressions
3.11 MASM Assembler
3.12 Multiple Choice Questions
3.13 Short Questions and Answers

3.1 Basic Assembler Functions GTU : Summer-19

The assembler is responsible for translating the assembly language program into
machine code. When the source language is essentially a symbolic representation for a
numerical machine language, the translator is called an assembler and the source
language is called an assembly language.
A pure assembly language is a language in which each statement produces exactly
one machine instruction. Fig. 3.1.1 shows the role of assembler.

[Fig. 3.1.1 Role of assembler: Source code -> Assembler -> Object program -> Linker -> Executable code -> Loader]

Assembler functions
1. Translate mnemonic opcodes to machine language.
2. Convert symbolic operands to their machine addresses.
3. Build machine instructions in the proper format.
4. Convert data constants into machine representation.
5. Error checking is provided.
6. Changes can be quickly and easily incorporated with a reassembly.
7. Variables are represented by symbolic names, not as memory locations.


Assembly language statements are written one per line. An assembly language program
thus consists of a sequence of assembly language statements, where each statement
contains a mnemonic. Each line of an assembly language program is split into four
fields, as shown below.
LABEL OPCODE OPERAND COMMENTS

Label : It is an identifier and an optional field. Labels are used extensively in programs
to reduce reliance upon programmers remembering where data or code is
located. The maximum length of a label differs between assemblers. Some
accept labels up to 32 characters long, others only four characters. A label, when
declared, is suffixed by a colon and begins with a valid character (A ... Z).
For example
START LDAA #24H

TECHNICAL PUBLICATIONs - An up thrust for knowledge



Here, the label START is equal to the address of the instruction LDAA #24H.
An advantage of using labels is that inserting or rearranging code statements does
not necessitate reworking actual machine instructions.
Opcode : This field contains a mnemonic. Opcode stands for operation code, i.e. a
machine code instruction. The opcode may also require additional
information, i.e. operands.
Operand : This field consists of additional information or data that the opcode requires.
In certain types of addressing modes, the operand is used to specify
- Constants or labels
- Immediate data
- Data contained in another accumulator or register
- An address
Fig. 3.1.2 shows the machine instruction format.

Sign | Opcode | Register operand | Main memory operand

Fig. 3.1.2 Machine instruction format

Review Question

1. Define assembler. List out tasks performed during different phases of assembler.
GTU : Summer-19, Marks 3

3.2 Assembly Language Programming


A feature of assembly language is its machine dependency. It is a low level
programming language. Features are as follows :
1. Mnemonic operation codes
2. Symbolic operands
3. Data declaration
The function of the assembler is to translate mnemonic operation codes to their machine
language equivalents and to assign machine addresses to symbolic labels.
Mnemonics are predefined assembly-language names for machine instructions,
pseudo-ops, directives and data allocation statements. Mnemonics are not
case-sensitive. Their use also enables the assembler to provide helpful diagnostics.


Mnemonics are a set of readily memorized programming instructions that are later
translated into pure machine code by the assembler.
Opcode is short for 'Operation Code'. An opcode is a single instruction that can
be executed by the CPU. In machine language it is a binary or hexadecimal
value such as 'B6' loaded into the instruction register.

In assembly language mnemonic form an opcode is a command such as MOV or
ADD or JMP. For example -
MOV AL, 34h
The opcode is the MOV instruction. The other parts are called the 'operands'.
Operands are manipulated by the opcode. In this example, the operands are the
register named AL and the value 34 hex. In general, the MOV instruction moves a value
between a memory word and a register; here it loads the immediate value 34h into AL.
• Each line of a program is one of the following :
1. an instruction 2. an assembler directive 3. a comment
White space (between symbols) and case are ignored. Comments (beginning
with ";") are also ignored.
An instruction has the following format :
Label Opcode Operands Comments

Optional Mandatory
• Following is a simple assembly language program :

; Program to multiply a number by the constant 4

        .ORIG x2000
        LD    R1, FOUR
        LD    R2, NUM
        AND   R3, R3, #0      ; Clear R3
; The inner loop
AGAIN   ADD   R3, R3, R2
        ADD   R1, R1, #-1     ; R1 keeps track of the iteration
        BRp   AGAIN
        HALT
NUM     .BLKW 1
FOUR    .FILL x0004
        .END


Anything after a semicolon is a comment in the assembly program. It is ignored
by the assembler. It is used by the programmer to document or understand programs.
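
The field layout described above (optional label, opcode, operands, comment after a semicolon) can be sketched in Python as follows. This is only an illustrative sketch: the function name `parse_statement` and the rule "the first token is a label when it is not a known opcode" are assumptions made for this example, not the behaviour of any particular assembler.

```python
def parse_statement(line, known_opcodes):
    """Split one source line into (label, opcode, operands, comment).

    Assumption: a leading token that is not a known opcode is a label.
    """
    # Everything after the first semicolon is a comment.
    code, _, comment = line.partition(";")
    tokens = code.split()
    if not tokens:                       # blank or comment-only line
        return None, None, [], comment.strip()
    label = None
    if tokens[0].upper() not in known_opcodes:
        label = tokens[0].rstrip(":")    # tolerate an optional colon suffix
        tokens = tokens[1:]
    opcode = tokens[0].upper() if tokens else None
    # Operands are separated by commas; surrounding white space is ignored.
    operands = [t.strip() for t in " ".join(tokens[1:]).split(",") if t.strip()]
    return label, opcode, operands, comment.strip()


OPCODES = {"LD", "ADD", "AND", "BRP", "HALT"}
print(parse_statement("AGAIN ADD R3, R3, R2 ; the inner loop", OPCODES))
```

Case is ignored for the opcode only; labels and operands are kept as written so that they can be looked up later in a symbol table.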

3.2.1 Assembly Language Statements

An assembly program consists of three types of statements :

1. Imperative 2. Declaration 3. Assembler directives

Imperative statement : It specifies an action to be performed during the execution of the
assembled program. Imperative programs focus on how to solve a problem; popular
imperative languages such as C and Pascal are based on side effects on memory. Each
imperative statement translates into one machine instruction.

Declaration statement : It focuses on what the problem is and leaves the solution
mechanism up to the language implementation. Within the declarative paradigm, most
notable are the functional languages such as Lisp. Declarative languages are typically
quite abstract and hence can be harder to implement efficiently. In the assembly
programs of this chapter, the declaration statements are DC, which defines a constant,
and DS, which reserves storage.

Assembler Directive
Assembler directives instruct the assembler to perform certain actions during the
assembly of a program. They can be used to declare variables, create storage space for
results and declare constants. The following assembler directives are used in the
program :
1. START - Specify name and starting address for the program.
START <constant>
2. END - Indicate the end of the source program and specify the first executable
instruction in the program.
END [<operand spec>]
3. BYTE - Generate character or hexadecimal constant, occupying as many bytes as
needed to represent the constant.
4. WORD - Generate one word integer constant.

3.2.2 Advantages of Assembly Language

1. Reduced errors.
2. Faster translation times.
3. Changes could be made easier and faster.

3.2.3 Disadvantages
1. Many instructions are required to achieve small tasks.
2. Source programs tend to be large and difficult to follow.
3. Programmer requires knowledge of the processor architecture and instruction set.
4. Programs are machine dependent, requiring complete rewrites if the hardware is
changed.

3.3 Simple Assembly Scheme GTU : Winter-17

Following steps are used for the design specification of an assembler :
1. Find out the information necessary to perform a task.
2. Specify data structure and define format of data structure.
3. To obtain and maintain the information, determine required processing.
4. To perform the task, determine required processing.

3.3.1 Synthesis Phase

Let us consider the following assembly statement
MOVER CREG, TWO
The following information is needed to synthesize the machine instruction
corresponding to the above statement.
1) The address of the memory word with which the name TWO is associated.
2) The machine operation code corresponding to the mnemonic MOVER.
The address of the memory word depends on the source program, so it is made
available by the analysis phase. The second item, i.e. the machine operation
code, does not depend on the source program. It depends only on the assembly
language, so the synthesis phase can determine this information for its own use.
Data structures used in synthesis phase
1. Symbol table
2. Mnemonics table
The symbol table contains name and address fields. It is built by the
analysis phase. It also contains flags to indicate errors.
The mnemonics table contains mnemonic and opcode fields.
The synthesis phase uses these tables to obtain the machine address with which a
name is associated and the machine opcode corresponding to a mnemonic.
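
As a minimal sketch of the synthesis step described above, the two tables can be modelled as Python dictionaries. The opcode and register codes follow the listings in this chapter (MOVER = 04, CREG = 3); the function name `synthesize`, the dictionary layout and the output string format are assumptions made only for this illustration.

```python
# Mnemonics table: fixed, depends only on the assembly language.
MNEMONICS = {"MOVER": "04", "MOVEM": "05"}
# Register codes as used in the chapter's machine language listings.
REGISTERS = {"AREG": "1", "BREG": "2", "CREG": "3"}


def synthesize(mnemonic, register, symbol, symtab):
    """Build one machine instruction in the form '+ opcode register address'."""
    opcode = MNEMONICS[mnemonic]     # from the mnemonics table
    address = symtab[symbol]         # from the symbol table (analysis phase)
    return f"+ {opcode} {REGISTERS[register]} {address:03d}"


# Symbol table as the analysis phase would have built it (hypothetical address).
symbol_table = {"TWO": 214}
print(synthesize("MOVER", "CREG", "TWO", symbol_table))
```

A missing mnemonic or symbol raises `KeyError` here; a real assembler would set an error flag instead.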


3.3.2 Analysis Phase

The primary function of the analysis phase is to build the symbol table. For this it
must determine the addresses with which the symbolic names used in a program are
associated. Some addresses are directly determined.
Consider the following program.

Assembly language                 Machine language

        START  200
        READ   M                  200) + 09 0 212
        MOVER  BREG, TWO          201) + 04 2 214
        MOVEM  BREG, DATA         202) + 05 2 215
Here    MULT   BREG, DATA         203) + 03 2 215
        MOVER  CREG, DATA         204) + 04 3 215
        SUB    CREG, TWO          205) + 02 3 214
        MOVEM  CREG, DATA         206) + 05 3 215
        COMP   CREG, M            207) + 06 3 212
        BC     LE, Here           208) + 07 2 203
        MOVEM  BREG, OUTPUT       209) + 05 2 213
        PRINT  OUTPUT             210) + 10 0 213
        STOP                      211) + 00 0 000
M       DS     1                  212)
OUTPUT  DS     1                  213)
TWO     DC     '1'                214) + 00 0 001
DATA    DS     1                  215)
        END

Memory allocation means fixing the addresses of the assembly language
statements. Suppose we want to fix the memory address of M; then we must also fix the
addresses of the remaining statements. Location counter is a data structure used to
implement memory allocation. The Location Counter (LC) is always made to
contain the address of the next memory word in the target program. LC is
initialised to the constant specified in the START statement.
The detailed procedure is given below.
1. The analysis phase reads the label of the assembly statement. The label
and the contents of LC are entered as a new entry in the symbol table.


2. It finds the number of memory words required by the assembly statement
and updates the LC contents.
3. The location counter then points to the next memory word in the target program.
4. The analysis phase needs to know the length of different instructions for updating the
LC.
5. This information depends upon the assembly language. So the mnemonics table can
be extended to include this information in a new field called length.
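
The location counter procedure above can be sketched as follows. The sketch assumes a mnemonics table extended with a length field, one statement per string, and DS/DC as the only declaration statements; the function name `analyse` and the sample program are hypothetical.

```python
# Mnemonics table extended with a length field: mnemonic -> (opcode, length).
MNEMONICS = {
    "READ": ("09", 1),
    "MOVER": ("04", 1),
    "STOP": ("00", 1),
}


def analyse(lines):
    """Build the symbol table by location counter (LC) processing."""
    lc = 0
    symtab = {}
    for line in lines:
        tokens = line.split()
        if tokens[0] == "START":          # LC is initialised from START
            lc = int(tokens[1])
            continue
        if tokens[0] == "END":
            break
        # A first token that is neither a mnemonic nor DS/DC is a label.
        if tokens[0] not in MNEMONICS and tokens[0] not in ("DS", "DC"):
            symtab[tokens[0]] = lc        # enter (symbol, <LC content>)
            tokens = tokens[1:]
        op = tokens[0]
        if op == "DS":                    # reserve the requested words
            lc += int(tokens[1])
        elif op == "DC":                  # a constant occupies one word
            lc += 1
        else:                             # length comes from the table
            lc += MNEMONICS[op][1]
    return symtab


program = ["START 200", "READ M", "MOVER BREG, TWO", "STOP",
           "M DS 1", "TWO DC '1'", "END"]
print(analyse(program))
```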

3.3.3 Data Structure of the Assembler

Fig. 3.3.1 shows the data structure used by the analysis and synthesis phases.

[Fig. 3.3.1 Data structure with analysis and synthesis phase: the source program enters the analysis phase, control transfers to the synthesis phase, and the synthesis phase produces the target program. Both phases access the mnemonics table (fields: mnemonic, opcode, length) and the symbol table (fields: symbol, address).]

• Mnemonics table is a fixed table and is only read by the analysis and synthesis
phases.
• Symbol table is constructed during analysis phase and used during synthesis.


Analysis phase task

1. Separate the label, mnemonic opcode and operand fields of a statement.
2. If the statement contains a label then the pair (symbol, <LC content>) is entered in
the symbol table.
3. Validity of the mnemonic opcode is checked through a look-up in the
mnemonics table.
4. Location counter processing is done.

Synthesis phase task

1. Read the machine opcode corresponding to the mnemonic from the mnemonics
table.
2. Get an address of a memory operand from the symbol table.
3. Synthesize a machine instruction.

Review Question

1. Explain the data structure of single pass assembler. GTU : Winter-17, Marks 3

3.4 Pass Structure of Assemblers GTU : Summer-12, 18, Winter-15, 17

Most assemblers make two passes over the source program. Pass one does
little more than scan the source program for label definitions and assign addresses.
The second pass performs most of the actual translation.
The assembler must also process assembler directive statements. These statements are
not translated into machine instructions. They provide instructions to the
assembler itself.
The assembler directive START specifies the starting memory address for the
object program and END marks the end of the program.
At last, the assembler must write the generated object code onto some output
device. This object program will later be loaded into memory for execution.
Object program contains three types of records.
i) Header ii) Text iii) End
Header record contains the program name, starting address and length.
Text record contains the translated instructions and data of the program, together
with an indication of the addresses where these are to be loaded.
End record marks the end of the object program and specifies the address in the
program where execution is to begin.


Forward references in programs cannot be resolved in a single pass. So we need
more than one pass through the source code.

3.4.1 Two Pass Translation

Forward reference is handled easily in two pass translation of the assembly
language program. Location counter processing is performed in the first
pass and symbols defined in the program are entered into the symbol
table.
Fig. 3.4.1 shows two pass structure.
The second pass synthesizes the target form using the address information
found in the symbol table.

[Fig. 3.4.1 Two pass assembly: Source program -> Pass I -> Intermediate code -> Pass II -> Target program. Both passes access the shared data structures.]

Functions of the two passes of a simple assembler are as follows.

Pass 1
1. Assign addresses to all statements in the program.
2. Save the values assigned to all labels for use in Pass 2.
3. Perform some processing of assembler directives.

Pass 2 (Assemble instructions and generate object program)

1. Assemble instructions (i.e. translating operation codes and looking up addresses).
2. Generate data values defined by BYTE, WORD etc.
3. Perform processing of assembler directives not done during Pass 1.
4. Write the object program and the assembly listing.

3.4.2 Single Pass Translation

In single pass translation, LC processing and construction of the symbol table
proceed as in two pass translation. Back patching is used to solve the problem of
forward references. The operand field of an instruction containing a forward
reference is left blank initially. The address of the forward reference symbol is put
into this field when its definition is encountered. For example,


MOVER BREG, ONE

In the above statement, ONE is a forward reference. The memory location 101
contains the instruction opcode and address of BREG. Table of Incomplete
Instructions (TII) is used for inserting the second operand address. Each TII entry
contains the pair
(<instruction address>, <symbol>)
• When the assembler reads the END statement, the symbol table would contain the
addresses of all symbols defined in the source program and the table of incomplete
instructions would contain information describing all forward references. The
assembler can now process each entry in TII to complete the concerned instruction.
Advantages
1. One pass assembler loads the machine instructions directly in memory.
2. It is desirable to avoid a second pass over the source program.
Disadvantages
1. It cannot resolve forward reference of data symbols.
2. Forward jump to instruction items cannot be easily eliminated.
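
The back patching scheme with a Table of Incomplete Instructions can be sketched in Python as below. This is a simplified illustration under stated assumptions: every instruction occupies one word, DS is the only declaration handled, and the names `assemble_single_pass`, `memory` and `tii` are invented for this example.

```python
OPCODES = {"MOVER": 4, "MOVEM": 5, "STOP": 0}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3}


def assemble_single_pass(lines, start):
    """Assemble in one pass, back patching forward references at the end."""
    memory = {}    # address -> [opcode, register, operand address]
    symtab = {}    # symbol -> address
    tii = []       # Table of Incomplete Instructions: (address, symbol)
    lc = start
    for line in lines:
        tokens = line.replace(",", " ").split()
        if tokens[0] not in OPCODES and tokens[0] != "DS":
            symtab[tokens[0]] = lc            # label definition
            tokens = tokens[1:]
        op = tokens[0]
        if op == "DS":
            lc += int(tokens[1])
            continue
        reg = REGISTERS[tokens[1]] if len(tokens) > 1 else 0
        operand = 0                           # left blank for forward references
        if len(tokens) > 2:
            symbol = tokens[2]
            if symbol in symtab:
                operand = symtab[symbol]
            else:
                tii.append((lc, symbol))      # remember the incomplete instruction
        memory[lc] = [OPCODES[op], reg, operand]
        lc += 1
    # END reached: complete every instruction recorded in TII.
    for address, symbol in tii:
        memory[address][2] = symtab[symbol]   # KeyError means undefined symbol
    return memory, symtab


code, symbols = assemble_single_pass(
    ["MOVER BREG, ONE", "STOP", "ONE DS 1"], start=101)
print(code, symbols)
```

The instruction at location 101 is first stored with a blank operand and is completed only after the definition of ONE has been seen.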

3.4.3 Assembler Algorithm and Data Structure

Simple assembler uses two major internal data structures.

a) The Operation Code Table (OPTAB)
b) The Symbol Table (SYMTAB)
OPTAB contains the mnemonic operation codes and their machine language
equivalents. It also contains instruction format and length.
OPTAB is used to look up mnemonic operation codes and translate them to their
machine language equivalents. SYMTAB is used to store values assigned to labels.
Location counter (LOCCTR) is also required. LOCCTR is a variable used to help in
the assignment of addresses. It is initialized to the beginning address specified in
the START statement.
After processing every source statement, the length of the assembled instruction or
data to be generated is added to LOCCTR.
OPTAB must contain the mnemonic operation code and its machine language
equivalent. For a complex assembler, this table also contains information about
instruction format and length.
• In Pass 1, OPTAB is used to look up and validate operation codes in the source
program. In Pass 2 it is used to translate the operation codes to machine language.


OPTAB is usually organized as a hash table, with mnemonic operation code as the
key. OPTAB is a static table, i.e. entries are not normally added to or deleted from
it.
SYMTAB includes the name and value for each label in the source program
together with flags to indicate error conditions. It also contains information about
the data area or instruction labeled.
During Pass 1, labels are entered into SYMTAB as they are encountered in the
source program along with their assigned address.
During Pass 2, symbols used as operands are looked up in SYMTAB to obtain the
addresses to be inserted in the assembled instructions.
SYMTAB is also organized as a hash table for efficiency of insertion and retrieval.
Entries are rarely deleted from this table.
• It is possible for both passes of the assembler to read the original source program
as input. However, information such as location counter values and error flags for
statements must be communicated between the two passes. For this reason,
Pass 1 usually writes an intermediate file that contains each source statement
together with its assigned address, error indications etc. This file is used as the
input to Pass 2.
Algorithm for Pass 1 of Assembler

Pass 1 :
start
    read first input line
    if OPCODE = 'START' then
        begin
            save value of OPERAND as starting address
            initialize location counter to starting address
            write line to intermediate file
            read next input line
        end {if START}
    else
        initialize location counter to zero
    while OPCODE ≠ 'END' do
        begin
            if line is not a comment line then
                begin
                    if symbol in the LABEL field then
                        begin
                            search symbol table for LABEL
                            if found then
                                set error flag (duplicate symbol)
                            else
                                insert (LABEL, location counter) into symbol table
                        end {if symbol}
                    search operation code table for OPCODE
                    if found then
                        add 3 {instruction length} to location counter
                    else if OPCODE = 'WORD' then
                        add 3 to location counter
                    else if OPCODE = 'RESW' then
                        add 3 * value of OPERAND to location counter
                    else if OPCODE = 'BYTE' then
                        begin
                            find length of constant
                            add length to location counter
                        end
                    else
                        set error flag for invalid operation code
                end {if not comment}
            write line to intermediate file
            read next input line
        end {while not END}
    write last line to intermediate file
    save (location counter - starting address) as program length
end {Pass 1}
Fig. 3.4.2 (a) Algorithm for Pass 1 of assembler
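
The Pass 1 algorithm can be sketched in Python roughly as follows. Several details are assumptions made for this sketch and are not stated in the text: a statement without a label starts with white space, comment lines start with a period and are simply skipped, the START operand is decimal, and the sample OPTAB holds only a few illustrative entries.

```python
def byte_length(constant):
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = constant[2:-1]
    return len(body) if constant[0] == "C" else (len(body) + 1) // 2


def pass1(lines, optab):
    """Return (symtab, program length, intermediate list, errors)."""
    symtab, errors, intermediate = {}, [], []
    start = locctr = 0
    for line in lines:
        if not line.strip() or line.lstrip().startswith("."):
            continue                          # blank or comment line
        fields = line.split()
        label = None if line[0].isspace() else fields.pop(0)
        opcode = fields[0]
        operand = fields[1] if len(fields) > 1 else ""
        if opcode == "START":
            start = locctr = int(operand)     # initialize location counter
            intermediate.append((locctr, label, opcode, operand))
            continue
        intermediate.append((locctr, label, opcode, operand))
        if opcode == "END":
            break
        if label is not None:
            if label in symtab:
                errors.append(f"duplicate symbol {label}")
            else:
                symtab[label] = locctr
        if opcode in optab:
            locctr += 3                       # fixed instruction length
        elif opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(f"invalid operation code {opcode}")
    return symtab, locctr - start, intermediate, errors


SAMPLE_OPTAB = {"LDA": 0x00, "STA": 0x0C, "ADD": 0x18}   # illustrative entries
SOURCE = [
    "PROG   START 100",
    "FIRST  LDA   FIVE",
    "       ADD   FIVE",
    "       STA   ALPHA",
    "FIVE   WORD  5",
    "ALPHA  RESW  2",
    "TAG    BYTE  C'EOF'",
    "       END   FIRST",
]
symtab, length, _, errors = pass1(SOURCE, SAMPLE_OPTAB)
print(symtab, length, errors)
```

The intermediate list plays the role of the intermediate file: each entry carries the address assigned to the statement, so Pass 2 does not need to repeat the location counter processing.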
Pass 2 :
start
    read first input line {from intermediate file}
    if OPCODE = 'START' then
        begin
            write listing line
            read next input line
        end {if START}
    write Header record to object program
    initialize first Text record
    while OPCODE ≠ 'END' do
        begin
            if this is not a comment line then
                begin
                    search operation code table for OPCODE
                    if found then
                        begin
                            if symbol in OPERAND field then
                                begin
                                    search symbol table for OPERAND
                                    if found then
                                        store symbol value as operand address
                                    else
                                        begin
                                            store 0 as operand address
                                            set error flag for undefined symbol
                                        end
                                end {if symbol}
                            else
                                store 0 as operand address
                            assemble the object code instruction
                        end {if found}
                    else if OPCODE = 'BYTE' or 'WORD' then
                        convert constant to object code
                    if object code will not fit into the current Text record then
                        begin
                            write Text record to object program
                            initialize new Text record
                        end
                    add object code to Text record
                end {if not comment}
            write listing line
            read next input line
        end {while not END}
    write last Text record to object program
    write End record to object program
    write last listing line
end {Pass 2}
Fig. 3.4.2 (b) Algorithm for Pass 2 of assembler
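
A compact Python sketch of Pass 2 is given below. It takes the output of Pass 1 as already-parsed tuples. The record layout (space-separated fields with H, T and E prefixes), the 30-byte Text record limit and the 3-byte instruction layout (one opcode byte followed by a 2-byte address) are assumptions chosen for this illustration; the listing file is omitted.

```python
def pass2(intermediate, symtab, optab, name, start, length, max_bytes=30):
    """Translate (address, opcode, operand) tuples into object records."""
    records = [f"H {name} {start:06X} {length:06X}"]   # Header record
    errors = []
    text_start, text = None, ""

    def flush():
        """Write the current Text record, if any, and start a new one."""
        nonlocal text_start, text
        if text:
            records.append(f"T {text_start:06X} {len(text) // 2:02X} {text}")
        text_start, text = None, ""

    for address, opcode, operand in intermediate:
        if opcode in optab:
            target = 0                                 # default operand address
            if operand:
                if operand in symtab:
                    target = symtab[operand]
                else:
                    errors.append(f"undefined symbol {operand}")
            obj = f"{optab[opcode]:02X}{target:04X}"
        elif opcode == "WORD":
            obj = f"{int(operand):06X}"
        elif opcode == "BYTE":
            body = operand[2:-1]
            obj = body if operand[0] == "X" else "".join(f"{ord(c):02X}" for c in body)
        else:
            flush()                # e.g. RESW: no object code, address gap
            continue
        if text and (len(text) + len(obj)) // 2 > max_bytes:
            flush()                # object code does not fit in this record
        if text_start is None:
            text_start = address
        text += obj
    flush()
    records.append(f"E {start:06X}")                   # End record
    return records, errors


statements = [(100, "LDA", "FIVE"), (103, "STA", "ALPHA"),
              (106, "WORD", "5"), (109, "RESW", "1")]
table = {"FIVE": 106, "ALPHA": 109}
print(pass2(statements, table, {"LDA": 0x00, "STA": 0x0C}, "PROG", 100, 12))
```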
Example 3.4.1 Consider following assembly language program : Show (i) Contents of
Symbol Table (ii) Intermediate codes using Variant I representation.
GTU : Summer-12, Winter-15, Marks 7

        START  101
        READ   N
        MOVER  BREG, ONE
        MOVEM  BREG, TERM
AGAIN   MULT   BREG, TERM
        MOVER  CREG, TERM
        ADD    CREG, ONE
        MOVEM  CREG, TERM
        COMP   CREG, N
        BC     LE, AGAIN
        MOVEM  BREG, AGAIN
        PRINT  RESULT
        STOP
N       DS     1
RESULT  DS     1
ONE     DC     1
TERM    DS     1
        END

Instruction opcodes : STOP - 00, ADD - 01, MULT - 03, MOVER - 04, MOVEM - 05,
COMP - 06, BC - 07, READ - 09, PRINT - 10, LE - 02
Assembler directives : START - 01, END - 02
Declaration statements : DC - 01, DS - 02
Register codes : BREG - 02, CREG - 03
Solution :
i) Contents of Symbol table

No.  Symbol  Address
1    ONE     102
2    TERM    103
3    N       108
4    RESULT  110

ii) Intermediate code using Variant I representation

(AD, 01) (C, 101)
(IS, 09) (S, 01)
(IS, 04) (01) (S, 01)
(IS, 05) (01) (S, 02)
(IS, 03) (01) (S, 02)
(IS, 06) (01) (S, 03)
(IS, 01) (01) (S, 02)
(IS, 07) (01) (S, 04)
(IS, 08) (01) (S, 01)
(IS, 08) (01) (S, 05)
(IS, 09) (01) (S, 02)
(IS, 10) (S, 05)
(IS, 00)
(DL, 02) (C, 01)
(DL, 02) (C, 01)
(DL, 02) (C, 01)
(DL, 02) (C, 01)
(AD, 02)

3.4.4 Table of Incomplete Instruction

• A table of instructions containing forward references is maintained separately, called
the Table of Incomplete Instructions (TII). This table can be used to fill in the
addresses in incomplete instructions.
• The address of the forward reference symbol is put into the operand field when its
definition is encountered. For example : MOVER BREG, ONE
• In the above statement, ONE is a forward reference. The memory location 101
contains the instruction opcode and address of BREG.
• Table of Incomplete Instructions (TII) is used for inserting the second operand
address.
• Each TII entry contains the pair (<instruction address>, <symbol>).
• When the assembler reads the END statement, the symbol table would contain the
addresses of all symbols defined in the source program and the table of incomplete
instructions would contain information describing all forward references.
• The assembler can now process each entry in TII to complete the concerned
instruction.
• Forward reference is handled easily in two pass translation of the assembly
language program. Location counter processing is performed in the first pass and
symbols defined in the program are entered into the symbol table.

Review Questions

1. Draw a flowchart of maintaining Table of Incomplete Instruction (TII) in assembler.
GTU : Winter-17, Marks 3
2. Describe following data structures : OPTAB, SYMTAB, LITTAB and POOLTAB.
GTU : Summer-18, Marks 4

3.5 Design of a Two Pass Assembler
GTU : Summer-17, 18, 19, Winter-17, 18, 19

Two pass assembler performs the following tasks.

Pass 1
1. Separate the symbol, mnemonic opcode and operand fields used in the program.
2. Construct the symbol table.
3. Perform LC processing.
4. Construct intermediate representation.
Pass 2
1. Synthesize the target program.
Pass 1 performs analysis of the source program and synthesis of the intermediate
representation.
Pass 2 processes the intermediate representation to synthesize the target program.

3.5.1 Advanced Assembler Directive

1. ORIGIN (ORG)
• This can be used to indirectly assign values to symbols.
• When this statement is encountered during assembly of a program, the assembler
resets its location counter to the specified value.
• The ORG statement will thus affect the values of all labels defined until the next
ORG.
• Normally when an ORG without a specified value is encountered, the previously
saved location counter value is restored.
• Syntax of the ORIGIN directive is :
ORIGIN <address spec>
where <address spec> is an <operand spec> or <constant>.
• This type of statement is useful when the target program does not consist of
consecutive memory words.
• No forward reference is allowed.
For example
a. SYMBOL : 6 bytes
b. VALUE : 1 word
c. FLAGS : 2 bytes
d. LDA VALUE, X

2. EQU
• Most assemblers provide an assembler directive that allows the programmer to
define symbols and specify their values. The general form is :
symbol EQU value
• One common use of EQU is to establish symbolic names that can be used for
improved readability in place of numeric values.
• Another common use of EQU is in defining mnemonic names for registers. For
example :
A EQU 0
X EQU 1
L EQU 2
These statements cause the symbols A, X, L, ... to be entered into SYMTAB with
their corresponding values 0, 1, 2, ...
3. LTORG
• LTORG allows placing literals into a pool at some other location in the object
program.
• Directive LTORG creates a literal pool that contains all of the literal operands used
since the previous LTORG or the beginning of the program.
• Literals placed in a pool by LTORG will not be repeated in the pool at the end of
the program.
• The LTORG statement permits a programmer to specify where literals should be
placed. By default, the assembler places the literals after the END statement.
• The assembler allocates memory to the literals of a literal pool. The pool contains all
literals used in the program since the start of the program or since the last LTORG
statement.
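
The effect of ORIGIN and EQU on location counter processing can be sketched as follows. The sketch assumes that an address specification is a symbol, a constant, or a symbol plus or minus a constant, and that every statement other than DS occupies one word; `eval_address` and `process` are hypothetical helper names.

```python
import re


def eval_address(spec, symtab):
    """Evaluate <symbol>, <constant> or <symbol> +/- <constant>."""
    match = re.fullmatch(r"\s*([A-Za-z]\w*|\d+)\s*(?:([+-])\s*(\d+))?\s*", spec)
    if match is None:
        raise ValueError(f"bad address specification: {spec!r}")
    base, sign, offset = match.groups()
    # A forward reference raises KeyError: it is not allowed here.
    value = int(base) if base.isdigit() else symtab[base]
    if sign:
        value += int(offset) if sign == "+" else -int(offset)
    return value


def process(statements, start):
    """LC processing for (label, opcode, operand) triples."""
    lc, symtab = start, {}
    for label, opcode, operand in statements:
        if opcode == "ORIGIN":
            lc = eval_address(operand, symtab)             # reset the LC
        elif opcode == "EQU":
            symtab[label] = eval_address(operand, symtab)  # LC is unchanged
        else:
            if label:
                symtab[label] = lc
            lc += int(operand) if opcode == "DS" else 1
    return symtab, lc


program = [
    ("AGAIN", "MOVER", "AREG, R1"),
    (None, "ADD", "CREG, R2"),
    ("LAST", "STOP", ""),
    (None, "ORIGIN", "AGAIN+1"),
    (None, "MULT", "CREG, R2"),
    (None, "ORIGIN", "LAST+1"),
    ("R1", "DS", "1"),
    ("DOWN", "EQU", "AGAIN"),
    ("R2", "DS", "1"),
]
print(process(program, start=300))
```

EQU only enters a value into the symbol table, while ORIGIN only changes the location counter, which is why the two directives are handled in separate branches.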

3.5.2 Pass 1 Assembler
Following data structures are used by Pass 1 :
1. OPTAB - A table of mnemonic opcodes and related information.
2. SYMTAB - Symbol table.
3. LITTAB - A table of literals used in the program.
Let us consider the following assembly program

 1          START   300
 2          MOVER   AREG, ='5'       300) + 04 1 311
 3          MOVEM   AREG, R1         301) + 05 1 317
 4  AGAIN   MOVER   AREG, R1         302) + 04 1 317
 5          MOVEM   CREG, R2         303) + 05 3 318
 6          ADD     CREG, ='1'       304) + 01 3 312
 7          ...
12          BC      ANY, TERM        310) + 07 6 314
13          LTORG
                    ='5'             311) + 00 0 005
                    ='1'             312) + 00 0 001
14          ...
15  TERM    SUB     AREG, ='1'       314) + 02 1 319
16          BC      LT, DOWN         315) + 07 1 302
17  LAST    STOP                     316) + 00 0 000
18          ORIGIN  AGAIN+2
19          MULT    CREG, R2         304) + 03 3 318
20          ORIGIN  LAST+1
21  R1      DS      1                317)
22  DOWN    EQU     AGAIN
23  R2      DS      1                318)
24          END
25                  ='1'             319) + 00 0 001
OPTAB, SYMTAB, LITTAB and POOLTAB contain the following data for the above
program.
1. OPTAB

Mnemonic  Class  Mnemonic info
MOVER     IS     (04, 1)
DS        DL     R#7
START     AD     R#11

2. Symbol table

Symbol  Address  Length
AGAIN   302      1
TERM    314      1
LAST    316      1
R1      317      1
DOWN    302      1
R2      318      1

3. Literal Table (LITTAB)

No.  Literal  Address
1    ='5'
2    ='1'
3    ='1'

4. POOLTAB

Literal number
#1
#3

A) OPTAB : It consists of mnemonic opcode, class and mnemonic info fields. The class
field indicates the statement type, i.e. imperative, declaration or assembler directive.
B) SYMTAB : It consists of symbol, address and length fields.
C) LITTAB : It consists of literal and address fields.
Processing of an assembly statement begins with the processing of its label field. If
the label field contains a symbol, the symbol and the value in the location counter are
copied into a new entry of the symbol table.
For imperative statement : The length of the machine instruction is simply added to
the LC. Length is also entered into the SYMTAB entry of the symbol.
Declaration or assembler directive statement : The routine mentioned in the
mnemonic info field is called to perform appropriate processing of the statement. For
example, in the case of a DS statement, routine R#7 would be called.

First pass uses LITTAB to collect all literals used in a program. Awareness of
different literal pools is maintained using the auxiliary table POOLTAB. This table
contains the literal number of the starting literal of each literal pool. At any stage,
the current literal pool is the last pool in LITTAB.
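
The interplay of LITTAB and POOLTAB can be sketched with a small Python class. The class name `LiteralPools`, its method names and the use of 0-based list indexes internally are assumptions made for this illustration; the chapter's tables number literals from 1.

```python
class LiteralPools:
    """LITTAB with an auxiliary POOLTAB, as used by the first pass."""

    def __init__(self):
        self.littab = []     # entries: [literal, address or None]
        self.pooltab = [0]   # index in littab of the first literal of each pool

    def add(self, literal):
        """Enter a literal in the current pool unless it is already there."""
        current_pool = self.littab[self.pooltab[-1]:]
        if all(entry[0] != literal for entry in current_pool):
            self.littab.append([literal, None])

    def _allocate(self, lc):
        """Assign one word to each literal of the current pool."""
        for entry in self.littab[self.pooltab[-1]:]:
            entry[1] = lc
            lc += 1
        return lc

    def ltorg(self, lc):
        """LTORG: allocate the current pool and open a new one."""
        lc = self._allocate(lc)
        self.pooltab.append(len(self.littab))
        return lc

    def end(self, lc):
        """END: allocate the last pool."""
        return self._allocate(lc)


pools = LiteralPools()
pools.add("='5'")
pools.add("='1'")
lc = pools.ltorg(311)        # literals of pool 1 go to 311 and 312
pools.add("='1'")            # same literal again, but in a new pool
lc = pools.end(319)
print(pools.littab, [i + 1 for i in pools.pooltab])
```

With the literals of the program above, the printed pool table is [1, 3], matching the POOLTAB entries #1 and #3.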

3.5.3 Intermediate Code Forms

Intermediate code consists of a set of IC units. Each IC unit consists of the following fields.
1. Address
2. Mnemonic opcode representation
3. Operand representation

Address | Opcode | Operands

Fig. 3.5.1 Intermediate code
Mnemonic field
Format of the mnemonic field is
(statement class, code)
where
Statement class = Any one of IS, DL or AD
Code = Instruction code in machine language (for an imperative statement)
Code = Ordinal number within the class (for declaration statements and assembler directives)
Codes for declaration statements and assembler directives are as follows

Declaration statements    Assembler directives
DC  01                    START   01
DS  02                    END     02
                          ORIGIN  03
                          EQU     04
                          LTORG   05

For imperative statement

Variant I
• A single digit number is the code used for a register. This is the first operand.
• The memory operand is the second operand and is represented in the following format.
(operand class, code)
where
operand class = C, S or L (any one)
(C = Constant, S = Symbol, L = Literal)

Following program shows intermediate code for variant I

        START  300            (AD, 01) (C, 300)
        READ   R1             (IS, 09) (S, 01)
AGAIN   MOVER  AREG, R1       (IS, 04) (1) (S, 01)
        SUB    AREG, ='1'     (IS, 02) (1) (L, 01)
        BC     GT, AGAIN      (IS, 07) (4) (S, 02)
        STOP                  (IS, 00)
R1      DS     1              (DL, 02) (C, 1)
        LTORG                 (AD, 05)

Code field
1) For constant :
a) Internal representation of the constant itself
b) e.g. the operand of START 300 is (C, 300)

2) For symbol or literal :
Contains the ordinal number of the operand's entry in SYMTAB or LITTAB. For
example

Symbol xyz    Literal ='25'
(S, 17)       (L, 35)

• If the assembly statement contains a symbol in the label field, then an entry is made in
SYMTAB.
For example, (A, 349, 1) is the entry for symbol A. It allocates one word for A at
memory address 349.
• For forward reference
MOVER AREG, A
A is entered in the symbol table at entry number n. It is represented as (S, n) in IC.
At this point, the address and length fields of A's entry cannot be filled in.
• Symbol table contains two types of entries at any time, i.e. defined symbols and
forward references.
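
Generation of variant I intermediate code can be sketched in Python as below. The opcode, directive and register codes are those listed in this chapter; only the condition codes that appear in its listings (LT, LE, GT, ANY) are included. The function names and the already-parsed (label, opcode, operands) input form are assumptions for this example, and literal pools are ignored for brevity.

```python
IS = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
      "COMP": 6, "BC": 7, "READ": 9, "PRINT": 10}
AD = {"START": 1, "END": 2, "ORIGIN": 3, "EQU": 4, "LTORG": 5}
DL = {"DC": 1, "DS": 2}
REG = {"AREG": 1, "BREG": 2, "CREG": 3}
COND = {"LT": 1, "LE": 2, "GT": 4, "ANY": 6}


def entry_number(table, name):
    """Return the 1-based ordinal number of name, entering it if absent."""
    if name not in table:
        table.append(name)       # forward references are entered on first use
    return table.index(name) + 1


def variant1(statements):
    """Translate (label, opcode, operands) triples into variant I IC."""
    symtab, littab, ic = [], [], []
    for label, op, operands in statements:
        if label:
            entry_number(symtab, label)
        if op in AD:
            parts = [f"(AD, {AD[op]:02d})"]
        elif op in DL:
            parts = [f"(DL, {DL[op]:02d})"]
        else:
            parts = [f"(IS, {IS[op]:02d})"]
        for operand in operands:
            if operand in REG:
                parts.append(f"({REG[operand]})")
            elif operand in COND:
                parts.append(f"({COND[operand]})")
            elif operand.startswith("="):
                parts.append(f"(L, {entry_number(littab, operand):02d})")
            elif operand.isdigit():
                parts.append(f"(C, {operand})")
            else:
                parts.append(f"(S, {entry_number(symtab, operand):02d})")
        ic.append(" ".join(parts))
    return ic, symtab, littab


program = [
    (None, "START", ["300"]),
    (None, "READ", ["R1"]),
    ("AGAIN", "MOVER", ["AREG", "R1"]),
    (None, "SUB", ["AREG", "='1'"]),
    (None, "BC", ["GT", "AGAIN"]),
    (None, "STOP", []),
    ("R1", "DS", ["1"]),
    (None, "LTORG", []),
]
for unit in variant1(program)[0]:
    print(unit)
```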


Variant II
• It differs from variant I in the operand field of the source statements.
• For declarative statements and assembler directives, processing of the operand fields is essential to support LC processing.
• For imperative statements, the operand field is processed only to identify literal references.
• Literals are entered in the literal table and represented as (L, m) in IC.
• Following is the program for intermediate code variant II.

Program                       Intermediate code
       START  300             (AD, 01) (C, 300)
       READ   R1              (IS, 09) R1
AGAIN  MOVER  AREG, R1        (IS, 04) AREG, R1
       SUB    AREG, ='1'      (IS, 02) AREG, (L, 01)
       BC     GT, AGAIN       (IS, 07) GT, AGAIN
       STOP                   (IS, 00)
R1     DS     1               (DL, 02) (C, 1)
       LTORG                  (AD, 05)

Fig. 3.5.2 shows the memory requirement of variant I and variant II.

[Fig. 3.5.2 Variant I and II : memory used by Pass 1 and Pass 2 (data structures and work area) for (a) Variant I, where part of the memory is wasted, and (b) Variant II]


Program                       Intermediate code

      START  100              AD#5, 100
      MOVER  AREG, ='5'       04  L#1
      MOVEM  AREG, A          06  S#1
LOOP  MOVER  AREG, A          04  S#1
      SUB    AREG, ='1'
A     DS     1                DL#1
      BC     NEXT             06  S#2
Criteria for writing the intermediate code
1. It should be easy to construct.
2. It should be compact.
3. It should minimize further analysis.

Comparison between variant I and II


In variant I, the operand field holds only the number of the symbol's or literal's entry in SYMTAB or LITTAB, so the intermediate code is compact. Pass 1 does most of the processing and therefore requires more memory than Pass 2. Pass 2 uses the memory released by Pass 1, so part of that memory is wasted in variant I.
In variant II, the intermediate code is not compact because Pass 1 does not process the symbols. In Pass 2, symbols and literals are searched in SYMTAB and LITTAB respectively. The memory requirement of Pass 1 and Pass 2 is the same, so no memory is wasted.
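
The difference between the two variants can be sketched in a few lines of Python. This is only an illustration, not the textbook's algorithm: the function names `entry_number` and `encode` are hypothetical, the opcode, register and condition codes are the ones that appear in the listings above, and entry numbers are assumed to be assigned in order of first appearance.

```python
# Codes taken from the intermediate code listings above (assumed complete enough
# for this example only).
OPTAB = {"STOP": 0, "SUB": 2, "MOVER": 4, "BC": 7, "READ": 9}
REGS = {"AREG": 1}
CONDS = {"GT": 4}

def entry_number(table, name):
    """Return the 1-based entry number of name, adding it if it is new."""
    if name not in table:
        table.append(name)
    return table.index(name) + 1

def encode(mnemonic, operands, symtab, littab, variant):
    """Encode one imperative statement as an intermediate code string."""
    parts = [f"(IS, {OPTAB[mnemonic]:02d})"]
    for op in operands:
        if op.startswith("="):            # literal: processed in both variants
            parts.append(f"(L, {entry_number(littab, op):02d})")
        elif variant == 2:                # variant II keeps the rest in source form
            parts.append(op)
        elif op in REGS:
            parts.append(f"({REGS[op]})")
        elif op in CONDS:
            parts.append(f"({CONDS[op]})")
        else:                             # symbol, possibly a forward reference
            parts.append(f"(S, {entry_number(symtab, op):02d})")
    return " ".join(parts)

symtab, littab = [], []
print(encode("READ", ["R1"], symtab, littab, 1))           # (IS, 09) (S, 01)
print(encode("SUB", ["AREG", "='1'"], symtab, littab, 1))  # (IS, 02) (1) (L, 01)
print(encode("SUB", ["AREG", "='1'"], symtab, littab, 2))  # (IS, 02) AREG (L, 01)
```

In variant I every operand is replaced by a table reference in Pass 1, while in variant II only the literal is replaced and the symbol lookup is left to Pass 2.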

3.5.4 Processing of Declarations and Assembler Directives


In intermediate code, is it compulsory to represent the address of each source statement for declarations ? The answer to this question is "Yes".
Is an explicit representation of DS statements and assembler directives in IC also required ?
• Let us consider following code
START 300
ONE DS 30
TWO DC 7


It is not required to represent the START and DS statements in IC. If the IC contains an address field, a representation for DS statements and assembler directives is not required.
If a representation of DS statements and assembler directives is kept in IC, the address field can be omitted. In that case, Pass II can determine the address for TWO only after analyzing the intermediate code units for the START and DS statements.
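
As a rough illustration of this point, the sketch below recomputes the addresses from the START and DS statements alone. The function name `assign_addresses` is hypothetical, and the sketch assumes that DS n reserves n words and that every other statement occupies one word.

```python
def assign_addresses(statements):
    """LC processing: return {label: address} for all labelled statements."""
    lc = 0
    addresses = {}
    for label, opcode, operand in statements:
        if opcode == "START":
            lc = int(operand)          # START sets the location counter
            continue
        if label:
            addresses[label] = lc
        if opcode == "DS":
            lc += int(operand)         # DS n reserves n words (assumption)
        else:
            lc += 1                    # assumption: all other statements take one word
    return addresses

program = [("", "START", "300"), ("ONE", "DS", "30"), ("TWO", "DC", "7")]
print(assign_addresses(program))       # {'ONE': 300, 'TWO': 330}
```

The address 330 for TWO can be obtained only because the START and DS statements were both seen, which is why they must be represented in IC when the address field is omitted.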

3.5.5 Comparison between 2-Pass and Single-Pass Assembler

Sr. No. | 2-Pass assembler | Single-pass assembler
1. | A two pass assembler does two passes over the source file. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations into machine codes and so on. | A one pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly.
2. | It can resolve forward references of data symbols. | It cannot resolve forward references of data symbols.
3. | Loader is required as object code is generated. | No object program is written, so loader is not required.
4. | The second pass synthesizes the target form using the address information found in the symbol table. | A table of instructions containing forward references is maintained separately, called the Table of Incomplete Instructions (TII).
5. | First pass constructs an intermediate representation of the source program that will be used by the second pass. | This table can be used to fill up the addresses in incomplete instructions.
6. | Address of symbol can be calculated. | Only creates tables with all symbols, no address of symbol is calculated.

Example 3.5.1 Given the source program

      START   100
A     DS      3
L1    MOVER   AREG, B
      ADD     AREG, C
      MOVEM   AREG, D
D     EQU     A+1
L2    PRINT   D
      ORIGIN  A-1
      DC      '5'
      ORIGIN  L2+1
      STOP
B     DC      19
      END     L1

a) Show the contents of the symbol table at the end of pass 1.
b) Explain the significance of EQU and ORIGIN statements in the program and explain how they are processed by the assembler.
c) Show the intermediate code generated for the program. GTU : Summer-18, Marks 7

Solution : Symbol Table

Symbol Address
100
L1 101

D 104
L2 105

C 107

110

Intermediate Code
(AD, 01) (C, 100)
(DL, 02) (C, 3)
(IS, 04) (1) (L, 01)
(IS, 01) (1) (S, 01)
(IS, 05) (1) (L, 02)
(IS, 06) (1) (L, 03)

(AD, 02)

Review Questions

1. Explain in brief design of a Two Pass Assembler. GTU : Summer-17, Marks 7
2. Explain in detail any two advanced assembler directives. GTU : Summer-17, Marks 7
3. Explain and compare two variants of intermediate code. GTU : Summer-17, Marks 4
4. Explain the use of intermediate code with example in assembler and also mention field of it. GTU : Winter-17, Marks 4
5. Explain in detail any two advanced assembler directives. GTU : Summer-18, Marks 4

6. Compare various intermediate code forms for an assembler. GTU : Summer-18, Marks 3
7. Consider the assembly program fragment
      START  200
      READ   A
LOOP  MOVER  AREG, A
      SUB    AREG, ='1'
      BC     GT, LOOP
      STOP
A     DS     1
What will be the intermediate code for the above program fragment ? What does START directive do ? What will be the difference if ORIGIN directive is used in place of START ? GTU : Winter-18, Marks 7
8. Consider the assembly program fragment
      MOVER  CREG, B
      ADD    CREG, ='1'
      BC     ANY, NEXT
      LTORG
             ='5'
             ='1'
      SUB    AREG, ='1'
      END
             ='1'
i) Explain LTORG directive.
ii) Explain the entries in mnemonic opcodes table as per above code fragment.
iii) How table of literals will be manipulated ? GTU : Winter-18, Marks 7
9. Compare Variant I and Variant II of intermediate code generation for assembler. Write intermediate code for Variant I and Variant II of below program fragment.
      START  200
      READ   A
LOOP  MOVER  AREG, A
      SUB    AREG, ='1'
      BC     GT, LOOP
      STOP
A     DS     1
      LTORG
GTU : Summer-19, Marks


10. List out assembler directives and explain any two advanced assembler directives. GTU : Summer-19, Marks 7
11. Differentiate between one pass and two pass assembler. GTU : Winter-19, Marks 4
12. Consider the following assembly language program and explain role of OPTAB, SYMTAB, LITTAB, POOLTAB with its contents. GTU : Winter-19, Marks 7

       START   300
       MOVER   AREG, ='5'
       MOVEM   AREG, R1
AGAIN  MOVER   AREG, R1
       MOVER   CREG, R2
       ADD     CREG, ='1'
       BC      ANY, TERM
       LTORG
               ='5'
TERM   SUB     AREG, ='1'
       BC      LT, DOWN
LAST   STOP
       ORIGIN  LOOP+2
       MULT    CREG, R2
       ORIGIN  LAST+1
R1     DS
DOWN   EQU     LOOP
R2     DS      1
       END
               ='1'
13. List out various assembler directives. Explain any three in detail. GTU : Winter-19, Marks 7

3.6 Error Reporting


Error reporting is an important aspect of assembler design. The basic decision is whether to report errors in Pass 1 or to delay reporting until Pass 2. Error reporting is related to the speed and memory requirement of the assembler.
The advantage of producing the listing in Pass 1 is that the source program need not be preserved till Pass 2. This conserves memory and avoids some amount of duplicate processing. However, a listing produced in Pass 1 can report only certain errors in the most relevant place, i.e. against the source statement itself.


Following Fig. 3.6.1 shows error reporting in Pass 1.

Sr. No.   Statement              Address
01        START  300
02        MOVER  AREG, R1        300
10        MVER   BREG, R1        307
          ** error ** Invalid opcode
15        SUB    BREG, R2        308
19        R1  DS  5              309
24        R1  DC  '10'           327
          ** error ** Duplicate definition of symbol R1
40        END
          ** error ** Undefined symbol R2 in statement 15

Fig. 3.6.1 Pass 1 error reporting
Examples of errors
1) Syntax errors like missing commas or parentheses.
2) Semantic errors like duplicate definitions of symbols.
3) Other errors, like references to undefined variables, can only be reported at the end of the source program.
In Pass 2, the target code is printed. It is difficult to relate the target code to the source statements and vice versa. All these problems make the debugging task difficult.


For effective error reporting, all errors should be reported against the erroneous statement itself. This is possible if the program listing and error reporting are delayed till Pass 2.
Error reporting in Pass 1

In Fig. 3.6.1, errors are detected at statements 10 and 24. Statement 10 gives an invalid opcode error because its mnemonic does not match any mnemonic in OPTAB. Statement 24 gives a duplicate definition error because an entry for R1 already exists in the symbol table.
The undefined symbol R2 is harder to detect. When statement 15 is processed, R2 may still turn out to be a forward reference, so no error can be reported against the statement. At the end of Pass 1, all the symbol table entries are checked to see whether the definition of each symbol has been encountered, and only then can the error be reported.
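
The three kinds of Pass 1 error described above can be sketched as follows. This is a simplified illustration, not the textbook's algorithm: the name `pass1_errors` and the tuple layout of a statement are assumptions, and OPTAB holds only the mnemonics needed for the example. The sketch records the statement number of the first reference to each symbol so that the final message can name statement 15, as in Fig. 3.6.1.

```python
OPTAB = {"MOVER", "SUB"}                     # hypothetical, just enough for the example
DIRECTIVES = {"START", "END", "DS", "DC"}

def pass1_errors(statements):
    """statements: list of (number, label, opcode, operand_symbol) tuples."""
    defined = set()
    first_use = {}                           # symbol -> first statement that uses it
    errors = []
    for number, label, opcode, symbol in statements:
        if opcode not in OPTAB and opcode not in DIRECTIVES:
            errors.append((number, "Invalid opcode"))
        if label:
            if label in defined:
                errors.append((number, f"Duplicate definition of symbol {label}"))
            defined.add(label)
        if symbol:
            first_use.setdefault(symbol, number)
    end_number = statements[-1][0]
    for symbol, number in first_use.items():
        if symbol not in defined:            # known only after the whole program is read
            errors.append((end_number, f"Undefined symbol {symbol} in statement {number}"))
    return errors

program = [
    (1, "", "START", None), (2, "", "MOVER", "R1"), (10, "", "MVER", "R1"),
    (15, "", "SUB", "R2"), (19, "R1", "DS", None), (24, "R1", "DC", None),
    (40, "", "END", None),
]
for number, message in pass1_errors(program):
    print(number, message)
```

The first two errors are attached to the offending statements, while the undefined symbol error can only be attached to the END statement.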
Error reporting in Pass 2
Following Fig. 3.6.2 shows Pass 2 error reporting.

Sr. No.   Statement              Address   Instruction
01        START  300
02        MOVER  AREG, R1        300       1 309
10        MVER   BREG, R1        307       -- 2 309
          ** error ** Invalid opcode
15        SUB    BREG, R2        308       01 2
          ** error ** Undefined symbol R2 in
19        R1  DS  5              309
24        R1  DC  '10'           327       00 0 010
          ** error ** Duplicate definition of symbol R1
40        END

Fig. 3.6.2 Pass 2 error reporting
As in Pass 1, error reporting at statements 10 and 24 is easy. Error indication at statement 15 is also easy because the symbol table is searched for an entry of R2. If a match is not found, the error is reported.



3.7 Database GTU : Summer-19

Fig. 3.7.1 shows the data structures and files in a two pass assembler. A database is required for designing an assembler.

[Fig. 3.7.1 Two pass assembler data structure : source program (user), Pass I, opcode table, source program, symbol table, literal table, intermediate code, Pass II, program listing and required target program]

Consider the following program.

                              Relative address
SIMPLE  START
        BALR   15, 0
        USING  *, 15
LOOP    L      R1, TWO
        A      R1, FOUR
        ST     R1, FOUR       8
        CLI    FOUR+3, 4      12
        BNE    LOOP           16
        BR     14             20
R1      EQU                   24
TWO     DC     F'2'           28
FOUR    DS                    32
        END

3.7.1 Pass 1 Databases


1. Source program is used as input.
2. Location Counter (LC) is used for keeping track of each instruction's location.
3. Machine Operation Table (MOT) indicates the symbolic mnemonic for each instruction and its length. An instruction length is 2, 4 or 6 bytes.
4. Pseudo Operation Table (POT) indicates the symbolic mnemonic and the action to be taken for each pseudo-op in Pass 1.
5. Literals of the program are stored in the Literal Table (LT) with their assigned locations.
6. Labels and their corresponding values are stored in the Symbol Table (ST).
7. A copy of the input is kept to be used later by Pass 2. Normally it is stored on a secondary storage device.

Pass 2 databases
1. Copy of the source program input to Pass 1.
2. Location Counter (LC).
3. MOT, which indicates the following for each instruction :
a) Symbolic mnemonic
b) Length
c) Binary machine opcode
d) Format
4. POT, which indicates for each pseudo-op the symbolic mnemonic and the action to be taken in Pass 2.
5. Symbol table prepared by Pass 1, which contains each label and its corresponding value.
6. Base Table, which indicates which registers are currently specified as base registers by USING pseudo-ops.
7. A work space that is used to hold each instruction as its various parts are being assembled together.
8. A work space (PRINT LINE) used to produce a printed listing.
9. A work space (PUNCH CARD) also used for outputting.
10. An output deck of assembled instructions in the format needed by the loader.

3.7.2 Format of Database


• Pass 2 requires a Machine Operation Table (MOT) containing name, length, binary code and format; Pass 1 requires only name and length.
• It is possible to use two separate tables with different formats and contents or to use the same table for both passes.
• The Machine-Op Table (MOT) and Pseudo-Op Table (POT) are examples of fixed tables. The contents of these tables are not filled in or altered during the assembly process.
• Fig. 3.7.2 shows a possible content and format of the MOT. The mnemonic op-code is the key and its value is the binary op-code equivalent, which is stored for use in generating machine code. The instruction length is stored for use in updating the location counter; the instruction format is stored for use in forming the machine language equivalent.

48 bits per entry

Mnemonic op-code   Binary op-code   Instruction length   Instruction format   Not used
(4 bytes)          (1 byte)         (2 bits)             (3 bits)             (3 bits)
"A"                5A               10                   001
"AH"               4A               10                   001
"AL"               5E               10                   001
"ALR"              1E               01                   000
"AR"               1A               01                   000

Fig. 3.7.2 MOT for Pass 1 and Pass 2
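
A table of this kind maps naturally onto a dictionary keyed by the mnemonic. The sketch below is illustrative only: the names `MotEntry`, `MOT` and `instruction_length` are hypothetical, the values are based on Fig. 3.7.2, and it is an assumption that the 2-bit length code counts halfwords, so that 01, 10 and 11 stand for 2, 4 and 6 bytes.

```python
from collections import namedtuple

# One MOT entry: binary op-code, 2-bit length code, 3-bit format code.
MotEntry = namedtuple("MotEntry", "opcode length_code format_code")

MOT = {
    "A":   MotEntry(0x5A, 0b10, 0b001),
    "AH":  MotEntry(0x4A, 0b10, 0b001),
    "AL":  MotEntry(0x5E, 0b10, 0b001),
    "ALR": MotEntry(0x1E, 0b01, 0b000),
    "AR":  MotEntry(0x1A, 0b01, 0b000),
}

def instruction_length(mnemonic):
    """Length in bytes, as needed by Pass 1 to update the location counter."""
    return 2 * MOT[mnemonic].length_code   # assumption: the code counts halfwords

lc = 0
for mnemonic in ["AR", "A"]:
    lc += instruction_length(mnemonic)
print(lc)                                  # 6
```

Pass 1 uses only the key and the length, while Pass 2 also reads the op-code and format fields of the same entry.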

Fig. 3.7.3 shows a possible pseudo-op table. Each pseudo-op is listed with an associated pointer to the assembler routine for processing the pseudo-op.

Pseudo-op     Address of routine to process pseudo-op
(5 bytes)     (3 bytes)
"DROP"        P1DROP
"END"         P1END
"EQU"         P1EQU
"START"       P1START
"USING"       P1USING

Fig. 3.7.3 POT for Pass 1
The Symbol Table and Literal Table include, for each entry, the name and its value. Each entry also contains a length field and a relative-location indicator. The length field indicates the length of the instruction or data to which the symbol is attached.
The relative-location indicator tells the assembler whether the value of the symbol is absolute or relative to the base of the program.
Symbol table for Pass 1 and Pass 2

Symbol (8 bytes)   Value (4 bytes)   Length (1 byte)   Relocation (1 byte)
"SIMPLE"           0000H             One byte          Relative
"LOOP"             0000H             Four bytes        Relative
"R1"               0018H             Four bytes        Relative
"TWO"              001CH             Four bytes        Relative
"FOUR"                               Four bytes        Relative

If a constant value is used for defining the symbol, then the symbol is absolute. Otherwise it is considered relative.
The symbol table uses "R" for specifying a relative location and "A" for specifying an absolute location.


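Pass 1 flowchart

The steps of Pass 1 shown in Fig. 3.7.4 are as follows :
1. Initialize LC.
2. Read a card.
3. Search the pseudo-op table. If the opcode is found :
a) DS or DC : adjust LC and set L = length of data field.
b) EQU : evaluate the operand field and assign the value to the symbol in the label field.
c) USING or DROP : the card is only copied for Pass 2.
d) END : assign storage locations to literals, reset the copy file and go to Pass 2.
4. Otherwise, search the machine-op table, set L = length and process literals.
5. If there is a symbol in the label field, assign the current value of LC to the symbol.
6. LC = LC + L.
7. Write a copy of the card for Pass 2.

Fig. 3.7.4 Pass 1 flowchart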


Flowchart for Pass 2

The steps of Pass 2 shown in Fig. 3.7.5 are as follows :
1. Initialize LC.
2. Read a card from the copy file.
3. Search the pseudo-op table. If the opcode is found :
a) DS or DC : adjust LC to proper alignment. For DC, form the constant and insert it in the assembled program. L = length of data field.
b) EQU or START : print the listing.
c) USING : evaluate the operand and enter the base register number and value into the base table.
d) DROP : show the base register number as unavailable.
e) END : generate literals for entries in the Literal Table and stop.
4. Otherwise, search the machine-op table and get the op-code byte, the format code and L = length.
5. Depending on the type of instruction :
a) RR : evaluate both register expressions and insert them into the 2nd byte.
b) RX : evaluate the register and index expressions and insert them into the 2nd byte. Calculate the effective address of the operand, EA = D + C(B), and put B and D into bytes 3 and 4.
6. "Punch" the assembled instruction.
7. Display the assembly listing line.
8. LC = LC + L.

Fig. 3.7.5 Pass 2 flowchart

• The assembler uses a base table for generating machine instructions; it selects the proper base register and calculates the correct offset. A base register helps the assembler to generate an address. An address contains an offset, a base register number and an index register number.
Offset = (Value of symbol in symbol table) - (Content of base register)
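
The offset formula can be tried out with a short sketch. The function name `base_displacement` and the layout of the base table are assumptions made for illustration, and the 0 to 4095 limit assumes a 12-bit offset field.

```python
def base_displacement(symbol_value, base_table):
    """Choose a base register and return (register, offset) for a symbol value.

    base_table maps a base register number to the content promised for it
    by a USING pseudo-op.
    """
    candidates = [
        (symbol_value - content, reg)
        for reg, content in base_table.items()
        if 0 <= symbol_value - content <= 4095   # assumed 12-bit offset field
    ]
    if not candidates:
        raise ValueError("no base register can address this symbol")
    offset, reg = min(candidates)                # the smallest offset is chosen
    return reg, offset

# Register 15 is assumed to hold 2, and the symbol value is 28.
print(base_displacement(28, {15: 2}))            # (15, 26)
```

When several base registers are available, this sketch picks the one that gives the smallest offset.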
Review Question

1. Describe following data structures : OPTAB, SYMTAB, LITTAB and POOLTAB. GTU : Summer-19, Marks 4

3.8 Forward Reference GTU : Summer-18,19

• A forward reference of a program entity is a reference to the entity which precedes its definition in the program.
• For any symbol that has not yet been defined :
1. Omit the address translation.
2. Insert the symbol into SYMTAB and mark this symbol undefined.
3. The address that refers to the undefined symbol is added to a list of forward references associated with the symbol table entry.
4. When the definition for a symbol is encountered, the proper address for the symbol is then inserted into any instructions previously generated according to the forward reference list.
A one-pass assembler scans the program just once. The main problem in trying to
assemble a program in one pass involves forward references.
Can we write a program without forward references ?
All storage reservation statements can be defined before they are referenced. But,
forward references to labels on instructions cannot be eliminated as easily. The
logic of the program often needs a forward jump.
The one-pass assembler must make some special provision for handling forward references. One-pass assemblers are of two types :

1. One type of one-pass assemblers produces object code directly in memory for
immediate execution. No object program is written out and no loader is
needed.
2. The other type of one-pass assemblers produces the usual kind of object
program for later execution.


The assembler that does not write the object program out and does not need a loader is called a load-and-go assembler.
It avoids the overhead of writing the object program out and reading it back in. It is useful in a system that is oriented toward program development and testing.
A load-and-go assembler can be a one-pass assembler or a two-pass assembler. Fig. 3.8.1 shows the concept of load-and-go assembler.

[Fig. 3.8.1 Load and go assembler : source program, load-and-go assembler, program loaded in memory]
Handling of forward references in one-pass load-and-go assembler
1. The assembler generates object code instructions as it scans the source program. If an instruction operand is a symbol that has not yet been defined, the symbol is entered into the symbol table with a flag indicating that the symbol is undefined.
2. The operand address is omitted when the instruction is assembled; the operand
address is added to a list of forward references associated with the symbol table
entry.
3. When the definition for a symbol is encountered, the forward reference list for that
symbol is scanned, and the proper address is inserted into any instructions
previously generated.
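
The three steps above can be sketched as a small class. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, memory is modelled as a dictionary from operand-field address to operand address, and the addresses used are only illustrative.

```python
class OnePassAssembler:
    def __init__(self):
        self.memory = {}    # operand-field address -> operand address (None = omitted)
        self.symtab = {}    # symbol -> address, or None while undefined
        self.forward = {}   # symbol -> operand-field addresses waiting for a patch

    def reference(self, operand_field, symbol):
        """Assemble an operand field that refers to symbol."""
        address = self.symtab.get(symbol)
        if address is None:                        # not defined so far
            self.symtab.setdefault(symbol, None)   # enter it, flagged as undefined
            self.forward.setdefault(symbol, []).append(operand_field)
        self.memory[operand_field] = address       # None means the address is omitted

    def define(self, symbol, address):
        """Process the definition of symbol and patch earlier references."""
        self.symtab[symbol] = address
        for operand_field in self.forward.pop(symbol, []):
            self.memory[operand_field] = address

    def undefined_symbols(self):
        return sorted(s for s, a in self.symtab.items() if a is None)

asm = OnePassAssembler()
asm.reference(0x2013, "RDREC")       # forward reference, address omitted
asm.reference(0x201C, "ENDFIL")      # another forward reference
asm.define("ENDFIL", 0x2024)         # definition found: patch the operand field
print(hex(asm.memory[0x201C]))       # 0x2024
print(asm.undefined_symbols())       # ['RDREC']
```

Any symbol still returned by `undefined_symbols` at the end of the program would be reported as an error.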
One-pass assemblers that produce object programs as output are often used on systems where external working-storage devices for the intermediate file between the two passes are not available, or on systems with slow external storage.
Forward references are entered into lists as before. Object code without addresses
of undefined operands can be written out as part of a Text record in the object
program.
When the definition of a forward reference is encountered, the assembler generates
Text records with the correct operand address. In effect, the services of the loader
are being used to complete forward references that could not be handled by the
assembler.
When the program is loaded, this address will be inserted into the instruction by
the action of the loader.


One-Pass Assemblers
A load-and-go one-pass assembler generates its object code in memory for immediate execution. No object program is written out and no loader is needed. This kind of load-and-go assembler is useful in a system that is oriented towards program development and testing.
A load-and-go assembler avoids the overhead of writing the object program out and reading it back in. This can be accomplished with either a one or two pass assembler. However, a one pass assembler also avoids the overhead of an additional pass over the source program. Because the object program is produced in memory rather than being written out on secondary storage, handling of forward references becomes less difficult.
The assembler simply generates object code instructions as it scans the source program.
Fig. 3.8.2 shows the sample program for a one-pass assembler.

Line  Loc   Source Statement              Object Code

0     1000  COPY    START   1000
1     1000  EOF     BYTE    C'EOF'        454F46
2     1003  THREE   WORD    3             000003
3     1006  ZERO    WORD    0             000000
4     1009  RETADR  RESW    1
5     100C  LENGTH  RESW    1
6     100F  BUFFER  RESB    4096
10    200F  FIRST   STL     RETADR        141009
15    2012  CLOOP   JSUB    RDREC         48203D
20    2015          LDA     LENGTH        00100C
25    2018          COMP    ZERO          281006
30    201B          JEQ     ENDFIL        302024
35    201E          JSUB    WRREC         482062
40    2021          J       CLOOP         3C2012
45    2024  ENDFIL  LDA     EOF           001000
50    2027          STA     BUFFER        0C100F
55    202A          LDA     THREE         001003
60    202D          STA     LENGTH        0C100C
65    2030          JSUB    WRREC         482062
70    2033          LDL     RETADR        081009
75    2036          RSUB                  4C0000
110         .
115         .       SUBROUTINE TO READ RECORD INTO BUFFER
120         .
121   2039  INPUT   BYTE    X'F1'         F1
122   203A  MAXLEN  WORD    4096          001000
124         .
125   203D  RDREC   LDX     ZERO          041006
130   2040          LDA     ZERO          001006
135   2043  RLOOP   TD      INPUT         E02039
140   2046          JEQ     RLOOP         302043
145   2049          RD      INPUT         D82039
150   204C          COMP    ZERO          281006
155   204F          JEQ     EXIT          30205B
160   2052          STCH    BUFFER,X      54900F
165   2055          TIX     MAXLEN        2C203A
170   2058          JLT     RLOOP         382043
175   205B  EXIT    STX     LENGTH        10100C
180   205E          RSUB                  4C0000
195         .
200         .       SUBROUTINE TO WRITE RECORD FROM BUFFER
205         .
206   2061  OUTPUT  BYTE    X'05'         05
207         .
210   2062  WRREC   LDX     ZERO          041006
215   2065  WLOOP   TD      OUTPUT        E02061
220   2068          JEQ     WLOOP         302065
225   206B          LDCH    BUFFER,X      50900F
230   206E          WD      OUTPUT        DC2061
235   2071          TIX     LENGTH        2C100C
240   2074          JLT     WLOOP         382065
245   2077          RSUB                  4C0000
255                 END     FIRST

Fig. 3.8.2 Sample program for a one-pass assembler


Fig. 3.8.3 shows the object code and symbol table entries for the program of Fig. 3.8.2 after scanning line 40.
The first forward reference occurred on line 15. Since the operand (RDREC) was not yet defined, the instruction was assembled with no value assigned as the operand address (denoted by ...).
RDREC was then entered into SYMTAB as an undefined symbol (indicated by *); the address of the operand field (2013) of the instruction was inserted in a list associated with RDREC.
A similar process was followed with the instructions on lines 30 and 35.


Symbol    Value   Forward reference list
LENGTH    100C
RDREC     *       2013
THREE     1003
ZERO      1006
WRREC     *       201F
EOF       1000
ENDFIL    *       201C
RETADR    1009
BUFFER    100F
CLOOP     2012
FIRST     200F

Fig. 3.8.3

• Now consider Fig. 3.8.4, which corresponds to the situation after scanning line 160.
By this time, some of the forward references (ENDFIL, line 45 and RDREC, line 125) have been resolved, while others (EXIT, line 175 and WRREC, line 210) have been added.
When the symbol ENDFIL was defined, the assembler placed its value in the SYMTAB entry; it then inserted this value into the instruction operand field (at address 201C) as directed by the forward reference list. From this point on, any references to ENDFIL would not be forward references and would not be entered into a list.
At the end of the program, any SYMTAB entries that are still marked with * indicate undefined symbols. These should be flagged by the assembler as errors.
One-pass assemblers that produce object programs follow a slightly different procedure from that previously described.
1) Forward references are entered into lists as before.
2) When the definition of a symbol is encountered, instructions that made forward references to that symbol may no longer be available in memory for modification. In general, they will already have been written out as part of a Text record in the object program. In this case, the assembler must generate another Text record with the correct operand address.

Symbol    Value   Forward reference list
LENGTH    100C
RDREC     203D
THREE     1003
ZERO      1006
WRREC     *       201F, 2031
EOF       1000
ENDFIL    2024
RETADR    1009
BUFFER    100F
CLOOP     2012
FIRST     200F
MAXLEN    203A
INPUT     2039
EXIT      *       2050
RLOOP     2043

Fig. 3.8.4
3) When the program is loaded, this address will be inserted into the instruction
by the action of the loader.
The 2nd Text record contains the object code generated from lines 10 through 40 in Fig. 3.8.2. The operand addresses for the instructions on lines 15, 30 and 35 have been generated as 0000.
When the definition of ENDFIL on line 45 is encountered, the assembler generates the 3rd Text record. This record specifies that the value 2024 (the address of ENDFIL) is to be loaded at location 201C (the operand address field of JEQ on line 30).
When the program is loaded, the value 2024 will replace the 0000 previously loaded.
Forward Reference in One-Pass Assembler
1. Omits the operand address if the symbol has not yet been defined.
2. Enters this undefined symbol into SYMTAB and indicates that it is undefined.
3. Adds the address of this operand address to a list of forward references associated with the SYMTAB entry.
4. When the definition for the symbol is encountered, scans the reference list and inserts the address.
5. At the end of the program, reports an error if there are still SYMTAB entries indicating undefined symbols.
6. For a load-and-go assembler : searches SYMTAB for the symbol named in the END statement and jumps to this location to begin execution if there is no error.

If One-Pass Assemblers Need to Produce Object Code
When external working storage devices are not available or are too slow, the solution is as follows :
1. If the operand contains an undefined symbol, use 0 as the address and write the Text record to the object program.
2. Forward references are entered into lists as in the load-and-go assembler.
3. When the definition of a symbol is encountered, the assembler generates another Text record with the correct operand address for each entry in the reference list.
4. When loaded, the incorrect address 0 will be updated by the later Text record containing the symbol definition.
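
The four steps above can be sketched as follows. The sketch is a simplification: a Text record is modelled as a (load address, value) pair rather than a real object program record, and the names `assemble_with_text_records` and `load` are hypothetical.

```python
def assemble_with_text_records(events):
    """events: ("ref", operand_field, symbol) or ("def", symbol, address).

    Returns the Text records in the order in which they are written out.
    """
    symtab, forward, records = {}, {}, []
    for event in events:
        if event[0] == "ref":
            _, field, symbol = event
            if symbol in symtab:
                records.append((field, symtab[symbol]))
            else:
                forward.setdefault(symbol, []).append(field)
                records.append((field, 0))          # step 1: use 0 as the address
        else:
            _, symbol, address = event
            symtab[symbol] = address
            for field in forward.pop(symbol, []):
                records.append((field, address))    # step 3: a later Text record
    return records

def load(records):
    """The loader applies the records in order, so later values overwrite 0."""
    memory = {}
    for address, value in records:
        memory[address] = value
    return memory

events = [("ref", 0x201C, "ENDFIL"), ("def", "ENDFIL", 0x2024)]
records = assemble_with_text_records(events)
print(records == [(0x201C, 0), (0x201C, 0x2024)])   # True
print(hex(load(records)[0x201C]))                   # 0x2024
```

The assembler never goes back to the record it has already written; the correction is carried out by the loader when it processes the later record.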

Review Question

1. Define forward references. How can they be solved using back-patching ? GTU : Summer-18, 19, Marks 3

3.9 Literals GTU : Winter-17

• It is often convenient for the programmer to be able to write the value of a constant operand as a part of the instruction that uses it. This avoids having to define the constant elsewhere in the program and make up a label for it. Such an operand is called a literal because the value is stated literally in the instruction.
• A literal is identified with the prefix =, which is followed by a specification of the literal value. For example
45 001A ENDFIL LDA =C'EOF' 032010
specifies a 3-byte operand with value EOF.


Difference between a literal and immediate operand

1. With immediate addressing, the operand value is assembled as part of the machine instruction.
e.g. 55 0020 LDA #3 010003
2. With a literal, the assembler generates the specified value as a constant at some other memory location.
e.g. 45 001A ENDFIL LDA =C'EOF' 032010

Literal Pools
All of the literal operands used in a program are gathered together into one or
more literal pools.
Normally literals are placed into a pool at the end of the program.
In some cases, it is desirable to place literals into a pool at some other location in
the object program. For this purpose, the assembler directive LTORG is used.
1. When the assembler encounters a LTORG statement, it creates a literal pool
that contains all of the literal operands used since the previous LTORG.
2. This literal pool is placed in the object program at the location where the LTORG directive was encountered.
3. Literal placed in a pool by LTORG will not be repeated in the pool at the end
of the program.

Duplicate Literals
The assembler should recognize duplicate literals and store only one copy of the specified data value. For example
215 1062 WLOOP TD =X'05'
230 106B WD =X'05'
Only one data area with this value is generated. Both instructions refer to the same address in the literal pool for their operand.
How to find the duplicate literals ?
1) The easiest method is to recognize duplicate literals by comparison of the character strings defining them. However, the same literal name may have different values, e.g. the literal =*, whose value is the current value of the location counter (LOCCTR).
2) Comparison of the generated data values. The benefits of using the generated data values are usually not great enough to justify the additional complexity in the assembler.
The basic data structure with which the assembler handles literal operands is the literal table LITTAB. For each literal used, this table contains the literal name, the operand value and length, and the address assigned to the operand when it is placed in a literal pool.

Pass 1
1) Build LITTAB with literal name, operand value and length, leaving the address unassigned.
2) When a LTORG statement is encountered, assign an address to each literal not yet assigned an address.

Pass 2
1) Search LITTAB for each literal operand encountered.
2) Generate data values using BYTE or WORD statements.
3) Generate a modification record for literals that represent an address in the program.
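
The Pass 1 handling of LITTAB can be sketched as below. The class `LiteralTable` and its methods are hypothetical names used only for illustration; duplicates are detected by comparing the literal names, and a literal is assumed to occupy as many bytes as its value.

```python
class LiteralTable:
    def __init__(self):
        self.entries = {}   # literal name -> {"value", "length", "address"}

    def add(self, name, value):
        """Record a literal operand; a duplicate name shares the existing entry."""
        if name not in self.entries:
            self.entries[name] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr):
        """At LTORG (or END), assign addresses to the literals not yet placed.

        Returns the updated location counter.
        """
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr

littab = LiteralTable()
littab.add("=C'EOF'", b"EOF")
print(hex(littab.place_pool(0x002D)))            # 0x30, first pool
littab.add("=X'05'", b"\x05")
littab.add("=X'05'", b"\x05")                    # duplicate, no new entry
print(hex(littab.place_pool(0x1076)))            # 0x1077, second pool
print(hex(littab.entries["=X'05'"]["address"]))  # 0x1076
```

A literal that was placed in an earlier pool keeps its address and is not repeated in a later pool.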
• Following are the parameters which affect the pass structure of an assembler.
1. Translation time
2. Storage area
3. Efficiency of target code
4. Speed of translation
5. Error listing
6. Overheads
• The translation time of a multipass assembler is more than that of a single pass assembler because there is more than one pass.
• A multipass assembler requires less memory for loading because the memory allocated to Pass 1 is reused by Pass 2.
• In a multipass assembler the code generated by the first pass is called intermediate code. This is stored on a secondary storage device, which involves I/O operations, due to which the multipass assembler slows down. So a single pass assembler is better in terms of speed.
• The code generated by the assembler should be efficient in terms of execution speed.
• It is not always possible to report an error immediately in a one pass assembler, but errors can be displayed more precisely if we use more than one pass.

Review Question

1. Explain the difference between literal and constant in assembler with its syntax. Why is POOLTAB required ? GTU : Winter-17, Marks 3


3.10 Expressions

Most assemblers allow the use of expressions wherever a single operand is permitted. Each such expression must be evaluated by the assembler to produce a single operand address or value.
Expressions can be classified as absolute expressions or relative expressions depending upon the type of value they produce.
Relative expressions : Relative means relative to the beginning of the program. Labels on instructions and data areas, and references to the location counter value, are relative terms. No relative term may enter into a multiplication or division operation.
Absolute expressions : Absolute means independent of program location. A constant is an absolute term. Absolute expressions may also contain relative terms provided the relative terms occur in pairs and the terms in each such pair have opposite signs.
A relative term or expression represents some value that may be written as (S + r) where
S = Starting address of the program
r = Value of the term or expression relative to the starting address
In the difference of two relative terms, (S + r1) - (S + r2) = r1 - r2, the starting address cancels, so the result does not depend on S.


Example
MAXLEN EQU BUFEND-BUFFER
Both BUFEND and BUFFER are relative terms, each representing an address within
the program. However, the expression BUFEND-BUFFER represents an absolute value,
because the starting address S cancels : (S + 1036) - (S + 0036) = 1000.
To determine the type of an expression, we must keep track of the types of all
symbols defined in the program. The following table shows the symbol table entries
(R = relative, A = absolute).

Symbol    Type    Value
RETADR    R       0030
BUFFER    R       0036
BUFEND    R       1036
MAXLEN    A       1000

With this information, the assembler can easily determine the type of each expression
used as an operand and generate modification records in the object program for relative
values.
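
The classification rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name classify, the restriction to + and - operators, and the treatment of numeric terms as hexadecimal constants are not taken from the text. An expression is treated as absolute when its relative terms cancel in pairs, as relative when exactly one positive relative term remains, and as illegal otherwise.

```python
import re

# Symbol types and values from the table above: "R" = relative, "A" = absolute.
SYMTAB = {
    "RETADR": ("R", 0x0030),
    "BUFFER": ("R", 0x0036),
    "BUFEND": ("R", 0x1036),
}

def classify(expr, symtab):
    """Return (type, value) for an expression made of + and - terms only."""
    net_relative = 0  # +1 per added relative term, -1 per subtracted one
    value = 0
    # Split into signed terms, e.g. "BUFEND-BUFFER" -> ("", "BUFEND"), ("-", "BUFFER")
    for sign, term in re.findall(r"([+-]?)\s*(\w+)", expr):
        factor = -1 if sign == "-" else 1
        if term in symtab:
            kind, term_value = symtab[term]
        else:
            # Assumption: anything not in the symbol table is a hex constant.
            kind, term_value = "A", int(term, 16)
        value += factor * term_value
        if kind == "R":
            net_relative += factor
    if net_relative == 0:
        return "A", value  # relative terms cancelled in pairs
    if net_relative == 1:
        return "R", value  # one unpaired positive relative term
    raise ValueError(f"illegal expression: {expr}")

kind, value = classify("BUFEND-BUFFER", SYMTAB)
print(kind, format(value, "04X"))
```

With the table entries above, BUFEND-BUFFER is classified as absolute, while an expression such as BUFEND+BUFFER is rejected because the starting address would be counted twice.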

SYMTAB

Name      Value
COPY
FIRST     0
CLOOP     6
ENDFIL    1A
RETADR    30
LENGTH    33
BUFFER    36
BUFEND    1036
MAXLEN    1000
RDREC     1036
RLOOP     1040
EXIT      1056
INPUT     105C
WREC      105D
WLOOP     1062

LITTAB

Literal   Value     Length    Address
C'EOF'    454F46    3         002D
X'05'     05        1         1076
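
The Value and Length fields of LITTAB follow directly from the text of each literal. The sketch below shows one way to derive them; the function name literal_value is hypothetical, and ASCII is assumed for character literals.

```python
def literal_value(literal):
    """Return (hex_string, length_in_bytes) for a C'...' or X'...' literal."""
    kind, body = literal[0], literal[2:-1]  # strip the C'...' or X'...' wrapper
    if kind == "C":
        data = body.encode("ascii")   # one byte per character (ASCII assumed)
    elif kind == "X":
        data = bytes.fromhex(body)    # two hex digits per byte
    else:
        raise ValueError(f"unsupported literal: {literal}")
    return data.hex().upper(), len(data)

for lit in ("C'EOF'", "X'05'"):
    print(lit, *literal_value(lit))
```

The address field is not computed here; it is assigned only when the literal pool is placed in the program.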

3.11 MASM Assembler


An MASM assembler language program is written as a collection of segments. In an
x86 system, memory is a collection of segments. Each segment is defined as
belonging to a particular class, corresponding to its contents. Commonly used
classes are CODE, DATA, CONST and STACK.


During program execution, segments are addressed via the x86 segment registers.
Register CS is used to address code segments and register SS is used to address
stack segments.
These segment registers are automatically set by the system loader when a
program is loaded for execution.
Data segments are normally addressed using DS, ES, FS or GS. The segment register
to be used can be specified by the programmer.
The assembler assumes that all references to data segments use register DS by
default. This assumption can be changed by using the assembler directive ASSUME, i.e.
    ASSUME ES:DATASEG2
This tells the assembler to assume that register ES indicates the segment
DATASEG2.
Registers DS, ES, FS and GS must be loaded by the program before they can be
used to address data segments. For example,
    MOV AX, DATASEG2
    MOV ES, AX
would set ES to indicate the data segment DATASEG2.
Jump instructions are assembled in two different ways.
1. Near jump
2. Far jump
• A near jump is a jump to a target in the same code segment. A far jump is a
jump to a target in a different code segment.
• A near jump is assembled using the current code segment register CS. A far jump
must be assembled using a different segment register, which is specified in an
instruction prefix.
The assembled machine instruction for a near jump occupies 2 or 3 bytes. The
assembled instruction for a far jump requires 5 bytes.
Segments in an MASM source program can be written in more than one part.
References between segments that are assembled together are automatically
handled by the assembler. External references between separately assembled
modules must be handled by the linker.
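
The effect of the ASSUME directive can be pictured as a small table that the assembler consults when it assembles a data reference. The following Python sketch is a simplification and not a description of MASM internals: the class AssumeTable and its methods are hypothetical, and the segment name DATASEG is assumed for illustration. The prefix values are the standard x86 segment override bytes.

```python
# x86 segment override prefix bytes.
SEGMENT_PREFIX = {"ES": 0x26, "CS": 0x2E, "SS": 0x36,
                  "DS": 0x3E, "FS": 0x64, "GS": 0x65}

class AssumeTable:
    """Records which segment each segment register is assumed to indicate."""

    def __init__(self):
        self.segment_of = {}  # register name -> segment name

    def assume(self, register, segment):
        # Models a directive such as: ASSUME ES:DATASEG2
        self.segment_of[register] = segment

    def prefix_for(self, segment):
        """Return the override prefix needed to reach `segment`,
        or None when the default register DS already indicates it."""
        registers = [r for r, s in self.segment_of.items() if s == segment]
        if not registers:
            raise LookupError(f"no segment register assumed for {segment}")
        if "DS" in registers:
            return None  # DS is the default for data references
        return SEGMENT_PREFIX[registers[0]]

table = AssumeTable()
table.assume("DS", "DATASEG")    # DATASEG is a hypothetical segment name
table.assume("ES", "DATASEG2")
print(table.prefix_for("DATASEG"), hex(table.prefix_for("DATASEG2")))
```

A reference to data in DATASEG2 would then be assembled with the ES override prefix, while a reference to the default data segment needs no prefix.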


Solved Examples

Example 3.11.1 Explain various data structures required for a 2 pass assembler. For the
following assembly language code show the contents of symbol table, literal table and also
generate intermediate and target code. Assume your own opcodes and instruction length.
        START   1000
        READ    N
        MOVER   B,
        MOVEM   B, TERM
AGAIN   MULT    B, TERM
        MOVER   C, TERM
        ADD
        MOVEM   C, TERM
        COMP    C
        BC      LE, AGAIN
        MOVEM   B, RESULT
        LTORG
        PRINT   RESULT
        STOP
N       DS
RESULT  DS      20
TERM    DS      1
        END
Solution : Symbol table

Symbol    Addr
AGAIN     1003
N         1013
RESULT    1014
TERM      1034

Literal table

Literal   Addr
='1'      1011


Intermediate code

Mnemonic  Opcode
READ      3A
MOVER     3B
MOVEM     3C
MULT      3D
ADD       3E
COMP      3F
BC        3G
PRINT     AA
STOP      AB

Target code
B - Register #2
C - Register #3

Address   Instruction   OP1   OP2
1000      3A
1001      3B            #2
1002      3C            #2
1003      3D            #2
1004      3B            #3
1005      3E
1006      3C            #3
1007      3F            #3
1008      3G
1009      3C            #2
1011      AA
1012      AB
1013
1014
1034
1035      end


Assuming the length of each instruction is 1 byte, DS and DC are also taken as 1 byte
per unit, and a literal takes one byte of memory.
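
The symbol table addresses in this solution can be checked with a short Pass 1 sketch that only advances the location counter. It is an illustration under assumptions, not the book's algorithm: operands that do not affect addresses are omitted, N is assumed to be declared as DS 1 (its size is not legible in the listing), and the LTORG pool is assumed to hold one literal of one byte.

```python
# (label, opcode, size operand); only what affects addresses is kept.
PROGRAM = [
    (None, "START", 1000),
    (None, "READ", None),
    (None, "MOVER", None),
    (None, "MOVEM", None),
    ("AGAIN", "MULT", None),
    (None, "MOVER", None),
    (None, "ADD", None),
    (None, "MOVEM", None),
    (None, "COMP", None),
    (None, "BC", None),
    (None, "MOVEM", None),
    (None, "LTORG", None),
    (None, "PRINT", None),
    (None, "STOP", None),
    ("N", "DS", 1),        # assumption: N DS 1
    ("RESULT", "DS", 20),
    ("TERM", "DS", 1),
    (None, "END", None),
]

def pass_one(program, pending_literals=1):
    """Assign an address to every label; return (symbol table, final LC)."""
    symtab = {}
    lc = 0
    for label, opcode, operand in program:
        if opcode == "START":
            lc = operand              # set the location counter
            continue
        if opcode == "END":
            break
        if label:
            symtab[label] = lc        # label gets the current address
        if opcode == "DS":
            lc += operand             # reserve the requested storage
        elif opcode == "LTORG":
            lc += pending_literals    # one byte per pooled literal (assumed)
        else:
            lc += 1                   # every instruction is 1 byte long
    return symtab, lc

symtab, end_address = pass_one(PROGRAM)
print(symtab, end_address)
```

Under these assumptions the sketch yields the same addresses for AGAIN, N, RESULT and TERM as the symbol table above, and the location counter ends at 1035.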

Example 3.11.2
        START   100
        MOVER   AREG, =5
        ADD     CREG, =1
A       DS      3
L1      MOVER   AREG, B
        ADD     AREG, C
        MOVEM   AREG, D
        LTORG
D       EQU     A+1
L2      PRINT   D
        ORIGIN  A-1
        SUB     AREG, 1
        MULT    CREG, B
        DS
        ORIGIN  L2+1
        STOP
B       DC      19
        END
i) Show the contents of symbol table, literal table and pool table at the end of
pass I.
ii) Show the intermediate code generated for the program.
Solution : Symbol table :

Symbol    Value    Length
A                  1
L1        4        1
          4        4
L2        4        4
B         19       1

Literal table :

Literal     Value    Length
AREG, =5    48       4
CREG, =1    52       4
AREG, =1    56       4


Intermediate code
MOVER   AREG   5
ADD     CREG   1
MOVER   AREG   B
ADD     AREG   C
MOVEM   AREG   D
SUB     AREG   1
MULT    CREG   B
PRINT   D
STOP

3.12 Multiple Choice Questions


Q.1 In two pass assembler, the object code generation is done during ______.
    a  First pass                  b  Second pass
    c  Third pass                  d  Not done by assembler
Q.2 Mnemonic represent ______.
    a  Operation codes             b  Strings
    c  Address                     d  None of these
Q.3 Assembler works in ______ phases.
    a  1                           b  2
    c  3                           d  4
Q.4 The assembler in first pass reads the program to collect symbols defined with offsets
in a table ______.
    a  Literal table               b  Hash table
    c  Symbol table                d  POT table
Q.5 The ______ must contain at least the mnemonic operation code and its machine
language equivalent.
    a  Symbol table                b  Literal table
    c  Operation code table        d  Program counter
Q.6 An assembler is ______.
    a  programming language dependent.    b  syntax dependent.
    c  machine dependent.                 d  data dependent.
Q.7 In a two-pass assembler, the task of the Pass II is to ______.
    a  separate the symbol, mnemonic opcode and operand fields.
    b  build the symbol table.
    c  construct intermediate code.
    d  synthesize the target program.


Q.17 A literal is identified by prefix ______.
    a                              b  #
    c                              d
Q.18 ______ phase performs type checking task.
    a  Lexical analysis            b  Syntax analysis
    c  Semantic analysis

Answer Keys for Multiple Choice Questions


Q.No.  Answer    Q.No.  Answer    Q.No.  Answer    Q.No.  Answer    Q.No.  Answer    Q.No.  Answer
1.     b         2.               3.     b         4.     c         5.               6.
7.               8.               9.     d         10.              11.    b         12.    b
13.              14.              15.              16.              17.              18.

3.13 Short Questions and Answers


Q.1 Define an assembler.
Ans. : An assembler is a translator that translates source instructions (in symbolic
language) into target instructions (in machine language), on a one to one basis.
Q.2 Define assembler directives.
Ans. : Assembler directives instruct the assembler to perform certain actions during the
assembly of a program.
Q.3 What is a program block ?
Ans. : Program blocks are segments of code that are rearranged within a single
object program unit, whereas control sections are segments that are translated into
independent object program units.
Q.4 What is a common use of ORG ?
Ans. : The most common use for ORG is to specify a start address for the program in
a computer without an operating system.
Q.5 List three types of assembly language statements.
Ans. : Three types of assembly language statements are :
Imperative statements,
Declaration statements and Assembler directives.
Q.6 What is symbol table ? GTU Summer-17, Marks 1
Ans. : Each entry in the symbol table contains the definition of a symbol and has fields
for the name, value, and type of the symbol.


Q.7 What is program relocation ?
Ans. : Program relocation is the process of modifying the addresses used in the address
sensitive instructions of the program such that the program can execute correctly from a
designated area of the memory.
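
As a minimal illustration of relocation, the sketch below models a program as a list of words and adds a relocation factor to every address sensitive word. The function name relocate, the word-list representation and the sample values are assumptions made for this example only.

```python
def relocate(code, address_sensitive, translated_origin, load_origin):
    """Return a copy of `code` adjusted to run from `load_origin`.

    code              : list of integer words
    address_sensitive : indexes of the words that hold addresses
    """
    relocation_factor = load_origin - translated_origin
    relocated = list(code)
    for index in address_sensitive:
        relocated[index] += relocation_factor  # adjust address fields only
    return relocated

# Words 0 and 2 hold addresses; word 1 is a plain constant.
print(relocate([1013, 5, 1034], [0, 2], translated_origin=1000, load_origin=5000))
```

Words that are not address sensitive, such as constants, are left unchanged.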

Q.8 What is meant by forward reference ?
Ans. : A forward reference of a program entity is a reference to the entity which
precedes its definition in the program.

Q.9 Define literal.
Ans. : The programmer is able to write the value of a constant operand as a part of the
instruction that uses it. This avoids having to define the constant elsewhere in the program
and make up a label for it. Such an operand is called a literal.

Q.10 What is control section ?
Ans. : Control sections : Segments of code that are translated into independent object
program units.

Q.11 What are the contents of the literal table ?
Ans. : Literal table contains the literal name, the operand value and length, and the
address.

Q.12 Define backpatching.
Ans. : Backpatching is a technique in which the operand field of an instruction
containing a forward reference is left blank initially. The address of the forward
reference symbol is put into this field when its definition is encountered.
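
Backpatching can be sketched as follows. This is an illustrative model rather than a complete one-pass assembler: every statement is assumed to occupy one word, the function name assemble_one_pass and the sample program are hypothetical, and a blank operand field is represented by None.

```python
def assemble_one_pass(statements, origin=0):
    """statements: list of (label, opcode, operand symbol or None)."""
    symtab = {}    # symbol -> address
    pending = {}   # undefined symbol -> indexes of instructions waiting for it
    code = []      # each entry is [opcode, operand address or None]
    for label, opcode, operand in statements:
        address = origin + len(code)
        if label:
            symtab[label] = address
            # Definition encountered: fill every blank field left for it.
            for index in pending.pop(label, []):
                code[index][1] = address
        if operand is None or operand in symtab:
            code.append([opcode, symtab.get(operand)])
        else:
            # Forward reference: leave the field blank and remember it.
            pending.setdefault(operand, []).append(len(code))
            code.append([opcode, None])
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code

program = [
    (None, "BC", "DONE"),    # forward reference to DONE
    (None, "ADD", "X"),      # forward reference to X
    ("DONE", "STOP", None),
    ("X", "DS", None),
]
print(assemble_one_pass(program, origin=100))
```

Both operand fields are blank when first generated and are filled in as soon as DONE and X are defined.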
