Implementation of a Two-Pass Assembler

Aim: Implementation of a two-pass assembler with a hypothetical instruction set.

The instruction set should include all types of assembly language statements: imperative, declarative and assembler directive. While designing, stress should be given on

a) How efficiently the mnemonic opcode table can be implemented so as to enable faster retrieval of the opcode.

b) Implementation of the symbol table for faster retrieval.

(Concepts from DSF should be applied in the design.)

Objective:

· To learn the basic translation process of assembly language to machine language.

Theory: –

A language translator bridges the execution gap between a source language and the machine language of a computer system. An assembler is a language translator whose source language is assembly language.

Language processing activity consists of two phases: the analysis phase and the synthesis phase. Analysis of the source program is governed by three components: lexical rules, syntax rules and semantic rules. Lexical rules govern the formation of valid lexical units in the source language, syntax rules govern the formation of valid statements, and semantic rules associate meaning with valid statements of the language. The synthesis phase is concerned with the construction of target language statements that have the same meaning as the source language statements. This consists of memory allocation and code generation.

Analysis of source program statements may not be immediately followed by synthesis of the equivalent target statements. This is due to forward references and to issues concerning the memory requirements and organization of the language processor (LP).

A forward reference to a program entity is a reference to the entity that precedes its definition in the program. While processing a statement containing a forward reference, the language processor does not possess all the relevant information concerning the referenced entity. This creates difficulties in synthesizing the equivalent target statements. The problem can be solved by postponing the generation of target code until more information concerning the entity is available. This also reduces the memory requirements of the LP and simplifies its organization. This leads to the multi-pass model of language processing.

Language Processor Pass: –

It is the processing of every statement in a source program or its equivalent representation to perform language-processing function.

Assembly Language statements: –

There are three types of statements: imperative, declarative and assembler directives. An imperative statement indicates an action to be performed during the execution of the assembled program. Each imperative statement usually translates into one machine instruction. A declarative statement such as DS reserves areas of memory and associates names with them, while DC constructs memory words containing constants. Assembler directives instruct the assembler to perform certain actions during the assembly of a program; e.g. the START directive indicates that the first word of the target program generated by the assembler should be placed at the memory word whose address is given as the operand of START.

Function Of Analysis And Synthesis Phase:

Analysis Phase: –

Isolate the label, operation code and operand fields of a statement.

Enter the symbol found in label field (if any) and address of next available machine word into symbol table.

Validate the mnemonic operation code by looking it up in the mnemonics table.

Determine the machine storage requirements of the statement by considering the mnemonic operation code and operand fields of the statement.

Calculate the address of the first machine word following the target code generated for this statement (Location Counter Processing).
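The first analysis step, isolating the fields of a statement, can be sketched in Python as follows. This is a minimal illustration and not a prescribed design: the names `MNEMONICS` and `split_statement`, and the choice of splitting on spaces and commas, are assumptions made for this example.

```python
# Hypothetical set of known mnemonics; a real assembler would consult the MOT.
MNEMONICS = {"STOP", "ADD", "SUB", "MULT", "MOVER", "MOVEM", "COMP", "BC",
             "DIV", "READ", "PRINT", "START", "END", "EQU", "DS", "DC"}


def split_statement(line):
    """Return (label, mnemonic, operands) for one source line."""
    words = line.replace(",", " ").split()
    if not words:
        return None, None, []
    label = None
    # A first word that is not a mnemonic is treated as a label.
    if words[0].upper() not in MNEMONICS:
        label = words[0]
        words = words[1:]
    # Validate the mnemonic by looking it up.
    if not words or words[0].upper() not in MNEMONICS:
        raise ValueError(f"invalid mnemonic in: {line!r}")
    return label, words[0].upper(), words[1:]


print(split_statement("LOOP MOVER AREG, A"))
print(split_statement("STOP"))
```

The label returned here would then be entered into the symbol table with the current location counter value, as described in the steps above.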

Synthesis Phase:

Obtain the machine operation code corresponding to the mnemonic operation code by searching the mnemonic table.

Obtain the address of the operand from the symbol table.

Synthesize the machine instruction or the machine form of the constant as the case may be.
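As a rough sketch of the synthesis steps, the function below combines the machine opcode, a register code and the operand address into one target line. The opcodes follow the hypothetical instruction list of this assignment; the register names (`AREG` to `DREG`), the function name `synthesize` and the output layout are assumptions for illustration only.

```python
# Opcodes taken from the hypothetical instruction list of this assignment.
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
           "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
# Assumed register names and codes; the assignment does not fix them.
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def synthesize(mnemonic, register, operand, symtab):
    """Build one target line: opcode, register code, operand address."""
    opcode = OPCODES[mnemonic]                  # search the mnemonic table
    reg_code = REGISTERS.get(register, 0)       # 0 when no register is used
    address = symtab[operand]                   # address from the symbol table
    return f"{opcode:02d} {reg_code} {address:03d}"


print(synthesize("MOVER", "AREG", "LOOP", {"LOOP": 204}))  # prints 04 1 204
```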

Design of a Two Pass Assembler: –

Tasks performed by the passes of two-pass assembler are as follows:

Pass I: –

Separate the symbol, mnemonic opcode and operand fields.

Determine the storage required for every assembly language statement and update the location counter.

Build the symbol table and the literal table.

Construct the intermediate code for every assembly language statement.

Pass II: –

Synthesize the target code by processing the intermediate code generated during Pass I.

Data structures required for pass I:

1. Source file containing assembly program.

2. MOT: A table of mnemonic op-codes and related information.

It has the following fields

Mnemonic : Such as ADD, END, DC

TYPE : IS for imperative, DL for declarative and AD for Assembler directive

OP-Code : Operation code indicating the operation to be performed.

Length : Length of instruction required for Location Counter Processing

Hash table Implementation of MOT to minimize the search time required for searching the instruction.

Index   Mnemonic   TYPE   OP-Code   Length   Link
0       ADD        IS     01        01       -1
1       BC         IS     07        01       -1
2       COMP       IS     06        01       -1
3       DIV        IS     08        01       5
4       EQU        AD     03        -        7
5       DC         DL     01        -        6
6       DS         DL     02        -        -1
7       END        AD     05        -        -1

The hash function used is (ASCII value of the first letter of the mnemonic) - 65. This helps in retrieving the opcode and other related information in minimum time. For example, an instruction starting with the letter 'A' will be found at index 0, one starting with 'B' at index 1, and so on. If more than one instruction starts with the same letter, the additional instruction is stored at an empty location and the index of that location is stored in the link field of the previous entry for that letter. Thus instructions starting with 'D' are stored at index locations 3, 5 and 6, and those starting with 'E' at 4 and 7.
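The hashing scheme described above could be sketched in Python as follows. The class name `MnemonicTable`, the table size of 26 and the "first empty location" search are assumptions for this example; the rows inserted are the ones from the MOT shown earlier, and mnemonics are assumed to start with a letter A to Z.

```python
class MnemonicTable:
    """MOT indexed by first letter, with link fields for collisions."""

    def __init__(self, size=26):
        # Each used slot holds [mnemonic, type, opcode, length, link].
        self.slots = [None] * size

    @staticmethod
    def home(mnemonic):
        # Hash function from the text: ASCII value of the first letter minus 65.
        return ord(mnemonic[0].upper()) - 65

    def insert(self, mnemonic, kind, opcode, length=None):
        entry = [mnemonic, kind, opcode, length, -1]
        i = self.home(mnemonic)
        if self.slots[i] is None:
            self.slots[i] = entry
            return i
        while self.slots[i][4] != -1:        # walk to the end of the chain
            i = self.slots[i][4]
        free = self.slots.index(None)        # first empty location
        self.slots[free] = entry
        self.slots[i][4] = free              # link previous entry to the new one
        return free

    def lookup(self, mnemonic):
        i = self.home(mnemonic)
        while i != -1 and self.slots[i] is not None:
            if self.slots[i][0] == mnemonic:
                return i, self.slots[i]
            i = self.slots[i][4]
        return None                          # not in the table: mnemonic error


mot = MnemonicTable()
for row in [("ADD", "IS", 1, 1), ("BC", "IS", 7, 1), ("COMP", "IS", 6, 1),
            ("DIV", "IS", 8, 1), ("EQU", "AD", 3), ("DC", "DL", 1),
            ("DS", "DL", 2), ("END", "AD", 5)]:
    mot.insert(*row)

print(mot.lookup("DS"))    # found by following the links 3 -> 5 -> 6
```

Inserting the mnemonics in the order shown reproduces the index and link values of the table. A lookup touches only the entries that share a chain, rather than scanning the whole table.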

3. SYMTAB: The symbol table.

Fields are symbol name and address (LC value). The address field of every symbol is initialized to -1. When a symbol appears in the label field, its address is replaced with the current LC value. A symbol that is used but never defined keeps the address -1, which is used for error detection.

Symbol   Address
Loop     204
Next     214
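A minimal sketch of such a symbol table is given below, assuming a Python dictionary as the underlying structure (dictionaries are hash based, which suits the fast-retrieval requirement). The class and method names are hypothetical.

```python
class SymbolTable:
    """Symbol table mapping symbol name to address; -1 means not yet defined."""

    def __init__(self):
        self.address = {}
        self.errors = []

    def reference(self, symbol):
        # Symbol used as an operand: enter it with -1 if it is not known yet.
        self.address.setdefault(symbol, -1)

    def define(self, symbol, lc):
        # Symbol appears in the label field: record the current LC.
        if self.address.get(symbol, -1) != -1:
            self.errors.append(f"duplicate symbol: {symbol}")
            return
        self.address[symbol] = lc

    def undefined(self):
        # Symbols still at -1 after Pass I were used but never defined.
        return [s for s, a in self.address.items() if a == -1]


st = SymbolTable()
st.reference("Next")       # forward reference, address unknown for now
st.define("Loop", 204)
st.define("Next", 214)     # the forward reference is resolved here
st.define("Loop", 220)     # recorded as a duplicate, address stays 204
st.reference("X")          # never defined
print(st.address, st.errors, st.undefined())
```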

4. LITTAB and POOLTAB: The literal table stores the literals used in the program, and POOLTAB stores the pointers to the literals in the current literal pool.

Literal   Address
='5'
='1'
='1'
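One possible way to maintain the two tables is sketched below. It assumes that each literal occupies one word, that a pool is closed when the assembler decides to allocate literals (for example at END), and that POOLTAB holds the LITTAB index at which each pool starts. The class name, method names and the addresses used in the example are hypothetical.

```python
class LiteralTable:
    def __init__(self):
        self.littab = []      # [literal, address] pairs; -1 until allocated
        self.pooltab = [0]    # index in littab where each pool starts

    def use(self, literal):
        """Return the littab index of a literal, adding it to the current pool."""
        start = self.pooltab[-1]
        for i in range(start, len(self.littab)):
            if self.littab[i][0] == literal:
                return i                      # already in the current pool
        self.littab.append([literal, -1])
        return len(self.littab) - 1

    def close_pool(self, lc):
        """Allocate addresses to the current pool from lc; return the new lc."""
        start = self.pooltab[-1]
        for entry in self.littab[start:]:
            entry[1] = lc
            lc += 1                           # assumed: one word per literal
        if len(self.littab) > start:
            self.pooltab.append(len(self.littab))   # next pool starts here
        return lc


lt = LiteralTable()
lt.use("='5'")
lt.use("='1'")
lc = lt.close_pool(211)     # hypothetical LC at the end of the first pool
lt.use("='1'")              # same literal again, but in a new pool
lc = lt.close_pool(219)
print(lt.littab, lt.pooltab)
```

The same literal can therefore appear more than once in LITTAB when it is used in different pools, as in the table above.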

5. Intermediate form used Variant 1 / Variant 2

Students are supposed to write the variant used by them.

Data Structure used by Pass II:

1. OPTAB: A table of mnemonic opcodes and related information.

2. SYMTAB: The symbol table

3. LITTAB: A table of literals used in the program

4. Intermediate code generated by Pass I

5. Output file containing Target code / error listing.

Algorithm

1. Open the source file in input mode.

2. If end of file of the source file is reached, go to step 8.

3. Read the next line of the source program.

4. Separate the line into words. These words could be stored in an array of strings.

5. Search for the first word in the mnemonic opcode table. If it is not present, it is a label: add it as a symbol in the symbol table with the current LC, and then search for the second word in the mnemonic opcode table.

6. If instruction is found

case 1 : imperative statement

case 2: Declarative statement

case 3: Assembler Directive

Generate Intermediate code and write to Intermediate code file.

7. go to step 2.

8. Close source file and open intermediate code file

9. If end of file ( Intermediate code), go to step 13

10. Read next line from intermediate code file.

11. Write the opcode, register code and memory address (fetched from the literal or symbol table, depending on the case) onto the target file. This is done only for imperative statements.

12. Go to step 9.

13. Close all files.

14. Display symbol table, literal table and target file.
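The algorithm above can be sketched end to end in Python. This is a simplified illustration under several assumptions: lists stand in for the source, intermediate and target files; literals and EQU are left out; the register names, the code assigned to START and the tuple layout of the intermediate code are hypothetical; numeric operands are assumed to be well formed; and BC condition codes would need their own table.

```python
# Opcodes follow the hypothetical instruction list; the START code is assumed.
OPTAB = {
    "STOP": ("IS", 0), "ADD": ("IS", 1), "SUB": ("IS", 2), "MULT": ("IS", 3),
    "MOVER": ("IS", 4), "MOVEM": ("IS", 5), "COMP": ("IS", 6), "BC": ("IS", 7),
    "DIV": ("IS", 8), "READ": ("IS", 9), "PRINT": ("IS", 10),
    "DC": ("DL", 1), "DS": ("DL", 2), "START": ("AD", 1), "END": ("AD", 5),
}
REGS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}   # assumed register names


def pass1(lines):
    """Steps 1 to 7: build the symbol table and the intermediate code."""
    symtab, inter, errors, lc = {}, [], [], 0
    for line in lines:
        words = line.replace(",", " ").split()
        if not words:
            continue
        if words[0] not in OPTAB:                    # first word is a label
            label = words.pop(0)
            if symtab.get(label, -1) != -1:
                errors.append(f"duplicate symbol: {label}")
            else:
                symtab[label] = lc
        if not words or words[0] not in OPTAB:
            errors.append(f"invalid mnemonic: {line.strip()}")
            continue
        mnemonic, operands = words[0], words[1:]
        kind, opcode = OPTAB[mnemonic]
        # Number of operands each statement needs in this sketch.
        if kind == "IS":
            needed = 0 if opcode == 0 else (2 if opcode <= 8 else 1)
        else:
            needed = 0 if mnemonic == "END" else 1
        if len(operands) != needed:
            errors.append(f"operand error: {line.strip()}")
            continue
        reg, operand, size = 0, None, 0
        if mnemonic == "START":
            lc = int(operands[0])
        elif mnemonic == "DS":
            size = int(operands[0])
        elif mnemonic == "DC":
            operand, size = int(operands[0].strip("'")), 1
        elif kind == "IS":
            size = 1
            if needed == 2:
                if operands[0] not in REGS:
                    errors.append(f"invalid register: {operands[0]}")
                    continue
                reg = REGS[operands[0]]
            if needed:
                operand = operands[-1]
                symtab.setdefault(operand, -1)       # address unknown so far
        inter.append((lc, kind, opcode, reg, operand))
        lc += size                                   # LC processing
        if mnemonic == "END":
            break
    return symtab, inter, errors


def pass2(inter, symtab, errors):
    """Steps 8 to 13: write target code for imperative statements only."""
    target = []
    for lc, kind, opcode, reg, operand in inter:
        if kind != "IS":
            continue
        address = 0
        if operand is not None:
            address = symtab[operand]
            if address == -1:
                errors.append(f"undefined symbol: {operand}")
                continue
        target.append(f"{lc} {opcode:02d} {reg} {address:03d}")
    return target


source = ["START 200", "MOVER AREG, A", "LOOP ADD AREG, B", "MOVEM AREG, C",
          "PRINT C", "STOP", "A DC '5'", "B DC '1'", "C DS 1", "END"]
symtab, inter, errors = pass1(source)
target = pass2(inter, symtab, errors)
print(symtab)
print("\n".join(target))
print(errors)
```

In this sample program the symbol A is used at address 200 but defined only at address 205, so its address can be filled in only during the second pass.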

Imperative statement case:

1. If opcode >= 1 && opcode <= 8 (instruction requires a register operand):

a. Set the type as IS, get the opcode, get the register code, and make an entry in the symbol or literal table as the case may be. When a symbol is used as an operand, its address is not yet known, so its address field could be -1. Perform LC processing (LC++). Updating of the symbol table should take error handling into account.

2. If opcode is 00 (STOP):

Set all fields of the intermediate code to 00. LC++.

3. Else, a register operand is not required (READ and PRINT):

Same as case 1, except that the register code is not required, so set it to zero. Here again update the symbol table. LC++.
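The three imperative cases can be sketched as a single function that returns one intermediate code record. The record layout (type, opcode, register code, symbol table index), the register names and the helper `symbol_index` are assumptions for illustration; students should adapt them to the intermediate code variant they choose. LC processing is left to the caller.

```python
REGS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}   # assumed register names


def symbol_index(symbols, name):
    """Return the index of name in symbols, adding it with address -1 if new."""
    for i, entry in enumerate(symbols):
        if entry[0] == name:
            return i
    symbols.append([name, -1])
    return len(symbols) - 1


def imperative_record(opcode, operands, symbols):
    """Build the intermediate code record for one imperative statement."""
    if opcode == 0:                        # STOP: all fields are zero
        return ("IS", 0, 0, 0)
    if 1 <= opcode <= 8:                   # register operand required
        reg = REGS[operands[0]]
        return ("IS", opcode, reg, symbol_index(symbols, operands[1]))
    # READ and PRINT: no register operand, so the register code is zero
    return ("IS", opcode, 0, symbol_index(symbols, operands[0]))


symbols = []
print(imperative_record(4, ["AREG", "A"], symbols))   # MOVER AREG, A
print(imperative_record(9, ["B"], symbols))           # READ B
print(imperative_record(0, [], symbols))              # STOP
```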

On similar lines we can identify the cases for declarative and assembler directive statements based on opcode.

List of hypothetical instructions:

Instruction opcode   Assembly mnemonic   Remarks
00                   STOP                stop execution
01                   ADD                 first operand modified; condition code set
02                   SUB                 first operand modified; condition code set
03                   MULT                first operand modified; condition code set
04                   MOVER               register <- memory
05                   MOVEM               memory <- register
06                   COMP                sets condition code
07                   BC                  branch on condition code
08                   DIV                 analogous to SUB
09                   READ                first operand is not used
10                   PRINT               first operand is not used

Errors: –

Unresolved forward reference (symbol used but not defined): –

This error occurs when a symbol is used in the program but is never defined.

Duplication of Symbol: –

This error occurs when some symbol is declared more than once in the program.

Mnemonic error: –

This error occurs when an instruction is invalid.

Register error: –

This error occurs when a register is invalid.


Operand error: –

This error will occur when there is an error in the operand field,