Implementation of a Two-Pass Assembler
Aim
Implement a two-pass assembler for a hypothetical instruction set. The instruction set should include all types of assembly language statements: imperative, declarative and assembler directive statements. While designing, stress should be given on:
- How efficiently the mnemonic opcode table can be implemented to enable fast retrieval of opcodes.
- Implementation of the symbol table for fast retrieval (concepts from DSF should be applied in the design).
Objective
To learn the basic process of translating assembly language into machine language.
Theory
A language translator bridges the execution gap between a source language and the machine language of a computer system. An assembler is a language translator whose source language is assembly language.
Language processing consists of two phases: the analysis phase and the synthesis phase. Analysis of a source program is governed by three components: lexical rules, syntax rules and semantic rules. Lexical rules govern the formation of valid lexical units in the source language, syntax rules govern the formation of valid statements, and semantic rules associate meaning with valid statements. The synthesis phase is concerned with constructing target language statements that have the same meaning as the source language statements. It consists of memory allocation and code generation.
Analysis of a source program statement may not be immediately followed by synthesis of the equivalent target statements. This is due to forward references and to issues concerning the memory requirements and organization of the language processor (LP).
A forward reference of a program entity is a reference to the entity that precedes its definition in the program. While processing a statement containing a forward reference, the language processor does not possess all relevant information about the referenced entity, which makes it difficult to synthesize the equivalent target statements. The problem can be solved by postponing the generation of target code until more information about the entity is available. Splitting the work this way also reduces the memory requirements of the LP and simplifies its organization. This leads to the multi-pass model of language processing.
Language Processor Pass
A language processor pass is the processing of every statement in a source program, or in its equivalent representation, to perform a language processing function.
Assembly Language statements
There are three types of statements: imperative, declarative and assembler directives. An imperative statement indicates an action to be performed during the execution of the assembled program; each imperative statement usually translates into one machine instruction. Declarative statements such as DS reserve areas of memory and associate names with them, while DC constructs memory words containing constants. Assembler directives instruct the assembler to perform certain actions during assembly of a program; e.g. the START directive indicates that the first word of the target program generated by the assembler should be placed in the memory word whose address is given by its operand.
Functions of the Analysis and Synthesis Phases
Analysis Phase
- Isolate the label, operation code and operand fields of a statement.
- Enter the symbol found in the label field (if any), together with the address of the next available machine word, into the symbol table.
- Validate the mnemonic operation code by looking it up in the mnemonics table.
- Determine the machine storage requirements of the statement by considering the mnemonic operation code and operand fields of the statement.
- Calculate the address of the first machine word following the target code generated for this statement (location counter processing).
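The analysis-phase steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive design. It assumes one statement per line and comma-separated operands. The mnemonic set is taken from the MOT and the instruction list in this document, plus START (which the MOT shown does not list). DS is assumed to reserve as many words as its operand, and DC to occupy one word. The helper names `split_fields` and `next_lc` are hypothetical.

```python
# Statement type of each mnemonic, from the MOT and instruction list in this
# document; START is an assumed addition (it is not in the MOT shown).
MNEMONIC_TYPE = {
    "STOP": "IS", "ADD": "IS", "SUB": "IS", "MULT": "IS", "MOVER": "IS",
    "MOVEM": "IS", "COMP": "IS", "BC": "IS", "DIV": "IS", "READ": "IS",
    "PRINT": "IS", "DC": "DL", "DS": "DL", "START": "AD", "END": "AD",
    "EQU": "AD",
}


def split_fields(line):
    """Isolate the label, mnemonic and operand fields of one statement.

    The first word is treated as a label only if it is not a mnemonic.
    """
    words = line.replace(",", " ").split()
    label = None
    if words and words[0].upper() not in MNEMONIC_TYPE:
        label, words = words[0], words[1:]
    mnemonic = words[0].upper() if words else None
    return label, mnemonic, words[1:]


def next_lc(lc, mnemonic, operands):
    """Return the LC after this statement (location counter processing)."""
    kind = MNEMONIC_TYPE[mnemonic]
    if mnemonic == "START":
        return int(operands[0]) if operands else 0
    if mnemonic == "DS":
        return lc + int(operands[0])  # assumed: reserve n words
    if kind == "IS" or mnemonic == "DC":
        return lc + 1  # one machine word
    return lc  # other directives occupy no memory
```

For example, processing `START 200` sets the LC to 200. A following `LOOP MOVER AREG, A` then yields the label `LOOP`, which is entered at address 200, and advances the LC to 201.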
Synthesis Phase
- Obtain the machine operation code corresponding to the mnemonic operation code by searching the mnemonic table.
- Obtain the address of the operand from the symbol table.
- Synthesize the machine instruction or the machine form of the constant as the case may be.
Design of a Two Pass Assembler
Tasks performed by the passes of two-pass assembler are as follows:
Pass I
- Separate the symbol, mnemonic opcode and operand fields.
- Determine the storage required for every assembly language statement and update the location counter.
- Build the symbol table and the literal table.
- Construct the intermediate code for every assembly language statement.
Pass II
Synthesize the target code by processing the intermediate code generated during Pass I.
Data structures required for Pass I
- Source file containing assembly program.
- MOT: A table of mnemonic op-codes and related information.
It has the following fields
Index | Mnemonic | TYPE | OP-Code | Length | Link |
0 | ADD | IS | 01 | 01 | -1 |
1 | BC | IS | 07 | 01 | -1 |
2 | COMP | IS | 06 | 01 | -1 |
3 | DIV | IS | 08 | 01 | 5 |
4 | EQU | AD | 03 | – | 7 |
5 | DC | DL | 01 | – | 6 |
6 | DS | DL | 02 | – | -1 |
7 | END | AD | 05 | – | -1 |
Mnemonic: Such as ADD, END, DC
TYPE: IS for imperative, DL for declarative and AD for assembler directive
OP-code: Operation code indicating the operation to be performed
Length: Length of the instruction, required for location counter processing
Link: Index of the next mnemonic in the same hash chain (-1 if there is none)
Hash table implementation of the MOT to minimize the search time for an instruction:
The hash function used is (ASCII value of the first letter of the mnemonic) - 65. A lookup therefore starts directly at the slot for the mnemonic's first letter and follows only that slot's chain, instead of scanning the whole table. For example, an instruction starting with 'A' will be found at index 0, one starting with 'B' at index 1, and so on. If more than one instruction starts with the same letter, the additional instruction is stored at an empty location, and its index is stored in the link field of the previous entry in the chain. Thus instructions starting with 'D' are stored at index locations 3, 5 and 6, and those starting with 'E' at 4 and 7.
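A minimal Python sketch of this hashing scheme follows. It assumes a fixed-size table in which a colliding entry goes into the first empty slot (scanning from index 0) and is appended to the chain that starts at its home slot. Inserting the eight mnemonics of the table above in index order reproduces the indices and link values shown. The names `MotEntry`, `insert` and `lookup` are hypothetical. With only eight slots, mnemonics whose first letter lies beyond 'H' (such as MOVER) have no home slot. A complete MOT would need at least 26 home slots plus room for overflow entries.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MotEntry:
    mnemonic: str
    stmt_type: str            # IS, DL or AD
    opcode: str
    length: Optional[int]     # None where the table shows "–"
    link: int = -1            # index of the next entry in the same chain


def home_slot(mnemonic: str) -> int:
    # Hash function from the text: ASCII value of the first letter - 65
    return ord(mnemonic[0].upper()) - 65


def insert(table: List[Optional[MotEntry]], entry: MotEntry) -> int:
    """Store entry and return its index (IndexError if no home slot exists)."""
    home = home_slot(entry.mnemonic)
    if table[home] is None:
        table[home] = entry
        return home
    free = table.index(None)  # first empty location; ValueError if full
    table[free] = entry
    slot = home
    while table[slot].link != -1:  # walk to the end of the chain
        slot = table[slot].link
    table[slot].link = free
    return free


def lookup(table: List[Optional[MotEntry]], mnemonic: str) -> Optional[MotEntry]:
    """Follow the chain from the home slot; return None if not found."""
    slot = home_slot(mnemonic)
    if not 0 <= slot < len(table):
        return None
    while slot != -1 and table[slot] is not None:
        if table[slot].mnemonic == mnemonic:
            return table[slot]
        slot = table[slot].link
    return None


# Rebuild the example MOT by inserting its entries in index order.
mot: List[Optional[MotEntry]] = [None] * 8
for e in [MotEntry("ADD", "IS", "01", 1), MotEntry("BC", "IS", "07", 1),
          MotEntry("COMP", "IS", "06", 1), MotEntry("DIV", "IS", "08", 1),
          MotEntry("EQU", "AD", "03", None), MotEntry("DC", "DL", "01", None),
          MotEntry("DS", "DL", "02", None), MotEntry("END", "AD", "05", None)]:
    insert(mot, e)
```

Looking up DS starts at slot 3 (DIV), follows the link to 5 (DC) and then to 6, where DS is found. Only three comparisons are needed, regardless of the table size.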
SYMTAB: The symbol table.
Its fields are Symbol name and Address (LC value). Initialize every address field to -1. When a symbol appears in the label field, replace its address with the current LC. A symbol that is used but never defined therefore keeps the address -1, which is used for error detection.
Symbol | Address |
Loop | 204 |
Next | 214 |
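A minimal sketch of this scheme in Python follows. It uses a dict (which is itself a hash table) as the store, in the spirit of the DSF remark in the Aim. The class `SymbolTable`, its methods, and the duplicate definition at 210 in the usage lines are hypothetical.

```python
UNDEFINED = -1  # address of a symbol that has been used but not yet defined


class SymbolTable:
    """Symbol table backed by a Python dict (a hash table)."""

    def __init__(self):
        self.address = {}   # symbol name -> LC value, or UNDEFINED
        self.errors = []

    def use(self, name):
        """Record a symbol appearing in an operand field."""
        self.address.setdefault(name, UNDEFINED)

    def define(self, name, lc):
        """Record a symbol appearing in the label field at location lc."""
        if self.address.get(name, UNDEFINED) != UNDEFINED:
            self.errors.append(f"Duplicate symbol: {name}")
        else:
            self.address[name] = lc

    def undefined(self):
        """Symbols still holding -1 at the end of Pass I."""
        return [n for n, a in self.address.items() if a == UNDEFINED]


symtab = SymbolTable()
symtab.use("Next")           # forward reference: address stays -1 for now
symtab.define("Loop", 204)
symtab.define("Next", 214)   # forward reference resolved
symtab.define("Loop", 210)   # hypothetical second definition
symtab.use("Last")           # used but never defined
```

After these calls, `Loop` and `Next` hold 204 and 214 as in the table above. The second definition of `Loop` is reported as a duplicate, and `Last` is reported as undefined.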
LITTAB: The literal table.
Literal | Address |
='5' | |
='1' | |
='1' | |
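The text does not say when literal addresses are filled in. The following Python sketch assumes the usual textbook approach: each literal is entered with an unknown address, and one memory word is allocated per pending literal when END is processed. The table above lists ='1' twice, so the sketch also allows allocation to happen more than once, with each call closing one pool of literals. The class `LiteralTable` and the start address 211 are hypothetical.

```python
class LiteralTable:
    """LITTAB: literal -> address, filled in when literals are allocated."""

    def __init__(self):
        self.entries = []      # list of [literal, address or None]
        self.pool_start = 0    # first entry not yet given an address

    def add(self, literal):
        """Enter a literal used as an operand; return its LITTAB index."""
        for i in range(self.pool_start, len(self.entries)):
            if self.entries[i][0] == literal:
                return i       # already pending in the current pool
        self.entries.append([literal, None])
        return len(self.entries) - 1

    def allocate(self, lc):
        """Give each pending literal one word starting at lc; return new LC."""
        for entry in self.entries[self.pool_start:]:
            entry[1] = lc
            lc += 1
        self.pool_start = len(self.entries)
        return lc


littab = LiteralTable()
littab.add("='5'")
littab.add("='1'")
littab.add("='1'")             # reuses the pending entry
lc = littab.allocate(211)      # e.g. at END; 211 is a made-up LC value
littab.add("='1'")             # starts a new pool
lc = littab.allocate(lc)
```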
Intermediate form used: Variant I / Variant II
Students should state which variant they have used.
Data structures used by Pass II
- OPTAB: A table of mnemonic opcodes and related information.
- SYMTAB: The symbol table
- LITTAB: A table of literals used in the program
- Intermediate code generated by Pass I
- Output file containing Target code / error listing.
Algorithm
1. Open the source file in input mode.
2. If the end of the source file is reached, go to step 8.
3. Read the next line of the source program.
4. Separate the line into words; these can be stored in an array of strings.
5. Search for the first word in the mnemonic opcode table. If it is not present, it is a label: add it to the symbol table with the current LC, then search for the second word in the mnemonic opcode table.
6. If the instruction is found, process it according to its type, generate the intermediate code and write it to the intermediate code file:
   case 1: imperative statement
   case 2: declarative statement
   case 3: assembler directive
7. Go to step 2.
8. Close the source file and open the intermediate code file.
9. If the end of the intermediate code file is reached, go to step 13.
10. Read the next line from the intermediate code file.
11. For an imperative statement only, write the opcode, the register code and the memory address (fetched from the literal table or the symbol table, as the case may be) to the target file.
12. Go to step 9.
13. Close all files.
14. Display the symbol table, the literal table and the target file.
Imperative statement case
a. If 1 <= opcode <= 8 (the instruction requires a register operand, or a condition code in the case of BC): set the type to IS, get the opcode and the register code, and make an entry in the symbol table or the literal table as the case may be. If the operand is a symbol that has not been defined yet, its address is not known, so it is entered with address -1. Perform LC processing (LC++). Updating the symbol table should include error handling.
b. If the opcode is 00 (STOP): set all fields of the intermediate code to 00. LC++.
c. Otherwise, no register operand is required (READ and PRINT): same as case a, except that the register code is not required and is set to zero. Here again, update the symbol table. LC++.
The cases for declarative and assembler directive statements can be identified on similar lines, based on the opcode.
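The algorithm and the imperative-statement cases above can be put together in the following Python sketch, which assembles a small program in two passes. It is illustrative only and rests on several assumptions not stated in the text:
- The register codes are AREG=1 ... DREG=4 and the BC condition codes are LT=1 ... ANY=6.
- The intermediate code is a tuple form kept in memory rather than written to a file, and it does not follow Variant I or Variant II exactly.
- Literals get one word each when END is processed.
- Each target line has the form `LC) opcode register address`.

Error handling is omitted. The names `pass_one` and `pass_two` are hypothetical.

```python
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
           "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
# Assumed register and condition-code numbering (not given in the text).
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
CONDITIONS = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}
DIRECTIVES = {"START", "END", "DS", "DC"}


def pass_one(lines):
    """Build SYMTAB, LITTAB and a list of (LC, opcode, reg, operand) tuples."""
    symtab, littab, ic, lc = {}, [], [], 0
    for line in lines:
        words = line.replace(",", " ").split()
        if not words:
            continue
        if words[0] not in OPCODES and words[0] not in DIRECTIVES:
            symtab[words.pop(0)] = lc            # label field: define symbol
        op, args = words[0], words[1:]
        if op == "START":
            lc = int(args[0])
        elif op == "DS":
            lc += int(args[0])
        elif op == "DC":
            lc += 1
        elif op == "END":
            for lit in littab:                   # allocate pending literals
                lit[1], lc = lc, lc + 1
        else:
            code = OPCODES[op]                   # KeyError = mnemonic error
            if code == 0:                        # case b: STOP, all fields 0
                reg, operand = 0, None
            elif code <= 8:                      # case a: register/condition
                reg = REGISTERS.get(args[0]) or CONDITIONS[args[0]]
                operand = args[1]
            else:                                # case c: READ/PRINT
                reg, operand = 0, args[0]
            if operand and operand.startswith("="):
                littab.append([operand, None])
                operand = ("L", len(littab) - 1)
            elif operand:
                symtab.setdefault(operand, -1)   # used, maybe not yet defined
                operand = ("S", operand)
            ic.append((lc, code, reg, operand))
            lc += 1                              # LC++
    return symtab, littab, ic


def pass_two(symtab, littab, ic):
    """Synthesize one target line per imperative statement."""
    target = []
    for lc, code, reg, operand in ic:
        if operand is None:
            address = 0
        elif operand[0] == "L":
            address = littab[operand[1]][1]
        else:
            address = symtab[operand[1]]         # -1 signals an undefined symbol
        target.append(f"{lc}) {code:02d} {reg} {address:03d}")
    return target


program = """START 200
READ A
LOOP MOVER AREG, A
ADD AREG, ='1'
MOVEM AREG, A
COMP AREG, ='5'
BC LT, LOOP
PRINT A
STOP
A DS 1
END""".splitlines()

symtab, littab, ic = pass_one(program)
target = pass_two(symtab, littab, ic)
print("\n".join(target))
```

In the sample program, `READ A` refers to `A` before it is defined. Pass I enters `A` with address -1 and later replaces it with 208, which lets Pass II emit `200) 09 0 208`.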
List of hypothetical instructions:
Instruction opcode | Assembly mnemonic | Remarks |
00 | STOP | Stop execution |
01 | ADD | First operand modified, condition code set |
02 | SUB | First operand modified, condition code set |
03 | MULT | First operand modified, condition code set |
04 | MOVER | Register ← memory |
05 | MOVEM | Memory ← register |
06 | COMP | Sets condition code |
07 | BC | Branch on condition code |
08 | DIV | Analogous to SUB |
09 | READ | First operand is not used |
10 | PRINT | First operand is not used |
Errors
Undefined symbol (symbol used but not defined)
This error occurs when a symbol is used but never defined in the program, i.e. a forward reference that is never resolved.
Duplicate symbol
This error occurs when a symbol is defined more than once in the program.
Mnemonic error
This error occurs when an instruction mnemonic is invalid, i.e. not found in the mnemonic opcode table.
Register error
This error occurs when an invalid register is specified.
Operand error
This error occurs when there is an error in the operand field.
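The five error classes can be detected in a single scan over the source, as in the following Python sketch. It rests on these assumptions:
- The mnemonic set is the instruction list plus START, END, DS and DC; EQU is left out for brevity.
- The register names are AREG to DREG.
- The condition-code operand of BC is not checked.
- A first word is treated as a label only when the second word is a mnemonic.

The function `find_errors` and its message texts are hypothetical.

```python
# Number of operands each mnemonic expects (assumed; EQU omitted).
OPERAND_COUNT = {"STOP": 0, "ADD": 2, "SUB": 2, "MULT": 2, "MOVER": 2,
                 "MOVEM": 2, "COMP": 2, "BC": 2, "DIV": 2, "READ": 1,
                 "PRINT": 1, "DS": 1, "DC": 1, "START": 1, "END": 0}
NEEDS_REGISTER = {"ADD", "SUB", "MULT", "MOVER", "MOVEM", "COMP", "DIV"}
MEMORY_OPERAND = NEEDS_REGISTER | {"BC", "READ", "PRINT"}
REGISTERS = {"AREG", "BREG", "CREG", "DREG"}  # assumed register names


def find_errors(lines):
    """Return a list of error messages for the given source lines."""
    defined, used, errors = set(), set(), []
    for n, line in enumerate(lines, start=1):
        words = line.replace(",", " ").split()
        if not words:
            continue
        # First word is a label only if the second word is a mnemonic.
        if (words[0] not in OPERAND_COUNT and len(words) > 1
                and words[1] in OPERAND_COUNT):
            label = words.pop(0)
            if label in defined:
                errors.append(f"line {n}: duplicate symbol {label}")
            defined.add(label)
        op, args = words[0], words[1:]
        if op not in OPERAND_COUNT:
            errors.append(f"line {n}: mnemonic error {op}")
            continue
        if len(args) != OPERAND_COUNT[op] or (op == "DS" and not args[0].isdigit()):
            errors.append(f"line {n}: operand error in {op}")
            continue
        if op in NEEDS_REGISTER and args[0] not in REGISTERS:
            errors.append(f"line {n}: register error {args[0]}")
        if op in MEMORY_OPERAND and not args[-1].startswith("="):
            used.add(args[-1])               # literals are not symbols
    # Symbols still undefined after the scan (unresolved forward references)
    errors.extend(f"undefined symbol {s}" for s in sorted(used - defined))
    return errors


source = ["START 200", "LOOP MOVER AREG, X", "LOOP ADD BREG, ='1'",
          "SUB ZREG, A", "MOVR AREG, A", "PRINT", "A DS 1", "END"]
for message in find_errors(source):
    print(message)
```

The sample source contains one error of each kind. `A` is used before its definition on line 7 but is not reported, because forward references are legal once resolved.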