Aim: Implementation of a two-pass assembler for a hypothetical instruction set.
The instruction set should include all types of assembly language statements:
imperative, declarative and assembler directive. While designing, stress
should be given to
a) How efficiently the mnemonic opcode table can be implemented so as
to enable faster retrieval of opcodes.
b) Implementation of the symbol table for faster retrieval.
(Concepts from DSF should be applied in the design.)
· To learn the basic translation process from assembly language to machine language.
A language translator bridges the execution gap between a source language and the machine language of a computer system. An assembler is a language translator whose source language is assembly language.
Language processing activity consists of two phases: the analysis phase and the synthesis phase. Analysis of a source program is based on three components: lexical rules, syntax rules and semantic rules. Lexical rules govern the formation of valid lexical units (tokens) in the source language. Syntax rules govern the formation of valid statements in the source language. Semantic rules associate meaning with valid statements of the language. The synthesis phase is concerned with the construction of target language statements that have the same meaning as the source language statements. It consists of memory allocation and code generation.
Analysis of source program statements may not be immediately followed by synthesis of the equivalent target statements. This is due to forward references and to issues concerning the memory requirements and organization of the Language Processor (LP).
A forward reference to a program entity is a reference that precedes the entity's definition in the program. While processing a statement containing a forward reference, the language processor does not possess all the relevant information concerning the referenced entity. This creates difficulties in synthesizing the equivalent target statements. The problem can be solved by postponing the generation of target code until more information concerning the entity is available. This also reduces the memory requirements of the LP and simplifies its organization. It leads to the multi-pass model of language processing.
Language Processor Pass: –
A pass is the processing of every statement in a source program, or in its equivalent representation, to perform a language-processing function.
Assembly Language statements: –
There are three types of statements: imperative, declarative and assembler directives. An imperative statement indicates an action to be performed during the execution of the assembled program. Each imperative statement usually translates into one machine instruction. A declarative statement reserves memory or defines constants: DS reserves areas of memory and associates names with them, while DC constructs memory words containing constants. Assembler directives instruct the assembler to perform certain actions during the assembly of a program; e.g. the START directive indicates that the first word of the target program generated by the assembler should be placed at the memory word whose address is given as the directive's operand.
Functions of the Analysis and Synthesis Phases:
Analysis Phase: –
Isolate the label, operation code and operand fields of a statement.
Enter the symbol found in the label field (if any) and the address of the next available machine word into the symbol table.
Validate the mnemonic operation code by looking it up in the mnemonics table.
Determine the machine storage requirements of the statement by considering the mnemonic operation code and operand fields of the statement.
Calculate the address of the first machine word following the target code generated for this statement (location counter processing).
Synthesis Phase: –
Obtain the machine operation code corresponding to the mnemonic operation code by searching the mnemonics table.
Obtain the address of the operand from the symbol table.
Synthesize the machine instruction or the machine form of the constant, as the case may be.
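The first analysis task, isolating the label, operation code and operand fields, can be sketched as below. This is only an illustration under stated assumptions: fields are separated by whitespace, operands by commas, and the first token is taken to be a label when it is not a known mnemonic. The names `MNEMONICS` and `split_fields`, and the register name `AREG`, are hypothetical and not prescribed by the assignment.

```python
# Hypothetical set of mnemonics; a real assembler would consult the MOT.
MNEMONICS = {"STOP", "ADD", "SUB", "MULT", "MOVER", "MOVEM", "COMP", "BC",
             "DIV", "READ", "PRINT", "START", "END", "DS", "DC"}

def split_fields(line):
    """Return (label, opcode, operands) for one source statement."""
    tokens = line.replace(",", " ").split()
    if not tokens:
        return None, None, []          # blank line: nothing to isolate
    label = None
    # Assumption: the first token is a label unless it is a mnemonic.
    if tokens[0].upper() not in MNEMONICS:
        label = tokens.pop(0)
    if not tokens:
        raise ValueError("statement has a label but no operation code")
    opcode = tokens.pop(0).upper()
    if opcode not in MNEMONICS:
        raise ValueError(f"invalid mnemonic: {opcode}")
    return label, opcode, tokens       # remaining tokens are the operands

print(split_fields("LOOP MOVER AREG, A"))
print(split_fields("A DS 1"))
```

Validation of the mnemonic is done here with a plain set lookup; in the assembler proper this check would go through the mnemonics table.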
Design of a Two Pass Assembler: –
The tasks performed by the passes of a two-pass assembler are as follows:
Pass I: –
Separate the symbol, mnemonic opcode and operand fields.
Determine the storage required for every assembly language statement and update the location counter.
Build the symbol table and the literal table.
Construct the intermediate code for every assembly language statement.
Pass II: –
Synthesize the target code by processing the intermediate code generated during Pass I.
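A minimal sketch of the location counter and symbol table work done in Pass I is given below. It assumes, for illustration only, that statements arrive already split into (label, opcode, operands) tuples, that every imperative statement and every DC occupies one word, and that DS n reserves n words. The function name `pass_one` and the sample program are hypothetical.

```python
def pass_one(statements):
    """statements: list of (label, opcode, operands) tuples.
    Returns (symtab, listing); listing pairs each statement with its LC."""
    lc = 0
    symtab = {}
    listing = []
    for label, opcode, operands in statements:
        if opcode == "START":
            lc = int(operands[0]) if operands else 0
            continue
        if opcode == "END":
            break
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = lc          # label gets the current LC value
        listing.append((lc, opcode, operands))
        # Assumed sizes: DS n reserves n words, everything else one word.
        lc += int(operands[0]) if opcode == "DS" else 1
    return symtab, listing

program = [
    (None, "START", ["100"]),
    (None, "READ", ["N"]),              # N is a forward reference here
    (None, "MOVER", ["AREG", "N"]),
    (None, "STOP", []),
    ("N", "DS", ["2"]),
    ("ONE", "DC", ["1"]),
    (None, "END", []),
]
symtab, listing = pass_one(program)
print(symtab)
```

The address of N becomes known only when the DS statement is reached, which is why code generation for the READ and MOVER statements is left to Pass II.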
Data structures required for pass I:
1. Source file containing assembly program.
2. MOT: A table of mnemonic opcodes and related information.
It has the following fields:
Mnemonic : such as ADD, END, DC
Type : IS for imperative, DL for declarative and AD for assembler directive
Opcode : operation code indicating the operation to be performed
Length : length of the instruction, required for location counter processing
The MOT is implemented as a hash table to minimize the time required to search for an instruction.
The hash function used is (ASCII value of the first letter of the mnemonic) - 65. This helps in retrieving the opcode and other related information in minimum time. For example, an instruction starting with the letter 'A' will be found at index location 0, one starting with 'B' at index 1, and so on. If more than one instruction starts with the same letter, the additional instruction is stored at an empty location and the index of that location is stored in the link field. Thus instructions starting with 'D' will be stored at index locations 3, 5 and 6, those starting with 'E' at 4 and 7, and the process continues.
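The hashing scheme described above can be sketched as follows. This is one possible reading, not the prescribed implementation: it assumes a table of 26 slots, upper-case mnemonics, and a forward scan from the end of the chain to find an empty slot. The class name `MOT`, the mnemonic EQU, and the type, opcode and length values given for the declaratives and directives are placeholders chosen for illustration.

```python
TABLE_SIZE = 26  # assumption: one home slot per capital letter A-Z

class MOT:
    def __init__(self):
        # Each slot is None or [mnemonic, type, opcode, length, link].
        self.slots = [None] * TABLE_SIZE

    @staticmethod
    def home(mnemonic):
        # Hash function from the text: ASCII value of first letter - 65.
        return ord(mnemonic[0]) - 65

    def insert(self, mnemonic, kind, opcode, length):
        """Store an entry and return the index where it was placed."""
        i = self.home(mnemonic)
        if self.slots[i] is None:
            self.slots[i] = [mnemonic, kind, opcode, length, -1]
            return i
        while self.slots[i][4] != -1:      # walk to the end of the chain
            i = self.slots[i][4]
        # Assumed probing order: scan forward (wrapping) for an empty slot.
        for step in range(1, TABLE_SIZE):
            j = (i + step) % TABLE_SIZE
            if self.slots[j] is None:
                self.slots[j] = [mnemonic, kind, opcode, length, -1]
                self.slots[i][4] = j       # link the chain's last entry to it
                return j
        raise RuntimeError("mnemonic table is full")

    def lookup(self, mnemonic):
        """Return [mnemonic, type, opcode, length, link], or None if absent."""
        i = self.home(mnemonic)
        while i != -1 and self.slots[i] is not None:
            if self.slots[i][0] == mnemonic:
                return self.slots[i]
            i = self.slots[i][4]           # follow the link field
        return None

mot = MOT()
# Hypothetical entries; directive and declarative codes are placeholders.
entries = [("DIV", "IS", 8, 1), ("END", "AD", 2, 0), ("DC", "DL", 1, 1),
           ("DS", "DL", 2, 0), ("EQU", "AD", 4, 0)]
placed = {m: mot.insert(m, k, c, n) for m, k, c, n in entries}
print(placed)
print(mot.lookup("DS"))
```

With this insertion order the 'D' mnemonics land at indices 3, 5 and 6 and the 'E' mnemonics at 4 and 7, matching the example in the text. A lookup starts at the home slot and follows link fields, so it touches only the entries on one chain rather than the whole table.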
3. SYMTAB: The symbol table.
Fields are symbol name and address (LC value). Initialize the address field of every entry to -1. When a symbol appears in the label field, replace its address value with the current LC. A symbol that is used but not defined will still have the address value -1, which is used for error detection.
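The -1 convention for the symbol table can be sketched as below. A Python dictionary is used because it is itself a hash table, which suits the fast-retrieval requirement; the class name `SymbolTable` and its method names are hypothetical.

```python
class SymbolTable:
    def __init__(self):
        self.addr = {}                 # symbol name -> address, -1 if undefined

    def reference(self, name):
        """Symbol used as an operand: enter it with address -1 if it is new."""
        self.addr.setdefault(name, -1)

    def define(self, name, lc):
        """Symbol appears in the label field: record the current LC."""
        if self.addr.get(name, -1) != -1:
            raise ValueError(f"duplicate symbol: {name}")
        self.addr[name] = lc

    def undefined(self):
        """Symbols still at -1 after Pass I were used but never defined."""
        return [name for name, a in self.addr.items() if a == -1]

st = SymbolTable()
st.reference("N")      # forward reference: entered with address -1
st.reference("X")      # never defined in this example
st.define("N", 103)    # label field: address replaced by the LC
print(st.undefined())
```

Both error checks fall out of the same field: a second definition finds an address other than -1, and an undefined symbol is any entry left at -1 when Pass I ends.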
4. LITTAB and POOLTAB: The literal table stores the literals used in the program, and
POOLTAB stores the pointers to the literals in the current literal pool.
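One way to organize LITTAB and POOLTAB is sketched below. It assumes that each POOLTAB entry is the LITTAB index at which a pool starts, that each literal occupies one word, and that a pool is closed when the assembler reaches a point where literals are to be placed (typically an LTORG or END statement, neither of which is spelled out in this assignment). The class and method names are hypothetical.

```python
class LiteralTable:
    def __init__(self):
        self.littab = []       # entries are [literal, address]; -1 = unassigned
        self.pooltab = [0]     # LITTAB index where each literal pool starts

    def add(self, literal):
        """Record a literal in the current pool; return its LITTAB index."""
        start = self.pooltab[-1]
        for i in range(start, len(self.littab)):
            if self.littab[i][0] == literal:
                return i       # already present in the current pool
        self.littab.append([literal, -1])
        return len(self.littab) - 1

    def allocate_pool(self, lc):
        """Assign addresses to the current pool, then open a new pool.
        Returns the updated location counter."""
        start = self.pooltab[-1]
        for entry in self.littab[start:]:
            entry[1] = lc
            lc += 1            # assumption: each literal occupies one word
        if len(self.littab) > start:
            self.pooltab.append(len(self.littab))
        return lc

lt = LiteralTable()
lt.add("='5'")
lt.add("='1'")
lt.add("='5'")                 # repeated literal reuses index 0
lc = lt.allocate_pool(200)     # first pool placed at 200 and 201
lt.add("='1'")                 # same literal, new pool, new entry
lc = lt.allocate_pool(210)
print(lt.littab, lt.pooltab)
```

Searching only from the start of the current pool means a literal that recurs after the pool is closed gets a fresh entry and a fresh address.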
5. Intermediate code: Variant 1 or Variant 2 form.
Students should state which variant they have used.
Data structures required for pass II:
1. OPTAB: A table of mnemonic opcodes and related information.
2. SYMTAB: The symbol table
3. LITTAB: A table of literals used in the program
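How Pass II might combine these tables is sketched below. The intermediate code layout is an assumption made for this example only: each entry is (LC, (type, code), register code, operand), where the operand is ("S", symbol name), ("L", LITTAB index), ("C", constant) or None. The declarative codes 1 for DC and 2 for DS, the output format and the function name `pass_two` are likewise hypothetical.

```python
def pass_two(intermediate, symtab, littab):
    """Translate intermediate code into 'LC opcode register address' lines."""
    target = []
    for lc, (kind, code), reg, operand in intermediate:
        if kind == "AD":
            continue                        # directives emit no code
        if kind == "DL":
            if code == 1:                   # assumed: DL 1 = DC, emit constant
                target.append(f"{lc} 00 0 {operand[1]:03d}")
            continue                        # assumed: DL 2 = DS, storage only
        if operand is None:
            address = 0
        elif operand[0] == "S":             # symbol: look up its address
            address = symtab.get(operand[1], -1)
            if address == -1:
                raise ValueError(f"undefined symbol: {operand[1]}")
        else:                               # "L": index into the literal table
            address = littab[operand[1]][1]
        target.append(f"{lc} {code:02d} {reg} {address:03d}")
    return target

ic = [
    (100, ("IS", 9), 0, ("S", "N")),        # READ N
    (101, ("IS", 4), 1, ("L", 0)),          # MOVER with a literal operand
    (102, ("IS", 0), 0, None),              # STOP
    (103, ("DL", 2), 0, ("C", 2)),          # N DS 2
    (105, ("DL", 1), 0, ("C", 1)),          # DC 1
]
code = pass_two(ic, {"N": 103}, [["='5'", 106]])
for line in code:
    print(line)
```

Because every symbol address was fixed in Pass I, each operand lookup here succeeds or is reported as an undefined symbol; no statement has to be revisited.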
Processing of an imperative statement in pass I is divided into cases based on the opcode:
a. Register operand required (case 1): Set the type to IS, get the opcode, get the register code, and make an entry in the symbol table or literal table as the case may be. When a symbol is used as an operand, its LC value may not be known yet, so its address field can be -1. Perform LC processing (LC++). Updates to the symbol table should include error handling.
b. Opcode is 00 (STOP): Set all fields of the intermediate code to 00. LC++.
c. Register operand not required (READ and PRINT): Same as case 1, except that the register code is not required, so set it to zero. Here again, update the symbol table. LC++.
On similar lines, the cases for declarative and assembler directive statements can be identified based on the opcode.
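The cases for imperative statements can be sketched as a single function. Several details are assumptions made for illustration: the register names and codes in `REGISTERS`, the tuple form of the intermediate code, and the omission of literals. BC is left out on the assumption that its first operand names a condition rather than a register, which this sketch does not model.

```python
# Opcodes from the instruction list; BC (07) is omitted from this sketch.
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
           "COMP": 6, "DIV": 8, "READ": 9, "PRINT": 10}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}  # assumed names/codes

def imperative_ic(mnemonic, operands, symtab, lc):
    """Return (intermediate code tuple, updated LC) for one statement."""
    if mnemonic not in OPCODES:
        raise ValueError(f"invalid instruction: {mnemonic}")
    code = OPCODES[mnemonic]
    if code == 0:                              # STOP: all fields zero
        return ("IS", 0, 0, 0), lc + 1
    if mnemonic in ("READ", "PRINT"):          # no register operand
        if len(operands) != 1:
            raise ValueError(f"operand error in {mnemonic}")
        reg, symbol = 0, operands[0]
    else:                                      # register and memory operand
        if len(operands) != 2:
            raise ValueError(f"operand error in {mnemonic}")
        if operands[0] not in REGISTERS:
            raise ValueError(f"invalid register: {operands[0]}")
        reg, symbol = REGISTERS[operands[0]], operands[1]
    symtab.setdefault(symbol, -1)              # address unknown so far
    return ("IS", code, reg, symbol), lc + 1

symtab = {}
ic1, lc = imperative_ic("MOVER", ["AREG", "N"], symtab, 100)
ic2, lc = imperative_ic("READ", ["N"], symtab, lc)
ic3, lc = imperative_ic("STOP", [], symtab, lc)
print(ic1, ic2, ic3, lc, symtab)
```

The symbol is entered with -1 only if it is not already present, so a label defined earlier keeps its address.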
List of hypothetical instructions:
Instruction opcode   Assembly mnemonic   Remarks
00                   STOP                stop execution
01                   ADD                 first operand modified, condition code set
02                   SUB                 first operand modified, condition code set
03                   MULT                first operand modified, condition code set
04                   MOVER               register ← memory
05                   MOVEM               memory ← register
06                   COMP                sets condition code
07                   BC                  branch on condition code
08                   DIV                 analogous to SUB
09                   READ                first operand is not used
10                   PRINT               first operand is not used
Forward reference (symbol used but not defined): –
This error occurs when a symbol is used but is not defined anywhere in the program.
Duplication of symbol: –
This error occurs when a symbol is declared more than once in the program.
Invalid instruction: –
This error occurs when a statement contains an invalid instruction.
Register error: –
This error occurs when a statement contains an invalid register.
Operand error: –
This error will occur when there is an error in the operand field,