
Implementation of Calculator using LEX and YACC

Aim: Implementation of Calculator using LEX and YACC.

Objective: To study the process of lexical analysis and parsing.

Theory:

During the first phase the compiler reads the input and converts strings in the source into tokens. Using regular expressions, we specify patterns to lex so that it can generate code to scan and match strings in the input. Each pattern specified in the input to lex has an associated action. Typically an action returns a token that represents the matched string for subsequent use by the parser. Initially we will simply print the matched string rather than return a token value.

The following represents a simple pattern, composed of a regular expression, that scans for identifiers. Lex will read this pattern and produce C code for a lexical analyzer that scans for identifiers.

letter(letter|digit)*

This pattern matches a string of characters that begins with a single letter followed by zero or more letters or digits. This example nicely illustrates operations allowed in regular expressions:

repetition, expressed by the “*” operator

alternation, expressed by the “|” operator
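
As a concrete sketch (assuming a POSIX lex or flex toolchain), the identifier pattern above can be written as a complete lex specification that simply prints each identifier it matches:

```lex
%{
#include <stdio.h>
%}
letter    [a-zA-Z]
digit     [0-9]
%%
{letter}({letter}|{digit})*   { printf("identifier: %s\n", yytext); }
.|\n                          ;   /* ignore everything else for now */
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }
```

Running lex on this file produces lex.yy.c, which can be compiled with cc into a standalone scanner.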

History of Lex & Yacc

• Lex & Yacc were developed at Bell Laboratories in the 1970s

• Yacc was developed first, by Stephen C. Johnson

• Lex was designed by Mike E. Lesk and Eric Schmidt to work with Yacc

• Both are standard UNIX utilities

Lex & Yacc

• Programming tools for writers of compilers and interpreters

• Also interesting for non-compiler writers

• Any application that looks for patterns in its input, or that has an input/command language, is a candidate for Lex/Yacc

• lex and yacc help you write programs that transform structured input

– lex generates a lexical analyzer

• divides a stream of input characters into meaningful units (lexemes), identifies them as tokens, and may pass the tokens to a parser generated by yacc

• lex specifications are regular expressions

– yacc generates a parser

• may do syntax checking only, or may create an interpreter

• yacc specifications are grammar rules

Lex:

• The Unix program "lex" is a "Lexical Analyzer Generator"

– Takes a high-level description of lexical tokens and actions

– Generates C subroutines that implement the lexical analysis

• The name of the resulting subroutine is "yylex"

• Generally, yylex is linked with other routines, such as the parsing procedures generated by YACC

• Organization of a Lex program:

definitions

%%

translation rules

%%

user subroutines

• Translation rules consist of a sequence of patterns associated with actions

• Lex reads the file and generates a scanner

– Repeatedly locates the "longest prefix of the input that is matched by one or more of the patterns"

– When a match is found, lex executes the associated action

– In the case of a tie:

• use whichever pattern matches the most characters

• if both match the same number of characters, the first rule wins
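
These tie-breaking rules are what let a keyword rule coexist with a general identifier rule; a small illustrative fragment (rule order and names are chosen for the example):

```lex
%%
"if"                  { printf("keyword IF\n"); }
[a-zA-Z][a-zA-Z0-9]*  { printf("identifier %s\n", yytext); }
%%
```

On the input "if" both patterns match two characters, so the first rule wins; on "iffy" the identifier pattern matches four characters and wins by the longest-match rule.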

Regular Expressions in Lex

• References to a single character

– x the character "x"

– "x" an "x", even if x is an operator

– \x an "x", even if x is an operator

– (x) an x

– [xy] the character x or y

– [x-z] the character x, y or z

– [^x] any character except x

– . any character except newline

• Repetitions and options

– x? an optional x (zero or one instance)

– x* 0, 1, 2, … instances of x

– x+ 1, 2, 3, … instances of x
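
Putting these operators to use, the token rules of the calculator's lexer might look as follows (a sketch; the NUMBER token and the y.tab.h header assume a yacc grammar compiled with the -d option, and are not defined here):

```lex
%{
#include <stdlib.h>
#include "y.tab.h"   /* token definitions generated by yacc -d */
%}
%%
[0-9]+    { yylval = atoi(yytext); return NUMBER; }
[ \t]     ;                       /* skip blanks and tabs */
\n        { return '\n'; }
.         { return yytext[0]; }   /* +, -, *, /, ( and ) pass through */
%%
int yywrap(void) { return 1; }
```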

Yacc Introduction

• Yacc is a theoretically complicated, but "easy" to use, program that parses input files to verify that they correspond to a certain language

• Your main program calls yyparse() to parse the input file

• The compiled YACC program automatically calls yylex(), which is in lex.yy.c

• You really need a Makefile to keep it all straight

• Yacc takes a grammar that you specify (in BNF form) and produces a parser that recognizes valid sentences in your language

• It can also generate interpreters, if you include for each grammar rule an action that is executed when the rule is recognized (completed)
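
A minimal Makefile for such a project might read (a sketch, assuming the sources are named calc.l and calc.y; recipe lines must be indented with tabs):

```makefile
calc: y.tab.c lex.yy.c
	cc -o calc y.tab.c lex.yy.c

y.tab.c y.tab.h: calc.y
	yacc -d calc.y

lex.yy.c: calc.l y.tab.h
	lex calc.l
```

The -d option makes yacc emit y.tab.h, which the lexer includes so that both programs agree on the token codes.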

The Yacc Parser

• The parser reads tokens; if a token does not complete a rule it is pushed on a stack and the parser switches to a new state reflecting the token it just read

• When it has read all the tokens that constitute the right-hand side of a rule, it pops the right-hand-side symbols off the stack and pushes the left-hand symbol onto the stack (this is called a reduction)

• Whenever yacc reduces a rule, it executes the user code associated with the rule

• A parser of this kind is referred to as a shift/reduce parser

• yacc cannot run alone; it needs a lexical analyzer such as the one lex generates
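
A minimal sketch of such a shift/reduce parser for the calculator, assuming the lexer returns NUMBER tokens (with the value in yylval) and passes operators through as single characters:

```yacc
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%left '+' '-'          /* lowest precedence, left-associative */
%left '*' '/'          /* higher precedence */
%%
lines : /* empty */
      | lines expr '\n'   { printf("= %d\n", $2); }
      ;
expr  : expr '+' expr     { $$ = $1 + $3; }
      | expr '-' expr     { $$ = $1 - $3; }
      | expr '*' expr     { $$ = $1 * $3; }
      | expr '/' expr     { $$ = $1 / $3; }
      | '(' expr ')'      { $$ = $2; }
      | NUMBER            { $$ = $1; }
      ;
%%
int main(void) { return yyparse(); }
```

Each action runs when yacc reduces the corresponding rule; the %left declarations resolve the grammar's ambiguity by giving '*' and '/' higher precedence than '+' and '-'.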

Organization of a Yacc file

• Definition section

– Declarations of the tokens used in the grammar, the types of the values used on the parser stack, and other odds and ends

– For example: %token PLUS MINUS TIMES DIVIDE

– Declarations of non-terminals, %union, etc.

• Rules section

– A list of grammar rules in BNF form

– Example: expr : expr PLUS term ;

– Each rule may or may not have an associated action (actions are C code executed when the rule is matched)

Communication between Lex and Yacc

• Whenever Lex returns to the parser a token that has an associated value, the lexer must store that value in the global variable yylval before it returns.

• The variable yylval is of type YYSTYPE; this type is defined in the file y.tab.h (created by yacc using the option '-d').

• By default it is an integer.

• If you want tokens to carry values of multiple types, you have to list all the value types in a %union declaration.
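
For example, a calculator that also handles variable names might declare (a sketch; the member and token names are illustrative):

```yacc
%union {
    int   ival;   /* numeric value of a NUMBER token */
    char *sval;   /* name carried by an IDENT token */
}
%token <ival> NUMBER
%token <sval> IDENT
```

The lexer then assigns to the appropriate member, e.g. yylval.ival = atoi(yytext); before returning NUMBER.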




