JAWAHARLAL NEHRU  TECHNOLOGICAL  UNIVERSITY  HYDERABAD
III Year B.Tech. CSE - I Sem      L   T/P/D   C
-   -/3/-   2
(A50587) COMPILER DESIGN LAB

Objectives: 

  • To provide an understanding of the language translation peculiarities by designing a complete translator for a mini language.

Recomended System / Software Requirements:

  • Intel based desktop PC with minimum of 166 MHZ or faster processor with atleast 64 MB RAM and 100 MB free disk space
  • C++ comiler and JDK kit

Consider the following mini Language, a simple procedural high-level language, only operating on integer data, with a syntax looking vaguely like a simple C crossed with Pascal. The syntax of the language is defined by the following BNF grammar:

<program> ::= <block>

<block> ::= { <variabledefinition> <slist> } | { <slist> }

<variabledefinition> ::= int<vardeflist>;

<vardeflist> ::= <vardec> | <vardec>, <vardeflist>

<vardec> ::= <identifier> | <identifier> [ <constant> ]

<slist> ::= <statement> | <statement>; <slist>

<statement> ::= <assignment> | <ifstatement> | <whilestatement> | <block> | <printstatement> | <empty>

<assignment> ::= <identifier> = <expression> | <identifier> [ <expression> ] = <expression>

<ifstatement> ::= <bexpression> then <slist> else <slist> endif | if <bexpression> then <slist> endif

<whilestatement> ::= while <bexpression> do <slist> enddo

<printstatement> ::= print ( <expression> )

<expression> ::= <expression> <additionop> <term> | <term> | addingop> <term>

<bexpression> ::= <expression> <relop> <expression>

<relop> ::= < | <= | == | >= | > | !=

<addingop> ::= + | -

<term> ::= <term> <mulitop> <factor> | <factor>

<multop> ::= * | / 

<factor> ::= <constant> | <identifier> | <identifier> [ <expression> ] | ( <expression> )

<constant> ::= <digit> | <digit> <constant>

<identifier> ::= <identifier> <letterordigit> | <letter>

<letterordigit> ::= <letter> | <digit>

<letter> ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z

<digit> ::= 0|1|2|3|4|5|6|7|8|9

<empty> has the obvious meaning

Comments (zero or more characters enclosed between the standard C / Java style comment brackets /*...*/) can be inserted. The language has rudimentary support for 1-dimensional arrays. The declaration

int a[3] declares an array of three elements, referenced as a[0], a[1] and a[2] Note also that you should worry about the scoping of names.

A simple program written in this language is:

{
  int a[3], t1, t2;
  t1 = 2;
  a[0] = 1; a[1] = 2; a[t1] = 3;
  t2 = -(a[2] + t1 * 6)/ a[2] - t1);
  if t2 > 5 then
  print(t2);
  else
  {
    int t3;
    t3 = 99;
    t2 = -25;
    print(-t1 + t2 * t3); /* this is a comment on 2 lines */
  }
  endif
}

  1. Design a Lexical analyzer for the above language. The lexical analyzer should ignore redundant spaces, tabs and newlines. It sholud also ignore comments. Although the syntax specification states that identifiers can be arbitrarily long, you may restrict the length to some reasonable value.
  2. Implement the lexical analyzer using JLex, flex or lex or other lexical analyzer generarting tools.
  3. Design Predictive parser for the given language.
  4. Design LALR bottom up parser for the above language.
  5. Convert the BNF rules into Yacc from and write code to generate abstract syntax tree.
  6. Write program to generate machine code from the abstract syntax tree generated by the parser. The following instruction set may be considered as target code.

The following is a simple register-based machine, supporting a total of 17 instructions. It has three distinct internal storage areas. The first is the set of 8 registers, used by the individual instructions as detailed below, the second is an area used for the storage of variables and the third is an area used for the storage of program. The instructions can be precede by a label. This consists of an integer in the range 1 to 9999 and the label is followed by a colon to seperate it from the rest of the instruction. The numerical label can be used as the argument to a jump instruction, as detailed below.

In the description of the individual instructions below, instruction argument types are specified as follows:

R specifies a register in the form R0, R1, R2, R3, R4, R5, R6 or R7 (or r0, r1, etc).

L specifies a numerical label (in the rabge 1 tp 9999).

V specifies a "variable location" ( a variable number, or a variable location pointed to by a register - see below).

A specifies a constant value, a variable location, a register or a variable location pointed to by a register (an indirect address). Constant values are specified as an integer value, optionally preceded by a minus sign, preceded by a # symbol. An indirect address is specified by an @ followed by a register.

So, for example an A-type argument could have the form 4 (variable number 4), #4 (the constant value 4), r4 (register 4) or @r4 (the contents of register 4 identifies the variable location to be accessed).

The instruction set is defined as follows:
LOAD A, R
loads the integer value secified by A into register R.
STORE R, V
stores the value in register R to variable V.
OUT R
outputs the value in register R.
NEG R
negates the value in register R.
ADD A, R
adds the value specified by A to register R, leaving the result in register R.
SUB A, R
subtracts the value specified by A from register R, leaving the result in register R.
MUL A, R
multiplies the value specified by A by register R, leaving the result in register R.
DIV A, R
divides register R by the value specified by A, leaving the result in register R.
JMP L
causes an unconditional jump to the instruction with the label L.
JEQ R, L
jumps to the instruction with the label L if the value in register R is zero.
JNE R, L
jumps to the instruction with the label L if the value in register R is not zero.
JGE R, L
jumps to the instruction with the label L if the value in register R is greater than or equal to zero.
JGT R, L
jumps to the instruction with the label L if the value in register R is greater than zero.
JLE R, L
jumps to the instruction with the label L if the value in register R is less than or equal to zero.
JLT R, L
jumos to the instruction with the label L if the value in register R is less than zero.
NOP
is an instruction with no effect. It can be tagged by a label.
STOP
stops execution of the machine. All programs should terminate by executing a STOP instruction.

Outcomes:

  • By this laboratory, students will understand the practical approach of now a compiler works.
  • This will enable him to work in the development phasse of new computer languages in industry.
  • Created
    Jun 07, 2015
  • Updated
    Jun 07, 2015
  • Views
    5,814