Context-Free Languages and Parsing

To date, the different stages of compilation:

  • Scanner weeds out programs that are lexically invalid. That cannot be split into tokens.
  • Syntactically valid programs move on to Context-Sensitive Analysis stage.
  • Prgorams that are lexically, syntactically and semantically valid move to the Code Generation stage.

We don’t use Regular Languages because it has limitations on solving the Balanced Parentheses Problem.

Context-Free Grammars

In this module, we focus on the expression of Context-Free Languages, and so need a context-free equivalent to the regular expression.

A grammar is a language of languages.

L = (L) | A rewriting process (expand), with the symbol , and repeatedly rewrite the part into one of its expanded forms to reach a string in the language.

idea is to rewrite (or expand) a non-terminal by replacing the left-hand side (LHS) of the production rule with the right-hand side (RHS).

Every regular language is context free!

We can visually represent the structure of an input word with a parse tree.

  • The root of the tree is the start symbol.
  • Each non-leaf node is a rule that was used for the non-terminal.
  • Leaf nodes represent terminals or . Properties of parse trees:
  • Property 1: A derivation uniquely defines a parse tree.
  • Property 2: An input string can have more than one parse tree.

Problem: According to property 2, we can derive different parse trees. This is not what we want to represent the structure of a program within a compiler, it is important that the compiler produces unique parse trees. Solution: Stick with Leftmost Derivation. Problems: Even if we select a particular derivation style, it is still possible to obtain two different dervations two different parse trees. Solution: To avoid ambiguity, we introduce parenthesis. Force the language syntax to require parentheses.

Using this technique to change the associativity of a grammar, we can create a grammar that follows BEDMAS rules more closely by making ∗ and / appear further down the tree. Recall, with our depth-first traversal, the deeper parts of the tree will be evaluated first therefore get a higher precedence.

The next two modules LL(1) Top-Down Parsig and Bottom-Up Parsing will describe two different approaches to writing parsing algorithms.