Context-Free Languages and Parsing
To date, the different stages of compilation:
- Scanner weeds out programs that are lexically invalid. That cannot be split into tokens.
- Syntactically valid programs move on to Context-Sensitive Analysis stage.
- Prgorams that are lexically, syntactically and semantically valid move to the Code Generation stage.
We don’t use Regular Languages because it has limitations on solving the Balanced Parentheses Problem.
Context-Free Grammars
In this module, we focus on the expression of Context-Free Languages, and so need a context-free equivalent to the regular expression.
A grammar is a language of languages.
L = (L) | A rewriting process (expand), with the symbol , and repeatedly rewrite the part into one of its expanded forms to reach a string in the language.
idea is to rewrite (or expand) a non-terminal by replacing the left-hand side (LHS) of the production rule with the right-hand side (RHS).
Every regular language is context free!
We can visually represent the structure of an input word with a parse tree.
- The root of the tree is the start symbol.
- Each non-leaf node is a rule that was used for the non-terminal.
- Leaf nodes represent terminals or . Properties of parse trees:
- Property 1: A derivation uniquely defines a parse tree.
- Property 2: An input string can have more than one parse tree.
Problem: According to property 2, we can derive different parse trees. This is not what we want to represent the structure of a program within a compiler, it is important that the compiler produces unique parse trees. Solution: Stick with Leftmost Derivation. Problems: Even if we select a particular derivation style, it is still possible to obtain two different dervations two different parse trees. Solution: To avoid ambiguity, we introduce parenthesis. Force the language syntax to require parentheses.
Using this technique to change the associativity of a grammar, we can create a grammar that follows BEDMAS rules more closely by making ∗ and / appear further down the tree. Recall, with our depth-first traversal, the deeper parts of the tree will be evaluated first therefore get a higher precedence.
The next two modules LL(1) Top-Down Parsig and Bottom-Up Parsing will describe two different approaches to writing parsing algorithms.