Context-Free Languages and Parsing

To date, the different stages of compilation:

Scanner weeds out programs that are lexically invalid. That cannot be split into tokens.
Syntactically valid programs move on to Context-Sensitive Analysis stage.
Prgorams that are lexically, syntactically and semantically valid move to the Code Generation stage.

We don’t use Regular Languages because it has limitations on solving the Balanced Parentheses Problem.

Context-Free Grammars

In this module, we focus on the expression of Context-Free Languages, and so need a context-free equivalent to the regular expression.

A grammar is a language of languages.

L = (L) | $ϵ$ A rewriting process (expand), with the symbol $\to$ , and repeatedly rewrite the $L$ part into one of its expanded forms to reach a string in the language.

$\to$ idea is to rewrite (or expand) a non-terminal by replacing the left-hand side (LHS) of the production rule with the right-hand side (RHS).

Every regular language is context free!

We can visually represent the structure of an input word with a parse tree.

The root of the tree is the start symbol.
Each non-leaf node is a rule that was used for the non-terminal.
Leaf nodes represent terminals or $ϵ$ . Properties of parse trees:
Property 1: A derivation uniquely defines a parse tree.
Property 2: An input string can have more than one parse tree.

Problem: According to property 2, we can derive different parse trees. This is not what we want to represent the structure of a program within a compiler, it is important that the compiler produces unique parse trees. Solution: Stick with Leftmost Derivation. Problems: Even if we select a particular derivation style, it is still possible to obtain two different dervations $\Rightarrow$ two different parse trees. Solution: To avoid ambiguity, we introduce parenthesis. Force the language syntax to require parentheses.

Using this technique to change the associativity of a grammar, we can create a grammar that follows BEDMAS rules more closely by making ∗ and / appear further down the tree. Recall, with our depth-first traversal, the deeper parts of the tree will be evaluated first therefore get a higher precedence.

The next two modules LL(1) Top-Down Parsig and Bottom-Up Parsing will describe two different approaches to writing parsing algorithms.

🪴 Avril Chen

Explorer

Context-Free Languages and Parsing

Context-Free Grammars

Graph View

Table of Contents

Backlinks