CS241 - Foundations of Sequential Programs DFA

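ε-Nondeterministic Finite Automata

ε-transitions make it easy to take the union of two NFAs, the concatenation of two NFAs, and the Kleene star of an NFA. They let us glue languages together more easily.

Running an ε-NFA alternates between two kinds of step: a δ step that reads a single symbol, and an ε-closure that follows every ε-transition reachable from the current states. Start with the ε-closure of the start state, alternate δ and ε-closure for each character, and end with an ε-closure. Compared with a plain NFA, the ε-closure is the extra step in between.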

Exercise: What regular language does this machine represent?

[Figure: Screen Shot 2022-10-04 at 11.58.50 AM.png]

a(a|b)* (ca(a|b))

Always take the ε-closure after reading a character (or whatever unit of input you are reading). The empty set is our error state.
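The ε-closure rule above can be sketched in Python. This is a minimal sketch, not the course's reference implementation: the transition table format, the `EPS` label, and the small machine for a*b* are all assumptions made for illustration (the machine from the exercise figure is not reproduced here).

```python
EPS = ""  # label used for ε-transitions (an assumption of this sketch)

def eps_closure(states, delta):
    """Return every state reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def accepts(word, start, accepting, delta):
    """Simulate an ε-NFA: ε-closure first, then a δ step and an ε-closure per character."""
    current = eps_closure({start}, delta)
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(moved, delta)
        if not current:  # the empty set is the error state
            return False
    return bool(current & accepting)

# Hypothetical ε-NFA for a*b*: state 0 loops on a, an ε-transition leads to
# state 1, and state 1 loops on b.
delta = {
    (0, "a"): {0},
    (0, EPS): {1},
    (1, "b"): {1},
}
print(accepts("aabb", 0, {1}, delta))  # True
print(accepts("aba", 0, {1}, delta))   # False
```

Tracking a set of states instead of a single state is what lets the simulation follow every nondeterministic choice at once.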

Equivalences

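Every ε-NFA has a corresponding DFA. The ε-NFA is a more convenient way to describe a language, not a more powerful one: both recognize exactly the regular languages. The conversion to a DFA can be automated by an algorithm. (The resulting DFA runs in time linear in the length of the input.) To try this out in practice, use egrep.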
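One way to automate the conversion is the subset construction, where each DFA state is a set of ε-NFA states. The following is a minimal sketch under assumed conventions: the dictionary-based transition table, the `EPS` label, and the example machine for a*b* are hypothetical and not taken from the lecture.

```python
from collections import deque

EPS = ""  # label used for ε-transitions (an assumption of this sketch)

def eps_closure(states, delta):
    """Return every state reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def to_dfa(start, accepting, delta, alphabet):
    """Subset construction: each DFA state is a frozenset of ε-NFA states."""
    start_set = frozenset(eps_closure({start}, delta))
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for ch in alphabet:
            moved = set()
            for q in current:
                moved |= delta.get((q, ch), set())
            target = frozenset(eps_closure(moved, delta))
            if not target:
                continue  # empty set = error state; leave the transition undefined
            dfa_delta[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return start_set, dfa_accepting, dfa_delta

def dfa_accepts(word, start, accepting, dfa_delta):
    """Run the DFA: one table lookup per character, so linear in len(word)."""
    state = start
    for ch in word:
        state = dfa_delta.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hypothetical ε-NFA for a*b* (same shape as a two-state machine with one ε-edge).
nfa_delta = {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {1}}
dfa = to_dfa(0, {1}, nfa_delta, "ab")
print(dfa_accepts("aabb", *dfa))  # True
print(dfa_accepts("ba", *dfa))    # False
```

Only the subsets reachable from the start set are built, which in practice is usually far fewer than all possible subsets.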

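Only with ε-NFAs

Treat each machine as a black box. Concatenation: do whatever the machine for L1 does, then start the machine for L2. To connect them, add an ε-transition from each accept state of L1 to the start state of L2; the accept states of the combined machine are those of L2.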
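The concatenation construction can be sketched as follows. This is an illustrative sketch only: the `(start, accepting, delta)` tuple format, the `EPS` label, and the two small machines are assumptions, and the two machines are assumed to use disjoint state names.

```python
EPS = ""  # label used for ε-transitions (an assumption of this sketch)

def concat(nfa1, nfa2):
    """Build an ε-NFA for L1 L2 from machines given as (start, accepting, delta)."""
    start1, accepting1, delta1 = nfa1
    start2, accepting2, delta2 = nfa2
    delta = {key: set(targets) for key, targets in delta1.items()}
    for key, targets in delta2.items():
        delta.setdefault(key, set()).update(targets)
    # Glue: every accept state of L1 gets an ε-transition to the start of L2.
    for q in accepting1:
        delta.setdefault((q, EPS), set()).add(start2)
    # Only the accept states of L2 remain accepting.
    return start1, set(accepting2), delta

def eps_closure(states, delta):
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def accepts(word, nfa):
    start, accepting, delta = nfa
    current = eps_closure({start}, delta)
    for ch in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(moved, delta)
    return bool(current & accepting)

# Hypothetical machines: L1 = {a}, L2 = b*.
l1 = ("p0", {"p1"}, {("p0", "a"): {"p1"}})
l2 = ("q0", {"q0"}, {("q0", "b"): {"q0"}})
both = concat(l1, l2)
print(accepts("abb", both))  # True
print(accepts("b", both))    # False
```

Neither input machine is modified, which matches the black-box view: the construction only adds ε-edges between them.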

Summary

  • For each regular language, we can construct an ε-NFA that recognizes the language.
  • Any ε-NFA can be converted into an equivalent DFA, and vice versa.

Scanning

  • Is C a regular language? Not as a whole, but the following are regular:

    • C keywords
    • C identifiers
    • C literals
    • C operators
    • C comments

    Finite automata can therefore do our tokenization, which is called scanning. The tokens form a regular language.
  • Our goal: given some text, break the text up into tokens.

    • Categorize each token, so that the later stages that find the structure do not need to be concerned with individual characters.
  • Problem: some input can be broken into tokens in multiple different ways.

  • Solution: choose an algorithm that makes the decision, and be clear about which way of breaking up the input is correct. The algorithm becomes part of the language.

Two algorithms

Maximal Munch Scanning Algorithm

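Consume characters until there is no longer a valid transition. If the current state is not accepting, backtrack to the last accepting state, produce that token, and resume scanning from the first character after it. To make this possible, always remember the last accepting state and the input position at which it was reached. For example, on input 0x the scanner reads 0 (accepting), reads x (not accepting), and then has no transition left; it backtracks to 0, produces that token, and starts again from x.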
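The backtracking can be sketched in Python. This is a minimal sketch under stated assumptions: the DFA in `step`, the state names, and the token kinds NUM, HEXNUM, and ID are hypothetical, chosen so that 0x alone is not a token. Whitespace is not handled.

```python
DIGITS = "0123456789"
HEX = DIGITS + "abcdef"
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def step(state, ch):
    """Hypothetical DFA transition function; returns None if there is no transition."""
    if state == "start":
        if ch == "0": return "zero"
        if ch in DIGITS: return "num"
        if ch in LETTERS: return "id"
    elif state == "zero":
        if ch == "x": return "hexprefix"  # 0x alone is NOT accepting
        if ch in DIGITS: return "num"
    elif state == "num":
        if ch in DIGITS: return "num"
    elif state in ("hexprefix", "hex"):
        if ch in HEX: return "hex"
    elif state == "id":
        if ch in LETTERS: return "id"
    return None

ACCEPTING = {"zero": "NUM", "num": "NUM", "hex": "HEXNUM", "id": "ID"}

def maximal_munch(text):
    tokens = []
    pos = 0
    while pos < len(text):
        state = "start"
        last_accept = None  # (token kind, end position) at the last accepting state
        i = pos
        while i < len(text):
            nxt = step(state, text[i])
            if nxt is None:
                break
            state = nxt
            i += 1
            if state in ACCEPTING:
                last_accept = (ACCEPTING[state], i)
        if last_accept is None:
            raise ValueError(f"no token starts at position {pos}")
        kind, end = last_accept
        tokens.append((kind, text[pos:end]))
        pos = end  # backtrack: resume right after the last accepted token
    return tokens

print(maximal_munch("0x"))    # [('NUM', '0'), ('ID', 'x')]
print(maximal_munch("0x1f"))  # [('HEXNUM', '0x1f')]
```

Remembering only the most recent accepting position is enough, because the longest valid token is always the one that ends there.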

Simplified Maximal Munch

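Consume characters until there is no longer a valid transition. If the current state is accepting, produce the token and proceed. Otherwise, go to an error state; there is no backtracking.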
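A sketch of the simplified variant, for comparison. The table-driven DFA, its state names, and the token kinds are hypothetical and only serve to show the behaviour on input such as 0x; whitespace is not handled.

```python
DIGITS = "0123456789"
HEX = DIGITS + "abcdef"
LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical DFA: state -> list of (characters, next state), checked in order.
TABLE = {
    "start": [("0", "zero"), (DIGITS, "num"), (LETTERS, "id")],
    "zero": [("x", "hexprefix"), (DIGITS, "num")],
    "num": [(DIGITS, "num")],
    "hexprefix": [(HEX, "hex")],  # 0x alone is NOT accepting
    "hex": [(HEX, "hex")],
    "id": [(LETTERS, "id")],
}
ACCEPTING = {"zero": "NUM", "num": "NUM", "hex": "HEXNUM", "id": "ID"}

def step(state, ch):
    """Return the next state, or None if there is no transition."""
    for chars, target in TABLE.get(state, []):
        if ch in chars:
            return target
    return None

def simplified_maximal_munch(text):
    tokens = []
    pos = 0
    while pos < len(text):
        state = "start"
        i = pos
        while i < len(text):
            nxt = step(state, text[i])
            if nxt is None:
                break
            state = nxt
            i += 1
        # No backtracking: the state we are stuck in must itself be accepting.
        if i == pos or state not in ACCEPTING:
            raise ValueError(f"scan error at position {i}")
        tokens.append((ACCEPTING[state], text[pos:i]))
        pos = i
    return tokens

print(simplified_maximal_munch("0x1f"))  # [('HEXNUM', '0x1f')]
try:
    simplified_maximal_munch("0x")
except ValueError as err:
    print("error:", err)  # error: scan error at position 2
```

With this DFA the input 0x is rejected, whereas maximal munch with backtracking would split it into 0 followed by x. The simplified version trades that flexibility for a scanner that never re-reads input.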

Assignments on compiler

  1. Identify tokens (Scanning) [Completed]
  2. Check order of tokens (Syntactic Analysis) [Now]
  3. Type Checking (Semantic Analysis) [Later]
  4. Code Generation [Also later]

Context-Free Languages

Context-free languages are regular languages plus recursion. To express them we use a different form, called grammars.

Definition: A grammar is the language of languages. Grammars help us describe what we are allowed and not allowed to say. A context-free grammar is a set of rewrite rules; a context-free language is the set of strings that such a grammar can generate.
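To make "a set of rewrite rules" concrete, here is a small sketch that applies rules one at a time. The grammar for balanced parentheses, the dictionary representation, and the function name `leftmost_derivation` are assumptions chosen for illustration, not notation from the lecture.

```python
# Hypothetical grammar for balanced parentheses:
#   S -> ( S ) S
#   S -> ε
# The rule for S mentions S itself: this is the recursion that regular
# languages lack.
GRAMMAR = {
    "S": [["(", "S", ")", "S"], []],
}

def leftmost_derivation(grammar, start, choices):
    """Rewrite the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the rule to apply. Returns every intermediate
    form. Raises StopIteration if no nonterminal is left to rewrite.
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        idx = next(i for i, sym in enumerate(form) if sym in grammar)
        form = form[:idx] + grammar[form[idx]][choice] + form[idx + 1:]
        steps.append(list(form))
    return steps

steps = leftmost_derivation(GRAMMAR, "S", [0, 0, 1, 1, 1])
for form in steps:
    print(" ".join(form) or "ε")
# S
# ( S ) S
# ( ( S ) S ) S
# ( ( ) S ) S
# ( ( ) ) S
# ( ( ) )
```

Each printed line is obtained from the previous one by applying a single rewrite rule, and the final line contains only terminals, so it is a string in the language.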