##### Differences

This shows you the differences between two versions of the page.

 cs-236:project-1-worksheet [2015/01/05 14:02]egm created cs-236:project-1-worksheet [2015/01/05 14:06] (current)egm 2015/01/05 14:06 egm 2015/01/05 14:02 egm created 2015/01/05 14:06 egm 2015/01/05 14:02 egm created Line 1: Line 1: - * [[Project 1 Worksheet | Project 1 Worksheet: Lexical Analysis ​for Datalog]] + The objectives ​for this project are to - * [[Project 2 Worksheet | Project 2 Worksheet: LL(1) Parsing ​of Datalog]] + * Create finite state machines to recognize tokens from regular expressions - * [[Project 3 Worksheet | Project 3 Worksheet: Answering Datalog Queries]] + * Implement the state machines with a C++ program - * [[Project 4 Worksheet | Project 4 Worksheet: Generating Facts from Datalog Rules]] + * Produce a correct stream of tokens given arbitrary input - * [[Project 5 Worksheet | Project 5 Worksheet: Optimizing Rule Evaluation ​in Datalog ​Queries]] + + ==Notation and Definitions== + A deterministic finite state machine is a quintuple $(S,\ I,\ f,\ s_0,\ F)$, where: + * $S$ is a finite, non-empty set of states + * $I$ is the input alphabet ​(a finite, non-empty set of symbols) + * $f$ is the state-transition function: $f\ :\ S \times I \rightarrow S$ + * $s_0$ is an initial state, an element ​of $S$ + *  $F$ is the set of final states, a (possibly empty) subset of $S$. + + A finite state transducer is a 6-tuple $(S,\ I,\ O,\ f,\ g,\ s_0)$ as above except: + * $O$ is the output alphabet (a finite, non-empty set of symbols) + * $f$ is the state-transition function: $f\ :\ S \times ​ I \rightarrow S$ + * $g$ is the output function: $g\ :\ S \times I \rightarrow O$ + + ==Questions== + # What is the line of code in '''​Lex.cpp'''​ that establishes <​math>​s_0?​ + # Where are the elements of <​math>​S​ enumerated? ​ Give the first three as they appear in the code. + # What are the elements in the set <​math>​I?​ + # Describe a way to refactor the code so that it would work for any FSM, not just the one that has the structure required for the Datalog ​lexer. + # The code implements the state-transition function <​math>​f\ :\ S \times I \rightarrow S​ in two different ways depending on whether a non-empty token in <​math>​O​ is to be emitted according to <​math>​g\ :\ S \times I \rightarrow O​. ​ Match the state-transition function to the code below by circling, labeling, and explaining how the code components represent the state, S, obtain the  input I, and represents the output O for the '''​Comma'''​ state and the input I and output O for the '''​SawColon'''​ state. ​ + # Why is there no token emitted for the '''​SawColon'''​ case in the switch statement?​ + + + State Lex::​nextState() { + State result; + char character;​ + switch(state) { + case Start: ​              ​result = getNextState();​ break; + case Comma: ​              ​emit(COMMA);​ result = getNextState();​ break; + case Period: ​             emit(PERIOD);​ result = getNextState();​ break; + case SawColon: + character = input->​getCurrentCharacter();​ + if(character == '​-'​) { + result = Colon_Dash;​ + input->​advance();​ + } else { //Every other character + throw "​ERROR::​ in case SawColon:, Expecting ​ '​-'​ but found " + character + '​.';​ + } + break; + case Colon_Dash: ​         emit(COLON_DASH);​ result = getNextState();​ break; + … + 