This page defines a general rubric for grading projects in CS 236. The rubric provides only a framework, and individual sections may deviate from it; refer to the syllabus for each section for the final rubric. Accomplishing the 80% level on a project indicates that a student's solution has enough functionality to accomplish at least the 80% level on the following lab without needing to revisit prior labs to add more functionality. These levels are only indicators: no test suite is perfect, and there is always the chance that a student will need to correct or extend previously implemented labs due to coding errors or other defects.

The rubric indicates the required functionality that is tested, along with minimum standards for the implementation. The grader visually inspects each solution to judge whether it adheres to those minimum implementation standards. The grader also reserves the right to adjust scores down if the code does not adhere to the project standards or style guide, is not readable, or is poorly designed in general.

## Lexical Analyzer

To accomplish 80% of the points allocated to the Lexical Analyzer, a solution must recognize the following tokens:

• COMMA, PERIOD, Q_MARK, LEFT_PAREN, RIGHT_PAREN, COLON, COLON_DASH, MULTIPLY, ADD, SCHEMES, FACTS, RULES, QUERIES
• ID
• STRING without single quote escape sequences
• WHITESPACE
• COMMENT for single line comments (e.g., #)
• EOF

Test input only contains ASCII characters that create valid tokens. Output must adhere to the output format.

To accomplish 90% of the points allocated to the Lexical Analyzer, a solution must meet the requirements of the 80% level and additionally implement:

• UNDEFINED for single characters
• STRING with single quote escape sequences
• UNDEFINED for non-terminating strings

A valid solution uses automata to recognize STRING and its UNDEFINED counterpart. An automaton should be implemented in a way that makes the state of the automaton clear with simple if-statements governing transitions between states. Test input contains ASCII characters covering all tokens up to this level with additional sequences for both types of UNDEFINED tokens. Output must adhere to the output format.
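As an illustration only, the STRING automaton might look like the following sketch (in Python for brevity; the function and state names are invented here and are not required by the project). The state is explicit, and simple if-statements govern the transitions:

```python
def scan_string(text, start):
    """Scan text[start:] for a token beginning at a single quote.

    Returns (token_type, lexeme): "STRING" for a terminated string
    (including '' escape sequences) or "UNDEFINED" for a non-terminating one.
    """
    state = "START"
    lexeme = ""
    i = start
    while i < len(text):
        ch = text[i]
        if state == "START":
            if ch == "'":
                state = "BODY"      # consumed the opening quote
            else:
                return ("UNDEFINED", ch)
        elif state == "BODY":
            if ch == "'":
                state = "QUOTE"     # may end the string or start an escape
            # any other character stays in BODY
        elif state == "QUOTE":
            if ch == "'":
                state = "BODY"      # '' is an escaped single quote
            else:
                # the previous quote ended the string; ch is not consumed
                return ("STRING", lexeme)
        lexeme += ch
        i += 1
    # Input exhausted: a string that closes exactly at EOF is still a STRING.
    if state == "QUOTE":
        return ("STRING", lexeme)
    return ("UNDEFINED", lexeme)
```

The same shape (a state variable plus a transition per character) extends naturally to the other automata required at this level.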

To accomplish 100% of the points allocated to the Lexical Analyzer, a solution must meet the requirements of the 90% level and additionally implement:

• COMMENT for block comments (e.g., #|…|#)
• UNDEFINED for non-terminating block comments

A valid solution uses automata to recognize COMMENT and its UNDEFINED counterpart. Test input contains any sequence of ASCII characters. Output must adhere to the output format.
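A comment scanner at this level must distinguish line comments, terminated block comments, and non-terminating block comments. One possible sketch, with illustrative names only (in Python for brevity):

```python
def scan_comment(text, start):
    """Scan a comment beginning at '#'. Returns (token_type, lexeme)."""
    i = start
    if text[i] != "#":
        return ("UNDEFINED", text[i])
    if i + 1 < len(text) and text[i + 1] == "|":
        # Block comment: consume characters until the matching |#.
        j = i + 2
        while j + 1 < len(text):
            if text[j] == "|" and text[j + 1] == "#":
                return ("COMMENT", text[i:j + 2])
            j += 1
        return ("UNDEFINED", text[i:])   # block comment never terminated
    # Line comment: everything up to (not including) the newline, or EOF.
    end = text.find("\n", i)
    lexeme = text[i:] if end == -1 else text[i:end]
    return ("COMMENT", lexeme)
```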

## Syntax Checker

To accomplish 80% of the points allocated to the Syntax Checker, a solution must implement a recursive-descent LL(1) parser to recognize membership in the language with the following restriction to the grammar: parameters cannot be expressions.

parameter	->	STRING | ID

The Syntax Checker must correctly handle valid and invalid Datalog programs. Test input only generates tokens from the 80% level of the Lexical Analyzer. Output must adhere to the output format.

To accomplish 100% of the points allocated to the Syntax Checker, a solution must first meet the requirements of the 80% level and additionally recognize expressions as parameters.

parameter	->	STRING | ID | expression

Test input only uses tokens from the 80% level of the Lexical Analyzer. Output must adhere to the output format.
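A minimal sketch of the recursive-descent rules for parameter and expression, assuming a token stream of (type, lexeme) pairs, the usual parenthesized expression form (expression -> LEFT_PAREN parameter operator parameter RIGHT_PAREN), and illustrative class and method names that the project does not mandate:

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens   # list of (type, lexeme) pairs
        self.pos = 0

    def current(self):
        return self.tokens[self.pos][0]

    def match(self, token_type):
        # Consume the current token if it has the expected type.
        if self.current() != token_type:
            raise ParseError(f"expected {token_type}, saw {self.current()}")
        self.pos += 1

    def parameter(self):
        # LL(1): one token of lookahead decides the production.
        if self.current() == "STRING":
            self.match("STRING")
        elif self.current() == "ID":
            self.match("ID")
        elif self.current() == "LEFT_PAREN":
            self.expression()
        else:
            raise ParseError(f"unexpected token {self.current()}")

    def expression(self):
        # expression -> ( parameter operator parameter )
        self.match("LEFT_PAREN")
        self.parameter()
        self.operator()
        self.parameter()
        self.match("RIGHT_PAREN")

    def operator(self):
        if self.current() == "ADD":
            self.match("ADD")
        elif self.current() == "MULTIPLY":
            self.match("MULTIPLY")
        else:
            raise ParseError(f"expected + or *, saw {self.current()}")
```

Because parameter calls expression, which calls parameter again, nested expressions such as (x+(y*z)) are handled for free by the recursion.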

## Datalog Parser

To accomplish 80% of the points allocated to the Datalog Parser, a solution must modify the Syntax Checker to store data in classes with the following restriction on the grammar: parameters cannot be expressions.

parameter	->	STRING | ID

A valid solution minimally implements and uses a Predicate and Rule class to store data. Elements in the classes must be stored separately (i.e., the parameter list for X(W,Z) should not be the string “(W,Z)”). Test input is only valid Datalog program files generating tokens from the 80% level of the Lexical Analyzer. Output must adhere to the output format. The output must be generated by traversing the stored data (i.e., “X(W,Z)” is created by piecing together the individual elements stored in the Predicate class).
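One minimal way to realize this (class and attribute names are illustrative; sketched in Python for brevity):

```python
class Predicate:
    def __init__(self, name, parameters):
        self.name = name                # e.g. "X"
        self.parameters = parameters    # e.g. ["W", "Z"], never the string "(W,Z)"

    def __str__(self):
        # Output is pieced together from the separately stored elements.
        return self.name + "(" + ",".join(self.parameters) + ")"

class Rule:
    def __init__(self, head, body):
        self.head = head                # a Predicate
        self.body = body                # a list of Predicates

    def __str__(self):
        return str(self.head) + " :- " + ",".join(str(p) for p in self.body)
```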

To accomplish 100% of the points allocated to the Datalog Parser, a solution must first meet the requirements of the 80% level and additionally recognize and store expressions as parameters.

parameter	->	STRING | ID | expression

Expressions must be stored in a recursive data structure. The data structure must be traversed to generate the output. Test input is only valid Datalog program files generating tokens from the 80% level of the Lexical Analyzer. Output must adhere to the output format.

Further exploration: use polymorphism in the recursive data structure to separate a STRING parameter from an ID parameter from an expression parameter.
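One way to pursue that exploration is a small class hierarchy in which every kind of parameter knows how to print itself, so the output falls out of a recursive traversal. The class names here are illustrative, not required:

```python
class Parameter:
    def __str__(self):
        raise NotImplementedError

class StringParam(Parameter):
    def __init__(self, value):
        self.value = value      # e.g. "'abc'" (quotes kept)
    def __str__(self):
        return self.value

class IdParam(Parameter):
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name

class Expression(Parameter):
    def __init__(self, left, op, right):
        self.left = left        # a Parameter, possibly another Expression
        self.op = op            # "+" or "*"
        self.right = right
    def __str__(self):
        # Recursive traversal reconstructs the parenthesized form.
        return "(" + str(self.left) + self.op + str(self.right) + ")"
```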

## Relational Database

To accomplish 80% of the points allocated to the Relational Database, a solution must answer n-parameter queries with any combination of IDs and STRINGs that never include duplicate IDs. A valid solution implements and uses a Relation class to store facts from the input file. That class provides select, project, and rename operations for the queries. The solution additionally implements a method that takes as input a query and returns the relation that is the result of the query. Test input is only valid Datalog program files generating tokens from the 80% level of the Lexical Analyzer. Output must adhere to the output format.
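As a sketch of the shape such a solution might take (names illustrative, in Python for brevity): a query with no duplicate IDs can be answered by selecting on each STRING constant, projecting the ID columns, and renaming them to the query's variables.

```python
class Relation:
    def __init__(self, name, scheme, tuples=None):
        self.name = name
        self.scheme = list(scheme)      # attribute names
        self.tuples = set(tuples or []) # each tuple is a Python tuple

    def select(self, index, value):
        # Keep rows whose value at `index` equals the given constant.
        kept = {t for t in self.tuples if t[index] == value}
        return Relation(self.name, self.scheme, kept)

    def project(self, indices):
        # Keep only the listed columns, in the listed order.
        scheme = [self.scheme[i] for i in indices]
        kept = {tuple(t[i] for i in indices) for t in self.tuples}
        return Relation(self.name, scheme, kept)

    def rename(self, new_scheme):
        return Relation(self.name, new_scheme, self.tuples)

def evaluate_query(relation, query_params):
    """Answer a query with no duplicate IDs: select on constants,
    then project to and rename the ID columns."""
    result = relation
    for i, p in enumerate(query_params):
        if p.startswith("'"):           # STRING constant
            result = result.select(i, p)
    id_positions = [i for i, p in enumerate(query_params)
                    if not p.startswith("'")]
    result = result.project(id_positions)
    return result.rename([query_params[i] for i in id_positions])
```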

To accomplish 100% of the points allocated to the Relational Database, a solution must meet the requirements of the 80% level and additionally handle queries that include duplicate IDs.
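A duplicate ID such as q(X,X) calls for one more kind of select, comparing two columns against each other rather than a column against a constant. A standalone sketch over a set of tuples:

```python
def select_equal_columns(tuples, i, j):
    # Keep rows whose columns i and j agree, as a query like q(X,X) requires.
    return {t for t in tuples if t[i] == t[j]}
```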

Further Exploration: evaluate expressions in n-parameter queries with any combination of IDs, STRINGs, and expressions. Expressions are evaluated with the following rules as the query is processed:

• An ID refers to the value in the associated column in the current row of the relation.
• A '+' or a '*' with STRING operands that are numeric (e.g., '6', '1.5') has its usual meaning.
• A '+' with any STRING operand that is not numeric is concatenation.
• Expressions are type-safe (i.e., IDs are always in the scheme, and an expression uses either only numeric STRING tokens or only '+' with non-numeric STRING tokens).
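These rules can be sketched as follows for two already-evaluated STRING operands; the function names are invented here, and how the resulting value is rendered in output is left to the output format:

```python
def is_numeric(s):
    # STRING lexemes keep their quotes; strip them before testing.
    try:
        float(s.strip("'"))
        return True
    except ValueError:
        return False

def eval_expr(op, left, right):
    """Apply an operator to two STRING lexemes like "'6'" or "'abc'"."""
    l, r = left[1:-1], right[1:-1]
    if is_numeric(left) and is_numeric(right):
        return float(l) + float(r) if op == "+" else float(l) * float(r)
    # Type-safety guarantees op is '+' here: concatenation.
    return "'" + l + r + "'"
```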

## Datalog Interpreter

To accomplish 80% of the points allocated to the Datalog Interpreter, a solution must interpret rules with the following restrictions:

• Predicates adhere to the 80% level of the Relational Database having no duplicate IDs
• The head predicate does not require reordering in the final project step
• Parameter lists in the predicates are either disjoint or share a single common attribute

A valid solution implements a join operation in the Relation class to evaluate rules and reuses the relational database query code to evaluate the predicates in rules. Output must adhere to the output format.
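The core of the join is matching rows that agree on every common attribute. A standalone sketch over schemes (lists of attribute names) and sets of tuples, with illustrative names:

```python
def natural_join(scheme1, tuples1, scheme2, tuples2):
    """Natural join: combine rows that agree on all common attributes."""
    common = [a for a in scheme1 if a in scheme2]
    pos1 = {a: scheme1.index(a) for a in common}
    pos2 = {a: scheme2.index(a) for a in common}
    # Columns of scheme2 that are not already in scheme1.
    extra = [i for i, a in enumerate(scheme2) if a not in scheme1]
    joined_scheme = scheme1 + [scheme2[i] for i in extra]
    joined = set()
    for t1 in tuples1:
        for t2 in tuples2:
            # Tuples are joinable when they agree on every common attribute.
            if all(t1[pos1[a]] == t2[pos2[a]] for a in common):
                joined.add(t1 + tuple(t2[i] for i in extra))
    return joined_scheme, joined
```

With disjoint schemes this degenerates to a cross product, and with identical schemes to an intersection, so the one operation covers all the cases the rubric describes.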

To accomplish 90% of the points allocated to the Datalog Interpreter, a solution must first meet the requirements of the 80% level and additionally support head predicates that require reordering of attributes on the final project operation, as well as parameter lists in the predicates that share any number of common attributes. All predicates adhere to the 80% level of the Relational Database, having no duplicate IDs.

To accomplish 100% of the points allocated to the Datalog Interpreter, a solution must first meet the requirements of the 90% level and additionally interpret rules that adhere to the 100% level of the Relational Database, including duplicate IDs, and the solution must meet the time bound in the project standards.

Further exploration: support expressions in the predicates on rules.

## Optimized Rule Evaluation

The rubric matches the Datalog Interpreter rubric with the change that rules are evaluated in groups, with each group representing a strongly connected component of the rule dependency graph. Test output must adhere to the output format. A valid solution creates a Graph class that computes the strongly connected components of the rule dependency graph.
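One of several suitable algorithms is Kosaraju's: take a depth-first postorder of the reverse graph, then run depth-first searches on the original graph in the reverse of that order; each search discovers one strongly connected component. A sketch with illustrative names, where nodes are rule indices and an edge means "depends on":

```python
class Graph:
    def __init__(self, n):
        self.n = n
        self.edges = {i: set() for i in range(n)}

    def add_edge(self, src, dst):
        self.edges[src].add(dst)

    def _postorder(self, edges):
        # DFS every node, recording nodes as their searches finish.
        seen, order = set(), []
        def dfs(v):
            seen.add(v)
            for w in sorted(edges[v]):
                if w not in seen:
                    dfs(w)
            order.append(v)
        for v in range(self.n):
            if v not in seen:
                dfs(v)
        return order

    def sccs(self):
        # Pass 1: postorder on the reverse graph.
        reverse = {i: set() for i in range(self.n)}
        for v, ws in self.edges.items():
            for w in ws:
                reverse[w].add(v)
        order = self._postorder(reverse)
        # Pass 2: DFS the original graph in reverse postorder;
        # each search from an unseen node yields one component.
        seen, components = set(), []
        for v in reversed(order):
            if v in seen:
                continue
            stack, comp = [v], []
            seen.add(v)
            while stack:
                u = stack.pop()
                comp.append(u)
                for w in self.edges[u]:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            components.append(sorted(comp))
        return components
```

With this variant the components come out in dependency order, so they can be evaluated as they are discovered.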

Further exploration: complete all tests in under 10 seconds without benchmark specialization (i.e., must be a general solution that works for any valid input).