Computer Science Department
CSC 413
Assignment 2 - Modify the Lexer
Due Date
March 2, before midnight
March 4, before midnight, is the late submission deadline (for 75% credit)
Note that the due date applies to the last commit timestamp into the main branch of your repository.
Overview
The purpose of this assignment is to extend the Lexer component of our x language compiler so that it can handle
additional tokens, and to supplement our understanding of compilers and lexical analysis.
You are provided with the Lexer code, which will be cloned automatically into your GitHub repository when you begin
the assignment via this GitHub assignment link.
Submission
Your assignment will be submitted using GitHub. Only the “main” branch of your repository will be graded.
Late submission is determined by the last commit time on the “main” branch. You are required to submit a
documentation PDF named “documentation.pdf” in a “documentation” folder at the root of your project.
Please refer to the documentation requirements posted on iLearn. The organization and appearance of this
document are critical. Please use spelling and grammar checkers: your ability to communicate about
software and technology is almost as important as your ability to write software!
We will test your program using the following commands:
1. git clone your-repository-name
2. cd your-repository-name
3. find . -name "*.class" -type f -delete
4. find . -name "*.jar" -type f -delete
5. javac lexer/setup/TokenSetup.java
6. java lexer.setup.TokenSetup
7. javac lexer/Lexer.java
8. java lexer.Lexer filename.x
Requirements
You will be extending the Lexer in order to be able to process additional tokens, as well as to improve the output of
the Lexer.
1. The current implementation of Lexer reads a hardcoded file. Lexer must be updated to allow input via a filename
provided as a command line argument:
java lexer.Lexer sample_files/simple.x
Note that the main method is currently commented out; you should uncomment and update this method. In the
event that no filename is supplied, a usage instruction should be displayed:
> java lexer.Lexer
usage: java lexer.Lexer filename.x
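A minimal sketch of the command-line check for requirement 1, assuming the rest of main (constructing the Lexer and printing tokens) stays as in the provided code. The class LexerArgs and the method sourceFileFrom are hypothetical names used only for illustration; the provided Lexer's constructor and token loop are not shown.

```java
// Hypothetical helper showing only the command-line handling for lexer.Lexer.
final class LexerArgs {
    static final String USAGE = "usage: java lexer.Lexer filename.x";

    private LexerArgs() {}

    // Returns the source filename from the command line, or null if none was given.
    static String sourceFileFrom(String[] args) {
        return args.length > 0 ? args[0] : null;
    }

    public static void main(String[] args) {
        String filename = sourceFileFrom(args);
        if (filename == null) {
            // No filename supplied: print the usage line instead of reading a hardcoded file.
            System.out.println(USAGE);
            return;
        }
        // Placeholder: the provided Lexer would be constructed with `filename` here
        // and its tokens printed in a loop (API assumed, not shown).
    }
}
```

Keeping the argument check in its own small method makes the "no filename" branch easy to test without running the whole lexer.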
2. Our compiler must be updated to accommodate the additional tokens listed below. The tokens file must be updated,
and TokenSetup must be run to regenerate the Tokens and TokenTypes classes.
   1. Greater: >
   2. GreaterEqual: >=
   3. HashDelimeter: #
   4. LeftBracket: [
   5. RightBracket: ]
   6. Utf16String: utf16string (this is the type)
   7. Utf16StringLit: any utf16string literal, which is a backslash, followed by a lowercase u, followed
      by 4 hexadecimal digits (0-9, a-f, A-F), with this sequence repeated twice.
      Valid examples: \uD83D\uDC7D \ud83D\uDc7D
      Invalid examples: \uDf12\dF \uR123\uZ123
   8. TimestampType: timestamp (this is the type)
   9. TimestampLit: any timestamp expressed as yyyy~MM~dd~hh:mm:ss, where y, M, d, h, m, and s
      are digits (0-9), and 01 <= MM <= 12, 01 <= dd <= 31, 00 <= hh <= 23, 01 <= mm <= 59, and
      00 <= ss <= 59.
      Valid example: 2022~02~15~12:15:22
      Invalid examples: 2022~14~15~39:15:22 123~14~15~39:15:22
   10. Reserved words:
       1. Begin: begin
       2. End: end
       3. In: in
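The two literal rules above are precise enough to check character by character. The sketch below assumes the scanner has already collected a candidate lexeme into a String; the class LiteralChecks and its method names are hypothetical, and how the existing Lexer gathers those characters (for example, continuing past leading digits when a ~ follows) is not shown. The timestamp check applies the ranges exactly as listed above, including 01 <= mm <= 59.

```java
// Hypothetical validators for the two new literal kinds; names are illustrative.
final class LiteralChecks {
    private LiteralChecks() {}

    // Utf16StringLit: two units, each a backslash, a lowercase 'u', and four hex digits.
    static boolean isUtf16StringLit(String s) {
        if (s.length() != 12) {
            return false;
        }
        for (int base = 0; base < 12; base += 6) {
            if (s.charAt(base) != '\\' || s.charAt(base + 1) != 'u') {
                return false;
            }
            for (int i = base + 2; i < base + 6; i++) {
                if (!isHexDigit(s.charAt(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    // TimestampLit: yyyy~MM~dd~hh:mm:ss, separators at fixed positions.
    static boolean isTimestampLit(String s) {
        if (s.length() != 19
                || s.charAt(4) != '~' || s.charAt(7) != '~' || s.charAt(10) != '~'
                || s.charAt(13) != ':' || s.charAt(16) != ':') {
            return false;
        }
        int[] digitPositions = {0, 1, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18};
        for (int p : digitPositions) {
            char c = s.charAt(p);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        int month = twoDigits(s, 5);
        int day = twoDigits(s, 8);
        int hour = twoDigits(s, 11);
        int minute = twoDigits(s, 14);
        int second = twoDigits(s, 17);
        // Ranges copied from the requirements as written (including 01 <= mm <= 59);
        // the day check does not consider the length of each month.
        return month >= 1 && month <= 12
                && day >= 1 && day <= 31
                && hour <= 23
                && minute >= 1 && minute <= 59
                && second <= 59;
    }

    // Accepts only ASCII 0-9, a-f, A-F.
    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Reads the two ASCII digits at start and start + 1 as an int.
    private static int twoDigits(String s, int start) {
        return (s.charAt(start) - '0') * 10 + (s.charAt(start + 1) - '0');
    }
}
```

One caution when writing such Java code: the compiler treats a backslash followed by u in source text (even inside comments) as a Unicode escape, so test strings need a doubled backslash, as in "\\uD83D\\uDC7D".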
3. The Token class must be updated to include the line number on which the token was found (for subsequent error
reporting, etc.).
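A sketch of a token that carries its line number, assuming the lexer keeps a 1-based counter of the current source line and passes it when each token is created. The class LineNumberedToken and its accessor names are hypothetical; the provided Token class has its own fields and constructor, which are not reproduced here.

```java
// Hypothetical sketch: names are illustrative, not the provided Token API.
final class LineNumberedToken {
    private final String lexeme;     // source text, e.g. "program"
    private final String kind;       // token kind, e.g. "Program"
    private final int leftPosition;  // column of the first character
    private final int rightPosition; // column of the last character
    private final int lineNumber;    // 1-based source line where the token was found

    LineNumberedToken(String lexeme, String kind, int leftPosition, int rightPosition, int lineNumber) {
        this.lexeme = lexeme;
        this.kind = kind;
        this.leftPosition = leftPosition;
        this.rightPosition = rightPosition;
        this.lineNumber = lineNumber;
    }

    String getLexeme() { return lexeme; }
    String getKind() { return kind; }
    int getLeftPosition() { return leftPosition; }
    int getRightPosition() { return rightPosition; }
    int getLineNumber() { return lineNumber; }
}
```

Setting the line number once in the constructor keeps tokens immutable, so later phases can rely on it for error reporting.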
4. Lexer output must be updated for readability, and to include the line number from the Token, as well as the type
of the token created. (Note that the initial debug text that shows the file information has been removed!). The
format for each of the token lines is:
   1. 11 columns, left aligned, for the token description, then a space
   2. the text “left:”, then a space
   3. 8 columns, left aligned, for the left position, then a space
   4. the text “right:”, then a space
   5. 8 columns, left aligned, for the right position, then a space
   6. the text “line:”, then a space
   7. 8 columns, left aligned, for the line number, then a space
   8. the symbol, i.e. the token kind (for example, Program)
> java lexer.Lexer sample_files/simple.x
READLINE: program { int i int j
program     left: 0        right: 6        line: 1        Program
{           left: 8        right: 8        line: 1        LeftBrace
int         left: 10       right: 12       line: 1        Int
i           left: 14       right: 14       line: 1        Identifier
int         left: 16       right: 18       line: 1        Int
j           left: 20       right: 20       line: 1        Identifier
READLINE: i = i + j + 7
/* Remainder of output omitted for brevity, see Appendix A */
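Because every field in the row format has a fixed minimum width, a single java.util.Formatter pattern can produce each token line. The class TokenRowFormat and method format are hypothetical; in the Lexer the arguments would come from the token's own accessors, whatever they are named there.

```java
// Hypothetical helper: formats one token row as described in requirement 4.
final class TokenRowFormat {
    private TokenRowFormat() {}

    // %-11s: description left aligned in 11 columns; each %-8d: number left aligned
    // in 8 columns; single spaces separate the fields; the token kind comes last.
    static String format(String description, int left, int right, int line, String kind) {
        return String.format("%-11s left: %-8d right: %-8d line: %-8d %s",
                description, left, right, line, kind);
    }
}
```

For example, `System.out.println(TokenRowFormat.format("program", 0, 6, 1, "Program"));` prints the first token row of the sample output. Note that a width in a format specifier is a minimum, so a description longer than 11 characters pushes the remaining columns to the right rather than being truncated.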
5. Lexer output must be updated to include a printout, with line numbers, of each line read from the source
file. Line numbers here should be printed in 3 columns, right aligned. Note that when an error is encountered,
the error should be reported as usual, and the lines of the source file should be output, with line numbers, up to
and including the error line.
  1: program { int i int j
  2: i = i + j + 7
  3: j = write(i)
  4: }
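A sketch of the numbered listing, assuming the lexer keeps each source line it reads in a List<String>. The class SourceListing and method listing are hypothetical names. Passing the error line as lastLine gives the truncated listing required when an error occurs; passing the number of lines read gives the full listing.

```java
import java.util.List;

// Hypothetical helper: builds the numbered source listing described in requirement 5.
final class SourceListing {
    private SourceListing() {}

    // Formats lines 1..lastLine (inclusive), one per row, as "%3d: text".
    // lastLine is clamped to the number of lines available.
    static String listing(List<String> lines, int lastLine) {
        StringBuilder out = new StringBuilder();
        int end = Math.min(lastLine, lines.size());
        for (int i = 0; i < end; i++) {
            // Line numbers are 1-based and right aligned in 3 columns.
            out.append(String.format("%3d: %s", i + 1, lines.get(i))).append('\n');
        }
        return out.toString();
    }
}
```

The %3d width keeps the colons aligned for files of up to 999 lines, which matches the three-digit indentation shown in Appendix A.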
Appendix A
The complete output for simple.x (the indentation you see in the file listing allows for three-digit line numbers):
> java lexer.Lexer sample_files/simple.x
READLINE: program { int i int j
program     left: 0        right: 6        line: 1        Program
{           left: 8        right: 8        line: 1        LeftBrace
int         left: 10       right: 12       line: 1        Int
i           left: 14       right: 14       line: 1        Identifier
int         left: 16       right: 18       line: 1        Int
j           left: 20       right: 20       line: 1        Identifier
READLINE: i = i + j + 7
i           left: 3        right: 3        line: 2        Identifier
=           left: 5        right: 5        line: 2        Assign
i           left: 7        right: 7        line: 2        Identifier
+           left: 9        right: 9        line: 2        Plus
j           left: 11       right: 11       line: 2        Identifier
+           left: 13       right: 13       line: 2        Plus
7           left: 15       right: 15       line: 2        INTeger
READLINE: j = write(i)
j           left: 3        right: 3        line: 3        Identifier
=           left: 5        right: 5        line: 3        Assign
write       left: 7        right: 11       line: 3        Identifier
(           left: 12       right: 12       line: 3        LeftParen
i           left: 13       right: 13       line: 3        Identifier
)           left: 14       right: 14       line: 3        RightParen
READLINE: }
}           left: 0        right: 0        line: 4        RightBrace
  1: program { int i int j
  2: i = i + j + 7
  3: j = write(i)
  4: }