The lexer recognizes the following lexemes:
Block quotes can also be parsed with this code. The lexer normally doesn't look for block quotes by itself; you need to parse them explicitly, as explained in a later section.
Only ASCII characters (bit 7 cleared) can be used for lexemes, except within quoted strings.
    0xffff_ff_ff 12_543 0b1010_0011_1100_0_0_0_0
Here are the prefixes to be used for different bases:
    int LEisIDENT(const char *v);
The return value is nonzero iff the current token matches the given identifier.
Inside the quoted string, you may use any non-ASCII byte in addition to the normal ASCII set. A quoted string may also span multiple lines. To encode some special characters, the backslash character is used. Here is a list of recognized escape sequences:
    \d   => double quote
    \q   => single quote
    \s   => backslash
    \n   => newline (ASCII 10)
    \t   => tab
    \e   => escape (ASCII 27)
    \xCC => any byte, CC are hexadecimal digits
The value (LEval) contains the decoded form of the quoted string, without the surrounding quotes.
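The escape table above can be illustrated with a small standalone decoder. This is only a sketch and not the lexer's own code: the names decode_escapes and hex_val are hypothetical, and returning -1 for an unknown or truncated escape is an assumption about reasonable error handling, not documented behavior.

```c
#include <stddef.h>

/* Hypothetical helper: numeric value of one hexadecimal digit, or -1. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical decoder for the body of a quoted string (the text between
 * the quotes). Writes the decoded bytes to dst and returns their count.
 * Returns -1 on a malformed escape or when dst is too small.
 * The output is not NUL-terminated because \x00 is a valid byte.
 */
long decode_escapes(const char *src, size_t len, unsigned char *dst, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        int c = (unsigned char)src[i++];

        if (c == '\\') {
            if (i >= len) return -1;        /* backslash at end of input */
            switch (src[i++]) {
            case 'd': c = '"';  break;      /* double quote */
            case 'q': c = '\''; break;      /* single quote */
            case 's': c = '\\'; break;      /* backslash */
            case 'n': c = 10;   break;      /* newline */
            case 't': c = '\t'; break;      /* tab */
            case 'e': c = 27;   break;      /* escape */
            case 'x': {                     /* \xCC: two hex digits */
                int hi, lo;
                if (len - i < 2) return -1;
                hi = hex_val((unsigned char)src[i]);
                lo = hex_val((unsigned char)src[i + 1]);
                if (hi < 0 || lo < 0) return -1;
                c = hi * 16 + lo;
                i += 2;
                break;
            }
            default:
                return -1;                  /* not in the table above */
            }
        }
        if (n >= cap) return -1;            /* output buffer full */
        dst[n++] = (unsigned char)c;
    }
    return (long)n;
}
```

For instance, decoding the six source characters \x41\d with this sketch yields the two bytes A and ".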
Call
    int LEpair(int lp, int rp);
in order to parse a block quote. The argument lp is the character code for the "left parenthesis" and rp is the code for the right one. When LEpair is called, the current token has to be equal to lp. The function then scans the input for a matching right parenthesis. While doing this, it skips over quoted strings, but doesn't ignore comments. It also handles nested parentheses.
The return value is 0 if there was a matching right parenthesis, otherwise it's nonzero. In case of success, the variable LEval holds the contents of the block quote, without the quoting parentheses. This content also includes any comments.
You should be careful not to leave any dangling quote character within block-quoted text. For example, the following won't work:
    test_block_quote {
      the following parenthesis is ignored because it is inside a
      quoted string "{". However, if you have some dangling quote
      characters, those will actually start a quoted string which
      won't end within the block quoted text and cause the closing
      parenthesis to be skipped.
    }
This also applies to comments within the block quoted string because comments aren't recognized there.
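The matching rule and the dangling-quote pitfall can be reproduced with a short standalone scanner. This is a sketch, not the actual LEpair: the name match_pair is hypothetical, it works on a NUL-terminated string and returns an index instead of filling LEval, and it assumes that both " and ' open a quoted string and that a backslash inside a string protects the next character.

```c
#include <stddef.h>

/*
 * Hypothetical scanner: s[0] must be lp. Returns the index of the rp that
 * matches it, or -1 if there is none. Nested lp/rp pairs are counted, and
 * everything inside a quoted string is skipped. Comments get no special
 * treatment, as in the description above.
 */
long match_pair(const char *s, int lp, int rp)
{
    size_t i;
    int depth = 0;
    int quote = 0;  /* 0 outside a quoted string, else the opening quote */

    if ((unsigned char)s[0] != lp)
        return -1;

    for (i = 0; s[i] != '\0'; i++) {
        int c = (unsigned char)s[i];

        if (quote) {
            if (c == '\\' && s[i + 1] != '\0')
                i++;                    /* skip the escaped character */
            else if (c == quote)
                quote = 0;              /* end of the quoted string */
        } else if (c == '"' || c == '\'') {
            quote = c;                  /* start of a quoted string */
        } else if (c == lp) {
            depth++;
        } else if (c == rp) {
            if (--depth == 0)
                return (long)i;         /* found the matching close */
        }
    }
    return -1;                          /* ran out of input */
}
```

With this sketch, the input { "}" } matches at its final character, while { won't end } fails: the apostrophe opens a quoted string that swallows the closing parenthesis.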
Call
    void LEinit();
in order to initialize the lexer. At compile time, you may modify buffer sizes for I/O using the macros
    #define LEbufSIZE 256
    #define LEinpSIZE 256
The former is the initial size of the token buffer. The latter is the size of the input buffer.
After you've initialized the lexer, you can give it some input using:
    int LEfile(char *fn);
    void LEmemory(unsigned char *buf, int len, char *name);
The first one uses the given file and returns nonzero if there was a problem opening it. The second one uses a memory block and stores the given name so that you can refer to it later when printing error messages.
After setting the input, you need to call LEnext in order to get the first token. When you're done with the lexer, you should call
    void LEclose();
in order to close the associated file and reset the lexer.
    LEtIDENT= 256, LEtNUMBER, LEtQUOTED, LEtEOF,
    LEtBADNUM, LEtBADCHAR, LEtUNTERMQ, LEtBADQ
LEsub will contain the subtype for the token as described in sections regarding numbers and quoted strings.
LEval will contain the string associated with the token.
LEfil and LElin can be used to determine the file name and line number for token position. Note that this isn't accurate for multi-line tokens such as quoted strings.
For tokens signalling end of input or errors, only LEtok is valid; the rest shouldn't be used. In case of an error, the lexer cannot recover. The only thing to do is to close the lexer and abandon processing.
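A caller typically needs to tell the error codes apart from ordinary tokens. The sketch below shows one way to do that. The helper names is_error_token and token_name are hypothetical. The enum is repeated only so the fragment compiles on its own; with the real header it would already be defined. Treating codes below 256 as single-character tokens is an assumption, suggested by LEtIDENT starting at 256 and by LEpair comparing the current token with a character code.

```c
/* Token codes as listed above; in C the enumerators after LEtIDENT
 * take the values 257, 258, ... in order. */
enum {
    LEtIDENT = 256, LEtNUMBER, LEtQUOTED, LEtEOF,
    LEtBADNUM, LEtBADCHAR, LEtUNTERMQ, LEtBADQ
};

/* Hypothetical helper: nonzero for the four error codes, after which the
 * lexer cannot recover and should be closed. */
int is_error_token(int tok)
{
    return tok >= LEtBADNUM && tok <= LEtBADQ;
}

/* Hypothetical helper: printable name of a token code for diagnostics. */
const char *token_name(int tok)
{
    switch (tok) {
    case LEtIDENT:   return "LEtIDENT";
    case LEtNUMBER:  return "LEtNUMBER";
    case LEtQUOTED:  return "LEtQUOTED";
    case LEtEOF:     return "LEtEOF";
    case LEtBADNUM:  return "LEtBADNUM";
    case LEtBADCHAR: return "LEtBADCHAR";
    case LEtUNTERMQ: return "LEtUNTERMQ";
    case LEtBADQ:    return "LEtBADQ";
    default:
        /* Assumption: codes 0..255 are single-character tokens. */
        return (tok >= 0 && tok < 256) ? "single character" : "unknown";
    }
}
```

A processing loop could check is_error_token(LEtok) after each token and, on a nonzero result, report token_name(LEtok) together with LEfil and LElin before closing the lexer.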
When you're done with the current token, you should call LEnext() to get the next token. Calling LEnext() after end of input continues to generate LEtEOF tokens.