From: dtrg
Date: Sat, 4 Feb 2006 00:43:32 +0000 (+0000)
Subject: Moved LLgen documentation from the main ACK documentation directory into LLgen's...
X-Git-Tag: release-6-0-pre-1~147
X-Git-Url: https://git.ndcode.org/public/gitweb.cgi?a=commitdiff_plain;h=953a565a10592f5b063f37f45322f50059e5cf3d;p=ack.git

Moved LLgen documentation from the main ACK documentation directory into LLgen's own documentation directory.
---

diff --git a/util/LLgen/doc/LLgen.1 b/util/LLgen/doc/LLgen.1
new file mode 100644
index 000000000..0436ffa30
--- /dev/null
+++ b/util/LLgen/doc/LLgen.1
@@ -0,0 +1,139 @@
+.\" $Id$
+.TH LLGEN 1 "$Revision$"
+.ad
+.SH NAME
+LLgen \- an extended LL(1) parser generator
+.SH SYNOPSIS
+LLgen [ \-vxwans ] [ \-j[\fInum\fP] ] [ \-l\fInum\fP ] [ \-h\fInum\fP ] file ...
+.SH DESCRIPTION
+\fILLgen\fP
+converts a context-free grammar into a set of
+functions which form a recursive descent parser with no backtrack.
+The grammar may be ambiguous;
+ambiguities can be broken by user specifications.
+.PP
+\fILLgen\fP
+reads each
+\fIfile\fP
+in sequence.
+Together, these files must constitute a context-free grammar.
+For each file,
+\fILLgen\fP
+generates an output file, which must be compiled by the
+C-compiler.
+In addition, it generates the files
+\fILpars.c\fP
+and
+\fILpars.h\fP.
+\fILpars.h\fP
+contains the
+\fIdefine\fP
+statements that associate the
+\fILLgen\fP-assigned `token-codes' with user-declared `token-names'.
+This allows other source files, for instance the source file
+containing the lexical analyzer,
+to access the token-codes by
+using the token-names.
+\fILpars.c\fP
+contains the error recovery routines and tables. It must also
+be compiled by the C-compiler. When the generated parser uses non-correcting
+error recovery (the
+\fB\-n\fP
+option),
+\fILLgen\fP
+also generates a file
+\fILncor.c\fP
+that contains the non-correcting recovery mechanism.
+.PP
+\fILLgen\fP
+will only update those output files that differ from their previous
+version.
+This allows
+\fILLgen\fP
+to be used conveniently with
+\fImake\fP(1).
+.PP
+To obtain a working program, the user must also supply a
+lexical analyzer, as well as
+\fImain\fP
+and
+\fILLmessage\fP,
+an error reporting routine;
+\fILex\fP(1)
+is a useful program for creating lexical analyzers usable
+by
+\fILLgen\fP.
+.PP
+\fILLgen\fP accepts the following flags:
+.IP \fB\-v\fP
+create a file called
+\fILL.output\fP,
+which contains a description of the conflicts that
+were not resolved.
+If the flag is given more than once,
+\fILLgen\fP
+will be more "verbose".
+If it is given three times, a complete description of the
+grammar will be supplied.
+.IP \fB\-x\fP
+the sets that are computed are extended with the nonterminal
+symbols and these extended sets are also included in the
+\fILL.output\fP
+file.
+.IP \fB\-w\fP
+no warnings are given.
+.IP \fB\-a\fP
+Produce ANSI C function headers and prototypes.
+.IP \fB\-n\fP
+Produce a parser with non-correcting error recovery.
+.IP \fB\-s\fP
+Simulate the calling of all defined subparsers in all semantic actions. When
+using non-correcting error recovery, subparsers that are called in semantic
+actions may cause problems; this flag provides a `brute-force' solution.
+.IP \fB\-j\fP[\fInum\fP]
+when this flag is given, \fILLgen\fP will generate dense switches,
+so that the compiler can generate jump tables for them. This will only be
+done for switches whose density lies between
+\fIlow_percentage\fP and \fIhigh_percentage\fP, as explained below.
+Usually, compilers generate a jump table when the density of the switch
+is above a certain threshold. To make jump tables be used more often,
+\fIhigh_percentage\fP must be set to this threshold, and \fIlow_percentage\fP
+must be set to a lower minimum threshold. There is a time-space trade-off here.
+.I num
+is the minimum number of cases in a switch for the \fB\-j\fP option to be
+effective. The default value (if
+.I num
+is not given) is 8.
+.IP \fB\-l\fP\fInum\fP
+The \fIlow_percentage\fP, as described above. Default value is 10.
+.IP \fB\-h\fP\fInum\fP
+The \fIhigh_percentage\fP, as described above. Default value is 30.
+.SH FILES
+LL.output	verbose output file
+.br
+Lpars.c	the error recovery routines
+.br
+Lncor.c	non-correcting error recovery mechanism
+.br
+Lpars.h	defines for token names
+.SH "SEE ALSO"
+\fIlex\fP(1)
+.br
+\fImake\fP(1)
+.br
+\fILLgen, an Extended LL(1) Parser Generator\fP
+by C.J.H. Jacobs.
+.br
+\fITop-down Non-Correcting Error Recovery in LLgen\fP
+by A.W. van Deudekom and P.J. Kooiman
+.SH DIAGNOSTICS
+Are intended to be self-explanatory. They are reported
+on standard error. A more detailed report is found in the
+\fILL.output\fP
+file.
+.SH AUTHOR
+Ceriel J. H. Jacobs
+.br
+The non-correcting error recovery mechanism was written by
+A.W. van Deudekom and P.J. Kooiman.
diff --git a/util/LLgen/doc/LLgen.n b/util/LLgen/doc/LLgen.n
new file mode 100644
index 000000000..3d9786a5b
--- /dev/null
+++ b/util/LLgen/doc/LLgen.n
@@ -0,0 +1,1077 @@
+.\" $Id$
+.\" Run this paper off with
+.\" refer [options] -p LLgen.refs LLgen.doc | [n]eqn | tbl | (nt)roff -ms
+.if '\*(>.'' \{\
+. if '\*(<.'' \{\
+. if n .ds >. .
+. if n .ds >, ,
+. if t .ds <. .
+. if t .ds <, ,\
+\}\
+\}
+.cs 5 22u
+.ND
+.EQ
+delim @@
+.EN
+.TL
+LLgen, an extended LL(1) parser generator
+.AU
+Ceriel J. H. Jacobs
+.AI
+Dept. of Mathematics and Computer Science
+Vrije Universiteit
+Amsterdam, The Netherlands
+.AB
+\fILLgen\fR provides a
+tool for generating an efficient recursive descent parser
+with no backtrack from
+an Extended Context Free syntax.
+The \fILLgen\fR
+user specifies the syntax, together with code
+describing actions associated with the parsing process.
+\fILLgen\fR
+turns this specification into a number of subroutines that handle the
+parsing process.
+.PP
+The grammar may be ambiguous.
+\fILLgen\fR contains both static and dynamic facilities
+to resolve these ambiguities.
+.PP
+The specification can be split into several files, for each of
+which \fILLgen\fR generates an output file containing the
+corresponding part of the parser.
+Furthermore, only output files that differ from their previous
+version are updated.
+Other output files are not affected in any
+way.
+This allows the user to recompile only those output files that have
+changed.
+.PP
+The subroutine produced by \fILLgen\fR calls a user-supplied routine
+that must return the next token. This way, the input to the
+parser can be split into single characters or higher level
+tokens.
+.PP
+An error recovery mechanism is generated almost completely
+automatically.
+It is based on so-called \fBdefault choices\fR, which are
+implicitly or explicitly specified by the user.
+.PP
+\fILLgen\fR has successfully been used to create recognizers for
+Pascal, C, and Modula-2.
+.AE
+.NH
+Introduction
+.PP
+\fILLgen\fR
+provides a tool for generating an efficient recursive
+descent parser with no backtrack from an Extended Context Free
+syntax.
+A parser generated by
+\fILLgen\fR
+will be called
+\fILLparse\fR
+for the rest of this document.
+It is assumed that the reader has some knowledge of LL(1) grammars and
+recursive descent parsers.
+For a survey on the subject, see reference
+.[ (
+griffiths
+.]).
+.PP
+Extended LL(1) parsers are an extension of LL(1) parsers. They are
+derived from an Extended Context-Free (ECF) syntax instead of a Context-Free
+(CF) syntax.
+ECF syntax is described in section 2.
+Section 3 provides an outline of a
+specification as accepted by
+\fILLgen\fR and also discusses the lexical conventions of
+grammar specification files.
+Section 4 provides a description of the way the
+\fILLgen\fR
+user can associate
+actions with the syntax. These actions must be written in the programming
+language C,
+.[
+kernighan ritchie
+.]
+which is also the target language of \fILLgen\fR.
+The error recovery technique is discussed in section 5.
+This section also discusses how the user can influence it.
+Section 6 discusses
+the facilities \fILLgen\fR offers
+to resolve ambiguities and conflicts.
+\fILLgen\fR offers facilities to resolve them both at parser
+generation time and during the execution of \fILLparse\fR.
+Section 7 discusses the
+\fILLgen\fR
+working environment.
+It also discusses the lexical analyzer that must be supplied by the
+user.
+This lexical analyzer must read the input stream and break it
+up into basic input items, called \fBtokens\fR for the rest of
+this document.
+Appendix A gives a summary of the
+\fILLgen\fR
+input syntax.
+Appendix B gives an example.
+It is very instructive to compare this example with the one
+given in reference
+.[ (
+yacc
+.]).
+It demonstrates the struggle \fILLparse\fR and other LL(1)
+parsers have with expressions.
+Appendix C gives an example of the \fILLgen\fR features
+allowing the user to recompile only those output files that
+have changed, using the \fImake\fR program.
+.[
+make
+.]
+.NH
+The Extended Context-Free Syntax
+.PP
+The extensions of an ECF syntax with respect to an ordinary CF syntax are:
+.IP 1. 10
+An ECF syntax contains the repetition operator: "N" (N represents a positive
+integer).
+.IP 2. 10
+An ECF syntax contains the closure set operator without and with
+upper bound: "*" and "*N".
+.IP 3. 10
+An ECF syntax contains the positive closure set operator without and with
+upper bound: "+" and "+N".
+.IP 4. 10
+An ECF syntax contains the optional operator: "?", which is a
+shorthand for "*1".
+.IP 5. 10
+An ECF syntax contains the square brackets "[" and "]", which can be
+used for grouping.
+.PP
+We can describe the syntax of an ECF syntax with an ECF syntax:
+.DS
+.ft CW
+grammar	: rule +
+	;
+.ft R
+.DE
+This grammar rule states that a grammar consists of one or more
+rules.
+.DS
+.ft CW
+rule	: nonterminal ':' productionrule ';'
+	;
+.ft R
+.DE
+A rule consists of a left-hand side, the nonterminal,
+followed by ":",
+the \fBproduce symbol\fR, followed by a production rule, followed by a
+";", in\%di\%ca\%ting the end of the rule.
+.DS
+.ft CW
+productionrule	: production [ '|' production ]*
+	;
+.ft R
+.DE
+A production rule consists of one or
+more alternative productions separated by "|". This symbol is called the
+\fBalternation symbol\fR.
+.DS
+.ft CW
+production	: term *
+	;
+.ft R
+.DE
+A production consists of a possibly empty list of terms.
+So, empty productions are allowed.
+.DS
+.ft CW
+term	: element repeats
+	;
+.ft R
+.DE
+A term is an element, possibly with a repeat specification.
+.DS
+.ft CW
+element	: LITERAL
+	| IDENTIFIER
+	| '[' productionrule ']'
+	;
+.ft R
+.DE
+An element can be a LITERAL, which basically is a single character
+between apostrophes; it can be an IDENTIFIER, which is either a
+nonterminal or a token; and it can be a production rule
+between square brackets.
+.DS
+.ft CW
+repeats	: '?'
+	| [ '*' | '+' ] NUMBER ?
+	| NUMBER ?
+	;
+.ft R
+.DE
+These are the repeat specifications discussed above. Notice that
+this specification may be empty.
+.PP
+The class of ECF languages
+is identical with the class of CF languages. However, in many
+cases recursive definitions of language features can now be
+replaced by iterative ones. This tends to reduce the number of
+nonterminals and gives rise to very efficient recursive descent
+parsers.
+.NH
+Grammar Specifications
+.PP
+The major part of an
+\fILLgen\fR
+grammar specification consists of an
+ECF syntax specification.
+Names in this syntax specification refer to either tokens or nonterminal
+symbols.
+\fILLgen\fR
+requires token names to be declared as such. This way, a typing
+error in a nonterminal name cannot cause the name to be silently
+accepted as a token name. The token declarations will be
+discussed later.
+A name will be regarded as a nonterminal symbol, unless it is declared
+as a token name.
+If there is no production rule for a nonterminal symbol, \fILLgen\fR
+will complain.
+.PP
+A grammar specification may also include some C routines,
+for instance the lexical analyzer and an error reporting
+routine.
+Thus, a grammar specification file can contain declarations,
+grammar rules and C-code.
+.PP
+Blanks, tabs and newlines are ignored, but may not appear in names or
+keywords.
+Comments may appear wherever a name is legal (which is almost
+everywhere).
+They are enclosed in
+/* ... */, as in C. Comments do not nest.
+.PP
+Names may be of arbitrary length, and can be made up of letters, underscore
+"\_" and non-initial digits. Upper and lower case letters are distinct.
+Only the first 50 characters are significant.
+Notice, however, that the names for the tokens will be used by the
+C-preprocessor.
+The number of significant characters therefore depends on the
+underlying C-implementation.
+A safe rule is to make the identifiers distinct in the first six
+characters, case ignored.
+.PP
+There are two kinds of tokens:
+those that are declared and are denoted by a name,
+and literals.
+.PP
+A literal consists of a character enclosed in apostrophes "'".
+The "\e" is an escape character within literals. The following escapes
+are recognized:
+.TS
+center;
+l l.
+\&'\en'	newline
+\&'\er'	return
+\&'\e''	apostrophe "'"
+\&'\e\e'	backslash "\e"
+\&'\et'	tab
+\&'\eb'	backspace
+\&'\ef'	form feed
+\&'\exxx'	"xxx" in octal
+.TE
+.PP
+Names representing tokens must be declared before they are used.
+This can be done using the "\fB%token\fR" keyword,
+by writing
.nf
.ft CW
.sp 1
+%token name1, name2, . . . ;
+.ft R
+.fi
+.PP
+\fILLparse\fR is designed to recognize special nonterminal
+symbols called \fBstart symbols\fR.
+\fILLgen\fR allows for more than one start symbol.
+Thus, grammars with more than one entry point are accepted.
+The start symbols must be declared explicitly using the
+"\fB%start\fR" keyword. It can be used whenever a declaration is
+legal, for instance:
+.nf
+.ft CW
+.sp 1
+%start LLparse, specification ;
+.ft R
+.fi
+.sp 1
+declares "specification" as a start symbol and associates the
+identifier "LLparse" with it.
+"LLparse" will now be the name of the C-function that must be
+called to recognize "specification".
+.NH
+Actions
+.PP
+\fILLgen\fR
+allows arbitrary insertions of actions within the right-hand side
+of a production rule in the ECF syntax. An action consists of a number of C
+statements, enclosed in the braces "{" and "}".
+.PP
+\fILLgen\fR
+generates a parsing routine for each rule in the grammar. The actions
+supplied by the user are just inserted in the proper place.
+There may also be declarations before the statements in the
+action, as
+the "{" and "}" are copied into the target code along with the
+action. The scope of these declarations terminates with the
+closing brace "}" of the action.
+.PP
+In addition to actions, it is also possible to declare local variables
+in the parsing routine, which can then be used in the actions.
+Such a declaration consists of a number of C variable declarations,
+enclosed in the braces "{" and "}". It must be placed
+right in front of the ":" in the grammar rule.
+The scope of these local variables consists of the complete
+grammar rule.
+.PP
+In order to facilitate communication between the actions and
+\fILLparse\fR,
+the parsing routines can be given C-like parameters.
+Each parameter must be declared separately, and each of these declarations must
+end with a semicolon.
+For the last parameter, the semicolon is optional.
+.PP
+So, for example
+.nf
+.ft CW
+.sp 1
+expr(int *pval;) { int fact; } :
+	/*
+	 * Rule with one parameter, a pointer to an int.
+	 * Parameter specifications are ordinary C declarations.
+	 * One local variable, of type int.
+	 */
+	factor (&fact) { *pval = fact; }
+	/*
+	 * factor is another nonterminal symbol.
+	 * One actual parameter is supplied.
+	 * Notice that the parameter passing mechanism is that
+	 * of C.
+	 */
+	[ '+' factor (&fact) { *pval += fact; } ]*
+	/*
+	 * remember the '*' means zero or more times
+	 */
+	;
+.sp 1
+.ft R
+.fi
+is a rule to recognize a number of factors, separated by "+", and
+to compute their sum.
+.PP
+\fILLgen\fR
+generates C code, so the parameter passing mechanism is that of
+C, as is shown in the example above.
+.PP
+Actions often manipulate attributes of the token just read.
+For instance, when an identifier is read, its name must be
+looked up in a symbol table.
+Therefore, \fILLgen\fR generates code
+such that at a number of places in the grammar rule
+it is well defined which token has last been read.
+After a token, the last token read is this token.
+After a "[" or a "|", the last token read is the next token to
+be accepted by \fILLparse\fR.
+At all other places, it is undefined which token has last been
+read.
+The last token read is available in the global integer variable
+\fILLsymb\fR.
+.PP
+The user may also specify C-code wherever an \fILLgen\fR-declaration is
+legal.
+Again, this code must be enclosed in the braces "{" and "}".
+This way, the user can define global declarations and
+C-functions.
+To avoid name-conflicts with identifiers generated by
+\fILLgen\fR, \fILLparse\fR only uses names beginning with
+"LL"; the user should avoid such names.
+.NH
+Error Recovery
+.PP
+The error recovery technique used by \fILLgen\fR is a
+modification of the one presented in reference
+.[ (
+automatic construction error correcting
+.]).
+It is based on \fBdefault choices\fR, which are just
+what the word says: default choices at
+every point in the grammar where there is a
+choice.
+Thus, in an alternation, one of the productions is marked as a
+default choice, and in a term with a non-fixed repetition
+specification there will also be a default choice (between
+doing the term (once more) and continuing with the rest of the
+production in which the term appears).
+.PP
+When \fILLparse\fR detects an error after having parsed the
+string @s@, the default choices enable it to compute one
+syntactically correct continuation,
+consisting of the tokens @t sub 1~...~t sub n@,
+such that @s~t sub 1~...~t sub n@ is a string of tokens that
+is a member of the language defined by the grammar.
+Notice that the computation of this continuation must
+terminate, which implies that the default choices may not
+invoke recursive rules.
+.PP
+At each point in this continuation, a certain number of other
+tokens could also be syntactically correct; for instance, the token
+@t@ is syntactically correct at point @t sub i@ in this
+continuation, if the string @s~t sub 1~...~t sub i~t~s sub 1@
+is a string of the language defined by the grammar for some
+string @s sub 1@ and @i >= 0@.
+.PP
+The set @T@
+containing all these tokens (including @t sub 1 ,~...,~t sub n@) is computed.
+Next, \fILLparse\fR discards zero
+or more tokens from its input, until a token
+@t@ \(mo @T@ is found.
+The error is then corrected by inserting @i@ (@i >= 0@) tokens
+@t sub 1~...~t sub i@, such that the string
+@s~t sub 1~...~t sub i~t~s sub 1@ is a string of the language
+defined by the grammar, for some @s sub 1@.
+Then, normal parsing is resumed.
+.PP
+The above is difficult to implement in a recursive descent
+parser, and is not the way \fILLparse\fR does it, but the
+effect is the same. In fact, \fILLparse\fR maintains a list
+of tokens that may not be discarded, which is adjusted as
+\fILLparse\fR proceeds. This list is just a representation
+of the set @T@ mentioned
+above. When an error occurs, \fILLparse\fR discards tokens until
+a token @t@ that is a member of this list is found.
+Then, it continues parsing, following the default choices,
+inserting tokens along the way, until this token @t@ is legal.
+The selection of
+the default choices must guarantee that this will always
+happen.
+.PP
+The default choices are explicitly or implicitly
+specified by the user.
+By default, the default choice in an alternation is the
+alternative with the shortest possible terminal production.
+The user can select one of the other productions in the
+alternation as the default choice by putting the keyword
+"\fB%default\fR" in front of it.
+.PP
+By default, for terms with a repetition count containing "*" or
+"?" the default choice is to continue with the rest of the rule
+in which the term appears, and
+.sp 1
+.ft CW
+.nf
+	term+
+.fi
+.ft R
+.sp 1
+is treated as
+.sp 1
+.nf
+.ft CW
+	term term* .
+.ft R
+.fi
+.PP
+It is also clear that it can never be the default choice to do
+the term (once more), because this could cause the parser to
+loop, inserting tokens forever.
+However, when the user does not want the parser to skip
+tokens that would not have been skipped if the term
+had been the default choice,
+this skipping can be prevented by
+using the keyword "\fB%persistent\fR".
+For instance, the rule
+.sp 1
+.ft CW
+.nf
+commandlist : command* ;
+.fi
+.ft R
+.sp 1
+could be changed to
+.sp 1
+.ft CW
+.nf
+commandlist : [ %persistent command ]* ;
+.fi
+.ft R
+.sp 1
+The effects of this in case of a syntax error are twofold:
+the set @T@ mentioned above will be extended as if "command" were
+in the default production, so that fewer tokens will be
+skipped.
+Also, if the first token that is not skipped is a member of the
+subset of @T@ arising from the grammar rule for "command",
+\fILLparse\fR will enter that rule.
+So, in fact the default choice
+is determined dynamically (by \fILLparse\fR).
+Again, \fILLgen\fR checks (statically)
+that \fILLparse\fR will always terminate, and if not,
+\fILLgen\fR will complain.
+.PP
+An important property of this error recovery method is that,
+once a rule is started, it will be finished.
+This means that all actions in the rule will be executed
+normally, so that the user can be sure that there will be no
+inconsistencies in his data structures because of syntax
+errors.
+Also, as the method is in fact error correcting, the
+actions in a rule only have to deal with syntactically correct
+input.
+.NH
+Ambiguities and conflicts
+.PP
+As \fILLgen\fR generates a recursive descent parser with no backtrack,
+it must at all times be able to determine what to do,
+based on the current input symbol.
+Unfortunately, this cannot be done for all grammars.
+Two kinds of conflicts can arise:
+.IP 1) 10
+the grammar rule is of the form "production1 | production2",
+and \fILLparse\fR cannot decide which production to choose.
+This we call an \fBalternation conflict\fR.
+.IP 2) 10
+the grammar rule is of the form "[ productionrule ]...",
+where ... specifies a non-fixed repetition count,
+and \fILLparse\fR cannot decide whether to
+choose "productionrule" once more, or to continue.
+This we call a \fBrepetition conflict\fR.
+.PP
+There can be several causes for conflicts: the grammar may be
+ambiguous, or the grammar may require a more complex parser
+than \fILLgen\fR can construct.
+The conflicts can be examined by inspecting the verbose
+(\fB\-v\fR) option output file.
+The conflicts can be resolved by rewriting the grammar
+or by using \fBconflict resolvers\fR.
+The mechanism described here is based on the attributed parsing
+of reference
+.[ (
+milton
+.]).
+.PP
+An alternation conflict can be resolved by putting an \fBif condition\fR
+in front of the first conflicting production.
+It consists of a "\fB%if\fR" followed by a
+C-expression between parentheses.
+\fILLparse\fR will then evaluate this expression whenever a
+token is met at this point on which there is a conflict, so
+the conflict will be resolved dynamically.
+If the expression evaluates to
+non-zero, the first conflicting production is chosen;
+otherwise one of the remaining ones is chosen.
+.PP
+An alternation conflict can also be resolved using the keywords
+"\fB%prefer\fR" or "\fB%avoid\fR". "\fB%prefer\fR"
+is equivalent in behaviour to
+"\fB%if\fR (1)". "\fB%avoid\fR" is equivalent to "\fB%if\fR (0)".
+In these cases, however, "\fB%prefer\fR" and "\fB%avoid\fR" should be used,
+as they resolve the conflict statically and thus
+give rise to better C-code.
+.PP
+A repetition conflict can be resolved by putting a \fBwhile condition\fR
+right after the opening bracket "[". This while condition
+consists of a "\fB%while\fR" followed by a C-expression between
+parentheses. Again, \fILLparse\fR will then
+evaluate this expression whenever a token is met
+at this point on which there is a conflict.
+If the expression evaluates to non-zero, the
+repeating part is chosen; otherwise the parser continues with
+the rest of the rule.
+Appendix B will give an example of these features.
+.PP
+A useful aid in writing conflict resolvers is the "\fB%first\fR" keyword.
+It is used to declare a C-macro that forms an expression
+returning 1 if the parameter supplied can start a specified
+nonterminal, for instance:
+.sp 1
+.nf
+.ft CW
+%first fmac, nonterm ;
+.ft R
+.sp 1
+.fi
+declares "fmac" as a macro with one parameter, whose value
+is a token number. If the parameter
+X can start the nonterminal "nonterm", "fmac(X)" is true;
+otherwise it is false.
+.NH
+The LLgen working environment
+.PP
+\fILLgen\fR generates a number of files: one for each input
+file, and two other files: \fILpars.c\fR and \fILpars.h\fR.
+\fILpars.h\fR contains "#\ define"s for the token names.
+\fILpars.c\fR contains the error recovery routines and tables.
+Only those output files that differ from their previous version
+are updated. See appendix C for a possible application of this
+feature.
+.PP
+The names of the output files are constructed as
+follows:
+in the input file name, the suffix after the last dot is
+replaced by a "c". If no dot is present in the input file
+name, ".c" is appended to it. \fILLgen\fR checks that the
+filename constructed this way in fact represents a previous
+version, or does not already exist.
+.PP
+The user must provide some environment to obtain a complete
+program.
+Routines called \fImain\fR and \fILLmessage\fR must be defined.
+Also, a lexical analyzer must be provided.
+.PP
+The routine \fImain\fR must be defined, as it must be in every
+C-program. It should eventually call one of the start-symbol
+routines.
+.PP
+The routine \fILLmessage\fR must accept one
+parameter, whose value is a token number, zero or -1.
+.br
+A zero parameter indicates that the current token (the one in
+the external variable \fILLsymb\fR) is deleted.
+.br
+A -1 parameter indicates that the parser expected end of file, but didn't get
+it.
+The parser will then skip tokens until end of file is detected.
+.br
+A parameter that is a token number (a positive parameter)
+indicates that this
+token is to be inserted in front of the token currently in
+\fILLsymb\fR.
+The user can give the token the proper attributes.
+Also, the user must take care that the token currently in
+\fILLsymb\fR is again returned by the \fBnext\fR call to the
+lexical analyzer, with the proper attributes.
+So, the lexical analyzer must have a facility to push back one
+token.
+.PP
+The user may also supply his own error recovery routines, or handle
+errors differently. For this purpose, the name of a routine to be called
+when an error occurs may be declared using the keyword \fB%onerror\fR.
+This routine takes two parameters.
+The first one is either the token number of the
+token expected, or 0. In the latter case, the error occurred at a choice.
+In both cases, the routine must ensure that the next call to the lexical
+analyzer returns the token that replaces the current one. Of course,
+that could well be the current one, in which case
+.I LLparse
+recovers from the error.
+The second parameter contains a list of tokens that are not skipped at the
+error point. The list is in the form of a null-terminated array of integers,
+whose address is passed.
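+.PP
+By way of illustration, a minimal \fILLmessage\fR could have the
+following structure. This is only a sketch: the reporting routines
+and the push-back call are assumptions of this sketch, not routines
+supplied by \fILLgen\fR; appendix B contains a complete working
+example.
+.sp 1
+.nf
+.ft CW
+extern int LLsymb;	/* the token in error */
+
+LLmessage(tok)
+int tok; {
+	if (tok > 0) {
+		/* Token "tok" is inserted in front of the token in
+		 * LLsymb. Give the inserted token its attributes here,
+		 * and make sure that the next call of the lexical
+		 * analyzer returns the current token again.
+		 */
+		push_back_current_token();
+		report_insertion(tok);
+	}
+	else if (tok == 0) {
+		/* the token in LLsymb is deleted */
+		report_deletion(LLsymb);
+	}
+	else {
+		/* tok == -1: end of file was expected */
+		report_garbage_at_end();
+	}
+}
+.ft R
+.fi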
+.PP
+The user must supply a lexical analyzer to read the input stream and
+break it up into tokens, which are passed to \fILLparse\fR.
+It should be an integer-valued function, returning the token number.
+The name of this function can be declared using the
+"\fB%lexical\fR" keyword.
+This keyword can be used wherever a declaration is legal and may appear
+only once in the grammar specification, for instance:
+.sp 1
+.nf
+.ft CW
+%lexical scanner ;
+.ft R
+.fi
+.sp 1
+declares "scanner" as the name of the lexical analyzer.
+The default name for the lexical analyzer is "yylex".
+The reason for this funny name is that a useful tool for constructing
+lexical analyzers is the
+.I Lex
+program,
+.[
+lex
+.]
+which generates a routine of that name.
+.PP
+The token numbers are chosen by \fILLgen\fR.
+The token number for a literal
+is the numerical value of the character in the local character set.
+If the tokens have a name,
+the "#\ define" mechanism of C is used to give them a value and
+to allow the lexical analyzer to return their token numbers symbolically.
+These "#\ define"s are collected in the file \fILpars.h\fR which
+can be "#\ include"d in any file that needs the token-names.
+The maximum token number chosen is defined in the macro \fILL_MAXTOKNO\fP.
+.PP
+The lexical analyzer must signal the end
+of input to \fILLparse\fR
+by returning a number less than or equal to zero.
+.NH
+Programs with more than one parser
+.PP
+\fILLgen\fR offers a simple facility for having more than one parser in
+a program: in this case, the user can change the names of global procedures,
+variables, etc., by giving a different prefix, like this:
+.sp 1
+.nf
+.ft CW
+%prefix XX ;
+.ft R
+.fi
+.sp 1
+The effect of this is that all global names start with XX instead of LL, for
+the parser that has this prefix. This holds for the variable \fILLsymb\fP,
+which is now called \fIXXsymb\fP, for the routine \fILLmessage\fP,
+which must now be called \fIXXmessage\fP, and for the macro \fILL_MAXTOKNO\fP,
+which is now called \fIXX_MAXTOKNO\fP.
+\fILL.output\fP is now \fIXX.output\fP, and \fILpars.c\fP and \fILpars.h\fP
+are now called \fIXXpars.c\fP and \fIXXpars.h\fP.
+.bp
+.SH
+References
+.[
+$LIST$
+.]
+.bp
+.SH
+Appendix A : LLgen Input Syntax
+.PP
+This appendix contains a description of the \fILLgen\fR input syntax,
+as an \fILLgen\fR specification. As a matter of fact, the current
+version of \fILLgen\fR is written with \fILLgen\fR.
+.nf
+.ft CW
+.sp 2
+/*
+ * First the declarations of the terminals
+ * The order is not important
+ */
+
+%token IDENTIFIER; /* terminal or nonterminal name */
+%token NUMBER;
+%token LITERAL;
+
+/*
+ * Reserved words
+ */
+
+%token TOKEN; /* %token */
+%token START; /* %start */
+%token PERSISTENT; /* %persistent */
+%token IF; /* %if */
+%token WHILE; /* %while */
+%token AVOID; /* %avoid */
+%token PREFER; /* %prefer */
+%token DEFAULT; /* %default */
+%token LEXICAL; /* %lexical */
+%token PREFIX; /* %prefix */
+%token ONERROR; /* %onerror */
+%token FIRST; /* %first */
+
+/*
+ * Declare LLparse to be a C-routine that recognizes "specification"
+ */
+
+%start LLparse, specification;
+
+specification
+	: declaration*
+	;
+
+declaration
+	: START
+	  IDENTIFIER ',' IDENTIFIER
+	  ';'
+	| '{'
+	  /* Read C-declaration here */
+	  '}'
+	| TOKEN
+	  IDENTIFIER
+	  [ ',' IDENTIFIER ]*
+	  ';'
+	| FIRST
+	  IDENTIFIER ',' IDENTIFIER
+	  ';'
+	| LEXICAL
+	  IDENTIFIER
+	  ';'
+	| PREFIX
+	  IDENTIFIER
+	  ';'
+	| ONERROR
+	  IDENTIFIER
+	  ';'
+	| rule
+	;
+
+rule	: IDENTIFIER parameters? ldecl?
+	  ':' productions
+	  ';'
+	;
+
+ldecl	: '{'
+	  /* Read C-declaration here */
+	  '}'
+	;
+
+productions
+	: simpleproduction
+	  [ '|' simpleproduction ]*
+	;
+
+simpleproduction
+	: DEFAULT?
+	  [ IF '(' /* Read C-expression here */ ')'
+	  | PREFER
+	  | AVOID
+	  ]?
+	  [ element repeats ]*
+	;
+
+element	: '{'
+	  /* Read action here */
+	  '}'
+	| '[' [ WHILE '(' /* Read C-expression here */ ')' ]?
+	  PERSISTENT?
+	  productions
+	  ']'
+	| LITERAL
+	| IDENTIFIER parameters?
+	;
+
+parameters
+	: '(' /* Read C-parameters here */ ')'
+	;
+
+repeats	: /* empty */
+	| [ '*' | '+' ] NUMBER?
+	| NUMBER
+	| '?'
+	;
+
+.fi
+.ft R
+.bp
+.SH
+Appendix B : An example
+.PP
+This example gives the complete \fILLgen\fR specification of a simple
+desk calculator. It has 26 registers, labeled "a" through "z",
+and accepts arithmetic expressions made up of the C operators
++, -, *, /, %, &, and |, with their usual priorities.
+The value of the expression is
+printed. As in C, an integer that begins with 0 is assumed to
+be octal; otherwise it is assumed to be decimal.
+.PP
+Although the example is short and not very complicated, it
+demonstrates the use of if and while conditions. In
+the example they are in fact used to reduce the number of
+nonterminals, and to reduce the overhead due to the recursion
+that would be involved in parsing an expression with an
+ordinary recursive descent parser. In an ordinary LL(1)
+grammar there would be one nonterminal for each operator
+priority. The example shows how we can do it all with one
+nonterminal, no matter how many priority levels there are.
+.sp 1
+.nf
+.ft CW
+{
+#include <stdio.h>
+#include <ctype.h>
+#define MAXPRIO 5
+#define prio(op) (ptab[op])
+
+struct token {
+	int t_tokno;	/* token number */
+	int t_tval;	/* Its attribute */
+} stok = { 0,0 }, tok;
+
+int nerrors = 0;
+int regs[26];	/* Space for the registers */
+int ptab[128];	/* Attribute table */
+
+struct token
+nexttok() {	/* Read next token and return it */
+	register c;
+	struct token new;
+
+	while ((c = getchar()) == ' ' || c == '\et') { /* nothing */ }
+	if (isdigit(c)) new.t_tokno = DIGIT;
+	else if (islower(c)) new.t_tokno = IDENT;
+	else new.t_tokno = c;
+	if (c >= 0) new.t_tval = ptab[c];
+	return new;
+} }
+
+%token DIGIT, IDENT;
+%start parse, list;
+
+list	: stat* ;
+
+stat	{ int ident, val; } :
+	%if (stok = nexttok(),
+	     stok.t_tokno == '=')
+	/* The conflict is resolved by looking one further
+	 * token ahead. The grammar is LL(2)
+	 */
+	  IDENT
+	  { ident = tok.t_tval; }
+	  '=' expr(1,&val) '\en'
+	  { if (!nerrors) regs[ident] = val; }
+	| expr(1,&val) '\en'
+	  { if (!nerrors) printf("%d\en",val); }
+	| '\en'
+	;
+
+expr(int level; int *val;) { int expr; } :
+	factor(val)
+	[ %while (prio(tok.t_tokno) >= level)
+	/* Swallow operators as long as their priority is
+	 * larger than or equal to the level of this invocation
+	 */
+	  '+' expr(prio('+')+1,&expr)
+	  { *val += expr; }
+	/* This states that '+' groups left to right. If it
+	 * should group right to left, the rule should read:
+	 * '+' expr(prio('+'),&expr)
+	 */
+	| '-' expr(prio('-')+1,&expr)
+	  { *val -= expr; }
+	| '*' expr(prio('*')+1,&expr)
+	  { *val *= expr; }
+	| '/' expr(prio('/')+1,&expr)
+	  { *val /= expr; }
+	| '%' expr(prio('%')+1,&expr)
+	  { *val %= expr; }
+	| '&' expr(prio('&')+1,&expr)
+	  { *val &= expr; }
+	| '|' expr(prio('|')+1,&expr)
+	  { *val |= expr; }
+	]*
+	/* Notice the "*" here. It is important.
+	 */
+	;
+
+factor(int *val;):
+	'(' expr(1,val) ')'
+	| '-' expr(MAXPRIO+1,val)
+	  { *val = -*val; }
+	| number(val)
+	| IDENT
+	  { *val = regs[tok.t_tval]; }
+	;
+
+number(int *val;) { int base; }
+	: DIGIT
+	  { base = (*val=tok.t_tval)==0?8:10; }
+	  [ DIGIT
+	    { *val = base * *val + tok.t_tval; }
+	  ]* ;
+
+%lexical scanner ;
+{
+scanner() {
+	if (stok.t_tokno) { /* a token has been inserted or read ahead */
+		tok = stok;
+		stok.t_tokno = 0;
+		return tok.t_tokno;
+	}
+	if (nerrors && tok.t_tokno == '\en') {
+		printf("ERROR\en");
+		nerrors = 0;
+	}
+	tok = nexttok();
+	return tok.t_tokno;
+}
+
+LLmessage(insertedtok) {
+	nerrors++;
+	if (insertedtok) { /* token inserted, save old token */
+		stok = tok;
+		tok.t_tval = 0;
+		if (insertedtok < 128) tok.t_tval = ptab[insertedtok];
+	}
+}
+
+main() {
+	register *p;
+
+	for (p = ptab; p < &ptab[128]; p++) *p = 0;
+	/* for letters, their attribute is their index in the regs array */
+	for (p = &ptab['a']; p <= &ptab['z']; p++) *p = p - &ptab['a'];
+	/* for digits, their attribute is their value */
+	for (p = &ptab['0']; p <= &ptab['9']; p++) *p = p - &ptab['0'];
+	/* for operators, their attribute is their priority */
+	ptab['*'] = 4;
+	ptab['/'] = 4;
+	ptab['%'] = 4;
+	ptab['+'] = 3;
+	ptab['-'] = 3;
+	ptab['&'] = 2;
+	ptab['|'] = 1;
+	parse();
+	exit(nerrors);
+} }
+.fi
+.ft R
+.bp
+.SH
+Appendix C : How to use \fILLgen\fR
+.PP
+This appendix demonstrates how \fILLgen\fR can be used in
+combination with the \fImake\fR program, to make effective use
+of the \fILLgen\fR-feature that it only changes output files
+when necessary. \fIMake\fR uses a "makefile", which
+is a file containing dependencies and associated commands.
+A dependency usually indicates that some files depend on other
+files. When a file depends on another file and is older than
+that other file, the commands associated with the dependency
+are executed.
+.PP
+So, \fImake\fR seems just the program that we always wanted.
+However, it
+is not very good at handling programs that generate more than
+one file.
+As usual, there is a way around this problem.
+A sample makefile follows:
+.sp 1
+.ft CW
+.nf
+# The grammar consists of the files decl.g, stat.g and expr.g.
+# The ".o"-files are the result of a C-compilation.
+
+GFILES = decl.g stat.g expr.g
+OFILES = decl.o stat.o expr.o Lpars.o
+LLOPT =
+
+# As make doesn't handle programs that generate more than one
+# file well, we just don't tell make about it.
+# We just create a dummy file, and touch it whenever LLgen is
+# executed. This way, the dummy in fact depends on the grammar
+# files.
+# Then, we execute make again, to do the C-compilations and
+# such.
+
+all: dummy
+	make parser
+
+dummy: $(GFILES)
+	LLgen $(LLOPT) $(GFILES)
+	touch dummy
+
+parser: $(OFILES)
+	$(CC) -o parser $(LDFLAGS) $(OFILES)
+
+# Some dependencies without actions:
+# make already knows what to do about them
+
+Lpars.o: Lpars.h
+stat.o: Lpars.h
+decl.o: Lpars.h
+expr.o: Lpars.h
+
+.fi
+.ft R
diff --git a/util/LLgen/doc/LLgen.refs b/util/LLgen/doc/LLgen.refs
new file mode 100644
index 000000000..df73595b8
--- /dev/null
+++ b/util/LLgen/doc/LLgen.refs
@@ -0,0 +1,54 @@
+%T An ALL(1) Compiler Generator
+%A D. R. Milton
+%A L. W. Kirchhoff
+%A B. R. Rowland
+%B Proc. of the SIGPLAN '79 Symposium on Compiler Construction
+%D August 1979
+%J SIGPLAN Notices
+%N 8
+%P 152-157
+%V 14
+
+%T Lex - A Lexical Analyser Generator
+%A M. E. Lesk
+%I Bell Laboratories
+%D October 1975
+%C Murray Hill, New Jersey
+%R Comp. Sci. Tech. Rep. No. 39
+
+%T Yacc: Yet Another Compiler Compiler
+%A S. C. Johnson
+%I Bell Laboratories
+%D 1975
+%C Murray Hill, New Jersey
+%R Comp. Sci. Tech. Rep. No. 32
+
+%T The C Programming Language
+%A B. W. Kernighan
+%A D. M. Ritchie
+%I Prentice-Hall, Inc.
+%C Englewood Cliffs, New Jersey
+%D 1978
+
+%A M. Griffiths
+%T LL(1) Grammars and Analysers
+%E F. L. Bauer and J. Eickel
+%B Compiler Construction, An Advanced Course
+%I Springer-Verlag
+%C New York, N.Y.
+%D 1974
+
+%T Make - A Program for Maintaining Computer Programs
+%A S. I. Feldman
+%J Software - Practice and Experience
+%V 10
+%N 8
+%P 255-265
+%D August 1979
+
+%T Methods for the Automatic Construction of Error Correcting Parsers
+%A J. R\*:ohrich
+%J Acta Informatica
+%V 13
+%P 115-139
+%D 1980
diff --git a/util/LLgen/doc/LLgen_NCER.n b/util/LLgen/doc/LLgen_NCER.n
new file mode 100644
index 000000000..3693a1525
--- /dev/null
+++ b/util/LLgen/doc/LLgen_NCER.n
@@ -0,0 +1,2712 @@
+.RP
+.TL
+
+
+
+Top-down Non-Correcting Error Recovery
+ in LLgen
+.AU
+Arthur van Deudekom
+Peter Kooiman
+.AI
+Department of Mathematics and Computer Science
+Vrije Universiteit
+Amsterdam
+
+
+
+
+
+Supervised by
+.AU
+dr. D. Grune
+.AI
+Department of Mathematics and Computer Science
+Vrije Universiteit
+Amsterdam
+
+.AB
+This paper describes the design and implementation of a parser
+generator with non-correcting error recovery based on the extended LL(1)
+parser generator LLgen. It describes a top-down algorithm for implementing
+this error recovery technique that can handle left-recursive grammars.
+The parser generator has been tested with several existing ACK-compilers,
+among them C and Modula-2. Various optimizations have been tried and are
+discussed in this paper.
+.AE
+.LP
+.nr PS 12
+.nr VS 14
+
+.NH
+Introduction
+.EQ
+delim $$
+.EN
+
+.nr PS 10
+.nr VS 12
+.RS
+.LP
+One of the trickier problems in constructing parser generators is what
+to do when the input to the generated parser is not well formed. Several
+approaches are known, most of which are `correcting', meaning that they
+modify the input to make it correct. However, in most cases there are
+several possible corrections, and often the one chosen will turn out
+to be the wrong one. As a result of such an incorrect choice, spurious error
+messages can occur. Every programmer knows from experience how the omission
+of a single `)' can on occasion lead to pages of error messages.
+
+.LP
+A radically different approach is to just discard all the input up to
+and including the offending token, and start with a clean slate at the
+token following the offending one. [RICHTER] describes how
+this idea can be used to construct a non-correcting error recovery system
+that will never introduce spurious error messages. It is, however,
+possible that errors are overlooked.
+
+.LP
+In this paper we describe the incorporation of this non-correcting error
+recovery into LLgen, an existing LL(1) parser generator.
+In this introduction, we will describe in detail this non-correcting error
+recovery technique, give an overview of LLgen and how it handles
+errors, and finally describe how we have incorporated non-correcting
+error recovery in LLgen.
+.RE
+
+.NH 2
+Non-correcting syntax error recovery
+
+.LP
+Richter describes how syntax error recovery can be done
+without making any corrections to the input text. Richter gives three
+reasons why recovery without correction is desirable:
+
+.IP 1
+In most cases there are many possible corrections, the choice among which
+will severely influence the further processing of the input. Thus, the
+probability of selecting the right correction is not high.
+
+.IP 2
+The harm done by selecting the wrong correction is often unlimited.
+
+.IP 3
+The loss of information to the user of a non-correcting recovery technique
+need not be grave.
+
+.LP
+The non-correcting technique described by Richter can be summarized as
+follows: When a syntax error has occurred, the input up to and including the
+erroneous symbol is discarded; the remainder of the
+input is processed by a substring parser of the input
+language, that is, a parser that recognizes any substring of a string in the input
+language. When the substring parser detects a syntax error, the offending
+symbol is reported as another error, and the input up to and including the
+erroneous symbol is discarded. The process is then repeated with the remaining input, possibly
+finding other syntax errors, until all the input is scanned.
+This process yields what Richter calls a
+.I
+suffix analysis
+.R
+of an input string. Formally, given an input string
+.I x
+, suffix analysis produces a set of strings $w sub k$ and a set of symbols
+$a sub k$ such that
+.br
+
+.IP
+$x~ =~ w sub 0 a sub 0 w sub 1 a sub 1~...w sub n-1 a sub n-1 w sub n$
+.LP
+and such that:
+.br
+.IP
+ $w sub 0$ is the longest prefix of $x$ that is a prefix of
+a string in the input language L, formally: there is a string $y$ such that
+$w sub 0 y$ is in L, but there is no string $z$ such that $w sub 0 a sub 0 z$
+is in L;
+.IP
+For $0 < k < n$, $w sub k$ is a longest substring of $x$ that is also a
+substring of a string in L, formally: there are strings $u$ and $v$ such that
+$u w sub k v$ is in L, but there are no strings $y$ and $z$ such that
+$y w sub k a sub k z$ is in L;
+.IP
+$w sub n$ is a substring of $x$
+that is a substring of a string in L, formally:
+there exist $u$ and $v$, such that $u w sub n v$ is in L. Note that
+$w sub n$ need not be a suffix of a string in L; if $x$ represents incomplete
+input, $w sub n$ is not a suffix of a string in L.
+
+.LP
+Now, the $a sub k$ indicate points at which an error is detected. The
+"real" error need not be at $a sub k$; it may have occurred anywhere
+within $w sub k a sub k$.
+In his paper, Richter shows that, although this method may miss errors, it
+will never introduce spurious errors.
+
+.LP
+For implementing the technique, a parser that recognizes any
+substring of the input language is needed. If we confine ourselves to
+syntactical analysis, it is sufficient to construct a substring
+recognizer. Richter himself does not give a practical construction, but
+[CORMACK] describes how an LR substring parser can be constructed
+that handles BC-LR(1,1) grammars. In this paper, we describe the construction
+of an LL substring recognizer that can handle any grammar. Furthermore,
+our recognizer is actually a suffix recognizer, that is, a recognizer that
+recognizes any suffix of a string in the input language. Our suffix recognizer has the
+correct-prefix property,
+meaning that it detects the first syntax error as early as possible
+in a left-to-right scan of the input.
+Specifically, if the input language
+is L and the invalid input is $x$, it finds a string $w$ and an input symbol
+$a$ such that $x = way$, where there is a string $z$ such that $wz$
+is in L, but no string $z$ such that $waz$ is in L.
+Because the suffix recognizer has this correct-prefix property, it can be
+used as a substring parser: it will detect the first input symbol that
+is not part of a substring of the language. Because it is a suffix recognizer,
+it will additionally detect incomplete input, because in that case
+the parser will not be in an accepting state at the end of the input.
+
+.NH 2
+Overview of LLgen
+
+.LP
+LLgen is an extended LL(1) parser generator. For a complete description,
+see [GRUNE].
+LLgen can actually handle grammars that are not LL(1), because it allows
+the use of conflict resolvers. In case of an LL(1) conflict, these resolvers
+are used to statically or dynamically decide which rule to use. As we will see
+later, this feature makes it necessary for the suffix recognizer to
+handle grammars that are not LL(1). Semantic actions can occur anywhere
+in the grammar rules, and they are executed when their position is
+reached during parsing. A typical LLgen rule looks like
+.br
+.IP
+S: A {
+.I action
+} B
+.LP
+where the action is a piece of C-code that will be executed
+when the parser is using the rule for S and has recognized A.
+
+.LP
+LLgen-generated parsers use correcting syntax error recovery, based on a
+scheme designed by R\*:ohrich [ROEHRICH], inserting or deleting symbols at the point of error detection
+until correct input results. This means that actions in the parser will
+always be executed in an order that could also have resulted from
+syntactically correct input, and most importantly, once a grammar rule
+is started it is guaranteed to be completed. This means that syntactic
+errors can never result in inconsistencies for the actions. Actions
+only have to deal with syntactically correct input. In a nutshell, the
+error recovery in LLgen parsers works as follows: Suppose the parser is
+presented with correct input that breaks off before the end. The error
+recovery mechanism now provides a continuation path, chosen in such a
+way that all active rules are left as soon as possible. Effectively, the
+continuation path is the `shortest way out'. The symbols on this path are
+called `acceptable', and end-of-file is also `acceptable'. Furthermore, at
+each point along this `shortest path' there can be other terminals that
+would be correct; these are `acceptable' as well. Now, when an
+error occurs, all symbols that are not acceptable are discarded, until
+an acceptable symbol appears in the input. The tokens on the path up to
+but not including the acceptable input symbol are inserted.
+From then on, normal parsing resumes.
+
+.NH 2
+Incorporation of non-correcting error recovery in LLgen
+
+.LP
+An important consideration in incorporating the non-correcting recovery
+in LLgen was that correct programs should suffer as little as possible,
+as far as compilation speed is concerned. Furthermore, the existing error
+recovery method has the highly desirable property that rules that are
+started will be finished too, thus ensuring that errors in the
+input text will not cause inconsistencies in the semantic actions. We have
+implemented the non-correcting error recovery in such a way that this
+property is preserved.
+
+.LP
+The way we have achieved these goals is by actually including
+the suffix recognizer as a `second recognizer' in the generated parser.
+Correct programs are handled in the usual way by the parser, but if an error
+occurs the following happens: instead of going to the standard error
+recovery routine, the parser starts executing the non-correcting error
+handler. This process continues, reporting all errors, until the
+end of the input text is reached. Then, control is handed back to
+the standard error recovery routine. This routine will now think
+there is no more input, and thus start inserting tokens so as to construct
+a `shortest way out'. This ensures that all rules that were started are
+also finished, and no inconsistencies can occur in the semantic actions.
+However, this method does require some modifications to the error reporting
+routine. Normally, if the generated parser inserts a token, it reports
+this to the user, but in this case this is undesirable. The insertions only
+serve to maintain consistency in the semantic actions
+and do not signify errors, so reporting of insertions should be suppressed.
+.bp
+.nr PS 12
+.nr VS 14
+.PS
+boxwid = boxwid / 1.5
+boxht = boxht / 1.5
+arcrad = arcrad / 1.5
+movewid = movewid / 1.5
+moveht = moveht / 1.5
+arrowwid = arrowwid / 1.5
+arrowht = arrowht / 1.5
+arrowhead = arrowhead / 1.5
+linewid = linewid / 1.5
+lineht = lineht / 1.5
+.PE
+.NH
+The LL suffix parser
+
+.nr PS 10
+.nr VS 12
+.RS
+.LP
+In this chapter, we describe the construction of the LL suffix parser.
+The described parser is not restricted to LL(1) grammars, because the
+presence of conflict resolvers in LLgen allows for more general grammars,
+which may even be left-recursive. We start this chapter with a discussion
+of the implications of conflict resolvers, and continue with descriptions
+of the parser algorithm, the data structures used,
+the handling of left and right recursion, and some possible optimizations.
+.RE
+
+.NH 2
+LLgen conflict resolvers and their implications
+
+.LP
+In grammars that are nearly but not completely LL(1), conflicts
+will arise in the two places where parsing decisions are made: the choice
+of which alternative to start (`alternation conflicts') and the decision
+to stop or continue a repeated item (`repetition conflicts'). In order to
+allow LLgen to handle this type of grammar, the user can
+specify conflict resolvers in those places where conflicts arise.
+These resolvers are Boolean expressions labeling an alternative,
+and are evaluated when a conflict arises during parsing. If the
+expression evaluates to `true', the labeled alternative will be taken.
+The Boolean expressions are expressions in C, and can consult
+any information available at the point they occur.
+However, if a syntactic error has occurred in the input, and the non-correcting
+error recovery starts, we can no longer rely on the conflict resolvers to
+guide parsing decisions. The suffix recognizer is only concerned with
+syntax, and will not execute any semantic actions. It recognizes suffixes
+of correct input, but does not know or care what prefix would make
+the suffix a correct program; as a result, the information that conflict
+resolvers could use is not available, because the semantic actions
+that would build this information have not been executed.
+Therefore, the information used by the conflict resolvers is no longer
+reliable, and the suffix parser needs to be able to handle the underlying
+grammar without their help.
+In particular, it has to be able to handle
+left-recursive grammars.
+
+.NH 2
+The suffix parser algorithm
+
+.LP
+Our algorithm needs easy access to the grammar rules; in the description
+we assume there is an efficient way to access the grammar rules. In
+the next chapter we will describe the details of the actual implementation.
+For the moment, we will only consider grammars that are not left- or
+right-recursive. In the next section, we will discuss how the algorithm has to be adapted
+to handle left and right recursion.
+
+.LP
+Suppose the grammar is G, and the input to the suffix recognizer is
+$a sub 0 a sub 1 ... a sub n-1 a sub n$. Remember that parsing is
+always started by the `normal' LLgen-generated parser. It is only after
+a syntactic error has occurred that the suffix recognizer will be started.
+The input to the suffix recognizer is thus the `tail' of the input, starting
+at the first symbol after the position where the first syntax error was
+found.
+
+.LP
+Now, in order to get parsing going again, the parser scans the grammar
+for rules which contain symbol $a sub 0$ in the right-hand side:
+.br
+
+	A: $alpha ~ a sub 0 ~ beta$
+.br
+
+where $alpha$ and $beta$ represent a string of terminals and nonterminals,
+possibly empty. Now, for each of these rules found, and for any string
+$b sub 0 b sub 1$...$b sub m$ that can be generated by $beta$ it holds that
+$a sub 0 b sub 0 b sub 1$...$b sub m$ is a substring of some string in L.
+This can be shown as follows, supposing that the start symbol is S and
+S $-> sup * gamma$ A $delta$:
+.br
+
+S $-> sup * gamma$ A $delta$ $-> sup * gamma ~ alpha ~ a sub 0 beta ~ delta
+-> sup * gamma ~ alpha ~ a sub 0 b sub 0 b sub 1$...$b sub m delta$
+
+.br
+Of course, there may very well be more than one such string
+$b sub 0 b sub 1$..$b sub m$, and one of these strings can be empty as well, if
+$beta$ can produce empty. Now, in what we will call the
+.I
+predicting phase
+.R
+the algorithm will
+produce all possible symbols $b sub 0$. Then, in what we will call the
+.I
+accepting phase
+.R
+these symbols are matched against
+the input, and those not matching are discarded. Then, entering the next
+predicting phase, the algorithm will produce
+all symbols $b sub 1$, and match them against the next input symbol in
+the subsequent accepting phase,
+etc. In case one of the strings $b sub 0$...$b sub m$ is empty, or
+the end of one of the strings is reached, some way to continue is
+needed; we will discuss this later. First, let us see how the
+algorithm produces the strings $b sub 0$...$b sub m$.
+
+.LP
+For each rule in the grammar of the form
+.br
+
+	A: $alpha a sub 0 W sub 1 W sub 2$...$W sub p$
+.br
+
+with each $W sub k$ a terminal or nonterminal, a
+.I
+prediction graph
+.R
+is created that looks like this:
+
+.PS
+down; box "$a sub 0$"; arrow; box "$W sub 1$"; arrow
+box "$W sub 2$"; arrow dashed; box "$W sub p$"
+arrow; box "END" "$[A]$"
+.PE
+
+.LP
+The bottom element of these prediction graphs is an end-marker containing the
+left-hand side of the rule used. All these graphs have $a sub 0$ on top, and
+this $a sub 0$ is matched against the $a sub 0$ in the input in the
+accepting phase that follows, removing the
+$a sub 0$ from the graph. If the prediction graph is now empty, we have to find a way
+to continue; this case is treated later. First we will consider what to do if
+the prediction graph is not empty. There are two possibilities: either $W sub 1$ is a
+terminal, or it is a nonterminal.
+If it is a terminal, we are finished for
+the moment; if not, the algorithm scans for rules of the form
+.br
+
+	$W sub 1$: $U sub 1 U sub 2$...$U sub i$
+.br
+
+with each $U sub k$ a terminal or nonterminal. Now, the algorithm replaces
+the top of the prediction graph with the right-hand sides
+of all the rules found. Because there can be more than one rule, the
+prediction graph can now become a DAG (Directed Acyclic Graph).
+Supposing there are two rules with $W sub 1$ in the LHS:
+
+.br
+
+	$W sub 1$: $U sub 1 U sub 2$...$U sub i$
+.br
+	$W sub 1$: $V sub 1 V sub 2$...$V sub j$
+
+.LP
+the prediction graph will now look like this:
+
+.PS
+B1: box "$U sub 1$"
+move
+B2: box "$V sub 1$"
+arrow dashed down from bottom of B1
+B3: box "$U sub i$"
+arrow dashed down from bottom of B2
+B4:box "$V sub j$"
+move to 0.5
+down;move
+B5:box "$[W sub 1 ]$"
+arrow dashed;
+box "$W sub p$"
+arrow;
+box "END" "$[A]$"
+arrow from B3.bottom to B5.top
+arrow from B4.bottom to B5.top
+.PE
+
+.LP
+The graph element representing $W sub 1$ is left in the graph; the
+notation $[W sub 1 ]$ indicates that it has been substituted. These substituted
+elements will from now on be ignored by the algorithm. The elements
+$U sub 1$ and $V sub 1$ are now `on top' of the prediction graph.
+
+.LP
+If $W sub 1$ can also produce empty, its successor in the prediction graph
+has to be processed
+as well; the algorithm walks down the graph to this successor, and
+there the process is repeated; if it is a terminal we are finished,
+otherwise we replace it with the right-hand sides of its grammar rules.
+However, the element that we want to substitute now, say $W sub k$, cannot
+be marked `substituted' just like that, because it can be on another
+path, on which it cannot be substituted yet. Therefore, a copy of element
+$W sub k$ is made, it is marked $[W sub k ]$, and an edge is created
+from $[W sub k ]$ to the successor of $W sub k$. This produces graphs like
+this:
+.br
+.PS
+B1: box "$U sub 1$"
+move
+B2: box "$V sub 1$"
+move
+X1:box "$X sub 1$"
+arrow dashed down from bottom of B1
+B3: box "$U sub m$"
+arrow dashed down from bottom of B2
+B4:box "$V sub j$"
+arrow dashed down from bottom of X1
+Xj: box "$X sub j$"
+move to 0.5
+down;move
+B5:box "$[W sub 1 ]$"
+arrow dashed;
+B6: box "$W sub k$"
+arrow
+Wk1:box "$W sub k+1$"
+arrow dashed
+box "$W sub n$"
+arrow;
+box "END" "$[A]$"
+arrow from B3.bottom to B5.top
+arrow from B4.bottom to B5.top
+move down from Xj.top;move;move;move
+Wk: box "$[W sub k ]$"
+arrow from Xj.bottom to Wk.top
+arrow from Wk.bottom to Wk1.top
+.PE
+
+.LP
+This process of substituting is repeated with all nonterminals that are
+now on top of the prediction graph, until there are only terminals on top of
+the graph.
+This completes the prediction phase of the algorithm, not taking into account
+what to do if an END marker appears on top of the graph.
+Now, the algorithm enters its accepting phase, in which
+the terminals on top are compared with the next symbol in the input.
+If a terminal in the graph matches the input, its element is deleted
+from the graph, and the substitution process will continue with its
+successors, in the next prediction phase.
+If a terminal on top of the graph does not
+match the input, the path it is on represents a `dead-end', which
+does not need to be processed any further. The terminal is no longer
+a `top', and the algorithm will not visit it again.
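+
+.LP
+To make the prediction phase concrete, the small C program below (a
+sketch of our own, not the actual implementation) carries out one
+prediction phase on the toy grammar S: 'a' B 'c' with B: 'b' or
+B: empty, for the graph B, 'c', END that remains after the 'a' has
+been accepted. The `substituted' flag implements the marking of
+elements written as [W] above; the hard-wired expansion of B stands
+in for the scan over the grammar rules:
+.nf
+.ft CW
+#include <stdio.h>
+
+#define MAXE 32
+#define NT_B (-2)	/* nonterminals get negative codes */
+
+struct elem {
+	int sym;		/* terminal, or negative nonterminal */
+	int substituted;	/* the [W] marking */
+	int succ;		/* successor index, -1 at the bottom */
+};
+
+static struct elem pool[MAXE];
+static int npool;
+
+static int mkelem(int sym, int succ)
+{
+	pool[npool].sym = sym;
+	pool[npool].substituted = 0;
+	pool[npool].succ = succ;
+	return npool++;
+}
+
+static void predict(int e)
+{
+	if (e < 0) return;
+	if (pool[e].sym >= 0) {		/* a terminal top: done */
+		printf("top: '%c'\en", pool[e].sym);
+		return;
+	}
+	if (pool[e].substituted) return;	/* already expanded */
+	pool[e].substituted = 1;	/* mark first, so we cannot loop */
+	if (pool[e].sym == NT_B) {
+		/* B: 'b' puts a new top on B's successor ... */
+		predict(mkelem('b', pool[e].succ));
+		/* ... and B: empty makes us expand the successor too */
+		predict(pool[e].succ);
+	}
+}
+
+int main(void)
+{
+	int end = mkelem(0, -1);	/* the END marker */
+	int c = mkelem('c', end);
+	int b = mkelem(NT_B, c);	/* graph: B -> 'c' -> END */
+
+	predict(b);	/* prints the possible tops 'b' and 'c' */
+	return 0;
+}
+.ft R
+.fi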
+
+.LP
+There is one tricky situation: consider again this graph:
+
+.PS
+B1: box "$U$"
+move
+B2: box "$a$"
+move to 0.5
+down;move
+B5:box "$W sub 1 $"
+arrow dashed;
+box "$W sub n$"
+arrow;
+box "END" "$[A]$"
+arrow from B1.bottom to B5.top
+arrow from B2.bottom to B5.top
+.PE
+
+.LP
+Here, the algorithm is processing $W sub 1$ in the predicting phase, and
+using some rule it has produced $a$ on top; there is another rule with
+$W sub 1$ as its LHS which has produced nonterminal $U$ on top.
+Now, suppose $U$ is a nonterminal that can
+produce empty. The algorithm starts substituting $U$, and walks
+down to $W sub 1$. What we definitely do not want
+is the algorithm to start substituting $W sub 1$ again, because then we
+would loop forever. Therefore, if the algorithm starts processing
+element $W sub 1$ it should make it $[W sub 1 ]$ before it does
+anything else. On entering the element
+for the second time in the prediction phase, it sees that it is already substituted,
+so there is nothing to do.
+It then just walks to the successor of $W sub 1$ and
+starts substituting it. This is correct, since the fact that the algorithm
+enters an element for the second time in a prediction phase means that the element
+can indirectly produce the empty string, and thus its successor must
+be substituted as well in the prediction phase.
+
+.LP
+It is easy to see that the substitution process will stop: the algorithm can
+only loop if it starts processing an element for the second time in a
+prediction phase,
+or if the processing of an element eventually yields a graph with that
+same element on top.
+The first case cannot occur because the algorithm marks elements it is
+processing as `substituted' before it does anything else, meaning that those elements will not
+be processed again; the second case can only occur if the grammar is
+left-recursive, which we assumed it was not.
+
+.LP
+The algorithm simulates
+left-most derivations of strings $a sub 0 b sub 0 b sub 1$..$b sub n$
+starting from $a sub 0 W sub 1$..$W sub p$; as we showed before, if
+the algorithm recognizes a string $a sub 0 b sub 0$..$b sub n$ that
+string is a substring of some string in L. Conversely, because the
+algorithm starts out by using all rules of the form
+A: $alpha a sub 0 beta$, and then proceeds to simulate all
+possible left-most derivations, it will recognize all input
+$a sub 0 b sub 0$...$b sub n$ that can be produced starting from
+$a sub 0 beta$.
+
+.LP
+Now we will discuss what has to be done if an END marker appears on
+top of the prediction graph.
+When this happens, it means that starting from some rule
+.br
+
+	A: $alpha a sub 0 beta$
+
+.br
+the algorithm has produced a leftmost derivation of a string
+$a sub 0 b sub 0 .. b sub n$ starting from $a sub 0 beta$, or that $beta$ can produce
+empty and the string so far is just $a sub 0$. The next step is to assume
+that we have recognized A and that some string produced by $alpha$
+is part of the prefix that makes the suffix we are recognizing a
+correct string in L. Remember that in the END marker we kept a record of
+the LHS of the rule that started the graph, and we will now use this
+LHS to continue recognizing. What the algorithm does is scan for all
+rules of the form:
+.br
+
+	B: $gamma$ A $delta$
+.br
+
+with $gamma$ and $delta$ possibly empty strings of terminals and nonterminals.
+The algorithm now starts a new component in the prediction graph, and if $delta$ is
+$W sub 1 W sub 2$...$W sub n$ it looks like this:
+
+.PS
+down;box "$W sub 1$"; arrow
+box "$W sub 2$"; line dashed; box "$W sub n$"
+arrow; box "END" "$[B]$"
+.PE
+
+.LP
+Note that the END marker now contains B, because we have started to match
+a rule for B. If the $delta$ in the rule for B was empty, this just produces
+an END marker with B in it; in this case, the process is just repeated
+with all rules of the form:
+.br
+
+	C: $zeta$ B $eta$
+.br
+
+.LP
+etc., until we have a prediction graph with a nonterminal or terminal on top.
+Now, the substitution algorithm is again applied over all nonterminals on
+top, until every top contains a terminal. It is possible that during
+substitution an END marker will again turn up; if this happens,
+we again scan for rules to continue with, etc.
+This `continuation algorithm' can only loop if, when
+trying to build a new prediction graph for matched symbol A, it produces an empty
+graph with again matched symbol A. If this happens, the grammar was
+(directly or indirectly) right-recursive, and we assumed that it was not.
+Therefore, the algorithm will terminate. The terminals on top of the
+new graph after applying this `continuation' algorithm are exactly those
+that could follow the string $a sub 0 b sub 0$..$b sub n$ in a substring
+of a string in L.
+To see this, suppose we have `recognized' the rule
+.br
+
+	A: $alpha a sub 0 beta$
+
+.br
+and $a sub 0 b sub 0 b sub 1$...$b sub n$ is the string produced from
+$a sub 0 beta$ by the algorithm. Now, using rule:
+.br
+
+	B: $gamma$ A $delta$
+
+.br
+and supposing that S $->$ $zeta$ B $eta$ we get
+.br
+
+	S $->$ $zeta$ B $eta$ $->$ $zeta gamma$ A $delta$ $eta$ $->$ $zeta gamma a sub 0 b sub 0 b sub 1$ ... $b sub n$ $delta$ $eta$
+
+.br
+.LP
+and thus any string produced by a derivation starting from
+$delta$ can come right after $a sub 0 b sub 0$...$b sub n$ in a substring
+of some string in L. The algorithm will proceed to generate all these
+strings starting from $delta$. If $delta$ produces empty, the above
+is just repeated. Because in the `continuation' part
+all possible rules are considered, the whole algorithm will recognize
+all substrings of any string in L. In order to determine if we
+have actually recognized a suffix of some string in L, we need to
+remember if within a predicting phase the `continuation' part of the algorithm has been run
+on an END marker containing the start-symbol S;
+if this is the case, then the input seen until now is a suffix of some string in L.
+Formally, it means that there is a derivation starting from start symbol
+$S$ such that if the
+input seen until now is $a sub 0 a sub 1$..$a sub n$, then:
+.br
+
+	S $-> sup * alpha beta$ $-> sup * alpha a sub 0 a sub 1$..$a sub n$
+.br
+
+.LP
+where $alpha$ can be empty, but $beta$ is not.
+
+.NH 2
+The prediction graph data structure
+
+.LP
+The graphs that are produced by the suffix recognizer may grow extremely
+large; to facilitate an efficient
+implementation we have devised a way of keeping the size of the
+data structure under control, in a way that is very similar to
+the one described in [TOMITA].
+
+.LP
+The basic idea is that in a prediction phase of the algorithm, it is not
+necessary to explicitly substitute each nonterminal every time it
+turns up as a `top'; it is sufficient to do it once, because the
+second substitution would produce exactly the same subgraph starting at
+the substituted nonterminal.
+Here is an example:
+
+.PS
+down;box "$a$";arrow;box "A";arrow dashed;box "[B]";arrow
+box "C";arrow dashed;box "END" "[X]"
+move right from last box.e;
+box "END" "[Y]";
+arrow <- dashed up from last box.top;
+box "D";arrow <- up from last box.top
+box "B"
+.PE
+
+.LP
+Here, in the left component of the graph, nonterminal B has been
+substituted. Now, in the same prediction phase, the algorithm again runs into
+B, now in the right component. There is no need to compute again
+what the substitution will produce; it is exactly the part on top
+of B in the left component. Therefore, all that is needed is:
+
+.PS
+down;box "$a$";arrow;box "A";arrow dashed;
+B1: box "[B]";arrow
+box "C";arrow dashed;box "END" "[X]"
+move right from last box.e;
+box "END" "[Y]";
+arrow <- dashed up from last box.top;
+box "D"
+arrow from B1.bottom to last box.top
+.PE
+
+So, when, in a prediction phase of the algorithm, a nonterminal is substituted,
+the nonterminal is placed on a list, together with a pointer to
+the substituted nonterminal. If in the same prediction phase a nonterminal that
+is on the list becomes a top, all we need to do is place an edge
+between the already substituted one and the successor of the top we are currently
+processing. When a prediction phase is finished, the list is cleared.
+There is one catch: if we consider again the last picture,
+note that if nonterminal B can (directly or indirectly) produce empty,
+it is also necessary to substitute D. However, it is not difficult to
+determine if a nonterminal can produce empty; LLgen already computes
+this information for each nonterminal.
+
+.LP
+Without this `joining together' of graph components, each
+element in the graph has exactly one successor, except the END marker,
+which has none.
+Now that components get joined as described, an element can have any
+number of successors. The recognizer algorithm now has to consider all
+successors of a graph element instead of one.
+
+.NH 2
+Handling right recursion
+
+.LP
+The only problem right-recursive grammars cause in the algorithm is in the
+`continuation' part; they can cause this part of the algorithm to loop
+forever. As an example, consider:
+.br
+
+	A: $alpha$ B
+.br
+	B: $beta$ C
+.br
+	C: $gamma$ A
+
+.LP
+Now suppose the `substitution' part of the algorithm has turned up
+an END marker with nonterminal A in it. The continuation algorithm will
+now produce:
+
+.PS
+box "END" "[A]";move;box "END" "[C]";move;box "END" "[B]";move
+box "END" "[A]";move;box "END" "[C]"
+.PE
+
+.LP
+and so on. However, a slight modification to the algorithm suffices
+to eliminate this problem; within each prediction phase of the algorithm, we
+simply maintain a list of nonterminals that have turned up in an
+END marker. As soon as an END marker turns up whose nonterminal is
+already in the list, we stop the `continuation' algorithm; the part
+of the graph that it would produce has already been generated
+by an earlier invocation of the algorithm in the same prediction phase.
+At the end
+of a prediction phase, when all heads are terminals, we clear the list.
+This way, no looping can occur; even if the right recursion is
+indirect, for instance if in the above example the rule for A had been
+.br
+
+	A: $alpha$ B $delta$
+.br
+.LP
+where $delta$ can produce empty, the algorithm still works; the substitution
+of $delta$ will yield an END marker on top, and when trying to find
+a continuation for LHS A the algorithm notices A is already on the list.
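+.LP
+The two per-phase lists just described (the list of substituted
+nonterminals used for joining components, and the list of left-hand
+sides that already had their continuation built) could be represented
+as in the following sketch in C, which builds on the \fIstruct elem\fP
+of the earlier sketch. The names and the bound MAXNT are again
+invented for illustration and need not match the actual implementation.
+.br
+.nf
+
+	/* Hypothetical sketch of the per-phase bookkeeping. */
+	#define MAXNT 256		/* assumed bound */
+
+	struct subst_entry {
+		int nt;			/* nonterminal substituted */
+		struct elem *done;	/* its [nt] graph element */
+		struct subst_entry *next;
+	};
+
+	/* Both lists are cleared at the end of each phase. */
+	static struct subst_entry *subst_list;
+	static int ended[MAXNT], n_ended;
+
+	/* If nt was already substituted this phase, return its [nt]
+	 * element, so the caller can simply add an edge to it
+	 * instead of substituting again.
+	 */
+	struct elem *find_substituted(int nt)
+	{
+		struct subst_entry *p;
+
+		for (p = subst_list; p != NULL; p = p->next)
+			if (p->nt == nt)
+				return p->done;
+		return NULL;
+	}
+
+	/* Return non-zero if a continuation for LHS nt was already
+	 * built this phase; otherwise record nt and return 0.
+	 */
+	int continued_before(int nt)
+	{
+		int i;
+
+		for (i = 0; i < n_ended; i++)
+			if (ended[i] == nt)
+				return 1;
+		ended[n_ended++] = nt;
+		return 0;
+	}
+.fi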
+
+
+.NH 2
+Handling left recursion
+
+.LP
+Left-recursion is, unfortunately, a much tougher problem than
+right-recursion. The result of left-recursive grammar rules is that
+the substitution algorithm never stops, because it can keep on building
+the graph with the same set of rules without ever turning up a terminal.
+One course of action would be to pre-process the grammar rules to
+eliminate left-recursion; there are algorithms that eliminate direct
+and indirect left-recursion. However, we have taken another course; by
+allowing the produced graphs to contain loops, we can handle left
+recursion without any modifications to the grammar. As soon as
+we come to the point that we want to substitute a nonterminal
+which was already substituted earlier on the same path and in
+the same prediction phase, we can
+make a link from the `older' nonterminal to the successor of
+the `new' nonterminal. In this way we have constructed a loop
+in the graph. As an example, suppose we have the following rules:
+.br
+
+D: A
+
+A: B a
+
+B: A | x
+
+.br
+Suppose also that we have nonterminal `D' on top of a stack. We
+now start substituting `D':
+
+.PS
+A: box "A"
+move
+X: box "x"
+move to 0.5
+down
+move
+B: box "[B]"
+arrow
+box "a"
+arrow
+box "[A]"
+arrow
+box "[D]"
+arrow dashed
+box "END" "[S]"
+
+arrow from A.s to B.n
+arrow from X.s to B.n
+
+.PE
+
+.LP
+We now have an `A' on top of the stack which was already
+substituted on the same path and also in the same prediction phase. To avoid
+never-ending substitution we make a loop as follows:
+
+.PS
+A: box "A" dashed
+move
+X: box "x"
+move to 0.5
+down
+move
+B: box "[B]"
+arrow
+box "a"
+arrow
+A2: box "[A]"
+arrow
+box "[D]"
+arrow dashed
+box "END" "[S]"
+
+arrow dashed from A.s to B.n
+arrow from X.s to B.n
+arc <- from B.w to A2.w
+.PE
+
+.LP
+The dashed box with `A' in it means that it can be deleted, because
+there is already an occurrence of it in the loop.
+
+.LP
+The most beautiful result of loops in graphs is
+that the original parsing algorithm needs only one minor change.
+When the algorithm visits an element which has more than one
+outgoing edge, it starts tracking down all these paths,
+just like before; there may now be one or more backedges among
+these edges, but the algorithm need not be aware of this fact.
+The only difficulty with loops is that the algorithm might go into
+a loop; it continues searching for terminals, but it might happen
+that there are no valid terminals in the loop. The solution to this
+problem is not very difficult; just set a flag at all elements we
+visit. When we reach an element which has this flag turned on, we
+don't have to search any further. At the end of the prediction phase, when we
+have found all possible new heads, all flags are cleared.
+Even if there are no loops in the
+prediction graph, setting flags may be used as an optimization:
+it is possible that two paths come together at one point. In that situation
+it is useless to scan the part of the graph which
+both paths have in common a second time.
+
+.NH 2
+Some optimizations using reference counts
+
+.LP
+As explained in section 2.2, it is sometimes necessary to copy a
+prediction graph element before substituting it. In order to determine
+if a certain element has to be copied, it is convenient to maintain
+a reference count in each graph element. This reference count keeps
+track of the number of edges that enter an element.
+Now, when we want
+to substitute an element with reference count not 0, we need to
+copy it, because there is another path in the prediction graph that
+contains the element we want to substitute, and on this other path
+the element cannot be substituted yet.
+
+.LP
+Maintaining reference counts also enables us to perform another
+optimization: remember that if, in a prediction phase, a terminal
+is predicted that does not match the current inputsymbol, we from
+then on just ignore the path in the graph starting at the terminal.
+However, we can safely delete the terminal from the graph; furthermore,
+all its successors in the prediction graph that have reference count
+0 can be deleted as well, as can their successors with reference
+count 0, etc. This way, we delete from the prediction graph
+most elements that are no longer accessible, but not all of them; as will
+be explained in the next section, loops in the prediction graph
+can cause problems.
+
+.NH 2
+The algorithm to delete inaccessible loops
+
+.LP
+Deleting graph elements which are no longer reachable is not as easy
+as it looks when there are loops in the graph, introduced by
+the extension to the algorithm that handles left-recursive grammars.
+Suppose for example that we have a very simple loop as in the left
+picture below:
+
+.PS
+down
+X: box "x" "(0)"
+arrow
+box "[B]" "(2)"
+arrow
+box "a" "(1)"
+arrow
+box "[A]" "(1)"
+arrow
+box "[D]" "(1)"
+arc <- from 2nd box.w to 2nd last box.w
+
+move right from X.ne
+move
+move
+move
+move
+move
+move
+down
+box "x" "(0)" dashed
+arrow dashed
+B: box "[B]" "(1)"
+arrow
+box "a" "(1)"
+arrow
+box "[A]" "(1)"
+arrow
+box "[D]" "(1)"
+arc <- from B.w to 2nd last box.w
+.PE
+
+.LP
+The number below each symbol indicates the reference count of that element.
+Suppose now that we delete `x'; then we have the situation depicted in the
+picture on the right. The loop consisting of `[B]', `a' and `[A]' is now
+unreachable, so all these elements can be deallocated.
+The reference count of `[B]' is 1, however, so it will not be deleted. To be precise,
+all elements in the loop have a reference count of 1, and
+consequently none of them will be deleted. But we stated earlier
+that none of the elements of the loop can be reached anymore and that the
+loop had to be deleted! In this example the reference counts of the
+loop elements are all 1, but in more complex situations it is also
+possible that some of the elements have a reference count of more
+than 1.
+
+.LP
+To solve this problem we present an algorithm, devised by E. Wattel, that
+determines whether a loop can be deleted or not.
+The algorithm consists of two parts. The first part goes as
+follows: it presumes that all elements of the loop will indeed be
+deleted. Every time it deletes an element it decreases the reference
+count of all the successors of the element that are also members of the same
+loop. How the algorithm knows which elements belong to the loop and which
+do not will be explained later. The situation of the example above will now
+look like this:
+
+.PS
+down
+box "[B]" "(0)"
+arrow
+box "a" "(0)"
+arrow
+box "[A]" "(0)"
+arrow
+box "[D]" "(1)"
+arc <- from 1st box.w to 2nd last box.w
+.PE
+
+.LP
+The number below each symbol again indicates the reference count
+after we have applied the first part of the algorithm.
+
+.LP
+The second part of the algorithm checks and restores the
+reference counts of all members of the loop.
+When it finds
+out that one or more reference counts are not 0, it concludes
+that it is still possible to enter the loop in some way, and
+that the loop cannot be
+deleted yet. In the other case it reports that the loop can be
+deleted, which is indeed true in our example.
+
+.LP
+We will now formally describe the first part of the algorithm,
+which finds all directed circuits from a given vertex and determines if
+the vertices on those circuits can be deleted.
+The algorithm works on prediction graphs in which every edge that
+is in a circuit is marked. Note that a marked edge may be in more than one circuit.
+We will call this mark `C'.
+The input to the algorithm is such a prediction graph, and a start vertex,
+say A. The first part of the algorithm is:
+
+.IP 1
+Put the start vertex A on a list L; mark all edges `unused'
+.IP 2
+If L is empty, stop
+.IP 3
+For each vertex in list L, check if there are edges marked both `C' and
+`unused'. For each edge found, mark it `used', and traverse it to its
+other endpoint; put this endpoint on a new list M, initially empty
+.IP 4
+Decrease the reference count of all vertices on M by 1
+.IP 5
+L := M; go to 2
+
+.LP
+It is clear that the algorithm will terminate: each edge is only traversed once,
+and the number of edges is finite. We will now prove some properties of this
+part of the algorithm.
+
+.LP
+.I
+An edge is traversed by the algorithm if and only if it is on some
+directed circuit $A ->$...$->A$.
+.R
+.br
+
+The if-part is easy; if an edge $e$ connecting vertices $W$ and $V$ is on some directed circuit starting in
+$A$, then there is a path $A ->$...$-> W -> V$; let $A ->$...$-> W -> V$ be a path
+of minimum length from $A$ to $V$. If the length of the path from $A$ to
+$W$ is $k$, then after turn $k$ of the algorithm $W$ will be on list L. To see
+that this is the case, suppose that $W$ is not on list L after turn $k$;
+this means that the edge entering $W$ was already marked `used' in a
+previous turn, but then there would be a shorter path from $A$
+to $W$, contradicting the assumption that the path is of
+minimum length. The edge
+$e$ is marked `C', because it is in a circuit; it is marked `unused', for if
+it were marked `used', there would be a shorter path from $A$ to $V$. So,
+in turn $k + 1$, the edge $e$ will be traversed.
+
+.LP
+On the other hand, suppose that an edge $e$ is traversed by the algorithm;
+we will show by induction on the number of turns the algorithm has made
+that $e$ is on a directed circuit $A->$..$->A$. In the first turn, all
+edges from $A$ that are marked `C' are traversed, and clearly, if an edge
+from $A$ is part of a circuit then that edge is part of a circuit from $A$ to $A$.
+Now suppose that in turn $n+1$ an edge $e$ connecting vertices $W$ and
+$V$ is traversed. This means the edge is
+marked `C', so it is part of some circuit. If there is a path from $V$ to $A$,
+we can simply trace a circuit
+$A->$...$-> W -> V -> $...$-> A$, and clearly $e$ is on a circuit from
+$A$ to $A$. Now, suppose there is no path from $V$ to
+$A$. We can always trace a circuit $W -> V ->$...$-> W$ because the
+edge from $W$ to $V$ is part of a circuit; and by the
+induction hypothesis there is a circuit $A ->$...$-> W ->$...$-> A$. We can
+now make a `detour' at $W$, yielding a circuit $A->$...$-> W -> V$...
+$-> W ->$...$-> A$. This case is shown in the picture below.
+So in either case $e$ is on a circuit from $A$ to $A$.
+
+.PS
+down;
+B1: box "A";
+arrow dashed;
+B3: box dashed;
+arrow dashed;
+B2: box "W";
+arrow dashed; box dashed;
+arc <- from B1.w to last box.w
+arrow right "$e$" "C" from B2.e
+box "V"; arrow dashed; box dashed;
+arrow dashed -> from last box.n to B3.e
+.PE
+
+.LP
+.I
+A vertex appears on list L if and only if it is on some directed
+circuit from $A$ to $A$.
+.R
+.br
+
+.LP
+If a vertex is in such a circuit, there is an edge that enters it, which
+is part of a circuit from $A$ to $A$; we already showed that this edge
+is traversed by the algorithm, and thus the vertex will appear on list
+L. Conversely, if a vertex appears on list L, then an edge entering
+that vertex has been traversed by the algorithm; we showed that this
+edge is part of a circuit from $A$ to $A$, and thus the vertex is
+part of a circuit from $A$ to $A$.
+
+.LP
+.I
+When the algorithm is finished, each vertex that is part of some
+directed circuit from $A$ to $A$ has its reference count decreased by exactly
+the number of edges entering it that are part of a directed circuit from $A$ to $A$.
+.R
+.br
+
+.LP
+Each edge that is part of some circuit from $A$ to $A$ is traversed
+exactly once; the reference count of the endpoint is decreased
+by one after an edge has been traversed. Thus, if a vertex is the endpoint
+of $k$ such edges, its reference count is decreased by $k$.
+
+.LP
+.I
+If the reference count of each of the vertices visited by the algorithm
+is 0 after the algorithm has finished, all these vertices can be deleted;
+if the reference count is not zero for one or more of the visited
+vertices, then none of them can be deleted.
+.R
+.br
+
+.LP
+Suppose all visited vertices have reference count 0; this means that
+each of the vertices is only entered by edges that are on a circuit
+from $A$ to $A$. Therefore, it holds that any path leading to any
+of the visited vertices has to start in one of the visited vertices; there
+is no path starting in an unvisited vertex to a visited one. Thus,
+all the visited vertices are unreachable.
+Conversely, if one of the visited vertices has a reference count that is not zero,
+then there is a path from an unvisited vertex to this vertex. Because from
+the vertex with nonzero reference count we can get to $A$, and from $A$
+we can get to any of the other vertices, all visited vertices are
+reachable.
+
+.LP
+The second part of the algorithm now checks if all reference counts are
+zero, and if they are, it deletes all visited vertices.
+
+
+.NH 2
+Marking loop elements
+
+.LP
+One point we have omitted so far is how the edges in the prediction
+graph that are part of a loop get marked.
+Basically, a loop can be detected:
+
+	a. when it is made;
+.br
+	b. when we want to know about it.
+
+.LP
+The first approach checks if a loop is constructed
+as soon as we join two paths in the graph, and if so, marks all
+edges of the loop. The other approach does not do any checking when two
+paths are joined together; it starts looking for loops when we want
+to delete an element with reference count not 0, marking all edges
+belonging to the loops it discovers. In practice it turns out that
+we very often encounter elements that we would like to delete but that have
+reference count not 0, whereas the joining of paths occurs relatively
+infrequently. We have therefore chosen to check if a loop is created
+when two paths in a prediction graph are joined.
+
+.LP
+Now the question arises how to find and mark all edges of
+the loop. For this problem we also devised an algorithm.
+
+Because we already know that there is an edge from the element to which
+the new path is connected to the successor of the joined element, the
+algorithm only has to find a path from this last element back to the first one.
+This can be done by a backtracking depth-first search; to find a path from
+one element to another we have to find a possibly empty path
+from one of the successors of the first element to the last element. As
+soon as we have found a path, we can mark all the edges on the path, and also
+the backedge, as loop edges. In case there is more than one path
+back to the first element, the algorithm must continue
+searching after it has found one path.
+
+.LP
+To prevent this algorithm from looping, we have to set a flag at the elements
+which are already on the path. When the algorithm is backtracking it can
+clear the flags at the elements it is leaving.
+
+.LP
+To speed up the searching process we can set flags at the edges we have already
+visited but that did not lead back to the first element. When the algorithm
+encounters such an edge it already knows that this edge is not worth
+searching again and can be skipped. At the end of the algorithm these
+flags have to be cleared again.
+
+.LP
+One might propose another optimization: as soon as
+we reach an edge that is already marked as a loop edge, we
+can stop searching for other loop edges. There is, however,
+a case in which this can go wrong. Imagine the following situation:
+
+.PS
+down
+E: box "[E]"
+arrow " C" ljust
+D: box "[D]"
+arrow " C" ljust
+C: box "c"
+arrow " C" ljust
+box "b"
+arrow " C" ljust
+A: box "[A]"
+arrow
+box "a"
+
+move right from D
+move right
+J: box "[J]"
+down
+arrow from J.s " C" ljust
+I: box "i"
+arrow " C" ljust
+H: box "[H]"
+arrow from H.s to A.e
+
+arc <- from E.w to A.w
+move left from C
+move left
+"C"
+arc -> from H.e to J.e
+move right from I
+move right
+"C"
+
+arrow dashed from E.s to J.n
+
+
+.PE
+
+.LP
+What we have here is a prediction graph with two loops; all edges that belong
+to a loop are again marked with a `C'. Note that the edge between `[H]'
+and `[A]' is not a loop edge. Suppose that `[J]' is not yet
+completely substituted, i.e., there is another production rule for
+J:
+.br
+
+J: E
+
+.br
+The `E' on top of the right path is now joined with the `[E]'
+on the left path, which is depicted by the dashed arrow
+between `[E]' and `[J]'. When we take a good look at the graph
+we see that the two loops are merged into one. But that is not
+the most important observation we have to make: not only must the
+edge between `[E]' and `[J]' be marked as a loop edge, but
+also the edge between `[H]' and `[A]'! So it is not possible
+to stop searching for loop edges as soon as we have found an
+edge which was already marked as a loop edge. We have to continue
+until we reach the element at which we started: `[E]'. So the
+optimization proposed above is incorrect.
+
+
+.NH 2
+Optimizations using FIRST and FOLLOW sets
+
+.LP
+In the algorithm as we have described it, every nonterminal on top of the graph
+is substituted until only terminals remain on top; these terminals are
+then matched against the current input symbol. However, by using
+FIRST sets, we can save considerably on the number of computations
+necessary. Suppose one of the top elements of the graph is nonterminal A,
+and the current inputsymbol is $a$. Then, it is of no use to substitute
+A if terminal $a$ is not in FIRST(A), because then substituting A will
+never produce $a$ on top of the graph.
+So, before substituting a
+nonterminal, we check if the current inputsymbol is in its FIRST set; if
+it is not, we can declare the path the nonterminal is on a dead end, and
+delete it, without having to perform the actual substitution. Of course, if
+A can produce empty, we still have to consider its successor in the graph.
+
+.LP
+Similarly, when we have an END marker on top, with nonterminal B in
+it, and we consider using rule
+.br
+
+	D: $alpha$ B C $gamma$
+
+.br
+we first check if the current inputsymbol is in FIRST(C); if this is
+not the case, there is no need to start a graph component with this
+rule, because it will never produce the next inputsymbol on top.
+Again, if C produces empty, we still have to evaluate the part of the
+rule following C.
+
+.LP
+To circumvent the problems caused in the FIRST set optimization by
+nonterminals that produce empty, we can also make use of FOLLOW sets.
+When substituting, if we encounter a nonterminal whose FIRST set does
+not contain the current inputsymbol but which can produce empty,
+we check if the current inputsymbol is in its FOLLOW set. If it is not,
+there is no need to process its successor. Similarly, in case we
+are processing an END marker as explained above, there is no need
+to process the part of the rule following C if FIRST(C) does not
+contain the input symbol, or C produces empty but the inputsymbol
+is not in FOLLOW(C).
+.bp
+.nr PS 12
+.nr VS 14
+
+.NH
+Test results
+
+.nr PS 10
+.nr VS 12
+.RS
+
+.LP
+In this chapter, we discuss some test results that were obtained
+by recompiling existing ACK compilers with the modified LLgen.
+We tried several combinations of possible optimizations, including
+`dumb' ones, like no optimization at all, not even deleting unreachable
+prediction graph elements.
+The incorporation of LLgen with non-correcting error recovery went
+smoothly; only minor modifications to the Makefiles were necessary.
+Specifically, these modifications consisted of passing an extra
+flag to LLgen, and including the newly generated C-file Lncor.c in
+the list of generated C-files. Also, the LLmessage error reporting
+routine had to be adapted. We successfully recompiled the C, Modula-2
+and Occam compilers; in the next sections, we discuss some test results
+that were obtained with the Modula-2 and C compilers.
+
+.RE
+.LP
+.NH 2
+Performance
+
+.LP
+We will now present and discuss, with the aid of some
+diagrams, time and space measurements on the non-correcting error
+recovery. We have measured the effect of various optimizations.
+These optimizations include the first-set optimization and the follow-set
+optimization. We also measured the effect of leaving out the loop-deletion
+algorithm, regarding both time and space. We performed our measurements using
+C- and Modula-2-programs of three different sizes: one of approximately
+750 tokens, one of approximately 5000 tokens and one of approximately 15000
+tokens. We have
+chosen to represent the sizes of programs in the number of tokens instead of
+the number of lines, because the number of tokens more realistically
+reflects the load the programs put on the error recovery mechanism. Also, we give
+our time measurements in user time instead of real time, because real time
+depends heavily on the load of the system, which user time does not.
+Our space measurements are based on the size of the prediction graphs.
+Note that all files are entirely recognized by the non-correcting error
+recovery technique.
+We achieved this by putting a `1' at the beginning
+of each file; because each file then starts with a syntax error, LLgen
+is forced to continue with the non-correcting error recovery.
+
+.NH 3
+Time and space measurements on the effect of the first-set optimization
+
+.LP
+In the diagram below we show the time measurements we obtained from recognizing
+the C-programs both with and without first-set optimization.
+
+.G1
+coord x 0, 17000 y 0, 65
+ticks bot out at 750, 5000, 15000
+label bot "Number of tokens"
+label left "User Time" "(sec)" left .3
+draw no_opt dashed
+draw first_opt dashed
+
+copy thru X
+ times size +2 at $1, $2
+ times size +2 at $1, $3
+ next no_opt at $1, $2
+ next first_opt at $1, $3
+X until "XXX"
+
+742 2.5 .9
+5010 16.3 5.8
+14308 54.2 16.8
+XXX
+
+copy thru X "$1 $2" size -2 at 11000, $3 X until "XXX"
+No optimization 55
+First-set optimization 20
+XXX
+.G2
+
+.I
+.ce
+Time measurements of three C-programs with and without first-set optimization
+.R
+
+.LP
+Notice the considerable time savings we
+get when the first-set optimization is turned on: a factor of slightly more than
+3. Obviously this is an extremely useful optimization. On the other hand,
+we found there were no measurable time savings when using the follow-set
+optimization; for that reason we did not chart the results of this optimization.
+It seems that the time savings gained by the optimization are
+wasted again by the extra processing time needed. We conclude that
+this optimization is of little or no use when we want to save time.
+
+.LP
+In the following picture the time measurements of three Modula-2 programs
+are given, again with and without first-set optimization.
+
+.G1
+coord x 0, 17000 y 0, 65
+ticks bot out at 750, 5000, 15000
+label bot "Number of tokens"
+label left "User Time" "(sec)" left .3
+draw no_opt dashed
+draw first_opt dashed
+copy thru X
+ times size +2 at $1, $2
+ times size +2 at $1, $3
+ next no_opt at $1, $2
+ next first_opt at $1, $3
+X until "XXX"
+
+823 1.3 .6
+4290 7.6 3.5
+16530 30.5 14.3
+XXX
+
+copy thru X "$1 $2" size -2 at 13000, $3 X until "XXX"
+No optimization 30
+First-set optimization 15
+XXX
+.G2
+
+.I
+.ce
+Time measurements of three Modula-2-programs with and without first-set optimization
+.R
+
+.LP
+From this picture we can conclude mainly the same as above: considerable
+time savings when we use the first-set optimization;
+the factor is somewhat less, but still more than 2. Again we have omitted
+the results of the follow-set optimization, for the same reason as before.
+
+.LP
+There is, however, one remarkable difference between the two languages: parsing
+C-programs needs almost twice as much time as parsing programs of comparable
+sizes written in Modula-2. This can be explained by the fact that the
+C-grammar is far more complicated than that of Modula-2, and also the
+production rules are longer in C, so building, deleting and especially
+traversing the graph will consume more time.
+
+.LP
+Now we come to the space measurements of both C- and Modula-2 programs.
+In the picture below we present the maximum sizes of the prediction graphs
+during the recognition of the three C-programs.
+
+.G1
+coord x 0, 17000 y 0, 18000
+ticks bot out at 750, 5000, 15000
+label bot "Number of tokens"
+label left "Maximum size of" "the prediction graph" "(bytes)" left .3
+draw no_opt dashed
+draw first_opt dashed
+copy thru X
+ times size +2 at $1, $2
+ times size +2 at $1, $3
+ next no_opt at $1, $2
+ next first_opt at $1, $3
+X until "XXX"
+
+742 10444 5568
+5010 12664 7668
+14308 17308 13636
+XXX
+
+copy thru X "$1 $2" size -2 at 8000, $3 X until "XXX"
+No optimization 16000
+First-set optimization 7000
+XXX
+.G2
+
+.I
+.ce
+Maximum sizes of the prediction graphs when recognizing three C-programs
+.R
+
+.LP
+From this diagram we see that, although the prediction graphs
+are smaller when the first-set optimization is used, the space savings are
+not as spectacular as the time savings achieved by this optimization.
+
+.LP
+In Modula-2 the first-set optimization also causes a decrease in memory
+usage. The savings are less than in C, but still about 1.5 Kb. Again
+this can be explained by the fact that the rules of the Modula-2 grammar
+are shorter than those of C.
+
+.G1
+coord x 0, 17000 y 0, 12000
+ticks bot out at 750, 5000, 15000
+label bot "Number of tokens"
+label left "Maximum size of" "the prediction graph" "(bytes)" left .3
+draw no_opt dashed
+draw first_opt dashed
+copy thru X
+ times size +2 at $1, $2
+ times size +2 at $1, $3
+ next no_opt at $1, $2
+ next first_opt at $1, $3
+X until "XXX"
+
+823 5056 3292
+4290 6420 4664
+16530 11388 9632
+XXX
+
+copy thru X "$1 $2" size -2 at 8000, $3 X until "XXX"
+No optimization 10000
+First-set optimization 4000
+XXX
+.G2
+
+.I
+.ce
+Maximum sizes of the prediction graphs when recognizing three Modula-2-programs
+.R
+
+.NH 3
+Input that is recognized in quadratic time
+
+.LP
+The measurements presented may suggest that the time required to
+recognize input depends linearly on the length of the input; however,
+this is not always the case. When there are recursive rules in the
+grammar, the time needed to recognize input that is produced by these
+rules can become proportional to the square of the input length.
+Consider this set of grammar rules:
+.br
+.nf
+
+	S: '{' A '}'
+	A: 'a' A | $epsilon$
+
+.fi
+.LP
+When the input is `{aaa....', the algorithm will produce the following
+prediction graphs:
+
+.PS
+up; B1: box "END" "S"; arrow <- ;box "}";arrow <- ;box "A";arrow <- ;box "{";
+move right from B1.se; move
+up; B2: box "END" "S"; arrow <-; box "}"; arrow <-; box "[A]";
+arrow <-; box "A"; arrow <-; box "a";
+move right from B2.se; move
+up; B3: box "END" "S"; arrow <-; box "}"; arrow <-; box "[A]";
+arrow <-; box "[A]"; arrow <-; box "A"; arrow <-; box "a";
+move right from B3.se;move
+up; B4: box "END" "S"; arrow <-; box "}"; arrow <-; box "[A]";
+arrow <-; box "[A]"; arrow <-; box "[A]"; arrow <- ; box "A"; arrow <-;box "a";
+.PE
+
+.LP
+In each prediction phase, a new [A] appears on the prediction graph. However,
+since A also produces empty, the prediction algorithm has to traverse all the
+elements [A] until it finds the element `}'. In the first prediction phase,
+there is one element [A], in the second there are two, etc., so in all
+1 + 2 + 3 + ... + k = $k(k+1) over 2$ elements have to be traversed if
+there are k prediction phases, making this proportional to the square
+of the input length. We constructed a parser with this simple input grammar
+and measured the processing time the error recovery mechanism used.
+
+In the following diagram the dashed line shows the processing time needed;
+the dotted line is the curve $t = 13 times 10 sup -6 n sup 2$. Clearly the processing time
+is proportional to the square of the number of tokens.
+
+.G1
+coord x 0, 2100 y 0, 60
+ticks bot out at 500, 1000, 1500, 2000
+label bot "Number of tokens"
+label left "User Time" "(sec)" left .3
+draw quad dashed
+
+copy thru X
+ times size +2 at $1, $2
+ next quad at $1, $2
+X until "XXX"
+
+500 3.0
+1000 12.4
+1500 28.6
+2000 51.4
+XXX
+
+draw dotted
+for i from 0 to 2100 by 25 do { next at i, 0.000013 * i * i }
+.G2
+
+.LP
+In the grammar used for the C compiler, array initializations are handled by a recursive
+rule, so we would expect that the error recovery mechanism needs quadratic
+processing time to recognize such an initialization; we made measurements of
+the processing time and indeed, the
+processing time needed grows proportionally to the square of the size of the input, as the
+next figure shows. Here, the processing times are about half of those in
+the previous example; this is so because the recursion appears after two
+tokens are recognized. Note that the algorithm only takes quadratic time
+when it is recognizing input that is generated by a recursive grammar rule.
+Other input is still recognized in linear time, regardless of the fact that
+there are recursive grammar rules.
+
+.G1
+coord x 0, 5000 y 0, 85
+ticks bot out at 1150, 2400, 3600, 4800
+label bot "Number of tokens"
+label left "User Time" "(sec)" left .3
+draw quad dashed
+
+copy thru X
+ times size +2 at $1, $2
+ next quad at $1, $2
+X until "XXX"
+
+1150 5.1
+2400 20.3
+3600 43.7
+4800 78.6
+XXX
+.G2
+
+.LP
+Unfortunately, there is no easy way to speed up the recognition of these
+recursively defined language elements; they are caused by the substituted
+elements that are left in the prediction graph, and we cannot just delete those
+`dummies' from the graph during a prediction phase, because the `join' part of the
+prediction algorithm depends on them. One could traverse the graph after
+a prediction phase to delete the dummies, but then the processing
+time needed to recognize non-recursively defined language elements would
+increase dramatically. However, we feel that in practice things
+like large array initializations will not occur in hand-written programs; when
+they occur, it is probably in computer-generated programs, which normally
+will be correct anyway, meaning that the error recovery never sees them.
+When testing such generated programs, one is likely
+to use small test-cases, which are handled well by the error recovery.
+
+.NH 3
+Time measurements on the effect of leaving out the loop-deletion algorithm
+
+.LP
+We now show what effect the loop-deletion algorithm has on processing time;
+to put it another way: how much time can be saved when we turn off the
+loop-deletion algorithm. In the diagram below we give the measurements of
+the three C-programs; note that we do use the first-set optimization.
+
+.G1
+coord x 0, 17000 y 0, 22
+ticks bot out at 750, 5000, 15000
+label bot "Number of tokens"
+label left "User Time" "(sec)" left .3
+draw no_loop dashed
+draw loop dashed
+copy thru X
+ times size +2 at $1, $2
+ times size +2 at $1, $3
+ next no_loop at $1, $2
+ next loop at $1, $3
+X until "XXX"
+
+742 .4 .9
+5010 5.8 6.8
+14308 16.8 20.5
+XXX
+
+copy thru X "$1 $2" size -2 at 11300, $3 X until "XXX"
+With loop-deletion 20
+Without loop-deletion 9
+XXX
+.G2
+
+.I
+.ce
+Time measurements on processing three C-programs with and without the loop-deletion algorithm
+.R
+
+.LP
+The diagram shows that the loop-deletion algorithm
+does not dramatically slow down the recognizing process. There is, however,
+a measurable time loss of \(+-25%. As we will see later, the loop-deletion
+algorithm will turn out to be extremely useful for efficient use of memory
+when there are many loops in the graph.
+
+.LP
+The effect of the loop-deletion algorithm on parsing Modula-2 programs
+is even smaller than with C-programs; in fact there is no measurable
+time loss:
+
+.G1
+coord x 0, 17000 y 0, 15
+ticks bot out at 750, 5000, 15000
+label bot "Number of tokens"
+label left "User Time" "(sec)" left .3
+draw no_loop dashed
+draw loop dashed
+copy thru X
+ times size +2 at $1, $2
+ times size +2 at $1, $3
+ next no_loop at $1, $2
+ next loop at $1, $3
+X until "XXX"
+
+823 .6 .6
+4290 3.5 3.8
+16530 14.3 14.3
+XXX
+
+copy thru X "$1 $2" size -2 at 11800, $3 X until "XXX"
+With loop-deletion 13
+Without loop-deletion 7
+XXX
+.G2
+
+.I
+.ce
+Time measurements on processing three Modula-2-programs with and without the loop-deletion algorithm
+.R
+
+.LP
+There are at least two reasons for this; both result from the relative
+simplicity of the Modula-2 grammar. The distance from a head to an
+END marker is shorter than in C, and, secondly, Modula-2
+causes fewer joins to occur than C, meaning that the loop-marking algorithm
+is run less often, and when it is run it has fewer paths to search.
+
+
+.NH 3
+Space measurements on the effect of leaving out the loop-deletion algorithm
+
+.LP
+Clearly, to make any measurements on the space-usage effects of leaving out
+the loop-deletion algorithm we need a program that causes the prediction
+graph to contain loops; however, we have not been able to devise a C
+or Modula-2 program that does this. In order to be able to make measurements,
+we added an extra alternative to a rule of the C compiler grammar, making
+it directly left-recursive. To make LLgen accept this new grammar, we
+put a `%if' directive in the rule.
+
+.LP
+We have input our standard C test program consisting of 800 tokens to
+the error recovery routine of this `doctored' C compiler,
+and compared the storage needed for the prediction graphs when the
+loop-deletion algorithm is enabled with the storage needed when the
+algorithm is disabled. With the loop-deletion algorithm enabled, the
+maximum size of the prediction graph was 5576 bytes. When the loop
+algorithm was disabled, the maximum size of the prediction graph
+grew to 12676 bytes; furthermore, 12676 bytes of heap were allocated
+for the prediction graph but not deallocated again, because they were
+in use by graph elements that were in inaccessible loops. The user time
+the program needed increased only slightly, from 0.9 to 1.0 seconds.
+Given the
+relatively small input program, this data suggests that when loops
+are actually being made, the loop-deletion algorithm is definitely
+worth the extra overhead it costs, considering the space
+that would otherwise be occupied by inaccessible loops. To verify this,
+we input the C program consisting of 15000 tokens to the compiler;
+execution time increased from 17.3 to 21.1 seconds after enabling
+the loop-deletion algorithm, while the maximum size of the prediction graph
+shrank from 328664 to 13664 bytes. With the loop-deletion algorithm
+disabled, 326720 bytes allocated for the graph were not deallocated again.
+Again, given the relatively small increase in execution time and the
+large reduction of memory usage, we feel that the loop-deletion
+algorithm is useful enough to justify the overhead it creates.
+
+.NH 2
+Problems encountered
+
+.LP
+In this section we describe some of the problems we encountered
+while testing the non-correcting error recovery.
+
+.NH 3
+The LLgen error reporting mechanism
+
+.LP
+The parsers generated by LLgen call a user-supplied error reporting
+routine, usually called LLmessage. This routine is called with an
+integer parameter that is positive, zero or negative. When the parameter
+is positive, the parser has just inserted a token whose
+number is equal to the parameter; if it is zero, the parser
+has deleted a token whose number is in a global variable called LLsymb; if
+it is negative, it means that LLgen expected end-of-file, but did not
+find it. The routine LLmessage is supposed to print an error message,
+and when a token is inserted, it should set all necessary attributes.
+
+.LP
+However, when non-correcting error recovery is used, the situation becomes slightly
+different; when the parser inserts a token, it is only to keep the
+semantic actions consistent, and it no longer signifies an error.
+However, the LLmessage routine still has to be called, because the
+attributes of the inserted token need to be set. Therefore, when
+non-correcting error recovery is used, the LLmessage routine should not
+print an error message when the parameter is positive, or else it will
+print highly confusing error messages indeed. Furthermore, the
+LLmessage routine will usually print a message like `token ... deleted' when
+it is called with parameter equal to zero; however, when the non-correcting
+error recovery is used, it is more appropriate to report something
+like `token ... illegal', as the non-correcting error recovery does
+not delete tokens. Finally, when an unexpected end-of-file is encountered,
+LLgen normally just inserts the missing tokens and calls
+LLmessage with the parameter equal to the token number;
+when non-correcting error recovery is used we need a way to
+actually report that we have encountered an unexpected end-of-file. The
+way we achieved this is by calling LLmessage with parameter 0 and the
+global variable LLsymb set to EOFILE when this situation occurs; the
+routine LLmessage should print something like `unexpected end of file'
+when it is called with parameter 0 and LLsymb is EOFILE. To facilitate
+switching between correcting and non-correcting error recovery, the
+file Lpars.h contains a statement `#define LLNONCORR' if non-correcting
+error recovery is used.
+
+
+.NH 3
+Parsers being started in semantic actions
+
+.LP
+LLgen allows the programmer to define more than one nonterminal as a
+start symbol of the input grammar; it will generate a parsing routine
+for each of the start symbols.
+However, the error recovery code
+is generated only once; it is shared by all parsers.
+The programmer is free to call any
+of the generated parsers whenever he wants; for instance, in the C-compiler
+a separate parser for expressions in #if and #elsif statements is used. Whenever
+the lexical analyzer encounters such a statement, it calls the expression
+parser. It is also possible to call a parser in a semantic action of
+another parser; in the Modula-2 compiler a separate parser for
+definition modules is used. When the main parser encounters a
+FROM defmod IMPORT statement, a semantic
+action opens the definition module defmod and starts the parser for
+definition modules.
+
+.LP
+The fact that subparsers can be started just about anywhere causes
+problems when non-correcting error recovery is used.
+Suppose a parser calls another parser in a semantic action
+to parse a separate input file. In the Modula-2 compiler, after
+seeing the FROM defmod IMPORT statement a semantic action opens
+defmod and parses it; now, if a syntax error occurred before the
+FROM IMPORT statement, the non-correcting error recovery will not
+execute the action that opens and parses the definition module, but
+it will not report an error either, because the statement
+FROM defmod IMPORT is part of the input language of the main parser.
+However, suppose that during the parsing of a definition module
+an error occurs; then, some semantic actions that would normally
+be executed during parsing of the definition module will not have
+taken place. When normal parsing is now resumed by the main parser,
+after the non-correcting error recovery has finished with the
+definition module, a lot of spurious semantic errors are likely to be
+reported, because the semantic actions that would normally have been
+executed during the definition module parsing have not been executed
+by the error recovery. Therefore, it is desirable that the main parser
+does not resume normal parsing, but instead continues with the non-correcting
+error recovery as well. Any syntactic errors in the main program will
+still be reported, but no spurious semantic errors will be reported
+that way.
+
+.LP
+When the lexical analyzer calls other parsers, as is the case in
+the ACK C compiler, recursive invocations of the non-correcting error
+recovery routine can occur. This will happen if a parser starts the
+error recovery, the error recovery calls the lexical analyzer, which
+starts another parser that finds a syntax error and calls the
+error recovery again. This is not really a problem, but it has
+consequences for the implementation of the error recovery routine.
+
+.LP
+The worst case
+occurs when two parsers are involved in parsing one input file, and
+the secondary parser (e.g., an inline assembly parser) is called in a semantic
+action of the main parser. Suppose now that the input text contains
+a syntax error; after detecting this error, the parser starts the
+non-correcting error recovery. This recovery does not execute any
+semantic actions; therefore it will not start the subparser at those points
+where the original LLgen generated parser would. As a result, parts
+of the program that would be accepted by the subparser will now probably
+be rejected as illegal, because the error recovery does not know it
+should use another grammar to check these parts. This is a serious
+problem, and we have devised and implemented two ways to solve it.
+
+.LP
+The first solution is based on the assumption that whenever a semantic
+action occurs in the grammar, another parser can be started at that
+point. Obviously, we have no way of knowing which semantic actions start
+a parser and which don't, so we assume the worst.
+Now, assume that in the grammar there are k symbols defined as
+start symbols, say $W sub 1 , W sub 2 , ..., W sub k$. Each of these symbols
+will cause LLgen to generate a parser that can be called in any
+of the semantic actions of the grammar. We now introduce a new
+symbol $X$, and a new grammar rule $X -> W sub 1 X | W sub 2 X | ... |
+W sub k X |
+epsilon$.
+In the grammar the error recovery algorithm uses, we insert this symbol
+X at all positions where there are semantic actions in the original grammar,
+so a rule $A -> alpha$ { action } $beta$ becomes $A -> alpha X beta$. As a
+result, at each position in a grammar rule where a semantic action
+occurs, we now accept any input that would be accepted by any of the
+parsers. Clearly, this solution is somewhat of a kludge, as it will
+accept a lot of input that is not accepted by the original parser.
+However, it is guaranteed never to give spurious error messages, because
+whenever a parser would be started by the original parser, there now
+is an $X$ in the grammar that produces all the strings that would be
+accepted by that parser. We have implemented this solution, and found
+it to be extremely slow, which of course was to be expected, given the
+number of semantic actions in the average grammar. Furthermore,
+because each time a semantic action occurs in the grammar
+a string accepted by any of the generated parsers is accepted, including
+strings recognized by the currently running parser, error messages
+become hard to interpret. As an example, consider the following
+C program:
+.br
+.nf
+
+
+	main()
+	{
+		int i, j;
+
+		while (i < j
+		j++;
+
+		i = 1;
+		j = 2;
+
+	}
+
+
+.fi
+.LP
+Clearly, there is a `)' missing in the while-statement;
+however, if this program is input to the error recovery, it will complain
+"} illegal", since after recognizing the
+expression controlling the while the original parser starts a
+semantic action, so the non-correcting recovery will accept a valid
+C program at that point; after recognizing the three statements
+following the while-statement as a separate program, the
+recognizer expects the missing `)', but gets `}' instead.
+
+.LP
+Our second solution is based on the observation that if we knew
+which semantic actions can start other parsers, we would only
+have to introduce the new symbol $X$ at those places where parsers
+can get started. We have therefore extended LLgen with a new directive
+%substart, which is used to indicate to the parser generator that
+another parser may be started. The %substart directive is followed by the
+start symbols that will produce the parsers that can be called,
+so %substart A, B, C; indicates that in the semantic action
+following the directive the parsers produced by start symbols
+A, B, and C can be started. In the grammar used by the error
+recovery, a new symbol $X$ will be introduced at this point,
+along with a new rule $X -> AX | BX | CX | epsilon$. Of course, this
+solution can still accept input that would not have been accepted
+by the original parser, for instance if a parser is started
+conditionally, based on other semantic information. However, it
+is a big improvement over the first solution, both in performance
+and in the input it accepts.
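+.LP
+As an illustration, a rule using the directive could look as follows;
+the rule, the start symbol \fIdefmodule\fP and the routine
+\fIget_and_parse_defmodule\fP are invented for this example, but the
+form of the directive is as described above:
+.br
+.nf
+
+	import_statement :
+		FROM IDENT IMPORT
+		%substart defmodule;
+		{
+			/* action that may start the parser
+			 * generated for start symbol defmodule
+			 */
+			get_and_parse_defmodule();
+		}
+		identifier_list
+	;
+.fi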
+
+.NH 3
+Syntactic errors being handled in semantic actions
+
+.LP
+A programmer may decide to handle certain syntactic errors
+in semantic actions, for instance because he is not satisfied with
+the standard error recovery. However, since the non-correcting error
+recovery does not execute semantic actions, this may cause errors
+to remain undetected. We encountered the following example in the ACK
+Modula-2 compiler, in the grammar rule for the assignment statement:
+.br
+.nf
+
+
+	Assignment_statement: lvalue
+	[
+		'='
+		{
+			error(":= expected");
+		}
+
+	|
+
+		':='
+	]
+	expression
+	;
+
+.fi
+.LP
+This works well in the original LLgen; however, statements like
+`j=9' are not treated as syntactic, but as semantic errors.
+The original LLgen-generated parser
+will print the (semantic) error message, but the non-correcting recovery
+will not execute the semantic action, and therefore the erroneous
+input will be accepted.
+
+.LP
+To facilitate the incorporation of non-correcting error recovery in parsers
+that use this kind of `trick', we extended LLgen with the %erroneous
+directive. The directive indicates to the non-correcting recovery
+mechanism that the token following it is not really part of the grammar.
+When recognizing input, the error recovery will ignore tokens in the
+grammar that have %erroneous in front of them. If, in the example above,
+the '=' is replaced with %erroneous '=', the non-correcting mechanism will
+report an error when it sees a statement like 'j = 9'. See appendix B
+for details about the implementation of the %erroneous directive.
+
+.LP
+Another example is in the ACK C compiler. For some reason, the
+grammar accepts function definitions without `()', so according
+to the syntax a function definition can look like:
+.br
+.nf
+
+	int func
+	{
+		....
+	}
+.fi
+
+.LP
+The absence of the `()', however, causes `func' to be entered in the
+symbol table as a non-function, and when the parser encounters the body,
+a semantic action will complain with the error message "Making function body
+for non-function". This again will cause the non-correcting error
+recovery to miss errors. Consider this piece of code:
+.br
+.nf
+
+int i int j = 1;
+{}
+
+.fi
+
+.LP
+where apparently there is a `;' missing between the declarations
+of i and j. The original LLgen-generated parser only gives semantic errors:
+.br
+.nf
+"Making function body for non-function"
+"j is not in parameter list"
+"Illegal initialization of formal parameter, ignored"
+.fi
+.LP
+As a result, the non-correcting error recovery will not report
+any errors in this piece of code, because it does not execute the
+semantic actions that recognize and report the error. Unfortunately,
+due to the way the C-grammar is written, it is not possible to solve
+this problem using a %erroneous directive; the part of the grammar
+that deals with declarations would have to be rewritten so as to
+syntactically reject functions without `()'.
We feel that it is bad
practice to write parsers this way; the `eating' of parts of
the input should be done in the lexical analyzer, not in the parser.
After all, in the case of LLgen, one can regard a semantic action
in the input as one token, and it should therefore be handled
as such by the lexical analyzer.

.NH 2
Examples of error recovery

.LP
We will now give some examples that compare non-correcting error
recovery with the correcting error recovery used by parsers generated
by `standard' LLgen.

Consider the following C program, where there is a `)' missing in the
header of function `test'.
.br
.nf

     1  int test(a,b
     2
     3  int a,b;
     4
     5  {
     6      if (a < b)
     7          return(1);
     8      else
     9          return(0);
    10  }
.fi

.LP
This small error derails the `standard' parser; it produces the
following error messages, where we have left out 7 messages reporting
semantic errors:
.br
.nf

    line 3: , missing before type_identifier
    line 3: , missing before identifier
    line 3: ) missing before ;
    line 5: { deleted
    line 6: if deleted
    line 6: < deleted
    line 6: ) missing before identifier
    line 6: ) deleted
    line 7: identifier missing before return
    line 7: ; missing before return
    line 7: { missing before return
    line 8: else deleted

.fi
.LP
In contrast, the parser using non-correcting error recovery produces
only one error message:
.br

    line 3: type_identifier illegal

This error message correctly pinpoints the error: there should
have been a `)' at the position where the type-identifier `int' is.

.LP
Now, an example with Modula-2; consider this program:
.br
.nf

     1  MODULE test;
     2
     3  TYPES
     4      ElementRecordType = RECORD
     5          Element: ElementType;
     6          Next,
     7          Prior: ElementPointerType;
     8      END;
     9
    10  VARS a,b,c: ElementRecordType;
    11
    12
    13  BEGIN
    14
    15      a := b;
    16
    17  END test.

.fi
.LP
There are two syntactic errors in this program: on line 3, TYPES should be TYPE, and
on line 10, VARS should be VAR. We have left out the type declarations of
ElementType and ElementPointerType; clearly this will generate semantic
errors, but we are only interested in syntactic errors anyway.
The correcting error recovery parser
again derails on this program; it produces the following syntactic error messages:
.br
.nf

    line 3: CONST missing before identifier
    line 4: '=' missing before identifier
    line 4: RECORD deleted
    line 5: ':' deleted
    line 5: ';' missing before identifier
    line 5: '=' missing before ';'
    line 5: number missing before ';'
    line 6: ',' deleted
    line 7: '=' missing before identifier
    line 7: ':' deleted
    line 7: ';' missing before identifier
    line 7: '=' missing before ';'
    line 7: number missing before ';'
    line 8: ';' deleted
    line 10: identifier deleted
    line 10: ',' deleted
    line 10: identifier deleted
    line 10: ',' deleted
    line 10: identifier deleted
    line 10: ':' deleted
    line 10: identifier deleted
    line 10: ';' deleted
    line 13: BEGIN deleted
    line 15: identifier deleted
    line 15: := deleted
    line 15: identifier deleted
    line 15: ';' deleted
    line 17: END deleted
    line 17: identifier deleted

.fi
.LP
The error correction mechanism clearly makes the wrong guess by inserting
CONST on line 3; as a result, all that follows is rejected as incorrect.
In contrast, the non-correcting error recovery mechanism produces
only two error messages:
.br
.nf

    line 3: identifier illegal
    line 10: identifier illegal

.fi
.LP
This again exactly pinpoints the errors: the identifiers TYPES and
VARS constitute the only errors in the program. Note that the
presence of more than one error does not cause any problems for the
non-correcting recovery mechanism.

.bp
.nr PS 12
.nr VS 14

.NH
Conclusion

.nr PS 10
.nr VS 12

.LP
After implementing and testing a non-correcting error recovery mechanism,
we have come to the conclusion that it is indeed superior to correcting
mechanisms as regards the error messages it produces;
the examples we have given clearly show this. However, there is a
clear loss of performance when errors are present in a program,
although we have found this performance
degradation to be acceptable. We feel that the benefits of
better error messages outweigh the loss of performance. In any case,
correct programs do not suffer at all from the incorporation
of a non-correcting recovery mechanism.
The error recovery mechanism we implemented does not make
unreasonable demands on resources; the size of the prediction
graphs stays within reasonable limits.

.LP
The main problems we encountered had to do with recognizing
`languages within languages', and with semantic actions that did
unreasonable things like eating input. The more `well-behaved' a
parser is, the better the results the non-correcting error recovery
mechanism gives. This is also true for the input grammars: with a
language like Modula-2, whose syntax has been designed with parser
generators in mind, the performance of the non-correcting mechanism
is better than with C, whose syntax is extremely hard, if not
impossible, to describe with an LL(1) grammar.

.bp
.nr PS 12
.nr VS 14

.NH
Bibliography

.nr PS 10
.nr VS 12

.IP [CORMACK] 12
Gordon V. Cormack, `An LR substring parser for noncorrecting syntax error
recovery', ACM SIGPLAN Notices, vol. 24, no. 7, p. 161-169, July 1989

.IP [GRUNE] 12
Dick Grune, Ceriel J.H. Jacobs, `A programmer friendly LL(1) parser
generator', Softw. Pract. Exper., vol. 18, no. 1, p. 29-38, Jan 1988

.IP [RICHTER] 12
Helmut Richter, `Noncorrecting syntax error recovery', ACM Trans. Prog. Lang.
Sys., vol. 7, no. 3, p. 478-489, July 1985

.IP [ROEHRICH] 12
Johannes R\*:ohrich, `Methods for the automatic construction of error
correcting parsers', Acta Inform., vol. 13, no. 2, p. 115-139, Feb 1980

.IP [TOMITA] 12
Masaru Tomita, Efficient parsing for natural language, Kluwer Academic
Publishers, Boston, p. 210, 1986
.bp
.SH
Appendix A: Implementation Issues

.nr PS 10
.nr VS 12
.RS
.LP
In this appendix we describe some implementation issues:
the data structure used to store the grammar during non-correcting
error recovery, the postponing of deletions of graph elements until
after the prediction phase, the clearing of flags, and the
implementation of the %erroneous directive.
.RE

.SH
A.1 The grammar data structure

.LP
The grammar data structure used by the non-correcting error recovery technique has
to meet two conditions: easy access to a rule as a whole, to make
substituting nonterminals efficient, and easy access to each symbol in the RHS
of a rule, to make starting error recovery and finding continuations
efficient. To fulfill these conditions we decided to construct the
storage of the grammar as follows.

.LP
A rule in the grammar is divided into two
parts: an LHS and an RHS.
The LHS is represented by a struct `lhs', and
for each symbol in the RHS a struct `symbol' is constructed.
A struct `lhs' contains the number of the
nonterminal forming the LHS of the rule, a pointer to the RHS, the
first and follow sets of the nonterminal, and a flag `empty' which
indicates whether or not the nonterminal produces empty. A struct
`symbol' contains a field indicating the type of the symbol, i.e.
a terminal or a nonterminal, the number of the symbol, a `link' pointer
to a struct `symbol' that represents the same symbol, a `next' pointer
to the rest of the RHS, and a pointer back to the LHS.

.LP
A special struct `symbol' is added to the end of the RHS to indicate
the end of a rule. The type of this struct is LLEORULE, the number
is set to -1, and the pointers `link' and `next' are nil.

.LP
When there is more than one RHS for an LHS, the RHS's
are placed one after another, separated by another special struct
`symbol'. The type of this struct is LLALT, the number is set to
-1, and the `link' pointer is nil. After the last RHS an
`LLEORULE'-struct marker is added.

.LP
Finally, to make searching efficient there are two arrays: `terminals'
and `nonterminals'. `terminals' is indexed by the number of a terminal
and contains for each terminal a struct with a `link' pointer to a
symbol, in the RHS of some rule, that represents this terminal. Because
this symbol in turn has a `link' pointer to another symbol representing
the same terminal, all rules containing a given terminal can be found
by following this chain of pointers. In a similar way `nonterminals'
is indexed by the number of a nonterminal and contains for each
nonterminal a struct. This struct not only contains a `link' pointer
linking all rules with this nonterminal, but also a `rule'
pointer. This pointer points to the RHS or RHS's of the rules of which
the nonterminal forms the LHS.

.LP
As an example, consider the following grammar:

.br
A: a B
.br
B: a | $epsilon$
.br

This will result in the picture below. Note that `pointer' fields
without an arrow indicate nil pointers.
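.LP
In C, the two structs just described might be declared roughly as
follows. This is only a sketch: the field names follow the text, but
the details, in particular the representation of the first and follow
sets, are assumptions and may differ from the actual LLgen declarations.
.br
.nf

    struct symbol {
        int     type;           /* LLTERM, LLNONTERM, LLALT or LLEORULE */
        int     number;         /* symbol number; -1 for LLALT and LLEORULE */
        struct symbol *link;    /* next occurrence of this symbol in some RHS */
        struct symbol *next;    /* rest of this RHS */
        struct lhs *lhs;        /* back pointer to the LHS of this rule */
    };

    struct lhs {
        int     number;         /* number of the nonterminal */
        struct symbol *rhs;     /* the RHS (or RHS's) of this rule */
        char    *first;         /* first set (representation assumed) */
        char    *follow;        /* follow set (representation assumed) */
        char    empty;          /* nonzero if the nonterminal produces empty */
    };

.fi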
+ +.PS +dx = 0.05 + +down +A_a: box ht boxht/2 "link" +box invis "a" ljust with .e at A_a.w + +move to A_a.s +move +move + +A: box "link" "rule" +B: box "link" "rule" +line dashed from A.w to A.e +line dashed from B.w to B.e +box invis "A" ljust with .e at A.w +box invis "B" ljust with .e at B.w + +move to A.ne +right +move +move +down + +LHS_A: box wid 1.2 * boxwid ht 2.5 * boxht "`A'" "rhs" "first" "follow" "empty 0" +line dashed from 0.2 to 0.2 +line dashed from 0.4 to 0.4 +line dashed from 0.6 to 0.6 +line dashed from 0.8 to 0.8 + +move to LHS_A.ne + (1,0) + +RHS_a1: box wid 2.0 * boxwid ht 2.5 * boxht "LLTERM" "`a'" "link" "next" "lhs" +line dashed from 0.2 to 0.2 +line dashed from 0.4 to 0.4 +line dashed from 0.6 to 0.6 +line dashed from 0.8 to 0.8 + +move to RHS_a1.ne + (1,0) + +RHS_B: box wid 2.0 * boxwid ht 2.5 * boxht "LLNONTERM" "`B'" "link" "next" "lhs" +line dashed from 0.2 to 0.2 +line dashed from 0.4 to 0.4 +line dashed from 0.6 to 0.6 +line dashed from 0.8 to 0.8 + +move to RHS_B.ne + (1,0) + +RHS_END1: box wid 2.0 * boxwid ht 2.5 *boxht "LLEORULE" "-1" "link" "next" "lhs" +line dashed from 0.2 to 0.2 +line dashed from 0.4 to 0.4 +line dashed from 0.6 to 0.6 +line dashed from 0.8 to 0.8 + + +move to LHS_A.s - (0,1) + +LHS_B: box wid 1.2 * boxwid ht 2.5 * boxht "`B'" "rhs" "first" "follow" "empty 1" +line dashed from 0.2 to 0.2 +line dashed from 0.4 to 0.4 +line dashed from 0.6 to 0.6 +line dashed from 0.8 to 0.8 + +move to LHS_B.ne + (1,0) + +RHS_a2: box wid 2.0 * boxwid ht 2.5 * boxht "LLTERM" "`a'" "link" "next" "lhs" +line dashed from 0.2 to 0.2 +line dashed from 0.4 to 0.4 +line dashed from 0.6 to 0.6 +line dashed from 0.8 to 0.8 + +move to RHS_a2.ne + (1,0) + +RHS_ALT: box wid 2.0 * boxwid ht 2.5 * boxht "LLALT" "-1" "link" "next" "lhs" +line dashed from 0.2 to 0.2 +line dashed from 0.4 to 0.4 +line dashed from 0.6 to 0.6 +line dashed from 0.8 to 0.8 + +move to RHS_ALT.ne + (1,0) + +RHS_END2: box wid 2.0 * boxwid ht 2.5 *boxht "LLEORULE" "-1" "link" "next" "lhs" +line dashed from 0.2 to 0.2 +line dashed from 0.4 to 0.4 +line dashed from 0.6 to 0.6 +line dashed from 0.8 to 0.8 + +# Next pointers upper row +.ps 30 +circle radius .01 at 0.75 - (dx, 0) +circle radius .01 at 0.3 - (dx, 0) +circle radius .01 at 0.7 - (dx, 0) +circle radius .01 at 0.7 - (dx, 0) +.ps 10 + +arrow from 0.75 - (dx, 0) to 0.3 +arrow from 0.3 - (dx, 0) to 0.3 +arrow from 0.7 - (dx, 0) to 0.7 +arrow from 0.7 - (dx, 0) to 0.7 + + +# Next pointers lower row +.ps 30 +circle radius .01 at 0.75 - (dx, 0) +circle radius .01 at 0.3 - (dx, 0) +circle radius .01 at 0.7 - (dx, 0) +circle radius .01 at 0.7 - (dx, 0) +.ps 10 + +arrow from 0.75 - (dx, 0) to 0.3 +arrow from 0.3 - (dx, 0) to 0.3 +arrow from 0.7 - (dx, 0) to 0.7 +arrow from 0.7 - (dx, 0) to 0.7 + + +# Link pointers +.ps 30 +circle radius .01 at 0.5 - (2*dx, 0) +circle radius .01 at 0.5 - (dx, 0) +circle radius .01 at 0.25 - (dx, 0) +.ps 10 + +arrow dashed from 0.5 - (2*dx, 0) to RHS_a2.ne - (2*dx,0) +line dashed from 0.5 - (dx, 0) right 4.0 * boxwid then to RHS_a1.ne - (2*dx, 0) -> +line dashed from 0.25 - (dx, 0) right then up .75 then right 7.0 * boxwid then to RHS_B.ne - (2*dx, 0) -> + + +# LHS pointers upper row +.ps 30 +circle radius .01 at 0.9 - (3*dx, 0) +circle radius .01 at 0.9 - (3*dx, 0) +circle radius .01 at 0.9 - (3*dx, 0) +.ps 10 + +line from 0.9 - (3*dx, 0) down -> +line from 0.9 - (3*dx, 0) down -> +line from 0.9 - (3*dx, 0) down then left 8.0 * boxwid then to LHS_A.se -> + + +# LHS pointers lower row +.ps 30 +circle radius .01 at 
0.9 - (3*dx, 0)
circle radius .01 at 0.9 - (3*dx, 0)
circle radius .01 at 0.9 - (3*dx, 0)
.ps 10

line from 0.9 - (3*dx, 0) down ->
line from 0.9 - (3*dx, 0) down ->
line from 0.9 - (3*dx, 0) down then left 8.0 * boxwid then to LHS_B.se ->


# Text above structs
box invis ht boxht/2 "terminals" with .s at A_a.n
box invis ht boxht/2 "nonterminals" with .s at A.n
box invis ht boxht/2 "lhs" with .s at LHS_A.n
box invis ht boxht/2 "lhs" with .s at LHS_B.n
box invis ht boxht/2 "symbol" with .s at RHS_a1.n
box invis ht boxht/2 "symbol" with .s at RHS_B.n
box invis ht boxht/2 "symbol" with .s at RHS_END1.n
box invis ht boxht/2 "symbol" with .s at RHS_a2.n
box invis ht boxht/2 "symbol" with .s at RHS_ALT.n
box invis ht boxht/2 "symbol" with .s at RHS_END2.n
.PE

.LP
Note that the empty alternative for `B' is represented in the
data structure by the `LLEORULE'-struct immediately following
the `LLALT'-struct. When there are still other alternatives,
the `LLEORULE'-struct is replaced by an `LLALT'-struct followed
by the other alternatives and an `LLEORULE'-struct.
Finally, when the empty rule is the only rule for a
nonterminal, the RHS will consist only of an `LLEORULE'-struct.

.SH
A.2 Delayed deletes

.LP
We encountered a problem with deleting elements during the
prediction phase. Imagine that we have a nonterminal `B' on top of
the graph, and `B' has two alternatives. Now suppose that we
apply the first alternative and find out that it leads
to a `dead end', i.e. a head that does not match the input symbol, so we want
to get rid of it. If we delete it immediately, the deletion algorithm
will also deallocate `[B]' and possibly some elements below `[B]'.
However, there was another alternative for `[B]' which was not yet
developed, and this alternative may lead to a head which is legal.
But `[B]' has already been deleted and thus cannot be used anymore. A similar
situation can occur when we want to delete a joined element; the
substitution of a nonterminal that only produces empty, and thus has
no element above it in the graph, can also lead to such a situation.
We therefore decided to put `dead ends' on a list, `cleanup_arr[]';
after the prediction phase has finished, we delete all elements on
this list, together with all their descendants that thereby become
unreachable.

.SH
A.3 Clearing flags

.LP
We implemented two different ways to clear the flags set by the prediction
phase of the algorithm: the first recursively tracks down the whole graph,
following the flags; the second puts all elements visited by
the prediction phase
on a list, and after the prediction phase has finished the algorithm walks
through this list, clearing the flags of all elements on it. We took
measurements of both algorithms and found that for small programs the times
did not differ much, but large programs were processed faster by the
second algorithm. We therefore decided to use the second algorithm.

.LP
To speed up the algorithm even more, we do not deallocate the list
after a prediction phase has finished; we just set the number of
elements on the list to 0. This saves considerably on the number
of `Malloc' calls.

.SH
A.4 Implementation of the %erroneous directive

.LP
As explained in chapter 3, the user can put a %erroneous directive
in front of a terminal, making the non-correcting error recovery
mechanism ignore that terminal.
However, implementing this directive
was not entirely straightforward; consider, for example, the rule
.br
.nf

    A: 'a' | %erroneous 'b' | 'c';

.fi
.LP
Just leaving out the terminal 'b' will not do, because then nonterminal
A suddenly produces empty, which it did not before.
The rule should become
.br
.nf

    A: 'a' | 'c';

.fi
but this is hard to implement in LLgen. We took a different approach:
we introduce a new terminal `ERRONEOUS', and substitute it for all
terminals with a %erroneous directive in front of them. Thus, the
example rule becomes
.br
.nf

    A: 'a' | ERRONEOUS | 'c';

.fi
.LP
Since the terminal ERRONEOUS will never be in the input to the parser,
this has exactly the desired effect; when a prediction phase produces
ERRONEOUS as the head of a prediction graph, this head will never match
the input. In particular, it will not match the terminal that was
originally there (in this case 'b'), so that terminal is no longer
regarded as part of the input language at that point.
.bp
.SH
Appendix B: Using the non-correcting error recovery

.LP
To use the new non-correcting error recovery mechanism, LLgen has to
be called with the new flag -n. LLgen will then create an extra file
called `Lncor.c' which contains the code for the non-correcting recovery
mechanism. This file has to be compiled and linked with the rest
of the program, just like the file `Lpars.c'.

.LP
The user-supplied error reporting routine `LLmessage' has to be
modified slightly: when it is called with a positive parameter, it
should only set the attributes of the inserted token, but not report an
error. Note that the lexical analyzer must still return the same token
as it did the last time it was called. When LLmessage is called with
parameter 0, it should report that the token in the global variable LLsymb
is illegal; if the value of LLsymb is `EOFILE', the routine should
report an unexpected end-of-file. When LLmessage is called with parameter
-1, it should report that end-of-file was expected. To facilitate
switching between correcting and non-correcting error recovery,
the file Lpars.h contains a statement `#define LLNONCORR'
which indicates that the non-correcting
mechanism is enabled.
Here is a
skeleton for the modified LLmessage routine:
.nr PS 8
.nr VS 10
.LP
.br
.nf

    #include "Lpars.h"
    extern int LLsymb;

    LLmessage(flag)
    int flag;
    {
        if (flag < 0)
        {
            /* Error message "end-of-file expected" */;
        }
        else if (flag)
        {
            /* flag equals the number of the inserted token */
#ifndef LLNONCORR
            /* Error message "token inserted" */;
#endif
            /* Code to set attributes for inserted token */
            /* Code to make lexical analyzer return same token as before */
        }
        else
        {
            /* The number of the illegal or deleted token is in LLsymb */
#ifndef LLNONCORR
            /* Error message "token deleted" */;
#else
            if (LLsymb == EOFILE)
            {
                /* Error message "unexpected end of file" */;
            }
            else
            {
                /* Error message "token illegal" */;
            }
#endif
        }
    }

.fi
.nr PS 10
.nr VS 12

.LP
For best results, one should check whether the parser calls other parsers
in semantic actions; if this is the case, and the called parser
processes the same input file as the calling parser, then a %substart
should be put in front of the semantic action that starts a parser.
If a semantic action calls parsers defined by startsymbols, say,
A and B, then `%substart A, B;' should be put in front of the action.
As an alternative, one can use the -s flag of LLgen; this has the
same effect as putting `%substart X, Y, ....;' in front of all
semantic actions, where X, Y, .... are the startsymbols of the grammar.
Clearly, it is preferable to analyze the grammar and put %substart
directives only where appropriate.

Finally, beware of syntactic errors being handled in semantic
actions; e.g., one could have a rule like
.nr PS 8
.nr VS 10
.LP
.br
.nf

    Assignment_statement: lvalue
        [
            '='
            {
                error(":= expected");
            }

        |

            ':='
        ]
        expression
    ;
.fi

.nr PS 10
.nr VS 12
.LP
To ensure that the non-correcting mechanism will recognize the
`=' as a syntactic error, a `%erroneous' directive should be
put in front of it.
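.LP
With the directive in place, the rule of the example reads:
.nr PS 8
.nr VS 10
.LP
.br
.nf

    Assignment_statement: lvalue
        [
            %erroneous '='
            {
                error(":= expected");
            }

        |

            ':='
        ]
        expression
    ;
.fi

.nr PS 10
.nr VS 12
.LP
The non-correcting recovery then no longer regards `=' as part of the
grammar at this point, and reports a syntactic error for statements
like `j=9'.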