From: sater <none@none>
Date: Fri, 29 Jun 1984 11:21:50 +0000 (+0000)
Subject: Initial revision
X-Git-Tag: release-5-5~6238
X-Git-Url: https://git.ndcode.org/public/gitweb.cgi?a=commitdiff_plain;h=59c2380f8502ef88271d857d8921b5e00908f1c7;p=ack.git

Initial revision
---

diff --git a/doc/Makefile b/doc/Makefile
new file mode 100644
index 000000000..b8b84410f
--- /dev/null
+++ b/doc/Makefile
@@ -0,0 +1,39 @@
+SUF=pr
+PRINT=cat
+RESFILES=cref.$(SUF) pcref.$(SUF) val.$(SUF) v7bugs.$(SUF) install.$(SUF)\
+ack.$(SUF) cg.$(SUF) regadd.$(SUF) peep.$(SUF) toolkit.$(SUF)
+NROFF=nroff
+
+cref.$(SUF):        cref.doc
+		tbl $? | $(NROFF) >$@
+v7bugs.$(SUF):      v7bugs.doc
+		$(NROFF) -ms $? >$@
+ack.$(SUF):         ack.doc
+		$(NROFF) -ms $? >$@
+cg.$(SUF):		cg.doc
+		$(NROFF) -ms $? >$@
+regadd.$(SUF):		regadd.doc
+		$(NROFF) -ms $? >$@
+install.$(SUF):     install.doc
+		$(NROFF) -ms $? >$@
+pcref.$(SUF):       pcref.doc
+		$(NROFF) $? >$@
+peep.$(SUF):	peep.doc
+		$(NROFF) -ms $? >$@
+val.$(SUF):         val.doc
+		$(NROFF) $? >$@
+toolkit.$(SUF):	toolkit.doc
+		$(NROFF) -ms $? >$@
+
+install cmp:
+
+pr:
+		@make "SUF=$(SUF)" "NROFF=$(NROFF)" "PRINT=$(PRINT)" $(RESFILES) \
+			>make.pr.out 2>&1
+		@$(PRINT) $(RESFILES)
+
+opr:
+		make pr | opr
+
+clean:
+		-rm -f *.old $(RESFILES) *.t
diff --git a/doc/ack.doc b/doc/ack.doc
new file mode 100644
index 000000000..258ed8bb0
--- /dev/null
+++ b/doc/ack.doc
@@ -0,0 +1,419 @@
+.nr LL 7.5i
+.tr ~
+.nr PD 1v
+.TL
+Ack Description File
+.br
+Reference Manual
+.AU
+Ed Keizer
+.AI
+Wiskundig Seminarium
+Vrije Universiteit
+Amsterdam
+.NH
+Introduction
+.PP
+The program \fIack\fP(I) internally maintains a table of
+possible transformations and a table of string variables.
+The transformation table contains one entry for each possible
+transformation of a file.
+Which transformations are used depends on the suffix of the
+source file.
+Each transformation table entry tells which input suffixes are
+allowed and what suffix/name the output file has.
+When the output file does not already satisfy the request of the
+user, with the flag \fB-c.suffix\fP, the table is scanned
+starting with the next transformation in the table for another
+transformation that has as input suffix the output suffix of
+the previous transformation.
+A few special transformations are recognized, among them the
+combiner,
+a program combining several files into one.
+When no stop suffix was specified (flag \fB-c.suffix\fP) \fIack\fP
+stops after executing the combiner with as arguments the -
+possibly transformed - input files and libraries.
+\fIAck\fP will only perform the transformations in the order in
+which they are presented in the table.
+.LP
+The string variables are used while creating the argument list
+and program call name for
+a particular transformation.
+.NH
+Which descriptions are used
+.PP
+\fIAck\fP always uses two description files: one to define the
+front-end transformations and one for the machine dependent
+back-end transformations.
+Each description has a name.
+First the way of determining
+the name of the descriptions needed is described.
+.PP
+When the shell environment variable ACKFE is set \fIack\fP uses
+that to determine the front-end table name, otherwise it uses
+\fBfe\fP.
+.PP
+The way the backend table name is determined is more
+convoluted.
+.br
+First, when the last filename in the program call name is not
+one of \fIack\fP, \fIcc\fP, \fIacc\fP, \fIpc\fP or \fIapc\fP,
+this filename is used as the backend description name.
+Second, when the \fB-m\fP flag is present, the \fB-m\fP is chopped off this
+flag and the rest is used as the backend description name.
+Third, when both failed the shell environment variable ACKM is
+used.
+Last, when ACKM is not present either, the default backend is
+used, determined by the definition of ACKM in h/local.h.
+The presence and value of the definition of ACKM is
+determined at compile time of \fIack\fP.
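+.PP
+As a rough illustration only, the four steps above might be sketched
+in C as follows.
+The function backend_name and its parameters are hypothetical and
+are not taken from the \fIack\fP sources; mflag is assumed to hold
+the complete \fB-m\fP flag, or NULL when the flag was not given, and
+compiled_default stands for the definition of ACKM in h/local.h.
+.DS X
+#include <stdlib.h>
+#include <string.h>
+
+/* Hypothetical helper: pick the backend description name. */
+const char *backend_name(const char *callname, const char *mflag,
+                         const char *compiled_default)
+{
+    static const char *generic[] = { "ack", "cc", "acc", "pc", "apc" };
+    const size_t ngeneric = sizeof generic / sizeof generic[0];
+    const char *base = strrchr(callname, '/');
+    const char *env;
+    size_t i;
+
+    base = base ? base + 1 : callname;  /* last filename of call name */
+    for (i = 0; i < ngeneric; i++)
+        if (strcmp(base, generic[i]) == 0)
+            break;
+    if (i == ngeneric)
+        return base;                /* first: the call name itself */
+    if (mflag != NULL)
+        return mflag + 2;           /* second: flag without its -m */
+    env = getenv("ACKM");
+    if (env != NULL)
+        return env;                 /* third: environment variable */
+    return compiled_default;        /* last: compile time default */
+}
+.DE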
+.PP
+Now, we have the names, but that is only the first step.
+\fIAck\fP stores a few descriptions at compile time.
+These descriptions are simply files read in at compile time.
+At the moment of writing this document, the descriptions
+included are: pdp, fe, i86, m68k2, vax2 and int.
+The name of a description is first searched for internally,
+then in the directory lib/ack and finally in the current
+directory of the user.
+.NH
+Using the description file
+.PP
+Before starting on a narrative of the description file,
+the introduction of a few terms is necessary.
+All these terms are used to describe the scanning of zero
+terminated strings, thereby producing another string or
+sequence of strings.
+.IP Backslashing 5
+.br
+All characters preceded by \e are modified to prevent
+recognition at further scanning.
+This modification is undone before a string is passed to the
+outside world as argument or message.
+When reading the description files the
+sequences \e\e, \e# and \e<newline> have a special meaning.
+\e\e translates to a single \e, \e# translates to a single #
+that is not
+recognized as the start of comment, but can be used in
+recognition and finally, \e<newline> translates to nothing at
+all, thereby allowing continuation lines.
+.nr PD 0
+.IP "Variable replacement"
+.br
+The scan recognizes the sequences {{, {NAME} and {NAME?text},
+where NAME can be any combination of characters excluding ? and
+} and text may be anything excluding }.
+(~\e} is allowed of course~)
+The first sequence produces an unescaped single {.
+The second produces the contents of NAME; definitions are
+made by \fIack\fP and in description files.
+When the NAME is not defined an error message is produced on
+the diagnostic output.
+The last sequence produces the contents of NAME if it is
+defined and text otherwise.
+.PP
+.IP "Expression replacement"
+.br
+Syntax:  (\fIsuffix sequence\fP:\fIsuffix sequence\fP=\fItext\fP)
+.br
+Example: (.c.p.e:.e=tail_em)
+.br
+If the two suffix sequences have a common member -~\&.e in this
+case~- the text is produced.
+When no common member is present the empty string is produced.
+Thus the example given is a constant expression.
+Normally, one of the suffix sequences is produced by variable
+replacement.
+\fIAck\fP sets three variables while performing the diverse
+transformations: HEAD, TAIL and RTS.
+All three variables depend on the properties \fIrts\fP and
+\fIneed\fP from the transformations used.
+Whenever a transformation is used for the first time,
+the text following the \fIneed\fP is appended to both the HEAD and
+TAIL variable.
+The value of the variable RTS is determined by the first
+transformation used with a \fIrts\fP property.
+.LP
+Two runtime flags have effect on the value of one or more of
+these variables.
+The flag \fB-.suffix\fP has the same effect on these three variables
+as if a file with that \fBsuffix\fP was included in the argument list
+and had to be translated.
+The flag \fB-r.suffix\fP only has that effect on the TAIL
+variable.
+The program call names \fIacc\fP and \fIcc\fP have the effect
+of an automatic \fB-.c\fP flag.
+\fIApc\fP and \fIpc\fP have the effect of an automatic \fB-.p\fP flag.
+.IP "Line splitting"
+.br
+The string is transformed into a sequence of strings by replacing
+the blank space by string separators (nulls).
+.IP "IO replacement"
+.br
+The > in the string is replaced by the output file name.
+The < in the string is replaced by the input file name.
+When multiple input files are present the string is duplicated
+for each input file name.
+.nr PD 1v
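+.LP
+The common-member test of expression replacement could be sketched
+in C as below.
+This is a minimal sketch under the assumption that a suffix sequence
+is a string of suffixes each starting with a dot, as in the example
+given; the name common_suffix is hypothetical and does not come from
+the \fIack\fP sources.
+.DS X
+#include <string.h>
+
+/* Return 1 when the suffix sequences a and b share a suffix,
+   0 otherwise.  Hypothetical helper for illustration. */
+int common_suffix(const char *a, const char *b)
+{
+    const char *p, *q;
+    size_t n, m;
+
+    for (p = a; *p == '.'; p += n) {
+        n = 1 + strcspn(p + 1, ".");    /* length of this suffix */
+        for (q = b; *q == '.'; q += m) {
+            m = 1 + strcspn(q + 1, ".");
+            if (n == m && strncmp(p, q, n) == 0)
+                return 1;
+        }
+    }
+    return 0;
+}
+.DE
+With the example given, common_suffix(".c.p.e", ".e") yields 1,
+so the text tail_em would be produced.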
+.LP
+Each description is a sequence of variable definitions followed
+by a sequence of transformation definitions.
+Variable definitions use a line each, transformation
+definitions consist of a sequence of lines.
+Empty lines are discarded, as are lines with nothing but
+comment.
+Comment is started by a # character, and continues to the end
+of the line.
+Three special two-character sequences exist: \e#, \e\e and
+\e<newline>.
+Their effect is described under 'backslashing' above.
+Each - nonempty - line starts with a keyword, possibly
+preceded by blank space.
+The keyword can be followed by a further specification.
+The two are separated by blank space.
+.PP
+Variable definitions use the keyword \fIvar\fP and look like this:
+.DS X
+   var NAME=text
+.DE
+The name can be any identifier, the text may contain any
+character.
+Blank space before the equal sign is not part of the NAME.
+Blank space after the equal is considered as part of the text.
+The text is scanned for variable replacement before it is
+associated with the variable name.
+.br
+.sp 2
+The start of a transformation definition is indicated by the
+keyword \fIname\fP.
+The last line of such a definition contains the keyword
+\fIend\fP.
+The lines in between associate properties to a transformation
+and may be presented in any order.
+The identifier after the \fIname\fP keyword determines the name
+of the transformation.
+This name is used for debugging and by the \fB-R\fP flag.
+The keywords are used to specify which input suffixes are
+recognized by that transformation,
+the program to run, the arguments to be handed to that program
+and the name or suffix of the resulting output file.
+Two keywords are used to indicate which run-time startoffs and
+libraries are needed.
+The possible keywords are:
+.IP \fIfrom\fP
+.br
+followed by a sequence of suffixes.
+Each file with one of these suffixes is allowed as input file.
+Preprocessor transformations, those with the \fBP\fP property
+after the \fIprop\fP keyword, do not need the \fIfrom\fP
+keyword. All other transformations do.
+.nr PD 0
+.IP \fIto\fP
+.br
+followed by the suffix of the output file name or in the case of a
+linker -~indicated by the C option after the \fIprop\fP keyword~-
+the output file name.
+.IP \fIprogram\fP
+.br
+followed by the name of the load file of the program, a pathname that most likely
+starts with either a / or {EM}.
+This keyword must be
+present, the remainder of the line
+is subject to backslashing and variable replacement.
+.IP \fImapflag\fP
+.br
+The mapflags are used to grab flags given to \fIack\fP and
+pass them on to a specific transformation.
+This feature uses a few simple pattern matching and replacement
+facilities.
+Multiple occurrences of this keyword are allowed.
+The text following the keyword is
+subjected to backslashing.
+The keyword is followed by a match expression and a variable
+assignment separated by blank space.
+As soon as both description files are read, \fIack\fP looks
+at all transformations in these files to find a match for the
+flags given to \fIack\fP.
+The flags \fB-m\fP, \fB-o\fP,
+\fB-O\fP, \fB-r\fP, \fB-v\fP, \fB-g\fP, \fB-c\fP, \fB-t\fP,
+\fB-k\fP, \fB-R\fP and \fB-.\fP are specific to \fIack\fP and
+not handed down to any transformation.
+The matching is performed in the order in which the entries
+appear in the definition.
+The scanning stops after the first match is found.
+When a match is found, the variable assignment is executed.
+A * in the match expression matches any sequence of characters,
+a * in the right hand part of the assignment is
+replaced by the characters matched by
+the * in the expression.
+The right hand part is also subject to variable replacement.
+The variable will probably be used in the program arguments.
+The \fB-l\fP flags are special,
+the order in which they are presented to \fIack\fP must be
+preserved.
+The identifier LNAME is used in conjunction with the scanning of
+\fB-l\fP flags.
+The value assigned to LNAME is used to replace the flag.
+The example further on shows the use of all this.
+.IP \fIargs\fP
+.br
+The keyword is followed by the program call arguments.
+It is subject to backslashing, variable replacement, expression
+replacement, line splitting and IO replacement.
+The variables assigned to by \fImapflags\fP will probably be
+used here.
+The flags not recognized by \fIack\fP or any of the transformations
+are passed to the linker and inserted before all other arguments.
+.IP \fIprop\fP
+.br
+This -~optional~- keyword is followed by a sequence of options,
+each option is indicated by one character
+signifying a special property of the transformation.
+The possible options are:
+.DS X
+   <            the input file will be read from standard input
+   >            the output file will be written on standard output
+   p            the input files must be preprocessed
+   m            the input files must be preprocessed when starting with #
+   O            this transformation is an optimizer and may be skipped
+   P            this transformation is the preprocessor
+   C            this transformation is the linker
+.DE
+.IP \fIrts\fP
+.br
+This -~optional~- keyword indicates that the rest of the line must be
+used to set the variable RTS, if it was not already set.
+Thus the variable RTS is set by the first transformation
+executed with such a property or as a result from \fIack\fP's program
+call name (acc, cc, apc or pc) or by the \fB-.suffix\fP flag.
+.IP \fIneed\fP
+.br
+This -~optional~- keyword indicates that the rest of the line must be
+concatenated to the NEEDS variable.
+This is done once for every transformation used or indicated
+by one of the program call names mentioned above or indicated
+by the \fB-.suffix\fP flag.
+.br
+.nr PD 1v
+.NH
+Conventions used in description files
+.PP
+\fIAck\fP reads two description files.
+A few of the variables defined in the machine specific file
+are used by the descriptions of the front-ends.
+Other variables, set by \fIack\fP, are of use to all
+transformations.
+.PP
+\fIAck\fP sets the variable EM to the home directory of the
+Amsterdam Compiler Kit.
+The variable SOURCE is set to the name of the argument that is currently
+being massaged; this is useful for debugging.
+.br
+The variable M indicates the
+directory in mach/{M}/lib/tail_..... and NAME is the string to
+be defined by the preprocessor with -D{NAME}.
+The definitions of {w}, {s}, {l}, {d}, {f} and {p} indicate
+EM_WSIZE, EM_SSIZE, EM_LSIZE, EM_DSIZE, EM_FSIZE and EM_PSIZE
+respectively.
+.br
+The variable INCLUDES is used as the last argument to \fIcpp\fP,
+it is currently used to add the directory {EM}/include to
+the list of directories containing #include files.
+{EM}/include contains a few files used by the library routines
+for part III from the
+.UX
+manual.
+These routines are included in the kit.
+.PP
+The variables HEAD, TAIL and RTS are set by \fIack\fP and used
+to compose the arguments for the linker.
+.NH
+Example
+.sp 1
+description for front-end
+.DS X
+name cpp                        # the C-preprocessor
+        # no from, it's governed by the P property
+        to .i                   # result files have suffix i
+        program {EM}/lib/cpp    # pathname of loadfile
+        mapflag -I* CPP_F={CPP_F?} -I*          # grab -I.. -U.. and
+        mapflag -U* CPP_F={CPP_F?} -U*          # -D.. to use as arguments
+        mapflag -D* CPP_F={CPP_F?} -D*          # in the variable CPP_F
+        args {CPP_F?} {INCLUDES?} -D{NAME} -DEM_WSIZE={w} -DEM_PSIZE={p} \
+-DEM_SSIZE={s} -DEM_LSIZE={l} -DEM_FSIZE={f} -DEM_DSIZE={d} <
+                                # The arguments are: first the -[IUD]...
+                                #  then the include dir's for this machine
+                                #  then the NAME and size values finally
+                                #  followed by the input file name
+        prop >P                 # Output on stdout, is preprocessor
+end
+name cem                        # the C-compiler proper
+        from .c                 # used for files with suffix .c
+        to .k                   # produces compact code files
+        program {EM}/lib/em_cem # pathname of loadfile
+        mapflag -p CEM_F={CEM_F?} -Xp   # pass -p as -Xp to cem
+        mapflag -L CEM_F={CEM_F?} -l    # pass -L as -l to cem
+        args -Vw{w}i{w}p{p}f{f}s{s}l{l}d{d} {CEM_F?}
+                                # the arguments are the object sizes in
+                                # the -V... flag and possibly -l and -Xp
+        prop <>p                # input on stdin, output on stdout, use cpp
+        rts .c                  # use the C run-time system
+        need .c                 # use the C libraries
+end
+name decode                     # make human readable files from compact code
+        from .k.m               # accept files with suffix .k or .m
+        to .e                   # produce .e files
+        program {EM}/lib/em_decode      # pathname of loadfile
+        args <                  # the input file name is the only argument
+        prop >                  # the output comes on stdout
+end
+.DE
+
+.DS X
+Example of a backend, in this case the EM assembler/loader.
+
+var w=2                         # wordsize 2
+var p=2                         # pointersize 2
+var s=2                         # short size 2
+var l=4                         # long size 4
+var f=4                         # float size 4
+var d=8                         # double size 8
+var M=int                       # Unused in this example
+var NAME=int22                  # for cpp (NAME=int results in #define int 1)
+var LIB=mach/int/lib/tail_      # part of file name for libraries
+var RT=mach/int/lib/head_       # part of file name for run-time startoff
+var SIZE_FLAG=-sm               # default internal table size flag
+var INCLUDES=-I{EM}/include     # use {EM}/include for #include files
+name asld                       # Assembler/loader
+        from .k.m.a             # accepts compact code and archives
+        to e.out                # output file name
+        program {EM}/lib/em_ass         # load file pathname
+        mapflag -l* LNAME={EM}/{LIB}*   # e.g. -ly becomes
+                                        #   {EM}/mach/int/lib/tail_y
+        mapflag -+* ASS_F={ASS_F?} -+*  # recognize -+ and --
+        mapflag --* ASS_F={ASS_F?} --*
+        mapflag -s* SIZE_FLAG=-s*       # overwrite old value of SIZE_FLAG
+        args {SIZE_FLAG} \
+                ({RTS}:.c={EM}/{RT}cc) ({RTS}:.p={EM}/{RT}pc) -o > < \
+                (.p:{TAIL}={EM}/{LIB}pc) \
+                (.c:{TAIL}={EM}/{LIB}cc.1s {EM}/{LIB}cc.2g) \
+                (.c.p:{TAIL}={EM}/{LIB}mon)
+                # -s[sml] must be first argument
+                # the next line contains the choice for head_cc or head_pc
+                # and the specification of in- and output.
+                # the last three args lines choose libraries
+        prop C  # This is the final stage
+end
+.DE
+
+The command "ack -mint -v -v -I../h -L -ly prog.c"
+ would result in the following
+calls (with exec(II)):
+.DS X
+1)  /usr/em/lib/cpp -I../h -I/usr/em/include -Dint22 -DEM_WSIZE=2 -DEM_PSIZE=2
+      -DEM_SSIZE=2 -DEM_LSIZE=4 -DEM_FSIZE=4 -DEM_DSIZE=8 prog.c
+2)  /usr/em/lib/em_cem -Vw2i2p2f4s2l4d8 -l
+3)  /usr/em/lib/em_ass -sm /usr/em/mach/int/lib/head_cc -o e.out prog.k
+      /usr/em/mach/int/lib/tail_y /usr/em/mach/int/lib/tail_cc.1s
+      /usr/em/mach/int/lib/tail_cc.2g /usr/em/mach/int/lib/tail_mon
+.DE
diff --git a/doc/cg.doc b/doc/cg.doc
new file mode 100644
index 000000000..1a7b003c7
--- /dev/null
+++ b/doc/cg.doc
@@ -0,0 +1,1832 @@
+.RP
+.TL
+The table driven code generator from 
+.br
+the Amsterdam Compiler Kit
+.AU
+Hans van Staveren
+.AI
+Dept. of Mathematics and Computer Science
+Vrije Universiteit
+Amsterdam, The Netherlands
+.AB
+It is possible to automate the process of compiler building
+to a great extent using collections of tools.
+The Amsterdam Compiler Kit is such a collection of tools.
+This document provides a description of the internal workings
+of the table driven code generator in the Amsterdam Compiler Kit,
+and a description of syntax and semantics of the driving table.
+.AE
+.NH 1
+Introduction
+.PP
+Part of the Amsterdam Compiler Kit is a code generator system consisting
+of a code generator generator (\fIcgg\fP for short) and some machine
+independent C code.
+.I Cgg
+reads a machine description table and creates two files,
+tables.h and tables.c.
+These are then used together with other C code to produce
+a code generator for the machine at hand.
+.PP
+This in turn reads compact EM code and produces
+assembly code.
+The remainder of this document will first broadly describe
+the working of the code generator,
+then a description of the machine table follows after which
+the internal workings of the code generator will be explained.
+.PP
+The reader is assumed to have at least a vague notion about the
+semantics of the intermediary EM code.
+Someone wishing to write a table for a new machine
+should be thoroughly acquainted with EM code
+and the assembly code of the machine at hand.
+.NH 1
+Global overview of the workings of the code generator.
+.PP
+The code generator or
+.I cg
+tries to generate good code by simulating the runtime stack
+of the program compiled and delaying emission of code as long
+as possible.
+It also keeps track of register contents, which enables it to
+eliminate redundant moves, and tries to eliminate redundant tests
+by keeping information about condition code status,
+if applicable for the machine.
+.PP
+.I Cg
+maintains a `fakestack' containing `tokens' that are built
+by executing the pseudo code contained in the code rules given
+by the table writer.
+One can think of the fakestack as a logical extension of the real
+stack the program compiled will have when run.
+During code generation tokens will be kept on the fakestack as long
+as possible but when they are moved to the real stack,
+by generating code for the push,
+all tokens above\u*\d
+.FS
+* in the rest of this document the stack is assumed to grow downwards,
+although the top of the stack will mean the first element that will
+be popped.
+.FE
+the tokens pushed will be pushed also,
+so that the fakestack will not contain holes.
+.PP
+The main loop of
+.I cg
+is this:
+.IP 1)
+Find a pattern of EM instructions starting at the current one to
+generate code for.
+This pattern will usually be of length one but longer patterns can be used.
+.IP 2)
+Select one of the possibly many stack patterns that go with this
+EM pattern on the basis of heuristics and/or lookahead.
+.IP 3)
+Force the current fakestack contents to match the pattern.
+This may involve
+copying tokens to registers, making dummy transformations, e.g. to
+transform a "local" into a "register offsetted", or might even
+cause the complete fakestack contents to be put on the real stack
+and then back into registers if no suitable transformations
+were provided by the table writer.
+.IP 4)
+Execute the pseudocode associated with the code rule just selected,
+this may cause registers to be allocated,
+code to be emitted etc.
+.IP 5)
+Put tokens onto the fakestack to reflect the result of the operation.
+.IP 6)
+Insert some EM instructions into the stream,
+this is possible but not common.
+.IP 7)
+Account for the cost.
+The cost is kept in a (space, time) vector and lookahead decisions
+are based on a linear combination of these.
+.PP
+The table that drives
+.I cg
+is not read in every time,
+but instead is used at compiletime
+of
+.I cg
+to set parameters and to load pseudocode tables.
+A program called
+.I cgg
+reads the table and produces large lists of numbers that are
+compiled together with machine independent code to produce
+a code generator for the machine at hand.
+.NH 1
+Description of the machine table
+.PP
+The machine description table consists of the following sections:
+.IP 1)
+Constant definitions
+.IP 2)
+Register definitions
+.IP 3)
+Token definitions
+.IP 4)
+Token expression definitions
+.IP 5)
+Code rules
+.IP 6)
+Move definitions
+.IP 7)
+Test definitions
+.IP 8)
+Stacking definitions
+.PP
+Input is in free format, white space and newlines may be used
+at will to improve legibility.
+Identifiers used in the table have the same syntax as C identifiers,
+upper and lower case considered different, all characters significant.
+There is however one exception:
+identifiers must be more than one character long for parsing reasons.
+C style comments are accepted
+.DS
+	/* this is a comment */
+.DE
+and #define macros may be used if the need arises.
+.NH 2
+Some constants
+.PP
+Before anything else three constants must be defined,
+all with the syntax NAME=value, value being an integer.
+These constants are:
+.IP EM_WSIZE 10
+Number of bytes in a machine word.
+This is the number of bytes
+a simple \fBloc\fP instruction will put on the stack.
+.IP EM_PSIZE
+Number of bytes in a pointer.
+This is the number of bytes
+a \fBlal\fP instruction will put on the stack.
+.IP EM_BSIZE
+Number of bytes in the hole between AB and LB.
+If the calling sequence just saves PC and LB this
+size will be twice the pointersize.
+.PP
+EM_WSIZE and EM_PSIZE are checked when a program is compiled
+with the resulting code generator.
+EM_BSIZE is used by
+.I cg
+to add to the offset of instructions dealing with locals
+having positive offsets,
+i.e. parameters.
+.PP
+Optionally one can give here the factors with which the size and time
+parts of the cost function have to be multiplied to ensure they have the
+same order of magnitude.
+This can be done as
+.DS
+TIMEFACTOR = C\d1\u/C\d2\u
+SIZEFACTOR = C\d3\u/C\d4\u
+.DE
+Above numbers must be read as rational numbers.
+Defaults are 1/1 for both of them.
+These constants set the default size/time tradeoff in the code generator,
+so if TIMEFACTOR and SIZEFACTOR are both 1 the code generator will choose
+at random between two codesequences where one has
+cost (10,4) and the other has cost (8,6).
+See also the description of the cost field below.
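+.PP
+As a worked example, assuming the linear combination is the plain
+weighted sum of the two parts of the (space, time) cost vector:
+.DS
+cost = SIZEFACTOR * space + TIMEFACTOR * time
+(10,4):   1*10 + 1*4 = 14
+(8,6):    1*8  + 1*6 = 14
+.DE
+Both code sequences come out at the same cost, so neither is preferred.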
+.PP
+Also optional is the definition of a printformat for integers in the codefile.
+This is given as
+.DS
+FORMAT = string
+.DE
+The default for string is "%d" or "%ld" depending on the wordsize of 
+the machine. For example on the PDP 11 one can use
+.DS
+FORMAT= "0%o"
+.DE
+to satisfy the old UNIX assembler that reads octal unless followed by
+a period, and the ACK assembler that follows C conventions.
+.NH 2
+Register definition
+.PP
+The next part of the tables describes the various registers of the
+machine and defines identifiers
+to be used in later parts of the tables.
+Example for the PDP-11:
+.DS L
+REGISTERS:
+R0 = ( "r0",2), REG.
+R1 = ( "r1",2), REG, ODDREG.
+R2 = ( "r2",2), REG.
+R3 = ( "r3",2), REG, ODDREG.
+R4 = ( "r4",2), REG.
+LB = ( "r5",2), LOCALBASE.
+R01= ( "r0",4,R0,R1), REGPAIR.
+R23= ( "r2",4,R2,R3), REGPAIR.
+FR0= ( "r0",4), FREG.
+FR1= ( "r1",4), FREG.
+FR2= ( "r2",4), FREG.
+FR3= ( "r3",4), FREG.
+DR0= ( "r0",8,FR0), DREG.
+DR1= ( "r1",8,FR1), DREG.
+DR2= ( "r2",8,FR2), DREG.
+DR3= ( "r3",8,FR3), DREG.
+.DE
+.PP
+The identifier before the '=' sign is the name of the register
+as used further on in the table.
+The string is the name of the register as far as the assembler is concerned.
+The number is the size of the register in bytes.
+Identifiers following the number but within the parentheses are previously
+defined registernames that are contained in the register being defined.
+The identifiers following the closing parenthesis are properties
+of the register.
+So for example R23 is a register with assembler name r2, 4 bytes long,
+contains the registers R2 and R3 and has the property REGPAIR.
+.PP
+It might seem wise to list each and every property of a register,
+so one might give R0 the extra property MFPTREG named after the not
+too well known MFPT instruction on newer PDP-11 types,
+but this is not a good idea.
+Every extra property means the registerset is more unorthogonal
+and 
+.I cg
+execution time is influenced by that,
+because it has to take into account a larger set of registers
+that are not equivalent.
+.PP
+There is a predefined property SCRATCH that is dynamic,
+i.e. a register can have the property SCRATCH one time,
+and lose it the next.
+A register has the property SCRATCH when it has a reference count of one.
+One needs to be able to discriminate between SCRATCH registers
+and others,
+because it is only allowed to do arithmetic on
+SCRATCH registers.
+.NH 2
+Stack token definition
+.PP
+The next part describes all possible tokens that can reside on
+the fakestack during code generation.
+Attributes of a token are described in the form of a C struct declaration,
+this is followed by the size in bytes of the token,
+optionally followed by the cost of the token when used as an addressing mode
+and the format
+to be used on output.
+.PP
+Tokens should usually be declared for every addressing mode
+of the machine at hand and for every size directly usable in
+a machine instruction.
+Example for the PDP-11 (incomplete):
+.DS L
+TOKENS:
+IREG2 =		{ REGISTER reg; } 2 "*%[reg]" /* indirect register */
+REGCONST =	{ REGISTER reg; STRING off; } 2 /* not really addressable */
+REGOFF2 =	{ REGISTER reg; STRING off; } 2 "%[off](%[reg])"
+IREGOFF2 =	{ REGISTER reg; STRING off; } 2 "*%[off](%[reg])"
+CONST =		{ INT off; } 2 cost=(2,850) "$%[off]."
+EXTERN2 =	{ STRING off; } 2 "%[off]"
+IEXTERN2 =	{ STRING off; } 2 "*%[off]"
+PAIRSIGNED =	{ REGISTER regeven,regodd; } 2 "%[regeven]"
+.DE
+.PP
+Types allowed in the struct are REGISTER, INT and STRING.
+Tokens without a printformat should never be output.
+.PP
+Notice that tokens need not correspond to addressing modes,
+the REGCONST token listed above,
+meaning the sum of the contents of the register and the constant,
+has no corresponding addressing mode on the PDP-11,
+but is included so that a sequence of add constant, load indirect,
+can be handled efficiently.
+This REGCONST token is needed as part of the path
+.DS
+REGISTER -> REGCONST -> REGOFF
+.DE
+of which the first and the last "exist" and the middle is needed
+only as an intermediate step.
+.NH 2
+Token expressions
+.PP
+Usually machines have certain collections of addressing modes that
+can be used with certain instructions.
+The stack patterns in the table are lists of these collections
+and since it is cumbersome to write out these long lists
+every time, there is a section here to give names to these
+collections.
+Please note that it is not forbidden to write out a token expression
+in the remainder of the table,
+but for clarity it is usually better not to.
+Example for the PDP-11 (incomplete):
+.DS L
+TOKENEXPRESSIONS:
+SOURCE2 = REG + IREG2 + REGOFF2 + IREGOFF2 + CONST + EXTERN2 +
+	  IEXTERN2
+SREG    = REG * SCRATCH
+.DE
+Permissible in the expressions are all PASCAL set operators, i.e.
+.IP +
+set union
+.IP -
+set difference
+.IP *
+set intersection
+.PP
+Every tokenidentifier is also a token expression identifier
+denoting the singleton collection of tokens containing
+just itself.
+Every register property as defined above is also a token expression
+matching all registers with that property when on the fakestack.
+The standard token expression identifier ALL denotes the collection of 
+all tokens.
+.NH 2
+Expressions
+.PP
+Throughout the rest of the table expressions can be used in some
+places.
+This section will give the syntax and semantics of expressions.
+There are four types of expressions: integer, string, register and undefined.
+Type checking is performed by
+.I cgg .
+An operator with at least one undefined operand returns undefined except
+for the defined() function mentioned below.
+An undefined expression is interpreted as FALSE when it is needed
+as a truth value.
+Basic terms in an expression are
+.IP number 16
+A number is a constant of type integer.
+.IP "string"
+A string within double quotes is a constant of type string.
+All the normal C style escapes may be used within the string.
+.IP REGIDENT
+The name of a register is a constant of type register.
+.IP $\fIi\fP
+A dollarsign followed by a number is the representation of the argument
+of EM instruction \fIi\fP.
+The type of the operand is dependent on the instruction,
+sometimes it is integer,
+sometimes it is string.
+It is undefined when the instruction has no operand.
+.br
+Although an exhaustive list could be given describing all the types
+the following rule of thumb will suffice.
+If you cannot imagine the operand of the instruction ever to be
+something different from a plain integer, the type is integer,
+otherwise it is string.
+.br
+.I Cg
+makes all necessary conversions for you,
+like adding EM_BSIZE to positive arguments of instructions
+dealing with locals,
+prepending underlines to global names,
+converting codelabels into a unique representation etc.
+Details about this can be found in the section about
+machine dependent C code.
+.IP %[1]
+This in general means the token mentioned first in the
+stack pattern.
+When used inside an expression the token must be a simple register.
+Type of this is register.
+.IP %[1.off]
+This means field "off" of the first stack pattern token.
+Type is the same as that of field "off".
+To use this expression implies a check that all tokens
+in the token expression used have the same attributes.
+.IP %[1.1]
+This is the first subregister of the first token.
+Previous comments apply.
+.IP %[b]
+The second allocated register.
+.IP %[a.2]
+The second subregister of the first allocated register.
+.PP
+All normal C operators apply to integers,
+the + operator serves for string concatenation
+and register expressions can only be compared to each other.
+Furthermore there are some special "functions":
+.IP tostring(e) 16
+Converts an integer expression e to a string.
+.IP defined(e)
+Returns 1 if expression e is defined, 0 otherwise.
+.IP samesign(e1,e2)
+Returns 1 if integer expressions e1 and e2 have the same sign, 0 otherwise.
+.IP sfit(e1,e2)
+Returns 1 if integer expression e1 fits as a signed integer
+into a field of e2 bits, 0 otherwise.
+.IP ufit(e1,e2)
+Same as above but now for unsigned e1.
+.IP rom(a,n)
+Integer expression giving the n'th argument from the \fBrom\fP descriptor
+pointed at by the a'th EM instruction.
+Undefined if that descriptor does not exist.
+.IP loww(a)
+Returns the lower half of the argument of the a'th EM instruction.
+This is used to split the arguments of a \fBldc\fP instruction.
+.IP highw(a)
+Same for upper half.
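+.PP
+The arithmetic behind some of these functions can be sketched in C.
+This is a minimal illustration, not the code from the code generator.
+The helper names are hypothetical,
+the helpers take the value itself instead of an EM instruction index,
+zero is assumed to count as non-negative in samesign,
+and a 16-bit word (as on the PDP-11) is assumed for the two halves.

```c
#include <assert.h>

/* Hypothetical stand-ins for the table functions, for illustration only. */

/* sfit(e1,e2): 1 if v fits as a signed integer into nbits bits. */
int sfit(long v, int nbits)
{
	long long half;

	assert(nbits >= 1 && nbits <= 32);
	half = 1LL << (nbits - 1);
	return v >= -half && v < half;
}

/* ufit(e1,e2): 1 if v fits as an unsigned integer into nbits bits. */
int ufit(long v, int nbits)
{
	assert(nbits >= 1 && nbits <= 32);
	return v >= 0 && v < (1LL << nbits);
}

/* samesign(e1,e2): assumption: zero is treated as non-negative. */
int samesign(long a, long b)
{
	return (a < 0) == (b < 0);
}

/* loww/highw: assumption: a double word is 32 bits, a word is 16 bits. */
int loww(long v)
{
	return (int)((unsigned long)v & 0xFFFF);
}

int highw(long v)
{
	return (int)(((unsigned long)v >> 16) & 0xFFFF);
}
```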
+.NH 2
+Code rules
+.PP
+The largest section of the tables consists of the code generation rules.
+They specify EM patterns, stack patterns, code to be generated etc.
+Syntax is
+.DS L
+code rule : EM pattern '|' stack pattern '|' code '|' 
+	   stack replacement '|' EM replacement '|' cost ;
+.DE
+All parts are optional, but at least one of the two patterns must be present.
+If the EM pattern is missing the rule becomes a rewriting rule or
+.I coercion
+to be used when code generation cannot continue 
+because of an invalid stack pattern.
+The code rules are preceded by the word
+.DS
+CODE:
+.DE
+The next paragraphs describe the various parts in detail.
+.NH 3
+The EM pattern
+.PP
+The EM pattern consists of a list of EM mnemonics followed
+by a boolean expression.
+Examples:
+.DS
+\fBloe\fP
+.DE
+will match a single \fBloe\fP instruction,
+.DS
+\fBloc\fP \fBloc\fP \fBcif\fP $1==2 && $2==8
+.DE
+is a pattern that will match
+.DS
+\fBloc\fP 2
+\fBloc\fP 8
+\fBcif\fP
+.DE
+and
+.DS
+\fBlol\fP \fBinc\fP \fBstl\fP $1==$3
+.DE
+will match for example
+.DS
+\fBlol\fP 6
+\fBinc\fP
+\fBstl\fP 6
+.DE
+A missing boolean expression evaluates to TRUE.
+.PP
+When the EM pattern is the same as in the previous code rule the pattern
+should be given as `...'.
+The code generator will match the longest EM pattern on every occasion.
+If two patterns of the same length match, the first in the table will be chosen.
+For this purpose all patterns of length three or more are considered
+to be of the same length.
+.NH 3
+The stack pattern
+.PP
+The stack pattern is a list of token expressions,
+usually token expression identifiers for clarity.
+No boolean expression is allowed here.
+The first expression is the one that matches the top of the stack.
+.PP
+The pattern can be followed by the word STACK
+in which case the pattern only matches if there is nothing
+else on the fakestack.
+The code generator will stack everything not matched at the start
+of the rule.
+.PP
+The pattern can be preceded with the word
+.DS
+nocoercions:
+.DE
+which tells the code generator not to try to coerce to the pattern
+but only to use it when it is already there.
+There are two reasons for this construction,
+correctness and speed.
+It is needed for correctness when the pattern contains a register
+that is not transparent when data is moved through it.
+.PP
+Example: on the PDP-11 the shortest code for
+.DS
+\fBlae\fP a
+\fBloi\fP 8
+\fBlae\fP b
+\fBsti\fP 8
+.DE
+is
+.DS
+movf _a,fr0
+movf fr0,_b
+.DE
+assuming that the floating point processor is in double
+precision mode and fr0 is free.
+Unfortunately this is not correct since a trap can occur on certain
+kinds of data.
+This could happen if there was a pattern for \fBsti\fP\ 8 that allowed
+one to move a floating point register not preceded by nocoercions: .
+The code generator would then find that moving the 8-byte global _a
+to a floating point register and then storing it to _b was the cheapest,
+assuming that the space/time knob was turned far enough to space.
+It is unfortunate that the type information is no longer present,
+since if _a really is a floating point number the move could be
+made without error.
+.PP
+The second reason for the nocoercions: construct is speed.
+When the code generator has a long list of possible stack patterns
+for one EM pattern it can waste a lot of time trying to find coercions
+to all of them, while the mere presence of such a long list
+indicates that the table writer has given a lot of special cases.
+In this case preceding all the special cases with nocoercions:
+will stop the code generator from looking for coercions that do not exist.
+.NH 3
+The code part
+.PP
+The code part consists of three parts, stack cleanup, register allocation
+and code to generate.
+All of these may be omitted.
+.NH 4
+Stack cleanup
+.PP
+The stack cleanup part describes certain stacktokens that should neither remain on
+the fakestack, nor be remembered as contents of registers.
+This is usually only required with store operations.
+The entire fakestack, except for the part matched in the stack pattern,
+is searched for tokens matching the expression and they are copied
+to the real stack.
+Every register that contains the stacktoken is marked as empty.
+.PP
+Syntax is
+.DS
+remove(token expression) \fIor\fP
+remove(token expression, boolean expression)
+.DE
+Example:
+.DS
+remove(REGOFF2,%[reg] != LB || %[off] == $1)
+.DE
+is part of a remove() call for use in the \fBstl\fP code rule.
+It removes all register-offset tokens whose register is not the
+local base, and also the token for the local into which the store is done.
+The necessity for this can be seen from the following example:
+.DS
+\fBlol\fP 4
+\fBinl\fP 4
+\fBstl\fP 6
+.DE
+Without a proper remove() call in the rule for \fBinl\fP code would
+be generated as here
+.DS
+inc 4(r5)
+mov 4(r5),6(r5)
+.DE
+so local 6 would be given the new value of local 4 instead of the old
+as the EM code prescribed.
+.PP
+When generating something like a branch instruction it 
+might be needed to empty the fakestack completely.
+This can of course be done with
+.DS
+remove(ALL)
+.DE
+.NH 4
+Register allocation
+.PP
+The register allocation part describes the kind of registers needed.
+Syntax for allocate() is
+.DS
+allocate(itemlist)
+.DE
+where itemlist is a list of three kinds of things:
+.IP 1)
+a tokendescription, for example %[1].
+.br
+This will instruct the code generator to temporarily decrement the reference count 
+of all registers contained in the token,
+so that they are available for allocation in this allocate() call
+if they were only used in that token.
+See example below.
+.IP 2)
+a register property.
+.br
+This will allocate a register with that property.
+The register will be marked as empty at this point.
+Lookahead will be performed if necessary.
+.IP 3)
+a register property with initialization.
+.br
+This will allocate the register as in 2) but will also
+initialize it.
+This eases the task of the code generator because it can
+find a register already filled with the right value
+if it exists.
+.PP
+Examples:
+.DS
+allocate(OREG)
+.DE
+will allocate an odd register, while 
+.DS
+allocate(REG={REGOFF2,LB,$1})
+.DE
+will allocate a register while simultaneously filling it with
+the asked value.
+.br
+Inside the coercion from SOURCE2 to REGISTER in the PDP-11 table
+the following allocate() can be found.
+.DS
+allocate(%[1],REG=%[1])
+.DE
+This tells the code generator that registers contained in %[1] can be used
+again and asks to fill the register allocated with %[1].
+So if %[1]={REGOFF2,R3,"4"} and R3 has a reference count of 1
+the following code might be generated.
+.DS
+mov 4(r3),r3
+.DE
+In the rest of the line the registers allocated can be named by
+%[a] and %[b.1],%[b.2], i.e. with lower case letters
+in order of allocation.
+.PP
+Warning: 
+.DS
+allocate(R3)
+.DE
+is \fInot\fP the way to allocate R3.
+R3 is not a register property, so it will be seen as a token description
+and the effect is that R3 will have its reference count decremented.
+.NH 4
+Code
+.PP
+Code to be generated is specified as a list of items of the following kind:
+.IP 1)
+a string in double quotes ("This is a string").
+.br
+This is copied to the codefile and a newline ( \en ) is appended.
+Inside the string all normal C string conventions are allowed,
+and substitutions can be made of the following sorts.
+.RS
+.IP a)
+$1, $2 etc.
+These are the operands of the corresponding EM instructions
+and are printed according to their type.
+To put a real '$' inside the string it must be doubled ('$$').
+.IP b)
+%[1], %[2.reg], %[b.1] etc.
+These have their obvious meaning.
+If they describe a complete token ( %[1] )
+the printformat for the token is used.
+If they stand for a basic term in an expression
+they will be printed according to their type.
+To put a real '%' inside the string it must be doubled ('%%').
+.IP c)
+%( arbitrary expression %).
+This allows inclusion of arbitrary expressions inside strings.
+This is rarely needed,
+so the awkward notation is not too bad.
+Note that %(%[1]%) is equivalent to %[1].
+.RE
+.IP 2)
+a move() call.
+This has the following syntax:
+.DS
+move(token description, token description)
+.DE
+Moves are handled specially since that enables the code generator
+to keep track of register contents.
+Example:
+.DS
+move(R3,{REGOFF2,LB,$1})
+.DE
+will generate code to move R3 to $1(r5) except when
+R3 already was a copy of $1(r5).
+Then the code will be omitted.
+The rules describing how to move things to each other
+can be found in the MOVES section described below.
+.IP 3)
+an erase() call.
+This has the following syntax:
+.DS
+erase(register expression)
+.DE
+This tells the code generator that the register mentioned no longer has any
+useful value.
+This is 
+.I necessary
+after code in the table has changed the contents of registers.
+For example, after an add to a register the register must be erased,
+because the contents no longer match any token.
+.IP 4)
+For machines that have condition codes,
+alas most of them do,
+there are provisions to remember condition code setting
+and prevent needless testing.
+To set the condition code to a token put in the code the following call:
+.DS
+test(token)
+.DE
+where token can be all of the standard forms that can also be used in move().
+This will generate a test if the condition codes 
+were not already set to that token.
+It is also possible to tell 
+.I cg
+that a certain operation, like a preceding add
+has set the condition codes to some token with the call
+.DS
+setcc(token)
+.DE
+So a sequence of a setcc and a test on the same token will generate
+no code. 
+Another allowed call within the code is
+.DS
+samecc
+.DE
+which tells the code generator that condition codes were unaffected
+in this rule.
+If no setcc or samecc has been given and the code contained strings,
+the default is
+.DS
+nocc
+.DE
+which tells the code generator that the condition codes
+no longer have a useful value.
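+.PP
+The bookkeeping behind test(), setcc() and nocc can be pictured with
+the following C sketch.
+It is an illustration under assumptions, not the code generator itself:
+tokens are represented as plain strings, the names cc_set, cc_none and
+cc_test are hypothetical, and the emitted instruction is fixed to tst.

```c
#include <stdio.h>
#include <string.h>

/* Token the condition codes currently reflect; "" means unknown. */
static char cc_token[32];

/* setcc(token): an instruction has set the condition codes to tok. */
void cc_set(const char *tok)
{
	snprintf(cc_token, sizeof cc_token, "%s", tok);
}

/* nocc: the condition codes no longer have a useful value. */
void cc_none(void)
{
	cc_token[0] = 0;
}

/*
 * test(token): emit a test only when the condition codes were not
 * already set to tok.  Returns 1 if code was emitted, 0 otherwise.
 */
int cc_test(const char *tok)
{
	if (strcmp(cc_token, tok) == 0)
		return 0;
	fputs("tst ", stdout);
	puts(tok);
	cc_set(tok);
	return 1;
}
```

+A cc_set() followed by a cc_test() on the same token emits nothing,
+which mirrors the remark above about setcc followed by test.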
+.NH 3
+Stack replacement
+.PP
+The stack replacement is a possibly empty list of items to be pushed onto
+the fakestack. Three kinds of items are possible:
+.IP 1)
+An item of the form %[1]. This will push the stacktoken mentioned back
+onto the stack unchanged.
+.IP 2)
+A register expression. This will push the register mentioned
+onto the fakestack.
+.IP 3)
+An item of the form { REGOFF2,%[1.reg],$1 }.
+This generates a token with tokenidentifier REGOFF2 and attributes 
+in order of declaration.
+.PP
+All tokens matched by the stack pattern at the beginning of the code rule
+are first removed and their registers deallocated.
+Items are pushed in the order of appearance.
+This means that the last item will be on the top of the
+stack after the push.
+So if the stack pattern contained two token expressions
+and you want to push them back unchanged,
+you have to specify as stack replacement
+.DS
+%[2] %[1]
+.DE
+and not the other way around.
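+.PP
+The ordering rule can be checked with a small C model of the fakestack.
+This is only a sketch: tokens are reduced to strings, and the names
+push, pop, replace_unchanged and replace_swapped are hypothetical.

```c
#include <assert.h>

#define MAXSTACK 16

static const char *fakestack[MAXSTACK];
static int stackheight;

void push(const char *tok)
{
	assert(stackheight < MAXSTACK);
	fakestack[stackheight++] = tok;
}

const char *pop(void)
{
	assert(stackheight > 0);
	return fakestack[--stackheight];
}

/* Stack replacement "%[2] %[1]": both tokens go back unchanged. */
void replace_unchanged(void)
{
	const char *m1 = pop();	/* %[1] matched the top of the stack */
	const char *m2 = pop();	/* %[2] matched the token below it */

	push(m2);		/* pushed first, so it ends up below */
	push(m1);		/* pushed last, so it is on top again */
}

/* Stack replacement "%[1] %[2]": the two tokens are exchanged. */
void replace_swapped(void)
{
	const char *m1 = pop();
	const char *m2 = pop();

	push(m1);
	push(m2);
}
```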
+.NH 3
+EM replacement
+.PP
+In exceptional cases it might be useful to leave part of an empattern
+undone.
+For example, a \fBsdl\fP instruction might be split into two \fBstl\fP instructions
+when there is no 4-byte quantity on the stack. The emreplacement part allows
+one to express this.
+Example:
+.DS
+\fBstl\fP $1 \fBstl\fP $1+2
+.DE
+The instructions are inserted in the stream so that they can match
+the first part of a pattern in the next step.
+Note that since the code generator traverses the EM instructions in a strict
+linear fashion,
+it is impossible to let the EM replacement match later parts of a pattern.
+So if there is a pattern
+.DS
+\fBloc\fP \fBstl\fP $1==0
+.DE
+and the input is
+.DS
+\fBloc\fP 0 \fBsdl\fP 4
+.DE
+the \fBloc\fP\ 0 will be processed first,
+then the \fBsdl\fP might be split into two \fBstl\fP's but the pattern
+cannot match now.
+.NH 3
+Cost
+.PP
+The cost field can be specified when there is more than one
+code rule with the same empattern.
+If the code generator has a choice between two possibilities
+to generate code it will choose the cheapest according to
+the cost field.
+The cost for a code generation is the sum of the costs
+of all the coercions needed, plus the cost for freeing
+registers plus the cost of the code rule itself.
+.PP
+The format of the costfield is
+.DS
+( nbytes, time )		or
+( nbytes, time ) + %[\fIi\fP]
+.DE
+with time in the metric desired, like nanoseconds or states.
+See constants section above.
+The %[\fIi\fP] in the second example is used for adding the cost of a certain
+address mode used in the code generated.
+This can of course be repeated if desired.
+The cost of the address mode must then be specified in the token definition
+section.
+.NH 3
+Examples
+.PP
+A list of examples for the PDP-11 is given here.
+Far from being complete it gives examples of most kinds
+of instructions.
+.DS L
+\fBadi\fP $1==2 | SREG,SOURCE2 |
+	"add %[2],%[1]" erase(%[1]) setcc(%[1])
+	  | %[1] | | (2,450) + %[2]
+\&...       | SOURCE2,SREG |
+	"add %[1],%[2]" erase(%[2]) setcc(%[2])
+	  | %[2] | | (2,450) + %[1]
+.DE
+is an example of the use of the `...' construct
+and shows how to place erase() and setcc() calls.
+.DS L
+
+\fBdvi\fP $1==2 | SOURCE2,SPAIRSIGNED |
+	"div %[1],%[2]" erase(%[2])
+	  | %[2.regeven] | |
+
+\fBcmi\fP \fBtgt\fP $1==2 | SOURCE2,SOURCE2 | allocate(REG={CONST,0})
+	"cmp %[2],%[1];ble 1f;inc %[a];1:" erase(%[a])
+	  | %[a] | |
+
+\fBcal\fP | STACK |
+	"jsr pc,$1" 
+	  | | |
+
+\fBlol\fP | | | { REGOFF2, LB, $1 } | |
+
+\fBstl\fP | SOURCE2 |
+	remove(REGOFF2,%[off]==$1)
+	move(%[1],{REGOFF2,LB,$1})
+	  | | |
+
+| SOURCE2 |
+	allocate(%[1],REGPAIR)
+	move(%[1],%[a.2])
+	test(%[a.2])
+	"sxt %[a.even]" | { PAIRSIGNED, %[a.1], %[a.2] }| | 
+.DE
+This coercion shows how to use the move and test calls.
+At first you might think that the test call is unnecessary,
+since the move will have set the condition codes,
+but the move may never have been executed
+if the register already contained the value,
+in which case it is necessary to do the test.
+If the move was executed the test will be omitted.
+.DS L
+| SOURCE2 | allocate(%[1],REG=%[1]) | %[a] | |
+
+\fBsdl\fP | SOURCE2 | | %[1] | \fBstl\fP $1 \fBstl\fP $1+2 |
+
+\fBexg\fP $1==2 | SOURCE2 SOURCE2 | | %[1] %[2] | |
+.DE
+This last example again shows the difference in the order
+of the stack pattern and the stack replacement.
+.NH 2
+Move code rules
+.PP
+When issuing a move() call as described above or a register allocation
+with initialization, the code generator has to know which
+instruction to use for the move.
+The code will of course only be generated if it cannot be omitted.
+This is listed in the move section of the tables by giving a list
+of tuples:
+.DS
+( source, destination, codepart [ , costfield ] )
+.DE
+where the square brackets mean the costfield is optional.
+Example for the PDP-11
+.DS
+MOVES:
+( CONST %[off]==0 , SOURCE2, "clr %[2]" )
+( SOURCE2, SOURCE2, "mov %[1],%[2]" )
+.DE
+The moves are scanned from top to bottom,
+so the first one that matches will be chosen.
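+.PP
+The top to bottom scan amounts to a first-match search.
+The C sketch below illustrates it for the two PDP-11 rules above.
+It is not the representation used by the code generator:
+token expressions are modelled as bit masks, the struct layout and the
+names move_rule and find_move are hypothetical,
+and the condition %[off]==0 is reduced to a flag.

```c
#include <stddef.h>

/* Hypothetical one-bit-per-token encoding. */
enum { T_CONST = 1, T_REG = 2, T_REGOFF2 = 4 };
#define SOURCE2 (T_CONST | T_REG | T_REGOFF2)

struct move_rule {
	unsigned src_set;	/* token expression for the source */
	int need_zero;		/* nonzero: source offset must be 0 */
	unsigned dst_set;	/* token expression for the destination */
	const char *code;	/* code part of the rule */
};

static const struct move_rule moves[] = {
	{ T_CONST, 1, SOURCE2, "clr %[2]" },
	{ SOURCE2, 0, SOURCE2, "mov %[1],%[2]" },
};

/* Return the code of the first matching rule, or NULL if none matches. */
const char *find_move(unsigned src_tok, long src_off, unsigned dst_tok)
{
	size_t i;

	for (i = 0; i < sizeof moves / sizeof moves[0]; i++) {
		const struct move_rule *m = &moves[i];

		if (!(m->src_set & src_tok))
			continue;
		if (m->need_zero && src_off != 0)
			continue;
		if (!(m->dst_set & dst_tok))
			continue;
		return m->code;
	}
	return NULL;
}
```

+Because the clr rule comes first it wins for a zero constant,
+even though the general mov rule would match as well.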
+.NH 2
+Test code rules
+.PP
+When issuing a test() call as described above,
+the code generator has to know which instruction
+to use for the test.
+The code will only be generated if the condition codes
+were not already set to the token.
+This is listed in the test section of the tables by giving
+a list of tuples:
+.DS
+( source, codepart [ , costfield ] )
+.DE
+Example for the PDP-11
+.DS
+TESTS:
+( SOURCE2, "tst %[1]")
+( DREG, "tstf %[1]\encfcc")
+.DE
+The tests are scanned from top to bottom,
+so the first one that matches will be chosen.
+.NH 2
+Stacking code rules
+.PP
+When the code generator has to stack a token it must know
+which code to use.
+Since it must at all times be possible to empty the fakestack
+even when no registers are free,
+it is mandatory that all
+tokens used must have a rule attached for stacking them
+without using a scratch register.
+Since however this might be clumsy and 
+a register might in practice be available
+it is also possible to give rules
+which use a register.
+On the Intel 8086 for example,
+there is no instruction to push a constant without using a register,
+and the code needed to do it without, must use global data
+and as such is very complicated and wasteful of memory and time.
+It can therefore be left to be used in extreme cases,
+while in general the constant is pushed through a register.
+The stacking rules are listed in the stack section of the table as a list
+of tuples:
+.DS
+(source, [ register property ] , codepart [ , costfield ] )
+.DE
+Example for the Intel 8086:
+.DS
+STACKS:
+(CONST, REG, move(%[1],%[a]) "push %[a]")
+(REG ,, "push %[1]")
+.DE
+.NH 1
+The files mach.h and mach.c
+.PP
+The table writer must also supply two files containing
+machine dependent declarations and C code.
+These files are mach.h and mach.c.
+.NH 2
+Types in the code generator
+.PP
+Three different types of integer coexist in the code generator
+and their range depends on the machine at hand.
+The type 'int' is used for things like labelcounters that won't require
+more than 16 bits precision.
+The type 'word' is used among others to assemble datawords and
+is of type 'long' if EM_WSIZE>2.
+The type 'full' is used for addresses and is of type 'long' if
+EM_WSIZE>2 or EM_PSIZE>2.
+.PP
+In macro and function definitions in later paragraphs implicit typing
+will be used for parameters, that is parameters starting with an 's'
+will be of type string, and the letters 'i','w','f' will stand for
+int, word and full respectively.
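+.PP
+The rules for 'word' and 'full' can be written down as conditional
+typedefs.
+The sketch below is an assumption about how this could look,
+not a copy of the code generator headers:
+the fallback to 'int' for small sizes and the default values of
+EM_WSIZE and EM_PSIZE are chosen here for illustration only.

```c
/* Defaults for illustration only; normally the table supplies these. */
#ifndef EM_WSIZE
#define EM_WSIZE 2
#endif
#ifndef EM_PSIZE
#define EM_PSIZE 2
#endif

/* 'word' is long when EM words are larger than 2 bytes. */
#if EM_WSIZE > 2
typedef long word;
#else
typedef int word;	/* assumption: plain int suffices otherwise */
#endif

/* 'full' is long when either words or pointers exceed 2 bytes. */
#if EM_WSIZE > 2 || EM_PSIZE > 2
typedef long full;
#else
typedef int full;	/* assumption: plain int suffices otherwise */
#endif
```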
+.NH 2
+Global variables to work with
+.PP
+Some global variables are present in the code generator
+that can be manipulated by the routines in mach.h and mach.c.
+.LP
+The declarations are:
+.DS L
+.ta 20
+FILE *codefile;	/* code is emitted on this stream */
+word part_word;	/* words to be output are put together here */
+int part_size;	/* number of bytes already put in part_word */
+char str[];	/* Last string read in */
+long argval;	/* Last int read and kept */
+.DE
+.NH 2
+Macros in mach.h
+.PP
+In the file mach.h a collection of macros is defined that have
+to do with formatting of assembly code for the machine at hand.
+Some of these macros can of course be left undefined in which case the
+macro calls are left in the source and will be treated as 
+function calls.
+These functions can then be defined in \fImach.c\fR.
+.PP
+The macros to be defined are:
+.IP ex_ap(s) 16
+Must print the magic incantations that will mark the symbol \fIs\fR
+to be exported to other modules.
+This is the translation of the EM \fBexa\fP and \fBexp\fP instructions.
+.IP in_ap(s)
+Same to import the symbol.
+Translation of \fBina\fP and \fBinp\fP.
+.IP newilb(s)
+Must print the definition of instruction label \fIs\fR.
+.IP newdlb(s)
+Must print the definition of data label \fIs\fR.
+.IP dlbdlb(s1,s2)
+Must define data label
+.I s1
+to be equal to
+.I s2 .
+.IP newlbss(s,f)
+Must declare a piece of memory initialized to BSS_INIT (see below)
+of length 
+.I f
+and with label
+.I s .
+.IP cst_fmt
+Format to be used when converting constant arguments of
+EM instructions to string.
+Argument to be formatted will be 'full'.
+.IP off_fmt
+Format to be used for integer part of label+constant,
+argument will be 'full'.
+.IP ilb_fmt
+Format to be used for creation of unique instruction labels.
+Arguments will be a unique procedure number (int) and the label
+number (int).
+.IP dlb_fmt
+Format to be used for printing numeric data labels.
+Argument will be 'int'.
+.IP hol_fmt
+Format to be used for generation of labels for
+space generated by a
+.B hol
+pseudo.
+Argument will be 'int'.
+.IP hol_off
+Format to be used for printing of the address of an element in
+.B hol
+space.
+Arguments will be the offset in the
+.B hol
+block (word) and the number of the
+.B hol
+(int).
+.IP con_cst(w)
+Must generate output that will assemble into one machineword.
+.IP con_ilb(s)
+Must generate output that will put the address of the instruction label
+into the datastream.
+.IP con_dlb(s)
+Must generate output that will put the address of the data label
+into the datastream.
+.IP id_first
+Must be a character.
+This is prepended to all nonnumeric global labels if their length
+is shorter than the maximum allowed (currently 8) or if they already
+start with that character.
+This is to avoid conflicts of user labels with system labels.
+.IP BSS_INIT
+Must be a constant.
+This is the value filled in all the words not initialized explicitly.
+This is loader and system dependent.
+If omitted no initialization is assumed.
+.NH 3
+Example mach.h for the PDP-11
+.DS L
+.ta 8 16 24 32 40 48 56
+#define ex_ap(y)	fprintf(codefile,"\et.globl %s\en",y)
+#define in_ap(y)	/* nothing */
+
+#define newilb(x)	fprintf(codefile,"%s:\en",x)
+#define newdlb(x)	fprintf(codefile,"%s:\en",x)
+#define	dlbdlb(x,y)	fprintf(codefile,"%s=%s\en",x,y)
+#define newlbss(l,x)	fprintf(codefile,"%s:.=.+%d.\en",l,x)
+
+#define cst_fmt		"$%d."
+#define off_fmt		"%d."
+#define ilb_fmt		"I%02x%x"
+#define dlb_fmt		"_%d"
+#define	hol_fmt		"hol%d"
+
+#define hol_off		"%d.+hol%d"
+
+#define con_cst(x)	fprintf(codefile,"%d.\en",x)
+#define con_ilb(x)	fprintf(codefile,"%s\en",x)
+#define con_dlb(x)	fprintf(codefile,"%s\en",x)
+
+#define id_first	'_'
+#define BSS_INIT	0
+.DE
+.NH 2
+Functions in mach.c
+.PP
+In mach.c some functions must be supplied,
+mostly manipulating data resulting from pseudoinstructions.
+The specifications are given here,
+implicit typing of parameters as above.
+.IP con_part(isz,word) 20
+This function must manipulate the globals 
+part_word and part_size to append the isz bytes
+contained in word to the output stream.
+If part_word is full, i.e. part_size==EM_WSIZE
+the function part_flush() may be called to empty the buffer.
+This is the function that must go through the trouble of
+getting the byte order within words right.
+.IP con_mult(w_size)
+This function must take the string str[] and create an integer
+from the string of size w_size and generate code to assemble global
+data for that integer.
+Only the sizes for which arithmetic is implemented need be
+handled,
+so if you didn't implement 200-byte integer division
+you don't have to implement 200-byte integer global data.
+Here one must take care of word order in long integers.
+.IP con_float()
+This function must generate code to assemble a floating
+point number of which the size is contained in argval
+and the ASCII representation in str[].
+.IP prolog(f_nlocals)
+This function is called at the start of every procedure.
+Function prolog code must be generated,
+and room made for local variables for a total of f_nlocals bytes.
+.IP mes(w_mesno)
+This function is called when a
+.B mes
+pseudo is seen that is not handled by the machine independent part.
+The example below shows all you probably have to know about that.
+.IP segname[]
+This is not a function,
+but an array of four strings.
+These strings are put out whenever the code generator
+switches segments.
+Segments are SEGTXT, SEGCON, SEGROM and SEGBSS in that order.
+.NH 3
+Example mach.c for the PDP-11
+.PP
+As an example of the sort of code expected,
+the mach.c for the PDP-11 is presented here.
+.DS L
+.ta 8 16 24 32 40 48 56 64
+/*
+ * machine dependent back end routines for the PDP-11
+ */
+
+con_part(sz,w) register sz; word w; {
+
+	while (part_size % sz)
+		part_size++;
+	if (part_size == EM_WSIZE)
+		part_flush();
+	if (sz == 1) {
+		w &= 0xFF;
+		if (part_size)
+			w <<= 8;
+		part_word |= w;
+	} else {
+		assert(sz == 2);
+		part_word = w;
+	}
+	part_size += sz;
+}
+
+con_mult(sz) word sz; {
+	long l;
+
+	if (sz != 4)
+		fatal("bad icon/ucon size");
+	l = atol(str);
+	fprintf(codefile,"\et%o;%o\en",(int)(l>>16),(int)l);
+}
+
+con_float() {
+	double f;
+	register short *p,i;
+
+	/*
+	 * This code is correct only when the code generator is
+	 * run on a PDP-11 or VAX-11 since it assumes native
+	 * floating point format is PDP-11 format.
+	 */
+
+	if (argval != 4 && argval != 8)
+		fatal("bad fcon size");
+	f = atof(str);
+	p = (short *) &f;
+	i = *p++;
+	if (argval == 8) {
+		fprintf(codefile,"\et%o;%o;",i,*p++);
+		i = *p++;
+	}
+	fprintf(codefile,"\et%o;%o\en",i,*p++);
+}
+
+prolog(nlocals) full nlocals; {
+
+	fprintf(codefile,"mov r5,-(sp)\enmov sp,r5\en");
+	if (nlocals == 0)
+		return;
+	if (nlocals == 2)
+		fprintf(codefile,"tst -(sp)\en");
+	else
+		fprintf(codefile,"sub $%d.,sp\en",nlocals);
+}
+
+mes(type) word type; {
+	int argt ;
+
+	switch ( (int)type ) {
+	case ms_ext :
+		for (;;) {
+			switch ( argt=getarg(
+			    ptyp(sp_cend)|ptyp(sp_pnam)|sym_ptyp) ) {
+			case sp_cend :
+				return ;
+			default:
+				strarg(argt) ;
+				fprintf(codefile,".globl %s\en",argstr) ;
+				break ;
+			}
+		}
+	default :
+		while ( getarg(any_ptyp) != sp_cend ) ;
+		break ;
+	}
+}
+
+char    *segname[] = {
+	".text",        /* SEGTXT */
+	".data",        /* SEGCON */
+	".data",        /* SEGROM */
+	".bss"          /* SEGBSS */
+};
+.DE
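+.PP
+To see the byte packing done by con_part() in isolation,
+the routine can be placed in a small ANSI C harness.
+This is a sketch under assumptions: EM_WSIZE is fixed at 2,
+part_flush() is a hypothetical stand-in that only records the word and
+resets the buffer, and last_flushed exists purely for inspection.

```c
#include <assert.h>

#define EM_WSIZE 2
typedef int word;

static word part_word;		/* word being assembled */
static int part_size;		/* bytes already in part_word */
static word last_flushed = -1;	/* hypothetical: last word written out */

/* Stand-in for the real part_flush(): record the word and reset. */
void part_flush(void)
{
	last_flushed = part_word;
	part_word = 0;
	part_size = 0;
}

/* Same logic as the PDP-11 con_part() above, in ANSI C. */
void con_part(int sz, word w)
{
	while (part_size % sz)
		part_size++;		/* align to the size of the item */
	if (part_size == EM_WSIZE)
		part_flush();
	if (sz == 1) {
		w &= 0xFF;
		if (part_size)
			w <<= 8;	/* second byte goes in the high half */
		part_word |= w;
	} else {
		assert(sz == 2);
		part_word = w;
	}
	part_size += sz;
}
```

+Two successive one-byte calls with 0x12 and 0x34 leave 0x3412 in
+part_word, that is, the first byte ends up in the low half of the word.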
+.NH 1
+Coercions
+.PP
+A central part in code generation is taken by the
+.I coercions .
+It is the responsibility of the table writer to provide
+all necessary coercions so that code generation can continue.
+The minimal set of coercions consists of
+the coercions to unstack every token expression,
+in combination with the rules to stack every token.
+.PP
+If these are present the code generator can always make the necessary
+transformations by stacking and unstacking.
+Of course for code quality it is usually best to provide extra coercions
+to prevent this stacking from taking place.
+.I Cg
+discriminates three types of coercions:
+.IP 1)
+Unstacking coercions.
+This category can use the allocate() call in its code.
+.IP 2)
+Splitting coercions, these are the coercions that split
+larger tokens into smaller ones.
+.IP 3)
+Transforming coercions, these are the coercions that transform
+a token into another one of the same size.
+This category can use the allocate() call in its code.
+.PP
+When a stack configuration does not match the stack pattern
+.I coercions
+are searched for in the following order:
+.IP 1)
+First tokens are split if necessary to get their sizes right.
+.IP 2)
+Then transforming coercions are found that will make the pattern match.
+.IP 3)
+Finally if the stack pattern is longer than the fakestack contents
+unstacking coercions will be used to fill up the pattern.
+.PP
+At any point, when coercions are missing so that code generation cannot
+continue, the offending tokens are stacked.
+.NH 1
+Internal workings of the code generator
+.NH 2
+Description of tables.c and tables.h contents
+.PP
+In this section the intermediate files will be described 
+that are produced by
+.I cgg
+and compiled with machine independent code to produce a code generator.
+.NH 3
+Tables.c
+.PP
+Tables.c contains a large number of initialized arrays of all sorts.
+Description of each follows:
+.br
+.in 1i
+.ti -0.5i
+byte coderules[]
+.br
+Pseudo code interpreted by the code generator.
+Always starts with some opcode followed by operands depending
+on the opcode.
+Integers in this table are between 0 and 32767 and have a one byte
+encoding if between 0 and 127.
+.ti -0.5i
+char stregclass[]
+.br
+Number of computed static register class per register.
+Two registers are in the same class if they have the same properties
+and don't share a common subregister.
+.ti -0.5i
+struct reginfo machregs[]
+.br
+Info per register.
+Initialized with representation string, size,
+members of the register and set of registers affected when this
+one is changed.
+Also contains room for runtime information,
+like contents and reference count.
+.ti -0.5i
+tkdef_t tokens[]
+.br
+Information per tokentype.
+Initialized with size, cost, type of operands and formatstring.
+.ti -0.5i
+node_t enodes[]
+.br
+List of triples representing expressions for the code generator.
+.ti -0.5i
+string codestrings[]
+.br
+List of strings.
+All strings are put in a list and checked for duplication,
+so only one copy per string will reside here.
+.ti -0.5i
+set_t machsets[]
+.br
+List of token expression sets.
+Bit 0 of the set is used for the SCRATCH property of registers,
+bits 1 up to NREG are for the corresponding registers
+and bits NREG+1 up to the end are for the corresponding tokens.
+.ti -0.5i
+inst_t tokeninstances[]
+.br
+List of descriptions for building tokens.
+Contains type of rule for building one,
+plus operands depending on the type.
+.ti -0.5i
+move_t moves[]
+.br
+List of move rules.
+Contains token expressions for source and destination
+plus cost and index for code rule.
+.ti -0.5i
+byte pattern[]
+.br
+EM patterns.
+This is structured internally as chains of patterns,
+each chain pointed at by pathash[].
+After each pattern the list of possible code rules is given.
+.ti -0.5i
+int pathash[256]
+.br
+Indices into pattern[] for all patterns with a certain low order
+byte of the hashing function.
+.ti -0.5i
+c1_t c1coercs[]
+.br
+List of rules to stack tokens.
+Contains token expressions,
+register needed,
+cost
+and code rule.
+.ti -0.5i
+c2_t c2coercs[]
+.br
+List of splitting coercions.
+Token expressions,
+split factor,
+replacements
+and code rule.
+.ti -0.5i
+c3_t c3coercs[]
+.br
+List of one to one coercions.
+Token expressions,
+register needed,
+replacement
+and code rule.
+.ti -0.5i
+struct reginfo **reglist[]
+.br
+List of lists of pointers to register information.
+For every property the list is here
+to find the registers corresponding to it.
+.in 0
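+.PP
+The bit layout described for machsets[] can be illustrated in C.
+The sketch below is an assumption about one possible encoding,
+not the layout generated by cgg:
+the values of NREG and NTOK, the byte array representation and the
+helper names are hypothetical,
+and registers and tokens are assumed to be numbered from 1.

```c
#include <assert.h>

#define NREG 8		/* hypothetical number of registers */
#define NTOK 12		/* hypothetical number of tokens */
#define SETBITS (1 + NREG + NTOK)

typedef struct {
	unsigned char bits[(SETBITS + 7) / 8];
} set_t;

/* Bit 0 is the SCRATCH property. */
int bit_scratch(void)
{
	return 0;
}

/* Bits 1..NREG stand for registers 1..NREG. */
int bit_of_reg(int regno)
{
	assert(regno >= 1 && regno <= NREG);
	return regno;
}

/* Bits NREG+1 and up stand for tokens 1..NTOK. */
int bit_of_token(int tokno)
{
	assert(tokno >= 1 && tokno <= NTOK);
	return NREG + tokno;
}

void set_add(set_t *s, int bit)
{
	s->bits[bit / 8] |= (unsigned char)(1 << (bit % 8));
}

int set_has(const set_t *s, int bit)
{
	return (s->bits[bit / 8] >> (bit % 8)) & 1;
}
```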
+.NH 3
+tables.h
+.PP
+In tables.h various derived constants for the tables are
+given.
+They are then used to determine array sizes in the actual code generator,
+plus loop termination in some cases.
+.NH 2
+Other important data structures
+.PP
+During code generation some other data structures are used
+and here is a short description of some of the important ones.
+.PP
+Tokens are kept in the code generator as a struct consisting of
+one integer
+.I t_token
+which is -1 if the token is a register,
+and the number of the token otherwise,
+plus an array of
+.I TOKENSIZE
+unions
+.I t_att
+of which the first is the register number in case of a register.
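+.PP
+As a rough illustration of this layout, a token could be declared as below.
+This is only a sketch:
+the union member names, the value chosen for TOKENSIZE
+and the helper isregtoken() are assumptions made for the example,
+not the declarations used in the code generator.
+.DS
+#define TOKENSIZE 4             /* assumed value, for illustration */
+
+typedef union {
+        int   ar;               /* register number (assumed name) */
+        long  aw;               /* integer attribute (assumed name) */
+        char *aa;               /* string attribute (assumed name) */
+} att_t;
+
+typedef struct {
+        int   t_token;          /* -1 for a register, else token number */
+        att_t t_att[TOKENSIZE]; /* t_att[0] holds the register number */
+} token_t;
+
+/* Hypothetical helper: does this entry describe a register? */
+int isregtoken(const token_t *tp)
+{
+        return tp->t_token == -1;
+}
+.DE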
+.PP
+The fakestack is an array of these tokens;
+there is a global variable
+.I stackheight .
+.PP
+The results of expressions are kept in a struct
+.I result
+with elements
+.I e_typ ,
+giving the type of the expression:
+.I EV_INT ,
+.I EV_REG
+or
+.I EV_STR ,
+and a union
+.I e_v
+which contains the real result.
+.NH 2
+A tour through the sources
+.NH 3
+codegen.c
+.PP
+The file codegen.c contains one large function consisting
+of one giant switch statement.
+It is the interpreter for the code generator pseudo code
+as contained in coderules[].
+This function can call itself recursively when doing lookahead.
+Arguments are:
+.IP codep 10
+Pointer into code rules, pseudo program counter.
+.IP ply
+Number of EM pattern lookahead allowed.
+.IP toplevel
+Boolean telling whether this is the toplevel codegen() or
+a deeper incarnation.
+.IP costlimit
+A cutoff value to limit searches.
+If the cost crosses costlimit the incarnation can terminate.
+.IP forced
+A register number if nonzero.
+This is used inside coercions to force the allocate() call to allocate
+a register determined by earlier lookahead.
+.PP
+The instructions implemented in the switch:
+.NH 4
+DO_NEXTEM
+.PP
+Matches the next EM pattern and does lookahead if necessary to find the best
+code rule associated with this pattern.
+Heuristics are used to determine best code rule when possible.
+This is done by calling the distance() function.
+.NH 4
+DO_COERC
+.PP
+This sets the code generator in the state to do a from stack coercion.
+.NH 4
+DO_XMATCH
+.PP
+This is done when a match no longer has to be checked.
+Used when the nocoercions: trick is used in the table.
+.NH 4
+DO_MATCH
+.PP
+This is the big one inside this function.
+It has the task to transform the contents of the current
+fakestack to match the pattern given after it.
+.PP
+Since the code generator does not know combining coercions,
+i.e. there is no way to make a big token out of two smaller ones,
+the first thing done is to stack every token that is too small.
+After that all tokens too big are split if possible to the right size.
+.PP
+Next the coercions are sought that would transform tokens in place to
+the right one, plus the coercions that would pop tokens off the stack.
+Each of those might need a register, so a list of registers is generated
+and at the end of looking for coercions the function 
+.I tuples()
+is called to generate the list of all possible \fIn\fP-tuples,
+where 
+.I n
+equals the number of registers needed.
+.PP
+Lookahead is now performed if the number of tuples is greater than one.
+If no possibility is found within the costlimit,
+the fakestack is made smaller by pushing the bottom token,
+and this process is repeated until either a way is found or
+the fakestack is completely empty and there is still no way
+to make the match.
+.PP
+If there is a way the corresponding coercions are executed
+and the code is finished.
+.NH 4
+DO_REMOVE
+.PP
+Here the remove() call is executed: all tokens matched by the
+token expression plus boolean expression are pushed.
+In the current implementation there is no attempt to move those
+tokens to registers, but that is a possible future extension.
+.NH 4
+DO_DEALLOCATE
+.PP
+This one temporarily decrements by one the reference count of all registers
+contained in the token given as argument.
+.NH 4
+DO_REALLOCATE
+.PP
+Here all temporary deallocates are undone.
+.NH 4
+DO_ALLOCATE
+.PP
+This is the part that allocates a register and decides which one to use.
+If the
+.I forced
+argument was given its task is simple,
+otherwise some work must be done.
+First the list of possible registers is scanned,
+all free registers are noted, and it is noted whether any of those
+registers already
+contains the initialization.
+If no registers are available some fakestack token is stacked and the
+process is repeated.
+.PP
+After that if an exact match was found, 
+the list of registers is reduced to one register matching exactly
+out of every register class.
+Now lookahead is performed if necessary and the register chosen.
+If an initialization was given the corresponding move is performed,
+otherwise the register is marked empty.
+.NH 4
+DO_LOUTPUT
+.PP
+This prints a string and an expression.
+Only done on toplevel.
+.NH 4
+DO_ROUTPUT
+.PP
+Prints a string and a new line.
+Only on toplevel.
+.NH 4
+DO_MOVE
+.PP
+Calls the move() function in the code generator to implement the move()
+function in the table.
+.NH 4
+DO_ERASE
+.PP
+Marks the register that is its argument as empty.
+.NH 4
+DO_TOKREPLACE
+.PP
+This is the token replacement part.
+It is also called if there is no token replacement because it has
+some other functions as well.
+.PP
+First the tokens that will be pushed on the fakestack are computed
+and stored in a temporary array.
+Then the tokens that were matched in this rule are popped
+and their embedded registers have their reference count
+decremented.
+After that the replacement tokens are pushed.
+.PP
+Finally all registers allocated in this rule have their reference count
+decremented.
+If they were not pushed on the fakestack they will be available again
+in the next code rule.
+.NH 4
+DO_EMREPLACE
+.PP
+Places replacement EM instructions back into the instruction stream.
+.NH 4
+DO_COST
+.PP
+Accounts for cost as given in the code rule.
+.NH 4
+DO_RETURN
+.PP
+Returns from this level of codegen().
+Is used at the end of coercions,
+move rules etc.
+.NH 3
+compute.c
+.PP
+This module computes the various expressions as given
+in the enodes[] array.
+Nothing very special happens here,
+it is just a recursive function computing leaves
+of expressions and applying the operator.
+.NH 3
+equiv.c
+.PP
+In this module the tuples() function is implemented.
+It is given the number of registers needed and
+a list of register lists and it constructs a list of tuples
+where the \fIn\fP'th register comes from the \fIn\fP'th list.
+Before the list is constructed however 
+the dynamic register classes are computed.
+Two registers are in the same dynamic class if they are in the
+same static class and their contents are the same.
+.PP
+After that the permute() recursive function is called to
+generate the list of tuples.
+After construction a generated tuple is added to the list
+unless the list already holds a tuple that is pairwise in the same classes
+and has the same register relations,
+i.e. if the first and second register share a common
+subregister in one tuple and not in the other they are considered different.
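+.PP
+The basic enumeration can be sketched as follows.
+The sketch only shows how every \fIn\fP-tuple is built by taking the
+\fIk\fP'th register from the \fIk\fP'th list;
+the dynamic class and register relation checks are left out,
+and all names, sizes and register numbers in it are assumptions
+made for the example.
+.DS
+#include <stdio.h>
+
+#define MAXN 4                  /* assumed upper bound on n */
+
+static int ntuples;             /* number of tuples generated */
+
+/* lists[k] holds len[k] candidate registers for position k */
+static void permute(int k, int n, int *lists[], int len[], int tuple[])
+{
+        int i;
+
+        if (k == n) {           /* tuple complete: count and print it */
+                ntuples++;
+                for (i = 0; i < n; i++)
+                        printf(" r%d", tuple[i]);
+                puts("");
+                return;
+        }
+        for (i = 0; i < len[k]; i++) {
+                tuple[k] = lists[k][i];
+                permute(k + 1, n, lists, len, tuple);
+        }
+}
+
+int main(void)
+{
+        int l0[] = { 1, 2 }, l1[] = { 3, 4, 5 };
+        int *lists[] = { l0, l1 };
+        int len[] = { 2, 3 };
+        int tuple[MAXN];
+
+        permute(0, 2, lists, len, tuple);   /* 2 * 3 = 6 tuples */
+        printf("%d tuples", ntuples);
+        puts("");
+        return 0;
+}
+.DE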
+.NH 3
+fillem.c
+.PP
+This is the routine that does the reading of EM instructions
+and the handling of pseudos.
+The mach.c module provided by the table writer is included
+at the end of this module.
+The routine fillemlines() is called by nextem() at toplevel
+to make sure there are enough instructions to match.
+It fills the EM instruction buffer up to 5 places from the end to
+keep room for EM replacement instructions,
+or up to a pseudo.
+.PP
+The dopseudo() function performs the function of the pseudo last
+encountered.
+If the pseudo is a 
+.B rom
+the corresponding label is saved with the contents of the
+.B rom
+to be available to the code generator later.
+The rest of the routines are small service routines for either
+input or data output.
+.NH 3
+gencode.c
+.PP
+This module contains routines called by codegen() to generate the real
+code to the codefile.
+The function gencode() gets a string as argument and copies it to codefile
+while processing certain embedded control characters implementing
+the $2 and [1.reg] escapes.
+The function genexpr() prints the expression given as argument.
+It is used to implement the %(\ expr\ %) escape.
+The prtoken() function interprets the tokenformat as given in
+the tokens[] array.
+.NH 3
+glosym.c
+.PP
+This module maintains a list of global symbols that have a 
+.B rom
+pseudo associated.
+There are functions to enter a symbol and to find a symbol.
+.NH 3
+main.c
+.PP
+Main routine of the code generator.
+Processes arguments and flags.
+Flags available are:
+.IP -d
+Sets debug mode if the code generator was not compiled with
+the NDEBUG macro defined.
+Debug mode gives very long output on stderr indicating
+all steps of the code generation process including nesting
+of the codegen() function.
+.IP -p\fIn\fP
+Sets the lookahead depth to
+.I n ,
+the
+.I p
+stands for ply,
+a well known word in chess playing programs.
+.IP -w\fIn\fP
+Sets the weight percentage for size in the cost function to
+.I n
+percent.
+Uses Euclid's algorithm to simplify rationals.
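+.PP
+The simplification can be pictured as reducing the fraction \fIn\fP/100
+to lowest terms.
+The sketch below is an illustration only;
+the function names are made up
+and the real code generator may store the weights differently.
+.DS
+/* Euclid's algorithm: greatest common divisor of a and b */
+static int gcd(int a, int b)
+{
+        while (b != 0) {
+                int t = a % b;
+                a = b;
+                b = t;
+        }
+        return a;
+}
+
+/* Reduce percent/100, e.g. 40 percent becomes 2/5 */
+static void simplify(int percent, int *num, int *den)
+{
+        int g = gcd(percent, 100);  /* g >= 1: gcd(0, 100) is 100 */
+
+        *num = percent / g;
+        *den = 100 / g;
+}
+.DE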
+.NH 3
+move.c
+.PP
+Function to implement the move() pseudo function in the tables,
+register initialization and the setcc and test pseudo functions.
+First tests are made to try to prevent the move from really happening.
+The condition code register is treated specially here.
+After that, if there is an after that,
+the move rule is found and the code executed.
+.NH 3
+nextem.c
+.PP
+The entry point of this module is nextem().
+It hashes the next three EM instructions,
+and uses the low order byte of the hash
+as an index into the array pathash[],
+to find a chain of patterns in the array
+pattern[],
+that are all tried for a match.
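+.PP
+The lookup step can be sketched as follows.
+The hash function shown is invented for the illustration,
+as are the names hash3(), firstpattern() and emops[];
+only the use of the low order byte as an index into pathash[]
+is taken from the description above.
+.DS
+int pathash[256];       /* in reality filled in by the generated tables */
+
+/* Hypothetical hash over the opcodes of the next three instructions */
+static unsigned hash3(const int emops[3])
+{
+        unsigned h = 0;
+        int i;
+
+        for (i = 0; i < 3; i++)
+                h = (h << 4) ^ (unsigned) emops[i];
+        return h;
+}
+
+/* Index in pattern[] of the first pattern of the chain to try */
+static int firstpattern(const int emops[3])
+{
+        return pathash[hash3(emops) & 0xFF];    /* low order byte */
+}
+.DE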
+.PP
+The function trypat() does most of the work
+checking patterns.
+When a pattern is found to match all instructions
+the operands of the instruction are placed into the dollar[] array.
+Then the boolean expression is tried.
+If it matches the function can return,
+leaving the operands still in the dollar[] array,
+so later in the code rule they can still be used.
+.NH 3
+reg.c
+.PP
+Collection of routines to handle registers.
+Reference count routines are here,
+chrefcount() and getrefcount(),
+plus routines to erase a single register or all of them,
+erasereg() and cleanregs().
+.PP
+If NDEBUG hasn't been defined, here is also the routine that checks
+if the reference count kept with the register information is in
+agreement with the number of times it occurs on the fakestack.
+.NH 3
+salloc.c
+.PP
+Module for string allocation and garbage collection.
+Contains entry points myalloc(),
+a routine calling malloc() and checking whether room is left,
+myfree(), just free(),
+popstr() a function called from state.c to free all strings
+made since the last saved status.
+Furthermore there is salloc() which has the size of the string as parameter
+and returns a pointer to the allocated space,
+while keeping a copy of the pointer for garbage collection purposes.
+.PP
+The function garbage_collect is called from codegen() at toplevel
+every now and then,
+and checks all places where strings may reside to mark strings
+as being in use.
+Strings not in use are returned to the pool of free space.
+.NH 3
+state.c
+.PP
+Set of routines called to save current status,
+restore a previous saved state and to free the room
+occupied by a saved state.
+A list of structs is kept here to save the state.
+If this is not done,
+small allocates will take space
+from the holes big enough for state saves,
+and as a result every new state save will need a new struct.
+The code generator runs out of room very rapidly under these conditions.
+.NH 3
+subr.c
+.PP
+Random set of leftover routines.
+.NH 4
+match
+.PP
+Computes whether a certain token matches a certain token expression.
+Just computes a bitnumber according to the algorithm explained with
+machsets[],
+and tests the bit and the boolean expression if it is there.
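+.PP
+A sketch of the bit number computation, under some assumptions:
+the set is taken to be an array of 16-bit words,
+tokens are taken to be numbered from 1,
+and NREG, SETSIZE and the function names are chosen for the example only.
+The boolean expression test is omitted.
+.DS
+#define NREG    8       /* assumed number of registers */
+#define SETSIZE 4       /* assumed set size in 16-bit words */
+
+typedef struct {
+        unsigned short set_val[SETSIZE];
+} set_t;
+
+/* Bit 0: SCRATCH, bits 1..NREG: registers, bits NREG+1..: tokens */
+static int bitnumber(int t_token, int regno)
+{
+        return t_token == -1 ? regno : NREG + t_token;
+}
+
+/* Nonzero if bit bitnum is present in the set */
+static int inset(const set_t *sp, int bitnum)
+{
+        return (sp->set_val[bitnum >> 4] >> (bitnum & 15)) & 1;
+}
+.DE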
+.NH 4
+instance,cinstance
+.PP
+These two functions compute a token from a description.
+They differ only slightly; cinstance() is used to compute
+the result of a coercion in a certain context
+and therefore has more arguments, which it uses instead of
+the global information instance() works on.
+.NH 4
+eqtoken
+.PP
+eqtoken computes whether two tokens can be considered identical.
+Used to check register contents during moves mainly.
+.NH 4
+distance
+.PP
+This is the heuristic function that computes a distance from
+the current fakestack contents to the token pattern in the table.
+It likes exact matches most, then matches where at least the sizes are correct
+and if the sizes are not correct it likes too large sizes more than too
+small, since splitting a token is easier than combining one.
+.NH 4
+split
+.PP
+This function tries to find a splitting coercion
+and executes it immediately when found.
+The fakestack is shuffled thoroughly when this happens,
+so pieces below the token that must be split are saved first.
+.NH 4
+docoerc
+.PP
+This function executes a coercion that was found.
+The same shuffling is done, so the top of the stack is again saved.
+.NH 4
+stackupto
+.PP
+This function gets a pointer into the fakestack and must stack
+every token including the one pointed at up to the bottom of the fakestack.
+The first stacking rule possible is used,
+so rules using registers must come first.
+.NH 4
+findcoerc
+.PP
+Looks for a one to one coercion; if found it returns a pointer
+to it and leaves a list of possible registers to use in the global
+variable curreglist.
+This is used by codegen().
+.NH 3
+var.c
+.PP
+Global variables used by more than one module.
+External definitions are in extern.h.
diff --git a/doc/cref.doc b/doc/cref.doc
new file mode 100644
index 000000000..43c89b27a
--- /dev/null
+++ b/doc/cref.doc
@@ -0,0 +1,317 @@
+.ll 72
+.nr ID 4
+.de hd
+'sp 2
+'tl ''-%-''
+'sp 3
+..
+.de fo
+'bp
+..
+.tr ~
+.               TITLE
+.de TL
+.sp 15
+.ce
+\\fB\\$1\\fR
+..
+.               AUTHOR
+.de AU
+.sp 15
+.ce
+by
+.sp 2
+.ce
+\\$1
+..
+.               DATE
+.de DA
+.sp 3
+.ce
+( Dated \\$1 )
+..
+.               INSTITUTE
+.de VU
+.sp 3
+.ce 4
+Wiskundig Seminarium
+Vrije Universiteit
+De Boelelaan 1081
+Amsterdam
+..
+.               PARAGRAPH
+.de PP
+.sp
+.ti +\n(ID
+..
+.nr CH 0 1
+.               CHAPTER
+.de CH
+.nr SH 0 1
+.bp
+.in 0
+\\fB\\n+(CH.~\\$1\\fR
+.PP
+..
+.               SUBCHAPTER
+.de SH
+.sp 3
+.in 0
+\\fB\\n(CH.\\n+(SH.~\\$1\\fR
+.PP
+..
+.               INDENT START
+.de IS
+.sp
+.in +\n(ID
+..
+.               INDENT END
+.de IE
+.in -\n(ID
+.sp
+..
+.de PT
+.ti -\n(ID
+.ta \n(ID
+.fc " @
+"\\$1@"\c
+.fc
+..
+.               DOUBLE INDENT START
+.de DS
+.sp
+.in +\n(ID
+.ll -\n(ID
+..
+.               DOUBLE INDENT END
+.de DE
+.ll +\n(ID
+.in -\n(ID
+.sp
+..
+.               EQUATION START
+.de EQ
+.sp
+.nf
+..
+.               EQUATION END
+.de EN
+.fi
+.sp
+..
+.               ITEM
+.de IT
+.sp
+.in 0
+\\fB~\\$1\\fR
+.ti +5
+..
+.de CS
+.br
+~-~\\
+..
+.br
+.fi
+.TL "Ack-C reference manual"
+.AU "Ed Keizer"
+.DA "September 12, 1983"
+.VU
+.wh 0 hd
+.wh 60 fo
+.CH "Introduction"
+The C frontend included in the Amsterdam Compiler Kit
+translates UNIX-V7 C into compact EM code [1].
+The language accepted is described in [2] and [3].
+This document describes which implementation dependent choices were
+made in the Ack-C frontend and
+some restrictions and additions.
+.CH "The language"
+.PP
+Under the same heading as used in [2] we describe the
+properties of the Ack-C frontend.
+.IT "2.2 Identifiers"
+External identifiers are unique up to 7 characters and allow
+both upper and lower case.
+.IT "2.4.3 Character constants"
+The ASCII-mapping is used when a character is converted to an
+integer.
+.IT "2.4.4 Floating constants"
+To prevent loss of precision the compiler does not perform
+floating point constant folding.
+.IT "2.6 Hardware characteristics"
+The size of objects of the several arithmetic types and the two
+pointer types depends on the EM-implementation used.
+The ranges of the arithmetic types depend on the size used,
+the C-frontend assumes two's complement representation for the
+integral types. All sizes are multiples of bytes.
+The calling program \fIack\fP[4] passes information about the
+size of the types to the compiler proper.
+.br
+However, a few general remarks must be made:
+.sp 1
+.IS
+.PT (a)
+Two different pointer types exist: pointers to data and
+pointers to functions.
+The latter type is twice as large as the former.
+Pointers to functions use the same format as Pascal procedure
+parameters, thereby allowing C to use Pascal procedure
+parameters and vice-versa.
+The extra information passed indicates the scope level of the
+procedure.
+.PT (b)
+The size of pointers to data is a multiple of
+(or equal to) the size of an \fIint\fP.
+.PT (c)
+The following relations exist for the sizes of the types
+mentioned:
+.br
+.ti +5
+\fIchar<=short<=int<=long\fP
+.PT (d)
+Objects of type \fIchar\fP use one 8-bit byte of storage,
+although several bytes are allocated sometimes.
+.PT (e)
+All sizes are in multiples of bytes.
+.PT (f)
+Most EM implementations use 4 bytes for floats and 8 bytes
+for doubles, but exceptions to this rule occur.
+.IE
+.IT "6.1 Characters and integers"
+Objects of type \fIchar\fP are unsigned and do not cause
+sign-extension when converted to \fIint\fP.
+The range of character values is from 0 to 255.
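+.IS
+As a small illustration of this rule,
+and assuming an \fIint\fP wider than 8 bits,
+the hypothetical function below returns 255 when compiled with Ack-C,
+where a compiler with signed characters would typically return -1.
+.nf
+int widen()
+{
+        char c = 0377;  /* all eight bits set */
+        return c;       /* no sign extension: 255 */
+}
+.fi
+.IE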
+.IT "6.3 Floating and integral"
+Floating point numbers are truncated towards zero when
+converted to the integral types.
+.IT "6.4 Pointers and integers"
+When a \fIlong\fP is added to or subtracted from a pointer and
+longs are larger than data pointers the \fIlong\fP is converted to an
+\fIint\fP before the operation is performed.
+.IT "8.5 Structure and union declarations"
+The only type allowed for fields is \fIint\fP.
+Fields with exactly the size of \fIint\fP are signed,
+all other fields are unsigned.
+.br
+The size of any single structure must be less than 4096 bytes.
+.IT "8.6 Initialization"
+Initialization of structures containing bit fields is not
+allowed.
+There is one restriction when using an 'address expression' to initialize
+an integral variable.
+The integral variable must have the size of a data pointer.
+Conversions altering the size of the address expression are not allowed.
+.IT "10.1 External function definitions"
+The total amount of storage used for parameters
+in any function must be less than 4096 bytes.
+The same holds for the total amount of storage occupied by the
+automatic variables declared inside any function.
+.sp
+Using formal parameters whose size is smaller than the size of an int
+is less efficient on several machines.
+At procedure entry these parameters are converted from integer to the
+declared type, because the compiler doesn't know where the least
+significant bytes are stored in the int.
+.IT "11.2 Scope of externals"
+Most C compilers are rather lax in enforcing the restriction
+that only one external definition without the keyword
+\fIextern\fP is allowed in a program.
+The Ack-C frontend is very strict in this.
+The only exception is that declarations of arrays with a
+missing first array bounds expression are regarded to have an
+explicit keyword \fIextern\fP.
+.IT "14.4 Explicit pointer conversions"
+Pointers may be larger than ints, thus assigning a pointer to an
+int and back will not always result in the same pointer.
+The process mentioned above works with integrals
+of the same size as or larger than pointers in all EM implementations
+having such integrals.
+Note that pointers to functions have
+twice the size of pointers to data.
+When converting data pointers to an integral type or vice-versa,
+the pointer is seen as an unsigned with the same size as a data pointer.
+When converting function pointers to anything else the static link part
+of the pointer is discarded,
+the resulting value is treated as if it were a data pointer.
+When converting a data pointer or object of integral type to a function pointer
+a static link with the value 0 is added to complete the function pointer.
+.br
+EM guarantees that any object can be placed at a word boundary,
+this allows the C-programs to use \fIint\fP pointers
+as pointers to objects of any type not smaller than an \fIint\fP.
+.CH "Frontend options"
+The C-frontend has a few options, these are controlled
+by flags:
+.IS
+.PT -V
+This flag is followed by a sequence of letters each followed by
+positive integers. Each letter indicates a
+certain type, the integer following it specifies the size of
+objects of that type. One letter indicates the wordsize used.
+.IS
+.sp 1
+.TS
+center tab(:);
+l l16 l l.
+letter:type:letter:type
+
+w:wordsize:i:int
+s:short:l:long
+f:float:d:double
+p:pointer::
+.TE
+.sp 1
+All existing implementations use an integer size equal to the
+wordsize.
+.IE
+The calling program \fIack\fP[4] provides the frontend with
+this flag, with values depending on the machine used.
+.sp 1
+.PT -l
+The frontend normally generates code to keep track of the line
+number and source file name at runtime for debugging purposes.
+Currently a pointer to a
+string containing the filename is stored at a fixed place in
+memory at each function
+entry and the line number at the start of every expression.
+At the return from a function these memory locations are not reset to
+the values they had before the call.
+Most library routines do not use this feature and thus do not
+ruin the current line number and filename when called.
+However, you are really unlucky when your program crashes due
+to a bug in such a library function, because the line number
+and filename do not indicate that something went wrong inside
+the library function.
+.br
+Providing the flag -l to the frontend tells it not to generate
+the code updating line number and file name.
+This is, for example, used when translating the stdio library.
+.br
+When \fIack\fP[4] is called with the -L flag it provides
+the frontend with this flag.
+.sp 1
+.PT -Xp
+When this flag is present the frontend generates a call to
+the function \fBprocentry\fP at each function entry and a
+call to \fBprocexit\fP at each function exit.
+Both functions are provided with one parameter,
+a pointer to a string containing the function name.
+.br
+When \fIack\fP is called with the -p flag it provides the
+frontend with this flag.
+.IE
+.CH References
+.IS
+.PT [1]
+A.S. Tanenbaum, Hans van Staveren, Ed Keizer and Johan
+Stevenson, \fIDescription of a machine architecture for use with
+block structured languages\fP, Informatica report IR-81.
+.sp 1
+.PT [2]
+B.W. Kernighan and D.M. Ritchie, \fIThe C Programming
+language\fP, Prentice-Hall, 1978.
+.PT [3]
+D.M. Ritchie, \fIC Reference Manual\fP
+.sp
+.PT [4]
+UNIX manual ack(I).
diff --git a/doc/install.doc b/doc/install.doc
new file mode 100644
index 000000000..e791aa5f0
--- /dev/null
+++ b/doc/install.doc
@@ -0,0 +1,622 @@
+.nr LL 7.5i
+.nr PD 1v
+.TL
+Amsterdam Compiler Kit installation guide
+.AU
+Ed Keizer
+.AI
+Wiskundig Seminarium
+Vrije Universiteit
+Amsterdam
+.NH
+Introduction
+.PP
+This document
+describes the process of installing Amsterdam Compiler Kit.
+How hard it will be to install the kit
+depends on your combination of hardware and software.
+This description is intended for a PDP 11/44 running
+.UX
+Version 7.
+Installation on other PDP 11's should be easy, as long
+as they have separate instruction and data space.
+Installation on machines without this feature, like PDP 11/34,
+PDP 11/60 requires extensive surgery on some programs and is
+thought of as impossible.
+See chapter 6 for installation on other systems.
+.NH
+Restoring tree
+.PP
+The process of installing Amsterdam Compiler Kit is quite simple.
+It is important that the original Amsterdam Compiler Kit
+distribution tree structure is restored.
+Proceed as follows
+.IP "  -" 10
+Create a directory, for example /usr/em, on a device
+with at least 20000 blocks left.
+.IP "  -"
+Change to that directory (cd ...); it will be the working directory.
+.IP "  -"
+Extract all files from the distribution medium, for instance
+magtape:
+\fBtar x\fP.
+.IP "  -"
+Keep a copy of the original distribution to be able to repeat the process
+of installation in case of disasters.
+This copy is also useful as a reference point for diff-listings.
+.LP
+The directories in the tree contain the following information:
+.nr PD 1v
+.IP "lib" 14
+.br
+almost all binaries and shell files used by commands and
+library em_data.a from misc/data
+.IP "lib/ack"
+.br
+The command descriptor files used by the program ack.
+.nr PD 0
+.IP "bin"
+.br
+the few utilities that knot things together
+.IP "etc"
+.br
+The MAIN description of EM sits here.
+Contains files (e.g. em_table) describing
+the opcodes and pseudos in use,
+the operands allowed, effect on the stack, etc.
+Running make in this directory creates most of the files in h.
+.IP "include"
+.br
+More or less system independent include files needed by modules
+in the C library from lang/cem/libcc.
+Especially needed for "stdio".
+.IP "h"
+.br
+The #include files for:
+.nf
+as_spec.h    Used by EM assembler and interpreters.
+em_abs.h     Contains trap numbers and address for lin and fil
+em_flag.h    Definition of bits in array em_flag in lib/em_data.a
+             Describes parameters effect on flow of instructions
+em_mes.h     Definition of names for mes pseudo numbers
+em_mnem.h    instruction => compact mapping.
+em_pseu.h    pseudo instruction => compact mapping
+em_ptyp.h    Useful for compact code reading/writing,
+             defines classes of parameters
+em_spec.h    Definition of constants used in compact code
+local.h      Various definitions for local versions
+pc_err.h     Definitions of error numbers in Pascal
+pc_file.h    Macros used in file handling in Pascal
+em_path.h    Pathnames used by \fIack\fP, intended
+             for all utilities
+pc_size.h    Sizes of objects used by Pascal compiler and
+             run-time system.
+em_reg.h     Definition of names for register types.
+.IP "doc"
+.br
+Documentation
+.nf
+cg.doc          Use and internal specification of the backend.
+.br
+regadd.doc      Update for cg.doc concerning register variables
+.br
+regadd.doc      Description of steps to add register variables.
+.br
+ack.doc         Layout of description files needed for each machine.
+.br
+cref.doc        C reference manual, addendum
+.br
+install.doc     Ack Installation Guide
+.br
+pcref.doc       Pascal reference manual, addendum
+.br
+peep.doc        Description of the peephole optimizer
+.br
+em.doc          EM reference manual
+.br
+toolkit.doc     A general overview of the toolkit
+.br
+v7bugs.doc      Bugs in the standard V7 system
+.br
+val.doc         Pascal validation suite version 3 report
+.nf
+.IP "doc/em.doc"
+.br
+The EM-manual IR-81
+.IP "doc/em.doc/int"
+.br
+The EM interpreter written in Pascal
+.IP "mkun"
+.br
+The PUBMAC macro package for nroff/troff from the Katholieke Universiteit at
+Nijmegen.
+It is used for the EM reference manual,
+the Makefile installs the macro package in
+/usr/lib/tmac/tmac.mkun*.
+This package is in the public domain.
+.IP "mach"
+.br
+Just there to group the directories for all machines.
+These directories have sub-directories named:
+.nf
+  as      the assembler ( *.s + libraries => a.out )
+  cg      the new backend   ( *.m => *.s )
+  lib     the libraries for all run-time systems
+          these libraries are used by the assembler.
+  libpc   Used to create Pascal run-time system in 'lib'
+  libcc   Used to create C run-time system in 'lib'
+  libem   Sources for EM runtime system, result sits in 'lib'
+  test    Various tests
+  dl      Down-load programs
+  int     Source for an interpreter
+available are:
+    PMDS II 68000, wordsize 2, ptrsize 4
+        mach/m68k2
+        mach/m68k2/as
+        mach/m68k2/cg
+        mach/m68k2/libem
+        mach/m68k2/lib
+        mach/m68k2/dl
+        mach/m68k2/libpc
+        mach/m68k2/libcc
+        mach/m68k2/libsys
+    bare 6809
+        mach/6809
+        mach/6809/as
+    8080, wordsize 2, ptrsize 2
+        mach/8080
+        mach/8080/as
+        mach/8080/test
+        mach/8080/libcc
+        mach/8080/lib
+   bare 8086, wordsize 2, ptrsize 2
+        mach/i86
+        mach/i86/as
+        mach/i86/lib
+        mach/i86/libcc
+        mach/i86/dl
+        mach/i86/libem
+        mach/i86/libpc
+        mach/i86/saio  (library for stand-alone EM on 86/12A )
+    pdp 11, UNIX/V7, wordsize 2, ptrsize 2
+        mach/pdp
+        mach/pdp/test
+        mach/pdp/libem
+        mach/pdp/lib
+        mach/pdp/libcc
+        mach/pdp/libpc
+        mach/pdp/cg
+        mach/pdp/int         -PDP 11/44 EM interpreter
+    vax 780, UNIX V7, wordsize 4, ptrsize 4
+        mach/vax4
+        mach/vax4/cg
+        mach/vax4/lib
+        mach/vax4/libcc
+        mach/vax4/libem
+        mach/vax4/libpc
+    z80, CP/M, wordsize 2, ptrsize 2
+        mach/z80
+        mach/z80/as
+        mach/z80/libem
+        mach/z80/lib
+        mach/z80/libcc
+        mach/z80/libpc
+        mach/z80/int         -Z80 EM interpreter
+    z80, nascom
+        mach/z80a
+        mach/z80a/dl
+    vax 11/780, Berkeley UNIX, wordsize 2, ptrsize 4
+        mach/vax2
+        mach/vax2/cg
+        mach/vax2/lib
+        mach/vax2/libpc
+        mach/vax2/libem
+    bare 6500, wordsize 2, ptrsize 2
+        mach/6500
+        mach/6500/as
+        mach/6500/dl
+        mach/6500/libem
+        mach/6500/lib
+    bare 6800, wordsize 2, ptrsize 2
+        mach/6800
+        mach/6800/as
+    EM virtual machine code, wordsize 2, ptrsize 2
+        mach/int
+        mach/int/libcc
+        mach/int/libpc
+        mach/int/lib
+        mach/int/test
+    The directory proto contains files used by most machines.
+    e.g. makefiles for libraries for C and Pascal
+        mach/proto
+        mach/proto/libg
+.fi
+.IP "emtest"
+.br
+Contains prototype of em test set.
+.IP "man"
+.br
+Man files for various utilities
+.IP "lang"
+.br
+just there to group the directories for all front-ends
+.IP "lang/pc"
+.br
+Pascal front-end
+.IP "lang/pc/libpc"
+.br
+Source of Pascal run-time system ( in EM or C )
+.IP "lang/pc/test"
+.br
+Some test programs written in Pascal
+.IP "lang/pc/pem"
+.br
+The compiler proper
+.IP "lang/cem"
+.br
+C front-end
+.IP "lang/cem/libcc"
+.br
+Directories with sources of C runtime system, libraries (in EM or C)
+.IP "lang/cem/libcc/gen"
+.br
+Sources for routines in chapter III of UNIX programmers manual,
+excluding STDIO
+.IP "lang/cem/libcc/stdio"
+.br
+STDIO sources
+.IP "lang/cem/libcc/mon"
+.br
+Sources for routines in chapter II, written in EM
+.IP "lang/cem/comp"
+.br
+The compiler proper
+.IP "lang/cem/ctest"
+.br
+C test set
+.IP "lang/cem/ctest/cterr"
+.br
+Programs developed for pinpointing previous errors
+.IP "lang/cem/ctest/ct*"
+.br
+The test programs.
+.IP "util"
+.br
+Contains directories with various utilities
+.IP "util/opt"
+.br
+EM peephole optimizer (*.k => *.m)
+.IP "util/misc"
+.br
+Decode (*.[km] => *.e) + encode (*.e => *.k)
+.IP "util/data"
+.br
+The C-code for `lib/em_data.a`
+These sources are created by the Makefile in `etc`
+.IP "util/ass"
+.br
+The EM assembler ( *.[km] + libraries => e.out )
+.IP "util/arch"
+.br
+The archiver to be used for ALL EM utilities
+.IP "util/cgg"
+.br
+A program needed for compiling backends.
+.IP "util/cpp"
+.br
+The V7 C preprocessor.
+.LP
+All pathnames mentioned in the text of this document are relative to the
+working directory, unless they start with '/'.
+.PP
+The person doing the installation needs permission to write in the
+directories of the Amsterdam Compiler Kit distribution tree.
+Preferably you should log in as sys (uid=3,gid=0).
+.NH
+Pathnames
+.PP
+Absolute pathnames are concentrated in "h/em_path.h".
+Only the pascal runtime system and the utility \fIack\fP use
+absolute pathnames to access files in the kit.
+The tree is distributed with /usr/em as the working
+directory.
+The definition of EM_HOME in em_path.h should be altered to
+specify the root
+directory for the Compiler Kit distribution on your system.
+The trailing " in the definition of EM_HOME is intentionally
+missing!
+Em_path.h also specifies which directory should be used for
+temporary files.
+Most programs from the kit do indeed use that directory
+although some remain stubborn and use /tmp or /usr/tmp.
+.LP
+The shape of the tree should not be altered lightly because
+most Makefiles and the
+utility \fIack\fP know the shape of the ACK tree.
+All pathnames in all Makefiles are relative, that is, they do not
+have "/" as the first character.
+The knowledge of the utility \fIack\fP about the shape of the tree is
+concentrated in the files in the directory lib/ack.
+.NH
+Commands
+.PP
+The kit is distributed with all available commands in the bin
+directory.
+The commands distributed are:
+.IP "\fIack\fP, \fIacc\fP, \fIapc\fP and their links"
+.br
+They are used to compile Pascal, C and other programs.
+.IP \fIarch\fP
+.br
+The archiver used for the EM- and universal assembler.
+.IP "\fIem\fP and \fIeminform\fP"
+.br
+The EM interpreter for the PDP-11 and the program to unravel
+its post-mortem information.
+.LP
+We currently make the kit available to our users by telling
+them that they should include the bin directory of the kit in
+their PATH shell variable.
+The programs will still work when moved to a different
+directory.
+The copying should preferably be done with tar, since links are
+heavily used.
+Renaming of the programs linked to \fIack\fP will not always
+produce the desired result.
+This program uses its call name as an argument.
+Any call name other than \fIcc\fP, \fIacc\fP, \fIpc\fP or \fIapc\fP will be
+interpreted as the name of a 'machine description' and the
+program will try to find a description file with that name.
+All recompilations will only touch the utilities in the bin
+directory, not your own copies.
+.NH
+Options
+.PP
+There is one important option in h/local.h.
+The utility \fIack\fP uses a default machine name when called
+as \fIacc\fP, \fIcc\fP, \fIapc\fP, \fIpc\fP or \fIack\fP.
+The machine name used for default is determined by the
+definition of ACKM in h/local.h.
+The current definition is \fIpdp\fP.
+.PP
+The distribution is tailored to one specific operating system per CPU type.
+For some of these CPU's it is possible to tailor the distribution to another
+operating system.
+The steps to be taken are described in READ_ME (or README) files in the
+subdirectories of the directory in EM_HOME/mach for that particular machine.
+For example: The vax2 distribution is tailored to BSD4.1, but has #define's
+for BSD4.1c and BSD4.2.
+For the names and places of these define's look in EM_HOME/mach/vax2/cg and
+EM_HOME/mach/vax2/libem.
+.NH
+Recompilation
+.PP
+The kit comes with binaries in the directories \fBbin\fP and
+\fBlib\fP.
+Some directories among mach/*/lib contain archives with object files,
+notably mach/pdp/lib.
+The binaries and object files are for a PDP 11/44 with floating
+point running UNIX V7.
+.PP
+Almost all directories contain a "Makefile" or a shell command file called
+"make".
+Apart from commands applying to that specific directory these
+files all recognize a few special commands.
+When called with one of these they will apply the command to
+their own directory and all subdirectories.
+The special commands are:
+.IP "install" 20
+recompile and install all binaries and libraries.
+.br
+Some Makefiles allow errors to occur in the programs they call.
+They ignore such errors and notify the user with the message
+"~....... error code n: ignored".
+Whenever such a message appears in the output you can ignore it
+too.
+.br
+The installation of the PUBMAC macro package is not done
+automatically from the higher level directory.
+.IP "cmp"
+recompile all binaries and libraries and compare them to the
+ones already installed.
+.IP pr
+print the sources and documentation on the standard output.
+.IP opr
+make pr | opr
+.br
+Opr should be an off-line printer daemon.
+On some systems it exists under another name e.g. lpr.
+The easiest way to call such a spooler is using a shell script
+with the name opr that calls lpr.
+This script should be placed in /usr/bin or EM_HOME/bin or
+one of the directories in your PATH.
+.IP clean
+remove all files not needed for day-to-day use,
+that is binaries not in bin or lib, object files etc.
+.LP
+Example:
+.nf
+.sp 1
+        make install
+.sp 1
+.fi
+given as command in the home directory will cause
+recompilation of all programs in the kit.
+.LP
+Recompilation of the complete kit takes about 9 hours on a PDP
+11/44.
+.NH 2
+Recompilation on a different machine.
+.PP
+Installation on other systems will often require recompilation
+of all programs.
+The presence of a C compiler is essential for recompilation.
+Except for the Pascal compiler proper, all programs are written in C.
+Some modules are derived from \fIyacc\fP sources.
+Retranslating these programs from that yacc source is not
+necessary, although it might improve performance.
+Some versions of \fIyacc\fP 'know' that the resulting C programs will
+run on a 32-bit int machine.
+C modules produced by such a \fIyacc\fP are not portable and
+should not be used to (cross)compile programs for 16-bit machines.
+We assume a version of UNIX which, apart from the C-compiler,
+contains most normal utilities, like ed, sed, grep, make, the
+Bourne shell etc.
+All Makefiles use the system C-compiler.
+The existence of a backend for your system is of course essential
+if you wish to produce executable files for that system.
+When the backend exists it is also possible to boot the Pascal
+Compiler,
+which is written in Pascal itself.
+The kit contains the compact code files for the 2/2 and 2/4
+versions of the Pascal compiler.
+The current version of this compiler can only be used on machines
+with a 16-bit word size and 16- or 32-bit pointers.
+The Makefile automatically tries to boot the Pascal compiler
+from one of these compact code files, if the compiler proves
+unable to compile itself.
+.PP
+The native assemblers and loaders are used on PDP-11 and VAX.
+The description files in lib/ack for other systems use our
+universal assembler.
+The load file produced by this assembler is not directly
+usable in any system known to us,
+but has to be converted before it can be put to use.
+The \fIdl\fP programs present for some machines unravel
+these load files and transmit commands to load memory
+to a microprocessor over a serial line.
+The PDP-11 version of our universal assembler is supplied
+with a conversion program.
+The file man/a.out.5 contains a description of the format of
+the universal assembler load file;
+it might be useful to those who wish or need to write their
+own conversion programs.
+.br
+Berkeley UNIX for the VAX'en has (at least) three different
+versions, BSD4.1a, BSD4.1c and BSD4.2. The READ_ME files in the
+directories mach/vax2/cg, mach/vax2/libem, mach/vax4/cg and
+mach/vax4/libem tell you how to adapt the vax2 and vax4 backend
+to these versions.
+.NH 2
+Recompiling libraries
+.PP
+The kit contains sources for part II and III of the C-library, except
+the math functions. They are taken from our V7 system and sometimes
+altered in an EM dependent way or replaced altogether when the original
+was in assembly.
+These files can be used to make libraries for the Ack C-compiler.
+The recompilation process uses a few include files.
+The include directory in the EM home directory contains a few more
+or less system independent include files.
+The system dependent include files are fetched from /usr/include
+on the system you use to recompile.
+This may lead to several problems.
+Sometimes the system differs so much from V7 that certain manifest constants
+do not exist any more.
+At other times these include files were written for a compiler without
+a restriction on name length.
+In that case - I've seen it happen - people tend to use differing
+identifiers that are identical in the first eight characters.
+You have to solve all these problems yourself;
+the libraries are only included as an extra and are too system
+dependent to give any guarantees.
+.NH
+Fixes to the UNIX V7 system
+.PP
+UNIX System V7 has a few bugs that prevent a part of or the whole kit
+from working properly.
+To be honest, we do not know which of the following changes are
+essential to the functioning of our kit.
+.PP
+The file "doc/v7bugs.doc" gives for each of the following bugs
+a small test program and a diff listing of the source files that have to be
+modified.
+.IP 1
+Bug in the C optimizer for unsigned comparison
+.nr PD 0
+.IP 2
+The loader 'ld' fails for large data and text portions
+.IP 3
+Floating point registers are not saved if more memory is needed.
+.IP 4
+Floating point registers are not copied to child in fork().
+.nr PD 1v
+.LP
+Use the test programs to see if the errors are present in your system
+and to check if the modifications are effective.
+.NH
+Testing
+.PP
+Test sets are available in Pascal, C and EM assembly.
+.IP em 8
+.br
+The directory emtest contains a few EM test programs.
+The EM assembly files in these tests must be transformed into
+load files without using the EM optimizer.
+These tests use the LIN and NOP instructions to mark the passing of each
+test.
+The NOP instruction prints the current line number during the
+test phase.
+Each test signals success by executing LIN with a unique
+number followed by a NOP which prints this line number.
+The test finishes normally with 0 as the last number printed.
+In all other cases a bug has
+been found.
+.IP Pascal
+.br
+The directory lang/pc/test contains a few Pascal test programs.
+All these programs print the number of errors found and an
+identification of these errors.
+.IP C
+.br
+The sub-directories in lang/cem/ctest contain C test programs.
+The idea behind these tests is:
+when you have a program called xx.c, compile it into xx.cem.
+Run it with standard output to xx.cem.r, compare this file to
+xx.cem.g, a file containing the 'ideal' output.
+Any differences will point to implementation differences or
+bugs.
+Giving the command "run gen" or plain "run" starts this
+process.
+The differences will be presented on standard output.
+The contents of the result files depend on the wordsize,
+the xx.cem.g files on the distribution are intended for a
+16-bit machine.
+.NH
+Documentation
+.PP
+Manual pages for the Amsterdam Compiler Kit can be copied
+to "/usr/man/man?" by the
+following commands:
+.DS
+cd man
+make install
+.DE
+.LP
+Several documents are provided:
+.DS
+doc/toolkit.doc: a general overview
+doc/pcref.doc: the Pascal-frontend reference manual
+doc/val.doc: the results of running the Pascal Validation Suite
+doc/cref.doc: the C-frontend manual
+doc/em.doc: a description of the EM machine architecture
+doc/peep.doc: internal documentation for the peephole optimizer
+doc/cg.doc: documentation for backend writers and maintainers
+doc/regadd.doc: addendum to previous document describing register variables
+doc/install.doc: this document
+.DE
+.LP
+The Validation Suite is a collection of more than 200 Pascal programs,
+designed by Brian Wichmann and Arthur Sale to test Pascal compilers.
+We are not allowed to distribute it, but you may
+request a copy from
+.DS
+Richard J. Cichelli
+A.N.P.A.
+1350 Sullivan Trail
+P.O. Box 598
+Easton, Pennsylvania 18042
+USA
+.DE
+.LP
+Good luck.
diff --git a/doc/pcref.doc b/doc/pcref.doc
new file mode 100644
index 000000000..6f587d2bf
--- /dev/null
+++ b/doc/pcref.doc
@@ -0,0 +1,1511 @@
+.ds OF \\fBtest~off:~\\fR
+.ds ON \\fBtest~on:~~\\fR
+.ds AL \\fBtest~all:~\\fR
+.ll 72
+.wh 0 hd
+.wh 60 fo
+.de hd
+'sp 5
+..
+.de fo
+'bp
+..
+.tr ~
+.               TITLE
+.de TL
+.sp 15
+.ce
+\\fB\\$1\\fR
+..
+.               AUTHOR
+.de AU
+.sp 15
+.ce
+by
+.sp 2
+.ce
+\\$1
+..
+.               DATE
+.de DA
+.sp 3
+.ce
+( Dated \\$1 )
+..
+.               INSTITUTE
+.de VU
+.sp 3
+.ce 4
+Wiskundig Seminarium
+Vrije Universiteit
+De Boelelaan 1081
+Amsterdam
+..
+.               PARAGRAPH
+.de PP
+.sp
+.ti +5
+..
+.nr CH 0 1
+.               CHAPTER
+.de CH
+.nr SH 0 1
+.bp
+.in 0
+\\fB\\n+(CH.~\\$1\\fR
+.PP
+..
+.               SUBCHAPTER
+.de SH
+.sp 3
+.in 0
+\\fB\\n(CH.\\n+(SH.~\\$1\\fR
+.PP
+..
+.               INDENT START
+.de IS
+.sp
+.in +5
+..
+.               INDENT END
+.de IE
+.in -5
+.sp
+..
+.               DOUBLE INDENT START
+.de DS
+.sp
+.in +5
+.ll -5
+..
+.               DOUBLE INDENT END
+.de DE
+.ll +5
+.in -5
+.sp
+..
+.               EQUATION START
+.de EQ
+.sp
+.nf
+..
+.               EQUATION END
+.de EN
+.fi
+.sp
+..
+.               ITEM
+.de IT
+.sp
+.in 0
+\\fBISO~\\$1:\\fR~\\
+..
+.               IMPLEMENTATION 1
+.de I1
+.IS
+.ti -3
+1.~\\
+..
+.               IMPLEMENTATION 2
+.de I2
+.sp
+.ti -3
+2.~\\
+..
+.de CS
+.br
+~-~\\
+..
+.br
+.fi
+.TL "Amsterdam Compiler Kit-Pascal reference manual"
+.AU "Johan W. Stevenson"
+.DA "January 4, 1983"
+.VU
+.CH "Introduction"
+This document refers to the (March 1980) ISO standard proposal for Pascal [1].
+Ack-Pascal complies with the requirements of this proposal almost completely.
+The standard requires an accompanying document describing the
+implementation-defined and implementation-dependent features,
+the reaction to errors and the extensions to standard Pascal.
+These four items will be treated in the rest of this document,
+each in a separate chapter.
+The other chapters describe the deviations from the standard and
+the list of options recognized by the compiler.
+.PP
+The Ack-Pascal compiler produces code for an EM machine as defined in [2].
+It is up to the implementor of the EM machine to decide whether errors like
+integer overflow, undefined operand and range bound error are recognized or not.
+For these errors the reaction of some known implementations is given.
+.PP
+There does not (yet) exist a hardware EM machine.
+Therefore, EM programs must be interpreted, or translated into
+instructions for a target machine.
+For the following implementations the behavior is documented:
+.I1
+an interpreter running on a PDP-11.
+Normally the interpreter performs some tests to detect undefined
+integers, integer overflow, range errors, etc.
+However, an option of the interpreter is to skip these tests.
+Another option is to perform some extra tests
+to check for instance the number of actual parameter
+words against the number expected by
+the called procedure.
+We call these modes 'test on' (normal), 'test off' (tests skipped) and 'test all' (extra tests).
+.I2
+a translator into PDP-11 instructions.
+.IE
+.CH "Implementation-defined features"
+For each implementation-defined feature mentioned in the ISO standard
+we give the section number, the quotation from that section and the definition.
+First we quote the definition of implementation-defined:
+.DS
+Those parts of the language which may differ between processors, but which
+will be defined for any particular processor.
+.DE
+.IT 6.1.7
+Each string-character shall denote an implementation-defined value of char-type.
+.IS
+All 7-bit ASCII characters except linefeed LF (10) are allowed.
+Note that an apostrophe ' must be doubled within a string.
+.IE
+.IT 6.4.2.2
+The values of type real shall be an implementation-defined subset
+of the real numbers denoted as specified by 6.1.5.
+.IS
+The format of reals is not defined in EM.
+Even the size of reals depends on the implementation.
+The compiler can be instructed, by the f-option, to use a different
+size for real values.
+The size of reals is preset by the calling program \fIack\fP
+[4] to
+the proper size.
+For each implementation of EM the following constants must be defined:
+     epbase: the base for the exponent part
+     epprec: the precision of the fraction
+     epemin: the minimum exponent
+     epemax: the maximum exponent
+.br
+These constants must be chosen so that zero and all numbers with
+exponent e in the range
+.EQ
+     epemin <= e <= epemax
+.EN
+and fraction-parts of the form
+.EQ
+     f = +_ f\d1\u.b\u-1\d + ... + f\depprec\u.b\u-epprec\d
+.EN
+where
+.EQ
+     f\di\u = 0,...,epbase-1 and f\d1\u <> 0
+.EN
+are possible values for reals.
+All other values of type real are considered illegal.
+(See [3] for more information about these constants).
+.br
+For the known EM implementations these constants are:
+.I1
+epbase = 2
+.br
+epprec = 24
+.br
+epemin = -127
+.br
+epemax = +127
+.I2
+ditto
+.IE
+.IT 6.4.2.2
+The type char shall be the enumeration of a set of implementation-defined
+characters, some possibly without graphic representations.
+.IS
+The 7-bit ASCII character set is used, where LF (10) denotes the
+end-of-line marker on text-files.
+.IT 6.4.2.2
+The ordinal numbers of the character values shall be values of integer-type,
+that are implementation-defined, and that are determined by mapping
+the character values on to consecutive non-negative integer values
+starting at zero.
+.IS
+The normal ASCII ordering is used: ord('0')=48, ord('A')=65, ord('a')=97, etc.
+.IE
+.IT 6.4.3.4
+The largest and smallest values of integer-type
+permitted as members of a value
+of a set-type shall be implementation-defined.
+.IS
+The smallest value is 0. The largest value is 15 by default, but can be
+changed by using the i-option of the compiler up to a maximum
+of 32767.
+The compiler allocates as many bits for set-type variables as are necessary
+to store all possible values of the host-type of the base-type of the set,
+rounded up to the nearest multiple of 16.
+If 8 bits are sufficient then only
+8 bits are used if part of a packed structure.
+Thus, the variable s, declared by
+.EQ
+     var s: set of '0'..'9';
+.EN
+will contain 128 bits, not 10 or 16.
+These 128 bits are stored in 16 bytes, both for packed and unpacked sets.
+If the host-type of the base-type is integer, then the
+number of bits depends on the i-option.
+The programmer may specify how many bits to allocate for these sets.
+The default is 16, the maximum is 32767.
+The effective number of bits is rounded up to the next multiple of 16, or up
+to 8 if the number of bits is less than or equal to 8.
+Note that the use of set-constructors for sets with more than 256 elements
+is far less efficient than for smaller sets.
+.IT 6.7.2.2
+The predefined constant maxint shall be of integer-type and shall denote
+an implementation-defined value, that satisfies the following conditions:
+.sp 1
+.in +5
+.ti -4
+(a)~All integral values in the closed interval from -maxint to +maxint
+shall be values in the integer-type.
+.ti -4
+(b)~Any monadic operation performed on an integer value in this interval
+shall be correctly performed according to the mathematical rules for
+integer arithmetic.
+.ti -4
+(c)~Any dyadic integer operation on two integer values in this same interval
+shall be correctly performed according to the mathematical rules for
+integer arithmetic, provided that the result is also in this interval.
+.ti -4
+(d)~Any relational operation on two integer values in this same interval
+shall be correctly performed according to the mathematical rules for
+integer arithmetic.
+.in -5
+.IS
+The representation of integers in EM is an \fIn\fP*8-bit word using
+two's complement arithmetic,
+where \fIn\fP is called the wordsize.
+The compiler can only generate code for EM with wordsize 2.
+Thus always:
+.EQ
+     maxint = 32767
+.EN
+Because the number -32768 may be used to indicate 'undefined', the
+range of available integers depends on the EM implementation:
+.I1
+\*(ON-32767..+32767.
+.br
+\*(OF-32768..+32767.
+.I2
+-32768..+32767.
+.IE
+.IT 6.9.4.2
+The default TotalWidth values for integer, Boolean and real types
+shall be implementation-defined.
+.IS
+The defaults are:
+     integer    6
+     Boolean    5
+     real      13
+.IT 6.9.4.5.1
+ExpDigits, the number of digits written in an exponent part of a real,
+shall be implementation-defined.
+.IS
+ExpDigits is defined as
+.EQ
+     ceil(log10(log10(2 ** epemax)))
+.EN
+For the current implementations this evaluates to 2.
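+For epemax = 127 the steps of this computation are approximately:
+.EQ
+     log10(2 ** 127)  =  127 * log10(2)  =  38.2
+     log10(38.2)      =  1.58
+     ceil(1.58)       =  2
+.EN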
+.IT 6.9.4.5.1
+The character written as part of the representation of
+a real to indicate the beginning of the exponent part shall be
+implementation-defined, either 'E' or 'e'.
+.IS
+The exponent part starts with 'e'.
+.IT 6.9.4.6
+The case of the characters written as representation of the
+Boolean values shall be implementation-defined.
+.IS
+The representations of true and false are 'true' and 'false'.
+.IT 6.9.6
+The effect caused by the standard procedure page
+on a text file shall be implementation-defined.
+.IS
+The ASCII character form feed FF (12) is written.
+.IT 6.10
+The binding of the variables denoted by the program-parameters
+to entities external to the program shall be implementation-defined if
+the variable is of a file-type.
+.IS
+The program parameters must be files and all, except input and output,
+must be declared as such in the program block.
+.PP
+The program parameters input and output, if specified, will correspond
+with the UNIX streams 'standard input' and 'standard output'.
+.PP
+The other program parameters will be mapped to the argument strings
+provided by the caller of this program.
+The argument strings are supposed to be path names of the files to be
+opened or created.
+The order of the program parameters determines the mapping:
+the first parameter is mapped onto the first argument string etc.
+Note that input and output are ignored in this mapping.
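+As an illustration, consider a hypothetical program with the heading
+.EQ
+     program copy(input, output, src, dst);
+.EN
+When it is called with the argument strings "a" and "b",
+src is mapped onto the file a and dst onto the file b;
+input and output do not take part in the mapping.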
+.PP
+The mapping is recalculated each time a program parameter
+is opened for reading or writing by a call to the standard procedures
+reset or rewrite.
+This gives the programmer the opportunity to manipulate the list
+of string arguments using the external procedures argc, argv and argshift
+available in libpc [7].
+.IT 6.10
+The effect of an explicit use of reset or rewrite
+on the standard textfiles input or output shall be implementation-defined.
+.IS
+The procedures reset and rewrite are no-ops
+if applied to input or output.
+.CH "Implementation-dependent features"
+For each implementation-dependent feature mentioned in the ISO standard draft,
+we give the section number, the quotation from that section and the way
+this feature is treated by the Ack-Pascal system.
+First we quote the definition of 'implementation-dependent':
+.DS
+Those parts of the language which may differ between processors,
+and for which there need not be a definition for a particular processor.
+.DE
+.IT 5.1.1
+The method for reporting errors or warnings shall be implementation-dependent.
+.IS
+The error handling is treated in a following chapter.
+.IE
+.IT 6.1.4
+Other implementation-dependent directives may be defined.
+.IS
+Except for the required directive 'forward' the Ack-Pascal compiler recognizes
+only one directive: 'extern'.
+This directive tells the compiler that the procedure block of this
+procedure will not be present in the current program.
+The code for the body of this procedure must be included at a later
+stage of the compilation process.
+.PP
+This feature allows one to build libraries containing often used routines.
+These routines do not have to be included in all the programs using them.
+Maintenance is much simpler if there is only one library module to be
+changed instead of many Pascal programs.
+.PP
+Another advantage is that these library modules may be written in a different
+language, for instance C or the EM assembly language.
+This is useful if you want to use some specific EM instructions not generated
+by the Pascal compiler. Examples are the system call routines and some
+floating point conversion routines.
+Another motive could be the optimization of some time-critical program parts.
+.PP
+The use of external routines, however, is dangerous.
+The compiler normally checks for the correct number and type of parameters
+when a procedure is called and for the result type of functions.
+If an external routine is called these checks are not sufficient,
+because the compiler cannot check whether the procedure heading of the
+external routine as given in the Pascal program matches the actual routine
+implementation.
+It should be the loader's task to check this.
+However, the current loaders are not that smart.
+Another solution is to check at run time, at least the number of words
+for parameters. Some EM implementations check this:
+.I1
+\*(ALthe number of words passed as parameters is checked, but this will not catch all faulty cases.
+.br
+\*(ONnot checked.
+.I2
+not checked.
+.IT 6.7.2.1
+The order of evaluation of the operands of a dyadic operator
+shall be implementation-dependent.
+.IS
+Both operands are always evaluated, so the program part
+.EQ
+     if (p<>nil) and (p^.value<>0) then
+.EN
+is probably incorrect.
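+A safe way to write such a test, shown here only as a sketch, is to nest
+the conditions, so that p^ is dereferenced only when p is not nil:
+.EQ
+     if p<>nil then
+          if p^.value<>0 then
+.EN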
+.PP
+The left-hand operand of a dyadic operator is almost always evaluated
+before the right-hand side.
+Some peculiar evaluations exist for the following cases:
+.IS
+.ti -3
+1.~\
+the modulo operation is performed by a library routine to
+check for negative values of the right operand.
+.IE
+.sp
+.ti -3
+2.~\
+the expression
+.EQ
+     set1 <= set2
+.EN
+where set1 and set2 are compatible set types is evaluated in the
+following steps:
+.IS
+.CS
+evaluate set2
+.CS
+evaluate set1
+.CS
+compute set2+set1
+.CS
+test set2 and set2+set1 for equality
+.IE
+This is the only case where the right-hand side is computed first.
+.sp
+.ti -3
+3.~\
+the expression
+.EQ
+     set1 >= set2
+.EN
+where set1 and set2 are compatible set types is evaluated in the following steps:
+.IS
+.CS
+evaluate set1
+.CS
+evaluate set2
+.CS
+compute set1+set2
+.CS
+test set1 and set1+set2 for equality
+.IE
+.IT 6.7.3
+The order of evaluation, accessing and binding
+of the actual-parameters for functions
+shall be implementation-dependent.
+.IS
+The order of evaluation is from right to left.
+.IT 6.8.2.2
+If access to the variable in an assignment-statement involves the indexing of an array
+and/or a reference to a field within a variant of a record
+and/or the de-referencing of a pointer-variable
+and/or a reference to a buffer-variable,
+the decision whether these actions precede or follow the evaluation
+of the expression shall be implementation-dependent.
+.IS
+The expression is evaluated first.
+.IT 6.8.2.3
+The order of evaluation and binding of the actual-parameters for procedures
+shall be implementation-dependent.
+.IS
+The same as for functions.
+.IT 6.9.6
+The effect of inspecting a text file to which the page
+procedure was applied during generation is
+implementation-dependent.
+.IS
+The formfeed character written by page is
+treated like a normal character, with ordinal value 12.
+.IT 6.10
+The binding of the variables denoted by the program-parameters
+to entities external to the program shall be implementation-dependent unless
+the variable is of a file-type.
+.IS
+Only variables of a file-type are allowed as program parameters.
+.IE
+.CH "Error handling"
+There are three classes of errors to be distinguished.
+In the first class are the error messages generated by the compiler.
+The second class consists of the occasional errors generated by the other
+programs involved in the compilation process.
+Errors of the third class are the errors as defined in the standard by:
+.DS
+An error is a violation by a program of the requirements of this standard
+such that detection normally requires execution of the program.
+.DE
+.SH "Compiler errors"
+The error messages (and the listing) are not generated by the compiler itself.
+The compiler only detects errors and writes the errors in condensed form on
+an intermediate file.
+Each error in condensed form contains:
+.IS
+.CS
+an optional error message parameter (identifier or number).
+.CS
+an error number
+.CS
+a line number
+.CS
+a column number.
+.IE
+Every time the compiler detects an error that does not have influence
+on the code produced by the compiler or on the syntax decisions, a warning
+message is given.
+If only warnings are generated, compilation proceeds and probably results
+in a correctly compiled program.
+.PP
+The intermediate error file is read by the interface program
+\fIack\fP [4],
+that produces the error messages.
+It uses another file, the error message file,
+to find an error script line.
+Whenever this error script line contains the character '%', the error message
+parameter is substituted.
+For negative error numbers the message constructed is prepended with 'Warning: '.
+.PP
+Sometimes the compiler produces several errors for the same file position
+(line number, column number).
+Only the first of these messages is given, because the others are probably
+directly caused by the first one.
+If the first one is a warning while one of its successors for that position
+is a fatal message, then the warning is promoted to a fatal one.
+However, parameterized messages are always given.
+.PP
+The error messages and listing come in three flavors, selected by flags
+given to \fIack\fP [4]:
+.in +10
+.sp
+.ti -8
+default:no listing, one line per error giving the file name
+of the Pascal source file, the line number and the error message.
+.sp
+.ti -8
+-e:~~~~~for each erroneous line a listing of the line and its predecessor.
+The next line contains one or more characters '^' pointing to the
+places where an error is detected.
+For each error on that line a message follows.
+.sp
+.ti -8
+-E:~~~~~same as for '-e', except that all source lines are listed,
+even if the program is perfect.
+.IE
+.IE
+.SH "Other errors detected at compilation time"
+There are two main categories: file system problems and table overflow.
+Problems with the file system may be caused by protection (you may not read
+or create files) or by space problems (no space left on device; out of inodes;
+too many processes).
+Table overflow problems are often caused by peculiar source programs:
+very long procedures or functions, a lot of strings.
+Table overflow problems can sometimes be cured
+by giving a flag (-sl when producing e.out files) to \fIack\fP [4].
+.PP
+Extensive treatment of these errors is outside the scope of this manual.
+.SH "Runtime errors"
+Errors detected at run time cause an error message to be generated on the
+diagnostic output stream (UNIX file descriptor 2).
+The message consists of the name of the program followed by a message
+describing the error, possibly followed by the source line number.
+Unless the l-option is turned off, the compiler generates code to keep track
+of which source line causes which EM instructions to be generated.
+It depends on the EM implementation whether these LIN instructions
+are skipped or executed:
+.I1
+LIN instructions are always executed. The old line number is saved and
+restored whenever a procedure or function is called.
+All error messages contain this line number, except when the l-option
+was turned off.
+.I2
+same as above, but line numbers are not saved when procedures and functions
+are called.
+.IE
+For each error mentioned in the standard we give the section number,
+the quotation from that section and the way it is processed by the
+Pascal-compiler or runtime system.
+.PP
+For detected errors the corresponding message
+and trap number are given.
+Trap numbers are useful for exception-handling routines.
+Normally, each error causes the program to terminate.
+By using exception-handling routines one can
+ignore errors or perform alternate actions.
+Only some of the errors can be ignored
+by restarting the failing instruction.
+These errors are marked as non-fatal,
+all others as fatal.
+A list of errors with trap number between 0 and 63
+(EM errors) can be found in [2].
+Errors with trap number between 64 and 127 (Pascal errors) are listed in [8].
+.IT 6.4.3.3
+It shall be an error if any field-identifier defined within a variant
+is used in a field-designator unless the value of the tag-field
+is associated with that variant.
+.IS
+This error is not detected.
+Sometimes this feature is used to achieve easy type conversion.
+However, using record variants this way is dangerous, error-prone and not portable.
+.IT 6.4.6
+It shall be an error if a value of type T2 must be
+assignment-compatible with type T1, while
+T1 and T2 are compatible ordinal-types and the value of
+type T2 is not in the closed interval specified by T1.
+.IS
+The compiler distinguishes between array-index expressions and the other
+places where assignment-compatibility is required.
+.PP
+Array subscripting is done using the EM array instructions.
+These instructions have three arguments: the array base address,
+the index and the address of the array descriptor.
+An array descriptor describes one dimension by three values:
+the element size, the lower bound on the index and the number of elements
+minus one.
+It depends on the EM implementation whether these bounds are checked:
+.I1
+\*(ONchecked (array bound error, trap 0, non-fatal).
+.br
+\*(OFnot checked
+.I2
+not checked.
+.IE
+The other places where assignment-compatibility is required are:
+.IS
+.CS
+assignment
+.CS
+value parameters
+.CS
+procedures read and readln
+.CS
+the final value of the for-statement
+.IE
+For these places the compiler generates an EM range check instruction, except
+when the r-option is turned off, or when the range of values of T2
+is enclosed in the range of T1.
+If the expression consists of a single variable and if that variable
+is of a subrange type,
+then the subrange type itself is taken as T2, not its host-type.
+Therefore, a range instruction is only generated if T1 is a subrange type
+and if the expression is a constant, an expression with two or more
+operands, or a single variable with a type not enclosed in T1.
+If a constant is assigned, then the EM optimizer removes the range check
+instruction, except when the value is out of bounds.
+.PP
+It depends on the EM implementation whether the range check instruction
+is executed or skipped:
+.I1
+\*(ONchecked (range bound error, trap 1, non-fatal).
+.br
+\*(OFskipped
+.I2
+skipped
+.IE
+.IT 6.4.6
+It shall be an error if a value of type T2 must be
+assignment-compatible with type T1, while T1 and T2 are compatible
+set-types and any member of the value of type T2
+is not in the closed interval specified by the base-type
+of the type T1.
+.IS
+This error is not detected.
+.IT 6.5.4
+It shall be an error if
+the pointer-variable has a nil-value or is undefined at the time
+it is de-referenced.
+.IS
+The EM definition does not specify the binary representation of pointer
+values, so that it is not possible to choose an otherwise illegal
+binary representation for the pointer value NIL.
+Rather arbitrarily, the compiler uses the integer value zero to represent NIL.
+For all current implementations this does not cause problems.
+.PP
+The size of pointers depends on the implementation and is
+preset in the compiler by \fIack\fP [4].
+The compiler can be instructed, by the p-option, to use
+any size for pointer objects.
+NIL is represented here by the appropriate number of zero words.
+.PP
+It depends on the EM implementation whether de-referencing of a pointer
+with value NIL causes an error:
+.I1
+\*(ONfor every de-reference the pointer value is checked to be legal.
+The value NIL is always illegal.
+Objects addressed by a NIL pointer always cause an error, except
+when they are part of some extraordinary sized structure
+(bad pointer, trap 22, fatal).
+.br
+\*(OFde-referencing for fetching will not cause
+an error to occur.
+However, if the pointer value is used for a store operation,
+a segmentation violation probably results (memory fault, trap 21, fatal).
+(Note: this is only true if the interpreter is executed with coinciding
+address spaces and protected text part. The interpreter must therefore
+be loaded with the '-n' option of the UNIX loader [5]).
+.I2
+de-referencing for a fetch operation will not cause an error.
+A store operation probably causes an error if the '-n' flag is
+specified to \fIack\fP [4] or ld [5] while loading your program.
+.IE
+Some implementations of EM initialize all memory cells for newly
+created variables with a constant that probably causes an error if that variable
+is not initialized with a value of its own type before use.
+For each implementation we give whether memory cells are initialized,
+with what value, and whether this value causes an error if de-referenced.
+.I1
+each memory word is initialized with the bit representation 1000000000000000,
+representing -32768 in 2's complement notation.
+For most small and medium sized programs this value will cause a segmentation
+violation (memory fault, trap 21, fatal).
+.I2
+no initialization.
+Whenever a pointer is de-referenced, without being properly initialized,
+a segmentation violation (memory fault, trap 21, fatal)
+or a 'bus error' is possible.
+.IE
+.IT 6.5.5
+It shall be an error if the value of a file-variable f is altered
+while the buffer-variable is an actual variable parameter, or
+an element of the record-variable-list of a with-statement, or both.
+.IS
+This error is not detected.
+.IT 6.5.5
+It shall be an error if the value of a file-variable f is altered
+by an assignment-statement which contains the buffer-variable f^ in
+its left-hand side.
+.IS
+This error is not detected.
+.IT 6.6.5.2
+It shall be an error if
+the stated pre-assertion does not hold immediately
+prior to any use of the file handling procedures
+rewrite, put, reset and get.
+.IS
+For each of these four operations the pre-assertions
+can be reformulated as:
+.sp
+rewrite(f):~no pre-assertion.
+.br
+put(f):~~~~~f is opened for writing and f^ is not undefined.
+.br
+reset(f):~~~f exists.
+.br
+get(f):~~~~~f is opened for reading and eof(f) is false.
+.sp
+The following errors are detected for these operations:
+.sp
+rewrite(f):
+.in +10
+.ti -5
+more args expected, trap 64, fatal:
+.br
+f is a program-parameter and the corresponding
+file name is not supplied by the caller of the program.
+.ti -5
+rewrite error, trap 101, fatal:
+.br
+the caller of the program lacks the necessary
+access rights to create the file in the file system
+or operating system problems like table overflow
+prevent creation of the file.
+.in -10
+.sp
+put(f):
+.in +10
+.ti -5
+file not yet open, trap 72, fatal:
+.br
+neither reset nor rewrite has been applied to the file.
+The checks performed by the run time system are not foolproof.
+.ti -5
+not writable, trap 96, fatal:
+.br
+f is opened for reading.
+.ti -5
+write error, trap 104, fatal:
+.br
+probably caused by file system problems.
+For instance, the file storage is exhausted.
+Because IO is buffered to improve performance,
+this error might not occur until the
+file is closed.
+Files are closed whenever they are rewritten or reset, or on
+program termination.
+.in -10
+.sp
+reset(f):
+.in +10
+.ti -5
+more args expected, trap 64, fatal:
+.br
+same as for rewrite(f).
+.ti -5
+reset error, trap 100, fatal:
+.br
+f does not exist, or the caller has insufficient access rights, or
+operating system tables are exhausted.
+.in -10
+.sp
+get(f):
+.in +10
+.ti -5
+file not yet open, trap 72, fatal:
+.br
+as for put(f).
+.ti -5
+not readable, trap 97, fatal:
+.br
+f is opened for writing.
+.ti -5
+end of file, trap 98, fatal:
+.br
+eof(f) is true just before the call to get(f).
+.ti -5
+read error, trap 103, fatal:
+.br
+unlikely to happen. Probably caused by hardware problems
+or by errors elsewhere in your program that destroyed
+the file information maintained by the run time system.
+.ti -5
+truncated, trap 99, fatal:
+.br
+the file does not consist of an integral
+number of file elements.
+For instance, the size of a file of integer is odd.
+.ti -5
+non-ASCII char read, trap 106, non-fatal:
+.br
+the character value of the next character-type
+file element is out of range (0..127).
+Only for text files.
+.in -10
+.IT 6.6.5.3
+It shall be an error to change any variant-part of a variable
+allocated by the form new(p,c1,...,cn) from the variant specified.
+.IS
+This error is not detected.
+.IT 6.6.5.3
+It shall be an error if a variable to be disposed had been allocated
+using the form new(p,c1,...,cn) with more variants specified than
+specified to dispose.
+.IS
+This error can cause more memory to be freed than was allocated.
+Dispose causes a fatal trap 73 when memory already on the free
+list is freed again.
+.IT 6.6.5.3
+It shall be an error if the variants of a variable to be disposed
+are different from those specified by the case-constants to dispose.
+.IS
+This error is not detected.
+.IT 6.6.5.3
+It shall be an error if the value of the pointer parameter of dispose has
+nil-value or is undefined.
+.IS
+The same comments apply as for de-referencing NIL or undefined pointers.
+.IT 6.6.5.3
+It shall be an error if a variable that is identified by the pointer parameter
+of dispose (or a component thereof) is currently either an actual
+variable parameter, or an element of the record-variable-list of a
+with-statement, or both.
+.IS
+This error is not detected.
+.IT 6.6.5.3
+It shall be an error if a referenced-variable created using the second form
+of new is used in its entirety
+as an operand in an expression, or as the variable in an assignment-statement
+or as an actual-parameter.
+.IS
+This error is not detected.
+.IT 6.6.6.2
+It shall be an error if the mathematically defined result of an
+arithmetic function would fall outside the set of values
+of the indicated result.
+.IS
+Except for the errors for undefined arguments,
+the following errors may occur for the arithmetic functions:
+.in +16
+.ti -11
+abs(x):~~~~none.
+.ti -11
+sqr(x):~~~~real underflow, trap 5, non-fatal;
+.br
+real overflow, trap 4, non-fatal
+.ti -11
+sin(x):~~~~real underflow, trap 5, non-fatal
+.ti -11
+cos(x):~~~~real underflow, trap 5, non-fatal
+.ti -11
+exp(x):~~~~error in exp, trap 65, non-fatal (if x>10000);
+.br
+real underflow, trap 5, non-fatal;
+.br
+real overflow, trap 4, non-fatal
+.ti -11
+ln(x):~~~~~error in ln, trap 66, non-fatal (if x<=0)
+.ti -11
+sqrt(x):~~~error in sqrt, trap 67, non-fatal (if x<0)
+.ti -11
+arctan(x):~real underflow, trap 5, non-fatal;
+.br
+real overflow, trap 4, non-fatal
+.in -16
+.IE
+.IT 6.6.6.2
+It shall be an error if x in ln(x) is not greater than zero.
+.IS
+See above.
+.IT 6.6.6.2
+It shall be an error if x in sqrt(x) is negative.
+.IS
+See above.
+.IT 6.6.6.2
+It shall be an error if
+the integer value of trunc(x) does not exist.
+.IS
+This error is detected (conversion error, trap 10, non-fatal).
+.IT 6.6.6.2
+It shall be an error if
+the integer value of round(x) does not exist.
+.IS
+This error is detected (conversion error, trap 10, non-fatal).
+.IT 6.6.6.2
+It shall be an error if
+the integer value of ord(x) does not exist.
+.IS
+This error can not occur, because the compiler will not allow
+such ordinal types.
+.IT 6.6.6.2
+It shall be an error if
+the character value of chr(x) does not exist.
+.IS
+Except when the r-option is turned off, the compiler generates an EM
+range check instruction. The effect of this instruction depends on the
+EM implementation as described before.
+.IT 6.6.6.2
+It shall be an error if the value of succ(x) does not exist.
+.IS
+Same comments as for chr(x).
+.IT 6.6.6.2
+It shall be an error if the value of pred(x) does not exist.
+.IS
+Same comments as for chr(x).
+.IT 6.6.6.5
+It shall be an error if
+f in eof(f) is undefined.
+.IS
+This error is detected (file not yet open, trap 72, fatal).
+.IT 6.6.6.5
+It shall be an error if
+f in eoln(f) is undefined, or if eof(f) is true at that time.
+.IS
+The following errors may occur:
+.IS
+file not yet open, trap 72, fatal;
+.br
+not readable, trap 97, fatal;
+.br
+end of file, trap 98, fatal.
+.IE
+.IT 6.7.1
+It shall be an error if any variable or function used as an operand in an expression is
+undefined at the time of its use.
+.IS
+Detection of undefined operands is only possible if there is at least one bit
+representation that is not allowed as a legal value.
+The set of legal values depends on the type of the operand.
+To detect undefined operands, all newly created variables must be assigned
+a value illegal for the type of the created variable.
+The compiler itself does not generate code to initialize newly created variables.
+Instead, the compiler generates code to allocate some new memory cells.
+It is up to the EM implementation to initialize these memory cells.
+However, the EM machine does not know the types of the variables for which
+memory cells are allocated.
+Therefore, the best an EM implementation can do is to initialize with a value
+that is illegal for the most common types of operands.
+.PP
+For all current EM implementations we will describe whether memory cells
+are initialized, which value is used to initialize, for each operand type
+whether that value is illegal, and for all operations on all operand
+types whether that value is detected as undefined.
+.I1
+\*(ONnew memory words are initialized with -32768.
+Assignment of this value is always allowed. Errors may occur
+whenever undefined operands are used in operations.
+.br
+.ul
+integer:
+-32768 is illegal. All arithmetic operations (except unary +) cause
+an error (undefined integer, trap 8, non-fatal).
+Relational operations do not, except for IN when the left operand is undefined.
+Printing of -32768 using write is allowed.
+.br
+.ul
+real:
+the bit representation of a real, caused by initializing the constituent
+memory words with -32768, is illegal.
+All arithmetic and relational operations (except unary +) cause an error
+(real undefined, trap 9, non-fatal).
+Printing causes the same error.
+.br
+.ul
+char:
+the value -32768 is illegal. For objects of type 'packed array[] of char'
+half the characters will have the value chr(0), which is legal, and the
+others will have the value chr(128), outside the valid ASCII range.
+The relational operators, however, do not cause an error.
+.br
+.ul
+Boolean:
+the value -32768 is illegal. For objects of type 'packed array[] of boolean'
+half the booleans will have the value false, while the others have the value v,
+where ord(v) = 128, naturally illegal.
+However, the Boolean and relational operations do not cause an error.
+.br
+.ul
+set:
+undefined operands of type set can not be distinguished from
+properly initialized ones.
+The set and relational operations, therefore, can never cause an error.
+However, if one forgets to initialize a set of character, then spurious
+characters like '/', '?', 'O', '_' and 'o' appear.
+.sp
+\*(OFnew memory cells are initialized with -32768.
+The only cases where this value causes an error are when
+an undefined operand of type real is used in an arithmetic or relational
+operation (except unary +) or when an undefined real is used as an
+argument to a standard function.
+.I2
+Newly created memory cells are not initialized and therefore
+they have a random value.
+.IT 6.7.1
+It shall be an error if
+the value of any member denoted by any member-designator of the
+set-constructor is outside the implementation-defined limits.
+.IS
+This error is detected (set bound error, trap 2, non-fatal).
+.IT 6.7.1
+It shall be an error if
+the possible types of a set-constructor do not permit it
+to assume a suitable type.
+.IS
+The compiler allocates as many bits as are necessary to store all
+elements of the host-type of the base-type of the set, not the
+base-type itself.
+Therefore, all possible errors can be detected at compile time.
+.IT 6.7.2.2
+It shall be an error if j is zero in 'i div j'.
+.IS
+It depends on the EM implementation whether this error is detected:
+.I1
+\*(ONdetected (divide by 0, trap 6, non-fatal).
+.br
+\*(OFnot detected.
+.I2
+not detected.
+.IE
+.IT 6.7.2.2
+It shall be an error if
+j is zero or negative in i MOD j.
+.IS
+This error is detected (only positive j in 'i mod j', trap 71, non-fatal).
+.IT 6.7.2.2
+It shall be an error if the result of any operation on integer
+operands is not performed according to the mathematical
+rules for integer arithmetic.
+.IS
+The reaction depends on the EM implementation:
+.I1
+\*(ONerror detected if
+.EQ
+     (result >= 32768) or (result < -32768).
+.EN
+(integer overflow, trap 3, non-fatal).
+Note that if the result is -32768 the use of this value in further operations
+may cause an error.
+.br
+\*(OFnot detected.
+.I2
+not detected.
+.IT 6.8.3.5
+It shall be an error if none of the case-constants is equal to the value of the
+case-index upon entry to the case-statement.
+.IS
+This error is detected (case error, trap 20, fatal).
+.IT 6.8.3.9
+It shall be an error if the final-value of a for-statement is not
+assignment-compatible with the control-variable when the
+initial-value is assigned to the control-variable.
+.IS
+It is detected if the control variable leaves
+its allowed range of values while stepping
+from initial to final value.
+This is equivalent to the requirement of the standard if the
+for-statement is not terminated before
+the final value is reached.
+.IT 6.9.2
+It shall be an error if the sequence of characters read looking for an integer does not
+form a signed-integer as specified in 6.1.5.
+.IS
+This error is detected (digit expected, trap 105, non-fatal).
+.IT 6.9.2
+It shall be an error if the sequence of characters read looking for a real does not
+form a signed-number as specified in 6.1.5.
+.IS
+This error is detected (digit expected, trap 105, non-fatal).
+.IT 6.9.2
+It shall be an error if read is applied to f while f is undefined or
+not opened for reading.
+.IS
+This error is detected (see get(f)).
+.IT 6.9.4
+It shall be an error if write is applied to f while f is undefined or
+not opened for writing.
+.IS
+This error is detected (see put(f)).
+.IT 6.9.4
+It shall be an error if TotalWidth or FracDigits as specified in
+write or writeln procedure calls are less than one.
+.IS
+This error is not detected. Moreover, it is considered an extension to
+allow zero or negative values.
+.IT 6.9.6
+It shall be an error if page is applied to f while f is undefined or
+not opened for writing.
+.IS
+This error is detected (see put(f)).
+.CH "Extensions to the standard"
+.IS
+.ti -3
+1.~\
+Separate compilation.
+.sp
+The compiler is able to (separately) compile a collection of declarations,
+procedures and functions to form a library.
+The library may be linked with the main program, compiled later.
+The syntax of these modules is
+.EQ
+     module = [constant-definition-part]
+              [type-definition-part]
+              [var-declaration-part]
+              [procedure-and-function-declaration-part]
+.EN
+The compiler accepts a program or a module:
+.EQ
+     unit = program | module
+.EN
+All variables declared outside a module must be imported
+by parameters, even the files input and output.
+Access to a variable declared in a module is only possible
+using the procedures and functions declared in that same module.
+By giving the correct procedure/function heading followed by the
+directive 'extern' you may use procedures and functions declared in
+other units.
+.sp
+.ti -3
+2.~\
+Assertions.
+.sp
+The Ack-Pascal compiler recognizes an additional statement, the assertion.
+Assertions can be used as an aid in debugging and documentation.
+The syntax is:
+.EQ
+     assertion = 'assert' Boolean-expression
+.EN
+An assertion is a simple-statement, so
+.EQ
+     simple-statement = [assignment-statement |
+                         procedure-statement |
+                         goto-statement |
+                         assertion
+                        ]
+.EN
+An assertion causes an error if the Boolean-expression is false.
+That is its only purpose.
+It does not change any of the variables; at least, it should not.
+Therefore, do not use functions with side-effects in the Boolean-expression.
+If the a-option is turned off, then assertions are skipped by the
+compiler. 'assert' is not a word-symbol (keyword) and may be used as an identifier.
+However, assignment to a variable and calling of a procedure with that name will be impossible.
+.sp
+.ti -3
+3.~\
+Additional procedures.
+.sp
+Three additional standard procedures are available:
+.IS
+.IS
+.ti -8
+halt:~~~a call of this procedure is equivalent to jumping to the
+end of your program. It is always the last statement executed.
+The exit status of the program may be supplied
+as optional argument.
+.ti -8
+release:
+.ti -8
+mark:~~~for most applications it is sufficient to use the heap as a second stack.
+Mark and release are suited for this type of use, more suited than dispose.
+mark(p), with p of type pointer, stores the current value of the
+heap pointer in p. release(p), with p initialized by a call
+of mark(p), restores the heap pointer to its old value.
+All the heap objects, created by calls of new between the call of
+mark and the call of release, are removed and the space they used
+can be reallocated.
+Never use mark and release together with dispose!
+.sp
+.in -10
+.ti -3
+4.~\
+UNIX interfacing.
+.sp
+If the c-option is turned on, then some special features are available
+to simplify an interface with the UNIX environment.
+First of all, the compiler allows you to use a different type
+of string constants.
+These string constants are delimited by double quotes ('"').
+To put a double quote into these strings, you must repeat the double quote,
+like the single quote in normal string constants.
+These special string constants are terminated by a zero byte (chr(0)).
+The type of these constants is a pointer to a packed array of characters,
+with lower bound 1 and unknown upper bound.
+.br
+Secondly, the compiler predefines a new type identifier 'string' denoting
+this just described string type.
+.PP
+The only thing you can do with these features is declaration of
+constants and variables of type 'string'.
+String objects may not be allocated on the heap and string pointers
+may not be de-referenced.
+Still these strings are very useful in combination with external routines.
+The procedure write is extended to print these zero-terminated strings correctly.
+.sp
+.ti -3
+5.~\
+Double length (32 bit) integers.
+.sp
+If the d-option is turned on, then the additional type 'long' is known to the compiler.
+Long variables have integer values in the range -2147483647..+2147483647.
+Long constants may be declared.
+It is not allowed to form subranges of type long.
+All operations allowed on integers are also
+allowed on longs and are indicated by the same
+operators: '+', '-', '*', '/', 'div', 'mod'.
+The procedures read and write have been extended to handle long arguments correctly.
+The default width for longs is 11.
+The standard procedures 'abs' and 'sqr' have been extended to work on long arguments.
+Conversions from integer to long, long to real,
+real to long and long to integer are automatic, like the conversion from integer to real.
+These conversions may cause a
+.IS
+conversion error, trap 10, non-fatal
+.IE
+This last error is only detected in implementation 1, with 'test on'.
+Note that all current implementations use target
+machine floating point instructions
+to perform some of the long operations.
+.sp
+.ti -3
+6.~\
+Underscore as letter.
+.sp
+The character '_' may be used in forming identifiers, if the u-option is turned on.
+.sp
+.ti -3
+7.~\
+Zero field width in write.
+.sp
+Zero or negative TotalWidth arguments to write
+are allowed.
+No characters are written for character, string or Boolean type arguments then.
+A zero or negative FracDigits argument for fixed-point representation of reals causes the
+fraction and the character '.' to be suppressed.
+.sp
+.ti -3
+8.~\
+Alternate symbol representation.
+.sp
+The comment delimiters '(*' and '*)' are recognized and treated like '{' and '}'.
+The other alternate representations of symbols are not recognized.
+.CH "Deviations from the standard"
+Ack-Pascal deviates from the (March 1980) standard proposal in the following ways:
+.IS
+.ti -3
+1.~\
+Only the first 8 characters of identifiers are significant,
+as requested by all standard proposals prior to March 1980.
+In that proposal, however, the sentence
+.DS
+"A conforming program should not have its meaning altered
+by the truncation of its identifiers to eight characters
+or the truncation of its labels to four digits."
+.DE
+is missing.
+.sp
+.ti -3
+2.~\
+The character sequences 'procedur', 'procedur8', 'functionXyZ' etc. are
+all erroneously classified as the word-symbols 'procedure' and 'function'.
+.sp
+.ti -3
+3.~\
+Standard procedures and functions are not allowed as parameters in Ack-Pascal,
+conforming to all previous standard proposals.
+You can obtain the same result with negligible loss of performance
+by declaring some user routines like:
+.EQ
+     function sine(x:real):real;
+     begin
+         sine:=sin(x)
+     end;
+.EN
+.sp
+.ti -3
+4.~\
+The scope of identifiers and labels should start at the beginning of the block
+in which these identifiers or labels are declared.
+The Ack-Pascal compiler, like most other one-pass compilers, deviates in this respect,
+because the scope of variables and labels starts
+at their defining-point.
+.CH "Compiler options"
+Some options of the compiler may be controlled by using "{$....}".
+Each option consists of a lower case letter followed by +, - or an unsigned
+number.
+Options are separated by commas.
+The following options exist:
+.in 8
+.sp
+.ti -8
+a~+/-~~~\
+this option switches assertions on and off.
+If this option is on, then code is included to test these assertions
+at run time. Default +.
+.sp
+.ti -8
+c~+/-~~~\
+this option, if on, allows you to use C-type string constants
+surrounded by double quotes.
+Moreover, a new type identifier 'string' is predefined.
+Default -.
+.sp
+.ti -8
+d~+/-~~~\
+this option, if on, allows you to use variables of type 'long'.
+Default -.
+.sp
+.ti -8
+f~<num>~\
+the size of reals can be changed by this option. <num> should be specified in 8-bit bytes.
+The default in most implementations is 8, but other values can
+occur.
+.sp
+.ti -8
+i~<num>~\
+with this flag the setsize for a set of integers can be
+manipulated.
+The number must be the number of bits per set.
+The default value is 16, just fitting in one word on the PDP and many other minis.
+.sp
+.ti -8
+l~+/-~~~\
+if + then code is inserted to keep track of the source line number.
+When this flag is switched on and off, an incorrect line number may appear
+if the error occurs in a part of your program for which this flag is off.
+These same line numbers are used for the profile, flow and count options
+of the EM interpreter em [6].
+Default +.
+.sp
+.ti -8
+p~<num>~the size of pointers can be changed by this option. <num> should be specified in bytes.
+Default 2 in most implementations.
+.sp
+.ti -8
+r~+/-~~~\
+if + then code is inserted to check subrange variables against
+lower and upper subrange limits.
+Default +.
+.sp
+.ti -8
+s~+/-~~~\
+if + then the compiler will hunt for places in your program
+where non-standard features are used, and for each place found
+it will generate a warning. Default -.
+.sp
+.ti -8
+t~+/-~~~\
+if + then each time a procedure is entered, the routine 'procentry'
+is called.
+The compiler checks this flag just before the first symbol that follows the
+first 'begin' of the body of the procedure.
+Also, when the procedure exits, then the procedure 'procexit' is called
+if the t flag is on just before the last 'end' of the procedure body.
+Both 'procentry' and 'procexit' have a packed array of 8 characters as a parameter.
+Default procedures are present in the run time library.
+Default -.
+.sp
+.ti -8
+u~+/-~~~\
+if + then the character '_' is treated like a lower case letter,
+so that it may be used in identifiers.
+Procedure and function identifiers starting with an underscore may cause problems,
+because they may collide with library routine names.
+Default -.
+.in 0
+.sp
+Seven of these flags (c, d, f, i, p, s and u) are only effective when they appear
+before the 'program' symbol. The others may be switched on and off.
+.PP
+A second method of passing options to the compiler is available.
+This method uses the file on which the compact EM code will be written.
+The compiler starts reading from this file scanning for options
+in the same format as used normally, except for the comment delimiters and
+the dollar sign.
+All options found on the file override the options set in your program.
+Note that the compact code file must always exist before the compiler is called.
+.PP
+The user interface program \fIack\fP [4]
+takes care of creating this file normally
+and also writes one of its options onto this file.
+The user can specify, for instance, without changing any character in the
+Pascal program, that the compiler must include code for
+procedure/function tracing.
+.PP
+Another very powerful debugging tool is the knowledge that inaccessible
+statements and useless tests are removed by the EM optimizer.
+For instance, a statement like:
+.sp
+.nf
+        if debug then
+          writeln('initialization done');
+.fi
+.sp
+is completely removed by the optimizer if debug is a constant with
+value false.
+The first line is removed if debug is a constant with value true.
+Of course, if debug is a variable nothing can be removed.
+.PP
+A disadvantage of Pascal, the lack of preinitialized data, can be
+diminished by making use of the possibilities of the EM optimizer.
+For instance, initializing an array of reserved words is sometimes
+optimized into 3 EM instructions. To maximize this effect you must initialize
+variables as much as possible in order of declaration and array entries
+in order of decreasing index.
+.CH "References"
+.in +5
+.ti -5
+[1]~~\
+ISO standard proposal ISO/TC97/SC5-N462, dated February 1979.
+The same proposal, in slightly modified form, can be found in:
+A.M.Addyman et al., "A draft description of Pascal",
+Software, practice and experience, May 1979.
+An improved version, received March 1980,
+is followed as much as possible for the
+current Ack-Pascal.
+.sp
+.ti -5
+[2]~~\
+A.S.Tanenbaum, J.W.Stevenson, Hans van Staveren, E.G.Keizer,
+"Description of a machine architecture for use with block structured languages",
+Informatica rapport IR-81.
+.sp
+.ti -5
+[3]~~\
+W.S.Brown, S.I.Feldman, "Environment parameters and basic functions
+for floating-point computation",
+Bell Laboratories CSTR #72.
+.sp
+.ti -5
+[4]~~\
+UNIX manual ack(I).
+.sp
+.ti -5
+[5]~~\
+UNIX manual ld(I).
+.sp
+.ti -5
+[6]~~\
+UNIX manual em(I).
+.sp
+.ti -5
+[7]~~\
+UNIX manual libpc(VII).
+.sp
+.ti -5
+[8]~~\
+UNIX manual pc_prlib(VII).
diff --git a/doc/peep.doc b/doc/peep.doc
new file mode 100644
index 000000000..c5ceab4aa
--- /dev/null
+++ b/doc/peep.doc
@@ -0,0 +1,505 @@
+.TL
+Internal documentation on the peephole optimizer
+.br
+from the Amsterdam Compiler Kit
+.NH 1
+Introduction
+.PP
+Part of the Amsterdam Compiler Kit is a program to do
+peephole optimization on an EM program.
+The optimizer scans the program for patterns from a table
+and, when a pattern matches, makes the optimization given in the table.
+It then rescans the result of that optimization
+to find yet another optimization,
+continuing until no more optimizations are found.
+.PP
+Furthermore, for historical reasons, it does some optimizations
+that cannot be called peephole optimizations,
+like branch chaining and the deletion of unreachable code.
+.PP
+The peephole optimizer consists of three parts:
+.IP 1)
+A driving table
+.IP 2)
+A program translating the table to internal format
+.IP 3)
+C code compiled with the table to make the optimizer proper
+.PP
+In this document the table format, internal format and 
+data structures in the optimizer will be explained,
+plus a hint on what the code does where it might not be obvious.
+It is a simple program mostly.
+.NH 1
+Table format
+.PP
+The driving table consists of pattern/replacement pairs,
+in principle one per line,
+although a line starting with white space is considered
+a continuation line for the previous.
+The general format is:
+.DS
+optimization : pattern ':' replacement '\en'
+.sp
+pattern : EMlist optional_boolean_expression
+.sp
+replacement : EM_plus_operand_list
+.DE
+An example of a simple optimization:
+.DS
+loc stl $1==0 : zrl $2
+.DE
+There is no real limit for the length of the pattern or the replacement;
+the replacement might even be longer than the pattern,
+and expressions can be made arbitrarily complicated.
+.PP
+The expressions in the table are made of the following pieces:
+.IP -
+Integer constants
+.IP -
+$\fIn\fP, standing for the operand of the \fIn\fP'th EM
+instruction in the pattern,
+undefined if that instruction has no operand.
+.IP -
+w, standing for the wordsize of the code optimized.
+.IP -
+p, for the pointersize.
+.IP -
+defined(expr), true if the expression is defined.
+.IP -
+samesign(expr,expr), true if expressions have the same sign.
+.IP -
+sfit(expr,expr), ufit(expr,expr),
+true if the first expression fits signed or unsigned in the number
+of bits given in the second expression.
+.IP -
+rotate(expr,expr),
+first expression rotated left the number of bits given by the second expression.
+.IP -
+notreg(expr),
+true if the local with the expression as number is not a candidate to put
+in a register.
+.IP -
+rom(\fIn\fP,expr), contents of the rom descriptor at index expr that
+is associated with the global label that should be the argument of
+the \fIn\fP'th EM instruction.
+Undefined if such a thing does not exist.
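+.PP
+As an illustration of the two fit tests, here is a minimal sketch in C.
+It is not taken from the optimizer sources:
+the argument types and the restriction 0 < nbits < 31
+are assumptions made for this example.
+.DS
+/* Hypothetical sketch, not the optimizer's own code. */
+
+/* True if val fits in a signed field of nbits bits, 0 < nbits < 31. */
+int sfit(long val, int nbits)
+{
+	long half = 1L << (nbits - 1);
+
+	return val >= -half && val < half;
+}
+
+/* True if val fits in an unsigned field of nbits bits, 0 < nbits < 31. */
+int ufit(long val, int nbits)
+{
+	return val >= 0 && val < (1L << nbits);
+}
+.DE
+Under these assumptions sfit(128,8) is false and ufit(128,8) is true.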
+.PP
+The usual arithmetic operators may be used on integer values.
+If any operand is undefined the expression is undefined,
+except for the defined() function above.
+An undefined expression used for its truth value is false.
+Arithmetic on local label operands is forbidden;
+the only things allowed are tests for equality.
+Arithmetic on global labels makes sense,
+i.e. one can add a global label and a constant,
+but not two global labels.
+.PP
+In the table one can use five additional EM instructions in patterns.
+These are:
+.IP lab
+Stands for a local label.
+.IP LLP
+Load Local Pointer, translates into a 
+.B lol
+or into a 
+.B ldl
+depending on the relationship between wordsize and pointersize.
+.IP LEP
+Load External Pointer, translates into a 
+.B loe
+or into a 
+.B lde .
+.IP SLP
+Store Local Pointer,
+.B stl
+or 
+.B sdl .
+.IP SEP
+Store External Pointer,
+.B ste
+or
+.B sde .
+.PP
+There is only one peephole optimizer,
+so the substitutions to be made for the last four instructions
+are made at run time before the first optimizations are made.
+.NH 1
+Internal format
+.PP
+The translating program,
+.I mktab ,
+converts the table into an array of bytes in which all
+patterns follow one another unaligned.
+The format of a pattern is:
+.IP 1)
+One byte for the high byte of the hash value,
+which will be explained later on.
+.IP 2)
+Two bytes for the index of the next pattern in a chain.
+.IP 3)
+An integer\u*\d,
+.FS
+* An integer is encoded as a byte when less than 255,
+otherwise as a byte containing 255 followed by two
+bytes with the real value.
+.FE
+pattern length.
+.IP 4)
+The list of pattern opcodes, one per byte.
+.IP 5)
+An integer expression index, 0 if not used.
+.IP 6)
+An integer, replacement length.
+.IP 7)
+A list of pairs consisting of a one byte opcode and an integer
+expression index.
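+.PP
+The integer encoding from the footnote can be sketched in C as follows.
+This is only an illustration: the routine name putint and the order of
+the two value bytes (low byte first) are assumptions,
+since the footnote does not specify them.
+.DS
+/*
+ * Hypothetical sketch: store val (0..65535) at buf in the table
+ * encoding and return the number of bytes used.
+ */
+int putint(unsigned char *buf, unsigned val)
+{
+	if (val < 255) {
+		buf[0] = val;		/* small values take one byte */
+		return 1;
+	}
+	buf[0] = 255;			/* escape byte */
+	buf[1] = val & 0377;		/* assumed order: low byte first */
+	buf[2] = (val >> 8) & 0377;
+	return 3;
+}
+.DE
+So a pattern length of 3 takes one byte,
+while the value 255 itself already needs the three byte form.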
+.PP
+The expressions are kept in an array of triples,
+implementing a binary tree.
+The
+.I mktab
+program tries to minimize the number of triples by reusing
+duplicates and even reverses the operands of commutative operators
+when doing so would spare a triple.
+.NH 1
+A tour through the sources
+.PP
+Now we will walk through the sources and note things of interest.
+.NH 2
+The header files
+.PP
+The header files are the place where data structures and options reside.
+.NH 3
+alloc.h
+.PP
+In the header file alloc.h several defines can be used to select various
+kinds of core allocation schemes.
+This is important on small machines like the PDP-11 since a complete
+procedure must be in core at the same time,
+and the peephole optimizer should not be the limiting factor in
+determining the maximum size of procedures if possible.
+Options are:
+.IP -
+USEMALLOC, standard malloc() and free() are used instead of the optimizer's own
+core allocation package.
+Not recommended unless that package does not work on some bizarre
+machine.
+.IP -
+COREDEBUG, prints large amounts of information about core management.
+Better not define it unless you change the code and it stops working.
+.IP -
+SEPID, if you define this you will get an extra procedure that will
+go through a lot of work to scrape the last bytes together if the
+system won't provide more.
+This is not a good idea if memory is scarce and code and data reside
+in the same space, since the room used by the procedure might well
+be more than the room saved.
+.IP -
+STACKROOM, number of shorts used in stack space.
+This is used if memory is scarce and stack space and data space are
+different.
+On the PDP-11 a UNIX process starts with an 8K stack segment which
+cannot be transferred to the data segment.
+Under these conditions one can use a lot of the stack space for storage.
+.NH 3
+assert.h
+.PP
+Just defines the assert macro.
+When compiled with -DNDEBUG all asserts will be off.
+.NH 3
+ext.h
+.PP
+Gives external definitions of variables used by more than one module.
+.NH 3
+line.h
+.PP
+Defines the structures used to keep instructions,
+one structure per line of EM code,
+and the structure to keep arguments of pseudos,
+one structure per argument.
+Both structures essentially contain a pointer to the next,
+a type,
+and a union containing information depending on the type.
+Core is allocated only for the part of the union used.
+.PP
+The 
+.I
+struct line
+.R
+has a very compact encoding for small integers:
+they are encoded in the type field.
+On the PDP-11 this gives a line structure of only 4 bytes for most
+instructions.
+.NH 3
+lookup.h
+.PP
+Contains the definition of the struct used for symbol table management;
+global labels and procedure names are kept in one table.
+.NH 3
+optim.h
+.PP
+If one defines the DIAGOPT option in this header file,
+for every optimization performed a number is written on stderr.
+The number gives the number of the pattern in the table
+or one of the four special numbers in this header file.
+.NH 3
+param.h
+.PP
+Contains one settable option,
+LONGOFF.
+If this is not defined the optimizer can only optimize programs
+with wordsize 2 and pointersize 2.
+Leave it undefined only if the optimizer must run on a Z80 or something pathetic like that.
+.PP
+Other defines here should not be touched.
+.NH 3
+pattern.h
+.PP
+Contains defines of indices in a pattern,
+definition of the expression triples,
+definitions of the various expression operators
+and definition of the result struct where expression results are put.
+.PP
+This header file is the main one that is also included by
+.I mktab .
+.NH 3
+proinf.h
+.PP
+This one contains definitions 
+for the local label table structs
+and for the struct where all information for one procedure is kept.
+This is in one struct so it can be saved easily when recursive
+procedures have to be resolved.
+.NH 3
+types.h
+.PP
+Collection of typedefs to be used by almost all modules.
+.NH 2
+The C code itself.
+.PP
+The C code will now be the center of our attention.
+We will make a walk through the sources and we will try
+to follow the sources in a logical order.
+So we will start at
+.NH 3
+main.c
+.PP
+The main.c module contains the main() function.
+Nothing spectacular happens here;
+the only thing of interest is the handling of flags:
+.IP -L
+This is an instruction to the peephole optimizer to perform
+one of its auxiliary functions, the generation of a library module.
+This makes the peephole optimizer write its output on a temporary file
+and, at the end, produce the real output by first generating a list
+of exported symbols and then copying the temporary file behind it.
+.IP -n
+Disables all optimization.
+The only things the optimizer does now are filling in the blank after the
+.I END
+pseudo and resolving recursive procedures.
+.PP
+The place where main() is left is the call to getlines() which brings
+us to
+.NH 3
+getline.c
+.PP
+This module reads the EM code and constructs a list of 
+.I
+struct line
+.R
+records,
+linked together backwards,
+i.e. the first instruction read is the last in the list.
+Pseudos are handled here also;
+for most pseudos this just means that a chain of argument records
+is linked into the linked line list, but some pseudos get special attention:
+.IP exc
+This pseudo is acted upon right away.
+Lines read are shuffled around according to instruction.
+.IP mes
+Some messages are acted upon.
+These are:
+.RS
+.IP ms_err 8
+The input is drained, just in case it is a pipe.
+After that the optimizer exits.
+.IP ms_opt
+The do-not-optimize flag is set.
+Acts just like -n on the command line.
+.IP ms_emx
+The word- and pointersize are read;
+the optimizer complains if it is not able to handle them.
+.IP ms_reg
+We take notice of the offset of this local.
+See also the comments in the description of peephole.c.
+.RE
+.IP pro
+A new procedure starts.
+If we are already in one, the status is saved; otherwise the collected input is processed.
+Information about this procedure is collected and, if we were already in a procedure,
+getlines() is called recursively.
+.IP end
+Process collected input.
+.PP
+The phrase "process collected input" is used twice,
+which brings us to
+.NH 3
+process.c
+.PP
+This module contains the entry point process(), which is called
+whenever the collected input must be processed.
+It calls a variety of other routines to get the real work done.
+The routines in this module are, in chronological order:
+.IP symknown 12
+Marks all symbols seen until now as known,
+i.e. it is now known whether their scope is local or global.
+This information is used again during output.
+.IP symvalue
+Runs through the chain of pseudos to give values to data labels.
+This needs an extra pass.
+It cannot be done during the getlines pass, since an
+.B exc
+pseudo could destroy things.
+Nor can it be done during the backward pass since it is impossible
+to do good fragment numbering backward.
+.IP checklocs
+Checks whether all local labels referenced are defined.
+It needs to be sure about this since otherwise the
+semi-global optimizations made cannot work.
+.IP relabel
+This routine finds the final destination for each label in the procedure.
+Labels followed by unconditional branches or other labels are marked during
+the peephole phase and this leads to chains of identical labels.
+These chains are followed here, and after this routine has run each label
+in the local label table has its replacement label associated with it.
+Care is taken in this routine to prevent a loop in the program from
+causing the optimizer to loop.
+.IP cleanlocals
+This routine empties the local label table after everything
+is processed.
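+.PP
+The idea behind following label chains without looping forever
+can be sketched as below.
+This is not the code of relabel():
+the array representation, the name finallabel and the bound NLABELS
+are assumptions chosen to keep the example small.
+.DS
+#define NLABELS 8	/* hypothetical size of the label table */
+
+/*
+ * repl[i] is the label that label i was found to be identical to,
+ * or i itself if there is none.
+ * Follow the chain from lab, but take at most NLABELS steps, so a
+ * cycle of labels (a loop in the program) cannot trap us.
+ */
+int finallabel(int repl[NLABELS], int lab)
+{
+	int steps;
+
+	for (steps = 0; steps < NLABELS && repl[lab] != lab; steps++)
+		lab = repl[lab];
+	return lab;
+}
+.DE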
+.PP
+But before this can all be done,
+the backward linked list of instructions first has to be reversed,
+so here comes
+.NH 3
+backward.c
+.PP
+The routine backward has a number of functions:
+.IP -
+It reverses the backward linked list, making two forward linked lists,
+one for the instructions and one for the pseudos.
+.IP -
+It notes the last occurrence of data labels in the backward linked list
+and puts it in the global symbol table.
+This is of course the first occurrence in the procedure.
+This information is needed to decide whether the symbols are global
+or local to this module.
+.IP -
+It decides about the fragment boundaries of data blocks.
+Fragments are numbered backwards starting at 3.
+This is done to be able to make the type of an expression
+containing a symbol equal to its fragment.
+This type cannot then clash with the types integer and local label.
+.IP -
+It allocates a rom buffer to every data label with a rom behind
+it, if that rom contains only plain integers at the start.
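+.PP
+The list reversal in the first point can be pictured with the sketch below.
+The struct is a simplified stand-in for the real struct line,
+and the field l_pseudo and the routine name split are assumptions
+made for this example only.
+.DS
+struct line {			/* simplified, hypothetical layout */
+	struct line *l_next;
+	int l_pseudo;		/* nonzero for a pseudo (assumed field) */
+};
+
+/*
+ * Walk the backward list once and push every record on the front
+ * of one of two new lists; this reverses the order, so both new
+ * lists end up in the order in which the lines were read.
+ */
+void split(struct line *back, struct line **instrs, struct line **pseudos)
+{
+	struct line *next;
+
+	*instrs = *pseudos = 0;
+	while (back != 0) {
+		next = back->l_next;
+		if (back->l_pseudo) {
+			back->l_next = *pseudos;
+			*pseudos = back;
+		} else {
+			back->l_next = *instrs;
+			*instrs = back;
+		}
+		back = next;
+	}
+}
+.DE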
+.PP
+The first thing done after process() has called backward() and some
+of its own little routines is a call to the real routine,
+the one that does the work the program was written for
+.NH 3
+peephole.c
+.PP
+The first routines in peephole.c 
+implement a linked list for the offsets of local variables
+that are candidates for a register implementation.
+Several patterns use the notreg() function,
+since it is forbidden to combine a load of such a variable
+with the load of another and
+it is not allowed to take the address of such a variable.
+.PP
+The routine peephole hashes the patterns the first time it is called
+after which it doesn't do much more than calling optimize.
+But first hashpatterns().
+.PP
+The patterns are hashed at run time of the optimizer because of
+the
+.B LLP ,
+.B LEP ,
+.B SLP 
+and
+.B SEP
+instructions added to the instruction set in this optimizer.
+These are first replaced everywhere in the table by the correct
+replacement after which the first three instructions of the
+pattern are hashed and the pattern is linked into one of the
+256 linked lists.
+There is a define CHK_HASH in this module that you
+can set if you do not trust the randomness of the hashing
+function.
+.PP
+The attention now shifts to optimize().
+This routine calls basicblock() for every piece of code between two labels.
+It also notes which labels have another label or a branch behind them
+so the relabel() routine from process.c can do something with that.
+.PP
+Basicblock() keeps making passes over its basic block
+until no more optimizations are found.
+This might be inefficient if there is a long basic block with some
+deep recursive optimization in one part of it.
+The entire basic block is then scanned a lot of times just for
+that one piece.
+The alternative is backing up after making an optimization and running
+through the same code again, but that is difficult
+in a singly linked list.
+.PP
+It hashes instructions and calls trypat() for every pattern that has
+a full hash value match,
+i.e. lower byte and upper byte equal.
+The longest pattern is tried first.
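+.PP
+How one hash value can serve both purposes is sketched below.
+The hash function shown is made up for this example,
+the real one in peephole.c is not reproduced here;
+only the split into a low byte and a high byte follows the text,
+and the assumption is that the low byte selects the list.
+.DS
+/* Hypothetical 16 bit hash over the first three opcodes. */
+unsigned hash3(unsigned op1, unsigned op2, unsigned op3)
+{
+	return ((op1 * 31 + op2) * 31 + op3) & 0177777;
+}
+
+#define CHAIN(h)	((h) & 0377)		/* picks one of the 256 lists */
+#define HIBYTE(h)	(((h) >> 8) & 0377)	/* compared with the pattern's byte */
+.DE
+In this sketch a pattern is only passed to trypat() when it is found
+on list CHAIN(h) and its stored high byte equals HIBYTE(h).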
+.PP
+Trypat() checks length and opcodes of the pattern.
+If correct it fills the iargs[] array with argument values
+and calculates the expression.
+If that is also correct the work shifts to tryrepl().
+.PP
+Tryrepl() generates the list of replacement instructions,
+links it into the list and returns true.
+Why then the name tryrepl() if it always succeeds?
+Well, there is a mechanism in the optimizer,
+unused until today, that makes it possible to do optimizations that cannot
+be described by the table.
+It is possible to give a number as a replacement which will cause the
+optimizer to call a routine special() to do some work.
+This routine might decide not to do an optimization and return false.
+.PP
+The last routine that is called from process() is putline()
+to write the optimized code, bringing us to
+.NH 3
+putline.c
+.PP
+The major part of putline.c is the standard set of routines
+that makes EM compact code.
+The extra functions performed are:
+.IP -
+For every occurrence of a global symbol it might be necessary to
+output an
+.B exa ,
+.B exp ,
+.B ina
+or 
+.B inp
+pseudo instruction.
+That task is performed.
+.IP -
+The
+.B lin
+instructions are optimized here,
+.B lni
+instructions added for 
+.B lin
+instructions and superfluous
+.B lin
+instructions deleted.
+
diff --git a/doc/regadd.doc b/doc/regadd.doc
new file mode 100644
index 000000000..c5dd5de00
--- /dev/null
+++ b/doc/regadd.doc
@@ -0,0 +1,131 @@
+.TL
+Addition of register variables to an existing table.
+.NH 1
+Introduction
+.PP
+This is a short description of the newest feature in the
+table driven code generator for the Amsterdam Compiler Kit.
+It describes how to add register variables to an existing table.
+This assumes you have the distribution of October 1983 or later.
+It is not clear whether you should read this when starting with
+a table for a new machine,
+or whether you should wait till the table is well debugged already.
+.NH 1
+Modifications to the table itself.
+.NH 2
+Register section
+.PP
+You can add just before the properties of the register one
+of the following:
+.IP - 2
+regvar
+.IP -
+regvar ( pointer )
+.IP -
+regvar ( loop )
+.IP -
+regvar ( float )
+.LP
+All register variables of one type must be of the same size,
+and they may have no subregisters.
+.NH 2
+Code section
+.PP
+.IP - 2
+Two pseudo functions are added to the list allowed inside expressions:
+.RS
+.IP 1) 3
+inreg ( expr ) has as a parameter the offset of a local,
+and returns 0, 1 or 2:
+.RS
+.IP 2: 3
+if the variable is in a register.
+.IP 1:
+if the variable could be in a register but isn't.
+.IP 0:
+if the variable cannot be in a register.
+.RE
+.IP 2)
+regvar ( expr ) returns the register associated with the variable.
+Undefined if it is not in a register.
+So regvar ( expr ) is defined if and only if inreg ( expr ) == 2.
+.RE
+.IP -
+It is now possible to remove() a register expression;
+this is of course needed for a store into a register local.
+.IP -
+The return out of a procedure may now involve register restores,
+so the special word 'return' in the table will invoke a user defined
+function.
+.NH 1
+Modifications to mach.c
+.PP
+If register variables are used in a table, the program
+.I cgg
+will define the word REGVARS during compilation of the sources.
+So the functions described here should be bracketed
+by #ifdef REGVARS and #endif.
+.IP - 2
+regscore(off,size,typ,freq,totyp) long off;
+.br
+This function should assign a score to a register variable.
+The score should preferably be the estimated number of bytes
+gained when it is put in a register.
+Off and size are the offset and size of the variable,
+typ is the type, that is reg_any, reg_pointer, reg_loop or reg_float.
+Freq is the number of times it occurs statically, and totyp
+is the type of the register it is planned to go into.
+.br
+Keep in mind that the gain should be net, that is the cost for
+register save/restore sequences and the cost of initialisation
+in the case of parameters should already be included.
+.IP -
+i_regsave()
+.br
+This function is called at the start of a procedure, just before
+register saves are done.
+It can be used to initialise some variables if needed.
+.IP -
+f_regsave()
+.br
+This function is called at the end of the register save sequence.
+It can be used to do the real saving if multiple register move
+instructions are available.
+.IP -
+regsave(regstr,off,size) char *regstr; long off;
+.br
+Should either do the real saving or set up a table to have
+it done by f_regsave.
+Note that initialisation of parameters should also be done,
+or planned, here.
+.IP -
+regreturn()
+.br
+Should restore saved registers and return.
+The function result is already in the function return area by now.
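+.PP
+A regscore() for a hypothetical two byte machine might look like this.
+All costs in it are invented for the example, as are the assumptions that
+only word sized locals qualify, that parameters have offsets of zero or
+more, and that a negative score means the variable is not worth a register.
+The arguments typ and totyp are ignored in this sketch.
+.DS
+#ifdef REGVARS
+/* Hypothetical sketch; the numbers are not taken from any real table. */
+int regscore(long off, int size, int typ, int freq, int totyp)
+{
+	int score;
+
+	if (size != 2)
+		return -1;	/* assumed: only word sized locals */
+	score = 2 * freq;	/* assumed gain of 2 bytes per use */
+	score -= 4;		/* assumed cost of save plus restore */
+	if (off >= 0)
+		score -= 2;	/* assumed cost of loading a parameter */
+	return score;
+}
+#endif
+.DE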
+.NH 1
+Examples
+.PP
+Here are some examples out of the PDP 11 table:
+.DS
+lol inreg($1)==2| |		| regvar($1)			| |
+
+lil inreg($1)==2| |		| {regdef2, regvar($1)}		| |
+
+stl inreg($1)==2| xsource2 |
+			remove(regvar($1))
+			move(%[1],regvar($1))              |       | |
+
+inl inreg($1)==2| |     remove(regvar($1))
+			"inc %(regvar($1)%)"
+			setcc(regvar($1))          |       | |
+.NH 1
+Afterthoughts.
+.PP
+At the time of this writing the tables for the PDP 11, the M68000 and
+the VAX have been converted, in all cases the two byte wordsize versions.
+No big problems have occurred, but experience has shown that it is
+necessary to check your table carefully for all patterns with locals in them:
+if you forget one, the code generated by that coderule
+will use the memory slot, although the local is not in it.
+
diff --git a/doc/toolkit.doc b/doc/toolkit.doc
new file mode 100644
index 000000000..8ff7be9ae
--- /dev/null
+++ b/doc/toolkit.doc
@@ -0,0 +1,896 @@
+.RP
+.ND
+.nr LL 78m
+.tr ~
+.ds as *
+.TL
+A Practical Tool Kit for Making Portable Compilers
+.AU
+Andrew S. Tanenbaum
+Hans van Staveren
+E. G. Keizer
+Johan W. Stevenson
+.AI
+Mathematics Dept.
+Vrije Universiteit
+Amsterdam, The Netherlands
+.AB
+The Amsterdam Compiler Kit is an integrated collection of programs designed to
+simplify the task of producing portable (cross) compilers and interpreters.
+For each language to be compiled, a program (called a front end) 
+must be written to
+translate the source program into a common intermediate code.
+This intermediate code can be optimized and then either directly interpreted
+or translated to the assembly language of the desired target machine.
+The paper describes the various pieces of the tool kit in some detail, as well
+as discussing the overall strategy.
+.sp
+Keywords: Compiler, Interpreter, Portability, Translator
+.sp
+CR Categories: 4.12, 4.13, 4.22
+.sp 12
+Authors' present addresses:
+  A.S. Tanenbaum, H. van Staveren, E.G. Keizer: Mathematics
+     Dept., Vrije Universiteit, Postbus 7161, 1007 MC Amsterdam,
+     The Netherlands
+
+  J.W. Stevenson: NV Philips, S&I, T&M, Building TQ V5, Eindhoven,
+     The Netherlands
+.AE
+.NH 1
+Introduction
+.PP
+As more and more organizations acquire many micro- and minicomputers,
+the need for portable compilers is becoming more and more acute.
+The present situation, in which each hardware vendor provides its own
+compilers -- each with its own deficiencies and extensions, and none of them
+compatible -- leaves much to be desired.
+The ideal situation would be an integrated system containing a family
+of (cross) compilers, each compiler accepting a standard source language and
+producing code for a wide variety of target machines.
+Furthermore, the compilers should be compatible, so programs written in
+one language can call procedures written in another language.
+Finally, the system should be designed so as to make adding new languages
+and new machines easy.
+Such an integrated system is being built at the Vrije Universiteit.
+Its design and implementation are the subject of this article.
+.PP
+Our compiler building system, which is called the "Amsterdam Compiler Kit"
+(ACK), can be thought of as a "tool kit."
+It consists of a number of parts that can be combined to form compilers
+(and interpreters) with various properties.
+The tool kit is based on an idea (UNCOL) that was first suggested in 1960
+[7], but which never really caught on then.
+The problem which UNCOL attempts to solve is how to make a compiler for
+each of
+.I N
+languages on
+.I M
+different machines without having to write 
+.I N
+x
+.I M
+programs.
+.PP
+As shown in Fig. 1, the UNCOL approach is to write
+.I N
+"front ends," each
+of which translates one source language to a common intermediate language,
+UNCOL (UNiversal Computer Oriented Language), and
+.I M
+"back ends," each
+of which translates programs in UNCOL to a specific machine language.
+Under these conditions, only
+.I N
++
+.I M
+programs must be written to provide all
+.I N
+languages on all
+.I M
+machines, instead of 
+.I N
+x
+.I M
+programs.
+.PP
+Various researchers have attempted to design a suitable UNCOL
+[2,8], but none of these have become popular.
+It is our belief that previous attempts have failed because they have been
+too ambitious, that is, they have tried to cover all languages
+and all machines using a single UNCOL.
+Our approach is more modest: we cater only to algebraic languages
+and machines whose memory consists of 8-bit bytes, each with its own address.
+Typical languages that could be handled include
+Ada, ALGOL 60, ALGOL 68, BASIC, C, FORTRAN,
+Modula, Pascal, PL/I, PL/M, PLAIN, and RATFOR,
+whereas COBOL, LISP, and SNOBOL would be less efficient.
+Examples of machines that could be included are the Intel 8080 and 8086,
+Motorola 6800, 6809, and 68000, Zilog Z80 and Z8000, DEC PDP-11 and VAX,
+and IBM 370 but not the Burroughs 6700, CDC Cyber, or Univac 1108 (because
+they are not byte-oriented).
+With these restrictions, we believe the old UNCOL idea can be used as the
+basis of a practical compiler-building system.
+.KF
+.sp 15P
+.ce 1
+Fig. 1.  The UNCOL model.
+.sp
+.KE
+.NH 1
+An Overview of the Amsterdam Compiler Kit
+.PP
+The tool kit consists of eight components:
+.sp
+  1. The preprocessor.
+  2. The front ends.
+  3. The peephole optimizer.
+  4. The global optimizer.
+  5. The back end.
+  6. The target machine optimizer.
+  7. The universal assembler/linker.
+  8. The utility package.
+.sp
+.PP
+A fully optimizing compiler,
+depicted in Fig. 2, has seven cascaded phases.
+Conceptually, each component reads an input file and writes a
+transformed output file to be used as input to the next component.
+In practice, some components may use temporary files to allow multiple
+passes over the input or internal intermediate files.
+.KF
+.sp 12P
+.ce 1
+Fig. 2.  Structure of the Amsterdam Compiler Kit.
+.sp
+.KE
+.PP
+In the following paragraphs we will briefly describe each component.
+After this overview, we will look at all of them again in more detail.
+A program to be compiled is first fed into the (language independent)
+preprocessor, which provides a simple macro facility,
+and similar textual facilities.
+The preprocessor's output is a legal program in one of the programming
+languages supported, whereas the input is a program possibly augmented
+with macros, etc.
+.PP
+This output goes into the appropriate front end, whose job it is to
+produce intermediate code.
+This intermediate code (our UNCOL) is the machine language for a simple
+stack machine called EM (Encoding Machine).
+A typical front end might build a parse tree from the input, and then
+use the parse tree to generate EM code, which is similar to reverse Polish.
+In order to perform this work, the front end has to maintain tables of
+declared variables, labels, etc., determine where to place the
+data structures in memory, and so on.
+.PP
+The EM code generated by the front end is fed into the peephole optimizer,
+which scans it with a window of a few instructions, replacing certain
+inefficient code sequences by better ones.
+Such a search is important because EM contains instructions to handle
+numerous important special cases efficiently
+(e.g., incrementing a variable by 1).
+It is our strategy to relieve the front ends of the burden of hunting for
+special cases because there are many front ends and only one peephole
+optimizer.
+By handling the special cases in the peephole optimizer, 
+the front ends become simpler, easier to write and easier to maintain.
+.PP
+Following the peephole optimizer is a global optimizer [5], which,
+unlike the peephole optimizer, examines the program as a whole.
+It builds a data flow graph to make possible a variety of 
+global optimizations,
+among them, moving invariant code out of loops, avoiding redundant
+computations, live/dead analysis and eliminating tail recursion.
+Note that the output of the global optimizer is still EM code.
+.PP
+Next comes the back end, which differs from the front ends in a
+fundamental way.
+Each front end is a separate program, whereas the back end is a single
+program that is driven by a machine dependent driving table.
+The driving table for a specific machine tells how the EM code is mapped
+onto the machine's assembly language.
+Although a simple driving table might just macro expand each EM instruction
+into a sequence of target machine instructions, a much more sophisticated
+translation strategy is normally used, as described later.
+For speed, the back end does not actually read in the driving table at run time.
+Instead, the tables are compiled along with the back end in advance, resulting
+in one binary program per machine.
+.PP
+The output of the back end is a program in the assembly language of some
+particular machine.
+The next component in the pipeline reads this program and performs peephole
+optimization on it.
+The optimizations performed here involve idiosyncrasies
+of the target machine that cannot be performed in the machine-independent
+EM-to-EM peephole optimizer.
+Typically these optimizations take advantage of special instructions or special
+addressing modes.
+.PP
+The optimized target machine assembly code then goes into the final
+component in the pipeline, the universal assembler/linker.
+This program assembles the input to object format, extracting routines from
+libraries and including them as needed.
+.PP
+The final component of the tool kit is the utility package, which contains
+various test programs, interpreters for EM code, 
+EM libraries, conversion programs, and other aids for the implementer and
+user.
+.NH 1
+The Preprocessor
+.PP
+The function of the preprocessor is to extend all the programming languages
+by adding certain generally useful facilities to them in a uniform way.
+One of these is a simple macro system, in which the user can give names to
+character strings.
+The names can be used in the program, with the knowledge that they will be
+macro expanded prior to being input to the front end.
+Macros can be used for named constants, expanding short "procedures"
+in line, etc.
+.PP
+Another useful facility provided by the preprocessor is the ability to
+include compile-time libraries.
+On large projects, it is common to have all the declarations and definitions
+gathered together in a few files that are textually included in the programs
+by instructing the preprocessor to read them in, thus fooling the front end
+into thinking that they were part of the source program.
+.PP
+A third feature of the preprocessor is conditional compilation.
+The input program can be split up into labeled sections.
+By setting flags, some of the sections can be deleted by the preprocessor,
+thus allowing a family of slightly different programs to be conveniently stored
+on a single file.
+.NH 1
+The Front Ends
+.PP
+A front end is a program that converts input in some source language to a
+program in EM.
+At present, front ends 
+exist or are in preparation for Pascal, C, and Plain, and are being considered
+for Ada, ALGOL 68, FORTRAN 77, and Modula 2.
+Each of the present front ends is independent of all the other ones,
+although a general-purpose, table-driven front end is conceivable, provided
+one can devise a way to express the semantics of the source language in the
+driving tables.
+The Pascal front end uses a top-down parsing algorithm (recursive descent),
+whereas the C and Plain front ends are bottom-up.
+.PP
+All front ends, independent of the language being compiled,
+produce a common intermediate code called EM, which is
+the assembly language for a simple stack machine.
+The EM machine is based on a memory architecture
+containing a stack for local variables, a (static) data area for variables
+declared in the outermost block and global to the whole program, and a heap
+for dynamic data structures.
+In some ways EM resembles P-code [6], but is more general, since it is
+intended for a wider class of languages than just Pascal.
+.PP
+The EM instruction set has been described elsewhere
+[9,10,11]
+so we will only briefly summarize it here.
+Instructions exist to:
+.sp
+  1. Load a variable or constant of some length onto the stack.
+  2. Store the top item on the stack in memory.
+  3. Add, subtract, multiply, divide, etc. the top two stack items.
+  4. Examine the top one or two stack items and branch conditionally.
+  5. Call procedures and return from them.
+.sp
+.PP
+Loads and stores come in several variations, corresponding to the most common
+programming language semantics, for example, constants, simple variables,
+fields of a record, elements of an array, and so on.
+Distinctions are also made between variables local to the current block
+(i.e., stack frame), those in the outermost block (static storage), and those
+at intermediate lexicographic levels, which are accessed by following the
+static chain at run time.
+.PP
+All arithmetic instructions have a type (integer, unsigned, real,
+pointer, or set) and an
+operand length, which may either be explicit or may be popped from the stack
+at run time.
+Monadic branch instructions pop an item from the stack and branch if it is
+less than zero, less than or equal to zero, etc.
+Dyadic branch instructions pop two items, compare them, and branch accordingly.
+.PP
+In addition to these basic EM instructions, there is a collection of special
+purpose instructions (e.g., to increment a local variable), which are typically
+produced from the simple ones by the peephole optimizer.
+Although the complete EM instruction set contains nearly 150 instructions,
+only about 60 of them are really primitive; the rest are simply abbreviations
+for commonly occurring EM instruction sequences.
+.PP
+Of particular interest is the way object sizes are parametrized.
+The front ends allow the user to indicate how many bytes an integer, real, etc.
+should occupy.
+Given this information, the front ends can allocate memory, determining 
+the placement of variables within the stack frame.
+Sizes for primitive types are restricted to 8, 16, 32, 64, etc. bits.
+The front ends are also parametrized by the target machine's word length
+and address size so they can tell, for example, how many "load" instructions
+to generate to move a 32-bit integer.
+In the examples used henceforth,
+we will assume a 16-bit word size and 16-bit integers.
+.PP
+Since only byte-addressable target machines are permitted,
+it is nearly
+always possible to implement any requested sizes on any target machine.
+For example, the designer of the back end tables for the Z80 should provide
+code for 8-, 16-, and 32-bit arithmetic.
+In our view, the Pascal, C, or Plain programmer specifies what lengths 
+are needed,
+without reference to the target machine,
+and the back end provides it.
+This approach greatly enhances portability.
+While it is true that doing all arithmetic using 32-bit integers on the Z80
+will not be terribly fast, we feel that if that is what the programmer needs,
+it should be possible to implement it.
+.PP
+Like all assembly languages, EM has not only machine instructions, but also
+pseudoinstructions.
+These are used to indicate the start and end of each procedure, allocate
+and initialize storage for data, and similar functions.
+One particularly important pseudoinstruction is the one that is used to
+transmit information to the back end for optimization purposes.
+It can be used to suggest variables that are good candidates to assign to
+registers, delimit the scope of loops, indicate that certain variables 
+contain a useful value (next operation is a load) or not (next operation is
+a store), and various other things.
+.NH 1
+The Peephole Optimizer
+.PP
+The peephole optimizer reads in unoptimized EM programs and writes out
+optimized ones.
+Both the input and output are expressed in a highly compact code, rather than
+in ASCII, to reduce the i/o time, which would otherwise dominate the CPU
+time.
+The program itself is table driven, and is, by and large, ignorant of the
+semantics of EM.
+The knowledge of EM is contained in a
+language- and machine-independent table consisting of about 400
+pattern-replacement pairs.
+We will briefly describe the kinds of optimizations it performs below;
+a more complete discussion can be found in [9].
+.PP
+Each line in the driving table describes one optimization, consisting of a
+pattern part and a replacement part.
+The pattern part is a series of one or more EM instructions and a boolean
+expression.
+The replacement part is a series of EM instructions with operands.
+A typical optimization might be:
+.sp
+  LOL  LOC  ADI  STL  ($1 = $4) and ($2 = 1) and ($3 = 2) ==> INL $1
+.sp
+where the text prior to the ==> symbol is the pattern and the text after it is
+the replacement.
+LOL loads a local variable onto the stack, LOC loads a constant onto the stack,
+ADI is integer addition, and STL is store local.
+The pattern specifies that four consecutive EM instructions are present, with
+the indicated opcodes, and that furthermore the operand of the first 
+instruction (denoted by $1) and the fourth instruction (denoted by $4) are the
+same, the constant pushed by LOC is 1, and the size of the integers added by
+ADI is 2 bytes.
+(EM instructions have at most one operand, so it is not necessary to specify
+the operand number.)
+Under these conditions, the four instructions can be replaced by a single INL
+(increment local) instruction whose operand is equal to that of LOL.
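+.PP
+As a concrete illustration, suppose K is a 2-byte local variable at offset -4
+in the stack frame (the offset is invented for this example).
+The statement K~:=~K~+~1 would then normally be compiled into the
+left-hand sequence below, which matches the pattern and is rewritten
+to the single instruction on the right:
+.sp
+  LOL -4; LOC 1; ADI 2; STL -4   ==>   INL -4
+.sp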
+.PP
+Although the optimizations cover a wide range, the main ones
+can be roughly divided into the following categories.
+\fIConstant folding\fR
+is used to evaluate constant expressions, such as 2*3~+~7 at
+compile time instead of run time.
+\fIStrength reduction\fR
+is used to replace one operation, such as multiply, by
+another, such as shift.
+\fIReordering of expressions\fR
+helps in cases like -K/5, which can be better
+evaluated as K/-5, because the former requires
+a division and a negation, whereas the latter requires only a division.
+\fINull instructions\fR
+include resetting the stack pointer after a call with 0 parameters,
+offsetting zero bytes to access the
+first element of a record, or jumping to the next instruction.
+\fISpecial instructions\fR
+are those like INL, which deal with common special cases
+such as adding one to a variable or comparing something to zero.
+\fIGroup moves\fR
+are useful because a sequence
+of consecutive moves can often be replaced with EM code
+that allows the back end to generate a loop instead of in line code.
+\fIDead code elimination\fR
+is a technique for removing unreachable statements, possibly made unreachable
+by previous optimizations.
+\fIBranch chain compression\fR
+can be applied when a branch instruction jumps to another branch instruction.
+The first branch can jump directly to the final destination instead of
+indirectly.
+.PP
+The last two optimizations logically belong in the global optimizer but are
+in the local optimizer for historical reasons (meaning that the local
+optimizer has been the only optimizer for many years and the optimizations were
+easy to do there).
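+.PP
+To make two of the categories above concrete, constant folding and the
+removal of a null instruction might be written as follows in the
+pattern notation introduced earlier.
+These lines are illustrative sketches and are not copied from the actual
+driving table; ASP (adjust stack pointer) is an EM instruction not
+otherwise discussed in this paper.
+.sp
+  LOC  LOC  ADI  ($3 = 2) ==> LOC $1+$2
+  ASP  ($1 = 0) ==>
+.sp
+The first line folds the addition of two constants into a single constant;
+the second deletes a stack adjustment of zero bytes, leaving an empty
+replacement.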
+.NH 1
+The Global Optimizer
+.PP
+In contrast to the peephole optimizer, which examines the EM code a few lines
+at a time through a small window, the global optimizer examines the 
+program's large scale structure.
+Three distinct types of optimizations can be found here:
+.sp
+  1. Interprocedural optimizations.
+  2. Intraprocedural optimizations.
+  3. Basic block optimizations.
+.sp
+We will now look at each of these in turn.
+.PP
+Interprocedural optimizations are those spanning procedure boundaries.
+The most important one is deciding to expand procedures in line,
+especially short procedures that occur in loops and pass several parameters.
+If it takes more time or memory to pass the parameters than to do the work,
+the program can be improved by eliminating the procedure.
+The inverse optimization -- discovering long common code sequences and
+turning them into a procedure -- is also possible, but much more difficult.
+Like much of the global optimizer's work, the decision to make or not make
+a certain program transformation is a heuristic one, based on knowledge of
+how the back end works, how most target machines are organized, etc.
+.PP
+The heart of the global optimizer is its analysis of individual
+procedures.
+To perform this analysis, the optimizer must locate the basic blocks,
+instruction sequences which can be entered only at the top and exited
+only at the bottom.
+It then constructs a data flow graph, with the basic blocks as nodes and
+jumps between blocks as arcs.
+.PP
+From the data flow graph, many important properties of the program can be
+discovered and exploited.
+Chief among these is the presence of loops, indicated by cycles in the graph.
+One important optimization is looking for code that can be moved outside the
+loop, either prior to it or subsequent to it.
+Such code motion saves execution time, although it does not save memory.
+Unrolling loops is also possible and desirable in some cases.
+.PP
+Another area in which global analysis of loops is especially important is
+in register allocation. 
+While it is true that EM does not have any registers to allocate,
+the optimizer can easily collect information to allow the
+back end to allocate registers wisely.
+For example, the global optimizer can collect static frequency-of-use
+and live/dead information about variables.
+(A variable is dead at some point in the program if its current value is
+not needed, i.e., the next reference to it overwrites it rather than
+reading it; if the current value will eventually be used, the variable is
+live.)
+If two variables are never simultaneously live over some interval of code
+(e.g., the body of a loop), they can be packed into a single variable,
+which, if used often enough, may warrant being assigned to a register.
+.PP
+Many loops involve arrays: this leads to other optimizations.
+If an array is accessed sequentially, with each iteration using the next
+higher numbered element, code improvement is often possible.
+Typically, a pointer to the bottom element of each array can be set up
+prior to the loop.
+Within the loop the element is accessed indirectly via the pointer, which is
+also incremented by the element size on each iteration.
+If the target machine has an autoincrement addressing mode and the pointer
+is assigned to a register, an array access can often be done in a single
+instruction.
+.PP
+Other intraprocedural optimizations include removing tail recursion
+(last statement is a recursive call to the procedure itself),
+topologically sorting the basic blocks to minimize the number of branch
+instructions, and common subexpression recognition.
+.PP
+The third general class of optimizations done by the global optimizer is
+improving the structure of a basic block.
+For the most part these involve transforming arithmetic or boolean
+expressions into forms that are likely to result in better target code.
+As a simple example, A~+~B*C can be converted to B*C~+~A.
+The latter can often
+be handled by loading B into a register, multiplying the register by C, and
+then adding in A, whereas the former may involve first putting A into a
+temporary, depending on the details of the code generation table.
+Another example of this kind of basic block optimization is transforming
+-B~+~A~<~0 into the equivalent, but simpler, A~<~B.
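+.PP
+In EM terms the first transformation amounts to reordering the loads.
+Assuming A, B, and C are 2-byte locals, and taking MLI to be the integer
+multiply instruction (a mnemonic not introduced above), the two forms are:
+.sp
+  A + B*C:   LOL A; LOL B; LOL C; MLI 2; ADI 2
+  B*C + A:   LOL B; LOL C; MLI 2; LOL A; ADI 2
+.sp
+In the second form the product is computed before A is touched, so the
+back end never has to hold A while the multiplication is in progress.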
+.NH 1
+The Back End
+.PP
+The back end reads a stream of EM instructions and generates assembly code
+for the target machine.
+Although the algorithm itself is machine independent, for each target
+machine a machine dependent driving table must be supplied.
+The driving table effectively defines the mapping of EM code to target code.
+.PP
+It will be convenient to think of the EM instructions being read as a
+stream of tokens.
+For didactic purposes, we will concentrate on two kinds of tokens:
+those that load something onto the stack, and those that perform some operation
+on the top one or two values on the stack.
+The back end maintains at compile time a simulated stack whose behavior
+mirrors what the stack of a hardware EM machine would do at run time.
+If the current input token is a load instruction, a new entry is pushed onto
+the simulated stack.
+.PP
+Consider, as an example, the EM code produced for the statement K~:=~I~+~7.
+If K and I are
+2-byte local variables, it will normally be LOL I; LOC 7; ADI~2; STL K.
+Initially the simulated stack is empty.
+After the first token has been read and processed, the simulated stack will
+contain a stack token of type MEM with attributes telling that it is a local,
+giving its address, etc.
+After the second token has been read and processed, the top two tokens on the
+simulated stack will be CON (constant) on top and MEM directly underneath it.
+.PP
+At this point the back end reads the ADI~2 token and
+looks in the driving table to find a line or lines that define the
+action to be taken for ADI~2.
+For a typical multiregister machine, instructions will exist to add constants
+to registers, but not to memory.
+Consequently, the driving table will not contain an entry for ADI~2 with stack
+configuration CON, MEM.
+.PP
+The back end is now faced with the problem of how to get from its
+current stack configuration, CON, MEM, which is not listed, to one that is
+listed.
+The table will normally contain rules (which we call "coercions")
+for converting between CON, REG, MEM, and similar tokens.
+Therefore the back end attempts to "coerce" the stack into a configuration
+that
+.I is
+present in the table.
+A typical coercion rule might tell how to convert a MEM into
+a REG, namely by performing the actions of allocating a
+register and emitting code to move the memory word to that register.
+Having transformed the compile-time stack into a configuration allowed for
+ADI~2, the rule can be carried out.
+A typical rule 
+for ADI~2 might have stack configuration REG, MEM
+and would emit code to add the MEM to the REG, leaving the stack
+with a single REG token instead of the REG and MEM tokens present before the
+ADI~2.
+.PP
+In general, there will be more than one possible coercion path.
+Assuming reasonable coercion rules for our example,
+we might be able to convert
+CON MEM into CON REG by loading the variable I into a register.
+Alternatively, we could coerce CON to REG by loading the constant into a register.
+The first coercion path does the add by first loading I into a register and
+then adding 7 to it.
+The second path first loads 7 into a register and then adds I to it.
+On machines with a fast LOAD IMMEDIATE instruction for small constants
+but no fast ADD IMMEDIATE, or vice
+versa, one code sequence will be preferable to the other.
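+.PP
+For K~:=~I~+~7 the two paths might lead to target code such as the
+following, written in PDP-11 style notation for a hypothetical machine;
+the register r0, the frame pointer r5, and the symbolic offsets I and K
+are assumptions made for the example:
+.sp
+  Path 1:   mov I(r5),r0;  add #7,r0;  mov r0,K(r5)
+  Path 2:   mov #7,r0;  add I(r5),r0;  mov r0,K(r5)
+.sp
+Both sequences leave I~+~7 in K; they differ only in which operand is
+loaded and which is added.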
+.PP
+In fact, we actually have more choices than suggested above.
+In both coercion paths a register must be allocated.
+On many machines, not every register can be used in every operation, so the
+choice may be important.
+On some machines, for example, the operand of a multiply must be in an odd
+register.
+To summarize, from any state (i.e., token and stack configuration), a
+variety of choices can be made, leading to a variety of different target
+code sequences.
+.PP
+To decide which of the various code sequences to emit, the back end must have
+some information about the time and memory cost of each one.
+To provide this information, each rule in the driving table, including
+coercions, specifies both the time and memory cost of the code emitted when
+the rule is applied.
+The back end can then simply try each of the legal possibilities (including all
+the possible register allocations) to find the cheapest one.
+.PP
+This situation is similar to that found in a chess or other game-playing
+program, in which from any state a finite number of moves can be made.
+Just as in a chess program, the back end can look at all the "moves" that can
+be made from each state reachable from the original state, and thus find the
+sequence that gives the minimum cost to a depth of one.
+More generally, the back end can evaluate all paths corresponding to accepting
+the next
+.I N
+input tokens, find the cheapest one, and then make the first move along
+that path, precisely the way a chess program would.
+.PP
+Since the back end is analogous to both a parser and a chess playing program,
+some clarifying remarks may be helpful.
+First, chess programs and the back end must do some look ahead, whereas the
+parser for a well-designed grammar can usually suffice with one input token
+because grammars are supposed to be unambiguous.
+In contrast, many legal mappings
+from a sequence of EM instructions to target code may exist.
+Second, like a parser but unlike a chess program, the back end has perfect
+information -- it does not have to contend with an unpredictable opponent's
+moves.
+Third, chess programs normally make a static evaluation of the board and
+label the
+.I nodes
+of the tree with the resulting scores.
+The back end, in contrast, associates costs with
+.I arcs
+(moves) rather than nodes (states).
+However, the difference is not essential, since it could 
+also label each node with the cumulative cost from the root to that node.
+.PP
+As mentioned above, the cost field in the table contains
+.I both
+the time and memory costs for the code emitted.
+It should be clear that the back end could use either one
+or some linear combination of them as the scoring function for evaluating moves.
+A user can instruct the compiler to optimize for time or for memory or
+for, say,  0.3 x time + 0.7 x memory.
+Thus the same compiler can provide a wide range of performance options to
+the user.
+The writer of the back end table can take advantage of this flexibility by
+providing several code sequences with different tradeoffs for each EM
+instruction (e.g., in line code vs. call to a run time routine).
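+.PP
+A small worked example shows how the weighting can change the outcome.
+Suppose two candidate code sequences have the following costs
+(the numbers are invented for illustration):
+.sp
+  Sequence A:   time 6,  memory 4
+  Sequence B:   time 5,  memory 6
+.sp
+Optimizing for time alone selects B (5 versus 6).
+With the scoring function 0.3 x time + 0.7 x memory, A scores
+0.3 x 6 + 0.7 x 4 = 4.6 and B scores 0.3 x 5 + 0.7 x 6 = 5.7,
+so A is selected instead.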
+.PP
+In addition to the time-space tradeoffs, by specifying the depth of search
+parameter,
+.I N ,
+the user can effectively also trade off compile time vs. object
+code quality, for whatever code metric has been chosen.
+In summary, by combining the properties of a parser and a game playing program,
+it is possible to make a code generator that is table driven,
+highly flexible, and has the ability to produce good code from a
+stack machine intermediate code.
+.NH 1
+The Target Machine Optimizer
+.PP
+In the model of Fig. 2, the peephole optimizer comes before the global
+optimizer.
+It may happen that the code produced by the global optimizer can also
+be improved by another round of peephole optimization.
+Conceivably, the system could have been designed to iterate peephole and
+global optimizations until no more of either could be performed.
+.PP
+However, both of these optimizations are done on the machine independent
+EM code.
+Neither is able to take advantage of the peculiarities and idiosyncrasies with
+which most target machines are well endowed.
+It is the function of the final 
+optimizer to do any (peephole) optimizations that still remain.
+.PP
+The algorithm used here is the same as in the EM peephole optimizer.
+In fact, if it were not for the differences between EM syntax, which is
+very restricted, and target assembly language syntax,
+which is less so, precisely the same program could be used for both.
+Nevertheless, the same ideas apply concerning patterns and replacements, so
+our discussion of this optimizer will be restricted to one example.
+.PP
+To see what the target optimizer might do, consider the
+PDP-11 instruction sequence sub #2,r0;  mov (r0),x.
+First 2 is subtracted from register 0, then the word pointed to by it
+is moved to x.
+The PDP-11 happens to have an addressing mode to perform this sequence in
+one instruction: mov -(r0),x.
+Although it is conceivable that this instruction could be included in the
+back end driving table for the PDP-11, it is awkward to do so because it
+can occur in so many contexts.
+It is much easier to catch things like this in a separate program.
+.NH 1
+The Universal Assembler/Linker
+.PP
+Although assembly languages for different machines may appear very different
+at first glance, they have a surprisingly large intersection.
+We have been able to construct an assembler/linker that is almost entirely
+independent of the assembly language being processed.
+To tailor the program to a specific assembly language, it is necessary to
+supply a table giving the list of instructions, the bit patterns required for
+each one, and the language syntax.
+The machine independent part of the assembler/linker is then compiled with the
+table to produce an assembler and linker for a particular target machine.
+Experience has shown that writing the necessary table for a new machine can be
+done in less than a week.
+.PP
+To enforce a modicum of uniformity, we have chosen to use a common set of
+pseudoinstructions for all target machines.
+They are used to initialize memory, allocate uninitialized memory, determine the
+current segment, and similar functions found in most assemblers.
+.PP
+The assembler is also a linker.
+After assembling a program, it checks to see if there are any
+unsatisfied external references.
+If so, it begins reading the libraries to find the necessary routines, including
+them in the object file as it finds them.
+This approach requires libraries to be maintained in assembly language form,
+but eliminates the need for inventing a language to express relocatable
+object programs in a machine independent way.
+It also simplifies the assembler, since producing absolute object code is
+easier than producing relocatable object code.
+Finally, although assembly language libraries may be somewhat larger than
+relocatable object module libraries, the loss in speed due to having more
+input may be more than compensated for by not having to pass an intermediate
+file between the assembler and linker.
+.NH 1
+The Utility Package
+.PP
+The utility package is a collection of programs designed to aid the
+implementers of new front ends or new back ends.
+The most useful ones are the test programs.
+For example, one test set, EMTEST, systematically checks out a back end by
+executing an ever larger subset of the EM instructions.
+It starts out by testing LOC, LOL and a few of the other essential instructions.
+If these appear to work, it then tries out new instructions one at a time,
+adding them to the set of instructions "known" to work as they pass the tests.
+.PP
+Each instruction is tested with a variety of operands chosen from values 
+where problems can be expected.
+For example, on target machines which have 16-bit index registers but only
+allow 8-bit displacements, a fundamentally different algorithm may be needed
+for accessing
+the first few bytes of local variables and those with offsets of thousands.
+The test programs have been carefully designed to thoroughly test all relevant
+cases.
+.PP
+In addition to EMTEST, test programs in Pascal, C, and other languages are also
+available.
+A typical test is:
+.sp
+   i := 9; \fBif\fP i + 250 <> 259 \fBthen\fP error(16);
+.sp
+Like EMTEST, the other test programs systematically exercise all features of the
+language being tested, and do so in a way that makes it possible to pinpoint
+errors precisely.
+While it has been said that testing can only demonstrate the presence of errors
+and not their absence, our experience is that 
+the test programs have been invaluable in debugging new parts of the system
+quickly.
+.PP
+Other utilities include programs to convert
+the highly compact EM code produced by front ends to ASCII and vice versa,
+programs to build various internal tables from human writable input formats,
+a variety of libraries written in or compiled to EM to make them portable,
+an EM assembler, and EM interpreters for various machines.
+.PP
+Interpreting the EM code instead of translating it to target machine language
+is useful for several reasons.
+First, the interpreters provide extensive run time diagnostics including
+an option to list the original source program (in Pascal, C, etc.) with the
+execution frequency or execution time for each source line printed in the
+left margin.
+Second, since an EM program is typically about one-third the size of a
+compiled program, large programs can be executed on small machines.
+Third, running the EM code directly makes it easier to pinpoint errors in 
+the EM output of front ends still being debugged.
+.NH 1
+Summary and Conclusions
+.PP
+The Amsterdam Compiler Kit is a tool kit for building
+portable (cross) compilers and interpreters.
+The main pieces of the kit are the front ends, which convert source programs
+to EM code, optimizers, which improve the EM code, and back ends, which convert
+the EM code to target assembly language.
+The kit is highly modular, so writing one front end
+(and its associated runtime routines)
+is sufficient to implement
+a new language on a dozen or more machines, and writing one back end table
+and one universal assembler/linker table is all that is needed to bring up all
+the previously implemented languages on a new machine.
+In this manner, the contents, and hopefully the usefulness, of the toolkit
+will increase in time.
+.PP
+We believe the principal lesson to be learned from our work is that the old
+UNCOL idea is basically a sound way to produce compilers, provided suitable
+restrictions are placed on the source languages and target machines.
+We also believe that although compilers produced by this technology may not
+be equal to the very best handcrafted compilers,
+in terms of object code quality, they are certainly
+competitive with many existing compilers.
+However, when one factors in the cost of producing the compiler,
+the possible slight loss in performance may be more than compensated for by the
+large decrease in production cost.
+As a consequence of our work and similar work by other researchers [1,3,4],
+we expect integrated compiler building kits to become increasingly popular
+in the near future.
+.PP
+The toolkit is now available for various computers running the
+.UX
+operating system.
+For information, contact the authors.
+.NH 1
+References
+.LP
+.nr r 0 1
+.in +4
+.ti -4
+\fB~\n+r.\fR Graham, S.L.
+Table-Driven Code Generation.
+.I "Computer~13" ,
+8 (August 1980), 25-34.
+.PP
+A discussion of systematic ways to do code generation,
+in particular, the idea of having a table with templates that match parts of
+the parse tree and convert them into machine instructions.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Haddon, B.K., and Waite, W.M.
+Experience with the Universal Intermediate Language Janus.
+.I "Software Practice & Experience~8" ,
+5 (Sept.-Oct. 1978), 601-616.
+.PP
+An intermediate language for use with ALGOL 68, Pascal, etc. is described.
+The paper discusses some problems encountered and how they were dealt with.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Johnson, S.C.
+A Portable Compiler: Theory and Practice.
+.I "Ann. ACM Symp. Prin. Prog. Lang." ,
+Jan. 1978.
+.PP
+A cogent discussion of the portable C compiler.
+Particularly interesting are the author's thoughts on the value of
+computer science theory.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Leverett, B.W., Cattell, R.G.G., Hobbs, S.O., Newcomer, J.M.,
+Reiner, A.H., Schatz, B.R., and Wulf, W.A.
+An Overview of the Production-Quality Compiler-Compiler Project.
+.I Computer~13 ,
+8 (August 1980), 38-49.
+.PP
+PQCC is a system for building compilers similar in concept but differing in
+details from the Amsterdam Compiler Kit.
+The paper describes the intermediate representation used and the code generation
+strategy.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Lowry, E.S., and Medlock, C.W.
+Object Code Optimization.
+.I "Commun.~ACM~12" ,
+1 (Jan. 1969), 13-22.
+.PP
+A classic paper on global object code optimization.
+It covers data flow analysis, common subexpressions, code motion, register
+allocation and other techniques.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Nori, K.V., Ammann, U., Jensen, K., and Nageli, H.
+The Pascal P Compiler Implementation Notes.
+Eidgen. Tech. Hochschule, Zurich, 1975.
+.PP
+A description of the original P-code machine, used to transport the Pascal-P
+compiler to new computers.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Steel, T.B., Jr.
+UNCOL: the Myth and the Fact.
+In
+.I "Ann. Rev. Auto. Prog." ,
+Goodman, R. (ed.), vol. 2 (1960), 325-344.
+.PP
+An introduction to the UNCOL idea by its originator.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Steel, T.B., Jr.
+A First Version of UNCOL.
+.I "Proc. Western Joint Comp. Conf." ,
+(1961), 371-377.
+.PP
+The first detailed proposal for an UNCOL.  By current standards it is a
+primitive language, but it is interesting for its historical perspective.
+.sp 2
+.ti -4
+\fB~\n+r.\fR Tanenbaum, A.S., van Staveren, H., and Stevenson, J.W.
+Using Peephole Optimization on Intermediate Code.
+.I "ACM Trans. Prog. Lang. and Sys. 4" ,
+1 (Jan. 1982), 21-36.
+.PP
+A detailed description of a table-driven peephole optimizer.
+The driving table provides a list of patterns to match as well as the
+replacement text to use for each successful match.
+.sp 2
+.ti -4
+\fB\n+r.\fR Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren, H.
+Description of an Experimental Machine Architecture for use with Block
+Structured Languages.
+Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
+.PP
+The defining document for EM.
+.sp 2
+.ti -4
+\fB\n+r.\fR Tanenbaum, A.S.
+Implications of Structured Programming for Machine Architecture.
+.I "Comm. ACM~21" ,
+3 (March 1978), 237-246.
+.PP
+The background and motivation for the design of EM.
+This early version emphasized the idea of interpreting the intermediate
+code (then called EM-1) rather than compiling it.
diff --git a/doc/v7bugs.doc b/doc/v7bugs.doc
new file mode 100644
index 000000000..ef8255f2c
--- /dev/null
+++ b/doc/v7bugs.doc
@@ -0,0 +1,302 @@
+.wh 0 hd
+.wh 60 fo
+.de hd
+'sp 5
+..
+.de fo
+'bp
+..
+.nr e 0 1
+.de ER
+.br
+.ne 20
+.sp 2
+.in 5
+.ti -5
+ERROR \\n+e:
+..
+.de PS
+.sp
+.nf
+.in +5
+..
+.de PE
+.sp
+.fi
+.in -5
+..
+.sp 3
+.ce
+UNIX version 7 bugs
+.sp 3
+This document describes the UNIX version 7 errors fixed at the
+Vrije Universiteit, Amsterdam.
+Several of these were discovered at the VU.
+Others are quoted from a list of bugs distributed by Bell Labs.
+.sp
+For each error the differences between the original and modified
+source files are given,
+as well as a test program.
+.ER
+C optimizer bug for unsigned comparison
+.sp
+The following C program caused an IOT trap, although it should not
+(compile with 'cc -O prog.c'):
+.PS
+unsigned	i = 0;
+
+main() {
+	register j;
+
+	j = -1;
+	if (i > 40000)
+		abort();
+}
+.PE
+Bell Labs suggests making the following patch in c21.c:
+.PS
+/* modified /usr/src/cmd/c/c21.c */
+
+189		if (r==0) {
+190	/* next 2 lines replaced as indicated by
+191	 * Bell Labs bug distribution ( v7optbug )
+192			p->back->back->forw = p->forw;
+193			p->forw->back = p->back->back;
+194	  End of lines changed */
+195			if (p->forw->op==CBR
+196			  || p->forw->op==SXT
+197			  || p->forw->op==CFCC) {
+198				p->back->forw = p->forw;
+199				p->forw->back = p->back;
+200			} else {
+201				p->back->back->forw = p->forw;
+202				p->forw->back = p->back->back;
+203			}
+204	/* End of new lines */
+205			decref(p->ref);
+206			p = p->back->back;
+207			nchange++;
+208		} else if (r>0) {
+.PE
+Use the previous program to test before and after the modification.
+.ER
+The loader fails for large data or text portions
+.sp
+The loader 'ld' produces a "local symbol botch" error
+for the following C program.
+.PS
+int	big1[10000] = {
+	1
+};
+int	big2[10000] = {
+	2
+};
+
+main() {
+	printf("loader is fine\\n");
+}
+.PE
+We have made the following fix:
+.PS
+/* original /usr/src/cmd/ld.c */
+
+113	struct {
+114		int	fmagic;
+115		int	tsize;
+116		int	dsize;
+117		int	bsize;
+118		int	ssize;
+119		int	entry;
+120		int	pad;
+121		int	relflg;
+122	} filhdr;
+
+/* modified /usr/src/cmd/ld.c */
+
+113	/*
+114	 * The original Version 7 loader had problems loading large
+115	 * text or data portions.
+116	 * Why not include <a.out.h> ???
+117	 * then they would be declared unsigned
+118	 */
+119	struct {
+120		int	fmagic;
+121		unsigned	tsize;		/* not int !!! */
+122		unsigned	dsize;		/* not int !!! */
+123		unsigned	bsize;		/* not int !!! */
+124		unsigned	ssize;		/* not int !!! */
+125		unsigned	entry;		/* not int !!! */
+126		unsigned	pad;		/* not int !!! */
+127		unsigned	relflg;		/* not int !!! */
+128	} filhdr;
+.PE
+.ER
+Floating point registers
+.sp
+When a program was swapped to disk because it needed more memory,
+the floating point registers were not saved, so
+their contents could be different when the program was restarted.
+A small assembly program demonstrates this for the status register.
+If the error is not fixed, then the program generates an IOT error.
+A "memory fault" is generated if all is fine.
+.PS
+start:	ldfps	$7400
+1:	stfps	r0
+	mov	r0,-(sp)
+	cmp	r0,$7400
+	beq	1b
+	4
+.PE
+You have to dig into the kernel to fix it.
+The following patch will do:
+.PS
+/* original /usr/sys/sys/slp.c */
+
+563		a2 = malloc(coremap, newsize);
+564		if(a2 == NULL) {
+565			xswap(p, 1, n);
+566			p->p_flag |= SSWAP;
+567			qswtch();
+568			/* no return */
+569		}
+
+/* modified /usr/sys/sys/slp.c */
+
+590		a2 = malloc(coremap, newsize);
+591		if(a2 == NULL) {
+592	#ifdef FPBUG
+593			/*
+594			 * copy floating point register and status,
+595			 * but only if you must switch processes
+596			 */
+597			if(u.u_fpsaved == 0) {
+598				savfp(&u.u_fps);
+599				u.u_fpsaved = 1;
+600			}
+601	#endif
+602			xswap(p, 1, n);
+603			p->p_flag |= SSWAP;
+604			qswtch();
+605			/* no return */
+606		}
+.PE
+.ER
+Floating point registers.
+.sp
+A similar problem arises when a process forks.
+The child will have random floating point registers as is
+demonstrated by the following assembly language program.
+The child process will die by an IOT trap and the father prints
+the message "child failed".
+.PS
+exit	= 1.
+fork	= 2.
+write	= 4.
+wait	= 7.
+
+start:	ldfps	$7400
+	sys	fork
+	br	child
+	sys	wait
+	tst	r1
+	bne	bad
+	stfps	r2
+	cmp	r2,$7400
+	beq	start
+	4
+child:	stfps	r2
+	cmp	r2,$7400
+	beq	ex
+	4
+bad:	clr	r0
+	sys	write;mess;13.
+ex:	clr	r0
+	sys	exit
+
+	.data
+mess:	<child failed\\n>
+.PE
+The same file slp.c should be patched as follows:
+.PS
+/* original /usr/sys/sys/slp.c */
+
+499		/*
+500		 * When the resume is executed for the new process,
+501		 * here's where it will resume.
+502		 */
+503		if (save(u.u_ssav)) {
+504			sureg();
+505			return(1);
+506		}
+507		a2 = malloc(coremap, n);
+508		/*
+509		 * If there is not enough core for the
+510		 * new process, swap out the current process to generate the
+511		 * copy.
+512		 */
+
+/* modified /usr/sys/sys/slp.c */
+
+519		/*
+520		 * When the resume is executed for the new process,
+521		 * here's where it will resume.
+522		 */
+523		if (save(u.u_ssav)) {
+524			sureg();
+525			return(1);
+526		}
+527	#ifdef FPBUG
+528		/* copy the floating point registers and status to child */
+529		if(u.u_fpsaved == 0) {
+530			savfp(&u.u_fps);
+531			u.u_fpsaved = 1;
+532		}
+533	#endif
+534		a2 = malloc(coremap, n);
+535		/*
+536		 * If there is not enough core for the
+537		 * new process, swap out the current process to generate the
+538		 * copy.
+539		 */
+.PE
+.ER
+/usr/src/libc/v6/stat.c
+.sp
+Some system calls are changed from version 6 to version 7.
+A library of system call entries, that make a version 6 UNIX look like
+a version 7 system, is provided to enable you to run some
+useful version 7 utilities, like 'tar', on UNIX-6.
+The entry for 'stat' contained two bugs:
+the 24-bit file size was incorrectly converted to 32 bits
+(sign extension of bit 15)
+and the uid/gid fields suffered from sign extension.
+.sp
+Transferring your files from version 6 to version 7 using 'tar'
+will fail for all files for which
+.sp
+	( (size & 0100000) != 0 )
+.sp
+These two errors are fixed if stat.c is modified as follows:
+.PS
+/* original /usr/src/libc/v6/stat.c */
+
+11		char  os_size0;
+12		short os_size1;
+13		short os_addr[8];
+
+49		buf->st_nlink = osbuf.os_nlinks;
+50		buf->st_uid = osbuf.os_uid;
+51		buf->st_gid = osbuf.os_gid;
+52		buf->st_rdev = 0;
+
+/* modified /usr/src/libc/v6/stat.c */
+
+11		char  os_size0;
+12		unsigned os_size1;
+13		short os_addr[8];
+
+49		buf->st_nlink = osbuf.os_nlinks;
+50		buf->st_uid = osbuf.os_uid & 0377;
+51		buf->st_gid = osbuf.os_gid & 0377;
+52		buf->st_rdev = 0;
+.PE
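Note on the stat.c fix above: a minimal sketch, in C, of why both sign-extension bugs occur and why the two changes repair them. The function names are hypothetical. The fixed-width types from <stdint.h> stand in for the 16-bit PDP-11 short and unsigned, and signed char stands in for the PDP-11 char. The way the two size fields are combined is an assumption, since the patch does not show that line of stat.c.

```c
#include <stdint.h>

/* Original declaration: the low 16 bits of the size are a signed short.
 * When promoted, a set bit 15 is sign-extended into the upper bits.
 * (Assumed combination of os_size0 and os_size1; not shown in the patch.) */
long v6_size_original(uint8_t size0, int16_t size1)
{
	return ((long)size0 << 16) + (long)size1;
}

/* Modified declaration: the low word is unsigned, so no sign extension. */
long v6_size_modified(uint8_t size0, uint16_t size1)
{
	return ((long)size0 << 16) + (long)size1;
}

/* Original uid/gid copy: a signed char above 0177 becomes negative. */
int v6_id_original(signed char id)
{
	return id;
}

/* Modified uid/gid copy: the mask keeps only the low 8 bits. */
int v6_id_modified(signed char id)
{
	return id & 0377;
}
```

With a low word of 0100000, the original declaration yields -32768 where the modified one yields 32768, which matches the failing condition (size & 0100000) != 0 quoted in the text.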
diff --git a/doc/val.doc b/doc/val.doc
new file mode 100644
index 000000000..b344e5912
--- /dev/null
+++ b/doc/val.doc
@@ -0,0 +1,752 @@
+.ll 72
+.wh 0 hd
+.wh 60 fo
+.de hd
+'sp 5
+..
+.de fo
+'bp
+..
+.tr ~
+.               PARAGRAPH
+.de PP
+.sp
+..
+.               CHAPTER
+.de CH
+.br
+.ne 15
+.sp 3
+.in 0
+\\fB\\$1\\fR
+.in 5
+.PP
+..
+.               SUBCHAPTER
+.de SH
+.br
+.ne 10
+.sp
+.in 5
+\\fB\\$1\\fR
+.in 10
+.PP
+..
+.               INDENT START
+.de IS
+.sp
+.in +5
+..
+.               INDENT END
+.de IE
+.in -5
+.sp
+..
+.               DOUBLE INDENT START
+.de DS
+.sp
+.in +5
+.ll -5
+..
+.               DOUBLE INDENT END
+.de DE
+.ll +5
+.in -5
+.sp
+..
+.               EQUATION START
+.de EQ
+.sp
+.nf
+..
+.               EQUATION END
+.de EN
+.fi
+.sp
+..
+.               TEST
+.de TT
+.ti -5
+Test~\\$1:~
+.br
+..
+.               IMPLEMENTATION 1
+.de I1
+.br
+Implementation~1:
+..
+.               IMPLEMENTATION 2
+.de I2
+.br
+Implementation~2:
+..
+.de CS
+.br
+~-~\\
+..
+.br
+.fi
+.sp 5
+.ce
+\fBPascal Validation Suite Report\fR
+.CH "Pascal processor identification"
+The ACK-Pascal compiler produces code for an EM machine
+as defined in [1].
+It is up to the implementor of the EM machine whether errors like
+integer overflow, undefined operand and range bound error are recognized or not.
+Therefore it depends on the EM machine implementation whether these errors
+are recognized in Pascal programs or not.
+The validation suite results of all known implementations are given.
+.PP
+There does not (yet) exist a hardware EM machine.
+Therefore, EM programs must be interpreted, or translated into
+instructions for a target machine.
+The following implementations currently exist:
+.IS
+.I1
+an interpreter running on a PDP-11 (using UNIX).
+The normal mode of operation for this interpreter is to check
+for undefined integers, overflow, range errors etc.
+.sp
+.I2
+a translator into PDP-11 instructions (using UNIX).
+Fewer checks are performed than in the interpreter, because the translator
+is intended to speed up the execution of well-debugged programs.
+.IE
+.CH "Test Conditions"
+Tester: E.G. Keizer
+.br
+Date: October 1983
+.br
+Validation Suite version: 3.0
+.PP
+The final test run is made with a slightly
+modified validation suite.
+.SH "Erroneous programs"
+Some tests did not conform to the standard proposal of February 1979.
+It is this version of the standard proposal that is used
+by the authors of the validation suite.
+.IS
+.TT 6.6.3.7-4
+The semicolon between high and integer on line 17 is replaced
+by a colon.
+.sp
+.TT 6.7.2.2-13
+The div operator on line 14 is replaced by mod.
+.CH "Conformance tests"
+Number of tests passed = 150
+.br
+Number of tests failed = 6
+.SH "Details of failed tests"
+.IS
+.TT 6.1.2-1
+Character sequences starting with the 8 characters 'procedur'
+or 'function' are
+erroneously classified as the word-symbols 'procedure' and 'function'.
+.sp
+.TT 6.1.3-2
+Identifiers identical in the first eight characters, but
+differing in ninth or higher numbered characters are treated as
+identical.
+.sp
+.TT 6.5.1-1
+ACK-Pascal requires all formal program parameters to be
+declared with type \fIfile\fP.
+.sp
+.TT 6.6.6.5-1
+Gives run-time error eof seen at call to eoln.
+I have a hunch that this is an error in the suite.
+.sp
+.TT 6.6.4.1-1
+Redefining the names of some standard procedures leads to incorrect
+behaviour of the runtime system.
+In this case it crashes without a sensible error message.
+.sp
+.TT 6.9.3.5.1-1
+This test can not be translated by our compiler because two
+non-identical variables are used in the same block with the same first eight
+characters.
+The test passed after replacement of one of those names.
+.IE
+.CH "Deviance tests"
+Number of deviations correctly detected = 120
+.br
+Number of tests not detecting deviations = 20
+.SH "Details of deviations"
+The following tests are compiled without a proper error
+indication although they do
+not conform to the standard.
+.IS
+.TT 6.1.6-5
+ACK-Pascal allows labels in the range 0..32767.
+A warning is produced when testing for deviations from the
+standard.
+.sp
+.TT 6.1.8-5
+A missing space between a number and a word symbol is not
+detected.
+.sp
+.TT 6.2.2-8
+.TT 6.3-6
+.TT 6.4.1-3
+.TT 6.6.1-3
+.TT 6.6.1-4
+Undetected scope error. The scope of an identifier should start at the
+beginning of the block in which it is declared.
+In the ACK-Pascal compiler the scope starts just after the declaration,
+however.
+.sp
+.TT 6.4.3.3-7
+The values of fields from one variant are accessible from
+another variant.
+The correlation is exact.
+.sp
+.TT 6.6.3.3-4
+The passing as a variable parameter of the selector of a
+variant part is not detected.
+A runtime error is produced because the variant selector is not
+initialized.
+.sp
+.TT 6.8.2.4-2
+.TT 6.8.2.4-3
+.TT 6.8.2.4-4
+.TT 6.8.2.4-5
+.TT 6.8.2.4-6
+The ACK-Pascal compiler does not restrict the places from where
+you may jump to a label by means of a goto-statement.
+.sp
+.TT 6.8.3.9-5
+.TT 6.8.3.9-6
+.TT 6.8.3.9-7
+.TT 6.8.3.9-16
+There are no errors produced for assignments to a variable
+in use as control-variable of a for-statement.
+.TT 6.8.3.9-8
+.TT 6.8.3.9-9
+Use of a controlled variable after leaving the loop without
+intervening initialization is not detected.
+.IE
+.CH "Error handling"
+The results depend on the EM implementation.
+.sp
+Number of errors correctly detected =
+.in +5
+.I1
+32
+.I2
+17
+.in -5
+Number of errors not detected =
+.in +5
+.I1
+21
+.I2
+36
+.in -5
+Number of errors incorrectly detected =
+.in +5
+.I1
+2
+.I2
+2
+.in -5
+.SH "Details of errors not detected"
+The following test fails because the ACK-Pascal compiler only
+generates a warning, which does not prevent the test from running.
+.IS
+.TT 6.6.2-8
+A warning is produced if there is no assignment to a function-identifier.
+.IE
+With this test the ACK-Pascal compiler issues an error message for a legal
+construct not directly related to the error to be detected.
+.IS
+.TT 6.5.5-2
+Program does not compile.
+Buffer variable of text file is not allowed as variable
+parameter.
+.IE
+The following errors are not detected at all.
+.IS
+.TT 6.2.1-11
+.I2
+The use of an undefined integer is not caught as an error.
+.sp
+.TT 6.4.3.3-10
+.TT 6.4.3.3-11
+.TT 6.4.3.3-12
+.TT 6.4.3.3-13
+The notion of 'current variant' is not implemented, not even if a tagfield
+is present.
+.sp
+.TT 6.4.5-15
+.TT 6.4.6-9
+.TT 6.4.6-10
+.TT 6.4.6-11
+.TT 6.5.3.2-2
+.I2
+Subrange bounds are not checked.
+.sp
+.TT 6.4.6-12
+.TT 6.4.6-13
+.TT 6.7.2.4-4
+If the base-type of a set is a subrange, then the set elements are not checked
+against the bounds of the subrange.
+Only the host-type of this subrange-type is relevant for ACK-Pascal.
+.sp
+.TT 6.5.4-1
+.I2
+Nil pointers are not detected.
+.sp
+.TT 6.5.4-2
+.I2
+Undefined pointers are not detected.
+.sp
+.TT 6.5.5-3
+Changing the file position while the window is in use as actual variable
+parameter or as an element of the record variable list of a with-statement
+is not detected.
+.sp
+.TT 6.6.2-9
+An undefined function result is not detected,
+because it is never used in an expression.
+.sp
+.TT 6.6.5.3-6
+.TT 6.6.5.3-7
+Disposing a variable while it is in use as actual variable parameter or
+as an element of the record variable list of a with-statement is not detected.
+.sp
+.TT 6.6.5.3-8
+.TT 6.6.5.3-9
+.TT 6.6.5.3-10
+It is not detected that a record variable, created with the variant form
+of new, is used as an operand in an expression or as the variable in an
+assignment or as an actual value parameter.
+.sp
+.TT 6.6.5.3-11
+Use of a variable that is not reinitialized after a dispose is
+not detected.
+.sp
+.TT 6.6.6.4-4
+.TT 6.6.6.4-5
+.TT 6.6.6.4-7
+.I2
+There are no range checks for pred, succ and chr.
+.sp
+.TT 6.6.6.5-6
+ACK-Pascal considers a rewrite of a file as a defining
+occurrence.
+.sp
+.TT 6.7.2.2-8
+.TT 6.7.2.2-9
+.TT 6.7.2.2-10
+.TT 6.7.2.2-12
+.I2
+Division by 0 or integer overflow is not detected.
+.sp
+.TT 6.8.3.9-18
+The use of the same control variable in two nested for
+statements is not detected.
+.sp
+.TT 6.8.3.9-19
+Access of a control variable after leaving the loop results in
+the final-value, although an error should be produced.
+.sp
+.TT 6.9.3.2-3
+The program stops with a file not open error.
+The rewrite before the write is missing in the program.
+.sp
+.TT 6.9.3.2-4
+.TT 6.9.3.2-5
+Illegal FracDigits values are not detected.
+.CH "Implementation dependence"
+Number of tests run = 14
+.br
+Number of tests incorrectly handled = 0
+.SH "Details of implementation dependence"
+.IS
+.TT 6.1.9-5
+Alternate comment delimiters are implemented.
+.sp
+.TT 6.1.9-6
+The equivalent symbols @ for ^, (. for [ and .) for ] are not
+implemented.
+.sp
+.TT 6.4.2.2-10
+Maxint = 32767
+.sp
+.TT 6.4.3.4-5
+Only elements with non-negative ordinal value are allowed in sets.
+.sp
+.TT 6.6.6.1-1
+Standard procedures and functions are not allowed as parameters.
+.sp
+.TT 6.6.6.2-11
+Details of the machine characteristics regarding real numbers:
+.IS
+.nf
+beta =       2
+t =         56
+rnd =        1
+ngrd =       0
+machep =   -56
+negep =    -56
+iexp =       8
+minexp =  -128
+maxexp =   127
+eps =     1.387779e-17
+epsneg =  1.387779e-17
+xmin =    2.938736e-39
+xmax =    1.701412e+38
+.fi
+.IE
+.sp
+.TT 6.7.2.3-3
+.TT 6.7.2.3-4
+All operands of boolean expressions are evaluated.
+.sp
+.TT 6.8.2.2-1
+.TT 6.8.2.2-2
+The expression in an assignment statement is evaluated
+before the variable selection if this involves pointer
+dereferencing or array indexing.
+.sp
+.TT 6.8.2.3-2
+Actual parameters are evaluated in reverse order.
+.sp
+.TT 6.9.3.2-6
+The default widths for integer, Boolean and real are 6, 5 and 13.
+.sp
+.TT 6.9.3.5.1-2
+The number of digits written in an exponent is 2.
+.sp
+.TT 6.9.3.6-1
+The representations of true and false are (~true) and (false).
+The parentheses serve to indicate width.
+.IE
+.CH "Quality measurement"
+Number of tests run = 60
+.br
+Number of tests handled incorrectly = 1
+.SH "Results of tests"
+Several tests perform operations on reals and indicate the error
+introduced by these operations.
+For each of these tests the following two quality measures are extracted:
+.sp
+.in +5
+maxRE:~~maximum relative error
+.br
+rmsRE:~~root-mean-square relative error
+.in -5
+.sp 2
+.IS
+.TT 1.2-1
+.I1
+25 thousand Whetstone instructions per second.
+.I2
+169 thousand Whetstone instructions per second.
+.sp
+.TT 1.2-2
+The value of (TRUEACC-ACC)*2^56/100000 is 1.4 .
+This is well within the bounds specified in [3].
+.br
+The GAMM measure is:
+.I1
+238 microseconds
+.I2
+26.3 microseconds.
+.sp
+.TT 1.2-3
+The number of procedure calls calculated in this test exceeds
+the maximum integer value.
+The program stops indicating overflow.
+.sp
+.TT 6.1.3-3
+The number of significant characters for identifiers is 8.
+.sp
+.TT 6.1.5-8
+There is no maximum to the line length.
+.sp
+.TT 6.1.5-9
+The error message "too many digits" is given for numbers larger
+than maxint.
+.sp
+.TT 6.1.5-10
+.TT 6.1.5-11
+.TT 6.1.5-12
+Normal values are allowed for real constants and variables.
+.sp
+.TT 6.1.7-14
+A reasonably large number of strings is allowed.
+.sp
+.TT 6.1.8-6
+No warning is given for possibly unclosed comments.
+.sp
+.TT 6.2.1-12
+.TT 6.2.1-13
+.TT 6.2.1-14
+.TT 6.2.1-15
+.TT 6.5.1-2
+Large lists of declarations are possible in each block.
+.sp
+.TT 6.4.3.2-6
+An 'array[integer] of' is not allowed.
+.sp
+.TT 6.4.3.2-7
+.TT 6.4.3.2-8
+Large values are allowed for arrays and indices.
+.sp
+.TT 6.4.3.3-14
+Large numbers of case-constant values are allowed in variants.
+.sp
+.TT 6.4.3.3-15
+Large numbers of record sections can appear in the fixed part of
+a record.
+.sp
+.TT 6.4.3.3-16
+Large numbers of variants are allowed in a record.
+.TT 6.4.3.4-4
+Size and speed of Warshall's algorithm depend on the
+implementation of EM:
+.IS
+.I1
+.br
+size: 122 bytes
+.br
+speed: 5.2 seconds
+.sp
+.I2
+.br
+size: 196 bytes
+.br
+speed: 0.7 seconds
+.IE
+.TT 6.5.3.2-3
+Deep nesting of array indices is allowed.
+.sp
+.TT 6.5.3.2-4
+.TT 6.5.3.2-5
+Arrays can have at least 8 dimensions.
+.sp
+.TT 6.6.1-8
+Deep static nesting of procedures is allowed.
+.sp
+.TT 6.6.3.1-6
+Large numbers of formal parameters are allowed.
+.sp
+.TT 6.6.5.3-12
+Dispose is fully implemented.
+.sp
+.TT 6.6.6.2-6
+Test sqrt(x): no errors.
+The error is within acceptable bounds.
+.in +5
+maxRE:~~2~**~-55.50
+.br
+rmsRE:~~2~**~-57.53
+.in -5
+.sp
+.TT 6.6.6.2-7
+Test arctan(x): may cause underflow or overflow errors.
+The error is within acceptable bounds.
+.in +5
+.br
+maxRE:~~2~**~-55.00
+.br
+rmsRE:~~2~**~-56.36
+.in -5
+.sp
+.TT 6.6.6.2-8
+Test exp(x): may cause underflow or overflow errors.
+The error is not within acceptable bounds.
+.in +5
+maxRE:~~2~**~-50.03
+.br
+rmsRE:~~2~**~-51.03
+.in -5
+.sp
+.TT 6.6.6.2-9
+Test sin(x): may cause underflow errors.
+The error is not within acceptable bounds.
+.in +5
+maxRE:~~2~**~-38.20
+.br
+rmsRE:~~2~**~-43.68
+.in -5
+.sp
+Test cos(x): may cause underflow errors.
+The error is not within acceptable bounds.
+.in +5
+maxRE:~~2~**~-41.33
+.br
+rmsRE:~~2~**~-46.62
+.in -5
+.sp
+.TT 6.6.6.2-10
+Test ln(x):
+The error is not within acceptable bounds.
+.in +5
+maxRE:~~2~**~-54.05
+.br
+rmsRE:~~2~**~-55.77
+.in -5
+.sp
+.TT 6.7.1-3
+.TT 6.7.1-4
+.TT 6.7.1-5
+Complex nested expressions are allowed.
+.sp
+.TT 6.7.2.2-14
+Test real division:
+The error is within acceptable bounds.
+.in +5
+maxRE:~~0
+.br
+rmsRE:~~0
+.in -5
+.sp
+.TT 6.7.2.2-15
+Operations on reals in the integer range are exact.
+.sp
+.TT 6.7.3-1
+.TT 6.8.3.2-1
+.TT 6.8.3.4-2
+.TT 6.8.3.5-15
+.TT 6.8.3.7-4
+.TT 6.8.3.8-3
+.TT 6.8.3.9-20
+.TT 6.8.3.10-7
+Static deep nesting of function calls,
+compound statements, if statements, case statements, repeat
+loops, while loops, for loops and with statements is possible.
+.sp
+.TT 6.8.3.2-2
+Large numbers of statements are allowed in a compound
+statement.
+.sp
+.TT 6.8.3.5-12
+The compiler requires case constants to be compatible with
+the case selector.
+.sp
+.TT 6.8.3.5-13
+.TT 6.8.3.5-14
+Large case statements are possible.
+.sp
+.TT 6.9-2
+Recursive IO on the same file is well-behaved.
+.sp
+.TT 6.9.1-6
+The reading of real values from a text file is done with
+sufficient accuracy.
+.in +5
+maxRE:~~2~**~-54.61
+.br
+rmsRE:~~2~**~-56.32
+.in -5
+.sp
+.TT 6.9.1-7
+.TT 6.9.2-2
+.TT 6.9.3-3
+.TT 6.9.4-2
+Read, readln, write and writeln may have large numbers of
+parameters.
+.sp
+.TT 6.9.1-8
+The loss of precision for reals written on a text file and read
+back is:
+.in +5
+maxRE:~~2~**~-53.95
+.br
+rmsRE:~~2~**~-55.90
+.in -5
+.sp
+.TT 6.9.3-2
+File IO buffers without trailing marker are correctly flushed.
+.sp
+.TT 6.9.3.5.2-2
+Reals are written with sufficient accuracy.
+.in +5
+maxRE:~~0
+.br
+rmsRE:~~0
+.in -5
+.IE
+.CH "Level 1 conformance tests"
+Number of tests passed = 4
+.br
+Number of tests failed = 1
+.SH "Details of failed tests"
+.IS
+.TT 6.6.3.7-4
+An expression indicated by parentheses whose
+value is a conformant array is not allowed.
+.IE
+.CH "Level 1 deviance tests"
+Number of deviations correctly detected = 4
+.br
+Number of tests not detecting deviations = 0
+.IE
+.CH "Level 1 error handling"
+The results depend on the EM implementation.
+.sp
+Number of errors correctly detected =
+.in +5
+.I1
+1
+.I2
+0
+.in -5
+Number of errors not detected =
+.in +5
+.I1
+0
+.I2
+1
+.in -5
+.SH "Details of errors not detected"
+.IS
+.TT 6.6.3.7-9
+.I2
+Subrange bounds are not checked.
+.IE
+.CH "Level 1 quality measurement"
+Number of tests run = 1
+.SH "Results of test"
+.IS
+.TT 6.6.3.7-10
+Large conformant arrays are allowed.
+.IE
+.CH "Extensions"
+Number of tests run = 3
+.SH "Details of tests failed"
+.IS
+.TT 6.1.9-7
+The alternative relational operators are not allowed.
+.sp
+.TT 6.1.9-8
+The alternative symbols for colon, semicolon and assignment are
+not allowed.
+.sp
+.TT 6.8.3.5-16
+The otherwise selector in case statements is not allowed.
+.IE
+.CH "References"
+.ti -5
+[1]~~\
+A.S.Tanenbaum, E.G.Keizer, J.W.Stevenson, Hans van Staveren,
+"Description of a machine architecture for use with block structured
+languages",
+Informatica rapport IR-81.
+.ti -5
+[2]~~\
+ISO standard proposal ISO/TC97/SC5-N462, dated February 1979.
+The same proposal, in slightly modified form, can be found in:
+A.M.Addyman e.a., "A draft description of Pascal",
+Software, practice and experience, May 1979.
+An improved version, received March 1980,
+is followed as much as possible for the
+current ACK-Pascal.
+.ti -5
+[3]~~\
+B. A. Wichmann and J. du Croz,
+A program to calculate the GAMM measure, Computer Journal,
+November 1979.
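Note on test 6.6.6.2-11 in val.doc above: the reported eps, epsneg, xmin and xmax appear to follow from beta and the exponent fields as plain powers of beta. The C sketch below recomputes them under that assumption. The struct and function names are hypothetical, and the relations eps = beta^machep, xmin = beta^minexp and xmax = beta^maxexp (to the printed precision) are assumed rather than stated in the report.

```c
/* Hypothetical record of the characteristics listed for test 6.6.6.2-11. */
struct machar {
	int beta;	/* radix of the floating point representation */
	int machep;	/* exponent of eps */
	int negep;	/* exponent of epsneg */
	int minexp;	/* exponent of xmin */
	int maxexp;	/* exponent of xmax */
};

/* Values copied from the report. */
const struct machar reported = { 2, -56, -56, -128, 127 };

/* base raised to an integer power by repeated multiplication or division.
 * For base 2 every step is exact in double arithmetic over this range. */
static double ipow(int base, int e)
{
	double r = 1.0;

	for (; e > 0; e--)
		r *= base;
	for (; e < 0; e++)
		r /= base;
	return r;
}

double machar_eps(const struct machar *m)    { return ipow(m->beta, m->machep); }
double machar_epsneg(const struct machar *m) { return ipow(m->beta, m->negep); }
double machar_xmin(const struct machar *m)   { return ipow(m->beta, m->minexp); }
double machar_xmax(const struct machar *m)   { return ipow(m->beta, m->maxexp); }
```

Printed with the "%e" format of printf, the four results for `reported` agree with the figures in the report: 1.387779e-17 for eps and epsneg, 2.938736e-39 for xmin and 1.701412e+38 for xmax.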