From: keie <none@none>
Date: Mon, 17 Dec 1984 11:35:39 +0000 (+0000)
Subject: *** empty log message ***
X-Git-Tag: release-5-5~5856
X-Git-Url: https://git.ndcode.org/public/gitweb.cgi?a=commitdiff_plain;h=c69304401368deff4f0500a36d46c20d1f1ef1f0;p=ack.git

*** empty log message ***
---

diff --git a/doc/6500.doc b/doc/6500.doc
new file mode 100644
index 000000000..aeef24a9a
--- /dev/null
+++ b/doc/6500.doc
@@ -0,0 +1,2163 @@
+. \" $Header$"
+.po +10
+.ND
+.TL
+.B
+A backend table for the 6500 microprocessor
+.R
+.AU
+Jan van Dalen
+.AB
+The backend table is part of the Amsterdam Compiler Kit (ACK).
+It translates EM, the intermediate language of the kit, into
+machine code for the MCS6500 microprocessor family.
+.AE
+.PP
+.bp
+.NH
+Introduction.
+.PP
+As more and more organizations acquire many micro- and minicomputers,
+the need for portable compilers is becoming more and more acute.
+The present situation, in which each hardware vendor provides its
+own compilers (each with its own deficiencies and extensions, and
+none of them compatible), leaves much to be desired.
+The ideal situation would be an integrated system containing
+a family of (cross) compilers, each compiler accepting a standard
+source language and producing code for a wide variety of target
+machines. Furthermore, the compilers should be compatible, so programs
+written in one language can call procedures written in another
+language. Finally, the system should be designed so as to make
+adding new languages and new machines easy. Such an integrated
+system is being built at the Vrije Universiteit.
+.PP
+The compiler building system, which is called the "Amsterdam Compiler
+Kit" (ACK), can be thought of as a "tool kit." It consists of
+a number of parts that can be combined to form compilers (and
+interpreters) with various properties. The tool kit is based
+on an idea (UNCOL) that was first suggested in 1960 [5],
+but which never really caught on then. The problem which UNCOL
+attempts to solve is how to make a compiler for each of
+.B
+N
+.R
+languages on
+.B
+M
+.R
+different machines without having to write
+.B
+N
+.R
+x
+.B
+M
+.R
+programs.
+.PP
+As shown in Fig. 1, the UNCOL approach is to write
+.B
+N
+.R
+"front ends," each of which translates
+one source language to a common
+intermediate language, UNCOL (UNiversal Computer Oriented
+Language), and
+.B
+M
+.R
+"back ends," each of which translates programs
+in UNCOL to a specific machine language. Under these conditions,
+only
+.B
+N
+.R
++
+.B
+M
+.R
+programs must be written to provide all
+.B
+N
+.R
+languages on all
+.B
+M
+.R
+machines, instead of
+.B
+N
+.R
+x
+.B
+M
+.R
+programs.
+.PP
+Various researchers have attempted to design a suitable UNCOL [1,6],
+but none of these have become popular. It is the belief of the
+designers of the Amsterdam Compiler Kit that previous attempts
+have failed because they have been too ambitious, that is, they have
+tried to cover all languages and all machines using a single UNCOL.
+The approach of the designers is more modest:
+they cater only to algebraic languages and machines whose memory
+consists of 8-bit bytes, each with its own address.
+Typical languages that could be handled include Ada, ALGOL 60,
+ALGOL 68, BASIC, C, FORTRAN, Modula, Pascal, PL/I, PL/M, PLAIN and
+RATFOR, whereas COBOL, LISP and SNOBOL would be handled less efficiently.
+Examples of machines that could be included are the Intel 8080 and
+8086, Motorola 6800, 6809 and 68000, Zilog Z80 and Z8000, DEC PDP-11
+and VAX, the MOS Technology MCS6500 family and IBM, but not the Burroughs
+6700, CDC Cyber or Univac 1108 (because they are not byte-oriented).
+With these restrictions the designers believe that the old UNCOL
+idea can be used as the basis of a practical compiler-building 
+system.
+.sp 10
+.bp
+.NH
+An overview of the Amsterdam Compiler Kit
+.PP
+The tool kit consists of eight components:
+.IP 1.
+The preprocessor.
+.IP 2.
+The front ends.
+.IP 3.
+The peephole optimizer.
+.IP 4.
+The global optimizer.
+.IP 5.
+The back end.
+.IP 6.
+The target machine optimizer.
+.IP 7.
+The universal assembler/linker.
+.IP 8.
+The utility package.
+.PP
+A fully optimizing compiler, depicted in Fig. 2, has seven cascaded
+phases. Conceptually, each component reads an input file and writes
+a transformed output file to be used as input to the next component.
+In practice, some components may use temporary files to allow
+multiple passes over the input or internal intermediate files.
+.sp 20
+.PP
+In the following paragraphs a brief description of each component
+is given.
+A more detailed description of the back end will be given in the
+rest of this document. For a more detailed description of the other
+components see [7]. A program to be compiled is first fed
+into the (language-independent) preprocessor, which provides a
+simple macro facility and similar textual facilities.
+The preprocessor's output is a legal program in one of the programming
+languages supported, whereas the input is a program possibly
+augmented with macros, etc.
+.PP
+This output goes into the appropriate front end, whose job it is to
+produce intermediate code.
+This intermediate code (the UNCOL of ACK) is the machine language
+for a simple stack machine EM (Encoding Machine).
+A typical front end might build a parse tree from the input
+and then use the parse tree to generate EM code,
+which is similar to reverse Polish.
+In order to perform this work, the front end has to maintain
+tables of declared variables, labels, etc., determine where
+to place the data structures in memory and so on.
+.PP
+The EM code generated by the front end is fed into the peephole
+optimizer, which scans it with a window of a few instructions,
+replacing certain inefficient code sequences by better ones.
+Such a search is important because EM contains instructions to
+handle numerous important special cases efficiently
+(e.g. incrementing a variable by 1).
+It is our strategy to relieve the front ends of the burden
+of hunting for special cases because there are many front ends
+and just one peephole optimizer.
+By handling the special cases in the peephole optimizer,
+the front ends become simpler, easier to write and easier to maintain.
+.PP
+Following the peephole optimizer is a global optimizer [2],
+which, unlike the peephole optimizer, examines the program as a whole.
+It builds a data flow graph to make possible a variety of global
+optimizations, among them, moving invariant code out of loops,
+avoiding redundant computations, live/dead analysis and
+eliminating tail recursion.
+Note that the output of the global optimizer is still EM code.
+.PP
+Next comes the back end, which differs from the front ends in a
+fundamental way.
+Each front end is a separate program, whereas the back end is a 
+single program that is driven by a machine-dependent driving table.
+The driving table for a specific machine tells how EM code is
+mapped onto the machine's assembly language.
+Although a simple driving table just might macro expand each
+EM instruction into a sequence of target machine instructions,
+a much more sophisticated translation strategy is normally used,
+as described later.
+For speed, the back end does not actually read in the driving
+table at run time.
+Instead, the tables are compiled along with the back end in advance,
+resulting in one binary program per machine.
+.PP
+The output of the back end is a program in the assembly language
+of some particular machine.
+The next component in the pipeline reads this program and performs
+peephole optimization on it.
+The optimizations performed here involve idiosyncrasies of the
+target machine that cannot be performed by the machine-independent
+EM-to-EM peephole optimizer.
+Typically these optimizations take advantage of special
+instructions or special addressing modes of the target machine.
+.PP
+The optimized target machine assembly code then goes into the final
+component in the pipeline, the universal assembler/linker.
+This program assembles the input to object format, extracting
+routines from libraries and including them as needed.
+.PP
+The final component of the tool kit is the utility package,
+which contains various test programs, interpreters for EM code,
+EM libraries, conversion programs and other aids for the
+implementer and user.
+.bp
+.DS C
+.B
+THE MCS6500 MICROPROCESSOR.
+.R
+.DE
+.NH 0
+Introduction
+.PP
+Why write a back end table for the MCS6500 microprocessor family?
+Although the MCS6500 microprocessor family has a simple
+instruction set and internal structure, it is used in a
+variety of microcomputers and home computers
+because of its low cost.
+As an example, the Apple II, a well-known and widespread
+microcomputer, uses the MCS6502 CPU.
+The BBC home computer, whose popularity is growing day
+by day, also uses the MCS6502 CPU, although
+better and more powerful microprocessors are available.
+The designers at Acorn Computer Industries probably
+chose the MCS6502 because of the amount of software
+available for this CPU.
+Because of its widespread use, a variety of software
+will be needed for it:
+games, administration programs,
+teaching programs, BASIC interpreters and other application
+programs.
+Even though it will not be possible to run the whole compiler kit
+on an MCS6500-based computer, it is possible to write application
+programs in a high-level language, such as Pascal or C, on a
+minicomputer.
+These application programs can be tested and compiled on that
+minicomputer and put in a ROM (Read Only Memory), for example,
+so that they can be executed by an MCS6500 CPU.
+This strategy of writing test programs on a minicomputer,
+compiling them there and then executing them on an MCS6500-based
+microcomputer was used during the development of the back end.
+The minicomputer used is an M68000-based one, manufactured by
+Bleasdale Computer Systems Ltd.
+The micro- or home computer used is a BBC microcomputer,
+manufactured by Acorn Computer Ltd.
+.NH
+The MOS Technology MCS6500
+.PP
+The MCS6500 is a family of CPU devices developed by MOS
+Technology.
+The members of the MCS6500 family are the same chip in
+different packages.
+The MCS6502, the big brother in the family, can handle 64K
+bytes of memory, while, for example, the MCS6504 can only handle
+8K bytes of memory.
+This difference is due to the fact that the MCS6502 comes in a
+40-pin package and the MCS6504 in a 28-pin package, so fewer
+address lines are available.
+.bp
+.NH
+The MCS6500 CPU programmable registers
+.PP
+All members of the MCS6500 series are based on the same chip, so they
+all have the same programmable registers.
+.sp 9
+.NH 2
+The accumulator A.
+.PP
+The accumulator A is the only register on which the arithmetic
+and logical instructions can be used.
+For example, the instruction ADC (add with carry) adds a byte
+from memory (or an immediate data byte) and the carry flag to the
+contents of the accumulator A, leaving the result in A.
+.NH 2
+The index register X.
+.PP
+As the name suggests, this register is used as an index
+in several indexed and indirect addressing modes.
+These modes are explained below.
+.NH 2
+The index register Y.
+.PP
+This register is, just as the index register X, used for
+certain indirect addressing modes.
+These addressing modes are different from the modes which
+use index register X.
+.NH 2
+The program counter PC
+.PP 
+This is the only 16-bit register available.
+It is used to point to the next instruction to be
+carried out.
+.NH 2
+The stack pointer SP
+.PP
+The stack pointer is an 8-bit register, so the stack can contain
+at most 256 bytes.
+The CPU always uses 00000001 as the high byte of any stack address,
+which means that memory locations
+.B
+0100
+.R
+through
+.B
+01FF
+.R
+are permanently assigned to the stack.
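+.PP
+The stack addressing rule can be modelled in a few lines of C.
+This is a sketch for illustration only: the memory array and the
+function names are hypothetical, and push and pull follow the usual
+MCS6500 behaviour (store then decrement on a push, increment then
+load on a pull).
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a flat 64K array stands in for memory. */
static uint8_t mem[0x10000];
static uint8_t sp = 0xFF;   /* 8-bit stack pointer, as after LDX #$FF / TXS */

/* Every stack address has high byte 0x01: the stack lives in page 1. */
static uint16_t stack_addr(uint8_t s) {
    return (uint16_t)(0x0100u | s);
}

/* Push (as PHA does): store at 0x0100+SP, then decrement SP.
   SP is 8 bits wide, so 0x00 - 1 wraps to 0xFF. */
static void push8(uint8_t v) {
    mem[stack_addr(sp)] = v;
    sp--;
}

/* Pull (as PLA does): increment SP first, then load. */
static uint8_t pull8(void) {
    sp++;
    return mem[stack_addr(sp)];
}
```
+Because SP is only 8 bits wide, a 257th push wraps around and
+silently overwrites the first byte that was pushed.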
+.sp 12
+.NH 2
+The status register
+.PP
+The status register maintains six status flags and a master
+interrupt control bit.
+.br
+These are the six status flags:
+    Carry        (c)
+    Zero         (z)
+    Overflow     (o)
+    Sign         (n)
+    Decimal mode (d)
+    Break        (b)
+
+
+
+
+
+The bit (i) is the master interrupt control bit.
+.NH
+The MCS6500 memory layout.
+.PP
+In the MCS6500 memory space three areas have special meaning.
+These areas are:
+.IP 1)
+Top page.
+.IP 2)
+Zero page.
+.IP 3)
+The stack.
+.PP
+MCS6500 memory is divided up into pages.
+Each page consists of 256 bytes.
+So in a memory address the high byte denotes the page number
+and the low byte the offset within the page.
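+.PP
+The split of an address into page number and offset can be written
+in C as follows; the function names are hypothetical.
+Zero page is page 00, the stack occupies page 01 and the top page
+is page FF.
```c
#include <assert.h>
#include <stdint.h>

/* High byte of an address: the page number. */
static uint8_t page_of(uint16_t addr) {
    return (uint8_t)(addr >> 8);
}

/* Low byte of an address: the offset within the page. */
static uint8_t offset_of(uint16_t addr) {
    return (uint8_t)(addr & 0xFF);
}

/* Build an address from a page number and an offset. */
static uint16_t make_addr(uint8_t page, uint8_t offset) {
    return (uint16_t)((page << 8) | offset);
}
```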
+.NH 2
+Top page.
+.PP
+When an MCS6500 is reset, it jumps indirectly via memory address
+.B
+FFFC.
+.R
+At
+.B
+FFFC
+.R
+(lowbyte) and 
+.B
+FFFD
+.R
+(highbyte) there must be the address of the bootstrap subroutine.
+When a break instruction (BRK) occurs or an interrupt takes place,
+the MCS6500 jumps indirectly through memory address
+.B
+FFFE.
+.R
+.B
+FFFE
+.R
+and 
+.B
+FFFF
+.R
+must therefore contain the address of the interrupt routine.
+This applies only to the maskable interrupt.
+There is also a nonmaskable interrupt.
+This causes the MCS6500 to jump indirectly through memory address
+.B
+FFFA.
+.R
+So the top six bytes of memory are used by the operating system
+and are therefore not available for the back end.
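+.PP
+Each vector holds a 16-bit address stored low byte first, as
+described above for FFFC and FFFD.
+A sketch of how such a vector is read, assuming a flat 64K memory
+array mem; the array and the other names are hypothetical.
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat memory image. */
static uint8_t mem[0x10000];

/* Vector locations in the top page. */
enum { VEC_NMI = 0xFFFA, VEC_RESET = 0xFFFC, VEC_IRQ_BRK = 0xFFFE };

/* A vector is a 16-bit address stored low byte first. */
static uint16_t read_vector(uint16_t vec) {
    return (uint16_t)(mem[vec] | (mem[(uint16_t)(vec + 1)] << 8));
}
```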
+.NH 2
+Zero page.
+.PP
+This page has a special meaning in the sense that addressing
+this page uses special opcodes.
+Since a page consists of 256 bytes, only one byte is needed
+to address a location in zero page.
+So an instruction which uses zero page occupies two bytes.
+It also uses fewer clock cycles while carrying out the instruction.
+Zero page is also needed when indirect addressing is used.
+This means that when indirect addressing is used, the address must
+reside in zero page (two consecutive bytes).
+The back end uses zero page, for example,
+to hold the local base, the second local base, the stack pointer,
+etc.
+.NH 2
+The stack.
+.PP
+The stack is described in paragraph 3.5 about the MCS6500
+programmable registers.
+.NH 
+The memory addressing modes
+.PP
+MCS6500 memory reference instructions use direct addressing,
+indexed addressing, and indirect addressing.
+.NH 2
+Direct addressing.
+.PP
+Three-byte instructions use the second and third bytes of the
+object code to provide a direct 16-bit address;
+therefore, 65,536 bytes of memory can be addressed directly.
+The commonly used memory reference instructions also have a two-byte
+object code variation, where the second byte directly addresses
+one of the first 256 bytes.
+.NH 2
+Base page, indexed addressing.
+.PP
+In this case, the instruction has two bytes of object code.
+The contents of either the X or Y index registers are added to the 
+second object code byte in order to compute a memory address.
+This may be illustrated as follows:
+.sp 15
+Base page, indexed addressing, as illustrated above, uses
+wraparound addition, which means that there is no carry.
+If the sum of the index register and second object code byte contents
+is more than
+.B
+FF
+.R
+, the carry bit will be discarded.
+This may be illustrated as follows:
+.sp 9
+.NH 2
+Absolute indexed addressing.
+.PP
+In this case, the contents of either the X or Y register are added
+to a 16-bit direct address provided by the second and third bytes
+of an instruction's object code.
+This may be illustrated as follows:
+.sp 10
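+.PP
+The two indexed modes differ only in how the sum is formed.
+A C sketch of the effective address computation follows; the
+function names are hypothetical.
```c
#include <assert.h>
#include <stdint.h>

/* Base page indexed: 8-bit wraparound addition, the carry out of
   bit 7 is discarded, so the result always stays in page 0. */
static uint16_t ea_zp_indexed(uint8_t operand, uint8_t index) {
    return (uint8_t)(operand + index);
}

/* Absolute indexed: full 16-bit addition, so the result may
   cross into the next page. */
static uint16_t ea_abs_indexed(uint16_t operand, uint8_t index) {
    return (uint16_t)(operand + index);
}
```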
+.NH 2
+Indirect addressing.
+.PP
+Instructions that use simple indirect addressing have three bytes of
+object code.
+The second and third object code bytes provide a 16-bit address;
+therefore, the indirect address can be located anywhere in
+memory.
+This is straightforward indirect addressing.
+.NH 3
+Pre-indexed indirect addressing.
+.PP
+In this case, the object code consists of two bytes and the 
+second object code byte provides an 8-bit address.
+Instructions that use pre-indexed indirect addressing add the contents
+of the X index register and the second object code byte to access
+a memory location in the first 256 bytes of memory, where the 
+indirect address will be found:
+.sp 18
+When using pre-indexed indirect addressing, once again wraparound
+addition is used, which means that when the X index register contents
+are added to the second object code byte, any carry will be discarded.
+Note that only the X index register can be used with pre-indexed
+addressing.
+.NH 3
+Post-indexed indirect addressing.
+.PP
+In this case, the object code consists of two bytes and the
+second object code byte provides an 8-bit address.
+Now the second object code byte identifies a location
+in the first 256 bytes of memory where an indirect address
+will be found.
+The contents of the Y index register are added to this indirect
+address.
+This may be illustrated as follows:
+.sp 18
+Note that only the Y index register can be used with post-indexed
+indirect addressing.
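+.PP
+The two indirect modes can be sketched in C as follows, assuming a
+flat 64K memory array; the names are hypothetical.
+The sketch also reflects the fact that on the (NMOS) MCS6500 the high
+byte of a zero page pointer is fetched from the next zero page location
+with wraparound, so a pointer at FF takes its high byte from location 00.
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat memory image. */
static uint8_t mem[0x10000];

/* Fetch a 16-bit pointer from zero page, low byte first; the address
   of the high byte wraps within page 0. */
static uint16_t zp_pointer(uint8_t zp) {
    return (uint16_t)(mem[zp] | (mem[(uint8_t)(zp + 1)] << 8));
}

/* Pre-indexed indirect, (zp,X): add X to the zero page address
   (wraparound), then fetch the pointer found there. */
static uint16_t ea_pre_indexed(uint8_t zp, uint8_t x) {
    return zp_pointer((uint8_t)(zp + x));
}

/* Post-indexed indirect, (zp),Y: fetch the pointer, then add Y
   with a full 16-bit addition. */
static uint16_t ea_post_indexed(uint8_t zp, uint8_t y) {
    return (uint16_t)(zp_pointer(zp) + y);
}
```
+The back end relies on the second form: an instruction such as
+lda (LBl),y reads through the pointer kept in zero page at LBl.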
+.bp
+.NH
+What the CPU has and doesn't have.
+.PP
+The designers of the MCS6500 CPU family state that
+there is nothing very significant about the short stack (only
+256 bytes); nevertheless, this stack caused problems for the back end.
+The designers say that a 256-byte stack is usually sufficient
+for any typical microcomputer, but this is only true if the stack
+is used only for the return addresses of the JSR (jump to
+subroutine) instruction.
+Since the EM machine is supposed to be a stack machine, and
+high-level languages need parameters and
+local variables in their procedures and functions, this short stack
+is insufficient.
+Therefore a software stack is implemented in this back end, requiring two
+additional subroutines for stack handling.
+These two stack handling subroutines slow down the execution
+of a program, since the stack is used heavily.
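+.PP
+A minimal sketch of such a software stack in C follows.
+It is a hypothetical model, not the library code: it assumes a 16-bit
+stack pointer (soft_sp) that grows downwards from an arbitrary initial
+value C000, and a word passed in the registerpair AX with A holding
+the high byte and X the low byte, as in the code rules later in this
+document.
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat memory image and software stack pointer. */
static uint8_t  mem[0x10000];
static uint16_t soft_sp = 0xC000;   /* assumed initial top of stack */

/* Push the word in AX (a = high byte, x = low byte). */
static void soft_push(uint8_t a, uint8_t x) {
    soft_sp -= 2;
    mem[soft_sp]     = x;           /* low byte at the lower address */
    mem[soft_sp + 1] = a;
}

/* Pop the top word back into A (high byte) and X (low byte). */
static void soft_pop(uint8_t *a, uint8_t *x) {
    *x = mem[soft_sp];
    *a = mem[soft_sp + 1];
    soft_sp += 2;
}
```
+Because the stack pointer is 16 bits wide, the stack is no longer
+limited to 256 bytes; in the generated code, however, every push or
+pop is a subroutine call, which is where the extra time goes.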
+.PP
+Since parameters and locals of EM procedures are addressed at an offset
+from the local base of that procedure, indirect addressing
+is heavily used.
+Offsets are positive (for parameters) and negative (for
+local variables).
+As explained in the section on addressing modes, the MCS6500 has a
+post-indexed indirect addressing mode.
+This addressing mode can only handle positive offsets (0 to 255),
+which raises a problem for accessing the local variables.
+I have chosen the following solution.
+A second local base is introduced.
+This second local base is the real local base minus
+a constant BASE.
+In the present version of the back end the value of BASE
+is 240.
+This means that there are 240 bytes reserved for local
+variables to be addressed indirectly and 14 bytes for
+the parameters.
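+.PP
+The arithmetic behind the second local base can be checked with a
+short C sketch.
+BASE is the value given above; the range test in_range is an
+assumption that mirrors the code rules shown later (an ldy #BASE+offset
+followed by an iny), not the table's actual IN() definition.
```c
#include <assert.h>
#include <stdint.h>

#define BASE 240   /* the value of BASE in the present back end */

/* The second local base: the real local base minus BASE. */
static uint16_t second_lb(uint16_t lb) {
    return (uint16_t)(lb - BASE);
}

/* Assumed range test for a word at LB+off read with "ldy #BASE+off"
   followed by "iny": both Y values must lie in 0..255. */
static int in_range(int off) {
    return off >= -BASE && off + BASE + 1 <= 255;
}

/* Address reached by (LB2),y with y = BASE+off; it equals lb + off. */
static uint16_t local_addr(uint16_t lb, int off) {
    assert(in_range(off));
    uint8_t y = (uint8_t)(BASE + off);
    return (uint16_t)(second_lb(lb) + y);
}
```
+With BASE equal to 240 this test accepts word offsets from -240 up
+to 14, in line with the figures for locals and parameters given above.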
+.DS C
+.B
+THE CODE GENERATOR.
+.R
+.DE
+.NH 0
+Description of the machine table.
+.PP
+The machine description table consists of the following sections:
+.IP 1.
+The macro definitions.
+.IP 2.
+Constant definitions.
+.IP 3.
+Register definitions.
+.IP 4.
+Token definitions.
+.IP 5.
+Token expressions.
+.IP 6.
+Code rules.
+.IP 7.
+Move definitions.
+.IP 8.
+Test definitions.
+.IP 9.
+Stack definitions.
+.NH 2
+Macro definitions.
+.PP
+The macro definitions at the top of the table are expanded
+by the preprocessor on occurence in the rest of the table.
+.NH 2
+Constant definitions.
+.PP
+There are three constants which must be defined first.
+They are:
+.IP EM_WSIZE: 11
+Number of bytes in a machine word.
+This is the number of bytes a simple
+.B
+loc
+.R
+instruction will put on the stack.
+.IP EM_PSIZE:
+Number of bytes in a pointer.
+This is the number of bytes a
+.B
+lal
+.R
+instruction will put on the stack.
+.IP EM_BSIZE:
+Number of bytes in the hole between AB and LB.
+The calling sequence only saves LB on the stack so this
+constant is equal to the pointer size.
+.NH 2
+Register definitions.
+.PP
+The only important register definition is the definition of
+the registerpair AX.
+Since the machine's other registers (Y, PC and ST) serve
+special purposes, the code generator cannot use them.
+.NH 2
+Token definitions
+.PP
+Only a fake token is defined.
+It is put in the table because the code generator generator
+complains if it cannot find one.
+.NH 2
+Token expression definitions.
+.PP
+The token expression is also a fake one.
+This token expression is put in the table, since the code generator
+generator complains if it cannot find one.
+.NH 2
+Code rules.
+.PP
+The code rule section is the largest section in the table.
+The code rules specify EM patterns, stack patterns, code to be generated,
+etc.
+The syntax is:
+.IP code rule:
+EM pattern '|' stack pattern '|' code '|'
+stack replacement '|' EM replacement '|'
+.PP
+All patterns are optional; however, there must be at least one
+pattern present.
+If the EM pattern is missing the rule becomes a rewriting
+rule or a
+.B
+coercion
+.R
+to be used when code generation cannot continue because of an
+invalid stack pattern.
+The code rules are preceded by the word CODE:.
+.NH 3
+The EM pattern.
+.PP
+The EM pattern consists of a list of EM mnemonics followed by
+a boolean expression. Examples:
+.sp 1
+.br
+.B
+loe
+.R
+.sp 1
+will match a single
+.B
+loe
+.R
+instruction,
+.sp 1
+.br
+.B
+loc loc cif
+.R
+$1==2 && $2==8
+.sp 1
+is a pattern that will match
+.sp 1
+.br
+.B
+loc
+.R
+2
+.br
+.B
+loc
+.R
+8
+.br
+.B
+cif
+.R
+.sp 1
+and
+.sp 1
+.br
+.B
+lol
+inc
+stl
+.R
+$1==$3
+.sp 1
+will match for example
+.sp 1
+.br
+.B
+lol
+.R
+6
+.br
+.B
+inc
+.R
+.br
+.B
+stl
+.R
+6
+.sp 1
+A missing boolean expression evaluates to TRUE.
+.PP
+The code generator will always match the longest EM pattern.
+If two patterns of the same length match, the first one in the table
+is chosen; all patterns of length greater than or equal
+to three are considered to be of the same length.
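+.PP
+The selection rule can be sketched in C.
+The struct rule type and its fields are hypothetical simplifications:
+each rule only records the length of its EM pattern and whether the
+pattern (including its boolean expression) matches at the current
+position.
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a code rule. */
struct rule {
    int pattern_len;   /* number of EM instructions in the EM pattern */
    int matches;       /* nonzero if the pattern matches here */
};

/* Patterns of length 3 or more all count as length 3. */
static int effective_len(int len) {
    return len >= 3 ? 3 : len;
}

/* Pick the matching rule with the largest effective length; because
   only a strictly longer match replaces the current choice, the rule
   that appears first in the table wins a tie. */
static int select_rule(const struct rule *table, size_t n) {
    int best = -1, best_len = 0;
    for (size_t i = 0; i < n; i++) {
        if (!table[i].matches)
            continue;
        int len = effective_len(table[i].pattern_len);
        if (len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;   /* -1 if no rule matches */
}
```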
+.NH 3
+The stack pattern.
+.PP
+The only stack pattern that can occur is R16, which means that the
+registerpair AX contains the word on top of the stack.
+If this is not the case, a coercion occurs.
+This coercion generates a "jsr Pop", which means that the top
+of the stack is popped and stored in the registerpair AX.
+.NH 3
+The code part.
+.PP
+The code part consists of three parts, stack cleanup, register
+allocation, and code to be generated.
+All of these may be omitted.
+.NH 4
+Stack cleanup.
+.PP
+When generating something like a branch instruction it may be
+necessary to empty the fake stack, that is, to remove the AX registerpair.
+This is done by the instruction remove(ALL).
+.NH 4
+Register allocation.
+.PP
+If the machine code to be generated uses the registerpair AX,
+this is signaled to the code generator by the allocate(R16)
+instruction.
+If the registerpair AX resides on the fake stack, this will result
+in a "jsr Push", which means that the registerpair AX is pushed on
+the stack and will be free for further use.
+If registerpair AX is not on the fake stack nothing happens.
+.NH 4
+Code to be generated.
+.PP
+Code to be generated is specified as a list of items of the following
+kind:
+.IP 1)
+A string in double quotes ("This is a string").
+This is copied to the codefile and a newline ('\en') is appended.
+Inside the string all normal C string conventions are allowed,
+and substitutions can be made of the following sorts.
+.RS
+.IP a)
+$1, $2 etc. These are the operands of the corresponding EM
+instructions and are printed according to their type.
+To put a real '$' inside the string it must be doubled ('$$').
+.IP b)
+%[1], %[2.reg], %[b.1] etc. These have their obvious meaning.
+If they describe a complete token (%[1]), the print format for
+the token is used.
+If they stand for a basic term in an expression, they will be
+printed according to their type.
+To put a real '%' inside the string it must be doubled ('%%').
+.IP c)
+%( arbitrary expression %). This allows inclusion of arbitrary
+expressions inside strings.
+It is not needed very often, so the awkward notation
+is not too bad.
+Note that %(%[1]%) is equivalent to %[1].
+.RE
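+.PP
+The $-substitution can be sketched in C as follows.
+This is a hypothetical helper, not the code generator's implementation:
+operands are passed as already formatted strings (so printing
+according to their type is not modelled), only $1 to $9 and $$ are
+handled, and %-substitutions are ignored.
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical $-substitution: "$1".."$9" become the (already formatted)
   EM operands, "$$" becomes a single '$'; other text is copied unchanged.
   Returns 0 on success, -1 on a bad operand number or a too small buffer. */
static int expand_dollars(const char *tmpl, const char *const ops[], int nops,
                          char *out, size_t outsz) {
    size_t n = 0;
    if (outsz == 0)
        return -1;
    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *piece = p;     /* text to copy for this position */
        size_t len = 1;
        if (p[0] == '$' && p[1] == '$') {
            p++;                   /* "$$": copy one '$', skip the other */
        } else if (p[0] == '$' && isdigit((unsigned char)p[1])) {
            int k = p[1] - '0';
            if (k < 1 || k > nops)
                return -1;
            piece = ops[k - 1];
            len = strlen(piece);
            p++;                   /* skip the digit */
        }
        if (n + len >= outsz)      /* keep room for the terminating NUL */
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```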
+.NH 3
+Stack replacement.
+.PP
+The stack replacement is a possibly empty list of items to be
+pushed on the fake stack.
+Three things can occur:
+.IP 1)
+%[1] is used if the registerpair AX was on the fake stack and is
+to be pushed back onto it.
+.IP 2)
+%[a] is used if the registerpair AX is allocated with allocate(R16)
+and is to be pushed onto the fake stack.
+.IP 3)
+It can also be empty.
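+.PP
+For this table the fake stack can hold nothing but the registerpair
+AX, so its behaviour can be modelled with a single flag.
+The following C sketch is hypothetical (the names are not taken from
+the code generator); it only shows which code is emitted by
+allocate(R16), by an R16 stack pattern and by a %[a] stack replacement.
```c
#include <assert.h>
#include <string.h>

/* Nonzero while the registerpair AX holds the top word of the stack. */
static int ax_on_fake_stack = 0;

/* allocate(R16): if AX still holds a stacked word, spill it first. */
static const char *allocate_r16(void) {
    const char *code = ax_on_fake_stack ? "jsr Push" : "";
    ax_on_fake_stack = 0;          /* AX is now free for the rule's code */
    return code;
}

/* Stack pattern R16: if AX does not hold the top word, coerce with a
   pop; the rule then consumes the word in AX. */
static const char *require_r16(void) {
    const char *code = ax_on_fake_stack ? "" : "jsr Pop";
    ax_on_fake_stack = 0;
    return code;
}

/* Stack replacement %[a]: the result in AX goes back on the fake stack. */
static void push_result_r16(void) {
    ax_on_fake_stack = 1;
}
```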
+.NH 3
+EM replacement.
+.PP
+In exceptional cases it might be useful to leave part of an EM
+pattern undone.
+For example, a
+.B
+sdl
+.R
+instruction might be split into two
+.B
+stl
+.R
+instructions when there is no 4-byte quantity on the stack.
+The EM replacement part allows one to express this.
+Example:
+.sp 1
+.br
+.B
+stl
+.R
+$1
+.B
+stl
+.R
+$1+2
+.sp 1
+The instructions are inserted in the stream so they can match
+the first part of a pattern in the next step.
+Note that since the code generator traverses the EM instructions
+in a strict linear fashion, it is impossible to let the EM
+replacement match later parts of a pattern.
+So if there is a pattern
+.sp 1
+.br
+.B
+loc
+stl
+.R
+$1==0
+.sp 1
+and the input is
+.sp 1
+.br
+.B
+loc
+.R
+0
+.B
+sdl
+.R
+4
+.sp 1
+the
+.B
+loc
+.R
+0
+will be processed first, then the
+.B
+sdl
+.R
+might be split into two
+.B
+stl
+.R
+\&'s, but the pattern cannot match now.
+.NH 2
+Move definitions.
+.PP
+This definition is a fake. This definition is put in the
+table, since the code generator generator complains if it
+cannot find one.
+.NH 2
+Test definitions.
+.PP
+Test definitions aren't used by the table.
+.NH 2
+Stack definitions.
+.PP
+When the generator has to push the registerpair AX, it must
+know how to do so.
+The machine code to be generated is defined here.
+.NH 1
+Some remarks.
+.PP
+The above description of the machine table is
+a description of the table for the MCS6500.
+It uses only a part of the possibilities which the code generator
+generator offers.
+For a more precise and detailed description see [4].
+.DS C
+.B
+THE BACK END TABLE.
+.R
+.DE
+.NH 0
+Introduction.
+.PP
+The code rules are divided into 15 groups.
+These groups are:
+.IP 1.
+Load instructions.
+.IP 2.
+Store instructions.
+.IP 3.
+Integer arithmetic instructions.
+.IP 4.
+Unsigned arithmetic instructions.
+.IP 5.
+Floating point arithmetic instructions.
+.IP 6.
+Pointer arithmetic instructions.
+.IP 7.
+Increment, decrement and zero instructions.
+.IP 8.
+Convert instructions.
+.IP 9.
+Logical instructions.
+.IP 10.
+Set manipulation instructions.
+.IP 11.
+Array instructions.
+.IP 12.
+Compare instructions.
+.IP 13.
+Branch instructions.
+.IP 14.
+Procedure call instructions.
+.IP 15.
+Miscellaneous instructions.
+.PP
+For each of these groups, one or two typical EM patterns will be explained
+in the following paragraphs.
+Comments are placed between /* and */ (/* This is a comment */).
+.NH
+The instructions.
+.NH 2
+The load instructions.
+.PP
+In this group a typical instruction is
+.B
+lol.
+.R
+A
+.B
+lol
+.R
+instruction pushes the word at local base + offset, where offset
+is the instruction's argument, onto the stack.
+Since the post-indexed indirect addressing mode of the MCS6500 can
+only use offsets of 0 to 255 bytes, as explained in the section on
+memory addressing modes, two code rules are needed in the table:
+one which can offset directly and one which must explicitly
+calculate the address of the local.
+.NH 3
+The lol instruction with indirect offsetting.
+.PP
+In this case an indirect load, indexed from the second local base,
+is possible.
+The table content is:
+.sp 1
+.br
+.B
+lol
+.R
+IN($1) | |
+.br
+allocate(R16)	/* allocate registerpair AX */
+.br
+"ldy #BASE+$1"	/* load Y with the offset from the second
+.br
+					      local base */
+.br
+"lda (LBl),y"	/* load indirect the lowbyte of the word */
+.br
+"tax"		/* move register A to register X */
+.br
+"iny"		/* increment register Y (offset) */
+.br
+"lda (LBl),y"	/* load indirect the highbyte of the word */
+.br
+| %[a] | |	/* push the word onto the fake stack */
+.NH 3
+The lol instruction whose offset is too big.
+.PP
+In this case, the library subroutine "Lol" is used.
+This subroutine expects the offset in registerpair AX, then
+calculates the address of the local or parameter, and loads
+it into registerpair AX.
+The table content is:
+.sp 1
+.br
+.B
+lol
+.R
+| |
+.br
+allocate(R16)	/* allocate registerpair AX */
+.br
+"lda #[$1].h"	/* load highbyte of offset into register A */
+.br
+"ldx #[$1].l"	/* load lowbyte of offset into register X */
+.br
+"jsr Lol"	/* perform the subroutine */
+.br
+| %[a] | |	/* push word onto the fake stack */
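+.PP
+How the two rules divide the work can be sketched in C.
+The function gen_lol is hypothetical: it prints the operands as numbers
+(the table leaves BASE+$1, [$1].h and [$1].l for the assembler),
+joins the instructions with "; " for compactness, and uses an assumed
+range test in_range in place of the table's IN().
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BASE 240   /* the value of BASE in the present back end */

/* Assumed range test: both "ldy #BASE+off" and the following "iny"
   must give a Y value in 0..255. */
static int in_range(int off) {
    return off >= -BASE && off + BASE + 1 <= 255;
}

/* Hypothetical emitter for "lol off", mirroring the two table rules. */
static void gen_lol(int off, char *buf, size_t size) {
    if (in_range(off)) {
        /* rule 1: indirect load via the second local base */
        snprintf(buf, size, "ldy #%d; lda (LBl),y; tax; iny; lda (LBl),y",
                 BASE + off);
    } else {
        /* rule 2: pass the 16-bit offset in AX (A = high, X = low) to Lol */
        unsigned u = (unsigned)off & 0xFFFFu;
        snprintf(buf, size, "lda #%u; ldx #%u; jsr Lol", u >> 8, u & 0xFFu);
    }
}
```
+For an offset of -300 the indirect rule cannot be used, so the offset
+FED4 is split into the high byte 254 and the low byte 212.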
+.NH 2
+The store instructions.
+.PP
+In this group a typical instruction is
+.B
+stl.
+.R
+A
+.B
+stl
+.R
+instruction pops a word from the stack and stores it in the word
+at local base + offset, where offset is the instruction's argument.
+Here, too, two code rules are needed in the table as a result
+of the offset limits.
+.NH 3
+The stl instruction with indirect offsetting.
+.PP
+In this case an indirect store, indexed from the second local
+base, is possible.
+The table content is:
+.sp 1
+.br
+.B
+stl
+.R
+IN($1) | R16 |	/* expect registerpair AX on top of the
+.br
+							fake stack */
+.br
+"ldy #BASE+1+$1"  /* load Y with the offset from the
+.br
+						second local base */
+.br
+"sta (LBl),y"	/* store the highbyte of the word from A */
+.br
+"txa"		/* move register X to register A */
+.br
+"dey"		/* decrement offset */
+.br
+"sta (LBl),y"	/* store the lowbyte of the word from A */
+.br
+| | |
+.NH 3
+The stl instruction whose offset is too big.
+.PP
+In this case the library subroutine 'Stl' is used.
+This subroutine expects the offset in registerpair AX, then
+calculates the address, pops the word and stores it in its place.
+The table content is:
+.sp 1
+.br
+.B
+stl
+.R
+| |
+.br
+allocate(R16)	/* allocate registerpair AX */
+.br
+"lda #[$1].h"	/* load highbyte of offset in register A */
+.br
+"ldx #[$1].l"	/* load lowbyte of offset in register X */
+.br
+"jsr Stl"	/* perform the subroutine */
+.br
+| | |
+.NH 2
+Integer arithmetic instructions.
+.PP
+In this group typical instructions are
+.B
+adi
+.R
+and
+.B
+mli.
+.R
+In this table, these instructions are implemented for 2-byte
+and 4-byte integers.
+The only arithmetic instructions available on the MCS6500 are
+ADC (add with carry) and SBC (subtract with not(carry)).
+Not(carry) here means that in a subtraction the complement of the
+carry flag is subtracted as a borrow, so the carry must be set
+before a subtraction without borrow.
+The absence of multiply and divide instructions forces the
+use of subroutines to handle these cases.
+Because there are no registers left for the multiply
+and divide routines to work in, zero page is used here.
+The 4-byte integer arithmetic is implemented because C has
+the integer type long.
+A user is free to use the type long, but will pay in performance.
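+.PP
+The behaviour of the two instructions in binary mode (decimal flag
+clear) can be modelled in C.
+The functions adc, sbc and add16 are illustrative names, not part of
+the back end; add16 mirrors the clc/adc/adc sequence used by the
+2-byte adi code rule in this table.
```c
#include <assert.h>
#include <stdint.h>

/* ADC: A = A + M + C; the carry is set on unsigned overflow out of bit 7. */
static uint8_t adc(uint8_t a, uint8_t m, int *carry) {
    unsigned sum = (unsigned)a + m + (*carry ? 1u : 0u);
    *carry = sum > 0xFF;
    return (uint8_t)sum;
}

/* SBC: A = A - M - not(C), computed as A + ~M + C; afterwards a set
   carry means "no borrow". */
static uint8_t sbc(uint8_t a, uint8_t m, int *carry) {
    return adc(a, (uint8_t)~m, carry);
}

/* 16-bit addition: CLC, add the low bytes, then add the high bytes
   with the carry from the low-byte addition. */
static uint16_t add16(uint16_t x, uint16_t y) {
    int c = 0;                                      /* clc */
    uint8_t lo = adc((uint8_t)x, (uint8_t)y, &c);
    uint8_t hi = adc((uint8_t)(x >> 8), (uint8_t)(y >> 8), &c);
    return (uint16_t)((hi << 8) | lo);
}
```
+Because the bit pattern of the sum does not depend on whether the
+operands are regarded as signed or unsigned, the same sequence serves
+for both kinds of addition.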
+.NH 3
+The adi instruction.
+.PP
+In case of the
+.B
+adi
+.R
+2 (and
+.B
+sbi
+.R
+2) instruction there are many EM
+patterns, so that the instruction can be performed in line in
+most cases.
+For the worst case there exists a subroutine in the library
+which deals with the EM instruction.
+In case of a
+.B
+adi
+.R
+4 (or
+.B
+sbi
+.R
+4) there is only a subroutine to deal with it.
+A table content is:
+.sp 1
+.br
+.B
+lol lol adi
+.R
+(IN($1) && IN($2) && $3==2) | | /* is it in range */
+.br
+allocate(R16)	/* allocate registerpair AX */
+.br
+"ldy #BASE+$1+1" /* load Y with offset for first operand */
+.br
+"lda (LBl),y"	/* load indirect highbyte first operand */
+.br
+"pha"		/* save highbyte first operand on hard_stack */
+.br
+"dey"		/* decrement offset first operand */
+.br
+"lda (LBl),y"	/* load indirect lowbyte first operand */
+.br
+"ldy #BASE+$2"	/* load Y with offset for second operand */
+.br
+"clc"		/* clear carry for addition */
+.br
+"adc (LBl),y"	/* add the lowbytes of the operands */
+.br
+"tax"		/* store lowbyte of result in place */
+.br
+"iny"		/* increment offset second operand */
+.br
+"pla"		/* get highbyte first operand */
+.br
+"adc (LBl),y"	/* add the highbytes of the operands */
+.br
+| %[a] | |	/* push the result onto the fake stack */
+.NH 3
+The mli instruction.
+.PP
+The
+.B
+mli
+.R
+2 instruction mostly uses the subroutine 'Mlinp'.
+This subroutine expects the multiplicand in zero page
+at locations ARTH, ARTH+1, while the multiplier is in zero
+page locations ARTH+2, ARTH+3.
+For a description of the algorithms used for multiplication and
+division, see [9].
+A typical table entry is:
+.sp  1
+.br
+.B
+lol lol mli
+.R
+(IN($1) && IN($2) && $3==2) | |
+.br
+allocate(R16)	/* allocate register pair AX */
+.br
+"ldy #BASE+$1"	/* load Y with offset of multiplicand */
+.br
+"lda (LBl),y"	/* load indirect lowbyte of multiplicand */
+.br
+"sta ARTH"	/* store lowbyte in zero page */
+.br
+"iny"		/* increment offset of multiplicand */
+.br
+"lda (LBl),y"	/* load indirect highbyte of multiplicand */
+.br
+"sta ARTH+1"	/* store highbyte in zero page */
+.br
+"ldy #BASE+$2"	/* load Y with offset of multiplier */
+.br
+"lda (LBl),y"	/* load indirect lowbyte of multiplier */
+.br
+"sta ARTH+2"	/* store lowbyte in zero page */
+.br
+"iny"		/* increment offset of multiplier */
+.br
+"lda (LBl),y"	/* load indirect highbyte of multiplier */
+.br
+"sta ARTH+3"	/* store highbyte in zero page */
+.br
+"jsr Mlinp"	/* perform the multiply */
+.br
+| %[a] | |	/* push result onto fake stack */
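+.PP
+The code of Mlinp itself is not reproduced here.
+As an assumed stand-in, the C sketch below uses the common
+shift-and-add method that an 8-bit processor without a multiply
+instruction typically relies on; mul16 is a hypothetical name and
+the actual routine may differ in detail.

```c
#include <stdint.h>

/* Shift-and-add 16 x 16 multiply, an illustration rather than the real
   Mlinp.  Only the low 16 bits of the product are kept, which is also
   the correct two's complement result for signed operands whenever the
   true product fits in 16 bits. */
static uint16_t mul16(uint16_t multiplicand, uint16_t multiplier)
{
    uint16_t product = 0;

    while (multiplier != 0) {
        if (multiplier & 1)          /* lowest bit set: add shifted value */
            product += multiplicand;
        multiplicand <<= 1;          /* like asl low byte, rol high byte */
        multiplier >>= 1;            /* like lsr high byte, ror low byte */
    }
    return product;
}
```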
+.NH 2
+The unsigned arithmetic instructions.
+.PP
+Since unsigned addition and subtraction are performed in the same way
+as signed addition and subtraction, these cases are dealt with by
+an EM replacement.
+For multiplication and division there are special subroutines.
+.NH 3
+Unsigned addition.
+.PP
+This is an example of the EM replacement strategy.
+.sp 1
+.br
+.B
+lol lol adu
+.R
+	| | | |
+.B
+lol
+.R
+$1
+.B
+lol
+.R
+$2
+.B
+adi
+.R
+$3 |
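+.PP
+The replacement is valid because, in two's complement, the bits of a sum
+do not depend on whether the operands are read as signed or unsigned.
+The small C sketch below (function names hypothetical) makes this
+concrete: both functions produce the same 16 bits for the same bit
+patterns.

```c
#include <stdint.h>

/* Unsigned view: the low 16 bits of a + b. */
static uint16_t add_unsigned(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Signed view: widen to int so the addition cannot overflow, then keep
   the low 16 bits, which is what the two-byte ADC sequence produces. */
static uint16_t add_signed_bits(int16_t a, int16_t b)
{
    return (uint16_t)((int)a + (int)b);
}
```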
+.NH 2
+Floating point arithmetic.
+.PP
+Floating point arithmetic isn't implemented in this table.
+.NH 2
+Pointer arithmetic instructions.
+.PP
+A typical pointer arithmetic instruction is
+.B
+adp
+.R
+2.
+This instruction adds an offset to a pointer.
+Its table entry is an EM replacement:
+.sp 1
+.br
+.B
+adp
+.R
+	| | | |
+.B
+loc
+.R
+$1
+.B
+adi
+.R
+2 |
+.NH 2
+Increment, decrement and zero instructions.
+.PP
+In this group a typical instruction is
+.B
+inl
+.R
+, which increments a local or parameter.
+The MCS6500 doesn't have an instruction to increment the
+accumulator A, so the 'ADC' instruction must be used.
+The table entry is:
+.sp 1
+.br
+.B
+inl
+.R
+IN($1) | |
+.br
+allocate(R16)	/* allocate register pair AX */
+.br
+"ldy #BASE+$1"	/* load Y with offset of the local */
+.br
+"clc"		/* clear carry for addition */
+.br
+"lda (LBl),y"	/* load indirect lowbyte of local */
+.br
+"adc #1"	/* increment lowbyte */
+.br
+"sta (LBl),y"	/* restore indirect the incremented lowbyte */
+.br
+"bcc 1f"	/* if carry is clear then done */
+.br 
+"iny"		/* increment offset of local */
+.br
+"lda (LBl),y"	/* load indirect highbyte of local */
+.br
+"adc #0"	/* add carry to highbyte */
+.br
+"sta (LBl),y\\n1:"  /* restore indirect the highbyte */
+.PP
+If the offset of the local or parameter is too big (IN($1) fails),
+the local or parameter is first fetched, then incremented, and then
+stored back.
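+.PP
+In C terms the in line sequence does roughly the following to a 2-byte
+local stored low byte first (the order implied by 'ldy #BASE+$1'
+followed by 'iny').
+The function name is hypothetical; the point is that the high byte is
+only touched when the low byte wraps around, which is what 'bcc 1f'
+achieves.

```c
#include <stdint.h>

/* Increment a 2-byte local held low byte first.  A carry out of the
   low byte happens only when it wraps from 0xFF to 0x00. */
static void inc16_in_memory(uint8_t local[2])
{
    local[0] = (uint8_t)(local[0] + 1);   /* clc; lda; adc #1; sta */
    if (local[0] != 0)                    /* no carry: done (bcc 1f) */
        return;
    local[1] = (uint8_t)(local[1] + 1);   /* iny; lda; adc #0; sta */
}
```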
+.NH 2
+Convert instructions.
+.PP
+Only two convert instructions in this group really do something.
+One of them generates in line code, and deals with the extension of
+a character (1 byte) to an integer.
+The other one is a subroutine which handles the conversion
+between 2-byte integers and 4-byte integers.
+.NH 3
+The in line conversion.
+.PP
+The table entry is:
+.sp 1
+.br
+.B
+loc loc cii
+.R
+$1==1 && $2==2 | R16 |
+.br
+"txa"		/* see if sign extension is needed */
+.br
+"bpl 1f"	/* there is no need for sign extension */
+.br
+"lda #0FFh"	/* sign extension here */
+.br
+"bne 2f"	/* conversion done */
+.br
+"1: lda #0\\n2:"	/* no sign extension here */
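+.PP
+The same sign extension written as a hypothetical C function: bit 7 of
+the byte in X decides whether the high byte in A becomes 0FFh or 0.

```c
#include <stdint.h>

/* Extend the byte in X to a word A:X, as the "loc loc cii" entry does:
   txa; bpl tests bit 7, and A becomes 0FFh or 0 accordingly. */
static uint16_t extend_byte(uint8_t x)
{
    uint8_t a = (x & 0x80) ? 0xFF : 0x00;
    return (uint16_t)((a << 8) | x);
}
```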
+.NH 2
+Logical instructions.
+.PP
+A typical instruction in this group is the logical
+.B
+and
+.R
+on two 2-byte words.
+The logical
+.B
+and
+.R
+on groups of more than two bytes (max 254)
+is also possible and uses a library subroutine.
+.NH 3
+The logical and on 2-byte groups.
+.PP
+The table entry is:
+.sp 1
+.br
+.B
+and
+.R
+$1==2 | R16 |	/* one group must be on the fake stack */
+.br
+"sta ARTH+1"	/* temporarily save first group highbyte */
+.br
+"stx ARTH"	/* temporarily save first group lowbyte */
+.br
+"jsr Pop"	/* pop second group from the stack */
+.br
+"and ARTH+1"	/* logical and on highbytes */
+.br
+"pha"		/* temporarily save the result's highbyte */
+.br
+"txa"		/* logical and can only be done in A */
+.br
+"and ARTH"	/* logical and on lowbytes */
+.br
+"tax"		/* move result's lowbyte to X */
+.br
+"pla"		/* restore result's highbyte */
+.br
+| %[1] | |	/* push result onto fake stack */
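+.PP
+Since the 6500 works on 8 bits at a time, an 'and' on a group of bytes
+is simply a byte-by-byte and.
+The C sketch below (hypothetical name and interface) shows the general
+case; the 2-byte entry above is this loop unrolled, and the library
+routine for larger groups presumably works in a similar way.

```c
#include <stddef.h>
#include <stdint.h>

/* dst = dst AND src, one byte at a time, for a group of n bytes. */
static void and_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] &= src[i];
}
```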
+.NH 2
+Set manipulation instructions.
+.PP
+A typical EM pattern in this group is
+.B
+loc inn zeq
+.R
+$1>0 && $1<16 && $2==2.
+This EM pattern works on sets of 16 bits.
+Sets can be bigger (max 256 bytes = 2048 bits), but then a
+library routine is used instead of in line code.
+The table entry for this EM pattern is:
+.sp 1
+.br
+.B
+loc inn zeq
+.R
+$1>0 && $1<16 && $2==2 | R16 |
+.br
+"ldy #$1+1"	/* load Y with bit number + 1 */
+.br
+"stx ARTH"	/* cannot rotate X, so use zero page */
+.br
+"1: lsr a"	/* right shift A */
+.br
+"ror ARTH"	/* right rotate zero page location */
+.br
+"dey"		/* decrement Y */
+.br
+"bne 1b"	/* shift $1+1 times, bit $1 ends up in carry */
+.br
+"bcc $3"	/* no carry, so bit is zero: branch */
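+.PP
+To see why the loop runs one more time than the bit number, here is a
+C model of the shift loop (the function name is hypothetical).
+A and ARTH together form a 16-bit word that is shifted right one place
+per pass, and the bit shifted out lands in the carry.

```c
#include <stdint.h>

/* The 16-bit set is in A (high byte) and ARTH (low byte, stored from X).
   After bit + 1 passes of "lsr a; ror ARTH" the carry holds the requested
   bit.  Valid for bit numbers 0 to 15. */
static int test_set_bit(uint16_t set, unsigned bit)
{
    uint8_t a = (uint8_t)(set >> 8);    /* high byte, in A */
    uint8_t arth = (uint8_t)set;        /* low byte, stored from X */
    int carry = 0;
    unsigned y;

    for (y = bit + 1; y != 0; y--) {    /* ldy #bit+1 ... dey; bne 1b */
        int from_a = a & 1;             /* lsr a: bit 0 of A into carry */
        a = (uint8_t)(a >> 1);
        carry = arth & 1;               /* ror ARTH: bit 0 into carry */
        arth = (uint8_t)((arth >> 1) | (from_a << 7)); /* old carry to bit 7 */
    }
    return carry;                       /* 0: bcc takes the branch */
}
```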
+.NH 2
+Array instructions.
+.PP
+In this group a typical EM pattern is
+.B
+lae lar
+.R
+defined(rom(1,3)) | | | |
+.B
+lae
+.R
+$1
+.B
+aar
+.R
+$2
+.B
+loi
+.R
+rom(1,3).
+This pattern uses the 
+.B
+aar
+.R
+instruction, which is part of a typical EM pattern:
+.sp 1
+.br
+.B
+lae aar
+.R
+$2==2 && rom(1,3)==2 && rom(1,1)==0 | R16 | /* register pair AX contains
+the index in the array */
+.br
+"pha"		/* save highbyte of index */
+.br
+"txa"		/* move lowbyte of index to A */
+.br
+"asl a"		/* shift left lowbyte == 2 times lowbyte */
+.br
+"tax"		/* restore lowbyte */
+.br
+"pla"		/* restore highbyte */
+.br
+"rol a"		/* rotate left highbyte == 2 times highbyte */
+.br
+| %[1] | adi 2 | /* push scaled index, then add it to the array address */
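+.PP
+A C model of the doubling step (hypothetical function name): 'asl a'
+moves bit 7 of the low byte into the carry, and 'rol a' shifts that
+carry into bit 0 of the high byte.

```c
#include <stdint.h>

/* Double the 16-bit index held as hi:lo, as the "lae aar" entry does. */
static uint16_t double_index(uint8_t hi, uint8_t lo)
{
    int carry = (lo & 0x80) != 0;                   /* asl a: bit 7 to carry */
    uint8_t new_lo = (uint8_t)(lo << 1);
    uint8_t new_hi = (uint8_t)((hi << 1) | carry);  /* rol a: carry to bit 0 */
    return (uint16_t)((new_hi << 8) | new_lo);
}
```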
+.NH 2
+Compare instructions.
+.PP
+In this group all EM patterns are performed by calling
+a subroutine.
+Subroutines are used here because comparison is only
+possible byte by byte.
+This takes a lot of code, and since compares are used frequently,
+in line code would take up much of the space left for the
+software stack.
+These subroutines can be found in the library.
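+.PP
+The actual library routines are not shown in this document.
+As an assumption about how a byte-by-byte compare of signed words
+can work, the C sketch below compares the high bytes as signed values
+first and the low bytes as unsigned values only when the high bytes are
+equal.
+It returns a negative, zero or positive result, like EM's cmi; cmp16 is
+a hypothetical name.

```c
#include <stdint.h>

/* Signed 16-bit compare done one byte at a time. */
static int cmp16(uint16_t x, uint16_t y)
{
    int8_t xh = (int8_t)(x >> 8), yh = (int8_t)(y >> 8);   /* signed */
    uint8_t xl = (uint8_t)x, yl = (uint8_t)y;              /* unsigned */

    if (xh != yh)
        return xh < yh ? -1 : 1;
    if (xl != yl)
        return xl < yl ? -1 : 1;
    return 0;
}
```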
+.NH 2
+Branch instructions.
+.PP
+A typical branch instruction is
+.B
+beq.
+.R
+Its table entry is:
+.sp 1
+.br
+.B
+beq
+.R
+| R16 |
+.br
+"sta BRANCH+1"	/* save highbyte second operand in zero page */
+.br
+"stx BRANCH"	/* save lowbyte second operand in zero page */
+.br
+"jsr Pop"	/* pop the first operand */
+.br
+"cmp BRANCH+1" 	/* compare the highbytes */
+.br
+"bne 1f"	/* they are not equal, so do not branch */
+.br
+"cpx BRANCH"	/* compare the lowbytes */
+.br
+"beq $1\\n1:"	/* lowbytes are also equal, so branch */
+.PP
+Another typical instruction in this group is
+.B
+zeq.
+.R
+The table entry is:
+.sp 1
+.br
+.B
+zeq
+.R
+| R16 |
+.br
+"tay"		/* move A to Y for setting testbits */
+.br
+"bne 1f"	/* highbyte is not zero, so do not branch */
+.br
+"txa"		/* move X to A for setting testbits */
+.br
+"beq $1\\n1:"	/* lowbyte also zero, thus branch */
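+.PP
+The conditions these two sequences have to test can be written out in C
+as follows (hypothetical names).
+Both work from the high byte down: as soon as the high bytes differ,
+or the high byte is not zero, the branch is not taken.

```c
#include <stdint.h>

/* beq: the high bytes are compared first, the low bytes only if those
   matched.  The word on the stack is hi2:lo2, the other one is in A:X. */
static int beq_taken(uint8_t a_hi, uint8_t x_lo, uint8_t hi2, uint8_t lo2)
{
    if (a_hi != hi2)         /* cmp; bne 1f: no branch */
        return 0;
    return x_lo == lo2;      /* cpx; beq $1 */
}

/* zeq: EM branches when the word is zero, i.e. both bytes are zero. */
static int zeq_taken(uint8_t a_hi, uint8_t x_lo)
{
    if (a_hi != 0)           /* high byte nonzero: no branch */
        return 0;
    return x_lo == 0;        /* low byte also zero: branch */
}
```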
+.NH 2
+Procedure call instructions.
+.PP
+In this group one piece of generated code may seem a little
+awkward.
+It is the EM instruction
+.B
+cai
+.R
+which generates a 'jsr Indir'.
+This is because the MCS6500 has no indirect jump-to-subroutine
+instruction.
+The only solution is to store the address in zero page, and then
+do a 'jsr' to a known label.
+At this label there must be an indirect jump instruction, which
+performs a jump to the address stored in zero page.
+In this case the label is Indir, and the address is stored in
+zero page at the addresses ADDR, ADDR+1.
+The table entry is:
+.sp 1
+.br
+.B
+cai
+.R
+| R16 |
+.br
+"stx ADDR"	/* store lowbyte of address in zero page */
+.br
+"sta ADDR+1"	/* store highbyte of address in zero page */
+.br
+"jsr Indir"	/* use the indirect jump */
+.br
+| | |
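+.PP
+The trick can be imitated in C with a function pointer stored in a
+global variable and a fixed trampoline routine that calls through it.
+The sketch is only an analogy; addr_slot, indir and cai are
+hypothetical names standing for ADDR, Indir and the cai code.

```c
typedef void (*proc_ptr)(void);

static proc_ptr addr_slot;     /* stands for zero page ADDR, ADDR+1 */
static int calls_made;         /* lets the example observe the calls */

/* Stands for Indir, whose only instruction is "jmp (ADDR)".  On the 6500
   that is a jump, so the callee's rts returns straight to the caller of
   Indir; in C it is an ordinary call, with the same visible effect. */
static void indir(void)
{
    addr_slot();
}

static void cai(proc_ptr target)
{
    addr_slot = target;        /* stx ADDR; sta ADDR+1 */
    indir();                   /* jsr Indir */
}

static void example_proc(void)
{
    calls_made++;
}
```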
+.NH 2
+Miscellaneous instructions.
+.PP
+In this group, as the name suggests, there is no
+typical EM instruction or EM pattern.
+Most of the MCS6500 code to be generated uses a library subroutine
+or is straightforward.
+.DS C
+.B
+PERFORMANCE.
+.R
+.DE
+.NH 0
+Introduction.
+.PP
+To measure the performance of the back end table, some timing
+tests were done.
+The execution times of several Pascal statements were measured,
+as well as those of C statements that have a Pascal equivalent.
+The statements were timed as follows.
+A test program was written which executes two
+nested for loops, each from 1 to 1,000.
+The statement to be tested is placed inside these loops,
+so it is executed 1,000,000 times.
+Then the same program is executed without the test statement.
+The time difference between the two executions is the time
+needed to execute the test statement 1,000,000 times.
+The time for a single execution of the test statement is thus the
+time difference divided by 1,000,000.
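+.PP
+The arithmetic of this method, as a small C function (hypothetical
+name, run times assumed to be in seconds).
+Because the statement runs 1,000,000 times, a difference of T seconds
+between the two runs corresponds to T microseconds per statement.

```c
/* Time of one execution of the test statement, in microseconds. */
static double per_statement_us(double seconds_with, double seconds_without)
{
    const double executions = 1000.0 * 1000.0;   /* two nested loops */
    double seconds_each = (seconds_with - seconds_without) / executions;
    return seconds_each * 1e6;                   /* seconds to microseconds */
}
```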
+.NH 0
+Testing Pascal statements.
+.PP
+The following statements were tested.
+.IP 1)
+int1 := 0;
+.IP 2)
+int1 := int2 - 1;
+.IP 3)
+int1 := int1 + 1;
+.IP 4)
+int1 := icon1 + icon2;
+.IP 5)
+int1 := icon2 div icon1;
+.IP 6)
+int1 := int2 * int3;
+.IP 7)
+bool := (int1 < 0);
+.IP 8)
+bool := (int1 < 3);
+.IP 9)
+bool := ((int1 > 3) or (int1 < 3))
+.IP 10)
+case int1 of 1: bool := false; 2: bool := true end;
+.IP 11)
+if int1 = 0 then int2 := 3;
+.IP 12)
+while int1 > 0 do int1 := int1 - 1;
+.IP 13)
+m := a[k];
+.IP 14)
+let2 := ['a'..'c'];
+.IP 15)
+P3(x);
+.IP 16)
+dum := F3(x);
+.IP 17)
+s.overhead := 5400;
+.IP 18)
+with s do overhead := 5400;
+.PP
+These statements were tested in a procedure test.
+.sp 1
+.br
+procedure test;
+.br
+var i, j, ... : integer;
+.br
+    bool : boolean;
+.br
+    let2 : set of char;
+.br
+begin
+.br
+    for i := 1 to 1000 do
+.br
+	for j := 1 to 1000 do
+.br
+	    STATEMENT
+.br
+end;
+.sp 1
+.PP
+STATEMENT is one of the statements shown above, or it is
+the empty statement.
+Variables used by the statement are assigned, if necessary, before
+the first for loop.
+For statement 15, the procedure call, a dummy procedure with an
+empty body is declared.
+For statement 16, the function call, the function simply
+returns its argument.
+For the timing of C statements a similar test program was
+written.
+.sp 1
+.br
+main()
+.br
+{
+.br
+    int i, j, ...;
+.br
+    for (i = 1; i <= 1000; i++)
+.br
+	for (j = 1; j <= 1000; j++)
+.br
+	    STATEMENT
+.br
+}
+.sp 1
+.NH
+The results.
+.PP
+Here are tables with the results of the time measurements.
+Times are in microseconds (10^-6 s).
+Some statements appear twice in the tables.
+In the second case an array of 200 integers was declared
+before the variable to be tested, so this variable cannot
+be accessed by indirect addressing from the second local base.
+This results in a larger execution time of the statement to be
+tested.
+The column 68000 contains the times measured on a Bleasdale
+M68000-based computer.
+The times in column pdp were measured on a DEC pdp11/44, and
+the times in column 6500 come from a BBC microcomputer.
+.bp
+.TS
+expand;
+c s s s
+c c c c
+lw35 nw7 nw7 nw7.
+Pascal timing results (microseconds)
+statement	68000	pdp	6500
+_
+T{
+int1 := 0;
+T}	4.0	5.8	16.7
+ 	4.0	4.2	97.8
+_
+T{
+int1 := int2 - 1;
+T}	7.2	7.1	27.2
+ 	6.9	7.1	206.5
+_
+T{
+int1 := int1 + 1;
+T}	6.9	6.8	27.2
+ 	6.4	6.7	106.5
+_
+T{
+int1 := icon1 + icon2;
+T}	6.2	6.2	25.6
+ 	6.2	6.0	106.6
+_
+T{
+int1 := icon2 div icon1;
+T}	14.9	14.3	372.6
+ 	14.9	14.7	453.7
+_
+T{
+int1 := int2 * int3;
+T}	11.5	12.0	558.1
+ 	11.3	11.6	728.6
+_
+T{
+bool := (int1 < 0);
+T}	7.2	6.9	122.8
+ 	7.8	8.1	453.2
+_
+T{
+bool := (int1 < 3);
+T}	7.3	7.6	126.0
+ 	7.2	8.1	232.2
+_
+T{
+bool := ((int1 > 3) or (int1 < 3))
+T}	10.1	12.0	307.8
+ 	10.2	11.9	440.1
+_
+T{
+case int1 of 1: bool := false; 2: bool := true end;
+T}	18.3	17.9	165.7
+_
+T{
+if int1 = 0 then int2 := 3;
+T}	9.5	8.5	133.8
+_
+T{
+while int1 > 0 do int1 := int1 - 1;
+T}	6.9	6.9	126.0
+_
+T{
+m := a[k];
+T}	7.2	6.8	134.3
+_
+T{
+let2 := ['a'..'c'];
+T}	38.4	38.8	447.4
+_
+T{
+P3(x);
+T}	18.9	18.8	180.3
+_
+T{
+dum := F3(x);
+T}	26.8	27.1	343.3
+_
+T{
+s.overhead := 5400;
+T}	4.6	4.1	16.7
+_
+T{
+with s do overhead := 5400;
+T}	4.2	4.3	16.7
+.TE
+.TS
+expand;
+c s s s
+c c c c
+lw35 nw7 nw7 nw7.
+C timing results (microseconds)
+statement	68000	pdp	6500
+_
+T{
+int1 = 0;
+T}	4.1	3.6	17.2
+ 	4.1	4.1	97.7
+_
+T{
+int1 = int2 - 1;
+T}	6.6	6.9	27.2
+ 	6.1	6.5	206.4
+_
+T{
+int1 = int1 + 1;
+T}	6.4	7.3	27.2
+ 	6.3	6.2	206.4
+_
+T{
+int1 = int2 * int3;
+T}	11.4	12.3	522.6
+	9.6	10.1	721.2
+_
+T{
+int1 = (int2 < 0);
+T}	7.2	7.6	126.4
+ 	7.4	7.7	232.5
+_
+T{
+int1 = (int2 < 3);
+T}	7.0	7.5	126.0
+ 	7.8	7.8	232.6
+_
+T{
+int1 = ((int2 > 3) || (int2 < 3));
+T}	11.8	12.2	193.4
+ 	11.5	13.2	245.6
+_
+T{
+switch (int1) { case 1: int1 = 0; break; case 2: int1 = 1; break; }
+T}	28.3	29.2	164.1
+_
+T{
+if (int1 == 0) int2 = 3;
+T}	4.8	4.8	19.4
+_
+T{
+while (int2 > 0) int2 = int2 - 1;
+T}	5.8	6.0	125.9
+_
+T{
+int2 = a[int2];
+T}	4.8	5.1	192.8
+_
+T{
+P3(int2);
+T}	18.8	18.4	180.3
+_
+T{
+int2 = F3(int2);
+T}	27.0	27.2	309.4
+_
+T{
+s.overhead = 5400;
+T}	5.0	4.1	16.7
+.TE
+.NH
+Pascal statements which don't have a C equivalent.
+.PP
+First, the two statements that perform an operation on constants
+are left out.
+They are left out because the C front end does constant folding,
+while the Pascal front end does not.
+So in C the statements int1 = icon1 + icon2; and int1 = icon2 / icon1;
+take the same time as an assignment of a constant,
+since the expression is evaluated by the front end.
+The two other statements (let2 := ['a'..'c']; and
+.B
+with
+.R
+s
+.B
+do
+.R
+overhead := 5400;) are not included in the C statement timing table,
+because these constructs do not exist in C.
+Although C's bit operations could be used to implement sets,
+I have not used them here.
+The
+.B
+with
+.R
+statement does not exist in C, and C has nothing with the slightest
+resemblance to it.
+.PP
+At first sight, the tables suggest that the times for the M68000
+and the pdp11/44 differ little, compared with the
+times needed by the MCS6500.
+To verify this impression, I calculated the correlation coefficient
+between the times of the M68000 and the pdp11/44.
+It turned out to be 0.997 for both the Pascal and the C
+timing tests.
+Since the correlation coefficient is close to one and the differences
+between the times are small, the two machines can be considered equal
+when compared with the MCS6500.
+I then plotted the times of the M68000 against those of
+the MCS6500.
+Taking all the times together, no correlation was visible.
+The only correlation one could see, with some effort, was in the
+times for the first three Pascal statements.
+The first two C statements also show a correlation, but two points
+always do.
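+.PP
+For reference, the correlation coefficient used here is presumably the
+usual Pearson coefficient.
+A C sketch of it follows (hypothetical names); a few Newton steps stand
+in for sqrt() so that the example needs no math library.
+Applied to the 68000 and 6500 columns of the Pascal size table in the
+length tests, it gives about 0.999, which agrees with the value
+reported there.

```c
#include <stddef.h>

/* Square root by Newton's method, started above the root so that the
   iteration converges from above; adequate for this illustration. */
static double newton_sqrt(double v)
{
    double g;
    int i;

    if (v <= 0.0)
        return 0.0;
    g = v > 1.0 ? v : 1.0;
    for (i = 0; i < 200; i++)
        g = 0.5 * (g + v / g);
    return g;
}

/* Pearson correlation coefficient of n paired samples (n >= 2). */
static double correlation(const double *x, const double *y, size_t n)
{
    double mx = 0.0, my = 0.0, sxy = 0.0, sxx = 0.0, syy = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= (double)n;
    my /= (double)n;
    for (i = 0; i < n; i++) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / newton_sqrt(sxx * syy);
}
```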
+.PP
+Also the three Pascal statements
+.B
+case
+.R
+,
+.B
+if
+.R
+,
+and
+.B
+while
+.R
+have a correlation coefficient of 0.999.
+This is probably because the
+.B
+case
+.R
+statement uses a subroutine in both cases and the other two
+statements
+.B
+if
+.R
+and
+.B
+while
+.R
+generate in line code.
+The last two Pascal statements take the same time on the MCS6500,
+since the front end generates the same EM code for both.
+.PP
+The lack of correlation among the rest of the test times arises because
+in these cases the object code for the MCS6500 uses library
+subroutines, while the other processors can handle the EM code
+with in line code.
+.PP
+Clearly the MCS6500 is a slower device: it needs longer
+execution times and more library subroutines, but
+there is no constant factor between its execution times and those
+of the other processors.
+.PP
+The slowing down of the MCS6500 as a result of the need for a
+library subroutine is illustrated by the multiplication
+statement.
+The MCS6500 needs a library subroutine, while the other
+two processors have a machine instruction to perform the
+multiply.
+This results in a factor of 48.5 relative to the M68000
+(558.1 against 11.5 microseconds in the Pascal table),
+when the MCS6500 can access the operands indirectly.
+When the MCS6500 cannot access the operands indirectly the situation
+is even worse: 728.6 against 11.3 microseconds, a factor of about 64.
+The slight differences between the MCS6500 execution times for
+Pascal statements and C statements are probably caused by the
+front ends, and are thus beyond the scope of this discussion.
+.PP
+Another timing test is done in C on the statement k = i + j + 1983.
+This statement is tested on many UNIX*
+.FS
+* UNIX is a Trademark of Bell Laboratories.
+.FE
+systems.
+For a complete list see appendix A.
+The slowest one is the IBM XT, which runs on an 8088 microprocessor.
+The fastest one is the Amdahl computer.
+Here is a short table to illustrate the performance of the
+MCS6500.
+.TS
+c c c
+c n n.
+machine	short	int
+IBM XT	53.4	53.4
+Amdahl	0.5	0.3
+MCS6500	150.2	150.2
+.TE
+The MCS6500 is about three times slower than the IBM XT, and about
+three hundred times slower than the Amdahl (five hundred times for int).
+The times on the IBM XT and the MCS6500 are the same for shorts
+and ints because most C compilers make the types short and int
+the same size on 16-bit machines.
+In this project the MCS6500 is regarded as a 16-bit machine.
+.NH
+Length tests.
+.PP
+I have also compiled several programs written in Pascal and C to
+see whether the numbers of bytes generated for the different
+machines are related.
+In the tables:
+.IP length: 9
+The number of bytes of the source program.
+.IP 68000:
+The number of bytes of the a.out file for an M68000.
+.IP pdp:
+The number of bytes of the a.out file for a pdp11/44.
+.IP 6500:
+The number of bytes of the a.out file for an MCS6500.
+.LP
+These are the results:
+.TS
+c s s s
+c c c c
+n n n n.
+Pascal programs
+length	68000	pdp	6500
+_
+19946	14383	16090	26710
+19484	20169	20190	35416
+10849	10469	11464	18949
+273	4221	5106	7944
+1854	5807	6610	10301
+.TE
+.TS
+c s s s
+c c c c
+n n n n.
+C programs
+length	68000	pdp	6500
+_
+9444	6927	8234	11559
+7655	14353	18240	26251
+4775	11309	15934	19910
+639	6337	9660	12494
+.TE
+.PP
+In contrast to the execution times of the test statements, the
+object file sizes show a roughly constant factor between the machines.
+After calculating the correlation coefficients, I calculated
+the straight line fitted between the sizes.
+.FS
+* x is the number of bytes of the object file for the first processor
+of the pair; the line gives the estimated size for the second.
+.FE
+.TS
+c s s
+c c c
+l c c.
+Pascal programs
+processor	corr. coef.	fitted line
+_
+68000-pdp	0.996	 
+68000-6500	0.999	1.76x + 502*
+pdp-6500	0.999	1.80x - 1577
+.TE
+.TS
+c s s
+c c c
+l c c.
+C programs
+processor	corr. coef.	fitted line
+_
+68000-pdp	0.974	 
+68000-6500	0.992	1.80x + 20*
+pdp-6500	0.980	1.40x - 679
+.TE
+.PP
+As the tables above show, the correlation coefficients for
+Pascal programs are higher than those for C programs,
+so the lines fit best for Pascal programs.
+With the formula of the fitted line one can estimate the size
+of the object code a program needs on an MCS6500 from its size
+on an M68000 or pdp11/44, without having the compiler at hand.
+These formulas also show that the object code
+generated for an MCS6500 is about 1.4 to 1.8 times larger than for the
+other processors.
+Since the number of bytes in the source file heavily depends on the
+programmer (how many spaces he or she uses, the size of the indenting
+in structured programs, etc.), there is no correlation between the
+size of the source file and the size of the object file.
+The use of comments also influences the size.
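+.PP
+The fitted lines are presumably ordinary least-squares lines.
+The C sketch below (hypothetical names) computes such a line; applied
+to the 68000 and 6500 columns of the Pascal table it gives a slope of
+about 1.76 and an intercept of about 502, matching the first table.
+An estimate for a new program is then slope * x + intercept, with x
+its object size on the M68000.

```c
#include <stddef.h>

/* Ordinary least-squares fit of y = slope * x + intercept (n >= 2). */
static void fit_line(const double *x, const double *y, size_t n,
                     double *slope, double *intercept)
{
    double mx = 0.0, my = 0.0, sxy = 0.0, sxx = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        mx += x[i];
        my += y[i];
    }
    mx /= (double)n;
    my /= (double)n;
    for (i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    *slope = sxy / sxx;
    *intercept = my - *slope * mx;
}
```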
+.bp
+.DS C
+.B
+SUMMARY.
+.R
+.DE
+.NH 0
+Summary
+.PP
+In this chapter some final conclusions are drawn.
+.PP
+In spite of its simplicity, the MCS6500 is strong enough to
+implement an EM machine.
+A serious deficiency of the MCS6500 is the lack of 16-bit
+general purpose registers, and especially the lack of a
+16-bit stack pointer.
+As pointed out before, one 16-bit register can be simulated
+by a pair of 8-bit registers: the accumulator A
+holds the highbyte, and the index register X holds the lowbyte
+of the word.
+For lack of a 16-bit stack pointer, zero page must be used to hold
+a stack pointer, and two subroutines (Push and Pop) are needed for
+manipulating the stack.
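+.PP
+A minimal C model of such a software stack, under stated assumptions:
+the 16-bit stack pointer is kept in two hypothetical zero page bytes
+SP_LO and SP_HI, the stack grows downwards, and a word is stored low
+byte first.
+None of these details is taken from the actual Push and Pop routines.

```c
#include <stdint.h>

static uint8_t memory[65536];            /* the 6500's 64K address space */
enum { SP_LO = 0x00, SP_HI = 0x01 };     /* illustrative zero page slots */

static uint16_t get_sp(void)
{
    return (uint16_t)(memory[SP_LO] | (memory[SP_HI] << 8));
}

static void set_sp(uint16_t sp)
{
    memory[SP_LO] = (uint8_t)sp;
    memory[SP_HI] = (uint8_t)(sp >> 8);
}

/* Push the word held in A (high byte) and X (low byte). */
static void push(uint8_t a, uint8_t x)
{
    uint16_t sp = (uint16_t)(get_sp() - 2);
    memory[sp] = x;
    memory[(uint16_t)(sp + 1)] = a;
    set_sp(sp);
}

/* Pop a word back into A and X. */
static void pop(uint8_t *a, uint8_t *x)
{
    uint16_t sp = get_sp();
    *x = memory[sp];
    *a = memory[(uint16_t)(sp + 1)];
    set_sp((uint16_t)(sp + 2));
}
```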
+.PP
+As the timing tests show, the simple instruction set of the
+MCS6500 forces the use of library subroutines.
+These library subroutines increase the execution time of
+programs.
+.PP
+The sizes of the object code files show a strong correlation,
+in contrast to the execution times.
+With this correlation one can estimate the size of a program
+that is to be run on an MCS6500.
+.bp
+.NH 0
+.B
+REFERENCES.
+.R
+.IP 1.
+Haddon, B.K., and Waite, W.M.
+Experience with the Universal Intermediate Language Janus.
+.B
+Software Practice & Experience 8
+.R
+,
+5 (Sept.-Oct. 1978), 601-616.
+.RS
+.PP
+An intermediate language for use with Algol 68, Pascal, etc.
+is described.
+The paper discusses some problems encountered and how they were
+dealt with.
+.RE
+.IP 2.
+Lowry, E.S., and Medlock, C.W. Object Code Optimization.
+.B
+Commun. ACM 12
+.R
+,
+(Jan. 1969), 13-22.
+.RS
+.PP
+A classical paper on global object code optimization.
+It covers data flow analysis, common subexpressions, code motion,
+register allocation and other techniques.
+.RE
+.IP 3.
+Osborn, A., Jacobson, S., and Kane, J. The MOS Technology MCS6500.
+.B
+An Introduction to Microcomputers ,
+.R
+Volume II, Some Real Products (June 1977) chap. 9.
+.RS
+.PP
+A hardware description of some real existing CPUs, such as
+the Zilog Z80, MCS6500, etc., is given in this book.
+.RE
+.IP 4.
+van Staveren, H.
+The table driven code generator from the Amsterdam Compiler Kit.
+Vrije Universiteit, Amsterdam, (July 11, 1983).
+.RS
+.PP
+The defining document for writing a back end table.
+.RE
+.IP 5.
+Steel, T.B., Jr. UNCOL: The Myth and the Fact. in
+.B
+Ann. Rev. Auto. Prog.
+.R
+Goodman, R. (ed.), vol. 2, (1960), 325-344.
+.RS
+.PP
+An introduction to the UNCOL idea by its originator.
+.RE
+.IP 6.
+Steel, T.B., Jr. A First Version of UNCOL.
+.B
+Proc. Western Joint Comp. Conf.
+.R
+,
+(1961), 371-377.
+.IP 7.
+Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren,
+H.
+A Practical Tool Kit for Making Portable Compilers.
+Informatica Rapport 74, Vrije Universiteit, Amsterdam, 1983.
+.RS
+.PP
+An overview on the Amsterdam Compiler Kit.
+.RE
+.IP 8.
+Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren,
+H.
+Description of an Experimental Machine Architecture for use with
+Block Structured Languages.
+Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
+.RS
+.PP
+The defining document for EM.
+.RE
+.IP 9.
+Tanenbaum, A.S. Structured Computer Organization.
+Prentice-Hall, (1976).
+.RS
+.PP
+In this book computers are described as a hierarchy of levels,
+with each one performing some well-defined function.
+.RE