From: keie Date: Mon, 17 Dec 1984 11:35:39 +0000 (+0000) Subject: *** empty log message *** X-Git-Tag: release-5-5~5856 X-Git-Url: https://git.ndcode.org/public/gitweb.cgi?a=commitdiff_plain;h=c69304401368deff4f0500a36d46c20d1f1ef1f0;p=ack.git *** empty log message *** --- diff --git a/doc/6500.doc b/doc/6500.doc new file mode 100644 index 000000000..aeef24a9a --- /dev/null +++ b/doc/6500.doc @@ -0,0 +1,2163 @@ +. \" $Header$" +.po +10 +.ND +.TL +.B +A backend table for the 6500 microprocessor +.R +.AU +Jan van Dalen +.AB +The backend table is part of the Amsterdam Compiler Kit (ACK). +It translates the intermediate language family EM to machine +code for the MCS6500 microprocessor family. +.AE +.PP +.bp +.NH +Introduction. +.PP +As more and more organizations acquire many micro and minicomputers, +the need for portable compilers is becoming more and more acute. +The present situation, in which each hardware vendor provides its +own compilers -- each with its own deficiencies and extensions, and +none of them compatible -- leaves much to be desired. +The ideal situation would be an integrated system containing +a family of (cross) compilers, each compiler accepting a standard +source language and producing code for a wide variety of target +machines. Furthermore, the compilers should be compatible, so programs +written in one language can call procedures written in another +language. Finally, the system should be designed so as to make +adding new languages and new machines easy. Such an integrated +system is being built at the Vrije Universiteit. +.PP +The compiler building system, which is called the "Amsterdam Compiler +Kit" (ACK), can be thought of as a "tool kit." It consists of +a number of parts that can be combined to form compilers (and +interpreters) with various properties. The tool kit is based +on an idea (UNCOL) that was first suggested in 1960 [5], +but which never really caught on then.
The problem which UNCOL +attempts to solve is how to make a compiler for each of +.B +N +.R +languages on +.B +M +.R +different machines without having to write +.B +N +.R +x +.B +M +.R +programs. +.PP +As shown in Fig. 1, the UNCOL approach is to write +.B +N +.R +"front ends," each of which translates +one source language to a common +intermediate language, UNCOL (UNiversal Computer Oriented +Language), and +.B +M +.R +"back ends," each of which translates programs +in UNCOL to a specific machine language. Under these conditions, +only +.B +N +.R ++ +.B +M +.R +programs must be written to provide all +.B +N +.R +languages on all +.B +M +.R +machines, instead of +.B +N +.R +x +.B +M +.R +programs. +.PP +Various researchers have attempted to design a suitable UNCOL [1,6], +but none of these have become popular. It is the belief of the +designers of the Amsterdam Compiler Kit that previous attempts +have failed because they have been too ambitious, that is, they have +tried to cover all languages and all machines using a single UNCOL. +The approach of the designers is more modest: +they cater only to algebraic languages and machines whose memory +consists of 8-bit bytes, each with its own address. +Typical languages that could be handled include Ada, ALGOL 60, +ALGOL 68, BASIC, C, FORTRAN, Modula, Pascal, PL/I, PL/M, PLAIN and +RATFOR, whereas COBOL, LISP and SNOBOL would be less efficient. +Examples of machines that could be included are the Intel 8080 and +8086, Motorola 6800, 6809 and 68000, Zilog Z80 and Z8000, DEC PDP-11 +and Vax, MOS Technology MCS6500 family and IBM but not the Burroughs +6700, CDC Cyber or Univac 1108 (because they are not byte-oriented). +With these restrictions the designers believe that the old UNCOL +idea can be used as the basis of a practical compiler-building +system. +.sp 10 +.bp +.NH +An overview of the Amsterdam Compiler Kit +.PP +The tool kit consists of eight components: +.IP 1. +The preprocessor. +.IP 2. +The front ends. +.IP 3.
+The peephole optimizer. +.IP 4. +The global optimizer. +.IP 5. +The back end. +.IP 6. +The target machine optimizer. +.IP 7. +The universal assembler/linker. +.IP 8. +The utility package. +.PP +A fully optimizing compiler, depicted in Fig. 2, has seven cascaded +phases. Conceptually, each component reads an input file and writes +a transformed output file to be used as input to the next component. +In practice, some components may use temporary files to allow +multiple passes over the input or internal intermediate files. +.sp 20 +.PP +In the following paragraphs a brief description of each component +is given. +A more detailed description of the back end will be given in the +rest of this document. For a more detailed description of the rest +of the components see [7]. A program to be compiled is first fed +into the (language independent) preprocessor, which provides a +simple macro facility and similar textual facilities. +The preprocessor's output is a legal program in one of the programming +languages supported, whereas the input is a program possibly +augmented with macros, etc. +.PP +This output goes into the appropriate front end, whose job it is to +produce intermediate code. +This intermediate code (the UNCOL of ACK) is the machine language +for a simple stack machine EM (Encoding Machine). +A typical front end might build a parse tree from the input +and then use the parse tree to generate EM code, +which is similar to reverse Polish. +In order to perform this work, the front end has to maintain +tables of declared variables, labels, etc., determine where +to place the data structures in memory and so on. +.PP +The EM code generated by the front end is fed into the peephole +optimizer, which scans it with a window of a few instructions, +replacing certain inefficient code sequences by better ones. +Such a search is important because EM contains instructions to +handle numerous important special cases efficiently +(e.g.
incrementing a variable by 1). +It is our strategy to relieve the front ends of the burden +of hunting for special cases because there are many front ends +and just one peephole optimizer. +By handling the special cases in the peephole optimizer, +the front ends become simpler, easier to write and easier to maintain. +.PP +Following the peephole optimizer is a global optimizer [2], +which, unlike the peephole optimizer, examines the program as a whole. +It builds a data flow graph to make possible a variety of global +optimizations, among them, moving invariant code out of loops, +avoiding redundant computations, live/dead analysis and +eliminating tail recursion. +Note that the output of the global optimizer is still EM code. +.PP +Next comes the back end, which differs from the front ends in a +fundamental way. +Each front end is a separate program, whereas the back end is a +single program that is driven by a machine dependent driving table. +The driving table for a specific machine tells how EM code is +mapped onto the machine's assembly language. +Although a simple driving table might just macro expand each +EM instruction into a sequence of target machine instructions, +a much more sophisticated translation strategy is normally used, +as described later. +For speed, the back end does not actually read in the driving +table at run time. +Instead, the tables are compiled along with the back end in advance, +resulting in one binary program per machine. +.PP +The output of the back end is a program in the assembly language +of some particular machine. +The next component in the pipeline reads this program and performs +peephole optimization on it. +The optimizations performed here involve idiosyncrasies of the +target machine that cannot be performed by the machine-independent +EM-to-EM peephole optimizer. +Typically these optimizations take advantage of the special +instructions or special addressing modes.
+.PP +The optimized target machine assembly code then goes into the final +component in the pipeline, the universal assembler/linker. +This program assembles the input to object format, extracting +routines from libraries and including them as needed. +.PP +The final component of the tool kit is the utility package, +which contains various test programs, interpreters for EM code, +EM libraries, conversion programs and other aids for the +implementer and user. +.bp +.DS C +.B +THE MCS6500 MICROPROCESSOR. +.R +.DE +.NH 0 +Introduction +.PP +Why a back end table for the MCS6500 microprocessor family? +Although the MCS6500 microprocessor family has a simple +instruction set and internal structure, it is used in a +variety of microcomputers and home computers. +This is because of its low cost. +As an example, the Apple II, a well-known and widespread +microcomputer, uses the MCS6502 CPU. +Also the BBC home computer, whose popularity is growing day +by day, uses the MCS6502 CPU. +The BBC home computer is based on the MCS6502 CPU although +better and stronger microprocessors are available. +The designers of Acorn computer Industries have probably +chosen the MCS6502 because of the amount of software +available for this CPU. +Because of its widespread use, a variety of software +will be needed for it. +One can think of games, administration programs, +teaching programs, BASIC interpreters and other application +programs. +Even though it will not be possible to run the total compiler kit +on an MCS6500 based computer, it is possible to write application +programs in a high level language, such as Pascal or C, on a +minicomputer. +These application programs can be tested and compiled on that +minicomputer and put in a ROM (Read Only Memory), for example, +so that they can be executed by an MCS6500 CPU. +The strategy of writing test programs on a minicomputer, +compiling them and then executing them on an MCS6500 based +microcomputer was used in the development of the back end.
+The minicomputer used is an M68000 based one, manufactured by +Bleasdale Computer Systems Ltd. +The micro- or home computer used is a BBC microcomputer, +manufactured by Acorn Computer Ltd. +.NH +The MOS Technology MCS6500 +.PP +The MCS6500 is a family of CPU devices developed by MOS +Technology. +The members of the MCS6500 family are the same chip in +different housings. +The MCS6502, the big brother in the family, can handle 64k +bytes of memory, while for example the MCS6504 can only handle +8k bytes of memory. +This difference is due to the fact that the MCS6502 is in a +40-pin housing and the MCS6504 is in a 28-pin housing, so fewer +address lines are available. +.bp +.NH +The MCS6500 CPU programmable registers +.PP +The MCS6500 series is based on the same chip, so all members have the +same programmable registers. +.sp 9 +.NH 2 +The accumulator A. +.PP +The accumulator A is the only register on which the arithmetic +and logical instructions can be used. +For example, the instruction ADC (add with carry) adds the +contents of the accumulator A and a byte from memory or immediate data. +.NH 2 +The index register X. +.PP +As the name suggests this register can be used for some +indirect addressing modes. +The modes are explained below. +.NH 2 +The index register Y. +.PP +This register is, just as the index register X, used for +certain indirect addressing modes. +These addressing modes are different from the modes which +use index register X. +.NH 2 +The program counter PC +.PP +This is the only 16-bit register available. +It is used to point to the next instruction to be +carried out. +.NH 2 +The stack pointer SP +.PP +The stack pointer is an 8-bit register, so the stack can contain +at most 256 bytes. +The CPU always uses 00000001 as the highbyte of any stack address, +which means that memory locations +.B +0100 +.R +through +.B +01FF +.R +are permanently assigned to the stack.
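As a small illustration of the stack pointer description above, the following Python sketch computes the memory address that corresponds to a given stack pointer value. The names STACK_PAGE and stack_address are made up for this example and do not occur in the back end.

```python
# Illustrative model only; STACK_PAGE and stack_address are hypothetical names.
STACK_PAGE = 0x01  # highbyte the CPU supplies for every stack access


def stack_address(sp):
    """Return the 16-bit memory address selected by the 8-bit stack pointer."""
    if not 0 <= sp <= 0xFF:
        raise ValueError("the stack pointer is an 8-bit register")
    return (STACK_PAGE << 8) | sp
```

Every possible stack pointer value therefore maps into the page 0100 through 01FF, which is why the hardware stack cannot grow beyond 256 bytes.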
+.sp 12 +.NH 2 +The status register +.PP +The status register maintains six status flags and a master +interrupt control bit. +.br +These are the six status flags: + Carry (c) + Zero (z) + Overflow (o) + Sign (n) + Decimal mode (d) + Break (b) + + + + + +The bit (i) is the master interrupt control bit. +.NH +The MCS6500 memory layout. +.PP +In the MCS6500 memory space three areas have special meaning. +These areas are: +.IP 1) +Top page. +.IP 2) +Zero page. +.IP 3) +The stack. +.PP +MCS6500 memory is divided up into pages. +Each page consists of 256 bytes. +So in a memory address the highbyte denotes the page number +and the lowbyte the offset within the page. +.NH 2 +Top page. +.PP +When an MCS6500 is restarted it jumps indirect via memory address +.B +FFFC. +.R +At +.B +FFFC +.R +(lowbyte) and +.B +FFFD +.R +(highbyte) there must be the address of the bootstrap subroutine. +When a break instruction (BRK) occurs or an interrupt takes place, +the MCS6500 jumps indirect through memory address +.B +FFFE. +.R +.B +FFFE +.R +and +.B +FFFF +.R +thus must contain the address of the interrupt routine. +The former only goes for maskable interrupts. +There also exists a nonmaskable interrupt. +This causes the MCS6500 to jump indirect through memory address +.B +FFFA. +.R +So the top six bytes of memory are used by the operating system +and are therefore not available for the back end. +.NH 2 +Zero page. +.PP +This page has a special meaning in the sense that addressing +this page uses special opcodes. +Since a page consists of 256 bytes, only one byte is needed +for addressing zero page. +So an instruction which uses zero page occupies two bytes. +It also uses fewer clock cycles while carrying out the instruction. +Zero page is also needed when indirect addressing is used. +This means that when indirect addressing is used, the address must +reside in zero page (two consecutive bytes).
+In this case (the back end), zero page is used, for example, +to hold the local base, the second local base, the stack pointer +etc. +.NH 2 +The stack. +.PP +The stack is described in paragraph 3.5 about the MCS6500 +programmable registers. +.NH +The memory addressing modes +.PP +MCS6500 memory reference instructions use direct addressing, +indexed addressing, and indirect addressing. +.NH 2 +Direct addressing. +.PP +Three-byte instructions use the second and third bytes of the +object code to provide a direct 16-bit address: +therefore, 65,536 bytes of memory can be addressed directly. +The commonly used memory reference instructions also have a two-byte +object code variation, where the second byte directly addresses +one of the first 256 bytes. +.NH 2 +Base page, indexed addressing. +.PP +In this case, the instruction has two bytes of object code. +The contents of either the X or Y index registers are added to the +second object code byte in order to compute a memory address. +This may be illustrated as follows: +.sp 15 +Base page, indexed addressing, as illustrated above, is +wraparound - which means that there is no carry. +If the sum of the index register and second object code byte contents +is more than +.B +FF +.R +, the carry bit will be discarded. +This may be illustrated as follows: +.sp 9 +.NH 2 +Absolute indexed addressing. +.PP +In this case, the contents of either the X or Y register are added +to a 16-bit direct address provided by the second and third bytes +of an instruction's object code. +This may be illustrated as follows: +.sp 10 +.NH 2 +Indirect addressing. +.PP +Instructions that use simple indirect addressing have three bytes of +object code. +The second and third object code bytes provide a 16-bit address; +therefore, the indirect address can be located anywhere in +memory. +This is straightforward indirect addressing. +.NH 3 +Pre-indexed indirect addressing.
+.PP +In this case, the object code consists of two bytes and the +second object code byte provides an 8-bit address. +Instructions that use pre-indexed indirect addressing add the contents +of the X index register and the second object code byte to access +a memory location in the first 256 bytes of memory, where the +indirect address will be found: +.sp 18 +When using pre-indexed indirect addressing, once again wraparound +addition is used, which means that when the X index register contents +are added to the second object code byte, any carry will be discarded. +Note that only the X index register can be used with pre-indexed +addressing. +.NH 3 +Post-indexed indirect addressing. +.PP +In this case, the object code consists of two bytes and the +second object code byte provides an 8-bit address. +Now the second object code byte identifies a location +in the first 256 bytes of memory where an indirect address +will be found. +The contents of the Y index register are added to this indirect +address. +This may be illustrated as follows: +.sp 18 +Note that only the Y index register can be used with post-indexed +indirect addressing. +.bp +.NH +What the CPU has and doesn't have. +.PP +Although the designers of the MCS6500 CPU family state that +there is nothing very significant about the short stack (only +256 bytes), this stack caused problems for the back end. +The designers say that a 256-byte stack usually is sufficient +for any typical microcomputer; this is only true if the stack +is used only for return addresses of the JSR (jump to +subroutine) instruction. +But since the EM machine is supposed to be a stack machine and +high level languages need parameters and +locals in their procedures and functions, this short stack +is insufficient. +So a software stack is implemented in this back end, requiring two +additional subroutines for stack handling.
+These two stack handling subroutines slow down the execution +of a program since the stack is used heavily. +.PP +Since parameters and locals of EM procedures are addressed by an offset +from the local base of that procedure, indirect addressing +is heavily used. +Offsets are positive (for parameters) and negative (for +local variables). +As explained before, the MCS6500 has a +post-indexed indirect addressing mode. +This addressing mode can only handle positive offsets. +This raises a problem for accessing the local variables. +I have chosen the following solution. +A second local base is introduced. +This second local base is the real local base minus +a constant BASE. +In the present situation of the back end the value of BASE +is 240. +This means that there are 240 bytes reserved for local +variables to be addressed indirectly and 14 bytes for +the parameters. +.DS C +.B +THE CODE GENERATOR. +.R +.DE +.NH 0 +Description of the machine table. +.PP +The machine description table consists of the following sections: +.IP 1. +The macro definitions. +.IP 2. +Constant definitions. +.IP 3. +Register definitions. +.IP 4. +Token definitions. +.IP 5. +Token expressions. +.IP 6. +Code rules. +.IP 7. +Move definitions. +.IP 8. +Test definitions. +.IP 9. +Stack definitions. +.NH 2 +Macro definitions. +.PP +The macro definitions at the top of the table are expanded +by the preprocessor on occurrence in the rest of the table. +.NH 2 +Constant definitions. +.PP +There are three constants which must be defined first. +They are: +.IP EM_WSIZE: 11 +Number of bytes in a machine word. +This is the number of bytes a simple +.B +loc +.R +instruction will put on the stack. +.IP EM_PSIZE: +Number of bytes in a pointer. +This is the number of bytes a +.B +lal +.R +instruction will put on the stack. +.IP EM_BSIZE: +Number of bytes in the hole between AB and LB. +The calling sequence only saves LB on the stack so this +constant is equal to the pointer size.
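The second local base introduced earlier in this text can be made concrete with a short Python sketch. It assumes that a local or parameter is reachable by post-indexed indirect addressing exactly when every byte of it gets a Y index between 0 and 255 relative to the second local base. The function names y_index and in_range are hypothetical, and the real definition of the IN macro used in the table is not shown in this document, so this is only a plausible reading of it.

```python
# Sketch under stated assumptions; y_index and in_range are hypothetical names.
BASE = 240  # value of BASE given in the text


def y_index(offset):
    """Y register value that reaches `offset` (relative to the real local
    base) when indexing from the second local base (real local base - BASE)."""
    return BASE + offset


def in_range(offset, size=2):
    """True if all `size` bytes at `offset` can be reached with an 8-bit Y."""
    first = y_index(offset)
    last = first + size - 1
    return first >= 0 and last <= 0xFF
```

With BASE equal to 240, a 2-byte word is reachable for offsets from -240 up to and including 14, which is in line with the 240 bytes for locals mentioned in the text.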
+.NH 1 +Register definitions. +.PP +The only important register definition is the definition of +the registerpair AX. +Since the rest of the machine's registers Y, PC, ST serve +special purposes, the code generator cannot use them. +.NH 2 +Token definitions +.PP +There is a fake token. +This token is put in the table, since the code generator generator +complains if it cannot find one. +.NH 2 +Token expression definitions. +.PP +The token expression is also a fake one. +This token expression is put in the table, since the code generator +generator complains if it cannot find one. +.NH 2 +Code rules. +.PP +The code rule section is the largest section in the table. +The code rules specify EM patterns, stack patterns, code to be generated, +etc. +The syntax is: +.IP code rule: +EM pattern '|' stack pattern '|' code '|' +stack replacement '|' EM replacement '|' +.PP +All patterns are optional; however, there must be at least one +pattern present. +If the EM pattern is missing the rule becomes a rewriting +rule or a +.B +coercion +.R +to be used when code generation cannot continue because of an +invalid stack pattern. +The code rules are preceded by the word CODE:. +.NH 3 +The EM pattern. +.PP +The EM pattern consists of a list of EM mnemonics followed by +a boolean expression. Examples: +.sp 1 +.br +.B +loe +.R +.sp 1 +will match a single +.B +loe +.R +instruction, +.sp 1 +.br +.B +loc loc cif +.R +$1==2 && $2==8 +.sp 1 +is a pattern that will match +.sp 1 +.br +.B +loc +.R +2 +.br +.B +loc +.R +8 +.br +.B +cif +.R +.sp 1 +and +.sp 1 +.br +.B +lol +inc +stl +.R +$1==$3 +.sp 1 +will match for example +.sp 1 +.br +.B +lol +.R +6 +.br +.B +inc +.R +.br +.B +stl +.R +6 +.sp 1 +A missing boolean expression evaluates to TRUE. +.PP +The code generator will match the longest EM pattern on every occasion; +if two patterns of the same length match, the first in the table +will be chosen, while all patterns of length greater than or equal +to three are considered to be of the same length.
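The selection rule described above (longest pattern wins, ties go to the first table entry, and lengths of three or more count as equal) can be sketched in Python as follows. This is a simplified model, not the code generator's actual implementation: the representation of rules as (mnemonics, condition) pairs and the names effective_length and select_rule are assumptions made for this example.

```python
# Hypothetical model of rule selection; not the code generator's real data structures.
def effective_length(mnemonics):
    """Patterns of three or more mnemonics all count as length three."""
    return min(len(mnemonics), 3)


def select_rule(rules, instructions):
    """Return the index of the rule chosen for the head of `instructions`.

    `rules` is a list of (mnemonics, condition) pairs in table order, where
    `condition` is None (TRUE) or a function of the argument list.
    `instructions` is a list of (mnemonic, argument) pairs.
    Returns None if no rule matches.
    """
    best = None
    best_len = 0
    for index, (mnemonics, condition) in enumerate(rules):
        n = len(mnemonics)
        if n > len(instructions):
            continue
        head = instructions[:n]
        if [m for m, _ in head] != list(mnemonics):
            continue
        args = [a for _, a in head]
        if condition is not None and not condition(args):
            continue
        # Strictly greater: on a tie the earlier table entry is kept.
        if effective_length(mnemonics) > best_len:
            best, best_len = index, effective_length(mnemonics)
    return best
```

In this model a four-instruction pattern does not beat an earlier three-instruction pattern, because both have effective length three and the earlier table entry is kept.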
+.NH 3 +The stack pattern. +.PP +The only stack pattern that can occur is R16, which means that the +registerpair AX contains the word on top of the stack. +If this is not the case a coercion occurs. +This coercion generates a "jsr Pop", which means that the top +of the stack is popped and stored in the registerpair AX. +.NH 3 +The code part. +.PP +The code part consists of three parts, stack cleanup, register +allocation, and code to be generated. +All of these may be omitted. +.NH 4 +Stack cleanup. +.PP +When generating something like a branch instruction it might be +needed to empty the fake stack, that is, remove the AX registerpair. +This is done by the instruction remove(ALL). +.NH 4 +Register allocation. +.PP +If the machine code to be generated uses the registerpair AX, +this is signaled to the code generator by the allocate(R16) +instruction. +If the registerpair AX resides on the fake stack, this will result +in a "jsr Push", which means that the registerpair AX is pushed on +the stack and will be free for further use. +If registerpair AX is not on the fake stack nothing happens. +.NH 4 +Code to be generated. +.PP +Code to be generated is specified as a list of items of the following +kind: +.IP 1) +A string in double quotes ("This is a string"). +This is copied to the codefile and a newline ('\n') is appended. +Inside the string all normal C string conventions are allowed, +and substitutions can be made of the following sorts. +.RS +.IP a) +$1, $2 etc. These are the operands of the corresponding EM +instructions and are printed according to their type. +To put a real '$' inside the string it must be doubled ('$$'). +.IP b) +%[1], %[2.reg], %[b.1] etc. These have their obvious meaning. +If they describe a complete token (%[1]) the printformat for +the token is used. +If they stand for a basic term in an expression they will be +printed according to their type. +To put a real '%' inside the string it must be doubled ('%%'). +.IP c) +%( arbitrary expression %).
This allows inclusion of arbitrary +expressions inside strings. +This is not needed very often, so that the awkward notation +is not too bad. +Note that %(%[1]%) is equivalent to %[1]. +.RE +.NH 3 +Stack replacement. +.PP +The stack replacement is a possibly empty list of items to be +pushed on the fake stack. +Three things can occur: +.IP 1) +%[1] is used if the registerpair AX was on the fake stack and is +to be pushed back onto it. +.IP 2) +%[a] is used if the registerpair AX is allocated with allocate(R16) +and is to be pushed onto the fake stack. +.IP 3) +It can also be empty. +.NH 3 +EM replacement. +.PP +In exceptional cases it might be useful to leave part of an EM +pattern undone. +For example, a +.B +sdl +.R +instruction might be split into two +.B +stl +.R +instructions when there is no 4-byte quantity on the stack. +The EM replacement part allows one to express this. +Example: +.sp 1 +.br +.B +stl +.R +$1 +.B +stl +.R +$1+2 +.sp 1 +The instructions are inserted in the stream so they can match +the first part of a pattern in the next step. +Note that since the code generator traverses the EM instructions +in a strict linear fashion, it is impossible to let the EM +replacement match later parts of a pattern. +So if there is a pattern +.sp 1 +.br +.B +loc +stl +.R +$1==0 +.sp 1 +and the input is +.sp 1 +.br +.B +loc +.R +0 +.B +sdl +.R +4 +.sp 1 +the +.B +loc +.R +0 +will be processed first, then the +.B +sdl +.R +might be split into two +.B +stl +.R +'s but the pattern cannot match now. +.NH 3 +Move definitions. +.PP +This definition is a fake. This definition is put in the +table, since the code generator generator complains if it +cannot find one. +.NH 3 +Test definitions. +.PP +Test definitions aren't used by the table. +.NH 3 +Stack definitions. +.PP +When the generator has to push the registerpair AX, it must +know how to do so. +The machine code to be generated is defined here. +.NH 1 +Some remarks.
+.PP +The above description of the machine table is +a description of the table for the MCS6500. +It uses only a part of the possibilities which the code generator +generator offers. +For a more precise and detailed description see [4]. +.DS C +.B +THE BACK END TABLE. +.R +.DE +.NH 0 +Introduction. +.PP +The code rules are divided into 15 groups. +These groups are: +.IP 1. +Load instructions. +.IP 2. +Store instructions. +.IP 3. +Integer arithmetic instructions. +.IP 4. +Unsigned arithmetic instructions. +.IP 5. +Floating point arithmetic instructions. +.IP 6. +Pointer arithmetic instructions. +.IP 7. +Increment, decrement and zero instructions. +.IP 8. +Convert instructions. +.IP 9. +Logical instructions. +.IP 10. +Set manipulation instructions. +.IP 11. +Array instructions. +.IP 12. +Compare instructions. +.IP 13. +Branch instructions. +.IP 14. +Procedure call instructions. +.IP 15. +Miscellaneous instructions. +.PP +From each of these groups one or two typical EM patterns will be explained +in the next paragraphs. +Comments are placed between /* and */ (/* This is a comment */). +.NH +The instructions. +.NH 2 +The load instructions. +.PP +In this group a typical instruction is +.B +lol +.R +. +A +.B +lol +.R +instruction pushes the word at local base + offset, where offset +is the instruction's argument, onto the stack. +Since the MCS6500 can only offset by 256 bytes, as explained in the section on +memory addressing modes, there is a need for two code rules in the +table. +One which can offset directly and one that must explicitly +calculate the address of the local. +.NH 3 +The lol instruction with indirect offsetting. +.PP +In this case an indirect load with an offset from the second local base +is possible.
+The table content is: +.sp 1 +.br +.B +lol +.R +IN($1) | | +.br +allocate(R16) /* allocate registerpair AX */ +.br +"ldy #BASE+$1" /* load Y with the offset from the second +.br + local base */ +.br +"lda (LBl),y" /* load indirect the lowbyte of the word */ +.br +"tax" /* move register A to register X */ +.br +"iny" /* increment register Y (offset) */ +.br +"lda (LBl),y" /* load indirect the highbyte of the word */ +.br +| %[a] | | /* push the word onto the fake stack */ +.NH 3 +The lol instruction whose offset is too big. +.PP +In this case, the library subroutine "Lol" is used. +This subroutine expects the offset in registerpair AX, then +calculates the address of the local or parameter, and loads +it into registerpair AX. +The table content is: +.sp 1 +.br +.B +lol +.R +| | +.br +allocate(R16) /* allocate registerpair AX */ +.br +"lda #[$1].h" /* load highbyte of offset into register A */ +.br +"ldx #[$1].l" /* load lowbyte of offset into register X */ +.br +"jsr Lol" /* perform the subroutine */ +.br +| %[a] | | /* push word onto the fake stack */ +.NH 2 +The store instructions. +.PP +In this group a typical instruction is +.B +stl. +.R +A +.B +stl +.R +instruction pops a word from the stack and stores it in the word +at local base + offset, where offset is the instruction's argument. +Here too there is a need for two code rules in the table as a result +of the offset limits. +.NH 3 +The stl instruction with indirect offsetting. +.PP +In this case an indirect store with an offset from the second local +base is possible.
+The table content is: +.sp 1 +.br +.B +stl +.R +IN($1) | R16 | /* expect registerpair AX on top of the +.br + fake stack */ +.br +"ldy #BASE+1+$1" /* load Y with the offset from the +.br + second local base */ +.br +"sta (LBl),y" /* store the highbyte of the word from A */ +.br +"txa" /* move register X to register A */ +.br +"dey" /* decrement offset */ +.br +"sta (LBl),y" /* store the lowbyte of the word from A */ +.br +| | | +.NH 3 +The stl instruction whose offset is too big. +.PP +In this case the library subroutine 'Stl' is used. +This subroutine expects the offset in registerpair AX, then +calculates the address, pops the word and stores it in its place. +The table content is: +.sp 1 +.br +.B +stl +.R +| | +.br +allocate(R16) /* allocate registerpair AX */ +.br +"lda #[$1].h" /* load highbyte of offset in register A */ +.br +"ldx #[$1].l" /* load lowbyte of offset in register X */ +.br +"jsr Stl" /* perform the subroutine */ +.br +| | | +.NH 2 +Integer arithmetic instructions. +.PP +In this group typical instructions are +.B +adi +.R +and +.B +mli. +.R +These instructions, in this table, are implemented for 2-byte +and 4-byte integers. +The only arithmetic instructions available on the MCS6500 are +the ADC (add with carry), and SBC (subtract with not(carry)). +Not(carry) here means that in a subtraction, the one's complement +of the carry is taken. +The absence of multiply and division instructions forces the +use of subroutines to handle these cases. +Because there are no registers left on which to perform the multiplication +and division, zero page is used here. +The 4-byte integer arithmetic is implemented, because in C there +exists the integer type long. +A user is free to use the type long, but will pay in performance. +.NH 3 +The adi instruction. +.PP +In case of the +.B +adi +.R +2 (and +.B +sbi +.R +2) instruction there are many EM +patterns, so that the instruction can be performed in line in +most cases.
+For the worst case there exists a subroutine in the library +which deals with the EM instruction. +In case of an +.B +adi +.R +4 (or +.B +sbi +.R +4) there is only a subroutine to deal with it. +A table content is: +.sp 1 +.br +.B +lol lol adi +.R +(IN($1) && IN($2) && $3==2) | | /* is it in range */ +.br +allocate(R16) /* allocate registerpair AX */ +.br +"ldy #BASE+$1+1" /* load Y with offset for first operand */ +.br +"lda (LBl),y" /* load indirect highbyte first operand */ +.br +"pha" /* save highbyte first operand on hard_stack */ +.br +"dey" /* decrement offset first operand */ +.br +"lda (LBl),y" /* load indirect lowbyte first operand */ +.br +"ldy #BASE+$2" /* load Y with offset for second operand */ +.br +"clc" /* clear carry for addition */ +.br +"adc (LBl),y" /* add the lowbytes of the operands */ +.br +"tax" /* store lowbyte of result in place */ +.br +"iny" /* increment offset second operand */ +.br +"pla" /* get highbyte first operand */ +.br +"adc (LBl),y" /* add the highbytes of the operands */ +.br +| %[a] | | /* push the result onto the fake stack */ +.NH 3 +The mli instruction. +.PP +The +.B +mli +.R +2 instruction mostly uses the subroutine 'Mlinp'. +This subroutine expects the multiplicand in zero page +at locations ARTH, ARTH+1, while the multiplier is in zero +page locations ARTH+2, ARTH+3. +For a description of the algorithms used for multiplication and +division, see [9].
+A table content is:
+.sp 1
+.br
+.B
+lol lol mli
+.R
+(IN($1) && IN($2) && $3==2) | |
+.br
+allocate(R16) /* allocate registerpair AX */
+.br
+"ldy #BASE+$1" /* load Y with offset of multiplicand */
+.br
+"lda (LBl),y" /* load indirect lowbyte of multiplicand */
+.br
+"sta ARTH" /* store lowbyte in zero page */
+.br
+"iny" /* increment offset of multiplicand */
+.br
+"lda (LBl),y" /* load indirect highbyte of multiplicand */
+.br
+"sta ARTH+1" /* store highbyte in zero page */
+.br
+"ldy #BASE+$2" /* load Y with offset of multiplier */
+.br
+"lda (LBl),y" /* load indirect lowbyte of multiplier */
+.br
+"sta ARTH+2" /* store lowbyte in zero page */
+.br
+"iny" /* increment offset of multiplier */
+.br
+"lda (LBl),y" /* load indirect highbyte of multiplier */
+.br
+"sta ARTH+3" /* store highbyte in zero page */
+.br
+"jsr Mlinp" /* perform the multiply */
+.br
+| %[a] | | /* push result onto fake stack */
+.NH 2
+The unsigned arithmetic instructions.
+.PP
+Since unsigned addition and subtraction are performed in the same way
+as signed addition and subtraction, these cases are dealt with by
+an EM replacement.
+For multiplication and division there are special subroutines.
+.NH 3
+Unsigned addition.
+.PP
+This is an example of the EM replacement strategy.
+.sp 1
+.br
+.B
+lol lol adu
+.R
+ | | | |
+.B
+lol
+.R
+$1
+.B
+lol
+.R
+$2
+.B
+adi
+.R
+$3 |
+.NH 2
+Floating point arithmetic.
+.PP
+Floating point arithmetic isn't implemented in this table.
+.NH 2
+Pointer arithmetic instructions.
+.PP
+A typical pointer arithmetic instruction is
+.B
+adp
+.R
+2.
+This instruction adds an offset and a pointer.
+A table content is:
+.sp 1
+.br
+.B
+adp
+.R
+ | | | |
+.B
+loc
+.R
+$1
+.B
+adi
+.R
+2 |
+.NH 2
+Increment, decrement and zero instructions.
+.PP
+In this group a typical instruction is
+.B
+inl
+.R
+, which increments a local or parameter.
+The MCS6500 doesn't have an instruction to increment the
+accumulator A, so the 'ADC' instruction must be used.
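+.PP
+The carry handling that this forces on a 2-byte increment can be
+modelled in C.
+The sketch below is only an illustration: the name inc16_bytes is
+hypothetical, and the local is assumed to be stored lowbyte first,
+as in the other patterns of this table.

```c
#include <stdint.h>

/*
 * Hypothetical model of incrementing a 2-byte local with byte-wide
 * additions only. local[0] is the lowbyte, local[1] the highbyte.
 */
void inc16_bytes(uint8_t local[2])
{
    unsigned sum = local[0] + 1u;           /* add 1 to the lowbyte */

    local[0] = (uint8_t)sum;                /* store the lowbyte back */
    if (sum > 0xFF)                         /* carry out of the lowbyte */
        local[1] = (uint8_t)(local[1] + 1u);    /* add carry to highbyte */
}
```

+Starting from the bytes FFh, 00h the call leaves 00h, 01h; the highbyte
+is touched only when the lowbyte wraps around.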
+A table content is:
+.sp 1
+.br
+.B
+inl
+.R
+IN($1) | |
+.br
+allocate(R16) /* allocate registerpair AX */
+.br
+"ldy #BASE+$1" /* load Y with offset of the local */
+.br
+"clc" /* clear carry for addition */
+.br
+"lda (LBl),y" /* load indirect lowbyte of local */
+.br
+"adc #1" /* increment lowbyte */
+.br
+"sta (LBl),y" /* store the incremented lowbyte back indirectly */
+.br
+"bcc 1f" /* if carry is clear then ready */
+.br
+"iny" /* increment offset of local */
+.br
+"lda (LBl),y" /* load indirect highbyte of local */
+.br
+"adc #0" /* add carry to highbyte */
+.br
+"sta (LBl),y\\n1:" /* store the highbyte back indirectly */
+.PP
+If the offset of the local or parameter is too big, the
+local or parameter is first fetched, then incremented, and then
+stored back.
+.NH 2
+Convert instructions.
+.PP
+In this group there are two convert instructions
+which really do something.
+One of them is in line code, and deals with the extension of
+a character (1-byte) to an integer.
+The other one is a subroutine which handles the conversion
+between 2-byte integers and 4-byte integers.
+.NH 3
+The in line conversion.
+.PP
+The table content is:
+.sp 1
+.br
+.B
+loc loc cii
+.R
+$1==1 && $2==2 | R16 |
+.br
+"txa" /* see if sign extension is needed */
+.br
+"bpl 1f" /* there is no need for sign extension */
+.br
+"lda #0FFh" /* sign extension here */
+.br
+"bne 2f" /* conversion ready */
+.br
+"1: lda #0\\n2:" /* no sign extension here */
+.NH 2
+Logical instructions.
+.PP
+A typical instruction in this group is the logical
+.B
+and
+.R
+on two 2-byte words.
+The logical
+.B
+and
+.R
+on groups of more than two bytes (max 254)
+is also possible and uses a library subroutine.
+.NH 3
+The logical and on 2-byte groups.
+.PP
+The table content is:
+.sp 1
+.br
+.B
+and
+.R
+$1==2 | R16 | /* one group must be on the fake stack */
+.br
+"sta ARTH+1" /* temporary save of first group highbyte */
+.br
+"stx ARTH" /* temporary save of first group lowbyte */
+.br
+"jsr Pop" /* pop second group from the stack */
+.br
+"and ARTH+1" /* logical and on highbytes */
+.br
+"pha" /* temporarily save the result's highbyte */
+.br
+"txa" /* logical and can only be done in A */
+.br
+"and ARTH" /* logical and on lowbytes */
+.br
+"tax" /* restore the result's lowbyte */
+.br
+"pla" /* restore the result's highbyte */
+.br
+| %[1] | | /* push result onto fake stack */
+.NH 2
+Set manipulation instructions.
+.PP
+A typical EM pattern in this group is
+.B
+loc inn zeq
+.R
+$1>0 && $1<16 && $2==2.
+This EM pattern works on sets of 16 bits.
+Sets can be bigger (max 256 bytes = 2048 bits), but then a
+library routine is used instead of in line code.
+The table content of the above EM pattern is:
+.sp 1
+.br
+.B
+loc inn zeq
+.R
+$1>0 && $1<16 && $2==2 | R16 |
+.br
+"ldy #$1+1" /* load Y with bit number */
+.br
+"stx ARTH" /* cannot rotate X, so use zero page */
+.br
+"1: lsr a" /* right shift A */
+.br
+"ror ARTH" /* right rotate zero page location */
+.br
+"dey" /* decrement Y */
+.br
+"bne 1b" /* shift $1 times */
+.br
+"bcc $1" /* no carry, so bit is zero */
+.NH 2
+Array instructions.
+.PP
+In this group a typical EM pattern is
+.B
+lae lar
+.R
+defined(rom(1,3)) | | | |
+.B
+lae
+.R
+$1
+.B
+aar
+.R
+$2
+.B
+loi
+.R
+rom(1,3).
+This pattern uses the
+.B
+aar
+.R
+instruction, which is part of a typical EM pattern:
+.sp 1
+.br
+.B
+lae aar
+.R
+$2==2 && rom(1,3)==2 && rom(1,1)==0 | R16 | /* registerpair AX contains
+the index in the array */
+.br
+"pha" /* save highbyte of index */
+.br
+"txa" /* move lowbyte of index to A */
+.br
+"asl a" /* shift left lowbyte == 2 times lowbyte */
+.br
+"tax" /* restore lowbyte */
+.br
+"pla" /* restore highbyte */
+.br
+"rol a" /* rotate left highbyte == 2 times highbyte */
+.br
+| %[1] | adi 2 | /* push new index, add to lowerbound array */
+.NH 2
+Compare instructions.
+.PP
+In this group all EM patterns are performed by calling
+a subroutine.
+Subroutines are used here because comparison is only
+possible byte by byte.
+This means a lot of code, and since comparisons are used frequently,
+a lot of in line code would be generated, thus reducing
+the space left for the software stack.
+These subroutines can be found in the library.
+.NH 2
+Branch instructions.
+.PP
+A typical branch instruction is
+.B
+beq.
+.R
+The table content for it is:
+.sp 1
+.br
+.B
+beq
+.R
+| R16 |
+.br
+"sta BRANCH+1" /* save highbyte second operand in zero page */
+.br
+"stx BRANCH" /* save lowbyte second operand in zero page */
+.br
+"jsr Pop" /* pop the first operand */
+.br
+"cmp BRANCH+1" /* compare the highbytes */
+.br
+"bne 1f" /* they are not equal, so go on */
+.br
+"cpx BRANCH" /* compare the lowbytes */
+.br
+"beq $1\\n1:" /* lowbytes are also equal, so branch */
+.PP
+Another typical instruction in this group is
+.B
+zeq.
+.R
+The table content is:
+.sp 1
+.br
+.B
+zeq
+.R
+| R16 |
+.br
+"tay" /* move A to Y for setting testbits */
+.br
+"bmi $1" /* highbyte is minus so branch */
+.br
+"txa" /* move X to A for setting testbits */
+.br
+"beq $1\\n1:" /* lowbyte also zero, thus branch */
+.NH 2
+Procedure call instructions.
+.PP
+In this group one code generation might seem a little
+awkward.
+It is the EM instruction
+.B
+cai
+.R
+which generates a 'jsr Indir'.
+This is because there is no indirect jump_subroutine in the
+MCS6500.
+The only solution is to store the address in zero page, and then
+do a 'jsr' to a known label.
+At this label there must be an indirect jump instruction, which
+performs a jump to the address stored in zero page.
+In this case the label is Indir, and the address is stored in
+zero page at the addresses ADDR, ADDR+1.
+The table content is:
+.sp 1
+.br
+.B
+cai
+.R
+| R16 |
+.br
+"stx ADDR" /* store lowbyte of address in zero page */
+.br
+"sta ADDR+1" /* store highbyte of address in zero page */
+.br
+"jsr Indir" /* use the indirect jump */
+.br
+| | |
+.NH 2
+Miscellaneous instructions.
+.PP
+In this group, as the name suggests, there is no
+typical EM instruction or EM pattern.
+Most of the MCS6500 code to be generated uses a library subroutine
+or is straightforward.
+.DS C
+.B
+PERFORMANCE.
+.R
+.DE
+.NH 0
+Introduction.
+.PP
+To measure the performance of the back end table some timing
+tests are done.
+What to time?
+In this case, the execution times of several Pascal statements
+are measured.
+C statements which have a Pascal equivalent are timed as well.
+The statements are timed as follows.
+A test program has been written, which executes two
+nested for_loops from 1 to 1000.
+Within these for_loops the statement which is to be tested is placed,
+so the statement will be executed 1,000,000 times.
+Then the same program is executed without the test statement.
+The time difference between the two executions is the time
+necessary to execute the test statement 1,000,000 times.
+The time to execute the test statement once is thus the
+time difference divided by 1,000,000.
+.NH 0
+Testing Pascal statements.
+.PP
+The following statements are tested.
+.IP 1)
+int1 := 0;
+.IP 2)
+int1 := int2 - 1;
+.IP 3)
+int1 := int1 + 1;
+.IP 4)
+int1 := icon1 - icon2;
+.IP 5)
+int1 := icon2 div icon1;
+.IP 6)
+int1 := int2 * int3;
+.IP 7)
+bool := (int1 < 0);
+.IP 8)
+bool := (int1 < 3);
+.IP 9)
+bool := ((int1 > 3) or (int1 < 3))
+.IP 10)
+case int1 of 1: bool := false; 2: bool := true end;
+.IP 11)
+if int1 = 0 then int2 := 3;
+.IP 12)
+while int1 > 0 do int1 := int1 - 1;
+.IP 13)
+m := a[k];
+.IP 14)
+let2 := ['a'..'c'];
+.IP 15)
+P3(x);
+.IP 16)
+dum := F3(x);
+.IP 17)
+s.overhead := 5400;
+.IP 18)
+with s do overhead := 5400;
+.PP
+These statements were tested in a procedure test.
+.sp 1
+.br
+procedure test;
+.br
+var i, j, ... : integer;
+.br
+    bool : boolean;
+.br
+    let2 : set of char;
+.br
+begin
+.br
+    for i := 1 to 1000 do
+.br
+        for j := 1 to 1000 do
+.br
+            STATEMENT
+.br
+end;
+.sp 1
+.PP
+STATEMENT is one of the statements as shown above, or it is
+the empty statement.
+The assignment of used variables, if necessary, is done before
+the first for_loop.
+In case of the statement which uses the procedure call, statement
+15, a dummy procedure is declared whose body is empty.
+In case of the statement which uses the function, statement 16,
+this function returns its argument.
+For the timing of C statements a similar test program was
+written.
+.sp 1
+.br
+main()
+.br
+{
+.br
+    int i, j, ...;
+.br
+    for (i = 1; i <= 1000; i++)
+.br
+        for (j = 1; j <= 1000; j++)
+.br
+            STATEMENT
+.br
+}
+.sp 1
+.NH
+The results.
+.PP
+Here are tables with the results of the time measurements.
+Times are in microseconds (10^-6 seconds).
+Some statements appear twice in the tables.
+In the second case an array of 200 integers was declared
+before the variable to be tested, so this variable cannot
+be accessed by indirect addressing from the second local base.
+This results in a larger execution time of the statement to be
+tested.
+The column 68000 contains the times measured on a Bleasdale,
+M68000 based, computer.
+The times in column pdp are measured on a DEC pdp11/44, while
+the times in column 6500 come from a BBC microcomputer.
+.bp
+.TS
+expand;
+c s s s
+c c c c
+lw35 nw7 nw7 nw7.
+Pascal timing results
+statement	68000	pdp	6500
+_
+T{
+int1 := 0;
+T}	4.0	5.8	16.7
+	4.0	4.2	97.8
+_
+T{
+int1 := int2 - 1;
+T}	7.2	7.1	27.2
+	6.9	7.1	206.5
+_
+T{
+int1 := int1 + 1;
+T}	6.9	6.8	27.2
+	6.4	6.7	106.5
+_
+T{
+int1 := icon1 + icon2;
+T}	6.2	6.2	25.6
+	6.2	6.0	106.6
+_
+T{
+int1 := icon2 div icon1;
+T}	14.9	14.3	372.6
+	14.9	14.7	453.7
+_
+T{
+int1 := int2 * int3;
+T}	11.5	12.0	558.1
+	11.3	11.6	728.6
+_
+T{
+bool := (int1 < 0);
+T}	7.2	6.9	122.8
+	7.8	8.1	453.2
+_
+T{
+bool := (int1 < 3);
+T}	7.3	7.6	126.0
+	7.2	8.1	232.2
+_
+T{
+bool := ((int1 > 3) or (int1 < 3))
+T}	10.1	12.0	307.8
+	10.2	11.9	440.1
+_
+T{
+case int1 of 1: bool := false; 2: bool := true end;
+T}	18.3	17.9	165.7
+_
+T{
+if int1 = 0 then int2 := 3;
+T}	9.5	8.5	133.8
+_
+T{
+while int1 > 0 do int1 := int1 - 1;
+T}	6.9	6.9	126.0
+_
+T{
+m := a[k];
+T}	7.2	6.8	134.3
+_
+T{
+let2 := ['a'..'c'];
+T}	38.4	38.8	447.4
+_
+T{
+P3(x);
+T}	18.9	18.8	180.3
+_
+T{
+dum := F3(x);
+T}	26.8	27.1	343.3
+_
+T{
+s.overhead := 5400;
+T}	4.6	4.1	16.7
+_
+T{
+with s do overhead := 5400;
+T}	4.2	4.3	16.7
+.TE
+.TS
+expand;
+c s s s
+c c c c
+lw35 nw7 nw7 nw7.
+C timing results
+statement	68000	pdp	6500
+_
+T{
+int1 = 0;
+T}	4.1	3.6	17.2
+	4.1	4.1	97.7
+_
+T{
+int1 = int2 - 1;
+T}	6.6	6.9	27.2
+	6.1	6.5	206.4
+_
+T{
+int1 = int1 + 1;
+T}	6.4	7.3	27.2
+	6.3	6.2	206.4
+_
+T{
+int1 = int2 * int3;
+T}	11.4	12.3	522.6
+	9.6	10.1	721.2
+_
+T{
+int1 = (int2 < 0);
+T}	7.2	7.6	126.4
+	7.4	7.7	232.5
+_
+T{
+int1 = (int2 < 3);
+T}	7.0	7.5	126.0
+	7.8	7.8	232.6
+_
+T{
+int1 = ((int2 > 3) || (int2 < 3));
+T}	11.8	12.2	193.4
+	11.5	13.2	245.6
+_
+T{
+switch (int1) { case 1: int1 = 0; break; case 2: int1 = 1; break; }
+T}	28.3	29.2	164.1
+_
+T{
+if (int1 == 0) int2 = 3;
+T}	4.8	4.8	19.4
+_
+T{
+while (int2 > 0) int2 = int2 - 1;
+T}	5.8	6.0	125.9
+_
+T{
+int2 = a[int2];
+T}	4.8	5.1	192.8
+_
+T{
+P3(int2);
+T}	18.8	18.4	180.3
+_
+T{
+int2 = F3(int2);
+T}	27.0	27.2	309.4
+_
+T{
+s.overhead = 5400;
+T}	5.0	4.1	16.7
+.TE
+.NH
+Pascal statements which don't have a C equivalent.
+.PP
+First, the two statements which perform an operation on constants
+are left out.
+They are left out because the C front end does constant folding,
+while the Pascal front end doesn't.
+So in C the statements int1 = icon1 + icon2; and int1 = icon1 / icon2;
+will use the same amount of time, since the expression is evaluated
+by the front end.
+The two other statements (let2 := ['a'..'c']; and
+.B
+with
+.R
+s
+.B
+do
+.R
+overhead := 5400;) aren't included in the C statement timing table,
+because these constructs do not exist in C.
+Although C allows direct bit manipulation, which could
+be used to implement sets, I have not used it here.
+The
+.B
+with
+.R
+statement does not exist in C and there is nothing with the slightest
+resemblance to it.
+.PP
+At first sight, the tables suggest that there is not much difference
+between the times for the M68000 and the pdp11/44, in comparison with the
+times needed by the MCS6500.
+To verify this impression, I calculated the correlation coefficient
+between the times of the M68000 and pdp11/44.
+It turned out to be 0.997 for both the Pascal time tests and the C
+time tests.
+Since the correlation coefficient is close to one and the difference
+between the times is small, they can be considered to be the same
+when seen from the times of the MCS6500.
+Then I tried to plot the times of the M68000 against those of
+the MCS6500.
+Taking all the times together, there wasn't any correlation to be seen.
+The only correlation one could see, with some effort, was in the
+times for the first three Pascal statements.
+The first two C statements also show a correlation, as two points
+always do.
+.PP
+Also the three Pascal statements
+.B
+case
+.R
+,
+.B
+if
+.R
+,
+and
+.B
+while
+.R
+have a correlation coefficient of 0.999.
+This is probably because the
+.B
+case
+.R
+statement uses a subroutine in both cases and the other two
+statements
+.B
+if
+.R
+and
+.B
+while
+.R
+generate in line code.
+The last two Pascal statements use the same time, since the front
+end will generate the same EM code for both.
+.PP
+The independence between the rest of the test times is because
+in these cases the object code for the MCS6500 uses library
+subroutines, while the other processors can handle the EM code
+with in line code.
+.PP
+It is clear that the MCS6500 is a slower device: it needs longer
+execution times and more library subroutines.
+There is, however, no constant factor between its execution times
+and those of the other processors.
+.PP
+The slowdown of the MCS6500 caused by the need for a
+library subroutine is illustrated by the multiplication
+statement.
+The MCS6500 needs a library subroutine, while the other
+two processors have a machine instruction to perform the
+multiply.
+This results in a factor of 48.5, when the operands can be accessed
+indirectly by the MCS6500.
+When the MCS6500 cannot access the operands indirectly the situation
+is even worse.
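+.PP
+The factor quoted above can be checked against the Pascal timing table,
+using the M68000 and MCS6500 times for int1 := int2 * int3.
+The C fragment below is a small sketch of that arithmetic; the function
+name slowdown is hypothetical.

```c
/*
 * Ratio between two execution times, both in microseconds.
 * Hypothetical helper used only to re-derive the factors in the text.
 */
double slowdown(double slow_us, double fast_us)
{
    return slow_us / fast_us;
}
```

+With the first row of the multiplication entry, slowdown(558.1, 11.5)
+is about 48.5.
+With the second row, where the operands are out of reach of indirect
+addressing, slowdown(728.6, 11.3) is about 64.5.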
+The slight differences between the MCS6500 execution times for
+Pascal statements and C statements are probably the result of the
+front end, and thus beyond the scope of this discussion.
+.PP
+Another timing test is done in C on the statement k = i + j + 1983.
+This statement is tested on many UNIX*
+.FS
+* UNIX is a Trademark of Bell Laboratories.
+.FE
+systems.
+For a complete list see appendix A.
+The slowest one is the IBM XT, which runs on an 8088 microprocessor.
+The fastest one is the Amdahl computer.
+Here is a short table to illustrate the performance of the
+MCS6500.
+.TS
+c c c
+c n n.
+machine	short	int
+IBM XT	53.4	53.4
+Amdahl	0.5	0.3
+MCS6500	150.2	150.2
+.TE
+The MCS6500 is about three times slower than the IBM XT, but three hundred
+times slower than the Amdahl.
+The reason why the times on the IBM XT and the MCS6500 are the
+same for shorts and ints is that most C compilers make the types
+short and int the same size on 16-bit machines.
+In this project the MCS6500 is regarded as a 16-bit machine.
+.NH
+Length tests.
+.PP
+I have also compiled several programs written in Pascal and C to
+see if there is a resemblance between the number of bytes generated
+in the machine's language.
+In the tables:
+.IP length: 9
+The number of bytes of the source program.
+.IP 68000:
+The number of bytes of the a.out file for an M68000.
+.IP pdp:
+The number of bytes of the a.out file for a pdp11/44.
+.IP 6500:
+The number of bytes of the a.out file for an MCS6500.
+.LP
+These are the results:
+.TS
+c s s s
+c c c c
+n n n n.
+Pascal programs
+length	68000	pdp	6500
+_
+19946	14383	16090	26710
+19484	20169	20190	35416
+10849	10469	11464	18949
+273	4221	5106	7944
+1854	5807	6610	10301
+.TE
+.TS
+c s s s
+c c c c
+n n n n.
+C programs
+length	68000	pdp	6500
+_
+9444	6927	8234	11559
+7655	14353	18240	26251
+4775	11309	15934	19910
+639	6337	9660	12494
+.TE
+.PP
+In contrast to the execution times of the test statements, the
+object code file sizes show a constant factor between them.
+After calculating the correlation coefficient, I calculated
+the line fitted to the sizes.
+.FS
+* x is the number of bytes
+.FE
+.TS
+c s s
+c c c
+l c c.
+Pascal programs
+processor	corr. coef.	fitted line
+_
+68000-pdp	0.996
+68000-6500	0.999	1.76x + 502*
+pdp-6500	0.999	1.80x - 1577
+.TE
+.TS
+c s s
+c c c
+l c c.
+C programs
+processor	corr. coef.	fitted line
+_
+68000-pdp	0.974
+68000-6500	0.992	1.80x + 502*
+pdp-6500	0.980	1.40x - 1577
+.TE
+.PP
+As seen from the tables above, the correlation coefficients for
+Pascal programs are better than those for C programs.
+Thus the line fits best for Pascal programs.
+With the formula of the best fitted line one can now estimate
+the size of the object code which a program needs on an MCS6500
+without having the compiler at hand.
+One can also see from these formulas that the object code
+generated for an MCS6500 is about 1.8 times larger than for the other
+processors.
+Since the number of bytes in the source file depends heavily on the
+programmer (how many spaces he or she uses, the size of the indenting
+in structured programs, etc.), there is no correlation between the
+size of the source file and the size of the object file.
+The use of comments also has its influence on the size.
+.bp
+.DS C
+.B
+SUMMARY.
+.R
+.DE
+.NH 0
+Summary
+.PP
+In this chapter some final conclusions are drawn.
+.PP
+In spite of its simplicity, the MCS6500 is strong enough to
+implement an EM machine.
+A serious deficiency of the MCS6500 is the lack of 16-bit
+general purpose registers, and especially the lack of a
+16-bit stackpointer.
+As pointed out before, one 16-bit register can be simulated
+by a pair of 8-bit registers: the accumulator A holds
+the highbyte, and the index register X holds the lowbyte
+of the word.
+For lack of a 16-bit stackpointer, zero page must be used to hold
+a stackpointer, and two subroutines are needed for
+manipulating the stack (Push and Pop).
+.PP
+As seen in the timing tests, the simple instruction set of the
+MCS6500 forces the use of library subroutines.
+These library subroutines increase the execution time of the
+programs.
+.PP
+The sizes of the object code files show a strong correlation,
+in contrast to the execution times.
+With this correlation one can estimate the size of a program
+if it is to be used on an MCS6500.
+.bp
+.NH 0
+.B
+REFERENCES.
+.R
+.IP 1.
+Haddon, B.K., and Waite, W.M.
+Experience with the Universal Intermediate Language Janus.
+.B
+Software Practice & Experience 8
+.R
+,
+5 (Sept.-Oct. 1978), 601-616.
+.RS
+.PP
+An intermediate language for use with Algol 68, Pascal, etc.
+is described.
+The paper discusses some problems encountered and how they were
+dealt with.
+.RE
+.IP 2.
+Lowry, E.S., and Medlock, C.W. Object Code Optimization.
+.B
+Commun. ACM 12
+.R
+,
+(Jan. 1969), 13-22.
+.RS
+.PP
+A classical paper on global object code optimization.
+It covers data flow analysis, common subexpressions, code motion,
+register allocation and other techniques.
+.RE
+.IP 3.
+Osborne, A., Jacobson, S., and Kane, J. The MOS Technology MCS6500.
+.B
+An Introduction to Microcomputers ,
+.R
+Volume II, Some Real Products (June 1977) chap. 9.
+.RS
+.PP
+A hardware description of some real existing CPUs, such as
+the Zilog Z80, MCS6500, etc. is given in this book.
+.RE
+.IP 4.
+van Staveren, H.
+The table driven code generator from the Amsterdam Compiler Kit.
+Vrije Universiteit, Amsterdam, (July 11, 1983).
+.RS
+.PP
+The defining document for writing a back end table.
+.RE
+.IP 5.
+Steel, T.B., Jr. UNCOL: The Myth and the Fact.
+in
+.B
+Ann. Rev. Auto. Prog.
+.R
+Goodman, R. (ed.), vol. 2, (1960), 325-344.
+.RS
+.PP
+An introduction to the UNCOL idea by its originator.
+.RE
+.IP 6.
+Steel, T.B., Jr. A First Version of UNCOL.
+.B
+Proc. Western Joint Comp. Conf.
+.R
+,
+(1961), 371-377.
+.IP 7.
+Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren,
+H.
+A Practical Tool Kit for Making Portable Compilers.
+Informatica Rapport 74, Vrije Universiteit, Amsterdam, 1983.
+.RS
+.PP
+An overview of the Amsterdam Compiler Kit.
+.RE
+.IP 8.
+Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren,
+H.
+Description of an Experimental Machine Architecture for use with
+Block Structured Languages.
+Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
+.RS
+.PP
+The defining document for EM.
+.RE
+.IP 9.
+Tanenbaum, A.S. Structured Computer Organization.
+Prentice Hall. (1976).
+.RS
+.PP
+In this book computers are described as a hierarchy of levels,
+with each one performing some well-defined function.
+.RE