From fb51183da21aa0d7010c457f1389fab89a4757ba Mon Sep 17 00:00:00 2001 From: ceriel Date: Fri, 27 Sep 1991 16:19:24 +0000 Subject: [PATCH] Added --- doc/sparc/.distr | 14 + doc/sparc/1 | 53 ++++ doc/sparc/2 | 109 +++++++ doc/sparc/3 | 82 ++++++ doc/sparc/4 | 468 +++++++++++++++++++++++++++++++ doc/sparc/5 | 153 ++++++++++ doc/sparc/A | 184 ++++++++++++ doc/sparc/B | 128 +++++++++ doc/sparc/Makefile | 10 + doc/sparc/init | 18 ++ doc/sparc/intro | 23 ++ doc/sparc/note_on_reg_wins | 58 ++++ doc/sparc/pics/.distr | 12 + doc/sparc/pics/EM_stack.orig | 34 +++ doc/sparc/pics/EM_stack.ours | 106 +++++++ doc/sparc/pics/compile_bars | 49 ++++ doc/sparc/pics/mem_config | 34 +++ doc/sparc/pics/perf | 12 + doc/sparc/pics/perf.comp | 7 + doc/sparc/pics/perf.d | 4 + doc/sparc/pics/perf.dhry | 7 + doc/sparc/pics/reg_layout | 24 ++ doc/sparc/pics/run-time_bars | 101 +++++++ doc/sparc/pics/run-time_bars.bup | 100 +++++++ doc/sparc/pics/signal_stack | 42 +++ doc/sparc/printP4P | 31 ++ doc/sparc/refs | 185 ++++++++++++ doc/sparc/timing | 22 ++ doc/sparc/title | 15 + 29 files changed, 2085 insertions(+) create mode 100644 doc/sparc/.distr create mode 100644 doc/sparc/1 create mode 100644 doc/sparc/2 create mode 100644 doc/sparc/3 create mode 100644 doc/sparc/4 create mode 100644 doc/sparc/5 create mode 100644 doc/sparc/A create mode 100644 doc/sparc/B create mode 100644 doc/sparc/Makefile create mode 100644 doc/sparc/init create mode 100644 doc/sparc/intro create mode 100644 doc/sparc/note_on_reg_wins create mode 100644 doc/sparc/pics/.distr create mode 100644 doc/sparc/pics/EM_stack.orig create mode 100644 doc/sparc/pics/EM_stack.ours create mode 100644 doc/sparc/pics/compile_bars create mode 100644 doc/sparc/pics/mem_config create mode 100644 doc/sparc/pics/perf create mode 100644 doc/sparc/pics/perf.comp create mode 100644 doc/sparc/pics/perf.d create mode 100644 doc/sparc/pics/perf.dhry create mode 100644 doc/sparc/pics/reg_layout create mode 100644 doc/sparc/pics/run-time_bars create mode 100644 doc/sparc/pics/run-time_bars.bup create mode 100644 doc/sparc/pics/signal_stack create mode 100644 doc/sparc/printP4P create mode 100644 doc/sparc/refs create mode 100644 doc/sparc/timing create mode 100644 doc/sparc/title diff --git a/doc/sparc/.distr b/doc/sparc/.distr new file mode 100644 index 000000000..51aa06a59 --- /dev/null +++ b/doc/sparc/.distr @@ -0,0 +1,14 @@ +1 +2 +3 +4 +5 +A +B +init +intro +note_on_reg_wins +refs +timing +title +Makefile diff --git a/doc/sparc/1 b/doc/sparc/1 new file mode 100644 index 000000000..122aa646f --- /dev/null +++ b/doc/sparc/1 @@ -0,0 +1,53 @@ +.so init +.NH +INTRODUCTION +.NH 2 +Why an EM backend for SPARC processors? +.PP +With the introduction of SPARC-based computers like the Sun-4, a +whole new range of fast computers became readily available to the general +public. The power of large mainframes had been captured into a small +desk-top computer at only a fraction of the cost. +.PP +In the older days, a new computer used to be very hard to integrate into +the existing environment, but due to standardization in the software world +incompatibility in hardware no longer means incompatibility in software. +Programs that are written for computer A can often be run on computer B +without major modifications. Unfortunately this is not true for all software. +.PP +There will always be programs that rely on the specific +hardware of a certain computer for many different reasons. 
They +can be categorized as: +.IP - +poorly written programs +.IP - +programs to directly control hardware (device drivers) +.IP - +code that requires efficiency (time-critical I/O drivers) +.IP - +programs to generate code to run on the hardware (compilers) +.LP +This project for instance, the design and implementation of an EM backend +for SPARC processors, comes in the last category. +.PP +We have designed and implemented an algorithm to convert EM programs to code +that will run directly on the SPARC hardware. Henceforth, both the algorithm +and the implementation will be referred to as the EM-to-SPARC backend, +or simply: the backend. +.NH 2 +Why has nobody done this before? +.PP +Since EM was designed around 1981 and even SPARC has been around for some +years now, one may wonder why nobody has ever written an EM to SPARC backend +before. The reason is twofold. In the first place, there are some +non-trivial problems to be solved in the design phase, and secondly, +the SPARC-design combined with the lack of documentation, would surely +cost a lot of blood, sweat and tears. The absence of +clues to any of the design problems, combined with the \(em at first +glance \(em inhuman +SPARC instruction set did not make this a very attractive project. +.PP +On the other hand, these were exactly the reasons which made us take on +this particular project: it would require design skills, as well as some +hard work; a golden combination for a successful project. +.bp diff --git a/doc/sparc/2 b/doc/sparc/2 new file mode 100644 index 000000000..2421ca36f --- /dev/null +++ b/doc/sparc/2 @@ -0,0 +1,109 @@ +.so init +.nr H1 1 +.NH +CLOSE-UP LOOK +.NH 2 +What is EM? +.PP +As the abstract of the IR-81 rapport on EM +.[ [ +description of a machine architecture +.]] +says: \*(OQEM is a family +of intermediate languages designed for producing portable compilers.\*(CQ +Because EM is to be used on a wide range of languages and processors, +the instruction set is kept simple enough to allow easy translation to, +or interpretation on, almost any processor. Yet it is also powerful enough +to accommodate easy translation from almost any block-structured language. +.PP +Even though EM was designed in the early 1980s, it +is based on +.\" already shows strong signs of being influenced by +the (then innovative) RISC architecture. All instructions +have 0 or 1 operands, there are no fancy addressing modes as in the +68020's\*(Si move.w a3(_array,d3.w*2), -(sp)\*(So, no explicit registers, +although instructions for higher languages +such as array-operations, multiway branches (case) and +floating point operations are provided. +.PP +To fully understand the discussion in the following chapters, +the reader should at least have some knowledge of EM. +.NH 2 +What is SPARC? +.PP +According to Sun's RISC tutorial: \*(OQSun Microsystems has designed a RISC +architecture, called SPARC, and has implemented that architecture with +the Sun-4 family of supercomputing workstations and servers. SPARC stands +for Scalable Processor ARChitecture, emphasizing its applicability to +large as well as small machines.\*(CQ +.PP +In sharp contrast to EM, SPARC does have +explicit registers (31 integer and 32 floating point, all of which +are 32 bits wide) and +does not support any high level language operations: it does not even have +multiplication or division instructions. Because the SPARC design is +very straightforward, all instructions could be hard-coded (no microcode +involved) to +provided extremely high performance. 
All register-to-register operations +require exactly one clock cycle, and all register-to-memory and +memory-to-register operations require two clock cycles, one to retrieve +the instruction and one to access external memory. At a clock speed of +over 20 MHz this means that you can achieve well over 10 VAX MIPS: +more than 4 times the speed of a 15 MHz 68020 used in the Sun3/50. +.PP +As above, the reader should also have some general knowledge about +the SPARC processor to be able to understand the following chapters. +.NH 2 +What exactly is a (fast) backend? +.PP +To put it in the simplest of ways: a (fast) backend is a set of routines to +translate EM code to code that will run 'on the metal' (for example the SPARC +processor). The distinction between full-fledged backends (code generators) +.[ [ +The table driven code generator +.]] +and fast backends (code expanders) +.[ [ +The Code Expander Generator +.]] +is related to +the compilation-time vs. run-time trade-off. Code generators generate +efficient code and code expanders generate code very efficiently. +For details about code expanders see also +.[ [ +The design of very fast portable compilers +.]]. +.PP +The reasons for us to implement a code expander are numerous. Our first reason to +implement a code expander, rather than a code generator, was that implementing a +code expander would be hard enough already. A code generator would only add +more problems and there were already enough problems to be solved. Secondly, +we knew we would never be able to compete with native SPARC compilers due +to loss of information in the frontends (see also chapter 5). By implementing +a code expander we might be able to outrun the existing compilers on a +completely different terrain: compile speed. +.PP +The third 'reason' to implement a code expander lies a little deeper and was +not discovered until we had actually started the implementation... It was only +then that we found out that for certain architectures, such as the SPARC, +the idea behind the code-expander is not necessarily inferior to that +behind a code-generator. It seems that for highly orthogonal instruction +sets it is possible to generate near-optimal code without using a +code-generator. We have to say, however, that this is only true for our +optimized version of the code-expander. With the original code-expander +it would not have been possible to generate near-optimal code for the +SPARC processor. +.NH 2 +So, what are the main differences between EM and SPARC? +.PP +The main +difference between EM and SPARC is the stack versus register orientation. +The other differences, such as the presence of high-level language +operations in EM, can easily be overcome by subroutines, +or small pieces of in-line SPARC code. +The design-part of this project mostly concentrates on +building a bridge between EM's stack and SPARC's registers. +.PP +In the next chapter we will make a list of all our design problems, which +will then be discussed in chapter 4. +.bp diff --git a/doc/sparc/3 b/doc/sparc/3 new file mode 100644 index 000000000..6dd213443 --- /dev/null +++ b/doc/sparc/3 @@ -0,0 +1,82 @@ +.so init +.nr H1 2 +.NH +PROBLEMS +.NH 2 +Maintain SPARC speed +.PP +If we want to generate SPARC code, we should try to generate efficient code +as fast as possible. It would be quite embarrassing to find out that the +same program would run faster on a Motorola 68020 than on a SPARC processor, +when both operate at the same clock frequency.
Looking at some code generated by Sun's C-compiler and optimizing assembler, +we can spot a few remarkable characteristics of the generated SPARC code: +.IP - +There are almost no memory references. +.IP - +Parameters to functions are passed through registers. +.IP - +Almost all delay slots\(dg +.FS +\(dg For details about delay slots see the SPARC Architecture Manual, chapter 4, pp. 42-48 +.FE +are filled in by the assembler. +.LP +If we want to generate efficient code, we should at least try to +reduce the number of memory references and use registers wherever we can. +Since EM is stack-oriented and references its stack for every operation, +this will not be an easy task; a suitable solution will, however, be given in +the next chapter. +.NH 2 +Increase compilation speed +.PP +Because we will implement a code expander (fast backend) we should keep +a close eye on efficiency; if we cannot beat regular compilers on producing +efficient code we will try to beat them on fast code generation. +The usual trick to achieve fast compilation is to pack the frontend, +optimizer, code-generator and +assembler all into a single large binary to reduce the overhead of +reading and writing temporary files. Unfortunately, the relocation +information required by the SPARC instruction set is slightly bizarre +and cannot be represented with the present primitives. +This means that it will not be possible to generate the required output +format directly from our backend. +.PP +There are three solutions here: generate assembler code and let an +existing assembler generate the required object (\fI.o\fR) files; +create our own primitives that can handle the SPARC relocation format; or +do not use any of the addressing modes that require the bizarre relocation. +Because we have enough on our hands already we will +let the existing assembler deal with generating object files. +.NH 2 +Convert stack to register operations +.PP +As we wrote in the previous chapter, for RISC machines a code expander can +produce almost as efficient code as a code generator. The fact that this is +true for stack-oriented RISC processors is rather obvious. The problem we +face, however, is that the SPARC processor is register-oriented instead of +stack-oriented. In the next chapter we will give a suitable solution to +convert most stack accesses to register accesses. +.NH 2 +Miscellaneous +.PP +Besides performance and \fI.o\fR-compatibility there are some other +peculiarities of the SPARC processor and Sun's C-compiler (henceforth +simply called \fIcc\fR). +.PP +For some reason, the SPARC stack pointer requires alignment +on 8 bytes, so you cannot push a 4-byte integer on the stack +and then \*(Sidec 4, %sp\*(So\(dd. +.FS +\(dd For more information about SPARC assembler see the Sun-4 Assembly +Language Reference Manual +.FE +A small sketch at the end of this chapter illustrates the problem. +This too will be discussed in the next chapter, where we will take a +more in-depth look at this problem and also discuss a couple of +possible solutions. +.PP +Another thing is that \fIcc\fR usually passes the first six parameters of a +function call through registers. To be \fI.o\fR-compatible we would have to +pass the first six parameters of each function call through registers as well. +Exactly why this is not feasible will also be discussed in the next chapter.
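+.PP +To get a feeling for the alignment restriction mentioned above, consider what +a naive push of a single 4-byte word onto the SPARC stack would look like. +The sketch below is ours and ignores every other constraint on \*(Si%sp\*(So; +it merely illustrates the alignment rule: +.DS +.TS +; +l2f6 lf6 l. +dec 4, %sp ! invalid: %sp is no longer aligned on 8 bytes +st %g1, [%sp] +dec 8, %sp ! valid, but wastes 4 bytes on every push +st %g1, [%sp] +.TE +.DE +The second sequence keeps \*(Si%sp\*(So properly aligned, but it is exactly the +swiss-cheese stack that will be rejected in the next chapter.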
.bp diff --git a/doc/sparc/4 b/doc/sparc/4 new file mode 100644 index 000000000..e7ee56ddb --- /dev/null +++ b/doc/sparc/4 @@ -0,0 +1,468 @@ +.so init +.hw data-structures +.nr H1 3 +.NH +SOLUTIONS +.NH 2 +Maintaining SPARC speed +.PP +In chapter 3 we wrote: +.sp 0.3 +.nf +>If we want to generate efficient code, we should at least try to reduce the number of +>memory references and use registers wherever we can. +.fi +.sp 0.3 +In this chapter we will devise a strategy to swiftly generate acceptable +code by using push-pop optimization. +Note that this is not the push-pop +optimization already available in the EM-kit, since that is only present +in the assembler-to-binary part which we do not use +.[ [ +The Code Expander Generator +.]]. +Our push-pop optimization +works more like the fake-stack described in +.[ [ +The table driven code generator +.]]. +.NH 3 +Ad-hoc optimization +.PP +Before getting involved in any optimization, let's have a look at some +code generated with a straightforward EM to SPARC conversion of the +C statement: \*(Sif(a[i]);\*(So Note that \*(Si%SP\*(So is an alias +for a general purpose +register and acts as the EM stack pointer. It has nothing to do with +\*(Si%sp\*(So \(em the SPARC stack pointer. +Analogously, \*(Si%LB\*(So is EM's local base pointer. +.br +.IP +.HS +.TS +; +l s l s l +l1f6 lf6 l2f6 lf6 l. +EM code SPARC code Comment + +lae _a set _a, %g1 ! load address of external _a + dec 4, %SP + st %g1, [%SP] + +lol -4 set -4, %g1 ! load local -4 (i) + ld [%g1+%LB], %g2 + dec 4, %SP + st %g2, [%SP] + +loc 2 set 2, %g1 ! load constant 2 + dec 4, %SP + st %g1, [%SP] + +sli 4 ld [%SP], %g1 ! pop shift count + ld [%SP+4], %g2 ! pop shiftee + sll %g2, %g1, %g3 + inc 4, %SP + st %g3, [%SP] ! push 4 * i + +ads 4 ld [%SP], %g1 ! add pointer and offset + ld [%SP+4], %g2 + add %g1, %g2, %g3 + inc 4, %SP + st %g3, [%SP] ! push address of _a + (4 * i) + +loi 4 ld [%SP], %g1 ! load indirect 4 bytes + ld [%g1], %g2 + st %g2, [%SP] ! push a[i] +cal _f + ... +.TE +.HS +.LP +Although the code is easy to understand, it clearly is far from optimal. +The above code uses approximately 60 clock-cycles\(dg +.FS +\(dg In general each instruction only takes one cycle, +except for \*(Sild\*(So and +\*(Sist\*(So which may both require additional clock cycles. The exact number +of extra cycles needed depends on the SPARC implementation and memory access +time. Furthermore, the +\*(Siset\*(So pseudo-instruction is a bit tricky. It takes one cycle when +its argument lies between -4096 and 4095, and two cycles otherwise. +.FE +to push an array-element on the stack, +something which a 68020 can do in a single instruction. The SPARC +processor may be fast, but not fast enough to justify the above code. +.PP +The same statement can be translated much more efficiently: +.DS +.TS +; +l2f6 lf6 l. +sll %i0, 2, %g2 ! multiply index by 4 +set _a, %g3 +ld [%g2+%g3], %g1 ! get contents of a[i] +dec 4, %SP +st %g1, [%SP] ! push a[i] onto the stack +.TE +.DE +which, instead of 60, uses only 5 clock cycles to retrieve the element +from memory and 5 additional cycles when the result has to be pushed +on the stack. Note that when the result is not a parameter it does not +have to be pushed on the stack. By making efficient use of the SPARC +registers we can fetch \*(Sia[i]\*(So in only 5 cycles! +.NH 3 +Analyzing optimization +.PP +Instead of ad-hoc optimization we will need something more solid.
+When one tries to optimize the above code in an ad-hoc manner one will +probably notice the large overhead due to stack access. Almost every EM +instruction requires at least three SPARC instructions: one to carry out +the EM instruction and two to pop and push the result from and onto the +stack. This happens for every instruction, even though the data being pushed +will probably be needed by the next instruction. To optimize this extensive +pushing and popping of data we will use the appropriately named push-pop +optimization. +.PP +The idea behind push-pop optimization is to delay the push operation until +it is almost certain that the data actually has to be pushed. +As is often the case, the data does not have to be pushed, +but will be used as input to another EM instruction. +If we can decide at compile time that this will indeed be +the case we can save the time of first pushing the data and then popping it +back again by temporarily storing the data (possibly only during compilation!) +and using it no sooner than it is actually needed. +.PP +The \*(Sisli 4\*(So instruction, for instance, expects two inputs on top of the +stack: on top a counter and right below that the shiftee (the number +to be shifted). As a result \*(Sisli\*(So +pushes 'shiftee << counter' back to the stack. Now consider the following +sequence, which could be the result of the expression \*(Si4 * i\*(So +.DS +.TS +; +l1f6 lf6 l. +lol -4 +loc 2 +sli 4 +.TE +.DE +In the non-optimized situation the \*(Silol\*(So would push +a local variable (whose offset is -4) on the stack. +Then the \*(Siloc\*(So pushes a 2 on the stack and finally \*(Sisli\*(So +retrieves both these numbers to replace then with the result. +On most machines it is not necessary to +push the 2 on the stack, since it can be used in the shift instruction +as an immediately operand. On a SPARC, for instance, one can write +.DS +.TS +; +l2f6 lf6 l. +ld [%LB-4], %g1 ! load local variable into register g1 +sll %g1, 2, %g2 ! perform the shift-left-by-2 +.TE +.DE +where the output of the \*(Silol\*(So, as well as the immediate operand 2 are used +in the shift instruction. As suggested before, all of this can be +achieved with push-pop optimization. +.NH 3 +A mechanism for push-pop optimization +.PP +To implement the above optimization we need some mechanism to +temporarily store information during compilation. +We need to be able to store, compare and retrieve information from the +temporary storage (cache) without any +loss of information. Before describing all the routines used +to implement our cache we will first describe how the cache works. +.PP +Items in the cache are structures containing an external (\*(Sichar *\*(So), +two registers (\*(Sireg_t\*(So) and a constant (\*(Siarith\*(So), +any of which may be 0. +The value of such a structure is the sum of (the values of) +its elements. To put a register in the cache, one has to be allocated either +by calling \*(Sialloc_reg\*(So which returns a free register, by +\*(Siforced_alloc_reg\*(So which allocates a specific register or any +of the other routines available to allocate a register. The keep things +simple, we will not discuss all of the available primitives here. +When the register +is then put in the cache by the \*(Sipush_reg\*(So routine, the ownership will +be transferred from the user to the cache. Ownership is important, because +only the owner of a register may (and must!) deallocate it. Registers can be +owned by either an (imaginary) register manager, the cache or the user. 
+When the user retrieves a register from the stack with \*(Sipop_reg\*(So for +instance, ownership is back to the user. +The user should then call \*(Sifree_reg\*(So +to transfer ownership to the register manager or call \*(Sipush_reg\*(So +to give it back to the cache. +Since the cache behaves itself as a stack we will use the term pop resp. push +to get items from, resp. put items in the cache. +.PP +We shall now present the sets of routines that implement the cache. +.IP \(bu +The routines +.DS +\*(Si +reg_t alloc_reg(void) +reg_t alloc_reg_var(void) +reg_t alloc_float(void) +reg_t alloc_float_var(void) +reg_t alloc_double(void) +reg_t alloc_double_var(void) + +void forced_alloc_reg(reg_t) +void soft_alloc_reg(reg_t) + +void free_reg(reg_t) +void free_double_reg(reg_t) +\*(So +.DE +allocate and deallocate registers. If there are no more register left, +i.e. they are owned by the cache, +one or more registers will be freed by flushing part of the cache +onto the real stack. +The \*(Sialloc_xxx_var\*(So primitives try to allocate a register that +can be used to store local variables. (In the current implementation +only the input and local registers.) If none can be found \*(SiNULL\*(So +is returned. \*(Siforced_alloc_reg\*(So forces the allocation of a certain +register. If it was already in use, its contents are moved to another +register. Finally \*(Sisoft_alloc_reg\*(So provides the possibility to +push a register onto the cache and still keep a copy for later use. +(Used to implement the \*(Sidup 4\*(So for example.) +.IP \(bu +The routines +.DS +\*(Si +void push_const(arith) +arith pop_const(void) +\*(So +.DE +push or pop a constant onto or from the stack. Distinction between +constants and other types is made so as not to loose any information; constants +may be used later on as immediate operators, which is not the case +for other types. If \*(Sipop_const\*(So is called, but the element on top of +the cache has either one of the external or register fields non-zero a +fatal error will be reported. +.IP \(bu +The routines +.DS +\*(Si +reg_t pop_reg(void) +reg_t pop_float(void) +reg_t pop_double(void) +reg_t pop_reg_c13(char *n) + +void pop_reg_as(reg_t) + +void push_reg(reg_t) +\*(So +.DE +push or pop a register. These will be used most often since results from one +EM instruction, which are computed in a register, are often used in the next. +When the element on top of the cache is more +than just a register the cache manager +will generate code to compute the sum of its fields and put the result in a +register. This register will then be given to the user. +If the user wants the result is a special register, he should use the +\*(Sipop_reg_as\*(So routine. +The \*(Sipop_reg_c13\*(So gives an optional number (as character string) whose +value can be represented in 13 bits. The constant can then be used as an +offset for the SPARC \*(Sild\*(So and \*(Sist\*(So instructions. +.IP \(bu +The routine +.DS +\*(Si +void push_ext(char *) +\*(So +.DE +pushes an external onto the stack. There is no pop-variant of this one since +there is no use in popping an external. +.IP \(bu +The routines +.DS +\*(Si +void inc_tos(arith n) +void inc_tos_reg(reg_t r) +\*(So +.DE +increment the element on top of the cache by either the constant \*(Sin\*(So +or by a register. The latter is useful for pointer addition when referencing +external memory. +.KS +.IP \(bu +The routine +.DS +\*(Si +int type_of_tos(void) +\*(So +.DE +.KE +returns the type of the element on top of the cache. 
This is a combination +(binary OR) of \*(SiT_ext\*(So, \*(SiT_reg\*(So or \*(SiT_float\*(So, +\*(SiT_reg2\*(So or \*(SiT_float2\*(So, and \*(SiT_cst\*(So, +and tells the +user which of the three fields are non-zero. When the register-fields +represent \*(Si%g0\*(So, it is considered zero. +.IP \(bu +Miscellaneous routines: +.DS +\*(Si +void init_cache(void) +void cache_need(int) +void change_reg(void) +void flush_cache(void) +\*(So +.DE +\*(Siinit_cache\*(So should be called before any +other cache routines, to initialize some internal datastructures. +\*(Sicache_need\*(So is used to tell the cache that a certain number +of register are needed for the next operation. This way the cache can +load them efficiently in one fell swoop. \*(Sichange_reg\*(So is to be +called when the user changes a register of which the cache (possibly) has +co-ownership. Because the contents of registers in the cache are +not allowed to change the user should call \*(Sichange_reg\*(So to +instruct the cache to copy the contents to some other register. +\*(Siflush_cache\*(So writes the cache to the stack and invalidates +the cache. It should be used before branches, +before labels and on other places where the stack has to be valid (i.e. where +every item on the EM-stack should be stored on the real stack, not in some +virtual cache). +.NH 3 +Implementing push-pop optimization in the EM_table +.PP +As indicated above, there is no regular way to represent the described +optimization in the EM_table. The only possible escapes from the EM_table +are function calls, but that is clearly not enough to implement a good +push-pop optimizer. Therefore we will use a modified version of the EM_table +format, where the description of, say, the \*(Silol\*(So instruction might look +like this\(dg: +.FS +\(dg This is not the way the \*(Silol\*(So actually looks in the EM_table; +it only shows how it \fImight\fR look using the forementioned push/pop +primitives. +.FE +.DS +\*(Si +reg_t A, B; +const_str_t n; + +alloc_reg(A); +push_reg(LB); +inc_tos($1); +B = pop_reg_c13(n); +"ld [$B+$n], $A"; +push_reg(A); +free_reg(B); +\*(So +.DE +For more details about the exact implementation consult +appendix B which contains some characteristic excerpts from the EM_table. +.NH 2 +Stack management +.PP +When converting EM code to some executable code there is the problem of +maintaining multiple stacks. The usual way to do this is described in +.[ [ +Description of a Machine Architecture +.]] +and is shown in figure \*(SN1. +.KE +.PS +copy "pics/EM_stack.orig" +.PE +.ce 1 +\fIFigure \*(SN1: usual stack management. +.KE +.sp +.LP +This means that the EM stack and the hardware stack (used +for subroutine calls, etc.) are interleaved in memory. On the SPARC, however, +this brings up a large problem: in the former model it is assumed that the +resolution of the stack pointer is a word, but this is not the case on the +SPARC processor. On the SPARC processor the stack-pointer as well as the +frame-pointer have to be aligned on 8-byte boundaries, so one can not simply +push a word on the stack and then lower the stack-pointer by 4 bytes! +.NH 3 +Possible solutions +.PP +A simple idea might be to use a swiss-cheese stack; we could +push a 4-byte word onto the stack and then lower the stack by 8. +Unfortunately, this is not a very solid solution, because +pointer-arithmetic involving pointers to objects on the stack would cause +hard-to-predict anomalies. +.PP +Another try would be not to use the hardware stack at all. 
As long as we +do not generate subroutine-calls everything will be all right. This +approach, however, also has some disadvantages: first, we would not be able +to use any of the existing debuggers such as \fIadb\fR, because they all +assume a regular stack format. Secondly, we would not be able to make use +of the SPARC's register windows to keep local variables. Finally, doing all the +administrative work necessary for subroutine calls ourselves, instead of +letting the hardware handle it for us, +causes unnecessary procedure-call overhead. +.PP +Yet another alternative would be to emulate the EM-part of the stack, +and to let the hardware handle the subroutine call. Since we will +emulate our own stack, there are no alignment restrictions and because +we will use the hardware procedure call we can still make use of +the register windows. +.NH 3 +Our implementation +.PP +To implement the hybrid stack we need two extra registers: one for +the EM stack pointer (the aforementioned \*(Si%SP\*(So) and one for the +EM local base pointer (\*(Si%LB\*(So). The most elegant solution would be to +put both stacks in different segments, so they would not influence +each other. Unfortunately +.UX +lacks the ability to add segments and +since we will implement our backend under +.UX, +we will have to put +both stacks in the same segment. Exactly how this can be done is shown +in figure \*(SN2. +.DS +.PS +copy "pics/mem_config" +.PE +.ce 1 +\fIFigure \*(SN2: our stack management.\fR +.DE +.sp +During normal procedure execution, the SPARC stack pointer has to point to +a memory location where the operating system can dump the active part of +the register window. The rest of the +register window will be dumped in the stack space pre-allocated for that +purpose, by following the frame +pointer. When a signal occurs, things get even more complicated and +result in figure \*(SN3. +.DS +.PS +copy "pics/signal_stack" +.PE +.ce 1 +\fIFigure \*(SN3: our signal stack.\fR +.DE +.PP +The exact implementation of the stack is shown in figure \*(SN4. +.KF +.PS +copy "pics/EM_stack.ours" +.PE +.ce 1 +\fIFigure \*(SN4: stack overview.\fR +.KE +.NH 2 +Miscellaneous +.PP +As mentioned in the previous chapter, the generated \fI.o\fR-files are +not compatible with Sun's own object format. The primary reason for +this is that Sun usually passes the first six parameters of a procedure call +through registers. If we were to do that too, we would always have +to fetch the top six words from the stack into registers, even when +the procedure would not have any parameters at all. Apart from this, +structure-passing is another exception in Sun's object format which +makes it impossible to generate object-compatible code.\(dg +.FS +\(dg Exactly how Sun passes structures as parameters is described in +Appendix D of the SPARC Architecture Manual (Software Considerations) +.FE +.bp diff --git a/doc/sparc/5 b/doc/sparc/5 new file mode 100644 index 000000000..a65ff9451 --- /dev/null +++ b/doc/sparc/5 @@ -0,0 +1,153 @@ +.so init +.nr H1 4 +.NH +FUTURE WORK +.NH 2 +A critique of EM +.PP +In general, EM fits its purpose quite well. Numerous compilers have been +written using EM as their intermediate language and it has even become a +commercial product. A great deal of its success is probably due to its +simplicity. There are no extravagant instructions, but it does have all the +necessary functions to write a decent compiler. +.PP +There are, however, a few functions that come rather close to being +extravagant.
The \*(Silar\*(So function, for example \(em used +to fetch an element from an array \(em does not make it much easier +to write a frontend, but does make it unnecessarily hard to write an +efficient backend. Other instructions for which it is difficult +to generate efficient code are those that permit +dynamic operators, such as the \*(Silos\*(So. Dynamic operators, however, provide +significant extra possibilities and can therefore not be disposed of. +Note that even though the array operations \*(Silar\*(So and \*(Sisar\*(So +provide dynamic operators, they do not add additional power, since +they can easily be replaced with a sequence using the \*(Silos\*(So or +\*(Sists\*(So instructions. +.PP +The EM code generated by the C frontend to reference arrays can be translated +very efficiently for almost any processor. However, the same operation +generated by the Modula-2 frontend (which uses the \*(Silar\*(So) +is much less efficient, although the only difference is that the +latter performs range checking whereas the former does not.\(dg +.FS +\(dg Actually this depends on whether or not explicit range checking is enabled. +This clearly shows that the current code generators are not optimal and +often depend on ad-hoc decisions. +.FE +Since range checking can also be expressed explicitly in +EM (\*(Sirck\*(So), there is no need for any of the array operations +(\*(Siaar\*(So, \*(Silar\*(So and \*(Sisar\*(So). +.PP +Besides the efficiency of the array-operations themselves, there is still another +major disadvantage to using these array-operations. In sharp contrast to +all other EM instructions except the \*(Silos\*(So and the \*(Sists\*(So, +they allow dynamic operators, so their effect on the stack-pointer cannot +always be +determined at compile-time. This means that efficient caching of the +top-of-stack in registers is almost impossible, +so using these array-operations also affects the +efficiency of the surrounding code. Now that processors are produced with +more and more registers, it could be very beneficial to cache the +top-of-stack, so that the memory/register reference ratio decreases +to the benefit of the overall performance. +.PP +As a final critique, we would also like to discuss the semantics of some of +the EM instructions. In +.[ [ +Description of a Machine Architecture +.]] +it is said that +all signed instructions, such as the \*(Siadi\*(So, should cause an exception +on overflow. The unsigned operations, such as \*(Siadu\*(So, however, +should act as modulo operations and therefore not perform overflow checking. +Since it is very expensive to perform overflow checking in EM itself, +we would suggest that the backend takes care of this. For languages which +do not require overflow checking, a simple message could be generated to +disable overflow checking in backends. This way all backends could be +written to fully comply with the official EM definition without any reduction in +efficiency.\(dd +.FS +\(dd Currently many backends do not implement error checks because they +are too expensive and almost never needed. Some frontends even have +facilities built in to generate EM-code to force these checks. If this +trend continues we will end up with a de-facto and a de-jure standard, +both developed by the same people but nonetheless incompatible. +.FE +When such messages are added, we would like to suggest +that they be able to enforce overflow checks on unsigned as well as signed arithmetic.
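+.PP +On the SPARC the run-time cost of such an overflow check would be small, which +supports the suggestion above. The sketch below is ours and is not taken from +the current backend; the label \*(Sieiovfl\*(So stands for a hypothetical +overflow handler: +.DS +.TS +; +l2f6 lf6 l. +addcc %g1, %g2, %g3 ! adi 4: add and set the condition codes +bvs eiovfl ! branch on overflow set, plus its +nop ! delay slot: two extra instructions +add %g1, %g2, %g3 ! adu 4: plain add, wraps around on overflow +.TE +.DE +With a \*(Sitvs\*(So (trap on overflow set) instruction the check could even be +reduced to a single extra instruction.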
+.PP +As a conclusion we would like to suggest removal of the array operations from +EM, or at least discontinuation of there usage in frontends. +.NH 2 +\*(OQWanted: Procedure call information\*(CQ +.PP +The advantage of an intermediate language such as EM is that the backend +no longer has to know about any 'quirks' of the 'input'-language. The major +disadvantage, however, is that the backend no longer knows about any 'quirks' +of the 'input'-language... If the SPARC backend ever has to compete +with Sun's own C-compiler for example, removal of the array-operations +will not be enough. The amount of information that is lost during +the translation to EM is too large to ever generate truly efficient SPARC code. +.PP +To write such an efficient backend one needs to know, for example, whether, +when and what type of parameter is being computed, so the result can be stored +in the proper place and scratch registers can be reused. +(On the SPARC processor, for example, it is very beneficiary +to pass the first six parameters of a procedure call through +registers instead of using the stack.) +One way to express such things in EM is to insert extra messages in +the EM-code. The C statement \*(Sia = f(4, a + b);\*(So for example, +could be translated to the following EM-code: +.DS +.TS +; +l1f6 lf6 l. +lol -4 ! a +lol -8 ! b +mes x, 2 ! next instruction will compute 2nd parameter +adi 4 +mes x, 1 ! next instruction will compute 1st parameter +loc 4 +cal _f ! call function f +lfr 4 +stl -4 ! store result in a +.TE +.DE +For a code expander it is important that the \*(Simes\*(So pseudo +instructions appear \fIbefore\fR +the EM instruction that computes the parameter, because that way the final +computation (the \*(Siadi\*(So and \*(Siloc\*(So in the previous example) +can be translated to machine code that performs the required computation +and also puts the result in the required place. If it is found to be +too difficult for the frontend to insert these \*(Simes\*(So instructions +at the right place the peep-hole optimizer might swap the \*(Simes\*(So and +the instruction that computes the parameter. +.PP +For some architectures, it is also +possible to generate more efficient code for a procedure when it is a +so-called leaf-procedure: a procedure that doesn't call other procedures. +On the SPARC, for example, it is not necessary to rotate the register +window for a call to a leaf procedure and it is also possible to use +the global registers for register variables in leaf procedures. +It will be a little harder to insert useful messages about leaf procedures, +because just as with register messages, they are only useful to the +backend when they appear immediately +after or before the \*(Sipro\*(So pseudo instruction. The frontend, +however, only knows whether a certain procedure is a leaf-procedure or not +when it has already generated the entire procedure in EM. Just as with the +\*(Sipro ? / end n\*(So-dilemma the peep-hole optimizer +.[ [ +Using Peephole Optimization +.]] +might be able to lend a hand +and help us out by delaying EM-code generation until it has reached the +end of the procedure. +.PP +As with most optimizations, the main problem is that they have to be +implemented with the \*(Simes\*(So pseudo instruction. +Because the \*(Simes\*(So instruction can have many different meanings +depending on its argument, +it is important that all optimizers recognize and respect them. 
Addition +of even a single message will require careful inspection of, and maybe even +incorporate small changes to each of the optimizers. +.bp diff --git a/doc/sparc/A b/doc/sparc/A new file mode 100644 index 000000000..df106d8ad --- /dev/null +++ b/doc/sparc/A @@ -0,0 +1,184 @@ +.so init +.SH +A. MEASUREMENTS +.SH +A.1. \*(OQThe bottom line\*(CQ +.PP +Although examples often are most illustrative, the cruel world out there is +usually more interested in everyday performance figures. To satisfy those +people too, we will present a series of measurements on our code expander +taken from (close to) real life situations. These include measurements +of compile and run times of different programs, +compiled with different compilers. +.SH +A.2. Compile time measurements +.PP +Figure A.2.1 shows compile-time measurements for typical C code: +the dhrystone benchmark\(dg +.[ [ +dhrystone +.]]. +.FS +\(dg To be certain that we only tested the compiler and not the quality of +the code in the library, we have added our own version of +\fIstrcmp\fR and \fIstrcpy\fR and have not used the ones present in the +library. +.FE +The numbers represent the duration of each separate pass of the compiler. +The numbers at the end of each bar represent the total duration of the +compilation process. As with all measurements in this chapter, the +quoted time or duration is the sum of user and system time in seconds. +.PS +copy "pics/compile_bars" +.PE +.DS +.IP cem: 6 +C to EM frontend +.IP opt: +EM peep-hole optimizer +.IP be: +EM to assembler backend +.IP cpp: +Sun's C preprocessor +.IP ccom: +Sun's C compiler +.IP iropt: +Sun's optimizer +.IP cg: +Sun's code generator +.IP as: +Sun's assembler +.IP ld: +Sun's linker +.ce 1 +\fIFigure A.2.1: compile-time measurements.\fR +.DE +.sp +.PP +A close examination of the first two bars in fig A.2.1 shows that the maximum +achievable compile-time +gain compared to \fIcc\fR is about 50% for medium-sized +programs.\(dd +.FS +\(dd (cpp+ccom+as+ld)/(cem+as+ld) = 1.53 +.FE +For small programs the gain will be less, due to the almost constant +start-up time of each pass in the compilation process. Only a +built-in assembler may increase this number up to +180% in the ideal case that the optimizer, backend and assembler +would run in zero time. Speed-ups of 5 to 10 times as mentioned in +.[ [ +fast portable compilers +.]] +are therefore not possible on the Sun-4 family. This is also due to +Sun's implementation of saving and restoring register windows. With +the current implementation in which only a single window is saved +or restored on a register-window overflow, it is very time consuming +when programs have highly dynamic stack use +due to procedure calls (as is often the case with compilers). +.PP +Although we are currently a little slower than \fIcc\fR, it is hard to +blame this on our backend. Optimizing the backend so that it would run +twice as fast would only reduce the total compilation process by +a mere 14%. +.PP +Finally it is nice to see that our push/pop-optimization, +initially designed to generate faster code, has also increased the +compilation speed. (see also figures A.4.1 and A.4.2.) +.SH +A.3. Run time performance +.PP +Figure A.3.1 shows the run-time performance of different compilers. +All results are normalized, where the best available compiler (Sun's +compiler with full optimization) is represented by 1.0 on our scale. 
+.PS +copy "pics/run-time_bars" +.PE +.ce 1 +\fIFigure A.3.1: run time performance.\fR +.sp 1 +.PP +The fact that our compiler behaves rather poorly compared to Sun's +compiler is due to the fact that the dhrystone benchmark uses +relatively many subroutine calls; all of which have to be 'emulated' +by our backend. +.SH +A.4. Overall performance +.LP +In the next two figures we will show the combined run and compile time +performance of 'our' compiler (the ACK C frontend and our backend) +compared to Sun's C compiler. Figure A.4.1 shows the results from +measurements on the dhrystone benchmark. +.G1 +frame invis left solid bot solid +label left "run time" "(in \(*msec/dhrystone)" +label bot "compile time (in sec)" +coord x 0,21 y 0,610 +ticks left out from 0 to 600 by 200 +ticks bot out from 0 to 20 by 5 +"\(bu" at 3.5, 1000000/1700 +"ack w/o opt" ljust at 3.5 + 1, 1000000/1700 +"\(bu" at 2.8, 1000000/8770 +"ack with opt" below at 2.8 + 0.1, 1000000/8770 +"\(bu" at 16.0, 1000000/10434 +"ack -O4" above at 16.0, 1000000/10434 +"\(bu" at 2.3, 1000000/7270 +"\fIcc\fR" above at 2.3, 1000000/7270 +"\(bu" at 9.0, 1000000/12500 +"\fIcc -O4\fR" above at 9.0, 1000000/12500 +"\(bu" at 5.9, 1000000/15250 +"\fIcc -O\fR" below at 5.9, 1000000/15250 +.G2 +.ce 1 +\fIFigure A.4.1: overall performance on dhrystones. +.sp 1 +.LP +Fortunately for us, dhrystones are not all there is. The following +figure shows the same measurements as the previous one, except +this time we took a benchmark that uses no subroutines: an implementation +of Eratosthenes' sieve: +.G1 +frame invis left solid bot solid +label left "run time" "for one run" "(in sec)" left .6 +label bot "compile time (in sec)" +coord x 0,11 y 0,21 +ticks bot out from 0 to 10 by 5 +ticks left out from 0 to 20 by 5 +"\(bu" at 2.5, 17.28 +"ack w/o opt" above at 2.5, 17.28 +"\(bu" at 1.6, 2.93 +"ack with opt" above at 1.6, 2.93 +"\(bu" at 9.4, 2.26 +"ack -O4" above at 9.4, 2.26 +"\(bu" at 1.5, 7.43 +"\fIcc\fR" above at 1.5, 7.43 +"\(bu" at 2.7, 2.02 +"\fIcc -O4\fR" ljust at 1.9, 1.2 +"\(bu" at 2.6, 2.10 +"\fIcc -O\fR" ljust at 3.1,2.5 +.G2 +.ce 1 +\fIFigure A.4.2: overall performance on Eratosthenes' sieve. +.sp 1 +.PP +Although the above figures speak for themselves, a small comment +may be in place. At first it is clear that our compiler is neither +faster than \fIcc\fR, nor produces faster code than \fIcc -O4\fR. It should +also be noted however, that we do produce better code than \fIcc\fR +at only a very small additional cost. +It is also worth noticing that push-pop optimization +increases run-time speed as well as compile speed. +The first seems rather obvious, +since optimized code is +faster code, but the increase in compile speed may come as a surprise. +The main reason is that the \fIas\fR+\fIld\fR time depends largely on the +amount of generated code, which in general +depends on the efficiency of the code. +Push-pop optimization removes a lot of useless instructions which +would otherwise +have found their way through to the assembler and the loader. +Useless instructions inserted in an early stage in the compilation +process will slow down every following stage, so elimination of useless +instructions in an early stage, even when it requires a little computational +overhead, can often be beneficial to the overall compilation speed. +.bp diff --git a/doc/sparc/B b/doc/sparc/B new file mode 100644 index 000000000..fcb48922d --- /dev/null +++ b/doc/sparc/B @@ -0,0 +1,128 @@ +.so init +.SH +B. IMPLEMENTATION +.SH +B.1. 
Excerpts from the non-optimized EM_table +.PP +Even though the non-optimized version of the EM_table is relatively +straightforward, examples have never hurt anybody. +One of the simplest instructions is the \*(Siloc\*(So, which appears in +our EM_table as follows: +.DS +\f6 +.TA 8 16 24 32 40 48 56 64 +C_loc ==> "set $1, T1"; + "dec 4, SP"; + "st T1, [SP]". +\f1 +.DE +Just as \*(SiSP\*(So is an alias for \*(Si%l0\*(So, \*(SiT1\*(So is +an alias for \*(Si%g1\*(So. +A little more complex is the \*(Siadi\*(So, which performs integer +addition. +.DS +\f6 +C_adi ==> "ld [SP], T1"; + "ld [SP+4], T2"; + "add T1, T2, T3"; + "st T3, [SP+4]"; + "inc 4, SP". +\f1 +.DE +We could go on with even more complex instructions, but since that would +not contribute anything, the reader is referred to the implementation +for more details. +.SH +B.2. Excerpts from the optimized EM_table +.PP +The optimized EM_table uses the cache primitives mentioned in chapter 4. +This means that the \*(Siloc\*(So this time appears as +.DS +\f6 +C_loc ==> push_const($1). +\f1 +.DE +The \*(Silol\*(So can now be written as +.DS +\f6 +C_lol ==> push_reg(LB); + inc_tos($1); + push_const(4); + C_los(4). +\f1 +.DE +Due to the law of conservation of misery somebody has to do the dirty work. +In this case, it is the \*(Silos\*(So. To show just a small part of +the implementation of the \*(Silos\*(So: +.DS +\f6 +C_los $1 == 4 ==> + if (type_of_tos() == T_cst) { + arith size; + const_str_t n; + + size = pop_const(); + if (size <= 4) { + reg_t a; + reg_t b; + char *LD; + + switch (size) { + case 1: LD = "ldub"; break; + case 2: LD = "lduh"; break; + case 4: LD = "ld"; break; + default: arg_error("C_los", size); + } + a = pop_reg_c13(n); + b = alloc_reg(); + "$LD [$a+$n], $b"; + push_reg(b); + free_reg(a); + } else ... +\f1 +.DE +For the full implementation, the reader is again referred to the actual +implementation. Just to show how other instructions are affected +by the optimization, we will show the implementation of the \*(Sitge\*(So +instruction: +.DS +\f6 +C_tge ==> { + reg_t a; + reg_t b; + + a = pop_reg(); + b = alloc_reg(); + " tst $a"; + " bge,a 1f"; + " mov 1, $b"; /* delay slot */ + " set 0, $b"; + "1:"; + free_reg(a); + push_reg(b); + }. + +\f1 +.DE +.SH +.bp CREDITS +.PP +In order of appearance: +.TS +center; +r c l. +Original idea - Dick Grune +Design & implementation - Philip Homburg + - Raymond Michiels +Tutor - Dick Grune +Assistant Tutor - Ceriel Jacobs +Proofreading - Dick Grune + - Hans van Eck +.TE +.SH +REFERENCES +.PP +.[ +$LIST$ +.] diff --git a/doc/sparc/Makefile b/doc/sparc/Makefile new file mode 100644 index 000000000..6bff68167 --- /dev/null +++ b/doc/sparc/Makefile @@ -0,0 +1,10 @@ +# $Header$ + +REFER=refer +TBL=tbl +TARGET=-Tlp +PIC=pic +GRAP=grap + +../sparc.doc: refs title intro 1 2 3 4 5 A B init + $(REFER) -sA+T '-l\", ' -p refs title intro 1 2 3 4 5 A B | $(GRAP) | $(PIC) | $(TBL) | soelim > $@ diff --git a/doc/sparc/init b/doc/sparc/init new file mode 100644 index 000000000..6065e5c10 --- /dev/null +++ b/doc/sparc/init @@ -0,0 +1,18 @@ +.nr PS 12 +.nr VS 14 +.\" .fp 6 AM +.fp 6 CW +.ds Si \f6\s-1 +.ds So \f1\s+1 +.ds OQ `\h'-1p'` +.ds CQ '\h'-1p'' +.de UX +.ie \\n(UX \s-1UNIX\s0\\$1 +.el \{\ +\s-1UNIX\s0\\$1\(dg +.FS +\(dg \s-1UNIX\s0 is a registered bell of AT&T Trademark Laboratories. +.FE +.nr UX 1 +.\} +..
diff --git a/doc/sparc/intro b/doc/sparc/intro new file mode 100644 index 000000000..681bdf148 --- /dev/null +++ b/doc/sparc/intro @@ -0,0 +1,23 @@ +.so init +.hw de-vised +.TL +A fast backend for SPARC processors +.AU +Philip Homburg +Raymond Michiels +.AI +Dept. of Mathematics and Computer Science +Vrije Universiteit +Amsterdam, The Netherlands +.AB +The language EM is an intermediate language for use in compiler +construction. +In this paper we describe the construction of a so-called fast backend +which translates EM code to assembler for SPARC processors. +.br +Our construction deviates strongly from the usual procedure. We have +devised and implemented a virtual stack with which it is possible to +generate very acceptable code without much loss in compile time. +.AE +.PP +.bp diff --git a/doc/sparc/note_on_reg_wins b/doc/sparc/note_on_reg_wins new file mode 100644 index 000000000..7f045b83f --- /dev/null +++ b/doc/sparc/note_on_reg_wins @@ -0,0 +1,58 @@ +When developing a fast compiler for the Sun-4 series we have encountered +rather strange behavior of the Sun kernel. + +The problem is that when you have lots of nested procedure calls (as +is often the case in compilers and parsers) the registers fill up, which +causes a kernel trap. The kernel will then write out some of the registers +to memory to make room for another window. When you return from the nested +procedure call, just the reverse happens: yet another kernel trap so the +kernel can load the registers from memory. + +Unfortunately the kernel only saves or loads a single window (= 16 registers) +on each trap. This means that when you call a procedure recursively it causes +a kernel trap on almost every invocation (except for the first few). + +To illustrate this consider the following little program: + +--------------- little program ------------- +f(i) /* calls itself i times */ +int i; +{ + if (i) + f(i-1); +} + +main(argc, argv) +int argc; +char *argv[]; +{ + int i, j; + + i = atoi(argv[1]); /* # loops */ + j = atoi(argv[2]); /* depth */ + + while (i--) + f(j); +} +------------ end of little program ----------- + + +The performance decreases abruptly when the depth (j) becomes larger +than 4. On a SPARC station we got the following results: + + depth run time (in seconds) + + 1 0.5 + 2 0.8 + 3 1.0 + 4 1.4 <- from here on it's +6 seconds for each + 5 7.6 step deeper. + 6 13.9 + 7 19.9 + 8 26.3 + 9 32.9 + +Things would be a lot better if, instead of just 1, the kernel would +save or restore 4 windows (= 64 registers = 50% on our SPARC stations). + + -Raymond. diff --git a/doc/sparc/pics/.distr b/doc/sparc/pics/.distr new file mode 100644 index 000000000..32d6efca8 --- /dev/null +++ b/doc/sparc/pics/.distr @@ -0,0 +1,12 @@ +EM_stack.orig +EM_stack.ours +compile_bars +mem_config +perf +perf.comp +perf.d +perf.dhry +reg_layout +run-time_bars +run-time_bars.bup +signal_stack diff --git a/doc/sparc/pics/EM_stack.orig b/doc/sparc/pics/EM_stack.orig new file mode 100644 index 000000000..7cae3f3b7 --- /dev/null +++ b/doc/sparc/pics/EM_stack.orig @@ -0,0 +1,34 @@ +.PS +.ps -2 +.vs -2 boxwid = 1.5; boxht = 0.24 down; box "actual parameter n-1"; box "." "." "."
ht 0.6; +box "actual parameter 0"; +move 0.3 +box "return status block"; +{arrow <- right with .w at last box.e; \ +box invis wid 0.3 "LB" } +down +move to 2nd last box.s +move 0.1 +box "local variables" +box "compiler temporaries" +move 0.1 +box "register save block" +move 0.1 +box "dynamic local generators" +move 0.1 +box "operand" +box "operand" +move 0.1 +box "parameter m-1" +box "." "." "." ht 0.6; +box "parameter 0" with .n at last box .s +{ arrow <- right with .w at last box.e; \ +box invis wid 0.3 "SP" } +.ps +2 +.vs +2 +.PE diff --git a/doc/sparc/pics/EM_stack.ours b/doc/sparc/pics/EM_stack.ours new file mode 100644 index 000000000..260f2c66f --- /dev/null +++ b/doc/sparc/pics/EM_stack.ours @@ -0,0 +1,106 @@ +.ps 10 +.vs 12 +.PS +boxwid = 1.3 +boxht = 0.25 +down; +box "floating point" "register dump area" ht 0.6 +box "tmp float store" +box "register dump area" ht 0.6 +{ arrow <- right with .w at 3/4 ; \ +box invis wid 0.3 "%fp" } +move .1 +box dotted "gap" +{ arrow <- right with .w at last box.e; \ +box invis wid 0.3 "%LB" } +move .1 +box "locals" +box "actual parameter n-1"; +box "." "." "." ht 0.6; +box "actual parameter 0"; +{ arrow <- right with .w at last box.e; \ +box invis wid 0.3 "%SP" } +move 0.1 +box "large gap" "(>64kb)" ht 1.0 +box "register dump area" ht 0.6 +{ arrow <- right with .w at 3/4 ; \ +box invis wid 0.3 "%sp" } +move 0.2 +box invis "\\s+2just before call\\s0" +move 1 +box dotted "gap" +box invis "0 or 4 bytes" "for stack alignment" with .w at last box.e +box invis height .7 "when gap is 0 bytes," "%fp == %LB" with .n at 2nd last box.s +.PF +.PS +down; +move to 2.4,0 +box "floating point" "register dump area" ht 0.6 +box "tmp float store" +box "register dump area" ht 0.6 +{ arrow <- right with .w at 3/4 ; \ +box invis wid 0.3 "%fp" } +move .1 +box dotted "gap" +{ arrow <- right with .w at last box.e; \ +box invis wid 0.3 "%LB" } +move .1 +box "locals" +box "actual parameter n-1"; +box "." "." "." ht 0.6; +box "actual parameter 0"; +{ arrow <- right with .w at last box.e; \ +box invis wid 0.3 "%SP" } +move .1 +box dotted "gap" +move .4 +box "floating point" "register dump area" ht 0.6 +box "tmp float store" +box "register dump area" ht 0.6 +{ arrow <- right with .w at 3/4 ; \ +box invis wid 0.3 "%sp" } +move 0.2 +box invis "\\s+2'during' call\\s0" +.PF +.PS +down; +move to 4.8,0 +box "floating point" "register dump area" ht 0.6 +box "tmp float store" +box "register dump area" ht 0.6 +move .1 +box dotted "gap" +move .1 +box "locals" +box "actual parameter n-1"; +box "." "." "." ht 0.6; +box "actual parameter 0"; +move .1 +box dotted "gap" +move .4 +box "floating point" "register dump area" ht 0.6 +box "tmp float store" +box "register dump area" ht 0.6 +{ arrow <- right with .w at 3/4 ; \ +box invis wid 0.3 "%fp" } +move .1 +box dotted "gap" +{ arrow <- right with .w at last box.e; \ +box invis wid 0.3 "%LB" } +move .1 +box "locals" +box "actual parameter n-1"; +box "." "." "." 
ht 0.6; +box "actual parameter 0"; +{ arrow <- right with .w at last box.e; \ +box invis wid 0.3 "%SP" } +move 0.1 +box "large gap" "(>64kb)" ht 1.0 +box "register dump area" ht 0.6 +{ arrow <- right with .w at 3/4 ; \ +box invis wid 0.3 "%sp" } +move 0.2 +box invis "\\s+2after call\\s0" +.PF +.ps 12 +.vs 14 diff --git a/doc/sparc/pics/compile_bars b/doc/sparc/pics/compile_bars new file mode 100644 index 000000000..657a41812 --- /dev/null +++ b/doc/sparc/pics/compile_bars @@ -0,0 +1,49 @@ +.PS +boxht = 0.5 +boxwid = 1 +moveht = 0.65 +down; +{ +right; +box invis "ACK" "w/o" "opt" +box "cem" "0.7" wid 0.7 +box "opt" "0.4" wid 0.4 +box "be" "1.1" wid 1.1 +box "as" "1.4" wid 1.4 +box "ld" "0.4" wid 0.4 +box invis "4.0" wid 0.5 +} +move +{ +right; +box invis "ACK" "with" "opt" +box "cem" "0.7" wid 0.7 +box "opt" "0.4" wid 0.4 +box "be" "0.6" wid 0.6 +box "as" "0.7" wid 0.7 +box "ld" "0.4" wid 0.4 +box invis "2.8" wid 0.5 +} +move +{ +right; +box invis "\fIcc\fR" +box "cpp" "0.2" wid 0.2 +box "ccom" "1.0" wid 1.0 +box "as" "0.7" wid 0.7 +box "ld" "0.4" wid 0.4 +box invis "2.3" wid 0.5 +} +move +{ +right; +box invis "\fIcc -O4\fR" +box "cpp" "0.2" wid 0.2 +box "ccom" "1.0" wid 1.0 +box "iropt" "5.0 (not to scale!)" wid 1.5 +box "cg" "0.7" wid 0.7 +box "as" "1.7" wid 1.7 +box "ld" "0.4" wid 0.4 +box invis "9.0" wid 0.5 +} +.PE diff --git a/doc/sparc/pics/mem_config b/doc/sparc/pics/mem_config new file mode 100644 index 000000000..0ad88184f --- /dev/null +++ b/doc/sparc/pics/mem_config @@ -0,0 +1,34 @@ +.PS +boxwid = 1.3 +down +[ +right +[ +down; +box "stack" ht .6 +box "free" ht 1 +box "heap" ht .3 +box "text" ht .5 +] +move 1 +[ +down; +box "\s-4SPARC stack\s+4" ht .2 +box "\s-4EM stack\s+4" ht .1 +box "\s-4SPARC stack\s+4" ht .1 +box "\s-4EM stack\s+4" ht .1 +box "\s-4free\s+4" ht .2 +box "\s-4SPARC stack\s+4" ht .1 +box "free" ht .8 +box "heap" ht .3 +box "text" ht .5 +] +] +move .3 +[ +right +box invis "regular \(UX memory layout" +move 1 +box invis "memory layout for EM" +] +.PF diff --git a/doc/sparc/pics/perf b/doc/sparc/pics/perf new file mode 100644 index 000000000..a48965ea0 --- /dev/null +++ b/doc/sparc/pics/perf @@ -0,0 +1,12 @@ +.G1 +frame invis left solid bot solid +label left "run time" "(log scale)" left .5 +label bot "compile time (log scale)" +coord x 0.1,10 log x y 1000,20000 log y +ticks left out at 2000,5000,10000,20000 +ticks bot out at 0.1 0.3 1.0 3.0 10 +copy "perf.d" thru X + "\(bu" at $1, $2 + "$3" rjust at $1, $2 +X +.G2 diff --git a/doc/sparc/pics/perf.comp b/doc/sparc/pics/perf.comp new file mode 100644 index 000000000..761fd0671 --- /dev/null +++ b/doc/sparc/pics/perf.comp @@ -0,0 +1,7 @@ +in-line in ../A + +2.5 17.28 ack w/o opt +1.6 2.93 ack with opt +9.4 2.26 ack -O4 +1.5 7.43 \fIcc\fR +2.7 2.02 \fIcc -O4\fR diff --git a/doc/sparc/pics/perf.d b/doc/sparc/pics/perf.d new file mode 100644 index 000000000..9cf4081e8 --- /dev/null +++ b/doc/sparc/pics/perf.d @@ -0,0 +1,4 @@ +1.0 1700 ack w/o opt +1.9 8000 ack with opt +1.6 8000 \fIcc\fR +7 18000 \fIcc -O4\fR diff --git a/doc/sparc/pics/perf.dhry b/doc/sparc/pics/perf.dhry new file mode 100644 index 000000000..8faa4e332 --- /dev/null +++ b/doc/sparc/pics/perf.dhry @@ -0,0 +1,7 @@ +in-line in ../A + +3.5 1700 ack w/o opt +2.8 8770 ack with opt +16.0 10434 ack -O4 +2.3 7270 \fIcc\fR +9.0 12500 \fIcc -O4\fR diff --git a/doc/sparc/pics/reg_layout b/doc/sparc/pics/reg_layout new file mode 100644 index 000000000..58ddd92e1 --- /dev/null +++ b/doc/sparc/pics/reg_layout @@ -0,0 +1,24 @@ +.nr PS 12 +.nr VS 14 +.PP +.TS +allbox; 
+l l l l +l2f6 l l2f6 l. +g0 0 l0 EM_SP +g1 temporary 1 l1 EM_LB +g2 temporary 2 l2 +g3 temporary 3 l3 reserved +g4 64k..1M l4 reserved +g5 temporary 4 l5 reserved +g6 line number l6 reserved +g7 file name l7 reserved +o0 param 1 i0 +o1 param 2 i1 +o2 param 3 i2 +o3 param 4 i3 +o4 RETL_LD i4 RETL_ST +o5 RETH_LD i5 RETH_ST +sp stack pointer fp frame pointer +o7 xxx i7 return address +.TE diff --git a/doc/sparc/pics/run-time_bars b/doc/sparc/pics/run-time_bars new file mode 100644 index 000000000..cf4d29f93 --- /dev/null +++ b/doc/sparc/pics/run-time_bars @@ -0,0 +1,101 @@ +.PS +boxht = 0.5 +boxwid = 1 +moveht = 1 +down; +{ +right; +box invis "ACK" "w/o" "opt." +move +[ +down; +boxht = 0.25 +box wid 4.5 +"Sieve" ljust at last box.w + 0.1,-0.02 +"10(!)" ljust at last box.e + 0.1,-0.02 +box wid 4.5 with .nw at last box.sw +"Dhrystones" ljust at last box.w + 0.1,-0.02 +"10(!)" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "ACK" "with" "our" "opt." +move +[ +down; +boxht = 0.25 +box wid 1.4 +"Sieve" ljust at last box.w + 0.1,-0.02 +"1.4" ljust at last box.e + 0.1,-0.02 +box wid 1.9 with .nw at last box.sw +"Dhrystones" ljust at last box.w + 0.1,-0.02 +"1.9" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "ACK" "-O4" +move +[ +down; +boxht = 0.25 +box wid 1.1 +"Sieve" ljust at last box.w + 0.1,-0.02 +"1.1" ljust at last box.e + 0.1,-0.02 +box wid 1.6 with .nw at last box.sw +"Dhrystones" ljust at last box.w + 0.1,-0.02 +"1.6" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "Sun's" "compiler" "w/o opt." +move +[ +down; +boxht = 0.25 +box wid 3.7 +"Sieve" ljust at last box.w + 0.1,-0.02 +"3.7" ljust at last box.e + 0.1,-0.02 +box wid 2.2 with .nw at last box.sw +"Dhrystones" ljust at last box.w + 0.1,-0.02 +"2.2" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "Sun's" "compiler" "-O" +move +[ +down; +boxht = 0.25 +box wid 1.1 +"Sieve" ljust at last box.w + 0.1,-0.02 +"1.1" ljust at last box.e + 0.1,-0.02 +box wid 0.8 with .nw at last box.sw +"Dhryst." ljust at last box.w + 0.1,-0.02 +"0.8!" 
ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "Sun's" "compiler" "-O4" +move +[ +down; +boxht = 0.25 +box wid 1.0 +"Sieve" ljust at last box.w + 0.1,-0.02 +"1.0" ljust at last box.e + 0.1,-0.02 +box wid 1.0 with .nw at last box.sw +"Dhrystones" ljust at last box.w + 0.1,-0.02 +"1.0" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +.PE diff --git a/doc/sparc/pics/run-time_bars.bup b/doc/sparc/pics/run-time_bars.bup new file mode 100644 index 000000000..6bb014dad --- /dev/null +++ b/doc/sparc/pics/run-time_bars.bup @@ -0,0 +1,100 @@ +.PS +boxht = 0.5 +boxwid = 1 +moveht = 1 +down; +{ +right; +box invis "ACK" "w/o" "opt" +move +[ +down; +boxht = 0.25 +box wid 4.5 +"C (arithmetic)" ljust at last box.w + 0.1,-0.02 +"10(!)" ljust at last box.e + 0.1,-0.02 +box wid 4.5 with .nw at last box.sw +"C (dhrystones)" ljust at last box.w + 0.1,-0.02 +"10(!)" ljust at last box.e + 0.1,-0.02 +box wid 4.5 with .nw at last box.sw +"Modula-2" ljust at last box.w + 0.1,-0.02 +"8(!)" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "ACK" "with" "peep-hole" "opt" +move +[ +down; +boxht = 0.25 +box wid 1.4 +"C (arithmetic)" ljust at last box.w + 0.1,-0.02 +"1.4" ljust at last box.e + 0.1,-0.02 +box wid 1.9 with .nw at last box.sw +"C (dhrystones)" ljust at last box.w + 0.1,-0.02 +"1.9" ljust at last box.e + 0.1,-0.02 +box wid 2.5 with .nw at last box.sw +"Modula-2" ljust at last box.w + 0.1,-0.02 +"2.5" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "ACK" "-O4" +move +[ +down; +boxht = 0.25 +box wid 1.1 +"C (arithmetic)" ljust at last box.w + 0.1,-0.02 +"1.1" ljust at last box.e + 0.1,-0.02 +box wid 1.6 with .nw at last box.sw +"C (dhrystones)" ljust at last box.w + 0.1,-0.02 +"1.6" ljust at last box.e + 0.1,-0.02 +box wid 2.5 with .nw at last box.sw +"Modula-2" ljust at last box.w + 0.1,-0.02 +"2.5" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "Sun's" "compiler" "w/o opt." 
+move +[ +down; +boxht = 0.25 +box wid 3.7 +"C (arithmetic)" ljust at last box.w + 0.1,-0.02 +"3.7" ljust at last box.e + 0.1,-0.02 +box wid 2.2 with .nw at last box.sw +"C (dhrystones)" ljust at last box.w + 0.1,-0.02 +"2.2" ljust at last box.e + 0.1,-0.02 +box wid 1.8 with .nw at last box.sw +"Modula-2" ljust at last box.w + 0.1,-0.02 +"1.8" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +move +{ +right; +box invis "Sun's" "compiler" "-O4" +move +[ +down; +boxht = 0.25 +box wid 1.0 +"C (arith.)" ljust at last box.w + 0.1,-0.02 +"1.0" ljust at last box.e + 0.1,-0.02 +box wid 1.0 with .nw at last box.sw +"C (dhryst.)" ljust at last box.w + 0.1,-0.02 +"1.0" ljust at last box.e + 0.1,-0.02 +box wid 1.0 with .nw at last box.sw +"Modula-2" ljust at last box.w + 0.1,-0.02 +"1.0" ljust at last box.e + 0.1,-0.02 +] with .w at last box.e +} +.PE diff --git a/doc/sparc/pics/signal_stack b/doc/sparc/pics/signal_stack new file mode 100644 index 000000000..6afe5ad76 --- /dev/null +++ b/doc/sparc/pics/signal_stack @@ -0,0 +1,42 @@ +.PS +boxwid = 1.3 +down +[ +right +[ +down; +box "\s-4SPARC stack\s+4" ht .2 +box "\s-4EM stack\s+4" ht .1 +box "\s-4SPARC stack\s+4" ht .1 +box "\s-4EM stack\s+4" ht .1 +box "\s-4free\s+4" ht .2 +box "\s-4SPARC stack\s+4" ht .1 +box "free" ht .8 +box "heap" ht .3 +box "text" ht .5 +] +move 1 +[ +down; +box "\s-4SPARC stack\s+4" ht .2 +box "\s-4EM stack\s+4" ht .1 +box "\s-4SPARC stack\s+4" ht .1 +box "\s-4EM stack\s+4" ht .1 +box "\s-4free\s+4" ht .2 +box "\s-4SPARC stack\s+4" ht .1 +box "\s-4EM stack\s+4" ht .1 +box "\s-4free\s+4" ht .2 +box "\s-4SPARC stack\s+4" ht .1 +box "free" ht .4 +box "heap" ht .3 +box "text" ht .5 +] +] +move .3 +[ +right +box invis "before signal" +move 1 +box invis "during (1st) signal" +] +.PF diff --git a/doc/sparc/printP4P b/doc/sparc/printP4P new file mode 100644 index 000000000..e871c2534 --- /dev/null +++ b/doc/sparc/printP4P @@ -0,0 +1,31 @@ +echo $0 +case $1 in +1 ) + CMD="cat" +;; +2 ) + CMD="cat" +;; +3 ) + CMD="cat" +;; +4 ) + CMD="pic | tbl" +;; +5 ) + CMD="tbl" +;; +A ) + CMD="grap | pic" +;; +B ) + CMD="tbl" +;; +esac +echo $0 +if [ $0 = printP4P ] +then + refer -sA+T '-l\", ' -p refs $1 | eval $CMD | troff -ms -Tp4p | dip -Tp4p -Pp4p +else + xtroff -full -geom 665x883+566+0 -command "refer -sA+T '-l\", ' -p refs $1 | $CMD | troff -ms -Tp4p" +fi diff --git a/doc/sparc/refs b/doc/sparc/refs new file mode 100644 index 000000000..ba46c3b47 --- /dev/null +++ b/doc/sparc/refs @@ -0,0 +1,185 @@ +%T The design of very fast portable compilers +%A A.S. Tanenbaum +%A M.F. Kaashoek +%A K.G. Langendoen +%A C.J.H. Jacobs +%J SIGPLAN Notices +%V 24 +%N 11 +%P 125-131 +%D November 1989 + +%T A Programmer-friendly LL(1) Parser Generator +%A D. Grune +%A C.J.H. Jacobs +%J Software \- Practice and Experience +%V 18 +%N 1 +%P 29-38 +%D January 1988 + +%T The Code Expander Generator +%A Frans Kaashoek +%A Koen Langendoen +%R IM-9 +%I Vrije Universiteit, Amsterdam +%D November 1987 + +%T The ACK Pascal Compiler +%A Aad Geudeke +%A Frans Hofmeester +%R IM-8 +%I Vrije Universiteit, Amsterdam +%D November 1987 + +%T The EM-interpreter +%A Eddo de Groot +%A Leo van den Berge +%R IM-7 +%I Vrije Universiteit, Amsterdam +%D June 1987 + +%T A set of multi\-process primitives for stack based machines +%A K. Bot +%A E. Scheffer +%R IR-122 +%I Vrije Universiteit, Amsterdam +%D December 1986 + +%T An Occam Compiler +%A K. Bot +%A E. 
Scheffer +%R IM-6 +%I Vrije Universiteit, Amsterdam +%D December 1986 + +%T Language- and Machine-independent Global Optimization on Intermediate Code +%A H.E. Bal +%A A.S. Tanenbaum +%J Computer Languages +%V 11 +%N 2 +%P 105-121 +%D April 1986 + +%T The ACK Target Optimizer +%A H.E. Bal +%R IR-107 +%D 1985 +%I Vrije Universiteit, Amsterdam + +%T Some Topics in Parser Generation +%A C.J.H. Jacobs +%R IR-105 +%D October 1985 +%I Vrije Universiteit, Amsterdam + +%T The CEM compiler +%A E.H. Baalbergen +%A D. Grune +%A M. Waage +%R IM-4 +%I Vrije Universiteit, Amsterdam +%D 1985 + +%T The Design and Implementation of the EM Global Optimizer +%A H.E. Bal +%I Vrije Universiteit, Amsterdam +%R IR-99 +%D March 1985 + +%T Does anybody out there want to write HALF of a compiler? +%A A.S. Tanenbaum +%A E.G. Keizer +%A H. van Staveren +%J SIGPLAN Notices +%V 19 +%N 8 +%P 106-108 +%D August 1984 + +%T Amsterdam Compiler Kit documentation +%A A.S. Tanenbaum et al. +%I Vrije Universiteit, Amsterdam +%R IR-90 +%D June 1984 + +%T A Practical Toolkit for Making Portable Compilers +%A A. S. Tanenbaum +%A H. van Staveren +%A E. G. Keizer +%A J. W. Stevenson +%J Communications of the ACM +%V 26 +%N 9 +%P 654-660 +%D September 1983 + +%T Description of a Machine Architecture for use with Block Structured +Languages +%A A. S. Tanenbaum +%A H. van Staveren +%A E. G. Keizer +%A J. W. Stevenson +%R IR-81 +%D August 1983 +%I Vrije Universiteit, Amsterdam + +%T A Unix Toolkit for Making Portable Compilers +%A A.S. Tanenbaum +%A H. van Staveren +%A E.G. Keizer +%A J.W. Stevenson +%J Proceedings USENIX conf. +%C Toronto, Canada +%V 26 +%D July 1983 +%P 255-261 + +%T Using Peephole Optimization on Intermediate Code +%A A.S. Tanenbaum +%A H. van Staveren +%A J.W. Stevenson +%J TOPLAS +%V 4 +%N 1 +%P 21-36 +%D January 1982 + +%T EM-1 Compiler +%A A.S. Tanenbaum +%J Pascal News +%D September 1981 +%P 4-38 + +%T A portable compiler for the Proposed ISO Standard Pascal Language +%A A.S. Tanenbaum +%A J.W. Stevenson +%A H. van Staveren +%J SIGPLAN Notices +%V 15 +%N 10 +%D 1980 + +%T Implications of Structured Programming for Machine Architecture +%A A.S. Tanenbaum +%J Communications of the ACM +%V 21 +%N 3 +%P 237-246 +%D March 1978 + +%T The table driven code generator from the Amsterdam Compiler Kit (Second +revised edition) +%A H. van Staveren +%I Vrije Universiteit, Amsterdam +%R on-line internal ACK documentation +%D early 1985 + +%T Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules +%A R.P. Weicker +%J SIGPLAN Notices +%V 23 +%N 8 +%D August 1988 +%P 49-62 diff --git a/doc/sparc/timing b/doc/sparc/timing new file mode 100644 index 000000000..9887db71b --- /dev/null +++ b/doc/sparc/timing @@ -0,0 +1,22 @@ + DHRYSTONES V2.0 + + cc cc -O4 cc -O fccO fccCE ack ack -O4 +compile time: + real 4.0 12.0 10.0 6.4 8.0 31.0 + user 1.6 7.3 4.1 1.9 1.8 2.0 9.3 + sys 0.9 2.1 1.8 2.5 1.5 2.0 7.7 + +run time: 7263 16250 15250 4730 3430 8474 10434 +(stones/sec) + + SIEVE + + cc cc -O4 fccO fccCE ack ack -O4 +compile time: + real 2.4 4.4 x 3.3 6.4 17.0 + user 0.8 1.6 x 0.7 0.7 3.2 + sys 0.7 1.0 x 0.8 1.3 6.2 + +run time: 7.43 2.02 x 12.18 2.93 2.26 + +All ack-derived compilers are shell-script driven. diff --git a/doc/sparc/title b/doc/sparc/title new file mode 100644 index 000000000..55be4607e --- /dev/null +++ b/doc/sparc/title @@ -0,0 +1,15 @@ +.so init +.TL +.sp 1.2c +A fast backend for SPARC processors +.AU +Philip Homburg +Raymond Michiels +.AI +Dept.
of Mathematics and Computer Science +Vrije Universiteit +Amsterdam, The Netherlands +.PP +.sp 1i +Graduation thesis (Afstudeerverslag), 20 August 1990 +.bp -- 2.34.1
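
For reference, a minimal sketch of how these sources might be rendered with a present-day groff toolchain (the -Tps device and the output file names are illustrative assumptions; the original printP4P script instead pipes the refer/pic/tbl/grap output through troff -ms -Tp4p into dip, apparently a site-local printer filter):

	cd doc/sparc
	refer -p refs 4 | pic | tbl | groff -ms -Tps > 4.ps    # chapter 4 uses pic and tbl
	refer -p refs A | grap | pic | groff -ms -Tps > A.ps   # appendix A additionally needs grap

The preprocessor chain per chapter follows the case table in printP4P; chapters 1-3 need no preprocessor beyond refer.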