doc/toolkit.doc

   1 .\" $Id: toolkit.doc,v 1.4 1994/06/24 10:02:30 ceriel Exp $
   2 .RP
   3 .ND July 1984
   4 .tr ~
   5 .ds as *
   6 .TL
   7 A Practical Tool Kit for Making Portable Compilers
   8 .AU
   9 Andrew S. Tanenbaum
  10 Hans van Staveren
  11 E. G. Keizer
  12 Johan W. Stevenson
  13 .AI
  14 Mathematics Dept.
  15 Vrije Universiteit
  16 Amsterdam, The Netherlands
  17 .AB
  18 The Amsterdam Compiler Kit is an integrated collection of programs designed to
  19 simplify the task of producing portable (cross) compilers and interpreters.
  20 For each language to be compiled, a program (called a front end)
  21 must be written to
  22 translate the source program into a common intermediate code.
  23 This intermediate code can be optimized and then either directly interpreted
  24 or translated to the assembly language of the desired target machine.
  25 The paper describes the various pieces of the tool kit in some detail, as well
  26 as discussing the overall strategy.
  27 .sp
  28 Keywords: Compiler, Interpreter, Portability, Translator
  29 .sp
  30 CR Categories: 4.12, 4.13, 4.22
  31 .sp 12
  32 Author's present addresses:
  33   A.S. Tanenbaum, H. van Staveren, E.G. Keizer: Mathematics
  34      Dept., Vrije Universiteit, Postbus 7161, 1007 MC Amsterdam,
  35      The Netherlands
  36
  37   J.W. Stevenson: NV Philips, S&I, T&M, Building TQ V5, Eindhoven,
  38      The Netherlands
  39 .AE
  40 .NH 1
  41 Introduction
  42 .PP
  43 As more and more organizations acquire many micro- and minicomputers,
  44 the need for portable compilers is becoming more and more acute.
  45 The present situation, in which each hardware vendor provides its own
  46 compilers -- each with its own deficiencies and extensions, and none of them
  47 compatible -- leaves much to be desired.
  48 The ideal situation would be an integrated system containing a family
  49 of (cross) compilers, each compiler accepting a standard source language and
  50 producing code for a wide variety of target machines.
  51 Furthermore, the compilers should be compatible, so programs written in
  52 one language can call procedures written in another language.
  53 Finally, the system should be designed so as to make adding new languages
  54 and new machines easy.
  55 Such an integrated system is being built at the Vrije Universiteit.
  56 Its design and implementation is the subject of this article.
  57 .PP
  58 Our compiler building system, which is called the "Amsterdam Compiler Kit"
  59 (ACK), can be thought of as a "tool kit."
  60 It consists of a number of parts that can be combined to form compilers
  61 (and interpreters) with various properties.
  62 The tool kit is based on an idea (UNCOL) that was first suggested in 1960
  63 [7], but which never really caught on then.
  64 The problem which UNCOL attempts to solve is how to make a compiler for
  65 each of
  66 .I N
  67 languages on
  68 .I M
  69 different machines without having to write
  70 .I N
  71 x
  72 .I M
  73 programs.
  74 .PP
  75 As shown in Fig. 1, the UNCOL approach is to write
  76 .I N
  77 "front ends," each
  78 of which translates one source language to a common intermediate language,
  79 UNCOL (UNiversal Computer Oriented Language), and
  80 .I M
  81 "back ends," each
  82 of which translates programs in UNCOL to a specific machine language.
  83 Under these conditions, only
  84 .I N
  85 +
  86 .I M
  87 programs must be written to provide all
  88 .I N
  89 languages on all
  90 .I M
  91 machines, instead of
  92 .I N
  93 x
  94 .I M
  95 programs.
  96 .PP
  97 Various researchers have attempted to design a suitable UNCOL
  98 [2,8], but none of these have become popular.
  99 It is our belief that previous attempts have failed because they have been
 100 too ambitious, that is, they have tried to cover all languages
 101 and all machines using a single UNCOL.
 102 Our approach is more modest: we cater only to algebraic languages
 103 and machines whose memory consists of 8-bit bytes, each with its own address.
 104 Typical languages that could be handled include
 105 Ada, ALGOL 60, ALGOL 68, BASIC, C, FORTRAN,
 106 Modula, Pascal, PL/I, PL/M, PLAIN, and RATFOR,
 107 whereas COBOL, LISP, and SNOBOL would be less efficient.
 108 Examples of machines that could be included are the Intel 8080 and 8086,
 109 Motorola 6800, 6809, and 68000, Zilog Z80 and Z8000, DEC PDP-11 and VAX,
 110 and IBM 370 but not the Burroughs 6700, CDC Cyber, or Univac 1108 (because
 111 they are not byte-oriented).
 112 With these restrictions, we believe the old UNCOL idea can be used as the
 113 basis of a practical compiler-building system.
 114 .KF
 115 .sp 15P
 116 .ce 1
 117 Fig. 1.  The UNCOL model.
 118 .sp
 119 .KE
 120 .NH 1
 121 An Overview of the Amsterdam Compiler Kit
 122 .PP
 123 The tool kit consists of eight components:
 124 .sp
 125   1. The preprocessor.
 126   2. The front ends.
 127   3. The peephole optimizer.
 128   4. The global optimizer.
 129   5. The back end.
 130   6. The target machine optimizer.
 131   7. The universal assembler/linker.
 132   8. The utility package.
 133 .sp
 134 .PP
 135 A fully optimizing compiler,
 136 depicted in Fig. 2, has seven cascaded phases.
 137 Conceptually, each component reads an input file and writes a
 138 transformed output file to be used as input to the next component.
 139 In practice, some components may use temporary files to allow multiple
 140 passes over the input or internal intermediate files.
 141 .KF
 142 .sp 12P
 143 .ce 1
 144 Fig. 2.  Structure of the Amsterdam Compiler Kit.
 145 .sp
 146 .KE
 147 .PP
 148 In the following paragraphs we will briefly describe each component.
 149 After this overview, we will look at all of them again in more detail.
 150 A program to be compiled is first fed into the (language independent)
 151 preprocessor, which provides a simple macro facility,
 152 and similar textual facilties.
 153 The preprocessor's output is a legal program in one of the programming
 154 languages supported, whereas the input is a program possibly augmented
 155 with macros, etc.
 156 .PP
 157 This output goes into the appropriate front end, whose job it is to
 158 produce intermediate code.
 159 This intermediate code (our UNCOL) is the machine language for a simple
 160 stack machine called EM (Encoding Machine).
 161 A typical front end might build a parse tree from the input, and then
 162 use the parse tree to generate EM code, which is similar to reverse Polish.
 163 In order to perform this work, the front end has to maintain tables of
 164 declared variables, labels, etc., determine where to place the
 165 data structures in memory, and so on.
 166 .PP
 167 The EM code generated by the front end is fed into the peephole optimizer,
 168 which scans it with a window of a few instructions, replacing certain
 169 inefficient code sequences by better ones.
 170 Such a search is important because EM contains instructions to handle
 171 numerous important special cases efficiently
 172 (e.g., incrementing a variable by 1).
 173 It is our strategy to relieve the front ends of the burden of hunting for
 174 special cases because there are many front ends and only one peephole
 175 optimizer.
 176 By handling the special cases in the peephole optimizer,
 177 the front ends become simpler, easier to write and easier to maintain.
 178 .PP
 179 Following the peephole optimizer is a global optimizer [5], which
 180 unlike the peephole optimizer, examines the program as a whole.
 181 It builds a data flow graph to make possible a variety of
 182 global optimizations,
 183 among them, moving invariant code out of loops, avoiding redundant
 184 computations, live/dead analysis and eliminating tail recursion.
 185 Note that the output of the global optimizer is still EM code.
 186 .PP
 187 Next comes the back end, which differs from the front ends in a
 188 fundamental way.
 189 Each front end is a separate program, whereas the back end is a single
 190 program that is driven by a machine dependent driving table.
 191 The driving table for a specific machine tells how the EM code is mapped
 192 onto the machine's assembly language.
 193 Although a simple driving table might just macro expand each EM instruction
 194 into a sequence of target machine instructions, a much more sophisticated
 195 translation strategy is normally used, as described later.
 196 For speed, the back end does not actually read in the driving table at run time.
 197 Instead, the tables are compiled along with the back end in advance, resulting
 198 in one binary program per machine.
 199 .PP
 200 The output of the back end is a program in the assembly language of some
 201 particular machine.
 202 The next component in the pipeline reads this program and performs peephole
 203 optimization on it.
 204 The optimizations performed here involve idiosyncracies
 205 of the target machine that cannot be performed in the machine-independent
 206 EM-to-EM peephole optimizer.
 207 Typically these optimizations take advantage of special instructions or special
 208 addressing modes.
 209 .PP
 210 The optimized target machine assembly code then goes into the final
 211 component in the pipeline, the universal assembler/linker.
 212 This program assembles the input to object format, extracting routines from
 213 libraries and including them as needed.
 214 .PP
 215 The final component of the tool kit is the utility package, which contains
 216 various test programs, interpreters for EM code,
 217 EM libraries, conversion programs, and other aids for the implementer and
 218 user.
 219 .NH 1
 220 The Preprocessor
 221 .PP
 222 The function of the preprocessor is to extend all the programming languages
 223 by adding certain generally useful facilities to them in a uniform way.
 224 One of these is a simple macro system, in which the user can give names to
 225 character strings.
 226 The names can be used in the program, with the knowledge that they will be
 227 macro expanded prior to being input to the front end.
 228 Macros can be used for named constants, expanding short "procedures"
 229 in line, etc.
 230 .PP
 231 Another useful facility provided by the preprocessor is the ability to
 232 include compile-time libraries.
 233 On large projects, it is common to have all the declarations and definitions
 234 gathered together in a few files that are textually included in the programs
 235 by instructing the preprocessor to read them in, thus fooling the front end
 236 into thinking that they were part of the source program.
 237 .PP
 238 A third feature of the preprocessor is conditional compilation.
 239 The input program can be split up into labeled sections.
 240 By setting flags, some of the sections can be deleted by the preprocessor,
 241 thus allowing a family of slightly different programs to be conveniently stored
 242 on a single file.
 243 .NH 1
 244 The Front Ends
 245 .PP
 246 A front end is a program that converts input in some source language to a
 247 program in EM.
 248 At present, front ends
 249 exist or are in preparation for Pascal, C, and Plain, and are being considered
 250 for Ada, ALGOL 68, FORTRAN 77, and Modula 2.
 251 Each of the present front ends is independent of all the other ones,
 252 although a general-purpose, table-driven front end is conceivable, provided
 253 one can devise a way to express the semantics of the source language in the
 254 driving tables.
 255 The Pascal front end uses a top-down parsing algorithm (recursive descent),
 256 whereas the C and Plain front ends are bottom-up.
 257 .PP
 258 All front ends, independent of the language being compiled,
 259 produce a common intermediate code called EM, which is
 260 the assembly language for a simple stack machine.
 261 The EM machine is based on a memory architecture
 262 containing a stack for local variables, a (static) data area for variables
 263 declared in the outermost block and global to the whole program, and a heap
 264 for dynamic data structures.
 265 In some ways EM resembles P-code [6], but is more general, since it is
 266 intended for a wider class of languages than just Pascal.
 267 .PP
 268 The EM instruction set has been described elsewhere
 269 [9,10,11]
 270 so we will only briefly summarize it here.
 271 Instructions exist to:
 272 .sp
 273   1. Load a variable or constant of some length onto the stack.
 274   2. Store the top item on the stack in memory.
 275   3. Add, subtract, multiply, divide, etc. the top two stack items.
 276   4. Examine the top one or two stack items and branch conditionally.
 277   5. Call procedures and return from them.
 278 .sp
 279 .PP
 280 Loads and stores come in several variations, corresponding to the most common
 281 programming language semantics, for example, constants, simple variables,
 282 fields of a record, elements of an array, and so on.
 283 Distinctions are also made between variables local to the current block
 284 (i.e., stack frame), those in the outermost block (static storage), and those
 285 at intermediate lexicographic levels, which are accessed by following the
 286 static chain at run time.
 287 .PP
 288 All arithmetic instructions have a type (integer, unsigned, real,
 289 pointer, or set) and an
 290 operand length, which may either be explicit or may be popped from the stack
 291 at run time.
 292 Monadic branch instructions pop an item from the stack and branch if it is
 293 less than zero, less than or equal to zero, etc.
 294 Dyadic branch instructions pop two items, compare them, and branch accordingly.
 295 .PP
 296 In addition to these basic EM instructions, there is a collection of special
 297 purpose instructions (e.g., to increment a local variable), which are typically
 298 produced from the simple ones by the peephole optimizer.
 299 Although the complete EM instruction set contains nearly 150 instructions,
 300 only about 60 of them are really primitive; the rest are simply abbreviations
 301 for commonly occurring EM instruction sequences.
 302 .PP
 303 Of particular interest is the way object sizes are parametrized.
 304 The front ends allow the user to indicate how many bytes an integer, real, etc.
 305 should occupy.
 306 Given this information, the front ends can allocate memory, determining
 307 the placement of variables within the stack frame.
 308 Sizes for primitive types are restricted to 8, 16, 32, 64, etc. bits.
 309 The front ends are also parametrized by the target machine's word length
 310 and address size so they can tell, for example, how many "load" instructions
 311 to generate to move a 32-bit integer.
 312 In the examples used henceforth,
 313 we will assume a 16-bit word size and 16-bit integers.
 314 .PP
 315 Since only byte-addressable target machines are permitted,
 316 it is nearly
 317 always possible to implement any requested sizes on any target machine.
 318 For example, the designer of the back end tables for the Z80 should provide
 319 code for 8-, 16-, and 32-bit arithmetic.
 320 In our view, the Pascal, C, or Plain programmer specifies what lengths
 321 are needed,
 322 without reference to the target machine,
 323 and the back end provides it.
 324 This approach greatly enhances portability.
 325 While it is true that doing all arithmetic using 32-bit integers on the Z80
 326 will not be terribly fast, we feel that if that is what the programmer needs,
 327 it should be possible to implement it.
 328 .PP
 329 Like all assembly languages, EM has not only machine instructions, but also
 330 pseudoinstructions.
 331 These are used to indicate the start and end of each procedure, allocate
 332 and initialize storage for data, and similar functions.
 333 One particularly important pseudoinstruction is the one that is used to
 334 transmit information to the back end for optimization purposes.
 335 It can be used to suggest variables that are good candidates to assign to
 336 registers, delimit the scope of loops, indicate that certain variables
 337 contain a useful value (next operation is a load) or not (next operation is
 338 a store), and various other things.
 339 .NH 1
 340 The Peephole Optimizer
 341 .PP
 342 The peephole optimizer reads in unoptimized EM programs and writes out
 343 optimized ones.
 344 Both the input and output are expressed in a highly compact code, rather than
 345 in ASCII, to reduce the i/o time, which would otherwise dominate the CPU
 346 time.
 347 The program itself is table driven, and is, by and large, ignorant of the
 348 semantics of EM.
 349 The knowledge of EM is contained in a
 350 language- and machine-independent table consisting of about 400
 351 pattern-replacement pairs.
 352 We will briefly describe the kinds of optimizations it performs below;
 353 a more complete discussion can be found in [9].
 354 .PP
 355 Each line in the driving table describes one optimization, consisting of a
 356 pattern part and a replacement part.
 357 The pattern part is a series of one or more EM instructions and a boolean
 358 expression.
 359 The replacement part is a series of EM instructions with operands.
 360 A typical optimization might be:
 361 .sp
 362   LOL  LOC  ADI  STL  ($1 = $4) and ($2 = 1) and ($3 = 2) ==> INL $1
 363 .sp
 364 where the text prior to the ==> symbol is the pattern and the text after it is
 365 the replacement.
 366 LOL loads a local variable onto the stack, LOC loads a constant onto the stack,
 367 ADI is integer addition, and STL is store local.
 368 The pattern specifies that four consecutive EM instructions are present, with
 369 the indicated opcodes, and that furthermore the operand of the first
 370 instruction (denoted by $1) and the fourth instruction (denoted by $4) are the
 371 same, the constant pushed by LOC is 1, and the size of the integers added by
 372 ADI is 2 bytes.
 373 (EM instructions have at most one operand, so it is not necessary to specify
 374 the operand number.)
 375 Under these conditions, the four instructions can be replaced by a single INL
 376 (increment local) instruction whose operand is equal to that of LOL.
 377 .PP
 378 Although the optimizations cover a wide range, the main ones
 379 can be roughly divided into the following categories.
 380 \fIConstant folding\fR
 381 is used to evaluate constant expressions, such as 2*3~+~7 at
 382 compile time instead of run time.
 383 \fIStrength reduction\fR
 384 is used to replace one operation, such as multiply, by
 385 another, such as shift.
 386 \fIReordering of expressions\fR
 387 helps in cases like -K/5, which can be better
 388 evaluated as K/-5, because the former requires
 389 a division and a negation, whereas the latter requires only a division.
 390 \fINull instructions\fR
 391 include resetting the stack pointer after a call with 0 parameters,
 392 offsetting zero bytes to access the
 393 first element of a record, or jumping to the next instruction.
 394 \fISpecial instructions\fR
 395 are those like INL, which deal with common special cases
 396 such as adding one to a variable or comparing something to zero.
 397 \fIGroup moves\fR
 398 are useful because a sequence
 399 of consecutive moves can often be replaced with EM code
 400 that allows the back end to generate a loop instead of in line code.
 401 \fIDead code elimination\fR
 402 is a technique for removing unreachable statements, possibly made unreachable
 403 by previous optimizations.
 404 \fIBranch chain compression\fR
 405 can be applied when a branch instruction jumps to another branch instruction.
 406 The first branch can jump directly to the final destination instead of
 407 indirectly.
 408 .PP
 409 The last two optimizations logically belong in the global optimizer but are
 410 in the local optimizer for historical reasons (meaning that the local
 411 optimizer has been the only optimizer for many years and the optimizations were
 412 easy to do there).
 413 .NH 1
 414 The Global Optimizer
 415 .PP
 416 In contrast to the peephole optimizer, which examines the EM code a few lines
 417 at a time through a small window, the global optimizer examines the
 418 program's large scale structure.
 419 Three distinct types of optimizations can be found here:
 420 .sp
 421   1. Interprocedural optimizations.
 422   2. Intraprocedural optimizations.
 423   3. Basic block optimizations.
 424 .sp
 425 We will now look at each of these in turn.
 426 .PP
 427 Interprocedural optimizations are those spanning procedure boundaries.
 428 The most important one is deciding to expand procedures in line,
 429 especially short procedures that occur in loops and pass several parameters.
 430 If it takes more time or memory to pass the parameters than to do the work,
 431 the program can be improved by eliminating the procedure.
 432 The inverse optimization -- discovering long common code sequences and
 433 turning them into a procedure -- is also possible, but much more difficult.
 434 Like much of the global optimizer's work, the decision to make or not make
 435 a certain program transformation is a heuristic one, based on knowledge of
 436 how the back end works, how most target machines are organized, etc.
 437 .PP
 438 The heart of the global optimizer is its analysis of individual
 439 procedures.
 440 To perform this analysis, the optimizer must locate the basic blocks,
 441 instruction sequences which can be entered only at the top and exited
 442 only at the bottom.
 443 It then constructs a data flow graph, with the basic blocks as nodes and
 444 jumps between blocks as arcs.
 445 .PP
 446 From the data flow graph, many important properties of the program can be
 447 discovered and exploited.
 448 Chief among these is the presence of loops, indicated by cycles in the graph.
 449 One important optimization is looking for code that can be moved outside the
 450 loop, either prior to it or subsequent to it.
 451 Such code motion saves execution time, although it does not save memory.
 452 Unrolling loops is also possible and desirable in some cases.
 453 .PP
 454 Another area in which global analysis of loops is especially important is
 455 in register allocation.
 456 While it is true that EM does not have any registers to allocate,
 457 the optimizer can easily collect information to allow the
 458 back end to allocate registers wisely.
 459 For example, the global optimizer can collect static frequency-of-use
 460 and live/dead information about variables.
 461 (A variable is dead at some point in the program if its current value is
 462 not needed, i.e., the next reference to it overwrites it rather than
 463 reading it; if the current value will eventually be used, the variable is
 464 live.)
 465 If two variables are never simultaneously live over some interval of code
 466 (e.g., the body of a loop), they can be packed into a single variable,
 467 which, if used often enough, may warrant being assigned to a register.
 468 .PP
 469 Many loops involve arrays: this leads to other optimizations.
 470 If an array is accessed sequentially, with each iteration using the next
 471 higher numbered element, code improvement is often possible.
 472 Typically, a pointer to the bottom element of each array can be set up
 473 prior to the loop.
 474 Within the loop the element is accessed indirectly via the pointer, which is
 475 also incremented by the element size on each iteration.
 476 If the target machine has an autoincrement addressing mode and the pointer
 477 is assigned to a register, an array access can often be done in a single
 478 instruction.
 479 .PP
 480 Other intraprocedural optimizations include removing tail recursion
 481 (last statement is a recursive call to the procedure itself),
 482 topologically sorting the basic blocks to minimize the number of branch
 483 instructions, and common subexpression recognition.
 484 .PP
 485 The third general class of optimizations done by the global optimizer is
 486 improving the structure of a basic block.
 487 For the most part these involve transforming arithmetic or boolean
 488 expressions into forms that are likely to result in better target code.
 489 As a simple example, A~+~B*C can be converted to B*C~+~A.
 490 The latter can often
 491 be handled by loading B into a register, multiplying the register by C, and
 492 then adding in A, whereas the former may involve first putting A into a
 493 temporary, depending on the details of the code generation table.
 494 Another example of this kind of basic block optimization is transforming
 495 -B~+~A~<~0 into the equivalent, but simpler, A~<~B.
 496 .NH 1
 497 The Back End
 498 .PP
 499 The back end reads a stream of EM instructions and generates assembly code
 500 for the target machine.
 501 Although the algorithm itself is machine independent, for each target
 502 machine a machine dependent driving table must be supplied.
 503 The driving table effectively defines the mapping of EM code to target code.
 504 .PP
 505 It will be convenient to think of the EM instructions being read as a
 506 stream of tokens.
 507 For didactic purposes, we will concentrate on two kinds of tokens:
 508 those that load something onto the stack, and those that perform some operation
 509 on the top one or two values on the stack.
 510 The back end maintains at compile time a simulated stack whose behavior
 511 mirrors what the stack of a hardware EM machine would do at run time.
 512 If the current input token is a load instruction, a new entry is pushed onto
 513 the simulated stack.
 514 .PP
 515 Consider, as an example, the EM code produced for the statement K~:=~I~+~7.
 516 If K and I are
 517 2-byte local variables, it will normally be LOL I; LOC 7; ADI~2; STL K.
 518 Initially the simulated stack is empty.
 519 After the first token has been read and processed, the simulated stack will
 520 contain a stack token of type MEM with attributes telling that it is a local,
 521 giving its address, etc.
 522 After the second token has been read and processed, the top two tokens on the
 523 simulated stack will be CON (constant) on top and MEM directly underneath it.
 524 .PP
 525 At this point the back end reads the ADI~2 token and
 526 looks in the driving table to find a line or lines that define the
 527 action to be taken for ADI~2.
 528 For a typical multiregister machine, instructions will exist to add constants
 529 to registers, but not to memory.
 530 Consequently, the driving table will not contain an entry for ADI~2 with stack
 531 configuration CON, MEM.
 532 .PP
 533 The back end is now faced with the problem of how to get from its
 534 current stack configuration, CON, MEM, which is not listed, to one that is
 535 listed.
 536 The table will normally contain rules (which we call "coercions")
 537 for converting between CON, REG, MEM, and similar tokens.
 538 Therefore the back end attempts to "coerce" the stack into a configuration
 539 that
 540 .I is
 541 present in the table.
 542 A typical coercion rule might tell how to convert a MEM into
 543 a REG, namely by performing the actions of allocating a
 544 register and emitting code to move the memory word to that register.
 545 Having transformed the compile-time stack into a configuration allowed for
 546 ADI~2, the rule can be carried out.
 547 A typical rule
 548 for ADI~2 might have stack configuration REG, MEM
 549 and would emit code to add the MEM to the REG, leaving the stack
 550 with a single REG token instead of the REG and MEM tokens present before the
 551 ADI~2.
 552 .PP
 553 In general, there will be more than one possible coercion path.
 554 Assuming reasonable coercion rules for our example,
 555 we might be able to convert
 556 CON MEM into CON REG by loading the variable I into a register.
 557 Alternatively, we could coerce CON to REG by loading the constant into a register.
 558 The first coercion path does the add by first loading I into a register and
 559 then adding 7 to it.
 560 The second path first loads 7 into a register and then adds I to it.
 561 On machines with a fast LOAD IMMEDIATE instruction for small constants
 562 but no fast ADD IMMEDIATE, or vice
 563 versa, one code sequence will be preferable to the other.
 564 .PP
 565 In fact, we actually have more choices than suggested above.
 566 In both coercion paths a register must be allocated.
 567 On many machines, not every register can be used in every operation, so the
 568 choice may be important.
 569 On some machines, for example, the operand of a multiply must be in an odd
 570 register.
 571 To summarize, from any state (i.e., token and stack configuration), a
 572 variety of choices can be made, leading to a variety of different target
 573 code sequences.
 574 .PP
 575 To decide which of the various code sequences to emit, the back end must have
 576 some information about the time and memory cost of each one.
 577 To provide this information, each rule in the driving table, including
 578 coercions, specifies both the time and memory cost of the code emitted when
 579 the rule is applied.
 580 The back end can then simply try each of the legal possibilities (including all
 581 the possible register allocations) to find the cheapest one.
 582 .PP
 583 This situation is similar to that found in a chess or other game-playing
 584 program, in which from any state a finite number of moves can be made.
 585 Just as in a chess program, the back end can look at all the "moves" that can
 586 be made from each state reachable from the original state, and thus find the
 587 sequence that gives the minimum cost to a depth of one.
 588 More generally, the back end can evaluate all paths corresponding to accepting
 589 the next
 590 .I N
 591 input tokens, find the cheapest one, and then make the first move along
 592 that path, precisely the way a chess program would.
 593 .PP
 594 Since the back end is analogous to both a parser and a chess playing program,
 595 some clarifying remarks may be helpful.
 596 First, chess programs and the back end must do some look ahead, whereas the
 597 parser for a well-designed grammar can usually suffice with one input token
 598 because grammars are supposed to be unambiguous.
 599 In contrast, many legal mappings
 600 from a sequence of EM instructions to target code may exist.
 601 Second, like a parser but unlike a chess program, the back end has perfect
 602 information -- it does not have to contend with an unpredictable opponent's
 603 moves.
 604 Third, chess programs normally make a static evaluation of the board and
 605 label the
 606 .I nodes
 607 of the tree with the resulting scores.
 608 The back end, in contrast, associates costs with
 609 .I arcs
 610 (moves) rather than nodes (states).
 611 However, the difference is not essential, since it could
 612 also label each node with the cumulative cost from the root to that node.
 613 .PP
 614 As mentioned above, the cost field in the table contains
 615 .I both
 616 the time and memory costs for the code emitted.
 617 It should be clear that the back end could use either one
 618 or some linear combination of them as the scoring function for evaluating moves.
 619 A user can instruct the compiler to optimize for time or for memory or
 620 for, say,  0.3 x time + 0.7 x memory.
 621 Thus the same compiler can provide a wide range of performance options to
 622 the user.
 623 The writer of the back end table can take advantage of this flexibility by
 624 providing several code sequences with different tradeoffs for each EM
 625 instruction (e.g., in line code vs. call to a run time routine).
 626 .PP
 627 In addition to the time-space tradeoffs, by specifying the depth of search
 628 parameter,
 629 .I N ,
 630 the user can effectively also tradeoff compile time vs. object
 631 code quality, for whatever code metric has been chosen.
 632 In summary, by combining the properties of a parser and a game playing program,
 633 it is possible to make a code generator that is table driven,
 634 highly flexible, and has the ability to produce good code from a
 635 stack machine intermediate code.
 636 .NH 1
 637 The Target Machine Optimizer
 638 .PP
 639 In the model of Fig 2., the peephole optimizer comes before the global
 640 optimizer.
 641 It may happen that the code produced by the global optimizer can also
 642 be improved by another round of peephole optimization.
 643 Conceivably, the system could have been designed to iterate peephole and
 644 global optimizations until no more of either could be performed.
 645 .PP
 646 However, both of these optimizations are done on the machine independent
 647 EM code.
 648 Neither is able to take advantage of the peculiarities and idiosyncracies with
 649 which most target machines are well endowed.
 650 It is the function of the final
 651 optimizer to do any (peephole) optimizations that still remain.
 652 .PP
 653 The algorithm used here is the same as in the EM peephole optimizer.
 654 In fact, if it were not for the differences between EM syntax, which is
 655 very restricted, and target assembly language syntax,
 656 which is less so, precisely the same program could be used for both.
 657 Nevertheless, the same ideas apply concerning patterns and replacements, so
 658 our discussion of this optimizer will be restricted to one example.
 659 .PP
 660 To see what the target optimizer might do, consider the
 661 PDP-11 instruction sequence sub #2,r0;  mov (r0),x.
 662 First 2 is subtracted from register 0, then the word pointed to by it
 663 is moved to x.
 664 The PDP-11 happens to have an addressing mode to perform this sequence in
 665 one instruction: mov -(r0),x.
 666 Although it is conceivable that this instruction could be included in the
 667 back end driving table for the PDP-11, it is awkward to do so because it
 668 can occur in so many contexts.
 669 It is much easier to catch things like this in a separate program.
 670 .NH 1
 671 The Universal Assembler/Linker
 672 .PP
 673 Although assembly languages for different machines may appear very different
 674 at first glance, they have a surprisingly large intersection.
 675 We have been able to construct an assembler/linker that is almost entirely
 676 independent of the assembly language being processed.
 677 To tailor the program to a specific assembly language, it is necessary to
 678 supply a table giving the list of instructions, the bit patterns required for
 679 each one, and the language syntax.
 680 The machine independent part of the assembler/linker is then compiled with the
 681 table to produce an assembler and linker for a particular target machine.
 682 Experience has shown that writing the necessary table for a new machine can be
 683 done in less than a week.
 684 .PP
 685 To enforce a modicum of uniformity, we have chosen to use a common set of
 686 pseudoinstructions for all target machines.
 687 They are used to initialize memory, allocate uninitialized memory, determine the
 688 current segment, and similar functions found in most assemblers.
 689 .PP
 690 The assembler is also a linker.
 691 After assembling a program, it checks to see if there are any
 692 unsatisfied external references.
 693 If so, it begins reading the libraries to find the necessary routines, including
 694 them in the object file as it finds them.
 695 This approach requires libraries to be maintained in assembly language form,
 696 but eliminates the need for inventing a language to express relocatable
 697 object programs in a machine independent way.
 698 It also simplifies the assembler, since producing absolute object code is
 699 easier than producing relocatable object code.
 700 Finally, although assembly language libraries may be somewhat larger than
 701 relocatable object module libraries, the loss in speed due to having more
 702 input may be more than compensated for by not having to pass an intermediate
 703 file between the assembler and linker.
 704 .NH 1
 705 The Utility Package
 706 .PP
 707 The utility package is a collection of programs designed to aid the
 708 implementers of new front ends or new back ends.
 709 The most useful ones are the test programs.
 710 For example, one test set, EMTEST, systematically checks out a back end by
 711 executing an ever larger subset of the EM instructions.
 712 It starts out by testing LOC, LOL and a few of the other essential instructions.
 713 If these appear to work, it then tries out new instructions one at a time,
 714 adding them to the set of instructions "known" to work as they pass the tests.
 715 .PP
 716 Each instruction is tested with a variety of operands chosen from values
 717 where problems can be expected.
 718 For example, on target machines which have 16-bit index registers but only
 719 allow 8-bit displacements, a fundamentally different algorithm may be needed
 720 for accessing
 721 the first few bytes of local variables and those with offsets of thousands.
 722 The test programs have been carefully designed to thoroughly test all relevant
 723 cases.
 724 .PP
 725 In addition to EMTEST, test programs in Pascal, C, and other languages are also
 726 available.
 727 A typical test is:
 728 .sp
 729    i := 9; \fBif\fP i + 250 <> 259 \fBthen\fP error(16);
 730 .sp
 731 Like EMTEST, the other test programs systematically exercise all features of the
 732 language being tested, and do so in a way that makes it possible to pinpoint
 733 errors precisely.
 734 While it has been said that testing can only demonstrate the presence of errors
 735 and not their absence, our experience is that
 736 the test programs have been invaluable in debugging new parts of the system
 737 quickly.
 738 .PP
 739 Other utilities include programs to convert
 740 the highly compact EM code produced by front ends to ASCII and vice versa,
 741 programs to build various internal tables from human writable input formats,
 742 a variety of libraries written in or compiled to EM to make them portable,
 743 an EM assembler, and EM interpreters for various machines.
 744 .PP
 745 Interpreting the EM code instead of translating it to target machine language
 746 is useful for several reasons.
 747 First, the interpreters provide extensive run time diagnostics including
 748 an option to list the original source program (in Pascal, C, etc.) with the
 749 execution frequency or execution time for each source line printed in the
 750 left margin.
 751 Second, since an EM program is typically about one-third the size of a
 752 compiled program, large programs can be executed on small machines.
 753 Third, running the EM code directly makes it easier to pinpoint errors in
 754 the EM output of front ends still being debugged.
 755 .NH 1
 756 Summary and Conclusions
 757 .PP
 758 The Amsterdam Compiler Kit is a tool kit for building
 759 portable (cross) compilers and interpreters.
 760 The main pieces of the kit are the front ends, which convert source programs
 761 to EM code, optimizers, which improve the EM code, and back ends, which convert
 762 the EM code to target assembly language.
 763 The kit is highly modular, so writing one front end
 764 (and its associated runtime routines)
 765 is sufficient to implement
 766 a new language on a dozen or more machines, and writing one back end table
 767 and one universal assembler/linker table is all that is needed to bring up all
 768 the previously implemented languages on a new machine.
 769 In this manner, the contents, and hopefully the usefulness, of the toolkit
 770 will increase in time.
 771 .PP
 772 We believe the principal lesson to be learned from our work is that the old
 773 UNCOL idea is basically a sound way to produce compilers, provided suitable
 774 restrictions are placed on the source languages and target machines.
 775 We also believe that although compilers produced by this technology may not
 776 be equal to the very best handcrafted compilers,
 777 in terms of object code quality, they are certainly
 778 competitive with many existing compilers.
 779 However, when one factors in the cost of producing the compiler,
 780 the possible slight loss in performance may be more than compensated for by the
 781 large decrease in production cost.
 782 As a consequence of our work and similar work by other researchers [1,3,4],
 783 we expect integrated compiler building kits to become increasingly popular
 784 in the near future.
 785 .PP
 786 The toolkit is now available for various computers running the
 787 .UX
 788 operating system.
 789 For information, contact the authors.
 790 .NH 1
 791 References
 792 .LP
 793 .nr r 0 1
 794 .in +4
 795 .ti -4
 796 \fB~\n+r.\fR Graham, S.L.
 797 Table-Driven Code Generation.
 798 .I "Computer~13" ,
 799 8 (August 1980), 25-34.
 800 .PP
 801 A discussion of systematic ways to do code generation,
 802 in particular, the idea of having a table with templates that match parts of
 803 the parse tree and convert them into machine instructions.
 804 .sp 2
 805 .ti -4
 806 \fB~\n+r.\fR Haddon, B.K., and Waite, W.M.
 807 Experience with the Universal Intermediate Language Janus.
 808 .I "Software Practice & Experience~8" ,
 809 5 (Sept.-Oct. 1978), 601-616.
 810 .PP
 811 An intermediate language for use with ALGOL 68, Pascal, etc. is described.
 812 The paper discusses some problems encountered and how they were dealt with.
 813 .sp 2
 814 .ti -4
 815 \fB~\n+r.\fR Johnson, S.C.
 816 A Portable Compiler: Theory and Practice.
 817 .I "Ann. ACM Symp. Prin. Prog. Lang." ,
 818 Jan. 1978.
 819 .PP
 820 A cogent discussion of the portable C compiler.
 821 Particularly interesting are the author's thoughts on the value of
 822 computer science theory.
 823 .sp 2
 824 .ti -4
 825 \fB~\n+r.\fR Leverett, B.W., Cattell, R.G.G, Hobbs, S.O., Newcomer, J.M.,
 826 Reiner, A.H., Schatz, B.R., and Wulf, W.A.
 827 An Overview of the Production-Quality Compiler-Compiler Project.
 828 .I Computer~13 ,
 829 8 (August 1980), 38-49.
 830 .PP
 831 PQCC is a system for building compilers similar in concept but differing in
 832 details from the Amsterdam Compiler Kit.
 833 The paper describes the intermediate representation used and the code generation
 834 strategy.
 835 .sp 2
 836 .ti -4
 837 \fB~\n+r.\fR Lowry, E.S., and Medlock, C.W.
 838 Object Code Optimization.
 839 .I "Commun.~ACM~12",
 840 (Jan. 1969), 13-22.
 841 .PP
 842 A classic paper on global object code optimization.
 843 It covers data flow analysis, common subexpressions, code motion, register
 844 allocation and other techniques.
 845 .sp 2
 846 .ti -4
 847 \fB~\n+r.\fR Nori, K.V., Ammann, U., Jensen, K., Nageli, H.
 848 The Pascal P Compiler Implementation Notes.
 849 Eidgen. Tech. Hochschule, Zurich, 1975.
 850 .PP
 851 A description of the original P-code machine, used to transport the Pascal-P
 852 compiler to new computers.
 853 .sp 2
 854 .ti -4
 855 \fB~\n+r.\fR Steel, T.B., Jr. UNCOL: the Myth and the Fact. in
 856 .I "Ann. Rev. Auto. Prog."
 857 Goodman, R. (ed.), vol 2., (1960), 325-344.
 858 .PP
 859 An introduction to the UNCOL idea by its originator.
 860 .sp 2
 861 .ti -4
 862 \fB~\n+r.\fR Steel, T.B., Jr.
 863 A First Version of UNCOL.
 864 .I "Proc. Western Joint Comp. Conf." ,
 865 (1961), 371-377.
 866 .PP
 867 The first detailed proposal for an UNCOL.  By current standards it is a
 868 primitive language, but it is interesting for its historical perspective.
 869 .sp 2
 870 .ti -4
 871 \fB~\n+r.\fR Tanenbaum, A.S., van Staveren, H., and Stevenson, J.W.
 872 Using Peephole Optimization on Intermediate Code.
 873 .I "ACM Trans. Prog. Lang. and Sys. 3" ,
 874 1 (Jan. 1982) pp. 21-36.
 875 .PP
 876 A detailed description of a table-driven peephole optimizer.
 877 The driving table provides a list of patterns to match as well as the
 878 replacement text to use for each successful match.
 879 .sp 2
 880 .ti -4
 881 \fB\n+r.\fR Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren, H.
 882 Description of an Experimental Machine Architecture for use with Block
 883 Structured Languages.
 884 Informatica Rapport 81, Vrije Universiteit, Amsterdam, 1983.
 885 .PP
 886 The defining document for EM.
 887 .sp 2
 888 .ti -4
 889 \fB\n+r.\fR Tanenbaum, A.S.
 890 Implications of Structured Programming for Machine Architecture.
 891 .I "Comm. ACM~21" ,
 892 3 (March 1978), 237-246.
 893 .PP
 894 The background and motivation for the design of EM.
 895 This early version emphasized the idea of interpreting the intermediate
 896 code (then called EM-1) rather than compiling it.