doc/i80.doc

   1 .\" $Id: i80.doc,v 2.6 1994/06/24 10:01:54 ceriel Exp $
   2 .RP
   3 .ND April 1985
   4 .TL
   5 Back end table for the Intel 8080 micro-processor
   6 .AU
   7 Gerard Buskermolen
   8 .AB
   9 A back end is a part of the Amsterdam Compiler Kit (ACK).
  10 It translates EM, a family of intermediate languages, into the
  11 assembly language of some target machine, here the Intel 8080 and Intel 8085 microprocessors.
  12 .AE
  13 .NH 1
  14 INTRODUCTION
  15 .PP
  16 To simplify the task of producing portable (cross) compilers and
  17 interpreters, the Vrije Universiteit designed an integrated collection
  18 of programs, the Amsterdam Compiler Kit (ACK).
  19 It is based on the old UNCOL-idea ([4]) which attempts to solve the problem
  20 of making a compiler for each of
  21 .B N
  22 languages on
  23 .B M
  24 different machines without having to write
  25 .B N\ *\ M
  26 programs.
  27 .sp 1
  28 The UNCOL approach is to write
  29 .B N
  30 "front ends", each of which translates one source language into
  31 a common intermediate language, UNCOL (UNiversal Computer Oriented
  32 Language), and
  33 .B M
  34 "back ends", each of which translates programs in UNCOL into a
  35 specific machine language.
  36 Under these conditions, only
  37 .B N\ +\ M
  38 programs should be written to provide all
  39 .B N
  40 languages on all
  41 .B M
  42 machines, instead of
  43 .B N\ *\ M
  44 programs.
  45 .sp 1
  46 The intermediate language for the Amsterdam Compiler Kit is the machine
  47 language for a simple stack machine called EM (Encoding Machine).
  48 So a back end for the Intel 8080 micro translates EM code into
  49 8080 assembly language.
  50 .sp 1
  51 The back end is a single program that is driven by a machine dependent
  52 driving table.
  53 This driving table, or back end table,
  54 defines the mapping from EM code to the machine's assembly language.
  55 .NH 1
  56 THE 8080 MICRO PROCESSOR
  57 .PP
  58 This back end table can be used without modification for the Intel 8085
  59 processor.
  60 Except for two additional instructions, the 8085 instruction set
  61 is identical and fully compatible with the 8080 instruction set.
  62 So everywhere in this document '8080' can be read as '8080 and 8085'.
  63 .NH 2
  64 Registers
  65 .PP
  66 The 8080 processor has an 8 bit accumulator,
  67 six general purpose 8-bit registers,
  68 a 16 bit programcounter and a 16 bit stackpointer.
  69 Assembler programs can refer to the accumulator as A and
  70 to the general purpose registers as B, C, D, E, H and L. (*)
  71 .FS
  72 * In this document 8080 registers and mnemonics are written in capitals for the sake of clarity.
  73 Nevertheless, the assembler expects lower-case letters.
  74 .FE
  75 Several instructions address registers in groups of two, thus creating
  76 16 bit registers:
  77 .DS
  78 Registers referenced:   Symbolic reference:
  79       B and C                   B
  80       D and E                   D
  81       H and L                   H
  82 .DE
  83 The first named register contains the high order byte
  84 (H and L stand for High and Low).
  85 .br
  86 The instruction determines how the processor interprets the reference.
  87 For example, ADD B is an 8 bit operation, adding the contents of
  88 register B to accumulator A. By contrast PUSH B is a 16 bit operation
  89 pushing B and C onto the stack.
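.PP
The two readings of the symbol B can be made concrete with a small model.
The sketch below is an illustration only; the Python names
(regs, pair_value, split_pair) are hypothetical and do not occur
in the back end or in the assembler.
.DS L
```python
# Illustrative model only: these names are not part of the back end.

def pair_value(high, low):
    """Combine two 8-bit registers into the 16-bit value of a register pair."""
    return ((high & 0xFF) << 8) | (low & 0xFF)

def split_pair(value):
    """Split a 16-bit value into its (high, low) bytes."""
    return (value >> 8) & 0xFF, value & 0xFF

regs = {"A": 0x01, "B": 0x12, "C": 0x34}

# ADD B: an 8-bit operation, only register B takes part.
add_b = (regs["A"] + regs["B"]) & 0xFF

# PUSH B: a 16-bit operation, the pair BC is handled as one word
# with B as the high order byte.
push_b = pair_value(regs["B"], regs["C"])
```
.DE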
  90 .sp 1
  91 There are no index registers.
  92 .sp 1
  93 .NH 2
  94 Flip-flops
  95 .PP
  96 The 8080 microprocessor provides five flip-flops used as condition flags
  97 (S, Z, P, C, AC) and one interrupt enable flip-flop IE.
  98 .br
  99 The sign bit S is set (cleared) by certain instructions when the most significant
 100 bit of the result of an operation equals one (zero).
 101 The zero bit Z is set (cleared) by certain operations when the
 102 8-bit result of an operation equals (does not equal) zero.
 103 The parity bit P is set (cleared) if the 8-bit result of an
 104 operation includes an even (odd) number of ones.
 105 C is the normal carry bit.
 106 AC is an auxiliary carry that indicates whether there has been a carry
 107 out of bit 3 of the accumulator.
 108 This auxiliary carry is used only by the DAA instruction, which
 109 adjusts the 8-bit value in the accumulator to form two 4-bit
 110 binary coded decimal digits.
 111 Needless to say this instruction is not used in the back-end.
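.PP
The flag definitions above can be sketched for a single 8-bit addition
to the accumulator.
This is a minimal model, not a description of the hardware:
the function name add_flags is hypothetical, and other instructions
affect the flags in other ways.
.DS L
```python
def add_flags(a, operand):
    """Return (result, flags) for an 8-bit addition to the accumulator."""
    total = (a & 0xFF) + (operand & 0xFF)
    result = total & 0xFF
    ones = bin(result).count("1")
    flags = {
        "S": (result >> 7) & 1,            # most significant bit of the result
        "Z": 1 if result == 0 else 0,      # result equals zero
        "P": 1 if ones % 2 == 0 else 0,    # even number of ones
        "C": 1 if total > 0xFF else 0,     # carry out of bit 7
        # carry out of bit 3
        "AC": 1 if (a & 0x0F) + (operand & 0x0F) > 0x0F else 0,
    }
    return result, flags
```
.DE
For instance, adding 1 to 0FFH in this model yields zero with Z, P, C and AC set and S cleared.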
 112 .sp 1
 113 The interrupt enable flip-flop IE is set and cleared under
 114 program control using the instructions EI (Enable Interrupt) and
 115 DI (Disable Interrupt).
 116 It is automatically cleared when the CPU is reset and when
 117 an interrupt occurs, disabling further interrupts until IE = 1 again.
 118 .NH 2
 119 Addressing modes
 120 .NH 3
 121 Implied addressing
 122 .PP
 123 The addressing mode of some instructions is implied by the instruction itself.
 124 For example, the RAL (rotate accumulator left) instruction deals only with
 125 the accumulator, and PCHL loads the programcounter with the contents
 126 of register-pair HL.
 127 .NH 3
 128 Register addressing
 129 .PP
 130 With each instruction using register addressing,
 131 only one register is specified (except for the MOV instruction),
 132 although in many of them the accumulator is implied as
 133 second operand.
 134 Examples are CMP E, which compares register E with the accumulator,
 135 and DCR B, which decrements register B.
 136 A few instructions deal with 16 bit register-pairs:
 137 examples are DCX B, which decrements register-pair BC and the
 138 PUSH and POP instructions.
 139 .NH 3
 140 Register indirect addressing
 141 .PP
 142 Each instruction that may refer to an 8 bit register may
 143 refer also to a memory location. In this case the letter M
 144 (for Memory) has to be used instead of a register.
 145 It indicates the memory location pointed to by H and L,
 146 so ADD M adds the contents of the memory location specified
 147 by H and L to the contents of the accumulator.
 148 .br
 149 The register-pairs BC and DE can also be used for indirect addressing,
 150 but only to load or store the accumulator.
 151 For example, STAX B stores the contents of the accumulator
 152 into the memory location addressed by register-pair BC.
 153 .NH 3
 154 Immediate addressing
 155 .PP
 156 The immediate value can be an 8 bit value, as in ADI 10 which
 157 adds 10 to the accumulator, or a 16 bit value, as in
 158 LXI H,1000, which loads 1000 into register-pair HL.
 159 .NH 3
 160 Direct addressing
 161 .PP
 162 Jump instructions include a 16 bit address as part of the instruction.
 163 .br
 164 The instruction SHLD 1234 stores the contents of register
 165 pair HL in memory locations 1234 and 1235.
 166 The high order byte is stored at the higher address.
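.PP
The byte order used by SHLD, and by its counterpart LHLD, can be sketched as follows.
The model assumes a flat 64K memory held in a Python bytearray;
the names memory, shld and lhld are chosen for the illustration only.
.DS L
```python
memory = bytearray(65536)   # hypothetical 64K address space

def shld(address, hl):
    """Store HL: low byte at address, high byte at address + 1."""
    memory[address] = hl & 0xFF
    memory[(address + 1) & 0xFFFF] = (hl >> 8) & 0xFF

def lhld(address):
    """Load HL from address (low byte) and address + 1 (high byte)."""
    return memory[address] | (memory[(address + 1) & 0xFFFF] << 8)

shld(1234, 0xABCD)
```
.DE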
 167 .NH 1
 168 THE 8080 BACK END TABLE
 169 .PP
 170 The back end table is designed as described in [5].
 171 For the overall design of a back end table I refer the reader to that document.
 172 .br
 173 This section deals with problems encountered in writing the
 174 8080 back-end table.
 175 Some remarks are made about particular parts
 176 of the table that might not seem clear at first sight.
 177 .NH 2
 178 Constant definitions
 179 .PP
 180 Word size (EM_WSIZE) and pointer size (EM_PSIZE) are both
 181 defined as two bytes.
 182 The hole between AB and LB (EM_BSIZE) is four bytes: only the
 183 return address and the local base are saved.
 184 .NH 2
 185 Registers and their properties
 186 .PP
 187 All properties have the default size of two bytes, because one-byte
 188 registers also cover two bytes when put on the real stack.
 189 .sp 1
 190 The following considerations led to the choice of register-pair BC
 191 as local base.
 192 Though saving the local base in memory would leave one more register-pair
 193 available as scratch register, it would slow down instructions
 194 such as 'lol' and 'stl' too much.
 195 So a register-pair has to be sacrificed as local base.
 196 Because a back-end without a free register-pair HL is severely
 197 crippled, the only reasonable choices are BC and DE.
 198 Though the choice between them might seem arbitrary at first sight,
 199 there is a difference between register-pairs BC and DE:
 200 the instruction XCHG exchanges the contents of register-pairs DE and
 201 HL.
 202 When DE and HL are both heavily used on the fake-stack, this instruction
 203 is very useful.
 204 Since exchanging HL with the local base would rarely be useful,
 205 and since an instruction exchanging BC and HL does not exist, BC is
 206 chosen as local base.
 207 .sp 1
 208 Many of the register properties are never mentioned in the
 209 PATTERNS part of the table.
 210 They are only needed to define the INSTRUCTIONS correctly.
 211 .sp 1
 212 The properties really used in the PATTERNS part are:
 213 .IP areg: 24
 214 the accumulator only
 215 .IP reg:
 216 any of the registers A, D, E, H or L. Of course the registers B and C which are
 217 used as local base don't possess this property.
 218 When there is a single register on the fake-stack, its value
 219 is always considered non-negative.
 220 .IP dereg:
 221 register-pair DE only
 222 .IP hlreg:
 223 register-pair HL only
 224 .IP hl_or_de:
 225 register-pairs HL and DE both have this property
 226 .IP local base:
 227 used only once (i.e. in the EM-instruction 'str 0')
 228 .PP
 229 .sp 1
 230 The stackpointer SP and the processor status word PSW have to be
 231 defined explicitly because they are needed in some instructions
 232 (i.e. SP in LXI, DCX and INX and PSW in PUSH and POP).
 233 .br
 234 It doesn't matter that the processor status word is not just register A
 235 but includes the condition flags.
 236 .NH 2
 237 Tokens
 238 .PP
 239 The tokens 'm' and 'const1' are used in the INSTRUCTIONS- and MOVES parts only.
 240 They will never be on the fake-stack.
 241 .sp 1
 242 The token 'label' reflects addresses known at assembly time.
 243 It is used to take full advantage of the instructions LHLD
 244 (Load HL Direct) and SHLD (Store HL Direct).
 245 .sp 1
 246 Compared with many other back-end tables, there is only a small number of
 247 different tokens (four).
 248 The reasons are the limited addressing modes of the 8080 microprocessor,
 249 the lack of index registers, etc.
 250 For example to translate the EM-instruction
 251 .DS
 252 lol 10
 253 .DE
 254 the following 8080 instructions are generated:
 255 .DS L
 256 LXI H,10        /* load register pair HL with value 10  */
 257 DAD B           /* add local base (BC) to HL            */
 258 MOV E,M         /* load E with byte pointed to by HL    */
 259 INX H           /* increment HL                         */
 260 MOV D,M         /* load D with next byte                */
 261 .DE
 262 Of course, instead of emitting code immediately, it could be postponed
 263 by placing something like a {LOCAL,10} on the fake-stack, but some day the above
 264 mentioned code will have to be generated, so a LOCAL-token is
 265 hardly useful.
 266 See also the comment on the load instructions.
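.PP
The five instructions generated for 'lol 10' can be traced with a small model.
This is a sketch under stated assumptions: memory is a flat 64K bytearray,
and the local base value 0x7000 and the stored bytes are made up for the illustration.
The function name lol is hypothetical.
.DS L
```python
memory = bytearray(65536)   # hypothetical 64K address space

def lol(offset, bc):
    """Trace the generated code; returns the value left in DE."""
    hl = offset & 0xFFFF            # LXI H,offset
    hl = (hl + bc) & 0xFFFF         # DAD B
    e = memory[hl]                  # MOV E,M
    hl = (hl + 1) & 0xFFFF          # INX H
    d = memory[hl]                  # MOV D,M
    return (d << 8) | e

local_base = 0x7000                 # made-up contents of BC
memory[local_base + 10] = 0x34      # low byte of the local
memory[local_base + 11] = 0x12      # high byte of the local
value_in_de = lol(10, local_base)
```
.DE
Because the addition wraps at 16 bits, the same model also covers negative offsets.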
 267 .NH 2
 268 Sets
 269 .PP
 270 Only 'src1or2' is used in the PATTERNS.
 271 .NH 2
 272 Instructions
 273 .PP
 274 Each instruction indicates whether or not the condition flags
 275 are affected, but this information will never have any influence
 276 because there are no tests in the PATTERNS part of the table.
 277 .sp 1
 278 For each instruction a cost vector indicates the number of bytes
 279 the instruction occupies and the number of time periods it takes
 280 to execute the instruction.
 281 The length of a time period depends on the clock frequency
 282 and may range from 480 nanoseconds to 2 microseconds on an
 283 8080 system and from 320 nanoseconds to 2 microseconds
 284 on an 8085 system.
 285 .sp 1
 286 In the TOKENS-part the cost of token 'm' is defined as (0,3).
 287 In fact it usually takes 3 extra time periods when this register indirect mode
 288 is used instead of register mode, but since the costs are not completely
 289 orthogonal this results in small deficiencies for the DCR, INR and MOV
 290 instructions.
 291 Although it is not particularly useful, these deficiencies are
 292 corrected in the INSTRUCTIONS part, by treating the register indirect
 293 mode separately.
 294 .sp 1
 295 The costs of the conditional call and return instructions really
 296 depend on whether or not the call or return is actually made.
 297 However, this is not important to the behaviour of the back end.
 298 .sp 1
 299 Instructions not used in this table have been commented out.
 300 Of course many of them are used in the library routines.
 301 .NH 2
 302 Moves
 303 .PP
 304 This section is supposed to be straight-forward.
 305 .NH 2
 306 Tests
 307 .PP
 308 The TESTS section is included only to keep
 309 .B cgg
 310 from complaining.
 311 .NH 2
 312 Stacking rules
 313 .PP
 314 When, for example, the token {const2,10} has to be stacked while
 315 no free register-pair is available, the following code is generated:
 316 .DS
 317 PUSH H
 318 LXI H,10
 319 XTHL
 320 .DE
 321 The last instruction exchanges the contents of HL with the value
 322 on top of the stack, giving HL its original value again.
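.PP
The effect of this three-instruction sequence can be checked with a small model.
The sketch assumes the real stack can be represented by a Python list
whose last element is the top of the stack;
the function name stack_const2 is hypothetical.
.DS L
```python
def stack_const2(hl, stack, value):
    """Stack a 16-bit constant while HL is in use; returns the final HL."""
    stack.append(hl)                  # PUSH H      save HL on the real stack
    hl = value & 0xFFFF               # LXI H,value
    hl, stack[-1] = stack[-1], hl     # XTHL        swap HL with the stack top
    return hl

stack = []
hl_after = stack_const2(0xBEEF, stack, 10)
```
.DE
Afterwards the constant is on top of the stack and HL holds its original value.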
 323 .NH 2
 324 Coercions
 325 .PP
 326 The coercion to unstack register A is somewhat tricky,
 327 but unfortunately just popping PSW leaves the high-order byte in
 328 the accumulator.
 329 .sp 1
 330 The cheapest way to coerce HL to DE (or DE to HL) is by using
 331 the XCHG instruction, but it is not possible to explain to
 332 .B cgg
 333 that this instruction in fact exchanges the contents of these
 334 register-pairs.
 335 Before the coercion is carried out, other occurrences of DE and HL
 336 on the fake-stack will be moved to the real stack, because
 337 the INSTRUCTIONS part states that XCHG destroys the contents
 338 of both DE and HL.
 339 The coercion that transfers one register-pair to another by
 340 emitting two MOV-instructions will be used only if
 341 one of the register-pairs is the local base.
 342 .NH 2
 343 Patterns
 344 .PP
 345 As a general habit I have allocated (uses ...) all registers
 346 that should be free to generate the code, although it is not
 347 always necessary.
 348 For example in the code rule
 349 .DS
 350 pat loe
 351 uses hlreg
 352 gen lhld {label,$1}                   yields hl
 353 .DE
 354 the 'uses'-clause could have been omitted because
 355 .B cgg
 356 knows that LHLD destroys register-pair HL.
 357 .sp 1
 358 Since there is only one register with property 'hlreg',
 359 there is no difference between 'uses hlreg' (allocate a
 360 register with property 'hlreg') and 'kills hlreg' (remove
 361 all registers with property 'hlreg' from the fake-stack).
 362 The same applies to the property 'dereg'.
 363 .br
 364 Consequently 'kills' is rarely used in this back-end table.
 365 .NH 3
 366 Group 1: Load instructions
 367 .PP
 368 When a local variable must be squared, there will probably be EM-code like:
 369 .DS
 370 lol 10
 371 lol 10
 372 mli 2
 373 .DE
 374 When the code for the first 'lol 10' has been executed, DE contains the
 375 wanted value.
 376 To keep
 377 .B cgg
 378 from emitting the code for 'lol 10' again, an extra
 379 pattern is included in the table for cases like this.
 380 The same applies to two consecutive 'loe'-s or 'lil'-s.
 381 .sp 1
 382 The code rule for 'lof' is a bit tricky.
 383 It expects either DE or HL on the fake-stack, moves {const2,$1}
 384 into the other one, and finally adds them.
 385 The 'kills' part is necessary here because if DE was on the fake-stack,
 386 .B cgg
 387 doesn't see that the contents of DE are destroyed by the code
 388 (in fact 'kills dereg' would have been sufficient: because of the
 389 DAD instruction
 390 .B cgg
 391 knows that HL is destroyed).
 392 .sp 1
 393 By lookahead,
 394 .B cgg
 395 can make a clever choice between the first and
 396 second code rule of 'loi 4'.
 397 The same applies to several other instructions.
 398 .NH 3
 399 Group 2: Store instructions
 400 .PP
 401 An idea similar to that of the two consecutive identical load instructions
 402 in Group 1 applies to a store instruction followed by a corresponding load instruction.
 403 .NH 3
 404 Groups 3 and 4: Signed and unsigned integer arithmetic
 405 .PP
 406 Since the 8080 instruction set doesn't provide multiply and
 407 divide instructions, special routines are provided to accomplish these tasks.
 408 .sp 1
 409 Instead of providing four slightly differing routines for 16 bit signed or
 410 unsigned division, yielding the quotient or the remainder,
 411 the routines are merged.
 412 This saves space and assembly time
 413 when several variants are used in a particular program,
 414 at the cost of a little speed.
 415 When the routine is called, bit 7 of register A indicates whether
 416 the operands should be considered as signed or as unsigned integers,
 417 and bit 0 of register A indicates whether the quotient or the
 418 remainder has to be delivered.
 419 .br
 420 The same applies to 32 bit division.
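.PP
The idea of one merged routine steered by two bits of register A can be sketched as follows.
The text does not state which bit value selects which variant, so the
polarity used here (bit 7 set means signed, bit 0 set means remainder) is an
assumption, as is the choice to truncate signed quotients toward zero.
The name divide16 is hypothetical and division by zero is not handled.
.DS L
```python
def divide16(a_reg, dividend, divisor):
    """Model of a merged 16-bit divide routine selected by bits of A."""
    signed = bool(a_reg & 0x80)           # assumption: bit 7 set = signed
    want_remainder = bool(a_reg & 0x01)   # assumption: bit 0 set = remainder
    x, y = dividend & 0xFFFF, divisor & 0xFFFF
    if signed:
        # reinterpret the operands as two's complement numbers
        x = x - 0x10000 if x & 0x8000 else x
        y = y - 0x10000 if y & 0x8000 else y
    quotient = abs(x) // abs(y)           # assumption: truncate toward zero
    if (x < 0) != (y < 0):
        quotient = -quotient
    remainder = x - quotient * y
    return (remainder if want_remainder else quotient) & 0xFFFF
```
.DE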
 421 .sp 1
 422 The routine doing the 16 bit unsigned multiplication could
 423 have been used for 16 bit signed multiplication too.
 424 Nevertheless a special 16 bit signed multiplication routine is
 425 provided, because this one will usually be much faster.
 426 .NH 3
 427 Group 5: Floating point arithmetic
 428 .PP
 429 Floating point is not implemented.
 430 Whenever an EM-instruction involving floating points is offered
 431 to the code-generator, it calls the corresponding
 432 library routine with the proper parameters.
 433 Each floating point library routine calls 'eunimpl',
 434 trapping with trap number 63.
 435 Some of the Pascal and C library routines output floating point
 436 EM-instructions, so code has to be generated for them.
 437 Of course this does not imply the code will ever be executed.
 438 .NH 3
 439 Group 12: Compare instructions
 440 .PP
 441 The code for 'cmu 2', with its 4 labels, is terrible.
 442 But it is the best I could find.
 443 .NH 3
 444 Group 9: Logical instructions
 445 .PP
 446 I have tried to merge both variants of the instructions 'and 2', 'ior 2' and 'xor 2',
 447 as in
 448 .DS
 449 pat and $1==2
 450 with hl_or_de hl_or_de
 451 uses reusing %1, reusing %2, hl_or_de, areg
 452 gen mov a,%1.2
 453     ana %2.2
 454     mov %a.2,a
 455     mov a,%1.1
 456     ana %2.1
 457     mov %a.1,a                     yields %a
 458 .DE
 459 but the current version of
 460 .B cgg
 461 doesn't accept this.
 462 In any case
 463 .B cgg
 464 chooses either DE or HL to store the result, using lookahead.
 465 .NH 3
 466 Group 14: Procedure call instructions
 467 .PP
 468 There is an 8-byte function return area, called '.fra'.
 469 If only 2 bytes have to be returned, register-pair DE is used.
 470 .NH 1
 471 LIBRARY ROUTINES
 472 .PP
 473 Most of the library routines start with saving the return address
 474 and the local base, so that the parameters are on the top of the stack
 475 and the registers B and C are available as scratch registers.
 476 Since register-pair HL is needed to accomplish these tasks,
 477 and also to restore everything just before the routine returns,
 478 it is not possible to transfer data between the routines and the
 479 surrounding world through register H or L.
 480 Only registers A, D and E can be used for this.
 481 .sp
 482 When a routine returns 2 bytes, they are usually returned in
 483 register-pair DE.
 484 When it returns more than 2 bytes they are pushed onto the stack.
 485 .br
 486 It would have been possible to let the 32 bit arithmetic routines
 487 return 2 bytes in DE and the remaining 2 bytes on the stack
 488 (this often would have saved some space and execution time),
 489 but I don't consider that well-structured programming.
 490 .NH 1
 491 TRAPS
 492 .PP
 493 Whenever a trap, for example trying to divide by zero,
 494 occurs in a program that originally was written in C or Pascal,
 495 a special trap handler is called.
 496 This trap handler wants to write an appropriate error message on the
 497 monitor.
 498 It tries to read the message from a file (e.g. etc/pc_rt_errors in the
 499 EM home directory for Pascal programs), but since the 8080 back-end
 500 doesn't know about files, we are in trouble.
 501 This problem is solved, as far as possible, by including the 'open'-monitor call in the mon-routine.
 502 It returns with file descriptor -1.
 503 The trap handler reacts by generating another trap, with the original
 504 trap number.
 505 But this time, instead of calling the C- or Pascal trap handler again,
 506 the following message is printed on the monitor:
 507 .DS L
 508         trap number <TN>
 509         line <LN> of file <FN>
 510
 511 where   <TN> is the trap number (decimal)
 512         <LN> is the line number (decimal)
 513         <FN> is the filename of the original program
 514 .DE
 515 .sp 1
 516 Trap numbers are subdivided as follows:
 517 .IP 1-27: 20
 518 EM-machine error, as described in [3]
 519 .IP 63:
 520 an unimplemented EM-instruction is used
 521 .IP 64-127:
 522 generated by compilers, runtime systems, etc.
 523 .IP 128-252:
 524 generated by user programs
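.PP
The subdivision above can be written down as a small lookup.
This is a sketch only: the function name trap_class and the returned strings are
chosen for the illustration, and the label "unassigned" for numbers outside the
listed ranges is an assumption, since the list says nothing about them.
.DS L
```python
def trap_class(number):
    """Classify a trap number according to the ranges listed above."""
    if 1 <= number <= 27:
        return "EM-machine error"
    if number == 63:
        return "unimplemented EM-instruction"
    if 64 <= number <= 127:
        return "compiler or runtime system"
    if 128 <= number <= 252:
        return "user program"
    return "unassigned"   # assumption: not covered by the list
```
.DE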
 525 .NH 1
 526 IMPLEMENTATION
 527 .PP
 528 It will not be possible to run the entire Amsterdam Compiler Kit on an
 529 8080-based computer system.
 530 One has to write a program on another
 531 system, one on which the compiler kit runs.
 532 This program may be a mixture of high-level languages, such as
 533 C or Pascal, EM and 8080 assembly code.
 534 The program should be compiled using the compiler kit, producing 8080 machine code.
 535 This code should be made available to the 8080 machine,
 536 for example by downloading or
 537 by storing it in ROM (Read Only Memory).
 538 .sp 1
 539 Depending on the characteristics of the particular 8080 based system, some
 540 adaptations have to be made:
 541 .IP 1) 10
 542 In 'head_em': the base address, which is the address where the first
 543 8080 instruction will be stored, and the initial value of the
 544 stackpointer are set to 0x1000 and 0x8000 respectively.
 545 .br
 546 Other systems require other values.
 547 .IP 2)
 548 In 'head_em': before calling "__m_a_i_n", the environment
 549 pointer, argument vector and argument count will have to be pushed
 550 onto the stack.
 551 Since this back-end is tested on a system without any knowledge
 552 of these things, dummies are pushed now.
 553 .IP 3)
 554 In 'tail_em': proper routines "putchar" and "getchar" should
 555 be provided.
 556 They should write a character to the monitor and read a character from it, respectively.
 557 Maybe some conversions will have to be made.
 558 .IP 4)
 559 In 'head_em': an application program returns control to the monitor by
 560 jumping to address 0xFB52.
 561 This may have to be changed for different systems.
 562 .IP 5)
 563 In 'tail_em': the current version of the 8080 back-end has very limited I/O
 564 capabilities, because it was tested on a system that
 565 had no knowledge of files.
 566 So the implementation of the EM-instruction 'mon' is very simple;
 567 it can only do the following things:
 568 .RS
 569 .IP Monitor\ call\ 1: 40
 570 exit
 571 .IP Monitor\ call\ 3:
 572 read, always reads from the monitor.
 573 .br
 574 echoes the character read.
 575 .br
 576 ignores file descriptor.
 577 .IP Monitor\ call\ 4:
 578 write, always writes on the monitor.
 579 .br
 580 ignores file descriptor.
 581 .IP Monitor\ call\ 5:
 582 open file, returns file descriptor -1.
 583 .br
 584 (compare chapter about TRAPS)
 585 .IP Monitor\ call\ 6:
 586 close file, returns error code = 0.
 587 .IP Monitor\ call\ 54:
 588 io-control, returns error code = 0.
 589 .RE
 590 .sp
 591 If the system has to do file handling, the routine ".mon"
 592 should be extended considerably.
 593 .NH 1
 594 INTEL 8080 VERSUS ZILOG Z80 AND INTEL 8086
 595 .NH 2
 596 Introduction
 597 .PP
 598 At about the same time I developed the back end
 599 for the Intel 8080 and Intel 8085,
 600 Frans van Haarlem did the same job for the Zilog z80 microprocessor.
 601 Since the z80 processor is an extension of the 8080,
 602 any machine code offered to an 8080 processor can be offered
 603 to a z80 too.
 604 The assembly languages are quite different however.
 605 .br
 606 During the development of the back ends we used
 607 two micro-computers, both equipped with a z80 microprocessor.
 608 Of course the output of the 8080 back end is assembled by an
 609 8080 assembler. This should ensure that I have never used any of
 610 the features that are potentially available in the z80 processor,
 611 but are not part of a true 8080 processor.
 612 .sp 1
 613 As a final job, I have
 614 investigated the differences between the 8080 and z80 processors
 615 and their influence on the back ends.
 616 I have tried to measure this influence by examining the length of
 617 the generated code.
 618 I have also included the 8086 micro-processor in these measurements.
 619 .NH 2
 620 Differences between the 8080 and z80 processors
 621 .PP
 622 Except for some features that are less important concerning back ends,
 623 there are two points where the z80 improves upon the 8080:
 624 .IP First, 18
 625 the z80 has two additional index registers, IX and IY.
 626 They are used as in
 627 .DS
 628          LD B,(IX+10)
 629 .DE
 630 The offset, here 10, should fit in one byte.
 631 .IP Second,
 632 the z80 has several additional instructions.
 633 The most important ones are:
 634 .RS
 635 .IP 1) 8
 636 The 8080 can only load or store register-pair HL direct
 637 (using LHLD or SHLD).
 638 The z80 can handle BC, DE and SP too.
 639 .IP 2)
 640 Instructions are included to ease block movements.
 641 .IP 3)
 642 There is a 16 bit subtract instruction.
 643 .IP 4)
 644 While the 8080 can only rotate the accumulator, the z80
 645 can rotate and shift each 8 bit register.
 646 .IP 5)
 647 Special instructions are included to jump to nearby locations, saving 1 byte.
 648 .RE
 649 .NH 2
 650 Consequences for the 8080 and z80 back end
 651 .PP
 652 The most striking difference between the 8080 and z80 back ends
 653 is the choice of the local base.
 654 The writer of the z80 back end chose index register IY as local base,
 655 because this results in the cheapest coding of EM-instructions
 656 like 'lol' and 'stl'.
 657 The z80 instructions that load local 10, for example
 658 .DS
 659 LD E,(IY+10)
 660 LD D,(IY+11)
 661 .DE
 662 occupy 6 bytes and take 38 time periods to execute.
 663 The five corresponding 8080 instructions loading a local
 664 occupy 7 bytes and take 41 time periods.
 665 Although the gain of the z80 might not be earth-shattering,
 666 it should be noted that as a side effect it may save some
 667 pushing and popping since register pair HL is not used.
 668 .sp 1
 669 The choice of IY as local base has its drawbacks too.
 670 The root of the problem is that it is not possible to add
 671 IY to HL.
 672 For the EM-instruction
 673 .DS
 674 lal 20
 675 .DE
 676 the z80 back end generates code like
 677 .DS
 678 LD BC,20
 679 PUSH IY
 680 POP HL
 681 ADD HL,BC
 682 .DE
 683 leaving the wanted address in HL.
 684 .br
 685 These annoying push and pop instructions are also needed in some
 686 other instructions, for instance in 'lol' when the offset
 687 doesn't fit in one byte.
 688 .sp 1
 689 Besides the choice of the local base, I think there is no
 690 fundamental difference between the 8080 and z80 back ends,
 691 except of course that the z80 back end has register pair BC
 692 and, less important, index register IX available as scratch registers.
 693 .sp 1
 694 Most of the PATTERNS in the 8080 and z80 tables are more or less
 695 a direct translation of each other.
 696 .NH 2
 697 What did I do?
 698 .PP
 699 To get an idea of the quality of the code generated by
 700 the 8080, z80 and 8086 back ends I have gathered
 701 some C programs and some Pascal programs.
 702 Then I produced 8080, z80 and 8086 code for them.
 703 Investigating the assembler listing I found the
 704 lengths of the different parts of the generated code.
 705 I have checked two areas:
 706 .IP 1) 8
 707 the entire text part
 708 .IP 2)
 709 the text part without any library routine, so only the plain user program
 710 .LP
 711 I have to admit that neither of them is really fair.
 712 When the entire text part is checked, the result is distorted
 713 because the same library routines are not always loaded.
 714 And when only the user program itself is considered, the result is
 715 distorted too.
 716 For example the 8086 has a multiply instruction,
 717 so the EM-instruction 'mli 2' is translated in the main program,
 718 but the 8080 and z80 call a library routine that is not counted.
 719 Also the 8080 uses library routines at some places where the
 720 z80 does not.
 721 .sp 1
 722 But nevertheless I think the measurements will give an idea
 723 about the code produced by the three back ends.
 724 .NH 2
 725 The results
 726 .PP
 727 The table below should be read as follows.
 728 For all programs I have computed the ratio of the code-lengths
 729 of the 8080, z80 and 8086.
 730 The averages of all Pascal/C programs are listed in the table,
 731 standardized to '100' for the 8080.
 732 So the listed '107' indicates that the lengths
 733 of the text parts of the z80 programs that originally were Pascal programs,
 734 averaged 7 percent larger than in the corresponding 8080 programs.
 735 .DS C
 736  --------------------------------------------------
 737 |                       |  8080  |   z80  |  8086  |
 738  --------------------------------------------------
 739 | C, text part          |   100  |   103  |    65  |
 740 | Pascal, text part     |   100  |   107  |    55  |
 741 | C, user program       |   100  |   110  |    71  |
 742 | Pascal, user program  |   100  |   118  |    67  |
 743  --------------------------------------------------
 744 .DE
 745 .TE
 746 The most striking thing in this table is that the z80 back end appears
 747 to produce larger code than the 8080 back end.
 748 The reason is that the current z80 back end table is
 749 not very sophisticated yet.
 750 For instance it doesn't look for any EM-pattern longer than one.
 751 So the table shows that the provisions in the 8080 back end table
 752 to produce faster code (like recognizing special EM-patterns
 753 and permitting one byte registers on the fake-stack)
 754 were not just for fun, but really improved the generated code
 755 significantly.
 756 .sp 1
 757 The table shows that the 8080 table is relatively better
 758 when only the plain user program is considered instead of the entire text part.
 759 This is not very surprising since the 8080 back end sometimes
 760 uses library routines where the z80 and especially the 8086 don't.
 761 .sp 1
 762 The difference between the 8080 and z80 on the one hand and the 8086
 763 on the other is very big.
 764 But of course it was not a fair contest:
 765 the 8086 is a 16 bit processor that is much more advanced than the
 766 8080 or z80 and the 8086 back end is known to produce
 767 very good code.
 768 .bp
 769 .B REFERENCES
 770 .sp 2
 771 .IP [1] 10
 772 8080/8085 Assembly Language Programming Manual,
 773 .br
 774 Intel Corporation (1977,1978)
 775 .IP [2]
 776 Andrew S. Tanenbaum, Hans van Staveren, E.G. Keizer and Johan W. Stevenson,
 777 .br
 778 A practical tool kit for making portable compilers,
 779 .br
 780 Informatica report 74, Vrije Universiteit, Amsterdam, 1983.
 781 .sp
 782 An overview on the Amsterdam Compiler Kit.
 783 .IP [3]
 784 Tanenbaum, A.S., Stevenson, J.W., Keizer, E.G., and van Staveren, H.
 785 .br
 786 Description of an experimental machine architecture for use with block
 787 structured languages,
 788 .br
 789 Informatica report 81, Vrije Universiteit, Amsterdam, 1983.
 790 .sp
 791 The defining document for EM.
 792 .IP [4]
 793 Steel, T.B., Jr.
 794 .br
 795 UNCOL: The myth and the Fact. in Ann. Rev. Auto. Prog.
 796 .br
 797 Goodman, R. (ed.), vol. 2, (1960), p325-344.
 798 .sp
 799 An introduction to the UNCOL idea by its originator.
 800 .IP [5]
 801 van Staveren, Hans
 802 .br
 803 The table driven code generator from the Amsterdam Compiler Kit
 804 (Second Revised Edition),
 805 .br
 806 Vrije Universiteit, Amsterdam.
 807 .sp
 808 The defining document for writing a back end table.
 809 .IP [6]
 810 Voors, Jan
 811 .br
 812 A back end for the Zilog z8000 micro,
 813 .br
 814 Vrije Universiteit, Amsterdam.
 815 .sp
 816 A document like this one, but for the z8000.