doc/6500.doc

   1 . \" $Id: 6500.doc,v 2.3 1994/06/24 10:01:32 ceriel Exp $"
   2 .RP
   3 .ND Dec 1984
   4 .TL
   5 .B
   6 A backend table for the 6500 microprocessor
   7 .R
   8 .AU
   9 Jan van Dalen
  10 .AB
  11 The backend table is part of the Amsterdam Compiler Kit (ACK).
  12 It translates the intermediate language family EM to a machine
  13 code for the MCS6500 microprocessor family.
  14 .AE
  15 .bp
  16 .DS C
  17 .B
  18 THE MCS6500 MICROPROCESSOR.
  19 .R
  20 .DE
  21 .NH 0
  22 Introduction
  23 .PP
  24 Why a back end table for the MCS6500 microprocessor family.
  25 Although the MCS6500 microprocessor family has an simple
  26 instruction set and internal structure, it is used in a
  27 variety of microcomputers and homecomputers.
  28 This is because of is low cost.
  29 As an example the Apple II, a well known and width spread
  30 microprocessor, uses the MCS6502 CPU.
  31 Also the BBC homecomputer, whose popularity is growing day
  32 by day uses the MCS6502 CPU.
  33 The BBC homecomputer is based on the MCS6502 CPU although
  34 better and stronger microprocessors are available.
  35 The designers of Acorn computer Industries have probably
  36 choosen for the MCS6502 because of the amount of software
  37 available for this CPU.
  38 Since its width spreaded use, a variaty of software
  39 will be needed for it.
  40 One can think of games!!, administration programs,
  41 teaching programs, basic interpreters and other application
  42 programs.
  43 Even do it will not be possible to run the total compiler kit
  44 on a MCS6500 based computer, it is possible to write application
  45 programs in a high level language, such as Pascal or C on a
  46 minicomputer.
  47 These application programs can be tested and compiled on that
  48 minicomputer and put in a ROM (Read Only Memory), for example,
  49 cso that it an be executed by a MCS6500 CPU.
  50 The strategy of writing testprograms on a minicomputer,
  51 compile it and then execute it on a MCS6500 based
  52 microprocessor is used by the development of the back end.
  53 The minicomputer used is M68000 based one, manufactured by
  54 Bleasdale Computer Systems Ltd..
  55 The micro- or homecomputer used is a BBC microcomputer,
  56 manufactured by Acorn Computer Ltd..
  57 .NH
  58 The MOS Technology MCS6500
  59 .PP
  60 The MCS6500 is as a family of CPU devices developed by MOS
  61 Technology [1].
  62 The members of the MCS6500 family are the same chips in a
  63 different housing.
  64 The MCS6502, the big brother in the family, can handle 64k
  65 bytes of memory, while for example the MCS6504 can only handle
  66 8k bytes of memory.
  67 This difference is due to the fact that the MCS6502 is in a
  68 40 pins house and the MCS6504 has a 28 pins house, so less
  69 address lines are available.
  70 .bp
  71 .NH
  72 The MCS6500 CPU programmable registers
  73 .PP
  74 The MCS6500 series is based on the same chip so all have the
  75 same programmable registers.
  76 .sp 9
  77 .NH 2
  78 The accumulator A.
  79 .PP
  80 The accumulator A is the only register on which the arithmetic
  81 and logical instructions can be used.
  82 For example, the instruction ADC (add with carry) adds the
  83 contents of the accumulator A and a byte from memory or data.
  84 .NH 2
  85 The index register X.
  86 .PP
  87 As the name suggests this register can be used for some
  88 indirect addressing modes.
  89 The modes are explaned below.
  90 .NH 2
  91 The index register Y.
  92 .PP
  93 This register is, just as the index register X, used for
  94 certain indirect addressing modes.
  95 These addressing modes are different from the modes which
  96 use index register X.
  97 .NH 2
  98 The program counter PC
  99 .PP
 100 This is the only 16-bit register available.
 101 It is used to point to the next instruction to be
 102 carried out.
 103 .NH 2
 104 The stack pointer SP
 105 .PP
 106 The stack pointer is an 8-bit register, so the stack can contain
 107 at most 256 bytes.
 108 The CPU always appends 00000001 as highbyte of any stack address,
 109 which means that memory locations
 110 .B
 111 0100
 112 .R
 113 through
 114 .B
 115 01FF
 116 .R
 117 are permanently assigned to the stack.
 118 .sp 12
 119 .NH 2
 120 The status register
 121 .PP
 122 The status register maintains six status flags and a master
 123 interrupt control bit.
 124 .br
 125 These are the six status flags:
 126     Carry        (c)
 127     Zero         (z)
 128     Overflow     (o)
 129     Sign         (n)
 130     Decimal mode (d)
 131     Break        (b)
 132
 133
 134
 135
 136
 137 The bit (i) is the master interrupt control bit.
 138 .NH
 139 The MCS6500 memory layout.
 140 .PP
 141 In the MCS6500 memory space three area's have special meaning.
 142 These area's are:
 143 .IP 1)
 144 Top page.
 145 .IP 2)
 146 Zero page.
 147 .IP 3)
 148 The stack.
 149 .PP
 150 MCS6500 memory is divided up into pages.
 151 These pages consist 256 bytes.
 152 So in a memory address the highbyte denotes the page number
 153 and the lowbyte the offset within the page.
 154 .NH 2
 155 Top page.
 156 .PP
 157 When a MCS6500 is restared it jumps indirect via memory address
 158 .B
 159 FFFC.
 160 .R
 161 At
 162 .B
 163 FFFC
 164 .R
 165 (lowbyte) and
 166 .B
 167 FFFD
 168 .R
 169 (highbyte) there must be the address of the bootstrap subroutine.
 170 When a break instruction (BRK) occurs or an interrupt takes place,
 171 the MCS6500 jumps indirect through memory address
 172 .B
 173 FFFE.
 174 .R
 175 .B
 176 FFFE
 177 .R
 178 and
 179 .B
 180 FFFF
 181 .R
 182 thus, must contain the address of the interrupt routine.
 183 The former only goes for maskeble interrupt.
 184 There also exist a nonmaskeble interrupt.
 185 This cause the MCS6500 to jump indirect through memory address
 186 .B
 187 FFFA.
 188 .R
 189 So the top six bytes of memory are used by the operating system
 190 and therefore not available for the back end.
 191 .NH 2
 192 Zero page.
 193 .PP
 194 This page has a special meaning in the sence that addressing
 195 this page uses special opcodes.
 196 Since a page consists of 256 bytes, only one byte is needed
 197 for addressing zero page.
 198 So an instruction which uses zero page occupies two bytes.
 199 It also uses less clock cycle's while carrying out the instruction.
 200 Zero page is also needed when indirect addressing is used.
 201 This means that when indirect addressing is used, the address must
 202 reside in zero page (two consecutive bytes).
 203 In this case (the back end), zero page is used, for example
 204 to hold the local base, the second local base, the stack pointer
 205 etc.
 206 .NH 2
 207 The stack.
 208 .PP
 209 The stack is described in paragraph 3.5 about the MCS6500
 210 programmable registers.
 211 .NH
 212 The memory adressing modes
 213 .PP
 214 MCS6500 memory reference instructions use direct addressing,
 215 indexed addressing, and indirect addressing.
 216 .NH 2
 217 direct addressing.
 218 .PP
 219 Three-byte instructions use the second and third bytes of the
 220 object code to provide a direct 16-bit address:
 221 therefore, 65.536 bytes of memory can be addressed directly.
 222 The commonly used memory reference instructions also have a two-byte
 223 object code variation, where the second byte directly addresses
 224 one of the first 256 bytes.
 225 .NH 2
 226 Base page, indexed addressing.
 227 .PP
 228 In this case, the instruction has two bytes of object code.
 229 The contents of either the X or Y index registers are added to the
 230 second  object code byte in order to compute a memory address.
 231 This may be illustrated as follows:
 232 .sp 15
 233 Base page, indexed addressing, as illustrated above, is
 234 wraparound - which means that there is no carry.
 235 If the sum of the index register and second object code byte contents
 236 is more than
 237 .B
 238 FF
 239 .R
 240 , the carry bit will be dicarded.
 241 This may be illustrated as follows:
 242 .sp 9
 243 .NH 2
 244 Absolute indexed addressing.
 245 .PP
 246 In this case, the contents of either the X or Y register are added
 247 to a 16-bit direct address provided by the second and third bytes
 248 of an instruction's object code.
 249 This may be illustrated as follows:
 250 .sp 10
 251 .NH 2
 252 Indirect addressing.
 253 .PP
 254 Instructions that use simple indirect addressing have three bytes of
 255 object code.
 256 The second and third object code bytes provide a 16-bit address;
 257 therefore, the indirect address can be located anywhere in
 258 memory.
 259 This is straightforward indirect addressing.
 260 .NH 3
 261 Pre-indexed indirect addressing.
 262 .PP
 263 In this case, the object code consists of two bytes and the
 264 second object code byte provides an 8-bit address.
 265 Instructions that use pre-indexed indirect addressing add the contents
 266 of the X index register and the second object code byte to access
 267 a memory location in the first 256 bytes of memory, where the
 268 indirect address will be found:
 269 .sp 18
 270 When using pre-indexed indirect addressing, once again wraparound
 271 addition is used, which means that when the X index register contents
 272 are added to the second object code byte, any carry will be discarded.
 273 Note that only the X index register can be used with pre-indexed
 274 addressing.
 275 .NH 3
 276 Post-indexed indirect addressing.
 277 .PP
 278 In this case, the object code consists of two bytes and the
 279 second object code byte provides an 8-bit address.
 280 Now the second object code byte indentifies a location
 281 in the first 256 bytes of memory where an indirect address
 282 will be found.
 283 The contents of the Y index register are added to this indirect
 284 address.
 285 This may be illustrated as follows:
 286 .sp 18
 287 Note that only the Y index register can be used with post-indexed
 288 indirect addressing.
 289 .bp
 290 .NH
 291 What the CPU has and doesn't has.
 292 .PP
 293 Although the designers of the MCS6500 CPUs family state that
 294 there is nothing very significant about the short stack (only
 295 256 bytes) this stack caused problems for the back end.
 296 The designers say that a 256-byte stack usually is sufficient
 297 for any typical microcomputer, this is only true if the stack
 298 is used only for return addresses of the JSR (jump to
 299 subroutine) instruction.
 300 But since the EM machine is suppost to be a stack machine and
 301 high level languages need the ability of parameters and
 302 locals in there procedures and function, this short stack
 303 is unsufficiant.
 304 So an software stack is implemented in this back end, requiring two
 305 additional subroutines for stack handling.
 306 These two stack handling subroutines slow down the processing time
 307 of a program since the stack is used heavely.
 308 .PP
 309 Since parameters and locals of EM procedures are offseted
 310 from the localbase of that procedure, indirect addressing
 311 is havily used.
 312 Offsets are positive (for parameters) and negative (for
 313 local variables).
 314 As explaned before the addressing modes the MCS6500 have a
 315 post indexed indirect addressing mode.
 316 This addressing mode can only handle positive offsets.
 317 This raises a problem for accessing the local variables
 318 I have chosen for the next solution.
 319 A second local base is introduced.
 320 This second local base is the real local base subtracted by
 321 a constant BASE.
 322 In the present situation of the back end the value of BASE
 323 is 240.
 324 This means that there are 240 bytes reseved for local
 325 variables to be indirect addressed and 14 bytes for
 326 the parameters.
 327 .DS C
 328 .B
 329 THE CODE GENERATOR.
 330 .R
 331 .DE
 332 .NH 0
 333 Description of the machine table.
 334 .PP
 335 The machine description table consists of the following sections:
 336 .IP 1.
 337 The macro definitions.
 338 .IP 2.
 339 Constant definitions.
 340 .IP 3.
 341 Register definitions.
 342 .IP 4.
 343 Token definitions.
 344 .IP 5.
 345 Token expressions.
 346 .IP 6.
 347 Code rules.
 348 .IP 7.
 349 Move definitions.
 350 .IP 8.
 351 Test definitions.
 352 .IP 9.
 353 Stack definitions.
 354 .NH 2
 355 Macro definitions.
 356 .PP
 357 The macro definitions at the top of the table are expanded
 358 by the preprocessor on occurence in the rest of the table.
 359 .NH 2
 360 Constant definitions.
 361 .PP
 362 There are three constants which must be defined at first.
 363 The are:
 364 .IP EM_WSIZE: 11
 365 Number of bytes in a machine word.
 366 This is the number of bytes a simple
 367 .B
 368 loc
 369 .R
 370 instruction will put on the stack.
 371 .IP EM_PSIZE:
 372 Number of bytes in a pointer.
 373 This is the number of bytes a
 374 .B
 375 lal
 376 .R
 377 instruction will put on the stack.
 378 .IP EM_BSIZE:
 379 Number of bytes in the hole between AB and LB.
 380 The calling sequence only saves LB on the stack so this
 381 constant is equal to the pointer size.
 382 .NH 1
 383 Register definitions.
 384 .PP
 385 The only important register definition is the definition of
 386 the registerpair AX.
 387 Since the rest of the machine's registers Y, PC, ST serve
 388 special purposes, the code generator cannot use them.
 389 .NH 2
 390 Token definitions
 391 .PP
 392 There is a fake token.
 393 This token is put in the table, since the code generator generator
 394 complains if it cannot find one.
 395 .NH 2
 396 Token expression definitions.
 397 .PP
 398 The token expression is also a fake one.
 399 This token expression is put in the table, since the code generator
 400 generator complains if it cannot find one.
 401 .NH 2
 402 Code rules.
 403 .PP
 404 The code rule section is the largest section in the table.
 405 They specify EM patterns, stack patterns, code to be generated,
 406 etc.
 407 The syntax is:
 408 .IP code rule:
 409 EM pattern '|' stack pattern '|' code '|'
 410 stack replacement '|' EM replacement '|'
 411 .PP
 412 All patterns are optional, however there must be at least one
 413 pattern present.
 414 If the EM pattern is missing the rule becomes a rewriting
 415 rule or a
 416 .B
 417 coercion
 418 .R
 419 to be used when code generation cannot continue because of an
 420 invalid stack pattern.
 421 The code rules are preceeded by the word CODE:.
 422 .NH 3
 423 The EM pattern.
 424 .PP
 425 The EM pattern consists of a list of EM mnemonics followed by
 426 a boolean expression. Examples:
 427 .sp 1
 428 .br
 429 .B
 430 loe
 431 .R
 432 .sp 1
 433 will match a single
 434 .B
 435 loe
 436 .R
 437 instruction,
 438 .sp 1
 439 .br
 440 .B
 441 loc loc cif
 442 .R
 443 $1==2 && $2==8
 444 .sp 1
 445 is a pattern that will match
 446 .sp 1
 447 .br
 448 .B
 449 loc
 450 .R
 451 2
 452 .br
 453 .B
 454 loc
 455 .R
 456 8
 457 .br
 458 .B
 459 cif
 460 .R
 461 .sp 1
 462 and
 463 .sp 1
 464 .br
 465 .B
 466 lol
 467 inc
 468 stl
 469 .R
 470 $1==$3
 471 .sp 1
 472 will match for example
 473 .sp 1
 474 .br
 475 .B
 476 lol
 477 .R
 478 6
 479 .br
 480 .B
 481 inc
 482 .R
 483 .br
 484 .B
 485 stl
 486 .R
 487 6
 488 .sp 1
 489 A missing boolean expession evaluates to TRUE.
 490 .PP
 491 The code generator will match the longest EM pattern on every occasion,
 492 if two patterns of the same length match the first in the table
 493 will be chosen, while all patterns of length greater than or equal
 494 to three are considered to be of the same length.
 495 .NH 3
 496 The stack pattern.
 497 .PP
 498 The only stack pattern that can occur is R16, which means that the
 499 registerpair AX contains the word on top of the stack.
 500 If this is not the case a coersion occurs.
 501 This coersion generates a "jsr Pop", which means that the top
 502 of the stack is popped and stored in the registerpair AX.
 503 .NH 3
 504 The code part.
 505 .PP
 506 The code part consists of three parts, stack cleanup, register
 507 allocation, and code to be generated.
 508 All of these may be omitted.
 509 .NH 4
 510 Stack cleanup.
 511 .PP
 512 When generating something like a branch instruction it might be
 513 needed to empty the fake stack, that is, remove the AX registerpair.
 514 This is done by the instruction remove(ALL)
 515 .NH 4
 516 Register allocation.
 517 .PP
 518 If the machine code to be generated uses the registerpair AX,
 519 this is signaled to the code generator by the allocate(R16)
 520 instruction.
 521 If the registerpair AX resides on the fake stack, this will result
 522 in a "jsr Push", which means that the registerpair AX is pushed on
 523 the stack and will be free for further use.
 524 If registerpair AX is not on the fake stack nothing happens.
 525 .NH 4
 526 Code to be generated.
 527 .PP
 528 Code to be generated is specified as a list of items of the following
 529 kind:
 530 .IP 1)
 531 A string in double quotes("This is a string").
 532 This is copied to the codefile and a newline ('\n') is appended.
 533 Inside the string all normal C string conventions are allowed,
 534 and substitutions can be made of the following sorts.
 535 .RS
 536 .IP a)
 537 $1, $2 etc. These are the operand of the corresponding EM
 538 instructions and are printed according to there type.
 539 To put a real '$' inside the string it must be doubled ('$$').
 540 .IP b)
 541 %[1], %[2.reg], %[b.1] etc. these have there obvious meaning.
 542 If they describe a complete token (%[1]) the printformat for
 543 the token is used.
 544 If they stand fo a basic term in an expression they will be
 545 printed according to their type.
 546 To put a real '%' inside the string it must be doubled ('%%').
 547 .IP c)
 548 %( arbitrary expression %). This allows inclusion of arbitrary
 549 expressions inside strings.
 550 Usually not needed very often, so that the akward notation
 551 is not too bad.
 552 Note that %(%[1]%) is equivalent to %[1].
 553 .RE
 554 .NH 3
 555 stack replacement.
 556 .PP
 557 The stack replacement is a possibly empty list of items to be
 558 pushed on the fake stack.
 559 Three things can occur:
 560 .IP 1)
 561 %[1] is used if the registerpair AX was on the fake stack and is
 562 to be pushed back onto it.
 563 .IP 2)
 564 %[a] is used if the registerpair AX is allocated with allocate(R16)
 565 and is to be pushed onto the fake stack.
 566 .IP 3)
 567 It can also be empty.
 568 .NH 3
 569 EM replacement.
 570 .PP
 571 In exeptional cases it might be useful to leave part of the an EM
 572 pattern undone.
 573 For example, a
 574 .B
 575 sdl
 576 .R
 577 instruction might be split into two
 578 .B
 579 stl
 580 .R
 581 instructions when there is no 4-byte quantity on the stack.
 582 The EM replacement part allows one to express this.
 583 Example:
 584 .sp 1
 585 .br
 586 .B
 587 stl
 588 .R
 589 $1
 590 .B
 591 stl
 592 .R
 593 $1+2
 594 .sp 1
 595 The instructions are inserted in the stream so they can match
 596 the first part of a pattern in the next step.
 597 Note that since the code generator traverses the EM instructions
 598 in a strict linear fashion, it is impossible to let the EM
 599 replacement match later parts of a pattern.
 600 So if there is a pattern
 601 .sp 1
 602 .br
 603 .B
 604 loc
 605 stl
 606 .R
 607 $1==0
 608 .sp1
 609 and the input is
 610 .sp 1
 611 .br
 612 .B
 613 loc
 614 .R
 615 0
 616 .B
 617 sdl
 618 .R
 619 4
 620 .sp 1
 621 the
 622 .B
 623 loc
 624 .R
 625 0
 626 will be processed first, then the
 627 .B
 628 sdl
 629 .R
 630 might be split into two
 631 .B
 632 stl
 633 .R
 634 's but the pattern cannot match now.
 635 .NH 3
 636 Move definitions.
 637 .PP
 638 This definition is a fake. This definition is put in the
 639 table, since the code generator generator complains if it
 640 cannot find one.
 641 .NH 3
 642 Test definitions.
 643 .PP
 644 Test definitions aren't used by the table.
 645 .NH 3
 646 Stack definitions.
 647 .PP
 648 When the generator has to push the registerpair AX, it must
 649 know how to do so.
 650 The machine code to be generated is defined here.
 651 .NH 1
 652 Some remarks.
 653 .PP
 654 The above description of the machine table is
 655 a description of the table for the MCS6500.
 656 It uses only a part of the possibilities which the code generator
 657 generator offers.
 658 For a more precise and detailed description see [2].
 659 .DS C
 660 .B
 661 THE BACK END TABLE.
 662 .R
 663 .DE
 664 .NH 0
 665 Introduction.
 666 .PP
 667 The code rules are divided in 15 groups.
 668 These groups are:
 669 .IP 1.
 670 Load instructions.
 671 .IP 2.
 672 Store instructions.
 673 .IP 3.
 674 Integer arithmetic instructions.
 675 .IP 4.
 676 Unsigned arithmetic instructions.
 677 .IP 5.
 678 Floating point arithmetic instructions.
 679 .IP 6.
 680 Pointer arithmetic instructions.
 681 .IP 7.
 682 Increment, decrement and zero instructions.
 683 .IP 8.
 684 Convert instructions.
 685 .IP 9.
 686 Logical instructions.
 687 .IP 10.
 688 Set manipulation instructions.
 689 .IP 11.
 690 Array instructions.
 691 .IP 12.
 692 Compare instructions.
 693 .IP 13.
 694 Branch instructions.
 695 .IP 14.
 696 Procedure call instructions.
 697 .IP 15.
 698 Miscellaneous instructions.
 699 .PP
 700 From all of these groups one or two typical EM pattern will be explained
 701 in the next paragraphs.
 702 Comment is placed between /* and */ (/* This is a comment */).
 703 .NH
 704 The instructions.
 705 .NH 2
 706 The load instructions.
 707 .PP
 708 In this group a typical instruction is
 709 .B
 710 lol
 711 .R
 712 .
 713 A
 714 .B
 715 lol
 716 .R
 717 instruction pushes the word at local base + offset, where offset
 718 is the instructions argument, onto the stack.
 719 Since the MCS6500 can only offset by 256 bytes, as explaned at the
 720 memory addressing modes, there is a need for two code rules in the
 721 table.
 722 One which can offset directly and one that must explicit
 723 calculate the address of the local.
 724 .NH 3
 725 The lol instruction with indirect offsetting.
 726 .PP
 727 In this case an indirect offsetted load from the second local base
 728 is possible.
 729 The table content is:
 730 .sp 1
 731 .br
 732 .B
 733 lol
 734 .R
 735 IN($1) | |
 736 .br
 737 allocate(R16)   /* allocate registerpair AX */
 738 .br
 739 "ldy #BASE+$1"  /* load Y with the offset from the second
 740 .br
 741                                               local base */
 742 .br
 743 "lda (LBl),y"   /* load indirect the lowbyte of the word */
 744 .br
 745 "tax"           /* move register A to register X */
 746 .br
 747 "iny"           /* increment register Y (offset) */
 748 .br
 749 "lda (LBl),y"   /* load indirect the highbyte of the word */
 750 .br
 751 | %[a] | |      /* push the word onto the fake stack */
 752 .NH 3
 753 The lol instruction whose offset is to big.
 754 .PP
 755 In this case, the library subroutine "Lol" is used.
 756 This subroutine expects the offset in registerpair AX, then
 757 calculates the address of the local or parameter, and loads
 758 it into registerpair AX.
 759 The table content is:
 760 .sp 1
 761 .br
 762 .B
 763 lol
 764 .R
 765 | |
 766 .br
 767 allocate(R16)   /* allocate registerpair AX */
 768 .br
 769 "lda #[$1].h"   /* load highbyte of offset into register A */
 770 .br
 771 "ldx #[$1].l"   /* load lowbyte of offset into register X */
 772 .br
 773 "jsr Lol"       /* perform the subroutine */
 774 .br
 775 | %[a] | |      /* push word onto the fake stack */
 776 .NH 2
 777 The store instructions.
 778 .PP
 779 In this group a typical instruction is
 780 .B
 781 stl.
 782 .R
 783 A
 784 .B
 785 stl
 786 .R
 787 instruction poppes a word from the stack and stores it in the word
 788 at local base + offset, where offset is the instructions argument.
 789 Here also is the need for two code rules in the table as a result
 790 of the offset limits.
 791 .NH 3
 792 The stl instruction with indirect offsetting.
 793 .PP
 794 In this case it an indirect offsetted store from the second local
 795 base is possible.
 796 The table content is:
 797 .sp 1
 798 .br
 799 .B
 800 stl
 801 .R
 802 IN($1) | R16 |  /* expect registerpair AX on top of the
 803 .br
 804                                                         fake stack */
 805 .br
 806 "ldy #BASE+1+$1"  /* load Y with the offset from the
 807 .br
 808                                                 second local base */
 809 .br
 810 "sta (LBl),y"   /* store the highbyte of the word from A */
 811 .br
 812 "txa"           /* move register X to register A */
 813 .br
 814 "dey"           /* decrement offset */
 815 .br
 816 "sta (LBl),y"   /* store the lowbyte of the word from A */
 817 .br
 818 | | |
 819 .NH 3
 820 The stl instruction whose offset is to big.
 821 .PP
 822 In this case the library subroutine 'Stl' is used.
 823 This subroutine expects the offset in registerpair AX, then
 824 calculates the address, poppes the word stores it at its place.
 825 The table content is:
 826 .sp 1
 827 .br
 828 .B
 829 stl
 830 .R
 831 | |
 832 .br
 833 allocate(R16)   /* allocate registerpair AX */
 834 .br
 835 "lda #[$1].h"   /* load highbyte of offset in register A */
 836 .br
 837 "ldx #[$1].l"   /* load lowbyte of offset in register X */
 838 .br
 839 "jsr Stl"       /* perform the subroutine */
 840 .br
 841 | | |
 842 .NH 2
 843 Integer arithmetic instructions.
 844 .PP
 845 In this group typical instructions are
 846 .B
 847 adi
 848 .R
 849 and
 850 .B
 851 mli.
 852 .R
 853 These instructions, in this table, are implemented for 2-byte
 854 and 4-byte integers.
 855 The only arithmetic instructions available on the MCS6500 are
 856 the ADC (add with carry), and SBC (subtract with not(carry)).
 857 Not(carry) here means that in a subtraction, the one's complement
 858 of the carry is taken.
 859 The absence of multiply and division instructions forces the
 860 use of subroutines to handle these cases.
 861 Because there are no registers left to perform on the multiply
 862 and division, zero page is used here.
 863 The 4-byte integer arithmetic is implemented, because in C there
 864 exists the integer type long.
 865 A user is freely to use the type long, but will pay in performance.
 866 .NH 3
 867 The adi instruction.
 868 .PP
 869 In case of the
 870 .B
 871 adi
 872 .R
 873 2 (and
 874 .B
 875 sbi
 876 .R
 877 2) instruction there are many EM
 878 patterns, so that the instruction can be performed in line in
 879 most cases.
 880 For the worst case there exists a subroutine in the library
 881 which deals with the EM instruction.
 882 In case of a
 883 .B
 884 adi
 885 .R
 886 4 (or
 887 .B
 888 sbi
 889 .R
 890 4) there only is a subroutine to deal with it.
 891 A table content is:
 892 .sp 1
 893 .br
 894 .B
 895 lol lol adi
 896 .R
 897 (IN($1) && IN($2) && $3==2) | | /* is it in range */
 898 .br
 899 allocate(R16)   /* allocate registerpair AX */
 900 .br
 901 "ldy #BASE+$1+1" /* load Y with offset for first operand */
 902 .br
 903 "lda (LBl),y"   /* load indirect highbyte first operand */
 904 .br
 905 "pha"           /* save highbyte first operand on hard_stack */
 906 .br
 907 "dey"           /* decrement offset first operand */
 908 .br
 909 "lda (LBl),y"   /* load indirect lowbyte first operand */
 910 .br
 911 "ldy #BASE+$2"  /* load Y with offset for second operand */
 912 .br
 913 "clc"           /* clear carry for addition */
 914 .br
 915 "adc (LBl),y"   /* add the lowbytes of the operands */
 916 .br
 917 "tax"           /* store lowbyte of result in place */
 918 .br
 919 "iny"           /* increment offset second operand */
 920 .br
 921 "pla"           /* get highbyte first operand */
 922 .br
 923 "adc (LBl),y"   /* add the highbytes of the operands */
 924 .br
 925 | %[a] | |      /* push the result onto the fake stack */
 926 .NH 3
 927 The mli instruction.
 928 .PP
 929 The
 930 .B
 931 mli
 932 .R
 933 2 instruction uses most the subroutine 'Mlinp'.
 934 This subroutine expects the multiplicand in zero page
 935 at locations ARTH, ARTH+1, while the multiplier is in zero
 936 page locations ARTH+2, ARTH+3.
 937 For a description of the algorithms used for multiplication and
 938 division, see [3].
 939 A table content is:
 940 .sp  1
 941 .br
 942 .B
 943 lol lol mli
 944 .R
 945 (IN($1) && IN($2) && $3==2) | |
 946 .br
 947 allocate(R16)   /* allocate registerpair AX */
 948 .br
 949 "ldy #BASE+$1"  /* load Y with offset of multiplicand */
 950 .br
 951 "lda (LBl),y"   /* load indirect lowbyte of multiplicand */
 952 .br
 953 "sta ARTH"      /* store lowbyte in zero page */
 954 .br
 955 "iny"           /* increment offset of multiplicand */
 956 .br
 957 "lda (LBl),y"   /* load indirect highbyte of multiplicand */
 958 .br
 959 "sta ARTH+1"    /* store highbyte in zero page */
 960 .br
 961 "ldy #BASE+$2"  /* load Y with offset of multiplier */
 962 .br
 963 "lda (LBl),y"   /* load indirect lowbyte of multiplier */
 964 .br
 965 "sta ARTH+2"    /* store lowbyte in zero page */
 966 .br
 967 "iny"           /* increment offset of multiplier */
 968 .br
 969 "lda (LBl),y"   /* load indirect highbyte of multiplier */
 970 .br
 971 "sta ARTH+3"    /* store highbyte in zero page */
 972 .br
 973 "jsr Mlinp"     /* perform the multiply */
 974 .br
 975 | %[a] | |      /* push result onto fake stack */
 976 .NH 2
 977 The unsgned arithmetic instructions.
 978 .PP
 979 Since unsigned addition an subtraction is performed in the same way
 980 as signed addition and subtraction, these cases are dealt with by
 981 an EM replacement.
 982 For mutiplication and division there are special subroutines.
 983 .NH 3
 984 Unsigned addition.
 985 .PP
 986 This is an example of the EM replacement strategy.
 987 .sp 1
 988 .br
 989 .B
 990 lol lol adu
 991 .R
 992         | | | |
 993 .B
 994 lol
 995 .R
 996 $1
 997 .B
 998 lol
 999 .R
1000 $2
1001 .B
1002 adi
1003 .R
1004 $3 |
1005 .NH 2
1006 Floating point arithmetic.
1007 .PP
1008 Floating point arithmetic isn't implemented in this table.
1009 .NH 2
1010 Pointer arithmetic instructions.
1011 .PP
1012 A typical pointer arithmetic instruction is
1013 .B
1014 adp
1015 .R
1016 2.
1017 This instruction adds an offset and a pointer.
1018 A table content is:
1019 .sp 1
1020 .br
1021 .B
1022 adp
1023 .R
1024         | | | |
1025 .B
1026 loc
1027 .R
1028 $1
1029 .B
1030 adi
1031 .R
1032 2 |
1033 .NH 2
1034 Increment, decrement and zero instructions.
1035 .PP
1036 In this group a typical instruction is
1037 .B
1038 inl
1039 .R
1040 , which increments a local or parameter.
1041 The MCS6500 doesn't have an instruction to increment the
1042 accumulator A, so the 'ADC' instruction must be used.
1043 A table content is:
1044 .sp 1
1045 .br
1046 .B
1047 inl
1048 .R
1049 IN($1) | |
1050 .br
1051 allocate(R16)   /* allocate registerpair AX */
1052 .br
1053 "ldy #BASE+$1"  /* load Y with offset of the local */
1054 .br
1055 "clc"           /* clear carry for addition */
1056 .br
1057 "lda (LBl),y"   /* load indirect lowbyte of local */
1058 .br
1059 "adc #1"        /* increment lowbyte */
1060 .br
1061 "sta (LBl),y"   /* restore indirect the incremented lowbyte */
1062 .br
1063 "bcc 1f"        /* if carry is clear then ready */
1064 .br
1065 "iny"           /* increment offset of local */
1066 .br
1067 "lda (LBl),y"   /* load indirect highbyte of local */
1068 .br
1069 "adc #0"        /* add carry to highbyte */
1070 .br
1071 "sta (LBl),y\\n1:"  /* restore indirect the highbyte */
1072 .PP
1073 If the offset of the local or parameter is to big, first the
1074 local or parameter is fetched, than incremented, and then
1075 restored.
1076 .NH 2
1077 Convert instructions.
1078 .PP
1079 In this case there are two convert instructions
1080 which really do something.
1081 One of them is in line code, and deals with the extension of
1082 a character (1-byte) to an integer.
1083 The other one is a subroutine which handles the conversion
1084 between 2-byte integers and 4-byte integers.
1085 .NH 3
1086 The in line conversion.
1087 .PP
1088 The table content is:
1089 .sp 1
1090 .br
1091 .B
1092 loc loc cii
1093 .R
1094 $1==1 && $2==2 | R16 |
1095 .br
1096 "txa"           /* see if sign extension is needed */
1097 .br
1098 "bpl 1f"        /* there is no need for sign extension */
1099 .br
1100 "lda #0FFh"     /* sign extension here */
1101 .br
1102 "bne 2f"        /* conversion ready */
1103 .br
1104 "1: lda #0\\n2:"        /* no sign extension here */
1105 .NH 2
1106 Logical instructions.
1107 .PP
1108 A typical instruction in this group is the logical
1109 .B
1110 and
1111 .R
1112 on two 2-byte words.
1113 The logical
1114 .B
1115 and
1116 .R
1117 on groups of more than two bytes (max 254)
1118 is also possible and uses a library subroutine.
1119 .NH 3
1120 The logical and on 2-byte groups.
1121 .PP
1122 The table content is:
1123 .sp 1
1124 .br
1125 .B
1126 and
1127 .R
1128 $1==2 | R16 |   /* one group must be on the fake stack */
1129 .br
1130 "sta ARTH+1"    /* temporary save of first group highbyte */
1131 .br
1132 "stx ARTH"      /* temporary save of first group lowbyte */
1133 .br
1134 "jsr Pop"       /* pop second group from the stack */
1135 .br
1136 "and ARTH+1"    /* logical and on highbytes */
1137 .br
1138 "pha"           /* temporary save the result's highbyte */
1139 .br
1140 "txa"           /* logical and can only be done in A */
1141 .br
1142 "and ARTH"      /* logical and on lowbytes */
1143 .br
1144 "tax"           /* restore results lowbyte */
1145 .br
1146 "pla"           /* restore results highbyte */
1147 .br
1148 | %[1] | |      /* push result onto fake stack */
1149 .NH 2
1150 Set manipulation instructions.
1151 .PP
1152 A typical EM pattern in this group is
1153 .B
1154 loc inn zeq
1155 .R
1156 $1>0 && $1<16 && $2==2.
1157 This EM pattern works on sets of 16 bits.
1158 Sets can be bigger (max 256 bytes = 2048 bits), but than a
1159 library routine is used instead of in line code.
1160 The table content of the above EM pattern is:
1161 .sp 1
1162 .br
1163 .B
1164 loc inn zeq
1165 .R
1166 $1>0 && $1<16 && $2==2 | R16 |
1167 .br
1168 "ldy #$1+1"     /* load Y with bit number */
1169 .br
1170 "stx ARTH"      /* cannot rotate X, so use zero page */
1171 .br
1172 "1: lsr a"      /* right shift A */
1173 .br
1174 "ror ARTH"      /* right rotate zero page location */
1175 .br
1176 "dey"           /* decrement Y */
1177 .br
1178 "bne 1b"        /* shift $1 times */
1179 .br
1180 "bcc $1"        /* no carry, so bit is zero */
1181 .NH 2
1182 Array instructions.
1183 .PP
1184 In this group a typical EM pattern is
1185 .B
1186 lae lar
1187 .R
1188 defined(rom(1,3)) | | | |
1189 .B
1190 lae
1191 .R
1192 $1
1193 .B
1194 aar
1195 .R
1196 $2
1197 .B
1198 loi
1199 .R
1200 rom(1,3).
1201 This pattern uses the
1202 .B
1203 aar
1204 .R
1205 instruction, which is part of a typical EM pattern:
1206 .sp 1
1207 .br
1208 .B
1209 lae aar
1210 .R
1211 $2==2 && rom(1,3)==2 && rom(1,1)==0 | R16 | /* registerpair AX contains
1212 the index in the array */
1213 .br
1214 "pha"           /* save highbyte of index */
1215 .br
1216 "txa"           /* move lowbyte of index to A */
1217 .br
1218 "asl a"         /* shift left lowbyte == 2 times lowbyte */
1219 .br
1220 "tax"           /* restore lowbyte */
1221 .br
1222 "pla"           /* restore highbyte */
1223 .br
1224 "rol a"         /* rotate left highbyte == 2 times highbyte */
1225 .br
1226 | %[1] | adi 2 | /* push new index, add to lowerbound array */
1227 .NH 2
1228 Compare instructions.
1229 .PP
1230 In this group all EM patterns are performed by calling
1231 a subroutine.
1232 Subroutines are used here because comparison is only
1233 possible byte by byte.
1234 This means a lot of code, and since compare are used frequently
1235 a lot of in line code would be generated, and thus reducing
1236 the space left for the software stack.
1237 These subroutines can be found in the library.
1238 .NH 2
1239 Branch instructions.
1240 .PP
1241 A typical branch instruction is
1242 .B
1243 beq.
1244 .R
1245 The table content for it is:
1246 .sp 1
1247 .br
1248 .B
1249 beq
1250 .R
1251 | R16 |
1252 .br
1253 "sta BRANCH+1"  /* save highbyte second operand in zero page */
1254 .br
1255 "stx BRANCH"    /* save lowbyte second operand in zero page */
1256 .br
1257 "jsr Pop"       /* pop the first operand */
1258 .br
1259 "cmp BRANCH+1"  /* compare the highbytes */
1260 .br
1261 "bne 1f"        /* there not equal so go on */
1262 .br
1263 "cpx BRANCH"    /* compare the lowbytes */
1264 .br
1265 "beq $1\\n1:"   /* lowbytes are also equal, so branch */
1266 .PP
1267 Another typical instruction in this group is
1268 .B
1269 zeq.
1270 .R
1271 The table content is:
1272 .sp 1
1273 .br
1274 .B
1275 zeq
1276 .R
1277 | R16 |
1278 .br
1279 "tay"           /* move A to Y for setting testbits */
1280 .br
1281 "bmi $1"        /* highbyte s minus so branch */
1282 .br
1283 "txa"           /* move X to A for setting testbits */
1284 .br
1285 "beq $1\\n1:"   /* lowbyte also zero, thus branch */
1286 .NH 2
1287 Procedure call instructions.
1288 .PP
1289 In this group one code generation might seem a little
1290 akward.
1291 It is the EM instruction
1292 .B
1293 cai
1294 .R
1295 which generates a 'jsr Indir'.
1296 This is because there is no indirect jump_subroutine in the
1297 MCS6500.
1298 The only solution is to store the address in zero page, and then
1299 do a 'jsr' to a known label.
1300 At this label there must be an indirect jump instruction, which
1301 perform a jump to the address stored in zero page.
1302 In this case the label is Indir, and the address is stored in
1303 zero page at the addresses ADDR, ADDR+1.
1304 The tabel content is:
1305 .sp 1
1306 .br
1307 .B
1308 cai
1309 .R
1310 | R16 |
1311 .br
1312 "stx ADDR"      /* store lowbyte of address in zero page */
1313 .br
1314 "sta ADDR+1"    /* store highbyte of address in zero page */
1315 .br
1316 "jsr Indir"     /* use the indirect jump */
1317 .br
1318 | | |
1319 .NH 2
1320 Miscellaneous instructions.
1321 .PP
1322 In this group, as the name suggests, there is no
1323 typical EM instruction or EM pattern.
1324 Most of the MCS6500 code to be generated uses a library subroutine
1325 or is straightforward.
1326 .DS C
1327 .B
1328 PERFORMANCE.
1329 .R
1330 .DE
1331 .NH 0
1332 Introduction.
1333 .PP
1334 To measure the performance of the back end table some timing
1335 tests are done.
1336 What to time?
1337 In this case, the execution time of several Pascal statements
1338 are timed.
1339 Statements in C, which have a Pascal equivalence are timed also.
1340 The statements are timed as follows.
1341 A test program is been written, which executes two
1342 nested  for_loops from 1 to 1.000.
1343 Within these for_loops the statement, which is to be tested, is placed,
1344 so the statement will be executed 1.000.000 times.
1345 Then the same program is executed without the test statement.
1346 The time difference between the two executions is the time
1347 neccesairy to execute the test statement 1.000.000 times.
1348 The total time to execute the test statement requires thus the
1349 time difference divided by 1.000.000.
1350 .NH 0
1351 Testing Pascal statements.
1352 .PP
1353 The next statements are tested.
1354 .IP 1)
1355 int1 := 0;
1356 .IP 2)
1357 int1 := int2 - 1;
1358 .IP 3)
1359 int1 := int1 + 1;
1360 .IP 4)
1361 int1 := icon1 - icon2;
1362 .IP 5)
1363 int1 := icon2 div icon1;
1364 .IP 6)
1365 int1 := int2 * int3;
1366 .IP 7)
1367 bool := (int1 < 0);
1368 .IP 8)
1369 bool := (int1 < 3);
1370 .IP 9)
1371 bool := ((int1 > 3) or (int1 < 3))
1372 .IP 10)
1373 case int1 of 1: bool := false; 2: bool := true end;
1374 .IP 11)
1375 if int1 = 0 then int2 := 3;
1376 .IP 12)
1377 while int1 > 0 do int1 := int1 - 1;
1378 .IP 13)
1379 m := a[k];
1380 .IP 14)
1381 let2 := ['a'..'c'];
1382 .IP 15)
1383 P3(x);
1384 .IP 16)
1385 dum := F3(x);
1386 .IP 17)
1387 s.overhead := 5400;
1388 .IP 18)
1389 with s do overhead := 5400;
1390 .PP
1391 These statement were tested in a procedure test.
1392 .sp 1
1393 .br
1394 procedure test;
1395 .br
1396 var i, j, ... : integer;
1397 .br
1398     bool : boolean;
1399 .br
1400     let2 : set of char;
1401 .br
1402 begin
1403 .br
1404     for i := 1 to 1000
1405 .br
1406         for j := 1 to 1000
1407 .br
1408             STATEMENT
1409 .br
1410 end;
1411 .sp 1
1412 .PP
1413 STATEMENT is one of the statements as shown above, or it is
1414 the empty statement.
1415 The assignment of used variables, if neccesairy, is done before
1416 the first for_loop.
1417 In case of the statement which uses the procedure call, statement
1418 15, a dummy procedure is declared whose body is empty.
1419 In case of the statement which uses the function, statement 16,
1420 this function returns its argument.
1421 for the timing of C statements a similar test program was
1422 written.
1423 .sp 1
1424 .br
1425 main()
1426 .br
1427 {
1428 .br
1429     int i, j, ...;
1430 .br
1431     for (i = 1; i <= 1000; i++)
1432 .br
1433         for (j = 1; j <= 1000; j++)
1434 .br
1435             STATEMENT
1436 .br
1437 }
1438 .sp 1
1439 .NH
1440 The results.
1441 .PP
1442 Here are tables with the results of the time measurments.
1443 Times are in microseconds (10^-6).
1444 Some statements appear twice in the tables.
1445 In the second case an array of 200 integers was declerated
1446 before the variable to be tested, so this variable cannot
1447 be accessed by indirect addressing from the second local base.
1448 This results in a larger execution time of the statement to be
1449 tested.
1450 The column 68000 contains the times measured on a Bleasdale,
1451 M68000 based, computer.
1452 The times in column pdp are measured on a DEC pdp11/44, where
1453 the times from column 6500 come from a BBC microcomputer.
1454 .bp
1455 .TS
1456 expand;
1457 c s s s
1458 c c c c
1459 lw35 nw7 nw7 nw7.
1460 Pascal timing results
1461 statement       68000   pdp     6500
1462 _
1463 T{
1464 int1 := 0;
1465 T}      4.0     5.8     16.7
1466         4.0     4.2     97.8
1467 _
1468 T{
1469 int1 := int2 - 1;
1470 T}      7.2     7.1     27.2
1471         6.9     7.1     206.5
1472 _
1473 T{
1474 int1 := int1 + 1;
1475 T}      6.9     6.8     27.2
1476         6.4     6.7     106.5
1477 _
1478 T{
1479 int1 := icon1 + icon2;
1480 T}      6.2     6.2     25.6
1481         6.2     6.0     106.6
1482 _
1483 T{
1484 int1 := icon2 div icon1;
1485 T}      14.9    14.3    372.6
1486         14.9    14.7    453.7
1487 _
1488 T{
1489 int1 := int2 * int3;
1490 T}      11.5    12.0    558.1
1491         11.3    11.6    728.6
1492 _
1493 T{
1494 bool := (int1 < 0);
1495 T}      7.2     6.9     122.8
1496         7.8     8.1     453.2
1497 _
1498 T{
1499 bool := (int1 < 3);
1500 T}      7.3     7.6     126.0
1501         7.2     8.1     232.2
1502 _
1503 T{
1504 bool := ((int1 > 3) or (int1 < 3))
1505 T}      10.1    12.0    307.8
1506         10.2    11.9    440.1
1507 _
1508 T{
1509 case int1 of 1: bool := false; 2: bool := true end;
1510 T}      18.3    17.9    165.7
1511 _
1512 T{
1513 if int1 = 0 then int2 := 3;
1514 T}      9.5     8.5     133.8
1515 _
1516 T{
1517 while int1 > 0 do int1 := int1 - 1;
1518 T}      6.9     6.9     126.0
1519 _
1520 T{
1521 m := a[k];
1522 T}      7.2     6.8     134.3
1523 _
1524 T{
1525 let2 := ['a'..'c'];
1526 T}      38.4    38.8    447.4
1527 _
1528 T{
1529 P3(x);
1530 T}      18.9    18.8    180.3
1531 _
1532 T{
1533 dum := F3(x);
1534 T}      26.8    27.1    343.3
1535 _
1536 T{
1537 s.overhead := 5400;
1538 T}      4.6     4.1     16.7
1539 _
1540 T{
1541 with s do overhead := 5400;
1542 T}      4.2     4.3     16.7
1543 .TE
1544 .TS
1545 expand;
1546 c s s s
1547 c c c c
1548 lw35 nw7 nw7 nw7.
1549 C timing results
1550 statement       68000time       pdptime 6500time
1551 _
1552 T{
1553 int1 = 0;
1554 T}      4.1     3.6     17.2
1555         4.1     4.1     97.7
1556 _
1557 T{
1558 int1 = int2 - 1;
1559 T}      6.6     6.9     27.2
1560         6.1     6.5     206.4
1561 _
1562 T{
1563 int1 = int1 + 1;
1564 T}      6.4     7.3     27.2
1565         6.3     6.2     206.4
1566 _
1567 T{
1568 int1 = int2 * int3;
1569 T}      11.4    12.3    522.6
1570         9.6     10.1    721.2
1571 _
1572 T{
1573 int1 = (int2 < 0);
1574 T}      7.2     7.6     126.4
1575         7.4     7.7     232.5
1576 _
1577 T{
1578 int1 = (int2 < 3);
1579 T}      7.0     7.5     126.0
1580         7.8     7.8     232.6
1581 _
1582 T{
1583 int1 = ((int2 > 3) || (int2 < 3));
1584 T}      11.8    12.2    193.4
1585         11.5    13.2    245.6
1586 _
1587 T{
1588 switch (int1) { case 1: int1 = 0; break; case 2: int1 = 1; break; }
1589 T}      28.3    29.2    164.1
1590 _
1591 T{
1592 if (int1 == 0) int2 = 3;
1593 T}      4.8     4.8     19.4
1594 _
1595 T{
1596 while (int2 > 0) int2 = int2 - 1;
1597 T}      5.8     6.0     125.9
1598 _
1599 T{
1600 int2 = a[int2];
1601 T}      4.8     5.1     192.8
1602 _
1603 T{
1604 P3(int2);
1605 T}      18.8    18.4    180.3
1606 _
1607 T{
1608 int2 = F3(int2);
1609 T}      27.0    27.2    309.4
1610 _
1611 T{
1612 s.overhead = 5400;
1613 T}      5.0     4.1     16.7
1614 .TE
1615 .NH
1616 Pascal statements which don't have a C equivalent.
1617 .PP
1618 At first, the two statements who perform an operation on constants
1619 are left out.
1620 These are left out while the C front end does constant folding,
1621 while the Pascal front end doesn't.
1622 So in C the statements int1 = icon1 + icon2; and int1 = icon1 / icont2;
1623 will use the same amount of time since the expression is evaluated
1624 by the front end.
1625 The two other statements (let2 := ['a'..'c']; and
1626 .B
1627 with
1628 .R
1629 s
1630 .B
1631 do
1632 .R
1633 overhead := 5400;), aren't included in the C statement timing table,
1634 because there constructs do not exist in C.
1635 Although in C there can be direct bit manipulation, and thus can
1636 be used to implement sets I have not used it here.
1637 The
1638 .B
1639 with
1640 .R
1641 statement does not exists in C and there is nothing with the slightest
1642 resemblance to it.
1643 .PP
1644 At first sight in the table , it looked if there is no much difference
1645 in the times for the M68000 and the pdp11/44, in comparison with the
1646 times needed by the MCS6500.
1647 To verify this impression, I calculated the correlation coefficient
1648 between the times of the M68000 and pdp11/44.
1649 It turned out to be 0.997 for both the Pascal time tests and the C
1650 time tests.
1651 Since the correlation coefficient is near to one and the difference
1652 between the times is small, they can be considered to be the same
1653 as seen from the times of the MCS6500.
1654 Then I have tried to make a grafic of the times from the M68000 and
1655 the MCS6500.
1656 Well, there was't any correlation to been seen, taken all the times.
1657 The only correlation one could see, with some effort, was in the
1658 times for the first three Pascal statements.
1659 The two first C statements show also a correlation, which two points
1660 always do.
1661 .PP
1662 Also the three Pascal statements
1663 .B
1664 case
1665 .R
1666 ,
1667 .B
1668 if
1669 .R
1670 ,
1671 and
1672 .B
1673 while
1674 .R
1675 have a correlation coefficient of 0.999.
1676 This is probably because the
1677 .B
1678 case
1679 .R
1680 statement uses a subroutine in both cases and the other two
1681 statements
1682 .B
1683 if
1684 .R
1685 and,
1686 .B
1687 while
1688 .R
1689 generate in line code.
1690 The last two Pascal statements use the same time, since the front
1691 end wil generate the same EM code for both.
1692 .PP
1693 The independence between the rest of the test times is because
1694 in these cases the object code for the MCS6500 uses library
1695 subroutines, while the other processors can handle the EM code
1696 with in line code.
1697 .PP
1698 It is clear that the MCS6500 is a slower device, it needs longer
1699 execution times, the need of more library subroutines, but
1700 there is no constant factor between it execution times and those
1701 of other processors.
1702 .PP
1703 The slowing down of the MCS6500 as result of the need of a
1704 library subroutine is illustrated by the muliplication
1705 statement.
1706 The MCS6500 needs a library subroutine, while the other
1707 two processors have a machine instruction to perform the
1708 multiply.
1709 This results in a factor of 48.5, when the operands can be accessed
1710 indirect by the MCS6500.
1711 When the MCS6500 cannot access the operands indirectly the situation
1712 is even worse.
1713 The slight differences between the MCS6500 execution times for
1714 Pascal statements and C statements is probably the result of the
1715 front end, and thus beyond the scope of this discussion.
1716 .PP
1717 Another timing test is done in C on the statement k = i + j + 1983.
1718 This statement is tested on many UNIX*
1719 .FS
1720 * UNIX is a Trademark of Bell Laboratories.
1721 .FE
1722 systems.
1723 For a complete list see appendix A.
1724 The slowest one is the IBM XT, which runs on a 8088 microprocessor.
1725 The fasted one is the Amdahl computer.
1726 Here is short table to illustrate the performance of the
1727 MCS6500.
1728 .TS
1729 c c c
1730 c n n.
1731 machine short   int
1732 IBM XT  53.4    53.4
1733 Amdahl  0.5     0.3
1734 MCS6500 150.2   150.2
1735 .TE
1736 The MCS6500 is three times slower than the IBM XT, but threehundred
1737 times slower than the Amdahl.
1738 The reason why the times on the IBM XT and the MCS6500 are the
1739 same for short's and int's, is that most C compilers make the types
1740 short and integer the same size on 16-bit machines.
1741 In this project the MCS6500 is regarded as a 16-bit machine.
1742 .NH
1743 Length tests.
1744 .PP
1745 I have also compiled several programs written in Pascal and C to
1746 see if there is a resemblance between the number of bytes generated
1747 in the machine's language.
1748 In the tables:
1749 .IP length: 9
1750 The number of bytes of the source program.
1751 .IP 68000:
1752 The number of bytes of the a.out file for a M68000.
1753 .IP pdp:
1754 The number of bytes of the a.out file for a pdp11/44.
1755 .IP 6500:
1756 The number of bytes of the a.out file for a MCS6500.
1757 .LP
1758 These are the results:
1759 .TS
1760 c s s s
1761 c c c c
1762 n n n n.
1763 Pascal programs
1764 length  68000   pdp     6500
1765 _
1766 19946   14383   16090   26710
1767 19484   20169   20190   35416
1768 10849   10469   11464   18949
1769 273     4221    5106    7944
1770 1854    5807    6610    10301
1771 .TE
1772 .TS
1773 c s s s
1774 c c c c
1775 n n n n.
1776 C progams
1777 length  68000   pdp     6500
1778 _
1779 9444    6927    8234    11559
1780 7655    14353   18240   26251
1781 4775    11309   15934   19910
1782 639     6337    9660    12494
1783 .TE
1784 .PP
1785 In contrast to the execution times of the test statements, the
1786 object code files sizes show a constant factor between them.
1787 After calculating the correlation coefficient, I have calculated
1788 the line fitted between sizes.
1789 .FS
1790 * x is the number of bytes
1791 .FE
1792 .TS
1793 c s s
1794 c c c
1795 l c c.
1796 Pascal programs
1797 processor       corr. coef.     fitted line
1798 _
1799 68000-pdp       0.996
1800 68000-6500      0.999   1.76x + 502*
1801 pdp-6500        0.999   1.80x - 1577
1802 .TE
1803 .TS
1804 c s s
1805 c c c
1806 l c c.
1807 C programs
1808 processor       corr. coef.     fitted line
1809 _
1810 68000-pdp       0.974
1811 68000-6500      0.992   1.80x + 502*
1812 pdp-6500        0.980   1.40x - 1577
1813 .TE
1814 .PP
1815 As seen from the tables above the correlation coefficient for
1816 Pascal programs is better than the ones for C programs.
1817 Thus the line fits best for Pascal programs.
1818 With the formula of the best fitted line one can now estimate
1819 the size of the object code, which a program needs, for a MCS6500
1820 without having the compiler at hand.
1821 One also can see from these formula that the object code
1822 generated for a MCS6500 is about 1.8 times more than for the other
1823 processors.
1824 Since the number of bytes in the source file havily depends on the
1825 programmer, how many spaces he or she uses, the size of the indenting
1826 in structured programs, etc., there is no correlation between the
1827 size of the source file and the size of the object file.
1828 Also the use of comments has its influence on the size.
1829 .bp
1830 .DS C
1831 .B
1832 SUMMARY.
1833 .R
1834 .DE
1835 .NH 0
1836 Summary
1837 .PP
1838 In this chapter some final conclusions are made.
1839 .PP
1840 In spite of its simplicity, the MCS6500 is strong enough to
1841 implement a EM machine.
1842 A serious deficy of the MCS6500 is the missing of 16-bit
1843 general purpose registers, and especially the missing of a
1844 16-bit stackpointer.
1845 As pointed out before, one 16-bit register can be simulated
1846 by a pair of 8-bit registers, in fact, the accumulator A to
1847 hold the highbyte, and the index register X to hold the lowbyte
1848 of the word.
1849 By lack of a 16-bit stackpointer, zero page must be used to hold
1850 a stackpointer and there are also two subroutines needed for
1851 manipulating the stack (Push and Pop).
1852 .PP
1853 As seen at the time tests, the simple instruction set of the
1854 MCS6500 forces the use of library subroutines.
1855 These library subroutines increas the execution time of the
1856 programs.
1857 .PP
1858 The sizes of the object code files show a strong correlation
1859 in contrast to the execution times.
1860 With this correlatiuon one canestimate the size of a program
1861 if it is to be used on a MCS6500.
1862 .bp
1863 .NH 0
1864 .B
1865 REFERENCES.
1866 .R
1867 .IP 1.
1868 Osborn, A., Jacobson, S., and Kane, J. The Mos Technology MCS6500.
1869 .B
1870 An Introduction to Microcomputers ,
1871 .R
1872 Volume II, Some Real Products (june 1977) chap. 9.
1873 .RS
1874 .PP
1875 A hardware description of some real existing CPU's, such as
1876 the Intel Z80, MCS6500, etc. is given in this book.
1877 .RE
1878 .IP 2.
1879 van Staveren, H.
1880 The table driven code generator from the Amsterdam Compiler Kit.
1881 Vrije Universiteit, Amsterdam, (July 11, 1983).
1882 .RS
1883 .PP
1884 The defining document for writing a back end table.
1885 .RE
1886 .IP 3.
1887 Tanenbaum, A.S. Structured Computer Organization.
1888 Prentice Hall. (1976).
1889 .RS
1890 .PP
1891 In this book computers are described as a hierarchy of levels,
1892 with each one performing some well-defined function.
1893 .RE