1 . \" $Id: 6500.doc,v 2.3 1994/06/24 10:01:32 ceriel Exp $"
6 A backend table for the 6500 microprocessor
11 The backend table is part of the Amsterdam Compiler Kit (ACK).
12 It translates the intermediate language family EM to a machine
13 code for the MCS6500 microprocessor family.
18 THE MCS6500 MICROPROCESSOR.
24 Why a back end table for the MCS6500 microprocessor family.
25 Although the MCS6500 microprocessor family has an simple
26 instruction set and internal structure, it is used in a
27 variety of microcomputers and homecomputers.
28 This is because of is low cost.
29 As an example the Apple II, a well known and width spread
30 microprocessor, uses the MCS6502 CPU.
31 Also the BBC homecomputer, whose popularity is growing day
32 by day uses the MCS6502 CPU.
33 The BBC homecomputer is based on the MCS6502 CPU although
34 better and stronger microprocessors are available.
35 The designers of Acorn computer Industries have probably
36 choosen for the MCS6502 because of the amount of software
37 available for this CPU.
38 Since its width spreaded use, a variaty of software
39 will be needed for it.
40 One can think of games!!, administration programs,
41 teaching programs, basic interpreters and other application
43 Even do it will not be possible to run the total compiler kit
44 on a MCS6500 based computer, it is possible to write application
45 programs in a high level language, such as Pascal or C on a
47 These application programs can be tested and compiled on that
48 minicomputer and put in a ROM (Read Only Memory), for example,
49 cso that it an be executed by a MCS6500 CPU.
50 The strategy of writing testprograms on a minicomputer,
51 compile it and then execute it on a MCS6500 based
52 microprocessor is used by the development of the back end.
53 The minicomputer used is M68000 based one, manufactured by
54 Bleasdale Computer Systems Ltd..
55 The micro- or homecomputer used is a BBC microcomputer,
56 manufactured by Acorn Computer Ltd..
58 The MOS Technology MCS6500
60 The MCS6500 is as a family of CPU devices developed by MOS
62 The members of the MCS6500 family are the same chips in a
64 The MCS6502, the big brother in the family, can handle 64k
65 bytes of memory, while for example the MCS6504 can only handle
67 This difference is due to the fact that the MCS6502 is in a
68 40 pins house and the MCS6504 has a 28 pins house, so less
69 address lines are available.
72 The MCS6500 CPU programmable registers
74 The MCS6500 series is based on the same chip so all have the
75 same programmable registers.
80 The accumulator A is the only register on which the arithmetic
81 and logical instructions can be used.
82 For example, the instruction ADC (add with carry) adds the
83 contents of the accumulator A and a byte from memory or data.
87 As the name suggests this register can be used for some
88 indirect addressing modes.
89 The modes are explaned below.
93 This register is, just as the index register X, used for
94 certain indirect addressing modes.
95 These addressing modes are different from the modes which
98 The program counter PC
100 This is the only 16-bit register available.
101 It is used to point to the next instruction to be
106 The stack pointer is an 8-bit register, so the stack can contain
108 The CPU always appends 00000001 as highbyte of any stack address,
109 which means that memory locations
117 are permanently assigned to the stack.
122 The status register maintains six status flags and a master
123 interrupt control bit.
125 These are the six status flags:
137 The bit (i) is the master interrupt control bit.
139 The MCS6500 memory layout.
141 In the MCS6500 memory space three area's have special meaning.
150 MCS6500 memory is divided up into pages.
151 These pages consist 256 bytes.
152 So in a memory address the highbyte denotes the page number
153 and the lowbyte the offset within the page.
157 When a MCS6500 is restared it jumps indirect via memory address
169 (highbyte) there must be the address of the bootstrap subroutine.
170 When a break instruction (BRK) occurs or an interrupt takes place,
171 the MCS6500 jumps indirect through memory address
182 thus, must contain the address of the interrupt routine.
183 The former only goes for maskeble interrupt.
184 There also exist a nonmaskeble interrupt.
185 This cause the MCS6500 to jump indirect through memory address
189 So the top six bytes of memory are used by the operating system
190 and therefore not available for the back end.
194 This page has a special meaning in the sence that addressing
195 this page uses special opcodes.
196 Since a page consists of 256 bytes, only one byte is needed
197 for addressing zero page.
198 So an instruction which uses zero page occupies two bytes.
199 It also uses less clock cycle's while carrying out the instruction.
200 Zero page is also needed when indirect addressing is used.
201 This means that when indirect addressing is used, the address must
202 reside in zero page (two consecutive bytes).
203 In this case (the back end), zero page is used, for example
204 to hold the local base, the second local base, the stack pointer
209 The stack is described in paragraph 3.5 about the MCS6500
210 programmable registers.
212 The memory adressing modes
214 MCS6500 memory reference instructions use direct addressing,
215 indexed addressing, and indirect addressing.
219 Three-byte instructions use the second and third bytes of the
220 object code to provide a direct 16-bit address:
221 therefore, 65.536 bytes of memory can be addressed directly.
222 The commonly used memory reference instructions also have a two-byte
223 object code variation, where the second byte directly addresses
224 one of the first 256 bytes.
226 Base page, indexed addressing.
228 In this case, the instruction has two bytes of object code.
229 The contents of either the X or Y index registers are added to the
230 second object code byte in order to compute a memory address.
231 This may be illustrated as follows:
233 Base page, indexed addressing, as illustrated above, is
234 wraparound - which means that there is no carry.
235 If the sum of the index register and second object code byte contents
240 , the carry bit will be dicarded.
241 This may be illustrated as follows:
244 Absolute indexed addressing.
246 In this case, the contents of either the X or Y register are added
247 to a 16-bit direct address provided by the second and third bytes
248 of an instruction's object code.
249 This may be illustrated as follows:
254 Instructions that use simple indirect addressing have three bytes of
256 The second and third object code bytes provide a 16-bit address;
257 therefore, the indirect address can be located anywhere in
259 This is straightforward indirect addressing.
261 Pre-indexed indirect addressing.
263 In this case, the object code consists of two bytes and the
264 second object code byte provides an 8-bit address.
265 Instructions that use pre-indexed indirect addressing add the contents
266 of the X index register and the second object code byte to access
267 a memory location in the first 256 bytes of memory, where the
268 indirect address will be found:
270 When using pre-indexed indirect addressing, once again wraparound
271 addition is used, which means that when the X index register contents
272 are added to the second object code byte, any carry will be discarded.
273 Note that only the X index register can be used with pre-indexed
276 Post-indexed indirect addressing.
278 In this case, the object code consists of two bytes and the
279 second object code byte provides an 8-bit address.
280 Now the second object code byte indentifies a location
281 in the first 256 bytes of memory where an indirect address
283 The contents of the Y index register are added to this indirect
285 This may be illustrated as follows:
287 Note that only the Y index register can be used with post-indexed
291 What the CPU has and doesn't has.
293 Although the designers of the MCS6500 CPUs family state that
294 there is nothing very significant about the short stack (only
295 256 bytes) this stack caused problems for the back end.
296 The designers say that a 256-byte stack usually is sufficient
297 for any typical microcomputer, this is only true if the stack
298 is used only for return addresses of the JSR (jump to
299 subroutine) instruction.
300 But since the EM machine is suppost to be a stack machine and
301 high level languages need the ability of parameters and
302 locals in there procedures and function, this short stack
304 So an software stack is implemented in this back end, requiring two
305 additional subroutines for stack handling.
306 These two stack handling subroutines slow down the processing time
307 of a program since the stack is used heavely.
309 Since parameters and locals of EM procedures are offseted
310 from the localbase of that procedure, indirect addressing
312 Offsets are positive (for parameters) and negative (for
314 As explaned before the addressing modes the MCS6500 have a
315 post indexed indirect addressing mode.
316 This addressing mode can only handle positive offsets.
317 This raises a problem for accessing the local variables
318 I have chosen for the next solution.
319 A second local base is introduced.
320 This second local base is the real local base subtracted by
322 In the present situation of the back end the value of BASE
324 This means that there are 240 bytes reseved for local
325 variables to be indirect addressed and 14 bytes for
333 Description of the machine table.
335 The machine description table consists of the following sections:
337 The macro definitions.
339 Constant definitions.
341 Register definitions.
357 The macro definitions at the top of the table are expanded
358 by the preprocessor on occurence in the rest of the table.
360 Constant definitions.
362 There are three constants which must be defined at first.
365 Number of bytes in a machine word.
366 This is the number of bytes a simple
370 instruction will put on the stack.
372 Number of bytes in a pointer.
373 This is the number of bytes a
377 instruction will put on the stack.
379 Number of bytes in the hole between AB and LB.
380 The calling sequence only saves LB on the stack so this
381 constant is equal to the pointer size.
383 Register definitions.
385 The only important register definition is the definition of
387 Since the rest of the machine's registers Y, PC, ST serve
388 special purposes, the code generator cannot use them.
392 There is a fake token.
393 This token is put in the table, since the code generator generator
394 complains if it cannot find one.
396 Token expression definitions.
398 The token expression is also a fake one.
399 This token expression is put in the table, since the code generator
400 generator complains if it cannot find one.
404 The code rule section is the largest section in the table.
405 They specify EM patterns, stack patterns, code to be generated,
409 EM pattern '|' stack pattern '|' code '|'
410 stack replacement '|' EM replacement '|'
412 All patterns are optional, however there must be at least one
414 If the EM pattern is missing the rule becomes a rewriting
419 to be used when code generation cannot continue because of an
420 invalid stack pattern.
421 The code rules are preceeded by the word CODE:.
425 The EM pattern consists of a list of EM mnemonics followed by
426 a boolean expression. Examples:
445 is a pattern that will match
472 will match for example
489 A missing boolean expession evaluates to TRUE.
491 The code generator will match the longest EM pattern on every occasion,
492 if two patterns of the same length match the first in the table
493 will be chosen, while all patterns of length greater than or equal
494 to three are considered to be of the same length.
498 The only stack pattern that can occur is R16, which means that the
499 registerpair AX contains the word on top of the stack.
500 If this is not the case a coersion occurs.
501 This coersion generates a "jsr Pop", which means that the top
502 of the stack is popped and stored in the registerpair AX.
506 The code part consists of three parts, stack cleanup, register
507 allocation, and code to be generated.
508 All of these may be omitted.
512 When generating something like a branch instruction it might be
513 needed to empty the fake stack, that is, remove the AX registerpair.
514 This is done by the instruction remove(ALL)
518 If the machine code to be generated uses the registerpair AX,
519 this is signaled to the code generator by the allocate(R16)
521 If the registerpair AX resides on the fake stack, this will result
522 in a "jsr Push", which means that the registerpair AX is pushed on
523 the stack and will be free for further use.
524 If registerpair AX is not on the fake stack nothing happens.
526 Code to be generated.
528 Code to be generated is specified as a list of items of the following
531 A string in double quotes("This is a string").
532 This is copied to the codefile and a newline ('\n') is appended.
533 Inside the string all normal C string conventions are allowed,
534 and substitutions can be made of the following sorts.
537 $1, $2 etc. These are the operand of the corresponding EM
538 instructions and are printed according to there type.
539 To put a real '$' inside the string it must be doubled ('$$').
541 %[1], %[2.reg], %[b.1] etc. these have there obvious meaning.
542 If they describe a complete token (%[1]) the printformat for
544 If they stand fo a basic term in an expression they will be
545 printed according to their type.
546 To put a real '%' inside the string it must be doubled ('%%').
548 %( arbitrary expression %). This allows inclusion of arbitrary
549 expressions inside strings.
550 Usually not needed very often, so that the akward notation
552 Note that %(%[1]%) is equivalent to %[1].
557 The stack replacement is a possibly empty list of items to be
558 pushed on the fake stack.
559 Three things can occur:
561 %[1] is used if the registerpair AX was on the fake stack and is
562 to be pushed back onto it.
564 %[a] is used if the registerpair AX is allocated with allocate(R16)
565 and is to be pushed onto the fake stack.
567 It can also be empty.
571 In exeptional cases it might be useful to leave part of the an EM
577 instruction might be split into two
581 instructions when there is no 4-byte quantity on the stack.
582 The EM replacement part allows one to express this.
595 The instructions are inserted in the stream so they can match
596 the first part of a pattern in the next step.
597 Note that since the code generator traverses the EM instructions
598 in a strict linear fashion, it is impossible to let the EM
599 replacement match later parts of a pattern.
600 So if there is a pattern
626 will be processed first, then the
630 might be split into two
634 's but the pattern cannot match now.
638 This definition is a fake. This definition is put in the
639 table, since the code generator generator complains if it
644 Test definitions aren't used by the table.
648 When the generator has to push the registerpair AX, it must
650 The machine code to be generated is defined here.
654 The above description of the machine table is
655 a description of the table for the MCS6500.
656 It uses only a part of the possibilities which the code generator
658 For a more precise and detailed description see [2].
667 The code rules are divided in 15 groups.
674 Integer arithmetic instructions.
676 Unsigned arithmetic instructions.
678 Floating point arithmetic instructions.
680 Pointer arithmetic instructions.
682 Increment, decrement and zero instructions.
684 Convert instructions.
686 Logical instructions.
688 Set manipulation instructions.
692 Compare instructions.
696 Procedure call instructions.
698 Miscellaneous instructions.
700 From all of these groups one or two typical EM pattern will be explained
701 in the next paragraphs.
702 Comment is placed between /* and */ (/* This is a comment */).
706 The load instructions.
708 In this group a typical instruction is
717 instruction pushes the word at local base + offset, where offset
718 is the instructions argument, onto the stack.
719 Since the MCS6500 can only offset by 256 bytes, as explaned at the
720 memory addressing modes, there is a need for two code rules in the
722 One which can offset directly and one that must explicit
723 calculate the address of the local.
725 The lol instruction with indirect offsetting.
727 In this case an indirect offsetted load from the second local base
729 The table content is:
737 allocate(R16) /* allocate registerpair AX */
739 "ldy #BASE+$1" /* load Y with the offset from the second
743 "lda (LBl),y" /* load indirect the lowbyte of the word */
745 "tax" /* move register A to register X */
747 "iny" /* increment register Y (offset) */
749 "lda (LBl),y" /* load indirect the highbyte of the word */
751 | %[a] | | /* push the word onto the fake stack */
753 The lol instruction whose offset is to big.
755 In this case, the library subroutine "Lol" is used.
756 This subroutine expects the offset in registerpair AX, then
757 calculates the address of the local or parameter, and loads
758 it into registerpair AX.
759 The table content is:
767 allocate(R16) /* allocate registerpair AX */
769 "lda #[$1].h" /* load highbyte of offset into register A */
771 "ldx #[$1].l" /* load lowbyte of offset into register X */
773 "jsr Lol" /* perform the subroutine */
775 | %[a] | | /* push word onto the fake stack */
777 The store instructions.
779 In this group a typical instruction is
787 instruction poppes a word from the stack and stores it in the word
788 at local base + offset, where offset is the instructions argument.
789 Here also is the need for two code rules in the table as a result
790 of the offset limits.
792 The stl instruction with indirect offsetting.
794 In this case it an indirect offsetted store from the second local
796 The table content is:
802 IN($1) | R16 | /* expect registerpair AX on top of the
806 "ldy #BASE+1+$1" /* load Y with the offset from the
810 "sta (LBl),y" /* store the highbyte of the word from A */
812 "txa" /* move register X to register A */
814 "dey" /* decrement offset */
816 "sta (LBl),y" /* store the lowbyte of the word from A */
820 The stl instruction whose offset is to big.
822 In this case the library subroutine 'Stl' is used.
823 This subroutine expects the offset in registerpair AX, then
824 calculates the address, poppes the word stores it at its place.
825 The table content is:
833 allocate(R16) /* allocate registerpair AX */
835 "lda #[$1].h" /* load highbyte of offset in register A */
837 "ldx #[$1].l" /* load lowbyte of offset in register X */
839 "jsr Stl" /* perform the subroutine */
843 Integer arithmetic instructions.
845 In this group typical instructions are
853 These instructions, in this table, are implemented for 2-byte
855 The only arithmetic instructions available on the MCS6500 are
856 the ADC (add with carry), and SBC (subtract with not(carry)).
857 Not(carry) here means that in a subtraction, the one's complement
858 of the carry is taken.
859 The absence of multiply and division instructions forces the
860 use of subroutines to handle these cases.
861 Because there are no registers left to perform on the multiply
862 and division, zero page is used here.
863 The 4-byte integer arithmetic is implemented, because in C there
864 exists the integer type long.
865 A user is freely to use the type long, but will pay in performance.
877 2) instruction there are many EM
878 patterns, so that the instruction can be performed in line in
880 For the worst case there exists a subroutine in the library
881 which deals with the EM instruction.
890 4) there only is a subroutine to deal with it.
897 (IN($1) && IN($2) && $3==2) | | /* is it in range */
899 allocate(R16) /* allocate registerpair AX */
901 "ldy #BASE+$1+1" /* load Y with offset for first operand */
903 "lda (LBl),y" /* load indirect highbyte first operand */
905 "pha" /* save highbyte first operand on hard_stack */
907 "dey" /* decrement offset first operand */
909 "lda (LBl),y" /* load indirect lowbyte first operand */
911 "ldy #BASE+$2" /* load Y with offset for second operand */
913 "clc" /* clear carry for addition */
915 "adc (LBl),y" /* add the lowbytes of the operands */
917 "tax" /* store lowbyte of result in place */
919 "iny" /* increment offset second operand */
921 "pla" /* get highbyte first operand */
923 "adc (LBl),y" /* add the highbytes of the operands */
925 | %[a] | | /* push the result onto the fake stack */
933 2 instruction uses most the subroutine 'Mlinp'.
934 This subroutine expects the multiplicand in zero page
935 at locations ARTH, ARTH+1, while the multiplier is in zero
936 page locations ARTH+2, ARTH+3.
937 For a description of the algorithms used for multiplication and
945 (IN($1) && IN($2) && $3==2) | |
947 allocate(R16) /* allocate registerpair AX */
949 "ldy #BASE+$1" /* load Y with offset of multiplicand */
951 "lda (LBl),y" /* load indirect lowbyte of multiplicand */
953 "sta ARTH" /* store lowbyte in zero page */
955 "iny" /* increment offset of multiplicand */
957 "lda (LBl),y" /* load indirect highbyte of multiplicand */
959 "sta ARTH+1" /* store highbyte in zero page */
961 "ldy #BASE+$2" /* load Y with offset of multiplier */
963 "lda (LBl),y" /* load indirect lowbyte of multiplier */
965 "sta ARTH+2" /* store lowbyte in zero page */
967 "iny" /* increment offset of multiplier */
969 "lda (LBl),y" /* load indirect highbyte of multiplier */
971 "sta ARTH+3" /* store highbyte in zero page */
973 "jsr Mlinp" /* perform the multiply */
975 | %[a] | | /* push result onto fake stack */
977 The unsgned arithmetic instructions.
979 Since unsigned addition an subtraction is performed in the same way
980 as signed addition and subtraction, these cases are dealt with by
982 For mutiplication and division there are special subroutines.
986 This is an example of the EM replacement strategy.
1006 Floating point arithmetic.
1008 Floating point arithmetic isn't implemented in this table.
1010 Pointer arithmetic instructions.
1012 A typical pointer arithmetic instruction is
1017 This instruction adds an offset and a pointer.
1034 Increment, decrement and zero instructions.
1036 In this group a typical instruction is
1040 , which increments a local or parameter.
1041 The MCS6500 doesn't have an instruction to increment the
1042 accumulator A, so the 'ADC' instruction must be used.
1051 allocate(R16) /* allocate registerpair AX */
1053 "ldy #BASE+$1" /* load Y with offset of the local */
1055 "clc" /* clear carry for addition */
1057 "lda (LBl),y" /* load indirect lowbyte of local */
1059 "adc #1" /* increment lowbyte */
1061 "sta (LBl),y" /* restore indirect the incremented lowbyte */
1063 "bcc 1f" /* if carry is clear then ready */
1065 "iny" /* increment offset of local */
1067 "lda (LBl),y" /* load indirect highbyte of local */
1069 "adc #0" /* add carry to highbyte */
1071 "sta (LBl),y\\n1:" /* restore indirect the highbyte */
1073 If the offset of the local or parameter is to big, first the
1074 local or parameter is fetched, than incremented, and then
1077 Convert instructions.
1079 In this case there are two convert instructions
1080 which really do something.
1081 One of them is in line code, and deals with the extension of
1082 a character (1-byte) to an integer.
1083 The other one is a subroutine which handles the conversion
1084 between 2-byte integers and 4-byte integers.
1086 The in line conversion.
1088 The table content is:
1094 $1==1 && $2==2 | R16 |
1096 "txa" /* see if sign extension is needed */
1098 "bpl 1f" /* there is no need for sign extension */
1100 "lda #0FFh" /* sign extension here */
1102 "bne 2f" /* conversion ready */
1104 "1: lda #0\\n2:" /* no sign extension here */
1106 Logical instructions.
1108 A typical instruction in this group is the logical
1112 on two 2-byte words.
1117 on groups of more than two bytes (max 254)
1118 is also possible and uses a library subroutine.
1120 The logical and on 2-byte groups.
1122 The table content is:
1128 $1==2 | R16 | /* one group must be on the fake stack */
1130 "sta ARTH+1" /* temporary save of first group highbyte */
1132 "stx ARTH" /* temporary save of first group lowbyte */
1134 "jsr Pop" /* pop second group from the stack */
1136 "and ARTH+1" /* logical and on highbytes */
1138 "pha" /* temporary save the result's highbyte */
1140 "txa" /* logical and can only be done in A */
1142 "and ARTH" /* logical and on lowbytes */
1144 "tax" /* restore results lowbyte */
1146 "pla" /* restore results highbyte */
1148 | %[1] | | /* push result onto fake stack */
1150 Set manipulation instructions.
1152 A typical EM pattern in this group is
1156 $1>0 && $1<16 && $2==2.
1157 This EM pattern works on sets of 16 bits.
1158 Sets can be bigger (max 256 bytes = 2048 bits), but than a
1159 library routine is used instead of in line code.
1160 The table content of the above EM pattern is:
1166 $1>0 && $1<16 && $2==2 | R16 |
1168 "ldy #$1+1" /* load Y with bit number */
1170 "stx ARTH" /* cannot rotate X, so use zero page */
1172 "1: lsr a" /* right shift A */
1174 "ror ARTH" /* right rotate zero page location */
1176 "dey" /* decrement Y */
1178 "bne 1b" /* shift $1 times */
1180 "bcc $1" /* no carry, so bit is zero */
1184 In this group a typical EM pattern is
1188 defined(rom(1,3)) | | | |
1201 This pattern uses the
1205 instruction, which is part of a typical EM pattern:
1211 $2==2 && rom(1,3)==2 && rom(1,1)==0 | R16 | /* registerpair AX contains
1212 the index in the array */
1214 "pha" /* save highbyte of index */
1216 "txa" /* move lowbyte of index to A */
1218 "asl a" /* shift left lowbyte == 2 times lowbyte */
1220 "tax" /* restore lowbyte */
1222 "pla" /* restore highbyte */
1224 "rol a" /* rotate left highbyte == 2 times highbyte */
1226 | %[1] | adi 2 | /* push new index, add to lowerbound array */
1228 Compare instructions.
1230 In this group all EM patterns are performed by calling
1232 Subroutines are used here because comparison is only
1233 possible byte by byte.
1234 This means a lot of code, and since compare are used frequently
1235 a lot of in line code would be generated, and thus reducing
1236 the space left for the software stack.
1237 These subroutines can be found in the library.
1239 Branch instructions.
1241 A typical branch instruction is
1245 The table content for it is:
1253 "sta BRANCH+1" /* save highbyte second operand in zero page */
1255 "stx BRANCH" /* save lowbyte second operand in zero page */
1257 "jsr Pop" /* pop the first operand */
1259 "cmp BRANCH+1" /* compare the highbytes */
1261 "bne 1f" /* there not equal so go on */
1263 "cpx BRANCH" /* compare the lowbytes */
1265 "beq $1\\n1:" /* lowbytes are also equal, so branch */
1267 Another typical instruction in this group is
1271 The table content is:
1279 "tay" /* move A to Y for setting testbits */
1281 "bmi $1" /* highbyte s minus so branch */
1283 "txa" /* move X to A for setting testbits */
1285 "beq $1\\n1:" /* lowbyte also zero, thus branch */
1287 Procedure call instructions.
1289 In this group one code generation might seem a little
1291 It is the EM instruction
1295 which generates a 'jsr Indir'.
1296 This is because there is no indirect jump_subroutine in the
1298 The only solution is to store the address in zero page, and then
1299 do a 'jsr' to a known label.
1300 At this label there must be an indirect jump instruction, which
1301 perform a jump to the address stored in zero page.
1302 In this case the label is Indir, and the address is stored in
1303 zero page at the addresses ADDR, ADDR+1.
1304 The tabel content is:
1312 "stx ADDR" /* store lowbyte of address in zero page */
1314 "sta ADDR+1" /* store highbyte of address in zero page */
1316 "jsr Indir" /* use the indirect jump */
1320 Miscellaneous instructions.
1322 In this group, as the name suggests, there is no
1323 typical EM instruction or EM pattern.
1324 Most of the MCS6500 code to be generated uses a library subroutine
1325 or is straightforward.
1334 To measure the performance of the back end table some timing
1337 In this case, the execution time of several Pascal statements
1339 Statements in C, which have a Pascal equivalence are timed also.
1340 The statements are timed as follows.
1341 A test program is been written, which executes two
1342 nested for_loops from 1 to 1.000.
1343 Within these for_loops the statement, which is to be tested, is placed,
1344 so the statement will be executed 1.000.000 times.
1345 Then the same program is executed without the test statement.
1346 The time difference between the two executions is the time
1347 neccesairy to execute the test statement 1.000.000 times.
1348 The total time to execute the test statement requires thus the
1349 time difference divided by 1.000.000.
1351 Testing Pascal statements.
1353 The next statements are tested.
1361 int1 := icon1 - icon2;
1363 int1 := icon2 div icon1;
1365 int1 := int2 * int3;
1371 bool := ((int1 > 3) or (int1 < 3))
1373 case int1 of 1: bool := false; 2: bool := true end;
1375 if int1 = 0 then int2 := 3;
1377 while int1 > 0 do int1 := int1 - 1;
1389 with s do overhead := 5400;
1391 These statement were tested in a procedure test.
1396 var i, j, ... : integer;
1413 STATEMENT is one of the statements as shown above, or it is
1414 the empty statement.
1415 The assignment of used variables, if neccesairy, is done before
1417 In case of the statement which uses the procedure call, statement
1418 15, a dummy procedure is declared whose body is empty.
1419 In case of the statement which uses the function, statement 16,
1420 this function returns its argument.
1421 for the timing of C statements a similar test program was
1431 for (i = 1; i <= 1000; i++)
1433 for (j = 1; j <= 1000; j++)
1442 Here are tables with the results of the time measurments.
1443 Times are in microseconds (10^-6).
1444 Some statements appear twice in the tables.
1445 In the second case an array of 200 integers was declerated
1446 before the variable to be tested, so this variable cannot
1447 be accessed by indirect addressing from the second local base.
1448 This results in a larger execution time of the statement to be
1450 The column 68000 contains the times measured on a Bleasdale,
1451 M68000 based, computer.
1452 The times in column pdp are measured on a DEC pdp11/44, where
1453 the times from column 6500 come from a BBC microcomputer.
1460 Pascal timing results
1461 statement 68000 pdp 6500
1479 int1 := icon1 + icon2;
1484 int1 := icon2 div icon1;
1489 int1 := int2 * int3;
1504 bool := ((int1 > 3) or (int1 < 3))
1509 case int1 of 1: bool := false; 2: bool := true end;
1513 if int1 = 0 then int2 := 3;
1517 while int1 > 0 do int1 := int1 - 1;
1541 with s do overhead := 5400;
1550 statement 68000time pdptime 6500time
1583 int1 = ((int2 > 3) || (int2 < 3));
1588 switch (int1) { case 1: int1 = 0; break; case 2: int1 = 1; break; }
1592 if (int1 == 0) int2 = 3;
1596 while (int2 > 0) int2 = int2 - 1;
1616 Pascal statements which don't have a C equivalent.
1618 At first, the two statements who perform an operation on constants
1620 These are left out while the C front end does constant folding,
1621 while the Pascal front end doesn't.
1622 So in C the statements int1 = icon1 + icon2; and int1 = icon1 / icont2;
1623 will use the same amount of time since the expression is evaluated
1625 The two other statements (let2 := ['a'..'c']; and
1633 overhead := 5400;), aren't included in the C statement timing table,
1634 because there constructs do not exist in C.
1635 Although in C there can be direct bit manipulation, and thus can
1636 be used to implement sets I have not used it here.
1641 statement does not exists in C and there is nothing with the slightest
1644 At first sight in the table , it looked if there is no much difference
1645 in the times for the M68000 and the pdp11/44, in comparison with the
1646 times needed by the MCS6500.
1647 To verify this impression, I calculated the correlation coefficient
1648 between the times of the M68000 and pdp11/44.
1649 It turned out to be 0.997 for both the Pascal time tests and the C
1651 Since the correlation coefficient is near to one and the difference
1652 between the times is small, they can be considered to be the same
1653 as seen from the times of the MCS6500.
1654 Then I have tried to make a grafic of the times from the M68000 and
1656 Well, there was't any correlation to been seen, taken all the times.
1657 The only correlation one could see, with some effort, was in the
1658 times for the first three Pascal statements.
1659 The two first C statements show also a correlation, which two points
1662 Also the three Pascal statements
1675 have a correlation coefficient of 0.999.
1676 This is probably because the
1680 statement uses a subroutine in both cases and the other two
1689 generate in line code.
1690 The last two Pascal statements use the same time, since the front
1691 end wil generate the same EM code for both.
1693 The independence between the rest of the test times is because
1694 in these cases the object code for the MCS6500 uses library
1695 subroutines, while the other processors can handle the EM code
1698 It is clear that the MCS6500 is a slower device, it needs longer
1699 execution times, the need of more library subroutines, but
1700 there is no constant factor between it execution times and those
1701 of other processors.
1703 The slowing down of the MCS6500 as result of the need of a
1704 library subroutine is illustrated by the muliplication
1706 The MCS6500 needs a library subroutine, while the other
1707 two processors have a machine instruction to perform the
1709 This results in a factor of 48.5, when the operands can be accessed
1710 indirect by the MCS6500.
1711 When the MCS6500 cannot access the operands indirectly the situation
1713 The slight differences between the MCS6500 execution times for
1714 Pascal statements and C statements is probably the result of the
1715 front end, and thus beyond the scope of this discussion.
1717 Another timing test is done in C on the statement k = i + j + 1983.
1718 This statement is tested on many UNIX*
1720 * UNIX is a Trademark of Bell Laboratories.
1723 For a complete list see appendix A.
1724 The slowest one is the IBM XT, which runs on a 8088 microprocessor.
1725 The fasted one is the Amdahl computer.
1726 Here is short table to illustrate the performance of the
1736 The MCS6500 is three times slower than the IBM XT, but threehundred
1737 times slower than the Amdahl.
1738 The reason why the times on the IBM XT and the MCS6500 are the
1739 same for short's and int's, is that most C compilers make the types
1740 short and integer the same size on 16-bit machines.
1741 In this project the MCS6500 is regarded as a 16-bit machine.
1745 I have also compiled several programs written in Pascal and C to
1746 see if there is a resemblance between the number of bytes generated
1747 in the machine's language.
1750 The number of bytes of the source program.
1752 The number of bytes of the a.out file for a M68000.
1754 The number of bytes of the a.out file for a pdp11/44.
1756 The number of bytes of the a.out file for a MCS6500.
1758 These are the results:
1764 length 68000 pdp 6500
1766 19946 14383 16090 26710
1767 19484 20169 20190 35416
1768 10849 10469 11464 18949
1770 1854 5807 6610 10301
1777 length 68000 pdp 6500
1779 9444 6927 8234 11559
1780 7655 14353 18240 26251
1781 4775 11309 15934 19910
1785 In contrast to the execution times of the test statements, the
1786 object code files sizes show a constant factor between them.
1787 After calculating the correlation coefficient, I have calculated
1788 the line fitted between sizes.
1790 * x is the number of bytes
1797 processor corr. coef. fitted line
1800 68000-6500 0.999 1.76x + 502*
1801 pdp-6500 0.999 1.80x - 1577
1808 processor corr. coef. fitted line
1811 68000-6500 0.992 1.80x + 502*
1812 pdp-6500 0.980 1.40x - 1577
1815 As seen from the tables above the correlation coefficient for
1816 Pascal programs is better than the ones for C programs.
1817 Thus the line fits best for Pascal programs.
1818 With the formula of the best fitted line one can now estimate
1819 the size of the object code, which a program needs, for a MCS6500
1820 without having the compiler at hand.
1821 One also can see from these formula that the object code
1822 generated for a MCS6500 is about 1.8 times more than for the other
1824 Since the number of bytes in the source file havily depends on the
1825 programmer, how many spaces he or she uses, the size of the indenting
1826 in structured programs, etc., there is no correlation between the
1827 size of the source file and the size of the object file.
1828 Also the use of comments has its influence on the size.
1838 In this chapter some final conclusions are made.
1840 In spite of its simplicity, the MCS6500 is strong enough to
1841 implement a EM machine.
1842 A serious deficy of the MCS6500 is the missing of 16-bit
1843 general purpose registers, and especially the missing of a
1844 16-bit stackpointer.
1845 As pointed out before, one 16-bit register can be simulated
1846 by a pair of 8-bit registers, in fact, the accumulator A to
1847 hold the highbyte, and the index register X to hold the lowbyte
1849 By lack of a 16-bit stackpointer, zero page must be used to hold
1850 a stackpointer and there are also two subroutines needed for
1851 manipulating the stack (Push and Pop).
1853 As seen at the time tests, the simple instruction set of the
1854 MCS6500 forces the use of library subroutines.
1855 These library subroutines increas the execution time of the
1858 The sizes of the object code files show a strong correlation
1859 in contrast to the execution times.
1860 With this correlatiuon one canestimate the size of a program
1861 if it is to be used on a MCS6500.
1868 Osborn, A., Jacobson, S., and Kane, J. The Mos Technology MCS6500.
1870 An Introduction to Microcomputers ,
1872 Volume II, Some Real Products (june 1977) chap. 9.
1875 A hardware description of some real existing CPU's, such as
1876 the Intel Z80, MCS6500, etc. is given in this book.
1880 The table driven code generator from the Amsterdam Compiler Kit.
1881 Vrije Universiteit, Amsterdam, (July 11, 1983).
1884 The defining document for writing a back end table.
1887 Tanenbaum, A.S. Structured Computer Organization.
1888 Prentice Hall. (1976).
1891 In this book computers are described as a hierarchy of levels,
1892 with each one performing some well-defined function.