From 12f56d65015125201b8226fd63021276358a25b9 Mon Sep 17 00:00:00 2001 From: ceriel Date: Tue, 10 Mar 1987 16:49:13 +0000 Subject: [PATCH] Initial revision --- doc/m68020.doc | 1408 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1408 insertions(+) create mode 100644 doc/m68020.doc diff --git a/doc/m68020.doc b/doc/m68020.doc new file mode 100644 index 000000000..8bb6b7f20 --- /dev/null +++ b/doc/m68020.doc @@ -0,0 +1,1408 @@ +.nr PS 11 +.nr VS 13p +.EQ +delim @@ +.EN +.EQ +gfont R +.EN +.ND +.RP +.TL +A back end table for the Motorola MC68000, MC68010 and MC68020 microprocessors +.AU +Frank Doodeman +.AB +A back end table is part of the Amsterdam Compiler Kit (ACK). It is used +to produce the actual back end, a program that translates the intermediate +language family EM to assembly language for some target machine. The table +discussed here can be used for two back ends, suitable for in total three +machines: the MC68000 and MC68010 (the difference between these two is +so small that one back end table can be used for either one), or +for the MC68020. +.AE +.NH +Introduction +.PP +To simplify the task of producing portable (cross) compilers and interpreters +the Vrije Universiteit designed an integrated collection of programs, the +Amsterdam Compiler Kit (ACK) [2]. It is based on the old UNCOL idea [1] which +attempts to solve the problem of how to make a compiler for each of @ N @ +languages on @ M @ different machines without having to write @ N times M @ +programs. +.PP +The UNCOL approach is to write @ N @ +.I +front ends, +.R +which translate the +source language into a common intermediate language UNCOL (Universal Computer +Oriented Language), and @ M @ +.I +back ends, +.R +each of which translates programs in +UNCOL into a specific machine language. Under these conditions only @ M + N @ +programs must be written to provide all @ N @ languages on all @ M @ +machines, instead of @ M times N @ programs. +.PP +The intermediate language for the Amsterdam Compiler Kit is the machine language +for a simple stack machine called EM (Encoding Machine) [3]. So a back end for +the MC68020 translates EM code into MC68020 assembly language. Writing such a +table [4] suffices to get the back end. +.PP +The back end is a single program that is driven by a machine dependent driving +table. This table, the back end table, defines the mapping of EM code to +the MC68000, MC68010 or MC68020 assembly language. +.NH +The MC68000 and MC68020 micro processors +.PP +In this document the name MC68000 will be used for both the MC68000 and the +MC68010 micro processors, because as far as the back end table is concerned +there is no difference between them. For a complete and detailed description +of the MC68020 one is referred to [5]; for the MC68000 one might also use [6]. +In this section some relevant parts will be handled. +.NH 2 +Registers +.PP +Both the MC68000 and the MC68020 have eight 32-bit data registers (@ D sub 0 @-@ D sub 7 @) that can +be used for byte (8-bit), word (16-bit) and long word (32-bit) data operations. +They also have seven 32-bit address registers (@ A sub 0 @-@ A sub 6 @) that may be used as +software stack pointers and base address registers; address register @ A sub 7 @ is +used as the system stack pointer. Address registers may also be used for +word and long word address operations. +.NH 2 +Addressing modes +.PP +First the MC68000 addressing modes will be discussed. Since the MC68020's +set of addressing modes is an extension of the MC68000's set, of course this +section also applies to the MC68020. +.PP +In the description we use: +.IP @ A sub n @ +for address register; +.IP @ D sub n @ +for data register; +.IP @ R sub n @ +for address or data register; +.IP @ X sub n @ +for index register (either data or address register); +.IP @ PC @ +for program counter; +.IP @ d sub 8 @ +for 8 bit displacement integer; +.IP @ d sub 16 @ +for 16 bit displacement integer; +.IP @ bd @ +for base displacement (may be null, word or long); +.IP @ od @ +for outer displacement (may be null, word or long). +.NH 3 +General addressing modes +.NH 4 +Register Direct Addressing +.IP Syntax: 8 +@ R sub n @ +.PP +This addressing mode (it can be used with either a data register or an address +register) specifies that the operand is in one of +the 16 multifunction registers. +.NH 4 +Address Register Indirect +.IP Syntax: 8 +@ ( A sub n ) @ +.PP +The address of the operand is in the address register specified. +.NH 4 +Address Register Indirect With Postincrement +.IP Syntax: 8 +@ ( A sub n )+ @ +.PP +The address of the operand is in the address register specified. After the +operand address is used, the address register is incremented by one, two or +four depending upon whether the size of the operand is byte, word or long. +If the address register is the stack pointer and the operand size is byte, the +address register is incremented by two rather than one to keep the stack pointer +on a word boundary. +.NH 4 +Address Register Indirect With Predecrement +.IP Syntax: 8 +@ -( A sub n ) @ +.PP +The address of the operand is in the address register specified. Before the +operand address is used, the address register is decremented by one, two or +four depending upon whether the size of the operand is byte, word or long. +If the address register is the stack pointer and the operand size is byte, the +address register is decremented by two rather than one to keep the stack pointer +on a word boundary. +.NH 4 +Address Register Indirect With Displacement +.IP Syntax: 8 +@ d sub 16 ( A sub n ) @ for the MC68000, @ ( d sub 16 , A sub n ) @ for the MC68020 +.PP +This address mode requires one word of extension. The address of the operand is +the sum of the contents of the address register and the sign extended 16-bit +integer in the extension word. +.NH 4 +Address Register Indirect With Index +.IP Syntax: 8 +@ d sub 8 ( A sub n , X sub n .size) @ for the MC68000, @ ( d sub 8 , A sub n , X sub n .size) @ for the MC68020 +.PP +This address mode requires one word of extension according to a certain format, +which specifies +.IP 1. +which register to use as index register; +.IP 2. +a flag that indicates whether the index register is a data register or an +address register; +.IP 3. +a flag that indicates the index size; this is +.I word +when the low order part of the index register is to be used, and +.I long +when the whole long value in the register is to be used as index; +.IP 4. +an 8-bit displacement integer (the low order byte of the extension word). +.PP +The address of the operand is the sum of the contents of the address register, +the possibly sign extended contents of index register and the sign +extended 8-bit displacement. +.NH 4 +Absolute Data Addressing +.IP Syntax: 8 +@ address @ for the MC68000, @ ( address ) @ for the MC68020 +.PP +Two different kinds of this mode are available: +.IP 1. +Absolute Short Address; this mode requires one word of extension. The address of +the operand is the sign extended 16-bit extension word. +.IP 2. +Absolute Long Address; this mode requires two words of extension. The address of +the operand is developed by concatenation of the two extension words; the high +order part of the address is the first extension word, the low order part is +the second. +.NH 4 +Program Counter With Displacement. +.IP Syntax: 8 +@ d sub 16 ( PC ) @ for the MC68000, @ ( d sub 16 , PC ) @ for the MC68020 +.PP +This mode requires one word of extension. The address of the operand is the sum +of the address in the program counter and the sign extended 16-bit displacement +integer in the extension word. The value in the program counter is the +address of the extension word. +.NH 4 +Program Counter With Index +.IP Syntax: 8 +@ d sub 8 ( PC , X sub n .size ) @ for the MC68000, @ ( d sub 8 , PC, X sub n .size ) @ for the MC68020 +.PP +This mode requires one word of extension as described under +.I +Address Register Indirect With Index. +.R +The address of the operand is the sum of the value in the +program counter, the possibly sign extended index register and the sign +extended 8-bit displacement integer in the extension word. +The value in the program counter is the address of the extension word. +.NH 4 +Immediate Data +.IP Syntax: 8 +@ \#data @ +.PP +This addressing mode requires either one or two words of extension, depending +on the size of the operation; +.IP +byte operation - the operand is in the low order byte of extension word; +.IP +word operation - the operand is in the extension word; +.IP +long operation - the operand is in the two extension words, the high order +16-bits are in the first extension word, the low order 16-bits in the second. +.NH 3 +Extra MC68020 addressing modes +.PP +The MC68020 has three more addressing modes. These modes all use a displacement +(some even two), an address register and an index register. Instead of the +address register one may also use the program counter. Any of these +may be omitted. If all addends are omitted the processor creates an +effective address of zero. All of these three modes require at least one +extension word, the +.I +Full Format Extension Word, +.R +which specifies: +.IP 1. +the index register number (0-7); +.IP 2. +the index register type (address or data register); +.IP 3. +the size of the index (only low order part or the whole register) +.IP 4. +a scale factor. This is a number from 0 to 3 which specifies how many bits +the contents of the index register is to be shifted to the left before being +used as an index; +.IP 5. +a flag that specifies whether the base (address) register is to be added or +to be suppressed; +.IP 6. +a flag that specifies whether to add or suppress the index operand; +.IP 7. +two bits that specify the size of the base displacement (null, word or long); +.IP 8. +three bits that in combination with (6) above specify which of the three +addressing modes (described below) to use and, if used, the size of the +outer displacement (null, word or long). +.IP N.B. +All modes mentioned above for the MC68000 +that use an index register may have this register +scaled (only when using the MC68020). +.PP +The three extra addressing modes are: +.NH 4 +Address Register Indirect With Index (Base Displacement) +.IP Syntax: 8 +@ ( bd , A sub n , X sub n .size*scale ) @ (MC68020 only) +.PP +The address of the operand is the sum of the contents of the address register, +the scaled contents of the possibly scaled index register and the possibly +sign extended base displacement. When the program counter is used instead +of the address register, the value in the program counter is the address +of the full format extension word. This mode requires one or two more extension +words when the size of the base displacement is word or long respectively. +.PP +Note that without the index operand, this mode is an extension of the +.I +Address Register Indirect With Displacement +.R +mode; when using the MC68020 one is no longer limited to a 16-bit displacement. +Also note that with the index operand added, this mode is an extension +of the +.I +Address Register Indirect With Index +.R +mode; when using the MC68020 one is no longer limited to an 8-bit displacement. +.NH 4 +Memory Indirect Post-Indexed +.IP Syntax: 8 +@ ( [ bd , A sub n ] , X sub n .size*scale , od ) @ (MC68020 only) +.PP +This mode may use an outer displacement. First an intermediate memory +address is calculated by adding the contents of the address register and +the possibly sign extended base displacement. This address is used +for in indirect memory access of a long word, followed by adding +the index operand (scaled and possibly signed extended). Finally the +outer displacement is added to yield the address of the operand. +When the program counter is used, the value in the program counter is the +address of the full format extension word. +.NH 4 +Memory Indirect Pre-Indexed +.IP Syntax: 8 +@ ( [ bd , A sub n , X sub n .size*scale ] , od ) @ (MC68020 only) +.PP +This mode may use an outer displacement. First an intermediate memory +address is calculated by adding the contents of the address register, +the scaled contents of the possibly sign extended index register and +the possibly sign extended base displacement. This address is used +for an indirect memory access of a long word, followed by adding +the outer displacement to yield the address of the operand. +When the program counter is used, the value in the program counter is the +address of the full format extension word. +.NH 3 +Addressing modes used in the table +.PP +Not all addressing modes mentioned above are used in code generation. It is +clear that none of the modes that use the program counter PC can be used, +since at code generation time nothing is known about the value in PC. +Also some of the possibilities of the three MC68020 addressing modes are not +used; e.g. it is possible to use a +.I +Data Register Indirect +.R +mode, which actually is the +.I +Address Register Indirect With Index +.R +mode, with the address register and the displacement left out. However +such a mode would require two extra bytes for the full format extension word, +and it would also be much slower than using +.I +Address Register Indirect. +.R +For this kind of reasons several possible addressing modes are not used in the +generation of code. +In the table address registers are only used for holding addresses, and +for index registers only data registers are used. +.NH +The M68000 and MC68020 back end table +.PP +The table itself has to be run through the C preprocessor +before it can be used to generate +the back end (called +.I +code generator +.R +or +.I cg +for short). When no flags are given to +the preprocessor an MC68020 code generator is produced; for the MC68000 +code generator one has to run the table through the preprocessor using the +.I -Dm68k4 +flag. +.PP +The table is designed as described in [4]. For the overall design of a back +end table one is referred to this document. This section only deals +with problems encountered in writing the table and other things worth noting. +.NH 2 +Constant Definitions +.PP +Wordsize and pointersize (EM_WSIZE and EM_PSIZE respectively) are defined +as four (bytes). EM_BSIZE, the hole between AB (the parameter base) and +LB (the local base), is eight bytes: only +the return address and the localbase are saved. +.NH 2 +Properties +.PP +Since Hans van Staveren in his document [4] clearly states that +.I cg +execution time is negatively influenced by the number of properties, only +four different properties have been defined. Besides, since the registers +really are multifunctional, these four are really all that are needed. +.NH 2 +Registers +.PP +The table uses register variables: @ D sub 3 @ - @ D sub 7 @ are used as general register +variables, and address registers @ A sub 2 @ - @ A sub 5 @ are used as pointer register +variables. @ A sub 6 @ is reserved for the localbase. +.NH 2 +Tokens +.PP +At first glance one might wonder about the amount of tokens, especially +for the MC68020, considering the small amount of different addressing modes. +However, the last three addressing modes mentioned for the MC68020 may +omit any of the addends, and this leads to a large amount of different tokens. +I did consider the possibility of enlarging the number of tokens and sets +even further, because there might be assemblers that don't handle displacements +of zero optimally (they might generate a 2 byte extension word holding zero). +The small profit in bytes in the generated code +however does not justify the increase +in size of the token section, the set section and the patterns section, +so this idea was not developed any further. +.PP +The timing cost of the tokens may be incorrect for some MC68000 tokens. +This is because the MC68000 uses a 16-bit data bus which causes the need +of two separate memory accesses for getting 32-bit operands. +.NH 3 +Token names +.PP +The amount of tokens and the limited capability of the authors imagination +might have caused the names of some tokens not to be very clarifying. +Some information about the names may be in place here. +.PP +Whenever part of a token name is in capitals that part is memory indirected +(i.e. in square brackets). In token names +.I OFF +and +.I off +mean an offsetted address register, so an address register with a displacement +(either base displacement or outer displacement). +.I +IND, ind +.R +and +.I index +stand for indexed, or index register. +.I ABS +and +.I abs +stand for absolute, which actually is just a displacement (base or outer). +These `rules' only apply to names of tokens that represent actual operands. +There are also tokens that represent addresses of operands. These +(with a few exceptions) contain +.I +regA, regX +.R +and +.I con +as parts of there names, which stand for address register, index register and +displacement (always base displacement) respectively. If the address to which +the token refers uses memory indirection, that part of the name comes first +(in small letters), followed by an underscore. The memory indirection part +follows the `rules' for operand token names. +.PP +Of course there are exceptions to these `rules' but in those cases the names +are self explanatory. +.PP +Two special cases: +.I ext_regX +is the name of the token that represents the +address of an absolute indexed operand, syntax @ ( bd , X sub n .size*scale ) @; +.I regX +does not represent any real mode, but is used with EM array instructions and +pointer arithmetic. +.NH 3 +Special tokens for the MC68000 +.PP +The MC68000 requires two extra tokens, which are called +.I t_regAcon +and +.I +t_regAregXcon. +.R +They are necessary because +.I regAcon +can only have a 16-bit displacement on the MC68000, and +.I regAregXcon +uses only 8 bits for its displacement. To prevent these addressing modes to +be used with displacements that are too large, the extra tokens are needed. +Whenever the displacements become too large and they need +to be used in the generation +of assembly code, these tokens are transformed into other tokens. +To prevent the table from becoming too messy I defined +.I t_regAcon +and +.I t_regAregXcon +to be identical to +.I regAcon +and +.I regAregXcon +respectively for the MC68020. +.NH 2 +Sets +.PP +Most set names used in the table are self explanatory, especially to the reader +who is familiar with the four addressing categories as mentioned in [5]: +.I +data, memory, alterable +.R +and +.I +control. +.R +In the sets definition part some sets are defined that are not used elsewhere in +the table, but are only used to be part of the definition of +some other set. This keeps the +set definition part from getting too unreadable. +.PP +The sets called +.I imm_cmp +consist of all tokens that can be used to compare with a constant. +.NH 2 +Instructions +.PP +Only the instructions that are used in code generation are listed here. +The first few instructions are meant especially for the use with register +variables. The operand LOCAL used here refers to a register variable. +The reader may not conclude that these operations are also allowed on +ordinary locals. The space and timing cost of these instructions have been +adapted, but the use of the word LOCAL for register variables causes these cost +to be inaccurate anyway. +.PP +The +.I killreg +instruction, which generates a comment in the assembly language output and +which is meant to let +.I cg +know that the data register operand has its contents destroyed, +needs some explaining but this explanation is better in place +in the discussion of groups 3 and 4 of the section about patterns. +.PP +The timing cost of the instructions are probably not very accurate for the +MC68020 because the MC68020 uses an instruction cache and prefetch. The +cost used in the table are the `worst case cost' as mentioned in section 9 +of [5]. +.NH 2 +Moves +.PP +These are all pretty straightforward, except perhaps when +.I t_regAcon +and +.I t_regAregXcon +are used. In these cases the size of the displacement has to be checked +before moving. This also applies to the stacking rules and the coercions. +.NH 2 +Tests +.PP +These three tests (one fore each operation size) could not be more +straightforward than they are now. +.NH 2 +Stackingrules +.PP +The only peculiar stackingrule is the one for +.I +regX. +.R +This token is only used with EM array instructions and +with pointer arithmetic. Whenever it is put +on the fake stack, some EM instructions are left in the instruction stream +to remove this token. Consequently it should never have to be stacked. However +the +.I +code generator generator +.R +(or +.I cgg +for short) +complained about not having a stackingrule for this token, so it had to +be added nevertheless. +.NH 2 +Coercions +.PP +These are all straightforward. There are no splitting coercions since +the fake stack never contains any tokens that can be split. +There are only two unstacking coercions. +The rest are all transforming coercions. Almost all coercions transform +tokens into either a data register or an address register, except in the +MC68000 part of the table the +.I t_regAcon +and +.I t_regAregXcon +tokens are transformed into real +.I regAcon +and +.I regAregXcon +tokens with displacements that are properly sized. +.NH 2 +Patterns +.PP +This is the largest part of the table. It is subdivided into 17 groups. +We will take a closer look at the more interesting groups. +.NH 3 +Group 0: rules for register variables +.PP +This group makes sure that EM instructions using register variables are +handled efficiently. This group includes: local loads and +stores; arithmetic, shifts and logical operations on locals and indirect locals +and pointer handling, where C expressions like +.I +*cp++ +.R +are handled. For such an expression there are several EM instruction +sequences the front end might generate. For an integer pointer e.g.: +.DS +.B +lol lol adp stl loi $1==$2 && $1==$4 && $3==4 && $5==4 +.I +.DE +or +.DS +.B +lol loi lol adp stl $1==$3 && $3==$5 && $2==4 && $5==4 +.I +.DE +or perhaps even +.DS +.B +lil lol adp stl $1==$2 && $2==$4 && $3==4 +.I +.DE +Each of these is included, since which one is generated is is up to the front +end. If the front end is consistent this will mean that some of these patterns +will never be used in code generation. This might seem a waist, but anyone +who thinks that will certainly change his mind when his new C front end +generates a different EM instruction sequence. +.NH 3 +Groups 1 and 2: load and store instructions +.PP +In these groups +.B lof +and +.B stf +, +.B loi +and +.B sti +, +.B ldf +and +.B sdf +are the important instructions. +These are the large parts in this group, especially the +.B loi +and +.B sti +instructions, because they come in three basic sizes (byte, word and long). +Note that with these instructions in the MC68000 part the +.I exact +is omitted in front of +.I regAcon +and +.I +regAregXcon. +.R +This makes sure that +.I t_regAcon +and +.I t_regAregXcon +are transformed into proper tokens before they are used as addresses. +.PP +Also note that the +.I regAregXcon +token is completely left out from the +\fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR +instruction handling. This is because the sum of the token displacement +and the offset provided in the instruction cannot be checked and is likely +to exceed 8 bits. Unfortunately +.I cgg +does not allow the inspection of subregisters of tokens that are on the +fake stack. This same problem might also occur with the +.I regAcon +token, but this is less likely because it +uses 16-bit displacements. Besides if it would have been left out the +\fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR +instructions would have been handled considerably less efficient. +.NH 3 +Groups 3 and 4: integer and unsigned arithmetic +.PP +EM instruction +.B sbi +also works with address registers, because the +.B cmp +instruction in group 12 is replaced by \fBsbi 4\fR. +.PP +For the MC68000 \fBmli\fR, \fBmlu\fR, \fBdvi\fR, \fBdvu\fR, \fBrmi\fR +and \fBrmu\fR are handled +by library routines. This is because the MC68000 has only 16-bit multiplications +and divisions. +.PP +The MC68020 does have 32-bit multiplications and divisions, but for the +.B rmi +and +.B rmu +EM instructions peculiar things happen anyway: they generate the +.I killreg +instruction. This is necessary because the data register that +first held the dividend now holds the quotient; the original contents are +destroyed without +.I cg +knowing about it (the destruction of the two registers that make up the +.I DREG_pair +token couldn't be noted in the instructions part of the table). +To let +.I cg +know that these contents are destroyed, we have to use this `pseudo instruction' +from lack of a better solution. +.NH 3 +Group 5: floating point arithmetic +.PP +Since floating point arithmetic is not implemented traps will be generated here. +.NH 3 +Group 6: pointer arithmetic +.PP +This also is a very important group, along with groups 1 and 2. The MC68020 +has many different addressing modes and if possible they should be used in +the generation of assembly language. +.PP +The +.I regX +token is generated here too. It is meant to make efficient use of the +MC68020 possibility of scaling index registers. +.PP +Note that I would have liked one extra pattern to handle C-statements +like +.DS +.I +pointer += expr ? constant1 : constant2; +.R +.DE +efficiently. This pattern would have looked like: +.DS +pat ads +with const +leaving adp %1.num +.DE +but when +.I cg +is coming to the EM replacement part, the constant has already been removed +from the fake stack, causing +.I %1.num +to have a wrong value. +.NH 3 +Group 9: logical instructions +.PP +The EM instructions \fBand\fR, +.B ior +and +.B xor +are so much alike that procedures can be used here, except for the +.B +xor $1==4 +.R +instruction, because the MC68000 +.I eor +instruction does not allow as many kinds of operands as +.I and +and +.I +or. +.R +.NH 3 +Group 11: arrays +.PP +This group also tries to make efficient use of the available addressing modes, +but it leaves the actual work to group 6 mentioned above. +.PP +The +.I regX +token is also generated here. In this group this token is very useful for +handling array instructions for arrays with one, two, four or eight byte +elements; the array index goes into the index register, which can then +be scaled appropriately. An offset is used when the +first array element has an index other than zero. +.PP +I would have liked some extra patterns here too but they won't work +for the same reasons as explained in the discussion of group 6. +.NH 3 +Group 14: procedure calls instructions +.PP +The function return area consists of registers @ D sub 0 @ and @ D sub 1 @. +.NH 3 +Group 15: miscellaneous instructions +.PP +In many cases here library routines are called. These will be discussed +later. +.PP +Two special EM instructions are included here: \fBdch\fR, and \fBlpb\fR. +I don't know when they are generated by a front end, but these +instructions were also in the back end table for the PDP. In the PDP table +these instructions were replaced by +.B +loi 4 +.R +and +.B +adp 8 +.R +respectively. I included them both, since they couldn't do any harm. +.NH 3 +Extra group: optimalization +.PP +This group is handling EM patterns with more than one instruction. This group +is not absolutely necessary but it makes the generation of code +more efficient. Among the things that are handled here are: arithmetic and +logical operations on locals, externals and indirect locals; shifting +of locals, externals and indirect locals by one; some pointer arithmetic; tests +in combination with logical and's and or's or with branches. Finally +there are sixteen patterns about divisions that could be handled more +efficiently by right shifts and which I think should be handled by the +peephole optimizer (since it also handles +the same patterns with multiplication). +.NH +The library routines +.PP +The table is supplied with two separate libraries: one for the MC68000 and one +for the MC68020. The MC68000 uses a couple more routines than the MC68020 +because it doesn't have 32-bit division and multiplication. +.PP +The routines that need to pop their operands first store their return address. +Routines that need other register besides @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ first store +the original contents of those registers. @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ do not have +to be saved because if they contain anything useful, their contents +are pushed on the stack before the routine is called. +.PP +The +.I .trp +routine just prints a message stating the trap number and exits (except +of course when that particular trap number is masked). Usually higher +level languages use their own trap handling routines. +.PP +The +.I .mon +routine doesn't do anything useful at all. It just prints a message stating that +the specified system call is not implemented and then exits. Front ends +usually generate calls to special routines rather than the EM +instruction \fBmon\fR. +These routines have to be supplied in another library. They +may be system dependent (e.g. the MC68000 machine this table was tested on +first moves the parameters to registers, then moves the system call number +to @ D sub 0 @ and then executes +.I +trap #0, +.R +whereas the MC68020 machine this table was tested on required the parameters +to be on the stack rather than in registers). Therefor this library is not +discussed here. +.PP +The +.I .printf +routine is included for EM diagnostic messages. It can print strings using %s, +16-bit decimal numbers using %d and 32-bit hexadecimal numbers using %x. +.PP +The +.I .strhp +routine stores a new EM heap pointer, and sometimes it needs to allocate more +heap space. This is done by calling the system call routine \fI_brk\fR. +Chunks of 1K bytes are allocated, but this can easily be changed into +larger or smaller chunks. +.PP +The MC68000 library also contains a routine to handle the EM instruction \fBrck\fR. +The MC68020 has an instruction +.I cmp2 +that is specially meant for range checking so the MC68020 library can do without +that routine. +.PP +The MC68000 library has two multiplication routines, one for unsigned and the other +for signed multiplication. The one for signed multiplication +first tests the sizes of the operands, to see if it can perform +the 16 bit machine instruction instead of the routine. If not, it considers +it's two operands being two digit numbers in a 65535-radix system. It +uses the 16-bit unsigned multiply instruction +.I mulu +three times (it does not calculate the high order result), +and adds up the intermediary results the proper way. The signed +multiplication routine calculates the sign of the result, calculates +the result as it it were an unsigned multiplication, and +adjusts the sign of the result. Here testing +the operands for there sizes would be less simple, because the operands +are signeds; so that is not done here. +.PP +The MC68000 library also has two division routines. The routine for unsigned +division uses the popular algorithm, where the divisor is shifted out and +the quotient shifted in. The signed division routine calculates the sign of +both the quotient and the remainder, calls the unsigned division routine +and adjusts the signs for the quotient and the remainder. +.PP +The +.I .nop +routine is included for testing purposes. This routine prints the line +number and the value in the stack pointer. Calls to this routine +are generated by the EM instruction \fBnop\fR, which is ordinarily +left out by the peephole optimizer. +.NH +Testing the table +.PP +There are special test programs available for testing back end tables. +First there is the EM test set, which tests most EM instructions, making +good use of the +.B nop +instruction. Then there are the Pascal and C test programs. The Pascal +test programs report errors, which makes it relatively easy +to find out what was wrong in the table. The C test programs just +generate some output, which then has to be compared to the expected +output. Differences are +not only caused by errors but also e.g. by the use of four +byte integers and unsigneds (which this table does), +the use of signed characters +instead of unsigned characters (the C front end I used generated signed +characters) or because the back end +does not support floating point. +These differences have to be `filtered out' to reveal +the differences caused by actual errors in the back end table. +These errors then have to be found out by examining the assembly code, for +no proper diagnostic messages are generated. +.PP +After these three basic tests there still remain a number of patterns that +haven't been tested yet. Fortunately +.I cgg +offers the possibility of generating a special +.I cg +that can print a list of patterns that haven't been used in +code generation yet. +For these patterns the table writer has to write his own test programs. +This may complicate things a bit because errors may now be caused by +errors in the back end table as well as errors in the test programs. +The latter happened quite often to me, because I found EM +to be an uncomfortable programming language (of course it isn't meant to +be a programming language, but an intermediary language). +.PP +There still remain a couple of patterns in this table that haven't been tested +yet. However these patterns all have very similar cases that have been +tested (an example of this is mentioned in the section on group 0 +of the patterns section of the table). Some patterns have to +do with floating point numbers. These EM instructions all generate +traps, so they didn't all have to be tested. The two instructions +.B dch +and +.B lpb +haven't been tested in this table, but since they only use EM replacement +and they have been tested in the PDP back end table, these two should +be all right. +.NH +Performance of the back end +.PP +To test the performance of the back end I gathered a couple of +C programs and compiled them on the machines I used to test the back ends on. +I compiled them using the C compiler that was available there and +I also compiled them using the back end. I then compared the sizes +of the text segments in the object files. +The final results of these comparisons are in fig. 1 and fig. 2. +.KF +.TS +center box; +cfI s s s s s +c s s s s s +c c | c s | c s +c c | c s | c s +c | c | c c | c c +l | n | n n | n n. +Differences in text segment sizes for the MC68000 +parts of the back end compiled by itself +_ +original old m68k4 new MC68000 +compiler (100%) back end back end +_ +name size size perc. size perc. +_ +codegen.c 13892 16224 116.7% 12860 92.5% +compute.c 4340 4502 103.7% 4530 104.3% +equiv.c 680 662 97.3% 598 87.9% +fillem.c 8016 7304 91.1% 6880 85.8% +gencode.c 1356 1194 88.0% 1130 83.3% +glosym.c 224 202 90.1% 190 84.8% +main.c 732 672 91.8% 634 86.6% +move.c 1876 1526 81.3% 1410 75.1% +nextem.c 1288 1594 123.7% 1192 92.5% +reg.c 1076 1014 94.2% 916 85.1% +regvar.c 1352 1188 87.8% 1150 85.0% +salloc.c 1240 1100 88.7% 1024 82.5% +state.c 628 600 95.5% 532 84.7% +subr.c 6948 6382 91.8% 5680 81.7% += +averages 2939 3155 95.8% 2766 86.6% +.TE +.DS C +fig 1. +.DE +.KE +.KF +.TS +center box; +cfI s s s +cfI s s s +c s s s +c s s s +c c | c s +c c | c s +c | c | c c +l | n | n n. +Differences in text segment sizes +for the MC68020 +parts of the back end +compiled by itself +_ +original MC68020 +compiler (100%) back end +_ +name size size perc. +_ +codegen.c 12608 12134 96.2% +compute.c 4624 4416 95.5% +equiv.c 572 504 88.1% +fillem.c 7780 6976 89.6% +gencode.c 1320 1086 82.2% +glosym.c 228 182 79.8% +main.c 736 596 80.9% +move.c 1392 1280 91.9% +nextem.c 1176 1066 90.6% +reg.c 1052 836 79.4% +regvar.c 1196 968 80.9% +salloc.c 1200 932 77.6% +state.c 580 528 91.0% +subr.c 6136 5268 85.8% += +averages 2900 2627 86.4% +.TE +.DS C +fig 2. +.DE +.KE +Fig. 1 also includes results of an old m68k4 back end (a back end +for the MC68000 with four byte word and pointersize). The table for +this back end was given to me as an example, but I thought it didn't make +good use of the MC68000's addressing capabilities, it hardly did any +optimalization, and it sometimes even +generated code that the assembler would not swallow. +This was sufficient reason for me to write a completely new table. +.PP +The results from the table may not be taken too seriously. The sizes measured +are the sizes of the text segments of the user programs, i.e. without the +inclusion of library routines. Of course these segments do contain calls +to these routines. Another thing is that the +.I rom +segment may be included in the text segment (this is why the +results for the MC68000 for +.I compute.c +look so bad). +.PP +Some other things must be said about these results. +The quality of EM code +generated by the C front end is certainly not optimal. The front end +uses temporary locals (extra locals that are used to evaluate expressions) +far too quickly: for a simple C expression like +.DS +.I +*(pointer) += constant +.R +.DE +where +.I pointer +is a register variable, the C front end generates (for obscure reasons) +a temporary local that holds the contents of \fIpointer\fR. This way +the pattern for +.DS +.B +loc lil adi sil $2==$4 && $3==4 +.R +.DE +for register variables is not used and longer, less efficient +code is generated. But even in spite of this, the back end seems to +generate rather compact code. +.NH +Some timing results +.PP +In order to measure the performance of the code generated by the back end +some timing tests were done. The reason I chose these particular tests is +that they were also done for many other back ends; the reader can compare +the results if he so wishes (of course comparing the results only +show a global difference in speed of the various machines; it doesn't +show whether some back end generates relatively better code than another). +.PP +On the MC68000 machine the statements were executed one million times. +On the MC68020 machine the statements had to be executed four million times +because this machine was so fast that timing results would be very +unreliable if the statements were executed only one million times. +.PP +For testing I used the following C test program: +.DS +.I +main() +{ + int i, j, ... + ... + for (i=0; i<1000; i++) + for (j=0; j<1000; j++) + STATEMENT; +} +.R +.DE +where +.I STATEMENT +is any of the test statements or the empty statement. For the MC68020 +tests I used 2000 instead of 1000. +The results of the test with the empty statement were used to calculate +the execution times of the other test statements. +.PP +Figures 3 and 4 show many results. For each machine actually two tests were +done: one with register variables, and the other without them. +I noticed that the original C compilers on both machines did not generate +the use of register variables, unless specifically requested. The +back end uses register variables when and where they are profitable, even +if the user did not ask for them. +.KF +.TS +center box; +cfI s s s s +c s s s s +c | c s | c s +cw(1.5i) | c c | c c +c | c c | c c +lp-2fI | n n | n n. +timing results for the MC68000 +times in @ mu @seconds +_ +test statement without register variables with register variables +_ + original new MC68000 original new MC68000 + C compiler back end C compiler back end +_ +int1=0; 2.8 2.7 0.5 0.5 +int1=int2-1; 4.1 4.1 1.3 1.3 +int1=int1+1; 4.1 4.1 1.3 1.3 +int1=int2*int3; 40.0 40.5 36.2 36.8 +T{ +int1=(int2<0); +\/*true*/ +T} 5.5 7.3 2.0 4.5 +T{ +int1=(int2<0); +\/*false*/ +T} 4.7 8.5 2.8 5.6 +T{ +int1=(int2<3); +\/*true*/ +T} 6.2 7.7 2.6 5.4 +T{ +int1=(int2<3); +\/*false*/ +T} 5.4 8.9 3.6 6.5 +T{ +.na +int1=((int2>3)||(int2<3)); +\/* true || false */ +T} 6.0 7.8 3.4 5.4 +T{ +.na +int1=((int2>3)||(int2<3)); +\/* false || true */ +T} 9.1 10.2 5.7 7.1 +T{ +.na +switch (int1) { +case 1: int1=0; break; +case 2: int1=1; break; +} +T} 6.3 17.8 5.3 14.0 +T{ +.na +if (int1=0) int2=3; +\/*true*/ +T} 5.1 4.7 1.3 1.3 +T{ +.na +if (int1=0) int2=3; +\/*false*/ +T} 2.2 2.1 1.9 1.1 +while (int1>0) int1=int1-1; 2.2 2.1 1.1 1.1 +int1=a[int2]; 6.8 6.7 4.0 3.1 +p3(int1); 14.3 11.1 13.4 10.0 +int1=f(int2); 17.7 14.5 14.8 11.7 +s.overhead=5400; 2.8 2.7 2.9 2.7 +.TE +.DS C +Fig. 3 +.DE +.KE +.KF +.TS +center box; +cfI s s s s +c s s s s +c | c s | c s +cw(1.5i) | c c | c c +c | c c | c c +lp-2fI | n n | n n. +timing results for the MC68020 +times in @ mu @seconds +_ +test statement without register variables with register variables +_ + original new MC68020 original new MC68020 + C compiler back end C compiler back end +_ +int1=0; .25 .25 .15 .15 +int1=int2-1; 1.3 1.3 .38 .38 +int1=int1+1; 1.2 .90 .38 .15 +int1=int2*int3; 4.4 4.2 3.0 3.1 +T{ +int1=(int2<0); +\/*true*/ +T} 1.6 2.7 1.1 2.3 +T{ +int1=(int2<0); +\/*false*/ +T} 1.9 2.9 .80 2.1 +T{ +int1=(int2<3); +\/*true*/ +T} 1.7 2.8 1.2 2.6 +T{ +int1=(int2<3); +\/*false*/ +T} 2.1 3.0 .85 2.3 +T{ +.na +int1=((int2>3)||(int2<3)); +\/* true || false */ +T} 2.1 3.1 1.2 2.5 +T{ +.na +int1=((int2>3)||(int2<3)); +\/* false || true */ +T} 3.4 4.2 1.8 3.2 +T{ +.na +switch (int1) { +case 1: int1=0; break; +case 2: int1=1; break; +} +T} 2.7 8.0 2.0 6.9 +T{ +.na +if (int1=0) int2=3; +\/*true*/ +T} 1.2 1.3 .63 .63 +T{ +.na +if (int1=0) int2=3; +\/*false*/ +T} 1.7 1.6 .50 .53 +while (int1>0) int1=int1-1; 1.2 1.3 .55 .53 +int1=a[int2]; 1.8 1.8 1.0 1.0 +p3(int1); 14.8 5.5 14.1 5.0 +int1=f(int2); 16.3 6.6 15.2 5.9 +s.overhead=5400; .48 .48 .50 .50 +.TE +.DS C +Fig. 4 +.DE +.KE +.PP +The reader may have noticed that on both machines the back end seems +to generate considerably slower code for tests where a `condition' is +used in the rhs of an assignment statement. This is in fact not true: it is +the front end that generates bad code. Two examples: for the C statement +.DS +.I +int1 = (int2 < 0); +.R +.DE +the front end generates the following code for the rhs (I +used arbitrary labels): +.DS +.B +lol -16 +zlt *10 +loc 0 +bra *11 +10 +loc 1 +11 +.R +.DE +while in this case (to my opinion) it should have generated +.DS +.B +lol -16 +tlt +.R +.DE +which is much shorter. Another example: for the C statement +.DS +.I +int1 = (int2 < 3); +.B +.DE +the front end generates for the rhs +.DS +.B +lol -16 +loc 3 +blt *10 +loc 0 +bra *11 +10 +loc 1 +11 +.R +.DE +while a much better translation would be +.DS +.B +lol -16 +loc 3 +cmi 4 +tlt +.R +.DE +.PP +Another statement that the back end seems to generate slower code for is +the C switch statement. This is true, but it is also caused by +the way these things are done in EM. EM uses the +.B csa +or +.B csb +instruction, and for these two I had to use library routines. On larger +switch statements the +.I .csa +routine will perform relatively better. +.PP +The back end generates considerably faster code for procedure and function +calls, especially in the MC68020 case, and also for the C statement +.DS +.I +int1 = int1 + 1; +.R +.DE +The original C compilers use the same method for this instruction +as for +.DS +.I +int1 = int2 - 1; +.R +.DE +they perform the addition in a scratch register, and then store the +result. For the former C statement this is not necessary, because +the MC68000 and MC68020 have an instruction that can add constants +to almost anything (in this case: to locals). The MC68000 and MC68020 +back ends do use this instruction. +.NH +Some final remarks +.PP +As mentioned a few times before, the C front end compiler does not +generate optimal code and as a consequence of this the +back end does not always generate optimal code. This is especially +the case with temporary locals, which the front end generates much +too quickly, and also with conditional expressions that are +used in the rhs of an assignment statement (fortunately this is not +needed so much). +.PP +If +.I cgg +would have been able to accept operands separated by any character +instead of just by commas (in the instruction definitions part), +I wouldn't have had the need of the +.I killreg +pseudo instruction. It would also be handy to have +.I cgg +accept all normal C operators. At the moment +.I cgg +does not accept binary ands, ors and exors, even though in [4] +it is stated that +.I cgg +does accept all normal C operators. As it happens I did not need the +binary operators, but at some time in developing the table I thought +I did. +.PP +I would also like +.I cg +to do more with the condition codes information that is supplied with +each instruction in the instruction definitions section of the table. +Sometimes +.I cg +generates test instructions which actually were not necessary. This +of course causes the generated +programs to be slightly larger and slightly slower. +.PP +In spite of the few minor shortcomings mentioned above I found +.I cgg +a very comfortable tool to use. +.SH +References +.PP +.IP [1] +T. B. Steel Jr., +.I +UNCOL: The myth and the Fact, +.R +in Ann. Rev. Auto. Prog., +R. Goodman (ed.), Vol. 2 (1969), pp 325 - 344 +.IP [2] +A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson, +.I +A practical toolkit for making portable compilers, +.R +Informatica Report 74, Vrije Universiteit, Amsterdam, 1983 +.IP [3] +A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson, +.I +Description of an experimental machine architecture for use with +block structured languages, +.R +Informatica Report 81, Vrije Universiteit, Amsterdam, 1983 +.IP [4] +H. van Staveren +.I +The table driven code generator from the Amsterdam Compiler Kit, +Second Revised Edition, +.R +Vrije Universiteit, Amsterdam +.IP [5] +.I +MC68020 32-bit Microprocessor User's Manual, +.R +Second Edition, +Motorola Inc., 1985, 1984 +.IP [6] +.I +MC68000 16-bit Microprocessor User's Manual, +Preliminary, +.R +Motorola Inc., 1979 -- 2.34.1