From 12f56d65015125201b8226fd63021276358a25b9 Mon Sep 17 00:00:00 2001
From: ceriel <none@none>
Date: Tue, 10 Mar 1987 16:49:13 +0000
Subject: [PATCH] Initial revision

---
 doc/m68020.doc | 1408 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1408 insertions(+)
 create mode 100644 doc/m68020.doc

diff --git a/doc/m68020.doc b/doc/m68020.doc
new file mode 100644
index 000000000..8bb6b7f20
--- /dev/null
+++ b/doc/m68020.doc
@@ -0,0 +1,1408 @@
+.nr PS 11
+.nr VS 13p
+.EQ
+delim @@
+.EN
+.EQ
+gfont R
+.EN
+.ND
+.RP
+.TL
+A back end table for the Motorola MC68000, MC68010 and MC68020 microprocessors
+.AU
+Frank Doodeman
+.AB
+A back end table is part of the Amsterdam Compiler Kit (ACK). It is used
+to produce the actual back end, a program that translates the intermediate
+language family EM to assembly language for some target machine. The table
+discussed here can be used for two back ends, suitable for in total three
+machines: the MC68000 and MC68010 (the difference between these two is
+so small that one back end table can be used for either one), or
+for the MC68020.
+.AE
+.NH
+Introduction
+.PP
+To simplify the task of producing portable (cross) compilers and interpreters
+the Vrije Universiteit designed an integrated collection of programs, the
+Amsterdam Compiler Kit (ACK) [2]. It is based on the old UNCOL idea [1] which
+attempts to solve the problem of how to make a compiler for each of @ N @
+languages on @ M @ different machines without having to write @ N times M @
+programs.
+.PP
+The UNCOL approach is to write @ N @
+.I
+front ends,
+.R
+which translate the
+source language into a common intermediate language UNCOL (Universal Computer
+Oriented Language), and @ M @
+.I
+back ends,
+.R
+each of which translates programs in
+UNCOL into a specific machine language. Under these conditions only @ M + N @
+programs must be written to provide all @ N @ languages on all @ M @
+machines, instead of @ M times N @ programs.
+.PP
+The intermediate language for the Amsterdam Compiler Kit is the machine language
+for a simple stack machine called EM (Encoding Machine) [3]. So a back end for
+the MC68020 translates EM code into MC68020 assembly language. Writing such a
+table [4] suffices to get the back end.
+.PP
+The back end is a single program that is driven by a machine dependent driving
+table. This table, the back end table, defines the mapping of EM code to
+the MC68000, MC68010 or MC68020 assembly language.
+.NH
+The MC68000 and MC68020 micro processors
+.PP
+In this document the name MC68000 will be used for both the MC68000 and the
+MC68010 micro processors, because as far as the back end table is concerned
+there is no difference between them. For a complete and detailed description
+of the MC68020 one is referred to [5]; for the MC68000 one might also use [6].
+In this section some relevant parts will be handled.
+.NH 2 
+Registers
+.PP
+Both the MC68000 and the MC68020 have eight 32-bit data registers (@ D sub 0 @-@ D sub 7 @) that can
+be used for byte (8-bit), word (16-bit) and long word (32-bit) data operations.
+They also have seven 32-bit address registers (@ A sub 0 @-@ A sub 6 @) that may be used as
+software stack pointers and base address registers; address register @ A sub 7 @ is
+used as the system stack pointer. Address registers may also be used for
+word and long word address operations.
+.NH 2 
+Addressing modes
+.PP
+First the MC68000 addressing modes will be discussed. Since the MC68020's
+set of addressing modes is an extension of the MC68000's set, of course this
+section also applies to the MC68020.
+.PP
+In the description we use:
+.IP @ A sub n @
+for address register;
+.IP @ D sub n @
+for data register;
+.IP @ R sub n @
+for address or data register;
+.IP @ X sub n @
+for index register (either data or address register);
+.IP @ PC @
+for program counter;
+.IP @ d sub 8 @
+for 8 bit displacement integer;
+.IP @ d sub 16 @
+for 16 bit displacement integer;
+.IP @ bd @
+for base displacement (may be null, word or long);
+.IP @ od @
+for outer displacement (may be null, word or long).
+.NH 3 
+General addressing modes
+.NH 4 
+Register Direct Addressing
+.IP Syntax: 8
+@ R sub n @ 
+.PP
+This addressing mode (it can be used with either a data register or an address
+register) specifies that the operand is in one of
+the 16 multifunction registers.
+.NH 4 
+Address Register Indirect
+.IP Syntax: 8
+@ ( A sub n ) @ 
+.PP
+The address of the operand is in the address register specified.
+.NH 4 
+Address Register Indirect With Postincrement
+.IP Syntax: 8
+@ ( A sub n )+ @ 
+.PP
+The address of the operand is in the address register specified. After the
+operand address is used, the address register is incremented by one, two or
+four depending upon whether the size of the operand is byte, word or long.
+If the address register is the stack pointer and the operand size is byte, the
+address register is incremented by two rather than one to keep the stack pointer
+on a word boundary.
+.NH 4 
+Address Register Indirect With Predecrement
+.IP Syntax: 8
+@ -( A sub n ) @ 
+.PP
+The address of the operand is in the address register specified. Before the
+operand address is used, the address register is decremented by one, two or
+four depending upon whether the size of the operand is byte, word or long.
+If the address register is the stack pointer and the operand size is byte, the
+address register is decremented by two rather than one to keep the stack pointer
+on a word boundary.
+.NH 4 
+Address Register Indirect With Displacement
+.IP Syntax: 8
+@ d sub 16 ( A sub n ) @ for the MC68000, @ ( d sub 16 , A sub n ) @ for the MC68020 
+.PP
+This address mode requires one word of extension. The address of the operand is
+the sum of the contents of the address register and the sign extended 16-bit
+integer in the extension word.
+.NH 4 
+Address Register Indirect With Index
+.IP Syntax: 8
+@ d sub 8 ( A sub n , X sub n .size) @ for the MC68000, @ ( d sub 8 , A sub n , X sub n .size) @ for the MC68020 
+.PP
+This address mode requires one word of extension according to a certain format, 
+which specifies
+.IP 1.
+which register to use as index register;
+.IP 2.
+a flag that indicates whether the index register is a data register or an
+address register;
+.IP 3.
+a flag that indicates the index size; this is
+.I word
+when the low order part of the index register is to be used, and 
+.I long
+when the whole long value in the register is to be used as index;
+.IP 4.
+an 8-bit displacement integer (the low order byte of the extension word).
+.PP
+The address of the operand is the sum of the contents of the address register,
+the possibly sign extended contents of index register and the sign
+extended 8-bit displacement.
+.NH 4 
+Absolute Data Addressing
+.IP Syntax: 8
+@ address @ for the MC68000, @ ( address ) @ for the MC68020 
+.PP
+Two different kinds of this mode are available:
+.IP 1.
+Absolute Short Address; this mode requires one word of extension. The address of
+the operand is the sign extended 16-bit extension word.
+.IP 2.
+Absolute Long Address; this mode requires two words of extension. The address of
+the operand is developed by concatenation of the two extension words; the high
+order part of the address is the first extension word, the low order part is
+the second.
+.NH 4 
+Program Counter With Displacement.
+.IP Syntax: 8
+@ d sub 16 ( PC ) @ for the MC68000, @ ( d sub 16 , PC ) @ for the MC68020 
+.PP
+This mode requires one word of extension. The address of the operand is the sum
+of the address in the program counter and the sign extended 16-bit displacement
+integer in the extension word. The value in the program counter is the
+address of the extension word.
+.NH 4 
+Program Counter With Index
+.IP Syntax: 8
+@ d sub 8 ( PC , X sub n .size ) @ for the MC68000, @ ( d sub 8 , PC,  X sub n .size ) @ for the MC68020 
+.PP
+This mode requires one word of extension as described under
+.I
+Address Register Indirect With Index.
+.R
+The address of the operand is the sum of the value in the
+program counter, the possibly sign extended index register and the sign
+extended 8-bit displacement integer in the extension word.
+The value in the program counter is the address of the extension word.
+.NH 4 
+Immediate Data
+.IP Syntax: 8
+@ \#data @
+.PP
+This addressing mode requires either one or two words of extension, depending
+on the size of the operation;
+.IP
+byte operation - the operand is in the low order byte of extension word;
+.IP
+word operation - the operand is in the extension word;
+.IP
+long operation - the operand is in the two extension words, the high order
+16-bits are in the first extension word, the low order 16-bits in the second.
+.NH 3 
+Extra MC68020 addressing modes
+.PP
+The MC68020 has three more addressing modes. These modes all use a displacement
+(some even two), an address register and an index register. Instead of the
+address register one may also use the program counter. Any of these
+may be omitted. If all addends are omitted the processor creates an
+effective address of zero. All of these three modes require at least one
+extension word, the
+.I
+Full Format Extension Word,
+.R
+which specifies:
+.IP 1.
+the index register number (0-7);
+.IP 2.
+the index register type (address or data register);
+.IP 3.
+the size of the index (only low order part or the whole register)
+.IP 4.
+a scale factor. This is a number from 0 to 3 which specifies how many bits
+the contents of the index register is to be shifted to the left before being
+used as an index;
+.IP 5.
+a flag that specifies whether the base (address) register is to be added or
+to be suppressed;
+.IP 6.
+a flag that specifies whether to add or suppress the index operand;
+.IP 7.
+two bits that specify the size of the base displacement (null, word or long);
+.IP 8.
+three bits that in combination with (6) above specify which of the three
+addressing modes (described below) to use and, if used, the size of the
+outer displacement (null, word or long).
+.IP N.B.
+All modes mentioned above for the MC68000
+that use an index register may have this register
+scaled (only when using the MC68020).
+.PP
+The three extra addressing modes are:
+.NH 4 
+Address Register Indirect With Index (Base Displacement)
+.IP Syntax: 8
+@ ( bd , A sub n , X sub n .size*scale ) @ (MC68020 only)
+.PP
+The address of the operand is the sum of the contents of the address register,
+the scaled contents of the possibly scaled index register and the possibly
+sign extended base displacement. When the program counter is used instead
+of the address register, the value in the program counter is the address
+of the full format extension word. This mode requires one or two more extension
+words when the size of the base displacement is word or long respectively.
+.PP
+Note that without the index operand, this mode is an extension of the
+.I
+Address Register Indirect With Displacement
+.R
+mode; when using the MC68020 one is no longer limited to a 16-bit displacement.
+Also note that with the index operand added, this mode is an extension
+of the
+.I
+Address Register Indirect With Index
+.R
+mode; when using the MC68020 one is no longer limited to an 8-bit displacement.
+.NH 4 
+Memory Indirect Post-Indexed
+.IP Syntax: 8
+@ ( [ bd , A sub n ] , X sub n .size*scale , od ) @ (MC68020 only)
+.PP
+This mode may use an outer displacement. First an intermediate memory
+address is calculated by adding the contents of the address register and
+the possibly sign extended base displacement. This address is used
+for in indirect memory access of a long word, followed by adding
+the index operand (scaled and possibly signed extended). Finally the
+outer displacement is added to yield the address of the operand.
+When the program counter is used, the value in the program counter is the
+address of the full format extension word.
+.NH 4
+Memory Indirect Pre-Indexed
+.IP Syntax: 8
+@ ( [ bd , A sub n , X sub n .size*scale ] , od ) @ (MC68020 only)
+.PP
+This mode may use an outer displacement. First an intermediate memory
+address is calculated by adding the contents of the address register,
+the scaled contents of the possibly sign extended index register and
+the possibly sign extended base displacement. This address is used
+for an indirect memory access of a long word, followed by adding
+the outer displacement to yield the address of the operand.
+When the program counter is used, the value in the program counter is the
+address of the full format extension word.
+.NH 3
+Addressing modes used in the table
+.PP
+Not all addressing modes mentioned above are used in code generation. It is
+clear that none of the modes that use the program counter PC can be used,
+since at code generation time nothing is known about the value in PC.
+Also some of the possibilities of the three MC68020 addressing modes are not
+used; e.g. it is possible to use a
+.I
+Data Register Indirect
+.R
+mode, which actually is the
+.I
+Address Register Indirect With Index
+.R
+mode, with the address register and the displacement left out. However 
+such a mode would require two extra bytes for the full format extension word,
+and it would also be much slower than using
+.I
+Address Register Indirect.
+.R
+For this kind of reasons several possible addressing modes are not used in the
+generation of code.
+In the table address registers are only used for holding addresses, and
+for index registers only data registers are used.
+.NH
+The M68000 and MC68020 back end table
+.PP
+The table itself has to be run through the C preprocessor 
+before it can be used to generate
+the back end (called
+.I
+code generator
+.R
+or
+.I cg
+for short). When no flags are given to
+the preprocessor an MC68020 code generator is produced; for the MC68000
+code generator one has to run the table through the preprocessor using the
+.I -Dm68k4
+flag.
+.PP
+The table is designed as described in [4]. For the overall design of a back
+end table one is referred to this document. This section only deals
+with problems encountered in writing the table and other things worth noting.
+.NH 2 
+Constant Definitions
+.PP
+Wordsize and pointersize (EM_WSIZE and EM_PSIZE respectively) are defined
+as four (bytes). EM_BSIZE, the hole between AB (the parameter base) and
+LB (the local base), is eight bytes: only
+the return address and the localbase are saved.
+.NH 2 
+Properties
+.PP
+Since Hans van Staveren in his document [4] clearly states that
+.I cg
+execution time is negatively influenced by the number of properties, only
+four different properties have been defined. Besides, since the registers
+really are multifunctional, these four are really all that are needed.
+.NH 2 
+Registers
+.PP
+The table uses register variables: @ D sub 3 @ - @ D sub 7 @ are used as general register
+variables, and address registers @ A sub 2 @ - @ A sub 5 @ are used as pointer register
+variables. @ A sub 6 @ is reserved for the localbase.
+.NH 2 
+Tokens
+.PP
+At first glance one might wonder about the amount of tokens, especially
+for the MC68020, considering the small amount of different addressing modes.
+However, the last three addressing modes mentioned for the MC68020 may
+omit any of the addends, and this leads to a large amount of different tokens.
+I did consider the possibility of enlarging the number of tokens and sets
+even further, because there might be assemblers that don't handle displacements
+of zero optimally (they might generate a 2 byte extension word holding zero).
+The small profit in bytes in the generated code
+however does not justify the increase
+in size of the token section, the set section and the patterns section,
+so this idea was not developed any further.
+.PP
+The timing cost of the tokens may be incorrect for some MC68000 tokens.
+This is because the MC68000 uses a 16-bit data bus which causes the need
+of two separate memory accesses for getting 32-bit operands.
+.NH 3 
+Token names
+.PP
+The amount of tokens and the limited capability of the authors imagination
+might have caused the names of some tokens not to be very clarifying.
+Some information about the names may be in place here.
+.PP
+Whenever part of a token name is in capitals that part is memory indirected
+(i.e. in square brackets). In token names
+.I OFF
+and
+.I off
+mean an offsetted address register, so an address register with a displacement
+(either base displacement or outer displacement).
+.I
+IND, ind
+.R
+and
+.I index
+stand for indexed, or index register.
+.I ABS
+and
+.I abs
+stand for absolute, which actually is just a displacement (base or outer).
+These `rules' only apply to names of tokens that represent actual operands.
+There are also tokens that represent addresses of operands. These
+(with a few exceptions) contain
+.I
+regA, regX
+.R
+and
+.I con
+as parts of there names, which stand for address register, index register and
+displacement (always base displacement) respectively. If the address to which
+the token refers uses memory indirection, that part of the name comes first
+(in small letters), followed by an underscore. The memory indirection part
+follows the `rules' for operand token names.
+.PP
+Of course there are exceptions to these `rules' but in those cases the names
+are self explanatory.
+.PP
+Two special cases:
+.I ext_regX
+is the name of the token that represents the
+address of an absolute indexed operand, syntax @ ( bd , X sub n .size*scale ) @; 
+.I regX
+does not represent any real mode, but is used with EM array instructions and
+pointer arithmetic.
+.NH 3
+Special tokens for the MC68000
+.PP
+The MC68000 requires two extra tokens, which are called
+.I t_regAcon
+and
+.I
+t_regAregXcon.
+.R
+They are necessary because
+.I regAcon
+can only have a 16-bit displacement on the MC68000, and
+.I regAregXcon
+uses only 8 bits for its displacement. To prevent these addressing modes to
+be used with displacements that are too large, the extra tokens are needed.
+Whenever the displacements become too large and they need
+to be used in the generation
+of assembly code, these tokens are transformed into other tokens.
+To prevent the table from becoming too messy I defined
+.I t_regAcon
+and
+.I t_regAregXcon
+to be identical to
+.I regAcon
+and
+.I regAregXcon
+respectively for the MC68020.
+.NH 2 
+Sets
+.PP
+Most set names used in the table are self explanatory, especially to the reader
+who is familiar with the four addressing categories as mentioned in [5]:
+.I
+data, memory, alterable
+.R
+and
+.I
+control.
+.R
+In the sets definition part some sets are defined that are not used elsewhere in
+the table, but are only used to be part of the definition of
+some other set. This keeps the
+set definition part from getting too unreadable.
+.PP
+The sets called
+.I imm_cmp
+consist of all tokens that can be used to compare with a constant.
+.NH 2 
+Instructions
+.PP
+Only the instructions that are used in code generation are listed here.
+The first few instructions are meant especially for the use with register
+variables. The operand LOCAL used here refers to a register variable.
+The reader may not conclude that these operations are also allowed on
+ordinary locals. The space and timing cost of these instructions have been
+adapted, but the use of the word LOCAL for register variables causes these cost
+to be inaccurate anyway.
+.PP
+The 
+.I killreg
+instruction, which generates a comment in the assembly language output and
+which is meant to let
+.I cg
+know that the data register operand has its contents destroyed,
+needs some explaining but this explanation is better in place
+in the discussion of groups 3 and 4 of the section about patterns.
+.PP
+The timing cost of the instructions are probably not very accurate for the
+MC68020 because the MC68020 uses an instruction cache and prefetch. The
+cost used in the table are the `worst case cost' as mentioned in section 9
+of [5].
+.NH 2 
+Moves
+.PP
+These are all pretty straightforward, except perhaps when
+.I t_regAcon
+and
+.I t_regAregXcon
+are used. In these cases the size of the displacement has to be checked
+before moving. This also applies to the stacking rules and the coercions.
+.NH 2 
+Tests
+.PP
+These three tests (one fore each operation size) could not be more
+straightforward than they are now.
+.NH 2 
+Stackingrules
+.PP
+The only peculiar stackingrule is the one for
+.I
+regX.
+.R
+This token is only used with EM array instructions and
+with pointer arithmetic. Whenever it is put
+on the fake stack, some EM instructions are left in the instruction stream
+to remove this token. Consequently it should never have to be stacked. However
+the
+.I
+code generator generator
+.R
+(or
+.I cgg
+for short)
+complained about not having a stackingrule for this token, so it had to
+be added nevertheless.
+.NH 2 
+Coercions
+.PP
+These are all straightforward. There are no splitting coercions since
+the fake stack never contains any tokens that can be split.
+There are only two unstacking coercions.
+The rest are all transforming coercions. Almost all coercions transform
+tokens into either a data register or an address register, except in the
+MC68000 part of the table the
+.I t_regAcon
+and
+.I t_regAregXcon
+tokens are transformed into real
+.I regAcon
+and
+.I regAregXcon
+tokens with displacements that are properly sized.
+.NH 2 
+Patterns
+.PP
+This is the largest part of the table. It is subdivided into 17 groups.
+We will take a closer look at the more interesting groups.
+.NH 3 
+Group 0: rules for register variables
+.PP
+This group makes sure that EM instructions using register variables are
+handled efficiently. This group includes: local loads and
+stores; arithmetic, shifts and logical operations on locals and indirect locals
+and pointer handling, where C expressions like
+.I
+*cp++
+.R
+are handled. For such an expression there are several EM instruction
+sequences the front end might generate. For an integer pointer e.g.:
+.DS
+.B
+lol lol adp stl loi $1==$2 && $1==$4 && $3==4 && $5==4
+.I
+.DE
+or
+.DS
+.B
+lol loi lol adp stl $1==$3 && $3==$5 && $2==4 && $5==4
+.I
+.DE
+or perhaps even
+.DS
+.B
+lil lol adp stl $1==$2 && $2==$4 && $3==4
+.I
+.DE
+Each of these is included, since which one is generated is is up to the front
+end. If the front end is consistent this will mean that some of these patterns
+will never be used in code generation. This might seem a waist, but anyone
+who thinks that will certainly change his mind when his new C front end
+generates a different EM instruction sequence.
+.NH 3 
+Groups 1 and 2: load and store instructions
+.PP
+In these groups
+.B lof
+and
+.B stf
+,
+.B loi
+and
+.B sti
+,
+.B ldf
+and
+.B sdf
+are the important instructions.
+These are the large parts in this group, especially the
+.B loi
+and
+.B sti
+instructions, because they come in three basic sizes (byte, word and long).
+Note that with these instructions in the MC68000 part the
+.I exact
+is omitted in front of
+.I regAcon
+and
+.I
+regAregXcon.
+.R
+This makes sure that
+.I t_regAcon
+and
+.I t_regAregXcon
+are transformed into proper tokens before they are used as addresses.
+.PP
+Also note that the
+.I regAregXcon
+token is completely left out from the
+\fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
+instruction handling. This is because the sum of the token displacement
+and the offset provided in the instruction cannot be checked and is likely
+to exceed 8 bits. Unfortunately 
+.I cgg
+does not allow the inspection of subregisters of tokens that are on the
+fake stack. This same problem might also occur with the
+.I regAcon
+token, but this is less likely because it
+uses 16-bit displacements. Besides if it would have been left out the
+\fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
+instructions would have been handled considerably less efficient.
+.NH 3 
+Groups 3 and 4: integer and unsigned arithmetic
+.PP
+EM instruction
+.B sbi
+also works with address registers, because the 
+.B cmp
+instruction in group 12 is replaced by \fBsbi 4\fR.
+.PP
+For the MC68000 \fBmli\fR, \fBmlu\fR, \fBdvi\fR, \fBdvu\fR, \fBrmi\fR
+and \fBrmu\fR are handled
+by library routines. This is because the MC68000 has only 16-bit multiplications
+and divisions.
+.PP
+The MC68020 does have 32-bit multiplications and divisions, but for the
+.B rmi
+and
+.B rmu
+EM instructions peculiar things happen anyway: they generate the
+.I killreg
+instruction. This is necessary because the data register that 
+first held the dividend now holds the quotient; the original contents are
+destroyed without
+.I cg
+knowing about it (the destruction of the two registers that make up the
+.I DREG_pair
+token couldn't be noted in the instructions part of the table).
+To let
+.I cg
+know that these contents are destroyed, we have to use this `pseudo instruction'
+from lack of a better solution.
+.NH 3 
+Group 5: floating point arithmetic
+.PP
+Since floating point arithmetic is not implemented traps will be generated here.
+.NH 3 
+Group 6: pointer arithmetic
+.PP
+This also is a very important group, along with groups 1 and 2. The MC68020
+has many different addressing modes and if possible they should be used in
+the generation of assembly language.
+.PP
+The
+.I regX
+token is generated here too. It is meant to make efficient use of the
+MC68020 possibility of scaling index registers.
+.PP
+Note that I would have liked one extra pattern to handle C-statements
+like
+.DS
+.I
+pointer += expr ? constant1 : constant2;
+.R
+.DE
+efficiently. This pattern would have looked like:
+.DS
+pat ads
+with const
+leaving adp %1.num
+.DE
+but when
+.I cg
+is coming to the EM replacement part, the constant has already been removed
+from the fake stack, causing
+.I %1.num
+to have a wrong value.
+.NH 3 
+Group 9: logical instructions
+.PP
+The EM instructions \fBand\fR,
+.B ior
+and
+.B xor
+are so much alike that procedures can be used here, except for the
+.B
+xor $1==4
+.R
+instruction, because the MC68000
+.I eor
+instruction does not allow as many kinds of operands as
+.I and
+and
+.I
+or.
+.R
+.NH 3 
+Group 11: arrays
+.PP
+This group also tries to make efficient use of the available addressing modes,
+but it leaves the actual work to group 6 mentioned above.
+.PP
+The
+.I regX
+token is also generated here. In this group this token is very useful for
+handling array instructions for arrays with one, two, four or eight byte
+elements; the array index goes into the index register, which can then
+be scaled appropriately. An offset is used when the
+first array element has an index other than zero.
+.PP
+I would have liked some extra patterns here too but they won't work
+for the same reasons as explained in the discussion of group 6.
+.NH 3 
+Group 14: procedure calls instructions
+.PP
+The function return area consists of registers @ D sub 0 @ and @ D sub 1 @.
+.NH 3 
+Group 15: miscellaneous instructions
+.PP
+In many cases here library routines are called. These will be discussed
+later.
+.PP
+Two special EM instructions are included here: \fBdch\fR, and \fBlpb\fR.
+I don't know when they are generated by a front end, but these
+instructions were also in the back end table for the PDP. In the PDP table
+these instructions were replaced by
+.B
+loi 4
+.R
+and
+.B
+adp 8
+.R
+respectively. I included them both, since they couldn't do any harm.
+.NH 3 
+Extra group: optimalization
+.PP
+This group is handling EM patterns with more than one instruction. This group
+is not absolutely necessary but it makes the generation of code
+more efficient. Among the things that are handled here are: arithmetic and
+logical operations on locals, externals and indirect locals; shifting
+of locals, externals and indirect locals by one; some pointer arithmetic; tests
+in combination with logical and's and or's or with branches. Finally
+there are sixteen patterns about divisions that could be handled more
+efficiently by right shifts and which I think should be handled by the
+peephole optimizer (since it also handles
+the same patterns with multiplication).
+.NH
+The library routines
+.PP
+The table is supplied with two separate libraries: one for the MC68000 and one
+for the MC68020. The MC68000 uses a couple more routines than the MC68020
+because it doesn't have 32-bit division and multiplication.
+.PP
+The routines that need to pop their operands first store their return address.
+Routines that need other register besides @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ first store
+the original contents of those registers. @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ do not have
+to be saved because if they contain anything useful, their contents
+are pushed on the stack before the routine is called.
+.PP
+The
+.I .trp
+routine just prints a message stating the trap number and exits (except
+of course when that particular trap number is masked). Usually higher
+level languages use their own trap handling routines.
+.PP
+The
+.I .mon
+routine doesn't do anything useful at all. It just prints a message stating that
+the specified system call is not implemented and then exits. Front ends
+usually generate calls to special routines rather than the EM
+instruction \fBmon\fR.
+These routines have to be supplied in another library. They
+may be system dependent (e.g. the MC68000 machine this table was tested on
+first moves the parameters to registers, then moves the system call number
+to @ D sub 0 @ and then executes
+.I
+trap #0,
+.R
+whereas the MC68020 machine this table was tested on required the parameters
+to be on the stack rather than in registers). Therefor this library is not
+discussed here.
+.PP
+The
+.I .printf
+routine is included for EM diagnostic messages. It can print strings using %s,
+16-bit decimal numbers using %d and 32-bit hexadecimal numbers using %x.
+.PP
+The
+.I .strhp
+routine stores a new EM heap pointer, and sometimes it needs to allocate more
+heap space. This is done by calling the system call routine \fI_brk\fR.
+Chunks of 1K bytes are allocated, but this can easily be changed into
+larger or smaller chunks.
+.PP
+The MC68000 library also contains a routine to handle the EM instruction \fBrck\fR.
+The MC68020 has an instruction
+.I cmp2
+that is specially meant for range checking so the MC68020 library can do without
+that routine.
+.PP
+The MC68000 library has two multiplication routines, one for unsigned and the other
+for signed multiplication. The one for signed multiplication
+first tests the sizes of the operands, to see if it can perform
+the 16 bit machine instruction instead of the routine. If not, it considers
+it's two operands being two digit numbers in a 65535-radix system. It
+uses the 16-bit unsigned multiply instruction
+.I mulu
+three times (it does not calculate the high order result),
+and adds up the intermediary results the proper way. The signed
+multiplication routine calculates the sign of the result, calculates
+the result as it it were an unsigned multiplication, and
+adjusts the sign of the result. Here testing
+the operands for there sizes would be less simple, because the operands
+are signeds; so that is not done here.
+.PP
+The MC68000 library also has two division routines. The routine for unsigned
+division uses the popular algorithm, where the divisor is shifted out and
+the quotient shifted in. The signed division routine calculates the sign of
+both the quotient and the remainder, calls the unsigned division routine
+and adjusts the signs for the quotient and the remainder.
+.PP
+The
+.I .nop
+routine is included for testing purposes. This routine prints the line
+number and the value in the stack pointer. Calls to this routine
+are generated by the EM instruction \fBnop\fR, which is ordinarily
+left out by the peephole optimizer.
+.NH
+Testing the table
+.PP
+There are special test programs available for testing back end tables.
+First there is the EM test set, which tests most EM instructions, making
+good use of the
+.B nop
+instruction. Then there are the Pascal and C test programs. The Pascal
+test programs report errors, which makes it relatively easy
+to find out what was wrong in the table. The C test programs just
+generate some output, which then has to be compared to the expected
+output. Differences are
+not only caused by errors but also e.g. by the use of four
+byte integers and unsigneds (which this table does),
+the use of signed characters
+instead of unsigned characters (the C front end I used generated signed
+characters) or because the back end
+does not support floating point.
+These differences have to be `filtered out' to reveal
+the differences caused by actual errors in the back end table.
+These errors then have to be found out by examining the assembly code, for
+no proper diagnostic messages are generated.
+.PP
+After these three basic tests there still remain a number of patterns that
+haven't been tested yet. Fortunately
+.I cgg
+offers the possibility of generating a special
+.I cg
+that can print a list of patterns that haven't been used in
+code generation yet.
+For these patterns the table writer has to write his own test programs.
+This may complicate things a bit because errors may now be caused by
+errors in the back end table as well as errors in the test programs.
+The latter happened quite often to me, because I found EM
+to be an uncomfortable programming language (of course it isn't meant to
+be a programming language, but an intermediary language).
+.PP
+There still remain a couple of patterns in this table that haven't been tested
+yet. However these patterns all have very similar cases that have been
+tested (an example of this is mentioned in the section on group 0
+of the patterns section of the table). Some patterns have to
+do with floating point numbers. These EM instructions all generate
+traps, so they didn't all have to be tested. The two instructions
+.B dch
+and
+.B lpb
+haven't been tested in this table, but since they only use EM replacement
+and they have been tested in the PDP back end table, these two should
+be all right.
+.NH
+Performance of the back end
+.PP
+To test the performance of the back end I gathered a couple of
+C programs and compiled them on the machines I used to test the back ends on.
+I compiled them using the C compiler that was available there and
+I also compiled them using the back end. I then compared the sizes
+of the text segments in the object files.
+The final results of these comparisons are in fig. 1 and fig. 2.
+.KF
+.TS
+center box;
+cfI s s s s s
+c s s s s s
+c c | c s | c s
+c c | c s | c s
+c | c | c  c | c  c
+l | n | n  n | n  n.
+Differences in text segment sizes for the MC68000
+parts of the back end compiled by itself
+_
+original	 	old m68k4	new MC68000
+compiler	(100%)	back end	back end
+_
+name	size	size	perc.	size	perc.
+_
+codegen.c	13892	16224	116.7%	12860	92.5%
+compute.c	4340	4502	103.7%	4530	104.3%
+equiv.c	680	662	97.3%	598	87.9%
+fillem.c	8016	7304	91.1%	6880	85.8%
+gencode.c	1356	1194	88.0%	1130	83.3%
+glosym.c	224	202	90.1%	190	84.8%
+main.c	732	672	91.8%	634	86.6%
+move.c	1876	1526	81.3%	1410	75.1%
+nextem.c	1288	1594	123.7%	1192	92.5%
+reg.c	1076	1014	94.2%	916	85.1%
+regvar.c	1352	1188	87.8%	1150	85.0%
+salloc.c	1240	1100	88.7%	1024	82.5%
+state.c	628	600	95.5%	532	84.7%
+subr.c	6948	6382	91.8%	5680	81.7%
+=
+averages	2939	3155	95.8%	2766	86.6%
+.TE
+.DS C
+fig 1.
+.DE
+.KE
+.KF
+.TS
+center box;
+cfI s s s
+cfI s s s
+c s s s
+c s s s
+c c | c s
+c c | c s
+c | c | c  c
+l | n | n  n.
+Differences in text segment sizes
+for the MC68020
+parts of the back end
+compiled by itself
+_
+original	 	MC68020
+compiler	(100%)	back end
+_
+name	size	size	perc.
+_
+codegen.c	12608	12134	96.2%
+compute.c	4624	4416	95.5%
+equiv.c	572	504	88.1%
+fillem.c	7780	6976	89.6%
+gencode.c	1320	1086	82.2%
+glosym.c	228	182	79.8%
+main.c	736	596	80.9%
+move.c	1392	1280	91.9%
+nextem.c	1176	1066	90.6%
+reg.c	1052	836	79.4%
+regvar.c	1196	968	80.9%
+salloc.c	1200	932	77.6%
+state.c	580	528	91.0%
+subr.c	6136	5268	85.8%
+=
+averages	2900	2627	86.4%
+.TE
+.DS C
+fig 2.
+.DE
+.KE
+Fig. 1 also includes results of an old m68k4 back end (a back end
+for the MC68000 with four byte word and pointersize). The table for
+this back end was given to me as an example, but I thought it didn't make
+good use of the MC68000's addressing capabilities, it hardly did any
+optimalization, and it sometimes even
+generated code that the assembler would not swallow.
+This was sufficient reason for me to write a completely new table.
+.PP
+The results from the table may not be taken too seriously. The sizes measured
+are the sizes of the text segments of the user programs, i.e. without the
+inclusion of library routines. Of course these segments do contain calls
+to these routines. Another thing is that the
+.I rom
+segment may be included in the text segment (this is why the
+results for the MC68000 for
+.I compute.c
+look so bad).
+.PP
+Some other things must be said about these results.
+The quality of EM code
+generated by the C front end is certainly not optimal. The front end
+uses temporary locals (extra locals that are used to evaluate expressions)
+far too quickly: for a simple C expression like
+.DS
+.I
+*(pointer) += constant
+.R
+.DE
+where
+.I pointer
+is a register variable, the C front end generates (for obscure reasons)
+a temporary local that holds the contents of \fIpointer\fR. This way
+the pattern for
+.DS
+.B
+loc lil adi sil $2==$4 && $3==4
+.R
+.DE
+for register variables is not used and longer, less efficient
+code is generated. But even in spite of this, the back end seems to
+generate rather compact code.
+.NH
+Some timing results
+.PP
+In order to measure the performance of the code generated by the back end
+some timing tests were done. The reason I chose these particular tests is
+that they were also done for many other back ends; the reader can compare
+the results if he so wishes (of course comparing the results only
+show a global difference in speed of the various machines; it doesn't
+show whether some back end generates relatively better code than another).
+.PP
+On the MC68000 machine the statements were executed one million times.
+On the MC68020 machine the statements had to be executed four million times
+because this machine was so fast that timing results would be very
+unreliable if the statements were executed only one million times.
+.PP
+For testing I used the following C test program:
+.DS
+.I
+main()
+{
+    int i, j, ...
+    ...
+    for (i=0; i<1000; i++)
+        for (j=0; j<1000; j++)
+    	    STATEMENT;
+}
+.R
+.DE
+where
+.I STATEMENT
+is any of the test statements or the empty statement. For the MC68020
+tests I used 2000 instead of 1000.
+The results of the test with the empty statement were used to calculate
+the execution times of the other test statements.
+.PP
+Figures 3 and 4 show many results. For each machine actually two tests were
+done: one with register variables, and the other without them.
+I noticed that the original C compilers on both machines did not generate
+the use of register variables, unless specifically requested. The
+back end uses register variables when and where they are profitable, even
+if the user did not ask for them.
+.KF
+.TS
+center box;
+cfI s s s s
+c s s s s
+c | c s | c s
+cw(1.5i) | c c | c c
+c | c c | c c
+lp-2fI | n n | n n.
+timing results for the MC68000
+times in @ mu @seconds
+_
+test statement	without register variables	with register variables
+_
+ 	original	new MC68000	original	new MC68000
+ 	C compiler	back end	C compiler	back end
+_
+int1=0;	2.8	2.7	0.5	0.5
+int1=int2-1;	4.1	4.1	1.3	1.3
+int1=int1+1;	4.1	4.1	1.3	1.3
+int1=int2*int3;	40.0	40.5	36.2	36.8
+T{
+int1=(int2<0);
+\/*true*/
+T}	5.5	7.3	2.0	4.5
+T{
+int1=(int2<0);
+\/*false*/
+T}	4.7	8.5	2.8	5.6
+T{
+int1=(int2<3);
+\/*true*/
+T}	6.2	7.7	2.6	5.4
+T{
+int1=(int2<3);
+\/*false*/
+T}	5.4	8.9	3.6	6.5
+T{
+.na
+int1=((int2>3)||(int2<3));
+\/* true || false */
+T}	6.0	7.8	3.4	5.4
+T{
+.na
+int1=((int2>3)||(int2<3));
+\/* false || true */
+T}	9.1	10.2	5.7	7.1
+T{
+.na
+switch (int1) {
+case 1: int1=0; break;
+case 2: int1=1; break;
+}
+T}	6.3	17.8	5.3	14.0
+T{
+.na
+if (int1=0) int2=3;
+\/*true*/
+T}	5.1	4.7	1.3	1.3
+T{
+.na
+if (int1=0) int2=3;
+\/*false*/
+T}	2.2	2.1	1.9	1.1
+while (int1>0) int1=int1-1;	2.2	2.1	1.1	1.1
+int1=a[int2];	6.8	6.7	4.0	3.1
+p3(int1);	14.3	11.1	13.4	10.0
+int1=f(int2);	17.7	14.5	14.8	11.7
+s.overhead=5400;	2.8	2.7	2.9	2.7
+.TE
+.DS C
+Fig. 3
+.DE
+.KE
+.KF
+.TS
+center box;
+cfI s s s s
+c s s s s
+c | c s | c s
+cw(1.5i) | c c | c c
+c | c c | c c
+lp-2fI | n n | n n.
+timing results for the MC68020
+times in @ mu @seconds
+_
+test statement	without register variables	with register variables
+_
+ 	original	new MC68020	original	new MC68020
+ 	C compiler	back end	C compiler	back end
+_
+int1=0;	.25	.25	.15	.15
+int1=int2-1;	1.3	1.3	.38	.38
+int1=int1+1;	1.2	.90	.38	.15
+int1=int2*int3;	4.4	4.2	3.0	3.1
+T{
+int1=(int2<0);
+\/*true*/
+T}	1.6	2.7	1.1	2.3
+T{
+int1=(int2<0);
+\/*false*/
+T}	1.9	2.9	.80	2.1
+T{
+int1=(int2<3);
+\/*true*/
+T}	1.7	2.8	1.2	2.6
+T{
+int1=(int2<3);
+\/*false*/
+T}	2.1	3.0	.85	2.3
+T{
+.na
+int1=((int2>3)||(int2<3));
+\/* true || false */
+T}	2.1	3.1	1.2	2.5
+T{
+.na
+int1=((int2>3)||(int2<3));
+\/* false || true */
+T}	3.4	4.2	1.8	3.2
+T{
+.na
+switch (int1) {
+case 1: int1=0; break;
+case 2: int1=1; break;
+}
+T}	2.7	8.0	2.0	6.9
+T{
+.na
+if (int1=0) int2=3;
+\/*true*/
+T}	1.2	1.3	.63	.63
+T{
+.na
+if (int1=0) int2=3;
+\/*false*/
+T}	1.7	1.6	.50	.53
+while (int1>0) int1=int1-1;	1.2	1.3	.55	.53
+int1=a[int2];	1.8	1.8	1.0	1.0
+p3(int1);	14.8	5.5	14.1	5.0
+int1=f(int2);	16.3	6.6	15.2	5.9
+s.overhead=5400;	.48	.48	.50	.50
+.TE
+.DS C
+Fig. 4
+.DE
+.KE
+.PP
+The reader may have noticed that on both machines the back end seems
+to generate considerably slower code for tests where a `condition' is
+used in the rhs of an assignment statement. This is in fact not true: it is
+the front end that generates bad code. Two examples: for the C statement
+.DS
+.I
+int1 = (int2 < 0);
+.R
+.DE
+the front end generates the following code for the rhs (I
+used arbitrary labels):
+.DS
+.B
+lol -16
+zlt *10
+loc 0
+bra *11
+10
+loc 1
+11
+.R
+.DE
+while in this case (to my opinion) it should have generated
+.DS
+.B
+lol -16
+tlt
+.R
+.DE
+which is much shorter. Another example: for the C statement
+.DS
+.I
+int1 = (int2 < 3);
+.B
+.DE
+the front end generates for the rhs
+.DS
+.B
+lol -16
+loc 3
+blt *10
+loc 0
+bra *11
+10
+loc 1
+11
+.R
+.DE
+while a much better translation would be
+.DS
+.B
+lol -16
+loc 3
+cmi 4
+tlt
+.R
+.DE
+.PP
+Another statement that the back end seems to generate slower code for is
+the C switch statement. This is true, but it is also caused by
+the way these things are done in EM. EM uses the
+.B csa
+or
+.B csb
+instruction, and for these two I had to use library routines. On larger
+switch statements the
+.I .csa
+routine will perform relatively better.
+.PP
+The back end generates considerably faster code for procedure and function
+calls, especially in the MC68020 case, and also for the C statement
+.DS
+.I
+int1 = int1 + 1;
+.R
+.DE
+The original C compilers use the same method for this instruction
+as for
+.DS
+.I
+int1 = int2 - 1;
+.R
+.DE
+they perform the addition in a scratch register, and then store the
+result. For the former C statement this is not necessary, because
+the MC68000 and MC68020 have an instruction that can add constants
+to almost anything (in this case: to locals). The MC68000 and MC68020
+back ends do use this instruction.
+.NH
+Some final remarks
+.PP
+As mentioned a few times before, the C front end compiler does not
+generate optimal code and as a consequence of this the
+back end does not always generate optimal code. This is especially
+the case with temporary locals, which the front end generates much
+too quickly, and also with conditional expressions that are
+used in the rhs of an assignment statement (fortunately this is not
+needed so much).
+.PP
+If
+.I cgg
+would have been able to accept operands separated by any character
+instead of just by commas (in the instruction definitions part),
+I wouldn't have had the need of the
+.I killreg
+pseudo instruction. It would also be handy to have
+.I cgg
+accept all normal C operators. At the moment
+.I cgg
+does not accept binary ands, ors and exors, even though in [4]
+it is stated that
+.I cgg
+does accept all normal C operators. As it happens I did not need the
+binary operators, but at some time in developing the table I thought
+I did.
+.PP
+I would also like
+.I cg
+to do more with the condition codes information that is supplied with
+each instruction in the instruction definitions section of the table.
+Sometimes
+.I cg
+generates test instructions which actually were not necessary. This
+of course causes the generated
+programs to be slightly larger and slightly slower.
+.PP
+In spite of the few minor shortcomings mentioned above I found
+.I cgg
+a very comfortable tool to use.
+.SH
+References
+.PP
+.IP [1]
+T. B. Steel Jr.,
+.I
+UNCOL: The myth and the Fact,
+.R
+in Ann. Rev. Auto. Prog.,
+R. Goodman (ed.), Vol. 2 (1969), pp 325 - 344
+.IP [2]
+A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
+.I
+A practical toolkit for making portable compilers,
+.R
+Informatica Report 74, Vrije Universiteit, Amsterdam, 1983
+.IP [3]
+A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
+.I
+Description of an experimental machine architecture for use with
+block structured languages,
+.R
+Informatica Report 81, Vrije Universiteit, Amsterdam, 1983
+.IP [4]
+H. van Staveren
+.I
+The table driven code generator from the Amsterdam Compiler Kit,
+Second Revised Edition,
+.R
+Vrije Universiteit, Amsterdam
+.IP [5]
+.I
+MC68020 32-bit Microprocessor User's Manual,
+.R
+Second Edition,
+Motorola Inc., 1985, 1984
+.IP [6]
+.I
+MC68000 16-bit Microprocessor User's Manual,
+Preliminary,
+.R
+Motorola Inc., 1979
-- 
2.34.1