doc/em/intro.nr

   1 .bp
   2 .P1 "INTRODUCTION"
   3 .PP
   4 EM is a family of intermediate languages designed for producing
   5 portable compilers.
   6 The general strategy is for a program called \fBfront end\fP
   7 to translate the source program to EM.
   8 Another program, \fBback end\fP,
   9 translates EM to target assembly language.
  10 Alternatively, the EM code can be assembled to a binary form
  11 and interpreted.
  12 These considerations led to the following goals:
  13 .IP 1
  14 The design should allow translation to,
  15 or interpretation on, a wide range of existing machines.
  16 Design decisions should be delayed as far as possible
  17 and the implications of these decisions should
  18 be localized as much as possible.
  19 .br
  20 The current microcomputer technology offers 8, 16 and 32 bit machines
  21 with various sizes of address space.
  22 EM should be flexible enough to be useful on most of these
  23 machines.
  24 The differences between the members of the EM family should only
  25 concern the wordsize and address space size.
  26 .IP 2
  27 The architecture should ease the task of code generation for
  28 high level languages such as Pascal, C, Ada, Algol 68, BCPL.
  29 .IP 3
  30 The instruction set used by the interpreter should be compact,
  31 to reduce the amount of memory needed
  32 for program storage, and to reduce the time needed to transmit
  33 programs over communication lines.
  34 .IP 3
  35 It should be designed with microprogrammed implementations in
  36 mind; in particular, the use of many short fields within
  37 instruction opcodes should be avoided, because their extraction by the
  38 microprogram or conversion to other instruction formats is inefficient.
  39 .PP
  40 The basic architecture is based on the concept of a stack. The stack
  41 is used for procedure return addresses, actual parameters, local variables,
  42 and arithmetic operations.
  43 There are several built-in object types,
  44 for example, signed and unsigned integers,
  45 floating point numbers, pointers and sets of bits.
  46 There are instructions to push and pop objects
  47 to and from the stack.
  48 The push and pop instructions are not typed.
  49 They only care about the size of the objects.
  50 For each built-in type there are
  51 reverse Polish type instructions that pop one or more
  52 objects from the top of
  53 the stack, perform an operation, and push the result back onto the
  54 stack.
  55 For all types except pointers,
  56 these instructions have the object size
  57 as argument.
  58 .PP
  59 There are no visible general registers used for arithmetic operands
  60 etc. This is in contrast to most third generation computers, which usually
  61 have 8 or 16 general registers. The decision not to have a group of
  62 general registers was fully intentional, and follows W.L. Van der
  63 Poel's dictum that a machine should have 0, 1, or an infinite
  64 number of any feature. General registers have two primary uses: to hold
  65 intermediate results of complicated expressions, e.g.
  66 .DS
  67 ((a*b + c*d)/e + f*g/h) * i
  68 .DE
  69 and to hold local variables.
  70 .PP
  71 Various studies
  72 have shown that the average expression has fewer than two operands,
  73 making the former use of registers of doubtful value. The present trend
  74 toward structured programs consisting of many small
  75 procedures greatly reduces the value of registers to hold local variables
  76 because the large number of procedure calls implies a large overhead in
  77 saving and restoring the registers at every call.
  78 .PP
  79 Although there are no general purpose registers, there are a
  80 few internal registers with specific functions as follows:
  81 .TS
  82 tab(:);
  83 l 1 l l l.
  84 PC:\-:Program Counter:Pointer to next instruction
  85 LB:\-:Local Base:Points to base of the local variables
  86 :::in the current procedure.
  87 SP:\-:Stack Pointer:Points to the highest occupied word on the stack.
  88 HP:\-:Heap Pointer:Points to the top of the heap area.
  89 .TE
  90 .PP
  91 Furthermore, reverse Polish code is much easier to generate than
  92 multi-register machine code, especially if highly efficient code is
  93 desired.
  94 When translating to assembly language the back end can make
  95 good use of the target machine's registers.
  96 An EM machine can
  97 achieve high performance by keeping part of the stack
  98 in high speed storage (a cache or microprogram scratchpad memory) rather
  99 than in primary memory.
 100 .PP
 101 Again according to van der Poel's dictum,
 102 all EM instructions have zero or one argument.
 103 We believe that instructions needing two arguments
 104 can be split into two simpler ones.
 105 The simpler ones can probably be used in other
 106 circumstances as well.
 107 Moreover, these two instructions together often
 108 have a shorter encoding than the single
 109 instruction before.
 110 .PP
 111 This document describes EM at three different levels:
 112 the abstract level, the assembly language level and
 113 the machine language level.
 114 .QQ
 115 The most important level is that of the abstract EM architecture.
 116 This level deals with the basic design issues.
 117 Only the functional capabilities of instructions are relevant, not their
 118 format or encoding.
 119 Most chapters of this document refer to the abstract level
 120 and it is explicitly stated whenever
 121 another level is described.
 122 .QQ
 123 The assembly language is intended for the compiler writer.
 124 It presents a more or less orthogonal instruction
 125 set and provides symbolic names for data.
 126 Moreover, it facilitates the linking of
 127 separately compiled 'modules' into a single program
 128 by providing several pseudoinstructions.
 129 .QQ
 130 The machine language is designed for interpretation with a compact
 131 program text and easy decoding.
 132 The binary representation of the machine language instruction set is
 133 far from orthogonal.
 134 Frequent instructions have a short opcode.
 135 The encoding is fully byte oriented.
 136 These bytes do not contain small bit fields, because
 137 bit fields would slow down decoding considerably.
 138 .PP
 139 A common use for EM is for producing portable (cross) compilers.
 140 When used this way, the compilers produce
 141 EM assembly language as their output.
 142 To run the compiled program on the target machine,
 143 the back end, translates the EM assembly language to
 144 the target machine's assembly language.
 145 When this approach is used, the format of the EM
 146 machine language instructions is irrelevant.
 147 On the other hand, when writing an interpreter for EM machine language
 148 programs, the interpreter must deal with the machine language
 149 and not with the symbolic assembly language.
 150 .PP
 151 As mentioned above, the
 152 current microcomputer technology offers 8, 16 and 32 bit
 153 machines with address spaces ranging from
 154 .Ex 2 16
 155 to
 156 .Ex 2 32
 157 bytes.
 158 Having one size of pointers and integers restricts
 159 the usefulness of the language.
 160 We decided to have a different language for each combination of
 161 word and pointer size.
 162 All languages offer the same instruction set and differ only in
 163 memory alignment restrictions and the implicit size assumed in
 164 several instructions.
 165 The languages
 166 differ slightly for the
 167 different size combinations.
 168 For example: the
 169 size of any object on the stack and alignment restrictions.
 170 The wordsize is restricted to powers of 2 and
 171 the pointer size must be a multiple of the wordsize.
 172 Almost all programs handling EM will be parametrized with word
 173 and pointer size.