doc/ego/cs/cs4

   1 .NH 2
   2 Implementation.
   3 .PP
   4 In this section we will discuss the implementation of the CS phase.
   5 We will first describe the basic actions that are undertaken
   6 by the algorithm, than the algorithm itself.
   7 .NH 3
   8 Partioning the EM instructions
   9 .PP
  10 There are over 100 EM instructions.
  11 For our purpose we partition this huge set into groups of
  12 instructions which can be more or less conveniently handled together.
  13 .PP
  14 There are groups for all sorts of load instructions:
  15 simple loads, expensive loads, loads of an array element.
  16 A load is considered \fIexpensive\fP when more than one EM instructions
  17 are involved in loading it.
  18 The load of a lexical entity is also considered expensive.
  19 For instance: LOF is expensive, LAL is not.
  20 LAR forms a group on its own,
  21 because it is not only an expensive load,
  22 but also implicitly includes the ternary operator AAR,
  23 which computes the address of the array element.
  24 .PP
  25 There are groups for all sorts of operators:
  26 unary, binary, and ternary.
  27 The groups of operators are further partitioned according to the size
  28 of their operand(s) and result.
  29 .\" .PP
  30 .\" The distinction between operators and expensive loads is not always clear.
  31 .\" The ADP instruction for example,
  32 .\" might seem a unary operator because it pops one item
  33 .\" (a pointer) from the stack.
  34 .\" However, two ADP-instructions which pop an item with the same value number
  35 .\" need not have the same result,
  36 .\" because the attributes (an offset, to be added to the pointer)
  37 .\" can be different.
  38 .\" Is it then a binary operator?
  39 .\" That would give rise to the strange, and undesirable,
  40 .\" situation that some binary operators pop two operands
  41 .\" and others pop one.
  42 .\" The conclusion is inevitable:
  43 .\" we have been fooled by the name (ADd Pointer).
  44 .\" The ADP-instruction is an expensive load.
  45 .\" In this context LAF, meaning Load Address of oFfsetted,
  46 .\" would have been a better name,
  47 .\" corresponding to LOF, like LAL,
  48 .\" Load Address of Local, corresponds to LOL.
  49 .PP
  50 There are groups for all sorts of stores:
  51 direct, indirect, array element.
  52 The SAR forms a group on its own for the same reason
  53 as appeared with LAR.
  54 .PP
  55 The effect of the remaining instructions is less clear.
  56 They do not help very much in parsing expressions or
  57 in constructing our pseudo symboltable.
  58 They are partitioned according to the following criteria:
  59 .RS
  60 .IP "-"
  61 They change the value of an entity without using the stack
  62 (e.g. ZRL, DEE).
  63 .IP "-"
  64 They are subroutine calls (CAI, CAL).
  65 .IP "-"
  66 They change the stack in some irreproduceable way (e.g. ASP, LFR, DUP).
  67 .IP "-"
  68 They have no effect whatever on the stack or on the entities.
  69 This does not mean they can be deleted,
  70 but they can be ignored for the moment
  71 (e.g. MES, LIN, NOP).
  72 .IP "-"
  73 Their effect is too complicate too compute,
  74 so we just assume worst case behaviour.
  75 Hopefully, they do not occur very often.
  76 (e.g. MON, STR, BLM).
  77 .IP "-"
  78 They signal the end of the basic block (e.g. BLT, RET, TRP).
  79 .RE
  80 .NH 3
  81 Parsing expressions
  82 .PP
  83 To recognize expressions,
  84 we simulate the behaviour of the EM machine,
  85 by means of a fake-stack.
  86 When we scan the instructions in sequential order,
  87 we first encounter the instructions that load
  88 the operands on the stack,
  89 and then the instruction that indicates the operator,
  90 because EM expressions are postfix.
  91 When we find an instruction to load an operand,
  92 we load on the fake-stack a struct with the following information:
  93 .DS
  94 .TS
  95 l l.
  96 (1)     the value number of the operand
  97 (2)     the size of the operand
  98 (3)     a pointer to the first line of EM-code
  99         that constitutes the operand
 100 .TE
 101 .DE
 102 In most cases, (3) will point to the line
 103 that loaded the operand (e.g. LOL, LOC),
 104 i.e. there is only one line that refers to this operand,
 105 but sometimes some information must be popped
 106 to load the operand (e.g. LOI, LAR).
 107 This information must have been pushed before,
 108 so we also pop a pointer to the first line that pushed
 109 the information.
 110 This line is now the first line that defines the operand.
 111 .PP
 112 When we find the operator instruction,
 113 we pop its operand(s) from the fake-stack.
 114 The first line that defines the first operand is
 115 now the first line of the expression.
 116 We now have all information to determine
 117 whether the just parsed expression has occurred before.
 118 We also know the first and last line of the expression;
 119 we need this when we decide to eliminate it.
 120 Associated with each available expression is a set of
 121 which the elements contains the first and last line of
 122 a recurrence of this expression.
 123 .PP
 124 Not only will the operand(s) be popped from the fake-stack,
 125 but the following will be pushed:
 126 .DS
 127 .TS
 128 l l.
 129 (1)     the value number of the result
 130 (2)     the size of the result
 131 (3)     a pointer to the first line of the expression
 132 .TE
 133 .DE
 134 In this way an item on the fake-stack always contains
 135 the necessary information.
 136 EM expressions are parsed bottum up.
 137 .NH 3
 138 Updating entities
 139 .PP
 140 As said before,
 141 we build our private "symboltable",
 142 while scanning the EM-instructions.
 143 The behaviour of the EM-machine is not only reflected
 144 in the fake-stack,
 145 but also in the entities.
 146 When an entity is created,
 147 we do not yet know its value,
 148 so we assign a brand new value number to it.
 149 Each time a store-instruction is encountered,
 150 we change the value number of the target entity of this store
 151 to the value number of the token that was popped
 152 from the fake-stack.
 153 Because entities may overlap,
 154 we must also "forget" the value numbers of entities
 155 that might be affected by this store.
 156 Each such entity will be \fIkilled\fP,
 157 i.e. assigned a brand new valuenumber.
 158 .PP
 159 Because we lose information when we forget
 160 the value number of an entity,
 161 we try to save as much entities as possible.
 162 When we store into an external,
 163 we don't have to kill locals and vice versa.
 164 Furthermore, we can see whether two locals or
 165 two externals overlap,
 166 because we know the offset from the local base,
 167 resp. the offset within the data block,
 168 and the size.
 169 The situation becomes more complicated when we have
 170 to consider indirection.
 171 The worst case is that we store through an unknown pointer.
 172 In that case we kill all entities except those locals
 173 for which a so-called \fIregister message\fP has been generated;
 174 this register message indicates that this local can never be
 175 accessed indirectly.
 176 If we know this pointer we can be more careful.
 177 If it points to a local then the entity that is accessed through
 178 this pointer can never overlap with an external.
 179 If it points to an external this entity can never overlap with a local.
 180 Furthermore, in the latter case,
 181 we can find the data block this entity belongs to.
 182 Since pointer arithmetic is only defined within a data block,
 183 this entity can never overlap with entities that are known to
 184 belong to another data block.
 185 .PP
 186 Not only after a store-instruction but also after a
 187 subroutine-call it may be necessary to kill entities;
 188 the subroutine may affect global variables or store
 189 through a pointer.
 190 If a subroutine is called that is not available as EM-text,
 191 we assume worst case behaviour,
 192 i.e. we kill all entities without register message.
 193 .NH 3
 194 Additions and replacements.
 195 .PP
 196 When a new expression comes available,
 197 we check whether the result is saved in a local
 198 that may go in a register.
 199 The last line of the expression must be followed
 200 by a STL or SDL instruction,
 201 depending on the size of the result
 202 (resp. WS and 2*WS),
 203 and a register message must be present for
 204 this local.
 205 If we have found such a local,
 206 we store a pointer to it with the available expression.
 207 Each time a new occurrence of this expression
 208 is found,
 209 we compare the value number of the local against
 210 the value number of the result.
 211 When they are different we remove the pointer to it,
 212 because we cannot use it.
 213 .PP
 214 The available expressions are singly linked in a list.
 215 When a new expression comes available,
 216 we link it at the head of the list.
 217 In this way expressions that are contained within other
 218 expressions appear later in the list,
 219 because EM-expressions are postfix.
 220 When we are going to eliminate expressions,
 221 we walk through the list,
 222 starting at the head, to find the largest expressions first.
 223 When we decide to eliminate an expression,
 224 we look at the expressions in the tail of the list,
 225 starting from where we are now,
 226 to delete expressions that are contained within
 227 the chosen one because
 228 we cannot eliminate an expression more than once.
 229 .PP
 230 When we are going to eliminate expressions,
 231 and we do not have a local that holds the result,
 232 we emit a STL or SDL after the line where the expression
 233 was first found.
 234 The other occurrences are simply removed,
 235 unless they contain instructions that not only have
 236 effect on the stack; e.g. messages, stores, calls.
 237 Before each instruction that needs the result on the stack,
 238 we emit a LOL or LDL.
 239 When the expression was an AAR,
 240 but the instruction was a LAR or a SAR,
 241 we append a LOI resp. a STI of the number of bytes
 242 in an array-element after each LOL/LDL.
 243 .NH 3
 244 Desirability analysis
 245 .PP
 246 Although the global optimizer works on EM code,
 247 the goal is to improve the quality of the object code.
 248 Therefore we need some machine dependent information
 249 to decide whether it is desirable to
 250 eliminate a given expression.
 251 Because it is impossible for the CS phase to know
 252 exactly what code will be generated,
 253 we use some heuristics.
 254 In most cases it will save time when we eliminate an
 255 operator, so we just do it.
 256 We only look for some special cases.
 257 .PP
 258 Some operators can in some cases be translated
 259 into an addressing mode for the machine at hand.
 260 We only eliminate such an operator,
 261 when its operand is itself "expensive",
 262 i.e. not just a simple load.
 263 The user of the CS phase has to supply
 264 a set of such operators.
 265 .PP
 266 Eliminating the loading of the Local Base or
 267 the Argument Base by the LXL resp. LXA instruction
 268 is only beneficial when the number of lexical levels
 269 we have to go back exceeds a certain threshold.
 270 This threshold will be different when registers
 271 are saved by the back end.
 272 The user must supply this threshold.
 273 .PP
 274 Replacing a SAR or a LAR by an AAR followed by a LOI
 275 may possibly increase the size of the object code.
 276 We assume that this is only possible when the
 277 size of the array element is greater than some
 278 (user-supplied) limit.
 279 .PP
 280 There are back ends that can very efficiently translate
 281 the index computing instruction sequence LOC SLI ADS.
 282 If this is the case,
 283 we do not eliminate the SLI instruction between a LOC
 284 and an ADS.
 285 .PP
 286 To handle unforeseen cases, the user may also supply
 287 a set of operators that should never be eliminated.
 288 .NH 3
 289 The algorithm
 290 .PP
 291 After these preparatory explanations,
 292 we can be short about the algorithm itself.
 293 For each instruction within our window,
 294 the following steps are performed in the order given:
 295 .IP 1.
 296 We check if this instructin defines an entity.
 297 If this is the case the set of entities is updated accordingly.
 298 .IP 2.
 299 We kill all entities that might be affected by this instruction.
 300 .IP 3.
 301 The instruction is simulated on the fake-stack.
 302 Copy propagation is done.
 303 If this instruction is an operator,
 304 we update the list of available expressions accordingly.
 305 .PP
 306 When we have processed all instructions this way,
 307 we have built a list of available expressions plus the information we
 308 need to eliminate them.
 309 Those expressions of which desirability analysis tells us so,
 310 we eliminate.
 311 The we shift our window and continue.