doc/ego/ra/ra3

   1 .NH 2
   2 The register allocation phase
   3 .PP
   4 .NH 3
   5 Overview
   6 .PP
   7 The RA phase deals with one procedure at a time.
   8 For every procedure, it first determines which entities
   9 may be put in a register. Such an entity
  10 is called an \fIitem\fR.
  11 For every item it decides during which parts of the procedure it
  12 might be assigned a register.
  13 Such a region is called a \fItimespan\fR.
  14 For any item, several (possibly overlapping) timespans may
  15 be considered.
  16 A pair (item,timespan) is called an \fIallocation\fR.
  17 If the items of two allocations are both live at some
  18 point of time in the intersections of their timespans,
  19 these allocations are said to be \fIrivals\fR of each other,
  20 as they cannot be assigned the same register.
  21 The rivals-set of every allocation is computed.
  22 Next, the gains of assigning a register to an allocation are estimated,
  23 for every allocation.
  24 With all this information, decisions are made which allocations
  25 to store in which registers (\fIpacking\fR).
  26 Finally, the EM text is transformed to reflect these decisions.
  27 .NH 3
  28 The item recognition subphase
  29 .PP
  30 RA tries to put the following entities in a register:
  31 .IP -
  32 a local variable for which a register message was found
  33 .IP -
  34 the address of a local variable for which no
  35 register message was found
  36 .IP -
  37 the address of a global variable
  38 .IP -
  39 the address of a procedure
  40 .IP -
  41 a numeric constant.
  42 .LP
  43 Only the \fIaddress\fR of a global variable
  44 may be put in a register, not the variable itself.
  45 This approach avoids the very complex problems that would be
  46 caused by procedure calls and indirect pointer references (see
  47 .[~[
  48 aho design compiler
  49 .] sections 14.7 and 14.8]
  50 and
  51 .[~[
  52 spillman side-effects
  53 .]]).
  54 Still, on most machines accessing a global variable using indirect
  55 addressing through a register is much cheaper than
  56 accessing it via its address.
  57 Similarly, if the address of a procedure is put in a register, the
  58 procedure can be called via an indirect call.
  59 .PP
  60 With every item we associate a register type.
  61 This type is
  62 .DS
  63 for local variables: the type contained in the register message
  64 for addresses of variables and procedures: the pointer type
  65 for constants: the general type
  66 .DE
  67 An entity other than a local variable is not taken to be an item
  68 if it is used only once within the current procedure.
  69 .PP
  70 An item is said to be \fIlive\fR at some point of the program text
  71 if its value may be used before it is changed.
  72 As addresses and constants are never changed, all items but local
  73 variables are always live.
  74 The region of text during which a local variable is live is
  75 determined via the live/dead messages generated by the
  76 Live Variable analysis phase of the Global Optimizer.
  77 .NH 3
  78 The allocation determination subphase
  79 .PP
  80 If a procedure has more items than registers,
  81 it may be advantageous to put an item in a register
  82 only during those parts of the procedure where it is most
  83 heavily used.
  84 Such a part will be called a timespan.
  85 With every item we may associate a set of timespans.
  86 If two timespans of an item overlap,
  87 at most one of them may be granted a register,
  88 as there is no use in putting the same item in two
  89 registers simultaneously.
  90 If two timespans of an item are distinct,
  91 both may be chosen;
  92 the item will possibly be put in two
  93 different registers during different parts of the procedure.
  94 The timespan may also consist
  95 of the whole procedure.
  96 .PP
  97 A list of (item,timespan) pairs (allocations)
  98 is build, which will be the input to the decision making
  99 subphase of RA (packing subphase).
 100 This allocation list is the main data structure of RA.
 101 The description of the remainder of RA will be in terms
 102 of allocations rather than items.
 103 The phrase "to assign a register to an allocation" means "to assign
 104 a register to the item of the allocation for the duration of
 105 the timespan of the allocation".
 106 Subsequent subphases will add more information
 107 to this list.
 108 .PP
 109 Several factors must be taken into account when a
 110 timespan for an item is constructed:
 111 .IP 1.
 112 At any \fIentry point\fR of the timespan where the
 113 item is live,
 114 the register must be initialized with the item
 115 .IP 2.
 116 At any exit point of the timespan where the item is live,
 117 the item must be updated.
 118 .LP
 119 In order to decrease these costs, we will only consider timespans with
 120 one entry point
 121 and no live exit points.
 122 .NH 3
 123 The rivals computation subphase
 124 .PP
 125 As stated before, several different items may be put in the
 126 same register, provided they are not live simultaneously.
 127 For every allocation we determine the intersection
 128 of its timespan and the lifetime of its item (i.e. the part of the
 129 procedure during which the item is live).
 130 The allocation is said to be busy during this intersection.
 131 If two allocations are ever busy simultaneously they are
 132 said to be rivals of each other.
 133 The rivals information is added to the allocation list.
 134 .NH 3
 135 The profits computation subphase
 136 .PP
 137 To make good decisions, the packing subphase needs to
 138 know which allocations can be assigned the same register
 139 (rivals information) and how much is gained by
 140 granting an allocation a register.
 141 .PP
 142 Besides the gains of using a register instead of an
 143 item,
 144 two kinds of overhead costs must be
 145 taken into account:
 146 .IP -
 147 the register must be initialized with the item
 148 .IP -
 149 the register must be saved at procedure entry
 150 and restored at procedure exit.
 151 .LP
 152 The latter costs should not be due to a single
 153 allocation, as several allocations can be assigned the same register.
 154 These costs are dealt with after packing has been done.
 155 They do not influence the decisions of the packing algorithm,
 156 they may only undo them.
 157 .PP
 158 The actual profits consist of improvements
 159 of execution time and code size.
 160 As the former is far more difficult to estimate , we will
 161 discuss code size improvements first.
 162 .PP
 163 The gains of putting a certain item in a register
 164 depends on how the item is used.
 165 Suppose the item is
 166 a pointer variable.
 167 On machines that do not have a
 168 double-indirect addressing mode,
 169 two instructions are needed to dereference the variable
 170 if it is not in a register, but only one if it is put in a register.
 171 If the variable is not dereferenced, but simply copied, one instruction
 172 may be sufficient in both cases.
 173 So  the gains of putting a pointer variable in a register are higher
 174 if the variable is dereferenced often.
 175 .PP
 176 To make accurate estimates, detailed knowledge of
 177 the target machine and of the code generator
 178 would be needed.
 179 Therefore, a simplification has been made that substantially limits
 180 the amount of target machine information that is needed.
 181 The estimation of the number of bytes saved does
 182 not take into account how an item is used.
 183 Rather, an average number is used.
 184 So these gains are computed as follows:
 185 .DS
 186 #bytes_saved = #occurrences * gains_per_occurrence
 187 .DE
 188 The number of occurrences is derived from
 189 the EM code.
 190 Note that this is not exact either,
 191 as there is no one-to-one correspondence between occurrences in
 192 the EM code and in the assembler code.
 193 .PP
 194 The gains of one occurrence depend on:
 195 .IP 1.
 196 the type of the item
 197 .IP 2.
 198 the size of the item
 199 .IP 3.
 200 the type of the register
 201 .LP
 202 and for local variables and addresses of local variables:
 203 .IP 4.
 204 the type of the local variable
 205 .IP 5.
 206 the offset of the variable in the stackframe
 207 .LP
 208 For every allocation we try two types of registers: the register type
 209 of the item and the general register type.
 210 Only the type with the highest profits will subsequently be used.
 211 This type is added to the allocation information.
 212 .PP
 213 To compute the gains, RA uses a machine-dependent table
 214 that is read from a machine descriptor file.
 215 By means of this table the number of bytes saved can be computed
 216 as a function of the five properties.
 217 .PP
 218 The costs of initializing a register with an item
 219 is determined in a similar way.
 220 The cost of one initialization is also
 221 obtained from the descriptor file.
 222 Note that there can be at most one initialization for any
 223 allocation.
 224 .PP
 225 To summarize, the number of bytes a certain allocation would
 226 save is computed as follows:
 227 .DS
 228 .TS
 229 l l.
 230 net_bytes_saved =       bytes_saved - init_cost
 231 bytes_saved =   #occurrences * gains_per_occ
 232 init_cost =     #initializations * costs_per_init
 233 .TE
 234 .DE
 235 .PP
 236 It is inherently more difficult to estimate the execution
 237 time saved by putting an item in a register,
 238 because it is impossible to predict how
 239 many times an item will be used dynamically.
 240 If an occurrence is part of a loop,
 241 it may be executed many times.
 242 If it is part of a conditional statement,
 243 it may never be executed at all.
 244 In the latter case, the speed of the program may even get
 245 worse if an initialization is needed.
 246 As a clear example, consider the piece of "C" code in Fig. 13.1.
 247 .DS
 248 switch(expr) {
 249       case 1:  p(); break;
 250       case 2:  p(); p(); break;
 251       case 3:  p(); break;
 252       default: break;
 253 }
 254
 255 Fig. 13.1 A "C" switch statement
 256 .DE
 257 Lots of bytes may be saved by putting the address of procedure p
 258 in a register, as p is called four times (statically).
 259 Dynamically, p will be called zero, one or two times,
 260 depending on the value of the expression.
 261 .PP
 262 The optimizer uses the following strategy for optimizing
 263 execution time:
 264 .IP 1.
 265 try to put items in registers during \fIloops\fR first
 266 .IP 2.
 267 always keep the initializing code outside the loop
 268 .IP 3.
 269 if an item is not used in a loop, do not put it in a register if
 270 the initialization costs may be higher than the gains
 271 .LP
 272 The latter condition can be checked by determining the
 273 minimal number of usages (dynamically) of the item during the procedure,
 274 via a shortest path algorithm.
 275 In the example above, this minimal number is zero, so the address of
 276 p is not put in a register.
 277 .PP
 278 The costs of one occurrence is estimated as described above for the
 279 code size.
 280 The number of dynamic occurrences is guessed by looking at the
 281 loop nesting level of every occurrence.
 282 If the item is never used in a loop,
 283 the minimal number of occurrences is used.
 284 From these facts, the execution time improvement is assessed
 285 for every allocation.
 286 .NH 3
 287 The packing subphase
 288 .PP
 289 The packing subphase takes as input the allocation
 290 list and outputs a
 291 description of which allocations should be put
 292 in which registers.
 293 So it is essentially the decision making part of RA.
 294 .PP
 295 The packing system tries to assign a register to allocations one
 296 at a time, in some yet to be defined order.
 297 For every allocation A, it first checks if there is a register
 298 (of the right type)
 299 that is already assigned to one or more allocations,
 300 none of which are rivals of A.
 301 In this case A is assigned the same register.
 302 Else, A is assigned a new register, if one exists.
 303 A table containing the number of free registers for every type
 304 is maintained.
 305 It is initialized with the number of non-scratch registers of
 306 the target computer and updated whenever a
 307 new register is handed out.
 308 The packing algorithm stops when no more allocations can
 309 or need be assigned a register.
 310 .PP
 311 After an allocation A has been packed,
 312 all allocations with non-disjunct timespans (including
 313 A itself) are removed from the allocation list.
 314 .PP
 315 In case the number of items exceeds the number of registers, it
 316 is important to choose the most profitable allocations.
 317 Due to the possibility of having several allocations
 318 occupying the same register,
 319 this problem is quite complex.
 320 Our packing algorithm uses simple heuristic rules
 321 and avoids any combinatorial search.
 322 It has distinct rules for different costs measures.
 323 .PP
 324 If object code size is the most important factor,
 325 the algorithm is greedy and chooses allocations in
 326 decreasing order of their profits attribute.
 327 It does not take into account the fact that
 328 other allocations may be passed over because of
 329 this decision.
 330 .PP
 331 If execution time is at prime stake, the algorithm
 332 first considers allocations whose timespans consist of loops.
 333 After all these have been packed, it considers the remaining
 334 allocations.
 335 Within the two subclasses, it considers allocations
 336 with the highest profits first.
 337 When assigning a register to an allocation with a loop
 338 as timespan, the algorithm checks if the item has
 339 already been put in a register during another loop.
 340 If so, it tries to use the same register for the
 341 new allocation.
 342 After all packing has been done,
 343 it checks if the item has always been assigned the same
 344 register (although not necessarily during all loops).
 345 If so, it tries to put the item in that register during
 346 the entire procedure. This is possible
 347 if the allocation (item,whole_procedure) is not a rival
 348 of any allocation with a different item that has been
 349 assigned to the same register.
 350 Note that this approach is essentially 'bottom up',
 351 as registers are first assigned over small regions
 352 of text which are later collapsed into larger regions.
 353 The advantage of this approach is the fact that
 354 the decisions for one loop can be made independently
 355 of all other loops.
 356 .PP
 357 After the entire packing process has been completed,
 358 we compute for each register how much is gained in using
 359 this register, by simply adding the net profits
 360 of all allocations assigned to it.
 361 This total yield should outweigh the costs of
 362 saving/restoring the register at procedure entry/exit.
 363 As most modern processors (e.g. 68000, Vax) have special
 364 instructions to save/restore several registers,
 365 the differential costs of saving one extra register are by
 366 no means constant.
 367 The costs are read from the machine descriptor file and
 368 compared to the total yields of the registers.
 369 As a consequence of this analysis, some allocations
 370 may have their registers taken away.
 371 .NH 3
 372 The transformation subphase
 373 .PP
 374 The final subphase of RA transforms the EM text according to the
 375 decisions made by the packing system.
 376 It traverses the text of the currently optimized procedure and
 377 changes all occurrences of items at points where
 378 they are assigned a register.
 379 It also clears the score field of the register messages for
 380 normal local variables and emits register messages with a very
 381 high score for the pseudo locals.
 382 At points where registers have to be initialized with items,
 383 it generates EM code to do so.
 384 Finally it tries to decrease the size of the stackframe
 385 of the procedure by looking at which local variables need not
 386 be given memory locations.