doc/ego/sr/sr3

   1 .NH 2
   2 Implementation
   3 .PP
   4 Like most phases, SR deals with one procedure
   5 at a time.
   6 Within a procedure, SR works on one loop at a time.
   7 Loops are processed in textual order.
   8 If loops are nested inside each other,
   9 SR starts with the outermost loop and proceeds in the
  10 inwards direction.
  11 This order is chosen,
  12 because it enables the optimization
  13 of multi-dimensional array address computations,
  14 if the elements are accessed in the usual way
  15 (i.e. row after row, rather than column after column).
  16 For every loop, SR first detects all induction variables
  17 and then tries to recognize
  18 expressions that can be optimized.
  19 .NH 3
  20 Finding induction variables
  21 .PP
  22 The process of finding induction variables
  23 can conveniently be split up
  24 into two parts.
  25 First, the EM text of the loop is scanned to find
  26 all \fIcandidate\fR induction variables,
  27 which are word-sized local variables
  28 that are assigned precisely once
  29 in the loop, within a firm block.
  30 Second, for every candidate, the single assignment
  31 is inspected, to see if it has the form
  32 required by the definition of an induction variable.
  33 .PP
  34 Candidates are found by scanning the EM code of the loop.
  35 During this scan, two sets are maintained.
  36 The set "cand" contains all variables that were
  37 assigned exactly once so far, within a firm block.
  38 The set "dismiss" contains all variables that
  39 should not be made a candidate.
  40 Initially, both sets are empty.
  41 If a variable is assigned to, it is put
  42 in the cand set, if three conditions are met:
  43 .IP 1.
  44 the variable was not in cand or dismiss already
  45 .IP 2.
  46 the assignment takes place in a firm block
  47 .IP 3.
  48 the assignment is not a ZRL instruction (assignment by zero)
  49 or a SDL instruction (store double local).
  50 .LP
  51 If any condition fails, the variable is dismissed from cand
  52 (if it was there already) and put in dismiss
  53 (if it was not there already).
  54 .sp 0
  55 All variables for which no register message was generated (i.e. those
  56 variables that may be accessed indirectly) are assumed
  57 to be changed in the loop.
  58 .sp 0
  59 All variables that remain in cand are candidate induction variables.
  60 .PP
  61 From the set of candidates, the induction variables can
  62 be determined, by inspecting the single assignment.
  63 The assignment must match one of the EM patterns below.
  64 ('x' is the candidate. 'ws' is the word size of the target machine.
  65 'n' is any number.)
  66 .DS
  67 .TS
  68 l l.
  69 \fIpattern\fR   \fIstep size\fR
  70 INL x  |        +1
  71 DEL x  |        -1
  72 LOL x ; (INC | DEC) ; STL x  |  +1 | -1
  73 LOL x ; LOC n ; (ADI ws | SBI ws) ; STL x  |    +n | -n
  74 LOC n ; LOL x ; ADI ws ; STL x  +n
  75 .TE
  76 .DE
  77 From the patterns the step size of the induction variable
  78 can also be determined.
  79 These step sizes are displayed on the right hand side.
  80 .sp
  81 For every induction variable we maintain the following information:
  82 .IP -
  83 the offset of the variable in the stackframe of its procedure
  84 .IP -
  85 a pointer to the EM text of the assignment statement
  86 .IP -
  87 the step value
  88 .LP
  89 .NH 3
  90 Optimizing expressions
  91 .PP
  92 If any induction variables of the loop were found,
  93 the EM text of the loop is scanned again,
  94 to detect expressions that can be optimized.
  95 SR scans for multiplication and array instructions.
  96 Whenever it finds such an instruction, it analyses the
  97 code in front of it.
  98 If an expression is to be optimized, it must
  99 be generated by the following syntax rules.
 100 .DS
 101 .TS
 102 l l.
 103 optimizable_expr:
 104         iv_expr const mult |
 105         const iv_expr mult |
 106         address iv_expr address array_instr;
 107 mult:
 108         MLI ws |
 109         MLU ws ;
 110 array_instr:
 111         LAR ws |
 112         SAR ws |
 113         AAR ws ;
 114 const:
 115         LOC n ;
 116 .TE
 117 .DE
 118 An 'address' is an EM instruction that loads an
 119 address on the stack.
 120 An instruction like LOL may be an 'address', if
 121 the size of an address (pointer size, =ps) is
 122 the same as the word size.
 123 If the pointer size is twice the word size,
 124 instructions like LDL are an 'address'.
 125 (The addresses in the third grammar rule
 126 denote resp. the array address and the
 127 array descriptor address).
 128 .DS
 129 .TS
 130 l l.
 131 address:
 132         LAE |
 133         LAL |
 134         LOL if ps=ws |
 135         LOE    ,,    |
 136         LIL    ,,    |
 137         LDL if ps=2*ws |
 138         LDE    ,,      ;
 139 .TE
 140 .DE
 141 The notion of an iv-expression was introduced earlier.
 142 .DS
 143 .TS
 144 l l.
 145 iv_expr:
 146         iv_expr unair_op |
 147         iv_expr iv_expr binary_op |
 148         loopconst |
 149         iv ;
 150 unair_op:
 151         NGI ws |
 152         INC |
 153         DEC ;
 154 binary_op:
 155         ADI ws |
 156         ADU ws |
 157         SBI ws |
 158         SBU ws ;
 159 loopconst:
 160         const |
 161         LOL x  if x is not changed in loop ;
 162 iv:
 163         LOL x  if x is an induction variable ;
 164 .TE
 165 .DE
 166 An iv_expression must satisfy one additional constraint:
 167 it must use exactly one operand that is an induction
 168 variable.
 169 A simple, hand written, top-down parser is used
 170 to recognize an iv-expression.
 171 It scans the EM code from right to left
 172 (recall that EM is essentially postfix).
 173 It uses semantic attributes (inherited as well as
 174 derived) to check the additional constraint.
 175 .PP
 176 All information assembled during the recognition
 177 process is put in a 'code_info' structure.
 178 This structure contains the following information:
 179 .IP -
 180 the optimizable code itself
 181 .IP -
 182 the loop and basic block the code is part of
 183 .IP -
 184 the induction variable
 185 .IP -
 186 the iv-expression
 187 .IP -
 188 the sign of the induction variable in the
 189 iv-expression
 190 .IP -
 191 the offset and size of the temporary local variable
 192 .IP -
 193 the expensive operator (MLI, LAR etc.)
 194 .IP -
 195 the instruction that loads the constant
 196 (for multiplication) or the array descriptor
 197 (for arrays).
 198 .LP
 199 The entire transformation process is driven
 200 by this information.
 201 As the EM text is represented internally
 202 as a list, this process consists
 203 mainly of straightforward list manipulations.
 204 .sp 0
 205 The initialization code must be put
 206 immediately before the loop entry.
 207 For this purpose a \fIheader block\fR is
 208 created that has the loop entry block as
 209 its only successor and that dominates the
 210 entry block.
 211 The CFG and all relations (SUCC,PRED, IDOM, LOOPS etc.)
 212 are updated.
 213 .sp 0
 214 An EM instruction that will
 215 replace the optimizable code
 216 is created and put at the place of the old code.
 217 The list representing the old optimizable code
 218 is used to create a list for the initializing code,
 219 as they are similar.
 220 Only two modifications are required:
 221 .IP -
 222 if the expensive operator is a LAR or SAR,
 223 it must be replaced by an AAR, as the initial value
 224 of TMP is the \fIaddress\fR of the first
 225 array element that is accessed.
 226 .IP -
 227 code must be appended to store the result of the
 228 expression in TMP.
 229 .LP
 230 Finally, code to increment TMP is created and put after
 231 the code of the single assignment to the
 232 induction variable.
 233 The generated code uses either an integer addition
 234 (ADI) or an integer-to-pointer addition (ADS)
 235 to do the increment.
 236 .PP
 237 SR maintains a set of all expressions that have already
 238 been recognized in the present loop.
 239 Such expressions are said to be \fIavailable\fR.
 240 If an expression is recognized that is
 241 already available,
 242 no new temporary local variable is allocated for it,
 243 and the code to initialize and increment the local
 244 is not generated.