doc/ego/cs/cs3

   1 .NH 2
   2 Implementation
   3 .PP
   4 .NH 3
   5 The value number method
   6 .PP
   7 To determine whether two expressions have the same result,
   8 there must be some way to determine whether their operands have
   9 the same values.
  10 We use a system of \fIvalue numbers\fP
  11 .[
  12 kennedy data flow analysis
  13 .]
  14 in which each distinct value of whatever type,
  15 created or used within the working window,
  16 receives a unique identifying number, its value number.
  17 Two items have the same value number if and only if,
  18 based only upon information from the instructions in the window,
  19 their values are provably identical.
  20 For example, after processing the statement
  21 .DS
  22 a := 4;
  23 .DE
  24 the variable a and the constant 4 have the same value number.
  25 .PP
  26 The value number of the result of an expression depends only
  27 on the kind of operator and the value number(s) of the operand(s).
  28 The expressions need not be textually equal, as shown in Fig. 7.5.
  29 .DS
  30 .TS
  31 l l.
  32 a := c; (1)
  33 use(a * b);     (2)
  34 d := b; (3)
  35 use(c * d);     (4)
  36 .TE
  37
  38 Fig. 7.5 Different expressions with the same value number
  39 .DE
  40 At line (1) a receives the same value number as c.
  41 At line (2) d receives the same value number as b.
  42 At line (4) the expression "c * d" receives the same value number
  43 as the expression "a * b" at line (2),
  44 because the value numbers of their left and right operands are the same,
  45 and the operator (*) is the same.
  46 .PP
  47 As another example of the value number method, consider Fig. 7.6.
  48 .DS
  49 .TS
  50 l l.
  51 use(a * b);     (1)
  52 a := 123;       (2)
  53 use(a * b);     (3)
  54 .TE
  55
  56 Fig. 7.6 Identical expressions with the different value numbers
  57 .DE
  58 Although textually the expressions "a * b" in line 1 and line 3 are equal,
  59 a will have different value numbers at line 3 and line 1.
  60 The two expressions will not mistakenly be recognized as equivalent.
  61 .NH 3
  62 Entities
  63 .PP
  64 The Value Number Method distinguishes between operators and operands.
  65 The value numbers of operands are stored in a table,
  66 called the \fIsymbol table\fR.
  67 The value number of a subexpression depends on the
  68 (root) operator of the expression and on the value numbers
  69 of its operands.
  70 A table of "available expressions" is used to do this mapping.
  71 .PP
  72 CS recognizes the following kinds of EM operands, called \fIentities\fR:
  73 .DS
  74 - constant
  75 - local variable
  76 - external variable
  77 - indirectly accessed entity
  78 - offsetted entity
  79 - address of local variable
  80 - address of external variable
  81 - address of offsetted entity
  82 - address of local base
  83 - address of argument base
  84 - array element
  85 - procedure identifier
  86 - floating zero
  87 - local base
  88 - heap pointer
  89 - ignore mask
  90 .DE
  91 .LP
  92 Whenever a new entity is encountered in the working window,
  93 it is entered in the symbol table and given a brand new value number.
  94 Most entities have attributes (e.g. the offset in
  95 the current stackframe for local variables),
  96 which are also stored in the symbol table.
  97 .PP
  98 An entity is called static if its value cannot be changed
  99 (e.g. a constant or an address).
 100 .NH 3
 101 Parsing expressions
 102 .PP
 103 Common subexpressions are recognized by simulating the behaviour
 104 of the EM machine.
 105 The EM code is parsed from left to right;
 106 as EM is postfix code, this is a bottom up parse.
 107 At any point the current state of the EM runtime stack is
 108 reflected by a simulated "fake stack",
 109 containing descriptions of the parsed operands and expressions.
 110 A descriptor consists of:
 111 .DS
 112 (1) the value number of the operand or expression
 113 (2) the size of the operand or expression
 114 (3) a pointer to the first line of EM-code
 115     that constitutes the operand or expression
 116 .DE
 117 Note that operands may consist of several EM instructions.
 118 Whenever an operator is encountered, the
 119 descriptors of its operands are on top of the fake stack.
 120 The operator and the value numbers of the operands
 121 are used as indices in the table of available expressions,
 122 to determine the value number of the expression.
 123 .PP
 124 During the parsing process,
 125 we keep track of the first line of each expression;
 126 we need this information when we decide to eliminate the expression.
 127 .NH 3
 128 Updating entities
 129 .PP
 130 An entity is assigned a value number when it is
 131 used for the first time
 132 in the working window.
 133 If the entity is used as left hand side of an assignment,
 134 it gets the value number of the right hand side.
 135 Sometimes the effects of an instruction on an entity cannot
 136 be determined exactly;
 137 the current value and value number of the entity may become
 138 inconsistent.
 139 Hence the current value number must be forgotten.
 140 This is achieved by giving the entity a new value number
 141 that was not used before.
 142 The entity is said to be \fIkilled\fR.
 143 .PP
 144 As information is lost when an entity is killed,
 145 CS tries to save as many entities as possible.
 146 In case of an indirect assignment through a pointer,
 147 some analysis is done to see which variables cannot be altered.
 148 For a procedure call, the interprocedural information contained
 149 in the procedure table is used to restrict the set of entities that may
 150 be changed by the call.
 151 Local variables for which the front end generated
 152 a register message can never be changed by an indirect assignment
 153 or a procedure call.
 154 .NH 3
 155 Changing the EM text
 156 .PP
 157 When a new expression comes available,
 158 it is checked whether its result is saved in a local
 159 that may go in a register.
 160 The last line of the expression must be followed
 161 by a STL or SDL instruction
 162 (depending on the size of the result)
 163 and a register message must be present for
 164 this local.
 165 If there is such a local,
 166 it is recorded in the available expressions table.
 167 Each time a new occurrence of this expression
 168 is found,
 169 the value number of the local is compared against
 170 the value number of the result.
 171 If they are different the local cannot be used and is forgotten.
 172 .PP
 173 The available expressions are linked in a list.
 174 New expressions are linked at the head of the list.
 175 In this way expressions that are contained within other
 176 expressions appear later in the list,
 177 because EM-expressions are postfix.
 178 The elimination process walks through the list,
 179 starting at the head, to find the largest expressions first.
 180 If an expression is eliminated,
 181 any expression later on in the list, contained in the former expression,
 182 is removed from the list,
 183 as expressions can only be eliminated once.
 184 .PP
 185 A STL or SDL is emitted after the first occurrence of the expression,
 186 unless there was an existing local variable that could hold the result.
 187 .NH 3
 188 Desirability analysis
 189 .PP
 190 Although the global optimizer works on EM code,
 191 the goal is to improve the quality of the object code.
 192 Therefore some machine-dependent information is needed
 193 to decide whether it is desirable to
 194 eliminate a given expression.
 195 Because it is impossible for the CS phase to know
 196 exactly what code will be generated,
 197 some heuristics are used.
 198 CS essentially looks for some special cases
 199 that should not be eliminated.
 200 These special cases can be turned on or off for a given machine,
 201 as indicated in a machine descriptor file.
 202 .PP
 203 Some operators can sometimes be translated
 204 into an addressing mode for the machine at hand.
 205 Such an operator is only eliminated
 206 if its operand is itself expensive,
 207 i.e. it is not just a simple load.
 208 The machine descriptor file contains a set of such operators.
 209 .PP
 210 Eliminating the loading of the Local Base or
 211 the Argument Base by the LXL resp. LXA instruction
 212 is only beneficial if the difference in lexical levels
 213 exceeds a certain threshold.
 214 The machine descriptor file contains this threshold.
 215 .PP
 216 Replacing a SAR or a LAR by an AAR followed by a LOI
 217 may possibly increase the size of the object code.
 218 We assume that this is only possible when the
 219 size of the array element is greater than some limit.
 220 .PP
 221 There are back ends that can very efficiently translate
 222 the index computing instruction sequence LOC SLI ADS.
 223 If this is the case,
 224 the SLI instruction between a LOC
 225 and an ADS is not eliminated.
 226 .PP
 227 To handle unforseen cases, the descriptor file may also contain
 228 a set of operators that should never be eliminated.
 229 .NH 3
 230 The algorithm
 231 .PP
 232 After these preparatory explanations,
 233 the algorithm itself is easy to understand.
 234 For each instruction within the current window,
 235 the following steps are performed in the given order :
 236 .IP 1.
 237 Check if this instruction defines an entity.
 238 If so, the set of entities is updated accordingly.
 239 .IP 2.
 240 Kill all entities that might be affected by this instruction.
 241 .IP 3.
 242 Simulate the instruction on the fake-stack.
 243 If this instruction is an operator,
 244 update the list of available expressions accordingly.
 245 .PP
 246 The result of this process is
 247 a list of available expressions plus the information
 248 needed to eliminate them.
 249 Expressions that are desirable to eliminate are eliminated.
 250 Next, the window is shifted and the process is repeated.