doc/ego/ov/ov1

   1 .bp
   2 .NH 1
   3 Overview of the global optimizer
   4 .NH 2
   5 The ACK compilation process
   6 .PP
   7 The EM Global Optimizer is one of three optimizers that are
   8 part of the Amsterdam Compiler Kit (ACK).
   9 The phases of ACK are:
  10 .IP 1.
  11 A Front End translates a source program to EM
  12 .IP 2.
  13 The Peephole Optimizer
  14 .[
  15 tanenbaum staveren peephole toplass
  16 .]
  17 reads EM code and produces 'better' EM code.
  18 It performs a number of optimizations (mostly peephole
  19 optimizations)
  20 such as constant folding, strength reduction and unreachable code
  21 elimination.
  22 .IP 3.
  23 The Global Optimizer further improves the EM code.
  24 .IP 4.
  25 The Code Generator transforms EM to assembly code
  26 of the target computer.
  27 .IP 5.
  28 The Target Optimizer improves the assembly code.
  29 .IP 6.
  30 An Assembler/Loader generates an executable file.
  31 .LP
  32 For a more extensive overview of the ACK compilation process,
  33 we refer to.
  34 .[
  35 tanenbaum toolkit rapport
  36 .]
  37 .[
  38 tanenbaum toolkit cacm
  39 .]
  40 .PP
  41 The input of the Global Optimizer may consist of files and
  42 libraries.
  43 Every file or module in the library must contain EM code in
  44 Compact Assembly Language format.
  45 .[~[
  46 tanenbaum machine architecture
  47 .], section 11.2]
  48 The output consists of one such EM file.
  49 The input files and libraries together need not
  50 constitute an entire program,
  51 although as much of the program as possible should be supplied.
  52 The more information about the program the optimizer
  53 gets, the better its output code will be.
  54 .PP
  55 The Global Optimizer is language- and machine-independent,
  56 i.e. it can be used for all languages and machines supported by ACK.
  57 Yet, it puts some unavoidable restrictions on the EM code
  58 produced by the Front End (see below).
  59 It must have some knowledge of the target machine.
  60 This knowledge is expressed in a machine description table
  61 which is passed as argument to the optimizer.
  62 This table does not contain very detailed information about the
  63 target (such as its instruction set and addressing modes).
  64 .NH 2
  65 The EM code
  66 .PP
  67 The definition of EM, the intermediate code of all ACK compilers,
  68 is given in a separate document.
  69 .[
  70 tanenbaum machine architecture
  71 .]
  72 We will only discuss some features of EM that are most relevant
  73 to the Global Optimizer.
  74 .PP
  75 EM is the assembly code of a virtual \fIstack machine\fR.
  76 All operations are performed on the top of the stack.
  77 For example, the statement "A := B + 3" may be expressed in EM as:
  78 .DS
  79 .TS
  80 l l.
  81 LOL -4  -- push local variable B
  82 LOC 3   -- push constant 3
  83 ADI 2   -- add two 2-byte items on top of
  84         -- the stack and push the result
  85 STL -2  -- pop A
  86 .TE
  87 .DE
  88 So EM is essentially a \fIpostfix\fR code.
  89 .PP
  90 EM has a rich instruction set, containing several arithmetic
  91 and logical operators.
  92 It also contains special-case instructions (such as INCrement).
  93 .PP
  94 EM has \fIglobal\fR (\fIexternal\fR) variables, accessible
  95 by all procedures and \fIlocal\fR variables, accessible by a few
  96 (nested) procedures.
  97 The local variables of a lexically enclosing procedure may
  98 be accessed via a \fIstatic link\fR.
  99 EM has instructions to follow the static chain.
 100 There are EM instruction to allow a procedure
 101 to access its local variables directly (such as LOL and STL above).
 102 Local variables are referenced via an offset in the stack frame
 103 of the procedure, rather than by their names (e.g. -2 and -4 above).
 104 The EM code does not contain the (source language) type
 105 of the variables.
 106 .PP
 107 All structured statements in the source program are expressed in
 108 low level jump instructions.
 109 Besides conditional and unconditional branch instructions, there are
 110 two case instructions (CSA and CSB),
 111 to allow efficient translation of case statements.
 112 .NH 2
 113 Requirements on the EM input
 114 .PP
 115 As the optimizer should be useful for all languages,
 116 it clearly should not put severe restrictions on the EM code
 117 of the input.
 118 There is, however, one immovable requirement:
 119 it must be possible to determine the \fIflow of control\fR of the
 120 input program.
 121 As virtually all global optimizations are based on control flow information,
 122 the optimizer would be totally powerless without it.
 123 For this reason we restrict the usage of the case jump instructions (CSA/CSB)
 124 of EM.
 125 Such an instruction is always called with the address of a case descriptor
 126 on top the the stack.
 127 .[~[
 128 tanenbaum machine architecture
 129 .] section 7.4]
 130 This descriptor contains the labels of all possible
 131 destinations of the jump.
 132 We demand that all case descriptors are allocated in a global
 133 data fragment of type ROM, i.e. the case descriptors
 134 may not be modifyable.
 135 Furthermore, any case instruction should be immediately preceded by
 136 a LAE (Load Address External) instruction, that loads the
 137 address of the descriptor,
 138 so the descriptor can be uniquely identified.
 139 .PP
 140 The optimizer will work improperly if the user deceives the control flow.
 141 We will give two methods to do this.
 142 .PP
 143 In "C" the notorious library routines "setjmp" and "longjmp"
 144 .[
 145 unix programmer's manual McIlroy
 146 .]
 147 may be used to jump out of a procedure,
 148 but can also be used for a number of other stuffy purposes,
 149 for example, to create an extra entry point in a loop.
 150 .DS
 151  while (condition) {
 152          ....
 153          setjmp(buf);
 154          ...
 155  }
 156  ...
 157  longjmp(buf);
 158 .DE
 159 The invocation to longjmp actually is a jump to the place of
 160 the last call to setjmp with the same argument (buf).
 161 As the calls to setjmp and longjmp are indistinguishable from
 162 normal procedure calls, the optimizer will not see the danger.
 163 No need to say that several loop optimizations will behave
 164 unexpectedly when presented with such pathological input.
 165 .PP
 166 Another way to deceive the flow of control is
 167 by using exception handling routines.
 168 Ada*
 169 .FS
 170 * Ada is a registered trademark of the U.S. Government
 171 (Ada Joint Program Office).
 172 .FE
 173 has clearly recognized the dangers of exception handling,
 174 but other languages (such as PL/I) have not.
 175 .[
 176 ada rationale
 177 .]
 178 .PP
 179 The optimizer will be more effective if the EM input contains
 180 some extra information about the source program.
 181 Especially the \fIregister message\fR is very important.
 182 These messages indicate which local variables may never be
 183 accessed indirectly.
 184 Most optimizations benefit significantly by this information.
 185 .PP
 186 The Inline Substitution technique needs to know how many bytes
 187 of formal parameters every procedure accesses.
 188 Only calls to procedures for which the EM code contains this information
 189 will be substituted in line.
 190 .NH 2
 191 Structure of the optimizer
 192 .PP
 193 The Global Optimizer is organized as a number of \fIphases\fR,
 194 each one performing some task.
 195 The main structure is as follows:
 196 .IP IC 6
 197 the Intermediate Code construction phase transforms EM into the
 198 intermediate code (ic) of the optimizer
 199 .IP CF
 200 the Control Flow phase extends the ic with control flow
 201 information and interprocedural information
 202 .IP OPTs
 203 zero or more optimization phases, each one performing one or
 204 more related optimizations
 205 .IP CA
 206 the Compact Assembly phase generates Compact Assembly Language EM code
 207 out of ic.
 208 .LP
 209 .PP
 210 An important issue in the design of a global optimizer is the
 211 interaction between optimization techniques.
 212 It is often advantageous to combine several techniques in
 213 one algorithm that takes into account all interactions between them.
 214 Ideally, one single algorithm should be developed that does
 215 all optimizations simultaneously and deals with all possible interactions.
 216 In practice, such an algorithm is still far out of  reach.
 217 Instead some rather ad hoc (albeit important) combinations are chosen,
 218 such as Common Subexpression Elimination and Register Allocation.
 219 .[
 220 prabhala sethi common subexpressions
 221 .]
 222 .[
 223 sethi ullman optimal code
 224 .]
 225 .PP
 226 In the Em Global Optimizer there is one separate algorithm for
 227 every technique.
 228 Note that this does not mean that all techniques are independent
 229 of each other.
 230 .PP
 231 In principle, the optimization phases can be run in any order;
 232 a phase may even be run more than once.
 233 However, the following rules should be obeyed:
 234 .IP -
 235 the Live Variable analysis phase (LV) must be run prior to
 236 Register Allocation (RA), as RA uses information outputted by LV.
 237 .IP -
 238 RA should be the last phase; this is a consequence of the way
 239 the interface between RA and the Code Generator is defined.
 240 .LP
 241 The ordering of the phases has significant impact on
 242 the quality of the produced code.
 243 In
 244 .[
 245 wulf overview production quality carnegie-mellon
 246 .]
 247 two kinds of phase ordering problems are distinguished.
 248 If two techniques A and B both take away opportunities of each other,
 249 there is a "negative" ordering problem.
 250 If, on the other hand, both A and B introduce new optimization
 251 opportunities for each other, the problem is called "positive".
 252 In the Global Optimizer the following interactions must be
 253 taken into account:
 254 .IP -
 255 Inline Substitution (IL) may create new opportunities for most
 256 other techniques, so it should be run as early as possible
 257 .IP -
 258 Use Definition analysis (UD) may introduce opportunities for LV.
 259 .IP -
 260 Strength Reduction may create opportunities for UD
 261 .LP
 262 The optimizer has a default phase ordering, which can
 263 be changed by the user.
 264 .NH 2
 265 Structure of this document
 266 .PP
 267 The remaining chapters of this document each describe one
 268 phase of the optimizer.
 269 For every phase, we describe its task, its design,
 270 its implementation, and its source files.
 271 The latter two sections are intended to aid the
 272 maintenance of the optimizer and
 273 can be skipped by the initial reader.
 274 .NH 2
 275 References
 276 .PP
 277 There are very
 278 few modern textbooks on optimization.
 279 Chapters 12, 13, and 14 of
 280 .[
 281 aho compiler design
 282 .]
 283 are a good introduction to the subject.
 284 Wulf et. al.
 285 .[
 286 wulf optimizing compiler
 287 .]
 288 describe one specific optimizing (Bliss) compiler.
 289 Anklam et. al.
 290 .[
 291 anklam vax-11
 292 .]
 293 discuss code generation and optimization in
 294 compilers for one specific machine (a Vax-11).
 295 Kirchgaesner et. al.
 296 .[
 297 optimizing ada compiler
 298 .]
 299 present a brief description of many
 300 optimizations; the report also contains a lengthy (over 60 pages)
 301 bibliography.
 302 .PP
 303 The number of articles on optimization is quite impressive.
 304 The Lowry and Medlock paper on the Fortran H compiler
 305 .[
 306 object code optimization Lowry Medlock
 307 .]
 308 is a classical one.
 309 Other papers on global optimization are.
 310 .[
 311 faiman optimizing pascal
 312 .]
 313 .[
 314 perkins sites
 315 .]
 316 .[
 317 harrison general purpose optimizing
 318 .]
 319 .[
 320 morel partial redundancies
 321 .]
 322 .[
 323 Mintz global optimizer
 324 .]
 325 Freudenberger
 326 .[
 327 freudenberger setl optimizer
 328 .]
 329 describes an optimizer for a Very High Level Language (SETL).
 330 The Production-Quality Compiler-Compiler (PQCC) project uses
 331 very sophisticated compiler techniques, as described in.
 332 .[
 333 wulf overview ieee
 334 .]
 335 .[
 336 wulf overview carnegie-mellon
 337 .]
 338 .[
 339 wulf machine-relative
 340 .]
 341 .PP
 342 Several Ph.D. theses are dedicated to optimization.
 343 Davidson
 344 .[
 345 davidson simplifying
 346 .]
 347 outlines a machine-independent peephole optimizer that
 348 improves assembly code.
 349 Katkus
 350 .[
 351 katkus
 352 .]
 353 describes how efficient programs can be obtained at little cost by
 354 optimizing only a small part of a program.
 355 Photopoulos
 356 .[
 357 photopoulos mixed code
 358 .]
 359 discusses the idea of generating interpreted intermediate code as well
 360 as assembly code, to obtain programs that are both small and  fast.
 361 Shaffer
 362 .[
 363 shaffer automatic
 364 .]
 365 describes the theory of automatic subroutine generation.
 366 .]
 367 Leverett
 368 .[
 369 leverett register allocation compilers
 370 .]
 371 deals with register allocation in the PQCC compilers.
 372 .PP
 373 References to articles about specific optimization techniques
 374 will be given in later chapters.