1 .\" $Id: peep.doc,v 1.5 1994/06/24 10:02:20 ceriel Exp $
3 Internal documentation on the peephole optimizer
5 from the Amsterdam Compiler Kit
9 Part of the Amsterdam Compiler Kit is a program to do
10 peephole optimization on an EM program.
11 The optimizer scans the program to match patterns from a table
12 and if found makes the optimization from the table,
13 and with the result of the optimization
14 it tries to find yet another optimization
15 continuing until no more optimizations are found.
17 Furthermore it does some optimizations that can not be called
18 peephole optimizations for historical reasons,
19 like branch chaining and the deletion of unreachable code.
21 The peephole optimizer consists of three parts
25 A program translating the table to internal format
27 C code compiled with the table to make the optimizer proper
29 In this document the table format, internal format and
30 data structures in the optimizer will be explained,
31 plus a hint on what the code does where it might not be obvious.
32 It is a simple program mostly.
36 The driving table consists of pattern/replacement pairs,
37 in principle one per line,
38 although a line starting with white space is considered
39 a continuation line for the previous.
40 The general format is:
42 optimization : pattern ':' replacement '\en'
44 pattern : EMlist optional_boolean_expression
46 replacement : EM_plus_operand_list
48 Example of a simple one
50 loc stl $1==0 : zrl $2
52 There is no real limit for the length of the pattern or the replacement,
53 the replacement might even be longer than the pattern,
54 and expressions can be made arbitrarily complicated.
56 The expressions in the table are made of the following pieces:
60 $\fIn\fP, standing for the operand of the \fIn\fP'th EM
61 instruction in the pattern,
62 undefined if that instruction has no operand.
64 w, standing for the wordsize of the code optimized.
66 p, for the pointersize.
68 defined(expr), true if expression is defined
70 samesign(expr,expr), true if expressions have the same sign.
72 sfit(expr,expr), ufit(expr,expr),
73 true if the first expression fits signed or unsigned in the number
74 of bits given in the second expression.
77 first expression rotated left the number of bits given by the second expression.
80 true if the local with the expression as number is not a candidate to put
83 rom(\fIn\fP,expr), contents of the rom descriptor at index expr that
84 is associated with the global label that should be the argument of
85 the \fIn\fP'th EM instruction.
86 Undefined if such a thing does not exist.
88 The usual arithmetic operators may be used on integer values,
89 if any operand is undefined the expression is undefined,
90 except for the defined() function above.
91 An undefined expression used for its truth value is false.
92 All arithmetic on local label operands is forbidden,
93 only things allowed are tests for equality.
94 Arithmetic on global labels makes sense,
95 i.e. one can add a global label and a constant,
96 but not two global labels.
98 In the table one can use five additional EM instructions in patterns.
101 Stands for a local label
103 Load Local Pointer, translates into a
107 depending on the relationship between wordsize and pointersize.
109 Load External Pointer, translates into a
119 Store External Pointer,
124 There is only one peephole optimizer,
125 so the substitutions to be made for the last four instructions
126 are made at run time before the first optimizations are made.
130 The translating program,
132 converts the table into an array of bytes where all
133 patterns follow unaligned.
134 Format of a pattern is:
136 One byte for high byte of hash value,
137 will be explained later on.
139 Two bytes for the index of the next pattern in a chain.
143 * An integer is encoded as a byte when less than 255,
144 otherwise as a byte containing 255 followed by two
145 bytes with the real value.
149 The list of pattern opcodes, one per byte.
151 An integer expression index, 0 if not used.
153 An integer, replacement length.
155 A list of pairs consisting of a one byte opcode and an integer
158 The expressions are kept in an array of triples,
159 implementing a binary tree.
162 program tries to minimize the number of triples by reusing
163 duplicates and even reverses the operands of commutative operators
164 when doing so would spare a triple.
166 A tour through the sources
168 Now we will walk through the sources and note things of interest.
172 The header files are the place where data structures and options reside.
176 In the header file alloc.h several defines can be used to select various
177 kinds of core allocation schemes.
178 This is important on small machines like the PDP-11 since a complete
179 procedure must be in core at the same space,
180 and the peephole optimizer should not be the limiting factor in
181 determining the maximum size of procedures if possible.
184 USEMALLOC, standard malloc() and free() are used instead of the own
185 core allocation package.
186 Not recommended unless the own package does not work on some bizarre
189 COREDEBUG, prints large amounts of information about core management.
190 Not recommended unless the code is changed and it stops working.
192 SEPID, defining this will add an extra procedure that will
193 go through a lot of work to scrape the last bytes together if the
194 system won't provide more.
195 This is not a good idea if memory is scarce and code and data reside
196 in the same spaces, since the room used by the procedure might well
197 be more than the room saved.
199 STACKROOM, number of shorts used in stack space.
200 This is used if memory is scarce and stack space and data space are
202 On the PDP-11 a UNIX process starts with an 8K stack segment which
203 cannot be transferred to the data segment.
204 Under these conditions one can use a lot of the stack space for storage.
208 Just defines the assert macro.
209 When compiled with -DNDEBUG all asserts will be off.
213 Gives external definitions of variables used by more than one module.
217 Defines the structures used to keep instructions,
218 one structure per line of EM code,
219 and the structure to keep arguments of pseudos,
220 one structure per argument.
221 Both structures essentially contain a pointer to the next,
223 and a union containing information depending on the type.
224 Core is allocated only for the part of the union used.
230 has a very compact encoding for small integers,
231 they are encoded in the type field.
232 On the PDP-11 this gives a line structure of only 4 bytes for most
237 Contains definition of the struct used for symbol table management,
238 global labels and procedure names are kept in one table.
242 If one defines the DIAGOPT option in this header file,
243 for every optimization performed a number is written on stderr.
244 The number gives the number of the pattern in the table
245 or one of the four special numbers in this header file.
249 Contains one settable option,
251 If this is not defined the optimizer can only optimize programs
252 with wordsize 2 and pointersize 2.
253 Set this only if it must be run on a Z80 or something pathetic like that.
255 Other defines here should not be touched.
259 Contains defines of indices in a pattern,
260 definition of the expression triples,
261 definitions of the various expression operators
262 and definition of the result struct where expression results are put.
264 This header file is the main one that is also included by
269 This one contains definitions
270 for the local label table structs
271 and for the struct where all information for one procedure is kept.
272 This is in one struct so it can be saved easily when recursive
273 procedures have to be resolved.
277 Contains the data structure used by the top element size computation.
281 Collection of typedefs to be used by almost all modules.
285 The C code will now be the center of our attention.
286 We will make a walk through the sources and we will try
287 to follow the sources in a logical order.
292 The main.c module contains the main() function.
293 Here nothing spectacular happens,
294 only thing of interest is the handling of flags:
296 This is an instruction to the peephole optimizer to perform
297 one of its auxiliary functions, the generation of a library module.
298 This makes the peephole optimizer write its output on a temporary file,
299 and at the end making the real output by first generating a list
300 of exported symbols and then copying the temporary file behind it.
302 Disables all optimization.
303 Only thing the optimizer does now is filling in the blank after the
305 pseudo and resolving recursive procedures.
307 The place where main() is left is the call to getlines() which brings
312 This module reads the EM code and constructs a list of
317 linked together backwards,
318 i.e. the first instruction read is the last in the list.
319 Pseudos are handled here also,
320 for most pseudos this just means that a chain of argument records
321 is linked into the linked line list but some pseudos get special attention:
323 This pseudo is acted upon right away.
324 Lines read are shuffled around according to instruction.
326 Some messages are acted upon.
330 The input is drained, just in case it is a pipe.
331 After that the optimizer exits.
333 The do not optimize flag is set.
334 Acts just like -n on the command line.
336 The word- and pointersize are read,
337 complain if we are not able to handle this.
339 We take notice of the offset of this local.
340 See also comments in the description of peephole.c
343 A new procedure starts, if we are already in one save the status,
344 else process collected input.
345 Collect information about this procedure and if already in a procedure
346 call getlines() recursively.
348 Process collected input.
350 The phrase "process collected input" is used twice,
355 This module contains the entry point process() which is called at any
356 time the collected input must be processed.
357 It calls a variety of other routines to get the real work done.
358 Routines in this module are in chronological order:
360 Marks all symbols seen until now as known,
361 i.e. it is now known whether their scope is local or global.
362 This information is used again during output.
364 Runs through the chain of pseudos to give values to data labels.
365 This needs an extra pass.
366 It cannot be done during the getlines pass, since an
368 pseudo could destroy things.
369 Nor can it be done during the backward pass since it is impossible
370 to do good fragment numbering backward.
372 Checks whether all local labels referenced are defined.
373 It needs to be sure about this since otherwise the
374 semi global optimizations made cannot work.
376 This routine finds the final destination for each label in the procedure.
377 Labels followed by unconditional branches or other labels are marked during
378 the peephole fase and this leeds to chains of identical labels.
379 These chains are followed here, and in the local label table each label
380 has associated with it its replacement label, after this procedure is run.
381 Care is taken in this routine to prevent a loop in the program to
382 cause the optimizer to loop.
384 This routine empties the local label table after everything
387 But before this can all be done,
388 the backward linked list of instructions first has to be reversed,
393 The routine backward has a number of functions:
395 It reverses the backward linked list, making two forward linked lists,
396 one for the instructions and one for the pseudos.
398 It notes the last occurrence of data labels in the backward linked list
399 and puts it in the global symbol table.
400 This is of course the first occurence in the procedure.
401 This information is needed to decide whether the symbols are global
402 or local to this module.
404 It decides about the fragment boundaries of data blocks.
405 Fragments are numbered backwards starting at 3.
406 This is done to be able to make the type of an expression
407 containing a symbol equal to its fragment.
408 This type can then not clash with the types integer and local label.
410 It allocates a rom buffer to every data label with a rom behind
411 it, if that rom contains only plain integers at the start.
413 The first thing done after process() has called backward() and some
414 of its own little routines is a call to the real routine,
415 the one that does the work the program was written for
419 The first routines in peephole.c
420 implement a linked list for the offsets of local variables
421 that are candidates for a register implementation.
422 Several patterns use the notreg() function,
423 since it is forbidden to combine a load of that variable
424 with the load of another and
425 it is not allowed to take the address of that variable.
427 The routine peephole hashes the patterns the first time it is called
428 after which it doesn't do much more than calling optimize.
429 But first hashpatterns().
431 The patterns are hashed at run time of the optimizer because of
438 instructions added to the instruction set in this optimizer.
439 These are first replaced everywhere in the table by the correct
440 replacement after which the first three instructions of the
441 pattern are hashed and the pattern is linked into one of the
443 There is a define CHK_HASH in this module that
444 can be set if the randomness of the hashing
445 function is not trusted.
447 The attention now shifts to optimize().
448 This routine calls basicblock() for every piece of code between two labels.
449 It also notes which labels have another label or a branch behind them
450 so the relabel() routine from process.c can do something with that.
452 Basicblock() keeps making passes over its basic block
453 until no more optimizations are found.
454 This might be inefficient if there is a long basicblock with some
455 deep recursive optimization in one part of it.
456 The entire basic block is then scanned a lot of times just for
458 The alternative is backing up after making an optimization and running
459 through the same code again, but that is difficult
460 in a single linked list.
462 It hashes instructions and calls trypat() for every pattern that has
463 a full hash value match,
464 i.e. lower byte and upper byte equal.
465 Longest pattern is tried first.
467 Trypat() checks length and opcodes of the pattern.
468 If correct it fills the iargs[] array with argument values
469 and calculates the expression.
470 If that is also correct the work shifts to tryrepl().
472 Tryrepl() generates the list of replacement instructions,
473 links it into the list and returns true.
474 Why then the name tryrepl() if it always succeeds?
475 Well, there is a mechanism in the optimizer,
476 unused until today that makes it possible to do optimizations that cannot
477 be described by the table.
478 It is possible to give a number as a replacement which will cause the
479 optimizer to call a routine special() to do some work.
480 This routine might decide not to do an optimization and return false.
482 The last routine that is called from process() is putline()
483 to write the optimized code, bringing us to
487 Contains the routines used by the top element size computation phase,
488 which is run after the peephole-optimisation.
489 The main routine of tes.c is tes_instr(). This looks at an instruction and
490 decides the size of the element on top of the stack after the instruction
491 is executed. When a label is defined or used, the size of the top element
492 is remembered for later use. When the information in consistent throuhout
493 the procedure, it is passed to the code generator by means of an ms_tes
498 The major part of putline.c is the standard set of routines
499 that makes EM compact code.
500 The extra functions performed are:
502 For every occurence of a global symbol it might be necessary to
510 That task is performed.
514 instructions are optimized here,
516 instructions added for
518 instructions and superfluous
520 instructions deleted.