1 .\" $Id: nopt.doc,v 2.5 1994/06/24 10:02:13 ceriel Exp $
3 A Tour of the New Peephole Optimizer
9 The peephole optimizer consists of four major parts:
11 the table describing the optimization to be performed
13 a program to parse these tables and build input and output routines to
14 interface to the library and a dfa based routine to recognize patterns and
15 make the requested replacements.
17 common routines for the library that are independent of the table of a)
19 a stand alone version of the optimizer.
21 The library conforms to the
23 module interface but with routine names of the form
25 replaced by names like
27 Furthermore there is also no routine
32 The library module results in calls to the usual
34 module. It is possible to write a front end so that it can call either the
37 module or this new module by adding
45 This will map all calls to the routine
47 into a call to the routine
51 We shall now describe each of these major parts in some detail.
54 The optimization table
58 contains the patterns of EM instructions to be recognized by the optimizer
59 and the EM instructions to replace them. Each pattern may have an
60 optional restriction that must be satisfied before the replacement is made.
61 The syntax of the table will be described using extended BNF notation
67 [...] - are used to group items
68 | - is used to separate alternatives
70 ? - indicates item is optional
71 * - indicates item is repeated zero or more times
72 + - indicates item is repeated one or more times
75 The format of each rule in the table is:
78 rule : pattern global_restriction? ':' replacement
82 Each rule must be on a single line except that it may be broken after the
83 colon if the next line begins with a tab character.
84 The pattern has the syntax:
87 pattern : [ EM_mnem [ local_restriction ]? ]+
89 EM-mnem : "An EM instruction mnemonic"
94 and consists of a sequence of one or more EM instructions or
96 which stands for a defined instruction label. Each EM-mnem may optionally be
97 followed by a local restriction on the argument of the mnemonic and take
98 one of the following forms depending on the type of the EM instruction it
102 local_restriction : normal_restriction
103 | opt_arg_restriction
104 | ext_arg_restriction
108 A normal restriction is used after all types of EM instruction except for
109 those that allow an optional argument, (such as
111 ) or those involving external names, (such as
117 normal_restriction : [ rel_op ]? expression
128 If the rel_op is missing, the equality
130 operator is assumed. The general form of expression is defined later but
131 basically it involves simple constants, references to EM_mnem arguments
132 that appear earlier in the pattern and expressions similar to those used
135 The form of the restriction after those EM instructions like
137 whose arguments are optional takes the form:
140 opt_arg_restriction : normal_restriction
150 indicate that the argument is present
151 or absent respectively. The normal restriction form implies that the
152 argument is present and satisfies the restriction.
154 The form of the restriction after those EM instructions like
156 whose arguments refer to external object take the form:
159 ext_arg_restriction : patarg offset_part?
161 offset_part : [ '+' | '-' ] expression
165 Such an argument has one of three forms: a offset with no name, an
166 offset form a name or an offset from a label. With no offset part
167 the restriction requires the argument to be identical to a previous
168 external argument. With an offset part it requires an identical name
169 part, (either empty, same name or same label) and supplies a relationship
170 among the offset parts. It is possible to refer to test for the same
171 external argument, the same name or to obtain the offset part of an external
178 functions given below.
180 The general form of an expression is:
183 expression : expression binop expression
186 | bin_function '(' expression ',' expression ')'
187 | ext_function '(' patarg ',' patarg ')'
188 | 'offset' '(' patarg ')'
199 bin_function : 'sfit'
208 ext_function : 'samenam'
213 binop : "As for C language"
214 unaryop : "As for C language"
219 refers to the first, second, etc. argument in the pattern and it is
220 required to refer to a pattern that appears earlier in the pattern
225 refer to the word size and pointer size (in bytes) respectively.
228 refers to twice the word size.
230 various function test for:
232 the first argument fits as a signed value of
233 the number of bit specified by the second argument.
235 as for sfit but for unsigned values.
237 the first argument has the same sign as the second.
239 the value of the first argument rotated by the number of bit specified
240 by the second argument.
242 both arguments refer to externals and have either no name, the same name
245 both arguments refer to the same external.
247 the argument is an external and this yields it offset part.
250 The global restriction takes the form:
253 global_restriction : '?' expression
257 and is used to express restrictions that cannot be expressed as simple
258 restrictions on a single argument or are can be expressed in a more
259 readable fashion as a global restriction. An example of such a rule is:
262 dup w ldl stf ? p==2*w : ldl $2 stf $3 ldl $2 lof $3
265 which says that this rule only applies if the pointer size is twice the
269 Incompatibilities with Previous Optimizer
271 The current table format is not compatible with previous versions of the
272 peephole optimizer tables. In particular the previous table had no provision
273 for local restrictions and only the equivalent of the global restriction.
276 character that announces the presence of the optional global restriction was
277 not required. The previous optimizer performed a number of other tasks that
278 were unrelated to optimization that were possible because the old optimizer
279 read the EM code for a complete procedure at a time. This included tasks such
280 as register variable reference counting and moving the information regarding
281 the number of bytes of local storage required by a procedure from it
283 pseudo instruction to it's
285 pseudo instruction. These tasks are no longer done by this module but have
286 been moved to other modules or programs in the pipeline. The register variable
287 reference counting is now performed by the front end. The reordering of
288 code, such as the moving of mes instructions and the local storage
289 requirements from the end to beginning of procedures, is now performed using
290 the insertpart mechanism in the
295 The removal of dead code is performed by the global optimizer.
298 available in the old tables are no longer available as they rely on
299 information that is not available to the current program.
305 The previous optimizer allowed the use of
311 in patterns. For example
315 if the pointer size was the same as the word size, or for
317 if the pointer size was twice the word size.
318 In the current optimizer it is necessary to include two patterns for each
319 such single pattern in the old table. For example for a pattern containing
321 there would be one pattern with
323 and with a global restriction of the form
325 and another pattern with ldl and a global restriction of the form
331 The program to parse the tables and build the pattern table dependent dfa
332 routines is built from the files:
336 LLGen source file defining syntax of table
338 Lex sources file defining form of tokens in table.
340 Uses the data in the library
342 to initialize the lexical analyzer to recognize EM instruction mnemonics.
344 Routines to output the dfa when it has been constructed. It outputs the files
349 Routines to output the file
351 defined in the next section.
353 Routines to analyze patterns to find how to continue matching after a
354 successful replacement or failed match.
357 The parser checks that the tables conform to the syntax outlined in the
358 previous section and also makes a number of semantic checks on their
359 validity. Further versions could make further checks such as looking for
360 cycles in the rules or checking that each replacement leaves the same
361 number of bytes on the stack as the pattern it replaces. The parser
362 builds an internal dfa representation of the rules by combining rules with
363 common prefixes. All local and global restrictions are combined into a single
364 test to be performed are a complete pattern has been detected in the input.
365 The idea is to build a structure so that each of the patterns can be matched
366 and then the corresponding tests made and the first that succeeds is replaced.
367 If two rules have the same pattern and both their tests also succeed the one
368 that appears first in the tables file will be done. Somewhat less obvious
369 is that if one pattern is a proper prefix of a longer pattern and its test
370 succeeds then the second pattern will not be checked for.
372 A major task of the parser if to decide on the action to take when a rule has
373 been partially matched or when a pattern has been completely matched but its
374 test does not succeed. This requires a search of all patterns to see if any
375 part of the part matched could be part of some other pattern. for example
376 given the two patterns:
379 loc adi w loc adi w : loc $1+$3 adi w
380 loc adi w loc sbi w : loc $1-$3 adi w
383 If the first pattern fails after seeing the input:
389 the parser will still need to check whether the second pattern matches.
390 This requires a decision on how to fix up any internal data structures in
391 the dfa matcher, such as moving some instructions from the pattern to the
392 output queue and moving the pattern along and then deciding what state
393 it should continue from. Similar decisions are requires after a pattern
394 has been replaced. For example if the replacement is empty it is necessary
399 is the length of the longest pattern in the tables.
402 Structure of the Resulting Library
405 The major data structures maintained by the library consist of three queues;
408 queue of instructions awaiting output, a
410 queue containing instructions that match the current prefix, and a
412 queue of instructions that have been backed up over and need to be reparsed
413 for further pattern matches.
414 These three queues are maintained in a single fixed size buffer as explained
415 in more detail in the next section.
416 Also, after a successful match, a replacement queue is constructed.
420 If no errors are detected by the parser in the tables it output the following
421 files if they have changed from the existing version of the file:
423 this contains the dfa encoded into a number of arrays using the technique
424 of row displacement for compacted sparse matricies. Given an opcode and
425 the current state, the value of
427 is consulted to obtain a pointer into the array
429 If this pointer in zero or the
431 field of the addressed structure does
432 not correspond to the curerent state then it is known there is no entry for
433 this opcode/state pair and the
435 array is consulted instead.
436 If the check field does match then the
438 field contains the new state.
439 After each transition the array
441 is consulted to see if this state corresponds to a final state
442 (i.e. a complete pattern) and if so the corresponding function is called.
444 this contains external declarations of transition routines with names like
449 These are called when there a transition to state
451 that corresponds to a
452 complete pattern. Any tests are performed if necessary to confirm that the
453 pattern matches and then the replacement instructions are placed on the
454 output queue and the routine
456 is called to make the replacement and if backup the amount required.
457 If there are a number of patterns with the same instructions but different
458 tests, these will all appear in the same routine and the tests performed in
459 the order they appear in the original
463 this contains an entry for every EM instruction (plus
465 ) giving information on how to build a routine with the name
467 for the library version of the module.
468 If the EM instruction does not appear in the tables
469 patterns at all then the dfa routine is called to flush any current queued
470 output and the the output
472 routine is called. If the EM instruction does appear in a pattern then the
473 instruction data structure fields are
474 initialized and it is added onto the end of the pattern queue.
475 The dfa routines are then called to attempted to make a transition.
476 This file is input to the
482 The following files contain code that is independent of the pattern tables:
484 this is used only in the stand alone version of the optimizer and consists
485 of code to open the input file, read the input using the
487 module and call the dfa routines. This version does not require the routines
488 constructed from the incalls.r file described above.
490 general routines to initialize, and maintain the data structures. The file
493 etc are defined here. Also defined are routines for flushing the output queue
498 module and moving instructions from the output to the backup queue.
499 Routines to free the strings stored in instructions
507 .I fco_ptyp are also defined. These strings are copied to a large array that
510 if it overflows. The strings can be thrown away on any flush that occurs when
511 the backup queue is empty.
513 contains routines to build the data structure from the input
515 routines and place the structure on the pattern queue. These routines are also
516 used to build the data structures when a replacement is constructed.
518 routines to implement the external functions used in the pattern table.
521 The following files are also used in building the module library:
525 program is used to produce individual C files with names like
527 each containing a single function definition and then call the
529 compiler to produce a single output file.
530 This enables the loader to only load those routines that are actually
531 needed when the library is loaded.
533 this file is like the
535 file produced by the parser but is built by hand and handles the pseudo
536 EM instructions. It is also processed by
542 The output, pattern and backup queues are maintained in fixed length array,
546 (a constant declared in nopt.h) at run time.
547 It consists of an array of the
549 data structure used by the
552 At any time the pointers
556 point to the beginning and end of the current pattern prefix that corresponds
557 to the current state. Any instructions on the backup queue are between
561 If there are no instructions on the backup queue then
564 The size of the replacement queue is set to the length of the maximum
565 replacement length by the tables output by the parser.
568 The fixed size of the buffer causes no difficulty in
569 practice and can only result in some potential optimizations being missed.
570 When space for a new instruction is required and the buffer is full the
573 is called to flush half the buffer and move all the data structures left.
574 It should be noted that it is not possible to statically determine the
575 maximum possible size for these queues as they need to be unbounded in
583 with the input consisting of
589 instructions requires an output queue length of
591 to find all possible replacements.