doc/nopt.doc

   1 .\" $Id: nopt.doc,v 2.5 1994/06/24 10:02:13 ceriel Exp $
   2 .TL
   3 A Tour of the New Peephole Optimizer
   4 .AU
   5 B. J. McKenzie
   6 .NH
   7 Introduction
   8 .LP
   9 The peephole optimizer consists of four major parts:
  10 .IP a)
  11 the table describing the optimization to be performed
  12 .IP b)
  13 a program to parse these tables and build input and output routines to
  14 interface to the library and a dfa based routine to recognize patterns and
  15 make the requested replacements.
  16 .IP c)
  17 common routines for the library that are independent of the table of a)
  18 .IP d)
  19 a stand alone version of the optimizer.
  20 .LP
  21 The library conforms to the
  22 .I EM_CODE(3)
  23 module interface but with routine names of the form
  24 .BI C_ xxx
  25 replaced by names like
  26 .BI O_ xxx.
  27 Furthermore there is also no routine
  28 .I O_getid
  29 and no variable
  30 .I O_tmpdir
  31 in the module.
  32 The library module results in calls to the usual
  33 .I EM_CODE(3)
  34 module. It is possible to write a front end so that it can call either the
  35 normal
  36 .I EM_CODE(3)
  37 module or this new module by adding
  38 .B
  39 #define PEEPHOLE
  40 .R
  41 before the line
  42 .B
  43 #include <em.h>
  44 .R
  45 This will map all calls to the routine
  46 .BI C_ xxx
  47 into a call to the routine
  48 .BI O_ xxx.
  49
  50 .LP
  51 We shall now describe each of these major parts in some detail.
  52
  53 .NH
  54 The optimization table
  55 .LP
  56 The file
  57 .I patterns
  58 contains the patterns of EM instructions  to be recognized by the optimizer
  59 and the EM instructions to replace them. Each pattern may have an
  60 optional restriction that must be satisfied before the replacement is made.
  61 The syntax of the table will be described using extended BNF notation
  62 used by
  63 .I LLGen
  64 where:
  65 .DS
  66 .I
  67         [...]   - are used to group items
  68         |       - is used to separate alternatives
  69         ;       - terminates a rule
  70         ?       - indicates item is optional
  71         *       - indicates item is repeated zero or more times
  72         +       - indicates item is repeated one or more times
  73 .R
  74 .DE
  75 The format of each rule in the table is:
  76 .DS
  77 .I
  78         rule    : pattern global_restriction? ':' replacement
  79                 ;
  80 .R
  81 .DE
  82 Each rule must be on a single line except that it may be broken after the
  83 colon if the next line begins with a tab character.
  84 The pattern has the syntax:
  85 .DS
  86 .I
  87         pattern : [ EM_mnem [ local_restriction ]? ]+
  88                 ;
  89         EM-mnem : "An EM instruction mnemonic"
  90                 | 'lab'
  91                 ;
  92 .R
  93 .DE
  94 and consists of a sequence of one or more EM instructions or
  95 .I lab
  96 which stands for a defined instruction label. Each EM-mnem may optionally be
  97 followed by a local restriction on the argument of the mnemonic and take
  98 one of the following forms depending on the type of the EM instruction it
  99 follows:
 100 .DS
 101 .I
 102         local_restriction       : normal_restriction
 103                                 | opt_arg_restriction
 104                                 | ext_arg_restriction
 105                                 ;
 106 .R
 107 .DE
 108 A normal restriction is used after all types of EM instruction except for
 109 those that allow an optional argument, (such as
 110 .I adi
 111 ) or those involving external names, (such as
 112 .I lae
 113 )
 114 and takes the form:
 115 .DS
 116 .I
 117         normal_restriction      : [ rel_op ]? expression
 118                                 ;
 119         rel_op  : '=='
 120                 | '!='
 121                 | '<='
 122                 | '<'
 123                 | '>='
 124                 | '>'
 125                 ;
 126 .R
 127 .DE
 128 If the rel_op is missing, the equality
 129 .I ==
 130 operator is assumed. The general form of expression is defined later but
 131 basically it involves simple constants, references to EM_mnem arguments
 132 that appear earlier in the pattern and expressions similar to those used
 133 in C expressions.
 134
 135 The form of the restriction after those EM instructions like
 136 .I adi
 137 whose arguments are optional takes the form:
 138 .DS
 139 .I
 140         opt_arg_restriction     : normal_restriction
 141                                 | 'defined'
 142                                 | 'undefined'
 143                                 ;
 144 .R
 145 .DE
 146 The
 147 .I defined
 148 and
 149 .I undefined
 150 indicate that the argument is present
 151 or absent respectively. The normal restriction form implies that the
 152 argument is present and satisfies the restriction.
 153
 154 The form of the restriction after those EM instructions like
 155 .I lae
 156 whose arguments refer to external object take the form:
 157 .DS
 158 .I
 159         ext_arg_restriction     : patarg  offset_part?
 160                                 ;
 161         offset_part             : [ '+' | '-' ] expression
 162                                 ;
 163 .R
 164 .DE
 165 Such an argument has one of three forms: a offset with no name, an
 166 offset form a name or an offset from a label. With no offset part
 167 the restriction requires the argument to be identical to a previous
 168 external argument. With an offset part it requires an identical name
 169 part, (either empty, same name or same label) and supplies a relationship
 170 among the offset parts. It is possible to refer to test for the same
 171 external argument, the same name or to obtain the offset part of an external
 172 argument using the
 173 .I sameext
 174 ,
 175 .I samenam
 176 and
 177 .I offset
 178 functions given below.
 179 .LP
 180 The general form of an expression is:
 181 .DS
 182 .I
 183         expression      : expression binop expression
 184                         | unaryop expression
 185                         | '(' expression ')'
 186                         | bin_function '(' expression ',' expression ')'
 187                         | ext_function '(' patarg ',' patarg ')'
 188                         | 'offset' '(' patarg ')'
 189                         | patarg
 190                         | 'p'
 191                         | 'w2'
 192                         | 'w'
 193                         | INTEGER
 194                         ;
 195 .R
 196 .DE
 197 .DS
 198 .I
 199         bin_function    : 'sfit'
 200                         | 'ufit'
 201                         | 'samesign'
 202                         | 'rotate'
 203                         ;
 204 .R
 205 .DE
 206 .DS
 207 .I
 208         ext_function    : 'samenam'
 209                         | 'sameext'
 210                         ;
 211         patarg          : '$' INTEGER
 212                         ;
 213         binop           : "As for C language"
 214         unaryop         : "As for C language"
 215 .R
 216 .DE
 217 The INTEGER in the
 218 .I patarg
 219 refers to the first, second, etc. argument in the pattern and it is
 220 required to refer to a pattern that appears earlier in the pattern
 221 The
 222 .I w
 223 and
 224 .I p
 225 refer to the word size and pointer size (in bytes) respectively.
 226 The
 227 .I w2
 228 refers to twice the word size.
 229 The
 230 various function test for:
 231 .IP sfit 10
 232 the first argument fits as a signed value of
 233 the number of bit specified by the second argument.
 234 .IP ufit 10
 235 as for sfit but for unsigned values.
 236 .IP samesign 10
 237 the first argument has the same sign as the second.
 238 .IP rotate 10
 239 the value of the first argument rotated by the number of bit specified
 240 by the second argument.
 241 .IP samenam 10
 242 both arguments refer to externals and have either no name, the same name
 243 or same label.
 244 .IP sameext 10
 245 both arguments refer to the same external.
 246 .IP offset 10
 247 the argument is an external and this yields it offset part.
 248
 249 .LP
 250 The global restriction takes the form:
 251 .DS
 252 .I
 253         global_restriction      : '?' expression
 254                                 ;
 255 .R
 256 .DE
 257 and is used to express restrictions that cannot be expressed as simple
 258 restrictions on a single argument or are can be expressed in a more
 259 readable fashion as a global restriction. An example of such a rule is:
 260 .DS
 261 .I
 262         dup w ldl stf  ? p==2*w : ldl $2 stf $3 ldl $2 lof $3
 263 .R
 264 .DE
 265 which says that this rule only applies if the pointer size is twice the
 266 word size.
 267
 268 .NH
 269 Incompatibilities with Previous Optimizer
 270 .LP
 271 The current table format is not compatible with previous versions of the
 272 peephole optimizer tables. In particular the previous table had no provision
 273 for local restrictions and only the equivalent of the global restriction.
 274 This meant that our
 275 .I '?'
 276 character that announces the presence of the optional global restriction was
 277 not required. The previous optimizer performed a number of other tasks that
 278 were unrelated to optimization that were possible because the old optimizer
 279 read the EM code for a complete procedure at a time. This included tasks such
 280 as register variable reference counting and moving the information regarding
 281 the number of bytes of local storage required by a procedure from it
 282 .I end
 283 pseudo instruction to it's
 284 .I pro
 285 pseudo instruction. These tasks are no longer done by this module but have
 286 been moved to other modules or programs in the pipeline. The register variable
 287 reference counting is now performed by the front end. The reordering of
 288 code, such as the moving of mes instructions and the local storage
 289 requirements from the end to beginning of procedures, is now performed using
 290 the insertpart mechanism in the
 291 .I EM_CODE
 292 (or
 293 .I EM_OPT
 294 ) module.
 295 The removal of dead code is performed by the global optimizer.
 296 Various
 297 .I ext_functions
 298 available in the old tables are no longer available as they rely on
 299 information that is not available to the current program.
 300 These are the
 301 .I notreg
 302 and the
 303 .I rom
 304 functions.
 305 The previous optimizer allowed the use of
 306 .I LLP,
 307 .I LEP,
 308 .I SLP
 309 and
 310 .I SEP
 311 in patterns. For example
 312 .I LLP
 313 stood for either
 314 .I lol
 315 if the pointer size was the same as the word size, or for
 316 .I ldl
 317 if the pointer size was twice the word size.
 318 In the current optimizer it is necessary to include two patterns for each
 319 such single pattern in the old table. For example for a pattern containing
 320 .I LLP
 321 there would be one pattern with
 322 .I lol
 323 and with a global restriction of the form
 324 .I p=w
 325 and another pattern with ldl and a global restriction of the form
 326 .I p=2*w.
 327
 328 .NH
 329 The Parser
 330 .LP
 331 The program to parse the tables and build the pattern table dependent dfa
 332 routines is built from the files:
 333 .IP parser.h 15
 334 header file
 335 .IP parser.g 15
 336 LLGen source file defining syntax of table
 337 .IP syntax.l 15
 338 Lex sources file defining form of tokens in table.
 339 .IP initlex.c 15
 340 Uses the data in the library
 341 .I em_data.a
 342 to initialize the lexical analyzer to recognize EM instruction mnemonics.
 343 .IP outputdfa.c 15
 344 Routines to output the dfa when it has been constructed. It outputs the files
 345 .I dfa.c
 346 and
 347 .I trans.c
 348 .IP outcalls.c 15
 349 Routines to output the file
 350 .I incalls.r
 351 defined in the next section.
 352 .IP findworst.c 15
 353 Routines to analyze patterns to find how to continue matching after a
 354 successful replacement or failed match.
 355
 356 .LP
 357 The parser checks that the tables conform to the syntax outlined in the
 358 previous section and also makes a number of semantic checks on their
 359 validity. Further versions could make further checks such as looking for
 360 cycles in the rules or checking that each replacement leaves the same
 361 number of bytes on the stack as the pattern it replaces. The parser
 362 builds an internal dfa representation of the rules by combining rules with
 363 common prefixes. All local and global restrictions are combined into a single
 364 test to be performed are a complete pattern has been detected in the input.
 365 The idea is to build a structure so that each of the patterns can be matched
 366 and then the corresponding tests made and the first that succeeds is replaced.
 367 If two rules have the same pattern and both their tests also succeed the one
 368 that appears first in the tables file will be done. Somewhat less obvious
 369 is that if one pattern is a proper prefix of a longer pattern and its test
 370 succeeds then the second pattern will not be checked for.
 371
 372 A major task of the parser if to decide on the action to take when a rule has
 373 been partially matched or when a pattern has been completely matched but its
 374 test does not succeed. This requires a search of all patterns to see if any
 375 part of the part matched could be part of some other pattern. for example
 376 given the two patterns:
 377 .DS
 378 .I
 379         loc adi w loc adi w : loc $1+$3 adi w
 380         loc adi w loc sbi w : loc $1-$3 adi w
 381 .R
 382 .DE
 383 If the first pattern fails after seeing the input:
 384 .DS
 385 .I
 386         loc adi loc
 387 .R
 388 .DE
 389 the parser will still need to check whether the second pattern matches.
 390 This requires a decision on how to fix up any internal data structures in
 391 the dfa matcher, such as moving some instructions from the pattern to the
 392 output queue and moving the pattern along and then deciding what state
 393 it should continue from. Similar decisions  are requires after a pattern
 394 has been replaced. For example if the replacement is empty it is necessary
 395 to backup
 396 .I n-1
 397 instructions where
 398 .I n
 399 is the length of the longest pattern in the tables.
 400
 401 .NH
 402 Structure of the Resulting Library
 403
 404 .LP
 405 The major data structures maintained by the library consist of three queues;
 406 an
 407 .I output
 408 queue of instructions awaiting output, a
 409 .I pattern
 410 queue containing instructions that match the current prefix, and a
 411 .I backup
 412 queue of instructions that have been backed up over and need to be reparsed
 413 for further pattern matches.
 414 These three queues are maintained in a single fixed size buffer as explained
 415 in more detail in the next section.
 416 Also, after a successful match, a replacement queue is constructed.
 417
 418
 419 .LP
 420 If no errors are detected by the parser in the tables it output the following
 421 files if they have changed from the existing version of the file:
 422 .IP dfa.c 10
 423 this contains the dfa encoded into a number of arrays using the technique
 424 of row displacement for compacted sparse matricies. Given an opcode and
 425 the current state, the value of
 426 .I OO_base[OO_state]
 427 is consulted to obtain a pointer into the array
 428 .I OO_checknext.
 429 If this pointer in zero or the
 430 .I check
 431 field of the addressed structure does
 432 not correspond to the curerent state then it is known there is no entry for
 433 this opcode/state pair and the
 434 .I OO_default
 435 array is consulted instead.
 436 If the check field does match then the
 437 .I next
 438 field contains the new state.
 439 After each transition the array
 440 .I OO_ftrans
 441 is consulted to see if this state corresponds to a final state
 442 (i.e. a complete pattern) and if so the corresponding function is called.
 443 .IP trans.c 10
 444 this contains external declarations of transition routines with names like
 445 .B OO_xxxdotrans
 446 (where
 447 .I xxx
 448 is a small integer).
 449 These are called when there a transition to state
 450 .I xxx
 451 that corresponds to a
 452 complete pattern. Any tests are performed if necessary to confirm that the
 453 pattern matches and then the replacement instructions are placed on the
 454 output queue and the routine
 455 .I OO_mkrepl
 456 is called to make the replacement and if backup the amount required.
 457 If there are a number of patterns with the same instructions but different
 458 tests, these will all appear in the same routine and the tests performed in
 459 the order they appear in the original
 460 .I patterns
 461 file.
 462 .IP incalls.r 10
 463 this contains an entry for every EM instruction (plus
 464 .I lab
 465 ) giving information on how to build a routine with the name
 466 .BI O_ xxx
 467 for the library version of the module.
 468 If the EM instruction does not appear in the tables
 469 patterns at all then the dfa routine is called to flush any current queued
 470 output and the the output
 471 .BI C_ xxx
 472 routine is called. If the EM instruction does appear in a pattern then the
 473 instruction data structure fields are
 474 initialized and it is added onto the end of the pattern queue.
 475 The dfa routines are then called to attempted to make a transition.
 476 This file is input to the
 477 .I awk
 478 program
 479 .I makefuns.awk.
 480
 481 .LP
 482 The following files contain code that is independent of the pattern tables:
 483 .IP main.c 10
 484 this is used only in the stand alone version of the optimizer and consists
 485 of code to open the input file, read the input using the
 486 .I READ_EM(3)
 487 module and call the dfa routines. This version does not require the routines
 488 constructed from the incalls.r file described above.
 489 .IP nopt.c 10
 490 general routines to initialize, and maintain the data structures. The file
 491 handling routines
 492 .I O_open
 493 etc are defined here. Also defined are routines for flushing the output queue
 494 by calling the
 495 .I EM_mkcalls
 496 routine from the
 497 .I READ_EM(3)
 498 module and moving instructions from the output to the backup queue.
 499 Routines to free the strings stored in instructions
 500 with types of
 501 .I sof_ptyp,
 502 .I pro_ptyp,
 503 .I str_ptyp,
 504 .I ico_ptyp,
 505 .I uco_ptyp,
 506 and
 507 .I fco_ptyp are also defined. These strings are copied to a large array that
 508 is extended by
 509 .I Realloc
 510 if it overflows. The strings can be thrown away on any flush that occurs when
 511 the backup queue is empty.
 512 .IP mkstrct.c 10
 513 contains routines to build the data structure from the input
 514 .BI C_ xxx
 515 routines and place the structure on the pattern queue. These routines are also
 516 used to build the data structures when a replacement is constructed.
 517 .IP aux.c 10
 518 routines to implement the external functions used in the pattern table.
 519
 520 .LP
 521 The following files are also used in building the module library:
 522 .IP makefuns.awk 10
 523 this
 524 .I awk
 525 program is used to produce individual C files with names like
 526 .BI O_ xxx.c
 527 each containing a single function definition and then call the
 528 .I cc
 529 compiler to produce a single output file.
 530 This enables the loader to only load those routines that are actually
 531 needed when the library is loaded.
 532 .IP pseudo.r 10
 533 this file is like the
 534 .I incalls.r
 535 file produced by the parser but is built by hand and handles the pseudo
 536 EM instructions. It is also processed by
 537 .I makefuns.awk.
 538
 539 .NH
 540 Miscellaneous Issues
 541 .LP
 542 The output, pattern and backup queues are maintained in fixed length array,
 543 .I OO_buffer
 544 allocated of size
 545 .I MAXBUFFER
 546 (a constant declared in nopt.h) at run time.
 547 It consists of an array of the
 548 .I e_instr
 549 data structure used by the
 550 .I READ_EM(3)
 551 module.
 552 At any time the pointers
 553 .I OO_patternqueue
 554 and
 555 .I OO_nxtpatt
 556 point to the beginning and end of the current pattern prefix that corresponds
 557 to the current state. Any instructions on the backup queue are between
 558 .I OO_nxtpatt
 559 and
 560 .I OO_endbackup.
 561 If there are no instructions on the backup queue then
 562 .I OO_endbackup
 563 will be 0 (zero).
 564 The size of the replacement queue is set to the length of the maximum
 565 replacement length by the tables output by the parser.
 566
 567 .LP
 568 The fixed size of the buffer causes no difficulty in
 569 practice and can only result in some potential optimizations being missed.
 570 When space for a new instruction is required and the buffer is full the
 571 routine
 572 .I OO_halfflush
 573 is called to flush half the buffer and move all the data structures left.
 574 It should be noted that it is not possible to statically determine the
 575 maximum possible size for these queues as they need to be unbounded in
 576 the worst case.
 577 A study of the rule
 578 .DS
 579 .I
 580         inc dec :
 581 .R
 582 .DE
 583 with the input consisting of
 584 .I N
 585 .I inc
 586 and then
 587 .I N
 588 .I dec
 589 instructions requires an output queue length of
 590 .I N-1
 591 to find all possible replacements.