doc/ego/ic/ic3

   1 .NH 2
   2 Definition of the intermediate code
   3 .PP
   4 The intermediate code of the optimizer consists
   5 of several components:
   6 .IP -
   7 the object table
   8 .IP -
   9 the procedure table
  10 .IP -
  11 the em code
  12 .IP -
  13 the control flow graphs
  14 .IP -
  15 the loop table
  16 .LP -
  17 .PP
  18 These components are described in
  19 the next sections.
  20 The syntactic structure of every component
  21 is described by a set of context free syntax rules,
  22 with the following conventions:
  23 .DS
  24 .TS
  25 l l.
  26 x       a non-terminal symbol
  27 A       a terminal symbol (in capitals)
  28 x: a b c;       a grammar rule
  29 a | b   a or b
  30 (a)+    1 or more occurrences of a
  31 {a}     0 or more occurrences of a
  32 .TE
  33 .DE
  34 .NH 3
  35 The object table
  36 .PP
  37 EM programs declare blocks of bytes rather than (global) variables.
  38 A typical program may declare 'HOL 7780'
  39 to allocate space for 8 I/O buffers,
  40 2 large arrays and 10 scalar variables.
  41 The optimizer wants to deal with
  42 .UL objects
  43 like variables, buffers and arrays
  44 and certainly not with huge numbers of bytes.
  45 Therefore the intermediate code contains information
  46 about which global objects are used.
  47 This information can be obtained from an EM program
  48 by just looking at the operands of instruction
  49 such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
  50 .PP
  51 The object table consists of a list of
  52 .UL datablock
  53 entries.
  54 Each such entry represents a declaration like HOL, BSS,
  55 CON or ROM.
  56 There are five kinds of datablock entries.
  57 The fifth kind,
  58 UNKNOWN, denotes a declaration in a
  59 separately compiled file that is not made
  60 available to the optimizer.
  61 Each datablock entry contains the type of the block,
  62 its size, and a description of the objects that
  63 belong to it.
  64 If it is a rom,
  65 it also contains a list of values given
  66 as arguments to the rom instruction,
  67 provided that this list contains only integer numbers.
  68 An object has an offset (within its datablock)
  69 and a size.
  70 The size need not always be determinable.
  71 Both datablock and object contain a unique
  72 identifying number
  73 (see previous section for their use).
  74 .DS
  75 .UL syntax
  76 .TS
  77 lw(1i) l l.
  78 object_table:
  79         {datablock} ;
  80 datablock:
  81         D_ID    -- unique identifying number
  82         PSEUDO  -- one of ROM,CON,BSS,HOL,UNKNOWN
  83         SIZE    -- # bytes declared
  84         FLAGS
  85         {value} -- contents of rom
  86         {object} ;      -- objects of the datablock
  87 object:
  88         O_ID    -- unique identifying number
  89         OFFSET  -- offset within the datablock
  90         SIZE ;  -- size of the object in bytes
  91 value:
  92         argument ;
  93 .TE
  94 .DE
  95 A data block has only one flag: "external", indicating
  96 whether the data label is externally visible.
  97 The syntax for "argument" will be given later on
  98 (see em_text).
  99 .NH 3
 100 The procedure table
 101 .PP
 102 The procedure table contains global information
 103 about all procedures that are made available
 104 to the optimizer
 105 and that are needed by the EM program.
 106 (Library units may not be needed, see section 3.5).
 107 The table has one entry for
 108 every procedure.
 109 .DS
 110 .UL syntax
 111 .TS
 112 lw(1i) l l.
 113 procedure_table:
 114         {procedure}
 115 procedure:
 116         P_ID    -- unique identifying number
 117         #LABELS -- number of instruction labels
 118         #LOCALS -- number of bytes for locals
 119         #FORMALS        -- number of bytes for formals
 120         FLAGS   -- flag bits
 121         calling -- procedures called by this one
 122         change  -- info about global variables changed
 123         use ;   -- info about global variables used
 124 calling:
 125         {P_ID} ;        -- procedures called
 126 change:
 127         ext     -- external variables changed
 128         FLAGS ;
 129 use:
 130         FLAGS ;
 131 ext:
 132         {O_ID} ;        -- a set of objects
 133 .TE
 134 .DE
 135 .PP
 136 The number of bytes of formal parameters accessed by
 137 a procedure is determined by the front ends and
 138 passed via a message (parameter message) to the optimizer.
 139 If the front end is not able to determine this number
 140 (e.g. the parameter may be an array of dynamic size or
 141 the procedure may have a variable number of arguments) the attribute
 142 contains the value 'UNKNOWN_SIZE'.
 143 .sp 0
 144 A procedure has the following flags:
 145 .IP -
 146 external: true if the proc. is externally visible
 147 .IP -
 148 bodyseen: true if its code is available as EM text
 149 .IP -
 150 calunknown: true if it calls a procedure that has its bodyseen
 151 flag not set
 152 .IP -
 153 environ: true if it uses or changes a (non-global) variable in
 154 a lexically enclosing procedure
 155 .IP -
 156 lpi: true if is used as operand of an lpi instruction, so
 157 it may be called indirect
 158 .LP
 159 The change and use attributes both have one flag: "indirect",
 160 indicating whether the procedure does a 'use indirect'
 161 or a 'store indirect' (indirect means through a pointer).
 162 .NH 3
 163 The EM text
 164 .PP
 165 The EM text contains the EM instructions.
 166 Every EM instruction has an operation code (opcode)
 167 and 0 or 1 operands.
 168 EM pseudo instructions can have more than
 169 1 operand.
 170 The opcode is just a small (8 bit) integer.
 171 .sp
 172 There are several kinds of operands, which we will
 173 refer to as
 174 .UL types.
 175 Many EM instructions can have more than one type of operand.
 176 The types and their encodings in Compact Assembly Language
 177 are discussed extensively in.
 178 .[~[
 179 keizer architecture
 180 .], section 11.2]
 181 Of special interest is the way numeric values
 182 are represented.
 183 Of prime importance is the machine independency of
 184 the representation.
 185 Ultimately, one could store every integer
 186 just as a string of the characters '0' to '9'.
 187 As doing arithmetic on strings is awkward,
 188 Compact Assembly Language allows several alternatives.
 189 The main idea is to look at the value of the integer.
 190 Integers that fit in 16, 32 or 64 bits are
 191 represented as a row of resp. 2, 4 and 8 bytes,
 192 preceded by an indication of how many bytes are used.
 193 Longer integers are represented as strings;
 194 this is only allowed within pseudo instructions, however.
 195 This concept works very well for target machines
 196 with reasonable word sizes.
 197 At present, most ACK software cannot be used for word sizes
 198 higher than 32 bits,
 199 although the handles for using larger word sizes are
 200 present in the design of the EM code.
 201 In the intermediate code we essentially use the
 202 same ideas.
 203 We allow three representations of integers.
 204 .IP -
 205 integers that fit in a short are represented as a short
 206 .IP -
 207 integers that fit in a long but not in a short are represented
 208 as longs
 209 .IP -
 210 all remaining integers are represented as strings
 211 (only allowed in pseudos).
 212 .LP
 213 The terms short and long are defined in
 214 .[~[
 215 ritchie reference manual programming language
 216 .], section 4]
 217 and depend only on the source machine
 218 (i.e. the machine on which ACK runs),
 219 not on the target machines.
 220 For historical reasons a long will often be called an
 221 .UL offset.
 222 .PP
 223 Operands can also be instruction labels,
 224 objects or procedures.
 225 Instruction labels are denoted by a
 226 .UL label
 227 .UL identifier,
 228 which can be distinguished from a normal identifier.
 229 .sp
 230 The operand of a pseudo instruction can be a list of
 231 .UL arguments.
 232 Arguments can have the same type as operands, except
 233 for the type short, which is not used for arguments.
 234 Furthermore, an argument can be a string or
 235 a string representation of a signed integer, unsigned integer
 236 or floating point number.
 237 If the number of arguments is not fully determined by
 238 the pseudo instruction (e.g. a ROM pseudo can have any number
 239 of arguments), then the list is terminated by a special
 240 argument of type CEND.
 241 .DS
 242 .UL syntax
 243 .TS
 244 lw(1i) l l.
 245 em_text:
 246         {line} ;
 247 line:
 248         INSTR   -- opcode
 249         OPTYPE  -- operand type
 250         operand ;
 251 operand:
 252         empty | -- OPTYPE = NO
 253         SHORT | -- OPTYPE = SHORT
 254         OFFSET |        -- OPTYPE = OFFSET
 255         LAB_ID |        -- OPTYPE = INSTRLAB
 256         O_ID |  -- OPTYPE = OBJECT
 257         P_ID |  -- OPTYPE = PROCEDURE
 258         {argument} ;    -- OPTYPE = LIST
 259 argument:
 260         ARGTYPE
 261         arg ;
 262 arg:
 263         empty | -- ARGTYPE = CEND
 264         OFFSET |
 265         LAB_ID |
 266         O_ID |
 267         P_ID |
 268         string |        -- ARGTYPE = STRING
 269         const ; -- ARGTYPE = ICON,UCON or FCON
 270 string:
 271         LENGTH  -- number of characters
 272         {CHARACTER} ;
 273 const:
 274         SIZE    -- number of bytes
 275         string ;        -- string representation of (un)signed
 276                 -- or floating point constant
 277 .TE
 278 .DE
 279 .NH 3
 280 The control flow graphs
 281 .PP
 282 Each procedure can be divided
 283 into a number of basic blocks.
 284 A basic block is a piece of code with
 285 no jumps in, except at the beginning,
 286 and no jumps out, except at the end.
 287 .PP
 288 Every basic block has a set of
 289 .UL successors,
 290 which are basic blocks that can follow it immediately in
 291 the dynamic execution sequence.
 292 The
 293 .UL predecessors
 294 are the basic blocks of which this one
 295 is a successor.
 296 The successor and predecessor attributes
 297 of all basic blocks of a single procedure
 298 are said to form the
 299 .UL control
 300 .UL flow
 301 .UL graph
 302 of that procedure.
 303 .PP
 304 Another important attribute is the
 305 .UL immediate
 306 .UL dominator.
 307 A basic block B dominates a block C if
 308 every path in the graph from the procedure entry block
 309 to C goes through B.
 310 The immediate dominator of C is the closest dominator
 311 of C on any path from the entry block.
 312 (Note that the dominator relation is transitive,
 313 so the immediate dominator is well defined.)
 314 .PP
 315 A basic block also has an attribute containing
 316 the identifiers of every
 317 .UL loop
 318 that the block belongs to (see next section for loops).
 319 .DS
 320 .UL syntax
 321 .TS
 322 lw(1i) l l.
 323 control_flow_graph:
 324         {basic_block} ;
 325 basic_block:
 326         B_ID    -- unique identifying number
 327         #INSTR  -- number of EM instructions
 328         succ
 329         pred
 330         idom    -- immediate dominator
 331         loops   -- set of loops
 332         FLAGS ; -- flag bits
 333 succ:
 334         {B_ID} ;
 335 pred:
 336         {B_ID} ;
 337 idom:
 338         B_ID ;
 339 loops:
 340         {LP_ID} ;
 341 .TE
 342 .DE
 343 The flag bits can have the values 'firm' and 'strong',
 344 which are explained below.
 345 .NH 3
 346 The loop tables
 347 .PP
 348 Every procedure has an associated
 349 .UL loop
 350 .UL table
 351 containing information about all the loops
 352 in the procedure.
 353 Loops can be detected by a close inspection of
 354 the control flow graph.
 355 The main idea is to look for two basic blocks,
 356 B and C, for which the following holds:
 357 .IP -
 358 B is a successor of C
 359 .IP -
 360 B is a dominator of C
 361 .LP
 362 B is called the loop
 363 .UL entry
 364 and C is called the loop
 365 .UL end.
 366 Intuitively, C contains a jump backwards to
 367 the beginning of the loop (B).
 368 .PP
 369 A loop L1 is said to be
 370 .UL nested
 371 within loop L2 if all basic blocks of L1
 372 are also part of L2.
 373 It is important to note that loops could
 374 originally be written as a well structured for -or
 375 while loop or as a messy goto loop.
 376 Hence loops may partly overlap without one
 377 being nested inside the other.
 378 The
 379 .UL nesting
 380 .UL level
 381 of a loop is the number of loops in
 382 which it is nested (so it is 0 for
 383 an outermost loop).
 384 The details of loop detection will be discussed later.
 385 .PP
 386 It is often desirable to know whether a
 387 basic block gets executed during every iteration
 388 of a loop.
 389 This leads to the following definitions:
 390 .IP -
 391 A basic block B of a loop L is said to be a \fIfirm\fR block
 392 of L if B is executed on all successive iterations of L,
 393 with the only possible exception of the last iteration.
 394 .IP -
 395 A basic block B of a loop L is said to be a \fIstrong\fR block
 396 of L if B is executed on all successive iterations of L.
 397 .LP
 398 Note that a strong block is also a firm block.
 399 If a block is part of a conditional statement, it is neither
 400 strong nor firm, as it may be skipped during some iterations
 401 (see Fig. 3.2).
 402 .DS
 403 loop
 404        if cond1 then
 405               ... \kx-- this code will not
 406                   \h'|\nxu'-- result in a firm or strong block
 407        end if;
 408        ...  -- strong (always executed)
 409        exit when cond2;
 410        ...  \kx-- firm (not executed on last iteration).
 411 end loop;
 412
 413 Fig. 3.2 Example of firm and strong block
 414 .DE
 415 .DS
 416 .UL syntax
 417 .TS
 418 lw(1i) l l.
 419 looptable:
 420         {loop} ;
 421 loop:
 422         LP_ID   -- unique identifying number
 423         LEVEL   -- loop nesting level
 424         entry   -- loop entry block
 425         end ;
 426 entry:
 427         B_ID ;
 428 end:
 429         B_ID ;
 430 .TE
 431 .DE