2 Definition of the intermediate code
4 The intermediate code of the optimizer consists
13 the control flow graphs
18 These components are described in
20 The syntactic structure of every component
21 is described by a set of context free syntax rules,
22 with the following conventions:
26 x a non-terminal symbol
27 A a terminal symbol (in capitals)
28 x: a b c; a grammar rule
30 (a)+ 1 or more occurrences of a
31 {a} 0 or more occurrences of a
37 EM programs declare blocks of bytes rather than (global) variables.
38 A typical program may declare 'HOL 7780'
39 to allocate space for 8 I/O buffers,
40 2 large arrays and 10 scalar variables.
41 The optimizer wants to deal with
43 like variables, buffers and arrays
44 and certainly not with huge numbers of bytes.
45 Therefore the intermediate code contains information
46 about which global objects are used.
47 This information can be obtained from an EM program
48 by just looking at the operands of instruction
49 such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
51 The object table consists of a list of
54 Each such entry represents a declaration like HOL, BSS,
56 There are five kinds of datablock entries.
58 UNKNOWN, denotes a declaration in a
59 separately compiled file that is not made
60 available to the optimizer.
61 Each datablock entry contains the type of the block,
62 its size, and a description of the objects that
65 it also contains a list of values given
66 as arguments to the rom instruction,
67 provided that this list contains only integer numbers.
68 An object has an offset (within its datablock)
70 The size need not always be determinable.
71 Both datablock and object contain a unique
73 (see previous section for their use).
81 D_ID -- unique identifying number
82 PSEUDO -- one of ROM,CON,BSS,HOL,UNKNOWN
83 SIZE -- # bytes declared
85 {value} -- contents of rom
86 {object} ; -- objects of the datablock
88 O_ID -- unique identifying number
89 OFFSET -- offset within the datablock
90 SIZE ; -- size of the object in bytes
95 A data block has only one flag: "external", indicating
96 whether the data label is externally visible.
97 The syntax for "argument" will be given later on
102 The procedure table contains global information
103 about all procedures that are made available
105 and that are needed by the EM program.
106 (Library units may not be needed, see section 3.5).
107 The table has one entry for
116 P_ID -- unique identifying number
117 #LABELS -- number of instruction labels
118 #LOCALS -- number of bytes for locals
119 #FORMALS -- number of bytes for formals
121 calling -- procedures called by this one
122 change -- info about global variables changed
123 use ; -- info about global variables used
125 {P_ID} ; -- procedures called
127 ext -- external variables changed
132 {O_ID} ; -- a set of objects
136 The number of bytes of formal parameters accessed by
137 a procedure is determined by the front ends and
138 passed via a message (parameter message) to the optimizer.
139 If the front end is not able to determine this number
140 (e.g. the parameter may be an array of dynamic size or
141 the procedure may have a variable number of arguments) the attribute
142 contains the value 'UNKNOWN_SIZE'.
144 A procedure has the following flags:
146 external: true if the proc. is externally visible
148 bodyseen: true if its code is available as EM text
150 calunknown: true if it calls a procedure that has its bodyseen
153 environ: true if it uses or changes a (non-global) variable in
154 a lexically enclosing procedure
156 lpi: true if is used as operand of an lpi instruction, so
157 it may be called indirect
159 The change and use attributes both have one flag: "indirect",
160 indicating whether the procedure does a 'use indirect'
161 or a 'store indirect' (indirect means through a pointer).
165 The EM text contains the EM instructions.
166 Every EM instruction has an operation code (opcode)
168 EM pseudo instructions can have more than
170 The opcode is just a small (8 bit) integer.
172 There are several kinds of operands, which we will
175 Many EM instructions can have more than one type of operand.
176 The types and their encodings in Compact Assembly Language
177 are discussed extensively in.
181 Of special interest is the way numeric values
183 Of prime importance is the machine independency of
185 Ultimately, one could store every integer
186 just as a string of the characters '0' to '9'.
187 As doing arithmetic on strings is awkward,
188 Compact Assembly Language allows several alternatives.
189 The main idea is to look at the value of the integer.
190 Integers that fit in 16, 32 or 64 bits are
191 represented as a row of resp. 2, 4 and 8 bytes,
192 preceded by an indication of how many bytes are used.
193 Longer integers are represented as strings;
194 this is only allowed within pseudo instructions, however.
195 This concept works very well for target machines
196 with reasonable word sizes.
197 At present, most ACK software cannot be used for word sizes
199 although the handles for using larger word sizes are
200 present in the design of the EM code.
201 In the intermediate code we essentially use the
203 We allow three representations of integers.
205 integers that fit in a short are represented as a short
207 integers that fit in a long but not in a short are represented
210 all remaining integers are represented as strings
211 (only allowed in pseudos).
213 The terms short and long are defined in
215 ritchie reference manual programming language
217 and depend only on the source machine
218 (i.e. the machine on which ACK runs),
219 not on the target machines.
220 For historical reasons a long will often be called an
223 Operands can also be instruction labels,
224 objects or procedures.
225 Instruction labels are denoted by a
228 which can be distinguished from a normal identifier.
230 The operand of a pseudo instruction can be a list of
232 Arguments can have the same type as operands, except
233 for the type short, which is not used for arguments.
234 Furthermore, an argument can be a string or
235 a string representation of a signed integer, unsigned integer
236 or floating point number.
237 If the number of arguments is not fully determined by
238 the pseudo instruction (e.g. a ROM pseudo can have any number
239 of arguments), then the list is terminated by a special
240 argument of type CEND.
249 OPTYPE -- operand type
252 empty | -- OPTYPE = NO
253 SHORT | -- OPTYPE = SHORT
254 OFFSET | -- OPTYPE = OFFSET
255 LAB_ID | -- OPTYPE = INSTRLAB
256 O_ID | -- OPTYPE = OBJECT
257 P_ID | -- OPTYPE = PROCEDURE
258 {argument} ; -- OPTYPE = LIST
263 empty | -- ARGTYPE = CEND
268 string | -- ARGTYPE = STRING
269 const ; -- ARGTYPE = ICON,UCON or FCON
271 LENGTH -- number of characters
274 SIZE -- number of bytes
275 string ; -- string representation of (un)signed
276 -- or floating point constant
280 The control flow graphs
282 Each procedure can be divided
283 into a number of basic blocks.
284 A basic block is a piece of code with
285 no jumps in, except at the beginning,
286 and no jumps out, except at the end.
288 Every basic block has a set of
290 which are basic blocks that can follow it immediately in
291 the dynamic execution sequence.
294 are the basic blocks of which this one
296 The successor and predecessor attributes
297 of all basic blocks of a single procedure
304 Another important attribute is the
307 A basic block B dominates a block C if
308 every path in the graph from the procedure entry block
310 The immediate dominator of C is the closest dominator
311 of C on any path from the entry block.
312 (Note that the dominator relation is transitive,
313 so the immediate dominator is well defined.)
315 A basic block also has an attribute containing
316 the identifiers of every
318 that the block belongs to (see next section for loops).
326 B_ID -- unique identifying number
327 #INSTR -- number of EM instructions
330 idom -- immediate dominator
331 loops -- set of loops
343 The flag bits can have the values 'firm' and 'strong',
344 which are explained below.
348 Every procedure has an associated
351 containing information about all the loops
353 Loops can be detected by a close inspection of
354 the control flow graph.
355 The main idea is to look for two basic blocks,
356 B and C, for which the following holds:
358 B is a successor of C
360 B is a dominator of C
364 and C is called the loop
366 Intuitively, C contains a jump backwards to
367 the beginning of the loop (B).
369 A loop L1 is said to be
371 within loop L2 if all basic blocks of L1
373 It is important to note that loops could
374 originally be written as a well structured for -or
375 while loop or as a messy goto loop.
376 Hence loops may partly overlap without one
377 being nested inside the other.
381 of a loop is the number of loops in
382 which it is nested (so it is 0 for
384 The details of loop detection will be discussed later.
386 It is often desirable to know whether a
387 basic block gets executed during every iteration
389 This leads to the following definitions:
391 A basic block B of a loop L is said to be a \fIfirm\fR block
392 of L if B is executed on all successive iterations of L,
393 with the only possible exception of the last iteration.
395 A basic block B of a loop L is said to be a \fIstrong\fR block
396 of L if B is executed on all successive iterations of L.
398 Note that a strong block is also a firm block.
399 If a block is part of a conditional statement, it is neither
400 strong nor firm, as it may be skipped during some iterations
405 ... \kx-- this code will not
406 \h'|\nxu'-- result in a firm or strong block
408 ... -- strong (always executed)
410 ... \kx-- firm (not executed on last iteration).
413 Fig. 3.2 Example of firm and strong block
422 LP_ID -- unique identifying number
423 LEVEL -- loop nesting level
424 entry -- loop entry block