3 # PiTree: Specification Language for Abstract Syntax Trees
7 The πtree program is a companion program to πlex and πyacc. It implements a
8 specification language for abstract syntax trees. In general you will have a
9 lexical specification such as `my_language.l`, a grammar specification such
10 as `my_language.y`, and an AST specification such as `my_language.t`. These
11 will be compiled by πlex, πyacc, and πtree respectively. Then when you run the
12 generated lexer/parser it will instantiate classes from the AST specification.
14 The `my_language.t` AST specification is formatted something like a Lex or
15 YACC specification. It has three parts, separated by %% lines. The parts are
16 (i) directives and header information, (ii) the class declarations for the
17 different objects that can be in an AST, in a C++-like syntax, (iii) free-form
18 code which is appended to the πtree output and can contain subroutines, etc.
19 Note that in πtree the free-form section serves a second purpose, which is to
20 attach methods to the classes declared in the classes section. We do this with
21 regular Python syntax, including a decorator that we provide for this purpose.
23 It is possible to define your own class hierarchy to use with πlex and πyacc,
24 without using πtree. You can either define action code in your Lex and YACC
25 specification to instantiate the objects (the traditional way), or you can use
26 πlex and πyacc's markup syntax embedded within the πlex and πyacc rules, to say
27 what you want generated. If you do it the latter way, then the classes need to
28 have a constructor defined in a specific way so that πlex or πyacc can easily
29 pass in the child nodes of the AST and any field values to store in the node.
31 The reason why it is convenient to use πtree to define the class hierarchy is
32 (i) πtree automatically declares the right kind of constructor that can take
33 the child nodes and field values, and deal with them appropriately by either
34 storing or passing into the superclass's constructor, (ii) πtree automatically
35 adds serialization/deserialization methods to the class to/from πtree's XML
36 format. While you could use `pickle` or similar, the XML format is advantageous
37 because it is human readable. It embeds the original parsed text, with markup
38 giving class names and field values from your πlex/πyacc/πtree specification.
42 From PyPI we can install πtree by a command like `pip3 install --user pitree`.
43 This will put a `pitree` executable in `$HOME/.local/bin` where `$HOME` is your
44 home directory. You need to make sure this is in your path, so if the directory
45 `$HOME/.local/bin` is being created for the first time, you may wish to log out
46 and back in again. At least on Debian systems it will then appear in your path.
48 As well as the `pitree` executable, the `ndcode.pitree` package needs to be
49 importable from Python, since the first statement of the generated code (from
50 running the `pitree` executable) will usually be something like `from
51 ndcode.pitree import element`. With the installation method recommended above,
52 this module will be placed somewhere like
53 `$HOME/.local/lib/python3.6/site-packages/pitree-0.0.1-py3.6.egg/ndcode/pitree/element.py`
54 where it is accessible to the generated code.
56 An alternative way, if you have already checked out the source repository from
57 `https://git.ndcode.org/public/pitree.git` or similar, is to run a command like
58 `. ./env.sh` in the root of the repository. This will put the current directory
59 (the root of the repository) into your `PYTHONPATH`. The source files have been
60 put in a directory `ndcode/pitree` so you can then `import ndcode.pitree.XXX`.
61 With this method, instead of just `pitree` you must run `ndcode/pitree/cli.py`.
62 Also, you'll have to run `make` in `ndcode/pitree` before anything will work.
66 We are going to define a simple AST for a four-function calculator language.
67 The leaf nodes of the tree will be literals expressed as strings like '5.0'.
68 The non-leaf nodes will be operators. The negation operator will have one
69 child being the expression to negate. The add/subtract/multiply/divide nodes
70 will have two children being the expressions to add/subtract/multiply/divide.
72 ### Tutorial: defining a tree
74 Create a file called `calc.t` containing the following:
77 from ndcode.pitree import element
83 class LiteralExpression: Expression;
84 class NegExpression: Expression;
85 class AddExpression: Expression;
86 class SubExpression: Expression;
87 class MulExpression: Expression;
88 class DivExpression: Expression;
91 The text is free-form like C, whitespace is insignificant, and statements
92 are delimited by semicolons. We have given a simple directives section that
93 defines some Python code to be copied to the output before anything else,
94 here an `import` statement that imports the `ndcode.pitree.element` module.
96 Then we have defined an abstract class `Expression`, which acts as a superclass
97 to the literal and operator expression classes. Finally, we have gone ahead and
98 defined the other classes, using πtree's `:` syntax to denote the superclass.
100 Note that `Expression` has an implicit superclass of `element.Element`, using
101 the `element` module imported in the directives section. Hence, some `element`
102 module always needs to be imported (but you can use your own `element` module).
104 ### Tutorial: compiling the tree
106 To compile this specification, run a command like `pitree --python calc.t`.
107 By default this creates an output file called `t_def.py`. (Similarly, πlex
108 defaults to creating `lex_yy.py` and πyacc defaults to creating `y_tab.py`,
109 behaviour which is inherited from the C versions of the tools). The name
110 `t_def.py` stands for "Tree Definition". You can also specify your own output
111 file name using `-o`, for example `pitree --python -o calc_def.py calc.t`.
113 Let us now have a quick look at the generated output, it is pretty simple:
115 # ... copyright stuff (omitted for brevity) ...
119 # GENERATE SECTION1 BEGIN
120 from ndcode.pitree import element
123 # GENERATE SECTION2 BEGIN
124 class Expression(element.Element):
126 class LiteralExpression(Expression):
127 tag = 'LiteralExpression'
128 class NegExpression(Expression):
129 tag = 'NegExpression'
130 class AddExpression(Expression):
131 tag = 'AddExpression'
132 class SubExpression(Expression):
133 tag = 'SubExpression'
134 class MulExpression(Expression):
135 tag = 'MulExpression'
136 class DivExpression(Expression):
137 tag = 'DivExpression'
139 'Expression': Expression,
140 'LiteralExpression': LiteralExpression,
141 'NegExpression': NegExpression,
142 'AddExpression': AddExpression,
143 'SubExpression': SubExpression,
144 'MulExpression': MulExpression,
145 'DivExpression': DivExpression
151 setattr(_class, func.__name__, func)
155 # GENERATE SECTION3 BEGIN
159 The wanted classes are just defined as regular Python classes, and there is
160 a small amount of stuff to support serializing/deserializing. The `tag` field
161 of each class is for serializing and converts a class to a textual name, the
162 `tag_to_class` dictionary is for deserializing and goes the other way. The
163 `method` function is a decorator, and will be discussed later in the tutorial.
164 You can insert arbitary code in the SECTION1 and SECTION3 parts of the output.
166 The `json` import is not referenced in this particular example, but in later
167 examples it will be used in the generated code to serialize and deserialize
168 Python native types such as `str`, `int` and so on. This is a bit lazy on our
169 part, since we know what type is being serialized/deserialized, so we could
170 generate more specific code to do this, but the `json` way is easier for now.
172 ### Tutorial: serializing the tree
174 Let us now create a short program which imports and uses the example tree
175 definition from above. Create a script `serialize.py` containing the following:
177 #!/usr/bin/env python3
179 from ndcode.pitree import element
183 expression = t_def.AddExpression(
187 t_def.LiteralExpression(text = ['5']),
188 t_def.LiteralExpression(text = ['2'])
191 t_def.LiteralExpression(text = ['7'])
194 element.serialize(expression, sys.stdout)
197 Make it executable, `chmod a+x ./serialize.py`. For the remaining scripts in
198 the tutorial, we'll assume that you know to do this step, or that you will run
199 the scripts via the `python3` command if you prefer them not to be executable.
201 This program programmatically constructs a simple AST for the calculator
202 expression `5 * 2 + 7`, and then serializes it to `sys.stdout` for inspection.
204 Basically all the heavy lifting is done in the `ndcode.pitree.element` module,
205 via a combination of the utility routine `element.serialize()`, and supporting
206 methods that we have defined on the superclass `ndcode.pitree.element.Element`.
208 Run this and redirect the output to a file, e.g. `./serialize.py >expr.xml`.
209 You should get a file containing something like:
212 <AddExpression ref="0"><MulExpression><LiteralExpression>5</LiteralExpression><LiteralExpression>2</LiteralExpression></MulExpression><LiteralExpression>7</LiteralExpression></AddExpression>
216 The `<root>` and `</root>` tags and the `ref="0"` attribute are added by the
217 serializer, to support the attachment of arbitrary structures to AST nodes,
218 such as from semantic analysis. We will not consider such use for now. But note
219 that when deserializing, the node returned is the one with the `ref="0"` on it.
221 We have deliberately not indented this in the normal XML way, for reasons that
222 should become clear further in the tutorial. If we do indent it manually, it
223 becomes more obvious that this expression is nested like `(5 * 2) + 7`, e.g.:
226 <AddExpression ref="0">
228 <LiteralExpression>5</LiteralExpression>
229 <LiteralExpression>2</LiteralExpression>
231 <LiteralExpression>7</LiteralExpression>
236 ### Tutorial: methods on the tree
238 What we will do now is create a method to perform the requested calculation,
239 e.g. with the above example tree, it would add 5 and 2, then multiply by 7.
241 We are going to do this in an object-oriented manner, by defining a method
242 `evaluate()` on the `Expression` class. This method will be specialized in the
243 leaf class `LiteralExpression` and the non-leaf classes like `NegExpression`,
244 so that each class knows how to evaluate itself. For the leaf class this means
245 creating a number to operate on, for the non-leaf classes this means evaluating
246 the child nodes and then operating on these results by negation, addition, etc.
248 We will define these methods in a slightly unconventional way, by attaching
249 them to the class after the class has been defined. This serves two purposes,
250 (i) the code generator for classes is simplified since it does not need to
251 worry about our methods, (ii) our methods can be grouped by method name rather
252 than by class name, making it easy to see how each class evaluates itself. The
253 point ii makes our Python code look a bit like Haskell pattern matching code.
255 To try it, add a free-form section to `calc.t` beginning with `%%`, as follows:
261 raise NotImplementedError
262 @method(LiteralExpression)
264 return float(self.text[0])
265 @method(NegExpression)
267 return -self.children[0].evaluate()
268 @method(AddExpression)
270 return self.children[0].evaluate() + self.children[1].evaluate()
271 @method(SubExpression)
273 return self.children[0].evaluate() - self.children[1].evaluate()
274 @method(MulExpression)
276 return self.children[0].evaluate() * self.children[1].evaluate()
277 @method(DivExpression)
279 return self.children[0].evaluate() / self.children[1].evaluate()
283 The same-named function `evaluate()` is defined a number of times, using the
284 `@method` decorator to attach the different definitions to different classes.
285 Unfortunately the `def` statement doesn't know that the decorator has put the
286 function elsewhere, so it also defines the function in the `t_def` module. To
287 clean up, we therefore do `del evaluate` after we've done all of the attaching.
289 In the leaf node method, we refer to text stored in the node by `self.text[0]`.
290 In the non-leaf node methods, we refer to the child nodes by `self.children[0]`
291 for the first child, `self.children[1]` for the second child and so on. The
292 methods simply assume that the correct number of children are present: 0 for
293 `LiteralExpression`, 1 for `NegExpression`, 2 for `AddExpression` and the rest.
295 Note that following best Python practice, the abstract class `Expression` also
296 gets a definition of the method, which raises `NotImplementedError`. It serves
297 as a catch-all, in case we forget to define it on one of the derived classes.
299 Don't forget to recompile the tree definition, by `pitree --python calc.t`.
300 Then, to test the new method, create a script `evaluate.py` as follows:
302 #!/usr/bin/env python3
306 expression = t_def.AddExpression(
310 t_def.LiteralExpression(text = ['5']),
311 t_def.LiteralExpression(text = ['2'])
314 t_def.LiteralExpression(text = ['7'])
317 print(expression.evaluate())
319 Running `./evaluate.py` should show the answer of evaluating the tree, `17.0`.
321 ### Tutorial: deserializing the tree
323 To support deserialization, we have to add a tiny bit of boilerplate code to
324 our tree definition `calc.t` in the free-form section, as follows:
326 def factory(tag, *args, **kwargs):
327 return tag_to_class[tag](*args, **kwargs)
330 What this does is, given a tag name and a set of constructor arguments, calls
331 the appropriate constructor to create the requested node, e.g. it constructs a
332 `LiteralExpression` when `tag` is set to `'LiteralExpression'`, and so on.
334 The reason we do this here (instead of doing it automatically in the generated
335 code) is for expandability. For example, we can define a larger language which
336 uses `calc.t` as a mini-language for some of its subtrees, with a more complex
337 factory that can use the `calc.t` factory as a subfactory for particular nodes.
339 Don't forget to recompile the tree definition, by `pitree --python calc.t`.
340 Then, to test the deserializer, create a script `deserialize.py` as follows:
342 #!/usr/bin/env python3
344 from ndcode.pitree import element
348 expression = element.deserialize(sys.stdin, t_def.factory)
349 print(expression.evaluate())
352 If you still have the `expr.xml` file that you created when testing the
353 serializer, you can now feed it to the deserializer by running a command like
354 `./deserialize.py <expr.xml`. It will deserialize, then evaluate the `17.0`.
355 This is quite powerful, as it makes an existing XML file become "intelligent".
357 ### Tutorial: fields on a node
359 Suppose we want to cut down the number of different classes in the tree. We now
360 want to have a `UnaryExpression` class to cover all unary operators, including
361 the former `NegExpression`, and a `BinaryExpression` class to cover all binary
362 operators, including the former `AddExpression` and so on. We will keep track
363 of which kind of operator is requested, by storing it in a field on the node.
364 We use "field" to mean the same as Python instance variables or XML attributes.
366 To avoid confusion we'll call the field `unary_operator` for `UnaryExpression`
367 and `binary_operator` for `BinaryExpression`. For simplicity's sake we'll make
368 the field contain a Python `str`, even though in it would be more efficient to
369 make it an `int` containing an enumeration value. πtree supports a fairly rich
370 type system, subject to types being declared upfront (not Duck typing). We can
371 have `str`, `int`, `bool`, `list(str)` and so on. It's best to think of this in
372 a C++-like way, so `list(str)` is similar to `std::vector<std::string>`, etc.
374 Here is a revised tree definition `calc1.t` using fields, including a factory
375 to support the deserializer, and an appropriately updated `evaluate()` method:
378 from ndcode.pitree import element
384 class LiteralExpression: Expression;
385 class UnaryExpression: Expression {
388 class BinaryExpression: Expression {
394 def factory(tag, *args, **kwargs):
395 return tag_to_class[tag](*args, **kwargs)
399 raise NotImplementedError
400 @method(LiteralExpression)
402 return float(self.text[0])
403 @method(UnaryExpression)
405 value = self.children[0].evaluate()
406 if self.unary_operator == '-':
408 raise NotImplementedError
409 @method(BinaryExpression)
411 value0 = self.children[0].evaluate()
412 value1 = self.children[1].evaluate()
413 if self.binary_operator == '+':
414 return value0 + value1
415 if self.binary_operator == '-':
416 return value0 - value1
417 if self.binary_operator == '*':
418 return value0 * value1
419 if self.binary_operator == '/':
420 return value0 / value1
421 raise NotImplementedError
425 Whilst this is a bit of a contrived example, it illustrates how sometimes an
426 attribute on a node can be more compact than defining a set of classes for the
427 different node types. Here it allows some common code, the evaluation of the
428 child nodes, to be expressed just once for unary and once for binary operators.
430 To avoid confusion let us compile this to `t_def1.py` instead of `t_def.py`.
431 Run `pitree --python -o t_def1.py calc1.t`. Then, create an `evaluate1.py`:
433 #!/usr/bin/env python3
437 expression = t_def1.BinaryExpression(
438 binary_operator = '+',
440 t_def1.BinaryExpression(
441 binary_operator = '*',
443 t_def1.LiteralExpression(text = ['5']),
444 t_def1.LiteralExpression(text = ['2'])
447 t_def1.LiteralExpression(text = ['7'])
450 print(expression.evaluate())
453 To test the serialization with fields, create a `serialize1.py` as follows:
455 #!/usr/bin/env python3
457 from ndcode.pitree import element
461 expression = t_def1.BinaryExpression(
462 binary_operator = '+',
464 t_def1.BinaryExpression(
465 binary_operator = '*',
467 t_def1.LiteralExpression(text = ['5']),
468 t_def1.LiteralExpression(text = ['2'])
471 t_def1.LiteralExpression(text = ['7'])
474 element.serialize(expression, sys.stdout)
477 Running `./serialize1.py >tree1.xml` should produce output in which the field
478 values `unary_operator` and `binary_operator` are stored in XML attributes:
481 <BinaryExpression binary_operator=""+"" ref="0"><BinaryExpression binary_operator=""*""><LiteralExpression>5</LiteralExpression><LiteralExpression>2</LiteralExpression></BinaryExpression><LiteralExpression>7</LiteralExpression></BinaryExpression>
484 Note that `"` in the above represents `'`, and hence the value stored for
485 `binary_operator` is actually `'+'` or `'*'` rather than `+` or `*` directly.
486 The reason for this, is that if the type of the field were say `list(str)`
487 rather than `str`, it would be necessary to quote the strings in order to be
488 able to separate the list items. In the future, we might try to quote only if
489 needed, to produce a more readable file. For now we use Python's `json` module.
491 The deserializer test program for the new format using fields, is the same as
492 `deserialize.py` with `t_def` changed to `t_def1`. It prints `17.0` as before.
494 ### Tutorial: storing the whitespace
496 An important part of the XML serialization process is that it the AST stores
497 the original parsed text within it, which can be recovered either by stripping
498 the serialized XML output of all tags, or by calling `element.to_text()` on an
499 in-memory tree. When in memory, this text is stored in the `text` array of each
500 node. We saw previously, how leaf nodes can get their text by `self.text[0]`.
502 Logically, the contents of a node is its fields, and the interleaving of its
503 text and its children, starting and ending with text. So there is always one
504 more entry in the `text` array than the `children` array. For example, suppose
505 a node has 2 children, e.g. the `AddExpression` or `BinaryExpression` nodes
506 seen in the previous examples. Then it will have 3 pieces of text. Its internal
507 structure can be summarized `text[0] children[0] text[1] children[1] text[2]`.
509 We will now show a revised `serialize2.py` which stores some text in the nodes:
511 #!/usr/bin/env python3
513 from ndcode.pitree import element
517 expression = t_def.AddExpression(
518 text = ['', ' + ', ''],
521 text = ['', ' * ', ''],
523 t_def.LiteralExpression(text = ['5']),
524 t_def.LiteralExpression(text = ['2'])
527 t_def.LiteralExpression(text = ['7'])
530 element.serialize(expression, sys.stdout)
533 This would generate the following serialized output:
536 <AddExpression ref="0"><MulExpression><LiteralExpression>5</LiteralExpression> * <LiteralExpression>2</LiteralExpression></MulExpression> + <LiteralExpression>7</LiteralExpression></AddExpression>
540 If we either strip the tags from the XML output (only the `AddExpression` node)
541 or add a line like `print(element.to_text(expression))` in the test program,
542 then we can see the stripped version is `5 * 2 + 7` which is more satisfying
543 than `527` which would have been printed by the earlier versions without text.
545 Suppose we use πlex, πyacc, and πtree in integrated fashion to parse general
546 computer languages such as C. Then the text-storage facility is used in leaf
547 nodes to store things like identifiers, constants and such, and in non-leaf
548 nodes mainly to store the whitespace separators between tokens. So for example,
549 a source code formatter can run over the tree adjusting the inter-token space.
551 Interestingly though, much of the incoming syntax can also be treated as
552 inter-token space, because it is redundant in respect of the structure imposed
553 by the AST. For example, suppose the original expression was `5 * (2 + 7)`.
554 Since the AST already encodes that the `+` node is nested inside the `*` node,
555 the parentheses are effectively whitespace from the viewpoint of the AST.
557 To see how this works, consider this `serialize3.py` which uses parentheses:
559 #!/usr/bin/env python3
561 from ndcode.pitree import element
565 expression = t_def.MulExpression(
566 text = ['', ' * (', ')'],
568 t_def.LiteralExpression(text = ['5']),
570 text = ['', ' + ', ''],
572 t_def.LiteralExpression(text = ['2']),
573 t_def.LiteralExpression(text = ['7'])
578 element.serialize(expression, sys.stdout)
581 This produces output like:
584 <MulExpression ref="0"><LiteralExpression>5</LiteralExpression> * (<AddExpression><LiteralExpression>2</LiteralExpression> + <LiteralExpression>7</LiteralExpression></AddExpression>)</MulExpression>
588 Whilst it is irrelevant whether the `(` and `)` characters appear inside the
589 `AddExpression` tags or not, we have placed them outside. This is because in
590 the parsing stage an `AddExpression` would be recognized first, followed by a
591 notional `ParenthesesExpression` that contains the parentheses, but does not
592 actually generate any markup. Using πyacc we can control the markup precisely.
594 Another time this occurs is when recognizing keyword-delimited structures such
595 as `for (declaration; expression; expression) statement` in C. In the AST this
596 would appear as a `ForStatement` class with 4 children: the declaration, the
597 two expressions and the statement. The keyword `for` and all delimeters are
598 treated as whitespace, since `ForStatement` already implies the keyword `for`.
600 ### Tutorial: leading and trailing whitespace
602 In order to be able to store the contents of a source file exactly (plus the
603 markup that results from recognizing its lexical and grammatical structure),
604 we need to be able to store the leading and trailing whitespace. To do this,
605 we will define a root AST node called `AST`. Furthermore, a πtree convention is
606 to use inner classes for node types that are expected to appear inside others.
608 Therefore we will create a `calc4.t` with this modification, as follows:
611 from ndcode.pitree import element
618 class LiteralExpression: Expression;
619 class NegExpression: Expression;
620 class AddExpression: Expression;
621 class SubExpression: Expression;
622 class MulExpression: Expression;
623 class DivExpression: Expression;
627 Whilst Python does not have good inner class support in the same way as Java,
628 it does allow them. Apart from the lexical nesting, the inner classes are
629 treated as independent from their outer class. We just define them as inner
630 classes by convention, as it cleans up the namespace and tends to group them.
632 Compile this with `pitree --python -o t_def4.py calc4.t` and then create an
633 example `serialize4.py` that has leading and trailing whitespace as follows:
635 #!/usr/bin/env python3
637 from ndcode.pitree import element
641 expression = t_def4.AST(
642 text = ['/* calculator source file */\n\n', '\n'],
644 t_def4.AST.MulExpression(
645 text = ['', ' * (', ')'],
647 t_def4.AST.LiteralExpression(text = ['5']),
648 t_def4.AST.AddExpression(
649 text = ['', ' + ', ''],
651 t_def4.AST.LiteralExpression(text = ['2']),
652 t_def4.AST.LiteralExpression(text = ['7'])
659 element.serialize(expression, sys.stdout)
662 This serializes as follows:
665 <AST ref="0">/* calculator source file */
667 <AST_MulExpression><AST_LiteralExpression>5</AST_LiteralExpression> * (<AST_AddExpression><AST_LiteralExpression>2</AST_LiteralExpression> + <AST_LiteralExpression>7</AST_LiteralExpression></AST_AddExpression>)</AST_MulExpression>
672 Now, the `AST` tag contains a leading comment (treated as whitespace) and some
673 newlines, then the real AST content (its root is an `AST_MulExpression`), and
674 finally a trailing newline between the real AST and the end of the `AST` tag.
676 Note that inner classes must be prefixed with their outer class name at every
677 reference, e.g. `t_def.AST.MulExpression` in the Python code, and are stored
678 with underscores in the serialized output, e.g. `AST_MulExpression`. In future
679 it might be possible to generate the XML without the fully qualified names, for
680 example just `<MulExpression>...</MulExpression` when inside `<AST>...</AST>`.
682 ### Tutorial: semantic analysis and markup
684 Another important feature of the serialization format is that arbitrary objects
685 can be attached to a node, subject to those objects being declared to πtree and
686 following the ordinary πtree conventions. As well as πtree objects, we can also
687 attach many Python intrinsic types, such as numbers, strings, lists, and so on.
689 To attach πtree objects to a node, you simply declare a field of type `ref`.
690 Whereas basic field types like `int` or `list(int)` store the corresponding
691 Python intrinsic types, fields of type `ref` store Python references, provided
692 they refer to a πtree object. Aggregates like `list(ref)` are also available.
694 Such attached πtree objects are serialized by reference rather than by direct
695 nesting in the XML file, so that multiple pointers to the same object generate
696 multiple references to that object rather than multiple copies. Deserialization
697 reverses the process to reproduce the original structure of Python references.
698 The `None` value is allowed and will be serialized and deserialized correctly.
700 Note that for the sake of readability in the generated XML, Python intrinsics
701 such as lists are not stored by reference, and neither are child nodes. Thus,
702 they cannot be multiple, and care should be taken to ensure that multiple
703 pointers to those kinds of objects do not occur in the tree. An exception to
704 this is, you can attach a reference to a subtree that also has a parent node,
705 provided there is at most one parent node (the serializer relies upon this).
707 The following example is based on `calc4.t`, but there are now two kinds of
708 literal nodes, `IntLiteralExpression` and `FloatLiteralExpression`. Every
709 expression has a `type` (declared on the base class `Expression` to ensure it
710 is always present), which refers to either a `IntType` or `FloatType` object.
712 We'll also create `UnaryExpression` and `BinaryExpression` classes to put the
713 semantic analysis code on, to take advantage that the semantic analysis does
714 not really care which operators are involved specifically. The `DivExpression`
715 will override the default semantic analysis to always generate a `FloatType`.
717 Create the basic `calc5.t` defining the needed class hierarchy, as follows:
720 from ndcode.pitree import element
727 class FloatType: Type;
733 class IntLiteralExpression: Expression;
734 class FloatLiteralExpression: Expression;
735 class UnaryExpression: Expression;
736 class NegExpression: UnaryExpression;
737 class BinaryExpression: Expression;
738 class AddExpression: BinaryExpression;
739 class SubExpression: BinaryExpression;
740 class MulExpression: BinaryExpression;
741 class DivExpression: BinaryExpression;
745 Add a free-form section beginning with `%%` and a `semantic_analysis()` method:
749 @method(AST.Expression)
750 def semantic_analysis(self):
751 raise NotImplementedError
752 @method(AST.IntLiteralExpression)
753 def semantic_analysis(self):
754 self.type = IntType()
755 @method(AST.FloatLiteralExpression)
756 def semantic_analysis(self):
757 self.type = FloatType()
758 @method(AST.UnaryExpression)
759 def semantic_analysis(self):
760 self.children[0].semantic_analysis()
761 self.type = self.children[0].type
762 @method(AST.BinaryExpression)
763 def semantic_analysis(self):
764 self.children[0].semantic_analysis()
765 self.children[1].semantic_analysis()
767 self.children[0].type
768 if isinstance(self.children[0].type, FloatType) else
769 self.children[1].type
771 @method(AST.DivExpression)
772 def semantic_analysis(self):
773 self.children[0].semantic_analysis()
774 self.children[1].semantic_analysis()
775 self.type = FloatType()
778 Compile this to `t_def5.py` by `pitree --python -o t_def5.py calc5.t`, and then
779 create the corresponding test program `serialize5.py` containing the following:
781 #!/usr/bin/env python3
783 from ndcode.pitree import element
787 expression = t_def5.AST(
788 text = ['/* calculator source file */\n\n', '\n'],
790 t_def5.AST.MulExpression(
791 text = ['', ' * (', ')'],
793 t_def5.AST.IntLiteralExpression(text = ['5']),
794 t_def5.AST.AddExpression(
795 text = ['', ' + ', ''],
797 t_def5.AST.IntLiteralExpression(text = ['2']),
798 t_def5.AST.FloatLiteralExpression(text = ['7.5'])
805 expression.children[0].semantic_analysis()
806 element.serialize(expression, sys.stdout)
809 Running this should produce output like
812 <AST ref="0">/* calculator source file */
814 <AST_MulExpression type="3"><AST_IntLiteralExpression type="1">5</AST_IntLiteralExpression> * (<AST_AddExpression type="3"><AST_IntLiteralExpression type="2">2</AST_IntLiteralExpression> + <AST_FloatLiteralExpression type="3">7.5</AST_FloatLiteralExpression></AST_AddExpression>)</AST_MulExpression>
818 <FloatType ref="3" />
822 We can see that the `type` attribute of each expression node in the XML gives
823 an integer, which is a reference to one of the `IntType` or `FloatType` objects
824 later in the file. The `IntType` and `FloatType` objects are unparented with
825 respect to the AST, and thus appear as children of the `<root>...</root>` node.
827 We can also see that due to some optimization in the `semantic_analysis()`
828 method of the `BinaryExpression` class, which reuses the type of either the
829 left child or the right child as appropriate, many of the type references are
830 the same. This makes it fairly cheap to add the semantic information to a node.
832 The efficiency of this approach is highlighted for semantic analysis with more
833 complicated type systems, for example in C we can have "int", "pointer to int",
834 "pointer to pointer to int", "array of int" and so on. There can be a lot of
835 reuse, e.g. consider the "address of" operator in C -- the "address of" node
836 has a type of a "pointer to" node, containing a reference to the child's type.