From 9ac0c83bdea436769ea41b6dba6f8e724ba6de0e Mon Sep 17 00:00:00 2001 From: Nick Downing Date: Fri, 14 Sep 2018 12:12:02 +1000 Subject: [PATCH 1/1] Initial commit; overall plan for the book --- plan.txt | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 plan.txt diff --git a/plan.txt b/plan.txt new file mode 100644 index 0000000..ff4e80d --- /dev/null +++ b/plan.txt @@ -0,0 +1,81 @@ +The pilex and piyacc book. + +Introduction to scanning and parsing + - expression example; breaking an expression into a parse tree + - placing a template or pattern over the input and matching + - filling in the blanks (extracting [expression] + [expression] or similar) + - difference between scanning (e.g. textual) and parsing (e.g. recursive) + +Introduction to lex and yacc + - what are they (historical unix tools from AT&T; still highly relevant) + - differences and similarities of lex vs yacc and when to use each + - history; Steve Johnson, Eric Schmidt + - brief introduction to C language: use of variables, +, -, printf() etc + - word count, calculator tutorials (no storage of the AST) + +Introduction to pilex and piyacc + - what are they (highly evolved versions of lex and yacc; easier to use) + - brief introduction to Python language: use of variables, +, -, print() etc + - word count, calculator tutorials (should not be too different from C) + +Regex grouping + - why we might want to pick out subparts of a regex match (floating point) + - how to use grouping in pilex (added feature that lex does not have) + - pilex produces a list of matches per group, not just the rightmost match + +Abstract Syntax Trees + - what is an AST, what makes it abstract rather than concrete + - why we need to store an AST rather than calculating/generating directly + - how lists and trees are handled in each of the languages; examples + - why we should do compiler related activities in Python not C or C++ (gcc). + +Porting lex/yacc to pilex/piyacc + - why we made pilex/piyacc reproduce the exact lex/yacc behaviour + (many other Python lex/yacc tools, but not easy to port code to them) + - lex/yacc specifications available on the Web for many languages + - many languages are open source, and use lex/yacc (e.g. MiniZinc) + - also likely to have lex/yacc specifications in your in-house tools + (configuration parsers, domain specific languages and so forth) + - tool to convert C to Python, including within lex/yacc specifications + +Over-complex lex/yacc specifications + - in many cases the parsing and semantic analysis phases are combined + - lex/yacc specifications can be tortuous and hard to read (e.g. flex) + - separate parsing (specification based), tree building, semantic analysis + +Automatic tree generation + - significant feature of pilex and piyacc + - word count tutorial, producing a document marked up by word boundaries + - calculator tutorial, producing a document marked up by nested expressions + +Semantic analysis + - example: C type system + - ability to add arbitrarily complex semantic markup to the AST at any stage + - adding markup in such a way that the original syntax is still visible + - our element library: serialize ASTs into portable and self-contained XML + +Case study: source to source translation + - example: C source formatter + +Case study: translation between languages + - example: C to Python + +Case study: creating new languages + - example: lambda calculus interpreter + - example: C interpreter (ref. Herb Schildt) + +Case study: how pilex and piyacc are implemented + +Appendix: Tour of advanced lex/flex features + - start conditions (how lex handles preceding context) + - trailing context (equivalence with grouping; dangerous trailing context) + - yyin, unput(), REJECT(), yyterminate() etc + +Appendix: Tour of advanced yacc/bison features + - mainly to do with C (the union, specifying destructors, etc) + - use of the %error token + - how yacc/bison print error messages and the bison improvements + +Appendix: Location tracking + - using the yyloc, yylloc variables (similar to yyval, yylval) + - specifying C type of locations and how default location is computed -- 2.34.1