The pilex and piyacc book. Introduction to scanning and parsing - expression example; breaking an expression into a parse tree - placing a template or pattern over the input and matching - filling in the blanks (extracting [expression] + [expression] or similar) - difference between scanning (e.g. textual) and parsing (e.g. recursive) Introduction to lex and yacc - what are they (historical unix tools from AT&T; still highly relevant) - differences and similarities of lex vs yacc and when to use each - history; Steve Johnson, Eric Schmidt - brief introduction to C language: use of variables, +, -, printf() etc - word count, calculator tutorials (no storage of the AST) Introduction to pilex and piyacc - what are they (highly evolved versions of lex and yacc; easier to use) - brief introduction to Python language: use of variables, +, -, print() etc - word count, calculator tutorials (should not be too different from C) Regex grouping - why we might want to pick out subparts of a regex match (floating point) - how to use grouping in pilex (added feature that lex does not have) - pilex produces a list of matches per group, not just the rightmost match Abstract Syntax Trees - what is an AST, what makes it abstract rather than concrete - why we need to store an AST rather than calculating/generating directly - how lists and trees are handled in each of the languages; examples - why we should do compiler related activities in Python not C or C++ (gcc). Porting lex/yacc to pilex/piyacc - why we made pilex/piyacc reproduce the exact lex/yacc behaviour (many other Python lex/yacc tools, but not easy to port code to them) - lex/yacc specifications available on the Web for many languages - many languages are open source, and use lex/yacc (e.g. MiniZinc) - also likely to have lex/yacc specifications in your in-house tools (configuration parsers, domain specific languages and so forth) - tool to convert C to Python, including within lex/yacc specifications Over-complex lex/yacc specifications - in many cases the parsing and semantic analysis phases are combined - lex/yacc specifications can be tortuous and hard to read (e.g. flex) - separate parsing (specification based), tree building, semantic analysis Automatic tree generation - significant feature of pilex and piyacc - word count tutorial, producing a document marked up by word boundaries - calculator tutorial, producing a document marked up by nested expressions Semantic analysis - example: C type system - ability to add arbitrarily complex semantic markup to the AST at any stage - adding markup in such a way that the original syntax is still visible - our element library: serialize ASTs into portable and self-contained XML Case study: source to source translation - example: C source formatter Case study: translation between languages - example: C to Python Case study: creating new languages - example: lambda calculus interpreter - example: C interpreter (ref. Herb Schildt) Case study: how pilex and piyacc are implemented Appendix: Tour of advanced lex/flex features - start conditions (how lex handles preceding context) - trailing context (equivalence with grouping; dangerous trailing context) - yyin, unput(), REJECT(), yyterminate() etc Appendix: Tour of advanced yacc/bison features - mainly to do with C (the union, specifying destructors, etc) - use of the %error token - how yacc/bison print error messages and the bison improvements Appendix: Location tracking - using the yyloc, yylloc variables (similar to yyval, yylval) - specifying C type of locations and how default location is computed