From 6214be89c824eac71e61841d0296ff6e65c471a8 Mon Sep 17 00:00:00 2001 From: dick Date: Wed, 22 Jun 1988 21:48:19 +0000 Subject: [PATCH] Initial entry --- doc/int/Makefile | 16 ++ doc/int/README | 4 + doc/int/appA | 280 ++++++++++++++++++++++ doc/int/appB | 486 ++++++++++++++++++++++++++++++++++++++ doc/int/bib | 25 ++ doc/int/cover | 26 +++ doc/int/draw.mac | 24 ++ doc/int/txt2 | 595 +++++++++++++++++++++++++++++++++++++++++++++++ doc/int/txt3 | 181 ++++++++++++++ util/int/int.1 | 200 ++++++++++++++++ 10 files changed, 1837 insertions(+) create mode 100755 doc/int/Makefile create mode 100644 doc/int/README create mode 100644 doc/int/appA create mode 100644 doc/int/appB create mode 100644 doc/int/bib create mode 100644 doc/int/cover create mode 100644 doc/int/draw.mac create mode 100644 doc/int/txt2 create mode 100644 doc/int/txt3 create mode 100644 util/int/int.1 diff --git a/doc/int/Makefile b/doc/int/Makefile new file mode 100755 index 000000000..05164fce3 --- /dev/null +++ b/doc/int/Makefile @@ -0,0 +1,16 @@ +# $Header$ + +TBL=/usr/ditroff/tbl + +DOC = draw.mac cover txt1 txt2 txt3 appA appB bib +int.doc: $(DOC) + $(TBL) $(DOC) > $@ + +FLS = README .distr Makefile int.1 $(DOC) + +.distr: Makefile + echo $(FLS) | tr ' ' '\012' >.distr + +clean: + rm -f int.doc + diff --git a/doc/int/README b/doc/int/README new file mode 100644 index 000000000..e6a0a8377 --- /dev/null +++ b/doc/int/README @@ -0,0 +1,4 @@ +# $Header$ + +This directory contains the text of the documentation for the +Production Quality Interpreter "int". diff --git a/doc/int/appA b/doc/int/appA new file mode 100644 index 000000000..4ba79ec6d --- /dev/null +++ b/doc/int/appA @@ -0,0 +1,280 @@ +.\" List of all warnings; source of warn_msg and warn.h +.\" +.\" $Header$ +.\" +.\" This file contains the warnings issued by the interpreter, together +.\" with their names and values in the code of the interpreter. 
Some of +.\" the source files of the interpreter are generated from the Wn +.\" macros in this file. +.\" When modifying this file, preserve the parameters of the Wn macros. +.de Wn \" +.IP \\$3. 7 +.B "\\$1" +.br +.. Wn +.bp +.DS C +APPENDIX A +.DE +.SH +List of Warnings. +.PP +The shadow-byte administration makes it possible to check for a +wide range of errors during run-time. +We have tried to make the diagnostics self-explanatory and especially useful +for the C-programmer. +The warnings are printed in the message file, together with source file +and line number. +The complete list of warnings is presented here, followed by an +explanation of what might be wrong. +Often, these explanations implicitly assume that the program +being interpreted, was originally written in C (and not Pascal, Basic etc.). +.LP +.I "Reading the load file" +.Wn "Floating point instructions flag in header ignored" WFLUSED 1 +.Wn "No float initialisation in this version" WFLINIT 2 +The interpreter was compiled with the NOFLOAT option; code involving +floating point operations can be run as long as the actual +instructions are avoided. +.Wn "Extra-test flag in header ignored" WEXTRIGN 4 +The interpreter already tests anything conceivable. +.Wn "Maximum line number in header was 0" WNLINEZR 5 +This number could be used to allocate tables for tallying; these tables are, +however, expanded as needed, so the number is immaterial. +.Wn "Bad float initialisation" WBADFLOAT 7 +The loadfile contains a floating point denotation which does not +satisfy the syntax (see 2.6). +Examining the loadfile (with \fBod \-c\fP) might show the syntax error. +Probably there is a bug in the front-end, creating floats with +a bad syntax. 
+.LP +.I "System calls" +.Wn "IOCTL \- bad or unimplemented request" WBADIOCTL 11 +The second parameter to the ioctl() request (the operation code) is invalid or +not implemented; since there are many different opcodes on the various UNIX +systems, it is difficult to tell which. The system call fails. +.Wn "MPXCALL \- not (yet) implemented" WMPXIMP 14 +.Wn "PROFIL \- not (yet) implemented" WPROFILIMP 15 +.Wn "PTRACE \- not (yet) implemented" WPTRACEIMP 16 +The monitor calls \fImpxcall()\fP, \fIprofil()\fP and \fIptrace()\fP +have not been implemented. The monitor call fails. +.Wn "Inaccessible memory in system call" WMONFLT 21 +Bad pointers passed to system calls do not cause a memory fault (which in UNIX +would happen to the kernel), but cause the system call to fail with the UNIX +variable errno set to 14 (EFAULT). It seems likely that your program is at +fault, but there is also a good possibility that a library routine made +unwarranted assumptions about word size and pointer size. +.Wn "READ \- buffer resides in unallocated memory" WRUMEM 23 +.Wn "READ \- buffer across global data area and heap" WRGDAH 24 +When the buffer passed to the read() system call is situated (completely +or partially) in unallocated memory (beyond \fIHP\fP) or begins +in the global data area and ends in the heap, the appropriate warning +is given. +The buffer is not written. +.Wn "WRITE \- buffer resides in unallocated memory" WWUMEM 25 +.Wn "WRITE \- buffer across global data area and heap" WWGDAH 26 +.Wn "WRITE \- (part of) global buffer is undefined" WWGUNDEF 27 +.Wn "WRITE \- (part of) local buffer is undefined" WWLUNDEF 28 +The first two are equivalent to the READ-errors above. +Writing out a buffer usually makes no sense when the contents are undefined, +so one of the latter two warnings will be generated in this case. +A global buffer resides in the data partition; a local buffer resides in +the stack partition. +This corresponds to global and local variables in a C-program. 
+In the first two cases the WRITE is not performed, in the latter two cases +it is. +.LP +.I "Traps and signals" +.Wn "SIGTRP \- bad signo argument" WILLSN 31 +The \fIsigtrp()\fP monitor call allows \fIsig_no\fP arguments in the +range [1..17] (UNIX Version 7 signals); the actual argument is out of range. +.Wn "SIGTRP \- signo argument is a synchronous trap" WUNIXTR 32 +The signal is one that can only be caused synchronously by the running program +on UNIX; it cannot occur to an interpreted program. +.Wn "SIGTRP \- bad trapno argument" WILLTN 33 +The \fIsigtrp()\fP monitor call allows \fItrap_no\fP arguments between 0 and +252, and the special values \-2 and \-3; the actual argument is not one of +these. +.Wn "Heap overflow due to command line limitation" WEHEAP 36 +.Wn "Stack overflow due to command line limitation" WESTACK 37 +The maximum sizes of the heap and the stack can be limited by options on the +command line. If overflow occurs due to such limitations, the corresponding +trap is taken, preceded by one of the above warnings. If the memory of the +interpreter itself is exhausted, a fatal error follows. +.LP +.I "Run-time type checking" +.Wn "Local character expected" WLCEXP 41 +.Wn "Global character expected" WGCEXP 42 +.Wn "Local integer expected" WLIEXP 43 +.Wn "Global integer expected" WGIEXP 44 +.Wn "Local float expected" WLFEXP 45 +.Wn "Global float expected" WGFEXP 46 +.Wn "Local data pointer expected" WLDPEXP 47 +.Wn "Global data pointer expected" WGDPEXP 48 +.Wn "Local instruction pointer expected" WLIPEXP 49 +.Wn "Global instruction pointer expected" WGIPEXP 50 +In general, a type violation has taken place when one of +these warnings is given. +The \fBfloat\fP- and \fBinstruction pointer\fP warnings are rare and will +usually be easy to trace. +\fBInteger/character expected\fP will normally occur when unsigned arithmetic +is performed on datapointers or when memory containing objects other than +integers is copied bytewise. 
+Often, this warning is followed by a warning \fBdatapointer expected\fP. +This is due to our decision to transform pointers to (unsigned) integers +after doing unsigned arithmetic on them. +When such a transformed integer is dereferenced (as if it were a pointer) +or, in general, when it is treated as a pointer, this results in a warning. +The present library implementation of malloc() causes such a +sequence of errors. +.LP +These messages are always followed by a tentative description of what is found +in memory at the offending place. +.Wn "Actual memory is undefined" WWASUND 61 +.Wn "Actual memory contains an integer" WWASINT 62 +.Wn "Actual memory contains a float" WWASFLOAT 63 +.Wn "Actual memory contains a data pointer" WWASDATAP 64 +.Wn "Actual memory contains an instruction pointer" WWASINSP 65 +.Wn "Actual memory contains mixed information" WWASMISC 66 +If the contents of the area were undefined, +check the source code for an uninitialized variable of the mentioned type. +Officially, the use of an undefined value +should result in an EIUND or EFUND trap but the occurrence is +so common that a warning is more appropriate. +The contents of memory are described as mixed if the data consists of pieces +of different types. This happens, e.g., when caller and callee do not agree on +the types and lengths of the parameters. +.LP +.I "Protection" +.br +.Wn "Destroying contents of ROM (at or near loc 0)" WDESROM 71 +The program stores a value in Read-Only Memory; the only ROM in the present +implementation is the area near location 0. The warning probably results from +storing through a NULL pointer. This is only a warning; the store operation is +executed normally. Reads from location 0 are not detected. +.Wn "Destroying contents of Return Status Block" WDESRSB 72 +The Return Status Block is the stack area containing the return address, the +dynamic link, etc. +This may or may not be an error. 
+The current implementation of \fIsetjmp()\fP/\fIlongjmp()\fP +may be responsible for it. +If your program does not use setjmp(), there \fIis\fP something +very wrong (e.g. argument for ASP too large). +Note that there are some library routines (such as \fIalarm()\fP) which +use \fIsetjmp()\fP. +.Wn "Logical operation using undefined operand(s)" WUNLOG 81 +.Wn "Comparing undefined operand(s)" WUNCMP 82 +The logical operations AND, XOR, IOR, COM and the compare operation +CMS do their jobs bytewise. +If one of the bytes is found to be undefined, the corresponding warning +is given, and the operation is stopped immediately. +The stack is adjusted so interpretation may continue. +.br +It is hard to say what went wrong. +Possibly, the argument of the instruction at hand (which indicates the +size of the objects to be compared), was too large. +.LP +.I "Bad operands" +.Wn "Shift over negative distance" WSHNEG 91 +.Wn "Shift over too large distance" WSHLARGE 92 +Shift instructions yield undefined results if the shift distance is negative +or larger than the object size. +.Wn "Pointer arithmetic yields pointer to bad segment" WSEGADP 93 +When doing pointer arithmetic (ADP, ADS), the operand and result pointer +must be in the same \fIsegment\fP (see sec. 4). +E.g. loading the address of the first local and adding 20 to it will +certainly give this warning. +.Wn "Subtracting pointers to different segments" WSEGSBS 94 +Pointers may be subtracted only if they point into the same segment. +.Wn "Pointer arithmetic with NULL pointer" WNULLPA 96 +By definition it is illegal to do arithmetic with null pointers. +Integers with the size of a pointer and the value zero are recognized +as NULL pointers. +A well-known C-trick to compute the offset of some field in a struct +is converting the null-pointer to the type of the struct and simply +taking the address of the field. 
+This trick will \-when translated and interpreted\- generate this warning +because it results in arithmetic with the NULL pointer. +.LP +.I "Return area" +.Wn "Returned function result too large" WRFUNLAR 101 +.Wn "Returned function result too small" WRFUNSML 102 +This warning is generated when the size of the expected return value +is not equal to the size actually returned. +.br +Your interpreted program may have fallen through the end of +the code without explicitly doing an \fIexit()\fP or \fIreturn()\fP. +The start-up routine (\fIcrt0()\fP) however always expects to get some +value returned by the program proper. +.br +Another (less probable) possibility of course is that the code contains +a subroutine or function call that does not return properly (e.g. +it returns a short instead of a long). +.Wn "Returned function result may be garbled" WRFUNGAR 103 +This warning will be generated, when the contents of the FRA are fetched +after some instruction is executed which can mess up the area. +Compiler-generated loadfiles should not generate this message. +.LP +.I "Return Status Block" +.Wn "RET did not find a Return Status Block" WRETBAD 111 +.Wn "Used RET to return from a trap" WRETTRAP 112 +The RET instruction found a garbled Return Status Block, or one that resulted +from a trap. +.Wn "RTT did not find a Return Status Block" WRTTBAD 115 +.Wn "RTT on empty stack" WRTTEMPTY 116 +.Wn "Used RTT to return from a call" WRTTCALL 117 +.Wn "Used RTT to return from a non-returnable trap" WRTTNRTT 118 +The RTT (Return from Trap) instruction found a Return Status block that was not +created properly by a trap. 
+.Wn "Stack Pointer too large in RET" WRETSTL 121 +.Wn "Stack Pointer too small in RET" WRETSTS 122 +.Wn "Stack Pointer too large in RTT" WRTTSTL 125 +.Wn "Stack Pointer too small in RTT" WRTTSTS 126 +According to the EM Manual (4.2), "the value of SP just after the return +value has been popped must be the same as the +value of SP just before executing the first instruction of the +invocation." +If the Stack Pointer is too large, some dynamically allocated item or some +temporary result may have been left behind on the stack. +If the Stack Pointer is too small, some locals have been unstacked. +Since the interpreter has enough information in the Return Status Block, it +recovers correctly from these errors. +.LP +.I "Traps" +.LP +Some traps have ambiguous or non-obvious causes. +As far as possible, these are preceded by a warning, explaining the +circumstances of the trap. +.Wn "Trap ESTACK: DCH on bad LB" WDCHBADLB 131 +.Wn "Trap ESTACK: LPB on bad LB" WLPBBADLB 132 +.Wn "Trap ESTACK: SP retracted over Return Status Block" WSPGTLB 133 +.Wn "Trap ESTACK: SP moved into data area" WSPINHEAP 134 +.Wn "Trap ESTACK: SP set to non-word-boundary" WSPODD 135 +.Wn "Trap ESTACK: LB set out of stack" WLBOUT 136 +.Wn "Trap ESTACK: LB set to non-word-boundary" WLBODD 137 +.Wn "Trap ESTACK: LB set to position where there is no RSB" WLBRSB 138 +.Wn "Trap EHEAP: HP retracted into Global Data Area" WHPGDA 141 +.Wn "Trap EHEAP: HP pushed into stack" WHPSTACK 142 +.Wn "Trap EHEAP: HP set to non-word-boundary" WHPODD 143 +.Wn "Trap EILLINS: unknown opcode" WBADOPC 151 +.Wn "Trap EILLINS: conversion with unacceptable size for this machine" WILLCONV 152 +.Wn "Trap EILLINS: FIL with non-existing address" WILLFIL 153 +.Wn "Trap EILLINS: LFR with too large size" WILLLFR 154 +.Wn "Trap EILLINS: RET with too large size" WILLRET 155 +.Wn "Trap EILLINS: instruction argument of class c does not fit a word" WARGC 156 +.Wn "Trap EILLINS: instruction on double word on machine with word size 4" 
WARGD 157 +.Wn "Trap EILLINS: local offset too large" WARGL 158 +.Wn "Trap EILLINS: instruction argument of class g not in GDA" WARGG 159 +.Wn "Trap EILLINS: fragment offset too large" WARGF 160 +.Wn "Trap EILLINS: counter in lexical instruction out of range" WARGN 161 +.Wn "Trap EILLINS: non-existent procedure identifier" WARGP 162 +.Wn "Trap EILLINS: illegal register number" WARGR 163 +.Wn "Trap EBADPC: jump out of text segment" WPCOVFL 172 +.Wn "Trap EBADPC: jump out of procedure fragment" WPCPROC 173 +.Wn "Trap EBADGTO: GTO does not restore an existing RSB" WGTORSB 181 +.Wn "Trap EBADGTO: GTO descriptor on the stack" WGTOSTACK 182 +.Wn "Trap caused by TRP instruction" WTRP 191 +.ig +.Wn "Last warning" WMSG 199 +!Leave these lines here! +.. diff --git a/doc/int/appB b/doc/int/appB new file mode 100644 index 000000000..fe19cd891 --- /dev/null +++ b/doc/int/appB @@ -0,0 +1,486 @@ +.\" A simple tutorial +.\" +.\" $Header$ +.\" +.bp +.DS +APPENDIX B +.DE +.SH +How to use the interpreter +.PP +The interpreter is not normally used for the debugging of programs under +construction. Its primary application is as a verification tool for almost +completed programs. Although the proper operation of the interpreter is +obviously a black art, this chapter tries to provide some guidelines. +.LP +For the sake of the argument, the source language is assumed to be C, but most +hints apply equally well to other languages supported by ACK. +.sp +.LP +.I "Initial measures" +.PP +Start with a test case of trivial size; to be on the safe side, reckon with a +time dilatation factor of about 500, i.e., a second grows into 10 minutes. +(The interpreter takes 0.5 msec to do one EM instruction on a Sun 3/50). +Fortunately many trivial test cases are much shorter than one second. 
+.PP +Compile the program into an \fIe.out\fP, the EM machine version of a +\fIa.out\fP, by calling \fIem22\fP (for 2-byte integers and 2-byte pointers), +\fIem24\fP (for 2 and 4) or \fIem44\fP (for 4 and 4) as seems appropriate; +if in doubt, use \fIem44\fP. These compilers can be found in the ACK +\fIbin\fP directory, and should be used instead of \fIacc\fP (or normal +.UX +\fIcc\fP). Alternatively, you can use \fIacc \-memNN\fP instead of +\fIemNN\fP. +.LP +If your C program consists of more than one file, as it usually does, there is +a small problem. The \fIacc\fP and \fIcc\fP compilers generate .o files, +whereas the \fIemNN\fP compilers generate .m files as object files. +A simple technique to avoid the problem is to call +.DS +em44 *.c +.DE +if you can. If not, the following hack on the \fIMakefile\fP generally works. +.IP \- +Make sure the \fIMakefile\fP is reasonably clean and complete: all calls to +the compiler are through \fI$(CC)\fP, \fICFLAGS\fP is used properly and all +dependencies are specified. +.IP \- +Add the following lines to the \fIMakefile\fP (possibly permanently): +.DS +\&.SUFFIXES: .o +\&.c.o: +\& $(CC) \-c $(CFLAGS) $< +.DE +.IP \- +Set CC to \fIem44 \-.c\fP (for example). Make sure CFLAGS includes +the \-O option; this yields a speed-up of about 15 %. +.IP \- +Change all .o to .m (or .k if you do not use the \-O option). +.IP \- +If necessary, change \fIa.out\fP to \fIe.out\fP. +.PP +With these changes, \fImake\fP will produce an EM object; you can use +\fIesize\fP to verify that it is indeed an EM object and obtain some +statistics. Then call the interpreter: +.DS +int [ parameters ] +.DE +where the parameters are the normal parameters of your program. This should +work exactly like the original program, though slower. It reads from the +terminal if the original does, it opens and closes files like the original and +it accepts interrupts. +.sp +.LP +.I "Interpreting the results" +.PP +Now there are several possibilities. 
+.PP +It does all this. Great! This means the program +does not do very uncouth things. Now +read the file \fIint.mess\fP to see if any messages were generated. If there +are none, the program did not really run (perhaps the original cc \fIa.out\fP +got called instead?) Normally there is at least a termination message like +.DS +(Message): program exits with status 0 at "awa.p", line 64, INR = 4124 +.DE +This says that the program terminated through an exit(0) on line 64 of the +file \fIawa.p\fP after 4124 EM instructions. +If this is the only message it is time to move to a bigger test case. +.PP +On the other hand, the program may come to a grinding halt with an error +message. +All messages (errors and warnings) have a format in which the sequence +.DS +"", line +.DE +occurs, which is the same sequence many compilers produce for their error +messages. Consequently, the \fIint.mess\fP file can be processed as any +compiler message output. +.PP +One such message can be +.DS +(Fatal error) a.em: trap "Addressing non existent memory" not caught at "a.c", line 2, INR = 16 +.DE +produced by the abysmal program +.DS +main() { + *(int*)200000 = 1; +} +.DE +.LP +Often the effects are more subtle, however. The program +.DS +main() { + int *a, b = 777; + + b = *a; +} +.DE +produces the following five warnings (in far less than a second): +.DS +(Warning 47, #1): Local data pointer expected at "t.c", line 4, INR = 17 +(Warning 61, cont.): Actual memory is undefined at "t.c", line 4, INR = 17 +(Warning 102, #1): Returned function result too small at "", line 0, INR = 21 +(Warning 43, #1): Local integer expected at "exit.c", line 11, INR = 34 +(Warning 61, cont.): Actual memory is undefined at "exit.c", line 11, INR = 34 +.DE +The one about the function result looks the most frightening, +but is the most easily solved: +\fImain\fP is a function returning an int, so the start-up routine expects a +(four-byte) integer but gets an empty (zero-byte) return area. 
+.LP +\fINote\fP: The experts are divided about this. The traditional school holds +that \fImain\fP is an int function and its result is the return code; this +leaves them with two ways of supplying a return code: one as the parameter +of \fIexit()\fP and one as the result +of \fImain\fP. The modern school (Berkeley 4.2 etc.) claims that +return codes are supplied exclusively +by \fIexit()\fP, and they have an \fIexit(0)\fP in +the start-up routine, just after the call to \fImain()\fP; leaving \fImain()\fP +through the bottom implies successful termination. +.LP +We shall satisfy both groups by +.DS +main() { + int *a, b = 777; + + b = *a; + exit(0); +} +.DE +This results in +.DS +(Warning 47, #1): Local data pointer expected at "t.c", line 4, INR = 17 +(Warning 61, cont.): Actual memory is undefined at "t.c", line 4, INR = 17 +(Message): program exits with status 0 at "exit.c", line 11, INR = 33 +.DE +which is pretty clear as it stands. +.sp +.LP +.I "Using stack dumps" +.PP +Let's, for the sake of argument +and to avoid the fierce realism of 10000-line programs, assume that the above +still puzzles you. +Since the error occurred in EM instruction number 17, we should like to see +more information around that moment. Call the interpreter again, now with the +shell variable AT set at 17: +.DS +int AT=17 t.em +.DE +(The interpreter has a number of internal variables that can be set by +assignments on the command line, like with \fImake\fP.) +This gives you a file called \fIint.log\fP containing the +stack dump of 150 lines presented at the end of this chapter. +.PP +Since dumping is a subfacility of logging in the interpreter, the formats of +the lines are +the same. If a line starts with an @, it will contain a file-name/line-number +indication; the next two characters are the subject and the log +level. Then comes the information, preceded by a space. 
The text contains +three stack dumps, one before the offending instruction, one at it, and one +after it; then the interpreter stops. All kinds of other dumps can be +obtained, but this is default. +.PP +For each instruction we have, in order: +.IP \- +an @x9 line, giving the position in the program, +.IP \- +the messages, warnings and errors from the instruction as it is being executed, +.IP \- +dump(s), as requested. +.PP +The first two lines mean that at line 4 in file \fIt.c\fP the interpreter +performed its 16-th instruction, with the Program Counter at 30 pointing at +opcode 180 in the text segment; the instruction was an LOL (LOad Local) +with the operand \-4 derived from the opcode. It copies the local at offset +\-4 to the top of the stack. The effect can be seen from the subsequent stack +dump, where the undefined word at addresses 2147483568 to ...571 (the variable +\fIa\fP) has been copied to the top of the stack at 2147483560 (copying +undefined values does not generate a warning). +Since we used the \fIem44\fP compiler, all pointers and ints in our dump are +4 bytes long. +So a variable at address X in reality extends from address X to X+3. +.br +Note that this is not the offending instruction; this stack dump represents +the situation just before the error. +.PP +The stack consists of a sequence of frames, each containing data followed by +a Return Status Block resulting from a call; the last frame ends in +top-of-stack. The first frame represents the stack when the program starts, +through a call to the start-up routine. This routine prepares the second +stack frame with the actual parameters to \fImain()\fP: +\fIargc\fP at 2147483596, \fIargv\fP at 2147483600 and \fIenviron\fP at +2147483604. +.LP +The RSB line shows that the call to \fImain()\fP was made from procedure 0 +which has 0 locals, with PC at +16, an LB of 2147483608 and file name and line number still unknown. 
+The \fIcode\fP in the RSB tells how this RSB was made; possible values are STP +(start-up), CAL, RTT (returnable trap) and NRT (non-returnable trap). +.PP +The next frame shows the local variable(s) of \fImain()\fP; there are two of +them, the pointer \fIa\fP at 2147483568, which is undefined, and variable +\fIb\fP at 2147483564, which has the value 777. Then comes a copy of \fIa\fP, +just made by the LOL instruction, at 2147483560. The following line shows that +the Function Return Area (which does not reside at the end of the stack, but +just happens to be printed here) has size 0 and is presently undefined. +The stack dump ends +by showing that the Actuals Base is at 2147483596 (pointing at \fIargc\fP), the +Locals Base at 2147483572 (pointing just above the local \fIa\fP), the Stack +Pointer at 2147483560 (pointing at the undefined pointer), the line count is 4 +and the file name is "t.c". +.LP +(Notice that there is one more stack frame than you would probably expect, the +one above the start-up routine.) +.LP +The Function Return Area +could have a size larger than 0 and still be undefined, for +example when an instruction that does not preserve the contents of the FRA has +just been executed; likewise the FRA could have size 0 and be defined +nevertheless, for example just after a RET 0 instruction. +.PP +All this has set the scene for the disaster which is about to strike in the +next instruction. This is indeed an LOI (LOad Indirect) of size 4, opcode 169; +it causes the message +.DS +warning: Local data pointer expected [stack.c: 242] +.DE +and its continuation +.DS +warning cont.: Actual memory is undefined +.DE +(detected in the interpreter file \fIstack.c\fP at line 242; this can be +useful for sorting out dubious semantics). We see that the effect, as shown in +the third frame of this stack dump (at instruction number 17) is somewhat +unexpected: the LOI has fetched the value 4 and stacked it. 
The reason is +that, unfortunately, undefinedness is not transitive in the interpreter. When +an undefined value is used in an operation (other than copying) a warning is +given, but thereafter the value is treated as if it were zero. So, after the +warning a normal null pointer remains, which is then used to pick up the value +at location 0. This is the place where the EM machine stores its current line +number, which is presently 4. +.PP +The third stack dump shows the final effect: the value 4 has been unstacked +and copied to variable \fIb\fP at 2147483564 through an STL (STore Local) +instruction. +.PP +Since this form of logging dumps the stack only, the log file is relatively +small as dumps go. +Nevertheless, a useful excerpt can be obtained with the command +.DS +grep 'd1' int.log +.DE +This extracts the Return Status Block lines from the log, thus producing three +traces of calls, one for each instruction in the log: +.DS + d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL + d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL + d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, LIN = 4, FIL = "t.c" + d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL + d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL + d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, LIN = 4, FIL = "t.c" + d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL + d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL + d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483564, HP = 848, LIN = 4, FIL = "t.c" +.DE +Theoretically, the pertinent trace is the middle one, but in practice all three +are equal. In the present case there isn't much to trace, but in real programs +the trace can be useful. 
+.sp +.LP +.I "Errors in libraries" +.PP +Since libraries are generally compiled with suppression of line number and +file name information, the line number and file name in the interpreter will +not be updated when it enters a library routine. Consequently, all messages +generated by interpreting library routines will seem to originate from the +line of the call. This is especially true for the routine malloc(), which, +from the nature of its business, often contains dubious code. +.PP +A usual message is: +.DS +(Warning 43, #1): Local integer expected at "buff.c", line 18, INR = 266 +(Warning 64, cont.): Actual memory contains a data pointer at "buff.c", line 18, INR = 266 +.DE +and indeed at line 18 of the file buff.c we find: +.DS + buff = malloc(buff_size = BFSIZE); +.DE +This problem can be avoided by using a specially compiled version of the +library that contains the correct LIN and FIL instructions, or, less +elegantly, by including the source code of the library routines in the +program; in the latter case, make sure you have them all. +.sp +.LP +.I "Unavoidable messages" +.br +Some messages produced by the logging are almost unavoidable; sometimes the +writer of a library routine is forced to take liberties with the semantics of +EM. +.LP +Examples from C include the memory allocation routines. +For efficiency reasons, one bit of a pointer in the administration is used as +a flag; setting, clearing and reading this bit requires bitwise operations on +pointers, which gives the above messages. +Realloc causes a problem in that it may have to copy the originally allocated +area to a different place; this area may contain uninitialised bytes. +.bp +.DS +.ft CW +@x9 "t.c", line 4, INR = 16, PC = 30 OPCODE = 180 +@L6 "t.c", line 4, INR = 16, DoLOLm(-4) + d2 + d2 . . STACK_DUMP[4/4] . . INR = 16 . . STACK_DUMP . . 
+ d2 ---------------------------------------------------------------- + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483643 0 (Dp) + d2 2147483642 0 (Dp) + d2 2147483641 0 (Dp) + d2 2147483640 40 [ 40] (Dp) + d2 2147483639 0 (Dp) + d2 2147483638 0 (Dp) + d2 2147483637 3 (Dp) + d2 2147483636 64 [ 832] (Dp) + d2 2147483635 0 (In) + d2 2147483634 0 (In) + d2 2147483633 0 (In) + d2 2147483632 1 [ 1] (In) + d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL + d2 + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483607 0 (Dp) + d2 2147483606 0 (Dp) + d2 2147483605 0 (Dp) + d2 2147483604 40 [ 40] (Dp) + d2 2147483603 0 (Dp) + d2 2147483602 0 (Dp) + d2 2147483601 3 (Dp) + d2 2147483600 64 [ 832] (Dp) + d2 2147483599 0 (In) + d2 2147483598 0 (In) + d2 2147483597 0 (In) + d2 2147483596 1 [ 1] (In) + d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL + d2 + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483571 undef + d2 | | | | | | + d2 2147483568 undef (1 word) + d2 2147483567 0 (In) + d2 2147483566 0 (In) + d2 2147483565 3 (In) + d2 2147483564 9 [ 777] (In) + d2 2147483563 undef + d2 | | | | | | + d2 2147483560 undef (1 word) + d2 FRA: size = 0, undefined + d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, \e + LIN = 4, FIL = "t.c" + d2 ---------------------------------------------------------------- + d2 +@x9 "t.c", line 4, INR = 17, PC = 31 OPCODE = 169 +@w1 "t.c", line 4, INR = 17, warning: Local data pointer expected [stack.c: 242] +@w1 "t.c", line 4, INR = 17, warning cont.: Actual memory is undefined +@L6 "t.c", line 4, INR = 17, DoLOIm(4) + d2 + d2 . . STACK_DUMP[4/4] . . INR = 17 . . STACK_DUMP . . 
+ d2 ---------------------------------------------------------------- + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483643 0 (Dp) + d2 2147483642 0 (Dp) + d2 2147483641 0 (Dp) + d2 2147483640 40 [ 40] (Dp) + d2 2147483639 0 (Dp) + d2 2147483638 0 (Dp) + d2 2147483637 3 (Dp) + d2 2147483636 64 [ 832] (Dp) + d2 2147483635 0 (In) + d2 2147483634 0 (In) + d2 2147483633 0 (In) + d2 2147483632 1 [ 1] (In) + d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL + d2 + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483607 0 (Dp) + d2 2147483606 0 (Dp) + d2 2147483605 0 (Dp) + d2 2147483604 40 [ 40] (Dp) + d2 2147483603 0 (Dp) + d2 2147483602 0 (Dp) + d2 2147483601 3 (Dp) + d2 2147483600 64 [ 832] (Dp) + d2 2147483599 0 (In) + d2 2147483598 0 (In) + d2 2147483597 0 (In) + d2 2147483596 1 [ 1] (In) + d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL + d2 + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483571 undef + d2 | | | | | | + d2 2147483568 undef (1 word) + d2 2147483567 0 (In) + d2 2147483566 0 (In) + d2 2147483565 3 (In) + d2 2147483564 9 [ 777] (In) + d2 2147483563 0 (In) + d2 2147483562 0 (In) + d2 2147483561 0 (In) + d2 2147483560 4 [ 4] (In) + d2 FRA: size = 0, undefined + d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, \e + LIN = 4, FIL = "t.c" + d2 ---------------------------------------------------------------- + d2 +@x9 "t.c", line 4, INR = 18, PC = 32 OPCODE = 229 +@S6 "t.c", line 4, INR = 18, DoSTLm(-8) + d2 + d2 . . STACK_DUMP[4/4] . . INR = 18 . . STACK_DUMP . . 
+ d2 ---------------------------------------------------------------- + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483643 0 (Dp) + d2 2147483642 0 (Dp) + d2 2147483641 0 (Dp) + d2 2147483640 40 [ 40] (Dp) + d2 2147483639 0 (Dp) + d2 2147483638 0 (Dp) + d2 2147483637 3 (Dp) + d2 2147483636 64 [ 832] (Dp) + d2 2147483635 0 (In) + d2 2147483634 0 (In) + d2 2147483633 0 (In) + d2 2147483632 1 [ 1] (In) + d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL + d2 + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483607 0 (Dp) + d2 2147483606 0 (Dp) + d2 2147483605 0 (Dp) + d2 2147483604 40 [ 40] (Dp) + d2 2147483603 0 (Dp) + d2 2147483602 0 (Dp) + d2 2147483601 3 (Dp) + d2 2147483600 64 [ 832] (Dp) + d2 2147483599 0 (In) + d2 2147483598 0 (In) + d2 2147483597 0 (In) + d2 2147483596 1 [ 1] (In) + d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL + d2 + d2 ADDRESS BYTE ITEM VALUE SHADOW + d2 2147483571 undef + d2 | | | | | | + d2 2147483568 undef (1 word) + d2 2147483567 0 (In) + d2 2147483566 0 (In) + d2 2147483565 0 (In) + d2 2147483564 4 [ 4] (In) + d2 FRA: size = 0, undefined + d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483564, HP = 848, \e + LIN = 4, FIL = "t.c" + d2 ---------------------------------------------------------------- + d2 +.DE diff --git a/doc/int/bib b/doc/int/bib new file mode 100644 index 000000000..dc034a0ae --- /dev/null +++ b/doc/int/bib @@ -0,0 +1,25 @@ +.\" Bibliography +.\" +.\" $Header$ +.bp +.DS C +BIBLIOGRAPHY +.DE +.LP +[1] A.S. Tanenbaum, H. van Staveren, E.G. Keizer and J.W. Stevenson. +\fIDescription of a Machine Architecture for use with Block Structured +Languages\fP. VU Informatica Rapport IR-81, august 1983. +.LP +[2] E.G. Keizer. \fIAck description file reference manual.\fP +.LP +[3] K. Jensen and N. Wirth. +\fIPASCAL, User Manual and Report\fP. Springer Verlag. +.LP +[4] B.W. Kernighan and D.M. Ritchie. +\fIThe C Programming Language\fP. Prentice-Hall, 1978. +.LP +[5] D.M. 
Ritchie. \fIC Reference Manual\fP. +.LP +[6] \fIAmsterdam Compiler Kit, reference manual.\fP +.LP +[7] \fIUnix Programmer's Manual, 4.1BSD\fP. UCB, August 1983. diff --git a/doc/int/cover b/doc/int/cover new file mode 100644 index 000000000..ee2374d82 --- /dev/null +++ b/doc/int/cover @@ -0,0 +1,26 @@ +.\" Front page +.\" +.\" $Header$ +.TL +The EM Interpreter +.AU +Eddo de Groot +Leo van den Berge +Dick Grune +.AI +Faculteit Wiskunde en Informatica +Vrije Universiteit, Amsterdam +.AB +This document describes the implementation +and usage of a new interpreter for the EM machine language. +This interpreter implements the full EM machine +and can be helpful to people writing new front-ends. +Moreover, it can be used as a thorough testing and debugging +tool by anyone familiar with the EM language. +.PP +A list of all warnings is given in appendix A; appendix B is a simple +tutorial. +.AE +.PP +.pn 1 +.bp diff --git a/doc/int/draw.mac b/doc/int/draw.mac new file mode 100644 index 000000000..51052c63d --- /dev/null +++ b/doc/int/draw.mac @@ -0,0 +1,24 @@ +.\" Macros for simple constant width drawings (uses font CW) +.\" +.\" $Header$ +.de Dr \" Drawing $1 (size) +.sp 1 +.ne \\$1 +.na +.nf +.ft CW \" constant width font +.lg 0 \" no ligatures +.. +.de Df \" Drawing Footer +.sp 1 +.ft R +.ce 1000 +.lg 1 +.. +.de De \" Drawing End $1 (lines) +.Df \" if it has not happened yet +.ce +.ad +.fi +.sp \\$1 +.. diff --git a/doc/int/txt2 b/doc/int/txt2 new file mode 100644 index 000000000..b7e97c2eb --- /dev/null +++ b/doc/int/txt2 @@ -0,0 +1,595 @@ +.\" Implementation details +.\" +.\" $Header$ +.bp +.NH +IMPLEMENTATION DETAILS. +.PP +The pertinent issues are addressed below, in arbitrary order. +.NH 2 +Stack manipulation and start-up +.PP +It is not at all easy to start the EM machine with the stack in a reasonable +and consistent state. One reason is the anomalous value of the ML register +and another is the absence of a proper RSB. 
It may be argued that the initial +stack does not have to be in a consistent state, since the first instruction +proper is only executed after \fIargc\fP, \fIargv\fP and \fIenviron\fP +have been stacked (which takes care of the empty stack) and the initial +procedure has been called (which creates a RSB). We would, however, like to +perform the stacking of these values and the calling of the initial procedure +using the normal stack and call routines, which again require the stack to be +in an acceptable state. +.NH 3 +The anomalous value of the ML register +.PP +All registers in the EM machine point to word boundaries, and all of them, +except ML, address the even-numbered byte at the boundary. +The exception has a good reason: the even-numbered byte at the ML boundary does +not exist. +This problem is not particular to EM but is inherent in the number system: the +number of N-digit numbers can itself not be expressed in an N-digit number, and +the number of addresses in an N-bit machine will itself not fit in an N-bit +address. The problem is solved in the interpreter by having ML point to the +highest word boundary that has bytes on either side; this makes ML+1 +expressible. +.NH 3 +The absence of an initial Return Status Block +.PP +When the stack is empty, there is no legal value for AB, since there are no +actuals; LB can be set naturally to ML+1. This is all right when the +interpreter starts with a call of the initial routine which stores the value +of LB in the first RSB, but causes problems when finally this call returns. We +want this call to return completely before stopping the interpreter, to check +the integrity of the last RSB; restoring information from it will, however, +cause illegal values to be stored in LB and AB (ML+1 and ML+1+rsbsize, resp.). 
+On top of this, the initial (illegal) Procedure Identifier of the running +procedure will be restored; then, restoring the likewise illegal PC will +cause a check to see if it still is inside the running procedure. After a few +attempts at writing special cases, we have decided that it is possible, but not +worth the effort; the final (= initial) RSB will not be unstacked. +.NH 2 +Floating point numbers. +.PP +The interpreter is capable of working with 4- and 8-byte floating point (FP) +numbers. +In C-terms, this corresponds to objects of type float and double respectively. +Both types fit in a C-double so the obvious way to manipulate these entities +internally is in doubles. +When an 8-byte FP is pushed, all bytes of the C-double are pushed. +Pushing a 4-byte FP causes the 4 bytes representing the smallest fraction +to be discarded. +.PP +In EM, floats can be obtained in two different ways: via conversion +of another type, or via initialization in the loadfile. +Initialized floats are represented in the loadfile by an ASCII string in +the syntax of a Pascal real (signed \fIUnsignedReal\fP). +That is, a float looks like: +.DS +[ \fISign\fP ] \fIDigit\fP+ [ . \fIDigit\fP+ ] [ \fIExp\fP [ \fISign\fP ] \fIDigit\fP+ ] (G1) +.DE +followed by a null byte. +Here \fISign\fP = {+, \-}; \fIDigit\fP = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; +\fIExp\fP = {e, E}; [ \fIAnything\fP ] means that \fIAnything\fP is optional; +and a + means one or more times. +To accommodate some loose code generators, the actual grammar accepted is: +.DS +[ \fISign\fP ] \fIDigit\fP\(** [ . \fIDigit\fP\(** ] [ \fIExp\fP [ \fISign\fP ] \fIDigit\fP+ ] (G2) +.DE +followed by a null byte. Here \(** means zero or more times. A floating +denotation which is in G2 but not in G1 draws a warning; one that is not even +in G2 causes a fatal error. +.LP +A string representing a float that does not fit in a double causes a +warning to be given. +In that case, the returned value will be the double 0.0. 
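The distinction between the grammars G1 and G2 can be illustrated with a small classifier. This is only a sketch under stated assumptions, not the interpreter's own loader code: the names classify_float, skip_digits and the enum values are hypothetical, and the input is assumed to be the null-terminated denotation taken from the loadfile.

```c
#include <ctype.h>

/* Hypothetical result codes (not taken from the interpreter source). */
enum float_class {
    FLOAT_STRICT,   /* in G1: accepted silently              */
    FLOAT_LOOSE,    /* in G2 but not in G1: draws a warning  */
    FLOAT_BAD       /* not even in G2: fatal error           */
};

/* Skip a run of decimal digits; store its length in *count. */
static const char *skip_digits(const char *s, int *count)
{
    *count = 0;
    while (isdigit((unsigned char)*s)) {
        s++;
        (*count)++;
    }
    return s;
}

/* Classify a null-terminated float denotation against G1 and G2. */
enum float_class classify_float(const char *s)
{
    int int_digits, frac_digits = 0, exp_digits, has_dot = 0;

    if (*s == '+' || *s == '-')
        s++;                                /* optional Sign */
    s = skip_digits(s, &int_digits);        /* Digit* */
    if (*s == '.') {
        has_dot = 1;
        s = skip_digits(s + 1, &frac_digits);
    }
    if (*s == 'e' || *s == 'E') {           /* optional exponent part */
        s++;
        if (*s == '+' || *s == '-')
            s++;
        s = skip_digits(s, &exp_digits);
        if (exp_digits == 0)
            return FLOAT_BAD;               /* both grammars require Digit+ here */
    }
    if (*s != '\0')
        return FLOAT_BAD;                   /* trailing garbage */

    /* G1 additionally requires Digit+ before and after the dot. */
    if (int_digits == 0 || (has_dot && frac_digits == 0))
        return FLOAT_LOOSE;
    return FLOAT_STRICT;
}
```

Under this reading of G2, denotations such as ".5" and "1." are loose, while "1e" is rejected. Note that G2 as written also admits the empty string, which this sketch therefore reports as loose rather than bad.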
+.LP +Floating point arithmetic is handled by some simple routines, checking for +over/underflow, and returning appropriate values in case of an ignored error. +.PP +Since not all C compilers provide floating point operations, there is a +compile time flag NOFLOAT, which, if defined, suppresses the use of all +fp operations in the interpreter. The resulting interpreter will still load +EM files with floats in the global data area (and ignore them) but will give a +fatal error upon attempt to execute a floating point instruction; consequently +code involving floating point operations can be run as long as the actual +instructions are avoided. +.NH 2 +Pointers. +.PP +The following sub-sections both deal with problems concerning pointers. +First, something is said about pointer arithmetic in general. +Then, the null-pointer problem is dealt with. +.NH 3 +Pointer arithmetic. +.PP +Strictly speaking, pointer arithmetic is defined only within a \fBfragment\fP. +From the explanation of the term fragment however (as given in [1], page 3), +it is not quite clear what a fragment should look like +from an interpreter's point of view. +For this reason we introduced the term \fBsegment\fP, +bordering the various areas within which pointer arithmetic is allowed. +Every stack-frame is a segment, and so are the global data area (GDA) and +the heap area. +Thus, the number of segments varies over time, and at some point in time is +given by the number of currently active stack-frames +(#CAL + #CAI \- #RET \- #RTT) plus 2 (gda, heap). +Pointers in the area between heap and stack (which is inaccessible by +definition), are assumed to be in the heap segment. +.PP +The interpreter, while building a new stack-frame (i.e. segment), stores the +value of the last ActualBase in a pointer-array (\fIAB_list[\ ]\fP). +When a pointer (say \fIP\fP) is available for arithmetic, the number +of the segment where it points (say \fIS\d\s-2P\s+2\u\fP), +is determined first. 
+Next, the arithmetic is performed, followed by a check on the number +of the segment where the resulting pointer \fIR\fP points +(say \fIS\d\s-2R\s+2\u\fP). +Now, if \fIS\d\s-2P\s+2\u != S\d\s-2R\s+2\u\fP, a warning is given: +\fBPointer arithmetic yields pointer to bad segment\fP. +.br +It may also be clear now, why the illegal area between heap and stack +was joined with the heap segment. +When calculating a new heap pointer (\fIHP\fP), one will obtain intermediate +results being pointers in this area just before it is made legal. +We do not want error messages all of the time, just because someone is +allocating space in the heap. +.LP +A similar treatment is given to the pointers in the SBS instruction; they have +to point into the same fragment for subtraction to be meaningful. +.LP +The length of the \fIAB_list[\ ]\fP is initially 100, +and it is reallocated in the same way the dynamically growing partitions +are (see 1.1). +.NH 3 +Null pointer. +.PP +Because the EM language lacks an instruction for loading a null pointer, +most programs solve this problem by loading a pointer-sized integer of +value zero, and using this as a null pointer (this is also proposed in [1]). +\fBInt\fP allows this, and will not complain. +A warning is given however, when an attempt is made to add something to a +null pointer (i.e. the pointer-sized integer zero). +.LP +Since many programming languages use a pointer to location 0 as an illegal +value, it is desirable to detect its use. +The big problem is though that 0 is a perfectly legal EM address; +address 0 holds the current line number in the source file. It may be freely +read but is written only by means of the LIN instruction. This allows us to +declare the area consisting of the line number and the file name pointer to be +read-only memory. Thus a store will be caught (and result in a warning) but a +read will succeed (and yield the EM information stored there). +.NH 2 +Function Return Area (FRA). 
+.PP +The Function Return Area (\fIFRA[\ ]\fP) has a default size of 8 bytes; +this default can +be overridden through the use of the \fB\-r\fP-option, but cannot be +made smaller than the size of two pointers, in accordance with the +remark on page 5 of [1]. +The global variable \fIFRASize\fP keeps track of how many bytes were +stored in the FRA, the last time a RET instruction was executed. +The LFR instruction only works when its argument is equal to this size. +If not, the FRA contents are loaded anyhow, but one of the following warnings +is given: +\fBReturned function result too large\fP (\fIFRASize\fP > LFR size) or +\fBReturned function result too small\fP (\fIFRASize\fP < LFR size). +.LP +Note that a C-program, falling through the end of its code without doing +a proper \fIreturn\fP or \fIexit()\fP, will generate this warning. +.PP +The only instructions that do not disturb the contents of the FRA are +GTO, BRA, ASP and RET. +This is expressed in the program by setting \fIFRA_def\fP to "undefined" +in any instruction except these four. +We realize this is a useless action most of the time, but a more +efficient solution does not seem to be at hand. +If a result is loaded when \fIFRA_def\fP is "undefined", the warning: +\fBReturned function result may be garbled\fP is generated. +.LP +Note that the FRA needs a shadow-FRA in order to store the shadow +information when performing a LFR instruction. +.NH 2 +Environment interaction. +.PP +The EM machine represented by \fBint\fP can communicate with +the environment in three different ways. +A first possibility is by means of (UNIX) interrupts; +the second by executing (relatively) high level system calls (called +monitor calls). +A third means of interaction, especially interesting for the debugging +programmer, is via internal variables set on the command line. +The former two techniques, and the way they are implemented will be described +in this section. +The latter has been allotted a separate section (3). 
+.NH 3 +Traps and interrupts. +.PP +Simple user programs will generally not mess around with UNIX-signals. +In interpreting these programs, the default actions will be taken +when a signal is received by the program: it gives a message and +stops running. +.LP +There are programs, however, which try to handle certain signals +themselves. +In C, this is achieved by the system call \fIsignal(\ sig_no,\ catch\ )\fP, +which calls the handling routine \fIcatch()\fP, as soon as signal +\fBsig_no\fP occurs. +EM does not provide this call; instead, the \fIsigtrp()\fP monitor call +is available for mapping UNIX signals onto EM traps. +This implies that a \fIsignal()\fP call in a C-program +must be translated by the EM library routine to a \fIsigtrp()\fP call in EM. +.PP +The interpreter keeps an administration of the mapping of UNIX-signals +onto EM traps in the array \fIsig_map[NSIG]\fP. +Initially, the signals all have their default values. +Now assume a \fIsigtrp()\fP occurs, requesting that signal \fBsig_no\fP be mapped onto +trap \fBtrap_no\fP. +This results in: +.IP 1. +setting the relevant array element +\fIsig_map[sig_no]\fP to \fBtrap_no\fP (after saving the old value), +.IP 2. +catching the next incoming \fBsig_no\fP signal with the handling routine +\fIHndlEMSig\fP (by a plain UNIX \fIsignal()\fP of course), and +.IP 3. +returning the saved map-value on the stack so the user can know the previous +trap value onto which \fBsig_no\fP was mapped. +.LP +On an incoming signal, +the handling routine for signal \fBsig_no\fP arms the +correct EM trap by calling the routine \fIarm_trap()\fP with argument +\fIsig_map[sig_no]\fP. +At the end of the EM instruction the proper call of \fItrap()\fP is done. +\fITrap()\fP in its turn examines the value of the \fIHaltOnTrap\fP variable; +if it is set, the interpreter will stop with a message. 
In the normal case of +controlled trap handling this bit is not on and the interpreter examines +the value of the \fITrapPI\fP variable, +which contains the procedure identifier of the EM trap handling routine. +It then initiates a call to this routine and performs a \fIlongjmp()\fP +to the main +loop to bypass all further processing of the instruction that caused the trap. +\fITrapPI\fP should be set properly by the library routines, through the +SIG instruction. +.LP +In short: +.IP 1. +A UNIX interrupt is caught by the interpreter. +.IP 2. +A handling routine is called which generates the corresponding EM trap +(according to the mapping). +.IP 3. +The trap handler calls the corresponding EM routine which emulates a UNIX +interrupt for the benefit of the interpreted program. +.PP +When considering UNIX signals, it is important to notice that some of them +are real signals, i.e., messages coming from outside the program, like DEL +and QUIT, but some are actually program-caused synchronous traps, like Illegal +Instruction. The latter, if they happen, are incurred by the interpreter +itself and consequently are of no concern to the interpreted program: it +cannot catch them. The present code assumes that the UNIX signals between +SIGILL (4) and SIGSYS (12) are really traps; \fIdo_sigtrp()\fP +will fail on them. +.LP +To avoid losing the last line(s) of output files, the interpreter should +always do a proper close-down, even in the presence of signals. To this end, +all non-ignored genuine signals are initially caught by the interpreter, +through the routine \fIHndlIntSig\fP, which gives a message and performs a +proper close-down. +Synchronous traps can only be caused by the interpreter itself; they are never +caught, and consequently the UNIX default action prevails. Generally they +cause a core dump. +Signals requested by the interpreted program are caught by the routine +\fIHndlEMSig\fP, as explained above. +.NH 3 +Monitor calls. 
+.PP +For the convenience of the programmer, as many monitor calls as possible +have been implemented. +The list of monitor calls given in [1] pages 20/21, has been implemented +completely, except for \fIptrace()\fP, \fIprofil()\fP and \fImpxcall()\fP. +The semantics of \fIptrace()\fP and \fIprofil()\fP from an interpreted program +is unclear; the data structure passed to \fImpxcall()\fP is non-trivial +and the system call has low portability and applicability. +For these calls, on invocation a warning is generated, and the arguments which +were meant for the call are popped properly, so the program can continue +without the stack being messed up. +The errorcode 5 (IOERROR) is pushed onto the stack (twice), in order to +fake an unsuccessful monitor call. +No other \- more meaningful \- errorcode is available in the errno-list. +.LP +Now for the implemented monitor calls. +The returned value is zero for a successful call. +When something goes wrong, the value of the external \fIerrno\fP variable +is pushed, thus enabling the user to find out what the reason of failure was. +The implementation of the majority of the monitor calls is straightforward. +Those working with a special format buffer, (e.g. \fIioctl()\fP, +\fItime()\fP and \fIstat()\fP variants), need some extra attention. +This is due to the fact that working with varying word/pointer size +combinations may cause alignment problems. +.LP +The data structure returned by the UNIX system call results from +C code that has been translated with the regular C compiler, which, +on the VAX, happens to be a 4-4 compiler. +The data structure expected by the interpreted program conforms +to the translation by \fBack\fP of the pertinent include file. +Depending on the exact call of \fBack\fP, sizes and alignment may differ. +.LP +An example is in order. The EM MON 18 instruction in the interpreted program +leads to a UNIX \fIstat()\fP system call by the interpreter. 
+This call fills the given struct with stat information, the contents +and alignments of which are determined by the version of UNIX and the +used C compiler, resp. +The interpreter, like any program wishing to do system calls that fill +structs, has to be translated by a C compiler that uses the +appropriate struct definition and alignments, so that it can use, e.g., +\fIstab.st_mtime\fP and expect to obtain the right field. +This struct cannot be copied directly to the EM memory to fulfill the +MON instruction. +First, the struct may contain extraneous, system-dependent fields, +pertaining, e.g., to symbolic links, sockets, etc. +Second, it may contain holes, due to alignment requirements. +The EM program runs on an EM machine, knows nothing about these +requirements and expects UNIX Version 7 fields, with offsets as +determined by the em22, em24 or em44 compiler, resp. +To do the conversion, the interpreter has a built-in table of the +offsets of all the fields in the structs that are filled by the MON +instruction. +The appropriate fields from the result of the UNIX \fIstat()\fP are copied +one by one to the appropriate positions in the EM memory to be filled +by MON 18. +.PP +The \fIioctl()\fP call (MON 54) poses additional problems. Not only does it +have a second argument which is a pointer to a struct, the type of +which is dynamically determined, but its first argument is an opcode +that varies considerably between the versions of UNIX. +To solve the first problem, the interpreter examines the opcode (request) and +treats the second argument accordingly. The second problem can be solved by +translating the UNIX Version 7 \fIioctl()\fP request codes to their proper +values on the various systems. This is, however, not always useful, since +some EM run-time systems use the local request codes. 
There is a compile-time +flag, V7IOCTL, which, if defined, will restrict the \fIioctl()\fP call to the +version 7 request codes and emulate them on the local system; otherwise the +request codes of the local system will be used (as far as implemented). +.PP +Minor problems also showed up with the implementation of \fIexecve()\fP +and \fIfork()\fP. +\fIExecve()\fP expects three pointers on the stack. +The first points to the name of the program to be executed, +the second and third are the beginnings of the \fBargv\fP and \fBenvp\fP +pointer arrays respectively. +We cannot pass these pointers to the system call however, because +the EM addresses to which they point do not correspond with UNIX +addresses. +Moreover, (it is not very likely to happen but) what if someone constructs +a program holding the contents for one of these pointers in the stack? +The stack is implemented upside down, so passing the pointer to +\fIexecve()\fP causes trouble for this reason too. +The only solution was to copy the pointer contents completely +to fresh UNIX memory, constructing vectors which can be passed to the +system call. +Any impending memory fault while making these copies results in failure of the +system call, with \fIerrno\fP set to EFAULT. +.PP +The implementation of the \fIfork()\fP call confronted us with problems +concerning IO-channels. +Checking messages (as well as logging) must be divided over different files. +Otherwise, these messages will coincide. +This problem was solved by post-fixing the default message file +\fBint.mess\fP (as well as the logging file \fBint.log\fP) with an +automatically leveled number for every new forked process. +Children of the original process do their diagnostics +in files with postfix 1,2,3 etc. +Second generation processes are assigned files numbered 11, 12, 21 etc. +When 6 generations of processes exist at one moment, the seventh will +get the same message file as the sixth, since the filename +would otherwise become too long. 
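The numbering of the message files for forked processes can be sketched as follows. This is an illustration only, not the interpreter's code: the helper child_postfix and the constant MAX_GENERATIONS are hypothetical names, and the sketch assumes that each process appends a single digit (1 to 9) identifying the child, which matches the examples 1, 2, 3 and 11, 12, 21 given above but is otherwise a guess.

```c
#include <string.h>

/* From the text: the seventh generation reuses the file of the sixth. */
#define MAX_GENERATIONS 6

/*
 * Hypothetical helper: derive the postfix of a child from that of its parent.
 *   parent   postfix of the forking process ("" for the original process)
 *   ordinal  1..9, the sequence number of this child (an assumption)
 *   out      buffer of at least MAX_GENERATIONS + 1 bytes
 */
void child_postfix(const char *parent, int ordinal, char *out)
{
    size_t depth = strlen(parent);

    if (depth >= MAX_GENERATIONS) {
        /* Too deep: share the parent's file rather than lengthen the name. */
        strcpy(out, parent);
        return;
    }
    memcpy(out, parent, depth);
    out[depth] = (char)('0' + ordinal);
    out[depth + 1] = '\0';
}
```

With this scheme the first child of the second child of the original process would log to int.mess21, and any process forked by a sixth-generation process would keep writing to its parent's file.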
+.PP +Some of the monitor calls receive pointers (addresses) from the program, to be +passed to the kernel; examples are the struct stat for \fIstat()\fP, the area +to be filled for \fIread()\fP, etc. If the address is wrong, the kernel does +not generate a trap, but rather the system call returns with failure, while +\fIerrno\fP is set to EFAULT. This is implemented by consistent checking of +all pointers in the MON instruction. +.NH 2 +Internal arithmetic. +.PP +Doing arithmetic on signed integers, the smallest negative integer +(\fIminsint\fP) is considered a legal value. +This is in contradiction with the EM Manual [1], page 14, which proposes using +\fIminsint\fP for uninitialized integers. +The shadow bytes already check for uninitialized integers however, +so we do not need this special illegal value. +Although the EM Manual provides two traps, for undefined integers and floats, +undefined objects occur so frequently (e.g. in block copying partially +initialized areas) that the interpreter just gives a warning. +.LP +Except for arithmetic on unsigneds, all arithmetic checks for overflow. +The value that is pushed on the stack after an overflow occurs depends +on the UNIX behavior with regard to that particular calculation. +If UNIX would not accept the calculation (e.g. division by zero), a zero +is pushed as a convention. +Illegal computations which UNIX does accept in silence (e.g. one's +complement of \fIminsint\fP), simply push the UNIX-result after giving a +trap message. +.NH 2 +Shadow bytes implementation. +.PP +A great deal of run-time checking is performed by the interpreter (except if +used in the fast version). +This section gives all details about the shadow bytes. +In order to keep track of information about the contents of D-space (stack +and global data area), there is one shadow-byte for each byte in these spaces. +Each bit in a shadow-byte represents some piece +of information about the contents of its corresponding 'sun-byte'. 
+All bits off indicates an undefined sun-byte. +One or more bits on always guarantees a well-defined sun-byte. +The bits have the following meaning: +.IP "\(bu bit 0:" 8 +indicates that the sun-byte is (a part of) an integer. +.IP "\(bu bit 1:" 8 +the sun-byte is a part of a floating point number. +.IP "\(bu bit 2:" 8 +the sun-byte is a part of a pointer in dataspace. +.IP "\(bu bit 3:" 8 +the sun-byte is a part of a pointer in the instruction space. +According to [1] (paragraph 6.4), there are two types of pointers which +must be distinguishable. +Conversion between these two types is impossible. +The shadow-bytes make the distinction here. +.IP "\(bu bit 4:" 8 +protection bit. +Indicates that the sun-byte is part of a protected piece of memory. +There is a protected area in the stack, the Return Status Block. +The EM machine language has no possibility to declare protected +memory, as is possible in EM assembly (the ROM instruction). The protection +bit is, however, set for the line number and filename pointer area near +location 0, to aid in catching references to location 0. +.IP "\(bu bit 5/6/7:" 8 +free for later use. +.LP +The shadow bytes are managed by the routines declared in \fIshadow.h\fP. +The warnings originating from checking these shadow-bytes during +run-time are various. +A list of them is given in appendix A, together with suggestions +(primarily for the C-programmer) where to look for the trouble maker(s). +.LP +A point to notice is, that once a warning is generated, it may be repeated +thousands of times. +Since repetitive warnings carry little information, but consume much +file space, the interpreter keeps track of the number of times a given warning +has been produced from a given line in a given file. +The warning message will +be printed only if the corresponding counter is a power of four (starting at +1). In this way, a logarithmic back-off in warning generation is established. 
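The logarithmic back-off can be sketched in a few lines of C. This is a minimal illustration assuming one counter per (warning, file, line) triple; the names is_power_of_four and should_print_warning are hypothetical and do not come from the interpreter source.

```c
#include <stdbool.h>

/* True if n is 1, 4, 16, 64, ... */
bool is_power_of_four(unsigned long n)
{
    if (n == 0)
        return false;
    while (n % 4 == 0)
        n /= 4;
    return n == 1;
}

/*
 * Hypothetical bookkeeping step: bump the counter belonging to one
 * (warning, file, line) triple and report whether this occurrence
 * should actually be printed.
 */
bool should_print_warning(unsigned long *count)
{
    *count += 1;
    return is_power_of_four(*count);
}
```

With this rule a warning that fires 100 times from the same source line is printed on its 1st, 4th, 16th and 64th occurrence only.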
+.LP +It might be argued that the counter should be kept for each (warning, PC +value) pair rather than for each (warning, file position) pair. Suppose, +however, that two instructions in a given line would cause the same message +regularly; this would produce two intertwined streams of identical messages, +with their counters jumping up and down. This does not seem desirable. +.NH 2 +Return Status Block (RSB) +.PP +According to the description in [1], at least the return address and the +base address of the previous RSB have to be pushed when performing a call. +Besides these two pointers, other information can be stored in the RSB +also. +The interpreter pushes the following items: +.IP \- +a pointer to the current filename, +.IP \- +the current line number (always four bytes), +.IP \- +the Local Base, +.IP \- +the return address (Program Counter), +.IP \- +the current procedure identifier, +.IP \- +the RSB code, which distinguishes between initial start-up, normal call, +returnable trap and non-returnable trap (a word-size integer). +.LP +Consequently, the size of the RSB varies, depending on +word size and pointer size; its value is available as \fIrsbsize\fP. +When the RSB is removed from the stack (by a RET or RTT) the RSB code is under +the Stack Pointer for immediate checking. It is not clear what should be done +if RSB code and return instruction do not match; at present we give a message +and continue, for what it is worth. +.PP +The reason for pushing filename and line number is that some front-ends tend +to forget the LIN and FIL instructions after returning from a function. +This may result in error messages in wrong source files and/or line numbers. +.PP +The procedure identifier is kept and restored to check that the PC will not +move out of the running procedure. The PI is an index in the proctab, which +tells the limits in the text segment of the running procedure. +.PP +If the Return Status Block is generated as a result of a trap, more is +stacked. 
Before stacking the normal RSB, the trap function pushes the +following items: +.IP \- +the contents of the entire Function Return Area, +.IP \- +the number of bytes significant in the above (a word-size integer), +.IP \- +a word-size flag indicating if the contents of the FRA are valid, +.IP \- +the trap number (a word-size integer). +.LP +The latter is followed directly by the RSB, and consequently acts as the only +parameter to the trap handler. +.NH 2 +Operand access. +.PP +The EM Manual mentions two ways to access the operands of an instruction. It +should be noticed that the operand in EM is often not the direct operand of the +operation; the operand of the ADI instruction, e.g., is the width of the +integers to be added, not one of the integers themselves. The various operand +types are described in [1]. Each opcode in the text segment identifies an +instruction with a particular operand type; these relations are described in +computer-readable format in a file in the EM tree, \fIip_spec.t\fP. +.PP +The interpreter uses a variant of the second method. Several other approaches +can be designed, with increasing efficiency and equally increasing complexity. +They are briefly treated below. +.NH 3 +The Dispatch Table, Method 1. +.PP +When the interpreter starts, it reads the ip_spec.t file and constructs from it +a dispatch table. This table (of which there are actually three, +for primary, secondary +and tertiary opcodes) has 256 entries, each describing an instruction with +indications on how to decode the operand. For each instruction executed, the +interpreter finds the entry in the dispatch table, finds information there on +how to access the operand, constructs the operand and calls the appropriate +routine with the operand as calculated. There is one routine for each +instruction, which is called with the ready-made operand. Method 1 is easy to +program but requires constant interpretation of the dispatch table. +.NH 3 +Intelligent Routines, Method 2. 
+.PP +For each opcode there is a separate routine, and since an opcode uniquely +defines the instruction and the operand format, the routine knows how to get +the operand; this knowledge is built into the routine. Preferably the heading +of the routine is generated automatically from the ip_spec.t file. Operand +decoding is immediate, and no dispatch table is needed. Generation of the +469 required routines is, however, far from simple. Either a generated array +of routine names or a generated switch statement is used to map the opcode onto +the correct routine. The switch approach has the advantage that parameters can +be passed to the routines. +.LP +The interpreter uses a variant of the switch statement scheme. Numerical +information that can be deduced from the opcode is passed as parameters to the +routine; this includes the argument of minis, the high order byte of shorties, +and the fact that the result is to be multiplied by the word size. This +reduces the number of required routines to 338. +.NH 3 +Intelligent Calls. +.PP +The call in the switch statement does full operand construction, and the +resulting operand is passed to the routine. This reduces the number of +routines to 133, the number of EM instructions. Generation of the switch +statement from ip_spec.t will be complicated, but the routine space will be +much cleaner. This will not give any speed-up since the same actions are still +required; they are just performed in a different place. +.NH 3 +Static Evaluation. +.PP +It can be observed that the evaluation of the operand of a given instruction in +the text segment will always give the same result. It is therefore possible to +preprocess the text segment, decomposing the instructions into structs which +contain the address, the instruction code and the operand. No operand decoding +will be necessary at run-time: all operands have been precalculated. This will +probably give a considerable speed-up. 
Jumps, especially GTO jumps, will, +however, require more attention. +.NH 2 +Disassembly. +.PP +A disassembly facility is available, which gives a readable but not +letter-perfect disassembly of the EM object. The procedure structure is +indicated by placing the indication \fBP[n]\fP at the entry point of each +procedure, where \fBn\fP is the procedure identifier. The number of locals is +given in a comment. +.LP +The disassembler was generated by the software in the directory \fIswitch\fP +and then further processed by hand. diff --git a/doc/int/txt3 b/doc/int/txt3 new file mode 100644 index 000000000..63530f9c8 --- /dev/null +++ b/doc/int/txt3 @@ -0,0 +1,181 @@ +.\" Logging +.\" +.\" $Header$ +.bp +.NH +THE LOGGING MACHINE. +.PP +Since messages and warnings provided by \fBint\fP include source code file +names and line numbers, they alone often suffice to identify the error. +If, however, the necessity arises, much more extensive debugging information +can be obtained by activating the Logging Machine. +This Logging Machine, which monitors all actions of the EM machine, is the +subject of this chapter. +.NH 2 +Implementation. +.PP +When inspecting the source code of \fBint\fP, many lines in the +following format will show up: +.DS +LOG(("@<\fIletter\fP><\fIdigit\fP> message", args)); +.DE +or +.DS +LOG(("\ <\fIletter\fP><\fIdigit\fP> message", args)); +.DE +The double parentheses are needed because \fILOG()\fP is +defined as a macro and has a printf-like argument structure. +.PP +The <\fIletter\fP> classifies the log message and corresponds to an entry in +the \fIlogmask\fP, which holds a threshold for each class of messages. +The following classes exist: +.TS +tab(@); +l l l.
+\(bu A\-Z@the flow of instructions: +@A: array +@B: branch +@C: convert +@F: floating point arithmetic +@I: integer arithmetic +@L: load +@M: miscellaneous +@P: procedure call +@R: pointer arithmetic +@S: store +@T: compare +@U: unsigned arithmetic +@X: logical +@Y: sets +@Z: increment/decrement/zero +\(bu d@stack dumping. +\(bu g@gda & heap manipulation. +\(bu s@stack manipulation. +\(bu r@reading the loadfile. +\(bu q@floating point calculations during reading the loadfile. +\(bu x@the instruction count, contents and file position. +\(bu m@monitor calls. +\(bu p@procedure calls and returns. +\(bu t@traps. +\(bu w@warnings. +.TE +.LP +When the interpreter reaches a LOG(()) statement it scans its first argument; +if \fIletter\fP +occurs in the logmask, and if \fIdigit\fP is lower than or equal to the +threshold in the logmask, the message is given. +Depending on the first character, the message will be preceded by a +position indication (with the @) or will be printed as is (with the +space). +The \fIletter\fP determines the message class +and the \fIdigit\fP is used to distinguish various levels +of logging, with a lower digit indicating a more important message. +We will call the <\fIletter\fP><\fIdigit\fP> combination the \fBid\fP of +the logging. +.LP +In general, the lower the \fIdigit\fP following the \fIletter\fP, +the more important the message. +E.g. m5 reports about unsuccessful monitor calls only; m9 also reports +about successful monitor calls (which are obviously less interesting). +New logging messages can be added to the source code in places you +consider relevant. +.LP +Reasonable settings for the logmask are: +.TS +tab(@); +l l l. + @A\-Z9d4twx9@advised setting when troubleshooting (default). + @A\-Zx9@shows the flow of instructions & global information. + @pm9@shows the procedure & monitor calls. + @tw9@shows warning & trap information.
+.TE +.PP +An EM interpreter without a Logging Machine can be obtained by undefining the +macro \fICHECKING\fP in the file \fIchecking.h\fP. +.NH 2 +Controlling the Logging Machine. +.PP +The actions of the Logging Machine are controlled by a set of internal +variables (one of which is the log mask). +These variables can be set through assignments on the command line, as +explained in the manual page \fIint.1\fP, q.v. +Since there are a great many logging statements in the program, of which only a +few will be executed in any call of the interpreter, it is important to be able +to decide quickly if a given \fIid\fP has to be checked at all. +To this end all logging statements are guarded (in the #define) by a test for +the boolean variable \fIlogging\fP. +This variable will only be set if the command line assignments show the +potential need for logging (\fImust_log\fP) and the instruction count +(\fIinr\fP) is at least equal to \fIlog_start\fP (which derives from the +parameter \fBLOG\fP). +.LP +The log mask can be set by the assignment +.DS +"LOGMASK=\fIlogstring\fP" +.DE +which sets the current logmask to \fIlogstring\fP. +A logstring has the following form: +.DS +[ [ \fIletter\fP | \fIletter\fP \- \fIletter\fP ]+ \fIdigit\fP ]+ +.DE +E.g. LOGMASK=A\-D8x9R7c0hi4 will print all messages belonging to loggings +with \fBid\fPs: +\fIA0..A8,B0..B8,C0..C8,D0..D8,x0..x9,R0..R7,c0,h0..h4,i0..i4\fP. +.PP +The logging variable STOP can be used to prevent run-away logging +past the point where the user expects an error to occur. +STOP=\fInr\fP will stop the interpreter after instruction number \fInr\fP. +.PP +To simplify the use of the logging machine, a number of abbreviations have been +defined. +E.g., AT=\fInr\fP can be thought of as an abbreviation of LOG=\fInr\-1\fP +STOP=\fInr+1\fP; this causes three stack dumps, one before the suspect +instruction, one on it and one after it; then the interpreter stops.
+.PP +Logging results will appear in a special logging file (default: \fIint.log\fP). +.NH 2 +Dumps. +.PP +There are three routines available to examine the memory contents: +.TS +tab(@); +l l l. + @\fIstd_all()\fP@dumps the contents of the stack (\fId1\fP or \fId2\fP must be in the logmask). + @\fIgdad_all()\fP@dumps the contents of the gda (\fI+1\fP must be in the logmask). + @\fIhpd_all()\fP@dumps the contents of the heap (\fI*1\fP must be in the logmask). +.TE +.LP +These routines can be used anywhere in the program to examine the +contents of memory. +The internal variables allow the +gda and heap to be dumped only once (according to the +corresponding internal variable). +The stack is dumped after each +instruction if the log mask contains d1 or d2; d2 gives a full formatted +dump, d1 produces a listing of the Return Status Blocks only. +An attempt is made to format the stack correctly, based on the shadow +bytes, which identify the Return Status Block. +.LP +Remember to set the correct \fBid\fP in the LOGMASK, and to give +LOG the correct value. +If dumping is needed before the first instruction, then LOG must be +set to 0. +.LP +The dumps of the global data area and the heap are controlled internally by +the \fBid\fPs +1 and *1, respectively; the corresponding logmask entries are set +automatically by setting the GDA and HEAP variables. +.NH 2 +Forking. +.PP +As mentioned earlier, a call to \fIfork()\fP causes an image of the current +program to start running. +To prevent a messy logfile, the child process gets its own logfile +(and message file, tally file, etc.). +These logfiles are distinguished from the parent logfile by a +postfix, e.g., +\fIlogfile_1\fP for the first child, \fIlogfile_2\fP for the second child, +\fIlogfile_1_2\fP for the second child of the first child, etc. +.br +\fINote\fP: the implementation of this feature is shaky; it works for the log +file but should also work for other files and for the names of the logging +variables.
diff --git a/util/int/int.1 b/util/int/int.1 new file mode 100644 index 000000000..37d50d1f6 --- /dev/null +++ b/util/int/int.1 @@ -0,0 +1,200 @@ +.\" Manual page +.\" +.\" $Header$ +.TH INT I +.ad +.SH NAME +int \- Interpreter for EM Machine Language +.SH SYNOPSIS +\fBint\fP [ intargs ] [ emfile [ emargs ] ] +.SH DESCRIPTION +This program interprets the EM machine language and replaces +the EM interpreter written in Pascal that is described in [1]. +The program interprets load files in \fIe.out\fP format (see [1], sec. 10.3). +.LP +\fIEmfile\fP is the name of the load file; if no name is +specified, the default name \fIe.out\fP is used. +The program can handle several word size / pointer size combinations. +The combinations presently supported are 2/2, 2/4 and 4/4. +.LP +\fIEmargs\fP are the arguments for the program being interpreted. +If any arguments are given, then \fIemfile\fP must be present. +.PP +The interpreter can generate diagnostic messages (warnings) about the +interpreted program. +Some of these warnings are given very frequently, +which may result in a large, non-functional message file. +To avoid this behavior, counters keep track of the number of times +a given warning occurs in a given file at a given line number. +The warning will actually be given only when this counter is a power +of 4. +`Logarithmic warning generation' is established in this way. +.PP +\fIInt\fP preempts the highest two file descriptors available, for +diagnostic purposes. +Interpreted programs can use the other file descriptors without +conflict. +.PP +.I "Interpreter parameters" +.br +\fIInt\fP itself accepts the following options, all given as separate flags: +.IP \fB\-d\fP +The program will not be run; a disassembly listing of the program will +be written to standard output instead. +The original names are lost, but the procedure structure is recovered. +.IP \fB\-h\fP\fIN\fP +The maximum size of the heap will be limited to \fIN\fP bytes.
This can be +used to force a heap overflow trap. +.IP \fB\-I\fP\fIN\fP +It is possible to tell \fIint\fP to ignore traps in the range 0-15. +If a trap is ignored, every time the trap would have happened +a warning is generated instead. +The argument \fIN\fP is the trap number, as described in [1], sec. 9. +For ignoring more than one trap, several \fB\-I\fP flags are needed. +.IP \fB\-m\fP\fIfile\fP +The argument \fIfile\fP is the name of a file on which the messages will +appear. +The default file name is \fIint.mess\fP. +.IP \fB\-r\fP\fIN\fP +Determines the size of the Function Return Area. +Default: 2 \(mu pointer size. +.IP \fB\-s\fP\fIN\fP +The maximum size of the stack will be limited to \fIN\fP bytes. This can be +used to force a stack overflow trap. +.IP \fB\-t\fP +If given, a file \fIint.tally\fP will be produced upon program termination. +For each source file, it contains a list of line numbers visited, +with the number of times the line was visited and +the number of EM instructions executed on the line. +.IP \fB\-W\fP\fIN\fP +This option can be used to disable warnings. +The argument \fIN\fP is the number of the warning to be suppressed, +as found in the \fIint\fP documentation [3]. +For disabling more than one warning, several \fB\-W\fP flags are needed. +.PP +.I "The Logging Machine" +.br +The EM machine is monitored continually by a Logging Machine. This logging +machine keeps an instruction count and +can produce a trace of the actions of the EM machine, make readable +dumps of the stack, heap and global data area, and stop the EM machine after a +given instruction number. +The actions of the logging machine are controlled by +its internal variables, the values of which can be set by assignments on the +command line, much like setting macro names in a call of \fImake\fP. +These assignments can be interspersed with the options for the EM machine. 
+.PP +The logging machine has the following internal variables: +.IP \fBLOG\fP=\fIN\fP +Logging will start when the instruction count has reached \fIN\fP. +.IP \fBLOGMASK\fP=\fIstring\fP +The tracing actions are controlled by a log mask; the log mask consists of a +list of pairs of action classes and logging levels. +E.g. \fBLOGMASK\fP=\fIm9\fP means: trace all monitor calls. +The action classes are described fully in [3]. +The default log mask is reasonably suitable. +.IP \fBLOGFILE\fP=\fIstring\fP +The \fIstring\fP is the name of a file on which all logging information is +written. +The default file name is \fIint.log\fP. +.IP \fBSTOP\fP=\fIN\fP +The logging machine stops the EM machine after instruction \fIN\fP. +.PP +Stack dumps can be made after each instruction; they are controlled by the pair +\fBd4\fP in the log mask; gda and heap dumps can only be made after a specific +instruction. +The following internal variables pertain to memory dumps: +.IP \fBGDA\fP=\fIN\fP +The contents of the Global Data Area are dumped after instruction \fIN\fP. The +extent can be adjusted by setting \fBGMIN\fP=\fINmin\fP (default 0) and +\fBGMAX\fP=\fINmax\fP (default HB). +.IP \fBHEAP\fP=\fIN\fP +The contents of the heap are dumped after instruction \fIN\fP. +.IP \fBSTDSIZE\fP=\fIN\fP +The stack dump is restricted to the \fIN\fP topmost bytes. +.IP \fBRAWSTACK\fP=\fIN\fP +Normally the stack dump produced is divided into activation records +separated by formatted dumps of the Return Status Blocks. +If \fIN\fP is non-zero, this dividing and formatting is suppressed, and the +stack is dumped raw. +.PP +Some combinations of variable settings are generally useful and can be +abbreviated: +.IP \fBAT\fP=\fIN\fP +Is an abbreviation of \fBLOG\fP=\fIN\-1\fP \fBSTOP\fP=\fIN+1\fP. +The default log mask applies. +.IP \fBL\fP=\fIstring\fP +Is an abbreviation of \fBLOG\fP=\fI0\fP \fBLOGMASK\fP=\fIstring\fP. 
+E.g., \fBL\fP=\fIm9\fP will log all monitor calls +and \fBL\fP=\fIA\-Z9\fP will log all instructions (give a full trace). +.PP +When the interpreter forks, the child continues logging on a new file named +\fIint.log_1\fP, etc. +In principle it reevaluates the interpreter arguments, now looking for +\fBLOG_1\fP, \fBLOGMASK_1\fP, etc., but this feature has not been fully +implemented. +.PP +.I "Diagnostics" +.br +All diagnostics are written to the message file. +Diagnostics come in three flavors: +.IP \- +(messages): These inform you about NOP instructions, give more information +about incoming signals and display the exit status of the program. +.IP \- +(warnings): These are generated as a result of the checking. +In most cases the diagnostic is self-explanatory. +A complete description of the warnings can be found in the \fIint\fP +documentation [3]. +.IP \- +(fatal errors): This diagnostic is the result of an irrecoverable +error, generally before the program has started: incorrect call of the +interpreter, cannot access file, incorrect format of load file. A few follow +during interpretation: out of memory, uncaught traps, floating point operation +on a version without floating point; +execution stops immediately after the diagnostic is generated. +.PP +Further diagnostics are generated (on \fIstderr\fP) if files cannot +be opened or found. +.SH "SEE ALSO" +e.out(5), ack(1), em22(1), em24(1), em44(1). +.IP [1] +Andrew S. Tanenbaum, Hans van Staveren, Ed G. Keizer and Johan W. Stevenson, +\fIDescription of a Machine Architecture for use with Block +Structured Languages\fP, Informatica rapport IR-81. +.IP [2] +Amsterdam Compiler Kit, reference manual and UNIX manual pages. +.IP [3] +Eddo de Groot, Leo van den Berge, Dick Grune, +\fIThe EM Interpreter\fP. 
+.SH "FILES" +.ta 20n +int.mess contains messages +.br +int.log contains logging info, if requested +.br +int.tally contains tally results, if requested +.br +int.core produced upon fatal error; format provisional +.SH "BUGS" +The monitor calls +.IR mpxcall , +.I ptrace +and +.I profile +have not been implemented. +.br +The maximum number of bytes for rotation is 4. +.br +The UNIX V7 struct tchars is not emulated under System V. +.br +The P and N restrictions on operands are not checked. +.br +The start-up has a quadratic component in the number of procedures in the EM +program. +.SH "AUTHORS" +L.J.A. van den Berge. +.br +E.J. de Groot. +.br +D. Grune -- 2.34.1