Nick Downing downing.nick@gmail.com

2.11BSD source tree modified for cross compilation from x86-64 Linux

Initial commit was an unmodified 2.11BSD source tree taken from a boot tape.
Unpacked from http://www.tuhs.org/Archive/PDP-11/Distributions/ucb/2.11BSD:

    7263558ebfb676b4b9ddacc77da464c5  file7.tar.gz
    77397e6d554361c127592b1fea2d776f  file8.tar.gz

The root of this repository is "/usr/src" on a running 2.11BSD system.
The boot tape file7.tar.gz provides "/usr/src/include" and "/usr/src/sys".
The boot tape file8.tar.gz provides everything else in "/usr/src".

To compile this repository, simply run "./n.sh" (this stands for "Nick").
It will clean the source tree, compile and install a cross toolchain into
"cross", clean again, then compile and install a 2.11BSD filesystem tree
into "stage".

The cross toolchain installed into "cross" consists of the following files:

    cross/bin/ar
    cross/bin/as
    cross/bin/cc
    cross/bin/ld
    cross/bin/nm
    cross/bin/size
    cross/lib/c0
    cross/lib/c1
    cross/lib/c2
    cross/lib/cpp
    cross/usr/bin/lorder
    cross/usr/bin/mkdep
    cross/usr/bin/ranlib
    cross/usr/lib/libvmf.a
    cross/usr/lib/libvmf_p.a
    cross/usr/man/cat1/ar.0
    cross/usr/man/cat1/ld.0
    cross/usr/man/cat1/ranlib.0
    cross/usr/man/cat3/vmf.0
    cross/usr/man/cat5/ar.0
    cross/usr/man/cat5/ranlib.0
    cross/usr/ucb/strcompact
    cross/usr/ucb/symcompact
    cross/usr/ucb/symdump
    cross/usr/ucb/symorder

The cross toolchain is created by modifying the appropriate sources in this
tree to compile under gcc for x86-64 Linux. I created a compatibility header
file called "krcompat.h", copied to various source directories, which
contains definitions for prototypes and varargs functions. So we should have
a common source for the toolchain which can compile under both x86-64 Linux
and 2.11BSD.

I have fixed or suppressed all compiler warnings. A great many of these are
due to the K&R to ANSI conversion; I fixed them by introducing a standard way
of declaring function headers, etc.
Declarations in the original source code like

    somefunc(a, p)
    char *p;
    {
        register b;
        ...
    }

become

    int somefunc(a, p) int a; char *p; {
        register int b;
        ...
    }

and functions which do not return a value have "int" changed (by me) to
"void". Note that "void" is apparently not K&R, it is "extended K&R".
However, I have just used "void", and if this causes any problems later,
I'll change to "VOID", which I can then suppress by means of a compatibility
define in "krcompat.h".

In the course of this, I have changed all function headers to one line, and
I have changed parameters like "char *p, *q;" to "char *p; char *q;". This
is because I use "cproto" and a VERY ROUGH script (not included in the
repository) to convert them automatically. It is also because I prefer the
source that way.

The varargs conversion is pretty simple. The K&R code is first converted to
use <varargs.h> rather than any ad-hoc convention it might have used before.
Then, if __STDC__ is defined, it uses <stdarg.h> instead of <varargs.h>, and
defines the function using an ANSI header rather than K&R. Example in
"lib/ccom/c01.c":

    #ifdef __STDC__
    void werror(char *s, ...)
    #else
    void werror(s, va_alist) char *s; va_dcl
    #endif
    {
        va_list ap;

        if (Wflag)
            return;
        if (filename[0])
            fprintf(stderr, "%s:", filename);
        fprintf(stderr, "%d: warning: ", line);
        va_start(ap, s);
        vfprintf(stderr, s, ap);
        va_end(ap);
        fprintf(stderr, "\n");
    }

Note: With <varargs.h> it should be "va_start(ap)" but I hope the above is
OK.

In many cases the toolchain source code relied on running under 2.11BSD,
e.g. the linker and symbol-related utilities directly manipulate struct exec,
struct nlist, etc., and expect the on-disk format to match the in-memory
format. To fix this I defined macros like OFF_T, INT, UNSIGNED_INT and
whatever else was appropriate, which are the original types (off_t, int,
unsigned int) on 2.11BSD but compatibility types (int32_t, int16_t, uint16_t)
on x86-64 Linux.
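As an illustration only, the type macros described above might be set up
along the following lines. The three macro names and the type mapping come
from the description; the use of #define rather than typedef, the header
choices and the check_target_widths() helper are assumptions made for this
sketch and are not taken from the repository.

```c
#include <assert.h>

#ifdef pdp11
/* Native build: keep the original 2.11BSD types. */
#include <sys/types.h>
#define OFF_T        off_t
#define INT          int
#define UNSIGNED_INT unsigned int
#else
/* Cross build on the host: fixed-width types with PDP-11 sizes. */
#include <stdint.h>
#define OFF_T        int32_t
#define INT          int16_t
#define UNSIGNED_INT uint16_t
#endif

/*
 * Hypothetical sanity check (not in the repository): whichever branch was
 * taken, the macros must have the widths the PDP-11 file formats expect.
 */
static void check_target_widths(void)
{
    assert(sizeof (OFF_T) == 4);
    assert(sizeof (INT) == 2);
    assert(sizeof (UNSIGNED_INT) == 2);
}
```

With definitions like these, a field declared as "OFF_T n_strx;" occupies
four bytes on both systems, so structure sizes computed on the host agree
with those on the target.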
I then routed disk reads and writes through a temporary buffer, e.g.:

    #ifdef pdp11
        fwrite(&stroff, sizeof (OFF_T), 1, fpin);
    #else
        temp[0] = (stroff >> 16) & 0xff;
        temp[1] = (stroff >> 24) & 0xff;
        temp[2] = stroff & 0xff;
        temp[3] = (stroff >> 8) & 0xff;
        fwrite(temp, sizeof (OFF_T), 1, fpin);
    #endif

This handles byte order issues, in particular the PDP-11 convention of
storing a long with the high word first (but storing each word low byte
first). Since x86-64 is little-endian like the PDP-11, unconverted code
still tends to work for 16-bit values, so there may be a few places which
are not fully converted. In particular, the conversion of "ld", "ar" and
"nm" is a bit rough, and I plan to go back and make it similar to
"symcompact" and friends. I used a temporary buffer instead of an in-place
conversion like htons() and friends, because I'm concerned about C's
aliasing rules and gcc's optimizer.

Another issue was the compiler second pass "/lib/c1" when it generates
floating point constants or performs constant folding. The host system uses
IEEE-754, whereas the PDP-11 uses its own floating point format. Since I want
the cross toolchain to generate EXACTLY the same binaries as the traditional
PDP-11 hosted tools, I had to take the floating-point emulation code from
"simh" and put it in "c1".

A further issue was the definition of struct nlist (and others) like this:

    struct nlist {
        union {
            char *n_name;   /* In memory address of symbol name */
            OFF_T n_strx;   /* String table offset (file) */
        } n_un;
        u_char n_type;      /* Type of symbol - see below */
        char n_ovly;        /* Overlay number */
        U_INT n_value;      /* Symbol value */
    };

Unfortunately the n_name pointer breaks things on a 64-bit system because it
is larger than OFF_T and causes sizeof(struct nlist) to be wrong. So I have
made all the client programs that refer to this structure use n_strx
exclusively; n_name is only defined when compiling for PDP-11. This was an
easy change since there is always an associated string table, so we just
offset into it as needed.
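To make the two ideas above concrete, here is a minimal sketch of decoding
one on-disk struct nlist record through a byte buffer and then finding its
name via n_strx. The byte order follows the fwrite() example and the field
order follows the struct shown above. Everything else is hypothetical and
not taken from the repository: struct host_nlist, get_pdp_long(),
put_pdp_long(), decode_nlist(), sym_name() and nlist_selfcheck() are names
invented here, and sym_name() simply assumes that the offset is relative to
the start of whatever buffer the caller passes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side view of a symbol: n_strx only, no n_name. */
struct host_nlist {
    int32_t  n_strx;   /* string table offset */
    uint8_t  n_type;   /* type of symbol */
    int8_t   n_ovly;   /* overlay number */
    uint16_t n_value;  /* symbol value */
};

/* PDP-11 long: high word first, each word stored low byte first. */
static uint32_t get_pdp_long(const unsigned char *p)
{
    return ((uint32_t)p[1] << 24) | ((uint32_t)p[0] << 16) |
           ((uint32_t)p[3] << 8) | (uint32_t)p[2];
}

/* Inverse of get_pdp_long(); same shuffle as the fwrite() example. */
static void put_pdp_long(unsigned char *p, uint32_t v)
{
    p[0] = (v >> 16) & 0xff;
    p[1] = (v >> 24) & 0xff;
    p[2] = v & 0xff;
    p[3] = (v >> 8) & 0xff;
}

/* Decode one 8-byte record: long, byte, byte, little-endian word. */
static void decode_nlist(const unsigned char *p, struct host_nlist *n)
{
    n->n_strx = (int32_t)get_pdp_long(p);
    n->n_type = p[4];
    n->n_ovly = (int8_t)p[5];
    n->n_value = (uint16_t)(p[6] | (p[7] << 8));
}

/* Return the name at offset strx, or NULL if it is out of range or
 * not NUL-terminated inside the buffer. */
static const char *sym_name(const char *strtab, size_t size, int32_t strx)
{
    if (strx < 0 || (size_t)strx >= size)
        return NULL;
    if (memchr(strtab + strx, '\0', size - (size_t)strx) == NULL)
        return NULL;
    return strtab + strx;
}

/* Small self-check exercising the helpers above. */
static void nlist_selfcheck(void)
{
    static const unsigned char rec[8] =
        { 0x34, 0x12, 0x78, 0x56, 0x02, 0x00, 0xcd, 0xab };
    static const char tab[] = "\0_main";
    unsigned char out[4];
    struct host_nlist n;

    decode_nlist(rec, &n);
    assert(n.n_strx == 0x12345678 && n.n_value == 0xabcd);
    put_pdp_long(out, 0x12345678);
    assert(memcmp(out, rec, 4) == 0);
    assert(strcmp(sym_name(tab, sizeof tab, 1), "_main") == 0);
}
```

Going through an unsigned char buffer in both directions avoids casting the
file data to a struct pointer, which is the aliasing concern mentioned
above.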
Since the above conversions tend to increase code bloat, and the PDP-11 tools
are often running on the limit of memory, they do not apply when "pdp11" is
defined, although in theory there should be no need to make this distinction.

Since the host system is very similar to the target system, we can use tools
like "/bin/sort", "/bin/sed" and the GNU "make" tool provided by the host,
although it would also be possible to build cross versions of these tools.
It is easier to use the native tools and work around occasional
incompatibilities. For example, I changed "sort -t/ +1" in a Makefile to
simply "sort -t/" since the comparison of the first field was not going to
change the outcome anyway.

One problem with the host system being similar to the target system is that
when the cross tools include a system header which also exists on the host,
the host wants to provide its own version. This is rather delicate to work
around. For the sake of minimal change I created a subdirectory called
"include" alongside any affected sources. For example "bin/ld/ld.c" includes
such headers, so it gets a directory "bin/ld/include". In this directory
there is a collection of links:

    a.out.h -> ../../../include/a.out.h
    ar.h -> ../../../include/ar.h
    nlist.h -> ../../../include/nlist.h
    ranlib.h -> ../../../include/ranlib.h
    vmf.h -> ../../../include/vmf.h

There is also a further directory "bin/ld/include/sys" containing this link:

    exec.h -> ../../../../sys/h/exec.h

This is a bit fragile since we cannot say whether the host might be trying to
use its own copy of one of these headers deep inside some other system
include file, so it's not really the best solution to the problem. It would
be better to have a define saying where to include the files from, and
perhaps even alternative names like "struct nlist_211bsd" instead of
"struct nlist", but this is rather bloated, and I got around the problem by
doing the above hacks for the moment.
To build the cross toolchain from a common source which can also build the
PDP-11 hosted toolchain, my strategy was to change the Makefiles as little as
possible. The most significant change is to modify "cc" to "${CC}" and so on.
In many cases it was already like this, but I had to introduce aliases for
all of the cross build tools, so "mkdep" becomes "${MKDEP}" and so on. The
"make" tool included in 2.11BSD only provides "CC" and "AS" by default, so
each Makefile is supposed to have lines like "MKDEP=/usr/bin/mkdep", so as
not to break the PDP-11 hosted build system. BUT, I suspect most of these are
missing, since I have not tested the PDP-11 hosted build system yet. I will
fix it later.

So the "make" commands to build the cross toolchain look something like this:

    make CC="cc -Iinclude -Wall -Wno-char-subscripts -Wno-deprecated-declarations -Wno-format -Wno-maybe-uninitialized -Wno-parentheses -Wno-unused-result" CROSSDIR="/home/nick/src/211bsd.git/cross" STAGEDIR="/home/nick/src/211bsd.git/stage" SEPFLAG=
    make DESTDIR="/home/nick/src/211bsd.git/stage" install

The above example is taken from building "cross/bin/cc". Since it needs to
know both CROSSDIR and STAGEDIR, it is a good example of how I handle
directories.
The compiled "bin/cc" expects to find things in various places which are hard
coded into the source, so I changed the Makefile to pass the CROSSDIR and the
STAGEDIR into the compilation using defines, giving a compilation command
like:

    cc -Iinclude -Wall -Wno-char-subscripts -Wno-deprecated-declarations -Wno-format -Wno-maybe-uninitialized -Wno-parentheses -Wno-unused-result -DCROSSDIR="\"/home/nick/src/211bsd.git/cross\"" -DSTAGEDIR="\"/home/nick/src/211bsd.git/stage\"" -c -o cc.o cc.c

Inside the source file "bin/cc/cc.c", I've adjusted the hard coded paths
like:

    char *cpp = CROSSDIR "/lib/cpp";
    char *ccom = CROSSDIR "/lib/c0";
    char *ccom1 = CROSSDIR "/lib/c1";
    char *c2 = CROSSDIR "/lib/c2";
    char *as = CROSSDIR "/bin/as";
    char *ld = CROSSDIR "/bin/ld";
    char *crt0 = STAGEDIR "/lib/crt0.o";

Each path is built at compile time by C's concatenation of adjacent string
literals. Because crt0.o is taken from STAGEDIR, we have to build the C
library into STAGEDIR before we can link anything.

The "make" commands are hard to remember and inconvenient to type, and the
top-level "n.sh" command is overkill since it builds everything and cleans
instead of just rebuilding what has changed. So I have put a smaller script
"n.sh" in each directory I've visited, which runs the correct commands and
installs the result (if it's a tool, it gets built as a cross tool and
installed into CROSSDIR, otherwise it gets built for the target and installed
into STAGEDIR). So it's easy to add debugging statements, compile with -g,
etc.

Most of the development work so far has gone into building and debugging the
cross toolchain. We can't yet build all of the target, but we can build this:

    stage/lib/crt0.o
    stage/lib/libc.a
    stage/lib/mcrt0.o
    stage/include (copied from the source tree using "make install")
    stage/unix
    stage/usr/lib/libc_p.a
    stage/usr/lib/libkern.a

I have done some ad-hoc tests, and these files work correctly when copied to
an existing 2.11BSD system under "simh". For instance, I can compile and run
the "adventure" game using the above C startup code and libraries. I can boot
the kernel.
I have verified that the kernel and all its object files are binary
identical to what the PDP-11 hosted build produces. I do not yet have a way
of making "ar" and "ranlib" produce a binary identical copy of a library, so
I have not definitively verified that the libraries are the same, but I have
seen no problems so far.