--- /dev/null
+Star Blazer disassembly notes
+
+- The original file I worked with was cracked by Mr. Xerox and obtained from:
+ https://archive.org/download/a2_Star_Blazer_1981_Star_Craft/Star_Blazer_1981_Star_Craft.do
+
+- The crack is a bit crude: basically it's a memory dump of how memory looked
+ when the original loader was about to transfer control to the game code.
+
+- Since I did not want to disassemble a lot of junk, I created a different crack
+ loader; it is similar to the original but only loads the essential parts.
+
+- For details about this process see the files in /loader:
+
+ - The program /loader/dejunk.py overwrites the parts of the original binary
+ that I think are junk; see the notes in the code about what each address
+ range means.
+
+ - The files /loader/star_blazer_dejunked*.bin are dejunked with different
+ fill values; I run both of them to check that the dejunking did not break
+ anything.
+
+ - The program /loader/hires_loader.py extracts the non-junk parts of the
+ dejunked binary, concatenates them, and then splits them into parts for my
+ crack loader. (It works with either the original or the dejunked binary,
+ but I prefer the latter because it makes the disassembly cleaner: in one
+ region the alignment padding was not zeroed out in the original.) It then
+ installs my crack loader and composes the parts ready for loading.
+
+ - The binary /loader/hires_loader.bin is my short machine code program, which
+ lives in the hires screen at $2000. Its function is to copy the tail of the
+ program from hires screen memory to its final location at approx $7e00.
+ It also performs some initialization of the zero page, registers and so on.
+ I haven't included a separate source file, as it gets disassembled later on.
+
+ - The result of the re-cracking is /loader/star_blazer_hires_loader.bin which
+ can be played the same as the original binary but is significantly smaller.
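The dejunking step can be sketched as below. This is a minimal illustration, not the code in /loader/dejunk.py. It assumes the junk is described as a list of half-open address ranges, and the names `dejunk` and `differing_addresses`, as well as the example range, are hypothetical. Dejunking the same image with two different fill values means that any code which still reads a "junk" byte sees different data in the two builds, and running both variants is meant to expose exactly that.

```python
def dejunk(image: bytes, base: int, junk_ranges, fill: int) -> bytes:
    """Return a copy of `image` (which loads at address `base`) with every
    half-open address range [start, end) in `junk_ranges` set to `fill`."""
    out = bytearray(image)
    for start, end in junk_ranges:
        if not base <= start <= end <= base + len(image):
            raise ValueError(f"range {start:#x}..{end:#x} is outside the image")
        out[start - base:end - base] = bytes([fill]) * (end - start)
    return bytes(out)


def differing_addresses(a: bytes, b: bytes, base: int) -> list:
    """Addresses at which two equal-length images (both loading at `base`) differ."""
    if len(a) != len(b):
        raise ValueError("images must be the same length")
    return [base + i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    image = bytes(range(16))          # stand-in for the real binary
    junk = [(0x804, 0x808)]           # hypothetical junk range
    zeros = dejunk(image, 0x800, junk, 0x00)
    ones = dejunk(image, 0x800, junk, 0xFF)
    # Only the junk range should differ between the two builds.
    print([hex(a) for a in differing_addresses(zeros, ones, 0x800)])
    # ['0x804', '0x805', '0x806', '0x807']
```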
+
+- Next, see the disassembly in /disasm. In particular, /disasm/star_blazer.asm
+ is an ASxxxx source that assembles to /loader/star_blazer_hires_loader.bin.
+ It contains switches at the top of the file to control my game modifications,
+ and if these are left alone (ALIGN = 1, SHAPE = 1) it produces the original game.
+
+- The ASxxxx assembler used for this project is by Alan R. Baldwin and has a
+ home page at https://shop-pdp.net/ashtml/asxxxx.php. Use "as6500" for 6502.
+
+ - This assembler does not use the most conventional syntax: constants are
+ C-style, e.g. "0x2000" not "$2000", and indirect addressing modes use square
+ brackets, e.g. "lda [0x2000],y" not "lda ($2000),y". The C-style constants
+ are good for projects that combine C and assembly code, since you can use
+ common include files. The square-bracket syntax is less convenient; when I
+ contacted the author, he told me it is for historical reasons and promised
+ to fix it.
+
+ - Zero-page references are indicated by "*", e.g. "lda *0x20,x" produces the
+ 2-byte instruction, whereas "lda 0x20,x" should produce the 3-byte version.
+ (I say *should* because I recently discovered an inconsistency in this
+ process, which I will investigate later and fix in the assembler.)
+ The assembler *can* generate zero-page references automatically if you use
+ the ".dpage" pseudo-op in the ".area zpage" section, but I haven't done so
+ in case there are places where the reassembled binary doesn't match the
+ original. Probably there aren't, but having control via "*" is quite good.
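The Python sketch below makes the 2-byte/3-byte distinction concrete by encoding both forms of "lda ...,x": opcode 0xb5 for zero page,X and 0xbd for absolute,X on the 6502. It also shows why the choice is not only about size. Zero-page indexing wraps around within page zero and absolute indexing does not, so the two forms can read different addresses for the same operand. The function names are illustrative only and are not part of ASxxxx.

```python
LDA_ZP_X = 0xB5   # lda *0x20,x  (2 bytes: opcode, zero-page address)
LDA_ABS_X = 0xBD  # lda 0x20,x   (3 bytes: opcode, address lo, address hi)


def encode_lda_x(addr: int, zero_page: bool) -> bytes:
    """Encode "lda addr,x" in zero-page or absolute form."""
    if zero_page:
        if not 0 <= addr <= 0xFF:
            raise ValueError("zero-page operand must be 0x00..0xff")
        return bytes([LDA_ZP_X, addr])
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("absolute operand must be 0x0000..0xffff")
    return bytes([LDA_ABS_X, addr & 0xFF, addr >> 8])  # little-endian operand


def effective_address(addr: int, x: int, zero_page: bool) -> int:
    """Address actually read by "lda addr,x" for a given X register value."""
    if zero_page:
        return (addr + x) & 0xFF      # wraps within page zero
    return (addr + x) & 0xFFFF        # may cross into the next page


print(encode_lda_x(0x20, True).hex())             # b520
print(encode_lda_x(0x20, False).hex())            # bd2000
print(hex(effective_address(0xF0, 0x20, True)))   # 0x10
print(hex(effective_address(0xF0, 0x20, False)))  # 0x110
```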
+
+- I have used a procedure like this to produce the disassembly:
+
+ - Run /disasm/load.py to perform the relocation that is normally done by the
+ hires loader, and output mem.bin, a straight memory dump (with no DOS 3.3
+ header) to be loaded at 0x9fd. This gives the disassembler a clearer picture
+ of what's where, but it is not runnable and does not remove the need for
+ the loader (the loader is also responsible for other initialization).
+
+ - Run my disassembler, which is not included here as it's beyond the scope
+ of this document, passing it a runtime trace file (also not included here),
+ and a manual text file that gives areas and names/sizes of known symbols.
+
+ - The manual text file /disasm/star_blazer.txt is included and could form the
+ basis of a SourceGen or similar project; however, it is pretty terse and
+ does not include all of the information inferred by the disassembler from
+ the trace file. I am working on a way to make the disassembler output this.
+ It would be relatively easy to make a SourceGen project from the asm output
+ of the disassembler, but it would be easier still if the process were automated.
+
+ - Run my shape extractor and compiler. This is an optional step, since the
+ original .db statements for the shapes are still in /disasm/star_blazer.asm
+ (used if you compile with SHAPE = 0), but extracting and recompiling the
+ shapes gives you the opportunity to edit them. I haven't included sources
+ for this process, which is complex, but I do include /disasm/shape0.png for
+ viewing.
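The relocation that /disasm/load.py performs (and that the hires loader performs at run time) can be sketched as follows. This is a hypothetical reconstruction based only on the description in these notes, not the actual script. It assumes the following layout: the packed image loads at 0x9fd, the loader occupies the start of the hires screen at 0x2000, and the rest of the hires screen up to 0x4000 holds the tail of the program, whose final home is immediately after the end of the packed image. The loader size is a parameter because these notes do not state it.

```python
BASE = 0x9FD       # load address of the image (matches "-b text=0x9fd")
HIRES = 0x2000     # hires page 1, where the loader lives
HIRES_END = 0x4000


def unpack(packed: bytes, loader_size: int) -> bytes:
    """Return a flat memory dump starting at BASE, with the tail copied from
    the hires screen to the end of the image (hypothetical layout)."""
    if not 0 <= loader_size <= HIRES_END - HIRES:
        raise ValueError("loader does not fit in the hires screen")
    if len(packed) < HIRES_END - BASE:
        raise ValueError("packed image does not reach the end of the hires screen")
    tail_len = (HIRES_END - HIRES) - loader_size
    src = HIRES + loader_size - BASE          # file offset of the tail
    mem = bytearray(packed)
    mem += packed[src:src + tail_len]         # tail's final address is BASE + len(packed)
    return bytes(mem)
```

The copy left behind in the hires screen is not cleared in this sketch. What the real loader does with it, along with its other initialization, is outside the scope of the example.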
+
+- To regenerate the game, use steps like this:
+
+ as6500 -l -o star_blazer.asm
+ aslink -n -m -u -i -b zpage=0 -b udata0=0x200 -b udata1=0x400 -b text=0x9fd -b loader=0x2000 -b data0=0x4000 star_blazer.ihx star_blazer.rel
+ ./pack.py star_blazer.ihx star_blazer_hires_loader.bin
+
+ - The program /disasm/pack.py is similar to /loader/hires_loader.py; it moves
+ the sections around for loading. Basically, the idea is to move the last
+ 0x2000 bytes of the game binary (actually 0x2000 less the loader size) into
+ the hires screen, where they will be loaded by BLOAD, and then relocate them
+ at run time. This saves BLOAD from having to load 8 kbytes of "gap" at 0x2000.
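The packing step described above can be sketched in the same spirit. Again, this is a hypothetical reconstruction, not /disasm/pack.py itself. The sketch takes a flat image starting at 0x9fd in which 0x2000..0x3fff is an unused gap. It writes the loader at 0x2000, moves the last (0x2000 - loader size) bytes of the image into the rest of the hires screen, and truncates the image accordingly.

```python
BASE = 0x9FD       # load address of the image
HIRES = 0x2000     # start of hires page 1
HIRES_END = 0x4000


def pack(mem: bytes, loader: bytes) -> bytes:
    """Fill the hires-screen gap with the loader plus the tail of `mem`,
    so that BLOAD does not have to load a gap (hypothetical layout)."""
    tail_len = (HIRES_END - HIRES) - len(loader)
    if tail_len < 0:
        raise ValueError("loader is larger than the hires screen")
    if len(mem) < (HIRES_END - BASE) + tail_len:
        raise ValueError("image too short to have a tail beyond 0x4000")
    out = bytearray(mem[:len(mem) - tail_len])        # drop the tail from the end
    lo = HIRES - BASE                                 # file offset of 0x2000
    out[lo:lo + len(loader)] = loader                 # loader at 0x2000
    out[lo + len(loader):HIRES_END - BASE] = mem[len(mem) - tail_len:]  # tail after it
    return bytes(out)
```

The packed file is shorter than the flat image by exactly the tail length. Copying the tail from 0x2000 + loader size back to the end of the file restores the original layout.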
+
+- The disassembly is far from complete, as I have not figured out all of the
+ game logic, and there may be issues with identifying all relocatable symbols.
+
+ - Basically it is relocatable, but I have noticed that it does not always
+ proceed to the next level: it sometimes gets stuck in a limbo mode between
+ missions, where you can fly around and shoot but there are no baddies.
+
+ - I fixed a similar problem, which turned out to be a table referenced only by
+ the high byte of its address: I changed something like "lda #0xNN" to "lda #>SYMBOL".
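The sketch below shows why a hard-coded constant breaks relocation, and offers a crude way one might search for other candidates. The representation of immediate operands as (pc, value) pairs is hypothetical, as are the table names and addresses. The search flags only immediates equal to the page number of a page-aligned table, so it will also report harmless constants that happen to match.

```python
def lo(addr: int) -> int:
    return addr & 0xFF          # what "#<SYMBOL" assembles to


def hi(addr: int) -> int:
    return (addr >> 8) & 0xFF   # what "#>SYMBOL" assembles to


def high_byte_candidates(immediates, tables):
    """immediates: iterable of (pc, value) for instructions like "lda #0xNN".
    tables: dict of name -> base address.
    Return (pc, value, name) for each immediate equal to the high byte of a
    page-aligned table, i.e. a possible "lda #>name" in disguise."""
    pages = {hi(addr): name for name, addr in tables.items() if lo(addr) == 0}
    return [(pc, v, pages[v]) for pc, v in immediates if v in pages]


# Hypothetical example: a table at 0x6e00 moves to 0x6f00 after an edit.
old, new = 0x6E00, 0x6F00
print(hex(hi(old)), hex(hi(new)))  # 0x6e 0x6f: a literal "lda #0x6e" would now be wrong
print(high_byte_candidates([(0x1000, 0x6E), (0x1002, 0x05)], {"some_table": old}))
```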
+
+- I have a reasonably good understanding of the game's data structures, its
+ graphics package and its mathematics routines. I do not fully understand the
+ game physics (which was a major reason to do the disassembly) but I am quite
+ close to it, as I located things like the position and velocity of objects,
+ the angle of a missile, and even routines that look like missile homing.
+
+- I discovered the following general principles about the engine:
+
+ - There are 0x100 shapes and (I think) 0x70 objects. I only have a tentative
+ understanding of the objects (see discussion of game microcode further on)
+ so I haven't included this yet. But essentially each object has a purpose,
+ e.g. I think 0x20..0x27 are stars in the star-field background, 0x41..0x43
+ are trees or cactuses, etc. The mapping of objects to shapes varies, but
+ within some limits, e.g. object 0x41 is shape 0x78 (tree) or 0x79 (cactus).
+ The maximum number of instances of an object on screen is dictated by the
+ slots assigned to it among the 0x70 objects, e.g. I think there can only be
+ up to 3 trees/cactuses.
+
+ - Animation works by drawing shapes in "or" mode to make them appear, then
+ drawing them in "and-not" mode to make them disappear. This leaves a hole
+ in the screen, where any underlying objects are not visible after erasure.
+
+ - When drawing, the engine computes a number of things, such as the shape
+ address to use, the x coordinate mod 7 and so on. These are stored in the
+ object array, and reused (i.e. not recomputed) when it erases the shape.
+
+ - It seems to move the objects one at a time, relying on the fact that it
+ will soon re-draw an underlying object that was unintentionally erased.
+ You can see an artifact of this where a pair of objects scrolls along
+ the ground in a fixed relationship to each other: one of them may be drawn
+ with a "bite" out of it, corresponding to the previous position of the
+ other. The game uses only hires screen page 1, i.e. no double buffering.
+
+ - The playfield is logically 140 units wide (each is a pair of HGR pixels)
+ and (from memory) 160 units high, the remaining 32 lines being used for
+ the score, the ground, and the various displays like the current mission.
+
+ - Coordinates are kept as (x, y) bytes where 0x80 is the centre of the
+ screen (I think) and so the playfield extends somewhat in each direction
+ beyond the screen. Shapes are clipped if they are drawn partially off the
+ screen. Action can happen off-screen, e.g. a bomb hits a target and you
+ complete the mission, or a missile curves off-screen and comes back on.
+
+ - Velocities are kept with more precision, possibly 16 or 24 bits. There is
+ some complicated logic with 4-bit shifts, which may be to save on storage.
+
+ - I would like to better understand how it generates terrain, but I
+ suspect that it's partly controlled by the extended playfield,
+ e.g. a tree scrolls off the visible screen and continues to scroll to the
+ left invisibly, until it hits the left edge of the playfield, whereupon
+ it is immediately regenerated at a new invisible location somewhere in
+ the invisible right portion of the playfield. This theory is supported
+ by fields in "struct object" and routines that I found whose job is to
+ randomize the position and velocity of an object within given x and y
+ bounds per object. Quite a bit of the object's personality and generation
+ behaviour can be controlled with just these fields, e.g. ground-based
+ objects have the y-limits set the same, so that their y isn't randomized.
+
+ - There are two basic kinds of shapes, the shiftable kind which can be drawn
+ at any x-position on the screen, and the non-shiftable kind which can only
+ be drawn at byte-aligned positions. The shiftable kind is stored with 7
+ pre-shifted shapes, and the "struct shape" contains a pointer to the middle
+ one of these shapes, with indexing by up to +/- 3 * the shape size. This
+ saves time and code, since the index is multiplied by at most 3 (in either
+ direction) rather than by up to 6.
+
+ - Shiftable shapes are drawn only at even positions, due to the 140-unit
+ logical screen width, but they still require 7 pre-shifted shapes, since an
+ even screen position can fall at any of the 7 pixel positions within a
+ 7-pixel screen byte.
+
+ - The first 8 shapes correspond to individual pixels (really pixel pairs)
+ in different colours, i.e. the 8 HGR colours. These are drawn with a
+ special routine. As an experiment, I tried replacing these with ordinary
+ shapes and commenting out the special routine (PIXEL_SHAPE = 1), which worked.
+
+ - Each scan line of each shape is either hi-bit clear (uses black, white,
+ green and purple) or set (uses black, white, blue and orange). The hi-bit
+ never changes within a scan line. When rendering, the hi-bit is "ored" into
+ the screen like any other bit, so blue and orange take precedence.
+
+ - Each shape has a dimension in logical units which defines its collision
+ rectangle, as well as a dimension in physical units (bytes) which defines
+ its drawn rectangle. In general the physical dimension is predictable from
+ the logical dimension, and this means that in some cases padding is drawn
+ on-screen so that the logical dimension is as the designer intended.
+
+ - Non-shiftable shapes are generally drawn by replacing the previous screen
+ memory contents rather than by "oring". The routine that does this is called
+ "draw_misc" in the disassembly, as it does miscellaneous parts like titles,
+ scoring etc. There is a data table which I haven't fully decoded but which
+ contains 16-byte entries, describing (I think) strings to draw and where.
+ The physical (byte) width and drawn rectangle are critical to this process:
+ I discovered that if you change them, the text gets all messed up.
+
+ - The "draw_misc" routine also has the ability to mask the shape data with
+ an alternating pair of masks, and this is used to draw the "STAR BLAZER"
+ title screen with colour cycling. Such shape data is stored with hi-bit
+ set, in order that the hi-bit can be selectively masked off as required.
+
+ - The shape table contains some blank rectangles, which obviously would not
+ have any effect when drawn in "or" mode. I discovered that these are used
+ for erasing parts of the screen when drawn with "draw_misc". Their width
+ is important, e.g. a blank rectangle erases "HIGH" before drawing "SCORE"
+ and it then seems to advance by the width of the rectangle before drawing "SCORE".
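Several of the points above can be illustrated with a small Python model: logical units of two HGR pixels, seven pre-shifted copies addressed from the middle one, and drawing with "or" then erasing with "and-not". It is a sketch under stated assumptions, not the game's code. In particular, it assumes the copy for bit shift s sits at middle + (s - 3) * size, which these notes do not confirm. It also models the screen as a flat bytearray rather than the interleaved HGR layout.

```python
def locate(x: int):
    """Map a logical x (0..139, one unit = 2 HGR pixels) to
    (screen byte column, bit shift 0..6 within that byte)."""
    if not 0 <= x < 140:
        raise ValueError("logical x out of range")
    pixel = 2 * x                  # HGR pixel 0..278
    return pixel // 7, pixel % 7   # 7 pixels per screen byte


def preshifted_address(middle: int, size: int, shift: int) -> int:
    """Address of the pre-shifted copy for `shift`, ASSUMING copy k = shift - 3
    lies at middle + k * size (k in -3..3)."""
    return middle + (shift - 3) * size


def draw_or(screen: bytearray, col: int, data: bytes) -> None:
    """Make a shape appear by or-ing its bytes into the screen."""
    for i, b in enumerate(data):
        screen[col + i] |= b


def erase_and_not(screen: bytearray, col: int, data: bytes) -> None:
    """Make a shape disappear by and-ing the screen with its complement.
    Any bits of an underlying object that overlap the shape are lost too."""
    for i, b in enumerate(data):
        screen[col + i] &= ~b & 0xFF


# Even logical positions still hit all 7 shifts: 0, 2, 4, 6, 1, 3, 5.
print([locate(x)[1] for x in range(7)])

screen = bytearray(2)
draw_or(screen, 0, b"\x0f")        # background object
draw_or(screen, 0, b"\x3c")        # moving object drawn over it
erase_and_not(screen, 0, b"\x3c")  # moving object erased
print(hex(screen[0]))              # 0x3: the background has a "bite" out of it
```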
+
+- I discovered the following general principles about the data structures:
+
+ - There is essentially a "struct shape" and "struct object" containing the
+ variables that control a shape or an object, but they are implemented as a
+ separate array (indexed by shape or object number) for each field of the
+ struct, including lo and hi of pointers. This is quite normal in 6502 code.
+
+ - Interestingly the "struct object" seems to have what are essentially sub-
+ classes, because some of the arrays do not implement the entire 0x70-entry
+ range, so you see code like "lda object_XXX - 0x40,x", which means the
+ XXX field of "struct object" is only stored for e.g. objects 0x40..0x6f.
+
+ - In the disassembly, most lines which use an addressing mode like "NNNN,x"
+ are annotated in the comment field with a range like "x=40..6f". My custom
+ disassembler has extracted this information from a trace of the running game,
+ in which I attempted to exercise at least some of the game levels/features.
+ However, the range is only what it has *seen* and might be larger in reality.
+
+ - The disassembler uses the information about locations accessed by indexing
+ instructions to build a partition of the data space into separate arrays.
+ It is independent of the base address that happened to be used for access.
+ Overlapping regions are merged, as it assumes they must be the same array.
+
+ - Interestingly, I found a few cases of what I think are game bugs, where the
+ author did not anticipate his access overrunning into a neighbouring array,
+ although it's possible that it was intentional and I didn't understand it.
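The range annotation and merging described above might be sketched as follows. This illustrates the idea only and is not the author's disassembler. It assumes trace records of the form (pc, base, index) for each execution of an indexed instruction, and the function names are hypothetical. Each instruction contributes the address range it was seen to touch, and overlapping ranges are merged into one array. Ranges that merely meet end-to-end are kept separate, since adjacent arrays are common.

```python
from collections import defaultdict


def index_ranges(trace):
    """trace: iterable of (pc, base, index) records from indexed instructions.
    Return {pc: (base, lowest index seen, highest index seen)}."""
    seen = defaultdict(list)
    for pc, base, index in trace:
        seen[pc].append((base, index))
    return {pc: (recs[0][0], min(i for _, i in recs), max(i for _, i in recs))
            for pc, recs in seen.items()}


def annotation(lo_idx: int, hi_idx: int) -> str:
    """Comment text in the style used in the disassembly, e.g. "x=40..6f"."""
    return f"x={lo_idx:02x}..{hi_idx:02x}"


def partition(ranges):
    """Merge the address ranges touched by each instruction into arrays.
    Overlapping ranges are assumed to belong to the same array."""
    spans = sorted((base + lo, base + hi) for base, lo, hi in ranges.values())
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:   # overlaps the previous array
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(span) for span in merged]


# Hypothetical trace: "lda object_xxx - 0x40,x" with object_xxx = 0x1000.
trace = [(0x6000, 0x0FC0, x) for x in range(0x40, 0x70)]
trace += [(0x6010, 0x1010, x) for x in range(6)]     # another view of the same array
trace += [(0x6020, 0x1030, x) for x in range(16)]    # the next array along
r = index_ranges(trace)
print(annotation(*r[0x6000][1:]))                    # x=40..6f
print([(hex(a), hex(b)) for a, b in partition(r)])   # [('0x1000', '0x102f'), ('0x1030', '0x103f')]
```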
+
+- I do not understand the scoring well, or how the game proceeds through the
+ missions, but I did locate the important variables, so it should be easy to
+ figure them out. This hasn't been my major priority, which is why I haven't yet.
+
+- What I am presently attacking is understanding the gameplay at a finer
+ level, in particular the collision detection, and what happens for various
+ kinds of collisions (figuring this out will also provide insights into the
+ scoring and how it proceeds through the missions, hence I tackle this first).
+
+ - In an earlier version of the disassembly I had located routines for things
+ like intersection of collision rectangles, but I deleted these symbols from
+ the present disassembly as my naming was based on a slightly earlier way of
+ thinking. It would be relatively easy to re-find and re-annotate this code.
+
+ - It turns out that the collision code is quite integrated into the rest of
+ the game logic. The game seems to use a kind of internal microcode, which
+ consists of zero-terminated lists of bytes, and basically each game object
+ has various microcode routines, and many of the microcode bytes seem to be
+ the indices of other game objects that it needs to collision-test against.
+
+ - There are also other bytes such as 0xf0, which I think are not indices of
+ game objects, but rather, microcode commands, and you can see "cmp #0xf0"
+ and similar comparison chains throughout the code to implement these. I am
+ not sure if different kinds of collisions are implemented by different 0xfN
+ commands or by multiple object-indexed microcode tables or a combination.
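Under the interpretation above, walking one of these microcode lists might look like the following sketch. Everything here is an assumption made for illustration:

- a list is a run of bytes ended by 0x00;
- bytes below 0x70 are object indices and bytes 0xf0..0xff are commands;
- collision rectangles are (x, y, width, height) in logical units, with (x, y) at a corner.

The function names are hypothetical.

```python
NUM_OBJECTS = 0x70   # tentative object count from these notes
COMMAND_MIN = 0xF0   # bytes from here up are assumed to be commands


def parse_microcode(mem: bytes, start: int):
    """Decode one zero-terminated list beginning at offset `start`.
    Return (items, offset just past the terminator)."""
    items = []
    pos = start
    while True:
        if pos >= len(mem):
            raise ValueError("microcode list is not terminated")
        b = mem[pos]
        pos += 1
        if b == 0x00:
            return items, pos
        if b >= COMMAND_MIN:
            items.append(("command", b))
        elif b < NUM_OBJECTS:
            items.append(("object", b))
        else:
            items.append(("unknown", b))


def rects_overlap(a, b) -> bool:
    """Axis-aligned overlap test for (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


items, nxt = parse_microcode(bytes([0x41, 0x42, 0xF0, 0x00, 0x20, 0x00]), 0)
print(items, nxt)
```

Under this reading, object 0 could never appear in a list, since 0x00 is the terminator.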
+
+- I am making the disassembly available in its present state (warts and all)
+ so that others can pick it up if they want to and progress things. In the
+ present .zip I have omitted a lot of my files to keep it to a manageable
+ package, and even then, there is still a lot to take in (hires loader, etc).
+
+- My complete work directory with my emulator, shape extractor and compiler,
+ etc, is available at the following git repository on one of my servers:
+
+ https://git.ndcode.org/public/star_disasm.git
+
+ - Unfortunately the gitweb viewer on my server is broken. I think people are
+ hitting it from the Internet and causing it to crash and restart, and then
+ systemd is not letting it restart after a while. A git client still works.
+
+ - As I am not really ready to make a proper release, I haven't bothered with
+ LICENSE files and such. However, I intend to release my part of the work
+ (not the copyrighted material obviously) under the MIT license. This includes
+ the disassembler, the tracing infrastructure, etc. It's quite sophisticated
+ so I also considered GPLv2, but overall I prefer a more permissive license.
+
+ - It is a work in progress, and I am not all that happy with how it handles
+ the shape editing and various other things. The disassembler also does not
+ have good support for the control file or the ability to add comments or
+ override the operand fields in instructions, and so whilst it can do a lot
+ automatically, it's hard to deal with the case where it gets things wrong.
+ It also does not have good support for immediate operands yet, although it
+ handles all the other addressing modes intelligently. I plan to add an enum
+ feature so that it can more readably decompile constants and the microcode.
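One possible shape for the planned enum feature is sketched below. The disassembler is given a table of names per operand kind and uses it when rendering immediate operands, falling back to hex otherwise. It also emits matching symbol assignments in the `symbol = value` form for the top of the source. The name UCODE_F0 and the kind label "ucode_cmd" are hypothetical placeholders, since the meanings of the microcode commands are not yet known.

```python
ENUMS = {
    # Hypothetical: operand kind -> {value: name}
    "ucode_cmd": {0xF0: "UCODE_F0"},
}


def render_immediate(mnemonic: str, value: int, kind=None) -> str:
    """Render e.g. "cmp #UCODE_F0" if the value has a name, else "cmp #0xf0"."""
    name = ENUMS.get(kind, {}).get(value)
    return f"{mnemonic} #{name}" if name else f"{mnemonic} #0x{value:02x}"


def enum_definitions(kind: str) -> list:
    """Symbol assignments for one enum, for the top of the generated source."""
    return [f"{name} = 0x{value:02x}" for value, name in sorted(ENUMS[kind].items())]


print("\n".join(enum_definitions("ucode_cmd")))   # UCODE_F0 = 0xf0
print(render_immediate("cmp", 0xF0, "ucode_cmd"))  # cmp #UCODE_F0
print(render_immediate("cmp", 0xF0))               # cmp #0xf0
```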
+
+- I will not be able to work on the project for a bit, so please enjoy for now.
+
+Nick Downing
+nick@ndcode.org