readme.txt

   1 Star Blazer disassembly notes
   2
   3 - The original file I worked with is cracked by Mr. Xerox and obtained from:
   4   https://archive.org/download/a2_Star_Blazer_1981_Star_Craft/Star_Blazer_1981_Star_Craft.do
   5
   6 - The crack is a bit crude, basically it's a memory dump of how things looked
   7   when the original loader was going to transfer control to the game code.
   8
   9 - Since I did not want to disassemble a lot of junk I created a different crack
  10   loader, it is similar to the original but it only loads the essential parts.
  11
  12 - For details about this process see the files in /loader:
  13
  14   - The program /loader/dejunk.py overwrites parts of the original binary that
  15     I think are junk, see the notes in the code about address range meanings.
  16
  17   - The files /loader/star_blazer_dejunked*.bin are dejunked with different
  18     fill values. I run both to check that the dejunking did not break the game.
  19
  20   - The program /loader/hires_loader.py extracts the non-junk parts of the
  21     dejunked binary (it works with the original or dejunked binary, but I
  22     prefer the latter, since it makes the disassembly cleaner given that there
  23     is one region where the alignment padding was not zeroed out originally),
  24     concatenates them, and then splits them into parts for my crack loader.
  25     It then installs my crack loader and composes the parts ready for loading.
  26
  27   - The binary /loader/hires_loader.bin is my short machine code program which
  28     lives in the hires screen at $2000 and its function is to copy the tail of
  29     the program from hires screen memory to the tail location at approx $7e00.
  30     It also contains some initializations of zero page, registers and so forth.
  31     I haven't included a separate source file as it gets disassembled later on.
  32
  33   - The result of the re-cracking is /loader/star_blazer_hires_loader.bin which
  34     can be played the same as the original binary but is significantly smaller.
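
A minimal sketch of the dejunking idea described above, not the actual /loader/dejunk.py. The function names, the tiny 16-byte "binary" and the junk range are all hypothetical. It overwrites suspected junk with a fill value, then shows that two differently filled copies differ only inside the junk ranges. That is why running both is a useful check: if the game behaves the same with either fill, nothing it relies on was overwritten.

```python
def dejunk(image: bytes, junk_ranges, fill: int) -> bytes:
    """Return a copy of image with each junk range overwritten by fill."""
    out = bytearray(image)
    for start, end in junk_ranges:  # half-open file offsets (hypothetical)
        out[start:end] = bytes([fill]) * (end - start)
    return bytes(out)


def differing_offsets(a: bytes, b: bytes):
    """List the offsets at which two equal-length images differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical stand-in for the real binary, with one suspected junk range.
original = bytes(range(16))
junk_ranges = [(4, 8)]

with_zero = dejunk(original, junk_ranges, 0x00)
with_ff = dejunk(original, junk_ranges, 0xFF)

# The two variants should differ only where junk was overwritten.
print(differing_offsets(with_zero, with_ff))
```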
  35
  36 - Then see the disassembly in /disasm, in particular /disasm/star_blazer.asm
  37   is an ASxxxx source that assembles to /loader/star_blazer_hires_loader.bin.
  38   It contains switches at the top of the file to control my game modifications
  39   and if left alone (ALIGN = 1, SHAPE = 1) it will produce the original game.
  40
  41 - The ASxxxx assembler used for this project is by Alan R. Baldwin and has a
  42   home page at https://shop-pdp.net/ashtml/asxxxx.php. Use "as6500" for 6502.
  43
  44   - This assembler does not use the most conventional syntax since constants
  45     are C-style, e.g. "0x2000" not "$2000", and addressing modes use square
  46     brackets, e.g. "lda [0x2000],y" not "lda ($2000),y". The C-style constants
  47     are good for projects that combine C and assembly code, since you can use
  48     common include files. The square bracket syntax is not good and I contacted
  49     the author who told me it is for historical reasons and promised to fix it.
  50
  51   - Zero-page references are indicated by "*", e.g. "lda *0x20,x" produces the
  52     2-byte instruction whereas "lda 0x20,x" should produce the 3-byte version.
  53     (I say *should* because there is an inconsistency in this process that I
  54     discovered recently and I will investigate it later and fix the assembler).
  55     The assembler *can* generate zero-page references automatically if you use
  56     the ".dpage" pseudo-op in the ".area zpage" section, but I haven't done so
  57     in case there are places where the reassembled binary doesn't match the
  58     original. Probably there aren't, but having control via "*" is quite good.
  59
  60 - I have used a procedure like this to produce the disassembly:
  61
  62   - Run /disasm/load.py to perform the relocation that is normally done by the
  63     hires loader and output mem.bin which is a straight memory dump (no DOS 3.3
  64     header) which gets loaded at 0x9fd. This gives the disassembler a clearer
  65     picture of what's where, but is not runnable, and does not remove the need
  66     for the loader (the loader is also responsible for other initialization).
  67
  68   - Run my disassembler, which is not included here as it's beyond the scope
  69     of this document, passing it a runtime trace file (also not included here),
  70     and a manual text file that gives areas and names/sizes of known symbols.
  71
  72   - The manual text file /disasm/star_blazer.txt is included and could form the
  73     basis of a SourceGen or similar project, however, it is pretty terse and
  74     does not include all of the information inferred by the disassembler from
  75     the trace file. I am working on a way to make the disassembler output this.
  76     It would be relatively easy to make a SourceGen project from the asm output
  77     of the disassembler, but it would be easier if the process was automated.
  78
  79   - Run my shape extractor and compiler, this is an optional process since the
  80     original .db statements for the shapes are still in /disasm/star_blazer.asm
  81     (if you compile with SHAPE = 0) but extracting and recompiling the shapes
  82     gives you the opportunity to edit them. I haven't included sources for this
  83     process, which is complex, but I do include /disasm/shape0.png for viewing.
  84
  85 - To regenerate the game, use steps like this:
  86
  87   as6500 -l -o star_blazer.asm
  88   aslink -n -m -u -i -b zpage=0 -b udata0=0x200 -b udata1=0x400 -b text=0x9fd -b loader=0x2000 -b data0=0x4000 star_blazer.ihx star_blazer.rel
  89   ./pack.py star_blazer.ihx star_blazer_hires_loader.bin
  90
  91   - The /disasm/pack.py is similar to /loader/hires_loader.py and it moves the
  92     sections around for loading. Basically the idea is to move the last 0x2000
  93     bytes of the game binary (actually 0x2000 less the loader size) into the
  94     hires screen where it will be loaded by BLOAD, and then relocate it at run-
  95     time. This prevents BLOAD from having to load 8 kbytes of "gap" at 0x2000.
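
The packing idea above can be sketched as follows. This is an illustration only, not the actual /disasm/pack.py. The function names, the 0x100-byte loader size and the 0x9e00 end address are assumptions made for the example. The sketch takes a flat memory dump, moves the tail into the unused part of the hires screen after the loader, and truncates the file. The matching "unpack" mimics what the loader would do at run-time.

```python
HIRES_START = 0x2000  # hires screen page 1
HIRES_SIZE = 0x2000


def pack(mem: bytes, base: int, loader_size: int):
    """Move the tail of a flat dump (loaded at base) into the hires screen.

    Returns the shortened file image and the address the tail came from.
    Assumes the tail lies entirely above the hires screen.
    """
    tail_size = HIRES_SIZE - loader_size
    tail_off = len(mem) - tail_size
    slot = HIRES_START + loader_size - base  # file offset just after loader
    packed = bytearray(mem[:tail_off])
    packed[slot:slot + tail_size] = mem[tail_off:]
    return bytes(packed), base + tail_off


def unpack(packed: bytes, base: int, loader_size: int, tail_dest: int) -> bytes:
    """Mimic the run-time copy from the hires screen back to tail_dest."""
    tail_size = HIRES_SIZE - loader_size
    slot = HIRES_START + loader_size - base
    mem = bytearray(packed)
    mem[tail_dest - base:] = packed[slot:slot + tail_size]
    return bytes(mem)


# Hypothetical image: loaded at 0x9fd and ending at 0x9e00, 0x100-byte loader.
base, end, loader_size = 0x9FD, 0x9E00, 0x100
mem = bytes(i % 251 for i in range(end - base))
packed, tail_dest = pack(mem, base, loader_size)
restored = unpack(packed, base, loader_size, tail_dest)
print(hex(len(mem)), hex(len(packed)), hex(tail_dest))
```

After unpacking, everything outside the hires screen matches the original dump. The hires screen itself still holds the stale copy of the tail, which does not matter if the game redraws the screen.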
  96
  97 - The disassembly is far from complete, as I have not figured out all of the
  98   game logic, and there may be issues with identifying all relocatable symbols.
  99
 100   - Basically it is relocatable, but I noticed that it will not always proceed
 101     to the next level, i.e. it sometimes gets stuck in a limbo mode in between
 102     missions, where you can fly around and shoot, but there are no baddies.
 103
 104   - I fixed a similar problem that turned out to be a table referenced only by
 105     its high address -- I changed something like "lda #0xNN" to "lda #>SYMBOL".
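
To illustrate why a hard-coded high byte breaks relocation, here is a small sketch. The table addresses are hypothetical. A literal such as 0xNN is only correct while the table stays in its original page, whereas a value derived from the symbol follows the table wherever it moves.

```python
def lo(addr: int) -> int:
    """Low byte of a 16-bit address, like "#<SYMBOL" in many assemblers."""
    return addr & 0xFF


def hi(addr: int) -> int:
    """High byte of a 16-bit address, like "#>SYMBOL"."""
    return (addr >> 8) & 0xFF


# Hypothetical table location before and after code is inserted ahead of it.
old_table = 0x6500
new_table = 0x6700

hardcoded = 0x65  # what a disassembler emits if it sees only "lda #0x65"

print(hi(old_table) == hardcoded)  # matches only at the original address
print(hi(new_table) == hardcoded)  # stale once the table has moved
```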
 106
 107 - I have a reasonably good understanding of the game's data structures, its
 108   graphics package and its mathematics routines. I do not fully understand the
 109   game physics (which was a major reason to do the disassembly) but I am quite
 110   close to it, as I located things like the position and velocity of objects,
 111   the angle of a missile, and even routines that look like homing the missile.
 112
 113 - I discovered the following general principles about the engine:
 114
 115   - There are 0x100 shapes and (I think) 0x70 objects. I only have a tentative
 116     understanding of the objects (see discussion of game microcode further on)
 117     so I haven't included this yet. But essentially each object has a purpose,
 118     e.g. I think 0x20..0x27 are stars in the star-field background, 0x41..0x43
 119     are trees or cactuses, etc. The mapping of objects to shapes varies, but
 120     within some limits, e.g. object 0x41 is shape 0x78 (tree) or 0x79 (cactus).
 121     The maximum number of one kind of object onscreen is dictated by its slots
 122     in the 0x70 objects, e.g. I think there can only be up to 3 trees/cactuses.
 123
 124   - Animation works by drawing shapes in "or" mode to make them appear, then
 125     drawing them in "and-not" mode to make them disappear. This leaves a hole
 126     in the screen, where any underlying objects are not visible after erasure.
 127
 128     - When drawing, the engine computes a number of things, such as the shape
 129       address to use, the x coordinate mod 7 and so on. These are stored in the
 130       object array, and reused (i.e. not recomputed) when it erases the shape.
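
The draw and erase scheme above can be sketched on a toy screen buffer. The byte patterns and names are hypothetical, and the real engine works on hires screen memory with pre-shifted shape data. The sketch shows how erasing one shape with "and-not" also punches a hole in any shape underneath, until that shape is redrawn.

```python
def draw(screen: bytearray, offset: int, shape: bytes) -> None:
    """Make a shape appear by "oring" it into the screen."""
    for i, b in enumerate(shape):
        screen[offset + i] |= b


def erase(screen: bytearray, offset: int, shape: bytes) -> None:
    """Make a shape disappear by "and-not": clear every bit it had set."""
    for i, b in enumerate(shape):
        screen[offset + i] &= ~b & 0xFF


screen = bytearray(2)
tree = bytes([0b0111100, 0b0011000])  # hypothetical background shape
ship = bytes([0b0001111])             # hypothetical overlapping shape

draw(screen, 0, tree)
draw(screen, 0, ship)
erase(screen, 0, ship)
hole = screen[0]       # the tree has lost the bits it shared with the ship

draw(screen, 0, tree)  # redrawing the tree repairs the damage
print(bin(hole), bin(screen[0]))
```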
 131
 132   - It seems to move the objects one at a time, relying on the fact that it
 133     will soon re-draw an underlying object that was unintentionally erased.
 134     You can see an artifact of this where a pair of objects scrolls along
 135     the ground in a fixed relationship to each other, as one of them might be
 136     animated with a "bite" out of it corresponding to the previous position of
 137     the other. The game uses only hires screen page 1, i.e. no double buffering.
 138
 139   - The playfield is logically 140 units wide (each is a pair of HGR pixels)
 140     and (from memory) 160 units high, the remaining 32 lines being used for
 141     the score, the ground, and the various displays like the current mission.
 142
 143   - Coordinates are kept as (x, y) bytes where 0x80 is the centre of the
 144     screen (I think) and so the playfield extends somewhat in each direction
 145     beyond the screen. Shapes are clipped if they are drawn partially off the
 146     screen. Action can happen off-screen, e.g. a bomb hits a target and you
 147     complete the mission, or a missile curves off-screen and comes back on.
 148
 149   - Velocities are kept with more precision, possibly 16 or 24 bits. There is
 150     some complicated logic with 4-bit shifts, which may be to save on storage.
 151
 152   - I would like to understand the logic of how it generates terrain better,
 153     but I suspect that it's somewhat controlled by the extended playfield,
 154     e.g. a tree scrolls off the visible screen and continues to scroll to the
 155     left invisibly, until it hits the left edge of the playfield, whereupon
 156     it is immediately regenerated at a new invisible location somewhere in
 157     the invisible right portion of the playfield. This theory is supported
 158     by fields in "struct object" and routines that I found whose job is to
 159     randomize the position and velocity of an object within given x and y
 160     bounds per object. Quite a bit of the object's personality and generation
 161     behaviour can be controlled with just these fields, e.g. ground-based
 162     objects have the y-limits set the same, so that their y isn't randomized.
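
As a rough sketch of the regeneration theory above, assuming per-object bounds held in hypothetical fields (the real field names, values and random number generator are not known here):

```python
import random


def respawn(rng: random.Random, x_min: int, x_max: int,
            y_min: int, y_max: int):
    """Pick a new position within the object's own x and y bounds."""
    return rng.randint(x_min, x_max), rng.randint(y_min, y_max)


rng = random.Random(1)

# Hypothetical ground object: x bounds in the invisible right portion of
# the playfield, y limits set the same so that y is not randomized.
tree_x, tree_y = respawn(rng, 0xD0, 0xF0, 0x90, 0x90)
print(hex(tree_x), hex(tree_y))
```

With equal y limits the result is always the same y, so one routine can serve both ground-based and airborne objects.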
 163
 164   - There are two basic kinds of shapes, the shiftable kind which can be drawn
 165     at any x-position on the screen, and the non-shiftable kind which can only
 166     be drawn at byte-aligned positions. The shiftable kind is stored with 7
 167     pre-shifted shapes, and the "struct shape" contains a pointer to the middle
 168     one of these shapes, with indexing by up to +/- 3 * the shape size. This
 169     is done to save time and code since a multiply by +/- 3 is cheaper than 7.
 170
 171     - Shiftable shapes are drawn at only even positions, due to the 140-pixel
 172       logical screen width, but still require 7 pre-shifted shapes since even
 173       screen-positions can be even or odd with respect to 7-bit screen bytes.
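
A small sketch of the pre-shifted shape lookup described above. The mapping from shift amount to copy index (shift minus 3) is an assumption chosen for illustration, and the addresses are hypothetical. It also checks the point about even positions: doubling the logical x still reaches every one of the 7 possible shifts.

```python
def locate(x_logical: int, middle_addr: int, shape_size: int):
    """Return (screen byte column, shift, address of pre-shifted copy)."""
    x_pixel = 2 * x_logical                # each unit is a pair of pixels
    byte_col, shift = divmod(x_pixel, 7)   # 7 pixels per screen byte
    copy_addr = middle_addr + (shift - 3) * shape_size  # index -3..+3
    return byte_col, shift, copy_addr


# Every shift 0..6 occurs even though only even pixel positions are used.
shifts_used = {(2 * x) % 7 for x in range(140)}
print(sorted(shifts_used))

# Hypothetical shape: middle copy at 0x5000, 0x10 bytes per copy.
print(locate(5, 0x5000, 0x10))
```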
 174
 175     - The first 8 shapes correspond to individual pixels (really pixel pairs)
 176       in different colours, i.e. the 8 HGR colours. These are drawn with a
 177       special routine. As an experiment, I tried replacing these with ordinary
 178       shapes and commenting out the special routine (PIXEL_SHAPE = 1) which worked.
 179
 180   - Each scan line of each shape is either hi-bit clear (uses black, white,
 181     green and purple) or set (uses black, white, blue and orange). The hi-bit
 182     never changes within a scan line. When rendering, the hi-bit is "ored" into
 183     the screen like any other bit, so blue and orange will take precedence.
 184
 185   - Each shape has a dimension in logical units which defines its collision
 186     rectangle, as well as a dimension in physical units (bytes) which defines
 187     its drawn rectangle. In general the physical dimension is predictable from
 188     the logical dimension, and this means that in some cases padding is drawn
 189     on-screen in order that the logical dimension be as the designer intended.
 190
 191   - Non-shiftable shapes are drawn in general by replacing the previous screen
 192     memory contents rather than "oring". The routine that does this is called
 193     "draw_misc" in the disassembly, as it does miscellaneous parts like titles,
 194     scoring etc. There is a data table which I haven't fully decoded but which
 195     contains 16-byte entries, describing (I think) strings to draw and where.
 196     The physical (byte) width and drawn rectangle are critical to this process,
 197     as I discovered that if you change these, then the text gets all messed up.
 198
 199     - The "draw_misc" routine also has the ability to mask the shape data with
 200       an alternating pair of masks, and this is used to draw the "STAR BLAZER"
 201       title screen with colour cycling. Such shape data is stored with hi-bit
 202       set, in order that the hi-bit can be selectively masked off as required.
 203
 204     - The shape table contains some blank rectangles, which obviously would not
 205       have any effect when drawn in "or" mode. I discovered that these are used
 206       for erasing parts of the screen when drawn with "draw_misc". Their width
 207       is important, e.g. a blank rectangle erases "HIGH" before drawing "SCORE"
 208       and it then seems to advance by the width of this before drawing "SCORE".
 209
 210 - I discovered the following general principles about the data structures:
 211
 212   - There is essentially a "struct shape" and "struct object" containing the
 213     variables that control a shape or an object, but they are implemented as a
 214     separate array (indexed by shape or object number) for each field of the
 215     struct, including lo and hi of pointers. This is quite normal in 6502 code.
 216
 217   - Interestingly the "struct object" seems to have what are essentially sub-
 218     classes, because some of the arrays do not implement the entire 0x70-entry
 219     range -- so you see code like "lda object_XXX - 0x40,x" which means the
 220     XXX field of "struct object" is only stored for e.g. objects 0x40..0x6f.
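
The layout described in the two points above can be modelled as one array per field, with some fields stored only for part of the object range. All names here are hypothetical stand-ins, including the "xxx" field and the pointer field. The biased index in get_xxx plays the role of the "object_XXX - 0x40" base address.

```python
N_OBJECTS = 0x70

# One array per field, indexed by object number; pointers are split into
# separate lo and hi byte arrays.
object_x = bytearray(N_OBJECTS)
object_ptr_lo = bytearray(N_OBJECTS)
object_ptr_hi = bytearray(N_OBJECTS)

# A "subclass" field that exists only for objects 0x40..0x6f.
XXX_FIRST = 0x40
object_xxx = bytearray(N_OBJECTS - XXX_FIRST)


def get_ptr(obj: int) -> int:
    """Reassemble a 16-bit pointer from its lo and hi arrays."""
    return object_ptr_lo[obj] | (object_ptr_hi[obj] << 8)


def get_xxx(obj: int) -> int:
    """Read the partial field using a base biased by XXX_FIRST."""
    if not XXX_FIRST <= obj < N_OBJECTS:
        raise IndexError("field not stored for this object")
    return object_xxx[obj - XXX_FIRST]


object_ptr_lo[0x41], object_ptr_hi[0x41] = 0x34, 0x12
object_xxx[0x41 - XXX_FIRST] = 7
print(hex(get_ptr(0x41)), get_xxx(0x41), hex(len(object_xxx)))
```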
 221
 222   - In the disassembly, most lines which use an addressing mode like "NNNN,x"
 223     are annotated in the comment field with a range like "x=40..6f". My custom
 224     disassembler has extracted this information from a trace of the running game,
 225     in which I attempted to exercise at least some of the game levels/features.
 226     However, the range is only what was *seen* and might be larger in reality.
 227
 228   - The disassembler uses the information about locations accessed by indexing
 229     instructions, to build a partition of the data space into separate arrays.
 230     It is independent of the base address that happened to be used for access.
 231     Overlapping regions are merged, as it assumes they must be the same array.
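
A minimal sketch of the partitioning idea above, using ordinary interval merging. The real disassembler is not included, so the function names and the sample accesses are hypothetical. Each traced indexed access is reduced to the address range it touched, which makes the result independent of the base address in the instruction, and overlapping ranges are then merged into one array.

```python
def accessed_range(base: int, x_min: int, x_max: int):
    """Half-open address range touched by "base,x" for x_min <= x <= x_max."""
    return base + x_min, base + x_max + 1


def merge(ranges):
    """Merge overlapping ranges; merely adjacent ranges stay separate."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


accesses = [
    accessed_range(0x6000, 0x00, 0x2F),
    accessed_range(0x5FC0, 0x40, 0x6F),  # same bytes via a biased base
    accessed_range(0x6020, 0x00, 0x1F),  # overlaps, so the same array
    accessed_range(0x6040, 0x00, 0x0F),  # adjacent only, so a new array
]
arrays = merge(accesses)
print([(hex(s), hex(e)) for s, e in arrays])
```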
 232
 233   - Interestingly, I found a few cases of what I think are game bugs, where the
 234     author did not anticipate his access overrunning into a neighbouring array,
 235     although it's possible that it was intentional and I didn't understand it.
 236
 237 - I do not fully understand the scoring or how the game proceeds through the
 238   missions, but I did locate the important variables, so it would be easy to
 239   figure it out. This hasn't been my major priority, which is why I haven't yet.
 240
 241 - The part I am presently attacking is to understand the gameplay at a finer
 242   level, in particular the collision detection, and what happens for various
 243   kinds of collisions (figuring this out will also provide insights into the
 244   scoring and how it proceeds through the missions, hence I tackle this first).
 245
 246   - In an earlier version of the disassembly I had located routines for things
 247     like intersection of collision rectangles, but I deleted these symbols from
 248     the present disassembly as my naming was based on a slightly earlier way of
 249     thinking. It would be relatively easy to re-find and re-annotate this code.
 250
 251   - It turns out that the collision code is quite integrated into the rest of
 252     the game logic. The game seems to use kind of an internal microcode which
 253     consists of zero-terminated lists of bytes, and basically each game object
 254     has various microcode routines, and many of the microcode bytes seem to be
 255     the indices of other game objects that it needs to collision-test against.
 256
 257   - There are also other bytes such as 0xf0, which I think are not indices of
 258     game objects, but rather, microcode commands, and you can see "cmp #0xf0"
 259     and similar comparison chains throughout the code to implement these. I am
 260     not sure if different kinds of collisions are implemented by different 0xfN
 261     commands or by multiple object-indexed microcode tables or a combination.
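
The microcode format is only partly understood, so the following is a guess at how such a list might be split up, not a description of the game's code. It assumes a zero-terminated list in which bytes of 0xf0 and above are commands and all other non-zero bytes are object indices. The threshold and the sample bytes are hypothetical.

```python
COMMAND_BASE = 0xF0  # assumed: 0xf0..0xff are commands, not object indices


def split_microcode(data: bytes, offset: int):
    """Split one zero-terminated list into object indices and commands.

    Returns (objects, commands, offset just past the terminator).
    """
    objects, commands = [], []
    while data[offset] != 0:
        b = data[offset]
        if b >= COMMAND_BASE:
            commands.append(b)
        else:
            objects.append(b)
        offset += 1
    return objects, commands, offset + 1


# Hypothetical list: test against objects 0x41, 0x42, 0x20, one command.
sample = bytes([0x41, 0x42, 0xF0, 0x20, 0x00, 0x99])
objects, commands, next_offset = split_microcode(sample, 0)
print([hex(b) for b in objects], [hex(b) for b in commands], next_offset)
```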
 262
 263 - I am making the disassembly available in its present state (warts and all)
 264   so that others can pick it up if they want to and progress things. In the
 265   present .zip I have omitted a lot of my files to keep it to a manageable
 266   package, and even then, there is still a lot to take in (hires loader, etc).
 267
 268 - My complete work directory with my emulator, shape extractor and compiler,
 269   etc, is available at the following git repository on one of my servers:
 270
 271   https://git.ndcode.org/public/star_disasm.git
 272
 273   - Unfortunately the gitweb viewer on my server is broken. I think people are
 274     hitting it from the Internet and causing it to crash and restart, and then
 275     systemd is not letting it restart after a while. A git client still works.
 276
 277   - As I am not really ready to make a proper release, I haven't bothered with
 278     LICENSE files and such. However, I intend to release my part of the work
 279     (not the copyrighted material obviously) under a MIT license. This includes
 280     the disassembler, the tracing infrastructure, etc. It's quite sophisticated
 281     so I also considered GPLv2, but overall I prefer a more permissive license.
 282
 283   - It is a work in progress, and I am not all that happy with how it handles
 284     the shape editing and various other things. The disassembler also does not
 285     have good support for the control file or the ability to add comments or
 286     override the operand fields in instructions, and so whilst it can do a lot
 287     automatically, it's hard to deal with the case where it gets things wrong.
 288     It also does not have good support for immediate operands yet, although it
 289     handles all the other addressing modes intelligently. I plan to add an enum
 290     feature so that it can more readably decompile constants and the microcode.
 291
 292 - I will not be able to work on the project for a bit, so please enjoy for now.
 293
 294 Nick Downing
 295 nick@ndcode.org