Star Blazer disassembly notes

- The original file I worked with is cracked by Mr. Xerox and obtained from:
- The crack is a bit crude, basically it's a memory dump of how things looked when the original loader was going to transfer control to the game code.
- Since I did not want to disassemble a lot of junk I created a different crack loader, it is similar to the original but it only loads the essential parts.
- For details about this process see the files in /loader:
  - The program /loader/ overwrites parts of the original binary that I think are junk, see the notes in the code about address range meanings.
  - The files /loader/star_blazer_dejunked*.bin are dejunked with different fill values, I run both of these to check that the dejunking did not break.
  - The program /loader/ extracts the non-junk parts of the dejunked binary (it works with the original or dejunked binary, but I prefer the latter, since it makes the disassembly cleaner given that there is one region where the alignment padding was not zeroed out originally), concatenates them, and then splits them into parts for my crack loader. It then installs my crack loader and composes the parts ready for loading.
  - The binary /loader/hires_loader.bin is my short machine code program which lives in the hires screen at $2000 and its function is to copy the tail of the program from hires screen memory to the tail location at approx $7e00. It also contains some initializations of zero page, registers and so forth. I haven't included a separate source file as it gets disassembled later on.
  - The result of the re-cracking is /loader/star_blazer_hires_loader.bin which can be played the same as the original binary but is significantly smaller.
- Then see the disassembly in /disasm, in particular /disasm/star_blazer.asm is an ASxxxx source that assembles to /loader/star_blazer_hires_loader.bin.
  It contains switches at the top of the file to control my game modifications and if left alone (ALIGN = 1, SHAPE = 1) it will produce the original game.
- The ASxxxx assembler used for this project is by Alan R. Baldwin and has a home page at
  Use "as6500" for 6502.
- This assembler does not use the most conventional syntax since constants are C-style, e.g. "0x2000" not "$2000", and addressing modes use square brackets, e.g. "lda [0x2000],y" not "lda ($2000),y". The C-style constants are good for projects that combine C and assembly code, since you can use common include files. The square bracket syntax is not good and I contacted the author, who told me it is for historical reasons and promised to fix it.
- Zero-page references are indicated by "*", e.g. "lda *0x20,x" produces the 2-byte instruction whereas "lda 0x20,x" should produce the 3-byte version. (I say *should* because there is an inconsistency in this process that I discovered recently and I will investigate it later and fix the assembler). The assembler *can* generate zero-page references automatically if you use the ".dpage" pseudo-op in the ".area zpage" section, but I haven't done so in case there are places where the reassembled binary doesn't match the original. Probably there aren't, but having control via "*" is quite good.
- I have used a procedure like this to produce the disassembly:
  - Run /disasm/ to perform the relocation that is normally done by the hires loader and output mem.bin which is a straight memory dump (no DOS 3.3 header) which gets loaded at 0x9fd. This gives the disassembler a clearer picture of what's where, but is not runnable, and does not remove the need for the loader (the loader is also responsible for other initialization).
  - Run my disassembler, which is not included here as it's beyond the scope of this document, passing it a runtime trace file (also not included here), and a manual text file that gives areas and names/sizes of known symbols.
  - The manual text file /disasm/star_blazer.txt is included and could form the basis of a SourceGen or similar project, however, it is pretty terse and does not include all of the information inferred by the disassembler from the trace file. I am working on a way to make the disassembler output this. It would be relatively easy to make a SourceGen project from the asm output of the disassembler, but it would be easier if the process was automated.
  - Run my shape extractor and compiler, this is an optional process since the original .db statements for the shapes are still in /disasm/star_blazer.asm (if you compile with SHAPE = 0) but extracting and recompiling the shapes gives you the opportunity to edit them. I haven't included sources for this process, which is complex, but I do include /disasm/shape0.png for viewing.
- To regenerate the game, use steps like this:
    as6500 -l -o star_blazer.asm
    aslink -n -m -u -i -b zpage=0 -b udata0=0x200 -b udata1=0x400 -b text=0x9fd -b loader=0x2000 -b data0=0x4000 star_blazer.ihx star_blazer.rel
    ./ star_blazer.ihx star_blazer_hires_loader.bin
- The /disasm/ is similar to /loader/ and it moves the sections around for loading. Basically the idea is to move the last 0x2000 bytes of the game binary (actually 0x2000 less the loader size) into the hires screen where it will be loaded by BLOAD, and then relocate it at run-time. This prevents BLOAD from having to load 8 kbytes of "gap" at 0x2000.
- The disassembly is far from complete, as I have not figured out all of the game logic, and there may be issues with identifying all relocatable symbols.
  - Basically it is relocatable, but I noticed that it will not always proceed to the next level, i.e. it sometimes gets stuck in a limbo mode in between missions, where you can fly around and shoot, but there are no baddies.
  - I fixed a similar problem that turned out to be a table referenced only by its high address -- I changed something like "lda #0xNN" to "lda #>SYMBOL".
- I have a reasonably good understanding of the game's data structures, its graphics package and its mathematics routines. I do not fully understand the game physics (which was a major reason to do the disassembly) but I am quite close to it, as I located things like the position and velocity of objects, the angle of a missile, and even routines that look like homing the missile.
- I discovered the following general principles about the engine:
  - There are 0x100 shapes and (I think) 0x70 objects. I only have a tentative understanding of the objects (see discussion of game microcode further on) so I haven't included this yet. But essentially each object has a purpose, e.g. I think 0x20..0x27 are stars in the star-field background, 0x41..0x43 are trees or cactuses, etc. The mapping of objects to shapes varies, but within some limits, e.g. object 0x41 is shape 0x78 (tree) or 0x79 (cactus). The maximum number of an object onscreen is dictated by its assigned slots in the 0x70 objects, e.g. I think there can only be up to 3 trees/cactuses.
  - Animation works by drawing shapes in "or" mode to make them appear, then drawing them in "and-not" mode to make them disappear. This leaves a hole in the screen, where any underlying objects are not visible after erasure.
  - When drawing, the engine computes a number of things, such as the shape address to use, the x coordinate mod 7 and so on. These are stored in the object array, and reused (i.e. not recomputed) when it erases the shape.
  - It seems to move the objects one at a time, relying on the fact that it will soon re-draw an underlying object that was unintentionally erased. You can see an artifact of this where a pair of objects scrolls along the ground in a fixed relationship to each other, as one of them might be animated with a "bite" out of it corresponding to previous position of the other. The game uses only hires screen page 1, i.e. no double buffering.
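
The "or" / "and-not" animation scheme and the resulting "bite" artifact can be illustrated with a minimal Python sketch. This is not the game's code: the function names, the two-byte screen and the shape bytes below are all made up for illustration, and the real routines also handle shifting and clipping.

```python
def draw_or(screen, offset, shape):
    """Make a shape appear by ORing its bytes into screen memory."""
    for i, bits in enumerate(shape):
        screen[offset + i] |= bits


def erase_and_not(screen, offset, shape):
    """Make a shape disappear by clearing every bit that is set in the shape."""
    for i, bits in enumerate(shape):
        screen[offset + i] &= ~bits & 0xFF


# Hypothetical data: a two-byte "screen" and two overlapping shapes.
screen = bytearray(2)
tree = bytes([0x3C, 0x7E])
missile = bytes([0x18, 0x18])

draw_or(screen, 0, tree)           # tree appears
draw_or(screen, 0, missile)        # missile overlaps the tree
erase_and_not(screen, 0, missile)  # erasing the missile also clears tree bits
after_erase = bytes(screen)        # the tree now has a "bite" out of it

draw_or(screen, 0, tree)           # the next redraw of the tree repairs it
after_redraw = bytes(screen)
```

In this sketch the hole persists only until the underlying shape is next drawn, which matches the observation that the engine relies on soon re-drawing whatever was unintentionally erased.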
  - The playfield is logically 140 units wide (each is a pair of HGR pixels) and (from memory) 160 units high, the remaining 32 lines being used for the score, the ground, and the various displays like the current mission.
  - Coordinates are kept as (x, y) bytes where 0x80 is the centre of the screen (I think) and so the playfield extends somewhat in each direction beyond the screen. Shapes are clipped if they are drawn partially off the screen. Action can happen off-screen, e.g. a bomb hits a target and you complete the mission, or a missile curves off-screen and comes back on.
  - Velocities are kept with more precision, possibly 16 or 24 bits. There is some complicated logic with 4-bit shifts, which may be to save on storage.
  - I would like to understand the logic of how it generates terrain better, but I suspect that it's somewhat controlled by the extended playfield, e.g. a tree scrolls off the visible screen and continues to scroll to the left invisibly, until it hits the left edge of the playfield, whereupon it is immediately regenerated at a new invisible location somewhere in the invisible right portion of the playfield. This theory is supported by fields in "struct object" and routines that I found whose job is to randomize the position and velocity of an object within given x and y bounds per object. Quite a bit of the object's personality and generation behaviour can be controlled with just these fields, e.g. ground-based objects have the y-limits set the same, so that their y isn't randomized.
  - There are two basic kinds of shapes, the shiftable kind which can be drawn at any x-position on the screen, and the non-shiftable kind which can only be drawn at byte-aligned positions. The shiftable kind is stored with 7 pre-shifted shapes, and the "struct shape" contains a pointer to the middle one of these shapes, with indexing by up to +/- 3 * the shape size. This is done to save time and code since a multiply by +/- 3 is cheaper than 7.
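
The pre-shifted shape lookup can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, and the mapping of "x mod 7" onto the -3..+3 offset (here simply shift - 3) is my assumption, not something confirmed from the game code. It does show one thing that follows from arithmetic alone: since 2 and 7 are coprime, the 140 logical positions still hit all seven shifts.

```python
PIXELS_PER_BYTE = 7  # an HGR screen byte holds 7 pixels


def locate(logical_x):
    """Convert a logical x (pixel pairs) to (byte column, shift within byte)."""
    pixel_x = 2 * logical_x
    return divmod(pixel_x, PIXELS_PER_BYTE)


def preshift_address(middle_ptr, shape_size, shift):
    """Pick one of 7 pre-shifted copies relative to the middle copy.

    Assumption: shift 0..6 maps to an offset of -3..+3 shape sizes.
    """
    return middle_ptr + (shift - 3) * shape_size


# Every shift 0..6 occurs even though only even pixel positions are used.
all_shifts = sorted({locate(x)[1] for x in range(140)})
```

For example, logical x = 4 is pixel 8, which falls in byte column 1 with a shift of 1.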
  - Shiftable shapes are drawn at only even positions, due to the 140-pixel logical screen width, but still require 7 pre-shifted shapes since even screen-positions can be even or odd with respect to 7-bit screen bytes.
  - The first 8 shapes correspond to individual pixels (really pixel pairs) in different colours, i.e. the 8 HGR colours. These are drawn with a special routine. As an experiment, I tried replacing these with ordinary shapes and commenting out the special routine (PIXEL_SHAPE = 1), which worked.
  - Each scan line of each shape is either hi-bit clear (uses black, white, green and purple) or set (uses black, white, blue and orange). The hi-bit never changes within a scan line. When rendering, the hi-bit is "ored" into the screen like any other bit, so blue and orange will take precedence.
  - Each shape has a dimension in logical units which defines its collision rectangle, as well as a dimension in physical units (bytes) which defines its drawn rectangle. In general the physical dimension is predictable from the logical dimension, and this means that in some cases padding is drawn on-screen in order that the logical dimension be as the designer intended.
  - Non-shiftable shapes are drawn in general by replacing the previous screen memory contents rather than "oring". The routine that does this is called "draw_misc" in the disassembly, as it does miscellaneous parts like titles, scoring etc. There is a data table which I haven't fully decoded but which contains 16-byte entries, describing (I think) strings to draw and where. The physical (byte) width and drawn rectangle are critical to this process, as I discovered that if you change these, then the text gets all messed up.
  - The "draw_misc" routine also has the ability to mask the shape data with an alternating pair of masks, and this is used to draw the "STAR BLAZER" title screen with colour cycling. Such shape data is stored with hi-bit set, in order that the hi-bit can be selectively masked off as required.
  - The shape table contains some blank rectangles, which obviously would not have any effect when drawn in "or" mode. I discovered that these are used for erasing parts of the screen when drawn with "draw_misc". Their width is important, e.g. a blank rectangle erases "HIGH" before drawing "SCORE" and it then seems to advance by the width of this before drawing "SCORE".
- I discovered the following general principles about the data structures:
  - There is essentially a "struct shape" and "struct object" containing the variables that control a shape or an object, but they are implemented as a separate array (indexed by shape or object number) for each field of the struct, including lo and hi of pointers. This is quite normal in 6502 code.
  - Interestingly the "struct object" seems to have what are essentially sub-classes, because some of the arrays do not implement the entire 0x70-entry range -- so you see code like "lda object_XXX - 0x40,x" which means the XXX field of "struct object" is only stored for e.g. objects 0x40..0x6f.
  - In the disassembly, most lines which use an addressing mode like "NNNN,x" are annotated in the comment field with a range like "x=40..6f". My custom disassembler has extracted this information from a trace of the running game, in which I attempted to exercise at least some of the game levels/features. However, the range is only what it's *seen* and might be larger in reality.
  - The disassembler uses the information about locations accessed by indexing instructions, to build a partition of the data space into separate arrays. It is independent of the base address that happened to be used for access. Overlapping regions are merged, as it assumes they must be the same array.
  - Interestingly, I found a few cases of what I think are game bugs, where the author did not anticipate his access overrunning into a neighbouring array, although it's possible that it was intentional and I didn't understand it.
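
The array-partitioning idea described above (merge overlapping accessed regions regardless of the base address used) can be sketched in Python. This is not the disassembler's actual code, which is not included here; the function names and the example addresses are hypothetical, and ranges are taken to be inclusive.

```python
def accessed(base, x_lo, x_hi):
    """Address range touched by an indexed instruction seen with x in x_lo..x_hi."""
    return (base + x_lo, base + x_hi)


def merge_ranges(ranges):
    """Merge overlapping inclusive (lo, hi) ranges into separate arrays.

    Ranges that merely touch end-to-end are kept apart, since neighbouring
    arrays are adjacent in memory without being the same array.
    """
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(r) for r in merged]


# Hypothetical trace: two instructions reach the same array via different
# bases, and a third touches the array that follows it in memory.
observed = [
    accessed(0x1000, 0x40, 0x6F),  # e.g. a field stored only for 0x40..0x6f
    accessed(0x1040, 0x00, 0x10),  # same array, addressed from its real start
    accessed(0x1070, 0x00, 0x0F),  # neighbouring array
]
arrays = merge_ranges(observed)
```

Because the merge works on the addresses actually touched rather than on the operand, the first two accesses collapse into one array even though their base addresses differ. An access that overruns into a neighbouring array would cause the two to be merged, so the suspected overrun bugs would show up as unexpectedly large arrays in a scheme like this.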
- I do not understand the scoring well, or how the game proceeds through the missions, but I did locate the important variables, so it would be easy to figure it out. This hasn't been my major priority, which is why I haven't done so yet.
- The part I am presently attacking is to understand the gameplay at a finer level, in particular the collision detection, and what happens for various kinds of collisions (figuring this out will also provide insights into the scoring and how it proceeds through the missions, hence I tackle this first).
- In an earlier version of the disassembly I had located routines for things like intersection of collision rectangles, but I deleted these symbols from the present disassembly as my naming was based on a slightly earlier way of thinking. It would be relatively easy to re-find and re-annotate this code.
- It turns out that the collision code is quite integrated into the rest of the game logic. The game seems to use a kind of internal microcode which consists of zero-terminated lists of bytes, and basically each game object has various microcode routines, and many of the microcode bytes seem to be the indices of other game objects that it needs to collision-test against.
- There are also other bytes such as 0xf0, which I think are not indices of game objects, but rather, microcode commands, and you can see "cmp #0xf0" and similar comparison chains throughout the code to implement these. I am not sure if different kinds of collisions are implemented by different 0xfN commands or by multiple object-indexed microcode tables or a combination.
- I am making the disassembly available in its present state (warts and all) so that others can pick it up if they want to and progress things. In the present .zip I have omitted a lot of my files to keep it to a manageable package, and even then, there is still a lot to take in (hires loader, etc).
- My complete work directory with my emulator, shape extractor and compiler, etc, is available at the following git repository on one of my servers:
- Unfortunately the gitweb viewer on my server is broken. I think people are hitting it from the Internet and causing it to crash and restart, and then systemd is not letting it restart after a while. A git client still works.
- As I am not really ready to make a proper release, I haven't bothered with LICENSE files and such. However, I intend to release my part of the work (not the copyrighted material obviously) under an MIT license. This includes the disassembler, the tracing infrastructure, etc. It's quite sophisticated so I also considered GPLv2, but overall I prefer a more permissive license.
- It is a work in progress, and I am not all that happy with how it handles the shape editing and various other things. The disassembler also does not have good support for the control file or the ability to add comments or override the operand fields in instructions, and so whilst it can do a lot automatically, it's hard to deal with the case where it gets things wrong. It also does not have good support for immediate operands yet, although it handles all the other addressing modes intelligently. I plan to add an enum feature so that it can more readably decompile constants and the microcode.
- I will not be able to work on the project for a bit, so please enjoy for now.

Nick Downing