Yet another 6502 disassember!

I wanted to disassemble something, but I didn't find a disassembler that would automatically parse Atari boot sectors, and I wanted it to not try to disassemble data sections, which is a common problem in dissassemblers.  So I wrote my own.  Then I got feedback that it needed more features, so it grew a bit, and here it is.

I wrote it under Linux, but it should compile on most modern platforms.  You might need Cygwin on Windows.

Usage:

disasm [options] [file]

Dissassemble [file]
Options:
 --addr=[xxxx]   Load the file at the specified (hex) address
 --start=[xxxx]  Specify a starting address for code execution
 --start=[xxxx]  Specify another starting address (repeat as needed
 --labels=atari,cio,float,basic Specify wanted labels (basic off by default)
 --ltable=[filename] Load label table from a file (may be repeated)
 --lfile=[filename]  Load active labels from a file (may be repeated)
 --noundoc       Undocumented opcodes imply data, not instructions
 --syntax=[option][,option]  Set various syntax options:
     bracket     Use brackets for label math: [LABEL+1]
     noa         Leave off the 'A' on ASL, ROR, and the like
     org         Use '.org =' instead of '*=' to set PC
     colon       Put a colon after labels
     noundoc     Use comments for undocumented opcodes
     mads        Defaults for MADS assembler: noa,org,colon
     ca65        Defaults for ca65 assembler: noa,org,colon
     cc65        Defaults for cc65 assembler: noa,org,colon
     xa          Defaults for xa assembler: noundoc
     asmedit     Defaults for Atari Assembler/Editor cartridge: noundoc
     noscreencode Do not add comments about screen code characters
     listing      Print address and hex codes for each line (broken for multi-byte data
     indent=[#][s|t]   Specify a number of spaces or tabs to indent (default: 1t)

If no options are specified, the file is auto-parsed for type
Supported types:
  binary load    -- any file that starts with ffff
  ROM files      -- exactly 16K or 8K with valid init and run addresses
  boot sectors   -- default if no other match

This works by loading the target file into a 64K memory array.  It flags the start and init addresses as branch targets.  Then it goes through starting at each branch target and traces through the code flow, marking each instruction start byte and each new branch target, stopping when it hits an RTS, RTI, or JMP.  This should find all the executable code, but sometimes it does things like save an address in a table and then jump through the table.  In those cases, you'll need to set a --start=xxxx to point to the address.  Likewise if the file doesn't have any start or init address set.

It will also try to infer blocks of code if a valid set of instructions follows a JMP or RTS.  This can save the day if the code uses jump tables.  If an inferred code path hits a BRK or invalid instruction, it assumes it's data.

It had built-in label tables for standard Atari OS addresses.  By default, BASIC addresses are turned off, but the others are on.  You can override this with the --labels= option.  (Leave it with nothing after the '=' to turn off all included tables.)

You can add new tables with the --ltable and --lfile options.  If you use --ltable, it will only use the labels that match something it tries to add.  If you use --lfile, it will use all the labels in the file, so they'll all appear in your disassembly.

The syntax for label files is lines of the format:
  LABEL = $ADDR [optional flags]
with various extra options.  The ADDR can be in hex ($ or 0x prefix), decimal, or octal (0-prefix).  Spaces or tabs are optional.  Anything following a ';' or '#' character is ignored.  Any line without an '=' sign is ignored.

The options are:
  +[#]     for multi-byte data (# is length in decimal, $hex, 0xHEX, or 0octal)
  -[#]     for labels referenced with negative offsets
  -r or -w for read or write only hardware register
  word     for 16-bit word values (often pointers) instead of defaulting to bytes
  dbyte    for 16-bit word values in big-endian order
  base#    specify base 2,8,10,16 for data display or 256 for string
  string   same as saying base256
  screen   screen-code string (also base255)
  /d or /i label is for data or an instruction; override auto-detect
  /p=#     label is for a subroutine that adjusts the return pointer to consume # bytes

Handling of binary load files is a bit tricky, as they can load blocks that overlap previous blocks, which is perfectly normal for initialization routines.  The disassembler watches for this, and will display the output before overwriting memory.  There may be cases where this results in code being treated as data because it was accessed from code not yet loaded.  The disassembler does track the order of the blocks, so the output will be presented in the order that the blocks appear in the original file.

In theory, you should be able to disassemble a binary load file, reassemble the output, and get the same binary.  The only difference would be the elimination of any stray FFFF headers for subsequent blocks, which are supposed to be ignored anyway.
