isofrieze / diztinguish


A Super NES ROM Disassembler

License: GNU General Public License v3.0

C# 99.73% Assembly 0.19% PowerShell 0.07%

rom snes-rom-disassembler disassembler snes super-nes nintendo


diztinguish's Issues

Navigation is still kind of sloppy and can be frustrating

I'm just going to make a huge list here of all the annoyances that pop up while using the program that deal with navigating around the ROM while working on it.

Jumps and unconditional branches always move the selector to the effective address. If there are a large number of jumps in one area (like a long compare-jump chain), this can be really annoying.
The Goto box doesn't remember what you typed last, so in the case above you have to type the address every time.
Jumping to the next/previous 'risky' instruction, or B/D/M/X modifying instruction would be super helpful.
The selected row in the table 'snaps' back on screen if it gets scrolled off the edge. This is unfortunate when you are looking for highlighted effective addresses, since you can only see as far as the window is tall.

move AppVeyor support to main repo

I don't currently have admin access on Dotsarecool/DiztinGUIsh, but once we set up an auth token, AppVeyor will build each commit and will publish a release for any tag we put on master.

Here's an example build it put together:
https://github.com/binary1230/DiztinGUIsh/releases/tag/v2.0.0.0-beta004

pretty cool

publish patched bsnes-plus and asar bundle

One of the main new features I added was the ability for Diz to talk over a local TCP socket connection to BSNES-plus, and receive compressed CDL/tracelog data in realtime.

I opened a PR at bsnes-plus, and while my code is very functional, it's not polished enough to go into upstream yet.
devinacker/bsnes-plus#268

I think for a 2.0 release of Diz, what we should do is make a bundle that takes my two patches for bsnes-plus and asar, and ships those with Diz as the officially supported partner tools. For now, we'll have to build both of those tools ourselves but, as the patches get merged back upstream [hopefully, someday], then we can go back to just linking to the stock versions.

So this task is:

build a copy of the patched bsnes-plus
build a copy of the patched Asar
Create a package with those and Diztinguish

Bug: Bad assembly code generated for bytes marked as 'Text'

Diztinguish allows me to mark any bytes as 'Text' in a project. However, if non-printable characters are marked as text, Diztinguish generates an asm output that asar cannot process.

Example: in a ROM where there is a C-like string containing a single underscore, I marked the two bytes as being 'Text' (the underscore itself, and the following 00 byte marking the end of the string).
The asm file generated by Diztinguish contains this: db "_ " (here I wrote a space after the underscore, but there is actually a 00 byte in the asm file). asar cannot process that byte, which causes the following error message:

test.asm:725: error: (E5029): Mismatched quotes. [ db "_]

If I only mark the underscore character as 'Text' and I use '8-Bit Data' for the 00 byte, then the asm file is fine and assembles correctly.
Yet it does not seem logical not to mark the 00 as text as it is part of the string.

The same issue occurs with any other non-printable characters like line feed, or most Japanese text data.
I think Diztinguish should only output quoted characters for byte values corresponding to ASCII printable characters. For any other byte value, it should output hex bytes so that asar will understand the asm file.
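The suggestion above can be sketched in a few lines. This is only an illustration in Python, not Diz's actual exporter: the function name `format_text_bytes` is hypothetical, and it assumes asar accepts quoted strings and `$xx` values mixed on a single `db` line.

```python
def format_text_bytes(data: bytes) -> str:
    """Build one `db` line: printable ASCII in quotes, everything else as hex."""
    parts = []
    run = []  # printable characters collected for the current quoted string

    def flush():
        # Close the pending quoted string, if any.
        if run:
            parts.append('"' + "".join(run) + '"')
            run.clear()

    for b in data:
        # 0x20..0x7E is printable ASCII. The double quote itself is emitted
        # as hex so it cannot terminate the string early.
        if 0x20 <= b <= 0x7E and b != 0x22:
            run.append(chr(b))
        else:
            flush()
            parts.append(f"${b:02X}")
    flush()
    return "db " + ",".join(parts)


# The underscore string from the report: '_' followed by a 00 terminator.
print(format_text_bytes(b"_\x00"))
```

With this approach the 00 terminator can stay marked as 'Text' and still come out as a plain hex byte.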

pick a final home for Settings.Defaults.*

Right now we're using Settings.Defaults.[whatever] all over the place. It's an INI- or XML-file-based store for things you want to save, but not as part of the document. We're using it to remember some UI settings and to store the previously opened file.

enhancement / nice to have: store offsets in hex in the project file

One important goal for me is having some human readability in the XML file, so that when changes are being diff'd in git/github/etc, it's possible to have a good shot at reviewing what the changes were.

Right now we're storing offsets in base 10 (which is fine), but I think it'd be easier for humans if we stored in hex.

Here's an example of a label in the XML:

<sys:Item Key="12661979">	       
<Value Name="fn_battle_init" Comment="" />
</sys:Item>

It's more obvious to a reader that this is a ROM address if it looks like this instead:

<sys:Item Key="$C134DB">	       
<Value Name="fn_battle_init" Comment="" />
</sys:Item>

could support any or all of:

$C134DB or 0xC134DB or even $C1/34DB

that makes a lot of sense too when it's a RAM label like $7E/0001
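As a rough sketch of how little code the accepted spellings would need, here is a small Python parser/formatter pair. The names `parse_offset` and `format_offset` are made up for illustration. The real change would live in Diz's C# serializer.

```python
def parse_offset(text: str) -> int:
    """Accept 12661979, $C134DB, 0xC134DB or $C1/34DB and return the integer offset."""
    s = text.strip().replace("/", "")  # drop the optional bank/offset separator
    if s.startswith("$"):
        return int(s[1:], 16)
    if s.lower().startswith("0x"):
        return int(s[2:], 16)
    return int(s, 10)  # plain decimal, i.e. the current project format


def format_offset(value: int) -> str:
    """Write an offset as a 24-bit hex address such as $C134DB."""
    return f"${value:06X}"


# The decimal key from the example above is the same address as the hex one.
print(format_offset(parse_offset("12661979")))
```

Keeping the decimal fallback in the parser would let older project files load unchanged.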

Various "Label list" bugs and annoyances

Adding a label in the main window does not add the label in the Label List. You must save and reload the project for it to appear in the label list.

Importing labels empties the Label List. You must save and reload the project for labels to reappear.

You cannot resize columns in the Label List, so you cannot fully read long labels and comments

It would be nice if double clicking an offset would jump to the location (just like clicking the "Jump to" button)

Typing text in the Label list is very awkward and not standard: if you select some characters and press CTRL-C to copy, it copies the whole table line including tabs. Pressing the left/right arrows while editing text ends the edit if you are on the first or last character. The Home/End keys do not work as expected when entering text (they should move to the first/last position).

Note: even with all these issues, DiztinGUIsh is already an excellent and invaluable tool!

create tests for some invalid situations

We should write a couple of unit tests (in the newly installed xUnit testing framework) that make sure nothing bad happens if we try to load a busted ROM, or in other common scenarios.

finish asar patch - relative addressing fixes

I hit a bug in asar where Diz was generating the right code, but asar was doing the wrong thing (it has to do with relative addressing)

I wrote a PR to fix it over there and the maintainers said they would probably be ok merging as-is. I wanted to perform a few more tests to make sure I don't break anything in asar.
https://github.com/RPGHacker/asar/pull/171/files

Once that's merged in, that's the only open asar bug that was causing me issues in the 4MB ROM project I'm working with.

Bunch of random improvements in my branch

Hey guys,

I love this tool and have been hacking on it in my own fork. There's a few pull requests open if you want them but after I did those I've also been (not that carefully, for now) throwing lots of improvements in my fork sort of sloppily.

To name a few:
ux stuff: progress bar support for long loading tasks, remember last project file and auto open it, some validation, csv import of labels
Integrated from the other fork and improved/sped up a bit the usage log and memory map import. Added ability to add comments to labels for documentation.

I also found a bug in asar revealed by diztinguish's (correct) generated asm that I submitted a patch for RPGHacker/asar#171

I'm using diztinguish and kind of modifying as I go for a fairly ambitious disassembly project that's been going well.

My informal side goal is to make it so diztinguish can be sort of a browsable 'source of truth' for looking at the whole rom, with the exported disassembly being as one-way read-only as possible. (I.e. if things are discovered about variables, offsets, constants, etc, the workflow is: you update in diztinguish and re-export, never needing to touch the generated asm by hand).

My question is, is anyone here working on any larger changes to diztinguish, i.e. should I try coordinating for larger architectural changes before I just run off doing random stuff with the fork? If so, I'd be happy to be a bit more careful and repackage what I've done in some sane pull requests. Otherwise I'll probably just keep checking stuff directly into master on my fork.

Alternatively, I'd be happy to jump in as a maintainer, maybe we get another release going and tested.

Cc @Dotsarecool @VitorVilela7 @KonKeyHD and fork authors @gocha just to see if anyone has any strong opinions on any of this.

Thanks all

optimize speed for bsnes-plus tracelog socket client

I'm doing a few dumb things with threading in BsnesTraceLogCapture

It's almost fast enough to keep up but jussssssssssssst not quite there yet.

Ideas:

provide an array of per-SNES-address locks so there's not contention on all threads for a global lock on Project.Data
replace all the threading scaffolding with more simple PLinq.ForEach()
if we stick with Tasks, optimize # of tasks, there's too many right now they're probably eating each other's CPU time to some extent.
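To make the first idea concrete, here is a minimal Python sketch of lock striping: a fixed pool of locks indexed by address, which approximates per-address locks without allocating one lock per byte. Everything here (`StripedBytes`, `STRIPE_COUNT`, `run_demo`) is hypothetical and only shows the locking layout. It says nothing about how fast the real C# code in BsnesTraceLogCapture would be.

```python
import threading

STRIPE_COUNT = 256  # hypothetical: number of locks to spread contention over


class StripedBytes:
    """Per-byte flags guarded by a pool of locks instead of one global lock."""

    def __init__(self, size: int):
        self.flags = bytearray(size)
        self._locks = [threading.Lock() for _ in range(STRIPE_COUNT)]

    def or_flag(self, address: int, flag: int) -> None:
        # Only writers whose addresses share a stripe wait on each other.
        with self._locks[address % STRIPE_COUNT]:
            self.flags[address] |= flag


def run_demo(size: int = 0x4000) -> StripedBytes:
    data = StripedBytes(size)

    def worker(bit: int) -> None:
        # Each worker sets its own bit on every address.
        for address in range(size):
            data.or_flag(address, 1 << bit)

    threads = [threading.Thread(target=worker, args=(bit,)) for bit in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data
```

A stripe count well above the number of worker threads keeps collisions rare while using far less memory than one lock per SNES address.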

Data type "Dword pointer" reverts to "graphics" after saving/loading project

How to reproduce:
Open a project
Mark some bytes as "Dword pointer" type
Save the project and close DiztinGUIsh
Load the project
The bytes are now marked as "Graphics" instead of "Dword pointer".

save before there is a project file crashes

yikes, found this.

hit New Project, open a ROM
file -> save, or CTRL+S
???????????????????????????????????????? bad, it crashes or exits or something that doesn't involve saving

Mark Many window - save last type

I was marking data and graphics in my project ROM, and I lost a lot of time choosing the right type in the Mark Many window.
It would be neat if the last used type were kept in memory while the project / program is open.

You should post the latest stable build on github

I downloaded v1.0.0.1, played around for a bit, and then encountered an (already fixed) bug that prevented me from saving.

It would be nice if the fixed version was readily available instead of requiring users to build it.

Great program, by the way. 👍

support BSNES capture of code/data logging (CDL, i.e. usage map)

bsnes outputs a gamename-usage-map.bin file which we support importing. when a byte on the address bus is accessed for either read, write, or execute, that info is recorded in this file.

instead of [or in addition to] outputting to a file, we should be able to pretty easily add an option to pipe that data out over our capture socket as well. that would make it really easy to capture both CPU trace data and CDL at the same time, making marking up ROMs a breeze.

NullReferenceException when clearing a set label

When you set a label at a certain location and afterwards you clear it, you won't be able to save the project anymore because an error message appears with the NullReferenceException text description.

If a user removes a label in the disassembly, the tool should remove the label reference as well so that this error does not occur.

The easiest way to trigger it is creating a new project and erase one of the default generated labels for vector (Reset, NMI, IRQ, etc.).

store some user settings outside project file

Actually collaborating with someone on a project now, realizing a few things about the save file format:

In the actual save format (.dizraw or .diz), we're storing a few things:

the path to the ROM AttachedRomFilename
the filename of the ROM AttachedRomFilename
and the hash of the ROM InternalCheckSum
the current offset of the main table (so it saves your last place you were looking at) CurrentViewOffset

Of all that, we should probably ONLY store the hash InternalCheckSum in the project file, and save the other settings in the user-specific settings area (the DefaultSettings area). Otherwise when multiple people are collaborating, these settings are always going to be flapping in the XML diff of the save file.

Wrong Direct Page opcodes log generation (The low byte of DP is affecting the opcode itself)

Version: latest 1859cd6

Input / Expected Output

Hex dump	Instruction	IA	D
84 FA	STY.B $FA	004305	420B

Expressions like STY.B ($4305 - $420B) are also good, but asar probably does not support such a syntax, right?

Anyway, the instruction must output 84 FA when assembled.

Actual Output

Instruction	IA	D
STY.B $05	004305	420B

The instruction outputs 84 05 when assembled.
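The arithmetic behind the expected operand is small enough to write down. This Python sketch assumes the reported output comes from taking the low byte of IA directly. That is a guess based on the numbers in this report, not something confirmed from the Diz source, and the function names are hypothetical.

```python
def direct_page_operand(ia: int, d: int) -> int:
    """Operand byte for a direct page instruction: the distance from D to IA."""
    return (ia - d) & 0xFF


def low_byte_of_ia(ia: int) -> int:
    """What the reported output looks like: D is ignored entirely."""
    return ia & 0xFF


ia, d = 0x004305, 0x420B
print(f"expected: STY.B ${direct_page_operand(ia, d):02X}")  # operand FA
print(f"reported: STY.B ${low_byte_of_ia(ia):02X}")          # operand 05
```

The two results only agree when the low byte of D is zero, which matches the title of this issue.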

Note

Apparently @VitorVilela7 has already found this bug and fixed the code.
VitorVilela7@7ab27df

Bug: percentage complete UI doesn't update after import

Main window percentage complete doesn't refresh after an import of usage or trace map

Need to invalidate it post import

figure out how to deal with routines that are called with different DP and DB values

This is sort of the opposite problem of #34.

(forgive me if I get some of the SNES register guts wrong and addressing modes wrong, still relatively new to asm)

In #34, there are parts of game code which always use the same value in the DP register as an optimization to save some bytes in the ROM. When using a trace logger, the DP value never changes over multiple runs of the game and it's pretty safe to assume it'll never change.

The question here is: how should Diz handle the output for situations (like Absolute Indexed addressing) when it knows the DB register is not constant?

consider this example:
source bytes:
BD 01 00
That's LDA with Absolute Indexed, X addressing. which means (glossing over the M flag) -->
LDA $0001, X

In one tracelog run, Diz generates this assembly code:

UNREACH_EF0001 = $EF0001

DMA_copy_BYTES_to_RAM: 
LDA.W UNREACH_EF0001,X               ;C3059D|BD0100  |EF0001
STA.L SNES_WMDATA                    ;C305A0|8F802100|002180;

after importing trace data from another run, it now generates this:

DATA8_EC0001 = $EC0001

DMA_copy_BYTES_to_RAM: 
LDA.W DATA8_EC0001,X                 ;C3059D|BD0100  |EC0001;
STA.L SNES_WMDATA                    ;C305A0|8F802100|002180;

Each time you run the game with a different capture, there will be a different DB value, since this function (which happens to be a DMA routine) is grabbing data from all over the ROM.

One more look at two other examples in the debugger and walking through step by step on the math:

remember, the original code at $C3059D says:
LDA $0001,X

in first case, X=#$03AD, DB=#$E7
LDA $0001,X
computes the final memory address like this:
LDA [DB << 16] + X + #$0001
LDA #$E70000 + #$0001 + #$03AD
LDA $E703AE ; final memory address

in the second case, X=#$CFF7, DB=#$C3
so the instruction means:
LDA $0001,X
computes the final memory address like this:
LDA [DB << 16] + X + #$0001
LDA #$C30000 + #$0001 + #$CFF7
LDA $C3CFF8 ; final memory address
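The two walkthroughs above follow the same formula, which can be checked with a short Python sketch. The function name is made up for illustration. It also reproduces the Absolute,X example from the 6502.org reference, where the sum carries into the next bank.

```python
def absolute_indexed_x(db: int, operand: int, x: int) -> int:
    """Effective 24-bit address for `LDA operand,X` with the given DB and X."""
    return ((db << 16) + operand + x) & 0xFFFFFF


# First case: X=$03AD, DB=$E7
print(f"${absolute_indexed_x(0xE7, 0x0001, 0x03AD):06X}")
# Second case: X=$CFF7, DB=$C3
print(f"${absolute_indexed_x(0xC3, 0x0001, 0xCFF7):06X}")
```

The operand bytes in the ROM stay `01 00` in every case. Only DB and X change the final address.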

All versions, when parsed by Asar, generate the correct bytes in the final rom of BD 01 00

I am guessing this works because Asar is just chopping the top byte off the label and using the lower 16 bits, so it happens to work out. example: with the label Diz generates (0xEC0001) in this last run, Asar probably just ands with 0xFFFF to put in the correct result of $0001. So if Diz throws values of DATA8_EC0001 or UNREACH_EF0001, it doesn't matter, the important part is the lower 16bit "0x0001".

So, right now, Diz is taking the pieces it has (a last value for DB and #$0001) and generating a label for it. It works OK, but it's weird for humans because the label doesn't refer to anything useful. And each time we import new tracelog data, we are swapping around tons of new labels that flap around randomly based on what the last thing the game happened to access was.

(I'm a big proponent of the generated asm code being useful for humans to read so it's possible to better understand what's going on)

**So OK, my question on this issue is, in a situation like this:**

what do we WANT Diz to output, and
do we collect enough information to infer that this is happening?

I think my answer is this, but I'd like some feedback:

Part 1: When we're capturing with tracelogging, right now we can only store ONE value for each register of D and DB value here:
https://github.com/binary1230/DiztinGUIsh/blob/master/Diz.Core/model/ROMByte.cs#L12

RomByteData.dataBank
RomByteData.directPage

Let's consider just dataBank here,

I think we need to modify those fields (or add new ones somewhere) to store information about whether more than 1 dataBank or directPage has ever been seen here. Either we could store an array of every value (like DB) that's ever been seen when executing at this address, or, we could add a new flag to mark "we have seen more than 1 DB come through here".

if that flag is not set, then DB and DP can be interpreted as "this is the only DB or the only DP that has ever come through here", solving #34.

for this issue, if it IS set, then...we can better tailor the output to be smarter. In our case above, I think we really do want to print $0001 instead of generating a label. or perhaps generate a label of OFFSET_0001, or, just leaving a comment, perhaps showing a typical example of what X and DB values might be when coming through here.
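Here is one way the "more than one DB seen" flag could work, as a hedged Python sketch. The class and field names (`TraceRegisters`, `multiple_data_banks`) are invented for illustration and do not match the real fields in ROMByte.cs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceRegisters:
    """Per-address trace info: the last DB seen, plus a 'not constant' flag."""
    data_bank: Optional[int] = None
    multiple_data_banks: bool = False

    def record_data_bank(self, db: int) -> None:
        # Set the flag the first time a different DB shows up at this address.
        if self.data_bank is not None and self.data_bank != db:
            self.multiple_data_banks = True
        self.data_bank = db


def operand_text(regs: TraceRegisters, operand: int, label: str) -> str:
    """Print the raw operand when DB varies, otherwise keep the label."""
    if regs.multiple_data_banks:
        return f"${operand:04X}"
    return label


regs = TraceRegisters()
regs.record_data_bank(0xEF)   # first tracelog run
regs.record_data_bank(0xEC)   # second run sees a different DB
print("LDA.W " + operand_text(regs, 0x0001, "DATA8_EC0001") + ",X")
```

A single flag costs one bit per address. Storing every value ever seen would allow richer comments, but at a much higher memory cost.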

ignore this, here is reference stuff for me when I forget all this in the next 5 minutes.... : )

Absolute,X
http://www.6502.org/tutorials/65c816opcodes.html#5.3
Example: If the DBR is $12, the X register is $000A, and the m flag is 0, then LDA $FFFE,X loads the low byte of the data from address $130008
$120000 + $FFFE + $000A = $130008

DBR: Data bank register, holds the default bank for memory transfers. (in BSNES, this is 'DB')
D: Direct page register, used for direct page addressing modes. (in BSNES, this is 'D')

write unit tests for SNES rom headers in various mapping mode (Hirom/lorom/etc)

see #50 for the changes that prompted this.

This might be already handled but, we should update the unit tests that deal with SNES header stuff (Checksum/complement/cart title) and make sure they work with a few additional test ROMs, particularly lorom vs hirom/etc

Problem with ASM!

I tried to disassemble some games, but the ASM looks like this:

                       lorom                                ;      |        |      ;  
                                                            ;      |        |      ;  
                                                            ;      |        |      ;  
                       ORG $808000                          ;      |        |      ;  
                                                            ;      |        |      ;  
                       db $00,$00,$00,$00,$00,$00,$00,$00   ;808000|        |      ;  
                       db $00,$00,$A3,$02,$85,$04,$A3,$01   ;808008|        |      ;  
                       db $85,$03,$18,$69,$03,$00,$83,$01   ;808010|        |000003;  
                       db $A0,$01,$00,$B7,$03,$85,$00,$C8   ;808018|        |      ;  
                       db $B7,$03,$85,$01,$20,$28,$80,$6B   ;808020|        |000003;  
                       db $AF,$08,$80,$80,$F0,$01,$60,$08   ;808028|        |808008;  
                       db $8B,$C2,$30,$A9,$FF,$FF,$8F,$17   ;808030|        |      ;  
                       db $06,$00,$E2,$20,$C2,$10,$A9,$FF   ;808038|        |000000;  
                       db $8F,$40,$21,$00,$A4,$00,$A5,$02   ;808040|        |002140;  
                       db $48,$AB,$C2,$30,$20,$59,$80,$A9   ;808048|        |      ;  
                       db $00,$00,$8F,$17,$06,$00,$AB,$28   ;808050|        |      ;  
                       db $60,$08,$C2,$30,$A9,$00,$30,$8F   ;808058|        |      ;  
                       db $41,$06,$00,$A9,$AA,$BB,$CF,$40   ;808060|        |000006;  
                       db $21,$00,$F0,$0D,$AF,$41,$06,$00   ;808068|        |000000;  
                       db $3A,$8F,$41,$06,$00,$D0,$EC,$80   ;808070|        |      ;  
                       db $FE,$E2,$20,$A9,$CC,$80,$2F,$B9   ;808078|        |0020E2;  
                       db $00,$00,$20,$03,$81,$EB,$A9,$00   ;808080|        |      ;  
                       db $80,$0F,$EB,$B9,$00,$00,$20,$03   ;808088|        |808099;  
                       db $81,$EB,$CF,$40,$21,$00,$D0,$FA   ;808090|        |0000EB;  
                       db $1A,$C2,$20,$8F,$40,$21,$00,$E2   ;808098|        |      ;

Am I doing something wrong?

Feature Request: Keyboard shortcut for "Mark One" command.

This would be useful when manually labeling code with opcode/operand labels.

Feature request: mark a 16-bit operand as being an address

I'd like the ability to mark a 16 bit immediate value operand as being the bank or offset of a label, so that the source assembly code generated by Diztinguish would refer to the label instead of the immediate values.

Let me give an example. Here is a short excerpt of asm code as output by Diztinguish from an actual SNES ROM:
LDA.W #$F714
LDX.W #$0006
STX.W $D3FE
STA.W $D3FC

This code stores the ROM address 06F714 in a 32 bit variable located in WRAM at addresses $7ED3FC and $7ED3FE.
In Diztinguish, I create a label named "MyVar" at address 06F714, where the useful data is located.

Because #$F714 and #$0006 are immediate values, Diztinguish currently has no way to guess that these are actually two parts of an address.
I would like a way to tell Diztinguish:
LDA.W #$F714 "This operand is the offset of the label MyVar"
LDX.W #$0006 "This operand is the bank of the label MyVar"

With this additional information (supplied manually by the reverse engineer), Diztinguish could generate improved assembly code looking like this:
LDA.W #MyVar
LDX.W #bank(MyVar)
STX.W $D3FE
STA.W $D3FC

The code above assembles correctly with asar. It is more readable and easily updated (moving the data to another location does not require updating the code as there are no more hard coded values).
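The requested output could be produced along these lines. This is a Python sketch with hypothetical names (`immediate_operand`, the `part` argument). It assumes the user has tagged each immediate as either the "offset" or the "bank" of a label, and it falls back to the raw value if the tag does not match.

```python
def bank(address: int) -> int:
    """Bank byte of a 24-bit address."""
    return (address >> 16) & 0xFF


def offset(address: int) -> int:
    """Low 16 bits of a 24-bit address."""
    return address & 0xFFFF


def immediate_operand(value: int, label: str, label_address: int, part: str) -> str:
    """Render a 16-bit immediate as a label reference when the annotation fits."""
    if part == "offset" and value == offset(label_address):
        return f"#{label}"
    if part == "bank" and value == bank(label_address):
        return f"#bank({label})"
    return f"#${value:04X}"  # annotation does not match: keep the raw value


my_var = 0x06F714
print("LDA.W " + immediate_operand(0xF714, "MyVar", my_var, "offset"))
print("LDX.W " + immediate_operand(0x0006, "MyVar", my_var, "bank"))
```

Checking the value against the label before substituting guards against a stale annotation silently changing the assembled bytes.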

restructure build output dir for distribution

Working on it, pretty simple:

move basically everything but the EXE into lib/
tell VS to look in there as a search path
copy the documentation in

do all this as a post-build step in visual studio

setup automated builds / auto-run unit tests

I want to use something like AppVeyor (or Travis or whatever the kids are using these days) to run:

Automated builds, so all our releases are done from a clean slate
Unit tests on each branch and PR.

test XML version migrations

One reason I went through the trouble of getting an XML based file format setup for Diz projects is ease of adding new types of data to the save file. You simply add a public get/set property to a class, and the library we're using, ExtendedXmlSerializer, picks it up and load/saves it to the XML file. Pretty great.

If we change classes later, we need to add migration support to deal with those changes. Luckily, EXS has solid-looking support for that with its Migrations feature, as seen here: https://github.com/ExtendedXmlSerializer/home/wiki/Example-Scenarios#migrate-xml-based-on-older-class-model (search for .AddMigration())

This task basically boils down to: before we release v2.0, demonstrate that migration support (at least the basics) works. I don't want to have 2.0 ship and then have to deal with supporting project save files that work around not having set up migration support.

Bug: Bank and Direct page registers are ignored when replacing 8-bit addresses with labels

If I define a label named "TEST" for address $000016 then the instruction:
STZ.B $16
is replaced with
STZ.B TEST
in both the Diz user interface and the generated asm code, no matter which values are in the B and D columns!

But this is only appropriate if both the 'B' and 'D' columns are set to 0.
If these columns have other values, the IA column contains the true address of the operand, which should not be replaced with the label for the address $000016, but with the label for the address in the IA column.
This is related to #34
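A minimal Python sketch of the lookup this report asks for: resolve the direct page operand against D first, then look for a label at the resulting address. The function names are hypothetical. The sketch models only the D column, on the assumption that direct page accesses land in bank 0, and it is about what to display, not about what asar would assemble.

```python
def direct_page_target(d: int, operand: int) -> int:
    """Address a direct page operand refers to (bank 0, wrapping at 16 bits)."""
    return (d + operand) & 0xFFFF


def operand_for(d: int, operand: int, labels: dict) -> str:
    """Use a label only if one exists at the resolved address."""
    target = direct_page_target(d, operand)
    return labels.get(target, f"${operand:02X}")


labels = {0x000016: "TEST", 0x001E0A: "character_hp"}
print("STZ.B " + operand_for(0x0000, 0x16, labels))  # D = 0: TEST applies
print("STZ.B " + operand_for(0x1E00, 0x16, labels))  # D = $1E00: TEST does not apply
```

The same lookup would also cover the case in #34, where D = $1E00 and operand $0A resolve to the character_hp label.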

New Type Proposal: Code PageBank Word Pointer

A typically used indexed-indirect-* Jxx instruction will reference a table of addresses within the current PBR with an offset.

It would be really helpful to have an annotation for a pointer that allows inference of the IA from the code page, and by extension the 'T' navigation, to use that PBR as the high byte of the long IA.

Further specification could allow for this to be identified as an opcode IA, but I'll take what I can get.
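For illustration, the inference described above amounts to combining the bank of the table's own address with each 16-bit entry. This Python sketch uses a made-up function name and assumes the table sits in the same program bank as the code that jumps through it.

```python
def read_code_bank_pointers(table: bytes, table_address: int) -> list:
    """Turn little-endian 16-bit entries into long addresses in the table's bank."""
    pbr = (table_address >> 16) & 0xFF
    targets = []
    for i in range(0, len(table) - 1, 2):
        word = table[i] | (table[i + 1] << 8)
        targets.append((pbr << 16) | word)
    return targets


# Two entries of a hypothetical jump table located at $808000.
for target in read_code_bank_pointers(bytes([0x59, 0x80, 0x28, 0x80]), 0x808000):
    print(f"${target:06X}")
```

With the long address available as the IA, the existing navigation could follow these pointers the same way it follows other operands.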

new label-specific checkboxes aren't saving to the project settings

these checkboxes aren't saving to the project settings (or just the UI). they should be remembered:

Originally posted by @binary1230 in #18 (comment)

feature: how to deal with labels when using DP

it would be interesting if there's a way to solve this issue.

consider the following ASM code:

STZ $0A                            ;C70328|640A    |001E0A;

I know from running this game and capturing the tracelog data that the final RAM address it's going to is $001E0A (which is the D register value of $1E00 + this constant of $0A).

Let's say I have a label for this like character_hp = $1E0A

In the final asm, it would be cool if there was a way to have this reference character_hp like this somehow:

STZ $character_hp

I have a feeling there's not really a way to directly put that label name in there, since DP is runtime dependent.

Still, I think Diz can at least know that it's likely to be character_hp and maybe note it in the label, or, make a search function in the app that can connect the dots here.

I'm basically trying to make it so humans can know when looking at this casually that this instruction is likely operating on character_hp

importing CSV into label list causes data to disappear

seems like it's mostly a UI issue, not actually data going away. probably something with RebindProject()

add usage map import for other CPUs

So our usage map import works great for main CPU

In the BSNES "-usage.bin" files output though, it contains usage info for all of the various kinds of CPUs.

Right now we're just reading the first part of that file (main CPU) and ignoring the rest (SA1, SFX, SPC, etc)

We can totally parse that stuff though :)

BSNES code is doing stuff like this:

fp.read(SNES::cpu.usage, 1 << 24);
fp.read(SNES::smp.usage, 1 << 16);

if (SNES::cartridge.has_sa1())
    fp.read(SNES::sa1.usage, 1 << 24);

if (SNES::cartridge.hassuperfx())
    fp.read(SNES::superfx.usage, 1 << 23);

if (SNES::cartridge.mode() == SNES::Cartridge::Mode::SuperGameBoy)
    fp.read(SNES::supergameboy.usage, 1 << 24);

so we'd have to replicate that to read these files. one snag is they're calling SNES::cartridge.has_sa1(), which means we may need the same detection routines that BSNES has to know what data is next in the file. that's... kinda unfortunate, maybe we should add some kind of tagging into the BSNES file format so we don't have to detect the file format/etc.
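Replicating that read order could look roughly like this Python sketch. The section order and sizes are copied from the BSNES snippet above. Everything else is an assumption: the function name, the idea that the caller already knows which coprocessors are present, and the idea that the file has no header in front of the first section.

```python
import io

# Section sizes in bytes, in the order the BSNES snippet reads them.
SECTION_SIZES = {
    "cpu": 1 << 24,
    "smp": 1 << 16,
    "sa1": 1 << 24,
    "superfx": 1 << 23,
    "supergameboy": 1 << 24,
}


def split_usage_map(stream, has_sa1=False, has_superfx=False,
                    is_super_game_boy=False, sizes=SECTION_SIZES):
    """Split a usage map stream into named sections, skipping absent chips."""
    wanted = ["cpu", "smp"]
    if has_sa1:
        wanted.append("sa1")
    if has_superfx:
        wanted.append("superfx")
    if is_super_game_boy:
        wanted.append("supergameboy")

    sections = {}
    for name in wanted:
        chunk = stream.read(sizes[name])
        if len(chunk) != sizes[name]:
            raise ValueError(f"usage map ended inside the {name} section")
        sections[name] = chunk
    return sections


# Tiny stand-in sizes so the example runs without a 16 MB file.
tiny = {"cpu": 4, "smp": 2, "sa1": 4, "superfx": 3, "supergameboy": 4}
print(split_usage_map(io.BytesIO(bytes(range(10))), has_sa1=True, sizes=tiny))
```

Because the section list depends on flags the file itself does not carry, a tagged format as suggested above would remove the need for the caller to pass them in.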

Request for comment: new data model proposal

I'm looking for comments from folks who are more familiar with the memory mapping of retro systems (specifically the SNES for now, but, with an eye on other stuff like NES, Genesis, whatever else we want to throw at it).

I have been thinking a lot about the data model in Diz and how we could better support the following use cases:

Decoupling of the UI
Reference data from multiple source ROM files, multiple Diz projects, etc
Record some squishier metadata about the archeology/history of the disassembly work: who performed it, how certain they are about each section, etc
Make it easier to ship data in and out of Diz via plugins, sockets, file formats, etc
Make the Diz project file be a useful reference database
Support multi-user collaboration (pipe dream: turn the backend into a REST API and have a web UI for people to mark up parts of a ROM via webpage, save in cloud, etc)
Support heavy decoupling of the UI from the underlying data format (so we can port other people's tools over, or add arbitrary 'views' of the underlying raw bytes like hex editor vs text assembly output vs grid view for disassembly etc.)
Make it possible for other people to include our UIs easily in their projects (like, drop Diz UIs into your emulator core easily)
Multithread safe (so, CPU-heavy operations like capturing realtime trace data are zippy)
Slice and dice your data into multiple regions (i.e. data vs code vs compressed data), nest and collapse them
Ubiquitous data change notifications on all classes (so, all views update when underlying data changes).
Make all this run at reasonable performance
Deal with mirrored memory
Still support Diz's main disassembly workflow (the datagrid main screen) really well as its primary operation

I've been doing work on making the UI heavily decoupled in Diz which is nearing an end, which lays the groundwork for this next phase to begin.

As an exercise, I drew up a pseudocode class diagram of what this might end up looking like. No one needs to carefully read this, I'm more interested if any of this pops up as landmines to anyone. Or if code like this already exists out there we could integrate into here.

// the main thing that gets serialized as a .diz project file.
// Diz should support projects referencing each other, and editing multiple projects at once
Project:
- ByteSources[]      // places we can get bytes from (disk, images, roms, text, or generated as decompressed or processed parts of other already loaded data)
- RootRegion         // arbitrary tree of "regions" which are subsets of specific ByteSources with specific mappings. 
                              // holds per-byte annotations, which mark things like code, data, graphics, tracelog info, and arbitrary metadata
- Builds[]                // how to turn regions into output (like generated assembly, .bin files for graphics, etc)

// ------------
// ByteSource: Immutable data sources.
// ------------

abstract ByteSource:
- Bytes[] Get only

// system-agnostic, just represents a bunch of bytes read from disk somewhere. could be rom, text, images, whatever
ByteSourceFile : ByteSource:
- SourceFilename // examples: romfile.smc romfile_bank_C0.bin graphics_pack.bin dialog.txt file.png
- StartingFileOffset = 0
- ByteCountToReadFromFile = -1

// snes-specific stuff
SNESRomSourceFile : ByteSourceFile:
- skipsmcheader = true
- RomMapping (i.e. hirom, lowrom, etc)
- Speed
- other stuff like that

GenesisRomSourceFile : // ... whatever ... //

// --------------------------------------------------------------------------------------------
// Regions define arbitrary subsets of byte sources, and hold data related to the window offset
// and how to generate their Byte data from arbitrary sequences of bytes
// 
// Regions can overlap, be overlaid on top each other, have priorities/etc.
// i.e. a "patch" can be visualized as a couple regions which are overlaid on the main ROM
//
// some workflow ideas:
// 1. dump WRAM or SPCRAM and save as a .bin file, map it as an example of data in a Region,
// annotate, and export the annotations onto the section of the ROM containing the original code
// that was copied into WRAM/spc/etc.
// 2. dump VRAM data, mark it up
// --------------------------------------------------------------------------------------------

Region : is also a ByteSource
- Mapping	           	               // options: 1:1, or using compression algorithm
- Collection<RegionOffset, Annotation>

- SubRegions[] // regions whose ByteSource is set to 'this' region

// searches our subregions first, returns anything matching there as our override. 
// if nothing found, use our own mapping.
// good for stuff like patches, where patch modifications are a sub-region we want to override whatever comes from our mapping.
- byte GetByteAt(offset)        			
- Annotations[] GetAnnotationsAt(offset)	// aggregates all annotations associated with this offset from both us and our sub-regions

// this handles mapping in both a SNES sense (like hiRom, lowRom, etc)
// but in also any arbitrary sense
MappingType:
- ByteSource SourceData
- StartingOffset	// "window" into the byte source. i.e. set to 0x10000 and count = 0xFFFF for bank C0
- ByteCount

ArbitraryMapping:
- ByteProviderStartOffset, OutputOffset
- ByteProviderByteCount, OutputOffset

// maps byte offsets into arbitrary address space. this is HiROM, LoROM, ExHiROM, etc
MappingTypeSNES:
- MapType
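
A minimal sketch of the two mapping ideas above: a plain "window" into a byte source, and a SNES-specific address translation. `WindowMapping` and `hirom_to_source_offset` are hypothetical names, and only HiROM banks $C0-$FF are handled here.

```python
from dataclasses import dataclass

BANK_SIZE = 0x10000  # 64 KiB per SNES bank


@dataclass
class WindowMapping:
    """Hypothetical 1:1 window into a byte source."""
    source: bytes
    starting_offset: int
    byte_count: int

    def read(self, offset: int) -> int:
        if not 0 <= offset < self.byte_count:
            raise IndexError(f"offset {offset:#x} is outside the window")
        return self.source[self.starting_offset + offset]


def hirom_to_source_offset(snes_address: int) -> int:
    """Map a HiROM address in banks $C0-$FF to an offset in the ROM data."""
    bank = snes_address >> 16
    if not 0xC0 <= bank <= 0xFF:
        raise ValueError("only banks $C0-$FF are handled in this sketch")
    return snes_address & 0x3FFFFF


# Two fake banks; the first byte of the second bank is 0xAB.
fake_rom = bytes(BANK_SIZE) + bytes([0xAB]) + bytes(BANK_SIZE - 1)
bank_c1 = WindowMapping(fake_rom, starting_offset=BANK_SIZE, byte_count=BANK_SIZE)
```

Keeping the window generic and the address translation separate is one way to stop the lower layers from needing to know about banks.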

// how about a byte source that reads compressed data from a region, decompresses it, and shows you the data in any of our viewers 
// (like hex editor, graphics viewer, etc.)
ByteSourceCompressed : ByteSource:
- CompressAlgorithm // i.e. standard (.gz etc) vs some game-specific algorithm
- SourceRegion
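
As a rough sketch of the compressed byte source idea, assuming zlib as the "standard" algorithm (a game-specific decompressor would be passed in instead). `CompressedByteSource` is a hypothetical name.

```python
import zlib
from typing import Callable, Optional


class CompressedByteSource:
    """Hypothetical byte source that decompresses a slice of another source."""

    def __init__(self, source: bytes, start: int, count: int,
                 decompress: Callable[[bytes], bytes] = zlib.decompress):
        self._raw = source[start:start + count]
        self._decompress = decompress
        self._cache: Optional[bytes] = None

    def _data(self) -> bytes:
        # Decompress lazily, once, so viewers can read bytes cheaply afterwards.
        if self._cache is None:
            self._cache = self._decompress(self._raw)
        return self._cache

    def __len__(self) -> int:
        return len(self._data())

    def get_byte_at(self, offset: int) -> int:
        return self._data()[offset]


packed = zlib.compress(b"TILEDATA" * 4)
fake_rom = b"\x00" * 16 + packed + b"\xff" * 8
tiles = CompressedByteSource(fake_rom, start=16, count=len(packed))
```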

// ---------------
// So here's an example of a SNES-specific mapping config
// ---------------

// up until this point, regions aren't mapped into anything address-space specific. here's an example of a SNES rom
// lower levels of the system shouldn't know anything about 'banks' etc
var SnesHiRom = new Mapping {
	Name="HiROM", 
	DestOffset=0xC00000, Count=0x40[#banks] x 0x10000[banksize]
}

var SnesWRam = new Mapping {
	Name="WRAM",
	DestOffset=0x7E0000, Count=XX[#banks] x 0x10000[banksize],
	Mirrors = {0x00, ...} // define that this memory is mirrored to other places.
}

var DizProject {
  ByteSources[] = {
	SNESRomSourceFile {"somegame.smc", skipSMCHeader = true}
  }
  Regions[] = {
	{ Name = "ROM", ByteSource = ByteSources["somegame.smc"] }
  }
}

class SNES {
	Regions[] = {
		new Region {
			Name = "Main CPU",
			SubRegions[] = { 
			  { Name = "Rom", MappingType = SnesHiRom, Source=DizProject.Regions["ROM"] },
			  { Name = "WRamCapture-BattleMode", MappingType = SnesWRam, Source=DizProject.Regions["ramdump1"] },
			  { Name = "WRamCapture-OverworldMap", MappingType = SnesWRam, Source=DizProject.Regions["ramdump2"] },
			  { Name = "CompressedData", Algorithm=Games.NintendoZip2, ..src/dst offsets... }
			}
		}
	}
}


// ---------------
// Annotations: i.e. Attach random metadata to ALL THE THINGS. attaches to offset on a particular region
// goals:
// 1. mark a single byte or a block of bytes with whatever metadata we want
// 2. be able to attach multiple of the same type of annotations to an offset, and pick one as "the real one" or "the example"
//    i.e. for tracelog data, it might be useful to keep all the previous tracelog import data, and mark one as "the real one", the rest are
//    "examples"
// 3. Store all this in a platform-agnostic format i.e. regions/annotations/etc shouldn't have to "know" they are SNES vs Genesis vs etc.
// 4. Keep or collapse as much as you like.
// ---------------

Annotation:
- metadata // optional rando metadata, dunno, like....
  - source origin (i.e. was this marked by hand, gotten from CPU tracelog, CDL trace, etc)
  - author
  - date changed
  - data reference source // (i.e. https://romhacking.net/{some_page}, etc)
  - certainty // (100%, or not sure, or wrong disassembly, or guess)
  - tags, maybe? // "overworld", "battlesystem", "boss AI system"

AnnotationDataBlock
- StartingRegionOffset
- Count
- Type // (graphics, music, table, etc)
 
// labels a specific line, literally the "label" on the left hand side of the grid
AnnotationLabel : Annotation
- Text

AnnotationComment : Annotation
- Text

AnnotationFreeSpace : AnnotationDataBlock

// placed here either by hand, or, multiple per-byte if tracelogger finds new combinations
// only one of them is marked as the "real" one
Annotation65XCpuFlags : Annotation
- dataBank
- directPage
- xFlag
- mFlag

Annotation65XInstructionByte : Annotation
Annotation65XOperandByte : Annotation

// raw data from a CDL capture (was this byte read from? written to? code run from here? etc)
AnnotationCDLEntry : Annotation
- byteflags = {unknown, read_from, written_to, executed_from}
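
Goal 2 above (several annotations of the same type at one offset, with one marked as "the real one") could look something like this. All names are hypothetical; the first candidate seen is treated as the real one until another is picked.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CpuFlags:
    data_bank: int
    direct_page: int
    x_flag: bool
    m_flag: bool


class FlagCandidates:
    """Keeps every distinct flag combination seen per offset, plus the chosen one."""

    def __init__(self) -> None:
        self._candidates: Dict[int, List[CpuFlags]] = {}
        self._primary: Dict[int, int] = {}

    def add(self, offset: int, flags: CpuFlags) -> None:
        seen = self._candidates.setdefault(offset, [])
        if flags not in seen:          # e.g. the tracelogger found a new combination
            seen.append(flags)
        self._primary.setdefault(offset, 0)

    def mark_primary(self, offset: int, flags: CpuFlags) -> None:
        self._primary[offset] = self._candidates[offset].index(flags)

    def primary(self, offset: int) -> CpuFlags:
        return self._candidates[offset][self._primary[offset]]

    def all(self, offset: int) -> List[CpuFlags]:
        return list(self._candidates.get(offset, []))


store = FlagCandidates()
first = CpuFlags(data_bank=0x7E, direct_page=0, x_flag=True, m_flag=False)
second = CpuFlags(data_bank=0x80, direct_page=0, x_flag=False, m_flag=False)
store.add(0x8000, first)
store.add(0x8000, second)
store.add(0x8000, first)   # duplicate, ignored
```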


// -----------------
// all of the above stuff is just how to STORE data and map it and mark it up.
// it's nothing about how to display, modify, or export the data, which should all be in another layer.
// ------------------



dataGrid.DataSource = new RomByteDataGridRow[1000];

// for displaying stuff on a maingrid like what Diz does now, make a display-specific class like this.
// the datagrid class is generic and will respond to the metadata here for the columns
// and the specific field values are one row

// (this is actually pretty close to what it looks like in the current bleeding edge GUI refactor)
public class RomByteDataGridRow : INotifyPropertyChanged
{
	private int offsetInRegion;
	private Region region; // arbitrary, might typically be set to SNES.Region["CpuBus"]["ROM"]

	[DisplayName("Label")]
	[Editable(true)]
	[CustomConfig(col =>
    {
        col.DefaultCellStyle = new DataGridViewCellStyle
        {
            Alignment = DataGridViewContentAlignment.MiddleRight, Font = FontHuman,
        };
        col.MaxInputLength = 60;
        col.MinimumWidth = 6;
        col.Width = 200;
    })]
	public string Label
	{
		get => region.GetAnnotation<AnnotationLabel>(offsetInRegion).Text;

		// todo (validate for valid label characters)
		// (note: validation implemented in Furious's branch, integrate here)
		set
		{
			region.GetAnnotation<AnnotationLabel>(offsetInRegion).Text = value;
			OnPropertyChanged();
		}
	}

	// program counter (Read-only)
	[DisplayName("PC")]
	[ReadOnly(true)]
	public string Offset =>
		Util.NumberToBaseString(offsetInRegion, Util.NumberBase.Hexadecimal, 6);

	// ascii version of the byte
	[DisplayName("@")]
	[ReadOnly(true)]
	public char AsciiCharRep =>
		(char) region[offsetInRegion];

	// hex version of the byte
	[DisplayName("#")]
	[ReadOnly(true)]
	public string NumericRep =>
		Util.NumberToBaseString(region[offsetInRegion], Util.NumberBase.Hexadecimal);
		
	// ....snip, add whatever other properties you want to display....
}


// annotation generation (i.e. what Diz basically does right now as its core operation)
// example: 
// - adding labels
// - disassembly workflow (like CPU Step-through, Step-in, etc)
// - marking blocks of data as graphics, codes, pointer tables, etc

class Cpu65816Operations {
	void Step(int offset, Region region) {
		// .........
	}
}

// builds - replaces current "Export Assembly"
// define how and when output artifacts (assembly files, .bin files, etc)
// are generated.
// already supported via command line
//
// would be cool if we could keep our management of this very lightweight, and use some existing build utilities.
// like generating Makefiles [or something that doesn't suck to deal with], so it can be run outside Diz.

DizProject = {
	...
	Builds[] {
		Build1={
			OutputAssemblyCode {"generated/", split_by_bank=true, flavor=CPU65816/SPC700/etc}
			Compilation {"asar.exe [params] main.asm", Output="generatedrom.sfc"}
			Defines {"RomVersion", "United States", true}
			RootRegion=this.RootRegion.SubRegion["SnesCPUBus"]["Rom"]
			Validation {
				MustBeByteIdentical {OriginalImportedRomFilename, "generatedrom.sfc"},
				MatchInternalCheckSum {[some checksum value from the rom]},
				NoPatchOverridesAllowed
			}
		},
		Build2={
			Inherit=Build1
			ApplyPatches[patchProject.RootRegion["InfiniteHitPointsPatch1"]]
			OutputDiff={Build1.output, this.output, diffWriteTo="patch.ips"} // something like this
		}
	}
}
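
The two validation steps in the build sketch could be as simple as the following. This is a hedged illustration: the function names are hypothetical, and the checksum helper only covers the plain case of a ROM whose size is a power of two (no mirroring of odd-sized ROMs).

```python
def is_byte_identical(original: bytes, rebuilt: bytes) -> bool:
    """MustBeByteIdentical: the reassembled ROM must match the imported ROM."""
    return original == rebuilt


def simple_snes_checksum(rom: bytes) -> int:
    """Sum of all bytes, truncated to 16 bits (power-of-two ROM sizes only)."""
    return sum(rom) & 0xFFFF


def checksum_pair_is_consistent(checksum: int, complement: int) -> bool:
    """The header stores the checksum and its bitwise complement."""
    return (checksum ^ complement) == 0xFFFF
```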


// fun bonus ideas:
// with this data structure, might make it easy to have either tighter integration with a Debugger (like BSNES)
// or also, invoke a real emulator on a section of a ROM (i.e. "hey BSNES: run starting at offset X til you reach 
// offset Y, using this RAM or savestate snapshot")
//
// It will also make writing custom tool integrations really simple, for things like graphics/audio editors
// or integrating game-specific tools that already exist.
//
// And, we can create arbitrary window layouts, do things like making other windows "follow along" with you, remember history.
// imagine clicking around on a ROM and when you have a line with a JMP statement, the other window shows you a preview of where you are jumping
//
// have Hex editors, byte grid viewers, assembled output previews, etc available
//
// or, hook this up to be the backend of a microservices API, and build an interactive web viewer for this data.
// imagine being able to query data from games, looking for patterns, etc. create hot-links and share them like we do with
// github issues

Labels & Mirroring for assembling

Since labels are assigned to PC offsets instead of SNES addresses, some problems arise due to the mirroring of certain memory locations in the SNES address space.

For example, if JML $808055 and JML $008055 both exist in the ROM, these will be assembled into different bytecode: 5C 55 80 80 vs 5C 55 80 00. However, both of these effective addresses map to the same PC offset, so if this offset is given a label, these two instructions will be identical, and will assemble identically.

Another example: in LoROM, SRAM is mapped to the lower half of banks $70-$7D, and the upper half of those banks is mapped to ROM. An effective address of $7DFFFF therefore points to ROM (the last byte in the bank). An autogenerated label will 'unmirror' this effective address to, say, $0DFFFF. But if this address is used as a negative index base into SRAM ($7DFFFF+1 = $7D0000), the effective address should refer to SRAM for this instruction, even though the base address itself points to ROM.

I think most of this will be fixed by mapping labels to SNES addresses instead of PC offsets, which will also allow RAM addresses to be labeled.
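
To make the first example concrete, here is a small Python sketch of a simplified LoROM translation (ROM halves only; SRAM and WRAM are ignored) next to the JML encoding. Both function names are made up for illustration.

```python
def lorom_to_pc(snes_address: int) -> int:
    """Simplified LoROM: upper half of each bank maps to ROM, bit 7 of the bank is a mirror."""
    bank = (snes_address >> 16) & 0xFF
    offset = snes_address & 0xFFFF
    if offset < 0x8000:
        raise ValueError("lower half of the bank is not ROM in this simplified model")
    return ((bank & 0x7F) << 15) | (offset & 0x7FFF)


def assemble_jml(snes_address: int) -> bytes:
    """JML long: opcode $5C followed by the 24-bit address, little-endian."""
    return bytes([0x5C,
                  snes_address & 0xFF,
                  (snes_address >> 8) & 0xFF,
                  (snes_address >> 16) & 0xFF])


# Labels keyed by PC offset cannot tell the two mirrors apart...
labels_by_pc = {lorom_to_pc(0x808055): "SomeRoutine"}
# ...while labels keyed by SNES address can.
labels_by_snes = {0x808055: "SomeRoutine_Fast", 0x008055: "SomeRoutine"}
```

Both $808055 and $008055 resolve to PC offset $000055, yet their encodings differ in the last byte, which is the information a PC-keyed label loses.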

silly: rename project?

This question is mostly "hey @Dotsarecool would you be offended if we renamed the project" : )

I was kinda thinking we rename it to Diz2 or just "Diz", so it's a little easier to type and pass around the name. The full name can still be "DiztinGUIsh" though.

Bug: Mark Many... excludes last byte.

Marking C43842-C43843 will only mark C43842 with the indicated label.
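
The cause isn't stated here, but the symptom looks like an off-by-one on an inclusive range. A sketch of the expected behaviour, with hypothetical helper names:

```python
from typing import Dict


def inclusive_count(start: int, end: int) -> int:
    """Number of bytes in start..end when both ends are included."""
    if end < start:
        raise ValueError("end must not be before start")
    return end - start + 1


def mark_many(marks: Dict[int, str], start: int, end: int, label: str) -> None:
    # range() excludes its stop value, so the end offset needs the +1.
    for offset in range(start, end + 1):
        marks[offset] = label


marks: Dict[int, str] = {}
mark_many(marks, 0xC43842, 0xC43843, "Data8")
```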

cartridge titles in SNES ROM header: verify we're doing the right thing if multi-byte

This is a continuation of #50 (more info there). If the cartridge title field in the SNES ROM header contains characters that are not one byte per character (as some Japanese glyphs are), then we might not always be doing the right thing when encoding/decoding.

We should write a unit test that shows a situation where:

var encoding = Encoding.GetEncoding(932); // 932 is "ShiftJIS"
var str = /*insert string here with multibyte Japanese chars in it*/;
var jisRawBytes = ByteUtil.GetRawShiftJisBytesFromStr(str);
Assert.NotEqual(jisRawBytes.Length, str.Length);

And then, given those conditions, check all the other areas of the code to make sure they behave nicely, paying particular attention to anything that uses RomUtil.LengthOfTitleName.

For the couple of ROM headers I tested, I'm fairly confident we have this decently implemented (and definitely for English), and there is extensive unit testing support for this now. But it would be good to test this final bit of the edge case.
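
For reference, the condition the test is after (byte length differs from character length) is easy to show in Python. The helper names are hypothetical; the 21-byte size is the title field in the SNES header, and padding with spaces is assumed here.

```python
TITLE_FIELD_BYTES = 21  # size of the title field in the SNES ROM header


def encode_title(title: str) -> bytes:
    """Encode as Shift-JIS and pad with spaces to the fixed field size."""
    raw = title.encode("shift_jis")
    if len(raw) > TITLE_FIELD_BYTES:
        raise ValueError("title does not fit in the header field")
    return raw.ljust(TITLE_FIELD_BYTES, b" ")


def decode_title(field: bytes) -> str:
    """Decode the fixed-size field and drop the space padding."""
    return field.decode("shift_jis").rstrip(" ")


title = "ゼルダ"                      # three full-width katakana characters
raw = title.encode("shift_jis")      # two bytes each in Shift-JIS
```

Anything that measures the title in characters will under-count the bytes it occupies in the header.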

I broke the sample text label output

Just realized I broke the label output in the sample assembly code [need to check the main output]. I'll have it fixed in my branch shortly, along with a fairly major rewrite.

I was going to stabilize/test what I'm doing and get it ready for a release. It's backwards compatible, but probably worth a new major version number.

Marking Many - bytes should be set automatically

When entering a higher value as the end offset, the start address (Offset) gets changed instead.

It would be more user-friendly if the byte count were the value that varies, rather than the start and end addresses being derived from the number of bytes.

The error, shown in pictures:

Let's say we'd like to mark up to D400...

Whoops... the start address changed.

feature: as a user I don't want to have to select the right generated output folder each time

annoying that Diz prompts you each time for the location of the generated output.

it should just be a text field in the "Export Disassembly" dialog box, with a browse button next to it.

then, make sure this gets saved correctly with the project settings and as a relative path so user doesn't have to think about it after setting it up one time.
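
A sketch of the relative-path idea, assuming the output folder is stored relative to the project file. The function names are hypothetical, and `posixpath` is used only so the example behaves the same on every OS.

```python
import posixpath


def to_stored_path(project_file: str, output_dir: str) -> str:
    """Path to save in the project, relative to the project file's folder."""
    return posixpath.relpath(output_dir, posixpath.dirname(project_file))


def to_absolute_path(project_file: str, stored: str) -> str:
    """Resolve the stored path again, wherever the project now lives."""
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(project_file), stored))
```

Because only "generated" is stored, the project keeps working after the whole folder is moved or checked out by someone else.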

revisit save file extension/naming, default compression settings

Right now, we save .diz files and internally compress with gzip.

The XML is mostly text, BUT the main (largest) section that serializes RomByte is a custom non-XML text-based block of data that is intended to be human readable and mergeable in git. I wanted Diz to be able to support team-oriented workflows. While merging data in close proximity to parts of the ROM might be painful, it should have a decent chance at merging data from different parts of the ROM, merged up with a tool like git.

Apart from that, the entire XML formatted output is also currently compressed with gzip as a final step.

On my computer with a 4MB ROM about 50% marked up, the gzip compressed file is ~100KB, and the uncompressed raw XML is ~1.3MB.

Decisions to be made are:

- Should the gzip compression be on by default? (I am leaning towards no, just because 1.3MB ain't bad, and the benefit of letting humans merge their teams' work together is important.)
- Filenames end with .diz or .dizraw currently. Should we just make it .xml? I'm leaning towards yourfilename.diz.xml. Diz itself doesn't care or look at what the file is called: it always tries to decompress it with gzip first, and if there's an issue, it'll try again without it. This means our file format extension is purely for humans. I like the idea of hinting that it's an XML file.
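
The "try gzip first, then fall back" loading behaviour described above amounts to something like this (a Python sketch, not the actual Diz loader; `load_project_text` is a made-up name and UTF-8 is assumed for the XML).

```python
import gzip
import zlib


def load_project_text(raw: bytes) -> str:
    """Try gzip first; if the data isn't gzip, treat it as plain XML text."""
    try:
        payload = gzip.decompress(raw)
    except (OSError, EOFError, zlib.error):
        # Not a gzip stream (or a damaged one): fall back to the raw bytes.
        payload = raw
    return payload.decode("utf-8")


xml_text = "<project><title>example</title></project>"
compressed = gzip.compress(xml_text.encode("utf-8"))
plain = xml_text.encode("utf-8")
```

Since both forms load the same way, the file extension really is only a hint for humans.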

fix Bizhawk CDL importer if needed

I may have broken it, not sure. I know it used to work, tried it recently and it didn't.

add semver support to appveyor

Appveyor is now set up and doing builds, though we may need to update it to use the SemVer versioning format that the actual project uses.

replace messagebox.show with a custom form

This is just a low-pri UI enhancement.

The app relies on MessageBox.Show() a lot. In particular, the error messages shown when you're importing, loading, saving, etc. should probably go through a custom form that doesn't look so "error-y" and gives you a sense of the progress in your workflow (when opening/importing/etc).

came from #50

update documentation

Most user-facing stuff hasn't changed so this isn't too bad, but we should touch on:

- the new XML format
- probably make a quick video about how to connect bsnes-plus and Diz together for the tracelogging
- the new visuals window

support BSNES capturing other tracelog CPU data besides just main CPU

our mod to BSNES is outputting just one CPU for the moment, the main CPU.

it's pretty trivial on the BSNES side to extend the functionality to other CPUs like the SPC, SA-1, etc.

to do that, each CPU in BSNES-plus would need a new ::disassemble_opcode_bin() function, looking like the one here:
https://github.com/binary1230/bsnes-plus/blob/e30dfc784f3c40c0db0a09124db4ec83189c575c/bsnes/snes/cpu/core/disassembler/disassembler.cpp#L224

We should pick new header IDs for each CPU and its abridged format.

and then, in Tracer, just hook up the remaining calls to dump to the new disassemble_opcode_bin() functions:
https://github.com/binary1230/bsnes-plus/blob/e30dfc784f3c40c0db0a09124db4ec83189c575c/bsnes/ui-qt/debugger/tracer.cpp#L62

specifically, in the following:

Tracer::outputSa1Trace()
Tracer::outputSfxTrace()
Tracer::outputSgbTrace()

BUG: Japanese character ShiftJIS encoding not handled correctly in XML serialization

The SNES ROM header supports encoding Japanese characters in the game title field in ShiftJIS format. However, we're incorrectly interpreting these bytes as Unicode.

Diz projects for ROMs with Japanese characters in this title field will serialize the incorrect encoding to the XML, and on load, this XML will cause Diz to think the name of the cart is different. A verification check will fail and the project will refuse to load.
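
The failure mode can be reproduced outside Diz. In this Python sketch the wrong decoding is assumed to be UTF-8 (the issue only says "Unicode"), and `roundtrip` is a hypothetical helper that mimics a save followed by a load.

```python
def roundtrip(raw: bytes, encoding: str) -> bytes:
    """Decode header bytes to text (as when saving) and encode them back (as when loading)."""
    text = raw.decode(encoding, errors="replace")
    return text.encode(encoding, errors="replace")


header_title = "ゼルダ".encode("shift_jis")

survives = roundtrip(header_title, "shift_jis") == header_title
corrupted = roundtrip(header_title, "utf-8") != header_title
```

With the right encoding the bytes survive; with the wrong one they come back different, which is what makes a title comparison fail on load.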

I have a fix in #49 underway, along with an XML migration that will fix the issue for any affected users.

Technically we don't actually need to store the title in the Project file since we are storing the checksum bytes. However, it's nice to have one extra layer of redundancy so, let's keep it.

The fix adds a lot of extra unit testing and some extra functionality for working with rom titles, checksums, etc. I'm not sure it's comprehensive, but, it appears to work well so far.

(Originally reported by LuigisBlood in SNESLab Discord, thanks!)

get tracelog import working for other CPUs (like SA-1)

tracelog file import and, separately, tracelog capture (bsnes --> Diz over a socket) work and are well tested for the main SNES CPU, but not really implemented for the other CPUs (like SA-1)

Here's what we'd need to do to fix that:

Add the SA-1 CPU to the list of CPUs in the Architecture enum here:
https://github.com/Dotsarecool/DiztinGUIsh/blob/master/Diz.Core/model/Enums.cs#L61
Each line of a tracelog file is parsed by BsnesTraceLogImporter.ParseLine()
https://github.com/Dotsarecool/DiztinGUIsh/blob/master/Diz.Core/import/BsnesTraceLogImporter.Parsers.cs#L55

For SA-1, we'd need to verify that it's capturing the same data, or build an SA-1 specific parser (should be pretty easy)

That parser is populating a ModificationData object (it's one object per line in the tracelog). The only thing that would need to be changed is adding the Architecture enum in there and marking it as from SA-1 like this:

// something like...
modData.arch = CPU_SA1; // add the 'arch' field there, set it to the new enum

That 'arch' field would need to be added here:
https://github.com/Dotsarecool/DiztinGUIsh/blob/master/Diz.Core/import/BsnesTraceLogImporter.ModificationsList.cs#L15

That's pretty straightforward though there's a little bit of extra checking/etc to pay attention to since ModificationData is heavily optimized for doing the tracelog network capturing.
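
Roughly, the shape of the change is "same parser, plus a tag saying which CPU the line came from". The Python sketch below is purely illustrative: the enum members, the `ModificationData` fields, and the simplified trace line format are all assumptions, not the real Diz or bsnes definitions.

```python
from dataclasses import dataclass
from enum import Enum


class Architecture(Enum):
    # Hypothetical member names; the real enum lives in Diz.Core.
    CPU65C816 = "65c816"
    SA1 = "sa1"


@dataclass
class ModificationData:
    snes_address: int
    data_bank: int
    direct_page: int
    arch: Architecture   # the new field: which CPU produced this line


def parse_line(line: str, arch: Architecture) -> ModificationData:
    """Parse a simplified 'ADDR KEY:VALUE ...' trace line (assumed format)."""
    fields = line.split()
    registers = dict(f.split(":", 1) for f in fields[1:])
    return ModificationData(
        snes_address=int(fields[0], 16),
        data_bank=int(registers["DB"], 16),
        direct_page=int(registers["D"], 16),
        arch=arch,
    )


sample = "c08000 A:0000 X:0000 Y:0000 S:01ff D:0000 DB:7e"
parsed = parse_line(sample, Architecture.SA1)
```

If the SA-1 trace lines turn out to carry the same fields, only the `arch` argument changes between the two import paths.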

unit tests
If anything's different, grab a couple lines from a tracelog from SA-1, and write a unit test or two to cover it:
https://github.com/Dotsarecool/DiztinGUIsh/blob/862e32f93d27bf9c545f147b4ed0c43c3a38bc81/Diz.Test/Tests/TracelogTests/TraceLogTests.cs#L8
UI: Probably add a new option to the Import menu so it has separate items for 'import SA1 tracelog' vs 'import 65816 main cpu tracelog'. If we wanted to get fancier we could setup a dialog box that has some nicer looking options. eh, later.

Somewhere on the BSNES tracelog import class is where we can stash which type of byte we're capturing.

It can all be traced pretty easily by looking at the on click handler in importBsnesTracelogText_Click()

UI: this is optional but, it might be a good idea to change the background color (or whatever) for bytes marked as SA1 so it's obvious visually. Easiest place is in here:
https://github.com/Dotsarecool/DiztinGUIsh/blob/master/DiztinGUIsh/window/MainWindow.MainTable.cs#L228

That function takes a (hardcoded...woof) column number and the index of the row (i.e. the snes byte address).

You can get to the arch like this:

// in there, something vaguely like....
style.ForeColor = Data.GetArchitecture(offset) == Architecture.SA1 ? Color.Gray : Color.Black;

test : )

I haven't tested Diz when it has different architectures in there. theoretically should be OK, but, might be worth doing some CPU step operations, mark operations, etc especially on boundaries of bytes where things go from one CPU into another.


Input / Expected Output

Actual Output

Note
