isofrieze / diztinguish
A Super NES ROM Disassembler
License: GNU General Public License v3.0
I'm just going to make a huge list here of all the annoyances that pop up while using the program that deal with navigating around the ROM while working on it.
I don't currently have admin access on Dotsarecool/DiztinGUIsh, but once we set up an auth token, AppVeyor will build each commit and publish a release for any tag we put on master.
Here's an example build it put together:
https://github.com/binary1230/DiztinGUIsh/releases/tag/v2.0.0.0-beta004
pretty cool
One of the main new features I added was the ability for Diz to talk over a local TCP socket connection to BSNES-plus, and receive compressed CDL/tracelog data in realtime.
I opened a PR at bsnes-plus, and while my code is very functional, it's not polished enough to go into upstream yet.
devinacker/bsnes-plus#268
I think for a 2.0 release of Diz, what we should do is make a bundle that takes my two patches for bsnes-plus and asar, and ships those with Diz as the officially supported partner tools. For now, we'll have to build both of those tools ourselves but, as the patches get merged back upstream [hopefully, someday], then we can go back to just linking to the stock versions.
So this task is:
Diztinguish allows me to mark any bytes as 'Text' in a project. However, if non-printable characters are marked as text, Diztinguish generates an asm output that asar cannot process.
Example: in a ROM where there is a C-like string containing a single underscore, I marked the two bytes as being 'Text' (the underscore itself, and the following 00 byte marking the end of the string).
The asm file generated by Diztinguish contains this: db "_ " (here I wrote a space after the underscore, but there is actually a 00 byte in the asm file, which asar cannot process), causing the following error message:
test.asm:725: error: (E5029): Mismatched quotes. [ db "_]
If I only mark the underscore character as 'Text' and I use '8-Bit Data' for the 00 byte, then the asm file is fine and assembles correctly.
Yet it does not seem logical not to mark the 00 as text as it is part of the string.
The same issue occurs with any other non-printable characters like line feed, or most Japanese text data.
I think Diztinguish should only output quoted characters for byte values corresponding to ASCII printable characters. For any other byte value, it should output hex bytes so that asar will understand the asm file.
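A minimal sketch of the proposed output rule, in Python rather than Diz's own code. The function name `format_text_bytes` is hypothetical, and the exact `db` layout (quoted runs and `$XX` values separated by commas) is an assumption about what asar accepts, not something confirmed in this issue.

```python
def format_text_bytes(data: bytes) -> str:
    """Render bytes as one `db` line: printable ASCII in quotes, everything else as hex."""
    parts = []
    run = []  # current run of printable characters
    for b in data:
        # Printable ASCII only; the double quote (0x22) is excluded so it
        # can never terminate the quoted string early.
        if 0x20 <= b <= 0x7E and b != 0x22:
            run.append(chr(b))
        else:
            if run:
                parts.append('"' + "".join(run) + '"')
                run = []
            parts.append(f"${b:02X}")
    if run:
        parts.append('"' + "".join(run) + '"')
    return "db " + ",".join(parts)


print(format_text_bytes(b"_\x00"))  # db "_",$00
```

With this rule the 00 terminator can stay marked as 'Text' and the exported line no longer contains a raw 00 byte.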
Right now we're using Settings.Defaults.[whatever] all over the place, it's like an INI or XML file based storage for things you want to save, but not as part of the document. We're using it to remember some UI settings, and store the previously opened file.
One important goal for me is having some human readability in the XML file, so that when changes are being diff'd in git/github/etc, it's possible to have a good shot at reviewing what the changes were.
Right now we're storing offsets in base 10 (which is fine), but I think it'd be easier for humans if we stored in hex.
Here's an example of a label in the XML:
<sys:Item Key="12661979">
<Value Name="fn_battle_init" Comment="" />
</sys:Item>
It's more obvious to a reader that this is a ROM address if it looks like this instead:
<sys:Item Key="$C134DB">
<Value Name="fn_battle_init" Comment="" />
</sys:Item>
could support any or all of:
$C134DB or 0xC134DB or even $C1/34DB
that makes a lot of sense too when it's a RAM label like $7E/0001
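Here is a rough sketch of how the key parsing could work, written in Python for brevity (Diz itself is C#). Both function names are hypothetical. It accepts the three hex spellings listed above and still reads the existing base-10 keys, so old project files would keep loading.

```python
def parse_address_key(key: str) -> int:
    """Parse an XML label key written as $C134DB, 0xC134DB, $C1/34DB, or legacy base 10."""
    text = key.strip().replace("/", "")  # drop the optional bank separator
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    return int(text, 10)  # existing files store offsets in base 10


def format_address_key(address: int) -> str:
    """Write an address back out in the $XXXXXX form."""
    return f"${address:06X}"


print(format_address_key(parse_address_key("12661979")))  # $C134DB
```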
Adding a label in the main window does not add the label in the Label List. You must save and reload the project for it to appear in the label list.
Importing labels empties the Label List. You must save and reload the project for labels to reappear.
You cannot resize columns in the Label List, so you cannot fully read long labels and comments
It would be nice if double clicking an offset would jump to the location (just like clicking the "Jump to" button)
Typing text in the Label List is very awkward and non-standard: if you select some characters and press CTRL-C to copy, it copies the whole table row, including the tab separators. Pressing the left/right arrows while editing text ends the edit if you are on the first or last character. The Home/End keys do not work as expected when entering text (moving to the first/last position).
Note: even with all these issues, DiztinGUIsh is already an excellent and invaluable tool!
We should write a couple of unit tests (in the newly installed xUnit testing framework) that make sure nothing bad happens if we try to load a busted ROM, or in other common scenarios.
I hit a bug in asar where Diz was generating the right code, but asar was doing the wrong thing (it has to do with relative addressing)
I wrote a PR to fix it over there and the maintainers said they would probably be ok merging as-is. I wanted to perform a few more tests to make sure I don't break anything in asar.
https://github.com/RPGHacker/asar/pull/171/files
Once that's merged in, that's the only open asar bug that was causing me issues in the 4MB ROM project I'm working with.
Hey guys,
I love this tool and have been hacking on it in my own fork. There's a few pull requests open if you want them but after I did those I've also been (not that carefully, for now) throwing lots of improvements in my fork sort of sloppily.
To name a few:
ux stuff: progress bar support for long loading tasks, remember last project file and auto open it, some validation, csv import of labels
Integrated from the other fork and improved/sped up a bit the usage log and memory map import. Added ability to add comments to labels for documentation.
I also found a bug in asar revealed by diztinguish's (correct) generated asm that I submitted a patch for RPGHacker/asar#171
I'm using diztinguish and kind of modifying as I go for a fairly ambitious disassembly project that's been going well.
My informal side goal is to make it so diztinguish can be sort of a browsable 'source of truth' for looking at the whole rom, with the exported disassembly being as one-way read-only as possible. (I.e. if things are discovered about variables, offsets, constants, etc, the workflow is: you update in diztinguish and re-export, never needing to touch the generated asm by hand).
My question is, is anyone here working on any larger changes to diztinguish, i.e. should I try coordinating for larger architectural changes before I just run off doing random stuff with the fork? If so, I'd be happy to be a bit more careful and repackage what I've done in some sane pull requests. Otherwise I'll probably just keep checking stuff directly into master on my fork.
Alternatively, I'd be happy to jump in as a maintainer, maybe we get another release going and tested.
Cc @Dotsarecool @VitorVilela7 @KonKeyHD and fork authors @gocha just to see if anyone has any strong opinions on any of this.
Thanks all
I'm doing a few dumb things with threading in BsnesTraceLogCapture
It's almost fast enough to keep up but jussssssssssssst not quite there yet.
Ideas:
How to reproduce:
Open a project
Mark some bytes as "Dword pointer" type
Save the project and close DiztinGUIsh
Load the project
The bytes are now marked as "Graphics" instead of "Dword pointer".
yikes, found this.
I downloaded v1.0.0.1, played around for a bit, and then encountered an (already fixed) bug that prevented me from saving.
It would be nice if the fixed version was readily available instead of requiring users to build it.
Great program, by the way.
bsnes outputs a gamename-usage-map.bin file which we support importing. when a byte on the address bus is accessed for either read, write, or execute, that info is recorded in this file.
instead of [or in addition to] outputting to a file, we should be able to pretty easily add an option to pipe that data out over our capture socket as well. that would make it really easy to capture both CPU trace data and CDL at the same time, making marking up ROMs a breeze.
When you set a label at a certain location and afterwards you clear it, you won't be able to save the project anymore because an error message appears with the NullReferenceException text description.
If a user removes a label in the disassembly, the tool should remove the label reference as well, so that this error doesn't occur.
The easiest way to trigger it is creating a new project and erase one of the default generated labels for vector (Reset, NMI, IRQ, etc.).
Actually collaborating with someone on a project now, realizing a few things about the save file format:
In the actual save format (.dizraw or .diz), we're storing a few things:
AttachedRomFilename
InternalCheckSum
CurrentViewOffset
Of all that, we should probably ONLY store the hash InternalCheckSum in the project file, and save the other settings in the user-specific settings area (the DefaultSettings area). Otherwise when multiple people are collaborating, these settings are always going to be flapping in the XML diff of the save file.
Version: latest 1859cd6
Hex dump | Instruction | IA | D |
---|---|---|---|
84 FA | STY.B $FA | 004305 | 420B |
Expressions like STY.B ($4305 - $420B) are also good, but asar probably does not support such a syntax, right?
Anyway, the instruction must output 84 FA when assembled.
Instruction | IA | D |
---|---|---|
STY.B $05 | 004305 | 420B |
The instruction outputs 84 05 when assembled.
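To spell out the arithmetic behind the two tables: the direct page operand is the intermediate address (IA) minus the D register, not the low byte of IA. A small Python sketch, with a hypothetical function name:

```python
def direct_page_operand(ia: int, d: int) -> int:
    """Return the one-byte operand that reaches `ia` when the direct page register is `d`."""
    offset = ((ia & 0xFFFF) - d) & 0xFFFF  # direct page accesses wrap within bank $00
    if offset > 0xFF:
        raise ValueError(f"${ia:06X} is not reachable from D=${d:04X} with a one-byte operand")
    return offset


operand = direct_page_operand(0x004305, 0x420B)
print(f"84 {operand:02X}")  # 84 FA
```

Taking the low byte of IA instead ($05) gives the wrong 84 05 encoding shown in the second table.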
Apparently @VitorVilela7 has already found this bug and fixed the code.
VitorVilela7@7ab27df
Main window percentage complete doesn't refresh after an import of usage or trace map
Need to invalidate it post import
This is sort of the opposite problem of #34.
(forgive me if I get some of the SNES register guts wrong and addressing modes wrong, still relatively new to asm)
In #34, there are parts of game code which always use the same value in the DP register as an optimization to save some bytes in the ROM. When using a trace logger, the DP value never changes over multiple runs of the game and it's pretty safe to assume it'll never change.
The question here is: how should Diz handle the output for situations (like Absolute Indexed addressing) when it knows the DB register is not constant?
consider this example:
source bytes:
BD 01 00
That's LDA with Absolute Indexed, X addressing. which means (glossing over the M flag) -->
LDA $0001, X
In one tracelog run, Diz generates this assembly code:
UNREACH_EF0001 = $EF0001
DMA_copy_BYTES_to_RAM:
LDA.W UNREACH_EF0001,X ;C3059D|BD0100 |EF0001
STA.L SNES_WMDATA ;C305A0|8F802100|002180;
after importing trace data from another run, it now generates this:
DATA8_EC0001 = $EC0001
DMA_copy_BYTES_to_RAM:
LDA.W DATA8_EC0001,X ;C3059D|BD0100 |EC0001;
STA.L SNES_WMDATA ;C305A0|8F802100|002180;
Each time you run the game with a different capture, there will be different DB value, since this function (happens to be a DMA routine) is grabbing data from all over the ROM.
One more look at two other examples in the debugger and walking through step by step on the math:
remember, the original code at $C3059D says:
LDA $0001,X
in first case, X=#$03AD, DB=#$E7
LDA $0001,X
computes the final memory address like this:
LDA [DB << 16] + X + #$0001
LDA #$E70000 + #$0001 + #$03AD
LDA $E703AE ; final memory address
in the second case, X=#$CFF7, DB=#$C3
so the instruction means:
LDA $0001,X
computes the final memory address like this:
LDA [DB << 16] + X + #$0001
LDA #$C30000 + #$0001 + #$CFF7
LDA $C3CFF8 ; final memory address
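The same math as a tiny Python helper (the function name is made up for illustration). It checks both debugger cases above, plus the DBR=$12 example from the 6502.org reference quoted further down, where the sum carries into the next bank.

```python
def absolute_indexed_address(db: int, operand: int, x: int) -> int:
    """Effective 24-bit address for Absolute,X: (DB << 16) + operand + X."""
    return ((db << 16) + operand + x) & 0xFFFFFF


print(f"${absolute_indexed_address(0xE7, 0x0001, 0x03AD):06X}")  # $E703AE
print(f"${absolute_indexed_address(0xC3, 0x0001, 0xCFF7):06X}")  # $C3CFF8
print(f"${absolute_indexed_address(0x12, 0xFFFE, 0x000A):06X}")  # $130008
```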
All versions, when parsed by Asar, generate the correct bytes in the final rom of BD 01 00
I am guessing this works because Asar is just chopping the top byte off the label and using the lower 16 bits, so it happens to work out. example: with the label Diz generates (0xEC0001) in this last run, Asar probably just ands with 0xFFFF to put in the correct result of $0001. So if Diz throws values of DATA8_EC0001 or UNREACH_EF0001, it doesn't matter, the important part is the lower 16bit "0x0001".
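A sketch of that guess, in Python. This only models the assumed behaviour (keeping the low 16 bits of the label for a `.W` operand). It is not taken from asar's source, and the function name is hypothetical.

```python
def encode_lda_abs_x(label_value: int) -> bytes:
    """Encode LDA.W label,X (opcode $BD), keeping only the low 16 bits of the label."""
    operand = label_value & 0xFFFF
    return bytes([0xBD, operand & 0xFF, operand >> 8])  # little-endian operand


for label in (0xEF0001, 0xEC0001):
    print(encode_lda_abs_x(label).hex(" ").upper())  # BD 01 00 both times
```

Under that assumption the bank byte of the generated label never reaches the ROM, which is why both label variants round-trip to the same bytes.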
So, right now, Diz is taking the pieces it has (a last value for DB and #$0001) and generating a label for it. It works OK, but it's weird for humans because the label doesn't refer to anything useful. And each time we import new tracelog data, we are swapping around tons of new labels that flap around randomly based on what the last thing the game happened to access was.
(I'm a big proponent of the generated asm code being useful for humans to read so it's possible to better understand what's going on)
So OK, my question on this issue is, in a situation like this,
I think my answer is this, but I'd like some feedback:
Part 1: When we're capturing with tracelogging, right now we can only store ONE value each for the D and DB registers here:
https://github.com/binary1230/DiztinGUIsh/blob/master/Diz.Core/model/ROMByte.cs#L12
RomByteData.dataBank
RomByteData.directPage
Let's consider just dataBank here,
I think we need to modify those fields (or add new ones somewhere) to store information about whether more than 1 dataBank or directPage has ever been seen here. Either we could store an array of every value (like DB) that's ever been seen when executing at this address, or, we could add a new flag to mark "we have seen more than 1 DB come through here".
if that flag is not set, then DB and DP can be interpreted as "this is the only DB or the only DP that has ever come through here", solving #34.
for this issue, if it IS set, then...we can better tailor the output to be smarter. In our case above, I think we really do want to print $0001 instead of generating a label. or perhaps generate a label of OFFSET_0001, or, just leaving a comment, perhaps showing a typical example of what X and DB values might be when coming through here.
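Roughly what I have in mind, sketched in Python instead of the real C# model. `RegisterHistory` and `format_operand` are hypothetical names and none of this exists in Diz today. It takes the "store every value seen" option, and the "more than one seen" flag falls out of the set size.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterHistory:
    """Every value of one register (DB or D) observed while executing at an offset."""
    seen: set = field(default_factory=set)

    def record(self, value: int) -> None:
        self.seen.add(value)

    @property
    def is_constant(self) -> bool:
        # Exactly one value observed: safe to treat it as "the" register value.
        return len(self.seen) == 1


def format_operand(operand: int, db_history: RegisterHistory) -> str:
    """Use a bank-qualified label only when DB never changed; otherwise print the raw operand."""
    if db_history.is_constant:
        (db,) = db_history.seen
        return f"DATA8_{(db << 16) | operand:06X}"
    return f"${operand:04X}"


history = RegisterHistory()
history.record(0xEF)
history.record(0xEC)
print(format_operand(0x0001, history))  # $0001
```

Storing the full set costs more memory per byte than a single flag, so the flag may still be the better fit for large ROMs.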
ignore this, here is reference stuff for me when I forget all this in the next 5 minutes.... : )
Absolute,X
http://www.6502.org/tutorials/65c816opcodes.html#5.3
Example: If the DBR is $12, the X register is $000A, and the m flag is 0, then LDA $FFFE,X loads the low byte of the data from address $130008
$120000 + $FFFE + $000A = $130008
DBR: Data bank register, holds the default bank for memory transfers. (in BSNES, this is 'DB')
D: Direct page register, used for direct page addressing modes. (in BSNES, this is 'D')
see #50 for the changes that prompted this.
This might be already handled but, we should update the unit tests that deal with SNES header stuff (Checksum/complement/cart title) and make sure they work with a few additional test ROMs, particularly lorom vs hirom/etc
I tried to disassemble some games, but the ASM looks like this:
lorom ; | | ;
; | | ;
; | | ;
ORG $808000 ; | | ;
; | | ;
db $00,$00,$00,$00,$00,$00,$00,$00 ;808000| | ;
db $00,$00,$A3,$02,$85,$04,$A3,$01 ;808008| | ;
db $85,$03,$18,$69,$03,$00,$83,$01 ;808010| |000003;
db $A0,$01,$00,$B7,$03,$85,$00,$C8 ;808018| | ;
db $B7,$03,$85,$01,$20,$28,$80,$6B ;808020| |000003;
db $AF,$08,$80,$80,$F0,$01,$60,$08 ;808028| |808008;
db $8B,$C2,$30,$A9,$FF,$FF,$8F,$17 ;808030| | ;
db $06,$00,$E2,$20,$C2,$10,$A9,$FF ;808038| |000000;
db $8F,$40,$21,$00,$A4,$00,$A5,$02 ;808040| |002140;
db $48,$AB,$C2,$30,$20,$59,$80,$A9 ;808048| | ;
db $00,$00,$8F,$17,$06,$00,$AB,$28 ;808050| | ;
db $60,$08,$C2,$30,$A9,$00,$30,$8F ;808058| | ;
db $41,$06,$00,$A9,$AA,$BB,$CF,$40 ;808060| |000006;
db $21,$00,$F0,$0D,$AF,$41,$06,$00 ;808068| |000000;
db $3A,$8F,$41,$06,$00,$D0,$EC,$80 ;808070| | ;
db $FE,$E2,$20,$A9,$CC,$80,$2F,$B9 ;808078| |0020E2;
db $00,$00,$20,$03,$81,$EB,$A9,$00 ;808080| | ;
db $80,$0F,$EB,$B9,$00,$00,$20,$03 ;808088| |808099;
db $81,$EB,$CF,$40,$21,$00,$D0,$FA ;808090| |0000EB;
db $1A,$C2,$20,$8F,$40,$21,$00,$E2 ;808098| | ;
Am I doing something wrong?
This would be useful when manually labeling code with opcode/operand labels.
I'd like the ability to mark a 16 bit immediate value operand as being the bank or offset of a label, so that the source assembly code generated by Diztinguish would refer to the label instead of the immediate values.
Let me give an example. Here is a short excerpt of asm code as output by Diztinguish from an actual SNES ROM:
LDA.W #$F714
LDX.W #$0006
STX.W $D3FE
STA.W $D3FC
This code stores the ROM address 06F714 in a 32 bit variable located in WRAM at addresses $7ED3FC and $7ED3FE.
In Diztinguish, I create a label named "MyVar" at address 06F714, where the useful data is located.
Because #$F714 and #$0006 are immediate values, Diztinguish currently has no way to guess that these are actually two parts of an address.
I would like a way to tell Diztinguish:
LDA.W #$F714 "This operand is the offset of the label MyVar"
LDX.W #$0006 "This operand is the bank of the label MyVar"
With this additional information (supplied manually by the reverse engineer), Diztinguish could generate improved assembly code looking like this:
LDA.W #MyVar
LDX.W #bank(MyVar)
STX.W $D3FE
STA.W $D3FC
The code above assembles correctly with asar. It is more readable and easily updated (moving the data to another location does not require updating the code as there are no more hard coded values).
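A sketch of the check Diztinguish could run once the user has tagged an immediate operand, written in Python for illustration. `render_immediate` and the `part` values are hypothetical. The output syntax is the one from the example above.

```python
def render_immediate(value: int, label: str, label_address: int, part: str) -> str:
    """Render a 16-bit immediate as a label reference after checking that it really matches."""
    if part == "offset":
        expected, text = label_address & 0xFFFF, f"#{label}"
    elif part == "bank":
        expected, text = (label_address >> 16) & 0xFF, f"#bank({label})"
    else:
        raise ValueError(f"unknown part: {part}")
    if value != expected:
        # A stale or mistaken annotation must not silently change the assembled bytes.
        raise ValueError(f"#${value:04X} is not the {part} of {label}")
    return text


print("LDA.W", render_immediate(0xF714, "MyVar", 0x06F714, "offset"))  # LDA.W #MyVar
print("LDX.W", render_immediate(0x0006, "MyVar", 0x06F714, "bank"))    # LDX.W #bank(MyVar)
```

Validating the annotation against the label address at export time keeps the generated asm byte-identical to the ROM even if the label is later moved.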
Working on it, pretty simple:
do all this as a post-build step in visual studio
I want to use something like AppVeyor (or Travis or whatever the kids are using these days) to run:
One reason I went through the trouble of getting an XML based file format setup for Diz projects is ease of adding new types of data to the save file. You simply add a public get/set property to a class, and the library we're using, ExtendedXmlSerializer, picks it up and load/saves it to the XML file. Pretty great.
If we change those classes later, we need to add migration support to deal with the changes. Luckily, EXS has solid-looking support for that with its Migrations feature, as seen here: https://github.com/ExtendedXmlSerializer/home/wiki/Example-Scenarios#migrate-xml-based-on-older-class-model (search for .AddMigration())
This task basically boils down to: before we release v2.0, demonstrate that migration support (at least the basics) works. I don't want to have 2.0 ship and then have to deal with supporting project save files that work around not having set up migration support.
If I define a label named "TEST" for address $000016 then the instruction:
STZ.B $16
is replaced with
STZ.B TEST
in both the Diz user interface and the generated asm code, no matter which values are in the B and D columns!
But this is only appropriate if both the 'B' and 'D' columns are set to 0.
If these columns have other values, the IA column contains the true address of the operand, which should not be replaced with the label for the address $000016, but with the label for the address in the IA column.
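A simplified Python sketch of the lookup described here. The function name and the `labels` dictionary are hypothetical. It only considers the D column, on the grounds that direct page accesses always resolve within bank $00.

```python
def label_for_direct_page(operand: int, d: int, labels: dict) -> str:
    """Pick the label for the effective address (IA), not for the raw operand byte."""
    ia = (d + operand) & 0xFFFF  # direct page address, wrapping inside bank $00
    name = labels.get(ia)
    return name if name is not None else f"${operand:02X}"


labels = {0x000016: "TEST"}
print("STZ.B", label_for_direct_page(0x16, 0x0000, labels))  # STZ.B TEST
print("STZ.B", label_for_direct_page(0x16, 0x1E00, labels))  # STZ.B $16
```

Whether asar would assemble a label at a non-zero-page IA back to the original one-byte operand is not checked here, so this is safest read as a display rule.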
This is related to #34
A typically used indexed-indirect-* Jxx instruction will reference a table of addresses within the current PBR with an offset.
It would be really helpful to have an annotation for a pointer that allows inference of the IA from the code page, and by extension the 'T' navigation, to use that PBR as the high byte of the long IA.
Further specification could allow for this to be identified as an opcode IA, but I'll take what I can get.
these checkboxes aren't saving to the project settings (or just the UI). they should be remembered:
Originally posted by @binary1230 in #18 (comment)
it would be interesting if there's a way to solve this issue.
consider the following ASM code:
STZ $0A ;C70328|640A |001E0A;
I know from running this game and capturing the tracelog data that the final RAM address it's going to is $001E0A (which is the direct page register D value of $1E00 + this constant of $0A).
Let's say I have a label for this like character_hp = $1E0A
In the final asm, it would be cool if there was a way to have this reference character_hp like this somehow:
STZ $character_hp
I have a feeling there's not really a way to directly put that label name in there, since DP is runtime dependent.
Still, I think Diz can at least know that it's likely to be character_hp and maybe note it in the label, or, make a search function in the app that can connect the dots here.
I'm basically trying to make it so humans can know when looking at this casually that this instruction is likely operating on character_hp
seems like it's mostly a UI issue, not actually data going away. probably something with RebindProject()
So our usage map import works great for main CPU
In the BSNES "-usage.bin" files output though, it contains usage info for all of the various kinds of CPUs.
Right now we're just reading the first part of that file (main CPU) and ignoring the rest (SA1, SFX, SPC, etc)
looks like this:
We can totally parse that stuff though :)
BSNES code is doing stuff like this:
fp.read(SNES::cpu.usage, 1 << 24);
fp.read(SNES::smp.usage, 1 << 16);
if (SNES::cartridge.has_sa1())
fp.read(SNES::sa1.usage, 1 << 24);
if (SNES::cartridge.hassuperfx())
fp.read(SNES::superfx.usage, 1 << 23);
if (SNES::cartridge.mode() == SNES::Cartridge::Mode::SuperGameBoy)
fp.read(SNES::supergameboy.usage, 1 << 24);
so we'd have to replicate that to read these files. one snag is they're calling SNES::cartridge.has_sa1(), which means we may need the same detection routines that BSNES has to know what data is next in the file. that's... kinda unfortunate, maybe we should add some kind of tagging into the BSNES file format so we don't have to detect the file format/etc.
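For reference, here is what replicating that read order might look like on our side, sketched in Python. The section sizes and order come straight from the BSNES snippet above. The function name and the boolean parameters are hypothetical, and the caller would still have to detect the cartridge type somehow.

```python
import io

# (section name, size in bytes), in the order the BSNES snippet reads them
SECTION_SIZES = [
    ("cpu", 1 << 24),
    ("smp", 1 << 16),
    ("sa1", 1 << 24),
    ("superfx", 1 << 23),
    ("supergameboy", 1 << 24),
]


def read_usage_sections(stream, has_sa1=False, has_superfx=False, is_super_game_boy=False):
    """Split a usage file into per-processor blobs; optional sections depend on the cartridge."""
    present = {
        "cpu": True,
        "smp": True,
        "sa1": has_sa1,
        "superfx": has_superfx,
        "supergameboy": is_super_game_boy,
    }
    sections = {}
    for name, size in SECTION_SIZES:
        if not present[name]:
            continue
        data = stream.read(size)
        if len(data) != size:
            raise ValueError(f"usage file truncated in section '{name}'")
        sections[name] = data
    return sections


blob = bytes((1 << 24) + (1 << 16))  # a plain cartridge: cpu + smp only
print(sorted(read_usage_sections(io.BytesIO(blob))))  # ['cpu', 'smp']
```

If the file carried a small tag per section, the three boolean parameters could go away.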
I'm looking for comments from folks who are more familiar with the memory mapping of retro systems (specifically the SNES for now, but, with an eye on other stuff like NES, Genesis, whatever else we want to throw at it).
I have been thinking a lot about the data model in Diz and how we could better support the following use cases:
I've been doing work on making the UI heavily decoupled in Diz which is nearing an end, which lays the groundwork for this next phase to begin.
As an exercise, I drew up a pseudocode class diagram of what this might end up looking like. No one needs to carefully read this, I'm more interested if any of this pops up as landmines to anyone. Or if code like this already exists out there we could integrate into here.
// the main thing that gets serialized as a .diz project file.
// Diz should support projects referencing each other, and editing multiple projects at once
Project:
- ByteSources[] // places we can get bytes from (disk, images, roms, text, or generated as decompressed or processed parts of other already loaded data)
- RootRegion // arbitrary tree of "regions" which are subsets of specific ByteSources with specific mappings.
// holds per-byte annotations, which mark things like code, data, graphics, tracelog info, and arbitrary metadata
- Builds[] // how to turn regions into output (like generated assembly, .bin files for graphics, etc)
// ------------
// ByteSource: Immutable data sources.
// ------------
abstract ByteSource:
- Bytes[] Get only
// system-agnostic, just represents a bunch of bytes read from disk somewhere. could be rom, text, images, whatever
ByteSourceFile : ByteSource:
- SourceFilename // examples: romfile.smc romfile_bank_C0.bin graphics_pack.bin dialog.txt file.png
- StartingFileOffset = 0
- ByteCountToReadFromFile = -1
// snes-specific stuff
SNESRomSourceFile : ByteSourceFile:
- skipsmcheader = true
- RomMapping (i.e. hirom, lowrom, etc)
- Speed
- other stuff like that
GenesisRomSourceFile : // ... whatever ... //
// --------------------------------------------------------------------------------------------
// Regions define arbitrary subsets of byte sources, and hold data related to the window offset
// and how to generate their Byte data from arbitrary sequences of bytes
//
// Regions can overlap, be overlaid on top each other, have priorities/etc.
// i.e. a "patch" can be visualized as a couple regions which are overlaid on the main ROM
//
// some workflow ideas:
// 1. dump WRAM or SPCRAM and save as a .bin file, map it as an example of data in a Region,
// annotate, and export the annotations onto the section of the ROM containing the original code
// that was copied into WRAM/spc/etc.
// 2. dump VRAM data, mark it up
// --------------------------------------------------------------------------------------------
Region : is also a ByteSource
- Mapping // options: 1:1, or using compression algorithm
- Collection<RegionOffset, Annotation>
- SubRegions[] // regions whose ByteSource is set to 'this' region
// searches our subregions first, returns anything matching there as our override.
// if nothing found, use our own mapping.
// good for stuff like patches, where patch modifications are a sub-region we want to override whatever comes from our mapping.
- byte GetByteAt(offset)
- Annotations[] GetAnnotationsAt(offset) // aggregates all annotations associated with this offset from both us and our sub-regions
// this handles mapping in both a SNES sense (like hiRom, lowRom, etc)
// but in also any arbitrary sense
MappingType:
- ByteSource SourceData
- StartingOffset // "window" into the byte source. i.e. set to 0x10000 and count = 0xFFFF for bank C0
- ByteCount
ArbitraryMapping:
- ByteProviderStartOffset, OutputOffset
- ByteProviderByteCount, OutputOffset
// maps byte offsets into arbitrary address space. this is HiROM, LowROM, ExHIRom, etc
MappingTypeSNES:
- MapType
// how about a byte source that reads compressed data from a region, decompresses it, and shows you the data in any of our viewers
// (like hex editor, graphics viewer, )
ByteSourceCompressed : ByteSource:
- CompressAlgorithm // i.e. standard (.gz etc) vs some game-specific algorithm
- SourceRegion
// ---------------
// So here's an example of a SNES-specific mapping config
// ---------------
// up until this point, regions aren't mapped into anything address-space specific. here's an example of a SNES rom
// lower levels of the system shouldn't know anything about 'banks' etc
var SnesHiRom = new Mapping {
Name="HiROM",
DestOffset=0xC00000, Count=0x40[#banks] x 0x10000[banksize]
}
var SnesWRam = new Mapping {
Name="WRAM",
DestOffset=0x7E0000, Count=XX[#banks] x 0x10000[banksize],
Mirrors = {0x00, ...} // define that this memory is mirrored to other places.
}
var DizProject {
ByteSources[] = {
SNESRomSourceFile {"somegame.smc", skipSMCHeader = true}
}
Regions[] = {
{ Name = "ROM", ByteSource = ByteSources["somegame.smc"] }
}
}
class SNES {
Regions[] = {
new Region {
Name = "Main CPU",
SubRegions[] = {
{ Name = "Rom", MappingType = SnesHiRom, Source=DizProject.Regions["ROM"] },
{ Name = "WRamCapture-BattleMode", MappingType = SnesWRam, Source=DizProject.Regions["ramdump1"] },
{ Name = "WRamCapture-OverworldMap", MappingType = SnesWRam, Source=DizProject.Regions["ramdump2"] },
{ Name = "CompressedData", Algorithm=Games.NintendoZip2, ..src/dst offsets... }
}
},
}
// ---------------
// Annotations: i.e. Attach random metadata to ALL THE THINGS. attaches to offset on a particular region
// goals:
// 1. mark a single byte or a block of bytes with whatever metadata we want
// 2. be able to attach multiple of the same type of annotations to an offset, and pick one as "the real one" or "the example"
// i.e. for tracelog data, it might be useful to keep all the previous tracelog import data, and mark one as "the real one", the rest are
// "examples"
// 3. Store all this in a platform-agnostic format i.e. regions/annotations/etc shouldn't have to "know" they are SNES vs Genesis vs etc.
// 4. Keep or collapse as much as you like.
// ---------------
Annotation:
- metadata // optional rando metadata, dunno, like....
- source origin (i.e. was this marked by hand, gotten from CPU tracelog, CDL trace, etc)
- author
- date changed
- data reference source // (i.e. https://romhacking.net/{some_page}, etc)
- certainty // (100%, or not sure, or wrong disassembly, or guess)
- tags, maybe? // "overworld", "battlesystem", "boss AI system"
AnnotationDataBlock
- StartingRegionOffset
- Count
- Type // (graphics, music, table, etc)
// labels a specific line, literally the "label" on the left hand side of the grid
AnnotationLabel : Annotation
- Text
AnnotationComment : Annotation
- Text
AnnotationFreeSpace : AnnotationDataBlock
// placed here either by hand, or, multiple per-byte if tracelogger finds new combinations
// only one of them is marked as the "real" one
Annotation65XCpuFlags : Annotation
- dataBank
- directPage
- xFlag
- mFlag
Annotation65XInstructionByte : Annotation
Annotation65XOperandByte : Annotation
// raw data from a CDL capture (was this byte read from? written to? code run from here? etc)
AnnotationCDLEntry : Annotation
- byteflags = {unknown, read_from, written_to, executed_from}
// -----------------
// all of the above stuff is just how to STORE data and map it and mark it up.
// it's nothing about how to display, modify, or export the data, which should all be in another layer.
// ------------------
dataGrid.DataSource = new RomByteDataGridRow[1000];
// for displaying stuff on a maingrid like what Diz does now, make a display-specific class like this.
// the datagrid class is generic and will respond to the metadata here for the columns
// and the specific field values are one row
// (this is actually pretty close to what it looks like in the current bleeding edge GUI refactor)
public class RomByteDataGridRow : INotifyPropertyChanged
{
private offsetInRegion;
private region; // arbitrary, might typically be set to SNES.Region["CpuBus"]["ROM"]
[DisplayName("Label")]
[Editable(true)]
[CustomConfig(col =>
{
col.DefaultCellStyle = new DataGridViewCellStyle
{
Alignment = DataGridViewContentAlignment.MiddleRight, Font = FontHuman,
};
col.MaxInputLength = 60;
col.MinimumWidth = 6;
col.Width = 200;
})]
public string Label
{
get => region.GetAnnotation<AnnotationLabel>(offsetInRegion).Text;
// todo (validate for valid label characters)
// (note: validation implemented in Furious's branch, integrate here)
set
{
region.GetAnnotation<AnnotationLabel>(offsetInRegion).Text = value;
OnPropertyChanged();
}
}
// program counter (Read-only)
[DisplayName("PC")]
[ReadOnly(true)]
public string Offset =>
Util.NumberToBaseString(offsetInRegion, Util.NumberBase.Hexadecimal, 6);
// ascii version of the byte
[DisplayName("@")]
[ReadOnly(true)]
public char AsciiCharRep =>
(char) region[offsetInRegion];
// hex version of the byte
[DisplayName("#")]
[ReadOnly(true)]
public string NumericRep =>
Util.NumberToBaseString(region[offsetInRegion], Util.NumberBase.Hexadecimal);
// ....snip, add whatever other properties you want to display....
}
// annotation generation (i.e. what Diz basically does right now as its core operation)
// example:
// - adding labels
// - disassembly workflow (like CPU Step-through, Step-in, etc)
// - marking blocks of data as graphics, codes, pointer tables, etc
class 65816_CpuOperations {
void Step(int offset, Region region) {
// .........
}
}
// builds - replaces current "Export Assembly"
// define how and when output artifacts (assembly files, .bin files, etc)
// are generated.
// already supported via command line
//
// would be cool if we could keep our management of this very lightweight, and use some existing build utilities.
// like generating Makefiles [or something that doesn't suck to deal with], so it can be run outside Diz.
DizProject = {
...
Builds[] {
Build1={
OutputAssemblyCode {"generated/", split_by_bank=true, flavor=CPU65816/SPC700/etc}
Compilation {"asar.exe [params] main.asm", Output="generatedrom.sfc"}
Defines {"RomVersion", "United States", true}
RootRegion=this.RootRegion.SubRegion["SnesCPUBus"]["Rom"]
Validation {
MustBeByteIdentical {OriginalImportedRomFilename, "generatedrom.sfc"},
MatchInternalCheckSum {[some checksum value from the rom]}
NoPatchOverridesAllowed
}
},
Build2={
Inherit=Build1
ApplyPatches[patchProject.RootRegion["InfiniteHitPointsPatch1"]]
OutputDiff={build1.output, this.output, diffWriteTo="patch.ips"} // something like this
}
}
// fun bonus ideas:
// with this data structure, might make it easy to have either tighter integration with a Debugger (like BSNES)
// or also, invoke a real emulator on a section of a ROM (i.e. "hey BSNES: run starting at offset X til you reach
// offset Y, using this RAM or savestate snapshot")
//
// It will also make writing custom tool integrations really simple, for things like graphics/audio/editors
// or integration with game-specific tools that already exist.
//
// And, we can create arbitrary window layouts, do things like making other windows "follow along" with you, remember history.
// imagine clicking around on a ROM and when you have a line with a JMP statement, the other window shows you a preview of where you are jumping
//
// have Hex editors, byte grid viewers, assembled output previews, etc available
//
// or, hook this up to be the backend of a microservices API, and build an interactive web viewer for this data.
// imagine being able to query data from games, looking for patterns, etc. create hot-links and share them like we do with
// github issues
Since labels are assigned to PC offsets instead of SNES addresses, some problems arise due to the mirroring of certain memory locations in the SNES address space.
For example, if JML $808055 and JML $008055 both exist in the ROM, these will be assembled into different bytecode: 5C 55 80 80 vs 5C 55 80 00. However, both of these effective addresses map to the same PC offset, so if this offset is given a label, these two instructions will be identical, and will assemble identically.
Another example: SRAM is mapped to the lower half of banks $70-$7D in LoROM, and the upper half is mapped to ROM. An effective address of $7DFFFF points to ROM (the last byte in the bank). An autogenerated label will 'unmirror' this effective address to, say, $0DFFFF. But if this address is used as a negative index base into SRAM ($7DFFFF+1 = $7D0000), the effective address for this instruction should refer to SRAM, even though the base address points to ROM.
I think most of this will be fixed by mapping labels to ROM address, which will also allow for RAM addresses to be labeled as well.
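To make the JML mirroring example concrete, here is a small Python sketch. It assumes a LoROM mapping, a headerless ROM file, and addresses in the ROM half of a bank ($8000-$FFFF); the function names are hypothetical and not taken from Diz.

```python
def lorom_to_pc(snes_addr: int) -> int:
    """Map a 24-bit SNES address to a PC file offset (LoROM assumed).

    Bit 7 of the bank byte only selects a mirror, so it is dropped;
    each bank contributes 32 KiB of ROM at $8000-$FFFF.
    """
    bank = (snes_addr >> 16) & 0x7F
    offset_in_bank = snes_addr & 0x7FFF
    return (bank << 15) | offset_in_bank


def encode_jml(snes_addr: int) -> bytes:
    """Encode JML (opcode $5C) with a 24-bit little-endian operand."""
    return bytes([0x5C]) + snes_addr.to_bytes(3, "little")


# Two mirrored addresses share one PC offset...
same_offset = lorom_to_pc(0x808055) == lorom_to_pc(0x008055)
# ...but the original instructions are different byte sequences.
different_bytes = encode_jml(0x808055) != encode_jml(0x008055)
```

A label keyed only on the PC offset cannot tell these two operands apart, which is why the rebuilt bytes can differ from the original ROM.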
This question is mostly "hey @Dotsarecool would you be offended if we renamed the project" : )
I was kinda thinking we rename it to Diz2 or just "Diz", so it's a little easier to type and pass around the name. The full name can still be "DiztinGUIsh" though.
Marking C43842-C43843 will only mark C43842 with the indicated label.
This is a continuation of #50 (more info there). If the cartridge title field in the SNES ROM header contains characters that are not one byte per character (as some Japanese glyphs are), then we might not always be doing the right thing when encoding/decoding.
We should write a unit test that shows a situation where:
var encoding = Encoding.GetEncoding(932); // 932 is "ShiftJIS"
var str = /*insert string here with multibyte Japanese chars in it*/;
var jisRawBytes = ByteUtil.GetRawShiftJisBytesFromStr(str);
Assert.NotEqual(jisRawBytes.Length, str.Length);
And then, given those conditions, check all the other areas of the code to make sure they behave nicely. In particular, pay attention to anything that uses RomUtil.LengthOfTitleName.
For the couple of ROM headers I tested, I'm confident that we have this decently implemented (and definitely for English), and there is extensive unit testing support for this now. But it would be good to test this final edge case.
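The condition the proposed unit test looks for (byte length differing from character length) can be demonstrated outside of Diz. This Python sketch uses the standard `shift_jis` codec; the sample title and the helper name are made up for illustration.

```python
def shift_jis_bytes(text: str) -> bytes:
    """Encode text as Shift-JIS using Python's built-in codec."""
    return text.encode("shift_jis")


title = "ゼルダ"  # three full-width katakana characters (sample only)
raw = shift_jis_bytes(title)

# Each full-width character takes two bytes in Shift-JIS, so any code
# that treats "length of title" as "number of bytes" is off here.
char_count = len(title)
byte_count = len(raw)

# ASCII text stays one byte per character, which is why English titles
# never exposed the problem.
ascii_raw = shift_jis_bytes("ZELDA")
```

Any code path that mixes the two lengths, for example when padding or truncating the title field, is a candidate for the edge case described above.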
Just realized I broke the label output in the sample assembly code [need to check the main output]. I'll have it fixed in my branch shortly, along with a fairly major rewrite.
I was going to stabilize/test what I'm doing and get it ready for a release. It's backwards compatible, but probably worth a new major version number.
When entering a higher value as the end offset, the start address (Offset) changes.
It would be more user-friendly if the byte count were variable, and the start and end addresses were not set via the number of bytes.
The error is shown in the pictures:
let's see we'd like to declare to D400...
whoops... start-address changed
annoying that Diz prompts you each time for the location of the generated output.
it should just be a text field in the "Export Disassembly" dialog box, with a browse button next to it.
then, make sure this gets saved correctly with the project settings and as a relative path so user doesn't have to think about it after setting it up one time.
Right now, we save .diz files and internally compress with gzip.
The XML is mostly text, BUT the main (largest) section, which serializes RomByte, is a custom non-XML text-based block of data that is intended to be human-readable and mergeable in git. I wanted Diz to be able to support team-oriented workflows. While merging changes that sit close together in the ROM might be painful, changes to different parts of the ROM should have a decent chance of merging cleanly with a tool like git.
Apart from that, the entire XML formatted output is also currently compressed with gzip as a final step.
On my computer with a 4MB ROM about 50% marked up, the gzip compressed file is ~100KB, and the uncompressed raw XML is ~1.3MB.
Decisions to be made are:
yourfilename.diz.xml. Diz itself doesn't care or look at what the file is called; it always tries to decompress it with gzip first, and if there's an issue, it'll try again without it. This decision means our file format extension is purely for humans. I like the idea of hinting that it's an XML file.
I may have broken it, not sure. I know it used to work, tried it recently and it didn't.
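The .diz loading behavior described above (try gzip first, and fall back to treating the file as plain XML) can be sketched in a few lines of Python. This is only an illustration of the idea, not Diz's actual loader; the function name is hypothetical.

```python
import gzip
import zlib


def load_project_bytes(data: bytes) -> bytes:
    """Return the XML payload whether or not the file was gzip-compressed.

    Hypothetical helper: attempt gzip decompression first, and if the
    data is not valid gzip, assume it is already uncompressed XML.
    """
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError.
        return data


xml = b"<project>example</project>"
from_compressed = load_project_bytes(gzip.compress(xml))
from_plain = load_project_bytes(xml)
```

Because the content decides how the file is read, the extension can be chosen purely for how it reads to people.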
Appveyor is now set up and doing builds, though we may need to update it to use the SemVer versioning format that the actual project uses.
this is just a low-pri UI enhancement,
the app relies on Messagebox.Show() a lot, and in particular the error messages when you're importing, loading, saving, etc, should probably go through a custom form that doesn't make it look so "error-y" and gives you a sense of the progress in your workflow (when opening/importing/etc)
came from #50
most user-facing stuff hasn't changed so this isn't too bad but we should touch on:
our mod to BSNES is outputting just one CPU for the moment, the main CPU.
it's pretty trivial on the BSNES side to extend the functionality to other CPUs like the SPC, SA-1, etc.
to do that, each CPU in BSNES-plus would need a new ::disassemble_opcode_bin() function, looking like the one here:
https://github.com/binary1230/bsnes-plus/blob/e30dfc784f3c40c0db0a09124db4ec83189c575c/bsnes/snes/cpu/core/disassembler/disassembler.cpp#L224
We should pick new header IDs for each CPU and its abridged format.
and then, in Tracer, just hook up the remaining calls to dump to the new disassemble_opcode_bin() functions:
https://github.com/binary1230/bsnes-plus/blob/e30dfc784f3c40c0db0a09124db4ec83189c575c/bsnes/ui-qt/debugger/tracer.cpp#L62
specifically, in the following:
Tracer::outputSa1Trace()
Tracer::outputSfxTrace()
Tracer::outputSgbTrace()
The SNES ROM header supports encoding Japanese characters in the game title field in ShiftJIS format. However, we're incorrectly interpreting these bytes as Unicode.
Diz projects for ROMs with Japanese characters in this title field will serialize the incorrect encoding to the XML, and on load, this XML will cause Diz to think the name of the cart is different. A verification check will then fail and the project will refuse to load.
I have a fix in #49 underway, along with an XML migration that will fix the issue for any affected users.
Technically we don't actually need to store the title in the Project file since we are storing the checksum bytes. However, it's nice to have one extra layer of redundancy so, let's keep it.
The fix adds a lot of extra unit testing and some extra functionality for working with rom titles, checksums, etc. I'm not sure it's comprehensive, but, it appears to work well so far.
(Originally reported by LuigisBlood in SNESLab Discord, thanks!)
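A rough Python sketch of why the verification check fails: the same title bytes decode to different strings depending on the encoding assumed. Here `latin-1` stands in for "one character per byte" decoding, which is an assumption and not necessarily what Diz did; the sample title and helper name are hypothetical. The 21-byte length is the size of the title field in the SNES header.

```python
TITLE_FIELD_LENGTH = 21  # size of the title field in the SNES ROM header


def make_title_field(text: str) -> bytes:
    """Encode a title as Shift-JIS and pad with spaces to the field size."""
    return text.encode("shift_jis").ljust(TITLE_FIELD_LENGTH, b" ")


rom_title_bytes = make_title_field("ゼルダ")

# What the project file should hold: the bytes decoded as Shift-JIS.
correct_title = rom_title_bytes.decode("shift_jis")

# What a one-char-per-byte decoder produces instead (assumed stand-in).
wrong_title = rom_title_bytes.decode("latin-1")

# On load, comparing the saved (wrong) title against the correctly
# decoded ROM title reports a mismatch even though the ROM is the same.
titles_match = wrong_title == correct_title
```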
tracelog file import and, separately, tracelog capture (bsnes --> Diz over a socket) work and are well tested for the main SNES CPU, but not really implemented for the other CPUs (like SA-1)
Here's what we'd need to do to fix that:
Add the SA-1 CPU to the list of CPUs in the Architecture enum here:
https://github.com/Dotsarecool/DiztinGUIsh/blob/master/Diz.Core/model/Enums.cs#L61
Each line of a tracelog file is parsed by BsnesTraceLogImporter.ParseLine()
https://github.com/Dotsarecool/DiztinGUIsh/blob/master/Diz.Core/import/BsnesTraceLogImporter.Parsers.cs#L55
For SA-1, we'd need to verify that it's capturing the same data, or build an SA-1 specific parser (should be pretty easy)
That parser is populating a ModificationData object (it's one object per line in the tracelog). The only thing that would need to be changed is adding the Architecture enum in there and marking it as from SA-1 like this:
// something like...
modData.arch = CPU_SA1; // add the 'arch' field there, set it to the new enum
That 'arch' field would need to be added here:
https://github.com/Dotsarecool/DiztinGUIsh/blob/master/Diz.Core/import/BsnesTraceLogImporter.ModificationsList.cs#L15
That's pretty straightforward though there's a little bit of extra checking/etc to pay attention to since ModificationData is heavily optimized for doing the tracelog network capturing.
unit tests
If anything's different, grab a couple lines from a tracelog from SA-1, and write a unit test or two to cover it:
https://github.com/Dotsarecool/DiztinGUIsh/blob/862e32f93d27bf9c545f147b4ed0c43c3a38bc81/Diz.Test/Tests/TracelogTests/TraceLogTests.cs#L8
UI: Probably add a new option to the Import menu so it has separate items for 'import SA1 tracelog' vs 'import 65816 main cpu tracelog'. If we wanted to get fancier we could set up a dialog box that has some nicer looking options. eh, later.
Somewhere on the BSNES tracelog import class is where we can stash which type of byte we're capturing.
It can all be traced pretty easily by looking at the on click handler in importBsnesTracelogText_Click()
That function takes a (hardcoded...woof) column number and the index of the row (i.e. the snes byte address).
You can get to the arch like this:
// in there, something vaguely like....
style.ForeColor = Data.GetArchitecture(offset) == Architecture.SA1 ? Color.Gray : Color.Black;
I haven't tested Diz when it has different architectures in there. theoretically should be OK, but, might be worth doing some CPU step operations, mark operations, etc especially on boundaries of bytes where things go from one CPU into another.
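The steps above (tag each parsed tracelog line with its CPU, then use that tag when drawing) can be modeled in a small Python sketch. Everything here is hypothetical: the enum members, the field names, and the color strings only mirror the shape of the C# snippets in this issue, not the real Diz types.

```python
from dataclasses import dataclass
from enum import Enum


class Architecture(Enum):
    """Hypothetical stand-in for Diz's Architecture enum."""
    CPU65816 = "65816"
    SA1 = "SA-1"


@dataclass
class ModificationData:
    """One parsed tracelog line; 'arch' is the proposed new field."""
    snes_address: int
    arch: Architecture = Architecture.CPU65816


def fore_color(mod: ModificationData) -> str:
    """Pick a display color based on which CPU produced the byte."""
    return "gray" if mod.arch is Architecture.SA1 else "black"


main_cpu_line = ModificationData(snes_address=0x808055)
sa1_line = ModificationData(snes_address=0x008055, arch=Architecture.SA1)
colors = [fore_color(main_cpu_line), fore_color(sa1_line)]
```

Defaulting `arch` to the main CPU keeps existing tracelog handling unchanged, so only the SA-1 import path would need to set the field explicitly.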